* [PATCH 0/7] Zbb string optimizations and call support in alternatives
@ 2022-11-10 16:49 Heiko Stuebner
  2022-11-10 16:49 ` [PATCH 1/7] efi/riscv: libstub: mark when compiling libstub Heiko Stuebner
                   ` (6 more replies)
  0 siblings, 7 replies; 51+ messages in thread
From: Heiko Stuebner @ 2022-11-10 16:49 UTC (permalink / raw)
  To: linux-riscv, palmer
  Cc: christoph.muellner, prabhakar.csengg, conor, philipp.tomsich,
	ajones, heiko, emil.renner.berthing, Heiko Stuebner

From: Heiko Stuebner <heiko.stuebner@vrull.eu>

The Zbb extension can be used to make string functions run a lot
faster.

There are essentially two problems to solve:
- making it possible to replace the str* implementations at runtime
  in a performant way

  This is done by inlining the core functions and then
  using alternatives to call the actual variant.

  This of course will need a more intelligent selection mechanism
  down the road when more variants may exist using different
  available extensions.

- actually allowing calls in alternatives
  Function calls use auipc + jalr to reach those 32bit relative
  addresses but when they're compiled the offset will be wrong
  as alternatives live in a different section. So when the patch
  gets applied the address will point to the wrong location.

  So similar to arm64 the target addresses need to be updated.

  This is probably also helpful for other things needing more
  complex code in alternatives.
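
To make the second problem concrete, here is a small user-space sketch
in C (not kernel code) that decodes where an auipc + jalr pair ends up.
The helper names jalr_imm() and call_target() and the example addresses
are made up for illustration. It assumes a two's complement target, as
on all common compilers.

```c
#include <stdint.h>

/* Sign-extended 12-bit I-type immediate, stored in bits [31:20] of jalr. */
static int32_t jalr_imm(uint32_t jalr)
{
	int32_t imm = (int32_t)((jalr >> 20) & 0xfff);

	if (imm & 0x800)
		imm -= 0x1000;
	return imm;
}

/*
 * Address reached by "auipc ra, hi; jalr ra, lo(ra)" when the auipc
 * instruction itself is located at pc.
 */
static uint64_t call_target(uint64_t pc, uint32_t auipc, uint32_t jalr)
{
	/* bits [31:12] of auipc already sit at their final position */
	int32_t hi = (int32_t)(auipc & 0xfffff000u);
	int32_t lo = jalr_imm(jalr);

	return pc + (uint64_t)(int64_t)hi + (uint64_t)(int64_t)lo;
}
```

With the words 0x00001097 (auipc ra, 0x1) and 0xff0080e7
(jalr ra, -16(ra)), call_target() yields 0x1ff0 for pc = 0x1000 but
0x5ff0 for pc = 0x5000. The same bytes copied to another address call
somewhere else, which is why the immediates need fixing after patching.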


In my half-scientific test-case of running the functions in question
on a 95 character string in a loop of 10000 iterations, the Zbb
variants shave off around 2/3 of the original runtime.
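
The test setup itself is not part of this series. A rough user-space
approximation of such a measurement could look like the sketch below;
time_strlen() and the strlen_fn type are hypothetical names, and
clock_gettime() is used purely as an assumption about how one might
take the timestamps.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L
#endif
#include <stddef.h>
#include <string.h>
#include <time.h>

typedef size_t (*strlen_fn)(const char *);

/*
 * Call fn on s "iterations" times and return the elapsed time in
 * nanoseconds. The summed lengths are stored in *sum_out so that the
 * calls have an observable result.
 */
static long long time_strlen(strlen_fn fn, const char *s, int iterations,
			     size_t *sum_out)
{
	struct timespec start, end;
	size_t sum = 0;
	int i;

	clock_gettime(CLOCK_MONOTONIC, &start);
	for (i = 0; i < iterations; i++)
		sum += fn(s);
	clock_gettime(CLOCK_MONOTONIC, &end);

	*sum_out = sum;
	return (end.tv_sec - start.tv_sec) * 1000000000LL +
	       (end.tv_nsec - start.tv_nsec);
}
```

Calling this once per implementation on the same 95 character string
with 10000 iterations gives two numbers to compare. An optimizing
compiler may hoist or inline the call, so the result should be read as
a rough indication only.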


changes since rfc:
- make Zbb code actually work
- drop some unneeded patches
- a lot of cleanups


Heiko Stuebner (7):
  efi/riscv: libstub: mark when compiling libstub
  RISC-V: add auipc elements to parse_asm header
  RISC-V: add U-type imm parsing to parse_asm header
  RISC-V: add rd reg parsing to parse_asm header
  RISC-V: fix auipc-jalr addresses in patched alternatives
  RISC-V: add infrastructure to allow different str* implementations
  RISC-V: add zbb support to string functions

 arch/riscv/Kconfig                    |  23 ++++++
 arch/riscv/include/asm/errata_list.h  |   3 +-
 arch/riscv/include/asm/hwcap.h        |   1 +
 arch/riscv/include/asm/parse_asm.h    |  21 +++++
 arch/riscv/include/asm/string.h       |  83 ++++++++++++++++++++
 arch/riscv/kernel/cpu.c               |   1 +
 arch/riscv/kernel/cpufeature.c        |  97 ++++++++++++++++++++++-
 arch/riscv/kernel/image-vars.h        |   6 +-
 arch/riscv/lib/Makefile               |   6 ++
 arch/riscv/lib/strcmp.S               |  39 ++++++++++
 arch/riscv/lib/strcmp_zbb.S           |  91 ++++++++++++++++++++++
 arch/riscv/lib/strlen.S               |  29 +++++++
 arch/riscv/lib/strlen_zbb.S           |  98 ++++++++++++++++++++++++
 arch/riscv/lib/strncmp.S              |  41 ++++++++++
 arch/riscv/lib/strncmp_zbb.S          | 106 ++++++++++++++++++++++++++
 drivers/firmware/efi/libstub/Makefile |   2 +-
 16 files changed, 640 insertions(+), 7 deletions(-)
 create mode 100644 arch/riscv/lib/strcmp.S
 create mode 100644 arch/riscv/lib/strcmp_zbb.S
 create mode 100644 arch/riscv/lib/strlen.S
 create mode 100644 arch/riscv/lib/strlen_zbb.S
 create mode 100644 arch/riscv/lib/strncmp.S
 create mode 100644 arch/riscv/lib/strncmp_zbb.S

-- 
2.35.1


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv


* [PATCH 1/7] efi/riscv: libstub: mark when compiling libstub
  2022-11-10 16:49 [PATCH 0/7] Zbb string optimizations and call support in alternatives Heiko Stuebner
@ 2022-11-10 16:49 ` Heiko Stuebner
  2022-11-13 17:16   ` Conor Dooley
  2022-11-10 16:49 ` [PATCH 2/7] RISC-V: add auipc elements to parse_asm header Heiko Stuebner
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 51+ messages in thread
From: Heiko Stuebner @ 2022-11-10 16:49 UTC (permalink / raw)
  To: linux-riscv, palmer
  Cc: christoph.muellner, prabhakar.csengg, conor, philipp.tomsich,
	ajones, heiko, emil.renner.berthing, Heiko Stuebner

From: Heiko Stuebner <heiko.stuebner@vrull.eu>

We want to runtime-optimize some core functions (str*, mem*)
but not have this leak into libstub. Instead libstub
for the short while it's running should just use the generic
implementation.

To be able to determine whether functions are getting compiled
as part of libstub or not, add a compile-flag we can check
via #ifdef.

Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
---
 drivers/firmware/efi/libstub/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile
index b1601aad7e1a..39c8e3da1937 100644
--- a/drivers/firmware/efi/libstub/Makefile
+++ b/drivers/firmware/efi/libstub/Makefile
@@ -25,7 +25,7 @@ cflags-$(CONFIG_ARM)		:= $(subst $(CC_FLAGS_FTRACE),,$(KBUILD_CFLAGS)) \
 				   -fno-builtin -fpic \
 				   $(call cc-option,-mno-single-pic-base)
 cflags-$(CONFIG_RISCV)		:= $(subst $(CC_FLAGS_FTRACE),,$(KBUILD_CFLAGS)) \
-				   -fpic
+				   -fpic -DRISCV_EFISTUB
 cflags-$(CONFIG_LOONGARCH)	:= $(subst $(CC_FLAGS_FTRACE),,$(KBUILD_CFLAGS)) \
 				   -fpie
 
-- 
2.35.1



* [PATCH 2/7] RISC-V: add auipc elements to parse_asm header
  2022-11-10 16:49 [PATCH 0/7] Zbb string optimizations and call support in alternatives Heiko Stuebner
  2022-11-10 16:49 ` [PATCH 1/7] efi/riscv: libstub: mark when compiling libstub Heiko Stuebner
@ 2022-11-10 16:49 ` Heiko Stuebner
  2022-11-13 17:18   ` Conor Dooley
  2022-11-10 16:49 ` [PATCH 3/7] RISC-V: add U-type imm parsing " Heiko Stuebner
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 51+ messages in thread
From: Heiko Stuebner @ 2022-11-10 16:49 UTC (permalink / raw)
  To: linux-riscv, palmer
  Cc: christoph.muellner, prabhakar.csengg, conor, philipp.tomsich,
	ajones, heiko, emil.renner.berthing, Heiko Stuebner

From: Heiko Stuebner <heiko.stuebner@vrull.eu>

We will want to use the opcode parsing outside kdb as well and need
at least the auipc element there.

Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
---
 arch/riscv/include/asm/parse_asm.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/riscv/include/asm/parse_asm.h b/arch/riscv/include/asm/parse_asm.h
index f36368de839f..c287c1426aa7 100644
--- a/arch/riscv/include/asm/parse_asm.h
+++ b/arch/riscv/include/asm/parse_asm.h
@@ -100,6 +100,7 @@
 #define RVC_C2_RD_OPOFF		7
 
 /* parts of opcode for RVG*/
+#define OPCODE_AUIPC		0x17
 #define OPCODE_BRANCH		0x63
 #define OPCODE_JALR		0x67
 #define OPCODE_JAL		0x6f
@@ -129,6 +130,7 @@
 
 #define FUNCT12_SRET		0x10200000
 
+#define MATCH_AUIPC		(OPCODE_AUIPC)
 #define MATCH_JALR		(FUNCT3_JALR | OPCODE_JALR)
 #define MATCH_JAL		(OPCODE_JAL)
 #define MATCH_BEQ		(FUNCT3_BEQ | OPCODE_BRANCH)
@@ -145,6 +147,7 @@
 #define MATCH_C_JR		(FUNCT4_C_JR | OPCODE_C_2)
 #define MATCH_C_JALR		(FUNCT4_C_JALR | OPCODE_C_2)
 
+#define MASK_AUIPC		0x7f
 #define MASK_JALR		0x707f
 #define MASK_JAL		0x7f
 #define MASK_C_JALR		0xf07f
-- 
2.35.1



* [PATCH 3/7] RISC-V: add U-type imm parsing to parse_asm header
  2022-11-10 16:49 [PATCH 0/7] Zbb string optimizations and call support in alternatives Heiko Stuebner
  2022-11-10 16:49 ` [PATCH 1/7] efi/riscv: libstub: mark when compiling libstub Heiko Stuebner
  2022-11-10 16:49 ` [PATCH 2/7] RISC-V: add auipc elements to parse_asm header Heiko Stuebner
@ 2022-11-10 16:49 ` Heiko Stuebner
  2022-11-13 19:06   ` Conor Dooley
  2022-11-10 16:49 ` [PATCH 4/7] RISC-V: add rd reg " Heiko Stuebner
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 51+ messages in thread
From: Heiko Stuebner @ 2022-11-10 16:49 UTC (permalink / raw)
  To: linux-riscv, palmer
  Cc: christoph.muellner, prabhakar.csengg, conor, philipp.tomsich,
	ajones, heiko, emil.renner.berthing, Heiko Stuebner

From: Heiko Stuebner <heiko.stuebner@vrull.eu>

Similar to other existing types, allow extracting the immediate
for a U-type instruction.

U-type immediates are special in that bits [31:12] of the instruction
already hold bits [31:12] of the immediate, so no shifting is required.
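
As a stand-alone illustration (user-space C, not the kernel macro), the
extraction boils down to a single mask. The function name
extract_utype_imm() is made up for this sketch, and the cast assumes a
two's complement target.

```c
#include <stdint.h>

#define U_IMM_31_12_MASK	0xfffff000u

/*
 * Return the immediate of a U-type instruction (lui, auipc).
 * Bits [31:12] are kept in place and bits [11:0] read as zero. Bit 31
 * doubles as the sign bit, so the cast to int32_t is all the sign
 * extension that is needed.
 */
static int32_t extract_utype_imm(uint32_t insn)
{
	return (int32_t)(insn & U_IMM_31_12_MASK);
}
```

For example, 0x12345097 (auipc ra, 0x12345) gives 0x12345000, and
0xfffff097 (auipc ra, 0xfffff) gives -4096.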

Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
---
 arch/riscv/include/asm/parse_asm.h | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/arch/riscv/include/asm/parse_asm.h b/arch/riscv/include/asm/parse_asm.h
index c287c1426aa7..939ede0ee527 100644
--- a/arch/riscv/include/asm/parse_asm.h
+++ b/arch/riscv/include/asm/parse_asm.h
@@ -25,6 +25,15 @@
 #define J_IMM_11_MASK		GENMASK(0, 0)
 #define J_IMM_19_12_MASK	GENMASK(7, 0)
 
+/*
+ * U-type IMMs contain the upper 20bits [31:12] of an immediate with
+ * the rest filled in by zeros, so no shifting required. Similarly,
+ * bit31 contains the signed state, so no sign extension necessary.
+ */
+#define U_IMM_SIGN_OPOFF	31
+#define U_IMM_31_12_OPOFF	0
+#define U_IMM_31_12_MASK	GENMASK(31, 12)
+
 /* The bit field of immediate value in B-type instruction */
 #define B_IMM_SIGN_OPOFF	31
 #define B_IMM_10_5_OPOFF	25
@@ -183,6 +192,10 @@ static inline bool is_ ## INSN_NAME ## _insn(long insn) \
 #define RV_X(X, s, mask)  (((X) >> (s)) & (mask))
 #define RVC_X(X, s, mask) RV_X(X, s, mask)
 
+#define EXTRACT_UTYPE_IMM(x) \
+	({typeof(x) x_ = (x); \
+	(RV_X(x_, U_IMM_31_12_OPOFF, U_IMM_31_12_MASK)); })
+
 #define EXTRACT_JTYPE_IMM(x) \
 	({typeof(x) x_ = (x); \
 	(RV_X(x_, J_IMM_10_1_OPOFF, J_IMM_10_1_MASK) << J_IMM_10_1_OFF) | \
-- 
2.35.1



* [PATCH 4/7] RISC-V: add rd reg parsing to parse_asm header
  2022-11-10 16:49 [PATCH 0/7] Zbb string optimizations and call support in alternatives Heiko Stuebner
                   ` (2 preceding siblings ...)
  2022-11-10 16:49 ` [PATCH 3/7] RISC-V: add U-type imm parsing " Heiko Stuebner
@ 2022-11-10 16:49 ` Heiko Stuebner
  2022-11-13 19:08   ` Conor Dooley
  2022-11-10 16:49 ` [PATCH 5/7] RISC-V: fix auipc-jalr addresses in patched alternatives Heiko Stuebner
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 51+ messages in thread
From: Heiko Stuebner @ 2022-11-10 16:49 UTC (permalink / raw)
  To: linux-riscv, palmer
  Cc: christoph.muellner, prabhakar.csengg, conor, philipp.tomsich,
	ajones, heiko, emil.renner.berthing, Heiko Stuebner

From: Heiko Stuebner <heiko.stuebner@vrull.eu>

Add a macro to allow parsing of the rd register from an instruction.
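
For reference, a stand-alone C version of the same extraction might
look like this. The names are chosen for the sketch only; the offset
and the 5-bit mask match the values used in the diff below.

```c
#include <stdint.h>

#define RD_OPOFF	7	/* rd starts at bit 7 */
#define RD_MASK		0x1fu	/* 5 bits, registers x0..x31 */

/* Return the destination register number of a 32-bit instruction. */
static uint32_t extract_rd_reg(uint32_t insn)
{
	return (insn >> RD_OPOFF) & RD_MASK;
}
```

With this, 0x00001097 (auipc ra, 0x1) yields 1 (ra) and 0x00000297
(auipc t0, 0x0) yields 5 (t0).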

Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
---
 arch/riscv/include/asm/parse_asm.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/riscv/include/asm/parse_asm.h b/arch/riscv/include/asm/parse_asm.h
index 939ede0ee527..305b15f7d41c 100644
--- a/arch/riscv/include/asm/parse_asm.h
+++ b/arch/riscv/include/asm/parse_asm.h
@@ -51,6 +51,7 @@
 #define RVG_RS1_OPOFF		15
 #define RVG_RS2_OPOFF		20
 #define RVG_RD_OPOFF		7
+#define RVG_RD_MASK		GENMASK(4, 0)
 
 /* The bit field of immediate value in RVC J instruction */
 #define RVC_J_IMM_SIGN_OPOFF	12
@@ -192,6 +193,10 @@ static inline bool is_ ## INSN_NAME ## _insn(long insn) \
 #define RV_X(X, s, mask)  (((X) >> (s)) & (mask))
 #define RVC_X(X, s, mask) RV_X(X, s, mask)
 
+#define EXTRACT_RD_REG(x) \
+	({typeof(x) x_ = (x); \
+	(RV_X(x_, RVG_RD_OPOFF, RVG_RD_MASK)); })
+
 #define EXTRACT_UTYPE_IMM(x) \
 	({typeof(x) x_ = (x); \
 	(RV_X(x_, U_IMM_31_12_OPOFF, U_IMM_31_12_MASK)); })
-- 
2.35.1



* [PATCH 5/7] RISC-V: fix auipc-jalr addresses in patched alternatives
  2022-11-10 16:49 [PATCH 0/7] Zbb string optimizations and call support in alternatives Heiko Stuebner
                   ` (3 preceding siblings ...)
  2022-11-10 16:49 ` [PATCH 4/7] RISC-V: add rd reg " Heiko Stuebner
@ 2022-11-10 16:49 ` Heiko Stuebner
  2022-11-13 20:31   ` Conor Dooley
                     ` (3 more replies)
  2022-11-10 16:49 ` [PATCH 6/7] RISC-V: add infrastructure to allow different str* implementations Heiko Stuebner
  2022-11-10 16:49 ` [PATCH 7/7] RISC-V: add zbb support to string functions Heiko Stuebner
  6 siblings, 4 replies; 51+ messages in thread
From: Heiko Stuebner @ 2022-11-10 16:49 UTC (permalink / raw)
  To: linux-riscv, palmer
  Cc: christoph.muellner, prabhakar.csengg, conor, philipp.tomsich,
	ajones, heiko, emil.renner.berthing, Heiko Stuebner

From: Heiko Stuebner <heiko.stuebner@vrull.eu>

Alternatives live in a different section, so addresses used by function
calls will point to the wrong locations after the patch is applied.

Similar to arm64, adjust the location to consider that offset.
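
The adjustment can be sketched in plain user-space C as follows. This
is an illustration under assumptions, not the kernel code: struct
call_pair and fix_auipc_jalr() are invented names, and patch_offset is
taken to be "new location minus old location" of the instruction pair,
matching the old_ptr - alt_ptr argument used in the diff below.

```c
#include <stdint.h>

struct call_pair {
	uint32_t auipc;
	uint32_t jalr;
};

/*
 * Re-encode an auipc + jalr pair that was moved by patch_offset bytes
 * so that it still reaches the original target.
 */
static struct call_pair fix_auipc_jalr(struct call_pair in,
				       int32_t patch_offset)
{
	struct call_pair out;
	int32_t hi, lo, imm;
	uint32_t uimm;

	/* decode the old pc-relative offset */
	hi = (int32_t)(in.auipc & 0xfffff000u);
	lo = (int32_t)((in.jalr >> 20) & 0xfff);
	if (lo & 0x800)
		lo -= 0x1000;

	/* the code moved forward by patch_offset, so the offset shrinks */
	imm = hi + lo - patch_offset;
	uimm = (uint32_t)imm;

	/*
	 * jalr sign-extends its 12 bits. When bit 11 is set the low part
	 * counts as negative, so the upper part is rounded up by 0x1000
	 * to compensate.
	 */
	out.auipc = (in.auipc & ~0xfffff000u) |
		    ((uimm + 0x800u) & 0xfffff000u);
	out.jalr = (in.jalr & ~(0xfffu << 20)) | ((uimm & 0xfffu) << 20);

	return out;
}
```

Adding 0x800 before masking has the same effect as the conditional
"+ 0x1000" in the to_auipc_imm() macro. As an example, the pair
0x00001097 / 0xff0080e7 moved by -0x4000 bytes becomes
0x00005097 / 0xff0080e7.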

Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
---
 arch/riscv/kernel/cpufeature.c | 79 +++++++++++++++++++++++++++++++++-
 1 file changed, 77 insertions(+), 2 deletions(-)

diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
index 694267d1fe81..026512ca9c4c 100644
--- a/arch/riscv/kernel/cpufeature.c
+++ b/arch/riscv/kernel/cpufeature.c
@@ -298,6 +298,74 @@ static u32 __init_or_module cpufeature_probe(unsigned int stage)
 	return cpu_req_feature;
 }
 
+#include <asm/parse_asm.h>
+
+DECLARE_INSN(jalr, MATCH_JALR, MASK_JALR)
+DECLARE_INSN(auipc, MATCH_AUIPC, MASK_AUIPC)
+
+static inline bool is_auipc_jalr_pair(long insn1, long insn2)
+{
+	return is_auipc_insn(insn1) && is_jalr_insn(insn2);
+}
+
+#define JALR_SIGN_MASK		BIT(I_IMM_SIGN_OPOFF - I_IMM_11_0_OPOFF)
+#define JALR_OFFSET_MASK	I_IMM_11_0_MASK
+#define AUIPC_OFFSET_MASK	U_IMM_31_12_MASK
+#define AUIPC_PAD		(0x00001000)
+#define JALR_SHIFT		I_IMM_11_0_OPOFF
+
+#define to_jalr_imm(offset)						\
+	((offset & I_IMM_11_0_MASK) << I_IMM_11_0_OPOFF)
+
+#define to_auipc_imm(offset)						\
+	((offset & JALR_SIGN_MASK) ?					\
+	((offset & AUIPC_OFFSET_MASK) + AUIPC_PAD) :	\
+	(offset & AUIPC_OFFSET_MASK))
+
+static void riscv_alternative_fix_auipc_jalr(unsigned int *alt_ptr,
+					     unsigned int len, int patch_offset)
+{
+	int num_instr = len / sizeof(u32);
+	unsigned int call[2];
+	int i;
+	int imm1;
+	u32 rd1;
+
+	for (i = 0; i < num_instr; i++) {
+		/* is there a further instruction? */
+		if (i + 1 >= num_instr)
+			continue;
+
+		if (!is_auipc_jalr_pair(*(alt_ptr + i), *(alt_ptr + i + 1)))
+			continue;
+
+		/* call will use ra register */
+		rd1 = EXTRACT_RD_REG(*(alt_ptr + i));
+		if (rd1 != 1)
+			continue;
+
+		/* get and adjust new target address */
+		imm1 = EXTRACT_UTYPE_IMM(*(alt_ptr + i));
+		imm1 += EXTRACT_ITYPE_IMM(*(alt_ptr + i + 1));
+		imm1 -= patch_offset;
+
+		/* pick the original auipc + jalr */
+		call[0] = *(alt_ptr + i);
+		call[1] = *(alt_ptr + i + 1);
+
+		/* drop the old IMMs */
+		call[0] &= ~(U_IMM_31_12_MASK);
+		call[1] &= ~(I_IMM_11_0_MASK << I_IMM_11_0_OPOFF);
+
+		/* add the adapted IMMs */
+		call[0] |= to_auipc_imm(imm1);
+		call[1] |= to_jalr_imm(imm1);
+
+		/* patch the call place again */
+		patch_text_nosync(alt_ptr + i, call, 8);
+	}
+}
+
 void __init_or_module riscv_cpufeature_patch_func(struct alt_entry *begin,
 						  struct alt_entry *end,
 						  unsigned int stage)
@@ -316,8 +384,15 @@ void __init_or_module riscv_cpufeature_patch_func(struct alt_entry *begin,
 		}
 
 		tmp = (1U << alt->errata_id);
-		if (cpu_req_feature & tmp)
-			patch_text_nosync(alt->old_ptr, alt->alt_ptr, alt->alt_len);
+		if (cpu_req_feature & tmp) {
+			/* do the basic patching */
+			patch_text_nosync(alt->old_ptr, alt->alt_ptr,
+					  alt->alt_len);
+
+			riscv_alternative_fix_auipc_jalr(alt->old_ptr,
+							 alt->alt_len,
+							 alt->old_ptr - alt->alt_ptr);
+		}
 	}
 }
 #endif
-- 
2.35.1



* [PATCH 6/7] RISC-V: add infrastructure to allow different str* implementations
  2022-11-10 16:49 [PATCH 0/7] Zbb string optimizations and call support in alternatives Heiko Stuebner
                   ` (4 preceding siblings ...)
  2022-11-10 16:49 ` [PATCH 5/7] RISC-V: fix auipc-jalr addresses in patched alternatives Heiko Stuebner
@ 2022-11-10 16:49 ` Heiko Stuebner
  2022-11-13 22:07   ` Conor Dooley
  2022-11-10 16:49 ` [PATCH 7/7] RISC-V: add zbb support to string functions Heiko Stuebner
  6 siblings, 1 reply; 51+ messages in thread
From: Heiko Stuebner @ 2022-11-10 16:49 UTC (permalink / raw)
  To: linux-riscv, palmer
  Cc: christoph.muellner, prabhakar.csengg, conor, philipp.tomsich,
	ajones, heiko, emil.renner.berthing, Heiko Stuebner

From: Heiko Stuebner <heiko.stuebner@vrull.eu>

Depending on supported extensions on specific RISC-V cores,
optimized str* functions might make sense.

This adds basic infrastructure to allow patching the function calls
via alternatives later on.

The main idea is to have the core str* functions be inline functions
which then call the most optimized variant, with this call then being
replaced via alternatives.

The big advantage is that we don't need additional calls.
Though we need to duplicate the generic functions as the main code
expects either itself or the architecture to provide the str* functions.

The added *_generic functions are done in assembler (taken from
disassembling the main-kernel functions for now) to allow us to control
the used registers.
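
As a reading aid, the *_generic assembly routines added below behave
roughly like the following C. The *_c names are made up for this
sketch. Note that the comparison functions return exactly -1, 0 or 1
rather than a byte difference.

```c
#include <stddef.h>

/* Byte-wise compare, bytes are treated as unsigned like lbu does. */
static int strcmp_generic_c(const char *cs, const char *ct)
{
	unsigned char c1, c2;

	do {
		c1 = (unsigned char)*cs++;
		c2 = (unsigned char)*ct++;
		if (c1 != c2)
			return c1 < c2 ? -1 : 1;
	} while (c1);

	return 0;
}

/* Same as above, but stop after at most count bytes. */
static int strncmp_generic_c(const char *cs, const char *ct, size_t count)
{
	size_t i;

	for (i = 0; i < count; i++) {
		unsigned char c1 = (unsigned char)cs[i];
		unsigned char c2 = (unsigned char)ct[i];

		if (c1 != c2)
			return c1 < c2 ? -1 : 1;
		if (!c1)
			break;
	}

	return 0;
}

/* Walk to the terminating NUL and return the distance covered. */
static size_t strlen_generic_c(const char *s)
{
	const char *p = s;

	while (*p)
		p++;

	return (size_t)(p - s);
}
```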

Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
---
 arch/riscv/include/asm/string.h | 66 +++++++++++++++++++++++++++++++++
 arch/riscv/kernel/image-vars.h  |  6 +--
 arch/riscv/lib/Makefile         |  3 ++
 arch/riscv/lib/strcmp.S         | 39 +++++++++++++++++++
 arch/riscv/lib/strlen.S         | 29 +++++++++++++++
 arch/riscv/lib/strncmp.S        | 41 ++++++++++++++++++++
 6 files changed, 181 insertions(+), 3 deletions(-)
 create mode 100644 arch/riscv/lib/strcmp.S
 create mode 100644 arch/riscv/lib/strlen.S
 create mode 100644 arch/riscv/lib/strncmp.S

diff --git a/arch/riscv/include/asm/string.h b/arch/riscv/include/asm/string.h
index 909049366555..b98481d2d154 100644
--- a/arch/riscv/include/asm/string.h
+++ b/arch/riscv/include/asm/string.h
@@ -18,6 +18,72 @@ extern asmlinkage void *__memcpy(void *, const void *, size_t);
 #define __HAVE_ARCH_MEMMOVE
 extern asmlinkage void *memmove(void *, const void *, size_t);
 extern asmlinkage void *__memmove(void *, const void *, size_t);
+
+#define __HAVE_ARCH_STRCMP
+extern asmlinkage int __strcmp_generic(const char *cs, const char *ct);
+
+static inline int strcmp(const char *cs, const char *ct)
+{
+#ifdef RISCV_EFISTUB
+	return __strcmp_generic(cs, ct);
+#else
+	register const char *a0 asm("a0") = cs;
+	register const char *a1 asm("a1") = ct;
+	register int a0_out asm("a0");
+
+	asm volatile("call __strcmp_generic\n\t"
+		: "=r"(a0_out)
+		: "r"(a0), "r"(a1)
+		: "ra", "t0", "t1", "t2");
+
+	return a0_out;
+#endif
+}
+
+#define __HAVE_ARCH_STRNCMP
+extern asmlinkage int __strncmp_generic(const char *cs,
+					const char *ct, size_t count);
+
+static inline int strncmp(const char *cs, const char *ct, size_t count)
+{
+#ifdef RISCV_EFISTUB
+	return __strncmp_generic(cs, ct, count);
+#else
+	register const char *a0 asm("a0") = cs;
+	register const char *a1 asm("a1") = ct;
+	register size_t a2 asm("a2") = count;
+	register int a0_out asm("a0");
+
+	asm volatile("call __strncmp_generic\n\t"
+		: "=r"(a0_out)
+		: "r"(a0), "r"(a1), "r"(a2)
+		: "ra", "t0", "t1", "t2");
+
+	return a0_out;
+#endif
+}
+
+#define __HAVE_ARCH_STRLEN
+extern asmlinkage __kernel_size_t __strlen_generic(const char *);
+
+static inline __kernel_size_t strlen(const char *s)
+{
+#ifdef RISCV_EFISTUB
+	return __strlen_generic(s);
+#else
+	register const char *a0 asm("a0") = s;
+	register int a0_out asm("a0");
+
+	asm volatile(
+		"call __strlen_generic\n\t"
+		: "=r"(a0_out)
+		: "r"(a0)
+		: "ra", "t0", "t1");
+
+	return a0_out;
+#endif
+}
+
 /* For those files which don't want to check by kasan. */
 #if defined(CONFIG_KASAN) && !defined(__SANITIZE_ADDRESS__)
 #define memcpy(dst, src, len) __memcpy(dst, src, len)
diff --git a/arch/riscv/kernel/image-vars.h b/arch/riscv/kernel/image-vars.h
index d6e5f739905e..2f270b9fde63 100644
--- a/arch/riscv/kernel/image-vars.h
+++ b/arch/riscv/kernel/image-vars.h
@@ -25,10 +25,10 @@
  */
 __efistub_memcmp		= memcmp;
 __efistub_memchr		= memchr;
-__efistub_strlen		= strlen;
+__efistub___strlen_generic	= __strlen_generic;
 __efistub_strnlen		= strnlen;
-__efistub_strcmp		= strcmp;
-__efistub_strncmp		= strncmp;
+__efistub___strcmp_generic	= __strcmp_generic;
+__efistub___strncmp_generic	= __strncmp_generic;
 __efistub_strrchr		= strrchr;
 
 __efistub__start		= _start;
diff --git a/arch/riscv/lib/Makefile b/arch/riscv/lib/Makefile
index 25d5c9664e57..6c74b0bedd60 100644
--- a/arch/riscv/lib/Makefile
+++ b/arch/riscv/lib/Makefile
@@ -3,6 +3,9 @@ lib-y			+= delay.o
 lib-y			+= memcpy.o
 lib-y			+= memset.o
 lib-y			+= memmove.o
+lib-y			+= strcmp.o
+lib-y			+= strlen.o
+lib-y			+= strncmp.o
 lib-$(CONFIG_MMU)	+= uaccess.o
 lib-$(CONFIG_64BIT)	+= tishift.o
 
diff --git a/arch/riscv/lib/strcmp.S b/arch/riscv/lib/strcmp.S
new file mode 100644
index 000000000000..f23a5c5e39d8
--- /dev/null
+++ b/arch/riscv/lib/strcmp.S
@@ -0,0 +1,39 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+#include <linux/linkage.h>
+#include <asm/asm.h>
+#include <asm-generic/export.h>
+
+/* int __strcmp_generic(const char *cs, const char *ct) */
+ENTRY(__strcmp_generic)
+	/*
+	 * Returns
+	 *   a0 - comparison result, like strcmp
+	 *
+	 * Parameters
+	 *   a0 - string1
+	 *   a1 - string2
+	 *   a2 - number of characters to compare
+	 *
+	 * Clobbers
+	 *   t0, t1, t2
+	 */
+	mv	t2, a1
+1:
+	lbu	t1, 0(a0)
+	lbu	t0, 0(a1)
+	addi	a0, a0, 1
+	addi	a1, a1, 1
+	beq	t1, t0, 3f
+	li	a0, 1
+	bgeu	t1, t0, 2f
+	li	a0, -1
+2:
+	mv	a1, t2
+	ret
+3:
+	bnez	t1, 1b
+	li	a0, 0
+	j	2b
+END(__strcmp_generic)
+EXPORT_SYMBOL(__strcmp_generic)
diff --git a/arch/riscv/lib/strlen.S b/arch/riscv/lib/strlen.S
new file mode 100644
index 000000000000..e0e7440ac724
--- /dev/null
+++ b/arch/riscv/lib/strlen.S
@@ -0,0 +1,29 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+#include <linux/linkage.h>
+#include <asm/asm.h>
+#include <asm-generic/export.h>
+
+/* int __strlen_generic(const char *s) */
+ENTRY(__strlen_generic)
+	/*
+	 * Returns
+	 *   a0 - string length
+	 *
+	 * Parameters
+	 *   a0 - String to measure
+	 *
+	 * Clobbers:
+	 *   t0, t1
+	 */
+	mv	t1, a0
+1:
+	lbu	t0, 0(t1)
+	bnez	t0, 2f
+	sub	a0, t1, a0
+	ret
+2:
+	addi	t1, t1, 1
+	j	1b
+END(__strlen_generic)
+EXPORT_SYMBOL(__strlen_generic)
diff --git a/arch/riscv/lib/strncmp.S b/arch/riscv/lib/strncmp.S
new file mode 100644
index 000000000000..8d271cd0df72
--- /dev/null
+++ b/arch/riscv/lib/strncmp.S
@@ -0,0 +1,41 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+#include <linux/linkage.h>
+#include <asm/asm.h>
+#include <asm-generic/export.h>
+
+/* int __strncmp_generic(const char *cs, const char *ct, size_t count) */
+ENTRY(__strncmp_generic)
+	/*
+	 * Returns
+	 *   a0 - comparison result, like strncmp
+	 *
+	 * Parameters
+	 *   a0 - string1
+	 *   a1 - string2
+	 *   a2 - number of characters to compare
+	 *
+	 * Clobbers
+	 *   t0, t1, t2
+	 */
+	li	t0, 0
+1:
+	beq	a2, t0, 4f
+	add	t1, a0, t0
+	add	t2, a1, t0
+	lbu	t1, 0(t1)
+	lbu	t2, 0(t2)
+	beq	t1, t2, 3f
+	li	a0, 1
+	bgeu	t1, t2, 2f
+	li	a0, -1
+2:
+	ret
+3:
+	addi	t0, t0, 1
+	bnez	t1, 1b
+4:
+	li	a0, 0
+	j	2b
+END(__strncmp_generic)
+EXPORT_SYMBOL(__strncmp_generic)
-- 
2.35.1



* [PATCH 7/7] RISC-V: add zbb support to string functions
  2022-11-10 16:49 [PATCH 0/7] Zbb string optimizations and call support in alternatives Heiko Stuebner
                   ` (5 preceding siblings ...)
  2022-11-10 16:49 ` [PATCH 6/7] RISC-V: add infrastructure to allow different str* implementations Heiko Stuebner
@ 2022-11-10 16:49 ` Heiko Stuebner
  2022-11-13 23:29   ` Conor Dooley
  6 siblings, 1 reply; 51+ messages in thread
From: Heiko Stuebner @ 2022-11-10 16:49 UTC (permalink / raw)
  To: linux-riscv, palmer
  Cc: christoph.muellner, prabhakar.csengg, conor, philipp.tomsich,
	ajones, heiko, emil.renner.berthing, Heiko Stuebner

From: Heiko Stuebner <heiko.stuebner@vrull.eu>

Add handling for ZBB extension and add support for using it as a
variant for optimized string functions.
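
For readers unfamiliar with Zbb, the sketch below emulates in portable
C what the orc.b instruction computes and how it helps to spot a NUL
byte in a whole word at once. It assumes, based on the data1_orcb
register alias in strcmp_zbb.S, that the assembly variants rely on
orc.b; the function names are invented, a 64-bit little-endian word is
assumed, and __builtin_ctzll() is a GCC/Clang builtin standing in for
the ctz instruction.

```c
#include <stdint.h>

/*
 * Software model of orc.b on a 64-bit register: every byte that has
 * any bit set becomes 0xff, every zero byte stays 0x00.
 */
static uint64_t orc_b(uint64_t x)
{
	uint64_t out = 0;
	int i;

	for (i = 0; i < 8; i++) {
		if ((x >> (8 * i)) & 0xff)
			out |= (uint64_t)0xff << (8 * i);
	}

	return out;
}

/*
 * Index (0..7) of the first zero byte in a little-endian word,
 * or 8 if the word contains no zero byte.
 */
static int first_zero_byte(uint64_t word)
{
	uint64_t zeros = ~orc_b(word);	/* 0xff where a byte was zero */

	if (!zeros)
		return 8;

	return __builtin_ctzll(zeros) / 8;
}
```

A string function can then process one register-sized word per loop
iteration and only look at individual bytes once first_zero_byte()
reports a terminator, instead of loading and testing byte by byte.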

Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
---
 arch/riscv/Kconfig                   |  23 ++++++
 arch/riscv/include/asm/errata_list.h |   3 +-
 arch/riscv/include/asm/hwcap.h       |   1 +
 arch/riscv/include/asm/string.h      |  29 ++++++--
 arch/riscv/kernel/cpu.c              |   1 +
 arch/riscv/kernel/cpufeature.c       |  18 +++++
 arch/riscv/lib/Makefile              |   3 +
 arch/riscv/lib/strcmp_zbb.S          |  91 +++++++++++++++++++++++
 arch/riscv/lib/strlen_zbb.S          |  98 +++++++++++++++++++++++++
 arch/riscv/lib/strncmp_zbb.S         | 106 +++++++++++++++++++++++++++
 10 files changed, 366 insertions(+), 7 deletions(-)
 create mode 100644 arch/riscv/lib/strcmp_zbb.S
 create mode 100644 arch/riscv/lib/strlen_zbb.S
 create mode 100644 arch/riscv/lib/strncmp_zbb.S

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index acfc4d298aab..56633931e808 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -411,6 +411,29 @@ config RISCV_ISA_SVPBMT
 
 	   If you don't know what to do here, say Y.
 
+config TOOLCHAIN_HAS_ZBB
+	bool
+	default y
+	depends on !64BIT || $(cc-option,-mabi=lp64 -march=rv64ima_zbb)
+	depends on !32BIT || $(cc-option,-mabi=ilp32 -march=rv32ima_zbb)
+	depends on LLD_VERSION >= 150000 || LD_VERSION >= 23900
+
+config RISCV_ISA_ZBB
+	bool "Zbb extension support"
+	depends on TOOLCHAIN_HAS_ZBB
+	depends on !XIP_KERNEL && MMU
+	select RISCV_ALTERNATIVE
+	default y
+	help
+	   Adds support to dynamically detect the presence of the ZBB
+	   extension (basic bit manipulation) and enable its usage.
+
+	   The Zbb extension provides instructions to accelerate a number
+	   of bit-specific operations (count bit population, sign extending,
+	   bitrotation, etc).
+
+	   If you don't know what to do here, say Y.
+
 config TOOLCHAIN_HAS_ZICBOM
 	bool
 	default y
diff --git a/arch/riscv/include/asm/errata_list.h b/arch/riscv/include/asm/errata_list.h
index 4180312d2a70..95e626b7281e 100644
--- a/arch/riscv/include/asm/errata_list.h
+++ b/arch/riscv/include/asm/errata_list.h
@@ -24,7 +24,8 @@
 
 #define	CPUFEATURE_SVPBMT 0
 #define	CPUFEATURE_ZICBOM 1
-#define	CPUFEATURE_NUMBER 2
+#define	CPUFEATURE_ZBB 2
+#define	CPUFEATURE_NUMBER 3
 
 #ifdef __ASSEMBLY__
 
diff --git a/arch/riscv/include/asm/hwcap.h b/arch/riscv/include/asm/hwcap.h
index b22525290073..ac5555fd9788 100644
--- a/arch/riscv/include/asm/hwcap.h
+++ b/arch/riscv/include/asm/hwcap.h
@@ -59,6 +59,7 @@ enum riscv_isa_ext_id {
 	RISCV_ISA_EXT_ZIHINTPAUSE,
 	RISCV_ISA_EXT_SSTC,
 	RISCV_ISA_EXT_SVINVAL,
+	RISCV_ISA_EXT_ZBB,
 	RISCV_ISA_EXT_ID_MAX = RISCV_ISA_EXT_MAX,
 };
 
diff --git a/arch/riscv/include/asm/string.h b/arch/riscv/include/asm/string.h
index b98481d2d154..806c402c874e 100644
--- a/arch/riscv/include/asm/string.h
+++ b/arch/riscv/include/asm/string.h
@@ -6,6 +6,8 @@
 #ifndef _ASM_RISCV_STRING_H
 #define _ASM_RISCV_STRING_H
 
+#include <asm/alternative-macros.h>
+#include <asm/errata_list.h>
 #include <linux/types.h>
 #include <linux/linkage.h>
 
@@ -21,6 +23,7 @@ extern asmlinkage void *__memmove(void *, const void *, size_t);
 
 #define __HAVE_ARCH_STRCMP
 extern asmlinkage int __strcmp_generic(const char *cs, const char *ct);
+extern asmlinkage int __strcmp_zbb(const char *cs, const char *ct);
 
 static inline int strcmp(const char *cs, const char *ct)
 {
@@ -31,10 +34,14 @@ static inline int strcmp(const char *cs, const char *ct)
 	register const char *a1 asm("a1") = ct;
 	register int a0_out asm("a0");
 
-	asm volatile("call __strcmp_generic\n\t"
+	asm volatile(
+		ALTERNATIVE(
+			"call __strcmp_generic\n\t",
+			"call __strcmp_zbb\n\t",
+			0, CPUFEATURE_ZBB, CONFIG_RISCV_ISA_ZBB)
 		: "=r"(a0_out)
 		: "r"(a0), "r"(a1)
-		: "ra", "t0", "t1", "t2");
+		: "ra", "t0", "t1", "t2", "t3", "t4", "t5");
 
 	return a0_out;
 #endif
@@ -43,6 +50,8 @@ static inline int strcmp(const char *cs, const char *ct)
 #define __HAVE_ARCH_STRNCMP
 extern asmlinkage int __strncmp_generic(const char *cs,
 					const char *ct, size_t count);
+extern asmlinkage int __strncmp_zbb(const char *cs,
+				    const char *ct, size_t count);
 
 static inline int strncmp(const char *cs, const char *ct, size_t count)
 {
@@ -54,10 +63,14 @@ static inline int strncmp(const char *cs, const char *ct, size_t count)
 	register size_t a2 asm("a2") = count;
 	register int a0_out asm("a0");
 
-	asm volatile("call __strncmp_generic\n\t"
+	asm volatile(
+		ALTERNATIVE(
+			"call __strncmp_generic\n\t",
+			"call __strncmp_zbb\n\t",
+			0, CPUFEATURE_ZBB, CONFIG_RISCV_ISA_ZBB)
 		: "=r"(a0_out)
 		: "r"(a0), "r"(a1), "r"(a2)
-		: "ra", "t0", "t1", "t2");
+		: "ra", "t0", "t1", "t2", "t3", "t4", "t5", "t6");
 
 	return a0_out;
 #endif
@@ -65,6 +78,7 @@ static inline int strncmp(const char *cs, const char *ct, size_t count)
 
 #define __HAVE_ARCH_STRLEN
 extern asmlinkage __kernel_size_t __strlen_generic(const char *);
+extern asmlinkage __kernel_size_t __strlen_zbb(const char *);
 
 static inline __kernel_size_t strlen(const char *s)
 {
@@ -75,10 +89,13 @@ static inline __kernel_size_t strlen(const char *s)
 	register int a0_out asm("a0");
 
 	asm volatile(
-		"call __strlen_generic\n\t"
+		ALTERNATIVE(
+			"call __strlen_generic\n\t",
+			"call __strlen_zbb\n\t",
+			0, CPUFEATURE_ZBB, CONFIG_RISCV_ISA_ZBB)
 		: "=r"(a0_out)
 		: "r"(a0)
-		: "ra", "t0", "t1");
+		: "ra", "t0", "t1", "t2", "t3");
 
 	return a0_out;
 #endif
diff --git a/arch/riscv/kernel/cpu.c b/arch/riscv/kernel/cpu.c
index bf9dd6764bad..66ff36a57e20 100644
--- a/arch/riscv/kernel/cpu.c
+++ b/arch/riscv/kernel/cpu.c
@@ -166,6 +166,7 @@ static struct riscv_isa_ext_data isa_ext_arr[] = {
 	__RISCV_ISA_EXT_DATA(sstc, RISCV_ISA_EXT_SSTC),
 	__RISCV_ISA_EXT_DATA(svinval, RISCV_ISA_EXT_SVINVAL),
 	__RISCV_ISA_EXT_DATA(svpbmt, RISCV_ISA_EXT_SVPBMT),
+	__RISCV_ISA_EXT_DATA(zbb, RISCV_ISA_EXT_ZBB),
 	__RISCV_ISA_EXT_DATA(zicbom, RISCV_ISA_EXT_ZICBOM),
 	__RISCV_ISA_EXT_DATA(zihintpause, RISCV_ISA_EXT_ZIHINTPAUSE),
 	__RISCV_ISA_EXT_DATA("", RISCV_ISA_EXT_MAX),
diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
index 026512ca9c4c..f19b9d4e2dca 100644
--- a/arch/riscv/kernel/cpufeature.c
+++ b/arch/riscv/kernel/cpufeature.c
@@ -201,6 +201,7 @@ void __init riscv_fill_hwcap(void)
 			} else {
 				SET_ISA_EXT_MAP("sscofpmf", RISCV_ISA_EXT_SSCOFPMF);
 				SET_ISA_EXT_MAP("svpbmt", RISCV_ISA_EXT_SVPBMT);
+				SET_ISA_EXT_MAP("zbb", RISCV_ISA_EXT_ZBB);
 				SET_ISA_EXT_MAP("zicbom", RISCV_ISA_EXT_ZICBOM);
 				SET_ISA_EXT_MAP("zihintpause", RISCV_ISA_EXT_ZIHINTPAUSE);
 				SET_ISA_EXT_MAP("sstc", RISCV_ISA_EXT_SSTC);
@@ -278,6 +279,20 @@ static bool __init_or_module cpufeature_probe_zicbom(unsigned int stage)
 	return true;
 }
 
+static bool __init_or_module cpufeature_probe_zbb(unsigned int stage)
+{
+	if (!IS_ENABLED(CONFIG_RISCV_ISA_ZBB))
+		return false;
+
+	if (stage == RISCV_ALTERNATIVES_EARLY_BOOT)
+		return false;
+
+	if (!riscv_isa_extension_available(NULL, ZBB))
+		return false;
+
+	return true;
+}
+
 /*
  * Probe presence of individual extensions.
  *
@@ -295,6 +310,9 @@ static u32 __init_or_module cpufeature_probe(unsigned int stage)
 	if (cpufeature_probe_zicbom(stage))
 		cpu_req_feature |= BIT(CPUFEATURE_ZICBOM);
 
+	if (cpufeature_probe_zbb(stage))
+		cpu_req_feature |= BIT(CPUFEATURE_ZBB);
+
 	return cpu_req_feature;
 }
 
diff --git a/arch/riscv/lib/Makefile b/arch/riscv/lib/Makefile
index 6c74b0bedd60..b632483f851c 100644
--- a/arch/riscv/lib/Makefile
+++ b/arch/riscv/lib/Makefile
@@ -4,8 +4,11 @@ lib-y			+= memcpy.o
 lib-y			+= memset.o
 lib-y			+= memmove.o
 lib-y			+= strcmp.o
+lib-$(CONFIG_RISCV_ISA_ZBB) += strcmp_zbb.o
 lib-y			+= strlen.o
+lib-$(CONFIG_RISCV_ISA_ZBB) += strlen_zbb.o
 lib-y			+= strncmp.o
+lib-$(CONFIG_RISCV_ISA_ZBB) += strncmp_zbb.o
 lib-$(CONFIG_MMU)	+= uaccess.o
 lib-$(CONFIG_64BIT)	+= tishift.o
 
diff --git a/arch/riscv/lib/strcmp_zbb.S b/arch/riscv/lib/strcmp_zbb.S
new file mode 100644
index 000000000000..aff9b941d3ee
--- /dev/null
+++ b/arch/riscv/lib/strcmp_zbb.S
@@ -0,0 +1,91 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (c) 2022 VRULL GmbH
+ * Author: Christoph Muellner <christoph.muellner@vrull.eu>
+ */
+
+#include <linux/linkage.h>
+#include <asm/asm.h>
+#include <asm-generic/export.h>
+
+#define src1		a0
+#define result		a0
+#define src2		t5
+#define data1		t0
+#define data2		t1
+#define align		t2
+#define data1_orcb	t3
+#define m1		t4
+
+.option push
+.option arch,+zbb
+
+/* int __strcmp_zbb(const char *cs, const char *ct) */
+ENTRY(__strcmp_zbb)
+	/*
+	 * Returns
+	 *   a0 - comparison result, like strcmp
+	 *
+	 * Parameters
+	 *   a0 - string1
+	 *   a1 - string2
+	 *   a2 - number of characters to compare
+	 *
+	 * Clobbers
+	 *   t0, t1, t2, t3, t4, t5
+	 */
+	mv	src2, a1
+
+	or	align, src1, src2
+	li	m1, -1
+	and	align, align, SZREG-1
+	bnez	align, 3f
+	/* Main loop for aligned string.  */
+	.p2align 3
+1:
+	REG_L	data1, 0(src1)
+	REG_L	data2, 0(src2)
+	orc.b	data1_orcb, data1
+	bne	data1_orcb, m1, 2f
+	addi	src1, src1, SZREG
+	addi	src2, src2, SZREG
+	beq	data1, data2, 1b
+
+	/* Words don't match, and no null byte in the first
+	 * word. Get bytes in big-endian order and compare.  */
+#ifndef CONFIG_CPU_BIG_ENDIAN
+	rev8	data1, data1
+	rev8	data2, data2
+#endif
+	/* Synthesize (data1 >= data2) ? 1 : -1 in a branchless sequence.  */
+	sltu	result, data1, data2
+	neg	result, result
+	ori	result, result, 1
+	ret
+
+2:
+	/* Found a null byte.
+	 * If words don't match, fall back to simple loop.  */
+	bne	data1, data2, 3f
+
+	/* Otherwise, strings are equal.  */
+	li	result, 0
+	ret
+
+	/* Simple loop for misaligned strings.  */
+	.p2align 3
+3:
+	lbu	data1, 0(src1)
+	lbu	data2, 0(src2)
+	addi	src1, src1, 1
+	addi	src2, src2, 1
+	bne	data1, data2, 4f
+	bnez	data1, 3b
+
+4:
+	sub	result, data1, data2
+	ret
+END(__strcmp_zbb)
+EXPORT_SYMBOL(__strcmp_zbb)
+
+.option pop
diff --git a/arch/riscv/lib/strlen_zbb.S b/arch/riscv/lib/strlen_zbb.S
new file mode 100644
index 000000000000..bc8d3607a32f
--- /dev/null
+++ b/arch/riscv/lib/strlen_zbb.S
@@ -0,0 +1,98 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (c) 2022 VRULL GmbH
+ * Author: Christoph Muellner <christoph.muellner@vrull.eu>
+ */
+
+#include <linux/linkage.h>
+#include <asm/asm.h>
+#include <asm-generic/export.h>
+
+#define src		a0
+#define result		a0
+#define addr		t0
+#define data		t1
+#define offset		t2
+#define offset_bits	t2
+#define valid_bytes	t3
+#define m1		t3
+
+#ifdef CONFIG_CPU_BIG_ENDIAN
+# define CZ	clz
+# define SHIFT	sll
+#else
+# define CZ	ctz
+# define SHIFT	srl
+#endif
+
+.option push
+.option arch,+zbb
+
+/* int __strlen_zbb(const char *s) */
+ENTRY(__strlen_zbb)
+	/*
+	 * Returns
+	 *   a0 - string length
+	 *
+	 * Parameters
+	 *   a0 - String to measure
+	 *
+	 * Clobbers
+	 *   t0, t1, t2, t3
+	 */
+
+	/* Number of irrelevant bytes in the first word.  */
+	andi	offset, src, SZREG-1
+	/* Align pointer.  */
+	andi	addr, src, -SZREG
+
+	li	valid_bytes, SZREG
+	sub	valid_bytes, valid_bytes, offset
+	slli	offset_bits, offset, RISCV_LGPTR
+
+	/* Get the first word.  */
+	REG_L	data, 0(addr)
+	/* Shift away the partial data we loaded to remove the irrelevant bytes
+	 * preceding the string with the effect of adding NUL bytes at the
+	 * end of the string.  */
+	SHIFT	data, data, offset_bits
+	/* Convert non-NUL into 0xff and NUL into 0x00.  */
+	orc.b	data, data
+	/* Convert non-NUL into 0x00 and NUL into 0xff.  */
+	not	data, data
+	/* Search for the first set bit (corresponding to a NUL byte in the
+	 * original chunk).  */
+	CZ	data, data
+	/* The first chunk is special: compare against the number
+	 * of valid bytes in this chunk.  */
+	srli	result, data, 3
+	bgtu	valid_bytes, result, 3f
+
+	/* Prepare for the word comparison loop.  */
+	addi	offset, addr, SZREG
+	li	m1, -1
+
+	/* Our critical loop is 4 instructions and processes data in
+	 * 4 byte or 8 byte chunks.  */
+	.p2align 3
+1:
+	REG_L	data, SZREG(addr)
+	addi	addr, addr, SZREG
+	orc.b	data, data
+	beq	data, m1, 1b
+2:
+	not	data, data
+	CZ	data, data
+	/* Get number of processed words.  */
+	sub	offset, addr, offset
+	/* Add number of characters in the first word.  */
+	add	result, result, offset
+	srli	data, data, 3
+	/* Add number of characters in the last word.  */
+	add	result, result, data
+3:
+	ret
+END(__strlen_zbb)
+EXPORT_SYMBOL(__strlen_zbb)
+
+.option pop
diff --git a/arch/riscv/lib/strncmp_zbb.S b/arch/riscv/lib/strncmp_zbb.S
new file mode 100644
index 000000000000..852c8425d238
--- /dev/null
+++ b/arch/riscv/lib/strncmp_zbb.S
@@ -0,0 +1,106 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (c) 2022 VRULL GmbH
+ * Author: Christoph Muellner <christoph.muellner@vrull.eu>
+ */
+
+#include <linux/linkage.h>
+#include <asm/asm.h>
+#include <asm-generic/export.h>
+
+#define src1		a0
+#define result		a0
+#define src2		t6
+#define len		a2
+#define data1		t0
+#define data2		t1
+#define align		t2
+#define data1_orcb	t3
+#define limit		t4
+#define m1		t5
+
+.option push
+.option arch,+zbb
+
+/* int __strncmp_zbb(const char *cs, const char *ct, size_t count) */
+ENTRY(__strncmp_zbb)
+	/*
+	 * Returns
+	 *   a0 - comparison result, like strncmp
+	 *
+	 * Parameters
+	 *   a0 - string1
+	 *   a1 - string2
+	 *   a2 - number of characters to compare
+	 *
+	 * Clobbers
+	 *   t0, t1, t2, t3, t4, t5, t6
+	 */
+	mv	src2, a1
+
+	or	align, src1, src2
+	li	m1, -1
+	and	align, align, SZREG-1
+	add	limit, src1, len
+	bnez	align, 4f
+
+	/* Adjust limit for fast-path.  */
+	addi	limit, limit, -SZREG
+	/* Main loop for aligned string.  */
+	.p2align 3
+1:
+	bgt	src1, limit, 3f
+	REG_L	data1, 0(src1)
+	REG_L	data2, 0(src2)
+	orc.b	data1_orcb, data1
+	bne	data1_orcb, m1, 2f
+	addi	src1, src1, SZREG
+	addi	src2, src2, SZREG
+	beq	data1, data2, 1b
+
+	/* Words don't match, and no null byte in the first
+	 * word. Get bytes in big-endian order and compare.  */
+#ifndef CONFIG_CPU_BIG_ENDIAN
+	rev8	data1, data1
+	rev8	data2, data2
+#endif
+	/* Synthesize (data1 >= data2) ? 1 : -1 in a branchless sequence.  */
+	sltu	result, data1, data2
+	neg	result, result
+	ori	result, result, 1
+	ret
+
+2:
+	/* Found a null byte.
+	 * If words don't match, fall back to simple loop.  */
+	bne	data1, data2, 3f
+
+	/* Otherwise, strings are equal.  */
+	li	result, 0
+	ret
+
+	/* Simple loop for misaligned strings.  */
+3:
+	/* Restore limit for slow-path.  */
+	addi	limit, limit, SZREG
+	.p2align 3
+4:
+	bge	src1, limit, 6f
+	lbu	data1, 0(src1)
+	lbu	data2, 0(src2)
+	addi	src1, src1, 1
+	addi	src2, src2, 1
+	bne	data1, data2, 5f
+	bnez	data1, 4b
+
+5:
+	sub	result, data1, data2
+	ret
+
+6:
+	li	result, 0
+	ret
+END(__strncmp_zbb)
+EXPORT_SYMBOL(__strncmp_zbb)
+
+.option pop
-- 
2.35.1
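
For readers following the strlen_zbb.S logic above, here is a minimal C sketch of the same idea. It is not the patch's code. It assumes a 64-bit little-endian view of memory and a caller-supplied buffer that is padded to a multiple of 8 bytes, so whole-word reads stay in bounds. orc_b(), ctz64(), load_le64(), first_nul_index() and strlen_orcb() are hypothetical names; the first two are software stand-ins for the Zbb orc.b and ctz instructions.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for Zbb orc.b: non-zero bytes become 0xff, zero bytes stay 0x00. */
static uint64_t orc_b(uint64_t x)
{
	uint64_t out = 0;

	for (int i = 0; i < 8; i++)
		if ((x >> (8 * i)) & 0xff)
			out |= (uint64_t)0xff << (8 * i);
	return out;
}

/* Stand-in for ctz; the caller guarantees x != 0. */
static unsigned int ctz64(uint64_t x)
{
	unsigned int n = 0;

	while (!(x & 1)) {
		x >>= 1;
		n++;
	}
	return n;
}

/* Assemble 8 bytes into a word with byte 0 in the least significant position. */
static uint64_t load_le64(const unsigned char *p)
{
	uint64_t w = 0;

	for (int i = 7; i >= 0; i--)
		w = (w << 8) | p[i];
	return w;
}

/* Index of the first NUL byte in the word, or 8 if there is none. */
static unsigned int first_nul_index(uint64_t word)
{
	uint64_t nul_mask = ~orc_b(word);

	return nul_mask ? ctz64(nul_mask) >> 3 : 8;
}

/*
 * Length of the string starting at buf[start]. Assumption: buf is padded
 * to a multiple of 8 bytes and contains a terminating NUL.
 */
size_t strlen_orcb(const unsigned char *buf, size_t start)
{
	size_t word = start / 8;		/* aligned word holding the start */
	unsigned int skip = start % 8;		/* bytes in front of the string */
	unsigned int valid = 8 - skip;		/* string bytes in the first word */
	/* Shifting drops the leading bytes and pulls in zero bytes at the top. */
	uint64_t data = load_le64(buf + 8 * word) >> (8 * skip);
	unsigned int idx = first_nul_index(data);
	size_t len;

	if (idx < valid)
		return idx;

	len = valid;
	for (;;) {
		word++;
		idx = first_nul_index(load_le64(buf + 8 * word));
		if (idx < 8)
			return len + idx;
		len += 8;
	}
}
```

With a zeroed 32-byte buffer holding "hello, Zbb world" at offset 3, strlen_orcb(buf, 3) handles one partial word and two full words and returns 16. The zero bytes shifted in at the top of the first word look like NULs, which is why the first result has to be checked against the number of valid bytes.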


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv
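
The "rev8 then sltu/neg/ori" sequence used in strcmp_zbb.S and strncmp_zbb.S can be illustrated in portable C as well. This is only a sketch under assumptions: rev8(), load_le64() and cmp_words_le() are hypothetical names, words are 64 bits wide, and both inputs are known to differ and to contain no NUL byte, as on that code path in the patch.

```c
#include <stdint.h>

/* Stand-in for Zbb rev8: reverse the byte order of a 64-bit word. */
static uint64_t rev8(uint64_t x)
{
	uint64_t r = 0;

	for (int i = 0; i < 8; i++) {
		r = (r << 8) | (x & 0xff);
		x >>= 8;
	}
	return r;
}

/* Assemble 8 bytes into a word with byte 0 in the least significant position. */
static uint64_t load_le64(const unsigned char *p)
{
	uint64_t w = 0;

	for (int i = 7; i >= 0; i--)
		w = (w << 8) | p[i];
	return w;
}

/*
 * Compare two differing little-endian words as strings.
 * After rev8 the first character sits in the most significant byte,
 * so an unsigned compare orders the words like a byte-wise compare.
 */
int cmp_words_le(uint64_t data1, uint64_t data2)
{
	uint64_t a = rev8(data1);
	uint64_t b = rev8(data2);
	int64_t result = (a < b);	/* sltu: 1 if a < b, else 0 */

	result = -result;		/* neg:  -1 or 0 */
	result |= 1;			/* ori:  -1 or 1 */
	return (int)result;
}
```

Comparing the words loaded from "bbbbbbba" and "abbbbbbb" gives 1, because the first characters decide the result. Without the byte reversal the last characters would sit in the most significant byte and the order would come out inverted.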


* Re: [PATCH 1/7] efi/riscv: libstub: mark when compiling libstub
  2022-11-10 16:49 ` [PATCH 1/7] efi/riscv: libstub: mark when compiling libstub Heiko Stuebner
@ 2022-11-13 17:16   ` Conor Dooley
  2022-11-13 17:20     ` Heiko Stübner
  0 siblings, 1 reply; 51+ messages in thread
From: Conor Dooley @ 2022-11-13 17:16 UTC (permalink / raw)
  To: Heiko Stuebner
  Cc: linux-riscv, palmer, christoph.muellner, prabhakar.csengg,
	philipp.tomsich, ajones, emil.renner.berthing, Heiko Stuebner

On Thu, Nov 10, 2022 at 05:49:18PM +0100, Heiko Stuebner wrote:
> From: Heiko Stuebner <heiko.stuebner@vrull.eu>
> 
> We want to runtime-optimize some core functions (str*, mem*)
> but not have this leak into libstub. Instead libstub
> for the short while it's running should just use the generic

Totally pedantic reword, mostly b/c I am an eejit and confused myself
the first time reading this:

"Instead, libstub, for the short while it's running, should just use
the generic implementation."

> implementation.
> 
> To be able to determine whether functions are getting compiled
> as part of libstub or not, add a compile-flag we can check
> via #ifdef.

Exempting the stub makes sense to me given when it runs. What's the
actual downside of not exempting it though?

> Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
> ---
>  drivers/firmware/efi/libstub/Makefile | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile
> index b1601aad7e1a..39c8e3da1937 100644
> --- a/drivers/firmware/efi/libstub/Makefile
> +++ b/drivers/firmware/efi/libstub/Makefile
> @@ -25,7 +25,7 @@ cflags-$(CONFIG_ARM)		:= $(subst $(CC_FLAGS_FTRACE),,$(KBUILD_CFLAGS)) \
>  				   -fno-builtin -fpic \
>  				   $(call cc-option,-mno-single-pic-base)
>  cflags-$(CONFIG_RISCV)		:= $(subst $(CC_FLAGS_FTRACE),,$(KBUILD_CFLAGS)) \
> -				   -fpic
> +				   -fpic -DRISCV_EFISTUB
>  cflags-$(CONFIG_LOONGARCH)	:= $(subst $(CC_FLAGS_FTRACE),,$(KBUILD_CFLAGS)) \
>  				   -fpie
>  
> -- 
> 2.35.1
> 
> 

* Re: [PATCH 2/7] RISC-V: add auipc elements to parse_asm header
  2022-11-10 16:49 ` [PATCH 2/7] RISC-V: add auipc elements to parse_asm header Heiko Stuebner
@ 2022-11-13 17:18   ` Conor Dooley
  0 siblings, 0 replies; 51+ messages in thread
From: Conor Dooley @ 2022-11-13 17:18 UTC (permalink / raw)
  To: Heiko Stuebner
  Cc: linux-riscv, palmer, christoph.muellner, prabhakar.csengg,
	philipp.tomsich, ajones, emil.renner.berthing, Heiko Stuebner

Reviewed-by: Conor Dooley <conor.dooley@microchip.com>


* Re: [PATCH 1/7] efi/riscv: libstub: mark when compiling libstub
  2022-11-13 17:16   ` Conor Dooley
@ 2022-11-13 17:20     ` Heiko Stübner
  2022-11-13 18:06       ` Conor Dooley
  0 siblings, 1 reply; 51+ messages in thread
From: Heiko Stübner @ 2022-11-13 17:20 UTC (permalink / raw)
  To: Conor Dooley
  Cc: linux-riscv, palmer, christoph.muellner, prabhakar.csengg,
	philipp.tomsich, ajones, emil.renner.berthing

On Sunday, 13 November 2022, 18:16:10 CET, Conor Dooley wrote:
> On Thu, Nov 10, 2022 at 05:49:18PM +0100, Heiko Stuebner wrote:
> > From: Heiko Stuebner <heiko.stuebner@vrull.eu>
> > 
> > We want to runtime-optimize some core functions (str*, mem*)
> > but not have this leak into libstub. Instead libstub
> > for the short while it's running should just use the generic
> 
> Totally pedantic reword, mostly b/c I am an eejit and confused myself
> the first time reading this:
> 
> "Instead, libstub, for the short while it's running, should just use
> the generic implementation."
> 
> > implementation.
> > 
> > To be able to determine whether functions are getting compiled
> > as part of libstub or not, add a compile-flag we can check
> > via #ifdef.
> 
> Exempting the stub makes sense to me given when it runs. What's the
> actual downside of not exempting it though?

You run into build-errors.

I.e. the alternatives (the Zbb variants in this case) get compiled in,
so the efi-stub ends up with a reference to the __strlen_zbb function,
which of course cannot be fulfilled; hence the "magic" in patch 6 :-).


Heiko
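
To make the build-error point concrete, here is a small sketch of the compile-time switch. It is an illustration, not the code from patch 6: SKETCH_EFISTUB stands in for RISCV_EFISTUB, and strlen_generic_sketch(), strlen_fast_sketch() and strlen_sketch() are made-up names.

```c
#include <stddef.h>

/* Byte-wise fallback; stands in for __strlen_generic. */
size_t strlen_generic_sketch(const char *s)
{
	size_t n = 0;

	while (s[n])
		n++;
	return n;
}

/*
 * Stands in for an optimized variant that only exists in the main image.
 * When SKETCH_EFISTUB is defined, the symbol is not built at all.
 */
#ifndef SKETCH_EFISTUB
size_t strlen_fast_sketch(const char *s)
{
	const char *p = s;

	while (*p)
		p++;
	return (size_t)(p - s);
}
#endif

static inline size_t strlen_sketch(const char *s)
{
#ifdef SKETCH_EFISTUB
	/* Stub build: no reference to the optimized symbol is emitted. */
	return strlen_generic_sketch(s);
#else
	return strlen_fast_sketch(s);
#endif
}
```

Building this with -DSKETCH_EFISTUB leaves the wrapper without any reference to strlen_fast_sketch(), so a link unit that does not carry the optimized variant still links.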


> > Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
> > ---
> >  drivers/firmware/efi/libstub/Makefile | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile
> > index b1601aad7e1a..39c8e3da1937 100644
> > --- a/drivers/firmware/efi/libstub/Makefile
> > +++ b/drivers/firmware/efi/libstub/Makefile
> > @@ -25,7 +25,7 @@ cflags-$(CONFIG_ARM)		:= $(subst $(CC_FLAGS_FTRACE),,$(KBUILD_CFLAGS)) \
> >  				   -fno-builtin -fpic \
> >  				   $(call cc-option,-mno-single-pic-base)
> >  cflags-$(CONFIG_RISCV)		:= $(subst $(CC_FLAGS_FTRACE),,$(KBUILD_CFLAGS)) \
> > -				   -fpic
> > +				   -fpic -DRISCV_EFISTUB
> >  cflags-$(CONFIG_LOONGARCH)	:= $(subst $(CC_FLAGS_FTRACE),,$(KBUILD_CFLAGS)) \
> >  				   -fpie
> >  
> 






* Re: [PATCH 1/7] efi/riscv: libstub: mark when compiling libstub
  2022-11-13 17:20     ` Heiko Stübner
@ 2022-11-13 18:06       ` Conor Dooley
  0 siblings, 0 replies; 51+ messages in thread
From: Conor Dooley @ 2022-11-13 18:06 UTC (permalink / raw)
  To: Heiko Stübner
  Cc: linux-riscv, palmer, christoph.muellner, prabhakar.csengg,
	philipp.tomsich, ajones, emil.renner.berthing

On Sun, Nov 13, 2022 at 06:20:29PM +0100, Heiko Stübner wrote:
> On Sunday, 13 November 2022, 18:16:10 CET, Conor Dooley wrote:
> > On Thu, Nov 10, 2022 at 05:49:18PM +0100, Heiko Stuebner wrote:
> > > From: Heiko Stuebner <heiko.stuebner@vrull.eu>
> > > 
> > > We want to runtime-optimize some core functions (str*, mem*)
> > > but not have this leak into libstub. Instead libstub
> > > for the short while it's running should just use the generic
> > 
> > Totally pedantic reword, mostly b/c I am an eejit and confused myself
> > the first time reading this:
> > 
> > "Instead, libstub, for the short while it's running, should just use
> > the generic implementation."
> > 
> > > implementation.
> > > 
> > > To be able to determine whether functions are getting compiled
> > > as part of libstub or not, add a compile-flag we can check
> > > via #ifdef.
> > 
> > Exempting the stub makes sense to me given when it runs. What's the
> > actual downside of not exempting it though?
> 
> You run into build-errors.

Ah right, I figured it had to be something more than avoiding some
overhead. Might be worth noting that the purpose is to avoid build
errors.
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>

> 
> I.e. the alternatives (the Zbb variants in this case) get compiled in,
> so the efi-stub ends up with a reference to the __strlen_zbb function,
> which of course cannot be fulfilled; hence the "magic" in patch 6 :-).
> 
> 
> Heiko
> 
> 
> > > Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
> > > ---
> > >  drivers/firmware/efi/libstub/Makefile | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile
> > > index b1601aad7e1a..39c8e3da1937 100644
> > > --- a/drivers/firmware/efi/libstub/Makefile
> > > +++ b/drivers/firmware/efi/libstub/Makefile
> > > @@ -25,7 +25,7 @@ cflags-$(CONFIG_ARM)		:= $(subst $(CC_FLAGS_FTRACE),,$(KBUILD_CFLAGS)) \
> > >  				   -fno-builtin -fpic \
> > >  				   $(call cc-option,-mno-single-pic-base)
> > >  cflags-$(CONFIG_RISCV)		:= $(subst $(CC_FLAGS_FTRACE),,$(KBUILD_CFLAGS)) \
> > > -				   -fpic
> > > +				   -fpic -DRISCV_EFISTUB
> > >  cflags-$(CONFIG_LOONGARCH)	:= $(subst $(CC_FLAGS_FTRACE),,$(KBUILD_CFLAGS)) \
> > >  				   -fpie
> > >  
> > 
> 
> 
> 
> 
> 

* Re: [PATCH 3/7] RISC-V: add U-type imm parsing to parse_asm header
  2022-11-10 16:49 ` [PATCH 3/7] RISC-V: add U-type imm parsing " Heiko Stuebner
@ 2022-11-13 19:06   ` Conor Dooley
  0 siblings, 0 replies; 51+ messages in thread
From: Conor Dooley @ 2022-11-13 19:06 UTC (permalink / raw)
  To: Heiko Stuebner
  Cc: linux-riscv, palmer, christoph.muellner, prabhakar.csengg,
	philipp.tomsich, ajones, emil.renner.berthing, Heiko Stuebner

Reviewed-by: Conor Dooley <conor.dooley@microchip.com>


* Re: [PATCH 4/7] RISC-V: add rd reg parsing to parse_asm header
  2022-11-10 16:49 ` [PATCH 4/7] RISC-V: add rd reg " Heiko Stuebner
@ 2022-11-13 19:08   ` Conor Dooley
  0 siblings, 0 replies; 51+ messages in thread
From: Conor Dooley @ 2022-11-13 19:08 UTC (permalink / raw)
  To: Heiko Stuebner
  Cc: linux-riscv, palmer, christoph.muellner, prabhakar.csengg,
	philipp.tomsich, ajones, emil.renner.berthing, Heiko Stuebner

Reviewed-by: Conor Dooley <conor.dooley@microchip.com>


* Re: [PATCH 5/7] RISC-V: fix auipc-jalr addresses in patched alternatives
  2022-11-10 16:49 ` [PATCH 5/7] RISC-V: fix auipc-jalr addresses in patched alternatives Heiko Stuebner
@ 2022-11-13 20:31   ` Conor Dooley
  2022-11-14 10:57   ` Emil Renner Berthing
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 51+ messages in thread
From: Conor Dooley @ 2022-11-13 20:31 UTC (permalink / raw)
  To: Heiko Stuebner
  Cc: linux-riscv, palmer, christoph.muellner, prabhakar.csengg,
	philipp.tomsich, ajones, emil.renner.berthing, Heiko Stuebner

On Thu, Nov 10, 2022 at 05:49:22PM +0100, Heiko Stuebner wrote:
> From: Heiko Stuebner <heiko.stuebner@vrull.eu>
> 
> Alternatives live in a different section, so addresses used by call
> functions will point to wrong locations after the patch got applied.
> 
> Similar to arm64, adjust the location to consider that offset.
> 
> Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>

What a lovely function you've got here, seems to make sense though..
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>

> ---
>  arch/riscv/kernel/cpufeature.c | 79 +++++++++++++++++++++++++++++++++-
>  1 file changed, 77 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
> index 694267d1fe81..026512ca9c4c 100644
> --- a/arch/riscv/kernel/cpufeature.c
> +++ b/arch/riscv/kernel/cpufeature.c
> @@ -298,6 +298,74 @@ static u32 __init_or_module cpufeature_probe(unsigned int stage)
>  	return cpu_req_feature;
>  }
>  
> +#include <asm/parse_asm.h>
> +
> +DECLARE_INSN(jalr, MATCH_JALR, MASK_JALR)
> +DECLARE_INSN(auipc, MATCH_AUIPC, MASK_AUIPC)
> +
> +static inline bool is_auipc_jalr_pair(long insn1, long insn2)
> +{
> +	return is_auipc_insn(insn1) && is_jalr_insn(insn2);
> +}
> +
> +#define JALR_SIGN_MASK		BIT(I_IMM_SIGN_OPOFF - I_IMM_11_0_OPOFF)
> +#define JALR_OFFSET_MASK	I_IMM_11_0_MASK
> +#define AUIPC_OFFSET_MASK	U_IMM_31_12_MASK
> +#define AUIPC_PAD		(0x00001000)
> +#define JALR_SHIFT		I_IMM_11_0_OPOFF
> +
> +#define to_jalr_imm(offset)						\
> +	((offset & I_IMM_11_0_MASK) << I_IMM_11_0_OPOFF)
> +
> +#define to_auipc_imm(offset)						\
> +	((offset & JALR_SIGN_MASK) ?					\
> +	((offset & AUIPC_OFFSET_MASK) + AUIPC_PAD) :	\
> +	(offset & AUIPC_OFFSET_MASK))
> +
> +static void riscv_alternative_fix_auipc_jalr(unsigned int *alt_ptr,
> +					     unsigned int len, int patch_offset)
> +{
> +	int num_instr = len / sizeof(u32);
> +	unsigned int call[2];
> +	int i;
> +	int imm1;
> +	u32 rd1;
> +
> +	for (i = 0; i < num_instr; i++) {
> +		/* is there a further instruction? */
> +		if (i + 1 >= num_instr)
> +			continue;
> +
> +		if (!is_auipc_jalr_pair(*(alt_ptr + i), *(alt_ptr + i + 1)))
> +			continue;
> +
> +		/* call will use ra register */
> +		rd1 = EXTRACT_RD_REG(*(alt_ptr + i));
> +		if (rd1 != 1)
> +			continue;
> +
> +		/* get and adjust new target address */
> +		imm1 = EXTRACT_UTYPE_IMM(*(alt_ptr + i));
> +		imm1 += EXTRACT_ITYPE_IMM(*(alt_ptr + i + 1));
> +		imm1 -= patch_offset;
> +
> +		/* pick the original auipc + jalr */
> +		call[0] = *(alt_ptr + i);
> +		call[1] = *(alt_ptr + i + 1);
> +
> +		/* drop the old IMMs */
> +		call[0] &= ~(U_IMM_31_12_MASK);
> +		call[1] &= ~(I_IMM_11_0_MASK << I_IMM_11_0_OPOFF);
> +
> +		/* add the adapted IMMs */
> +		call[0] |= to_auipc_imm(imm1);
> +		call[1] |= to_jalr_imm(imm1);
> +
> +		/* patch the call place again */
> +		patch_text_nosync(alt_ptr + i * sizeof(u32), call, 8);
> +	}
> +}
> +
>  void __init_or_module riscv_cpufeature_patch_func(struct alt_entry *begin,
>  						  struct alt_entry *end,
>  						  unsigned int stage)
> @@ -316,8 +384,15 @@ void __init_or_module riscv_cpufeature_patch_func(struct alt_entry *begin,
>  		}
>  
>  		tmp = (1U << alt->errata_id);
> -		if (cpu_req_feature & tmp)
> -			patch_text_nosync(alt->old_ptr, alt->alt_ptr, alt->alt_len);
> +		if (cpu_req_feature & tmp) {
> +			/* do the basic patching */
> +			patch_text_nosync(alt->old_ptr, alt->alt_ptr,
> +					  alt->alt_len);
> +
> +			riscv_alternative_fix_auipc_jalr(alt->old_ptr,
> +							 alt->alt_len,
> +							 alt->old_ptr - alt->alt_ptr);
> +		}
>  	}
>  }
>  #endif
> -- 
> 2.35.1
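
The arithmetic behind the fixup in the quoted patch can be sketched in plain C. This is an illustration under assumptions, not the kernel code: struct call_pair, hi20(), sext12(), make_call(), call_offset() and retarget() are made-up names, the encodings follow the base ISA (auipc opcode 0x17, jalr opcode 0x67, rd = rs1 = ra = x1), and all offsets are assumed to fit well within 32 bits.

```c
#include <stdint.h>

struct call_pair {
	uint32_t auipc;
	uint32_t jalr;
};

/*
 * Upper 20 bits for auipc. jalr sign-extends its 12-bit immediate, so
 * the upper part is rounded up by 0x1000 whenever bit 11 is set.
 */
static uint32_t hi20(int32_t offset)
{
	return ((uint32_t)offset + 0x800u) & 0xfffff000u;
}

/* Sign-extend a 12-bit immediate. */
static int32_t sext12(uint32_t v)
{
	v &= 0xfffu;
	return (v & 0x800u) ? (int32_t)v - 0x1000 : (int32_t)v;
}

/* Encode "auipc ra, hi; jalr ra, lo(ra)" for a pc-relative offset. */
static struct call_pair make_call(int32_t offset)
{
	struct call_pair c;

	c.auipc = hi20(offset) | (1u << 7) | 0x17u;
	c.jalr = (((uint32_t)offset & 0xfffu) << 20) | (1u << 15) |
		 (1u << 7) | 0x67u;
	return c;
}

/* Recover the pc-relative offset encoded in the pair. */
static int32_t call_offset(struct call_pair c)
{
	return (int32_t)(c.auipc & 0xfffff000u) + sext12(c.jalr >> 20);
}

/*
 * Re-encode the pair after the code moved by patch_offset bytes
 * (new location minus the location it was assembled for).
 */
static struct call_pair retarget(struct call_pair c, int32_t patch_offset)
{
	int32_t offset = call_offset(c) - patch_offset;

	c.auipc = (c.auipc & 0x00000fffu) | hi20(offset);
	c.jalr = (c.jalr & 0x000fffffu) | (((uint32_t)offset & 0xfffu) << 20);
	return c;
}
```

As an example, a call assembled at 0x2000 that targets 0x1800 encodes an offset of -0x800. Once it is copied to 0x5000, patch_offset is 0x3000 and retarget() yields an offset of -0x3800, which again reaches 0x1800. The rounding in hi20() appears to correspond to the AUIPC_PAD handling in to_auipc_imm().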

* Re: [PATCH 6/7] RISC-V: add infrastructure to allow different str* implementations
  2022-11-10 16:49 ` [PATCH 6/7] RISC-V: add infrastructure to allow different str* implementations Heiko Stuebner
@ 2022-11-13 22:07   ` Conor Dooley
  0 siblings, 0 replies; 51+ messages in thread
From: Conor Dooley @ 2022-11-13 22:07 UTC (permalink / raw)
  To: Heiko Stuebner
  Cc: linux-riscv, palmer, christoph.muellner, prabhakar.csengg,
	philipp.tomsich, ajones, emil.renner.berthing, Heiko Stuebner

On Thu, Nov 10, 2022 at 05:49:23PM +0100, Heiko Stuebner wrote:
> From: Heiko Stuebner <heiko.stuebner@vrull.eu>
> 
> Depending on supported extensions on specific RISC-V cores,
> optimized str* functions might make sense.
> 
> This adds basic infrastructure to allow patching the function calls
> via alternatives later on.
> 
> The main idea is to have the core str* functions be inline functions
> which then call the most optimized variant and this call then be
> replaced via alternatives.
> 
> The big advantage is that we don't need additional calls.
> Though we need to duplicate the generic functions as the main code
> expects either itself or the architecture to provide the str* functions.
> 
> The added *_generic functions are done in assembler (taken from
> disassembling the main-kernel functions for now) to allow us to control
> the used registers.
> 
> Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
> ---
>  arch/riscv/include/asm/string.h | 66 +++++++++++++++++++++++++++++++++
>  arch/riscv/kernel/image-vars.h  |  6 +--
>  arch/riscv/lib/Makefile         |  3 ++
>  arch/riscv/lib/strcmp.S         | 39 +++++++++++++++++++
>  arch/riscv/lib/strlen.S         | 29 +++++++++++++++
>  arch/riscv/lib/strncmp.S        | 41 ++++++++++++++++++++
>  6 files changed, 181 insertions(+), 3 deletions(-)
>  create mode 100644 arch/riscv/lib/strcmp.S
>  create mode 100644 arch/riscv/lib/strlen.S
>  create mode 100644 arch/riscv/lib/strncmp.S
> 
> diff --git a/arch/riscv/include/asm/string.h b/arch/riscv/include/asm/string.h
> index 909049366555..b98481d2d154 100644
> --- a/arch/riscv/include/asm/string.h
> +++ b/arch/riscv/include/asm/string.h
> @@ -18,6 +18,72 @@ extern asmlinkage void *__memcpy(void *, const void *, size_t);
>  #define __HAVE_ARCH_MEMMOVE
>  extern asmlinkage void *memmove(void *, const void *, size_t);
>  extern asmlinkage void *__memmove(void *, const void *, size_t);
> +
> +#define __HAVE_ARCH_STRCMP
> +extern asmlinkage int __strcmp_generic(const char *cs, const char *ct);
> +
> +static inline int strcmp(const char *cs, const char *ct)
> +{
> +#ifdef RISCV_EFISTUB
> +	return __strcmp_generic(cs, ct);
> +#else
> +	register const char *a0 asm("a0") = cs;
> +	register const char *a1 asm("a1") = ct;
> +	register int a0_out asm("a0");
> +
> +	asm volatile("call __strcmp_generic\n\t"
> +		: "=r"(a0_out)
> +		: "r"(a0), "r"(a1)
> +		: "ra", "t0", "t1", "t2");
> +
> +	return a0_out;
> +#endif
> +}
> +
> +#define __HAVE_ARCH_STRNCMP
> +extern asmlinkage int __strncmp_generic(const char *cs,
> +					const char *ct, size_t count);
> +
> +static inline int strncmp(const char *cs, const char *ct, size_t count)
> +{
> +#ifdef RISCV_EFISTUB
> +	return __strncmp_generic(cs, ct, count);
> +#else
> +	register const char *a0 asm("a0") = cs;
> +	register const char *a1 asm("a1") = ct;
> +	register size_t a2 asm("a2") = count;
> +	register int a0_out asm("a0");
> +
> +	asm volatile("call __strncmp_generic\n\t"
> +		: "=r"(a0_out)
> +		: "r"(a0), "r"(a1), "r"(a2)
> +		: "ra", "t0", "t1", "t2");
> +
> +	return a0_out;
> +#endif
> +}
> +
> +#define __HAVE_ARCH_STRLEN
> +extern asmlinkage __kernel_size_t __strlen_generic(const char *);
> +
> +static inline __kernel_size_t strlen(const char *s)
> +{
> +#ifdef RISCV_EFISTUB
> +	return __strlen_generic(s);
> +#else
> +	register const char *a0 asm("a0") = s;
> +	register int a0_out asm("a0");
> +
> +	asm volatile(
> +		"call __strlen_generic\n\t"
> +		: "=r"(a0_out)
> +		: "r"(a0)
> +		: "ra", "t0", "t1");
> +
> +	return a0_out;
> +#endif
> +}
> +
>  /* For those files which don't want to check by kasan. */
>  #if defined(CONFIG_KASAN) && !defined(__SANITIZE_ADDRESS__)
>  #define memcpy(dst, src, len) __memcpy(dst, src, len)
> diff --git a/arch/riscv/kernel/image-vars.h b/arch/riscv/kernel/image-vars.h
> index d6e5f739905e..2f270b9fde63 100644
> --- a/arch/riscv/kernel/image-vars.h
> +++ b/arch/riscv/kernel/image-vars.h
> @@ -25,10 +25,10 @@
>   */
>  __efistub_memcmp		= memcmp;
>  __efistub_memchr		= memchr;
> -__efistub_strlen		= strlen;
> +__efistub___strlen_generic	= __strlen_generic;
>  __efistub_strnlen		= strnlen;
> -__efistub_strcmp		= strcmp;
> -__efistub_strncmp		= strncmp;
> +__efistub___strcmp_generic	= __strcmp_generic;
> +__efistub___strncmp_generic	= __strncmp_generic;
>  __efistub_strrchr		= strrchr;
>  
>  __efistub__start		= _start;
> diff --git a/arch/riscv/lib/Makefile b/arch/riscv/lib/Makefile
> index 25d5c9664e57..6c74b0bedd60 100644
> --- a/arch/riscv/lib/Makefile
> +++ b/arch/riscv/lib/Makefile
> @@ -3,6 +3,9 @@ lib-y			+= delay.o
>  lib-y			+= memcpy.o
>  lib-y			+= memset.o
>  lib-y			+= memmove.o
> +lib-y			+= strcmp.o
> +lib-y			+= strlen.o
> +lib-y			+= strncmp.o
>  lib-$(CONFIG_MMU)	+= uaccess.o
>  lib-$(CONFIG_64BIT)	+= tishift.o
>  
> diff --git a/arch/riscv/lib/strcmp.S b/arch/riscv/lib/strcmp.S
> new file mode 100644
> index 000000000000..f23a5c5e39d8
> --- /dev/null
> +++ b/arch/riscv/lib/strcmp.S
> @@ -0,0 +1,39 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +
> +#include <linux/linkage.h>
> +#include <asm/asm.h>
> +#include <asm-generic/export.h>
> +
> +/* int __strcmp_generic(const char *cs, const char *ct) */
> +ENTRY(__strcmp_generic)
> +	/*
> +	 * Returns
> +	 *   a0 - comparison result, like strncmp

The strncmp_generic one below says "like strncmp" too. Given the below
copy paste stuff, is this one copy paste or intentional?

> +	 *
> +	 * Parameters
> +	 *   a0 - string1
> +	 *   a1 - string2
> +	 *   a2 - number of characters to compare

Above line is a copy paste error?

> +	 *
> +	 * Clobbers
> +	 *   t0, t1, t2, t3, t4, t5

As is this one?

Other than those, seems to make sense?

Reviewed-by: Conor Dooley <conor.dooley@microchip.com>

> +	 */
> +	mv	t2, a1
> +1:
> +	lbu	t1, 0(a0)
> +	lbu	t0, 0(a1)
> +	addi	a0, a0, 1
> +	addi	a1, a1, 1
> +	beq	t1, t0, 3f
> +	li	a0, 1
> +	bgeu	t1, t0, 2f
> +	li	a0, -1
> +2:
> +	mv	a1, t2
> +	ret
> +3:
> +	bnez	t1, 1b
> +	li	a0, 0
> +	j	2b
> +END(__strcmp_generic)
> +EXPORT_SYMBOL(__strcmp_generic)
> diff --git a/arch/riscv/lib/strlen.S b/arch/riscv/lib/strlen.S
> new file mode 100644
> index 000000000000..e0e7440ac724
> --- /dev/null
> +++ b/arch/riscv/lib/strlen.S
> @@ -0,0 +1,29 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +
> +#include <linux/linkage.h>
> +#include <asm/asm.h>
> +#include <asm-generic/export.h>
> +
> +/* int __strlen_generic(const char *s) */
> +ENTRY(__strlen_generic)
> +	/*
> +	 * Returns
> +	 *   a0 - string length
> +	 *
> +	 * Parameters
> +	 *   a0 - String to measure
> +	 *
> +	 * Clobbers:
> +	 *   t0, t1
> +	 */
> +	mv	t1, a0
> +1:
> +	lbu	t0, 0(t1)
> +	bnez	t0, 2f
> +	sub	a0, t1, a0
> +	ret
> +2:
> +	addi	t1, t1, 1
> +	j	1b
> +END(__strlen_generic)
> +EXPORT_SYMBOL(__strlen_generic)
> diff --git a/arch/riscv/lib/strncmp.S b/arch/riscv/lib/strncmp.S
> new file mode 100644
> index 000000000000..8d271cd0df72
> --- /dev/null
> +++ b/arch/riscv/lib/strncmp.S
> @@ -0,0 +1,41 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +
> +#include <linux/linkage.h>
> +#include <asm/asm.h>
> +#include <asm-generic/export.h>
> +
> +/* int __strncmp_generic(const char *cs, const char *ct, size_t count) */
> +ENTRY(__strncmp_generic)
> +	/*
> +	 * Returns
> +	 *   a0 - comparison result, like strncmp
> +	 *
> +	 * Parameters
> +	 *   a0 - string1
> +	 *   a1 - string2
> +	 *   a2 - number of characters to compare
> +	 *
> +	 * Clobbers
> +	 *   t0, t1, t2
> +	 */
> +	li	t0, 0
> +1:
> +	beq	a2, t0, 4f
> +	add	t1, a0, t0
> +	add	t2, a1, t0
> +	lbu	t1, 0(t1)
> +	lbu	t2, 0(t2)
> +	beq	t1, t2, 3f
> +	li	a0, 1
> +	bgeu	t1, t2, 2f
> +	li	a0, -1
> +2:
> +	ret
> +3:
> +	addi	t0, t0, 1
> +	bnez	t1, 1b
> +4:
> +	li	a0, 0
> +	j	2b
> +END(__strncmp_generic)
> +EXPORT_SYMBOL(__strncmp_generic)
> -- 
> 2.35.1
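
The selection step (a feature bit decides whether a call site gets the alternative) can be modelled without any runtime code patching. The sketch below swaps a function pointer instead, which is explicitly not what the series does, since the series patches the call instruction itself to avoid an indirect call. FEATURE_ZBB_SKETCH, struct alt_entry_sketch, apply_alternatives_sketch() and both strlen variants are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

enum { FEATURE_ZBB_SKETCH = 1 };	/* hypothetical feature id */

typedef size_t (*strlen_fn)(const char *);

/* Default variant, counting with an index. */
static size_t strlen_bytewise(const char *s)
{
	size_t n = 0;

	while (s[n])
		n++;
	return n;
}

/* Placeholder for an optimized variant, walking a pointer. */
static size_t strlen_ptrdiff(const char *s)
{
	const char *p = s;

	while (*p)
		p++;
	return (size_t)(p - s);
}

/* One entry per call site that has an alternative. */
struct alt_entry_sketch {
	unsigned int feature_id;	/* bit that enables the alternative */
	strlen_fn *site;		/* the "call site" to update */
	strlen_fn replacement;		/* variant to switch to */
};

/* Switch every site whose feature bit is set in cpu_features. */
static void apply_alternatives_sketch(struct alt_entry_sketch *begin,
				      struct alt_entry_sketch *end,
				      uint32_t cpu_features)
{
	for (struct alt_entry_sketch *alt = begin; alt < end; alt++)
		if (cpu_features & (1u << alt->feature_id))
			*alt->site = alt->replacement;
}
```

A site that starts out as strlen_bytewise stays untouched when apply_alternatives_sketch() is called with cpu_features == 0 and switches to strlen_ptrdiff once bit FEATURE_ZBB_SKETCH is set.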

* Re: [PATCH 7/7] RISC-V: add zbb support to string functions
  2022-11-10 16:49 ` [PATCH 7/7] RISC-V: add zbb support to string functions Heiko Stuebner
@ 2022-11-13 23:29   ` Conor Dooley
  2022-11-13 23:47     ` Heiko Stübner
  2022-11-24 22:23     ` Heiko Stübner
  0 siblings, 2 replies; 51+ messages in thread
From: Conor Dooley @ 2022-11-13 23:29 UTC (permalink / raw)
  To: Heiko Stuebner
  Cc: linux-riscv, palmer, christoph.muellner, prabhakar.csengg,
	philipp.tomsich, ajones, emil.renner.berthing, Heiko Stuebner

Hey Heiko,
I always seem to forget to open my mails.. I swear I'm not being rude!

In the back of my head while looking at this series, I've been wondering
about Palmer's hwcap stuff. Wonder what the craic is there - I assume
he's just been too busy with the profile stuff. Ditto Ztso, but iirc
that's related. Anyway, that's just an aside as I was reading through
the patch & typed it somewhere lest I forget to bring it up elsewhere.

On Thu, Nov 10, 2022 at 05:49:24PM +0100, Heiko Stuebner wrote:
> From: Heiko Stuebner <heiko.stuebner@vrull.eu>
> 
> Add handling for ZBB extension and add support for using it as a
> variant for optimized string functions.
> 
> Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
> ---
>  arch/riscv/Kconfig                   |  23 ++++++
>  arch/riscv/include/asm/errata_list.h |   3 +-
>  arch/riscv/include/asm/hwcap.h       |   1 +
>  arch/riscv/include/asm/string.h      |  29 ++++++--
>  arch/riscv/kernel/cpu.c              |   1 +
>  arch/riscv/kernel/cpufeature.c       |  18 +++++
>  arch/riscv/lib/Makefile              |   3 +
>  arch/riscv/lib/strcmp_zbb.S          |  91 +++++++++++++++++++++++
>  arch/riscv/lib/strlen_zbb.S          |  98 +++++++++++++++++++++++++
>  arch/riscv/lib/strncmp_zbb.S         | 106 +++++++++++++++++++++++++++
>  10 files changed, 366 insertions(+), 7 deletions(-)
>  create mode 100644 arch/riscv/lib/strcmp_zbb.S
>  create mode 100644 arch/riscv/lib/strlen_zbb.S
>  create mode 100644 arch/riscv/lib/strncmp_zbb.S
> 
> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> index acfc4d298aab..56633931e808 100644
> --- a/arch/riscv/Kconfig
> +++ b/arch/riscv/Kconfig
> @@ -411,6 +411,29 @@ config RISCV_ISA_SVPBMT
>  
>  	   If you don't know what to do here, say Y.
>  
> +config TOOLCHAIN_HAS_ZBB
> +	bool
> +	default y
> +	depends on !64BIT || $(cc-option,-mabi=lp64 -march=rv64ima_zbb)
> +	depends on !32BIT || $(cc-option,-mabi=ilp32 -march=rv32ima_zbb)
> +	depends on LLD_VERSION >= 150000 || LD_VERSION >= 23900

Drew wants to switch away from this method and use insn defs instead -
and I must admit it does look cleaner! Check out his Zicboz series and
see what you think of it.

> +config RISCV_ISA_ZBB
> +	bool "Zbb extension support for "

Missing some words in this line?

> +	depends on TOOLCHAIN_HAS_ZBB
> +	depends on !XIP_KERNEL && MMU
> +	select RISCV_ALTERNATIVE
> +	default y
> +	help
> +	   Adds support to dynamically detect the presence of the ZBB
> +	   extension (basic bit manipulation) and enable its usage.
> +
> +	   The Zbb extension provides instructions to accelerate a number
> +	   of bit-specific operations (count bit population, sign extending,
> +	   bitrotation, etc).
> +
> +	   If you don't know what to do here, say Y.
> +
>  config TOOLCHAIN_HAS_ZICBOM
>  	bool
>  	default y
> diff --git a/arch/riscv/include/asm/errata_list.h b/arch/riscv/include/asm/errata_list.h
> index 4180312d2a70..95e626b7281e 100644
> --- a/arch/riscv/include/asm/errata_list.h
> +++ b/arch/riscv/include/asm/errata_list.h
> @@ -24,7 +24,8 @@
>  
>  #define	CPUFEATURE_SVPBMT 0
>  #define	CPUFEATURE_ZICBOM 1
> -#define	CPUFEATURE_NUMBER 2
> +#define	CPUFEATURE_ZBB 2
> +#define	CPUFEATURE_NUMBER 3
>  
>  #ifdef __ASSEMBLY__
>  
> diff --git a/arch/riscv/include/asm/hwcap.h b/arch/riscv/include/asm/hwcap.h
> index b22525290073..ac5555fd9788 100644
> --- a/arch/riscv/include/asm/hwcap.h
> +++ b/arch/riscv/include/asm/hwcap.h
> @@ -59,6 +59,7 @@ enum riscv_isa_ext_id {
>  	RISCV_ISA_EXT_ZIHINTPAUSE,
>  	RISCV_ISA_EXT_SSTC,
>  	RISCV_ISA_EXT_SVINVAL,
> +	RISCV_ISA_EXT_ZBB,

With ZIHINTPAUSE before SSTC and SVINVAL, I assume this is not something
we are canonically ordering, but I never, ever know which ones we are
allowed to re-order at will.

>  	RISCV_ISA_EXT_ID_MAX = RISCV_ISA_EXT_MAX,
>  };

> diff --git a/arch/riscv/kernel/cpu.c b/arch/riscv/kernel/cpu.c
> index bf9dd6764bad..66ff36a57e20 100644
> --- a/arch/riscv/kernel/cpu.c
> +++ b/arch/riscv/kernel/cpu.c
> @@ -166,6 +166,7 @@ static struct riscv_isa_ext_data isa_ext_arr[] = {
>  	__RISCV_ISA_EXT_DATA(sstc, RISCV_ISA_EXT_SSTC),
>  	__RISCV_ISA_EXT_DATA(svinval, RISCV_ISA_EXT_SVINVAL),
>  	__RISCV_ISA_EXT_DATA(svpbmt, RISCV_ISA_EXT_SVPBMT),
> +	__RISCV_ISA_EXT_DATA(zbb, RISCV_ISA_EXT_ZBB),
>  	__RISCV_ISA_EXT_DATA(zicbom, RISCV_ISA_EXT_ZICBOM),
>  	__RISCV_ISA_EXT_DATA(zihintpause, RISCV_ISA_EXT_ZIHINTPAUSE),
>  	__RISCV_ISA_EXT_DATA("", RISCV_ISA_EXT_MAX),

This one I do know that Palmer wants canonically ordered.

> diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
> index 026512ca9c4c..f19b9d4e2dca 100644
> --- a/arch/riscv/kernel/cpufeature.c
> +++ b/arch/riscv/kernel/cpufeature.c
> @@ -201,6 +201,7 @@ void __init riscv_fill_hwcap(void)
>  			} else {
>  				SET_ISA_EXT_MAP("sscofpmf", RISCV_ISA_EXT_SSCOFPMF);
>  				SET_ISA_EXT_MAP("svpbmt", RISCV_ISA_EXT_SVPBMT);
> +				SET_ISA_EXT_MAP("zbb", RISCV_ISA_EXT_ZBB);
>  				SET_ISA_EXT_MAP("zicbom", RISCV_ISA_EXT_ZICBOM);
>  				SET_ISA_EXT_MAP("zihintpause", RISCV_ISA_EXT_ZIHINTPAUSE);
>  				SET_ISA_EXT_MAP("sstc", RISCV_ISA_EXT_SSTC);

This one looks like it is, sstc aside. Same question as above, can I
reorder this one? I'll send a patch for it if I can...

> diff --git a/arch/riscv/lib/strcmp_zbb.S b/arch/riscv/lib/strcmp_zbb.S
> new file mode 100644
> index 000000000000..aff9b941d3ee
> --- /dev/null
> +++ b/arch/riscv/lib/strcmp_zbb.S
> @@ -0,0 +1,91 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Copyright (c) 2022 VRULL GmbH
> + * Author: Christoph Muellner <christoph.muellner@vrull.eu>

Is a Co-developed-by: appropriate then?

> + */
> +
> +#include <linux/linkage.h>
> +#include <asm/asm.h>
> +#include <asm-generic/export.h>
> +
> +#define src1		a0
> +#define result		a0
> +#define src2		t5
> +#define data1		t0
> +#define data2		t1
> +#define align		t2
> +#define data1_orcb	t3
> +#define m1		t4
> +
> +.option push
> +.option arch,+zbb
> +
> +/* int __strcmp_zbb(const char *cs, const char *ct) */
> +ENTRY(__strcmp_zbb)
> +	/*
> +	 * Returns
> +	 *   a0 - comparison result, like strncmp
> +	 *
> +	 * Parameters
> +	 *   a0 - string1
> +	 *   a1 - string2
> +	 *   a2 - number of characters to compare

Same copy paste mistake here with a2?

> +	 *
> +	 * Clobbers
> +	 *   t0, t1, t2, t3, t4, t5
> +	 */
> +	mv	src2, a1
> +
> +	or	align, src1, src2
> +	li	m1, -1
> +	and	align, align, SZREG-1
> +	bnez	align, 3f

Line of whitespace here would be nice.

> +	/* Main loop for aligned string.  */
> +	.p2align 3
> +1:
> +	REG_L	data1, 0(src1)
> +	REG_L	data2, 0(src2)
> +	orc.b	data1_orcb, data1
> +	bne	data1_orcb, m1, 2f
> +	addi	src1, src1, SZREG
> +	addi	src2, src2, SZREG
> +	beq	data1, data2, 1b
> +
> +	/* Words don't match, and no null byte in the first
> +	 * word. Get bytes in big-endian order and compare.  */
> +#ifndef CONFIG_CPU_BIG_ENDIAN

I know this is a lift from the reference implementation in the spec, but
do we actually need this ifndef section?
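
For reference, a small C model of why the byte swap matters on a
little-endian core. This is only a sketch: load_le64(), rev8() and
cmp_words() are hypothetical helpers that imitate REG_L, rev8 and the
sltu/neg/ori sequence from the patch, they are not kernel code. A
little-endian load puts the first string byte in the least significant
position, so an unsigned word compare would be decided by the last byte
rather than the first unless the bytes are reversed first.

```c
#include <assert.h>
#include <stdint.h>

/* Model of REG_L on a little-endian RV64 core: byte 0 ends up in bits 7:0. */
static uint64_t load_le64(const char *p)
{
	uint64_t v = 0;

	for (int i = 7; i >= 0; i--)
		v = (v << 8) | (unsigned char)p[i];
	return v;
}

/* Portable model of rev8 on a 64-bit register: reverse the byte order. */
static uint64_t rev8(uint64_t v)
{
	uint64_t r = 0;

	for (int i = 0; i < 8; i++) {
		r = (r << 8) | (v & 0xff);
		v >>= 8;
	}
	return r;
}

/* Model of sltu/neg/ori: yields (a >= b) ? 1 : -1 without a branch. */
static int cmp_words(uint64_t a, uint64_t b)
{
	int64_t lt;

	assert(a != b);		/* only reached when the words differ */
	lt = (a < b);		/* sltu */
	lt = -lt;		/* neg  */
	return (int)(lt | 1);	/* ori  */
}

/* Compare two differing, NUL-free 8-byte chunks with the byte swap. */
int cmp_chunk_with_rev8(const char *s1, const char *s2)
{
	return cmp_words(rev8(load_le64(s1)), rev8(load_le64(s2)));
}

/* Same compare without the swap, to show what goes wrong. */
int cmp_chunk_without_rev8(const char *s1, const char *s2)
{
	return cmp_words(load_le64(s1), load_le64(s2));
}
```

With "abcdefgz" and "bbcdefga" the swapped compare reports the first
string as smaller, matching strcmp, while the unswapped one is decided by
the trailing 'z' and gets the sign wrong. On a big-endian core the load
already has the first byte on top, which is what the #ifndef skips.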

> +	rev8	data1, data1
> +	rev8	data2, data2
> +#endif
> +	/* Synthesize (data1 >= data2) ? 1 : -1 in a branchless sequence.  */
> +	sltu	result, data1, data2
> +	neg	result, result
> +	ori	result, result, 1
> +	ret
> +
> +2:
> +	/* Found a null byte.
> +	 * If words don't match, fall back to simple loop.  */

Can the multiline comments match the usual multiline comment style that
you used as the start of the function?

> +	bne	data1, data2, 3f
> +
> +	/* Otherwise, strings are equal.  */
> +	li	result, 0
> +	ret
> +
> +	/* Simple loop for misaligned strings.  */
> +	.p2align 3
> +3:
> +	lbu	data1, 0(src1)
> +	lbu	data2, 0(src2)
> +	addi	src1, src1, 1
> +	addi	src2, src2, 1
> +	bne	data1, data2, 4f
> +	bnez	data1, 3b
> +
> +4:
> +	sub	result, data1, data2
> +	ret
> +END(__strcmp_zbb)
> +EXPORT_SYMBOL(__strcmp_zbb)
> +
> +.option pop
> diff --git a/arch/riscv/lib/strlen_zbb.S b/arch/riscv/lib/strlen_zbb.S
> new file mode 100644
> index 000000000000..bc8d3607a32f
> --- /dev/null
> +++ b/arch/riscv/lib/strlen_zbb.S
> @@ -0,0 +1,98 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Copyright (c) 2022 VRULL GmbH
> + * Author: Christoph Muellner <christoph.muellner@vrull.eu>
> + */
> +
> +#include <linux/linkage.h>
> +#include <asm/asm.h>
> +#include <asm-generic/export.h>
> +
> +#define src		a0
> +#define result		a0
> +#define addr		t0
> +#define data		t1
> +#define offset		t2
> +#define offset_bits	t2
> +#define valid_bytes	t3
> +#define m1		t3
> +
> +#ifdef CONFIG_CPU_BIG_ENDIAN
> +# define CZ	clz
> +# define SHIFT	sll
> +#else
> +# define CZ	ctz
> +# define SHIFT	srl
> +#endif
> +
> +.option push
> +.option arch,+zbb
> +
> +/* int __strlen_zbb(const char *s) */
> +ENTRY(__strlen_zbb)
> +	/*
> +	 * Returns
> +	 *   a0 - string length
> +	 *
> +	 * Parameters
> +	 *   a0 - String to measure
> +	 *
> +	 * Clobbers
> +	 *   t0, t1, t2, t3
> +	 */
> +
> +	/* Number of irrelevant bytes in the first word.  */
> +	andi	offset, src, SZREG-1
> +	/* Align pointer.  */
> +	andi	addr, src, -SZREG
> +
> +	li	valid_bytes, SZREG
> +	sub	valid_bytes, valid_bytes, offset
> +	slli	offset_bits, offset, RISCV_LGPTR
> +
> +	/* Get the first word.  */
> +	REG_L	data, 0(addr)

A line of whitespace prior to the comments would go a long way here with
making this a little more readable - especially as a diff in a
mailclient.

I am oh-so-far from an expert on this kind of stuff, but these three
functions seem to match up with the reference implementations in the
spec. With the couple of nitpicks above fixed, feel free to tack on:
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>

Anyways, hopefully I've not missed a bunch of should-be-obvious things
while trying to review the series..
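
As a cross-check of the strlen logic, here is a rough C model of the
little-endian path. It is a sketch under assumptions, not the kernel
implementation: orc_b(), ctz64() and strlen_words() are hypothetical
names, the buffer is indexed from an 8-byte-aligned base, and it must be
zero-padded up to the end of the word that holds the terminating NUL so
that whole-word reads stay in bounds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of Zbb orc.b: every nonzero byte becomes 0xff, zero bytes stay 0. */
static uint64_t orc_b(uint64_t v)
{
	uint64_t r = 0;

	for (int i = 0; i < 8; i++)
		if ((v >> (8 * i)) & 0xff)
			r |= (uint64_t)0xff << (8 * i);
	return r;
}

/* Model of ctz on RV64: returns 64 when the input is zero. */
static unsigned int ctz64(uint64_t v)
{
	unsigned int n = 0;

	if (!v)
		return 64;
	while (!(v & 1)) {
		v >>= 1;
		n++;
	}
	return n;
}

/* Little-endian 8-byte load, standing in for REG_L. */
static uint64_t load_le64(const unsigned char *p)
{
	uint64_t v = 0;

	for (int i = 7; i >= 0; i--)
		v = (v << 8) | p[i];
	return v;
}

/* Length of the string that starts at buf[start]. */
size_t strlen_words(const unsigned char *buf, size_t start)
{
	size_t offset = start & 7;	/* irrelevant bytes in the first word */
	size_t addr = start - offset;	/* aligned index of the first word */
	size_t valid_bytes = 8 - offset;
	size_t result, first;
	uint64_t data;

	assert(buf != NULL);

	/* Shifting right drops the leading bytes and zero-fills the top. */
	data = load_le64(buf + addr) >> (8 * offset);
	result = ctz64(~orc_b(data)) >> 3;
	if (valid_bytes > result)
		return result;		/* NUL found in the first word */

	first = addr + 8;
	do {
		addr += 8;
		data = orc_b(load_le64(buf + addr));
	} while (data == UINT64_MAX);	/* all 0xff: no NUL in this word */

	return result + (addr - first) + (ctz64(~data) >> 3);
}
```

The zero bytes shifted in at the top of the first word look like NUL
bytes, which is why the first result has to be checked against
valid_bytes before it can be trusted.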

Thanks,
Conor.

> +	/* Shift away the partial data we loaded to remove the irrelevant bytes
> +	 * preceding the string with the effect of adding NUL bytes at the
> +	 * end of the string.  */
> +	SHIFT	data, data, offset_bits
> +	/* Convert non-NUL into 0xff and NUL into 0x00.  */
> +	orc.b	data, data
> +	/* Convert non-NUL into 0x00 and NUL into 0xff.  */
> +	not	data, data
> +	/* Search for the first set bit (corresponding to a NUL byte in the
> +	 * original chunk).  */
> +	CZ	data, data
> +	/* The first chunk is special: compare against the number
> +	 * of valid bytes in this chunk.  */
> +	srli	result, data, 3
> +	bgtu	valid_bytes, result, 3f
> +
> +	/* Prepare for the word comparison loop.  */
> +	addi	offset, addr, SZREG
> +	li	m1, -1
> +
> +	/* Our critical loop is 4 instructions and processes data in
> +	 * 4 byte or 8 byte chunks.  */
> +	.p2align 3
> +1:
> +	REG_L	data, SZREG(addr)
> +	addi	addr, addr, SZREG
> +	orc.b	data, data
> +	beq	data, m1, 1b
> +2:
> +	not	data, data
> +	CZ	data, data
> +	/* Get number of processed words.  */
> +	sub	offset, addr, offset
> +	/* Add number of characters in the first word.  */
> +	add	result, result, offset
> +	srli	data, data, 3
> +	/* Add number of characters in the last word.  */
> +	add	result, result, data
> +3:
> +	ret
> +END(__strlen_zbb)
> +EXPORT_SYMBOL(__strlen_zbb)
> +
> +.option pop
> diff --git a/arch/riscv/lib/strncmp_zbb.S b/arch/riscv/lib/strncmp_zbb.S
> new file mode 100644
> index 000000000000..852c8425d238
> --- /dev/null
> +++ b/arch/riscv/lib/strncmp_zbb.S
> @@ -0,0 +1,106 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Copyright (c) 2022 VRULL GmbH
> + * Author: Christoph Muellner <christoph.muellner@vrull.eu>
> + */
> +
> +#include <linux/linkage.h>
> +#include <asm/asm.h>
> +#include <asm-generic/export.h>
> +
> +#define src1		a0
> +#define result		a0
> +#define src2		t6
> +#define len		a2
> +#define data1		t0
> +#define data2		t1
> +#define align		t2
> +#define data1_orcb	t3
> +#define limit		t4
> +#define m1		t5
> +
> +.option push
> +.option arch,+zbb
> +
> +/* int __strncmp_zbb(const char *cs, const char *ct, size_t count) */
> +ENTRY(__strncmp_zbb)
> +	/*
> +	 * Returns
> +	 *   a0 - comparison result, like strncmp
> +	 *
> +	 * Parameters
> +	 *   a0 - string1
> +	 *   a1 - string2
> +	 *   a2 - number of characters to compare
> +	 *
> +	 * Clobbers
> +	 *   t0, t1, t2, t3, t4, t5, t6
> +	 */
> +	mv	src2, a1
> +
> +	or	align, src1, src2
> +	li	m1, -1
> +	and	align, align, SZREG-1
> +	add	limit, src1, len
> +	bnez	align, 4f
> +
> +	/* Adjust limit for fast-path.  */
> +	addi	limit, limit, -SZREG
> +	/* Main loop for aligned string.  */
> +	.p2align 3
> +1:
> +	bgt	src1, limit, 3f
> +	REG_L	data1, 0(src1)
> +	REG_L	data2, 0(src2)
> +	orc.b	data1_orcb, data1
> +	bne	data1_orcb, m1, 2f
> +	addi	src1, src1, SZREG
> +	addi	src2, src2, SZREG
> +	beq	data1, data2, 1b
> +
> +	/* Words don't match, and no null byte in the first
> +	 * word. Get bytes in big-endian order and compare.  */
> +#ifndef CONFIG_CPU_BIG_ENDIAN
> +	rev8	data1, data1
> +	rev8	data2, data2
> +#endif
> +	/* Synthesize (data1 >= data2) ? 1 : -1 in a branchless sequence.  */
> +	sltu	result, data1, data2
> +	neg	result, result
> +	ori	result, result, 1
> +	ret
> +
> +2:
> +	/* Found a null byte.
> +	 * If words don't match, fall back to simple loop.  */
> +	bne	data1, data2, 3f
> +
> +	/* Otherwise, strings are equal.  */
> +	li	result, 0
> +	ret
> +
> +	/* Simple loop for misaligned strings.  */
> +3:
> +	/* Restore limit for slow-path.  */
> +	addi	limit, limit, SZREG
> +	.p2align 3
> +4:
> +	bge	src1, limit, 6f
> +	lbu	data1, 0(src1)
> +	lbu	data2, 0(src2)
> +	addi	src1, src1, 1
> +	addi	src2, src2, 1
> +	bne	data1, data2, 5f
> +	bnez	data1, 4b
> +
> +5:
> +	sub	result, data1, data2
> +	ret
> +
> +6:
> +	li	result, 0
> +	ret
> +END(__strncmp_zbb)
> +EXPORT_SYMBOL(__strncmp_zbb)
> +
> +.option pop
> -- 
> 2.35.1

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 7/7] RISC-V: add zbb support to string functions
  2022-11-13 23:29   ` Conor Dooley
@ 2022-11-13 23:47     ` Heiko Stübner
  2022-11-24 22:23     ` Heiko Stübner
  1 sibling, 0 replies; 51+ messages in thread
From: Heiko Stübner @ 2022-11-13 23:47 UTC (permalink / raw)
  To: Conor Dooley
  Cc: linux-riscv, palmer, christoph.muellner, prabhakar.csengg,
	philipp.tomsich, ajones, emil.renner.berthing

Hi Conor,

Am Montag, 14. November 2022, 00:29:35 CET schrieb Conor Dooley:
> I always seem to forget to open my mails.. I swear I'm not being rude!

no worries :-) .

I also seem to miss one or another channel of communication way more often
than I'd like.


> In the back of my head while looking at this series, I've been wondering
> about Palmer's hwcap stuff. Wonder what the craic is there - I assume
> he's just been too busy with the profile stuff. Ditto Ztso, but iirc
> that's related. Anyway, that's just an aside as I was reading through
> the patch & typed it somewhere lest I forget to bring it up elsewhere.

That is essentially the next step for me.

The far away goal is doing the stuff we talked about at LPC - i.e. allowing
multiple variants for this and then selecting the hopefully fastest one
to patch in :-) .

Though the whole thing has way too many moving pieces for my taste,
so I'm trying to solve this step by step.

This series "simply" provides an optimization for cores supporting the
zbb extension. It works standalone and "solves" my sub-problems of
how to patch these core functions and also allowing function calls
from alternatives :-) .
[and by that may also help the people working on those more
 involved dma-operation variants, that aren't zicbom :-) ]
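
To make the function-call part concrete, here is a small C sketch of the
address arithmetic behind an auipc + jalr pair. It is an illustration
under assumptions rather than the code from patch 5: struct call_imm,
split_offset() and rebase_call() are hypothetical names, and the offsets
are kept as plain integers instead of encoded instruction fields. auipc
adds a multiple of 4096 to the pc and jalr adds a sign-extended 12-bit
immediate, so the upper part has to be rounded up by 0x1000 whenever
bit 11 of the offset is set.

```c
#include <assert.h>
#include <stdint.h>

struct call_imm {
	int64_t hi;	/* value auipc adds to pc, a multiple of 4096 */
	int64_t lo;	/* value jalr adds, a signed 12-bit immediate */
};

/* Split a pc-relative offset so that hi + lo == offset. */
static struct call_imm split_offset(int64_t offset)
{
	struct call_imm imm;

	/* Conservative bound on what one auipc + jalr pair can reach. */
	assert(offset >= INT32_MIN && offset <= INT32_MAX - 0x800);

	/* Adding 0x800 rounds up when bit 11 is set, so lo becomes negative. */
	imm.hi = (offset + 0x800) & ~(int64_t)0xfff;
	imm.lo = offset - imm.hi;
	return imm;
}

/*
 * The pair was assembled at old_pc (in the alternatives section) and is
 * copied to new_pc (the patch site). The absolute target must not move,
 * so the offset is recomputed relative to the new location.
 */
struct call_imm rebase_call(struct call_imm old, int64_t old_pc, int64_t new_pc)
{
	int64_t target = old_pc + old.hi + old.lo;

	return split_offset(target - new_pc);
}
```

For example, an offset of 0x1800 splits into hi = 0x2000 and lo = -0x800,
which is the case the extra 0x1000 pad in the patch takes care of.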


Next up are the mem* functions, where I essentially have variants for
"fast unaligned" access systems - which will then use Palmer's hwcap work.
His series describes the necessary dt-property to declare if a core
can support said fast unaligned access.


Heiko

> On Thu, Nov 10, 2022 at 05:49:24PM +0100, Heiko Stuebner wrote:
> > From: Heiko Stuebner <heiko.stuebner@vrull.eu>
> > 
> > Add handling for ZBB extension and add support for using it as a
> > variant for optimized string functions.
> > 
> > Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
> > ---
> >  arch/riscv/Kconfig                   |  23 ++++++
> >  arch/riscv/include/asm/errata_list.h |   3 +-
> >  arch/riscv/include/asm/hwcap.h       |   1 +
> >  arch/riscv/include/asm/string.h      |  29 ++++++--
> >  arch/riscv/kernel/cpu.c              |   1 +
> >  arch/riscv/kernel/cpufeature.c       |  18 +++++
> >  arch/riscv/lib/Makefile              |   3 +
> >  arch/riscv/lib/strcmp_zbb.S          |  91 +++++++++++++++++++++++
> >  arch/riscv/lib/strlen_zbb.S          |  98 +++++++++++++++++++++++++
> >  arch/riscv/lib/strncmp_zbb.S         | 106 +++++++++++++++++++++++++++
> >  10 files changed, 366 insertions(+), 7 deletions(-)
> >  create mode 100644 arch/riscv/lib/strcmp_zbb.S
> >  create mode 100644 arch/riscv/lib/strlen_zbb.S
> >  create mode 100644 arch/riscv/lib/strncmp_zbb.S
> > 
> > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> > index acfc4d298aab..56633931e808 100644
> > --- a/arch/riscv/Kconfig
> > +++ b/arch/riscv/Kconfig
> > @@ -411,6 +411,29 @@ config RISCV_ISA_SVPBMT
> >  
> >  	   If you don't know what to do here, say Y.
> >  
> > +config TOOLCHAIN_HAS_ZBB
> > +	bool
> > +	default y
> > +	depends on !64BIT || $(cc-option,-mabi=lp64 -march=rv64ima_zbb)
> > +	depends on !32BIT || $(cc-option,-mabi=ilp32 -march=rv32ima_zbb)
> > +	depends on LLD_VERSION >= 150000 || LD_VERSION >= 23900
> 
> Drew wants to switch away from this method and use insn defs instead -
> and I must admit it does look cleaner! Check out his Zicboz series and
> see what you think of it.
> 
> > +config RISCV_ISA_ZBB
> > +	bool "Zbb extension support for "
> 
> Missing some words in this line?
> 
> > +	depends on TOOLCHAIN_HAS_ZBB
> > +	depends on !XIP_KERNEL && MMU
> > +	select RISCV_ALTERNATIVE
> > +	default y
> > +	help
> > +	   Adds support to dynamically detect the presence of the ZBB
> > +	   extension (basic bit manipulation) and enable its usage.
> > +
> > +	   The Zbb extension provides instructions to accelerate a number
> > +	   of bit-specific operations (count bit population, sign extending,
> > +	   bitrotation, etc).
> > +
> > +	   If you don't know what to do here, say Y.
> > +
> >  config TOOLCHAIN_HAS_ZICBOM
> >  	bool
> >  	default y
> > diff --git a/arch/riscv/include/asm/errata_list.h b/arch/riscv/include/asm/errata_list.h
> > index 4180312d2a70..95e626b7281e 100644
> > --- a/arch/riscv/include/asm/errata_list.h
> > +++ b/arch/riscv/include/asm/errata_list.h
> > @@ -24,7 +24,8 @@
> >  
> >  #define	CPUFEATURE_SVPBMT 0
> >  #define	CPUFEATURE_ZICBOM 1
> > -#define	CPUFEATURE_NUMBER 2
> > +#define	CPUFEATURE_ZBB 2
> > +#define	CPUFEATURE_NUMBER 3
> >  
> >  #ifdef __ASSEMBLY__
> >  
> > diff --git a/arch/riscv/include/asm/hwcap.h b/arch/riscv/include/asm/hwcap.h
> > index b22525290073..ac5555fd9788 100644
> > --- a/arch/riscv/include/asm/hwcap.h
> > +++ b/arch/riscv/include/asm/hwcap.h
> > @@ -59,6 +59,7 @@ enum riscv_isa_ext_id {
> >  	RISCV_ISA_EXT_ZIHINTPAUSE,
> >  	RISCV_ISA_EXT_SSTC,
> >  	RISCV_ISA_EXT_SVINVAL,
> > +	RISCV_ISA_EXT_ZBB,
> 
> With ZIHINTPAUSE before SSTC and SVINVAL I assume this is not something
> we are canonically ordering but I never, ever know which ones we are
> allowed to re-order at will.
> 
> >  	RISCV_ISA_EXT_ID_MAX = RISCV_ISA_EXT_MAX,
> >  };
> 
> > diff --git a/arch/riscv/kernel/cpu.c b/arch/riscv/kernel/cpu.c
> > index bf9dd6764bad..66ff36a57e20 100644
> > --- a/arch/riscv/kernel/cpu.c
> > +++ b/arch/riscv/kernel/cpu.c
> > @@ -166,6 +166,7 @@ static struct riscv_isa_ext_data isa_ext_arr[] = {
> >  	__RISCV_ISA_EXT_DATA(sstc, RISCV_ISA_EXT_SSTC),
> >  	__RISCV_ISA_EXT_DATA(svinval, RISCV_ISA_EXT_SVINVAL),
> >  	__RISCV_ISA_EXT_DATA(svpbmt, RISCV_ISA_EXT_SVPBMT),
> > +	__RISCV_ISA_EXT_DATA(zbb, RISCV_ISA_EXT_ZBB),
> >  	__RISCV_ISA_EXT_DATA(zicbom, RISCV_ISA_EXT_ZICBOM),
> >  	__RISCV_ISA_EXT_DATA(zihintpause, RISCV_ISA_EXT_ZIHINTPAUSE),
> >  	__RISCV_ISA_EXT_DATA("", RISCV_ISA_EXT_MAX),
> 
> This one I do know that Palmer wants canonically ordered.
> 
> > diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
> > index 026512ca9c4c..f19b9d4e2dca 100644
> > --- a/arch/riscv/kernel/cpufeature.c
> > +++ b/arch/riscv/kernel/cpufeature.c
> > @@ -201,6 +201,7 @@ void __init riscv_fill_hwcap(void)
> >  			} else {
> >  				SET_ISA_EXT_MAP("sscofpmf", RISCV_ISA_EXT_SSCOFPMF);
> >  				SET_ISA_EXT_MAP("svpbmt", RISCV_ISA_EXT_SVPBMT);
> > +				SET_ISA_EXT_MAP("zbb", RISCV_ISA_EXT_ZBB);
> >  				SET_ISA_EXT_MAP("zicbom", RISCV_ISA_EXT_ZICBOM);
> >  				SET_ISA_EXT_MAP("zihintpause", RISCV_ISA_EXT_ZIHINTPAUSE);
> >  				SET_ISA_EXT_MAP("sstc", RISCV_ISA_EXT_SSTC);
> 
> This one looks like it is, sstc aside. Same question as above, can I
> reorder this one? I'll send a patch for it if I can...
> 
> > diff --git a/arch/riscv/lib/strcmp_zbb.S b/arch/riscv/lib/strcmp_zbb.S
> > new file mode 100644
> > index 000000000000..aff9b941d3ee
> > --- /dev/null
> > +++ b/arch/riscv/lib/strcmp_zbb.S
> > @@ -0,0 +1,91 @@
> > +/* SPDX-License-Identifier: GPL-2.0-only */
> > +/*
> > + * Copyright (c) 2022 VRULL GmbH
> > + * Author: Christoph Muellner <christoph.muellner@vrull.eu>
> 
> Is a Co-developed-by: appropriate then?
> 
> > + */
> > +
> > +#include <linux/linkage.h>
> > +#include <asm/asm.h>
> > +#include <asm-generic/export.h>
> > +
> > +#define src1		a0
> > +#define result		a0
> > +#define src2		t5
> > +#define data1		t0
> > +#define data2		t1
> > +#define align		t2
> > +#define data1_orcb	t3
> > +#define m1		t4
> > +
> > +.option push
> > +.option arch,+zbb
> > +
> > +/* int __strcmp_zbb(const char *cs, const char *ct) */
> > +ENTRY(__strcmp_zbb)
> > +	/*
> > +	 * Returns
> > +	 *   a0 - comparison result, like strncmp
> > +	 *
> > +	 * Parameters
> > +	 *   a0 - string1
> > +	 *   a1 - string2
> > +	 *   a2 - number of characters to compare
> 
> Same copy paste mistake here with a2?
> 
> > +	 *
> > +	 * Clobbers
> > +	 *   t0, t1, t2, t3, t4, t5
> > +	 */
> > +	mv	src2, a1
> > +
> > +	or	align, src1, src2
> > +	li	m1, -1
> > +	and	align, align, SZREG-1
> > +	bnez	align, 3f
> 
> Line of whitespace here would be nice.
> 
> > +	/* Main loop for aligned string.  */
> > +	.p2align 3
> > +1:
> > +	REG_L	data1, 0(src1)
> > +	REG_L	data2, 0(src2)
> > +	orc.b	data1_orcb, data1
> > +	bne	data1_orcb, m1, 2f
> > +	addi	src1, src1, SZREG
> > +	addi	src2, src2, SZREG
> > +	beq	data1, data2, 1b
> > +
> > +	/* Words don't match, and no null byte in the first
> > +	 * word. Get bytes in big-endian order and compare.  */
> > +#ifndef CONFIG_CPU_BIG_ENDIAN
> 
> I know this is a lift from the reference implementation in the spec, but
> do we actually need this ifndef section?
> 
> > +	rev8	data1, data1
> > +	rev8	data2, data2
> > +#endif
> > +	/* Synthesize (data1 >= data2) ? 1 : -1 in a branchless sequence.  */
> > +	sltu	result, data1, data2
> > +	neg	result, result
> > +	ori	result, result, 1
> > +	ret
> > +
> > +2:
> > +	/* Found a null byte.
> > +	 * If words don't match, fall back to simple loop.  */
> 
> Can the multiline comments match the usual multiline comment style that
> you used as the start of the function?
> 
> > +	bne	data1, data2, 3f
> > +
> > +	/* Otherwise, strings are equal.  */
> > +	li	result, 0
> > +	ret
> > +
> > +	/* Simple loop for misaligned strings.  */
> > +	.p2align 3
> > +3:
> > +	lbu	data1, 0(src1)
> > +	lbu	data2, 0(src2)
> > +	addi	src1, src1, 1
> > +	addi	src2, src2, 1
> > +	bne	data1, data2, 4f
> > +	bnez	data1, 3b
> > +
> > +4:
> > +	sub	result, data1, data2
> > +	ret
> > +END(__strcmp_zbb)
> > +EXPORT_SYMBOL(__strcmp_zbb)
> > +
> > +.option pop
> > diff --git a/arch/riscv/lib/strlen_zbb.S b/arch/riscv/lib/strlen_zbb.S
> > new file mode 100644
> > index 000000000000..bc8d3607a32f
> > --- /dev/null
> > +++ b/arch/riscv/lib/strlen_zbb.S
> > @@ -0,0 +1,98 @@
> > +/* SPDX-License-Identifier: GPL-2.0-only */
> > +/*
> > + * Copyright (c) 2022 VRULL GmbH
> > + * Author: Christoph Muellner <christoph.muellner@vrull.eu>
> > + */
> > +
> > +#include <linux/linkage.h>
> > +#include <asm/asm.h>
> > +#include <asm-generic/export.h>
> > +
> > +#define src		a0
> > +#define result		a0
> > +#define addr		t0
> > +#define data		t1
> > +#define offset		t2
> > +#define offset_bits	t2
> > +#define valid_bytes	t3
> > +#define m1		t3
> > +
> > +#ifdef CONFIG_CPU_BIG_ENDIAN
> > +# define CZ	clz
> > +# define SHIFT	sll
> > +#else
> > +# define CZ	ctz
> > +# define SHIFT	srl
> > +#endif
> > +
> > +.option push
> > +.option arch,+zbb
> > +
> > +/* int __strlen_zbb(const char *s) */
> > +ENTRY(__strlen_zbb)
> > +	/*
> > +	 * Returns
> > +	 *   a0 - string length
> > +	 *
> > +	 * Parameters
> > +	 *   a0 - String to measure
> > +	 *
> > +	 * Clobbers
> > +	 *   t0, t1, t2, t3
> > +	 */
> > +
> > +	/* Number of irrelevant bytes in the first word.  */
> > +	andi	offset, src, SZREG-1
> > +	/* Align pointer.  */
> > +	andi	addr, src, -SZREG
> > +
> > +	li	valid_bytes, SZREG
> > +	sub	valid_bytes, valid_bytes, offset
> > +	slli	offset_bits, offset, RISCV_LGPTR
> > +
> > +	/* Get the first word.  */
> > +	REG_L	data, 0(addr)
> 
> A line of whitespace prior to the comments would go a long way here with
> making this a little more readable - especially as a diff in a
> mailclient.
> 
> I am oh-so-far from an expert on this kind of stuff, but these three
> functions seem to match up with the reference implementations in the
> spec. With the couple different nit pick bits fixed, feel free to tack
> on:
> Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
> 
> Anyways, hopefully I've not missed a bunch of should-be-obvious things
> while trying to review the series..
> 
> Thanks,
> Conor.
> 
> > +	/* Shift away the partial data we loaded to remove the irrelevant bytes
> > +	 * preceding the string with the effect of adding NUL bytes at the
> > +	 * end of the string.  */
> > +	SHIFT	data, data, offset_bits
> > +	/* Convert non-NUL into 0xff and NUL into 0x00.  */
> > +	orc.b	data, data
> > +	/* Convert non-NUL into 0x00 and NUL into 0xff.  */
> > +	not	data, data
> > +	/* Search for the first set bit (corresponding to a NUL byte in the
> > +	 * original chunk).  */
> > +	CZ	data, data
> > +	/* The first chunk is special: compare against the number
> > +	 * of valid bytes in this chunk.  */
> > +	srli	result, data, 3
> > +	bgtu	valid_bytes, result, 3f
> > +
> > +	/* Prepare for the word comparison loop.  */
> > +	addi	offset, addr, SZREG
> > +	li	m1, -1
> > +
> > +	/* Our critical loop is 4 instructions and processes data in
> > +	 * 4 byte or 8 byte chunks.  */
> > +	.p2align 3
> > +1:
> > +	REG_L	data, SZREG(addr)
> > +	addi	addr, addr, SZREG
> > +	orc.b	data, data
> > +	beq	data, m1, 1b
> > +2:
> > +	not	data, data
> > +	CZ	data, data
> > +	/* Get number of processed words.  */
> > +	sub	offset, addr, offset
> > +	/* Add number of characters in the first word.  */
> > +	add	result, result, offset
> > +	srli	data, data, 3
> > +	/* Add number of characters in the last word.  */
> > +	add	result, result, data
> > +3:
> > +	ret
> > +END(__strlen_zbb)
> > +EXPORT_SYMBOL(__strlen_zbb)
> > +
> > +.option pop
> > diff --git a/arch/riscv/lib/strncmp_zbb.S b/arch/riscv/lib/strncmp_zbb.S
> > new file mode 100644
> > index 000000000000..852c8425d238
> > --- /dev/null
> > +++ b/arch/riscv/lib/strncmp_zbb.S
> > @@ -0,0 +1,106 @@
> > +/* SPDX-License-Identifier: GPL-2.0-only */
> > +/*
> > + * Copyright (c) 2022 VRULL GmbH
> > + * Author: Christoph Muellner <christoph.muellner@vrull.eu>
> > + */
> > +
> > +#include <linux/linkage.h>
> > +#include <asm/asm.h>
> > +#include <asm-generic/export.h>
> > +
> > +#define src1		a0
> > +#define result		a0
> > +#define src2		t6
> > +#define len		a2
> > +#define data1		t0
> > +#define data2		t1
> > +#define align		t2
> > +#define data1_orcb	t3
> > +#define limit		t4
> > +#define m1		t5
> > +
> > +.option push
> > +.option arch,+zbb
> > +
> > +/* int __strncmp_zbb(const char *cs, const char *ct, size_t count) */
> > +ENTRY(__strncmp_zbb)
> > +	/*
> > +	 * Returns
> > +	 *   a0 - comparison result, like strncmp
> > +	 *
> > +	 * Parameters
> > +	 *   a0 - string1
> > +	 *   a1 - string2
> > +	 *   a2 - number of characters to compare
> > +	 *
> > +	 * Clobbers
> > +	 *   t0, t1, t2, t3, t4, t5, t6
> > +	 */
> > +	mv	src2, a1
> > +
> > +	or	align, src1, src2
> > +	li	m1, -1
> > +	and	align, align, SZREG-1
> > +	add	limit, src1, len
> > +	bnez	align, 4f
> > +
> > +	/* Adjust limit for fast-path.  */
> > +	addi	limit, limit, -SZREG
> > +	/* Main loop for aligned string.  */
> > +	.p2align 3
> > +1:
> > +	bgt	src1, limit, 3f
> > +	REG_L	data1, 0(src1)
> > +	REG_L	data2, 0(src2)
> > +	orc.b	data1_orcb, data1
> > +	bne	data1_orcb, m1, 2f
> > +	addi	src1, src1, SZREG
> > +	addi	src2, src2, SZREG
> > +	beq	data1, data2, 1b
> > +
> > +	/* Words don't match, and no null byte in the first
> > +	 * word. Get bytes in big-endian order and compare.  */
> > +#ifndef CONFIG_CPU_BIG_ENDIAN
> > +	rev8	data1, data1
> > +	rev8	data2, data2
> > +#endif
> > +	/* Synthesize (data1 >= data2) ? 1 : -1 in a branchless sequence.  */
> > +	sltu	result, data1, data2
> > +	neg	result, result
> > +	ori	result, result, 1
> > +	ret
> > +
> > +2:
> > +	/* Found a null byte.
> > +	 * If words don't match, fall back to simple loop.  */
> > +	bne	data1, data2, 3f
> > +
> > +	/* Otherwise, strings are equal.  */
> > +	li	result, 0
> > +	ret
> > +
> > +	/* Simple loop for misaligned strings.  */
> > +3:
> > +	/* Restore limit for slow-path.  */
> > +	addi	limit, limit, SZREG
> > +	.p2align 3
> > +4:
> > +	bge	src1, limit, 6f
> > +	lbu	data1, 0(src1)
> > +	lbu	data2, 0(src2)
> > +	addi	src1, src1, 1
> > +	addi	src2, src2, 1
> > +	bne	data1, data2, 5f
> > +	bnez	data1, 4b
> > +
> > +5:
> > +	sub	result, data1, data2
> > +	ret
> > +
> > +6:
> > +	li	result, 0
> > +	ret
> > +END(__strncmp_zbb)
> > +EXPORT_SYMBOL(__strncmp_zbb)
> > +
> > +.option pop
> 

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 5/7] RISC-V: fix auipc-jalr addresses in patched alternatives
  2022-11-10 16:49 ` [PATCH 5/7] RISC-V: fix auipc-jalr addresses in patched alternatives Heiko Stuebner
  2022-11-13 20:31   ` Conor Dooley
@ 2022-11-14 10:57   ` Emil Renner Berthing
  2022-11-14 11:35     ` Andrew Jones
  2022-11-15 14:28   ` Lad, Prabhakar
  2022-11-21  9:50   ` Lad, Prabhakar
  3 siblings, 1 reply; 51+ messages in thread
From: Emil Renner Berthing @ 2022-11-14 10:57 UTC (permalink / raw)
  To: Heiko Stuebner
  Cc: linux-riscv, palmer, christoph.muellner, prabhakar.csengg, conor,
	philipp.tomsich, ajones, Heiko Stuebner

On Thu, 10 Nov 2022 at 17:50, Heiko Stuebner <heiko@sntech.de> wrote:
>
> From: Heiko Stuebner <heiko.stuebner@vrull.eu>
>
> Alternatives live in a different section, so addresses used by call
> functions will point to wrong locations after the patch got applied.
>
> Similar to arm64, adjust the location to consider that offset.
>
> Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
> ---
>  arch/riscv/kernel/cpufeature.c | 79 +++++++++++++++++++++++++++++++++-
>  1 file changed, 77 insertions(+), 2 deletions(-)
>
> diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
> index 694267d1fe81..026512ca9c4c 100644
> --- a/arch/riscv/kernel/cpufeature.c
> +++ b/arch/riscv/kernel/cpufeature.c
> @@ -298,6 +298,74 @@ static u32 __init_or_module cpufeature_probe(unsigned int stage)
>         return cpu_req_feature;
>  }
>
> +#include <asm/parse_asm.h>
> +
> +DECLARE_INSN(jalr, MATCH_JALR, MASK_JALR)
> +DECLARE_INSN(auipc, MATCH_AUIPC, MASK_AUIPC)
> +
> +static inline bool is_auipc_jalr_pair(long insn1, long insn2)
> +{
> +       return is_auipc_insn(insn1) && is_jalr_insn(insn2);
> +}
> +
> +#define JALR_SIGN_MASK         BIT(I_IMM_SIGN_OPOFF - I_IMM_11_0_OPOFF)
> +#define JALR_OFFSET_MASK       I_IMM_11_0_MASK
> +#define AUIPC_OFFSET_MASK      U_IMM_31_12_MASK
> +#define AUIPC_PAD              (0x00001000)
> +#define JALR_SHIFT             I_IMM_11_0_OPOFF
> +
> +#define to_jalr_imm(offset)                                            \
> +       ((offset & I_IMM_11_0_MASK) << I_IMM_11_0_OPOFF)
> +
> +#define to_auipc_imm(offset)                                           \
> +       ((offset & JALR_SIGN_MASK) ?                                    \
> +       ((offset & AUIPC_OFFSET_MASK) + AUIPC_PAD) :    \
> +       (offset & AUIPC_OFFSET_MASK))
> +
> +static void riscv_alternative_fix_auipc_jalr(unsigned int *alt_ptr,
> +                                            unsigned int len, int patch_offset)
> +{
> +       int num_instr = len / sizeof(u32);
> +       unsigned int call[2];
> +       int i;
> +       int imm1;
> +       u32 rd1;
> +
> +       for (i = 0; i < num_instr; i++) {
> +               /* is there a further instruction? */
> +               if (i + 1 >= num_instr)
> +                       continue;

Isn't this the same as for (i = 0; i < num_instr - 1; i++) ?

> +
> +               if (!is_auipc_jalr_pair(*(alt_ptr + i), *(alt_ptr + i + 1)))
> +                       continue;
> +
> +               /* call will use ra register */
> +               rd1 = EXTRACT_RD_REG(*(alt_ptr + i));
> +               if (rd1 != 1)
> +                       continue;
> +
> +               /* get and adjust new target address */
> +               imm1 = EXTRACT_UTYPE_IMM(*(alt_ptr + i));
> +               imm1 += EXTRACT_ITYPE_IMM(*(alt_ptr + i + 1));
> +               imm1 -= patch_offset;
> +
> +               /* pick the original auipc + jalr */
> +               call[0] = *(alt_ptr + i);
> +               call[1] = *(alt_ptr + i + 1);
> +
> +               /* drop the old IMMs */
> +               call[0] &= ~(U_IMM_31_12_MASK);
> +               call[1] &= ~(I_IMM_11_0_MASK << I_IMM_11_0_OPOFF);
> +
> +               /* add the adapted IMMs */
> +               call[0] |= to_auipc_imm(imm1);
> +               call[1] |= to_jalr_imm(imm1);
> +
> +               /* patch the call place again */
> +               patch_text_nosync(alt_ptr + i * sizeof(u32), call, 8);
> +       }
> +}
> +
>  void __init_or_module riscv_cpufeature_patch_func(struct alt_entry *begin,
>                                                   struct alt_entry *end,
>                                                   unsigned int stage)
> @@ -316,8 +384,15 @@ void __init_or_module riscv_cpufeature_patch_func(struct alt_entry *begin,
>                 }
>
>                 tmp = (1U << alt->errata_id);
> -               if (cpu_req_feature & tmp)
> -                       patch_text_nosync(alt->old_ptr, alt->alt_ptr, alt->alt_len);
> +               if (cpu_req_feature & tmp) {
> +                       /* do the basic patching */
> +                       patch_text_nosync(alt->old_ptr, alt->alt_ptr,
> +                                         alt->alt_len);
> +
> +                       riscv_alternative_fix_auipc_jalr(alt->old_ptr,
> +                                                        alt->alt_len,
> +                                                        alt->old_ptr - alt->alt_ptr);

Here you're casting a void pointer to an instruction to an unsigned
int pointer, but since we enable compressed instructions this may
result in an unaligned pointer. Dereferencing such a pointer will work,
but may be slow, e.g. it may trap to M-mode to be fixed up. We already
do that in other places in arch/riscv, but I'd prefer not to add new
instances of this.

> +               }
>         }
>  }
>  #endif
> --
> 2.35.1

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 5/7] RISC-V: fix auipc-jalr addresses in patched alternatives
  2022-11-14 10:57   ` Emil Renner Berthing
@ 2022-11-14 11:35     ` Andrew Jones
  2022-11-14 11:38       ` Emil Renner Berthing
  2022-11-14 11:38       ` Heiko Stübner
  0 siblings, 2 replies; 51+ messages in thread
From: Andrew Jones @ 2022-11-14 11:35 UTC (permalink / raw)
  To: Emil Renner Berthing
  Cc: Heiko Stuebner, linux-riscv, palmer, christoph.muellner,
	prabhakar.csengg, conor, philipp.tomsich, Heiko Stuebner

On Mon, Nov 14, 2022 at 11:57:29AM +0100, Emil Renner Berthing wrote:
> On Thu, 10 Nov 2022 at 17:50, Heiko Stuebner <heiko@sntech.de> wrote:
...
> > @@ -316,8 +384,15 @@ void __init_or_module riscv_cpufeature_patch_func(struct alt_entry *begin,
> >                 }
> >
> >                 tmp = (1U << alt->errata_id);
> > -               if (cpu_req_feature & tmp)
> > -                       patch_text_nosync(alt->old_ptr, alt->alt_ptr, alt->alt_len);
> > +               if (cpu_req_feature & tmp) {
> > +                       /* do the basic patching */
> > +                       patch_text_nosync(alt->old_ptr, alt->alt_ptr,
> > +                                         alt->alt_len);
> > +
> > +                       riscv_alternative_fix_auipc_jalr(alt->old_ptr,
> > +                                                        alt->alt_len,
> > +                                                        alt->old_ptr - alt->alt_ptr);
> 
> Here you're casting a void pointer to an instruction to an unsigned
> int pointer, but since we enable compressed instructions this may
> result in an unaligned pointer. Using this pointer will work, but may
> be slow. Eg. fault to m-mode to be patched up. We already do that in
> other places in the arch/riscv, but I'd prefer not to add new
> instances of this.

Alternative instruction sequences (old and new) have compression disabled.

Thanks,
drew

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 5/7] RISC-V: fix auipc-jalr addresses in patched alternatives
  2022-11-14 11:35     ` Andrew Jones
@ 2022-11-14 11:38       ` Emil Renner Berthing
  2022-11-14 11:38       ` Heiko Stübner
  1 sibling, 0 replies; 51+ messages in thread
From: Emil Renner Berthing @ 2022-11-14 11:38 UTC (permalink / raw)
  To: Andrew Jones
  Cc: Heiko Stuebner, linux-riscv, palmer, christoph.muellner,
	prabhakar.csengg, conor, philipp.tomsich, Heiko Stuebner

On Mon, 14 Nov 2022 at 12:35, Andrew Jones <ajones@ventanamicro.com> wrote:
>
> On Mon, Nov 14, 2022 at 11:57:29AM +0100, Emil Renner Berthing wrote:
> > On Thu, 10 Nov 2022 at 17:50, Heiko Stuebner <heiko@sntech.de> wrote:
> ...
> > > @@ -316,8 +384,15 @@ void __init_or_module riscv_cpufeature_patch_func(struct alt_entry *begin,
> > >                 }
> > >
> > >                 tmp = (1U << alt->errata_id);
> > > -               if (cpu_req_feature & tmp)
> > > -                       patch_text_nosync(alt->old_ptr, alt->alt_ptr, alt->alt_len);
> > > +               if (cpu_req_feature & tmp) {
> > > +                       /* do the basic patching */
> > > +                       patch_text_nosync(alt->old_ptr, alt->alt_ptr,
> > > +                                         alt->alt_len);
> > > +
> > > +                       riscv_alternative_fix_auipc_jalr(alt->old_ptr,
> > > +                                                        alt->alt_len,
> > > +                                                        alt->old_ptr - alt->alt_ptr);
> >
> > Here you're casting a void pointer to an instruction to an unsigned
> > int pointer, but since we enable compressed instructions this may
> > result in an unaligned pointer. Using this pointer will work, but may
> > be slow. Eg. fault to m-mode to be patched up. We already do that in
> > other places in the arch/riscv, but I'd prefer not to add new
> > instances of this.
>
> Alternative instruction sequences (old and new) have compression disabled.

I see, but if the instructions before the alternative sequence end on
an unaligned address, will 16-bit NOPs be inserted to make sure
the alternative sequence is aligned?

> Thanks,
> drew

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 5/7] RISC-V: fix auipc-jalr addresses in patched alternatives
  2022-11-14 11:35     ` Andrew Jones
  2022-11-14 11:38       ` Emil Renner Berthing
@ 2022-11-14 11:38       ` Heiko Stübner
  2022-11-14 12:15         ` Andrew Jones
  2022-11-14 12:47         ` Philipp Tomsich
  1 sibling, 2 replies; 51+ messages in thread
From: Heiko Stübner @ 2022-11-14 11:38 UTC (permalink / raw)
  To: Emil Renner Berthing, Andrew Jones
  Cc: linux-riscv, palmer, christoph.muellner, prabhakar.csengg, conor,
	philipp.tomsich

On Monday, 14 November 2022, 12:35:53 CET, Andrew Jones wrote:
> On Mon, Nov 14, 2022 at 11:57:29AM +0100, Emil Renner Berthing wrote:
> > On Thu, 10 Nov 2022 at 17:50, Heiko Stuebner <heiko@sntech.de> wrote:
> ...
> > > @@ -316,8 +384,15 @@ void __init_or_module riscv_cpufeature_patch_func(struct alt_entry *begin,
> > >                 }
> > >
> > >                 tmp = (1U << alt->errata_id);
> > > -               if (cpu_req_feature & tmp)
> > > -                       patch_text_nosync(alt->old_ptr, alt->alt_ptr, alt->alt_len);
> > > +               if (cpu_req_feature & tmp) {
> > > +                       /* do the basic patching */
> > > +                       patch_text_nosync(alt->old_ptr, alt->alt_ptr,
> > > +                                         alt->alt_len);
> > > +
> > > +                       riscv_alternative_fix_auipc_jalr(alt->old_ptr,
> > > +                                                        alt->alt_len,
> > > +                                                        alt->old_ptr - alt->alt_ptr);
> > 
> > Here you're casting a void pointer to an instruction to an unsigned
> > int pointer, but since we enable compressed instructions this may
> > result in an unaligned pointer. Using this pointer will work, but may
> > be slow. Eg. fault to m-mode to be patched up. We already do that in
> > other places in the arch/riscv, but I'd prefer not to add new
> > instances of this.
> 
> Alternative instruction sequences (old and new) have compression disabled.

That was my first thought as well, but I think Emil was talking more about the
placement of the alternative block inside the running kernel.

i.e. I guess the starting point of an alternative sequence could also be unaligned.

Though I don't _yet_ see what an improvement could look like.



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 5/7] RISC-V: fix auipc-jalr addresses in patched alternatives
  2022-11-14 11:38       ` Heiko Stübner
@ 2022-11-14 12:15         ` Andrew Jones
  2022-11-14 12:29           ` Emil Renner Berthing
  2022-11-14 12:47         ` Philipp Tomsich
  1 sibling, 1 reply; 51+ messages in thread
From: Andrew Jones @ 2022-11-14 12:15 UTC (permalink / raw)
  To: Heiko Stübner
  Cc: Emil Renner Berthing, linux-riscv, palmer, christoph.muellner,
	prabhakar.csengg, conor, philipp.tomsich

On Mon, Nov 14, 2022 at 12:38:39PM +0100, Heiko Stübner wrote:
> On Monday, 14 November 2022, 12:35:53 CET, Andrew Jones wrote:
> > On Mon, Nov 14, 2022 at 11:57:29AM +0100, Emil Renner Berthing wrote:
> > > On Thu, 10 Nov 2022 at 17:50, Heiko Stuebner <heiko@sntech.de> wrote:
> > ...
> > > > @@ -316,8 +384,15 @@ void __init_or_module riscv_cpufeature_patch_func(struct alt_entry *begin,
> > > >                 }
> > > >
> > > >                 tmp = (1U << alt->errata_id);
> > > > -               if (cpu_req_feature & tmp)
> > > > -                       patch_text_nosync(alt->old_ptr, alt->alt_ptr, alt->alt_len);
> > > > +               if (cpu_req_feature & tmp) {
> > > > +                       /* do the basic patching */
> > > > +                       patch_text_nosync(alt->old_ptr, alt->alt_ptr,
> > > > +                                         alt->alt_len);
> > > > +
> > > > +                       riscv_alternative_fix_auipc_jalr(alt->old_ptr,
> > > > +                                                        alt->alt_len,
> > > > +                                                        alt->old_ptr - alt->alt_ptr);
> > > 
> > > Here you're casting a void pointer to an instruction to an unsigned
> > > int pointer, but since we enable compressed instructions this may
> > > result in an unaligned pointer. Using this pointer will work, but may
> > > be slow. Eg. fault to m-mode to be patched up. We already do that in
> > > other places in the arch/riscv, but I'd prefer not to add new
> > > instances of this.
> > 
> > Alternative instruction sequences (old and new) have compression disabled.
> 
> That was my first thought as well, but I think Emil was talking more about the
> placement of the alternative block inside the running kernel.
> 
> i.e. I guess the starting point of an alternative sequence could also be unaligned.

Oh, I see.

> 
> Though I don't _yet_ see how an improvement could look like.

I think we can patch the alternative macros to add alignment. Something
like

diff --git a/arch/riscv/include/asm/alternative-macros.h b/arch/riscv/include/asm/alternative-macros.h
index ec2f3f1b836f..3c330a9066f7 100644
--- a/arch/riscv/include/asm/alternative-macros.h
+++ b/arch/riscv/include/asm/alternative-macros.h
@@ -20,6 +20,7 @@
        ALT_ENTRY 886b, 888f, \vendor_id, \errata_id, 889f - 888f
        .popsection
        .subsection 1
+       .balign 4
 888 :
        .option push
        .option norvc
@@ -34,6 +35,7 @@
 .endm

 .macro __ALTERNATIVE_CFG old_c, new_c, vendor_id, errata_id, enable
+       .balign 4
 886 :
        .option push
        .option norvc
@@ -49,6 +51,7 @@

 .macro __ALTERNATIVE_CFG_2 old_c, new_c_1, vendor_id_1, errata_id_1, enable_1, \
                                  new_c_2, vendor_id_2, errata_id_2, enable_2
+       .balign 4
 886 :
        .option push
        .option norvc
@@ -87,6 +90,7 @@
        ALT_ENTRY("886b", "888f", __stringify(vendor_id), __stringify(errata_id), "889f - 888f") \
        ".popsection\n"                                                 \
        ".subsection 1\n"                                               \
+       ".balign 4\n"                                                   \
        "888 :\n"                                                       \
        ".option push\n"                                                \
        ".option norvc\n"                                               \
@@ -100,6 +104,7 @@
        ".endif\n"

 #define __ALTERNATIVE_CFG(old_c, new_c, vendor_id, errata_id, enable)  \
+       ".balign 4\n"                                                   \
        "886 :\n"                                                       \
        ".option push\n"                                                \
        ".option norvc\n"                                               \
@@ -116,6 +121,7 @@
                                        enable_1,                       \
                                   new_c_2, vendor_id_2, errata_id_2,   \
                                        enable_2)                       \
+       ".balign 4\n"                                                   \
        "886 :\n"                                                       \
        ".option push\n"                                                \
        ".option norvc\n"                                               \


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [PATCH 5/7] RISC-V: fix auipc-jalr addresses in patched alternatives
  2022-11-14 12:15         ` Andrew Jones
@ 2022-11-14 12:29           ` Emil Renner Berthing
  0 siblings, 0 replies; 51+ messages in thread
From: Emil Renner Berthing @ 2022-11-14 12:29 UTC (permalink / raw)
  To: Andrew Jones
  Cc: Heiko Stübner, linux-riscv, palmer, christoph.muellner,
	prabhakar.csengg, conor, philipp.tomsich

On Mon, 14 Nov 2022 at 13:15, Andrew Jones <ajones@ventanamicro.com> wrote:
>
> On Mon, Nov 14, 2022 at 12:38:39PM +0100, Heiko Stübner wrote:
> > On Monday, 14 November 2022, 12:35:53 CET, Andrew Jones wrote:
> > > On Mon, Nov 14, 2022 at 11:57:29AM +0100, Emil Renner Berthing wrote:
> > > > On Thu, 10 Nov 2022 at 17:50, Heiko Stuebner <heiko@sntech.de> wrote:
> > > ...
> > > > > @@ -316,8 +384,15 @@ void __init_or_module riscv_cpufeature_patch_func(struct alt_entry *begin,
> > > > >                 }
> > > > >
> > > > >                 tmp = (1U << alt->errata_id);
> > > > > -               if (cpu_req_feature & tmp)
> > > > > -                       patch_text_nosync(alt->old_ptr, alt->alt_ptr, alt->alt_len);
> > > > > +               if (cpu_req_feature & tmp) {
> > > > > +                       /* do the basic patching */
> > > > > +                       patch_text_nosync(alt->old_ptr, alt->alt_ptr,
> > > > > +                                         alt->alt_len);
> > > > > +
> > > > > +                       riscv_alternative_fix_auipc_jalr(alt->old_ptr,
> > > > > +                                                        alt->alt_len,
> > > > > +                                                        alt->old_ptr - alt->alt_ptr);
> > > >
> > > > Here you're casting a void pointer to an instruction to an unsigned
> > > > int pointer, but since we enable compressed instructions this may
> > > > result in an unaligned pointer. Using this pointer will work, but may
> > > > be slow. Eg. fault to m-mode to be patched up. We already do that in
> > > > other places in the arch/riscv, but I'd prefer not to add new
> > > > instances of this.
> > >
> > > Alternative instruction sequences (old and new) have compression disabled.
> >
> > That was my first thought as well, but I think Emil was talking more about the
> > placement of the alternative block inside the running kernel.
> >
> > i.e. I guess the starting point of an alternative sequence could also be unaligned.
>
> Oh, I see.
>
> >
> > Though I don't _yet_ see how an improvement could look like.
>
> I think we can patch the alternative macros to add alignment. Something
> like
>
> diff --git a/arch/riscv/include/asm/alternative-macros.h b/arch/riscv/include/asm/alternative-macros.h
> index ec2f3f1b836f..3c330a9066f7 100644
> --- a/arch/riscv/include/asm/alternative-macros.h
> +++ b/arch/riscv/include/asm/alternative-macros.h
> @@ -20,6 +20,7 @@
>         ALT_ENTRY 886b, 888f, \vendor_id, \errata_id, 889f - 888f
>         .popsection
>         .subsection 1
> +       .balign 4
>  888 :
>         .option push
>         .option norvc
> @@ -34,6 +35,7 @@
>  .endm
>
>  .macro __ALTERNATIVE_CFG old_c, new_c, vendor_id, errata_id, enable
> +       .balign 4
>  886 :
>         .option push
>         .option norvc
> @@ -49,6 +51,7 @@
>
>  .macro __ALTERNATIVE_CFG_2 old_c, new_c_1, vendor_id_1, errata_id_1, enable_1, \
>                                   new_c_2, vendor_id_2, errata_id_2, enable_2
> +       .balign 4
>  886 :
>         .option push
>         .option norvc
> @@ -87,6 +90,7 @@
>         ALT_ENTRY("886b", "888f", __stringify(vendor_id), __stringify(errata_id), "889f - 888f") \
>         ".popsection\n"                                                 \
>         ".subsection 1\n"                                               \
> +       ".balign 4\n"                                                   \
>         "888 :\n"                                                       \
>         ".option push\n"                                                \
>         ".option norvc\n"                                               \
> @@ -100,6 +104,7 @@
>         ".endif\n"
>
>  #define __ALTERNATIVE_CFG(old_c, new_c, vendor_id, errata_id, enable)  \
> +       ".balign 4\n"                                                   \
>         "886 :\n"                                                       \
>         ".option push\n"                                                \
>         ".option norvc\n"                                               \
> @@ -116,6 +121,7 @@
>                                         enable_1,                       \
>                                    new_c_2, vendor_id_2, errata_id_2,   \
>                                         enable_2)                       \
> +       ".balign 4\n"                                                   \
>         "886 :\n"                                                       \
>         ".option push\n"                                                \
>         ".option norvc\n"                                               \

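Why not just use accessors? E.g.

/* Read a 32-bit instruction as two 16-bit parcels, so that only
 * 2-byte alignment of p is required. */
unsigned int riscv_instruction_at(void *p)
{
	u16 *parcel = p;

	return (unsigned int)parcel[0] | (unsigned int)parcel[1] << 16;
}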

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 5/7] RISC-V: fix auipc-jalr addresses in patched alternatives
  2022-11-14 11:38       ` Heiko Stübner
  2022-11-14 12:15         ` Andrew Jones
@ 2022-11-14 12:47         ` Philipp Tomsich
  1 sibling, 0 replies; 51+ messages in thread
From: Philipp Tomsich @ 2022-11-14 12:47 UTC (permalink / raw)
  To: Heiko Stübner
  Cc: Emil Renner Berthing, Andrew Jones, linux-riscv, palmer,
	christoph.muellner, prabhakar.csengg, conor

On Mon, 14 Nov 2022 at 12:38, Heiko Stübner <heiko@sntech.de> wrote:
>
> On Monday, 14 November 2022, 12:35:53 CET, Andrew Jones wrote:
> > On Mon, Nov 14, 2022 at 11:57:29AM +0100, Emil Renner Berthing wrote:
> > > On Thu, 10 Nov 2022 at 17:50, Heiko Stuebner <heiko@sntech.de> wrote:
> > ...
> > > > @@ -316,8 +384,15 @@ void __init_or_module riscv_cpufeature_patch_func(struct alt_entry *begin,
> > > >                 }
> > > >
> > > >                 tmp = (1U << alt->errata_id);
> > > > -               if (cpu_req_feature & tmp)
> > > > -                       patch_text_nosync(alt->old_ptr, alt->alt_ptr, alt->alt_len);
> > > > +               if (cpu_req_feature & tmp) {
> > > > +                       /* do the basic patching */
> > > > +                       patch_text_nosync(alt->old_ptr, alt->alt_ptr,
> > > > +                                         alt->alt_len);
> > > > +
> > > > +                       riscv_alternative_fix_auipc_jalr(alt->old_ptr,
> > > > +                                                        alt->alt_len,
> > > > +                                                        alt->old_ptr - alt->alt_ptr);
> > >
> > > Here you're casting a void pointer to an instruction to an unsigned
> > > int pointer, but since we enable compressed instructions this may
> > > result in an unaligned pointer. Using this pointer will work, but may
> > > be slow. Eg. fault to m-mode to be patched up. We already do that in
> > > other places in the arch/riscv, but I'd prefer not to add new
> > > instances of this.
> >
> > Alternative instruction sequences (old and new) have compression disabled.
>
> That was my first thought as well, but I think Emil was talking more about the
> placement of the alternative block inside the running kernel.
>
> i.e. I guess the starting point of an alternative sequence could also be unaligned.

Indeed. Instruction alignment is only guaranteed to be 16bits, even
for larger instructions.

> Though I don't _yet_ see how an improvement could look like.

The general strategy would be multiple smaller accesses that are then
stitched together.
This will require (for a 32-bit opcode) at least 2 loads, 1 shift and 1
OR, for a critical path of 3 instructions.

Given that we are running on RV64, you can handle up to 48bit by
aligning down, performing a 64bit load and (if needed) shifting.
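
A minimal sketch of the two-load variant described above, assuming the
instruction is stored as little-endian 16-bit parcels; read_insn32() is a
hypothetical name used only for illustration, not an existing kernel helper:

```c
#include <stdint.h>

/*
 * Hypothetical helper, not an existing kernel function: read a 32-bit
 * instruction that is only guaranteed to be 2-byte aligned by loading
 * its two 16-bit parcels and stitching them together.
 */
static uint32_t read_insn32(const uint16_t *parcel)
{
	/* The low parcel sits at the lower address: 2 loads, 1 shift, 1 OR. */
	return (uint32_t)parcel[0] | ((uint32_t)parcel[1] << 16);
}
```

Because every access is a naturally aligned 16-bit load, this should not
depend on hardware or firmware support for misaligned accesses.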

Finally, the profiles look like they will give us support for misaligned
loads (see https://github.com/riscv/riscv-profiles/blob/main/profiles.adoc#rva22-profiles,
where the Zicclsm extension is mandated).

Philipp.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 5/7] RISC-V: fix auipc-jalr addresses in patched alternatives
  2022-11-10 16:49 ` [PATCH 5/7] RISC-V: fix auipc-jalr addresses in patched alternatives Heiko Stuebner
  2022-11-13 20:31   ` Conor Dooley
  2022-11-14 10:57   ` Emil Renner Berthing
@ 2022-11-15 14:28   ` Lad, Prabhakar
  2022-11-17 11:51     ` Heiko Stübner
  2022-11-21  9:50   ` Lad, Prabhakar
  3 siblings, 1 reply; 51+ messages in thread
From: Lad, Prabhakar @ 2022-11-15 14:28 UTC (permalink / raw)
  To: Heiko Stuebner
  Cc: linux-riscv, palmer, christoph.muellner, conor, philipp.tomsich,
	ajones, emil.renner.berthing, Heiko Stuebner

Hi Heiko,

Thank you for the patch.

On Thu, Nov 10, 2022 at 4:50 PM Heiko Stuebner <heiko@sntech.de> wrote:
>
> From: Heiko Stuebner <heiko.stuebner@vrull.eu>
>
> Alternatives live in a different section, so addresses used by call
> functions will point to wrong locations after the patch got applied.
>
> Similar to arm64, adjust the location to consider that offset.
>
> Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
> ---
>  arch/riscv/kernel/cpufeature.c | 79 +++++++++++++++++++++++++++++++++-
>  1 file changed, 77 insertions(+), 2 deletions(-)
>
> diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
> index 694267d1fe81..026512ca9c4c 100644
> --- a/arch/riscv/kernel/cpufeature.c
> +++ b/arch/riscv/kernel/cpufeature.c
> @@ -298,6 +298,74 @@ static u32 __init_or_module cpufeature_probe(unsigned int stage)
>         return cpu_req_feature;
>  }
>
> +#include <asm/parse_asm.h>
> +
> +DECLARE_INSN(jalr, MATCH_JALR, MASK_JALR)
> +DECLARE_INSN(auipc, MATCH_AUIPC, MASK_AUIPC)
> +
> +static inline bool is_auipc_jalr_pair(long insn1, long insn2)
> +{
> +       return is_auipc_insn(insn1) && is_jalr_insn(insn2);
> +}
> +
> +#define JALR_SIGN_MASK         BIT(I_IMM_SIGN_OPOFF - I_IMM_11_0_OPOFF)
> +#define JALR_OFFSET_MASK       I_IMM_11_0_MASK
> +#define AUIPC_OFFSET_MASK      U_IMM_31_12_MASK
> +#define AUIPC_PAD              (0x00001000)
> +#define JALR_SHIFT             I_IMM_11_0_OPOFF
> +
> +#define to_jalr_imm(offset)                                            \
> +       ((offset & I_IMM_11_0_MASK) << I_IMM_11_0_OPOFF)
> +
> +#define to_auipc_imm(offset)                                           \
> +       ((offset & JALR_SIGN_MASK) ?                                    \
> +       ((offset & AUIPC_OFFSET_MASK) + AUIPC_PAD) :    \
> +       (offset & AUIPC_OFFSET_MASK))
> +
> +static void riscv_alternative_fix_auipc_jalr(unsigned int *alt_ptr,
> +                                            unsigned int len, int patch_offset)
> +{

I have yet to test this with my ASM code, but could we move this
to [0] so that other errata can make use of it too?

[0] arch/riscv/kernel/patch.c

Cheers,
Prabhakar

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 5/7] RISC-V: fix auipc-jalr addresses in patched alternatives
  2022-11-15 14:28   ` Lad, Prabhakar
@ 2022-11-17 11:51     ` Heiko Stübner
  0 siblings, 0 replies; 51+ messages in thread
From: Heiko Stübner @ 2022-11-17 11:51 UTC (permalink / raw)
  To: Lad, Prabhakar
  Cc: linux-riscv, palmer, christoph.muellner, conor, philipp.tomsich,
	ajones, emil.renner.berthing

On Tuesday, 15 November 2022, 15:28:27 CET, Lad, Prabhakar wrote:
> Hi Heiko,
> 
> Thank you for the patch.
> 
> On Thu, Nov 10, 2022 at 4:50 PM Heiko Stuebner <heiko@sntech.de> wrote:
> >
> > From: Heiko Stuebner <heiko.stuebner@vrull.eu>
> >
> > Alternatives live in a different section, so addresses used by call
> > functions will point to wrong locations after the patch got applied.
> >
> > Similar to arm64, adjust the location to consider that offset.
> >
> > Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
> > ---
> >  arch/riscv/kernel/cpufeature.c | 79 +++++++++++++++++++++++++++++++++-
> >  1 file changed, 77 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
> > index 694267d1fe81..026512ca9c4c 100644
> > --- a/arch/riscv/kernel/cpufeature.c
> > +++ b/arch/riscv/kernel/cpufeature.c
> > @@ -298,6 +298,74 @@ static u32 __init_or_module cpufeature_probe(unsigned int stage)
> >         return cpu_req_feature;
> >  }
> >
> > +#include <asm/parse_asm.h>
> > +
> > +DECLARE_INSN(jalr, MATCH_JALR, MASK_JALR)
> > +DECLARE_INSN(auipc, MATCH_AUIPC, MASK_AUIPC)
> > +
> > +static inline bool is_auipc_jalr_pair(long insn1, long insn2)
> > +{
> > +       return is_auipc_insn(insn1) && is_jalr_insn(insn2);
> > +}
> > +
> > +#define JALR_SIGN_MASK         BIT(I_IMM_SIGN_OPOFF - I_IMM_11_0_OPOFF)
> > +#define JALR_OFFSET_MASK       I_IMM_11_0_MASK
> > +#define AUIPC_OFFSET_MASK      U_IMM_31_12_MASK
> > +#define AUIPC_PAD              (0x00001000)
> > +#define JALR_SHIFT             I_IMM_11_0_OPOFF
> > +
> > +#define to_jalr_imm(offset)                                            \
> > +       ((offset & I_IMM_11_0_MASK) << I_IMM_11_0_OPOFF)
> > +
> > +#define to_auipc_imm(offset)                                           \
> > +       ((offset & JALR_SIGN_MASK) ?                                    \
> > +       ((offset & AUIPC_OFFSET_MASK) + AUIPC_PAD) :    \
> > +       (offset & AUIPC_OFFSET_MASK))
> > +
> > +static void riscv_alternative_fix_auipc_jalr(unsigned int *alt_ptr,
> > +                                            unsigned int len, int patch_offset)
> > +{
> 
> I am yet to test this with my ASM code yet, but maybe can we move this
> to [0] so that other erratas can make use of it too?
> 
> [0] arch/riscv/kernel/patch.c

yeah, that sounds like a very good plan.

I also want to make the to_foo_imm macros shared.
I.e. right now they're just duplicated from the ftrace patching code.
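
As a rough illustration of the arithmetic those macros perform (a sketch
only; struct call_imm and split_call_offset() are hypothetical names, not
the ftrace code or a proposed shared API): a 32-bit pc-relative offset is
split into the upper 20 bits for auipc and a signed 12-bit part for jalr,
and the upper part is bumped by 0x1000 whenever bit 11 is set, because jalr
sign-extends its immediate.

```c
#include <stdint.h>

/* Hypothetical container for the two immediates of an auipc + jalr pair. */
struct call_imm {
	uint32_t hi20;	/* auipc immediate, already positioned in bits 31:12 */
	int32_t lo12;	/* jalr immediate, sign-extended, in [-2048, 2047] */
};

/* Hypothetical helper: split a pc-relative offset for auipc + jalr. */
static struct call_imm split_call_offset(int32_t offset)
{
	struct call_imm imm;
	uint32_t uoff = (uint32_t)offset;

	/* Adding 0x800 rounds up exactly when bit 11 of the offset is set. */
	imm.hi20 = (uoff + 0x800u) & 0xfffff000u;
	/* The remainder is what jalr adds back (possibly negative). */
	imm.lo12 = (int32_t)(uoff - imm.hi20);
	return imm;
}
```

For an offset of 0x1800 this gives hi20 = 0x2000 and lo12 = -0x800, which
add up to the original offset again.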

Heiko



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 5/7] RISC-V: fix auipc-jalr addresses in patched alternatives
  2022-11-10 16:49 ` [PATCH 5/7] RISC-V: fix auipc-jalr addresses in patched alternatives Heiko Stuebner
                     ` (2 preceding siblings ...)
  2022-11-15 14:28   ` Lad, Prabhakar
@ 2022-11-21  9:50   ` Lad, Prabhakar
  2022-11-21 11:27     ` Heiko Stübner
  3 siblings, 1 reply; 51+ messages in thread
From: Lad, Prabhakar @ 2022-11-21  9:50 UTC (permalink / raw)
  To: Heiko Stuebner
  Cc: linux-riscv, palmer, christoph.muellner, conor, philipp.tomsich,
	ajones, emil.renner.berthing, Heiko Stuebner

Hi Heiko,

On Thu, Nov 10, 2022 at 4:50 PM Heiko Stuebner <heiko@sntech.de> wrote:
>
> From: Heiko Stuebner <heiko.stuebner@vrull.eu>
>
> Alternatives live in a different section, so addresses used by call
> functions will point to wrong locations after the patch got applied.
>
> Similar to arm64, adjust the location to consider that offset.
>
> Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
> ---
>  arch/riscv/kernel/cpufeature.c | 79 +++++++++++++++++++++++++++++++++-
>  1 file changed, 77 insertions(+), 2 deletions(-)
>
> diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
> index 694267d1fe81..026512ca9c4c 100644
> --- a/arch/riscv/kernel/cpufeature.c
> +++ b/arch/riscv/kernel/cpufeature.c
> @@ -298,6 +298,74 @@ static u32 __init_or_module cpufeature_probe(unsigned int stage)
>         return cpu_req_feature;
>  }
>
> +#include <asm/parse_asm.h>
> +
> +DECLARE_INSN(jalr, MATCH_JALR, MASK_JALR)
> +DECLARE_INSN(auipc, MATCH_AUIPC, MASK_AUIPC)
> +
> +static inline bool is_auipc_jalr_pair(long insn1, long insn2)
> +{
> +       return is_auipc_insn(insn1) && is_jalr_insn(insn2);
> +}
> +
> +#define JALR_SIGN_MASK         BIT(I_IMM_SIGN_OPOFF - I_IMM_11_0_OPOFF)
> +#define JALR_OFFSET_MASK       I_IMM_11_0_MASK
> +#define AUIPC_OFFSET_MASK      U_IMM_31_12_MASK
> +#define AUIPC_PAD              (0x00001000)
> +#define JALR_SHIFT             I_IMM_11_0_OPOFF
> +
> +#define to_jalr_imm(offset)                                            \
> +       ((offset & I_IMM_11_0_MASK) << I_IMM_11_0_OPOFF)
> +
> +#define to_auipc_imm(offset)                                           \
> +       ((offset & JALR_SIGN_MASK) ?                                    \
> +       ((offset & AUIPC_OFFSET_MASK) + AUIPC_PAD) :    \
> +       (offset & AUIPC_OFFSET_MASK))
> +
> +static void riscv_alternative_fix_auipc_jalr(unsigned int *alt_ptr,
> +                                            unsigned int len, int patch_offset)
> +{
> +       int num_instr = len / sizeof(u32);
> +       unsigned int call[2];
> +       int i;
> +       int imm1;
> +       u32 rd1;
> +
> +       for (i = 0; i < num_instr; i++) {
> +               /* is there a further instruction? */
> +               if (i + 1 >= num_instr)
> +                       continue;
> +
> +               if (!is_auipc_jalr_pair(*(alt_ptr + i), *(alt_ptr + i + 1)))
> +                       continue;
> +
> +               /* call will use ra register */
> +               rd1 = EXTRACT_RD_REG(*(alt_ptr + i));
> +               if (rd1 != 1)
> +                       continue;
> +
> +               /* get and adjust new target address */
> +               imm1 = EXTRACT_UTYPE_IMM(*(alt_ptr + i));
> +               imm1 += EXTRACT_ITYPE_IMM(*(alt_ptr + i + 1));
> +               imm1 -= patch_offset;
> +
> +               /* pick the original auipc + jalr */
> +               call[0] = *(alt_ptr + i);
> +               call[1] = *(alt_ptr + i + 1);
> +
> +               /* drop the old IMMs */
> +               call[0] &= ~(U_IMM_31_12_MASK);
> +               call[1] &= ~(I_IMM_11_0_MASK << I_IMM_11_0_OPOFF);
> +
> +               /* add the adapted IMMs */
> +               call[0] |= to_auipc_imm(imm1);
> +               call[1] |= to_jalr_imm(imm1);
> +
> +               /* patch the call place again */
> +               patch_text_nosync(alt_ptr + i * sizeof(u32), call, 8);
> +       }
> +}
> +

I have the below assembly code which I have tested without the
alternatives for the RZ/Five CMO,

#define ALT_CMO_OP(_op, _start, _size, _cachesize, _dir, _ops)        \
asm volatile(".option push\n\t\n\t"                    \
         ".option norvc\n\t"                    \
         ".option norelax\n\t"                    \
         "addi sp,sp,-16\n\t"                    \
         "sd    s0,0(sp)\n\t"                    \
         "sd    ra,8(sp)\n\t"                    \
         "addi    s0,sp,16\n\t"                    \
         "mv a4,%6\n\t"                        \
         "mv a3,%5\n\t"                        \
         "mv a2,%4\n\t"                        \
         "mv a1,%3\n\t"                        \
         "mv a0,%0\n\t"                        \
         "call rzfive_cmo\n\t"                    \
         "ld    ra,8(sp)\n\t"                    \
         "ld    s0,0(sp)\n\t"                    \
         "addi    sp,sp,16\n\t"                    \
         ".option pop\n\t"                        \
         : : "r"(_cachesize),                    \
         "r"((unsigned long)(_start) & ~((_cachesize) - 1UL)),    \
         "r"((unsigned long)(_start) + (_size)),            \
         "r"((unsigned long) (_start)),                \
         "r"((unsigned long) (_size)),                \
         "r"((unsigned long) (_dir)),                \
         "r"((unsigned long) (_ops))                \
         : "a0", "a1", "a2", "a3", "a4", "memory")

Now when I integrate this with ALTERNATIVE_2() as below:

#define ALT_CMO_OP(_op, _start, _size, _cachesize, _dir, _ops)        \
asm volatile(ALTERNATIVE_2(                        \
    __nops(14),                            \
    "mv a0, %1\n\t"                            \
    "j 2f\n\t"                            \
    "3:\n\t"                            \
    "cbo." __stringify(_op) " (a0)\n\t"                \
    "add a0, a0, %0\n\t"                        \
    "2:\n\t"                            \
    "bltu a0, %2, 3b\n\t"                        \
    __nops(8), 0, CPUFEATURE_ZICBOM, CONFIG_RISCV_ISA_ZICBOM,    \
    ".option push\n\t\n\t"                        \
    ".option norvc\n\t"                        \
    ".option norelax\n\t"                        \
    "addi sp,sp,-16\n\t"                        \
    "sd    s0,0(sp)\n\t"                        \
    "sd    ra,8(sp)\n\t"                        \
    "addi    s0,sp,16\n\t"                        \
    "mv a4,%6\n\t"                            \
    "mv a3,%5\n\t"                            \
    "mv a2,%4\n\t"                            \
    "mv a1,%3\n\t"                            \
    "mv a0,%0\n\t"                            \
    "call rzfive_cmo\n\t"                \
    "ld    ra,8(sp)\n\t"                        \
    "ld    s0,0(sp)\n\t"                        \
    "addi    sp,sp,16\n\t"                        \
    ".option pop\n\t"                        \
    , ANDESTECH_VENDOR_ID,                        \
            ERRATA_ANDESTECH_NO_IOCP, CONFIG_ERRATA_RZFIVE_CMO)    \
    : : "r"(_cachesize),                        \
    "r"((unsigned long)(_start) & ~((_cachesize) - 1UL)),    \
    "r"((unsigned long)(_start) + (_size)),            \
    "r"((unsigned long) (_start)),                \
    "r"((unsigned long) (_size)),                \
    "r"((unsigned long) (_dir)),                \
    "r"((unsigned long) (_ops))                \
    : "a0", "a1", "a2", "a3", "a4", "memory")

I am seeing a kernel panic with this change. Looking at the
riscv_alternative_fix_auipc_jalr() implementation, it seems to assume
that the rest of the alternative options are calls too. Is my
understanding correct here?

Do you think this is the correct approach in my case?

Note, I wanted to test with ALTERNATIVE_2() first to make sure
everything is okay and then later test my ALTERNATIVE_3()
implementation.

Cheers,
Prabhakar


* Re: [PATCH 5/7] RISC-V: fix auipc-jalr addresses in patched alternatives
  2022-11-21  9:50   ` Lad, Prabhakar
@ 2022-11-21 11:27     ` Heiko Stübner
  2022-11-21 15:06       ` Lad, Prabhakar
  0 siblings, 1 reply; 51+ messages in thread
From: Heiko Stübner @ 2022-11-21 11:27 UTC (permalink / raw)
  To: Lad, Prabhakar
  Cc: linux-riscv, palmer, christoph.muellner, conor, philipp.tomsich,
	ajones, emil.renner.berthing

Hi,

Am Montag, 21. November 2022, 10:50:09 CET schrieb Lad, Prabhakar:
> On Thu, Nov 10, 2022 at 4:50 PM Heiko Stuebner <heiko@sntech.de> wrote:
> >
> > From: Heiko Stuebner <heiko.stuebner@vrull.eu>
> >
> > Alternatives live in a different section, so addresses used by call
> > functions will point to wrong locations after the patch got applied.
> >
> > Similar to arm64, adjust the location to consider that offset.
> >
> > Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
> > ---

[...]

> I have the below assembly code which I have tested without the
> alternatives for the RZ/Five CMO,
> 
> #define ALT_CMO_OP(_op, _start, _size, _cachesize, _dir, _ops)        \
> asm volatile(".option push\n\t\n\t"                    \
>          ".option norvc\n\t"                    \
>          ".option norelax\n\t"                    \
>          "addi sp,sp,-16\n\t"                    \
>          "sd    s0,0(sp)\n\t"                    \
>          "sd    ra,8(sp)\n\t"                    \
>          "addi    s0,sp,16\n\t"                    \
>          "mv a4,%6\n\t"                        \
>          "mv a3,%5\n\t"                        \
>          "mv a2,%4\n\t"                        \
>          "mv a1,%3\n\t"                        \
>          "mv a0,%0\n\t"                        \
>          "call rzfive_cmo\n\t"                    \
>          "ld    ra,8(sp)\n\t"                    \
>          "ld    s0,0(sp)\n\t"                    \
>          "addi    sp,sp,16\n\t"                    \
>          ".option pop\n\t"                        \
>          : : "r"(_cachesize),                    \
>          "r"((unsigned long)(_start) & ~((_cachesize) - 1UL)),    \
>          "r"((unsigned long)(_start) + (_size)),            \
>          "r"((unsigned long) (_start)),                \
>          "r"((unsigned long) (_size)),                \
>          "r"((unsigned long) (_dir)),                \
>          "r"((unsigned long) (_ops))                \
>          : "a0", "a1", "a2", "a3", "a4", "memory")
>
> Now when integrate this with ALTERNATIVE_2() as below,
> 
> #define ALT_CMO_OP(_op, _start, _size, _cachesize, _dir, _ops)        \
> asm volatile(ALTERNATIVE_2(                        \
>     __nops(14),                            \
>     "mv a0, %1\n\t"                            \
>     "j 2f\n\t"                            \
>     "3:\n\t"                            \
>     "cbo." __stringify(_op) " (a0)\n\t"                \
>     "add a0, a0, %0\n\t"                        \
>     "2:\n\t"                            \
>     "bltu a0, %2, 3b\n\t"                        \
>     __nops(8), 0, CPUFEATURE_ZICBOM, CONFIG_RISCV_ISA_ZICBOM,    \
>     ".option push\n\t\n\t"                        \
>     ".option norvc\n\t"                        \
>     ".option norelax\n\t"                        \
>     "addi sp,sp,-16\n\t"                        \
>     "sd    s0,0(sp)\n\t"                        \
>     "sd    ra,8(sp)\n\t"                        \
>     "addi    s0,sp,16\n\t"                        \
>     "mv a4,%6\n\t"                            \
>     "mv a3,%5\n\t"                            \
>     "mv a2,%4\n\t"                            \
>     "mv a1,%3\n\t"                            \
>     "mv a0,%0\n\t"                            \
>     "call rzfive_cmo\n\t"                \
>     "ld    ra,8(sp)\n\t"                        \
>     "ld    s0,0(sp)\n\t"                        \
>     "addi    sp,sp,16\n\t"                        \
>     ".option pop\n\t"                        \
>     , ANDESTECH_VENDOR_ID,                        \
>             ERRATA_ANDESTECH_NO_IOCP, CONFIG_ERRATA_RZFIVE_CMO)    \
>     : : "r"(_cachesize),                        \
>     "r"((unsigned long)(_start) & ~((_cachesize) - 1UL)),    \
>     "r"((unsigned long)(_start) + (_size)),            \
>     "r"((unsigned long) (_start)),                \
>     "r"((unsigned long) (_size)),                \
>     "r"((unsigned long) (_dir)),                \
>     "r"((unsigned long) (_ops))                \
>     : "a0", "a1", "a2", "a3", "a4", "memory")
> 
> I am seeing kernel panic with this change. Looking at the
> riscv_alternative_fix_auipc_jalr() implementation it assumes the rest
> of the alternative options are calls too. Is my understanding correct
> here?

The loop walks through the instructions after the location got patched and
checks if an instruction is an auipc and the next one is a jalr and only then
adjusts the address accordingly.

So it _should_ leave all other (non auipc+jalr) instructions alone.
(hopefully)
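
To make that check concrete, here is a small standalone sketch (plain
userspace C, not the parse_asm.h helpers). The opcode constants follow
the RISC-V base ISA encoding; the names insn_is_auipc(), insn_is_jalr(),
insn_rd() and is_call_pair() are invented for this example.

```c
#include <stdbool.h>
#include <stdint.h>

#define OPC_MASK	0x0000007fu	/* bits 6:0, major opcode */
#define OPC_AUIPC	0x00000017u
#define JALR_MASK	0x0000707fu	/* opcode plus funct3 */
#define JALR_MATCH	0x00000067u	/* opcode 0x67, funct3 0 */

static bool insn_is_auipc(uint32_t insn)
{
	return (insn & OPC_MASK) == OPC_AUIPC;
}

static bool insn_is_jalr(uint32_t insn)
{
	return (insn & JALR_MASK) == JALR_MATCH;
}

/* Destination register number, bits 11:7 of the instruction word. */
static uint32_t insn_rd(uint32_t insn)
{
	return (insn >> 7) & 0x1fu;
}

/* True for "auipc ra, ..." immediately followed by a jalr. */
static bool is_call_pair(uint32_t first, uint32_t second)
{
	return insn_is_auipc(first) && insn_rd(first) == 1 &&
	       insn_is_jalr(second);
}
```

With words like 0x00000097 followed by 0x000080e7 (auipc ra + jalr ra)
the pair matches, while a nop (0x13) or an auipc into another register
is skipped.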


> Do you think this is the correct approach in my case?

It does look correct at first glance.

As I had that passing thought, are you actually calling
	riscv_alternative_fix_auipc_jalr()
from your errata/.../foo.c after doing the patching?

I.e. with the current patchset, that function is only called from the
cpufeature part, but for example not from the other patching locations.
[and a future revision should probably change that :-) ]


After making sure that function actually runs, the next thing you could try
is to have both the "original" code and the patch be identical, i.e.
replace the cbo* part with your code as well and then just output the
instructions via printk to check what the addresses do in both.

After riscv_alternative_fix_auipc_jalr() has run, both code variants
should be identical when the same code is used in both areas.
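
A rough sketch of such a comparison, written as standalone C so it can
be tried outside the kernel (in the kernel the reporting would go
through printk/pr_err instead; first_insn_mismatch() is a hypothetical
helper name, not an existing function):

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Compare two instruction buffers word by word.
 * len is given in bytes. Returns the index of the first differing
 * 32-bit word, or -1 if the compared words are all identical.
 */
static long first_insn_mismatch(const uint32_t *a, const uint32_t *b,
				size_t len)
{
	size_t num_instr = len / sizeof(uint32_t);
	size_t i;

	for (i = 0; i < num_instr; i++) {
		if (a[i] != b[i])
			return (long)i;
	}
	return -1;
}
```

Called on the original area and on the patched area after the fixup,
a return value of -1 would mean both variants ended up the same.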


> Note, I wanted to test with ALTERNATIVE_2() first to make sure
> everything is okay and then later test my ALTERNATIVE_3()
> implementation.

sounds like a very sensible idea to use the existing macros
first for verification :-)


Heiko




* Re: [PATCH 5/7] RISC-V: fix auipc-jalr addresses in patched alternatives
  2022-11-21 11:27     ` Heiko Stübner
@ 2022-11-21 15:06       ` Lad, Prabhakar
  2022-11-21 21:31         ` Lad, Prabhakar
  0 siblings, 1 reply; 51+ messages in thread
From: Lad, Prabhakar @ 2022-11-21 15:06 UTC (permalink / raw)
  To: Heiko Stübner
  Cc: linux-riscv, palmer, christoph.muellner, conor, philipp.tomsich,
	ajones, emil.renner.berthing

Hi Heiko,

On Mon, Nov 21, 2022 at 11:27 AM Heiko Stübner <heiko@sntech.de> wrote:
>
> Hi,
>
> Am Montag, 21. November 2022, 10:50:09 CET schrieb Lad, Prabhakar:
> > On Thu, Nov 10, 2022 at 4:50 PM Heiko Stuebner <heiko@sntech.de> wrote:
> > >
> > > From: Heiko Stuebner <heiko.stuebner@vrull.eu>
> > >
> > > Alternatives live in a different section, so addresses used by call
> > > functions will point to wrong locations after the patch got applied.
> > >
> > > Similar to arm64, adjust the location to consider that offset.
> > >
> > > Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
> > > ---
>
> [...]
>
> > I have the below assembly code which I have tested without the
> > alternatives for the RZ/Five CMO,
> >
> > #define ALT_CMO_OP(_op, _start, _size, _cachesize, _dir, _ops)        \
> > asm volatile(".option push\n\t\n\t"                    \
> >          ".option norvc\n\t"                    \
> >          ".option norelax\n\t"                    \
> >          "addi sp,sp,-16\n\t"                    \
> >          "sd    s0,0(sp)\n\t"                    \
> >          "sd    ra,8(sp)\n\t"                    \
> >          "addi    s0,sp,16\n\t"                    \
> >          "mv a4,%6\n\t"                        \
> >          "mv a3,%5\n\t"                        \
> >          "mv a2,%4\n\t"                        \
> >          "mv a1,%3\n\t"                        \
> >          "mv a0,%0\n\t"                        \
> >          "call rzfive_cmo\n\t"                    \
> >          "ld    ra,8(sp)\n\t"                    \
> >          "ld    s0,0(sp)\n\t"                    \
> >          "addi    sp,sp,16\n\t"                    \
> >          ".option pop\n\t"                        \
> >          : : "r"(_cachesize),                    \
> >          "r"((unsigned long)(_start) & ~((_cachesize) - 1UL)),    \
> >          "r"((unsigned long)(_start) + (_size)),            \
> >          "r"((unsigned long) (_start)),                \
> >          "r"((unsigned long) (_size)),                \
> >          "r"((unsigned long) (_dir)),                \
> >          "r"((unsigned long) (_ops))                \
> >          : "a0", "a1", "a2", "a3", "a4", "memory")
> >
> > Now when integrate this with ALTERNATIVE_2() as below,
> >
> > #define ALT_CMO_OP(_op, _start, _size, _cachesize, _dir, _ops)        \
> > asm volatile(ALTERNATIVE_2(                        \
> >     __nops(14),                            \
> >     "mv a0, %1\n\t"                            \
> >     "j 2f\n\t"                            \
> >     "3:\n\t"                            \
> >     "cbo." __stringify(_op) " (a0)\n\t"                \
> >     "add a0, a0, %0\n\t"                        \
> >     "2:\n\t"                            \
> >     "bltu a0, %2, 3b\n\t"                        \
> >     __nops(8), 0, CPUFEATURE_ZICBOM, CONFIG_RISCV_ISA_ZICBOM,    \
> >     ".option push\n\t\n\t"                        \
> >     ".option norvc\n\t"                        \
> >     ".option norelax\n\t"                        \
> >     "addi sp,sp,-16\n\t"                        \
> >     "sd    s0,0(sp)\n\t"                        \
> >     "sd    ra,8(sp)\n\t"                        \
> >     "addi    s0,sp,16\n\t"                        \
> >     "mv a4,%6\n\t"                            \
> >     "mv a3,%5\n\t"                            \
> >     "mv a2,%4\n\t"                            \
> >     "mv a1,%3\n\t"                            \
> >     "mv a0,%0\n\t"                            \
> >     "call rzfive_cmo\n\t"                \
> >     "ld    ra,8(sp)\n\t"                        \
> >     "ld    s0,0(sp)\n\t"                        \
> >     "addi    sp,sp,16\n\t"                        \
> >     ".option pop\n\t"                        \
> >     , ANDESTECH_VENDOR_ID,                        \
> >             ERRATA_ANDESTECH_NO_IOCP, CONFIG_ERRATA_RZFIVE_CMO)    \
> >     : : "r"(_cachesize),                        \
> >     "r"((unsigned long)(_start) & ~((_cachesize) - 1UL)),    \
> >     "r"((unsigned long)(_start) + (_size)),            \
> >     "r"((unsigned long) (_start)),                \
> >     "r"((unsigned long) (_size)),                \
> >     "r"((unsigned long) (_dir)),                \
> >     "r"((unsigned long) (_ops))                \
> >     : "a0", "a1", "a2", "a3", "a4", "memory")
> >
> > I am seeing kernel panic with this change. Looking at the
> > riscv_alternative_fix_auipc_jalr() implementation it assumes the rest
> > of the alternative options are calls too. Is my understanding correct
> > here?
>
> The loop walks through the instructions after the location got patched and
> checks if an instruction is an auipc and the next one is a jalr and only then
> adjusts the address accordingly.
>
Ok so my understanding was wrong here.

> So it _should_ leave all other (non auipc+jalr) instructions alone.
> (hopefully)
>
Agreed.

>
> > Do you think this is the correct approach in my case?
>
> It does look correct on first glance.
>
\o/

> As I had that passing thought, are you actually calling
>         riscv_alternative_fix_auipc_jalr()
> from your errata/.../foo.c after doing the patching?
>
> I.e. with the current patchset, that function is only called from the
> cpufeature part, but for example not from the other patching locations.
> [and a future revision should probably change that :-) ]
>
>
I have made a local copy of riscv_alternative_fix_auipc_jalr() and am
calling it after patch_text_nosync(), following your patch for the
str functions.

> After making sure that function actually runs, the next thing you could try
> is to have both the "original" code and the patch be identical, i.e.
> replace the cbo* part with your code as well and then just output the
> instructions via printk to check what the addresses do in both.
>
> After riscv_alternative_fix_auipc_jalr() ran then both code variants
> should be identical when using the same code in both areas.
>
So I have added debug prints, as below, to compare the instructions
before and after patching:

static void riscv_alternative_print_inst(unsigned int *alt_ptr,
                     unsigned int len)
{
    int num_instr = len / sizeof(u32);
    int i;

    for (i = 0; i < num_instr; i++)
        pr_err("%s instruction: 0x%x\n", __func__, *(alt_ptr + i));

}

void __init_or_module andes_errata_patch_func(struct alt_entry *begin,
                          struct alt_entry *end,
                          unsigned long archid, unsigned long impid,
                          unsigned int stage)
{
....
    if (cpu_req_errata & tmp) {
        pr_err("stage: %x -> %px--> %x %x %x\n", stage, alt, tmp,
               cpu_req_errata, alt->errata_id);
        pr_err("old:%ps alt:%ps len:%lx\n", alt->old_ptr,
               alt->alt_ptr, alt->alt_len);
        pr_err("Print old start\n");
        riscv_alternative_print_inst(alt->old_ptr, alt->alt_len);
        pr_err("Print old end\n");
        patch_text_nosync(alt->old_ptr, alt->alt_ptr, alt->alt_len);

        riscv_alternative_fix_auipc_jalr(alt->old_ptr, alt->alt_len,
                        alt->old_ptr - alt->alt_ptr);
        pr_err("Print patch start\n");
        riscv_alternative_print_inst(alt->alt_ptr, alt->alt_len);
        pr_err("Print patch end\n");
    }
.....
}

Below is the log:
[    0.000000] riscv_alternative_print_inst instruction: 0x13
[    0.000000] riscv_alternative_print_inst instruction: 0x13
[    0.000000] riscv_alternative_print_inst instruction: 0x13
[    0.000000] riscv_alternative_print_inst instruction: 0x13
[    0.000000] riscv_alternative_print_inst instruction: 0x13
[    0.000000] riscv_alternative_print_inst instruction: 0x13
[    0.000000] riscv_alternative_print_inst instruction: 0x13
[    0.000000] riscv_alternative_print_inst instruction: 0x13
[    0.000000] riscv_alternative_print_inst instruction: 0x13
[    0.000000] riscv_alternative_print_inst instruction: 0x13
[    0.000000] riscv_alternative_print_inst instruction: 0x13
[    0.000000] riscv_alternative_print_inst instruction: 0x13
[    0.000000] riscv_alternative_print_inst instruction: 0x13
[    0.000000] riscv_alternative_print_inst instruction: 0x13
[    0.000000] Print new old end
[    0.000000] riscv_alternative_fix_auipc_jalr num instruction: 14
[    0.000000] Print patch start
[    0.000000] riscv_alternative_print_inst instruction: 0xff010113
[    0.000000] riscv_alternative_print_inst instruction: 0x813023
[    0.000000] riscv_alternative_print_inst instruction: 0x113423
[    0.000000] riscv_alternative_print_inst instruction: 0x1010413
[    0.000000] riscv_alternative_print_inst instruction: 0xf0713
[    0.000000] riscv_alternative_print_inst instruction: 0x78693
[    0.000000] riscv_alternative_print_inst instruction: 0x88613
[    0.000000] riscv_alternative_print_inst instruction: 0x80593
[    0.000000] riscv_alternative_print_inst instruction: 0xe0513
[    0.000000] riscv_alternative_print_inst instruction: 0x97
[    0.000000] riscv_alternative_print_inst instruction: 0xcba080e7
[    0.000000] riscv_alternative_print_inst instruction: 0x813083
[    0.000000] riscv_alternative_print_inst instruction: 0x13403
[    0.000000] riscv_alternative_print_inst instruction: 0x1010113
[    0.000000] Print patch end
[    0.000000] stage: 0 -> ffffffff80a2492c--> 1 1 0
[    0.000000] old:arch_sync_dma_for_device alt:riscv_noncoherent_supported len:38
[    0.000000] Print  old start
[    0.000000] riscv_alternative_print_inst instruction: 0x13
[    0.000000] riscv_alternative_print_inst instruction: 0x13
[    0.000000] riscv_alternative_print_inst instruction: 0x13
[    0.000000] riscv_alternative_print_inst instruction: 0x13
[    0.000000] riscv_alternative_print_inst instruction: 0x13
[    0.000000] riscv_alternative_print_inst instruction: 0x13
[    0.000000] riscv_alternative_print_inst instruction: 0x13
[    0.000000] riscv_alternative_print_inst instruction: 0x13
[    0.000000] riscv_alternative_print_inst instruction: 0x13
[    0.000000] riscv_alternative_print_inst instruction: 0x13
[    0.000000] riscv_alternative_print_inst instruction: 0x13
[    0.000000] riscv_alternative_print_inst instruction: 0x13
[    0.000000] riscv_alternative_print_inst instruction: 0x13
[    0.000000] riscv_alternative_print_inst instruction: 0x970013
    ====================> This instruction doesn't look correct it
should be 0x13?
[    0.000000] Print  old end
[    0.000000] riscv_alternative_fix_auipc_jalr num instruction: 14
[    0.000000] Print patch start
[    0.000000] riscv_alternative_print_inst instruction: 0xff010113
[    0.000000] riscv_alternative_print_inst instruction: 0x813023
[    0.000000] riscv_alternative_print_inst instruction: 0x113423
[    0.000000] riscv_alternative_print_inst instruction: 0x1010413
[    0.000000] riscv_alternative_print_inst instruction: 0x78713
[    0.000000] riscv_alternative_print_inst instruction: 0x78693
[    0.000000] riscv_alternative_print_inst instruction: 0x88613
[    0.000000] riscv_alternative_print_inst instruction: 0x80593
[    0.000000] riscv_alternative_print_inst instruction: 0xe0513
[    0.000000] riscv_alternative_print_inst instruction: 0x97
[    0.000000] riscv_alternative_print_inst instruction: 0xc82080e7
====================> This instruction doesn't look correct comparing
to objdump output this should be 000080e7 or does it require the
offset too?
[    0.000000] riscv_alternative_print_inst instruction: 0x813083
[    0.000000] riscv_alternative_print_inst instruction: 0x13403
[    0.000000] riscv_alternative_print_inst instruction: 0x1010113
[    0.000000] Print patch end
[    0.000000] stage: 0 -> ffffffff80a24950--> 1 1 0
[    0.000000] old:arch_sync_dma_for_cpu alt:riscv_noncoherent_supported len:38
[    0.000000] Print old start
[    0.000000] riscv_alternative_print_inst instruction: 0x13
[    0.000000] riscv_alternative_print_inst instruction: 0x13
[    0.000000] riscv_alternative_print_inst instruction: 0x13
[    0.000000] riscv_alternative_print_inst instruction: 0x13
[    0.000000] riscv_alternative_print_inst instruction: 0x13
[    0.000000] riscv_alternative_print_inst instruction: 0x97
====================> This instruction doesn't look correct it should
be 0x13
[    0.000000] riscv_alternative_print_inst instruction: 0xeee080e7
      ====================> This instruction doesn't look correct it
should be 0x13
[    0.000000] riscv_alternative_print_inst instruction: 0x13
[    0.000000] riscv_alternative_print_inst instruction: 0x13
[    0.000000] riscv_alternative_print_inst instruction: 0x13
[    0.000000] riscv_alternative_print_inst instruction: 0x13
[    0.000000] riscv_alternative_print_inst instruction: 0x13
[    0.000000] riscv_alternative_print_inst instruction: 0x13
[    0.000000] riscv_alternative_print_inst instruction: 0x13
[    0.000000] Print old end
[    0.000000] riscv_alternative_fix_auipc_jalr num instruction: 14
[    0.000000] Print patch start
[    0.000000] riscv_alternative_print_inst instruction: 0xff010113
[    0.000000] riscv_alternative_print_inst instruction: 0x813023
[    0.000000] riscv_alternative_print_inst instruction: 0x113423
[    0.000000] riscv_alternative_print_inst instruction: 0x1010413
[    0.000000] riscv_alternative_print_inst instruction: 0xf0713
[    0.000000] riscv_alternative_print_inst instruction: 0x80693
[    0.000000] riscv_alternative_print_inst instruction: 0x88613
[    0.000000] riscv_alternative_print_inst instruction: 0x78593
[    0.000000] riscv_alternative_print_inst instruction: 0xe0513
[    0.000000] riscv_alternative_print_inst instruction: 0x97
[    0.000000] riscv_alternative_print_inst instruction: 0xc4a080e7
====================> This instruction doesn't look correct comparing
to objdump output this should be 000080e7 or does it require the
offset too?
[    0.000000] riscv_alternative_print_inst instruction: 0x813083
[    0.000000] riscv_alternative_print_inst instruction: 0x13403
[    0.000000] riscv_alternative_print_inst instruction: 0x1010113
[    0.000000] Print patch end
[    0.000000] stage: 0 -> ffffffff80a24974--> 1 1 0
[    0.000000] old:arch_dma_prep_coherent alt:riscv_noncoherent_supported len:38
[    0.000000] Print old start
[    0.000000] riscv_alternative_print_inst instruction: 0x13
[    0.000000] riscv_alternative_print_inst instruction: 0x970013
====================> This instruction doesn't look correct it should
be 0x13
[    0.000000] riscv_alternative_print_inst instruction: 0x80e70000
====================> This instruction doesn't look correct it should
be 0x13
[    0.000000] riscv_alternative_print_inst instruction: 0xe720
====================> This instruction doesn't look correct it should
be 0x13
[    0.000000] riscv_alternative_print_inst instruction: 0x13
[    0.000000] riscv_alternative_print_inst instruction: 0x13
[    0.000000] riscv_alternative_print_inst instruction: 0x13
[    0.000000] riscv_alternative_print_inst instruction: 0x13
[    0.000000] riscv_alternative_print_inst instruction: 0x13
[    0.000000] riscv_alternative_print_inst instruction: 0x13
[    0.000000] riscv_alternative_print_inst instruction: 0x13
[    0.000000] riscv_alternative_print_inst instruction: 0x13
[    0.000000] riscv_alternative_print_inst instruction: 0x13
[    0.000000] riscv_alternative_print_inst instruction: 0x13
[    0.000000] Print old end
[    0.000000] riscv_alternative_fix_auipc_jalr num instruction: 14
[    0.000000] Print patch start
[    0.000000] riscv_alternative_print_inst instruction: 0xff010113
[    0.000000] riscv_alternative_print_inst instruction: 0x813023
[    0.000000] riscv_alternative_print_inst instruction: 0x113423
[    0.000000] riscv_alternative_print_inst instruction: 0x1010413
[    0.000000] riscv_alternative_print_inst instruction: 0xf0713
[    0.000000] riscv_alternative_print_inst instruction: 0xe8693
[    0.000000] riscv_alternative_print_inst instruction: 0x88613
[    0.000000] riscv_alternative_print_inst instruction: 0x78593
[    0.000000] riscv_alternative_print_inst instruction: 0x30513
[    0.000000] riscv_alternative_print_inst instruction: 0x97
[    0.000000] riscv_alternative_print_inst instruction: 0xc12080e7
====================> This instruction doesn't look correct comparing
to objdump output this should be 000080e7 + offset?
[    0.000000] riscv_alternative_print_inst instruction: 0x813083
[    0.000000] riscv_alternative_print_inst instruction: 0x13403
[    0.000000] riscv_alternative_print_inst instruction: 0x1010113
[    0.000000] Print patch end
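
One possible explanation for the stray non-nop words in the "old" areas,
offered as a guess rather than a confirmed diagnosis: alt_ptr in
riscv_alternative_fix_auipc_jalr() is an unsigned int pointer, so
"alt_ptr + i * sizeof(u32)" gets scaled by the element size a second
time, and the write would land 16 * i bytes past the start instead of
4 * i. The standalone sketch below (plain C; the helper names are made
up for the example) shows the difference:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Byte distance reached by "base + i * sizeof(uint32_t)".
 * Pointer arithmetic on uint32_t * already counts in 4-byte elements,
 * so the index ends up multiplied by 4 twice.
 * The caller must pass a buffer with at least 4 * i + 1 elements.
 */
static size_t byte_offset_scaled_twice(const uint32_t *base, size_t i)
{
	const uint32_t *p = base + i * sizeof(uint32_t);

	return (size_t)((const char *)p - (const char *)base);
}

/* Byte distance reached by "base + i", i.e. the i-th instruction. */
static size_t byte_offset_scaled_once(const uint32_t *base, size_t i)
{
	const uint32_t *p = base + i;

	return (size_t)((const char *)p - (const char *)base);
}
```

For the auipc at index 9, as in the listings above, the first variant
gives 144 bytes and the second 36 bytes, so the fixed-up call would be
written well outside the 56-byte alternative.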

Here is the output from objdump of the file (dma-noncoherent.o):

000000000000032e <.L888^B1>:
 32e:    ff010113              addi    sp,sp,-16
 332:    00813023              sd    s0,0(sp)
 336:    00113423              sd    ra,8(sp)
 33a:    01010413              addi    s0,sp,16
 33e:    000f0713              mv    a4,t5
 342:    00078693              mv    a3,a5
 346:    00088613              mv    a2,a7
 34a:    00080593              mv    a1,a6
 34e:    000e0513              mv    a0,t3
 352:    00000097              auipc    ra,0x0
 356:    000080e7              jalr    ra # 352 <.L888^B1+0x24>
 35a:    00813083              ld    ra,8(sp)
 35e:    00013403              ld    s0,0(sp)
 362:    01010113              addi    sp,sp,16

0000000000000366 <.L888^B2>:
 366:    ff010113              addi    sp,sp,-16
 36a:    00813023              sd    s0,0(sp)
 36e:    00113423              sd    ra,8(sp)
 372:    01010413              addi    s0,sp,16
 376:    00078713              mv    a4,a5
 37a:    00078693              mv    a3,a5
 37e:    00088613              mv    a2,a7
 382:    00080593              mv    a1,a6
 386:    000e0513              mv    a0,t3
 38a:    00000097              auipc    ra,0x0
 38e:    000080e7              jalr    ra # 38a <.L888^B2+0x24>
 392:    00813083              ld    ra,8(sp)
 396:    00013403              ld    s0,0(sp)
 39a:    01010113              addi    sp,sp,16

000000000000039e <.L888^B3>:
 39e:    ff010113              addi    sp,sp,-16
 3a2:    00813023              sd    s0,0(sp)
 3a6:    00113423              sd    ra,8(sp)
 3aa:    01010413              addi    s0,sp,16
 3ae:    000f0713              mv    a4,t5
 3b2:    00080693              mv    a3,a6
 3b6:    00088613              mv    a2,a7
 3ba:    00078593              mv    a1,a5
 3be:    000e0513              mv    a0,t3
 3c2:    00000097              auipc    ra,0x0
 3c6:    000080e7              jalr    ra # 3c2 <.L888^B3+0x24>
 3ca:    00813083              ld    ra,8(sp)
 3ce:    00013403              ld    s0,0(sp)
 3d2:    01010113              addi    sp,sp,16

00000000000003d6 <.L888^B4>:
 3d6:    ff010113              addi    sp,sp,-16
 3da:    00813023              sd    s0,0(sp)
 3de:    00113423              sd    ra,8(sp)
 3e2:    01010413              addi    s0,sp,16
 3e6:    000f0713              mv    a4,t5
 3ea:    000e8693              mv    a3,t4
 3ee:    00088613              mv    a2,a7
 3f2:    00078593              mv    a1,a5
 3f6:    00030513              mv    a0,t1
 3fa:    00000097              auipc    ra,0x0
 3fe:    000080e7              jalr    ra # 3fa <.L888^B4+0x24>
 402:    00813083              ld    ra,8(sp)
 406:    00013403              ld    s0,0(sp)
 40a:    01010113              addi    sp,sp,16

Disassembly of section __ksymtab_strings:

Any pointers on what could be happening?
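
One way to check whether a word such as 0xc12080e7 is still a
well-formed jalr is to split it into its I-type fields. Below is a
minimal user-space sketch; the struct and helper names are made up
for illustration and are not kernel API.

```c
#include <stdint.h>

/* Fields of a RISC-V I-type instruction such as jalr. */
struct itype {
	uint32_t opcode;	/* bits  6:0  */
	uint32_t rd;		/* bits 11:7  */
	uint32_t funct3;	/* bits 14:12 */
	uint32_t rs1;		/* bits 19:15 */
	int32_t imm;		/* bits 31:20, sign-extended */
};

/* Hypothetical helper: split a 32-bit instruction word into I-type fields. */
static struct itype decode_itype(uint32_t insn)
{
	struct itype d;

	d.opcode = insn & 0x7f;
	d.rd = (insn >> 7) & 0x1f;
	d.funct3 = (insn >> 12) & 0x7;
	d.rs1 = (insn >> 15) & 0x1f;
	/* Sign-extend the 12-bit immediate without relying on signed shifts. */
	d.imm = (int32_t)(((insn >> 20) & 0xfff) ^ 0x800) - 0x800;
	return d;
}
```

Applied to the four jalr words in the "patch" dumps (0xcba080e7,
0xc82080e7, 0xc4a080e7, 0xc12080e7) this gives opcode 0x67 with
rd = rs1 = x1 (ra) and immediates of -838, -894, -950 and -1006.
Those are 0x38 apart, the same as the logged len, which would be
consistent with every copy being a plain jalr that reaches one common
target. The all-zero immediate in the objdump output is what an
unlinked .o shows before the call relocation has been applied.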

>
> > Note, I wanted to test with ALTERNATIVE_2() first to make sure
> > everything is okay and then later test my ALTERNATIVE_3()
> > implementation.
>
> sounds like a very sensible idea to use the existing macros
> first for verification :-)
>
:)

Cheers,
Prabhakar

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 5/7] RISC-V: fix auipc-jalr addresses in patched alternatives
  2022-11-21 15:06       ` Lad, Prabhakar
@ 2022-11-21 21:31         ` Lad, Prabhakar
  2022-11-21 22:17           ` Heiko Stübner
  0 siblings, 1 reply; 51+ messages in thread
From: Lad, Prabhakar @ 2022-11-21 21:31 UTC (permalink / raw)
  To: Heiko Stübner
  Cc: linux-riscv, palmer, christoph.muellner, conor, philipp.tomsich,
	ajones, emil.renner.berthing

[-- Attachment #1: Type: text/plain, Size: 22090 bytes --]

Hi Heiko,

On Mon, Nov 21, 2022 at 3:06 PM Lad, Prabhakar
<prabhakar.csengg@gmail.com> wrote:
>
> Hi Heiko,
>
> On Mon, Nov 21, 2022 at 11:27 AM Heiko Stübner <heiko@sntech.de> wrote:
> >
> > Hi,
> >
> > Am Montag, 21. November 2022, 10:50:09 CET schrieb Lad, Prabhakar:
> > > On Thu, Nov 10, 2022 at 4:50 PM Heiko Stuebner <heiko@sntech.de> wrote:
> > > >
> > > > From: Heiko Stuebner <heiko.stuebner@vrull.eu>
> > > >
> > > > Alternatives live in a different section, so addresses used by call
> > > > functions will point to wrong locations after the patch got applied.
> > > >
> > > > Similar to arm64, adjust the location to consider that offset.
> > > >
> > > > Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
> > > > ---
> >
> > [...]
> >
> > > I have the below assembly code which I have tested without the
> > > alternatives for the RZ/Five CMO,
> > >
> > > #define ALT_CMO_OP(_op, _start, _size, _cachesize, _dir, _ops)        \
> > > asm volatile(".option push\n\t\n\t"                    \
> > >          ".option norvc\n\t"                    \
> > >          ".option norelax\n\t"                    \
> > >          "addi sp,sp,-16\n\t"                    \
> > >          "sd    s0,0(sp)\n\t"                    \
> > >          "sd    ra,8(sp)\n\t"                    \
> > >          "addi    s0,sp,16\n\t"                    \
> > >          "mv a4,%6\n\t"                        \
> > >          "mv a3,%5\n\t"                        \
> > >          "mv a2,%4\n\t"                        \
> > >          "mv a1,%3\n\t"                        \
> > >          "mv a0,%0\n\t"                        \
> > >          "call rzfive_cmo\n\t"                    \
> > >          "ld    ra,8(sp)\n\t"                    \
> > >          "ld    s0,0(sp)\n\t"                    \
> > >          "addi    sp,sp,16\n\t"                    \
> > >          ".option pop\n\t"                        \
> > >          : : "r"(_cachesize),                    \
> > >          "r"((unsigned long)(_start) & ~((_cachesize) - 1UL)),    \
> > >          "r"((unsigned long)(_start) + (_size)),            \
> > >          "r"((unsigned long) (_start)),                \
> > >          "r"((unsigned long) (_size)),                \
> > >          "r"((unsigned long) (_dir)),                \
> > >          "r"((unsigned long) (_ops))                \
> > >          : "a0", "a1", "a2", "a3", "a4", "memory")
> > >
> > > Now when integrate this with ALTERNATIVE_2() as below,
> > >
> > > #define ALT_CMO_OP(_op, _start, _size, _cachesize, _dir, _ops)        \
> > > asm volatile(ALTERNATIVE_2(                        \
> > >     __nops(14),                            \
> > >     "mv a0, %1\n\t"                            \
> > >     "j 2f\n\t"                            \
> > >     "3:\n\t"                            \
> > >     "cbo." __stringify(_op) " (a0)\n\t"                \
> > >     "add a0, a0, %0\n\t"                        \
> > >     "2:\n\t"                            \
> > >     "bltu a0, %2, 3b\n\t"                        \
> > >     __nops(8), 0, CPUFEATURE_ZICBOM, CONFIG_RISCV_ISA_ZICBOM,    \
> > >     ".option push\n\t\n\t"                        \
> > >     ".option norvc\n\t"                        \
> > >     ".option norelax\n\t"                        \
> > >     "addi sp,sp,-16\n\t"                        \
> > >     "sd    s0,0(sp)\n\t"                        \
> > >     "sd    ra,8(sp)\n\t"                        \
> > >     "addi    s0,sp,16\n\t"                        \
> > >     "mv a4,%6\n\t"                            \
> > >     "mv a3,%5\n\t"                            \
> > >     "mv a2,%4\n\t"                            \
> > >     "mv a1,%3\n\t"                            \
> > >     "mv a0,%0\n\t"                            \
> > >     "call rzfive_cmo\n\t"                \
> > >     "ld    ra,8(sp)\n\t"                        \
> > >     "ld    s0,0(sp)\n\t"                        \
> > >     "addi    sp,sp,16\n\t"                        \
> > >     ".option pop\n\t"                        \
> > >     , ANDESTECH_VENDOR_ID,                        \
> > >             ERRATA_ANDESTECH_NO_IOCP, CONFIG_ERRATA_RZFIVE_CMO)    \
> > >     : : "r"(_cachesize),                        \
> > >     "r"((unsigned long)(_start) & ~((_cachesize) - 1UL)),    \
> > >     "r"((unsigned long)(_start) + (_size)),            \
> > >     "r"((unsigned long) (_start)),                \
> > >     "r"((unsigned long) (_size)),                \
> > >     "r"((unsigned long) (_dir)),                \
> > >     "r"((unsigned long) (_ops))                \
> > >     : "a0", "a1", "a2", "a3", "a4", "memory")
> > >
> > > I am seeing kernel panic with this change. Looking at the
> > > riscv_alternative_fix_auipc_jalr() implementation it assumes the rest
> > > of the alternative options are calls too. Is my understanding correct
> > > here?
> >
> > The loop walks through the instructions after the location got patched and
> > checks if an instruction is an auipc and the next one is a jalr and only then
> > adjusts the address accordingly.
> >
> Ok so my understanding was wrong here.
>
> > So it _should_ leave all other (non auipc+jalr) instructions alone.
> > (hopefully)
> >
> Agreed.
>
> >
> > > Do you think this is the correct approach in my case?
> >
> > It does look correct on first glance.
> >
> \o/
>
> > As I had that passing thought, are you actually calling
> >         riscv_alternative_fix_auipc_jalr()
> > from your errata/.../foo.c after doing the patching?
> >
> > I.e. with the current patchset, that function is only called from the
> > cpufeature part, but for example not from the other patching locations.
> > [and a future revision should probably change that :-) ]
> >
> >
> I have made a local copy of riscv_alternative_fix_auipc_jalr() and
> then calling it after patch_text_nosync() referring to your patch for
> str functions.
>
> > After making sure that function actually runs, the next thing you could try
> > is to have both the "original" code and the patch be identical, i.e.
> > replace the cbo* part with your code as well and then just output the
> > instructions via printk to check what the addresses do in both.
> >
> > After riscv_alternative_fix_auipc_jalr() ran then both code variants
> > should be identical when using the same code in both areas.
> >
> So I have added debug prints to match the instructions as below after
> and before patching:
>
> static void riscv_alternative_print_inst(unsigned int *alt_ptr,
>                      unsigned int len)
> {
>     int num_instr = len / sizeof(u32);
>     int i;
>
>     for (i = 0; i < num_instr; i++)
>         pr_err("%s instruction: 0x%x\n", __func__, *(alt_ptr + i));
>
> }
>
> void __init_or_module andes_errata_patch_func(struct alt_entry *begin,
> struct alt_entry *end,
>                           unsigned long archid, unsigned long impid,
>                           unsigned int stage)
> {
> ....
>     if (cpu_req_errata & tmp) {
>         pr_err("stage: %x -> %px--> %x %x %x\n", stage, alt, tmp,
> cpu_req_errata, alt->errata_id);
>         pr_err("old:%ps alt:%ps len:%lx\n", alt->old_ptr,
> alt->alt_ptr, alt->alt_len);
>         pr_err("Print old start\n");
>         riscv_alternative_print_inst(alt->old_ptr, alt->alt_len);
>         pr_err("Print old end\n");
>         patch_text_nosync(alt->old_ptr, alt->alt_ptr, alt->alt_len);
>
>         riscv_alternative_fix_auipc_jalr(alt->old_ptr, alt->alt_len,
>                         alt->old_ptr - alt->alt_ptr);
>         pr_err("Print patch start\n");
>         riscv_alternative_print_inst(alt->alt_ptr, alt->alt_len);
>         pr_err("Print patch end\n");
>     }
> .....
> }
>
> Below is the log:
> [    0.000000] riscv_alternative_print_inst instruction: 0x13
> [    0.000000] riscv_alternative_print_inst instruction: 0x13
> [    0.000000] riscv_alternative_print_inst instruction: 0x13
> [    0.000000] riscv_alternative_print_inst instruction: 0x13
> [    0.000000] riscv_alternative_print_inst instruction: 0x13
> [    0.000000] riscv_alternative_print_inst instruction: 0x13
> [    0.000000] riscv_alternative_print_inst instruction: 0x13
> [    0.000000] riscv_alternative_print_inst instruction: 0x13
> [    0.000000] riscv_alternative_print_inst instruction: 0x13
> [    0.000000] riscv_alternative_print_inst instruction: 0x13
> [    0.000000] riscv_alternative_print_inst instruction: 0x13
> [    0.000000] riscv_alternative_print_inst instruction: 0x13
> [    0.000000] riscv_alternative_print_inst instruction: 0x13
> [    0.000000] riscv_alternative_print_inst instruction: 0x13
> [    0.000000] Print new old end
> [    0.000000] riscv_alternative_fix_auipc_jalr num instruction: 14
> [    0.000000] Print patch start
> [    0.000000] riscv_alternative_print_inst instruction: 0xff010113
> [    0.000000] riscv_alternative_print_inst instruction: 0x813023
> [    0.000000] riscv_alternative_print_inst instruction: 0x113423
> [    0.000000] riscv_alternative_print_inst instruction: 0x1010413
> [    0.000000] riscv_alternative_print_inst instruction: 0xf0713
> [    0.000000] riscv_alternative_print_inst instruction: 0x78693
> [    0.000000] riscv_alternative_print_inst instruction: 0x88613
> [    0.000000] riscv_alternative_print_inst instruction: 0x80593
> [    0.000000] riscv_alternative_print_inst instruction: 0xe0513
> [    0.000000] riscv_alternative_print_inst instruction: 0x97
> [    0.000000] riscv_alternative_print_inst instruction: 0xcba080e7
> [    0.000000] riscv_alternative_print_inst instruction: 0x813083
> [    0.000000] riscv_alternative_print_inst instruction: 0x13403
> [    0.000000] riscv_alternative_print_inst instruction: 0x1010113
> [    0.000000] Print patch end
> [    0.000000] stage: 0 -> ffffffff80a2492c--> 1 1 0
> [    0.000000] old:arch_sync_dma_for_device
> alt:riscv_noncoherent_supported len:38
> [    0.000000] Print  old start
> [    0.000000] riscv_alternative_print_inst instruction: 0x13
> [    0.000000] riscv_alternative_print_inst instruction: 0x13
> [    0.000000] riscv_alternative_print_inst instruction: 0x13
> [    0.000000] riscv_alternative_print_inst instruction: 0x13
> [    0.000000] riscv_alternative_print_inst instruction: 0x13
> [    0.000000] riscv_alternative_print_inst instruction: 0x13
> [    0.000000] riscv_alternative_print_inst instruction: 0x13
> [    0.000000] riscv_alternative_print_inst instruction: 0x13
> [    0.000000] riscv_alternative_print_inst instruction: 0x13
> [    0.000000] riscv_alternative_print_inst instruction: 0x13
> [    0.000000] riscv_alternative_print_inst instruction: 0x13
> [    0.000000] riscv_alternative_print_inst instruction: 0x13
> [    0.000000] riscv_alternative_print_inst instruction: 0x13
> [    0.000000] riscv_alternative_print_inst instruction: 0x970013
>     ====================> This instruction doesn't look correct it
> should be 0x13?
> [    0.000000] Print  old end
> [    0.000000] riscv_alternative_fix_auipc_jalr num instruction: 14
> [    0.000000] Print patch start
> [    0.000000] riscv_alternative_print_inst instruction: 0xff010113
> [    0.000000] riscv_alternative_print_inst instruction: 0x813023
> [    0.000000] riscv_alternative_print_inst instruction: 0x113423
> [    0.000000] riscv_alternative_print_inst instruction: 0x1010413
> [    0.000000] riscv_alternative_print_inst instruction: 0x78713
> [    0.000000] riscv_alternative_print_inst instruction: 0x78693
> [    0.000000] riscv_alternative_print_inst instruction: 0x88613
> [    0.000000] riscv_alternative_print_inst instruction: 0x80593
> [    0.000000] riscv_alternative_print_inst instruction: 0xe0513
> [    0.000000] riscv_alternative_print_inst instruction: 0x97
> [    0.000000] riscv_alternative_print_inst instruction: 0xc82080e7
> ====================> This instruction doesn't look correct comparing
> to objdump output this should be 000080e7 or does it require the
> offset too?
> [    0.000000] riscv_alternative_print_inst instruction: 0x813083
> [    0.000000] riscv_alternative_print_inst instruction: 0x13403
> [    0.000000] riscv_alternative_print_inst instruction: 0x1010113
> [    0.000000] Print patch end
> [    0.000000] stage: 0 -> ffffffff80a24950--> 1 1 0
> [    0.000000] old:arch_sync_dma_for_cpu alt:riscv_noncoherent_supported len:38
> [    0.000000] Print old start
> [    0.000000] riscv_alternative_print_inst instruction: 0x13
> [    0.000000] riscv_alternative_print_inst instruction: 0x13
> [    0.000000] riscv_alternative_print_inst instruction: 0x13
> [    0.000000] riscv_alternative_print_inst instruction: 0x13
> [    0.000000] riscv_alternative_print_inst instruction: 0x13
> [    0.000000] riscv_alternative_print_inst instruction: 0x97
> ====================> This instruction doesn't look correct it should
> be 0x13
> [    0.000000] riscv_alternative_print_inst instruction: 0xeee080e7
>       ====================> This instruction doesn't look correct it
> should be 0x13
> [    0.000000] riscv_alternative_print_inst instruction: 0x13
> [    0.000000] riscv_alternative_print_inst instruction: 0x13
> [    0.000000] riscv_alternative_print_inst instruction: 0x13
> [    0.000000] riscv_alternative_print_inst instruction: 0x13
> [    0.000000] riscv_alternative_print_inst instruction: 0x13
> [    0.000000] riscv_alternative_print_inst instruction: 0x13
> [    0.000000] riscv_alternative_print_inst instruction: 0x13
> [    0.000000] Print old end
> [    0.000000] riscv_alternative_fix_auipc_jalr num instruction: 14
> [    0.000000] Print patch start
> [    0.000000] riscv_alternative_print_inst instruction: 0xff010113
> [    0.000000] riscv_alternative_print_inst instruction: 0x813023
> [    0.000000] riscv_alternative_print_inst instruction: 0x113423
> [    0.000000] riscv_alternative_print_inst instruction: 0x1010413
> [    0.000000] riscv_alternative_print_inst instruction: 0xf0713
> [    0.000000] riscv_alternative_print_inst instruction: 0x80693
> [    0.000000] riscv_alternative_print_inst instruction: 0x88613
> [    0.000000] riscv_alternative_print_inst instruction: 0x78593
> [    0.000000] riscv_alternative_print_inst instruction: 0xe0513
> [    0.000000] riscv_alternative_print_inst instruction: 0x97
> [    0.000000] riscv_alternative_print_inst instruction: 0xc4a080e7
> ====================> This instruction doesn't look correct comparing
> to objdump output this should be 000080e7 or does it require the
> offset too?
> [    0.000000] riscv_alternative_print_inst instruction: 0x813083
> [    0.000000] riscv_alternative_print_inst instruction: 0x13403
> [    0.000000] riscv_alternative_print_inst instruction: 0x1010113
> [    0.000000] Print patch end
> [    0.000000] stage: 0 -> ffffffff80a24974--> 1 1 0
> [    0.000000] old:arch_dma_prep_coherent alt:riscv_noncoherent_supported len:38
> [    0.000000] Print old start
> [    0.000000] riscv_alternative_print_inst instruction: 0x13
> [    0.000000] riscv_alternative_print_inst instruction: 0x970013
> ====================> This instruction doesn't look correct it should
> be 0x13
> [    0.000000] riscv_alternative_print_inst instruction: 0x80e70000
> ====================> This instruction doesn't look correct it should
> be 0x13
> [    0.000000] riscv_alternative_print_inst instruction: 0xe720
> ====================> This instruction doesn't look correct it should
> be 0x13
> [    0.000000] riscv_alternative_print_inst instruction: 0x13
> [    0.000000] riscv_alternative_print_inst instruction: 0x13
> [    0.000000] riscv_alternative_print_inst instruction: 0x13
> [    0.000000] riscv_alternative_print_inst instruction: 0x13
> [    0.000000] riscv_alternative_print_inst instruction: 0x13
> [    0.000000] riscv_alternative_print_inst instruction: 0x13
> [    0.000000] riscv_alternative_print_inst instruction: 0x13
> [    0.000000] riscv_alternative_print_inst instruction: 0x13
> [    0.000000] riscv_alternative_print_inst instruction: 0x13
> [    0.000000] riscv_alternative_print_inst instruction: 0x13
> [    0.000000] Print old end
> [    0.000000] riscv_alternative_fix_auipc_jalr num instruction: 14
> [    0.000000] Print patch start
> [    0.000000] riscv_alternative_print_inst instruction: 0xff010113
> [    0.000000] riscv_alternative_print_inst instruction: 0x813023
> [    0.000000] riscv_alternative_print_inst instruction: 0x113423
> [    0.000000] riscv_alternative_print_inst instruction: 0x1010413
> [    0.000000] riscv_alternative_print_inst instruction: 0xf0713
> [    0.000000] riscv_alternative_print_inst instruction: 0xe8693
> [    0.000000] riscv_alternative_print_inst instruction: 0x88613
> [    0.000000] riscv_alternative_print_inst instruction: 0x78593
> [    0.000000] riscv_alternative_print_inst instruction: 0x30513
> [    0.000000] riscv_alternative_print_inst instruction: 0x97
> [    0.000000] riscv_alternative_print_inst instruction: 0xc12080e7
> ====================> This instruction doesn't look correct comparing
> to objdump output this should be 000080e7 + offset?
> [    0.000000] riscv_alternative_print_inst instruction: 0x813083
> [    0.000000] riscv_alternative_print_inst instruction: 0x13403
> [    0.000000] riscv_alternative_print_inst instruction: 0x1010113
> [    0.000000] Print patch end
>
> Here is the output from objdump of the file (dma-noncoherent.o):
>
> 000000000000032e <.L888^B1>:
>  32e:    ff010113              addi    sp,sp,-16
>  332:    00813023              sd    s0,0(sp)
>  336:    00113423              sd    ra,8(sp)
>  33a:    01010413              addi    s0,sp,16
>  33e:    000f0713              mv    a4,t5
>  342:    00078693              mv    a3,a5
>  346:    00088613              mv    a2,a7
>  34a:    00080593              mv    a1,a6
>  34e:    000e0513              mv    a0,t3
>  352:    00000097              auipc    ra,0x0
>  356:    000080e7              jalr    ra # 352 <.L888^B1+0x24>
>  35a:    00813083              ld    ra,8(sp)
>  35e:    00013403              ld    s0,0(sp)
>  362:    01010113              addi    sp,sp,16
>
> 0000000000000366 <.L888^B2>:
>  366:    ff010113              addi    sp,sp,-16
>  36a:    00813023              sd    s0,0(sp)
>  36e:    00113423              sd    ra,8(sp)
>  372:    01010413              addi    s0,sp,16
>  376:    00078713              mv    a4,a5
>  37a:    00078693              mv    a3,a5
>  37e:    00088613              mv    a2,a7
>  382:    00080593              mv    a1,a6
>  386:    000e0513              mv    a0,t3
>  38a:    00000097              auipc    ra,0x0
>  38e:    000080e7              jalr    ra # 38a <.L888^B2+0x24>
>  392:    00813083              ld    ra,8(sp)
>  396:    00013403              ld    s0,0(sp)
>  39a:    01010113              addi    sp,sp,16
>
> 000000000000039e <.L888^B3>:
>  39e:    ff010113              addi    sp,sp,-16
>  3a2:    00813023              sd    s0,0(sp)
>  3a6:    00113423              sd    ra,8(sp)
>  3aa:    01010413              addi    s0,sp,16
>  3ae:    000f0713              mv    a4,t5
>  3b2:    00080693              mv    a3,a6
>  3b6:    00088613              mv    a2,a7
>  3ba:    00078593              mv    a1,a5
>  3be:    000e0513              mv    a0,t3
>  3c2:    00000097              auipc    ra,0x0
>  3c6:    000080e7              jalr    ra # 3c2 <.L888^B3+0x24>
>  3ca:    00813083              ld    ra,8(sp)
>  3ce:    00013403              ld    s0,0(sp)
>  3d2:    01010113              addi    sp,sp,16
>
> 00000000000003d6 <.L888^B4>:
>  3d6:    ff010113              addi    sp,sp,-16
>  3da:    00813023              sd    s0,0(sp)
>  3de:    00113423              sd    ra,8(sp)
>  3e2:    01010413              addi    s0,sp,16
>  3e6:    000f0713              mv    a4,t5
>  3ea:    000e8693              mv    a3,t4
>  3ee:    00088613              mv    a2,a7
>  3f2:    00078593              mv    a1,a5
>  3f6:    00030513              mv    a0,t1
>  3fa:    00000097              auipc    ra,0x0
>  3fe:    000080e7              jalr    ra # 3fa <.L888^B4+0x24>
>  402:    00813083              ld    ra,8(sp)
>  406:    00013403              ld    s0,0(sp)
>  40a:    01010113              addi    sp,sp,16
>
> Disassembly of section __ksymtab_strings:
>
> Any pointers what could be happening?
>

Some more information,

- If I drop the riscv_alternative_fix_auipc_jalr() call after
patch_text_nosync() and then print the alt->old_ptr instructions
before patching, I can see the instructions as 0x13 (nop), which is
correct.

- If I call riscv_alternative_fix_auipc_jalr() after
patch_text_nosync() and then print the alt->old_ptr instructions
before patching, I don't see 0x13 (nop) consistently for the old
instructions.

- If I replace the nops in the old instructions with my assembly code
for the RZ/Five CMO and then just use patch_text_nosync(), I can see
the correct instructions being printed apart from the jalr (is some
sort of offset added to it, as I see the last 4 bits match?), and they
are then replaced correctly by the same alt instructions, again apart
from the jalr (log [0]).

- If I replace the nops in the old instructions with my assembly code
for the RZ/Five CMO and then use patch_text_nosync() and
riscv_alternative_fix_auipc_jalr(), I can see that the actual old
instructions differ a bit, and again the jalr instruction differs in
the patched code (log [1]).

[0] https://paste.debian.net/1261412/
[1] https://paste.debian.net/1261413/
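
The odd words in the "old" dumps (for example 0x13, 0x970013,
0x80e70000, 0xe720 for arch_dma_prep_coherent) can be re-read at
2-byte granularity, since the debug helper prints the area as 32-bit
words while RISC-V code with the C extension is only 2-byte aligned.
A rough user-space sketch, with helper names that are purely
illustrative:

```c
#include <stddef.h>
#include <stdint.h>

#define OPC_MASK	0x7fu
#define OPC_AUIPC	0x17u
#define OPC_JALR	0x67u

/*
 * Read the 32-bit instruction that starts at halfword index h of a
 * dump given as little-endian 32-bit words.
 */
static uint32_t insn_at_halfword(const uint32_t *words, size_t h)
{
	uint32_t lo = (words[h / 2] >> (16 * (h % 2))) & 0xffff;
	uint32_t hi = (words[(h + 1) / 2] >> (16 * ((h + 1) % 2))) & 0xffff;

	return lo | (hi << 16);
}

/*
 * Return the halfword index of the first auipc that is directly
 * followed by a jalr, or -1 if there is none. Scanning every halfword
 * can in principle hit the middle of another instruction, so treat a
 * match as a hint only.
 */
static long find_auipc_jalr(const uint32_t *words, size_t nwords)
{
	size_t h;

	for (h = 0; h + 4 <= 2 * nwords; h++) {
		uint32_t a = insn_at_halfword(words, h);
		uint32_t j = insn_at_halfword(words, h + 2);

		if ((a & OPC_MASK) == OPC_AUIPC && (j & OPC_MASK) == OPC_JALR)
			return (long)h;
	}
	return -1;
}
```

For the arch_dma_prep_coherent dump this finds 0x00000097 followed by
0xe72080e7 at halfword index 3, i.e. byte offset 6 into the nop area.
That suggests an auipc+jalr pair was written at a 2-byte-aligned
address inside the old area, rather than the nops themselves having
been assembled incorrectly.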

Attached is the objdump of dma-noncoherent.o for reference.

Note, if I replace the old/original instructions with my asm code for
RZ/Five CMO and replace the errata IDs with deadbeef, the code works OK.
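
For reference, the address arithmetic behind moving an auipc+jalr
pair can be sketched as below. This is only an illustration of the
encoding, not the code from the patch; the function names and the
sign convention for delta (old address minus new address) are
assumptions made for the example.

```c
#include <stdint.h>

/* Sign-extend a 12-bit value. */
static int32_t sext12(uint32_t v)
{
	return (int32_t)((v & 0xfff) ^ 0x800) - 0x800;
}

/* PC-relative offset encoded by an auipc+jalr pair. */
static int64_t pair_offset(uint32_t auipc, uint32_t jalr)
{
	int64_t hi = (int32_t)(auipc & 0xfffff000u);	/* U-type imm << 12 */

	return hi + sext12(jalr >> 20);			/* plus I-type imm */
}

/*
 * Re-encode the pair so that it still reaches the same target after
 * the code has moved by delta bytes (delta = old address - new address).
 */
static void pair_adjust(uint32_t *auipc, uint32_t *jalr, int32_t delta)
{
	uint32_t off = (uint32_t)(pair_offset(*auipc, *jalr) + delta);
	/* Round the upper part so the remainder fits a signed 12-bit field. */
	uint32_t hi = (off + 0x800u) & 0xfffff000u;
	uint32_t lo = off & 0xfffu;

	*auipc = (*auipc & 0x00000fffu) | hi;
	*jalr = (*jalr & 0x000fffffu) | (lo << 20);
}
```

With the logged pair 0x00000097 / 0xc12080e7 the encoded offset is
-1006. Moving it by a made-up delta of 0x2000 gives 0x00002097 /
0xc12080e7, i.e. only the auipc changes while the jalr immediate stays
the same, so an unchanged jalr word does not by itself mean the
fix-up did nothing.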

Cheers,
Prabhakar

[-- Attachment #2: dma-noncoherent.objdump --]
[-- Type: application/octet-stream, Size: 23236 bytes --]


dma-noncoherent.o:     file format elf64-littleriscv


Disassembly of section .text:

0000000000000000 <cache_do_nop>:
   0:	1141                	c.addi	sp,-16
   2:	e422                	c.sdsp	s0,8(sp)
   4:	0800                	c.addi4spn	s0,sp,16
   6:	6422                	c.ldsp	s0,8(sp)
   8:	0141                	c.addi	sp,16
   a:	8082                	c.jr	ra

000000000000000c <testcall_via_asm_cache>:
   c:	4789                	c.li	a5,2
   e:	04f70463          	beq	a4,a5,56 <.L12>
  12:	1141                	c.addi	sp,-16
  14:	e022                	c.sdsp	s0,0(sp)
  16:	e406                	c.sdsp	ra,8(sp)
  18:	0800                	c.addi4spn	s0,sp,16
  1a:	852e                	c.mv	a0,a1
  1c:	c719                	c.beqz	a4,2a <.L16>
  1e:	9af5                	c.andi	a3,-3
  20:	ca91                	c.beqz	a3,34 <.L15>

0000000000000022 <.L6>:
  22:	60a2                	c.ldsp	ra,8(sp)
  24:	6402                	c.ldsp	s0,0(sp)
  26:	0141                	c.addi	sp,16
  28:	8082                	c.jr	ra

000000000000002a <.L16>:
  2a:	4705                	c.li	a4,1
  2c:	00d75d63          	bge	a4,a3,46 <.L17>
  30:	fef699e3          	bne	a3,a5,22 <.L6>

0000000000000034 <.L15>:
  34:	85b2                	c.mv	a1,a2
  36:	00000097          	auipc	ra,0x0
  3a:	000080e7          	jalr	ra # 36 <.L15+0x2>
  3e:	60a2                	c.ldsp	ra,8(sp)
  40:	6402                	c.ldsp	s0,0(sp)
  42:	0141                	c.addi	sp,16
  44:	8082                	c.jr	ra

0000000000000046 <.L17>:
  46:	fc06cee3          	bltz	a3,22 <.L6>
  4a:	85b2                	c.mv	a1,a2
  4c:	00000097          	auipc	ra,0x0
  50:	000080e7          	jalr	ra # 4c <.L17+0x6>
  54:	b7f9                	c.j	22 <.L6>

0000000000000056 <.L12>:
  56:	8082                	c.jr	ra

0000000000000058 <arch_sync_dma_for_device>:
  58:	1141                	c.addi	sp,-16
  5a:	e422                	c.sdsp	s0,8(sp)
  5c:	0800                	c.addi4spn	s0,sp,16
  5e:	00000817          	auipc	a6,0x0
  62:	00083803          	ld	a6,0(a6) # 5e <arch_sync_dma_for_device+0x6>
  66:	87b2                	c.mv	a5,a2
  68:	88ae                	c.mv	a7,a1
  6a:	982a                	c.add	a6,a0
  6c:	c63d                	c.beqz	a2,da <.L19>
  6e:	fff6071b          	addiw	a4,a2,-1
  72:	4685                	c.li	a3,1
  74:	06e6e063          	bltu	a3,a4,d4 <.L18>
  78:	00000e17          	auipc	t3,0x0
  7c:	000e2e03          	lw	t3,0(t3) # 78 <arch_sync_dma_for_device+0x20>
  80:	020e1313          	slli	t1,t3,0x20
  84:	02035313          	srli	t1,t1,0x20
  88:	02061793          	slli	a5,a2,0x20
  8c:	40600333          	neg	t1,t1
  90:	01037333          	and	t1,t1,a6
  94:	01058eb3          	add	t4,a1,a6
  98:	9381                	c.srli	a5,0x20
  9a:	4f01                	c.li	t5,0

000000000000009c <.L886^B1>:
  9c:	ff010113          	addi	sp,sp,-16
  a0:	00813023          	sd	s0,0(sp)
  a4:	00113423          	sd	ra,8(sp)
  a8:	01010413          	addi	s0,sp,16
  ac:	000f0713          	mv	a4,t5
  b0:	00078693          	mv	a3,a5
  b4:	00088613          	mv	a2,a7
  b8:	00080593          	mv	a1,a6
  bc:	000e0513          	mv	a0,t3
  c0:	00000097          	auipc	ra,0x0
  c4:	000080e7          	jalr	ra # c0 <.L886^B1+0x24>
  c8:	00813083          	ld	ra,8(sp)
  cc:	00013403          	ld	s0,0(sp)
  d0:	01010113          	addi	sp,sp,16

00000000000000d4 <.L18>:
  d4:	6422                	c.ldsp	s0,8(sp)
  d6:	0141                	c.addi	sp,16
  d8:	8082                	c.jr	ra

00000000000000da <.L19>:
  da:	00000e17          	auipc	t3,0x0
  de:	000e2e03          	lw	t3,0(t3) # da <.L19>
  e2:	020e1313          	slli	t1,t3,0x20
  e6:	02035313          	srli	t1,t1,0x20
  ea:	40600333          	neg	t1,t1
  ee:	01037333          	and	t1,t1,a6
  f2:	01058eb3          	add	t4,a1,a6

00000000000000f6 <.L886^B2>:
  f6:	ff010113          	addi	sp,sp,-16
  fa:	00813023          	sd	s0,0(sp)
  fe:	00113423          	sd	ra,8(sp)
 102:	01010413          	addi	s0,sp,16
 106:	00078713          	mv	a4,a5
 10a:	00078693          	mv	a3,a5
 10e:	00088613          	mv	a2,a7
 112:	00080593          	mv	a1,a6
 116:	000e0513          	mv	a0,t3
 11a:	00000097          	auipc	ra,0x0
 11e:	000080e7          	jalr	ra # 11a <.L886^B2+0x24>
 122:	00813083          	ld	ra,8(sp)
 126:	00013403          	ld	s0,0(sp)
 12a:	01010113          	addi	sp,sp,16
 12e:	6422                	c.ldsp	s0,8(sp)
 130:	0141                	c.addi	sp,16
 132:	8082                	c.jr	ra

0000000000000134 <arch_sync_dma_for_cpu>:
 134:	1141                	c.addi	sp,-16
 136:	e422                	c.sdsp	s0,8(sp)
 138:	0800                	c.addi4spn	s0,sp,16
 13a:	ffd67793          	andi	a5,a2,-3
 13e:	e7b5                	c.bnez	a5,1aa <.L22>
 140:	00000e17          	auipc	t3,0x0
 144:	000e2e03          	lw	t3,0(t3) # 140 <arch_sync_dma_for_cpu+0xc>
 148:	020e1313          	slli	t1,t3,0x20
 14c:	00000797          	auipc	a5,0x0
 150:	0007b783          	ld	a5,0(a5) # 14c <arch_sync_dma_for_cpu+0x18>
 154:	02035313          	srli	t1,t1,0x20
 158:	97aa                	c.add	a5,a0
 15a:	02061813          	slli	a6,a2,0x20
 15e:	40600333          	neg	t1,t1
 162:	88ae                	c.mv	a7,a1
 164:	00f37333          	and	t1,t1,a5
 168:	00f58eb3          	add	t4,a1,a5
 16c:	02085813          	srli	a6,a6,0x20
 170:	4f05                	c.li	t5,1

0000000000000172 <.L886^B3>:
 172:	ff010113          	addi	sp,sp,-16
 176:	00813023          	sd	s0,0(sp)
 17a:	00113423          	sd	ra,8(sp)
 17e:	01010413          	addi	s0,sp,16
 182:	000f0713          	mv	a4,t5
 186:	00080693          	mv	a3,a6
 18a:	00088613          	mv	a2,a7
 18e:	00078593          	mv	a1,a5
 192:	000e0513          	mv	a0,t3
 196:	00000097          	auipc	ra,0x0
 19a:	000080e7          	jalr	ra # 196 <.L886^B3+0x24>
 19e:	00813083          	ld	ra,8(sp)
 1a2:	00013403          	ld	s0,0(sp)
 1a6:	01010113          	addi	sp,sp,16

00000000000001aa <.L22>:
 1aa:	6422                	c.ldsp	s0,8(sp)
 1ac:	0141                	c.addi	sp,16
 1ae:	8082                	c.jr	ra

00000000000001b0 <arch_dma_prep_coherent>:
 1b0:	1141                	c.addi	sp,-16
 1b2:	e422                	c.sdsp	s0,8(sp)
 1b4:	0800                	c.addi4spn	s0,sp,16
 1b6:	00000797          	auipc	a5,0x0
 1ba:	0007b783          	ld	a5,0(a5) # 1b6 <arch_dma_prep_coherent+0x6>
 1be:	40f507b3          	sub	a5,a0,a5
 1c2:	00000317          	auipc	t1,0x0
 1c6:	00032303          	lw	t1,0(t1) # 1c2 <arch_dma_prep_coherent+0x12>
 1ca:	00000517          	auipc	a0,0x0
 1ce:	00053503          	ld	a0,0(a0) # 1ca <arch_dma_prep_coherent+0x1a>
 1d2:	8799                	c.srai	a5,0x6
 1d4:	97aa                	c.add	a5,a0
 1d6:	02031813          	slli	a6,t1,0x20
 1da:	00000717          	auipc	a4,0x0
 1de:	00073703          	ld	a4,0(a4) # 1da <arch_dma_prep_coherent+0x2a>
 1e2:	07b2                	c.slli	a5,0xc
 1e4:	02085813          	srli	a6,a6,0x20
 1e8:	97ba                	c.add	a5,a4
 1ea:	41000833          	neg	a6,a6
 1ee:	88ae                	c.mv	a7,a1
 1f0:	00f87833          	and	a6,a6,a5
 1f4:	00f58e33          	add	t3,a1,a5
 1f8:	4e81                	c.li	t4,0
 1fa:	4f09                	c.li	t5,2

00000000000001fc <.L886^B4>:
 1fc:	ff010113          	addi	sp,sp,-16
 200:	00813023          	sd	s0,0(sp)
 204:	00113423          	sd	ra,8(sp)
 208:	01010413          	addi	s0,sp,16
 20c:	000f0713          	mv	a4,t5
 210:	000e8693          	mv	a3,t4
 214:	00088613          	mv	a2,a7
 218:	00078593          	mv	a1,a5
 21c:	00030513          	mv	a0,t1
 220:	00000097          	auipc	ra,0x0
 224:	000080e7          	jalr	ra # 220 <.L886^B4+0x24>
 228:	00813083          	ld	ra,8(sp)
 22c:	00013403          	ld	s0,0(sp)
 230:	01010113          	addi	sp,sp,16
 234:	6422                	c.ldsp	s0,8(sp)
 236:	0141                	c.addi	sp,16
 238:	8082                	c.jr	ra

000000000000023a <arch_setup_dma_ops>:
 23a:	7179                	c.addi16sp	sp,-48
 23c:	f022                	c.sdsp	s0,32(sp)
 23e:	ec26                	c.sdsp	s1,24(sp)
 240:	e84a                	c.sdsp	s2,16(sp)
 242:	f406                	c.sdsp	ra,40(sp)
 244:	e44e                	c.sdsp	s3,8(sp)
 246:	1800                	c.addi4spn	s0,sp,48
 248:	84ba                	c.mv	s1,a4
 24a:	892a                	c.mv	s2,a0
 24c:	e331                	c.bnez	a4,290 <.L29>
 24e:	00000997          	auipc	s3,0x0
 252:	00098993          	mv	s3,s3
 256:	0009a703          	lw	a4,0(s3) # 24e <arch_setup_dma_ops+0x14>
 25a:	04000793          	li	a5,64
 25e:	04e7e963          	bltu	a5,a4,2b0 <.L35>

0000000000000262 <.L30>:
 262:	00000797          	auipc	a5,0x0
 266:	0007c783          	lbu	a5,0(a5) # 262 <.L30>
 26a:	e39d                	c.bnez	a5,290 <.L29>
 26c:	854a                	c.mv	a0,s2
 26e:	00000097          	auipc	ra,0x0
 272:	000080e7          	jalr	ra # 26e <.L30+0xc>
 276:	05093603          	ld	a2,80(s2)
 27a:	c22d                	c.beqz	a2,2dc <.L36>

000000000000027c <.L33>:
 27c:	85aa                	c.mv	a1,a0
 27e:	00000517          	auipc	a0,0x0
 282:	00050513          	mv	a0,a0
 286:	00000097          	auipc	ra,0x0
 28a:	000080e7          	jalr	ra # 286 <.L33+0xa>

000000000000028e <.L1^B1>:
 28e:	9002                	c.ebreak

0000000000000290 <.L29>:
 290:	2ec94783          	lbu	a5,748(s2)
 294:	8885                	c.andi	s1,1
 296:	0496                	c.slli	s1,0x5
 298:	fdf7f793          	andi	a5,a5,-33
 29c:	8cdd                	c.or	s1,a5
 29e:	2e990623          	sb	s1,748(s2)
 2a2:	70a2                	c.ldsp	ra,40(sp)
 2a4:	7402                	c.ldsp	s0,32(sp)
 2a6:	64e2                	c.ldsp	s1,24(sp)
 2a8:	6942                	c.ldsp	s2,16(sp)
 2aa:	69a2                	c.ldsp	s3,8(sp)
 2ac:	6145                	c.addi16sp	sp,48
 2ae:	8082                	c.jr	ra

00000000000002b0 <.L35>:
 2b0:	00000097          	auipc	ra,0x0
 2b4:	000080e7          	jalr	ra # 2b0 <.L35>
 2b8:	05093603          	ld	a2,80(s2)
 2bc:	c21d                	c.beqz	a2,2e2 <.L37>

00000000000002be <.L31>:
 2be:	0009a703          	lw	a4,0(s3)
 2c2:	85aa                	c.mv	a1,a0
 2c4:	04000693          	li	a3,64
 2c8:	00000517          	auipc	a0,0x0
 2cc:	00050513          	mv	a0,a0
 2d0:	00000097          	auipc	ra,0x0
 2d4:	000080e7          	jalr	ra # 2d0 <.L31+0x12>

00000000000002d8 <.L1^B2>:
 2d8:	9002                	c.ebreak
 2da:	b761                	c.j	262 <.L30>

00000000000002dc <.L36>:
 2dc:	00093603          	ld	a2,0(s2)
 2e0:	bf71                	c.j	27c <.L33>

00000000000002e2 <.L37>:
 2e2:	00093603          	ld	a2,0(s2)
 2e6:	bfe1                	c.j	2be <.L31>

00000000000002e8 <riscv_noncoherent_supported>:
 2e8:	00000797          	auipc	a5,0x0
 2ec:	0007a783          	lw	a5,0(a5) # 2e8 <riscv_noncoherent_supported>
 2f0:	c799                	c.beqz	a5,2fe <.L44>
 2f2:	4785                	c.li	a5,1
 2f4:	00000717          	auipc	a4,0x0
 2f8:	00f70023          	sb	a5,0(a4) # 2f4 <riscv_noncoherent_supported+0xc>
 2fc:	8082                	c.jr	ra

00000000000002fe <.L44>:
 2fe:	1141                	c.addi	sp,-16
 300:	e406                	c.sdsp	ra,8(sp)
 302:	e022                	c.sdsp	s0,0(sp)
 304:	0800                	c.addi4spn	s0,sp,16
 306:	00000517          	auipc	a0,0x0
 30a:	00050513          	mv	a0,a0
 30e:	00000097          	auipc	ra,0x0
 312:	000080e7          	jalr	ra # 30e <.L44+0x10>

0000000000000316 <.L1^B3>:
 316:	9002                	c.ebreak
 318:	60a2                	c.ldsp	ra,8(sp)
 31a:	6402                	c.ldsp	s0,0(sp)
 31c:	4785                	c.li	a5,1
 31e:	00000717          	auipc	a4,0x0
 322:	00f70023          	sb	a5,0(a4) # 31e <.L1^B3+0x8>
 326:	0141                	c.addi	sp,16
 328:	8082                	c.jr	ra

000000000000032a <.L888^B1>:
 32a:	ff010113          	addi	sp,sp,-16
 32e:	00813023          	sd	s0,0(sp)
 332:	00113423          	sd	ra,8(sp)
 336:	01010413          	addi	s0,sp,16
 33a:	000f0713          	mv	a4,t5
 33e:	00078693          	mv	a3,a5
 342:	00088613          	mv	a2,a7
 346:	00080593          	mv	a1,a6
 34a:	000e0513          	mv	a0,t3
 34e:	00000097          	auipc	ra,0x0
 352:	000080e7          	jalr	ra # 34e <.L888^B1+0x24>
 356:	00813083          	ld	ra,8(sp)
 35a:	00013403          	ld	s0,0(sp)
 35e:	01010113          	addi	sp,sp,16

0000000000000362 <.L888^B2>:
 362:	ff010113          	addi	sp,sp,-16
 366:	00813023          	sd	s0,0(sp)
 36a:	00113423          	sd	ra,8(sp)
 36e:	01010413          	addi	s0,sp,16
 372:	00078713          	mv	a4,a5
 376:	00078693          	mv	a3,a5
 37a:	00088613          	mv	a2,a7
 37e:	00080593          	mv	a1,a6
 382:	000e0513          	mv	a0,t3
 386:	00000097          	auipc	ra,0x0
 38a:	000080e7          	jalr	ra # 386 <.L888^B2+0x24>
 38e:	00813083          	ld	ra,8(sp)
 392:	00013403          	ld	s0,0(sp)
 396:	01010113          	addi	sp,sp,16

000000000000039a <.L888^B3>:
 39a:	ff010113          	addi	sp,sp,-16
 39e:	00813023          	sd	s0,0(sp)
 3a2:	00113423          	sd	ra,8(sp)
 3a6:	01010413          	addi	s0,sp,16
 3aa:	000f0713          	mv	a4,t5
 3ae:	00080693          	mv	a3,a6
 3b2:	00088613          	mv	a2,a7
 3b6:	00078593          	mv	a1,a5
 3ba:	000e0513          	mv	a0,t3
 3be:	00000097          	auipc	ra,0x0
 3c2:	000080e7          	jalr	ra # 3be <.L888^B3+0x24>
 3c6:	00813083          	ld	ra,8(sp)
 3ca:	00013403          	ld	s0,0(sp)
 3ce:	01010113          	addi	sp,sp,16

00000000000003d2 <.L888^B4>:
 3d2:	ff010113          	addi	sp,sp,-16
 3d6:	00813023          	sd	s0,0(sp)
 3da:	00113423          	sd	ra,8(sp)
 3de:	01010413          	addi	s0,sp,16
 3e2:	000f0713          	mv	a4,t5
 3e6:	000e8693          	mv	a3,t4
 3ea:	00088613          	mv	a2,a7
 3ee:	00078593          	mv	a1,a5
 3f2:	00030513          	mv	a0,t1
 3f6:	00000097          	auipc	ra,0x0
 3fa:	000080e7          	jalr	ra # 3f6 <.L888^B4+0x24>
 3fe:	00813083          	ld	ra,8(sp)
 402:	00013403          	ld	s0,0(sp)
 406:	01010113          	addi	sp,sp,16

Disassembly of section __ksymtab_strings:

0000000000000000 <__kstrtab_testcall_via_asm_cache>:
   0:	6574                	c.ld	a3,200(a0)
   2:	61637473          	csrrci	s0,0x616,6
   6:	6c6c                	c.ld	a1,216(s0)
   8:	765f 6169 615f      	0x615f6169765f
   e:	635f6d73          	csrrsi	s10,0x635,30
  12:	6361                	c.lui	t1,0x18
  14:	6568                	c.ld	a0,200(a0)
	...

0000000000000017 <__kstrtabns_testcall_via_asm_cache>:
	...

0000000000000018 <__kstrtab_cache_do_nop>:
  18:	68636163          	bltu	t1,t1,69a <.L889^B4+0x290>
  1c:	5f65                	c.li	t5,-7
  1e:	6f64                	c.ld	s1,216(a4)
  20:	6e5f 706f       	0x706f6e5f

0000000000000025 <__kstrtabns_cache_do_nop>:
	...

0000000000000026 <__kstrtab_cache_do_zicbom>:
  26:	68636163          	bltu	t1,t1,6a8 <.L889^B4+0x29e>
  2a:	5f65                	c.li	t5,-7
  2c:	6f64                	c.ld	s1,216(a4)
  2e:	7a5f 6369 6f62      	0x6f6263697a5f
  34:	006d                	c.nop	27

0000000000000036 <__kstrtabns_cache_do_zicbom>:
	...

Disassembly of section .rodata.str1.8:

0000000000000000 <.LC0>:
   0:	3301                	c.addiw	t1,-32
   2:	7325                	c.lui	t1,0xfffe9
   4:	6320                	c.ld	s0,64(a4)
   6:	7361                	c.lui	t1,0xffff8
   8:	253d                	c.addiw	a0,15
   a:	2075                	c.jal	b6 <.LC3+0x2e>
   c:	6176                	c.ldsp	sp,344(sp)
   e:	6464                	c.ld	s1,200(s0)
  10:	3d72                	c.fldsp	fs10,312(sp)
  12:	7025                	c.lui	zero,0xfffe9
  14:	7320                	c.ld	s0,96(a4)
  16:	7a69                	c.lui	s4,0xffffa
  18:	3a65                	c.addiw	s4,-7
  1a:	6c25                	c.lui	s8,0x9
  1c:	0a78                	c.addi4spn	a4,sp,284
	...

0000000000000020 <.LC1>:
  20:	7325                	c.lui	t1,0xfffe9
  22:	2520                	c.fld	fs0,72(a0)
  24:	41203a73          	csrrc	s4,0x412,zero
  28:	4352                	c.lwsp	t1,20(sp)
  2a:	5f48                	c.lw	a0,60(a4)
  2c:	4d44                	c.lw	s1,28(a0)
  2e:	5f41                	c.li	t5,-16
  30:	494d                	c.li	s2,19
  32:	414e                	c.lwsp	sp,208(sp)
  34:	494c                	c.lw	a1,20(a0)
  36:	73204e47          	fmsub.d	ft8,ft0,fs2,fa4,rmm
  3a:	616d                	c.addi16sp	sp,240
  3c:	6c6c                	c.ld	a1,216(s0)
  3e:	7265                	c.lui	tp,0xffff9
  40:	7420                	c.ld	s0,104(s0)
  42:	6168                	c.ld	a0,192(a0)
  44:	206e                	c.fldsp	ft0,216(sp)
  46:	6972                	c.ldsp	s2,280(sp)
  48:	2c766373          	csrrsi	t1,0x2c7,12
  4c:	6d6f6263          	bltu	t5,s6,710 <.L889^B4+0x306>
  50:	622d                	c.lui	tp,0xb
  52:	6f6c                	c.ld	a1,216(a4)
  54:	732d6b63          	bltu	s10,s2,78a <.L889^B4+0x380>
  58:	7a69                	c.lui	s4,0xffffa
  5a:	2065                	c.jal	102 <.LC4+0x32>
  5c:	2528                	c.fld	fa0,72(a0)
  5e:	2064                	c.fld	fs1,192(s0)
  60:	203c                	c.fld	fa5,64(s0)
  62:	6425                	c.lui	s0,0x9
  64:	0029                	c.nop	10
	...

0000000000000068 <.LC2>:
  68:	7261                	c.lui	tp,0xffff8
  6a:	722f6863          	bltu	t5,sp,79a <.L889^B4+0x390>
  6e:	7369                	c.lui	t1,0xffffa
  70:	6d2f7663          	bgeu	t5,s2,73c <.L889^B4+0x332>
  74:	2f6d                	c.addiw	t5,27
  76:	6d64                	c.ld	s1,216(a0)
  78:	2d61                	c.addiw	s10,24
  7a:	6f6e                	c.ldsp	t5,216(sp)
  7c:	636e                	c.ldsp	t1,216(sp)
  7e:	7265686f          	jal	a6,567a4 <.L889^B4+0x5639a>
  82:	6e65                	c.lui	t3,0x19
  84:	2e74                	c.fld	fa3,216(a2)
  86:	          	beq	a0,s2,7a6 <.L889^B4+0x39c>

0000000000000088 <.LC3>:
  88:	7325                	c.lui	t1,0xfffe9
  8a:	2520                	c.fld	fs0,72(a0)
  8c:	64203a73          	csrrc	s4,0x642,zero
  90:	7665                	c.lui	a2,0xffff9
  92:	6369                	c.lui	t1,0x1a
  94:	2065                	c.jal	13c <arch_sync_dma_for_cpu+0x8>
  96:	6f6e                	c.ldsp	t5,216(sp)
  98:	2d6e                	c.fldsp	fs10,216(sp)
  9a:	65686f63          	bltu	a6,s6,6f8 <.L889^B4+0x2ee>
  9e:	6572                	c.ldsp	a0,280(sp)
  a0:	746e                	c.ldsp	s0,248(sp)
  a2:	6220                	c.ld	s0,64(a2)
  a4:	7475                	c.lui	s0,0xffffd
  a6:	6e20                	c.ld	s0,88(a2)
  a8:	6f6e206f          	j	e279e <.L889^B4+0xe2394>
  ac:	2d6e                	c.fldsp	fs10,216(sp)
  ae:	65686f63          	bltu	a6,s6,70c <.L889^B4+0x302>
  b2:	6572                	c.ldsp	a0,280(sp)
  b4:	746e                	c.ldsp	s0,248(sp)
  b6:	6f20                	c.ld	s0,88(a4)
  b8:	6570                	c.ld	a2,200(a0)
  ba:	6172                	c.ldsp	sp,280(sp)
  bc:	6974                	c.ld	a3,208(a0)
  be:	20736e6f          	jal	t3,36ac4 <.L889^B4+0x366ba>
  c2:	70707573          	csrrci	a0,0x707,0
  c6:	6574726f          	jal	tp,47f1c <.L889^B4+0x47b12>
  ca:	0064                	c.addi4spn	s1,sp,12
  cc:	0000                	unimp
	...

00000000000000d0 <.LC4>:
  d0:	6f4e                	c.ldsp	t5,208(sp)
  d2:	2d6e                	c.fldsp	fs10,216(sp)
  d4:	65686f63          	bltu	a6,s6,732 <.L889^B4+0x328>
  d8:	6572                	c.ldsp	a0,280(sp)
  da:	746e                	c.ldsp	s0,248(sp)
  dc:	4420                	c.lw	s0,72(s0)
  de:	414d                	c.li	sp,19
  e0:	7320                	c.ld	s0,96(a4)
  e2:	7075                	c.lui	zero,0xffffd
  e4:	6f70                	c.ld	a2,216(a4)
  e6:	7472                	c.ldsp	s0,312(sp)
  e8:	6520                	c.ld	s0,72(a0)
  ea:	616e                	c.ldsp	sp,216(sp)
  ec:	6c62                	c.ldsp	s8,24(sp)
  ee:	6465                	c.lui	s0,0x19
  f0:	7720                	c.ld	s0,104(a4)
  f2:	7469                	c.lui	s0,0xffffa
  f4:	6f68                	c.ld	a0,216(a4)
  f6:	7475                	c.lui	s0,0xffffd
  f8:	6120                	c.ld	s0,64(a0)
  fa:	6220                	c.ld	s0,64(a2)
  fc:	6f6c                	c.ld	a1,216(a4)
  fe:	73206b63          	bltu	zero,s2,834 <.L889^B4+0x42a>
 102:	7a69                	c.lui	s4,0xffffa
 104:	0a65                	c.addi	s4,25
	...

Disassembly of section .text.unlikely:

0000000000000000 <cache_do_zicbom>:
   0:	1141                	c.addi	sp,-16
   2:	e022                	c.sdsp	s0,0(sp)
   4:	e406                	c.sdsp	ra,8(sp)
   6:	0800                	c.addi4spn	s0,sp,16
   8:	86ae                	c.mv	a3,a1
   a:	8732                	c.mv	a4,a2
   c:	00000597          	auipc	a1,0x0
  10:	00058593          	mv	a1,a1
  14:	862a                	c.mv	a2,a0
  16:	00000517          	auipc	a0,0x0
  1a:	00050513          	mv	a0,a0
  1e:	00000097          	auipc	ra,0x0
  22:	000080e7          	jalr	ra # 1e <cache_do_zicbom+0x1e>
  26:	60a2                	c.ldsp	ra,8(sp)
  28:	6402                	c.ldsp	s0,0(sp)
  2a:	0141                	c.addi	sp,16
  2c:	8082                	c.jr	ra

Disassembly of section .alternative:

0000000000000000 <.alternative>:
	...
  10:	031e                	c.slli	t1,0x7
	...
  32:	0000                	unimp
  34:	031e                	c.slli	t1,0x7
	...
  56:	0000                	unimp
  58:	031e                	c.slli	t1,0x7
	...
  7a:	0000                	unimp
  7c:	031e                	c.slli	t1,0x7
	...

Disassembly of section __bug_table:

0000000000000000 <__bug_table>:
	...
   8:	00d9                	c.addi	ra,22
   a:	0209                	c.addi	tp,2
	...
  14:	020900d3          	fadd.d	ft1,fs2,ft0,rne
	...
  20:	00e2                	c.slli	ra,0x18
  22:	0909                	c.addi	s2,2

Disassembly of section .rodata:

0000000000000000 <__func__.36886>:
   0:	68636163          	bltu	t1,t1,682 <.L889^B4+0x278>
   4:	5f65                	c.li	t5,-7
   6:	6f64                	c.ld	s1,216(a4)
   8:	7a5f 6369 6f62      	0x6f6263697a5f
   e:	006d                	c.nop	27

Disassembly of section .sbss:

0000000000000000 <noncoherent_supported>:
	...

Disassembly of section ___ksymtab+cache_do_nop:

0000000000000000 <__ksymtab_cache_do_nop>:
	...

Disassembly of section ___ksymtab+cache_do_zicbom:

0000000000000000 <__ksymtab_cache_do_zicbom>:
	...

Disassembly of section ___ksymtab+testcall_via_asm_cache:

0000000000000000 <__ksymtab_testcall_via_asm_cache>:
	...

Disassembly of section .comment:

0000000000000000 <.comment>:
   0:	4700                	c.lw	s0,8(a4)
   2:	203a4343          	fmadd.s	ft6,fs4,ft3,ft4,rmm
   6:	5528                	c.lw	a0,104(a0)
   8:	7562                	c.ldsp	a0,56(sp)
   a:	746e                	c.ldsp	s0,248(sp)
   c:	2075                	c.jal	b8 <.L886^B1+0x1c>
   e:	2e39                	c.addiw	t3,14
  10:	2e34                	c.fld	fa3,88(a2)
  12:	2d30                	c.fld	fa2,88(a0)
  14:	7531                	c.lui	a0,0xfffec
  16:	7562                	c.ldsp	a0,56(sp)
  18:	746e                	c.ldsp	s0,248(sp)
  1a:	3175                	c.addiw	sp,-3
  1c:	327e                	c.fldsp	ft4,504(sp)
  1e:	2e30                	c.fld	fa2,88(a2)
  20:	3430                	c.fld	fa2,104(s0)
  22:	2029                	c.jal	2c <.L16+0x2>
  24:	2e39                	c.addiw	t3,14
  26:	2e34                	c.fld	fa3,88(a2)
  28:	0030                	c.addi4spn	a2,sp,8


* Re: [PATCH 5/7] RISC-V: fix auipc-jalr addresses in patched alternatives
  2022-11-21 21:31         ` Lad, Prabhakar
@ 2022-11-21 22:17           ` Heiko Stübner
  2022-11-21 22:38             ` Heiko Stübner
                               ` (2 more replies)
  0 siblings, 3 replies; 51+ messages in thread
From: Heiko Stübner @ 2022-11-21 22:17 UTC (permalink / raw)
  To: Lad, Prabhakar
  Cc: linux-riscv, palmer, christoph.muellner, conor, philipp.tomsich,
	ajones, emil.renner.berthing

Am Montag, 21. November 2022, 22:31:36 CET schrieb Lad, Prabhakar:
> Hi Heiko,
> 
> On Mon, Nov 21, 2022 at 3:06 PM Lad, Prabhakar
> <prabhakar.csengg@gmail.com> wrote:
> >
> > Hi Heiko,
> >
> > On Mon, Nov 21, 2022 at 11:27 AM Heiko Stübner <heiko@sntech.de> wrote:
> > >
> > > Hi,
> > >
> > > Am Montag, 21. November 2022, 10:50:09 CET schrieb Lad, Prabhakar:
> > > > On Thu, Nov 10, 2022 at 4:50 PM Heiko Stuebner <heiko@sntech.de> wrote:
> > > > >
> > > > > From: Heiko Stuebner <heiko.stuebner@vrull.eu>
> > > > >
> > > > > Alternatives live in a different section, so addresses used by call
> > > > > functions will point to wrong locations after the patch got applied.
> > > > >
> > > > > Similar to arm64, adjust the location to consider that offset.
> > > > >
> > > > > Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
> > > > > ---
> > >
> > > [...]
> > >
> > > > I have the below assembly code which I have tested without the
> > > > alternatives for the RZ/Five CMO,
> > > >
> > > > #define ALT_CMO_OP(_op, _start, _size, _cachesize, _dir, _ops)        \
> > > > asm volatile(".option push\n\t\n\t"                    \
> > > >          ".option norvc\n\t"                    \
> > > >          ".option norelax\n\t"                    \
> > > >          "addi sp,sp,-16\n\t"                    \
> > > >          "sd    s0,0(sp)\n\t"                    \
> > > >          "sd    ra,8(sp)\n\t"                    \
> > > >          "addi    s0,sp,16\n\t"                    \
> > > >          "mv a4,%6\n\t"                        \
> > > >          "mv a3,%5\n\t"                        \
> > > >          "mv a2,%4\n\t"                        \
> > > >          "mv a1,%3\n\t"                        \
> > > >          "mv a0,%0\n\t"                        \
> > > >          "call rzfive_cmo\n\t"                    \
> > > >          "ld    ra,8(sp)\n\t"                    \
> > > >          "ld    s0,0(sp)\n\t"                    \
> > > >          "addi    sp,sp,16\n\t"                    \
> > > >          ".option pop\n\t"                        \
> > > >          : : "r"(_cachesize),                    \
> > > >          "r"((unsigned long)(_start) & ~((_cachesize) - 1UL)),    \
> > > >          "r"((unsigned long)(_start) + (_size)),            \
> > > >          "r"((unsigned long) (_start)),                \
> > > >          "r"((unsigned long) (_size)),                \
> > > >          "r"((unsigned long) (_dir)),                \
> > > >          "r"((unsigned long) (_ops))                \
> > > >          : "a0", "a1", "a2", "a3", "a4", "memory")
> > > >
> > > > Now when I integrate this with ALTERNATIVE_2() as below,
> > > >
> > > > #define ALT_CMO_OP(_op, _start, _size, _cachesize, _dir, _ops)        \
> > > > asm volatile(ALTERNATIVE_2(                        \
> > > >     __nops(14),                            \
> > > >     "mv a0, %1\n\t"                            \
> > > >     "j 2f\n\t"                            \
> > > >     "3:\n\t"                            \
> > > >     "cbo." __stringify(_op) " (a0)\n\t"                \
> > > >     "add a0, a0, %0\n\t"                        \
> > > >     "2:\n\t"                            \
> > > >     "bltu a0, %2, 3b\n\t"                        \
> > > >     __nops(8), 0, CPUFEATURE_ZICBOM, CONFIG_RISCV_ISA_ZICBOM,    \
> > > >     ".option push\n\t\n\t"                        \
> > > >     ".option norvc\n\t"                        \
> > > >     ".option norelax\n\t"                        \
> > > >     "addi sp,sp,-16\n\t"                        \
> > > >     "sd    s0,0(sp)\n\t"                        \
> > > >     "sd    ra,8(sp)\n\t"                        \
> > > >     "addi    s0,sp,16\n\t"                        \
> > > >     "mv a4,%6\n\t"                            \
> > > >     "mv a3,%5\n\t"                            \
> > > >     "mv a2,%4\n\t"                            \
> > > >     "mv a1,%3\n\t"                            \
> > > >     "mv a0,%0\n\t"                            \
> > > >     "call rzfive_cmo\n\t"                \
> > > >     "ld    ra,8(sp)\n\t"                        \
> > > >     "ld    s0,0(sp)\n\t"                        \
> > > >     "addi    sp,sp,16\n\t"                        \
> > > >     ".option pop\n\t"                        \
> > > >     , ANDESTECH_VENDOR_ID,                        \
> > > >             ERRATA_ANDESTECH_NO_IOCP, CONFIG_ERRATA_RZFIVE_CMO)    \
> > > >     : : "r"(_cachesize),                        \
> > > >     "r"((unsigned long)(_start) & ~((_cachesize) - 1UL)),    \
> > > >     "r"((unsigned long)(_start) + (_size)),            \
> > > >     "r"((unsigned long) (_start)),                \
> > > >     "r"((unsigned long) (_size)),                \
> > > >     "r"((unsigned long) (_dir)),                \
> > > >     "r"((unsigned long) (_ops))                \
> > > >     : "a0", "a1", "a2", "a3", "a4", "memory")
> > > >
> > > > I am seeing a kernel panic with this change. Looking at the
> > > > riscv_alternative_fix_auipc_jalr() implementation, it assumes the rest
> > > > of the alternative options are calls too. Is my understanding correct
> > > > here?
> > >
> > > After the location has been patched, the loop walks through the
> > > instructions and checks whether an instruction is an auipc and the next
> > > one is a jalr; only then does it adjust the address accordingly.
> > >
> > Ok so my understanding was wrong here.
> >
> > > So it _should_ leave all other (non auipc+jalr) instructions alone.
> > > (hopefully)
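
To make the check described above concrete, here is a minimal user-space
sketch in C. It is not the kernel implementation. It scans a buffer of 32-bit
instruction words, treats an auipc that is immediately followed by a jalr
through the same register as one call pair, and re-bases the combined
pc-relative offset. The helper names (pair_offset, set_pair_offset,
fix_auipc_jalr), the rd/rs1 register check and the sign convention of delta
(new address minus the address the code was assembled for) are assumptions
made for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define OPCODE_MASK  0x7fu
#define OPCODE_AUIPC 0x17u   /* U-type: imm[31:12] in bits 31:12 */
#define OPCODE_JALR  0x67u   /* I-type: imm[11:0] in bits 31:20, funct3 == 0 */

static int is_auipc(uint32_t insn)
{
	return (insn & OPCODE_MASK) == OPCODE_AUIPC;
}

static int is_jalr(uint32_t insn)
{
	return (insn & OPCODE_MASK) == OPCODE_JALR && ((insn >> 12) & 0x7u) == 0;
}

static uint32_t reg_rd(uint32_t insn)  { return (insn >> 7) & 0x1fu; }
static uint32_t reg_rs1(uint32_t insn) { return (insn >> 15) & 0x1fu; }

/* Combined pc-relative offset encoded by an auipc+jalr pair. */
static int32_t pair_offset(uint32_t auipc, uint32_t jalr)
{
	uint32_t hi = auipc & 0xfffff000u;
	uint32_t lo = jalr >> 20;

	if (lo & 0x800u)             /* sign-extend the 12-bit immediate */
		lo |= 0xfffff000u;
	return (int32_t)(hi + lo);
}

/* Write a new combined offset back into the pair, keeping all other bits. */
static void set_pair_offset(uint32_t *auipc, uint32_t *jalr, int32_t off)
{
	uint32_t u = (uint32_t)off;
	/* +0x800 compensates for the sign extension of the low 12 bits. */
	uint32_t hi = (u + 0x800u) & 0xfffff000u;
	uint32_t lo = u & 0xfffu;

	*auipc = (*auipc & 0x00000fffu) | hi;
	*jalr  = (*jalr  & 0x000fffffu) | (lo << 20);
}

/*
 * Assumption: delta = (address the code now lives at) - (address it was
 * assembled for), so the pc-relative offset has to shrink by delta.
 * Returns the number of pairs that were rewritten; everything that is not
 * an auipc+jalr pair is left untouched.
 */
size_t fix_auipc_jalr(uint32_t *insns, size_t count, int32_t delta)
{
	size_t fixed = 0;

	for (size_t i = 0; i + 1 < count; i++) {
		uint32_t *a = &insns[i];
		uint32_t *j = &insns[i + 1];

		if (!is_auipc(*a) || !is_jalr(*j))
			continue;
		if (reg_rd(*a) != reg_rs1(*j))   /* not one call sequence */
			continue;

		uint32_t off = (uint32_t)pair_offset(*a, *j) - (uint32_t)delta;
		set_pair_offset(a, j, (int32_t)off);
		fixed++;
		i++;                             /* skip the jalr just handled */
	}
	return fixed;
}
```

Calling fix_auipc_jalr() on a buffer that holds only nops (0x13) returns 0 and
changes nothing, which is the "leave everything else alone" behaviour
described above. The rd/rs1 check is a conservative choice for this sketch so
that an unrelated auipc that merely precedes a jalr is not rewritten.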
> > >
> > Agreed.
> >
> > >
> > > > Do you think this is the correct approach in my case?
> > >
> > > It does look correct at first glance.
> > >
> > \o/
> >
> > > As I had that passing thought, are you actually calling
> > >         riscv_alternative_fix_auipc_jalr()
> > > from your errata/.../foo.c after doing the patching?
> > >
> > > I.e. with the current patchset, that function is only called from the
> > > cpufeature part, but for example not from the other patching locations.
> > > [and a future revision should probably change that :-) ]
> > >
> > >
> > I have made a local copy of riscv_alternative_fix_auipc_jalr() and
> > am calling it after patch_text_nosync(), following your patch for the
> > str functions.
> >
> > > After making sure that function actually runs, the next thing you could try
> > > is to have both the "original" code and the patch be identical, i.e.
> > > replace the cbo* part with your code as well and then just output the
> > > instructions via printk to check what the addresses do in both.
> > >
> > > After riscv_alternative_fix_auipc_jalr() has run, both code variants
> > > should be identical when using the same code in both areas.
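
The comparison suggested here can be sketched as a small word-by-word check.
This is a hypothetical user-space helper in C, not kernel code: the names
first_mismatch and report_mismatch are made up for illustration, and in the
kernel the output would go through printk/pr_err rather than printf.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Index of the first differing 32-bit word, or count if both are equal. */
size_t first_mismatch(const uint32_t *a, const uint32_t *b, size_t count)
{
	for (size_t i = 0; i < count; i++)
		if (a[i] != b[i])
			return i;
	return count;
}

/* Print where the reference code and the patched code diverge, if at all. */
void report_mismatch(const uint32_t *expected, const uint32_t *patched,
		     size_t count)
{
	size_t i = first_mismatch(expected, patched, count);

	if (i == count)
		printf("all %zu words match\n", count);
	else
		printf("word %zu differs: expected 0x%08" PRIx32
		       ", patched 0x%08" PRIx32 "\n",
		       i, expected[i], patched[i]);
}
```

If the same code is used as both the original and the alternative, any
reported mismatch should fall on the auipc or jalr word, which narrows down
whether the offset fix-up is the part that misbehaves.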
> > >
> > So I have added debug prints, as below, to compare the instructions
> > before and after patching:
> >
> > static void riscv_alternative_print_inst(unsigned int *alt_ptr,
> >                      unsigned int len)
> > {
> >     int num_instr = len / sizeof(u32);
> >     int i;
> >
> >     for (i = 0; i < num_instr; i++)
> >         pr_err("%s instruction: 0x%x\n", __func__, *(alt_ptr + i));
> >
> > }
> >
> > void __init_or_module andes_errata_patch_func(struct alt_entry *begin,
> >                           struct alt_entry *end,
> >                           unsigned long archid, unsigned long impid,
> >                           unsigned int stage)
> > {
> > ....
> >     if (cpu_req_errata & tmp) {
> >         pr_err("stage: %x -> %px--> %x %x %x\n", stage, alt, tmp,
> >                cpu_req_errata, alt->errata_id);
> >         pr_err("old:%ps alt:%ps len:%lx\n", alt->old_ptr,
> >                alt->alt_ptr, alt->alt_len);
> >         pr_err("Print old start\n");
> >         riscv_alternative_print_inst(alt->old_ptr, alt->alt_len);
> >         pr_err("Print old end\n");
> >         patch_text_nosync(alt->old_ptr, alt->alt_ptr, alt->alt_len);
> >
> >         riscv_alternative_fix_auipc_jalr(alt->old_ptr, alt->alt_len,
> >                         alt->old_ptr - alt->alt_ptr);
> >         pr_err("Print patch start\n");
> >         riscv_alternative_print_inst(alt->alt_ptr, alt->alt_len);
> >         pr_err("Print patch end\n");
> >     }
> > .....
> > }
> >
> > Below is the log:
> > [    0.000000] riscv_alternative_print_inst instruction: 0x13
> > [    0.000000] riscv_alternative_print_inst instruction: 0x13
> > [    0.000000] riscv_alternative_print_inst instruction: 0x13
> > [    0.000000] riscv_alternative_print_inst instruction: 0x13
> > [    0.000000] riscv_alternative_print_inst instruction: 0x13
> > [    0.000000] riscv_alternative_print_inst instruction: 0x13
> > [    0.000000] riscv_alternative_print_inst instruction: 0x13
> > [    0.000000] riscv_alternative_print_inst instruction: 0x13
> > [    0.000000] riscv_alternative_print_inst instruction: 0x13
> > [    0.000000] riscv_alternative_print_inst instruction: 0x13
> > [    0.000000] riscv_alternative_print_inst instruction: 0x13
> > [    0.000000] riscv_alternative_print_inst instruction: 0x13
> > [    0.000000] riscv_alternative_print_inst instruction: 0x13
> > [    0.000000] riscv_alternative_print_inst instruction: 0x13
> > [    0.000000] Print new old end
> > [    0.000000] riscv_alternative_fix_auipc_jalr num instruction: 14
> > [    0.000000] Print patch start
> > [    0.000000] riscv_alternative_print_inst instruction: 0xff010113
> > [    0.000000] riscv_alternative_print_inst instruction: 0x813023
> > [    0.000000] riscv_alternative_print_inst instruction: 0x113423
> > [    0.000000] riscv_alternative_print_inst instruction: 0x1010413
> > [    0.000000] riscv_alternative_print_inst instruction: 0xf0713
> > [    0.000000] riscv_alternative_print_inst instruction: 0x78693
> > [    0.000000] riscv_alternative_print_inst instruction: 0x88613
> > [    0.000000] riscv_alternative_print_inst instruction: 0x80593
> > [    0.000000] riscv_alternative_print_inst instruction: 0xe0513
> > [    0.000000] riscv_alternative_print_inst instruction: 0x97
> > [    0.000000] riscv_alternative_print_inst instruction: 0xcba080e7
> > [    0.000000] riscv_alternative_print_inst instruction: 0x813083
> > [    0.000000] riscv_alternative_print_inst instruction: 0x13403
> > [    0.000000] riscv_alternative_print_inst instruction: 0x1010113
> > [    0.000000] Print patch end
> > [    0.000000] stage: 0 -> ffffffff80a2492c--> 1 1 0
> > [    0.000000] old:arch_sync_dma_for_device alt:riscv_noncoherent_supported len:38
> > [    0.000000] Print  old start
> > [    0.000000] riscv_alternative_print_inst instruction: 0x13
> > [    0.000000] riscv_alternative_print_inst instruction: 0x13
> > [    0.000000] riscv_alternative_print_inst instruction: 0x13
> > [    0.000000] riscv_alternative_print_inst instruction: 0x13
> > [    0.000000] riscv_alternative_print_inst instruction: 0x13
> > [    0.000000] riscv_alternative_print_inst instruction: 0x13
> > [    0.000000] riscv_alternative_print_inst instruction: 0x13
> > [    0.000000] riscv_alternative_print_inst instruction: 0x13
> > [    0.000000] riscv_alternative_print_inst instruction: 0x13
> > [    0.000000] riscv_alternative_print_inst instruction: 0x13
> > [    0.000000] riscv_alternative_print_inst instruction: 0x13
> > [    0.000000] riscv_alternative_print_inst instruction: 0x13
> > [    0.000000] riscv_alternative_print_inst instruction: 0x13
> > [    0.000000] riscv_alternative_print_inst instruction: 0x970013
> >     ====================> This instruction doesn't look correct; it
> > should be 0x13?
> > [    0.000000] Print  old end
> > [    0.000000] riscv_alternative_fix_auipc_jalr num instruction: 14
> > [    0.000000] Print patch start
> > [    0.000000] riscv_alternative_print_inst instruction: 0xff010113
> > [    0.000000] riscv_alternative_print_inst instruction: 0x813023
> > [    0.000000] riscv_alternative_print_inst instruction: 0x113423
> > [    0.000000] riscv_alternative_print_inst instruction: 0x1010413
> > [    0.000000] riscv_alternative_print_inst instruction: 0x78713
> > [    0.000000] riscv_alternative_print_inst instruction: 0x78693
> > [    0.000000] riscv_alternative_print_inst instruction: 0x88613
> > [    0.000000] riscv_alternative_print_inst instruction: 0x80593
> > [    0.000000] riscv_alternative_print_inst instruction: 0xe0513
> > [    0.000000] riscv_alternative_print_inst instruction: 0x97
> > [    0.000000] riscv_alternative_print_inst instruction: 0xc82080e7
> > ====================> This instruction doesn't look correct. Compared
> > to the objdump output, this should be 000080e7; or does it require the
> > offset too?
> > [    0.000000] riscv_alternative_print_inst instruction: 0x813083
> > [    0.000000] riscv_alternative_print_inst instruction: 0x13403
> > [    0.000000] riscv_alternative_print_inst instruction: 0x1010113
> > [    0.000000] Print patch end
> > [    0.000000] stage: 0 -> ffffffff80a24950--> 1 1 0
> > [    0.000000] old:arch_sync_dma_for_cpu alt:riscv_noncoherent_supported len:38
> > [    0.000000] Print old start
> > [    0.000000] riscv_alternative_print_inst instruction: 0x13
> > [    0.000000] riscv_alternative_print_inst instruction: 0x13
> > [    0.000000] riscv_alternative_print_inst instruction: 0x13
> > [    0.000000] riscv_alternative_print_inst instruction: 0x13
> > [    0.000000] riscv_alternative_print_inst instruction: 0x13
> > [    0.000000] riscv_alternative_print_inst instruction: 0x97
> > ====================> This instruction doesn't look correct; it should
> > be 0x13
> > [    0.000000] riscv_alternative_print_inst instruction: 0xeee080e7
> >       ====================> This instruction doesn't look correct; it
> > should be 0x13
> > [    0.000000] riscv_alternative_print_inst instruction: 0x13
> > [    0.000000] riscv_alternative_print_inst instruction: 0x13
> > [    0.000000] riscv_alternative_print_inst instruction: 0x13
> > [    0.000000] riscv_alternative_print_inst instruction: 0x13
> > [    0.000000] riscv_alternative_print_inst instruction: 0x13
> > [    0.000000] riscv_alternative_print_inst instruction: 0x13
> > [    0.000000] riscv_alternative_print_inst instruction: 0x13
> > [    0.000000] Print old end
> > [    0.000000] riscv_alternative_fix_auipc_jalr num instruction: 14
> > [    0.000000] Print patch start
> > [    0.000000] riscv_alternative_print_inst instruction: 0xff010113
> > [    0.000000] riscv_alternative_print_inst instruction: 0x813023
> > [    0.000000] riscv_alternative_print_inst instruction: 0x113423
> > [    0.000000] riscv_alternative_print_inst instruction: 0x1010413
> > [    0.000000] riscv_alternative_print_inst instruction: 0xf0713
> > [    0.000000] riscv_alternative_print_inst instruction: 0x80693
> > [    0.000000] riscv_alternative_print_inst instruction: 0x88613
> > [    0.000000] riscv_alternative_print_inst instruction: 0x78593
> > [    0.000000] riscv_alternative_print_inst instruction: 0xe0513
> > [    0.000000] riscv_alternative_print_inst instruction: 0x97
> > [    0.000000] riscv_alternative_print_inst instruction: 0xc4a080e7
> > ====================> This instruction doesn't look correct. Compared
> > to the objdump output, this should be 000080e7; or does it require the
> > offset too?
> > [    0.000000] riscv_alternative_print_inst instruction: 0x813083
> > [    0.000000] riscv_alternative_print_inst instruction: 0x13403
> > [    0.000000] riscv_alternative_print_inst instruction: 0x1010113
> > [    0.000000] Print patch end
> > [    0.000000] stage: 0 -> ffffffff80a24974--> 1 1 0
> > [    0.000000] old:arch_dma_prep_coherent alt:riscv_noncoherent_supported len:38
> > [    0.000000] Print old start
> > [    0.000000] riscv_alternative_print_inst instruction: 0x13
> > [    0.000000] riscv_alternative_print_inst instruction: 0x970013
> > ====================> This instruction doesn't look correct it should
> > be 0x13
> > [    0.000000] riscv_alternative_print_inst instruction: 0x80e70000
> > ====================> This instruction doesn't look correct it should
> > be 0x13
> > [    0.000000] riscv_alternative_print_inst instruction: 0xe720
> > ====================> This instruction doesn't look correct it should
> > be 0x13
> > [    0.000000] riscv_alternative_print_inst instruction: 0x13
> > [    0.000000] riscv_alternative_print_inst instruction: 0x13
> > [    0.000000] riscv_alternative_print_inst instruction: 0x13
> > [    0.000000] riscv_alternative_print_inst instruction: 0x13
> > [    0.000000] riscv_alternative_print_inst instruction: 0x13
> > [    0.000000] riscv_alternative_print_inst instruction: 0x13
> > [    0.000000] riscv_alternative_print_inst instruction: 0x13
> > [    0.000000] riscv_alternative_print_inst instruction: 0x13
> > [    0.000000] riscv_alternative_print_inst instruction: 0x13
> > [    0.000000] riscv_alternative_print_inst instruction: 0x13
> > [    0.000000] Print old end
> > [    0.000000] riscv_alternative_fix_auipc_jalr num instruction: 14
> > [    0.000000] Print patch start
> > [    0.000000] riscv_alternative_print_inst instruction: 0xff010113
> > [    0.000000] riscv_alternative_print_inst instruction: 0x813023
> > [    0.000000] riscv_alternative_print_inst instruction: 0x113423
> > [    0.000000] riscv_alternative_print_inst instruction: 0x1010413
> > [    0.000000] riscv_alternative_print_inst instruction: 0xf0713
> > [    0.000000] riscv_alternative_print_inst instruction: 0xe8693
> > [    0.000000] riscv_alternative_print_inst instruction: 0x88613
> > [    0.000000] riscv_alternative_print_inst instruction: 0x78593
> > [    0.000000] riscv_alternative_print_inst instruction: 0x30513
> > [    0.000000] riscv_alternative_print_inst instruction: 0x97
> > [    0.000000] riscv_alternative_print_inst instruction: 0xc12080e7
> > ====================> This instruction doesn't look correct comparing
> > to objdump output this should be 000080e7 + offset?
> > [    0.000000] riscv_alternative_print_inst instruction: 0x813083
> > [    0.000000] riscv_alternative_print_inst instruction: 0x13403
> > [    0.000000] riscv_alternative_print_inst instruction: 0x1010113
> > [    0.000000] Print patch end
> >
> > Here is the output from objdump of the file (dma-noncoherent.o):
> >
> > 000000000000032e <.L888^B1>:
> >  32e:    ff010113              addi    sp,sp,-16
> >  332:    00813023              sd    s0,0(sp)
> >  336:    00113423              sd    ra,8(sp)
> >  33a:    01010413              addi    s0,sp,16
> >  33e:    000f0713              mv    a4,t5
> >  342:    00078693              mv    a3,a5
> >  346:    00088613              mv    a2,a7
> >  34a:    00080593              mv    a1,a6
> >  34e:    000e0513              mv    a0,t3
> >  352:    00000097              auipc    ra,0x0
> >  356:    000080e7              jalr    ra # 352 <.L888^B1+0x24>
> >  35a:    00813083              ld    ra,8(sp)
> >  35e:    00013403              ld    s0,0(sp)
> >  362:    01010113              addi    sp,sp,16
> >
> > 0000000000000366 <.L888^B2>:
> >  366:    ff010113              addi    sp,sp,-16
> >  36a:    00813023              sd    s0,0(sp)
> >  36e:    00113423              sd    ra,8(sp)
> >  372:    01010413              addi    s0,sp,16
> >  376:    00078713              mv    a4,a5
> >  37a:    00078693              mv    a3,a5
> >  37e:    00088613              mv    a2,a7
> >  382:    00080593              mv    a1,a6
> >  386:    000e0513              mv    a0,t3
> >  38a:    00000097              auipc    ra,0x0
> >  38e:    000080e7              jalr    ra # 38a <.L888^B2+0x24>
> >  392:    00813083              ld    ra,8(sp)
> >  396:    00013403              ld    s0,0(sp)
> >  39a:    01010113              addi    sp,sp,16
> >
> > 000000000000039e <.L888^B3>:
> >  39e:    ff010113              addi    sp,sp,-16
> >  3a2:    00813023              sd    s0,0(sp)
> >  3a6:    00113423              sd    ra,8(sp)
> >  3aa:    01010413              addi    s0,sp,16
> >  3ae:    000f0713              mv    a4,t5
> >  3b2:    00080693              mv    a3,a6
> >  3b6:    00088613              mv    a2,a7
> >  3ba:    00078593              mv    a1,a5
> >  3be:    000e0513              mv    a0,t3
> >  3c2:    00000097              auipc    ra,0x0
> >  3c6:    000080e7              jalr    ra # 3c2 <.L888^B3+0x24>
> >  3ca:    00813083              ld    ra,8(sp)
> >  3ce:    00013403              ld    s0,0(sp)
> >  3d2:    01010113              addi    sp,sp,16
> >
> > 00000000000003d6 <.L888^B4>:
> >  3d6:    ff010113              addi    sp,sp,-16
> >  3da:    00813023              sd    s0,0(sp)
> >  3de:    00113423              sd    ra,8(sp)
> >  3e2:    01010413              addi    s0,sp,16
> >  3e6:    000f0713              mv    a4,t5
> >  3ea:    000e8693              mv    a3,t4
> >  3ee:    00088613              mv    a2,a7
> >  3f2:    00078593              mv    a1,a5
> >  3f6:    00030513              mv    a0,t1
> >  3fa:    00000097              auipc    ra,0x0
> >  3fe:    000080e7              jalr    ra # 3fa <.L888^B4+0x24>
> >  402:    00813083              ld    ra,8(sp)
> >  406:    00013403              ld    s0,0(sp)
> >  40a:    01010113              addi    sp,sp,16
> >
> > Disassembly of section __ksymtab_strings:
> >
> > Any pointers what could be happening?
> >
> 
> Some more information,
> 
> - If I drop the riscv_alternative_fix_auipc_jalr() call after
> patch_text_nosync() and then print the alt->old_ptr instructions
> before patching I can see the instructions as 0x13 (nop) which is
> correct.
> 
> - if I call riscv_alternative_fix_auipc_jalr() call after
> patch_text_nosync() and then print the alt->old_ptr instructions
> before patching I dont see 0x13 (nop) consistently for old
> instructions.

which is to be expected I guess.

alt->old_ptr points to the memory location where the live kernel code
lives.

I.e. the code at this location is the thing the kernel actually runs.
The code at this location then gets overwritten by the alternative
assembly.


> - If I replace the nop's in the old instructions with my assembly code
> of rz/five cmo and then just use patch_text_nosync() I can see the
> correct actual instruction being printed apart from jalr (is some sort
> of offset added to it as I see last 4 bits match?) and then is
> replaced correctly by the same alt instructions apart from the jalr
> (log [0]).
> 
> - If I replace the nop's in the old instructions with my assembly code
> of rz/five cmo and then use patch_text_nosync() and
> riscv_alternative_fix_auipc_jalr() I can see the actual old
> instructions differs a bit and again the jalr instruction differs too
> in the patched code (log [1]).
> 
> [0] https://paste.debian.net/1261412/
> [1] https://paste.debian.net/1261413/
> 
> Attached is the objump of dma-noncoherent.o for reference.

I did read that objdumps are not really conclusive when looking
at auipc + jalr instructions, hence the printing of the actual instructions.

Either manually or with a helper like

	https://luplab.gitlab.io/rvcodecjs/#q=0xf4c080e7

you can then decode the actual instruction and compare.

In your log the two jalr instructions decode to different offsets,
	jalr x1, x1, -180
vs
	jalr x1, x1, -834

Can you check what the patch_offset value is in your case?

Interestingly the
	auipc x1, 0
is 0 for both cases.
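
For anyone who prefers to decode by hand, here is a minimal sketch of the
field extraction in plain C. The helper names are made up for illustration
and are not the macros from the parse_asm header; only the bit positions
come from the RISC-V base ISA encoding.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration, not the kernel's parse_asm macros. */

/* Low 7 bits hold the major opcode. */
static uint32_t insn_opcode(uint32_t insn)
{
	return insn & 0x7f;
}

/* Destination register lives in bits 11:7. */
static uint32_t insn_rd(uint32_t insn)
{
	return (insn >> 7) & 0x1f;
}

/* auipc is opcode 0x17 (U-type). */
static int is_auipc(uint32_t insn)
{
	return insn_opcode(insn) == 0x17;
}

/* jalr is opcode 0x67 with funct3 (bits 14:12) == 0 (I-type). */
static int is_jalr(uint32_t insn)
{
	return insn_opcode(insn) == 0x67 && ((insn >> 12) & 0x7) == 0;
}

/* I-type immediate: bits 31:20, sign-extended from 12 bits. */
static int32_t itype_imm(uint32_t insn)
{
	return (int32_t)((insn >> 20) ^ 0x800) - 0x800;
}

/* U-type immediate: bits 31:12, already in place as the upper 20 bits. */
static int32_t utype_imm(uint32_t insn)
{
	return (int32_t)(insn & 0xfffff000u);
}
```

With these, itype_imm(0xf4c080e7) gives -180 (the value from the decoder
link above) and utype_imm(0x97) gives 0, matching the "auipc x1, 0".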

I'll try to build a real test-setup mimicking what you're doing
tomorrow (european tomorrow).


Heiko



_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 5/7] RISC-V: fix auipc-jalr addresses in patched alternatives
  2022-11-21 22:17           ` Heiko Stübner
@ 2022-11-21 22:38             ` Heiko Stübner
  2022-11-22  0:16               ` Lad, Prabhakar
  2022-11-21 23:59             ` Lad, Prabhakar
  2022-11-22 10:59             ` Lad, Prabhakar
  2 siblings, 1 reply; 51+ messages in thread
From: Heiko Stübner @ 2022-11-21 22:38 UTC (permalink / raw)
  To: Lad, Prabhakar
  Cc: linux-riscv, palmer, christoph.muellner, conor, philipp.tomsich,
	ajones, emil.renner.berthing

Am Montag, 21. November 2022, 23:17:11 CET schrieb Heiko Stübner:
> Am Montag, 21. November 2022, 22:31:36 CET schrieb Lad, Prabhakar:
> > Some more information,
> > 
> > - If I drop the riscv_alternative_fix_auipc_jalr() call after
> > patch_text_nosync() and then print the alt->old_ptr instructions
> > before patching I can see the instructions as 0x13 (nop) which is
> > correct.
> > 
> > - if I call riscv_alternative_fix_auipc_jalr() call after
> > patch_text_nosync() and then print the alt->old_ptr instructions
> > before patching I dont see 0x13 (nop) consistently for old
> > instructions.
> 
> which is to be expected I guess.
> 
> alt->old_ptr points to the memory location where the live kernel code
> lives.
> 
> I.e. the code at this location is the thing the kernel actually runs.
> The code at this location then gets overwritten by the alternative
> assembly.
> 
> 
> > - If I replace the nop's in the old instructions with my assembly code
> > of rz/five cmo and then just use patch_text_nosync() I can see the
> > correct actual instruction being printed apart from jalr (is some sort
> > of offset added to it as I see last 4 bits match?) and then is
> > replaced correctly by the same alt instructions apart from the jalr
> > (log [0]).
> > 
> > - If I replace the nop's in the old instructions with my assembly code
> > of rz/five cmo and then use patch_text_nosync() and
> > riscv_alternative_fix_auipc_jalr() I can see the actual old
> > instructions differs a bit and again the jalr instruction differs too
> > in the patched code (log [1]).
> > 
> > [0] https://paste.debian.net/1261412/
> > [1] https://paste.debian.net/1261413/
> > 
> > Attached is the objump of dma-noncoherent.o for reference.
> 
> I did read that objdumps are not really conclusive when looking
> at auipc + jalr instructions, hence the printing of the actual instructions.
> 
> As either manually or with a helper like
> 
> 	https://luplab.gitlab.io/rvcodecjs/#q=0xf4c080e7
> 
> you can then decode the actual instruction and compare.
> 
> In your log the two jalr instructions decode to different offsets,
> 	jalr x1, x1, -180
> vs
> 	jalr x1, x1, -834
> 
> Can you check what the patch_offset value is in your case?
> 
> Interestingly the
> 	auipc x1, 0
> is 0 for both cases.
> 
> I'll try to build a real test-setup mimicing what you're doing
> tomorrow (european tomorrow).

also, is it possible for you to put your code on some github
or so? Sometimes looking at the actual code makes things
a lot easier :-)

Thanks
Heiko




^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 5/7] RISC-V: fix auipc-jalr addresses in patched alternatives
  2022-11-21 22:17           ` Heiko Stübner
  2022-11-21 22:38             ` Heiko Stübner
@ 2022-11-21 23:59             ` Lad, Prabhakar
  2022-11-22 10:59             ` Lad, Prabhakar
  2 siblings, 0 replies; 51+ messages in thread
From: Lad, Prabhakar @ 2022-11-21 23:59 UTC (permalink / raw)
  To: Heiko Stübner
  Cc: linux-riscv, palmer, christoph.muellner, conor, philipp.tomsich,
	ajones, emil.renner.berthing

Hi Heiko,

On Mon, Nov 21, 2022 at 10:17 PM Heiko Stübner <heiko@sntech.de> wrote:
>
> Am Montag, 21. November 2022, 22:31:36 CET schrieb Lad, Prabhakar:
> > Hi Heiko,
> >
> > On Mon, Nov 21, 2022 at 3:06 PM Lad, Prabhakar
> > <prabhakar.csengg@gmail.com> wrote:
> > >
<snip>
> > Some more information,
> >
> > - If I drop the riscv_alternative_fix_auipc_jalr() call after
> > patch_text_nosync() and then print the alt->old_ptr instructions
> > before patching I can see the instructions as 0x13 (nop) which is
> > correct.
> >
> > - if I call riscv_alternative_fix_auipc_jalr() call after
> > patch_text_nosync() and then print the alt->old_ptr instructions
> > before patching I dont see 0x13 (nop) consistently for old
> > instructions.
>
> which is to be expected I guess.
>
> alt->old_ptr points to the memory location where the live kernel code
> lives.
>
Agreed.

> I.e. the code at this location is the thing the kernel actually runs.
> The code at this location then gets overwritten by the alternative
> assembly.
>
But shouldn't the actual code be nops before patching?

>
> > - If I replace the nop's in the old instructions with my assembly code
> > of rz/five cmo and then just use patch_text_nosync() I can see the
> > correct actual instruction being printed apart from jalr (is some sort
> > of offset added to it as I see last 4 bits match?) and then is
> > replaced correctly by the same alt instructions apart from the jalr
> > (log [0]).
> >
> > - If I replace the nop's in the old instructions with my assembly code
> > of rz/five cmo and then use patch_text_nosync() and
> > riscv_alternative_fix_auipc_jalr() I can see the actual old
> > instructions differs a bit and again the jalr instruction differs too
> > in the patched code (log [1]).
> >
> > [0] https://paste.debian.net/1261412/
> > [1] https://paste.debian.net/1261413/
> >
> > Attached is the objump of dma-noncoherent.o for reference.
>
> I did read that objdumps are not really conclusive when looking
> at auipc + jalr instructions, hence the printing of the actual instructions.
>
> As either manually or with a helper like
>
>         https://luplab.gitlab.io/rvcodecjs/#q=0xf4c080e7
>
> you can then decode the actual instruction and compare.
>
OK, I will give it a try.


> In your log the two jalr instructions decode to different offsets,
>         jalr x1, x1, -180
> vs
>         jalr x1, x1, -834
>
> Can you check what the patch_offset value is in your case?
>
I'll check that and let you know.

> Interestingly the
>         auipc x1, 0
> is 0 for both cases.
>
> I'll try to build a real test-setup mimicing what you're doing
> tomorrow (european tomorrow).
>
Thank you!

Cheers,
Prabhakar


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 5/7] RISC-V: fix auipc-jalr addresses in patched alternatives
  2022-11-21 22:38             ` Heiko Stübner
@ 2022-11-22  0:16               ` Lad, Prabhakar
  0 siblings, 0 replies; 51+ messages in thread
From: Lad, Prabhakar @ 2022-11-22  0:16 UTC (permalink / raw)
  To: Heiko Stübner
  Cc: linux-riscv, palmer, christoph.muellner, conor, philipp.tomsich,
	ajones, emil.renner.berthing

Hi Heiko,

On Mon, Nov 21, 2022 at 10:38 PM Heiko Stübner <heiko@sntech.de> wrote:
>
> Am Montag, 21. November 2022, 23:17:11 CET schrieb Heiko Stübner:
> > Am Montag, 21. November 2022, 22:31:36 CET schrieb Lad, Prabhakar:
> > > Some more information,
> > >
> > > - If I drop the riscv_alternative_fix_auipc_jalr() call after
> > > patch_text_nosync() and then print the alt->old_ptr instructions
> > > before patching I can see the instructions as 0x13 (nop) which is
> > > correct.
> > >
> > > - if I call riscv_alternative_fix_auipc_jalr() call after
> > > patch_text_nosync() and then print the alt->old_ptr instructions
> > > before patching I dont see 0x13 (nop) consistently for old
> > > instructions.
> >
> > which is to be expected I guess.
> >
> > alt->old_ptr points to the memory location where the live kernel code
> > lives.
> >
> > I.e. the code at this location is the thing the kernel actually runs.
> > The code at this location then gets overwritten by the alternative
> > assembly.
> >
> >
> > > - If I replace the nop's in the old instructions with my assembly code
> > > of rz/five cmo and then just use patch_text_nosync() I can see the
> > > correct actual instruction being printed apart from jalr (is some sort
> > > of offset added to it as I see last 4 bits match?) and then is
> > > replaced correctly by the same alt instructions apart from the jalr
> > > (log [0]).
> > >
> > > - If I replace the nop's in the old instructions with my assembly code
> > > of rz/five cmo and then use patch_text_nosync() and
> > > riscv_alternative_fix_auipc_jalr() I can see the actual old
> > > instructions differs a bit and again the jalr instruction differs too
> > > in the patched code (log [1]).
> > >
> > > [0] https://paste.debian.net/1261412/
> > > [1] https://paste.debian.net/1261413/
> > >
> > > Attached is the objump of dma-noncoherent.o for reference.
> >
> > I did read that objdumps are not really conclusive when looking
> > at auipc + jalr instructions, hence the printing of the actual instructions.
> >
> > As either manually or with a helper like
> >
> >       https://luplab.gitlab.io/rvcodecjs/#q=0xf4c080e7
> >
> > you can then decode the actual instruction and compare.
> >
> > In your log the two jalr instructions decode to different offsets,
> >       jalr x1, x1, -180
> > vs
> >       jalr x1, x1, -834
> >
> > Can you check what the patch_offset value is in your case?
> >
> > Interestingly the
> >       auipc x1, 0
> > is 0 for both cases.
> >
> > I'll try to build a real test-setup mimicing what you're doing
> > tomorrow (european tomorrow).
>
> also, is it possible for you to put your code on some github
> or so? Sometimes looking at the actual code makes things
> a lot easier :-)
>
I have pushed my changes here
https://github.com/prabhakarlad/linux/tree/rzfive-cmo

Cheers,
Prabhakar


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 5/7] RISC-V: fix auipc-jalr addresses in patched alternatives
  2022-11-21 22:17           ` Heiko Stübner
  2022-11-21 22:38             ` Heiko Stübner
  2022-11-21 23:59             ` Lad, Prabhakar
@ 2022-11-22 10:59             ` Lad, Prabhakar
  2022-11-22 11:19               ` Heiko Stübner
  2 siblings, 1 reply; 51+ messages in thread
From: Lad, Prabhakar @ 2022-11-22 10:59 UTC (permalink / raw)
  To: Heiko Stübner
  Cc: linux-riscv, palmer, christoph.muellner, conor, philipp.tomsich,
	ajones, emil.renner.berthing

Hi Heiko,

On Mon, Nov 21, 2022 at 10:17 PM Heiko Stübner <heiko@sntech.de> wrote:
>
> Am Montag, 21. November 2022, 22:31:36 CET schrieb Lad, Prabhakar:
> > Hi Heiko,
> >
<snip>
> As either manually or with a helper like
>
>         https://luplab.gitlab.io/rvcodecjs/#q=0xf4c080e7
>
> you can then decode the actual instruction and compare.
>
> In your log the two jalr instructions decode to different offsets,
>         jalr x1, x1, -180
> vs
>         jalr x1, x1, -834
>
> Can you check what the patch_offset value is in your case?
>
patch_offset for the above case is -654.

Cheers,
Prabhakar


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 5/7] RISC-V: fix auipc-jalr addresses in patched alternatives
  2022-11-22 10:59             ` Lad, Prabhakar
@ 2022-11-22 11:19               ` Heiko Stübner
  2022-11-22 11:37                 ` Heiko Stübner
  0 siblings, 1 reply; 51+ messages in thread
From: Heiko Stübner @ 2022-11-22 11:19 UTC (permalink / raw)
  To: Lad, Prabhakar
  Cc: linux-riscv, palmer, christoph.muellner, conor, philipp.tomsich,
	ajones, emil.renner.berthing

Am Dienstag, 22. November 2022, 11:59:57 CET schrieb Lad, Prabhakar:
> Hi Heiko,
> 
> On Mon, Nov 21, 2022 at 10:17 PM Heiko Stübner <heiko@sntech.de> wrote:
> >
> > Am Montag, 21. November 2022, 22:31:36 CET schrieb Lad, Prabhakar:
> > > Hi Heiko,
> > >
> <snip>
> > As either manually or with a helper like
> >
> >         https://luplab.gitlab.io/rvcodecjs/#q=0xf4c080e7
> >
> > you can then decode the actual instruction and compare.
> >
> > In your log the two jalr instructions decode to different offsets,
> >         jalr x1, x1, -180
> > vs
> >         jalr x1, x1, -834
> >
> > Can you check what the patch_offset value is in your case?
> >
> patch_offset for the above case is -654.

which is a big indicator that the auipc-jalr-fixup function is not catching
the instruction ... i.e. -180 - 654 = -834.
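
To make the arithmetic concrete, here is a rough sketch of what a working
fixup has to compute. It is not the code from the patch: the helper names
are hypothetical, and it assumes that patch_offset is the live code address
minus the alternative's address, which is one reading consistent with the
numbers above.

```c
#include <stdint.h>

/*
 * Hypothetical helpers, not the ones from the patch.
 * Assumption: patch_offset = (live code address) - (alternative's address),
 * so a pc-relative immediate assembled for the alternative's location needs
 * patch_offset subtracted once the code has been copied over.
 */
static int32_t relocate_imm(int32_t imm, int32_t patch_offset)
{
	return imm - patch_offset;
}

/* Upper 20 bits for auipc; the +0x800 compensates for jalr sign-extending lo12. */
static uint32_t hi20(int32_t off)
{
	return (((uint32_t)off + 0x800) >> 12) & 0xfffff;
}

/* Lower 12 bits for jalr, as a signed value in [-2048, 2047]. */
static int32_t lo12(int32_t off)
{
	return (int32_t)(((uint32_t)off & 0xfff) ^ 0x800) - 0x800;
}

/* auipc ra, hi20 (opcode 0x17, rd = x1) */
static uint32_t encode_auipc_ra(int32_t off)
{
	return 0x00000097u | (hi20(off) << 12);
}

/* jalr ra, lo12(ra) (opcode 0x67, rd = rs1 = x1) */
static uint32_t encode_jalr_ra(int32_t off)
{
	return 0x000080e7u | (((uint32_t)lo12(off) & 0xfff) << 20);
}
```

Under that assumption relocate_imm(-834, -654) yields -180, and
encode_jalr_ra(-180) yields 0xf4c080e7, the same word as in the decoder
link quoted above. A jalr that still decodes to -834 after patching
therefore never received the adjustment.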

I managed to reproduce that issue with your branch now
(after hacking up stuff a bit to run in qemu :-) ).

I'll try to find out where the fixup fails.


Heiko




^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 5/7] RISC-V: fix auipc-jalr addresses in patched alternatives
  2022-11-22 11:19               ` Heiko Stübner
@ 2022-11-22 11:37                 ` Heiko Stübner
  2022-11-22 12:28                   ` Lad, Prabhakar
  0 siblings, 1 reply; 51+ messages in thread
From: Heiko Stübner @ 2022-11-22 11:37 UTC (permalink / raw)
  To: Lad, Prabhakar
  Cc: linux-riscv, palmer, christoph.muellner, conor, philipp.tomsich,
	ajones, emil.renner.berthing

Am Dienstag, 22. November 2022, 12:19:40 CET schrieb Heiko Stübner:
> Am Dienstag, 22. November 2022, 11:59:57 CET schrieb Lad, Prabhakar:
> > Hi Heiko,
> > 
> > On Mon, Nov 21, 2022 at 10:17 PM Heiko Stübner <heiko@sntech.de> wrote:
> > >
> > > Am Montag, 21. November 2022, 22:31:36 CET schrieb Lad, Prabhakar:
> > > > Hi Heiko,
> > > >
> > <snip>
> > > As either manually or with a helper like
> > >
> > >         https://luplab.gitlab.io/rvcodecjs/#q=0xf4c080e7
> > >
> > > you can then decode the actual instruction and compare.
> > >
> > > In your log the two jalr instructions decode to different offsets,
> > >         jalr x1, x1, -180
> > > vs
> > >         jalr x1, x1, -834
> > >
> > > Can you check what the patch_offset value is in your case?
> > >
> > patch_offset for the above case is -654.
> 
> which is a big indicator that the auipc-jalr-fixup function is not catching
> the instruction ... i.e. -180 - 654 = -834.
> 
> I managed to reproduce that issue with your branch now
> (after hacking up stuff a bit to run in qemu :-) ).
> 
> I'll try to find out where the fixup fails.

imagine me with a slightly red head now ... as there is a slightly
embarrassing mistake in the fixup function ;-) .


When going from void* to unsigned int* pointers I have missed
adjusting the actual patch-location.

The call needs to be
	patch_text_nosync(alt_ptr + i, call, 8);

instead of the current
	patch_text_nosync(alt_ptr + i * sizeof(u32), call, 8);

In my str* cases this didn't matter because "i" was 0 there, but in your
longer assembly it actually patched the wrong location.
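
The effect is easy to reproduce outside the kernel. The following is a
small standalone sketch (the buffer and function names are made up for
illustration) showing how far each expression moves from the start of the
buffer once the pointer is a u32 pointer rather than a void pointer:

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the patched code; large enough that both offsets stay in bounds. */
static uint32_t insns[64];

/*
 * Buggy form: the index is scaled by sizeof(u32) by hand and then scaled
 * again by the compiler, because the pointer type is already uint32_t *.
 */
static size_t buggy_byte_offset(size_t i)
{
	uint32_t *p = insns + i * sizeof(uint32_t);

	return (size_t)((char *)p - (char *)insns);
}

/* Fixed form: typed pointer arithmetic already advances by whole instructions. */
static size_t fixed_byte_offset(size_t i)
{
	uint32_t *p = insns + i;

	return (size_t)((char *)p - (char *)insns);
}
```

In the objdump earlier in this thread the auipc sits 0x24 bytes into each
14-instruction block, i.e. at index 9. fixed_byte_offset(9) is 36 bytes,
while buggy_byte_offset(9) is 144 bytes, well past the end of the 56-byte
sequence. For i == 0 both agree, which is why the str* case never showed it.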


Heiko

============
For reference, my debug prints to find where the patching fails were:

diff --git a/arch/riscv/errata/renesas/errata.c b/arch/riscv/errata/renesas/errata.c
index 986f1c762d72..a5a47c5e9ff8 100644
--- a/arch/riscv/errata/renesas/errata.c
+++ b/arch/riscv/errata/renesas/errata.c
@@ -72,6 +72,7 @@ static void riscv_alternative_fix_auipc_jalr(unsigned int *alt_ptr,
        u32 rd1;
 
        for (i = 0; i < num_instr; i++) {
+printk("%s: looking at inst 0x%x\n", __func__, *(alt_ptr + i));
                /* is there a further instruction? */
                if (i + 1 >= num_instr)
                        continue;
@@ -84,6 +85,7 @@ static void riscv_alternative_fix_auipc_jalr(unsigned int *alt_ptr,
                if (rd1 != 1)
                        continue;
 
+printk("%s: -> found a auipc + jalr pair\n", __func__);
                /* get and adjust new target address */
                imm1 = EXTRACT_UTYPE_IMM(*(alt_ptr + i));
                imm1 += EXTRACT_ITYPE_IMM(*(alt_ptr + i + 1));
@@ -101,8 +103,10 @@ static void riscv_alternative_fix_auipc_jalr(unsigned int *alt_ptr,
                call[0] |= to_auipc_imm(imm1);
                call[1] |= to_jalr_imm(imm1);
 
+printk("%s: patching to 0x%x and 0x%x\n", __func__, call[0], call[1]);
                /* patch the call place again */
-               patch_text_nosync(alt_ptr + i * sizeof(u32), call, 8);
+               patch_text_nosync(alt_ptr + i, call, 8);
+printk("%s: patched to 0x%x and 0x%x\n", __func__, *(alt_ptr + i), *(alt_ptr + i + 1));
        }
 }
 
and then realizing that the "patching to" and "patched to" values were different.





^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [PATCH 5/7] RISC-V: fix auipc-jalr addresses in patched alternatives
  2022-11-22 11:37                 ` Heiko Stübner
@ 2022-11-22 12:28                   ` Lad, Prabhakar
  0 siblings, 0 replies; 51+ messages in thread
From: Lad, Prabhakar @ 2022-11-22 12:28 UTC (permalink / raw)
  To: Heiko Stübner
  Cc: linux-riscv, palmer, christoph.muellner, conor, philipp.tomsich,
	ajones, emil.renner.berthing

Hi Heiko,

On Tue, Nov 22, 2022 at 11:37 AM Heiko Stübner <heiko@sntech.de> wrote:
>
> Am Dienstag, 22. November 2022, 12:19:40 CET schrieb Heiko Stübner:
> > Am Dienstag, 22. November 2022, 11:59:57 CET schrieb Lad, Prabhakar:
> > > Hi Heiko,
> > >
> > > On Mon, Nov 21, 2022 at 10:17 PM Heiko Stübner <heiko@sntech.de> wrote:
> > > >
> > > > Am Montag, 21. November 2022, 22:31:36 CET schrieb Lad, Prabhakar:
> > > > > Hi Heiko,
> > > > >
> > > <snip>
> > > > As either manually or with a helper like
> > > >
> > > >         https://luplab.gitlab.io/rvcodecjs/#q=0xf4c080e7
> > > >
> > > > you can then decode the actual instruction and compare.
> > > >
> > > > In your log the two jalr instructions decode to different offsets,
> > > >         jalr x1, x1, -180
> > > > vs
> > > >         jalr x1, x1, -834
> > > >
> > > > Can you check what the patch_offset value is in your case?
> > > >
> > > patch_offset for the above case is -654.
> >
> > which is a big indicator that the auipc-jalr-fixup function is not catching
> > the instruction ... i.e. -180 - 654 = -834.
> >
> > I managed to reproduce that issue with your branch now
> > (after hacking up stuff a bit to run in qemu :-) ).
> >
> > I'll try to find out where the fixup fails.
>
> imagine me with a slightly red head now ... as there is a slightly
> embarrassing mistake in the fixup function ;-) .
>
Cheer up now :-)

>
> When going from void* to unsigned int* pointers I have missed
> adjusting the actual patch-location.
>
> The call needs to be
>         patch_text_nosync(alt_ptr + i, call, 8);
>
That did the trick! I have done some limited testing on the board
(I even replaced the original instructions back with nops, and it
works that way too).

> instead of the current
>         patch_text_nosync(alt_ptr + i * sizeof(u32), call, 8);
>
> In my str* cases this didn't matter because "i" was 0 there, but in your
> longer assembly it actually patched the wrong location.
>
Ahaa right the alt macro just had calls.

>
> Heiko
>
> ============
> For reference, my debug prints to find where the patching fails was:
>
> diff --git a/arch/riscv/errata/renesas/errata.c b/arch/riscv/errata/renesas/errata.c
> index 986f1c762d72..a5a47c5e9ff8 100644
> --- a/arch/riscv/errata/renesas/errata.c
> +++ b/arch/riscv/errata/renesas/errata.c
> @@ -72,6 +72,7 @@ static void riscv_alternative_fix_auipc_jalr(unsigned int *alt_ptr,
>         u32 rd1;
>
>         for (i = 0; i < num_instr; i++) {
> +printk("%s: looking at inst 0x%x\n", __func__, *(alt_ptr + i));
>                 /* is there a further instruction? */
>                 if (i + 1 >= num_instr)
>                         continue;
> @@ -84,6 +85,7 @@ static void riscv_alternative_fix_auipc_jalr(unsigned int *alt_ptr,
>                 if (rd1 != 1)
>                         continue;
>
> +printk("%s: -> found a auipc + jalr pair\n", __func__);
>                 /* get and adjust new target address */
>                 imm1 = EXTRACT_UTYPE_IMM(*(alt_ptr + i));
>                 imm1 += EXTRACT_ITYPE_IMM(*(alt_ptr + i + 1));
> @@ -101,8 +103,10 @@ static void riscv_alternative_fix_auipc_jalr(unsigned int *alt_ptr,
>                 call[0] |= to_auipc_imm(imm1);
>                 call[1] |= to_jalr_imm(imm1);
>
> +printk("%s: patching to 0x%x and 0x%x\n", __func__, call[0], call[1]);
>                 /* patch the call place again */
> -               patch_text_nosync(alt_ptr + i * sizeof(u32), call, 8);
> +               patch_text_nosync(alt_ptr + i, call, 8);
> +printk("%s: patched to 0x%x and 0x%x\n", __func__, *(alt_ptr + i), *(alt_ptr + i + 1));
>         }
>  }
>
> and then realizing that the "patching to" and "patched to" where different.
>
Thanks for the hunk.
>

Now waiting for your v3. Meanwhile, I'll look into the ALT3() macro.

Cheers,
Prabhakar


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 7/7] RISC-V: add zbb support to string functions
  2022-11-13 23:29   ` Conor Dooley
  2022-11-13 23:47     ` Heiko Stübner
@ 2022-11-24 22:23     ` Heiko Stübner
  2022-11-24 22:32       ` Conor Dooley
  1 sibling, 1 reply; 51+ messages in thread
From: Heiko Stübner @ 2022-11-24 22:23 UTC (permalink / raw)
  To: Conor Dooley
  Cc: linux-riscv, palmer, christoph.muellner, prabhakar.csengg,
	philipp.tomsich, ajones, emil.renner.berthing

> > diff --git a/arch/riscv/include/asm/hwcap.h b/arch/riscv/include/asm/hwcap.h
> > index b22525290073..ac5555fd9788 100644
> > --- a/arch/riscv/include/asm/hwcap.h
> > +++ b/arch/riscv/include/asm/hwcap.h
> > @@ -59,6 +59,7 @@ enum riscv_isa_ext_id {
> >  	RISCV_ISA_EXT_ZIHINTPAUSE,
> >  	RISCV_ISA_EXT_SSTC,
> >  	RISCV_ISA_EXT_SVINVAL,
> > +	RISCV_ISA_EXT_ZBB,
> 
> With ZIHINTPAUSE before SSTC and SVINIVAL I assume this is not something
> we are canonically ordering but I never, ever know which ones we are
> allowed to re-order at will.

I guess we could extend the comments with suitable hints,
which could help in the future.

> 
> >  	RISCV_ISA_EXT_ID_MAX = RISCV_ISA_EXT_MAX,
> >  };
> 
> > diff --git a/arch/riscv/kernel/cpu.c b/arch/riscv/kernel/cpu.c
> > index bf9dd6764bad..66ff36a57e20 100644
> > --- a/arch/riscv/kernel/cpu.c
> > +++ b/arch/riscv/kernel/cpu.c
> > @@ -166,6 +166,7 @@ static struct riscv_isa_ext_data isa_ext_arr[] = {
> >  	__RISCV_ISA_EXT_DATA(sstc, RISCV_ISA_EXT_SSTC),
> >  	__RISCV_ISA_EXT_DATA(svinval, RISCV_ISA_EXT_SVINVAL),
> >  	__RISCV_ISA_EXT_DATA(svpbmt, RISCV_ISA_EXT_SVPBMT),
> > +	__RISCV_ISA_EXT_DATA(zbb, RISCV_ISA_EXT_ZBB),
> >  	__RISCV_ISA_EXT_DATA(zicbom, RISCV_ISA_EXT_ZICBOM),
> >  	__RISCV_ISA_EXT_DATA(zihintpause, RISCV_ISA_EXT_ZIHINTPAUSE),
> >  	__RISCV_ISA_EXT_DATA("", RISCV_ISA_EXT_MAX),
> 
> This one I do know that Palmer wants canonically ordered.
> 
> > diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
> > index 026512ca9c4c..f19b9d4e2dca 100644
> > --- a/arch/riscv/kernel/cpufeature.c
> > +++ b/arch/riscv/kernel/cpufeature.c
> > @@ -201,6 +201,7 @@ void __init riscv_fill_hwcap(void)
> >  			} else {
> >  				SET_ISA_EXT_MAP("sscofpmf", RISCV_ISA_EXT_SSCOFPMF);
> >  				SET_ISA_EXT_MAP("svpbmt", RISCV_ISA_EXT_SVPBMT);
> > +				SET_ISA_EXT_MAP("zbb", RISCV_ISA_EXT_ZBB);
> >  				SET_ISA_EXT_MAP("zicbom", RISCV_ISA_EXT_ZICBOM);
> >  				SET_ISA_EXT_MAP("zihintpause", RISCV_ISA_EXT_ZIHINTPAUSE);
> >  				SET_ISA_EXT_MAP("sstc", RISCV_ISA_EXT_SSTC);
> 
> This one looks like it is, sstc aside. Same question as above, can I
> reorder this one? I'll send a patch for it if I can...

hmm, I don't see the difference between cpu.c above
(..., svpbmt, zbb, zicbom, ...) and here
(..., svpbmt, zbb, zicbom, ...)


> > diff --git a/arch/riscv/lib/strcmp_zbb.S b/arch/riscv/lib/strcmp_zbb.S
> > new file mode 100644
> > index 000000000000..aff9b941d3ee
> > --- /dev/null
> > +++ b/arch/riscv/lib/strcmp_zbb.S
> > @@ -0,0 +1,91 @@
> > +/* SPDX-License-Identifier: GPL-2.0-only */
> > +/*
> > + * Copyright (c) 2022 VRULL GmbH
> > + * Author: Christoph Muellner <christoph.muellner@vrull.eu>
> 
> Is a Co-developed-by: appropriate then?

I'd think so ... i.e. the assembly is from Christoph, but is originally
part of a pending glibc patchset, hence Christoph and I
decided on the co-developed thingy :-) .

> > +	/* Main loop for aligned string.  */
> > +	.p2align 3
> > +1:
> > +	REG_L	data1, 0(src1)
> > +	REG_L	data2, 0(src2)
> > +	orc.b	data1_orcb, data1
> > +	bne	data1_orcb, m1, 2f
> > +	addi	src1, src1, SZREG
> > +	addi	src2, src2, SZREG
> > +	beq	data1, data2, 1b
> > +
> > +	/* Words don't match, and no null byte in the first
> > +	 * word. Get bytes in big-endian order and compare.  */
> > +#ifndef CONFIG_CPU_BIG_ENDIAN
> 
> I know this is a lift from the reference implementation in the spec, but
> do we actually need this ifndef section?

I don't know, but _if_ big endian support comes in the future,
having one place less to break also might be helpful? :-)



Heiko



_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 7/7] RISC-V: add zbb support to string functions
  2022-11-24 22:23     ` Heiko Stübner
@ 2022-11-24 22:32       ` Conor Dooley
  2022-11-24 23:51         ` Heiko Stuebner
  0 siblings, 1 reply; 51+ messages in thread
From: Conor Dooley @ 2022-11-24 22:32 UTC (permalink / raw)
  To: Heiko Stübner
  Cc: linux-riscv, palmer, christoph.muellner, prabhakar.csengg,
	philipp.tomsich, ajones, emil.renner.berthing

On Thu, Nov 24, 2022 at 11:23:08PM +0100, Heiko Stübner wrote:
> > > diff --git a/arch/riscv/include/asm/hwcap.h b/arch/riscv/include/asm/hwcap.h
> > > index b22525290073..ac5555fd9788 100644
> > > --- a/arch/riscv/include/asm/hwcap.h
> > > +++ b/arch/riscv/include/asm/hwcap.h
> > > @@ -59,6 +59,7 @@ enum riscv_isa_ext_id {
> > >  	RISCV_ISA_EXT_ZIHINTPAUSE,
> > >  	RISCV_ISA_EXT_SSTC,
> > >  	RISCV_ISA_EXT_SVINVAL,
> > > +	RISCV_ISA_EXT_ZBB,
> > 
> > With ZIHINTPAUSE before SSTC and SVINVAL I assume this is not something
> > we are canonically ordering but I never, ever know which ones we are
> > allowed to re-order at will.
> 
> I guess we could extend the comments with suitable hints,
> which could help in the future.

Aye, for the likes of me that will never, ever remember I like the idea!

> > >  	RISCV_ISA_EXT_ID_MAX = RISCV_ISA_EXT_MAX,
> > >  };
> > 
> > > diff --git a/arch/riscv/kernel/cpu.c b/arch/riscv/kernel/cpu.c
> > > index bf9dd6764bad..66ff36a57e20 100644
> > > --- a/arch/riscv/kernel/cpu.c
> > > +++ b/arch/riscv/kernel/cpu.c
> > > @@ -166,6 +166,7 @@ static struct riscv_isa_ext_data isa_ext_arr[] = {
> > >  	__RISCV_ISA_EXT_DATA(sstc, RISCV_ISA_EXT_SSTC),
> > >  	__RISCV_ISA_EXT_DATA(svinval, RISCV_ISA_EXT_SVINVAL),
> > >  	__RISCV_ISA_EXT_DATA(svpbmt, RISCV_ISA_EXT_SVPBMT),
> > > +	__RISCV_ISA_EXT_DATA(zbb, RISCV_ISA_EXT_ZBB),
> > >  	__RISCV_ISA_EXT_DATA(zicbom, RISCV_ISA_EXT_ZICBOM),
> > >  	__RISCV_ISA_EXT_DATA(zihintpause, RISCV_ISA_EXT_ZIHINTPAUSE),
> > >  	__RISCV_ISA_EXT_DATA("", RISCV_ISA_EXT_MAX),
> > 
> > This one I do know that Palmer wants canonically ordered.

btw, idk if you noticed but I appear to have picked canonical ordering
as today's thing to get confused about a lot.

You put zbb after the S extensions - does that mean it is *not* an
"Additional Standard Extension" but rather a regular Z one?

> > > diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
> > > index 026512ca9c4c..f19b9d4e2dca 100644
> > > --- a/arch/riscv/kernel/cpufeature.c
> > > +++ b/arch/riscv/kernel/cpufeature.c
> > > @@ -201,6 +201,7 @@ void __init riscv_fill_hwcap(void)
> > >  			} else {
> > >  				SET_ISA_EXT_MAP("sscofpmf", RISCV_ISA_EXT_SSCOFPMF);
> > >  				SET_ISA_EXT_MAP("svpbmt", RISCV_ISA_EXT_SVPBMT);
> > > +				SET_ISA_EXT_MAP("zbb", RISCV_ISA_EXT_ZBB);
> > >  				SET_ISA_EXT_MAP("zicbom", RISCV_ISA_EXT_ZICBOM);
> > >  				SET_ISA_EXT_MAP("zihintpause", RISCV_ISA_EXT_ZIHINTPAUSE);
> > >  				SET_ISA_EXT_MAP("sstc", RISCV_ISA_EXT_SSTC);
> > 
> > This one looks like it is, sstc aside. Same question as above, can I
> > reorder this one? I'll send a patch for it if I can...
> 
> hmm, I don't see the difference between cpu.c above
> (..., svpbmt, zbb, zicbom, ...) and here
> (..., svpbmt, zbb, zicbom, ...)

sstc appears last here but first in the cpu.c hunk above.

> > > diff --git a/arch/riscv/lib/strcmp_zbb.S b/arch/riscv/lib/strcmp_zbb.S
> > > new file mode 100644
> > > index 000000000000..aff9b941d3ee
> > > --- /dev/null
> > > +++ b/arch/riscv/lib/strcmp_zbb.S
> > > @@ -0,0 +1,91 @@
> > > +/* SPDX-License-Identifier: GPL-2.0-only */
> > > +/*
> > > + * Copyright (c) 2022 VRULL GmbH
> > > + * Author: Christoph Muellner <christoph.muellner@vrull.eu>
> > 
> > Is a Co-developed-by: appropriate then?
> 
> I'd think so ... i.e. the assembly is from Christoph, but is originally
> part of a pending glibc patchset, hence Christoph and me
> decided on the co-developed thingy :-) .

Check your patch again, I don't see a Co-developed-by: tag. (That's what
I was getting at, not the validity of "Author: Christoph...")

> > > +	/* Main loop for aligned string.  */
> > > +	.p2align 3
> > > +1:
> > > +	REG_L	data1, 0(src1)
> > > +	REG_L	data2, 0(src2)
> > > +	orc.b	data1_orcb, data1
> > > +	bne	data1_orcb, m1, 2f
> > > +	addi	src1, src1, SZREG
> > > +	addi	src2, src2, SZREG
> > > +	beq	data1, data2, 1b
> > > +
> > > +	/* Words don't match, and no null byte in the first
> > > +	 * word. Get bytes in big-endian order and compare.  */
> > > +#ifndef CONFIG_CPU_BIG_ENDIAN
> > 
> > I know this is a lift from the reference implementation in the spec, but
> > do we actually need this ifndef section?
> 
> I don't know, but _if_ big endian support comes in the future,
> having one place less to break also might be helpful? :-)

One less place to have to go and fix it up, but I hope it never comes to
pass! And no harm in being close to the one in the spec...
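
For anyone following the endianness point, here is a minimal C model of
what the loop relies on. It is only a sketch, not the patch's code:
orc_b() and rev8() are software stand-ins for the Zbb instructions of the
same name, while has_zero_byte() and cmp_words_le() are hypothetical helper
names. It assumes 64-bit words and that the little-endian path uses a byte
reversal such as rev8, which is not visible in the hunk quoted above.

```c
#include <stdint.h>

/* Software model of the Zbb orc.b instruction: every non-zero byte
 * becomes 0xff, every zero byte stays 0x00. */
uint64_t orc_b(uint64_t x)
{
	uint64_t r = 0;
	int i;

	for (i = 0; i < 8; i++) {
		if ((x >> (8 * i)) & 0xff)
			r |= (uint64_t)0xff << (8 * i);
	}
	return r;
}

/* A word contains a NUL byte iff orc.b(word) is not all-ones. This is
 * the "bne data1_orcb, m1" test, with m1 holding -1. */
int has_zero_byte(uint64_t x)
{
	return orc_b(x) != UINT64_MAX;
}

/* Software model of rev8: reverse the byte order of a 64-bit word. */
uint64_t rev8(uint64_t x)
{
	uint64_t r = 0;
	int i;

	for (i = 0; i < 8; i++)
		r = (r << 8) | ((x >> (8 * i)) & 0xff);
	return r;
}

/* Compare two words that were loaded from little-endian memory the way
 * strcmp would: after the swap, the byte at the lowest address is the
 * most significant one, so an unsigned compare follows string order. */
int cmp_words_le(uint64_t a, uint64_t b)
{
	uint64_t x = rev8(a), y = rev8(b);

	return (x > y) - (x < y);
}
```

On a little-endian load the first string byte lands in the least
significant byte, so a plain unsigned compare would weigh the last byte
most. Reversing the bytes first makes the compare agree with strcmp order;
on big-endian the reversal would not be needed, which is presumably what
the ifndef guards.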



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 7/7] RISC-V: add zbb support to string functions
  2022-11-24 22:32       ` Conor Dooley
@ 2022-11-24 23:51         ` Heiko Stuebner
  2022-11-25  7:49           ` Andrew Jones
  0 siblings, 1 reply; 51+ messages in thread
From: Heiko Stuebner @ 2022-11-24 23:51 UTC (permalink / raw)
  To: Conor Dooley
  Cc: linux-riscv, palmer, christoph.muellner, prabhakar.csengg,
	philipp.tomsich, ajones, emil.renner.berthing

On Thursday, 24 November 2022, 23:32:58 CET, Conor Dooley wrote:
> On Thu, Nov 24, 2022 at 11:23:08PM +0100, Heiko Stübner wrote:
> > > > diff --git a/arch/riscv/include/asm/hwcap.h b/arch/riscv/include/asm/hwcap.h
> > > > index b22525290073..ac5555fd9788 100644
> > > > --- a/arch/riscv/include/asm/hwcap.h
> > > > +++ b/arch/riscv/include/asm/hwcap.h
> > > > @@ -59,6 +59,7 @@ enum riscv_isa_ext_id {
> > > >  	RISCV_ISA_EXT_ZIHINTPAUSE,
> > > >  	RISCV_ISA_EXT_SSTC,
> > > >  	RISCV_ISA_EXT_SVINVAL,
> > > > +	RISCV_ISA_EXT_ZBB,
> > > 
> > > With ZIHINTPAUSE before SSTC and SVINVAL I assume this is not something
> > > we are canonically ordering but I never, ever know which ones we are
> > > allowed to re-order at will.
> > 
> > I guess we could extend the comments with suitable hints,
> > which could help in the future.
> 
> Aye, for the likes of me that will never, ever remember I like the idea!

I'm 100% with you on this. I remember that this came up either with
svpbmt or zicbom in the past, but I have once again forgotten how the
ordering goes.

> 
> > > >  	RISCV_ISA_EXT_ID_MAX = RISCV_ISA_EXT_MAX,
> > > >  };
> > > 
> > > > diff --git a/arch/riscv/kernel/cpu.c b/arch/riscv/kernel/cpu.c
> > > > index bf9dd6764bad..66ff36a57e20 100644
> > > > --- a/arch/riscv/kernel/cpu.c
> > > > +++ b/arch/riscv/kernel/cpu.c
> > > > @@ -166,6 +166,7 @@ static struct riscv_isa_ext_data isa_ext_arr[] = {
> > > >  	__RISCV_ISA_EXT_DATA(sstc, RISCV_ISA_EXT_SSTC),
> > > >  	__RISCV_ISA_EXT_DATA(svinval, RISCV_ISA_EXT_SVINVAL),
> > > >  	__RISCV_ISA_EXT_DATA(svpbmt, RISCV_ISA_EXT_SVPBMT),
> > > > +	__RISCV_ISA_EXT_DATA(zbb, RISCV_ISA_EXT_ZBB),
> > > >  	__RISCV_ISA_EXT_DATA(zicbom, RISCV_ISA_EXT_ZICBOM),
> > > >  	__RISCV_ISA_EXT_DATA(zihintpause, RISCV_ISA_EXT_ZIHINTPAUSE),
> > > >  	__RISCV_ISA_EXT_DATA("", RISCV_ISA_EXT_MAX),
> > > 
> > > This one I do know that Palmer wants canonically ordered.
> 
> btw, idk if you noticed but I appear to have picked canonical ordering
> as today's thing to get confused about a lot.
> 
> You put zbb after the S extensions - does that mean it is *not* an
> "Additional Standard Extension" but rather a regular Z one?

This confuses me completely now :-) .

The list is still too short to see where other extensions
are placed. I guess I need to find that stuff again about
extension ordering.


> > > > diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
> > > > index 026512ca9c4c..f19b9d4e2dca 100644
> > > > --- a/arch/riscv/kernel/cpufeature.c
> > > > +++ b/arch/riscv/kernel/cpufeature.c
> > > > @@ -201,6 +201,7 @@ void __init riscv_fill_hwcap(void)
> > > >  			} else {
> > > >  				SET_ISA_EXT_MAP("sscofpmf", RISCV_ISA_EXT_SSCOFPMF);
> > > >  				SET_ISA_EXT_MAP("svpbmt", RISCV_ISA_EXT_SVPBMT);
> > > > +				SET_ISA_EXT_MAP("zbb", RISCV_ISA_EXT_ZBB);
> > > >  				SET_ISA_EXT_MAP("zicbom", RISCV_ISA_EXT_ZICBOM);
> > > >  				SET_ISA_EXT_MAP("zihintpause", RISCV_ISA_EXT_ZIHINTPAUSE);
> > > >  				SET_ISA_EXT_MAP("sstc", RISCV_ISA_EXT_SSTC);
> > > 
> > > This one looks like it is, sstc aside. Same question as above, can I
> > > reorder this one? I'll send a patch for it if I can...
> > 
> > hmm, I don't see the difference between cpu.c above
> > (..., svpbmt, zbb, zicbom, ...) and here
> > (..., svpbmt, zbb, zicbom, ...)
> 
> sstc appears last here but first in the cpu.c hunk above.
> 
> > > > diff --git a/arch/riscv/lib/strcmp_zbb.S b/arch/riscv/lib/strcmp_zbb.S
> > > > new file mode 100644
> > > > index 000000000000..aff9b941d3ee
> > > > --- /dev/null
> > > > +++ b/arch/riscv/lib/strcmp_zbb.S
> > > > @@ -0,0 +1,91 @@
> > > > +/* SPDX-License-Identifier: GPL-2.0-only */
> > > > +/*
> > > > + * Copyright (c) 2022 VRULL GmbH
> > > > + * Author: Christoph Muellner <christoph.muellner@vrull.eu>
> > > 
> > > Is a Co-developed-by: appropriate then?
> > 
> > I'd think so ... i.e. the assembly is from Christoph, but is originally
> > part of a pending glibc patchset, hence Christoph and me
> > decided on the co-developed thingy :-) .
> 
> Check your patch again, I don't see a Co-developed-by: tag. (That's what
> I was getting at, not the validity of "Author: Christoph...")

Now I remember, I talked with Christoph about that _after_ sending
this series. So my "git log" did show the Co-developed-by
all the time, which then confused me :-)


Heiko




^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 7/7] RISC-V: add zbb support to string functions
  2022-11-24 23:51         ` Heiko Stuebner
@ 2022-11-25  7:49           ` Andrew Jones
  2022-11-25  8:17             ` Conor.Dooley
       [not found]             ` <CAEg0e7h9skbWPVDsz9CdB8dATN5XM9eT-uPY0A7xRZmX=qTU6A@mail.gmail.com>
  0 siblings, 2 replies; 51+ messages in thread
From: Andrew Jones @ 2022-11-25  7:49 UTC (permalink / raw)
  To: Heiko Stuebner
  Cc: Conor Dooley, linux-riscv, palmer, christoph.muellner,
	prabhakar.csengg, philipp.tomsich, emil.renner.berthing

On Fri, Nov 25, 2022 at 12:51:54AM +0100, Heiko Stuebner wrote:
> On Thursday, 24 November 2022, 23:32:58 CET, Conor Dooley wrote:
> > On Thu, Nov 24, 2022 at 11:23:08PM +0100, Heiko Stübner wrote:
...
> > > > > diff --git a/arch/riscv/kernel/cpu.c b/arch/riscv/kernel/cpu.c
> > > > > index bf9dd6764bad..66ff36a57e20 100644
> > > > > --- a/arch/riscv/kernel/cpu.c
> > > > > +++ b/arch/riscv/kernel/cpu.c
> > > > > @@ -166,6 +166,7 @@ static struct riscv_isa_ext_data isa_ext_arr[] = {
> > > > >  	__RISCV_ISA_EXT_DATA(sstc, RISCV_ISA_EXT_SSTC),
> > > > >  	__RISCV_ISA_EXT_DATA(svinval, RISCV_ISA_EXT_SVINVAL),
> > > > >  	__RISCV_ISA_EXT_DATA(svpbmt, RISCV_ISA_EXT_SVPBMT),
> > > > > +	__RISCV_ISA_EXT_DATA(zbb, RISCV_ISA_EXT_ZBB),
> > > > >  	__RISCV_ISA_EXT_DATA(zicbom, RISCV_ISA_EXT_ZICBOM),
> > > > >  	__RISCV_ISA_EXT_DATA(zihintpause, RISCV_ISA_EXT_ZIHINTPAUSE),
> > > > >  	__RISCV_ISA_EXT_DATA("", RISCV_ISA_EXT_MAX),
> > > > 
> > > > This one I do know that Palmer wants canonically ordered.
> > 
> > btw, idk if you noticed but I appear to have picked canonical ordering
> > as today's thing to get confused about a lot.
> > 
> > You put zbb after the S extensions - does that mean it is *not* an
> > "Additional Standard Extension" but rather a regular Z one?
> 
> This confuses me completely now :-) .
>

Can we instead post a patch to the spec that changes the order to
alphabetical? The only other option I see is to develop a tool which sorts
extensions and every RISC-V developer keeps it in their back pocket. A
kernel specific tool to check each list we want to keep sorted would be
nice too.

My preference would be to change the spec to alphabetical order, though,
because the spec isn't explicit[*] enough to write a tool that can handle
all cases. We'll end up needing to have conversations like this one to
write the tool and eventually the tool will be what everyone looks to,
rather than the spec...

[*] The spec uses words like 'can', 'should', and 'conventional'.

Thanks,
drew


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 7/7] RISC-V: add zbb support to string functions
  2022-11-25  7:49           ` Andrew Jones
@ 2022-11-25  8:17             ` Conor.Dooley
       [not found]             ` <CAEg0e7h9skbWPVDsz9CdB8dATN5XM9eT-uPY0A7xRZmX=qTU6A@mail.gmail.com>
  1 sibling, 0 replies; 51+ messages in thread
From: Conor.Dooley @ 2022-11-25  8:17 UTC (permalink / raw)
  To: ajones, heiko, palmer
  Cc: conor, linux-riscv, christoph.muellner, prabhakar.csengg,
	philipp.tomsich, emil.renner.berthing

On 25/11/2022 07:49, Andrew Jones wrote:
> On Fri, Nov 25, 2022 at 12:51:54AM +0100, Heiko Stuebner wrote:
>> On Thursday, 24 November 2022, 23:32:58 CET, Conor Dooley wrote:
>>> On Thu, Nov 24, 2022 at 11:23:08PM +0100, Heiko Stübner wrote:
> ...
>>>>>> diff --git a/arch/riscv/kernel/cpu.c b/arch/riscv/kernel/cpu.c
>>>>>> index bf9dd6764bad..66ff36a57e20 100644
>>>>>> --- a/arch/riscv/kernel/cpu.c
>>>>>> +++ b/arch/riscv/kernel/cpu.c
>>>>>> @@ -166,6 +166,7 @@ static struct riscv_isa_ext_data isa_ext_arr[] = {
>>>>>>        __RISCV_ISA_EXT_DATA(sstc, RISCV_ISA_EXT_SSTC),
>>>>>>        __RISCV_ISA_EXT_DATA(svinval, RISCV_ISA_EXT_SVINVAL),
>>>>>>        __RISCV_ISA_EXT_DATA(svpbmt, RISCV_ISA_EXT_SVPBMT),
>>>>>> +     __RISCV_ISA_EXT_DATA(zbb, RISCV_ISA_EXT_ZBB),
>>>>>>        __RISCV_ISA_EXT_DATA(zicbom, RISCV_ISA_EXT_ZICBOM),
>>>>>>        __RISCV_ISA_EXT_DATA(zihintpause, RISCV_ISA_EXT_ZIHINTPAUSE),
>>>>>>        __RISCV_ISA_EXT_DATA("", RISCV_ISA_EXT_MAX),
>>>>>
>>>>> This one I do know that Palmer wants canonically ordered.
>>>
>>> btw, idk if you noticed but I appear to have picked canonical ordering
>>> as today's thing to get confused about a lot.
>>>
>>> You put zbb after the S extensions - does that mean it is *not* an
>>> "Additional Standard Extension" but rather a regular Z one?
>>
>> This confuses me completely now :-) .
>>
> 
> Can we instead post a patch to the spec that changes the order to
> alphabetical? The only other option I see is to develop a tool which sorts
> extensions and every RISC-V developer keeps it in their back pocket. A
> kernel specific tool to check each list we want to keep sorted would be
> nice too.

Is there some reason that these things need to be output in canonical
order in the first place by the kernel?
Could we say to hell with even trying to figure out what the correct
order is (since yeah, it'll be a conversation ~every time this comes up)
or is that breaking uAPI since someone's parser in userland may expect to
see canonical order only?
That's been my assumption for why it was re-sorted, @Palmer?

> My preference would be to change the spec to alphabetical order, though,
> because the spec isn't explicit[*] enough to write a tool that can handle
> all cases. We'll end up needing to have conversations like this one to
> write the tool and eventually the tool will be what everyone looks to,
> rather than the spec...

If it could be explicitly clear what constitutes an "additional standard
extension" that'd be good enough for me! Say:

diff --git a/src/naming.tex b/src/naming.tex
index bfd67d4..9d63a86 100644
--- a/src/naming.tex
+++ b/src/naming.tex
@@ -80,7 +80,9 @@ Standard extensions can also be named using a single ``Z'' followed by an
  alphabetical name and an optional version number.  For example,
  ``Zifencei'' names the instruction-fetch fence extension described in
  Chapter~\ref{chap:zifencei}; ``Zifencei2'' and ``Zifencei2p0'' name version
-2.0 of same.
+2.0 of same. The entire set of Additional Standard Extensions are documented
+in the Standard Unprivileged Extensions section of Table~\ref{isanametable}
+below.
  
  The first letter following the ``Z'' conventionally indicates the most closely
  related alphabetical extension category, IMAFDQCVH.  For the ``Zam''


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [PATCH 7/7] RISC-V: add zbb support to string functions
       [not found]             ` <CAEg0e7h9skbWPVDsz9CdB8dATN5XM9eT-uPY0A7xRZmX=qTU6A@mail.gmail.com>
@ 2022-11-25 15:28               ` Andrew Jones
  2022-11-25 16:35                 ` Christoph Müllner
  2022-11-25 16:36                 ` Conor Dooley
  0 siblings, 2 replies; 51+ messages in thread
From: Andrew Jones @ 2022-11-25 15:28 UTC (permalink / raw)
  To: Christoph Müllner
  Cc: Heiko Stuebner, Conor Dooley, linux-riscv, palmer,
	prabhakar.csengg, philipp.tomsich, emil.renner.berthing

On Fri, Nov 25, 2022 at 12:26:42PM +0100, Christoph Müllner wrote:
> On Fri, Nov 25, 2022 at 8:49 AM Andrew Jones <ajones@ventanamicro.com>
> wrote:
> 
> > On Fri, Nov 25, 2022 at 12:51:54AM +0100, Heiko Stuebner wrote:
> > > > On Thursday, 24 November 2022, 23:32:58 CET, Conor Dooley wrote:
> > > > On Thu, Nov 24, 2022 at 11:23:08PM +0100, Heiko Stübner wrote:
> > ...
> > > > > > > diff --git a/arch/riscv/kernel/cpu.c b/arch/riscv/kernel/cpu.c
> > > > > > > index bf9dd6764bad..66ff36a57e20 100644
> > > > > > > --- a/arch/riscv/kernel/cpu.c
> > > > > > > +++ b/arch/riscv/kernel/cpu.c
> > > > > > > @@ -166,6 +166,7 @@ static struct riscv_isa_ext_data
> > isa_ext_arr[] = {
> > > > > > >       __RISCV_ISA_EXT_DATA(sstc, RISCV_ISA_EXT_SSTC),
> > > > > > >       __RISCV_ISA_EXT_DATA(svinval, RISCV_ISA_EXT_SVINVAL),
> > > > > > >       __RISCV_ISA_EXT_DATA(svpbmt, RISCV_ISA_EXT_SVPBMT),
> > > > > > > +     __RISCV_ISA_EXT_DATA(zbb, RISCV_ISA_EXT_ZBB),
> > > > > > >       __RISCV_ISA_EXT_DATA(zicbom, RISCV_ISA_EXT_ZICBOM),
> > > > > > >       __RISCV_ISA_EXT_DATA(zihintpause,
> > RISCV_ISA_EXT_ZIHINTPAUSE),
> > > > > > >       __RISCV_ISA_EXT_DATA("", RISCV_ISA_EXT_MAX),
> > > > > >
> > > > > > This one I do know that Palmer wants canonically ordered.
> > > >
> > > > btw, idk if you noticed but I appear to have picked canonical ordering
> > > > as today's thing to get confused about a lot.
> > > >
> > > > You put zbb after the S extensions - does that mean it is *not* an
> > > > "Additional Standard Extension" but rather a regular Z one?
> > >
> > > This confuses me completely now :-) .
> > >
> >
> > Can we instead post a patch to the spec that changes the order to
> > alphabetical? The only other option I see is to develop a tool which sorts
> > extensions and every RISC-V developer keeps it in their back pocket. A
> > kernel specific tool to check each list we want to keep sorted would be
> > nice too.
> >
> > My preference would be to change the spec to alphabetical order, though,
> > because the spec isn't explicit[*] enough to write a tool that can handle
> > all cases. We'll end up needing to have conversations like this one to
> > write the tool and eventually the tool will be what everyone looks to,
> > rather than the spec...
> >
> > [*] The spec uses words like 'can', 'should', and 'conventional'.
> >
> 
> The unpriv spec is clear about the canonical order (table "Standard ISA
> extension names"):

The caption under table 27.1 does indeed declare the table defines the
canonical order and that it *must* be used for the name string, but
almost everywhere else in chapter 27 the word "should" is used to suggest
how extensions be ordered (only X-extensions say where they must be).

> 1) Base ISA
> 2) Standard Unpriv Extension (non alphabetical)

The 'non alphabetical' part makes this a PITA.

And section 27.6 says the first letter "conventionally indicates...". I
suppose we can assume it "must indicate"?

> 3) Standard Supervisor-Level Extensions

Are the conventions for the first character of S-extensions defined? I've
seen 'Ss' for "Privileged arch and Supervisor-level extensions", e.g.
Sscofpmf. 'Sv' for virtual memory (I guess) related extensions, e.g.
Svinval, and we appear to be using alphabetical order for them.

> 4) Standard Machine-Level Extensions
> 5) Non-Standard (Vendor) Extensions

Anyway, for the relatively easier problem of this kernel list and this
patch, we could do something with defines like below in order to try and
keep the order right.

/*
 * Each sub-list is sorted alphabetically.
 */
#define S_EXTENSIONS                                                    \
        __RISCV_ISA_EXT_DATA(sscofpmf, RISCV_ISA_EXT_SSCOFPMF),         \
        __RISCV_ISA_EXT_DATA(sstc, RISCV_ISA_EXT_SSTC),                 \
        __RISCV_ISA_EXT_DATA(svinval, RISCV_ISA_EXT_SVINVAL),           \
        __RISCV_ISA_EXT_DATA(svpbmt, RISCV_ISA_EXT_SVPBMT),

#define Zi_EXTENSIONS                                                   \
        __RISCV_ISA_EXT_DATA(zicbom, RISCV_ISA_EXT_ZICBOM),             \
        __RISCV_ISA_EXT_DATA(zihintpause, RISCV_ISA_EXT_ZIHINTPAUSE),

#define Zb_EXTENSIONS                                                   \
        __RISCV_ISA_EXT_DATA(zbb, RISCV_ISA_EXT_ZBB),

static struct riscv_isa_ext_data isa_ext_arr[] = {
        Zi_EXTENSIONS
        Zb_EXTENSIONS
        S_EXTENSIONS
        __RISCV_ISA_EXT_DATA("", RISCV_ISA_EXT_MAX),
};
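
To illustrate why the sub-list macros can simply be placed back to back in
the initializer, a small self-contained sketch follows. All demo_/DEMO_
names are hypothetical stand-ins, not the kernel's struct
riscv_isa_ext_data or __RISCV_ISA_EXT_DATA, and only a handful of
extensions are listed.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the kernel's extension IDs. */
enum demo_ext_id {
	DEMO_SSTC,
	DEMO_SVINVAL,
	DEMO_ZBB,
	DEMO_ZICBOM,
	DEMO_MAX,
};

struct demo_ext_data {
	const char *name;
	enum demo_ext_id id;
};

/* Stringify the extension name, similar in spirit to the kernel macro. */
#define DEMO_EXT_DATA(n, i) { #n, i }

/* Each sub-list is sorted alphabetically and ends with a trailing comma,
 * so the lists can be concatenated without any separator. */
#define DEMO_ZI_EXTENSIONS \
	DEMO_EXT_DATA(zicbom, DEMO_ZICBOM),

#define DEMO_ZB_EXTENSIONS \
	DEMO_EXT_DATA(zbb, DEMO_ZBB),

#define DEMO_S_EXTENSIONS \
	DEMO_EXT_DATA(sstc, DEMO_SSTC), \
	DEMO_EXT_DATA(svinval, DEMO_SVINVAL),

const struct demo_ext_data demo_ext_arr[] = {
	DEMO_ZI_EXTENSIONS
	DEMO_ZB_EXTENSIONS
	DEMO_S_EXTENSIONS
	{ "", DEMO_MAX },	/* sentinel entry */
};

/* Number of entries, including the sentinel. */
size_t demo_ext_count(void)
{
	return sizeof(demo_ext_arr) / sizeof(demo_ext_arr[0]);
}
```

The order of the groups is then fixed in exactly one place (the array
body), and adding an extension only means touching the one sub-list it
belongs to.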

Thanks,
drew


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 7/7] RISC-V: add zbb support to string functions
  2022-11-25 15:28               ` Andrew Jones
@ 2022-11-25 16:35                 ` Christoph Müllner
  2022-11-25 16:39                   ` Conor Dooley
  2022-11-25 16:36                 ` Conor Dooley
  1 sibling, 1 reply; 51+ messages in thread
From: Christoph Müllner @ 2022-11-25 16:35 UTC (permalink / raw)
  To: Andrew Jones
  Cc: Heiko Stuebner, Conor Dooley, linux-riscv, palmer,
	prabhakar.csengg, philipp.tomsich, emil.renner.berthing

On Fri, Nov 25, 2022 at 4:28 PM Andrew Jones <ajones@ventanamicro.com> wrote:
>
> On Fri, Nov 25, 2022 at 12:26:42PM +0100, Christoph Müllner wrote:
> > On Fri, Nov 25, 2022 at 8:49 AM Andrew Jones <ajones@ventanamicro.com>
> > wrote:
> >
> > > On Fri, Nov 25, 2022 at 12:51:54AM +0100, Heiko Stuebner wrote:
> > > > On Thursday, 24 November 2022, 23:32:58 CET, Conor Dooley wrote:
> > > > > On Thu, Nov 24, 2022 at 11:23:08PM +0100, Heiko Stübner wrote:
> > > ...
> > > > > > > > diff --git a/arch/riscv/kernel/cpu.c b/arch/riscv/kernel/cpu.c
> > > > > > > > index bf9dd6764bad..66ff36a57e20 100644
> > > > > > > > --- a/arch/riscv/kernel/cpu.c
> > > > > > > > +++ b/arch/riscv/kernel/cpu.c
> > > > > > > > @@ -166,6 +166,7 @@ static struct riscv_isa_ext_data
> > > isa_ext_arr[] = {
> > > > > > > >       __RISCV_ISA_EXT_DATA(sstc, RISCV_ISA_EXT_SSTC),
> > > > > > > >       __RISCV_ISA_EXT_DATA(svinval, RISCV_ISA_EXT_SVINVAL),
> > > > > > > >       __RISCV_ISA_EXT_DATA(svpbmt, RISCV_ISA_EXT_SVPBMT),
> > > > > > > > +     __RISCV_ISA_EXT_DATA(zbb, RISCV_ISA_EXT_ZBB),
> > > > > > > >       __RISCV_ISA_EXT_DATA(zicbom, RISCV_ISA_EXT_ZICBOM),
> > > > > > > >       __RISCV_ISA_EXT_DATA(zihintpause,
> > > RISCV_ISA_EXT_ZIHINTPAUSE),
> > > > > > > >       __RISCV_ISA_EXT_DATA("", RISCV_ISA_EXT_MAX),
> > > > > > >
> > > > > > > This one I do know that Palmer wants canonically ordered.
> > > > >
> > > > > btw, idk if you noticed but I appear to have picked canonical ordering
> > > > > as today's thing to get confused about a lot.
> > > > >
> > > > > You put zbb after the S extensions - does that mean it is *not* an
> > > > > "Additional Standard Extension" but rather a regular Z one?
> > > >
> > > > This confuses me completely now :-) .
> > > >
> > >
> > > Can we instead post a patch to the spec that changes the order to
> > > alphabetical? The only other option I see is to develop a tool which sorts
> > > extensions and every RISC-V developer keeps it in their back pocket. A
> > > kernel specific tool to check each list we want to keep sorted would be
> > > nice too.
> > >
> > > My preference would be to change the spec to alphabetical order, though,
> > > because the spec isn't explicit[*] enough to write a tool that can handle
> > > all cases. We'll end up needing to have conversations like this one to
> > > write the tool and eventually the tool will be what everyone looks to,
> > > rather than the spec...
> > >
> > > [*] The spec uses words like 'can', 'should', and 'conventional'.
> > >
> >
> > The unpriv spec is clear about the canonical order (table "Standard ISA
> > extension names"):
>
> The caption under table 27.1 does indeed declare the table defines the
> canonical order and that it *must* be used for the name string, but
> almost everywhere else in chapter 27 the word "should" is used to suggest
> how extensions be ordered (only X-extensions say where they must be).
>
> > 1) Base ISA
> > 2) Standard Unpriv Extension (non alphabetical)
>
> The 'non alphabetical' part makes this a PITA.
>
> And section 27.6 says the first letter "conventionally indicates...". I
> suppose we can assume it "must indicate"?

I think it is what it is and changing it would just open pointless discussions.
Further, I think that consistency is more important than trying to establish a
new second ordering scheme (regardless if that is simpler and/or better)
unless we have to overcome technical issues that would otherwise not
be possible.

>
> > 3) Standard Supervisor-Level Extensions
>
> Are the conventions for the first character of S-extensions defined? I've
> seen 'Ss' for "Privileged arch and Supervisor-level extensions", e.g.
> Sscofpmf. 'Sv' for virtual memory (I guess) related extensions, e.g.
> Svinval, and we appear to be using alphabetical order for them.
>
> > 4) Standard Machine-Level Extensions
> > 5) Non-Standard (Vendor) Extensions
>
> Anyway, for the relatively easier problem of this kernel list and this
> patch, we could do something with defines like below in order to try and
> keep the order right.
>
> /*
>  * Each sub-list is sorted alphabetically.
>  */
> #define S_EXTENSIONS                                                    \
>         __RISCV_ISA_EXT_DATA(sscofpmf, RISCV_ISA_EXT_SSCOFPMF),         \
>         __RISCV_ISA_EXT_DATA(sstc, RISCV_ISA_EXT_SSTC),                 \
>         __RISCV_ISA_EXT_DATA(svinval, RISCV_ISA_EXT_SVINVAL),           \
>         __RISCV_ISA_EXT_DATA(svpbmt, RISCV_ISA_EXT_SVPBMT),
>
> #define Zi_EXTENSIONS                                                   \
>         __RISCV_ISA_EXT_DATA(zicbom, RISCV_ISA_EXT_ZICBOM),             \
>         __RISCV_ISA_EXT_DATA(zihintpause, RISCV_ISA_EXT_ZIHINTPAUSE),
>
> #define Zb_EXTENSIONS                                                   \
>         __RISCV_ISA_EXT_DATA(zbb, RISCV_ISA_EXT_ZBB),
>
> static struct riscv_isa_ext_data isa_ext_arr[] = {
>         Zi_EXTENSIONS
>         Zb_EXTENSIONS
>         S_EXTENSIONS
>         __RISCV_ISA_EXT_DATA("", RISCV_ISA_EXT_MAX),
> };

It might be worth looking at how Binutils groups them:
  https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=bfd/elfxx-riscv.c#l1261

They also group extensions together but use the following groups:

const struct riscv_supported_ext *riscv_all_supported_ext[] =
{
  riscv_supported_std_ext,
  riscv_supported_std_z_ext,
  riscv_supported_std_s_ext,
  riscv_supported_std_zxm_ext,
  riscv_supported_vendor_x_ext,
  NULL
};
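
As a rough illustration of that grouping, the sketch below sorts extension
names by group first and alphabetically within a group. It is a
simplification under stated assumptions, not what Binutils or the kernel
does: the demo_ names are hypothetical, names are assumed to be lowercase,
and the spec's category-letter ordering inside the Z group is ignored in
favour of plain strcmp order.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical group ranks, in the order of the Binutils list above. */
enum demo_ext_group {
	DEMO_GRP_STD,	/* single-letter standard extensions */
	DEMO_GRP_Z,	/* "z..." standard extensions */
	DEMO_GRP_S,	/* "s..." supervisor-level extensions */
	DEMO_GRP_ZXM,	/* "zxm..." machine-level extensions */
	DEMO_GRP_X,	/* "x..." vendor extensions */
	DEMO_GRP_OTHER,	/* anything unrecognised sorts last */
};

enum demo_ext_group demo_ext_group_of(const char *name)
{
	if (name[0] != '\0' && name[1] == '\0')
		return DEMO_GRP_STD;
	/* "zxm" must be tested before the generic "z" prefix. */
	if (!strncmp(name, "zxm", 3))
		return DEMO_GRP_ZXM;
	if (name[0] == 'z')
		return DEMO_GRP_Z;
	if (name[0] == 's')
		return DEMO_GRP_S;
	if (name[0] == 'x')
		return DEMO_GRP_X;
	return DEMO_GRP_OTHER;
}

/* qsort() comparator over an array of "const char *" names. */
int demo_ext_cmp(const void *a, const void *b)
{
	const char *x = *(const char *const *)a;
	const char *y = *(const char *const *)b;
	enum demo_ext_group gx = demo_ext_group_of(x);
	enum demo_ext_group gy = demo_ext_group_of(y);

	if (gx != gy)
		return (gx > gy) - (gx < gy);
	return strcmp(x, y);
}

void demo_sort_exts(const char **names, size_t n)
{
	qsort(names, n, sizeof(*names), demo_ext_cmp);
}
```

A checker for the kernel lists could walk an array and verify that
demo_ext_cmp() never reports a later entry as smaller than an earlier one.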

BR
Christoph


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 7/7] RISC-V: add zbb support to string functions
  2022-11-25 15:28               ` Andrew Jones
  2022-11-25 16:35                 ` Christoph Müllner
@ 2022-11-25 16:36                 ` Conor Dooley
  1 sibling, 0 replies; 51+ messages in thread
From: Conor Dooley @ 2022-11-25 16:36 UTC (permalink / raw)
  To: Andrew Jones
  Cc: Christoph Müllner, Heiko Stuebner, linux-riscv, palmer,
	prabhakar.csengg, philipp.tomsich, emil.renner.berthing

Hey Drew, Christoph,

@Christoph, I did not receive your mail unfortunately as it appears to
be html. I assume you know why that's a problem and had some mailer
issues etc.

also + CC Samuel Ortiz since we discussed this elsewhere...

On Fri, Nov 25, 2022 at 04:28:21PM +0100, Andrew Jones wrote:
> On Fri, Nov 25, 2022 at 12:26:42PM +0100, Christoph Müllner wrote:
> > On Fri, Nov 25, 2022 at 8:49 AM Andrew Jones <ajones@ventanamicro.com>
> > wrote:
> > 
> > > On Fri, Nov 25, 2022 at 12:51:54AM +0100, Heiko Stuebner wrote:
> > > > On Thursday, 24 November 2022, 23:32:58 CET, Conor Dooley wrote:
> > > > > On Thu, Nov 24, 2022 at 11:23:08PM +0100, Heiko Stübner wrote:
> > > ...
> > > > > > > > diff --git a/arch/riscv/kernel/cpu.c b/arch/riscv/kernel/cpu.c
> > > > > > > > index bf9dd6764bad..66ff36a57e20 100644
> > > > > > > > --- a/arch/riscv/kernel/cpu.c
> > > > > > > > +++ b/arch/riscv/kernel/cpu.c
> > > > > > > > @@ -166,6 +166,7 @@ static struct riscv_isa_ext_data
> > > isa_ext_arr[] = {
> > > > > > > >       __RISCV_ISA_EXT_DATA(sstc, RISCV_ISA_EXT_SSTC),
> > > > > > > >       __RISCV_ISA_EXT_DATA(svinval, RISCV_ISA_EXT_SVINVAL),
> > > > > > > >       __RISCV_ISA_EXT_DATA(svpbmt, RISCV_ISA_EXT_SVPBMT),
> > > > > > > > +     __RISCV_ISA_EXT_DATA(zbb, RISCV_ISA_EXT_ZBB),
> > > > > > > >       __RISCV_ISA_EXT_DATA(zicbom, RISCV_ISA_EXT_ZICBOM),
> > > > > > > >       __RISCV_ISA_EXT_DATA(zihintpause,
> > > RISCV_ISA_EXT_ZIHINTPAUSE),
> > > > > > > >       __RISCV_ISA_EXT_DATA("", RISCV_ISA_EXT_MAX),
> > > > > > >
> > > > > > > This one I do know that Palmer wants canonically ordered.
> > > > >
> > > > > btw, idk if you noticed but I appear to have picked canonical ordering
> > > > > as today's thing to get confused about a lot.
> > > > >
> > > > > You put zbb after the S extensions - does that mean it is *not* an
> > > > > "Additional Standard Extension" but rather a regular Z one?
> > > >
> > > > This confuses me completely now :-) .
> > > >
> > >
> > > Can we instead post a patch to the spec that changes the order to
> > > alphabetical? The only other option I see is to develop a tool which sorts
> > > extensions and every RISC-V developer keeps it in their back pocket. A
> > > kernel specific tool to check each list we want to keep sorted would be
> > > nice too.
> > >
> > > My preference would be to change the spec to alphabetical order, though,
> > > because the spec isn't explicit[*] enough to write a tool that can handle
> > > all cases. We'll end up needing to have conversations like this one to
> > > write the tool and eventually the tool will be what everyone looks to,
> > > rather than the spec...
> > >
> > > [*] The spec uses words like 'can', 'should', and 'conventional'.
> > >
> > 
> > The unpriv spec is clear about the canonical order (table "Standard ISA
> > extension names"):

So I went reading the isa-manual again for the nth time.. It seems that I
missed the sentence:
Standard machine-level instruction-set extensions are prefixed with the
three letters ``Zxm''. Whoops & apologies @Samuel!

However, that table is only clear (as pointed out by Drew) on a
categorical level.

> The caption under table 27.1 does indeed declare the table defines the
> canonical order and that it *must* be used for the name string, but
> almost everywhere else in chapter 27 the word "should" is used to suggest
> how extensions be ordered (only X-extensions say where they must be).
> 
> > 1) Base ISA
> > 2) Standard Unpriv Extension (non alphabetical)
> 
> The 'non alphabetical' part makes this a PITA.
> 
> And section 27.6 says the first letter "conventionally indicates...". I
> suppose we can assume it "must indicate"?

Convention would imply it *should* but not that it must. I think, for
the kernel, we take a stronger view and say that we *will* put them in
the listed order in that table.

> > 3) Standard Supervisor-Level Extensions
> 
> Are the conventions for the first character of S-extensions defined? I've
> seen 'Ss' for "Privileged arch and Supervisor-level extensions", e.g.
> Sscofpmf. 'Sv' for virtual memory (I guess) related extensions, e.g.
> Svinval, and we appear to be using alphabetical order for them.

Again, appears to be a "should". I'd vote for doing everyone a favour
and making it "must" in this case kernel wise.
> 
> > 4) Standard Machine-Level Extensions
> > 5) Non-Standard (Vendor) Extensions
> 
> Anyway, for the relatively easier problem of this kernel list and this
> patch, we could do something with defines like below in order to try and
> keep the order right.
> 
> /*
>  * Each sub-list is sorted alphabetically.
>  */
> #define S_EXTENSIONS                                                    \
>         __RISCV_ISA_EXT_DATA(sscofpmf, RISCV_ISA_EXT_SSCOFPMF),         \
>         __RISCV_ISA_EXT_DATA(sstc, RISCV_ISA_EXT_SSTC),                 \
>         __RISCV_ISA_EXT_DATA(svinval, RISCV_ISA_EXT_SVINVAL),           \
>         __RISCV_ISA_EXT_DATA(svpbmt, RISCV_ISA_EXT_SVPBMT),
> 
> #define Zi_EXTENSIONS                                                   \
>         __RISCV_ISA_EXT_DATA(zicbom, RISCV_ISA_EXT_ZICBOM),             \
>         __RISCV_ISA_EXT_DATA(zihintpause, RISCV_ISA_EXT_ZIHINTPAUSE),
> 
> #define Zb_EXTENSIONS                                                   \
>         __RISCV_ISA_EXT_DATA(zbb, RISCV_ISA_EXT_ZICBOM),                \
> 
> static struct riscv_isa_ext_data isa_ext_arr[] = {
>         Zi_EXTENSIONS
>         Zb_EXTENSIONS
>         S_EXTENSIONS
>         __RISCV_ISA_EXT_DATA("", RISCV_ISA_EXT_MAX)

The above LGTM (apart from the accident in the zbb entry!)
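
For reference, a minimal standalone sketch of the grouped list with the
zbb entry pointing at RISCV_ISA_EXT_ZBB. The enum values, the simplified
struct layout, the plain terminator entry and the isa_ext_lookup() helper
are assumptions made so the snippet compiles outside the kernel; they are
not the kernel's definitions.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative IDs only; the real values live in the kernel headers. */
enum {
	RISCV_ISA_EXT_SSCOFPMF = 26,
	RISCV_ISA_EXT_SSTC,
	RISCV_ISA_EXT_SVINVAL,
	RISCV_ISA_EXT_SVPBMT,
	RISCV_ISA_EXT_ZBB,
	RISCV_ISA_EXT_ZICBOM,
	RISCV_ISA_EXT_ZIHINTPAUSE,
	RISCV_ISA_EXT_MAX,
};

/* Simplified stand-in for the kernel struct (assumption). */
struct riscv_isa_ext_data {
	const char *uprop;
	unsigned int isa_ext_id;
};

/* Stringify the name so each entry carries "zbb", "sstc", ... */
#define __RISCV_ISA_EXT_DATA(UPROP, EXTID) { #UPROP, EXTID }

/*
 * Each sub-list is sorted alphabetically.
 */
#define S_EXTENSIONS						\
	__RISCV_ISA_EXT_DATA(sscofpmf, RISCV_ISA_EXT_SSCOFPMF),	\
	__RISCV_ISA_EXT_DATA(sstc, RISCV_ISA_EXT_SSTC),		\
	__RISCV_ISA_EXT_DATA(svinval, RISCV_ISA_EXT_SVINVAL),	\
	__RISCV_ISA_EXT_DATA(svpbmt, RISCV_ISA_EXT_SVPBMT),

#define Zi_EXTENSIONS						\
	__RISCV_ISA_EXT_DATA(zicbom, RISCV_ISA_EXT_ZICBOM),	\
	__RISCV_ISA_EXT_DATA(zihintpause, RISCV_ISA_EXT_ZIHINTPAUSE),

/* zbb now maps to its own ID rather than RISCV_ISA_EXT_ZICBOM. */
#define Zb_EXTENSIONS						\
	__RISCV_ISA_EXT_DATA(zbb, RISCV_ISA_EXT_ZBB),

static const struct riscv_isa_ext_data isa_ext_arr[] = {
	Zi_EXTENSIONS
	Zb_EXTENSIONS
	S_EXTENSIONS
	{ "", RISCV_ISA_EXT_MAX },	/* terminator */
};

/* Hypothetical helper: return the ID for a name, or RISCV_ISA_EXT_MAX. */
static unsigned int isa_ext_lookup(const char *name)
{
	size_t i;

	for (i = 0; isa_ext_arr[i].isa_ext_id != RISCV_ISA_EXT_MAX; i++)
		if (strcmp(isa_ext_arr[i].uprop, name) == 0)
			return isa_ext_arr[i].isa_ext_id;
	return RISCV_ISA_EXT_MAX;
}
```

Because every group macro ends each entry with a comma, the groups can be
listed back to back in the array without separators.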

Thanks for all putting up with my confusion here - although the
lack of clarity appears to be rather widespread hahaha.

Thanks,
Conor.



* Re: [PATCH 7/7] RISC-V: add zbb support to string functions
  2022-11-25 16:35                 ` Christoph Müllner
@ 2022-11-25 16:39                   ` Conor Dooley
  2022-11-25 17:02                     ` Christoph Müllner
  0 siblings, 1 reply; 51+ messages in thread
From: Conor Dooley @ 2022-11-25 16:39 UTC (permalink / raw)
  To: Christoph Müllner
  Cc: Andrew Jones, Heiko Stuebner, linux-riscv, palmer,
	prabhakar.csengg, philipp.tomsich, emil.renner.berthing

On Fri, Nov 25, 2022 at 05:35:39PM +0100, Christoph Müllner wrote:
> On Fri, Nov 25, 2022 at 4:28 PM Andrew Jones <ajones@ventanamicro.com> wrote:

> > /*
> >  * Each sub-list is sorted alphabetically.
> >  */
> > #define S_EXTENSIONS                                                    \
> >         __RISCV_ISA_EXT_DATA(sscofpmf, RISCV_ISA_EXT_SSCOFPMF),         \
> >         __RISCV_ISA_EXT_DATA(sstc, RISCV_ISA_EXT_SSTC),                 \
> >         __RISCV_ISA_EXT_DATA(svinval, RISCV_ISA_EXT_SVINVAL),           \
> >         __RISCV_ISA_EXT_DATA(svpbmt, RISCV_ISA_EXT_SVPBMT),
> >
> > #define Zi_EXTENSIONS                                                   \
> >         __RISCV_ISA_EXT_DATA(zicbom, RISCV_ISA_EXT_ZICBOM),             \
> >         __RISCV_ISA_EXT_DATA(zihintpause, RISCV_ISA_EXT_ZIHINTPAUSE),
> >
> > #define Zb_EXTENSIONS                                                   \
> >         __RISCV_ISA_EXT_DATA(zbb, RISCV_ISA_EXT_ZICBOM),                \
> >
> > static struct riscv_isa_ext_data isa_ext_arr[] = {
> >         Zi_EXTENSIONS
> >         Zb_EXTENSIONS
> >         S_EXTENSIONS
> >         __RISCV_ISA_EXT_DATA("", RISCV_ISA_EXT_MAX),
> > };
> 
> It might be worth to look how Binutils are grouping them:
>   https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=bfd/elfxx-riscv.c#l1261
> 
> They also group extensions together but use the following groups:
> 
> const struct riscv_supported_ext *riscv_all_supported_ext[] =
> {
>   riscv_supported_std_ext,
>   riscv_supported_std_z_ext,
>   riscv_supported_std_s_ext,
>   riscv_supported_std_zxm_ext,
>   riscv_supported_vendor_x_ext,
>   NULL
> };

I think in this case, Drew's differentiation between Zi & Zb is an
improvement :)



* Re: [PATCH 7/7] RISC-V: add zbb support to string functions
  2022-11-25 16:39                   ` Conor Dooley
@ 2022-11-25 17:02                     ` Christoph Müllner
  2022-11-25 17:11                       ` Conor Dooley
  0 siblings, 1 reply; 51+ messages in thread
From: Christoph Müllner @ 2022-11-25 17:02 UTC (permalink / raw)
  To: Conor Dooley
  Cc: Andrew Jones, Heiko Stuebner, linux-riscv, palmer,
	prabhakar.csengg, philipp.tomsich, emil.renner.berthing

On Fri, Nov 25, 2022 at 5:39 PM Conor Dooley <conor@kernel.org> wrote:
>
> On Fri, Nov 25, 2022 at 05:35:39PM +0100, Christoph Müllner wrote:
> > On Fri, Nov 25, 2022 at 4:28 PM Andrew Jones <ajones@ventanamicro.com> wrote:
>
> > > /*
> > >  * Each sub-list is sorted alphabetically.
> > >  */
> > > #define S_EXTENSIONS                                                    \
> > >         __RISCV_ISA_EXT_DATA(sscofpmf, RISCV_ISA_EXT_SSCOFPMF),         \
> > >         __RISCV_ISA_EXT_DATA(sstc, RISCV_ISA_EXT_SSTC),                 \
> > >         __RISCV_ISA_EXT_DATA(svinval, RISCV_ISA_EXT_SVINVAL),           \
> > >         __RISCV_ISA_EXT_DATA(svpbmt, RISCV_ISA_EXT_SVPBMT),
> > >
> > > #define Zi_EXTENSIONS                                                   \
> > >         __RISCV_ISA_EXT_DATA(zicbom, RISCV_ISA_EXT_ZICBOM),             \
> > >         __RISCV_ISA_EXT_DATA(zihintpause, RISCV_ISA_EXT_ZIHINTPAUSE),
> > >
> > > #define Zb_EXTENSIONS                                                   \
> > >         __RISCV_ISA_EXT_DATA(zbb, RISCV_ISA_EXT_ZICBOM),                \
> > >
> > > static struct riscv_isa_ext_data isa_ext_arr[] = {
> > >         Zi_EXTENSIONS
> > >         Zb_EXTENSIONS
> > >         S_EXTENSIONS
> > >         __RISCV_ISA_EXT_DATA("", RISCV_ISA_EXT_MAX),
> > > };
> >
> > It might be worth to look how Binutils are grouping them:
> >   https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=bfd/elfxx-riscv.c#l1261
> >
> > They also group extensions together but use the following groups:
> >
> > const struct riscv_supported_ext *riscv_all_supported_ext[] =
> > {
> >   riscv_supported_std_ext,
> >   riscv_supported_std_z_ext,
> >   riscv_supported_std_s_ext,
> >   riscv_supported_std_zxm_ext,
> >   riscv_supported_vendor_x_ext,
> >   NULL
> > };
>
> I think in this case, Drew's differentiation between Zi & Zb is an
> improvement :)


That might be the case, but it also brings up the following questions:
Where to put Zf* extensions (between Zi and Zb)?
Where to put Zk* and Zv* (after Zb*)?
Adding a subgroup for each Z{N}* extension might result in many small
groups with little benefit.


* Re: [PATCH 7/7] RISC-V: add zbb support to string functions
  2022-11-25 17:02                     ` Christoph Müllner
@ 2022-11-25 17:11                       ` Conor Dooley
  2022-11-25 17:42                         ` Christoph Müllner
  0 siblings, 1 reply; 51+ messages in thread
From: Conor Dooley @ 2022-11-25 17:11 UTC (permalink / raw)
  To: Christoph Müllner
  Cc: Andrew Jones, Heiko Stuebner, linux-riscv, palmer,
	prabhakar.csengg, philipp.tomsich, emil.renner.berthing

On Fri, Nov 25, 2022 at 06:02:12PM +0100, Christoph Müllner wrote:
> On Fri, Nov 25, 2022 at 5:39 PM Conor Dooley <conor@kernel.org> wrote:
> >
> > On Fri, Nov 25, 2022 at 05:35:39PM +0100, Christoph Müllner wrote:
> > > On Fri, Nov 25, 2022 at 4:28 PM Andrew Jones <ajones@ventanamicro.com> wrote:
> >
> > > > /*
> > > >  * Each sub-list is sorted alphabetically.
> > > >  */
> > > > #define S_EXTENSIONS                                                    \
> > > >         __RISCV_ISA_EXT_DATA(sscofpmf, RISCV_ISA_EXT_SSCOFPMF),         \
> > > >         __RISCV_ISA_EXT_DATA(sstc, RISCV_ISA_EXT_SSTC),                 \
> > > >         __RISCV_ISA_EXT_DATA(svinval, RISCV_ISA_EXT_SVINVAL),           \
> > > >         __RISCV_ISA_EXT_DATA(svpbmt, RISCV_ISA_EXT_SVPBMT),
> > > >
> > > > #define Zi_EXTENSIONS                                                   \
> > > >         __RISCV_ISA_EXT_DATA(zicbom, RISCV_ISA_EXT_ZICBOM),             \
> > > >         __RISCV_ISA_EXT_DATA(zihintpause, RISCV_ISA_EXT_ZIHINTPAUSE),
> > > >
> > > > #define Zb_EXTENSIONS                                                   \
> > > >         __RISCV_ISA_EXT_DATA(zbb, RISCV_ISA_EXT_ZICBOM),                \
> > > >
> > > > static struct riscv_isa_ext_data isa_ext_arr[] = {
> > > >         Zi_EXTENSIONS
> > > >         Zb_EXTENSIONS
> > > >         S_EXTENSIONS
> > > >         __RISCV_ISA_EXT_DATA("", RISCV_ISA_EXT_MAX),
> > > > };
> > >
> > > It might be worth to look how Binutils are grouping them:
> > >   https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=bfd/elfxx-riscv.c#l1261
> > >
> > > They also group extensions together but use the following groups:
> > >
> > > const struct riscv_supported_ext *riscv_all_supported_ext[] =
> > > {
> > >   riscv_supported_std_ext,
> > >   riscv_supported_std_z_ext,
> > >   riscv_supported_std_s_ext,
> > >   riscv_supported_std_zxm_ext,
> > >   riscv_supported_vendor_x_ext,
> > >   NULL
> > > };
> >
> > I think in this case, Drew's differentiation between Zi & Zb is an
> > improvement :)
> 
> 
> That might be the case, but it also brings up the following questions:
> Where to put Zf* extensions (between Zi and Zb)?

I don't think that differs either way?

> Where to put Zk* and Zv* (after Zb*)?

Ditto?

> Adding a subgroup for each Z{N}* extension might result in many small
> groups with little benefit.

If it prevents another one of these ordering episodes, I think it's
worth doing! If a "group" only has one element it could alternatively be
added directly also.
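
A rough sketch of what an ordering check could look like, in the spirit
of the tool Drew mentioned: order first by group (Z, then S, then Zxm,
then X), then for Z extensions by the category letter, then
alphabetically. The category string below is an assumption taken from
the canonical single-letter order and should be checked against the spec
table; the function names are made up for illustration, and names are
assumed to be lowercase.

```c
#include <stddef.h>
#include <string.h>

/* Assumed canonical order of the single-letter categories. */
static const char category_order[] = "imafdqlcbkjtpvh";

/* Group rank: 0 = Z*, 1 = S*, 2 = Zxm*, 3 = X*, 4 = anything else. */
static int ext_group(const char *name)
{
	if (strncmp(name, "zxm", 3) == 0)
		return 2;
	switch (name[0]) {
	case 'z':
		return 0;
	case 's':
		return 1;
	case 'x':
		return 3;
	default:
		return 4;
	}
}

/* Rank of a Z extension by the letter following the 'z'. */
static int z_category_rank(const char *name)
{
	const char *p = name[1] ? strchr(category_order, name[1]) : NULL;

	/* Unknown categories sort after all known ones. */
	return p ? (int)(p - category_order) : (int)sizeof(category_order);
}

/* Negative if a sorts before b, positive if after, 0 if equal. */
static int isa_ext_cmp(const char *a, const char *b)
{
	int ga = ext_group(a), gb = ext_group(b);

	if (ga != gb)
		return ga - gb;
	if (ga == 0) {
		int ra = z_category_rank(a), rb = z_category_rank(b);

		if (ra != rb)
			return ra - rb;
	}
	return strcmp(a, b);
}
```

Under these assumptions zf* sorts between zi* and zb*, and zk* and zv*
sort after zb*, whether or not each gets its own group macro.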

Also neat, you got your mailer sorted :)



* Re: [PATCH 7/7] RISC-V: add zbb support to string functions
  2022-11-25 17:11                       ` Conor Dooley
@ 2022-11-25 17:42                         ` Christoph Müllner
  0 siblings, 0 replies; 51+ messages in thread
From: Christoph Müllner @ 2022-11-25 17:42 UTC (permalink / raw)
  To: Conor Dooley
  Cc: Andrew Jones, Heiko Stuebner, linux-riscv, palmer,
	prabhakar.csengg, philipp.tomsich, emil.renner.berthing

On Fri, Nov 25, 2022 at 6:11 PM Conor Dooley <conor@kernel.org> wrote:
>
> On Fri, Nov 25, 2022 at 06:02:12PM +0100, Christoph Müllner wrote:
> > On Fri, Nov 25, 2022 at 5:39 PM Conor Dooley <conor@kernel.org> wrote:
> > >
> > > On Fri, Nov 25, 2022 at 05:35:39PM +0100, Christoph Müllner wrote:
> > > > On Fri, Nov 25, 2022 at 4:28 PM Andrew Jones <ajones@ventanamicro.com> wrote:
> > >
> > > > > /*
> > > > >  * Each sub-list is sorted alphabetically.
> > > > >  */
> > > > > #define S_EXTENSIONS                                                    \
> > > > >         __RISCV_ISA_EXT_DATA(sscofpmf, RISCV_ISA_EXT_SSCOFPMF),         \
> > > > >         __RISCV_ISA_EXT_DATA(sstc, RISCV_ISA_EXT_SSTC),                 \
> > > > >         __RISCV_ISA_EXT_DATA(svinval, RISCV_ISA_EXT_SVINVAL),           \
> > > > >         __RISCV_ISA_EXT_DATA(svpbmt, RISCV_ISA_EXT_SVPBMT),
> > > > >
> > > > > #define Zi_EXTENSIONS                                                   \
> > > > >         __RISCV_ISA_EXT_DATA(zicbom, RISCV_ISA_EXT_ZICBOM),             \
> > > > >         __RISCV_ISA_EXT_DATA(zihintpause, RISCV_ISA_EXT_ZIHINTPAUSE),
> > > > >
> > > > > #define Zb_EXTENSIONS                                                   \
> > > > >         __RISCV_ISA_EXT_DATA(zbb, RISCV_ISA_EXT_ZICBOM),                \
> > > > >
> > > > > static struct riscv_isa_ext_data isa_ext_arr[] = {
> > > > >         Zi_EXTENSIONS
> > > > >         Zb_EXTENSIONS
> > > > >         S_EXTENSIONS
> > > > >         __RISCV_ISA_EXT_DATA("", RISCV_ISA_EXT_MAX),
> > > > > };
> > > >
> > > > It might be worth to look how Binutils are grouping them:
> > > >   https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=bfd/elfxx-riscv.c#l1261
> > > >
> > > > They also group extensions together but use the following groups:
> > > >
> > > > const struct riscv_supported_ext *riscv_all_supported_ext[] =
> > > > {
> > > >   riscv_supported_std_ext,
> > > >   riscv_supported_std_z_ext,
> > > >   riscv_supported_std_s_ext,
> > > >   riscv_supported_std_zxm_ext,
> > > >   riscv_supported_vendor_x_ext,
> > > >   NULL
> > > > };
> > >
> > > I think in this case, Drew's differentiation between Zi & Zb is an
> > > improvement :)
> >
> >
> > That might be the case, but it also brings up the following questions:
> > Where to put Zf* extensions (between Zi and Zb)?
>
> I don't think that differs either way?
>
> > Where to put Zk* and Zv* (after Zb*)?
>
> Ditto?
>
> > Adding a subgroup for each Z{N}* extension might result in many small
> > groups with little benefit.
>
> If it prevents another one of these ordering episodes, I think it's
> worth doing! If a "group" only has one element it could alternatively be
> added directly also.

Sounds good then.

>
> Also neat, you got your mailer sorted :)

One just needs to remember to check the "Plain text mode" box.


Thread overview: 51+ messages
2022-11-10 16:49 [PATCH 0/7] Zbb string optimizations and call support in alternatives Heiko Stuebner
2022-11-10 16:49 ` [PATCH 1/7] efi/riscv: libstub: mark when compiling libstub Heiko Stuebner
2022-11-13 17:16   ` Conor Dooley
2022-11-13 17:20     ` Heiko Stübner
2022-11-13 18:06       ` Conor Dooley
2022-11-10 16:49 ` [PATCH 2/7] RISC-V: add auipc elements to parse_asm header Heiko Stuebner
2022-11-13 17:18   ` Conor Dooley
2022-11-10 16:49 ` [PATCH 3/7] RISC-V: add U-type imm parsing " Heiko Stuebner
2022-11-13 19:06   ` Conor Dooley
2022-11-10 16:49 ` [PATCH 4/7] RISC-V: add rd reg " Heiko Stuebner
2022-11-13 19:08   ` Conor Dooley
2022-11-10 16:49 ` [PATCH 5/7] RISC-V: fix auipc-jalr addresses in patched alternatives Heiko Stuebner
2022-11-13 20:31   ` Conor Dooley
2022-11-14 10:57   ` Emil Renner Berthing
2022-11-14 11:35     ` Andrew Jones
2022-11-14 11:38       ` Emil Renner Berthing
2022-11-14 11:38       ` Heiko Stübner
2022-11-14 12:15         ` Andrew Jones
2022-11-14 12:29           ` Emil Renner Berthing
2022-11-14 12:47         ` Philipp Tomsich
2022-11-15 14:28   ` Lad, Prabhakar
2022-11-17 11:51     ` Heiko Stübner
2022-11-21  9:50   ` Lad, Prabhakar
2022-11-21 11:27     ` Heiko Stübner
2022-11-21 15:06       ` Lad, Prabhakar
2022-11-21 21:31         ` Lad, Prabhakar
2022-11-21 22:17           ` Heiko Stübner
2022-11-21 22:38             ` Heiko Stübner
2022-11-22  0:16               ` Lad, Prabhakar
2022-11-21 23:59             ` Lad, Prabhakar
2022-11-22 10:59             ` Lad, Prabhakar
2022-11-22 11:19               ` Heiko Stübner
2022-11-22 11:37                 ` Heiko Stübner
2022-11-22 12:28                   ` Lad, Prabhakar
2022-11-10 16:49 ` [PATCH 6/7] RISC-V: add infrastructure to allow different str* implementations Heiko Stuebner
2022-11-13 22:07   ` Conor Dooley
2022-11-10 16:49 ` [PATCH 7/7] RISC-V: add zbb support to string functions Heiko Stuebner
2022-11-13 23:29   ` Conor Dooley
2022-11-13 23:47     ` Heiko Stübner
2022-11-24 22:23     ` Heiko Stübner
2022-11-24 22:32       ` Conor Dooley
2022-11-24 23:51         ` Heiko Stuebner
2022-11-25  7:49           ` Andrew Jones
2022-11-25  8:17             ` Conor.Dooley
     [not found]             ` <CAEg0e7h9skbWPVDsz9CdB8dATN5XM9eT-uPY0A7xRZmX=qTU6A@mail.gmail.com>
2022-11-25 15:28               ` Andrew Jones
2022-11-25 16:35                 ` Christoph Müllner
2022-11-25 16:39                   ` Conor Dooley
2022-11-25 17:02                     ` Christoph Müllner
2022-11-25 17:11                       ` Conor Dooley
2022-11-25 17:42                         ` Christoph Müllner
2022-11-25 16:36                 ` Conor Dooley
