linux-kernel.vger.kernel.org archive mirror
* [PATCH v4 0/5] riscv: Add fine-tuned checksum functions
@ 2023-09-11 22:57 Charlie Jenkins
  2023-09-11 22:57 ` [PATCH v4 1/5] riscv: Checksum header Charlie Jenkins
                   ` (4 more replies)
  0 siblings, 5 replies; 24+ messages in thread
From: Charlie Jenkins @ 2023-09-11 22:57 UTC (permalink / raw)
  To: Charlie Jenkins, Palmer Dabbelt, Conor Dooley, Samuel Holland,
	David Laight, linux-riscv, linux-kernel
  Cc: Paul Walmsley, Albert Ou

Each architecture generally implements fine-tuned checksum functions
that leverage its instruction set. This series adds the main checksum
functions used in networking.

Vector support is included in this series to start a discussion; it can
probably be optimized further. The vector patches still need some work
because they rely on GCC vector intrinsic types, which cannot currently
be used in the kernel: they require C vector support rather than just
assembler support. I have tested the vector patches as standalone
algorithms in QEMU.

This series makes heavy use of the Zbb extension, applied through
alternatives patching.

To test this series, enable the config for KUNIT, then CHECKSUM_KUNIT
and RISCV_CHECKSUM_KUNIT.
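
For reference, that corresponds to a .config fragment along these lines
(option names as used in this series):

CONFIG_KUNIT=y
CONFIG_CHECKSUM_KUNIT=y
CONFIG_RISCV_CHECKSUM_KUNIT=y

The tests can also be run through kunit_tool, e.g.
./tools/testing/kunit/kunit.py run --arch=riscv (exact flags may vary
by tree).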

I have attempted to make these functions as fast as possible, but I
have not run anything on actual riscv hardware. My performance testing
has been limited to inspecting the assembly, running the algorithms on
x86 hardware, and running in QEMU.

ip_fast_csum is a relatively small function, so even though it is
possible to read 64 bits at a time on compatible hardware, the setup
and cleanup code becomes the bottleneck; loading 32 bits at a time is
actually faster.
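
As a rough sketch of why (illustrative only, not the code in the
patch): a 64-bit variant has to handle iph being only 4-byte aligned
and an odd number of 32-bit words, and for a 5-word header that
setup/cleanup dominates the loop itself:

	/* Sketch only: 64-bit reads for an ihl * 4 byte header,
	 * using kernel-style u8/u32/u64 types. */
	const u8 *p = iph;
	u64 sum = 0;
	unsigned int pos;

	for (pos = 0; pos + 1 < ihl; pos += 2)
		/* may be a misaligned load; iph is only 4-byte aligned */
		sum += get_unaligned((const u64 *)(p + pos * 4));
	if (ihl & 1)
		/* an odd ihl leaves a 32-bit tail word */
		sum += ((const u32 *)iph)[ihl - 1];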

Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
---
Changes in v4:
- Uses an improved checksum algorithm from arch/arc, as suggested by
  David Laight.
- Eliminates zero-extension on rv32, but not on rv64.
- Reduces data dependencies, which should improve execution speed on
  rv32 and rv64.
- Still passes CHECKSUM_KUNIT and RISCV_CHECKSUM_KUNIT on rv32 and
  rv64 with and without zbb.
- Link to v3: https://lore.kernel.org/r/20230907-optimize_checksum-v3-0-c502d34d9d73@rivosinc.com

Changes in v3:
- Use riscv_has_extension_likely and has_vector where possible (Conor)
- Reduce ifdefs by using IS_ENABLED where possible (Conor)
- Use kernel_vector_begin in the vector code (Samuel)
- Link to v2: https://lore.kernel.org/r/20230905-optimize_checksum-v2-0-ccd658db743b@rivosinc.com

Changes in v2:
- After more benchmarking, rework functions to improve performance.
- Remove tests that overlapped with the already existing checksum
  tests and make tests more extensive.
- Use alternatives to activate code with Zbb and vector extensions
- Link to v1: https://lore.kernel.org/r/20230826-optimize_checksum-v1-0-937501b4522a@rivosinc.com

---
Charlie Jenkins (5):
      riscv: Checksum header
      riscv: Add checksum library
      riscv: Vector checksum header
      riscv: Vector checksum library
      riscv: Test checksum functions

 arch/riscv/Kconfig.debug              |   1 +
 arch/riscv/include/asm/checksum.h     | 181 +++++++++++++++++++
 arch/riscv/lib/Kconfig.debug          |  31 ++++
 arch/riscv/lib/Makefile               |   3 +
 arch/riscv/lib/csum.c                 | 302 +++++++++++++++++++++++++++++++
 arch/riscv/lib/riscv_checksum_kunit.c | 330 ++++++++++++++++++++++++++++++++++
 6 files changed, 848 insertions(+)
---
base-commit: af3c30d33476bc2694b0d699173544b07f7ae7de
change-id: 20230804-optimize_checksum-db145288ac21
-- 
- Charlie


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v4 1/5] riscv: Checksum header
  2023-09-11 22:57 [PATCH v4 0/5] riscv: Add fine-tuned checksum functions Charlie Jenkins
@ 2023-09-11 22:57 ` Charlie Jenkins
  2023-09-12 10:24   ` Emil Renner Berthing
  2023-09-11 22:57 ` [PATCH v4 2/5] riscv: Add checksum library Charlie Jenkins
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 24+ messages in thread
From: Charlie Jenkins @ 2023-09-11 22:57 UTC (permalink / raw)
  To: Charlie Jenkins, Palmer Dabbelt, Conor Dooley, Samuel Holland,
	David Laight, linux-riscv, linux-kernel
  Cc: Paul Walmsley, Albert Ou

Provide checksum algorithms that have been designed to leverage riscv
instructions such as rotate. On 64-bit, the larger register can be used
to avoid some overflow checking.
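
To make the overflow point concrete, a sketch (illustrative only; the
function names are made up, and the actual code below uses csum_t and
the Zbb alternatives):

/* On rv64 a 64-bit accumulator absorbs the carries of 32-bit words,
 * so they are folded once at the end instead of after every add. */
static u32 sum_words_rv64(const u32 *p, int n)
{
	u64 sum = 0;

	while (n--)
		sum += *p++;			/* no per-add carry check */
	sum += (sum >> 32) | (sum << 32);	/* fold all carries once */
	return sum >> 32;
}

/* On rv32 the carry-out must be re-added after every addition. */
static u32 sum_words_rv32(const u32 *p, int n)
{
	u32 sum = 0, v;

	while (n--) {
		v = *p++;
		sum += v;
		sum += sum < v;			/* end-around carry */
	}
	return sum;
}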

Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
---
 arch/riscv/include/asm/checksum.h | 95 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 95 insertions(+)

diff --git a/arch/riscv/include/asm/checksum.h b/arch/riscv/include/asm/checksum.h
new file mode 100644
index 000000000000..0d7fc8275a5e
--- /dev/null
+++ b/arch/riscv/include/asm/checksum.h
@@ -0,0 +1,95 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * IP checksum routines
+ *
+ * Copyright (C) 2023 Rivos Inc.
+ */
+#ifndef __ASM_RISCV_CHECKSUM_H
+#define __ASM_RISCV_CHECKSUM_H
+
+#include <linux/in6.h>
+#include <linux/uaccess.h>
+
+#ifdef CONFIG_32BIT
+typedef unsigned int csum_t;
+#else
+typedef unsigned long csum_t;
+#endif
+
+/*
+ *	Fold a partial checksum without adding pseudo headers
+ */
+static inline __sum16 csum_fold(__wsum sum)
+{
+	return (~sum - ror32(sum, 16)) >> 16;
+}
+
+#define csum_fold csum_fold
+
+/*
+ * Quickly compute an IP checksum with the assumption that IPv4 headers will
+ * always be in multiples of 32-bits, and have an ihl of at least 5.
+ * @ihl is the number of 32 bit segments and must be greater than or equal to 5.
+ * @iph is assumed to be word aligned.
+ */
+static inline __sum16 ip_fast_csum(const void *iph, unsigned int ihl)
+{
+	csum_t csum = 0;
+	int pos = 0;
+
+	do {
+		csum += ((const unsigned int *)iph)[pos];
+		if (IS_ENABLED(CONFIG_32BIT))
+			csum += csum < ((const unsigned int *)iph)[pos];
+	} while (++pos < ihl);
+
+	/*
+	 * ZBB only saves three instructions on 32-bit and five on 64-bit so not
+	 * worth checking if supported without Alternatives.
+	 */
+	if (IS_ENABLED(CONFIG_RISCV_ISA_ZBB) &&
+	    IS_ENABLED(CONFIG_RISCV_ALTERNATIVE)) {
+		csum_t fold_temp;
+
+		asm_volatile_goto(ALTERNATIVE("j %l[no_zbb]", "nop", 0,
+					      RISCV_ISA_EXT_ZBB, 1)
+		    :
+		    :
+		    :
+		    : no_zbb);
+
+		if (IS_ENABLED(CONFIG_32BIT)) {
+			asm(".option push				\n\
+			.option arch,+zbb				\n\
+				not	%[fold_temp], %[csum]		\n\
+				rori	%[csum], %[csum], 16		\n\
+				sub	%[csum], %[fold_temp], %[csum]	\n\
+			.option pop"
+			: [csum] "+r" (csum), [fold_temp] "=&r" (fold_temp));
+		} else {
+			asm(".option push				\n\
+			.option arch,+zbb				\n\
+				rori	%[fold_temp], %[csum], 32	\n\
+				add	%[csum], %[fold_temp], %[csum]	\n\
+				srli	%[csum], %[csum], 32		\n\
+				not	%[fold_temp], %[csum]		\n\
+				roriw	%[csum], %[csum], 16		\n\
+				subw	%[csum], %[fold_temp], %[csum]	\n\
+			.option pop"
+			: [csum] "+r" (csum), [fold_temp] "=&r" (fold_temp));
+		}
+		return csum >> 16;
+	}
+no_zbb:
+#ifndef CONFIG_32BIT
+		csum += (csum >> 32) | (csum << 32);
+		csum >>= 32;
+#endif
+	return csum_fold((__force __wsum)csum);
+}
+
+#define ip_fast_csum ip_fast_csum
+
+#include <asm-generic/checksum.h>
+
+#endif // __ASM_RISCV_CHECKSUM_H

-- 
2.42.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v4 2/5] riscv: Add checksum library
  2023-09-11 22:57 [PATCH v4 0/5] riscv: Add fine-tuned checksum functions Charlie Jenkins
  2023-09-11 22:57 ` [PATCH v4 1/5] riscv: Checksum header Charlie Jenkins
@ 2023-09-11 22:57 ` Charlie Jenkins
  2023-09-12  8:45   ` David Laight
  2023-09-14 12:25   ` Conor Dooley
  2023-09-11 22:57 ` [PATCH v4 3/5] riscv: Vector checksum header Charlie Jenkins
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 24+ messages in thread
From: Charlie Jenkins @ 2023-09-11 22:57 UTC (permalink / raw)
  To: Charlie Jenkins, Palmer Dabbelt, Conor Dooley, Samuel Holland,
	David Laight, linux-riscv, linux-kernel
  Cc: Paul Walmsley, Albert Ou

Provide a 32-bit and a 64-bit version of do_csum. When compiled for
32-bit it will load from the buffer in groups of 32 bits, and when
compiled for 64-bit it will load in groups of 64 bits. Benchmarking by
proxy (compiling csum_ipv6_magic, the 64-bit version, for an x86 chip
as well as running the riscv-generated code in QEMU) showed that
summing in a tree-like structure is about 4% faster than doing 64-bit
reads.
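
The "tree-like structure" means forming independent partial sums so the
additions do not build one long dependency chain. A sketch with eight
32-bit words (illustrative only; the name tree_sum8 is made up, and the
real code is in csum_ipv6_magic below):

/* Pairwise (tree) reduction: the four leading sums are independent,
 * so a superscalar core can issue them in parallel. The dependency
 * depth is 3 instead of the 7 of a linear sequence of adds. */
static u64 tree_sum8(const u32 *w)
{
	u64 s0 = (u64)w[0] + w[1];
	u64 s1 = (u64)w[2] + w[3];
	u64 s2 = (u64)w[4] + w[5];
	u64 s3 = (u64)w[6] + w[7];

	s0 += s1;
	s2 += s3;
	return s0 + s2;
}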

Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
---
 arch/riscv/include/asm/checksum.h |  11 ++
 arch/riscv/lib/Makefile           |   1 +
 arch/riscv/lib/csum.c             | 210 ++++++++++++++++++++++++++++++++++++++
 3 files changed, 222 insertions(+)

diff --git a/arch/riscv/include/asm/checksum.h b/arch/riscv/include/asm/checksum.h
index 0d7fc8275a5e..a09a4053fb87 100644
--- a/arch/riscv/include/asm/checksum.h
+++ b/arch/riscv/include/asm/checksum.h
@@ -16,6 +16,14 @@ typedef unsigned int csum_t;
 typedef unsigned long csum_t;
 #endif
 
+/* Default version is sufficient for 32 bit */
+#ifdef CONFIG_64BIT
+#define _HAVE_ARCH_IPV6_CSUM
+__sum16 csum_ipv6_magic(const struct in6_addr *saddr,
+			const struct in6_addr *daddr,
+			__u32 len, __u8 proto, __wsum sum);
+#endif
+
 /*
  *	Fold a partial checksum without adding pseudo headers
  */
@@ -90,6 +98,9 @@ static inline __sum16 ip_fast_csum(const void *iph, unsigned int ihl)
 
 #define ip_fast_csum ip_fast_csum
 
+extern unsigned int do_csum(const unsigned char *buff, int len);
+#define do_csum do_csum
+
 #include <asm-generic/checksum.h>
 
 #endif // __ASM_RISCV_CHECKSUM_H
diff --git a/arch/riscv/lib/Makefile b/arch/riscv/lib/Makefile
index 26cb2502ecf8..2aa1a4ad361f 100644
--- a/arch/riscv/lib/Makefile
+++ b/arch/riscv/lib/Makefile
@@ -6,6 +6,7 @@ lib-y			+= memmove.o
 lib-y			+= strcmp.o
 lib-y			+= strlen.o
 lib-y			+= strncmp.o
+lib-y			+= csum.o
 lib-$(CONFIG_MMU)	+= uaccess.o
 lib-$(CONFIG_64BIT)	+= tishift.o
 lib-$(CONFIG_RISCV_ISA_ZICBOZ)	+= clear_page.o
diff --git a/arch/riscv/lib/csum.c b/arch/riscv/lib/csum.c
new file mode 100644
index 000000000000..47d98c51bab2
--- /dev/null
+++ b/arch/riscv/lib/csum.c
@@ -0,0 +1,210 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * IP checksum library
+ *
+ * Influenced by arch/arm64/lib/csum.c
+ * Copyright (C) 2023 Rivos Inc.
+ */
+#include <linux/bitops.h>
+#include <linux/compiler.h>
+#include <linux/kasan-checks.h>
+#include <linux/kernel.h>
+
+#include <net/checksum.h>
+
+/* Default version is sufficient for 32 bit */
+#ifndef CONFIG_32BIT
+__sum16 csum_ipv6_magic(const struct in6_addr *saddr,
+			const struct in6_addr *daddr,
+			__u32 len, __u8 proto, __wsum csum)
+{
+	/*
+	 * Inform the compiler/processor that the operation we are performing is
+	 * "Commutative and Associative" by summing parts of the checksum in a
+	 * tree-like structure (Section 2(A) of "Computing the Internet
+	 * Checksum"). Furthermore, defer the overflow until the end of the
+	 * computation which is shown to be valid in Section 2(C)(1) of the
+	 * same handbook.
+	 */
+	unsigned long sum, sum1, sum2, sum3, sum4, ulen, uproto;
+
+	uproto = htonl(proto);
+	ulen = htonl(len);
+
+	sum   = saddr->s6_addr32[0];
+	sum  += saddr->s6_addr32[1];
+	sum1  = saddr->s6_addr32[2];
+	sum1 += saddr->s6_addr32[3];
+
+	sum2  = daddr->s6_addr32[0];
+	sum2 += daddr->s6_addr32[1];
+	sum3  = daddr->s6_addr32[2];
+	sum3 += daddr->s6_addr32[3];
+
+	sum4  = csum;
+	sum4 += ulen;
+	sum4 += uproto;
+
+	sum  += sum1;
+	sum2 += sum3;
+
+	sum += sum2;
+	sum += sum4;
+
+	if (IS_ENABLED(CONFIG_RISCV_ISA_ZBB) &&
+	    IS_ENABLED(CONFIG_RISCV_ALTERNATIVE)) {
+		csum_t fold_temp;
+
+		/*
+		 * Zbb is likely available when the kernel is compiled with Zbb
+		 * support, so nop when Zbb is available and jump when Zbb is
+		 * not available.
+		 */
+		asm_volatile_goto(ALTERNATIVE("j %l[no_zbb]", "nop", 0,
+					      RISCV_ISA_EXT_ZBB, 1)
+				  :
+				  :
+				  :
+				  : no_zbb);
+		asm(".option push					\n\
+		.option arch,+zbb					\n\
+			rori	%[fold_temp], %[sum], 32		\n\
+			add	%[sum], %[fold_temp], %[sum]		\n\
+			srli	%[sum], %[sum], 32			\n\
+			not	%[fold_temp], %[sum]			\n\
+			roriw	%[sum], %[sum], 16			\n\
+			subw	%[sum], %[fold_temp], %[sum]		\n\
+		.option pop"
+		: [sum] "+r" (sum), [fold_temp] "=&r" (fold_temp));
+		return (__force __sum16)(sum >> 16);
+	}
+no_zbb:
+	sum += (sum >> 32) | (sum << 32);
+	sum >>= 32;
+	return csum_fold((__force __wsum)sum);
+}
+EXPORT_SYMBOL(csum_ipv6_magic);
+#endif // !CONFIG_32BIT
+
+#ifdef CONFIG_32BIT
+#define OFFSET_MASK 3
+#elif defined(CONFIG_64BIT)
+#define OFFSET_MASK 7
+#endif
+
+/*
+ * Perform a checksum on an arbitrary memory region.
+ * The algorithm accounts for buff being misaligned.
+ * If buff is not aligned, it will over-read bytes but mask out the bytes it
+ * shouldn't use. The same thing will occur on the tail-end of the read.
+ */
+unsigned int __no_sanitize_address do_csum(const unsigned char *buff, int len)
+{
+	unsigned int offset, shift;
+	csum_t csum = 0, data;
+	const csum_t *ptr;
+
+	if (unlikely(len <= 0))
+		return 0;
+	/*
+	 * To align the address, read the whole aligned word that contains the
+	 * first byte of buff. Since the load is aligned, it will never cross
+	 * a page or cache line.
+	 * Directly call KASAN with the alignment we will be using.
+	 */
+	offset = (csum_t)buff & OFFSET_MASK;
+	kasan_check_read(buff, len);
+	ptr = (const csum_t *)(buff - offset);
+	len = len + offset - sizeof(csum_t);
+
+	/*
+	 * Mask off the leading bytes that were over-read if buff was not
+	 * aligned, so that they do not contribute to the sum.
+	 */
+	shift = offset * 8;
+	data = *ptr;
+#ifdef __LITTLE_ENDIAN
+	data = (data >> shift) << shift;
+#else
+	data = (data << shift) >> shift;
+#endif
+	/*
+	 * Do 32-bit reads on RV32 and 64-bit reads otherwise. This should be
+	 * faster than doing 32-bit reads on architectures that support larger
+	 * reads.
+	 */
+	while (len > 0) {
+		csum += data;
+		csum += csum < data;
+		len -= sizeof(csum_t);
+		ptr += 1;
+		data = *ptr;
+	}
+
+	/*
+	 * Mask off the trailing bytes that were over-read past the end of the
+	 * buffer, if any bytes are left over.
+	 */
+	shift = len * -8;
+#ifdef __LITTLE_ENDIAN
+	data = (data << shift) >> shift;
+#else
+	data = (data >> shift) << shift;
+#endif
+	csum += data;
+	csum += csum < data;
+
+	if (!riscv_has_extension_likely(RISCV_ISA_EXT_ZBB))
+		goto no_zbb;
+
+	unsigned int fold_temp;
+
+	if (IS_ENABLED(CONFIG_32BIT)) {
+		asm_volatile_goto(".option push			\n\
+		.option arch,+zbb				\n\
+			rori	%[fold_temp], %[csum], 16	\n\
+			andi	%[offset], %[offset], 1		\n\
+			add	%[csum], %[fold_temp], %[csum]	\n\
+			beq	%[offset], zero, %l[end]	\n\
+			rev8	%[csum], %[csum]		\n\
+			zext.h	%[csum], %[csum]		\n\
+		.option pop"
+			: [csum] "+r" (csum), [fold_temp] "=&r" (fold_temp)
+			: [offset] "r" (offset)
+			:
+			: end);
+
+		return csum;
+	} else {
+		asm_volatile_goto(".option push			\n\
+		.option arch,+zbb				\n\
+			rori	%[fold_temp], %[csum], 32	\n\
+			add	%[csum], %[fold_temp], %[csum]	\n\
+			srli	%[csum], %[csum], 32		\n\
+			roriw	%[fold_temp], %[csum], 16	\n\
+			addw	%[csum], %[fold_temp], %[csum]	\n\
+			andi	%[offset], %[offset], 1		\n\
+			beq	%[offset], zero, %l[end]	\n\
+			rev8	%[csum], %[csum]		\n\
+			srli	%[csum], %[csum], 32		\n\
+			zext.h	%[csum], %[csum]		\n\
+		.option pop"
+			: [csum] "+r" (csum), [fold_temp] "=&r" (fold_temp)
+			: [offset] "r" (offset)
+			:
+			: end);
+
+		return csum;
+	}
+end:
+	return csum >> 16;
+no_zbb:
+#ifndef CONFIG_32BIT
+	csum += (csum >> 32) | (csum << 32);
+	csum >>= 32;
+#endif
+	csum = (unsigned int)csum + (((unsigned int)csum >> 16) | ((unsigned int)csum << 16));
+	if (offset & 1)
+		return (unsigned short)swab32(csum);
+	return csum >> 16;
+}

-- 
2.42.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v4 3/5] riscv: Vector checksum header
  2023-09-11 22:57 [PATCH v4 0/5] riscv: Add fine-tuned checksum functions Charlie Jenkins
  2023-09-11 22:57 ` [PATCH v4 1/5] riscv: Checksum header Charlie Jenkins
  2023-09-11 22:57 ` [PATCH v4 2/5] riscv: Add checksum library Charlie Jenkins
@ 2023-09-11 22:57 ` Charlie Jenkins
  2023-09-11 22:57 ` [PATCH v4 4/5] riscv: Vector checksum library Charlie Jenkins
  2023-09-11 22:57 ` [PATCH v4 5/5] riscv: Test checksum functions Charlie Jenkins
  4 siblings, 0 replies; 24+ messages in thread
From: Charlie Jenkins @ 2023-09-11 22:57 UTC (permalink / raw)
  To: Charlie Jenkins, Palmer Dabbelt, Conor Dooley, Samuel Holland,
	David Laight, linux-riscv, linux-kernel
  Cc: Paul Walmsley, Albert Ou

The vector code is written in assembly rather than with the GCC vector
intrinsics because the intrinsics did not produce optimal code. Vector
intrinsic types are still used so that the inline assembly can
appropriately select vector registers. However, this code cannot be
merged yet: it is currently not possible to use vector intrinsics in
the kernel, because vector support has to be enabled directly from
assembly.
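
For illustration, the pattern looks roughly like this (a sketch;
riscv_vector.h provides vuint64m1_t, and the "vd" constraint is what
lets the compiler choose the register):

/* The vector-typed variable exists only so GCC allocates a vector
 * register for the "vd"-constrained operand; the compiler itself
 * generates no vector code. */
vuint64m1_t acc;

asm("vsetivli	x0, 1, e64, m1, ta, ma\n\t"
    "vmv.v.i	%[acc], 0"
    : [acc] "=&vd" (acc));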

Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
---
 arch/riscv/include/asm/checksum.h | 75 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 75 insertions(+)

diff --git a/arch/riscv/include/asm/checksum.h b/arch/riscv/include/asm/checksum.h
index a09a4053fb87..a99c1f61e795 100644
--- a/arch/riscv/include/asm/checksum.h
+++ b/arch/riscv/include/asm/checksum.h
@@ -10,6 +10,10 @@
 #include <linux/in6.h>
 #include <linux/uaccess.h>
 
+#ifdef CONFIG_RISCV_ISA_V
+#include <riscv_vector.h>
+#endif
+
 #ifdef CONFIG_32BIT
 typedef unsigned int csum_t;
 #else
@@ -42,6 +46,77 @@ static inline __sum16 csum_fold(__wsum sum)
  */
 static inline __sum16 ip_fast_csum(const void *iph, unsigned int ihl)
 {
+#ifdef CONFIG_RISCV_ISA_V
+	if (!has_vector())
+		goto no_vector;
+
+	vuint64m1_t prev_buffer;
+	vuint32m1_t curr_buffer;
+	unsigned int vl;
+
+	if (IS_ENABLED(CONFIG_32BIT)) {
+		csum_t high_result, low_result;
+
+		kernel_vector_begin();
+		asm(".option push						\n\
+		.option arch, +v						\n\
+		vsetivli	x0, 1, e64, m1, ta, ma				\n\
+		vmv.v.i		%[prev_buffer], 0				\n\
+		1:								\n\
+		vsetvli		%[vl], %[ihl], e32, m1, ta, ma			\n\
+		vle32.v		%[curr_buffer], (%[iph])			\n\
+		vwredsumu.vs	%[prev_buffer], %[curr_buffer], %[prev_buffer]	\n\
+		sub %[ihl],	%[ihl], %[vl]					\n\
+		slli %[vl],	%[vl], 2					\n\
+		add %[iph],	%[vl], %[iph]					\n\
+		# If not all of iph could fit into vector reg, do another sum	\n\
+		bne		%[ihl], zero, 1b				\n\
+		vsetivli	x0, 1, e64, m1, ta, ma				\n\
+		vmv.x.s		%[low_result], %[prev_buffer]			\n\
+		addi		%[vl], x0, 32					\n\
+		vsrl.vx		%[prev_buffer], %[prev_buffer], %[vl]		\n\
+		vmv.x.s		%[high_result], %[prev_buffer]			\n\
+		.option pop"
+		: [vl] "=&r" (vl), [prev_buffer] "=&vd" (prev_buffer),
+			[curr_buffer] "=&vd" (curr_buffer),
+			[high_result] "=&r" (high_result),
+			[low_result] "=&r" (low_result)
+		: [iph] "r" (iph), [ihl] "r" (ihl));
+		kernel_vector_end();
+
+		high_result += low_result;
+		high_result += high_result < low_result;
+	} else {
+		csum_t result;
+
+		kernel_vector_begin();
+		asm(".option push						\n\
+		.option arch, +v						\n\
+		vsetivli	x0, 1, e64, m1, ta, ma				\n\
+		vmv.v.i		%[prev_buffer], 0				\n\
+		1:								\n\
+		# Setup 32-bit sum of iph					\n\
+		vsetvli		%[vl], %[ihl], e32, m1, ta, ma			\n\
+		vle32.v		%[curr_buffer], (%[iph])			\n\
+		# Sum each 32-bit segment of iph that can fit into a vector reg	\n\
+		vwredsumu.vs	%[prev_buffer], %[curr_buffer], %[prev_buffer]	\n\
+		subw %[ihl],	%[ihl], %[vl]					\n\
+		slli %[vl],	%[vl], 2					\n\
+		add %[iph],	%[vl], %[iph]					\n\
+		# If not all of iph could fit into vector reg, do another sum	\n\
+		bne		%[ihl], zero, 1b				\n\
+		vsetvli	x0, x0, e64, m1, ta, ma					\n\
+		vmv.x.s	%[result], %[prev_buffer]				\n\
+		.option pop"
+		: [vl] "=&r" (vl), [prev_buffer] "=&vd" (prev_buffer),
+			[curr_buffer] "=&vd" (curr_buffer),
+			[result] "=&r" (result)
+		: [iph] "r" (iph), [ihl] "r" (ihl));
+		kernel_vector_end();
+	}
+no_vector:
+#endif // !CONFIG_RISCV_ISA_V
+
 	csum_t csum = 0;
 	int pos = 0;
 

-- 
2.42.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v4 4/5] riscv: Vector checksum library
  2023-09-11 22:57 [PATCH v4 0/5] riscv: Add fine-tuned checksum functions Charlie Jenkins
                   ` (2 preceding siblings ...)
  2023-09-11 22:57 ` [PATCH v4 3/5] riscv: Vector checksum header Charlie Jenkins
@ 2023-09-11 22:57 ` Charlie Jenkins
  2023-09-14 12:46   ` Conor Dooley
  2023-09-11 22:57 ` [PATCH v4 5/5] riscv: Test checksum functions Charlie Jenkins
  4 siblings, 1 reply; 24+ messages in thread
From: Charlie Jenkins @ 2023-09-11 22:57 UTC (permalink / raw)
  To: Charlie Jenkins, Palmer Dabbelt, Conor Dooley, Samuel Holland,
	David Laight, linux-riscv, linux-kernel
  Cc: Paul Walmsley, Albert Ou

This patch is not ready for merge as vector support in the kernel is
limited. However, the code has been tested in QEMU so the algorithms
do work. This code requires the kernel to be compiled with C vector
support, but that is not yet possible. It is written in assembly
rather than using the GCC vector intrinsics because they did not
produce optimal code.

Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
---
 arch/riscv/lib/csum.c | 92 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 92 insertions(+)

diff --git a/arch/riscv/lib/csum.c b/arch/riscv/lib/csum.c
index 47d98c51bab2..eb4596fc7f5b 100644
--- a/arch/riscv/lib/csum.c
+++ b/arch/riscv/lib/csum.c
@@ -12,6 +12,10 @@
 
 #include <net/checksum.h>
 
+#ifdef CONFIG_RISCV_ISA_V
+#include <riscv_vector.h>
+#endif
+
 /* Default version is sufficient for 32 bit */
 #ifndef CONFIG_32BIT
 __sum16 csum_ipv6_magic(const struct in6_addr *saddr,
@@ -115,6 +119,94 @@ unsigned int __no_sanitize_address do_csum(const unsigned char *buff, int len)
 	offset = (csum_t)buff & OFFSET_MASK;
 	kasan_check_read(buff, len);
 	ptr = (const csum_t *)(buff - offset);
+#ifdef CONFIG_RISCV_ISA_V
+	if (!has_vector())
+		goto no_vector;
+
+	len += offset;
+
+	vuint64m1_t prev_buffer;
+	vuint32m1_t curr_buffer;
+	unsigned int shift, cl, tail_seg;
+	csum_t vl;
+	/* csum and ptr are reused from the declarations above */
+
+#ifdef CONFIG_32BIT
+	csum_t high_result, low_result;
+#else
+	csum_t result;
+#endif
+
+	// Read the tail segment
+	tail_seg = len % 4;
+	csum = 0;
+	if (tail_seg) {
+		shift = (4 - tail_seg) * 8;
+		csum = *(const unsigned int *)((const unsigned char *)ptr + len - tail_seg);
+		csum = ((unsigned int)csum << shift) >> shift;
+		len -= tail_seg;
+	}
+
+	unsigned int start_mask = (unsigned int)(~(~0U << offset));
+
+	kernel_vector_begin();
+	asm(".option push						\n\
+	.option arch, +v						\n\
+	vsetvli	 %[vl], %[len], e8, m1, ta, ma				\n\
+	# clear out mask and vector registers since we switch up sizes	\n\
+	vmclr.m	 v0							\n\
+	vmclr.m	 %[prev_buffer]						\n\
+	vmclr.m  %[curr_buffer]						\n\
+	# Mask out the leading bits of a misaligned address		\n\
+	vsetivli x0, 1, e64, m1, ta, ma					\n\
+	vmv.s.x	 %[prev_buffer], %[csum]				\n\
+	vmv.s.x	 v0, %[start_mask]					\n\
+	vsetvli	 %[vl], %[len], e8, m1, ta, ma				\n\
+	vmnot.m	 v0, v0							\n\
+	vle8.v	 %[curr_buffer], (%[buff]), v0.t			\n\
+	j	 2f							\n\
+	# Iterate through the buff and sum all words			\n\
+	1:								\n\
+	vsetvli	 %[vl], %[len], e8, m1, ta, ma				\n\
+	vle8.v	 %[curr_buffer], (%[buff])				\n\
+	2:								\n\
+	vsetvli x0, x0, e32, m1, ta, ma					\n\
+	vwredsumu.vs	%[prev_buffer], %[curr_buffer], %[prev_buffer]	\n\t"
+#ifdef CONFIG_32BIT
+	"sub	 %[len], %[len], %[vl]					\n\
+	slli	 %[vl], %[vl], 2					\n\
+	add	 %[buff], %[vl], %[buff]				\n\
+	bnez	 %[len], 1b						\n\
+	vsetvli	 x0, x0, e64, m1, ta, ma				\n\
+	vmv.x.s	 %[low_result], %[prev_buffer]				\n\
+	addi	 %[vl], x0, 32						\n\
+	vsrl.vx	 %[prev_buffer], %[prev_buffer], %[vl]			\n\
+	vmv.x.s	 %[high_result], %[prev_buffer]				\n\
+	.option  pop"
+	    : [vl] "=&r"(vl), [prev_buffer] "=&vd"(prev_buffer),
+	      [curr_buffer] "=&vd"(curr_buffer),
+	      [high_result] "=&r"(high_result), [low_result] "=&r"(low_result)
+	    : [buff] "r"(ptr), [len] "r"(len), [start_mask] "r"(start_mask),
+	      [csum] "r"(csum));
+
+	high_result += low_result;
+	high_result += high_result < low_result;
+#else // !CONFIG_32BIT
+	"subw	 %[len], %[len], %[vl]					\n\
+	slli	 %[vl], %[vl], 2					\n\
+	add	 %[buff], %[vl], %[buff]				\n\
+	bnez	 %[len], 1b						\n\
+	vsetvli  x0, x0, e64, m1, ta, ma				\n\
+	vmv.x.s  %[result], %[prev_buffer]				\n\
+	.option pop"
+	    : [vl] "=&r"(vl), [prev_buffer] "=&vd"(prev_buffer),
+	      [curr_buffer] "=&vd"(curr_buffer), [result] "=&r"(result)
+	    : [buff] "r"(ptr), [len] "r"(len), [start_mask] "r"(start_mask),
+	      [csum] "r"(csum));
+#endif // !CONFIG_32BIT
+	kernel_vector_end();
+no_vector:
+#endif // CONFIG_RISCV_ISA_V
 	len = len + offset - sizeof(csum_t);
 
 	/*

-- 
2.42.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v4 5/5] riscv: Test checksum functions
  2023-09-11 22:57 [PATCH v4 0/5] riscv: Add fine-tuned checksum functions Charlie Jenkins
                   ` (3 preceding siblings ...)
  2023-09-11 22:57 ` [PATCH v4 4/5] riscv: Vector checksum library Charlie Jenkins
@ 2023-09-11 22:57 ` Charlie Jenkins
  4 siblings, 0 replies; 24+ messages in thread
From: Charlie Jenkins @ 2023-09-11 22:57 UTC (permalink / raw)
  To: Charlie Jenkins, Palmer Dabbelt, Conor Dooley, Samuel Holland,
	David Laight, linux-riscv, linux-kernel
  Cc: Paul Walmsley, Albert Ou

Add Kconfig support for riscv-specific testing modules. This
supplements lib/checksum_kunit.c, adding tests for ip_fast_csum and
csum_ipv6_magic.
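
The expected_* tables below were presumably generated by running a
known-good implementation over random_buf; conceptually something like
this userspace sketch (not the actual generator):

/* Emit reference values for every (length, offset) pair. */
for (int len = IPv4_MIN_WORDS; len < IPv4_MAX_WORDS; len++)
	for (int i = 0; i < NUM_IP_FAST_CSUM_TESTS; i++)
		printf("0x%x, ", ip_fast_csum(random_buf + i, len));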

Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
---
 arch/riscv/Kconfig.debug              |   1 +
 arch/riscv/lib/Kconfig.debug          |  31 ++++
 arch/riscv/lib/Makefile               |   2 +
 arch/riscv/lib/riscv_checksum_kunit.c | 330 ++++++++++++++++++++++++++++++++++
 4 files changed, 364 insertions(+)

diff --git a/arch/riscv/Kconfig.debug b/arch/riscv/Kconfig.debug
index e69de29bb2d1..53a84ec4f91f 100644
--- a/arch/riscv/Kconfig.debug
+++ b/arch/riscv/Kconfig.debug
@@ -0,0 +1 @@
+source "arch/riscv/lib/Kconfig.debug"
diff --git a/arch/riscv/lib/Kconfig.debug b/arch/riscv/lib/Kconfig.debug
new file mode 100644
index 000000000000..15fc83b68340
--- /dev/null
+++ b/arch/riscv/lib/Kconfig.debug
@@ -0,0 +1,31 @@
+# SPDX-License-Identifier: GPL-2.0-only
+menu "riscv Testing and Coverage"
+
+menuconfig RUNTIME_TESTING_MENU
+	bool "Runtime Testing"
+	default y
+	help
+	  Enable riscv runtime testing.
+
+if RUNTIME_TESTING_MENU
+
+config RISCV_CHECKSUM_KUNIT
+	tristate "KUnit test riscv checksum functions at runtime" if !KUNIT_ALL_TESTS
+	depends on KUNIT
+	default KUNIT_ALL_TESTS
+	help
+	  Enable this option to test the checksum functions at boot.
+
+	  KUnit tests run during boot and output the results to the debug log
+	  in TAP format (http://testanything.org/). Only useful for kernel devs
+	  running the KUnit test harness, and not intended for inclusion into a
+	  production build.
+
+	  For more information on KUnit and unit tests in general please refer
+	  to the KUnit documentation in Documentation/dev-tools/kunit/.
+
+	  If unsure, say N.
+
+endif # RUNTIME_TESTING_MENU
+
+endmenu # "riscv Testing and Coverage"
diff --git a/arch/riscv/lib/Makefile b/arch/riscv/lib/Makefile
index 2aa1a4ad361f..1535a8c81430 100644
--- a/arch/riscv/lib/Makefile
+++ b/arch/riscv/lib/Makefile
@@ -12,3 +12,5 @@ lib-$(CONFIG_64BIT)	+= tishift.o
 lib-$(CONFIG_RISCV_ISA_ZICBOZ)	+= clear_page.o
 
 obj-$(CONFIG_FUNCTION_ERROR_INJECTION) += error-inject.o
+
+obj-$(CONFIG_RISCV_CHECKSUM_KUNIT) += riscv_checksum_kunit.o
diff --git a/arch/riscv/lib/riscv_checksum_kunit.c b/arch/riscv/lib/riscv_checksum_kunit.c
new file mode 100644
index 000000000000..27f0e465447f
--- /dev/null
+++ b/arch/riscv/lib/riscv_checksum_kunit.c
@@ -0,0 +1,330 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Test cases for checksum
+ */
+
+#include <linux/in6.h>
+
+#include <kunit/test.h>
+#include <net/checksum.h>
+#include <net/ip6_checksum.h>
+
+#define CHECK_EQ(lhs, rhs) KUNIT_ASSERT_EQ(test, lhs, rhs)
+
+static const u8 random_buf[] = {
+	0x3d, 0xf9, 0x6f, 0x81, 0x84, 0x11, 0xb8, 0x03, 0x8f, 0x00, 0x1e, 0xfd,
+	0xc6, 0x77, 0xf7, 0x72, 0xde, 0x16, 0xe2, 0xf7, 0xf8, 0x81, 0x4b, 0x3e,
+	0x36, 0x57, 0x9c, 0x10, 0x4e, 0x53, 0x44, 0x94, 0x5e, 0x6c, 0x5b, 0xde,
+	0x98, 0x8a, 0xc5, 0x0a, 0x5d, 0x24, 0x38, 0x4c, 0x50, 0xef, 0x20, 0xe8,
+	0x14, 0x4e, 0x8d, 0x3e, 0x80, 0x9a, 0xd9, 0xf1, 0xb5, 0x2d, 0x27, 0x6d,
+	0xb4, 0x99, 0x9b, 0x10, 0xf7, 0x12, 0x14, 0xff, 0xe8, 0xe1, 0xd5, 0x1a,
+	0x96, 0x86, 0x6a, 0xb3, 0xde, 0x10, 0xf3, 0xa5, 0x08, 0xbd, 0x74, 0x27,
+	0x5a, 0x72, 0x4f, 0x5a, 0xd3, 0x4b, 0xbb, 0x73, 0xe3, 0x71, 0xd1, 0x1d,
+	0x8c, 0xb3, 0x69, 0xd9, 0x3c, 0xda, 0x58, 0x73, 0x86, 0x19, 0xd1, 0xf9,
+	0x58, 0xee, 0x4a, 0x39, 0xf9, 0x43, 0x38, 0x22, 0x8a, 0x6f, 0xee, 0xb5,
+	0x7a, 0x31, 0x52, 0x32, 0x80, 0xf1, 0x70, 0x60, 0x7c, 0x0a, 0xa6, 0x54,
+	0x08, 0x11, 0x99, 0xa1, 0x4b, 0x58, 0xc1, 0xbe, 0x6d, 0x5e, 0xd1, 0x32,
+	0x79, 0xcf, 0xaf, 0x7c, 0x52, 0x6f, 0x26, 0xc4, 0xa8, 0x1d, 0x67, 0x04,
+	0x2f, 0xb8, 0x10, 0x9d, 0x97, 0x2f, 0xe3, 0xa1, 0xf7, 0x88, 0xa4, 0xab,
+	0xd9, 0x22, 0xaa, 0x8d, 0x11, 0x3b, 0x27, 0x34, 0x31, 0xd6, 0x44, 0xeb,
+	0x9f, 0x4c, 0x22, 0x29, 0xea, 0x83, 0xa4, 0x6b, 0x48, 0x7a, 0xe7, 0x4c,
+	0x84, 0x5b, 0x24, 0xbe, 0x1e, 0x1f, 0xf6, 0xc7, 0x9e, 0xd4, 0xc1, 0x52,
+	0x23, 0x18, 0xaa, 0xfe, 0x72, 0x63, 0x7f, 0x2f, 0xcd, 0xda, 0x0e, 0x39,
+	0x09, 0xbb, 0x84, 0x24, 0xa4, 0xa9, 0x2f, 0x01, 0x55, 0xfa, 0xb4, 0xa7,
+	0x0c, 0x9c, 0xb0, 0x22, 0x71, 0x85, 0x91, 0x62, 0x97, 0xdc, 0x8d, 0xaf
+};
+
+static const __sum16 expected_csum_ipv6_magic[] = {
+	0xf45f, 0x1b52, 0xa787, 0x5002, 0x562d, 0x2aed, 0x54b4, 0xc018, 0xfdc9,
+	0xae5a, 0xa9d1, 0x79e1, 0x12c6, 0xa262, 0x1290, 0x7632, 0x85d,	0xfa1c,
+	0xbe47, 0x304b, 0x506e, 0x4dd0, 0x1ce7, 0x49f5, 0x4c39, 0xa900, 0x16d6,
+	0x4c3d, 0xf8b7, 0x71ab, 0x9109, 0x992b, 0x19a9, 0x8b0f, 0xff9c, 0x3113,
+	0x152f, 0xcffc, 0xb3af, 0xfb87, 0x7015, 0x2005, 0x2fa5, 0x4c99, 0xd8fe,
+	0xffb5, 0x4610, 0xe437, 0xa888, 0x49b0, 0x8705, 0xabfa, 0x2ed2, 0x8788,
+	0xdff8, 0x662f, 0x3ac0, 0xf00c, 0x863a, 0xce3f, 0xfe40, 0x38e0, 0xb0a9,
+	0x181,	0xee1d, 0x707a, 0x922,	0xd470, 0x3fad, 0x6b7b, 0x3945, 0x8991,
+	0x3ffb, 0xc8c5, 0xfae1, 0x59cb, 0xfc51, 0x6954, 0x8955, 0x49d3, 0xc582,
+	0x61bd, 0xe5a4, 0xaf1d, 0xa2d0, 0xb02b, 0xbf1e, 0x20ac, 0xd5d4, 0x2450,
+	0xc454, 0x6a16, 0x4f9c, 0xeecf, 0xb7de, 0x9f27, 0x99fe, 0xb715, 0xfdc0,
+	0xc6a2, 0xbb1a, 0xf0c2, 0xbb01, 0x8f53, 0xad2f, 0x9bf7, 0x9f3,	0x87ca,
+	0xb445, 0xc220, 0x8b20, 0xd65a, 0xba07, 0x6b33, 0x4139, 0xbeef, 0x673a,
+	0xbab8, 0xa929, 0x54cf, 0x2a18, 0xbbd1, 0x2d8,	0x2269, 0xa025, 0xeece,
+	0x64a6, 0x5b74, 0x5ef7, 0xbaf5, 0x26e9, 0x2009, 0xabc0, 0x97a1, 0x41f,
+	0xe0a7, 0x6d8b, 0x2845, 0x374a, 0x76e0, 0x7303, 0x1384, 0x854e, 0xcfac,
+	0xc102, 0xc7f1, 0x479d, 0x9d8b, 0xd587, 0xc173, 0xb00c, 0xc4d1, 0xe8ed,
+	0x51d2, 0x48d4, 0xd9eb, 0x6744, 0xcaf,	0xf785, 0xe8dc, 0x9034, 0x7413,
+	0x26ce, 0x3b4b, 0xbf9,	0xba2a, 0xe9d8, 0x89de, 0x5150, 0x28ef, 0xbefb,
+	0xb67f, 0xee07, 0x1c10, 0x2534, 0x78ce, 0xfc75, 0x7a6d, 0x5cdd, 0x7edb,
+	0xf3ad, 0xd7bf, 0x3b1,	0xc411, 0xacfc, 0xe3b5, 0xca9d, 0x174e, 0x893b,
+	0x442c, 0x4dec, 0x827d, 0x5783, 0x2dac, 0x7d26, 0x3530, 0xb0db, 0x11bc,
+	0xb2ac, 0x4462
+};
+
+static const __sum16 expected_fast_csum[] = {
+	0x78e9, 0x2e78, 0xf02e, 0x52f0, 0x3353, 0xa133, 0xeda0, 0xbced, 0xe0bc,
+	0xfde0, 0x8dfd, 0xd78d, 0xf6d7, 0x3ff7, 0x240,	0xdc02, 0x96db, 0x2197,
+	0x2321, 0x3e23, 0x103f, 0xda10, 0x6dda, 0xed6d, 0x5fed, 0xd35f, 0xc7d2,
+	0x4ec8, 0xf04d, 0x87f0, 0xf587, 0x3ef5, 0x4b3f, 0x1d4b, 0x1d1d, 0x9f1c,
+	0x99f,	0x8209, 0x6682, 0x2067, 0x420,	0xc903, 0x8ec8, 0x658e, 0xca65,
+	0xbec9, 0xa6bf, 0xcba6, 0x8fcb, 0xf78e, 0xfbf6, 0xaefb, 0x1faf, 0x991f,
+	0x3399, 0x834,	0x7208, 0xdf71, 0x8edf, 0x138e, 0x5613, 0xbf56, 0x32bf,
+	0xe632, 0x1be6, 0x831c, 0xbc82, 0x47bc, 0x6148, 0x5d61, 0xf75d, 0x77f7,
+	0x9e77, 0x2a9e, 0xb32a, 0xc3b2, 0x48c4, 0x1649, 0xa615, 0x9fa6, 0x729f,
+	0x6b72, 0x556b, 0x8755, 0x987,	0x5b09, 0x625b, 0xd961, 0xc2d8, 0x53c3,
+	0x2053, 0xc420, 0x5ac4, 0xae5a, 0x88ae, 0x4789, 0x8447, 0x4984, 0xc849,
+	0xc4c7, 0xebc4, 0x86eb, 0x9487, 0x8d94, 0xd58d, 0x93d5, 0xfd92, 0xf3fd,
+	0x96f4, 0xd096, 0x7ad1, 0x757a, 0x5f75, 0x6660, 0x9266, 0x592,	0x1305,
+	0x4413, 0x2a44, 0x712a, 0x2171, 0x7e21, 0xf47d, 0xfef3, 0xf3fe, 0x5f4,
+	0x1606, 0xc715, 0xf9c6, 0xf0f9, 0x94f0, 0x7095, 0x2570, 0xd024, 0x18d0,
+	0x219,	0xb602, 0x1eb6, 0x561e, 0xcf56, 0x77cf, 0xa577, 0xa6a5, 0x93a6,
+	0x3793, 0x1537, 0x7e15, 0x207e, 0x4f20, 0x994e, 0x9b99, 0x159b, 0xd215,
+	0xacd2, 0xb4ac, 0xecb4, 0x84ec, 0xea84, 0x66ea, 0xb666, 0x18b6, 0xae18,
+	0xfbad, 0x6efc, 0x746f, 0x7c74, 0x797c, 0x7c79, 0xb97c, 0xdba,	0x620d,
+	0xd061, 0xa2d0, 0x5da2, 0x825d, 0x6082, 0xf85f, 0x72f8, 0xaf73, 0xc1ae,
+	0xd2c1, 0xb8a5, 0xacb8, 0x5aad, 0x805a, 0xcb80, 0xb6cb, 0x89b6, 0x2a8a,
+	0xf929, 0x5af9, 0x8d5a, 0x1d8d, 0xac1d, 0x4bac, 0x994b, 0x7d99, 0x17e,
+	0xff01, 0xf3fe, 0xa8f4, 0x9fa9, 0x51a0, 0x3251, 0x7c32, 0x887b, 0x9d88,
+	0x919d, 0xac91, 0x63ac, 0x7a63, 0x1c7a, 0xe51b, 0xbee4, 0x8dbe, 0xfd8d,
+	0xc1fd, 0x6ec2, 0xa66e, 0x5fa6, 0xd05f, 0x59d0, 0x3659, 0x6b36, 0x5a6b,
+	0xb859, 0xc1b7, 0xc5c1, 0xcc5,	0x930d, 0x8b92, 0x5a8b, 0xae5a, 0xe5ad,
+	0x4fe5, 0x6f50, 0x366f, 0xbb36, 0xe3bb, 0x2be3, 0x962b, 0x7196, 0xf071,
+	0x98f0, 0x3c99, 0x4f3c, 0x604f, 0x1660, 0xb915, 0xa1b9, 0xbea1, 0x11bf,
+	0xc311, 0xec3,	0xcd0e, 0xe1cc, 0xcde1, 0xbbcd, 0x6fbc, 0xf26e, 0x9f3,
+	0x250a, 0x8c24, 0xc88c, 0x2fc8, 0xf62e, 0x30f6, 0x7a30, 0x357a, 0x9b35,
+	0xf9b,	0xa30f, 0x92a3, 0xf492, 0xebf4, 0xf6eb, 0xcef6, 0x5ece, 0xe05e,
+	0xe0e0, 0xf7e0, 0x87f8, 0xb487, 0x70b4, 0x9c70, 0x839c, 0xa683, 0x92a6,
+	0xd192, 0x37d2, 0x2238, 0x1523, 0xd414, 0xacd3, 0x81ad, 0x9881, 0xf897,
+	0xfbf7, 0x14fc, 0xd15,	0x320d, 0x9032, 0x3390, 0xf232, 0xd5f1, 0xa7d5,
+	0x3a8,	0x2a04, 0x4e2a, 0xc64d, 0x21c6, 0xb321, 0x60b3, 0x361,	0x3a03,
+	0x5c39, 0xc25c, 0x60c2, 0x7660, 0x8976, 0x5489, 0xa654, 0xcaa5, 0x7bca,
+	0xf77b, 0x2f7,	0x9702, 0xaf97, 0x9caf, 0x9e9c, 0xdd9e, 0xd2dd, 0xdcd2,
+	0x62dd, 0x5463, 0xaa53, 0x76aa, 0xc375, 0x5c3,	0x2f06, 0xf42e, 0xa2f4,
+	0xa1a2, 0x4ea1, 0xe04e, 0x84e0, 0x8f85, 0x938f, 0x4c93, 0xf24c, 0xa1f2,
+	0xb9a1, 0x27ba, 0x8927, 0x1a89, 0xa51a, 0x4ba4, 0x114b, 0xde10, 0x12de,
+	0x6112, 0xab61, 0x50d3, 0xc250, 0xf6c2, 0xedf6, 0xe3ed, 0x13e4, 0x8913,
+	0x7089, 0xae6f, 0x66ae, 0x2466, 0xbf23, 0x16c0, 0x2917, 0x6a29, 0xe86a,
+	0x90e8, 0x7691, 0xb875, 0x37b9, 0xc837, 0x1bc9, 0xfc1b, 0xd9fb, 0xfbd9,
+	0x8ffb, 0xb88f, 0x52b8, 0xd751, 0xead6, 0xfcea, 0x7fd,	0x2408, 0xb223,
+	0xf6b1, 0x71f6, 0xc472, 0x13c4, 0x3c14, 0xc53c, 0x47c4, 0x3947, 0x8a38,
+	0x9b89, 0xbb9b, 0x55bb, 0x2456, 0xc24,	0x590c, 0x4258, 0x9642, 0xdc95,
+	0x2edc, 0x542f, 0xc54,	0xb90c, 0xd6b9, 0x14d7, 0x9214, 0xec91, 0xa4ec,
+	0xcda4, 0xf2cd, 0xadf2, 0x8fad, 0xc18f, 0x30c1, 0x430,	0x1205, 0x6112,
+	0x4061, 0xcd40, 0x81cc, 0x2682, 0x2e26, 0x382e, 0x6e38, 0x906e, 0x6590,
+	0xb265, 0x11b2, 0x6211, 0xe061, 0x8be0, 0xce8b, 0xeccd, 0xfcec, 0x3fd,
+	0x3504, 0x4d35, 0x114d, 0x1a11, 0xcf19, 0x82cf, 0xf83,	0x210,	0xfb01,
+	0xdfb,	0xbd0d, 0x6bd,	0x3607, 0xc735, 0x5c8,	0x7a05, 0x247a, 0xf824,
+	0x2cf8, 0x302d, 0x8530, 0x3d85, 0x1b3e, 0xc71a, 0x95c6, 0x5296, 0x7b52,
+	0xb97a, 0x6ab9, 0xca6a, 0xaca,	0x90b,	0x4409, 0x3144, 0x631,	0x5d06,
+	0x745c, 0x3474, 0x4835, 0x3e48, 0xa43e, 0x8ba4, 0xf68a, 0x20f7, 0xae20,
+	0x91ad, 0x8f91, 0x478f, 0x8f47, 0x9b8e, 0x5e9b, 0xb85e, 0x71b8, 0x4c71,
+	0xad4c, 0x73ad, 0x5273, 0xdb52, 0xe6db, 0x63e7, 0x2f64, 0x852f, 0xc884,
+	0x66c8, 0xa166, 0x6fa1, 0x726f, 0xb472, 0x4db4, 0xf94c, 0x81f9, 0x6581,
+	0xb365, 0xb4b3, 0x68b4, 0xb068, 0xbdb0, 0x23be, 0xeb23, 0xa3eb, 0xd8a3,
+	0x5ed9, 0xdc5e, 0x12dc, 0xa212, 0x85a1, 0x885,	0xeb07, 0xe9ea, 0xf8e9,
+	0xa7f9, 0x93a7, 0x9493, 0x6940, 0x1f69, 0xf61f, 0x33f6, 0x9933, 0x1f99,
+	0x201f, 0x1220, 0x1912, 0x4419, 0xf543, 0x29f5, 0xa62a, 0xa0a6, 0x2ea0,
+	0x772f, 0xb976, 0x40ba, 0x8240, 0x9582, 0x3b96, 0xe3c,	0x230e, 0x8022,
+	0x6f7f, 0x6f,	0x9900, 0x7599, 0x3c75, 0xf3c,	0xf60e, 0xb7f5, 0x79b8,
+	0x1f79, 0xd31f, 0x66d3, 0xb266, 0x16b2, 0x5b16, 0x65b,	0x4b06, 0xcd4a,
+	0xe8cc, 0x9ae8, 0x819a, 0xc81,	0x600d, 0x3a5f, 0xa23a, 0x46a2, 0x3346,
+	0x5f33, 0x4a5f, 0x854a, 0x7285, 0xf73,	0xa10,	0xf209, 0xebf1, 0x5deb,
+	0xe55d, 0x2ee5, 0xd2f,	0xf90c, 0xfff8, 0x6400, 0x5f63, 0xe5f,	0x850e,
+	0xba85, 0x8cba, 0x378d, 0x3437, 0x4734, 0xa147, 0xe0a0, 0x5ae0, 0x665b,
+	0x7d65, 0xe7e,	0xea0e, 0x1de9, 0x631e, 0x5a63, 0x685a, 0x2a68, 0x6b2a,
+	0x8b6a, 0xf8b,	0xe40f, 0x29e4, 0x4d2a, 0x6b4d, 0xb06b, 0xebaf, 0x10ec,
+	0xa910, 0x20a9, 0x5221, 0xe451, 0xd6e4, 0x18d7, 0xa019, 0xd89f, 0x71d8,
+	0x1372, 0x3313, 0x2333, 0x6e23, 0xe6e,	0xfe0e, 0x87fd, 0x488,	0x805,
+	0x7907, 0x9078, 0x1e90, 0xc81e, 0x1ec8, 0x901f, 0x1090, 0x6210, 0x2462,
+	0x4d24, 0x524d, 0x9e52, 0x8b9e, 0xfe8b, 0x4efe, 0xe34e, 0x29e3, 0xa629,
+	0xdca5, 0xb6db, 0x64b6, 0xab64, 0x5aab, 0x1d5a, 0x901d, 0x3490, 0xc134,
+	0x90c1, 0xe490, 0x3ae5, 0xe33a, 0x82e3, 0xdc82, 0xeddc, 0x6ded, 0xa06d,
+	0x90a0, 0xa490, 0x2ba5, 0x632b, 0xc562, 0x25c5, 0x5e25, 0xc5e,	0x9c0c,
+	0x359b, 0xec35, 0x48ec, 0xc048, 0x7c1,	0xa407, 0xe0a4, 0xde1,	0x8f0d,
+	0xf18e, 0xc9f1, 0x3fc9, 0xb23f, 0x7ab2, 0xa07a, 0x9da0, 0x1d9d, 0xd31c,
+	0xdbd2, 0x45dc, 0xa145, 0x1a2,	0x1e86, 0x2b1e, 0x8d2b, 0xd58c, 0x3d6,
+	0xfd03, 0xf0fc, 0x7cf1, 0xa87c, 0xbba8, 0xb9ba, 0xb8b9, 0xceb8, 0x6acf,
+	0xf86a, 0xd4f8, 0x2cd5, 0x332d, 0xa932, 0x3ba9, 0xaf3b, 0x7eaf, 0x37f,
+	0xa303, 0xd4a2, 0x24d4, 0x9224, 0x2592, 0x9225, 0x7c91, 0xd27c, 0xacd2,
+	0x67ac, 0x2267, 0xf221, 0xa7f1, 0xb5a8, 0xaab5, 0xb9aa, 0x5ba,	0x1105,
+	0x8410, 0x2484, 0xc923, 0xcac8, 0x10cb, 0xfd10, 0xbcfc, 0xbdbd, 0x77bd,
+	0x9977, 0xb599, 0x7db5, 0x627d, 0xcc62, 0x80cc, 0x4a81, 0x534a, 0x653,
+	0xa905, 0x55a9, 0xd155, 0x3bd1, 0x33c,	0x7302, 0xbd73, 0xabbc, 0x78ab,
+	0x3779, 0xdb37, 0xffdb, 0xdfff, 0x20df, 0x1d21, 0xb91c, 0x3cb9, 0x333d,
+	0x2233, 0x22,	0xdd00, 0x83dd, 0x5b83, 0xd15b, 0xe1d0, 0x42e1, 0xc142,
+	0x83c1, 0xbe83, 0xabbe, 0x11ac, 0x611,	0x5c06, 0x195c, 0xc319, 0x80c3,
+	0xee80, 0x49ee, 0x724a, 0xec72, 0x42ec, 0x2443, 0x3424, 0xa634, 0xcba5,
+	0x5acb, 0xe45a, 0x15e4, 0xe415, 0xdce4, 0xc3dc, 0xfbc3, 0x5efb, 0xb85e,
+	0x5b9,	0x8d05, 0x178d, 0xeb16, 0xf8ea, 0x3cf9, 0x803d, 0xee80, 0xcbee,
+	0x67cb, 0xd68,	0xfd0c, 0xf5fc, 0xbef6, 0x83be, 0x7d83, 0x87d,	0xff07,
+	0x9ff,	0xa809, 0x38a7, 0x9638, 0x2796, 0xaa27, 0x61aa, 0xc761, 0xfbc7,
+	0x51fc, 0x3852, 0xda37, 0xc4da, 0x21c4, 0x9e21, 0xa49e, 0x2ba5, 0xf82b,
+	0x93f7, 0xe393, 0x15e3, 0x3c16, 0x763c, 0xdf75, 0xf5de, 0x96f5, 0xa096,
+	0xf3a0, 0x8cf3, 0xd28c, 0x5d3,	0xe305, 0xf2e2, 0xbcf2, 0x4bbd, 0x714b,
+	0x2e71, 0xca2e, 0xe4ca, 0xd4e4, 0xe4d4, 0x63e4, 0x8363, 0x3b83, 0x2b3b,
+	0x402b, 0x8f3f, 0x3b8f, 0xc53b, 0xedc5, 0x8928, 0x889,	0x5e09, 0x405e,
+	0x9340, 0x7493, 0xb573, 0xbb6,	0xd10a, 0x85d1, 0x8385, 0x1683, 0x4217,
+	0x5d42, 0x1f5d, 0x7b1f, 0xa07a, 0xa3a0, 0x89a3, 0x5e8a, 0x145f, 0xa314,
+	0xfca2, 0x52fc, 0x2a53, 0x9229, 0x6e92, 0x1a6f, 0x8019, 0x7f7f, 0xf17e,
+	0xedf0, 0x6aee, 0xb66a, 0x50b6, 0xa750, 0x7ba7, 0x617b, 0xf561, 0x33f5,
+	0x5a33, 0x885a, 0xc187, 0x4bc1, 0xe64b, 0x41e6, 0x6342, 0x1363, 0xf113,
+	0x54f0, 0xf354, 0x26f3, 0xbe26, 0xc3bd, 0xe6c3, 0xcbe6, 0xbacc, 0xf5ba,
+	0x34f5, 0xb334, 0xc8b2, 0x2ac9, 0x882a, 0x6d88, 0x256d, 0xde25, 0x1ede,
+	0x211e, 0x2421, 0xb124, 0x17b1, 0x3c18, 0xf93b, 0xd8f8, 0x3bd9, 0xb3c,
+	0xcd0b, 0x5fcd, 0x6e5f, 0x646e, 0x5e64, 0xf25d, 0xe9f2, 0x14ea, 0xdf14,
+	0xeede, 0x5fee, 0xcd5f, 0x59cd, 0x245a, 0x9b24, 0x399b, 0xba39, 0x14bb,
+	0x1b15, 0x4d1b, 0x974c, 0x8d97, 0xf28d, 0x35f2, 0xd36,	0x50d,	0x8905,
+	0x8c88, 0xc98c, 0x99c9, 0x1399, 0xbb13, 0x90bb, 0xc190, 0xfc2,	0xe60f,
+	0x84e5, 0x3685, 0xab36, 0x7ab,	0xc907, 0x62c9, 0x8062, 0x4081, 0x9940,
+	0x2399, 0x9b23, 0x929a, 0x2b92, 0x1b2b, 0x941b, 0xe793, 0x48e7, 0x8a48,
+	0x308a, 0x8630, 0xf785, 0x7cf7, 0xcd7c, 0xeecd, 0x3aef, 0x93b,	0xbd08,
+	0x85bd, 0x9085, 0x5390, 0xa253, 0x2a3,	0xac02, 0x91ab, 0xf791, 0x9cf7,
+	0x89d,	0xa708, 0xfda6, 0xe5fc, 0x74e6, 0xa75,	0x370a, 0x4d37, 0x7d4c,
+	0x5d7d, 0x165e, 0x7815, 0xeb77, 0x70eb, 0x4670, 0x9246, 0x9592, 0x6696,
+	0x667,	0x6106, 0xb360, 0xc7b3, 0x72c7, 0xf272, 0xd0f2, 0x36d0, 0x3136,
+	0x4f31, 0x2c4f, 0x772c, 0x4777, 0x3747, 0xe38,	0x1893, 0x8018, 0x2280,
+	0xcf22, 0xbbce, 0x3ebc, 0x7f3e, 0x697f, 0x4469, 0x7844, 0xaa77, 0xbca9,
+	0xb5bc, 0xcdb5, 0xffcd, 0x9e00, 0x59e,	0xc805, 0x82c7, 0xe83,	0x6a0f,
+	0x106a, 0xd910, 0x47d9, 0x1847, 0x9517, 0x8d94, 0x5b8d, 0x835b, 0x1383,
+	0x5013, 0xed4f, 0x30ed, 0x6d30, 0x8c6d, 0xd58b, 0xc4d5, 0x65c5, 0x9265,
+	0xb692, 0x75b6, 0xb975, 0x27b9, 0xa227, 0x19a2, 0x1f19, 0xbd1f, 0x84bc,
+	0x3185, 0xb630, 0xdb6,	0x720d, 0x2e72, 0x662e, 0x1566, 0xd615, 0x2dd6,
+	0x4f2e, 0x814e, 0x1d81, 0x7b1d, 0x4b7b, 0xfb4b, 0x15fb, 0x1215, 0xb412,
+	0x36b3, 0x7d36, 0xfc7d, 0x6cfc, 0x9a6d, 0xa9b,	0x930a, 0x1693, 0xaa16,
+	0x92a9, 0xa792, 0xf6a7, 0x86f6, 0x9787, 0xfa97, 0x1ffa, 0xc61f, 0x23c6,
+	0x8d23, 0x18d,	0xf501, 0xaaf4, 0xfaaa, 0x75fb, 0x3576, 0x9835, 0x798,
+	0x3008, 0x2130, 0x4021, 0x803f, 0x5e80, 0xd55e, 0xf6d4, 0x7bf7, 0xba7b,
+	0x86ba, 0x6386, 0x7d63, 0x977d, 0x2797, 0x4228, 0x5d42, 0xf25c, 0x2df3,
+	0xd62d, 0x62d6, 0xa063, 0xee9f, 0xc7ee, 0x73c7, 0xba73, 0xb3ba, 0xc5b3,
+	0xc7c5, 0x48c7, 0x7048, 0xf66f, 0xf6f5, 0x9cf6, 0xc59d, 0x63c5, 0x9863,
+	0xce98, 0x67ce, 0x4d68, 0x884d, 0x2488, 0xc323, 0x78c3, 0x7978, 0x2479,
+	0x8524, 0xc385, 0x1ac4, 0x471a, 0xf546, 0x73f5, 0xbc73, 0xa4bc, 0x11a5,
+	0x6d11, 0x416d, 0x3b41, 0x553b, 0x3d55, 0x5b3d, 0xc75b, 0x59c7, 0x3859,
+	0x9637, 0xc895, 0x79c8, 0x1779, 0xc417, 0x8bc4, 0xdb8b, 0xc4db, 0x7ec4,
+	0x497f, 0xa449, 0x6ea4, 0x206f, 0x7b20, 0x687a, 0x1669, 0xbd16, 0x1ebd,
+	0x3d1e, 0xc13c, 0x4cc1, 0x4e4c, 0x794e, 0x6379, 0x6364, 0x4121, 0x4a41,
+	0xec4a, 0x2cec, 0x2f2d, 0x312f, 0xa630, 0xfa6,	0xb80e, 0xe8b7, 0x8ae8,
+	0xdf8a, 0x1ae0, 0xf21a, 0xf8f1, 0x4df9, 0x5b4e, 0x355b, 0x5f35, 0x360,
+	0x5803, 0x1358, 0xf812, 0x88f7, 0x1b89, 0x291b, 0xec28, 0x5aec, 0x495a,
+	0xca48, 0x8bca, 0x1b8b, 0x7a1b, 0x717a, 0x2971, 0x5829, 0xe058, 0x96e0,
+	0xf896, 0xcf9,	0xa90c, 0x96a8, 0x8196, 0x1381, 0x5a13, 0x8059, 0xd780,
+	0xcfd6, 0xa1d0, 0x58a1, 0x3c58, 0x7c3c, 0xa17b, 0xbfa1, 0x61bf, 0x4062,
+	0xe040, 0x6fe0, 0xf46f, 0xc5f3, 0x67c5, 0x2168, 0x1321, 0x7213, 0xea71,
+	0x6fea, 0xb96f, 0x4bb9, 0x964c, 0xaa96, 0x8ab,	0x9208, 0x6d91, 0xad6d,
+	0xc2ad, 0xc5c2, 0x43c6, 0x2444, 0x6323, 0xa663, 0xa8a6, 0x32a8, 0x5b33,
+	0x15b,	0x2e01, 0x532e, 0x8f53, 0x98f,	0x4809, 0x9148, 0x3b91, 0x8b3b,
+	0xf08a, 0xf1,	0x401,	0x104,	0xef00, 0x13ef, 0xd313, 0xcdd2, 0x2fce,
+	0xb82f, 0x9ab8, 0xea9a, 0x49ea, 0xc849, 0x45c8, 0x3246, 0x3b33, 0x5c3b,
+	0x715c, 0x9671, 0xd96,	0xf80d, 0x21f8, 0x4d21, 0xa24c, 0xdfa1, 0x88df,
+	0x2989, 0x9329, 0xca92, 0xa1ca, 0x72a1, 0x4672, 0xe146, 0xfce1, 0x2afd,
+	0x292b, 0x7629, 0x5d75, 0xd75d, 0xc6d6, 0x3fc6, 0x8b3f, 0xb68b, 0x3b7,
+	0x1803, 0xd817, 0x34d8, 0x2b35, 0x5a2b, 0xf5a,	0x440f, 0xf543, 0x38f5,
+	0x6939, 0xc469, 0x27c4, 0xf827, 0x77f8, 0x2877, 0x7428, 0x3274, 0xbd31,
+	0xd7bc, 0x6ed7, 0xe36e, 0xee4,	0x4a0e, 0xad49, 0x6ead, 0x796e, 0xd279,
+	0xebd2, 0xfceb, 0x99fc, 0x929a, 0xc93,	0x630d, 0x7462, 0x8874, 0xdd88,
+	0xf5dc, 0x6ef5, 0xed6e, 0xa1ed, 0xc9a1, 0x7dc9, 0x597d, 0xc159, 0xb47f,
+	0x3cb4, 0x133d, 0xd312, 0xa2d2, 0xa1a2, 0x86a1, 0x3287, 0x1d32, 0xd1d,
+	0x840c, 0x8f83, 0x7090, 0x5f70, 0xd55f, 0x42d6, 0x4942, 0x3849, 0x7e37,
+	0x447e, 0x5b45, 0xa75b, 0x56a7, 0x8856, 0xe187, 0xdfe0, 0x27e0, 0x8927,
+	0x9288, 0xce92, 0x28ce, 0x9e28, 0x959e, 0xa295, 0x8fa2, 0xae8f, 0x13af,
+	0x7413, 0x5274, 0x7e52, 0xe97d, 0xf7e8, 0x9bf7, 0x5e9b, 0xca5e, 0x22ca,
+	0x623,	0xda05, 0x14da, 0xb214, 0x88b1, 0xe688, 0x53e6, 0xe053, 0xd4e0,
+	0xe8d4, 0xcce8, 0x45cd, 0xc45,	0x220c, 0x4022, 0xdd3f, 0x95dd, 0x4096,
+	0x8440, 0xad84, 0x27ad, 0xd326, 0x70d3, 0x4171, 0x2142, 0xc521, 0x9c5,
+	0xdb09, 0x9eda, 0xd49e, 0xf1d4, 0x36f2, 0xf836, 0x83f8, 0x4984, 0x8449,
+	0xf584, 0x5ff5, 0x7b5f, 0x6e7b, 0x956e, 0xfc94, 0x30fc, 0x6231, 0x1e62,
+	0x4c1e, 0x5f4c, 0xb65f, 0x1b6,	0xd801, 0xa2d7, 0x11a3, 0xe711, 0x54e7,
+	0xfc54, 0xe8fb, 0xb8e9, 0xdab8, 0x27db, 0x3228, 0x8931, 0xf289, 0xe5f2,
+	0xb3e5, 0xa4b4, 0x1ba4, 0x3c1b, 0x1d3c, 0xf71c, 0xb0f6, 0x6db0, 0x616d,
+	0xba61, 0xa5ba, 0xe2a5, 0xee3,	0xd90e, 0x39d9, 0xd739, 0x88d7, 0xf288,
+	0xb4f2, 0x67b4, 0x9167, 0x2591, 0x1526, 0x5115, 0x3350, 0xde32, 0x27de,
+	0x1428, 0x2b14, 0xf22a, 0x4f2,	0x6405, 0xee63, 0x66ee, 0x9b67, 0xdb9a,
+	0xf5db, 0x8bf6, 0xaf8b, 0x40af, 0x6340, 0xdb62, 0xc7da, 0x4cc8, 0x4d4d,
+	0x524d, 0xa52,	0x5809, 0xc657, 0xacc6, 0x57ac, 0x1a58, 0x221a, 0x6f21,
+	0xf66f, 0xd7f6, 0xe4d8, 0xa5e4, 0x4a6,	0x2d05, 0x3a2d, 0xa639, 0xb4a6,
+	0x32b5, 0x7132, 0x7370, 0xe372, 0xffe2, 0x800,	0x3a08, 0x9c39, 0x29d,
+	0x2825, 0xad27, 0xf3ad, 0xf5f3, 0x7f6,	0xc607, 0x7fc5, 0xe27f, 0x72e2,
+	0x7a72, 0x607a, 0x8460, 0x5e84, 0x625e, 0xf461, 0x83f4, 0x4c84, 0xcc4c,
+	0xdccb, 0x43dd, 0x2144, 0x5e21, 0x925e, 0xb691, 0x2ab6, 0xe42a, 0xc4e3,
+	0xbc5,	0xae0b, 0xffad, 0x8eff, 0xf48e, 0xc8f4, 0x7fc8, 0xe97f, 0x1fe9,
+	0x5420, 0xd553, 0x6cd5, 0xc96c, 0x59c9, 0x9a59, 0xca99, 0x68ca, 0x3d68,
+	0x7c3d, 0x527c, 0x4452, 0xc744, 0xd2c6, 0xfbd2, 0x8efb, 0x408e, 0xb640,
+	0xecb5, 0x44ed, 0xa545, 0x1a5,	0x8f01, 0xf08e, 0xd9f0, 0x1ada, 0x41b,
+	0xc803, 0x5ec7, 0x445f, 0x4044, 0x640,	0xd07,	0x6f0d, 0xfd6e, 0xd3fd,
+	0xb7d3, 0xedb7, 0x33ee, 0xb233, 0x92b2, 0x8893, 0x9288, 0xe292, 0x96e2,
+	0x9f96, 0xfb9f, 0x52fb, 0x6452, 0x3f64, 0x783f, 0xbd77, 0x9fbd, 0x17a0,
+	0x1c17, 0x231c, 0x1323, 0xb413, 0x15b4, 0x5f16, 0x6f5e, 0x426f, 0x543,
+	0x4505, 0xda45, 0x52da, 0xfc52, 0x9afc, 0xd29a, 0x89d2, 0xbc89, 0x77bc,
+	0x1478, 0xd913, 0x79d9, 0x7f79, 0x77f,	0x9f07, 0x289f, 0x2d28, 0xbd2c,
+	0xa5bd, 0xf1a5, 0x6cf2, 0x736d, 0xb673, 0xceb5, 0xc3ce, 0x15c3, 0xa415,
+	0xbaa4, 0xf2ba, 0xf1f2, 0x84f1, 0x7884, 0x8678, 0x6186, 0x4661, 0xf845,
+	0xf7f7, 0x4cf8, 0xbf4c, 0x49bf, 0x5c4a, 0x4a5c, 0xab4a, 0x89ab, 0x8689,
+	0xf485, 0x60f4, 0xef60, 0x4eef, 0x194f, 0x7e19, 0x707e, 0xfa6f, 0x35fa,
+	0x3036, 0xf02f, 0x17f0, 0xc517, 0x79c4, 0xa279, 0x7ba2, 0x67c,	0xa07,
+	0x7b09, 0x687b, 0xf868, 0xbbf8, 0xd7bb, 0x30d8, 0x8231, 0xb582, 0xaab4,
+	0xaaaa, 0x90aa, 0xaf90, 0x2faf, 0x262f, 0x4126, 0xe640, 0x91e6, 0x9991,
+	0x1a9a
+};
+
+#define IPv4_MIN_WORDS 5
+#define IPv4_MAX_WORDS 15
+#define NUM_IPv6_TESTS 200
+#define NUM_IP_FAST_CSUM_TESTS 181
+
+static void test_ip_fast_csum(struct kunit *test)
+{
+	__sum16 csum_result, expected;
+
+	for (int len = IPv4_MIN_WORDS; len < IPv4_MAX_WORDS; len++) {
+		for (int index = 0; index < NUM_IP_FAST_CSUM_TESTS; index++) {
+			csum_result = ip_fast_csum(random_buf + index, len);
+			expected =
+				expected_fast_csum[(len - IPv4_MIN_WORDS) *
+						   NUM_IP_FAST_CSUM_TESTS +
+						   index];
+			CHECK_EQ(expected, csum_result);
+		}
+	}
+}
+
+static void test_csum_ipv6_magic(struct kunit *test)
+{
+	const struct in6_addr *saddr;
+	const struct in6_addr *daddr;
+	unsigned int len;
+	unsigned char proto;
+	unsigned int csum;
+
+	const int daddr_offset = sizeof(struct in6_addr);
+	const int len_offset = sizeof(struct in6_addr) + sizeof(struct in6_addr);
+	const int proto_offset = sizeof(struct in6_addr) + sizeof(struct in6_addr) +
+			     sizeof(int);
+	const int csum_offset = sizeof(struct in6_addr) + sizeof(struct in6_addr) +
+			    sizeof(int) + sizeof(char);
+
+	for (int i = 0; i < NUM_IPv6_TESTS; i++) {
+		saddr = (const struct in6_addr *)(random_buf + i);
+		daddr = (const struct in6_addr *)(random_buf + i +
+						  daddr_offset);
+		len = *(unsigned int *)(random_buf + i + len_offset);
+		proto = *(random_buf + i + proto_offset);
+		csum = *(unsigned int *)(random_buf + i + csum_offset);
+		CHECK_EQ(expected_csum_ipv6_magic[i],
+			 csum_ipv6_magic(saddr, daddr, len, proto, csum));
+	}
+}
+
+static struct kunit_case __refdata riscv_checksum_test_cases[] = {
+	KUNIT_CASE(test_ip_fast_csum),
+	KUNIT_CASE(test_csum_ipv6_magic),
+	{}
+};
+
+static struct kunit_suite riscv_checksum_test_suite = {
+	.name = "riscv_checksum",
+	.test_cases = riscv_checksum_test_cases,
+};
+
+kunit_test_suites(&riscv_checksum_test_suite);
+
+MODULE_AUTHOR("Charlie Jenkins <charlie@rivosinc.com>");
+MODULE_LICENSE("GPL");

-- 
2.42.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* RE: [PATCH v4 2/5] riscv: Add checksum library
  2023-09-11 22:57 ` [PATCH v4 2/5] riscv: Add checksum library Charlie Jenkins
@ 2023-09-12  8:45   ` David Laight
  2023-09-13  3:09     ` Charlie Jenkins
  2023-09-14 12:25   ` Conor Dooley
  1 sibling, 1 reply; 24+ messages in thread
From: David Laight @ 2023-09-12  8:45 UTC (permalink / raw)
  To: 'Charlie Jenkins',
	Palmer Dabbelt, Conor Dooley, Samuel Holland, linux-riscv,
	linux-kernel
  Cc: Paul Walmsley, Albert Ou

From: Charlie Jenkins
> Sent: 11 September 2023 23:57
> 
> Provide a 32-bit and a 64-bit version of do_csum. When compiled for
> 32-bit it will load from the buffer in groups of 32 bits, and when
> compiled for 64-bit it will load in groups of 64 bits. Benchmarking by
> proxy (compiling csum_ipv6_magic, the 64-bit version, for an x86 chip
> as well as running the riscv-generated code in QEMU) showed that
> summing in a tree-like structure is about 4% faster than doing 64-bit
> reads.
> 
...
> +	sum   = saddr->s6_addr32[0];
> +	sum  += saddr->s6_addr32[1];
> +	sum1  = saddr->s6_addr32[2];
> +	sum1 += saddr->s6_addr32[3];
> +
> +	sum2  = daddr->s6_addr32[0];
> +	sum2 += daddr->s6_addr32[1];
> +	sum3  = daddr->s6_addr32[2];
> +	sum3 += daddr->s6_addr32[3];
> +
> +	sum4  = csum;
> +	sum4 += ulen;
> +	sum4 += uproto;
> +
> +	sum  += sum1;
> +	sum2 += sum3;
> +
> +	sum += sum2;
> +	sum += sum4;

Have you got gcc to compile that as-is?

Whenever I've tried to get a 'tree add' compiled so that the
early adds can be executed in parallel gcc always pessimises
it to a linear sequence of adds.

But I agree that adding 32bit values to a 64bit register
may be no slower than trying to do an 'add carry' sequence
that is guaranteed to only do one add/clock.
(And on Intel cpu from core-2 until IIRC Haswell adc took 2 clocks!)

IIRC RISCV doesn't have a carry flag, so the adc sequence
is hard - probably takes two extra instructions per value.
Although with parallel execute it may not matter.
Consider:
	val = buf[offset];
	sum += val;
	carry += sum < val;
	val = buf[offset1];
	sum += val;
	...
the compare and 'carry +=' can be executed at the same time
as the following two instructions.
You do then a final sum += carry; sum += sum < carry;

Assuming all instructions are 1 clock and any read delays
get filled with other instructions (by source or hardware
instruction re-ordering) even without parallel execute
that is 4 clocks for 64 bits, which is much the same as the
2 clocks for 32 bits.

Remember that all the 32bit values can be summed first as
they won't overflow.
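
Concretely, something like this (illustrative, not tested):

	unsigned long sum = 0, carry = 0, val;

	val = buf[0];
	sum += val;
	carry += sum < val;	/* overlaps the next load/add */
	val = buf[1];
	sum += val;
	carry += sum < val;
	/* ...and so on for the rest of the block, then finally: */
	sum += carry;
	sum += sum < carry;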

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 1/5] riscv: Checksum header
  2023-09-11 22:57 ` [PATCH v4 1/5] riscv: Checksum header Charlie Jenkins
@ 2023-09-12 10:24   ` Emil Renner Berthing
  2023-09-13  2:38     ` Charlie Jenkins
  0 siblings, 1 reply; 24+ messages in thread
From: Emil Renner Berthing @ 2023-09-12 10:24 UTC (permalink / raw)
  To: Charlie Jenkins, Palmer Dabbelt, Conor Dooley, Samuel Holland,
	David Laight, linux-riscv, linux-kernel
  Cc: Paul Walmsley, Albert Ou

Charlie Jenkins wrote:
> > Provide checksum algorithms that have been designed to leverage riscv
> > instructions such as rotate. On 64-bit, the larger register can be used
> > to avoid some overflow checking.
>
> Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
> ---
>  arch/riscv/include/asm/checksum.h | 95 +++++++++++++++++++++++++++++++++++++++
>  1 file changed, 95 insertions(+)
>
> diff --git a/arch/riscv/include/asm/checksum.h b/arch/riscv/include/asm/checksum.h
> new file mode 100644
> index 000000000000..0d7fc8275a5e
> --- /dev/null
> +++ b/arch/riscv/include/asm/checksum.h
> @@ -0,0 +1,95 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * IP checksum routines
> + *
> + * Copyright (C) 2023 Rivos Inc.
> + */
> +#ifndef __ASM_RISCV_CHECKSUM_H
> +#define __ASM_RISCV_CHECKSUM_H
> +
> +#include <linux/in6.h>
> +#include <linux/uaccess.h>
> +
> +#ifdef CONFIG_32BIT
> +typedef unsigned int csum_t;
> +#else
> +typedef unsigned long csum_t;
> +#endif

Hi Charlie,

Isn't unsigned long already 32bit on 32bit RISC-V, so why is this #ifdef
needed?

> +
> +/*
> + *	Fold a partial checksum without adding pseudo headers
> + */
> +static inline __sum16 csum_fold(__wsum sum)
> +{
> +	return (~sum - ror32(sum, 16)) >> 16;
> +}
> +
> +#define csum_fold csum_fold
> +
> +/*
> + * Quickly compute an IP checksum with the assumption that IPv4 headers will
> + * always be in multiples of 32-bits, and have an ihl of at least 5.
> + * @ihl is the number of 32 bit segments and must be greater than or equal to 5.
> + * @iph is assumed to be word aligned.
> + */
> +static inline __sum16 ip_fast_csum(const void *iph, unsigned int ihl)
> +{
> +	csum_t csum = 0;
> +	int pos = 0;
> +
> +	do {
> +		csum += ((const unsigned int *)iph)[pos];
> +		if (IS_ENABLED(CONFIG_32BIT))
> +			csum += csum < ((const unsigned int *)iph)[pos];
> +	} while (++pos < ihl);
> +
> +	/*
> +	 * ZBB only saves three instructions on 32-bit and five on 64-bit so not
> +	 * worth checking if supported without Alternatives.
> +	 */
> +	if (IS_ENABLED(CONFIG_RISCV_ISA_ZBB) &&
> +	    IS_ENABLED(CONFIG_RISCV_ALTERNATIVE)) {
> +		csum_t fold_temp;
> +
> +		asm_volatile_goto(ALTERNATIVE("j %l[no_zbb]", "nop", 0,
> +					      RISCV_ISA_EXT_ZBB, 1)
> +		    :
> +		    :
> +		    :
> +		    : no_zbb);
> +
> +		if (IS_ENABLED(CONFIG_32BIT)) {
> +			asm(".option push				\n\
> +			.option arch,+zbb				\n\
> +				not	%[fold_temp], %[csum]		\n\
> +				rori	%[csum], %[csum], 16		\n\
> +				sub	%[csum], %[fold_temp], %[csum]	\n\
> +			.option pop"
> +			: [csum] "+r" (csum), [fold_temp] "=&r" (fold_temp));
> +		} else {
> +			asm(".option push				\n\
> +			.option arch,+zbb				\n\
> +				rori	%[fold_temp], %[csum], 32	\n\
> +				add	%[csum], %[fold_temp], %[csum]	\n\
> +				srli	%[csum], %[csum], 32		\n\
> +				not	%[fold_temp], %[csum]		\n\
> +				roriw	%[csum], %[csum], 16		\n\
> +				subw	%[csum], %[fold_temp], %[csum]	\n\
> +			.option pop"
> +			: [csum] "+r" (csum), [fold_temp] "=&r" (fold_temp));
> +		}
> +		return csum >> 16;
> +	}
> +no_zbb:
> +#ifndef CONFIG_32BIT
> +		csum += (csum >> 32) | (csum << 32);
> +		csum >>= 32;

The indentation seems off here.

/Emil

> +#endif
> +	return csum_fold((__force __wsum)csum);
> +}
> +
> +#define ip_fast_csum ip_fast_csum
> +
> +#include <asm-generic/checksum.h>
> +
> +#endif // __ASM_RISCV_CHECKSUM_H
>
> --
> 2.42.0
>
>
> _______________________________________________
> linux-riscv mailing list
> linux-riscv@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 1/5] riscv: Checksum header
  2023-09-12 10:24   ` Emil Renner Berthing
@ 2023-09-13  2:38     ` Charlie Jenkins
  2023-09-13  9:19       ` Emil Renner Berthing
  0 siblings, 1 reply; 24+ messages in thread
From: Charlie Jenkins @ 2023-09-13  2:38 UTC (permalink / raw)
  To: Emil Renner Berthing
  Cc: Palmer Dabbelt, Conor Dooley, Samuel Holland, David Laight,
	linux-riscv, linux-kernel, Paul Walmsley, Albert Ou

On Tue, Sep 12, 2023 at 03:24:29AM -0700, Emil Renner Berthing wrote:
> Charlie Jenkins wrote:
> > > Provide checksum algorithms that have been designed to leverage riscv
> > > instructions such as rotate. On 64-bit, the larger register can be used
> > > to avoid some overflow checking.
> >
> > Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
> > ---
> >  arch/riscv/include/asm/checksum.h | 95 +++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 95 insertions(+)
> >
> > diff --git a/arch/riscv/include/asm/checksum.h b/arch/riscv/include/asm/checksum.h
> > new file mode 100644
> > index 000000000000..0d7fc8275a5e
> > --- /dev/null
> > +++ b/arch/riscv/include/asm/checksum.h
> > @@ -0,0 +1,95 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +/*
> > + * IP checksum routines
> > + *
> > + * Copyright (C) 2023 Rivos Inc.
> > + */
> > +#ifndef __ASM_RISCV_CHECKSUM_H
> > +#define __ASM_RISCV_CHECKSUM_H
> > +
> > +#include <linux/in6.h>
> > +#include <linux/uaccess.h>
> > +
> > +#ifdef CONFIG_32BIT
> > +typedef unsigned int csum_t;
> > +#else
> > +typedef unsigned long csum_t;
> > +#endif
> 
> Hi Charlie,
> 
> Isn't unsigned long already 32bit on 32bit RISC-V, so why is this #ifdef
> needed?
Oh, I wasn't sure, so I ran sizeof(long) in qemu-system-riscv32 and it
gave me 8, so I assumed a long was 8 bytes. Do you think it would make
what is going on clearer if I used u32 and u64, or would you recommend
just using long?
> 
> > +
> > +/*
> > + *	Fold a partial checksum without adding pseudo headers
> > + */
> > +static inline __sum16 csum_fold(__wsum sum)
> > +{
> > +	return (~sum - ror32(sum, 16)) >> 16;
> > +}
> > +
> > +#define csum_fold csum_fold
> > +
> > +/*
> > + * Quickly compute an IP checksum with the assumption that IPv4 headers will
> > + * always be in multiples of 32-bits, and have an ihl of at least 5.
> > + * @ihl is the number of 32 bit segments and must be greater than or equal to 5.
> > + * @iph is assumed to be word aligned.
> > + */
> > +static inline __sum16 ip_fast_csum(const void *iph, unsigned int ihl)
> > +{
> > +	csum_t csum = 0;
> > +	int pos = 0;
> > +
> > +	do {
> > +		csum += ((const unsigned int *)iph)[pos];
> > +		if (IS_ENABLED(CONFIG_32BIT))
> > +			csum += csum < ((const unsigned int *)iph)[pos];
> > +	} while (++pos < ihl);
> > +
> > +	/*
> > +	 * ZBB only saves three instructions on 32-bit and five on 64-bit so not
> > +	 * worth checking if supported without Alternatives.
> > +	 */
> > +	if (IS_ENABLED(CONFIG_RISCV_ISA_ZBB) &&
> > +	    IS_ENABLED(CONFIG_RISCV_ALTERNATIVE)) {
> > +		csum_t fold_temp;
> > +
> > +		asm_volatile_goto(ALTERNATIVE("j %l[no_zbb]", "nop", 0,
> > +					      RISCV_ISA_EXT_ZBB, 1)
> > +		    :
> > +		    :
> > +		    :
> > +		    : no_zbb);
> > +
> > +		if (IS_ENABLED(CONFIG_32BIT)) {
> > +			asm(".option push				\n\
> > +			.option arch,+zbb				\n\
> > +				not	%[fold_temp], %[csum]		\n\
> > +				rori	%[csum], %[csum], 16		\n\
> > +				sub	%[csum], %[fold_temp], %[csum]	\n\
> > +			.option pop"
> > +			: [csum] "+r" (csum), [fold_temp] "=&r" (fold_temp));
> > +		} else {
> > +			asm(".option push				\n\
> > +			.option arch,+zbb				\n\
> > +				rori	%[fold_temp], %[csum], 32	\n\
> > +				add	%[csum], %[fold_temp], %[csum]	\n\
> > +				srli	%[csum], %[csum], 32		\n\
> > +				not	%[fold_temp], %[csum]		\n\
> > +				roriw	%[csum], %[csum], 16		\n\
> > +				subw	%[csum], %[fold_temp], %[csum]	\n\
> > +			.option pop"
> > +			: [csum] "+r" (csum), [fold_temp] "=&r" (fold_temp));
> > +		}
> > +		return csum >> 16;
> > +	}
> > +no_zbb:
> > +#ifndef CONFIG_32BIT
> > +		csum += (csum >> 32) | (csum << 32);
> > +		csum >>= 32;
> 
> The indentation seems off here.
> 
> /Emil
> 
> > +#endif
> > +	return csum_fold((__force __wsum)csum);
> > +}
> > +
> > +#define ip_fast_csum ip_fast_csum
> > +
> > +#include <asm-generic/checksum.h>
> > +
> > +#endif // __ASM_RISCV_CHECKSUM_H
> >
> > --
> > 2.42.0
> >
> >
> > _______________________________________________
> > linux-riscv mailing list
> > linux-riscv@lists.infradead.org
> > http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 2/5] riscv: Add checksum library
  2023-09-12  8:45   ` David Laight
@ 2023-09-13  3:09     ` Charlie Jenkins
  2023-09-13  8:47       ` David Laight
  0 siblings, 1 reply; 24+ messages in thread
From: Charlie Jenkins @ 2023-09-13  3:09 UTC (permalink / raw)
  To: David Laight
  Cc: Palmer Dabbelt, Conor Dooley, Samuel Holland, linux-riscv,
	linux-kernel, Paul Walmsley, Albert Ou

On Tue, Sep 12, 2023 at 08:45:38AM +0000, David Laight wrote:
> From: Charlie Jenkins
> > Sent: 11 September 2023 23:57
> > 
> > Provide a 32 and 64 bit version of do_csum. When compiled for 32-bit
> > it will load from the buffer in groups of 32 bits, and when compiled
> > for 64-bit it will load in groups of 64 bits. Benchmarking by proxy,
> > compiling csum_ipv6_magic (64-bit version) for an x86 chip as well as
> > running the riscv generated code in QEMU, showed that summing in a
> > tree-like structure is about 4% faster than doing 64-bit reads.
> > 
> ...
> > +	sum   = saddr->s6_addr32[0];
> > +	sum  += saddr->s6_addr32[1];
> > +	sum1  = saddr->s6_addr32[2];
> > +	sum1 += saddr->s6_addr32[3];
> > +
> > +	sum2  = daddr->s6_addr32[0];
> > +	sum2 += daddr->s6_addr32[1];
> > +	sum3  = daddr->s6_addr32[2];
> > +	sum3 += daddr->s6_addr32[3];
> > +
> > +	sum4  = csum;
> > +	sum4 += ulen;
> > +	sum4 += uproto;
> > +
> > +	sum  += sum1;
> > +	sum2 += sum3;
> > +
> > +	sum += sum2;
> > +	sum += sum4;
> 
> Have you got gcc to compile that as-is?
> 
> Whenever I've tried to get a 'tree add' compiled so that the
> early adds can be executed in parallel gcc always pessimises
> it to a linear sequence of adds.
> 
> But I agree that adding 32bit values to a 64bit register
> may be no slower than trying to do an 'add carry' sequence
> that is guaranteed to only do one add/clock.
> (And on Intel cpu from core-2 until IIRC Haswell adc took 2 clocks!)
> 
> IIRC RISCV doesn't have a carry flag, so the adc sequence
> is hard - probably takes two extra instructions per value.
> Although with parallel execute it may not matter.
> Consider:
> 	val = buf[offset];
> 	sum += val;
> 	carry += sum < val;
> 	val = buf[offset1];
> 	sum += val;
> 	...
> the compare and 'carry +=' can be executed at the same time
> as the following two instructions.
> You do then a final sum += carry; sum += sum < carry;
> 
> Assuming all instructions are 1 clock and any read delays
> get filled with other instructions (by source or hardware
> instruction re-ordering) even without parallel execute
> that is 4 clocks for 64 bits, which is much the same as the
> 2 clocks for 32 bits.
> 
> Remember that all the 32bit values can summed first as
> they won't overflow.
> 
> 	David
> 
> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> Registration No: 1397386 (Wales)
Yeah it does seem like the tree-add does just do a linear add. All three
of them were pretty much the same on riscv so I used the version that
did best on x86 with the knowledge that my QEMU setup does not
accurately represent real hardware.

I don't quite understand how doing the carry in the middle of each
stage, even though it can be executed at the same time, would be faster
than just doing a single overflow check at the end. I can just revert
back to the non-tree add version since there is no improvement on riscv.
I can also revert back to the default version that uses carry += sum < val
as well.

- Charlie
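For reference, the carry-accumulate pattern David describes, as a
self-contained C sketch (buf and n are illustrative names, not from the
patch):

	static unsigned long csum_words(const unsigned long *buf, int n)
	{
		unsigned long sum = 0, carry = 0, val;
		int i;

		for (i = 0; i < n; i++) {
			val = buf[i];
			sum += val;
			/* no carry flag on RISC-V: recover it with a compare,
			 * which can issue alongside the next load and add */
			carry += sum < val;
		}
		sum += carry;
		sum += sum < carry;	/* fold any final wrap-around */
		return sum;
	}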


^ permalink raw reply	[flat|nested] 24+ messages in thread

* RE: [PATCH v4 2/5] riscv: Add checksum library
  2023-09-13  3:09     ` Charlie Jenkins
@ 2023-09-13  8:47       ` David Laight
  2023-09-13 23:18         ` Charlie Jenkins
  0 siblings, 1 reply; 24+ messages in thread
From: David Laight @ 2023-09-13  8:47 UTC (permalink / raw)
  To: 'Charlie Jenkins'
  Cc: Palmer Dabbelt, Conor Dooley, Samuel Holland, linux-riscv,
	linux-kernel, Paul Walmsley, Albert Ou

From: Charlie Jenkins
> Sent: 13 September 2023 04:10
> 
> On Tue, Sep 12, 2023 at 08:45:38AM +0000, David Laight wrote:
> > From: Charlie Jenkins
> > > Sent: 11 September 2023 23:57
> > >
> > > Provide a 32 and 64 bit version of do_csum. When compiled for 32-bit
> > > it will load from the buffer in groups of 32 bits, and when compiled
> > > for 64-bit it will load in groups of 64 bits. Benchmarking by proxy,
> > > compiling csum_ipv6_magic (64-bit version) for an x86 chip as well as
> > > running the riscv generated code in QEMU, showed that summing in a
> > > tree-like structure is about 4% faster than doing 64-bit reads.
> > >
> > ...
> > > +	sum   = saddr->s6_addr32[0];
> > > +	sum  += saddr->s6_addr32[1];
> > > +	sum1  = saddr->s6_addr32[2];
> > > +	sum1 += saddr->s6_addr32[3];
> > > +
> > > +	sum2  = daddr->s6_addr32[0];
> > > +	sum2 += daddr->s6_addr32[1];
> > > +	sum3  = daddr->s6_addr32[2];
> > > +	sum3 += daddr->s6_addr32[3];
> > > +
> > > +	sum4  = csum;
> > > +	sum4 += ulen;
> > > +	sum4 += uproto;
> > > +
> > > +	sum  += sum1;
> > > +	sum2 += sum3;
> > > +
> > > +	sum += sum2;
> > > +	sum += sum4;
> >
> > Have you got gcc to compile that as-is?
> >
> > Whenever I've tried to get a 'tree add' compiled so that the
> > early adds can be executed in parallel gcc always pessimises
> > it to a linear sequence of adds.
> >
> > But I agree that adding 32bit values to a 64bit register
> > may be no slower than trying to do an 'add carry' sequence
> > that is guaranteed to only do one add/clock.
> > (And on Intel cpu from core-2 until IIRC Haswell adc took 2 clocks!)
> >
> > IIRC RISCV doesn't have a carry flag, so the adc sequence
> > is hard - probably takes two extra instructions per value.
> > Although with parallel execute it may not matter.
> > Consider:
> > 	val = buf[offset];
> > 	sum += val;
> > 	carry += sum < val;
> > 	val = buf[offset1];
> > 	sum += val;
> > 	...
> > the compare and 'carry +=' can be executed at the same time
> > as the following two instructions.
> > You do then a final sum += carry; sum += sum < carry;
> >
> > Assuming all instructions are 1 clock and any read delays
> > get filled with other instructions (by source or hardware
> > instruction re-ordering) even without parallel execute
> > that is 4 clocks for 64 bits, which is much the same as the
> > 2 clocks for 32 bits.
> >
> > Remember that all the 32bit values can be summed first as
> > they won't overflow.
> >
> > 	David

> Yeah it does seem like the tree-add does just do a linear add. All three
> of them were pretty much the same on riscv so I used the version that
> did best on x86 with the knowledge that my QEMU setup does not
> accurately represent real hardware.

The problem there is that any measurement on x86 has pretty much
no relevance to what any RISCV cpu might do.
The multiple execution units and out of order execution on x86
are far different from anything any RISCV cpu is likely to have
for many years.
You might get nearer running on one of the Atom cpus - but it won't
really match.
There are too many fundamental differences between the architectures.

All you can do is to find and read the instruction timings for
a target physical cpu and look for things like:
- Whether arithmetic results are available next clock.
  (They probably are)
- How many clocks it takes for read data to be available.
  I suspect the cpu will stall if the data is needed.
  A block of sequential reads is one way to avoid the stall.
  On x86 the instruction that needs the data is just deferred
  until it is available; the following instructions execute
  (provided their inputs are all available).
- Clock delays for taken/not taken predicted/not predicted branches.
  
> I don't quite understand how doing the carry in the middle of each
> stage, even though it can be executed at the same time, would be faster
> than just doing a single overflow check at the end.

You need to do half as many reads and adds.

> I can just revert
> back to the non-tree add version since there is no improvement on riscv.

The 'tree' version is only likely to be faster on cpus (like x86)
that can (at least sometimes) do two memory reads in one clock
and can do two adds and two reads in the same clock.
Even then, without out of order execution, it is hard to get right.

Oh, you might want to try getting the default csum_fold() to
be the faster 'arc' version rather than adding your own version.

	David

> I can also revert back to the default version that uses carry += sum < val
> as well.
> 
> - Charlie

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
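For context, the 'arc' fold David mentions is the one this series already
uses in csum_fold(). It works because in two's complement ~x - y == ~(x + y),
so ~sum - ror32(sum, 16) is an end-around-carry add of the two 16-bit halves
with the final inversion folded into the subtract (sketch):

	static inline __sum16 csum_fold(__wsum sum)
	{
		u32 s = (__force u32)sum;

		/* one not, one rotate, one sub, one shift */
		return (__force __sum16)((~s - ror32(s, 16)) >> 16);
	}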


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 1/5] riscv: Checksum header
  2023-09-13  2:38     ` Charlie Jenkins
@ 2023-09-13  9:19       ` Emil Renner Berthing
  0 siblings, 0 replies; 24+ messages in thread
From: Emil Renner Berthing @ 2023-09-13  9:19 UTC (permalink / raw)
  To: Charlie Jenkins, Emil Renner Berthing
  Cc: Palmer Dabbelt, Conor Dooley, Samuel Holland, David Laight,
	linux-riscv, linux-kernel, Paul Walmsley, Albert Ou

Charlie Jenkins wrote:
> On Tue, Sep 12, 2023 at 03:24:29AM -0700, Emil Renner Berthing wrote:
> > Charlie Jenkins wrote:
> > > Provide checksum algorithms that have been designed to leverage riscv
> > > instructions such as rotate. In 64-bit, they can take advantage of the
> > > larger register to avoid some overflow checking.
> > >
> > > Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
> > > ---
> > >  arch/riscv/include/asm/checksum.h | 95 +++++++++++++++++++++++++++++++++++++++
> > >  1 file changed, 95 insertions(+)
> > >
> > > diff --git a/arch/riscv/include/asm/checksum.h b/arch/riscv/include/asm/checksum.h
> > > new file mode 100644
> > > index 000000000000..0d7fc8275a5e
> > > --- /dev/null
> > > +++ b/arch/riscv/include/asm/checksum.h
> > > @@ -0,0 +1,95 @@
> > > +/* SPDX-License-Identifier: GPL-2.0 */
> > > +/*
> > > + * IP checksum routines
> > > + *
> > > + * Copyright (C) 2023 Rivos Inc.
> > > + */
> > > +#ifndef __ASM_RISCV_CHECKSUM_H
> > > +#define __ASM_RISCV_CHECKSUM_H
> > > +
> > > +#include <linux/in6.h>
> > > +#include <linux/uaccess.h>
> > > +
> > > +#ifdef CONFIG_32BIT
> > > +typedef unsigned int csum_t;
> > > +#else
> > > +typedef unsigned long csum_t;
> > > +#endif
> >
> > Hi Charlie,
> >
> > Isn't unsigned long already 32bit on 32bit RISC-V, so why is this #ifdef
> > needed?
> Oh, I wasn't sure, so I ran sizeof(long) in qemu-system-riscv32 and it
> gave me 8, so I assumed a long was 8 bytes. Do you think it would make
> what is going on clearer if I used u32 and u64, or would you recommend
> just using long?

Yeah, it doesn't seem like csum_t is used anywhere else, so I'd just use
unsigned long if all you want is a register sized unsigned value. It'll be more
familiar and easier to read for most people.
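i.e. a sketch of the suggestion:

	/* unsigned long is XLEN-sized: 32 bits on rv32, 64 bits on rv64 */
	typedef unsigned long csum_t;

or simply spelling unsigned long at the few places csum_t is used.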

> >
> > > +
> > > +/*
> > > + *	Fold a partial checksum without adding pseudo headers
> > > + */
> > > +static inline __sum16 csum_fold(__wsum sum)
> > > +{
> > > +	return (~sum - ror32(sum, 16)) >> 16;
> > > +}
> > > +
> > > +#define csum_fold csum_fold
> > > +
> > > +/*
> > > + * Quickly compute an IP checksum with the assumption that IPv4 headers will
> > > + * always be in multiples of 32-bits, and have an ihl of at least 5.
> > > + * @ihl is the number of 32 bit segments and must be greater than or equal to 5.
> > > + * @iph is assumed to be word aligned.
> > > + */
> > > +static inline __sum16 ip_fast_csum(const void *iph, unsigned int ihl)
> > > +{
> > > +	csum_t csum = 0;
> > > +	int pos = 0;
> > > +
> > > +	do {
> > > +		csum += ((const unsigned int *)iph)[pos];
> > > +		if (IS_ENABLED(CONFIG_32BIT))
> > > +			csum += csum < ((const unsigned int *)iph)[pos];
> > > +	} while (++pos < ihl);
> > > +
> > > +	/*
> > > +	 * ZBB only saves three instructions on 32-bit and five on 64-bit so not
> > > +	 * worth checking if supported without Alternatives.
> > > +	 */
> > > +	if (IS_ENABLED(CONFIG_RISCV_ISA_ZBB) &&
> > > +	    IS_ENABLED(CONFIG_RISCV_ALTERNATIVE)) {
> > > +		csum_t fold_temp;
> > > +
> > > +		asm_volatile_goto(ALTERNATIVE("j %l[no_zbb]", "nop", 0,
> > > +					      RISCV_ISA_EXT_ZBB, 1)
> > > +		    :
> > > +		    :
> > > +		    :
> > > +		    : no_zbb);
> > > +
> > > +		if (IS_ENABLED(CONFIG_32BIT)) {
> > > +			asm(".option push				\n\
> > > +			.option arch,+zbb				\n\
> > > +				not	%[fold_temp], %[csum]		\n\
> > > +				rori	%[csum], %[csum], 16		\n\
> > > +				sub	%[csum], %[fold_temp], %[csum]	\n\
> > > +			.option pop"
> > > +			: [csum] "+r" (csum), [fold_temp] "=&r" (fold_temp));
> > > +		} else {
> > > +			asm(".option push				\n\
> > > +			.option arch,+zbb				\n\
> > > +				rori	%[fold_temp], %[csum], 32	\n\
> > > +				add	%[csum], %[fold_temp], %[csum]	\n\
> > > +				srli	%[csum], %[csum], 32		\n\
> > > +				not	%[fold_temp], %[csum]		\n\
> > > +				roriw	%[csum], %[csum], 16		\n\
> > > +				subw	%[csum], %[fold_temp], %[csum]	\n\
> > > +			.option pop"
> > > +			: [csum] "+r" (csum), [fold_temp] "=&r" (fold_temp));
> > > +		}
> > > +		return csum >> 16;
> > > +	}
> > > +no_zbb:
> > > +#ifndef CONFIG_32BIT
> > > +		csum += (csum >> 32) | (csum << 32);
> > > +		csum >>= 32;
> >
> > The indentation seems off here.
> >
> > /Emil
> >
> > > +#endif
> > > +	return csum_fold((__force __wsum)csum);
> > > +}
> > > +
> > > +#define ip_fast_csum ip_fast_csum
> > > +
> > > +#include <asm-generic/checksum.h>
> > > +
> > > +#endif // __ASM_RISCV_CHECKSUM_H
> > >
> > > --
> > > 2.42.0
> > >
> > >
> > > _______________________________________________
> > > linux-riscv mailing list
> > > linux-riscv@lists.infradead.org
> > > http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 2/5] riscv: Add checksum library
  2023-09-13  8:47       ` David Laight
@ 2023-09-13 23:18         ` Charlie Jenkins
  2023-09-14  0:41           ` Charlie Jenkins
  0 siblings, 1 reply; 24+ messages in thread
From: Charlie Jenkins @ 2023-09-13 23:18 UTC (permalink / raw)
  To: David Laight
  Cc: Palmer Dabbelt, Conor Dooley, Samuel Holland, linux-riscv,
	linux-kernel, Paul Walmsley, Albert Ou

On Wed, Sep 13, 2023 at 08:47:49AM +0000, David Laight wrote:
> From: Charlie Jenkins
> > Sent: 13 September 2023 04:10
> > 
> > On Tue, Sep 12, 2023 at 08:45:38AM +0000, David Laight wrote:
> > > From: Charlie Jenkins
> > > > Sent: 11 September 2023 23:57
> > > >
> > > > Provide a 32 and 64 bit version of do_csum. When compiled for 32-bit
> > > > it will load from the buffer in groups of 32 bits, and when compiled
> > > > for 64-bit it will load in groups of 64 bits. Benchmarking by proxy,
> > > > compiling csum_ipv6_magic (64-bit version) for an x86 chip as well as
> > > > running the riscv generated code in QEMU, showed that summing in a
> > > > tree-like structure is about 4% faster than doing 64-bit reads.
> > > >
> > > ...
> > > > +	sum   = saddr->s6_addr32[0];
> > > > +	sum  += saddr->s6_addr32[1];
> > > > +	sum1  = saddr->s6_addr32[2];
> > > > +	sum1 += saddr->s6_addr32[3];
> > > > +
> > > > +	sum2  = daddr->s6_addr32[0];
> > > > +	sum2 += daddr->s6_addr32[1];
> > > > +	sum3  = daddr->s6_addr32[2];
> > > > +	sum3 += daddr->s6_addr32[3];
> > > > +
> > > > +	sum4  = csum;
> > > > +	sum4 += ulen;
> > > > +	sum4 += uproto;
> > > > +
> > > > +	sum  += sum1;
> > > > +	sum2 += sum3;
> > > > +
> > > > +	sum += sum2;
> > > > +	sum += sum4;
> > >
> > > Have you got gcc to compile that as-is?
> > >
> > > Whenever I've tried to get a 'tree add' compiled so that the
> > > early adds can be executed in parallel gcc always pessimises
> > > it to a linear sequence of adds.
> > >
> > > But I agree that adding 32bit values to a 64bit register
> > > may be no slower than trying to do an 'add carry' sequence
> > > that is guaranteed to only do one add/clock.
> > > (And on Intel cpu from core-2 until IIRC Haswell adc took 2 clocks!)
> > >
> > > IIRC RISCV doesn't have a carry flag, so the adc sequence
> > > is hard - probably takes two extra instructions per value.
> > > Although with parallel execute it may not matter.
> > > Consider:
> > > 	val = buf[offset];
> > > 	sum += val;
> > > 	carry += sum < val;
> > > 	val = buf[offset1];
> > > 	sum += val;
> > > 	...
> > > the compare and 'carry +=' can be executed at the same time
> > > as the following two instructions.
> > > You do then a final sum += carry; sum += sum < carry;
> > >
> > > Assuming all instructions are 1 clock and any read delays
> > > get filled with other instructions (by source or hardware
> > > instruction re-ordering) even without parallel execute
> > > that is 4 clocks for 64 bits, which is much the same as the
> > > 2 clocks for 32 bits.
> > >
> > > Remember that all the 32bit values can be summed first as
> > > they won't overflow.
> > >
> > > 	David
> 
> > Yeah it does seem like the tree-add does just do a linear add. All three
> > of them were pretty much the same on riscv so I used the version that
> > did best on x86 with the knowledge that my QEMU setup does not
> > accurately represent real hardware.
> 
> The problem there is that any measurement on x86 has pretty much
> no relevance to what any RISCV cpu might do.
> The multiple execution units and out of order execution on x86
> are far different from anything any RISCV cpu is likely to have
> for many years.
> You might get nearer running on one of the Atom cpus - but it won't
> really match.
> There are too many fundamental differences between the architectures.
> 
> All you can do is to find and read the instruction timings for
> a target physical cpu and look for things like:
> - Whether arithmetic results are available next clock.
>   (They probably are)
> - How many clocks it takes for read data to be available.
>   I suspect the cpu will stall if the data is needed.
>   A block of sequential reads is one way to avoid the stall.
>   On x86 the instruction that needs the data is just deferred
>   until it is available; the following instructions execute
>   (provided their inputs are all available).
> - Clock delays for taken/not taken predicted/not predicted branches.
>   
> > I don't quite understand how doing the carry in the middle of each
> > stage, even though it can be executed at the same time, would be faster
> > than just doing a single overflow check at the end.
> 
> You need to do half as many reads and adds.
> 
I missed that you were talking about 64-bit reads. I was talking to
somebody about this a couple weeks ago and they mentioned a counter
example that showed that adding the carry after was not the same as
adding it in the middle. Even though addition is commutative, I wasn't
sure if the overflow checking was. I can't rememeber what the counter
example was, but I have a feeling it was flawed.
> > I can just revert
> > back to the non-tree add version since there is no improvement on riscv.
> 
> The 'tree' version is only likely to be faster on cpus (like x86)
> that can (at least sometimes) do two memory reads in one clock
> and can do two adds and two reads in the same clock.
> Even then, without out of order execution, it is hard to get right.
> 
> Oh, you might want to try getting the default csum_fold() to
> be the faster 'arc' version rather than adding your own version.
> 
> 	David
> 
> > I can also revert back to the default version that uses carry += sum < val
> > as well.
> > 
> > - Charlie
> 
> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> Registration No: 1397386 (Wales)
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 2/5] riscv: Add checksum library
  2023-09-13 23:18         ` Charlie Jenkins
@ 2023-09-14  0:41           ` Charlie Jenkins
  0 siblings, 0 replies; 24+ messages in thread
From: Charlie Jenkins @ 2023-09-14  0:41 UTC (permalink / raw)
  To: David Laight
  Cc: Palmer Dabbelt, Conor Dooley, Samuel Holland, linux-riscv,
	linux-kernel, Paul Walmsley, Albert Ou

On Wed, Sep 13, 2023 at 07:18:18PM -0400, Charlie Jenkins wrote:
> On Wed, Sep 13, 2023 at 08:47:49AM +0000, David Laight wrote:
> > From: Charlie Jenkins
> > > Sent: 13 September 2023 04:10
> > > 
> > > On Tue, Sep 12, 2023 at 08:45:38AM +0000, David Laight wrote:
> > > > From: Charlie Jenkins
> > > > > Sent: 11 September 2023 23:57
> > > > >
> > > > > Provide a 32 and 64 bit version of do_csum. When compiled for 32-bit
> > > > > it will load from the buffer in groups of 32 bits, and when compiled
> > > > > for 64-bit it will load in groups of 64 bits. Benchmarking by proxy,
> > > > > compiling csum_ipv6_magic (64-bit version) for an x86 chip as well as
> > > > > running the riscv generated code in QEMU, showed that summing in a
> > > > > tree-like structure is about 4% faster than doing 64-bit reads.
> > > > >
> > > > ...
> > > > > +	sum   = saddr->s6_addr32[0];
> > > > > +	sum  += saddr->s6_addr32[1];
> > > > > +	sum1  = saddr->s6_addr32[2];
> > > > > +	sum1 += saddr->s6_addr32[3];
> > > > > +
> > > > > +	sum2  = daddr->s6_addr32[0];
> > > > > +	sum2 += daddr->s6_addr32[1];
> > > > > +	sum3  = daddr->s6_addr32[2];
> > > > > +	sum3 += daddr->s6_addr32[3];
> > > > > +
> > > > > +	sum4  = csum;
> > > > > +	sum4 += ulen;
> > > > > +	sum4 += uproto;
> > > > > +
> > > > > +	sum  += sum1;
> > > > > +	sum2 += sum3;
> > > > > +
> > > > > +	sum += sum2;
> > > > > +	sum += sum4;
> > > >
> > > > Have you got gcc to compile that as-is?
> > > >
> > > > Whenever I've tried to get a 'tree add' compiled so that the
> > > > early adds can be executed in parallel gcc always pessimises
> > > > it to a linear sequence of adds.
> > > >
> > > > But I agree that adding 32bit values to a 64bit register
> > > > may be no slower than trying to do an 'add carry' sequence
> > > > that is guaranteed to only do one add/clock.
> > > > (And on Intel cpu from core-2 until IIRC Haswell adc took 2 clocks!)
> > > >
> > > > IIRC RISCV doesn't have a carry flag, so the adc sequence
> > > > is hard - probably takes two extra instructions per value.
> > > > Although with parallel execute it may not matter.
> > > > Consider:
> > > > 	val = buf[offset];
> > > > 	sum += val;
> > > > 	carry += sum < val;
> > > > 	val = buf[offset1];
> > > > 	sum += val;
> > > > 	...
> > > > the compare and 'carry +=' can be executed at the same time
> > > > as the following two instructions.
> > > > You do then a final sum += carry; sum += sum < carry;
> > > >
> > > > Assuming all instructions are 1 clock and any read delays
> > > > get filled with other instructions (by source or hardware
> > > > instruction re-ordering) even without parallel execute
> > > > that is 4 clocks for 64 bits, which is much the same as the
> > > > 2 clocks for 32 bits.
> > > >
> > > > Remember that all the 32bit values can be summed first as
> > > > they won't overflow.
> > > >
> > > > 	David
> > 
> > > Yeah it does seem like the tree-add does just do a linear add. All three
> > > of them were pretty much the same on riscv so I used the version that
> > > did best on x86 with the knowledge that my QEMU setup does not
> > > accurately represent real hardware.
> > 
> > The problem there is that any measurement on x86 has pretty much
> > no relevance to what any RISCV cpu might do.
> > The multiple execution units and out of order execution on x86
> > are far different from anything any RISCV cpu is likely to have
> > for many years.
> > You might get nearer running on one of the Atom cpus - but it won't
> > really match.
> > There are too many fundamental differences between the architectures.
> > 
> > All you can do is to find and read the instruction timings for
> > a target physical cpu and look for things like:
> > - Whether arithmetic results are available next clock.
> >   (They probably are)
> > - How many clocks it takes for read data to be available.
> >   I suspect the cpu will stall if the data is needed.
> >   A block of sequential reads is one way to avoid the stall.
> >   On x86 the instruction that needs the data is just deferred
> >   until it is available; the following instructions execute
> >   (provided their inputs are all available).
> > - Clock delays for taken/not taken predicted/not predicted branches.
> >   
> > > I don't quite understand how doing the carry in the middle of each
> > > stage, even though it can be executed at the same time, would be faster
> > > than just doing a single overflow check at the end.
> > 
> > You need to do half as many reads and adds.
> > 
> I missed that you were talking about 64-bit reads. I was talking to
> somebody about this a couple weeks ago and they mentioned a counter
> example that showed that adding the carry after was not the same as
> adding it in the middle. Even though addition is commutative, I wasn't
> sure if the overflow checking was. I can't remember what the counter
> example was, but I have a feeling it was flawed.

Sorry to double respond to this. It seems like it is the same. However,
it still seems to be slower. After cleaning up my benchmarking more,
it seems like the best way to go is to use the 32-bit adds and
accumulate the overflow in the upper 32 bits.
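Roughly like this on rv64 (a sketch with illustrative names; each 32-bit
addend can at most carry into the upper half of the 64-bit register, and
the accumulated carries are folded back in once at the end):

	u64 sum = 0;
	const u32 *p = (const u32 *)buf;
	int i;

	for (i = 0; i < nwords; i++)
		sum += p[i];		/* carries collect in bits 32..63 */

	sum = (sum >> 32) + (u32)sum;	/* fold the carries back in */
	sum = (sum >> 32) + (u32)sum;	/* at most one more wrap */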

> > > I can just revert
> > > back to the non-tree add version since there is no improvement on riscv.
> > 
> > The 'tree' version is only likely to be faster on cpus (like x86)
> > that can (at least sometimes) do two memory reads in one clock
> > and can do two adds and two reads in the same clock.
> > Even then, without out of order execution, it is hard to get right.
> > 
> > Oh, you might want to try getting the default csum_fold() to
> > be the faster 'arc' version rather than adding your own version.
I do like this idea. I can extract out the changes into the default
version.

- Charlie
> > 
> > 	David
> > 
> > > I can also revert back to the default version that uses carry += sum < val
> > > as well.
> > > 
> > > - Charlie
> > 
> > -
> > Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> > Registration No: 1397386 (Wales)
> > 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 2/5] riscv: Add checksum library
  2023-09-11 22:57 ` [PATCH v4 2/5] riscv: Add checksum library Charlie Jenkins
  2023-09-12  8:45   ` David Laight
@ 2023-09-14 12:25   ` Conor Dooley
  2023-09-14 17:58     ` Charlie Jenkins
  1 sibling, 1 reply; 24+ messages in thread
From: Conor Dooley @ 2023-09-14 12:25 UTC (permalink / raw)
  To: Charlie Jenkins
  Cc: Palmer Dabbelt, Conor Dooley, Samuel Holland, David Laight,
	linux-riscv, linux-kernel, Paul Walmsley, Albert Ou

[-- Attachment #1: Type: text/plain, Size: 10953 bytes --]

On Mon, Sep 11, 2023 at 03:57:12PM -0700, Charlie Jenkins wrote:
> Provide a 32 and 64 bit version of do_csum. When compiled for 32-bit
> it will load from the buffer in groups of 32 bits, and when compiled
> for 64-bit it will load in groups of 64 bits. Benchmarking by proxy,
> compiling csum_ipv6_magic (64-bit version) for an x86 chip as well as
> running the riscv generated code in QEMU, showed that summing in a
> tree-like structure is about 4% faster than doing 64-bit reads.
> 
> Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
> ---
>  arch/riscv/include/asm/checksum.h |  11 ++
>  arch/riscv/lib/Makefile           |   1 +
>  arch/riscv/lib/csum.c             | 210 ++++++++++++++++++++++++++++++++++++++
>  3 files changed, 222 insertions(+)
> 
> diff --git a/arch/riscv/include/asm/checksum.h b/arch/riscv/include/asm/checksum.h
> index 0d7fc8275a5e..a09a4053fb87 100644
> --- a/arch/riscv/include/asm/checksum.h
> +++ b/arch/riscv/include/asm/checksum.h
> @@ -16,6 +16,14 @@ typedef unsigned int csum_t;
>  typedef unsigned long csum_t;
>  #endif
>  
> +/* Default version is sufficient for 32 bit */
> +#ifdef CONFIG_64BIT
> +#define _HAVE_ARCH_IPV6_CSUM
> +__sum16 csum_ipv6_magic(const struct in6_addr *saddr,
> +			const struct in6_addr *daddr,
> +			__u32 len, __u8 proto, __wsum sum);
> +#endif
> +
>  /*
>   *	Fold a partial checksum without adding pseudo headers
>   */
> @@ -90,6 +98,9 @@ static inline __sum16 ip_fast_csum(const void *iph, unsigned int ihl)
>  
>  #define ip_fast_csum ip_fast_csum
>  
> +extern unsigned int do_csum(const unsigned char *buff, int len);
> +#define do_csum do_csum
> +
>  #include <asm-generic/checksum.h>
>  
>  #endif // __ASM_RISCV_CHECKSUM_H
> diff --git a/arch/riscv/lib/Makefile b/arch/riscv/lib/Makefile
> index 26cb2502ecf8..2aa1a4ad361f 100644
> --- a/arch/riscv/lib/Makefile
> +++ b/arch/riscv/lib/Makefile
> @@ -6,6 +6,7 @@ lib-y			+= memmove.o
>  lib-y			+= strcmp.o
>  lib-y			+= strlen.o
>  lib-y			+= strncmp.o
> +lib-y			+= csum.o
>  lib-$(CONFIG_MMU)	+= uaccess.o
>  lib-$(CONFIG_64BIT)	+= tishift.o
>  lib-$(CONFIG_RISCV_ISA_ZICBOZ)	+= clear_page.o
> diff --git a/arch/riscv/lib/csum.c b/arch/riscv/lib/csum.c
> new file mode 100644
> index 000000000000..47d98c51bab2
> --- /dev/null
> +++ b/arch/riscv/lib/csum.c
> @@ -0,0 +1,210 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * IP checksum library
> + *
> + * Influenced by arch/arm64/lib/csum.c
> + * Copyright (C) 2023 Rivos Inc.
> + */
> +#include <linux/bitops.h>
> +#include <linux/compiler.h>
> +#include <linux/kasan-checks.h>
> +#include <linux/kernel.h>
> +
> +#include <net/checksum.h>
> +
> +/* Default version is sufficient for 32 bit */
> +#ifndef CONFIG_32BIT
> +__sum16 csum_ipv6_magic(const struct in6_addr *saddr,
> +			const struct in6_addr *daddr,
> +			__u32 len, __u8 proto, __wsum csum)
> +{
> +	/*
> +	 * Inform the compiler/processor that the operation we are performing is
> +	 * "Commutative and Associative" by summing parts of the checksum in a
> +	 * tree-like structure (Section 2(A) of "Computing the Internet
> +	 * Checksum"). Furthermore, defer the overflow until the end of the
> +	 * computation which is shown to be valid in Section 2(C)(1) of the
> +	 * same handbook.
> +	 */
> +	unsigned long sum, sum1, sum2, sum3, sum4, ulen, uproto;
> +
> +	uproto = htonl(proto);
> +	ulen = htonl(len);
> +
> +	sum   = saddr->s6_addr32[0];
> +	sum  += saddr->s6_addr32[1];
> +	sum1  = saddr->s6_addr32[2];
> +	sum1 += saddr->s6_addr32[3];
> +
> +	sum2  = daddr->s6_addr32[0];
> +	sum2 += daddr->s6_addr32[1];
> +	sum3  = daddr->s6_addr32[2];
> +	sum3 += daddr->s6_addr32[3];
> +
> +	sum4  = csum;
> +	sum4 += ulen;
> +	sum4 += uproto;
> +
> +	sum  += sum1;
> +	sum2 += sum3;
> +
> +	sum += sum2;
> +	sum += sum4;
> +
> +	if (IS_ENABLED(CONFIG_RISCV_ISA_ZBB) &&
> +	    IS_ENABLED(CONFIG_RISCV_ALTERNATIVE)) {
> +		csum_t fold_temp;
> +
> +		/*
> +		 * Zbb is likely available when the kernel is compiled with Zbb
> +		 * support, so nop when Zbb is available and jump when Zbb is
> +		 * not available.
> +		 */
> +		asm_volatile_goto(ALTERNATIVE("j %l[no_zbb]", "nop", 0,
> +					      RISCV_ISA_EXT_ZBB, 1)
> +				  :
> +				  :
> +				  :
> +				  : no_zbb);
> +		asm(".option push					\n\
> +		.option arch,+zbb					\n\
> +			rori	%[fold_temp], %[sum], 32		\n\
> +			add	%[sum], %[fold_temp], %[sum]		\n\
> +			srli	%[sum], %[sum], 32			\n\
> +			not	%[fold_temp], %[sum]			\n\
> +			roriw	%[sum], %[sum], 16			\n\
> +			subw	%[sum], %[fold_temp], %[sum]		\n\
> +		.option pop"
> +		: [sum] "+r" (sum), [fold_temp] "=&r" (fold_temp));
> +		return (__force __sum16)(sum >> 16);
> +	}
> +no_zbb:
> +	sum += (sum >> 32) | (sum << 32);
> +	sum >>= 32;
> +	return csum_fold((__force __wsum)sum);
> +}
> +EXPORT_SYMBOL(csum_ipv6_magic);
> +#endif // !CONFIG_32BIT
> +
> +#ifdef CONFIG_32BIT
> +#define OFFSET_MASK 3
> +#elif CONFIG_64BIT
> +#define OFFSET_MASK 7
> +#endif
> +
> +/*
> + * Perform a checksum on an arbitrary memory address.
> > + * The algorithm accounts for buff being misaligned.
> > + * If buff is not aligned, it will over-read bytes but not use the bytes
> > + * that it shouldn't. The same thing will occur on the tail-end of the read.
> + */
> +unsigned int __no_sanitize_address do_csum(const unsigned char *buff, int len)
> +{
> +	unsigned int offset, shift;
> +	csum_t csum, data;
> +	const csum_t *ptr;
> +
> +	if (unlikely(len <= 0))
> +		return 0;
> +	/*
> > +	 * To align the address, grab the whole aligned word containing the
> > +	 * first byte of buff. Since the access stays within one word, it will
> > +	 * never cross pages or cache lines.
> +	 * Directly call KASAN with the alignment we will be using.
> +	 */
> +	offset = (csum_t)buff & OFFSET_MASK;
> +	kasan_check_read(buff, len);
> +	ptr = (const csum_t *)(buff - offset);
> +	len = len + offset - sizeof(csum_t);
> +
> +	/*
> > +	 * Clear the most significant bits that were over-read if buff was not
> +	 * aligned.
> +	 */
> +	shift = offset * 8;
> +	data = *ptr;
> +#ifdef __LITTLE_ENDIAN
> +	data = (data >> shift) << shift;
> +#else
> +	data = (data << shift) >> shift;
> +#endif
> +	/*
> +	 * Do 32-bit reads on RV32 and 64-bit reads otherwise. This should be
> +	 * faster than doing 32-bit reads on architectures that support larger
> +	 * reads.
> +	 */
> +	while (len > 0) {
> +		csum += data;

arch/riscv/lib/csum.c:137:3: warning: variable 'csum' is uninitialized when used here [-Wuninitialized]
                csum += data;
                ^~~~
arch/riscv/lib/csum.c:104:13: note: initialize the variable 'csum' to silence this warning
        csum_t csum, data;
                   ^
                    = 0
> +		csum += csum < data;
> +		len -= sizeof(csum_t);
> +		ptr += 1;
> +		data = *ptr;
> +	}
> +
> +	/*
> +	 * Perform alignment (and over-read) bytes on the tail if any bytes
> +	 * leftover.
> +	 */
> +	shift = len * -8;
> +#ifdef __LITTLE_ENDIAN
> +	data = (data << shift) >> shift;
> +#else
> +	data = (data >> shift) << shift;
> +#endif
> +	csum += data;
> +	csum += csum < data;
> +
> +	if (!riscv_has_extension_likely(RISCV_ISA_EXT_ZBB))
> +		goto no_zbb;

I think this is missing a change for IS_ENABLED(CONFIG_RISCV_ISA_ZBB)?
arch/riscv/lib/csum.c:180:1: warning: unknown option, expected 'push', 'pop', 'rvc', 'norvc', 'relax' or 'norelax' [-Winline-asm]
                .option arch,+zbb                               \n\
^
<inline asm>:2:11: note: instantiated into assembly here
                .option arch,+zbb                               
                        ^
arch/riscv/lib/csum.c:181:1: error: instruction requires the following: 'Zbb' (Basic Bit-Manipulation) or 'Zbkb' (Bitmanip instructions for Cryptography)
                        rori    %[fold_temp], %[csum], 32       \n\
^
<inline asm>:3:4: note: instantiated into assembly here
                        rori    a2, a0, 32      
                        ^
arch/riscv/lib/csum.c:184:1: error: instruction requires the following: 'Zbb' (Basic Bit-Manipulation) or 'Zbkb' (Bitmanip instructions for Cryptography)
                        roriw   %[fold_temp], %[csum], 16       \n\
^
<inline asm>:6:4: note: instantiated into assembly here
                        roriw   a2, a0, 16      
                        ^
arch/riscv/lib/csum.c:188:1: error: instruction requires the following: 'Zbb' (Basic Bit-Manipulation) or 'Zbkb' (Bitmanip instructions for Cryptography)
                        rev8    %[csum], %[csum]                \n\
^
<inline asm>:10:4: note: instantiated into assembly here
                        rev8    a0, a0          
                        ^
2 warnings and 3 errors generated.

clang before 17 doesn't support `.option arch`, so the guard is required
around any code using it. You've got the guard on the other `.option
arch` user above, so that one seems to have escaped notice ;)

Going forward, it'd be good to test this stuff with non-latest clang to
make sure you appropriately consider the !Zbb cases.
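i.e. the runtime check presumably wants the same compile-time gate as the
first block, something like (a sketch):

	if (!IS_ENABLED(CONFIG_RISCV_ISA_ZBB) ||
	    !riscv_has_extension_likely(RISCV_ISA_EXT_ZBB))
		goto no_zbb;

so that the `.option arch,+zbb` asm below it becomes unreachable and is
discarded when the config is off.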


> +
> +	unsigned int fold_temp;
> +
> +	if (IS_ENABLED(CONFIG_32BIT)) {
> +		asm_volatile_goto(".option push			\n\
> +		.option arch,+zbb				\n\
> +			rori	%[fold_temp], %[csum], 16	\n\
> +			andi	%[offset], %[offset], 1		\n\
> +			add	%[csum], %[fold_temp], %[csum]	\n\
> +			beq	%[offset], zero, %l[end]	\n\
> +			rev8	%[csum], %[csum]		\n\
> +			zext.h	%[csum], %[csum]		\n\
> +		.option pop"
> +			: [csum] "+r" (csum), [fold_temp] "=&r" (fold_temp)
> +			: [offset] "r" (offset)
> +			:
> +			: end);
> +
> +		return csum;
> +	} else {
> +		asm_volatile_goto(".option push			\n\
> +		.option arch,+zbb				\n\
> +			rori	%[fold_temp], %[csum], 32	\n\
> +			add	%[csum], %[fold_temp], %[csum]	\n\
> +			srli	%[csum], %[csum], 32		\n\
> +			roriw	%[fold_temp], %[csum], 16	\n\
> +			addw	%[csum], %[fold_temp], %[csum]	\n\
> +			andi	%[offset], %[offset], 1		\n\
> +			beq	%[offset], zero, %l[end]	\n\
> +			rev8	%[csum], %[csum]		\n\
> +			srli	%[csum], %[csum], 32		\n\
> +			zext.h	%[csum], %[csum]		\n\
> +		.option pop"
> +			: [csum] "+r" (csum), [fold_temp] "=&r" (fold_temp)
> +			: [offset] "r" (offset)
> +			:
> +			: end);
> +
> +		return csum;
> +	}
> +end:
> +		return csum >> 16;
> +no_zbb:
> +#ifndef CONFIG_32BIT

These can also be moved to IS_ENABLED() FYI, since there's no 32-bit
stuff here that'd break the build for 64-bit. Ditto elsewhere where
you've got similar stuff.

Cheers,
Conor.
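For example, the #ifndef CONFIG_32BIT chunk here would become (a sketch;
note this assumes csum gets a 64-bit type such as u64 in both configs,
otherwise the 32-bit shift counts would trip -Wshift-count-overflow on
rv32 even as dead code):

	if (IS_ENABLED(CONFIG_64BIT)) {
		csum += (csum >> 32) | (csum << 32);
		csum >>= 32;
	}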

> +		csum += (csum >> 32) | (csum << 32);
> +		csum >>= 32;
> +#endif
> +	csum = (unsigned int)csum + (((unsigned int)csum >> 16) | ((unsigned int)csum << 16));
> +	if (offset & 1)
> +		return (unsigned short)swab32(csum);
> +	return csum >> 16;
> +}
> 
> -- 
> 2.42.0
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread
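To make the over-read handling in the quoted do_csum() concrete, a worked
little-endian example on rv64 (addresses and values are illustrative):

	buff   = (const unsigned char *)0x1003;		/* misaligned by 3 */
	offset = (unsigned long)buff & OFFSET_MASK;	/* 3 */
	ptr    = (const csum_t *)(buff - offset);	/* aligned 0x1000 */
	shift  = offset * 8;				/* 24 */
	data   = *ptr;
	data   = (data >> shift) << shift;	/* zero the 3 low bytes that
						 * precede buff */

The tail is handled symmetrically: after the loop len is <= 0, so
shift = len * -8 and (data << shift) >> shift clears the bytes read past
the end of the buffer.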

* Re: [PATCH v4 4/5] riscv: Vector checksum library
  2023-09-11 22:57 ` [PATCH v4 4/5] riscv: Vector checksum library Charlie Jenkins
@ 2023-09-14 12:46   ` Conor Dooley
  2023-09-14 16:14     ` Charlie Jenkins
  0 siblings, 1 reply; 24+ messages in thread
From: Conor Dooley @ 2023-09-14 12:46 UTC (permalink / raw)
  To: Charlie Jenkins
  Cc: Palmer Dabbelt, Conor Dooley, Samuel Holland, David Laight,
	linux-riscv, linux-kernel, Paul Walmsley, Albert Ou

[-- Attachment #1: Type: text/plain, Size: 1127 bytes --]

On Mon, Sep 11, 2023 at 03:57:14PM -0700, Charlie Jenkins wrote:
> This patch is not ready for merge as vector support in the kernel is
> limited. However, the code has been tested in QEMU so the algorithms
> do work. This code requires the kernel to be compiled with C vector
> support, but that is not yet possible. It is written in assembly
> rather than using the GCC vector intrinsics because they did not
> provide optimal code.
> 
> Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
> ---
>  arch/riscv/lib/csum.c | 92 +++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 92 insertions(+)
> 
> diff --git a/arch/riscv/lib/csum.c b/arch/riscv/lib/csum.c
> index 47d98c51bab2..eb4596fc7f5b 100644
> --- a/arch/riscv/lib/csum.c
> +++ b/arch/riscv/lib/csum.c
> @@ -12,6 +12,10 @@
>  
>  #include <net/checksum.h>
>  
> +#ifdef CONFIG_RISCV_ISA_V
> +#include <riscv_vector.h>

What actually includes this header, I don't see it in either Andy's
in-kernel vector series or Bjorn's blake2 one.
Can you link to the pre-requisites in your cover letter please.

Thanks,
Conor.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 4/5] riscv: Vector checksum library
  2023-09-14 12:46   ` Conor Dooley
@ 2023-09-14 16:14     ` Charlie Jenkins
  2023-09-14 16:29       ` Conor Dooley
  0 siblings, 1 reply; 24+ messages in thread
From: Charlie Jenkins @ 2023-09-14 16:14 UTC (permalink / raw)
  To: Conor Dooley
  Cc: Palmer Dabbelt, Conor Dooley, Samuel Holland, David Laight,
	linux-riscv, linux-kernel, Paul Walmsley, Albert Ou

On Thu, Sep 14, 2023 at 01:46:29PM +0100, Conor Dooley wrote:
> On Mon, Sep 11, 2023 at 03:57:14PM -0700, Charlie Jenkins wrote:
> > This patch is not ready for merge as vector support in the kernel is
> > limited. However, the code has been tested in QEMU so the algorithms
> > do work. This code requires the kernel to be compiled with C vector
> > support, but that is not yet possible. It is written in assembly
> > rather than using the GCC vector intrinsics because they did not
> > provide optimal code.
> > 
> > Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
> > ---
> >  arch/riscv/lib/csum.c | 92 +++++++++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 92 insertions(+)
> > 
> > diff --git a/arch/riscv/lib/csum.c b/arch/riscv/lib/csum.c
> > index 47d98c51bab2..eb4596fc7f5b 100644
> > --- a/arch/riscv/lib/csum.c
> > +++ b/arch/riscv/lib/csum.c
> > @@ -12,6 +12,10 @@
> >  
> >  #include <net/checksum.h>
> >  
> > +#ifdef CONFIG_RISCV_ISA_V
> > +#include <riscv_vector.h>
> 
> What actually includes this header, I don't see it in either Andy's
> in-kernel vector series or Bjorn's blake2 one.
> Can you link to the pre-requisites in your cover letter please.
> 
> Thanks,
> Conor.

It is defined here:
https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/main/doc/rvv-intrinsic-spec.adoc.
The header is for the vector intrinsics that are supported by llvm and
gcc.

- Charlie


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 4/5] riscv: Vector checksum library
  2023-09-14 16:14     ` Charlie Jenkins
@ 2023-09-14 16:29       ` Conor Dooley
  2023-09-14 17:29         ` Charlie Jenkins
  0 siblings, 1 reply; 24+ messages in thread
From: Conor Dooley @ 2023-09-14 16:29 UTC (permalink / raw)
  To: Charlie Jenkins
  Cc: Conor Dooley, Palmer Dabbelt, Samuel Holland, David Laight,
	linux-riscv, linux-kernel, Paul Walmsley, Albert Ou

[-- Attachment #1: Type: text/plain, Size: 2038 bytes --]

On Thu, Sep 14, 2023 at 12:14:16PM -0400, Charlie Jenkins wrote:
> On Thu, Sep 14, 2023 at 01:46:29PM +0100, Conor Dooley wrote:
> > On Mon, Sep 11, 2023 at 03:57:14PM -0700, Charlie Jenkins wrote:
> > > This patch is not ready for merge as vector support in the kernel is
> > > limited. However, the code has been tested in QEMU so the algorithms
> > > do work. This code requires the kernel to be compiled with C vector
> > > support, but that is not yet possible. It is written in assembly
> > > rather than using the GCC vector intrinsics because they did not
> > > provide optimal code.
> > > 
> > > Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
> > > ---
> > >  arch/riscv/lib/csum.c | 92 +++++++++++++++++++++++++++++++++++++++++++++++++++
> > >  1 file changed, 92 insertions(+)
> > > 
> > > diff --git a/arch/riscv/lib/csum.c b/arch/riscv/lib/csum.c
> > > index 47d98c51bab2..eb4596fc7f5b 100644
> > > --- a/arch/riscv/lib/csum.c
> > > +++ b/arch/riscv/lib/csum.c
> > > @@ -12,6 +12,10 @@
> > >  
> > >  #include <net/checksum.h>
> > >  
> > > +#ifdef CONFIG_RISCV_ISA_V
> > > +#include <riscv_vector.h>
> > 
> > What actually includes this header, I don't see it in either Andy's
> > in-kernel vector series or Bjorn's blake2 one.
> > Can you link to the pre-requisites in your cover letter please.
> > 
> > Thanks,
> > Conor.
> 
> It is defined here:
> https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/main/doc/rvv-intrinsic-spec.adoc.
> The header is for the vector intrinsics that are supported by llvm and
> gcc.

Well, whatever you're doing with it does not work, producing 3600 or so
fatal errors during compilation, all saying:
../arch/riscv/include/asm/checksum.h:14:10: fatal error: riscv_vector.h: No such file or directory

Do you have some makefile hack somewhere that's not part of this
patchset? Also, I'm dumb, but can you show me where the actual
intrinsics are being used in this patch anyway? I just see some
types & asm.

Thanks,
Conor.


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 4/5] riscv: Vector checksum library
  2023-09-14 16:29       ` Conor Dooley
@ 2023-09-14 17:29         ` Charlie Jenkins
  2023-09-14 17:36           ` Conor Dooley
  0 siblings, 1 reply; 24+ messages in thread
From: Charlie Jenkins @ 2023-09-14 17:29 UTC (permalink / raw)
  To: Conor Dooley
  Cc: Conor Dooley, Palmer Dabbelt, Samuel Holland, David Laight,
	linux-riscv, linux-kernel, Paul Walmsley, Albert Ou

On Thu, Sep 14, 2023 at 05:29:29PM +0100, Conor Dooley wrote:
> On Thu, Sep 14, 2023 at 12:14:16PM -0400, Charlie Jenkins wrote:
> > On Thu, Sep 14, 2023 at 01:46:29PM +0100, Conor Dooley wrote:
> > > On Mon, Sep 11, 2023 at 03:57:14PM -0700, Charlie Jenkins wrote:
> > > > This patch is not ready for merge as vector support in the kernel is
> > > > limited. However, the code has been tested in QEMU so the algorithms
> > > > do work. This code requires the kernel to be compiled with C vector
> > > > support, but that is not yet possible. It is written in assembly
> > > > rather than using the GCC vector intrinsics because they did not
> > > > provide optimal code.
> > > > 
> > > > Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
> > > > ---
> > > >  arch/riscv/lib/csum.c | 92 +++++++++++++++++++++++++++++++++++++++++++++++++++
> > > >  1 file changed, 92 insertions(+)
> > > > 
> > > > diff --git a/arch/riscv/lib/csum.c b/arch/riscv/lib/csum.c
> > > > index 47d98c51bab2..eb4596fc7f5b 100644
> > > > --- a/arch/riscv/lib/csum.c
> > > > +++ b/arch/riscv/lib/csum.c
> > > > @@ -12,6 +12,10 @@
> > > >  
> > > >  #include <net/checksum.h>
> > > >  
> > > > +#ifdef CONFIG_RISCV_ISA_V
> > > > +#include <riscv_vector.h>
> > > 
> > > What actually includes this header, I don't see it in either Andy's
> > > in-kernel vector series or Bjorn's blake2 one.
> > > Can you link to the pre-requisites in your cover letter please.
> > > 
> > > Thanks,
> > > Conor.
> > 
> > It is defined here:
> > https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/main/doc/rvv-intrinsic-spec.adoc.
> > The header is for the vector intrinsics that are supported by llvm and
> > gcc.
> 
> Well, whatever you're doing with it does not work, producing 3600 or so
> fatal errors during compilation, all saying:
> ../arch/riscv/include/asm/checksum.h:14:10: fatal error: riscv_vector.h: No such file or directory
> 
> Do you have some makefile hack somewhere that's not part of this
> patchset? Also, I'm dumb, but can you show me where the actual
> intrinsics are being used in this patch anyway? I just see some
> types & asm.
> 
> Thanks,
> Conor.
> 

Intrinsics are needed for the vector types. Vector types are needed to
get the inline asm to select vector registers at compile time. I could
manually select vector registers to use but that is not ideal. In order
to get this to work, vector has to be enabled in the compiler. This
patch will not compile right now, but since people are working on vector
I was hoping that it would be possible in the future. Palmer recommended
that I just put up this patch for now since I had the code, but only the
non-vector versions should be candidates for release for now.

- Charlie
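As an illustration of the register-allocation point (a sketch only; it
assumes a toolchain that ships riscv_vector.h, the v1.0 intrinsics naming,
the "vr" inline-asm constraint, and V enabled in -march):

	#include <riscv_vector.h>

	static vuint64m1_t acc_zero(size_t n)
	{
		size_t vl = __riscv_vsetvl_e64m1(n);
		/* the intrinsic types let the compiler pick the vector
		 * registers instead of hard-coding v0..v31 in the asm */
		vuint64m1_t acc = __riscv_vmv_v_x_u64m1(0, vl);

		asm("vadd.vv %0, %0, %0" : "+vr" (acc));
		return acc;
	}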


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 4/5] riscv: Vector checksum library
  2023-09-14 17:29         ` Charlie Jenkins
@ 2023-09-14 17:36           ` Conor Dooley
  2023-09-14 20:59             ` Charlie Jenkins
  0 siblings, 1 reply; 24+ messages in thread
From: Conor Dooley @ 2023-09-14 17:36 UTC (permalink / raw)
  To: Charlie Jenkins
  Cc: Conor Dooley, Palmer Dabbelt, Samuel Holland, David Laight,
	linux-riscv, linux-kernel, Paul Walmsley, Albert Ou

[-- Attachment #1: Type: text/plain, Size: 3352 bytes --]

On Thu, Sep 14, 2023 at 01:29:18PM -0400, Charlie Jenkins wrote:
> On Thu, Sep 14, 2023 at 05:29:29PM +0100, Conor Dooley wrote:
> > On Thu, Sep 14, 2023 at 12:14:16PM -0400, Charlie Jenkins wrote:
> > > On Thu, Sep 14, 2023 at 01:46:29PM +0100, Conor Dooley wrote:
> > > > On Mon, Sep 11, 2023 at 03:57:14PM -0700, Charlie Jenkins wrote:
> > > > > This patch is not ready for merge as vector support in the kernel is
> > > > > limited. However, the code has been tested in QEMU so the algorithms
> > > > > do work. This code requires the kernel to be compiled with C vector
> > > > > support, but that is not yet possible. It is written in assembly
> > > > > rather than using the GCC vector intrinsics because they did not
> > > > > provide optimal code.
> > > > > 
> > > > > Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
> > > > > ---
> > > > >  arch/riscv/lib/csum.c | 92 +++++++++++++++++++++++++++++++++++++++++++++++++++
> > > > >  1 file changed, 92 insertions(+)
> > > > > 
> > > > > diff --git a/arch/riscv/lib/csum.c b/arch/riscv/lib/csum.c
> > > > > index 47d98c51bab2..eb4596fc7f5b 100644
> > > > > --- a/arch/riscv/lib/csum.c
> > > > > +++ b/arch/riscv/lib/csum.c
> > > > > @@ -12,6 +12,10 @@
> > > > >  
> > > > >  #include <net/checksum.h>
> > > > >  
> > > > > +#ifdef CONFIG_RISCV_ISA_V
> > > > > +#include <riscv_vector.h>
> > > > 
> > > > What actually includes this header, I don't see it in either Andy's
> > > > in-kernel vector series or Bjorn's blake2 one.
> > > > Can you link to the pre-requisites in your cover letter please.
> > > > 
> > > > Thanks,
> > > > Conor.
> > > 
> > > It is defined here:
> > > https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/main/doc/rvv-intrinsic-spec.adoc.
> > > The header is for the vector intrinsics that are supported by llvm and
> > > gcc.
> > 
> > Well, whatever you're doing with it does not work, producing 3600 or so
> > fatal errors during compilation, all saying:
> > ../arch/riscv/include/asm/checksum.h:14:10: fatal error: riscv_vector.h: No such file or directory
> > 
> > Do you have some makefile hack somewhere that's not part of this
> > patchset? Also, I'm dumb, but can you show me where the actual
> > intrinsics are being used in this patch anyway? I just see some
> > types & asm.
> > 
> > Thanks,
> > Conor.
> > 
> 
> Intrinsics are needed for the vector types. Vector types are needed to
> get the inline asm to select vector registers at compile time. I could
> manually select vector registers to use but that is not ideal. In order
> to get this to work, vector has to be enabled in the compiler. This
> patch will not compile right now, but since people are working on vector
> I was hoping that it would be possible in the future. Palmer recommended
> that I just put up this patch for now since I had the code, but only the
> non-vector versions should be candidates for release for now.

I see. It was pretty unclear to me anyway what the craic was; you should
probably note that the builds from here onwards are known to be
broken. If you want that header, I guess you probably need to
have v set in -march?
If so, the in-kernel vector patches that have been posted do not do that.
I'm oh-so-far from an expert on what is a safe way to do these kinds of
things though, sadly.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 2/5] riscv: Add checksum library
  2023-09-14 12:25   ` Conor Dooley
@ 2023-09-14 17:58     ` Charlie Jenkins
  2023-09-14 18:02       ` Conor Dooley
  0 siblings, 1 reply; 24+ messages in thread
From: Charlie Jenkins @ 2023-09-14 17:58 UTC (permalink / raw)
  To: Conor Dooley
  Cc: Palmer Dabbelt, Conor Dooley, Samuel Holland, David Laight,
	linux-riscv, linux-kernel, Paul Walmsley, Albert Ou

On Thu, Sep 14, 2023 at 01:25:23PM +0100, Conor Dooley wrote:
> On Mon, Sep 11, 2023 at 03:57:12PM -0700, Charlie Jenkins wrote:
> > Provide a 32 and 64 bit version of do_csum. When compiled for 32-bit
> > it will load from the buffer in groups of 32 bits, and when compiled
> > for 64-bit it will load in groups of 64 bits. Benchmarking by proxy,
> > compiling csum_ipv6_magic (64-bit version) for an x86 chip as well as
> > running the riscv generated code in QEMU, showed that summing in a
> > tree-like structure is about 4% faster than doing 64-bit reads.
> > 
> > Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
> > ---
> >  arch/riscv/include/asm/checksum.h |  11 ++
> >  arch/riscv/lib/Makefile           |   1 +
> >  arch/riscv/lib/csum.c             | 210 ++++++++++++++++++++++++++++++++++++++
> >  3 files changed, 222 insertions(+)
> > 
> > diff --git a/arch/riscv/include/asm/checksum.h b/arch/riscv/include/asm/checksum.h
> > index 0d7fc8275a5e..a09a4053fb87 100644
> > --- a/arch/riscv/include/asm/checksum.h
> > +++ b/arch/riscv/include/asm/checksum.h
> > @@ -16,6 +16,14 @@ typedef unsigned int csum_t;
> >  typedef unsigned long csum_t;
> >  #endif
> >  
> > +/* Default version is sufficient for 32 bit */
> > +#ifdef CONFIG_64BIT
> > +#define _HAVE_ARCH_IPV6_CSUM
> > +__sum16 csum_ipv6_magic(const struct in6_addr *saddr,
> > +			const struct in6_addr *daddr,
> > +			__u32 len, __u8 proto, __wsum sum);
> > +#endif
> > +
> >  /*
> >   *	Fold a partial checksum without adding pseudo headers
> >   */
> > @@ -90,6 +98,9 @@ static inline __sum16 ip_fast_csum(const void *iph, unsigned int ihl)
> >  
> >  #define ip_fast_csum ip_fast_csum
> >  
> > +extern unsigned int do_csum(const unsigned char *buff, int len);
> > +#define do_csum do_csum
> > +
> >  #include <asm-generic/checksum.h>
> >  
> >  #endif // __ASM_RISCV_CHECKSUM_H
> > diff --git a/arch/riscv/lib/Makefile b/arch/riscv/lib/Makefile
> > index 26cb2502ecf8..2aa1a4ad361f 100644
> > --- a/arch/riscv/lib/Makefile
> > +++ b/arch/riscv/lib/Makefile
> > @@ -6,6 +6,7 @@ lib-y			+= memmove.o
> >  lib-y			+= strcmp.o
> >  lib-y			+= strlen.o
> >  lib-y			+= strncmp.o
> > +lib-y			+= csum.o
> >  lib-$(CONFIG_MMU)	+= uaccess.o
> >  lib-$(CONFIG_64BIT)	+= tishift.o
> >  lib-$(CONFIG_RISCV_ISA_ZICBOZ)	+= clear_page.o
> > diff --git a/arch/riscv/lib/csum.c b/arch/riscv/lib/csum.c
> > new file mode 100644
> > index 000000000000..47d98c51bab2
> > --- /dev/null
> > +++ b/arch/riscv/lib/csum.c
> > @@ -0,0 +1,210 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * IP checksum library
> > + *
> > + * Influenced by arch/arm64/lib/csum.c
> > + * Copyright (C) 2023 Rivos Inc.
> > + */
> > +#include <linux/bitops.h>
> > +#include <linux/compiler.h>
> > +#include <linux/kasan-checks.h>
> > +#include <linux/kernel.h>
> > +
> > +#include <net/checksum.h>
> > +
> > +/* Default version is sufficient for 32 bit */
> > +#ifndef CONFIG_32BIT
> > +__sum16 csum_ipv6_magic(const struct in6_addr *saddr,
> > +			const struct in6_addr *daddr,
> > +			__u32 len, __u8 proto, __wsum csum)
> > +{
> > +	/*
> > +	 * Inform the compiler/processor that the operation we are performing is
> > +	 * "Commutative and Associative" by summing parts of the checksum in a
> > +	 * tree-like structure (Section 2(A) of "Computing the Internet
> > +	 * Checksum"). Furthermore, defer the overflow until the end of the
> > +	 * computation which is shown to be valid in Section 2(C)(1) of the
> > +	 * same handbook.
> > +	 */
> > +	unsigned long sum, sum1, sum2, sum3, sum4, ulen, uproto;
> > +
> > +	uproto = htonl(proto);
> > +	ulen = htonl(len);
> > +
> > +	sum   = saddr->s6_addr32[0];
> > +	sum  += saddr->s6_addr32[1];
> > +	sum1  = saddr->s6_addr32[2];
> > +	sum1 += saddr->s6_addr32[3];
> > +
> > +	sum2  = daddr->s6_addr32[0];
> > +	sum2 += daddr->s6_addr32[1];
> > +	sum3  = daddr->s6_addr32[2];
> > +	sum3 += daddr->s6_addr32[3];
> > +
> > +	sum4  = csum;
> > +	sum4 += ulen;
> > +	sum4 += uproto;
> > +
> > +	sum  += sum1;
> > +	sum2 += sum3;
> > +
> > +	sum += sum2;
> > +	sum += sum4;
> > +
> > +	if (IS_ENABLED(CONFIG_RISCV_ISA_ZBB) &&
> > +	    IS_ENABLED(CONFIG_RISCV_ALTERNATIVE)) {
> > +		csum_t fold_temp;
> > +
> > +		/*
> > +		 * Zbb is likely available when the kernel is compiled with Zbb
> > +		 * support, so nop when Zbb is available and jump when Zbb is
> > +		 * not available.
> > +		 */
> > +		asm_volatile_goto(ALTERNATIVE("j %l[no_zbb]", "nop", 0,
> > +					      RISCV_ISA_EXT_ZBB, 1)
> > +				  :
> > +				  :
> > +				  :
> > +				  : no_zbb);
> > +		asm(".option push					\n\
> > +		.option arch,+zbb					\n\
> > +			rori	%[fold_temp], %[sum], 32		\n\
> > +			add	%[sum], %[fold_temp], %[sum]		\n\
> > +			srli	%[sum], %[sum], 32			\n\
> > +			not	%[fold_temp], %[sum]			\n\
> > +			roriw	%[sum], %[sum], 16			\n\
> > +			subw	%[sum], %[fold_temp], %[sum]		\n\
> > +		.option pop"
> > +		: [sum] "+r" (sum), [fold_temp] "=&r" (fold_temp));
> > +		return (__force __sum16)(sum >> 16);
> > +	}
> > +no_zbb:
> > +	sum += (sum >> 32) | (sum << 32);
> > +	sum >>= 32;
> > +	return csum_fold((__force __wsum)sum);
> > +}
> > +EXPORT_SYMBOL(csum_ipv6_magic);
> > +#endif // !CONFIG_32BIT
> > +
> > +#ifdef CONFIG_32BIT
> > +#define OFFSET_MASK 3
> > +#elif CONFIG_64BIT
> > +#define OFFSET_MASK 7
> > +#endif
> > +
> > +/*
> > + * Perform a checksum on an arbitrary memory address.
> > + * Algorithm accounts for buff being misaligned.
> > + * If buff is not aligned, it will over-read bytes but not use the bytes
> > + * that it shouldn't. The same thing will occur on the tail end of the read.
> > + */
> > +unsigned int __no_sanitize_address do_csum(const unsigned char *buff, int len)
> > +{
> > +	unsigned int offset, shift;
> > +	csum_t csum, data;
> > +	const csum_t *ptr;
> > +
> > +	if (unlikely(len <= 0))
> > +		return 0;
> > +	/*
> > +	 * To align the address, grab the whole first word containing buff.
> > +	 * Since the aligned load stays within a single word, it will never
> > +	 * cross pages or cache lines.
> > +	 * Directly call KASAN with the alignment we will be using.
> > +	 */
> > +	offset = (csum_t)buff & OFFSET_MASK;
> > +	kasan_check_read(buff, len);
> > +	ptr = (const csum_t *)(buff - offset);
> > +	len = len + offset - sizeof(csum_t);
> > +
> > +	/*
> > +	 * Clear the most significant bits that were over-read if buff was not
> > +	 * aligned.
> > +	 */
> > +	shift = offset * 8;
> > +	data = *ptr;
> > +#ifdef __LITTLE_ENDIAN
> > +	data = (data >> shift) << shift;
> > +#else
> > +	data = (data << shift) >> shift;
> > +#endif
> > +	/*
> > +	 * Do 32-bit reads on RV32 and 64-bit reads otherwise. This should be
> > +	 * faster than doing 32-bit reads on architectures that support larger
> > +	 * reads.
> > +	 */
> > +	while (len > 0) {
> > +		csum += data;
> 
> arch/riscv/lib/csum.c:137:3: warning: variable 'csum' is uninitialized when used here [-Wuninitialized]
>                 csum += data;
>                 ^~~~
> arch/riscv/lib/csum.c:104:13: note: initialize the variable 'csum' to silence this warning
>         csum_t csum, data;
>                    ^
>                     = 0
> > +		csum += csum < data;
> > +		len -= sizeof(csum_t);
> > +		ptr += 1;
> > +		data = *ptr;
> > +	}
> > +
> > +	/*
> > +	 * Perform alignment (and over-read) bytes on the tail if any bytes
> > +	 * leftover.
> > +	 */
> > +	shift = len * -8;
> > +#ifdef __LITTLE_ENDIAN
> > +	data = (data << shift) >> shift;
> > +#else
> > +	data = (data >> shift) << shift;
> > +#endif
> > +	csum += data;
> > +	csum += csum < data;
> > +
> > +	if (!riscv_has_extension_likely(RISCV_ISA_EXT_ZBB))
> > +		goto no_zbb;
> 
> I think this is missing a change for IS_ENABLED(CONFIG_RISCV_ISA_ZBB)?
> arch/riscv/lib/csum.c:180:1: warning: unknown option, expected 'push', 'pop', 'rvc', 'norvc', 'relax' or 'norelax' [-Winline-asm]
>                 .option arch,+zbb                               \n\
> ^
> <inline asm>:2:11: note: instantiated into assembly here
>                 .option arch,+zbb                               
>                         ^
> arch/riscv/lib/csum.c:181:1: error: instruction requires the following: 'Zbb' (Basic Bit-Manipulation) or 'Zbkb' (Bitmanip instructions for Cryptography)
>                         rori    %[fold_temp], %[csum], 32       \n\
> ^
> <inline asm>:3:4: note: instantiated into assembly here
>                         rori    a2, a0, 32      
>                         ^
> arch/riscv/lib/csum.c:184:1: error: instruction requires the following: 'Zbb' (Basic Bit-Manipulation) or 'Zbkb' (Bitmanip instructions for Cryptography)
>                         roriw   %[fold_temp], %[csum], 16       \n\
> ^
> <inline asm>:6:4: note: instantiated into assembly here
>                         roriw   a2, a0, 16      
>                         ^
> arch/riscv/lib/csum.c:188:1: error: instruction requires the following: 'Zbb' (Basic Bit-Manipulation) or 'Zbkb' (Bitmanip instructions for Cryptography)
>                         rev8    %[csum], %[csum]                \n\
> ^
> <inline asm>:10:4: note: instantiated into assembly here
>                         rev8    a0, a0          
>                         ^
> 2 warnings and 3 errors generated.
> 
> clang before 17 doesn't support `.option arch`, so the guard is required
> around any code using it. You've got the guard on the other `.option
> arch` user above, so that one seems to have escaped notice ;)
> 
> Going forward, it'd be good to test this stuff with non-latest clang to
> make sure you appropriately consider the !Zbb cases.
> 
Yes, dropping the guard here was an oversight.
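The fix should mirror the guard csum_ipv6_magic() already carries above;
roughly (a sketch, not the final patch):

	if (IS_ENABLED(CONFIG_RISCV_ISA_ZBB) &&
	    IS_ENABLED(CONFIG_RISCV_ALTERNATIVE)) {
		unsigned int fold_temp;

		if (!riscv_has_extension_likely(RISCV_ISA_EXT_ZBB))
			goto no_zbb;

		/* ... the existing ".option arch,+zbb" folding asm ... */
	}
no_zbb:
	/* ... generic folding ... */

That way a toolchain without Zbb support never sees the ".option arch"
directive.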
> 
> > +
> > +	unsigned int fold_temp;
> > +
> > +	if (IS_ENABLED(CONFIG_32BIT)) {
> > +		asm_volatile_goto(".option push			\n\
> > +		.option arch,+zbb				\n\
> > +			rori	%[fold_temp], %[csum], 16	\n\
> > +			andi	%[offset], %[offset], 1		\n\
> > +			add	%[csum], %[fold_temp], %[csum]	\n\
> > +			beq	%[offset], zero, %l[end]	\n\
> > +			rev8	%[csum], %[csum]		\n\
> > +			zext.h	%[csum], %[csum]		\n\
> > +		.option pop"
> > +			: [csum] "+r" (csum), [fold_temp] "=&r" (fold_temp)
> > +			: [offset] "r" (offset)
> > +			:
> > +			: end);
> > +
> > +		return csum;
> > +	} else {
> > +		asm_volatile_goto(".option push			\n\
> > +		.option arch,+zbb				\n\
> > +			rori	%[fold_temp], %[csum], 32	\n\
> > +			add	%[csum], %[fold_temp], %[csum]	\n\
> > +			srli	%[csum], %[csum], 32		\n\
> > +			roriw	%[fold_temp], %[csum], 16	\n\
> > +			addw	%[csum], %[fold_temp], %[csum]	\n\
> > +			andi	%[offset], %[offset], 1		\n\
> > +			beq	%[offset], zero, %l[end]	\n\
> > +			rev8	%[csum], %[csum]		\n\
> > +			srli	%[csum], %[csum], 32		\n\
> > +			zext.h	%[csum], %[csum]		\n\
> > +		.option pop"
> > +			: [csum] "+r" (csum), [fold_temp] "=&r" (fold_temp)
> > +			: [offset] "r" (offset)
> > +			:
> > +			: end);
> > +
> > +		return csum;
> > +	}
> > +end:
> > +		return csum >> 16;
> > +no_zbb:
> > +#ifndef CONFIG_32BIT
> 
> These can also be moved to IS_ENABLED() FYI, since there's no 32-bit
> stuff here that'd break the build for 64-bit. Ditto elsewhere where
> you've got similar stuff.
> 
> Cheers,
> Conor.
This is an ifndef, so a 32-bit compilation would throw a warning about
shifting a 32-bit value by 32 bits if IS_ENABLED was used instead.
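As a sketch of the failure mode (illustrative, reusing the folding code
from the patch):

	/* On rv32, csum is a 32-bit unsigned long; GCC still
	 * type-checks this dead branch, so it warns about the 32-bit
	 * shifts even though the branch can never run: */
	if (!IS_ENABLED(CONFIG_32BIT)) {
		csum += (csum >> 32) | (csum << 32);
		csum >>= 32;
	}

Keeping the #ifndef sidesteps that.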

- Charlie
> 
> > +		csum += (csum >> 32) | (csum << 32);
> > +		csum >>= 32;
> > +#endif
> > +	csum = (unsigned int)csum + (((unsigned int)csum >> 16) | ((unsigned int)csum << 16));
> > +	if (offset & 1)
> > +		return (unsigned short)swab32(csum);
> > +	return csum >> 16;
> > +}
> > 
> > -- 
> > 2.42.0
> > 




* Re: [PATCH v4 2/5] riscv: Add checksum library
  2023-09-14 17:58     ` Charlie Jenkins
@ 2023-09-14 18:02       ` Conor Dooley
  2023-09-14 23:30         ` Charlie Jenkins
  0 siblings, 1 reply; 24+ messages in thread
From: Conor Dooley @ 2023-09-14 18:02 UTC (permalink / raw)
  To: Charlie Jenkins
  Cc: Conor Dooley, Palmer Dabbelt, Samuel Holland, David Laight,
	linux-riscv, linux-kernel, Paul Walmsley, Albert Ou


> > > +#ifndef CONFIG_32BIT
> > 
> > These can also be moved to IS_ENABLED() FYI, since there's no 32-bit
> > stuff here that'd break the build for 64-bit. Ditto elsewhere where
> > you've got similar stuff.
> > 
> > Cheers,
> > Conor.
> This is an ifndef, so a 32-bit compilation would throw a warning about
> shifting a 32-bit value by 32 bits if IS_ENABLED was used instead.

 Fair enough. I did accidentally invert things in my mail; I did notice
 the n, I just thought the dead-code elimination happened before those
 checks. Sorry for the noise.



* Re: [PATCH v4 4/5] riscv: Vector checksum library
  2023-09-14 17:36           ` Conor Dooley
@ 2023-09-14 20:59             ` Charlie Jenkins
  0 siblings, 0 replies; 24+ messages in thread
From: Charlie Jenkins @ 2023-09-14 20:59 UTC (permalink / raw)
  To: Conor Dooley
  Cc: Conor Dooley, Palmer Dabbelt, Samuel Holland, David Laight,
	linux-riscv, linux-kernel, Paul Walmsley, Albert Ou

On Thu, Sep 14, 2023 at 06:36:42PM +0100, Conor Dooley wrote:
> On Thu, Sep 14, 2023 at 01:29:18PM -0400, Charlie Jenkins wrote:
> > On Thu, Sep 14, 2023 at 05:29:29PM +0100, Conor Dooley wrote:
> > > On Thu, Sep 14, 2023 at 12:14:16PM -0400, Charlie Jenkins wrote:
> > > > On Thu, Sep 14, 2023 at 01:46:29PM +0100, Conor Dooley wrote:
> > > > > On Mon, Sep 11, 2023 at 03:57:14PM -0700, Charlie Jenkins wrote:
> > > > > > This patch is not ready for merge as vector support in the kernel is
> > > > > > limited. However, the code has been tested in QEMU so the algorithms
> > > > > > do work. This code requires the kernel to be compiled with C vector
> > > > > > support, but that is not yet possible. It is written in assembly
> > > > > > rather than using the GCC vector intrinsics because they did not
> > > > > > provide optimal code.
> > > > > > 
> > > > > > Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
> > > > > > ---
> > > > > >  arch/riscv/lib/csum.c | 92 +++++++++++++++++++++++++++++++++++++++++++++++++++
> > > > > >  1 file changed, 92 insertions(+)
> > > > > > 
> > > > > > diff --git a/arch/riscv/lib/csum.c b/arch/riscv/lib/csum.c
> > > > > > index 47d98c51bab2..eb4596fc7f5b 100644
> > > > > > --- a/arch/riscv/lib/csum.c
> > > > > > +++ b/arch/riscv/lib/csum.c
> > > > > > @@ -12,6 +12,10 @@
> > > > > >  
> > > > > >  #include <net/checksum.h>
> > > > > >  
> > > > > > +#ifdef CONFIG_RISCV_ISA_V
> > > > > > +#include <riscv_vector.h>
> > > > > 
> > > > > What actually includes this header, I don't see it in either Andy's
> > > > > in-kernel vector series or Bjorn's blake2 one.
> > > > > Can you link to the pre-requisites in your cover letter please.
> > > > > 
> > > > > Thanks,
> > > > > Conor.
> > > > 
> > > > It is defined here:
> > > > https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/main/doc/rvv-intrinsic-spec.adoc.
> > > > The header is for the vector intrinsics that are supported by llvm and
> > > > gcc.
> > > 
> > > Well, whatever you're doing with it does not work, producing 3600 or so
> > > fatal errors during compilation, all saying:
> > > ../arch/riscv/include/asm/checksum.h:14:10: fatal error: riscv_vector.h: No such file or directory
> > > 
> > > Do you have some makefile hack somewhere that's not part of this
> > > patchset? Also, I'm dumb, but can you show me where the actual
> > > intrinsics are being used in this patch anyway? I just see some
> > > types & asm.
> > > 
> > > Thanks,
> > > Conor.
> > > 
> > 
> > Intrinsics are needed for the vector types. Vector types are needed to
> > get the inline asm to select vector registers at compile time. I could
> > manually select vector registers to use but that is not ideal. In order
> > to get this to work, vector has to be enabled in the compiler. This
> > patch will not compile right now, but since people are working on vector
> > I was hoping that it would be possible in the future. Palmer recommended
> > that I just put up this patch for now since I had the code, but only the
> > non-vector versions should be candidates for release for now.
> 
> I see. It was pretty unclear to me anyway what the craic was; you should
> probably note that the builds from here onwards are known-broken. If you
> want that header, I guess you probably need to have v set in -march?
> If so, the in-kernel vector patches that have been posted do not do that.
> I'm oh-so-far from an expert on what is a safe way to do these kinda
> things though, sadly.

It seems like more than just enabling v in -march will need to be done.
Because Linux uses -nostdinc, the header file won't be found. After
doing some research, it also seems like LLVM and GCC do not share inline
asm constraints: LLVM is missing "vd", which specifies a vector register
that is not a mask register. I think I will drop these vector patches
for now, since there is more work than I expected to get this
functional.
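For the record, the construct the intrinsic types were needed for looks
roughly like this (a sketch only: nr_words and ptr are placeholders, it
assumes a toolchain shipping riscv_vector.h and the "vr" vector-register
constraint, and the intrinsic spelling follows the rvv-intrinsic-doc
linked earlier):

	#include <riscv_vector.h>

	size_t vl = __riscv_vsetvl_e64m1(nr_words);
	vuint64m1_t acc;

	/* The vector type lets the compiler pick the register: */
	asm("vle64.v %0, (%1)" : "=vr" (acc) : "r" (ptr) : "memory");

Without the types from riscv_vector.h there is no way to declare "acc",
short of hard-coding v-register numbers into the asm.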

- Charlie


* Re: [PATCH v4 2/5] riscv: Add checksum library
  2023-09-14 18:02       ` Conor Dooley
@ 2023-09-14 23:30         ` Charlie Jenkins
  0 siblings, 0 replies; 24+ messages in thread
From: Charlie Jenkins @ 2023-09-14 23:30 UTC (permalink / raw)
  To: Conor Dooley
  Cc: Conor Dooley, Palmer Dabbelt, Samuel Holland, David Laight,
	linux-riscv, linux-kernel, Paul Walmsley, Albert Ou

On Thu, Sep 14, 2023 at 07:02:29PM +0100, Conor Dooley wrote:
> > > > +#ifndef CONFIG_32BIT
> > > 
> > > These can also be moved to IS_ENABLED() FYI, since there's no 32-bit
> > > stuff here that'd break the build for 64-bit. Ditto elsewhere where
> > > you've got similar stuff.
> > > 
> > > Cheers,
> > > Conor.
> > This is an ifndef, so a 32-bit compilation would throw a warning about
> > shifting a 32-bit value by 32 bits if IS_ENABLED was used instead.
> 
>  Fair enough. I did accidentally invert things in my mail; I did notice
>  the n, I just thought the dead-code elimination happened before those
>  checks. Sorry for the noise.

It appears that LLVM is smart enough to not attempt to assemble code
that can never execute, but GCC is not as thorough. Pretty interesting:
it means the ".option arch" code can be encased in IS_ENABLED(), since
the directive is only unsupported on older LLVM and LLVM discards the
dead branch, but the shifting code needs to stay in an ifdef because GCC
will complain about it even in a dead branch.
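Concretely, this pattern is fine even for clang < 17 (a sketch, with the
asm body elided):

	/* When the toolchain lacks Zbb the config option is off, and
	 * LLVM discards this dead branch before the assembler ever
	 * sees the unsupported ".option arch" directive: */
	if (IS_ENABLED(CONFIG_RISCV_ISA_ZBB)) {
		asm(".option push\n"
		    ".option arch,+zbb\n"
		    /* ... */
		    ".option pop");
	}

whereas the 32-bit shift folding has to stay behind #ifndef
CONFIG_32BIT, as sketched earlier in the thread.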



end of thread  [newest: 2023-09-14 23:31 UTC]

Thread overview: 24+ messages
2023-09-11 22:57 [PATCH v4 0/5] riscv: Add fine-tuned checksum functions Charlie Jenkins
2023-09-11 22:57 ` [PATCH v4 1/5] riscv: Checksum header Charlie Jenkins
2023-09-12 10:24   ` Emil Renner Berthing
2023-09-13  2:38     ` Charlie Jenkins
2023-09-13  9:19       ` Emil Renner Berthing
2023-09-11 22:57 ` [PATCH v4 2/5] riscv: Add checksum library Charlie Jenkins
2023-09-12  8:45   ` David Laight
2023-09-13  3:09     ` Charlie Jenkins
2023-09-13  8:47       ` David Laight
2023-09-13 23:18         ` Charlie Jenkins
2023-09-14  0:41           ` Charlie Jenkins
2023-09-14 12:25   ` Conor Dooley
2023-09-14 17:58     ` Charlie Jenkins
2023-09-14 18:02       ` Conor Dooley
2023-09-14 23:30         ` Charlie Jenkins
2023-09-11 22:57 ` [PATCH v4 3/5] riscv: Vector checksum header Charlie Jenkins
2023-09-11 22:57 ` [PATCH v4 4/5] riscv: Vector checksum library Charlie Jenkins
2023-09-14 12:46   ` Conor Dooley
2023-09-14 16:14     ` Charlie Jenkins
2023-09-14 16:29       ` Conor Dooley
2023-09-14 17:29         ` Charlie Jenkins
2023-09-14 17:36           ` Conor Dooley
2023-09-14 20:59             ` Charlie Jenkins
2023-09-11 22:57 ` [PATCH v4 5/5] riscv: Test checksum functions Charlie Jenkins
