linux-kernel.vger.kernel.org archive mirror
* [PATCH v15 0/5] riscv: Add fine-tuned checksum functions
@ 2024-01-08 23:57 Charlie Jenkins
  2024-01-08 23:57 ` [PATCH v15 1/5] asm-generic: Improve csum_fold Charlie Jenkins
                   ` (5 more replies)
  0 siblings, 6 replies; 15+ messages in thread
From: Charlie Jenkins @ 2024-01-08 23:57 UTC (permalink / raw)
  To: Charlie Jenkins, Palmer Dabbelt, Conor Dooley, Samuel Holland,
	David Laight, Xiao Wang, Evan Green, Guo Ren, linux-riscv,
	linux-kernel, linux-arch
  Cc: Paul Walmsley, Albert Ou, Arnd Bergmann, David Laight, Conor Dooley

Each architecture generally implements fine-tuned checksum functions to
leverage its instruction set. This series adds the main checksum
functions that are used in networking. Tested on QEMU, this series
allows the CHECKSUM_KUNIT tests to complete an average of 50.9% faster.

This series makes heavy use of the Zbb extension via alternatives
patching.

To test this series, enable the KUNIT config, then CHECKSUM_KUNIT.

I have attempted to make these functions as optimal as possible, but I
have not run anything on actual riscv hardware. My performance testing
has been limited to inspecting the assembly, running the algorithms on
x86 hardware, and running in QEMU.

ip_fast_csum is a relatively small function, so even though it is
possible to read 64 bits at a time on compatible hardware, the setup
and cleanup code becomes the bottleneck; loading 32 bits at a time is
actually faster.
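The trade-off above can be sketched as a plain-C model of a
32-bit-at-a-time header checksum. This is an illustrative userspace
sketch with made-up names, not the kernel implementation (which
accumulates into an unsigned long and uses Zbb alternatives):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of ip_fast_csum-style folding: accumulate ihl
 * 32-bit header words into a 64-bit sum, then fold to 16 bits and
 * invert. The 64-bit accumulator means no carry check per addition.
 */
static uint16_t ip_fast_csum_model(const uint32_t *iph, unsigned int ihl)
{
	uint64_t sum = 0;
	unsigned int i;

	for (i = 0; i < ihl; i++)
		sum += iph[i];

	/* Fold 64 -> 32 -> 16 bits with end-around carry. */
	sum = (sum & 0xffffffff) + (sum >> 32);
	sum = (sum & 0xffffffff) + (sum >> 32);
	sum = (sum & 0xffff) + (sum >> 16);
	sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}
```

An all-zero header folds to 0xffff (the one's-complement of zero),
which is one quick sanity check for a model like this.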

Relies on https://lore.kernel.org/lkml/20230920193801.3035093-1-evan@rivosinc.com/

---
    
The algorithm proposed to replace the default csum_fold can be shown to
compute the same result by exhaustively checking all 2^32 possible
inputs:
    
#include <stdio.h>

static inline unsigned int ror32(unsigned int word, unsigned int shift)
{
	return (word >> (shift & 31)) | (word << ((-shift) & 31));
}

unsigned short csum_fold(unsigned int csum)
{
	unsigned int sum = csum;
	sum = (sum & 0xffff) + (sum >> 16);
	sum = (sum & 0xffff) + (sum >> 16);
	return ~sum;
}

unsigned short csum_fold_arc(unsigned int csum)
{
	return ((~csum - ror32(csum, 16)) >> 16);
}

int main()
{
	unsigned int start = 0x0;
	do {
		if (csum_fold(start) != csum_fold_arc(start)) {
			printf("Not the same %u\n", start);
			return -1;
		}
		start += 1;
	} while(start != 0x0);
	printf("The same\n");
	return 0;
}

Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Arnd Bergmann <arnd@arndb.de>
To: Charlie Jenkins <charlie@rivosinc.com>
To: Palmer Dabbelt <palmer@dabbelt.com>
To: Conor Dooley <conor@kernel.org>
To: Samuel Holland <samuel.holland@sifive.com>
To: David Laight <David.Laight@aculab.com>
To: Xiao Wang <xiao.w.wang@intel.com>
To: Evan Green <evan@rivosinc.com>
To: Guo Ren <guoren@kernel.org> 
To: linux-riscv@lists.infradead.org
To: linux-kernel@vger.kernel.org
To: linux-arch@vger.kernel.org
Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>

---
Changes in v15:
- Create modify_unaligned_access_branches to consolidate duplicate code
  (Evan)
- Link to v14: https://lore.kernel.org/r/20231227-optimize_checksum-v14-0-ddfd48016566@rivosinc.com

Changes in v14:
- Update misaligned static branch when CPUs are hotplugged (Guo)
- Leave off Evan's reviewed-by on patch 2 since it was completely
  re-written
- Link to v13: https://lore.kernel.org/r/20231220-optimize_checksum-v13-0-a73547e1cad8@rivosinc.com

Changes in v13:
- Move cast from patch 4 to patch 3
- Link to v12: https://lore.kernel.org/r/20231212-optimize_checksum-v12-0-419a4ba6d666@rivosinc.com

Changes in v12:
- Rebase onto 6.7-rc5
- Add performance stats in the cover letter
- Link to v11: https://lore.kernel.org/r/20231117-optimize_checksum-v11-0-7d9d954fe361@rivosinc.com

Changes in v11:
- Extensive modifications to comply to sparse
- Organize include statements (Xiao)
- Add csum_ipv6_magic to commit message (Xiao)
- Remove extraneous len statement (Xiao)
- Add kasan_check_read call (Xiao)
- Improve comment field checksum.h (Xiao)
- Consolidate "buff" and "len" into one parameter "end" (Xiao)
- Link to v10: https://lore.kernel.org/r/20231101-optimize_checksum-v10-0-a498577bb969@rivosinc.com

Changes in v10:
- Move tests that were riscv-specific to be arch agnostic (Arnd)
- Link to v9: https://lore.kernel.org/r/20231031-optimize_checksum-v9-0-ea018e69b229@rivosinc.com

Changes in v9:
- Use ror64 (Xiao)
- Move do_csum and csum_ipv6_magic headers to patch 4 (Xiao)
- Remove word "IP" from checksum headers (Xiao)
- Swap to using ifndef CONFIG_32BIT instead of ifdef CONFIG_64BIT (Xiao)
- Run no alignment code when buff is aligned (Xiao)
- Consolidate the overlap between the two do_csum implementations into
  do_csum_common
- Link to v8: https://lore.kernel.org/r/20231027-optimize_checksum-v8-0-feb7101d128d@rivosinc.com

Changes in v8:
- Speedups of 12% without Zbb and 21% with Zbb when cpu supports fast
  misaligned accesses for do_csum
- Various formatting updates
- Patch now relies on https://lore.kernel.org/lkml/20230920193801.3035093-1-evan@rivosinc.com/
- Link to v7: https://lore.kernel.org/r/20230919-optimize_checksum-v7-0-06c7d0ddd5d6@rivosinc.com

Changes in v7:
- Included linux/bitops.h in asm-generic/checksum.h to use ror (Conor)
- Optimized loop in do_csum (David)
- Used ror instead of shifting (David)
- Unfortunately had to reintroduce ifdefs because gcc is not smart
  enough to not throw warnings on code that will never execute
- Use ifdef instead of IS_ENABLED on __LITTLE_ENDIAN because IS_ENABLED
  does not work on that
- Only optimize for zbb when alternatives is enabled in do_csum
- Link to v6: https://lore.kernel.org/r/20230915-optimize_checksum-v6-0-14a6cf61c618@rivosinc.com

Changes in v6:
- Fix accuracy of commit message for csum_fold
- Fix indentation
- Link to v5: https://lore.kernel.org/r/20230914-optimize_checksum-v5-0-c95b82a2757e@rivosinc.com

Changes in v5:
- Drop vector patches
- Check ZBB enabled before doing any ZBB code (Conor)
- Check endianness in IS_ENABLED
- Revert to the simpler non-tree based version of ipv6_csum_magic since
  David pointed out that the tree based version is not better.
- Link to v4: https://lore.kernel.org/r/20230911-optimize_checksum-v4-0-77cc2ad9e9d7@rivosinc.com

Changes in v4:
- Suggestion by David Laight to use an improved checksum used in
  arch/arc.
- Eliminates zero-extension on rv32, but not on rv64.
- Reduces data dependency which should improve execution speed on
  rv32 and rv64
- Still passes CHECKSUM_KUNIT and RISCV_CHECKSUM_KUNIT on rv32 and
  rv64 with and without zbb.
- Link to v3: https://lore.kernel.org/r/20230907-optimize_checksum-v3-0-c502d34d9d73@rivosinc.com

Changes in v3:
- Use riscv_has_extension_likely and has_vector where possible (Conor)
- Reduce ifdefs by using IS_ENABLED where possible (Conor)
- Use kernel_vector_begin in the vector code (Samuel)
- Link to v2: https://lore.kernel.org/r/20230905-optimize_checksum-v2-0-ccd658db743b@rivosinc.com

Changes in v2:
- After more benchmarking, rework functions to improve performance.
- Remove tests that overlapped with the already existing checksum
  tests and make tests more extensive.
- Use alternatives to activate code with Zbb and vector extensions
- Link to v1: https://lore.kernel.org/r/20230826-optimize_checksum-v1-0-937501b4522a@rivosinc.com

---
Charlie Jenkins (5):
      asm-generic: Improve csum_fold
      riscv: Add static key for misaligned accesses
      riscv: Add checksum header
      riscv: Add checksum library
      kunit: Add tests for csum_ipv6_magic and ip_fast_csum

 arch/riscv/include/asm/checksum.h   |  93 ++++++++++
 arch/riscv/include/asm/cpufeature.h |   2 +
 arch/riscv/kernel/cpufeature.c      |  90 +++++++++-
 arch/riscv/lib/Makefile             |   1 +
 arch/riscv/lib/csum.c               | 326 ++++++++++++++++++++++++++++++++++++
 include/asm-generic/checksum.h      |   6 +-
 lib/checksum_kunit.c                | 284 ++++++++++++++++++++++++++++++-
 7 files changed, 795 insertions(+), 7 deletions(-)
---
base-commit: a39b6ac3781d46ba18193c9dbb2110f31e9bffe9
change-id: 20230804-optimize_checksum-db145288ac21
-- 
- Charlie


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v15 1/5] asm-generic: Improve csum_fold
  2024-01-08 23:57 [PATCH v15 0/5] riscv: Add fine-tuned checksum functions Charlie Jenkins
@ 2024-01-08 23:57 ` Charlie Jenkins
  2024-01-08 23:57 ` [PATCH v15 2/5] riscv: Add static key for misaligned accesses Charlie Jenkins
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 15+ messages in thread
From: Charlie Jenkins @ 2024-01-08 23:57 UTC (permalink / raw)
  To: Charlie Jenkins, Palmer Dabbelt, Conor Dooley, Samuel Holland,
	David Laight, Xiao Wang, Evan Green, Guo Ren, linux-riscv,
	linux-kernel, linux-arch
  Cc: Paul Walmsley, Albert Ou, Arnd Bergmann, David Laight

This csum_fold implementation, introduced into arch/arc by Vineet Gupta,
is better than the default implementation on at least arc, x86, and
riscv. Using GCC trunk to compile a non-inlined version with -O3, this
implementation uses 41.7% fewer instructions on riscv64 and 25% fewer on
x86-64, respectively. Most implementations override this default in asm,
but this should be more performant than all of those other
implementations except for arm, which has barrel shifting, and sparc32,
which has a carry flag.

Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
Reviewed-by: David Laight <david.laight@aculab.com>
---
 include/asm-generic/checksum.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/include/asm-generic/checksum.h b/include/asm-generic/checksum.h
index 43e18db89c14..ad928cce268b 100644
--- a/include/asm-generic/checksum.h
+++ b/include/asm-generic/checksum.h
@@ -2,6 +2,8 @@
 #ifndef __ASM_GENERIC_CHECKSUM_H
 #define __ASM_GENERIC_CHECKSUM_H
 
+#include <linux/bitops.h>
+
 /*
  * computes the checksum of a memory block at buff, length len,
  * and adds in "sum" (32-bit)
@@ -31,9 +33,7 @@ extern __sum16 ip_fast_csum(const void *iph, unsigned int ihl);
 static inline __sum16 csum_fold(__wsum csum)
 {
 	u32 sum = (__force u32)csum;
-	sum = (sum & 0xffff) + (sum >> 16);
-	sum = (sum & 0xffff) + (sum >> 16);
-	return (__force __sum16)~sum;
+	return (__force __sum16)((~sum - ror32(sum, 16)) >> 16);
 }
 #endif
 

-- 
2.43.0



* [PATCH v15 2/5] riscv: Add static key for misaligned accesses
  2024-01-08 23:57 [PATCH v15 0/5] riscv: Add fine-tuned checksum functions Charlie Jenkins
  2024-01-08 23:57 ` [PATCH v15 1/5] asm-generic: Improve csum_fold Charlie Jenkins
@ 2024-01-08 23:57 ` Charlie Jenkins
  2024-01-08 23:57 ` [PATCH v15 3/5] riscv: Add checksum header Charlie Jenkins
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 15+ messages in thread
From: Charlie Jenkins @ 2024-01-08 23:57 UTC (permalink / raw)
  To: Charlie Jenkins, Palmer Dabbelt, Conor Dooley, Samuel Holland,
	David Laight, Xiao Wang, Evan Green, Guo Ren, linux-riscv,
	linux-kernel, linux-arch
  Cc: Paul Walmsley, Albert Ou, Arnd Bergmann

Add a static branch that depends on the detected misaligned access
speed. This will be used by a later patch in the series. At any point in
time, the static branch will only be enabled if all online CPUs are
considered "fast".
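The "enabled only while every online CPU is fast" policy can be modeled
in userspace as a bitmap-weight comparison. This is a hedged sketch with
hypothetical names; the kernel itself uses cpumask_and(),
cpumask_weight(), and static_branch_enable_cpuslocked():

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Model CPU masks as 64-bit bitmaps; __builtin_popcountll stands in
 * for cpumask_weight(). The key should be enabled only when every
 * online CPU is also in the "fast misaligned access" mask.
 */
static bool key_should_be_enabled(uint64_t online_mask, uint64_t fast_mask)
{
	uint64_t fast_and_online = fast_mask & online_mask;

	return __builtin_popcountll(fast_and_online) ==
	       __builtin_popcountll(online_mask);
}
```

Note that CPUs in the fast mask but offline do not count against the
policy, mirroring how the patch intersects with cpu_online_mask.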

Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
Reviewed-by: Evan Green <evan@rivosinc.com>
---
 arch/riscv/include/asm/cpufeature.h |  2 +
 arch/riscv/kernel/cpufeature.c      | 90 +++++++++++++++++++++++++++++++++++--
 2 files changed, 89 insertions(+), 3 deletions(-)

diff --git a/arch/riscv/include/asm/cpufeature.h b/arch/riscv/include/asm/cpufeature.h
index a418c3112cd6..7b129e5e2f07 100644
--- a/arch/riscv/include/asm/cpufeature.h
+++ b/arch/riscv/include/asm/cpufeature.h
@@ -133,4 +133,6 @@ static __always_inline bool riscv_cpu_has_extension_unlikely(int cpu, const unsi
 	return __riscv_isa_extension_available(hart_isa[cpu].isa, ext);
 }
 
+DECLARE_STATIC_KEY_FALSE(fast_misaligned_access_speed_key);
+
 #endif
diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
index b3785ffc1570..b62baeb504d8 100644
--- a/arch/riscv/kernel/cpufeature.c
+++ b/arch/riscv/kernel/cpufeature.c
@@ -8,8 +8,10 @@
 
 #include <linux/acpi.h>
 #include <linux/bitmap.h>
+#include <linux/cpu.h>
 #include <linux/cpuhotplug.h>
 #include <linux/ctype.h>
+#include <linux/jump_label.h>
 #include <linux/log2.h>
 #include <linux/memory.h>
 #include <linux/module.h>
@@ -44,6 +46,8 @@ struct riscv_isainfo hart_isa[NR_CPUS];
 /* Performance information */
 DEFINE_PER_CPU(long, misaligned_access_speed);
 
+static cpumask_t fast_misaligned_access;
+
 /**
  * riscv_isa_extension_base() - Get base extension word
  *
@@ -643,6 +647,16 @@ static int check_unaligned_access(void *param)
 		(speed == RISCV_HWPROBE_MISALIGNED_FAST) ? "fast" : "slow");
 
 	per_cpu(misaligned_access_speed, cpu) = speed;
+
+	/*
+	 * Set the value of fast_misaligned_access of a CPU. These operations
+	 * are atomic to avoid race conditions.
+	 */
+	if (speed == RISCV_HWPROBE_MISALIGNED_FAST)
+		cpumask_set_cpu(cpu, &fast_misaligned_access);
+	else
+		cpumask_clear_cpu(cpu, &fast_misaligned_access);
+
 	return 0;
 }
 
@@ -655,13 +669,69 @@ static void check_unaligned_access_nonboot_cpu(void *param)
 		check_unaligned_access(pages[cpu]);
 }
 
+DEFINE_STATIC_KEY_FALSE(fast_misaligned_access_speed_key);
+
+static void modify_unaligned_access_branches(cpumask_t *mask, int weight)
+{
+	if (cpumask_weight(mask) == weight)
+		static_branch_enable_cpuslocked(&fast_misaligned_access_speed_key);
+	else
+		static_branch_disable_cpuslocked(&fast_misaligned_access_speed_key);
+}
+
+static void set_unaligned_access_static_branches_except_cpu(int cpu)
+{
+	/*
+	 * Same as set_unaligned_access_static_branches, except excludes the
+	 * given CPU from the result. When a CPU is hotplugged into an offline
+	 * state, this function is called before the CPU is set to offline in
+	 * the cpumask, and thus the CPU needs to be explicitly excluded.
+	 */
+
+	cpumask_t fast_except_me;
+
+	cpumask_and(&fast_except_me, &fast_misaligned_access, cpu_online_mask);
+	cpumask_clear_cpu(cpu, &fast_except_me);
+
+	modify_unaligned_access_branches(&fast_except_me, num_online_cpus() - 1);
+}
+
+static void set_unaligned_access_static_branches(void)
+{
+	/*
+	 * This will be called after check_unaligned_access_all_cpus so the
+	 * result of unaligned access speed for all CPUs will be available.
+	 *
+	 * To avoid the number of online cpus changing between reading
+	 * cpu_online_mask and calling num_online_cpus, cpus_read_lock must be
+	 * held before calling this function.
+	 */
+
+	cpumask_t fast_and_online;
+
+	cpumask_and(&fast_and_online, &fast_misaligned_access, cpu_online_mask);
+
+	modify_unaligned_access_branches(&fast_and_online, num_online_cpus());
+}
+
+static int lock_and_set_unaligned_access_static_branch(void)
+{
+	cpus_read_lock();
+	set_unaligned_access_static_branches();
+	cpus_read_unlock();
+
+	return 0;
+}
+
+arch_initcall_sync(lock_and_set_unaligned_access_static_branch);
+
 static int riscv_online_cpu(unsigned int cpu)
 {
 	static struct page *buf;
 
 	/* We are already set since the last check */
 	if (per_cpu(misaligned_access_speed, cpu) != RISCV_HWPROBE_MISALIGNED_UNKNOWN)
-		return 0;
+		goto exit;
 
 	buf = alloc_pages(GFP_KERNEL, MISALIGNED_BUFFER_ORDER);
 	if (!buf) {
@@ -671,6 +741,17 @@ static int riscv_online_cpu(unsigned int cpu)
 
 	check_unaligned_access(buf);
 	__free_pages(buf, MISALIGNED_BUFFER_ORDER);
+
+exit:
+	set_unaligned_access_static_branches();
+
+	return 0;
+}
+
+static int riscv_offline_cpu(unsigned int cpu)
+{
+	set_unaligned_access_static_branches_except_cpu(cpu);
+
 	return 0;
 }
 
@@ -705,9 +786,12 @@ static int check_unaligned_access_all_cpus(void)
 	/* Check core 0. */
 	smp_call_on_cpu(0, check_unaligned_access, bufs[0], true);
 
-	/* Setup hotplug callback for any new CPUs that come online. */
+	/*
+	 * Setup hotplug callbacks for any new CPUs that come online or go
+	 * offline.
+	 */
 	cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN, "riscv:online",
-				  riscv_online_cpu, NULL);
+				  riscv_online_cpu, riscv_offline_cpu);
 
 out:
 	unaligned_emulation_finish();

-- 
2.43.0



* [PATCH v15 3/5] riscv: Add checksum header
  2024-01-08 23:57 [PATCH v15 0/5] riscv: Add fine-tuned checksum functions Charlie Jenkins
  2024-01-08 23:57 ` [PATCH v15 1/5] asm-generic: Improve csum_fold Charlie Jenkins
  2024-01-08 23:57 ` [PATCH v15 2/5] riscv: Add static key for misaligned accesses Charlie Jenkins
@ 2024-01-08 23:57 ` Charlie Jenkins
  2024-01-08 23:57 ` [PATCH v15 4/5] riscv: Add checksum library Charlie Jenkins
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 15+ messages in thread
From: Charlie Jenkins @ 2024-01-08 23:57 UTC (permalink / raw)
  To: Charlie Jenkins, Palmer Dabbelt, Conor Dooley, Samuel Holland,
	David Laight, Xiao Wang, Evan Green, Guo Ren, linux-riscv,
	linux-kernel, linux-arch
  Cc: Paul Walmsley, Albert Ou, Arnd Bergmann, Conor Dooley

Provide checksum algorithms that have been designed to leverage riscv
instructions such as rotate. On 64-bit, the algorithms can take
advantage of the larger register width to avoid some overflow checking.
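The overflow point can be illustrated with a userspace sketch
(hypothetical names, not the kernel code): summing 32-bit words into a
64-bit accumulator needs no per-addition carry check, and a single
rotate-and-add folds the upper half back in with end-around carry:

```c
#include <stdint.h>

static uint64_t ror64(uint64_t w, unsigned int s)
{
	return (w >> (s & 63)) | (w << ((-s) & 63));
}

/*
 * Sum 32-bit words into a 64-bit register: the accumulator cannot
 * overflow for any realistic buffer length, so carries need no
 * per-iteration handling. ror64(sum, 32) swaps the halves, so adding
 * it back and shifting down yields high + low plus the end-around
 * carry in one step.
 */
static uint32_t sum32_words_64bit(const uint32_t *p, unsigned int n)
{
	uint64_t sum = 0;

	while (n--)
		sum += *p++;

	sum += ror64(sum, 32);
	return (uint32_t)(sum >> 32);
}
```

On rv32 the same rotate trick is unavailable for free, which is why the
32-bit paths in this series track carries explicitly instead.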

Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Xiao Wang <xiao.w.wang@intel.com>
---
 arch/riscv/include/asm/checksum.h | 82 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 82 insertions(+)

diff --git a/arch/riscv/include/asm/checksum.h b/arch/riscv/include/asm/checksum.h
new file mode 100644
index 000000000000..5a810126aac7
--- /dev/null
+++ b/arch/riscv/include/asm/checksum.h
@@ -0,0 +1,82 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Checksum routines
+ *
+ * Copyright (C) 2023 Rivos Inc.
+ */
+#ifndef __ASM_RISCV_CHECKSUM_H
+#define __ASM_RISCV_CHECKSUM_H
+
+#include <linux/in6.h>
+#include <linux/uaccess.h>
+
+#define ip_fast_csum ip_fast_csum
+
+/* Define riscv versions of functions before importing asm-generic/checksum.h */
+#include <asm-generic/checksum.h>
+
+/**
+ * Quickly compute an IP checksum with the assumption that IPv4 headers will
+ * always be in multiples of 32-bits, and have an ihl of at least 5.
+ *
+ * @ihl: the number of 32 bit segments and must be greater than or equal to 5.
+ * @iph: assumed to be word aligned given that NET_IP_ALIGN is set to 2 on
+ *  riscv, defining IP headers to be aligned.
+ */
+static inline __sum16 ip_fast_csum(const void *iph, unsigned int ihl)
+{
+	unsigned long csum = 0;
+	int pos = 0;
+
+	do {
+		csum += ((const unsigned int *)iph)[pos];
+		if (IS_ENABLED(CONFIG_32BIT))
+			csum += csum < ((const unsigned int *)iph)[pos];
+	} while (++pos < ihl);
+
+	/*
+	 * ZBB only saves three instructions on 32-bit and five on 64-bit so not
+	 * worth checking if supported without Alternatives.
+	 */
+	if (IS_ENABLED(CONFIG_RISCV_ISA_ZBB) &&
+	    IS_ENABLED(CONFIG_RISCV_ALTERNATIVE)) {
+		unsigned long fold_temp;
+
+		asm_volatile_goto(ALTERNATIVE("j %l[no_zbb]", "nop", 0,
+					      RISCV_ISA_EXT_ZBB, 1)
+		    :
+		    :
+		    :
+		    : no_zbb);
+
+		if (IS_ENABLED(CONFIG_32BIT)) {
+			asm(".option push				\n\
+			.option arch,+zbb				\n\
+				not	%[fold_temp], %[csum]		\n\
+				rori	%[csum], %[csum], 16		\n\
+				sub	%[csum], %[fold_temp], %[csum]	\n\
+			.option pop"
+			: [csum] "+r" (csum), [fold_temp] "=&r" (fold_temp));
+		} else {
+			asm(".option push				\n\
+			.option arch,+zbb				\n\
+				rori	%[fold_temp], %[csum], 32	\n\
+				add	%[csum], %[fold_temp], %[csum]	\n\
+				srli	%[csum], %[csum], 32		\n\
+				not	%[fold_temp], %[csum]		\n\
+				roriw	%[csum], %[csum], 16		\n\
+				subw	%[csum], %[fold_temp], %[csum]	\n\
+			.option pop"
+			: [csum] "+r" (csum), [fold_temp] "=&r" (fold_temp));
+		}
+		return (__force __sum16)(csum >> 16);
+	}
+no_zbb:
+#ifndef CONFIG_32BIT
+	csum += ror64(csum, 32);
+	csum >>= 32;
+#endif
+	return csum_fold((__force __wsum)csum);
+}
+
+#endif /* __ASM_RISCV_CHECKSUM_H */

-- 
2.43.0



* [PATCH v15 4/5] riscv: Add checksum library
  2024-01-08 23:57 [PATCH v15 0/5] riscv: Add fine-tuned checksum functions Charlie Jenkins
                   ` (2 preceding siblings ...)
  2024-01-08 23:57 ` [PATCH v15 3/5] riscv: Add checksum header Charlie Jenkins
@ 2024-01-08 23:57 ` Charlie Jenkins
  2024-01-08 23:57 ` [PATCH v15 5/5] kunit: Add tests for csum_ipv6_magic and ip_fast_csum Charlie Jenkins
  2024-01-20 21:09 ` [PATCH v15 0/5] riscv: Add fine-tuned checksum functions patchwork-bot+linux-riscv
  5 siblings, 0 replies; 15+ messages in thread
From: Charlie Jenkins @ 2024-01-08 23:57 UTC (permalink / raw)
  To: Charlie Jenkins, Palmer Dabbelt, Conor Dooley, Samuel Holland,
	David Laight, Xiao Wang, Evan Green, Guo Ren, linux-riscv,
	linux-kernel, linux-arch
  Cc: Paul Walmsley, Albert Ou, Arnd Bergmann, Conor Dooley

Provide 32-bit and 64-bit versions of do_csum. When compiled for 32-bit,
it loads from the buffer in groups of 32 bits; when compiled for 64-bit,
it loads in groups of 64 bits.

Additionally, provide a riscv-optimized implementation of
csum_ipv6_magic.
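The word-at-a-time accumulation in this patch follows a common
carry-tracking pattern, sketched here in portable C. This is
illustrative only; the kernel's do_csum_common additionally masks off
the over-read tail bytes and reads one word ahead of the loop test:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Add each native word to csum; an unsigned sum wrapping below the
 * addend signals a carry out, which is counted separately and folded
 * back in once at the end (including the carry from that final add).
 */
static uint64_t csum_words(const uint64_t *p, size_t n)
{
	uint64_t csum = 0, carry = 0;

	while (n--) {
		uint64_t data = *p++;

		csum += data;
		carry += csum < data;	/* unsigned wrap => carry out */
	}
	csum += carry;
	csum += csum < carry;		/* carry from adding the carries */
	return csum;
}
```

Deferring the carries keeps the loop body to an add and a compare,
which is the data-dependency reduction the cover letter alludes to.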

Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Xiao Wang <xiao.w.wang@intel.com>
---
 arch/riscv/include/asm/checksum.h |  11 ++
 arch/riscv/lib/Makefile           |   1 +
 arch/riscv/lib/csum.c             | 326 ++++++++++++++++++++++++++++++++++++++
 3 files changed, 338 insertions(+)

diff --git a/arch/riscv/include/asm/checksum.h b/arch/riscv/include/asm/checksum.h
index 5a810126aac7..a5b60b54b101 100644
--- a/arch/riscv/include/asm/checksum.h
+++ b/arch/riscv/include/asm/checksum.h
@@ -12,6 +12,17 @@
 
 #define ip_fast_csum ip_fast_csum
 
+extern unsigned int do_csum(const unsigned char *buff, int len);
+#define do_csum do_csum
+
+/* Default version is sufficient for 32 bit */
+#ifndef CONFIG_32BIT
+#define _HAVE_ARCH_IPV6_CSUM
+__sum16 csum_ipv6_magic(const struct in6_addr *saddr,
+			const struct in6_addr *daddr,
+			__u32 len, __u8 proto, __wsum sum);
+#endif
+
 /* Define riscv versions of functions before importing asm-generic/checksum.h */
 #include <asm-generic/checksum.h>
 
diff --git a/arch/riscv/lib/Makefile b/arch/riscv/lib/Makefile
index 26cb2502ecf8..2aa1a4ad361f 100644
--- a/arch/riscv/lib/Makefile
+++ b/arch/riscv/lib/Makefile
@@ -6,6 +6,7 @@ lib-y			+= memmove.o
 lib-y			+= strcmp.o
 lib-y			+= strlen.o
 lib-y			+= strncmp.o
+lib-y			+= csum.o
 lib-$(CONFIG_MMU)	+= uaccess.o
 lib-$(CONFIG_64BIT)	+= tishift.o
 lib-$(CONFIG_RISCV_ISA_ZICBOZ)	+= clear_page.o
diff --git a/arch/riscv/lib/csum.c b/arch/riscv/lib/csum.c
new file mode 100644
index 000000000000..06ce8e7250d9
--- /dev/null
+++ b/arch/riscv/lib/csum.c
@@ -0,0 +1,326 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Checksum library
+ *
+ * Influenced by arch/arm64/lib/csum.c
+ * Copyright (C) 2023 Rivos Inc.
+ */
+#include <linux/bitops.h>
+#include <linux/compiler.h>
+#include <linux/jump_label.h>
+#include <linux/kasan-checks.h>
+#include <linux/kernel.h>
+
+#include <asm/cpufeature.h>
+
+#include <net/checksum.h>
+
+/* Default version is sufficient for 32 bit */
+#ifndef CONFIG_32BIT
+__sum16 csum_ipv6_magic(const struct in6_addr *saddr,
+			const struct in6_addr *daddr,
+			__u32 len, __u8 proto, __wsum csum)
+{
+	unsigned int ulen, uproto;
+	unsigned long sum = (__force unsigned long)csum;
+
+	sum += (__force unsigned long)saddr->s6_addr32[0];
+	sum += (__force unsigned long)saddr->s6_addr32[1];
+	sum += (__force unsigned long)saddr->s6_addr32[2];
+	sum += (__force unsigned long)saddr->s6_addr32[3];
+
+	sum += (__force unsigned long)daddr->s6_addr32[0];
+	sum += (__force unsigned long)daddr->s6_addr32[1];
+	sum += (__force unsigned long)daddr->s6_addr32[2];
+	sum += (__force unsigned long)daddr->s6_addr32[3];
+
+	ulen = (__force unsigned int)htonl((unsigned int)len);
+	sum += ulen;
+
+	uproto = (__force unsigned int)htonl(proto);
+	sum += uproto;
+
+	/*
+	 * Zbb support saves 4 instructions, so not worth checking without
+	 * alternatives if supported
+	 */
+	if (IS_ENABLED(CONFIG_RISCV_ISA_ZBB) &&
+	    IS_ENABLED(CONFIG_RISCV_ALTERNATIVE)) {
+		unsigned long fold_temp;
+
+		/*
+		 * Zbb is likely available when the kernel is compiled with Zbb
+		 * support, so nop when Zbb is available and jump when Zbb is
+		 * not available.
+		 */
+		asm_volatile_goto(ALTERNATIVE("j %l[no_zbb]", "nop", 0,
+					      RISCV_ISA_EXT_ZBB, 1)
+				  :
+				  :
+				  :
+				  : no_zbb);
+		asm(".option push					\n\
+		.option arch,+zbb					\n\
+			rori	%[fold_temp], %[sum], 32		\n\
+			add	%[sum], %[fold_temp], %[sum]		\n\
+			srli	%[sum], %[sum], 32			\n\
+			not	%[fold_temp], %[sum]			\n\
+			roriw	%[sum], %[sum], 16			\n\
+			subw	%[sum], %[fold_temp], %[sum]		\n\
+		.option pop"
+		: [sum] "+r" (sum), [fold_temp] "=&r" (fold_temp));
+		return (__force __sum16)(sum >> 16);
+	}
+no_zbb:
+	sum += ror64(sum, 32);
+	sum >>= 32;
+	return csum_fold((__force __wsum)sum);
+}
+EXPORT_SYMBOL(csum_ipv6_magic);
+#endif /* !CONFIG_32BIT */
+
+#ifdef CONFIG_32BIT
+#define OFFSET_MASK 3
+#elif CONFIG_64BIT
+#define OFFSET_MASK 7
+#endif
+
+static inline __no_sanitize_address unsigned long
+do_csum_common(const unsigned long *ptr, const unsigned long *end,
+	       unsigned long data)
+{
+	unsigned int shift;
+	unsigned long csum = 0, carry = 0;
+
+	/*
+	 * Do 32-bit reads on RV32 and 64-bit reads otherwise. This should be
+	 * faster than doing 32-bit reads on architectures that support larger
+	 * reads.
+	 */
+	while (ptr < end) {
+		csum += data;
+		carry += csum < data;
+		data = *(ptr++);
+	}
+
+	/*
+	 * Perform alignment (and over-read) bytes on the tail if any bytes
+	 * leftover.
+	 */
+	shift = ((long)ptr - (long)end) * 8;
+#ifdef __LITTLE_ENDIAN
+	data = (data << shift) >> shift;
+#else
+	data = (data >> shift) << shift;
+#endif
+	csum += data;
+	carry += csum < data;
+	csum += carry;
+	csum += csum < carry;
+
+	return csum;
+}
+
+/*
+ * Algorithm accounts for buff being misaligned.
+ * If buff is not aligned, will over-read bytes but not use the bytes that it
+ * shouldn't. The same thing will occur on the tail-end of the read.
+ */
+static inline __no_sanitize_address unsigned int
+do_csum_with_alignment(const unsigned char *buff, int len)
+{
+	unsigned int offset, shift;
+	unsigned long csum, data;
+	const unsigned long *ptr, *end;
+
+	/*
+	 * Align address to closest word (double word on rv64) that comes before
+	 * buff. This should always be in the same page and cache line.
+	 * Directly call KASAN with the alignment we will be using.
+	 */
+	offset = (unsigned long)buff & OFFSET_MASK;
+	kasan_check_read(buff, len);
+	ptr = (const unsigned long *)(buff - offset);
+
+	/*
+	 * Clear the most significant bytes that were over-read if buff was not
+	 * aligned.
+	 */
+	shift = offset * 8;
+	data = *(ptr++);
+#ifdef __LITTLE_ENDIAN
+	data = (data >> shift) << shift;
+#else
+	data = (data << shift) >> shift;
+#endif
+	end = (const unsigned long *)(buff + len);
+	csum = do_csum_common(ptr, end, data);
+
+	/*
+	 * Zbb support saves 6 instructions, so not worth checking without
+	 * alternatives if supported
+	 */
+	if (IS_ENABLED(CONFIG_RISCV_ISA_ZBB) &&
+	    IS_ENABLED(CONFIG_RISCV_ALTERNATIVE)) {
+		unsigned long fold_temp;
+
+		/*
+		 * Zbb is likely available when the kernel is compiled with Zbb
+		 * support, so nop when Zbb is available and jump when Zbb is
+		 * not available.
+		 */
+		asm_volatile_goto(ALTERNATIVE("j %l[no_zbb]", "nop", 0,
+					      RISCV_ISA_EXT_ZBB, 1)
+				  :
+				  :
+				  :
+				  : no_zbb);
+
+#ifdef CONFIG_32BIT
+		asm_volatile_goto(".option push			\n\
+		.option arch,+zbb				\n\
+			rori	%[fold_temp], %[csum], 16	\n\
+			andi	%[offset], %[offset], 1		\n\
+			add	%[csum], %[fold_temp], %[csum]	\n\
+			beq	%[offset], zero, %l[end]	\n\
+			rev8	%[csum], %[csum]		\n\
+		.option pop"
+			: [csum] "+r" (csum), [fold_temp] "=&r" (fold_temp)
+			: [offset] "r" (offset)
+			:
+			: end);
+
+		return (unsigned short)csum;
+#else /* !CONFIG_32BIT */
+		asm_volatile_goto(".option push			\n\
+		.option arch,+zbb				\n\
+			rori	%[fold_temp], %[csum], 32	\n\
+			add	%[csum], %[fold_temp], %[csum]	\n\
+			srli	%[csum], %[csum], 32		\n\
+			roriw	%[fold_temp], %[csum], 16	\n\
+			addw	%[csum], %[fold_temp], %[csum]	\n\
+			andi	%[offset], %[offset], 1		\n\
+			beq	%[offset], zero, %l[end]	\n\
+			rev8	%[csum], %[csum]		\n\
+		.option pop"
+			: [csum] "+r" (csum), [fold_temp] "=&r" (fold_temp)
+			: [offset] "r" (offset)
+			:
+			: end);
+
+		return (csum << 16) >> 48;
+#endif /* !CONFIG_32BIT */
+end:
+		return csum >> 16;
+	}
+no_zbb:
+#ifndef CONFIG_32BIT
+	csum += ror64(csum, 32);
+	csum >>= 32;
+#endif
+	csum = (u32)csum + ror32((u32)csum, 16);
+	if (offset & 1)
+		return (u16)swab32(csum);
+	return csum >> 16;
+}
+
+/*
+ * Does not perform alignment, should only be used if machine has fast
+ * misaligned accesses, or when buff is known to be aligned.
+ */
+static inline __no_sanitize_address unsigned int
+do_csum_no_alignment(const unsigned char *buff, int len)
+{
+	unsigned long csum, data;
+	const unsigned long *ptr, *end;
+
+	ptr = (const unsigned long *)(buff);
+	data = *(ptr++);
+
+	kasan_check_read(buff, len);
+
+	end = (const unsigned long *)(buff + len);
+	csum = do_csum_common(ptr, end, data);
+
+	/*
+	 * Zbb support saves 6 instructions, so not worth checking without
+	 * alternatives if supported
+	 */
+	if (IS_ENABLED(CONFIG_RISCV_ISA_ZBB) &&
+	    IS_ENABLED(CONFIG_RISCV_ALTERNATIVE)) {
+		unsigned long fold_temp;
+
+		/*
+		 * Zbb is likely available when the kernel is compiled with Zbb
+		 * support, so nop when Zbb is available and jump when Zbb is
+		 * not available.
+		 */
+		asm_volatile_goto(ALTERNATIVE("j %l[no_zbb]", "nop", 0,
+					      RISCV_ISA_EXT_ZBB, 1)
+				  :
+				  :
+				  :
+				  : no_zbb);
+
+#ifdef CONFIG_32BIT
+		asm (".option push				\n\
+		.option arch,+zbb				\n\
+			rori	%[fold_temp], %[csum], 16	\n\
+			add	%[csum], %[fold_temp], %[csum]	\n\
+		.option pop"
+			: [csum] "+r" (csum), [fold_temp] "=&r" (fold_temp)
+			:
+			: );
+
+#else /* !CONFIG_32BIT */
+		asm (".option push				\n\
+		.option arch,+zbb				\n\
+			rori	%[fold_temp], %[csum], 32	\n\
+			add	%[csum], %[fold_temp], %[csum]	\n\
+			srli	%[csum], %[csum], 32		\n\
+			roriw	%[fold_temp], %[csum], 16	\n\
+			addw	%[csum], %[fold_temp], %[csum]	\n\
+		.option pop"
+			: [csum] "+r" (csum), [fold_temp] "=&r" (fold_temp)
+			:
+			: );
+#endif /* !CONFIG_32BIT */
+		return csum >> 16;
+	}
+no_zbb:
+#ifndef CONFIG_32BIT
+	csum += ror64(csum, 32);
+	csum >>= 32;
+#endif
+	csum = (u32)csum + ror32((u32)csum, 16);
+	return csum >> 16;
+}
+
+/*
+ * Perform a checksum on an arbitrary memory address.
+ * Will do a light-weight address alignment if buff is misaligned, unless
+ * cpu supports fast misaligned accesses.
+ */
+unsigned int do_csum(const unsigned char *buff, int len)
+{
+	if (unlikely(len <= 0))
+		return 0;
+
+	/*
+	 * Significant performance gains can be seen by not doing alignment
+	 * on machines with fast misaligned accesses.
+	 *
+	 * There is some duplicate code between the "with_alignment" and
+	 * "no_alignment" implementations, but the overlap is too awkward to be
+	 * able to fit in one function without introducing multiple static
+	 * branches. The largest chunk of overlap was delegated into the
+	 * do_csum_common function.
+	 */
+	if (static_branch_likely(&fast_misaligned_access_speed_key))
+		return do_csum_no_alignment(buff, len);
+
+	if (((unsigned long)buff & OFFSET_MASK) == 0)
+		return do_csum_no_alignment(buff, len);
+
+	return do_csum_with_alignment(buff, len);
+}

-- 
2.43.0



* [PATCH v15 5/5] kunit: Add tests for csum_ipv6_magic and ip_fast_csum
  2024-01-08 23:57 [PATCH v15 0/5] riscv: Add fine-tuned checksum functions Charlie Jenkins
                   ` (3 preceding siblings ...)
  2024-01-08 23:57 ` [PATCH v15 4/5] riscv: Add checksum library Charlie Jenkins
@ 2024-01-08 23:57 ` Charlie Jenkins
  2024-01-22 16:40   ` Guenter Roeck
  2024-01-20 21:09 ` [PATCH v15 0/5] riscv: Add fine-tuned checksum functions patchwork-bot+linux-riscv
  5 siblings, 1 reply; 15+ messages in thread
From: Charlie Jenkins @ 2024-01-08 23:57 UTC (permalink / raw)
  To: Charlie Jenkins, Palmer Dabbelt, Conor Dooley, Samuel Holland,
	David Laight, Xiao Wang, Evan Green, Guo Ren, linux-riscv,
	linux-kernel, linux-arch
  Cc: Paul Walmsley, Albert Ou, Arnd Bergmann

Supplement existing checksum tests with tests for csum_ipv6_magic and
ip_fast_csum.

Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
---
 lib/checksum_kunit.c | 284 ++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 283 insertions(+), 1 deletion(-)

diff --git a/lib/checksum_kunit.c b/lib/checksum_kunit.c
index 0eed92b77ba3..af3e5ca4e170 100644
--- a/lib/checksum_kunit.c
+++ b/lib/checksum_kunit.c
@@ -1,15 +1,21 @@
 // SPDX-License-Identifier: GPL-2.0+
 /*
- * Test cases csum_partial and csum_fold
+ * Test cases csum_partial, csum_fold, ip_fast_csum, csum_ipv6_magic
  */
 
 #include <kunit/test.h>
 #include <asm/checksum.h>
+#include <net/ip6_checksum.h>
 
 #define MAX_LEN 512
 #define MAX_ALIGN 64
 #define TEST_BUFLEN (MAX_LEN + MAX_ALIGN)
 
+#define IPv4_MIN_WORDS 5
+#define IPv4_MAX_WORDS 15
+#define NUM_IPv6_TESTS 200
+#define NUM_IP_FAST_CSUM_TESTS 181
+
 /* Values for a little endian CPU. Byte swap each half on big endian CPU. */
 static const u32 random_init_sum = 0x2847aab;
 static const u8 random_buf[] = {
@@ -209,6 +215,237 @@ static const u32 init_sums_no_overflow[] = {
 	0xffff0000, 0xfffffffb,
 };
 
+static const __sum16 expected_csum_ipv6_magic[] = {
+	0x18d4, 0x3085, 0x2e4b, 0xd9f4, 0xbdc8, 0x78f,	0x1034, 0x8422, 0x6fc0,
+	0xd2f6, 0xbeb5, 0x9d3,	0x7e2a, 0x312e, 0x778e, 0xc1bb, 0x7cf2, 0x9d1e,
+	0xca21, 0xf3ff, 0x7569, 0xb02e, 0xca86, 0x7e76, 0x4539, 0x45e3, 0xf28d,
+	0xdf81, 0x8fd5, 0x3b5d, 0x8324, 0xf471, 0x83be, 0x1daf, 0x8c46, 0xe682,
+	0xd1fb, 0x6b2e, 0xe687, 0x2a33, 0x4833, 0x2d67, 0x660f, 0x2e79, 0xd65e,
+	0x6b62, 0x6672, 0x5dbd, 0x8680, 0xbaa5, 0x2229, 0x2125, 0x2d01, 0x1cc0,
+	0x6d36, 0x33c0, 0xee36, 0xd832, 0x9820, 0x8a31, 0x53c5, 0x2e2,	0xdb0e,
+	0x49ed, 0x17a7, 0x77a0, 0xd72e, 0x3d72, 0x7dc8, 0x5b17, 0xf55d, 0xa4d9,
+	0x1446, 0x5d56, 0x6b2e, 0x69a5, 0xadb6, 0xff2a, 0x92e,	0xe044, 0x3402,
+	0xbb60, 0xec7f, 0xe7e6, 0x1986, 0x32f4, 0x8f8,	0x5e00, 0x47c6, 0x3059,
+	0x3969, 0xe957, 0x4388, 0x2854, 0x3334, 0xea71, 0xa6de, 0x33f9, 0x83fc,
+	0x37b4, 0x5531, 0x3404, 0x1010, 0xed30, 0x610a, 0xc95,	0x9aed, 0x6ff,
+	0x5136, 0x2741, 0x660e, 0x8b80, 0xf71,	0xa263, 0x88af, 0x7a73, 0x3c37,
+	0x1908, 0x6db5, 0x2e92, 0x1cd2, 0x70c8, 0xee16, 0xe80,	0xcd55, 0x6e6,
+	0x6434, 0x127,	0x655d, 0x2ea0, 0xb4f4, 0xdc20, 0x5671, 0xe462, 0xe52b,
+	0xdb44, 0x3589, 0xc48f, 0xe60b, 0xd2d2, 0x66ad, 0x498,	0x436,	0xb917,
+	0xf0ca, 0x1a6e, 0x1cb7, 0xbf61, 0x2870, 0xc7e8, 0x5b30, 0xe4a5, 0x168,
+	0xadfc, 0xd035, 0xe690, 0xe283, 0xfb27, 0xe4ad, 0xb1a5, 0xf2d5, 0xc4b6,
+	0x8a30, 0xd7d5, 0x7df9, 0x91d5, 0x63ed, 0x2d21, 0x312b, 0xab19, 0xa632,
+	0x8d2e, 0xef06, 0x57b9, 0xc373, 0xbd1f, 0xa41f, 0x8444, 0x9975, 0x90cb,
+	0xc49c, 0xe965, 0x4eff, 0x5a,	0xef6d, 0xe81a, 0xe260, 0x853a, 0xff7a,
+	0x99aa, 0xb06b, 0xee19, 0xcc2c, 0xf34c, 0x7c49, 0xdac3, 0xa71e, 0xc988,
+	0x3845, 0x1014
+};
+
+static const __sum16 expected_fast_csum[] = {
+	0xda83, 0x45da, 0x4f46, 0x4e4f, 0x34e,	0xe902, 0xa5e9, 0x87a5, 0x7187,
+	0x5671, 0xf556, 0x6df5, 0x816d, 0x8f81, 0xbb8f, 0xfbba, 0x5afb, 0xbe5a,
+	0xedbe, 0xabee, 0x6aac, 0xe6b,	0xea0d, 0x67ea, 0x7e68, 0x8a7e, 0x6f8a,
+	0x3a70, 0x9f3a, 0xe89e, 0x75e8, 0x7976, 0xfa79, 0x2cfa, 0x3c2c, 0x463c,
+	0x7146, 0x7a71, 0x547a, 0xfd53, 0x99fc, 0xb699, 0x92b6, 0xdb91, 0xe8da,
+	0x5fe9, 0x1e60, 0xae1d, 0x39ae, 0xf439, 0xa1f4, 0xdda1, 0xede,	0x790f,
+	0x579,	0x1206, 0x9012, 0x2490, 0xd224, 0x5cd2, 0xa65d, 0xca7,	0x220d,
+	0xf922, 0xbf9,	0x920b, 0x1b92, 0x361c, 0x2e36, 0x4d2e, 0x24d,	0x2,
+	0xcfff, 0x90cf, 0xa591, 0x93a5, 0x7993, 0x9579, 0xc894, 0x50c8, 0x5f50,
+	0xd55e, 0xcad5, 0xf3c9, 0x8f4,	0x4409, 0x5043, 0x5b50, 0x55b,	0x2205,
+	0x1e22, 0x801e, 0x3780, 0xe137, 0x7ee0, 0xf67d, 0x3cf6, 0xa53c, 0x2ea5,
+	0x472e, 0x5147, 0xcf51, 0x1bcf, 0x951c, 0x1e95, 0xc71e, 0xe4c7, 0xc3e4,
+	0x3dc3, 0xee3d, 0xa4ed, 0xf9a4, 0xcbf8, 0x75cb, 0xb375, 0x50b4, 0x3551,
+	0xf835, 0x19f8, 0x8c1a, 0x538c, 0xad52, 0xa3ac, 0xb0a3, 0x5cb0, 0x6c5c,
+	0x5b6c, 0xc05a, 0x92c0, 0x4792, 0xbe47, 0x53be, 0x1554, 0x5715, 0x4b57,
+	0xe54a, 0x20e5, 0x21,	0xd500, 0xa1d4, 0xa8a1, 0x57a9, 0xca57, 0x5ca,
+	0x1c06, 0x4f1c, 0xe24e, 0xd9e2, 0xf0d9, 0x4af1, 0x474b, 0x8146, 0xe81,
+	0xfd0e, 0x84fd, 0x7c85, 0xba7c, 0x17ba, 0x4a17, 0x964a, 0xf595, 0xff5,
+	0x5310, 0x3253, 0x6432, 0x4263, 0x2242, 0xe121, 0x32e1, 0xf632, 0xc5f5,
+	0x21c6, 0x7d22, 0x8e7c, 0x418e, 0x5641, 0x3156, 0x7c31, 0x737c, 0x373,
+	0x2503, 0xc22a, 0x3c2,	0x4a04, 0x8549, 0x5285, 0xa352, 0xe8a3, 0x6fe8,
+	0x1a6f, 0x211a, 0xe021, 0x38e0, 0x7638, 0xf575, 0x9df5, 0x169e, 0xf116,
+	0x23f1, 0xcd23, 0xece,	0x660f, 0x4866, 0x6a48, 0x716a, 0xee71, 0xa2ee,
+	0xb8a2, 0x61b9, 0xa361, 0xf7a2, 0x26f7, 0x1127, 0x6611, 0xe065, 0x36e0,
+	0x1837, 0x3018, 0x1c30, 0x721b, 0x3e71, 0xe43d, 0x99e4, 0x9e9a, 0xb79d,
+	0xa9b7, 0xcaa,	0xeb0c, 0x4eb,	0x1305, 0x8813, 0xb687, 0xa9b6, 0xfba9,
+	0xd7fb, 0xccd8, 0x2ecd, 0x652f, 0xae65, 0x3fae, 0x3a40, 0x563a, 0x7556,
+	0x2776, 0x1228, 0xef12, 0xf9ee, 0xcef9, 0x56cf, 0xa956, 0x24a9, 0xba24,
+	0x5fba, 0x665f, 0xf465, 0x8ff4, 0x6d8f, 0x346d, 0x5f34, 0x385f, 0xd137,
+	0xb8d0, 0xacb8, 0x55ac, 0x7455, 0xe874, 0x89e8, 0xd189, 0xa0d1, 0xb2a0,
+	0xb8b2, 0x36b8, 0x5636, 0xd355, 0x8d3,	0x1908, 0x2118, 0xc21,	0x990c,
+	0x8b99, 0x158c, 0x7815, 0x9e78, 0x6f9e, 0x4470, 0x1d44, 0x341d, 0x2634,
+	0x3f26, 0x793e, 0xc79,	0xcc0b, 0x26cc, 0xd126, 0x1fd1, 0xb41f, 0xb6b4,
+	0x22b7, 0xa122, 0xa1,	0x7f01, 0x837e, 0x3b83, 0xaf3b, 0x6fae, 0x916f,
+	0xb490, 0xffb3, 0xceff, 0x50cf, 0x7550, 0x7275, 0x1272, 0x2613, 0xaa26,
+	0xd5aa, 0x7d5,	0x9607, 0x96,	0xb100, 0xf8b0, 0x4bf8, 0xdd4c, 0xeddd,
+	0x98ed, 0x2599, 0x9325, 0xeb92, 0x8feb, 0xcc8f, 0x2acd, 0x392b, 0x3b39,
+	0xcb3b, 0x6acb, 0xd46a, 0xb8d4, 0x6ab8, 0x106a, 0x2f10, 0x892f, 0x789,
+	0xc806, 0x45c8, 0x7445, 0x3c74, 0x3a3c, 0xcf39, 0xd7ce, 0x58d8, 0x6e58,
+	0x336e, 0x1034, 0xee10, 0xe9ed, 0xc2e9, 0x3fc2, 0xd53e, 0xd2d4, 0xead2,
+	0x8fea, 0x2190, 0x1162, 0xbe11, 0x8cbe, 0x6d8c, 0xfb6c, 0x6dfb, 0xd36e,
+	0x3ad3, 0xf3a,	0x870e, 0xc287, 0x53c3, 0xc54,	0x5b0c, 0x7d5a, 0x797d,
+	0xec79, 0x5dec, 0x4d5e, 0x184e, 0xd618, 0x60d6, 0xb360, 0x98b3, 0xf298,
+	0xb1f2, 0x69b1, 0xf969, 0xef9,	0xab0e, 0x21ab, 0xe321, 0x24e3, 0x8224,
+	0x5481, 0x5954, 0x7a59, 0xff7a, 0x7dff, 0x1a7d, 0xa51a, 0x46a5, 0x6b47,
+	0xe6b,	0x830e, 0xa083, 0xff9f, 0xd0ff, 0xffd0, 0xe6ff, 0x7de7, 0xc67d,
+	0xd0c6, 0x61d1, 0x3a62, 0xc3b,	0x150c, 0x1715, 0x4517, 0x5345, 0x3954,
+	0xdd39, 0xdadd, 0x32db, 0x6a33, 0xd169, 0x86d1, 0xb687, 0x3fb6, 0x883f,
+	0xa487, 0x39a4, 0x2139, 0xbe20, 0xffbe, 0xedfe, 0x8ded, 0x368e, 0xc335,
+	0x51c3, 0x9851, 0xf297, 0xd6f2, 0xb9d6, 0x95ba, 0x2096, 0xea1f, 0x76e9,
+	0x4e76, 0xe04d, 0xd0df, 0x80d0, 0xa280, 0xfca2, 0x75fc, 0xef75, 0x32ef,
+	0x6833, 0xdf68, 0xc4df, 0x76c4, 0xb77,	0xb10a, 0xbfb1, 0x58bf, 0x5258,
+	0x4d52, 0x6c4d, 0x7e6c, 0xb67e, 0xccb5, 0x8ccc, 0xbe8c, 0xc8bd, 0x9ac8,
+	0xa99b, 0x52a9, 0x2f53, 0xc30,	0x3e0c, 0xb83d, 0x83b7, 0x5383, 0x7e53,
+	0x4f7e, 0xe24e, 0xb3e1, 0x8db3, 0x618e, 0xc861, 0xfcc8, 0x34fc, 0x9b35,
+	0xaa9b, 0xb1aa, 0x5eb1, 0x395e, 0x8639, 0xd486, 0x8bd4, 0x558b, 0x2156,
+	0xf721, 0x4ef6, 0x14f,	0x7301, 0xdd72, 0x49de, 0x894a, 0x9889, 0x8898,
+	0x7788, 0x7b77, 0x637b, 0xb963, 0xabb9, 0x7cab, 0xc87b, 0x21c8, 0xcb21,
+	0xdfca, 0xbfdf, 0xf2bf, 0x6af2, 0x626b, 0xb261, 0x3cb2, 0xc63c, 0xc9c6,
+	0xc9c9, 0xb4c9, 0xf9b4, 0x91f9, 0x4091, 0x3a40, 0xcc39, 0xd1cb, 0x7ed1,
+	0x537f, 0x6753, 0xa167, 0xba49, 0x88ba, 0x7789, 0x3877, 0xf037, 0xd3ef,
+	0xb5d4, 0x55b6, 0xa555, 0xeca4, 0xa1ec, 0xb6a2, 0x7b7,	0x9507, 0xfd94,
+	0x82fd, 0x5c83, 0x765c, 0x9676, 0x3f97, 0xda3f, 0x6fda, 0x646f, 0x3064,
+	0x5e30, 0x655e, 0x6465, 0xcb64, 0xcdca, 0x4ccd, 0x3f4c, 0x243f, 0x6f24,
+	0x656f, 0x6065, 0x3560, 0x3b36, 0xac3b, 0x4aac, 0x714a, 0x7e71, 0xda7e,
+	0x7fda, 0xda7f, 0x6fda, 0xff6f, 0xc6ff, 0xedc6, 0xd4ed, 0x70d5, 0xeb70,
+	0xa3eb, 0x80a3, 0xca80, 0x3fcb, 0x2540, 0xf825, 0x7ef8, 0xf87e, 0x73f8,
+	0xb474, 0xb4b4, 0x92b5, 0x9293, 0x93,	0x3500, 0x7134, 0x9071, 0xfa8f,
+	0x51fa, 0x1452, 0xba13, 0x7ab9, 0x957a, 0x8a95, 0x6e8a, 0x6d6e, 0x7c6d,
+	0x447c, 0x9744, 0x4597, 0x8945, 0xef88, 0x8fee, 0x3190, 0x4831, 0x8447,
+	0xa183, 0x1da1, 0xd41d, 0x2dd4, 0x4f2e, 0xc94e, 0xcbc9, 0xc9cb, 0x9ec9,
+	0x319e, 0xd531, 0x20d5, 0x4021, 0xb23f, 0x29b2, 0xd828, 0xecd8, 0x5ded,
+	0xfc5d, 0x4dfc, 0xd24d, 0x6bd2, 0x5f6b, 0xb35e, 0x7fb3, 0xee7e, 0x56ee,
+	0xa657, 0x68a6, 0x8768, 0x7787, 0xb077, 0x4cb1, 0x764c, 0xb175, 0x7b1,
+	0x3d07, 0x603d, 0x3560, 0x3e35, 0xb03d, 0xd6b0, 0xc8d6, 0xd8c8, 0x8bd8,
+	0x3e8c, 0x303f, 0xd530, 0xf1d4, 0x42f1, 0xca42, 0xddca, 0x41dd, 0x3141,
+	0x132,	0xe901, 0x8e9,	0xbe09, 0xe0bd, 0x2ce0, 0x862d, 0x3986, 0x9139,
+	0x6d91, 0x6a6d, 0x8d6a, 0x1b8d, 0xac1b, 0xedab, 0x54ed, 0xc054, 0xcebf,
+	0xc1ce, 0x5c2,	0x3805, 0x6038, 0x5960, 0xd359, 0xdd3,	0xbe0d, 0xafbd,
+	0x6daf, 0x206d, 0x2c20, 0x862c, 0x8e86, 0xec8d, 0xa2ec, 0xa3a2, 0x51a3,
+	0x8051, 0xfd7f, 0x91fd, 0xa292, 0xaf14, 0xeeae, 0x59ef, 0x535a, 0x8653,
+	0x3986, 0x9539, 0xb895, 0xa0b8, 0x26a0, 0x2227, 0xc022, 0x77c0, 0xad77,
+	0x46ad, 0xaa46, 0x60aa, 0x8560, 0x4785, 0xd747, 0x45d7, 0x2346, 0x5f23,
+	0x25f,	0x1d02, 0x71d,	0x8206, 0xc82,	0x180c, 0x3018, 0x4b30, 0x4b,
+	0x3001, 0x1230, 0x2d12, 0x8c2d, 0x148d, 0x4015, 0x5f3f, 0x3d5f, 0x6b3d,
+	0x396b, 0x473a, 0xf746, 0x44f7, 0x8945, 0x3489, 0xcb34, 0x84ca, 0xd984,
+	0xf0d9, 0xbcf0, 0x63bd, 0x3264, 0xf332, 0x45f3, 0x7346, 0x5673, 0xb056,
+	0xd3b0, 0x4ad4, 0x184b, 0x7d18, 0x6c7d, 0xbb6c, 0xfeba, 0xe0fe, 0x10e1,
+	0x5410, 0x2954, 0x9f28, 0x3a9f, 0x5a3a, 0xdb59, 0xbdc,	0xb40b, 0x1ab4,
+	0x131b, 0x5d12, 0x6d5c, 0xe16c, 0xb0e0, 0x89b0, 0xba88, 0xbb,	0x3c01,
+	0xe13b, 0x6fe1, 0x446f, 0xa344, 0x81a3, 0xfe81, 0xc7fd, 0x38c8, 0xb38,
+	0x1a0b, 0x6d19, 0xf36c, 0x47f3, 0x6d48, 0xb76d, 0xd3b7, 0xd8d2, 0x52d9,
+	0x4b53, 0xa54a, 0x34a5, 0xc534, 0x9bc4, 0xed9b, 0xbeed, 0x3ebe, 0x233e,
+	0x9f22, 0x4a9f, 0x774b, 0x4577, 0xa545, 0x64a5, 0xb65,	0x870b, 0x487,
+	0x9204, 0x5f91, 0xd55f, 0x35d5, 0x1a35, 0x71a,	0x7a07, 0x4e7a, 0xfc4e,
+	0x1efc, 0x481f, 0x7448, 0xde74, 0xa7dd, 0x1ea7, 0xaa1e, 0xcfaa, 0xfbcf,
+	0xedfb, 0x6eee, 0x386f, 0x4538, 0x6e45, 0xd96d, 0x11d9, 0x7912, 0x4b79,
+	0x494b, 0x6049, 0xac5f, 0x65ac, 0x1366, 0x5913, 0xe458, 0x7ae4, 0x387a,
+	0x3c38, 0xb03c, 0x76b0, 0x9376, 0xe193, 0x42e1, 0x7742, 0x6476, 0x3564,
+	0x3c35, 0x6a3c, 0xcc69, 0x94cc, 0x5d95, 0xe5e,	0xee0d, 0x4ced, 0xce4c,
+	0x52ce, 0xaa52, 0xdaaa, 0xe4da, 0x1de5, 0x4530, 0x5445, 0x3954, 0xb639,
+	0x81b6, 0x7381, 0x1574, 0xc215, 0x10c2, 0x3f10, 0x6b3f, 0xe76b, 0x7be7,
+	0xbc7b, 0xf7bb, 0x41f7, 0xcc41, 0x38cc, 0x4239, 0xa942, 0x4a9,	0xc504,
+	0x7cc4, 0x437c, 0x6743, 0xea67, 0x8dea, 0xe88d, 0xd8e8, 0xdcd8, 0x17dd,
+	0x5718, 0x958,	0xa609, 0x41a5, 0x5842, 0x159,	0x9f01, 0x269f, 0x5a26,
+	0x405a, 0xc340, 0xb4c3, 0xd4b4, 0xf4d3, 0xf1f4, 0x39f2, 0xe439, 0x67e4,
+	0x4168, 0xa441, 0xdda3, 0xdedd, 0x9df,	0xab0a, 0xa5ab, 0x9a6,	0xba09,
+	0x9ab9, 0xad9a, 0x5ae,	0xe205, 0xece2, 0xecec, 0x14ed, 0xd614, 0x6bd5,
+	0x916c, 0x3391, 0x6f33, 0x206f, 0x8020, 0x780,	0x7207, 0x2472, 0x8a23,
+	0xb689, 0x3ab6, 0xf739, 0x97f6, 0xb097, 0xa4b0, 0xe6a4, 0x88e6, 0x2789,
+	0xb28,	0x350b, 0x1f35, 0x431e, 0x1043, 0xc30f, 0x79c3, 0x379,	0x5703,
+	0x3256, 0x4732, 0x7247, 0x9d72, 0x489d, 0xd348, 0xa4d3, 0x7ca4, 0xbf7b,
+	0x45c0, 0x7b45, 0x337b, 0x4034, 0x843f, 0xd083, 0x35d0, 0x6335, 0x4d63,
+	0xe14c, 0xcce0, 0xfecc, 0x35ff, 0x5636, 0xf856, 0xeef8, 0x2def, 0xfc2d,
+	0x4fc,	0x6e04, 0xb66d, 0x78b6, 0xbb78, 0x3dbb, 0x9a3d, 0x839a, 0x9283,
+	0x593,	0xd504, 0x23d5, 0x5424, 0xd054, 0x61d0, 0xdb61, 0x17db, 0x1f18,
+	0x381f, 0x9e37, 0x679e, 0x1d68, 0x381d, 0x8038, 0x917f, 0x491,	0xbb04,
+	0x23bb, 0x4124, 0xd41,	0xa30c, 0x8ba3, 0x8b8b, 0xc68b, 0xd2c6, 0xebd2,
+	0x93eb, 0xbd93, 0x99bd, 0x1a99, 0xea19, 0x58ea, 0xcf58, 0x73cf, 0x1073,
+	0x9e10, 0x139e, 0xea13, 0xcde9, 0x3ecd, 0x883f, 0xf89,	0x180f, 0x2a18,
+	0x212a, 0xce20, 0x73ce, 0xf373, 0x60f3, 0xad60, 0x4093, 0x8e40, 0xb98e,
+	0xbfb9, 0xf1bf, 0x8bf1, 0x5e8c, 0xe95e, 0x14e9, 0x4e14, 0x1c4e, 0x7f1c,
+	0xe77e, 0x6fe7, 0xf26f, 0x13f2, 0x8b13, 0xda8a, 0x5fda, 0xea5f, 0x4eea,
+	0xa84f, 0x88a8, 0x1f88, 0x2820, 0x9728, 0x5a97, 0x3f5b, 0xb23f, 0x70b2,
+	0x2c70, 0x232d, 0xf623, 0x4f6,	0x905,	0x7509, 0xd675, 0x28d7, 0x9428,
+	0x3794, 0xf036, 0x2bf0, 0xba2c, 0xedb9, 0xd7ed, 0x59d8, 0xed59, 0x4ed,
+	0xe304, 0x18e3, 0x5c19, 0x3d5c, 0x753d, 0x6d75, 0x956d, 0x7f95, 0xc47f,
+	0x83c4, 0xa84,	0x2e0a, 0x5f2e, 0xb95f, 0x77b9, 0x6d78, 0xf46d, 0x1bf4,
+	0xed1b, 0xd6ed, 0xe0d6, 0x5e1,	0x3905, 0x5638, 0xa355, 0x99a2, 0xbe99,
+	0xb4bd, 0x85b4, 0x2e86, 0x542e, 0x6654, 0xd765, 0x73d7, 0x3a74, 0x383a,
+	0x2638, 0x7826, 0x7677, 0x9a76, 0x7e99, 0x2e7e, 0xea2d, 0xa6ea, 0x8a7,
+	0x109,	0x3300, 0xad32, 0x5fad, 0x465f, 0x2f46, 0xc62f, 0xd4c5, 0xad5,
+	0xcb0a, 0x4cb,	0xb004, 0x7baf, 0xe47b, 0x92e4, 0x8e92, 0x638e, 0x1763,
+	0xc17,	0xf20b, 0x1ff2, 0x8920, 0x5889, 0xcb58, 0xf8cb, 0xcaf8, 0x84cb,
+	0x9f84, 0x8a9f, 0x918a, 0x4991, 0x8249, 0xff81, 0x46ff, 0x5046, 0x5f50,
+	0x725f, 0xf772, 0x8ef7, 0xe08f, 0xc1e0, 0x1fc2, 0x9e1f, 0x8b9d, 0x108b,
+	0x411,	0x2b04, 0xb02a, 0x1fb0, 0x1020, 0x7a0f, 0x587a, 0x8958, 0xb188,
+	0xb1b1, 0x49b2, 0xb949, 0x7ab9, 0x917a, 0xfc91, 0xe6fc, 0x47e7, 0xbc47,
+	0x8fbb, 0xea8e, 0x34ea, 0x2635, 0x1726, 0x9616, 0xc196, 0xa6c1, 0xf3a6,
+	0x11f3, 0x4811, 0x3e48, 0xeb3e, 0xf7ea, 0x1bf8, 0xdb1c, 0x8adb, 0xe18a,
+	0x42e1, 0x9d42, 0x5d9c, 0x6e5d, 0x286e, 0x4928, 0x9a49, 0xb09c, 0xa6b0,
+	0x2a7,	0xe702, 0xf5e6, 0x9af5, 0xf9b,	0x810f, 0x8080, 0x180,	0x1702,
+	0x5117, 0xa650, 0x11a6, 0x1011, 0x550f, 0xd554, 0xbdd5, 0x6bbe, 0xc66b,
+	0xfc7,	0x5510, 0x5555, 0x7655, 0x177,	0x2b02, 0x6f2a, 0xb70,	0x9f0b,
+	0xcf9e, 0xf3cf, 0x3ff4, 0xcb40, 0x8ecb, 0x768e, 0x5277, 0x8652, 0x9186,
+	0x9991, 0x5099, 0xd350, 0x93d3, 0x6d94, 0xe6d,	0x530e, 0x3153, 0xa531,
+	0x64a5, 0x7964, 0x7c79, 0x467c, 0x1746, 0x3017, 0x3730, 0x538,	0x5,
+	0x1e00, 0x5b1e, 0x955a, 0xae95, 0x3eaf, 0xff3e, 0xf8ff, 0xb2f9, 0xa1b3,
+	0xb2a1, 0x5b2,	0xad05, 0x7cac, 0x2d7c, 0xd32c, 0x80d2, 0x7280, 0x8d72,
+	0x1b8e, 0x831b, 0xac82, 0xfdac, 0xa7fd, 0x15a8, 0xd614, 0xe0d5, 0x7be0,
+	0xb37b, 0x61b3, 0x9661, 0x9d95, 0xc79d, 0x83c7, 0xd883, 0xead7, 0xceb,
+	0xf60c, 0xa9f5, 0x19a9, 0xa019, 0x8f9f, 0xd48f, 0x3ad5, 0x853a, 0x985,
+	0x5309, 0x6f52, 0x1370, 0x6e13, 0xa96d, 0x98a9, 0x5198, 0x9f51, 0xb69f,
+	0xa1b6, 0x2ea1, 0x672e, 0x2067, 0x6520, 0xaf65, 0x6eaf, 0x7e6f, 0xee7e,
+	0x17ef, 0xa917, 0xcea8, 0x9ace, 0xff99, 0x5dff, 0xdf5d, 0x38df, 0xa39,
+	0x1c0b, 0xe01b, 0x46e0, 0xcb46, 0x90cb, 0xba90, 0x4bb,	0x9104, 0x9d90,
+	0xc89c, 0xf6c8, 0x6cf6, 0x886c, 0x1789, 0xbd17, 0x70bc, 0x7e71, 0x17e,
+	0x1f01, 0xa01f, 0xbaa0, 0x14bb, 0xfc14, 0x7afb, 0xa07a, 0x3da0, 0xbf3d,
+	0x48bf, 0x8c48, 0x968b, 0x9d96, 0xfd9d, 0x96fd, 0x9796, 0x6b97, 0xd16b,
+	0xf4d1, 0x3bf4, 0x253c, 0x9125, 0x6691, 0xc166, 0x34c1, 0x5735, 0x1a57,
+	0xdc19, 0x77db, 0x8577, 0x4a85, 0x824a, 0x9182, 0x7f91, 0xfd7f, 0xb4c3,
+	0xb5b4, 0xb3b5, 0x7eb3, 0x617e, 0x4e61, 0xa4f,	0x530a, 0x3f52, 0xa33e,
+	0x34a3, 0x9234, 0xf091, 0xf4f0, 0x1bf5, 0x311b, 0x9631, 0x6a96, 0x386b,
+	0x1d39, 0xe91d, 0xe8e9, 0x69e8, 0x426a, 0xee42, 0x89ee, 0x368a, 0x2837,
+	0x7428, 0x5974, 0x6159, 0x1d62, 0x7b1d, 0xf77a, 0x7bf7, 0x6b7c, 0x696c,
+	0xf969, 0x4cf9, 0x714c, 0x4e71, 0x6b4e, 0x256c, 0x6e25, 0xe96d, 0x94e9,
+	0x8f94, 0x3e8f, 0x343e, 0x4634, 0xb646, 0x97b5, 0x8997, 0xe8a,	0x900e,
+	0x8090, 0xfd80, 0xa0fd, 0x16a1, 0xf416, 0xebf4, 0x95ec, 0x1196, 0x8911,
+	0x3d89, 0xda3c, 0x9fd9, 0xd79f, 0x4bd7, 0x214c, 0x3021, 0x4f30, 0x994e,
+	0x5c99, 0x6f5d, 0x326f, 0xab31, 0x6aab, 0xe969, 0x90e9, 0x1190, 0xff10,
+	0xa2fe, 0xe0a2, 0x66e1, 0x4067, 0x9e3f, 0x2d9e, 0x712d, 0x8170, 0xd180,
+	0xffd1, 0x25ff, 0x3826, 0x2538, 0x5f24, 0xc45e, 0x1cc4, 0xdf1c, 0x93df,
+	0xc793, 0x80c7, 0x2380, 0xd223, 0x7ed2, 0xfc7e, 0x22fd, 0x7422, 0x1474,
+	0xb714, 0x7db6, 0x857d, 0xa85,	0xa60a, 0x88a6, 0x4289, 0x7842, 0xc278,
+	0xf7c2, 0xcdf7, 0x84cd, 0xae84, 0x8cae, 0xb98c, 0x1aba, 0x4d1a, 0x884c,
+	0x4688, 0xcc46, 0xd8cb, 0x2bd9, 0xbe2b, 0xa2be, 0x72a2, 0xf772, 0xd2f6,
+	0x75d2, 0xc075, 0xa3c0, 0x63a3, 0xae63, 0x8fae, 0x2a90, 0x5f2a, 0xef5f,
+	0x5cef, 0xa05c, 0x89a0, 0x5e89, 0x6b5e, 0x736b, 0x773,	0x9d07, 0xe99c,
+	0x27ea, 0x2028, 0xc20,	0x980b, 0x4797, 0x2848, 0x9828, 0xc197, 0x48c2,
+	0x2449, 0x7024, 0x570,	0x3e05, 0xd3e,	0xf60c, 0xbbf5, 0x69bb, 0x3f6a,
+	0x740,	0xf006, 0xe0ef, 0xbbe0, 0xadbb, 0x56ad, 0xcf56, 0xbfce, 0xa9bf,
+	0x205b, 0x6920, 0xae69, 0x50ae, 0x2050, 0xf01f, 0x27f0, 0x9427, 0x8993,
+	0x8689, 0x4087, 0x6e40, 0xb16e, 0xa1b1, 0xe8a1, 0x87e8, 0x6f88, 0xfe6f,
+	0x4cfe, 0xe94d, 0xd5e9, 0x47d6, 0x3148, 0x5f31, 0xc35f, 0x13c4, 0xa413,
+	0x5a5,	0x2405, 0xc223, 0x66c2, 0x3667, 0x5e37, 0x5f5e, 0x2f5f, 0x8c2f,
+	0xe48c, 0xd0e4, 0x4d1,	0xd104, 0xe4d0, 0xcee4, 0xfcf,	0x480f, 0xa447,
+	0x5ea4, 0xff5e, 0xbefe, 0x8dbe, 0x1d8e, 0x411d, 0x1841, 0x6918, 0x5469,
+	0x1155, 0xc611, 0xaac6, 0x37ab, 0x2f37, 0xca2e, 0x87ca, 0xbd87, 0xabbd,
+	0xb3ab, 0xcb4,	0xce0c, 0xfccd, 0xa5fd, 0x72a5, 0xf072, 0x83f0, 0xfe83,
+	0x97fd, 0xc997, 0xb0c9, 0xadb0, 0xe6ac, 0x88e6, 0x1088, 0xbe10, 0x16be,
+	0xa916, 0xa3a8, 0x46a3, 0x5447, 0xe953, 0x84e8, 0x2085, 0xa11f, 0xfa1,
+	0xdd0f, 0xbedc, 0x5abe, 0x805a, 0xc97f, 0x6dc9, 0x826d, 0x4a82, 0x934a,
+	0x5293, 0xd852, 0xd3d8, 0xadd3, 0xf4ad, 0xf3f4, 0xfcf3, 0xfefc, 0xcafe,
+	0xb7ca, 0x3cb8, 0xa13c, 0x18a1, 0x1418, 0xea13, 0x91ea, 0xf891, 0x53f8,
+	0xa254, 0xe9a2, 0x87ea, 0x4188, 0x1c41, 0xdc1b, 0xf5db, 0xcaf5, 0x45ca,
+	0x6d45, 0x396d, 0xde39, 0x90dd, 0x1e91, 0x1e,	0x7b00, 0x6a7b, 0xa46a,
+	0xc9a3, 0x9bc9, 0x389b, 0x1139, 0x5211, 0x1f52, 0xeb1f, 0xabeb, 0x48ab,
+	0x9348, 0xb392, 0x17b3, 0x1618, 0x5b16, 0x175b, 0xdc17, 0xdedb, 0x1cdf,
+	0xeb1c, 0xd1ea, 0x4ad2, 0xd4b,	0xc20c, 0x24c2, 0x7b25, 0x137b, 0x8b13,
+	0x618b, 0xa061, 0xff9f, 0xfffe, 0x72ff, 0xf572, 0xe2f5, 0xcfe2, 0xd2cf,
+	0x75d3, 0x6a76, 0xc469, 0x1ec4, 0xfc1d, 0x59fb, 0x455a, 0x7a45, 0xa479,
+	0xb7a4
+};
+
 static u8 tmp_buf[TEST_BUFLEN];
 
 #define full_csum(buff, len, sum) csum_fold(csum_partial(buff, len, sum))
@@ -338,10 +575,55 @@ static void test_csum_no_carry_inputs(struct kunit *test)
 	}
 }
 
+static void test_ip_fast_csum(struct kunit *test)
+{
+	__sum16 csum_result, expected;
+
+	for (int len = IPv4_MIN_WORDS; len < IPv4_MAX_WORDS; len++) {
+		for (int index = 0; index < NUM_IP_FAST_CSUM_TESTS; index++) {
+			csum_result = ip_fast_csum(random_buf + index, len);
+			expected =
+				expected_fast_csum[(len - IPv4_MIN_WORDS) *
+						   NUM_IP_FAST_CSUM_TESTS +
+						   index];
+			CHECK_EQ(expected, csum_result);
+		}
+	}
+}
+
+static void test_csum_ipv6_magic(struct kunit *test)
+{
+	const struct in6_addr *saddr;
+	const struct in6_addr *daddr;
+	unsigned int len;
+	unsigned char proto;
+	unsigned int csum;
+
+	const int daddr_offset = sizeof(struct in6_addr);
+	const int len_offset = sizeof(struct in6_addr) + sizeof(struct in6_addr);
+	const int proto_offset = sizeof(struct in6_addr) + sizeof(struct in6_addr) +
+			     sizeof(int);
+	const int csum_offset = sizeof(struct in6_addr) + sizeof(struct in6_addr) +
+			    sizeof(int) + sizeof(char);
+
+	for (int i = 0; i < NUM_IPv6_TESTS; i++) {
+		saddr = (const struct in6_addr *)(random_buf + i);
+		daddr = (const struct in6_addr *)(random_buf + i +
+						  daddr_offset);
+		len = *(unsigned int *)(random_buf + i + len_offset);
+		proto = *(random_buf + i + proto_offset);
+		csum = *(unsigned int *)(random_buf + i + csum_offset);
+		CHECK_EQ(expected_csum_ipv6_magic[i],
+			 csum_ipv6_magic(saddr, daddr, len, proto, csum));
+	}
+}
+
 static struct kunit_case __refdata checksum_test_cases[] = {
 	KUNIT_CASE(test_csum_fixed_random_inputs),
 	KUNIT_CASE(test_csum_all_carry_inputs),
 	KUNIT_CASE(test_csum_no_carry_inputs),
+	KUNIT_CASE(test_ip_fast_csum),
+	KUNIT_CASE(test_csum_ipv6_magic),
 	{}
 };
 

-- 
2.43.0




* Re: [PATCH v15 0/5] riscv: Add fine-tuned checksum functions
  2024-01-08 23:57 [PATCH v15 0/5] riscv: Add fine-tuned checksum functions Charlie Jenkins
                   ` (4 preceding siblings ...)
  2024-01-08 23:57 ` [PATCH v15 5/5] kunit: Add tests for csum_ipv6_magic and ip_fast_csum Charlie Jenkins
@ 2024-01-20 21:09 ` patchwork-bot+linux-riscv
  5 siblings, 0 replies; 15+ messages in thread
From: patchwork-bot+linux-riscv @ 2024-01-20 21:09 UTC (permalink / raw)
  To: Charlie Jenkins
  Cc: linux-riscv, palmer, conor, samuel.holland, David.Laight,
	xiao.w.wang, evan, guoren, linux-kernel, linux-arch,
	paul.walmsley, aou, arnd, david.laight, conor.dooley

Hello:

This series was applied to riscv/linux.git (fixes)
by Palmer Dabbelt <palmer@rivosinc.com>:

On Mon, 08 Jan 2024 15:57:01 -0800 you wrote:
> Each architecture generally implements fine-tuned checksum functions to
> leverage the instruction set. This patch adds the main checksum
> functions that are used in networking. Tested on QEMU, this series
> allows the CHECKSUM_KUNIT tests to complete an average of 50.9% faster.
> 
> This patch takes heavy use of the Zbb extension using alternatives
> patching.
> 
> [...]

Here is the summary with links:
  - [v15,1/5] asm-generic: Improve csum_fold
    https://git.kernel.org/riscv/c/1e7196fa5b03
  - [v15,2/5] riscv: Add static key for misaligned accesses
    https://git.kernel.org/riscv/c/2ce5729fce8f
  - [v15,3/5] riscv: Add checksum header
    https://git.kernel.org/riscv/c/e11e367e9fe5
  - [v15,4/5] riscv: Add checksum library
    https://git.kernel.org/riscv/c/a04c192eabfb
  - [v15,5/5] kunit: Add tests for csum_ipv6_magic and ip_fast_csum
    https://git.kernel.org/riscv/c/6f4c45cbcb00

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html




* Re: [PATCH v15 5/5] kunit: Add tests for csum_ipv6_magic and ip_fast_csum
  2024-01-08 23:57 ` [PATCH v15 5/5] kunit: Add tests for csum_ipv6_magic and ip_fast_csum Charlie Jenkins
@ 2024-01-22 16:40   ` Guenter Roeck
  2024-01-22 16:52     ` David Laight
  0 siblings, 1 reply; 15+ messages in thread
From: Guenter Roeck @ 2024-01-22 16:40 UTC (permalink / raw)
  To: Charlie Jenkins
  Cc: Palmer Dabbelt, Conor Dooley, Samuel Holland, David Laight,
	Xiao Wang, Evan Green, Guo Ren, linux-riscv, linux-kernel,
	linux-arch, Paul Walmsley, Albert Ou, Arnd Bergmann

Hi,

On Mon, Jan 08, 2024 at 03:57:06PM -0800, Charlie Jenkins wrote:
> Supplement existing checksum tests with tests for csum_ipv6_magic and
> ip_fast_csum.
> 
> Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
> ---

With this patch in the tree, the arm:mps2-an385 qemu emulation gets a bad hiccup.

[    1.839556] Unhandled exception: IPSR = 00000006 LR = fffffff1
[    1.839804] CPU: 0 PID: 164 Comm: kunit_try_catch Tainted: G                 N 6.8.0-rc1 #1
[    1.839948] Hardware name: Generic DT based system
[    1.840062] PC is at __csum_ipv6_magic+0x8/0xb4
[    1.840408] LR is at test_csum_ipv6_magic+0x3d/0xa4
[    1.840493] pc : [<21212f34>]    lr : [<21117fd5>]    psr: 0100020b
[    1.840586] sp : 2180bebc  ip : 46c7f0d2  fp : 21275b38
[    1.840664] r10: 21276b60  r9 : 21275b28  r8 : 21465cfc
[    1.840751] r7 : 00003085  r6 : 21275b4e  r5 : 2138702c  r4 : 00000001
[    1.840847] r3 : 2c000000  r2 : 1ac7f0d2  r1 : 21275b39  r0 : 21275b29
[    1.840942] xPSR: 0100020b

This translates to:

PC is at __csum_ipv6_magic (arch/arm/lib/csumipv6.S:15)
LR is at test_csum_ipv6_magic (./arch/arm/include/asm/checksum.h:60 ./arch/arm/include/asm/checksum.h:163 lib/checksum_kunit.c:617)

Obviously I can not say if this is a problem with qemu or a problem with
the Linux kernel. Given that, and the presumably low interest in
running mps2-an385 with Linux, I'll simply disable that test. Just take
it as a heads up that there _may_ be a problem with this on arm
nommu systems.

Thanks,
Guenter


* RE: [PATCH v15 5/5] kunit: Add tests for csum_ipv6_magic and ip_fast_csum
  2024-01-22 16:40   ` Guenter Roeck
@ 2024-01-22 16:52     ` David Laight
  2024-01-22 17:15       ` Guenter Roeck
  0 siblings, 1 reply; 15+ messages in thread
From: David Laight @ 2024-01-22 16:52 UTC (permalink / raw)
  To: 'Guenter Roeck', Charlie Jenkins
  Cc: Palmer Dabbelt, Conor Dooley, Samuel Holland, Xiao Wang,
	Evan Green, Guo Ren, linux-riscv, linux-kernel, linux-arch,
	Paul Walmsley, Albert Ou, Arnd Bergmann

From: Guenter Roeck
> Sent: 22 January 2024 16:40
> 
> Hi,
> 
> On Mon, Jan 08, 2024 at 03:57:06PM -0800, Charlie Jenkins wrote:
> > Supplement existing checksum tests with tests for csum_ipv6_magic and
> > ip_fast_csum.
> >
> > Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
> > ---
> 
> With this patch in the tree, the arm:mps2-an385 qemu emulation gets a bad hiccup.
> 
> [    1.839556] Unhandled exception: IPSR = 00000006 LR = fffffff1
> [    1.839804] CPU: 0 PID: 164 Comm: kunit_try_catch Tainted: G                 N 6.8.0-rc1 #1
> [    1.839948] Hardware name: Generic DT based system
> [    1.840062] PC is at __csum_ipv6_magic+0x8/0xb4
> [    1.840408] LR is at test_csum_ipv6_magic+0x3d/0xa4
> [    1.840493] pc : [<21212f34>]    lr : [<21117fd5>]    psr: 0100020b
> [    1.840586] sp : 2180bebc  ip : 46c7f0d2  fp : 21275b38
> [    1.840664] r10: 21276b60  r9 : 21275b28  r8 : 21465cfc
> [    1.840751] r7 : 00003085  r6 : 21275b4e  r5 : 2138702c  r4 : 00000001
> [    1.840847] r3 : 2c000000  r2 : 1ac7f0d2  r1 : 21275b39  r0 : 21275b29
> [    1.840942] xPSR: 0100020b
> 
> This translates to:
> 
> PC is at __csum_ipv6_magic (arch/arm/lib/csumipv6.S:15)
> LR is at test_csum_ipv6_magic (./arch/arm/include/asm/checksum.h:60
> ./arch/arm/include/asm/checksum.h:163 lib/checksum_kunit.c:617)
> 
> Obviously I can not say if this is a problem with qemu or a problem with
> the Linux kernel. Given that, and the presumably low interest in
> running mps2-an385 with Linux, I'll simply disable that test. Just take
> it as a heads up that there _may_ be a problem with this on arm
> nommu systems.

Can you drop in a disassembly of __csum_ipv6_magic ?
Actually I think it is:
ENTRY(__csum_ipv6_magic)
		str	lr, [sp, #-4]!
		adds	ip, r2, r3
		ldmia	r1, {r1 - r3, lr}

So the fault is (probably) a misaligned ldmia ?
Are they ever supported?

	David

		adcs	ip, ip, r1
		adcs	ip, ip, r2
		adcs	ip, ip, r3
		adcs	ip, ip, lr
		ldmia	r0, {r0 - r3}
		adcs	r0, ip, r0
		adcs	r0, r0, r1
		adcs	r0, r0, r2
		ldr	r2, [sp, #4]
		adcs	r0, r0, r3
		adcs	r0, r0, r2
		adcs	r0, r0, #0
		ldmfd	sp!, {pc}
ENDPROC(__csum_ipv6_magic)

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)



* Re: [PATCH v15 5/5] kunit: Add tests for csum_ipv6_magic and ip_fast_csum
  2024-01-22 16:52     ` David Laight
@ 2024-01-22 17:15       ` Guenter Roeck
  2024-01-22 21:41         ` David Laight
  0 siblings, 1 reply; 15+ messages in thread
From: Guenter Roeck @ 2024-01-22 17:15 UTC (permalink / raw)
  To: David Laight, Charlie Jenkins
  Cc: Palmer Dabbelt, Conor Dooley, Samuel Holland, Xiao Wang,
	Evan Green, Guo Ren, linux-riscv, linux-kernel, linux-arch,
	Paul Walmsley, Albert Ou, Arnd Bergmann

On 1/22/24 08:52, David Laight wrote:
> From: Guenter Roeck
>> Sent: 22 January 2024 16:40
>>
>> Hi,
>>
>> On Mon, Jan 08, 2024 at 03:57:06PM -0800, Charlie Jenkins wrote:
>>> Supplement existing checksum tests with tests for csum_ipv6_magic and
>>> ip_fast_csum.
>>>
>>> Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
>>> ---
>>
>> With this patch in the tree, the arm:mps2-an385 qemu emulation gets a bad hiccup.
>>
>> [    1.839556] Unhandled exception: IPSR = 00000006 LR = fffffff1
>> [    1.839804] CPU: 0 PID: 164 Comm: kunit_try_catch Tainted: G                 N 6.8.0-rc1 #1
>> [    1.839948] Hardware name: Generic DT based system
>> [    1.840062] PC is at __csum_ipv6_magic+0x8/0xb4
>> [    1.840408] LR is at test_csum_ipv6_magic+0x3d/0xa4
>> [    1.840493] pc : [<21212f34>]    lr : [<21117fd5>]    psr: 0100020b
>> [    1.840586] sp : 2180bebc  ip : 46c7f0d2  fp : 21275b38
>> [    1.840664] r10: 21276b60  r9 : 21275b28  r8 : 21465cfc
>> [    1.840751] r7 : 00003085  r6 : 21275b4e  r5 : 2138702c  r4 : 00000001
>> [    1.840847] r3 : 2c000000  r2 : 1ac7f0d2  r1 : 21275b39  r0 : 21275b29
>> [    1.840942] xPSR: 0100020b
>>
>> This translates to:
>>
>> PC is at __csum_ipv6_magic (arch/arm/lib/csumipv6.S:15)
>> LR is at test_csum_ipv6_magic (./arch/arm/include/asm/checksum.h:60
>> ./arch/arm/include/asm/checksum.h:163 lib/checksum_kunit.c:617)
>>
>> Obviously I can not say if this is a problem with qemu or a problem with
>> the Linux kernel. Given that, and the presumably low interest in
>> running mps2-an385 with Linux, I'll simply disable that test. Just take
>> it as a heads up that there _may_ be a problem with this on arm
>> nommu systems.
> 
> Can you drop in a disassembly of __csum_ipv6_magic ?
> Actually I think it is:

It is, as per the PC pointer above. I don't know anything about arm assembler,
much less about its behavior with THUMB code.

> ENTRY(__csum_ipv6_magic)
> 		str	lr, [sp, #-4]!
> 		adds	ip, r2, r3
> 		ldmia	r1, {r1 - r3, lr}
> 
> So the fault is (probably) a misaligned ldmia ?
> Are they ever supported?
> 

Good question. My primary guess is that this never worked. As I said,
this was just intended to be informational, (probably) no reason to bother.

Of course one might ask if it makes sense to even keep the arm nommu code
in the kernel, but that is of course a different question. I do wonder though
if anyone but me is running it.

Thanks,
Guenter

> 	David
> 
> 		adcs	ip, ip, r1
> 		adcs	ip, ip, r2
> 		adcs	ip, ip, r3
> 		adcs	ip, ip, lr
> 		ldmia	r0, {r0 - r3}
> 		adcs	r0, ip, r0
> 		adcs	r0, r0, r1
> 		adcs	r0, r0, r2
> 		ldr	r2, [sp, #4]
> 		adcs	r0, r0, r3
> 		adcs	r0, r0, r2
> 		adcs	r0, r0, #0
> 		ldmfd	sp!, {pc}
> ENDPROC(__csum_ipv6_magic)
> 
> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> Registration No: 1397386 (Wales)
> 



* RE: [PATCH v15 5/5] kunit: Add tests for csum_ipv6_magic and ip_fast_csum
  2024-01-22 17:15       ` Guenter Roeck
@ 2024-01-22 21:41         ` David Laight
  2024-01-22 23:39           ` Palmer Dabbelt
  0 siblings, 1 reply; 15+ messages in thread
From: David Laight @ 2024-01-22 21:41 UTC (permalink / raw)
  To: 'Guenter Roeck', Charlie Jenkins
  Cc: Palmer Dabbelt, Conor Dooley, Samuel Holland, Xiao Wang,
	Evan Green, Guo Ren, linux-riscv, linux-kernel, linux-arch,
	Paul Walmsley, Albert Ou, Arnd Bergmann

From: Guenter Roeck
> Sent: 22 January 2024 17:16
> 
> On 1/22/24 08:52, David Laight wrote:
> > From: Guenter Roeck
> >> Sent: 22 January 2024 16:40
> >>
> >> Hi,
> >>
> >> On Mon, Jan 08, 2024 at 03:57:06PM -0800, Charlie Jenkins wrote:
> >>> Supplement existing checksum tests with tests for csum_ipv6_magic and
> >>> ip_fast_csum.
> >>>
> >>> Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
> >>> ---
> >>
> >> With this patch in the tree, the arm:mps2-an385 qemu emulation gets a bad hiccup.
> >>
> >> [    1.839556] Unhandled exception: IPSR = 00000006 LR = fffffff1
> >> [    1.839804] CPU: 0 PID: 164 Comm: kunit_try_catch Tainted: G                 N 6.8.0-rc1 #1
> >> [    1.839948] Hardware name: Generic DT based system
> >> [    1.840062] PC is at __csum_ipv6_magic+0x8/0xb4
> >> [    1.840408] LR is at test_csum_ipv6_magic+0x3d/0xa4
> >> [    1.840493] pc : [<21212f34>]    lr : [<21117fd5>]    psr: 0100020b
> >> [    1.840586] sp : 2180bebc  ip : 46c7f0d2  fp : 21275b38
> >> [    1.840664] r10: 21276b60  r9 : 21275b28  r8 : 21465cfc
> >> [    1.840751] r7 : 00003085  r6 : 21275b4e  r5 : 2138702c  r4 : 00000001
> >> [    1.840847] r3 : 2c000000  r2 : 1ac7f0d2  r1 : 21275b39  r0 : 21275b29
> >> [    1.840942] xPSR: 0100020b
> >>
> >> This translates to:
> >>
> >> PC is at __csum_ipv6_magic (arch/arm/lib/csumipv6.S:15)
> >> LR is at test_csum_ipv6_magic (./arch/arm/include/asm/checksum.h:60
> >> ./arch/arm/include/asm/checksum.h:163 lib/checksum_kunit.c:617)
> >>
> >> Obviously I can not say if this is a problem with qemu or a problem with
> >> the Linux kernel. Given that, and the presumably low interest in
> >> running mps2-an385 with Linux, I'll simply disable that test. Just take
> >> it as a heads up that there _may_ be a problem with this on arm
> >> nommu systems.
> >
> > Can you drop in a disassembly of __csum_ipv6_magic ?
> > Actually I think it is:
> 
> It is, as per the PC pointer above. I don't know anything about arm assembler,
> much less about its behavior with THUMB code.

Doesn't look like thumb to me (offset 8 is two 4-byte instructions) and
the code I found looks like arm to me.
(I haven't written any arm asm since before they invented thumb!)

> > ENTRY(__csum_ipv6_magic)
> > 		str	lr, [sp, #-4]!
> > 		adds	ip, r2, r3
> > 		ldmia	r1, {r1 - r3, lr}
> >
> > So the fault is (probably) a misaligned ldmia ?
> > Are they ever supported?
> >
> 
> Good question. My primary guess is that this never worked. As I said,
> this was just intended to be informational, (probably) no reason to bother.
> 
> Of course one might ask if it makes sense to even keep the arm nommu code
> in the kernel, but that is of course a different question. I do wonder though
> if anyone but me is running it.

If it is an alignment fault it isn't a 'nommu' bug.

And traditionally arm didn't support misaligned transfers (well, not
in any way any other cpu did!).
It might be that the kernel assumes that all ethernet packets are
aligned, but the test suite isn't aligning the buffer.
Which would make it a test suite bug.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


* RE: [PATCH v15 5/5] kunit: Add tests for csum_ipv6_magic and ip_fast_csum
  2024-01-22 21:41         ` David Laight
@ 2024-01-22 23:39           ` Palmer Dabbelt
  2024-01-22 23:55             ` Charlie Jenkins
  0 siblings, 1 reply; 15+ messages in thread
From: Palmer Dabbelt @ 2024-01-22 23:39 UTC (permalink / raw)
  To: David.Laight
  Cc: linux, charlie, Conor Dooley, samuel.holland, xiao.w.wang,
	Evan Green, guoren, linux-riscv, linux-kernel, linux-arch,
	Paul Walmsley, aou, Arnd Bergmann

On Mon, 22 Jan 2024 13:41:48 PST (-0800), David.Laight@ACULAB.COM wrote:
> From: Guenter Roeck
>> Sent: 22 January 2024 17:16
>> 
>> On 1/22/24 08:52, David Laight wrote:
>> > From: Guenter Roeck
>> >> Sent: 22 January 2024 16:40
>> >>
>> >> Hi,
>> >>
>> >> On Mon, Jan 08, 2024 at 03:57:06PM -0800, Charlie Jenkins wrote:
>> >>> Supplement existing checksum tests with tests for csum_ipv6_magic and
>> >>> ip_fast_csum.
>> >>>
>> >>> Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
>> >>> ---
>> >>
>> >> With this patch in the tree, the arm:mps2-an385 qemu emulation gets a bad hiccup.
>> >>
>> >> [    1.839556] Unhandled exception: IPSR = 00000006 LR = fffffff1
>> >> [    1.839804] CPU: 0 PID: 164 Comm: kunit_try_catch Tainted: G                 N 6.8.0-rc1 #1
>> >> [    1.839948] Hardware name: Generic DT based system
>> >> [    1.840062] PC is at __csum_ipv6_magic+0x8/0xb4
>> >> [    1.840408] LR is at test_csum_ipv6_magic+0x3d/0xa4
>> >> [    1.840493] pc : [<21212f34>]    lr : [<21117fd5>]    psr: 0100020b
>> >> [    1.840586] sp : 2180bebc  ip : 46c7f0d2  fp : 21275b38
>> >> [    1.840664] r10: 21276b60  r9 : 21275b28  r8 : 21465cfc
>> >> [    1.840751] r7 : 00003085  r6 : 21275b4e  r5 : 2138702c  r4 : 00000001
>> >> [    1.840847] r3 : 2c000000  r2 : 1ac7f0d2  r1 : 21275b39  r0 : 21275b29
>> >> [    1.840942] xPSR: 0100020b
>> >>
>> >> This translates to:
>> >>
>> >> PC is at __csum_ipv6_magic (arch/arm/lib/csumipv6.S:15)
>> >> LR is at test_csum_ipv6_magic (./arch/arm/include/asm/checksum.h:60
>> >> ./arch/arm/include/asm/checksum.h:163 lib/checksum_kunit.c:617)
>> >>
>> >> Obviously I can not say if this is a problem with qemu or a problem with
>> >> the Linux kernel. Given that, and the presumably low interest in
>> >> running mps2-an385 with Linux, I'll simply disable that test. Just take
>> >> it as a heads up that there _may_ be a problem with this on arm
>> >> nommu systems.
>> >
>> > Can you drop in a disassembly of __csum_ipv6_magic ?
>> > Actually I think it is:
>> 
>> It is, as per the PC pointer above. I don't know anything about arm assembler,
>> much less about its behavior with THUMB code.
> 
> Doesn't look like thumb to me (offset 8 is two 4-byte instructions) and
> the code I found looks like arm to me.
> (I haven't written any arm asm since before they invented thumb!)
> 
>> > ENTRY(__csum_ipv6_magic)
>> > 		str	lr, [sp, #-4]!
>> > 		adds	ip, r2, r3
>> > 		ldmia	r1, {r1 - r3, lr}
>> >
>> > So the fault is (probably) a misaligned ldmia ?
>> > Are they ever supported?
>> >
>> 
>> Good question. My primary guess is that this never worked. As I said,
>> this was just intended to be informational, (probably) no reason to bother.
>> 
>> Of course one might ask if it makes sense to even keep the arm nommu code
>> in the kernel, but that is of course a different question. I do wonder though
>> if anyone but me is running it.
> 
> If it is an alignment fault it isn't a 'nommu' bug.
> 
> And traditionally arm didn't support misaligned transfers (well, not
> in any way any other cpu did!).
> It might be that the kernel assumes that all ethernet packets are
> aligned, but the test suite isn't aligning the buffer.
> Which would make it a test suite bug.

From talking to Evan and Vineet, I think you're right and this is a test 
suite bug: specifically the tests weren't respecting NET_IP_ALIGN.  That 
didn't crop up for ip_fast_csum() as it just uses ldr which supports 
misaligned accesses on the M3 (at least as far as I can tell).

So I think the right fix is something like

    diff --git a/lib/checksum_kunit.c b/lib/checksum_kunit.c
    index 225bb7701460..2dd282e27dd4 100644
    --- a/lib/checksum_kunit.c
    +++ b/lib/checksum_kunit.c
    @@ -5,6 +5,7 @@
    
     #include <kunit/test.h>
     #include <asm/checksum.h>
    +#include <linux/skbuff.h>
     #include <net/ip6_checksum.h>
    
     #define MAX_LEN 512
    @@ -15,6 +16,7 @@
     #define IPv4_MAX_WORDS 15
     #define NUM_IPv6_TESTS 200
     #define NUM_IP_FAST_CSUM_TESTS 181
    +#define SUPPORTED_ALIGNMENT (1 << NET_IP_ALIGN)
    
     /* Values for a little endian CPU. Byte swap each half on big endian CPU. */
     static const u32 random_init_sum = 0x2847aab;
    @@ -486,7 +488,7 @@ static void test_csum_fixed_random_inputs(struct kunit *test)
     	__sum16 result, expec;
    
     	assert_setup_correct(test);
    -	for (align = 0; align < TEST_BUFLEN; ++align) {
    +	for (align = 0; align < TEST_BUFLEN; align += SUPPORTED_ALIGNMENT) {
     		memcpy(&tmp_buf[align], random_buf,
     		       min(MAX_LEN, TEST_BUFLEN - align));
     		for (len = 0; len < MAX_LEN && (align + len) < TEST_BUFLEN;
    @@ -513,7 +515,7 @@ static void test_csum_all_carry_inputs(struct kunit *test)
    
     	assert_setup_correct(test);
     	memset(tmp_buf, 0xff, TEST_BUFLEN);
    -	for (align = 0; align < TEST_BUFLEN; ++align) {
    +	for (align = 0; align < TEST_BUFLEN; align += SUPPORTED_ALIGNMENT) {
     		for (len = 0; len < MAX_LEN && (align + len) < TEST_BUFLEN;
     		     ++len) {
     			/*
    @@ -553,7 +555,7 @@ static void test_csum_no_carry_inputs(struct kunit *test)
    
     	assert_setup_correct(test);
     	memset(tmp_buf, 0x4, TEST_BUFLEN);
    -	for (align = 0; align < TEST_BUFLEN; ++align) {
    +	for (align = 0; align < TEST_BUFLEN; align += SUPPORTED_ALIGNMENT) {
     		for (len = 0; len < MAX_LEN && (align + len) < TEST_BUFLEN;
     		     ++len) {
     			/*

but I haven't even build tested it...

> 
> 	David
> 
> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> Registration No: 1397386 (Wales)


* Re: [PATCH v15 5/5] kunit: Add tests for csum_ipv6_magic and ip_fast_csum
  2024-01-22 23:39           ` Palmer Dabbelt
@ 2024-01-22 23:55             ` Charlie Jenkins
  2024-01-23  1:06               ` Guenter Roeck
  0 siblings, 1 reply; 15+ messages in thread
From: Charlie Jenkins @ 2024-01-22 23:55 UTC (permalink / raw)
  To: Palmer Dabbelt
  Cc: David.Laight, linux, Conor Dooley, samuel.holland, xiao.w.wang,
	Evan Green, guoren, linux-riscv, linux-kernel, linux-arch,
	Paul Walmsley, aou, Arnd Bergmann

On Mon, Jan 22, 2024 at 03:39:16PM -0800, Palmer Dabbelt wrote:
> On Mon, 22 Jan 2024 13:41:48 PST (-0800), David.Laight@ACULAB.COM wrote:
> > From: Guenter Roeck
> > > Sent: 22 January 2024 17:16
> > > 
> > > On 1/22/24 08:52, David Laight wrote:
> > > > From: Guenter Roeck
> > > >> Sent: 22 January 2024 16:40
> > > >>
> > > >> Hi,
> > > >>
> > > >> On Mon, Jan 08, 2024 at 03:57:06PM -0800, Charlie Jenkins wrote:
> > > >>> Supplement existing checksum tests with tests for csum_ipv6_magic and
> > > >>> ip_fast_csum.
> > > >>>
> > > >>> Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
> > > >>> ---
> > > >>
> > > >> With this patch in the tree, the arm:mps2-an385 qemu emulation gets a bad hiccup.
> > > >>
> > > >> [    1.839556] Unhandled exception: IPSR = 00000006 LR = fffffff1
> > > >> [    1.839804] CPU: 0 PID: 164 Comm: kunit_try_catch Tainted: G                 N 6.8.0-rc1 #1
> > > >> [    1.839948] Hardware name: Generic DT based system
> > > >> [    1.840062] PC is at __csum_ipv6_magic+0x8/0xb4
> > > >> [    1.840408] LR is at test_csum_ipv6_magic+0x3d/0xa4
> > > >> [    1.840493] pc : [<21212f34>]    lr : [<21117fd5>]    psr: 0100020b
> > > >> [    1.840586] sp : 2180bebc  ip : 46c7f0d2  fp : 21275b38
> > > >> [    1.840664] r10: 21276b60  r9 : 21275b28  r8 : 21465cfc
> > > >> [    1.840751] r7 : 00003085  r6 : 21275b4e  r5 : 2138702c  r4 : 00000001
> > > >> [    1.840847] r3 : 2c000000  r2 : 1ac7f0d2  r1 : 21275b39  r0 : 21275b29
> > > >> [    1.840942] xPSR: 0100020b
> > > >>
> > > >> This translates to:
> > > >>
> > > >> PC is at __csum_ipv6_magic (arch/arm/lib/csumipv6.S:15)
> > > >> LR is at test_csum_ipv6_magic (./arch/arm/include/asm/checksum.h:60
> > > >> ./arch/arm/include/asm/checksum.h:163 lib/checksum_kunit.c:617)
> > > >>
> > > >> Obviously I can not say if this is a problem with qemu or a problem with
> > > >> the Linux kernel. Given that, and the presumably low interest in
> > > >> running mps2-an385 with Linux, I'll simply disable that test. Just take
> > > >> it as a heads up that there _may_ be a problem with this on arm
> > > >> nommu systems.
> > > >
> > > > Can you drop in a disassembly of __csum_ipv6_magic ?
> > > > Actually I think it is:
> > > 
> > > It is, as per the PC pointer above. I don't know anything about arm assembler,
> > > much less about its behavior with THUMB code.
> > 
> > Doesn't look like thumb to me (offset 8 is two 4-byte instructions) and
> > the code I found looks like arm to me.
> > (I haven't written any arm asm since before they invented thumb!)
> > 
> > > > ENTRY(__csum_ipv6_magic)
> > > > 		str	lr, [sp, #-4]!
> > > > 		adds	ip, r2, r3
> > > > 		ldmia	r1, {r1 - r3, lr}
> > > >
> > > > So the fault is (probably) a misaligned ldmia ?
> > > > Are they ever supported?
> > > >
> > > 
> > > Good question. My primary guess is that this never worked. As I said,
> > > this was just intended to be informational, (probably) no reason to bother.
> > > 
> > > Of course one might ask if it makes sense to even keep the arm nommu code
> > > in the kernel, but that is of course a different question. I do wonder though
> > > if anyone but me is running it.
> > 
> > If it is an alignment fault it isn't a 'nommu' bug.
> > 
> > And traditionally arm didn't support misaligned transfers (well, not
> > in any way any other cpu did!).
> > It might be that the kernel assumes that all ethernet packets are
> > aligned, but the test suite isn't aligning the buffer.
> > Which would make it a test suite bug.
> 
> From talking to Evan and Vineet, I think you're right and this is a test
> suite bug: specifically the tests weren't respecting NET_IP_ALIGN.  That
> didn't crop up for ip_fast_csum() as it just uses ldr which supports
> misaligned accesses on the M3 (at least as far as I can tell).
> 
> So I think the right fix is something like
> 
>    diff --git a/lib/checksum_kunit.c b/lib/checksum_kunit.c
>    index 225bb7701460..2dd282e27dd4 100644
>    --- a/lib/checksum_kunit.c
>    +++ b/lib/checksum_kunit.c
>    @@ -5,6 +5,7 @@
>     #include <kunit/test.h>
>     #include <asm/checksum.h>
>    +#include <linux/skbuff.h>
>     #include <net/ip6_checksum.h>
>     #define MAX_LEN 512
>    @@ -15,6 +16,7 @@
>     #define IPv4_MAX_WORDS 15
>     #define NUM_IPv6_TESTS 200
>     #define NUM_IP_FAST_CSUM_TESTS 181
>    +#define SUPPORTED_ALIGNMENT (1 << NET_IP_ALIGN)
>     /* Values for a little endian CPU. Byte swap each half on big endian CPU. */
>     static const u32 random_init_sum = 0x2847aab;
>    @@ -486,7 +488,7 @@ static void test_csum_fixed_random_inputs(struct kunit *test)
>     	__sum16 result, expec;
>     	assert_setup_correct(test);
>    -	for (align = 0; align < TEST_BUFLEN; ++align) {
>    +	for (align = 0; align < TEST_BUFLEN; align += SUPPORTED_ALIGNMENT) {
>     		memcpy(&tmp_buf[align], random_buf,
>     		       min(MAX_LEN, TEST_BUFLEN - align));
>     		for (len = 0; len < MAX_LEN && (align + len) < TEST_BUFLEN;
>    @@ -513,7 +515,7 @@ static void test_csum_all_carry_inputs(struct kunit *test)
>     	assert_setup_correct(test);
>     	memset(tmp_buf, 0xff, TEST_BUFLEN);
>    -	for (align = 0; align < TEST_BUFLEN; ++align) {
>    +	for (align = 0; align < TEST_BUFLEN; align += SUPPORTED_ALIGNMENT) {
>     		for (len = 0; len < MAX_LEN && (align + len) < TEST_BUFLEN;
>     		     ++len) {
>     			/*
>    @@ -553,7 +555,7 @@ static void test_csum_no_carry_inputs(struct kunit *test)
>     	assert_setup_correct(test);
>     	memset(tmp_buf, 0x4, TEST_BUFLEN);
>    -	for (align = 0; align < TEST_BUFLEN; ++align) {
>    +	for (align = 0; align < TEST_BUFLEN; align += SUPPORTED_ALIGNMENT) {
>     		for (len = 0; len < MAX_LEN && (align + len) < TEST_BUFLEN;
>     		     ++len) {
>     			/*
> 
> but I haven't even build tested it...

This doesn't fix the test_csum_ipv6_magic test case that was causing the
initial problem, but the same trick can be done in that test.

- Charlie

> 
> > 
> > 	David
> > 
> > -
> > Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> > Registration No: 1397386 (Wales)


* Re: [PATCH v15 5/5] kunit: Add tests for csum_ipv6_magic and ip_fast_csum
  2024-01-22 23:55             ` Charlie Jenkins
@ 2024-01-23  1:06               ` Guenter Roeck
  2024-01-23 10:16                 ` David Laight
  0 siblings, 1 reply; 15+ messages in thread
From: Guenter Roeck @ 2024-01-23  1:06 UTC (permalink / raw)
  To: Charlie Jenkins, Palmer Dabbelt
  Cc: David.Laight, Conor Dooley, samuel.holland, xiao.w.wang,
	Evan Green, guoren, linux-riscv, linux-kernel, linux-arch,
	Paul Walmsley, aou, Arnd Bergmann

On 1/22/24 15:55, Charlie Jenkins wrote:
> On Mon, Jan 22, 2024 at 03:39:16PM -0800, Palmer Dabbelt wrote:
>> On Mon, 22 Jan 2024 13:41:48 PST (-0800), David.Laight@ACULAB.COM wrote:
>>> From: Guenter Roeck
>>>> Sent: 22 January 2024 17:16
>>>>
>>>> On 1/22/24 08:52, David Laight wrote:
>>>>> From: Guenter Roeck
>>>>>> Sent: 22 January 2024 16:40
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> On Mon, Jan 08, 2024 at 03:57:06PM -0800, Charlie Jenkins wrote:
>>>>>>> Supplement existing checksum tests with tests for csum_ipv6_magic and
>>>>>>> ip_fast_csum.
>>>>>>>
>>>>>>> Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
>>>>>>> ---
>>>>>>
>>>>>> With this patch in the tree, the arm:mps2-an385 qemu emulation gets a bad hiccup.
>>>>>>
>>>>>> [    1.839556] Unhandled exception: IPSR = 00000006 LR = fffffff1
>>>>>> [    1.839804] CPU: 0 PID: 164 Comm: kunit_try_catch Tainted: G                 N 6.8.0-rc1 #1
>>>>>> [    1.839948] Hardware name: Generic DT based system
>>>>>> [    1.840062] PC is at __csum_ipv6_magic+0x8/0xb4
>>>>>> [    1.840408] LR is at test_csum_ipv6_magic+0x3d/0xa4
>>>>>> [    1.840493] pc : [<21212f34>]    lr : [<21117fd5>]    psr: 0100020b
>>>>>> [    1.840586] sp : 2180bebc  ip : 46c7f0d2  fp : 21275b38
>>>>>> [    1.840664] r10: 21276b60  r9 : 21275b28  r8 : 21465cfc
>>>>>> [    1.840751] r7 : 00003085  r6 : 21275b4e  r5 : 2138702c  r4 : 00000001
>>>>>> [    1.840847] r3 : 2c000000  r2 : 1ac7f0d2  r1 : 21275b39  r0 : 21275b29
>>>>>> [    1.840942] xPSR: 0100020b
>>>>>>
>>>>>> This translates to:
>>>>>>
>>>>>> PC is at __csum_ipv6_magic (arch/arm/lib/csumipv6.S:15)
>>>>>> LR is at test_csum_ipv6_magic (./arch/arm/include/asm/checksum.h:60
>>>>>> ./arch/arm/include/asm/checksum.h:163 lib/checksum_kunit.c:617)
>>>>>>
>>>>>> Obviously I can not say if this is a problem with qemu or a problem with
>>>>>> the Linux kernel. Given that, and the presumably low interest in
>>>>>> running mps2-an385 with Linux, I'll simply disable that test. Just take
>>>>>> it as a heads up that there _may_ be a problem with this on arm
>>>>>> nommu systems.
>>>>>
>>>>> Can you drop in a disassembly of __csum_ipv6_magic ?
>>>>> Actually I think it is:
>>>>
>>>> It is, as per the PC pointer above. I don't know anything about arm assembler,
>>>> much less about its behavior with THUMB code.
>>>
>>> Doesn't look like thumb to me (offset 8 is two 4-byte instructions) and
>>> the code I found looks like arm to me.
>>> (I haven't written any arm asm since before they invented thumb!)
>>>
>>>>> ENTRY(__csum_ipv6_magic)
>>>>> 		str	lr, [sp, #-4]!
>>>>> 		adds	ip, r2, r3
>>>>> 		ldmia	r1, {r1 - r3, lr}
>>>>>
>>>>> So the fault is (probably) a misaligned ldmia ?
>>>>> Are they ever supported?
>>>>>
>>>>
>>>> Good question. My primary guess is that this never worked. As I said,
>>>> this was just intended to be informational, (probably) no reason to bother.
>>>>
>>>> Of course one might ask if it makes sense to even keep the arm nommu code
>>>> in the kernel, but that is of course a different question. I do wonder though
>>>> if anyone but me is running it.
>>>
>>> If it is an alignment fault it isn't a 'nommu' bug.
>>>
>>> And traditionally arm didn't support misaligned transfers (well, not
>>> in any way any other cpu did!).
>>> It might be that the kernel assumes that all ethernet packets are
>>> aligned, but the test suite isn't aligning the buffer.
>>> Which would make it a test suite bug.
>>
>> From talking to Evan and Vineet, I think you're right and this is a test
>> suite bug: specifically the tests weren't respecting NET_IP_ALIGN.  That
>> didn't crop up for ip_fast_csum() as it just uses ldr which supports
>> misaligned accesses on the M3 (at least as far as I can tell).
>>
>> So I think the right fix is something like
>>
>>     diff --git a/lib/checksum_kunit.c b/lib/checksum_kunit.c
>>     index 225bb7701460..2dd282e27dd4 100644
>>     --- a/lib/checksum_kunit.c
>>     +++ b/lib/checksum_kunit.c
>>     @@ -5,6 +5,7 @@
>>      #include <kunit/test.h>
>>      #include <asm/checksum.h>
>>     +#include <linux/skbuff.h>
>>      #include <net/ip6_checksum.h>
>>      #define MAX_LEN 512
>>     @@ -15,6 +16,7 @@
>>      #define IPv4_MAX_WORDS 15
>>      #define NUM_IPv6_TESTS 200
>>      #define NUM_IP_FAST_CSUM_TESTS 181
>>     +#define SUPPORTED_ALIGNMENT (1 << NET_IP_ALIGN)
>>      /* Values for a little endian CPU. Byte swap each half on big endian CPU. */
>>      static const u32 random_init_sum = 0x2847aab;
>>     @@ -486,7 +488,7 @@ static void test_csum_fixed_random_inputs(struct kunit *test)
>>      	__sum16 result, expec;
>>      	assert_setup_correct(test);
>>     -	for (align = 0; align < TEST_BUFLEN; ++align) {
>>     +	for (align = 0; align < TEST_BUFLEN; align += SUPPORTED_ALIGNMENT) {
>>      		memcpy(&tmp_buf[align], random_buf,
>>      		       min(MAX_LEN, TEST_BUFLEN - align));
>>      		for (len = 0; len < MAX_LEN && (align + len) < TEST_BUFLEN;
>>     @@ -513,7 +515,7 @@ static void test_csum_all_carry_inputs(struct kunit *test)
>>      	assert_setup_correct(test);
>>      	memset(tmp_buf, 0xff, TEST_BUFLEN);
>>     -	for (align = 0; align < TEST_BUFLEN; ++align) {
>>     +	for (align = 0; align < TEST_BUFLEN; align += SUPPORTED_ALIGNMENT) {
>>      		for (len = 0; len < MAX_LEN && (align + len) < TEST_BUFLEN;
>>      		     ++len) {
>>      			/*
>>     @@ -553,7 +555,7 @@ static void test_csum_no_carry_inputs(struct kunit *test)
>>      	assert_setup_correct(test);
>>      	memset(tmp_buf, 0x4, TEST_BUFLEN);
>>     -	for (align = 0; align < TEST_BUFLEN; ++align) {
>>     +	for (align = 0; align < TEST_BUFLEN; align += SUPPORTED_ALIGNMENT) {
>>      		for (len = 0; len < MAX_LEN && (align + len) < TEST_BUFLEN;
>>      		     ++len) {
>>      			/*
>>
>> but I haven't even build tested it...
> 
> This doesn't fix the test_csum_ipv6_magic test case that was causing the
> initial problem, but the same trick can be done in that test.
> 

The above didn't (and still doesn't) fail for me. The following fixes the problem.
So, yes, I guess the problem has to do with alignment. I don't know if NET_IP_ALIGN
would do the trick, though - it works, but it seems to me that the definition of
NET_IP_ALIGN is supposed to address potential performance issues, not mandatory
IP header alignment.

Thanks,
Guenter

---
diff --git a/lib/checksum_kunit.c b/lib/checksum_kunit.c
index 225bb7701460..c8730af2a474 100644
--- a/lib/checksum_kunit.c
+++ b/lib/checksum_kunit.c
@@ -591,6 +591,8 @@ static void test_ip_fast_csum(struct kunit *test)
         }
  }

+#define SUPPORTED_ALIGNMENT (1 << NET_IP_ALIGN)
+
  static void test_csum_ipv6_magic(struct kunit *test)
  {
  #if defined(CONFIG_NET)
@@ -607,7 +609,7 @@ static void test_csum_ipv6_magic(struct kunit *test)
         const int csum_offset = sizeof(struct in6_addr) + sizeof(struct in6_addr) +
                             sizeof(int) + sizeof(char);

-       for (int i = 0; i < NUM_IPv6_TESTS; i++) {
-       for (int i = 0; i < NUM_IPv6_TESTS; i++) {
+       for (int i = 0; i < NUM_IPv6_TESTS; i += SUPPORTED_ALIGNMENT) {
                 saddr = (const struct in6_addr *)(random_buf + i);
                 daddr = (const struct in6_addr *)(random_buf + i +
                                                   daddr_offset);




* RE: [PATCH v15 5/5] kunit: Add tests for csum_ipv6_magic and ip_fast_csum
  2024-01-23  1:06               ` Guenter Roeck
@ 2024-01-23 10:16                 ` David Laight
  0 siblings, 0 replies; 15+ messages in thread
From: David Laight @ 2024-01-23 10:16 UTC (permalink / raw)
  To: 'Guenter Roeck', Charlie Jenkins, Palmer Dabbelt
  Cc: Conor Dooley, samuel.holland, xiao.w.wang, Evan Green, guoren,
	linux-riscv, linux-kernel, linux-arch, Paul Walmsley, aou,
	Arnd Bergmann

From: Guenter Roeck
> Sent: 23 January 2024 01:06
...
> >>     +#define SUPPORTED_ALIGNMENT (1 << NET_IP_ALIGN)
> >>      /* Values for a little endian CPU. Byte swap each half on big endian CPU. */
> >>      static const u32 random_init_sum = 0x2847aab;
> >>     @@ -486,7 +488,7 @@ static void test_csum_fixed_random_inputs(struct kunit *test)
> >>      	__sum16 result, expec;
> >>      	assert_setup_correct(test);
> >>     -	for (align = 0; align < TEST_BUFLEN; ++align) {
> >>     +	for (align = 0; align < TEST_BUFLEN; align += SUPPORTED_ALIGNMENT) {
...

That is all wrong.
NET_IP_ALIGN is the offset for the base of ethernet frames.
If zero, the IP header will (usually) be misaligned.
If two, the MAC addresses are misaligned in order to align
the IP header (6+6+2 bytes in).
I don't think any other values are actually valid, but
there is always that possibility.

So the definition should really be:
#define SUPPORTED_ALIGNMENT (NET_IP_ALIGN ? 4 : 1)

(Which might happen to be the same values :-)

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


end of thread, other threads:[~2024-01-23 10:16 UTC | newest]

Thread overview: 15+ messages
2024-01-08 23:57 [PATCH v15 0/5] riscv: Add fine-tuned checksum functions Charlie Jenkins
2024-01-08 23:57 ` [PATCH v15 1/5] asm-generic: Improve csum_fold Charlie Jenkins
2024-01-08 23:57 ` [PATCH v15 2/5] riscv: Add static key for misaligned accesses Charlie Jenkins
2024-01-08 23:57 ` [PATCH v15 3/5] riscv: Add checksum header Charlie Jenkins
2024-01-08 23:57 ` [PATCH v15 4/5] riscv: Add checksum library Charlie Jenkins
2024-01-08 23:57 ` [PATCH v15 5/5] kunit: Add tests for csum_ipv6_magic and ip_fast_csum Charlie Jenkins
2024-01-22 16:40   ` Guenter Roeck
2024-01-22 16:52     ` David Laight
2024-01-22 17:15       ` Guenter Roeck
2024-01-22 21:41         ` David Laight
2024-01-22 23:39           ` Palmer Dabbelt
2024-01-22 23:55             ` Charlie Jenkins
2024-01-23  1:06               ` Guenter Roeck
2024-01-23 10:16                 ` David Laight
2024-01-20 21:09 ` [PATCH v15 0/5] riscv: Add fine-tuned checksum functions patchwork-bot+linux-riscv

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).