From: Shaokun Zhang <zhangshaokun@hisilicon.com>
To: <linux-arm-kernel@lists.infradead.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>,
Robin Murphy <robin.murphy@arm.com>,
Shaokun Zhang <zhangshaokun@hisilicon.com>,
Lingyan Huang <huanglingyan2@huawei.com>,
Catalin Marinas <catalin.marinas@arm.com>,
Will Deacon <will@kernel.org>
Subject: [PATCH v4] arm64: lib: accelerate do_csum
Date: Wed, 6 Nov 2019 10:20:06 +0800
Message-ID: <1573006806-12037-1-git-send-email-zhangshaokun@hisilicon.com>
From: Lingyan Huang <huanglingyan2@huawei.com>
Function do_csum() in lib/checksum.c is used to compute the checksum,
but it turns out to be slow and to cost a lot of CPU cycles. Let's
accelerate the checksum computation for arm64. We tested its
performance on the Huawei Kunpeng 920 SoC; the results are as follows:
1cycle    general(ns)   csum_128(ns)   csum_64(ns)
  64B:        160             80            50
 256B:        120             70            60
1023B:        350            140           150
1024B:        350            130           140
1500B:        470            170           180
2048B:        630            210           240
4095B:       1220            390           430
4096B:       1230            390           430
Cc: Will Deacon <will@kernel.org>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Originally-from: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Lingyan Huang <huanglingyan2@huawei.com>
Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com>
---
Hi,
Apologies for posting this version so late; we wanted to optimise it
further. Lingyan measured its performance, and the results are attached
in the commit log. Both variants (128-bit and 64-bit strides) are much
faster than the generic code.
ChangeLog:
- Based on Robin's code, with the accumulation stride widened from 64
  to 128 bits.
arch/arm64/include/asm/checksum.h | 3 ++
arch/arm64/lib/Makefile | 2 +-
arch/arm64/lib/csum.c | 81 +++++++++++++++++++++++++++++++++++++++
3 files changed, 85 insertions(+), 1 deletion(-)
create mode 100644 arch/arm64/lib/csum.c
diff --git a/arch/arm64/include/asm/checksum.h b/arch/arm64/include/asm/checksum.h
index d064a50deb5f..8d2a7de39744 100644
--- a/arch/arm64/include/asm/checksum.h
+++ b/arch/arm64/include/asm/checksum.h
@@ -35,6 +35,9 @@ static inline __sum16 ip_fast_csum(const void *iph, unsigned int ihl)
}
#define ip_fast_csum ip_fast_csum
+extern unsigned int do_csum(const unsigned char *buff, int len);
+#define do_csum do_csum
+
#include <asm-generic/checksum.h>
#endif /* __ASM_CHECKSUM_H */
diff --git a/arch/arm64/lib/Makefile b/arch/arm64/lib/Makefile
index c21b936dc01d..8a0644a831eb 100644
--- a/arch/arm64/lib/Makefile
+++ b/arch/arm64/lib/Makefile
@@ -3,7 +3,7 @@ lib-y := clear_user.o delay.o copy_from_user.o \
copy_to_user.o copy_in_user.o copy_page.o \
clear_page.o memchr.o memcpy.o memmove.o memset.o \
memcmp.o strcmp.o strncmp.o strlen.o strnlen.o \
- strchr.o strrchr.o tishift.o
+ strchr.o strrchr.o tishift.o csum.o
ifeq ($(CONFIG_KERNEL_MODE_NEON), y)
obj-$(CONFIG_XOR_BLOCKS) += xor-neon.o
diff --git a/arch/arm64/lib/csum.c b/arch/arm64/lib/csum.c
new file mode 100644
index 000000000000..20170d8dcbc4
--- /dev/null
+++ b/arch/arm64/lib/csum.c
@@ -0,0 +1,81 @@
+// SPDX-License-Identifier: GPL-2.0-only
+// Copyright (C) 2019 Arm Ltd.
+
+#include <linux/compiler.h>
+#include <linux/kasan-checks.h>
+#include <linux/kernel.h>
+
+#include <net/checksum.h>
+
+
+/* handle overflow */
+static __uint128_t accumulate128(__uint128_t sum, __uint128_t data)
+{
+ sum += (sum >> 64) | (sum << 64);
+ data += (data >> 64) | (data << 64);
+ return (sum + data) >> 64;
+}
+
+unsigned int do_csum(const unsigned char *buff, int len)
+{
+ unsigned int offset, shift, sum, count;
+ __uint128_t data, *ptr;
+ __uint128_t sum128 = 0;
+ u64 sum64 = 0;
+
+ offset = (unsigned long)buff & 0xf;
+ /*
+ * This is to all intents and purposes safe, since rounding down cannot
+ * result in a different page or cache line being accessed, and @buff
+ * should absolutely not be pointing to anything read-sensitive. We do,
+ * however, have to be careful not to piss off KASAN, which means using
+ * unchecked reads to accommodate the head and tail, for which we'll
+ * compensate with an explicit check up-front.
+ */
+ kasan_check_read(buff, len);
+ ptr = (__uint128_t *)(buff - offset);
+ shift = offset * 8;
+
+ /*
+ * Head: zero out any excess leading bytes. Shifting back by the same
+ * amount should be at least as fast as any other way of handling the
+ * odd/even alignment, and means we can ignore it until the very end.
+ */
+ data = READ_ONCE_NOCHECK(*ptr++);
+#ifdef __LITTLE_ENDIAN
+ data = (data >> shift) << shift;
+#else
+ data = (data << shift) >> shift;
+#endif
+ count = 16 - offset;
+
+ /* Body: straightforward aligned loads from here on... */
+
+ while (len > count) {
+ sum128 = accumulate128(sum128, data);
+ data = READ_ONCE_NOCHECK(*ptr++);
+ count += 16;
+ }
+ /*
+ * Tail: zero any over-read bytes similarly to the head, again
+ * preserving odd/even alignment.
+ */
+ shift = (count - len) * 8;
+#ifdef __LITTLE_ENDIAN
+ data = (data << shift) >> shift;
+#else
+ data = (data >> shift) << shift;
+#endif
+ sum128 = accumulate128(sum128, data);
+
+ /* Finally, folding */
+ sum128 += (sum128 >> 64) | (sum128 << 64);
+ sum64 = (sum128 >> 64);
+ sum64 += (sum64 >> 32) | (sum64 << 32);
+ sum = (sum64 >> 32);
+ sum += (sum >> 16) | (sum << 16);
+ if (offset & 1)
+ return (u16)swab32(sum);
+
+ return sum >> 16;
+}
--
2.7.4