From: Robin Murphy <robin.murphy@arm.com>
To: will@kernel.org, catalin.marinas@arm.com
Cc: linux-arm-kernel@lists.infradead.org, yangyingliang@huawei.com,
shenkai8@huawei.com
Subject: [PATCH 7/8] arm64: Better optimised memchr()
Date: Tue, 11 May 2021 17:12:37 +0100 [thread overview]
Message-ID: <68d3229c208780fdde88ba48e14fcde8c50f6e76.1620738177.git.robin.murphy@arm.com> (raw)
In-Reply-To: <cover.1620738177.git.robin.murphy@arm.com>
Although we implement our own assembly version of memchr(), it turns
out to be barely any better than what GCC can generate for the generic
C version (and would go wrong if the size_t argument were ever large
enough to be interpreted as negative). Unfortunately we can't import the
tuned implementation from the Arm optimized-routines library, since that
has some Advanced SIMD parts which are not really viable for general
kernel library code. What we can do, however, is pep things up with some
relatively straightforward word-at-a-time logic for larger calls.
Adding some timing to optimized-routines' memchr() test for a simple
benchmark, overall this version comes in around half as fast as the SIMD
code, but still nearly 4x faster than our existing implementation.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
arch/arm64/lib/memchr.S | 65 +++++++++++++++++++++++++++++++++--------
1 file changed, 53 insertions(+), 12 deletions(-)
diff --git a/arch/arm64/lib/memchr.S b/arch/arm64/lib/memchr.S
index edf6b970a277..7c2276fdab54 100644
--- a/arch/arm64/lib/memchr.S
+++ b/arch/arm64/lib/memchr.S
@@ -1,9 +1,6 @@
/* SPDX-License-Identifier: GPL-2.0-only */
/*
- * Based on arch/arm/lib/memchr.S
- *
- * Copyright (C) 1995-2000 Russell King
- * Copyright (C) 2013 ARM Ltd.
+ * Copyright (C) 2021 Arm Ltd.
*/
#include <linux/linkage.h>
@@ -19,16 +16,60 @@
* Returns:
* x0 - address of first occurrence of 'c' or 0
*/
+
+#define L(label) .L ## label
+
+#define REP8_01 0x0101010101010101
+#define REP8_7f 0x7f7f7f7f7f7f7f7f
+
+#define srcin x0
+#define chrin w1
+#define cntin x2
+
+#define result x0
+
+#define wordcnt x3
+#define rep01 x4
+#define repchr x5
+#define cur_word x6
+#define cur_byte w6
+#define tmp x7
+#define tmp2 x8
+
+ .p2align 4
+ nop
SYM_FUNC_START_WEAK_PI(memchr)
- and w1, w1, #0xff
-1: subs x2, x2, #1
- b.mi 2f
- ldrb w3, [x0], #1
- cmp w3, w1
- b.ne 1b
- sub x0, x0, #1
+ and chrin, chrin, #0xff
+ lsr wordcnt, cntin, #3
+ cbz wordcnt, L(byte_loop)
+ mov rep01, #REP8_01
+ mul repchr, x1, rep01
+ and cntin, cntin, #7
+L(word_loop):
+ ldr cur_word, [srcin], #8
+ sub wordcnt, wordcnt, #1
+ eor cur_word, cur_word, repchr
+ sub tmp, cur_word, rep01
+ orr tmp2, cur_word, #REP8_7f
+ bics tmp, tmp, tmp2
+ b.ne L(found_word)
+ cbnz wordcnt, L(word_loop)
+L(byte_loop):
+ cbz cntin, L(not_found)
+ ldrb cur_byte, [srcin], #1
+ sub cntin, cntin, #1
+ cmp cur_byte, chrin
+ b.ne L(byte_loop)
+ sub srcin, srcin, #1
ret
-2: mov x0, #0
+L(found_word):
+CPU_LE( rev tmp, tmp)
+ clz tmp, tmp
+ sub tmp, tmp, #64
+ add result, srcin, tmp, asr #3
+ ret
+L(not_found):
+ mov result, #0
ret
SYM_FUNC_END_PI(memchr)
EXPORT_SYMBOL_NOKASAN(memchr)
--
2.21.0.dirty
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
next prev parent reply other threads:[~2021-05-11 16:32 UTC|newest]
Thread overview: 22+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-05-11 16:12 [PATCH 0/8] arm64: String function updates Robin Murphy
2021-05-11 16:12 ` [PATCH 1/8] arm64: Import latest version of Cortex Strings' memcmp Robin Murphy
2021-05-12 13:28 ` Mark Rutland
2021-05-12 13:38 ` Robin Murphy
2021-05-12 14:51 ` Szabolcs Nagy
2021-05-26 10:17 ` Mark Rutland
2021-05-11 16:12 ` [PATCH 2/8] arm64: Import latest version of Cortex Strings' strcmp Robin Murphy
2021-05-11 16:12 ` [PATCH 3/8] arm64: Import updated version of Cortex Strings' strlen Robin Murphy
2021-05-11 16:12 ` [PATCH 4/8] arm64: Import latest version of Cortex Strings' strncmp Robin Murphy
2021-05-11 16:12 ` [PATCH 5/8] arm64: Add assembly annotations for weak-PI-alias madness Robin Murphy
2021-05-11 16:12 ` [PATCH 6/8] arm64: Import latest memcpy()/memmove() implementation Robin Murphy
2021-05-11 16:12 ` Robin Murphy [this message]
2021-05-14 14:55 ` [PATCH 7/8] arm64: Better optimised memchr() Catalin Marinas
2021-05-14 18:38 ` Robin Murphy
2021-05-11 16:12 ` [PATCH 8/8] arm64: Rewrite __arch_clear_user() Robin Murphy
2021-05-12 10:48 ` Mark Rutland
2021-05-12 11:31 ` Robin Murphy
2021-05-12 13:06 ` Mark Rutland
2021-05-12 13:51 ` Robin Murphy
2021-05-14 11:57 ` [PATCH v2] " Robin Murphy
2021-05-26 11:15 ` Mark Rutland
2021-05-27 13:24 ` Robin Murphy
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=68d3229c208780fdde88ba48e14fcde8c50f6e76.1620738177.git.robin.murphy@arm.com \
--to=robin.murphy@arm.com \
--cc=catalin.marinas@arm.com \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=shenkai8@huawei.com \
--cc=will@kernel.org \
--cc=yangyingliang@huawei.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).