From: Linus Torvalds <torvalds@linux-foundation.org>
To: Nikolay Borisov <nborisov@suse.com>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Nick Desaulniers <ndesaulniers@google.com>,
linux-fsdevel <linux-fsdevel@vger.kernel.org>,
Dave Chinner <david@fromorbit.com>
Subject: Re: [PATCH] lib/string: Bring optimized memcmp from glibc
Date: Wed, 21 Jul 2021 12:26:29 -0700 [thread overview]
Message-ID: <CAHk-=wgr3JSoasv3Kyzc0u-L36oAr=hzY9oUrCxaszWaxgLW0A@mail.gmail.com> (raw)
In-Reply-To: <CAHk-=wgMyXh3gGuSzj_Dgw=Gn_XPxGSTPq6Pz7dEyx6JNuAh9g@mail.gmail.com>
[-- Attachment #1: Type: text/plain, Size: 2877 bytes --]
On Wed, Jul 21, 2021 at 11:45 AM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> I can do the mutual alignment too, but I'd actually prefer to do it as
> a separate patch, for when there are numbers for that.
>
> And I wouldn't do it as a byte-by-byte case, because that's just stupid.
Here's that "try to align one of the pointers in order to avoid the
lots-of-unaligned case" patch.
It's not quite as simple, and the generated assembly isn't quite as
obvious. But it still generates code that looks good, it's just that
the code to align the first pointer ends up being a bit harder to
read.
And since it's a bit less obvious, the "this is probably buggy because
I didn't actually _test_ it" warning holds even more. But you can see
how much simpler the code still is than the horrendous glibc mess is.
And I removed the "CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS" checking,
because now it should be at least *somewhat* reasonable even on
machines that have a complicated "get_unaligned()".
But maybe I should have kept it. Only testing will tell.
Again: UNTESTED GARBAGE ATTACHED. Be careful. But it migth work, and
ti generates something that looks superficially reasonable.
Gcc again:
memcmp:
cmpq $7, %rdx
jbe .L56
movq (%rsi), %rax
cmpq %rax, (%rdi)
je .L61
.L55:
xorl %ecx, %ecx
jmp .L58
.L62:
addq $1, %rcx
cmpq %rcx, %rdx
je .L51
.L58:
movzbl (%rdi,%rcx), %eax
movzbl (%rsi,%rcx), %r8d
subl %r8d, %eax
je .L62
.L51:
ret
.L56:
testq %rdx, %rdx
jne .L55
xorl %eax, %eax
ret
.L61:
movq %rdi, %rcx
movl $8, %eax
andl $7, %ecx
subq %rcx, %rax
leaq -8(%rcx,%rdx), %rdx
addq %rax, %rdi
addq %rax, %rsi
cmpq $7, %rdx
ja .L57
jmp .L56
.L63:
subq $8, %rdx
addq $8, %rdi
addq $8, %rsi
cmpq $7, %rdx
jbe .L56
.L57:
movq (%rsi), %rax
cmpq %rax, (%rdi)
je .L63
jmp .L55
but clang is similar (except clang isn't as eager to move basic blocks
around, so it's visually very different).
Note no spills, no odd shifts for unaligned accesses, no garbage.
Again: untested, so consider this a starting point rather than
anything good and proper.
Linus
[-- Attachment #2: patch.diff --]
[-- Type: text/x-patch, Size: 1172 bytes --]
lib/string.c | 26 ++++++++++++++++++++++++++
1 file changed, 26 insertions(+)
diff --git a/lib/string.c b/lib/string.c
index 77bd0b1d3296..3eb390fc4f73 100644
--- a/lib/string.c
+++ b/lib/string.c
@@ -29,6 +29,7 @@
#include <linux/errno.h>
#include <linux/slab.h>
+#include <asm/unaligned.h>
#include <asm/byteorder.h>
#include <asm/word-at-a-time.h>
#include <asm/page.h>
@@ -935,6 +936,31 @@ __visible int memcmp(const void *cs, const void *ct, size_t count)
const unsigned char *su1, *su2;
int res = 0;
+ if (count >= sizeof(unsigned long)) {
+ const unsigned long *u1 = cs;
+ const unsigned long *u2 = ct;
+ unsigned long offset;
+
+ if (get_unaligned(u1) != get_unaligned(u2))
+ goto bytewise;
+
+ /* Align 'u1' up */
+ offset = sizeof(*u1) - ((sizeof(*u1)-1) & (unsigned long)(u1));
+ u1 = cs + offset;
+ u2 = ct + offset;
+ count -= offset;
+
+ while (count >= sizeof(unsigned long)) {
+ if (*u1 != get_unaligned(u2))
+ break;
+ u1++;
+ u2++;
+ count -= sizeof(unsigned long);
+ }
+ cs = u1;
+ ct = u2;
+ }
+bytewise:
for (su1 = cs, su2 = ct; 0 < count; ++su1, ++su2, count--)
if ((res = *su1 - *su2) != 0)
break;
next prev parent reply other threads:[~2021-07-21 19:26 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-07-21 13:59 [PATCH] lib/string: Bring optimized memcmp from glibc Nikolay Borisov
2021-07-21 14:32 ` Christoph Hellwig
2021-07-21 14:35 ` Nikolay Borisov
2021-07-21 14:42 ` Christoph Hellwig
2021-07-21 14:51 ` Matthew Wilcox
2021-07-21 15:17 ` Peter.Enderborg
2021-07-21 15:34 ` Matthew Wilcox
2021-07-21 15:39 ` Peter.Enderborg
2021-07-21 18:00 ` Linus Torvalds
2021-07-21 18:17 ` Nikolay Borisov
2021-07-21 18:45 ` Linus Torvalds
2021-07-21 18:55 ` Linus Torvalds
2021-07-21 19:26 ` Linus Torvalds [this message]
2021-07-22 11:28 ` Nikolay Borisov
2021-07-22 16:40 ` Linus Torvalds
2021-07-22 17:03 ` Nikolay Borisov
2021-08-26 9:03 ` Nikolay Borisov
2021-07-22 8:28 ` Nikolay Borisov
2021-07-23 14:02 ` David Laight
2021-07-21 20:10 ` David Sterba
2021-07-21 20:27 ` Linus Torvalds
2021-07-22 5:54 ` Nikolay Borisov
2021-07-28 20:12 ` Florian Weimer
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAHk-=wgr3JSoasv3Kyzc0u-L36oAr=hzY9oUrCxaszWaxgLW0A@mail.gmail.com' \
--to=torvalds@linux-foundation.org \
--cc=david@fromorbit.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=nborisov@suse.com \
--cc=ndesaulniers@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).