linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Linus Torvalds <torvalds@linux-foundation.org>
To: Nikolay Borisov <n.borisov.lkml@gmail.com>
Cc: Nikolay Borisov <nborisov@suse.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Nick Desaulniers <ndesaulniers@google.com>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Dave Chinner <david@fromorbit.com>
Subject: Re: [PATCH] lib/string: Bring optimized memcmp from glibc
Date: Thu, 22 Jul 2021 09:40:43 -0700	[thread overview]
Message-ID: <CAHk-=whTNJyEMmCpNh5FeT+bJzSE2CYDPWyDO12G4q39d1jn1g@mail.gmail.com> (raw)
In-Reply-To: <eef30b51-c69f-0e70-11a8-c35f90aeca67@gmail.com>

On Thu, Jul 22, 2021 at 4:28 AM Nikolay Borisov
<n.borisov.lkml@gmail.com> wrote:
>
> This one also works, tested only on x86-64. Looking at the perf diff:
>
>     30.44%    -28.66%  [kernel.vmlinux]         [k] memcmp

Ok.

So the one that doesn't even bother to align is

    30.44%    -29.38%  [kernel.vmlinux]         [k] memcmp

and the one that aligns the first one is

    30.44%    -28.66%  [kernel.vmlinux]         [k] memcmp

and the difference between the two is basically in the noise:

     1.05%     +0.72%  [kernel.vmlinux]     [k] memcmp

but the first one does seem to be slightly better.

> Now on a more practical note, IIUC your 2nd version makes sense if the
> cost of doing a one unaligned access in the loop body is offset by the
> fact we are doing a native word-sized comparison, right?

So honestly, the reason the first one seems to beat the second one is
that the cost of unaligned accesses on modern x86 is basically
epsilon.

For all normal unaligned accesses there simply is no cost at all.
There is a _tiny_ cost when the unaligned access crosses a cacheline
access boundary (which on older CPU's is every 32 bytes, on modern
ones it's 64 bytes). And then there is another slightly bigger cost
when the unaligned access actually crosses a page boundary.

But even those non-zero cost cases are basically in the noise, because
under most circumstances they will be hidden by any out-of-order
engine, and completely dwarfed by the _real_ costs which are branch
mispredicts and cache misses.

So on the whole, unaligned accesses are basically no cost at all. You
almost have to have unusual code sequences for them to be even
measurable.

So that second patch that aligns one of the sources is basically only
extra overhead for no real advantage. The cost of it is probably one
more branch mispredict, and possibly a cycle or two for the extra
instructions.

Which is why the first one wins: it's simpler, and the extra work the
second one does is basically not worth it on x86. Plus I suspect your
test-case was all aligned anyway to begin with, so the extra work is
_doubly_ pointless.

I suspect the second patch would be worthwhile if

 (a) there really were a lot of strings that weren't aligned (likelihood: low)

 (b) other microarchitectures that do worse on unaligned accesses -
some microarchitectures spend extra cycles on _any_ unaligned accesses
even if they don't cross cache access boundaries etc.

and I can see (b) happening quite easily. You just won't see it on a
modern x86-64 setup.

I suspect we should start with the first version. It's not only better
on x86, but it's simpler, and it's guarded by that

    #ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS

so it's fundamentally "safer" on architectures that are just horrible
about unaligned accesses.

Now, admittedly I don't personally even care about such architectures,
and because we use "get_unaligned()", the compiler should always
generate code that doesn't absolutely suck for bad architectures, but
considering how long we've gone with the completely brainlessly simple
"byte at a time" version without anybody even noticing, I think a
minimal change is a better change.

That said, I'm not convinced I want to apply even that minimal first
patch outside of the merge window.

So would you mind reminding me about this patch the next merge window?
Unless there was some big extrernal reason why the performance of
memcmp() mattered to you so much (ie some user that actually noticed
and complained) and we really want to prioritize this..

              Linus

  reply	other threads:[~2021-07-22 16:43 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-07-21 13:59 [PATCH] lib/string: Bring optimized memcmp from glibc Nikolay Borisov
2021-07-21 14:32 ` Christoph Hellwig
2021-07-21 14:35   ` Nikolay Borisov
2021-07-21 14:42     ` Christoph Hellwig
2021-07-21 14:51       ` Matthew Wilcox
2021-07-21 15:17         ` Peter.Enderborg
2021-07-21 15:34           ` Matthew Wilcox
2021-07-21 15:39             ` Peter.Enderborg
2021-07-21 18:00 ` Linus Torvalds
2021-07-21 18:17   ` Nikolay Borisov
2021-07-21 18:45     ` Linus Torvalds
2021-07-21 18:55       ` Linus Torvalds
2021-07-21 19:26       ` Linus Torvalds
2021-07-22 11:28         ` Nikolay Borisov
2021-07-22 16:40           ` Linus Torvalds [this message]
2021-07-22 17:03             ` Nikolay Borisov
2021-08-26  9:03             ` Nikolay Borisov
2021-07-22  8:28       ` Nikolay Borisov
2021-07-23 14:02       ` David Laight
2021-07-21 20:10   ` David Sterba
2021-07-21 20:27     ` Linus Torvalds
2021-07-22  5:54       ` Nikolay Borisov
2021-07-28 20:12 ` Florian Weimer

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAHk-=whTNJyEMmCpNh5FeT+bJzSE2CYDPWyDO12G4q39d1jn1g@mail.gmail.com' \
    --to=torvalds@linux-foundation.org \
    --cc=david@fromorbit.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=n.borisov.lkml@gmail.com \
    --cc=nborisov@suse.com \
    --cc=ndesaulniers@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).