linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Nikolay Borisov <nborisov@suse.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nikolay Borisov <nborisov@suse.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Nick Desaulniers <ndesaulniers@google.com>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Dave Chinner <david@fromorbit.com>
Subject: Re: [PATCH] lib/string: Bring optimized memcmp from glibc
Date: Thu, 22 Jul 2021 20:03:22 +0300	[thread overview]
Message-ID: <c8262efc-e97a-4672-0cc1-90c778eecd94@suse.com> (raw)
In-Reply-To: <CAHk-=whTNJyEMmCpNh5FeT+bJzSE2CYDPWyDO12G4q39d1jn1g@mail.gmail.com>



On 22.07.21 г. 19:40, Linus Torvalds wrote:
> On Thu, Jul 22, 2021 at 4:28 AM Nikolay Borisov
> <n.borisov.lkml@gmail.com> wrote:
>>
>> This one also works, tested only on x86-64. Looking at the perf diff:
>>
>>     30.44%    -28.66%  [kernel.vmlinux]         [k] memcmp
> 
> Ok.
> 
> So the one that doesn't even bother to align is
> 
>     30.44%    -29.38%  [kernel.vmlinux]         [k] memcmp
> 
> and the one that aligns the first one is
> 
>     30.44%    -28.66%  [kernel.vmlinux]         [k] memcmp
> 
> and the difference between the two is basically in the noise:
> 
>      1.05%     +0.72%  [kernel.vmlinux]     [k] memcmp
> 
> but the first one does seem to be slightly better.
> 
>> Now on a more practical note, IIUC your 2nd version makes sense if the
>> cost of doing a one unaligned access in the loop body is offset by the
>> fact we are doing a native word-sized comparison, right?
> 
> So honestly, the reason the first one seems to beat the second one is
> that the cost of unaligned accesses on modern x86 is basically
> epsilon.
> 
> For all normal unaligned accesses there simply is no cost at all.
> There is a _tiny_ cost when the unaligned access crosses a cacheline
> access boundary (which on older CPU's is every 32 bytes, on modern
> ones it's 64 bytes). And then there is another slightly bigger cost
> when the unaligned access actually crosses a page boundary.
> 
> But even those non-zero cost cases are basically in the noise, because
> under most circumstances they will be hidden by any out-of-order
> engine, and completely dwarfed by the _real_ costs which are branch
> mispredicts and cache misses.
> 
> So on the whole, unaligned accesses are basically no cost at all. You
> almost have to have unusual code sequences for them to be even
> measurable.
> 
> So that second patch that aligns one of the sources is basically only
> extra overhead for no real advantage. The cost of it is probably one
> more branch mispredict, and possibly a cycle or two for the extra
> instructions.
> 
> Which is why the first one wins: it's simpler, and the extra work the
> second one does is basically not worth it on x86. Plus I suspect your
> test-case was all aligned anyway to begin with, so the extra work is
> _doubly_ pointless.
> 
> I suspect the second patch would be worthwhile if
> 
>  (a) there really were a lot of strings that weren't aligned (likelihood: low)
> 
>  (b) other microarchitectures that do worse on unaligned accesses -
> some microarchitectures spend extra cycles on _any_ unaligned accesses
> even if they don't cross cache access boundaries etc.
> 
> and I can see (b) happening quite easily. You just won't see it on a
> modern x86-64 setup.
> 
> I suspect we should start with the first version. It's not only better
> on x86, but it's simpler, and it's guarded by that
> 
>     #ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
> 
> so it's fundamentally "safer" on architectures that are just horrible
> about unaligned accesses.
> 
> Now, admittedly I don't personally even care about such architectures,
> and because we use "get_unaligned()", the compiler should always
> generate code that doesn't absolutely suck for bad architectures, but
> considering how long we've gone with the completely brainlessly simple
> "byte at a time" version without anybody even noticing, I think a
> minimal change is a better change.
> 
> That said, I'm not convinced I want to apply even that minimal first
> patch outside of the merge window.
> 
> So would you mind reminding me about this patch the next merge window?
> Unless there was some big extrernal reason why the performance of
> memcmp() mattered to you so much (ie some user that actually noticed
> and complained) and we really want to prioritize this..

I will do my best and hope I don't forget. OTOH there isn't anything
pressing it's something I found while looking at fidedupe's performance
and not even the major contributor but still, it looks sensible to fix
it now, that I have a workload at hand which clearly demonstrates
positive impact and can easily measure any changes.
> 
>               Linus
> 

  reply	other threads:[~2021-07-22 17:03 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-07-21 13:59 [PATCH] lib/string: Bring optimized memcmp from glibc Nikolay Borisov
2021-07-21 14:32 ` Christoph Hellwig
2021-07-21 14:35   ` Nikolay Borisov
2021-07-21 14:42     ` Christoph Hellwig
2021-07-21 14:51       ` Matthew Wilcox
2021-07-21 15:17         ` Peter.Enderborg
2021-07-21 15:34           ` Matthew Wilcox
2021-07-21 15:39             ` Peter.Enderborg
2021-07-21 18:00 ` Linus Torvalds
2021-07-21 18:17   ` Nikolay Borisov
2021-07-21 18:45     ` Linus Torvalds
2021-07-21 18:55       ` Linus Torvalds
2021-07-21 19:26       ` Linus Torvalds
2021-07-22 11:28         ` Nikolay Borisov
2021-07-22 16:40           ` Linus Torvalds
2021-07-22 17:03             ` Nikolay Borisov [this message]
2021-08-26  9:03             ` Nikolay Borisov
2021-07-22  8:28       ` Nikolay Borisov
2021-07-23 14:02       ` David Laight
2021-07-21 20:10   ` David Sterba
2021-07-21 20:27     ` Linus Torvalds
2021-07-22  5:54       ` Nikolay Borisov
2021-07-28 20:12 ` Florian Weimer

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=c8262efc-e97a-4672-0cc1-90c778eecd94@suse.com \
    --to=nborisov@suse.com \
    --cc=david@fromorbit.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=ndesaulniers@google.com \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).