linux-kernel.vger.kernel.org archive mirror
From: Linus Torvalds <torvalds@linux-foundation.org>
To: David Howells <dhowells@redhat.com>
Cc: kernel test robot <oliver.sang@intel.com>,
	oe-lkp@lists.linux.dev, lkp@intel.com,
	linux-kernel@vger.kernel.org,
	Christian Brauner <brauner@kernel.org>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Jens Axboe <axboe@kernel.dk>, Christoph Hellwig <hch@lst.de>,
	Christian Brauner <christian@brauner.io>,
	Matthew Wilcox <willy@infradead.org>,
	David Laight <David.Laight@aculab.com>,
	ying.huang@intel.com, feng.tang@intel.com, fengwei.yin@intel.com
Subject: Re: [linus:master] [iov_iter] c9eec08bac: vm-scalability.throughput -16.9% regression
Date: Wed, 15 Nov 2023 14:09:40 -0500
Message-ID: <CAHk-=whtDxahdzn4yLP_3BNb496AQ0y5QrE36JVLUkqRM+un5A@mail.gmail.com>
In-Reply-To: <CAHk-=whFGA6YPJp3zazUwBG6ort8i34vGv9utYdOgYpekyt++Q@mail.gmail.com>

On Wed, 15 Nov 2023 at 13:45, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Do you perhaps have CONFIG_CC_OPTIMIZE_FOR_SIZE set? That makes gcc
> use "rep movsb" - even for small copies that most definitely should
> *not* use "rep movsb".

Just to give some background and an example:

        __builtin_memcpy(dst, src, 24);

with -O2 is done as three 64-bit move instructions (well, three in
each direction, so six instructions total), and with -Os you get

        movl $6, %ecx
        rep movsl

instead.  And no, this isn't all that uncommon, because things like
the above are what you get when you copy a small structure around.

And that "rep movsl" is indeed nice and small, but it's truly
horrendously bad from a performance angle on most cores, compared to
the six instructions that can schedule nicely and take a cycle or two.
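
To reproduce the comparison, here is a minimal sketch. The structure
`struct triple` and both function names are hypothetical, chosen only to
give a 24-byte object. Compiling it with `gcc -O2 -S` and again with
`gcc -Os -S` lets you compare the generated code. The exact output depends
on the compiler version and target, so treat the listings above as
representative rather than guaranteed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 24-byte structure: three 64-bit fields, no padding. */
struct triple {
	uint64_t a, b, c;
};

_Static_assert(sizeof(struct triple) == 24,
	       "example assumes a 24-byte structure");

/* Plain structure assignment: the compiler emits an inline 24-byte copy. */
void copy_by_assignment(struct triple *dst, const struct triple *src)
{
	*dst = *src;
}

/* The explicit form quoted in the message. */
void copy_by_memcpy(struct triple *dst, const struct triple *src)
{
	__builtin_memcpy(dst, src, sizeof(*dst));
}

/* Sanity check: both forms must produce identical copies. */
void check_copies(void)
{
	struct triple src = { 1, 2, 3 }, d1 = { 0, 0, 0 }, d2 = { 0, 0, 0 };

	copy_by_assignment(&d1, &src);
	copy_by_memcpy(&d2, &src);
	assert(memcmp(&d1, &d2, sizeof(d1)) == 0);
}
```

The functions are deliberately not `static`, so that they survive in the
assembly output even though nothing else in the file calls the first two.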

There are some other cases of similar "-Os generates unacceptable
code". For example, dividing by a constant - when you use -Os, gcc
thinks that it's perfectly fine to actually generate a divide
instruction, because it is indeed small.

But in most cases you really *really* want to use a "multiply by
reciprocal" even though it generates bigger code. Again, it ends up
depending on microarchitecture, and modern cores tend to do better on
divides, but it's another of those things where saving a couple of
bytes of code space is not the right choice if it means that you use a
slow divider.

And again, those "divide by constant" cases often happen in implicit
contexts (ie the constant may be the size of a structure, and the
divide is due to taking a pointer difference). Let's say you have a
structure whose size isn't a power of two, but is (to pick a random
but not unlikely value) 56 bytes.
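
A small sketch of where that implicit divide comes from. `struct item` and
both function names are hypothetical, picked only so that the size is 56
bytes. The two functions compute the same value: the first hides the
division inside the pointer subtraction, the second spells it out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 56-byte structure: seven 64-bit fields. */
struct item {
	uint64_t field[7];
};

_Static_assert(sizeof(struct item) == 56,
	       "example assumes a 56-byte structure");

/* Pointer subtraction: the byte distance is divided by sizeof(struct item). */
ptrdiff_t item_index(const struct item *base, const struct item *p)
{
	return p - base;
}

/* The same computation with the division by the constant 56 made explicit. */
ptrdiff_t item_index_explicit(const struct item *base, const struct item *p)
{
	ptrdiff_t bytes = (const char *)p - (const char *)base;

	return bytes / (ptrdiff_t)sizeof(struct item);
}

/* Sanity check: both forms must agree for an element of the same array. */
void check_index(void)
{
	struct item arr[10] = { { { 0 } } };

	assert(item_index(arr, &arr[7]) == item_index_explicit(arr, &arr[7]));
}
```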

The code generation for -O2 is (value in %rdi)

        movabsq $2635249153387078803, %rax
        shrq $3, %rdi
        mulq %rdi

and for -Os you get (value in %rax):

        movl $56, %ecx
        xorl %edx, %edx
        divq %rcx

and that 'divq' is certainly again smaller and more obvious, but again
we're talking "single cycles" vs "potentially 50+ cycles" depending on
uarch.
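
For readers who want to check the -O2 sequence by hand, here is a sketch of
the same arithmetic in C. It assumes a 64-bit target with GCC or Clang,
since `unsigned __int128` is a compiler extension. The names `RECIP_7`,
`div56_by_reciprocal` and `div56_plain` are made up for illustration. The
constant is ceil(2^64 / 7): since 56 = 8 * 7, the shift by 3 handles the
factor of 8 and the multiply handles the factor of 7.

```c
#include <assert.h>
#include <stdint.h>

/* ceil(2^64 / 7), the constant loaded by movabsq in the -O2 listing. */
#define RECIP_7 UINT64_C(2635249153387078803)

/*
 * x / 56 computed as ((x >> 3) * RECIP_7) >> 64, mirroring shrq + mulq.
 * The quotient is the high 64 bits of the 128-bit product, which is
 * what mulq leaves in %rdx.
 */
uint64_t div56_by_reciprocal(uint64_t x)
{
	unsigned __int128 product = (unsigned __int128)(x >> 3) * RECIP_7;

	return (uint64_t)(product >> 64);
}

/* Reference version: lets the compiler pick divq or the reciprocal form. */
uint64_t div56_plain(uint64_t x)
{
	return x / 56;
}

/* Sanity check at zero, around one multiple of 56, and at the top of the range. */
void check_div56(void)
{
	assert(div56_by_reciprocal(0) == div56_plain(0));
	assert(div56_by_reciprocal(55) == div56_plain(55));
	assert(div56_by_reciprocal(56) == div56_plain(56));
	assert(div56_by_reciprocal(UINT64_MAX) == div56_plain(UINT64_MAX));
}
```

The reciprocal form is exact for every 64-bit input here because, after the
shift, the operand is below 2^61 and the rounding error in the constant
(7 * RECIP_7 exceeds 2^64 by 5) is too small to change the quotient.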

                  Linus
