From: Linus Torvalds <torvalds@linux-foundation.org>
To: Borislav Petkov <bp@alien8.de>
Cc: David Howells <dhowells@redhat.com>,
	kernel test robot <oliver.sang@intel.com>,
	oe-lkp@lists.linux.dev, lkp@intel.com,
	linux-kernel@vger.kernel.org,
	Christian Brauner <brauner@kernel.org>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Jens Axboe <axboe@kernel.dk>, Christoph Hellwig <hch@lst.de>,
	Christian Brauner <christian@brauner.io>,
	Matthew Wilcox <willy@infradead.org>,
	David Laight <David.Laight@aculab.com>,
	ying.huang@intel.com, feng.tang@intel.com, fengwei.yin@intel.com
Subject: Re: [linus:master] [iov_iter] c9eec08bac: vm-scalability.throughput -16.9% regression
Date: Fri, 17 Nov 2023 11:32:31 -0500
Message-ID: <CAHk-=wj33FoGBQ7HkqjLbyOBQogWpYAG7WUTXatcfBF5duijjQ@mail.gmail.com>
In-Reply-To: <20231117160940.GGZVeQRLgLjJZXBLE1@fat_crate.local>

On Fri, 17 Nov 2023 at 11:10, Borislav Petkov <bp@alien8.de> wrote:
>
> Which looks like those added memcpy calls add a lot of overhead due to
> the mitigations crap.

No, you missed the part where I thought that too and asked David to
test without mitigations.

That load really loves "rep movsb" on his machine (and that includes
the gcc-generated odd inlined "one word by hand, and then 'rep movsq'
for the rest").

It's probably because it's a benchmark that doesn't actually touch the
data, and does page-sized copies. It's pretty much the optimal case
for ERMS.
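
To make that concrete, something along these lines (my rough sketch,
not the actual vm-scalability test case) has basically that shape:
page-sized writes to tmpfs, with the data never read back, so the
copy into the page cache is all that matters:

  #include <fcntl.h>
  #include <string.h>
  #include <unistd.h>

  int main(void)
  {
          static char buf[4096];
          /* /dev/shm is tmpfs, so write() goes through shmem_file_write_iter() */
          int fd = open("/dev/shm/scratch", O_CREAT | O_WRONLY | O_TRUNC, 0600);

          if (fd < 0)
                  return 1;
          memset(buf, 0x5a, sizeof(buf));
          for (int i = 0; i < (1 << 18); i++)     /* ~1GB of page-sized writes */
                  if (write(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf))
                          break;
          close(fd);
          return 0;
  }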

The "do one word by hand, the rest with 'rep movsq'" model that gcc
uses (but only in this particular code generation case) probably ends
up being quite reasonable in general - the one word by hand allows for
unaligned counts, but it also brings the beginning of the copy into
the cache (which is *often* the part used later - not in this
benchmark, but in general), and then the rest ends up being done as L2
cacheline copies, at least when we have those nice page-aligned
patterns.
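
In C-with-inline-asm terms, the shape being generated is roughly this
(a sketch of the idea only, assuming the length is a multiple of 8
the way a page-sized copy is; the real compiler output also deals
with odd tails):

  #include <stddef.h>
  #include <stdint.h>

  static void word_then_rep_movsq(void *dst, const void *src, size_t len)
  {
          uint64_t *d = dst;
          const uint64_t *s = src;
          size_t qwords = len / 8 - 1;

          /* one word "by hand": covers the start and pulls it into the cache */
          d[0] = s[0];

          /* the rest as 8-byte "rep movsq" iterations */
          d++;
          s++;
          asm volatile("rep movsq"
                       : "+D" (d), "+S" (s), "+c" (qwords)
                       : : "memory");
  }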

Of course, this whole thread started because the kernel test robot
then has exactly the opposite reaction - it seems to really *hate*
that inlined code generation by gcc. Whether that is because it's a
very different microarchitecture, or because it's just a very
different access pattern from the one David's random KUnit test
uses, I don't know.

That kernel test robot case is on a Cooper Lake Xeon, which (I
think) is just Skylake server. Random Intel codenames...

So that test robot has ERMS too, but no FSRM, so we do the old
"memcpy_orig()" with the regular memcpy loop.

And on that Xeon, it really does seem to be the right thing to do.
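
If anyone wants to check which bucket their own machine falls into,
the two CPUID bits the kernel keys off (X86_FEATURE_ERMS and
X86_FEATURE_FSRM) are easy enough to read from user space - a quick
sketch, not something from this thread:

  #include <stdio.h>
  #include <cpuid.h>

  int main(void)
  {
          unsigned int eax, ebx, ecx, edx;

          if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
                  return 1;

          /* CPUID.(EAX=7,ECX=0): ERMS is EBX bit 9, FSRM is EDX bit 4 */
          printf("ERMS: %s\n", (ebx & (1u << 9)) ? "yes" : "no");
          printf("FSRM: %s\n", (edx & (1u << 4)) ? "yes" : "no");
          return 0;
  }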

But the profile is so noisy with other changes that it's not like I
can guarantee that that is the main issue here. The reason I zeroed in
on the memcpy thing was really just that (a) it does show up in the
profiles and (b) the commit that introduced that 16% regression
doesn't really seem to do anything other than reorganize things just
enough that gcc seems to pick that alternate memcpy implementation.

The test case seems to be (from the profile) just a simple

do_iter_readv_writev ->
  shmem_file_write_iter ->
    generic_perform_write ->
      copy_page_from_iter_atomic ->
        memcpy_from_iter_mc

and that's then where the new code generation matters (i.e. does it
do that "inline with rep movsq" or "call memcpy_orig").

For David, the rep movsq is great. For the kernel test robot, it's bad.
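
And if somebody wants a very rough feel for which way their own
hardware leans, a crude user-space comparison of "rep movsb" vs "rep
movsq" on page-sized, never-touched buffers is easy to run (my
sketch; user space is obviously only an approximation of the kernel
paths above):

  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>
  #include <time.h>

  #define SZ    4096
  #define LOOPS 1000000L

  static void copy_movsb(void *dst, const void *src, size_t len)
  {
          asm volatile("rep movsb"
                       : "+D" (dst), "+S" (src), "+c" (len)
                       : : "memory");
  }

  static void copy_movsq(void *dst, const void *src, size_t len)
  {
          size_t qwords = len / 8;

          asm volatile("rep movsq"
                       : "+D" (dst), "+S" (src), "+c" (qwords)
                       : : "memory");
  }

  static double bench(void (*copy)(void *, const void *, size_t),
                      void *dst, const void *src)
  {
          struct timespec a, b;

          clock_gettime(CLOCK_MONOTONIC, &a);
          for (long i = 0; i < LOOPS; i++)
                  copy(dst, src, SZ);
          clock_gettime(CLOCK_MONOTONIC, &b);
          return (b.tv_sec - a.tv_sec) + (b.tv_nsec - a.tv_nsec) / 1e9;
  }

  int main(void)
  {
          void *src = aligned_alloc(4096, SZ);
          void *dst = aligned_alloc(4096, SZ);

          if (!src || !dst)
                  return 1;
          memset(src, 1, SZ);
          printf("rep movsb: %.3fs\n", bench(copy_movsb, dst, src));
          printf("rep movsq: %.3fs\n", bench(copy_movsq, dst, src));
          return 0;
  }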

                  Linus

Thread overview: 49+ messages
2023-11-07  1:40 [linus:master] [iov_iter] c9eec08bac: vm-scalability.throughput -16.9% regression kernel test robot
2023-11-15 12:48 ` David Howells
2023-11-15 13:18 ` David Howells
2023-11-15 15:20 ` David Howells
2023-11-15 16:53   ` Linus Torvalds
2023-11-15 17:38     ` Linus Torvalds
2023-11-15 18:38       ` Linus Torvalds
2023-11-15 19:09         ` Borislav Petkov
2023-11-15 19:15           ` Linus Torvalds
2023-11-15 20:07             ` Linus Torvalds
2023-11-16 10:07               ` David Laight
2023-11-16 10:14               ` David Howells
2023-11-16 11:38                 ` David Laight
2023-11-15 19:26           ` Linus Torvalds
2023-11-16 15:44             ` Borislav Petkov
2023-11-16 16:48               ` Linus Torvalds
2023-11-16 16:58                 ` David Laight
2023-11-17 11:44                 ` Borislav Petkov
2023-11-17 12:09                   ` Jakub Jelinek
2023-11-17 12:18                     ` Borislav Petkov
2023-11-17 13:09                   ` David Laight
2023-11-17 13:36                     ` Linus Torvalds
2023-11-17 15:20                       ` David Laight
2023-11-16 16:44             ` David Howells
2023-11-17 11:35               ` Borislav Petkov
2023-11-17 14:12               ` David Howells
2023-11-17 16:09                 ` Borislav Petkov
2023-11-17 16:32                   ` Linus Torvalds [this message]
2023-11-17 16:44                     ` Linus Torvalds
2023-11-17 19:12                       ` Borislav Petkov
2023-11-17 21:57                         ` Linus Torvalds
2023-11-20 13:32                         ` David Howells
2023-11-20 16:06                           ` Linus Torvalds
2023-11-20 16:09                           ` David Laight
2023-11-15 21:43       ` David Howells
2023-11-15 21:50         ` Linus Torvalds
2023-11-15 21:59           ` Borislav Petkov
2023-11-20 11:52           ` Borislav Petkov
2023-11-15 22:59         ` David Howells
2023-11-16  3:26           ` Linus Torvalds
2023-11-16 16:55             ` David Laight
2023-11-16 17:24               ` Linus Torvalds
2023-11-16 22:53                 ` David Laight
2023-11-16 21:09           ` David Howells
2023-11-16 22:36             ` Linus Torvalds
2023-11-15 18:35     ` David Howells
2023-11-15 18:45       ` Linus Torvalds
2023-11-15 19:09         ` Linus Torvalds
2023-11-15 20:54       ` David Howells
