From: Borislav Petkov <bp@alien8.de>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mark Hemment <markhemm@googlemail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	the arch/x86 maintainers <x86@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	patrice.chotard@foss.st.com,
	Mikulas Patocka <mpatocka@redhat.com>,
	Lukas Czerner <lczerner@redhat.com>,
	Christoph Hellwig <hch@lst.de>,
	"Darrick J. Wong" <djwong@kernel.org>,
	Chuck Lever <chuck.lever@oracle.com>,
	Hugh Dickins <hughd@google.com>,
	patches@lists.linux.dev, Linux-MM <linux-mm@kvack.org>,
	mm-commits@vger.kernel.org, Mel Gorman <mgorman@suse.de>
Subject: Re: [PATCH] x86/clear_user: Make it faster
Date: Wed, 22 Jun 2022 22:14:05 +0200
Message-ID: <YrN4DdR9HN0srNWe@zn.tnic>
In-Reply-To: <CAHk-=wgmOfipHDvshwooTV81hMh6FHieSvhgGVWZMX8w+E-2DQ@mail.gmail.com>

On Wed, Jun 22, 2022 at 10:06:42AM -0500, Linus Torvalds wrote:
> I'm not sure how valid the TSC thing is, with the extra
> synchronization maybe interacting with the whole microcode engine
> startup/stop thing.

Very possible.

So I went and ran the original microbenchmark which got people looking
into this in the first place, and with it, the numbers look very good:

before:

$ dd if=/dev/zero of=/dev/null bs=1024k status=progress
400823418880 bytes (401 GB, 373 GiB) copied, 17 s, 23.6 GB/s

after:

$ dd if=/dev/zero of=/dev/null bs=1024k status=progress
2696274771968 bytes (2.7 TB, 2.5 TiB) copied, 50 s, 53.9 GB/s

So that's very persuasive in my book.
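
For completeness, that dd invocation exercises exactly the path the
patch touches - reading /dev/zero zero-fills the user buffer through
clear_user(), so the throughput dd reports tracks the zeroing speed.
A minimal standalone sketch of the same thing (my illustration, with
arbitrary buffer size and iteration count):

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

int main(void)
{
	enum { BUF = 1 << 20 };			/* 1 MiB, matches bs=1024k */
	const long iters = 1 << 14;		/* 16 GiB total */
	char *buf = malloc(BUF);
	int fd = open("/dev/zero", O_RDONLY);
	struct timespec t0, t1;

	if (!buf || fd < 0)
		return 1;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (long i = 0; i < iters; i++)
		if (read(fd, buf, BUF) != BUF)	/* each read() ends up in clear_user() */
			return 1;
	clock_gettime(CLOCK_MONOTONIC, &t1);

	double secs = (t1.tv_sec - t0.tv_sec) +
		      (t1.tv_nsec - t0.tv_nsec) / 1e9;
	printf("%.1f GB/s\n", (double)BUF * iters / secs / 1e9);
	return 0;
}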

> I'm also not sure the rdtsc is doing the same thing on your AMD tests
> vs your Intel tests - I suspect you end up both using 'rdtscp' (as

That is correct.

> opposed to the 'lsync' variant we also have), but I don't think the
> ordering really is all that well defined architecturally, so AMD may
> have very different serialization rules than Intel does.
>
> .. and that serialization may well be different wrt normal load/stores
> and microcode.

Well, if that's the case: hw people have always been telling me to use
RDTSC to measure stuff, but I will object next time.
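
For reference, the recipe in question is the usual RDTSCP+LFENCE
bracketing - an illustrative sketch, not the exact code I used:

#include <stdint.h>
#include <x86intrin.h>	/* __rdtsc(), __rdtscp(), _mm_lfence() */

static inline uint64_t cycles_begin(void)
{
	uint64_t t;

	_mm_lfence();	/* retire earlier insns before starting the clock */
	t = __rdtsc();
	_mm_lfence();	/* don't let the timed code start before the read */
	return t;
}

static inline uint64_t cycles_end(void)
{
	unsigned int aux;
	uint64_t t = __rdtscp(&aux);	/* waits for prior insns to retire */

	_mm_lfence();	/* keep later code from overlapping the read */
	return t;
}

How strictly those fences order things wrt in-flight stores and the
microcoded REP STOSB underneath is exactly the part that may differ
between AMD and Intel.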

> So those numbers look like they have a 3% difference, but I'm not 100%
> convinced it might not be due to measuring artifacts. The fact that it
> worked well for you on your AMD platform doesn't necessarily mean that
> it has to work on icelake-x.

Well, it certainly is something uarch-specific because that machine had
an eval Icelake sample in it before, and that one would show the same
minute slowdown too. I had attributed it to the CPU being an eval
sample, but I guess uarch-wise that didn't matter.

> But it could equally easily be that "rep stosb" really just isn't any
> better on that platform, and the numbers are just giving the plain
> reality.

Right.

> Or it could mean that it makes some cache access decision ("this is
> big enough that let's not pollute L1 caches, do stores directly to
> L2") that might be better for actual performance afterwards, but that
> makes that clearing itself that bit slower.

There's that too.

> IOW, I do think that microbenchmarks are kind of suspect to begin
> with, and the rdtsc thing in particular may work better on some
> microarchitectures than it does others.
> 
> Very hard to make a judgment call - I think the only thing that really
> ends up mattering is the macro-benchmarks, but I think when you tried
> that it was way too noisy to actually show any real signal.

Yap, that was the reason why I went down to the microbenchmarks. But
even with the real benchmark, it would show slightly worse numbers on
ICL, which got me scratching my head as to why that is...

> That is, of course, a problem with memcpy and memset in general. It's
> easy to do microbenchmarks for them, it's not just clear whether said
> microbenchmarks give numbers that are actually meaningful, exactly
> because of things like cache replacement policy etc.
> 
> And finally, I will repeat that this particular code probably just
> isn't that important. The memory clearing for page allocation and
> regular memcpy is where most of the real time is spent, so I don't
> think that you should necessarily worry too much about this special
> case.

Yeah, I started poking at this because people would come with patches or
complain about stuff being slow, but then no one would actually sit down
and do the measurements...

Oh well, anyway, I still think we should take this because that dd thing
above is pretty good-lookin' even on ICL. And we now have a good example
of how this whole patching scheme should work - have the insns patched
in directly and only replace them with calls to the other variants on
the minority of machines.
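
I.e., something along these lines - a simplified, single-alternative
sketch, not the literal hunk from the patch, which wires up more
fallback variants plus the uaccess fault handling:

static __always_inline void sketch_clear_user(void __user *addr,
					      unsigned long size)
{
	/*
	 * The default, inlined path is "rep stosb"; apply_alternatives()
	 * replaces it with a call to the out-of-line variant only on
	 * CPUs without FSRM, i.e., the minority of machines.
	 */
	asm volatile(
		ALTERNATIVE("rep stosb",
			    "call clear_user_original",
			    ALT_NOT(X86_FEATURE_FSRM))
		: "+c" (size), "+D" (addr), ASM_CALL_CONSTRAINT
		: "a" (0)		/* the value to store */
		: "memory");
}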

And the ICL slowdown is small enough and kinda hard to measure...

Thoughts?

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
