linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Borislav Petkov <bp@alien8.de>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>,
	Rasmus Villemoes <mail@rasmusvillemoes.dk>,
	x86-ml <x86@kernel.org>, Andy Lutomirski <luto@kernel.org>,
	Josh Poimboeuf <jpoimboe@redhat.com>,
	lkml <linux-kernel@vger.kernel.org>
Subject: Re: [RFC] Improve memset
Date: Tue, 17 Sep 2019 10:15:49 +0200	[thread overview]
Message-ID: <20190917081549.GB12174@nazgul.tnic> (raw)
In-Reply-To: <CAHk-=wjdpJ+VapXfoZE8JRUfvMb8JrVTZe0=TDFYZ-ke+uqBOA@mail.gmail.com>

On Mon, Sep 16, 2019 at 10:25:25AM -0700, Linus Torvalds wrote:
> So the "inline constant sizes" case has advantages over and beyond the
> obvious ones. I suspect that a reasonable cut-off point is somethinig
> like "8*sizeof(long)". But look at things like "struct kstat" uses
> etc, the limit might actually be even higher than that.

Ok, sounds to me like we want to leave the small and build-time known
sizes to gcc to optimize. Especially if you want to clear only a subset
of the struct members and set the rest.

> Also note that while "rep stosb" is _reasonably_ good with current
> CPU's (ie roughly gen 8+), it's not so great a few generations ago
> (gen 6ish), and it can be absolutely horrid on older cores and/or
> atom. The limit for when it is a win ends up depending on whether I$
> footprint is an issue too, of course, but some of the bigger wins tend
> to happen when you have sizes >= 128.

Ok, so I have X86_FEATURE_ERMS set on this old IVB laptop so if gen8 and
newer do perform better, then we need to change our feature detection to
clear ERMS on those older generations and really leave it set only on
gen8+.

I'll change my benchmark to do explicit small-sized moves (instead of
using __builtin_memset) to see whether I can compare performance better.

Which also begs the question do we do __builtin_memset() at all or we
code those small-sized moves ourselves? We'll lose the advantage of the
partial struct members clearing but then we avoid the differences the
different compiler versions would introduce when the builtin is used.

And last but not least we should not leave readability somewhere along
the way: I'd prefer good performance and maintainable code than some
hypothetical max perf on some uarch but ugly and unreadable macro
mess...

> You can basically always beat "rep movs/stos" with hand-tuned AVX2/512
> code for specific cases if you don't look at I$ footprint and the cost
> of the AVX setup (and the cost of frequency changes, which often go
> hand-in-hand with the AVX use). So "rep movs/stos" is seldom
> _optimal_, but it tends to be "quite good" for modern CPU's with
> variable sizes that are in the 100+ byte range.

Right. If we did this, it would be a conditional in front of the REP;
STOS but that ain't worse than our current CALL+JMP.

Thx.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

  parent reply	other threads:[~2019-09-17  8:16 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-09-13  7:22 [RFC] Improve memset Borislav Petkov
2019-09-13  7:35 ` Ingo Molnar
2019-09-13  7:50   ` Borislav Petkov
2019-09-13  8:51 ` Rasmus Villemoes
2019-09-13  9:00 ` Linus Torvalds
2019-09-13  9:18   ` Rasmus Villemoes
2019-09-13 10:42     ` Borislav Petkov
2019-09-13 16:36       ` Borislav Petkov
2019-09-16  9:18         ` Rasmus Villemoes
2019-09-16 17:25           ` Linus Torvalds
2019-09-16 17:40             ` Andy Lutomirski
2019-09-16 21:29               ` Linus Torvalds
2019-09-16 23:13                 ` Andy Lutomirski
2019-09-16 23:26                   ` Linus Torvalds
2019-09-17  8:15             ` Borislav Petkov [this message]
2019-09-17 10:55             ` David Laight
2019-09-17 20:10 ` Josh Poimboeuf
2019-09-17 20:45   ` Linus Torvalds
2019-09-19 12:55     ` Borislav Petkov
2019-09-19 12:49   ` Borislav Petkov
2019-09-14  9:29 Alexey Dobriyan
2019-09-14 11:39 ` Borislav Petkov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190917081549.GB12174@nazgul.tnic \
    --to=bp@alien8.de \
    --cc=jpoimboe@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux@rasmusvillemoes.dk \
    --cc=luto@kernel.org \
    --cc=mail@rasmusvillemoes.dk \
    --cc=torvalds@linux-foundation.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).