linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Linus Torvalds <torvalds@linux-foundation.org>
To: Andy Lutomirski <luto@kernel.org>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>,
	Borislav Petkov <bp@alien8.de>,
	Rasmus Villemoes <mail@rasmusvillemoes.dk>,
	x86-ml <x86@kernel.org>, Josh Poimboeuf <jpoimboe@redhat.com>,
	lkml <linux-kernel@vger.kernel.org>
Subject: Re: [RFC] Improve memset
Date: Mon, 16 Sep 2019 14:29:52 -0700	[thread overview]
Message-ID: <CAHk-=whZJhU3c-djPcwCPyYh0y1YXKeyBuJZjq3CzW3v_YHgeg@mail.gmail.com> (raw)
In-Reply-To: <CALCETrX8sR8ELEvUpdHug498dU6+MWSy_SagaRbuZZ9fkztmfw@mail.gmail.com>

On Mon, Sep 16, 2019 at 10:41 AM Andy Lutomirski <luto@kernel.org> wrote:
>
> After some experimentation, I think y'all are just doing it wrong.
> GCC is very clever about this as long as it's given the chance.  This
> test, for example, generates excellent code:
>
> #include <string.h>
>
> __THROW __nonnull ((1)) __attribute__((always_inline)) void
> *memset(void *s, int c, size_t n)
> {
>     asm volatile ("nop");
>     return s;
> }
>
> /* generates 'nop' */
> void zero(void *dest, size_t size)
> {
>     __builtin_memset(dest, 0, size);
> }

I think the point was that we'd like to get the default memset (for
when __builtin_memset() doesn't generate inline code) also inlined
into just "rep stosb", instead of that tail-call "jmp memset".

> So I'm thinking maybe compiler.h should actually do something like:
>
> #define memset __builtin_memset
>
> and we should have some appropriate magic so that the memset inline is
> exempt from the macro.

That "appropriate magic" is easy enough: make sure the memset inline
shows up before the macro definition.

However, gcc never actually inlines the memset() for me, always doing
that "jmp memset"

> FWIW, this obviously wrong code:
>
> __THROW __nonnull ((1)) __attribute__((always_inline)) void
> *memset(void *s, int c, size_t n)
> {
>     __builtin_memset(s, c, n);
>     return s;
> }
>
> generates 'jmp memset'.  It's not entirely clear to me exactly what's
> happening here.

I think calling memset() is always the default fallback for
__builtin_memset, and because it can't be recursiveyl inlined, it's
done as a call. Which is then turned into a tailcall because the
calling conventions match, thus the "jmp memset".

But as mentioned, the example you claim generates excellent code
doesn't actually inline memset() at all for me, and they are all doing
"jmp memset" except for the cases that get turned into direct stores.

Eg (removing the cfi annotations etc stuff):

        zero:
                movq    %rsi, %rdx
                xorl    %esi, %esi
                jmp     memset

rather than that "nop" showing up inside the zero function.

But I agree that when __builtin_memset() generates manual inline code,
it does the right thing, ie

        memset_a_bit:
                movl    $0, (%rdi)
                ret

is clearly the right thing to do. We knew that.

                  Linus

  reply	other threads:[~2019-09-16 21:30 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-09-13  7:22 [RFC] Improve memset Borislav Petkov
2019-09-13  7:35 ` Ingo Molnar
2019-09-13  7:50   ` Borislav Petkov
2019-09-13  8:51 ` Rasmus Villemoes
2019-09-13  9:00 ` Linus Torvalds
2019-09-13  9:18   ` Rasmus Villemoes
2019-09-13 10:42     ` Borislav Petkov
2019-09-13 16:36       ` Borislav Petkov
2019-09-16  9:18         ` Rasmus Villemoes
2019-09-16 17:25           ` Linus Torvalds
2019-09-16 17:40             ` Andy Lutomirski
2019-09-16 21:29               ` Linus Torvalds [this message]
2019-09-16 23:13                 ` Andy Lutomirski
2019-09-16 23:26                   ` Linus Torvalds
2019-09-17  8:15             ` Borislav Petkov
2019-09-17 10:55             ` David Laight
2019-09-17 20:10 ` Josh Poimboeuf
2019-09-17 20:45   ` Linus Torvalds
2019-09-19 12:55     ` Borislav Petkov
2019-09-19 12:49   ` Borislav Petkov
2019-09-14  9:29 Alexey Dobriyan
2019-09-14 11:39 ` Borislav Petkov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAHk-=whZJhU3c-djPcwCPyYh0y1YXKeyBuJZjq3CzW3v_YHgeg@mail.gmail.com' \
    --to=torvalds@linux-foundation.org \
    --cc=bp@alien8.de \
    --cc=jpoimboe@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux@rasmusvillemoes.dk \
    --cc=luto@kernel.org \
    --cc=mail@rasmusvillemoes.dk \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).