linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Rasmus Villemoes <linux@rasmusvillemoes.dk>
To: Borislav Petkov <bp@alien8.de>, x86-ml <x86@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>,
	Josh Poimboeuf <jpoimboe@redhat.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	lkml <linux-kernel@vger.kernel.org>
Subject: Re: [RFC] Improve memset
Date: Fri, 13 Sep 2019 10:51:46 +0200	[thread overview]
Message-ID: <51e0a92e-f976-2a3c-b583-cc7696e711bf@rasmusvillemoes.dk> (raw)
In-Reply-To: <20190913072237.GA12381@zn.tnic>

On 13/09/2019 09.22, Borislav Petkov wrote:
> 
> Instead of calling memset:
> 
> ffffffff8100cd8d:       e8 0e 15 7a 00          callq  ffffffff817ae2a0 <__memset>
> 
> and having a JMP inside it depending on the feature supported, let's simply
> have the REP; STOSB directly in the code:
> 
> ...
> ffffffff81000442:       4c 89 d7                mov    %r10,%rdi
> ffffffff81000445:       b9 00 10 00 00          mov    $0x1000,%ecx
> 
> <---- new memset
> ffffffff8100044a:       f3 aa                   rep stos %al,%es:(%rdi)
> ffffffff8100044c:       90                      nop
> ffffffff8100044d:       90                      nop
> ffffffff8100044e:       90                      nop
> <----
> 

> The result is this:
> 
> static __always_inline void *memset(void *dest, int c, size_t n)
> {
>         void *ret, *dummy;

How is this going to affect cases like memset(p, 0, 4/8/16); where gcc
would normally just do one or two word stores? Is rep; stosb still
competitive in that case? It seems that gcc doesn't recognize memset as
a builtin with this always_inline definition present [1].

>         asm volatile(ALTERNATIVE_2_REVERSE("rep; stosb",
>                                            "call memset_rep",  X86_FEATURE_ERMS,
>                                            "call memset_orig", X86_FEATURE_REP_GOOD)
>                 : "=&D" (ret), "=a" (dummy)
>                 : "0" (dest), "a" (c), "c" (n)
>                 /* clobbers used by memset_orig() and memset_rep_good() */
>                 : "rsi", "rdx", "r8", "r9", "memory");
> 
>         return dest;
> }
> 

Also, am I reading this

>  #include <asm/export.h>
>  
> -.weak memset
> -
>  /*
>   */
> -ENTRY(memset)
> -ENTRY(__memset)

right so there's no longer an actual memset symbol in vmlinux? It seems
that despite the above always_inline definition of memset, gcc can still
emit calls to that to implement large initalizations. (The gcc docs are
actually explicit about that, even under -ffreestanding, "GCC requires
the freestanding environment provide 'memcpy', 'memmove', 'memset' and
'memcmp'.")

[1] I tried this silly stripped-down version with gcc-8:

//#include <string.h>
#include <stddef.h>

#if 1
#define always_inline __inline__ __attribute__((__always_inline__))
static always_inline void *memset(void *dest, int c, size_t n)
{
        void *ret, *dummy;

        asm volatile("rep; stosb"
                : "=&D" (ret), "=a" (dummy)
                : "0" (dest), "a" (c), "c" (n)
                /* clobbers used by memset_orig() and memset_rep_good() */
                : "rsi", "rdx", "r8", "r9", "memory");

        return dest;
}
#endif

struct s { long x; long y; };
int h(struct s *s);
int f(void)
{
	struct s s;
	memset(&s, 0, sizeof(s));
	return h(&s);
}

int g(void)
{
	struct s s[1024] = {};
	return h(s);
}

With or without the string.h include, the result was

0000000000000000 <f>:
   0:   48 83 ec 18             sub    $0x18,%rsp
   4:   31 c0                   xor    %eax,%eax
   6:   b9 10 00 00 00          mov    $0x10,%ecx
   b:   49 89 e2                mov    %rsp,%r10
   e:   4c 89 d7                mov    %r10,%rdi
  11:   f3 aa                   rep stos %al,%es:(%rdi)
  13:   4c 89 d7                mov    %r10,%rdi
  16:   e8 00 00 00 00          callq  1b <f+0x1b>
                        17: R_X86_64_PLT32      h-0x4
  1b:   48 83 c4 18             add    $0x18,%rsp
  1f:   c3                      retq

0000000000000020 <g>:
  20:   48 81 ec 08 40 00 00    sub    $0x4008,%rsp
  27:   ba 00 40 00 00          mov    $0x4000,%edx
  2c:   31 f6                   xor    %esi,%esi
  2e:   48 89 e1                mov    %rsp,%rcx
  31:   48 89 cf                mov    %rcx,%rdi
  34:   e8 00 00 00 00          callq  39 <g+0x19>
                        35: R_X86_64_PLT32      memset-0x4
  39:   48 89 c7                mov    %rax,%rdi
  3c:   e8 00 00 00 00          callq  41 <g+0x21>
                        3d: R_X86_64_PLT32      h-0x4
  41:   48 81 c4 08 40 00 00    add    $0x4008,%rsp
  48:   c3                      retq


With the asm version #if 0'ed out, f() uses two movq (and yields a
warning if the string.h include is omitted, but it still recognizes
memset()).

Rasmus

  parent reply	other threads:[~2019-09-13  8:51 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-09-13  7:22 [RFC] Improve memset Borislav Petkov
2019-09-13  7:35 ` Ingo Molnar
2019-09-13  7:50   ` Borislav Petkov
2019-09-13  8:51 ` Rasmus Villemoes [this message]
2019-09-13  9:00 ` Linus Torvalds
2019-09-13  9:18   ` Rasmus Villemoes
2019-09-13 10:42     ` Borislav Petkov
2019-09-13 16:36       ` Borislav Petkov
2019-09-16  9:18         ` Rasmus Villemoes
2019-09-16 17:25           ` Linus Torvalds
2019-09-16 17:40             ` Andy Lutomirski
2019-09-16 21:29               ` Linus Torvalds
2019-09-16 23:13                 ` Andy Lutomirski
2019-09-16 23:26                   ` Linus Torvalds
2019-09-17  8:15             ` Borislav Petkov
2019-09-17 10:55             ` David Laight
2019-09-17 20:10 ` Josh Poimboeuf
2019-09-17 20:45   ` Linus Torvalds
2019-09-19 12:55     ` Borislav Petkov
2019-09-19 12:49   ` Borislav Petkov
2019-09-14  9:29 Alexey Dobriyan
2019-09-14 11:39 ` Borislav Petkov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=51e0a92e-f976-2a3c-b583-cc7696e711bf@rasmusvillemoes.dk \
    --to=linux@rasmusvillemoes.dk \
    --cc=bp@alien8.de \
    --cc=jpoimboe@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).