linux-kernel.vger.kernel.org archive mirror
* [PATCH] x86: Alias memset to __builtin_memset.
@ 2020-03-23 11:42 Clement Courbet
  2020-03-23 15:47 ` Nathan Chancellor
                   ` (8 more replies)
  0 siblings, 9 replies; 17+ messages in thread
From: Clement Courbet @ 2020-03-23 11:42 UTC
  Cc: Clement Courbet, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, x86, linux-kernel, clang-built-linux

    Recent compilers know the meaning of builtins (`memset`,
    `memcpy`, ...) and can replace calls with inline code when
    they deem it profitable. For example, `memset(p, 0, 4)` will be
    lowered to a four-byte zero store.
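
As a minimal sketch of the lowering described above (the helper names
`clear_u32` and `clear_u32_store` are hypothetical and not part of the
patch): in a hosted build with optimization enabled, GCC and clang
typically compile both functions to the same single 4-byte store.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: zero a 32-bit field via memset with a small,
 * constant size. The compiler knows what memset means here and may
 * emit one 4-byte store instead of a library call. */
static inline void clear_u32(uint32_t *p)
{
	memset(p, 0, 4);
}

/* Hypothetical helper: the hand-written store that the call above is
 * expected to be lowered to. */
static inline void clear_u32_store(uint32_t *p)
{
	*p = 0;
}
```

Comparing the output of `cc -O2 -S` with and without `-ffreestanding`
is one way to check what a given compiler actually emits.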

    When using -ffreestanding (this is the case e.g. when building
    with clang), these optimizations are disabled. This means that
    **all** `memset` calls, including those with small, constant
    sizes, will result in an actual call to memset.

    We have identified several spots where we have high CPU usage
    because of this. For example, a single one of these memsets is
    responsible for about 0.3% of our total CPU usage in the kernel.

    Aliasing `memset` to `__builtin_memset` allows the compiler to
    perform this optimization even when -ffreestanding is used.
    There is no change when -ffreestanding is not used.
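
The aliasing can be tried outside the kernel with a sketch like the one
below. It assumes a GCC-compatible compiler; `struct pair` and
`pair_reset` are hypothetical names used only for illustration, and the
`#undef` is there only because a userspace `<string.h>` may already
define `memset` as a macro.

```c
#include <stddef.h>
#include <string.h>

/* Same idea as the patch: route memset through the builtin so the
 * compiler may expand small constant-size calls inline, even when
 * this file is built with -ffreestanding. */
#if defined(__GNUC__) && (__GNUC__ >= 4)
#undef memset
#define memset(s, c, count) __builtin_memset(s, c, count)
#endif

/* Hypothetical small object with a compile-time-known size. */
struct pair {
	long a;
	long b;
};

/* With the alias in place, this call is a candidate for inline stores
 * rather than an out-of-line call to memset. */
void pair_reset(struct pair *p)
{
	memset(p, 0, sizeof(*p));
}
```

If the compiler decides not to expand the builtin, it still emits a
regular call to `memset`, so behavior is unchanged either way.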

    Below is a diff of the clang output for `update_sg_lb_stats()`,
    which includes the aforementioned hot memset:
        memset(sgs, 0, sizeof(*sgs));

    Diff:
        movq %rsi, %rbx        ~~~>  movq $0x0, 0x40(%r8)
        movq %rdi, %r15        ~~~>  movq $0x0, 0x38(%r8)
        movl $0x48, %edx       ~~~>  movq $0x0, 0x30(%r8)
        movq %r8, %rdi         ~~~>  movq $0x0, 0x28(%r8)
        xorl %esi, %esi        ~~~>  movq $0x0, 0x20(%r8)
        callq <memset>         ~~~>  movq $0x0, 0x18(%r8)
                               ~~~>  movq $0x0, 0x10(%r8)
                               ~~~>  movq $0x0, 0x8(%r8)
                               ~~~>  movq $0x0, (%r8)
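
The size passed in `%edx` on the left is 0x48 (72) bytes, which matches
the nine 8-byte stores at offsets 0x0 through 0x40 on the right. A
sketch with a stand-in type of the same size (`struct stats72` is
hypothetical and is not the kernel's `struct sg_lb_stats`) shows the
kind of source that can produce such code:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the 0x48-byte object zeroed in the diff. */
struct stats72 {
	uint64_t q[9];
};

/* 9 quadwords * 8 bytes = 72 = 0x48 bytes. */
static_assert(sizeof(struct stats72) == 0x48, "expected 72 bytes");

/* Constant-size clear through the builtin; an optimizing compiler may
 * expand this into a short run of stores instead of a call. */
void stats72_clear(struct stats72 *s)
{
	__builtin_memset(s, 0, sizeof(*s));
}
```

Whether the compiler emits nine scalar stores, vector stores, or a call
depends on the target and optimization level.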

    In terms of code size, this grows the clang-built kernel a
    bit (+0.022%):
    440383608 vmlinux.clang.after
    440285512 vmlinux.clang.before

Signed-off-by: Clement Courbet <courbet@google.com>
---
 arch/x86/include/asm/string_64.h | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/arch/x86/include/asm/string_64.h b/arch/x86/include/asm/string_64.h
index 75314c3dbe47..7073c25aa4a3 100644
--- a/arch/x86/include/asm/string_64.h
+++ b/arch/x86/include/asm/string_64.h
@@ -18,6 +18,15 @@ extern void *__memcpy(void *to, const void *from, size_t len);
 void *memset(void *s, int c, size_t n);
 void *__memset(void *s, int c, size_t n);
 
+/* Recent compilers can generate much better code for known size and/or
+ * fill values, and will fall back on `memset` if they fail.
+ * We alias `memset` to `__builtin_memset` explicitly to inform the compiler to
+ * perform this optimization even when -ffreestanding is used.
+ */
+#if (__GNUC__ >= 4)
+#define memset(s, c, count) __builtin_memset(s, c, count)
+#endif
+
 #define __HAVE_ARCH_MEMSET16
 static inline void *memset16(uint16_t *s, uint16_t v, size_t n)
 {
@@ -74,6 +83,7 @@ int strcmp(const char *cs, const char *ct);
 #undef memcpy
 #define memcpy(dst, src, len) __memcpy(dst, src, len)
 #define memmove(dst, src, len) __memmove(dst, src, len)
+#undef memset
 #define memset(s, c, n) __memset(s, c, n)
 
 #ifndef __NO_FORTIFY
-- 
2.25.1.696.g5e7596f4ac-goog



Thread overview: 17+ messages
2020-03-23 11:42 [PATCH] x86: Alias memset to __builtin_memset Clement Courbet
2020-03-23 15:47 ` Nathan Chancellor
2020-03-24  1:22 ` Nick Desaulniers
2020-03-24  1:26   ` Nick Desaulniers
2020-03-24 14:06 ` Clement Courbet
2020-03-24 17:29   ` Nick Desaulniers
2020-03-24 14:07 ` [PATCH v2] " Clement Courbet
2020-03-24 15:08   ` Joe Perches
2020-03-24 15:59 ` [PATCH v3] " Clement Courbet
2020-03-25 11:33   ` Bernd Petrovitsch
2020-03-24 16:02 ` [PATCH] " Clement Courbet
2020-03-25  7:59 ` Clement Courbet
2020-03-26 12:25 ` [PATCH v4] " Clement Courbet
2020-03-26 12:38 ` [PATCH] " Clement Courbet
2020-03-26 17:18   ` Nick Desaulniers
2020-03-27  4:06   ` Michael Ellerman
2020-03-27 17:12   ` Segher Boessenkool
