linux-kernel.vger.kernel.org archive mirror
* [PATCH]     x86: Alias memset to __builtin_memset.
@ 2020-03-23 11:42 Clement Courbet
  2020-03-23 15:47 ` Nathan Chancellor
                   ` (8 more replies)
  0 siblings, 9 replies; 17+ messages in thread
From: Clement Courbet @ 2020-03-23 11:42 UTC (permalink / raw)
  Cc: Clement Courbet, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, x86, linux-kernel, clang-built-linux

    Recent compilers know the meaning of builtins (`memset`,
    `memcpy`, ...) and can replace calls by inline code when
    deemed better. For example, `memset(p, 0, 4)` will be lowered
    to a four-byte zero store.
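
    For illustration, a minimal userspace sketch of this lowering
    (`zero4()` is a hypothetical example; the exact instructions
    depend on the compiler, target and optimization level):

        #include <string.h>

        void zero4(unsigned int *p)
        {
                /* With builtins enabled: a single `movl $0x0, (%rdi)`.
                 * Under -ffreestanding: an actual `callq memset`. */
                memset(p, 0, 4);
        }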

    When using -ffreestanding (as is the case e.g. when building with
    clang), these optimizations are disabled. This means that **all**
    memsets, including those with small, constant sizes, will result
    in an actual call to memset.

    We have identified several spots where we have high CPU usage
    because of this. For example, a single one of these memsets is
    responsible for about 0.3% of our total CPU usage in the kernel.

    Aliasing `memset` to `__builtin_memset` allows the compiler to
    perform this optimization even when -ffreestanding is used.
    There is no change when -ffreestanding is not used.

    Below is a diff (clang) for `update_sg_lb_stats()`, which
    includes the aforementioned hot memset:
        memset(sgs, 0, sizeof(*sgs));

    Diff:
        movq %rsi, %rbx        ~~~>  movq $0x0, 0x40(%r8)
        movq %rdi, %r15        ~~~>  movq $0x0, 0x38(%r8)
        movl $0x48, %edx       ~~~>  movq $0x0, 0x30(%r8)
        movq %r8, %rdi         ~~~>  movq $0x0, 0x28(%r8)
        xorl %esi, %esi        ~~~>  movq $0x0, 0x20(%r8)
        callq <memset>         ~~~>  movq $0x0, 0x18(%r8)
                               ~~~>  movq $0x0, 0x10(%r8)
                               ~~~>  movq $0x0, 0x8(%r8)
                               ~~~>  movq $0x0, (%r8)

    In terms of code size, this grows the clang-built kernel a
    bit (+0.022%):
    440285512 vmlinux.clang.after
    440383608 vmlinux.clang.before

Signed-off-by: Clement Courbet <courbet@google.com>
---
 arch/x86/include/asm/string_64.h | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/arch/x86/include/asm/string_64.h b/arch/x86/include/asm/string_64.h
index 75314c3dbe47..7073c25aa4a3 100644
--- a/arch/x86/include/asm/string_64.h
+++ b/arch/x86/include/asm/string_64.h
@@ -18,6 +18,15 @@ extern void *__memcpy(void *to, const void *from, size_t len);
 void *memset(void *s, int c, size_t n);
 void *__memset(void *s, int c, size_t n);
 
+/* Recent compilers can generate much better code for known size and/or
+ * fill values, and will fall back on `memset` if they fail.
+ * We alias `memset` to `__builtin_memset` explicitly to inform the compiler to
+ * perform this optimization even when -ffreestanding is used.
+ */
+#if (__GNUC__ >= 4)
+#define memset(s, c, count) __builtin_memset(s, c, count)
+#endif
+
 #define __HAVE_ARCH_MEMSET16
 static inline void *memset16(uint16_t *s, uint16_t v, size_t n)
 {
@@ -74,6 +83,7 @@ int strcmp(const char *cs, const char *ct);
 #undef memcpy
 #define memcpy(dst, src, len) __memcpy(dst, src, len)
 #define memmove(dst, src, len) __memmove(dst, src, len)
+#undef memset
 #define memset(s, c, n) __memset(s, c, n)
 
 #ifndef __NO_FORTIFY
-- 
2.25.1.696.g5e7596f4ac-goog



* Re: [PATCH]     x86: Alias memset to __builtin_memset.
  2020-03-23 11:42 [PATCH] x86: Alias memset to __builtin_memset Clement Courbet
@ 2020-03-23 15:47 ` Nathan Chancellor
  2020-03-24  1:22 ` Nick Desaulniers
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 17+ messages in thread
From: Nathan Chancellor @ 2020-03-23 15:47 UTC (permalink / raw)
  To: Clement Courbet
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	x86, linux-kernel, clang-built-linux

On Mon, Mar 23, 2020 at 12:42:06PM +0100, 'Clement Courbet' via Clang Built Linux wrote:
>     Recent compilers know the meaning of builtins (`memset`,
>     `memcpy`, ...) and can replace calls by inline code when
>     deemed better. For example, `memset(p, 0, 4)` will be lowered
>     to a four-byte zero store.
> 
>     When using -ffreestanding (as is the case e.g. when building with
>     clang), these optimizations are disabled. This means that **all**
>     memsets, including those with small, constant sizes, will result
>     in an actual call to memset.
> 
>     We have identified several spots where we have high CPU usage
>     because of this. For example, a single one of these memsets is
>     responsible for about 0.3% of our total CPU usage in the kernel.
> 
>     Aliasing `memset` to `__builtin_memset` allows the compiler to
>     perform this optimization even when -ffreestanding is used.
>     There is no change when -ffreestanding is not used.
> 
>     Below is a diff (clang) for `update_sg_lb_stats()`, which
>     includes the aforementioned hot memset:
>         memset(sgs, 0, sizeof(*sgs));
> 
>     Diff:
>         movq %rsi, %rbx        ~~~>  movq $0x0, 0x40(%r8)
>         movq %rdi, %r15        ~~~>  movq $0x0, 0x38(%r8)
>         movl $0x48, %edx       ~~~>  movq $0x0, 0x30(%r8)
>         movq %r8, %rdi         ~~~>  movq $0x0, 0x28(%r8)
>         xorl %esi, %esi        ~~~>  movq $0x0, 0x20(%r8)
>         callq <memset>         ~~~>  movq $0x0, 0x18(%r8)
>                                ~~~>  movq $0x0, 0x10(%r8)
>                                ~~~>  movq $0x0, 0x8(%r8)
>                                ~~~>  movq $0x0, (%r8)
> 
>     In terms of code size, this grows the clang-built kernel a
>     bit (+0.022%):
>     440285512 vmlinux.clang.after
>     440383608 vmlinux.clang.before
> 
> Signed-off-by: Clement Courbet <courbet@google.com>
> ---
>  arch/x86/include/asm/string_64.h | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/arch/x86/include/asm/string_64.h b/arch/x86/include/asm/string_64.h
> index 75314c3dbe47..7073c25aa4a3 100644
> --- a/arch/x86/include/asm/string_64.h
> +++ b/arch/x86/include/asm/string_64.h
> @@ -18,6 +18,15 @@ extern void *__memcpy(void *to, const void *from, size_t len);
>  void *memset(void *s, int c, size_t n);
>  void *__memset(void *s, int c, size_t n);
>  
> +/* Recent compilers can generate much better code for known size and/or
> + * fill values, and will fall back on `memset` if they fail.
> + * We alias `memset` to `__builtin_memset` explicitly to inform the compiler to
> + * perform this optimization even when -ffreestanding is used.
> + */
> +#if (__GNUC__ >= 4)

This ifdef is unnecessary; it will always be true because the minimum
supported GCC version is 4.6 and clang pretends to be GCC 4.2.1. It
appears the Intel compiler pretends to be whatever GCC version the
host is using (no idea if it is still used by anyone or truly
supported, but still worth mentioning), so the check would still
always be true because of the GCC 4.6 requirement.
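
For reference, trees of this era already centralize the compiler
version into a single number (a sketch of the macro from
include/linux/compiler-gcc.h), so open-coded __GNUC__ tests are
rarely needed:

    #define GCC_VERSION (__GNUC__ * 10000 \
                         + __GNUC_MINOR__ * 100 \
                         + __GNUC_PATCHLEVEL__)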

I cannot comment on the validity of the patch even though the reasoning
seems sound to me.

Cheers,
Nathan

> +#define memset(s, c, count) __builtin_memset(s, c, count)
> +#endif
> +
>  #define __HAVE_ARCH_MEMSET16
>  static inline void *memset16(uint16_t *s, uint16_t v, size_t n)
>  {
> @@ -74,6 +83,7 @@ int strcmp(const char *cs, const char *ct);
>  #undef memcpy
>  #define memcpy(dst, src, len) __memcpy(dst, src, len)
>  #define memmove(dst, src, len) __memmove(dst, src, len)
> +#undef memset
>  #define memset(s, c, n) __memset(s, c, n)
>  
>  #ifndef __NO_FORTIFY
> -- 
> 2.25.1.696.g5e7596f4ac-goog
> 


* Re: [PATCH] x86: Alias memset to __builtin_memset.
  2020-03-23 11:42 [PATCH] x86: Alias memset to __builtin_memset Clement Courbet
  2020-03-23 15:47 ` Nathan Chancellor
@ 2020-03-24  1:22 ` Nick Desaulniers
  2020-03-24  1:26   ` Nick Desaulniers
  2020-03-24 14:06 ` Clement Courbet
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 17+ messages in thread
From: Nick Desaulniers @ 2020-03-24  1:22 UTC (permalink / raw)
  To: Clement Courbet
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	LKML, clang-built-linux

On Mon, Mar 23, 2020 at 4:43 AM 'Clement Courbet' via Clang Built
Linux <clang-built-linux@googlegroups.com> wrote:
>
>     Recent compilers know the meaning of builtins (`memset`,
>     `memcpy`, ...) and can replace calls by inline code when
>     deemed better. For example, `memset(p, 0, 4)` will be lowered
>     to a four-byte zero store.
>
>     When using -ffreestanding (as is the case e.g. when building with
>     clang), these optimizations are disabled. This means that **all**
>     memsets, including those with small, constant sizes, will result
>     in an actual call to memset.

Isn't this only added for 32b x86 (if I'm reading arch/x86/Makefile
right)? Who's adding it for 64b?

arch/x86/Makefile has a comment:
 88         # temporary until string.h is fixed
 89         KBUILD_CFLAGS += -ffreestanding
Did you look into fixing that?

>
>     We have identified several spots where we have high CPU usage
>     because of this. For example, a single one of these memsets is
>     responsible for about 0.3% of our total CPU usage in the kernel.
>
>     Aliasing `memset` to `__builtin_memset` allows the compiler to
>     perform this optimization even when -ffreestanding is used.
>     There is no change when -ffreestanding is not used.
>
>     Below is a diff (clang) for `update_sg_lb_stats()`, which
>     includes the aforementioned hot memset:
>         memset(sgs, 0, sizeof(*sgs));
>
>     Diff:
>         movq %rsi, %rbx        ~~~>  movq $0x0, 0x40(%r8)
>         movq %rdi, %r15        ~~~>  movq $0x0, 0x38(%r8)
>         movl $0x48, %edx       ~~~>  movq $0x0, 0x30(%r8)
>         movq %r8, %rdi         ~~~>  movq $0x0, 0x28(%r8)
>         xorl %esi, %esi        ~~~>  movq $0x0, 0x20(%r8)
>         callq <memset>         ~~~>  movq $0x0, 0x18(%r8)
>                                ~~~>  movq $0x0, 0x10(%r8)
>                                ~~~>  movq $0x0, 0x8(%r8)
>                                ~~~>  movq $0x0, (%r8)
>
>     In terms of code size, this grows the clang-built kernel a
>     bit (+0.022%):
>     440285512 vmlinux.clang.after
>     440383608 vmlinux.clang.before

The before number looks bigger? Did it shrink in size, or was the
above mislabeled?

>
> Signed-off-by: Clement Courbet <courbet@google.com>
> ---
>  arch/x86/include/asm/string_64.h | 10 ++++++++++
>  1 file changed, 10 insertions(+)
>
> diff --git a/arch/x86/include/asm/string_64.h b/arch/x86/include/asm/string_64.h
> index 75314c3dbe47..7073c25aa4a3 100644
> --- a/arch/x86/include/asm/string_64.h
> +++ b/arch/x86/include/asm/string_64.h
> @@ -18,6 +18,15 @@ extern void *__memcpy(void *to, const void *from, size_t len);
>  void *memset(void *s, int c, size_t n);
>  void *__memset(void *s, int c, size_t n);
>
> +/* Recent compilers can generate much better code for known size and/or
> + * fill values, and will fall back on `memset` if they fail.
> + * We alias `memset` to `__builtin_memset` explicitly to inform the compiler to
> + * perform this optimization even when -ffreestanding is used.
> + */
> +#if (__GNUC__ >= 4)
> +#define memset(s, c, count) __builtin_memset(s, c, count)
> +#endif
> +
>  #define __HAVE_ARCH_MEMSET16
>  static inline void *memset16(uint16_t *s, uint16_t v, size_t n)
>  {
> @@ -74,6 +83,7 @@ int strcmp(const char *cs, const char *ct);
>  #undef memcpy
>  #define memcpy(dst, src, len) __memcpy(dst, src, len)
>  #define memmove(dst, src, len) __memmove(dst, src, len)
> +#undef memset
>  #define memset(s, c, n) __memset(s, c, n)
>
>  #ifndef __NO_FORTIFY
> --

-- 
Thanks,
~Nick Desaulniers


* Re: [PATCH] x86: Alias memset to __builtin_memset.
  2020-03-24  1:22 ` Nick Desaulniers
@ 2020-03-24  1:26   ` Nick Desaulniers
  0 siblings, 0 replies; 17+ messages in thread
From: Nick Desaulniers @ 2020-03-24  1:26 UTC (permalink / raw)
  To: Clement Courbet
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	LKML, clang-built-linux

On Mon, Mar 23, 2020 at 6:22 PM Nick Desaulniers
<ndesaulniers@google.com> wrote:
>
> On Mon, Mar 23, 2020 at 4:43 AM 'Clement Courbet' via Clang Built
> Linux <clang-built-linux@googlegroups.com> wrote:
> >
> >     Recent compilers know the meaning of builtins (`memset`,
> >     `memcpy`, ...) and can replace calls by inline code when
> >     deemed better. For example, `memset(p, 0, 4)` will be lowered
> >     to a four-byte zero store.
> >
> >     When using -ffreestanding (as is the case e.g. when building with
> >     clang), these optimizations are disabled. This means that **all**
> >     memsets, including those with small, constant sizes, will result
> >     in an actual call to memset.
>
> Isn't this only added for 32b x86 (if I'm reading arch/x86/Makefile
> right)? Who's adding it for 64b?
>
> arch/x86/Makefile has a comment:
>  88         # temporary until string.h is fixed
>  89         KBUILD_CFLAGS += -ffreestanding
> Did you look into fixing that?
>
> >
> >     We have identified several spots where we have high CPU usage
> >     because of this. For example, a single one of these memsets is
> >     responsible for about 0.3% of our total CPU usage in the kernel.
> >
> >     Aliasing `memset` to `__builtin_memset` allows the compiler to
> >     perform this optimization even when -ffreestanding is used.
> >     There is no change when -ffreestanding is not used.
> >
> >     Below is a diff (clang) for `update_sg_lb_stats()`, which
> >     includes the aforementioned hot memset:
> >         memset(sgs, 0, sizeof(*sgs));

Further, `make CC=clang -j71 kernel/sched/fair.o V=1` doesn't show
`-ffreestanding` being used.  Any idea where it's coming from in your
build? Maybe a local modification to be removed?

> >
> >     Diff:
> >         movq %rsi, %rbx        ~~~>  movq $0x0, 0x40(%r8)
> >         movq %rdi, %r15        ~~~>  movq $0x0, 0x38(%r8)
> >         movl $0x48, %edx       ~~~>  movq $0x0, 0x30(%r8)
> >         movq %r8, %rdi         ~~~>  movq $0x0, 0x28(%r8)
> >         xorl %esi, %esi        ~~~>  movq $0x0, 0x20(%r8)
> >         callq <memset>         ~~~>  movq $0x0, 0x18(%r8)
> >                                ~~~>  movq $0x0, 0x10(%r8)
> >                                ~~~>  movq $0x0, 0x8(%r8)
> >                                ~~~>  movq $0x0, (%r8)
> >
> >     In terms of code size, this grows the clang-built kernel a
> >     bit (+0.022%):
> >     440285512 vmlinux.clang.after
> >     440383608 vmlinux.clang.before
>
> The before number looks bigger? Did it shrink in size, or was the
> above mislabeled?
>
> >
> > Signed-off-by: Clement Courbet <courbet@google.com>
> > ---
> >  arch/x86/include/asm/string_64.h | 10 ++++++++++
> >  1 file changed, 10 insertions(+)
> >
> > diff --git a/arch/x86/include/asm/string_64.h b/arch/x86/include/asm/string_64.h
> > index 75314c3dbe47..7073c25aa4a3 100644
> > --- a/arch/x86/include/asm/string_64.h
> > +++ b/arch/x86/include/asm/string_64.h
> > @@ -18,6 +18,15 @@ extern void *__memcpy(void *to, const void *from, size_t len);
> >  void *memset(void *s, int c, size_t n);
> >  void *__memset(void *s, int c, size_t n);
> >
> > +/* Recent compilers can generate much better code for known size and/or
> > + * fill values, and will fall back on `memset` if they fail.
> > + * We alias `memset` to `__builtin_memset` explicitly to inform the compiler to
> > + * perform this optimization even when -ffreestanding is used.
> > + */
> > +#if (__GNUC__ >= 4)
> > +#define memset(s, c, count) __builtin_memset(s, c, count)
> > +#endif
> > +
> >  #define __HAVE_ARCH_MEMSET16
> >  static inline void *memset16(uint16_t *s, uint16_t v, size_t n)
> >  {
> > @@ -74,6 +83,7 @@ int strcmp(const char *cs, const char *ct);
> >  #undef memcpy
> >  #define memcpy(dst, src, len) __memcpy(dst, src, len)
> >  #define memmove(dst, src, len) __memmove(dst, src, len)
> > +#undef memset
> >  #define memset(s, c, n) __memset(s, c, n)
> >
> >  #ifndef __NO_FORTIFY
> > --
>
> --
> Thanks,
> ~Nick Desaulniers



-- 
Thanks,
~Nick Desaulniers


* [PATCH]     x86: Alias memset to __builtin_memset.
  2020-03-23 11:42 [PATCH] x86: Alias memset to __builtin_memset Clement Courbet
  2020-03-23 15:47 ` Nathan Chancellor
  2020-03-24  1:22 ` Nick Desaulniers
@ 2020-03-24 14:06 ` Clement Courbet
  2020-03-24 17:29   ` Nick Desaulniers
  2020-03-24 14:07 ` [PATCH v2] " Clement Courbet
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 17+ messages in thread
From: Clement Courbet @ 2020-03-24 14:06 UTC (permalink / raw)
  Cc: Nathan Chancellor, Kees Cook, Nick Desaulniers, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin, x86,
	Clement Courbet, linux-kernel, clang-built-linux

Thanks for the comments. Answers below.

> This ifdef is unnecessary
> This needs to be within #ifndef CONFIG_FORTIFY_SOURCE

Thanks, fixed.

> shouldn't this just be done universally?

It looks like every architecture does its own magic with memory
functions. I'm not very familiar with how the Linux kernel is
organized; do you have a pointer for where this would go if enabled
globally?

> Who's adding it for 64b?
> Any idea where it's coming from in your
> build? Maybe a local modification to be removed?

Actually this is from our local build configuration. Apparently this
is needed to build on some architectures that redefine common
symbols, but the authors seemed to feel strongly that this should be
on for all architectures. I've reached out to the authors for an
extended rationale.
On the other hand I think that there is no reason to prevent people
from building with freestanding if we can easily allow them to.





* [PATCH v2]     x86: Alias memset to __builtin_memset.
  2020-03-23 11:42 [PATCH] x86: Alias memset to __builtin_memset Clement Courbet
                   ` (2 preceding siblings ...)
  2020-03-24 14:06 ` Clement Courbet
@ 2020-03-24 14:07 ` Clement Courbet
  2020-03-24 15:08   ` Joe Perches
  2020-03-24 15:59 ` [PATCH v3] " Clement Courbet
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 17+ messages in thread
From: Clement Courbet @ 2020-03-24 14:07 UTC (permalink / raw)
  Cc: Nathan Chancellor, Kees Cook, Nick Desaulniers, Clement Courbet,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	x86, linux-kernel, clang-built-linux

    Recent compilers know the meaning of builtins (`memset`,
    `memcpy`, ...) and can replace calls by inline code when
    deemed better. For example, `memset(p, 0, 4)` will be lowered
    to a four-byte zero store.

    When using -ffreestanding (as is the case e.g. when building with
    clang), these optimizations are disabled. This means that **all**
    memsets, including those with small, constant sizes, will result
    in an actual call to memset.

    We have identified several spots where we have high CPU usage
    because of this. For example, a single one of these memsets is
    responsible for about 0.3% of our total CPU usage in the kernel.

    Aliasing `memset` to `__builtin_memset` allows the compiler to
    perform this optimization even when -ffreestanding is used.
    There is no change when -ffreestanding is not used.

    Below is a diff (clang) for `update_sg_lb_stats()`, which
    includes the aforementioned hot memset:
        memset(sgs, 0, sizeof(*sgs));

    Diff:
        movq %rsi, %rbx        ~~~>  movq $0x0, 0x40(%r8)
        movq %rdi, %r15        ~~~>  movq $0x0, 0x38(%r8)
        movl $0x48, %edx       ~~~>  movq $0x0, 0x30(%r8)
        movq %r8, %rdi         ~~~>  movq $0x0, 0x28(%r8)
        xorl %esi, %esi        ~~~>  movq $0x0, 0x20(%r8)
        callq <memset>         ~~~>  movq $0x0, 0x18(%r8)
                               ~~~>  movq $0x0, 0x10(%r8)
                               ~~~>  movq $0x0, 0x8(%r8)
                               ~~~>  movq $0x0, (%r8)

    In terms of code size, this grows the clang-built kernel a
    bit (+0.022%):
    440285512 vmlinux.clang.after
    440383608 vmlinux.clang.before

Signed-off-by: Clement Courbet <courbet@google.com>

---

changes in v2:
  - Removed ifdef(GNUC >= 4).
  - Disabled change when CONFIG_FORTIFY_SOURCE is set.
---
 arch/x86/include/asm/string_64.h | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/arch/x86/include/asm/string_64.h b/arch/x86/include/asm/string_64.h
index 75314c3dbe47..9cfce0a840a4 100644
--- a/arch/x86/include/asm/string_64.h
+++ b/arch/x86/include/asm/string_64.h
@@ -18,6 +18,15 @@ extern void *__memcpy(void *to, const void *from, size_t len);
 void *memset(void *s, int c, size_t n);
 void *__memset(void *s, int c, size_t n);
 
+/* Recent compilers can generate much better code for known size and/or
+ * fill values, and will fall back on `memset` if they fail.
+ * We alias `memset` to `__builtin_memset` explicitly to inform the compiler to
+ * perform this optimization even when -ffreestanding is used.
+ */
+#if !defined(CONFIG_FORTIFY_SOURCE)
+#define memset(s, c, count) __builtin_memset(s, c, count)
+#endif
+
 #define __HAVE_ARCH_MEMSET16
 static inline void *memset16(uint16_t *s, uint16_t v, size_t n)
 {
@@ -74,6 +83,7 @@ int strcmp(const char *cs, const char *ct);
 #undef memcpy
 #define memcpy(dst, src, len) __memcpy(dst, src, len)
 #define memmove(dst, src, len) __memmove(dst, src, len)
+#undef memset
 #define memset(s, c, n) __memset(s, c, n)
 
 #ifndef __NO_FORTIFY
-- 
2.25.1.696.g5e7596f4ac-goog



* Re: [PATCH v2]     x86: Alias memset to __builtin_memset.
  2020-03-24 14:07 ` [PATCH v2] " Clement Courbet
@ 2020-03-24 15:08   ` Joe Perches
  0 siblings, 0 replies; 17+ messages in thread
From: Joe Perches @ 2020-03-24 15:08 UTC (permalink / raw)
  To: Clement Courbet
  Cc: Nathan Chancellor, Kees Cook, Nick Desaulniers, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin, x86, linux-kernel,
	clang-built-linux

On Tue, 2020-03-24 at 15:07 +0100, Clement Courbet wrote:
>     Recent compilers know the meaning of builtins (`memset`,
>     `memcpy`, ...) and can replace calls by inline code when
>     deemed better. For example, `memset(p, 0, 4)` will be lowered
>     to a four-byte zero store.
> 
>     When using -ffreestanding (as is the case e.g. when building with
>     clang), these optimizations are disabled. This means that **all**
>     memsets, including those with small, constant sizes, will result
>     in an actual call to memset.
[]
>     In terms of code size, this grows the clang-built kernel a
>     bit (+0.022%):
>     440285512 vmlinux.clang.after
>     440383608 vmlinux.clang.before

This shows the kernel getting smaller, not larger.
Is this still reversed, or is this correct?




* [PATCH v3]     x86: Alias memset to __builtin_memset.
  2020-03-23 11:42 [PATCH] x86: Alias memset to __builtin_memset Clement Courbet
                   ` (3 preceding siblings ...)
  2020-03-24 14:07 ` [PATCH v2] " Clement Courbet
@ 2020-03-24 15:59 ` Clement Courbet
  2020-03-25 11:33   ` Bernd Petrovitsch
  2020-03-24 16:02 ` [PATCH] " Clement Courbet
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 17+ messages in thread
From: Clement Courbet @ 2020-03-24 15:59 UTC (permalink / raw)
  Cc: Nathan Chancellor, Kees Cook, Nick Desaulniers, Joe Perches,
	Clement Courbet, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, x86, linux-kernel, clang-built-linux

    Recent compilers know the meaning of builtins (`memset`,
    `memcpy`, ...) and can replace calls by inline code when
    deemed better. For example, `memset(p, 0, 4)` will be lowered
    to a four-byte zero store.

    When using -ffreestanding (as is the case e.g. when building with
    clang), these optimizations are disabled. This means that **all**
    memsets, including those with small, constant sizes, will result
    in an actual call to memset.

    We have identified several spots where we have high CPU usage
    because of this. For example, a single one of these memsets is
    responsible for about 0.3% of our total CPU usage in the kernel.

    Aliasing `memset` to `__builtin_memset` allows the compiler to
    perform this optimization even when -ffreestanding is used.
    There is no change when -ffreestanding is not used.

    Below is a diff (clang) for `update_sg_lb_stats()`, which
    includes the aforementioned hot memset:
        memset(sgs, 0, sizeof(*sgs));

    Diff:
        movq %rsi, %rbx        ~~~>  movq $0x0, 0x40(%r8)
        movq %rdi, %r15        ~~~>  movq $0x0, 0x38(%r8)
        movl $0x48, %edx       ~~~>  movq $0x0, 0x30(%r8)
        movq %r8, %rdi         ~~~>  movq $0x0, 0x28(%r8)
        xorl %esi, %esi        ~~~>  movq $0x0, 0x20(%r8)
        callq <memset>         ~~~>  movq $0x0, 0x18(%r8)
                               ~~~>  movq $0x0, 0x10(%r8)
                               ~~~>  movq $0x0, 0x8(%r8)
                               ~~~>  movq $0x0, (%r8)

    In terms of code size, this shrinks the clang-built kernel a
    bit (-0.022%):
    440383608 vmlinux.clang.before
    440285512 vmlinux.clang.after

Signed-off-by: Clement Courbet <courbet@google.com>

---

changes in v2:
  - Removed ifdef(GNUC >= 4).
  - Disabled change when CONFIG_FORTIFY_SOURCE is set.

changes in v3:
  - Fixed commit message: the kernel shrinks in size.
---
 arch/x86/include/asm/string_64.h | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/arch/x86/include/asm/string_64.h b/arch/x86/include/asm/string_64.h
index 75314c3dbe47..9cfce0a840a4 100644
--- a/arch/x86/include/asm/string_64.h
+++ b/arch/x86/include/asm/string_64.h
@@ -18,6 +18,15 @@ extern void *__memcpy(void *to, const void *from, size_t len);
 void *memset(void *s, int c, size_t n);
 void *__memset(void *s, int c, size_t n);
 
+/* Recent compilers can generate much better code for known size and/or
+ * fill values, and will fall back on `memset` if they fail.
+ * We alias `memset` to `__builtin_memset` explicitly to inform the compiler to
+ * perform this optimization even when -ffreestanding is used.
+ */
+#if !defined(CONFIG_FORTIFY_SOURCE)
+#define memset(s, c, count) __builtin_memset(s, c, count)
+#endif
+
 #define __HAVE_ARCH_MEMSET16
 static inline void *memset16(uint16_t *s, uint16_t v, size_t n)
 {
@@ -74,6 +83,7 @@ int strcmp(const char *cs, const char *ct);
 #undef memcpy
 #define memcpy(dst, src, len) __memcpy(dst, src, len)
 #define memmove(dst, src, len) __memmove(dst, src, len)
+#undef memset
 #define memset(s, c, n) __memset(s, c, n)
 
 #ifndef __NO_FORTIFY
-- 
2.25.1.696.g5e7596f4ac-goog



* [PATCH]     x86: Alias memset to __builtin_memset.
  2020-03-23 11:42 [PATCH] x86: Alias memset to __builtin_memset Clement Courbet
                   ` (4 preceding siblings ...)
  2020-03-24 15:59 ` [PATCH v3] " Clement Courbet
@ 2020-03-24 16:02 ` Clement Courbet
  2020-03-25  7:59 ` Clement Courbet
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 17+ messages in thread
From: Clement Courbet @ 2020-03-24 16:02 UTC (permalink / raw)
  Cc: Nathan Chancellor, Kees Cook, Nick Desaulniers, Joe Perches,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	x86, Clement Courbet, linux-kernel, clang-built-linux

> This shows the kernel getting smaller, not larger.
> Is this still reversed, or is this correct?

Yes, sorry. This was wrong. The kernel actually shrinks. Fixed.




* Re: [PATCH] x86: Alias memset to __builtin_memset.
  2020-03-24 14:06 ` Clement Courbet
@ 2020-03-24 17:29   ` Nick Desaulniers
  0 siblings, 0 replies; 17+ messages in thread
From: Nick Desaulniers @ 2020-03-24 17:29 UTC (permalink / raw)
  To: Clement Courbet
  Cc: Nathan Chancellor, Kees Cook, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	LKML, clang-built-linux

On Tue, Mar 24, 2020 at 7:06 AM Clement Courbet <courbet@google.com> wrote:
>
> Thanks for the comments. Answers below.
>
> > This ifdef is unnecessary
> > This needs to be within #ifndef CONFIG_FORTIFY_SOURCE
>
> Thanks, fixed.
>
> > shouldn't this just be done universally?
>
> It looks like every architecture does its own magic with memory
> functions. I'm not very familiar with how the Linux kernel is
> organized; do you have a pointer for where this would go if enabled
> globally?
>
> > Who's adding it for 64b?
> > Any idea where it's coming from in your
> > build? Maybe a local modification to be removed?
>
> Actually this is from our local build configuration. Apparently this

Not sure we should modify this for someone's downstream tree to work
around an issue they introduced.  If you want to file a bug
internally, I'd be happy to comment on it.

Does __builtin_memset detect support for `rep stosb`, then patch the
kernel to always use it or not?  The kernel is limited in that we use
-mno-sse and friends to avoid saving/restoring vector registers on
context switch unless kernel_fpu_{begin|end}() is called, which the
compiler doesn't insert for memcpy's.
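
A minimal sketch of that pattern, for context (the SIMD body is a
placeholder):

    #include <asm/fpu/api.h>

    static void simd_work(void)
    {
            kernel_fpu_begin();   /* saves FPU/SIMD state and disables preemption */
            /* ... code that may touch SSE/AVX registers ... */
            kernel_fpu_end();     /* restores the saved state */
    }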

Did you verify what this change does for other compilers?

> is needed to build on some architectures that redefine common
> symbols, but the authors seemed to feel strongly that this should be

Sounds like a linkage error or issue with symbol visibility; I don't
see how -ffreestanding should have any bearing on that.

> on for all architectures. I've reached out to the authors for an
> extended rationale.
> On the other hand I think that there is no reason to prevent people
> from building with freestanding if we can easily allow them to.

I disagree.  The codegen in the kernel is very tricky to get right;
it's somewhat an embedded system, yet provides many symbols that would
have been provided by libc, yet it doesn't use vector operations for
the general case.  Adding -ffreestanding to optimize one hot memset in
one function is using a really big hammer to solve the wrong problem.
-- 
Thanks,
~Nick Desaulniers


* [PATCH]     x86: Alias memset to __builtin_memset.
  2020-03-23 11:42 [PATCH] x86: Alias memset to __builtin_memset Clement Courbet
                   ` (5 preceding siblings ...)
  2020-03-24 16:02 ` [PATCH] " Clement Courbet
@ 2020-03-25  7:59 ` Clement Courbet
  2020-03-26 12:25 ` [PATCH v4] " Clement Courbet
  2020-03-26 12:38 ` [PATCH] " Clement Courbet
  8 siblings, 0 replies; 17+ messages in thread
From: Clement Courbet @ 2020-03-25  7:59 UTC (permalink / raw)
  Cc: Nathan Chancellor, Kees Cook, Nick Desaulniers, Joe Perches,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	x86, Clement Courbet, linux-kernel, clang-built-linux

> Not sure we should modify this for someone's downstream tree to work
> around an issue they introduced.  If you want to file a bug
> internally, I'd be happy to comment on it.

I don't have a strong opinion on that. I don't know the policy of the
linux kernel w.r.t. this. There is an internal bug for this, where
kernel maintainers suggested I upstream this patch.

> Does __builtin_memset detect support for `rep stosb`, then patch the
> kernel to always use it or not?

__builtin_memset just allows the compiler to recognize that this has the
semantics of a memset, exactly as happens when -ffreestanding is not
specified.

In terms of what compilers do when expanding the memset, it depends on
the size.
gcc and clang obviously do not generate vector code when -mno-sse is
specified, as this would break promises.

clang inlines stores for small sizes and switches to memset as the size
increases: https://godbolt.org/z/_X16xt

gcc inlines stores for tiny sizes, then switches to rep stos for larger
sizes: https://godbolt.org/z/m-G233 This behaviour is additionally
configurable through command-line flags: https://godbolt.org/z/wsoJpQ
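
As a sketch of what this looks like at a call site (thresholds and
instruction choices vary by compiler, target and flags, see the
godbolt links above):

    #include <string.h>

    void small(char *p) { memset(p, 0, 16); }   /* typically inlined, e.g. two 8-byte stores */
    void large(char *p) { memset(p, 0, 4096); } /* typically left as a call (or rep stos with gcc) */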

> Did you verify what this change does for other compilers?

Are there other compilers used to build the kernel on x86?

icc does the same as gcc and clang: https://godbolt.org/z/yCju_D

> yet it doesn't use vector operations for the general case

I'm not sure how vector operations relate to freestanding, or this change.

> Adding -ffreestanding to optimize one hot memset in
> one function is using a really big hammer to solve the wrong
> problem.

-ffreestanding was not added to optimize anything. It was already there.
If anything, -ffreestanding actually pessimizes a lot of the code
generation, as it prevents the compiler from recognizing the semantics
of common primitives. This is what this change is trying to fix.
Removing -ffreestanding from the options is another (easier?) way to fix
the problem. This change is a stab at accommodating both those who want
freestanding and those who want performance.

The hot memset I mentioned was just the hottest one. But as you can imagine,
there are many constant-size memsets spread across many functions, some of
which are also very hot, the others constituting a long tail that is not
negligible.




* Re: [PATCH v3] x86: Alias memset to __builtin_memset.
  2020-03-24 15:59 ` [PATCH v3] " Clement Courbet
@ 2020-03-25 11:33   ` Bernd Petrovitsch
  0 siblings, 0 replies; 17+ messages in thread
From: Bernd Petrovitsch @ 2020-03-25 11:33 UTC (permalink / raw)
  To: Clement Courbet
  Cc: Nathan Chancellor, Kees Cook, Nick Desaulniers, Joe Perches,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	x86, linux-kernel, clang-built-linux

Hi all!

Sorry for being late to the party:

On 24/03/2020 16:59, Clement Courbet wrote:
[...]
> ---
>   arch/x86/include/asm/string_64.h | 10 ++++++++++
>   1 file changed, 10 insertions(+)
> 
> diff --git a/arch/x86/include/asm/string_64.h b/arch/x86/include/asm/string_64.h
> index 75314c3dbe47..9cfce0a840a4 100644
> --- a/arch/x86/include/asm/string_64.h
> +++ b/arch/x86/include/asm/string_64.h
> @@ -18,6 +18,15 @@ extern void *__memcpy(void *to, const void *from, size_t len);
>   void *memset(void *s, int c, size_t n);
>   void *__memset(void *s, int c, size_t n);
>   
> +/* Recent compilers can generate much better code for known size and/or
> + * fill values, and will fall back on `memset` if they fail.
> + * We alias `memset` to `__builtin_memset` explicitly to inform the compiler to
> + * perform this optimization even when -ffreestanding is used.
> + */
> +#if !defined(CONFIG_FORTIFY_SOURCE)
> +#define memset(s, c, count) __builtin_memset(s, c, count)
To be on the safe side, the usual way to write macros is like

#define memset(s, c, count) __builtin_memset((s), (c), (count))

as no one knows what is passed as a parameter to memset(), and the
extra pair of parentheses doesn't hurt.
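
A sketch of the classic failure mode this convention guards against
(it cannot actually trigger for a pure pass-through like the memset()
macro, because macro arguments cannot contain unparenthesized commas,
but the parentheses are cheap insurance):

    #define TWICE_BAD(x)  (x + x)
    #define TWICE_GOOD(x) ((x) + (x))

    int a = TWICE_BAD(1 << 2);   /* expands to (1 << 2 + 1 << 2) == 32 */
    int b = TWICE_GOOD(1 << 2);  /* expands to ((1 << 2) + (1 << 2)) == 8 */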

And similar below (and I fear there are more places).

Or did I miss something in the Kernel?

> +#endif
> +
>   #define __HAVE_ARCH_MEMSET16
>   static inline void *memset16(uint16_t *s, uint16_t v, size_t n)
>   {
> @@ -74,6 +83,7 @@ int strcmp(const char *cs, const char *ct);
>   #undef memcpy
>   #define memcpy(dst, src, len) __memcpy(dst, src, len)
#define memcpy(dst, src, len) __memcpy((dst), (src), (len))
>   #define memmove(dst, src, len) __memmove(dst, src, len)
#define memmove(dst, src, len) __memmove((dst), (src), (len))
> +#undef memset
>   #define memset(s, c, n) __memset(s, c, n)
#define memset(s, c, n) __memset((s), (c), (n))
>   
>   #ifndef __NO_FORTIFY
> 

MfG,
	Bernd
-- 
Bernd Petrovitsch                  Email : bernd@petrovitsch.priv.at
                      LUGA : http://www.luga.at


* [PATCH v4]     x86: Alias memset to __builtin_memset.
  2020-03-23 11:42 [PATCH] x86: Alias memset to __builtin_memset Clement Courbet
                   ` (6 preceding siblings ...)
  2020-03-25  7:59 ` Clement Courbet
@ 2020-03-26 12:25 ` Clement Courbet
  2020-03-26 12:38 ` [PATCH] " Clement Courbet
  8 siblings, 0 replies; 17+ messages in thread
From: Clement Courbet @ 2020-03-26 12:25 UTC (permalink / raw)
  Cc: Nathan Chancellor, Kees Cook, Nick Desaulniers, Joe Perches,
	Bernd Petrovitsch, Clement Courbet, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, x86, linux-kernel,
	clang-built-linux

    Recent compilers know the meaning of builtins (`memset`,
    `memcpy`, ...) and can replace calls by inline code when
    deemed better. For example, `memset(p, 0, 4)` will be lowered
    to a four-byte zero store.

    When using -ffreestanding (as is the case e.g. when building with
    clang), these optimizations are disabled. This means that **all**
    memsets, including those with small, constant sizes, will result
    in an actual call to memset.

    We have identified several spots where we have high CPU usage
    because of this. For example, a single one of these memsets is
    responsible for about 0.3% of our total CPU usage in the kernel.

    Aliasing `memset` to `__builtin_memset` allows the compiler to
    perform this optimization even when -ffreestanding is used.
    There is no change when -ffreestanding is not used.

    Below is a diff (clang) for `update_sg_lb_stats()`, which
    includes the aforementioned hot memset:
        memset(sgs, 0, sizeof(*sgs));

    Diff:
        movq %rsi, %rbx        ~~~>  movq $0x0, 0x40(%r8)
        movq %rdi, %r15        ~~~>  movq $0x0, 0x38(%r8)
        movl $0x48, %edx       ~~~>  movq $0x0, 0x30(%r8)
        movq %r8, %rdi         ~~~>  movq $0x0, 0x28(%r8)
        xorl %esi, %esi        ~~~>  movq $0x0, 0x20(%r8)
        callq <memset>         ~~~>  movq $0x0, 0x18(%r8)
                               ~~~>  movq $0x0, 0x10(%r8)
                               ~~~>  movq $0x0, 0x8(%r8)
                               ~~~>  movq $0x0, (%r8)

    In terms of code size, this shrinks the clang-built kernel a
    bit (-0.022%):
    440383608 vmlinux.clang.before
    440285512 vmlinux.clang.after

Signed-off-by: Clement Courbet <courbet@google.com>

---

changes in v2:
  - Removed ifdef(GNUC >= 4).
  - Disabled change when CONFIG_FORTIFY_SOURCE is set.

changes in v3:
  - Fixed commit message: the kernel shrinks in size.

changes in v4:
  - Properly parenthesize the macro.
---
 arch/x86/include/asm/string_64.h | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/arch/x86/include/asm/string_64.h b/arch/x86/include/asm/string_64.h
index 75314c3dbe47..1bfa825e9ad3 100644
--- a/arch/x86/include/asm/string_64.h
+++ b/arch/x86/include/asm/string_64.h
@@ -18,6 +18,15 @@ extern void *__memcpy(void *to, const void *from, size_t len);
 void *memset(void *s, int c, size_t n);
 void *__memset(void *s, int c, size_t n);
 
+/* Recent compilers can generate much better code for known size and/or
+ * fill values, and will fall back on `memset` if they fail.
+ * We alias `memset` to `__builtin_memset` explicitly to inform the compiler to
+ * perform this optimization even when -ffreestanding is used.
+ */
+#if !defined(CONFIG_FORTIFY_SOURCE)
+#define memset(s, c, count) __builtin_memset((s), (c), (count))
+#endif
+
 #define __HAVE_ARCH_MEMSET16
 static inline void *memset16(uint16_t *s, uint16_t v, size_t n)
 {
@@ -74,6 +83,7 @@ int strcmp(const char *cs, const char *ct);
 #undef memcpy
 #define memcpy(dst, src, len) __memcpy(dst, src, len)
 #define memmove(dst, src, len) __memmove(dst, src, len)
+#undef memset
 #define memset(s, c, n) __memset(s, c, n)
 
 #ifndef __NO_FORTIFY
-- 
2.25.1.696.g5e7596f4ac-goog



* [PATCH]     x86: Alias memset to __builtin_memset.
  2020-03-23 11:42 [PATCH] x86: Alias memset to __builtin_memset Clement Courbet
                   ` (7 preceding siblings ...)
  2020-03-26 12:25 ` [PATCH v4] " Clement Courbet
@ 2020-03-26 12:38 ` Clement Courbet
  2020-03-26 17:18   ` Nick Desaulniers
                     ` (2 more replies)
  8 siblings, 3 replies; 17+ messages in thread
From: Clement Courbet @ 2020-03-26 12:38 UTC (permalink / raw)
  Cc: Nathan Chancellor, Kees Cook, Nick Desaulniers, Joe Perches,
	Bernd Petrovitsch, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, x86, Segher Boessenkool, Greg Kroah-Hartman,
	Allison Randal, Clement Courbet, linuxppc-dev, linux-kernel,
	clang-built-linux

I discussed with the original authors who added freestanding to our
build. It turns out that it was added globally just to work around
powerpc not compiling under clang, but they felt the fix was
appropriate globally.

Now Nick has dug up https://lkml.org/lkml/2019/8/29/1300, which
advises against freestanding. Also, I've done some research and
discovered that the original reason for using freestanding for
powerpc has been fixed here:
https://lore.kernel.org/linuxppc-dev/20191119045712.39633-3-natechancellor@gmail.com/

I'm going to remove -ffreestanding from downstream, so we don't really need
this anymore; sorry for wasting people's time.

I wonder if the freestanding fix from the aforementioned patch is really needed
though. I think that clang is actually right to point out the issue.
I don't see any reason why setjmp()/longjmp() are declared as taking longs
rather than ints. The implementation looks like it only ever propagates the
value (in longjmp) or sets it to 1 (in setjmp), and we only ever call longjmp
with integer parameters. But I'm not a PowerPC expert, so I might
be misreading the code.


So it seems that we could just remove freestanding altogether and rewrite the
code to:

diff --git a/arch/powerpc/include/asm/setjmp.h b/arch/powerpc/include/asm/setjmp.h
index 279d03a1eec6..7941ae68fe21 100644
--- a/arch/powerpc/include/asm/setjmp.h
+++ b/arch/powerpc/include/asm/setjmp.h
@@ -12,7 +12,9 @@

 #define JMP_BUF_LEN    23
-extern long setjmp(long *);
-extern void longjmp(long *, long);
+typedef long * jmp_buf;
+
+extern int setjmp(jmp_buf);
+extern void longjmp(jmp_buf, int);

I'm happy to send a patch for this, and get rid of more -ffreestanding.
Opinions?


* Re: [PATCH] x86: Alias memset to __builtin_memset.
  2020-03-26 12:38 ` [PATCH] " Clement Courbet
@ 2020-03-26 17:18   ` Nick Desaulniers
  2020-03-27  4:06   ` Michael Ellerman
  2020-03-27 17:12   ` Segher Boessenkool
  2 siblings, 0 replies; 17+ messages in thread
From: Nick Desaulniers @ 2020-03-26 17:18 UTC (permalink / raw)
  To: Clement Courbet
  Cc: Nathan Chancellor, Kees Cook, Joe Perches, Bernd Petrovitsch,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Segher Boessenkool, Greg Kroah-Hartman, Allison Randal,
	linuxppc-dev, LKML, clang-built-linux

On Thu, Mar 26, 2020 at 5:38 AM Clement Courbet <courbet@google.com> wrote:
>
> I discussed with the original authors who added freestanding to our
> build. It turns out that it was added globally just to work around
> powerpc not compiling under clang, but they felt the fix was
> appropriate globally.
>
> Now Nick has dug up https://lkml.org/lkml/2019/8/29/1300, which
> advises against freestanding. Also, I've did some research and
> discovered that the original reason for using freestanding for
> powerpc has been fixed here:
> https://lore.kernel.org/linuxppc-dev/20191119045712.39633-3-natechancellor@gmail.com/
>
> I'm going to remove -ffreestanding from downstream, so we don't really need
> this anymore; sorry for wasting people's time.
>
> I wonder if the freestanding fix from the aforementioned patch is really needed
> though. I think that clang is actually right to point out the issue.
> I don't see any reason why setjmp()/longjmp() are declared as taking longs
> rather than ints. The implementation looks like it only ever propagates the
> value (in longjmp) or sets it to 1 (in setjmp), and we only ever call longjmp
> with integer parameters. But I'm not a PowerPC expert, so I might
> be misreading the code.
>
>
> So it seems that we could just remove freestanding altogether and rewrite the
> code to:
>
> diff --git a/arch/powerpc/include/asm/setjmp.h b/arch/powerpc/include/asm/setjmp.h
> index 279d03a1eec6..7941ae68fe21 100644
> --- a/arch/powerpc/include/asm/setjmp.h
> +++ b/arch/powerpc/include/asm/setjmp.h
> @@ -12,7 +12,9 @@
>
>  #define JMP_BUF_LEN    23
> -extern long setjmp(long *);
> -extern void longjmp(long *, long);
> +typedef long * jmp_buf;
> +
> +extern int setjmp(jmp_buf);
> +extern void longjmp(jmp_buf, int);
>
> I'm happy to send a patch for this, and get rid of more -ffreestanding.
> Opinions?

Yes, I think the above diff and additionally cleaning up
-ffreestanding from arch/powerpc/kernel/Makefile and
arch/powerpc/xmon/Makefile would be a much better approach.  If that's
good enough to fix the warning, I kind of can't believe we didn't spot
that in the original code review!

Actually, the god awful warning was:
./arch/powerpc/include/asm/setjmp.h:10:13: error: declaration of
built-in function 'setjmp' requires the declaration of the 'jmp_buf'
type, commonly provided in the header <setjmp.h>.
[-Werror,-Wincomplete-setjmp-declaration]
extern long setjmp(long *) __attribute__((returns_twice));
            ^
So the jmp_buf type was missing and unused, and there was also the long vs. int confusion.

I tested the above diff; all calls to setjmp under arch/powerpc/ just
compare the return value against 0, and callers of longjmp just pass 1
for the final parameter. So the above changes should result in no
functional change.
-- 
Thanks,
~Nick Desaulniers


* Re: [PATCH]     x86: Alias memset to __builtin_memset.
  2020-03-26 12:38 ` [PATCH] " Clement Courbet
  2020-03-26 17:18   ` Nick Desaulniers
@ 2020-03-27  4:06   ` Michael Ellerman
  2020-03-27 17:12   ` Segher Boessenkool
  2 siblings, 0 replies; 17+ messages in thread
From: Michael Ellerman @ 2020-03-27  4:06 UTC (permalink / raw)
  To: Clement Courbet
  Cc: Nathan Chancellor, Kees Cook, Nick Desaulniers, Joe Perches,
	Bernd Petrovitsch, Benjamin Herrenschmidt, Paul Mackerras,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	x86, Segher Boessenkool, Greg Kroah-Hartman, Allison Randal,
	Clement Courbet, linuxppc-dev, linux-kernel, clang-built-linux

Clement Courbet <courbet@google.com> writes:
> I discussed with the original authors who added freestanding to our
> build. It turns out that it was added globally just to work around
> powerpc not compiling under clang, but they felt the fix was
> appropriate globally.
>
> Now Nick has dug up https://lkml.org/lkml/2019/8/29/1300, which
> advises against freestanding. Also, I've done some research and
> discovered that the original reason for using freestanding for
> powerpc has been fixed here:
> https://lore.kernel.org/linuxppc-dev/20191119045712.39633-3-natechancellor@gmail.com/
>
> I'm going to remove -ffreestanding from downstream, so we don't really need
> this anymore; sorry for wasting people's time.
>
> I wonder if the freestanding fix from the aforementioned patch is really needed
> though. I think that clang is actually right to point out the issue.
> I don't see any reason why setjmp()/longjmp() are declared as taking longs
> rather than ints. The implementation looks like it only ever propagates the
> value (in longjmp) or sets it to 1 (in setjmp), and we only ever call longjmp
> with integer parameters. But I'm not a PowerPC expert, so I might
> be misreading the code.
>
>
> So it seems that we could just remove freestanding altogether and rewrite the
> code to:
>
> diff --git a/arch/powerpc/include/asm/setjmp.h b/arch/powerpc/include/asm/setjmp.h
> index 279d03a1eec6..7941ae68fe21 100644
> --- a/arch/powerpc/include/asm/setjmp.h
> +++ b/arch/powerpc/include/asm/setjmp.h
> @@ -12,7 +12,9 @@
>
>  #define JMP_BUF_LEN    23
> -extern long setjmp(long *);
> -extern void longjmp(long *, long);
> +typedef long * jmp_buf;
> +
> +extern int setjmp(jmp_buf);
> +extern void longjmp(jmp_buf, int);
>
> I'm happy to send a patch for this, and get rid of more -ffreestanding.
> Opinions?

If it works, then it looks like a much better fix than using -ffreestanding.

Please submit a patch with a change log etc. and I'd be happy to merge
it.

cheers


* Re: [PATCH]     x86: Alias memset to __builtin_memset.
  2020-03-26 12:38 ` [PATCH] " Clement Courbet
  2020-03-26 17:18   ` Nick Desaulniers
  2020-03-27  4:06   ` Michael Ellerman
@ 2020-03-27 17:12   ` Segher Boessenkool
  2 siblings, 0 replies; 17+ messages in thread
From: Segher Boessenkool @ 2020-03-27 17:12 UTC (permalink / raw)
  To: Clement Courbet
  Cc: Nathan Chancellor, Kees Cook, Nick Desaulniers, Joe Perches,
	Bernd Petrovitsch, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, x86, Greg Kroah-Hartman, Allison Randal,
	linuxppc-dev, linux-kernel, clang-built-linux

Hi!

On Thu, Mar 26, 2020 at 01:38:39PM +0100, Clement Courbet wrote:
> --- a/arch/powerpc/include/asm/setjmp.h
> +++ b/arch/powerpc/include/asm/setjmp.h
> @@ -12,7 +12,9 @@
> 
>  #define JMP_BUF_LEN    23
> -extern long setjmp(long *);
> -extern void longjmp(long *, long);
> +typedef long * jmp_buf;
> +
> +extern int setjmp(jmp_buf);
> +extern void longjmp(jmp_buf, int);
> 
> I'm happy to send a patch for this, and get rid of more -ffreestanding.
> Opinions?

Pedantically, jmp_buf should be an array type. But this will probably
work fine, and it certainly is better than what we had before.

You could do
  typedef long jmp_buf[JMP_BUF_LEN];
perhaps?
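
A sketch of how call sites would read with the array typedef (the
buffer decays to `long *` when passed, so the underlying asm is
unchanged):

    #define JMP_BUF_LEN 23
    typedef long jmp_buf[JMP_BUF_LEN];

    extern int setjmp(jmp_buf);
    extern void longjmp(jmp_buf, int);

    static jmp_buf env;

    void demo(void)
    {
            if (setjmp(env) == 0)
                    longjmp(env, 1);   /* returns to setjmp(), which then yields 1 */
    }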

Thanks,


Segher

