* [PATCH]     x86: Alias memset to __builtin_memset.
@ 2020-03-23 11:42 Clement Courbet
  2020-03-23 15:47 ` Nathan Chancellor
                   ` (9 more replies)
  0 siblings, 10 replies; 23+ messages in thread
From: Clement Courbet @ 2020-03-23 11:42 UTC (permalink / raw)
  Cc: Clement Courbet, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, x86, linux-kernel, clang-built-linux

    Recent compilers understand the semantics of builtins (`memset`,
    `memcpy`, ...) and can replace calls with inline code when they
    deem it profitable. For example, `memset(p, 0, 4)` will be lowered
    to a four-byte zero store.

    When using -ffreestanding (as is the case e.g. when building with
    clang), these optimizations are disabled. This means that **all**
    memsets, including those with small, constant sizes, will result
    in an actual call to memset.

    We have identified several spots where we have high CPU usage
    because of this. For example, a single one of these memsets is
    responsible for about 0.3% of our total CPU usage in the kernel.

    Aliasing `memset` to `__builtin_memset` allows the compiler to
    perform this optimization even when -ffreestanding is used.
    There is no change when -ffreestanding is not used.

    Below is a codegen diff (clang) for `update_sg_lb_stats()`, which
    includes the aforementioned hot memset:
        memset(sgs, 0, sizeof(*sgs));

    Diff:
        movq %rsi, %rbx        ~~~>  movq $0x0, 0x40(%r8)
        movq %rdi, %r15        ~~~>  movq $0x0, 0x38(%r8)
        movl $0x48, %edx       ~~~>  movq $0x0, 0x30(%r8)
        movq %r8, %rdi         ~~~>  movq $0x0, 0x28(%r8)
        xorl %esi, %esi        ~~~>  movq $0x0, 0x20(%r8)
        callq <memset>         ~~~>  movq $0x0, 0x18(%r8)
                               ~~~>  movq $0x0, 0x10(%r8)
                               ~~~>  movq $0x0, 0x8(%r8)
                               ~~~>  movq $0x0, (%r8)

    In terms of code size, this grows the clang-built kernel a
    bit (+0.022%):
    440285512 vmlinux.clang.after
    440383608 vmlinux.clang.before

Signed-off-by: Clement Courbet <courbet@google.com>
---
 arch/x86/include/asm/string_64.h | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/arch/x86/include/asm/string_64.h b/arch/x86/include/asm/string_64.h
index 75314c3dbe47..7073c25aa4a3 100644
--- a/arch/x86/include/asm/string_64.h
+++ b/arch/x86/include/asm/string_64.h
@@ -18,6 +18,15 @@ extern void *__memcpy(void *to, const void *from, size_t len);
 void *memset(void *s, int c, size_t n);
 void *__memset(void *s, int c, size_t n);
 
+/* Recent compilers can generate much better code for known sizes and/or
+ * fill values, and will fall back to `memset` if they cannot.
+ * We explicitly alias `memset` to `__builtin_memset` to allow the compiler
+ * to perform this optimization even when -ffreestanding is used.
+ */
+#if (__GNUC__ >= 4)
+#define memset(s, c, count) __builtin_memset(s, c, count)
+#endif
+
 #define __HAVE_ARCH_MEMSET16
 static inline void *memset16(uint16_t *s, uint16_t v, size_t n)
 {
@@ -74,6 +83,7 @@ int strcmp(const char *cs, const char *ct);
 #undef memcpy
 #define memcpy(dst, src, len) __memcpy(dst, src, len)
 #define memmove(dst, src, len) __memmove(dst, src, len)
+#undef memset
 #define memset(s, c, n) __memset(s, c, n)
 
 #ifndef __NO_FORTIFY
-- 
2.25.1.696.g5e7596f4ac-goog



* Re: [PATCH]     x86: Alias memset to __builtin_memset.
  2020-03-23 11:42 [PATCH] x86: Alias memset to __builtin_memset Clement Courbet
@ 2020-03-23 15:47 ` Nathan Chancellor
  2020-03-23 20:16 ` kbuild test robot
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 23+ messages in thread
From: Nathan Chancellor @ 2020-03-23 15:47 UTC (permalink / raw)
  To: Clement Courbet
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	x86, linux-kernel, clang-built-linux

On Mon, Mar 23, 2020 at 12:42:06PM +0100, 'Clement Courbet' via Clang Built Linux wrote:
>     Recent compilers know the meaning of builtins (`memset`,
>     `memcpy`, ...) and can replace calls by inline code when
>     deemed better. For example, `memset(p, 0, 4)` will be lowered
>     to a four-byte zero store.
> 
>     When using -ffreestanding (this is the case e.g. building on
>     clang), these optimizations are disabled. This means that **all**
>     memsets, including those with small, constant sizes, will result
>     in an actual call to memset.
> 
>     We have identified several spots where we have high CPU usage
>     because of this. For example, a single one of these memsets is
>     responsible for about 0.3% of our total CPU usage in the kernel.
> 
>     Aliasing `memset` to `__builtin_memset` allows the compiler to
>     perform this optimization even when -ffreestanding is used.
>     There is no change when -ffreestanding is not used.
> 
>     Below is a diff (clang) for `update_sg_lb_stats()`, which
>     includes the aforementionned hot memset:
>         memset(sgs, 0, sizeof(*sgs));
> 
>     Diff:
>         movq %rsi, %rbx        ~~~>  movq $0x0, 0x40(%r8)
>         movq %rdi, %r15        ~~~>  movq $0x0, 0x38(%r8)
>         movl $0x48, %edx       ~~~>  movq $0x0, 0x30(%r8)
>         movq %r8, %rdi         ~~~>  movq $0x0, 0x28(%r8)
>         xorl %esi, %esi        ~~~>  movq $0x0, 0x20(%r8)
>         callq <memset>         ~~~>  movq $0x0, 0x18(%r8)
>                                ~~~>  movq $0x0, 0x10(%r8)
>                                ~~~>  movq $0x0, 0x8(%r8)
>                                ~~~>  movq $0x0, (%r8)
> 
>     In terms of code size, this grows the clang-built kernel a
>     bit (+0.022%):
>     440285512 vmlinux.clang.after
>     440383608 vmlinux.clang.before
> 
> Signed-off-by: Clement Courbet <courbet@google.com>
> ---
>  arch/x86/include/asm/string_64.h | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/arch/x86/include/asm/string_64.h b/arch/x86/include/asm/string_64.h
> index 75314c3dbe47..7073c25aa4a3 100644
> --- a/arch/x86/include/asm/string_64.h
> +++ b/arch/x86/include/asm/string_64.h
> @@ -18,6 +18,15 @@ extern void *__memcpy(void *to, const void *from, size_t len);
>  void *memset(void *s, int c, size_t n);
>  void *__memset(void *s, int c, size_t n);
>  
> +/* Recent compilers can generate much better code for known size and/or
> + * fill values, and will fallback on `memset` if they fail.
> + * We alias `memset` to `__builtin_memset` explicitly to inform the compiler to
> + * perform this optimization even when -ffreestanding is used.
> + */
> +#if (__GNUC__ >= 4)

This ifdef is unnecessary; it will always be true because the minimum
supported GCC version is 4.6 and clang pretends to be GCC 4.2.1. The
Intel compiler appears to pretend to be whatever GCC version the host
is using (no idea if it is still used by anyone or truly supported,
but still worth mentioning), so the check would still always be true
because of the GCC 4.6 requirement.

I cannot comment on the validity of the patch even though the reasoning
seems sound to me.

Cheers,
Nathan

> +#define memset(s, c, count) __builtin_memset(s, c, count)
> +#endif
> +
>  #define __HAVE_ARCH_MEMSET16
>  static inline void *memset16(uint16_t *s, uint16_t v, size_t n)
>  {
> @@ -74,6 +83,7 @@ int strcmp(const char *cs, const char *ct);
>  #undef memcpy
>  #define memcpy(dst, src, len) __memcpy(dst, src, len)
>  #define memmove(dst, src, len) __memmove(dst, src, len)
> +#undef memset
>  #define memset(s, c, n) __memset(s, c, n)
>  
>  #ifndef __NO_FORTIFY
> -- 
> 2.25.1.696.g5e7596f4ac-goog
> 


* Re: [PATCH] x86: Alias memset to __builtin_memset.
  2020-03-23 11:42 [PATCH] x86: Alias memset to __builtin_memset Clement Courbet
  2020-03-23 15:47 ` Nathan Chancellor
@ 2020-03-23 20:16 ` kbuild test robot
  2020-03-23 21:59   ` Kees Cook
  2020-03-24  1:22 ` Nick Desaulniers
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 23+ messages in thread
From: kbuild test robot @ 2020-03-23 20:16 UTC (permalink / raw)
  To: kbuild-all


Hi Clement,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on tip/x86/core]
[also build test ERROR on v5.6-rc7 next-20200323]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:    https://github.com/0day-ci/linux/commits/Clement-Courbet/x86-Alias-memset-to-__builtin_memset/20200324-004007
base:   https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 248ed51048c40d36728e70914e38bffd7821da57
config: x86_64-randconfig-h003-20200323 (attached as .config)
compiler: clang version 11.0.0 (https://github.com/llvm/llvm-project 006244152d6c7dd6a390ff89b236cc7801834b46)
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        COMPILER=clang make.cross ARCH=x86_64 

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   In file included from scripts/mod/devicetable-offsets.c:3:
   In file included from include/linux/mod_devicetable.h:13:
   In file included from include/linux/uuid.h:12:
>> include/linux/string.h:358:24: error: definition of builtin function '__builtin_memset'
   __FORTIFY_INLINE void *memset(void *p, int c, __kernel_size_t size)
                          ^
   arch/x86/include/asm/string_64.h:27:29: note: expanded from macro 'memset'
   #define memset(s, c, count) __builtin_memset(s, c, count)
                               ^
   1 error generated.
   make[2]: *** [scripts/Makefile.build:99: scripts/mod/devicetable-offsets.s] Error 1
   make[2]: Target '__build' not remade because of errors.
   make[1]: *** [Makefile:1112: prepare0] Error 2
   make[1]: Target 'prepare' not remade because of errors.
   make: *** [Makefile:179: sub-make] Error 2
   27 real  8 user  13 sys  79.89% cpu 	make prepare

vim +/__builtin_memset +358 include/linux/string.h

6974f0c4555e28 Daniel Micay 2017-07-12  357  
6974f0c4555e28 Daniel Micay 2017-07-12 @358  __FORTIFY_INLINE void *memset(void *p, int c, __kernel_size_t size)
6974f0c4555e28 Daniel Micay 2017-07-12  359  {
6974f0c4555e28 Daniel Micay 2017-07-12  360  	size_t p_size = __builtin_object_size(p, 0);
6974f0c4555e28 Daniel Micay 2017-07-12  361  	if (__builtin_constant_p(size) && p_size < size)
6974f0c4555e28 Daniel Micay 2017-07-12  362  		__write_overflow();
6974f0c4555e28 Daniel Micay 2017-07-12  363  	if (p_size < size)
6974f0c4555e28 Daniel Micay 2017-07-12  364  		fortify_panic(__func__);
6974f0c4555e28 Daniel Micay 2017-07-12  365  	return __builtin_memset(p, c, size);
6974f0c4555e28 Daniel Micay 2017-07-12  366  }
6974f0c4555e28 Daniel Micay 2017-07-12  367  

:::::: The code@line 358 was first introduced by commit
:::::: 6974f0c4555e285ab217cee58b6e874f776ff409 include/linux/string.h: add the option of fortified string.h functions

:::::: TO: Daniel Micay <danielmicay@gmail.com>
:::::: CC: Linus Torvalds <torvalds@linux-foundation.org>

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org



* Re: [PATCH] x86: Alias memset to __builtin_memset.
  2020-03-23 20:16 ` kbuild test robot
@ 2020-03-23 21:59   ` Kees Cook
  0 siblings, 0 replies; 23+ messages in thread
From: Kees Cook @ 2020-03-23 21:59 UTC (permalink / raw)
  To: kbuild-all


On Tue, Mar 24, 2020 at 04:16:10AM +0800, kbuild test robot wrote:
> Hi Clement,
> 
> Thank you for the patch! Yet something to improve:
> 
> [auto build test ERROR on tip/x86/core]
> [also build test ERROR on v5.6-rc7 next-20200323]
> [if your patch is applied to the wrong git tree, please drop us a note to help
> improve the system. BTW, we also suggest to use '--base' option to specify the
> base tree in git format-patch, please see https://stackoverflow.com/a/37406982]
> 
> url:    https://github.com/0day-ci/linux/commits/Clement-Courbet/x86-Alias-memset-to-__builtin_memset/20200324-004007
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 248ed51048c40d36728e70914e38bffd7821da57
> config: x86_64-randconfig-h003-20200323 (attached as .config)
> compiler: clang version 11.0.0 (https://github.com/llvm/llvm-project 006244152d6c7dd6a390ff89b236cc7801834b46)
> reproduce:
>         wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
>         chmod +x ~/bin/make.cross
>         # save the attached .config to linux build tree
>         COMPILER=clang make.cross ARCH=x86_64 
> 
> If you fix the issue, kindly add following tag
> Reported-by: kbuild test robot <lkp@intel.com>
> 
> All errors (new ones prefixed by >>):
> 
>    In file included from scripts/mod/devicetable-offsets.c:3:
>    In file included from include/linux/mod_devicetable.h:13:
>    In file included from include/linux/uuid.h:12:
> >> include/linux/string.h:358:24: error: definition of builtin function '__builtin_memset'
>    __FORTIFY_INLINE void *memset(void *p, int c, __kernel_size_t size)
>                           ^
>    arch/x86/include/asm/string_64.h:27:29: note: expanded from macro 'memset'
>    #define memset(s, c, count) __builtin_memset(s, c, count)
>                                ^

This needs to be within #ifndef CONFIG_FORTIFY_SOURCE. FORTIFY already
does this. Additionally, shouldn't this just be done universally,
instead of in x86-specific code?

-Kees

>    1 error generated.
>    make[2]: *** [scripts/Makefile.build:99: scripts/mod/devicetable-offsets.s] Error 1
>    make[2]: Target '__build' not remade because of errors.
>    make[1]: *** [Makefile:1112: prepare0] Error 2
>    make[1]: Target 'prepare' not remade because of errors.
>    make: *** [Makefile:179: sub-make] Error 2
>    27 real  8 user  13 sys  79.89% cpu 	make prepare
> 
> vim +/__builtin_memset +358 include/linux/string.h
> 
> 6974f0c4555e28 Daniel Micay 2017-07-12  357  
> 6974f0c4555e28 Daniel Micay 2017-07-12 @358  __FORTIFY_INLINE void *memset(void *p, int c, __kernel_size_t size)
> 6974f0c4555e28 Daniel Micay 2017-07-12  359  {
> 6974f0c4555e28 Daniel Micay 2017-07-12  360  	size_t p_size = __builtin_object_size(p, 0);
> 6974f0c4555e28 Daniel Micay 2017-07-12  361  	if (__builtin_constant_p(size) && p_size < size)
> 6974f0c4555e28 Daniel Micay 2017-07-12  362  		__write_overflow();
> 6974f0c4555e28 Daniel Micay 2017-07-12  363  	if (p_size < size)
> 6974f0c4555e28 Daniel Micay 2017-07-12  364  		fortify_panic(__func__);
> 6974f0c4555e28 Daniel Micay 2017-07-12  365  	return __builtin_memset(p, c, size);
> 6974f0c4555e28 Daniel Micay 2017-07-12  366  }
> 6974f0c4555e28 Daniel Micay 2017-07-12  367  
> 
> :::::: The code at line 358 was first introduced by commit
> :::::: 6974f0c4555e285ab217cee58b6e874f776ff409 include/linux/string.h: add the option of fortified string.h functions
> 
> :::::: TO: Daniel Micay <danielmicay@gmail.com>
> :::::: CC: Linus Torvalds <torvalds@linux-foundation.org>
> 
> ---
> 0-DAY CI Kernel Test Service, Intel Corporation
> https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org
> 



-- 
Kees Cook


* Re: [PATCH] x86: Alias memset to __builtin_memset.
  2020-03-23 11:42 [PATCH] x86: Alias memset to __builtin_memset Clement Courbet
  2020-03-23 15:47 ` Nathan Chancellor
  2020-03-23 20:16 ` kbuild test robot
@ 2020-03-24  1:22 ` Nick Desaulniers
  2020-03-24  1:26   ` Nick Desaulniers
  2020-03-24 14:06 ` Clement Courbet
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 23+ messages in thread
From: Nick Desaulniers @ 2020-03-24  1:22 UTC (permalink / raw)
  To: Clement Courbet
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	LKML, clang-built-linux

On Mon, Mar 23, 2020 at 4:43 AM 'Clement Courbet' via Clang Built
Linux <clang-built-linux@googlegroups.com> wrote:
>
>     Recent compilers know the meaning of builtins (`memset`,
>     `memcpy`, ...) and can replace calls by inline code when
>     deemed better. For example, `memset(p, 0, 4)` will be lowered
>     to a four-byte zero store.
>
>     When using -ffreestanding (this is the case e.g. building on
>     clang), these optimizations are disabled. This means that **all**
>     memsets, including those with small, constant sizes, will result
>     in an actual call to memset.

Isn't this only added for 32b x86 (if I'm reading arch/x86/Makefile
right)? Who's adding it for 64b?

arch/x86/Makefile has a comment:
 88         # temporary until string.h is fixed
 89         KBUILD_CFLAGS += -ffreestanding
Did you look into fixing that?

>
>     We have identified several spots where we have high CPU usage
>     because of this. For example, a single one of these memsets is
>     responsible for about 0.3% of our total CPU usage in the kernel.
>
>     Aliasing `memset` to `__builtin_memset` allows the compiler to
>     perform this optimization even when -ffreestanding is used.
>     There is no change when -ffreestanding is not used.
>
>     Below is a diff (clang) for `update_sg_lb_stats()`, which
>     includes the aforementionned hot memset:
>         memset(sgs, 0, sizeof(*sgs));
>
>     Diff:
>         movq %rsi, %rbx        ~~~>  movq $0x0, 0x40(%r8)
>         movq %rdi, %r15        ~~~>  movq $0x0, 0x38(%r8)
>         movl $0x48, %edx       ~~~>  movq $0x0, 0x30(%r8)
>         movq %r8, %rdi         ~~~>  movq $0x0, 0x28(%r8)
>         xorl %esi, %esi        ~~~>  movq $0x0, 0x20(%r8)
>         callq <memset>         ~~~>  movq $0x0, 0x18(%r8)
>                                ~~~>  movq $0x0, 0x10(%r8)
>                                ~~~>  movq $0x0, 0x8(%r8)
>                                ~~~>  movq $0x0, (%r8)
>
>     In terms of code size, this grows the clang-built kernel a
>     bit (+0.022%):
>     440285512 vmlinux.clang.after
>     440383608 vmlinux.clang.before

The before number looks bigger? Did it shrink in size, or was the
above mislabeled?

>
> Signed-off-by: Clement Courbet <courbet@google.com>
> ---
>  arch/x86/include/asm/string_64.h | 10 ++++++++++
>  1 file changed, 10 insertions(+)
>
> diff --git a/arch/x86/include/asm/string_64.h b/arch/x86/include/asm/string_64.h
> index 75314c3dbe47..7073c25aa4a3 100644
> --- a/arch/x86/include/asm/string_64.h
> +++ b/arch/x86/include/asm/string_64.h
> @@ -18,6 +18,15 @@ extern void *__memcpy(void *to, const void *from, size_t len);
>  void *memset(void *s, int c, size_t n);
>  void *__memset(void *s, int c, size_t n);
>
> +/* Recent compilers can generate much better code for known size and/or
> + * fill values, and will fallback on `memset` if they fail.
> + * We alias `memset` to `__builtin_memset` explicitly to inform the compiler to
> + * perform this optimization even when -ffreestanding is used.
> + */
> +#if (__GNUC__ >= 4)
> +#define memset(s, c, count) __builtin_memset(s, c, count)
> +#endif
> +
>  #define __HAVE_ARCH_MEMSET16
>  static inline void *memset16(uint16_t *s, uint16_t v, size_t n)
>  {
> @@ -74,6 +83,7 @@ int strcmp(const char *cs, const char *ct);
>  #undef memcpy
>  #define memcpy(dst, src, len) __memcpy(dst, src, len)
>  #define memmove(dst, src, len) __memmove(dst, src, len)
> +#undef memset
>  #define memset(s, c, n) __memset(s, c, n)
>
>  #ifndef __NO_FORTIFY
> --

-- 
Thanks,
~Nick Desaulniers


* Re: [PATCH] x86: Alias memset to __builtin_memset.
  2020-03-24  1:22 ` Nick Desaulniers
@ 2020-03-24  1:26   ` Nick Desaulniers
  0 siblings, 0 replies; 23+ messages in thread
From: Nick Desaulniers @ 2020-03-24  1:26 UTC (permalink / raw)
  To: Clement Courbet
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	LKML, clang-built-linux

On Mon, Mar 23, 2020 at 6:22 PM Nick Desaulniers
<ndesaulniers@google.com> wrote:
>
> On Mon, Mar 23, 2020 at 4:43 AM 'Clement Courbet' via Clang Built
> Linux <clang-built-linux@googlegroups.com> wrote:
> >
> >     Recent compilers know the meaning of builtins (`memset`,
> >     `memcpy`, ...) and can replace calls by inline code when
> >     deemed better. For example, `memset(p, 0, 4)` will be lowered
> >     to a four-byte zero store.
> >
> >     When using -ffreestanding (this is the case e.g. building on
> >     clang), these optimizations are disabled. This means that **all**
> >     memsets, including those with small, constant sizes, will result
> >     in an actual call to memset.
>
> Isn't this only added for 32b x86 (if I'm reading arch/x86/Makefile
> right)? Who's adding it for 64b?
>
> arch/x86/Makefile has a comment:
>  88         # temporary until string.h is fixed
>  89         KBUILD_CFLAGS += -ffreestanding
> Did you look into fixing that?
>
> >
> >     We have identified several spots where we have high CPU usage
> >     because of this. For example, a single one of these memsets is
> >     responsible for about 0.3% of our total CPU usage in the kernel.
> >
> >     Aliasing `memset` to `__builtin_memset` allows the compiler to
> >     perform this optimization even when -ffreestanding is used.
> >     There is no change when -ffreestanding is not used.
> >
> >     Below is a diff (clang) for `update_sg_lb_stats()`, which
> >     includes the aforementionned hot memset:
> >         memset(sgs, 0, sizeof(*sgs));

Further, `make CC=clang -j71 kernel/sched/fair.o V=1` doesn't show
`-ffreestanding` being used.  Any idea where it's coming from in your
build? Maybe a local modification to be removed?

> >
> >     Diff:
> >         movq %rsi, %rbx        ~~~>  movq $0x0, 0x40(%r8)
> >         movq %rdi, %r15        ~~~>  movq $0x0, 0x38(%r8)
> >         movl $0x48, %edx       ~~~>  movq $0x0, 0x30(%r8)
> >         movq %r8, %rdi         ~~~>  movq $0x0, 0x28(%r8)
> >         xorl %esi, %esi        ~~~>  movq $0x0, 0x20(%r8)
> >         callq <memset>         ~~~>  movq $0x0, 0x18(%r8)
> >                                ~~~>  movq $0x0, 0x10(%r8)
> >                                ~~~>  movq $0x0, 0x8(%r8)
> >                                ~~~>  movq $0x0, (%r8)
> >
> >     In terms of code size, this grows the clang-built kernel a
> >     bit (+0.022%):
> >     440285512 vmlinux.clang.after
> >     440383608 vmlinux.clang.before
>
> The before number looks bigger? Did it shrink in size, or was the
> above mislabeled?
>
> >
> > Signed-off-by: Clement Courbet <courbet@google.com>
> > ---
> >  arch/x86/include/asm/string_64.h | 10 ++++++++++
> >  1 file changed, 10 insertions(+)
> >
> > diff --git a/arch/x86/include/asm/string_64.h b/arch/x86/include/asm/string_64.h
> > index 75314c3dbe47..7073c25aa4a3 100644
> > --- a/arch/x86/include/asm/string_64.h
> > +++ b/arch/x86/include/asm/string_64.h
> > @@ -18,6 +18,15 @@ extern void *__memcpy(void *to, const void *from, size_t len);
> >  void *memset(void *s, int c, size_t n);
> >  void *__memset(void *s, int c, size_t n);
> >
> > +/* Recent compilers can generate much better code for known size and/or
> > + * fill values, and will fallback on `memset` if they fail.
> > + * We alias `memset` to `__builtin_memset` explicitly to inform the compiler to
> > + * perform this optimization even when -ffreestanding is used.
> > + */
> > +#if (__GNUC__ >= 4)
> > +#define memset(s, c, count) __builtin_memset(s, c, count)
> > +#endif
> > +
> >  #define __HAVE_ARCH_MEMSET16
> >  static inline void *memset16(uint16_t *s, uint16_t v, size_t n)
> >  {
> > @@ -74,6 +83,7 @@ int strcmp(const char *cs, const char *ct);
> >  #undef memcpy
> >  #define memcpy(dst, src, len) __memcpy(dst, src, len)
> >  #define memmove(dst, src, len) __memmove(dst, src, len)
> > +#undef memset
> >  #define memset(s, c, n) __memset(s, c, n)
> >
> >  #ifndef __NO_FORTIFY
> > --
>
> --
> Thanks,
> ~Nick Desaulniers



-- 
Thanks,
~Nick Desaulniers


* [PATCH]     x86: Alias memset to __builtin_memset.
  2020-03-23 11:42 [PATCH] x86: Alias memset to __builtin_memset Clement Courbet
                   ` (2 preceding siblings ...)
  2020-03-24  1:22 ` Nick Desaulniers
@ 2020-03-24 14:06 ` Clement Courbet
  2020-03-24 17:29   ` Nick Desaulniers
  2020-03-24 14:07 ` [PATCH v2] " Clement Courbet
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 23+ messages in thread
From: Clement Courbet @ 2020-03-24 14:06 UTC (permalink / raw)
  Cc: Nathan Chancellor, Kees Cook, Nick Desaulniers, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin, x86,
	Clement Courbet, linux-kernel, clang-built-linux

Thanks for the comments. Answers below.

> This ifdef is unnecessary
> This needs to be within #ifndef CONFIG_FORTIFY_SOURCE

Thanks, fixed.

> shouldn't this just be done universally?

It looks like every architecture does its own magic with memory
functions. I'm not very familiar with how the Linux kernel is
organized; do you have a pointer to where this would go if enabled
globally?

> Who's adding it for 64b?
> Any idea where it's coming from in your
> build? Maybe a local modification to be removed?

Actually, this comes from our local build configuration. Apparently it
is needed to build on some architectures that redefine common
symbols, but the authors seemed to feel strongly that it should be
on for all architectures. I've reached out to them for an extended
rationale.
On the other hand, I think there is no reason to prevent people
from building with -ffreestanding if we can easily allow them to.





* [PATCH v2]     x86: Alias memset to __builtin_memset.
  2020-03-23 11:42 [PATCH] x86: Alias memset to __builtin_memset Clement Courbet
                   ` (3 preceding siblings ...)
  2020-03-24 14:06 ` Clement Courbet
@ 2020-03-24 14:07 ` Clement Courbet
  2020-03-24 15:08   ` Joe Perches
  2020-03-24 15:59 ` [PATCH v3] " Clement Courbet
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 23+ messages in thread
From: Clement Courbet @ 2020-03-24 14:07 UTC (permalink / raw)
  Cc: Nathan Chancellor, Kees Cook, Nick Desaulniers, Clement Courbet,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	x86, linux-kernel, clang-built-linux

    Recent compilers understand the semantics of builtins (`memset`,
    `memcpy`, ...) and can replace calls with inline code when they
    deem it profitable. For example, `memset(p, 0, 4)` will be lowered
    to a four-byte zero store.

    When using -ffreestanding (as is the case e.g. when building with
    clang), these optimizations are disabled. This means that **all**
    memsets, including those with small, constant sizes, will result
    in an actual call to memset.

    We have identified several spots where we have high CPU usage
    because of this. For example, a single one of these memsets is
    responsible for about 0.3% of our total CPU usage in the kernel.

    Aliasing `memset` to `__builtin_memset` allows the compiler to
    perform this optimization even when -ffreestanding is used.
    There is no change when -ffreestanding is not used.

    Below is a codegen diff (clang) for `update_sg_lb_stats()`, which
    includes the aforementioned hot memset:
        memset(sgs, 0, sizeof(*sgs));

    Diff:
        movq %rsi, %rbx        ~~~>  movq $0x0, 0x40(%r8)
        movq %rdi, %r15        ~~~>  movq $0x0, 0x38(%r8)
        movl $0x48, %edx       ~~~>  movq $0x0, 0x30(%r8)
        movq %r8, %rdi         ~~~>  movq $0x0, 0x28(%r8)
        xorl %esi, %esi        ~~~>  movq $0x0, 0x20(%r8)
        callq <memset>         ~~~>  movq $0x0, 0x18(%r8)
                               ~~~>  movq $0x0, 0x10(%r8)
                               ~~~>  movq $0x0, 0x8(%r8)
                               ~~~>  movq $0x0, (%r8)

    In terms of code size, this grows the clang-built kernel a
    bit (+0.022%):
    440285512 vmlinux.clang.after
    440383608 vmlinux.clang.before

Signed-off-by: Clement Courbet <courbet@google.com>

---

changes in v2:
  - Removed ifdef(GNUC >= 4).
  - Disabled change when CONFIG_FORTIFY_SOURCE is set.
---
 arch/x86/include/asm/string_64.h | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/arch/x86/include/asm/string_64.h b/arch/x86/include/asm/string_64.h
index 75314c3dbe47..9cfce0a840a4 100644
--- a/arch/x86/include/asm/string_64.h
+++ b/arch/x86/include/asm/string_64.h
@@ -18,6 +18,15 @@ extern void *__memcpy(void *to, const void *from, size_t len);
 void *memset(void *s, int c, size_t n);
 void *__memset(void *s, int c, size_t n);
 
+/* Recent compilers can generate much better code for known sizes and/or
+ * fill values, and will fall back to `memset` if they cannot.
+ * We explicitly alias `memset` to `__builtin_memset` to allow the compiler
+ * to perform this optimization even when -ffreestanding is used.
+ */
+#if !defined(CONFIG_FORTIFY_SOURCE)
+#define memset(s, c, count) __builtin_memset(s, c, count)
+#endif
+
 #define __HAVE_ARCH_MEMSET16
 static inline void *memset16(uint16_t *s, uint16_t v, size_t n)
 {
@@ -74,6 +83,7 @@ int strcmp(const char *cs, const char *ct);
 #undef memcpy
 #define memcpy(dst, src, len) __memcpy(dst, src, len)
 #define memmove(dst, src, len) __memmove(dst, src, len)
+#undef memset
 #define memset(s, c, n) __memset(s, c, n)
 
 #ifndef __NO_FORTIFY
-- 
2.25.1.696.g5e7596f4ac-goog



* Re: [PATCH v2]     x86: Alias memset to __builtin_memset.
  2020-03-24 14:07 ` [PATCH v2] " Clement Courbet
@ 2020-03-24 15:08   ` Joe Perches
  0 siblings, 0 replies; 23+ messages in thread
From: Joe Perches @ 2020-03-24 15:08 UTC (permalink / raw)
  To: Clement Courbet
  Cc: Nathan Chancellor, Kees Cook, Nick Desaulniers, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin, x86, linux-kernel,
	clang-built-linux

On Tue, 2020-03-24 at 15:07 +0100, Clement Courbet wrote:
>     Recent compilers know the meaning of builtins (`memset`,
>     `memcpy`, ...) and can replace calls by inline code when
>     deemed better. For example, `memset(p, 0, 4)` will be lowered
>     to a four-byte zero store.
> 
>     When using -ffreestanding (this is the case e.g. building on
>     clang), these optimizations are disabled. This means that **all**
>     memsets, including those with small, constant sizes, will result
>     in an actual call to memset.
[]
>     In terms of code size, this grows the clang-built kernel a
>     bit (+0.022%):
>     440285512 vmlinux.clang.after
>     440383608 vmlinux.clang.before

This shows the kernel getting smaller, not larger.
Is this still reversed, or is this correct?




* [PATCH v3]     x86: Alias memset to __builtin_memset.
  2020-03-23 11:42 [PATCH] x86: Alias memset to __builtin_memset Clement Courbet
                   ` (4 preceding siblings ...)
  2020-03-24 14:07 ` [PATCH v2] " Clement Courbet
@ 2020-03-24 15:59 ` Clement Courbet
  2020-03-25 11:33   ` Bernd Petrovitsch
  2020-03-24 16:02 ` [PATCH] " Clement Courbet
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 23+ messages in thread
From: Clement Courbet @ 2020-03-24 15:59 UTC (permalink / raw)
  Cc: Nathan Chancellor, Kees Cook, Nick Desaulniers, Joe Perches,
	Clement Courbet, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, x86, linux-kernel, clang-built-linux

    Recent compilers know the meaning of builtins (`memset`,
    `memcpy`, ...) and can replace calls by inline code when
    deemed better. For example, `memset(p, 0, 4)` will be lowered
    to a four-byte zero store.

    When using -ffreestanding (as is the case, e.g., when building with
    clang), these optimizations are disabled. This means that **all**
    memsets, including those with small, constant sizes, will result
    in an actual call to memset.

    We have identified several spots where we have high CPU usage
    because of this. For example, a single one of these memsets is
    responsible for about 0.3% of our total CPU usage in the kernel.

    Aliasing `memset` to `__builtin_memset` allows the compiler to
    perform this optimization even when -ffreestanding is used.
    There is no change when -ffreestanding is not used.

    Below is a diff (clang) for `update_sg_lb_stats()`, which
    includes the aforementioned hot memset:
        memset(sgs, 0, sizeof(*sgs));

    Diff:
        movq %rsi, %rbx        ~~~>  movq $0x0, 0x40(%r8)
        movq %rdi, %r15        ~~~>  movq $0x0, 0x38(%r8)
        movl $0x48, %edx       ~~~>  movq $0x0, 0x30(%r8)
        movq %r8, %rdi         ~~~>  movq $0x0, 0x28(%r8)
        xorl %esi, %esi        ~~~>  movq $0x0, 0x20(%r8)
        callq <memset>         ~~~>  movq $0x0, 0x18(%r8)
                               ~~~>  movq $0x0, 0x10(%r8)
                               ~~~>  movq $0x0, 0x8(%r8)
                               ~~~>  movq $0x0, (%r8)

    In terms of code size, this shrinks the clang-built kernel a
    bit (-0.022%):
    440383608 vmlinux.clang.before
    440285512 vmlinux.clang.after

Signed-off-by: Clement Courbet <courbet@google.com>

---

changes in v2:
  - Removed ifdef(GNUC >= 4).
  - Disabled change when CONFIG_FORTIFY_SOURCE is set.

changes in v3:
  - Fixed commit message: the kernel shrinks in size.
---
 arch/x86/include/asm/string_64.h | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/arch/x86/include/asm/string_64.h b/arch/x86/include/asm/string_64.h
index 75314c3dbe47..9cfce0a840a4 100644
--- a/arch/x86/include/asm/string_64.h
+++ b/arch/x86/include/asm/string_64.h
@@ -18,6 +18,15 @@ extern void *__memcpy(void *to, const void *from, size_t len);
 void *memset(void *s, int c, size_t n);
 void *__memset(void *s, int c, size_t n);
 
+/* Recent compilers can generate much better code for known size and/or
+ * fill values, and will fall back on `memset` if they fail.
+ * We alias `memset` to `__builtin_memset` explicitly to inform the compiler to
+ * perform this optimization even when -ffreestanding is used.
+ */
+#if !defined(CONFIG_FORTIFY_SOURCE)
+#define memset(s, c, count) __builtin_memset(s, c, count)
+#endif
+
 #define __HAVE_ARCH_MEMSET16
 static inline void *memset16(uint16_t *s, uint16_t v, size_t n)
 {
@@ -74,6 +83,7 @@ int strcmp(const char *cs, const char *ct);
 #undef memcpy
 #define memcpy(dst, src, len) __memcpy(dst, src, len)
 #define memmove(dst, src, len) __memmove(dst, src, len)
+#undef memset
 #define memset(s, c, n) __memset(s, c, n)
 
 #ifndef __NO_FORTIFY
-- 
2.25.1.696.g5e7596f4ac-goog
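
For readers following along outside the kernel tree, the effect of the alias can be sketched as a standalone C program. The struct and function names below are made up for illustration; only the #define mirrors the patch:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Mirror of the patch: route memset through the builtin so the compiler
 * may inline small, constant-size calls even under -ffreestanding. */
#define memset(s, c, count) __builtin_memset((s), (c), (count))

/* Hypothetical 0x48-byte struct, matching the size in the diff above. */
struct sg_stats {
	uint64_t v[9];
};

/* With the alias, a compiler at -O2 can lower this constant-size memset
 * to a short run of 8-byte zero stores instead of emitting a call. */
void clear_stats(struct sg_stats *sgs)
{
	memset(sgs, 0, sizeof(*sgs));
}
```

Compiling this with and without -ffreestanding and inspecting the assembly shows the same inline expansion in both cases, which is the point of the patch.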



* [PATCH]     x86: Alias memset to __builtin_memset.
  2020-03-23 11:42 [PATCH] x86: Alias memset to __builtin_memset Clement Courbet
                   ` (5 preceding siblings ...)
  2020-03-24 15:59 ` [PATCH v3] " Clement Courbet
@ 2020-03-24 16:02 ` Clement Courbet
  2020-03-25  7:59 ` Clement Courbet
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 23+ messages in thread
From: Clement Courbet @ 2020-03-24 16:02 UTC (permalink / raw)
  Cc: Nathan Chancellor, Kees Cook, Nick Desaulniers, Joe Perches,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	x86, Clement Courbet, linux-kernel, clang-built-linux

> This shows the kernel getting smaller not larger.
> Is this still reversed or is this correct?

Yes, sorry. This was wrong. The kernel actually shrinks. Fixed.




* Re: [PATCH] x86: Alias memset to __builtin_memset.
  2020-03-24 14:06 ` Clement Courbet
@ 2020-03-24 17:29   ` Nick Desaulniers
  0 siblings, 0 replies; 23+ messages in thread
From: Nick Desaulniers @ 2020-03-24 17:29 UTC (permalink / raw)
  To: Clement Courbet
  Cc: Nathan Chancellor, Kees Cook, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	LKML, clang-built-linux

On Tue, Mar 24, 2020 at 7:06 AM Clement Courbet <courbet@google.com> wrote:
>
> Thanks for the comments. Answers below.
>
> > This ifdef is unnecessary
> > This needs to be within #ifndef CONFIG_FORTIFY_SOURCE
>
> Thanks, fixed.
>
> > shouldn't this just be done universally ?
>
> It looks like every architecture does its own magic with memory
> functions. I'm not very familiar with how the linux kernel is
> organized, do you have a pointer for where this would go if enabled
> globally ?
>
> > Who's adding it for 64b?
> > Any idea where it's coming from in your
> > build? Maybe a local modification to be removed?
>
> Actually this is from our local build configuration. Apparently this

Not sure we should modify this for someone's downstream tree to work
around an issue they introduced.  If you want to file a bug
internally, I'd be happy to comment on it.

Does __builtin_memset detect support for `rep stosb`, then patch the
kernel to always use it or not?  The kernel is limited in that we use
-mno-sse and friends to avoid saving/restoring vector registers on
context switch unless kernel_fpu_{begin|end}() is called, which the
compiler doesn't insert for memcpy's.

Did you verify what this change does for other compilers?

> is needed to build on some architectures that redefine common
> symbols, but the authors seemed to feel strongly that this should be

Sounds like a linkage error or issue with symbol visibility; I don't
see how -ffreestanding should have any bearing on that.

> on for all architectures. I've reached out to the authors for an
> extended rationale.
> On the other hand I think that there is no reason to prevent people
> from building with freestanding if we can easily allow them to.

I disagree.  The codegen in the kernel is very tricky to get right;
it's somewhat an embedded system, yet provides many symbols that would
have been provided by libc, yet it doesn't use vector operations for
the general case.  Adding -ffreestanding to optimize one hot memset in
one function is using a really big hammer to solve the wrong problem.
-- 
Thanks,
~Nick Desaulniers


* [PATCH]     x86: Alias memset to __builtin_memset.
  2020-03-23 11:42 [PATCH] x86: Alias memset to __builtin_memset Clement Courbet
                   ` (6 preceding siblings ...)
  2020-03-24 16:02 ` [PATCH] " Clement Courbet
@ 2020-03-25  7:59 ` Clement Courbet
  2020-03-26 12:25 ` [PATCH v4] " Clement Courbet
  2020-03-26 12:38   ` Clement Courbet
  9 siblings, 0 replies; 23+ messages in thread
From: Clement Courbet @ 2020-03-25  7:59 UTC (permalink / raw)
  Cc: Nathan Chancellor, Kees Cook, Nick Desaulniers, Joe Perches,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	x86, Clement Courbet, linux-kernel, clang-built-linux

> Not sure we should modify this for someone's downstream tree to work
> around an issue they introduced.  If you want to file a bug
> internally, I'd be happy to comment on it.

I don't have a strong opinion on that. I don't know the policy of the
linux kernel w.r.t. this. There is an internal bug for this, where
kernel maintainers suggested I upstream this patch.

> Does __builtin_memset detect support for `rep stosb`, then patch the
> kernel to always use it or not?

__builtin_memset just allows the compiler to recognize that this has the
semantics of a memset, exactly as happens when -ffreestanding is not
specified.

In terms of what compilers do when expanding the memset, it depends on
the size.
gcc or clang obviously do not generate vector code when -mno-sse is
specified, as this would break promises.

clang inlines stores for small sizes and switches to memset as the size
increases: https://godbolt.org/z/_X16xt

gcc inlines stores for tiny sizes, then switches to `rep stos` for larger
sizes: https://godbolt.org/z/m-G233 This behaviour is additionally
configurable through command line flags: https://godbolt.org/z/wsoJpQ
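
Whichever expansion strategy the compiler picks at a given size, the semantics remain those of memset. A standalone sketch of the two regimes described above (hypothetical function names, hosted toolchain assumed):

```c
#include <assert.h>

/* Small, constant size: a typical candidate for inline store expansion. */
void zero16(unsigned char *p)
{
	__builtin_memset(p, 0, 16);
}

/* Larger constant size: compilers typically keep a call to memset
 * (or expand to rep stos), depending on target tuning flags. */
void fill4k(unsigned char *p, int c)
{
	__builtin_memset(p, c, 4096);
}
```

Either way the observable behavior is identical; only the generated code differs.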

> Did you verify what this change does for other compilers?

Are there other compilers used to build the kernel on x86?

icc does the same as gcc and clang: https://godbolt.org/z/yCju_D

> yet it doesn't use vector operations for the general case

I'm not sure how vector operations relate to freestanding, or this change.

> Adding -ffreestanding to optimize one hot memset in
> one function is using a really big hammer to solve the wrong
> problem.

-ffreestanding was not added to optimize anything. It was already there.
If anything -ffreestanding actually pessimizes a lot of the code
generation, as it prevents the compiler from recognizing the semantics
of common primitives. This is what this change is trying to fix.
Removing -ffreestanding from the options is another (easier?) way to fix
the problem. This change is a stab at accommodating both those who want
freestanding and those who want performance.

The hot memset I mentioned was just the hottest one. But as you can imagine
there are many constant-size memsets spread across many functions, some of
which are also very hot, the others constituting a long tail which is not
negligible.




* Re: [PATCH v3] x86: Alias memset to __builtin_memset.
  2020-03-24 15:59 ` [PATCH v3] " Clement Courbet
@ 2020-03-25 11:33   ` Bernd Petrovitsch
  0 siblings, 0 replies; 23+ messages in thread
From: Bernd Petrovitsch @ 2020-03-25 11:33 UTC (permalink / raw)
  To: Clement Courbet
  Cc: Nathan Chancellor, Kees Cook, Nick Desaulniers, Joe Perches,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	x86, linux-kernel, clang-built-linux

Hi all!

Sorry for being late to the party:

On 24/03/2020 16:59, Clement Courbet wrote:
[...]
> ---
>   arch/x86/include/asm/string_64.h | 10 ++++++++++
>   1 file changed, 10 insertions(+)
> 
> diff --git a/arch/x86/include/asm/string_64.h b/arch/x86/include/asm/string_64.h
> index 75314c3dbe47..9cfce0a840a4 100644
> --- a/arch/x86/include/asm/string_64.h
> +++ b/arch/x86/include/asm/string_64.h
> @@ -18,6 +18,15 @@ extern void *__memcpy(void *to, const void *from, size_t len);
>   void *memset(void *s, int c, size_t n);
>   void *__memset(void *s, int c, size_t n);
>   
> +/* Recent compilers can generate much better code for known size and/or
> + * fill values, and will fall back on `memset` if they fail.
> + * We alias `memset` to `__builtin_memset` explicitly to inform the compiler to
> + * perform this optimization even when -ffreestanding is used.
> + */
> +#if !defined(CONFIG_FORTIFY_SOURCE)
> +#define memset(s, c, count) __builtin_memset(s, c, count)
To be on the safe side, the usual way to write macros is like

#define memset(s, c, count) __builtin_memset((s), (c), (count))

as no one knows what is passed as a parameter to memset(), and the extra
pair of parentheses doesn't hurt.

And similar below (and I fear there are more places).

Or did I miss something in the Kernel?
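
The hazard behind this convention is easiest to see with a toy macro (illustrative only, not kernel code): without parentheses, operator precedence at the call site leaks into the expansion.

```c
#include <assert.h>

/* Unparenthesized parameter: the expansion interacts with the
 * caller's operators. */
#define SCALE_BAD(x)  2 * x
/* Parenthesized parameter and body: behaves like a function call. */
#define SCALE_GOOD(x) (2 * (x))

int scale_bad_sum(void)  { return SCALE_BAD(1 + 2);  } /* 2 * 1 + 2  */
int scale_good_sum(void) { return SCALE_GOOD(1 + 2); } /* 2 * (1 + 2) */
```

The memset() macro's arguments are comma-delimited, so the risk there is small, but the parenthesized form is the defensive default.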

> +#endif
> +
>   #define __HAVE_ARCH_MEMSET16
>   static inline void *memset16(uint16_t *s, uint16_t v, size_t n)
>   {
> @@ -74,6 +83,7 @@ int strcmp(const char *cs, const char *ct);
>   #undef memcpy
>   #define memcpy(dst, src, len) __memcpy(dst, src, len)
#define memcpy(dst, src, len) __memcpy((dst), (src), (len))
>   #define memmove(dst, src, len) __memmove(dst, src, len)
#define memmove(dst, src, len) __memmove((dst), (src), (len))
> +#undef memset
>   #define memset(s, c, n) __memset(s, c, n)
#define memset(s, c, n) __memset((s), (c), (n))
>   
>   #ifndef __NO_FORTIFY
> 

MfG,
	Bernd
-- 
Bernd Petrovitsch                  Email : bernd@petrovitsch.priv.at
                      LUGA : http://www.luga.at


* [PATCH v4]     x86: Alias memset to __builtin_memset.
  2020-03-23 11:42 [PATCH] x86: Alias memset to __builtin_memset Clement Courbet
                   ` (7 preceding siblings ...)
  2020-03-25  7:59 ` Clement Courbet
@ 2020-03-26 12:25 ` Clement Courbet
  2020-03-26 12:38   ` Clement Courbet
  9 siblings, 0 replies; 23+ messages in thread
From: Clement Courbet @ 2020-03-26 12:25 UTC (permalink / raw)
  Cc: Nathan Chancellor, Kees Cook, Nick Desaulniers, Joe Perches,
	Bernd Petrovitsch, Clement Courbet, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, x86, linux-kernel,
	clang-built-linux

    Recent compilers know the meaning of builtins (`memset`,
    `memcpy`, ...) and can replace calls by inline code when
    deemed better. For example, `memset(p, 0, 4)` will be lowered
    to a four-byte zero store.

    When using -ffreestanding (as is the case, e.g., when building with
    clang), these optimizations are disabled. This means that **all**
    memsets, including those with small, constant sizes, will result
    in an actual call to memset.

    We have identified several spots where we have high CPU usage
    because of this. For example, a single one of these memsets is
    responsible for about 0.3% of our total CPU usage in the kernel.

    Aliasing `memset` to `__builtin_memset` allows the compiler to
    perform this optimization even when -ffreestanding is used.
    There is no change when -ffreestanding is not used.

    Below is a diff (clang) for `update_sg_lb_stats()`, which
    includes the aforementioned hot memset:
        memset(sgs, 0, sizeof(*sgs));

    Diff:
        movq %rsi, %rbx        ~~~>  movq $0x0, 0x40(%r8)
        movq %rdi, %r15        ~~~>  movq $0x0, 0x38(%r8)
        movl $0x48, %edx       ~~~>  movq $0x0, 0x30(%r8)
        movq %r8, %rdi         ~~~>  movq $0x0, 0x28(%r8)
        xorl %esi, %esi        ~~~>  movq $0x0, 0x20(%r8)
        callq <memset>         ~~~>  movq $0x0, 0x18(%r8)
                               ~~~>  movq $0x0, 0x10(%r8)
                               ~~~>  movq $0x0, 0x8(%r8)
                               ~~~>  movq $0x0, (%r8)

    In terms of code size, this shrinks the clang-built kernel a
    bit (-0.022%):
    440383608 vmlinux.clang.before
    440285512 vmlinux.clang.after

Signed-off-by: Clement Courbet <courbet@google.com>

---

changes in v2:
  - Removed ifdef(GNUC >= 4).
  - Disabled change when CONFIG_FORTIFY_SOURCE is set.

changes in v3:
  - Fixed commit message: the kernel shrinks in size.

changes in v4:
  - Properly parenthesize the macro.
---
 arch/x86/include/asm/string_64.h | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/arch/x86/include/asm/string_64.h b/arch/x86/include/asm/string_64.h
index 75314c3dbe47..1bfa825e9ad3 100644
--- a/arch/x86/include/asm/string_64.h
+++ b/arch/x86/include/asm/string_64.h
@@ -18,6 +18,15 @@ extern void *__memcpy(void *to, const void *from, size_t len);
 void *memset(void *s, int c, size_t n);
 void *__memset(void *s, int c, size_t n);
 
+/* Recent compilers can generate much better code for known size and/or
+ * fill values, and will fall back on `memset` if they fail.
+ * We alias `memset` to `__builtin_memset` explicitly to inform the compiler to
+ * perform this optimization even when -ffreestanding is used.
+ */
+#if !defined(CONFIG_FORTIFY_SOURCE)
+#define memset(s, c, count) __builtin_memset((s), (c), (count))
+#endif
+
 #define __HAVE_ARCH_MEMSET16
 static inline void *memset16(uint16_t *s, uint16_t v, size_t n)
 {
@@ -74,6 +83,7 @@ int strcmp(const char *cs, const char *ct);
 #undef memcpy
 #define memcpy(dst, src, len) __memcpy(dst, src, len)
 #define memmove(dst, src, len) __memmove(dst, src, len)
+#undef memset
 #define memset(s, c, n) __memset(s, c, n)
 
 #ifndef __NO_FORTIFY
-- 
2.25.1.696.g5e7596f4ac-goog



* [PATCH]     x86: Alias memset to __builtin_memset.
  2020-03-23 11:42 [PATCH] x86: Alias memset to __builtin_memset Clement Courbet
@ 2020-03-26 12:38   ` Clement Courbet
  2020-03-23 20:16 ` kbuild test robot
                     ` (8 subsequent siblings)
  9 siblings, 0 replies; 23+ messages in thread
From: Clement Courbet @ 2020-03-26 12:38 UTC (permalink / raw)
  Cc: Nathan Chancellor, Kees Cook, Nick Desaulniers, Joe Perches,
	Bernd Petrovitsch, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, x86, Segher Boessenkool, Greg Kroah-Hartman,
	Allison Randal, Clement Courbet, linuxppc-dev, linux-kernel,
	clang-built-linux

I discussed with the original authors who added freestanding to our
build. It turns out that it was added globally, but this was just to
work around powerpc not compiling under clang; they felt the
fix was appropriate globally.

Now Nick has dug up https://lkml.org/lkml/2019/8/29/1300, which
advises against freestanding. Also, I've done some research and
discovered that the original reason for using freestanding for
powerpc has been fixed here:
https://lore.kernel.org/linuxppc-dev/20191119045712.39633-3-natechancellor@gmail.com/

I'm going to remove -ffreestanding from downstream, so we don't really need
this anymore, sorry for wasting people's time.

I wonder if the freestanding fix from the aforementioned patch is really needed
though. I think that clang is actually right to point out the issue.
I don't see any reason why setjmp()/longjmp() are declared as taking longs
rather than ints. The implementation looks like it only ever propagates the
value (in longjmp) or sets it to 1 (in setjmp), and we only ever call longjmp
with integer parameters. But I'm not a PowerPC expert, so I might
be misreading the code.


So it seems that we could just remove freestanding altogether and rewrite the
code to:

diff --git a/arch/powerpc/include/asm/setjmp.h b/arch/powerpc/include/asm/setjmp.h
index 279d03a1eec6..7941ae68fe21 100644
--- a/arch/powerpc/include/asm/setjmp.h
+++ b/arch/powerpc/include/asm/setjmp.h
@@ -12,7 +12,9 @@

 #define JMP_BUF_LEN    23
-extern long setjmp(long *);
-extern void longjmp(long *, long);
+typedef long * jmp_buf;
+
+extern int setjmp(jmp_buf);
+extern void longjmp(jmp_buf, int);

I'm happy to send a patch for this, and get rid of more -ffreestanding.
Opinions ?
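
For comparison, the hosted prototypes in <setjmp.h> already take and return int; a minimal standalone sketch of the usual control flow (using the standard header, not the powerpc implementation):

```c
#include <assert.h>
#include <setjmp.h>

static jmp_buf env;

/* Deep in a call chain, bail out non-locally with a status code. */
static void fail(void)
{
	longjmp(env, 1); /* setjmp() will appear to return 1 */
}

/* First returns 0 directly from setjmp(), then 1 via the longjmp(). */
int run(void)
{
	if (setjmp(env) == 0) { /* direct return: normal path */
		fail();         /* does not return normally */
		return -1;      /* not reached */
	}
	return 1;               /* reached via longjmp */
}
```

This matches the pattern described above: setjmp() is only compared against 0, and longjmp() only ever passes 1.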



* Re: [PATCH] x86: Alias memset to __builtin_memset.
  2020-03-26 12:38   ` Clement Courbet
@ 2020-03-26 17:18     ` Nick Desaulniers
  -1 siblings, 0 replies; 23+ messages in thread
From: Nick Desaulniers @ 2020-03-26 17:18 UTC (permalink / raw)
  To: Clement Courbet
  Cc: Nathan Chancellor, Kees Cook, Joe Perches, Bernd Petrovitsch,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Segher Boessenkool, Greg Kroah-Hartman, Allison Randal,
	linuxppc-dev, LKML, clang-built-linux

On Thu, Mar 26, 2020 at 5:38 AM Clement Courbet <courbet@google.com> wrote:
>
> I discussed with the original authors who added freestanding to our
> build. It turns out that it was added globally, but this was just to
> work around powerpc not compiling under clang; they felt the
> fix was appropriate globally.
>
> Now Nick has dug up https://lkml.org/lkml/2019/8/29/1300, which
> advises against freestanding. Also, I've done some research and
> discovered that the original reason for using freestanding for
> powerpc has been fixed here:
> https://lore.kernel.org/linuxppc-dev/20191119045712.39633-3-natechancellor@gmail.com/
>
> I'm going to remove -ffreestanding from downstream, so we don't really need
> this anymore, sorry for wasting people's time.
>
> I wonder if the freestanding fix from the aforementioned patch is really needed
> though. I think that clang is actually right to point out the issue.
> I don't see any reason why setjmp()/longjmp() are declared as taking longs
> rather than ints. The implementation looks like it only ever propagates the
> value (in longjmp) or sets it to 1 (in setjmp), and we only ever call longjmp
> with integer parameters. But I'm not a PowerPC expert, so I might
> be misreading the code.
>
>
> So it seems that we could just remove freestanding altogether and rewrite the
> code to:
>
> diff --git a/arch/powerpc/include/asm/setjmp.h b/arch/powerpc/include/asm/setjmp.h
> index 279d03a1eec6..7941ae68fe21 100644
> --- a/arch/powerpc/include/asm/setjmp.h
> +++ b/arch/powerpc/include/asm/setjmp.h
> @@ -12,7 +12,9 @@
>
>  #define JMP_BUF_LEN    23
> -extern long setjmp(long *);
> -extern void longjmp(long *, long);
> +typedef long * jmp_buf;
> +
> +extern int setjmp(jmp_buf);
> +extern void longjmp(jmp_buf, int);
>
> I'm happy to send a patch for this, and get rid of more -ffreestanding.
> Opinions ?

Yes, I think the above diff and additionally cleaning up
-ffreestanding from arch/powerpc/kernel/Makefile and
arch/powerpc/xmon/Makefile would be a much better approach.  If that's
good enough to fix the warning, I kind of can't believe we didn't spot
that in the original code review!

Actually, the god awful warning was:
./arch/powerpc/include/asm/setjmp.h:10:13: error: declaration of
built-in function 'setjmp' requires the declaration of the 'jmp_buf'
type, commonly provided in the header <setjmp.h>.
[-Werror,-Wincomplete-setjmp-declaration]
extern long setjmp(long *) __attribute__((returns_twice));
            ^
So jmp_buf was missing, wasn't used, but also the long vs int confusion.

I tested the above diff, all calls to setjmp under arch/powerpc/ just
compare the return value against 0.  Callers of longjmp just pass 1
for the final parameter. So the above changes should be no functional
change.
-- 
Thanks,
~Nick Desaulniers



* Re: [PATCH]     x86: Alias memset to __builtin_memset.
  2020-03-26 12:38   ` Clement Courbet
@ 2020-03-27  4:06     ` Michael Ellerman
  -1 siblings, 0 replies; 23+ messages in thread
From: Michael Ellerman @ 2020-03-27  4:06 UTC (permalink / raw)
  To: Clement Courbet
  Cc: Nathan Chancellor, Kees Cook, Nick Desaulniers, Joe Perches,
	Bernd Petrovitsch, Benjamin Herrenschmidt, Paul Mackerras,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	x86, Segher Boessenkool, Greg Kroah-Hartman, Allison Randal,
	Clement Courbet, linuxppc-dev, linux-kernel, clang-built-linux

Clement Courbet <courbet@google.com> writes:
> I discussed with the original authors who added freestanding to our
> build. It turns out that it was added globally, but this was just to
> work around powerpc not compiling under clang; they felt the
> fix was appropriate globally.
>
> Now Nick has dug up https://lkml.org/lkml/2019/8/29/1300, which
> advises against freestanding. Also, I've done some research and
> discovered that the original reason for using freestanding for
> powerpc has been fixed here:
> https://lore.kernel.org/linuxppc-dev/20191119045712.39633-3-natechancellor@gmail.com/
>
> I'm going to remove -ffreestanding from downstream, so we don't really need
> this anymore, sorry for wasting people's time.
>
> I wonder if the freestanding fix from the aforementioned patch is really needed
> though. I think that clang is actually right to point out the issue.
> I don't see any reason why setjmp()/longjmp() are declared as taking longs
> rather than ints. The implementation looks like it only ever propagates the
> value (in longjmp) or sets it to 1 (in setjmp), and we only ever call longjmp
> with integer parameters. But I'm not a PowerPC expert, so I might
> be misreading the code.
>
>
> So it seems that we could just remove freestanding altogether and rewrite the
> code to:
>
> diff --git a/arch/powerpc/include/asm/setjmp.h b/arch/powerpc/include/asm/setjmp.h
> index 279d03a1eec6..7941ae68fe21 100644
> --- a/arch/powerpc/include/asm/setjmp.h
> +++ b/arch/powerpc/include/asm/setjmp.h
> @@ -12,7 +12,9 @@
>
>  #define JMP_BUF_LEN    23
> -extern long setjmp(long *);
> -extern void longjmp(long *, long);
> +typedef long * jmp_buf;
> +
> +extern int setjmp(jmp_buf);
> +extern void longjmp(jmp_buf, int);
>
> I'm happy to send a patch for this, and get rid of more -ffreestanding.
> Opinions ?

If it works then it looks like a much better fix than using -ffreestanding.

Please submit a patch with a change log etc. and I'd be happy to merge
it.

cheers



* Re: [PATCH]     x86: Alias memset to __builtin_memset.
  2020-03-26 12:38   ` Clement Courbet
@ 2020-03-27 17:12     ` Segher Boessenkool
  -1 siblings, 0 replies; 23+ messages in thread
From: Segher Boessenkool @ 2020-03-27 17:12 UTC (permalink / raw)
  To: Clement Courbet
  Cc: Nathan Chancellor, Kees Cook, Nick Desaulniers, Joe Perches,
	Bernd Petrovitsch, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, x86, Greg Kroah-Hartman, Allison Randal,
	linuxppc-dev, linux-kernel, clang-built-linux

Hi!

On Thu, Mar 26, 2020 at 01:38:39PM +0100, Clement Courbet wrote:
> --- a/arch/powerpc/include/asm/setjmp.h
> +++ b/arch/powerpc/include/asm/setjmp.h
> @@ -12,7 +12,9 @@
> 
>  #define JMP_BUF_LEN    23
> -extern long setjmp(long *);
> -extern void longjmp(long *, long);
> +typedef long * jmp_buf;
> +
> +extern int setjmp(jmp_buf);
> +extern void longjmp(jmp_buf, int);
> 
> I'm happy to send a patch for this, and get rid of more -ffreestanding.
> Opinions?

Pedantically, jmp_buf should be an array type.  But, this will probably
work fine, and it certainly is better than what we had before.

You could do
  typedef long jmp_buf[JMP_BUF_LEN];
perhaps?

Thanks,


Segher

^ permalink raw reply	[flat|nested] 23+ messages in thread


end of thread, other threads:[~2020-03-27 17:16 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-23 11:42 [PATCH] x86: Alias memset to __builtin_memset Clement Courbet
2020-03-23 15:47 ` Nathan Chancellor
2020-03-23 20:16 ` kbuild test robot
2020-03-23 21:59   ` Kees Cook
2020-03-24  1:22 ` Nick Desaulniers
2020-03-24  1:26   ` Nick Desaulniers
2020-03-24 14:06 ` Clement Courbet
2020-03-24 17:29   ` Nick Desaulniers
2020-03-24 14:07 ` [PATCH v2] " Clement Courbet
2020-03-24 15:08   ` Joe Perches
2020-03-24 15:59 ` [PATCH v3] " Clement Courbet
2020-03-25 11:33   ` Bernd Petrovitsch
2020-03-24 16:02 ` [PATCH] " Clement Courbet
2020-03-25  7:59 ` Clement Courbet
2020-03-26 12:25 ` [PATCH v4] " Clement Courbet
2020-03-26 12:38 ` [PATCH] " Clement Courbet
2020-03-26 12:38   ` Clement Courbet
2020-03-26 17:18   ` Nick Desaulniers
2020-03-26 17:18     ` Nick Desaulniers
2020-03-27  4:06   ` Michael Ellerman
2020-03-27  4:06     ` Michael Ellerman
2020-03-27 17:12   ` Segher Boessenkool
2020-03-27 17:12     ` Segher Boessenkool
