linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* objtool warning "uses BP as a scratch register" with clang-9
@ 2019-08-27 12:30 Arnd Bergmann
  2019-08-27 14:51 ` Josh Poimboeuf
  0 siblings, 1 reply; 38+ messages in thread
From: Arnd Bergmann @ 2019-08-27 12:30 UTC (permalink / raw)
  To: Linux Kernel Mailing List, clang-built-linux
  Cc: Nick Desaulniers, Josh Poimboeuf

I upgraded to the latest clang-9 snapshot from http://apt.llvm.org/ today.
Many problems are fixed, but I still get tons of warnings like

arch/x86/kernel/cpu/mtrr/generic.o: warning: objtool:
mtrr_type_lookup_variable uses BP as a scratch register
arch/x86/kernel/process.o: warning: objtool: get_tsc_mode()+0x21: call
without frame pointer save/setup
arch/x86/kernel/early_printk.o: warning: objtool: early_vga_write uses
BP as a scratch register
arch/x86/kernel/sysfb_simplefb.o: warning: objtool: parse_mode uses BP
as a scratch register
arch/x86/kernel/head64.o: warning: objtool: __startup_64 uses BP as a
scratch register
kernel/time/timeconv.o: warning: objtool: time64_to_tm uses BP as a
scratch register
kernel/trace/ring_buffer.o: warning: objtool:
ring_buffer_discard_commit uses BP as a scratch register
...

I created a reduced test case:

$ cat crc32.i
typedef unsigned u32;
long a, c;
u32 b, f;
u32 *d, *e;
void fn1() {
  u32 *g = &f, *h = e, *i = d;
  for (; a < c; a++)
    b = i[b >> 8 & 255] ^ h[b] ^ g[5];
}
$ clang-9 -c  crc32.i  -O2   ; objtool check  crc32.o
crc32.o: warning: objtool: fn1 uses BP as a scratch register
$ objdump -d crc32.o
0000000000000000 <fn1>:
   0: 55                    push   %rbp
   1: 53                    push   %rbx
   2: 4c 8b 05 00 00 00 00 mov    0x0(%rip),%r8        # 9 <fn1+0x9>
   9: 48 8b 05 00 00 00 00 mov    0x0(%rip),%rax        # 10 <fn1+0x10>
  10: 4c 39 c0              cmp    %r8,%rax
  13: 7e 7f                jle    94 <fn1+0x94>
  15: 48 8b 0d 00 00 00 00 mov    0x0(%rip),%rcx        # 1c <fn1+0x1c>
  1c: 48 8b 15 00 00 00 00 mov    0x0(%rip),%rdx        # 23 <fn1+0x23>
  23: 8b 1d 00 00 00 00    mov    0x0(%rip),%ebx        # 29 <fn1+0x29>
  29: 8b 35 00 00 00 00    mov    0x0(%rip),%esi        # 2f <fn1+0x2f>
  2f: 89 c7                mov    %eax,%edi
  31: 44 29 c7              sub    %r8d,%edi
  34: 40 f6 c7 01          test   $0x1,%dil
  38: 75 05                jne    3f <fn1+0x3f>
  3a: 4c 89 c7              mov    %r8,%rdi
  3d: eb 15                jmp    54 <fn1+0x54>
  3f: 0f b6 ff              movzbl %bh,%edi
  42: 8b 1c 99              mov    (%rcx,%rbx,4),%ebx
  45: 33 1c ba              xor    (%rdx,%rdi,4),%ebx
  48: 31 f3                xor    %esi,%ebx
  4a: 89 1d 00 00 00 00    mov    %ebx,0x0(%rip)        # 50 <fn1+0x50>
  50: 49 8d 78 01          lea    0x1(%r8),%rdi
  54: 49 83 c0 01          add    $0x1,%r8
  58: 4c 39 c0              cmp    %r8,%rax
  5b: 74 30                je     8d <fn1+0x8d>
  5d: 0f 1f 00              nopl   (%rax)
  60: 0f b6 ef              movzbl %bh,%ebp
  63: 89 db                mov    %ebx,%ebx
  65: 8b 1c 99              mov    (%rcx,%rbx,4),%ebx
  68: 33 1c aa              xor    (%rdx,%rbp,4),%ebx
  6b: 31 f3                xor    %esi,%ebx
  6d: 89 1d 00 00 00 00    mov    %ebx,0x0(%rip)        # 73 <fn1+0x73>
  73: 0f b6 ef              movzbl %bh,%ebp
  76: 8b 1c 99              mov    (%rcx,%rbx,4),%ebx
  79: 33 1c aa              xor    (%rdx,%rbp,4),%ebx
  7c: 31 f3                xor    %esi,%ebx
  7e: 89 1d 00 00 00 00    mov    %ebx,0x0(%rip)        # 84 <fn1+0x84>
  84: 48 83 c7 02          add    $0x2,%rdi
  88: 48 39 c7              cmp    %rax,%rdi
  8b: 7c d3                jl     60 <fn1+0x60>
  8d: 48 89 3d 00 00 00 00 mov    %rdi,0x0(%rip)        # 94 <fn1+0x94>
  94: 5b                    pop    %rbx
  95: 5d                    pop    %rbp
  96: c3                    retq

This happens with clang-9 and clang-10 at the moment, but not clang-8.

        Arnd

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: objtool warning "uses BP as a scratch register" with clang-9
  2019-08-27 12:30 objtool warning "uses BP as a scratch register" with clang-9 Arnd Bergmann
@ 2019-08-27 14:51 ` Josh Poimboeuf
  2019-08-27 14:59   ` Ilie Halip
  0 siblings, 1 reply; 38+ messages in thread
From: Josh Poimboeuf @ 2019-08-27 14:51 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Linux Kernel Mailing List, clang-built-linux, Nick Desaulniers

On Tue, Aug 27, 2019 at 02:30:06PM +0200, Arnd Bergmann wrote:
> I upgraded to the latest clang-9 snapshot from http://apt.llvm.org/ today.
> Many problems are fixed, but I still get tons of warnings like
> 
> arch/x86/kernel/cpu/mtrr/generic.o: warning: objtool:
> mtrr_type_lookup_variable uses BP as a scratch register
> arch/x86/kernel/process.o: warning: objtool: get_tsc_mode()+0x21: call
> without frame pointer save/setup
> arch/x86/kernel/early_printk.o: warning: objtool: early_vga_write uses
> BP as a scratch register
> arch/x86/kernel/sysfb_simplefb.o: warning: objtool: parse_mode uses BP
> as a scratch register
> arch/x86/kernel/head64.o: warning: objtool: __startup_64 uses BP as a
> scratch register
> kernel/time/timeconv.o: warning: objtool: time64_to_tm uses BP as a
> scratch register
> kernel/trace/ring_buffer.o: warning: objtool:
> ring_buffer_discard_commit uses BP as a scratch register
> ...
> 
> I created a reduced test case:
> 
> $ cat crc32.i
> typedef unsigned u32;
> long a, c;
> u32 b, f;
> u32 *d, *e;
> void fn1() {
>   u32 *g = &f, *h = e, *i = d;
>   for (; a < c; a++)
>     b = i[b >> 8 & 255] ^ h[b] ^ g[5];
> }
> $ clang-9 -c  crc32.i  -O2   ; objtool check  crc32.o
> crc32.o: warning: objtool: fn1 uses BP as a scratch register

I assume your config has CONFIG_UNWINDER_FRAME_POINTER=y, since objtool
is doing frame pointer checking.  Though I'm not sure about that, since
I don't see -fno-omit-frame-pointer on the reduced clang command line.
Do you still see this warning with -fno-omit-frame-pointer (assuming
clang has that option)?

-- 
Josh

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: objtool warning "uses BP as a scratch register" with clang-9
  2019-08-27 14:51 ` Josh Poimboeuf
@ 2019-08-27 14:59   ` Ilie Halip
  2019-08-27 19:00     ` Arnd Bergmann
  0 siblings, 1 reply; 38+ messages in thread
From: Ilie Halip @ 2019-08-27 14:59 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Arnd Bergmann, Linux Kernel Mailing List, clang-built-linux,
	Nick Desaulniers

> > $ clang-9 -c  crc32.i  -O2   ; objtool check  crc32.o
> > crc32.o: warning: objtool: fn1 uses BP as a scratch register

Yes, I see it too. https://godbolt.org/z/N56HW1

> Do you still see this warning with -fno-omit-frame-pointer (assuming
> clang has that option)?

Using this makes the warning go away. Running objtool with --no-fp
also gets rid of it.

I.H.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: objtool warning "uses BP as a scratch register" with clang-9
  2019-08-27 14:59   ` Ilie Halip
@ 2019-08-27 19:00     ` Arnd Bergmann
  2019-08-27 19:22       ` Josh Poimboeuf
  0 siblings, 1 reply; 38+ messages in thread
From: Arnd Bergmann @ 2019-08-27 19:00 UTC (permalink / raw)
  To: Ilie Halip
  Cc: Josh Poimboeuf, Linux Kernel Mailing List, clang-built-linux,
	Nick Desaulniers

On Tue, Aug 27, 2019 at 5:00 PM Ilie Halip <ilie.halip@gmail.com> wrote:
>
> > > $ clang-9 -c  crc32.i  -O2   ; objtool check  crc32.o
> > > crc32.o: warning: objtool: fn1 uses BP as a scratch register
>
> Yes, I see it too. https://godbolt.org/z/N56HW1
>
> > Do you still see this warning with -fno-omit-frame-pointer (assuming
> > clang has that option)?
>
> Using this makes the warning go away. Running objtool with --no-fp
> also gets rid of it.

I still see the warning after adding back the -fno-omit-frame-pointer
in my reduced test case:

$ clang-9 -c  crc32.i -Werror -Wno-address-of-packed-member -Wall
-Wno-pointer-sign -Wno-unused-value -Wno-constant-logical-operand -O2
-Wno-unused -fno-omit-frame-pointer
$ objtool check  crc32.o
crc32.o: warning: objtool: fn1 uses BP as a scratch register

The kernel configuration has frame pointers enabled:
$ make O=obj-x86  CC=clang-9 V=1 lib/crc32.o
...
clang-9 -Wp,-MD,lib/.crc32.o.d  -nostdinc -isystem
/usr/lib/llvm-9/lib/clang/9.0.0/include -I../arch/x86/include
-I./arch/x86/include/generated -I../include -I./include
-I../arch/x86/include/uapi -I./arch/x86/include/generated/uapi
-I../include/uapi -I./include/generated/uapi -include
../include/linux/kconfig.h -include ../include/linux/compiler_types.h
-D__KERNEL__ -Qunused-arguments -Wall -Wundef
-Werror=strict-prototypes -Wno-trigraphs -fno-strict-aliasing
-fno-common -fshort-wchar -fno-PIE
-Werror=implicit-function-declaration -Werror=implicit-int
-Wno-format-security -std=gnu89 -no-integrated-as
-Werror=unknown-warning-option -mno-sse -mno-mmx -mno-sse2 -mno-3dnow
-mno-avx -m64 -mno-80387 -mstack-alignment=8 -mtune=generic
-mno-red-zone -mcmodel=kernel -DCONFIG_AS_CFI=1
-DCONFIG_AS_CFI_SIGNAL_FRAME=1 -DCONFIG_AS_CFI_SECTIONS=1
-DCONFIG_AS_SSSE3=1 -DCONFIG_AS_AVX=1 -DCONFIG_AS_AVX2=1
-DCONFIG_AS_AVX512=1 -DCONFIG_AS_SHA1_NI=1 -DCONFIG_AS_SHA256_NI=1
-Wno-sign-compare -fno-asynchronous-unwind-tables
-fno-delete-null-pointer-checks -Wno-address-of-packed-member -O2
-Wframe-larger-than=2048 -fstack-protector
-Wno-format-invalid-specifier -Wno-gnu -Wno-tautological-compare
-mno-global-merge -Wno-unused-const-variable -fno-omit-frame-pointer
-fno-optimize-sibling-calls -Wdeclaration-after-statement -Wvla
-Wno-pointer-sign -fno-strict-overflow -fno-merge-all-constants
-fno-stack-check -Werror=date-time -Werror=incompatible-pointer-types
-Wno-initializer-overrides -Wno-format -Wno-sign-compare
-Wno-format-zero-length -I ../lib -I ./lib
-DKBUILD_BASENAME='"crc32"' -DKBUILD_MODNAME='"crc32"' -c -o
lib/crc32.o ../lib/crc32.c
  /bin/bash ../scripts/gen_ksymdeps.sh lib/crc32.o >> lib/.crc32.o.cmd
   ./tools/objtool/objtool check  lib/crc32.o
lib/crc32.o: warning: objtool: crc32_le uses BP as a scratch register
lib/crc32.o: warning: objtool: __crc32c_le uses BP as a scratch register
lib/crc32.o: warning: objtool: crc32_be uses BP as a scratch register

I uploaded my .config file to https://pastebin.com/raw/dmQ1CsNs for reference.

I don't see anything unusual in the configuration, but it's interesting that
this configuration has more of those warnings that other configurations,
so there must be something to it. Here is a list of how many objtool warnings
I got in some recent randconfig builds:

$ find rand86/ -name \*success -mmin -300  | while read i ; do echo
`grep objtool $i | wc -l`  $i ; done
0 rand86/0x694311F1-success
0 rand86/0x8D972A20-success
35 rand86/0x23203CF4-success
2 rand86/0xCBBF0EE0-success
2 rand86/0x4C14539-success
0 rand86/0xDAF4E782-success
0 rand86/0xC0A55734-success
0 rand86/0x66A9932-success
0 rand86/0x6EA0E018-success
3 rand86/0x39524E4C-success
0 rand86/0x1529A460-success
0 rand86/0xF8B3C01-success
0 rand86/0x3A38D796-success
1 rand86/0xBC23C91A-success
0 rand86/0xBB45D3A4-success
0 rand86/0xA907618F-success
0 rand86/0xE1FD6050-success
2 rand86/0x564C9F97-success
1 rand86/0x6B87B4EC-success
0 rand86/0x138BF05D-success
0 rand86/0xBEA383C4-success
4 rand86/0x13EDB348-success
0 rand86/0xDD0F59A6-success
1 rand86/0x1F629018-success
0 rand86/0x775CC509-success
8 rand86/0xC368A488-success
0 rand86/0x619AB1C8-success
0 rand86/0xF94B887A-success
197 rand86/0xBB7BE7F0-success
0 rand86/0x9C77525F-success
0 rand86/0x8FB025C4-success
50 rand86/0xA25C2C06-success
0 rand86/0xC96DF498-success
3 rand86/0xA9CE5E8F-success
0 rand86/0xC2F8A840-success
0 rand86/0xEAD8A021-success
0 rand86/0x5721D814-success
141 rand86/0x8259791A-success
0 rand86/0x9EB91B3D-success
0 rand86/0x5F1C100E-success
2 rand86/0xD132EF06-success
0 rand86/0x1C679BF4-success
0 rand86/0xE647B3E0-success
0 rand86/0xEA358480-success

    Arnd

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: objtool warning "uses BP as a scratch register" with clang-9
  2019-08-27 19:00     ` Arnd Bergmann
@ 2019-08-27 19:22       ` Josh Poimboeuf
  2019-08-27 19:47         ` Arnd Bergmann
  2019-08-28 22:13         ` Nick Desaulniers
  0 siblings, 2 replies; 38+ messages in thread
From: Josh Poimboeuf @ 2019-08-27 19:22 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Ilie Halip, Linux Kernel Mailing List, clang-built-linux,
	Nick Desaulniers

On Tue, Aug 27, 2019 at 09:00:52PM +0200, Arnd Bergmann wrote:
> On Tue, Aug 27, 2019 at 5:00 PM Ilie Halip <ilie.halip@gmail.com> wrote:
> >
> > > > $ clang-9 -c  crc32.i  -O2   ; objtool check  crc32.o
> > > > crc32.o: warning: objtool: fn1 uses BP as a scratch register
> >
> > Yes, I see it too. https://godbolt.org/z/N56HW1
> >
> > > Do you still see this warning with -fno-omit-frame-pointer (assuming
> > > clang has that option)?
> >
> > Using this makes the warning go away. Running objtool with --no-fp
> > also gets rid of it.
> 
> I still see the warning after adding back the -fno-omit-frame-pointer
> in my reduced test case:
> 
> $ clang-9 -c  crc32.i -Werror -Wno-address-of-packed-member -Wall
> -Wno-pointer-sign -Wno-unused-value -Wno-constant-logical-operand -O2
> -Wno-unused -fno-omit-frame-pointer
> $ objtool check  crc32.o
> crc32.o: warning: objtool: fn1 uses BP as a scratch register

This warning most likely means that clang is clobbering RBP in leaf
functions.  With -fno-omit-frame-pointer, leaf functions don't need to
set up the frame pointer, but they at least need to leave RBP untouched,
so that an interrupts/exceptions can unwind through the function.

-- 
Josh

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: objtool warning "uses BP as a scratch register" with clang-9
  2019-08-27 19:22       ` Josh Poimboeuf
@ 2019-08-27 19:47         ` Arnd Bergmann
  2019-08-27 21:21           ` Nick Desaulniers
  2019-08-28 22:13         ` Nick Desaulniers
  1 sibling, 1 reply; 38+ messages in thread
From: Arnd Bergmann @ 2019-08-27 19:47 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Ilie Halip, Linux Kernel Mailing List, clang-built-linux,
	Nick Desaulniers

On Tue, Aug 27, 2019 at 9:23 PM Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> On Tue, Aug 27, 2019 at 09:00:52PM +0200, Arnd Bergmann wrote:
> > On Tue, Aug 27, 2019 at 5:00 PM Ilie Halip <ilie.halip@gmail.com> wrote:
> > >
> > > > > $ clang-9 -c  crc32.i  -O2   ; objtool check  crc32.o
> > > > > crc32.o: warning: objtool: fn1 uses BP as a scratch register
> > >
> > > Yes, I see it too. https://godbolt.org/z/N56HW1
> > >
> > > > Do you still see this warning with -fno-omit-frame-pointer (assuming
> > > > clang has that option)?
> > >
> > > Using this makes the warning go away. Running objtool with --no-fp
> > > also gets rid of it.
> >
> > I still see the warning after adding back the -fno-omit-frame-pointer
> > in my reduced test case:
> >
> > $ clang-9 -c  crc32.i -Werror -Wno-address-of-packed-member -Wall
> > -Wno-pointer-sign -Wno-unused-value -Wno-constant-logical-operand -O2
> > -Wno-unused -fno-omit-frame-pointer
> > $ objtool check  crc32.o
> > crc32.o: warning: objtool: fn1 uses BP as a scratch register
>
> This warning most likely means that clang is clobbering RBP in leaf
> functions.  With -fno-omit-frame-pointer, leaf functions don't need to
> set up the frame pointer, but they at least need to leave RBP untouched,
> so that an interrupts/exceptions can unwind through the function.

Yes, that clearly matches what I see in the output where it does

   0: 55                    push   %rbp
...
  73: 0f b6 ef              movzbl %bh,%ebp
  76: 8b 1c 99              mov    (%rcx,%rbx,4),%ebx
  79: 33 1c aa              xor    (%rdx,%rbp,4),%ebx
...
  95: 5d                    pop    %rbp
  96: c3                    retq

I just did another simple test: an x86-64 defconfig build with
UNWINDER_FRAME_POINTER shows the exact symptom as
my randconfig, so it sounds like any configuration with frame
pointers would, and there is nothing else to it (this also makes
sense given that it happens with a relatively simple test case
outside of the kernel).

       Arnd

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: objtool warning "uses BP as a scratch register" with clang-9
  2019-08-27 19:47         ` Arnd Bergmann
@ 2019-08-27 21:21           ` Nick Desaulniers
  2019-08-28  9:00             ` Arnd Bergmann
  0 siblings, 1 reply; 38+ messages in thread
From: Nick Desaulniers @ 2019-08-27 21:21 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Josh Poimboeuf, Ilie Halip, Linux Kernel Mailing List, clang-built-linux

On Tue, Aug 27, 2019 at 12:47 PM Arnd Bergmann <arnd@arndb.de> wrote:
>
> On Tue, Aug 27, 2019 at 9:23 PM Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> > On Tue, Aug 27, 2019 at 09:00:52PM +0200, Arnd Bergmann wrote:
> > > On Tue, Aug 27, 2019 at 5:00 PM Ilie Halip <ilie.halip@gmail.com> wrote:
> > > >
> > > > > > $ clang-9 -c  crc32.i  -O2   ; objtool check  crc32.o
> > > > > > crc32.o: warning: objtool: fn1 uses BP as a scratch register
> > > >
> > > > Yes, I see it too. https://godbolt.org/z/N56HW1
> > > >
> > > > > Do you still see this warning with -fno-omit-frame-pointer (assuming
> > > > > clang has that option)?
> > > >
> > > > Using this makes the warning go away. Running objtool with --no-fp
> > > > also gets rid of it.
> > >
> > > I still see the warning after adding back the -fno-omit-frame-pointer
> > > in my reduced test case:
> > >
> > > $ clang-9 -c  crc32.i -Werror -Wno-address-of-packed-member -Wall
> > > -Wno-pointer-sign -Wno-unused-value -Wno-constant-logical-operand -O2
> > > -Wno-unused -fno-omit-frame-pointer
> > > $ objtool check  crc32.o
> > > crc32.o: warning: objtool: fn1 uses BP as a scratch register
> >
> > This warning most likely means that clang is clobbering RBP in leaf
> > functions.  With -fno-omit-frame-pointer, leaf functions don't need to
> > set up the frame pointer, but they at least need to leave RBP untouched,
> > so that an interrupts/exceptions can unwind through the function.
>
> Yes, that clearly matches what I see in the output where it does
>
>    0: 55                    push   %rbp
> ...
>   73: 0f b6 ef              movzbl %bh,%ebp
>   76: 8b 1c 99              mov    (%rcx,%rbx,4),%ebx
>   79: 33 1c aa              xor    (%rdx,%rbp,4),%ebx
> ...
>   95: 5d                    pop    %rbp
>   96: c3                    retq
>
> I just did another simple test: an x86-64 defconfig build with
> UNWINDER_FRAME_POINTER shows the exact symptom as
> my randconfig, so it sounds like any configuration with frame
> pointers would, and there is nothing else to it (this also makes
> sense given that it happens with a relatively simple test case
> outside of the kernel).
>
>        Arnd

Thanks for the description of the issue and the reduced test case.  It
almost reminds me of
https://github.com/ClangBuiltLinux/linux/issues/612.

I've filed https://bugs.llvm.org/show_bug.cgi?id=43128, anything I
should add to the bug report?

-- 
Thanks,
~Nick Desaulniers

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: objtool warning "uses BP as a scratch register" with clang-9
  2019-08-27 21:21           ` Nick Desaulniers
@ 2019-08-28  9:00             ` Arnd Bergmann
  2019-08-28 14:06               ` Arnd Bergmann
                                 ` (2 more replies)
  0 siblings, 3 replies; 38+ messages in thread
From: Arnd Bergmann @ 2019-08-28  9:00 UTC (permalink / raw)
  To: Nick Desaulniers
  Cc: Josh Poimboeuf, Ilie Halip, Linux Kernel Mailing List, clang-built-linux

On Tue, Aug 27, 2019 at 11:22 PM 'Nick Desaulniers' via Clang Built
Linux <clang-built-linux@googlegroups.com> wrote:
> On Tue, Aug 27, 2019 at 12:47 PM Arnd Bergmann <arnd@arndb.de> wrote:
> > On Tue, Aug 27, 2019 at 9:23 PM Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> > > On Tue, Aug 27, 2019 at 09:00:52PM +0200, Arnd Bergmann wrote:
> > > > On Tue, Aug 27, 2019 at 5:00 PM Ilie Halip <ilie.halip@gmail.com> wrote:
> Thanks for the description of the issue and the reduced test case.  It
> almost reminds me of
> https://github.com/ClangBuiltLinux/linux/issues/612.
>
> I've filed https://bugs.llvm.org/show_bug.cgi?id=43128, anything I
> should add to the bug report?

I tried the suggestion to add

diff --git a/Makefile b/Makefile
index 1b23f95db176..97f7bc4c9b4e 100644
--- a/Makefile
+++ b/Makefile
@@ -755,7 +755,7 @@ endif

 KBUILD_CFLAGS += $(call cc-disable-warning, unused-const-variable)
 ifdef CONFIG_FRAME_POINTER
-KBUILD_CFLAGS  += -fno-omit-frame-pointer -fno-optimize-sibling-calls
+KBUILD_CFLAGS  += -fno-omit-frame-pointer -fno-optimize-sibling-calls
$(call cc-option, -mno-omit-leaf-frame-pointer)
 else
 # Some targets (ARM with Thumb2, for example), can't be built with frame
 # pointers.  For those, we don't have FUNCTION_TRACER automatically

from https://bugs.llvm.org/show_bug.cgi?id=43128, this avoids all the
"uses BP as a scratch register" warnings as well as almost all the "call without
frame pointer save/setup" warnings I also saw.

Only a few unique objtool warnings remain now, here are the ones I
currently see,
along with .config files. Let me know which ones I should investigate further,
I assume a lot of these are known issues:

http://paste.ubuntu.com/p/XjdDsypRxX/
0x5BA1B7A1:arch/x86/ia32/ia32_signal.o: warning: objtool:
ia32_setup_rt_frame()+0x238: call to memset() with UACCESS enabled
0x5BA1B7A1:arch/x86/kernel/signal.o: warning: objtool:
__setup_rt_frame()+0x5b8: call to memset() with UACCESS enabled
0x5BA1B7A1:mm/kasan/common.o: warning: objtool: kasan_report()+0x44:
call to __stack_chk_fail() with UACCESS enabled
0x5BA1B7A1:kernel/trace/trace_selftest_dynamic.o: warning: objtool:
__llvm_gcov_writeout()+0x13: call without frame pointer save/setup
0x5BA1B7A1:kernel/trace/trace_selftest_dynamic.o: warning: objtool:
__llvm_gcov_flush()+0x0: call without frame pointer save/setup
0x5BA1B7A1:kernel/trace/trace_clock.o: warning: objtool:
__llvm_gcov_writeout()+0x14: call without frame pointer save/setup
0x5BA1B7A1:kernel/trace/trace_clock.o: warning: objtool:
__llvm_gcov_flush()+0x0: call without frame pointer save/setup
0x5BA1B7A1:kernel/trace/*: # many more of the same, all in this directory
0x5BA1B7A1:kernel/trace/trace_uprobe.o: warning: objtool:
__llvm_gcov_flush()+0x0: call without frame pointer save/setup

http://paste.ubuntu.com/p/PyYNBK5Yx2/
0xC1CF60CC:arch/x86/ia32/ia32_signal.o: warning: objtool:
ia32_setup_rt_frame()+0x205: call to memset() with UACCESS enabled
0xC1CF60CC:arch/x86/kernel/signal.o: warning: objtool:
__setup_rt_frame()+0x597: call to memset() with UACCESS enabled
0xC1CF60CC:arch/x86/kernel/process.o: warning: objtool:
play_dead()+0x3: unreachable instruction
0xC1CF60CC:mm/kasan/common.o: warning: objtool: kasan_report()+0x52:
call to __stack_chk_fail() with UACCESS enabled
0xC1CF60CC:kernel/sched/idle.o: warning: objtool:
switched_to_idle()+0x3: unreachable instruction
0xC1CF60CC:mm/madvise.o: warning: objtool: hugepage_madvise()+0x3:
unreachable instruction
0xC1CF60CC:mm/hugetlb.o: warning: objtool: hugetlb_vm_op_fault()+0x3:
unreachable instruction
0xC1CF60CC:kernel/exit.o: warning: objtool: abort()+0x3: unreachable instruction
0xC1CF60CC:fs/hugetlbfs/inode.o: warning: objtool:
hugetlbfs_write_end()+0x3: unreachable instruction
0xC1CF60CC:fs/xfs/xfs_super.o: warning: objtool:
xfs_fs_alloc_inode()+0x3: unreachable instruction
0xC1CF60CC:drivers/mtd/nand/raw/nand_base.o: warning: objtool:
nand_read_oob()+0x18d4: unreachable instruction

http://paste.ubuntu.com/p/xCXyJR4Gx6/
0x99965895:arch/x86/ia32/ia32_signal.o: warning: objtool:
ia32_setup_rt_frame()+0x1f5: call to memset() with UACCESS enabled
0x99965895:arch/x86/kernel/signal.o: warning: objtool:
__setup_rt_frame()+0x57f: call to memset() with UACCESS enabled
0x99965895:drivers/pinctrl/pinctrl-ingenic.o: warning: objtool:
ingenic_pinconf_set()+0x10d: sibling call from callable instruction
with modified stack frame

http://paste.ubuntu.com/p/SFQXxh6zvy/
0x9278DEDC:drivers/media/dvb-frontends/cxd2880/cxd2880_tnrdmd_dvbt2.o:
warning: objtool: x_tune_dvbt2_demod_setting()+0x7f6: can't find
switch jump table
0x9278DEDC:net/xfrm/xfrm_output.o: warning: objtool:
xfrm_outer_mode_output()+0x109: unreachable instruction
0x9278DEDC:net/xfrm/xfrm_output.o: warning: objtool:
xfrm_outer_mode_output()+0x109: unreachable instruction

http://paste.ubuntu.com/p/9jW8yR6Tph/
0xE872D410:kernel/trace/trace_branch.o: warning: objtool:
ftrace_likely_update()+0x6c: call to __stack_chk_fail() with UACCESS
enabled
0xE872D410:drivers/hwmon/pmbus/adm1275.o: warning: objtool:
adm1275_probe()+0x756: unreachable instruction

http://paste.ubuntu.com/p/qg4bxZbxwq/
0xA833B0C9:drivers/gpu/drm/amd/amdgpu/atom.o: warning: objtool:
atom_op_move() falls through to next function atom_op_and()
0xA833B0C9:drivers/gpu/drm/i915/display/intel_combo_phy.o: warning:
objtool: cnl_set_procmon_ref_values()+0x125: can't find switch jump
table
0xA833B0C9:drivers/gpu/drm/radeon/atom.o: warning: objtool:
atom_op_move() falls through to next function atom_op_and()
0xA833B0C9:drivers/gpu/drm/radeon/evergreen_cs.o: warning: objtool:
evergreen_cs_parse() falls through to next function
evergreen_dma_cs_parse()

http://paste.ubuntu.com/p/W3nq9bSHHZ/
0xFBCA4E34:drivers/gpu/drm/amd/amdgpu/atom.o: warning: objtool:
atom_op_move() falls through to next function atom_op_and()
0xFBCA4E34:drivers/gpu/drm/amd/amdgpu/../display/dc/dce110/dce110_opp_csc_v.o:
warning: objtool: dce110_opp_v_set_csc_default()+0x2bc: can't find
switch jump table
0xFBCA4E34:drivers/gpu/drm/amd/amdgpu/../display/dc/dce110/dce110_mem_input_v.o:
warning: objtool: dce_mem_input_v_program_pte_vm()+0x27f: can't find
switch jump table

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: objtool warning "uses BP as a scratch register" with clang-9
  2019-08-28  9:00             ` Arnd Bergmann
@ 2019-08-28 14:06               ` Arnd Bergmann
  2019-08-28 14:51               ` Josh Poimboeuf
  2019-08-28 15:13               ` Arnd Bergmann
  2 siblings, 0 replies; 38+ messages in thread
From: Arnd Bergmann @ 2019-08-28 14:06 UTC (permalink / raw)
  To: Nick Desaulniers
  Cc: Josh Poimboeuf, Ilie Halip, Linux Kernel Mailing List, clang-built-linux

On Wed, Aug 28, 2019 at 11:00 AM Arnd Bergmann <arnd@arndb.de> wrote:
> On Tue, Aug 27, 2019 at 11:22 PM 'Nick Desaulniers' via Clang Built Linux <clang-built-linux@googlegroups.com> wrote:

> http://paste.ubuntu.com/p/XjdDsypRxX/
> 0x5BA1B7A1:arch/x86/ia32/ia32_signal.o: warning: objtool:
> ia32_setup_rt_frame()+0x238: call to memset() with UACCESS enabled
> 0x5BA1B7A1:arch/x86/kernel/signal.o: warning: objtool:
> __setup_rt_frame()+0x5b8: call to memset() with UACCESS enabled
> 0x5BA1B7A1:mm/kasan/common.o: warning: objtool: kasan_report()+0x44:
> call to __stack_chk_fail() with UACCESS enabled
> 0x5BA1B7A1:kernel/trace/trace_selftest_dynamic.o: warning: objtool:
> __llvm_gcov_writeout()+0x13: call without frame pointer save/setup
> 0x5BA1B7A1:kernel/trace/trace_selftest_dynamic.o: warning: objtool:
> __llvm_gcov_flush()+0x0: call without frame pointer save/setup
> 0x5BA1B7A1:kernel/trace/trace_clock.o: warning: objtool:
> __llvm_gcov_writeout()+0x14: call without frame pointer save/setup
> 0x5BA1B7A1:kernel/trace/trace_clock.o: warning: objtool:
> __llvm_gcov_flush()+0x0: call without frame pointer save/setup
> 0x5BA1B7A1:kernel/trace/*: # many more of the same, all in this directory
> 0x5BA1B7A1:kernel/trace/trace_uprobe.o: warning: objtool:
> __llvm_gcov_flush()+0x0: call without frame pointer save/setup

I had a look here and opened
https://bugs.llvm.org/show_bug.cgi?id=43141

It seems that CONFIG_FRAME_POINTER is ignored for the functions
that are generated with CONFIG_GCOV_KERNEL. See also:

$ clang-9 -fprofile-arcs -fno-omit-frame-pointer -c -xc /dev/null   -o null.o
$ ./tools/objtool/objtool check null.o
null.o: warning: objtool: __llvm_gcov_writeout()+0x3e: call without
frame pointer save/setup
null.o: warning: objtool: __llvm_gcov_flush()+0x1: call without frame
pointer save/setup
null.o: warning: objtool: __llvm_gcov_init()+0x15: call without frame
pointer save/setup

       Arnd

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: objtool warning "uses BP as a scratch register" with clang-9
  2019-08-28  9:00             ` Arnd Bergmann
  2019-08-28 14:06               ` Arnd Bergmann
@ 2019-08-28 14:51               ` Josh Poimboeuf
  2019-08-28 15:29                 ` Arnd Bergmann
  2019-08-28 15:13               ` Arnd Bergmann
  2 siblings, 1 reply; 38+ messages in thread
From: Josh Poimboeuf @ 2019-08-28 14:51 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Nick Desaulniers, Ilie Halip, Linux Kernel Mailing List,
	clang-built-linux

On Wed, Aug 28, 2019 at 11:00:04AM +0200, Arnd Bergmann wrote:
> On Tue, Aug 27, 2019 at 11:22 PM 'Nick Desaulniers' via Clang Built
> Linux <clang-built-linux@googlegroups.com> wrote:
> > On Tue, Aug 27, 2019 at 12:47 PM Arnd Bergmann <arnd@arndb.de> wrote:
> > > On Tue, Aug 27, 2019 at 9:23 PM Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> > > > On Tue, Aug 27, 2019 at 09:00:52PM +0200, Arnd Bergmann wrote:
> > > > > On Tue, Aug 27, 2019 at 5:00 PM Ilie Halip <ilie.halip@gmail.com> wrote:
> > Thanks for the description of the issue and the reduced test case.  It
> > almost reminds me of
> > https://github.com/ClangBuiltLinux/linux/issues/612.
> >
> > I've filed https://bugs.llvm.org/show_bug.cgi?id=43128, anything I
> > should add to the bug report?
> 
> I tried the suggestion to add
> 
> diff --git a/Makefile b/Makefile
> index 1b23f95db176..97f7bc4c9b4e 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -755,7 +755,7 @@ endif
> 
>  KBUILD_CFLAGS += $(call cc-disable-warning, unused-const-variable)
>  ifdef CONFIG_FRAME_POINTER
> -KBUILD_CFLAGS  += -fno-omit-frame-pointer -fno-optimize-sibling-calls
> +KBUILD_CFLAGS  += -fno-omit-frame-pointer -fno-optimize-sibling-calls
> $(call cc-option, -mno-omit-leaf-frame-pointer)
>  else
>  # Some targets (ARM with Thumb2, for example), can't be built with frame
>  # pointers.  For those, we don't have FUNCTION_TRACER automatically
> 
> from https://bugs.llvm.org/show_bug.cgi?id=43128, this avoids all the
> "uses BP as a scratch register" warnings as well as almost all the "call without
> frame pointer save/setup" warnings I also saw.
> 
> Only a few unique objtool warnings remain now, here are the ones I
> currently see,
> along with .config files. Let me know which ones I should investigate further,
> I assume a lot of these are known issues:

None of those look necessarily familiar.  What are the remaining "known"
clang issues which were found by objtool?

If you share .o files I can look at them.

> http://paste.ubuntu.com/p/XjdDsypRxX/
> 0x5BA1B7A1:arch/x86/ia32/ia32_signal.o: warning: objtool:
> ia32_setup_rt_frame()+0x238: call to memset() with UACCESS enabled
> 0x5BA1B7A1:arch/x86/kernel/signal.o: warning: objtool:
> __setup_rt_frame()+0x5b8: call to memset() with UACCESS enabled
> 0x5BA1B7A1:mm/kasan/common.o: warning: objtool: kasan_report()+0x44:
> call to __stack_chk_fail() with UACCESS enabled
> 0x5BA1B7A1:kernel/trace/trace_selftest_dynamic.o: warning: objtool:
> __llvm_gcov_writeout()+0x13: call without frame pointer save/setup
> 0x5BA1B7A1:kernel/trace/trace_selftest_dynamic.o: warning: objtool:
> __llvm_gcov_flush()+0x0: call without frame pointer save/setup
> 0x5BA1B7A1:kernel/trace/trace_clock.o: warning: objtool:
> __llvm_gcov_writeout()+0x14: call without frame pointer save/setup
> 0x5BA1B7A1:kernel/trace/trace_clock.o: warning: objtool:
> __llvm_gcov_flush()+0x0: call without frame pointer save/setup
> 0x5BA1B7A1:kernel/trace/*: # many more of the same, all in this directory
> 0x5BA1B7A1:kernel/trace/trace_uprobe.o: warning: objtool:
> __llvm_gcov_flush()+0x0: call without frame pointer save/setup
> 
> http://paste.ubuntu.com/p/PyYNBK5Yx2/
> 0xC1CF60CC:arch/x86/ia32/ia32_signal.o: warning: objtool:
> ia32_setup_rt_frame()+0x205: call to memset() with UACCESS enabled
> 0xC1CF60CC:arch/x86/kernel/signal.o: warning: objtool:
> __setup_rt_frame()+0x597: call to memset() with UACCESS enabled
> 0xC1CF60CC:arch/x86/kernel/process.o: warning: objtool:
> play_dead()+0x3: unreachable instruction
> 0xC1CF60CC:mm/kasan/common.o: warning: objtool: kasan_report()+0x52:
> call to __stack_chk_fail() with UACCESS enabled
> 0xC1CF60CC:kernel/sched/idle.o: warning: objtool:
> switched_to_idle()+0x3: unreachable instruction
> 0xC1CF60CC:mm/madvise.o: warning: objtool: hugepage_madvise()+0x3:
> unreachable instruction
> 0xC1CF60CC:mm/hugetlb.o: warning: objtool: hugetlb_vm_op_fault()+0x3:
> unreachable instruction
> 0xC1CF60CC:kernel/exit.o: warning: objtool: abort()+0x3: unreachable instruction
> 0xC1CF60CC:fs/hugetlbfs/inode.o: warning: objtool:
> hugetlbfs_write_end()+0x3: unreachable instruction
> 0xC1CF60CC:fs/xfs/xfs_super.o: warning: objtool:
> xfs_fs_alloc_inode()+0x3: unreachable instruction
> 0xC1CF60CC:drivers/mtd/nand/raw/nand_base.o: warning: objtool:
> nand_read_oob()+0x18d4: unreachable instruction
> 
> http://paste.ubuntu.com/p/xCXyJR4Gx6/
> 0x99965895:arch/x86/ia32/ia32_signal.o: warning: objtool:
> ia32_setup_rt_frame()+0x1f5: call to memset() with UACCESS enabled
> 0x99965895:arch/x86/kernel/signal.o: warning: objtool:
> __setup_rt_frame()+0x57f: call to memset() with UACCESS enabled
> 0x99965895:drivers/pinctrl/pinctrl-ingenic.o: warning: objtool:
> ingenic_pinconf_set()+0x10d: sibling call from callable instruction
> with modified stack frame
> 
> http://paste.ubuntu.com/p/SFQXxh6zvy/
> 0x9278DEDC:drivers/media/dvb-frontends/cxd2880/cxd2880_tnrdmd_dvbt2.o:
> warning: objtool: x_tune_dvbt2_demod_setting()+0x7f6: can't find
> switch jump table
> 0x9278DEDC:net/xfrm/xfrm_output.o: warning: objtool:
> xfrm_outer_mode_output()+0x109: unreachable instruction
> 0x9278DEDC:net/xfrm/xfrm_output.o: warning: objtool:
> xfrm_outer_mode_output()+0x109: unreachable instruction
> 
> http://paste.ubuntu.com/p/9jW8yR6Tph/
> 0xE872D410:kernel/trace/trace_branch.o: warning: objtool:
> ftrace_likely_update()+0x6c: call to __stack_chk_fail() with UACCESS
> enabled
> 0xE872D410:drivers/hwmon/pmbus/adm1275.o: warning: objtool:
> adm1275_probe()+0x756: unreachable instruction
> 
> http://paste.ubuntu.com/p/qg4bxZbxwq/
> 0xA833B0C9:drivers/gpu/drm/amd/amdgpu/atom.o: warning: objtool:
> atom_op_move() falls through to next function atom_op_and()
> 0xA833B0C9:drivers/gpu/drm/i915/display/intel_combo_phy.o: warning:
> objtool: cnl_set_procmon_ref_values()+0x125: can't find switch jump
> table
> 0xA833B0C9:drivers/gpu/drm/radeon/atom.o: warning: objtool:
> atom_op_move() falls through to next function atom_op_and()
> 0xA833B0C9:drivers/gpu/drm/radeon/evergreen_cs.o: warning: objtool:
> evergreen_cs_parse() falls through to next function
> evergreen_dma_cs_parse()
> 
> http://paste.ubuntu.com/p/W3nq9bSHHZ/
> 0xFBCA4E34:drivers/gpu/drm/amd/amdgpu/atom.o: warning: objtool:
> atom_op_move() falls through to next function atom_op_and()
> 0xFBCA4E34:drivers/gpu/drm/amd/amdgpu/../display/dc/dce110/dce110_opp_csc_v.o:
> warning: objtool: dce110_opp_v_set_csc_default()+0x2bc: can't find
> switch jump table
> 0xFBCA4E34:drivers/gpu/drm/amd/amdgpu/../display/dc/dce110/dce110_mem_input_v.o:
> warning: objtool: dce_mem_input_v_program_pte_vm()+0x27f: can't find
> switch jump table

-- 
Josh

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: objtool warning "uses BP as a scratch register" with clang-9
  2019-08-28  9:00             ` Arnd Bergmann
  2019-08-28 14:06               ` Arnd Bergmann
  2019-08-28 14:51               ` Josh Poimboeuf
@ 2019-08-28 15:13               ` Arnd Bergmann
  2019-08-28 15:22                 ` Josh Poimboeuf
  2 siblings, 1 reply; 38+ messages in thread
From: Arnd Bergmann @ 2019-08-28 15:13 UTC (permalink / raw)
  To: Nick Desaulniers
  Cc: Josh Poimboeuf, Ilie Halip, Linux Kernel Mailing List, clang-built-linux

On Wed, Aug 28, 2019 at 11:00 AM Arnd Bergmann <arnd@arndb.de> wrote:
> On Tue, Aug 27, 2019 at 11:22 PM 'Nick Desaulniers' via Clang Built Linux <clang-built-linux@googlegroups.com> wrote:
I figured this one out as well:

> http://paste.ubuntu.com/p/XjdDsypRxX/
> 0x5BA1B7A1:arch/x86/ia32/ia32_signal.o: warning: objtool:
> ia32_setup_rt_frame()+0x238: call to memset() with UACCESS enabled
> 0x5BA1B7A1:arch/x86/kernel/signal.o: warning: objtool:
> __setup_rt_frame()+0x5b8: call to memset() with UACCESS enabled

When CONFIG_KASAN is set, clang decides to use memset() to set
the first two struct members in this function:

 static inline void sas_ss_reset(struct task_struct *p)
 {
        p->sas_ss_sp = 0;
        p->sas_ss_size = 0;
        p->sas_ss_flags = SS_DISABLE;
 }

and that is called from save_altstack_ex(). Adding a barrier() after
the sas_ss_sp() works around the issue, but is certainly not the
best solution. Any other ideas?

       Arnd

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: objtool warning "uses BP as a scratch register" with clang-9
  2019-08-28 15:13               ` Arnd Bergmann
@ 2019-08-28 15:22                 ` Josh Poimboeuf
  2019-08-28 15:28                   ` Arnd Bergmann
  0 siblings, 1 reply; 38+ messages in thread
From: Josh Poimboeuf @ 2019-08-28 15:22 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Nick Desaulniers, Ilie Halip, Linux Kernel Mailing List,
	clang-built-linux

On Wed, Aug 28, 2019 at 05:13:59PM +0200, Arnd Bergmann wrote:
> On Wed, Aug 28, 2019 at 11:00 AM Arnd Bergmann <arnd@arndb.de> wrote:
> > On Tue, Aug 27, 2019 at 11:22 PM 'Nick Desaulniers' via Clang Built Linux <clang-built-linux@googlegroups.com> wrote:
> I figured this one out as well:
> 
> > http://paste.ubuntu.com/p/XjdDsypRxX/
> > 0x5BA1B7A1:arch/x86/ia32/ia32_signal.o: warning: objtool:
> > ia32_setup_rt_frame()+0x238: call to memset() with UACCESS enabled
> > 0x5BA1B7A1:arch/x86/kernel/signal.o: warning: objtool:
> > __setup_rt_frame()+0x5b8: call to memset() with UACCESS enabled
> 
> When CONFIG_KASAN is set, clang decides to use memset() to set
> the first two struct members in this function:
> 
>  static inline void sas_ss_reset(struct task_struct *p)
>  {
>         p->sas_ss_sp = 0;
>         p->sas_ss_size = 0;
>         p->sas_ss_flags = SS_DISABLE;
>  }
> 
> and that is called from save_altstack_ex(). Adding a barrier() after
> the sas_ss_sp() works around the issue, but is certainly not the
> best solution. Any other ideas?

Wow, is the compiler allowed to insert memset calls like that?  Seems a
bit overbearing, at least in a kernel context.  I don't recall GCC ever
doing it.

-- 
Josh

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: objtool warning "uses BP as a scratch register" with clang-9
  2019-08-28 15:22                 ` Josh Poimboeuf
@ 2019-08-28 15:28                   ` Arnd Bergmann
  2019-08-28 15:40                     ` Arnd Bergmann
  2019-08-29 17:34                     ` Josh Poimboeuf
  0 siblings, 2 replies; 38+ messages in thread
From: Arnd Bergmann @ 2019-08-28 15:28 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Nick Desaulniers, Ilie Halip, Linux Kernel Mailing List,
	clang-built-linux

On Wed, Aug 28, 2019 at 5:22 PM Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> On Wed, Aug 28, 2019 at 05:13:59PM +0200, Arnd Bergmann wrote:
> > On Wed, Aug 28, 2019 at 11:00 AM Arnd Bergmann <arnd@arndb.de> wrote:
> > > On Tue, Aug 27, 2019 at 11:22 PM 'Nick Desaulniers' via Clang Built Linux <clang-built-linux@googlegroups.com> wrote:
> > I figured this one out as well:
> >
> > > http://paste.ubuntu.com/p/XjdDsypRxX/
> > > 0x5BA1B7A1:arch/x86/ia32/ia32_signal.o: warning: objtool:
> > > ia32_setup_rt_frame()+0x238: call to memset() with UACCESS enabled
> > > 0x5BA1B7A1:arch/x86/kernel/signal.o: warning: objtool:
> > > __setup_rt_frame()+0x5b8: call to memset() with UACCESS enabled
> >
> > When CONFIG_KASAN is set, clang decides to use memset() to set
> > the first two struct members in this function:
> >
> >  static inline void sas_ss_reset(struct task_struct *p)
> >  {
> >         p->sas_ss_sp = 0;
> >         p->sas_ss_size = 0;
> >         p->sas_ss_flags = SS_DISABLE;
> >  }
> >
> > and that is called from save_altstack_ex(). Adding a barrier() after
> > the sas_ss_sp() works around the issue, but is certainly not the
> > best solution. Any other ideas?
>
> Wow, is the compiler allowed to insert memset calls like that?  Seems a
> bit overbearing, at least in a kernel context.  I don't recall GCC ever
> doing it.

Yes, it's free to assume that any standard library function behaves
as defined, so it can and will turn struct assignments into memcpy
or back, or replace string operations with others depending on what
seems better for optimization.

clang is more aggressive than gcc here, and this has caused some
other problems in the past, but it's usually harmless.

In theory, we could pass -ffreestanding to tell the compiler
not to make assumptions about standard library function behavior,
but that turns off all kinds of useful optimizations. The problem
is really that the kernel is neither exactly hosted nor freestanding.

       Arnd

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: objtool warning "uses BP as a scratch register" with clang-9
  2019-08-28 14:51               ` Josh Poimboeuf
@ 2019-08-28 15:29                 ` Arnd Bergmann
  2019-08-28 17:57                   ` Josh Poimboeuf
  0 siblings, 1 reply; 38+ messages in thread
From: Arnd Bergmann @ 2019-08-28 15:29 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Nick Desaulniers, Ilie Halip, Linux Kernel Mailing List,
	clang-built-linux

[-- Attachment #1: Type: text/plain, Size: 826 bytes --]

On Wed, Aug 28, 2019 at 4:51 PM Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> On Wed, Aug 28, 2019 at 11:00:04AM +0200, Arnd Bergmann wrote:
> > On Tue, Aug 27, 2019 at 11:22 PM 'Nick Desaulniers' via Clang Built
> > Linux <clang-built-linux@googlegroups.com> wrote:
> > > On Tue, Aug 27, 2019 at 12:47 PM Arnd Bergmann <arnd@arndb.de> wrote:
> >
> > Only a few unique objtool warnings remain now, here are the ones I
> > currently see,
> > along with .config files. Let me know which ones I should investigate further,
> > I assume a lot of these are known issues:
>
> None of those look necessarily familiar.  What are the remaining "known"
> clang issues which were found by objtool?

Maybe Nick can identify some.

> If you share .o files I can look at them.

Attaching the ones I could easily recreate here.

       Arnd

[-- Attachment #2: objtool-warnings.tar.gz --]
[-- Type: application/gzip, Size: 217870 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: objtool warning "uses BP as a scratch register" with clang-9
  2019-08-28 15:28                   ` Arnd Bergmann
@ 2019-08-28 15:40                     ` Arnd Bergmann
  2019-08-29 23:24                       ` Josh Poimboeuf
  2019-08-29 17:34                     ` Josh Poimboeuf
  1 sibling, 1 reply; 38+ messages in thread
From: Arnd Bergmann @ 2019-08-28 15:40 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Nick Desaulniers, Ilie Halip, Linux Kernel Mailing List,
	clang-built-linux

On Wed, Aug 28, 2019 at 5:28 PM Arnd Bergmann <arnd@arndb.de> wrote:
> On Wed, Aug 28, 2019 at 5:22 PM Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> > On Wed, Aug 28, 2019 at 05:13:59PM +0200, Arnd Bergmann wrote:
> > >
> > > When CONFIG_KASAN is set, clang decides to use memset() to set
> > > the first two struct members in this function:
> > >
> > >  static inline void sas_ss_reset(struct task_struct *p)
> > >  {
> > >         p->sas_ss_sp = 0;
> > >         p->sas_ss_size = 0;
> > >         p->sas_ss_flags = SS_DISABLE;
> > >  }
> > >
> > > and that is called from save_altstack_ex(). Adding a barrier() after
> > > the sas_ss_sp() works around the issue, but is certainly not the
> > > best solution. Any other ideas?
> >
> > Wow, is the compiler allowed to insert memset calls like that?  Seems a
> > bit overbearing, at least in a kernel context.  I don't recall GCC ever
> > doing it.
>
> Yes, it's free to assume that any standard library function behaves
> as defined, so it can and will turn struct assignments into memcpy
> or back, or replace string operations with others depending on what
> seems better for optimization.
>
> clang is more aggressive than gcc here, and this has caused some
> other problems in the past, but it's usually harmless.
>
> In theory, we could pass -ffreestanding to tell the compiler
> not to make assumptions about standard library function behavior,
> but that turns off all kinds of useful optimizations. The problem
> is really that the kernel is neither exactly hosted nor freestanding.

A slightly better workaround is to move the sas_ss_reset() out of
the try/catch block. Not sure if this is safe.

      Arnd

diff --git a/arch/x86/ia32/ia32_signal.c b/arch/x86/ia32/ia32_signal.c
index 1cee10091b9f..14f8decf0ebc 100644
--- a/arch/x86/ia32/ia32_signal.c
+++ b/arch/x86/ia32/ia32_signal.c
@@ -379,6 +379,9 @@ int ia32_setup_rt_frame(int sig, struct ksignal *ksig,
                put_user_ex(*((u64 *)&code), (u64 __user *)frame->retcode);
        } put_user_catch(err);

+       if (current->sas_ss_flags & SS_AUTODISARM)
+               sas_ss_reset(current);
+
        err |= __copy_siginfo_to_user32(&frame->info, &ksig->info, false);
        err |= ia32_setup_sigcontext(&frame->uc.uc_mcontext, fpstate,
                                     regs, set->sig[0]);
diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c
index 8eb7193e158d..fd49d28abbc5 100644
--- a/arch/x86/kernel/signal.c
+++ b/arch/x86/kernel/signal.c
@@ -414,6 +414,9 @@ static int __setup_rt_frame(int sig, struct ksignal *ksig,
                 */
                put_user_ex(*((u64 *)&rt_retcode), (u64 *)frame->retcode);
        } put_user_catch(err);
+
+       if (current->sas_ss_flags & SS_AUTODISARM)
+               sas_ss_reset(current);

        err |= copy_siginfo_to_user(&frame->info, &ksig->info);
        err |= setup_sigcontext(&frame->uc.uc_mcontext, fpstate,
diff --git a/include/linux/compat.h b/include/linux/compat.h
index a320495fd577..f5e36931e029 100644
--- a/include/linux/compat.h
+++ b/include/linux/compat.h
@@ -520,8 +520,6 @@ int __compat_save_altstack(compat_stack_t __user
*, unsigned long);
        put_user_ex(ptr_to_compat((void __user *)t->sas_ss_sp),
&__uss->ss_sp); \
        put_user_ex(t->sas_ss_flags, &__uss->ss_flags); \
        put_user_ex(t->sas_ss_size, &__uss->ss_size); \
-       if (t->sas_ss_flags & SS_AUTODISARM) \
-               sas_ss_reset(t); \
 } while (0);

 /*
diff --git a/include/linux/signal.h b/include/linux/signal.h
index 67ceb6d7c869..9056239787f7 100644
--- a/include/linux/signal.h
+++ b/include/linux/signal.h
@@ -435,8 +435,6 @@ int __save_altstack(stack_t __user *, unsigned long);
        put_user_ex((void __user *)t->sas_ss_sp, &__uss->ss_sp); \
        put_user_ex(t->sas_ss_flags, &__uss->ss_flags); \
        put_user_ex(t->sas_ss_size, &__uss->ss_size); \
-       if (t->sas_ss_flags & SS_AUTODISARM) \
-               sas_ss_reset(t); \
 } while (0);

 #ifdef CONFIG_PROC_FS

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: objtool warning "uses BP as a scratch register" with clang-9
  2019-08-28 15:29                 ` Arnd Bergmann
@ 2019-08-28 17:57                   ` Josh Poimboeuf
  2019-08-28 19:41                     ` Arnd Bergmann
  0 siblings, 1 reply; 38+ messages in thread
From: Josh Poimboeuf @ 2019-08-28 17:57 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Nick Desaulniers, Ilie Halip, Linux Kernel Mailing List,
	clang-built-linux

On Wed, Aug 28, 2019 at 05:29:39PM +0200, Arnd Bergmann wrote:
> On Wed, Aug 28, 2019 at 4:51 PM Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> > On Wed, Aug 28, 2019 at 11:00:04AM +0200, Arnd Bergmann wrote:
> > > On Tue, Aug 27, 2019 at 11:22 PM 'Nick Desaulniers' via Clang Built
> > > Linux <clang-built-linux@googlegroups.com> wrote:
> > > > On Tue, Aug 27, 2019 at 12:47 PM Arnd Bergmann <arnd@arndb.de> wrote:
> > >
> > > Only a few unique objtool warnings remain now, here are the ones I
> > > currently see,
> > > along with .config files. Let me know which ones I should investigate further,
> > > I assume a lot of these are known issues:
> >
> > None of those look necessarily familiar.  What are the remaining "known"
> > clang issues which were found by objtool?
> 
> Maybe Nick can identify some.
> 
> > If you share .o files I can look at them.
> 
> Attaching the ones I could easily recreate here.

adm1275.o: warning: objtool: adm1275_probe()+0x756: unreachable instruction
- known issue: clang switch table has invalid entries
- also an objtool bug: need to report it as "falls through to next function"

atom.o: warning: objtool: atom_op_move() falls through to next function atom_op_and()
evergreen_cs.o: warning: objtool: evergreen_cs_parse() falls through to next function evergreen_dma_cs_parse()
- known issue: clang switch table has invalid entries

common.o: warning: objtool: kasan_report()+0x52: call to __stack_chk_fail() with UACCESS enabled
trace_branch.o: warning: objtool: ftrace_likely_update()+0x6c: call to __stack_chk_fail() with UACCESS enabled
- objtool bug: need to add __stack_chk_fail to uaccess whitelist

cxd2880_tnrdmd_dvbt2.o: warning: objtool: x_tune_dvbt2_demod_setting()+0x7f6: can't find switch jump table
- objtool bug: tricky switch table issue

exit.o: warning: objtool: abort()+0x3: unreachable instruction
hugetlb.o: warning: objtool: hugetlb_vm_op_fault()+0x3: unreachable instruction
idle.o: warning: objtool: switched_to_idle()+0x3: unreachable instruction
madvise.o: warning: objtool: hugepage_madvise()+0x3: unreachable instruction
privcmd.o: warning: objtool: privcmd_ioctl_mmap_batch()+0x5dd: unreachable instruction
process.o: warning: objtool: play_dead()+0x3: unreachable instruction
rmap.o: warning: objtool: anon_vma_clone()+0x1c2: unreachable instruction
s5c73m3-core.o: warning: objtool: s5c73m3_probe()+0x262: unreachable instruction
videobuf2-core.o: warning: objtool: vb2_core_dqbuf()+0xae6: unreachable instruction
xfrm_output.o: warning: objtool: xfrm_outer_mode_output()+0x109: unreachable instruction
- clang issue: trying to finish frame pointer setup after BUG() in unreachable code path

pinctrl-ingenic.o: warning: objtool: ingenic_pinconf_set()+0x10d: sibling call from callable instruction with modified stack frame
- bad clang bug: sibling call without first popping registers

-- 
Josh

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: objtool warning "uses BP as a scratch register" with clang-9
  2019-08-28 17:57                   ` Josh Poimboeuf
@ 2019-08-28 19:41                     ` Arnd Bergmann
  0 siblings, 0 replies; 38+ messages in thread
From: Arnd Bergmann @ 2019-08-28 19:41 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Nick Desaulniers, Ilie Halip, Linux Kernel Mailing List,
	clang-built-linux

On Wed, Aug 28, 2019 at 7:57 PM Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> On Wed, Aug 28, 2019 at 05:29:39PM +0200, Arnd Bergmann wrote:
> > On Wed, Aug 28, 2019 at 4:51 PM Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> exit.o: warning: objtool: abort()+0x3: unreachable instruction
> hugetlb.o: warning: objtool: hugetlb_vm_op_fault()+0x3: unreachable instruction
> idle.o: warning: objtool: switched_to_idle()+0x3: unreachable instruction
> madvise.o: warning: objtool: hugepage_madvise()+0x3: unreachable instruction
> privcmd.o: warning: objtool: privcmd_ioctl_mmap_batch()+0x5dd: unreachable instruction
> process.o: warning: objtool: play_dead()+0x3: unreachable instruction
> rmap.o: warning: objtool: anon_vma_clone()+0x1c2: unreachable instruction
> s5c73m3-core.o: warning: objtool: s5c73m3_probe()+0x262: unreachable instruction
> videobuf2-core.o: warning: objtool: vb2_core_dqbuf()+0xae6: unreachable instruction
> xfrm_output.o: warning: objtool: xfrm_outer_mode_output()+0x109: unreachable instruction
> - clang issue: trying to finish frame pointer setup after BUG() in unreachable code path
>
> pinctrl-ingenic.o: warning: objtool: ingenic_pinconf_set()+0x10d: sibling call from callable instruction with modified stack frame
> - bad clang bug: sibling call without first popping registers

I reduced the last one to https://godbolt.org/z/7lZZL3

enum { PIN_CONFIG_BIAS_DISABLE } pinconf_to_config_param(void);
void ingenic_pinconf_set() {
  switch (pinconf_to_config_param())
  case PIN_CONFIG_BIAS_DISABLE: {
    asm("%c0:\n\t.pushsection .discard.unreachable\n\t.long %c0b - "
        ".\n\t.popsection\n\t"
        :
        : "i"(6));
    __builtin_unreachable();
  }
}
void ingenic_pinconf_group_set() {}

$ clang-9  -Os -mretpoline-external-thunk -fno-omit-frame-pointer -c
pinctrl-ingenic.i
$ objtool check  --retpoline --uaccess pinctrl-ingenic.o
pinctrl-ingenic.o: warning: objtool: ingenic_pinconf_set()+0xb:
sibling call from callable instruction with modified stack frame

$objdump -d pinctrl-ingenic.o
0000000000000000 <ingenic_pinconf_set>:
   0: 55                    push   %rbp
   1: 48 89 e5              mov    %rsp,%rbp
   4: e8 00 00 00 00        callq  9 <ingenic_pinconf_set+0x9>
5: R_X86_64_PLT32 pinconf_to_config_param-0x4
   9: 85 c0                test   %eax,%eax
   b: 74 02                je     f <ingenic_pinconf_group_set>
   d: 5d                    pop    %rbp
   e: c3                    retq

000000000000000f <ingenic_pinconf_group_set>:
   f: c3                    retq

I suspect that's actually another variant of the others. It doesn't
seem to be an
actual sibling call, just branching into what happens to be the start of the
next function in an unreachable code path.

Using '-O2' instead of '-0s', I get

pinctrl-ingenic.o: warning: objtool: ingenic_pinconf_set() falls
through to next function ingenic_pinconf_group_set()

0000000000000000 <ingenic_pinconf_set>:
   0: 55                    push   %rbp
   1: 48 89 e5              mov    %rsp,%rbp
   4: e8 00 00 00 00        callq  9 <ingenic_pinconf_set+0x9>
5: R_X86_64_PLT32 pinconf_to_config_param-0x4
   9: 85 c0                test   %eax,%eax
   b: 74 02                je     f <ingenic_pinconf_set+0xf>
   d: 5d                    pop    %rbp
   e: c3                    retq
   f: 90                    nop

0000000000000010 <ingenic_pinconf_group_set>:
  10: c3                    retq

        Arnd

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: objtool warning "uses BP as a scratch register" with clang-9
  2019-08-27 19:22       ` Josh Poimboeuf
  2019-08-27 19:47         ` Arnd Bergmann
@ 2019-08-28 22:13         ` Nick Desaulniers
  2019-08-29  0:28           ` Josh Poimboeuf
  1 sibling, 1 reply; 38+ messages in thread
From: Nick Desaulniers @ 2019-08-28 22:13 UTC (permalink / raw)
  To: Josh Poimboeuf, Arnd Bergmann
  Cc: Ilie Halip, Linux Kernel Mailing List, clang-built-linux, Eli Friedman

On Tue, Aug 27, 2019 at 12:22 PM Josh Poimboeuf <jpoimboe@redhat.com> wrote:
>
> On Tue, Aug 27, 2019 at 09:00:52PM +0200, Arnd Bergmann wrote:
> > On Tue, Aug 27, 2019 at 5:00 PM Ilie Halip <ilie.halip@gmail.com> wrote:
> > >
> > > > > $ clang-9 -c  crc32.i  -O2   ; objtool check  crc32.o
> > > > > crc32.o: warning: objtool: fn1 uses BP as a scratch register
> > >
> > > Yes, I see it too. https://godbolt.org/z/N56HW1
> > >
> > > > Do you still see this warning with -fno-omit-frame-pointer (assuming
> > > > clang has that option)?
> > >
> > > Using this makes the warning go away. Running objtool with --no-fp
> > > also gets rid of it.
> >
> > I still see the warning after adding back the -fno-omit-frame-pointer
> > in my reduced test case:
> >
> > $ clang-9 -c  crc32.i -Werror -Wno-address-of-packed-member -Wall
> > -Wno-pointer-sign -Wno-unused-value -Wno-constant-logical-operand -O2
> > -Wno-unused -fno-omit-frame-pointer
> > $ objtool check  crc32.o
> > crc32.o: warning: objtool: fn1 uses BP as a scratch register
>
> This warning most likely means that clang is clobbering RBP in leaf
> functions.  With -fno-omit-frame-pointer, leaf functions don't need to
> set up the frame pointer, but they at least need to leave RBP untouched,
> so that an interrupts/exceptions can unwind through the function.

It sounds like clang has `-mno-omit-leaf-frame-pointer` (via
https://bugs.llvm.org/show_bug.cgi?id=43128#c6).  Arnd, can you give
that a shot?
-- 
Thanks,
~Nick Desaulniers

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: objtool warning "uses BP as a scratch register" with clang-9
  2019-08-28 22:13         ` Nick Desaulniers
@ 2019-08-29  0:28           ` Josh Poimboeuf
  0 siblings, 0 replies; 38+ messages in thread
From: Josh Poimboeuf @ 2019-08-29  0:28 UTC (permalink / raw)
  To: Nick Desaulniers
  Cc: Arnd Bergmann, Ilie Halip, Linux Kernel Mailing List,
	clang-built-linux, Eli Friedman

On Wed, Aug 28, 2019 at 03:13:21PM -0700, Nick Desaulniers wrote:
> On Tue, Aug 27, 2019 at 12:22 PM Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> >
> > On Tue, Aug 27, 2019 at 09:00:52PM +0200, Arnd Bergmann wrote:
> > > On Tue, Aug 27, 2019 at 5:00 PM Ilie Halip <ilie.halip@gmail.com> wrote:
> > > >
> > > > > > $ clang-9 -c  crc32.i  -O2   ; objtool check  crc32.o
> > > > > > crc32.o: warning: objtool: fn1 uses BP as a scratch register
> > > >
> > > > Yes, I see it too. https://godbolt.org/z/N56HW1
> > > >
> > > > > Do you still see this warning with -fno-omit-frame-pointer (assuming
> > > > > clang has that option)?
> > > >
> > > > Using this makes the warning go away. Running objtool with --no-fp
> > > > also gets rid of it.
> > >
> > > I still see the warning after adding back the -fno-omit-frame-pointer
> > > in my reduced test case:
> > >
> > > $ clang-9 -c  crc32.i -Werror -Wno-address-of-packed-member -Wall
> > > -Wno-pointer-sign -Wno-unused-value -Wno-constant-logical-operand -O2
> > > -Wno-unused -fno-omit-frame-pointer
> > > $ objtool check  crc32.o
> > > crc32.o: warning: objtool: fn1 uses BP as a scratch register
> >
> > This warning most likely means that clang is clobbering RBP in leaf
> > functions.  With -fno-omit-frame-pointer, leaf functions don't need to
> > set up the frame pointer, but they at least need to leave RBP untouched,
> > so that an interrupts/exceptions can unwind through the function.
> 
> It sounds like clang has `-mno-omit-leaf-frame-pointer` (via
> https://bugs.llvm.org/show_bug.cgi?id=43128#c6).  Arnd, can you give
> that a shot?

I think Arnd already confirmed that -mno-omit-leaf-frame-pointer fixed
it in an earlier email.

-- 
Josh

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: objtool warning "uses BP as a scratch register" with clang-9
  2019-08-28 15:28                   ` Arnd Bergmann
  2019-08-28 15:40                     ` Arnd Bergmann
@ 2019-08-29 17:34                     ` Josh Poimboeuf
  2019-08-29 18:30                       ` Linus Torvalds
  1 sibling, 1 reply; 38+ messages in thread
From: Josh Poimboeuf @ 2019-08-29 17:34 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Nick Desaulniers, Ilie Halip, Linux Kernel Mailing List,
	clang-built-linux, Peter Zijlstra, Linus Torvalds,
	Paul E. McKenney

On Wed, Aug 28, 2019 at 05:28:50PM +0200, Arnd Bergmann wrote:
> On Wed, Aug 28, 2019 at 5:22 PM Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> > On Wed, Aug 28, 2019 at 05:13:59PM +0200, Arnd Bergmann wrote:
> > > On Wed, Aug 28, 2019 at 11:00 AM Arnd Bergmann <arnd@arndb.de> wrote:
> > > > On Tue, Aug 27, 2019 at 11:22 PM 'Nick Desaulniers' via Clang Built Linux <clang-built-linux@googlegroups.com> wrote:
> > > I figured this one out as well:
> > >
> > > > http://paste.ubuntu.com/p/XjdDsypRxX/
> > > > 0x5BA1B7A1:arch/x86/ia32/ia32_signal.o: warning: objtool:
> > > > ia32_setup_rt_frame()+0x238: call to memset() with UACCESS enabled
> > > > 0x5BA1B7A1:arch/x86/kernel/signal.o: warning: objtool:
> > > > __setup_rt_frame()+0x5b8: call to memset() with UACCESS enabled
> > >
> > > When CONFIG_KASAN is set, clang decides to use memset() to set
> > > the first two struct members in this function:
> > >
> > >  static inline void sas_ss_reset(struct task_struct *p)
> > >  {
> > >         p->sas_ss_sp = 0;
> > >         p->sas_ss_size = 0;
> > >         p->sas_ss_flags = SS_DISABLE;
> > >  }
> > >
> > > and that is called from save_altstack_ex(). Adding a barrier() after
> > > the sas_ss_sp() works around the issue, but is certainly not the
> > > best solution. Any other ideas?
> >
> > Wow, is the compiler allowed to insert memset calls like that?  Seems a
> > bit overbearing, at least in a kernel context.  I don't recall GCC ever
> > doing it.
> 
> Yes, it's free to assume that any standard library function behaves
> as defined, so it can and will turn struct assignments into memcpy
> or back, or replace string operations with others depending on what
> seems better for optimization.
> 
> clang is more aggressive than gcc here, and this has caused some
> other problems in the past, but it's usually harmless.
> 
> In theory, we could pass -ffreestanding to tell the compiler
> not to make assumptions about standard library function behavior,
> but that turns off all kinds of useful optimizations. The problem
> is really that the kernel is neither exactly hosted nor freestanding.

Adding a few people who may be interested in this discussion.

Peter suggested to try WRITE_ONCE for the two zero writes to see if that
"fixes" it.

However, according to Peter and Paul, but not Mary, Linus has argued
that normal aligned word stores shouldn't tear.  If that's our story
then it sounds like it's a compiler issue and WRITE_ONCE shouldn't
really be needed here to prevent the memset() call.

-- 
Josh

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: objtool warning "uses BP as a scratch register" with clang-9
  2019-08-29 17:34                     ` Josh Poimboeuf
@ 2019-08-29 18:30                       ` Linus Torvalds
  2019-08-29 20:21                         ` Arnd Bergmann
  2019-09-04 11:53                         ` Geert Uytterhoeven
  0 siblings, 2 replies; 38+ messages in thread
From: Linus Torvalds @ 2019-08-29 18:30 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Arnd Bergmann, Nick Desaulniers, Ilie Halip,
	Linux Kernel Mailing List, clang-built-linux, Peter Zijlstra,
	Paul E. McKenney

On Thu, Aug 29, 2019 at 10:35 AM Josh Poimboeuf <jpoimboe@redhat.com> wrote:
>
> Peter suggested to try WRITE_ONCE for the two zero writes to see if that
> "fixes" it.

I'm sure it "fixes" it.

.. and then where else will we hit this?

It's one thing to turn a structure zeroing into "memset()", but some
places really can't do it.

We use "-ffreestanding" in some places to make sure that gcc doesn't
start calling random libc routines. I wonder if we need to make it a
general rule that it's done unconditionally.

Sadly, I think that ends up also disabling things like
"__builtin_memcpy()" and friends. Which we _do_ want to have access
to, because then gcc can inline the memcpy() when we _do_ use
memcpy().

We used to do all of those heuristics by hand, but wanted to let the
compiler do them for us.

So:

 - we do want "memcpy()" to become "__builtin_memcpy()" which can then
be optimized to either individual inlined assignments _or_ to an
out-of-line call to memcpy().

 - we do *not* want individual assignments to be randomly turned into
memset/memcpy(), because of various different reasons (including
function tracing, but also store tearing, yadda yadda)

Conceptually, "-ffreestanding" is definitely what a kernel needs, but
it has been *too* big of a hammer and disables real code generation,
iirc.

             Linus

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: objtool warning "uses BP as a scratch register" with clang-9
  2019-08-29 18:30                       ` Linus Torvalds
@ 2019-08-29 20:21                         ` Arnd Bergmann
  2019-08-29 22:26                           ` Linus Torvalds
  2019-09-04 11:53                         ` Geert Uytterhoeven
  1 sibling, 1 reply; 38+ messages in thread
From: Arnd Bergmann @ 2019-08-29 20:21 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Josh Poimboeuf, Nick Desaulniers, Ilie Halip,
	Linux Kernel Mailing List, clang-built-linux, Peter Zijlstra,
	Paul E. McKenney

On Thu, Aug 29, 2019 at 8:30 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Thu, Aug 29, 2019 at 10:35 AM Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> So:
>
>  - we do want "memcpy()" to become "__builtin_memcpy()" which can then
> be optimized to either individual inlined assignments _or_ to an
> out-of-line call to memcpy().
>
>  - we do *not* want individual assignments to be randomly turned into
> memset/memcpy(), because of various different reasons (including
> function tracing, but also store tearing, yadda yadda)
>
> Conceptually, "-ffreestanding" is definitely what a kernel needs, but
> it has been *too* big of a hammer and disables real code generation,
> iirc.

I just tried passing just "-fno-builtin-memcpy -fno-builtin-memset".
This avoids going all the way to -ffreestanding and prevents the insertion
of unwanted memcpy and memset calls, but unfortunately (and
unsurprisingly) it also prevents the optimization of trivial memset calls.

On the other hand, I could not produce any trivial case like this without
CONFIG_KASAN, see https://godbolt.org/z/v440Qy

clang seems to behave similarly to gcc here, it will produce
calls to memset or memcpy when setting a lot of adjacent
members (17 for x86-clang, 29 for arm64 gcc), but not for two
or three of them. x86 gcc appears to always use string instructions
over memset().

Maybe we can just pass -fno-builtin-memcpy -fno-builtin-memset
for clang when CONFIG_KASAN is set and hope for the best?

        Arnd

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: objtool warning "uses BP as a scratch register" with clang-9
  2019-08-29 20:21                         ` Arnd Bergmann
@ 2019-08-29 22:26                           ` Linus Torvalds
  2019-08-30 15:02                             ` Josh Poimboeuf
  0 siblings, 1 reply; 38+ messages in thread
From: Linus Torvalds @ 2019-08-29 22:26 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Josh Poimboeuf, Nick Desaulniers, Ilie Halip,
	Linux Kernel Mailing List, clang-built-linux, Peter Zijlstra,
	Paul E. McKenney

On Thu, Aug 29, 2019 at 1:22 PM Arnd Bergmann <arnd@arndb.de> wrote:
>
> Maybe we can just pass -fno-builtin-memcpy -fno-builtin-memset
> for clang when CONFIG_KASAN is set and hope for the best?

I really hate how that disables conversions both ways, which is kind
of pointless and wrong.  It's really just "we don't want surprising
memcpy calls for single writes".

Disabling all the *good* "optimize memset/memcpy" cases is really sad.

We actually have a lot of small structures in the kernel on purpose
(often for type safety), and I bet we use memcpy on them on purpose at
times. I'd hate to see that become a function call rather than "copy
two words by hand".

Even for KASAN.

And I guess that when the compiler sees 20+ "set to zero" it's quite
reasonable to say "just turn it into a memset".

So maybe the right thing to do is to just special-case this code, and
hope for the best. If moving the sas_ss_reset() out of the try/catch
thing works, then by all means lets just do that.

(Partly because I've actually wanted to turn the try/catchj thing into
a _real_ try/catch, not a "fall through and check at the end" like it
just happens to be now - the try/catch is actually very misleading as
it is now. So if this is the only case that matters, then...).

                Linus

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: objtool warning "uses BP as a scratch register" with clang-9
  2019-08-28 15:40                     ` Arnd Bergmann
@ 2019-08-29 23:24                       ` Josh Poimboeuf
  2019-08-30 10:44                         ` Arnd Bergmann
  0 siblings, 1 reply; 38+ messages in thread
From: Josh Poimboeuf @ 2019-08-29 23:24 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Nick Desaulniers, Ilie Halip, Linux Kernel Mailing List,
	clang-built-linux

On Wed, Aug 28, 2019 at 05:40:01PM +0200, Arnd Bergmann wrote:
> On Wed, Aug 28, 2019 at 5:28 PM Arnd Bergmann <arnd@arndb.de> wrote:
> > On Wed, Aug 28, 2019 at 5:22 PM Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> > > On Wed, Aug 28, 2019 at 05:13:59PM +0200, Arnd Bergmann wrote:
> > > >
> > > > When CONFIG_KASAN is set, clang decides to use memset() to set
> > > > the first two struct members in this function:
> > > >
> > > >  static inline void sas_ss_reset(struct task_struct *p)
> > > >  {
> > > >         p->sas_ss_sp = 0;
> > > >         p->sas_ss_size = 0;
> > > >         p->sas_ss_flags = SS_DISABLE;
> > > >  }
> > > >
> > > > and that is called from save_altstack_ex(). Adding a barrier() after
> > > > the sas_ss_sp() works around the issue, but is certainly not the
> > > > best solution. Any other ideas?
> > >
> > > Wow, is the compiler allowed to insert memset calls like that?  Seems a
> > > bit overbearing, at least in a kernel context.  I don't recall GCC ever
> > > doing it.
> >
> > Yes, it's free to assume that any standard library function behaves
> > as defined, so it can and will turn struct assignments into memcpy
> > or back, or replace string operations with others depending on what
> > seems better for optimization.
> >
> > clang is more aggressive than gcc here, and this has caused some
> > other problems in the past, but it's usually harmless.
> >
> > In theory, we could pass -ffreestanding to tell the compiler
> > not to make assumptions about standard library function behavior,
> > but that turns off all kinds of useful optimizations. The problem
> > is really that the kernel is neither exactly hosted nor freestanding.
> 
> A slightly better workaround is to move the sas_ss_reset() out of
> the try/catch block. Not sure if this is safe.
> 
>       Arnd
> 
> diff --git a/arch/x86/ia32/ia32_signal.c b/arch/x86/ia32/ia32_signal.c
> index 1cee10091b9f..14f8decf0ebc 100644
> --- a/arch/x86/ia32/ia32_signal.c
> +++ b/arch/x86/ia32/ia32_signal.c
> @@ -379,6 +379,9 @@ int ia32_setup_rt_frame(int sig, struct ksignal *ksig,
>                 put_user_ex(*((u64 *)&code), (u64 __user *)frame->retcode);
>         } put_user_catch(err);
> 
> +       if (current->sas_ss_flags & SS_AUTODISARM)
> +               sas_ss_reset(current);
> +
>         err |= __copy_siginfo_to_user32(&frame->info, &ksig->info, false);
>         err |= ia32_setup_sigcontext(&frame->uc.uc_mcontext, fpstate,
>                                      regs, set->sig[0]);
> diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c
> index 8eb7193e158d..fd49d28abbc5 100644
> --- a/arch/x86/kernel/signal.c
> +++ b/arch/x86/kernel/signal.c
> @@ -414,6 +414,9 @@ static int __setup_rt_frame(int sig, struct ksignal *ksig,
>                  */
>                 put_user_ex(*((u64 *)&rt_retcode), (u64 *)frame->retcode);
>         } put_user_catch(err);
> +
> +       if (current->sas_ss_flags & SS_AUTODISARM)
> +               sas_ss_reset(current);
> 
>         err |= copy_siginfo_to_user(&frame->info, &ksig->info);
>         err |= setup_sigcontext(&frame->uc.uc_mcontext, fpstate,
> diff --git a/include/linux/compat.h b/include/linux/compat.h
> index a320495fd577..f5e36931e029 100644
> --- a/include/linux/compat.h
> +++ b/include/linux/compat.h
> @@ -520,8 +520,6 @@ int __compat_save_altstack(compat_stack_t __user
> *, unsigned long);
>         put_user_ex(ptr_to_compat((void __user *)t->sas_ss_sp),
> &__uss->ss_sp); \
>         put_user_ex(t->sas_ss_flags, &__uss->ss_flags); \
>         put_user_ex(t->sas_ss_size, &__uss->ss_size); \
> -       if (t->sas_ss_flags & SS_AUTODISARM) \
> -               sas_ss_reset(t); \
>  } while (0);
> 
>  /*
> diff --git a/include/linux/signal.h b/include/linux/signal.h
> index 67ceb6d7c869..9056239787f7 100644
> --- a/include/linux/signal.h
> +++ b/include/linux/signal.h
> @@ -435,8 +435,6 @@ int __save_altstack(stack_t __user *, unsigned long);
>         put_user_ex((void __user *)t->sas_ss_sp, &__uss->ss_sp); \
>         put_user_ex(t->sas_ss_flags, &__uss->ss_flags); \
>         put_user_ex(t->sas_ss_size, &__uss->ss_size); \
> -       if (t->sas_ss_flags & SS_AUTODISARM) \
> -               sas_ss_reset(t); \
>  } while (0);
> 
>  #ifdef CONFIG_PROC_FS

Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>

-- 
Josh

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: objtool warning "uses BP as a scratch register" with clang-9
  2019-08-29 23:24                       ` Josh Poimboeuf
@ 2019-08-30 10:44                         ` Arnd Bergmann
  2019-08-30 15:14                           ` Josh Poimboeuf
  0 siblings, 1 reply; 38+ messages in thread
From: Arnd Bergmann @ 2019-08-30 10:44 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Nick Desaulniers, Ilie Halip, Linux Kernel Mailing List,
	clang-built-linux

On Fri, Aug 30, 2019 at 1:24 AM Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> On Wed, Aug 28, 2019 at 05:40:01PM +0200, Arnd Bergmann wrote:
> > diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c
> > index 8eb7193e158d..fd49d28abbc5 100644
> > --- a/arch/x86/kernel/signal.c
> > +++ b/arch/x86/kernel/signal.c
> > @@ -414,6 +414,9 @@ static int __setup_rt_frame(int sig, struct ksignal *ksig,
> >                  */
> >                 put_user_ex(*((u64 *)&rt_retcode), (u64 *)frame->retcode);
> >         } put_user_catch(err);
> > +
> > +       if (current->sas_ss_flags & SS_AUTODISARM)
> > +               sas_ss_reset(current);
> >
> >         err |= copy_siginfo_to_user(&frame->info, &ksig->info);
> >         err |= setup_sigcontext(&frame->uc.uc_mcontext, fpstate,

> > diff --git a/include/linux/signal.h b/include/linux/signal.h
> > index 67ceb6d7c869..9056239787f7 100644
> > --- a/include/linux/signal.h
> > +++ b/include/linux/signal.h
> > @@ -435,8 +435,6 @@ int __save_altstack(stack_t __user *, unsigned long);
> >         put_user_ex((void __user *)t->sas_ss_sp, &__uss->ss_sp); \
> >         put_user_ex(t->sas_ss_flags, &__uss->ss_flags); \
> >         put_user_ex(t->sas_ss_size, &__uss->ss_size); \
> > -       if (t->sas_ss_flags & SS_AUTODISARM) \
> > -               sas_ss_reset(t); \
> >  } while (0);
> >
> >  #ifdef CONFIG_PROC_FS
>
> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>

Thanks! Before I submit this version for inclusion, let's make sure this
is the best variant. I noticed later that save_altstack_ex() is meant to
behave the same as __save_altstack(), but my patch breaks that
assumption.

Two other alternatives I can think of are

- completely open-code save_altstack_ex() in its only call site on x86,
  in addition to the change above

- explicitly mark memset() as an exception in objtool in
  uaccess_safe_builtin[], assuming that is actually safe.

      Arnd

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: objtool warning "uses BP as a scratch register" with clang-9
  2019-08-29 22:26                           ` Linus Torvalds
@ 2019-08-30 15:02                             ` Josh Poimboeuf
  2019-08-30 15:39                               ` David Laight
  2019-08-30 15:48                               ` Linus Torvalds
  0 siblings, 2 replies; 38+ messages in thread
From: Josh Poimboeuf @ 2019-08-30 15:02 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Arnd Bergmann, Nick Desaulniers, Ilie Halip,
	Linux Kernel Mailing List, clang-built-linux, Peter Zijlstra,
	Paul E. McKenney

On Thu, Aug 29, 2019 at 03:26:30PM -0700, Linus Torvalds wrote:
> On Thu, Aug 29, 2019 at 1:22 PM Arnd Bergmann <arnd@arndb.de> wrote:
> >
> > Maybe we can just pass -fno-builtin-memcpy -fno-builtin-memset
> > for clang when CONFIG_KASAN is set and hope for the best?
> 
> I really hate how that disables conversions both ways, which is kind
> of pointless and wrong.  It's really just "we don't want surprising
> memcpy calls for single writes".
> 
> Disabling all the *good* "optimize memset/memcpy" cases is really sad.
> 
> We actually have a lot of small structures in the kernel on purpose
> (often for type safety), and I bet we use memcpy on them on purpose at
> times. I'd hate to see that become a function call rather than "copy
> two words by hand".
> 
> Even for KASAN.
> 
> And I guess that when the compiler sees 20+ "set to zero" it's quite
> reasonable to say "just turn it into a memset".

For KASAN, the Clang threshold for inserting memset() is *2* consecutive
writes instead of 17.  Isn't that likely to cause tearing-related
surprises?

I suppose people don't run KASAN in production, but we don't want
tearing-related bugs showing up there because somebody's going to end up
having to debug them regardless.

-- 
Josh

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: objtool warning "uses BP as a scratch register" with clang-9
  2019-08-30 10:44                         ` Arnd Bergmann
@ 2019-08-30 15:14                           ` Josh Poimboeuf
  2019-08-30 15:58                             ` Arnd Bergmann
  2019-08-30 16:03                             ` Linus Torvalds
  0 siblings, 2 replies; 38+ messages in thread
From: Josh Poimboeuf @ 2019-08-30 15:14 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Nick Desaulniers, Ilie Halip, Linux Kernel Mailing List,
	clang-built-linux, Peter Zijlstra

On Fri, Aug 30, 2019 at 12:44:24PM +0200, Arnd Bergmann wrote:
> On Fri, Aug 30, 2019 at 1:24 AM Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> > On Wed, Aug 28, 2019 at 05:40:01PM +0200, Arnd Bergmann wrote:
> > > diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c
> > > index 8eb7193e158d..fd49d28abbc5 100644
> > > --- a/arch/x86/kernel/signal.c
> > > +++ b/arch/x86/kernel/signal.c
> > > @@ -414,6 +414,9 @@ static int __setup_rt_frame(int sig, struct ksignal *ksig,
> > >                  */
> > >                 put_user_ex(*((u64 *)&rt_retcode), (u64 *)frame->retcode);
> > >         } put_user_catch(err);
> > > +
> > > +       if (current->sas_ss_flags & SS_AUTODISARM)
> > > +               sas_ss_reset(current);
> > >
> > >         err |= copy_siginfo_to_user(&frame->info, &ksig->info);
> > >         err |= setup_sigcontext(&frame->uc.uc_mcontext, fpstate,
> 
> > > diff --git a/include/linux/signal.h b/include/linux/signal.h
> > > index 67ceb6d7c869..9056239787f7 100644
> > > --- a/include/linux/signal.h
> > > +++ b/include/linux/signal.h
> > > @@ -435,8 +435,6 @@ int __save_altstack(stack_t __user *, unsigned long);
> > >         put_user_ex((void __user *)t->sas_ss_sp, &__uss->ss_sp); \
> > >         put_user_ex(t->sas_ss_flags, &__uss->ss_flags); \
> > >         put_user_ex(t->sas_ss_size, &__uss->ss_size); \
> > > -       if (t->sas_ss_flags & SS_AUTODISARM) \
> > > -               sas_ss_reset(t); \
> > >  } while (0);
> > >
> > >  #ifdef CONFIG_PROC_FS
> >
> > Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
> 
> Thanks! Before I submit this version for inclusion, let's make sure this
> is the best variant. I noticed later that save_altstack_ex() is meant to
> behave the same as __save_altstack(), but my patch breaks that
> assumption.

Good point.

There's also compat_save_altstack_ex() -- which presumably needs the
same fix? -- and __compat_save_altstack().

> Two other alternatives I can think of are
> 
> - completely open-code save_altstack_ex() in its only call site on x86,
>   in addition to the change above

But it has two call sites: the 32-bit and 64-bit versions of
save_altstack_ex().

> - explicitly mark memset() as an exception in objtool in
>   uaccess_safe_builtin[], assuming that is actually safe.

I wonder if this might open up more theoretical SMAP holes for other
callers to memset().

What about just adding a couple of WRITE_ONCE's to sas_ss_reset()?  That
would probably be the least disruptive option.

Or even better, it would be great if we could get Clang to change their
memset() insertion heuristics, so that KASAN acts more like non-KASAN
code in that regard.

-- 
Josh

^ permalink raw reply	[flat|nested] 38+ messages in thread

* RE: objtool warning "uses BP as a scratch register" with clang-9
  2019-08-30 15:02                             ` Josh Poimboeuf
@ 2019-08-30 15:39                               ` David Laight
  2019-08-30 15:48                               ` Linus Torvalds
  1 sibling, 0 replies; 38+ messages in thread
From: David Laight @ 2019-08-30 15:39 UTC (permalink / raw)
  To: 'Josh Poimboeuf', Linus Torvalds
  Cc: Arnd Bergmann, Nick Desaulniers, Ilie Halip,
	Linux Kernel Mailing List, clang-built-linux, Peter Zijlstra,
	Paul E. McKenney

From: Josh Poimboeuf
> Sent: 30 August 2019 16:02
> On Thu, Aug 29, 2019 at 03:26:30PM -0700, Linus Torvalds wrote:
> > On Thu, Aug 29, 2019 at 1:22 PM Arnd Bergmann <arnd@arndb.de> wrote:
> > >
> > > Maybe we can just pass -fno-builtin-memcpy -fno-builtin-memset
> > > for clang when CONFIG_KASAN is set and hope for the best?
> >
> > I really hate how that disables conversions both ways, which is kind
> > of pointless and wrong.  It's really just "we don't want surprising
> > memcpy calls for single writes".
> >
> > Disabling all the *good* "optimize memset/memcpy" cases is really sad.
> >
> > We actually have a lot of small structures in the kernel on purpose
> > (often for type safety), and I bet we use memcpy on them on purpose at
> > times. I'd hate to see that become a function call rather than "copy
> > two words by hand".
> >
> > Even for KASAN.
> >
> > And I guess that when the compiler sees 20+ "set to zero" it's quite
> > reasonable to say "just turn it into a memset".
> 
> For KASAN, the Clang threshold for inserting memset() is *2* consecutive
> writes instead of 17.  Isn't that likely to cause tearing-related
> surprises?

Hmmm... I don't think I'd ever want a compiler to convert a sequence
of zero writes into a memset.

It is as bad as a certain compiler converting:
	for (i = 0; i < n; i++)
		tgt[i] = src[i];
into a 'rep movs' instruction.
Inlining memcpy() as 'rep movs' is one thing, the opposite is wrong.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: objtool warning "uses BP as a scratch register" with clang-9
  2019-08-30 15:02                             ` Josh Poimboeuf
  2019-08-30 15:39                               ` David Laight
@ 2019-08-30 15:48                               ` Linus Torvalds
  2019-08-30 15:55                                 ` David Laight
  2019-08-30 16:49                                 ` Josh Poimboeuf
  1 sibling, 2 replies; 38+ messages in thread
From: Linus Torvalds @ 2019-08-30 15:48 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Arnd Bergmann, Nick Desaulniers, Ilie Halip,
	Linux Kernel Mailing List, clang-built-linux, Peter Zijlstra,
	Paul E. McKenney

On Fri, Aug 30, 2019 at 8:02 AM Josh Poimboeuf <jpoimboe@redhat.com> wrote:
>
> For KASAN, the Clang threshold for inserting memset() is *2* consecutive
> writes instead of 17.  Isn't that likely to cause tearing-related
> surprises?

Tearing isn't likely to be a problem.

It's not like memcpy() does byte-by-byte copies. If you pass it a
word-aligned pointer, it will do word-aligned accesses simply for
performance reasons.

Even on x86, where we use "rep movsb", we (a) tend to disable it for
small copies and (b) it turns out that microcode that does the
optimized movsb (which is the only case we use it) probably ends up
doing atomic things anyway. Note the "probably". I don't have
microcode source code, but there are other indications like "we know
it doesn't take interrupts on a byte-per-byte level, only on the
cacheline level".

So it's probably not an issue from a tearing standpoint - but it
worries me because of "this has to be a leaf function" kind of issues
where we may be using individual stores on purpose. We do have things
like that.

               Linus

^ permalink raw reply	[flat|nested] 38+ messages in thread

* RE: objtool warning "uses BP as a scratch register" with clang-9
  2019-08-30 15:48                               ` Linus Torvalds
@ 2019-08-30 15:55                                 ` David Laight
  2019-08-30 16:01                                   ` Linus Torvalds
  2019-08-30 16:49                                 ` Josh Poimboeuf
  1 sibling, 1 reply; 38+ messages in thread
From: David Laight @ 2019-08-30 15:55 UTC (permalink / raw)
  To: 'Linus Torvalds', Josh Poimboeuf
  Cc: Arnd Bergmann, Nick Desaulniers, Ilie Halip,
	Linux Kernel Mailing List, clang-built-linux, Peter Zijlstra,
	Paul E. McKenney

From: Linus Torvalds
> Sent: 30 August 2019 16:49
> On Fri, Aug 30, 2019 at 8:02 AM Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> >
> > For KASAN, the Clang threshold for inserting memset() is *2* consecutive
> > writes instead of 17.  Isn't that likely to cause tearing-related
> > surprises?
> 
> Tearing isn't likely to be a problem.
> 
> It's not like memcpy() does byte-by-byte copies. If you pass it a
> word-aligned pointer, it will do word-aligned accesses simply for
> performance reasons.
> 
> Even on x86, where we use "rep movsb", we (a) tend to disable it for
> small copies and (b) it turns out that microcode that does the
> optimized movsb (which is the only case we use it) probably ends up
> doing atomic things anyway. Note the "probably". I don't have
> microcode source code, but there are other indications like "we know
> it doesn't take interrupts on a byte-per-byte level, only on the
> cacheline level".
> 
> So it's probably not an issue from a tearing standpoint - but it
> worries me because of "this has to be a leaf function" kind of issues
> where we may be using individual stores on purpose. We do have things
> like that.

Even in userspace you might be accessing mmap()ed PCIe device memory.
The last thing you want is the compiler converting anything into
'rep movsb'.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: objtool warning "uses BP as a scratch register" with clang-9
  2019-08-30 15:14                           ` Josh Poimboeuf
@ 2019-08-30 15:58                             ` Arnd Bergmann
  2019-08-30 16:12                               ` David Laight
  2019-08-30 16:03                             ` Linus Torvalds
  1 sibling, 1 reply; 38+ messages in thread
From: Arnd Bergmann @ 2019-08-30 15:58 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Nick Desaulniers, Ilie Halip, Linux Kernel Mailing List,
	clang-built-linux, Peter Zijlstra

On Fri, Aug 30, 2019 at 5:14 PM Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> On Fri, Aug 30, 2019 at 12:44:24PM +0200, Arnd Bergmann wrote:
> > On Fri, Aug 30, 2019 at 1:24 AM Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> > > On Wed, Aug 28, 2019 at 05:40:01PM +0200, Arnd Bergmann wrote:
> > > > diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c
> > > > index 8eb7193e158d..fd49d28abbc5 100644
> > > > --- a/arch/x86/kernel/signal.c
> > > > +++ b/arch/x86/kernel/signal.c
> > > > @@ -414,6 +414,9 @@ static int __setup_rt_frame(int sig, struct ksignal *ksig,
> > > >                  */
> > > >                 put_user_ex(*((u64 *)&rt_retcode), (u64 *)frame->retcode);
> > > >         } put_user_catch(err);
> > > > +
> > > > +       if (current->sas_ss_flags & SS_AUTODISARM)
> > > > +               sas_ss_reset(current);
> > > >
> > > >         err |= copy_siginfo_to_user(&frame->info, &ksig->info);
> > > >         err |= setup_sigcontext(&frame->uc.uc_mcontext, fpstate,
> >
> > > > diff --git a/include/linux/signal.h b/include/linux/signal.h
> > > > index 67ceb6d7c869..9056239787f7 100644
> > > > --- a/include/linux/signal.h
> > > > +++ b/include/linux/signal.h
> > > > @@ -435,8 +435,6 @@ int __save_altstack(stack_t __user *, unsigned long);
> > > >         put_user_ex((void __user *)t->sas_ss_sp, &__uss->ss_sp); \
> > > >         put_user_ex(t->sas_ss_flags, &__uss->ss_flags); \
> > > >         put_user_ex(t->sas_ss_size, &__uss->ss_size); \
> > > > -       if (t->sas_ss_flags & SS_AUTODISARM) \
> > > > -               sas_ss_reset(t); \
> > > >  } while (0);
> > > >
> > > >  #ifdef CONFIG_PROC_FS
> > >
> > > Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
> >
> > Thanks! Before I submit this version for inclusion, let's make sure this
> > is the best variant. I noticed later that save_altstack_ex() is meant to
> > behave the same as __save_altstack(), but my patch breaks that
> > assumption.
>
> Good point.
>
> There's also compat_save_altstack_ex() -- which presumably needs the
> same fix? -- and __compat_save_altstack().

Yes, I meant both here of course (as in my earlier patch).

> > Two other alternatives I can think of are
> >
> > - completely open-code save_altstack_ex() in its only call site on x86,
> >   in addition to the change above
>
> But it has two call sites: the 32-bit and 64-bit versions of
> save_altstack_ex().

Ah, that's what I get for looking only at the compat version.

> > - explicitly mark memset() as an exception in objtool in
> >   uaccess_safe_builtin[], assuming that is actually safe.
>
> I wonder if this might open up more theoretical SMAP holes for other
> callers to memset().
>
> What about just adding a couple of WRITE_ONCE's to sas_ss_reset()?  That
> would probably be the least disruptive option.

Fine with me, too.

> Or even better, it would be great if we could get Clang to change their
> memset() insertion heuristics, so that KASAN acts more like non-KASAN
> code in that regard.

I suspect that's going to be harder. The clang-9 release is going to be
soon, and that change probably wouldn't be considered a regression fix.

Maybe Nick can find what happens, but I don't actually see any reference
to KASAN in the llvm source code related to the memset generation.

https://github.com/llvm-mirror/clang/blob/master/lib/CodeGen/CGExprAgg.cpp#L1803
has a check for >16 bytes, but that again does not match my observation.

     Arnd

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: objtool warning "uses BP as a scratch register" with clang-9
  2019-08-30 15:55                                 ` David Laight
@ 2019-08-30 16:01                                   ` Linus Torvalds
  2019-08-30 16:42                                     ` David Laight
  0 siblings, 1 reply; 38+ messages in thread
From: Linus Torvalds @ 2019-08-30 16:01 UTC (permalink / raw)
  To: David Laight
  Cc: Josh Poimboeuf, Arnd Bergmann, Nick Desaulniers, Ilie Halip,
	Linux Kernel Mailing List, clang-built-linux, Peter Zijlstra,
	Paul E. McKenney

On Fri, Aug 30, 2019 at 8:55 AM David Laight <David.Laight@aculab.com> wrote:
>
> Even in userspace you might be accessing mmap()ed PCIe device memory.
> The last thing you want is the compiler converting anything into
> 'rep movsb'.

Agreed, although for actual IO accesses you likely should really be
doing "volatile" anyway.

But yeah, in general it's just not obviously safe to turn individual
accesses into memset/memcpy. In contrast, the reverse is obviously
fine (and _required_ for any kind of half-way good performance when
you do small constant-sized memory copies, which is actually a common
pattern partly because the insane C aliasing rules have taught people
that it's the _only_ safe pattern in some situations).

This is why I think "-ffreestanding" and "-fno-builtin-memcpy" are
completely broken as-is: they are an all-or-nothing thing, they don't
understand that it's directional.

                 Linus

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: objtool warning "uses BP as a scratch register" with clang-9
  2019-08-30 15:14                           ` Josh Poimboeuf
  2019-08-30 15:58                             ` Arnd Bergmann
@ 2019-08-30 16:03                             ` Linus Torvalds
  1 sibling, 0 replies; 38+ messages in thread
From: Linus Torvalds @ 2019-08-30 16:03 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Arnd Bergmann, Nick Desaulniers, Ilie Halip,
	Linux Kernel Mailing List, clang-built-linux, Peter Zijlstra

On Fri, Aug 30, 2019 at 8:14 AM Josh Poimboeuf <jpoimboe@redhat.com> wrote:
>
> What about just adding a couple of WRITE_ONCE's to sas_ss_reset()?  That
> would probably be the least disruptive option.

I think that WRITE_ONCE() with a comment about why is a good idea.

The reason I dislike WRITE_ONCE() in general is not because it's
wrong, it's because people often use it mindlessly and the rationale
for it isn't clear.

But with a comment about why, that issue obviously goes away.

            Linus

^ permalink raw reply	[flat|nested] 38+ messages in thread

* RE: objtool warning "uses BP as a scratch register" with clang-9
  2019-08-30 15:58                             ` Arnd Bergmann
@ 2019-08-30 16:12                               ` David Laight
  0 siblings, 0 replies; 38+ messages in thread
From: David Laight @ 2019-08-30 16:12 UTC (permalink / raw)
  To: 'Arnd Bergmann', Josh Poimboeuf
  Cc: Nick Desaulniers, Ilie Halip, Linux Kernel Mailing List,
	clang-built-linux, Peter Zijlstra

From: Arnd Bergmann
> Sent: 30 August 2019 16:59
...
> > Or even better, it would be great if we could get Clang to change their
> > memset() insertion heuristics, so that KASAN acts more like non-KASAN
> > code in that regard.
> 
> I suspect that's going to be harder. The clang-9 release is going to be
> soon, and that change probably wouldn't be considered a regression fix.
> 
> Maybe Nick can find what happens, but I don't actually see any reference
> to KASAN in the llvm source code related to the memset generation.
> 
> https://github.com/llvm-mirror/clang/blob/master/lib/CodeGen/CGExprAgg.cpp#L1803
> has a check for >16 bytes, but that again does not match my observation.

That looks like the code for initialising a local structure variable,
not for merging 'random' writes into a memset().

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 38+ messages in thread

* RE: objtool warning "uses BP as a scratch register" with clang-9
  2019-08-30 16:01                                   ` Linus Torvalds
@ 2019-08-30 16:42                                     ` David Laight
  0 siblings, 0 replies; 38+ messages in thread
From: David Laight @ 2019-08-30 16:42 UTC (permalink / raw)
  To: 'Linus Torvalds'
  Cc: Josh Poimboeuf, Arnd Bergmann, Nick Desaulniers, Ilie Halip,
	Linux Kernel Mailing List, clang-built-linux, Peter Zijlstra,
	Paul E. McKenney

From: Linus Torvalds
> Sent: 30 August 2019 17:01
> On Fri, Aug 30, 2019 at 8:55 AM David Laight <David.Laight@aculab.com> wrote:
...
> But yeah, in general it's just not obviously safe to turn individual
> accesses into memset/memcpy. In contrast, the reverse is obviously
> fine (and _required_ for any kind of half-way good performance when
> you do small constant-sized memory copies, which is actually a common
> pattern partly because the insane C aliasing rules have taught people
> that it's the _only_ safe pattern in some situations).

I wonder where the actual cutoff is for converting a sequence of writes
of zero into a call to memset()?

If you assume either:
1) cold cache (for memset).
2) branch predictor not set for zeroing a small number of words.
I suspect that it is considerable.

> This is why I think "-ffreestanding" and "-fno-builtin-memcpy" are
> completely broken as-is: they are an all-or-nothing thing, they don't
> understand that it's directional.

Yep, and some of the conversions are a just a PITA.
eg printf("%s", string) => puts(string).

I was also trying to get around the memcpy@GLIBC4 fubar so I could
compile code that would run on an old system.
I managed everything except the memcpy() calls that gcc emits for
structure copies (it might even do that even for 'freestanding').
It really ought to emit a call to a different symbol that would normally
be aliased to memcpy() (or better a memcpy_words() function).

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: objtool warning "uses BP as a scratch register" with clang-9
  2019-08-30 15:48                               ` Linus Torvalds
  2019-08-30 15:55                                 ` David Laight
@ 2019-08-30 16:49                                 ` Josh Poimboeuf
  2019-09-02  9:02                                   ` David Laight
  1 sibling, 1 reply; 38+ messages in thread
From: Josh Poimboeuf @ 2019-08-30 16:49 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Arnd Bergmann, Nick Desaulniers, Ilie Halip,
	Linux Kernel Mailing List, clang-built-linux, Peter Zijlstra,
	Paul E. McKenney

On Fri, Aug 30, 2019 at 08:48:49AM -0700, Linus Torvalds wrote:
> On Fri, Aug 30, 2019 at 8:02 AM Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> >
> > For KASAN, the Clang threshold for inserting memset() is *2* consecutive
> > writes instead of 17.  Isn't that likely to cause tearing-related
> > surprises?
> 
> Tearing isn't likely to be a problem.
> 
> It's not like memcpy() does byte-by-byte copies. If you pass it a
> word-aligned pointer, it will do word-aligned accesses simply for
> performance reasons.
> 
> Even on x86, where we use "rep movsb", we (a) tend to disable it for
> small copies and (b) it turns out that microcode that does the
> optimized movsb (which is the only case we use it) probably ends up
> doing atomic things anyway. Note the "probably". I don't have
> microcode source code, but there are other indications like "we know
> it doesn't take interrupts on a byte-per-byte level, only on the
> cacheline level".

The microcode argument is not all that comforting :-)

Also what about unaligned accesses, e.g. if a struct member isn't on a
word boundary?  Arnd's godbolt link showed those can get combined too.

I don't see x86 memcpy() doing any destination alignment checks.

Have we audited other arches' memset/memcpy implementations?

> So it's probably not an issue from a tearing standpoint - but it
> worries me because of "this has to be a leaf function" kind of issues
> where we may be using individual stores on purpose. We do have things
> like that.

It sounds like everybody's in agreement that replacing accesses with
memset/memcpy is bad in a kernel context.  Should we push for a new
fine-grained compiler option to disable it?

-- 
Josh

^ permalink raw reply	[flat|nested] 38+ messages in thread

* RE: objtool warning "uses BP as a scratch register" with clang-9
  2019-08-30 16:49                                 ` Josh Poimboeuf
@ 2019-09-02  9:02                                   ` David Laight
  0 siblings, 0 replies; 38+ messages in thread
From: David Laight @ 2019-09-02  9:02 UTC (permalink / raw)
  To: 'Josh Poimboeuf', Linus Torvalds
  Cc: Arnd Bergmann, Nick Desaulniers, Ilie Halip,
	Linux Kernel Mailing List, clang-built-linux, Peter Zijlstra,
	Paul E. McKenney

From: Josh Poimboeuf
> Sent: 30 August 2019 17:49
> On Fri, Aug 30, 2019 at 08:48:49AM -0700, Linus Torvalds wrote:
> > On Fri, Aug 30, 2019 at 8:02 AM Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> > >
> > > For KASAN, the Clang threshold for inserting memset() is *2* consecutive
> > > writes instead of 17.  Isn't that likely to cause tearing-related
> > > surprises?
> >
> > Tearing isn't likely to be a problem.
> >
> > It's not like memcpy() does byte-by-byte copies. If you pass it a
> > word-aligned pointer, it will do word-aligned accesses simply for
> > performance reasons.
> >
> > Even on x86, where we use "rep movsb", we (a) tend to disable it for
> > small copies and (b) it turns out that microcode that does the
> > optimized movsb (which is the only case we use it) probably ends up
> > doing atomic things anyway. Note the "probably". I don't have
> > microcode source code, but there are other indications like "we know
> > it doesn't take interrupts on a byte-per-byte level, only on the
> > cacheline level".
> 
> The microcode argument is not all that comforting :-)
> 
> Also what about unaligned accesses, e.g. if a struct member isn't on a
> word boundary?  Arnd's godbolt link showed those can get combined too.

I'd guess that it has to 'complete' a partial copy.
After all there are no mid-instruction interrupt states so the interrupt
returns to a new 'rep movsb' instruction (the isr can change si/di/cx).
Either the source, or destination is almost certainly cache line aligned.

> I don't see x86 memcpy() doing any destination alignment checks.

I don't think anyone has tried to instrument whether it is better to
do misaligned reads or writes (and it probably depends on the cpu).
The code will probably be more critical on the reads.
The real gain will be when the source and destination have the same
mis-alignment.

...
> > So it's probably not an issue from a tearing standpoint - but it
> > worries me because of "this has to be a leaf function" kind of issues
> > where we may be using individual stores on purpose. We do have things
> > like that.
> 
> It sounds like everybody's in agreement that replacing accesses with
> memset/memcpy is bad in a kernel context.  Should we push for a new
> fine-grained compiler option to disable it?

I'm not sure it is a good idea in ANY context.
It seems like something the compiler people has discovered they can do
without actually deciding whether it is useful.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: objtool warning "uses BP as a scratch register" with clang-9
  2019-08-29 18:30                       ` Linus Torvalds
  2019-08-29 20:21                         ` Arnd Bergmann
@ 2019-09-04 11:53                         ` Geert Uytterhoeven
  1 sibling, 0 replies; 38+ messages in thread
From: Geert Uytterhoeven @ 2019-09-04 11:53 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Josh Poimboeuf, Arnd Bergmann, Nick Desaulniers, Ilie Halip,
	Linux Kernel Mailing List, clang-built-linux, Peter Zijlstra,
	Paul E. McKenney

Hi Linus,

On Thu, Aug 29, 2019 at 8:31 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Thu, Aug 29, 2019 at 10:35 AM Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> >
> > Peter suggested to try WRITE_ONCE for the two zero writes to see if that
> > "fixes" it.
>
> I'm sure it "fixes" it.
>
> .. and then where else will we hit this?
>
> It's one thing to turn a structure zeroing into "memset()", but some
> places really can't do it.
>
> We use "-ffreestanding" in some places to make sure that gcc doesn't
> start calling random libc routines. I wonder if we need to make it a
> general rule that it's done unconditionally.
>
> Sadly, I think that ends up also disabling things like
> "__builtin_memcpy()" and friends. Which we _do_ want to have access
> to, because then gcc can inline the memcpy() when we _do_ use
> memcpy().
>
> We used to do all of those heuristics by hand, but wanted to let the
> compiler do them for us.
>
> So:
>
>  - we do want "memcpy()" to become "__builtin_memcpy()" which can then
> be optimized to either individual inlined assignments _or_ to an
> out-of-line call to memcpy().

You can do

    #define memcpy(d, s, n) __builtin_memcpy(d, s, n)

Alpha, m68k, sparc, and x86 already do.

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 38+ messages in thread

end of thread, other threads:[~2019-09-04 11:54 UTC | newest]

Thread overview: 38+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-08-27 12:30 objtool warning "uses BP as a scratch register" with clang-9 Arnd Bergmann
2019-08-27 14:51 ` Josh Poimboeuf
2019-08-27 14:59   ` Ilie Halip
2019-08-27 19:00     ` Arnd Bergmann
2019-08-27 19:22       ` Josh Poimboeuf
2019-08-27 19:47         ` Arnd Bergmann
2019-08-27 21:21           ` Nick Desaulniers
2019-08-28  9:00             ` Arnd Bergmann
2019-08-28 14:06               ` Arnd Bergmann
2019-08-28 14:51               ` Josh Poimboeuf
2019-08-28 15:29                 ` Arnd Bergmann
2019-08-28 17:57                   ` Josh Poimboeuf
2019-08-28 19:41                     ` Arnd Bergmann
2019-08-28 15:13               ` Arnd Bergmann
2019-08-28 15:22                 ` Josh Poimboeuf
2019-08-28 15:28                   ` Arnd Bergmann
2019-08-28 15:40                     ` Arnd Bergmann
2019-08-29 23:24                       ` Josh Poimboeuf
2019-08-30 10:44                         ` Arnd Bergmann
2019-08-30 15:14                           ` Josh Poimboeuf
2019-08-30 15:58                             ` Arnd Bergmann
2019-08-30 16:12                               ` David Laight
2019-08-30 16:03                             ` Linus Torvalds
2019-08-29 17:34                     ` Josh Poimboeuf
2019-08-29 18:30                       ` Linus Torvalds
2019-08-29 20:21                         ` Arnd Bergmann
2019-08-29 22:26                           ` Linus Torvalds
2019-08-30 15:02                             ` Josh Poimboeuf
2019-08-30 15:39                               ` David Laight
2019-08-30 15:48                               ` Linus Torvalds
2019-08-30 15:55                                 ` David Laight
2019-08-30 16:01                                   ` Linus Torvalds
2019-08-30 16:42                                     ` David Laight
2019-08-30 16:49                                 ` Josh Poimboeuf
2019-09-02  9:02                                   ` David Laight
2019-09-04 11:53                         ` Geert Uytterhoeven
2019-08-28 22:13         ` Nick Desaulniers
2019-08-29  0:28           ` Josh Poimboeuf

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).