linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* mmstress[1309]: segfault at 7f3d71a36ee8 ip 00007f3d77132bdf sp 00007f3d71a36ee8 error 4 in libc-2.27.so[7f3d77058000+1aa000]
@ 2020-10-21 16:58 Naresh Kamboju
  2020-10-21 17:05 ` Linus Torvalds
  0 siblings, 1 reply; 18+ messages in thread
From: Naresh Kamboju @ 2020-10-21 16:58 UTC (permalink / raw)
  To: open list, linux-m68k, X86 ML, LTP List, lkft-triage,
	Linux-Next Mailing List, linux-mm, kasan-dev
  Cc: Christian Brauner, Ingo Molnar, Thomas Gleixner,
	Matthew Wilcox (Oracle), Peter Zijlstra (Intel),
	Al Viro, Geert Uytterhoeven, Linus Torvalds, Viresh Kumar,
	zenglg.jy, Stephen Rothwell, Eric W. Biederman, Dmitry Vyukov

LTP mm mtest05 (mmstress), mtest06_3 and mallocstress01 (mallocstress) tested on
x86 KASAN enabled build. But tests are getting PASS on Non KASAN builds.
This regression started happening from next-20201015 nowards

There are few more regression on linux next,
  ltp-cve-tests:
    * cve-2015-7550
  ltp-math-tests:
    * float_bessel
    * float_exp_log
    * float_iperb
    * float_power
    * float_trigo
  ltp-mm-tests:
    * mallocstress01
    * mtest05
    * mtest06_3
  ltp-syscalls-tests:
    * clone08
    * clone301
    * fcntl34
    * fcntl34_64
    * fcntl36
    * fcntl36_64
    * keyctl02
    * rt_tgsigqueueinfo01

metadata:
  git branch: master
  git repo: https://gitlab.com/Linaro/lkft/mirrors/next/linux-next
  git describe: next-20201015
  kernel-config:
https://builds.tuxbuild.com/SCI7Xyjb7V2NbfQ2lbKBZw/kernel.config

steps to reproduce:
  # boot x86_64 with KASAN enabled kernel and run tests
  # cd /opt/ltp/testcases/bin
  # ./mmstress
  # ./mmap3 -x 0.002 -p
  # ./mallocstress

mtest05  (mmstress) :
--------------------
mmstress    0  TINFO  :  run mmstress -h for all options
mmstress    0  TINFO  :  test1: Test case tests the race condition
between simultaneous read faults in the same address space.
[  279.469207] mmstress[1309]: segfault at 7f3d71a36ee8 ip
00007f3d77132bdf sp 00007f3d71a36ee8 error 4 in
libc-2.27.so[7f3d77058000+1aa000]
[  279.469305] audit: type=1701 audit(1602818315.656:3):
auid=4294967295 uid=0 gid=0 ses=4294967295 subj=kernel pid=1307
comm=\"mmstress\" exe=\"/opt/ltp/testcases/bin/mmstress\" sig=11 res=1
[  279.481636] Code: 2d 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f
84 00 00 00 00 00 0f 1f 44 00 00 b8 18 00 00 00 0f 05 48 3d 01 f0 ff
ff 73 01 <c3> 48 8b 0d 91 22 2d 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e
0f 1f
[  279.498212] mmstress[1311]: segfault at 7f3d70a34ee8 ip
00007f3d77132bdf sp 00007f3d70a34ee8 error 4 in
libc-2.27.so[7f3d77058000+1aa000]
[  279.516839] Code: 2d 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f
84 00 00 00 00 00 0f 1f 44 00 00 b8 18 00 00 00 0f 05 48 3d 01 f0 ff
ff 73 01 <c3> 48 8b 0d 91 22 2d 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e
0f 1f
tst_test.c:1246: INFO: Timeout per run is 0h 15m 00s
tst_test.c:1246: INFO: Timeout per run is 0h 09m 00s
tst_test.c:1291: BROK: Test killed by SIGBUS!

mtest06_3 (mmap3 -x 0.002 -p) :
-------------------------------
mmap3.c:154: INFO: Seed 22
mmap3.c:155: INFO: Number of loops 1000
mmap3.c:156: INFO: Number of threads 40
mmap3.c:157: INFO: MAP[  286.657788] mmap3[1350]: segfault at
7f12179d4680 ip 00007f121859951d sp 00007f12179d1e10 error 6 in
libpthread-2.27.so[7f1218589000+19000]
_PRIVATE = 1
mm[  286.671184] Code: c4 10 5b 5d 41 5c c3 66 0f 1f 44 00 00 48 8b 15
99 8a 20 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff c3 48 8b 15 85 8a 20
00 f7 d8 <64> 89 02 48 c7 c0 ff ff ff ff eb b6 0f 1f 80 00 00 00 00 b8
01 00
[  286.677386] audit: type=1701 audit(1602818322.844:6):
auid=4294967295 uid=0 gid=0 ses=4294967295 subj=kernel pid=1348
comm=\"mmap3\" exe=\"/opt/ltp/testcases/bin/mmap3\" sig=11 res=1
ap3.c:158: INFO: Execution time 0.002000H

mallocstress01 (mallocstress) :
------------------------------
pid[1496]: shmat_rd_wr(): shmget():success got segment id 32830
pid[1496]: do_shmat_shmadt(): got shmat address = 0x7f301eae9000
pid[1496]: shmat_rd_wr(): shmget():success got segment id 328[
291.851376] mallocstress[1502]: segfault at 0 ip 0000000000000000 sp
00007f80dea3ec50 error 14
30
pid[1496]: d[  291.851466] mallocstress[1507]: segfault at
7f80dc239c98 ip 00007f80df2bf81c sp 00007f80dc239c98 error 4
o_shmat_shmadt()[  291.851485] mallocstress[1505]: segfault at
7f80dd23bc38 ip 00007f80df33fe93 sp 00007f80dd23bc38 error 4
[  291.851490] Code: 00 00 00 00 0f 1f 00 41 52 52 4d 31 d2 ba 02 00
00 00 be 80 00 00 00 39 d0 75 07 b8 ca 00 00 00 0f 05 89 d0 87 07 85
c0 75 f1 <5a> 41 5a c3 66 0f 1f 84 00 00 00 00 00 56 52 c7 07 00 00 00
00 be
: got shmat addr[  291.851565] audit: type=1701
audit(1602818328.038:7): auid=4294967295 uid=0 gid=0 ses=4294967295
subj=kernel pid=1500 comm=\"mallocstress\"
exe=\"/opt/ltp/testcases/bin/mallocstress\" sig=11 res=1
[  291.852984] mallocstress[1504]: segfault at 7f80dda3cc38 ip
00007f80df33fe93 sp 00007f80dda3cc38 error 4
ess = 0x7f301e85[  291.852988] Code: 00 00 00 00 0f 1f 00 41 52 52 4d
31 d2 ba 02 00 00 00 be 80 00 00 00 39 d0 75 07 b8 ca 00 00 00 0f 05
89 d0 87 07 85 c0 75 f1 <5a> 41 5a c3 66 0f 1f 84 00 00 00 00 00 56 52
c7 07 00 00 00 00 be
[  291.853045] audit: type=1701 audit(1602818328.040:8):
auid=4294967295 uid=0 gid=0 ses=4294967295 subj=kernel pid=1500
comm=\"mallocstress\" exe=\"/opt/ltp/testcases/bin/mallocstress\"
sig=11 res=1
5000
tst_test.c[  291.860373] Code: Unable to access opcode bytes at RIP
0xffffffffffffffd6.
[  291.860453] mallocstress[1506]: segfault at 7f80dca3ac98 ip
00007f80df2bf81c sp 00007f80dca3ac98 error 4
:1246: INFO: Tim[  291.860654] audit: type=1701
audit(1602818328.047:9): auid=4294967295 uid=0 gid=0 ses=4294967295
subj=kernel pid=1500 comm=\"mallocstress\"
exe=\"/opt/ltp/testcases/bin/mallocstress\" sig=11 res=1
[  291.871350]
eout per run is [  291.871397] mallocstress[1501]: segfault at 0 ip
0000000000000000 sp 00007f80df23fc50 error 14
[  291.871401] Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6.
0h 30m 00s
[  291.871467] audit: type=1701 audit(1602818328.058:10):
auid=4294967295 uid=0 gid=0 ses=4294967295 subj=kernel pid=1500
comm=\"mallocstress\" exe=\"/opt/ltp/testcases/bin/mallocstress\"
sig=11 res=1
[  291.882113]  in libc-2.27.so[7f80df241000+1aa000]
[  291.900984] Code: ff 48 85 c0 75 d8 0f 1f 84 00 00 00 00 00 8b 35
26 11 33 00 48 83 c1 10 85 f6 0f 85 42 01 00 00 48 81 c4 88 00 00 00
48 89 c8 <5b> 5d 41 5c 41 5d 41 5e 41 5f c3 66 0f 1f 84 00 00 00 00 00
4c 8b
[  291.919351] Code: ff 48 85 c0 75 d8 0f 1f 84 00 00 00 00 00 8b 35
26 11 33 00 48 83 c1 10 85 f6 0f 85 42 01 00 00 48 81 c4 88 00 00 00
48 89 c8 <5b> 5d 41 5c 41 5d 41 5e 41 5f c3 66 0f 1f 84 00 00 00 00 00
4c 8b

Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>

full test log link,
https://lkft.validation.linaro.org/scheduler/job/1844090

-- 
Linaro LKFT
https://lkft.linaro.org


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: mmstress[1309]: segfault at 7f3d71a36ee8 ip 00007f3d77132bdf sp 00007f3d71a36ee8 error 4 in libc-2.27.so[7f3d77058000+1aa000]
  2020-10-21 16:58 mmstress[1309]: segfault at 7f3d71a36ee8 ip 00007f3d77132bdf sp 00007f3d71a36ee8 error 4 in libc-2.27.so[7f3d77058000+1aa000] Naresh Kamboju
@ 2020-10-21 17:05 ` Linus Torvalds
  2020-10-21 17:22   ` Naresh Kamboju
  0 siblings, 1 reply; 18+ messages in thread
From: Linus Torvalds @ 2020-10-21 17:05 UTC (permalink / raw)
  To: Naresh Kamboju
  Cc: open list, linux-m68k, X86 ML, LTP List, lkft-triage,
	Linux-Next Mailing List, linux-mm, kasan-dev, Christian Brauner,
	Ingo Molnar, Thomas Gleixner, Matthew Wilcox (Oracle),
	Peter Zijlstra (Intel),
	Al Viro, Geert Uytterhoeven, Viresh Kumar, zenglg.jy,
	Stephen Rothwell, Eric W. Biederman, Dmitry Vyukov

On Wed, Oct 21, 2020 at 9:58 AM Naresh Kamboju
<naresh.kamboju@linaro.org> wrote:
>
> LTP mm mtest05 (mmstress), mtest06_3 and mallocstress01 (mallocstress) tested on
> x86 KASAN enabled build. But tests are getting PASS on Non KASAN builds.
> This regression started happening from next-20201015 nowards

Is it repeatable enough to be bisectable?

             Linus


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: mmstress[1309]: segfault at 7f3d71a36ee8 ip 00007f3d77132bdf sp 00007f3d71a36ee8 error 4 in libc-2.27.so[7f3d77058000+1aa000]
  2020-10-21 17:05 ` Linus Torvalds
@ 2020-10-21 17:22   ` Naresh Kamboju
  2020-10-22 20:55     ` Naresh Kamboju
  0 siblings, 1 reply; 18+ messages in thread
From: Naresh Kamboju @ 2020-10-21 17:22 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: open list, linux-m68k, X86 ML, LTP List, lkft-triage,
	Linux-Next Mailing List, linux-mm, kasan-dev, Christian Brauner,
	Ingo Molnar, Thomas Gleixner, Matthew Wilcox (Oracle),
	Peter Zijlstra (Intel),
	Al Viro, Geert Uytterhoeven, Viresh Kumar, zenglg.jy,
	Stephen Rothwell, Eric W. Biederman, Dmitry Vyukov

On Wed, 21 Oct 2020 at 22:35, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Wed, Oct 21, 2020 at 9:58 AM Naresh Kamboju
> <naresh.kamboju@linaro.org> wrote:
> >
> > LTP mm mtest05 (mmstress), mtest06_3 and mallocstress01 (mallocstress) tested on
> > x86 KASAN enabled build. But tests are getting PASS on Non KASAN builds.
> > This regression started happening from next-20201015 nowards
>
> Is it repeatable enough to be bisectable?

Yes. This is easily reproducible.
I will bisect and report here.

>
>              Linus

- Naresh


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: mmstress[1309]: segfault at 7f3d71a36ee8 ip 00007f3d77132bdf sp 00007f3d71a36ee8 error 4 in libc-2.27.so[7f3d77058000+1aa000]
  2020-10-21 17:22   ` Naresh Kamboju
@ 2020-10-22 20:55     ` Naresh Kamboju
  2020-10-22 23:43       ` Linus Torvalds
  0 siblings, 1 reply; 18+ messages in thread
From: Naresh Kamboju @ 2020-10-22 20:55 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: open list, linux-m68k, X86 ML, LTP List, lkft-triage,
	Linux-Next Mailing List, linux-mm, kasan-dev, Christian Brauner,
	Ingo Molnar, Thomas Gleixner, Matthew Wilcox (Oracle),
	Peter Zijlstra (Intel),
	Al Viro, Geert Uytterhoeven, Viresh Kumar, zenglg.jy,
	Stephen Rothwell, Eric W. Biederman, Dmitry Vyukov

On Wed, 21 Oct 2020 at 22:52, Naresh Kamboju <naresh.kamboju@linaro.org> wrote:
>
> On Wed, 21 Oct 2020 at 22:35, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > On Wed, Oct 21, 2020 at 9:58 AM Naresh Kamboju
> > <naresh.kamboju@linaro.org> wrote:
> > >
> > > LTP mm mtest05 (mmstress), mtest06_3 and mallocstress01 (mallocstress) tested on
> > > x86 KASAN enabled build. But tests are getting PASS on Non KASAN builds.
> > > This regression started happening from next-20201015 nowards
> >
> > Is it repeatable enough to be bisectable?
>
> Yes. This is easily reproducible.
> I will bisect and report here.

The bad commit points to,

commit d55564cfc222326e944893eff0c4118353e349ec
x86: Make __put_user() generate an out-of-line call

I have reverted this single patch and confirmed the reported
problem is not seen anymore.

- Naresh


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: mmstress[1309]: segfault at 7f3d71a36ee8 ip 00007f3d77132bdf sp 00007f3d71a36ee8 error 4 in libc-2.27.so[7f3d77058000+1aa000]
  2020-10-22 20:55     ` Naresh Kamboju
@ 2020-10-22 23:43       ` Linus Torvalds
  2020-10-23  0:11         ` Linus Torvalds
  0 siblings, 1 reply; 18+ messages in thread
From: Linus Torvalds @ 2020-10-22 23:43 UTC (permalink / raw)
  To: Naresh Kamboju
  Cc: open list, linux-m68k, X86 ML, LTP List, lkft-triage,
	Linux-Next Mailing List, linux-mm, kasan-dev, Christian Brauner,
	Ingo Molnar, Thomas Gleixner, Matthew Wilcox (Oracle),
	Peter Zijlstra (Intel),
	Al Viro, Geert Uytterhoeven, Viresh Kumar, zenglg.jy,
	Stephen Rothwell, Eric W. Biederman, Dmitry Vyukov

On Thu, Oct 22, 2020 at 1:55 PM Naresh Kamboju
<naresh.kamboju@linaro.org> wrote:
>
> The bad commit points to,
>
> commit d55564cfc222326e944893eff0c4118353e349ec
> x86: Make __put_user() generate an out-of-line call
>
> I have reverted this single patch and confirmed the reported
> problem is not seen anymore.

Thanks. Very funky, but thanks. I've been running that commit on my
machine for over half a year, and it still looks "trivially correct"
to me, but let me go look at it one more time. Can't argue with a
reliable bisect and revert..

            Linus


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: mmstress[1309]: segfault at 7f3d71a36ee8 ip 00007f3d77132bdf sp 00007f3d71a36ee8 error 4 in libc-2.27.so[7f3d77058000+1aa000]
  2020-10-22 23:43       ` Linus Torvalds
@ 2020-10-23  0:11         ` Linus Torvalds
  2020-10-23  0:22           ` Linus Torvalds
  2020-10-23  1:36           ` [LTP] " Daniel Díaz
  0 siblings, 2 replies; 18+ messages in thread
From: Linus Torvalds @ 2020-10-23  0:11 UTC (permalink / raw)
  To: Naresh Kamboju
  Cc: open list, linux-m68k, X86 ML, LTP List, lkft-triage,
	Linux-Next Mailing List, linux-mm, kasan-dev, Christian Brauner,
	Ingo Molnar, Thomas Gleixner, Matthew Wilcox (Oracle),
	Peter Zijlstra (Intel),
	Al Viro, Geert Uytterhoeven, Viresh Kumar, zenglg.jy,
	Stephen Rothwell, Eric W. Biederman, Dmitry Vyukov

On Thu, Oct 22, 2020 at 4:43 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Thanks. Very funky, but thanks. I've been running that commit on my
> machine for over half a year, and it still looks "trivially correct"
> to me, but let me go look at it one more time. Can't argue with a
> reliable bisect and revert..

Hmm. The fact that it only happens with KASAN makes me suspect it's
some bad interaction with the inline asm syntax change (and explains
why I've run with this for half a year without issues).

In particular, I wonder if it's that KASAN causes some reload pattern,
and the whole

     register __typeof__(*(ptr)) __val_pu asm("%"_ASM_AX);
..
     asm volatile(.. "r" (__val_pu) ..)

thing causes problems. That's an ugly pattern, but it's written that
way to get gcc to handle the 64-bit case properly (with the value in
%rax:%rdx).

It turns out that the decode of the user-mode SIGSEGV code is a
variation of system calls, ie

   0: b8 18 00 00 00        mov    $0x18,%eax
   5: 0f 05                syscall
   7: 48 3d 01 f0 ff ff    cmp    $0xfffffffffffff001,%rax
   d: 73 01                jae    0x10
   f:* c3                    retq    <-- trapping instruction

or

   0: 41 52                push   %r10
   2: 52                    push   %rdx
   3: 4d 31 d2              xor    %r10,%r10
   6: ba 02 00 00 00        mov    $0x2,%edx
   b: be 80 00 00 00        mov    $0x80,%esi
  10: 39 d0                cmp    %edx,%eax
  12: 75 07                jne    0x1b
  14: b8 ca 00 00 00        mov    $0xca,%eax
  19: 0f 05                syscall
  1b: 89 d0                mov    %edx,%eax
  1d: 87 07                xchg   %eax,(%rdi)
  1f: 85 c0                test   %eax,%eax
  21: 75 f1                jne    0x14
  23:* 5a                    pop    %rdx <-- trapping instruction
  24: 41 5a                pop    %r10
  26: c3                    retq

so in both cases it looks like 'syscall' returned with a bad stack pointer.

Which is certainly a sign of some code generation issue.

Very annoying, because it probably means that it's compiler-specific
too. And that "syscall 018" looks very odd. I think that's
sched_yield() on x86-64, which doesn't have any __put_user() cases at
all..

Would you mind sending me the problematic vmlinux file in private (or,
likely better - a pointer to some place I can download it, it's going
to be huge).

                      Linus


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: mmstress[1309]: segfault at 7f3d71a36ee8 ip 00007f3d77132bdf sp 00007f3d71a36ee8 error 4 in libc-2.27.so[7f3d77058000+1aa000]
  2020-10-23  0:11         ` Linus Torvalds
@ 2020-10-23  0:22           ` Linus Torvalds
  2020-10-23  1:36           ` [LTP] " Daniel Díaz
  1 sibling, 0 replies; 18+ messages in thread
From: Linus Torvalds @ 2020-10-23  0:22 UTC (permalink / raw)
  To: Naresh Kamboju
  Cc: open list, linux-m68k, X86 ML, LTP List, lkft-triage,
	Linux-Next Mailing List, linux-mm, kasan-dev, Christian Brauner,
	Ingo Molnar, Thomas Gleixner, Matthew Wilcox (Oracle),
	Peter Zijlstra (Intel),
	Al Viro, Geert Uytterhoeven, Viresh Kumar, zenglg.jy,
	Stephen Rothwell, Eric W. Biederman, Dmitry Vyukov

On Thu, Oct 22, 2020 at 5:11 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> In particular, I wonder if it's that KASAN causes some reload pattern,
> and the whole
>
>      register __typeof__(*(ptr)) __val_pu asm("%"_ASM_AX);
> ..
>      asm volatile(.. "r" (__val_pu) ..)
>
> thing causes problems.

That pattern isn't new (see the same pattern and the comment above get_user).

But our previous use of that pattern had it as an output of the asm,
and the new use is as an input. That obviously shouldn't matter, but
if it's some odd compiler code generation interaction, all bets are
off..

                Linus


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [LTP] mmstress[1309]: segfault at 7f3d71a36ee8 ip 00007f3d77132bdf sp 00007f3d71a36ee8 error 4 in libc-2.27.so[7f3d77058000+1aa000]
  2020-10-23  0:11         ` Linus Torvalds
  2020-10-23  0:22           ` Linus Torvalds
@ 2020-10-23  1:36           ` Daniel Díaz
  2020-10-23  3:05             ` Linus Torvalds
  1 sibling, 1 reply; 18+ messages in thread
From: Daniel Díaz @ 2020-10-23  1:36 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Naresh Kamboju, Stephen Rothwell, Matthew Wilcox (Oracle),
	zenglg.jy, Peter Zijlstra (Intel),
	Viresh Kumar, X86 ML, open list, lkft-triage, Eric W. Biederman,
	linux-mm, linux-m68k, Linux-Next Mailing List, Thomas Gleixner,
	kasan-dev, Dmitry Vyukov, Geert Uytterhoeven, Christian Brauner,
	Ingo Molnar, LTP List, Al Viro

Hello!

On Thu, 22 Oct 2020 at 19:11, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Thu, Oct 22, 2020 at 4:43 PM Linus Torvalds
> Would you mind sending me the problematic vmlinux file in private (or,
> likely better - a pointer to some place I can download it, it's going
> to be huge).

The kernel Naresh originally referred to is here:
  https://builds.tuxbuild.com/SCI7Xyjb7V2NbfQ2lbKBZw/

Greetings!

Daniel Díaz
daniel.diaz@linaro.org


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [LTP] mmstress[1309]: segfault at 7f3d71a36ee8 ip 00007f3d77132bdf sp 00007f3d71a36ee8 error 4 in libc-2.27.so[7f3d77058000+1aa000]
  2020-10-23  1:36           ` [LTP] " Daniel Díaz
@ 2020-10-23  3:05             ` Linus Torvalds
  2020-10-23  5:02               ` Sean Christopherson
  2020-10-23 17:00               ` Naresh Kamboju
  0 siblings, 2 replies; 18+ messages in thread
From: Linus Torvalds @ 2020-10-23  3:05 UTC (permalink / raw)
  To: Daniel Díaz
  Cc: Naresh Kamboju, Stephen Rothwell, Matthew Wilcox (Oracle),
	zenglg.jy, Peter Zijlstra (Intel),
	Viresh Kumar, X86 ML, open list, lkft-triage, Eric W. Biederman,
	linux-mm, linux-m68k, Linux-Next Mailing List, Thomas Gleixner,
	kasan-dev, Dmitry Vyukov, Geert Uytterhoeven, Christian Brauner,
	Ingo Molnar, LTP List, Al Viro

[-- Attachment #1: Type: text/plain, Size: 1398 bytes --]

On Thu, Oct 22, 2020 at 6:36 PM Daniel Díaz <daniel.diaz@linaro.org> wrote:
>
> The kernel Naresh originally referred to is here:
>   https://builds.tuxbuild.com/SCI7Xyjb7V2NbfQ2lbKBZw/

Thanks.

And when I started looking at it, I realized that my original idea
("just look for __put_user_nocheck_X calls, there aren't so many of
those") was garbage, and that I was just being stupid.

Yes, the commit that broke was about __put_user(), but in order to not
duplicate all the code, it re-used the regular put_user()
infrastructure, and so all the normal put_user() calls are potential
problem spots too if this is about the compiler interaction with KASAN
and the asm changes.

So it's not just a couple of special cases to look at, it's all the
normal cases too.

Ok, back to the drawing board, but I think reverting it is probably
the right thing to do if I can't think of something smart.

That said, since you see this on x86-64, where the whole ugly trick with that

   register asm("%"_ASM_AX)

is unnecessary (because the 8-byte case is still just a single
register, no %eax:%edx games needed), it would be interesting to hear
if the attached patch fixes it. That would confirm that the problem
really is due to some register allocation issue interaction (or,
alternatively, it would tell me that there's something else going on).

                  Linus

[-- Attachment #2: patch --]
[-- Type: application/octet-stream, Size: 880 bytes --]

 arch/x86/include/asm/uaccess.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index f13659523108..0f3e202d9eea 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -211,14 +211,14 @@ extern void __put_user_nocheck_8(void);
 #define do_put_user_call(fn,x,ptr)					\
 ({									\
 	int __ret_pu;							\
-	register __typeof__(*(ptr)) __val_pu asm("%"_ASM_AX);		\
+	__typeof__(*(ptr)) __val_pu;					\
 	__chk_user_ptr(ptr);						\
 	__val_pu = (x);							\
 	asm volatile("call __" #fn "_%P[size]"				\
 		     : "=c" (__ret_pu),					\
 			ASM_CALL_CONSTRAINT				\
 		     : "0" (ptr),					\
-		       "r" (__val_pu),					\
+		       "a" (__val_pu),					\
 		       [size] "i" (sizeof(*(ptr)))			\
 		     :"ebx");						\
 	__builtin_expect(__ret_pu, 0);					\

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [LTP] mmstress[1309]: segfault at 7f3d71a36ee8 ip 00007f3d77132bdf sp 00007f3d71a36ee8 error 4 in libc-2.27.so[7f3d77058000+1aa000]
  2020-10-23  3:05             ` Linus Torvalds
@ 2020-10-23  5:02               ` Sean Christopherson
  2020-10-23  7:14                 ` Rasmus Villemoes
  2020-10-23 15:52                 ` Linus Torvalds
  2020-10-23 17:00               ` Naresh Kamboju
  1 sibling, 2 replies; 18+ messages in thread
From: Sean Christopherson @ 2020-10-23  5:02 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Daniel Díaz, Naresh Kamboju, Stephen Rothwell,
	Matthew Wilcox (Oracle), zenglg.jy, Peter Zijlstra (Intel),
	Viresh Kumar, X86 ML, open list, lkft-triage, Eric W. Biederman,
	linux-mm, linux-m68k, Linux-Next Mailing List, Thomas Gleixner,
	kasan-dev, Dmitry Vyukov, Geert Uytterhoeven, Christian Brauner,
	Ingo Molnar, LTP List, Al Viro

On Thu, Oct 22, 2020 at 08:05:05PM -0700, Linus Torvalds wrote:
> On Thu, Oct 22, 2020 at 6:36 PM Daniel Díaz <daniel.diaz@linaro.org> wrote:
> >
> > The kernel Naresh originally referred to is here:
> >   https://builds.tuxbuild.com/SCI7Xyjb7V2NbfQ2lbKBZw/
> 
> Thanks.
> 
> And when I started looking at it, I realized that my original idea
> ("just look for __put_user_nocheck_X calls, there aren't so many of
> those") was garbage, and that I was just being stupid.
> 
> Yes, the commit that broke was about __put_user(), but in order to not
> duplicate all the code, it re-used the regular put_user()
> infrastructure, and so all the normal put_user() calls are potential
> problem spots too if this is about the compiler interaction with KASAN
> and the asm changes.
> 
> So it's not just a couple of special cases to look at, it's all the
> normal cases too.
> 
> Ok, back to the drawing board, but I think reverting it is probably
> the right thing to do if I can't think of something smart.
> 
> That said, since you see this on x86-64, where the whole ugly trick with that
> 
>    register asm("%"_ASM_AX)
> 
> is unnecessary (because the 8-byte case is still just a single
> register, no %eax:%edx games needed), it would be interesting to hear
> if the attached patch fixes it. That would confirm that the problem
> really is due to some register allocation issue interaction (or,
> alternatively, it would tell me that there's something else going on).

I haven't reproduced the crash, but I did find a smoking gun that confirms the
"register shenanigans are evil shenanigans" theory.  I ran into a similar thing
recently where a seemingly innocuous line of code after loading a value into a
register variable wreaked havoc because it clobbered the input register.

This put_user() in schedule_tail():

   if (current->set_child_tid)
           put_user(task_pid_vnr(current), current->set_child_tid);

generates the following assembly with KASAN out-of-line:

   0xffffffff810dccc9 <+73>: xor    %edx,%edx
   0xffffffff810dcccb <+75>: xor    %esi,%esi
   0xffffffff810dcccd <+77>: mov    %rbp,%rdi
   0xffffffff810dccd0 <+80>: callq  0xffffffff810bf5e0 <__task_pid_nr_ns>
   0xffffffff810dccd5 <+85>: mov    %r12,%rdi
   0xffffffff810dccd8 <+88>: callq  0xffffffff81388c60 <__asan_load8>
   0xffffffff810dccdd <+93>: mov    0x590(%rbp),%rcx
   0xffffffff810dcce4 <+100>: callq  0xffffffff817708a0 <__put_user_4>
   0xffffffff810dcce9 <+105>: pop    %rbx
   0xffffffff810dccea <+106>: pop    %rbp
   0xffffffff810dcceb <+107>: pop    %r12

__task_pid_nr_ns() returns the pid in %rax, which gets clobbered by
__asan_load8()'s check on current for the current->set_child_tid dereference.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [LTP] mmstress[1309]: segfault at 7f3d71a36ee8 ip 00007f3d77132bdf sp 00007f3d71a36ee8 error 4 in libc-2.27.so[7f3d77058000+1aa000]
  2020-10-23  5:02               ` Sean Christopherson
@ 2020-10-23  7:14                 ` Rasmus Villemoes
  2020-10-23 15:54                   ` Linus Torvalds
  2020-10-23 15:52                 ` Linus Torvalds
  1 sibling, 1 reply; 18+ messages in thread
From: Rasmus Villemoes @ 2020-10-23  7:14 UTC (permalink / raw)
  To: Sean Christopherson, Linus Torvalds
  Cc: Daniel Díaz, Naresh Kamboju, Stephen Rothwell,
	Matthew Wilcox (Oracle), zenglg.jy, Peter Zijlstra (Intel),
	Viresh Kumar, X86 ML, open list, lkft-triage, Eric W. Biederman,
	linux-mm, linux-m68k, Linux-Next Mailing List, Thomas Gleixner,
	kasan-dev, Dmitry Vyukov, Geert Uytterhoeven, Christian Brauner,
	Ingo Molnar, LTP List, Al Viro

On 23/10/2020 07.02, Sean Christopherson wrote:
> On Thu, Oct 22, 2020 at 08:05:05PM -0700, Linus Torvalds wrote:
>> On Thu, Oct 22, 2020 at 6:36 PM Daniel Díaz <daniel.diaz@linaro.org> wrote:
>>>
>>> The kernel Naresh originally referred to is here:
>>>   https://builds.tuxbuild.com/SCI7Xyjb7V2NbfQ2lbKBZw/
>>
>> Thanks.
>>
>> And when I started looking at it, I realized that my original idea
>> ("just look for __put_user_nocheck_X calls, there aren't so many of
>> those") was garbage, and that I was just being stupid.
>>
>> Yes, the commit that broke was about __put_user(), but in order to not
>> duplicate all the code, it re-used the regular put_user()
>> infrastructure, and so all the normal put_user() calls are potential
>> problem spots too if this is about the compiler interaction with KASAN
>> and the asm changes.
>>
>> So it's not just a couple of special cases to look at, it's all the
>> normal cases too.
>>
>> Ok, back to the drawing board, but I think reverting it is probably
>> the right thing to do if I can't think of something smart.
>>
>> That said, since you see this on x86-64, where the whole ugly trick with that
>>
>>    register asm("%"_ASM_AX)
>>
>> is unnecessary (because the 8-byte case is still just a single
>> register, no %eax:%edx games needed), it would be interesting to hear
>> if the attached patch fixes it. That would confirm that the problem
>> really is due to some register allocation issue interaction (or,
>> alternatively, it would tell me that there's something else going on).
> 
> I haven't reproduced the crash, but I did find a smoking gun that confirms the
> "register shenanigans are evil shenanigans" theory.  I ran into a similar thing
> recently where a seemingly innocuous line of code after loading a value into a
> register variable wreaked havoc because it clobbered the input register.
> 
> This put_user() in schedule_tail():
> 
>    if (current->set_child_tid)
>            put_user(task_pid_vnr(current), current->set_child_tid);
> 
> generates the following assembly with KASAN out-of-line:
> 
>    0xffffffff810dccc9 <+73>: xor    %edx,%edx
>    0xffffffff810dcccb <+75>: xor    %esi,%esi
>    0xffffffff810dcccd <+77>: mov    %rbp,%rdi
>    0xffffffff810dccd0 <+80>: callq  0xffffffff810bf5e0 <__task_pid_nr_ns>
>    0xffffffff810dccd5 <+85>: mov    %r12,%rdi
>    0xffffffff810dccd8 <+88>: callq  0xffffffff81388c60 <__asan_load8>
>    0xffffffff810dccdd <+93>: mov    0x590(%rbp),%rcx
>    0xffffffff810dcce4 <+100>: callq  0xffffffff817708a0 <__put_user_4>
>    0xffffffff810dcce9 <+105>: pop    %rbx
>    0xffffffff810dccea <+106>: pop    %rbp
>    0xffffffff810dcceb <+107>: pop    %r12
> 
> __task_pid_nr_ns() returns the pid in %rax, which gets clobbered by
> __asan_load8()'s check on current for the current->set_child_tid dereference.
> 

Yup, and you don't need KASAN to implicitly generate function calls for
you. With x86_64 defconfig, I get

extern u64 __user *get_destination(int x, int y);

void pu_test(void)
{
	u64 big = 0x1234abcd5678;

	if (put_user(big, get_destination(4, 5)))
		pr_warn("uh");
}

to generate

0000000000004d60 <pu_test>:
    4d60:       53                      push   %rbx
    4d61:       be 05 00 00 00          mov    $0x5,%esi
    4d66:       bf 04 00 00 00          mov    $0x4,%edi
    4d6b:       e8 00 00 00 00          callq  4d70 <pu_test+0x10>
                        4d6c: R_X86_64_PC32     get_destination-0x4
    4d70:       48 89 c1                mov    %rax,%rcx
    4d73:       e8 00 00 00 00          callq  4d78 <pu_test+0x18>
                        4d74: R_X86_64_PC32     __put_user_8-0x4
    4d78:       85 c9                   test   %ecx,%ecx
    4d7a:       75 02                   jne    4d7e <pu_test+0x1e>
    4d7c:       5b                      pop    %rbx
    4d7d:       c3                      retq
    4d7e:       5b                      pop    %rbx
    4d7f:       48 c7 c7 00 00 00 00    mov    $0x0,%rdi
                        4d82: R_X86_64_32S      .rodata.str1.1+0xfa
    4d86:       e9 00 00 00 00          jmpq   4d8b <pu_test+0x2b>
                        4d87: R_X86_64_PC32     printk-0x4


That's certainly garbage. Now, I don't know if it's a sufficient fix (or
could break something else), but the obvious first step of rearranging
so that the ptr argument is evaluated before the assignment to __val_pu

diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index 477c503f2753..b5d3290fcd09 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -235,13 +235,13 @@ extern void __put_user_nocheck_8(void);
 #define do_put_user_call(fn,x,ptr)                                     \
 ({                                                                     \
        int __ret_pu;                                                   \
-       register __typeof__(*(ptr)) __val_pu asm("%"_ASM_AX);           \
+       __typeof__(ptr) __ptr = (ptr);                                  \
+       register __typeof__(*(ptr)) __val_pu asm("%"_ASM_AX) = (x);     \
        __chk_user_ptr(ptr);                                            \
-       __val_pu = (x);                                                 \
        asm volatile("call __" #fn "_%P[size]"                          \
                     : "=c" (__ret_pu),                                 \
                        ASM_CALL_CONSTRAINT                             \
-                    : "0" (ptr),                                       \
+                    : "0" (__ptr),                                     \
                       "r" (__val_pu),                                  \
                       [size] "i" (sizeof(*(ptr)))                      \
                     :"ebx");                                           \


at least gets us

0000000000004d60 <pu_test>:
    4d60:       53                      push   %rbx
    4d61:       be 05 00 00 00          mov    $0x5,%esi
    4d66:       bf 04 00 00 00          mov    $0x4,%edi
    4d6b:       e8 00 00 00 00          callq  4d70 <pu_test+0x10>
                        4d6c: R_X86_64_PC32     get_destination-0x4
    4d70:       48 89 c1                mov    %rax,%rcx
    4d73:       48 b8 78 56 cd ab 34    movabs $0x1234abcd5678,%rax
    4d7a:       12 00 00
    4d7d:       e8 00 00 00 00          callq  4d82 <pu_test+0x22>
                        4d7e: R_X86_64_PC32     __put_user_8-0x4


FWIW, https://gcc.gnu.org/onlinedocs/gcc/Local-Register-Variables.html
does warn about function calls and other things that might clobber the
register variables between the assignment and the use as an input
(though the case of evaluating other input operands is not explicitly
mentioned).

Rasmus


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [LTP] mmstress[1309]: segfault at 7f3d71a36ee8 ip 00007f3d77132bdf sp 00007f3d71a36ee8 error 4 in libc-2.27.so[7f3d77058000+1aa000]
  2020-10-23  5:02               ` Sean Christopherson
  2020-10-23  7:14                 ` Rasmus Villemoes
@ 2020-10-23 15:52                 ` Linus Torvalds
  1 sibling, 0 replies; 18+ messages in thread
From: Linus Torvalds @ 2020-10-23 15:52 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Daniel Díaz, Naresh Kamboju, Stephen Rothwell,
	Matthew Wilcox (Oracle), zenglg.jy, Peter Zijlstra (Intel),
	Viresh Kumar, X86 ML, open list, lkft-triage, Eric W. Biederman,
	linux-mm, linux-m68k, Linux-Next Mailing List, Thomas Gleixner,
	kasan-dev, Dmitry Vyukov, Geert Uytterhoeven, Christian Brauner,
	Ingo Molnar, LTP List, Al Viro

On Thu, Oct 22, 2020 at 10:02 PM Sean Christopherson
<sean.j.christopherson@intel.com> wrote:
>
> I haven't reproduced the crash, but I did find a smoking gun that confirms the
> "register shenanigans are evil shenanigans" theory.  I ran into a similar thing
> recently where a seemingly innocuous line of code after loading a value into a
> register variable wreaked havoc because it clobbered the input register.

Yup, that certainly looks like the smoking gun.

Thanks for finding an example of this, clearly I'll have to either go
back to the "conditionally use 'A' or 'a' depending on size" model, or
perhaps try Rasmus' patch.

              Linus


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [LTP] mmstress[1309]: segfault at 7f3d71a36ee8 ip 00007f3d77132bdf sp 00007f3d71a36ee8 error 4 in libc-2.27.so[7f3d77058000+1aa000]
  2020-10-23  7:14                 ` Rasmus Villemoes
@ 2020-10-23 15:54                   ` Linus Torvalds
  2020-10-23 16:32                     ` Linus Torvalds
  0 siblings, 1 reply; 18+ messages in thread
From: Linus Torvalds @ 2020-10-23 15:54 UTC (permalink / raw)
  To: Rasmus Villemoes
  Cc: Sean Christopherson, Daniel Díaz, Naresh Kamboju,
	Stephen Rothwell, Matthew Wilcox (Oracle),
	zenglg.jy, Peter Zijlstra (Intel),
	Viresh Kumar, X86 ML, open list, lkft-triage, Eric W. Biederman,
	linux-mm, linux-m68k, Linux-Next Mailing List, Thomas Gleixner,
	kasan-dev, Dmitry Vyukov, Geert Uytterhoeven, Christian Brauner,
	Ingo Molnar, LTP List, Al Viro

On Fri, Oct 23, 2020 at 12:14 AM Rasmus Villemoes
<linux@rasmusvillemoes.dk> wrote:
>
> That's certainly garbage. Now, I don't know if it's a sufficient fix (or
> could break something else), but the obvious first step of rearranging
> so that the ptr argument is evaluated before the assignment to __val_pu

Ack. We could do that.

I'm more inclined to just bite the bullet and go back to the ugly
conditional on the size that I had hoped to avoid, but if that turns
out too ugly, mind signing off on your patch and I'll have that as a
fallback?

             Linus


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [LTP] mmstress[1309]: segfault at 7f3d71a36ee8 ip 00007f3d77132bdf sp 00007f3d71a36ee8 error 4 in libc-2.27.so[7f3d77058000+1aa000]
  2020-10-23 15:54                   ` Linus Torvalds
@ 2020-10-23 16:32                     ` Linus Torvalds
  2020-10-23 17:50                       ` Naresh Kamboju
  0 siblings, 1 reply; 18+ messages in thread
From: Linus Torvalds @ 2020-10-23 16:32 UTC (permalink / raw)
  To: Rasmus Villemoes
  Cc: Sean Christopherson, Daniel Díaz, Naresh Kamboju,
	Stephen Rothwell, Matthew Wilcox (Oracle),
	zenglg.jy, Peter Zijlstra (Intel),
	Viresh Kumar, X86 ML, open list, lkft-triage, Eric W. Biederman,
	linux-mm, linux-m68k, Linux-Next Mailing List, Thomas Gleixner,
	kasan-dev, Dmitry Vyukov, Geert Uytterhoeven, Christian Brauner,
	Ingo Molnar, LTP List, Al Viro

[-- Attachment #1: Type: text/plain, Size: 1486 bytes --]

On Fri, Oct 23, 2020 at 8:54 AM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Fri, Oct 23, 2020 at 12:14 AM Rasmus Villemoes
> <linux@rasmusvillemoes.dk> wrote:
> >
> > That's certainly garbage. Now, I don't know if it's a sufficient fix (or
> > could break something else), but the obvious first step of rearranging
> > so that the ptr argument is evaluated before the assignment to __val_pu
>
> Ack. We could do that.
>
> I'm more inclined to just bite the bullet and go back to the ugly
> conditional on the size that I had hoped to avoid, but if that turns
> out too ugly, mind signing off on your patch and I'll have that as a
> fallback?

Actually, looking at that code, and the fact that we've used the
"register asm()" format forever for the get_user() side, I think your
approach is the right one.

I'd rename the internal ptr variable to "__ptr_pu", and make sure the
assignments happen just before the asm call (with the __val_pu
assignment being the final thing).

lso, it needs to be

        void __user *__ptr_pu;

instead of

        __typeof__(ptr) __ptr = (ptr);

because "ptr" may actually be an array, and we need to have the usual
C "array to pointer" conversions happen, rather than try to make
__ptr_pu be an array too.

So the patch would become something like the appended instead, but I'd
still like your sign-off (and I'd put you as author of the fix).

Narest, can you confirm that this patch fixes the issue for you?

                  Linus

[-- Attachment #2: patch --]
[-- Type: application/octet-stream, Size: 862 bytes --]

 arch/x86/include/asm/uaccess.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index f13659523108..d006af915d4a 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -211,13 +211,15 @@ extern void __put_user_nocheck_8(void);
 #define do_put_user_call(fn,x,ptr)					\
 ({									\
 	int __ret_pu;							\
+	void __user *__ptr_pu;						\
 	register __typeof__(*(ptr)) __val_pu asm("%"_ASM_AX);		\
 	__chk_user_ptr(ptr);						\
+	__ptr_pu = (ptr);						\
 	__val_pu = (x);							\
 	asm volatile("call __" #fn "_%P[size]"				\
 		     : "=c" (__ret_pu),					\
 			ASM_CALL_CONSTRAINT				\
-		     : "0" (ptr),					\
+		     : "0" (__ptr_pu),					\
 		       "r" (__val_pu),					\
 		       [size] "i" (sizeof(*(ptr)))			\
 		     :"ebx");						\

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [LTP] mmstress[1309]: segfault at 7f3d71a36ee8 ip 00007f3d77132bdf sp 00007f3d71a36ee8 error 4 in libc-2.27.so[7f3d77058000+1aa000]
  2020-10-23  3:05             ` Linus Torvalds
  2020-10-23  5:02               ` Sean Christopherson
@ 2020-10-23 17:00               ` Naresh Kamboju
  2020-10-23 17:50                 ` Linus Torvalds
  1 sibling, 1 reply; 18+ messages in thread
From: Naresh Kamboju @ 2020-10-23 17:00 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Daniel Díaz, Stephen Rothwell, Matthew Wilcox (Oracle),
	Peter Zijlstra (Intel),
	Viresh Kumar, X86 ML, open list, lkft-triage, Eric W. Biederman,
	linux-mm, linux-m68k, Linux-Next Mailing List, Thomas Gleixner,
	kasan-dev, Dmitry Vyukov, Geert Uytterhoeven, Christian Brauner,
	Ingo Molnar, LTP List, Al Viro

On Fri, 23 Oct 2020 at 08:35, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Thu, Oct 22, 2020 at 6:36 PM Daniel Díaz <daniel.diaz@linaro.org> wrote:
> >
> > The kernel Naresh originally referred to is here:
> >   https://builds.tuxbuild.com/SCI7Xyjb7V2NbfQ2lbKBZw/
>
> is unnecessary (because the 8-byte case is still just a single
> register, no %eax:%edx games needed), it would be interesting to hear
> if the attached patch fixes it. That would confirm that the problem
> really is due to some register allocation issue interaction (or,
> alternatively, it would tell me that there's something else going on).

[Old patch from yesterday]

After applying your patch on top on linux next tag 20201015
there are two observations,
  1) i386 build failed. please find build error build
  2) x86_64 kasan test PASS and the reported error not found.


i386 build failure,
----------------------
make -sk KBUILD_BUILD_USER=TuxBuild -C/linux -j16 ARCH=i386 HOSTCC=gcc
CC="sccache gcc" O=build
#
In file included from ../include/linux/uaccess.h:11,
                 from ../arch/x86/include/asm/fpu/xstate.h:5,
                 from ../arch/x86/include/asm/pgtable.h:26,
                 from ../include/linux/pgtable.h:6,
                 from ../include/linux/mm.h:33,
                 from ../include/linux/memblock.h:13,
                 from ../fs/proc/page.c:2:
../fs/proc/page.c: In function ‘kpagecgroup_read’:
../arch/x86/include/asm/uaccess.h:217:2: error: inconsistent operand
constraints in an ‘asm’
  217 |  asm volatile("call __" #fn "_%P[size]"    \
      |  ^~~
../arch/x86/include/asm/uaccess.h:244:44: note: in expansion of macro
‘do_put_user_call’
  244 | #define put_user(x, ptr) ({ might_fault();
do_put_user_call(put_user,x,ptr); })
      |                                            ^~~~~~~~~~~~~~~~
../fs/proc/page.c:307:7: note: in expansion of macro ‘put_user’
  307 |   if (put_user(ino, out)) {
      |       ^~~~~~~~
make[3]: *** [../scripts/Makefile.build:283: fs/proc/page.o] Error 1
make[3]: Target '__build' not remade because of errors.
make[2]: *** [../scripts/Makefile.build:500: fs/proc] Error 2
In file included from ../include/linux/uaccess.h:11,
                 from ../include/linux/sched/task.h:11,
                 from ../include/linux/sched/signal.h:9,
                 from ../include/linux/rcuwait.h:6,
                 from ../include/linux/percpu-rwsem.h:7,
                 from ../include/linux/fs.h:33,
                 from ../include/linux/cgroup.h:17,
                 from ../include/linux/memcontrol.h:13,
                 from ../include/linux/swap.h:9,
                 from ../include/linux/suspend.h:5,
                 from ../kernel/power/user.c:10:
../kernel/power/user.c: In function ‘snapshot_ioctl’:
../arch/x86/include/asm/uaccess.h:217:2: error: inconsistent operand
constraints in an ‘asm’
  217 |  asm volatile("call __" #fn "_%P[size]"    \
      |  ^~~
../arch/x86/include/asm/uaccess.h:244:44: note: in expansion of macro
‘do_put_user_call’
  244 | #define put_user(x, ptr) ({ might_fault();
do_put_user_call(put_user,x,ptr); })
      |                                            ^~~~~~~~~~~~~~~~
../kernel/power/user.c:340:11: note: in expansion of macro ‘put_user’
  340 |   error = put_user(size, (loff_t __user *)arg);
      |           ^~~~~~~~
../arch/x86/include/asm/uaccess.h:217:2: error: inconsistent operand
constraints in an ‘asm’
  217 |  asm volatile("call __" #fn "_%P[size]"    \
      |  ^~~
../arch/x86/include/asm/uaccess.h:244:44: note: in expansion of macro
‘do_put_user_call’
  244 | #define put_user(x, ptr) ({ might_fault();
do_put_user_call(put_user,x,ptr); })
      |                                            ^~~~~~~~~~~~~~~~
../kernel/power/user.c:346:11: note: in expansion of macro ‘put_user’
  346 |   error = put_user(size, (loff_t __user *)arg);
      |           ^~~~~~~~
../arch/x86/include/asm/uaccess.h:217:2: error: inconsistent operand
constraints in an ‘asm’
  217 |  asm volatile("call __" #fn "_%P[size]"    \
      |  ^~~
../arch/x86/include/asm/uaccess.h:244:44: note: in expansion of macro
‘do_put_user_call’
  244 | #define put_user(x, ptr) ({ might_fault();
do_put_user_call(put_user,x,ptr); })
      |                                            ^~~~~~~~~~~~~~~~
../kernel/power/user.c:357:12: note: in expansion of macro ‘put_user’
  357 |    error = put_user(offset, (loff_t __user *)arg);
      |            ^~~~~~~~


x86_64 Kasan tested and the reported issue not found.
https://lkft.validation.linaro.org/scheduler/job/1868029#L2374

- Naresh


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [LTP] mmstress[1309]: segfault at 7f3d71a36ee8 ip 00007f3d77132bdf sp 00007f3d71a36ee8 error 4 in libc-2.27.so[7f3d77058000+1aa000]
  2020-10-23 16:32                     ` Linus Torvalds
@ 2020-10-23 17:50                       ` Naresh Kamboju
  0 siblings, 0 replies; 18+ messages in thread
From: Naresh Kamboju @ 2020-10-23 17:50 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Rasmus Villemoes, Sean Christopherson, Daniel Díaz,
	Stephen Rothwell, Matthew Wilcox (Oracle),
	zenglg.jy, Peter Zijlstra (Intel),
	Viresh Kumar, X86 ML, open list, lkft-triage, Eric W. Biederman,
	linux-mm, linux-m68k, Linux-Next Mailing List, Thomas Gleixner,
	kasan-dev, Dmitry Vyukov, Geert Uytterhoeven, Christian Brauner,
	Ingo Molnar, LTP List, Al Viro

On Fri, 23 Oct 2020 at 22:03, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Fri, Oct 23, 2020 at 8:54 AM Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > On Fri, Oct 23, 2020 at 12:14 AM Rasmus Villemoes
> > <linux@rasmusvillemoes.dk> wrote:
> > >
> > > That's certainly garbage. Now, I don't know if it's a sufficient fix (or
> > > could break something else), but the obvious first step of rearranging
> > > so that the ptr argument is evaluated before the assignment to __val_pu
> >
> > Ack. We could do that.
> >
> > I'm more inclined to just bite the bullet and go back to the ugly
> > conditional on the size that I had hoped to avoid, but if that turns
> > out too ugly, mind signing off on your patch and I'll have that as a
> > fallback?
>
> Actually, looking at that code, and the fact that we've used the
> "register asm()" format forever for the get_user() side, I think your
> approach is the right one.
>
> I'd rename the internal ptr variable to "__ptr_pu", and make sure the
> assignments happen just before the asm call (with the __val_pu
> assignment being the final thing).
>
> lso, it needs to be
>
>         void __user *__ptr_pu;
>
> instead of
>
>         __typeof__(ptr) __ptr = (ptr);
>
> because "ptr" may actually be an array, and we need to have the usual
> C "array to pointer" conversions happen, rather than try to make
> __ptr_pu be an array too.
>
> So the patch would become something like the appended instead, but I'd
> still like your sign-off (and I'd put you as author of the fix).
>
> Narest, can you confirm that this patch fixes the issue for you?

This patch fixed the reported problem.

Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org>

Build location:
https://builds.tuxbuild.com/uDAiW8jkN61oWoyxZDkEYA/

Test logs,
https://lkft.validation.linaro.org/scheduler/job/1868045#L1597

- Naresh


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [LTP] mmstress[1309]: segfault at 7f3d71a36ee8 ip 00007f3d77132bdf sp 00007f3d71a36ee8 error 4 in libc-2.27.so[7f3d77058000+1aa000]
  2020-10-23 17:00               ` Naresh Kamboju
@ 2020-10-23 17:50                 ` Linus Torvalds
  2020-10-23 21:15                   ` Song Liu
  0 siblings, 1 reply; 18+ messages in thread
From: Linus Torvalds @ 2020-10-23 17:50 UTC (permalink / raw)
  To: Naresh Kamboju
  Cc: Daniel Díaz, Stephen Rothwell, Matthew Wilcox (Oracle),
	Peter Zijlstra (Intel),
	Viresh Kumar, X86 ML, open list, lkft-triage, Eric W. Biederman,
	linux-mm, linux-m68k, Linux-Next Mailing List, Thomas Gleixner,
	kasan-dev, Dmitry Vyukov, Geert Uytterhoeven, Christian Brauner,
	Ingo Molnar, LTP List, Al Viro

On Fri, Oct 23, 2020 at 10:00 AM Naresh Kamboju
<naresh.kamboju@linaro.org> wrote:
>
> [Old patch from yesterday]
>
> After applying your patch on top on linux next tag 20201015
> there are two observations,
>   1) i386 build failed. please find build error build

Yes, this was expected. That patch explicitly only works on x86-64,
because 32-bit needs the double register handling for 64-bit values
(mainly loff_t).

>   2) x86_64 kasan test PASS and the reported error not found.

Ok, good. That confirms that the problem you reported is indeed the
register allocation.

The patch I sent an hour ago (the one based on Rasmus' one from
yesterday) should fix things too, and - unlike yesterday's - work on
32-bit.

But I'll wait for confirmation (and hopefully a sign-off from Rasmus
so that I can give him authorship) before actually committing it.

              Linus


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [LTP] mmstress[1309]: segfault at 7f3d71a36ee8 ip 00007f3d77132bdf sp 00007f3d71a36ee8 error 4 in libc-2.27.so[7f3d77058000+1aa000]
  2020-10-23 17:50                 ` Linus Torvalds
@ 2020-10-23 21:15                   ` Song Liu
  0 siblings, 0 replies; 18+ messages in thread
From: Song Liu @ 2020-10-23 21:15 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Naresh Kamboju, Daniel Díaz, Stephen Rothwell,
	Matthew Wilcox (Oracle), Peter Zijlstra (Intel),
	Viresh Kumar, X86 ML, open list, lkft-triage, Eric W. Biederman,
	linux-mm, linux-m68k, Linux-Next Mailing List, Thomas Gleixner,
	kasan-dev, Dmitry Vyukov, Geert Uytterhoeven, Christian Brauner,
	Ingo Molnar, LTP List, Al Viro

On Fri, Oct 23, 2020 at 10:51 AM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Fri, Oct 23, 2020 at 10:00 AM Naresh Kamboju
> <naresh.kamboju@linaro.org> wrote:
> >
> > [Old patch from yesterday]
> >
> > After applying your patch on top on linux next tag 20201015
> > there are two observations,
> >   1) i386 build failed. please find build error build
>
> Yes, this was expected. That patch explicitly only works on x86-64,
> because 32-bit needs the double register handling for 64-bit values
> (mainly loff_t).
>
> >   2) x86_64 kasan test PASS and the reported error not found.
>
> Ok, good. That confirms that the problem you reported is indeed the
> register allocation.
>
> The patch I sent an hour ago (the one based on Rasmus' one from
> yesterday) should fix things too, and - unlike yesterday's - work on
> 32-bit.
>
> But I'll wait for confirmation (and hopefully a sign-off from Rasmus
> so that I can give him authorship) before actually committing it.
>
>               Linus

My test vm failed to boot since

commit d55564cfc222326e944893eff0c4118353e349ec
x86: Make __put_user() generate an out-of-line call

The patch also fixed it.

Thanks!
Song


^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2020-10-23 21:16 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-10-21 16:58 mmstress[1309]: segfault at 7f3d71a36ee8 ip 00007f3d77132bdf sp 00007f3d71a36ee8 error 4 in libc-2.27.so[7f3d77058000+1aa000] Naresh Kamboju
2020-10-21 17:05 ` Linus Torvalds
2020-10-21 17:22   ` Naresh Kamboju
2020-10-22 20:55     ` Naresh Kamboju
2020-10-22 23:43       ` Linus Torvalds
2020-10-23  0:11         ` Linus Torvalds
2020-10-23  0:22           ` Linus Torvalds
2020-10-23  1:36           ` [LTP] " Daniel Díaz
2020-10-23  3:05             ` Linus Torvalds
2020-10-23  5:02               ` Sean Christopherson
2020-10-23  7:14                 ` Rasmus Villemoes
2020-10-23 15:54                   ` Linus Torvalds
2020-10-23 16:32                     ` Linus Torvalds
2020-10-23 17:50                       ` Naresh Kamboju
2020-10-23 15:52                 ` Linus Torvalds
2020-10-23 17:00               ` Naresh Kamboju
2020-10-23 17:50                 ` Linus Torvalds
2020-10-23 21:15                   ` Song Liu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).