* linux v5.18.3 fails to boot
@ 2022-06-09 18:13 John David Anglin
  2022-06-10 15:06 ` John David Anglin
  0 siblings, 1 reply; 8+ messages in thread
From: John David Anglin @ 2022-06-09 18:13 UTC
  To: linux-parisc; +Cc: Helge Deller

[-- Attachment #1: Type: text/plain, Size: 2723 bytes --]

[...]
ata3: SATA link down (SStatus 0 SControl 0)
       _______________________________
      < Your System ate a SPARC! Gah! >
       -------------------------------
              \   ^__^
                  (__)\       )\/\
                   U  ||----w |
                      ||     ||
swapper/0 (pid 0): Illegal instruction (code 8)
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.18.3+ #1
Hardware name: 9000/785/C8000

      YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
PSW: 00001000000001000001101100001110 Not tainted
r00-03  0000000008041b0e 000000004e8045b0 0000000010bbda78 000000004e804470
r04-07  0000000010bb5000 0000000000001440 0000000054355000 0000000000001400
r08-11  0000000055000000 000000000000000e 000000000000000f 0000000055002800
r12-15  0000000000000000 0000000055002800 0000000040b668c0 0000000040b668c0
r16-19  0000000000000001 0000000000000001 0000000051b799f0 0000000000000000
r20-23  0000000000000001 fffffffffffff5b9 0000000000000000 0000000000200000
r24-27  000000000000000c 000000000800000e 0000000054355144 0000000040b3e0c0
r28-31  0000000000010001 000000004e804620 000000004e8045b0 0000000040edd040
sr00-03  00000000000a5c00 0000000000000000 0000000000000000 00000000000a5c00
sr04-07  0000000000000000 0000000000000000 0000000000000000 0000000000000000

IASQ: 0000000000000000 0000000000000000 IAOQ: 0000000010bbd9e0 0000000010bbd9e4
  IIR: 006110c2    ISR: 0000000010240000  IOR: 00000003b76dd048
  CPU:        0   CR30: 0000000040edd040 CR31: ffffffffffffffff
  ORIG_R28: 0000000000000000
  IAOQ[0]: mpt_reply+0x130/0x4f0 [mptbase]
  IAOQ[1]: mpt_reply+0x134/0x4f0 [mptbase]
  RP(r2): mpt_reply+0x1c8/0x4f0 [mptbase]
Backtrace:
  [<0000000010bbde24>] mpt_interrupt+0x84/0xe8 [mptbase]
  [<000000004026dd64>] __handle_irq_event_percpu+0xc4/0x250
  [<000000004026df28>] handle_irq_event_percpu+0x38/0xd8
  [<00000000402776c4>] handle_percpu_irq+0xb4/0xf0
  [<000000004026c90c>] generic_handle_irq+0x5c/0x90
  [<00000000401a20e4>] call_on_stack+0x18/0x24

CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.18.3+ #1
Hardware name: 9000/785/C8000
Backtrace:
  [<00000000401a8cd8>] show_stack+0x70/0x90
  [<0000000040a8e238>] dump_stack_lvl+0xd8/0x128
  [<0000000040a8e2bc>] dump_stack+0x34/0x48
  [<00000000401a8efc>] die_if_kernel+0x1e4/0x3f8
  [<00000000401a9af4>] handle_interruption+0x59c/0xb58
  [<00000000401a107c>] intr_check_sig+0x0/0x3c

Kernel panic - not syncing: Fatal exception in interrupt

v5.18.2 with a similar config is okay.  The fault seems consistent. IIR contains an illegal instruction.

Attached config.

Dave

-- 
John David Anglin  dave.anglin@bell.net

[-- Attachment #2: config-5.18.3+.gz --]
[-- Type: application/x-gzip, Size: 22714 bytes --]


* Re: linux v5.18.3 fails to boot
  2022-06-09 18:13 linux v5.18.3 fails to boot John David Anglin
@ 2022-06-10 15:06 ` John David Anglin
  2022-06-10 16:06   ` Kuniyuki Iwashima
  0 siblings, 1 reply; 8+ messages in thread
From: John David Anglin @ 2022-06-10 15:06 UTC
  To: linux-parisc; +Cc: Helge Deller, Kuniyuki Iwashima

I did a regression search. e039c0b5985999b150594126225e1ee51df7b4c9 is the first bad commit

commit e039c0b5985999b150594126225e1ee51df7b4c9
Author: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Date:   Fri Apr 29 14:38:01 2022 -0700

     list: fix a data-race around ep->rdllist

     [ Upstream commit d679ae94fdd5d3ab00c35078f5af5f37e068b03d ]

     ep_poll() first calls ep_events_available() with no lock held and checks
     if ep->rdllist is empty by list_empty_careful(), which reads
     rdllist->prev.  Thus all accesses to it need some protection to avoid
     store/load-tearing.

     Note INIT_LIST_HEAD_RCU() already has the annotation for both prev
     and next.

     Commit bf3b9f6372c4 ("epoll: Add busy poll support to epoll with socket
     fds.") added the first lockless ep_events_available(), and commit
     c5a282e9635e ("fs/epoll: reduce the scope of wq lock in epoll_wait()")
     made some ep_events_available() calls lockless and added single call under
     a lock, finally commit e59d3c64cba6 ("epoll: eliminate unnecessary lock
     for zero timeout") made the last ep_events_available() lockless.

     BUG: KCSAN: data-race in do_epoll_wait / do_epoll_wait

     write to 0xffff88810480c7d8 of 8 bytes by task 1802 on cpu 0:
      INIT_LIST_HEAD include/linux/list.h:38 [inline]
      list_splice_init include/linux/list.h:492 [inline]
      ep_start_scan fs/eventpoll.c:622 [inline]
      ep_send_events fs/eventpoll.c:1656 [inline]
      ep_poll fs/eventpoll.c:1806 [inline]
      do_epoll_wait+0x4eb/0xf40 fs/eventpoll.c:2234
      do_epoll_pwait fs/eventpoll.c:2268 [inline]
      __do_sys_epoll_pwait fs/eventpoll.c:2281 [inline]
      __se_sys_epoll_pwait+0x12b/0x240 fs/eventpoll.c:2275
      __x64_sys_epoll_pwait+0x74/0x80 fs/eventpoll.c:2275
      do_syscall_x64 arch/x86/entry/common.c:50 [inline]
      do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
      entry_SYSCALL_64_after_hwframe+0x44/0xae

     read to 0xffff88810480c7d8 of 8 bytes by task 1799 on cpu 1:
      list_empty_careful include/linux/list.h:329 [inline]
      ep_events_available fs/eventpoll.c:381 [inline]
      ep_poll fs/eventpoll.c:1797 [inline]
      do_epoll_wait+0x279/0xf40 fs/eventpoll.c:2234
      do_epoll_pwait fs/eventpoll.c:2268 [inline]
      __do_sys_epoll_pwait fs/eventpoll.c:2281 [inline]
      __se_sys_epoll_pwait+0x12b/0x240 fs/eventpoll.c:2275
      __x64_sys_epoll_pwait+0x74/0x80 fs/eventpoll.c:2275
      do_syscall_x64 arch/x86/entry/common.c:50 [inline]
      do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
      entry_SYSCALL_64_after_hwframe+0x44/0xae

     value changed: 0xffff88810480c7d0 -> 0xffff888103c15098

     Reported by Kernel Concurrency Sanitizer on:
     CPU: 1 PID: 1799 Comm: syz-fuzzer Tainted: G        W 5.17.0-rc7-syzkaller-dirty #0
     Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011

     Link: https://lkml.kernel.org/r/20220322002653.33865-3-kuniyu@amazon.co.jp
     Fixes: e59d3c64cba6 ("epoll: eliminate unnecessary lock for zero timeout")
     Fixes: c5a282e9635e ("fs/epoll: reduce the scope of wq lock in epoll_wait()")
     Fixes: bf3b9f6372c4 ("epoll: Add busy poll support to epoll with socket fds.")
     Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
     Reported-by: syzbot+bdd6e38a1ed5ee58d8bd@syzkaller.appspotmail.com
     Cc: Al Viro <viro@zeniv.linux.org.uk>, Andrew Morton <akpm@linux-foundation.org>
     Cc: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
     Cc: Kuniyuki Iwashima <kuni1840@gmail.com>
     Cc: "Soheil Hassas Yeganeh" <soheil@google.com>
     Cc: Davidlohr Bueso <dave@stgolabs.net>
     Cc: "Sridhar Samudrala" <sridhar.samudrala@intel.com>
     Cc: Alexander Duyck <alexander.h.duyck@intel.com>
     Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
     Signed-off-by: Sasha Levin <sashal@kernel.org>

  include/linux/list.h | 6 +++---
  1 file changed, 3 insertions(+), 3 deletions(-)

Reverting the above change fixes the v5.18.3 boot.

It seems the most significant byte of the "ldd -10(r3),rp" instruction in mpt_reply has been
set to 0:

     4084:       bf 80 21 18     cmpb,*<> r0,ret0,4118 <mpt_reply+0x1b8>
     4088:       08 04 02 5b     copy r4,dp
     408c:       00 00 04 00     sync
     4090:       0c 61 10 c2     ldd -10(r3),rp

Compare the IIR value (006110c2) in the crash output.

On 2022-06-09 2:13 p.m., John David Anglin wrote:
> [...]


-- 
John David Anglin  dave.anglin@bell.net



* Re: linux v5.18.3 fails to boot
  2022-06-10 15:06 ` John David Anglin
@ 2022-06-10 16:06   ` Kuniyuki Iwashima
  2022-06-10 16:49     ` John David Anglin
  0 siblings, 1 reply; 8+ messages in thread
From: Kuniyuki Iwashima @ 2022-06-10 16:06 UTC
  To: dave.anglin; +Cc: deller, kuniyu, linux-parisc

Hello,
Thanks for the heads-up!

Date:   Fri, 10 Jun 2022 11:06:24 -0400
From:   John David Anglin <dave.anglin@bell.net>
> I did a regression search. e039c0b5985999b150594126225e1ee51df7b4c9 is the first bad commit
> 
> [...]
> 
> Reverting above change fixes v5.18.3 boot.
> 
> It seems the most significant byte of the "ldd -10(r3),rp" instruction in mpt_reply has been
> set to 0:
> 
>      4084:       bf 80 21 18     cmpb,*<> r0,ret0,4118 <mpt_reply+0x1b8>
>      4088:       08 04 02 5b     copy r4,dp
>      408c:       00 00 04 00     sync
>      4090:       0c 61 10 c2     ldd -10(r3),rp
> 
> See IIR value in crash output.

The commit was added to prevent compiler optimisation from splitting
read/write operations.  I think it can lead to a change in opcodes, but it
should be safe.  So for now I'm not sure why the commit causes the boot failure.

I'm not familiar with PARISC, so this may be a stupid question, but
what exactly does `ldd` do, and which line of the function/file does it correspond to?


> On 2022-06-09 2:13 p.m., John David Anglin wrote:
> > [...]
> > Backtrace:
> >  [<0000000010bbde24>] mpt_interrupt+0x84/0xe8 [mptbase]
> >  [<000000004026dd64>] __handle_irq_event_percpu+0xc4/0x250
> >  [<000000004026df28>] handle_irq_event_percpu+0x38/0xd8
> >  [<00000000402776c4>] handle_percpu_irq+0xb4/0xf0
> >  [<000000004026c90c>] generic_handle_irq+0x5c/0x90
> >  [<00000000401a20e4>] call_on_stack+0x18/0x24
> >
> > CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.18.3+ #1
> > Hardware name: 9000/785/C8000
> > Backtrace:
> >  [<00000000401a8cd8>] show_stack+0x70/0x90
> >  [<0000000040a8e238>] dump_stack_lvl+0xd8/0x128
> >  [<0000000040a8e2bc>] dump_stack+0x34/0x48
> >  [<00000000401a8efc>] die_if_kernel+0x1e4/0x3f8
> >  [<00000000401a9af4>] handle_interruption+0x59c/0xb58
> >  [<00000000401a107c>] intr_check_sig+0x0/0x3c

Can you decode these stack traces with ./scripts/decode_stacktrace.sh?


> > Kernel panic - not syncing: Fatal exception in interrupt
> >
> > v5.18.2 with similar config is okay.  The fault seems consistent. IIR contains illegal instruction.
> >
> > Attached config.

Also, can you forward this config to me?

Best regards,
Kuniyuki


* Re: linux v5.18.3 fails to boot
  2022-06-10 16:06   ` Kuniyuki Iwashima
@ 2022-06-10 16:49     ` John David Anglin
  2022-06-10 18:18       ` John David Anglin
  0 siblings, 1 reply; 8+ messages in thread
From: John David Anglin @ 2022-06-10 16:49 UTC
  To: Kuniyuki Iwashima; +Cc: deller, kuniyu, linux-parisc

On 2022-06-10 12:06 p.m., Kuniyuki Iwashima wrote:
> Hello,
> Thanks for heads up!
>
> Date:   Fri, 10 Jun 2022 11:06:24 -0400
> From:   John David Anglin <dave.anglin@bell.net>
>> I did a regression search. e039c0b5985999b150594126225e1ee51df7b4c9 is the first bad commit
>>
>> [...]
>>
>> Reverting above change fixes v5.18.3 boot.
>>
>> It seems the most significant byte of the "ldd -10(r3),rp" instruction in mpt_reply has been
>> set to 0:
>>
>>       4084:       bf 80 21 18     cmpb,*<> r0,ret0,4118 <mpt_reply+0x1b8>
>>       4088:       08 04 02 5b     copy r4,dp
>>       408c:       00 00 04 00     sync
>>       4090:       0c 61 10 c2     ldd -10(r3),rp
>>
>> See IIR value in crash output.
> The commit was added to prevent compiler optimisation from splitting
> read/write operations.  I think it can lead in a change in opcodes but
> must be safe.  So I'm not sure why the commit causes boot failure for now.
Neither am I.
>
> I'm not familiar with PARISC and this may be a stupid question though,
> what does `ldd` exactly do? and which line is it executed in the func/file?
ldd performs a 64-bit load to register rp (r2).  It is part of mpt_reply's epilogue.
The prior "sync" instruction corresponds to the "mb()" at the end of mpt_reply.

I would have thought this code should have been write-protected.  It seems
CONFIG_STRICT_MODULE_RWX is not set in the config; only the kernel options are:

CONFIG_ARCH_HAS_STRICT_KERNEL_RWX=y
CONFIG_STRICT_KERNEL_RWX=y

I think I need to try enabling CONFIG_STRICT_MODULE_RWX.

>
>
>> On 2022-06-09 2:13 p.m., John David Anglin wrote:
>>> [...]
>>> Backtrace:
>>>   [<0000000010bbde24>] mpt_interrupt+0x84/0xe8 [mptbase]
>>>   [<000000004026dd64>] __handle_irq_event_percpu+0xc4/0x250
>>>   [<000000004026df28>] handle_irq_event_percpu+0x38/0xd8
>>>   [<00000000402776c4>] handle_percpu_irq+0xb4/0xf0
>>>   [<000000004026c90c>] generic_handle_irq+0x5c/0x90
>>>   [<00000000401a20e4>] call_on_stack+0x18/0x24
>>>
>>> CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.18.3+ #1
>>> Hardware name: 9000/785/C8000
>>> Backtrace:
>>>   [<00000000401a8cd8>] show_stack+0x70/0x90
>>>   [<0000000040a8e238>] dump_stack_lvl+0xd8/0x128
>>>   [<0000000040a8e2bc>] dump_stack+0x34/0x48
>>>   [<00000000401a8efc>] die_if_kernel+0x1e4/0x3f8
>>>   [<00000000401a9af4>] handle_interruption+0x59c/0xb58
>>>   [<00000000401a107c>] intr_check_sig+0x0/0x3c
> Can you decode these stacktraces with ./scripts/decode_stacktrace.sh ?
I don't think this is helpful as the code in mpt_reply has been corrupted.
>
>
>>> Kernel panic - not syncing: Fatal exception in interrupt
>>>
>>> v5.18.2 with similar config is okay.  The fault seems consistent. IIR contains illegal instruction.
>>>
>>> Attached config.
> Also, can you forward this config to me?
>
> Best regards,
> Kuniyuki


-- 
John David Anglin  dave.anglin@bell.net



* Re: linux v5.18.3 fails to boot
  2022-06-10 16:49     ` John David Anglin
@ 2022-06-10 18:18       ` John David Anglin
  2022-06-27  0:08         ` Helge Deller
  0 siblings, 1 reply; 8+ messages in thread
From: John David Anglin @ 2022-06-10 18:18 UTC
  To: Kuniyuki Iwashima; +Cc: deller, kuniyu, linux-parisc

[-- Attachment #1: Type: text/plain, Size: 2274 bytes --]

On 2022-06-10 12:49 p.m., John David Anglin wrote:
>> The commit was added to prevent compiler optimisation from splitting
>> read/write operations.  I think it can lead in a change in opcodes but
>> must be safe.  So I'm not sure why the commit causes boot failure for now.
> Neither am I.
>>
>> I'm not familiar with PARISC and this may be a stupid question though,
>> what does `ldd` exactly do? and which line is it executed in the func/file?
> ldd performs a 64-bit load to register rp (r2).  It is part of mpt_reply's epilogue.
> The prior "sync" instruction corresponds to the "mb()" at the end of mpt_reply.
>
> I would have thought this code should have been write protected.  It seems
> CONFIG_STRICT_MODULE_RWX is not explicitly set in config:
>
> CONFIG_ARCH_HAS_STRICT_KERNEL_RWX=y
> CONFIG_STRICT_KERNEL_RWX=y
>
> I think I need to try enabling CONFIG_STRICT_MODULE_RWX.
With CONFIG_STRICT_MODULE_RWX, the fault went away and the system boots normally.

To enable CONFIG_STRICT_MODULE_RWX, I needed to add the attached patch to the parisc Kconfig.

As far as I can tell, this only affects patch_map in the parisc backend:

static void __kprobes *patch_map(void *addr, int fixmap, unsigned long *flags,
                                  int *need_unmap)
{
         unsigned long uintaddr = (uintptr_t) addr;
         bool module = !core_kernel_text(uintaddr);
         struct page *page;

         *need_unmap = 0;
         if (module && IS_ENABLED(CONFIG_STRICT_MODULE_RWX))
                 page = vmalloc_to_page(addr);
         else if (!module && IS_ENABLED(CONFIG_STRICT_KERNEL_RWX))
                 page = virt_to_page(addr);
         else
                 return addr;

Possibly this affects the Fusion MPT base driver, but no alternative patches are applied to it:

[   29.971295] mptbase alternatives: applied 0 out of 3 patches
[   29.971295] Fusion MPT base driver 3.04.20
[   29.971295] Copyright (c) 1999-2008 LSI Corporation
[   29.971295] Fusion MPT SPI Host driver 3.04.20

Dave

-- 
John David Anglin  dave.anglin@bell.net

[-- Attachment #2: Kconfig.patch --]
[-- Type: text/plain, Size: 410 bytes --]

diff --git a/arch/parisc/Kconfig b/arch/parisc/Kconfig
index bd22578859d0..f3a2044ee402 100644
--- a/arch/parisc/Kconfig
+++ b/arch/parisc/Kconfig
@@ -10,6 +10,7 @@ config PARISC
 	select ARCH_WANT_FRAME_POINTERS
 	select ARCH_HAS_ELF_RANDOMIZE
 	select ARCH_HAS_STRICT_KERNEL_RWX
+	select ARCH_HAS_STRICT_MODULE_RWX
 	select ARCH_HAS_UBSAN_SANITIZE_ALL
 	select ARCH_HAS_PTE_SPECIAL
 	select ARCH_NO_SG_CHAIN


* Re: linux v5.18.3 fails to boot
  2022-06-10 18:18       ` John David Anglin
@ 2022-06-27  0:08         ` Helge Deller
  2022-06-27  6:15           ` Sam James
  2022-06-27 17:24           ` Kuniyuki Iwashima
  0 siblings, 2 replies; 8+ messages in thread
From: Helge Deller @ 2022-06-27  0:08 UTC
  To: John David Anglin, Kuniyuki Iwashima; +Cc: kuniyu, linux-parisc

On 6/10/22 20:18, John David Anglin wrote:
> On 2022-06-10 12:49 p.m., John David Anglin wrote:
>>> The commit was added to prevent compiler optimisation from splitting
>>> read/write operations.  I think it can lead in a change in opcodes but
>>> must be safe.  So I'm not sure why the commit causes boot failure for now.
>> Neither am I.
>>>
>>> I'm not familiar with PARISC and this may be a stupid question though,
>>> what does `ldd` exactly do? and which line is it executed in the func/file?
>> ldd performs a 64-bit load to register rp (r2).  It is part of mpt_reply's epilogue.
>> The prior "sync" instruction corresponds to the "mb()" at the end of mpt_reply.
>>
>
> This may affect the Fusion MPT base driver, but no patches are applied:
>
> [   29.971295] mptbase alternatives: applied 0 out of 3 patches
> [   29.971295] Fusion MPT base driver 3.04.20
> [   29.971295] Copyright (c) 1999-2008 LSI Corporation
> [   29.971295] Fusion MPT SPI Host driver 3.04.20

To sum it up - this issue was triggered by a combination of special circumstances:

The kernel patching code uses the altinstructions table from kernel modules to patch
in alternative assembly instructions.
To read the entries it uses a 32-bit ldw() instruction, since the table holds 32-bit values.
Because of a separate issue, this table was located at unaligned memory addresses,
so each ldw() trapped and the kernel's ldw() emulation stepped in to read the content.
Commit e8aa7b17fe41 ("parisc/unaligned: Rewrite inline assembly of emulate_ldw()")
broke that ldw() emulation, so invalid 32-bit values were read back.
This in turn triggered random memory corruption, because the kernel then patched addresses it should not have.

I just sent a patch to the parisc mailing list to fix up the ldw() handler, which
finally fixed this issue here too.

Everyone running kernel v5.18+ on parisc should apply the patch I sent:
https://patchwork.kernel.org/project/linux-parisc/patch/20220626233911.1023515-1-deller@gmx.de/

Helge

* Re: linux v5.18.3 fails to boot
  2022-06-27  0:08         ` Helge Deller
@ 2022-06-27  6:15           ` Sam James
  2022-06-27 17:24           ` Kuniyuki Iwashima
  1 sibling, 0 replies; 8+ messages in thread
From: Sam James @ 2022-06-27  6:15 UTC
  To: Helge Deller; +Cc: John David Anglin, Kuniyuki Iwashima, kuniyu, linux-parisc

[-- Attachment #1: Type: text/plain, Size: 2260 bytes --]



> On 27 Jun 2022, at 01:08, Helge Deller <deller@gmx.de> wrote:
> 
> On 6/10/22 20:18, John David Anglin wrote:
>> On 2022-06-10 12:49 p.m., John David Anglin wrote:
>>>> The commit was added to prevent compiler optimisation from splitting
>>>> read/write operations.  I think it can lead to a change in opcodes but
>>>> must be safe.  So I'm not sure why the commit causes boot failure for now.
>>> Neither am I.
>>>> 
>>>> I'm not familiar with PARISC and this may be a stupid question, but
>>>> what exactly does `ldd` do, and on which line of the function/file is it executed?
>>> ldd performs a 64-bit load to register rp (r2).  It is part of mpt_reply's epilogue.
>>> The prior "sync" instruction corresponds to the "mb()" at the end of mpt_reply.
>>> 
>> 
>> This may affect the Fusion MPT base driver, but no patches are applied:
>> 
>> [   29.971295] mptbase alternatives: applied 0 out of 3 patches
>> [   29.971295] Fusion MPT base driver 3.04.20
>> [   29.971295] Copyright (c) 1999-2008 LSI Corporation
>> [   29.971295] Fusion MPT SPI Host driver 3.04.20
> 
> To sum it up - this issue was triggered by a combination of special circumstances:
> 
> The kernel patching code uses the altinstructions table from kernel modules to patch
> in alternative assembly instructions.
> To read the entries it uses a 32-bit ldw() instruction, since the table holds 32-bit values.
> Because of a separate issue, this table was located at unaligned memory addresses,
> so each ldw() trapped and the kernel's ldw() emulation stepped in to read the content.
> Commit e8aa7b17fe41 ("parisc/unaligned: Rewrite inline assembly of emulate_ldw()")
> broke that ldw() emulation, so invalid 32-bit values were read back.
> This in turn triggered random memory corruption, because the kernel then patched addresses it should not have.
> 
> I just sent a patch to the parisc mailing list to fix up the ldw() handler, which
> finally fixed this issue here too.
> 
> Everyone running kernel v5.18+ on parisc should apply the patch I sent:
> https://patchwork.kernel.org/project/linux-parisc/patch/20220626233911.1023515-1-deller@gmx.de/
> 

Appreciate you summarising - I was just wondering about this bug earlier :)

> Helge


Best,
sam

[-- Attachment #2: Message signed with OpenPGP --]
[-- Type: application/pgp-signature, Size: 358 bytes --]

* Re: linux v5.18.3 fails to boot
  2022-06-27  0:08         ` Helge Deller
  2022-06-27  6:15           ` Sam James
@ 2022-06-27 17:24           ` Kuniyuki Iwashima
  1 sibling, 0 replies; 8+ messages in thread
From: Kuniyuki Iwashima @ 2022-06-27 17:24 UTC
  To: deller; +Cc: dave.anglin, kuniyu, kuniyu, linux-parisc

From:   Helge Deller <deller@gmx.de>
Date:   Mon, 27 Jun 2022 02:08:29 +0200
> On 6/10/22 20:18, John David Anglin wrote:
>> On 2022-06-10 12:49 p.m., John David Anglin wrote:
>>>> The commit was added to prevent compiler optimisation from splitting
>>>> read/write operations.  I think it can lead to a change in opcodes but
>>>> must be safe.  So I'm not sure why the commit causes boot failure for now.
>>> Neither am I.
>>>>
>>>> I'm not familiar with PARISC and this may be a stupid question, but
>>>> what exactly does `ldd` do, and on which line of the function/file is it executed?
>>> ldd performs a 64-bit load to register rp (r2).  It is part of mpt_reply's epilogue.
>>> The prior "sync" instruction corresponds to the "mb()" at the end of mpt_reply.
>>>
>>
>> This may affect the Fusion MPT base driver, but no patches are applied:
>>
>> [   29.971295] mptbase alternatives: applied 0 out of 3 patches
>> [   29.971295] Fusion MPT base driver 3.04.20
>> [   29.971295] Copyright (c) 1999-2008 LSI Corporation
>> [   29.971295] Fusion MPT SPI Host driver 3.04.20
> 
> To sum it up - this issue was triggered by a combination of special circumstances:
> 
> The kernel patching code uses the altinstructions table from kernel modules to patch
> in alternative assembly instructions.
> To read the entries it uses a 32-bit ldw() instruction, since the table holds 32-bit values.
> Because of a separate issue, this table was located at unaligned memory addresses,
> so each ldw() trapped and the kernel's ldw() emulation stepped in to read the content.
> Commit e8aa7b17fe41 ("parisc/unaligned: Rewrite inline assembly of emulate_ldw()")
> broke that ldw() emulation, so invalid 32-bit values were read back.
> This in turn triggered random memory corruption, because the kernel then patched addresses it should not have.
> 
> I just sent a patch to the parisc mailing list to fix up the ldw() handler, which
> finally fixed this issue here too.

Interesting!
I was wondering whether enabling CONFIG_STRICT_MODULE_RWX, which was originally off,
could have had another impact.
I appreciate your summary and fix!

Best regards,
Kuniyuki


> 
> Everyone running kernel v5.18+ on parisc should apply the patch I sent:
> https://patchwork.kernel.org/project/linux-parisc/patch/20220626233911.1023515-1-deller@gmx.de/
> 
> Helge


end of thread

Thread overview: 8+ messages
2022-06-09 18:13 linux v5.18.3 fails to boot John David Anglin
2022-06-10 15:06 ` John David Anglin
2022-06-10 16:06   ` Kuniyuki Iwashima
2022-06-10 16:49     ` John David Anglin
2022-06-10 18:18       ` John David Anglin
2022-06-27  0:08         ` Helge Deller
2022-06-27  6:15           ` Sam James
2022-06-27 17:24           ` Kuniyuki Iwashima
