* linux v5.18.3 fails to boot
@ 2022-06-09 18:13 John David Anglin
2022-06-10 15:06 ` John David Anglin
0 siblings, 1 reply; 8+ messages in thread
From: John David Anglin @ 2022-06-09 18:13 UTC (permalink / raw)
To: linux-parisc; +Cc: Helge Deller
[-- Attachment #1: Type: text/plain, Size: 2723 bytes --]
[...]
ata3: SATA link down (SStatus 0 SControl 0)
      _______________________________
     < Your System ate a SPARC! Gah! >
      -------------------------------
             \   ^__^
                 (__)\       )\/\
                  U  ||----w |
                     ||     ||
swapper/0 (pid 0): Illegal instruction (code 8)
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.18.3+ #1
Hardware name: 9000/785/C8000
YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
PSW: 00001000000001000001101100001110 Not tainted
r00-03 0000000008041b0e 000000004e8045b0 0000000010bbda78 000000004e804470
r04-07 0000000010bb5000 0000000000001440 0000000054355000 0000000000001400
r08-11 0000000055000000 000000000000000e 000000000000000f 0000000055002800
r12-15 0000000000000000 0000000055002800 0000000040b668c0 0000000040b668c0
r16-19 0000000000000001 0000000000000001 0000000051b799f0 0000000000000000
r20-23 0000000000000001 fffffffffffff5b9 0000000000000000 0000000000200000
r24-27 000000000000000c 000000000800000e 0000000054355144 0000000040b3e0c0
r28-31 0000000000010001 000000004e804620 000000004e8045b0 0000000040edd040
sr00-03 00000000000a5c00 0000000000000000 0000000000000000 00000000000a5c00
sr04-07 0000000000000000 0000000000000000 0000000000000000 0000000000000000
IASQ: 0000000000000000 0000000000000000 IAOQ: 0000000010bbd9e0 0000000010bbd9e4
IIR: 006110c2 ISR: 0000000010240000 IOR: 00000003b76dd048
CPU: 0 CR30: 0000000040edd040 CR31: ffffffffffffffff
ORIG_R28: 0000000000000000
IAOQ[0]: mpt_reply+0x130/0x4f0 [mptbase]
IAOQ[1]: mpt_reply+0x134/0x4f0 [mptbase]
RP(r2): mpt_reply+0x1c8/0x4f0 [mptbase]
Backtrace:
[<0000000010bbde24>] mpt_interrupt+0x84/0xe8 [mptbase]
[<000000004026dd64>] __handle_irq_event_percpu+0xc4/0x250
[<000000004026df28>] handle_irq_event_percpu+0x38/0xd8
[<00000000402776c4>] handle_percpu_irq+0xb4/0xf0
[<000000004026c90c>] generic_handle_irq+0x5c/0x90
[<00000000401a20e4>] call_on_stack+0x18/0x24
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.18.3+ #1
Hardware name: 9000/785/C8000
Backtrace:
[<00000000401a8cd8>] show_stack+0x70/0x90
[<0000000040a8e238>] dump_stack_lvl+0xd8/0x128
[<0000000040a8e2bc>] dump_stack+0x34/0x48
[<00000000401a8efc>] die_if_kernel+0x1e4/0x3f8
[<00000000401a9af4>] handle_interruption+0x59c/0xb58
[<00000000401a107c>] intr_check_sig+0x0/0x3c
Kernel panic - not syncing: Fatal exception in interrupt
v5.18.2 with similar config is okay. The fault seems consistent. IIR contains illegal instruction.
Attached config.
Dave
--
John David Anglin dave.anglin@bell.net
[-- Attachment #2: config-5.18.3+.gz --]
[-- Type: application/x-gzip, Size: 22714 bytes --]
* Re: linux v5.18.3 fails to boot
2022-06-09 18:13 linux v5.18.3 fails to boot John David Anglin
@ 2022-06-10 15:06 ` John David Anglin
2022-06-10 16:06 ` Kuniyuki Iwashima
0 siblings, 1 reply; 8+ messages in thread
From: John David Anglin @ 2022-06-10 15:06 UTC (permalink / raw)
To: linux-parisc; +Cc: Helge Deller, Kuniyuki Iwashima
I did a regression search. e039c0b5985999b150594126225e1ee51df7b4c9 is the first bad commit
commit e039c0b5985999b150594126225e1ee51df7b4c9
Author: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Date: Fri Apr 29 14:38:01 2022 -0700
list: fix a data-race around ep->rdllist
[ Upstream commit d679ae94fdd5d3ab00c35078f5af5f37e068b03d ]
ep_poll() first calls ep_events_available() with no lock held and checks
if ep->rdllist is empty by list_empty_careful(), which reads
rdllist->prev. Thus all accesses to it need some protection to avoid
store/load-tearing.
Note INIT_LIST_HEAD_RCU() already has the annotation for both prev
and next.
Commit bf3b9f6372c4 ("epoll: Add busy poll support to epoll with socket
fds.") added the first lockless ep_events_available(), and commit
c5a282e9635e ("fs/epoll: reduce the scope of wq lock in epoll_wait()")
made some ep_events_available() calls lockless and added single call under
a lock, finally commit e59d3c64cba6 ("epoll: eliminate unnecessary lock
for zero timeout") made the last ep_events_available() lockless.
BUG: KCSAN: data-race in do_epoll_wait / do_epoll_wait
write to 0xffff88810480c7d8 of 8 bytes by task 1802 on cpu 0:
INIT_LIST_HEAD include/linux/list.h:38 [inline]
list_splice_init include/linux/list.h:492 [inline]
ep_start_scan fs/eventpoll.c:622 [inline]
ep_send_events fs/eventpoll.c:1656 [inline]
ep_poll fs/eventpoll.c:1806 [inline]
do_epoll_wait+0x4eb/0xf40 fs/eventpoll.c:2234
do_epoll_pwait fs/eventpoll.c:2268 [inline]
__do_sys_epoll_pwait fs/eventpoll.c:2281 [inline]
__se_sys_epoll_pwait+0x12b/0x240 fs/eventpoll.c:2275
__x64_sys_epoll_pwait+0x74/0x80 fs/eventpoll.c:2275
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
read to 0xffff88810480c7d8 of 8 bytes by task 1799 on cpu 1:
list_empty_careful include/linux/list.h:329 [inline]
ep_events_available fs/eventpoll.c:381 [inline]
ep_poll fs/eventpoll.c:1797 [inline]
do_epoll_wait+0x279/0xf40 fs/eventpoll.c:2234
do_epoll_pwait fs/eventpoll.c:2268 [inline]
__do_sys_epoll_pwait fs/eventpoll.c:2281 [inline]
__se_sys_epoll_pwait+0x12b/0x240 fs/eventpoll.c:2275
__x64_sys_epoll_pwait+0x74/0x80 fs/eventpoll.c:2275
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
value changed: 0xffff88810480c7d0 -> 0xffff888103c15098
Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 1799 Comm: syz-fuzzer Tainted: G W 5.17.0-rc7-syzkaller-dirty #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Link: https://lkml.kernel.org/r/20220322002653.33865-3-kuniyu@amazon.co.jp
Fixes: e59d3c64cba6 ("epoll: eliminate unnecessary lock for zero timeout")
Fixes: c5a282e9635e ("fs/epoll: reduce the scope of wq lock in epoll_wait()")
Fixes: bf3b9f6372c4 ("epoll: Add busy poll support to epoll with socket fds.")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Reported-by: syzbot+bdd6e38a1ed5ee58d8bd@syzkaller.appspotmail.com
Cc: Al Viro <viro@zeniv.linux.org.uk>, Andrew Morton <akpm@linux-foundation.org>
Cc: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Cc: Kuniyuki Iwashima <kuni1840@gmail.com>
Cc: "Soheil Hassas Yeganeh" <soheil@google.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: "Sridhar Samudrala" <sridhar.samudrala@intel.com>
Cc: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
include/linux/list.h | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
Reverting above change fixes v5.18.3 boot.
It seems the most significant byte of the "ldd -10(r3),rp" instruction in mpt_reply has been
set to 0:
4084: bf 80 21 18 cmpb,*<> r0,ret0,4118 <mpt_reply+0x1b8>
4088: 08 04 02 5b copy r4,dp
408c: 00 00 04 00 sync
4090: 0c 61 10 c2 ldd -10(r3),rp
See IIR value in crash output.
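
The comparison can be made concrete with a small C sketch. It XORs the instruction word from the disassembly (0x0c6110c2) with the IIR value from the crash output (0x006110c2). The helper names are made up for illustration and are not kernel functions.

```c
#include <stdint.h>

/* Bits that differ between the expected and the observed instruction word. */
static uint32_t insn_diff(uint32_t expected, uint32_t observed)
{
	return expected ^ observed;
}

/*
 * Index of the single differing byte, counted from the most significant
 * byte (0) to the least significant (3); -1 if zero or several bytes differ.
 */
static int single_differing_byte(uint32_t expected, uint32_t observed)
{
	uint32_t diff = insn_diff(expected, observed);
	int found = -1;

	for (int i = 0; i < 4; i++) {
		uint32_t byte = (diff >> (24 - 8 * i)) & 0xffu;

		if (!byte)
			continue;
		if (found >= 0)
			return -1;	/* more than one byte differs */
		found = i;
	}
	return found;
}
```

With the two values above, insn_diff() yields 0x0c000000 and single_differing_byte() returns 0, i.e. only the most significant byte of the word differs.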
--
John David Anglin dave.anglin@bell.net
* Re: linux v5.18.3 fails to boot
2022-06-10 15:06 ` John David Anglin
@ 2022-06-10 16:06 ` Kuniyuki Iwashima
2022-06-10 16:49 ` John David Anglin
0 siblings, 1 reply; 8+ messages in thread
From: Kuniyuki Iwashima @ 2022-06-10 16:06 UTC (permalink / raw)
To: dave.anglin; +Cc: deller, kuniyu, linux-parisc
Hello,
Thanks for the heads-up!
Date: Fri, 10 Jun 2022 11:06:24 -0400
From: John David Anglin <dave.anglin@bell.net>
> I did a regression search. e039c0b5985999b150594126225e1ee51df7b4c9 is the first bad commit
>
> commit e039c0b5985999b150594126225e1ee51df7b4c9
> Author: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
> Date: Fri Apr 29 14:38:01 2022 -0700
>
> list: fix a data-race around ep->rdllist
>
> [ Upstream commit d679ae94fdd5d3ab00c35078f5af5f37e068b03d ]
>
> ep_poll() first calls ep_events_available() with no lock held and checks
> if ep->rdllist is empty by list_empty_careful(), which reads
> rdllist->prev. Thus all accesses to it need some protection to avoid
> store/load-tearing.
>
> Note INIT_LIST_HEAD_RCU() already has the annotation for both prev
> and next.
>
> Commit bf3b9f6372c4 ("epoll: Add busy poll support to epoll with socket
> fds.") added the first lockless ep_events_available(), and commit
> c5a282e9635e ("fs/epoll: reduce the scope of wq lock in epoll_wait()")
> made some ep_events_available() calls lockless and added single call under
> a lock, finally commit e59d3c64cba6 ("epoll: eliminate unnecessary lock
> for zero timeout") made the last ep_events_available() lockless.
>
> BUG: KCSAN: data-race in do_epoll_wait / do_epoll_wait
>
> write to 0xffff88810480c7d8 of 8 bytes by task 1802 on cpu 0:
> INIT_LIST_HEAD include/linux/list.h:38 [inline]
> list_splice_init include/linux/list.h:492 [inline]
> ep_start_scan fs/eventpoll.c:622 [inline]
> ep_send_events fs/eventpoll.c:1656 [inline]
> ep_poll fs/eventpoll.c:1806 [inline]
> do_epoll_wait+0x4eb/0xf40 fs/eventpoll.c:2234
> do_epoll_pwait fs/eventpoll.c:2268 [inline]
> __do_sys_epoll_pwait fs/eventpoll.c:2281 [inline]
> __se_sys_epoll_pwait+0x12b/0x240 fs/eventpoll.c:2275
> __x64_sys_epoll_pwait+0x74/0x80 fs/eventpoll.c:2275
> do_syscall_x64 arch/x86/entry/common.c:50 [inline]
> do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
> entry_SYSCALL_64_after_hwframe+0x44/0xae
>
> read to 0xffff88810480c7d8 of 8 bytes by task 1799 on cpu 1:
> list_empty_careful include/linux/list.h:329 [inline]
> ep_events_available fs/eventpoll.c:381 [inline]
> ep_poll fs/eventpoll.c:1797 [inline]
> do_epoll_wait+0x279/0xf40 fs/eventpoll.c:2234
> do_epoll_pwait fs/eventpoll.c:2268 [inline]
> __do_sys_epoll_pwait fs/eventpoll.c:2281 [inline]
> __se_sys_epoll_pwait+0x12b/0x240 fs/eventpoll.c:2275
> __x64_sys_epoll_pwait+0x74/0x80 fs/eventpoll.c:2275
> do_syscall_x64 arch/x86/entry/common.c:50 [inline]
> do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
> entry_SYSCALL_64_after_hwframe+0x44/0xae
>
> value changed: 0xffff88810480c7d0 -> 0xffff888103c15098
>
> Reported by Kernel Concurrency Sanitizer on:
> CPU: 1 PID: 1799 Comm: syz-fuzzer Tainted: G W 5.17.0-rc7-syzkaller-dirty #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
>
> Link: https://lkml.kernel.org/r/20220322002653.33865-3-kuniyu@amazon.co.jp
> Fixes: e59d3c64cba6 ("epoll: eliminate unnecessary lock for zero timeout")
> Fixes: c5a282e9635e ("fs/epoll: reduce the scope of wq lock in epoll_wait()")
> Fixes: bf3b9f6372c4 ("epoll: Add busy poll support to epoll with socket fds.")
> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
> Reported-by: syzbot+bdd6e38a1ed5ee58d8bd@syzkaller.appspotmail.com
> Cc: Al Viro <viro@zeniv.linux.org.uk>, Andrew Morton <akpm@linux-foundation.org>
> Cc: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
> Cc: Kuniyuki Iwashima <kuni1840@gmail.com>
> Cc: "Soheil Hassas Yeganeh" <soheil@google.com>
> Cc: Davidlohr Bueso <dave@stgolabs.net>
> Cc: "Sridhar Samudrala" <sridhar.samudrala@intel.com>
> Cc: Alexander Duyck <alexander.h.duyck@intel.com>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> Signed-off-by: Sasha Levin <sashal@kernel.org>
>
> include/linux/list.h | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> Reverting above change fixes v5.18.3 boot.
>
> It seems the most significant byte of the "ldd -10(r3),rp" instruction in mpt_reply has been
> set to 0:
>
> 4084: bf 80 21 18 cmpb,*<> r0,ret0,4118 <mpt_reply+0x1b8>
> 4088: 08 04 02 5b copy r4,dp
> 408c: 00 00 04 00 sync
> 4090: 0c 61 10 c2 ldd -10(r3),rp
>
> See IIR value in crash output.
The commit was added to prevent compiler optimisation from splitting
read/write operations. I think it can lead to a change in opcodes, but it
should be safe. So for now I'm not sure why the commit causes the boot failure.
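
To illustrate what "preventing the compiler from splitting read/write operations" means, here is a minimal userspace sketch. It assumes, based on the commit message and the KCSAN trace quoted above, that the annotated accesses are the ->prev store in INIT_LIST_HEAD() and the ->prev load in list_empty_careful(). The SKETCH_* macros and sketch_* functions are simplified stand-ins, not the kernel's definitions, and the sketch omits the memory barriers the real helpers use.

```c
#include <stdbool.h>

struct list_head {
	struct list_head *next, *prev;
};

/*
 * Simplified stand-ins for READ_ONCE()/WRITE_ONCE(): the volatile access
 * asks the compiler to emit a single load or store for a naturally
 * aligned, pointer-sized object instead of splitting or repeating it.
 * __typeof__ is a GNU C extension (gcc, clang).
 */
#define SKETCH_READ_ONCE(x)       (*(volatile __typeof__(x) *)&(x))
#define SKETCH_WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

static void sketch_init_list_head(struct list_head *list)
{
	SKETCH_WRITE_ONCE(list->next, list);
	SKETCH_WRITE_ONCE(list->prev, list);
}

/* Lockless emptiness check that reads both pointers with annotated loads. */
static bool sketch_list_empty_careful(const struct list_head *head)
{
	struct list_head *next = SKETCH_READ_ONCE(head->next);

	return next == head && next == SKETCH_READ_ONCE(head->prev);
}
```

The annotation changes which load/store instructions the compiler may emit, but not what the C code means, which is why such a change is expected to be safe.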
I'm not familiar with PARISC, and this may be a stupid question, but
what exactly does `ldd` do, and which line of the function/file does it correspond to?
> On 2022-06-09 2:13 p.m., John David Anglin wrote:
> > [...]
> > ata3: SATA link down (SStatus 0 SControl 0)
> > _______________________________
> > < Your System ate a SPARC! Gah! >
> > -------------------------------
> > \ ^__^
> > (__)\ )\/\
> > U ||----w |
> > || ||
> > swapper/0 (pid 0): Illegal instruction (code 8)
> > CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.18.3+ #1
> > Hardware name: 9000/785/C8000
> >
> > YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
> > PSW: 00001000000001000001101100001110 Not tainted
> > r00-03 0000000008041b0e 000000004e8045b0 0000000010bbda78 000000004e804470
> > r04-07 0000000010bb5000 0000000000001440 0000000054355000 0000000000001400
> > r08-11 0000000055000000 000000000000000e 000000000000000f 0000000055002800
> > r12-15 0000000000000000 0000000055002800 0000000040b668c0 0000000040b668c0
> > r16-19 0000000000000001 0000000000000001 0000000051b799f0 0000000000000000
> > r20-23 0000000000000001 fffffffffffff5b9 0000000000000000 0000000000200000
> > r24-27 000000000000000c 000000000800000e 0000000054355144 0000000040b3e0c0
> > r28-31 0000000000010001 000000004e804620 000000004e8045b0 0000000040edd040
> > sr00-03 00000000000a5c00 0000000000000000 0000000000000000 00000000000a5c00
> > sr04-07 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> >
> > IASQ: 0000000000000000 0000000000000000 IAOQ: 0000000010bbd9e0 0000000010bbd9e4
> > IIR: 006110c2 ISR: 0000000010240000 IOR: 00000003b76dd048
> > CPU: 0 CR30: 0000000040edd040 CR31: ffffffffffffffff
> > ORIG_R28: 0000000000000000
> > IAOQ[0]: mpt_reply+0x130/0x4f0 [mptbase]
> > IAOQ[1]: mpt_reply+0x134/0x4f0 [mptbase]
> > RP(r2): mpt_reply+0x1c8/0x4f0 [mptbase]
> > Backtrace:
> > [<0000000010bbde24>] mpt_interrupt+0x84/0xe8 [mptbase]
> > [<000000004026dd64>] __handle_irq_event_percpu+0xc4/0x250
> > [<000000004026df28>] handle_irq_event_percpu+0x38/0xd8
> > [<00000000402776c4>] handle_percpu_irq+0xb4/0xf0
> > [<000000004026c90c>] generic_handle_irq+0x5c/0x90
> > [<00000000401a20e4>] call_on_stack+0x18/0x24
> >
> > CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.18.3+ #1
> > Hardware name: 9000/785/C8000
> > Backtrace:
> > [<00000000401a8cd8>] show_stack+0x70/0x90
> > [<0000000040a8e238>] dump_stack_lvl+0xd8/0x128
> > [<0000000040a8e2bc>] dump_stack+0x34/0x48
> > [<00000000401a8efc>] die_if_kernel+0x1e4/0x3f8
> > [<00000000401a9af4>] handle_interruption+0x59c/0xb58
> > [<00000000401a107c>] intr_check_sig+0x0/0x3c
Can you decode these stacktraces with ./scripts/decode_stacktrace.sh ?
> > Kernel panic - not syncing: Fatal exception in interrupt
> >
> > v5.18.2 with similar config is okay. The fault seems consistent. IIR contains illegal instruction.
> >
> > Attached config.
Also, can you forward this config to me?
Best regards,
Kuniyuki
* Re: linux v5.18.3 fails to boot
2022-06-10 16:06 ` Kuniyuki Iwashima
@ 2022-06-10 16:49 ` John David Anglin
2022-06-10 18:18 ` John David Anglin
0 siblings, 1 reply; 8+ messages in thread
From: John David Anglin @ 2022-06-10 16:49 UTC (permalink / raw)
To: Kuniyuki Iwashima; +Cc: deller, kuniyu, linux-parisc
On 2022-06-10 12:06 p.m., Kuniyuki Iwashima wrote:
> Hello,
> Thanks for the heads-up!
>
> Date: Fri, 10 Jun 2022 11:06:24 -0400
> From: John David Anglin <dave.anglin@bell.net>
>> I did a regression search. e039c0b5985999b150594126225e1ee51df7b4c9 is the first bad commit
>>
>> commit e039c0b5985999b150594126225e1ee51df7b4c9
>> Author: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
>> Date: Fri Apr 29 14:38:01 2022 -0700
>>
>> list: fix a data-race around ep->rdllist
>>
>> [ Upstream commit d679ae94fdd5d3ab00c35078f5af5f37e068b03d ]
>>
>> ep_poll() first calls ep_events_available() with no lock held and checks
>> if ep->rdllist is empty by list_empty_careful(), which reads
>> rdllist->prev. Thus all accesses to it need some protection to avoid
>> store/load-tearing.
>>
>> Note INIT_LIST_HEAD_RCU() already has the annotation for both prev
>> and next.
>>
>> Commit bf3b9f6372c4 ("epoll: Add busy poll support to epoll with socket
>> fds.") added the first lockless ep_events_available(), and commit
>> c5a282e9635e ("fs/epoll: reduce the scope of wq lock in epoll_wait()")
>> made some ep_events_available() calls lockless and added single call under
>> a lock, finally commit e59d3c64cba6 ("epoll: eliminate unnecessary lock
>> for zero timeout") made the last ep_events_available() lockless.
>>
>> BUG: KCSAN: data-race in do_epoll_wait / do_epoll_wait
>>
>> write to 0xffff88810480c7d8 of 8 bytes by task 1802 on cpu 0:
>> INIT_LIST_HEAD include/linux/list.h:38 [inline]
>> list_splice_init include/linux/list.h:492 [inline]
>> ep_start_scan fs/eventpoll.c:622 [inline]
>> ep_send_events fs/eventpoll.c:1656 [inline]
>> ep_poll fs/eventpoll.c:1806 [inline]
>> do_epoll_wait+0x4eb/0xf40 fs/eventpoll.c:2234
>> do_epoll_pwait fs/eventpoll.c:2268 [inline]
>> __do_sys_epoll_pwait fs/eventpoll.c:2281 [inline]
>> __se_sys_epoll_pwait+0x12b/0x240 fs/eventpoll.c:2275
>> __x64_sys_epoll_pwait+0x74/0x80 fs/eventpoll.c:2275
>> do_syscall_x64 arch/x86/entry/common.c:50 [inline]
>> do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
>> entry_SYSCALL_64_after_hwframe+0x44/0xae
>>
>> read to 0xffff88810480c7d8 of 8 bytes by task 1799 on cpu 1:
>> list_empty_careful include/linux/list.h:329 [inline]
>> ep_events_available fs/eventpoll.c:381 [inline]
>> ep_poll fs/eventpoll.c:1797 [inline]
>> do_epoll_wait+0x279/0xf40 fs/eventpoll.c:2234
>> do_epoll_pwait fs/eventpoll.c:2268 [inline]
>> __do_sys_epoll_pwait fs/eventpoll.c:2281 [inline]
>> __se_sys_epoll_pwait+0x12b/0x240 fs/eventpoll.c:2275
>> __x64_sys_epoll_pwait+0x74/0x80 fs/eventpoll.c:2275
>> do_syscall_x64 arch/x86/entry/common.c:50 [inline]
>> do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
>> entry_SYSCALL_64_after_hwframe+0x44/0xae
>>
>> value changed: 0xffff88810480c7d0 -> 0xffff888103c15098
>>
>> Reported by Kernel Concurrency Sanitizer on:
>> CPU: 1 PID: 1799 Comm: syz-fuzzer Tainted: G W 5.17.0-rc7-syzkaller-dirty #0
>> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
>>
>> Link: https://lkml.kernel.org/r/20220322002653.33865-3-kuniyu@amazon.co.jp
>> Fixes: e59d3c64cba6 ("epoll: eliminate unnecessary lock for zero timeout")
>> Fixes: c5a282e9635e ("fs/epoll: reduce the scope of wq lock in epoll_wait()")
>> Fixes: bf3b9f6372c4 ("epoll: Add busy poll support to epoll with socket fds.")
>> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
>> Reported-by: syzbot+bdd6e38a1ed5ee58d8bd@syzkaller.appspotmail.com
>> Cc: Al Viro <viro@zeniv.linux.org.uk>, Andrew Morton <akpm@linux-foundation.org>
>> Cc: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
>> Cc: Kuniyuki Iwashima <kuni1840@gmail.com>
>> Cc: "Soheil Hassas Yeganeh" <soheil@google.com>
>> Cc: Davidlohr Bueso <dave@stgolabs.net>
>> Cc: "Sridhar Samudrala" <sridhar.samudrala@intel.com>
>> Cc: Alexander Duyck <alexander.h.duyck@intel.com>
>> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
>> Signed-off-by: Sasha Levin <sashal@kernel.org>
>>
>> include/linux/list.h | 6 +++---
>> 1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> Reverting above change fixes v5.18.3 boot.
>>
>> It seems the most significant byte of the "ldd -10(r3),rp" instruction in mpt_reply has been
>> set to 0:
>>
>> 4084: bf 80 21 18 cmpb,*<> r0,ret0,4118 <mpt_reply+0x1b8>
>> 4088: 08 04 02 5b copy r4,dp
>> 408c: 00 00 04 00 sync
>> 4090: 0c 61 10 c2 ldd -10(r3),rp
>>
>> See IIR value in crash output.
> The commit was added to prevent compiler optimisation from splitting
> read/write operations. I think it can lead to a change in opcodes, but it
> should be safe. So for now I'm not sure why the commit causes the boot failure.
Neither am I.
>
> I'm not familiar with PARISC, and this may be a stupid question, but
> what exactly does `ldd` do, and which line of the function/file does it correspond to?
ldd performs a 64-bit load to register rp (r2). It is part of mpt_reply's epilogue.
The prior "sync" instruction corresponds to the "mb()" at the end of mpt_reply.
I would have thought this code should have been write protected. It seems
CONFIG_STRICT_MODULE_RWX is not explicitly set in config:
CONFIG_ARCH_HAS_STRICT_KERNEL_RWX=y
CONFIG_STRICT_KERNEL_RWX=y
I think I need to try enabling CONFIG_STRICT_MODULE_RWX.
>
>
>> On 2022-06-09 2:13 p.m., John David Anglin wrote:
>>> [...]
>>> ata3: SATA link down (SStatus 0 SControl 0)
>>> _______________________________
>>> < Your System ate a SPARC! Gah! >
>>> -------------------------------
>>> \ ^__^
>>> (__)\ )\/\
>>> U ||----w |
>>> || ||
>>> swapper/0 (pid 0): Illegal instruction (code 8)
>>> CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.18.3+ #1
>>> Hardware name: 9000/785/C8000
>>>
>>> YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
>>> PSW: 00001000000001000001101100001110 Not tainted
>>> r00-03 0000000008041b0e 000000004e8045b0 0000000010bbda78 000000004e804470
>>> r04-07 0000000010bb5000 0000000000001440 0000000054355000 0000000000001400
>>> r08-11 0000000055000000 000000000000000e 000000000000000f 0000000055002800
>>> r12-15 0000000000000000 0000000055002800 0000000040b668c0 0000000040b668c0
>>> r16-19 0000000000000001 0000000000000001 0000000051b799f0 0000000000000000
>>> r20-23 0000000000000001 fffffffffffff5b9 0000000000000000 0000000000200000
>>> r24-27 000000000000000c 000000000800000e 0000000054355144 0000000040b3e0c0
>>> r28-31 0000000000010001 000000004e804620 000000004e8045b0 0000000040edd040
>>> sr00-03 00000000000a5c00 0000000000000000 0000000000000000 00000000000a5c00
>>> sr04-07 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>>>
>>> IASQ: 0000000000000000 0000000000000000 IAOQ: 0000000010bbd9e0 0000000010bbd9e4
>>> IIR: 006110c2 ISR: 0000000010240000 IOR: 00000003b76dd048
>>> CPU: 0 CR30: 0000000040edd040 CR31: ffffffffffffffff
>>> ORIG_R28: 0000000000000000
>>> IAOQ[0]: mpt_reply+0x130/0x4f0 [mptbase]
>>> IAOQ[1]: mpt_reply+0x134/0x4f0 [mptbase]
>>> RP(r2): mpt_reply+0x1c8/0x4f0 [mptbase]
>>> Backtrace:
>>> [<0000000010bbde24>] mpt_interrupt+0x84/0xe8 [mptbase]
>>> [<000000004026dd64>] __handle_irq_event_percpu+0xc4/0x250
>>> [<000000004026df28>] handle_irq_event_percpu+0x38/0xd8
>>> [<00000000402776c4>] handle_percpu_irq+0xb4/0xf0
>>> [<000000004026c90c>] generic_handle_irq+0x5c/0x90
>>> [<00000000401a20e4>] call_on_stack+0x18/0x24
>>>
>>> CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.18.3+ #1
>>> Hardware name: 9000/785/C8000
>>> Backtrace:
>>> [<00000000401a8cd8>] show_stack+0x70/0x90
>>> [<0000000040a8e238>] dump_stack_lvl+0xd8/0x128
>>> [<0000000040a8e2bc>] dump_stack+0x34/0x48
>>> [<00000000401a8efc>] die_if_kernel+0x1e4/0x3f8
>>> [<00000000401a9af4>] handle_interruption+0x59c/0xb58
>>> [<00000000401a107c>] intr_check_sig+0x0/0x3c
> Can you decode these stacktraces with ./scripts/decode_stacktrace.sh ?
I don't think this is helpful as the code in mpt_reply has been corrupted.
>
>
>>> Kernel panic - not syncing: Fatal exception in interrupt
>>>
>>> v5.18.2 with similar config is okay. The fault seems consistent. IIR contains illegal instruction.
>>>
>>> Attached config.
> Also, can you forward this config to me?
>
> Best regards,
> Kuniyuki
--
John David Anglin dave.anglin@bell.net
* Re: linux v5.18.3 fails to boot
2022-06-10 16:49 ` John David Anglin
@ 2022-06-10 18:18 ` John David Anglin
2022-06-27 0:08 ` Helge Deller
0 siblings, 1 reply; 8+ messages in thread
From: John David Anglin @ 2022-06-10 18:18 UTC (permalink / raw)
To: Kuniyuki Iwashima; +Cc: deller, kuniyu, linux-parisc
[-- Attachment #1: Type: text/plain, Size: 2274 bytes --]
On 2022-06-10 12:49 p.m., John David Anglin wrote:
>> The commit was added to prevent compiler optimisation from splitting
>> read/write operations. I think it can lead to a change in opcodes, but it
>> should be safe. So for now I'm not sure why the commit causes the boot failure.
> Neither am I.
>>
>> I'm not familiar with PARISC, and this may be a stupid question, but
>> what exactly does `ldd` do, and which line of the function/file does it correspond to?
> ldd performs a 64-bit load to register rp (r2). It is part of mpt_reply's epilogue.
> The prior "sync" instruction corresponds to the "mb()" at the end of mpt_reply.
>
> I would have thought this code should have been write protected. It seems
> CONFIG_STRICT_MODULE_RWX is not explicitly set in config:
>
> CONFIG_ARCH_HAS_STRICT_KERNEL_RWX=y
> CONFIG_STRICT_KERNEL_RWX=y
>
> I think I need to try enabling CONFIG_STRICT_MODULE_RWX.
With CONFIG_STRICT_MODULE_RWX, the fault went away and the system boots normally.
To enable CONFIG_STRICT_MODULE_RWX, I needed to apply the attached patch to Kconfig.
As far as I can tell, this only affects patch_map in the parisc backend:
static void __kprobes *patch_map(void *addr, int fixmap, unsigned long *flags,
				 int *need_unmap)
{
	unsigned long uintaddr = (uintptr_t) addr;
	bool module = !core_kernel_text(uintaddr);
	struct page *page;

	*need_unmap = 0;
	if (module && IS_ENABLED(CONFIG_STRICT_MODULE_RWX))
		page = vmalloc_to_page(addr);
	else if (!module && IS_ENABLED(CONFIG_STRICT_KERNEL_RWX))
		page = virt_to_page(addr);
	else
		return addr;
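
The effect of the config option on this excerpt can be restated as a small standalone sketch. The enum and the helper are hypothetical and only mirror the if/else chain above. The three booleans stand in for !core_kernel_text(addr) and the two IS_ENABLED() checks. What the function does with the page afterwards is not shown in the excerpt and is not modelled here.

```c
#include <stdbool.h>

enum patch_lookup {
	PATCH_IN_PLACE,		/* "return addr": address used unchanged */
	PATCH_VMALLOC_PAGE,	/* module text: vmalloc_to_page(addr) */
	PATCH_LINEAR_PAGE,	/* core kernel text: virt_to_page(addr) */
};

/* Hypothetical helper mirroring the branch selection in patch_map(). */
static enum patch_lookup choose_patch_lookup(bool module,
					     bool strict_module_rwx,
					     bool strict_kernel_rwx)
{
	if (module && strict_module_rwx)
		return PATCH_VMALLOC_PAGE;
	if (!module && strict_kernel_rwx)
		return PATCH_LINEAR_PAGE;
	return PATCH_IN_PLACE;
}
```

With the originally reported config (CONFIG_STRICT_KERNEL_RWX=y, CONFIG_STRICT_MODULE_RWX unset), a module address takes the PATCH_IN_PLACE branch. With the Kconfig patch applied and the option enabled, it takes the PATCH_VMALLOC_PAGE branch.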
Possibly this affects the Fusion MPT base driver, but no patches are applied:
[ 29.971295] mptbase alternatives: applied 0 out of 3 patches
[ 29.971295] Fusion MPT base driver 3.04.20
[ 29.971295] Copyright (c) 1999-2008 LSI Corporation
[ 29.971295] Fusion MPT SPI Host driver 3.04.20
Dave
--
John David Anglin dave.anglin@bell.net
[-- Attachment #2: Kconfig.patch --]
[-- Type: text/plain, Size: 410 bytes --]
diff --git a/arch/parisc/Kconfig b/arch/parisc/Kconfig
index bd22578859d0..f3a2044ee402 100644
--- a/arch/parisc/Kconfig
+++ b/arch/parisc/Kconfig
@@ -10,6 +10,7 @@ config PARISC
 	select ARCH_WANT_FRAME_POINTERS
 	select ARCH_HAS_ELF_RANDOMIZE
 	select ARCH_HAS_STRICT_KERNEL_RWX
+	select ARCH_HAS_STRICT_MODULE_RWX
 	select ARCH_HAS_UBSAN_SANITIZE_ALL
 	select ARCH_HAS_PTE_SPECIAL
 	select ARCH_NO_SG_CHAIN
* Re: linux v5.18.3 fails to boot
2022-06-10 18:18 ` John David Anglin
@ 2022-06-27 0:08 ` Helge Deller
2022-06-27 6:15 ` Sam James
2022-06-27 17:24 ` Kuniyuki Iwashima
0 siblings, 2 replies; 8+ messages in thread
From: Helge Deller @ 2022-06-27 0:08 UTC (permalink / raw)
To: John David Anglin, Kuniyuki Iwashima; +Cc: kuniyu, linux-parisc
On 6/10/22 20:18, John David Anglin wrote:
> On 2022-06-10 12:49 p.m., John David Anglin wrote:
>>> The commit was added to prevent compiler optimisation from splitting
>>> read/write operations. I think it can lead to a change in opcodes, but it
>>> should be safe. So for now I'm not sure why the commit causes the boot failure.
>> Neither am I.
>>>
>>> I'm not familiar with PARISC, and this may be a stupid question, but
>>> what exactly does `ldd` do, and which line of the function/file does it correspond to?
>> ldd performs a 64-bit load to register rp (r2). It is part of mpt_reply's epilogue.
>> The prior "sync" instruction corresponds to the "mb()" at the end of mpt_reply.
>>
>
> Possibly this affects the Fusion MPT base driver, but no patches are applied:
>
> [ 29.971295] mptbase alternatives: applied 0 out of 3 patches
> [ 29.971295] Fusion MPT base driver 3.04.20
> [ 29.971295] Copyright (c) 1999-2008 LSI Corporation
> [ 29.971295] Fusion MPT SPI Host driver 3.04.20
To sum it up: this issue was triggered by a combination of special circumstances.
The kernel patching code uses the altinstructions table from kernel modules to patch
in alternative assembly instructions.
To read the entries it uses a 32-bit ldw() instruction, since the table holds 32-bit values.
Because of another issue, this table was located at unaligned memory addresses,
so the kernel's ldw() emulation stepped in to read the content.
Commit e8aa7b17fe41 ("parisc/unaligned: Rewrite inline assembly of emulate_ldw()")
broke the ldw() emulation, and as a result invalid 32-bit values were read back.
This then triggered random memory corruption, because the kernel patched addresses which it shouldn't have.
I just sent a patch to the parisc mailing list to fix the ldw() handler, which
finally fixed this issue here too.
Everyone who runs kernel v5.18+ on parisc should apply the patch I sent:
https://patchwork.kernel.org/project/linux-parisc/patch/20220626233911.1023515-1-deller@gmx.de/
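
For readers unfamiliar with unaligned-access emulation: the handler has to return the same 32-bit value that an aligned load would have produced. A portable C sketch of that contract follows, assuming big-endian byte order as on parisc. The helper name is made up for illustration. The kernel's emulate_ldw() is written in inline assembly and is not reproduced here.

```c
#include <stdint.h>

/*
 * Byte-wise read of a big-endian 32-bit value; valid for any alignment
 * of p. Hypothetical helper, not the kernel's emulate_ldw().
 */
static uint32_t load_be32_unaligned(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) |
	       ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8)  |
	        (uint32_t)p[3];
}
```

If such a routine returns a wrong value for a table entry, any address derived from that entry is wrong as well, which matches the kind of stray patching described above.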
Helge
* Re: linux v5.18.3 fails to boot
2022-06-27 0:08 ` Helge Deller
@ 2022-06-27 6:15 ` Sam James
2022-06-27 17:24 ` Kuniyuki Iwashima
1 sibling, 0 replies; 8+ messages in thread
From: Sam James @ 2022-06-27 6:15 UTC (permalink / raw)
To: Helge Deller; +Cc: John David Anglin, Kuniyuki Iwashima, kuniyu, linux-parisc
[-- Attachment #1: Type: text/plain, Size: 2260 bytes --]
> On 27 Jun 2022, at 01:08, Helge Deller <deller@gmx.de> wrote:
>
> On 6/10/22 20:18, John David Anglin wrote:
>> On 2022-06-10 12:49 p.m., John David Anglin wrote:
>>>> The commit was added to prevent compiler optimisation from splitting
>>>> read/write operations. I think it can lead to a change in opcodes, but it
>>>> should be safe. So for now I'm not sure why the commit causes the boot failure.
>>> Neither am I.
>>>>
>>>> I'm not familiar with PARISC, and this may be a stupid question, but
>>>> what exactly does `ldd` do, and which line of the function/file does it correspond to?
>>> ldd performs a 64-bit load to register rp (r2). It is part of mpt_reply's epilogue.
>>> The prior "sync" instruction corresponds to the "mb()" at the end of mpt_reply.
>>>
>>
>> Possibly, this might affect Fusion MPT base driver but no patches are applied:
>>
>> [ 29.971295] mptbase alternatives: applied 0 out of 3 patches
>> [ 29.971295] Fusion MPT base driver 3.04.20
>> [ 29.971295] Copyright (c) 1999-2008 LSI Corporation
>> [ 29.971295] Fusion MPT SPI Host driver 3.04.20
>
> To sum it up - this issue was triggered by a few special situations:
>
> The kernel patching code uses the altinstructions table from kernel modules to patch
> in alternative assembly instructions.
> To read the entries it uses a 32-bit ldw() instruction since the table holds 32-bit values.
> Because of another issue this table was located at unaligned memory addresses.
> That's why then the kernel ldw() emulation jumped in and read the content.
> Commit e8aa7b17fe41 ("parisc/unaligned: Rewrite inline assembly of emulate_ldw()")
> broke the ldw() emulation and as such invalid 32-bit values were read back.
> This then triggered random memory corruption, because the kernel then patched addresses which it shouldn't.
>
> I just sent a patch to the parisc mailing list to fix up the ldw() handler, which
> finally fixed this issue here too.
>
> Everyone who runs kernel v5.18+ on parisc should better apply the patch I sent:
> https://patchwork.kernel.org/project/linux-parisc/patch/20220626233911.1023515-1-deller@gmx.de/
>
Appreciate you summarising - I was just wondering about this bug earlier :)
> Helge
Best,
sam
* Re: linux v5.18.3 fails to boot
2022-06-27 0:08 ` Helge Deller
2022-06-27 6:15 ` Sam James
@ 2022-06-27 17:24 ` Kuniyuki Iwashima
1 sibling, 0 replies; 8+ messages in thread
From: Kuniyuki Iwashima @ 2022-06-27 17:24 UTC (permalink / raw)
To: deller; +Cc: dave.anglin, kuniyu, kuniyu, linux-parisc
From: Helge Deller <deller@gmx.de>
Date: Mon, 27 Jun 2022 02:08:29 +0200
> On 6/10/22 20:18, John David Anglin wrote:
>> On 2022-06-10 12:49 p.m., John David Anglin wrote:
>>>> The commit was added to prevent compiler optimisation from splitting
>>>> read/write operations. I think it can lead in a change in opcodes but
>>>> must be safe. So I'm not sure why the commit causes boot failure for now.
>>> Neither am I.
>>>>
>>>> I'm not familiar with PARISC and this may be a stupid question though,
>>>> what does `ldd` exactly do? and which line is it executed in the func/file?
>>> ldd performs a 64-bit load to register rp (r2). It is part of mpt_reply's epilogue.
>>> The prior "sync" instruction corresponds to the "mb()" at the end of mpt_reply.
>>>
>>
>> Possibly, this might affect Fusion MPT base driver but no patches are applied:
>>
>> [ 29.971295] mptbase alternatives: applied 0 out of 3 patches
>> [ 29.971295] Fusion MPT base driver 3.04.20
>> [ 29.971295] Copyright (c) 1999-2008 LSI Corporation
>> [ 29.971295] Fusion MPT SPI Host driver 3.04.20
>
> To sum it up - this issue was triggered by a few special situations:
>
> The kernel patching code uses the altinstructions table from kernel modules to patch
> in alternative assembly instructions.
> To read the entries it uses a 32-bit ldw() instruction since the table holds 32-bit values.
> Because of another issue this table was located at unaligned memory addresses.
> That's why then the kernel ldw() emulation jumped in and read the content.
> Commit e8aa7b17fe41 ("parisc/unaligned: Rewrite inline assembly of emulate_ldw()")
> broke the ldw() emulation and as such invalid 32-bit values were read back.
> This then triggered random memory corruption, because the kernel then patched addresses which it shouldn't.
>
> I just sent a patch to the parisc mailing list to fix up the ldw() handler, which
> finally fixed this issue here too.
Interesting!
I was wondering whether enabling CONFIG_STRICT_MODULE_RWX, which was originally off,
could have had another impact.
I appreciate your summary and fix!
Best regards,
Kuniyuki
>
> Everyone who runs kernel v5.18+ on parisc should better apply the patch I sent:
> https://patchwork.kernel.org/project/linux-parisc/patch/20220626233911.1023515-1-deller@gmx.de/
>
> Helge