* linux v5.18.3 fails to boot
@ 2022-06-09 18:13 John David Anglin
  2022-06-10 15:06 ` John David Anglin
  0 siblings, 1 reply; 8+ messages in thread
From: John David Anglin @ 2022-06-09 18:13 UTC (permalink / raw)
  To: linux-parisc; +Cc: Helge Deller

[-- Attachment #1: Type: text/plain, Size: 2723 bytes --]

[...]
ata3: SATA link down (SStatus 0 SControl 0)
      _______________________________
     < Your System ate a SPARC! Gah! >
      -------------------------------
             \   ^__^
                 (__)\       )\/\
                  U  ||----w |
                     ||     ||
swapper/0 (pid 0): Illegal instruction (code 8)
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.18.3+ #1
Hardware name: 9000/785/C8000

     YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
PSW: 00001000000001000001101100001110 Not tainted
r00-03  0000000008041b0e 000000004e8045b0 0000000010bbda78 000000004e804470
r04-07  0000000010bb5000 0000000000001440 0000000054355000 0000000000001400
r08-11  0000000055000000 000000000000000e 000000000000000f 0000000055002800
r12-15  0000000000000000 0000000055002800 0000000040b668c0 0000000040b668c0
r16-19  0000000000000001 0000000000000001 0000000051b799f0 0000000000000000
r20-23  0000000000000001 fffffffffffff5b9 0000000000000000 0000000000200000
r24-27  000000000000000c 000000000800000e 0000000054355144 0000000040b3e0c0
r28-31  0000000000010001 000000004e804620 000000004e8045b0 0000000040edd040
sr00-03  00000000000a5c00 0000000000000000 0000000000000000 00000000000a5c00
sr04-07  0000000000000000 0000000000000000 0000000000000000 0000000000000000

IASQ: 0000000000000000 0000000000000000 IAOQ: 0000000010bbd9e0 0000000010bbd9e4
 IIR: 006110c2    ISR: 0000000010240000  IOR: 00000003b76dd048
 CPU:        0   CR30: 0000000040edd040 CR31: ffffffffffffffff
 ORIG_R28: 0000000000000000
 IAOQ[0]: mpt_reply+0x130/0x4f0 [mptbase]
 IAOQ[1]: mpt_reply+0x134/0x4f0 [mptbase]
 RP(r2): mpt_reply+0x1c8/0x4f0 [mptbase]
Backtrace:
 [<0000000010bbde24>] mpt_interrupt+0x84/0xe8 [mptbase]
 [<000000004026dd64>] __handle_irq_event_percpu+0xc4/0x250
 [<000000004026df28>] handle_irq_event_percpu+0x38/0xd8
 [<00000000402776c4>] handle_percpu_irq+0xb4/0xf0
 [<000000004026c90c>] generic_handle_irq+0x5c/0x90
 [<00000000401a20e4>] call_on_stack+0x18/0x24

CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.18.3+ #1
Hardware name: 9000/785/C8000
Backtrace:
 [<00000000401a8cd8>] show_stack+0x70/0x90
 [<0000000040a8e238>] dump_stack_lvl+0xd8/0x128
 [<0000000040a8e2bc>] dump_stack+0x34/0x48
 [<00000000401a8efc>] die_if_kernel+0x1e4/0x3f8
 [<00000000401a9af4>] handle_interruption+0x59c/0xb58
 [<00000000401a107c>] intr_check_sig+0x0/0x3c

Kernel panic - not syncing: Fatal exception in interrupt

v5.18.2 with similar config is okay. The fault seems consistent. IIR contains illegal instruction.

Attached config.

Dave

--
John David Anglin  dave.anglin@bell.net

[-- Attachment #2: config-5.18.3+.gz --]
[-- Type: application/x-gzip, Size: 22714 bytes --]

^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: linux v5.18.3 fails to boot 2022-06-09 18:13 linux v5.18.3 fails to boot John David Anglin @ 2022-06-10 15:06 ` John David Anglin 2022-06-10 16:06 ` Kuniyuki Iwashima 0 siblings, 1 reply; 8+ messages in thread From: John David Anglin @ 2022-06-10 15:06 UTC (permalink / raw) To: linux-parisc; +Cc: Helge Deller, Kuniyuki Iwashima I did a regression search. e039c0b5985999b150594126225e1ee51df7b4c9 is the first bad commit commit e039c0b5985999b150594126225e1ee51df7b4c9 Author: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Date: Fri Apr 29 14:38:01 2022 -0700 list: fix a data-race around ep->rdllist [ Upstream commit d679ae94fdd5d3ab00c35078f5af5f37e068b03d ] ep_poll() first calls ep_events_available() with no lock held and checks if ep->rdllist is empty by list_empty_careful(), which reads rdllist->prev. Thus all accesses to it need some protection to avoid store/load-tearing. Note INIT_LIST_HEAD_RCU() already has the annotation for both prev and next. Commit bf3b9f6372c4 ("epoll: Add busy poll support to epoll with socket fds.") added the first lockless ep_events_available(), and commit c5a282e9635e ("fs/epoll: reduce the scope of wq lock in epoll_wait()") made some ep_events_available() calls lockless and added single call under a lock, finally commit e59d3c64cba6 ("epoll: eliminate unnecessary lock for zero timeout") made the last ep_events_available() lockless. 
BUG: KCSAN: data-race in do_epoll_wait / do_epoll_wait write to 0xffff88810480c7d8 of 8 bytes by task 1802 on cpu 0: INIT_LIST_HEAD include/linux/list.h:38 [inline] list_splice_init include/linux/list.h:492 [inline] ep_start_scan fs/eventpoll.c:622 [inline] ep_send_events fs/eventpoll.c:1656 [inline] ep_poll fs/eventpoll.c:1806 [inline] do_epoll_wait+0x4eb/0xf40 fs/eventpoll.c:2234 do_epoll_pwait fs/eventpoll.c:2268 [inline] __do_sys_epoll_pwait fs/eventpoll.c:2281 [inline] __se_sys_epoll_pwait+0x12b/0x240 fs/eventpoll.c:2275 __x64_sys_epoll_pwait+0x74/0x80 fs/eventpoll.c:2275 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae read to 0xffff88810480c7d8 of 8 bytes by task 1799 on cpu 1: list_empty_careful include/linux/list.h:329 [inline] ep_events_available fs/eventpoll.c:381 [inline] ep_poll fs/eventpoll.c:1797 [inline] do_epoll_wait+0x279/0xf40 fs/eventpoll.c:2234 do_epoll_pwait fs/eventpoll.c:2268 [inline] __do_sys_epoll_pwait fs/eventpoll.c:2281 [inline] __se_sys_epoll_pwait+0x12b/0x240 fs/eventpoll.c:2275 __x64_sys_epoll_pwait+0x74/0x80 fs/eventpoll.c:2275 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae value changed: 0xffff88810480c7d0 -> 0xffff888103c15098 Reported by Kernel Concurrency Sanitizer on: CPU: 1 PID: 1799 Comm: syz-fuzzer Tainted: G W 5.17.0-rc7-syzkaller-dirty #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Link: https://lkml.kernel.org/r/20220322002653.33865-3-kuniyu@amazon.co.jp Fixes: e59d3c64cba6 ("epoll: eliminate unnecessary lock for zero timeout") Fixes: c5a282e9635e ("fs/epoll: reduce the scope of wq lock in epoll_wait()") Fixes: bf3b9f6372c4 ("epoll: Add busy poll support to epoll with socket fds.") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Reported-by: 
syzbot+bdd6e38a1ed5ee58d8bd@syzkaller.appspotmail.com
Cc: Al Viro <viro@zeniv.linux.org.uk>, Andrew Morton <akpm@linux-foundation.org>
Cc: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Cc: Kuniyuki Iwashima <kuni1840@gmail.com>
Cc: "Soheil Hassas Yeganeh" <soheil@google.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: "Sridhar Samudrala" <sridhar.samudrala@intel.com>
Cc: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

 include/linux/list.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

Reverting above change fixes v5.18.3 boot.

It seems the most significant byte of the "ldd -10(r3),rp" instruction in mpt_reply has been set to 0:

    4084: bf 80 21 18   cmpb,*<> r0,ret0,4118 <mpt_reply+0x1b8>
    4088: 08 04 02 5b   copy r4,dp
    408c: 00 00 04 00   sync
    4090: 0c 61 10 c2   ldd -10(r3),rp

See IIR value in crash output.

--
John David Anglin  dave.anglin@bell.net

^ permalink raw reply [flat|nested] 8+ messages in thread
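The observation above (the objdump bytes `0c 61 10 c2` versus the trapped `IIR: 006110c2`) can be checked mechanically. The following is only a minimal sketch: the helper name `diff_byte_mask` is hypothetical, and it assumes the four objdump bytes are read as one big-endian 32-bit instruction word, as PA-RISC does.

```c
#include <stdint.h>

/*
 * Hypothetical helper: compare the instruction word expected from the
 * disassembly with the word the CPU reported in IIR.
 * Returns a 4-bit mask; bit i set means byte i differs, where byte 0 is
 * the most significant byte of the 32-bit word.
 */
static unsigned diff_byte_mask(uint32_t expected, uint32_t actual)
{
    uint32_t x = expected ^ actual;   /* non-zero bits mark corrupted bits */
    unsigned mask = 0;

    for (int i = 0; i < 4; i++) {
        if ((x >> (24 - 8 * i)) & 0xffu)
            mask |= 1u << i;
    }
    return mask;
}
```

Called as `diff_byte_mask(0x0c6110c2u, 0x006110c2u)`, only bit 0 of the mask is set, which matches the statement that just the most significant byte of the `ldd` word was zeroed.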
* Re: linux v5.18.3 fails to boot 2022-06-10 15:06 ` John David Anglin @ 2022-06-10 16:06 ` Kuniyuki Iwashima 2022-06-10 16:49 ` John David Anglin 0 siblings, 1 reply; 8+ messages in thread From: Kuniyuki Iwashima @ 2022-06-10 16:06 UTC (permalink / raw) To: dave.anglin; +Cc: deller, kuniyu, linux-parisc Hello, Thanks for heads up! Date: Fri, 10 Jun 2022 11:06:24 -0400 From: John David Anglin <dave.anglin@bell.net> > I did a regression search. e039c0b5985999b150594126225e1ee51df7b4c9 is the first bad commit > > commit e039c0b5985999b150594126225e1ee51df7b4c9 > Author: Kuniyuki Iwashima <kuniyu@amazon.co.jp> > Date: Fri Apr 29 14:38:01 2022 -0700 > > list: fix a data-race around ep->rdllist > > [ Upstream commit d679ae94fdd5d3ab00c35078f5af5f37e068b03d ] > > ep_poll() first calls ep_events_available() with no lock held and checks > if ep->rdllist is empty by list_empty_careful(), which reads > rdllist->prev. Thus all accesses to it need some protection to avoid > store/load-tearing. > > Note INIT_LIST_HEAD_RCU() already has the annotation for both prev > and next. > > Commit bf3b9f6372c4 ("epoll: Add busy poll support to epoll with socket > fds.") added the first lockless ep_events_available(), and commit > c5a282e9635e ("fs/epoll: reduce the scope of wq lock in epoll_wait()") > made some ep_events_available() calls lockless and added single call under > a lock, finally commit e59d3c64cba6 ("epoll: eliminate unnecessary lock > for zero timeout") made the last ep_events_available() lockless. 
> > BUG: KCSAN: data-race in do_epoll_wait / do_epoll_wait > > write to 0xffff88810480c7d8 of 8 bytes by task 1802 on cpu 0: > INIT_LIST_HEAD include/linux/list.h:38 [inline] > list_splice_init include/linux/list.h:492 [inline] > ep_start_scan fs/eventpoll.c:622 [inline] > ep_send_events fs/eventpoll.c:1656 [inline] > ep_poll fs/eventpoll.c:1806 [inline] > do_epoll_wait+0x4eb/0xf40 fs/eventpoll.c:2234 > do_epoll_pwait fs/eventpoll.c:2268 [inline] > __do_sys_epoll_pwait fs/eventpoll.c:2281 [inline] > __se_sys_epoll_pwait+0x12b/0x240 fs/eventpoll.c:2275 > __x64_sys_epoll_pwait+0x74/0x80 fs/eventpoll.c:2275 > do_syscall_x64 arch/x86/entry/common.c:50 [inline] > do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80 > entry_SYSCALL_64_after_hwframe+0x44/0xae > > read to 0xffff88810480c7d8 of 8 bytes by task 1799 on cpu 1: > list_empty_careful include/linux/list.h:329 [inline] > ep_events_available fs/eventpoll.c:381 [inline] > ep_poll fs/eventpoll.c:1797 [inline] > do_epoll_wait+0x279/0xf40 fs/eventpoll.c:2234 > do_epoll_pwait fs/eventpoll.c:2268 [inline] > __do_sys_epoll_pwait fs/eventpoll.c:2281 [inline] > __se_sys_epoll_pwait+0x12b/0x240 fs/eventpoll.c:2275 > __x64_sys_epoll_pwait+0x74/0x80 fs/eventpoll.c:2275 > do_syscall_x64 arch/x86/entry/common.c:50 [inline] > do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80 > entry_SYSCALL_64_after_hwframe+0x44/0xae > > value changed: 0xffff88810480c7d0 -> 0xffff888103c15098 > > Reported by Kernel Concurrency Sanitizer on: > CPU: 1 PID: 1799 Comm: syz-fuzzer Tainted: G W 5.17.0-rc7-syzkaller-dirty #0 > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 > > Link: https://lkml.kernel.org/r/20220322002653.33865-3-kuniyu@amazon.co.jp > Fixes: e59d3c64cba6 ("epoll: eliminate unnecessary lock for zero timeout") > Fixes: c5a282e9635e ("fs/epoll: reduce the scope of wq lock in epoll_wait()") > Fixes: bf3b9f6372c4 ("epoll: Add busy poll support to epoll with socket fds.") > Signed-off-by: Kuniyuki 
Iwashima <kuniyu@amazon.co.jp> > Reported-by: syzbot+bdd6e38a1ed5ee58d8bd@syzkaller.appspotmail.com > Cc: Al Viro <viro@zeniv.linux.org.uk>, Andrew Morton <akpm@linux-foundation.org> > Cc: Kuniyuki Iwashima <kuniyu@amazon.co.jp> > Cc: Kuniyuki Iwashima <kuni1840@gmail.com> > Cc: "Soheil Hassas Yeganeh" <soheil@google.com> > Cc: Davidlohr Bueso <dave@stgolabs.net> > Cc: "Sridhar Samudrala" <sridhar.samudrala@intel.com> > Cc: Alexander Duyck <alexander.h.duyck@intel.com> > Signed-off-by: Andrew Morton <akpm@linux-foundation.org> > Signed-off-by: Sasha Levin <sashal@kernel.org> > > include/linux/list.h | 6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) > > Reverting above change fixes v5.18.3 boot. > > It seems the most significant byte of the "ldd -10(r3),rp" instruction in mpt_reply has been > set to 0: > > 4084: bf 80 21 18 cmpb,*<> r0,ret0,4118 <mpt_reply+0x1b8> > 4088: 08 04 02 5b copy r4,dp > 408c: 00 00 04 00 sync > 4090: 0c 61 10 c2 ldd -10(r3),rp > > See IIR value in crash output. The commit was added to prevent compiler optimisation from splitting read/write operations. I think it can lead to a change in the generated opcodes, but it should be safe, so for now I'm not sure why the commit causes the boot failure. I'm not familiar with PARISC, so this may be a stupid question, but what exactly does `ldd` do, and which line of the function/file does it correspond to? > On 2022-06-09 2:13 p.m., John David Anglin wrote: > > [...] > > ata3: SATA link down (SStatus 0 SControl 0) > > _______________________________ > > < Your System ate a SPARC! Gah! 
> > > ------------------------------- > > \ ^__^ > > (__)\ )\/\ > > U ||----w | > > || || > > swapper/0 (pid 0): Illegal instruction (code 8) > > CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.18.3+ #1 > > Hardware name: 9000/785/C8000 > > > > YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI > > PSW: 00001000000001000001101100001110 Not tainted > > r00-03 0000000008041b0e 000000004e8045b0 0000000010bbda78 000000004e804470 > > r04-07 0000000010bb5000 0000000000001440 0000000054355000 0000000000001400 > > r08-11 0000000055000000 000000000000000e 000000000000000f 0000000055002800 > > r12-15 0000000000000000 0000000055002800 0000000040b668c0 0000000040b668c0 > > r16-19 0000000000000001 0000000000000001 0000000051b799f0 0000000000000000 > > r20-23 0000000000000001 fffffffffffff5b9 0000000000000000 0000000000200000 > > r24-27 000000000000000c 000000000800000e 0000000054355144 0000000040b3e0c0 > > r28-31 0000000000010001 000000004e804620 000000004e8045b0 0000000040edd040 > > sr00-03 00000000000a5c00 0000000000000000 0000000000000000 00000000000a5c00 > > sr04-07 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > > > > IASQ: 0000000000000000 0000000000000000 IAOQ: 0000000010bbd9e0 0000000010bbd9e4 > > IIR: 006110c2 ISR: 0000000010240000 IOR: 00000003b76dd048 > > CPU: 0 CR30: 0000000040edd040 CR31: ffffffffffffffff > > ORIG_R28: 0000000000000000 > > IAOQ[0]: mpt_reply+0x130/0x4f0 [mptbase] > > IAOQ[1]: mpt_reply+0x134/0x4f0 [mptbase] > > RP(r2): mpt_reply+0x1c8/0x4f0 [mptbase] > > Backtrace: > > [<0000000010bbde24>] mpt_interrupt+0x84/0xe8 [mptbase] > > [<000000004026dd64>] __handle_irq_event_percpu+0xc4/0x250 > > [<000000004026df28>] handle_irq_event_percpu+0x38/0xd8 > > [<00000000402776c4>] handle_percpu_irq+0xb4/0xf0 > > [<000000004026c90c>] generic_handle_irq+0x5c/0x90 > > [<00000000401a20e4>] call_on_stack+0x18/0x24 > > > > CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.18.3+ #1 > > Hardware name: 9000/785/C8000 > > Backtrace: > > [<00000000401a8cd8>] show_stack+0x70/0x90 
> > [<0000000040a8e238>] dump_stack_lvl+0xd8/0x128 > > [<0000000040a8e2bc>] dump_stack+0x34/0x48 > > [<00000000401a8efc>] die_if_kernel+0x1e4/0x3f8 > > [<00000000401a9af4>] handle_interruption+0x59c/0xb58 > > [<00000000401a107c>] intr_check_sig+0x0/0x3c Can you decode these stacktraces with ./scripts/decode_stacktrace.sh ? > > Kernel panic - not syncing: Fatal exception in interrupt > > > > v5.18.2 with similar config is okay. The fault seems consistent. IIR contains illegal instruction. > > > > Attached config. Also, can you forward this config to me? Best regards, Kuniyuki ^ permalink raw reply [flat|nested] 8+ messages in thread
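For readers unfamiliar with the annotation discussed in this message, the sketch below shows the general idea of forcing a single, untorn load or store through a volatile access. It is a simplified userspace stand-in, not the actual diff of the commit: the `_SKETCH` macros, `struct list_node`, and both functions are hypothetical names, the macros assume scalar types and a compiler that supports `__typeof__` (GCC or Clang), and the real kernel `READ_ONCE()`/`WRITE_ONCE()` and `list_empty_careful()` differ in detail.

```c
#include <stddef.h>

/* Assumed stand-ins for the kernel macros: a volatile access keeps the
 * compiler from splitting, merging, or repeating the load/store. */
#define READ_ONCE_SKETCH(x)     (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE_SKETCH(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

/* Hypothetical minimal list node, mirroring the shape of struct list_head. */
struct list_node {
    struct list_node *next, *prev;
};

/* Writer side: each pointer is stored with one annotated access. */
static void init_list_head_sketch(struct list_node *list)
{
    WRITE_ONCE_SKETCH(list->next, list);
    WRITE_ONCE_SKETCH(list->prev, list);
}

/* Lockless reader side: each pointer is loaded exactly once. */
static int list_empty_careful_sketch(const struct list_node *head)
{
    struct list_node *next = READ_ONCE_SKETCH(head->next);

    return next == head && next == READ_ONCE_SKETCH(head->prev);
}
```

Such annotations only constrain how the compiler emits the access, so they can change the generated instructions without changing the logic, which is consistent with the expectation stated above that the commit should be safe.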
* Re: linux v5.18.3 fails to boot 2022-06-10 16:06 ` Kuniyuki Iwashima @ 2022-06-10 16:49 ` John David Anglin 2022-06-10 18:18 ` John David Anglin 0 siblings, 1 reply; 8+ messages in thread From: John David Anglin @ 2022-06-10 16:49 UTC (permalink / raw) To: Kuniyuki Iwashima; +Cc: deller, kuniyu, linux-parisc On 2022-06-10 12:06 p.m., Kuniyuki Iwashima wrote: > Hello, > Thanks for heads up! > > Date: Fri, 10 Jun 2022 11:06:24 -0400 > From: John David Anglin <dave.anglin@bell.net> >> I did a regression search. e039c0b5985999b150594126225e1ee51df7b4c9 is the first bad commit >> >> commit e039c0b5985999b150594126225e1ee51df7b4c9 >> Author: Kuniyuki Iwashima <kuniyu@amazon.co.jp> >> Date: Fri Apr 29 14:38:01 2022 -0700 >> >> list: fix a data-race around ep->rdllist >> >> [ Upstream commit d679ae94fdd5d3ab00c35078f5af5f37e068b03d ] >> >> ep_poll() first calls ep_events_available() with no lock held and checks >> if ep->rdllist is empty by list_empty_careful(), which reads >> rdllist->prev. Thus all accesses to it need some protection to avoid >> store/load-tearing. >> >> Note INIT_LIST_HEAD_RCU() already has the annotation for both prev >> and next. >> >> Commit bf3b9f6372c4 ("epoll: Add busy poll support to epoll with socket >> fds.") added the first lockless ep_events_available(), and commit >> c5a282e9635e ("fs/epoll: reduce the scope of wq lock in epoll_wait()") >> made some ep_events_available() calls lockless and added single call under >> a lock, finally commit e59d3c64cba6 ("epoll: eliminate unnecessary lock >> for zero timeout") made the last ep_events_available() lockless. 
>> >> BUG: KCSAN: data-race in do_epoll_wait / do_epoll_wait >> >> write to 0xffff88810480c7d8 of 8 bytes by task 1802 on cpu 0: >> INIT_LIST_HEAD include/linux/list.h:38 [inline] >> list_splice_init include/linux/list.h:492 [inline] >> ep_start_scan fs/eventpoll.c:622 [inline] >> ep_send_events fs/eventpoll.c:1656 [inline] >> ep_poll fs/eventpoll.c:1806 [inline] >> do_epoll_wait+0x4eb/0xf40 fs/eventpoll.c:2234 >> do_epoll_pwait fs/eventpoll.c:2268 [inline] >> __do_sys_epoll_pwait fs/eventpoll.c:2281 [inline] >> __se_sys_epoll_pwait+0x12b/0x240 fs/eventpoll.c:2275 >> __x64_sys_epoll_pwait+0x74/0x80 fs/eventpoll.c:2275 >> do_syscall_x64 arch/x86/entry/common.c:50 [inline] >> do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80 >> entry_SYSCALL_64_after_hwframe+0x44/0xae >> >> read to 0xffff88810480c7d8 of 8 bytes by task 1799 on cpu 1: >> list_empty_careful include/linux/list.h:329 [inline] >> ep_events_available fs/eventpoll.c:381 [inline] >> ep_poll fs/eventpoll.c:1797 [inline] >> do_epoll_wait+0x279/0xf40 fs/eventpoll.c:2234 >> do_epoll_pwait fs/eventpoll.c:2268 [inline] >> __do_sys_epoll_pwait fs/eventpoll.c:2281 [inline] >> __se_sys_epoll_pwait+0x12b/0x240 fs/eventpoll.c:2275 >> __x64_sys_epoll_pwait+0x74/0x80 fs/eventpoll.c:2275 >> do_syscall_x64 arch/x86/entry/common.c:50 [inline] >> do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80 >> entry_SYSCALL_64_after_hwframe+0x44/0xae >> >> value changed: 0xffff88810480c7d0 -> 0xffff888103c15098 >> >> Reported by Kernel Concurrency Sanitizer on: >> CPU: 1 PID: 1799 Comm: syz-fuzzer Tainted: G W 5.17.0-rc7-syzkaller-dirty #0 >> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 >> >> Link: https://lkml.kernel.org/r/20220322002653.33865-3-kuniyu@amazon.co.jp >> Fixes: e59d3c64cba6 ("epoll: eliminate unnecessary lock for zero timeout") >> Fixes: c5a282e9635e ("fs/epoll: reduce the scope of wq lock in epoll_wait()") >> Fixes: bf3b9f6372c4 ("epoll: Add busy poll support to epoll with 
socket fds.") >> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> >> Reported-by: syzbot+bdd6e38a1ed5ee58d8bd@syzkaller.appspotmail.com >> Cc: Al Viro <viro@zeniv.linux.org.uk>, Andrew Morton <akpm@linux-foundation.org> >> Cc: Kuniyuki Iwashima <kuniyu@amazon.co.jp> >> Cc: Kuniyuki Iwashima <kuni1840@gmail.com> >> Cc: "Soheil Hassas Yeganeh" <soheil@google.com> >> Cc: Davidlohr Bueso <dave@stgolabs.net> >> Cc: "Sridhar Samudrala" <sridhar.samudrala@intel.com> >> Cc: Alexander Duyck <alexander.h.duyck@intel.com> >> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> >> Signed-off-by: Sasha Levin <sashal@kernel.org> >> >> include/linux/list.h | 6 +++--- >> 1 file changed, 3 insertions(+), 3 deletions(-) >> >> Reverting above change fixes v5.18.3 boot. >> >> It seems the most significant byte of the "ldd -10(r3),rp" instruction in mpt_reply has been >> set to 0: >> >> 4084: bf 80 21 18 cmpb,*<> r0,ret0,4118 <mpt_reply+0x1b8> >> 4088: 08 04 02 5b copy r4,dp >> 408c: 00 00 04 00 sync >> 4090: 0c 61 10 c2 ldd -10(r3),rp >> >> See IIR value in crash output. > The commit was added to prevent compiler optimisation from splitting > read/write operations. I think it can lead in a change in opcodes but > must be safe. So I'm not sure why the commit causes boot failure for now. Neither am I. > > I'm not familiar with PARISC and this may be a stupid question though, > what does `ldd` exactly do? and which line is it executed in the func/file? ldd performs a 64-bit load to register rp (r2). It is part of mpt_reply's epilogue. The prior "sync" instruction corresponds to the "mb()" at the end of mpt_reply. I would have thought this code should have been write protected. It seems CONFIG_STRICT_MODULE_RWX is not explicitly set in config: CONFIG_ARCH_HAS_STRICT_KERNEL_RWX=y CONFIG_STRICT_KERNEL_RWX=y I think I need to try enabling CONFIG_STRICT_MODULE_RWX. > > >> On 2022-06-09 2:13 p.m., John David Anglin wrote: >>> [...] 
>>> ata3: SATA link down (SStatus 0 SControl 0) >>> _______________________________ >>> < Your System ate a SPARC! Gah! > >>> ------------------------------- >>> \ ^__^ >>> (__)\ )\/\ >>> U ||----w | >>> || || >>> swapper/0 (pid 0): Illegal instruction (code 8) >>> CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.18.3+ #1 >>> Hardware name: 9000/785/C8000 >>> >>> YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI >>> PSW: 00001000000001000001101100001110 Not tainted >>> r00-03 0000000008041b0e 000000004e8045b0 0000000010bbda78 000000004e804470 >>> r04-07 0000000010bb5000 0000000000001440 0000000054355000 0000000000001400 >>> r08-11 0000000055000000 000000000000000e 000000000000000f 0000000055002800 >>> r12-15 0000000000000000 0000000055002800 0000000040b668c0 0000000040b668c0 >>> r16-19 0000000000000001 0000000000000001 0000000051b799f0 0000000000000000 >>> r20-23 0000000000000001 fffffffffffff5b9 0000000000000000 0000000000200000 >>> r24-27 000000000000000c 000000000800000e 0000000054355144 0000000040b3e0c0 >>> r28-31 0000000000010001 000000004e804620 000000004e8045b0 0000000040edd040 >>> sr00-03 00000000000a5c00 0000000000000000 0000000000000000 00000000000a5c00 >>> sr04-07 0000000000000000 0000000000000000 0000000000000000 0000000000000000 >>> >>> IASQ: 0000000000000000 0000000000000000 IAOQ: 0000000010bbd9e0 0000000010bbd9e4 >>> IIR: 006110c2 ISR: 0000000010240000 IOR: 00000003b76dd048 >>> CPU: 0 CR30: 0000000040edd040 CR31: ffffffffffffffff >>> ORIG_R28: 0000000000000000 >>> IAOQ[0]: mpt_reply+0x130/0x4f0 [mptbase] >>> IAOQ[1]: mpt_reply+0x134/0x4f0 [mptbase] >>> RP(r2): mpt_reply+0x1c8/0x4f0 [mptbase] >>> Backtrace: >>> [<0000000010bbde24>] mpt_interrupt+0x84/0xe8 [mptbase] >>> [<000000004026dd64>] __handle_irq_event_percpu+0xc4/0x250 >>> [<000000004026df28>] handle_irq_event_percpu+0x38/0xd8 >>> [<00000000402776c4>] handle_percpu_irq+0xb4/0xf0 >>> [<000000004026c90c>] generic_handle_irq+0x5c/0x90 >>> [<00000000401a20e4>] call_on_stack+0x18/0x24 >>> >>> CPU: 0 PID: 0 Comm: 
swapper/0 Not tainted 5.18.3+ #1 >>> Hardware name: 9000/785/C8000 >>> Backtrace: >>> [<00000000401a8cd8>] show_stack+0x70/0x90 >>> [<0000000040a8e238>] dump_stack_lvl+0xd8/0x128 >>> [<0000000040a8e2bc>] dump_stack+0x34/0x48 >>> [<00000000401a8efc>] die_if_kernel+0x1e4/0x3f8 >>> [<00000000401a9af4>] handle_interruption+0x59c/0xb58 >>> [<00000000401a107c>] intr_check_sig+0x0/0x3c > Can you decode these stacktraces with ./scripts/decode_stacktrace.sh ? I don't think this is helpful as the code in mpt_reply has been corrupted. > > >>> Kernel panic - not syncing: Fatal exception in interrupt >>> >>> v5.18.2 with similar config is okay. The fault seems consistent. IIR contains illegal instruction. >>> >>> Attached config. > Also, can you forward this config to me? > > Best regards, > Kuniyuki -- John David Anglin dave.anglin@bell.net ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: linux v5.18.3 fails to boot 2022-06-10 16:49 ` John David Anglin @ 2022-06-10 18:18 ` John David Anglin 2022-06-27 0:08 ` Helge Deller 0 siblings, 1 reply; 8+ messages in thread From: John David Anglin @ 2022-06-10 18:18 UTC (permalink / raw) To: Kuniyuki Iwashima; +Cc: deller, kuniyu, linux-parisc [-- Attachment #1: Type: text/plain, Size: 2274 bytes --] On 2022-06-10 12:49 p.m., John David Anglin wrote: >> The commit was added to prevent compiler optimisation from splitting >> read/write operations. I think it can lead in a change in opcodes but >> must be safe. So I'm not sure why the commit causes boot failure for now. > Neither am I. >> >> I'm not familiar with PARISC and this may be a stupid question though, >> what does `ldd` exactly do? and which line is it executed in the func/file? > ldd performs a 64-bit load to register rp (r2). It is part of mpt_reply's epilogue. > The prior "sync" instruction corresponds to the "mb()" at the end of mpt_reply. > > I would have thought this code should have been write protected. It seems > CONFIG_STRICT_MODULE_RWX is not explicitly set in config: > > CONFIG_ARCH_HAS_STRICT_KERNEL_RWX=y > CONFIG_STRICT_KERNEL_RWX=y > > I think I need to try enabling CONFIG_STRICT_MODULE_RWX. With CONFIG_STRICT_MODULE_RWX, the fault went away and the system boots normally. To enable CONFIG_STRICT_MODULE_RWX, I needed to add attached patch to Kconfig. 
As far as I can tell, this only affects patch_map in the parisc backend:

static void __kprobes *patch_map(void *addr, int fixmap, unsigned long *flags,
                                 int *need_unmap)
{
        unsigned long uintaddr = (uintptr_t) addr;
        bool module = !core_kernel_text(uintaddr);
        struct page *page;

        *need_unmap = 0;
        if (module && IS_ENABLED(CONFIG_STRICT_MODULE_RWX))
                page = vmalloc_to_page(addr);
        else if (!module && IS_ENABLED(CONFIG_STRICT_KERNEL_RWX))
                page = virt_to_page(addr);
        else
                return addr;

Possibly, this might affect Fusion MPT base driver but no patches are applied:

[   29.971295] mptbase alternatives: applied 0 out of 3 patches
[   29.971295] Fusion MPT base driver 3.04.20
[   29.971295] Copyright (c) 1999-2008 LSI Corporation
[   29.971295] Fusion MPT SPI Host driver 3.04.20

Dave

--
John David Anglin  dave.anglin@bell.net

[-- Attachment #2: Kconfig.patch --]
[-- Type: text/plain, Size: 410 bytes --]

diff --git a/arch/parisc/Kconfig b/arch/parisc/Kconfig
index bd22578859d0..f3a2044ee402 100644
--- a/arch/parisc/Kconfig
+++ b/arch/parisc/Kconfig
@@ -10,6 +10,7 @@ config PARISC
 	select ARCH_WANT_FRAME_POINTERS
 	select ARCH_HAS_ELF_RANDOMIZE
 	select ARCH_HAS_STRICT_KERNEL_RWX
+	select ARCH_HAS_STRICT_MODULE_RWX
 	select ARCH_HAS_UBSAN_SANITIZE_ALL
 	select ARCH_HAS_PTE_SPECIAL
 	select ARCH_NO_SG_CHAIN

^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: linux v5.18.3 fails to boot
  2022-06-10 18:18 ` John David Anglin
@ 2022-06-27  0:08 ` Helge Deller
  2022-06-27  6:15 ` Sam James
  2022-06-27 17:24 ` Kuniyuki Iwashima
  0 siblings, 2 replies; 8+ messages in thread
From: Helge Deller @ 2022-06-27 0:08 UTC (permalink / raw)
  To: John David Anglin, Kuniyuki Iwashima; +Cc: kuniyu, linux-parisc

On 6/10/22 20:18, John David Anglin wrote:
> On 2022-06-10 12:49 p.m., John David Anglin wrote:
>>> The commit was added to prevent compiler optimisation from splitting
>>> read/write operations. I think it can lead in a change in opcodes but
>>> must be safe. So I'm not sure why the commit causes boot failure for now.
>> Neither am I.
>>>
>>> I'm not familiar with PARISC and this may be a stupid question though,
>>> what does `ldd` exactly do? and which line is it executed in the func/file?
>> ldd performs a 64-bit load to register rp (r2). It is part of mpt_reply's epilogue.
>> The prior "sync" instruction corresponds to the "mb()" at the end of mpt_reply.
>>
>
> Possibly, this might affect Fusion MPT base driver but no patches are applied:
>
> [   29.971295] mptbase alternatives: applied 0 out of 3 patches
> [   29.971295] Fusion MPT base driver 3.04.20
> [   29.971295] Copyright (c) 1999-2008 LSI Corporation
> [   29.971295] Fusion MPT SPI Host driver 3.04.20

To sum it up, this issue was triggered by a combination of special circumstances:

The kernel patching code uses the altinstructions table from kernel modules to patch
in alternative assembly instructions.
To read the entries it uses a 32-bit ldw() instruction, since the table holds 32-bit values.
Because of another issue, this table was located at unaligned memory addresses.
As a result, the kernel's unaligned ldw() emulation kicked in to read the content.
Commit e8aa7b17fe41 ("parisc/unaligned: Rewrite inline assembly of emulate_ldw()")
broke the ldw() emulation, so invalid 32-bit values were read back.
This then triggered random memory corruption, because the kernel patched addresses which it should not have touched.

I just sent a patch to the parisc mailing list to fix the ldw() handler, which
finally fixed this issue here too.

Everyone who runs kernel v5.18+ on parisc should apply the patch I sent:
https://patchwork.kernel.org/project/linux-parisc/patch/20220626233911.1023515-1-deller@gmx.de/

Helge

^ permalink raw reply [flat|nested] 8+ messages in thread
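To illustrate what an unaligned 32-bit load emulation has to produce, here is a portable C sketch that assembles a big-endian word (PA-RISC byte order) from four single-byte reads. It is only an assumption-laden stand-in: the kernel's emulate_ldw() is written in inline assembly, and the function name `load_be32_unaligned` is hypothetical.

```c
#include <stdint.h>

/*
 * Hypothetical sketch of an unaligned 32-bit load: read four bytes one at
 * a time and combine them in big-endian order, so the result does not
 * depend on the alignment of addr.
 */
static uint32_t load_be32_unaligned(const void *addr)
{
    const uint8_t *p = addr;

    return ((uint32_t)p[0] << 24) |
           ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |
            (uint32_t)p[3];
}
```

If an emulation like this returns a wrong value for a table entry, the patching code described above computes a wrong target and writes over unrelated instructions, which is how a single broken load handler can surface as an illegal instruction in an unrelated function such as mpt_reply.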
* Re: linux v5.18.3 fails to boot
From: Sam James @ 2022-06-27 6:15 UTC
To: Helge Deller; +Cc: John David Anglin, Kuniyuki Iwashima, kuniyu, linux-parisc

[-- Attachment #1: Type: text/plain, Size: 2260 bytes --]

> On 27 Jun 2022, at 01:08, Helge Deller <deller@gmx.de> wrote:
>
> On 6/10/22 20:18, John David Anglin wrote:
>> On 2022-06-10 12:49 p.m., John David Anglin wrote:
>>>> The commit was added to prevent compiler optimisation from splitting
>>>> read/write operations. I think it can lead to a change in opcodes but
>>>> must be safe. So I'm not sure why the commit causes boot failure for now.
>>> Neither am I.
>>>>
>>>> I'm not familiar with PARISC and this may be a stupid question though,
>>>> what does `ldd` exactly do? and which line is it executed in the func/file?
>>> ldd performs a 64-bit load to register rp (r2). It is part of mpt_reply's epilogue.
>>> The prior "sync" instruction corresponds to the "mb()" at the end of mpt_reply.
>>>
>>
>> Possibly, this might affect Fusion MPT base driver but no patches are applied:
>>
>> [ 29.971295] mptbase alternatives: applied 0 out of 3 patches
>> [ 29.971295] Fusion MPT base driver 3.04.20
>> [ 29.971295] Copyright (c) 1999-2008 LSI Corporation
>> [ 29.971295] Fusion MPT SPI Host driver 3.04.20
>
> To sum it up - this issue was triggered by a few special situations:
>
> The kernel patching code uses the altinstructions table from kernel modules to patch
> in alternative assembly instructions.
> To read the entries it uses a 32-bit ldw() instruction since the table holds 32-bit values.
> Because of another issue this table was located at unaligned memory addresses.
> That's why then the kernel ldw() emulation jumped in and read the content.
> Commit e8aa7b17fe41 ("parisc/unaligned: Rewrite inline assembly of emulate_ldw()")
> broke the ldw() emulation and as such invalid 32-bit values were read back.
> This then triggered random memory corruption, because the kernel then patched addresses which it shouldn't.
>
> I just sent a patch to the parisc mailing list to fix up the ldw() handler, which
> finally fixed this issue here too.
>
> Everyone who runs kernel v5.18+ on parisc should apply the patch I sent:
> https://patchwork.kernel.org/project/linux-parisc/patch/20220626233911.1023515-1-deller@gmx.de/
>

Appreciate you summarising - I was just wondering about this bug earlier :)

> Helge

Best,
sam

[-- Attachment #2: Message signed with OpenPGP --]
[-- Type: application/pgp-signature, Size: 358 bytes --]
* Re: linux v5.18.3 fails to boot
From: Kuniyuki Iwashima @ 2022-06-27 17:24 UTC
To: deller; +Cc: dave.anglin, kuniyu, kuniyu, linux-parisc

From: Helge Deller <deller@gmx.de>
Date: Mon, 27 Jun 2022 02:08:29 +0200
> On 6/10/22 20:18, John David Anglin wrote:
>> On 2022-06-10 12:49 p.m., John David Anglin wrote:
>>>> The commit was added to prevent compiler optimisation from splitting
>>>> read/write operations. I think it can lead to a change in opcodes but
>>>> must be safe. So I'm not sure why the commit causes boot failure for now.
>>> Neither am I.
>>>>
>>>> I'm not familiar with PARISC and this may be a stupid question though,
>>>> what does `ldd` exactly do? and which line is it executed in the func/file?
>>> ldd performs a 64-bit load to register rp (r2). It is part of mpt_reply's epilogue.
>>> The prior "sync" instruction corresponds to the "mb()" at the end of mpt_reply.
>>>
>>
>> Possibly, this might affect Fusion MPT base driver but no patches are applied:
>>
>> [ 29.971295] mptbase alternatives: applied 0 out of 3 patches
>> [ 29.971295] Fusion MPT base driver 3.04.20
>> [ 29.971295] Copyright (c) 1999-2008 LSI Corporation
>> [ 29.971295] Fusion MPT SPI Host driver 3.04.20
>
> To sum it up - this issue was triggered by a few special situations:
>
> The kernel patching code uses the altinstructions table from kernel modules to patch
> in alternative assembly instructions.
> To read the entries it uses a 32-bit ldw() instruction since the table holds 32-bit values.
> Because of another issue this table was located at unaligned memory addresses.
> That's why then the kernel ldw() emulation jumped in and read the content.
> Commit e8aa7b17fe41 ("parisc/unaligned: Rewrite inline assembly of emulate_ldw()")
> broke the ldw() emulation and as such invalid 32-bit values were read back.
> This then triggered random memory corruption, because the kernel then patched addresses which it shouldn't.
>
> I just sent a patch to the parisc mailing list to fix up the ldw() handler, which
> finally fixed this issue here too.

Interesting!
I was wondering whether enabling CONFIG_STRICT_MODULE_RWX, which was originally
off, could have another impact.

I appreciate your summary and fix!

Best regards,
Kuniyuki

>
> Everyone who runs kernel v5.18+ on parisc should apply the patch I sent:
> https://patchwork.kernel.org/project/linux-parisc/patch/20220626233911.1023515-1-deller@gmx.de/
>
> Helge
end of thread, other threads:[~2022-06-27 17:41 UTC | newest]

Thread overview: 8+ messages
2022-06-09 18:13 linux v5.18.3 fails to boot John David Anglin
2022-06-10 15:06 ` John David Anglin
2022-06-10 16:06 ` Kuniyuki Iwashima
2022-06-10 16:49 ` John David Anglin
2022-06-10 18:18 ` John David Anglin
2022-06-27 0:08 ` Helge Deller
2022-06-27 6:15 ` Sam James
2022-06-27 17:24 ` Kuniyuki Iwashima