* BUG: soft lockup in smp_call_function @ 2020-07-12 23:02 syzbot 2020-07-13 6:53 ` Dmitry Vyukov ` (2 more replies) 0 siblings, 3 replies; 12+ messages in thread From: syzbot @ 2020-07-12 23:02 UTC (permalink / raw) To: bp, hpa, linux-kernel, luto, mingo, syzkaller-bugs, tglx, x86 Hello, syzbot found the following crash on: HEAD commit: 4437dd6e Merge tag 'io_uring-5.8-2020-07-12' of git://git... git tree: upstream console output: https://syzkaller.appspot.com/x/log.txt?x=17183f0b100000 kernel config: https://syzkaller.appspot.com/x/.config?x=66ad203c2bb6d8b dashboard link: https://syzkaller.appspot.com/bug?extid=cce3691658bef1b12ac9 compiler: gcc (GCC) 10.1.0-syz 20200507 Unfortunately, I don't have any reproducer for this crash yet. IMPORTANT: if you fix the bug, please add the following tag to the commit: Reported-by: syzbot+cce3691658bef1b12ac9@syzkaller.appspotmail.com watchdog: BUG: soft lockup - CPU#0 stuck for 123s! [syz-executor.5:6928] Modules linked in: irq event stamp: 45064 hardirqs last enabled at (45063): [<ffffffff88000c42>] asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:587 hardirqs last disabled at (45064): [<ffffffff87e4f2ed>] idtentry_enter_cond_rcu+0x1d/0x50 arch/x86/entry/common.c:649 softirqs last enabled at (44672): [<ffffffff88200748>] __do_softirq+0x748/0xa60 kernel/softirq.c:319 softirqs last disabled at (44667): [<ffffffff88000f0f>] asm_call_on_stack+0xf/0x20 arch/x86/entry/entry_64.S:711 CPU: 0 PID: 6928 Comm: syz-executor.5 Not tainted 5.8.0-rc4-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:csd_lock_wait kernel/smp.c:108 [inline] RIP: 0010:smp_call_function_single+0x197/0x4f0 kernel/smp.c:382 Code: 48 8d 74 24 40 48 89 44 24 50 48 8b 44 24 08 48 89 44 24 58 e8 fa f9 ff ff 41 89 c5 eb 07 e8 90 00 0b 00 f3 90 44 8b 64 24 48 <31> ff 41 83 e4 01 44 89 e6 e8 fb fc 0a 00 45 85 e4 75 e1 e8 71 00 RSP: 0018:ffffc90008b27aa0 EFLAGS: 00000293 RAX: 0000000000000000 RBX: 1ffff92001164f58 RCX: ffffffff8168b985 RDX: ffff88805cd7e4c0 RSI: ffffffff8168b970 RDI: 0000000000000005 RBP: ffffc90008b27b68 R08: 0000000000000001 R09: ffff8880ae736dc7 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000011 R13: 0000000000000000 R14: 0000000000000001 R15: ffffc90008b27c88 FS: 00000000014c3940(0000) GS:ffff8880ae600000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000556f55a93ed3 CR3: 0000000213330000 CR4: 00000000001406f0 DR0: 0000000020000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600 Call Trace: task_function_call+0xd7/0x160 kernel/events/core.c:116 event_function_call+0x20a/0x420 kernel/events/core.c:283 perf_remove_from_context+0x9b/0x1c0 kernel/events/core.c:2338 perf_event_release_kernel+0x485/0xe00 kernel/events/core.c:4918 perf_release+0x33/0x40 kernel/events/core.c:5014 __fput+0x33c/0x880 fs/file_table.c:281 task_work_run+0xdd/0x190 kernel/task_work.c:135 tracehook_notify_resume include/linux/tracehook.h:188 [inline] exit_to_usermode_loop arch/x86/entry/common.c:239 [inline] __prepare_exit_to_usermode+0x1e9/0x1f0 arch/x86/entry/common.c:269 do_syscall_64+0x6c/0xe0 arch/x86/entry/common.c:393 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x416721 Code: Bad RIP value. RSP: 002b:00007ffdb12d68f0 EFLAGS: 00000293 ORIG_RAX: 0000000000000003 RAX: 0000000000000000 RBX: 0000000000000005 RCX: 0000000000416721 RDX: 0000000000000000 RSI: 0000000000000e0d RDI: 0000000000000004 RBP: 0000000000000001 R08: 00000000569bce0a R09: 00000000569bce0e R10: 00007ffdb12d69e0 R11: 0000000000000293 R12: 000000000078c900 R13: 000000000078c900 R14: ffffffffffffffff R15: 000000000078bf0c Sending NMI from CPU 0 to CPUs 1: NMI backtrace for cpu 1 CPU: 1 PID: 6932 Comm: syz-executor.5 Not tainted 5.8.0-rc4-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:hlock_class kernel/locking/lockdep.c:179 [inline] RIP: 0010:mark_lock+0xc4/0x1710 kernel/locking/lockdep.c:3937 Code: 20 66 81 e3 ff 1f 0f b7 db be 08 00 00 00 48 89 d8 48 c1 f8 06 48 8d 3c c5 20 3a 59 8c e8 e4 6d 59 00 48 0f a3 1d 6c b6 ff 0a <0f> 83 c1 00 00 00 48 69 db b8 00 00 00 48 81 c3 40 3e 59 8c 48 8d RSP: 0018:ffffc90000da8a48 EFLAGS: 00000047 RAX: 0000000000000001 RBX: 0000000000000000 RCX: ffffffff815983ac RDX: fffffbfff18b2745 RSI: 0000000000000008 RDI: ffffffff8c593a20 RBP: ffff8880336c0540 R08: 0000000000000000 R09: ffffffff8c593a27 R10: fffffbfff18b2744 R11: 0000000000000001 R12: 0000000000000008 R13: ffff8880336c0e5a R14: ffff8880336c0e38 R15: ffff8880336c0e10 FS: 00007fd045d50700(0000) GS:ffff8880ae700000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000012ec08cc CR3: 0000000213330000 CR4: 00000000001406e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600 Call Trace: <IRQ> mark_usage kernel/locking/lockdep.c:3885 [inline] __lock_acquire+0x944/0x56e0 kernel/locking/lockdep.c:4334 lock_acquire+0x1f1/0xad0 kernel/locking/lockdep.c:4959 __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline] _raw_spin_lock_irqsave+0x8c/0xc0 kernel/locking/spinlock.c:159 debug_object_deactivate lib/debugobjects.c:710 [inline] debug_object_deactivate+0x101/0x300 lib/debugobjects.c:698 debug_hrtimer_deactivate kernel/time/hrtimer.c:421 [inline] debug_deactivate kernel/time/hrtimer.c:482 [inline] __run_hrtimer kernel/time/hrtimer.c:1488 [inline] __hrtimer_run_queues+0x3cb/0xfc0 kernel/time/hrtimer.c:1584 hrtimer_interrupt+0x32a/0x930 kernel/time/hrtimer.c:1646 local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1080 [inline] __sysvec_apic_timer_interrupt+0x142/0x5e0 arch/x86/kernel/apic/apic.c:1097 asm_call_on_stack+0xf/0x20 arch/x86/entry/entry_64.S:711 </IRQ> __run_on_irqstack arch/x86/include/asm/irq_stack.h:22 [inline] run_on_irqstack_cond arch/x86/include/asm/irq_stack.h:48 [inline] sysvec_apic_timer_interrupt+0xe0/0x120 arch/x86/kernel/apic/apic.c:1091 asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:587 RIP: 0010:arch_static_branch arch/x86/include/asm/jump_label.h:25 [inline] RIP: 0010:static_key_false include/linux/jump_label.h:200 [inline] RIP: 0010:native_write_msr+0x6/0x20 arch/x86/include/asm/msr.h:164 Code: c3 89 ef e8 0c 0b f9 ff b8 40 00 00 00 0f 00 d8 89 ef e8 dd 0b f9 ff eb d2 e8 96 6a b2 06 66 0f 1f 44 00 00 89 f9 89 f0 0f 30 <0f> 1f 44 00 00 c3 48 c1 e2 20 89 f6 48 09 d6 31 d2 e9 f4 7c 7a 02 RSP: 0018:ffffc90008b87970 EFLAGS: 00000282 RAX: 00000000000000fb RBX: ffff8880ae619374 RCX: 0000000000000830 RDX: 0000000000000000 RSI: 00000000000000fb RDI: 0000000000000830 RBP: 00000000000000fb R08: 0000000000000001 R09: ffff8880ae636dc7 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000040 paravirt_write_msr arch/x86/include/asm/paravirt.h:173 [inline] wrmsrl arch/x86/include/asm/paravirt.h:206 [inline] native_x2apic_icr_write arch/x86/include/asm/apic.h:249 [inline] __x2apic_send_IPI_dest arch/x86/kernel/apic/x2apic_phys.c:112 [inline] x2apic_send_IPI+0xb5/0x100 arch/x86/kernel/apic/x2apic_phys.c:41 __smp_call_single_queue kernel/smp.c:150 [inline] generic_exec_single+0x118/0x480 kernel/smp.c:181 smp_call_function_single+0x186/0x4f0 kernel/smp.c:379 smp_call_function_many_cond+0x1a4/0x990 kernel/smp.c:518 smp_call_function_many kernel/smp.c:577 [inline] smp_call_function kernel/smp.c:599 [inline] on_each_cpu+0x4a/0x240 kernel/smp.c:699 clock_was_set+0x18/0x20 kernel/time/hrtimer.c:872 timekeeping_inject_offset+0x3e9/0x4d0 kernel/time/timekeeping.c:1305 do_adjtimex+0x28f/0x990 kernel/time/timekeeping.c:2332 do_clock_adjtime kernel/time/posix-timers.c:1109 [inline] __do_sys_clock_adjtime+0x155/0x250 kernel/time/posix-timers.c:1121 do_syscall_64+0x60/0xe0 arch/x86/entry/common.c:384 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x45cba9 Code: Bad RIP value. RSP: 002b:00007fd045d4fc78 EFLAGS: 00000246 ORIG_RAX: 0000000000000131 RAX: ffffffffffffffda RBX: 00000000004dbac0 RCX: 000000000045cba9 RDX: 0000000000000000 RSI: 0000000020000300 RDI: 0000000000000000 RBP: 000000000078bf00 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff R13: 0000000000000077 R14: 00000000004c34f3 R15: 00007fd045d506d4 --- This bug is generated by a bot. It may contain errors. See https://goo.gl/tpsmEJ for more information about syzbot. syzbot engineers can be reached at syzkaller@googlegroups.com. syzbot will keep track of this bug report. See: https://goo.gl/tpsmEJ#status for how to communicate with syzbot. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: BUG: soft lockup in smp_call_function 2020-07-12 23:02 BUG: soft lockup in smp_call_function syzbot @ 2020-07-13 6:53 ` Dmitry Vyukov 2020-07-13 8:58 ` syzbot 2020-07-15 5:43 ` syzbot 2 siblings, 0 replies; 12+ messages in thread From: Dmitry Vyukov @ 2020-07-13 6:53 UTC (permalink / raw) To: syzbot Cc: Borislav Petkov, H. Peter Anvin, LKML, Andy Lutomirski, Ingo Molnar, syzkaller-bugs, Thomas Gleixner, the arch/x86 maintainers On Mon, Jul 13, 2020 at 1:02 AM syzbot <syzbot+cce3691658bef1b12ac9@syzkaller.appspotmail.com> wrote: > > Hello, > > syzbot found the following crash on: > > HEAD commit: 4437dd6e Merge tag 'io_uring-5.8-2020-07-12' of git://git... > git tree: upstream > console output: https://syzkaller.appspot.com/x/log.txt?x=17183f0b100000 > kernel config: https://syzkaller.appspot.com/x/.config?x=66ad203c2bb6d8b > dashboard link: https://syzkaller.appspot.com/bug?extid=cce3691658bef1b12ac9 > compiler: gcc (GCC) 10.1.0-syz 20200507 > > Unfortunately, I don't have any reproducer for this crash yet. > > IMPORTANT: if you fix the bug, please add the following tag to the commit: > Reported-by: syzbot+cce3691658bef1b12ac9@syzkaller.appspotmail.com It looks like something broken in the kernel recently and now instead of diagnosing a stall on one CPU, it diagnoses it as a stall in smp_call_function on another CPU. This produces a large number of actionable crash reports. > watchdog: BUG: soft lockup - CPU#0 stuck for 123s! [syz-executor.5:6928] > Modules linked in: > irq event stamp: 45064 > hardirqs last enabled at (45063): [<ffffffff88000c42>] asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:587 > hardirqs last disabled at (45064): [<ffffffff87e4f2ed>] idtentry_enter_cond_rcu+0x1d/0x50 arch/x86/entry/common.c:649 > softirqs last enabled at (44672): [<ffffffff88200748>] __do_softirq+0x748/0xa60 kernel/softirq.c:319 > softirqs last disabled at (44667): [<ffffffff88000f0f>] asm_call_on_stack+0xf/0x20 arch/x86/entry/entry_64.S:711 > CPU: 0 PID: 6928 Comm: syz-executor.5 Not tainted 5.8.0-rc4-syzkaller #0 > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 > RIP: 0010:csd_lock_wait kernel/smp.c:108 [inline] > RIP: 0010:smp_call_function_single+0x197/0x4f0 kernel/smp.c:382 > Code: 48 8d 74 24 40 48 89 44 24 50 48 8b 44 24 08 48 89 44 24 58 e8 fa f9 ff ff 41 89 c5 eb 07 e8 90 00 0b 00 f3 90 44 8b 64 24 48 <31> ff 41 83 e4 01 44 89 e6 e8 fb fc 0a 00 45 85 e4 75 e1 e8 71 00 > RSP: 0018:ffffc90008b27aa0 EFLAGS: 00000293 > RAX: 0000000000000000 RBX: 1ffff92001164f58 RCX: ffffffff8168b985 > RDX: ffff88805cd7e4c0 RSI: ffffffff8168b970 RDI: 0000000000000005 > RBP: ffffc90008b27b68 R08: 0000000000000001 R09: ffff8880ae736dc7 > R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000011 > R13: 0000000000000000 R14: 0000000000000001 R15: ffffc90008b27c88 > FS: 00000000014c3940(0000) GS:ffff8880ae600000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 0000556f55a93ed3 CR3: 0000000213330000 CR4: 00000000001406f0 > DR0: 0000000020000000 DR1: 0000000000000000 DR2: 0000000000000000 > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600 > Call Trace: > task_function_call+0xd7/0x160 kernel/events/core.c:116 > event_function_call+0x20a/0x420 kernel/events/core.c:283 > perf_remove_from_context+0x9b/0x1c0 kernel/events/core.c:2338 > perf_event_release_kernel+0x485/0xe00 kernel/events/core.c:4918 > perf_release+0x33/0x40 kernel/events/core.c:5014 > __fput+0x33c/0x880 fs/file_table.c:281 > task_work_run+0xdd/0x190 kernel/task_work.c:135 > tracehook_notify_resume include/linux/tracehook.h:188 [inline] > exit_to_usermode_loop arch/x86/entry/common.c:239 [inline] > __prepare_exit_to_usermode+0x1e9/0x1f0 arch/x86/entry/common.c:269 > do_syscall_64+0x6c/0xe0 arch/x86/entry/common.c:393 > entry_SYSCALL_64_after_hwframe+0x44/0xa9 > RIP: 0033:0x416721 > Code: Bad RIP value. > RSP: 002b:00007ffdb12d68f0 EFLAGS: 00000293 ORIG_RAX: 0000000000000003 > RAX: 0000000000000000 RBX: 0000000000000005 RCX: 0000000000416721 > RDX: 0000000000000000 RSI: 0000000000000e0d RDI: 0000000000000004 > RBP: 0000000000000001 R08: 00000000569bce0a R09: 00000000569bce0e > R10: 00007ffdb12d69e0 R11: 0000000000000293 R12: 000000000078c900 > R13: 000000000078c900 R14: ffffffffffffffff R15: 000000000078bf0c > Sending NMI from CPU 0 to CPUs 1: > NMI backtrace for cpu 1 > CPU: 1 PID: 6932 Comm: syz-executor.5 Not tainted 5.8.0-rc4-syzkaller #0 > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 > RIP: 0010:hlock_class kernel/locking/lockdep.c:179 [inline] > RIP: 0010:mark_lock+0xc4/0x1710 kernel/locking/lockdep.c:3937 > Code: 20 66 81 e3 ff 1f 0f b7 db be 08 00 00 00 48 89 d8 48 c1 f8 06 48 8d 3c c5 20 3a 59 8c e8 e4 6d 59 00 48 0f a3 1d 6c b6 ff 0a <0f> 83 c1 00 00 00 48 69 db b8 00 00 00 48 81 c3 40 3e 59 8c 48 8d > RSP: 0018:ffffc90000da8a48 EFLAGS: 00000047 > RAX: 0000000000000001 RBX: 0000000000000000 RCX: ffffffff815983ac > RDX: fffffbfff18b2745 RSI: 0000000000000008 RDI: ffffffff8c593a20 > RBP: ffff8880336c0540 R08: 0000000000000000 R09: ffffffff8c593a27 > R10: fffffbfff18b2744 R11: 0000000000000001 R12: 0000000000000008 > R13: ffff8880336c0e5a R14: ffff8880336c0e38 R15: ffff8880336c0e10 > FS: 00007fd045d50700(0000) GS:ffff8880ae700000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 0000000012ec08cc CR3: 0000000213330000 CR4: 00000000001406e0 > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600 > Call Trace: > <IRQ> > mark_usage kernel/locking/lockdep.c:3885 [inline] > __lock_acquire+0x944/0x56e0 kernel/locking/lockdep.c:4334 > lock_acquire+0x1f1/0xad0 kernel/locking/lockdep.c:4959 > __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline] > _raw_spin_lock_irqsave+0x8c/0xc0 kernel/locking/spinlock.c:159 > debug_object_deactivate lib/debugobjects.c:710 [inline] > debug_object_deactivate+0x101/0x300 lib/debugobjects.c:698 > debug_hrtimer_deactivate kernel/time/hrtimer.c:421 [inline] > debug_deactivate kernel/time/hrtimer.c:482 [inline] > __run_hrtimer kernel/time/hrtimer.c:1488 [inline] > __hrtimer_run_queues+0x3cb/0xfc0 kernel/time/hrtimer.c:1584 > hrtimer_interrupt+0x32a/0x930 kernel/time/hrtimer.c:1646 > local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1080 [inline] > __sysvec_apic_timer_interrupt+0x142/0x5e0 arch/x86/kernel/apic/apic.c:1097 > asm_call_on_stack+0xf/0x20 arch/x86/entry/entry_64.S:711 > </IRQ> > __run_on_irqstack arch/x86/include/asm/irq_stack.h:22 [inline] > run_on_irqstack_cond arch/x86/include/asm/irq_stack.h:48 [inline] > sysvec_apic_timer_interrupt+0xe0/0x120 arch/x86/kernel/apic/apic.c:1091 > asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:587 > RIP: 0010:arch_static_branch arch/x86/include/asm/jump_label.h:25 [inline] > RIP: 0010:static_key_false include/linux/jump_label.h:200 [inline] > RIP: 0010:native_write_msr+0x6/0x20 arch/x86/include/asm/msr.h:164 > Code: c3 89 ef e8 0c 0b f9 ff b8 40 00 00 00 0f 00 d8 89 ef e8 dd 0b f9 ff eb d2 e8 96 6a b2 06 66 0f 1f 44 00 00 89 f9 89 f0 0f 30 <0f> 1f 44 00 00 c3 48 c1 e2 20 89 f6 48 09 d6 31 d2 e9 f4 7c 7a 02 > RSP: 0018:ffffc90008b87970 EFLAGS: 00000282 > RAX: 00000000000000fb RBX: ffff8880ae619374 RCX: 0000000000000830 > RDX: 0000000000000000 RSI: 00000000000000fb RDI: 0000000000000830 > RBP: 00000000000000fb R08: 0000000000000001 R09: ffff8880ae636dc7 > R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 > R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000040 > paravirt_write_msr arch/x86/include/asm/paravirt.h:173 [inline] > wrmsrl arch/x86/include/asm/paravirt.h:206 [inline] > native_x2apic_icr_write arch/x86/include/asm/apic.h:249 [inline] > __x2apic_send_IPI_dest arch/x86/kernel/apic/x2apic_phys.c:112 [inline] > x2apic_send_IPI+0xb5/0x100 arch/x86/kernel/apic/x2apic_phys.c:41 > __smp_call_single_queue kernel/smp.c:150 [inline] > generic_exec_single+0x118/0x480 kernel/smp.c:181 > smp_call_function_single+0x186/0x4f0 kernel/smp.c:379 > smp_call_function_many_cond+0x1a4/0x990 kernel/smp.c:518 > smp_call_function_many kernel/smp.c:577 [inline] > smp_call_function kernel/smp.c:599 [inline] > on_each_cpu+0x4a/0x240 kernel/smp.c:699 > clock_was_set+0x18/0x20 kernel/time/hrtimer.c:872 > timekeeping_inject_offset+0x3e9/0x4d0 kernel/time/timekeeping.c:1305 > do_adjtimex+0x28f/0x990 kernel/time/timekeeping.c:2332 > do_clock_adjtime kernel/time/posix-timers.c:1109 [inline] > __do_sys_clock_adjtime+0x155/0x250 kernel/time/posix-timers.c:1121 > do_syscall_64+0x60/0xe0 arch/x86/entry/common.c:384 > entry_SYSCALL_64_after_hwframe+0x44/0xa9 > RIP: 0033:0x45cba9 > Code: Bad RIP value. > RSP: 002b:00007fd045d4fc78 EFLAGS: 00000246 ORIG_RAX: 0000000000000131 > RAX: ffffffffffffffda RBX: 00000000004dbac0 RCX: 000000000045cba9 > RDX: 0000000000000000 RSI: 0000000020000300 RDI: 0000000000000000 > RBP: 000000000078bf00 R08: 0000000000000000 R09: 0000000000000000 > R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff > R13: 0000000000000077 R14: 00000000004c34f3 R15: 00007fd045d506d4 > > > --- > This bug is generated by a bot. It may contain errors. > See https://goo.gl/tpsmEJ for more information about syzbot. > syzbot engineers can be reached at syzkaller@googlegroups.com. > > syzbot will keep track of this bug report. See: > https://goo.gl/tpsmEJ#status for how to communicate with syzbot. > > -- > You received this message because you are subscribed to the Google Groups "syzkaller-bugs" group. > To unsubscribe from this group and stop receiving emails from it, send an email to syzkaller-bugs+unsubscribe@googlegroups.com. > To view this discussion on the web visit https://groups.google.com/d/msgid/syzkaller-bugs/000000000000080a5f05aa4691a8%40google.com. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: BUG: soft lockup in smp_call_function 2020-07-12 23:02 BUG: soft lockup in smp_call_function syzbot 2020-07-13 6:53 ` Dmitry Vyukov @ 2020-07-13 8:58 ` syzbot 2020-07-15 5:43 ` syzbot 2 siblings, 0 replies; 12+ messages in thread From: syzbot @ 2020-07-13 8:58 UTC (permalink / raw) To: bp, dvyukov, hpa, linux-kernel, luto, mingo, syzkaller-bugs, tglx, x86 syzbot has found a reproducer for the following crash on: HEAD commit: 4437dd6e Merge tag 'io_uring-5.8-2020-07-12' of git://git... git tree: upstream console output: https://syzkaller.appspot.com/x/log.txt?x=15998b47100000 kernel config: https://syzkaller.appspot.com/x/.config?x=66ad203c2bb6d8b dashboard link: https://syzkaller.appspot.com/bug?extid=cce3691658bef1b12ac9 compiler: gcc (GCC) 10.1.0-syz 20200507 syz repro: https://syzkaller.appspot.com/x/repro.syz?x=11a160bf100000 C reproducer: https://syzkaller.appspot.com/x/repro.c?x=16d2bd77100000 IMPORTANT: if you fix the bug, please add the following tag to the commit: Reported-by: syzbot+cce3691658bef1b12ac9@syzkaller.appspotmail.com watchdog: BUG: soft lockup - CPU#1 stuck for 123s! [syz-executor644:6902] Modules linked in: irq event stamp: 173384 hardirqs last enabled at (173383): [<ffffffff88000c42>] asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:587 hardirqs last disabled at (173384): [<ffffffff87e4f2ed>] idtentry_enter_cond_rcu+0x1d/0x50 arch/x86/entry/common.c:649 softirqs last enabled at (172872): [<ffffffff88200748>] __do_softirq+0x748/0xa60 kernel/softirq.c:319 softirqs last disabled at (172863): [<ffffffff88000f0f>] asm_call_on_stack+0xf/0x20 arch/x86/entry/entry_64.S:711 CPU: 1 PID: 6902 Comm: syz-executor644 Not tainted 5.8.0-rc4-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:csd_lock_wait kernel/smp.c:108 [inline] RIP: 0010:smp_call_function_single+0x192/0x4f0 kernel/smp.c:382 Code: 10 8b 7c 24 1c 48 8d 74 24 40 48 89 44 24 50 48 8b 44 24 08 48 89 44 24 58 e8 fa f9 ff ff 41 89 c5 eb 07 e8 90 00 0b 00 f3 90 <44> 8b 64 24 48 31 ff 41 83 e4 01 44 89 e6 e8 fb fc 0a 00 45 85 e4 RSP: 0018:ffffc900023a7a60 EFLAGS: 00000293 RAX: 0000000000000000 RBX: 1ffff92000474f50 RCX: ffffffff8168b985 RDX: ffff8880955564c0 RSI: ffffffff8168b970 RDI: 0000000000000005 RBP: ffffc900023a7b28 R08: 0000000000000001 R09: ffff8880ae636dc7 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001 R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000040 FS: 0000000000000000(0000) GS:ffff8880ae700000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000004c8aa8 CR3: 0000000009a79000 CR4: 00000000001406e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: smp_call_function_many_cond+0x1a4/0x990 kernel/smp.c:518 __flush_tlb_others arch/x86/include/asm/paravirt.h:74 [inline] flush_tlb_others arch/x86/mm/tlb.c:856 [inline] flush_tlb_mm_range+0x295/0x4d0 arch/x86/mm/tlb.c:943 tlb_flush arch/x86/include/asm/tlb.h:24 [inline] tlb_flush_mmu_tlbonly include/asm-generic/tlb.h:425 [inline] tlb_flush_mmu_tlbonly include/asm-generic/tlb.h:415 [inline] tlb_flush_mmu mm/mmu_gather.c:248 [inline] tlb_finish_mmu+0x3e7/0x8c0 mm/mmu_gather.c:328 exit_mmap+0x2d1/0x510 mm/mmap.c:3152 __mmput+0x122/0x470 kernel/fork.c:1093 mmput+0x53/0x60 kernel/fork.c:1114 exit_mm kernel/exit.c:482 [inline] do_exit+0xa8f/0x2a40 kernel/exit.c:792 do_group_exit+0x125/0x310 kernel/exit.c:903 __do_sys_exit_group kernel/exit.c:914 [inline] __se_sys_exit_group kernel/exit.c:912 [inline] __x64_sys_exit_group+0x3a/0x50 kernel/exit.c:912 do_syscall_64+0x60/0xe0 arch/x86/entry/common.c:384 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x442478 Code: Bad RIP value. RSP: 002b:00007ffeb3621f28 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7 RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 0000000000442478 RDX: 0000000000000001 RSI: 000000000000003c RDI: 0000000000000001 RBP: 00000000004c8a70 R08: 00000000000000e7 R09: ffffffffffffffd0 R10: 0000000001bbbbbb R11: 0000000000000246 R12: 0000000000000001 R13: 00000000006dba40 R14: 0000000000000000 R15: 0000000000000000 Sending NMI from CPU 1 to CPUs 0: NMI backtrace for cpu 0 CPU: 0 PID: 7110 Comm: kworker/0:0 Not tainted 5.8.0-rc4-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Workqueue: wg-crypt-wg1 wg_packet_tx_worker RIP: 0010:__lock_is_held kernel/locking/lockdep.c:4733 [inline] RIP: 0010:lock_is_held_type+0x72/0xe0 kernel/locking/lockdep.c:4996 Code: 00 00 85 c0 7f 0d eb 72 83 c3 01 3b 9d c8 08 00 00 7d 67 48 63 c3 4c 89 e6 48 8d 04 80 4d 8d 74 c5 00 4c 89 f7 e8 1e fe ff ff <85> c0 74 da 8b 4c 24 04 41 bc 01 00 00 00 83 f9 ff 74 11 41 0f b6 RSP: 0018:ffffc90000007d38 EFLAGS: 00000046 RAX: 0000000000000001 RBX: 0000000000000006 RCX: ffffffff86626577 RDX: dffffc0000000000 RSI: ffff888093ce4300 RDI: ffff888094a20c00 RBP: ffff888094a20240 R08: 0000000000000001 R09: 0000000000000003 R10: 0000000000000000 R11: 0000000000000000 R12: ffff888093ce4300 R13: ffff888094a20b10 R14: ffff888094a20c00 R15: 0000000000000082 FS: 0000000000000000(0000) GS:ffff8880ae600000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000020000610 CR3: 000000009e49a000 CR4: 00000000001406f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <IRQ> lock_is_held include/linux/lockdep.h:425 [inline] advance_sched+0x638/0x990 net/sched/sch_taprio.c:702 __run_hrtimer kernel/time/hrtimer.c:1520 [inline] __hrtimer_run_queues+0x6a9/0xfc0 kernel/time/hrtimer.c:1584 hrtimer_interrupt+0x32a/0x930 kernel/time/hrtimer.c:1646 local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1080 [inline] __sysvec_apic_timer_interrupt+0x142/0x5e0 arch/x86/kernel/apic/apic.c:1097 asm_call_on_stack+0xf/0x20 arch/x86/entry/entry_64.S:711 </IRQ> __run_on_irqstack arch/x86/include/asm/irq_stack.h:22 [inline] run_on_irqstack_cond arch/x86/include/asm/irq_stack.h:48 [inline] sysvec_apic_timer_interrupt+0xe0/0x120 arch/x86/kernel/apic/apic.c:1091 asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:587 RIP: 0010:arch_local_irq_restore arch/x86/include/asm/paravirt.h:765 [inline] RIP: 0010:seqcount_lockdep_reader_access include/linux/seqlock.h:99 [inline] RIP: 0010:read_seqcount_begin.constprop.0+0x110/0x1f0 include/linux/seqlock.h:182 Code: 00 fc ff df 48 c1 e8 03 80 3c 10 00 0f 85 d0 00 00 00 48 83 3d 90 ff 44 03 00 0f 84 92 00 00 00 e8 05 d9 03 fb 48 89 df 57 9d <0f> 1f 44 00 00 e8 f6 d8 03 fb 44 8b 25 6f 4a 3f 04 31 ff 44 89 e3 RSP: 0018:ffffc90001b67630 EFLAGS: 00000293 RAX: 0000000000000000 RBX: 0000000000000293 RCX: 0000000000000003 RDX: ffff888094a20240 RSI: ffffffff866fe0fb RDI: 0000000000000293 RBP: 0000000000000200 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000000 R12: ffff8880941ed2c0 R13: 0000000011108e1c R14: 0000000000000000 R15: dffffc0000000000 nf_conntrack_get_ht include/net/netfilter/nf_conntrack.h:300 [inline] ____nf_conntrack_find net/netfilter/nf_conntrack_core.c:740 [inline] __nf_conntrack_find_get+0xcb/0xf30 net/netfilter/nf_conntrack_core.c:778 resolve_normal_ct net/netfilter/nf_conntrack_core.c:1684 [inline] nf_conntrack_in+0x484/0x12a0 net/netfilter/nf_conntrack_core.c:1846 ipv4_conntrack_local+0x11c/0x220 net/netfilter/nf_conntrack_proto.c:200 nf_hook_entry_hookfn include/linux/netfilter.h:135 [inline] nf_hook_slow+0xc5/0x1e0 net/netfilter/core.c:512 nf_hook+0x2b8/0x540 include/linux/netfilter.h:262 __ip_local_out+0x26e/0x530 net/ipv4/ip_output.c:114 ip_local_out+0x26/0x1a0 net/ipv4/ip_output.c:123 iptunnel_xmit+0x5a0/0x9b0 net/ipv4/ip_tunnel_core.c:81 send4+0x4c6/0xd70 drivers/net/wireguard/socket.c:85 wg_socket_send_skb_to_peer+0x192/0x220 drivers/net/wireguard/socket.c:174 wg_packet_create_data_done drivers/net/wireguard/send.c:252 [inline] wg_packet_tx_worker+0x2f9/0x980 drivers/net/wireguard/send.c:280 process_one_work+0x94c/0x1670 kernel/workqueue.c:2269 worker_thread+0x64c/0x1120 kernel/workqueue.c:2415 kthread+0x3b5/0x4a0 kernel/kthread.c:291 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:293 INFO: NMI handler (nmi_cpu_backtrace_handler) took too long to run: 0.000 msecs ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: BUG: soft lockup in smp_call_function 2020-07-12 23:02 BUG: soft lockup in smp_call_function syzbot 2020-07-13 6:53 ` Dmitry Vyukov 2020-07-13 8:58 ` syzbot @ 2020-07-15 5:43 ` syzbot 2 siblings, 0 replies; 12+ messages in thread From: syzbot @ 2020-07-15 5:43 UTC (permalink / raw) To: bp, davem, dvyukov, hpa, jhs, jiri, keescook, linux-kernel, luto, luto, mingo, netdev, syzkaller-bugs, tglx, vinicius.gomes, wad, x86, xiyou.wangcong syzbot has bisected this issue to: commit 5a781ccbd19e4664babcbe4b4ead7aa2b9283d22 Author: Vinicius Costa Gomes <vinicius.gomes@intel.com> Date: Sat Sep 29 00:59:43 2018 +0000 tc: Add support for configuring the taprio scheduler bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=1196d06f100000 start commit: 4437dd6e Merge tag 'io_uring-5.8-2020-07-12' of git://git... git tree: upstream final oops: https://syzkaller.appspot.com/x/report.txt?x=1396d06f100000 console output: https://syzkaller.appspot.com/x/log.txt?x=1596d06f100000 kernel config: https://syzkaller.appspot.com/x/.config?x=66ad203c2bb6d8b dashboard link: https://syzkaller.appspot.com/bug?extid=cce3691658bef1b12ac9 syz repro: https://syzkaller.appspot.com/x/repro.syz?x=11a160bf100000 C reproducer: https://syzkaller.appspot.com/x/repro.c?x=16d2bd77100000 Reported-by: syzbot+cce3691658bef1b12ac9@syzkaller.appspotmail.com Fixes: 5a781ccbd19e ("tc: Add support for configuring the taprio scheduler") For information about bisection process see: https://goo.gl/tpsmEJ#bisection ^ permalink raw reply [flat|nested] 12+ messages in thread
* BUG: soft lockup in smp_call_function @ 2021-09-15 1:59 Hao Sun 2021-09-15 5:12 ` Zhang, Qiang 0 siblings, 1 reply; 12+ messages in thread From: Hao Sun @ 2021-09-15 1:59 UTC (permalink / raw) To: linux-kernel; +Cc: mingo Hello, When using Healer to fuzz the latest Linux kernel, the following two similar crashes was triggered. HEAD commit: 6880fa6c5660 Linux 5.15-rc1 git tree: upstream console output: https://drive.google.com/file/d/1W2g_mj5JPDXI5U7HibVepNdxMdChKS0_/view?usp=sharing kernel config: https://drive.google.com/file/d/1rUzyMbe5vcs6khA3tL9EHTLJvsUdWcgB/view?usp=sharing similar report: INFO: rcu detected stall in smp_call_function Sorry, I don't have a reproducer for this crash, hope the symbolized report can help. If you fix this issue, please add the following tag to the commit: Reported-by: Hao Sun <sunhao.th@gmail.com> watchdog: BUG: soft lockup - CPU#1 stuck for 157s! [kworker/1:8:14241] Modules linked in: irq event stamp: 371624 hardirqs last enabled at (371623): [<ffffffff84400c02>] asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:638 hardirqs last disabled at (371624): [<ffffffff8424818a>] sysvec_apic_timer_interrupt+0xa/0xc0 arch/x86/kernel/apic/apic.c:1097 softirqs last enabled at (337636): [<ffffffff84600398>] softirq_handle_end kernel/softirq.c:401 [inline] softirqs last enabled at (337636): [<ffffffff84600398>] __do_softirq+0x398/0x561 kernel/softirq.c:587 softirqs last disabled at (337625): [<ffffffff81235b42>] invoke_softirq kernel/softirq.c:432 [inline] softirqs last disabled at (337625): [<ffffffff81235b42>] __irq_exit_rcu kernel/softirq.c:636 [inline] softirqs last disabled at (337625): [<ffffffff81235b42>] irq_exit_rcu+0xe2/0x100 kernel/softirq.c:648 CPU: 1 PID: 14241 Comm: kworker/1:8 Not tainted 5.15.0-rc1 #16 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014 Workqueue: rcu_gp wait_rcu_exp_gp RIP: 0010:csd_lock_wait kernel/smp.c:440 [inline] RIP: 0010:csd_lock kernel/smp.c:446 [inline] RIP: 0010:smp_call_function_single+0x1a3/0x220 kernel/smp.c:745 Code: 48 c7 c3 80 a8 02 00 e8 cb b1 f1 02 89 c0 48 83 f8 07 77 6c 48 03 1c c5 80 98 62 85 8b 43 08 a8 01 74 0e e8 3f 9d 04 00 f3 90 <8b> 43 08 a8 01 75 f2 e8 31 9d 04 00 83 4b 08 01 48 89 de 44 89 e7 RSP: 0018:ffffc90001d4fd00 EFLAGS: 00000293 RAX: 0000000000000000 RBX: ffff88813dc2a880 RCX: 0000000000000000 RDX: ffff888112ac8000 RSI: ffffffff8132f4c1 RDI: ffffffff852d80e6 RBP: ffffc90001d4fd90 R08: 0000000000000001 R09: 0000000000000001 R10: ffffc90001d4fda8 R11: 0000000000000002 R12: 0000000000000002 R13: 0000000000000000 R14: ffffffff812ef7c0 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff88813dc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055a15807ccf0 CR3: 0000000025f6b000 CR4: 0000000000750ee0 PKRU: 55555554 Call Trace: sync_rcu_exp_select_node_cpus+0x27b/0x440 kernel/rcu/tree_exp.h:393 sync_rcu_exp_select_cpus+0x238/0x5d0 kernel/rcu/tree_exp.h:441 rcu_exp_sel_wait_wake kernel/rcu/tree_exp.h:615 [inline] wait_rcu_exp_gp+0xa/0x20 kernel/rcu/tree_exp.h:629 process_one_work+0x359/0x850 kernel/workqueue.c:2297 worker_thread+0x41/0x4d0 kernel/workqueue.c:2444 kthread+0x178/0x1b0 kernel/kthread.c:319 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295 Sending NMI from CPU 1 to CPUs 0,2-3: NMI backtrace for cpu 0 CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.15.0-rc1 #16 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014 RIP: 0010:__lock_acquire+0x14f/0x1d60 kernel/locking/lockdep.c:4955 Code: 00 48 89 43 08 48 8b 04 24 48 89 43 18 41 8b b5 dc 09 00 00 65 8b 05 08 ed d5 7e 85 c0 0f 95 c2 31 c0 85 f6 44 89 fe 0f 95 c0 <83> e1 03 c1 e6 07 8d 14 50 0f b6 43 21 83 e2 03 c1 e2 05 83 e0 1f RSP: 0018:ffffc90000003c40 EFLAGS: 00000002 RAX: 0000000000000001 RBX: ffffffff858b2d70 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88813dd1a3d8 RBP: 0000000000000057 R08: 0000000000000001 R09: 0000000000000001 R10: ffff88813dd1a3d8 R11: 0000000000000003 R12: 0000000000000001 R13: ffffffff858b2300 R14: ffffffff858b2cf8 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055a1580bf6d7 CR3: 0000000025f6b000 CR4: 0000000000750ef0 PKRU: 55555554 Call Trace: <IRQ> lock_acquire+0x1f9/0x340 kernel/locking/lockdep.c:5625 __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline] _raw_spin_lock_irqsave+0x38/0x50 kernel/locking/spinlock.c:162 lock_hrtimer_base+0x2f/0x70 kernel/time/hrtimer.c:173 hrtimer_try_to_cancel+0x6d/0x270 kernel/time/hrtimer.c:1331 hrtimer_cancel+0x12/0x30 kernel/time/hrtimer.c:1443 __disable_vblank drivers/gpu/drm/drm_vblank.c:434 [inline] drm_vblank_disable_and_save+0xca/0x130 drivers/gpu/drm/drm_vblank.c:478 vblank_disable_fn+0x83/0xa0 drivers/gpu/drm/drm_vblank.c:495 call_timer_fn+0xcb/0x3f0 kernel/time/timer.c:1421 expire_timers kernel/time/timer.c:1466 [inline] __run_timers kernel/time/timer.c:1734 [inline] run_timer_softirq+0x6bd/0x820 kernel/time/timer.c:1747 __do_softirq+0xe9/0x561 kernel/softirq.c:558 invoke_softirq kernel/softirq.c:432 [inline] __irq_exit_rcu kernel/softirq.c:636 [inline] irq_exit_rcu+0xe2/0x100 kernel/softirq.c:648 sysvec_apic_timer_interrupt+0x9e/0xc0 arch/x86/kernel/apic/apic.c:1097 </IRQ> asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:638 RIP: 0010:default_idle+0xb/0x10 arch/x86/kernel/process.c:717 Code: 60 ce 25 84 75 09 48 83 c4 18 5b 5d 41 5c c3 e8 ab c4 fe ff cc cc cc cc cc cc cc cc cc cc cc eb 07 0f 00 2d ff 00 5c 00 fb f4 <c3> 0f 1f 40 00 65 48 8b 04 25 40 70 01 00 f0 80 48 02 20 48 8b 10 RSP: 0018:ffffffff85803e90 EFLAGS: 00000202 RAX: 00000000010c94ab RBX: 0000000000000000 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffffffff85524f3d RDI: ffffffff853cbec6 RBP: ffffffff860d9290 R08: 0000000000000001 R09: 0000000000000001 R10: ffffffff85803be8 R11: 0000000000000001 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: ffffffff858b2300 default_idle_call+0x6a/0x260 kernel/sched/idle.c:112 cpuidle_idle_call kernel/sched/idle.c:194 [inline] do_idle+0x1e9/0x2b0 kernel/sched/idle.c:306 cpu_startup_entry+0x14/0x20 kernel/sched/idle.c:403 start_kernel+0x839/0x860 init/main.c:1141 secondary_startup_64_no_verify+0xb0/0xbb NMI backtrace for cpu 2 CPU: 2 PID: 9794 Comm: sshd Not tainted 5.15.0-rc1 #16 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014 RIP: 0010:__sanitizer_cov_trace_pc+0x37/0x60 kernel/kcov.c:197 Code: 65 48 8b 14 25 40 70 01 00 81 e1 00 01 00 00 a9 00 01 ff 00 74 0e 85 c9 74 35 8b 82 34 15 00 00 85 c0 74 2b 8b 82 10 15 00 00 <83> f8 02 75 20 48 8b 8a 18 15 00 00 8b 92 14 15 00 00 48 8b 01 48 RSP: 0018:ffffc900008cfa80 EFLAGS: 00000246 RAX: 0000000000000000 RBX: ffff88807dc318e0 RCX: 0000000000000000 RDX: ffff88800e9c4480 RSI: ffffffff8132ebc0 RDI: 00000000ffffffff RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000001 R10: ffffc900008cfa30 R11: 0000000000000000 R12: ffff88807dd2a900 R13: ffff88807dd2a908 R14: ffffffff85a27620 R15: 0000000000000000 FS: 00007f831f8568c0(0000) GS:ffff88807dd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007ffd76d1df1f CR3: 000000000fe3c000 CR4: 0000000000750ee0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: rep_nop arch/x86/include/asm/vdso/processor.h:13 [inline] cpu_relax arch/x86/include/asm/vdso/processor.h:18 [inline] csd_lock_wait kernel/smp.c:440 [inline] smp_call_function_many_cond+0x1d0/0x550 kernel/smp.c:969 on_each_cpu_cond_mask+0x48/0x90 kernel/smp.c:1135 __purge_vmap_area_lazy+0xd7/0xa40 mm/vmalloc.c:1683 _vm_unmap_aliases+0x233/0x280 mm/vmalloc.c:2107 change_page_attr_set_clr+0xb3/0x340 arch/x86/mm/pat/set_memory.c:1740 change_page_attr_clear arch/x86/mm/pat/set_memory.c:1797 [inline] set_memory_ro+0x2b/0x40 arch/x86/mm/pat/set_memory.c:1943 bpf_jit_binary_lock_ro include/linux/filter.h:890 [inline] bpf_int_jit_compile+0x65d/0x6b0 arch/x86/net/bpf_jit_comp.c:2338 bpf_prog_select_runtime+0x173/0x1e0 kernel/bpf/core.c:1914 bpf_migrate_filter net/core/filter.c:1295 [inline] bpf_prepare_filter+0x40d/0x5d0 net/core/filter.c:1343 bpf_prog_create_from_user+0xcf/0x150 net/core/filter.c:1437 seccomp_prepare_filter kernel/seccomp.c:666 [inline] seccomp_prepare_user_filter kernel/seccomp.c:703 [inline] seccomp_set_mode_filter kernel/seccomp.c:1824 [inline] do_seccomp+0x2ca/0xdb0 kernel/seccomp.c:1944 prctl_set_seccomp+0x2f/0x60 kernel/seccomp.c:1997 __do_sys_prctl kernel/sys.c:2346 [inline] __se_sys_prctl kernel/sys.c:2264 [inline] __x64_sys_prctl+0xd4/0x770 kernel/sys.c:2264 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x34/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f831d9cd54a Code: 48 8b 0d 51 f9 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 9d 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 1e f9 2a 00 f7 d8 64 89 01 48 RSP: 002b:00007ffd76d1c428 EFLAGS: 00000246 ORIG_RAX: 000000000000009d RAX: ffffffffffffffda RBX: 00005597d3a36f20 RCX: 00007f831d9cd54a RDX: 00005597d32bdf50 RSI: 0000000000000002 RDI: 0000000000000016 RBP: 00005597d3a35d40 R08: 0000000000000000 R09: 0000000000000005 R10: 00007f831d9cd54a R11: 0000000000000246 R12: 0000000000000000 R13: 0000000000000028 R14: 0000000000000000 R15: 00007ffd76d1c870 NMI backtrace for cpu 3 CPU: 3 PID: 9782 Comm: syz-executor Not tainted 5.15.0-rc1 #16 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014 RIP: 0010:kvm_wait+0x4a/0x70 arch/x86/kernel/kvm.c:893 Code: 5d c3 89 f3 48 89 fd 9c 58 fa f6 c4 02 75 30 0f b6 45 00 38 c3 74 16 e8 94 82 25 00 fb 5b 5d c3 eb 07 0f 00 2d 95 c1 6d 03 f4 <5b> 5d c3 e8 7e 82 25 00 eb 07 0f 00 2d 83 c1 6d 03 fb f4 eb c0 e8 RSP: 0000:ffffc90000768e00 EFLAGS: 00000046 RAX: 0000000000000003 RBX: ffff888100998338 RCX: 0000000000000008 RDX: 0000000000000000 RSI: 0000000000000003 RDI: ffff888100998338 RBP: ffff88813dd2a600 R08: 0000000000000000 R09: 0000000000000001 R10: ffffc90000768d20 R11: 0000000000000002 R12: 0000000000000001 R13: 0000000000000100 R14: 0000000000000000 R15: 0000000000100000 FS: 00007f849523b700(0000) GS:ffff88813dd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000020400010 CR3: 0000000025f6b000 CR4: 0000000000750ee0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: <IRQ> pv_wait arch/x86/include/asm/paravirt.h:597 [inline] pv_wait_head_or_lock kernel/locking/qspinlock_paravirt.h:470 [inline] __pv_queued_spin_lock_slowpath+0x262/0x330 kernel/locking/qspinlock.c:508 pv_queued_spin_lock_slowpath arch/x86/include/asm/paravirt.h:585 [inline] queued_spin_lock_slowpath arch/x86/include/asm/qspinlock.h:51 [inline] queued_spin_lock include/asm-generic/qspinlock.h:85 [inline] do_raw_spin_lock+0xb6/0xc0 kernel/locking/spinlock_debug.c:115 spin_lock include/linux/spinlock.h:363 [inline] drm_handle_vblank+0x86/0x530 drivers/gpu/drm/drm_vblank.c:1951 vkms_vblank_simulate+0x5a/0x190 drivers/gpu/drm/vkms/vkms_crtc.c:29 __run_hrtimer kernel/time/hrtimer.c:1685 [inline] __hrtimer_run_queues+0xb8/0x610 kernel/time/hrtimer.c:1749 hrtimer_interrupt+0xfe/0x280 kernel/time/hrtimer.c:1811 local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1086 [inline] __sysvec_apic_timer_interrupt+0x9c/0x2c0 arch/x86/kernel/apic/apic.c:1103 sysvec_apic_timer_interrupt+0x99/0xc0 arch/x86/kernel/apic/apic.c:1097 </IRQ> asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:638 RIP: 0010:__raw_spin_unlock_irq include/linux/spinlock_api_smp.h:169 [inline] RIP: 0010:_raw_spin_unlock_irq+0x25/0x40 kernel/locking/spinlock.c:202 Code: 0f 1f 44 00 00 53 48 8b 74 24 08 48 89 fb 48 83 c7 18 e8 be ce 05 fd 48 89 df e8 f6 06 06 fd e8 e1 19 14 fd fb bf 01 00 00 00 <e8> 76 9c 01 fd 65 8b 05 df 93 db 7b 85 c0 74 02 5b c3 e8 9b 4c da RSP: 0000:ffffc90001503e18 EFLAGS: 00000202 RAX: 00000000000b619e RBX: ffff88813dd18240 RCX: 0000000000000100 RDX: 0000000000000000 RSI: ffffffff85524f3d RDI: 0000000000000001 RBP: ffffc90001503e50 R08: 0000000000000001 R09: 0000000000000001 R10: ffffc90001503e28 R11: 0000000000000001 R12: ffff88810d86c9c0 R13: ffffc90001503e50 R14: ffffffff839a5e10 R15: ffffffff860d9290 expire_timers kernel/time/timer.c:1465 [inline] __run_timers kernel/time/timer.c:1734 [inline] run_timer_softirq+0x6ae/0x820 kernel/time/timer.c:1747 __do_softirq+0xe9/0x561 kernel/softirq.c:558 invoke_softirq kernel/softirq.c:432 [inline] __irq_exit_rcu kernel/softirq.c:636 [inline] irq_exit_rcu+0xe2/0x100 kernel/softirq.c:648 sysvec_apic_timer_interrupt+0x52/0xc0 arch/x86/kernel/apic/apic.c:1097 asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:638 RIP: 0033:0x418dcf Code: 1f 84 00 00 00 00 00 0f 1f 40 00 64 8b 04 25 08 03 00 00 41 89 c3 41 83 cb 02 41 39 c3 74 17 f0 64 44 0f b1 1c 25 08 03 00 00 <75> e7 41 83 e3 bb 41 83 fb 0a 74 01 c3 48 83 ec 08 64 48 c7 04 25 RSP: 002b:00007f849523a9b8 EFLAGS: 00000246 RAX: 0000000000000000 RBX: 0000000000000003 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000003 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000002 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000003 R15: 00000000204413a0 ---------------- Code disassembly (best guess): 0: 48 c7 c3 80 a8 02 00 mov $0x2a880,%rbx 7: e8 cb b1 f1 02 callq 0x2f1b1d7 c: 89 c0 mov %eax,%eax e: 48 83 f8 07 cmp $0x7,%rax 12: 77 6c ja 0x80 14: 48 03 1c c5 80 98 62 add -0x7a9d6780(,%rax,8),%rbx 1b: 85 1c: 8b 43 08 mov 0x8(%rbx),%eax 1f: a8 01 test $0x1,%al 21: 74 0e je 0x31 23: e8 3f 9d 04 00 callq 0x49d67 28: f3 90 pause * 2a: 8b 43 08 mov 0x8(%rbx),%eax <-- trapping instruction 2d: a8 01 test $0x1,%al 2f: 75 f2 jne 0x23 31: e8 31 9d 04 00 callq 0x49d67 36: 83 4b 08 01 orl $0x1,0x8(%rbx) 3a: 48 89 de mov %rbx,%rsi 3d: 44 89 e7 mov %r12d,%edi ===== INFO: rcu detected stall in smp_call_function rcu: INFO: rcu_preempt self-detected stall on CPU rcu: 0-....: (3203 ticks this GP) idle=417/1/0x4000000000000000 softirq=25875/25875 fqs=485 (t=10500 jiffies g=37621 q=38) rcu: rcu_preempt kthread starved for 486 jiffies! g37621 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=3 rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior. rcu: RCU grace-period kthread stack dump: task:rcu_preempt state:R running task stack:13832 pid: 16 ppid: 2 flags:0x00004000 Call Trace: context_switch kernel/sched/core.c:4940 [inline] __schedule+0x323/0xae0 kernel/sched/core.c:6287 schedule+0x36/0xe0 kernel/sched/core.c:6366 schedule_timeout+0x2cb/0x430 kernel/time/timer.c:1881 rcu_gp_fqs_loop+0x2f7/0x3e0 kernel/rcu/tree.c:1957 rcu_gp_kthread+0xd0/0x160 kernel/rcu/tree.c:2130 kthread+0x178/0x1b0 kernel/kthread.c:319 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295 rcu: Stack dump where RCU GP kthread last ran: Sending NMI from CPU 0 to CPUs 3: NMI backtrace for cpu 3 CPU: 3 PID: 14220 Comm: kworker/3:10 Not tainted 5.15.0-rc1 #16 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014 Workqueue: rcu_gp wait_rcu_exp_gp RIP: 0010:csd_lock_wait kernel/smp.c:440 [inline] RIP: 0010:csd_lock kernel/smp.c:446 [inline] RIP: 0010:smp_call_function_single+0x1a6/0x220 kernel/smp.c:745 Code: 80 a8 02 00 e8 cb b1 f1 02 89 c0 48 83 f8 07 77 6c 48 03 1c c5 80 98 62 85 8b 43 08 a8 01 74 0e e8 3f 9d 04 00 f3 90 8b 43 08 <a8> 01 75 f2 e8 31 9d 04 00 83 4b 08 01 48 89 de 44 89 e7 4c 89 73 RSP: 0018:ffffc90005793d00 EFLAGS: 00000293 RAX: 0000000000000001 RBX: ffff88813dd2a880 RCX: 0000000000000000 RDX: ffff888103ddc480 RSI: ffffffff8132f4c1 RDI: ffffffff852d80e6 RBP: ffffc90005793d90 R08: 0000000000000001 R09: 0000000000000001 R10: ffffc90005793da8 R11: 0000000000000000 R12: 0000000000000002 R13: 0000000000000000 R14: ffffffff812ef7c0 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff88813dd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055b72e9c2cd8 CR3: 000000000dedc000 CR4: 0000000000750ee0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: sync_rcu_exp_select_node_cpus+0x27b/0x440 kernel/rcu/tree_exp.h:393 sync_rcu_exp_select_cpus+0x238/0x5d0 kernel/rcu/tree_exp.h:441 rcu_exp_sel_wait_wake kernel/rcu/tree_exp.h:615 [inline] wait_rcu_exp_gp+0xa/0x20 kernel/rcu/tree_exp.h:629 process_one_work+0x359/0x850 kernel/workqueue.c:2297 worker_thread+0x41/0x4d0 kernel/workqueue.c:2444 kthread+0x178/0x1b0 kernel/kthread.c:319 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295 NMI backtrace for cpu 0 CPU: 0 PID: 9 Comm: kworker/u9:0 Not tainted 5.15.0-rc1 #16 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014 Workqueue: events_unbound toggle_allocation_gate Call Trace: <IRQ> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x8d/0xcf lib/dump_stack.c:106 nmi_cpu_backtrace+0x1e9/0x210 lib/nmi_backtrace.c:105 nmi_trigger_cpumask_backtrace+0x120/0x180 lib/nmi_backtrace.c:62 trigger_single_cpu_backtrace include/linux/nmi.h:164 [inline] rcu_dump_cpu_stacks+0x125/0x17b kernel/rcu/tree_stall.h:343 print_cpu_stall kernel/rcu/tree_stall.h:627 [inline] check_cpu_stall kernel/rcu/tree_stall.h:711 [inline] rcu_pending kernel/rcu/tree.c:3880 [inline] rcu_sched_clock_irq+0xc20/0x1360 kernel/rcu/tree.c:2599 update_process_times+0xac/0x100 kernel/time/timer.c:1785 tick_sched_handle.isra.20+0x27/0x70 kernel/time/tick-sched.c:226 tick_sched_timer+0x8a/0xb0 kernel/time/tick-sched.c:1421 __run_hrtimer kernel/time/hrtimer.c:1685 [inline] __hrtimer_run_queues+0x23e/0x610 kernel/time/hrtimer.c:1749 hrtimer_interrupt+0xfe/0x280 kernel/time/hrtimer.c:1811 local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1086 [inline] __sysvec_apic_timer_interrupt+0x9c/0x2c0 arch/x86/kernel/apic/apic.c:1103 sysvec_apic_timer_interrupt+0x99/0xc0 arch/x86/kernel/apic/apic.c:1097 </IRQ> asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:638 RIP: 0010:csd_lock_wait kernel/smp.c:440 [inline] RIP: 0010:smp_call_function_many_cond+0x1d2/0x550 kernel/smp.c:969 Code: a6 04 00 4c 63 fd 49 8b 1c 24 49 83 ff 07 0f 87 31 03 00 00 4a 03 1c fd 80 98 62 85 8b 43 08 a8 01 74 0e e8 40 a6 04 00 f3 90 <8b> 43 08 a8 01 75 f2 e8 32 a6 04 00 eb ad 48 83 c4 40 5b 5d 41 5c RSP: 0018:ffffc9000067bc80 EFLAGS: 00000293 RAX: 0000000000000000 RBX: ffff88807dd2f9c0 RCX: 0000000000000000 RDX: ffff88800985c480 RSI: ffffffff8132ebc0 RDI: 0000000000000001 RBP: 0000000000000002 R08: 0000000000000001 R09: 0000000000000001 R10: ffffc9000067bc28 R11: 0000000000000000 R12: ffff88807dc2a900 R13: ffff88807dc2a908 R14: ffffffff85a27620 R15: 0000000000000002 on_each_cpu_cond_mask+0x48/0x90 kernel/smp.c:1135 on_each_cpu include/linux/smp.h:71 [inline] text_poke_sync arch/x86/kernel/alternative.c:929 [inline] text_poke_bp_batch+0xb6/0x2c0 arch/x86/kernel/alternative.c:1114 text_poke_flush arch/x86/kernel/alternative.c:1268 [inline] text_poke_finish+0x16/0x30 arch/x86/kernel/alternative.c:1275 arch_jump_label_transform_apply+0x13/0x20 arch/x86/kernel/jump_label.c:146 jump_label_update+0xbc/0x190 kernel/jump_label.c:830 static_key_enable_cpuslocked+0x77/0xb0 kernel/jump_label.c:177 static_key_enable+0x16/0x20 kernel/jump_label.c:190 toggle_allocation_gate+0x71/0x240 mm/kfence/core.c:626 process_one_work+0x359/0x850 kernel/workqueue.c:2297 worker_thread+0x41/0x4d0 kernel/workqueue.c:2444 kthread+0x178/0x1b0 kernel/kthread.c:319 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295 Sending NMI from CPU 0 to CPUs 2: ---------------- Code disassembly (best guess), 2 bytes skipped: 0: 02 00 add (%rax),%al 2: e8 cb b1 f1 02 callq 0x2f1b1d2 7: 89 c0 mov %eax,%eax 9: 48 83 f8 07 cmp $0x7,%rax d: 77 6c ja 0x7b f: 48 03 1c c5 80 98 62 add -0x7a9d6780(,%rax,8),%rbx 16: 85 17: 8b 43 08 mov 0x8(%rbx),%eax 1a: a8 01 test $0x1,%al 1c: 74 0e je 0x2c 1e: e8 3f 9d 04 00 callq 0x49d62 23: f3 90 pause 25: 8b 43 08 mov 0x8(%rbx),%eax * 28: a8 01 test $0x1,%al <-- trapping instruction 2a: 75 f2 jne 0x1e 2c: e8 31 9d 04 00 callq 0x49d62 31: 83 4b 08 01 orl $0x1,0x8(%rbx) 35: 48 89 de mov %rbx,%rsi 38: 44 89 e7 mov %r12d,%edi 3b: 4c rex.WR 3c: 89 .byte 0x89 3d: 73 .byte 0x73 ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: BUG: soft lockup in smp_call_function 2021-09-15 1:59 Hao Sun @ 2021-09-15 5:12 ` Zhang, Qiang 0 siblings, 0 replies; 12+ messages in thread From: Zhang, Qiang @ 2021-09-15 5:12 UTC (permalink / raw) To: Hao Sun, linux-kernel; +Cc: mingo Hi Sun It's like csd_lock cannot be acquired, causing the problem Enable CONFIG_CSD_LOCK_WAIT_DEBUG for more information. Thanks Qiang ________________________________________ From: Hao Sun <sunhao.th@gmail.com> Sent: Wednesday, 15 September 2021 09:59 To: linux-kernel@vger.kernel.org Cc: mingo@kernel.org Subject: BUG: soft lockup in smp_call_function [Please note: This e-mail is from an EXTERNAL e-mail address] Hello, When using Healer to fuzz the latest Linux kernel, the following two similar crashes was triggered. HEAD commit: 6880fa6c5660 Linux 5.15-rc1 git tree: upstream console output: https://drive.google.com/file/d/1W2g_mj5JPDXI5U7HibVepNdxMdChKS0_/view?usp=sharing kernel config: https://drive.google.com/file/d/1rUzyMbe5vcs6khA3tL9EHTLJvsUdWcgB/view?usp=sharing similar report: INFO: rcu detected stall in smp_call_function Sorry, I don't have a reproducer for this crash, hope the symbolized report can help. If you fix this issue, please add the following tag to the commit: Reported-by: Hao Sun <sunhao.th@gmail.com> watchdog: BUG: soft lockup - CPU#1 stuck for 157s! [kworker/1:8:14241] Modules linked in: irq event stamp: 371624 hardirqs last enabled at (371623): [<ffffffff84400c02>] asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:638 hardirqs last disabled at (371624): [<ffffffff8424818a>] sysvec_apic_timer_interrupt+0xa/0xc0 arch/x86/kernel/apic/apic.c:1097 softirqs last enabled at (337636): [<ffffffff84600398>] softirq_handle_end kernel/softirq.c:401 [inline] softirqs last enabled at (337636): [<ffffffff84600398>] __do_softirq+0x398/0x561 kernel/softirq.c:587 softirqs last disabled at (337625): [<ffffffff81235b42>] invoke_softirq kernel/softirq.c:432 [inline] softirqs last disabled at (337625): [<ffffffff81235b42>] __irq_exit_rcu kernel/softirq.c:636 [inline] softirqs last disabled at (337625): [<ffffffff81235b42>] irq_exit_rcu+0xe2/0x100 kernel/softirq.c:648 CPU: 1 PID: 14241 Comm: kworker/1:8 Not tainted 5.15.0-rc1 #16 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014 Workqueue: rcu_gp wait_rcu_exp_gp RIP: 0010:csd_lock_wait kernel/smp.c:440 [inline] RIP: 0010:csd_lock kernel/smp.c:446 [inline] RIP: 0010:smp_call_function_single+0x1a3/0x220 kernel/smp.c:745 Code: 48 c7 c3 80 a8 02 00 e8 cb b1 f1 02 89 c0 48 83 f8 07 77 6c 48 03 1c c5 80 98 62 85 8b 43 08 a8 01 74 0e e8 3f 9d 04 00 f3 90 <8b> 43 08 a8 01 75 f2 e8 31 9d 04 00 83 4b 08 01 48 89 de 44 89 e7 RSP: 0018:ffffc90001d4fd00 EFLAGS: 00000293 RAX: 0000000000000000 RBX: ffff88813dc2a880 RCX: 0000000000000000 RDX: ffff888112ac8000 RSI: ffffffff8132f4c1 RDI: ffffffff852d80e6 RBP: ffffc90001d4fd90 R08: 0000000000000001 R09: 0000000000000001 R10: ffffc90001d4fda8 R11: 0000000000000002 R12: 0000000000000002 R13: 0000000000000000 R14: ffffffff812ef7c0 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff88813dc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055a15807ccf0 CR3: 0000000025f6b000 CR4: 0000000000750ee0 PKRU: 55555554 Call Trace: sync_rcu_exp_select_node_cpus+0x27b/0x440 kernel/rcu/tree_exp.h:393 sync_rcu_exp_select_cpus+0x238/0x5d0 kernel/rcu/tree_exp.h:441 rcu_exp_sel_wait_wake kernel/rcu/tree_exp.h:615 [inline] wait_rcu_exp_gp+0xa/0x20 kernel/rcu/tree_exp.h:629 process_one_work+0x359/0x850 kernel/workqueue.c:2297 worker_thread+0x41/0x4d0 kernel/workqueue.c:2444 kthread+0x178/0x1b0 kernel/kthread.c:319 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295 Sending NMI from CPU 1 to CPUs 0,2-3: NMI backtrace for cpu 0 CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.15.0-rc1 #16 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014 RIP: 0010:__lock_acquire+0x14f/0x1d60 kernel/locking/lockdep.c:4955 Code: 00 48 89 43 08 48 8b 04 24 48 89 43 18 41 8b b5 dc 09 00 00 65 8b 05 08 ed d5 7e 85 c0 0f 95 c2 31 c0 85 f6 44 89 fe 0f 95 c0 <83> e1 03 c1 e6 07 8d 14 50 0f b6 43 21 83 e2 03 c1 e2 05 83 e0 1f RSP: 0018:ffffc90000003c40 EFLAGS: 00000002 RAX: 0000000000000001 RBX: ffffffff858b2d70 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88813dd1a3d8 RBP: 0000000000000057 R08: 0000000000000001 R09: 0000000000000001 R10: ffff88813dd1a3d8 R11: 0000000000000003 R12: 0000000000000001 R13: ffffffff858b2300 R14: ffffffff858b2cf8 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055a1580bf6d7 CR3: 0000000025f6b000 CR4: 0000000000750ef0 PKRU: 55555554 Call Trace: <IRQ> lock_acquire+0x1f9/0x340 kernel/locking/lockdep.c:5625 __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline] _raw_spin_lock_irqsave+0x38/0x50 kernel/locking/spinlock.c:162 lock_hrtimer_base+0x2f/0x70 kernel/time/hrtimer.c:173 hrtimer_try_to_cancel+0x6d/0x270 kernel/time/hrtimer.c:1331 hrtimer_cancel+0x12/0x30 kernel/time/hrtimer.c:1443 __disable_vblank drivers/gpu/drm/drm_vblank.c:434 [inline] drm_vblank_disable_and_save+0xca/0x130 drivers/gpu/drm/drm_vblank.c:478 vblank_disable_fn+0x83/0xa0 drivers/gpu/drm/drm_vblank.c:495 call_timer_fn+0xcb/0x3f0 kernel/time/timer.c:1421 expire_timers kernel/time/timer.c:1466 [inline] __run_timers kernel/time/timer.c:1734 [inline] run_timer_softirq+0x6bd/0x820 kernel/time/timer.c:1747 __do_softirq+0xe9/0x561 kernel/softirq.c:558 invoke_softirq kernel/softirq.c:432 [inline] __irq_exit_rcu kernel/softirq.c:636 [inline] irq_exit_rcu+0xe2/0x100 kernel/softirq.c:648 sysvec_apic_timer_interrupt+0x9e/0xc0 arch/x86/kernel/apic/apic.c:1097 </IRQ> asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:638 RIP: 0010:default_idle+0xb/0x10 arch/x86/kernel/process.c:717 Code: 60 ce 25 84 75 09 48 83 c4 18 5b 5d 41 5c c3 e8 ab c4 fe ff cc cc cc cc cc cc cc cc cc cc cc eb 07 0f 00 2d ff 00 5c 00 fb f4 <c3> 0f 1f 40 00 65 48 8b 04 25 40 70 01 00 f0 80 48 02 20 48 8b 10 RSP: 0018:ffffffff85803e90 EFLAGS: 00000202 RAX: 00000000010c94ab RBX: 0000000000000000 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffffffff85524f3d RDI: ffffffff853cbec6 RBP: ffffffff860d9290 R08: 0000000000000001 R09: 0000000000000001 R10: ffffffff85803be8 R11: 0000000000000001 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: ffffffff858b2300 default_idle_call+0x6a/0x260 kernel/sched/idle.c:112 cpuidle_idle_call kernel/sched/idle.c:194 [inline] do_idle+0x1e9/0x2b0 kernel/sched/idle.c:306 cpu_startup_entry+0x14/0x20 kernel/sched/idle.c:403 start_kernel+0x839/0x860 init/main.c:1141 secondary_startup_64_no_verify+0xb0/0xbb NMI backtrace for cpu 2 CPU: 2 PID: 9794 Comm: sshd Not tainted 5.15.0-rc1 #16 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014 RIP: 0010:__sanitizer_cov_trace_pc+0x37/0x60 kernel/kcov.c:197 Code: 65 48 8b 14 25 40 70 01 00 81 e1 00 01 00 00 a9 00 01 ff 00 74 0e 85 c9 74 35 8b 82 34 15 00 00 85 c0 74 2b 8b 82 10 15 00 00 <83> f8 02 75 20 48 8b 8a 18 15 00 00 8b 92 14 15 00 00 48 8b 01 48 RSP: 0018:ffffc900008cfa80 EFLAGS: 00000246 RAX: 0000000000000000 RBX: ffff88807dc318e0 RCX: 0000000000000000 RDX: ffff88800e9c4480 RSI: ffffffff8132ebc0 RDI: 00000000ffffffff RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000001 R10: ffffc900008cfa30 R11: 0000000000000000 R12: ffff88807dd2a900 R13: ffff88807dd2a908 R14: ffffffff85a27620 R15: 0000000000000000 FS: 00007f831f8568c0(0000) GS:ffff88807dd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007ffd76d1df1f CR3: 000000000fe3c000 CR4: 0000000000750ee0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: rep_nop arch/x86/include/asm/vdso/processor.h:13 [inline] cpu_relax arch/x86/include/asm/vdso/processor.h:18 [inline] csd_lock_wait kernel/smp.c:440 [inline] smp_call_function_many_cond+0x1d0/0x550 kernel/smp.c:969 on_each_cpu_cond_mask+0x48/0x90 kernel/smp.c:1135 __purge_vmap_area_lazy+0xd7/0xa40 mm/vmalloc.c:1683 _vm_unmap_aliases+0x233/0x280 mm/vmalloc.c:2107 change_page_attr_set_clr+0xb3/0x340 arch/x86/mm/pat/set_memory.c:1740 change_page_attr_clear arch/x86/mm/pat/set_memory.c:1797 [inline] set_memory_ro+0x2b/0x40 arch/x86/mm/pat/set_memory.c:1943 bpf_jit_binary_lock_ro include/linux/filter.h:890 [inline] bpf_int_jit_compile+0x65d/0x6b0 arch/x86/net/bpf_jit_comp.c:2338 bpf_prog_select_runtime+0x173/0x1e0 kernel/bpf/core.c:1914 bpf_migrate_filter net/core/filter.c:1295 [inline] bpf_prepare_filter+0x40d/0x5d0 net/core/filter.c:1343 bpf_prog_create_from_user+0xcf/0x150 net/core/filter.c:1437 seccomp_prepare_filter kernel/seccomp.c:666 [inline] seccomp_prepare_user_filter kernel/seccomp.c:703 [inline] seccomp_set_mode_filter kernel/seccomp.c:1824 [inline] do_seccomp+0x2ca/0xdb0 kernel/seccomp.c:1944 prctl_set_seccomp+0x2f/0x60 kernel/seccomp.c:1997 __do_sys_prctl kernel/sys.c:2346 [inline] __se_sys_prctl kernel/sys.c:2264 [inline] __x64_sys_prctl+0xd4/0x770 kernel/sys.c:2264 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x34/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f831d9cd54a Code: 48 8b 0d 51 f9 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 9d 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 1e f9 2a 00 f7 d8 64 89 01 48 RSP: 002b:00007ffd76d1c428 EFLAGS: 00000246 ORIG_RAX: 000000000000009d RAX: ffffffffffffffda RBX: 00005597d3a36f20 RCX: 00007f831d9cd54a RDX: 00005597d32bdf50 RSI: 0000000000000002 RDI: 0000000000000016 RBP: 00005597d3a35d40 R08: 0000000000000000 R09: 0000000000000005 R10: 00007f831d9cd54a R11: 0000000000000246 R12: 0000000000000000 R13: 0000000000000028 R14: 0000000000000000 R15: 00007ffd76d1c870 NMI backtrace for cpu 3 CPU: 3 PID: 9782 Comm: syz-executor Not tainted 5.15.0-rc1 #16 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014 RIP: 0010:kvm_wait+0x4a/0x70 arch/x86/kernel/kvm.c:893 Code: 5d c3 89 f3 48 89 fd 9c 58 fa f6 c4 02 75 30 0f b6 45 00 38 c3 74 16 e8 94 82 25 00 fb 5b 5d c3 eb 07 0f 00 2d 95 c1 6d 03 f4 <5b> 5d c3 e8 7e 82 25 00 eb 07 0f 00 2d 83 c1 6d 03 fb f4 eb c0 e8 RSP: 0000:ffffc90000768e00 EFLAGS: 00000046 RAX: 0000000000000003 RBX: ffff888100998338 RCX: 0000000000000008 RDX: 0000000000000000 RSI: 0000000000000003 RDI: ffff888100998338 RBP: ffff88813dd2a600 R08: 0000000000000000 R09: 0000000000000001 R10: ffffc90000768d20 R11: 0000000000000002 R12: 0000000000000001 R13: 0000000000000100 R14: 0000000000000000 R15: 0000000000100000 FS: 00007f849523b700(0000) GS:ffff88813dd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000020400010 CR3: 0000000025f6b000 CR4: 0000000000750ee0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: <IRQ> pv_wait arch/x86/include/asm/paravirt.h:597 [inline] pv_wait_head_or_lock kernel/locking/qspinlock_paravirt.h:470 [inline] __pv_queued_spin_lock_slowpath+0x262/0x330 kernel/locking/qspinlock.c:508 pv_queued_spin_lock_slowpath arch/x86/include/asm/paravirt.h:585 [inline] queued_spin_lock_slowpath arch/x86/include/asm/qspinlock.h:51 [inline] queued_spin_lock include/asm-generic/qspinlock.h:85 [inline] do_raw_spin_lock+0xb6/0xc0 kernel/locking/spinlock_debug.c:115 spin_lock include/linux/spinlock.h:363 [inline] drm_handle_vblank+0x86/0x530 drivers/gpu/drm/drm_vblank.c:1951 vkms_vblank_simulate+0x5a/0x190 drivers/gpu/drm/vkms/vkms_crtc.c:29 __run_hrtimer kernel/time/hrtimer.c:1685 [inline] __hrtimer_run_queues+0xb8/0x610 kernel/time/hrtimer.c:1749 hrtimer_interrupt+0xfe/0x280 kernel/time/hrtimer.c:1811 local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1086 [inline] __sysvec_apic_timer_interrupt+0x9c/0x2c0 arch/x86/kernel/apic/apic.c:1103 sysvec_apic_timer_interrupt+0x99/0xc0 arch/x86/kernel/apic/apic.c:1097 </IRQ> asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:638 RIP: 0010:__raw_spin_unlock_irq include/linux/spinlock_api_smp.h:169 [inline] RIP: 0010:_raw_spin_unlock_irq+0x25/0x40 kernel/locking/spinlock.c:202 Code: 0f 1f 44 00 00 53 48 8b 74 24 08 48 89 fb 48 83 c7 18 e8 be ce 05 fd 48 89 df e8 f6 06 06 fd e8 e1 19 14 fd fb bf 01 00 00 00 <e8> 76 9c 01 fd 65 8b 05 df 93 db 7b 85 c0 74 02 5b c3 e8 9b 4c da RSP: 0000:ffffc90001503e18 EFLAGS: 00000202 RAX: 00000000000b619e RBX: ffff88813dd18240 RCX: 0000000000000100 RDX: 0000000000000000 RSI: ffffffff85524f3d RDI: 0000000000000001 RBP: ffffc90001503e50 R08: 0000000000000001 R09: 0000000000000001 R10: ffffc90001503e28 R11: 0000000000000001 R12: ffff88810d86c9c0 R13: ffffc90001503e50 R14: ffffffff839a5e10 R15: ffffffff860d9290 expire_timers kernel/time/timer.c:1465 [inline] __run_timers kernel/time/timer.c:1734 [inline] run_timer_softirq+0x6ae/0x820 kernel/time/timer.c:1747 __do_softirq+0xe9/0x561 kernel/softirq.c:558 invoke_softirq kernel/softirq.c:432 [inline] __irq_exit_rcu kernel/softirq.c:636 [inline] irq_exit_rcu+0xe2/0x100 kernel/softirq.c:648 sysvec_apic_timer_interrupt+0x52/0xc0 arch/x86/kernel/apic/apic.c:1097 asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:638 RIP: 0033:0x418dcf Code: 1f 84 00 00 00 00 00 0f 1f 40 00 64 8b 04 25 08 03 00 00 41 89 c3 41 83 cb 02 41 39 c3 74 17 f0 64 44 0f b1 1c 25 08 03 00 00 <75> e7 41 83 e3 bb 41 83 fb 0a 74 01 c3 48 83 ec 08 64 48 c7 04 25 RSP: 002b:00007f849523a9b8 EFLAGS: 00000246 RAX: 0000000000000000 RBX: 0000000000000003 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000003 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000002 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000003 R15: 00000000204413a0 ---------------- Code disassembly (best guess): 0: 48 c7 c3 80 a8 02 00 mov $0x2a880,%rbx 7: e8 cb b1 f1 02 callq 0x2f1b1d7 c: 89 c0 mov %eax,%eax e: 48 83 f8 07 cmp $0x7,%rax 12: 77 6c ja 0x80 14: 48 03 1c c5 80 98 62 add -0x7a9d6780(,%rax,8),%rbx 1b: 85 1c: 8b 43 08 mov 0x8(%rbx),%eax 1f: a8 01 test $0x1,%al 21: 74 0e je 0x31 23: e8 3f 9d 04 00 callq 0x49d67 28: f3 90 pause * 2a: 8b 43 08 mov 0x8(%rbx),%eax <-- trapping instruction 2d: a8 01 test $0x1,%al 2f: 75 f2 jne 0x23 31: e8 31 9d 04 00 callq 0x49d67 36: 83 4b 08 01 orl $0x1,0x8(%rbx) 3a: 48 89 de mov %rbx,%rsi 3d: 44 89 e7 mov %r12d,%edi ===== INFO: rcu detected stall in smp_call_function rcu: INFO: rcu_preempt self-detected stall on CPU rcu: 0-....: (3203 ticks this GP) idle=417/1/0x4000000000000000 softirq=25875/25875 fqs=485 (t=10500 jiffies g=37621 q=38) rcu: rcu_preempt kthread starved for 486 jiffies! g37621 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=3 rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior. rcu: RCU grace-period kthread stack dump: task:rcu_preempt state:R running task stack:13832 pid: 16 ppid: 2 flags:0x00004000 Call Trace: context_switch kernel/sched/core.c:4940 [inline] __schedule+0x323/0xae0 kernel/sched/core.c:6287 schedule+0x36/0xe0 kernel/sched/core.c:6366 schedule_timeout+0x2cb/0x430 kernel/time/timer.c:1881 rcu_gp_fqs_loop+0x2f7/0x3e0 kernel/rcu/tree.c:1957 rcu_gp_kthread+0xd0/0x160 kernel/rcu/tree.c:2130 kthread+0x178/0x1b0 kernel/kthread.c:319 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295 rcu: Stack dump where RCU GP kthread last ran: Sending NMI from CPU 0 to CPUs 3: NMI backtrace for cpu 3 CPU: 3 PID: 14220 Comm: kworker/3:10 Not tainted 5.15.0-rc1 #16 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014 Workqueue: rcu_gp wait_rcu_exp_gp RIP: 0010:csd_lock_wait kernel/smp.c:440 [inline] RIP: 0010:csd_lock kernel/smp.c:446 [inline] RIP: 0010:smp_call_function_single+0x1a6/0x220 kernel/smp.c:745 Code: 80 a8 02 00 e8 cb b1 f1 02 89 c0 48 83 f8 07 77 6c 48 03 1c c5 80 98 62 85 8b 43 08 a8 01 74 0e e8 3f 9d 04 00 f3 90 8b 43 08 <a8> 01 75 f2 e8 31 9d 04 00 83 4b 08 01 48 89 de 44 89 e7 4c 89 73 RSP: 0018:ffffc90005793d00 EFLAGS: 00000293 RAX: 0000000000000001 RBX: ffff88813dd2a880 RCX: 0000000000000000 RDX: ffff888103ddc480 RSI: ffffffff8132f4c1 RDI: ffffffff852d80e6 RBP: ffffc90005793d90 R08: 0000000000000001 R09: 0000000000000001 R10: ffffc90005793da8 R11: 0000000000000000 R12: 0000000000000002 R13: 0000000000000000 R14: ffffffff812ef7c0 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff88813dd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055b72e9c2cd8 CR3: 000000000dedc000 CR4: 0000000000750ee0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: sync_rcu_exp_select_node_cpus+0x27b/0x440 kernel/rcu/tree_exp.h:393 sync_rcu_exp_select_cpus+0x238/0x5d0 kernel/rcu/tree_exp.h:441 rcu_exp_sel_wait_wake kernel/rcu/tree_exp.h:615 [inline] wait_rcu_exp_gp+0xa/0x20 kernel/rcu/tree_exp.h:629 process_one_work+0x359/0x850 kernel/workqueue.c:2297 worker_thread+0x41/0x4d0 kernel/workqueue.c:2444 kthread+0x178/0x1b0 kernel/kthread.c:319 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295 NMI backtrace for cpu 0 CPU: 0 PID: 9 Comm: kworker/u9:0 Not tainted 5.15.0-rc1 #16 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014 Workqueue: events_unbound toggle_allocation_gate Call Trace: <IRQ> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x8d/0xcf lib/dump_stack.c:106 nmi_cpu_backtrace+0x1e9/0x210 lib/nmi_backtrace.c:105 nmi_trigger_cpumask_backtrace+0x120/0x180 lib/nmi_backtrace.c:62 trigger_single_cpu_backtrace include/linux/nmi.h:164 [inline] rcu_dump_cpu_stacks+0x125/0x17b kernel/rcu/tree_stall.h:343 print_cpu_stall kernel/rcu/tree_stall.h:627 [inline] check_cpu_stall kernel/rcu/tree_stall.h:711 [inline] rcu_pending kernel/rcu/tree.c:3880 [inline] rcu_sched_clock_irq+0xc20/0x1360 kernel/rcu/tree.c:2599 update_process_times+0xac/0x100 kernel/time/timer.c:1785 tick_sched_handle.isra.20+0x27/0x70 kernel/time/tick-sched.c:226 tick_sched_timer+0x8a/0xb0 kernel/time/tick-sched.c:1421 __run_hrtimer kernel/time/hrtimer.c:1685 [inline] __hrtimer_run_queues+0x23e/0x610 kernel/time/hrtimer.c:1749 hrtimer_interrupt+0xfe/0x280 kernel/time/hrtimer.c:1811 local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1086 [inline] __sysvec_apic_timer_interrupt+0x9c/0x2c0 arch/x86/kernel/apic/apic.c:1103 sysvec_apic_timer_interrupt+0x99/0xc0 arch/x86/kernel/apic/apic.c:1097 </IRQ> asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:638 RIP: 0010:csd_lock_wait kernel/smp.c:440 [inline] RIP: 0010:smp_call_function_many_cond+0x1d2/0x550 kernel/smp.c:969 Code: a6 04 00 4c 63 fd 49 8b 1c 24 49 83 ff 07 0f 87 31 03 00 00 4a 03 1c fd 80 98 62 85 8b 43 08 a8 01 74 0e e8 40 a6 04 00 f3 90 <8b> 43 08 a8 01 75 f2 e8 32 a6 04 00 eb ad 48 83 c4 40 5b 5d 41 5c RSP: 0018:ffffc9000067bc80 EFLAGS: 00000293 RAX: 0000000000000000 RBX: ffff88807dd2f9c0 RCX: 0000000000000000 RDX: ffff88800985c480 RSI: ffffffff8132ebc0 RDI: 0000000000000001 RBP: 0000000000000002 R08: 0000000000000001 R09: 0000000000000001 R10: ffffc9000067bc28 R11: 0000000000000000 R12: ffff88807dc2a900 R13: ffff88807dc2a908 R14: ffffffff85a27620 R15: 0000000000000002 on_each_cpu_cond_mask+0x48/0x90 kernel/smp.c:1135 on_each_cpu include/linux/smp.h:71 [inline] text_poke_sync arch/x86/kernel/alternative.c:929 [inline] text_poke_bp_batch+0xb6/0x2c0 arch/x86/kernel/alternative.c:1114 text_poke_flush arch/x86/kernel/alternative.c:1268 [inline] text_poke_finish+0x16/0x30 arch/x86/kernel/alternative.c:1275 arch_jump_label_transform_apply+0x13/0x20 arch/x86/kernel/jump_label.c:146 jump_label_update+0xbc/0x190 kernel/jump_label.c:830 static_key_enable_cpuslocked+0x77/0xb0 kernel/jump_label.c:177 static_key_enable+0x16/0x20 kernel/jump_label.c:190 toggle_allocation_gate+0x71/0x240 mm/kfence/core.c:626 process_one_work+0x359/0x850 kernel/workqueue.c:2297 worker_thread+0x41/0x4d0 kernel/workqueue.c:2444 kthread+0x178/0x1b0 kernel/kthread.c:319 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295 Sending NMI from CPU 0 to CPUs 2: ---------------- Code disassembly (best guess), 2 bytes skipped: 0: 02 00 add (%rax),%al 2: e8 cb b1 f1 02 callq 0x2f1b1d2 7: 89 c0 mov %eax,%eax 9: 48 83 f8 07 cmp $0x7,%rax d: 77 6c ja 0x7b f: 48 03 1c c5 80 98 62 add -0x7a9d6780(,%rax,8),%rbx 16: 85 17: 8b 43 08 mov 0x8(%rbx),%eax 1a: a8 01 test $0x1,%al 1c: 74 0e je 0x2c 1e: e8 3f 9d 04 00 callq 0x49d62 23: f3 90 pause 25: 8b 43 08 mov 0x8(%rbx),%eax * 28: a8 01 test $0x1,%al <-- trapping instruction 2a: 75 f2 jne 0x1e 2c: e8 31 9d 04 00 callq 0x49d62 31: 83 4b 08 01 orl $0x1,0x8(%rbx) 35: 48 89 de mov %rbx,%rsi 38: 44 89 e7 mov %r12d,%edi 3b: 4c rex.WR 3c: 89 .byte 0x89 3d: 73 .byte 0x73 ^ permalink raw reply [flat|nested] 12+ messages in thread
* BUG: soft lockup in smp_call_function @ 2023-09-12 23:02 Sanan Hasanov 2023-09-13 10:05 ` Peter Zijlstra 2023-09-13 11:07 ` Hillf Danton 0 siblings, 2 replies; 12+ messages in thread From: Sanan Hasanov @ 2023-09-12 23:02 UTC (permalink / raw) To: peterz, paulmck, jgross, vschneid, yury.norov, linux-kernel Cc: syzkaller, contact Good day, dear maintainers, We found a bug using a modified kernel configuration file used by syzbot. We enhanced the coverage of the configuration file using our tool, klocalizer. Kernel Branch: 6.3.0-next-20230426 Kernel Config: https://drive.google.com/file/d/1WSUEWrith9-539qo6xRqmwy4LfDtmKpp/view?usp=sharing Reproducer: https://drive.google.com/file/d/1pN6FfcjuUs6Wx94g1gufuYGjRbMMgiZ4/view?usp=sharing Thank you! Best regards, Sanan Hasanov watchdog: BUG: soft lockup - CPU#5 stuck for 26s! [kworker/u16:1:12] Modules linked in: irq event stamp: 192794 hardirqs last enabled at (192793): [<ffffffff89a0140a>] asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:645 hardirqs last disabled at (192794): [<ffffffff89975d4f>] sysvec_apic_timer_interrupt+0xf/0xc0 arch/x86/kernel/apic/apic.c:1106 softirqs last enabled at (187764): [<ffffffff814b94bd>] invoke_softirq kernel/softirq.c:445 [inline] softirqs last enabled at (187764): [<ffffffff814b94bd>] __irq_exit_rcu+0x11d/0x190 kernel/softirq.c:650 softirqs last disabled at (187671): [<ffffffff814b94bd>] invoke_softirq kernel/softirq.c:445 [inline] softirqs last disabled at (187671): [<ffffffff814b94bd>] __irq_exit_rcu+0x11d/0x190 kernel/softirq.c:650 CPU: 5 PID: 12 Comm: kworker/u16:1 Not tainted 6.3.0-next-20230426 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 Workqueue: events_unbound toggle_allocation_gate RIP: 0010:csd_lock_wait kernel/smp.c:294 [inline] RIP: 0010:smp_call_function_many_cond+0x5bd/0x1020 kernel/smp.c:828 Code: 0b 00 85 ed 74 4d 48 b8 00 00 00 00 00 fc ff df 4d 89 f4 4c 89 f5 49 c1 ec 03 83 e5 07 49 01 c4 83 c5 03 e8 b5 07 0b 00 f3 90 <41> 0f b6 04 24 40 38 c5 7c 08 84 c0 0f 85 46 08 00 00 8b 43 08 31 RSP: 0018:ffffc900000cf9e8 EFLAGS: 00000293 RAX: 0000000000000000 RBX: ffff888119cc4d80 RCX: 0000000000000000 RDX: ffff888100325940 RSI: ffffffff8176807b RDI: 0000000000000005 RBP: 0000000000000003 R08: 0000000000000005 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000001 R12: ffffed10233989b1 R13: 0000000000000001 R14: ffff888119cc4d88 R15: 0000000000000001 FS: 0000000000000000(0000) GS:ffff888119e80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000555556a6cc88 CR3: 000000000bb73000 CR4: 0000000000350ee0 Call Trace: <TASK> on_each_cpu_cond_mask+0x40/0x90 kernel/smp.c:996 on_each_cpu include/linux/smp.h:71 [inline] text_poke_sync arch/x86/kernel/alternative.c:1770 [inline] text_poke_bp_batch+0x237/0x770 arch/x86/kernel/alternative.c:1970 text_poke_flush arch/x86/kernel/alternative.c:2161 [inline] text_poke_flush arch/x86/kernel/alternative.c:2158 [inline] text_poke_finish+0x1a/0x30 arch/x86/kernel/alternative.c:2168 arch_jump_label_transform_apply+0x17/0x30 arch/x86/kernel/jump_label.c:146 jump_label_update+0x321/0x400 kernel/jump_label.c:829 static_key_enable_cpuslocked+0x1b5/0x270 kernel/jump_label.c:205 static_key_enable+0x1a/0x20 kernel/jump_label.c:218 toggle_allocation_gate mm/kfence/core.c:831 [inline] toggle_allocation_gate+0xf4/0x220 mm/kfence/core.c:823 process_one_work+0x993/0x15e0 kernel/workqueue.c:2405 worker_thread+0x67d/0x10c0 kernel/workqueue.c:2552 kthread+0x33e/0x440 kernel/kthread.c:379 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308 </TASK> Sending NMI from CPU 5 to CPUs 0-4,6-7: NMI backtrace for cpu 1 CPU: 1 PID: 20602 Comm: syz-executor.3 Not tainted 6.3.0-next-20230426 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 RIP: 0010:hlock_class kernel/locking/lockdep.c:228 [inline] RIP: 0010:check_wait_context kernel/locking/lockdep.c:4747 [inline] RIP: 0010:__lock_acquire+0x489/0x5d00 kernel/locking/lockdep.c:5024 Code: 41 81 e5 ff 1f 45 0f b7 ed be 08 00 00 00 4c 89 e8 48 c1 e8 06 48 8d 3c c5 00 6b 2c 90 e8 5f 90 6e 00 4c 0f a3 2d d7 35 c9 0e <0f> 83 5c 0c 00 00 4f 8d 6c 6d 00 49 c1 e5 06 49 81 c5 20 6f 2c 90 RSP: 0018:ffffc90002aa7350 EFLAGS: 00000047 RAX: 0000000000000001 RBX: 0000000000000001 RCX: ffffffff81633521 RDX: fffffbfff2058d62 RSI: 0000000000000008 RDI: ffffffff902c6b08 RBP: ffff888042995940 R08: 0000000000000000 R09: ffffffff902c6b0f R10: fffffbfff2058d61 R11: 0000000000000001 R12: ffff888119e2b818 R13: 0000000000000063 R14: 0000000000000002 R15: ffff888042996598 FS: 00007fdaad065700(0000) GS:ffff888119c80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000001b30623000 CR3: 0000000101969000 CR4: 0000000000350ee0 Call Trace: <TASK> lock_acquire kernel/locking/lockdep.c:5691 [inline] lock_acquire+0x1b1/0x520 kernel/locking/lockdep.c:5656 __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline] _raw_spin_lock_irqsave+0x3d/0x60 kernel/locking/spinlock.c:162 lock_hrtimer_base kernel/time/hrtimer.c:173 [inline] hrtimer_try_to_cancel kernel/time/hrtimer.c:1331 [inline] hrtimer_try_to_cancel+0xa9/0x2e0 kernel/time/hrtimer.c:1316 hrtimer_cancel+0x17/0x40 kernel/time/hrtimer.c:1443 __disable_vblank drivers/gpu/drm/drm_vblank.c:434 [inline] drm_vblank_disable_and_save+0x282/0x3d0 drivers/gpu/drm/drm_vblank.c:478 drm_crtc_vblank_off+0x312/0x970 drivers/gpu/drm/drm_vblank.c:1366 disable_outputs+0x7c7/0xbb0 drivers/gpu/drm/drm_atomic_helper.c:1202 drm_atomic_helper_commit_modeset_disables+0x1d/0x40 drivers/gpu/drm/drm_atomic_helper.c:1397 vkms_atomic_commit_tail+0x51/0x240 drivers/gpu/drm/vkms/vkms_drv.c:71 commit_tail+0x288/0x420 drivers/gpu/drm/drm_atomic_helper.c:1812 drm_atomic_helper_commit drivers/gpu/drm/drm_atomic_helper.c:2052 [inline] drm_atomic_helper_commit+0x306/0x390 drivers/gpu/drm/drm_atomic_helper.c:1985 drm_atomic_commit+0x20a/0x2d0 drivers/gpu/drm/drm_atomic.c:1503 drm_client_modeset_commit_atomic+0x698/0x7e0 drivers/gpu/drm/drm_client_modeset.c:1045 drm_client_modeset_dpms+0x174/0x200 drivers/gpu/drm/drm_client_modeset.c:1226 drm_fb_helper_dpms drivers/gpu/drm/drm_fb_helper.c:323 [inline] drm_fb_helper_blank+0xd1/0x260 drivers/gpu/drm/drm_fb_helper.c:356 fb_blank+0x105/0x190 drivers/video/fbdev/core/fbmem.c:1088 do_fb_ioctl+0x390/0x760 drivers/video/fbdev/core/fbmem.c:1180 fb_ioctl+0xeb/0x150 drivers/video/fbdev/core/fbmem.c:1204 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:870 [inline] __se_sys_ioctl fs/ioctl.c:856 [inline] __x64_sys_ioctl+0x197/0x210 fs/ioctl.c:856 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x39/0x80 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7fdaabe8edcd Code: 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007fdaad064bf8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 00007fdaabfbbf80 RCX: 00007fdaabe8edcd RDX: 0000000000000004 RSI: 0000000000004611 RDI: 0000000000000003 RBP: 00007fdaabefc59c R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 00007ffdadeffe9f R14: 00007ffdadf00040 R15: 00007fdaad064d80 </TASK> NMI backtrace for cpu 0 skipped: idling at native_safe_halt arch/x86/include/asm/irqflags.h:48 [inline] NMI backtrace for cpu 0 skipped: idling at arch_safe_halt arch/x86/include/asm/irqflags.h:86 [inline] NMI backtrace for cpu 0 skipped: idling at default_idle+0xf/0x20 arch/x86/kernel/process.c:729 NMI backtrace for cpu 3 skipped: idling at native_safe_halt arch/x86/include/asm/irqflags.h:48 [inline] NMI backtrace for cpu 3 skipped: idling at arch_safe_halt arch/x86/include/asm/irqflags.h:86 [inline] NMI backtrace for cpu 3 skipped: idling at default_idle+0xf/0x20 arch/x86/kernel/process.c:729 NMI backtrace for cpu 6 skipped: idling at native_safe_halt arch/x86/include/asm/irqflags.h:48 [inline] NMI backtrace for cpu 6 skipped: idling at arch_safe_halt arch/x86/include/asm/irqflags.h:86 [inline] NMI backtrace for cpu 6 skipped: idling at default_idle+0xf/0x20 arch/x86/kernel/process.c:729 NMI backtrace for cpu 7 skipped: idling at native_safe_halt arch/x86/include/asm/irqflags.h:48 [inline] NMI backtrace for cpu 7 skipped: idling at arch_safe_halt arch/x86/include/asm/irqflags.h:86 [inline] NMI backtrace for cpu 7 skipped: idling at default_idle+0xf/0x20 arch/x86/kernel/process.c:729 NMI backtrace for cpu 4 CPU: 4 PID: 20623 Comm: syz-executor.6 Not tainted 6.3.0-next-20230426 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 RIP: 0010:kvm_wait+0xb7/0x110 arch/x86/kernel/kvm.c:1064 Code: 40 38 c6 74 1b 48 83 c4 10 c3 c3 e8 93 d3 50 00 eb 07 0f 00 2d 4a 04 92 08 fb f4 48 83 c4 10 c3 eb 07 0f 00 2d 3a 04 92 08 f4 <48> 83 c4 10 c3 89 74 24 0c 48 89 3c 24 e8 d7 d4 50 00 8b 74 24 0c RSP: 0018:ffffc90000300b50 EFLAGS: 00000046 RAX: 0000000000000003 RBX: 0000000000000000 RCX: dffffc0000000000 RDX: 0000000000000000 RSI: 0000000000000003 RDI: ffff88810b0803d8 RBP: ffff88810b0803d8 R08: 0000000000000001 R09: ffff88810b0803d8 R10: ffffed102161007b R11: ffffc90000300ff8 R12: 0000000000000000 R13: ffffed102161007b R14: 0000000000000001 R15: ffff888119e3d3c0 FS: 0000000000000000(0000) GS:ffff888119e00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f28183bd0b0 CR3: 000000000bb73000 CR4: 0000000000350ee0 Call Trace: <IRQ> pv_wait arch/x86/include/asm/paravirt.h:598 [inline] pv_wait_head_or_lock kernel/locking/qspinlock_paravirt.h:470 [inline] __pv_queued_spin_lock_slowpath+0x8e4/0xb80 kernel/locking/qspinlock.c:511 pv_queued_spin_lock_slowpath arch/x86/include/asm/paravirt.h:586 [inline] queued_spin_lock_slowpath arch/x86/include/asm/qspinlock.h:51 [inline] queued_spin_lock include/asm-generic/qspinlock.h:114 [inline] do_raw_spin_lock+0x20d/0x2b0 kernel/locking/spinlock_debug.c:115 __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:111 [inline] _raw_spin_lock_irqsave+0x45/0x60 kernel/locking/spinlock.c:162 drm_handle_vblank+0x11e/0xb80 drivers/gpu/drm/drm_vblank.c:1986 vkms_vblank_simulate+0xe8/0x3e0 drivers/gpu/drm/vkms/vkms_crtc.c:29 __run_hrtimer kernel/time/hrtimer.c:1685 [inline] __hrtimer_run_queues+0x599/0xa30 kernel/time/hrtimer.c:1749 hrtimer_interrupt+0x320/0x7b0 kernel/time/hrtimer.c:1811 local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1095 [inline] __sysvec_apic_timer_interrupt+0x14a/0x430 arch/x86/kernel/apic/apic.c:1112 sysvec_apic_timer_interrupt+0x92/0xc0 arch/x86/kernel/apic/apic.c:1106 </IRQ> <TASK> asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:645 RIP: 0010:check_kcov_mode kernel/kcov.c:173 [inline] RIP: 0010:__sanitizer_cov_trace_pc+0x11/0x70 kernel/kcov.c:207 Code: a8 01 00 00 e8 b0 ff ff ff 31 c0 c3 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 f3 0f 1e fa 65 8b 05 0d 33 82 7e 89 c1 48 8b 34 24 <81> e1 00 01 00 00 65 48 8b 14 25 40 bb 03 00 a9 00 01 ff 00 74 0e RSP: 0018:ffffc90002be76d8 EFLAGS: 00000286 RAX: 0000000080000001 RBX: 0000000000000001 RCX: 0000000080000001 RDX: 00007f2817c77000 RSI: ffffffff81bcd756 RDI: ffffc90002be7ad8 RBP: 0000000000000001 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000001 R12: ffffea00014fc480 R13: 0000000000000000 R14: dffffc0000000000 R15: 8000000053f12007 zap_drop_file_uffd_wp mm/memory.c:1352 [inline] zap_install_uffd_wp_if_needed mm/memory.c:1371 [inline] zap_pte_range mm/memory.c:1417 [inline] zap_pmd_range mm/memory.c:1564 [inline] zap_pud_range mm/memory.c:1593 [inline] zap_p4d_range mm/memory.c:1614 [inline] unmap_page_range+0x1046/0x4470 mm/memory.c:1635 unmap_single_vma+0x19a/0x2b0 mm/memory.c:1681 unmap_vmas+0x234/0x380 mm/memory.c:1720 exit_mmap+0x190/0x930 mm/mmap.c:3111 __mmput+0x128/0x4c0 kernel/fork.c:1351 mmput+0x60/0x70 kernel/fork.c:1373 exit_mm kernel/exit.c:564 [inline] do_exit+0x9d1/0x29f0 kernel/exit.c:858 do_group_exit+0xd4/0x2a0 kernel/exit.c:1021 get_signal+0x2311/0x25c0 kernel/signal.c:2874 arch_do_signal_or_restart+0x79/0x5a0 arch/x86/kernel/signal.c:307 exit_to_user_mode_loop kernel/entry/common.c:168 [inline] exit_to_user_mode_prepare+0x11f/0x240 kernel/entry/common.c:204 __syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline] syscall_exit_to_user_mode+0x1d/0x50 kernel/entry/common.c:297 do_syscall_64+0x46/0x80 arch/x86/entry/common.c:86 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7f281828edcd Code: Unable to access opcode bytes at 0x7f281828eda3. RSP: 002b:00007f28194c0c98 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca RAX: fffffffffffffe00 RBX: 00007f28183bbf80 RCX: 00007f281828edcd RDX: 0000000000000000 RSI: 0000000000000080 RDI: 00007f28183bbf88 RBP: 00007f28183bbf88 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00007f28183bbf8c R13: 00007ffd5038e1ef R14: 00007ffd5038e390 R15: 00007f28194c0d80 </TASK> NMI backtrace for cpu 2 skipped: idling at native_safe_halt arch/x86/include/asm/irqflags.h:48 [inline] NMI backtrace for cpu 2 skipped: idling at arch_safe_halt arch/x86/include/asm/irqflags.h:86 [inline] NMI backtrace for cpu 2 skipped: idling at default_idle+0xf/0x20 arch/x86/kernel/process.c:729 ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: BUG: soft lockup in smp_call_function 2023-09-12 23:02 Sanan Hasanov @ 2023-09-13 10:05 ` Peter Zijlstra 2023-09-13 11:07 ` Hillf Danton 1 sibling, 0 replies; 12+ messages in thread From: Peter Zijlstra @ 2023-09-13 10:05 UTC (permalink / raw) To: Sanan Hasanov Cc: paulmck, jgross, vschneid, yury.norov, linux-kernel, syzkaller, contact, rodrigosiqueiramelo, melissa.srw, mairacanal, hamohammed.sa, daniel, airlied On Tue, Sep 12, 2023 at 11:02:56PM +0000, Sanan Hasanov wrote: > Good day, dear maintainers, > > We found a bug using a modified kernel configuration file used by syzbot. > > We enhanced the coverage of the configuration file using our tool, klocalizer. > > Kernel Branch: 6.3.0-next-20230426 > Kernel Config: https://drive.google.com/file/d/1WSUEWrith9-539qo6xRqmwy4LfDtmKpp/view?usp=sharing > Reproducer: https://drive.google.com/file/d/1pN6FfcjuUs6Wx94g1gufuYGjRbMMgiZ4/view?usp=sharing > Thank you! AFAICT the thing is stuck in DRM somewhere... > watchdog: BUG: soft lockup - CPU#5 stuck for 26s! [kworker/u16:1:12] > Modules linked in: > irq event stamp: 192794 > hardirqs last enabled at (192793): [<ffffffff89a0140a>] asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:645 > hardirqs last disabled at (192794): [<ffffffff89975d4f>] sysvec_apic_timer_interrupt+0xf/0xc0 arch/x86/kernel/apic/apic.c:1106 > softirqs last enabled at (187764): [<ffffffff814b94bd>] invoke_softirq kernel/softirq.c:445 [inline] > softirqs last enabled at (187764): [<ffffffff814b94bd>] __irq_exit_rcu+0x11d/0x190 kernel/softirq.c:650 > softirqs last disabled at (187671): [<ffffffff814b94bd>] invoke_softirq kernel/softirq.c:445 [inline] > softirqs last disabled at (187671): [<ffffffff814b94bd>] __irq_exit_rcu+0x11d/0x190 kernel/softirq.c:650 > CPU: 5 PID: 12 Comm: kworker/u16:1 Not tainted 6.3.0-next-20230426 #1 > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 > Workqueue: events_unbound toggle_allocation_gate > RIP: 0010:csd_lock_wait kernel/smp.c:294 [inline] > RIP: 0010:smp_call_function_many_cond+0x5bd/0x1020 kernel/smp.c:828 > Code: 0b 00 85 ed 74 4d 48 b8 00 00 00 00 00 fc ff df 4d 89 f4 4c 89 f5 49 c1 ec 03 83 e5 07 49 01 c4 83 c5 03 e8 b5 07 0b 00 f3 90 <41> 0f b6 04 24 40 38 c5 7c 08 84 c0 0f 85 46 08 00 00 8b 43 08 31 > RSP: 0018:ffffc900000cf9e8 EFLAGS: 00000293 > RAX: 0000000000000000 RBX: ffff888119cc4d80 RCX: 0000000000000000 > RDX: ffff888100325940 RSI: ffffffff8176807b RDI: 0000000000000005 > RBP: 0000000000000003 R08: 0000000000000005 R09: 0000000000000000 > R10: 0000000000000001 R11: 0000000000000001 R12: ffffed10233989b1 > R13: 0000000000000001 R14: ffff888119cc4d88 R15: 0000000000000001 > FS: 0000000000000000(0000) GS:ffff888119e80000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 0000555556a6cc88 CR3: 000000000bb73000 CR4: 0000000000350ee0 > Call Trace: > <TASK> > on_each_cpu_cond_mask+0x40/0x90 kernel/smp.c:996 > on_each_cpu include/linux/smp.h:71 [inline] > text_poke_sync arch/x86/kernel/alternative.c:1770 [inline] > text_poke_bp_batch+0x237/0x770 arch/x86/kernel/alternative.c:1970 > text_poke_flush arch/x86/kernel/alternative.c:2161 [inline] > text_poke_flush arch/x86/kernel/alternative.c:2158 [inline] > text_poke_finish+0x1a/0x30 arch/x86/kernel/alternative.c:2168 > arch_jump_label_transform_apply+0x17/0x30 arch/x86/kernel/jump_label.c:146 > jump_label_update+0x321/0x400 kernel/jump_label.c:829 > static_key_enable_cpuslocked+0x1b5/0x270 kernel/jump_label.c:205 > static_key_enable+0x1a/0x20 kernel/jump_label.c:218 > toggle_allocation_gate mm/kfence/core.c:831 [inline] > toggle_allocation_gate+0xf4/0x220 mm/kfence/core.c:823 > process_one_work+0x993/0x15e0 kernel/workqueue.c:2405 > worker_thread+0x67d/0x10c0 kernel/workqueue.c:2552 > kthread+0x33e/0x440 kernel/kthread.c:379 > ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308 > </TASK> Right, so this is waiting for an IPI to be processed.. while #1 has IRQs disabled > Sending NMI from CPU 5 to CPUs 0-4,6-7: > NMI backtrace for cpu 1 > CPU: 1 PID: 20602 Comm: syz-executor.3 Not tainted 6.3.0-next-20230426 #1 > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 > RIP: 0010:hlock_class kernel/locking/lockdep.c:228 [inline] > RIP: 0010:check_wait_context kernel/locking/lockdep.c:4747 [inline] > RIP: 0010:__lock_acquire+0x489/0x5d00 kernel/locking/lockdep.c:5024 > Code: 41 81 e5 ff 1f 45 0f b7 ed be 08 00 00 00 4c 89 e8 48 c1 e8 06 48 8d 3c c5 00 6b 2c 90 e8 5f 90 6e 00 4c 0f a3 2d d7 35 c9 0e <0f> 83 5c 0c 00 00 4f 8d 6c 6d 00 49 c1 e5 06 49 81 c5 20 6f 2c 90 > RSP: 0018:ffffc90002aa7350 EFLAGS: 00000047 > RAX: 0000000000000001 RBX: 0000000000000001 RCX: ffffffff81633521 > RDX: fffffbfff2058d62 RSI: 0000000000000008 RDI: ffffffff902c6b08 > RBP: ffff888042995940 R08: 0000000000000000 R09: ffffffff902c6b0f > R10: fffffbfff2058d61 R11: 0000000000000001 R12: ffff888119e2b818 > R13: 0000000000000063 R14: 0000000000000002 R15: ffff888042996598 > FS: 00007fdaad065700(0000) GS:ffff888119c80000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 0000001b30623000 CR3: 0000000101969000 CR4: 0000000000350ee0 > Call Trace: > <TASK> > lock_acquire kernel/locking/lockdep.c:5691 [inline] > lock_acquire+0x1b1/0x520 kernel/locking/lockdep.c:5656 > __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline] > _raw_spin_lock_irqsave+0x3d/0x60 kernel/locking/spinlock.c:162 > lock_hrtimer_base kernel/time/hrtimer.c:173 [inline] > hrtimer_try_to_cancel kernel/time/hrtimer.c:1331 [inline] > hrtimer_try_to_cancel+0xa9/0x2e0 kernel/time/hrtimer.c:1316 > hrtimer_cancel+0x17/0x40 kernel/time/hrtimer.c:1443 And this is trying to cancel a hrtimer which is ran on CPU4 and won't be making much progress. > __disable_vblank drivers/gpu/drm/drm_vblank.c:434 [inline] So we're here, holding vbl_lock, vblank_time_lock one of which is what #4 is waiting on. This also has IRQs disabled, which is what #1 is waiting on. > drm_vblank_disable_and_save+0x282/0x3d0 drivers/gpu/drm/drm_vblank.c:478 > drm_crtc_vblank_off+0x312/0x970 drivers/gpu/drm/drm_vblank.c:1366 > disable_outputs+0x7c7/0xbb0 drivers/gpu/drm/drm_atomic_helper.c:1202 > drm_atomic_helper_commit_modeset_disables+0x1d/0x40 drivers/gpu/drm/drm_atomic_helper.c:1397 > vkms_atomic_commit_tail+0x51/0x240 drivers/gpu/drm/vkms/vkms_drv.c:71 > commit_tail+0x288/0x420 drivers/gpu/drm/drm_atomic_helper.c:1812 > drm_atomic_helper_commit drivers/gpu/drm/drm_atomic_helper.c:2052 [inline] > drm_atomic_helper_commit+0x306/0x390 drivers/gpu/drm/drm_atomic_helper.c:1985 > drm_atomic_commit+0x20a/0x2d0 drivers/gpu/drm/drm_atomic.c:1503 > drm_client_modeset_commit_atomic+0x698/0x7e0 drivers/gpu/drm/drm_client_modeset.c:1045 > drm_client_modeset_dpms+0x174/0x200 drivers/gpu/drm/drm_client_modeset.c:1226 > drm_fb_helper_dpms drivers/gpu/drm/drm_fb_helper.c:323 [inline] > drm_fb_helper_blank+0xd1/0x260 drivers/gpu/drm/drm_fb_helper.c:356 > fb_blank+0x105/0x190 drivers/video/fbdev/core/fbmem.c:1088 > do_fb_ioctl+0x390/0x760 drivers/video/fbdev/core/fbmem.c:1180 > fb_ioctl+0xeb/0x150 drivers/video/fbdev/core/fbmem.c:1204 > vfs_ioctl fs/ioctl.c:51 [inline] > __do_sys_ioctl fs/ioctl.c:870 [inline] > __se_sys_ioctl fs/ioctl.c:856 [inline] > __x64_sys_ioctl+0x197/0x210 fs/ioctl.c:856 > do_syscall_x64 arch/x86/entry/common.c:50 [inline] > do_syscall_64+0x39/0x80 arch/x86/entry/common.c:80 > entry_SYSCALL_64_after_hwframe+0x63/0xcd > RIP: 0033:0x7fdaabe8edcd > Code: 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48 > RSP: 002b:00007fdaad064bf8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 > RAX: ffffffffffffffda RBX: 00007fdaabfbbf80 RCX: 00007fdaabe8edcd > RDX: 0000000000000004 RSI: 0000000000004611 RDI: 0000000000000003 > RBP: 00007fdaabefc59c R08: 0000000000000000 R09: 0000000000000000 > R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 > R13: 00007ffdadeffe9f R14: 00007ffdadf00040 R15: 00007fdaad064d80 > </TASK> > CPU: 4 PID: 20623 Comm: syz-executor.6 Not tainted 6.3.0-next-20230426 #1 > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 > RIP: 0010:kvm_wait+0xb7/0x110 arch/x86/kernel/kvm.c:1064 > Code: 40 38 c6 74 1b 48 83 c4 10 c3 c3 e8 93 d3 50 00 eb 07 0f 00 2d 4a 04 92 08 fb f4 48 83 c4 10 c3 eb 07 0f 00 2d 3a 04 92 08 f4 <48> 83 c4 10 c3 89 74 24 0c 48 89 3c 24 e8 d7 d4 50 00 8b 74 24 0c > RSP: 0018:ffffc90000300b50 EFLAGS: 00000046 > RAX: 0000000000000003 RBX: 0000000000000000 RCX: dffffc0000000000 > RDX: 0000000000000000 RSI: 0000000000000003 RDI: ffff88810b0803d8 > RBP: ffff88810b0803d8 R08: 0000000000000001 R09: ffff88810b0803d8 > R10: ffffed102161007b R11: ffffc90000300ff8 R12: 0000000000000000 > R13: ffffed102161007b R14: 0000000000000001 R15: ffff888119e3d3c0 > FS: 0000000000000000(0000) GS:ffff888119e00000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 00007f28183bd0b0 CR3: 000000000bb73000 CR4: 0000000000350ee0 > Call Trace: > <IRQ> > pv_wait arch/x86/include/asm/paravirt.h:598 [inline] > pv_wait_head_or_lock kernel/locking/qspinlock_paravirt.h:470 [inline] > __pv_queued_spin_lock_slowpath+0x8e4/0xb80 kernel/locking/qspinlock.c:511 > pv_queued_spin_lock_slowpath arch/x86/include/asm/paravirt.h:586 [inline] > queued_spin_lock_slowpath arch/x86/include/asm/qspinlock.h:51 [inline] > queued_spin_lock include/asm-generic/qspinlock.h:114 [inline] > do_raw_spin_lock+0x20d/0x2b0 kernel/locking/spinlock_debug.c:115 > __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:111 [inline] > _raw_spin_lock_irqsave+0x45/0x60 kernel/locking/spinlock.c:162 > drm_handle_vblank+0x11e/0xb80 drivers/gpu/drm/drm_vblank.c:1986 stuck on a spinlock held by #5 > vkms_vblank_simulate+0xe8/0x3e0 drivers/gpu/drm/vkms/vkms_crtc.c:29 > __run_hrtimer kernel/time/hrtimer.c:1685 [inline] > __hrtimer_run_queues+0x599/0xa30 kernel/time/hrtimer.c:1749 > hrtimer_interrupt+0x320/0x7b0 kernel/time/hrtimer.c:1811 > local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1095 [inline] > __sysvec_apic_timer_interrupt+0x14a/0x430 arch/x86/kernel/apic/apic.c:1112 > sysvec_apic_timer_interrupt+0x92/0xc0 arch/x86/kernel/apic/apic.c:1106 > </IRQ> > <TASK> > asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:645 > RIP: 0010:check_kcov_mode kernel/kcov.c:173 [inline] > RIP: 0010:__sanitizer_cov_trace_pc+0x11/0x70 kernel/kcov.c:207 > Code: a8 01 00 00 e8 b0 ff ff ff 31 c0 c3 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 f3 0f 1e fa 65 8b 05 0d 33 82 7e 89 c1 48 8b 34 24 <81> e1 00 01 00 00 65 48 8b 14 25 40 bb 03 00 a9 00 01 ff 00 74 0e > RSP: 0018:ffffc90002be76d8 EFLAGS: 00000286 > RAX: 0000000080000001 RBX: 0000000000000001 RCX: 0000000080000001 > RDX: 00007f2817c77000 RSI: ffffffff81bcd756 RDI: ffffc90002be7ad8 > RBP: 0000000000000001 R08: 0000000000000001 R09: 0000000000000000 > R10: 0000000000000001 R11: 0000000000000001 R12: ffffea00014fc480 > R13: 0000000000000000 R14: dffffc0000000000 R15: 8000000053f12007 > zap_drop_file_uffd_wp mm/memory.c:1352 [inline] > zap_install_uffd_wp_if_needed mm/memory.c:1371 [inline] > zap_pte_range mm/memory.c:1417 [inline] > zap_pmd_range mm/memory.c:1564 [inline] > zap_pud_range mm/memory.c:1593 [inline] > zap_p4d_range mm/memory.c:1614 [inline] > unmap_page_range+0x1046/0x4470 mm/memory.c:1635 > unmap_single_vma+0x19a/0x2b0 mm/memory.c:1681 > unmap_vmas+0x234/0x380 mm/memory.c:1720 > exit_mmap+0x190/0x930 mm/mmap.c:3111 > __mmput+0x128/0x4c0 kernel/fork.c:1351 > mmput+0x60/0x70 kernel/fork.c:1373 > exit_mm kernel/exit.c:564 [inline] > do_exit+0x9d1/0x29f0 kernel/exit.c:858 > do_group_exit+0xd4/0x2a0 kernel/exit.c:1021 > get_signal+0x2311/0x25c0 kernel/signal.c:2874 > arch_do_signal_or_restart+0x79/0x5a0 arch/x86/kernel/signal.c:307 > exit_to_user_mode_loop kernel/entry/common.c:168 [inline] > exit_to_user_mode_prepare+0x11f/0x240 kernel/entry/common.c:204 > __syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline] > syscall_exit_to_user_mode+0x1d/0x50 kernel/entry/common.c:297 > do_syscall_64+0x46/0x80 arch/x86/entry/common.c:86 > entry_SYSCALL_64_after_hwframe+0x63/0xcd > RIP: 0033:0x7f281828edcd > Code: Unable to access opcode bytes at 0x7f281828eda3. > RSP: 002b:00007f28194c0c98 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca > RAX: fffffffffffffe00 RBX: 00007f28183bbf80 RCX: 00007f281828edcd > RDX: 0000000000000000 RSI: 0000000000000080 RDI: 00007f28183bbf88 > RBP: 00007f28183bbf88 R08: 0000000000000000 R09: 0000000000000000 > R10: 0000000000000000 R11: 0000000000000246 R12: 00007f28183bbf8c > R13: 00007ffd5038e1ef R14: 00007ffd5038e390 R15: 00007f28194c0d80 > </TASK> ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: BUG: soft lockup in smp_call_function 2023-09-12 23:02 Sanan Hasanov 2023-09-13 10:05 ` Peter Zijlstra @ 2023-09-13 11:07 ` Hillf Danton 2023-09-13 14:30 ` Tetsuo Handa 1 sibling, 1 reply; 12+ messages in thread From: Hillf Danton @ 2023-09-13 11:07 UTC (permalink / raw) To: Sanan Hasanov Cc: peterz, Linus Torvalds, Tetsuo Handa, Thomas Gleixner, syzkaller, linux-mm, LKML On Tue, 12 Sep 2023 23:02:56 +0000 Sanan Hasanov <Sanan.Hasanov@ucf.edu> > Good day, dear maintainers, > > We found a bug using a modified kernel configuration file used by syzbot. > Thanks for your report. > We enhanced the coverage of the configuration file using our tool, klocalizer. > > Kernel Branch: 6.3.0-next-20230426 > Kernel Config: https://drive.google.com/file/d/1WSUEWrith9-539qo6xRqmwy4LfDtmKpp/view?usp=sharing > Reproducer: https://drive.google.com/file/d/1pN6FfcjuUs6Wx94g1gufuYGjRbMMgiZ4/view?usp=sharing > Thank you! > > Best regards, > Sanan Hasanov > > watchdog: BUG: soft lockup - CPU#5 stuck for 26s! [kworker/u16:1:12] > Modules linked in: > irq event stamp: 192794 > hardirqs last enabled at (192793): [<ffffffff89a0140a>] asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:645 > hardirqs last disabled at (192794): [<ffffffff89975d4f>] sysvec_apic_timer_interrupt+0xf/0xc0 arch/x86/kernel/apic/apic.c:1106 > softirqs last enabled at (187764): [<ffffffff814b94bd>] invoke_softirq kernel/softirq.c:445 [inline] > softirqs last enabled at (187764): [<ffffffff814b94bd>] __irq_exit_rcu+0x11d/0x190 kernel/softirq.c:650 > softirqs last disabled at (187671): [<ffffffff814b94bd>] invoke_softirq kernel/softirq.c:445 [inline] > softirqs last disabled at (187671): [<ffffffff814b94bd>] __irq_exit_rcu+0x11d/0x190 kernel/softirq.c:650 > CPU: 5 PID: 12 Comm: kworker/u16:1 Not tainted 6.3.0-next-20230426 #1 > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 > Workqueue: events_unbound toggle_allocation_gate > RIP: 0010:csd_lock_wait kernel/smp.c:294 [inline] > RIP: 0010:smp_call_function_many_cond+0x5bd/0x1020 kernel/smp.c:828 > Code: 0b 00 85 ed 74 4d 48 b8 00 00 00 00 00 fc ff df 4d 89 f4 4c 89 f5 49 c1 ec 03 83 e5 07 49 01 c4 83 c5 03 e8 b5 07 0b 00 f3 90 <41> 0f b6 04 24 40 38 c5 7c 08 84 c0 0f 85 46 08 00 00 8b 43 08 31 > RSP: 0018:ffffc900000cf9e8 EFLAGS: 00000293 > RAX: 0000000000000000 RBX: ffff888119cc4d80 RCX: 0000000000000000 > RDX: ffff888100325940 RSI: ffffffff8176807b RDI: 0000000000000005 > RBP: 0000000000000003 R08: 0000000000000005 R09: 0000000000000000 > R10: 0000000000000001 R11: 0000000000000001 R12: ffffed10233989b1 > R13: 0000000000000001 R14: ffff888119cc4d88 R15: 0000000000000001 > FS: 0000000000000000(0000) GS:ffff888119e80000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 0000555556a6cc88 CR3: 000000000bb73000 CR4: 0000000000350ee0 > Call Trace: > <TASK> > on_each_cpu_cond_mask+0x40/0x90 kernel/smp.c:996 > on_each_cpu include/linux/smp.h:71 [inline] > text_poke_sync arch/x86/kernel/alternative.c:1770 [inline] > text_poke_bp_batch+0x237/0x770 arch/x86/kernel/alternative.c:1970 > text_poke_flush arch/x86/kernel/alternative.c:2161 [inline] > text_poke_flush arch/x86/kernel/alternative.c:2158 [inline] > text_poke_finish+0x1a/0x30 arch/x86/kernel/alternative.c:2168 > arch_jump_label_transform_apply+0x17/0x30 arch/x86/kernel/jump_label.c:146 > jump_label_update+0x321/0x400 kernel/jump_label.c:829 > static_key_enable_cpuslocked+0x1b5/0x270 kernel/jump_label.c:205 > static_key_enable+0x1a/0x20 kernel/jump_label.c:218 > toggle_allocation_gate mm/kfence/core.c:831 [inline] > toggle_allocation_gate+0xf4/0x220 mm/kfence/core.c:823 > process_one_work+0x993/0x15e0 kernel/workqueue.c:2405 > worker_thread+0x67d/0x10c0 kernel/workqueue.c:2552 > kthread+0x33e/0x440 kernel/kthread.c:379 > ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308 > </TASK> > Sending NMI from CPU 5 to CPUs 0-4,6-7: > NMI backtrace for cpu 1 > CPU: 1 PID: 20602 Comm: syz-executor.3 Not tainted 6.3.0-next-20230426 #1 > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 > RIP: 0010:hlock_class kernel/locking/lockdep.c:228 [inline] > RIP: 0010:check_wait_context kernel/locking/lockdep.c:4747 [inline] > RIP: 0010:__lock_acquire+0x489/0x5d00 kernel/locking/lockdep.c:5024 > Code: 41 81 e5 ff 1f 45 0f b7 ed be 08 00 00 00 4c 89 e8 48 c1 e8 06 48 8d 3c c5 00 6b 2c 90 e8 5f 90 6e 00 4c 0f a3 2d d7 35 c9 0e <0f> 83 5c 0c 00 00 4f 8d 6c 6d 00 49 c1 e5 06 49 81 c5 20 6f 2c 90 > RSP: 0018:ffffc90002aa7350 EFLAGS: 00000047 > RAX: 0000000000000001 RBX: 0000000000000001 RCX: ffffffff81633521 > RDX: fffffbfff2058d62 RSI: 0000000000000008 RDI: ffffffff902c6b08 > RBP: ffff888042995940 R08: 0000000000000000 R09: ffffffff902c6b0f > R10: fffffbfff2058d61 R11: 0000000000000001 R12: ffff888119e2b818 > R13: 0000000000000063 R14: 0000000000000002 R15: ffff888042996598 > FS: 00007fdaad065700(0000) GS:ffff888119c80000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 0000001b30623000 CR3: 0000000101969000 CR4: 0000000000350ee0 > Call Trace: > <TASK> > lock_acquire kernel/locking/lockdep.c:5691 [inline] > lock_acquire+0x1b1/0x520 kernel/locking/lockdep.c:5656 > __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline] > _raw_spin_lock_irqsave+0x3d/0x60 kernel/locking/spinlock.c:162 > lock_hrtimer_base kernel/time/hrtimer.c:173 [inline] > hrtimer_try_to_cancel kernel/time/hrtimer.c:1331 [inline] > hrtimer_try_to_cancel+0xa9/0x2e0 kernel/time/hrtimer.c:1316 > hrtimer_cancel+0x17/0x40 kernel/time/hrtimer.c:1443 > __disable_vblank drivers/gpu/drm/drm_vblank.c:434 [inline] > drm_vblank_disable_and_save+0x282/0x3d0 drivers/gpu/drm/drm_vblank.c:478 > drm_crtc_vblank_off+0x312/0x970 drivers/gpu/drm/drm_vblank.c:1366 cpu1 cpu4 (see below) ==== ==== drm_crtc_vblank_off __run_hrtimer spin_lock_irq(&dev->event_lock); ... drm_handle_vblank hrtimer_cancel spin_lock_irqsave(&dev->event_lock, irqflags); Deadlock should have been reported instead provided the lockdep_map in struct timer_list were added also to hrtimer, so it is highly appreciated if Tetsuo or Thomas adds it before 6.8 or 6.10. Hillf > disable_outputs+0x7c7/0xbb0 drivers/gpu/drm/drm_atomic_helper.c:1202 > drm_atomic_helper_commit_modeset_disables+0x1d/0x40 drivers/gpu/drm/drm_atomic_helper.c:1397 > vkms_atomic_commit_tail+0x51/0x240 drivers/gpu/drm/vkms/vkms_drv.c:71 > commit_tail+0x288/0x420 drivers/gpu/drm/drm_atomic_helper.c:1812 > drm_atomic_helper_commit drivers/gpu/drm/drm_atomic_helper.c:2052 [inline] > drm_atomic_helper_commit+0x306/0x390 drivers/gpu/drm/drm_atomic_helper.c:1985 > drm_atomic_commit+0x20a/0x2d0 drivers/gpu/drm/drm_atomic.c:1503 > drm_client_modeset_commit_atomic+0x698/0x7e0 drivers/gpu/drm/drm_client_modeset.c:1045 > drm_client_modeset_dpms+0x174/0x200 drivers/gpu/drm/drm_client_modeset.c:1226 > drm_fb_helper_dpms drivers/gpu/drm/drm_fb_helper.c:323 [inline] > drm_fb_helper_blank+0xd1/0x260 drivers/gpu/drm/drm_fb_helper.c:356 > fb_blank+0x105/0x190 drivers/video/fbdev/core/fbmem.c:1088 > do_fb_ioctl+0x390/0x760 drivers/video/fbdev/core/fbmem.c:1180 > fb_ioctl+0xeb/0x150 drivers/video/fbdev/core/fbmem.c:1204 > vfs_ioctl fs/ioctl.c:51 [inline] > __do_sys_ioctl fs/ioctl.c:870 [inline] > __se_sys_ioctl fs/ioctl.c:856 [inline] > __x64_sys_ioctl+0x197/0x210 fs/ioctl.c:856 > do_syscall_x64 arch/x86/entry/common.c:50 [inline] > do_syscall_64+0x39/0x80 arch/x86/entry/common.c:80 > entry_SYSCALL_64_after_hwframe+0x63/0xcd > RIP: 0033:0x7fdaabe8edcd > Code: 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48 > RSP: 002b:00007fdaad064bf8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 > RAX: ffffffffffffffda RBX: 00007fdaabfbbf80 RCX: 00007fdaabe8edcd > RDX: 0000000000000004 RSI: 0000000000004611 RDI: 0000000000000003 > RBP: 00007fdaabefc59c R08: 0000000000000000 R09: 0000000000000000 > R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 > R13: 00007ffdadeffe9f R14: 00007ffdadf00040 R15: 00007fdaad064d80 > </TASK> > NMI backtrace for cpu 0 skipped: idling at native_safe_halt arch/x86/include/asm/irqflags.h:48 [inline] > NMI backtrace for cpu 0 skipped: idling at arch_safe_halt arch/x86/include/asm/irqflags.h:86 [inline] > NMI backtrace for cpu 0 skipped: idling at default_idle+0xf/0x20 arch/x86/kernel/process.c:729 > NMI backtrace for cpu 3 skipped: idling at native_safe_halt arch/x86/include/asm/irqflags.h:48 [inline] > NMI backtrace for cpu 3 skipped: idling at arch_safe_halt arch/x86/include/asm/irqflags.h:86 [inline] > NMI backtrace for cpu 3 skipped: idling at default_idle+0xf/0x20 arch/x86/kernel/process.c:729 > NMI backtrace for cpu 6 skipped: idling at native_safe_halt arch/x86/include/asm/irqflags.h:48 [inline] > NMI backtrace for cpu 6 skipped: idling at arch_safe_halt arch/x86/include/asm/irqflags.h:86 [inline] > NMI backtrace for cpu 6 skipped: idling at default_idle+0xf/0x20 arch/x86/kernel/process.c:729 > NMI backtrace for cpu 7 skipped: idling at native_safe_halt arch/x86/include/asm/irqflags.h:48 [inline] > NMI backtrace for cpu 7 skipped: idling at arch_safe_halt arch/x86/include/asm/irqflags.h:86 [inline] > NMI backtrace for cpu 7 skipped: idling at default_idle+0xf/0x20 arch/x86/kernel/process.c:729 > NMI backtrace for cpu 4 > CPU: 4 PID: 20623 Comm: syz-executor.6 Not tainted 6.3.0-next-20230426 #1 > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 > RIP: 0010:kvm_wait+0xb7/0x110 arch/x86/kernel/kvm.c:1064 > Code: 40 38 c6 74 1b 48 83 c4 10 c3 c3 e8 93 d3 50 00 eb 07 0f 00 2d 4a 04 92 08 fb f4 48 83 c4 10 c3 eb 07 0f 00 2d 3a 04 92 08 f4 <48> 83 c4 10 c3 89 74 24 0c 48 89 3c 24 e8 d7 d4 50 00 8b 74 24 0c > RSP: 0018:ffffc90000300b50 EFLAGS: 00000046 > RAX: 0000000000000003 RBX: 0000000000000000 RCX: dffffc0000000000 > RDX: 0000000000000000 RSI: 0000000000000003 RDI: ffff88810b0803d8 > RBP: ffff88810b0803d8 R08: 0000000000000001 R09: ffff88810b0803d8 > R10: ffffed102161007b R11: ffffc90000300ff8 R12: 0000000000000000 > R13: ffffed102161007b R14: 0000000000000001 R15: ffff888119e3d3c0 > FS: 0000000000000000(0000) GS:ffff888119e00000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 00007f28183bd0b0 CR3: 000000000bb73000 CR4: 0000000000350ee0 > Call Trace: > <IRQ> > pv_wait arch/x86/include/asm/paravirt.h:598 [inline] > pv_wait_head_or_lock kernel/locking/qspinlock_paravirt.h:470 [inline] > __pv_queued_spin_lock_slowpath+0x8e4/0xb80 kernel/locking/qspinlock.c:511 > pv_queued_spin_lock_slowpath arch/x86/include/asm/paravirt.h:586 [inline] > queued_spin_lock_slowpath arch/x86/include/asm/qspinlock.h:51 [inline] > queued_spin_lock include/asm-generic/qspinlock.h:114 [inline] > do_raw_spin_lock+0x20d/0x2b0 kernel/locking/spinlock_debug.c:115 > __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:111 [inline] > _raw_spin_lock_irqsave+0x45/0x60 kernel/locking/spinlock.c:162 > drm_handle_vblank+0x11e/0xb80 drivers/gpu/drm/drm_vblank.c:1986 > vkms_vblank_simulate+0xe8/0x3e0 drivers/gpu/drm/vkms/vkms_crtc.c:29 > __run_hrtimer kernel/time/hrtimer.c:1685 [inline] > __hrtimer_run_queues+0x599/0xa30 kernel/time/hrtimer.c:1749 > hrtimer_interrupt+0x320/0x7b0 kernel/time/hrtimer.c:1811 > local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1095 [inline] > __sysvec_apic_timer_interrupt+0x14a/0x430 arch/x86/kernel/apic/apic.c:1112 > sysvec_apic_timer_interrupt+0x92/0xc0 arch/x86/kernel/apic/apic.c:1106 > </IRQ> > <TASK> > asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:645 > RIP: 0010:check_kcov_mode kernel/kcov.c:173 [inline] > RIP: 0010:__sanitizer_cov_trace_pc+0x11/0x70 kernel/kcov.c:207 > Code: a8 01 00 00 e8 b0 ff ff ff 31 c0 c3 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 f3 0f 1e fa 65 8b 05 0d 33 82 7e 89 c1 48 8b 34 24 <81> e1 00 01 00 00 65 48 8b 14 25 40 bb 03 00 a9 00 01 ff 00 74 0e > RSP: 0018:ffffc90002be76d8 EFLAGS: 00000286 > RAX: 0000000080000001 RBX: 0000000000000001 RCX: 0000000080000001 > RDX: 00007f2817c77000 RSI: ffffffff81bcd756 RDI: ffffc90002be7ad8 > RBP: 0000000000000001 R08: 0000000000000001 R09: 0000000000000000 > R10: 0000000000000001 R11: 0000000000000001 R12: ffffea00014fc480 > R13: 0000000000000000 R14: dffffc0000000000 R15: 8000000053f12007 > zap_drop_file_uffd_wp mm/memory.c:1352 [inline] > zap_install_uffd_wp_if_needed mm/memory.c:1371 [inline] > zap_pte_range mm/memory.c:1417 [inline] > zap_pmd_range mm/memory.c:1564 [inline] > zap_pud_range mm/memory.c:1593 [inline] > zap_p4d_range mm/memory.c:1614 [inline] > unmap_page_range+0x1046/0x4470 mm/memory.c:1635 > unmap_single_vma+0x19a/0x2b0 mm/memory.c:1681 > unmap_vmas+0x234/0x380 mm/memory.c:1720 > exit_mmap+0x190/0x930 mm/mmap.c:3111 > __mmput+0x128/0x4c0 kernel/fork.c:1351 > mmput+0x60/0x70 kernel/fork.c:1373 > exit_mm kernel/exit.c:564 [inline] > do_exit+0x9d1/0x29f0 kernel/exit.c:858 > do_group_exit+0xd4/0x2a0 kernel/exit.c:1021 > get_signal+0x2311/0x25c0 kernel/signal.c:2874 > arch_do_signal_or_restart+0x79/0x5a0 arch/x86/kernel/signal.c:307 > exit_to_user_mode_loop kernel/entry/common.c:168 [inline] > exit_to_user_mode_prepare+0x11f/0x240 kernel/entry/common.c:204 > __syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline] > syscall_exit_to_user_mode+0x1d/0x50 kernel/entry/common.c:297 > do_syscall_64+0x46/0x80 arch/x86/entry/common.c:86 > entry_SYSCALL_64_after_hwframe+0x63/0xcd > RIP: 0033:0x7f281828edcd > Code: Unable to access opcode bytes at 0x7f281828eda3. > RSP: 002b:00007f28194c0c98 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca > RAX: fffffffffffffe00 RBX: 00007f28183bbf80 RCX: 00007f281828edcd > RDX: 0000000000000000 RSI: 0000000000000080 RDI: 00007f28183bbf88 > RBP: 00007f28183bbf88 R08: 0000000000000000 R09: 0000000000000000 > R10: 0000000000000000 R11: 0000000000000246 R12: 00007f28183bbf8c > R13: 00007ffd5038e1ef R14: 00007ffd5038e390 R15: 00007f28194c0d80 > </TASK> > NMI backtrace for cpu 2 skipped: idling at native_safe_halt arch/x86/include/asm/irqflags.h:48 [inline] > NMI backtrace for cpu 2 skipped: idling at arch_safe_halt arch/x86/include/asm/irqflags.h:86 [inline] > NMI backtrace for cpu 2 skipped: idling at default_idle+0xf/0x20 arch/x86/kernel/process.c:729 ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: BUG: soft lockup in smp_call_function 2023-09-13 11:07 ` Hillf Danton @ 2023-09-13 14:30 ` Tetsuo Handa 2023-09-14 12:21 ` Hillf Danton 0 siblings, 1 reply; 12+ messages in thread From: Tetsuo Handa @ 2023-09-13 14:30 UTC (permalink / raw) To: Hillf Danton, Sanan Hasanov, Thomas Gleixner, peterz Cc: Linus Torvalds, syzkaller, LKML On 2023/09/13 20:07, Hillf Danton wrote: > > cpu1 cpu4 (see below) > ==== ==== > drm_crtc_vblank_off __run_hrtimer > spin_lock_irq(&dev->event_lock); > ... > drm_handle_vblank > hrtimer_cancel spin_lock_irqsave(&dev->event_lock, irqflags); > > > Deadlock should have been reported instead provided the lockdep_map in > struct timer_list were added also to hrtimer, so it is highly appreciated > if Tetsuo or Thomas adds it before 6.8 or 6.10. Not me. ;-) Since hrtimer_cancel() retries forever until lock_hrtimer_base() succeeds, we want to add a lockdep annotation into hrtimer_cancel() so that we can detect this type of deadlock? ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: BUG: soft lockup in smp_call_function 2023-09-13 14:30 ` Tetsuo Handa @ 2023-09-14 12:21 ` Hillf Danton 2023-09-14 13:13 ` Tetsuo Handa 0 siblings, 1 reply; 12+ messages in thread From: Hillf Danton @ 2023-09-14 12:21 UTC (permalink / raw) To: Tetsuo Handa Cc: Sanan Hasanov, Thomas Gleixner, Linus Torvalds, syzkaller, linux-mm, LKML On Wed, 13 Sep 2023 23:30:23 +0900 Tetsuo Handa wrote: > On 2023/09/13 20:07, Hillf Danton wrote: > > > > cpu1 cpu4 (see below) > > ==== ==== > > drm_crtc_vblank_off __run_hrtimer > > spin_lock_irq(&dev->event_lock); > > ... > > drm_handle_vblank > > hrtimer_cancel spin_lock_irqsave(&dev->event_lock, irqflags); > > > > > > Deadlock should have been reported instead provided the lockdep_map in > > struct timer_list were added also to hrtimer, so it is highly appreciated > > if Tetsuo or Thomas adds it before 6.8 or 6.10. > > Not me. ;-) > > Since hrtimer_cancel() retries forever until lock_hrtimer_base() succeeds, > we want to add a lockdep annotation into hrtimer_cancel() so that we can > detect this type of deadlock? Yes, you are right. The diff below is my two cents (only for thoughts). --- x/include/linux/timer.h +++ y/include/linux/timer.h @@ -124,6 +124,9 @@ struct hrtimer { u8 is_rel; u8 is_soft; u8 is_hard; +#ifdef CONFIG_LOCKDEP + struct lockdep_map lockdep_map; +#endif }; /** @@ -369,33 +372,65 @@ static inline void hrtimer_cancel_wait_r /* Exported timer functions: */ /* Initialize timers: */ -extern void hrtimer_init(struct hrtimer *timer, clockid_t which_clock, - enum hrtimer_mode mode); -extern void hrtimer_init_sleeper(struct hrtimer_sleeper *sl, clockid_t clock_id, - enum hrtimer_mode mode); +extern void hrtimer_init_key(struct hrtimer *timer, clockid_t which_clock, + enum hrtimer_mode mode, + const char *name, struct lock_class_key *key); +extern void hrtimer_init_sleeper_key(struct hrtimer_sleeper *sl, clockid_t clock_id, + enum hrtimer_mode mode, + const char *name, struct lock_class_key *key); +#ifdef CONFIG_LOCKDEP +#define hrtimer_init(t, c, m) \ + do { \ + static struct lock_class_key __key; \ + hrtimer_init_key(t, c, m, #t, &__key); \ + } while (0) + +#define hrtimer_init_sleeper(s, c, m) \ + do { \ + static struct lock_class_key __key; \ + hrtimer_init_sleeper_key(s, c, m, #s, &__key); \ + } while (0) +#else +#define hrtimer_init(t, c, m) \ + hrtimer_init_key(t, c, m, NULL, NULL) + +#define hrtimer_init_sleeper(s, c, m) \ + hrtimer_init_sleeper_key(s, c, m, NULL, NULL) +#endif #ifdef CONFIG_DEBUG_OBJECTS_TIMERS -extern void hrtimer_init_on_stack(struct hrtimer *timer, clockid_t which_clock, - enum hrtimer_mode mode); -extern void hrtimer_init_sleeper_on_stack(struct hrtimer_sleeper *sl, +extern void hrtimer_init_on_stack_key(struct hrtimer *timer, clockid_t which_clock, + enum hrtimer_mode mode, + const char *name, struct lock_class_key *key); +extern void hrtimer_init_sleeper_on_stack_key(struct hrtimer_sleeper *sl, clockid_t clock_id, - enum hrtimer_mode mode); + enum hrtimer_mode mode, + const char *name, struct lock_class_key *key); +#ifdef CONFIG_LOCKDEP + #define hrtimer_init_on_stack(t, c, m) \ + do { \ + static struct lock_class_key __key; \ + hrtimer_init_on_stack_key(t, c, m, #t, &__key); \ + } while (0) + #define hrtimer_init_sleeper_on_stack(s, c, m) \ + do { \ + static struct lock_class_key __key; \ + hrtimer_init_sleeper_on_stack_key(s, c, m, #s, &__key); \ + } while (0) +#else + #define hrtimer_init_on_stack(t, c, m) \ + hrtimer_init_on_stack_key(t, c, m, NULL, NULL) + #define hrtimer_init_sleeper_on_stack(s, c, m) \ + hrtimer_init_sleeper_on_stack_key(s, c, m, NULL, NULL) +#endif extern void destroy_hrtimer_on_stack(struct hrtimer *timer); #else -static inline void hrtimer_init_on_stack(struct hrtimer *timer, - clockid_t which_clock, - enum hrtimer_mode mode) -{ - hrtimer_init(timer, which_clock, mode); -} +#define hrtimer_init_on_stack(t, c, m) \ + hrtimer_init(t, c, m) -static inline void hrtimer_init_sleeper_on_stack(struct hrtimer_sleeper *sl, - clockid_t clock_id, - enum hrtimer_mode mode) -{ - hrtimer_init_sleeper(sl, clock_id, mode); -} +#define hrtimer_init_sleeper_on_stack(s, c, m) \ + hrtimer_init_sleeper(s, c, m) static inline void destroy_hrtimer_on_stack(struct hrtimer *timer) { } #endif --- x/kernel/time/hrtimer.c +++ y/kernel/time/hrtimer.c @@ -428,22 +428,26 @@ static inline void debug_hrtimer_deactiv static void __hrtimer_init(struct hrtimer *timer, clockid_t clock_id, enum hrtimer_mode mode); -void hrtimer_init_on_stack(struct hrtimer *timer, clockid_t clock_id, - enum hrtimer_mode mode) +void hrtimer_init_on_stack_key(struct hrtimer *timer, clockid_t clock_id, + enum hrtimer_mode mode, + const char *name, struct lock_class_key *key) { debug_object_init_on_stack(timer, &hrtimer_debug_descr); __hrtimer_init(timer, clock_id, mode); + lockdep_init_map(&timer->lockdep_map, name, key, 0); } EXPORT_SYMBOL_GPL(hrtimer_init_on_stack); static void __hrtimer_init_sleeper(struct hrtimer_sleeper *sl, clockid_t clock_id, enum hrtimer_mode mode); -void hrtimer_init_sleeper_on_stack(struct hrtimer_sleeper *sl, - clockid_t clock_id, enum hrtimer_mode mode) +void hrtimer_init_sleeper_on_stack_key(struct hrtimer_sleeper *sl, + clockid_t clock_id, enum hrtimer_mode mode, + const char *name, struct lock_class_key *key) { debug_object_init_on_stack(&sl->timer, &hrtimer_debug_descr); __hrtimer_init_sleeper(sl, clock_id, mode); + lockdep_init_map(&sl->timer.lockdep_map, name, key, 0); } EXPORT_SYMBOL_GPL(hrtimer_init_sleeper_on_stack); @@ -1439,6 +1443,8 @@ int hrtimer_cancel(struct hrtimer *timer { int ret; + lock_map_acquire(&timer->lockdep_map); + lock_map_release(&timer->lockdep_map); do { ret = hrtimer_try_to_cancel(timer); @@ -1586,11 +1592,12 @@ static void __hrtimer_init(struct hrtime * but the PINNED bit is ignored as pinning happens * when the hrtimer is started */ -void hrtimer_init(struct hrtimer *timer, clockid_t clock_id, - enum hrtimer_mode mode) +void hrtimer_init_key(struct hrtimer *timer, clockid_t clock_id, enum hrtimer_mode mode, + const char *name, struct lock_class_key *key) { debug_init(timer, clock_id, mode); __hrtimer_init(timer, clock_id, mode); + lockdep_init_map(&timer->lockdep_map, name, key, 0); } EXPORT_SYMBOL_GPL(hrtimer_init); @@ -1647,6 +1654,11 @@ static void __run_hrtimer(struct hrtimer enum hrtimer_restart (*fn)(struct hrtimer *); bool expires_in_hardirq; int restart; +#ifdef CONFIG_LOCKDEP + struct lockdep_map lockdep_map; + + lockdep_copy_map(&lockdep_map, &timer->lockdep_map); +#endif lockdep_assert_held(&cpu_base->lock); @@ -1682,7 +1694,9 @@ static void __run_hrtimer(struct hrtimer trace_hrtimer_expire_entry(timer, now); expires_in_hardirq = lockdep_hrtimer_enter(timer); + lock_map_acquire(&lockdep_map); restart = fn(timer); + lock_map_release(&lockdep_map); lockdep_hrtimer_exit(expires_in_hardirq); trace_hrtimer_expire_exit(timer); @@ -2004,12 +2018,13 @@ static void __hrtimer_init_sleeper(struc * @clock_id: the clock to be used * @mode: timer mode abs/rel */ -void hrtimer_init_sleeper(struct hrtimer_sleeper *sl, clockid_t clock_id, - enum hrtimer_mode mode) +void hrtimer_init_sleeper_key(struct hrtimer_sleeper *sl, clockid_t clock_id, + enum hrtimer_mode mode, + const char *name, struct lock_class_key *key) { debug_init(&sl->timer, clock_id, mode); __hrtimer_init_sleeper(sl, clock_id, mode); - + lockdep_init_map(&sl->timer.lockdep_map, name, key, 0); } EXPORT_SYMBOL_GPL(hrtimer_init_sleeper); -- ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: BUG: soft lockup in smp_call_function 2023-09-14 12:21 ` Hillf Danton @ 2023-09-14 13:13 ` Tetsuo Handa 0 siblings, 0 replies; 12+ messages in thread From: Tetsuo Handa @ 2023-09-14 13:13 UTC (permalink / raw) To: Thomas Gleixner Cc: Sanan Hasanov, Linus Torvalds, syzkaller, LKML, Hillf Danton On 2023/09/14 21:21, Hillf Danton wrote: > On Wed, 13 Sep 2023 23:30:23 +0900 Tetsuo Handa wrote: >> On 2023/09/13 20:07, Hillf Danton wrote: >>> >>> cpu1 cpu4 (see below) >>> ==== ==== >>> drm_crtc_vblank_off __run_hrtimer >>> spin_lock_irq(&dev->event_lock); >>> ... >>> drm_handle_vblank >>> hrtimer_cancel spin_lock_irqsave(&dev->event_lock, irqflags); >>> >>> >>> Deadlock should have been reported instead provided the lockdep_map in >>> struct timer_list were added also to hrtimer, so it is highly appreciated >>> if Tetsuo or Thomas adds it before 6.8 or 6.10. >> >> Not me. ;-) >> >> Since hrtimer_cancel() retries forever until lock_hrtimer_base() succeeds, >> we want to add a lockdep annotation into hrtimer_cancel() so that we can >> detect this type of deadlock? Here is a reproducer. ---------------------------------------- #include <linux/module.h> #include <linux/delay.h> static DEFINE_SPINLOCK(lock1); static struct hrtimer timer1; static enum hrtimer_restart timer_func(struct hrtimer *timer) { unsigned long flags; mdelay(100); // Wait for test_init() to hold lock1. spin_lock_irqsave(&lock1, flags); spin_unlock_irqrestore(&lock1, flags); return HRTIMER_RESTART; } static int __init test_init(void) { unsigned long flags; hrtimer_init(&timer1, CLOCK_MONOTONIC, HRTIMER_MODE_REL); timer1.function = &timer_func; hrtimer_start(&timer1, 1, HRTIMER_MODE_REL); mdelay(10); // Wait for timer_func() to start. spin_lock_irqsave(&lock1, flags); hrtimer_cancel(&timer1); // Wait for timer_func() to finish. spin_unlock_irqrestore(&lock1, flags); return -EINVAL; } module_init(test_init); MODULE_LICENSE("GPL"); ---------------------------------------- ---------------------------------------- [ 996.507681] test: loading out-of-tree module taints kernel. [ 996.514019] test: module verification failed: signature and/or required key missing - tainting kernel [ 1061.893054] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks: [ 1061.900411] rcu: 4-...0: (1 GPs behind) idle=ed6c/1/0x4000000000000000 softirq=3304/3305 fqs=15784 [ 1061.909128] rcu: (detected by 0, t=65018 jiffies, g=12625, q=4422 ncpus=12) [ 1061.915003] Sending NMI from CPU 0 to CPUs 4: [ 1061.918972] NMI backtrace for cpu 4 [ 1061.919036] CPU: 4 PID: 3826 Comm: insmod Tainted: G OE 6.6.0-rc1+ #20 [ 1061.919093] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020 [ 1061.919095] RIP: 0010:delay_tsc+0x34/0xa0 [ 1061.919560] Code: ff 05 e8 b1 26 70 65 44 8b 0d e4 b1 26 70 0f 01 f9 66 90 48 c1 e2 20 48 09 c2 49 89 d0 eb 21 65 ff 0d c8 b1 26 70 74 54 f3 90 <65> ff 05 bd b1 26 70 65 8b 35 ba b1 26 70 41 39 f1 75 28 41 89 f1 [ 1061.919563] RSP: 0018:ffffb471c059cf00 EFLAGS: 00000083 [ 1061.919567] RAX: 0000028efe104ef6 RBX: 0000000000000041 RCX: 0000000000000004 [ 1061.919569] RDX: 00000000002192f8 RSI: 0000000000000004 RDI: 000000000027d81e [ 1061.919571] RBP: ffff8970dafe5040 R08: 0000028efdeebbfe R09: 0000000000000004 [ 1061.919572] R10: 0000000000000001 R11: ffffffffc0a8d600 R12: ffffffff90e030e0 [ 1061.919574] R13: ffff8970dafe5040 R14: ffffffffc0a8b010 R15: ffff8970dafe5100 [ 1061.919630] FS: 00007fdd998eb740(0000) GS:ffff8970dae00000(0000) knlGS:0000000000000000 [ 1061.919633] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1061.919635] CR2: 0000000001edf678 CR3: 00000001a1060000 CR4: 0000000000350ee0 [ 1061.919639] Call Trace: [ 1061.919641] <NMI> [ 1061.919646] ? nmi_cpu_backtrace+0xb1/0x130 [ 1061.919711] ? nmi_cpu_backtrace_handler+0x11/0x20 [ 1061.922096] ? nmi_handle+0xe4/0x290 [ 1061.922163] ? default_do_nmi+0x49/0x100 [ 1061.922196] ? exc_nmi+0x152/0x1e0 [ 1061.922198] ? end_repeat_nmi+0x16/0x67 [ 1061.922340] ? __pfx_timer_func+0x10/0x10 [test] [ 1061.922347] ? delay_tsc+0x34/0xa0 [ 1061.922349] ? delay_tsc+0x34/0xa0 [ 1061.922350] ? delay_tsc+0x34/0xa0 [ 1061.922352] </NMI> [ 1061.922353] <IRQ> [ 1061.922353] timer_func+0x19/0xff0 [test] [ 1061.922358] __hrtimer_run_queues+0x177/0x3a0 [ 1061.922362] hrtimer_interrupt+0x104/0x240 [ 1061.922364] ? __do_softirq+0x2db/0x392 [ 1061.922827] __sysvec_apic_timer_interrupt+0x64/0x180 [ 1061.922833] sysvec_apic_timer_interrupt+0x65/0x80 [ 1061.922894] </IRQ> [ 1061.922896] <TASK> [ 1061.922898] asm_sysvec_apic_timer_interrupt+0x1a/0x20 [ 1061.922902] RIP: 0010:delay_tsc+0x4d/0xa0 [ 1061.922907] Code: c2 49 89 d0 eb 21 65 ff 0d c8 b1 26 70 74 54 f3 90 65 ff 05 bd b1 26 70 65 8b 35 ba b1 26 70 41 39 f1 75 28 41 89 f1 0f 01 f9 <66> 90 48 c1 e2 20 48 09 d0 48 89 c2 4c 29 c2 48 39 fa 72 c8 65 ff [ 1061.922909] RSP: 0018:ffffb471c1e63bd0 EFLAGS: 00000246 [ 1061.922912] RAX: 00000000751ed8ab RBX: 000000000000000a RCX: 0000000000000004 [ 1061.922914] RDX: 0000000000000267 RSI: 0000000000000004 RDI: 000000000027d81e [ 1061.922915] RBP: ffffffffc0a91010 R08: 00000267751adc59 R09: 0000000000000004 [ 1061.922917] R10: 0000000000000001 R11: ffffffff90cd85c8 R12: 0000000000000000 [ 1061.922918] R13: ffffb471c1e63d20 R14: 0000000000000000 R15: ffffffffc0a8d080 [ 1061.922923] ? __pfx_test_init+0x10/0x10 [test] [ 1061.922934] test_init+0x52/0xff0 [test] [ 1061.922941] do_one_initcall+0x5c/0x280 [ 1061.923004] ? kmalloc_trace+0xa9/0xc0 [ 1061.923105] do_init_module+0x60/0x240 [ 1061.923111] load_module+0x1f6e/0x20d0 [ 1061.923119] ? ima_post_read_file+0xe3/0xf0 [ 1061.923225] ? init_module_from_file+0x88/0xc0 [ 1061.923229] init_module_from_file+0x88/0xc0 [ 1061.923238] idempotent_init_module+0x19c/0x250 [ 1061.923244] ? security_capable+0x39/0x60 [ 1061.923304] __x64_sys_finit_module+0x5b/0xb0 [ 1061.923310] do_syscall_64+0x3b/0x90 [ 1061.923366] entry_SYSCALL_64_after_hwframe+0x6e/0xd8 [ 1061.923421] RIP: 0033:0x7fdd986f8e29 [ 1061.923427] Code: 01 00 48 81 c4 80 00 00 00 e9 f1 fe ff ff 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 17 e0 2c 00 f7 d8 64 89 01 48 [ 1061.923429] RSP: 002b:00007ffe2f34dd18 EFLAGS: 00000206 ORIG_RAX: 0000000000000139 [ 1061.923432] RAX: ffffffffffffffda RBX: 0000000001ede240 RCX: 00007fdd986f8e29 [ 1061.923434] RDX: 0000000000000000 RSI: 000000000041a96e RDI: 0000000000000003 [ 1061.923435] RBP: 000000000041a96e R08: 0000000000000000 R09: 00007ffe2f34deb8 [ 1061.923436] R10: 0000000000000003 R11: 0000000000000206 R12: 0000000000000000 [ 1061.923437] R13: 0000000001ede210 R14: 0000000000000000 R15: 0000000000000000 [ 1061.923444] </TASK> [ 1061.923446] INFO: NMI handler (nmi_cpu_backtrace_handler) took too long to run: 4.474 msecs ---------------------------------------- > > Yes, you are right. > > The diff below is my two cents (only for thoughts). > I'm thinking something like below. (Completely untested.) I haven't checked IRQ state handling. But in the last diff chunk, why raw_spin_unlock_irqrestore() (which does not re-enable IRQs if the caller already disabled IRQs) is used before calling the callback function and raw_spin_lock_irq() (which always disables IRQs) is used after calling the callback function? Is it legal to disable IRQs again when the caller already disabled IRQs? ---------------------------------------- diff --git a/include/linux/hrtimer.h b/include/linux/hrtimer.h index 0ee140176f10..5640730ec31c 100644 --- a/include/linux/hrtimer.h +++ b/include/linux/hrtimer.h @@ -123,8 +123,11 @@ struct hrtimer { u8 state; u8 is_rel; u8 is_soft; u8 is_hard; +#ifdef CONFIG_LOCKDEP + struct lockdep_map lockdep_map; +#endif }; /** * struct hrtimer_sleeper - simple sleeper structure @@ -440,15 +443,15 @@ static inline void hrtimer_restart(struct hrtimer *timer) hrtimer_start_expires(timer, HRTIMER_MODE_ABS); } /* Query timers: */ -extern ktime_t __hrtimer_get_remaining(const struct hrtimer *timer, bool adjust); +extern ktime_t __hrtimer_get_remaining(struct hrtimer *timer, bool adjust); /** * hrtimer_get_remaining - get remaining time for the timer * @timer: the timer to read */ -static inline ktime_t hrtimer_get_remaining(const struct hrtimer *timer) +static inline ktime_t hrtimer_get_remaining(struct hrtimer *timer) { return __hrtimer_get_remaining(timer, false); } diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c index 238262e4aba7..fe0681d34b56 100644 --- a/kernel/time/hrtimer.c +++ b/kernel/time/hrtimer.c @@ -161,14 +161,23 @@ static inline bool is_migration_base(struct hrtimer_clock_base *base) * possible to set timer->base = &migration_base and drop the lock: the timer * remains locked. */ static -struct hrtimer_clock_base *lock_hrtimer_base(const struct hrtimer *timer, +struct hrtimer_clock_base *lock_hrtimer_base(struct hrtimer *timer, unsigned long *flags) __acquires(&timer->base->lock) { struct hrtimer_clock_base *base; +#ifdef CONFIG_LOCKDEP + unsigned long flags2; + + local_irq_save(flags2); + lock_map_acquire(&timer->lockdep_map); + lock_map_release(&timer->lockdep_map); + local_irq_restore(flags2); +#endif + for (;;) { base = READ_ONCE(timer->base); if (likely(base != &migration_base)) { raw_spin_lock_irqsave(&base->cpu_base->lock, *flags); @@ -1456,9 +1465,9 @@ EXPORT_SYMBOL_GPL(hrtimer_cancel); * __hrtimer_get_remaining - get remaining time for the timer * @timer: the timer to read * @adjust: adjust relative timers when CONFIG_TIME_LOW_RES=y */ -ktime_t __hrtimer_get_remaining(const struct hrtimer *timer, bool adjust) +ktime_t __hrtimer_get_remaining(struct hrtimer *timer, bool adjust) { unsigned long flags; ktime_t rem; @@ -1574,8 +1583,14 @@ static void __hrtimer_init(struct hrtimer *timer, clockid_t clock_id, timer->is_soft = softtimer; timer->is_hard = !!(mode & HRTIMER_MODE_HARD); timer->base = &cpu_base->clock_base[base]; timerqueue_init(&timer->node); +#ifdef CONFIG_LOCKDEP + { + static struct lock_class_key __key; + lockdep_init_map(&timer->lockdep_map, "hrtimer", &__key, 0); + } +#endif } /** * hrtimer_init - initialize a timer to the given clock @@ -1684,9 +1699,19 @@ static void __run_hrtimer(struct hrtimer_cpu_base *cpu_base, raw_spin_unlock_irqrestore(&cpu_base->lock, flags); trace_hrtimer_expire_entry(timer, now); expires_in_hardirq = lockdep_hrtimer_enter(timer); +#ifdef CONFIG_LOCKDEP + local_irq_save(flags); + lock_map_acquire(&timer->lockdep_map); + local_irq_restore(flags); +#endif restart = fn(timer); +#ifdef CONFIG_LOCKDEP + local_irq_save(flags); + lock_map_release(&timer->lockdep_map); + local_irq_restore(flags); +#endif lockdep_hrtimer_exit(expires_in_hardirq); trace_hrtimer_expire_exit(timer); raw_spin_lock_irq(&cpu_base->lock); ---------------------------------------- ^ permalink raw reply related [flat|nested] 12+ messages in thread
end of thread, other threads:[~2023-09-14 13:14 UTC | newest] Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2020-07-12 23:02 BUG: soft lockup in smp_call_function syzbot 2020-07-13 6:53 ` Dmitry Vyukov 2020-07-13 8:58 ` syzbot 2020-07-15 5:43 ` syzbot 2021-09-15 1:59 Hao Sun 2021-09-15 5:12 ` Zhang, Qiang 2023-09-12 23:02 Sanan Hasanov 2023-09-13 10:05 ` Peter Zijlstra 2023-09-13 11:07 ` Hillf Danton 2023-09-13 14:30 ` Tetsuo Handa 2023-09-14 12:21 ` Hillf Danton 2023-09-14 13:13 ` Tetsuo Handa
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.