On Fri, Aug 07, 2020 at 09:23PM +0200, Peter Zijlstra wrote: > Hi, > > Syzbot report: > > https://lkml.kernel.org/r/000000000000e3068105ac405407@google.com > > triggered a whole bunch of fallout. > > These 3 patches are in reverse order of discovery. With them applied the > reproducer no longer triggers and the machine boots without errors. > > I also spend half the day trying to rewrite TRACE_IRQFLAGS, but so far haven't > had success. I have the feeling that there's something slightly simpler hiding > in there, but so far it doesn't want to come out. > > Please, look carefully, this is all tricky code, and it's been a long and hot > day here, mistakes are almost guaranteed :/ > > I also want to remove the _rcuidle() from trace_preemptirq.c, but given > this was quite enough, I didn't touch those. Unfortunately I get LOCKDEP_DEBUG warnings, when testing with one of syzbot's configs. This appears at some point during boot (no other test): DEBUG_LOCKS_WARN_ON(lockdep_hardirqs_enabled()) WARNING: CPU: 0 PID: 0 at kernel/locking/lockdep.c:4875 check_flags+0x52/0x200 kernel/locking/lockdep.c:4875 Modules linked in: CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.8.0+ #8 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014 RIP: 0010:check_flags+0x52/0x200 kernel/locking/lockdep.c:4875 Code: c0 74 48 e8 40 f7 a5 03 85 c0 74 20 83 3d bd 33 53 05 00 75 17 48 c7 c7 ce 12 c9 85 48 c7 c6 04 8c ca 85 31 c0 e8 4e c1 f7 ff <0f> 0b 48 c7 c7 db 8b ca 85 e9 8c 01 00 00 85 c0 75 0f 65 8b 05 c5 RSP: 0000:ffffffff86203bf0 EFLAGS: 00010046 RAX: 1f9fa70771052a00 RBX: 0000000000000000 RCX: ffffffff8627e980 RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000000 RBP: 0000000000000000 R08: ffffffff813d8494 R09: 0000000000000000 R10: 0000000000000012 R11: ffffffff8627e980 R12: ffffffff8627e980 R13: ffffffff862e3420 R14: 00000000ffffffff R15: 0000000000000046 FS: 0000000000000000(0000) GS:ffff88881fa00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffff88883ffff000 CR3: 0000000006275001 CR4: 0000000000770ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 00000000 Call Trace: lock_is_held_type+0x4d/0x110 kernel/locking/lockdep.c:5043 lock_is_held include/linux/lockdep.h:266 [inline] rcu_read_lock_sched_held+0x41/0x90 kernel/rcu/update.c:136 trace_lock_acquire+0x49/0x120 include/trace/events/lock.h:13 lock_acquire+0x8d/0x2c0 kernel/locking/lockdep.c:5003 mutex_trylock+0xdd/0x120 kernel/locking/mutex.c:1420 exp_funnel_lock kernel/rcu/tree_exp.h:290 [inline] synchronize_rcu_expedited+0x19d/0x400 kernel/rcu/tree_exp.h:838 synchronize_rcu+0xa4/0xe0 kernel/rcu/tree.c:3574 rcu_test_sync_prims+0x5/0x10 kernel/rcu/update.c:234 rest_init+0xc/0x270 init/main.c:665 start_kernel+0x445/0x4bd init/main.c:1045 secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:243 irq event stamp: 6599 hardirqs last enabled at (6599): [] console_unlock+0x6fa/0x7d0 kernel/printk/printk.c:2530 hardirqs last disabled at (6598): [] console_unlock+0xf0/0x7d0 kernel/printk/printk.c:2441 softirqs last enabled at (5810): [] common_interrupt+0x1d/0x240 arch/x86/kernel/irq.c:239 softirqs last disabled at (5809): [] local_bh_disable+0x0/0x20 net/mptcp/protocol.c:1301 ---[ end trace f3de5b6e614e349e ]--- possible reason: unannotated irqs-off. irq event stamp: 6599 hardirqs last enabled at (6599): [] console_unlock+0x6fa/0x7d0 kernel/printk/printk.c:2530 hardirqs last disabled at (6598): [] console_unlock+0xf0/0x7d0 kernel/printk/printk.c:2441 softirqs last enabled at (5810): [] common_interrupt+0x1d/0x240 arch/x86/kernel/irq.c:239 softirqs last disabled at (5809): [] local_bh_disable+0x0/0x20 net/mptcp/protocol.c:1301 The .config is attached. Thanks, -- Marco