* [syzbot] possible deadlock in del_gendisk
@ 2021-04-13 17:33 syzbot
  2021-04-13 17:41 ` Steven Rostedt
  0 siblings, 1 reply; 10+ messages in thread
From: syzbot @ 2021-04-13 17:33 UTC (permalink / raw)
  To: akpm, axboe, bp, hpa, jmattson, joro, kvm, linux-block,
      linux-kernel, mark.rutland, masahiroy, mingo, pbonzini, peterz,
      rafael.j.wysocki, rostedt, seanjc, syzkaller-bugs, tglx, vkuznets,
      wanpengli, will, x86

Hello,

syzbot found the following issue on:

HEAD commit:    e99d8a84 Add linux-next specific files for 20210409
git tree:       linux-next
console output: https://syzkaller.appspot.com/x/log.txt?x=13b01681d00000
kernel config:  https://syzkaller.appspot.com/x/.config?x=7cd69574979bfeb7
dashboard link: https://syzkaller.appspot.com/bug?extid=61e04e51b7ac86930589
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=148265d9d00000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=16a981a1d00000

The issue was bisected to:

commit 997acaf6b4b59c6a9c259740312a69ea549cc684
Author: Mark Rutland <mark.rutland@arm.com>
Date:   Mon Jan 11 15:37:07 2021 +0000

    lockdep: report broken irq restoration

bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=16a7e77ed00000
final oops:     https://syzkaller.appspot.com/x/report.txt?x=15a7e77ed00000
console output: https://syzkaller.appspot.com/x/log.txt?x=11a7e77ed00000

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+61e04e51b7ac86930589@syzkaller.appspotmail.com
Fixes: 997acaf6b4b5 ("lockdep: report broken irq restoration")

======================================================
WARNING: possible circular locking dependency detected
5.12.0-rc6-next-20210409-syzkaller #0 Not tainted
------------------------------------------------------
syz-executor104/8440 is trying to acquire lock:
ffff888016e9dca0 (&bdev->bd_mutex){+.+.}-{3:3}, at: del_gendisk+0x250/0x9e0 block/genhd.c:618

but task is already holding lock:
ffffffff8c7d9430 (bdev_lookup_sem){++++}-{3:3}, at: del_gendisk+0x222/0x9e0 block/genhd.c:616

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #2 (bdev_lookup_sem){++++}-{3:3}:
       down_write+0x92/0x150 kernel/locking/rwsem.c:1406
       del_gendisk+0x222/0x9e0 block/genhd.c:616
       loop_remove drivers/block/loop.c:2191 [inline]
       loop_control_ioctl drivers/block/loop.c:2291 [inline]
       loop_control_ioctl+0x40d/0x4f0 drivers/block/loop.c:2251
       vfs_ioctl fs/ioctl.c:48 [inline]
       __do_sys_ioctl fs/ioctl.c:753 [inline]
       __se_sys_ioctl fs/ioctl.c:739 [inline]
       __x64_sys_ioctl+0x193/0x200 fs/ioctl.c:739
       do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
       entry_SYSCALL_64_after_hwframe+0x44/0xae

-> #1 (loop_ctl_mutex){+.+.}-{3:3}:
       __mutex_lock_common kernel/locking/mutex.c:949 [inline]
       __mutex_lock+0x139/0x1120 kernel/locking/mutex.c:1096
       lo_open+0x1a/0x130 drivers/block/loop.c:1890
       __blkdev_get+0x135/0xa30 fs/block_dev.c:1305
       blkdev_get_by_dev fs/block_dev.c:1457 [inline]
       blkdev_get_by_dev+0x26c/0x600 fs/block_dev.c:1425
       blkdev_open+0x154/0x2b0 fs/block_dev.c:1554
       do_dentry_open+0x4b9/0x11b0 fs/open.c:826
       do_open fs/namei.c:3361 [inline]
       path_openat+0x1c09/0x27d0 fs/namei.c:3494
       do_filp_open+0x190/0x3d0 fs/namei.c:3521
       do_sys_openat2+0x16d/0x420 fs/open.c:1187
       do_sys_open fs/open.c:1203 [inline]
       __do_sys_open fs/open.c:1211 [inline]
       __se_sys_open fs/open.c:1207 [inline]
       __x64_sys_open+0x119/0x1c0 fs/open.c:1207
       do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
       entry_SYSCALL_64_after_hwframe+0x44/0xae

-> #0 (&bdev->bd_mutex){+.+.}-{3:3}:
       check_prev_add kernel/locking/lockdep.c:2938 [inline]
       check_prevs_add kernel/locking/lockdep.c:3061 [inline]
       validate_chain kernel/locking/lockdep.c:3676 [inline]
       __lock_acquire+0x2a17/0x5230 kernel/locking/lockdep.c:4902
       lock_acquire kernel/locking/lockdep.c:5512 [inline]
       lock_acquire+0x1ab/0x740 kernel/locking/lockdep.c:5477
       __mutex_lock_common kernel/locking/mutex.c:949 [inline]
       __mutex_lock+0x139/0x1120 kernel/locking/mutex.c:1096
       del_gendisk+0x250/0x9e0 block/genhd.c:618
       loop_remove drivers/block/loop.c:2191 [inline]
       loop_control_ioctl drivers/block/loop.c:2291 [inline]
       loop_control_ioctl+0x40d/0x4f0 drivers/block/loop.c:2251
       vfs_ioctl fs/ioctl.c:48 [inline]
       __do_sys_ioctl fs/ioctl.c:753 [inline]
       __se_sys_ioctl fs/ioctl.c:739 [inline]
       __x64_sys_ioctl+0x193/0x200 fs/ioctl.c:739
       do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
       entry_SYSCALL_64_after_hwframe+0x44/0xae

other info that might help us debug this:

Chain exists of:
  &bdev->bd_mutex --> loop_ctl_mutex --> bdev_lookup_sem

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(bdev_lookup_sem);
                               lock(loop_ctl_mutex);
                               lock(bdev_lookup_sem);
  lock(&bdev->bd_mutex);

 *** DEADLOCK ***

2 locks held by syz-executor104/8440:
 #0: ffffffff8ca5f148 (loop_ctl_mutex){+.+.}-{3:3}, at: loop_control_ioctl+0x7b/0x4f0 drivers/block/loop.c:2257
 #1: ffffffff8c7d9430 (bdev_lookup_sem){++++}-{3:3}, at: del_gendisk+0x222/0x9e0 block/genhd.c:616

stack backtrace:
CPU: 1 PID: 8440 Comm: syz-executor104 Not tainted 5.12.0-rc6-next-20210409-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:79 [inline]
 dump_stack+0x141/0x1d7 lib/dump_stack.c:120
 check_noncircular+0x25f/0x2e0 kernel/locking/lockdep.c:2129
 check_prev_add kernel/locking/lockdep.c:2938 [inline]
 check_prevs_add kernel/locking/lockdep.c:3061 [inline]
 validate_chain kernel/locking/lockdep.c:3676 [inline]
 __lock_acquire+0x2a17/0x5230 kernel/locking/lockdep.c:4902
 lock_acquire kernel/locking/lockdep.c:5512 [inline]
 lock_acquire+0x1ab/0x740 kernel/locking/lockdep.c:5477
 __mutex_lock_common kernel/locking/mutex.c:949 [inline]
 __mutex_lock+0x139/0x1120 kernel/locking/mutex.c:1096
 del_gendisk+0x250/0x9e0 block/genhd.c:618
 loop_remove drivers/block/loop.c:2191 [inline]
 loop_control_ioctl drivers/block/loop.c:2291 [inline]
 loop_control_ioctl+0x40d/0x4f0 drivers/block/loop.c:2251
 vfs_ioctl fs/ioctl.c:48 [inline]
 __do_sys_ioctl fs/ioctl.c:753 [inline]
 __se_sys_ioctl fs/ioctl.c:739 [inline]
 __x64_sys_ioctl+0x193/0x200 fs/ioctl.c:739
 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x43ee49
Code: 28 c3 e8 2a 14 00 00 66 2e 0f 1f 84 00 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fff4d86c238 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000400488 RCX: 000000000043ee49
RDX: 0000000000000002 RSI: 0000000000004c81 RDI: 0000000000000003
RBP: 0000000000402e30 R08: 0000000000000000 R09: 0000000000400488
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000402ec0
R13: 0000000000000000 R14: 00000000004ac018 R15: 0000000000400488

---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
For information about bisection process see: https://goo.gl/tpsmEJ#bisection
syzbot can test patches for this issue, for details see:
https://goo.gl/tpsmEJ#testing-patches

^ permalink raw reply	[flat|nested] 10+ messages in thread
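The splat above is a three-lock inversion: the chain &bdev->bd_mutex --> loop_ctl_mutex --> bdev_lookup_sem already exists, and del_gendisk() closes the cycle by taking bd_mutex while holding bdev_lookup_sem. What lockdep reports here is, at its core, a cycle in the lock-ordering graph. A minimal editorial sketch of that idea follows (hypothetical and heavily simplified — real lockdep tracks lock classes at acquire time inside the kernel, not an edge list in Python):

```python
# Editorial sketch: lockdep's circular-dependency check is cycle detection
# over a lock-ordering graph. Each edge (a, b) means "lock b was acquired
# while lock a was held"; any cycle is a possible deadlock.

def find_cycle(edges):
    """Return a lock cycle as a list of lock names, or None."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, []).append(b)

    def dfs(node, path):
        if node in path:  # reached a lock already on the hold path: cycle
            return path[path.index(node):] + [node]
        for nxt in graph.get(node, []):
            cycle = dfs(nxt, path + [node])
            if cycle:
                return cycle
        return None

    for start in graph:
        cycle = dfs(start, [])
        if cycle:
            return cycle
    return None

# Dependency edges taken from the splat above:
edges = [
    ("&bdev->bd_mutex", "loop_ctl_mutex"),   # lo_open() under __blkdev_get()
    ("loop_ctl_mutex", "bdev_lookup_sem"),   # del_gendisk() from loop_control_ioctl()
    ("bdev_lookup_sem", "&bdev->bd_mutex"),  # bd_mutex taken inside del_gendisk()
]
print(find_cycle(edges))
```

Running the sketch on the three edges from this report yields the same cycle lockdep complains about, with all three locks on it.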
* Re: [syzbot] possible deadlock in del_gendisk
  2021-04-13 17:33 [syzbot] possible deadlock in del_gendisk syzbot
@ 2021-04-13 17:41 ` Steven Rostedt
  2021-04-13 17:43   ` Steven Rostedt
  0 siblings, 1 reply; 10+ messages in thread
From: Steven Rostedt @ 2021-04-13 17:41 UTC (permalink / raw)
  To: syzbot
  Cc: akpm, axboe, bp, hpa, jmattson, joro, kvm, linux-block,
      linux-kernel, mark.rutland, masahiroy, mingo, pbonzini, peterz,
      rafael.j.wysocki, seanjc, syzkaller-bugs, tglx, vkuznets,
      wanpengli, will, x86

On Tue, 13 Apr 2021 10:33:19 -0700
syzbot <syzbot+61e04e51b7ac86930589@syzkaller.appspotmail.com> wrote:

> Hello,
>
> syzbot found the following issue on:
>
> HEAD commit:    e99d8a84 Add linux-next specific files for 20210409
> git tree:       linux-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=13b01681d00000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=7cd69574979bfeb7
> dashboard link: https://syzkaller.appspot.com/bug?extid=61e04e51b7ac86930589
> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=148265d9d00000
> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=16a981a1d00000
>
> The issue was bisected to:
>
> commit 997acaf6b4b59c6a9c259740312a69ea549cc684
> Author: Mark Rutland <mark.rutland@arm.com>
> Date:   Mon Jan 11 15:37:07 2021 +0000
>
>     lockdep: report broken irq restoration

As the below splats look like they have nothing to do with this patch, and
this patch adds a WARN() if there's broken logic somewhere, I bet the
bisect got confused (if it is automated and uses panic_on_warn), because it
will panic for broken code that this patch detects.

That is, the bisect was confused because it was triggering on two different
issues: one that triggered the reported splat below, and another that this
commit detects and warns on. Hence, the bisect failed.
-- Steve

>
> bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=16a7e77ed00000
> final oops:     https://syzkaller.appspot.com/x/report.txt?x=15a7e77ed00000
> console output: https://syzkaller.appspot.com/x/log.txt?x=11a7e77ed00000
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+61e04e51b7ac86930589@syzkaller.appspotmail.com
> Fixes: 997acaf6b4b5 ("lockdep: report broken irq restoration")
>
> ======================================================
> WARNING: possible circular locking dependency detected
> 5.12.0-rc6-next-20210409-syzkaller #0 Not tainted
> ------------------------------------------------------
> syz-executor104/8440 is trying to acquire lock:
> ffff888016e9dca0 (&bdev->bd_mutex){+.+.}-{3:3}, at: del_gendisk+0x250/0x9e0 block/genhd.c:618
>
> but task is already holding lock:
> ffffffff8c7d9430 (bdev_lookup_sem){++++}-{3:3}, at: del_gendisk+0x222/0x9e0 block/genhd.c:616
>
> which lock already depends on the new lock.
>
> the existing dependency chain (in reverse order) is:
>
> -> #2 (bdev_lookup_sem){++++}-{3:3}:
>        down_write+0x92/0x150 kernel/locking/rwsem.c:1406
>        del_gendisk+0x222/0x9e0 block/genhd.c:616
>        loop_remove drivers/block/loop.c:2191 [inline]
>        loop_control_ioctl drivers/block/loop.c:2291 [inline]
>        loop_control_ioctl+0x40d/0x4f0 drivers/block/loop.c:2251
>        vfs_ioctl fs/ioctl.c:48 [inline]
>        __do_sys_ioctl fs/ioctl.c:753 [inline]
>        __se_sys_ioctl fs/ioctl.c:739 [inline]
>        __x64_sys_ioctl+0x193/0x200 fs/ioctl.c:739
>        do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
>        entry_SYSCALL_64_after_hwframe+0x44/0xae
>
> -> #1 (loop_ctl_mutex){+.+.}-{3:3}:
>        __mutex_lock_common kernel/locking/mutex.c:949 [inline]
>        __mutex_lock+0x139/0x1120 kernel/locking/mutex.c:1096
>        lo_open+0x1a/0x130 drivers/block/loop.c:1890
>        __blkdev_get+0x135/0xa30 fs/block_dev.c:1305
>        blkdev_get_by_dev fs/block_dev.c:1457 [inline]
>        blkdev_get_by_dev+0x26c/0x600 fs/block_dev.c:1425
>        blkdev_open+0x154/0x2b0 fs/block_dev.c:1554
>        do_dentry_open+0x4b9/0x11b0 fs/open.c:826
>        do_open fs/namei.c:3361 [inline]
>        path_openat+0x1c09/0x27d0 fs/namei.c:3494
>        do_filp_open+0x190/0x3d0 fs/namei.c:3521
>        do_sys_openat2+0x16d/0x420 fs/open.c:1187
>        do_sys_open fs/open.c:1203 [inline]
>        __do_sys_open fs/open.c:1211 [inline]
>        __se_sys_open fs/open.c:1207 [inline]
>        __x64_sys_open+0x119/0x1c0 fs/open.c:1207
>        do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
>        entry_SYSCALL_64_after_hwframe+0x44/0xae
>
> -> #0 (&bdev->bd_mutex){+.+.}-{3:3}:
>        check_prev_add kernel/locking/lockdep.c:2938 [inline]
>        check_prevs_add kernel/locking/lockdep.c:3061 [inline]
>        validate_chain kernel/locking/lockdep.c:3676 [inline]
>        __lock_acquire+0x2a17/0x5230 kernel/locking/lockdep.c:4902
>        lock_acquire kernel/locking/lockdep.c:5512 [inline]
>        lock_acquire+0x1ab/0x740 kernel/locking/lockdep.c:5477
>        __mutex_lock_common kernel/locking/mutex.c:949 [inline]
>        __mutex_lock+0x139/0x1120 kernel/locking/mutex.c:1096
>        del_gendisk+0x250/0x9e0 block/genhd.c:618
>        loop_remove drivers/block/loop.c:2191 [inline]
>        loop_control_ioctl drivers/block/loop.c:2291 [inline]
>        loop_control_ioctl+0x40d/0x4f0 drivers/block/loop.c:2251
>        vfs_ioctl fs/ioctl.c:48 [inline]
>        __do_sys_ioctl fs/ioctl.c:753 [inline]
>        __se_sys_ioctl fs/ioctl.c:739 [inline]
>        __x64_sys_ioctl+0x193/0x200 fs/ioctl.c:739
>        do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
>        entry_SYSCALL_64_after_hwframe+0x44/0xae
>
> other info that might help us debug this:
>
> Chain exists of:
>   &bdev->bd_mutex --> loop_ctl_mutex --> bdev_lookup_sem
>
>  Possible unsafe locking scenario:
>
>        CPU0                    CPU1
>        ----                    ----
>   lock(bdev_lookup_sem);
>                                lock(loop_ctl_mutex);
>                                lock(bdev_lookup_sem);
>   lock(&bdev->bd_mutex);
>
>  *** DEADLOCK ***
>
> 2 locks held by syz-executor104/8440:
>  #0: ffffffff8ca5f148 (loop_ctl_mutex){+.+.}-{3:3}, at: loop_control_ioctl+0x7b/0x4f0 drivers/block/loop.c:2257
>  #1: ffffffff8c7d9430 (bdev_lookup_sem){++++}-{3:3}, at: del_gendisk+0x222/0x9e0 block/genhd.c:616
>
> stack backtrace:
> CPU: 1 PID: 8440 Comm: syz-executor104 Not tainted 5.12.0-rc6-next-20210409-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> Call Trace:
>  __dump_stack lib/dump_stack.c:79 [inline]
>  dump_stack+0x141/0x1d7 lib/dump_stack.c:120
>  check_noncircular+0x25f/0x2e0 kernel/locking/lockdep.c:2129
>  check_prev_add kernel/locking/lockdep.c:2938 [inline]
>  check_prevs_add kernel/locking/lockdep.c:3061 [inline]
>  validate_chain kernel/locking/lockdep.c:3676 [inline]
>  __lock_acquire+0x2a17/0x5230 kernel/locking/lockdep.c:4902
>  lock_acquire kernel/locking/lockdep.c:5512 [inline]
>  lock_acquire+0x1ab/0x740 kernel/locking/lockdep.c:5477
>  __mutex_lock_common kernel/locking/mutex.c:949 [inline]
>  __mutex_lock+0x139/0x1120 kernel/locking/mutex.c:1096
>  del_gendisk+0x250/0x9e0 block/genhd.c:618
>  loop_remove drivers/block/loop.c:2191 [inline]
>  loop_control_ioctl drivers/block/loop.c:2291 [inline]
>  loop_control_ioctl+0x40d/0x4f0 drivers/block/loop.c:2251
>  vfs_ioctl fs/ioctl.c:48 [inline]
>  __do_sys_ioctl fs/ioctl.c:753 [inline]
>  __se_sys_ioctl fs/ioctl.c:739 [inline]
>  __x64_sys_ioctl+0x193/0x200 fs/ioctl.c:739
>  do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
>  entry_SYSCALL_64_after_hwframe+0x44/0xae
> RIP: 0033:0x43ee49
> Code: 28 c3 e8 2a 14 00 00 66 2e 0f 1f 84 00 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48
> RSP: 002b:00007fff4d86c238 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> RAX: ffffffffffffffda RBX: 0000000000400488 RCX: 000000000043ee49
> RDX: 0000000000000002 RSI: 0000000000004c81 RDI: 0000000000000003
> RBP: 0000000000402e30 R08: 0000000000000000 R09: 0000000000400488
> R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000402ec0
> R13: 0000000000000000 R14: 00000000004ac018 R15: 0000000000400488
>
>
> ---
> This report is generated by a bot. It may contain errors.
> See https://goo.gl/tpsmEJ for more information about syzbot.
> syzbot engineers can be reached at syzkaller@googlegroups.com.
>
> syzbot will keep track of this issue. See:
> https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
> For information about bisection process see: https://goo.gl/tpsmEJ#bisection
> syzbot can test patches for this issue, for details see:
> https://goo.gl/tpsmEJ#testing-patches

^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: [syzbot] possible deadlock in del_gendisk
  2021-04-13 17:41 ` Steven Rostedt
@ 2021-04-13 17:43   ` Steven Rostedt
  2021-04-13 18:24     ` Dmitry Vyukov
  0 siblings, 1 reply; 10+ messages in thread
From: Steven Rostedt @ 2021-04-13 17:43 UTC (permalink / raw)
  To: syzbot
  Cc: akpm, axboe, bp, hpa, jmattson, joro, kvm, linux-block,
      linux-kernel, mark.rutland, masahiroy, mingo, pbonzini, peterz,
      rafael.j.wysocki, seanjc, syzkaller-bugs, tglx, vkuznets,
      wanpengli, will, x86

On Tue, 13 Apr 2021 13:41:47 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> As the below splats look like they have nothing to do with this patch, and
> this patch adds a WARN() if there's broken logic somewhere, I bet the
> bisect got confused (if it is automated and uses panic_on_warn), because it
> will panic for broken code that this patch detects.
>
> That is, the bisect was confused because it was triggering on two different
> issues: one that triggered the reported splat below, and another that this
> commit detects and warns on.

Is it possible to update the bisect so that, if it is failing on warnings,
it makes sure the warnings are somewhat related before deciding that it's
the same bug?

-- Steve

^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: [syzbot] possible deadlock in del_gendisk
  2021-04-13 17:43 ` Steven Rostedt
@ 2021-04-13 18:24   ` Dmitry Vyukov
  2021-04-13 18:40     ` Steven Rostedt
  0 siblings, 1 reply; 10+ messages in thread
From: Dmitry Vyukov @ 2021-04-13 18:24 UTC (permalink / raw)
  To: Steven Rostedt, syzkaller
  Cc: syzbot, Andrew Morton, Jens Axboe, Borislav Petkov, H. Peter Anvin,
      Jim Mattson, Joerg Roedel, KVM list, linux-block, LKML,
      Mark Rutland, masahiroy, Ingo Molnar, Paolo Bonzini,
      Peter Zijlstra, rafael.j.wysocki, Sean Christopherson,
      syzkaller-bugs, Thomas Gleixner, Vitaly Kuznetsov, Wanpeng Li,
      Will Deacon, the arch/x86 maintainers

On Tue, Apr 13, 2021 at 7:43 PM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Tue, 13 Apr 2021 13:41:47 -0400
> Steven Rostedt <rostedt@goodmis.org> wrote:
>
> > As the below splats look like they have nothing to do with this patch, and
> > this patch adds a WARN() if there's broken logic somewhere, I bet the
> > bisect got confused (if it is automated and uses panic_on_warn), because
> > it will panic for broken code that this patch detects.
> >
> > That is, the bisect was confused because it was triggering on two different
> > issues: one that triggered the reported splat below, and another that this
> > commit detects and warns on.
>
> Is it possible to update the bisect so that, if it is failing on warnings,
> it makes sure the warnings are somewhat related before deciding that it's
> the same bug?

It does not seem to be feasible: bugs manifest differently in both
space and time. Also, even if we somehow conclude that the crash we see
is different, that says nothing about the original bug. For more data see:
https://groups.google.com/g/syzkaller/c/sR8aAXaWEF4/m/tTWYRgvmAwAJ

^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: [syzbot] possible deadlock in del_gendisk
  2021-04-13 18:24 ` Dmitry Vyukov
@ 2021-04-13 18:40   ` Steven Rostedt
  2021-04-13 18:43     ` Steven Rostedt
  0 siblings, 1 reply; 10+ messages in thread
From: Steven Rostedt @ 2021-04-13 18:40 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: syzkaller, syzbot, Andrew Morton, Jens Axboe, Borislav Petkov,
      H. Peter Anvin, Jim Mattson, Joerg Roedel, KVM list, linux-block,
      LKML, Mark Rutland, masahiroy, Ingo Molnar, Paolo Bonzini,
      Peter Zijlstra, rafael.j.wysocki, Sean Christopherson,
      syzkaller-bugs, Thomas Gleixner, Vitaly Kuznetsov, Wanpeng Li,
      Will Deacon, the arch/x86 maintainers

On Tue, 13 Apr 2021 20:24:00 +0200
Dmitry Vyukov <dvyukov@google.com> wrote:

> On Tue, Apr 13, 2021 at 7:43 PM Steven Rostedt <rostedt@goodmis.org> wrote:
> >
> > On Tue, 13 Apr 2021 13:41:47 -0400
> > Steven Rostedt <rostedt@goodmis.org> wrote:
> >
> > > As the below splats look like they have nothing to do with this patch, and
> > > this patch adds a WARN() if there's broken logic somewhere, I bet the
> > > bisect got confused (if it is automated and uses panic_on_warn),
> > > because it will panic for broken code that this patch detects.
> > >
> > > That is, the bisect was confused because it was triggering on two different
> > > issues: one that triggered the reported splat below, and another that this
> > > commit detects and warns on.
> >
> > Is it possible to update the bisect so that, if it is failing on warnings,
> > it makes sure the warnings are somewhat related before deciding that it's
> > the same bug?
>
> It does not seem to be feasible: bugs manifest differently in both
> space and time. Also, even if we somehow conclude that the crash we see
> is different, that says nothing about the original bug. For more data see:
> https://groups.google.com/g/syzkaller/c/sR8aAXaWEF4/m/tTWYRgvmAwAJ

Sure, but if you trigger a lockdep bug, the lockdep splats are usually very
similar across all instances of the same bug.
It will usually include the same locks, or at least be the same type of
lockdep bug (where some types are related). If the bisect saw an issue
before and at this commit, I would think it didn't trigger a lockdep bug at
all, and simply triggered a warning. In fact, according to:

  https://syzkaller.appspot.com/x/report.txt?x=15a7e77ed00000

that's exactly what it did. That is, if the original bug is a lockdep
warning, you should only be interested in lockdep warnings (at a minimum).
The final bug here had:

------------[ cut here ]------------
raw_local_irq_restore() called with IRQs enabled
WARNING: CPU: 0 PID: 8777 at kernel/locking/irqflag-debug.c:9 warn_bogus_irq_restore kernel/locking/irqflag-debug.c:9 [inline]
WARNING: CPU: 0 PID: 8777 at kernel/locking/irqflag-debug.c:9 warn_bogus_irq_restore+0x1d/0x20 kernel/locking/irqflag-debug.c:7
Modules linked in:
CPU: 0 PID: 8777 Comm: syz-executor.1 Not tainted 5.11.0-rc2-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:warn_bogus_irq_restore kernel/locking/irqflag-debug.c:9 [inline]
RIP: 0010:warn_bogus_irq_restore+0x1d/0x20 kernel/locking/irqflag-debug.c:7
Code: 51 00 e9 3f fe ff ff cc cc cc cc cc cc 80 3d e0 b4 ce 0a 00 74 01 c3 48 c7 c7 60 f5 8a 88 c6 05 cf b4 ce 0a 01 e8 17 01 a4 06 <0f> 0b c3 48 c7 c0 a0 46 4d 8e 53 48 89 fb 48 ba 00 00 00 00 00 fc
RSP: 0018:ffffc900017bf9f8 EFLAGS: 00010286
RAX: 0000000000000000 RBX: ffff8881477b2040 RCX: 0000000000000000
RDX: 0000000000000002 RSI: 0000000000000004 RDI: fffff520002f7f31
RBP: 0000000000000246 R08: 0000000000000001 R09: ffff8880b9e2015b
R10: ffffed10173c402b R11: 0000000000000001 R12: 0000000000000003
R13: ffffed1028ef6408 R14: 0000000000000001 R15: ffff8880b9e359c0
FS:  00000000017b8400(0000) GS:ffff8880b9e00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000017c1848 CR3: 0000000010920000 CR4: 00000000001506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 kvm_wait arch/x86/kernel/kvm.c:860 [inline]
 kvm_wait+0xc3/0xe0 arch/x86/kernel/kvm.c:837
 pv_wait arch/x86/include/asm/paravirt.h:564 [inline]
 pv_wait_head_or_lock kernel/locking/qspinlock_paravirt.h:470 [inline]
 __pv_queued_spin_lock_slowpath+0x8b8/0xb40 kernel/locking/qspinlock.c:508
 pv_queued_spin_lock_slowpath arch/x86/include/asm/paravirt.h:554 [inline]
 queued_spin_lock_slowpath arch/x86/include/asm/qspinlock.h:51 [inline]
 queued_spin_lock include/asm-generic/qspinlock.h:85 [inline]
 do_raw_spin_lock+0x200/0x2b0 kernel/locking/spinlock_debug.c:113
 spin_lock include/linux/spinlock.h:354 [inline]
 ext4_lock_group fs/ext4/ext4.h:3379 [inline]
 __ext4_new_inode+0x2da2/0x44d0 fs/ext4/ialloc.c:1187
 ext4_mkdir+0x298/0x910 fs/ext4/namei.c:2793
 vfs_mkdir+0x413/0x660 fs/namei.c:3652
 do_mkdirat+0x1eb/0x250 fs/namei.c:3675
 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x465567
Code: 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 b8 53 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffe00ad69f8 EFLAGS: 00000202 ORIG_RAX: 0000000000000053
RAX: ffffffffffffffda RBX: 00007ffe00ad6a90 RCX: 0000000000465567
RDX: 0000000000000000 RSI: 00000000000001ff RDI: 00007ffe00ad6a90
RBP: 00007ffe00ad6a6c R08: 0000000000000000 R09: 0000000000000006
R10: 00007ffe00ad6794 R11: 0000000000000202 R12: 0000000000000032
R13: 000000000006549a R14: 0000000000000002 R15: 00007ffe00ad6ad0

Which shows a WARN was triggered that is totally unrelated to what you were
bisecting. Just matching types of warnings (lockdep to lockdep, or WARN_ON
to WARN_ON) would help reduce the number of bogus bisects you are having.

-- Steve

^ permalink raw reply	[flat|nested] 10+ messages in thread
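The check suggested above — only count a crash during bisection if it is at least the same kind of report as the original — could look roughly like this. This is a hypothetical sketch, not syzkaller's actual report parser; the patterns are taken from the splats quoted in this thread:

```python
# Hypothetical sketch: classify a raw kernel report into a coarse bug type
# so a bisection step can reject crashes of a different type.
# Ordering matters: the specific irq-restore message must be tried before
# the generic "WARNING:" prefix, since a WARN() splat contains both.
BUG_TYPES = [
    ("raw_local_irq_restore() called with IRQs enabled", "bogus-irq-restore"),
    ("possible circular locking dependency", "lockdep-circular"),
    ("inconsistent lock state", "lockdep-state"),
    ("KASAN:", "kasan"),
    ("kernel BUG at", "bug-on"),
    ("WARNING:", "warn"),
]

def classify(report):
    """Map a raw report to a coarse bug type, or "unknown"."""
    for needle, kind in BUG_TYPES:
        if needle in report:
            return kind
    return "unknown"

def same_bug_type(original, candidate):
    """A bisection crash only "counts" if its type matches the original."""
    return classify(original) == classify(candidate)
```

With such a filter, the lockdep splat being bisected here ("possible circular locking dependency") would not be conflated with the raw_local_irq_restore() WARN() that the bisected commit merely detects.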
* Re: [syzbot] possible deadlock in del_gendisk
  2021-04-13 18:40 ` Steven Rostedt
@ 2021-04-13 18:43   ` Steven Rostedt
  2021-04-16  7:51     ` Bisections with different bug manifestations Dmitry Vyukov
  0 siblings, 1 reply; 10+ messages in thread
From: Steven Rostedt @ 2021-04-13 18:43 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: syzkaller, syzbot, Andrew Morton, Jens Axboe, Borislav Petkov,
      H. Peter Anvin, Jim Mattson, Joerg Roedel, KVM list, linux-block,
      LKML, Mark Rutland, masahiroy, Ingo Molnar, Paolo Bonzini,
      Peter Zijlstra, rafael.j.wysocki, Sean Christopherson,
      syzkaller-bugs, Thomas Gleixner, Vitaly Kuznetsov, Wanpeng Li,
      Will Deacon, the arch/x86 maintainers

On Tue, 13 Apr 2021 14:40:09 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> ------------[ cut here ]------------
> raw_local_irq_restore() called with IRQs enabled
> WARNING: CPU: 0 PID: 8777 at kernel/locking/irqflag-debug.c:9 warn_bogus_irq_restore kernel/locking/irqflag-debug.c:9 [inline]
> WARNING: CPU: 0 PID: 8777 at kernel/locking/irqflag-debug.c:9 warn_bogus_irq_restore+0x1d/0x20 kernel/locking/irqflag-debug.c:7

In fact, when you have the above, which is a WARN() with the text:

  "raw_local_irq_restore() called with IRQs enabled"

it is pretty much guaranteed that all triggers of this bug will have the
above warning with the same text.

-- Steve

^ permalink raw reply	[flat|nested] 10+ messages in thread
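For WARN() splats specifically, the stable identifier Steve points at is the free-text message printed on the line just before the "WARNING: CPU: ..." line. A small sketch of extracting it, so that two WARN() reports can be compared by message rather than by stack contents (a hypothetical helper, not existing syzkaller code):

```python
def warn_message(report):
    """Return the free-text WARN() message of a splat: the line
    immediately preceding the first "WARNING: CPU:" line, if any."""
    lines = report.splitlines()
    for i, line in enumerate(lines):
        if line.startswith("WARNING: CPU:") and i > 0:
            return lines[i - 1].strip()
    return None
```

Applied to the splat quoted above, this would yield "raw_local_irq_restore() called with IRQs enabled", which is the string that stays constant across every trigger of that particular WARN().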
* Bisections with different bug manifestations
  2021-04-13 18:43 ` Steven Rostedt
@ 2021-04-16  7:51   ` Dmitry Vyukov
  2021-04-16 13:13     ` Steven Rostedt
  0 siblings, 1 reply; 10+ messages in thread
From: Dmitry Vyukov @ 2021-04-16  7:51 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: syzkaller, syzbot, Andrew Morton, Jens Axboe, Borislav Petkov,
      H. Peter Anvin, Jim Mattson, Joerg Roedel, KVM list, linux-block,
      LKML, Mark Rutland, masahiroy, Ingo Molnar, Paolo Bonzini,
      Peter Zijlstra, rafael.j.wysocki, Sean Christopherson,
      syzkaller-bugs, Thomas Gleixner, Vitaly Kuznetsov, Wanpeng Li,
      Will Deacon, the arch/x86 maintainers

On Tue, Apr 13, 2021 at 8:43 PM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Tue, 13 Apr 2021 14:40:09 -0400
> Steven Rostedt <rostedt@goodmis.org> wrote:
>
> > ------------[ cut here ]------------
> > raw_local_irq_restore() called with IRQs enabled
> > WARNING: CPU: 0 PID: 8777 at kernel/locking/irqflag-debug.c:9 warn_bogus_irq_restore kernel/locking/irqflag-debug.c:9 [inline]
> > WARNING: CPU: 0 PID: 8777 at kernel/locking/irqflag-debug.c:9 warn_bogus_irq_restore+0x1d/0x20 kernel/locking/irqflag-debug.c:7
>
> In fact, when you have the above, which is a WARN() with the text:
>
>   "raw_local_irq_restore() called with IRQs enabled"
>
> it is pretty much guaranteed that all triggers of this bug will have the
> above warning with the same text.

This looks nice on paper, when looking at only one bisection log. But
unfortunately practice is different... This was discussed at length
multiple times:
https://groups.google.com/g/syzkaller/search?q=bisection+different+manifestations
https://groups.google.com/g/syzkaller-bugs/c/nFeC8-UG1gg/m/y6gUEsvAAgAJ
https://groups.google.com/g/syzkaller/c/sR8aAXaWEF4/m/tTWYRgvmAwAJ
https://groups.google.com/g/syzkaller/c/9NdprHsGBqo/m/Yj9uWRDgBQAJ

If you look at a substantial base of bisection logs, you will find lots of
cases where bug types and functions don't match. The kernel crashes
differently even on the same revision, and obviously things change if you
change revisions. Also, if you see what is presumably a different bug, what
does that say regarding the original bug?

I would very much like to improve automatic bisection quality, but it does
not look trivial at all.

Some random examples where, say, your hypothesis of WARN-to-WARN,
BUG-to-BUG does not hold even on the same kernel revision (add to this
different revisions and the fact that a different bug does not give info
regarding the original bug):

run #0: crashed: KASAN: use-after-free Read in fuse_dev_do_read
run #1: crashed: WARNING in request_end
run #2: crashed: KASAN: use-after-free Read in fuse_dev_do_read
run #3: OK
run #4: OK

run #0: crashed: KASAN: slab-out-of-bounds Read in __ip_append_data
run #1: crashed: inconsistent lock state in rhashtable_walk_enter
run #2: crashed: inconsistent lock state in rhashtable_walk_enter
run #3: crashed: inconsistent lock state in rhashtable_walk_enter
run #4: crashed: inconsistent lock state in rhashtable_walk_enter
run #5: crashed: inconsistent lock state in rhashtable_walk_enter
run #6: crashed: inconsistent lock state in rhashtable_walk_enter
run #7: crashed: kernel BUG at arch/x86/mm/physaddr.c:LINE!
run #8: crashed: inconsistent lock state in rhashtable_walk_enter
run #9: crashed: inconsistent lock state in rhashtable_walk_enter

run #0: crashed: kernel BUG at arch/x86/mm/physaddr.c:LINE!
run #1: crashed: inconsistent lock state in rhashtable_walk_enter
run #2: crashed: inconsistent lock state in rhashtable_walk_enter
run #3: crashed: inconsistent lock state in rhashtable_walk_enter
run #4: crashed: inconsistent lock state in rhashtable_walk_enter
run #5: crashed: kernel BUG at arch/x86/mm/physaddr.c:LINE!
run #6: crashed: inconsistent lock state in rhashtable_walk_enter
run #7: crashed: kernel BUG at arch/x86/mm/physaddr.c:LINE!
run #8: crashed: inconsistent lock state in rhashtable_walk_enter
run #9: crashed: inconsistent lock state in rhashtable_walk_enter

run #0: crashed: KASAN: use-after-free Read in __vb2_perform_fileio
run #1: crashed: KASAN: use-after-free Write in __vb2_cleanup_fileio
run #2: crashed: KASAN: use-after-free Read in __vb2_perform_fileio
run #3: crashed: KASAN: use-after-free Read in __vb2_perform_fileio
run #4: crashed: INFO: task hung in vivid_stop_generating_vid_cap
run #5: crashed: INFO: task hung in vivid_stop_generating_vid_cap
run #6: crashed: INFO: task hung in vivid_stop_generating_vid_cap
run #7: crashed: INFO: task hung in vivid_stop_generating_vid_cap
run #8: crashed: INFO: task hung in vivid_stop_generating_vid_cap
run #9: crashed: INFO: task hung in vivid_stop_generating_vid_cap

run #0: crashed: general protection fault in sctp_assoc_rwnd_increase
run #1: crashed: general protection fault in sctp_ulpevent_free
run #2: crashed: general protection fault in sctp_assoc_rwnd_increase
run #3: crashed: general protection fault in sctp_assoc_rwnd_increase
run #4: crashed: general protection fault in sctp_assoc_rwnd_increase
run #5: crashed: general protection fault in sctp_assoc_rwnd_increase
run #6: crashed: general protection fault in sctp_assoc_rwnd_increase
run #7: crashed: general protection fault in sctp_assoc_rwnd_increase

run #0: crashed: general protection fault in sctp_assoc_rwnd_increase
run #1: crashed: general protection fault in sctp_assoc_rwnd_increase
run #2: crashed: general protection fault in sctp_assoc_rwnd_increase
run #3: crashed: general protection fault in sctp_assoc_rwnd_increase
run #4: crashed: general protection fault in corrupted
run #5: crashed: general protection fault in sctp_assoc_rwnd_increase
run #6: crashed: general protection fault in sctp_assoc_rwnd_increase
run #7: crashed: general protection fault in corrupted

run #0: crashed: INFO: rcu detected stall in corrupted
run #1: crashed: INFO: rcu detected stall in ext4_file_write_iter
run #2: crashed: INFO: rcu detected stall in sys_sendfile64
run #3: crashed: INFO: rcu detected stall in corrupted
run #4: crashed: INFO: rcu detected stall in ext4_file_write_iter
run #5: crashed: INFO: rcu detected stall in corrupted
run #6: crashed: INFO: rcu detected stall in corrupted
run #7: crashed: INFO: rcu detected stall in ext4_file_write_iter
run #8: crashed: INFO: rcu detected stall in sys_sendfile64
run #9: crashed: INFO: rcu detected stall in ext4_file_write_iter

run #0: crashed: INFO: rcu detected stall in sys_sendfile64
run #1: crashed: INFO: rcu detected stall in corrupted
run #2: crashed: INFO: rcu detected stall in corrupted
run #3: crashed: INFO: rcu detected stall in sys_sendfile64
run #4: crashed: INFO: rcu detected stall in corrupted
run #5: crashed: INFO: rcu detected stall in corrupted
run #6: crashed: INFO: rcu detected stall in corrupted
run #7: crashed: INFO: rcu detected stall in corrupted
run #8: crashed: INFO: rcu detected stall in sys_sendfile64
run #9: crashed: INFO: rcu detected stall in corrupted

run #0: crashed: INFO: rcu detected stall in rw_verify_area
run #1: crashed: INFO: rcu detected stall in ext4_file_write_iter
run #2: crashed: INFO: rcu detected stall in corrupted
run #3: crashed: INFO: rcu detected stall in corrupted
run #4: crashed: INFO: rcu detected stall in ext4_file_write_iter
run #5: crashed: INFO: rcu detected stall in ext4_file_write_iter
run #6: crashed: INFO: rcu detected stall in corrupted
run #7: crashed: INFO: rcu detected stall in ext4_file_write_iter
run #8: crashed: INFO: rcu detected stall in ext4_file_write_iter
run #9: crashed: INFO: rcu detected stall in rw_verify_area

run #0: crashed: INFO: rcu detected stall in ext4_file_write_iter
run #1: crashed: INFO: rcu detected stall in corrupted
run #2: crashed: INFO: rcu detected stall in sys_sendfile64
run #3: crashed: INFO: rcu detected stall in sys_sendfile64
run #4: crashed: INFO: rcu detected stall in corrupted
run #5: crashed: INFO: rcu detected stall in sys_sendfile64
run #6: crashed: INFO: rcu detected stall in sys_sendfile64
run #7: crashed: INFO: rcu detected stall in corrupted
run #8: crashed: INFO: rcu detected stall in corrupted
run #9: crashed: INFO: rcu detected stall in sys_sendfile64

run #0: crashed: KASAN: use-after-free Read in link_path_walk
run #1: crashed: KASAN: use-after-free Read in link_path_walk
run #2: crashed: KASAN: use-after-free Read in trailing_symlink
run #3: crashed: KASAN: use-after-free Read in trailing_symlink
run #4: crashed: KASAN: use-after-free Read in trailing_symlink
run #5: crashed: KASAN: use-after-free Read in link_path_walk
run #6: crashed: KASAN: use-after-free Read in link_path_walk
run #7: crashed: KASAN: use-after-free Read in link_path_walk
run #8: crashed: KASAN: use-after-free Read in trailing_symlink
run #9: crashed: KASAN: use-after-free Read in trailing_symlink

run #0: crashed: BUG: unable to handle kernel NULL pointer dereference in mrvl_setup
run #1: crashed: BUG: unable to handle kernel NULL pointer dereference in mrvl_setup
run #2: crashed: BUG: unable to handle kernel NULL pointer dereference in mrvl_setup
run #3: crashed: BUG: unable to handle kernel NULL pointer dereference in mrvl_setup
run #4: crashed: WARNING: ODEBUG bug in corrupted
run #5: crashed: BUG: unable to handle kernel NULL pointer dereference in mrvl_setup
run #6: crashed: BUG: unable to handle kernel NULL pointer dereference in mrvl_setup
run #7: crashed: BUG: unable to handle kernel NULL pointer dereference in mrvl_setup
run #8: crashed: BUG: unable to handle kernel NULL pointer dereference in mrvl_setup
run #9: crashed: BUG: unable to handle kernel NULL pointer dereference in mrvl_setup

run #0: crashed: KASAN: use-after-free Read in delayed_uprobe_remove
run #1: crashed: KASAN: use-after-free Read in delayed_uprobe_remove
run #2: crashed: general protection fault in delayed_uprobe_remove
run #3: crashed: KASAN: use-after-free Read in delayed_uprobe_remove
run #4:
crashed: general protection fault in delayed_uprobe_remove run #5: crashed: KASAN: use-after-free Read in delayed_uprobe_remove run #6: OK run #7: OK run #8: OK run #9: OK run #0: crashed: general protection fault in delayed_uprobe_remove run #1: crashed: KASAN: use-after-free Read in delayed_uprobe_remove run #2: crashed: KASAN: use-after-free Read in delayed_uprobe_remove run #3: crashed: KASAN: use-after-free Read in delayed_uprobe_remove run #4: crashed: general protection fault in delayed_uprobe_remove run #5: crashed: KASAN: use-after-free Read in delayed_uprobe_remove run #6: OK run #7: OK run #8: OK run #9: OK ^ permalink raw reply [flat|nested] 10+ messages in thread
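[Editor's note] Logs like the ones above are easier to compare once each per-run crash title is collapsed to its bug-type prefix and tallied. The following is a rough sketch of such a tally, not syzkaller's actual code; the "strip the trailing ` in <function>` part" heuristic and all names are assumptions made for illustration:

```python
import re
from collections import Counter

# Rough illustrative sketch (not syzkaller code): tally the distinct
# crash titles produced by repeated runs of one reproducer, as in the
# logs above. Dropping the " in <function>" suffix is an assumed
# heuristic, since function names vary across runs and revisions.

def bug_type(title):
    """Drop the trailing ' in <function>' part, keeping the bug-type prefix."""
    return re.sub(r" in \S+$", "", title.strip())

def summarize(runs):
    """runs: list of crash titles, or None for a run that finished OK."""
    return Counter(bug_type(t) if t else "OK" for t in runs)

runs = [
    "KASAN: use-after-free Read in delayed_uprobe_remove",
    "general protection fault in delayed_uprobe_remove",
    "KASAN: use-after-free Read in delayed_uprobe_remove",
    None, None,
]
print(summarize(runs))
```

On the delayed_uprobe_remove log above, this would report two distinct crash manifestations plus the OK runs, which is exactly the kind of ambiguity the thread below discusses.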
* Re: Bisections with different bug manifestations
  2021-04-16  7:51 ` Bisections with different bug manifestations Dmitry Vyukov
@ 2021-04-16 13:13   ` Steven Rostedt
  2021-04-16 13:26     ` Dmitry Vyukov
  0 siblings, 1 reply; 10+ messages in thread
From: Steven Rostedt @ 2021-04-16 13:13 UTC (permalink / raw)
To: Dmitry Vyukov
Cc: syzkaller, syzbot, Andrew Morton, Jens Axboe, Borislav Petkov,
  H. Peter Anvin, Jim Mattson, Joerg Roedel, KVM list, linux-block,
  LKML, Mark Rutland, masahiroy, Ingo Molnar, Paolo Bonzini,
  Peter Zijlstra, rafael.j.wysocki, Sean Christopherson,
  syzkaller-bugs, Thomas Gleixner, Vitaly Kuznetsov, Wanpeng Li,
  Will Deacon, the arch/x86 maintainers

On Fri, 16 Apr 2021 09:51:45 +0200
Dmitry Vyukov <dvyukov@google.com> wrote:

>
> If you look at substantial base of bisection logs, you will find lots
> of cases where bug types, functions don't match. Kernel crashes
> differently even on the same revision. And obviously things change if
> you change revisions. Also if you see presumably a different bug, what
> does it say regarding the original bug.

Yes, but there are also several types of cases where the issue will be
the same. Namely lockdep. I agree that use after free warnings can have
a side effect, and may be more difficult. But there's many other bugs
that remain consistent across kernels. And if you stumble on one of
them, look for it only. And if you hit another bug, and if it doesn't
crash, then ignore it (of course this could be an issue if you have
panic on warning set). But otherwise, just skip it.

>
> I would very much like to improve automatic bisection quality, but it
> does not look trivial at all.
>
> Some random examples where, say, your hypothesis of WARN-to-WARN,
> BUG-to-BUG does not hold even on the same kernel revision (add to this

At least lockdep to lockdep, as when I do manual bisects, that's exactly
what I look for, and ignore all other warnings. And that has found the
problem commit pretty much every time.
> different revisions and the fact that a different bug does not give
> info regarding the original bug):
>

Can you tell me that all these examples bisected to the commit that
caused the bug? Because if it did not, then you may have just proved my
point ;-)

> run #0: crashed: KASAN: use-after-free Read in fuse_dev_do_read
> run #1: crashed: WARNING in request_end
> run #2: crashed: KASAN: use-after-free Read in fuse_dev_do_read
> run #3: OK
> run #4: OK
> [..]

-- Steve
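[Editor's note] The strategy Steve describes — during a manual bisect, count only the original bug class as a "bad" signal and deliberately ignore every other crash — can be sketched as a per-run classifier. This is a hypothetical illustration, not syzkaller's or anyone's actual code; the pattern list and all names are invented:

```python
import re

# Hypothetical sketch of the strategy described above: during a
# bisection step, a run counts as "bad" only if it reproduces the
# original bug class (here, a lockdep report); unrelated crashes are
# deliberately ignored. Patterns and names are invented for illustration.

LOCKDEP_PATTERNS = (
    r"possible circular locking dependency",
    r"possible deadlock",
    r"inconsistent lock state",
)

def classify_run(console_output):
    """'bad' if the original bug class reproduced, 'good' for a clean
    run, 'skip' for any unrelated crash."""
    if not console_output.strip():
        return "good"
    if any(re.search(p, console_output) for p in LOCKDEP_PATTERNS):
        return "bad"
    # A stray WARN or KASAN report: don't let it steer the bisection.
    return "skip"

print(classify_run("WARNING: possible circular locking dependency detected"))  # bad
print(classify_run(""))                                                        # good
print(classify_run("KASAN: use-after-free Read in fuse_dev_do_read"))          # skip
```

A "skip" verdict corresponds roughly to `git bisect skip` for that revision; the rest of the thread discusses why this heuristic is harder to automate than it looks.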
* Re: Bisections with different bug manifestations
  2021-04-16 13:13 ` Steven Rostedt
@ 2021-04-16 13:26   ` Dmitry Vyukov
  2021-04-16 13:48     ` Dmitry Vyukov
  0 siblings, 1 reply; 10+ messages in thread
From: Dmitry Vyukov @ 2021-04-16 13:26 UTC (permalink / raw)
To: Steven Rostedt
Cc: syzkaller, syzbot, Andrew Morton, Jens Axboe, Borislav Petkov,
  H. Peter Anvin, Jim Mattson, Joerg Roedel, KVM list, linux-block,
  LKML, Mark Rutland, masahiroy, Ingo Molnar, Paolo Bonzini,
  Peter Zijlstra, rafael.j.wysocki, Sean Christopherson,
  syzkaller-bugs, Thomas Gleixner, Vitaly Kuznetsov, Wanpeng Li,
  Will Deacon, the arch/x86 maintainers

On Fri, Apr 16, 2021 at 3:13 PM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Fri, 16 Apr 2021 09:51:45 +0200
> Dmitry Vyukov <dvyukov@google.com> wrote:
> >
> > If you look at substantial base of bisection logs, you will find lots
> > of cases where bug types, functions don't match. Kernel crashes
> > differently even on the same revision. And obviously things change if
> > you change revisions. Also if you see presumably a different bug, what
> > does it say regarding the original bug.
>
> Yes, but there are also several types of cases where the issue will be the
> same. Namely lockdep. I agree that use after free warnings can have a side
> effect, and may be more difficult.

But how do we know it's lockdep, rather than a use-after-free
manifested as lockdep?
A significant portion of kernel bugs are caused by concurrency and can
manifest in different ways, e.g. these are not lockdep, or WARN, or
use-after-free, but rather a race in nature.

> But there's many other bugs that remain
> consistent across kernels. And if you stumble on one of them, look for it
> only.

For example? Does not look to be true for WARN, BUG, KASAN,
"inconsistent lock state".

> And if you hit another bug, and if it doesn't crash, then ignore it (of
> course this could be an issue if you have panic on warning set). But
> otherwise, just skip it.
It's not possible to skip, say, BUG.
And if we skip, say, a use-after-free, how do we know we are not
making things worse? Because now we are running on corrupted memory,
so anything can happen. Definitely a stray lockdep report can happen,
or other way around not happen when it should...

> > I would very much like to improve automatic bisection quality, but it
> > does not look trivial at all.
> >
> > Some random examples where, say, your hypothesis of WARN-to-WARN,
> > BUG-to-BUG does not hold even on the same kernel revision (add to this
>
> At least lockdep to lockdep, as when I do manual bisects, that's exactly
> what I look for, and ignore all other warnings. And that has found the
> problem commit pretty much every time.

What lockdep bug types do you mean? All?
In the examples above you can see at least "inconsistent lock state"
mixed with 2 other completely different bug types.

> > different revisions and the fact that a different bug does not give
> > info regarding the original bug):
> >
>
> Can you tell me that all these examples bisected to the commit that caused
> the bug? Because if it did not, then you may have just proved my point ;-)

I don't know now what was the result, but for a single run these were
manifestations of the same root bug.
E.g. see below: that's UAF in fuse_dev_do_read vs WARNING in
request_end. request_end is also fuse. And you can see that a memory
corruption causes a random bug type, in this case WARNING, but it can
as well be LOCKDEP.

> > run #0: crashed: KASAN: use-after-free Read in fuse_dev_do_read
> > run #1: crashed: WARNING in request_end
> > run #2: crashed: KASAN: use-after-free Read in fuse_dev_do_read
> > run #3: OK
> > run #4: OK
* Re: Bisections with different bug manifestations
  2021-04-16 13:26 ` Dmitry Vyukov
@ 2021-04-16 13:48   ` Dmitry Vyukov
  0 siblings, 0 replies; 10+ messages in thread
From: Dmitry Vyukov @ 2021-04-16 13:48 UTC (permalink / raw)
To: Steven Rostedt
Cc: syzkaller, syzbot, Andrew Morton, Jens Axboe, Borislav Petkov,
  H. Peter Anvin, Jim Mattson, Joerg Roedel, KVM list, linux-block,
  LKML, Mark Rutland, masahiroy, Ingo Molnar, Paolo Bonzini,
  Peter Zijlstra, rafael.j.wysocki, Sean Christopherson,
  syzkaller-bugs, Thomas Gleixner, Vitaly Kuznetsov, Wanpeng Li,
  Will Deacon, the arch/x86 maintainers

On Fri, Apr 16, 2021 at 3:26 PM Dmitry Vyukov <dvyukov@google.com> wrote:
>
> On Fri, Apr 16, 2021 at 3:13 PM Steven Rostedt <rostedt@goodmis.org> wrote:
> >
> > On Fri, 16 Apr 2021 09:51:45 +0200
> > Dmitry Vyukov <dvyukov@google.com> wrote:
> > >
> > > If you look at substantial base of bisection logs, you will find lots
> > > of cases where bug types, functions don't match. Kernel crashes
> > > differently even on the same revision. And obviously things change if
> > > you change revisions. Also if you see presumably a different bug, what
> > > does it say regarding the original bug.
> >
> > Yes, but there are also several types of cases where the issue will be the
> > same. Namely lockdep. I agree that use after free warnings can have a side
> > effect, and may be more difficult.
>
> But how do we know it's lockdep, rather than a use-after-free
> manifested as lockdep?
> A Significant portion of kernel bugs are caused by concurrency and can
> manifest in different ways, e.g. these are not lockdep, or WARN, or
> use-after-free, but rather a race in nature.
>
> > But there's many other bugs that remain
> > consistent across kernels. And if you stumble on one of them, look for it
> > only.
>
> For example? Does not look to be true for WARN, BUG, KASAN,
> "inconsistent lock state".
> >
> > And if you hit another bug, and if it doesn't crash, then ignore it (of
> > course this could be an issue if you have panic on warning set). But
> > otherwise, just skip it.
>
> It's not possible to skip, say, BUG.
> And if we skip, say, a use-after-free, how do we know we are not
> making things worse? Because now we are running on corrupted memory,
> so anything can happen. Definitely a stray lockdep report can happen,
> or other way around not happen when it should...
>
> > > I would very much like to improve automatic bisection quality, but it
> > > does not look trivial at all.
> > >
> > > Some random examples where, say, your hypothesis of WARN-to-WARN,
> > > BUG-to-BUG does not hold even on the same kernel revision (add to this
> >
> > At least lockdep to lockdep, as when I do manual bisects, that's exactly
> > what I look for, and ignore all other warnings. And that has found the
> > problem commit pretty much every time.
>
> What lockdep bug types do you mean? All?
> In the examples above you can see at least "inconsistent lock state"
> mixed with 2 other completely different bug types.

I've looked for some examples.
Here a reliable deadlock in io_uring changes to a reliable task hung in io_uring:

run #6: crashed: possible deadlock in io_poll_double_wake
all runs: crashed: INFO: task hung in io_ring_exit_work
https://syzkaller.appspot.com/text?tag=Log&x=135e4d31d00000

Here deadlock is mixed with "lock held when returning to user space":

run #6: crashed: possible deadlock in inet_stream_connect
run #7: crashed: WARNING: lock held when returning to user space in inet_stream_connect
https://syzkaller.appspot.com/text?tag=Log&x=109b88f2d00000

Here a reliable deadlock in n_tty_receive_buf_common changes to a UAF in
n_tty_receive_buf_common:

all runs: crashed: possible deadlock in n_tty_receive_buf_common
all runs: crashed: KASAN: use-after-free Read in n_tty_receive_buf_common
https://syzkaller.appspot.com/text?tag=Log&x=106596b5e00000

Here deadlock changes to UAF and corrupted list:

run #4: crashed: possible deadlock in neigh_change_state
run #1: crashed: KASAN: use-after-free Read in neigh_mark_dead
run #2: crashed: BUG: corrupted list in ___neigh_create
run #3: crashed: BUG: corrupted list in neigh_mark_dead
https://syzkaller.appspot.com/text?tag=Log&x=1328a1e2e00000

But it also shows another interesting aspect: kernel output is sometimes
corrupted and the kernel does not manage to print a proper, complete oops
message. In these cases we can't reliably say whether it's also a deadlock
or not; we can't say anything about the crash, it's unparsable.

This problem also shows up here:
https://syzkaller.appspot.com/text?tag=Log&x=145c3a6d200000
and here
https://syzkaller.appspot.com/text?tag=Log&x=1006f18c600000
and in the second one also these 2 are mixed:

run #1: crashed: WARNING: locking bug in corrupted
run #2: crashed: WARNING: possible circular locking dependency detected

This one shows a related issue: kernel crash output periodically changes,
and a testing system fails to parse it.
This can happen if we go back to older releases, or when we do fix
bisection and test a very new kernel with changed output:
https://syzkaller.appspot.com/text?tag=Log&x=108f4416e00000

It's also easy to imagine a program that triggers 2 deadlocks on
different revisions: both are deadlocks, but completely different and
unrelated ones. And we can't look at function names as they change over
time.

All these things are almost trivial when you do bisection manually.
But for an automated system, they are quite challenging to handle.

> > > different revisions and the fact that a different bug does not give
> > > info regarding the original bug):
> > >
> >
> > Can you tell me that all these examples bisected to the commit that caused
> > the bug? Because if it did not, then you may have just proved my point ;-)
>
> I don't know now what was the result, but for a single run these were
> manifestations of the same root bug.
> E.g. see below, that's UAF in fuse_dev_do_read vs WARNING in
> request_end. request_end is also fuse. And you can see that a memory
> corruption causing a random bug type, in this case WARNING, but can as
> well be LOCKDEP.
>
> > > run #0: crashed: KASAN: use-after-free Read in fuse_dev_do_read
> > > run #1: crashed: WARNING in request_end
> > > run #2: crashed: KASAN: use-after-free Read in fuse_dev_do_read
> > > run #3: OK
> > > run #4: OK
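[Editor's note] The examples in this message suggest why automated matching is hard: function names must be ignored because they change across revisions, yet the bug type alone misses cases where one root cause manifests as a deadlock on one revision and a UAF on another, and "corrupted" titles carry no information at all. A hypothetical sketch of such a matcher and its blind spots (not syzkaller's actual matching code; all names are invented):

```python
import re

# Hypothetical sketch of a crash-title matcher touching the problems
# described above: function names are ignored (they change across
# revisions), and titles ending in "corrupted" mean the oops was
# unparsable, so they can support no conclusion either way.

def bug_type(title):
    """Drop the trailing ' in <function>' part, keeping the bug-type prefix."""
    return re.sub(r" in \S+$", "", title.strip())

def same_manifestation(a, b):
    """True/False when the two titles are comparable, None when either
    oops was too corrupted to parse."""
    if a.endswith(" in corrupted") or b.endswith(" in corrupted"):
        return None
    return bug_type(a) == bug_type(b)

# Same bug class, different functions: treated as a match.
print(same_manifestation(
    "possible deadlock in n_tty_receive_buf_common",
    "possible deadlock in io_poll_double_wake"))                # True
# The n_tty example above: deadlock vs UAF do not match, even though
# they were the same root bug...
print(same_manifestation(
    "possible deadlock in n_tty_receive_buf_common",
    "KASAN: use-after-free Read in n_tty_receive_buf_common"))  # False
# ...and a corrupted title says nothing either way.
print(same_manifestation(
    "WARNING: locking bug in corrupted",
    "possible deadlock in inet_stream_connect"))                # None
```

Note that the heuristic returns an inconclusive result for corrupted output and still gives a false "different bug" answer for the n_tty deadlock-vs-UAF pair, which is exactly the failure mode described in this message.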
end of thread, other threads:[~2021-04-16 13:49 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-13 17:33 [syzbot] possible deadlock in del_gendisk syzbot
2021-04-13 17:41 ` Steven Rostedt
2021-04-13 17:43   ` Steven Rostedt
2021-04-13 18:24     ` Dmitry Vyukov
2021-04-13 18:40       ` Steven Rostedt
2021-04-13 18:43         ` Steven Rostedt
2021-04-16  7:51           ` Bisections with different bug manifestations Dmitry Vyukov
2021-04-16 13:13             ` Steven Rostedt
2021-04-16 13:26               ` Dmitry Vyukov
2021-04-16 13:48                 ` Dmitry Vyukov