Linux-Block Archive on lore.kernel.org
 help / color / Atom feed
* [syzbot] possible deadlock in del_gendisk
@ 2021-04-13 17:33 syzbot
  2021-04-13 17:41 ` Steven Rostedt
  0 siblings, 1 reply; 10+ messages in thread
From: syzbot @ 2021-04-13 17:33 UTC (permalink / raw)
  To: akpm, axboe, bp, hpa, jmattson, joro, kvm, linux-block,
	linux-kernel, mark.rutland, masahiroy, mingo, pbonzini, peterz,
	rafael.j.wysocki, rostedt, seanjc, syzkaller-bugs, tglx,
	vkuznets, wanpengli, will, x86

Hello,

syzbot found the following issue on:

HEAD commit:    e99d8a84 Add linux-next specific files for 20210409
git tree:       linux-next
console output: https://syzkaller.appspot.com/x/log.txt?x=13b01681d00000
kernel config:  https://syzkaller.appspot.com/x/.config?x=7cd69574979bfeb7
dashboard link: https://syzkaller.appspot.com/bug?extid=61e04e51b7ac86930589
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=148265d9d00000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=16a981a1d00000

The issue was bisected to:

commit 997acaf6b4b59c6a9c259740312a69ea549cc684
Author: Mark Rutland <mark.rutland@arm.com>
Date:   Mon Jan 11 15:37:07 2021 +0000

    lockdep: report broken irq restoration

bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=16a7e77ed00000
final oops:     https://syzkaller.appspot.com/x/report.txt?x=15a7e77ed00000
console output: https://syzkaller.appspot.com/x/log.txt?x=11a7e77ed00000

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+61e04e51b7ac86930589@syzkaller.appspotmail.com
Fixes: 997acaf6b4b5 ("lockdep: report broken irq restoration")

======================================================
WARNING: possible circular locking dependency detected
5.12.0-rc6-next-20210409-syzkaller #0 Not tainted
------------------------------------------------------
syz-executor104/8440 is trying to acquire lock:
ffff888016e9dca0 (&bdev->bd_mutex){+.+.}-{3:3}, at: del_gendisk+0x250/0x9e0 block/genhd.c:618

but task is already holding lock:
ffffffff8c7d9430 (bdev_lookup_sem){++++}-{3:3}, at: del_gendisk+0x222/0x9e0 block/genhd.c:616

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #2 (bdev_lookup_sem){++++}-{3:3}:
       down_write+0x92/0x150 kernel/locking/rwsem.c:1406
       del_gendisk+0x222/0x9e0 block/genhd.c:616
       loop_remove drivers/block/loop.c:2191 [inline]
       loop_control_ioctl drivers/block/loop.c:2291 [inline]
       loop_control_ioctl+0x40d/0x4f0 drivers/block/loop.c:2251
       vfs_ioctl fs/ioctl.c:48 [inline]
       __do_sys_ioctl fs/ioctl.c:753 [inline]
       __se_sys_ioctl fs/ioctl.c:739 [inline]
       __x64_sys_ioctl+0x193/0x200 fs/ioctl.c:739
       do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
       entry_SYSCALL_64_after_hwframe+0x44/0xae

-> #1 (loop_ctl_mutex){+.+.}-{3:3}:
       __mutex_lock_common kernel/locking/mutex.c:949 [inline]
       __mutex_lock+0x139/0x1120 kernel/locking/mutex.c:1096
       lo_open+0x1a/0x130 drivers/block/loop.c:1890
       __blkdev_get+0x135/0xa30 fs/block_dev.c:1305
       blkdev_get_by_dev fs/block_dev.c:1457 [inline]
       blkdev_get_by_dev+0x26c/0x600 fs/block_dev.c:1425
       blkdev_open+0x154/0x2b0 fs/block_dev.c:1554
       do_dentry_open+0x4b9/0x11b0 fs/open.c:826
       do_open fs/namei.c:3361 [inline]
       path_openat+0x1c09/0x27d0 fs/namei.c:3494
       do_filp_open+0x190/0x3d0 fs/namei.c:3521
       do_sys_openat2+0x16d/0x420 fs/open.c:1187
       do_sys_open fs/open.c:1203 [inline]
       __do_sys_open fs/open.c:1211 [inline]
       __se_sys_open fs/open.c:1207 [inline]
       __x64_sys_open+0x119/0x1c0 fs/open.c:1207
       do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
       entry_SYSCALL_64_after_hwframe+0x44/0xae

-> #0 (&bdev->bd_mutex){+.+.}-{3:3}:
       check_prev_add kernel/locking/lockdep.c:2938 [inline]
       check_prevs_add kernel/locking/lockdep.c:3061 [inline]
       validate_chain kernel/locking/lockdep.c:3676 [inline]
       __lock_acquire+0x2a17/0x5230 kernel/locking/lockdep.c:4902
       lock_acquire kernel/locking/lockdep.c:5512 [inline]
       lock_acquire+0x1ab/0x740 kernel/locking/lockdep.c:5477
       __mutex_lock_common kernel/locking/mutex.c:949 [inline]
       __mutex_lock+0x139/0x1120 kernel/locking/mutex.c:1096
       del_gendisk+0x250/0x9e0 block/genhd.c:618
       loop_remove drivers/block/loop.c:2191 [inline]
       loop_control_ioctl drivers/block/loop.c:2291 [inline]
       loop_control_ioctl+0x40d/0x4f0 drivers/block/loop.c:2251
       vfs_ioctl fs/ioctl.c:48 [inline]
       __do_sys_ioctl fs/ioctl.c:753 [inline]
       __se_sys_ioctl fs/ioctl.c:739 [inline]
       __x64_sys_ioctl+0x193/0x200 fs/ioctl.c:739
       do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
       entry_SYSCALL_64_after_hwframe+0x44/0xae

other info that might help us debug this:

Chain exists of:
  &bdev->bd_mutex --> loop_ctl_mutex --> bdev_lookup_sem

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(bdev_lookup_sem);
                               lock(loop_ctl_mutex);
                               lock(bdev_lookup_sem);
  lock(&bdev->bd_mutex);

 *** DEADLOCK ***

2 locks held by syz-executor104/8440:
 #0: ffffffff8ca5f148 (loop_ctl_mutex){+.+.}-{3:3}, at: loop_control_ioctl+0x7b/0x4f0 drivers/block/loop.c:2257
 #1: ffffffff8c7d9430 (bdev_lookup_sem){++++}-{3:3}, at: del_gendisk+0x222/0x9e0 block/genhd.c:616

stack backtrace:
CPU: 1 PID: 8440 Comm: syz-executor104 Not tainted 5.12.0-rc6-next-20210409-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:79 [inline]
 dump_stack+0x141/0x1d7 lib/dump_stack.c:120
 check_noncircular+0x25f/0x2e0 kernel/locking/lockdep.c:2129
 check_prev_add kernel/locking/lockdep.c:2938 [inline]
 check_prevs_add kernel/locking/lockdep.c:3061 [inline]
 validate_chain kernel/locking/lockdep.c:3676 [inline]
 __lock_acquire+0x2a17/0x5230 kernel/locking/lockdep.c:4902
 lock_acquire kernel/locking/lockdep.c:5512 [inline]
 lock_acquire+0x1ab/0x740 kernel/locking/lockdep.c:5477
 __mutex_lock_common kernel/locking/mutex.c:949 [inline]
 __mutex_lock+0x139/0x1120 kernel/locking/mutex.c:1096
 del_gendisk+0x250/0x9e0 block/genhd.c:618
 loop_remove drivers/block/loop.c:2191 [inline]
 loop_control_ioctl drivers/block/loop.c:2291 [inline]
 loop_control_ioctl+0x40d/0x4f0 drivers/block/loop.c:2251
 vfs_ioctl fs/ioctl.c:48 [inline]
 __do_sys_ioctl fs/ioctl.c:753 [inline]
 __se_sys_ioctl fs/ioctl.c:739 [inline]
 __x64_sys_ioctl+0x193/0x200 fs/ioctl.c:739
 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x43ee49
Code: 28 c3 e8 2a 14 00 00 66 2e 0f 1f 84 00 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fff4d86c238 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000400488 RCX: 000000000043ee49
RDX: 0000000000000002 RSI: 0000000000004c81 RDI: 0000000000000003
RBP: 0000000000402e30 R08: 0000000000000000 R09: 0000000000400488
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000402ec0
R13: 0000000000000000 R14: 00000000004ac018 R15: 0000000000400488


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
For information about bisection process see: https://goo.gl/tpsmEJ#bisection
syzbot can test patches for this issue, for details see:
https://goo.gl/tpsmEJ#testing-patches

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [syzbot] possible deadlock in del_gendisk
  2021-04-13 17:33 [syzbot] possible deadlock in del_gendisk syzbot
@ 2021-04-13 17:41 ` Steven Rostedt
  2021-04-13 17:43   ` Steven Rostedt
  0 siblings, 1 reply; 10+ messages in thread
From: Steven Rostedt @ 2021-04-13 17:41 UTC (permalink / raw)
  To: syzbot
  Cc: akpm, axboe, bp, hpa, jmattson, joro, kvm, linux-block,
	linux-kernel, mark.rutland, masahiroy, mingo, pbonzini, peterz,
	rafael.j.wysocki, seanjc, syzkaller-bugs, tglx, vkuznets,
	wanpengli, will, x86

On Tue, 13 Apr 2021 10:33:19 -0700
syzbot <syzbot+61e04e51b7ac86930589@syzkaller.appspotmail.com> wrote:

> Hello,
> 
> syzbot found the following issue on:
> 
> HEAD commit:    e99d8a84 Add linux-next specific files for 20210409
> git tree:       linux-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=13b01681d00000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=7cd69574979bfeb7
> dashboard link: https://syzkaller.appspot.com/bug?extid=61e04e51b7ac86930589
> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=148265d9d00000
> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=16a981a1d00000
> 
> The issue was bisected to:
> 
> commit 997acaf6b4b59c6a9c259740312a69ea549cc684
> Author: Mark Rutland <mark.rutland@arm.com>
> Date:   Mon Jan 11 15:37:07 2021 +0000
> 
>     lockdep: report broken irq restoration

As the below splats look like it has nothing to do with this patch, and
this patch will add a WARN() if there's broken logic somewhere, I bet the
bisect got confused (if it is automated and does a panic_on_warning),
because it will panic for broken code that his patch detects.

That is, the bisect was confused because it was triggering on two different
issues. One that triggered the reported splat below, and another that this
commit detects and warns on.

Hence, the bisect failed.

-- Steve


> 
> bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=16a7e77ed00000
> final oops:     https://syzkaller.appspot.com/x/report.txt?x=15a7e77ed00000
> console output: https://syzkaller.appspot.com/x/log.txt?x=11a7e77ed00000
> 
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+61e04e51b7ac86930589@syzkaller.appspotmail.com
> Fixes: 997acaf6b4b5 ("lockdep: report broken irq restoration")
> 
> ======================================================
> WARNING: possible circular locking dependency detected
> 5.12.0-rc6-next-20210409-syzkaller #0 Not tainted
> ------------------------------------------------------
> syz-executor104/8440 is trying to acquire lock:
> ffff888016e9dca0 (&bdev->bd_mutex){+.+.}-{3:3}, at: del_gendisk+0x250/0x9e0 block/genhd.c:618
> 
> but task is already holding lock:
> ffffffff8c7d9430 (bdev_lookup_sem){++++}-{3:3}, at: del_gendisk+0x222/0x9e0 block/genhd.c:616
> 
> which lock already depends on the new lock.
> 
> 
> the existing dependency chain (in reverse order) is:
> 
> -> #2 (bdev_lookup_sem){++++}-{3:3}:  
>        down_write+0x92/0x150 kernel/locking/rwsem.c:1406
>        del_gendisk+0x222/0x9e0 block/genhd.c:616
>        loop_remove drivers/block/loop.c:2191 [inline]
>        loop_control_ioctl drivers/block/loop.c:2291 [inline]
>        loop_control_ioctl+0x40d/0x4f0 drivers/block/loop.c:2251
>        vfs_ioctl fs/ioctl.c:48 [inline]
>        __do_sys_ioctl fs/ioctl.c:753 [inline]
>        __se_sys_ioctl fs/ioctl.c:739 [inline]
>        __x64_sys_ioctl+0x193/0x200 fs/ioctl.c:739
>        do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
>        entry_SYSCALL_64_after_hwframe+0x44/0xae
> 
> -> #1 (loop_ctl_mutex){+.+.}-{3:3}:  
>        __mutex_lock_common kernel/locking/mutex.c:949 [inline]
>        __mutex_lock+0x139/0x1120 kernel/locking/mutex.c:1096
>        lo_open+0x1a/0x130 drivers/block/loop.c:1890
>        __blkdev_get+0x135/0xa30 fs/block_dev.c:1305
>        blkdev_get_by_dev fs/block_dev.c:1457 [inline]
>        blkdev_get_by_dev+0x26c/0x600 fs/block_dev.c:1425
>        blkdev_open+0x154/0x2b0 fs/block_dev.c:1554
>        do_dentry_open+0x4b9/0x11b0 fs/open.c:826
>        do_open fs/namei.c:3361 [inline]
>        path_openat+0x1c09/0x27d0 fs/namei.c:3494
>        do_filp_open+0x190/0x3d0 fs/namei.c:3521
>        do_sys_openat2+0x16d/0x420 fs/open.c:1187
>        do_sys_open fs/open.c:1203 [inline]
>        __do_sys_open fs/open.c:1211 [inline]
>        __se_sys_open fs/open.c:1207 [inline]
>        __x64_sys_open+0x119/0x1c0 fs/open.c:1207
>        do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
>        entry_SYSCALL_64_after_hwframe+0x44/0xae
> 
> -> #0 (&bdev->bd_mutex){+.+.}-{3:3}:  
>        check_prev_add kernel/locking/lockdep.c:2938 [inline]
>        check_prevs_add kernel/locking/lockdep.c:3061 [inline]
>        validate_chain kernel/locking/lockdep.c:3676 [inline]
>        __lock_acquire+0x2a17/0x5230 kernel/locking/lockdep.c:4902
>        lock_acquire kernel/locking/lockdep.c:5512 [inline]
>        lock_acquire+0x1ab/0x740 kernel/locking/lockdep.c:5477
>        __mutex_lock_common kernel/locking/mutex.c:949 [inline]
>        __mutex_lock+0x139/0x1120 kernel/locking/mutex.c:1096
>        del_gendisk+0x250/0x9e0 block/genhd.c:618
>        loop_remove drivers/block/loop.c:2191 [inline]
>        loop_control_ioctl drivers/block/loop.c:2291 [inline]
>        loop_control_ioctl+0x40d/0x4f0 drivers/block/loop.c:2251
>        vfs_ioctl fs/ioctl.c:48 [inline]
>        __do_sys_ioctl fs/ioctl.c:753 [inline]
>        __se_sys_ioctl fs/ioctl.c:739 [inline]
>        __x64_sys_ioctl+0x193/0x200 fs/ioctl.c:739
>        do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
>        entry_SYSCALL_64_after_hwframe+0x44/0xae
> 
> other info that might help us debug this:
> 
> Chain exists of:
>   &bdev->bd_mutex --> loop_ctl_mutex --> bdev_lookup_sem
> 
>  Possible unsafe locking scenario:
> 
>        CPU0                    CPU1
>        ----                    ----
>   lock(bdev_lookup_sem);
>                                lock(loop_ctl_mutex);
>                                lock(bdev_lookup_sem);
>   lock(&bdev->bd_mutex);
> 
>  *** DEADLOCK ***
> 
> 2 locks held by syz-executor104/8440:
>  #0: ffffffff8ca5f148 (loop_ctl_mutex){+.+.}-{3:3}, at: loop_control_ioctl+0x7b/0x4f0 drivers/block/loop.c:2257
>  #1: ffffffff8c7d9430 (bdev_lookup_sem){++++}-{3:3}, at: del_gendisk+0x222/0x9e0 block/genhd.c:616
> 
> stack backtrace:
> CPU: 1 PID: 8440 Comm: syz-executor104 Not tainted 5.12.0-rc6-next-20210409-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> Call Trace:
>  __dump_stack lib/dump_stack.c:79 [inline]
>  dump_stack+0x141/0x1d7 lib/dump_stack.c:120
>  check_noncircular+0x25f/0x2e0 kernel/locking/lockdep.c:2129
>  check_prev_add kernel/locking/lockdep.c:2938 [inline]
>  check_prevs_add kernel/locking/lockdep.c:3061 [inline]
>  validate_chain kernel/locking/lockdep.c:3676 [inline]
>  __lock_acquire+0x2a17/0x5230 kernel/locking/lockdep.c:4902
>  lock_acquire kernel/locking/lockdep.c:5512 [inline]
>  lock_acquire+0x1ab/0x740 kernel/locking/lockdep.c:5477
>  __mutex_lock_common kernel/locking/mutex.c:949 [inline]
>  __mutex_lock+0x139/0x1120 kernel/locking/mutex.c:1096
>  del_gendisk+0x250/0x9e0 block/genhd.c:618
>  loop_remove drivers/block/loop.c:2191 [inline]
>  loop_control_ioctl drivers/block/loop.c:2291 [inline]
>  loop_control_ioctl+0x40d/0x4f0 drivers/block/loop.c:2251
>  vfs_ioctl fs/ioctl.c:48 [inline]
>  __do_sys_ioctl fs/ioctl.c:753 [inline]
>  __se_sys_ioctl fs/ioctl.c:739 [inline]
>  __x64_sys_ioctl+0x193/0x200 fs/ioctl.c:739
>  do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
>  entry_SYSCALL_64_after_hwframe+0x44/0xae
> RIP: 0033:0x43ee49
> Code: 28 c3 e8 2a 14 00 00 66 2e 0f 1f 84 00 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48
> RSP: 002b:00007fff4d86c238 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> RAX: ffffffffffffffda RBX: 0000000000400488 RCX: 000000000043ee49
> RDX: 0000000000000002 RSI: 0000000000004c81 RDI: 0000000000000003
> RBP: 0000000000402e30 R08: 0000000000000000 R09: 0000000000400488
> R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000402ec0
> R13: 0000000000000000 R14: 00000000004ac018 R15: 0000000000400488
> 
> 
> ---
> This report is generated by a bot. It may contain errors.
> See https://goo.gl/tpsmEJ for more information about syzbot.
> syzbot engineers can be reached at syzkaller@googlegroups.com.
> 
> syzbot will keep track of this issue. See:
> https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
> For information about bisection process see: https://goo.gl/tpsmEJ#bisection
> syzbot can test patches for this issue, for details see:
> https://goo.gl/tpsmEJ#testing-patches


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [syzbot] possible deadlock in del_gendisk
  2021-04-13 17:41 ` Steven Rostedt
@ 2021-04-13 17:43   ` Steven Rostedt
  2021-04-13 18:24     ` Dmitry Vyukov
  0 siblings, 1 reply; 10+ messages in thread
From: Steven Rostedt @ 2021-04-13 17:43 UTC (permalink / raw)
  To: syzbot
  Cc: akpm, axboe, bp, hpa, jmattson, joro, kvm, linux-block,
	linux-kernel, mark.rutland, masahiroy, mingo, pbonzini, peterz,
	rafael.j.wysocki, seanjc, syzkaller-bugs, tglx, vkuznets,
	wanpengli, will, x86

On Tue, 13 Apr 2021 13:41:47 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> As the below splats look like it has nothing to do with this patch, and
> this patch will add a WARN() if there's broken logic somewhere, I bet the
> bisect got confused (if it is automated and does a panic_on_warning),
> because it will panic for broken code that his patch detects.
> 
> That is, the bisect was confused because it was triggering on two different
> issues. One that triggered the reported splat below, and another that this
> commit detects and warns on.

Is it possible to update the the bisect to make sure that if it is failing
on warnings, to make sure the warnings are somewhat related, before decided
that its the same bug?

-- Steve

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [syzbot] possible deadlock in del_gendisk
  2021-04-13 17:43   ` Steven Rostedt
@ 2021-04-13 18:24     ` Dmitry Vyukov
  2021-04-13 18:40       ` Steven Rostedt
  0 siblings, 1 reply; 10+ messages in thread
From: Dmitry Vyukov @ 2021-04-13 18:24 UTC (permalink / raw)
  To: Steven Rostedt, syzkaller
  Cc: syzbot, Andrew Morton, Jens Axboe, Borislav Petkov,
	H. Peter Anvin, Jim Mattson, Joerg Roedel, KVM list, linux-block,
	LKML, Mark Rutland, masahiroy, Ingo Molnar, Paolo Bonzini,
	Peter Zijlstra, rafael.j.wysocki, Sean Christopherson,
	syzkaller-bugs, Thomas Gleixner, Vitaly Kuznetsov, Wanpeng Li,
	Will Deacon, the arch/x86 maintainers

On Tue, Apr 13, 2021 at 7:43 PM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Tue, 13 Apr 2021 13:41:47 -0400
> Steven Rostedt <rostedt@goodmis.org> wrote:
>
> > As the below splats look like it has nothing to do with this patch, and
> > this patch will add a WARN() if there's broken logic somewhere, I bet the
> > bisect got confused (if it is automated and does a panic_on_warning),
> > because it will panic for broken code that his patch detects.
> >
> > That is, the bisect was confused because it was triggering on two different
> > issues. One that triggered the reported splat below, and another that this
> > commit detects and warns on.
>
> Is it possible to update the the bisect to make sure that if it is failing
> on warnings, to make sure the warnings are somewhat related, before decided
> that its the same bug?

It does not seem to be feasible, bugs manifest differently in both
space and time. Also even if we somehow conclude the crash we see is
different, it says nothing about the original bug. For more data see:
https://groups.google.com/g/syzkaller/c/sR8aAXaWEF4/m/tTWYRgvmAwAJ

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [syzbot] possible deadlock in del_gendisk
  2021-04-13 18:24     ` Dmitry Vyukov
@ 2021-04-13 18:40       ` Steven Rostedt
  2021-04-13 18:43         ` Steven Rostedt
  0 siblings, 1 reply; 10+ messages in thread
From: Steven Rostedt @ 2021-04-13 18:40 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: syzkaller, syzbot, Andrew Morton, Jens Axboe, Borislav Petkov,
	H. Peter Anvin, Jim Mattson, Joerg Roedel, KVM list, linux-block,
	LKML, Mark Rutland, masahiroy, Ingo Molnar, Paolo Bonzini,
	Peter Zijlstra, rafael.j.wysocki, Sean Christopherson,
	syzkaller-bugs, Thomas Gleixner, Vitaly Kuznetsov, Wanpeng Li,
	Will Deacon, the arch/x86 maintainers

On Tue, 13 Apr 2021 20:24:00 +0200
Dmitry Vyukov <dvyukov@google.com> wrote:

> On Tue, Apr 13, 2021 at 7:43 PM Steven Rostedt <rostedt@goodmis.org> wrote:
> >
> > On Tue, 13 Apr 2021 13:41:47 -0400
> > Steven Rostedt <rostedt@goodmis.org> wrote:
> >  
> > > As the below splats look like it has nothing to do with this patch, and
> > > this patch will add a WARN() if there's broken logic somewhere, I bet the
> > > bisect got confused (if it is automated and does a panic_on_warning),
> > > because it will panic for broken code that his patch detects.
> > >
> > > That is, the bisect was confused because it was triggering on two different
> > > issues. One that triggered the reported splat below, and another that this
> > > commit detects and warns on.  
> >
> > Is it possible to update the the bisect to make sure that if it is failing
> > on warnings, to make sure the warnings are somewhat related, before decided
> > that its the same bug?  
> 
> It does not seem to be feasible, bugs manifest differently in both
> space and time. Also even if we somehow conclude the crash we see is
> different, it says nothing about the original bug. For more data see:
> https://groups.google.com/g/syzkaller/c/sR8aAXaWEF4/m/tTWYRgvmAwAJ

Sure, but if you trigger a lockdep bug, lockdep bugs are usually very
similar in all instances. It usually will include the same locks, or at
least be the same type of lockdep bug (where some types are related).

If the bisect saw an issue before and at this commit, I would think it
didn't trigger a lockdep bug at all, and simply triggered a warning.

In fact, according to:

  https://syzkaller.appspot.com/x/report.txt?x=15a7e77ed00000

That's exactly what it did.

That is, if the original bug is a lockdep warning, you should only be
interested in lockdep warnings (at a minimum). The final bug here had:

------------[ cut here ]------------
raw_local_irq_restore() called with IRQs enabled
WARNING: CPU: 0 PID: 8777 at kernel/locking/irqflag-debug.c:9 warn_bogus_irq_restore kernel/locking/irqflag-debug.c:9 [inline]
WARNING: CPU: 0 PID: 8777 at kernel/locking/irqflag-debug.c:9 warn_bogus_irq_restore+0x1d/0x20 kernel/locking/irqflag-debug.c:7
Modules linked in:
CPU: 0 PID: 8777 Comm: syz-executor.1 Not tainted 5.11.0-rc2-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:warn_bogus_irq_restore kernel/locking/irqflag-debug.c:9 [inline]
RIP: 0010:warn_bogus_irq_restore+0x1d/0x20 kernel/locking/irqflag-debug.c:7
Code: 51 00 e9 3f fe ff ff cc cc cc cc cc cc 80 3d e0 b4 ce 0a 00 74 01 c3 48 c7 c7 60 f5 8a 88 c6 05 cf b4 ce 0a 01 e8 17 01 a4 06 <0f> 0b c3 48 c7 c0 a0 46 4d 8e 53 48 89 fb 48 ba 00 00 00 00 00 fc
RSP: 0018:ffffc900017bf9f8 EFLAGS: 00010286
RAX: 0000000000000000 RBX: ffff8881477b2040 RCX: 0000000000000000
RDX: 0000000000000002 RSI: 0000000000000004 RDI: fffff520002f7f31
RBP: 0000000000000246 R08: 0000000000000001 R09: ffff8880b9e2015b
R10: ffffed10173c402b R11: 0000000000000001 R12: 0000000000000003
R13: ffffed1028ef6408 R14: 0000000000000001 R15: ffff8880b9e359c0
FS:  00000000017b8400(0000) GS:ffff8880b9e00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000017c1848 CR3: 0000000010920000 CR4: 00000000001506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 kvm_wait arch/x86/kernel/kvm.c:860 [inline]
 kvm_wait+0xc3/0xe0 arch/x86/kernel/kvm.c:837
 pv_wait arch/x86/include/asm/paravirt.h:564 [inline]
 pv_wait_head_or_lock kernel/locking/qspinlock_paravirt.h:470 [inline]
 __pv_queued_spin_lock_slowpath+0x8b8/0xb40 kernel/locking/qspinlock.c:508
 pv_queued_spin_lock_slowpath arch/x86/include/asm/paravirt.h:554 [inline]
 queued_spin_lock_slowpath arch/x86/include/asm/qspinlock.h:51 [inline]
 queued_spin_lock include/asm-generic/qspinlock.h:85 [inline]
 do_raw_spin_lock+0x200/0x2b0 kernel/locking/spinlock_debug.c:113
 spin_lock include/linux/spinlock.h:354 [inline]
 ext4_lock_group fs/ext4/ext4.h:3379 [inline]
 __ext4_new_inode+0x2da2/0x44d0 fs/ext4/ialloc.c:1187
 ext4_mkdir+0x298/0x910 fs/ext4/namei.c:2793
 vfs_mkdir+0x413/0x660 fs/namei.c:3652
 do_mkdirat+0x1eb/0x250 fs/namei.c:3675
 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x465567
Code: 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 b8 53 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffe00ad69f8 EFLAGS: 00000202 ORIG_RAX: 0000000000000053
RAX: ffffffffffffffda RBX: 00007ffe00ad6a90 RCX: 0000000000465567
RDX: 0000000000000000 RSI: 00000000000001ff RDI: 00007ffe00ad6a90
RBP: 00007ffe00ad6a6c R08: 0000000000000000 R09: 0000000000000006
R10: 00007ffe00ad6794 R11: 0000000000000202 R12: 0000000000000032
R13: 000000000006549a R14: 0000000000000002 R15: 00007ffe00ad6ad0



Which shows a WARN was triggered, and totally unrelated to what you were
bisecting.

Just matching types of warnings (lockdep to lockdep, or WARN_ON to WARN_ON)
would help reduce the number of bogus bisects you are having.

-- Steve

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [syzbot] possible deadlock in del_gendisk
  2021-04-13 18:40       ` Steven Rostedt
@ 2021-04-13 18:43         ` Steven Rostedt
  2021-04-16  7:51           ` Bisections with different bug manifestations Dmitry Vyukov
  0 siblings, 1 reply; 10+ messages in thread
From: Steven Rostedt @ 2021-04-13 18:43 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: syzkaller, syzbot, Andrew Morton, Jens Axboe, Borislav Petkov,
	H. Peter Anvin, Jim Mattson, Joerg Roedel, KVM list, linux-block,
	LKML, Mark Rutland, masahiroy, Ingo Molnar, Paolo Bonzini,
	Peter Zijlstra, rafael.j.wysocki, Sean Christopherson,
	syzkaller-bugs, Thomas Gleixner, Vitaly Kuznetsov, Wanpeng Li,
	Will Deacon, the arch/x86 maintainers

On Tue, 13 Apr 2021 14:40:09 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> ------------[ cut here ]------------
> raw_local_irq_restore() called with IRQs enabled
> WARNING: CPU: 0 PID: 8777 at kernel/locking/irqflag-debug.c:9 warn_bogus_irq_restore kernel/locking/irqflag-debug.c:9 [inline]
> WARNING: CPU: 0 PID: 8777 at kernel/locking/irqflag-debug.c:9 warn_bogus_irq_restore+0x1d/0x20 kernel/locking/irqflag-debug.c:7

In fact, when you have the above, which is a WARN() with text:

 "raw_local_irq_restore() called with IRQs enabled"

It is pretty much guaranteed that all triggers of this bug will have the
above warning with the same text.

-- Steve


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Bisections with different bug manifestations
  2021-04-13 18:43         ` Steven Rostedt
@ 2021-04-16  7:51           ` Dmitry Vyukov
  2021-04-16 13:13             ` Steven Rostedt
  0 siblings, 1 reply; 10+ messages in thread
From: Dmitry Vyukov @ 2021-04-16  7:51 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: syzkaller, syzbot, Andrew Morton, Jens Axboe, Borislav Petkov,
	H. Peter Anvin, Jim Mattson, Joerg Roedel, KVM list, linux-block,
	LKML, Mark Rutland, masahiroy, Ingo Molnar, Paolo Bonzini,
	Peter Zijlstra, rafael.j.wysocki, Sean Christopherson,
	syzkaller-bugs, Thomas Gleixner, Vitaly Kuznetsov, Wanpeng Li,
	Will Deacon, the arch/x86 maintainers

On Tue, Apr 13, 2021 at 8:43 PM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Tue, 13 Apr 2021 14:40:09 -0400
> Steven Rostedt <rostedt@goodmis.org> wrote:
>
> > ------------[ cut here ]------------
> > raw_local_irq_restore() called with IRQs enabled
> > WARNING: CPU: 0 PID: 8777 at kernel/locking/irqflag-debug.c:9 warn_bogus_irq_restore kernel/locking/irqflag-debug.c:9 [inline]
> > WARNING: CPU: 0 PID: 8777 at kernel/locking/irqflag-debug.c:9 warn_bogus_irq_restore+0x1d/0x20 kernel/locking/irqflag-debug.c:7
>
> In fact, when you have the above, which is a WARN() with text:
>
>  "raw_local_irq_restore() called with IRQs enabled"
>
> It is pretty much guaranteed that all triggers of this bug will have the
> above warning with the same text.

This looks nice on paper and looking at only 1 bisection log. But
unfortunately in practice theory and practice are different...

This was discussed at length multiple times:
https://groups.google.com/g/syzkaller/search?q=bisection+different+manifestations
https://groups.google.com/g/syzkaller-bugs/c/nFeC8-UG1gg/m/y6gUEsvAAgAJ
https://groups.google.com/g/syzkaller/c/sR8aAXaWEF4/m/tTWYRgvmAwAJ
https://groups.google.com/g/syzkaller/c/9NdprHsGBqo/m/Yj9uWRDgBQAJ

If you look at substantial base of bisection logs, you will find lots
of cases where bug types, functions don't match. Kernel crashes
differently even on the same revision. And obviously things change if
you change revisions. Also if you see presumably a different bug, what
does it say regarding the original bug.

I would very much like to improve automatic bisection quality, but it
does not look trivial at all.

Some random examples where, say, your hypothesis of WARN-to-WARN,
BUG-to-BUG does not hold even on the same kernel revision (add to this
different revisions and the fact that a different bug does not give
info regarding the original bug):

run #0: crashed: KASAN: use-after-free Read in fuse_dev_do_read
run #1: crashed: WARNING in request_end
run #2: crashed: KASAN: use-after-free Read in fuse_dev_do_read
run #3: OK
run #4: OK

run #0: crashed: KASAN: slab-out-of-bounds Read in __ip_append_data
run #1: crashed: inconsistent lock state in rhashtable_walk_enter
run #2: crashed: inconsistent lock state in rhashtable_walk_enter
run #3: crashed: inconsistent lock state in rhashtable_walk_enter
run #4: crashed: inconsistent lock state in rhashtable_walk_enter
run #5: crashed: inconsistent lock state in rhashtable_walk_enter
run #6: crashed: inconsistent lock state in rhashtable_walk_enter
run #7: crashed: kernel BUG at arch/x86/mm/physaddr.c:LINE!
run #8: crashed: inconsistent lock state in rhashtable_walk_enter
run #9: crashed: inconsistent lock state in rhashtable_walk_enter

run #0: crashed: kernel BUG at arch/x86/mm/physaddr.c:LINE!
run #1: crashed: inconsistent lock state in rhashtable_walk_enter
run #2: crashed: inconsistent lock state in rhashtable_walk_enter
run #3: crashed: inconsistent lock state in rhashtable_walk_enter
run #4: crashed: inconsistent lock state in rhashtable_walk_enter
run #5: crashed: kernel BUG at arch/x86/mm/physaddr.c:LINE!
run #6: crashed: inconsistent lock state in rhashtable_walk_enter
run #7: crashed: kernel BUG at arch/x86/mm/physaddr.c:LINE!
run #8: crashed: inconsistent lock state in rhashtable_walk_enter
run #9: crashed: inconsistent lock state in rhashtable_walk_enter

run #0: crashed: KASAN: use-after-free Read in __vb2_perform_fileio
run #1: crashed: KASAN: use-after-free Write in __vb2_cleanup_fileio
run #2: crashed: KASAN: use-after-free Read in __vb2_perform_fileio
run #3: crashed: KASAN: use-after-free Read in __vb2_perform_fileio
run #4: crashed: INFO: task hung in vivid_stop_generating_vid_cap
run #5: crashed: INFO: task hung in vivid_stop_generating_vid_cap
run #6: crashed: INFO: task hung in vivid_stop_generating_vid_cap
run #7: crashed: INFO: task hung in vivid_stop_generating_vid_cap
run #8: crashed: INFO: task hung in vivid_stop_generating_vid_cap
run #9: crashed: INFO: task hung in vivid_stop_generating_vid_cap

run #0: crashed: general protection fault in sctp_assoc_rwnd_increase
run #1: crashed: general protection fault in sctp_ulpevent_free
run #2: crashed: general protection fault in sctp_assoc_rwnd_increase
run #3: crashed: general protection fault in sctp_assoc_rwnd_increase
run #4: crashed: general protection fault in sctp_assoc_rwnd_increase
run #5: crashed: general protection fault in sctp_assoc_rwnd_increase
run #6: crashed: general protection fault in sctp_assoc_rwnd_increase
run #7: crashed: general protection fault in sctp_assoc_rwnd_increase

run #0: crashed: general protection fault in sctp_assoc_rwnd_increase
run #1: crashed: general protection fault in sctp_assoc_rwnd_increase
run #2: crashed: general protection fault in sctp_assoc_rwnd_increase
run #3: crashed: general protection fault in sctp_assoc_rwnd_increase
run #4: crashed: general protection fault in corrupted
run #5: crashed: general protection fault in sctp_assoc_rwnd_increase
run #6: crashed: general protection fault in sctp_assoc_rwnd_increase
run #7: crashed: general protection fault in corrupted

run #0: crashed: INFO: rcu detected stall in corrupted
run #1: crashed: INFO: rcu detected stall in ext4_file_write_iter
run #2: crashed: INFO: rcu detected stall in sys_sendfile64
run #3: crashed: INFO: rcu detected stall in corrupted
run #4: crashed: INFO: rcu detected stall in ext4_file_write_iter
run #5: crashed: INFO: rcu detected stall in corrupted
run #6: crashed: INFO: rcu detected stall in corrupted
run #7: crashed: INFO: rcu detected stall in ext4_file_write_iter
run #8: crashed: INFO: rcu detected stall in sys_sendfile64
run #9: crashed: INFO: rcu detected stall in ext4_file_write_iter

run #0: crashed: INFO: rcu detected stall in sys_sendfile64
run #1: crashed: INFO: rcu detected stall in corrupted
run #2: crashed: INFO: rcu detected stall in corrupted
run #3: crashed: INFO: rcu detected stall in sys_sendfile64
run #4: crashed: INFO: rcu detected stall in corrupted
run #5: crashed: INFO: rcu detected stall in corrupted
run #6: crashed: INFO: rcu detected stall in corrupted
run #7: crashed: INFO: rcu detected stall in corrupted
run #8: crashed: INFO: rcu detected stall in sys_sendfile64
run #9: crashed: INFO: rcu detected stall in corrupted

run #0: crashed: INFO: rcu detected stall in rw_verify_area
run #1: crashed: INFO: rcu detected stall in ext4_file_write_iter
run #2: crashed: INFO: rcu detected stall in corrupted
run #3: crashed: INFO: rcu detected stall in corrupted
run #4: crashed: INFO: rcu detected stall in ext4_file_write_iter
run #5: crashed: INFO: rcu detected stall in ext4_file_write_iter
run #6: crashed: INFO: rcu detected stall in corrupted
run #7: crashed: INFO: rcu detected stall in ext4_file_write_iter
run #8: crashed: INFO: rcu detected stall in ext4_file_write_iter
run #9: crashed: INFO: rcu detected stall in rw_verify_area

run #0: crashed: INFO: rcu detected stall in ext4_file_write_iter
run #1: crashed: INFO: rcu detected stall in corrupted
run #2: crashed: INFO: rcu detected stall in sys_sendfile64
run #3: crashed: INFO: rcu detected stall in sys_sendfile64
run #4: crashed: INFO: rcu detected stall in corrupted
run #5: crashed: INFO: rcu detected stall in sys_sendfile64
run #6: crashed: INFO: rcu detected stall in sys_sendfile64
run #7: crashed: INFO: rcu detected stall in corrupted
run #8: crashed: INFO: rcu detected stall in corrupted
run #9: crashed: INFO: rcu detected stall in sys_sendfile64

run #0: crashed: KASAN: use-after-free Read in link_path_walk
run #1: crashed: KASAN: use-after-free Read in link_path_walk
run #2: crashed: KASAN: use-after-free Read in trailing_symlink
run #3: crashed: KASAN: use-after-free Read in trailing_symlink
run #4: crashed: KASAN: use-after-free Read in trailing_symlink
run #5: crashed: KASAN: use-after-free Read in link_path_walk
run #6: crashed: KASAN: use-after-free Read in link_path_walk
run #7: crashed: KASAN: use-after-free Read in link_path_walk
run #8: crashed: KASAN: use-after-free Read in trailing_symlink
run #9: crashed: KASAN: use-after-free Read in trailing_symlink

run #0: crashed: BUG: unable to handle kernel NULL pointer dereference
in mrvl_setup
run #1: crashed: BUG: unable to handle kernel NULL pointer dereference
in mrvl_setup
run #2: crashed: BUG: unable to handle kernel NULL pointer dereference
in mrvl_setup
run #3: crashed: BUG: unable to handle kernel NULL pointer dereference
in mrvl_setup
run #4: crashed: WARNING: ODEBUG bug in corrupted
run #5: crashed: BUG: unable to handle kernel NULL pointer dereference
in mrvl_setup
run #6: crashed: BUG: unable to handle kernel NULL pointer dereference
in mrvl_setup
run #7: crashed: BUG: unable to handle kernel NULL pointer dereference
in mrvl_setup
run #8: crashed: BUG: unable to handle kernel NULL pointer dereference
in mrvl_setup
run #9: crashed: BUG: unable to handle kernel NULL pointer dereference
in mrvl_setup

run #0: crashed: KASAN: use-after-free Read in delayed_uprobe_remove
run #1: crashed: KASAN: use-after-free Read in delayed_uprobe_remove
run #2: crashed: general protection fault in delayed_uprobe_remove
run #3: crashed: KASAN: use-after-free Read in delayed_uprobe_remove
run #4: crashed: general protection fault in delayed_uprobe_remove
run #5: crashed: KASAN: use-after-free Read in delayed_uprobe_remove
run #6: OK
run #7: OK
run #8: OK
run #9: OK

run #0: crashed: general protection fault in delayed_uprobe_remove
run #1: crashed: KASAN: use-after-free Read in delayed_uprobe_remove
run #2: crashed: KASAN: use-after-free Read in delayed_uprobe_remove
run #3: crashed: KASAN: use-after-free Read in delayed_uprobe_remove
run #4: crashed: general protection fault in delayed_uprobe_remove
run #5: crashed: KASAN: use-after-free Read in delayed_uprobe_remove
run #6: OK
run #7: OK
run #8: OK
run #9: OK

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Bisections with different bug manifestations
  2021-04-16  7:51           ` Bisections with different bug manifestations Dmitry Vyukov
@ 2021-04-16 13:13             ` Steven Rostedt
  2021-04-16 13:26               ` Dmitry Vyukov
  0 siblings, 1 reply; 10+ messages in thread
From: Steven Rostedt @ 2021-04-16 13:13 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: syzkaller, syzbot, Andrew Morton, Jens Axboe, Borislav Petkov,
	H. Peter Anvin, Jim Mattson, Joerg Roedel, KVM list, linux-block,
	LKML, Mark Rutland, masahiroy, Ingo Molnar, Paolo Bonzini,
	Peter Zijlstra, rafael.j.wysocki, Sean Christopherson,
	syzkaller-bugs, Thomas Gleixner, Vitaly Kuznetsov, Wanpeng Li,
	Will Deacon, the arch/x86 maintainers

On Fri, 16 Apr 2021 09:51:45 +0200
Dmitry Vyukov <dvyukov@google.com> wrote:
> 
> If you look at substantial base of bisection logs, you will find lots
> of cases where bug types, functions don't match. Kernel crashes
> differently even on the same revision. And obviously things change if
> you change revisions. Also if you see presumably a different bug, what
> does it say regarding the original bug.

Yes, but there are also several types of cases where the issue will be the
same. Namely lockdep. I agree that use after free warnings can have a side
effect, and may be more difficult. But there's many other bugs that remain
consistent across kernels. And if you stumble on one of them, look for it
only.

And if you hit another bug, and if it doesn't crash, then ignore it (of
course this could be an issue if you have panic on warning set). But
otherwise, just skip it.

> 
> I would very much like to improve automatic bisection quality, but it
> does not look trivial at all.
> 
> Some random examples where, say, your hypothesis of WARN-to-WARN,
> BUG-to-BUG does not hold even on the same kernel revision (add to this

At least lockdep to lockdep, as when I do manual bisects, that's exactly
what I look for, and ignore all other warnings. And that has found the
problem commit pretty much every time.

> different revisions and the fact that a different bug does not give
> info regarding the original bug):
> 

Can you tell me that all these examples bisected to the commit that caused
the bug? Because if it did not, then you may have just proved my point ;-)

> run #0: crashed: KASAN: use-after-free Read in fuse_dev_do_read
> run #1: crashed: WARNING in request_end
> run #2: crashed: KASAN: use-after-free Read in fuse_dev_do_read
> run #3: OK
> run #4: OK
> 

[..]

-- Steve

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Bisections with different bug manifestations
  2021-04-16 13:13             ` Steven Rostedt
@ 2021-04-16 13:26               ` Dmitry Vyukov
  2021-04-16 13:48                 ` Dmitry Vyukov
  0 siblings, 1 reply; 10+ messages in thread
From: Dmitry Vyukov @ 2021-04-16 13:26 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: syzkaller, syzbot, Andrew Morton, Jens Axboe, Borislav Petkov,
	H. Peter Anvin, Jim Mattson, Joerg Roedel, KVM list, linux-block,
	LKML, Mark Rutland, masahiroy, Ingo Molnar, Paolo Bonzini,
	Peter Zijlstra, rafael.j.wysocki, Sean Christopherson,
	syzkaller-bugs, Thomas Gleixner, Vitaly Kuznetsov, Wanpeng Li,
	Will Deacon, the arch/x86 maintainers

On Fri, Apr 16, 2021 at 3:13 PM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Fri, 16 Apr 2021 09:51:45 +0200
> Dmitry Vyukov <dvyukov@google.com> wrote:
> >
> > If you look at substantial base of bisection logs, you will find lots
> > of cases where bug types, functions don't match. Kernel crashes
> > differently even on the same revision. And obviously things change if
> > you change revisions. Also if you see presumably a different bug, what
> > does it say regarding the original bug.
>
> Yes, but there are also several types of cases where the issue will be the
> same. Namely lockdep. I agree that use after free warnings can have a side
> effect, and may be more difficult.

But how do we know it's lockdep, rather than a use-after-free
manifested as lockdep?
A Significant portion of kernel bugs are caused by concurrency and can
manifest in different ways, e.g. these are not lockdep, or WARN, or
use-after-free, but rather a race in nature.

> But there's many other bugs that remain
> consistent across kernels. And if you stumble on one of them, look for it
> only.

For example? Does not look to be true for WARN, BUG, KASAN,
"inconsistent lock state".


> And if you hit another bug, and if it doesn't crash, then ignore it (of
> course this could be an issue if you have panic on warning set). But
> otherwise, just skip it.

It's not possible to skip, say, BUG.
And if we skip, say, a use-after-free, how do we know we are not
making things worse? Because now we are running on corrupted memory,
so anything can happen. Definitely a stray lockdep report can happen,
or other way around not happen when it should...

> > I would very much like to improve automatic bisection quality, but it
> > does not look trivial at all.
> >
> > Some random examples where, say, your hypothesis of WARN-to-WARN,
> > BUG-to-BUG does not hold even on the same kernel revision (add to this
>
> At least lockdep to lockdep, as when I do manual bisects, that's exactly
> what I look for, and ignore all other warnings. And that has found the
> problem commit pretty much every time.

What lockdep bug types do you mean? All?
In the examples above you can see at least "inconsistent lock state"
mixed with 2 other completely different bug types.

> > different revisions and the fact that a different bug does not give
> > info regarding the original bug):
> >
>
> Can you tell me that all these examples bisected to the commit that caused
> the bug? Because if it did not, then you may have just proved my point ;-)

I don't know now what was the result, but for a single run these were
manifestations of the same root bug.
E.g. see below, that's UAF in fuse_dev_do_read vs WARNING in
request_end. request_end is also fuse. And you can see that a memory
corruption causing a random bug type, in this case WARNING, but can as
well be LOCKDEP.


> > run #0: crashed: KASAN: use-after-free Read in fuse_dev_do_read
> > run #1: crashed: WARNING in request_end
> > run #2: crashed: KASAN: use-after-free Read in fuse_dev_do_read
> > run #3: OK
> > run #4: OK

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Bisections with different bug manifestations
  2021-04-16 13:26               ` Dmitry Vyukov
@ 2021-04-16 13:48                 ` Dmitry Vyukov
  0 siblings, 0 replies; 10+ messages in thread
From: Dmitry Vyukov @ 2021-04-16 13:48 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: syzkaller, syzbot, Andrew Morton, Jens Axboe, Borislav Petkov,
	H. Peter Anvin, Jim Mattson, Joerg Roedel, KVM list, linux-block,
	LKML, Mark Rutland, masahiroy, Ingo Molnar, Paolo Bonzini,
	Peter Zijlstra, rafael.j.wysocki, Sean Christopherson,
	syzkaller-bugs, Thomas Gleixner, Vitaly Kuznetsov, Wanpeng Li,
	Will Deacon, the arch/x86 maintainers

On Fri, Apr 16, 2021 at 3:26 PM Dmitry Vyukov <dvyukov@google.com> wrote:
>
> On Fri, Apr 16, 2021 at 3:13 PM Steven Rostedt <rostedt@goodmis.org> wrote:
> >
> > On Fri, 16 Apr 2021 09:51:45 +0200
> > Dmitry Vyukov <dvyukov@google.com> wrote:
> > >
> > > If you look at substantial base of bisection logs, you will find lots
> > > of cases where bug types, functions don't match. Kernel crashes
> > > differently even on the same revision. And obviously things change if
> > > you change revisions. Also if you see presumably a different bug, what
> > > does it say regarding the original bug.
> >
> > Yes, but there are also several types of cases where the issue will be the
> > same. Namely lockdep. I agree that use after free warnings can have a side
> > effect, and may be more difficult.
>
> But how do we know it's lockdep, rather than a use-after-free
> manifested as lockdep?
> A Significant portion of kernel bugs are caused by concurrency and can
> manifest in different ways, e.g. these are not lockdep, or WARN, or
> use-after-free, but rather a race in nature.
>
> > But there's many other bugs that remain
> > consistent across kernels. And if you stumble on one of them, look for it
> > only.
>
> For example? Does not look to be true for WARN, BUG, KASAN,
> "inconsistent lock state".
>
>
> > And if you hit another bug, and if it doesn't crash, then ignore it (of
> > course this could be an issue if you have panic on warning set). But
> > otherwise, just skip it.
>
> It's not possible to skip, say, BUG.
> And if we skip, say, a use-after-free, how do we know we are not
> making things worse? Because now we are running on corrupted memory,
> so anything can happen. Definitely a stray lockdep report can happen,
> or other way around not happen when it should...
>
> > > I would very much like to improve automatic bisection quality, but it
> > > does not look trivial at all.
> > >
> > > Some random examples where, say, your hypothesis of WARN-to-WARN,
> > > BUG-to-BUG does not hold even on the same kernel revision (add to this
> >
> > At least lockdep to lockdep, as when I do manual bisects, that's exactly
> > what I look for, and ignore all other warnings. And that has found the
> > problem commit pretty much every time.
>
> What lockdep bug types do you mean? All?
> In the examples above you can see at least "inconsistent lock state"
> mixed with 2 other completely different bug types.

I've looked for some examples.

Here a reliable deadlock in io_uring changes to reliable task hung in io_uring:
run #6: crashed: possible deadlock in io_poll_double_wake
all runs: crashed: INFO: task hung in io_ring_exit_work
https://syzkaller.appspot.com/text?tag=Log&x=135e4d31d00000

Here deadlock is mixed with "lock held when returning to user space":
run #6: crashed: possible deadlock in inet_stream_connect
run #7: crashed: WARNING: lock held when returning to user space in
inet_stream_connect
https://syzkaller.appspot.com/text?tag=Log&x=109b88f2d00000

Here a reliable deadlock in n_tty_receive_buf_common changes to UAF in
n_tty_receive_buf_common:
all runs: crashed: possible deadlock in n_tty_receive_buf_common
all runs: crashed: KASAN: use-after-free Read in n_tty_receive_buf_common
https://syzkaller.appspot.com/text?tag=Log&x=106596b5e00000

Here deadlock changes to UAF and corrupted list:
run #4: crashed: possible deadlock in neigh_change_state
run #1: crashed: KASAN: use-after-free Read in neigh_mark_dead
run #2: crashed: BUG: corrupted list in ___neigh_create
run #3: crashed: BUG: corrupted list in neigh_mark_dead
https://syzkaller.appspot.com/text?tag=Log&x=1328a1e2e00000

But it also shows another interesting aspect, kernel output is
sometimes corrupted and kernel does not manage to print proper
complete oops message. In these cases we can't reliably say that it's
also a deadlock or not, we can't say anything about the crash, it's
unparsable. This problem also shows up here:
https://syzkaller.appspot.com/text?tag=Log&x=145c3a6d200000
and here
https://syzkaller.appspot.com/text?tag=Log&x=1006f18c600000
and in the second one also these 2 are mixed:
run #1: crashed: WARNING: locking bug in corrupted
run #2: crashed: WARNING: possible circular locking dependency detected

This one shows a related issue. Kernel crash output periodically
changes, and a testing system fails to parse it. This can happen if we
go back to older releases, or when we do fix bisection and test a very
new kernel with changed output:
https://syzkaller.appspot.com/text?tag=Log&x=108f4416e00000


It's also easy to imagine that a program that triggers 2 deadlocks on
different revisions, both are deadlocks, but completely different and
unrelated. And we can't look at function names as they change over
time.
All these things are almost trivial when you do bisection manually.
But for an automated system, they are quite challenging to handle.



> > > different revisions and the fact that a different bug does not give
> > > info regarding the original bug):
> > >
> >
> > Can you tell me that all these examples bisected to the commit that caused
> > the bug? Because if it did not, then you may have just proved my point ;-)
>
> I don't know now what was the result, but for a single run these were
> manifestations of the same root bug.
> E.g. see below, that's UAF in fuse_dev_do_read vs WARNING in
> request_end. request_end is also fuse. And you can see that a memory
> corruption causing a random bug type, in this case WARNING, but can as
> well be LOCKDEP.
>
>
> > > run #0: crashed: KASAN: use-after-free Read in fuse_dev_do_read
> > > run #1: crashed: WARNING in request_end
> > > run #2: crashed: KASAN: use-after-free Read in fuse_dev_do_read
> > > run #3: OK
> > > run #4: OK

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, back to index

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-13 17:33 [syzbot] possible deadlock in del_gendisk syzbot
2021-04-13 17:41 ` Steven Rostedt
2021-04-13 17:43   ` Steven Rostedt
2021-04-13 18:24     ` Dmitry Vyukov
2021-04-13 18:40       ` Steven Rostedt
2021-04-13 18:43         ` Steven Rostedt
2021-04-16  7:51           ` Bisections with different bug manifestations Dmitry Vyukov
2021-04-16 13:13             ` Steven Rostedt
2021-04-16 13:26               ` Dmitry Vyukov
2021-04-16 13:48                 ` Dmitry Vyukov

Linux-Block Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-block/0 linux-block/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-block linux-block/ https://lore.kernel.org/linux-block \
		linux-block@vger.kernel.org
	public-inbox-index linux-block

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-block


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git