* WARNING in up_write
@ 2018-04-03  2:01 syzbot
  2018-04-04 19:24 ` Dmitry Vyukov
  0 siblings, 1 reply; 16+ messages in thread
From: syzbot @ 2018-04-03  2:01 UTC (permalink / raw)
  To: linux-fsdevel, linux-kernel, syzkaller-bugs, viro

Hello,

syzbot hit the following crash on upstream commit
86bbbebac1933e6e95e8234c4f7d220c5ddd38bc (Mon Apr 2 18:47:07 2018 +0000)
Merge branch 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
syzbot dashboard link: https://syzkaller.appspot.com/bug?extid=dc5ab2babdf22ca091af

So far this crash happened 8 times on upstream.
C reproducer: https://syzkaller.appspot.com/x/repro.c?id=5688491102961664
syzkaller reproducer: https://syzkaller.appspot.com/x/repro.syz?id=5709211904245760
Raw console output: https://syzkaller.appspot.com/x/log.txt?id=5720789257027584
Kernel config: https://syzkaller.appspot.com/x/.config?id=6801295859785128502
compiler: gcc (GCC) 7.1.1 20170620

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+dc5ab2babdf22ca091af@syzkaller.appspotmail.com
It will help syzbot understand when the bug is fixed. See footer for details.
If you forward the report, please keep this part and the footer.

EXT4-fs (sda1): shut down requested (0)
------------[ cut here ]------------
DEBUG_LOCKS_WARN_ON(sem->owner != get_current())
WARNING: CPU: 1 PID: 4441 at kernel/locking/rwsem.c:133 up_write+0x1cc/0x210 kernel/locking/rwsem.c:133
Kernel panic - not syncing: panic_on_warn set ...

CPU: 1 PID: 4441 Comm: syzkaller594909 Not tainted 4.16.0+ #11
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
  __dump_stack lib/dump_stack.c:17 [inline]
  dump_stack+0x1a7/0x27d lib/dump_stack.c:53
  panic+0x1f8/0x42c kernel/panic.c:183
  __warn+0x1dc/0x200 kernel/panic.c:547
  report_bug+0x1f4/0x2b0 lib/bug.c:186
  fixup_bug.part.10+0x37/0x80 arch/x86/kernel/traps.c:178
  fixup_bug arch/x86/kernel/traps.c:247 [inline]
  do_error_trap+0x2d7/0x3e0 arch/x86/kernel/traps.c:296
  do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:315
  invalid_op+0x1b/0x40 arch/x86/entry/entry_64.S:986
RIP: 0010:up_write+0x1cc/0x210 kernel/locking/rwsem.c:133
RSP: 0018:ffff8801b349f710 EFLAGS: 00010286
RAX: dffffc0000000008 RBX: ffff8801ccc0ce40 RCX: ffffffff815ae26e
RDX: 0000000000000000 RSI: 1ffff10036693e92 RDI: 1ffff10036693e67
RBP: ffff8801b349f798 R08: fffffbfff10b0659 R09: fffffbfff10b0659
R10: ffff8801b349f708 R11: fffffbfff10b0658 R12: 1ffff10036693ee2
R13: dffffc0000000000 R14: ffff8801b349f770 R15: ffff8801ccc0ce98
  percpu_up_write+0xca/0x110 kernel/locking/percpu-rwsem.c:183
  sb_freeze_unlock fs/super.c:1390 [inline]
  thaw_super+0x1ca/0x260 fs/super.c:1524
  thaw_bdev+0x151/0x180 fs/block_dev.c:555
  ext4_shutdown fs/ext4/ioctl.c:489 [inline]
  ext4_ioctl+0x1f85/0x3e60 fs/ext4/ioctl.c:1048
  vfs_ioctl fs/ioctl.c:46 [inline]
  do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:686
  SYSC_ioctl fs/ioctl.c:701 [inline]
  SyS_ioctl+0x8f/0xc0 fs/ioctl.c:692
  do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287
  entry_SYSCALL_64_after_hwframe+0x42/0xb7
RIP: 0033:0x440109
RSP: 002b:00007fffce185d28 EFLAGS: 00000213 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 0000000000440109
RDX: 0000000020000100 RSI: 000000008004587d RDI: 0000000000000003
RBP: 00000000006ca018 R08: 000000000000000f R09: 65732f636f72702f
R10: 0000000000000000 R11: 0000000000000213 R12: 0000000000401990
R13: 0000000000401a20 R14: 0000000000000000 R15: 0000000000000000
Dumping ftrace buffer:
   (ftrace buffer empty)
Kernel Offset: disabled
Rebooting in 86400 seconds..

---
This bug is generated by a dumb bot. It may contain errors.
See https://goo.gl/tpsmEJ for details.
Direct all questions to syzkaller@googlegroups.com.

syzbot will keep track of this bug report.
If you forgot to add the Reported-by tag, once the fix for this bug is merged
into any tree, please reply to this email with:
#syz fix: exact-commit-title
If you want to test a patch for this bug, please reply with:
#syz test: git://repo/address.git branch
and provide the patch inline or as an attachment.
To mark this as a duplicate of another syzbot report, please reply with:
#syz dup: exact-subject-of-another-report
If it's a one-off invalid bug report, please reply with:
#syz invalid
Note: if the crash happens again, it will cause creation of a new bug report.
Note: all commands must start from beginning of the line in the email body.

^ permalink raw reply	[flat|nested] 16+ messages in thread
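Editorial note: the user-space register dump in the report above shows ORIG_RAX 0x10 (syscall 16, which is ioctl on x86_64) with RSI 0x8004587d as the request number. The following is a minimal sketch, assuming the generic Linux ioctl encoding used on x86 (8 bits nr, 8 bits type, 14 bits size, 2 bits direction), that splits that value into its fields. The struct and function names are made up for illustration and are not kernel APIs.

```c
#include <stdint.h>
#include <stdio.h>

/* Fields of a Linux ioctl request number, assuming the generic
 * (x86) layout: bits 0-7 nr, 8-15 type, 16-29 size, 30-31 direction. */
struct ioctl_fields {
    unsigned dir;   /* 0 = none, 1 = write, 2 = read, 3 = read/write */
    unsigned size;  /* size of the argument in bytes */
    char     type;  /* "magic" character identifying the driver/fs */
    unsigned nr;    /* command number within that type */
};

/* Split a 32-bit ioctl request number into its four fields. */
struct ioctl_fields decode_ioctl(uint32_t cmd)
{
    struct ioctl_fields f;

    f.nr   = cmd & 0xff;
    f.type = (char)((cmd >> 8) & 0xff);
    f.size = (cmd >> 16) & 0x3fff;
    f.dir  = (cmd >> 30) & 0x3;
    return f;
}

/* Print the decoded fields in a form close to the _IOR/_IOW macros. */
void print_ioctl(uint32_t cmd)
{
    static const char *dir_name[] = { "none", "write", "read", "read/write" };
    struct ioctl_fields f = decode_ioctl(cmd);

    printf("0x%08x: dir=%s type='%c' nr=%u size=%u\n",
           (unsigned)cmd, dir_name[f.dir], f.type, f.nr, f.size);
}
```

Calling print_ioctl(0x8004587d) decodes the value as a read-direction request of type 'X', number 125, with a 4-byte argument. To my knowledge that matches ext4's definition of EXT4_IOC_SHUTDOWN as _IOR('X', 125, __u32), which is consistent with the ext4_shutdown frame in the call trace.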
* Re: WARNING in up_write
  2018-04-03  2:01 WARNING in up_write syzbot
@ 2018-04-04 19:24 ` Dmitry Vyukov
  2018-04-04 19:35   ` Matthew Wilcox
  0 siblings, 1 reply; 16+ messages in thread
From: Dmitry Vyukov @ 2018-04-04 19:24 UTC
  To: syzbot, Theodore Ts'o; +Cc: linux-fsdevel, LKML, syzkaller-bugs, Al Viro

On Tue, Apr 3, 2018 at 4:01 AM, syzbot
<syzbot+dc5ab2babdf22ca091af@syzkaller.appspotmail.com> wrote:
> Hello,
>
> syzbot hit the following crash on upstream commit
> 86bbbebac1933e6e95e8234c4f7d220c5ddd38bc (Mon Apr 2 18:47:07 2018 +0000)
> Merge branch 'ras-core-for-linus' of
> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> [...]

+Ted for ext4 frames

> EXT4-fs (sda1): shut down requested (0)
> ------------[ cut here ]------------
> DEBUG_LOCKS_WARN_ON(sem->owner != get_current())
> WARNING: CPU: 1 PID: 4441 at kernel/locking/rwsem.c:133 up_write+0x1cc/0x210
> kernel/locking/rwsem.c:133
> Kernel panic - not syncing: panic_on_warn set ...
> [...]
> percpu_up_write+0xca/0x110 kernel/locking/percpu-rwsem.c:183
> sb_freeze_unlock fs/super.c:1390 [inline]
> thaw_super+0x1ca/0x260 fs/super.c:1524
> thaw_bdev+0x151/0x180 fs/block_dev.c:555
> ext4_shutdown fs/ext4/ioctl.c:489 [inline]
> ext4_ioctl+0x1f85/0x3e60 fs/ext4/ioctl.c:1048
> vfs_ioctl fs/ioctl.c:46 [inline]
> do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:686
> SYSC_ioctl fs/ioctl.c:701 [inline]
> SyS_ioctl+0x8f/0xc0 fs/ioctl.c:692
> do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287
> entry_SYSCALL_64_after_hwframe+0x42/0xb7
> [...]
* Re: WARNING in up_write 2018-04-04 19:24 ` Dmitry Vyukov @ 2018-04-04 19:35 ` Matthew Wilcox 2018-04-05 3:22 ` Theodore Y. Ts'o 0 siblings, 1 reply; 16+ messages in thread From: Matthew Wilcox @ 2018-04-04 19:35 UTC (permalink / raw) To: Dmitry Vyukov Cc: syzbot, Theodore Ts'o, linux-fsdevel, LKML, syzkaller-bugs, Al Viro On Wed, Apr 04, 2018 at 09:24:05PM +0200, Dmitry Vyukov wrote: > On Tue, Apr 3, 2018 at 4:01 AM, syzbot > <syzbot+dc5ab2babdf22ca091af@syzkaller.appspotmail.com> wrote: > > DEBUG_LOCKS_WARN_ON(sem->owner != get_current()) > > WARNING: CPU: 1 PID: 4441 at kernel/locking/rwsem.c:133 up_write+0x1cc/0x210 > > kernel/locking/rwsem.c:133 > > Kernel panic - not syncing: panic_on_warn set ... Message-Id: <1522852646-2196-1-git-send-email-longman@redhat.com> ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: WARNING in up_write
  2018-04-04 19:35 ` Matthew Wilcox
@ 2018-04-05  3:22 ` Theodore Y. Ts'o
  2018-04-05  3:24   ` Matthew Wilcox
  0 siblings, 1 reply; 16+ messages in thread
From: Theodore Y. Ts'o @ 2018-04-05  3:22 UTC
  To: Matthew Wilcox
  Cc: Dmitry Vyukov, syzbot, linux-fsdevel, LKML, syzkaller-bugs, Al Viro

On Wed, Apr 04, 2018 at 12:35:04PM -0700, Matthew Wilcox wrote:
> On Wed, Apr 04, 2018 at 09:24:05PM +0200, Dmitry Vyukov wrote:
> > On Tue, Apr 3, 2018 at 4:01 AM, syzbot
> > <syzbot+dc5ab2babdf22ca091af@syzkaller.appspotmail.com> wrote:
> > > DEBUG_LOCKS_WARN_ON(sem->owner != get_current())
> > > WARNING: CPU: 1 PID: 4441 at kernel/locking/rwsem.c:133 up_write+0x1cc/0x210
> > > kernel/locking/rwsem.c:133
> > > Kernel panic - not syncing: panic_on_warn set ...
>
> Message-Id: <1522852646-2196-1-git-send-email-longman@redhat.com>
>

We were way ahead of syzbot in this case. :-)

I reported the problem Tuesday morning:

	https://lkml.org/lkml/2018/4/4/814

And within a few hours Waiman had proposed a fix:

	https://patchwork.kernel.org/patch/10322639/

Note also that it's not ext4 specific. It can be trivially reproduced
using any one of:

	kvm-xfstests -c ext4 generic/068
	kvm-xfstests -c btrfs generic/068
	kvm-xfstests -c xfs generic/068

(Basically, any file system that supports freeze/thaw.)

Cheers,

					- Ted
* Re: WARNING in up_write 2018-04-05 3:22 ` Theodore Y. Ts'o @ 2018-04-05 3:24 ` Matthew Wilcox 2018-04-05 8:22 ` Dmitry Vyukov 2018-04-05 22:32 ` Dave Chinner 0 siblings, 2 replies; 16+ messages in thread From: Matthew Wilcox @ 2018-04-05 3:24 UTC (permalink / raw) To: Theodore Y. Ts'o, Dmitry Vyukov, syzbot, linux-fsdevel, LKML, syzkaller-bugs, Al Viro On Wed, Apr 04, 2018 at 11:22:00PM -0400, Theodore Y. Ts'o wrote: > On Wed, Apr 04, 2018 at 12:35:04PM -0700, Matthew Wilcox wrote: > > On Wed, Apr 04, 2018 at 09:24:05PM +0200, Dmitry Vyukov wrote: > > > On Tue, Apr 3, 2018 at 4:01 AM, syzbot > > > <syzbot+dc5ab2babdf22ca091af@syzkaller.appspotmail.com> wrote: > > > > DEBUG_LOCKS_WARN_ON(sem->owner != get_current()) > > > > WARNING: CPU: 1 PID: 4441 at kernel/locking/rwsem.c:133 up_write+0x1cc/0x210 > > > > kernel/locking/rwsem.c:133 > > > > Kernel panic - not syncing: panic_on_warn set ... > > > > Message-Id: <1522852646-2196-1-git-send-email-longman@redhat.com> > > > > We were way ahead of syzbot in this case. :-) Not really ... syzbot caught it Monday evening ;-) Date: Mon, 02 Apr 2018 19:01:01 -0700 From: syzbot <syzbot+dc5ab2babdf22ca091af@syzkaller.appspotmail.com> To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, syzkaller-bugs@googlegroups.com, viro@zeniv.linux.org.uk Subject: WARNING in up_write ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: WARNING in up_write 2018-04-05 3:24 ` Matthew Wilcox @ 2018-04-05 8:22 ` Dmitry Vyukov 2018-09-04 8:28 ` Dmitry Vyukov 2018-04-05 22:32 ` Dave Chinner 1 sibling, 1 reply; 16+ messages in thread From: Dmitry Vyukov @ 2018-04-05 8:22 UTC (permalink / raw) To: Matthew Wilcox Cc: Theodore Y. Ts'o, syzbot, linux-fsdevel, LKML, syzkaller-bugs, Al Viro On Thu, Apr 5, 2018 at 5:24 AM, Matthew Wilcox <willy@infradead.org> wrote: > On Wed, Apr 04, 2018 at 11:22:00PM -0400, Theodore Y. Ts'o wrote: >> On Wed, Apr 04, 2018 at 12:35:04PM -0700, Matthew Wilcox wrote: >> > On Wed, Apr 04, 2018 at 09:24:05PM +0200, Dmitry Vyukov wrote: >> > > On Tue, Apr 3, 2018 at 4:01 AM, syzbot >> > > <syzbot+dc5ab2babdf22ca091af@syzkaller.appspotmail.com> wrote: >> > > > DEBUG_LOCKS_WARN_ON(sem->owner != get_current()) >> > > > WARNING: CPU: 1 PID: 4441 at kernel/locking/rwsem.c:133 up_write+0x1cc/0x210 >> > > > kernel/locking/rwsem.c:133 >> > > > Kernel panic - not syncing: panic_on_warn set ... >> > >> > Message-Id: <1522852646-2196-1-git-send-email-longman@redhat.com> >> > >> >> We were way ahead of syzbot in this case. :-) > > Not really ... syzbot caught it Monday evening ;-) > > Date: Mon, 02 Apr 2018 19:01:01 -0700 > From: syzbot <syzbot+dc5ab2babdf22ca091af@syzkaller.appspotmail.com> > To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, > syzkaller-bugs@googlegroups.com, viro@zeniv.linux.org.uk > Subject: WARNING in up_write :) #syz fix: locking/rwsem: Add up_write_non_owner() for percpu_up_write() ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: WARNING in up_write 2018-04-05 8:22 ` Dmitry Vyukov @ 2018-09-04 8:28 ` Dmitry Vyukov 0 siblings, 0 replies; 16+ messages in thread From: Dmitry Vyukov @ 2018-09-04 8:28 UTC (permalink / raw) To: Matthew Wilcox Cc: Theodore Y. Ts'o, syzbot, linux-fsdevel, LKML, syzkaller-bugs, Al Viro On Thu, Apr 5, 2018 at 10:22 AM, Dmitry Vyukov <dvyukov@google.com> wrote: > On Thu, Apr 5, 2018 at 5:24 AM, Matthew Wilcox <willy@infradead.org> wrote: >> On Wed, Apr 04, 2018 at 11:22:00PM -0400, Theodore Y. Ts'o wrote: >>> On Wed, Apr 04, 2018 at 12:35:04PM -0700, Matthew Wilcox wrote: >>> > On Wed, Apr 04, 2018 at 09:24:05PM +0200, Dmitry Vyukov wrote: >>> > > On Tue, Apr 3, 2018 at 4:01 AM, syzbot >>> > > <syzbot+dc5ab2babdf22ca091af@syzkaller.appspotmail.com> wrote: >>> > > > DEBUG_LOCKS_WARN_ON(sem->owner != get_current()) >>> > > > WARNING: CPU: 1 PID: 4441 at kernel/locking/rwsem.c:133 up_write+0x1cc/0x210 >>> > > > kernel/locking/rwsem.c:133 >>> > > > Kernel panic - not syncing: panic_on_warn set ... >>> > >>> > Message-Id: <1522852646-2196-1-git-send-email-longman@redhat.com> >>> > >>> >>> We were way ahead of syzbot in this case. :-) >> >> Not really ... syzbot caught it Monday evening ;-) >> >> Date: Mon, 02 Apr 2018 19:01:01 -0700 >> From: syzbot <syzbot+dc5ab2babdf22ca091af@syzkaller.appspotmail.com> >> To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, >> syzkaller-bugs@googlegroups.com, viro@zeniv.linux.org.uk >> Subject: WARNING in up_write > > :) > > #syz fix: locking/rwsem: Add up_write_non_owner() for percpu_up_write() The title was later changed to: #syz fix: locking/rwsem: Add a new RWSEM_ANONYMOUSLY_OWNED flag ^ permalink raw reply [flat|nested] 16+ messages in thread
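Editorial note: the assertion text in the report, DEBUG_LOCKS_WARN_ON(sem->owner != get_current()), says that up_write() was reached by a task that is not recorded as the owner of the semaphore. The following is a single-threaded user-space model of that check, offered as a sketch only. It is not the kernel's rwsem code and not Waiman's patch, which is not shown in this thread. All types and function names here are hypothetical; toy_up_write_non_owner() merely mirrors the idea suggested by the first fix title quoted above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's task_struct. */
struct task {
    const char *name;
};

/* Toy write lock that remembers which task acquired it. */
struct toy_rwsem {
    bool write_locked;
    const struct task *owner;  /* task that took the write lock, or NULL */
    unsigned warnings;         /* number of owner mismatches seen on release */
};

/* Acquire for writing and record the acquiring task as owner. */
void toy_down_write(struct toy_rwsem *sem, const struct task *cur)
{
    sem->write_locked = true;
    sem->owner = cur;
}

/* Owner-checked release: counts a warning when the releasing task
 * differs from the recorded owner, modelling the check in the report. */
void toy_up_write(struct toy_rwsem *sem, const struct task *cur)
{
    if (sem->owner != cur)
        sem->warnings++;
    sem->owner = NULL;
    sem->write_locked = false;
}

/* Release without the owner check, for callers that legitimately
 * release a lock on behalf of another context. */
void toy_up_write_non_owner(struct toy_rwsem *sem)
{
    sem->owner = NULL;
    sem->write_locked = false;
}
```

In this model, acquiring as one task and calling toy_up_write() as another bumps the warning counter, while toy_up_write_non_owner() releases silently. Whether the merged kernel fix takes this shape is not something the thread settles; the later retitling to "Add a new RWSEM_ANONYMOUSLY_OWNED flag" suggests the final approach differed.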
* Re: WARNING in up_write 2018-04-05 3:24 ` Matthew Wilcox 2018-04-05 8:22 ` Dmitry Vyukov @ 2018-04-05 22:32 ` Dave Chinner 2018-04-06 0:13 ` Eric Biggers 1 sibling, 1 reply; 16+ messages in thread From: Dave Chinner @ 2018-04-05 22:32 UTC (permalink / raw) To: Matthew Wilcox Cc: Theodore Y. Ts'o, Dmitry Vyukov, syzbot, linux-fsdevel, LKML, syzkaller-bugs, Al Viro On Wed, Apr 04, 2018 at 08:24:54PM -0700, Matthew Wilcox wrote: > On Wed, Apr 04, 2018 at 11:22:00PM -0400, Theodore Y. Ts'o wrote: > > On Wed, Apr 04, 2018 at 12:35:04PM -0700, Matthew Wilcox wrote: > > > On Wed, Apr 04, 2018 at 09:24:05PM +0200, Dmitry Vyukov wrote: > > > > On Tue, Apr 3, 2018 at 4:01 AM, syzbot > > > > <syzbot+dc5ab2babdf22ca091af@syzkaller.appspotmail.com> wrote: > > > > > DEBUG_LOCKS_WARN_ON(sem->owner != get_current()) > > > > > WARNING: CPU: 1 PID: 4441 at kernel/locking/rwsem.c:133 up_write+0x1cc/0x210 > > > > > kernel/locking/rwsem.c:133 > > > > > Kernel panic - not syncing: panic_on_warn set ... > > > > > > Message-Id: <1522852646-2196-1-git-send-email-longman@redhat.com> > > > > > > > We were way ahead of syzbot in this case. :-) > > Not really ... syzbot caught it Monday evening ;-) Rather than arguing over who reported it first, I think that time would be better spent reflecting on why the syzbot report was completely ignored until *after* Ted diagnosed the issue independently and Waiman had already fixed it.... Clearly there is scope for improvement here. Cheers, Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: WARNING in up_write 2018-04-05 22:32 ` Dave Chinner @ 2018-04-06 0:13 ` Eric Biggers 2018-04-06 1:37 ` Theodore Y. Ts'o 2018-04-06 2:01 ` WARNING in up_write Dave Chinner 0 siblings, 2 replies; 16+ messages in thread From: Eric Biggers @ 2018-04-06 0:13 UTC (permalink / raw) To: Dave Chinner Cc: Matthew Wilcox, Theodore Y. Ts'o, Dmitry Vyukov, syzbot, linux-fsdevel, LKML, syzkaller-bugs, Al Viro On Fri, Apr 06, 2018 at 08:32:26AM +1000, Dave Chinner wrote: > On Wed, Apr 04, 2018 at 08:24:54PM -0700, Matthew Wilcox wrote: > > On Wed, Apr 04, 2018 at 11:22:00PM -0400, Theodore Y. Ts'o wrote: > > > On Wed, Apr 04, 2018 at 12:35:04PM -0700, Matthew Wilcox wrote: > > > > On Wed, Apr 04, 2018 at 09:24:05PM +0200, Dmitry Vyukov wrote: > > > > > On Tue, Apr 3, 2018 at 4:01 AM, syzbot > > > > > <syzbot+dc5ab2babdf22ca091af@syzkaller.appspotmail.com> wrote: > > > > > > DEBUG_LOCKS_WARN_ON(sem->owner != get_current()) > > > > > > WARNING: CPU: 1 PID: 4441 at kernel/locking/rwsem.c:133 up_write+0x1cc/0x210 > > > > > > kernel/locking/rwsem.c:133 > > > > > > Kernel panic - not syncing: panic_on_warn set ... > > > > > > > > Message-Id: <1522852646-2196-1-git-send-email-longman@redhat.com> > > > > > > > > > > We were way ahead of syzbot in this case. :-) > > > > Not really ... syzbot caught it Monday evening ;-) > > Rather than arguing over who reported it first, I think that time > would be better spent reflecting on why the syzbot report was > completely ignored until *after* Ted diagnosed the issue > independently and Waiman had already fixed it.... > > Clearly there is scope for improvement here. > > Cheers, > Well, ultimately a human needed to investigate the syzbot bug report to figure out what was really going on. In my view, the largest problem is that there are simply too many bugs, so many are getting ignored. 
If there were only a few bugs, then Dmitry would investigate each one and send a "real" bug report of better quality than the automated system can provide, or even send a fix directly. But in reality, on the same day this bug was reported, syzbot also found 10 other bugs, and in the previous 2 days it had found 38 more. No single person can keep up with that. You can see the current bug list, which has 172 open bugs, on the dashboard at https://syzkaller.appspot.com/. Yes, the kernel really is that broken. Though, of course most bugs are in specific modules, not the core kernel. And although quite a few of these bugs will end up to be duplicates or even already fixed, a human still has to look at each one to figure that out. (Though, I do think that syzbot should try to automatically detect when a reproducible bug was already fixed, via bisection. It would cause a few bugs to be incorrectly considered fixed, but it may be a worthwhile tradeoff.) These bugs are all over the kernel as well, so most developers don't see the big picture but rather just see a few bugs for "their" subsystem on "their" subsystem's mailing list and sometimes demand special attention. Of course, it's great when people suggest ways to improve the process. But it's not great when people just don't feel responsible for fixing bugs and wait for Someone Else to do it. I'm hoping that in the future the syzbot "team", which seems to actually be just Dmitry now, can get more resources towards helping fix the bugs. But either way, in the end Linux is a community effort. Note also that syzbot wasn't super useful in this particular case because people running xfstests came across the same bug. But, this is actually a rare case. Most syzbot bug reports have been for weird corner cases or races that no one ever thought of before, so there are no existing tests that find them. Thanks, Eric ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: WARNING in up_write
  2018-04-06  0:13 ` Eric Biggers
@ 2018-04-06  1:37 ` Theodore Y. Ts'o
  2018-04-08  6:31   ` Running syzkaller repros using kvm-xfstests Theodore Y. Ts'o
  2018-04-06  2:01 ` WARNING in up_write Dave Chinner
  1 sibling, 1 reply; 16+ messages in thread
From: Theodore Y. Ts'o @ 2018-04-06  1:37 UTC
  To: Eric Biggers
  Cc: Dave Chinner, Matthew Wilcox, Dmitry Vyukov, syzbot, linux-fsdevel, LKML, syzkaller-bugs, Al Viro

On Thu, Apr 05, 2018 at 05:13:25PM -0700, Eric Biggers wrote:
> Well, ultimately a human needed to investigate the syzbot bug report to figure
> out what was really going on. In my view, the largest problem is that there are
> simply too many bugs, so many are getting ignored. If there were only a few
> bugs, then Dmitry would investigate each one and send a "real" bug report of
> better quality than the automated system can provide, or even send a fix
> directly. But in reality, on the same day this bug was reported, syzbot also
> found 10 other bugs, and in the previous 2 days it had found 38 more. No single
> person can keep up with that. You can see the current bug list, which has 172
> open bugs, on the dashboard at https://syzkaller.appspot.com/. Yes, the kernel
> really is that broken. Though, of course most bugs are in specific modules, not
> the core kernel.

There are a lot of bugs, so it needs to be easier for humans to figure
out which ones they should care about. And not all bugs are created
equal. Some are WARN_ON's that aren't all that important. Others
will hard crash the kernel, but are not likely to be something that
can be turned into a privilege escalation attack. Some bugs are
trivially reproducible, and some take a lot more effort. Making it
easier for humans to decide which ones should be looked at first would
certainly be helpful.

For me the prioritization goes as follows.

1) Is it a regression? If it's a regression, I want to fix it fast.

2) Is it something that can be easily escalated to a privilege
   escalation attack? Again, if so, I want to fix it fast.

3) Is it going to get in the way of my development process? Things
   that trigger new xfstests failures are important, because it's how
   I detect (1).

So I ignored the Syzkaller reports this week because it's hard to
differentiate important bugs from less important ones, and after the
merge window, I want to make sure that I have not introduced any
regressions, and I also want to make sure that commits getting merged
by others have not introduced any regressions in the testing suite
that I use, which is xfstests.

This is why I've been asking for the bisection feature --- not to find
out when a bug has been fixed, but to find out when a bug has been
*introduced*. If I know that this is a bug which has recently been
introduced, especially if it has been recently introduced by commits
in my tree, or which I have recently pushed to Linus, I'm going to
care a lot more. If I can't make that determination, I'm going to
deprioritize that bug in favor of those that definitely do meet these
criteria.

It's not a matter of waiting for someone else to fix it (although I
won't complain if someone does :-). It's that I'm overloaded, and I
have to prioritize the work that I do. If syzbot reports are hard to
parse or hard to prioritize, then I may end up prioritizing other work
as being more important. Sorry, but that's just the way that it is.

Note that I haven't just been complaining about it. I've been working
on ways so that the gce-xfstests and kvm-xfstests test appliances can
more easily be used to work on Syzbot reports. If I can make myself
more efficient, or help other people be more efficient, that's
arguably more important than trying to fix some of the 174 currently
open Syzbot issues --- unless you can tell me that certain ones are
super urgent because they (for example) result in a CVSS score > 8.

Cheers,

					- Ted
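Editorial note: the bisection Ted asks for, finding the commit that *introduced* a bug, is a binary search over history. The following is a minimal sketch, assuming a linear history indexed from 0 and a reproducer that answers deterministically for every commit. Neither assumption always holds for real kernel bugs, which is part of why Eric expects some wrong results. The function names and the probe callback are hypothetical and do not describe any syzbot interface.

```c
#include <stddef.h>

/* Probe callback: returns nonzero if the bug reproduces at this commit. */
typedef int (*is_bad_fn)(size_t commit_index, void *ctx);

/* Find the first bad commit in (good, bad].
 * Preconditions: good < bad, commit `good` does not show the bug,
 * commit `bad` does, and every commit after the first bad one is bad. */
size_t first_bad_commit(size_t good, size_t bad, is_bad_fn is_bad, void *ctx)
{
    while (bad - good > 1) {
        size_t mid = good + (bad - good) / 2;

        if (is_bad(mid, ctx))
            bad = mid;   /* bug already present: look earlier */
        else
            good = mid;  /* bug not yet present: look later */
    }
    return bad;
}

/* Toy probe for illustration: the bug "exists" from commit *ctx onward. */
int bad_from_threshold(size_t commit_index, void *ctx)
{
    return commit_index >= *(const size_t *)ctx;
}
```

Each probe halves the remaining range, so a window of N commits needs about log2(N) kernel builds and reproducer runs. The result tells a maintainer whether the offending commit came through their own tree, which is the prioritization signal described above.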
* Running syzkaller repros using kvm-xfstests
  2018-04-06  1:37 ` Theodore Y. Ts'o
@ 2018-04-08  6:31 ` Theodore Y. Ts'o
  2018-04-08 13:18   ` Dmitry Vyukov
  0 siblings, 1 reply; 16+ messages in thread
From: Theodore Y. Ts'o @ 2018-04-08  6:31 UTC
  To: Eric Biggers, Dave Chinner, Matthew Wilcox, Dmitry Vyukov, linux-fsdevel, LKML, syzkaller, Al Viro

On Thu, Apr 05, 2018 at 09:37:41PM -0400, Theodore Y. Ts'o wrote:
> Note that I haven't just been complaining about it. I've been working
> on ways so that the gce-xfstests and kvm-xfstests test appliances can
> more easily be used to work on Syzbot reports. If I can make myself
> more efficient, or help other people be more efficient, that's
> arguably more important than trying to fix some of the 174 currently
> open Syzbot issues --- unless you can tell me that certain ones are
> super urgent because they (for example) result in CVSS score > 8.

I've got an initial version of this working for kvm-xfstests. To try
it out, grab the latest version of xfstests-bld from [1], and the
kvm-xfstests image from [2]. For people who have never tried using
kvm-xfstests, see [3].

[1] https://github.com/tytso/xfstests-bld
[2] https://www.kernel.org/pub/linux/kernel/people/tytso/kvm-xfstests/testing/root_fs.img.x86_64
[3] https://github.com/tytso/xfstests-bld/blob/master/Documentation/kvm-quickstart.md

If you're interested, please try it out, and send me comments.

Sample usage:

	kvm-xfstests syz <path/to/repro.{c,syz}>
	kvm-xfstests syz <URL to repro.{c,syz}>

Example run:

% kvm-xfstests syz https://syzkaller.appspot.com/x/repro.syz?id=5709211904245760
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100   533  100   533    0     0   2157      0 --:--:-- --:--:-- --:--:--  2157
Saved downloaded copy at /tmp/tytso-downloaded-repro.syz
Networking disabled.
KERNEL: kernel 4.16.0-xfstests-09576-g38c23685b273 #134 SMP Sun Apr 8 01:36:01 EDT 2018 x86_64 FSTESTVER: e2fsprogs v1.43.6-85-g7595699d0 (Wed, 6 Sep 2017 22:04:14 -0400) FSTESTVER: fio fio-3.2 (Fri, 3 Nov 2017 15:23:49 -0600) FSTESTVER: quota 59b280e (Mon, 5 Feb 2018 16:48:22 +0100) FSTESTVER: stress-ng 977ae35 (Wed, 6 Sep 2017 23:45:03 -0400) FSTESTVER: syzkaller 66f22a7f (Sat, 7 Apr 2018 14:02:03 +0200) FSTESTVER: xfsprogs v4.15.1 (Mon, 26 Feb 2018 19:50:56 -0600) FSTESTVER: xfstests-bld 3be913e (Sun, 8 Apr 2018 01:19:21 -0400) FSTESTVER: xfstests linux-v3.8-1925-g62cc6d02 (Fri, 23 Mar 2018 22:26:41 -0400) FSTESTCFG: "all" FSTESTSET: "syz/001" FSTESTEXC: "" FSTESTOPT: "aex" MNTOPTS: "" CPUS: "2" MEM: "1684.65" total used free shared buff/cache available Mem: 1684 140 1479 8 65 1507 Swap: 0 0 0 BEGIN TEST 4k (1 test): Ext4 4k block Sun Apr 8 01:49:02 EDT 2018 DEVICE: /dev/vdd EXT_MKFS_OPTIONS: -b 4096 EXT_MOUNT_OPTIONS: -o block_validity FSTYP -- ext4 PLATFORM -- Linux/x86_64 kvm-xfstests 4.16.0-xfstests-09576-g38c23685b273 MKFS_OPTIONS -- -b 4096 /dev/vdc MOUNT_OPTIONS -- -o acl,user_xattr -o block_validity /dev/vdc /vdc syz/001 [01:49:04][ 22.859794] run fstests syz/001 at 2018-04-08 01:49:04 [ 23.385195] EXT4-fs (vdc): mounted filesystem with ordered data mode. 
Opts: acl,user_xattr,block_validity [ 23.797611] EXT4-fs (vda): shut down requested (0) [ 23.855759] ------------[ cut here ]------------ [ 23.860823] DEBUG_LOCKS_WARN_ON(sem->owner != get_current()) [ 23.860881] WARNING: CPU: 1 PID: 1332 at /usr/projects/linux/ext4/kernel/locking/rwsem.c:133 up_write+0x113/0x150 [ 23.876121] CPU: 1 PID: 1332 Comm: syz-executor0 Not tainted 4.16.0-xfstests-09576-g38c23685b273 #134 [ 23.880836] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014 [ 23.884080] RIP: 0010:up_write+0x113/0x150 [ 23.885873] RSP: 0018:ffff88005e0b7a68 EFLAGS: 00010286 [ 23.887902] RAX: dffffc0000000008 RBX: ffff880066069038 RCX: ffffffff9002f2ce [ 23.890392] RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000293 [ 23.892200] RBP: ffff8800660690a0 R08: fffffbfff245d71d R09: fffffbfff245d71d [ 23.894877] R10: ffff88007ffca050 R11: fffffbfff245d71c R12: ffff880066068ce0 [ 23.897244] R13: ffff880066068a30 R14: ffff8800660691e0 R15: ffffffff902fe397 [ 23.899597] FS: 000000000275c940(0000) GS:ffff88006d600000(0000) knlGS:0000000000000000 [ 23.902104] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 23.903808] CR2: 00000000006dbb18 CR3: 0000000067c7c000 CR4: 00000000000006e0 [ 23.905954] Call Trace: [ 23.906721] percpu_up_write+0x4c/0x60 [ 23.907868] thaw_super+0x1c4/0x250 [ 23.908943] thaw_bdev+0x14a/0x170 [ 23.909996] ext4_ioctl+0x1fd8/0x39a0 [ 23.911114] ? alloc_set_pte+0x66d/0xe50 [ 23.912318] ? ext4_ioctl_setflags+0x600/0x600 [ 23.913672] ? drop_futex_key_refs.isra.3+0x65/0xb0 [ 23.915106] ? futex_wake+0x14a/0x400 [ 23.916242] ? futex_wait_restart+0x1e0/0x1e0 [ 23.917589] ? lock_contended+0xd30/0xd30 [ 23.918805] ? alloc_set_pte+0x330/0xe50 [ 23.920025] ? kvm_sched_clock_read+0x21/0x30 [ 23.921369] ? sched_clock+0x5/0x10 [ 23.922442] ? sched_clock_cpu+0x18/0x180 [ 23.923691] ? do_futex+0x3ab/0xa90 [ 23.924783] ? exit_robust_list+0x240/0x240 [ 23.926076] ? do_raw_spin_unlock+0x54/0x220 [ 23.927388] ? 
ext4_ioctl_setflags+0x600/0x600 [ 23.928758] do_vfs_ioctl+0x18b/0xfb0 [ 23.929893] ? ioctl_preallocate+0x1a0/0x1a0 [ 23.931204] ? SyS_futex+0x1c9/0x270 [ 23.932304] ? SyS_futex+0x1d2/0x270 [ 23.933412] ? do_futex+0xa90/0xa90 [ 23.934502] ? up_read+0x1c/0x110 [ 23.935532] ksys_ioctl+0x42/0x80 [ 23.936564] SyS_ioctl+0x23/0x30 [ 23.937567] ? ksys_ioctl+0x80/0x80 [ 23.938649] do_syscall_64+0x1a0/0x640 [ 23.939813] entry_SYSCALL_64_after_hwframe+0x42/0xb7 [ 23.941360] RIP: 0033:0x455289 [ 23.942298] RSP: 002b:00007ffea24780d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [ 23.944588] RAX: ffffffffffffffda RBX: 000000000070bea0 RCX: 0000000000455289 [ 23.946762] RDX: 0000000020000100 RSI: 000000008004587d RDI: 0000000000000003 [ 23.948924] RBP: 000000000275c914 R08: 0000000000000000 R09: 0000000000000000 [ 23.951102] R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff [ 23.953287] R13: 00000000000001c5 R14: 00000000006dbb18 R15: 00000000006d90a0 [ 23.955435] Code: 83 e0 07 83 c0 03 38 d0 7c 04 84 d2 75 48 8b 05 14 d0 c2 03 85 c0 75 86 48 c7 c6 60 2c c6 91 48 c7 c7 20 2c c6 91 e8 ad da f1 ff <0f> 0b e9 6c ff ff ff e8 01 a1 2d 00 e9 2a ff ff ff 48 89 ef e8 [ 23.960064] ---[ end trace f542ead798faa3a9 ]--- .... ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Running syzkaller repros using kvm-xfstests 2018-04-08 6:31 ` Running syzkaller repros using kvm-xfstests Theodore Y. Ts'o @ 2018-04-08 13:18 ` Dmitry Vyukov 2018-04-08 18:02 ` Theodore Y. Ts'o 0 siblings, 1 reply; 16+ messages in thread From: Dmitry Vyukov @ 2018-04-08 13:18 UTC (permalink / raw) To: Theodore Y. Ts'o, Eric Biggers, Dave Chinner, Matthew Wilcox, Dmitry Vyukov, linux-fsdevel, LKML, syzkaller, Al Viro On Sun, Apr 8, 2018 at 8:31 AM, Theodore Y. Ts'o <tytso@mit.edu> wrote: > On Thu, Apr 05, 2018 at 09:37:41PM -0400, Theodore Y. Ts'o wrote: >> Note that I haven't just been complaining about it. I've been working >> on ways so that the gce-xfstests and kvm-xfstests test appliances can >> more easily be used to work on Syzbot reports. If I can make myself >> more efficient, or help other people be more efficient, that's >> arguably more important than trying to fix some of the 174 currently >> open Syzbot issues --- unless you can tell me that certain ones are >> super urgent because they (for example) result in CVSS score > 8. > > I've got an initial version of this working for kvm-xfstests. To try > it out, grab the latest version of xfstests-bld from [1], and the > kvm-xfstests image from [2]. For people who have never tried using > kvm-xfstests, see [3]. > > [1] https://github.com/tytso/xfstests-bld > [2] https://www.kernel.org/pub/linux/kernel/people/tytso/kvm-xfstests/testing/root_fs.img.x86_64 > [3] https://github.com/tytso/xfstests-bld/blob/master/Documentation/kvm-quickstart.md > > If you're interested, please try it out, and send me comments. > > Sample usage: > > kvm-xfstest syz <path/to/repro.{c,syz}> > kvm-xfstest syz <URL to repro.{c,syz}> > > Example run: > > % kvm-xfstests syz https://syzkaller.appspot.com/x/repro.syz?id=5709211904245760 /\/\/\/\/\/\/\/\ Nice! But note that syzkaller is under active development, so pre-canned binaries may not always work. 
A mismatched binary may not understand all syscalls, may fail to parse the program, interpret arguments differently, execute the program differently, set up a different environment for the test, etc. Now, a C program captures all of this, because the code that transforms syzkaller programs into C is versioned along with the rest of the system. Strictly speaking, for syzkaller reproducers one needs to use the exact syzkaller revision listed along with the reproducer; see for example: https://syzkaller.appspot.com/bug?id=3fb9c4777053e79a6d2a65ac3738664c87629a21 The "#syz test" syzbot command does this. Using a different syzkaller revision may or may not work. > % Total % Received % Xferd Average Speed Time Time Time Current > Dload Upload Total Spent Left Speed > 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0100 533 100 533 0 0 2157 0 --:--:-- --:--:-- --:--:-- 2157 > Saved downloaded copy at /tmp/tytso-downloaded-repro.syz > Networking disabled. > KERNEL: kernel 4.16.0-xfstests-09576-g38c23685b273 #134 SMP Sun Apr 8 01:36:01 EDT 2018 x86_64 > FSTESTVER: e2fsprogs v1.43.6-85-g7595699d0 (Wed, 6 Sep 2017 22:04:14 -0400) > FSTESTVER: fio fio-3.2 (Fri, 3 Nov 2017 15:23:49 -0600) > FSTESTVER: quota 59b280e (Mon, 5 Feb 2018 16:48:22 +0100) > FSTESTVER: stress-ng 977ae35 (Wed, 6 Sep 2017 23:45:03 -0400) > FSTESTVER: syzkaller 66f22a7f (Sat, 7 Apr 2018 14:02:03 +0200) > FSTESTVER: xfsprogs v4.15.1 (Mon, 26 Feb 2018 19:50:56 -0600) > FSTESTVER: xfstests-bld 3be913e (Sun, 8 Apr 2018 01:19:21 -0400) > FSTESTVER: xfstests linux-v3.8-1925-g62cc6d02 (Fri, 23 Mar 2018 22:26:41 -0400) > FSTESTCFG: "all" > FSTESTSET: "syz/001" > FSTESTEXC: "" > FSTESTOPT: "aex" > MNTOPTS: "" > CPUS: "2" > MEM: "1684.65" > total used free shared buff/cache available > Mem: 1684 140 1479 8 65 1507 > Swap: 0 0 0 > BEGIN TEST 4k (1 test): Ext4 4k block Sun Apr 8 01:49:02 EDT 2018 > DEVICE: /dev/vdd > EXT_MKFS_OPTIONS: -b 4096 > EXT_MOUNT_OPTIONS: -o block_validity > FSTYP -- ext4 > PLATFORM -- Linux/x86_64 kvm-xfstests
4.16.0-xfstests-09576-g38c23685b273 > MKFS_OPTIONS -- -b 4096 /dev/vdc > MOUNT_OPTIONS -- -o acl,user_xattr -o block_validity /dev/vdc /vdc > > syz/001 [01:49:04][ 22.859794] run fstests syz/001 at 2018-04-08 01:49:04 > [ 23.385195] EXT4-fs (vdc): mounted filesystem with ordered data mode. Opts: acl,user_xattr,block_validity > [ 23.797611] EXT4-fs (vda): shut down requested (0) > [ 23.855759] ------------[ cut here ]------------ > [ 23.860823] DEBUG_LOCKS_WARN_ON(sem->owner != get_current()) > [ 23.860881] WARNING: CPU: 1 PID: 1332 at /usr/projects/linux/ext4/kernel/locking/rwsem.c:133 up_write+0x113/0x150 > [ 23.876121] CPU: 1 PID: 1332 Comm: syz-executor0 Not tainted 4.16.0-xfstests-09576-g38c23685b273 #134 > [ 23.880836] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014 > [ 23.884080] RIP: 0010:up_write+0x113/0x150 > [ 23.885873] RSP: 0018:ffff88005e0b7a68 EFLAGS: 00010286 > [ 23.887902] RAX: dffffc0000000008 RBX: ffff880066069038 RCX: ffffffff9002f2ce > [ 23.890392] RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000293 > [ 23.892200] RBP: ffff8800660690a0 R08: fffffbfff245d71d R09: fffffbfff245d71d > [ 23.894877] R10: ffff88007ffca050 R11: fffffbfff245d71c R12: ffff880066068ce0 > [ 23.897244] R13: ffff880066068a30 R14: ffff8800660691e0 R15: ffffffff902fe397 > [ 23.899597] FS: 000000000275c940(0000) GS:ffff88006d600000(0000) knlGS:0000000000000000 > [ 23.902104] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 23.903808] CR2: 00000000006dbb18 CR3: 0000000067c7c000 CR4: 00000000000006e0 > [ 23.905954] Call Trace: > [ 23.906721] percpu_up_write+0x4c/0x60 > [ 23.907868] thaw_super+0x1c4/0x250 > [ 23.908943] thaw_bdev+0x14a/0x170 > [ 23.909996] ext4_ioctl+0x1fd8/0x39a0 > [ 23.911114] ? alloc_set_pte+0x66d/0xe50 > [ 23.912318] ? ext4_ioctl_setflags+0x600/0x600 > [ 23.913672] ? drop_futex_key_refs.isra.3+0x65/0xb0 > [ 23.915106] ? futex_wake+0x14a/0x400 > [ 23.916242] ? futex_wait_restart+0x1e0/0x1e0 > [ 23.917589] ? 
lock_contended+0xd30/0xd30 > [ 23.918805] ? alloc_set_pte+0x330/0xe50 > [ 23.920025] ? kvm_sched_clock_read+0x21/0x30 > [ 23.921369] ? sched_clock+0x5/0x10 > [ 23.922442] ? sched_clock_cpu+0x18/0x180 > [ 23.923691] ? do_futex+0x3ab/0xa90 > [ 23.924783] ? exit_robust_list+0x240/0x240 > [ 23.926076] ? do_raw_spin_unlock+0x54/0x220 > [ 23.927388] ? ext4_ioctl_setflags+0x600/0x600 > [ 23.928758] do_vfs_ioctl+0x18b/0xfb0 > [ 23.929893] ? ioctl_preallocate+0x1a0/0x1a0 > [ 23.931204] ? SyS_futex+0x1c9/0x270 > [ 23.932304] ? SyS_futex+0x1d2/0x270 > [ 23.933412] ? do_futex+0xa90/0xa90 > [ 23.934502] ? up_read+0x1c/0x110 > [ 23.935532] ksys_ioctl+0x42/0x80 > [ 23.936564] SyS_ioctl+0x23/0x30 > [ 23.937567] ? ksys_ioctl+0x80/0x80 > [ 23.938649] do_syscall_64+0x1a0/0x640 > [ 23.939813] entry_SYSCALL_64_after_hwframe+0x42/0xb7 > [ 23.941360] RIP: 0033:0x455289 > [ 23.942298] RSP: 002b:00007ffea24780d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 > [ 23.944588] RAX: ffffffffffffffda RBX: 000000000070bea0 RCX: 0000000000455289 > [ 23.946762] RDX: 0000000020000100 RSI: 000000008004587d RDI: 0000000000000003 > [ 23.948924] RBP: 000000000275c914 R08: 0000000000000000 R09: 0000000000000000 > [ 23.951102] R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff > [ 23.953287] R13: 00000000000001c5 R14: 00000000006dbb18 R15: 00000000006d90a0 > [ 23.955435] Code: 83 e0 07 83 c0 03 38 d0 7c 04 84 d2 75 48 8b 05 14 d0 c2 03 85 c0 75 86 48 c7 c6 60 2c c6 91 48 c7 c7 20 2c c6 91 e8 ad da f1 ff <0f> 0b e9 6c ff ff ff e8 01 a1 2d 00 e9 2a ff ff ff 48 89 ef e8 > [ 23.960064] ---[ end trace f542ead798faa3a9 ]--- > .... ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Running syzkaller repros using kvm-xfstests 2018-04-08 13:18 ` Dmitry Vyukov @ 2018-04-08 18:02 ` Theodore Y. Ts'o 2018-04-09 9:28 ` Dmitry Vyukov 0 siblings, 1 reply; 16+ messages in thread From: Theodore Y. Ts'o @ 2018-04-08 18:02 UTC (permalink / raw) To: Dmitry Vyukov Cc: Eric Biggers, Dave Chinner, Matthew Wilcox, linux-fsdevel, LKML, syzkaller, Al Viro On Sun, Apr 08, 2018 at 03:18:39PM +0200, Dmitry Vyukov wrote: > > But note that syzkaller is under active development, so pre-canned > binaries may not always work. Mismatching binary may not understand > all syscalls, fail to parse program, interpret arguments differently, > execute program differently, setup a different environment for the > test, etc. Now a C program captures all of this, because code that > transforms syzkaller programs into C is versioned along with the rest > of the system. > Strictly saying, for syzkaller reproducers one needs to use the exact > syzkaller revision listed along with the reproducer, see for example: > https://syzkaller.appspot.com/bug?id=3fb9c4777053e79a6d2a65ac3738664c87629a21 > The "#syz test" syzbot command does this. Using a different syzkaller > revision may or may not work. Thanks for the warning. I assume you try to maintain backwards compatibility where possible? It might be nice if you could add some kind of explicit versioning scheme --- perhaps with a major/minor version scheme where the syz-executor needs to have the same major number, and a minor number >= the minor version number of the test? One of the reasons why the C program is not so useful for me is that in the Repeat:true case, the C program repeats forever. So for example, I translate Repeat:true to -repeat=100. See: https://github.com/tytso/xfstests-bld/blob/master/kvm-xfstests/test-appliance/files/usr/local/bin/run-syz I suppose I could just abort the test after N minutes and assume if the kernel hasn't crashed, that it's probably not going to.
But some way for the C program to be given an argument or an environment variable to control the number of loops it will run might be useful. And some kind of hint as to how reliable the repro would be (e.g., some indication that you should try to run it at least N times to get a failure at least 95% of the time). - Ted ^ permalink raw reply [flat|nested] 16+ messages in thread
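As a rough illustration of the "run it at least N times" hint asked for above, here is a minimal sketch. It assumes that each run of a reproducer triggers the bug independently with some fixed probability p; that is a modelling assumption, and nothing in this thread states such a figure for any reproducer. Under that assumption, the chance of at least one failure in N runs is 1 - (1 - p)^N, so the smallest N reaching a target confidence follows directly. The names `runs_needed` and `p_single` are hypothetical and not part of syzkaller or xfstests-bld.

```python
import math


def runs_needed(p_single, confidence=0.95):
    """Smallest N such that 1 - (1 - p_single)**N >= confidence.

    p_single is the assumed per-run probability of hitting the bug
    (hypothetical input; it would have to be estimated from real runs).
    Runs are assumed to be independent of each other.
    """
    if not 0.0 < p_single <= 1.0:
        raise ValueError("p_single must be in (0, 1]")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be in (0, 1)")
    if p_single == 1.0:
        # A repro that always fires needs a single run.
        return 1
    # Solve (1 - p)^N <= 1 - confidence for N and round up.
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_single))
```

For a reproducer that fires on about half of its runs, `runs_needed(0.5)` gives 5; for one that fires on about one run in ten, `runs_needed(0.1)` gives 29. Real reproducers may not behave independently from run to run, so such a number would only be a starting point.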
* Re: Running syzkaller repros using kvm-xfstests 2018-04-08 18:02 ` Theodore Y. Ts'o @ 2018-04-09 9:28 ` Dmitry Vyukov 0 siblings, 0 replies; 16+ messages in thread From: Dmitry Vyukov @ 2018-04-09 9:28 UTC (permalink / raw) To: Theodore Y. Ts'o, Dmitry Vyukov, Eric Biggers, Dave Chinner, Matthew Wilcox, linux-fsdevel, LKML, syzkaller, Al Viro On Sun, Apr 8, 2018 at 8:02 PM, Theodore Y. Ts'o <tytso@mit.edu> wrote: > On Sun, Apr 08, 2018 at 03:18:39PM +0200, Dmitry Vyukov wrote: >> >> But note that syzkaller is under active development, so pre-canned >> binaries may not always work. Mismatching binary may not understand >> all syscalls, fail to parse program, interpret arguments differently, >> execute program differently, setup a different environment for the >> test, etc. Now a C program captures all of this, because code that >> transforms syzkaller programs into C is versioned along with the rest >> of the system. >> Strictly saying, for syzkaller reproducers one needs to use the exact >> syzkaller revision listed along with the reproducer, see for example: >> https://syzkaller.appspot.com/bug?id=3fb9c4777053e79a6d2a65ac3738664c87629a21 >> The "#syz test" syzbot command does this. Using a different syzkaller >> revision may or may not work. > > Thanks for the warning. I assume you try to maintain backwards > compatibility where possible? It might be nice if you could add some > kind of explicit versioning scheme --- perhaps with a major/minor > version scheme where the syz-executor needs to have the same major > number, and a minor number >= the minor version number of the test? We try not to break backwards compatibility without a reason. Preserving full backwards compatibility within a single binary is extremely hard. It's like asking the kernel to support every version that ever existed of every in-memory data structure, along with all of the non-functional aspects (like any fluctuations in performance).
If one could give us several additional FTEs for this, then it might be doable. But even then I don't think it's the best use of the FTE time, because the version control system already gives us exactly this -- exact behavior on a past revision. On top of this, the backward compatibility support will surely have bugs too. In the best case we will spend time debugging why a new version does not precisely model the behavior of an old version. In the worst case you will test something and think that the bug is fixed, but it's just that the new version does not behave exactly as the old one. On top of this, this still does not give us forward compatibility, something that one wants in the majority of cases with an old pre-canned binary. On top of this, the binaries will be huge because they will need to capture exact versions of all system call descriptions (and the simplest option for this is keeping copies of all versions). An 87 MiB image definitely won't be enough to hold this; the binary will be somewhere between gigs and tens of gigs. > One of the reasons why the C program is not so useful for me is that > in the Repeat:true case, the C program repeats forever. So for > example, I translate Repeat:true to -repeat=100. See: > > https://github.com/tytso/xfstests-bld/blob/master/kvm-xfstests/test-appliance/files/usr/local/bin/run-syz > > I suppose I could just abort the test after N minutes and assume if > the kernel hasn't crashed, that it's probably not going to. But some > way that the C program can be given an argument or an environment > variable to control how number of loops it will run might be useful. > And some kind of hint as how reliable the repro would be (e.g,. some > indication that you should try to run it at least N times to get a > failure at least 95% of the time). I think: timeout -s KILL 450 ./a.out is the solution. Repro logic runs programs for at most 7.5 minutes, so 450 seconds should be good. Re the env var: there are opposing views too.
People complain that syzkaller C repros are a mess (which they are). Currently they contain the minimal amount of code needed to reproduce the bugs. If we also start stuffing some aux logic into them, it won't be helpful. The timeout command looks just as good. ^ permalink raw reply [flat|nested] 16+ messages in thread
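For test harnesses that drive reproducers from a script rather than a shell, the `timeout -s KILL 450 ./a.out` approach suggested above could be approximated as in the sketch below. It is only an illustration under stated assumptions: it is POSIX-only, the helper name `run_with_deadline` is hypothetical, and killing the whole process group (so that children forked by the reproducer also stop) is a design choice of this sketch rather than something the thread specifies. The 450-second default is the 7.5-minute bound quoted above.

```python
import os
import signal
import subprocess


def run_with_deadline(cmd, seconds=450):
    """Run cmd and SIGKILL its process group if it outlives the deadline.

    Returns ("exited", returncode) if the command finished on its own,
    or ("timeout", None) if it had to be killed.
    """
    # start_new_session=True puts the child in its own session and process
    # group, so the group id equals the child's pid.
    proc = subprocess.Popen(cmd, start_new_session=True)
    try:
        return ("exited", proc.wait(timeout=seconds))
    except subprocess.TimeoutExpired:
        # Kill the reproducer and anything it forked, then reap the child.
        os.killpg(proc.pid, signal.SIGKILL)
        proc.wait()
        return ("timeout", None)
```

A caller would use something like `run_with_deadline(["./a.out"])` and treat a `"timeout"` result, with the kernel still alive, as "probably did not reproduce". Detecting the kernel crash itself has to happen outside this helper, for example by watching the console log.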
* Re: WARNING in up_write 2018-04-06 0:13 ` Eric Biggers 2018-04-06 1:37 ` Theodore Y. Ts'o @ 2018-04-06 2:01 ` Dave Chinner 2018-04-13 18:25 ` Dmitry Vyukov 1 sibling, 1 reply; 16+ messages in thread From: Dave Chinner @ 2018-04-06 2:01 UTC (permalink / raw) To: Eric Biggers Cc: Matthew Wilcox, Theodore Y. Ts'o, Dmitry Vyukov, syzbot, linux-fsdevel, LKML, syzkaller-bugs, Al Viro On Thu, Apr 05, 2018 at 05:13:25PM -0700, Eric Biggers wrote: > On Fri, Apr 06, 2018 at 08:32:26AM +1000, Dave Chinner wrote: > > On Wed, Apr 04, 2018 at 08:24:54PM -0700, Matthew Wilcox wrote: > > > On Wed, Apr 04, 2018 at 11:22:00PM -0400, Theodore Y. Ts'o wrote: > > > > On Wed, Apr 04, 2018 at 12:35:04PM -0700, Matthew Wilcox wrote: > > > > > On Wed, Apr 04, 2018 at 09:24:05PM +0200, Dmitry Vyukov wrote: > > > > > > On Tue, Apr 3, 2018 at 4:01 AM, syzbot > > > > > > <syzbot+dc5ab2babdf22ca091af@syzkaller.appspotmail.com> wrote: > > > > > > > DEBUG_LOCKS_WARN_ON(sem->owner != get_current()) > > > > > > > WARNING: CPU: 1 PID: 4441 at kernel/locking/rwsem.c:133 up_write+0x1cc/0x210 > > > > > > > kernel/locking/rwsem.c:133 > > > > > > > Kernel panic - not syncing: panic_on_warn set ... > > > > > > > > > > Message-Id: <1522852646-2196-1-git-send-email-longman@redhat.com> > > > > > > > > > > > > > We were way ahead of syzbot in this case. :-) > > > > > > Not really ... syzbot caught it Monday evening ;-) > > > > Rather than arguing over who reported it first, I think that time > > would be better spent reflecting on why the syzbot report was > > completely ignored until *after* Ted diagnosed the issue > > independently and Waiman had already fixed it.... > > > > Clearly there is scope for improvement here. > > > > Cheers, > > > > Well, ultimately a human needed to investigate the syzbot bug report to figure > out what was really going on. In my view, the largest problem is that there are > simply too many bugs, so many are getting ignored. Well, yeah. 
And when there's too many bugs, looking at the ones people are actually hitting tends to take precedence over those reported by a bot an image problem... > If there were only a few bugs, then Dmitry would investigate each > one and send a "real" bug report of better quality than the > automated system can provide, or even send a fix directly. But in > reality, on the same day this bug was reported, syzbot also found > 10 other bugs, and in the previous 2 days it had found 38 more. > No single person can keep up with that. And this is precisely why people turn around and ask the syzbot developers to do things that make it easier for them to diagnose the problems syzbot reports. > You can see the current > bug list, which has 172 open bugs, on the dashboard at > https://syzkaller.appspot.com/. Is that all? That's *nothing*. > Yes, the kernel really is that > broken. Actually, that tells me the kernel is a hell of a lot better than my experience leads me to believe it is. I'd have expected thousands of bugs, even tens of thousands of bugs given how many issues we deal with in individual subsystems on a day to day basis. > And although quite a few of these bugs will end up to be > duplicates or even already fixed, a human still has to look at > each one to figure that out. (Though, I do think that syzbot > should try to automatically detect when a reproducible bug was > already fixed, via bisection. It would cause a few bugs to be > incorrectly considered fixed, but it may be a worthwhile > tradeoff.) > > These bugs are all over the kernel as well, so most developers > don't see the big picture but rather just see a few bugs for > "their" subsystem on "their" subsystem's mailing list and > sometimes demand special attention. Of course, it's great when > people suggest ways to improve the process. That's not the response I got.... > But it's not great > when people just don't feel responsible for fixing bugs and wait > for Someone Else to do it.
The excessive cross posting of the reports is one of the reasons people think someone else will take care of it. i.e. "Oh, that looks VFS, that went to -fsdevel, I don't need to look at it".... Put simply: if you're mounting an XFS filesystem image and something goes bang, then it should be reported to the XFS list. It does not need to be cross posted to LKML, -fsdevel, 10 individual developers, etc. If it's not an XFS problem, then the XFS developers will CC the relevant lists as needed. > I'm hoping that in the future the syzbot "team", which seems to > actually be just Dmitry now, can get more resources towards > helping fix the bugs. But either way, in the end Linux is a > community effort. We don't really need help fixing the bugs - we need help making it easier to *find the bug* the bot tripped over. That's what the syzbot team needs to focus on, not tell people that what they got is all they are going to get. > Note also that syzbot wasn't super useful in this particular case > because people running xfstests came across the same bug. But, > this is actually a rare case. Most syzbot bug reports have been > for weird corner cases or races that no one ever thought of > before, so there are no existing tests that find them. Which is exactly what these whacky "mount a filesystem fragment" tests it is now doing are exercising. Finding the cause of corruption related crashes is not easy and takes time. Having the bot developers add something to the bot that will save the developer looking at the problem 10 minutes of setup time makes a huge difference to the effort required to find the problem. The tool is useless if people find it too hard to make sense of the bug reports (*cough* lockdep *cough*) or perform triage of the report. If we want to get the bugs fixed faster, we have to make the reports from automated tools contain the exact information the developer needs to solve the problem. Cheers, Dave. 
-- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: WARNING in up_write 2018-04-06 2:01 ` WARNING in up_write Dave Chinner @ 2018-04-13 18:25 ` Dmitry Vyukov 0 siblings, 0 replies; 16+ messages in thread From: Dmitry Vyukov @ 2018-04-13 18:25 UTC (permalink / raw) To: Dave Chinner Cc: Eric Biggers, Matthew Wilcox, Theodore Y. Ts'o, syzbot, linux-fsdevel, LKML, syzkaller-bugs, Al Viro, syzkaller On Fri, Apr 6, 2018 at 4:01 AM, Dave Chinner <david@fromorbit.com> wrote: > On Thu, Apr 05, 2018 at 05:13:25PM -0700, Eric Biggers wrote: >> On Fri, Apr 06, 2018 at 08:32:26AM +1000, Dave Chinner wrote: >> > On Wed, Apr 04, 2018 at 08:24:54PM -0700, Matthew Wilcox wrote: >> > > On Wed, Apr 04, 2018 at 11:22:00PM -0400, Theodore Y. Ts'o wrote: >> > > > On Wed, Apr 04, 2018 at 12:35:04PM -0700, Matthew Wilcox wrote: >> > > > > On Wed, Apr 04, 2018 at 09:24:05PM +0200, Dmitry Vyukov wrote: >> > > > > > On Tue, Apr 3, 2018 at 4:01 AM, syzbot >> > > > > > <syzbot+dc5ab2babdf22ca091af@syzkaller.appspotmail.com> wrote: >> > > > > > > DEBUG_LOCKS_WARN_ON(sem->owner != get_current()) >> > > > > > > WARNING: CPU: 1 PID: 4441 at kernel/locking/rwsem.c:133 up_write+0x1cc/0x210 >> > > > > > > kernel/locking/rwsem.c:133 >> > > > > > > Kernel panic - not syncing: panic_on_warn set ... >> > > > > >> > > > > Message-Id: <1522852646-2196-1-git-send-email-longman@redhat.com> >> > > > > >> > > > >> > > > We were way ahead of syzbot in this case. :-) >> > > >> > > Not really ... syzbot caught it Monday evening ;-) >> > >> > Rather than arguing over who reported it first, I think that time >> > would be better spent reflecting on why the syzbot report was >> > completely ignored until *after* Ted diagnosed the issue >> > independently and Waiman had already fixed it.... >> > >> > Clearly there is scope for improvement here. >> > >> > Cheers, >> > >> >> Well, ultimately a human needed to investigate the syzbot bug report to figure >> out what was really going on. 
In my view, the largest problem is that there are >> simply too many bugs, so many are getting ignored. > > Well, yeah. And when there's too many bugs, looking at the ones > people are actually hitting tend to take precedence over those > reported by a bot an image problem... > >> If there were only a few bugs, then Dmitry would investigate each >> one and send a "real" bug report of better quality than the >> automated system can provide, or even send a fix directly. But in >> reality, on the same day this bug was reported, syzbot also found >> 10 other bugs, and in the previous 2 days it had found 38 more. >> No single person can keep up with that. > > And this is precisely why people turn around and ask the syzbot > developers to do things that make it easier for them to diagnose > the problems syzbot reports. > >> You can see the current >> bug list, which has 172 open bugs, on the dashboard at >> https://syzkaller.appspot.com/. > > Is that all? That's *nothing*. > >> Yes, the kernel really is that >> broken. > > Actually, that tells me the kernel is a hell of a lot better than my > experience leads me to beleive it is. I'd have expected thousands of > bugs, even tens of thousands of bugs given how many issues we deal > with in individual subsystems on a day to day basis. > >> And although quite a few of these bugs will end up to be >> duplicates or even already fixed, a human still has to look at >> each one to figure that out. (Though, I do think that syzbot >> should try to automatically detect when a reproducible bug was >> already fixed, via bisection. It would cause a few bugs to be >> incorrectly considered fixed, but it may be a worthwhile >> tradeoff.) >> >> These bugs are all over the kernel as well, so most developers >> don't see the big picture but rather just see a few bugs for >> "their" subsystem on "their" subsystem's mailing list and >> sometimes demand special attention. 
Of course, it's great when >> people suggest ways to improve the process. > > That's not the response I got.... > >> But it's not great >> when people just don't feel responsible for fixing bugs and wait >> for Someone Else to do it. > > The excessive cross posting of the reports is one of the reasons > people think someone else will take care of it. i.e. "Oh, that looks VFS, > that went to -fsdevel, I don't need to look at it".... > > Put simply: if you're mounting an XFS filesystem image and something > goes bang, then it should be reported to the XFS list. It does not > need to be cross posted to LKML, -fsdevel, 10 individual developers, > etc. If it's not an XFS problem, then the XFS developers will CC the > relevant lists as needed. > >> I'm hoping that in the future the syzbot "team", which seems to >> actually be just Dmitry now, can get more resources towards >> helping fix the bugs. But either way, in the end Linux is a >> community effort. > > We don't really need help fixing the bugs - we need help making it > easier to *find the bug* the bot tripped over. That's what the > syzbot team needs to focus on, not tell people that what they got is > all they are going to get. > >> Note also that syzbot wasn't super useful in this particular case >> because people running xfstests came across the same bug. But, >> this is actually a rare case. Most syzbot bug reports have been >> for weird corner cases or races that no one ever thought of >> before, so there are no existing tests that find them. > > Which is exactly what these whacky "mount a filesystem fragment" > tests it is now doing are exercising. Finding the cause of > corruption related crashes is not easy and takes time. Having the > bot developers add something to the bot that will save the developer > looking at the problem 10 minutes of setup time makes a huge > difference to the effort required to find the problem. > > The tool is useless if people find it too hard to make sense of the > bug reports (*cough* lockdep *cough*) or perform triage of the > report. If we want to get the bugs fixed faster, we have to make the > reports from automated tools contain the exact information the > developer needs to solve the problem.

Hi,

Regarding feature requests. Unfortunately, we too have limited resources and can't handle all feature requests. Feature requests generally fall into the following categories:

1. General features that are easy to do. These are generally done right away (more or less).
2. General features that require significant time. These are noted and are done as resources permit. For example:
   - bisection (https://github.com/google/syzkaller/issues/501)
   - kdump collection (https://github.com/google/syzkaller/issues/491)
   Examples of what is done already:
   - patch testing
   - significantly restructured reports
3. Subsystem-specific features that are easy to do. I don't remember that we got any. I guess they would compete with case 2.
4. Subsystem-specific features that require significant time. For these we don't have resources at the moment. Our company has dedicated people for some subsystems (to not go far -- Ted for ext4), but we don't have people for just any subsystem. Kernel developers working on InfiniBand contributed to syzkaller themselves, and as far as I understand they are very happy with the results because it allowed them to find and fix several dozen critical bugs (without involving us at all), so that's an option too.

Then, the context of the system is not a single subsystem and not a single bug. Please don't draw all conclusions from a small subset of cases. At this scale there inevitably will be harder bugs that are handled worse than a dedicated human would handle them (but a dedicated human would not be able to handle that number of bugs). But this does not make the overall effect negative: many hundreds of bugs are getting fixed.
In lots of cases developers pick up bugs from "C program + repro instructions". There is also a considerable number of simpler bugs that are getting fixed even without reproducers. That can be the case for a filesystem too: for example, a NULL deref with an obviously missing preceding state check, or a KASAN report with all stacks. It's not possible to know ahead of time whether it's something that can be fixed with the existing information or something that can't be. So there is no option of reporting just the former bugs; we can report either all of them or none of them (which would mean that none of the bugs get fixed).

Regarding prioritization. Bisection is on our plate. But note that a WARNING can be misleading. One of the bad bugs syzkaller has found was exactly a WARNING, a WARNING when restoring FPU registers on context switch, which means an inter-process or host->guest information leak. One of the worst ones manifested in no kernel report at all. It was one of these "target machine just became unresponsive with no self-detected reports" cases. "There is something wrong with the kernel" reports get the lowest priority, but that one turned out to be a full guest->host escape. Even if it's just a WARNING, if it can be triggered remotely, that can be a large problem too. So generally prioritization still requires expert attention, which in turn requires reporting all these bugs in the first place. It can also be the case that an innocent bug masks critical bugs. For example, if there is an easy-to-trigger bug at the entrance to a subsystem, nothing else will be discovered until that one is fixed.

There are definitely more than 172 bugs. I agree, thousands. And the system is generally capable of finding them; it has already found close to 2000, I think. It's just that the system chokes on existing bugs and all test machines crash right after boot. The more bugs we fix, the more new bugs we will see. Bugs with high CVSS scores are frequently found with similar fuzzing systems.
But these won't be reported by humans on mailing lists, and these are not bugs people are actually hitting. These look exactly like this -- some insane inputs to the kernel -- and are sold and used to exploit our phones and bank accounts.

Regarding CC lists. If you see issues there, please improve scripts/get_maintainer.pl. That's what most people use to find relevant emails when reporting bugs (when they are not maintainers of this very subsystem with some secret knowledge), and that's what syzbot uses. If it produces wrong results, the scope of the problem is larger than syzbot. ^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2018-09-04 8:29 UTC | newest]
Thread overview: 16+ messages
2018-04-03 2:01 WARNING in up_write syzbot
2018-04-04 19:24 ` Dmitry Vyukov
2018-04-04 19:35 ` Matthew Wilcox
2018-04-05 3:22 ` Theodore Y. Ts'o
2018-04-05 3:24 ` Matthew Wilcox
2018-04-05 8:22 ` Dmitry Vyukov
2018-09-04 8:28 ` Dmitry Vyukov
2018-04-05 22:32 ` Dave Chinner
2018-04-06 0:13 ` Eric Biggers
2018-04-06 1:37 ` Theodore Y. Ts'o
2018-04-08 6:31 ` Running syzkaller repros using kvm-xfstests Theodore Y. Ts'o
2018-04-08 13:18 ` Dmitry Vyukov
2018-04-08 18:02 ` Theodore Y. Ts'o
2018-04-09 9:28 ` Dmitry Vyukov
2018-04-06 2:01 ` WARNING in up_write Dave Chinner
2018-04-13 18:25 ` Dmitry Vyukov