All of lore.kernel.org
 help / color / mirror / Atom feed
* WARNING: bad unlock balance in xfs_iunlock
@ 2018-04-03  2:01 syzbot
  2018-04-03  4:38 ` Dave Chinner
  0 siblings, 1 reply; 27+ messages in thread
From: syzbot @ 2018-04-03  2:01 UTC (permalink / raw)
  To: darrick.wong, linux-kernel, linux-xfs, syzkaller-bugs

Hello,

syzbot hit the following crash on upstream commit
86bbbebac1933e6e95e8234c4f7d220c5ddd38bc (Mon Apr 2 18:47:07 2018 +0000)
Merge branch 'ras-core-for-linus' of  
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
syzbot dashboard link:  
https://syzkaller.appspot.com/bug?extid=84a67953651a971809ba

C reproducer: https://syzkaller.appspot.com/x/repro.c?id=5719304272084992
syzkaller reproducer:  
https://syzkaller.appspot.com/x/repro.syz?id=5767783983874048
Raw console output:  
https://syzkaller.appspot.com/x/log.txt?id=5643378645532672
Kernel config:  
https://syzkaller.appspot.com/x/.config?id=6801295859785128502
compiler: gcc (GCC) 7.1.1 20170620

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+84a67953651a971809ba@syzkaller.appspotmail.com
It will help syzbot understand when the bug is fixed. See footer for  
details.
If you forward the report, please keep this part and the footer.

000000005e445429: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
00  ................
00000000c1c426da: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
00  ................
XFS (loop0): metadata I/O error in "xfs_trans_read_buf_map" at daddr 0x1  
len 1 error 117

=====================================
WARNING: bad unlock balance detected!
4.16.0+ #11 Not tainted
-------------------------------------
syzkaller109734/4463 is trying to release lock (&xfs_nondir_ilock_class) at:
[<ffffffff8276589f>] mrunlock_excl fs/xfs/mrlock.h:74 [inline]
[<ffffffff8276589f>] xfs_iunlock+0x36f/0x4a0 fs/xfs/xfs_inode.c:329
but there are no more locks to release!

other info that might help us debug this:
2 locks held by syzkaller109734/4463:
  #0: 00000000b4cfce0b (&type->s_umount_key#36/1){+.+.}, at: alloc_super  
fs/super.c:211 [inline]
  #0: 00000000b4cfce0b (&type->s_umount_key#36/1){+.+.}, at:  
sget_userns+0x3b2/0xe60 fs/super.c:502
  #1: 00000000d8e8aeed (sb_internal#2){.+.+}, at: sb_start_intwrite  
include/linux/fs.h:1595 [inline]
  #1: 00000000d8e8aeed (sb_internal#2){.+.+}, at:  
xfs_trans_alloc+0x349/0x430 fs/xfs/xfs_trans.c:264

stack backtrace:
CPU: 1 PID: 4463 Comm: syzkaller109734 Not tainted 4.16.0+ #11
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011
Call Trace:
  __dump_stack lib/dump_stack.c:17 [inline]
  dump_stack+0x1a7/0x27d lib/dump_stack.c:53
  print_unlock_imbalance_bug+0x12f/0x140 kernel/locking/lockdep.c:3484
  __lock_release kernel/locking/lockdep.c:3691 [inline]
  lock_release+0x6fe/0xa40 kernel/locking/lockdep.c:3939
  up_write+0x72/0x210 kernel/locking/rwsem.c:132
  mrunlock_excl fs/xfs/mrlock.h:74 [inline]
  xfs_iunlock+0x36f/0x4a0 fs/xfs/xfs_inode.c:329
  xfs_inode_item_unlock+0x82/0xa0 fs/xfs/xfs_inode_item.c:600
  xfs_trans_free_items+0x176/0x230 fs/xfs/xfs_trans.c:790
  xfs_trans_cancel+0x1bb/0x260 fs/xfs/xfs_trans.c:1047
  xfs_qm_dqread+0xc7c/0x13b0 fs/xfs/xfs_dquot.c:623
  xfs_qm_dqget+0x6b9/0x2060 fs/xfs/xfs_dquot.c:766
  xfs_qm_quotacheck_dqadjust+0xe4/0x800 fs/xfs/xfs_qm.c:1080
  xfs_qm_dqusage_adjust+0x814/0x11c0 fs/xfs/xfs_qm.c:1194
  xfs_bulkstat_ag_ichunk fs/xfs/xfs_itable.c:302 [inline]
  xfs_bulkstat+0xc0a/0x1cb0 fs/xfs/xfs_itable.c:487
  xfs_qm_quotacheck+0x3cb/0x800 fs/xfs/xfs_qm.c:1340
  xfs_qm_mount_quotas+0x2c4/0x470 fs/xfs/xfs_qm.c:1459
  xfs_mountfs+0x22a1/0x2690 fs/xfs/xfs_mount.c:982
  xfs_fs_fill_super+0xc8d/0x1250 fs/xfs/xfs_super.c:1703
  mount_bdev+0x2b7/0x370 fs/super.c:1119
  xfs_fs_mount+0x34/0x40 fs/xfs/xfs_super.c:1770
  mount_fs+0x66/0x2d0 fs/super.c:1222
  vfs_kern_mount.part.26+0xc6/0x4a0 fs/namespace.c:1037
  vfs_kern_mount fs/namespace.c:2509 [inline]
  do_new_mount fs/namespace.c:2512 [inline]
  do_mount+0xea4/0x2bb0 fs/namespace.c:2842
  SYSC_mount fs/namespace.c:3058 [inline]
  SyS_mount+0xab/0x120 fs/namespace.c:3035
  do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287
  entry_SYSCALL_64_after_hwframe+0x42/0xb7
RIP: 0033:0x44488a
RSP: 002b:00007fff118ea198 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
RAX: ffffffffffffffda R


---
This bug is generated by a dumb bot. It may contain errors.
See https://goo.gl/tpsmEJ for details.
Direct all questions to syzkaller@googlegroups.com.

syzbot will keep track of this bug report.
If you forgot to add the Reported-by tag, once the fix for this bug is  
merged
into any tree, please reply to this email with:
#syz fix: exact-commit-title
If you want to test a patch for this bug, please reply with:
#syz test: git://repo/address.git branch
and provide the patch inline or as an attachment.
To mark this as a duplicate of another syzbot report, please reply with:
#syz dup: exact-subject-of-another-report
If it's a one-off invalid bug report, please reply with:
#syz invalid
Note: if the crash happens again, it will cause creation of a new bug  
report.
Note: all commands must start from beginning of the line in the email body.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: WARNING: bad unlock balance in xfs_iunlock
  2018-04-03  2:01 WARNING: bad unlock balance in xfs_iunlock syzbot
@ 2018-04-03  4:38 ` Dave Chinner
  2018-04-05 18:54   ` Dmitry Vyukov
  0 siblings, 1 reply; 27+ messages in thread
From: Dave Chinner @ 2018-04-03  4:38 UTC (permalink / raw)
  To: syzbot; +Cc: darrick.wong, linux-kernel, linux-xfs, syzkaller-bugs

On Mon, Apr 02, 2018 at 07:01:02PM -0700, syzbot wrote:
> Hello,
> 
> syzbot hit the following crash on upstream commit
> 86bbbebac1933e6e95e8234c4f7d220c5ddd38bc (Mon Apr 2 18:47:07 2018 +0000)
> Merge branch 'ras-core-for-linus' of
> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> syzbot dashboard link:
> https://syzkaller.appspot.com/bug?extid=84a67953651a971809ba
> 
> C reproducer: https://syzkaller.appspot.com/x/repro.c?id=5719304272084992
> syzkaller reproducer:
> https://syzkaller.appspot.com/x/repro.syz?id=5767783983874048

What a mess. A hand built, hopelessly broken filesystem image made
up of hex dumps, written into a mmap()d region of memory, then
copied into a tmpfs file and mounted with the loop device.

Engineers that can debug broken filesystems don't grow on trees.  If
we are to have any hope of understanding what the hell this test is
doing, the bot needs to supply us with a copy of the built
filesystem image the test uses. We need to be able to point forensic
tools at the image to decode all the structures into human readable
format - if we are forced to do that by hand or jump through hoops
to create our own filesystem image than I'm certainly not going to
waste time looking at these reports...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: WARNING: bad unlock balance in xfs_iunlock
  2018-04-03  4:38 ` Dave Chinner
@ 2018-04-05 18:54   ` Dmitry Vyukov
  2018-04-05 21:38     ` Dave Chinner
  2018-04-30 13:24     ` Dmitry Vyukov
  0 siblings, 2 replies; 27+ messages in thread
From: Dmitry Vyukov @ 2018-04-05 18:54 UTC (permalink / raw)
  To: Dave Chinner; +Cc: syzbot, darrick.wong, LKML, linux-xfs, syzkaller-bugs

On Tue, Apr 3, 2018 at 6:38 AM, Dave Chinner <david@fromorbit.com> wrote:
> On Mon, Apr 02, 2018 at 07:01:02PM -0700, syzbot wrote:
>> Hello,
>>
>> syzbot hit the following crash on upstream commit
>> 86bbbebac1933e6e95e8234c4f7d220c5ddd38bc (Mon Apr 2 18:47:07 2018 +0000)
>> Merge branch 'ras-core-for-linus' of
>> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
>> syzbot dashboard link:
>> https://syzkaller.appspot.com/bug?extid=84a67953651a971809ba
>>
>> C reproducer: https://syzkaller.appspot.com/x/repro.c?id=5719304272084992
>> syzkaller reproducer:
>> https://syzkaller.appspot.com/x/repro.syz?id=5767783983874048
>
> What a mess. A hand built, hopelessly broken filesystem image made
> up of hex dumps, written into a mmap()d region of memory, then
> copied into a tmpfs file and mounted with the loop device.
>
> Engineers that can debug broken filesystems don't grow on trees.  If
> we are to have any hope of understanding what the hell this test is
> doing, the bot needs to supply us with a copy of the built
> filesystem image the test uses. We need to be able to point forensic
> tools at the image to decode all the structures into human readable
> format - if we are forced to do that by hand or jump through hoops
> to create our own filesystem image than I'm certainly not going to
> waste time looking at these reports...

Hi Dave,

Here is the image:
https://drive.google.com/file/d/1jzhGGe5SBJcqfsjxCLHoh4Kazke1oTfC/view
(took me about a minute to extract from test by replacing memfd_create
with open and running the program).
Then do the following to trigger the bug:
losetup /dev/loop0 xfs.repro
mkdir xfs
mount -t xfs -o nouuid,prjquota,noikeep,quota /dev/loop0 xfs

To answer your more general question: syzbot is not a system to test
solely file systems, it finds bugs in hundreds of kernel subsystems.
Generating image for file systems, media files for sound and
FaceDancer programs that crash host when  FaceDancer device is plugged
into USB is not feasible. And in the end it's not even clear what
kernel subsystem is at fault and even if it somehow figures out that
it's a filesystem, it's unclear that it's exactly an image that
provokes the bug. syzbot provides C reproducers which is a reasonable
common ground for bug reports. At this point the bug needs human
attention. Some bugs are trivial enough that a developer does not even
need to look at the reproducer. Some bugs are so involved that only an
expert in a particular subsystem can figure out what happens there.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: WARNING: bad unlock balance in xfs_iunlock
  2018-04-05 18:54   ` Dmitry Vyukov
@ 2018-04-05 21:38     ` Dave Chinner
  2018-04-06 16:10       ` Darrick J. Wong
  2018-04-30 13:24     ` Dmitry Vyukov
  1 sibling, 1 reply; 27+ messages in thread
From: Dave Chinner @ 2018-04-05 21:38 UTC (permalink / raw)
  To: Dmitry Vyukov; +Cc: syzbot, darrick.wong, LKML, linux-xfs, syzkaller-bugs

On Thu, Apr 05, 2018 at 08:54:50PM +0200, Dmitry Vyukov wrote:
> On Tue, Apr 3, 2018 at 6:38 AM, Dave Chinner <david@fromorbit.com> wrote:
> > On Mon, Apr 02, 2018 at 07:01:02PM -0700, syzbot wrote:
> >> Hello,
> >>
> >> syzbot hit the following crash on upstream commit
> >> 86bbbebac1933e6e95e8234c4f7d220c5ddd38bc (Mon Apr 2 18:47:07 2018 +0000)
> >> Merge branch 'ras-core-for-linus' of
> >> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> >> syzbot dashboard link:
> >> https://syzkaller.appspot.com/bug?extid=84a67953651a971809ba
> >>
> >> C reproducer: https://syzkaller.appspot.com/x/repro.c?id=5719304272084992
> >> syzkaller reproducer:
> >> https://syzkaller.appspot.com/x/repro.syz?id=5767783983874048
> >
> > What a mess. A hand built, hopelessly broken filesystem image made
> > up of hex dumps, written into a mmap()d region of memory, then
> > copied into a tmpfs file and mounted with the loop device.
> >
> > Engineers that can debug broken filesystems don't grow on trees.  If
> > we are to have any hope of understanding what the hell this test is
> > doing, the bot needs to supply us with a copy of the built
> > filesystem image the test uses. We need to be able to point forensic
> > tools at the image to decode all the structures into human readable
> > format - if we are forced to do that by hand or jump through hoops
> > to create our own filesystem image than I'm certainly not going to
> > waste time looking at these reports...
> 
> Hi Dave,
> 
> Here is the image:
> https://drive.google.com/file/d/1jzhGGe5SBJcqfsjxCLHoh4Kazke1oTfC/view
> (took me about a minute to extract from test by replacing memfd_create
> with open and running the program).

Says the expert who knows exactly how it's all put together. It took
me a couple of hours just to understand WTF the syzbot reproducer
was actually doing....

> Then do the following to trigger the bug:
> losetup /dev/loop0 xfs.repro
> mkdir xfs
> mount -t xfs -o nouuid,prjquota,noikeep,quota /dev/loop0 xfs
> 
> To answer your more general question: syzbot is not a system to test
> solely file systems, it finds bugs in hundreds of kernel subsystems.
> Generating image for file systems, media files for sound and
> FaceDancer programs that crash host when  FaceDancer device is plugged
> into USB is not feasible. And in the end it's not even clear what

I don't care how syzbot generates the filesystem image it uses.

> kernel subsystem is at fault and even if it somehow figures out that
> it's a filesystem, it's unclear that it's exactly an image that
> provokes the bug. syzbot provides C reproducers which is a reasonable

It doesn't matter *what subsystem breaks*. If syzbot is generating a
filesystem image and then mounting it, it needs to provide that
filesystem image to to people who end up having to debug the
problem. It's a basic "corrupt filesystem" bug triage step.

> Some bugs are so involved that only an
> expert in a particular subsystem can figure out what happens there.

And that's clearly the case here, whether you like it or not.

You want us to do things that make syzbot more useful as a tool but
you don't want to do things that make syzbot a useful tool for us.
It's a two way street....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: WARNING: bad unlock balance in xfs_iunlock
  2018-04-05 21:38     ` Dave Chinner
@ 2018-04-06 16:10       ` Darrick J. Wong
  2018-04-13 10:03         ` Dmitry Vyukov
  0 siblings, 1 reply; 27+ messages in thread
From: Darrick J. Wong @ 2018-04-06 16:10 UTC (permalink / raw)
  To: Dmitry Vyukov; +Cc: Dave Chinner, syzbot, LKML, linux-xfs, syzkaller-bugs

On Fri, Apr 06, 2018 at 07:38:44AM +1000, Dave Chinner wrote:
> On Thu, Apr 05, 2018 at 08:54:50PM +0200, Dmitry Vyukov wrote:
> > On Tue, Apr 3, 2018 at 6:38 AM, Dave Chinner <david@fromorbit.com> wrote:
> > > On Mon, Apr 02, 2018 at 07:01:02PM -0700, syzbot wrote:
> > >> Hello,
> > >>
> > >> syzbot hit the following crash on upstream commit
> > >> 86bbbebac1933e6e95e8234c4f7d220c5ddd38bc (Mon Apr 2 18:47:07 2018 +0000)
> > >> Merge branch 'ras-core-for-linus' of
> > >> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> > >> syzbot dashboard link:
> > >> https://syzkaller.appspot.com/bug?extid=84a67953651a971809ba
> > >>
> > >> C reproducer: https://syzkaller.appspot.com/x/repro.c?id=5719304272084992
> > >> syzkaller reproducer:
> > >> https://syzkaller.appspot.com/x/repro.syz?id=5767783983874048
> > >
> > > What a mess. A hand built, hopelessly broken filesystem image made
> > > up of hex dumps, written into a mmap()d region of memory, then
> > > copied into a tmpfs file and mounted with the loop device.
> > >
> > > Engineers that can debug broken filesystems don't grow on trees.  If
> > > we are to have any hope of understanding what the hell this test is
> > > doing, the bot needs to supply us with a copy of the built
> > > filesystem image the test uses. We need to be able to point forensic
> > > tools at the image to decode all the structures into human readable
> > > format - if we are forced to do that by hand or jump through hoops
> > > to create our own filesystem image than I'm certainly not going to
> > > waste time looking at these reports...
> > 
> > Hi Dave,
> > 
> > Here is the image:
> > https://drive.google.com/file/d/1jzhGGe5SBJcqfsjxCLHoh4Kazke1oTfC/view
> > (took me about a minute to extract from test by replacing memfd_create
> > with open and running the program).
> 
> Says the expert who knows exactly how it's all put together. It took
> me a couple of hours just to understand WTF the syzbot reproducer
> was actually doing....
> 
> > Then do the following to trigger the bug:
> > losetup /dev/loop0 xfs.repro
> > mkdir xfs
> > mount -t xfs -o nouuid,prjquota,noikeep,quota /dev/loop0 xfs
> > 
> > To answer your more general question: syzbot is not a system to test
> > solely file systems, it finds bugs in hundreds of kernel subsystems.
> > Generating image for file systems, media files for sound and
> > FaceDancer programs that crash host when  FaceDancer device is plugged
> > into USB is not feasible. And in the end it's not even clear what
> 
> I don't care how syzbot generates the filesystem image it uses.
> 
> > kernel subsystem is at fault and even if it somehow figures out that
> > it's a filesystem, it's unclear that it's exactly an image that
> > provokes the bug. syzbot provides C reproducers which is a reasonable
> 
> It doesn't matter *what subsystem breaks*. If syzbot is generating a
> filesystem image and then mounting it, it needs to provide that
> filesystem image to to people who end up having to debug the
> problem. It's a basic "corrupt filesystem" bug triage step.
> 
> > Some bugs are so involved that only an
> > expert in a particular subsystem can figure out what happens there.
> 
> And that's clearly the case here, whether you like it or not.
> 
> You want us to do things that make syzbot more useful as a tool but
> you don't want to do things that make syzbot a useful tool for us.
> It's a two way street....

...here's my take on this whole situation:

So far as I can tell, this syzbot daemon grew the ability to fuzz
filesystems and started emitting bug report after bug report, with
misleading commit ids that have nothing to do with the complaint.  Your
maintainers (Dave, Eric, and me) have spent a few hours trying to figure
out what's going on, to some frustration.  The bug reports themselves
miss the public engagement detail of saying something like "Hey XFS
engineers, if you'd like to talk to the syzbot engineers they can be
reached at <googlegroup address>."  Instead it merely says "direct
questions to <addr>" like this is some press release and the only thing
on the other end of the line will be some disinterested bureaucracy.
Or some robot.

We're willing to take random user reports of corruption and misbehavior
and do all the work to turn that into regression tests and patches, but
that's not the situation here.  You work for a well known engineering
company which (I assume) has decided that fuzz hardening the commons
aligns with its goals.  Collective-you (i.e. your company) could realize
that goal sooner and with a lot less community friction by staffing up
engineers to join our community to help us triage and fix the things
reported by syzbot.  It's much /less/ effective to dump a pile of work
on the people in the community.  We each have our own work-piles and
most likely different priorities.

Hardening XFS to the sorts of things fuzzers find is important to me
(and $employer) as well, but the difference here is that I read every
report that gets generated and start the work to figure out a regression
test and a code fix.  That is what I send to the list for more public
participation and to signal that yes, there's a human behind all this
with some reasonable level of understanding of the problem.

--D

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: WARNING: bad unlock balance in xfs_iunlock
  2018-04-06 16:10       ` Darrick J. Wong
@ 2018-04-13 10:03         ` Dmitry Vyukov
  2018-04-16 19:22           ` Eric Sandeen
  0 siblings, 1 reply; 27+ messages in thread
From: Dmitry Vyukov @ 2018-04-13 10:03 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Dave Chinner, syzbot, LKML, linux-xfs, syzkaller-bugs,
	Theodore Ts'o, syzkaller

On Fri, Apr 6, 2018 at 6:10 PM, Darrick J. Wong <darrick.wong@oracle.com> wrote:
> On Fri, Apr 06, 2018 at 07:38:44AM +1000, Dave Chinner wrote:
>> On Thu, Apr 05, 2018 at 08:54:50PM +0200, Dmitry Vyukov wrote:
>> > On Tue, Apr 3, 2018 at 6:38 AM, Dave Chinner <david@fromorbit.com> wrote:
>> > > On Mon, Apr 02, 2018 at 07:01:02PM -0700, syzbot wrote:
>> > >> Hello,
>> > >>
>> > >> syzbot hit the following crash on upstream commit
>> > >> 86bbbebac1933e6e95e8234c4f7d220c5ddd38bc (Mon Apr 2 18:47:07 2018 +0000)
>> > >> Merge branch 'ras-core-for-linus' of
>> > >> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
>> > >> syzbot dashboard link:
>> > >> https://syzkaller.appspot.com/bug?extid=84a67953651a971809ba
>> > >>
>> > >> C reproducer: https://syzkaller.appspot.com/x/repro.c?id=5719304272084992
>> > >> syzkaller reproducer:
>> > >> https://syzkaller.appspot.com/x/repro.syz?id=5767783983874048
>> > >
>> > > What a mess. A hand built, hopelessly broken filesystem image made
>> > > up of hex dumps, written into a mmap()d region of memory, then
>> > > copied into a tmpfs file and mounted with the loop device.
>> > >
>> > > Engineers that can debug broken filesystems don't grow on trees.  If
>> > > we are to have any hope of understanding what the hell this test is
>> > > doing, the bot needs to supply us with a copy of the built
>> > > filesystem image the test uses. We need to be able to point forensic
>> > > tools at the image to decode all the structures into human readable
>> > > format - if we are forced to do that by hand or jump through hoops
>> > > to create our own filesystem image than I'm certainly not going to
>> > > waste time looking at these reports...
>> >
>> > Hi Dave,
>> >
>> > Here is the image:
>> > https://drive.google.com/file/d/1jzhGGe5SBJcqfsjxCLHoh4Kazke1oTfC/view
>> > (took me about a minute to extract from test by replacing memfd_create
>> > with open and running the program).
>>
>> Says the expert who knows exactly how it's all put together. It took
>> me a couple of hours just to understand WTF the syzbot reproducer
>> was actually doing....
>>
>> > Then do the following to trigger the bug:
>> > losetup /dev/loop0 xfs.repro
>> > mkdir xfs
>> > mount -t xfs -o nouuid,prjquota,noikeep,quota /dev/loop0 xfs
>> >
>> > To answer your more general question: syzbot is not a system to test
>> > solely file systems, it finds bugs in hundreds of kernel subsystems.
>> > Generating image for file systems, media files for sound and
>> > FaceDancer programs that crash host when  FaceDancer device is plugged
>> > into USB is not feasible. And in the end it's not even clear what
>>
>> I don't care how syzbot generates the filesystem image it uses.
>>
>> > kernel subsystem is at fault and even if it somehow figures out that
>> > it's a filesystem, it's unclear that it's exactly an image that
>> > provokes the bug. syzbot provides C reproducers which is a reasonable
>>
>> It doesn't matter *what subsystem breaks*. If syzbot is generating a
>> filesystem image and then mounting it, it needs to provide that
>> filesystem image to to people who end up having to debug the
>> problem. It's a basic "corrupt filesystem" bug triage step.
>>
>> > Some bugs are so involved that only an
>> > expert in a particular subsystem can figure out what happens there.
>>
>> And that's clearly the case here, whether you like it or not.
>>
>> You want us to do things that make syzbot more useful as a tool but
>> you don't want to do things that make syzbot a useful tool for us.
>> It's a two way street....

Hi Dave, Darrick,

It's not that we don't want to make the system more useful, it's just
that we don't see what reasonably can be done here. The system does
not have notion of images, sound files, USB devices. It feeds data
into system calls, and that's what it provides as reproducers. Also
see the last paragraph.

> ...here's my take on this whole situation:
>
> So far as I can tell, this syzbot daemon grew the ability to fuzz
> filesystems and started emitting bug report after bug report, with
> misleading commit ids that have nothing to do with the complaint.

Please elaborate re commits. It's a basic rule of any good bug report
-- communicate exact state of source code when the bug was hit, i.e.
provide the commit hash.

>  Your
> maintainers (Dave, Eric, and me) have spent a few hours trying to figure
> out what's going on, to some frustration.  The bug reports themselves
> miss the public engagement detail of saying something like "Hey XFS
> engineers, if you'd like to talk to the syzbot engineers they can be
> reached at <googlegroup address>."  Instead it merely says "direct
> questions to <addr>" like this is some press release and the only thing
> on the other end of the line will be some disinterested bureaucracy.
> Or some robot.

This is quite subjective and we hear opinions all possible ways. I
don't think there is one size fits all. E.g. +Ted argued in exactly
the opposite direction: make reports more laconic, formal,
table-formatted with dry information. There is also a tradeoff between
describing each detail in proper, friendly English and being more
laconic. If we increase everything 4x and post a wall of text with
each report, lots of people won't read anything of it just because
it's a wall of text. I am also not a native English speaker, so
providing simple formal text is a safer option than writing something
potentially clumsy.

Having said that, I am collecting proposals for report format
improvements. Here is fortunately slightly better wording for footer
based on your idea:
https://github.com/google/syzkaller/issues/565#issuecomment-380793620

> We're willing to take random user reports of corruption and misbehavior
> and do all the work to turn that into regression tests and patches, but
> that's not the situation here.  You work for a well known engineering
> company which (I assume) has decided that fuzz hardening the commons
> aligns with its goals.  Collective-you (i.e. your company) could realize
> that goal sooner and with a lot less community friction by staffing up
> engineers to join our community to help us triage and fix the things
> reported by syzbot.  It's much /less/ effective to dump a pile of work
> on the people in the community.  We each have our own work-piles and
> most likely different priorities.
>
> Hardening XFS to the sorts of things fuzzers find is important to me
> (and $employer) as well, but the difference here is that I read every
> report that gets generated and start the work to figure out a regression
> test and a code fix.  That is what I send to the list for more public
> participation and to signal that yes, there's a human behind all this
> with some reasonable level of understanding of the problem.

Well, I guess nobody has infinite engineering resources. If we would
have them we would also fix all these bug ourselves and don't bother
you at all. Just as any other company could invest in writing bug
detection tools, fuzzers, deploy them and report bugs, which would
relief us from this.
We have limited resources and do what's possible within these
resources. Unfortunately providing individual handling of each of the
thousands of bugs is not possible within these resources. I know that
what we are doing is useful for kernel overall because lots of
hundreds of bugs get fixed due to this effort. If you, as xfs
maintainers, think that these reports are net negative for xfs and xfs
should not be tested, say so and we will figure out how to make it
happen.

Thanks

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: WARNING: bad unlock balance in xfs_iunlock
  2018-04-13 10:03         ` Dmitry Vyukov
@ 2018-04-16 19:22           ` Eric Sandeen
  2018-04-30 13:23             ` Dmitry Vyukov
  0 siblings, 1 reply; 27+ messages in thread
From: Eric Sandeen @ 2018-04-16 19:22 UTC (permalink / raw)
  To: Dmitry Vyukov, Darrick J. Wong
  Cc: Dave Chinner, syzbot, LKML, linux-xfs, syzkaller-bugs,
	Theodore Ts'o, syzkaller

On 4/13/18 5:03 AM, Dmitry Vyukov wrote:
> On Fri, Apr 6, 2018 at 6:10 PM, Darrick J. Wong <darrick.wong@oracle.com> wrote:
>> On Fri, Apr 06, 2018 at 07:38:44AM +1000, Dave Chinner wrote:
>>> On Thu, Apr 05, 2018 at 08:54:50PM +0200, Dmitry Vyukov wrote:
>>>> On Tue, Apr 3, 2018 at 6:38 AM, Dave Chinner <david@fromorbit.com> wrote:
>>>>> On Mon, Apr 02, 2018 at 07:01:02PM -0700, syzbot wrote:
>>>>>> Hello,
>>>>>>
>>>>>> syzbot hit the following crash on upstream commit
>>>>>> 86bbbebac1933e6e95e8234c4f7d220c5ddd38bc (Mon Apr 2 18:47:07 2018 +0000)
>>>>>> Merge branch 'ras-core-for-linus' of
>>>>>> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
>>>>>> syzbot dashboard link:
>>>>>> https://syzkaller.appspot.com/bug?extid=84a67953651a971809ba
>>>>>>
>>>>>> C reproducer: https://syzkaller.appspot.com/x/repro.c?id=5719304272084992
>>>>>> syzkaller reproducer:
>>>>>> https://syzkaller.appspot.com/x/repro.syz?id=5767783983874048
>>>>>
>>>>> What a mess. A hand built, hopelessly broken filesystem image made
>>>>> up of hex dumps, written into a mmap()d region of memory, then
>>>>> copied into a tmpfs file and mounted with the loop device.
>>>>>
>>>>> Engineers that can debug broken filesystems don't grow on trees.  If
>>>>> we are to have any hope of understanding what the hell this test is
>>>>> doing, the bot needs to supply us with a copy of the built
>>>>> filesystem image the test uses. We need to be able to point forensic
>>>>> tools at the image to decode all the structures into human readable
>>>>> format - if we are forced to do that by hand or jump through hoops
>>>>> to create our own filesystem image than I'm certainly not going to
>>>>> waste time looking at these reports...
>>>>
>>>> Hi Dave,
>>>>
>>>> Here is the image:
>>>> https://drive.google.com/file/d/1jzhGGe5SBJcqfsjxCLHoh4Kazke1oTfC/view
>>>> (took me about a minute to extract from test by replacing memfd_create
>>>> with open and running the program).
>>>
>>> Says the expert who knows exactly how it's all put together. It took
>>> me a couple of hours just to understand WTF the syzbot reproducer
>>> was actually doing....

*nod* more on this below.

>>>> Then do the following to trigger the bug:
>>>> losetup /dev/loop0 xfs.repro
>>>> mkdir xfs
>>>> mount -t xfs -o nouuid,prjquota,noikeep,quota /dev/loop0 xfs
>>>>
>>>> To answer your more general question: syzbot is not a system to test
>>>> solely file systems, it finds bugs in hundreds of kernel subsystems.
>>>> Generating image for file systems, media files for sound and
>>>> FaceDancer programs that crash host when  FaceDancer device is plugged
>>>> into USB is not feasible. And in the end it's not even clear what
>>>
>>> I don't care how syzbot generates the filesystem image it uses.
>>>
>>>> kernel subsystem is at fault and even if it somehow figures out that
>>>> it's a filesystem, it's unclear that it's exactly an image that
>>>> provokes the bug. syzbot provides C reproducers which is a reasonable
>>>
>>> It doesn't matter *what subsystem breaks*. If syzbot is generating a
>>> filesystem image and then mounting it, it needs to provide that
>>> filesystem image to to people who end up having to debug the
>>> problem. It's a basic "corrupt filesystem" bug triage step.

*nod*

>>>> Some bugs are so involved that only an
>>>> expert in a particular subsystem can figure out what happens there.
>>>
>>> And that's clearly the case here, whether you like it or not.
>>>
>>> You want us to do things that make syzbot more useful as a tool but
>>> you don't want to do things that make syzbot a useful tool for us.
>>> It's a two way street....
> 
> Hi Dave, Darrick,
> 
> It's not that we don't want to make the system more useful, it's just
> that we don't see what reasonably can be done here. The system does
> not have notion of images, sound files, USB devices. It feeds data
> into system calls, and that's what it provides as reproducers. Also
> see the last paragraph.

It sure /seems/ to have a notion of images: what else is syz_mount_image()?

i.e. you are mounting an image to reproduce the problem, correct?
And the system is "smart" enough to fire off an email to a filesystem list;
if it does so, add a link to the image itself, as you already have already done
for the C reproducer.

Filesystem images are common parlance for filesystem engineers.  When
you engage with them you'll have better results if you  provide them with
inputs they can use directly instead of requiring them to reverse-engineer
your custom test harness.

>> ...here's my take on this whole situation:
>>
>> So far as I can tell, this syzbot daemon grew the ability to fuzz
>> filesystems and started emitting bug report after bug report, with
>> misleading commit ids that have nothing to do with the complaint.
> 
> Please elaborate re commits. It's a basic rule of any good bug report
> -- communicate exact state of source code when the bug was hit, i.e.
> provide the commit hash.

Further best practice is to provide the /correct/ commit hash.

syzbot has identified commits which seem almost certainly incorrect.

For example,

KASAN: use-after-free Read in radix_tree_next_chunk

identified:

9dd2326890d89a5179967c947dab2bab34d7ddee (Fri Mar 30 17:29:47 2018 +0000)
Merge tag 'ceph-for-4.16-rc8' of git://github.com/ceph/ceph-client

and:

WARNING: bad unlock balance in xfs_iunlock

identified:

86bbbebac1933e6e95e8234c4f7d220c5ddd38bc (Mon Apr 2 18:47:07 2018 +0000)
Merge branch 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

I can't imagine these are right...

>>  Your
>> maintainers (Dave, Eric, and me) have spent a few hours trying to figure
>> out what's going on, to some frustration.  The bug reports themselves
>> miss the public engagement detail of saying something like "Hey XFS
>> engineers, if you'd like to talk to the syzbot engineers they can be
>> reached at <googlegroup address>."  Instead it merely says "direct
>> questions to <addr>" like this is some press release and the only thing
>> on the other end of the line will be some disinterested bureaucracy.
>> Or some robot.
> 
> This is quite subjective and we hear opinions all possible ways. I
> don't think there is one size fits all. E.g. +Ted argued in exactly
> the opposite direction: make reports more laconic, formal,
> table-formatted with dry information. There is also a tradeoff between
> describing each detail in proper, friendly English and being more
> laconic. If we increase everything 4x and post a wall of text with
> each report, lots of people won't read anything of it just because
> it's a wall of text. I am also not a native English speaker, so
> providing simple formal text is a safer option than writing something
> potentially clumsy.

What Darrick is asking for, I think, is a human on the other end to talk
to if there are any issues or concerns about the reports.  (For example,
"hey that commit looks wrong, are you sure?)  We are not asking for a
long narrative with each report.  We're asking for a dialogue about this
framework and the reporting, so it can improve.

> Having said that, I am collecting proposals for report format
> improvements. Here is fortunately slightly better wording for footer
> based on your idea:
> https://github.com/google/syzkaller/issues/565#issuecomment-380793620

...

> Well, I guess nobody has infinite engineering resources. If we would
> have them we would also fix all these bug ourselves and don't bother
> you at all. Just as any other company could invest in writing bug
> detection tools, fuzzers, deploy them and report bugs, which would
> relief us from this.
> We have limited resources and do what's possible within these
> resources. Unfortunately providing individual handling of each of the
> thousands of bugs is not possible within these resources. I know that
> what we are doing is useful for kernel overall because lots of
> hundreds of bugs get fixed due to this effort. If you, as xfs
> maintainers, think that these reports are net negative for xfs and xfs
> should not be tested, say so and we will figure out how to make it
> happen.

We've been inundated lately with results of scripts which generate bug reports
quickly but do not provide enough information to quickly triage, reproduce, and
fix those reports.

(For example, another fuzzer is using the same "poc.c" to exercise a fuzzed
image; it might be the 1st operation or the 50th that causes the issue, but
the harness doesn't bother to find out or report it, it just sends all 50
potential operations to us, and assumes we'll take the time to figure it out.
It's laziness on the script/reporting side, resulting in extra work on the
human/debugging side.)

I think that in this case, what we are asking for is a fine tuning of the
testing and reporting so that we can more efficiently address these issues.
Off the top of my head, and there may be more items:

1) Add a human contact to the emails, start an IRC channel, etc, for better
   two-way communication.  (it wasn't clear that syzkaller@ reached humans,
   tbh.)
2) _Properly_ identify the regressing commit, if any.  If it doesn't look like
   a recent regression, you can state that too.
3) If you're reporting a filesystem bug that arose from using a filesystem
   image, provide a URL to that filesystem image directly in the report.
4) Create a filesystem image that can be more easily debugged by the experts,
   i.e. one with > 1 allocation group, so standard repair & analysis tools can
   be used with it.

Hopefully this sort of feedback will result in better bug reports from you,
and faster (and more) bugfixes from us.

Personally, I'm certainly not asking you to stop sending reports of bugs you
find, but I /am/ asking that they be as refined, specific, and useful as possible.

Thanks,
-Eric

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: WARNING: bad unlock balance in xfs_iunlock
  2018-04-16 19:22           ` Eric Sandeen
@ 2018-04-30 13:23             ` Dmitry Vyukov
  2018-04-30 13:49               ` Eric Sandeen
  0 siblings, 1 reply; 27+ messages in thread
From: Dmitry Vyukov @ 2018-04-30 13:23 UTC (permalink / raw)
  To: Eric Sandeen
  Cc: Darrick J. Wong, Dave Chinner, syzbot, LKML, linux-xfs,
	syzkaller-bugs, Theodore Ts'o, syzkaller

On Mon, Apr 16, 2018 at 9:22 PM, Eric Sandeen <sandeen@sandeen.net> wrote:
> On 4/13/18 5:03 AM, Dmitry Vyukov wrote:
>> On Fri, Apr 6, 2018 at 6:10 PM, Darrick J. Wong <darrick.wong@oracle.com> wrote:
>>> On Fri, Apr 06, 2018 at 07:38:44AM +1000, Dave Chinner wrote:
>>>> On Thu, Apr 05, 2018 at 08:54:50PM +0200, Dmitry Vyukov wrote:
>>>>> On Tue, Apr 3, 2018 at 6:38 AM, Dave Chinner <david@fromorbit.com> wrote:
>>>>>> On Mon, Apr 02, 2018 at 07:01:02PM -0700, syzbot wrote:
>>>>>>> Hello,
>>>>>>>
>>>>>>> syzbot hit the following crash on upstream commit
>>>>>>> 86bbbebac1933e6e95e8234c4f7d220c5ddd38bc (Mon Apr 2 18:47:07 2018 +0000)
>>>>>>> Merge branch 'ras-core-for-linus' of
>>>>>>> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
>>>>>>> syzbot dashboard link:
>>>>>>> https://syzkaller.appspot.com/bug?extid=84a67953651a971809ba
>>>>>>>
>>>>>>> C reproducer: https://syzkaller.appspot.com/x/repro.c?id=5719304272084992
>>>>>>> syzkaller reproducer:
>>>>>>> https://syzkaller.appspot.com/x/repro.syz?id=5767783983874048
>>>>>>
>>>>>> What a mess. A hand built, hopelessly broken filesystem image made
>>>>>> up of hex dumps, written into a mmap()d region of memory, then
>>>>>> copied into a tmpfs file and mounted with the loop device.
>>>>>>
>>>>>> Engineers that can debug broken filesystems don't grow on trees.  If
>>>>>> we are to have any hope of understanding what the hell this test is
>>>>>> doing, the bot needs to supply us with a copy of the built
>>>>>> filesystem image the test uses. We need to be able to point forensic
>>>>>> tools at the image to decode all the structures into human readable
>>>>>> format - if we are forced to do that by hand or jump through hoops
>>>>>> to create our own filesystem image than I'm certainly not going to
>>>>>> waste time looking at these reports...
>>>>>
>>>>> Hi Dave,
>>>>>
>>>>> Here is the image:
>>>>> https://drive.google.com/file/d/1jzhGGe5SBJcqfsjxCLHoh4Kazke1oTfC/view
>>>>> (took me about a minute to extract from test by replacing memfd_create
>>>>> with open and running the program).
>>>>
>>>> Says the expert who knows exactly how it's all put together. It took
>>>> me a couple of hours just to understand WTF the syzbot reproducer
>>>> was actually doing....
>
> *nod* more on this below.
>
>>>>> Then do the following to trigger the bug:
>>>>> losetup /dev/loop0 xfs.repro
>>>>> mkdir xfs
>>>>> mount -t xfs -o nouuid,prjquota,noikeep,quota /dev/loop0 xfs
>>>>>
>>>>> To answer your more general question: syzbot is not a system to test
>>>>> solely file systems, it finds bugs in hundreds of kernel subsystems.
>>>>> Generating image for file systems, media files for sound and
>>>>> FaceDancer programs that crash host when  FaceDancer device is plugged
>>>>> into USB is not feasible. And in the end it's not even clear what
>>>>
>>>> I don't care how syzbot generates the filesystem image it uses.
>>>>
>>>>> kernel subsystem is at fault and even if it somehow figures out that
>>>>> it's a filesystem, it's unclear that it's exactly an image that
>>>>> provokes the bug. syzbot provides C reproducers which is a reasonable
>>>>
>>>> It doesn't matter *what subsystem breaks*. If syzbot is generating a
>>>> filesystem image and then mounting it, it needs to provide that
>>>> filesystem image to to people who end up having to debug the
>>>> problem. It's a basic "corrupt filesystem" bug triage step.
>
> *nod*
>
>>>>> Some bugs are so involved that only an
>>>>> expert in a particular subsystem can figure out what happens there.
>>>>
>>>> And that's clearly the case here, whether you like it or not.
>>>>
>>>> You want us to do things that make syzbot more useful as a tool but
>>>> you don't want to do things that make syzbot a useful tool for us.
>>>> It's a two way street....
>>
>> Hi Dave, Darrick,
>>
>> It's not that we don't want to make the system more useful, it's just
>> that we don't see what reasonably can be done here. The system does
>> not have notion of images, sound files, USB devices. It feeds data
>> into system calls, and that's what it provides as reproducers. Also
>> see the last paragraph.
>
> It sure /seems/ to have a notion of images: what else is syz_mount_image()?
>
> i.e. you are mounting an image to reproduce the problem, correct?
> And the system is "smart" enough to fire off an email to a filesystem list;
> if it does so, add a link to the image itself, as you already have already done
> for the C reproducer.
>
> Filesystem images are common parlance for filesystem engineers.  When
> you engage with them you'll have better results if you  provide them with
> inputs they can use directly instead of requiring them to reverse-engineer
> your custom test harness.


Well, yes and no.
syz_mount_image() is the only part of a large system that knows about
images. For the rest of the system it's just a syscall like any other
syscall. And the part that sends emails is far away from
syz_mount_image().
syzbot does not know per se that it sends an email to filesystems
list. It just extracted kernel source file name that looked relevant
to this crash and run get_maintainers.pl on it.
Also the image can contain dynamically generated data, which makes it
impossible to have as a file at all.

Thinking of this, what should be reasonably easy to do and may be a
compromise for near future is the following. We could insert code into
syz_mount_image() which dump the image if you build a program with a
special define (e.g. -DSYZ_DUMP_IMAGE). Will this work for you?



>>> ...here's my take on this whole situation:
>>>
>>> So far as I can tell, this syzbot daemon grew the ability to fuzz
>>> filesystems and started emitting bug report after bug report, with
>>> misleading commit ids that have nothing to do with the complaint.
>>
>> Please elaborate re commits. It's a basic rule of any good bug report
>> -- communicate exact state of source code when the bug was hit, i.e.
>> provide the commit hash.
>
> Further best practice is to provide the /correct/ commit hash.
>
> syzbot has identified commits which seem almost certainly incorrect.
>
> For example,
>
> KASAN: use-after-free Read in radix_tree_next_chunk
>
> identified:
>
> 9dd2326890d89a5179967c947dab2bab34d7ddee (Fri Mar 30 17:29:47 2018 +0000)
> Merge tag 'ceph-for-4.16-rc8' of git://github.com/ceph/ceph-client
>
> and:
>
> WARNING: bad unlock balance in xfs_iunlock
>
> identified:
>
> 86bbbebac1933e6e95e8234c4f7d220c5ddd38bc (Mon Apr 2 18:47:07 2018 +0000)
> Merge branch 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
>
> I can't imagine these are right...


As I said, bisection is on our plate:
https://github.com/google/syzkaller/issues/501
Though, we will see how well it works, because it's not trivial (see
the issue for details).


>>>  Your
>>> maintainers (Dave, Eric, and me) have spent a few hours trying to figure
>>> out what's going on, to some frustration.  The bug reports themselves
>>> miss the public engagement detail of saying something like "Hey XFS
>>> engineers, if you'd like to talk to the syzbot engineers they can be
>>> reached at <googlegroup address>."  Instead it merely says "direct
>>> questions to <addr>" like this is some press release and the only thing
>>> on the other end of the line will be some disinterested bureaucracy.
>>> Or some robot.
>>
>> This is quite subjective and we hear opinions all possible ways. I
>> don't think there is one size fits all. E.g. +Ted argued in exactly
>> the opposite direction: make reports more laconic, formal,
>> table-formatted with dry information. There is also a tradeoff between
>> describing each detail in proper, friendly English and being more
>> laconic. If we increase everything 4x and post a wall of text with
>> each report, lots of people won't read anything of it just because
>> it's a wall of text. I am also not a native English speaker, so
>> providing simple formal text is a safer option than writing something
>> potentially clumsy.
>
> What Darrick is asking for, I think, is a human on the other end to talk
> to if there are any issues or concerns about the reports.  (For example,
> "hey that commit looks wrong, are you sure?)  We are not asking for a
> long narrative with each report.  We're asking for a dialogue about this
> framework and the reporting, so it can improve.
>
>> Having said that, I am collecting proposals for report format
>> improvements. Here is fortunately slightly better wording for footer
>> based on your idea:
>> https://github.com/google/syzkaller/issues/565#issuecomment-380793620

Report footer will be improved to make it more clear:
https://github.com/google/syzkaller/issues/565#issuecomment-380793620


>> Well, I guess nobody has infinite engineering resources. If we would
>> have them we would also fix all these bug ourselves and don't bother
>> you at all. Just as any other company could invest in writing bug
>> detection tools, fuzzers, deploy them and report bugs, which would
>> relief us from this.
>> We have limited resources and do what's possible within these
>> resources. Unfortunately providing individual handling of each of the
>> thousands of bugs is not possible within these resources. I know that
>> what we are doing is useful for kernel overall because lots of
>> hundreds of bugs get fixed due to this effort. If you, as xfs
>> maintainers, think that these reports are net negative for xfs and xfs
>> should not be tested, say so and we will figure out how to make it
>> happen.
>
> We've been inundated lately with results of scripts which generate bug reports
> quickly but do not provide enough information to quickly triage, reproduce, and
> fix those reports.
>
> (For example, another fuzzer is using the same "poc.c" to exercise a fuzzed
> image; it might be the 1st operation or the 50th that causes the issue, but
> the harness doesn't bother to find out or report it, it just sends all 50
> potential operations to us, and assumes we'll take the time to figure it out.
> It's laziness on the script/reporting side, resulting in extra work on the
> human/debugging side.)
>
> I think that in this case, what we are asking for is a fine tuning of the
> testing and reporting so that we can more efficiently address these issues.
> Off the top of my head, and there may be more items:
>
> 1) Add a human contact to the emails, start an IRC channel, etc, for better
>    two-way communication.  (it wasn't clear that syzkaller@ reached humans,
>    tbh.)

There is a human contact at syzkaller@googlegroups.com. Report footer
will be improved to make it more clear:
https://github.com/google/syzkaller/issues/565#issuecomment-380793620

> 2) _Properly_ identify the regressing commit, if any.  If it doesn't look like
>    a recent regression, you can state that too.

Bisection is on our plate.

> 3) If you're reporting a filesystem bug that arose from using a filesystem
>    image, provide a URL to that filesystem image directly in the report.

See above. It may not be necessary representable as a static file at all.

> 4) Create a filesystem image that can be more easily debugged by the experts,
>    i.e. one with > 1 allocation group, so standard repair & analysis tools can
>    be used with it.

What is "> 1 allocation group"?

> Hopefully this sort of feedback will result in better bug reports from you,
> and faster (and more) bugfixes from us.
>
> Personally, I'm certainly not asking you to stop sending reports of bugs you
> find, but I /am/ asking that they be as refined, specific, and useful as possible.
>
> Thanks,
> -Eric

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: WARNING: bad unlock balance in xfs_iunlock
  2018-04-05 18:54   ` Dmitry Vyukov
  2018-04-05 21:38     ` Dave Chinner
@ 2018-04-30 13:24     ` Dmitry Vyukov
  2018-05-01 22:51       ` Dave Chinner
  1 sibling, 1 reply; 27+ messages in thread
From: Dmitry Vyukov @ 2018-04-30 13:24 UTC (permalink / raw)
  To: Dave Chinner; +Cc: syzbot, Darrick J. Wong, LKML, linux-xfs, syzkaller-bugs

On Thu, Apr 5, 2018 at 8:54 PM, Dmitry Vyukov <dvyukov@google.com> wrote:
> On Tue, Apr 3, 2018 at 6:38 AM, Dave Chinner <david@fromorbit.com> wrote:
>> On Mon, Apr 02, 2018 at 07:01:02PM -0700, syzbot wrote:
>>> Hello,
>>>
>>> syzbot hit the following crash on upstream commit
>>> 86bbbebac1933e6e95e8234c4f7d220c5ddd38bc (Mon Apr 2 18:47:07 2018 +0000)
>>> Merge branch 'ras-core-for-linus' of
>>> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
>>> syzbot dashboard link:
>>> https://syzkaller.appspot.com/bug?extid=84a67953651a971809ba
>>>
>>> C reproducer: https://syzkaller.appspot.com/x/repro.c?id=5719304272084992
>>> syzkaller reproducer:
>>> https://syzkaller.appspot.com/x/repro.syz?id=5767783983874048
>>
>> What a mess. A hand built, hopelessly broken filesystem image made
>> up of hex dumps, written into a mmap()d region of memory, then
>> copied into a tmpfs file and mounted with the loop device.
>>
>> Engineers that can debug broken filesystems don't grow on trees.  If
>> we are to have any hope of understanding what the hell this test is
>> doing, the bot needs to supply us with a copy of the built
>> filesystem image the test uses. We need to be able to point forensic
>> tools at the image to decode all the structures into human readable
>> format - if we are forced to do that by hand or jump through hoops
>> to create our own filesystem image than I'm certainly not going to
>> waste time looking at these reports...
>
> Hi Dave,
>
> Here is the image:
> https://drive.google.com/file/d/1jzhGGe5SBJcqfsjxCLHoh4Kazke1oTfC/view

Have anybody looked at the bug and the image yet?

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: WARNING: bad unlock balance in xfs_iunlock
  2018-04-30 13:23             ` Dmitry Vyukov
@ 2018-04-30 13:49               ` Eric Sandeen
  2018-04-30 14:02                 ` Dmitry Vyukov
  0 siblings, 1 reply; 27+ messages in thread
From: Eric Sandeen @ 2018-04-30 13:49 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Darrick J. Wong, Dave Chinner, syzbot, LKML, linux-xfs,
	syzkaller-bugs, Theodore Ts'o, syzkaller

On 4/30/18 8:23 AM, Dmitry Vyukov wrote:
> On Mon, Apr 16, 2018 at 9:22 PM, Eric Sandeen <sandeen@sandeen.net> wrote:

...

>> It sure /seems/ to have a notion of images: what else is syz_mount_image()?
>>
>> i.e. you are mounting an image to reproduce the problem, correct?
>> And the system is "smart" enough to fire off an email to a filesystem list;
>> if it does so, add a link to the image itself, as you already have already done
>> for the C reproducer.
>>
>> Filesystem images are common parlance for filesystem engineers.  When
>> you engage with them you'll have better results if you  provide them with
>> inputs they can use directly instead of requiring them to reverse-engineer
>> your custom test harness.
> 
> 
> Well, yes and no.
> syz_mount_image() is the only part of a large system that knows about
> images. For the rest of the system it's just a syscall like any other
> syscall. And the part that sends emails is far away from
> syz_mount_image().
> syzbot does not know per se that it sends an email to filesystems
> list.

I am asking it to learn this trick as an enhancement.  The MAINTAINERS
file contains big clues about which subsystems are filesystems, for example:

$ grep FILESYSTEM$ MAINTAINERS 
AFS FILESYSTEM
CRAMFS FILESYSTEM
EFI VARIABLE FILESYSTEM
EFS FILESYSTEM
FREEVXFS FILESYSTEM
...

> It just extracted kernel source file name that looked relevant
> to this crash and run get_maintainers.pl on it.
> Also the image can contain dynamically generated data, which makes it
> impossible to have as a file at all.

I guess I'm not sure what this means, can you explain?

> Thinking of this, what should be reasonably easy to do and may be a
> compromise for near future is the following. We could insert code into
> syz_mount_image() which dump the image if you build a program with a
> special define (e.g. -DSYZ_DUMP_IMAGE). Will this work for you?

If this is possible, I guess I still don't understand why you can't dump the
image and provide link.  You have fast, efficient robots.  We have slow,
busy humans.

>>> Please elaborate re commits. It's a basic rule of any good bug report
>>> -- communicate exact state of source code when the bug was hit, i.e.
>>> provide the commit hash.
>>
>> Further best practice is to provide the /correct/ commit hash.

....

>> I can't imagine these are right...
> 
> 
> As I said, bisection is on our plate:
> https://github.com/google/syzkaller/issues/501
> Though, we will see how well it works, because it's not trivial (see
> the issue for details).

Oh I see. I had misunderstood; so:

> syzbot hit the following crash on upstream commit
> 86bbbebac1933e6e95e8234c4f7d220c5ddd38bc (Mon Apr 2 18:47:07 2018 +0000)
> Merge branch 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

doesn't mean you bisected to that commit, or that this was the first bad commit,
it just means you happened to have a tree at this commit when you hit the problem.

That was not at all clear to me.  I thought when syzkaller was telling us
"on upstream commit XYZ," it meant that it had identified commit XYZ as bad.
I'm not sure if anyone else made that mistake, but  perhaps you could also clarify
the bug report text in this regard?

...

>> I think that in this case, what we are asking for is a fine tuning of the
>> testing and reporting so that we can more efficiently address these issues.
>> Off the top of my head, and there may be more items:
>>
>> 1) Add a human contact to the emails, start an IRC channel, etc, for better
>>    two-way communication.  (it wasn't clear that syzkaller@ reached humans,
>>    tbh.)
> 
> There is a human contact at syzkaller@googlegroups.com. Report footer
> will be improved to make it more clear:
> https://github.com/google/syzkaller/issues/565#issuecomment-380793620
> 
>> 2) _Properly_ identify the regressing commit, if any.  If it doesn't look like
>>    a recent regression, you can state that too.
> 
> Bisection is on our plate.
> 
>> 3) If you're reporting a filesystem bug that arose from using a filesystem
>>    image, provide a URL to that filesystem image directly in the report.
> 
> See above. It may not be necessary representable as a static file at all.

Can you explain this?  Do you mean that the mounted image is changing while the
tool runs, while the filesystem is mounted?
 
>> 4) Create a filesystem image that can be more easily debugged by the experts,
>>    i.e. one with > 1 allocation group, so standard repair & analysis tools can
>>    be used with it.
> 
> What is "> 1 allocation group"?

Maybe I should back up; how are the xfs images created?  I had assumed that
surely you start with a base image of some sort, and start fuzzing it from there.
Is this correct?

If so, allocation groups are a fundamental geometry of the filesystem; from
man mkfs.xfs.8:

                   agcount=value
                          This is used to specify  the  number  of  allocation
                          groups.  The  data  section  of  the  filesystem  is
                          divided into allocation groups to improve  the  per‐
                          formance  of  XFS. More allocation groups imply that
                          more parallelism can  be  achieved  when  allocating
                          blocks and inodes. The minimum allocation group size
                          is 16 MiB; the maximum size is  just  under  1  TiB.
                          The  data  section of the filesystem is divided into
                          value allocation groups  (default  value  is  scaled
                          automatically based on the underlying device size).

If the base image only has one allocation group, it makes it more difficult for
some tools to work with the image, because there is no redundancy.  1 AG is
not a supported or recommended geometry for any real-life use of xfs.

If I am correct that you start with a base image w/ a certain geometry or
set of mkfs options, starting with >= 2 AGs would improve the usefulness of the
filesystem image.

Thanks,
-Eric

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: WARNING: bad unlock balance in xfs_iunlock
  2018-04-30 13:49               ` Eric Sandeen
@ 2018-04-30 14:02                 ` Dmitry Vyukov
  2018-04-30 15:14                   ` Eric Sandeen
  0 siblings, 1 reply; 27+ messages in thread
From: Dmitry Vyukov @ 2018-04-30 14:02 UTC (permalink / raw)
  To: Eric Sandeen
  Cc: Darrick J. Wong, Dave Chinner, syzbot, LKML, linux-xfs,
	syzkaller-bugs, Theodore Ts'o, syzkaller

On Mon, Apr 30, 2018 at 3:49 PM, Eric Sandeen <sandeen@sandeen.net> wrote:
> On 4/30/18 8:23 AM, Dmitry Vyukov wrote:
>> On Mon, Apr 16, 2018 at 9:22 PM, Eric Sandeen <sandeen@sandeen.net> wrote:
>
> ...
>
>>> It sure /seems/ to have a notion of images: what else is syz_mount_image()?
>>>
>>> i.e. you are mounting an image to reproduce the problem, correct?
>>> And the system is "smart" enough to fire off an email to a filesystem list;
>>> if it does so, add a link to the image itself, as you already have already done
>>> for the C reproducer.
>>>
>>> Filesystem images are common parlance for filesystem engineers.  When
>>> you engage with them you'll have better results if you  provide them with
>>> inputs they can use directly instead of requiring them to reverse-engineer
>>> your custom test harness.
>>
>>
>> Well, yes and no.
>> syz_mount_image() is the only part of a large system that knows about
>> images. For the rest of the system it's just a syscall like any other
>> syscall. And the part that sends emails is far away from
>> syz_mount_image().
>> syzbot does not know per se that it sends an email to filesystems
>> list.
>
> I am asking it to learn this trick as an enhancement.  The MAINTAINERS
> file contains big clues about which subsystems are filesystems, for example:
>
> $ grep FILESYSTEM$ MAINTAINERS
> AFS FILESYSTEM
> CRAMFS FILESYSTEM
> EFI VARIABLE FILESYSTEM
> EFS FILESYSTEM
> FREEVXFS FILESYSTEM
> ...
>
>> It just extracted kernel source file name that looked relevant
>> to this crash and run get_maintainers.pl on it.
>> Also the image can contain dynamically generated data, which makes it
>> impossible to have as a file at all.
>
> I guess I'm not sure what this means, can you explain?

Say, a value that we generally pass to close system call is not static
and can't be dumped to a static file. It's whatever a previous open
system call has returned. Inside of the program we memorize the return
value of open in a variable and then pass it to close. This generally
stands for all system calls. Say, an image can contain an uid, and
that uid can be obtained from a system call too.


>> Thinking of this, what should be reasonably easy to do and may be a
>> compromise for near future is the following. We could insert code into
>> syz_mount_image() which dump the image if you build a program with a
>> special define (e.g. -DSYZ_DUMP_IMAGE). Will this work for you?
>
> If this is possible, I guess I still don't understand why you can't dump the
> image and provide link.  You have fast, efficient robots.  We have slow,
> busy humans.
>
>>>> Please elaborate re commits. It's a basic rule of any good bug report
>>>> -- communicate exact state of source code when the bug was hit, i.e.
>>>> provide the commit hash.
>>>
>>> Further best practice is to provide the /correct/ commit hash.
>
> ....
>
>>> I can't imagine these are right...
>>
>>
>> As I said, bisection is on our plate:
>> https://github.com/google/syzkaller/issues/501
>> Though, we will see how well it works, because it's not trivial (see
>> the issue for details).
>
> Oh I see. I had misunderstood; so:
>
>> syzbot hit the following crash on upstream commit
>> 86bbbebac1933e6e95e8234c4f7d220c5ddd38bc (Mon Apr 2 18:47:07 2018 +0000)
>> Merge branch 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
>
> doesn't mean you bisected to that commit, or that this was the first bad commit,
> it just means you happened to have a tree at this commit when you hit the problem.
>
> That was not at all clear to me.  I thought when syzkaller was telling us
> "on upstream commit XYZ," it meant that it had identified commit XYZ as bad.
> I'm not sure if anyone else made that mistake, but  perhaps you could also clarify
> the bug report text in this regard?

Suggestions are welcome. Currently it says "syzbot hit the following
crash on upstream commit SHA1", which was supposed to mean just the
state of the source tree when the crash happened. But I am not a
native speaker, so perhaps I am saying not what I intend to say.

There are also suggestions on report format improvement from +Ted
currently in works:
https://github.com/google/syzkaller/issues/565#issuecomment-380792942
Not sure if they make this distinction 100% clear, though.


>>> I think that in this case, what we are asking for is a fine tuning of the
>>> testing and reporting so that we can more efficiently address these issues.
>>> Off the top of my head, and there may be more items:
>>>
>>> 1) Add a human contact to the emails, start an IRC channel, etc, for better
>>>    two-way communication.  (it wasn't clear that syzkaller@ reached humans,
>>>    tbh.)
>>
>> There is a human contact at syzkaller@googlegroups.com. Report footer
>> will be improved to make it more clear:
>> https://github.com/google/syzkaller/issues/565#issuecomment-380793620
>>
>>> 2) _Properly_ identify the regressing commit, if any.  If it doesn't look like
>>>    a recent regression, you can state that too.
>>
>> Bisection is on our plate.
>>
>>> 3) If you're reporting a filesystem bug that arose from using a filesystem
>>>    image, provide a URL to that filesystem image directly in the report.
>>
>> See above. It may not be necessary representable as a static file at all.
>
> Can you explain this?  Do you mean that the mounted image is changing while the
> tool runs, while the filesystem is mounted?
>
>>> 4) Create a filesystem image that can be more easily debugged by the experts,
>>>    i.e. one with > 1 allocation group, so standard repair & analysis tools can
>>>    be used with it.
>>
>> What is "> 1 allocation group"?
>
> Maybe I should back up; how are the xfs images created?  I had assumed that
> surely you start with a base image of some sort, and start fuzzing it from there.
> Is this correct?
>
> If so, allocation groups are a fundamental geometry of the filesystem; from
> man mkfs.xfs.8:
>
>                    agcount=value
>                           This is used to specify  the  number  of  allocation
>                           groups.  The  data  section  of  the  filesystem  is
>                           divided into allocation groups to improve  the  per‐
>                           formance  of  XFS. More allocation groups imply that
>                           more parallelism can  be  achieved  when  allocating
>                           blocks and inodes. The minimum allocation group size
>                           is 16 MiB; the maximum size is  just  under  1  TiB.
>                           The  data  section of the filesystem is divided into
>                           value allocation groups  (default  value  is  scaled
>                           automatically based on the underlying device size).
>
> If the base image only has one allocation group, it makes it more difficult for
> some tools to work with the image, because there is no redundancy.  1 AG is
> not a supported or recommended geometry for any real-life use of xfs.
>
> If I am correct that you start with a base image w/ a certain geometry or
> set of mkfs options, starting with >= 2 AGs would improve the usefulness of the
> filesystem image.

syzkaller can generate/mutate images based on structured format
templates, but for now we don't have any templates and these are just
opaque blobs.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: WARNING: bad unlock balance in xfs_iunlock
  2018-04-30 14:02                 ` Dmitry Vyukov
@ 2018-04-30 15:14                   ` Eric Sandeen
  2018-05-02  9:54                     ` Jan Tulak
                                       ` (2 more replies)
  0 siblings, 3 replies; 27+ messages in thread
From: Eric Sandeen @ 2018-04-30 15:14 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Darrick J. Wong, Dave Chinner, syzbot, LKML, linux-xfs,
	syzkaller-bugs, Theodore Ts'o, syzkaller

On 4/30/18 9:02 AM, Dmitry Vyukov wrote:
> On Mon, Apr 30, 2018 at 3:49 PM, Eric Sandeen <sandeen@sandeen.net> wrote:

...

>>> It just extracted kernel source file name that looked relevant
>>> to this crash and run get_maintainers.pl on it.
>>> Also the image can contain dynamically generated data, which makes it
>>> impossible to have as a file at all.
>>
>> I guess I'm not sure what this means, can you explain?
> 
> Say, a value that we generally pass to close system call is not static
> and can't be dumped to a static file. It's whatever a previous open
> system call has returned. Inside of the program we memorize the return
> value of open in a variable and then pass it to close. This generally
> stands for all system calls. Say, an image can contain an uid, and
> that uid can be obtained from a system call too.

Ok, but that's the syscall side.  You are operating on a static xfs image,
correct?  We're only asking for the actual filesystem you're operating
against.

(When I say "image" I am talking only about the filesystem itself, not any
other syzkaller state)
 
...

>> That was not at all clear to me.  I thought when syzkaller was telling us
>> "on upstream commit XYZ," it meant that it had identified commit XYZ as bad.
>> I'm not sure if anyone else made that mistake, but  perhaps you could also clarify
>> the bug report text in this regard?
> 
> Suggestions are welcome. Currently it says "syzbot hit the following
> crash on upstream commit SHA1", which was supposed to mean just the
> state of the source tree when the crash happened. But I am not a
> native speaker, so perhaps I am saying not what I intend to say.
> 
> There are also suggestions on report format improvement from +Ted
> currently in works:
> https://github.com/google/syzkaller/issues/565#issuecomment-380792942
> Not sure if they make this distinction 100% clear, though.

Maybe I was the only one who misunderstood, but something like

git tree:       git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
HEAD:		f5c754d63d06 mm/swap_state.c: make bool enable_vma_readahead and swap_vma_readahead()
 
to make it clear that it has not identified that commit as the culprit, it's
just the head of the tree you were testing?  (I think I have the correct git
nomenclature ...)

...

>> If the base image only has one allocation group, it makes it more difficult for
>> some tools to work with the image, because there is no redundancy.  1 AG is
>> not a supported or recommended geometry for any real-life use of xfs.
>>
>> If I am correct that you start with a base image w/ a certain geometry or
>> set of mkfs options, starting with >= 2 AGs would improve the usefulness of the
>> filesystem image.
> 
> syzkaller can generate/mutate images based on structured format
> templates, but for now we don't have any templates and these are just
> opaque blobs.

Ok, backing up more: When you are testing against an xfs filesystem image, where
does that image come from?  How is it generated?  A quick look at the syzkaller
tree didn't make that clear to me.

the xfs.repro file you provided at
https://drive.google.com/file/d/1jzhGGe5SBJcqfsjxCLHoh4Kazke1oTfC/view

is strange, it doesn't even contain AGF blocks; they aren't fuzzed or corrupted,
they are completely zeroed out.  I don't know if that's part of the fuzzing,
or what - what steps led to that image?

Or put another way, how did you arrive at the fs image values in the reproducer,
i.e.:

oid loop()
{
  memcpy((void*)0x20000000, "xfs", 4);
  memcpy((void*)0x20000100, "./file0", 8);
  *(uint64_t*)0x20000200 = 0x20010000;
  memcpy((void*)0x20010000,
         "\x58\x46\x53\x42\x00\x00\x10\x00\x00\x00\x00\x00\x00\x00\x10\x00\x00"
         "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x9f\x98"
         "\x99\xff\xcb\xa1\x4e\xe6\xad\x52\x08\x20\x67\x09\xed\x75\x00\x00\x00"
         "\x00\x00\x00\x00\x04\x00\x00\x00\x00\x00\x00\x35\xe0\x00\x00\x00\x00"
         "\x00\x00\x35\xe1\x00\x00\x00\x00\x00\x00\x35\xe2\x00\x00\x00\x01\x00"
         "\x00\x10\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x03\x55\xb4\xa4"
         "\x02\x00\x01\x00\x00\x10\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
         "\x00\x0c\x09\x08\x04\x0c\x00\x00\x19\x00\x00\x00\x00\x00\x00\x00\x40"
         "\x00\x00\x00\x00\x00\x00\x00\x3d\x00\x00\x00\x00\x00\x00\x0c\xa3\x00"
         "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
         "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00"
         "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x02\x02",
         204);

... 

The in-memory xfs filesystem it constructs is damaged, is that an intentional
part of the fuzzing during the test?

-Eric

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: WARNING: bad unlock balance in xfs_iunlock
  2018-04-30 13:24     ` Dmitry Vyukov
@ 2018-05-01 22:51       ` Dave Chinner
  2018-05-08  7:56         ` Dmitry Vyukov
  0 siblings, 1 reply; 27+ messages in thread
From: Dave Chinner @ 2018-05-01 22:51 UTC (permalink / raw)
  To: Dmitry Vyukov; +Cc: syzbot, Darrick J. Wong, LKML, linux-xfs, syzkaller-bugs

On Mon, Apr 30, 2018 at 03:24:48PM +0200, Dmitry Vyukov wrote:
> On Thu, Apr 5, 2018 at 8:54 PM, Dmitry Vyukov <dvyukov@google.com> wrote:
> > On Tue, Apr 3, 2018 at 6:38 AM, Dave Chinner <david@fromorbit.com> wrote:
> >> On Mon, Apr 02, 2018 at 07:01:02PM -0700, syzbot wrote:
> >>> Hello,
> >>>
> >>> syzbot hit the following crash on upstream commit
> >>> 86bbbebac1933e6e95e8234c4f7d220c5ddd38bc (Mon Apr 2 18:47:07 2018 +0000)
> >>> Merge branch 'ras-core-for-linus' of
> >>> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> >>> syzbot dashboard link:
> >>> https://syzkaller.appspot.com/bug?extid=84a67953651a971809ba
> >>>
> >>> C reproducer: https://syzkaller.appspot.com/x/repro.c?id=5719304272084992
> >>> syzkaller reproducer:
> >>> https://syzkaller.appspot.com/x/repro.syz?id=5767783983874048
> >>
> >> What a mess. A hand built, hopelessly broken filesystem image made
> >> up of hex dumps, written into a mmap()d region of memory, then
> >> copied into a tmpfs file and mounted with the loop device.
> >>
> >> Engineers that can debug broken filesystems don't grow on trees.  If
> >> we are to have any hope of understanding what the hell this test is
> >> doing, the bot needs to supply us with a copy of the built
> >> filesystem image the test uses. We need to be able to point forensic
> >> tools at the image to decode all the structures into human readable
> >> format - if we are forced to do that by hand or jump through hoops
> >> to create our own filesystem image than I'm certainly not going to
> >> waste time looking at these reports...
> >
> > Hi Dave,
> >
> > Here is the image:
> > https://drive.google.com/file/d/1jzhGGe5SBJcqfsjxCLHoh4Kazke1oTfC/view
> 
> Have anybody looked at the bug and the image yet?

Yes, I did that a couple of weeks ago. Couldn't reproduce on a TOT
kernel here.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: WARNING: bad unlock balance in xfs_iunlock
  2018-04-30 15:14                   ` Eric Sandeen
@ 2018-05-02  9:54                     ` Jan Tulak
  2018-05-08  7:52                     ` Dmitry Vyukov
  2018-05-08  7:54                     ` Dmitry Vyukov
  2 siblings, 0 replies; 27+ messages in thread
From: Jan Tulak @ 2018-05-02  9:54 UTC (permalink / raw)
  To: Eric Sandeen
  Cc: Dmitry Vyukov, Darrick J. Wong, Dave Chinner, syzbot, LKML,
	linux-xfs, syzkaller-bugs, Theodore Ts'o, syzkaller

On Mon, Apr 30, 2018 at 5:14 PM, Eric Sandeen <sandeen@sandeen.net> wrote:
> On 4/30/18 9:02 AM, Dmitry Vyukov wrote:
>> Suggestions are welcome. Currently it says "syzbot hit the following
>> crash on upstream commit SHA1", which was supposed to mean just the
>> state of the source tree when the crash happened. But I am not a
>> native speaker, so perhaps I am saying not what I intend to say.
>>
>> There are also suggestions on report format improvement from +Ted
>> currently in works:
>> https://github.com/google/syzkaller/issues/565#issuecomment-380792942
>> Not sure if they make this distinction 100% clear, though.
>
> Maybe I was the only one who misunderstood, but something like
>

snip

You are not the only one, I (mis)understand it the same as you.

Jan

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: WARNING: bad unlock balance in xfs_iunlock
  2018-04-30 15:14                   ` Eric Sandeen
  2018-05-02  9:54                     ` Jan Tulak
@ 2018-05-08  7:52                     ` Dmitry Vyukov
  2018-05-09  2:48                       ` Eric Sandeen
  2018-05-08  7:54                     ` Dmitry Vyukov
  2 siblings, 1 reply; 27+ messages in thread
From: Dmitry Vyukov @ 2018-05-08  7:52 UTC (permalink / raw)
  To: Eric Sandeen
  Cc: Darrick J. Wong, Dave Chinner, syzbot, LKML, linux-xfs,
	syzkaller-bugs, Theodore Ts'o, syzkaller

On Mon, Apr 30, 2018 at 5:14 PM, Eric Sandeen <sandeen@sandeen.net> wrote:
> On 4/30/18 9:02 AM, Dmitry Vyukov wrote:
>> On Mon, Apr 30, 2018 at 3:49 PM, Eric Sandeen <sandeen@sandeen.net> wrote:
>
> ...
>
>>>> It just extracted kernel source file name that looked relevant
>>>> to this crash and run get_maintainers.pl on it.
>>>> Also the image can contain dynamically generated data, which makes it
>>>> impossible to have as a file at all.
>>>
>>> I guess I'm not sure what this means, can you explain?
>>
>> Say, a value that we generally pass to close system call is not static
>> and can't be dumped to a static file. It's whatever a previous open
>> system call has returned. Inside of the program we memorize the return
>> value of open in a variable and then pass it to close. This generally
>> stands for all system calls. Say, an image can contain an uid, and
>> that uid can be obtained from a system call too.
>
> Ok, but that's the syscall side.  You are operating on a static xfs image,
> correct?  We're only asking for the actual filesystem you're operating
> against.

Not necessary. Image can be dynamically generate too, all inputs to
kernel are generally dynamically generated.


> (When I say "image" I am talking only about the filesystem itself, not any
> other syzkaller state)

OK, let's do it this way. For the first 10 bugs, ask me, and I will do
it manually.
I am all for automation. And syzbot is already more automated than
most kernel testing systems. But, as I said, this is really
not-trivial, large amount of work, and is specific to one out of
dozens of kernel subsystems.


> Ok, backing up more: When you are testing against an xfs filesystem image, where
> does that image come from?  How is it generated?  A quick look at the syzkaller
> tree didn't make that clear to me.
>
> the xfs.repro file you provided at
> https://drive.google.com/file/d/1jzhGGe5SBJcqfsjxCLHoh4Kazke1oTfC/view
>
> is strange, it doesn't even contain AGF blocks; they aren't fuzzed or corrupted,
> they are completely zeroed out.  I don't know if that's part of the fuzzing,
> or what - what steps led to that image?
>
> Or put another way, how did you arrive at the fs image values in the reproducer,
> i.e.:

Currently they are completely random, nobody taught syzkaller about AGFs, etc.

> oid loop()
> {
>   memcpy((void*)0x20000000, "xfs", 4);
>   memcpy((void*)0x20000100, "./file0", 8);
>   *(uint64_t*)0x20000200 = 0x20010000;
>   memcpy((void*)0x20010000,
>          "\x58\x46\x53\x42\x00\x00\x10\x00\x00\x00\x00\x00\x00\x00\x10\x00\x00"
>          "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x9f\x98"
>          "\x99\xff\xcb\xa1\x4e\xe6\xad\x52\x08\x20\x67\x09\xed\x75\x00\x00\x00"
>          "\x00\x00\x00\x00\x04\x00\x00\x00\x00\x00\x00\x35\xe0\x00\x00\x00\x00"
>          "\x00\x00\x35\xe1\x00\x00\x00\x00\x00\x00\x35\xe2\x00\x00\x00\x01\x00"
>          "\x00\x10\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x03\x55\xb4\xa4"
>          "\x02\x00\x01\x00\x00\x10\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
>          "\x00\x0c\x09\x08\x04\x0c\x00\x00\x19\x00\x00\x00\x00\x00\x00\x00\x40"
>          "\x00\x00\x00\x00\x00\x00\x00\x3d\x00\x00\x00\x00\x00\x00\x0c\xa3\x00"
>          "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
>          "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00"
>          "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x02\x02",
>          204);
>
> ...
>
> The in-memory xfs filesystem it constructs is damaged, is that an intentional
> part of the fuzzing during the test?

Yes, invalid inputs is part of testing.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: WARNING: bad unlock balance in xfs_iunlock
  2018-04-30 15:14                   ` Eric Sandeen
  2018-05-02  9:54                     ` Jan Tulak
  2018-05-08  7:52                     ` Dmitry Vyukov
@ 2018-05-08  7:54                     ` Dmitry Vyukov
  2 siblings, 0 replies; 27+ messages in thread
From: Dmitry Vyukov @ 2018-05-08  7:54 UTC (permalink / raw)
  To: Eric Sandeen
  Cc: Darrick J. Wong, Dave Chinner, syzbot, LKML, linux-xfs,
	syzkaller-bugs, Theodore Ts'o, syzkaller

On Mon, Apr 30, 2018 at 5:14 PM, Eric Sandeen <sandeen@sandeen.net> wrote:
> On 4/30/18 9:02 AM, Dmitry Vyukov wrote:
>> On Mon, Apr 30, 2018 at 3:49 PM, Eric Sandeen <sandeen@sandeen.net> wrote:
>
> ...
>
>>>> It just extracted kernel source file name that looked relevant
>>>> to this crash and run get_maintainers.pl on it.
>>>> Also the image can contain dynamically generated data, which makes it
>>>> impossible to have as a file at all.
>>>
>>> I guess I'm not sure what this means, can you explain?
>>
>> Say, a value that we generally pass to close system call is not static
>> and can't be dumped to a static file. It's whatever a previous open
>> system call has returned. Inside of the program we memorize the return
>> value of open in a variable and then pass it to close. This generally
>> stands for all system calls. Say, an image can contain an uid, and
>> that uid can be obtained from a system call too.
>
> Ok, but that's the syscall side.  You are operating on a static xfs image,
> correct?  We're only asking for the actual filesystem you're operating
> against.
>
> (When I say "image" I am talking only about the filesystem itself, not any
> other syzkaller state)
>
> ...
>
>>> That was not at all clear to me.  I thought when syzkaller was telling us
>>> "on upstream commit XYZ," it meant that it had identified commit XYZ as bad.
>>> I'm not sure if anyone else made that mistake, but  perhaps you could also clarify
>>> the bug report text in this regard?
>>
>> Suggestions are welcome. Currently it says "syzbot hit the following
>> crash on upstream commit SHA1", which was supposed to mean just the
>> state of the source tree when the crash happened. But I am not a
>> native speaker, so perhaps I am saying not what I intend to say.
>>
>> There are also suggestions on report format improvement from +Ted
>> currently in works:
>> https://github.com/google/syzkaller/issues/565#issuecomment-380792942
>> Not sure if they make this distinction 100% clear, though.
>
> Maybe I was the only one who misunderstood, but something like
>
> git tree:       git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
> HEAD:           f5c754d63d06 mm/swap_state.c: make bool enable_vma_readahead and swap_vma_readahead()
>
> to make it clear that it has not identified that commit as the culprit, it's
> just the head of the tree you were testing?  (I think I have the correct git
> nomenclature ...)


This is done now, you can see example of new format here:

https://lkml.org/lkml/2018/5/8/36

It says "HEAD commit" and also "syzbot engineers can be reached at <email>".

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: WARNING: bad unlock balance in xfs_iunlock
  2018-05-01 22:51       ` Dave Chinner
@ 2018-05-08  7:56         ` Dmitry Vyukov
  2018-05-09  0:50           ` Dave Chinner
  0 siblings, 1 reply; 27+ messages in thread
From: Dmitry Vyukov @ 2018-05-08  7:56 UTC (permalink / raw)
  To: Dave Chinner; +Cc: syzbot, Darrick J. Wong, LKML, linux-xfs, syzkaller-bugs

On Wed, May 2, 2018 at 12:51 AM, Dave Chinner <david@fromorbit.com> wrote:
>> >>> Hello,
>> >>>
>> >>> syzbot hit the following crash on upstream commit
>> >>> 86bbbebac1933e6e95e8234c4f7d220c5ddd38bc (Mon Apr 2 18:47:07 2018 +0000)
>> >>> Merge branch 'ras-core-for-linus' of
>> >>> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
>> >>> syzbot dashboard link:
>> >>> https://syzkaller.appspot.com/bug?extid=84a67953651a971809ba
>> >>>
>> >>> C reproducer: https://syzkaller.appspot.com/x/repro.c?id=5719304272084992
>> >>> syzkaller reproducer:
>> >>> https://syzkaller.appspot.com/x/repro.syz?id=5767783983874048
>> >>
>> >> What a mess. A hand built, hopelessly broken filesystem image made
>> >> up of hex dumps, written into a mmap()d region of memory, then
>> >> copied into a tmpfs file and mounted with the loop device.
>> >>
>> >> Engineers that can debug broken filesystems don't grow on trees.  If
>> >> we are to have any hope of understanding what the hell this test is
>> >> doing, the bot needs to supply us with a copy of the built
>> >> filesystem image the test uses. We need to be able to point forensic
>> >> tools at the image to decode all the structures into human readable
>> >> format - if we are forced to do that by hand or jump through hoops
>> >> to create our own filesystem image than I'm certainly not going to
>> >> waste time looking at these reports...
>> >
>> > Hi Dave,
>> >
>> > Here is the image:
>> > https://drive.google.com/file/d/1jzhGGe5SBJcqfsjxCLHoh4Kazke1oTfC/view
>>
>> Have anybody looked at the bug and the image yet?
>
> Yes, I did that a couple of weeks ago. Couldn't reproduce on a TOT
> kernel here.

Do you think it is fixed now? What fixed it? The bug was there.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: WARNING: bad unlock balance in xfs_iunlock
  2018-05-08  7:56         ` Dmitry Vyukov
@ 2018-05-09  0:50           ` Dave Chinner
  2018-05-09  2:37             ` Eric Biggers
  2018-05-09 13:55             ` Theodore Y. Ts'o
  0 siblings, 2 replies; 27+ messages in thread
From: Dave Chinner @ 2018-05-09  0:50 UTC (permalink / raw)
  To: Dmitry Vyukov; +Cc: syzbot, Darrick J. Wong, LKML, linux-xfs, syzkaller-bugs

On Tue, May 08, 2018 at 09:56:01AM +0200, Dmitry Vyukov wrote:
> On Wed, May 2, 2018 at 12:51 AM, Dave Chinner <david@fromorbit.com> wrote:
> >> >>> Hello,
> >> >>>
> >> >>> syzbot hit the following crash on upstream commit
> >> >>> 86bbbebac1933e6e95e8234c4f7d220c5ddd38bc (Mon Apr 2 18:47:07 2018 +0000)
> >> >>> Merge branch 'ras-core-for-linus' of
> >> >>> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> >> >>> syzbot dashboard link:
> >> >>> https://syzkaller.appspot.com/bug?extid=84a67953651a971809ba
> >> >>>
> >> >>> C reproducer: https://syzkaller.appspot.com/x/repro.c?id=5719304272084992
> >> >>> syzkaller reproducer:
> >> >>> https://syzkaller.appspot.com/x/repro.syz?id=5767783983874048
> >> >>
> >> >> What a mess. A hand built, hopelessly broken filesystem image made
> >> >> up of hex dumps, written into a mmap()d region of memory, then
> >> >> copied into a tmpfs file and mounted with the loop device.
> >> >>
> >> >> Engineers that can debug broken filesystems don't grow on trees.  If
> >> >> we are to have any hope of understanding what the hell this test is
> >> >> doing, the bot needs to supply us with a copy of the built
> >> >> filesystem image the test uses. We need to be able to point forensic
> >> >> tools at the image to decode all the structures into human readable
> >> >> format - if we are forced to do that by hand or jump through hoops
> >> >> to create our own filesystem image than I'm certainly not going to
> >> >> waste time looking at these reports...
> >> >
> >> > Hi Dave,
> >> >
> >> > Here is the image:
> >> > https://drive.google.com/file/d/1jzhGGe5SBJcqfsjxCLHoh4Kazke1oTfC/view
> >>
> >> Have anybody looked at the bug and the image yet?
> >
> > Yes, I did that a couple of weeks ago. Couldn't reproduce on a TOT
> > kernel here.
> 
> Do you think it is fixed now? What fixed it? The bug was there.

We merge fixes for fuzzing issues all the time. IIRC a big batch of
them from the xfstests fuzzing infrastructure went into 4.17-rc1.

If you want a commit, then do a bisect....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: WARNING: bad unlock balance in xfs_iunlock
  2018-05-09  0:50           ` Dave Chinner
@ 2018-05-09  2:37             ` Eric Biggers
  2018-05-09  3:32               ` Eric Sandeen
  2018-05-09 13:55             ` Theodore Y. Ts'o
  1 sibling, 1 reply; 27+ messages in thread
From: Eric Biggers @ 2018-05-09  2:37 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Dmitry Vyukov, syzbot, Darrick J. Wong, LKML, linux-xfs, syzkaller-bugs

On Wed, May 09, 2018 at 10:50:11AM +1000, Dave Chinner wrote:
> On Tue, May 08, 2018 at 09:56:01AM +0200, Dmitry Vyukov wrote:
> > On Wed, May 2, 2018 at 12:51 AM, Dave Chinner <david@fromorbit.com> wrote:
> > >> >>> Hello,
> > >> >>>
> > >> >>> syzbot hit the following crash on upstream commit
> > >> >>> 86bbbebac1933e6e95e8234c4f7d220c5ddd38bc (Mon Apr 2 18:47:07 2018 +0000)
> > >> >>> Merge branch 'ras-core-for-linus' of
> > >> >>> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> > >> >>> syzbot dashboard link:
> > >> >>> https://syzkaller.appspot.com/bug?extid=84a67953651a971809ba
> > >> >>>
> > >> >>> C reproducer: https://syzkaller.appspot.com/x/repro.c?id=5719304272084992
> > >> >>> syzkaller reproducer:
> > >> >>> https://syzkaller.appspot.com/x/repro.syz?id=5767783983874048
> > >> >>
> > >> >> What a mess. A hand built, hopelessly broken filesystem image made
> > >> >> up of hex dumps, written into a mmap()d region of memory, then
> > >> >> copied into a tmpfs file and mounted with the loop device.
> > >> >>
> > >> >> Engineers that can debug broken filesystems don't grow on trees.  If
> > >> >> we are to have any hope of understanding what the hell this test is
> > >> >> doing, the bot needs to supply us with a copy of the built
> > >> >> filesystem image the test uses. We need to be able to point forensic
> > >> >> tools at the image to decode all the structures into human readable
> > >> >> format - if we are forced to do that by hand or jump through hoops
> > >> >> to create our own filesystem image than I'm certainly not going to
> > >> >> waste time looking at these reports...
> > >> >
> > >> > Hi Dave,
> > >> >
> > >> > Here is the image:
> > >> > https://drive.google.com/file/d/1jzhGGe5SBJcqfsjxCLHoh4Kazke1oTfC/view
> > >>
> > >> Have anybody looked at the bug and the image yet?
> > >
> > > Yes, I did that a couple of weeks ago. Couldn't reproduce on a TOT
> > > kernel here.
> > 
> > Do you think it is fixed now? What fixed it? The bug was there.
> 
> We merge fixes for fuzzing issues all the time. IIRC a big batch of
> them from the xfstests fuzzing infrastructure went into 4.17-rc1.
> 
> If you want a commit, then do a bisect....
> 

The fix was commit 8241f7f983b9728:

#syz fix: xfs: don't iunlock the quota ip when quota block

- Eric

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: WARNING: bad unlock balance in xfs_iunlock
  2018-05-08  7:52                     ` Dmitry Vyukov
@ 2018-05-09  2:48                       ` Eric Sandeen
  2018-05-09  8:43                         ` Dmitry Vyukov
  0 siblings, 1 reply; 27+ messages in thread
From: Eric Sandeen @ 2018-05-09  2:48 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Darrick J. Wong, Dave Chinner, syzbot, LKML, linux-xfs,
	syzkaller-bugs, Theodore Ts'o, syzkaller



On 5/8/18 2:52 AM, Dmitry Vyukov wrote:
>> Or put another way, how did you arrive at the fs image values in the reproducer,
>> i.e.:
> Currently they are completely random, nobody taught syzkaller about AGFs, etc.

So you just combine a few megabytes of purely random bits out of thin air until
you get something that approximates an xfs filesystem?  Google must have more
computing power than I was aware of.

-Eric

>> oid loop()
>> {
>>   memcpy((void*)0x20000000, "xfs", 4);
>>   memcpy((void*)0x20000100, "./file0", 8);
>>   *(uint64_t*)0x20000200 = 0x20010000;
>>   memcpy((void*)0x20010000,
>>          "\x58\x46\x53\x42\x00\x00\x10\x00\x00\x00\x00\x00\x00\x00\x10\x00\x00"
>>          "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x9f\x98"
>>          "\x99\xff\xcb\xa1\x4e\xe6\xad\x52\x08\x20\x67\x09\xed\x75\x00\x00\x00"
>>          "\x00\x00\x00\x00\x04\x00\x00\x00\x00\x00\x00\x35\xe0\x00\x00\x00\x00"
>>          "\x00\x00\x35\xe1\x00\x00\x00\x00\x00\x00\x35\xe2\x00\x00\x00\x01\x00"
>>          "\x00\x10\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x03\x55\xb4\xa4"
>>          "\x02\x00\x01\x00\x00\x10\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
>>          "\x00\x0c\x09\x08\x04\x0c\x00\x00\x19\x00\x00\x00\x00\x00\x00\x00\x40"
>>          "\x00\x00\x00\x00\x00\x00\x00\x3d\x00\x00\x00\x00\x00\x00\x0c\xa3\x00"
>>          "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
>>          "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00"
>>          "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x02\x02",
>>          204);
>>
>> ...
>>
>> The in-memory xfs filesystem it constructs is damaged, is that an intentional
>> part of the fuzzing during the test?
> Yes, invalid inputs is part of testing.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: WARNING: bad unlock balance in xfs_iunlock
  2018-05-09  2:37             ` Eric Biggers
@ 2018-05-09  3:32               ` Eric Sandeen
  0 siblings, 0 replies; 27+ messages in thread
From: Eric Sandeen @ 2018-05-09  3:32 UTC (permalink / raw)
  To: Eric Biggers, Dave Chinner
  Cc: Dmitry Vyukov, syzbot, Darrick J. Wong, LKML, linux-xfs, syzkaller-bugs



On 5/8/18 9:37 PM, Eric Biggers wrote:
> On Wed, May 09, 2018 at 10:50:11AM +1000, Dave Chinner wrote:
>> On Tue, May 08, 2018 at 09:56:01AM +0200, Dmitry Vyukov wrote:
>>> On Wed, May 2, 2018 at 12:51 AM, Dave Chinner <david@fromorbit.com> wrote:
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> syzbot hit the following crash on upstream commit
>>>>>>>> 86bbbebac1933e6e95e8234c4f7d220c5ddd38bc (Mon Apr 2 18:47:07 2018 +0000)
>>>>>>>> Merge branch 'ras-core-for-linus' of
>>>>>>>> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
>>>>>>>> syzbot dashboard link:
>>>>>>>> https://syzkaller.appspot.com/bug?extid=84a67953651a971809ba
>>>>>>>>
>>>>>>>> C reproducer: https://syzkaller.appspot.com/x/repro.c?id=5719304272084992
>>>>>>>> syzkaller reproducer:
>>>>>>>> https://syzkaller.appspot.com/x/repro.syz?id=5767783983874048
>>>>>>>
>>>>>>> What a mess. A hand built, hopelessly broken filesystem image made
>>>>>>> up of hex dumps, written into a mmap()d region of memory, then
>>>>>>> copied into a tmpfs file and mounted with the loop device.
>>>>>>>
>>>>>>> Engineers that can debug broken filesystems don't grow on trees.  If
>>>>>>> we are to have any hope of understanding what the hell this test is
>>>>>>> doing, the bot needs to supply us with a copy of the built
>>>>>>> filesystem image the test uses. We need to be able to point forensic
>>>>>>> tools at the image to decode all the structures into human readable
>>>>>>> format - if we are forced to do that by hand or jump through hoops
>>>>>>> to create our own filesystem image than I'm certainly not going to
>>>>>>> waste time looking at these reports...
>>>>>>
>>>>>> Hi Dave,
>>>>>>
>>>>>> Here is the image:
>>>>>> https://drive.google.com/file/d/1jzhGGe5SBJcqfsjxCLHoh4Kazke1oTfC/view
>>>>>
>>>>> Have anybody looked at the bug and the image yet?
>>>>
>>>> Yes, I did that a couple of weeks ago. Couldn't reproduce on a TOT
>>>> kernel here.
>>>
>>> Do you think it is fixed now? What fixed it? The bug was there.
>>
>> We merge fixes for fuzzing issues all the time. IIRC a big batch of
>> them from the xfstests fuzzing infrastructure went into 4.17-rc1.
>>
>> If you want a commit, then do a bisect....
>>
> 
> The fix was commit 8241f7f983b9728:
> 
> #syz fix: xfs: don't iunlock the quota ip when quota block

Ah, thanks.  Interestingly that one was sent to the xfs list on 2/22, a couple
months before this bug report.  Took some time to get reviewed and merged
upstream, and it actually landed upstream in Linus' kernel not long after
this report...

-Eric

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: WARNING: bad unlock balance in xfs_iunlock
  2018-05-09  2:48                       ` Eric Sandeen
@ 2018-05-09  8:43                         ` Dmitry Vyukov
  2018-05-09 23:22                           ` Dave Chinner
  0 siblings, 1 reply; 27+ messages in thread
From: Dmitry Vyukov @ 2018-05-09  8:43 UTC (permalink / raw)
  To: Eric Sandeen
  Cc: Darrick J. Wong, Dave Chinner, syzbot, LKML, linux-xfs,
	syzkaller-bugs, Theodore Ts'o, syzkaller

On Wed, May 9, 2018 at 4:48 AM, Eric Sandeen <sandeen@sandeen.net> wrote:
>
>
> On 5/8/18 2:52 AM, Dmitry Vyukov wrote:
>>> Or put another way, how did you arrive at the fs image values in the reproducer,
>>> i.e.:
>> Currently they are completely random, nobody taught syzkaller about AGFs, etc.
>
> So you just combine a few megabytes of purely random bits out of thin air until
> you get something that approximates an xfs filesystem?  Google must have more
> computing power than I was aware of.


syzbot uses very few cores for fuzzing of all of the hundreds of
kernel subsystems. But syzkaller (the underlying fuzzer) uses
coverage-guidance and this makes fuzzing exponentially more efficient
than blind techniques. Coverage-guidance is also combined with
grammar-based generation techniques, which gives additional synergy
(though there are no grammar descriptions for filesystem formats at
the moment).

Does "xfstests fuzzing infrastructure" use coverage-guidance? If not,
it would be useful to add. Among some solutions there are LibFuzzer
(https://llvm.org/docs/LibFuzzer.html), AFL
(http://lcamtuf.coredump.cx/afl/), kernel-fuzzing
(https://github.com/oracle/kernel-fuzzing). I don't know how xfstests
fuzzing works, so I can't say what would be more suitable there.



>>> oid loop()
>>> {
>>>   memcpy((void*)0x20000000, "xfs", 4);
>>>   memcpy((void*)0x20000100, "./file0", 8);
>>>   *(uint64_t*)0x20000200 = 0x20010000;
>>>   memcpy((void*)0x20010000,
>>>          "\x58\x46\x53\x42\x00\x00\x10\x00\x00\x00\x00\x00\x00\x00\x10\x00\x00"
>>>          "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x9f\x98"
>>>          "\x99\xff\xcb\xa1\x4e\xe6\xad\x52\x08\x20\x67\x09\xed\x75\x00\x00\x00"
>>>          "\x00\x00\x00\x00\x04\x00\x00\x00\x00\x00\x00\x35\xe0\x00\x00\x00\x00"
>>>          "\x00\x00\x35\xe1\x00\x00\x00\x00\x00\x00\x35\xe2\x00\x00\x00\x01\x00"
>>>          "\x00\x10\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x03\x55\xb4\xa4"
>>>          "\x02\x00\x01\x00\x00\x10\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
>>>          "\x00\x0c\x09\x08\x04\x0c\x00\x00\x19\x00\x00\x00\x00\x00\x00\x00\x40"
>>>          "\x00\x00\x00\x00\x00\x00\x00\x3d\x00\x00\x00\x00\x00\x00\x0c\xa3\x00"
>>>          "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
>>>          "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00"
>>>          "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x02\x02",
>>>          204);
>>>
>>> ...
>>>
>>> The in-memory xfs filesystem it constructs is damaged, is that an intentional
>>> part of the fuzzing during the test?
>> Yes, invalid inputs is part of testing.
>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: WARNING: bad unlock balance in xfs_iunlock
  2018-05-09  0:50           ` Dave Chinner
  2018-05-09  2:37             ` Eric Biggers
@ 2018-05-09 13:55             ` Theodore Y. Ts'o
  2018-05-09 14:13               ` Dmitry Vyukov
  1 sibling, 1 reply; 27+ messages in thread
From: Theodore Y. Ts'o @ 2018-05-09 13:55 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Dmitry Vyukov, syzbot, Darrick J. Wong, LKML, linux-xfs, syzkaller-bugs

>>> C reproducer: https://syzkaller.appspot.com/x/repro.c?id=5719304272084992
>>> syzkaller reproducer:
>>> https://syzkaller.appspot.com/x/repro.syz?id=5767783983874048
>>
>> What a mess. A hand built, hopelessly broken filesystem image made
>> up of hex dumps, written into a mmap()d region of memory, then
>> copied into a tmpfs file and mounted with the loop device.
>>
>> Engineers that can debug broken filesystems don't grow on trees.  If
>> we are to have any hope of understanding what the hell this test is
>> doing, the bot needs to supply us with a copy of the built
>> filesystem image the test uses. We need to be able to point forensic
>> tools at the image to decode all the structures into human readable
>> format - if we are forced to do that by hand or jump through hoops
>> to create our own filesystem image than I'm certainly not going to
>> waste time looking at these reports...
>
> Hi Dave,
>
> Here is the image:
> https://drive.google.com/file/d/1jzhGGe5SBJcqfsjxCLHoh4Kazke1oTfC/view

A suggestion --- insteading of forcing human beings --- either
overworked file system developers, or understaffed fuzzing tool teams,
to have to manually pull out the file system image out from the C
repro, if it's too hard to add a link where the file system iamge can
be downloaded from the Syzkaller web application --- how about adding
an option to the C repro template which causes it to dump the image to
a file and then immediately exit?

					- Ted

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: WARNING: bad unlock balance in xfs_iunlock
  2018-05-09 13:55             ` Theodore Y. Ts'o
@ 2018-05-09 14:13               ` Dmitry Vyukov
  0 siblings, 0 replies; 27+ messages in thread
From: Dmitry Vyukov @ 2018-05-09 14:13 UTC (permalink / raw)
  To: Theodore Y. Ts'o, Dave Chinner, Dmitry Vyukov, syzbot,
	Darrick J. Wong, LKML, linux-xfs, syzkaller-bugs

On Wed, May 9, 2018 at 3:55 PM, Theodore Y. Ts'o <tytso@mit.edu> wrote:
>>>> C reproducer: https://syzkaller.appspot.com/x/repro.c?id=5719304272084992
>>>> syzkaller reproducer:
>>>> https://syzkaller.appspot.com/x/repro.syz?id=5767783983874048
>>>
>>> What a mess. A hand built, hopelessly broken filesystem image made
>>> up of hex dumps, written into a mmap()d region of memory, then
>>> copied into a tmpfs file and mounted with the loop device.
>>>
>>> Engineers that can debug broken filesystems don't grow on trees.  If
>>> we are to have any hope of understanding what the hell this test is
>>> doing, the bot needs to supply us with a copy of the built
>>> filesystem image the test uses. We need to be able to point forensic
>>> tools at the image to decode all the structures into human readable
>>> format - if we are forced to do that by hand or jump through hoops
>>> to create our own filesystem image than I'm certainly not going to
>>> waste time looking at these reports...
>>
>> Hi Dave,
>>
>> Here is the image:
>> https://drive.google.com/file/d/1jzhGGe5SBJcqfsjxCLHoh4Kazke1oTfC/view
>
> A suggestion --- insteading of forcing human beings --- either
> overworked file system developers, or understaffed fuzzing tool teams,
> to have to manually pull out the file system image out from the C
> repro, if it's too hard to add a link where the file system iamge can
> be downloaded from the Syzkaller web application --- how about adding
> an option to the C repro template which causes it to dump the image to
> a file and then immediately exit?

Hi Ted,

That's what I proposed above:
https://groups.google.com/d/msg/syzkaller-bugs/KJNNTgTdg_g/NRxarDcYBgAJ
But I did not get response yet.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: WARNING: bad unlock balance in xfs_iunlock
  2018-05-09  8:43                         ` Dmitry Vyukov
@ 2018-05-09 23:22                           ` Dave Chinner
  2018-05-11  8:59                             ` Dmitry Vyukov
  0 siblings, 1 reply; 27+ messages in thread
From: Dave Chinner @ 2018-05-09 23:22 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Eric Sandeen, Darrick J. Wong, syzbot, LKML, linux-xfs,
	syzkaller-bugs, Theodore Ts'o, syzkaller

On Wed, May 09, 2018 at 10:43:05AM +0200, Dmitry Vyukov wrote:
> Does "xfstests fuzzing infrastructure" use coverage-guidance?

It's guided manually to fuzz a substantial proportion of the fields
in the on-disk format that are susceptible to fuzzing bqased
attacks. It's not complete coverage yet, but it's getting better and
better, and we're finding more problems from it that random bit
based fuzzing has ever uncovered.

Also, the xfstests fuzzing defeats the CRC protection now built into
the metadata, which means it can exercise all the new filesystem
features that random bit fuzzers cannot exercise. That's the problem
with fuzzers like syzbot - they can only usefully fuzz the legacy
filesystem format which doesn't have CRC validation, nor many of the
other protections that the current filesystem format has to detect
corruption. This will also allow us to test things like online
repair of fuzzed structures....

Random bit perturbation filesystem image fuzzing was state of the
art ~10 years ago.  They were made redundant by filesystems like
XFS and ext4 adding metadata CRC checking ~5 years ago. The legacy
filesystem formats are essentially unfixable, and it's largely a
waste of time to be trying to make them robust against random bit
fuzzing because such random bit corruptions (like the syzbot images)
do not occur in the real world.

IOWs, we've had to advance the "state of the art" ourselves because
nobody else in the fuzzing world responded to the fact we've
essentially defeated random bit image fuzzing. Not only that, we
have our own infrastructure that produces repeatable, understandable
and debuggable failures, and this is something that many 3rd party
fuzzing efforts do not provide us with.

> If not, it would be useful to add. Among some solutions there are
> LibFuzzer (https://llvm.org/docs/LibFuzzer.html), AFL >
> (http://lcamtuf.coredump.cx/afl/), kernel-fuzzing
> (https://github.com/oracle/kernel-fuzzing).  I don't know how
> xfstests fuzzing works, so I can't say what would be more suitable
> there.

I think only AFL would be a usable infrastructure, but it would
require being taught about the ondisk format so it could perturb the
image in ways that are useful to modern filesystem formats. Lots(!)
of work, and it's not clear it would do any better than what we
already have.

Given the results we're getting from our own fuzzers, I don't see
much point in (XFS developers) investing huge amounts of effort to
make some other fuzzer equivalent to what we already have. If
someone else starts fuzzing the current format (v5) XFS filesystems
and finding problems we haven't, then I'm going to be interested in
their fuzzing tools.  But (guided) random bit perturbation fuzzing
of legacy filesystem formats is really not that useful or
interesting to us right now.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: WARNING: bad unlock balance in xfs_iunlock
  2018-05-09 23:22                           ` Dave Chinner
@ 2018-05-11  8:59                             ` Dmitry Vyukov
  2018-05-12  1:16                               ` Dave Chinner
  0 siblings, 1 reply; 27+ messages in thread
From: Dmitry Vyukov @ 2018-05-11  8:59 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Eric Sandeen, Darrick J. Wong, syzbot, LKML, linux-xfs,
	syzkaller-bugs, Theodore Ts'o, syzkaller

On Thu, May 10, 2018 at 1:22 AM, Dave Chinner <david@fromorbit.com> wrote:
> On Wed, May 09, 2018 at 10:43:05AM +0200, Dmitry Vyukov wrote:
>> Does "xfstests fuzzing infrastructure" use coverage-guidance?
>
> It's guided manually to fuzz a substantial proportion of the fields
> in the on-disk format that are susceptible to fuzzing bqased
> attacks. It's not complete coverage yet, but it's getting better and
> better, and we're finding more problems from it that random bit
> based fuzzing has ever uncovered.
>
> Also, the xfstests fuzzing defeats the CRC protection now built into
> the metadata, which means it can exercise all the new filesystem
> features that random bit fuzzers cannot exercise. That's the problem
> with fuzzers like syzbot - they can only usefully fuzz the legacy
> filesystem format which doesn't have CRC validation, nor many of the
> other protections that the current filesystem format has to detect
> corruption. This will also allow us to test things like online
> repair of fuzzed structures....

syzkaller has 2 techniques to deal with checksums, if you are
interested I can go into more detail.

> Random bit perturbation filesystem image fuzzing was state of the
> art ~10 years ago.  They were made redundant by filesystems like
> XFS and ext4 adding metadata CRC checking ~5 years ago. The legacy
> filesystem formats are essentially unfixable, and it's largely a
> waste of time to be trying to make them robust against random bit
> fuzzing because such random bit corruptions (like the syzbot images)
> do not occur in the real world.

Note there are also specifically crafted images that can be
automounted whenever one plugs something into USB.

> IOWs, we've had to advance the "state of the art" ourselves because
> nobody else in the fuzzing world responded to the fact we've
> essentially defeated random bit image fuzzing. Not only that, we
> have our own infrastructure that produces repeatable, understandable
> and debuggable failures, and this is something that many 3rd party
> fuzzing efforts do not provide us with.
>
>> If not, it would be useful to add. Among some solutions there are
>> LibFuzzer (https://llvm.org/docs/LibFuzzer.html), AFL >
>> (http://lcamtuf.coredump.cx/afl/), kernel-fuzzing
>> (https://github.com/oracle/kernel-fuzzing).  I don't know how
>> xfstests fuzzing works, so I can't say what would be more suitable
>> there.
>
> I think only AFL would be a usable infrastructure, but it would
> require being taught about the ondisk format so it could perturb the
> image in ways that are useful to modern filesystem formats. Lots(!)
> of work, and it's not clear it would do any better than what we
> already have.
>
> Given the results we're getting from our own fuzzers, I don't see
> much point in (XFS developers) investing huge amounts of effort to
> make some other fuzzer equivalent to what we already have. If
> someone else starts fuzzing the current format (v5) XFS filesystems
> and finding problems we haven't, then I'm going to be interested in
> their fuzzing tools.  But (guided) random bit perturbation fuzzing
> of legacy filesystem formats is really not that useful or
> interesting to us right now.

Just asked.

Note that coverage-guidance does not necessary mean bit flipping.
syzkaller combines coverage-guidance with grammar-awareness and other
smartness.

Based on our experience with network testing, main advantage of
syzkaller over just feeding blobs as network packets (even if these
blobs are built in a very smart way) is the following. syzkaller can
build complex interactions between syscalls, external inputs and
blobs. For example, handling of external network packets depend on if
there is an open socket on that port, what setsockopts were called, if
there is a pending receive, what flags were passed to that receive,
were some data sent the other way, etc. For filesystems that would be
various filesystem syscalls executed against the mounted image,
concurrent umount, rebind, switch to read-only mode, etc.
But maybe xfstests do this too, I don't know. Do they?

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: WARNING: bad unlock balance in xfs_iunlock
  2018-05-11  8:59                             ` Dmitry Vyukov
@ 2018-05-12  1:16                               ` Dave Chinner
  0 siblings, 0 replies; 27+ messages in thread
From: Dave Chinner @ 2018-05-12  1:16 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Eric Sandeen, Darrick J. Wong, syzbot, LKML, linux-xfs,
	syzkaller-bugs, Theodore Ts'o, syzkaller

On Fri, May 11, 2018 at 10:59:53AM +0200, Dmitry Vyukov wrote:
> On Thu, May 10, 2018 at 1:22 AM, Dave Chinner <david@fromorbit.com> wrote:
> > On Wed, May 09, 2018 at 10:43:05AM +0200, Dmitry Vyukov wrote:
> >> Does "xfstests fuzzing infrastructure" use coverage-guidance?
> >
> > It's guided manually to fuzz a substantial proportion of the fields
> > in the on-disk format that are susceptible to fuzzing bqased
> > attacks. It's not complete coverage yet, but it's getting better and
> > better, and we're finding more problems from it that random bit
> > based fuzzing has ever uncovered.
> >
> > Also, the xfstests fuzzing defeats the CRC protection now built into
> > the metadata, which means it can exercise all the new filesystem
> > features that random bit fuzzers cannot exercise. That's the problem
> > with fuzzers like syzbot - they can only usefully fuzz the legacy
> > filesystem format which doesn't have CRC validation, nor many of the
> > other protections that the current filesystem format has to detect
> > corruption. This will also allow us to test things like online
> > repair of fuzzed structures....
> 
> syzkaller has 2 techniques to deal with checksums, if you are
> interested I can go into more detail.

You can if you want, but I'm betting it basically comes down to
teaching syzcaller about parts of the on-disk format, similar to
AFL. And, like AFL, I doubt any XFS developer has the time to
add such support to syzbot.

> > Given the results we're getting from our own fuzzers, I don't see
> > much point in (XFS developers) investing huge amounts of effort to
> > make some other fuzzer equivalent to what we already have. If
> > someone else starts fuzzing the current format (v5) XFS filesystems
> > and finding problems we haven't, then I'm going to be interested in
> > their fuzzing tools.  But (guided) random bit perturbation fuzzing
> > of legacy filesystem formats is really not that useful or
> > interesting to us right now.
> 
> Just asked.
> 
> Note that coverage-guidance does not necessary mean bit flipping.
> syzkaller combines coverage-guidance with grammar-awareness and other
> smartness.

Yup, I assumed that this would be the case - those sorts of
"directed fuzzing" techniques were pioneered by the Samba guys for
reverse engineering the SMB protocol used by MS servers all those
years ago. But at it's most basic level, it's still using bit
flipping techniques to perturb the input and provoke responses.

> Based on our experience with network testing, main advantage of
> syzkaller over just feeding blobs as network packets (even if these
> blobs are built in a very smart way) is the following. syzkaller can
> build complex interactions between syscalls, external inputs and
> blobs.

Yup, nothing new there - that's what every other filesystem fuzzer
infrastructure does, too.  The problem with this is that it doesn't
pin-point the actual operation that tripped over the on-disk
corruption. It's catching downstream symptoms of an unknown,
undetected on-disk format corruption. i.e. it's a poor substitute
for explicit testing of structure bounds and data relationships of a
known format.

That's the fundamental premise of fuzz testing - most software does
not have robust validation of it's inputs and so fuzzing those
inputs finds problems. We've moved on from the old "trust and don't
validate" model of filesystem structure architecture.  The on-disk
format is very well defined, it is constrained in most cases, and we
can validate most individual structures at runtime with relatively
little cost.

Hence the "structure bounds" exploits that fuzzers tend to exercise
are pretty much taken out of the picture, and that leaves us with
"data relationships" between structures as the main vector for
undetected corruptions. These are mostly detectable, and many are
correctable as the current on-disk format has a lot of redundant
information. So the space for fuzzers to detect problems is getting
smaller and smaller all the time.

IOWs, filesystem image fuzzers have their place, but if you want us
to take your fuzzing seriously then your fuzzer needs to understand
all the mechanisms we now use to detect corruptions to show us where
they are deficient. If your fuzzing doesn't expose flaws in our
current validation techniques, then it's really not useful to us.

> For example, handling of external network packets depend on if
> there is an open socket on that port, what setsockopts were called, if
> there is a pending receive, what flags were passed to that receive,
> were some data sent the other way, etc. For filesystems that would be
> various filesystem syscalls executed against the mounted image,
> concurrent umount, rebind, switch to read-only mode, etc.
> But maybe xfstests do this too, I don't know. Do they?

Generally there is no need to do this because we know exactly what
syscalls will trigger access and/or modification to on-disk
structures. Access to the on-disk structures triggers the built in
verifier infrastructure, which then catches the majority of
corruptions before the structure is used by anything in the kernel.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2018-05-12  1:21 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-04-03  2:01 WARNING: bad unlock balance in xfs_iunlock syzbot
2018-04-03  4:38 ` Dave Chinner
2018-04-05 18:54   ` Dmitry Vyukov
2018-04-05 21:38     ` Dave Chinner
2018-04-06 16:10       ` Darrick J. Wong
2018-04-13 10:03         ` Dmitry Vyukov
2018-04-16 19:22           ` Eric Sandeen
2018-04-30 13:23             ` Dmitry Vyukov
2018-04-30 13:49               ` Eric Sandeen
2018-04-30 14:02                 ` Dmitry Vyukov
2018-04-30 15:14                   ` Eric Sandeen
2018-05-02  9:54                     ` Jan Tulak
2018-05-08  7:52                     ` Dmitry Vyukov
2018-05-09  2:48                       ` Eric Sandeen
2018-05-09  8:43                         ` Dmitry Vyukov
2018-05-09 23:22                           ` Dave Chinner
2018-05-11  8:59                             ` Dmitry Vyukov
2018-05-12  1:16                               ` Dave Chinner
2018-05-08  7:54                     ` Dmitry Vyukov
2018-04-30 13:24     ` Dmitry Vyukov
2018-05-01 22:51       ` Dave Chinner
2018-05-08  7:56         ` Dmitry Vyukov
2018-05-09  0:50           ` Dave Chinner
2018-05-09  2:37             ` Eric Biggers
2018-05-09  3:32               ` Eric Sandeen
2018-05-09 13:55             ` Theodore Y. Ts'o
2018-05-09 14:13               ` Dmitry Vyukov

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.