Re: WARNING: bad unlock balance in xfs_iunlock

From: Dmitry Vyukov <dvyukov@google.com>
To: Eric Sandeen <sandeen@sandeen.net>
Cc: "Darrick J. Wong" <darrick.wong@oracle.com>,
	Dave Chinner <david@fromorbit.com>,
	syzbot <syzbot+84a67953651a971809ba@syzkaller.appspotmail.com>,
	LKML <linux-kernel@vger.kernel.org>,
	linux-xfs@vger.kernel.org,
	syzkaller-bugs <syzkaller-bugs@googlegroups.com>,
	"Theodore Ts'o" <tytso@mit.edu>,
	syzkaller <syzkaller@googlegroups.com>
Subject: Re: WARNING: bad unlock balance in xfs_iunlock
Date: Mon, 30 Apr 2018 16:02:19 +0200	[thread overview]
Message-ID: <CACT4Y+bUx4jw83Ux7w9URL9bCy1ERLaVGTTwvA9Fe_UtLFeqCw@mail.gmail.com> (raw)
In-Reply-To: <9f8d657c-7f42-7bd9-4477-6c3addf16dee@sandeen.net>

On Mon, Apr 30, 2018 at 3:49 PM, Eric Sandeen <sandeen@sandeen.net> wrote:
> On 4/30/18 8:23 AM, Dmitry Vyukov wrote:
>> On Mon, Apr 16, 2018 at 9:22 PM, Eric Sandeen <sandeen@sandeen.net> wrote:
>
> ...
>
>>> It sure /seems/ to have a notion of images: what else is syz_mount_image()?
>>>
>>> i.e. you are mounting an image to reproduce the problem, correct?
>>> And the system is "smart" enough to fire off an email to a filesystem list;
>>> if it does so, add a link to the image itself, as you already have already done
>>> for the C reproducer.
>>>
>>> Filesystem images are common parlance for filesystem engineers.  When
>>> you engage with them you'll have better results if you  provide them with
>>> inputs they can use directly instead of requiring them to reverse-engineer
>>> your custom test harness.
>>
>>
>> Well, yes and no.
>> syz_mount_image() is the only part of a large system that knows about
>> images. For the rest of the system it's just a syscall like any other
>> syscall. And the part that sends emails is far away from
>> syz_mount_image().
>> syzbot does not know per se that it sends an email to filesystems
>> list.
>
> I am asking it to learn this trick as an enhancement.  The MAINTAINERS
> file contains big clues about which subsystems are filesystems, for example:
>
> $ grep FILESYSTEM$ MAINTAINERS
> AFS FILESYSTEM
> CRAMFS FILESYSTEM
> EFI VARIABLE FILESYSTEM
> EFS FILESYSTEM
> FREEVXFS FILESYSTEM
> ...
>
>> It just extracted kernel source file name that looked relevant
>> to this crash and run get_maintainers.pl on it.
>> Also the image can contain dynamically generated data, which makes it
>> impossible to have as a file at all.
>
> I guess I'm not sure what this means, can you explain?

Say, a value that we generally pass to close system call is not static
and can't be dumped to a static file. It's whatever a previous open
system call has returned. Inside of the program we memorize the return
value of open in a variable and then pass it to close. This generally
stands for all system calls. Say, an image can contain an uid, and
that uid can be obtained from a system call too.

>> Thinking of this, what should be reasonably easy to do and may be a
>> compromise for near future is the following. We could insert code into
>> syz_mount_image() which dump the image if you build a program with a
>> special define (e.g. -DSYZ_DUMP_IMAGE). Will this work for you?
>
> If this is possible, I guess I still don't understand why you can't dump the
> image and provide link.  You have fast, efficient robots.  We have slow,
> busy humans.
>
>>>> Please elaborate re commits. It's a basic rule of any good bug report
>>>> -- communicate exact state of source code when the bug was hit, i.e.
>>>> provide the commit hash.
>>>
>>> Further best practice is to provide the /correct/ commit hash.
>
> ....
>
>>> I can't imagine these are right...
>>
>>
>> As I said, bisection is on our plate:
>> https://github.com/google/syzkaller/issues/501
>> Though, we will see how well it works, because it's not trivial (see
>> the issue for details).
>
> Oh I see. I had misunderstood; so:
>
>> syzbot hit the following crash on upstream commit
>> 86bbbebac1933e6e95e8234c4f7d220c5ddd38bc (Mon Apr 2 18:47:07 2018 +0000)
>> Merge branch 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
>
> doesn't mean you bisected to that commit, or that this was the first bad commit,
> it just means you happened to have a tree at this commit when you hit the problem.
>
> That was not at all clear to me.  I thought when syzkaller was telling us
> "on upstream commit XYZ," it meant that it had identified commit XYZ as bad.
> I'm not sure if anyone else made that mistake, but  perhaps you could also clarify
> the bug report text in this regard?

Suggestions are welcome. Currently it says "syzbot hit the following
crash on upstream commit SHA1", which was supposed to mean just the
state of the source tree when the crash happened. But I am not a
native speaker, so perhaps I am saying not what I intend to say.

There are also suggestions on report format improvement from +Ted
currently in works:
https://github.com/google/syzkaller/issues/565#issuecomment-380792942
Not sure if they make this distinction 100% clear, though.

>>> I think that in this case, what we are asking for is a fine tuning of the
>>> testing and reporting so that we can more efficiently address these issues.
>>> Off the top of my head, and there may be more items:
>>>
>>> 1) Add a human contact to the emails, start an IRC channel, etc, for better
>>>    two-way communication.  (it wasn't clear that syzkaller@ reached humans,
>>>    tbh.)
>>
>> There is a human contact at syzkaller@googlegroups.com. Report footer
>> will be improved to make it more clear:
>> https://github.com/google/syzkaller/issues/565#issuecomment-380793620
>>
>>> 2) _Properly_ identify the regressing commit, if any.  If it doesn't look like
>>>    a recent regression, you can state that too.
>>
>> Bisection is on our plate.
>>
>>> 3) If you're reporting a filesystem bug that arose from using a filesystem
>>>    image, provide a URL to that filesystem image directly in the report.
>>
>> See above. It may not be necessary representable as a static file at all.
>
> Can you explain this?  Do you mean that the mounted image is changing while the
> tool runs, while the filesystem is mounted?
>
>>> 4) Create a filesystem image that can be more easily debugged by the experts,
>>>    i.e. one with > 1 allocation group, so standard repair & analysis tools can
>>>    be used with it.
>>
>> What is "> 1 allocation group"?
>
> Maybe I should back up; how are the xfs images created?  I had assumed that
> surely you start with a base image of some sort, and start fuzzing it from there.
> Is this correct?
>
> If so, allocation groups are a fundamental geometry of the filesystem; from
> man mkfs.xfs.8:
>
>                    agcount=value
>                           This is used to specify  the  number  of  allocation
>                           groups.  The  data  section  of  the  filesystem  is
>                           divided into allocation groups to improve  the  per‐
>                           formance  of  XFS. More allocation groups imply that
>                           more parallelism can  be  achieved  when  allocating
>                           blocks and inodes. The minimum allocation group size
>                           is 16 MiB; the maximum size is  just  under  1  TiB.
>                           The  data  section of the filesystem is divided into
>                           value allocation groups  (default  value  is  scaled
>                           automatically based on the underlying device size).
>
> If the base image only has one allocation group, it makes it more difficult for
> some tools to work with the image, because there is no redundancy.  1 AG is
> not a supported or recommended geometry for any real-life use of xfs.
>
> If I am correct that you start with a base image w/ a certain geometry or
> set of mkfs options, starting with >= 2 AGs would improve the usefulness of the
> filesystem image.

syzkaller can generate/mutate images based on structured format
templates, but for now we don't have any templates and these are just
opaque blobs.