From: Amir Goldstein <amir73il@gmail.com>
To: Ashlie Martinez <ashmrtn@utexas.edu>
Cc: Eryu Guan <eguan@redhat.com>, Josef Bacik <jbacik@fb.com>,
	Vijay Chidambaram <vvijay03@gmail.com>,
	fstests <fstests@vger.kernel.org>,
	Ext4 <linux-ext4@vger.kernel.org>, Theodore Tso <tytso@mit.edu>
Subject: Re: [RFC][PATCH] fstest: regression test for ext4 crash consistency bug
Date: Thu, 31 Aug 2017 07:06:48 +0300
Message-ID: <CAOQ4uxgOu5yNV3zQ9WnJU_AQtpe9G9cxQQL52fWsXd=5bvGLaQ@mail.gmail.com>
In-Reply-To: <CAOQ4uxgRafGec=HFq1strxC5NM0jwNgbu5ZyP1g+c64=b399+A@mail.gmail.com>

[now really CC Ted]

On Thu, Aug 31, 2017 at 7:05 AM, Amir Goldstein <amir73il@gmail.com> wrote:
> On Thu, Aug 31, 2017 at 4:28 AM, Ashlie Martinez <ashmrtn@utexas.edu> wrote:
>> Amir,
>>
>> I have been working on CrashMonkey more and I have jerry-rigged together a
>> test in CrashMonkey that calls into `fsx` with the minimal test case you
>> made. I am able to reproduce the ext4 error that you found along with a few
>> other potential errors.
>>
>> A quick point: I run fsck with `-yf` instead of the `-nf` that xfstests
>> uses. The reason for this is that CrashMonkey would like to report on
>> fixable and unfixable errors in the future.
>>
>
> That makes sense, but keep in mind that 'fixable' errors may still lose data
> when fsck fixes them with -y. Perhaps you should consider running fsck in
> auto-fixing mode (i.e. e2fsck -p) when available, to classify errors as
> 'safely fixable'.
> I believe the errors these tests encountered are 'safely fixable', but I
> didn't check.
>
>> Running the ported test case, I find that CrashMonkey encounters the
>> following errors:
>> 1. Incorrect inode size and incorrect free data block and inode counts
>> (fixable)
>> 2. Incorrect free data block and inode counts (fixable)
>> 3. `Superblock needs_recovery flag is clear, but journal has data` notice
>> along with the errors present in case 1
>> 4. `Superblock needs_recovery flag is clear, but journal has data` notice
>> with no other errors
>>
>> For the incorrect i_size errors, I get the output `Inode 12, i_size is
>> 147456, should be 163840.` which I can also reproduce with your 501 xfstests
>> test case.
>>
>> When free data blocks and inode errors occur, the message is `Free blocks
>> count wrong (8795, counted=8714).` and `Free inodes count wrong (2549,
>> counted=2546).`
>>
>> I have not had a chance to look into the above errors to find their root
>> causes.
>>
>
> I believe this is what you get when you fsck -yf before trying to mount when
> the orphan list is not empty. You should avoid doing that.
>
> See what the greatest ext4 crash test experiment of them all is doing
> and read the comment to understand why:
> https://android.googlesource.com/platform/system/core/+/marshmallow-mr1-dev/fs_mgr/fs_mgr.c#96
> 1. mount -o errors=remount-ro; umount
> 2. e2fsck -y
>
> So upstream Android never runs e2fsck -f. It only checks the fs if the kernel
> has marked the fs as having errors.
> Cyanogenmod did add -f, though, and I imagine that many vendors do as well.
>
> As one who hacked and crashed a lot of Android devices, I can attest that I
> have observed both data loss and corrupted (non-booting) fs, but the rest of
> the 2 billion crash test monkeys don't seem to be bothered ;-)
>
>> In total, CrashMonkey ran 1000 different tests. Of those, 344 passed without
>> fsck complaining. The remaining 656 tests saw fsck complain about something.
>> All of these tests consisted of unique sequences of bios, but may contain
>> equivalent crash states.
>>
>> The larger range of test results is due to the fact that CrashMonkey runs
>> many tests from just the single workload you made. These tests consist of
>> replaying some number of bio write operations, so it tests states different
>> from your 500 xfstest, which I believe only replays to sync operations (i.e.
>> it never stops replay before a recorded fsync).
>
> That is correct. Test 500 (temporary name) is mostly focused on checking
> data consistency of files after fsync. Detecting metadata consistency errors
> is a by-product. I do intend to add more tests focused on metadata consistency.
> Josef already wrote an fsstress script that should be converted to an xfstest
> which replays the log to every FUA and runs fsck.
>
>>
>> If you're interested, you can find the CrashMonkey code (and branch) at
>> https://github.com/utsaslab/crashmonkey/tree/ext4_regression_bug. If you
>> would like to run it, you should clone and build your xfstests in your home
>> directory so that the jerry-rigged CrashMonkey test case can find it.
>> Directions for running this test case in CrashMonkey should be at the top of
>> the README.
>
> You seem to have misspelled 'fsx' in README and in the code as 'xfs'.
> Funny, I always mistype it as 'sfx' :)
>
> Cheers,
> Amir.
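
The mount-then-check order suggested in the message, combined with the
e2fsck -p idea for classifying errors as 'safely fixable', could be sketched
roughly as below. This is only an illustration, not CrashMonkey or xfstests
code: the function names, the result labels and the optional `force` switch
are made up for this example. The exit-status bits are the ones documented in
the e2fsck(8) man page.

```python
import subprocess

# Exit-status bits documented in e2fsck(8); the status is their bitwise OR.
E2FSCK_CORRECTED = 1      # errors were corrected
E2FSCK_REBOOT = 2         # errors were corrected, system should be rebooted
E2FSCK_UNCORRECTED = 4    # errors were left uncorrected
E2FSCK_FAILURES = 8 | 16 | 32 | 128  # operational, usage, canceled, library


def recovery_commands(device, mountpoint, force=False):
    """Build the command sequence: mount, umount, then e2fsck -p.

    Mounting first gives the kernel the chance to replay the journal and
    process the orphan list before e2fsck looks at the file system.
    `force` (hypothetical switch) adds -f to check even a clean fs.
    """
    fsck = ["e2fsck", "-p"] + (["-f"] if force else []) + [device]
    return [
        ["mount", "-t", "ext4", "-o", "errors=remount-ro", device, mountpoint],
        ["umount", mountpoint],
        fsck,
    ]


def classify(status):
    """Map an e2fsck exit status to an illustrative label."""
    if status & E2FSCK_FAILURES:
        return "fsck-failed"
    if status & E2FSCK_UNCORRECTED:
        return "not-safely-fixable"
    if status & (E2FSCK_CORRECTED | E2FSCK_REBOOT):
        return "safely-fixable"
    return "clean"


def run_recovery(device, mountpoint, force=False):
    """Run the sequence (needs root and a real device) and classify it."""
    *prepare, fsck = recovery_commands(device, mountpoint, force)
    for cmd in prepare:
        subprocess.run(cmd, check=True)
    return classify(subprocess.run(fsck).returncode)
```

Something like run_recovery("/dev/sdX", "/mnt/test", force=True) would be the
entry point, with the device and mount point standing in for whatever the
test harness uses. Whether to pass -f is a judgment call: without it, e2fsck
skips the full check when the file system is marked clean, which matches what
the message says upstream Android does.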
