linux-ext4.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: JeffleXu <jefflexu@linux.alibaba.com>
To: Eric Whitney <enwlinux@gmail.com>
Cc: linux-ext4@vger.kernel.org, tytso@mit.edu, joseph.qi@linux.alibaba.com
Subject: Re: [PATCH RFC] ext4: fix partial cluster initialization when splitting extent
Date: Fri, 22 May 2020 11:09:37 +0800	[thread overview]
Message-ID: <5d4362c8-b11f-e7ef-8f48-baf1286c9289@linux.alibaba.com> (raw)
In-Reply-To: <20200521212619.GA10473@localhost.localdomain>

Thanks for reviewing. I will send a formal patch later ;)

Thanks,

Jeffle


On 5/22/20 5:26 AM, Eric Whitney wrote:
> * JeffleXu <jefflexu@linux.alibaba.com>:
>> On 5/19/20 6:08 AM, Eric Whitney wrote:
>>> Hi, Jeffle:
>>>
>>> What kernel were you running when you observed your failures?  Does your
>>> patch resolve all observed failures, or do any remain?  Do you have a
>>> simple test script that reproduces the bug?
>>>
>>> I've made almost 1000 runs of shared/298 on various bigalloc configurations
>>> using Ted's test appliance on 5.7-rc5 and have not observed a failure.
>>> Several auto group runs have also passed without failures.  Ideally, I'd
>>> like to be able to reproduce your failure to be sure we fully understand
>>> what's going on.  It's still the case that the "2" is wrong, but I think
>>> that code in rm_leaf may be involved in an unexpected way.
>>>
>>> Thanks,
>>> Eric
>> Hi Eric,
>>
>> Following on is my test environment.
>>
>>
>> kernel: 5.7-rc4-git-eb24fdd8e6f5c6bb95129748a1801c6476492aba
>>
>> e2fsprog: latest release version 1.45.6 (20-Mar-2020)
>>
>> xfstests: git://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git, master
>> branch, latest commit
>>
>>
>> 1. Test device
>>
>> I run the test in a VM and the VM is setup by qemu. The size of vdb is 1G,
>>
>> ```
>>
>> #lsblk
>>
>> NAME   MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
>> vdb    254:16   0   1G  0 disk
>>
>> ```
>>
>>
>> and is initialized by:
>>
>> ```
>>
>> qemu-img create -f qcow2 /XX/disk1.qcow2 1G
>>
>> qemu-kvm -drive file=/XX/disk1.qcow2,if=virtio,format=qcow2 ...
>>
>> ```
>>
>>
>> 2. Test script
>>
>>
>> local.config of xfstests is like:
>>
>> export TEST_DEV=/dev/vdb
>> export TEST_DIR=/mnt/test
>> export SCRATCH_DEV=/dev/vdc
>> export SCRATCH_MNT=/mnt/scratch
>>
>>
>> Following on is an example script to reproduce the failure:
>>
>> ```sh
>>
>> #!/bin/bash
>>
>> for i in `seq 100`; do
>>          echo y | mkfs.ext4 -O bigalloc -C 16K /dev/vdb
>>
>>          ./check shared/298
>>          status=$?
>>
>>          if [[ $status == 1 ]]; then
>>                  echo "$i exit"
>>                  exit
>>          fi
>> done
>>
>> ```
>>
>>
>> Indeed the failure occurs occasionally. Sometimes the script stops at
>> iteration 4, or sometimes
>>
>> at iteration 2, 7, 24.
>>
>>
>> The failure occurs with the following dmesg report:
>>
>> ```
>>
>> [  387.471876] EXT4-fs error (device vdb): mb_free_blocks:1457: group 1,
>> block 158084:freeing already freed block (bit 6753); block bitmap corrupt.
>> [  387.473729] EXT4-fs error (device vdb): ext4_mb_generate_buddy:747: group
>> 1, block bitmap and bg descriptor inconsistent: 19550 vs 19551 free clusters
>>
>> ```
>>
>>
>> 3. About the applied patch
>>
>> The applied patch does fix the failure in my test environment. At least the
>> failure doesn't occur after running the full 100 iterations.
>>
>>
>> Thanks
>>
>> Jeffle
>>
>>
>>
> Hi, Jeffle:
>
> Thanks for that information.  I'm still unable to reproduce your failure,
> but by inspection your patch clearly fixes a bug, and of course, you're seeing
> that.  I suspect the code in rm_leaf that also sets the partial cluster nofree
> state is masking the bug in my testing.  In your case, my best guess is that
> your testing is occasionally getting into the retry loop for EAGAIN in
> remove_space.  This would effectively expose the bug again and could lead to
> the failure you've described.
>
> Your patch has survived all the heavy testing I've thrown at it.  So, please
> repost your RFC patch as a fix, and feel free to add:
> Reviewed-by: Eric Whitney <enwlinux@gmail.com>
>
> This points out that the cluster freeing code really needs to be cleaned up,
> so I'm working on a patch series that does that.
>
> Thanks for your patience,
> Eric

  reply	other threads:[~2020-05-22  3:09 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-05-14  8:14 [PATCH RFC] ext4: fix partial cluster initialization when splitting extent Jeffle Xu
2020-05-14 22:21 ` Eric Whitney
2020-05-18 22:08   ` Eric Whitney
2020-05-19  3:29     ` JeffleXu
2020-05-21 21:26       ` Eric Whitney
2020-05-22  3:09         ` JeffleXu [this message]
2020-05-22  4:02 Jeffle Xu
2020-05-22  4:15 ` JeffleXu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5d4362c8-b11f-e7ef-8f48-baf1286c9289@linux.alibaba.com \
    --to=jefflexu@linux.alibaba.com \
    --cc=enwlinux@gmail.com \
    --cc=joseph.qi@linux.alibaba.com \
    --cc=linux-ext4@vger.kernel.org \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).