All of lore.kernel.org
 help / color / mirror / Atom feed
From: Goldwyn Rodrigues <rgoldwyn@suse.com>
To: Qu Wenruo <quwenruo@cn.fujitsu.com>,
	Goldwyn Rodrigues <rgoldwyn@suse.de>,
	linux-btrfs@vger.kernel.org
Subject: Re: [PATCH] qgroup: Prevent qgroup->reserved from going subzero
Date: Fri, 23 Sep 2016 08:43:19 -0500	[thread overview]
Message-ID: <628e1dcd-ddc3-3f37-8a47-c01c912db970@suse.com> (raw)
In-Reply-To: <a5707357-38d1-f2c3-41cf-b9d76bd992a3@cn.fujitsu.com>



On 09/22/2016 08:06 PM, Qu Wenruo wrote:
> 
> 
> At 09/23/2016 02:47 AM, Goldwyn Rodrigues wrote:
>> From: Goldwyn Rodrigues <rgoldwyn@suse.com>
>>
>> While free'ing qgroup->reserved resources, we must check
>> if the page is already commmitted to disk or still in memory.
>> If not, the reserve free is doubly accounted, once while
>> invalidating the page, and the next time while free'ing
>> delalloc. This results is qgroup->reserved(u64) going subzero,
>> thus very large value. So, no further I/O can be performed.
>>
>> This is also expressed in the comments, but not performed.
>>
>> Testcase:
>> SCRATCH_DEV=/dev/vdb
>> SCRATCH_MNT=/mnt
>> mkfs.btrfs -f $SCRATCH_DEV
>> mount -t btrfs $SCRATCH_DEV $SCRATCH_MNT
>> cd $SCRATCH_MNT
>> btrfs quota enable $SCRATCH_MNT
>> btrfs subvolume create a
>> btrfs qgroup limit 50m a $SCRATCH_MNT
>> sync
>> for c in {1..15}; do
>> dd if=/dev/zero  bs=1M count=40 of=$SCRATCH_MNT/a/file;
>> done
>>
>> sleep 10
>> sync
>> sleep 5
>>
>> touch $SCRATCH_MNT/a/newfile
>>
>> echo "Removing file"
>> rm $SCRATCH_MNT/a/file
>>
>> Fixes: b9d0b38928 ("btrfs: Add handler for invalidate page")
>> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
>> ---
>>  fs/btrfs/inode.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
>> index e6811c4..2e2a026 100644
>> --- a/fs/btrfs/inode.c
>> +++ b/fs/btrfs/inode.c
>> @@ -8917,7 +8917,8 @@ again:
>>       * 2) Not written to disk
>>       *    This means the reserved space should be freed here.
>>       */
>> -    btrfs_qgroup_free_data(inode, page_start, PAGE_SIZE);
>> +    if (PageDirty(page))
>> +        btrfs_qgroup_free_data(inode, page_start, PAGE_SIZE);
>>      if (!inode_evicting) {
>>          clear_extent_bit(tree, page_start, page_end,
>>                   EXTENT_LOCKED | EXTENT_DIRTY |
>>
> Thanks for the test case.
> 
> However for the fix, I'm afraid it may not be the root cause.
> 
> Here, if the pages are dirty, then corresponding range is marked
> EXTENT_QGROUP_RESERVED.
> Then btrfs_qgroup_free_data() will clear that bit and reduce the number.
> 
> If the pages are already committed, then corresponding range won't be
> marked EXTENT_QGROUP_RESERVED.
> Later btrfs_qgroup_free_data() won't reduce any bytes, since it will
> only reduce the bytes if it cleared EXTENT_QGROUP_RESERVED bit.
> 
> If everything goes well there is no need to check PageDirty() here, as
> we have EXTENT_QGROUP_RESERVED bit for that accounting.
> 
> So there is some other thing causing EXTENT_QGROUP_RESERVED bit out of
> sync with dirty pages.
> Considering you did it 15 times to reproduce the problem, maybe there is
> some race causing the problem?
> 

You can have pages marked as not dirty with EXTENT_QGROUP_RESERVED set
for a truncate operation. Performing dd on the same file, truncates the
file before overwriting, while the pages of the previous writes are
still in memory and not committed to disk.

truncate_inode_page() -> truncate_complete_page() clears the dirty flag.
So, you can have a case where the EXTENT_QGROUP_RESERVED bit is set
while the page is not listed as dirty because the truncate "cleared" all
the dirty pages.

-- 
Goldwyn

  reply	other threads:[~2016-09-23 13:43 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-09-22 18:47 [PATCH] qgroup: Prevent qgroup->reserved from going subzero Goldwyn Rodrigues
2016-09-23  1:06 ` Qu Wenruo
2016-09-23 13:43   ` Goldwyn Rodrigues [this message]
2016-09-26  2:33     ` Qu Wenruo
2016-09-26 14:31       ` Goldwyn Rodrigues
2016-09-27  3:10         ` Qu Wenruo
2016-09-27 14:04           ` Goldwyn Rodrigues
2016-09-28  1:44             ` Qu Wenruo
2016-09-28  2:19               ` Goldwyn Rodrigues
2016-09-29  8:57                 ` Qu Wenruo
2016-09-29 11:13                   ` Goldwyn Rodrigues
2016-09-30  1:21                     ` Qu Wenruo
2016-09-30  2:18                     ` Qu Wenruo
2016-09-30  2:24                       ` Qu Wenruo
2016-09-30 15:40 Goldwyn Rodrigues

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=628e1dcd-ddc3-3f37-8a47-c01c912db970@suse.com \
    --to=rgoldwyn@suse.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=quwenruo@cn.fujitsu.com \
    --cc=rgoldwyn@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.