All of lore.kernel.org
 help / color / mirror / Atom feed
From: Qu Wenruo <quwenruo@cn.fujitsu.com>
To: Goldwyn Rodrigues <rgoldwyn@suse.de>,
	Goldwyn Rodrigues <rgoldwyn@suse.com>,
	<linux-btrfs@vger.kernel.org>
Subject: Re: [PATCH] qgroup: Prevent qgroup->reserved from going subzero
Date: Tue, 27 Sep 2016 11:10:16 +0800	[thread overview]
Message-ID: <1274b3b2-d196-ee9e-2261-d2f5cb906477@cn.fujitsu.com> (raw)
In-Reply-To: <0e892f0a-22b3-0fd7-ac8f-d52b76b4d4af@suse.de>



At 09/26/2016 10:31 PM, Goldwyn Rodrigues wrote:
>
>
> On 09/25/2016 09:33 PM, Qu Wenruo wrote:
>>
>>
>> At 09/23/2016 09:43 PM, Goldwyn Rodrigues wrote:
>>>
>>>
[snipped]
>>
>> Sorry I still don't get the point.
>> Would you please give a call flow of the timing dirtying page and
>> calling btrfs_qgroup_reserve/free/release_data()?
>>
>> Like:
>> __btrfs_buffered_write()
>> |- btrfs_check_data_free_space()
>> |  |- btrfs_qgroup_reserve_data() <- Mark QGROUP_RESERVED bit
>> |- btrfs_dirty_pages()            <- Mark page dirty
>>
>>
>> [[Timing of btrfs_invalidatepage()]]
>> About your commit message "once while invalidating the page, and the
>> next time while free'ing delalloc."
>> "Free'ing delalloc" did you mean btrfs_qgroup_free_delayed_ref().
>>
>> If so, it means one extent goes through full write back, and long before
>> calling btrfs_qgroup_free_delayed_ref(), it will call
>> btrfs_qgroup_release_data() to clear QGROUP_RESERVED.
>>
>> So the call will be:
>> __btrfs_buffered_write()
>> |- btrfs_check_data_free_space()
>> |  |- btrfs_qgroup_reserve_data() <- Mark QGROUP_RESERVED bit
>> |- btrfs_dirty_pages()            <- Mark page dirty
>>
>> <data write back happens>
>> run_delalloc_range()
>> |- cow_file_range()
>>    |- extent_clear_unlock_delalloc() <- Clear page dirty
>>
>> <modifying metadata>
>>
>> btrfs_finish_ordered_io()
>> |- insert_reserved_file_extent()
>>    |- btrfs_qgroup_release_data() <- Clear QGROUP_RESERVED bit
>>                                      but not decrease reserved space
>>
>> <run delayed refs, normally happens in commit_trans>
>> run_one_delyaed_refs()
>> |- btrfs_qgroup_free_delayed_ref() <- Directly decrease reserved space
>>
>>
>> So the problem seems to be, btrfs_invalidatepage() is called after
>> run_delalloc_range() but before btrfs_finish_ordered_io().
>>
>> Did you mean that?
>
> This happens event before a writeback happens. So, here is what is
> happening. This is in reference, and specific to the test case in the
> description.

The flowchart helps a lot, thanks.

But still some flow is still strange.
>
> Process: dd - first time
> __btrfs_buffered_write()
> |- btrfs_check_data_free_space()
> |  |- btrfs_qgroup_reserve_data() <- Mark QGROUP_RESERVED bit
> |- btrfs_dirty_pages()            <- Mark page dirty
>
> Please note data writeback does _not_ happen/complete.

This part is completely fine.

>
> Process: dd - next time, new process
> sys_open O_TRUNC
> .
>  |-btrfs_setattr()
>    |-truncate_pagecache()
>      |-truncate_inode_pages_range()
>         |-truncate_inode_page() - Page is cleared of Dirty flag.
>           |-btrfs_invalidatepage(page)
>             |-__btrfs_qgroup_release_data()
>               |-btrfs_qgroup_free_refroot() - qgroup->reserved is
> reduced by PAGESIZE.

That's OK too. No problem.
Although the __btrfs_qgroup_release_data() is called by 
btrfs_qgroup_free_data().
Which cleared QGROUP_RESERVED flag.


So the call flow should be

   |-btrfs_setattr()
     |-truncate_pagecache()
       |-truncate_inode_pages_range()
          |-truncate_inode_page()        <- Clear page dirty
            |-btrfs_invalidatepage(page)
              |-btrfs_qgroup_free_data() <- reduce qgroup->reserved
                                            and clear QGROUP_RESERVED

After btrfs_setattr(), the new dd write data.
So __btrfs_buffered_write() happens again,
dirtying pages and mark QGROUP_RESERVED and increase qgroup->reserved.

So here I don't see any problem
buffered_write:
   Mark dirty
   Mark QGROUP_RESERVED
   Increase qgroup->reserved

btrfs_setattr:
   Clear dirty
   Clear QGROUP_RESERVED
   Decrease qgroup->reserved

All pairs with each other, no leak no underflow.

Until the last buffered_write(), which doesn't got truncated.

>
>
> Process: sync
> btrfs_sync_fs()
> |-btrfs_commit_transaction()
>   |-btrfs_run_delayed_refs()
>     |- qgroup_free_refroot() - Reduces reserved by the size of the
> extent (in this case, the filesize of dd (first time)), even though some
> of the PAGESIZEs have been reduced while performing the truncate on the
> file.

The delayed_ref belongs to the last extent, which doesn't got truncated.

And in that case, that last extent got into disk.
run_dealloc_range() <- Write data into disk
finish_ordered_io() <- Update metadata
   |- insert_reserved_file_extents()
      |- btrfs_alloc_reserved_file_extent()
      |  |- btrfs_add_delayed_data_ref
      |     <This will cause a new delayed_data_ref>
      |- btrfs_qgroup_release_data()
         <This will clear QGROURP_RESERVED>

Then goes to your btrfs_sync_fs() => qgroup_free_refroot()



[[I think I find the problem now]]

Between btrfs_alloc_reserved_file_extent() and 
btrfs_qgroup_release_data(), there is a window that delayed_refs can be 
executed.
Since that d*mn delayed_refs can be run at any time asynchronously.

In that case, it will call qgroup_free_refroot() first at another 
process context, and later btrfs_qgroup_release_data() get schedualed, 
which will clear QGROUP_RESERVED again, causing the underflow.

Although it's not the original cause you reported, would you mind to try 
the following quick-dirty fix without your fix, to see if it fixes the 
problem?

-------
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index e6811c4..70efa14 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2168,15 +2168,15 @@ static int insert_reserved_file_extent(struct 
btrfs_trans_handle *trans,
         ins.objectid = disk_bytenr;
         ins.offset = disk_num_bytes;
         ins.type = BTRFS_EXTENT_ITEM_KEY;
-       ret = btrfs_alloc_reserved_file_extent(trans, root,
-                                       root->root_key.objectid,
-                                       btrfs_ino(inode), file_pos,
-                                       ram_bytes, &ins);
         /*
          * Release the reserved range from inode dirty range map, as it is
          * already moved into delayed_ref_head
          */
         btrfs_qgroup_release_data(inode, file_pos, ram_bytes);
+       ret = btrfs_alloc_reserved_file_extent(trans, root,
+                                       root->root_key.objectid,
+                                       btrfs_ino(inode), file_pos,
+                                       ram_bytes, &ins);
  out:
         btrfs_free_path(path);

-------

>
> I hope that makes it clear.
>
>>
>> [[About test case]]
>> And for the test case, I can't reproduce the problem no matter if I
>> apply the fix or not.
>>
>> Either way it just fails after 3 loops of dd, and later dd will all fail.
>> But I can still remove the file and write new data into the fs.
>>
>
> Strange? I can reproduce at every instance of running it. Even on 4.8.0-rc7

Maybe related to the memory size.

Since you're also using VM, maybe your VM RAM is too small, so dirty 
pages pressure triggers a commit halfway and cause the race?


Thanks,
Qu
>
>>
>> [[Extra protect about qgroup->reserved]]
>> And for the underflowed qgroup reserve space, would you mind to add
>> warning for that case?
>> Just like what we did in qgroup excl/rfer values, so at least it won't
>> make qgroup blocking any write.
>>
>
> Oh yes, I wonder why that is not placed there when it is placed in all
> other location where qgroup->reserved is reduced.
>
> Also, this has nothing to do with the comment of two ways of free'ing
> the qgroups, as suggested in the commit statement. Thats my bad.
>



  reply	other threads:[~2016-09-27  3:10 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-09-22 18:47 [PATCH] qgroup: Prevent qgroup->reserved from going subzero Goldwyn Rodrigues
2016-09-23  1:06 ` Qu Wenruo
2016-09-23 13:43   ` Goldwyn Rodrigues
2016-09-26  2:33     ` Qu Wenruo
2016-09-26 14:31       ` Goldwyn Rodrigues
2016-09-27  3:10         ` Qu Wenruo [this message]
2016-09-27 14:04           ` Goldwyn Rodrigues
2016-09-28  1:44             ` Qu Wenruo
2016-09-28  2:19               ` Goldwyn Rodrigues
2016-09-29  8:57                 ` Qu Wenruo
2016-09-29 11:13                   ` Goldwyn Rodrigues
2016-09-30  1:21                     ` Qu Wenruo
2016-09-30  2:18                     ` Qu Wenruo
2016-09-30  2:24                       ` Qu Wenruo
2016-09-30 15:40 Goldwyn Rodrigues

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1274b3b2-d196-ee9e-2261-d2f5cb906477@cn.fujitsu.com \
    --to=quwenruo@cn.fujitsu.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=rgoldwyn@suse.com \
    --cc=rgoldwyn@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.