All of lore.kernel.org
 help / color / mirror / Atom feed
From: Qu Wenruo <quwenruo@cn.fujitsu.com>
To: Goldwyn Rodrigues <rgoldwyn@suse.de>,
	Goldwyn Rodrigues <rgoldwyn@suse.com>,
	<linux-btrfs@vger.kernel.org>
Subject: Re: [PATCH] qgroup: Prevent qgroup->reserved from going subzero
Date: Fri, 30 Sep 2016 10:18:42 +0800	[thread overview]
Message-ID: <232dee19-bf1d-4cf1-68a0-6c12981cb6b9@cn.fujitsu.com> (raw)
In-Reply-To: <16c4c9a0-a698-91ab-2136-4731fea1850d@suse.de>

Your original fix is good.

Feel free to my tag after enriching the comment in btrfs_invalidatepage().

Reviewed-by: Qu Wenruo <quwenruo@cn.fujitsu.com>

Thanks again for your founding and sorry for the extra time reviewing.

Thanks
Qu


At 09/29/2016 07:13 PM, Goldwyn Rodrigues wrote:
>
>
> On 09/29/2016 03:57 AM, Qu Wenruo wrote:
>> Thanks for your test script.
>>
>> I succeeded in pinning down the problem.
>>
>> The problem is, btrfs is invalidate pages that belongs to ordered
>> extent(goes through writeback)
>
>
> No, I don't think invalidated pages are going through writeback. The
> problem is that the space for extent allocation is done before the
> writeback and if pages are invalidated before writeback, it is not
> accounted for.
>
>>
>> The ftrace(with more tracepoints to trace qgroup->reserved change) shows:
>> ------
>> btrfs_qgroup_release_data: root=257, ino=257, start=0, len=4096,
>> reserved=4096, op=free
>> qgroup_update_reserve: qgid = 257, cur_reserved = 16171008, diff = -4096
>> btrfs_qgroup_release_data: root=257, ino=257, start=4096, len=4096,
>> reserved=4096, op=free
>> qgroup_update_reserve: qgid = 257, cur_reserved = 16166912, diff = -4096
>> btrfs_qgroup_release_data: root=257, ino=257, start=8192, len=4096,
>> reserved=4096, op=free
>> qgroup_update_reserve: qgid = 257, cur_reserved = 16162816, diff = -4096
>> btrfs_qgroup_release_data: root=257, ino=257, start=12288, len=4096,
>> reserved=4096, op=free
>> qgroup_update_reserve: qgid = 257, cur_reserved = 16158720, diff = -4096
>> btrfs_qgroup_release_data: root=257, ino=257, start=16384, len=4096,
>> reserved=4096, op=free
>> qgroup_update_reserve: qgid = 257, cur_reserved = 16154624, diff = -4096
>
> This is a little confusing. There is no qgroup_update_reserve() function
> in the source. Where did you add this?
>
>>
>>
>> ^^^^^ Btrfs invalidates 5 pages of ino 257 qg 257, range 0 ~ 20K.
>>
>> btrfs_qgroup_release_data: root=257, ino=257, start=0, len=5242880,
>> reserved=5222400, op=release
>>
>> ^^^^^ Here ordered_io finishes, write the whole 5M data, while
>> QGROUP_RESERVED only returned 5M - 20K.
>> ------
>>
>> The problem is at btrfs_add_delayed_data_ref(), we use the assumption
>> that, qgroup reserved space is the same as extent size(5M)
>> But in fact, btrfs_qgroup_release_data() only release (5M - 20K).
>> Leaking the 20K.
>
> Yes, this is more like it. However, I would put it the other way. It
> releases 5M when it should have released 5M-20K, because 20k was
> released when invalidating page.
>
>>
>> Calltrace:
>> btrfs_finish_ordered_io()
>> |- inserted_reserved_file_extent()
>>    |- btrfs_alloc_reserved_file_extent()
>>    |  ^^ Here we tell delayed_ref to free 5M later
>>    |- btrfs_qgroup_release_data()
>>       ^^ We only released (5M - 20K), as the first 5 pages
>>          are freed by btrfs_invalidatepage()
>>
>> If btrfs is designed to freeing pages which belongs to ordered_extent,
>> then it totally breaks the qgroup assumption.
>>
>> Then we have several ways to fix it:
>> 1) Fix race between ordered extent(writeback) with invalidatepage()
>>    Make btrfs_invalidatepage() skips pages that are already in
>>    ordered_extent, then we're OK.
>>
>>    This is much like your original fix, but I'm not sure if DIRTY page
>>    flag is bond to ordered_extent.
>>    And more comment will help.
>
>
> So, what you mean is that fix is correct, I just need to reword the
> patch header.
>
>>
>> 2) Make btrfs_qgroup_release_data() return accurate num bytes
>>    And then pass it to delayed_ref head.
>>
>>    This is quite anti-intuition. If we're adding a new extent, how
>>    could it happen that some pages are already invalidated?
>>
>
> I agree. It is counter-intutive and will increase complexity.
>
>> Anyway, awesome work, exposed a quite strange race and makes us re-think
>> the base of qgroup reserve space framework.
>
> Thanks.
>



  parent reply	other threads:[~2016-09-30  2:18 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-09-22 18:47 [PATCH] qgroup: Prevent qgroup->reserved from going subzero Goldwyn Rodrigues
2016-09-23  1:06 ` Qu Wenruo
2016-09-23 13:43   ` Goldwyn Rodrigues
2016-09-26  2:33     ` Qu Wenruo
2016-09-26 14:31       ` Goldwyn Rodrigues
2016-09-27  3:10         ` Qu Wenruo
2016-09-27 14:04           ` Goldwyn Rodrigues
2016-09-28  1:44             ` Qu Wenruo
2016-09-28  2:19               ` Goldwyn Rodrigues
2016-09-29  8:57                 ` Qu Wenruo
2016-09-29 11:13                   ` Goldwyn Rodrigues
2016-09-30  1:21                     ` Qu Wenruo
2016-09-30  2:18                     ` Qu Wenruo [this message]
2016-09-30  2:24                       ` Qu Wenruo
2016-09-30 15:40 Goldwyn Rodrigues

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=232dee19-bf1d-4cf1-68a0-6c12981cb6b9@cn.fujitsu.com \
    --to=quwenruo@cn.fujitsu.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=rgoldwyn@suse.com \
    --cc=rgoldwyn@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.