All of lore.kernel.org
 help / color / mirror / Atom feed
From: Qu Wenruo <quwenruo@cn.fujitsu.com>
To: Goldwyn Rodrigues <rgoldwyn@suse.com>,
	Goldwyn Rodrigues <rgoldwyn@suse.de>,
	<linux-btrfs@vger.kernel.org>
Subject: Re: [PATCH] qgroup: Prevent qgroup->reserved from going subzero
Date: Thu, 29 Sep 2016 16:57:16 +0800	[thread overview]
Message-ID: <1d2ddf05-81d7-4be9-fe32-909ef121f3db@cn.fujitsu.com> (raw)
In-Reply-To: <3ef8e20a-d671-7691-d964-014f35299703@suse.com>

Thanks for your test script.

I succeeded in pinning down the problem.

The problem is, btrfs is invalidate pages that belongs to ordered 
extent(goes through writeback)

The ftrace(with more tracepoints to trace qgroup->reserved change) shows:
------
btrfs_qgroup_release_data: root=257, ino=257, start=0, len=4096, 
reserved=4096, op=free
qgroup_update_reserve: qgid = 257, cur_reserved = 16171008, diff = -4096
btrfs_qgroup_release_data: root=257, ino=257, start=4096, len=4096, 
reserved=4096, op=free
qgroup_update_reserve: qgid = 257, cur_reserved = 16166912, diff = -4096
btrfs_qgroup_release_data: root=257, ino=257, start=8192, len=4096, 
reserved=4096, op=free
qgroup_update_reserve: qgid = 257, cur_reserved = 16162816, diff = -4096
btrfs_qgroup_release_data: root=257, ino=257, start=12288, len=4096, 
reserved=4096, op=free
qgroup_update_reserve: qgid = 257, cur_reserved = 16158720, diff = -4096
btrfs_qgroup_release_data: root=257, ino=257, start=16384, len=4096, 
reserved=4096, op=free
qgroup_update_reserve: qgid = 257, cur_reserved = 16154624, diff = -4096


^^^^^ Btrfs invalidates 5 pages of ino 257 qg 257, range 0 ~ 20K.

btrfs_qgroup_release_data: root=257, ino=257, start=0, len=5242880, 
reserved=5222400, op=release

^^^^^ Here ordered_io finishes, write the whole 5M data, while 
QGROUP_RESERVED only returned 5M - 20K.
------

The problem is at btrfs_add_delayed_data_ref(), we use the assumption 
that, qgroup reserved space is the same as extent size(5M)
But in fact, btrfs_qgroup_release_data() only release (5M - 20K).
Leaking the 20K.

Calltrace:
btrfs_finish_ordered_io()
|- inserted_reserved_file_extent()
    |- btrfs_alloc_reserved_file_extent()
    |  ^^ Here we tell delayed_ref to free 5M later
    |- btrfs_qgroup_release_data()
       ^^ We only released (5M - 20K), as the first 5 pages
          are freed by btrfs_invalidatepage()

If btrfs is designed to freeing pages which belongs to ordered_extent, 
then it totally breaks the qgroup assumption.

Then we have several ways to fix it:
1) Fix race between ordered extent(writeback) with invalidatepage()
    Make btrfs_invalidatepage() skips pages that are already in
    ordered_extent, then we're OK.

    This is much like your original fix, but I'm not sure if DIRTY page
    flag is bond to ordered_extent.
    And more comment will help.

2) Make btrfs_qgroup_release_data() return accurate num bytes
    And then pass it to delayed_ref head.

    This is quite anti-intuition. If we're adding a new extent, how
    could it happen that some pages are already invalidated?


Anyway, awesome work, exposed a quite strange race and makes us re-think 
the base of qgroup reserve space framework.

Thanks,
Qu

At 09/28/2016 10:19 AM, Goldwyn Rodrigues wrote:
>
>
> On 09/27/2016 08:44 PM, Qu Wenruo wrote:
>
> <snipped>
>
>> Finally reproduced it.
>>
>> Although in a low chance, about 1/10.
>>
>> Under most case, the final remove gets executed without error.
>
> Change 50m to 500m while setting the qgroup limit, the probability will
> increase.
>



  reply	other threads:[~2016-09-29  8:57 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-09-22 18:47 [PATCH] qgroup: Prevent qgroup->reserved from going subzero Goldwyn Rodrigues
2016-09-23  1:06 ` Qu Wenruo
2016-09-23 13:43   ` Goldwyn Rodrigues
2016-09-26  2:33     ` Qu Wenruo
2016-09-26 14:31       ` Goldwyn Rodrigues
2016-09-27  3:10         ` Qu Wenruo
2016-09-27 14:04           ` Goldwyn Rodrigues
2016-09-28  1:44             ` Qu Wenruo
2016-09-28  2:19               ` Goldwyn Rodrigues
2016-09-29  8:57                 ` Qu Wenruo [this message]
2016-09-29 11:13                   ` Goldwyn Rodrigues
2016-09-30  1:21                     ` Qu Wenruo
2016-09-30  2:18                     ` Qu Wenruo
2016-09-30  2:24                       ` Qu Wenruo
2016-09-30 15:40 Goldwyn Rodrigues

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1d2ddf05-81d7-4be9-fe32-909ef121f3db@cn.fujitsu.com \
    --to=quwenruo@cn.fujitsu.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=rgoldwyn@suse.com \
    --cc=rgoldwyn@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.