From: Qu Wenruo <quwenruo@cn.fujitsu.com>
To: <linux-btrfs@vger.kernel.org>
Subject: Re: [PATCH RFC 00/14] Qgroup reserved space fixing framework
Date: Tue, 1 Sep 2015 15:25:46 +0800
Message-ID: <55E552FA.4050603@cn.fujitsu.com>
In-Reply-To: <1441092131-14088-1-git-send-email-quwenruo@cn.fujitsu.com>

Again, the later patches were blocked by the Exchange mail server.

I'll send them again from another mailbox (quwenruo.btrfs@gmx.com).

Thanks,
Qu

Qu Wenruo wrote on 2015/09/01 15:21 +0800:
> !!!!!!WARNING START!!!!!!
> This is only a WIP patchset. Although it fixes a qgroup reserved space
> leak in the normal COW case, it still lacks fixes for other corner
> cases, such as NODATACOW or prealloc, and a lot of old facilities are
> not cleaned up yet.
>
> The reason for sending the WIP patchset is to check whether it has some
> deep structural bug, to avoid another rework after the whole patchset is
> finished.
> !!!!!!WARNING END!!!!!!
>
> Although the btrfs qgroup accounting part was already reworked in
> v4.2-rc1, the qgroup reserve part still leaks reserved space.
>
> [[BUG]]
> One of the most common ways to trigger the bug is:
> 1) Enable quota
> 2) Limit excl of qgroup 5 to 16M
> 3) Write [0,2M) of a file inside subvol 5 10 times without sync
>
> EDQUOT will be triggered at about the 8th write.
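
Step 3 of the reproducer can be sketched as a small user-space helper. This is only an illustration, not part of the patchset: the name rewrite_range() and its parameters are made up here, and steps 1 and 2 are assumed to have been done beforehand (for example with "btrfs quota enable <mnt>" and "btrfs qgroup limit -e 16M 0/5 <mnt>").

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: rewrite [0, len) of `path` `times` times using
 * buffered writes, never calling fsync()/sync() in between.
 *
 * Returns 0 if every pass succeeded, the 1-based index of the first
 * failing pass otherwise (errno of the failure is stored in *err_out),
 * or -1 if the buffer or the file could not be set up.
 */
static int rewrite_range(const char *path, size_t len, int times, int *err_out)
{
	char *buf;
	int fd, ret = 0;

	assert(path && len > 0 && times > 0 && err_out);
	*err_out = 0;

	buf = malloc(len);
	if (!buf)
		return -1;
	memset(buf, 0xab, len);

	fd = open(path, O_WRONLY | O_CREAT, 0644);
	if (fd < 0) {
		*err_out = errno;
		free(buf);
		return -1;
	}

	for (int i = 1; i <= times && !ret; i++) {
		size_t done = 0;

		/* pwrite() may write less than requested, so loop. */
		while (done < len) {
			ssize_t n = pwrite(fd, buf + done, len - done, (off_t)done);

			if (n <= 0) {
				*err_out = errno;
				ret = i;	/* remember which pass failed */
				break;
			}
			done += (size_t)n;
		}
	}

	close(fd);
	free(buf);
	return ret;
}
```

Called as rewrite_range("<mnt>/file", 2 << 20, 10, &err) on an affected kernel, the expectation from the description above is a non-zero return with err == EDQUOT; on a filesystem without the limit it should simply return 0.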
>
> [[CAUSE]]
> The problem is that qgroup reserves space even when the data space is
> already reserved.
>
> In the reproducer above, every time we do a buffered write to [0,2M),
> qgroup reserves 2M of space. But the 1st write has already reserved
> 2M, and from then on we don't need to reserve any more data space, as
> we are only rewriting [0,2M).
>
> Also, the reserved space is freed only *ONCE*, when its backref is
> run at commit_transaction() time.
>
> That is what leaks the reserved space.
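
The arithmetic of the leak can be shown with a toy model. This is a sketch under stated assumptions, not the kernel code: struct qgroup_model and the function names are invented for illustration, the limit check is simplified to "reserved + num_bytes > max_excl", and already-used exclusive space and metadata reservations are ignored.

```c
#include <assert.h>
#include <stdint.h>

#define MiB (1024ULL * 1024ULL)

/* Toy stand-in for a qgroup: only the limit and the reserved counter. */
struct qgroup_model {
	uint64_t max_excl;	/* exclusive limit in bytes */
	uint64_t reserved;	/* bytes currently reserved */
};

/*
 * Old behaviour as described above: every buffered write reserves its
 * full length again, whether or not the range was already reserved.
 * Returns 0 on success, -1 where the kernel would return -EDQUOT.
 */
static int model_reserve(struct qgroup_model *qg, uint64_t num_bytes)
{
	if (qg->reserved + num_bytes > qg->max_excl)
		return -1;
	qg->reserved += num_bytes;
	return 0;
}

/* 1-based index of the first rejected write, or 0 if none is rejected. */
static int first_rejected_write(uint64_t limit, uint64_t len, int writes)
{
	struct qgroup_model qg = { .max_excl = limit, .reserved = 0 };

	for (int i = 1; i <= writes; i++)
		if (model_reserve(&qg, len))
			return i;
	return 0;
}

/*
 * Bytes still reserved after `writes` rewrites of the same range when
 * the reservation is freed only once (at commit time in the real code).
 */
static uint64_t leaked_after_one_free(uint64_t len, int writes)
{
	struct qgroup_model qg = { .max_excl = UINT64_MAX, .reserved = 0 };

	for (int i = 0; i < writes; i++)
		model_reserve(&qg, len);
	qg.reserved -= len;	/* the single free */
	return qg.reserved;
}
```

With a 16M limit and 2M writes, this idealized model rejects the 9th pass, since exactly eight reservations fit. A real filesystem also counts existing usage and metadata, which is consistent with the failure showing up at about the 8th write. After ten rewrites and the single free, the model still holds 18M reserved for 2M of data.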
>
> [[FIX]]
> The fix is not a simple one, as btrfs_qgroup_reserve() currently follows
> the very bad btrfs space allocation principle:
>    Allocate as much as you need, even if it's not fully used.
>
> So in this patchset, we introduce a lot of new facilities:
> 1) Per-inode data rsv map
>     Records which ranges of a file have already been reserved.
>     A dirty range is released when the range is written to disk.
>     Any request to reserve space on an already reserved range is
>     skipped, to avoid reserving it twice.
>
> 2) Delayed ref head qgroup members
>     After a range of data is written to disk, we can neither keep the
>     dirty range in the data rsv map nor simply free the reserved space.
>
>     If we keep the dirty range in the data rsv map, the next write will
>     conclude that no space needs to be reserved, but the new write will
>     be COWed and cause another extent to take qgroup space.
>     So keeping the dirty range would let qgroup accounting exceed the
>     limit.
>
>     On the other hand, if we just release the range and free the
>     reserved space, we can still exceed the limit by allowing
>     over-reservation.
>
>     So here we must release only the range, and keep the reserved space
>     recorded somewhere else.
>     With the new qgroup accounting framework, only delayed_ref_head is
>     safe, and it is run at the same time as btrfs qgroup accounting.
>
> 3) New delalloc_reserve_space/check_data_free_space facilities to
>     support accurate space reservation.
>     The old implementation considered num_bytes alone to be enough.
>     The new facilities all need an exact range [start, start + len) to
>     reserve space.
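
The range-based idea behind 1) and 3) can be sketched in user space as follows. This is a minimal illustration only: the names (rsv_map, rsv_reserve, rsv_release) and the fixed-size sorted array are assumptions made for the example, and the patches themselves may use a different data structure and different function names. The value returned by rsv_release() corresponds to the bytes that, per 2), would have to stay recorded elsewhere (the delayed ref head) instead of being freed right away.

```c
#include <assert.h>
#include <stdint.h>

#define MiB (1024ULL * 1024ULL)
#define RSV_MAX 64

struct rsv_range { uint64_t start, end; };	/* half-open: [start, end) */

/* Sorted, non-overlapping ranges already reserved for one inode. */
struct rsv_map {
	struct rsv_range r[RSV_MAX];
	int nr;
};

/* Bytes of [start, end) that the map already covers. */
static uint64_t rsv_covered(const struct rsv_map *m, uint64_t start, uint64_t end)
{
	uint64_t covered = 0;

	for (int i = 0; i < m->nr; i++) {
		uint64_t s = m->r[i].start > start ? m->r[i].start : start;
		uint64_t e = m->r[i].end < end ? m->r[i].end : end;

		if (s < e)
			covered += e - s;
	}
	return covered;
}

/*
 * Mark [start, start + len) as reserved, merging with neighbours.
 * Returns only the bytes that were NOT reserved before, i.e. the amount
 * that would still have to be charged to the qgroup.
 */
static uint64_t rsv_reserve(struct rsv_map *m, uint64_t start, uint64_t len)
{
	struct rsv_range merged = { start, start + len };
	uint64_t need = len - rsv_covered(m, start, start + len);
	struct rsv_map out = { .nr = 0 };
	int placed = 0;

	assert(m->nr < RSV_MAX);
	for (int i = 0; i < m->nr; i++) {
		struct rsv_range cur = m->r[i];

		if (cur.end < merged.start) {
			out.r[out.nr++] = cur;		/* entirely to the left */
		} else if (cur.start > merged.end) {
			if (!placed) {
				out.r[out.nr++] = merged;
				placed = 1;
			}
			out.r[out.nr++] = cur;		/* entirely to the right */
		} else {				/* overlaps or touches: absorb */
			if (cur.start < merged.start)
				merged.start = cur.start;
			if (cur.end > merged.end)
				merged.end = cur.end;
		}
	}
	if (!placed)
		out.r[out.nr++] = merged;
	*m = out;
	return need;
}

/*
 * Drop [start, start + len) from the map, as when the range is written
 * to disk. Returns the bytes that were reserved in that range.
 */
static uint64_t rsv_release(struct rsv_map *m, uint64_t start, uint64_t len)
{
	uint64_t end = start + len;
	uint64_t released = rsv_covered(m, start, end);
	struct rsv_map out = { .nr = 0 };

	assert(m->nr < RSV_MAX);
	for (int i = 0; i < m->nr; i++) {
		struct rsv_range cur = m->r[i];

		if (cur.end <= start || cur.start >= end) {
			out.r[out.nr++] = cur;		/* untouched */
			continue;
		}
		if (cur.start < start)			/* keep the left remainder */
			out.r[out.nr++] = (struct rsv_range){ cur.start, start };
		if (cur.end > end)			/* keep the right remainder */
			out.r[out.nr++] = (struct rsv_range){ end, cur.end };
	}
	*m = out;
	return released;
}
```

With this map, rewriting [0,2M) a second time asks for 0 new bytes, which is exactly the case the reproducer hits. Once the range has been released, the next write to it needs a fresh 2M again, matching the point in 2) that a new write is COWed into a new extent.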
>
> More detailed info can be found in each commit message and the source
> comments.
>
> Qu Wenruo (14):
>    btrfs: qgroup: New function declaration for new reserve implement
>    btrfs: qgroup: Implement data_rsv_map init/free functions
>    btrfs: qgroup: Introduce new function to search most left reserve
>      range
>    btrfs: qgroup: Introduce function to insert non-overlap reserve range
>    btrfs: qgroup: Introduce function to reserve data range per inode
>    btrfs: qgroup: Introduce btrfs_qgroup_reserve_data function
>    btrfs: qgroup: Introduce function to release reserved range
>    btrfs: qgroup: Introduce function to release/free reserved data range
>    btrfs: delayed_ref: Add new function to record reserved space into
>      delayed ref
>    btrfs: delayed_ref: release and free qgroup reserved at proper timing
>    btrfs: qgroup: Introduce new functions to reserve/free metadata
>    btrfs: qgroup: Use new metadata reservation.
>    btrfs: extent-tree: Add new verions of btrfs_check_data_free_space
>    btrfs: Use new check_data_free_space for buffered write
>
>   fs/btrfs/btrfs_inode.h |   6 +
>   fs/btrfs/ctree.h       |   5 +
>   fs/btrfs/delayed-ref.c |  29 +++
>   fs/btrfs/delayed-ref.h |  14 ++
>   fs/btrfs/disk-io.c     |   1 +
>   fs/btrfs/extent-tree.c |  68 +++--
>   fs/btrfs/file.c        |  22 +-
>   fs/btrfs/inode.c       |  20 ++
>   fs/btrfs/qgroup.c      | 658 ++++++++++++++++++++++++++++++++++++++++++++++++-
>   fs/btrfs/qgroup.h      |  21 +-
>   fs/btrfs/transaction.c |  34 +--
>   fs/btrfs/transaction.h |   1 -
>   12 files changed, 820 insertions(+), 59 deletions(-)
>
