From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: Qu Wenruo <wqu@suse.com>, linux-btrfs@vger.kernel.org
Cc: dsterba@suse.cz
Subject: Re: [Patch v5 0/7] btrfs: qgroup: Delay subtree scan to reduce overhead
Date: Wed, 23 Jan 2019 15:22:34 +0800
Message-ID: <d3cb5623-0d21-cd65-6dda-5c463536ee35@gmx.com>
In-Reply-To: <20190123071518.2528-1-wqu@suse.com>
On 2019/1/23 3:15 PM, Qu Wenruo wrote:
> This patchset can be fetched from github:
> https://github.com/adam900710/linux/tree/qgroup_delayed_subtree
>
> Which is based on v5.0-rc1.
>
> This patchset addresses the heavy load of the subtree scan by delaying
> it until we are about to modify a swapped tree block.
>
> The overall workflow is:
>
> 1) Record the subtree root blocks that get swapped.
>
> During subtree swap:
> O = Old tree blocks
> N = New tree blocks
>         reloc tree                    subvol tree X
>            Root                           Root
>           /    \                         /    \
>         NA      OB                     OA      OB
>        / |      | \                   / |      | \
>      NC  ND    OE  OF               OC  OD    OE  OF
>
> In this case, NA and OA are going to be swapped, so record (NA, OA) in
> subvol tree X.
>
> 2) After subtree swap.
>         reloc tree                    subvol tree X
>            Root                           Root
>           /    \                         /    \
>         OA      OB                     NA      OB
>        / |      | \                   / |      | \
>      OC  OD    OE  OF               NC  ND    OE  OF
>
> 3a) CoW happens for OB
> If we are going to CoW tree block OB, we check OB's bytenr against
> tree X's swapped_blocks structure.
> It doesn't match any record, so nothing happens.
>
> 3b) CoW happens for NA
> Check NA's bytenr against tree X's swapped_blocks, and get a hit.
> Then we do a subtree scan on both subtrees OA and NA,
> resulting in 6 tree blocks to be scanned (OA, OC, OD, NA, NC, ND).
>
> Then no matter what we do to subvol tree X, qgroup numbers will
> still be correct.
> Then NA's record gets removed from X's swapped_blocks.
>
> 4) Transaction commit
> Any records remaining in X's swapped_blocks get removed. Since the
> swapped subtrees were not modified, there is no need to trigger a
> heavy qgroup subtree rescan for them.
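To make steps 1) to 4) concrete, here is a minimal userspace sketch of the bookkeeping. It is only a model under stated assumptions, not the code from the patches: every name in it (model_root, swapped_block, record_swap, cow_block, commit_transaction) is hypothetical, a fixed array stands in for the per-root swapped_blocks structure, and a counter stands in for the actual subtree scan.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical model of the workflow; none of these names are kernel APIs. */
#define MAX_SWAPPED 16

struct swapped_block {
	uint64_t subvol_bytenr;	/* block that ended up in subvol tree X (NA) */
	uint64_t reloc_bytenr;	/* block that ended up in the reloc tree (OA) */
	bool in_use;
};

struct model_root {
	struct swapped_block swapped[MAX_SWAPPED];
	unsigned int scans;	/* how many subtree scans were triggered */
};

/* Step 1: only remember the swapped pair, the scan itself is postponed. */
bool record_swap(struct model_root *root, uint64_t subvol_bytenr,
		 uint64_t reloc_bytenr)
{
	for (size_t i = 0; i < MAX_SWAPPED; i++) {
		if (!root->swapped[i].in_use) {
			root->swapped[i] = (struct swapped_block){
				.subvol_bytenr = subvol_bytenr,
				.reloc_bytenr = reloc_bytenr,
				.in_use = true,
			};
			return true;
		}
	}
	return false;	/* no free slot in this toy model */
}

/*
 * Step 3: called right before a block of tree X is CoWed.
 * Returns true when the bytenr hit a record and a scan was triggered.
 */
bool cow_block(struct model_root *root, uint64_t bytenr)
{
	for (size_t i = 0; i < MAX_SWAPPED; i++) {
		struct swapped_block *sb = &root->swapped[i];

		if (sb->in_use && sb->subvol_bytenr == bytenr) {
			root->scans++;		/* stands in for scanning both subtrees */
			sb->in_use = false;	/* the record is consumed */
			return true;
		}
	}
	return false;	/* step 3a: no match, nothing to do */
}

/* Step 4: drop whatever is left and return how many scans were avoided. */
unsigned int commit_transaction(struct model_root *root)
{
	unsigned int dropped = 0;

	for (size_t i = 0; i < MAX_SWAPPED; i++) {
		if (root->swapped[i].in_use) {
			root->swapped[i].in_use = false;
			dropped++;
		}
	}
	return dropped;
}

#ifdef SWAPPED_BLOCKS_DEMO
int main(void)
{
	struct model_root x = { 0 };

	record_swap(&x, 0xA000, 0xB000);		/* 1) (NA, OA) */
	printf("CoW OB hit: %d\n", cow_block(&x, 0xC000));	/* 3a) */
	printf("CoW NA hit: %d\n", cow_block(&x, 0xA000));	/* 3b) */
	printf("dropped at commit: %u\n", commit_transaction(&x)); /* 4) */
	return 0;
}
#endif
```

Building with -DSWAPPED_BLOCKS_DEMO runs a short walk through the four steps. The point of the model is that the expensive work happens only in cow_block() on a hit, while a record that is never touched costs nothing beyond its removal at commit time.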
>
> [[Benchmark]] (*)
> Hardware:
> VM 4G vRAM, 8 vCPUs,
> disk is using 'unsafe' cache mode,
> backing device is SAMSUNG 850 evo SSD.
> Host has 16G ram.
>
> Mkfs parameter:
> --nodesize 4K (To bump up tree size)
>
> Initial subvolume contents:
> 4G data copied from /usr and /lib.
> (With enough regular small files)
>
> Snapshots:
> 16 snapshots of the original subvolume.
> each snapshot has 3 random files modified.
>
> balance parameter:
> -m
>
> So the content should be pretty similar to a real world root fs layout.
>
> And after file system population, there is no other activity, so it
> should be the best case scenario.
>
>                       | v4.20-rc1 | w/ patchset | diff
> -----------------------------------------------------------------------
> relocated extents     | 22615     | 22457       | -0.7%
> qgroup dirty extents  | 163457    | 121606      | -25.6%
> time (sys)            | 22.884s   | 18.842s     | -17.6%
> time (real)           | 27.724s   | 22.884s     | -17.5%
>
> *: Due to a bug in v5.0-rc1, balancing metadata with snapshots is
> unacceptably slow even with quota disabled. So the result is from
> v4.20-rc1.
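The diff column is the relative change against the v4.20-rc1 baseline. A small sketch for recomputing it from the two value columns follows; percent_diff is a hypothetical helper name used only for this illustration, and the inputs in the demo are the numbers from the table.

```c
#include <stdio.h>

/*
 * Relative change of "after" against "before", in percent.
 * A negative result means the value went down with the patchset.
 * Hypothetical helper, not part of any btrfs tooling.
 */
double percent_diff(double before, double after)
{
	return (after - before) / before * 100.0;
}

#ifdef PERCENT_DIFF_DEMO
int main(void)
{
	printf("relocated extents:    %.1f%%\n", percent_diff(22615, 22457));
	printf("qgroup dirty extents: %.1f%%\n", percent_diff(163457, 121606));
	printf("time (sys):           %.1f%%\n", percent_diff(22.884, 18.842));
	printf("time (real):          %.1f%%\n", percent_diff(27.724, 22.884));
	return 0;
}
#endif
```

Building with -DPERCENT_DIFF_DEMO prints one line per table row.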
>
> changelog:
> v2:
> - Rebase to v4.20-rc1.
>
> - Instead of committing the transaction after each reloc tree merge,
> delay it until merge_reloc_roots() finishes.
> This provides more natural behavior and reduces unnecessary
> transaction commits.
>
> v3:
> - Fix backref walk deadlock by not triggering it at all.
> This also removes the need for the @exec_post refactor and replaces the
> patch that allows @old_root to be unpopulated.
>
> - Include the patch that fixes the unexpected data rsv free.
>
> v3.1:
> - Rebased to v4.20-rc1.
> Minor conflicts with some cleanup code.
>
> v4:
> - Rename members from "file_*" to "subv_*".
> Members like "file_bytenr" are pretty confusing; renaming it to
> "subv_bytenr" avoids the confusion.
>
> - Use btrfs_root::reloc_dirty_list to replace dynamic memory allocation.
> One less point of failure, and no need to worry about GFP_KERNEL/NOFS.
> Furthermore, it's easier to manipulate a list than an rb tree.
>
> v5:
> - Use Josef's superior qgroup deadlock fix.
> No performance regression now.
>
> - A new patch to allow delayed subtree rescan to insert empty old_roots.

I should have double-checked the cover letter.
This part is incorrect; please just ignore it.

Thanks,
Qu

>
> - Fix a possible race due to wrong rb_tree node initialization outside
> the critical section.
>
> - A lot of coding style fixes:
> * naming change from "file"/"subv" to "subvol"
> * {} for any else if branch
> * avoid err/ret confusion by introducing "tmp_ret"
> * proper errno for non-uptodate extent buffer
> * struct member re-ordering to avoid unnecessary padding
> * avoid single-letter variable names
> * less redundant emphasis
> * move certain devel-only warnings under CONFIG_BTRFS_DEBUG
> * replace cool-sounding 'hack' with 'optimization'
> * remove unnecessary inline prefix for btrfs_qgroup_init_swapped_blocks
> * keep an empty line before #endif
>
>
> Josef Bacik (1):
> btrfs: honor path->skip_locking in backref code
>
> Qu Wenruo (6):
> btrfs: qgroup: Move reserved data account from btrfs_delayed_ref_head
> to btrfs_qgroup_extent_record
> btrfs: relocation: Delay reloc tree deletion after merge_reloc_roots()
> btrfs: qgroup: Refactor btrfs_qgroup_trace_subtree_swap()
> btrfs: qgroup: Introduce per-root swapped blocks infrastructure
> btrfs: qgroup: Use delayed subtree rescan for balance
> btrfs: qgroup: Cleanup old subtree swap code
>
> fs/btrfs/backref.c | 16 +-
> fs/btrfs/ctree.c | 8 +
> fs/btrfs/ctree.h | 29 +++
> fs/btrfs/delayed-ref.c | 15 +-
> fs/btrfs/delayed-ref.h | 11 --
> fs/btrfs/disk-io.c | 2 +
> fs/btrfs/extent-tree.c | 3 -
> fs/btrfs/qgroup.c | 339 +++++++++++++++++++++++++++--------
> fs/btrfs/qgroup.h | 120 +++++++++++--
> fs/btrfs/relocation.c | 101 ++++++++---
> fs/btrfs/transaction.c | 1 +
> include/trace/events/btrfs.h | 29 ---
> 12 files changed, 502 insertions(+), 172 deletions(-)
>