Linux-BTRFS Archive on lore.kernel.org
From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: Qu Wenruo <wqu@suse.com>, linux-btrfs@vger.kernel.org
Cc: dsterba@suse.cz
Subject: Re: [Patch v5 0/7] btrfs: qgroup: Delay subtree scan to reduce overhead
Date: Wed, 23 Jan 2019 15:22:34 +0800
Message-ID: <d3cb5623-0d21-cd65-6dda-5c463536ee35@gmx.com>
In-Reply-To: <20190123071518.2528-1-wqu@suse.com>

On 2019/1/23 3:15 PM, Qu Wenruo wrote:
> This patchset can be fetched from github:
> https://github.com/adam900710/linux/tree/qgroup_delayed_subtree
> 
> It is based on v5.0-rc1.
> 
> This patchset addresses the heavy subtree scan load by delaying the scan
> until we're going to modify the swapped tree block.
> 
> The overall workflow is:
> 
> 1) Record the subtree root blocks that get swapped.
> 
>    During subtree swap:
>    O = Old tree blocks
>    N = New tree blocks
>          reloc tree                         subvol tree X
>             Root                               Root
>            /    \                             /    \
>          NA     OB                          OA      OB
>        /  |     |  \                      /  |      |  \
>      NC  ND     OE  OF                   OC  OD     OE  OF
> 
>   In this case, NA and OA are going to be swapped, so record (NA, OA) in
>   subvol tree X.
> 
> 2) After subtree swap.
>          reloc tree                         subvol tree X
>             Root                               Root
>            /    \                             /    \
>          OA     OB                          NA      OB
>        /  |     |  \                      /  |      |  \
>      OC  OD     OE  OF                   NC  ND     OE  OF
> 
> 3a) CoW happens for OB
>     If we are going to CoW tree block OB, we check OB's bytenr against
>     tree X's swapped_blocks structure.
>     It doesn't match any record, so nothing happens.
> 
> 3b) CoW happens for NA
>     Check NA's bytenr against tree X's swapped_blocks, and get a hit.
>     Then we do subtree scan on both subtree OA and NA.
>     This results in 6 tree blocks being scanned (OA, OC, OD, NA, NC, ND).
> 
>     Then no matter what we do to subvol tree X, qgroup numbers will
>     still be correct.
>     Then NA's record gets removed from X's swapped_blocks.
> 
> 4)  Transaction commit
>     Any record in X's swapped_blocks gets removed, since there is no
>     modification to swapped subtrees, no need to trigger heavy qgroup
>     subtree rescan for them.
> 
> [[Benchmark]] (*)
> Hardware:
> 	VM 4G vRAM, 8 vCPUs,
> 	disk is using 'unsafe' cache mode,
> 	backing device is SAMSUNG 850 evo SSD.
> 	Host has 16G ram.
> 
> Mkfs parameter:
> 	--nodesize 4K (To bump up tree size)
> 
> Initial subvolume contents:
> 	4G data copied from /usr and /lib.
> 	(With enough regular small files)
> 
> Snapshots:
> 	16 snapshots of the original subvolume.
> 	each snapshot has 3 random files modified.
> 
> balance parameter:
> 	-m
> 
> So the content should be pretty similar to a real world root fs layout.
> 
> And after file system population, there is no other activity, so it
> should be the best case scenario.
> 
>                      | v4.20-rc1            | w/ patchset    | diff
> -----------------------------------------------------------------------
> relocated extents    | 22615                | 22457          | -0.7%
> qgroup dirty extents | 163457               | 121606         | -25.6%
> time (sys)           | 22.884s              | 18.842s        | -17.6%
> time (real)          | 27.724s              | 22.884s        | -17.5%
> 
> *: Due to a bug in v5.0-rc1, balancing metadata with snapshots is
> unacceptably slow even with quota disabled. So the result is from
> v4.20-rc1.
> 
> changelog:
> v2:
> - Rebase to v4.20-rc1.
> 
> - Instead of committing a transaction after each reloc tree merge, delay
>   it until merge_reloc_roots() finishes.
>   This provides a more natural behavior and reduces unnecessary
>   transaction commits.
> 
> v3:
> - Fix backref walk deadlock by not triggering it at all.
>   This also removes the need for the @exec_post refactor and replaces the
>   patch that allows @old_root to be unpopulated.
> 
> - Include the patch that fixes the unexpected data rsv free.
> 
> v3.1:
> - Rebased to v4.20-rc1.
>   Minor conflicts with some cleanup code.
> 
> v4:
> - Rename members from "file_*" to "subv_*".
>   A member name like "file_bytenr" is pretty confusing; renaming it to
>   "subv_bytenr" avoids the confusion.
> 
> - Use btrfs_root::reloc_dirty_list to replace dynamic memory allocation.
>   One less point of failure, and no need to worry about GFP_KERNEL/NOFS.
>   Furthermore, it's easier to manipulate a list than an rb tree.
> 
> v5:
> - Use Josef's superior qgroup deadlock fix.
>   No performance regression now.
> 
> - A new patch to allow delayed subtree rescan to insert empty old_roots.

I should have double-checked the cover letter.
This part is incorrect; please just ignore it.

Thanks,
Qu

> 
> - Fix a possible race due to wrong rb_tree node initialization outside
>   the critical section.
> 
> - A lot of coding style fixes:
>   * naming change from "file"/"subv" to "subvol"
>   * {} for any else if branch
>   * avoid err/ret confusion by introducing "tmp_ret"
>   * proper errno for non-uptodate extent buffer
>   * struct member re-ordering to avoid unnecessary padding
>   * avoid single-letter variable names
>   * less redundant emphasizing
>   * move certain devel-only warnings under CONFIG_BTRFS_DEBUG
>   * replace cool-sounding 'hack' with 'optimization'
>   * remove unnecessary inline prefix for btrfs_qgroup_init_swapped_blocks
>   * keep an empty line before #endif
> 
> 
> Josef Bacik (1):
>   btrfs: honor path->skip_locking in backref code
> 
> Qu Wenruo (6):
>   btrfs: qgroup: Move reserved data account from btrfs_delayed_ref_head
>     to btrfs_qgroup_extent_record
>   btrfs: relocation: Delay reloc tree deletion after merge_reloc_roots()
>   btrfs: qgroup: Refactor btrfs_qgroup_trace_subtree_swap()
>   btrfs: qgroup: Introduce per-root swapped blocks infrastructure
>   btrfs: qgroup: Use delayed subtree rescan for balance
>   btrfs: qgroup: Cleanup old subtree swap code
> 
>  fs/btrfs/backref.c           |  16 +-
>  fs/btrfs/ctree.c             |   8 +
>  fs/btrfs/ctree.h             |  29 +++
>  fs/btrfs/delayed-ref.c       |  15 +-
>  fs/btrfs/delayed-ref.h       |  11 --
>  fs/btrfs/disk-io.c           |   2 +
>  fs/btrfs/extent-tree.c       |   3 -
>  fs/btrfs/qgroup.c            | 339 +++++++++++++++++++++++++++--------
>  fs/btrfs/qgroup.h            | 120 +++++++++++--
>  fs/btrfs/relocation.c        | 101 ++++++++---
>  fs/btrfs/transaction.c       |   1 +
>  include/trace/events/btrfs.h |  29 ---
>  12 files changed, 502 insertions(+), 172 deletions(-)
> 



Thread overview: 15+ messages
2019-01-23  7:15 Qu Wenruo
2019-01-23  7:15 ` [Patch v5 1/7] btrfs: qgroup: Move reserved data account from btrfs_delayed_ref_head to btrfs_qgroup_extent_record Qu Wenruo
2019-02-08 14:02   ` David Sterba
2019-01-23  7:15 ` [Patch v5 2/7] btrfs: honor path->skip_locking in backref code Qu Wenruo
2019-01-23  7:15 ` [Patch v5 3/7] btrfs: relocation: Delay reloc tree deletion after merge_reloc_roots() Qu Wenruo
2019-01-23  7:15 ` [Patch v5 4/7] btrfs: qgroup: Refactor btrfs_qgroup_trace_subtree_swap() Qu Wenruo
2019-01-23  7:15 ` [Patch v5 5/7] btrfs: qgroup: Introduce per-root swapped blocks infrastructure Qu Wenruo
2019-01-23  7:15 ` [Patch v5 6/7] btrfs: qgroup: Use delayed subtree rescan for balance Qu Wenruo
2019-01-23  7:15 ` [Patch v5 7/7] btrfs: qgroup: Cleanup old subtree swap code Qu Wenruo
2019-01-23  7:22 ` Qu Wenruo [this message]
2019-01-23 17:47 ` [Patch v5 0/7] btrfs: qgroup: Delay subtree scan to reduce overhead David Sterba
2019-01-24  0:02   ` Qu Wenruo
2019-01-24 13:36   ` Qu Wenruo
2019-01-24 19:22 ` David Sterba
2019-01-28 18:15 ` David Sterba
