From: David Sterba <dsterba@suse.cz>
To: Qu Wenruo <quwenruo.btrfs@gmx.com>
Cc: Qu Wenruo <wqu@suse.com>, linux-btrfs@vger.kernel.org
Subject: Re: [PATCH v2 0/5] btrfs: qgroup: Skip unrelated tree blocks for balance
Date: Tue, 11 Sep 2018 10:46:07 +0200
Message-ID: <20180911084607.GA24025@twin.jikos.cz>
In-Reply-To: <365a6700-4a69-4997-1546-b90710d9d8d3@gmx.com>

On Tue, Sep 11, 2018 at 10:43:32AM +0800, Qu Wenruo wrote:
> 
> 
> On 2018/9/7 5:32 PM, Qu Wenruo wrote:
> > This patchset can be fetched from github:
> > https://github.com/adam900710/linux/tree/qgroup_balance_skip_trees
> > The base commit is the v4.19-rc1 tag.
> > 
> > There are a lot of reports of system hangs during balance on
> > quota-enabled filesystems.
> > It's most obvious on large filesystems.
> > 
> > The hang is caused by tons of unmodified extents being marked as qgroup dirty.
> > Such unmodified/unrelated sources include:
> > 1) Unmodified subtrees
> > 2) Subtree drop for the reloc tree
> > (BTW, other sources include unmodified file extent items)
> > 
> > E.g.
> > OO = Old tree blocks from file tree
> > NN = New tree blocks from reloc tree
> > 
> >         file tree                              reloc tree
> >            OO (a)                                  NN (a)
> >           /  \                                    /  \
> >     (b) OO    OO (c)                        (b) NN    NN (c)
> >        / \   / \                               / \   / \
> >      OO  OO OO  OO                           OO  OO OO  NN
> >     (d) (e) (f) (g)                         (d) (e) (f) (g)
> > 
> > In the above case, balance will modify the nodeptrs in OO(a) to point to
> > NN(b) and NN(c), and modify NN(a) to point to OO(b) and OO(c).
> > 
> > Before this patch, quota marks the whole subtree, from its parent
> > down to the leaves, as dirty.
> > So btrfs quota needs to trace all tree blocks from (a) to (g).
> > 
> > However, tree blocks (d), (e) and (f) are shared between both trees, so
> > there is no need to trace those 3 tree blocks.
> > 
> > This patchset changes this by only tracing the modified tree blocks
> > in the reloc tree, and their counterparts in the file tree.
> > 
> > Nodeptr swaps happen for tree blocks (b) and (c) in both trees.
> > 
> > For tree block (b), in the reloc tree we find that all its children's
> > generations are smaller than last_snapshot, so there is no need to
> > trace them; we only need to trace NN(b) and its counterpart OO(b).
> > 
> > For tree block (c), in the reloc tree we find that its child NN(g) needs
> > tracing, and NN(g) itself has no children that need tracing.
> >
> > So for the subtree starting at tree block NN(c), we need to trace NN(c)
> > and NN(g), along with their counterparts OO(c) and OO(g).
> > 
> > With this patch, we can skip tree blocks OO(d)~OO(f) in the above example,
> > thus reducing some of the overhead caused by qgroup.
> > 
> > The improvement mostly applies to metadata relocation.
> > If some high-level tree blocks get relocated but their children are
> > still unmodified, we can save a lot of time.
> > 
> > Even in the worst case, it should be no worse than the original full
> > subtree marking method.
> > 
> > A real-world benchmark is under way.
> 
> Did a small-scale test (with the latest submitted patch "btrfs:
> delayed-ref: Introduce new parameter for btrfs_add_delayed_tree_ref() to
> reduce unnecessary qgroup tracing").
> 
> 4K nodesize fs (to increase tree size), with around 4G of data copied from
> /usr and /lib (so the number of files should be large enough).
> 
> The VM uses unsafe cache mode for its qcow2 file, and the backing device
> is a Samsung 850 EVO SATA SSD (the host has enough RAM, so most IO should
> run at RAM speed).
> 
> The results for a metadata-only balance:
> 
>                      | Before   | After    | Diff
> -------------------------------------------------
> relocated extents    | 21112    | 22916    | +8.5%
> qgroup dirty extents | 213831   | 140731   | -34.2%
> time (sys)           | 7.828s   | 5.818s   | -25.7%
> time (real)          | 10.004s  | 7.768s   | -22.4%

Thanks, this looks good. I'd speculate that on systems where the IO is
not memory-backed, the reduced count of dirty extents will bring an even
bigger improvement. But the memory-backed IO results look good on their
own.

I'll add the patches to for-next; feel free to send more test results
or updates. Thanks.

Thread overview: 8+ messages
2018-09-07  9:32 [PATCH v2 0/5] btrfs: qgroup: Skip unrelated tree blocks for balance Qu Wenruo
2018-09-07  9:32 ` [PATCH v2 1/5] btrfs: qgroup: Introduce trace event to analyse the number of dirty extents accounted Qu Wenruo
2018-09-07  9:32 ` [PATCH v2 2/5] btrfs: qgroup: Introduce function to trace two swaped extents Qu Wenruo
2018-09-07  9:32 ` [PATCH v2 3/5] btrfs: qgroup: Introduce function to find all new tree blocks of reloc tree Qu Wenruo
2018-09-07  9:32 ` [PATCH v2 4/5] btrfs: qgroup: Use generation aware subtree swap to mark dirty extents Qu Wenruo
2018-09-07  9:32 ` [PATCH v2 5/5] btrfs: qgroup: Don't trace subtree if we're dropping reloc tree Qu Wenruo
2018-09-11  2:43 ` [PATCH v2 0/5] btrfs: qgroup: Skip unrelated tree blocks for balance Qu Wenruo
2018-09-11  8:46   ` David Sterba [this message]
