From: David Sterba <dsterba@suse.cz>
To: Liu Bo <bo.liu@linux.alibaba.com>
Cc: dsterba@suse.cz, linux-btrfs@vger.kernel.org
Subject: Re: [PATCH 0/5] rb_first to rb_first_cached conversion
Date: Thu, 20 Sep 2018 17:56:29 +0200	[thread overview]
Message-ID: <20180920155629.GZ5847@twin.jikos.cz> (raw)
In-Reply-To: <20180914191456.4whx5gyjtg5hwbtt@US-160370MP2.local>

On Fri, Sep 14, 2018 at 12:14:57PM -0700, Liu Bo wrote:
> On Fri, Sep 14, 2018 at 05:11:02PM +0200, David Sterba wrote:
> > On Tue, Sep 11, 2018 at 11:31:49AM -0700, Liu Bo wrote:
> > > On Tue, Sep 11, 2018 at 05:34:03PM +0200, David Sterba wrote:
> > > > On Thu, Aug 23, 2018 at 03:51:48AM +0800, Liu Bo wrote:
> > > > > Several structs in btrfs use rb_first() in a while loop; it'd be
> > > > > more efficient to do this with rb_first_cached(), which has O(1)
> > > > > complexity.
> > > > 
> > > > We'd like to see some numbers, though just by reading the code I think
> > > > there's going to be a noticeable improvement for some workloads.
> > > > 
> > > > There's a common pattern:
> > > > 
> > > > while ((node = rb_first(root))) {
> > > > 	entry = rb_entry(node, ...);
> > > > 	next = rb_next(node);
> > > > 	rb_erase(node, root);
> > > > 	cleanup(entry);
> > > > }
> > > > 
> > > > rb_first needs to traverse the tree up to logN depth, rb_erase can
> > > > completely reshuffle the tree. With the caching we'll skip the traversal
> > > > in rb_first. That's a cached memory access vs looped pointer
> > > > dereference trade-off that IMHO has a clear winner.
> > > > 
> > > > As the pattern in many cases traverses the whole tree and removes all
> > > > entries, ideally we could get rid of the rebalancing too. The entries
> > > > would be cleaned up in postorder way, ie. reset the data and kfree in
> > > > the end. This requires that order of the freeing does not matter, which
> > > > might not apply to some trees.
> > > 
> > > The order of freeing does not matter, but we need the tree to return
> > > the correct next node to free, IOW, we have to maintain the rbtree
> > > until the last two nodes.
> > > 
> > > > 
> > > > I did not find suitable rbtree functions or helpers for that,
> > > > rbtree_postorder_for_each_entry_safe does not allow rb_erase and
> > > > rb_erase itself forces the rebalancing internally. But I think we can
> > > > implement that.
> > > > 
> > > > We could possibly use rb_next_postorder, which makes sure that all
> > > > children are traversed before the parent, so this is at least something
> > > > that could be considered.
> > > > 
> > > 
> > > Qu was inquiring about the same thing, I haven't got time to dig it
> > > further.
> > > 
> > > The question we need to answer is whether we care about how fast the
> > > destruction can be, as some of these paths are critical while others
> > > are not.
> > 
> > Yeah, I take __btrfs_return_cluster_to_free_space as an example where
> > the more efficient tree destruction would shorten the time a lock
> > is held.
> > 
> > In other cases it might complicate the code too much, though the
> > performance is not critical, eg. at umount time. Although, it'd be good
> > to optimize that too now that we know how.
> > 
> > And in the remaining cases it's not possible to avoid rb_erase. I had a
> > closer look at btrfs_cleanup_defrag_inodes and btrfs_put_tree_mod_seq.
> > Based on that I'd like to add this series to for-next, the first node
> > caching is clear enough and safe. We can do the other optimizations
> > later.
> 
> This needs more fine-grained profiling to see where the cost
> lies.
> 
> About the perf. number of rb_cached_node, I measured the time spent in
> extent map loop in btrfs_evict_inode(),
> 
> evict_inode_truncate_pages()
>     while (!RB_EMPTY_ROOT(&map_tree->map)) {
>         rb_first();
>         rb_erase();
>     }
> 
> 
> for an em tree containing 10,000 ems,

The maximum depth of an rbtree with this number of nodes is roughly
2 * lg(10000) = 2 * 14 = 28. That's the worst case, not the average, so
in practice it's going to be better, but we're still exchanging 10+
pointer dereferences for a single cached pointer read.

> - with rb_first_cached, the loop takes 4880454 ns,
> - with rb_first, the loop takes 4584148 ns,
> 
> looks like the difference is not that much, ~6%.  After all it's about
> scalability: the bigger the rb tree gets, the more rb_first_cached() wins.

Yeah, there's some difference, but I think cache effects make the
measurements hard to interpret, so I take the numbers as a rough
confirmation that there's not going to be a big difference.

The delayed refs and delayed inode can get some speedup from that as
there's usually only rb_first without rb_erase.

I'm going to update the first patch with the analysis we did and
reference it from the others, so we'll have a starting point for
further optimizations and can verify whether the expected speedups
happen.

Thread overview: 16+ messages
2018-08-22 19:51 [PATCH 0/5] rb_first to rb_first_cached conversion Liu Bo
2018-08-22 19:51 ` [PATCH 1/5] Btrfs: href_root: use rb_first_cached Liu Bo
2018-08-22 19:51 ` [PATCH 2/5] Btrfs: href->ref_tree: " Liu Bo
2018-08-22 19:51 ` [PATCH 3/5] Btrfs: delayed_items: " Liu Bo
2018-08-22 19:51 ` [PATCH 4/5] Btrfs: extent_map: " Liu Bo
2018-08-22 19:51 ` [PATCH 5/5] Btrfs: preftree: " Liu Bo
2018-08-23  0:27 ` [PATCH 0/5] rb_first to rb_first_cached conversion Qu Wenruo
2018-08-23 10:21 ` Holger Hoffstätte
2018-08-23 11:26 ` Nikolay Borisov
2018-09-11 15:34 ` David Sterba
2018-09-11 18:31   ` Liu Bo
2018-09-14 15:11     ` David Sterba
2018-09-14 19:14       ` Liu Bo
2018-09-20 15:56         ` David Sterba [this message]
2018-09-20 16:49 ` David Sterba
2018-09-20 17:40   ` Liu Bo
