All of lore.kernel.org
 help / color / mirror / Atom feed
From: Nikolay Borisov <nborisov@suse.com>
To: Qu Wenruo <wqu@suse.com>, linux-btrfs@vger.kernel.org
Subject: Re: [PATCH RFC 0/2] Use new incompat feature BG_TREE to hugely reduce mount time
Date: Fri, 28 Dec 2018 11:15:47 +0200	[thread overview]
Message-ID: <38520a81-e5b0-35b5-bf81-966ea6ef2c45@suse.com> (raw)
In-Reply-To: <20181228083745.3134-1-wqu@suse.com>



On 28.12.18 г. 10:37 ч., Qu Wenruo wrote:
> This patchset can be fetched from:
> https://github.com/adam900710/linux/tree/bg_tree
> Which is based on v4.20-rc1 tag.
> 
> This patchset will hugely reduce mount time of large fs by putting all
> block group items into its own tree.
> 
> The old behavior will try to read out all block group items at mount
> time, however due to the key of block group items are scattered across
> tons of extent items, we must call btrfs_search_slot() for each block
> group.
> 
> It works fine for small fs, but when number of block groups goes beyond
> 200, such tree search will become a random read, causing obvious slow
> down.
> 
> On the other hand, btrfs_read_chunk_tree() is still very fast, since we
> put CHUNK_ITEMS into their own tree and package them next to each other.
> 
> 
> Following this idea, we could do the same thing for block group items,
> so instead of triggering btrfs_search_slot() for each block group, we
> just call btrfs_next_item() and under most case we could finish in
> memory, and hugely speed up mount (see BENCHMARK below).
> 
> The only disadvantage is, this method introduce an incompatible feature,
> so existing fs can't use this feature directly.
> Either specify it at mkfs time, or use btrfs-progs offline convert tool
> (*).

What if we start recording block group items in the chunk tree? Would
that bring any benefit without creating yet another tree, though likely
we'd still have to use an incompat flag? That way shouldn't BLOCK_GROUP
items be chunked together?

> 
> *: Mkfs and convert tool are doing the same work, however I haven't
> decide if I should put this feature to btrfstune.
> 
> The RFC tag is for the comprehensive test and sysfs interface.
> At least during my filling test it definitely works fine.
> 
> [[Benchmark]]
> Physical device:	HDD (7200RPM)
> Nodesize:		4K  (to bump up tree height)
> Used size:		250G
> Total size:		500G
> Extent data size:	1M
> 
> All file extents on disk is in 1M size, ensured by using fallocate.
> 
> Without patchset:
> Use ftrace function graph:
> 
>   3)               |  open_ctree [btrfs]() {
>   3)               |    btrfs_read_chunk_tree [btrfs]() {
>   3) * 69033.31 us |    }
>   3)               |    btrfs_verify_dev_extents [btrfs]() {
>   3) * 90376.15 us |    }
>   3)               |    btrfs_read_block_groups [btrfs]() {
>   2) $ 2733853 us  |    } /* btrfs_read_block_groups [btrfs] */
>   2) $ 3168384 us  |  } /* open_ctree [btrfs] */
> 
>  btrfs_read_block_groups() takes 87% of the total mount time,
> 
> With patchset, and use -O bg-tree mkfs option:
>   7)               |  open_ctree [btrfs]() {
>   7)               |    btrfs_read_chunk_tree [btrfs]() {
>   7) # 2448.562 us |    }
>   7)               |    btrfs_verify_dev_extents [btrfs]() {
>   7) * 19802.02 us |    }
>   7)               |    btrfs_read_block_groups [btrfs]() {
>   7) # 8610.397 us |    }
>   7) @ 113498.6 us |  }
> 
>   open_ctree() time is only 3% of original mount time.
>   And btrfs_read_block_groups() only takes 7.6% of total open_ctree()
>   execution time.
> 
> 
> 
> Qu Wenruo (2):
>   btrfs: Refactor btrfs_read_block_groups()
>   btrfs: Introduce new incompat feature, BG_TREE
> 
>  fs/btrfs/ctree.h                |   5 +-
>  fs/btrfs/disk-io.c              |  13 ++
>  fs/btrfs/extent-tree.c          | 300 ++++++++++++++++++++------------
>  include/uapi/linux/btrfs.h      |   1 +
>  include/uapi/linux/btrfs_tree.h |   3 +
>  5 files changed, 206 insertions(+), 116 deletions(-)
> 

  parent reply	other threads:[~2018-12-28  9:15 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-12-28  8:37 [PATCH RFC 0/2] Use new incompat feature BG_TREE to hugely reduce mount time Qu Wenruo
2018-12-28  8:37 ` [PATCH RFC 1/2] btrfs: Refactor btrfs_read_block_groups() Qu Wenruo
2018-12-28  8:37 ` [PATCH RFC 2/2] btrfs: Introduce new incompat feature, BG_TREE Qu Wenruo
2018-12-28  9:15 ` Nikolay Borisov [this message]
2018-12-28  9:28   ` [PATCH RFC 0/2] Use new incompat feature BG_TREE to hugely reduce mount time Qu Wenruo
2019-01-02 16:21     ` David Sterba
2019-01-03  0:14       ` Qu Wenruo
2019-01-03  0:57         ` Hans van Kranenburg
2019-01-03  1:10           ` Qu Wenruo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=38520a81-e5b0-35b5-bf81-966ea6ef2c45@suse.com \
    --to=nborisov@suse.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=wqu@suse.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.