linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/3] btrfs: Introduce new incompat feature BG_TREE to hugely reduce mount time
@ 2019-10-08  4:49 Qu Wenruo
  2019-10-08  4:49 ` [PATCH v2 1/3] btrfs: block-group: Refactor btrfs_read_block_groups() Qu Wenruo
                   ` (3 more replies)
  0 siblings, 4 replies; 19+ messages in thread
From: Qu Wenruo @ 2019-10-08  4:49 UTC (permalink / raw)
  To: linux-btrfs

This patchset can be fetched from:
https://github.com/adam900710/linux/tree/bg_tree
Which is based on v5.4-rc1 tag.

This patchset will hugely reduce mount time of large fs by putting all
block group items into its own tree.

The old behavior will try to read out all block group items at mount
time, however due to the key of block group items are scattered across
tons of extent items, we must call btrfs_search_slot() for each block
group.

It works fine for small fs, but when number of block groups goes beyond
200, such tree search will become a random read, causing obvious slow
down.

On the other hand, btrfs_read_chunk_tree() is still very fast, since we
put CHUNK_ITEMS into their own tree and package them next to each other.

Following this idea, we could do the same thing for block group items,
so instead of triggering btrfs_search_slot() for each block group, we
just call btrfs_next_item() and under most case we could finish in
memory, and hugely speed up mount (see BENCHMARK below).

The only disadvantage is, this method introduce an incompatible feature,
so existing fs can't use this feature directly.
Either specify it at mkfs time, or use btrfs-progs offline convert tool.

[[Benchmark]]
Since I have upgraded my rig to all NVME storage, there is no HDD
test result.

Physical device:	NVMe SSD
VM device:		VirtIO block device, backup by sparse file
Nodesize:		4K  (to bump up tree height)
Extent data size:	4M
Fs size used:		1T

All file extents on disk is in 4M size, preallocated to reduce space usage
(as the VM uses loopback block device backed by sparse file)

Without patchset:
Use ftrace function graph:

 7)               |  open_ctree [btrfs]() {
 7)               |    btrfs_read_block_groups [btrfs]() {
 7) @ 805851.8 us |    }
 7) @ 911890.2 us |  }

 btrfs_read_block_groups() takes 88% of the total mount time,

With patchset, and use -O bg-tree mkfs option:

 6)               |  open_ctree [btrfs]() {
 6)               |    btrfs_read_block_groups [btrfs]() {
 6) * 91204.69 us |    }
 6) @ 192039.5 us |  }

  open_ctree() time is only 21% of original mount time.
  And btrfs_read_block_groups() only takes 47% of total open_ctree()
  execution time.

The reason is pretty obvious when considering how many tree blocks needs
to be read from disk:
- Original extent tree:
  nodes:	55
  leaves:	1025
  total:	1080
- Block group tree:
  nodes:	1
  leaves:	13
  total:	14

Not to mention all the tree blocks readahead works pretty fine for bg
tree, as we will read every item.
While readahead for extent tree will just be a diaster, as all block
groups are scatter across the whole extent tree.

Changelog:
v2:
- Rebase to v5.4-rc1
  Minor conflicts due to code moved to block-group.c
- Fix a bug where some block groups will not be loaded at mount time
  It's a bug in that refactor patch, not exposed by previous round of
  tests.
- Add a new patch to remove a dead check
- Update benchmark to NVMe based result
  Hardware upgrade is not always a good thing for benchmark.

Qu Wenruo (3):
  btrfs: block-group: Refactor btrfs_read_block_groups()
  btrfs: disk-io: Remove unnecessary check before freeing chunk root
  btrfs: Introduce new incompat feature, BG_TREE, to speed up mount time

 fs/btrfs/block-group.c          | 306 ++++++++++++++++++++------------
 fs/btrfs/ctree.h                |   5 +-
 fs/btrfs/disk-io.c              |  16 +-
 fs/btrfs/sysfs.c                |   2 +
 include/uapi/linux/btrfs.h      |   1 +
 include/uapi/linux/btrfs_tree.h |   3 +
 6 files changed, 213 insertions(+), 120 deletions(-)

-- 
2.23.0


^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2019-10-10  2:22 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-10-08  4:49 [PATCH v2 0/3] btrfs: Introduce new incompat feature BG_TREE to hugely reduce mount time Qu Wenruo
2019-10-08  4:49 ` [PATCH v2 1/3] btrfs: block-group: Refactor btrfs_read_block_groups() Qu Wenruo
2019-10-08  9:08   ` Johannes Thumshirn
2019-10-09 12:08   ` Anand Jain
2019-10-09 12:14     ` Qu WenRuo
2019-10-09 13:07   ` [PATCH " Qu Wenruo
2019-10-09 14:25     ` Filipe Manana
2019-10-08  4:49 ` [PATCH v2 2/3] btrfs: disk-io: Remove unnecessary check before freeing chunk root Qu Wenruo
2019-10-08  8:30   ` Johannes Thumshirn
2019-10-10  2:00   ` Anand Jain
2019-10-10  2:21     ` Qu Wenruo
2019-10-08  4:49 ` [PATCH v2 3/3] btrfs: Introduce new incompat feature, BG_TREE, to speed up mount time Qu Wenruo
2019-10-08  9:09   ` Johannes Thumshirn
2019-10-08  9:14 ` [PATCH v2 0/3] btrfs: Introduce new incompat feature BG_TREE to hugely reduce " Johannes Thumshirn
2019-10-08  9:26   ` Qu Wenruo
2019-10-08  9:47     ` Johannes Thumshirn
     [not found]       ` <b4821d86-eeb9-f21c-66aa-480df2d3a13d@feldspaten.org>
2019-10-09  7:43         ` Qu Wenruo
2019-10-09  8:08           ` Felix Niederwanger
2019-10-09 11:00             ` Qu Wenruo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).