Linux-BTRFS Archive on
 help / color / Atom feed
From: Qu Wenruo <>
To:, Qu WenRuo <>,
	"" <>
Subject: Re: [PATCH v2 0/7] btrfs-progs: Support for BG_TREE feature
Date: Tue, 22 Oct 2019 14:30:22 +0800
Message-ID: <> (raw)
In-Reply-To: <>

[-- Attachment #1.1: Type: text/plain, Size: 5254 bytes --]

BTW, there is one important compatibility problem related to all the BGI
related features.

Although I'm putting the BG_TREE feature as incompatible feature, but in
theory, it should be RO compatible.

As except extent/bg tree, we *should* read the fs without any problem.

But the problem is, current btrfs mount (including btrfs-check) still
need to go through the block group item search, even for permanent RO mount.

This get my rescue mount option patchset to be involved.
If we have such skip_bg feature earlier, we can completely afford to
make all these potential features as RO compatible.

Now my question is,  should we put this feature still as incompatible


On 2019/10/22 上午8:49, Qu Wenruo wrote:
> On 2019/10/21 下午11:44, David Sterba wrote:
>> On Sat, Oct 19, 2019 at 08:04:51AM +0800, Qu Wenruo wrote:
>>> That's wonderful.
>>> Although I guess my patchset should provide the hint of where to modify
>>> the code, since there are only a limited number of places we modify
>>> block group item.
>> I indeed started at your patchset and grepped fro BG_TREE, adding
>> another branch.
>>>> We agree on the point that the block group items must be packed. The key
>>>> approach should move the new BGI to the beginning, ie. key type is
>>>> smaller than anything that appears in the extent tree. I chose 100 for
>>>> the prototype, it could change.
>>>> To keep changes to minimum, the new BGI uses the same block group item,
>>>> so the only difference then becomes how we search for the items.
>>> If we're introducing new block group item, I hope to do a minor change.
>>> Remove the chunk_objectid member, or even flags. to make it more
>>> compact. So that you can make the BGI subtree even smaller.
>> Yeah that can be done.
>>> But I guess since you don't want to modify the BGI structure, and keep
>>> the code modification minimal, it may not be a good idea right now.
>> As long as the changes are bearable, eg. a minor refactoring of block
>> group access (the cache.key serving a as offset and length) is fine. And
>> if we can make the b-tree item more then let's do it.
>>>> Packing of the items is done by swapping the key objectid and offset.
>>>> Normal BGI has bg.start == key.objectid and bg.length == key.offset. As
>>>> the objectid is the thing that scatters the items all over the tree.
>>>> So the new BGI has bg.length == key.objectid and bg.start == key.offset.
>>>> As most of block groups are of same size, or from a small set, they're
>>>> packed.
>>> That doesn't look optimized enough.
>>> bg.length can be at 1G, that means if extents starts below 1G can still
>>> be before BGIs.
>> This shold not be a big problem, the items are still grouped togethers.
>> Mkfs does 8M, we can have 256M or 1G. On average there could be several
>> packed groups, which I think is fine and the estimated overhead would be
>> a few more seeks.
>>> I believe we should have a fixed objectid for this new BGIs, so that
>>> they are ensured to be at the beginning of extent tree.
>> That was my idea too, but that also requires to add one more member to
>> the item to track the length. Currently the key is saves the bytes. With
>> the proposed changes to drop chunk_objectid, the overall size of BGI
>> would not increase so this still sounds ok. And all the problems with
>> searching would go away.
>>>> The nice thing is that a lot of code can be shared between BGI and new
>>>> BGI, just needs some care with searches, inserts and search key
>>>> advances.
>>> Exactly, but since we're introducing a new key type, I prefer to perfect
>>> it. Not only change the key, but also the block group item structure to
>>> make it more compact.
>>> Although from the design aspect, it looks like BG tree along with new
>>> BGI would be the best design.
>>> New BG key goes (bg start, NEW BGI TYPE, used) no data. It would provide
>>> the most compact on-disk format.
>> That's very compact. Given the 'bg start' we can't use the same for the
>> extent tree item.
>>>> This would be easy with the bg_tree, because we'd know that all items in
>>>> the tree are just the block group items. Some sort of enumeration could
>>>> work for bg_key too, but I don't have something solid.
>>> Why not fixed objectid for BGI and just ignore the bg.len part?
>> I wanted to avoid storing a useless number, it costs 8 bytes per item,
>> and the simple swap of objectid/offset was first idea how to avoid it.
>>> We have chunk<->BGI verification code, no bg.len is not a problem at
>>> all, we can still make sure chunk<->bg is 1:1 mapped and still verify
>>> the bg.used.
>> This is all great, sure, and makes the extensions easy. Let's try to
>> work out best for each approach we have so far. Having a separate tree
>> for block groups may open some future options.
> Great, I'll explore the most compact key only method with BG_TREE.
> And maybe also try the fixed key objectid solution, just dropping the
> bg.length, while keep the current BGI structure.
> Thanks,
> Qu

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 520 bytes --]

  reply index

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-10-08  4:49 Qu Wenruo
2019-10-08  4:49 ` [PATCH v2 1/7] btrfs-progs: Refactor excluded extent functions to use fs_info Qu Wenruo
2019-10-08  9:22   ` Johannes Thumshirn
2019-10-17  2:16   ` Anand Jain
2019-10-08  4:49 ` [PATCH v2 2/7] btrfs-progs: Refactor btrfs_read_block_groups() Qu Wenruo
2019-10-17  3:23   ` Anand Jain
2019-10-17  4:33     ` Qu Wenruo
2019-10-17  5:08       ` Anand Jain
2019-10-08  4:49 ` [PATCH v2 3/7] btrfs-progs: Enable read-write ability for 'bg_tree' feature Qu Wenruo
2019-10-17  4:56   ` Anand Jain
2019-10-08  4:49 ` [PATCH v2 4/7] btrfs-progs: mkfs: Introduce -O bg-tree Qu Wenruo
2019-10-08  8:16   ` [PATCH v2.1 " Qu Wenruo
2019-10-08  4:49 ` [PATCH v2 5/7] btrfs-progs: dump-tree/dump-super: Introduce support for bg tree Qu Wenruo
2019-10-08  4:49 ` [PATCH v2 6/7] btrfs-progs: check: Introduce support for bg-tree feature Qu Wenruo
2019-10-08  4:49 ` [PATCH v2 7/7] btrfs-progs: btrfstune: Allow to enable bg-tree feature offline Qu Wenruo
2019-10-17  4:17   ` Anand Jain
2019-10-17  4:28     ` Qu Wenruo
2019-10-14 15:17 ` [PATCH v2 0/7] btrfs-progs: Support for BG_TREE feature David Sterba
2019-10-15  0:32   ` Qu Wenruo
2019-10-16 11:16     ` David Sterba
2019-10-16 11:19       ` Qu WenRuo
2019-10-18 17:27         ` David Sterba
2019-10-19  0:04           ` Qu Wenruo
2019-10-21 15:44             ` David Sterba
2019-10-22  0:49               ` Qu Wenruo
2019-10-22  6:30                 ` Qu Wenruo [this message]
2019-10-22 12:23                   ` David Sterba
2019-10-22 12:27                     ` Qu Wenruo

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \ \ \ \ \ \

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-BTRFS Archive on

Archives are clonable:
	git clone --mirror linux-btrfs/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-btrfs linux-btrfs/ \
	public-inbox-index linux-btrfs

Example config snippet for mirrors

Newsgroup available over NNTP:

AGPL code for this site: git clone