Linux-BTRFS Archive on lore.kernel.org
 help / color / Atom feed
From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: dsterba@suse.cz, Qu WenRuo <wqu@suse.com>,
	"linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>
Subject: Re: [PATCH v2 0/7] btrfs-progs: Support for BG_TREE feature
Date: Tue, 22 Oct 2019 08:49:39 +0800
Message-ID: <07b33628-2cec-7bd3-26a1-e3be2367774a@gmx.com> (raw)
In-Reply-To: <20191021154404.GP3001@twin.jikos.cz>

[-- Attachment #1.1: Type: text/plain, Size: 4389 bytes --]



On 2019/10/21 下午11:44, David Sterba wrote:
> On Sat, Oct 19, 2019 at 08:04:51AM +0800, Qu Wenruo wrote:
>> That's wonderful.
>> Although I guess my patchset should provide the hint of where to modify
>> the code, since there are only a limited number of places we modify
>> block group item.
> 
> I indeed started at your patchset and grepped fro BG_TREE, adding
> another branch.
> 
>>> We agree on the point that the block group items must be packed. The key
>>> approach should move the new BGI to the beginning, ie. key type is
>>> smaller than anything that appears in the extent tree. I chose 100 for
>>> the prototype, it could change.
>>>
>>> To keep changes to minimum, the new BGI uses the same block group item,
>>> so the only difference then becomes how we search for the items.
>>
>> If we're introducing new block group item, I hope to do a minor change.
>>
>> Remove the chunk_objectid member, or even flags. to make it more
>> compact. So that you can make the BGI subtree even smaller.
> 
> Yeah that can be done.
> 
>> But I guess since you don't want to modify the BGI structure, and keep
>> the code modification minimal, it may not be a good idea right now.
> 
> As long as the changes are bearable, eg. a minor refactoring of block
> group access (the cache.key serving a as offset and length) is fine. And
> if we can make the b-tree item more then let's do it.
> 
>>> Packing of the items is done by swapping the key objectid and offset.
>>>
>>> Normal BGI has bg.start == key.objectid and bg.length == key.offset. As
>>> the objectid is the thing that scatters the items all over the tree.
>>>
>>> So the new BGI has bg.length == key.objectid and bg.start == key.offset.
>>> As most of block groups are of same size, or from a small set, they're
>>> packed.
>>
>> That doesn't look optimized enough.
>>
>> bg.length can be at 1G, that means if extents starts below 1G can still
>> be before BGIs.
> 
> This shold not be a big problem, the items are still grouped togethers.
> Mkfs does 8M, we can have 256M or 1G. On average there could be several
> packed groups, which I think is fine and the estimated overhead would be
> a few more seeks.
> 
>> I believe we should have a fixed objectid for this new BGIs, so that
>> they are ensured to be at the beginning of extent tree.
> 
> That was my idea too, but that also requires to add one more member to
> the item to track the length. Currently the key is saves the bytes. With
> the proposed changes to drop chunk_objectid, the overall size of BGI
> would not increase so this still sounds ok. And all the problems with
> searching would go away.
> 
>>> The nice thing is that a lot of code can be shared between BGI and new
>>> BGI, just needs some care with searches, inserts and search key
>>> advances.
>>
>> Exactly, but since we're introducing a new key type, I prefer to perfect
>> it. Not only change the key, but also the block group item structure to
>> make it more compact.
>>
>> Although from the design aspect, it looks like BG tree along with new
>> BGI would be the best design.
>>
>> New BG key goes (bg start, NEW BGI TYPE, used) no data. It would provide
>> the most compact on-disk format.
> 
> That's very compact. Given the 'bg start' we can't use the same for the
> extent tree item.
> 
>>> This would be easy with the bg_tree, because we'd know that all items in
>>> the tree are just the block group items. Some sort of enumeration could
>>> work for bg_key too, but I don't have something solid.
>>
>> Why not fixed objectid for BGI and just ignore the bg.len part?
> 
> I wanted to avoid storing a useless number, it costs 8 bytes per item,
> and the simple swap of objectid/offset was first idea how to avoid it.
> 
>> We have chunk<->BGI verification code, no bg.len is not a problem at
>> all, we can still make sure chunk<->bg is 1:1 mapped and still verify
>> the bg.used.
> 
> This is all great, sure, and makes the extensions easy. Let's try to
> work out best for each approach we have so far. Having a separate tree
> for block groups may open some future options.

Great, I'll explore the most compact key only method with BG_TREE.

And maybe also try the fixed key objectid solution, just dropping the
bg.length, while keep the current BGI structure.

Thanks,
Qu


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

  reply index

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-10-08  4:49 Qu Wenruo
2019-10-08  4:49 ` [PATCH v2 1/7] btrfs-progs: Refactor excluded extent functions to use fs_info Qu Wenruo
2019-10-08  9:22   ` Johannes Thumshirn
2019-10-17  2:16   ` Anand Jain
2019-10-08  4:49 ` [PATCH v2 2/7] btrfs-progs: Refactor btrfs_read_block_groups() Qu Wenruo
2019-10-17  3:23   ` Anand Jain
2019-10-17  4:33     ` Qu Wenruo
2019-10-17  5:08       ` Anand Jain
2019-10-08  4:49 ` [PATCH v2 3/7] btrfs-progs: Enable read-write ability for 'bg_tree' feature Qu Wenruo
2019-10-17  4:56   ` Anand Jain
2019-10-08  4:49 ` [PATCH v2 4/7] btrfs-progs: mkfs: Introduce -O bg-tree Qu Wenruo
2019-10-08  8:16   ` [PATCH v2.1 " Qu Wenruo
2019-10-08  4:49 ` [PATCH v2 5/7] btrfs-progs: dump-tree/dump-super: Introduce support for bg tree Qu Wenruo
2019-10-08  4:49 ` [PATCH v2 6/7] btrfs-progs: check: Introduce support for bg-tree feature Qu Wenruo
2019-10-08  4:49 ` [PATCH v2 7/7] btrfs-progs: btrfstune: Allow to enable bg-tree feature offline Qu Wenruo
2019-10-17  4:17   ` Anand Jain
2019-10-17  4:28     ` Qu Wenruo
2019-10-14 15:17 ` [PATCH v2 0/7] btrfs-progs: Support for BG_TREE feature David Sterba
2019-10-15  0:32   ` Qu Wenruo
2019-10-16 11:16     ` David Sterba
2019-10-16 11:19       ` Qu WenRuo
2019-10-18 17:27         ` David Sterba
2019-10-19  0:04           ` Qu Wenruo
2019-10-21 15:44             ` David Sterba
2019-10-22  0:49               ` Qu Wenruo [this message]
2019-10-22  6:30                 ` Qu Wenruo
2019-10-22 12:23                   ` David Sterba
2019-10-22 12:27                     ` Qu Wenruo

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=07b33628-2cec-7bd3-26a1-e3be2367774a@gmx.com \
    --to=quwenruo.btrfs@gmx.com \
    --cc=dsterba@suse.cz \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=wqu@suse.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-BTRFS Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-btrfs/0 linux-btrfs/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-btrfs linux-btrfs/ https://lore.kernel.org/linux-btrfs \
		linux-btrfs@vger.kernel.org
	public-inbox-index linux-btrfs

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-btrfs


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git