* [PATCH v4 00/11] btrfs-progs: Support for SKINNY_BG_TREE feature
@ 2020-05-05  0:02 Qu Wenruo
  2020-05-05  0:02 ` [PATCH v4 01/11] btrfs-progs: check/lowmem: Lookup block group item in a separate function Qu Wenruo
                   ` (11 more replies)
  0 siblings, 12 replies; 24+ messages in thread
From: Qu Wenruo @ 2020-05-05  0:02 UTC (permalink / raw)
  To: linux-btrfs

This patchset can be fetched from github:
https://github.com/adam900710/btrfs-progs/tree/skinny_bg_tree
It is based on the v5.6 tag, with extra cleanups (already sent to the mailing list) applied.

This patchset provides the needed user space infrastructure for the
SKINNY_BG_TREE feature.

Since it's a new incompatible feature, unlike SKINNY_METADATA,
btrfs-progs is needed to convert an existing (unmounted) fs to the new
format, and vice versa.

Now btrfstune can convert a regular extent tree fs to a bg tree fs to
improve mount time.
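
For reference, on disk the new feature boils down to one key per block
group in a dedicated tree, with no item payload at all. A minimal
sketch, using the names introduced in patch 6 (illustration only, not a
copy of the final code):

  /* A skinny block group is just a key in the new block group tree. */
  struct btrfs_key key;
  int ret;

  key.objectid = block_group->start;            /* block group bytenr */
  key.type = BTRFS_SKINNY_BLOCK_GROUP_ITEM_KEY; /* new key type (193) */
  key.offset = block_group->used;               /* used bytes */
  ret = btrfs_insert_item(trans, fs_info->bg_root, &key, NULL, 0);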

For the performance improvement, please check the kernel patchset cover
letter or the last patch.
(SPOILER ALERT: It's super fast)

Changelog:
v2:
- Rebase to v5.2.2 tag
- Add btrfstune ability to convert existing fs to BG_TREE feature

v3:
- Fix a bug where temp chunks were not cleaned up properly
  This was caused by calling btrfs_convert_to_bg_tree() at the wrong
  time. It should be called after the temp chunks are cleaned up.

- Fix a bug where an extent buffer got leaked
  This was caused by the newly created bg tree not being added to the
  dirty list.

v4:
- Go with skinny bg tree instead of the regular block group item
  We're introducing a new incompatible feature anyway, so why not go
  all the way?

- Use the same refactoring as the kernel.
  This makes the code much cleaner and easier to read.

- Add the ability to roll back to a regular extent tree.
  So confident testers can try SKINNY_BG_TREE on their real world
  data, and roll back if they still want to mount the fs with older
  kernels.


Qu Wenruo (11):
  btrfs-progs: check/lowmem: Lookup block group item in a separate
    function
  btrfs-progs: block-group: Refactor how we read one block group item
  btrfs-progs: Rename btrfs_remove_block_group() and
    free_block_group_item()
  btrfs-progs: block-group: Refactor how we insert a block group item
  btrfs-progs: block-group: Rename write_one_cache_group()
  btrfs-progs: Introduce rw support for skinny_bg_tree
  btrfs-progs: mkfs: Introduce -O skinny-bg-tree
  btrfs-progs: dump-tree/dump-super: Introduce support for skinny bg
    tree
  btrfs-progs: check: Introduce support for bg-tree feature
  btrfs-progs: btrfstune: Allow to enable bg-tree feature offline
  btrfs-progs: btrfstune: Allow user to rollback to regular extent tree

 Documentation/btrfstune.asciidoc |  10 +
 btrfstune.c                      |  36 +-
 check/common.h                   |   4 +-
 check/main.c                     |  60 +++-
 check/mode-lowmem.c              | 140 +++++---
 cmds/inspect-dump-super.c        |   1 +
 cmds/inspect-dump-tree.c         |   6 +
 cmds/rescue-chunk-recover.c      |   5 +-
 common/fsfeatures.c              |   6 +
 ctree.h                          |  23 +-
 disk-io.c                        |  20 ++
 extent-tree.c                    | 546 +++++++++++++++++++++++++------
 mkfs/common.c                    |   3 +-
 mkfs/common.h                    |   3 +
 mkfs/main.c                      |  13 +-
 print-tree.c                     |   4 +
 root-tree.c                      |   6 +-
 transaction.c                    |   2 +
 18 files changed, 738 insertions(+), 150 deletions(-)

-- 
2.26.2


* [PATCH v4 01/11] btrfs-progs: check/lowmem: Lookup block group item in a separate function
  2020-05-05  0:02 [PATCH v4 00/11] btrfs-progs: Support for SKINNY_BG_TREE feature Qu Wenruo
@ 2020-05-05  0:02 ` Qu Wenruo
  2020-05-06 17:24   ` Johannes Thumshirn
  2020-05-05  0:02 ` [PATCH v4 02/11] btrfs-progs: block-group: Refactor how we read one block group item Qu Wenruo
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 24+ messages in thread
From: Qu Wenruo @ 2020-05-05  0:02 UTC (permalink / raw)
  To: linux-btrfs

In check_chunk_item() we search the extent tree for the block group item.

Refactor this part into a separate function, find_block_group_item(),
so that the later skinny-bg-tree feature can reuse it.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 check/mode-lowmem.c | 74 ++++++++++++++++++++++++++++-----------------
 1 file changed, 47 insertions(+), 27 deletions(-)

diff --git a/check/mode-lowmem.c b/check/mode-lowmem.c
index 821ebc57c8ed..dbb90895127d 100644
--- a/check/mode-lowmem.c
+++ b/check/mode-lowmem.c
@@ -4499,6 +4499,50 @@ next:
 	return 0;
 }
 
+/*
+ * Find the block group item with @bytenr, @len and @type
+ *
+ * Return 0 if found.
+ * Return -ENOENT if not found.
+ * Return <0 for fatal error.
+ */
+static int find_block_group_item(struct btrfs_fs_info *fs_info,
+				 struct btrfs_path *path, u64 bytenr, u64 len,
+				 u64 type)
+{
+	struct btrfs_block_group_item bgi;
+	struct btrfs_key key;
+	int ret;
+
+	key.objectid = bytenr;
+	key.type = BTRFS_BLOCK_GROUP_ITEM_KEY;
+	key.offset = len;
+
+	ret = btrfs_search_slot(NULL, fs_info->extent_root, &key, path, 0, 0);
+	if (ret < 0)
+		return ret;
+	if (ret > 0) {
+		ret = -ENOENT;
+		error("chunk [%llu %llu) doesn't have related block group item",
+		      bytenr, bytenr + len);
+		goto out;
+	}
+	read_extent_buffer(path->nodes[0], &bgi,
+			btrfs_item_ptr_offset(path->nodes[0], path->slots[0]),
+			sizeof(bgi));
+	if (btrfs_stack_block_group_flags(&bgi) != type) {
+		error(
+"chunk [%llu %llu) type mismatch with block group, block group has 0x%llx chunk has %llx",
+		      bytenr, bytenr + len, btrfs_stack_block_group_flags(&bgi),
+		      type);
+		ret = -EUCLEAN;
+	}
+
+out:
+	btrfs_release_path(path);
+	return ret;
+}
+
 /*
  * Check a chunk item.
  * Including checking all referred dev_extents and block group
@@ -4506,16 +4550,12 @@ next:
 static int check_chunk_item(struct btrfs_fs_info *fs_info,
 			    struct extent_buffer *eb, int slot)
 {
-	struct btrfs_root *extent_root = fs_info->extent_root;
 	struct btrfs_root *dev_root = fs_info->dev_root;
 	struct btrfs_path path;
 	struct btrfs_key chunk_key;
-	struct btrfs_key bg_key;
 	struct btrfs_key devext_key;
 	struct btrfs_chunk *chunk;
 	struct extent_buffer *leaf;
-	struct btrfs_block_group_item *bi;
-	struct btrfs_block_group_item bg_item;
 	struct btrfs_dev_extent *ptr;
 	u64 length;
 	u64 chunk_end;
@@ -4542,31 +4582,11 @@ static int check_chunk_item(struct btrfs_fs_info *fs_info,
 	}
 	type = btrfs_chunk_type(eb, chunk);
 
-	bg_key.objectid = chunk_key.offset;
-	bg_key.type = BTRFS_BLOCK_GROUP_ITEM_KEY;
-	bg_key.offset = length;
-
 	btrfs_init_path(&path);
-	ret = btrfs_search_slot(NULL, extent_root, &bg_key, &path, 0, 0);
-	if (ret) {
-		error(
-		"chunk[%llu %llu) did not find the related block group item",
-			chunk_key.offset, chunk_end);
+	ret = find_block_group_item(fs_info, &path, chunk_key.offset, length,
+				    type);
+	if (ret < 0)
 		err |= REFERENCER_MISSING;
-	} else{
-		leaf = path.nodes[0];
-		bi = btrfs_item_ptr(leaf, path.slots[0],
-				    struct btrfs_block_group_item);
-		read_extent_buffer(leaf, &bg_item, (unsigned long)bi,
-				   sizeof(bg_item));
-		if (btrfs_stack_block_group_flags(&bg_item) != type) {
-			error(
-"chunk[%llu %llu) related block group item flags mismatch, wanted: %llu, have: %llu",
-				chunk_key.offset, chunk_end, type,
-				btrfs_stack_block_group_flags(&bg_item));
-			err |= REFERENCER_MISSING;
-		}
-	}
 
 	num_stripes = btrfs_chunk_num_stripes(eb, chunk);
 	stripe_len = btrfs_stripe_length(fs_info, eb, chunk);
-- 
2.26.2


* [PATCH v4 02/11] btrfs-progs: block-group: Refactor how we read one block group item
  2020-05-05  0:02 [PATCH v4 00/11] btrfs-progs: Support for SKINNY_BG_TREE feature Qu Wenruo
  2020-05-05  0:02 ` [PATCH v4 01/11] btrfs-progs: check/lowmem: Lookup block group item in a separate function Qu Wenruo
@ 2020-05-05  0:02 ` Qu Wenruo
  2020-05-06 17:27   ` Johannes Thumshirn
  2020-05-05  0:02 ` [PATCH v4 03/11] btrfs-progs: Rename btrfs_remove_block_group() and free_block_group_item() Qu Wenruo
                   ` (9 subsequent siblings)
  11 siblings, 1 reply; 24+ messages in thread
From: Qu Wenruo @ 2020-05-05  0:02 UTC (permalink / raw)
  To: linux-btrfs

Structure btrfs_block_group has the following members, which are
currently read from the on-disk block group item and key:
- Length
  From the item key.
- Used
- Flags
  From the block group item.

However, for the incoming skinny block group tree, we are going to read
those members from different sources.

This patch refactors the read by:
- Refactoring the length/used/flags initialization into one function
  The new function, read_block_group_item(), will handle the
  initialization of those members.

- Using btrfs_block_group::length to replace key::offset
  Since the skinny block group item gives a different meaning to its
  key offset.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 extent-tree.c | 36 +++++++++++++++++++++++++++---------
 1 file changed, 27 insertions(+), 9 deletions(-)

diff --git a/extent-tree.c b/extent-tree.c
index bd7dbf551876..5fc4308336dd 100644
--- a/extent-tree.c
+++ b/extent-tree.c
@@ -172,6 +172,7 @@ static int btrfs_add_block_group_cache(struct btrfs_fs_info *info,
 	struct rb_node *parent = NULL;
 	struct btrfs_block_group *cache;
 
+	ASSERT(block_group->length != 0);
 	p = &info->block_group_cache_tree.rb_node;
 
 	while (*p) {
@@ -2630,6 +2631,27 @@ error:
 	return ret;
 }
 
+static int read_block_group_item(struct btrfs_block_group *cache,
+				 struct btrfs_path *path,
+				 const struct btrfs_key *key)
+{
+	struct extent_buffer *leaf = path->nodes[0];
+	struct btrfs_block_group_item bgi;
+	int slot = path->slots[0];
+
+	ASSERT(key->type == BTRFS_BLOCK_GROUP_ITEM_KEY);
+
+	cache->start = key->objectid;
+	cache->length = key->offset;
+
+	read_extent_buffer(leaf, &bgi, btrfs_item_ptr_offset(leaf, slot),
+			   sizeof(bgi));
+	cache->used = btrfs_stack_block_group_used(&bgi);
+	cache->flags = btrfs_stack_block_group_flags(&bgi);
+
+	return 0;
+}
+
 /*
  * Read out one BLOCK_GROUP_ITEM and insert it into block group cache.
  *
@@ -2642,7 +2664,6 @@ static int read_one_block_group(struct btrfs_fs_info *fs_info,
 	struct extent_buffer *leaf = path->nodes[0];
 	struct btrfs_space_info *space_info;
 	struct btrfs_block_group *cache;
-	struct btrfs_block_group_item bgi;
 	struct btrfs_key key;
 	int slot = path->slots[0];
 	int ret;
@@ -2660,14 +2681,11 @@ static int read_one_block_group(struct btrfs_fs_info *fs_info,
 	cache = kzalloc(sizeof(*cache), GFP_NOFS);
 	if (!cache)
 		return -ENOMEM;
-	read_extent_buffer(leaf, &bgi, btrfs_item_ptr_offset(leaf, slot),
-			   sizeof(bgi));
-	cache->start = key.objectid;
-	cache->length = key.offset;
-	cache->cached = 0;
-	cache->pinned = 0;
-	cache->flags = btrfs_stack_block_group_flags(&bgi);
-	cache->used = btrfs_stack_block_group_used(&bgi);
+	ret = read_block_group_item(cache, path, &key);
+	if (ret < 0) {
+		free(cache);
+		return ret;
+	}
 	INIT_LIST_HEAD(&cache->dirty_list);
 
 	set_avail_alloc_bits(fs_info, cache->flags);
-- 
2.26.2


* [PATCH v4 03/11] btrfs-progs: Rename btrfs_remove_block_group() and free_block_group_item()
  2020-05-05  0:02 [PATCH v4 00/11] btrfs-progs: Support for SKINNY_BG_TREE feature Qu Wenruo
  2020-05-05  0:02 ` [PATCH v4 01/11] btrfs-progs: check/lowmem: Lookup block group item in a separate function Qu Wenruo
  2020-05-05  0:02 ` [PATCH v4 02/11] btrfs-progs: block-group: Refactor how we read one block group item Qu Wenruo
@ 2020-05-05  0:02 ` Qu Wenruo
  2020-05-07 11:05   ` Johannes Thumshirn
  2020-05-05  0:02 ` [PATCH v4 04/11] btrfs-progs: block-group: Refactor how we insert a block group item Qu Wenruo
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 24+ messages in thread
From: Qu Wenruo @ 2020-05-05  0:02 UTC (permalink / raw)
  To: linux-btrfs

This is to sync with the refactored kernel code.

Also, since we're here, sync the function parameters with the kernel
too.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 ctree.h       |  4 +--
 extent-tree.c | 80 +++++++++++++++++----------------------------------
 mkfs/main.c   |  2 +-
 3 files changed, 30 insertions(+), 56 deletions(-)

diff --git a/ctree.h b/ctree.h
index 0256b0e6bc3d..7c7c992cd885 100644
--- a/ctree.h
+++ b/ctree.h
@@ -2597,8 +2597,8 @@ int btrfs_record_file_extent(struct btrfs_trans_handle *trans,
 			      struct btrfs_inode_item *inode,
 			      u64 file_pos, u64 disk_bytenr,
 			      u64 num_bytes);
-int btrfs_free_block_group(struct btrfs_trans_handle *trans,
-			   struct btrfs_fs_info *fs_info, u64 bytenr, u64 len);
+int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
+			     u64 bytenr, u64 len);
 void free_excluded_extents(struct btrfs_fs_info *fs_info,
 			   struct btrfs_block_group *cache);
 int exclude_super_stripes(struct btrfs_fs_info *fs_info,
diff --git a/extent-tree.c b/extent-tree.c
index 5fc4308336dd..3de95052a645 100644
--- a/extent-tree.c
+++ b/extent-tree.c
@@ -2907,35 +2907,26 @@ int btrfs_update_block_group(struct btrfs_trans_handle *trans,
  * Caller should ensure the block group is empty and all space is pinned.
  * Or new tree block/data may be allocated into it.
  */
-static int free_block_group_item(struct btrfs_trans_handle *trans,
-				 struct btrfs_fs_info *fs_info,
-				 u64 bytenr, u64 len)
+static int remove_block_group_item(struct btrfs_trans_handle *trans,
+				   struct btrfs_path *path,
+				   struct btrfs_block_group *block_group)
 {
-	struct btrfs_path *path;
-	struct btrfs_key key;
+	struct btrfs_fs_info *fs_info = trans->fs_info;
 	struct btrfs_root *root = fs_info->extent_root;
+	struct btrfs_key key;
 	int ret = 0;
 
-	key.objectid = bytenr;
-	key.offset = len;
+	key.objectid = block_group->start;
+	key.offset = block_group->length;
 	key.type = BTRFS_BLOCK_GROUP_ITEM_KEY;
 
-	path = btrfs_alloc_path();
-	if (!path)
-		return -ENOMEM;
-
 	ret = btrfs_search_slot(trans, root, &key, path, -1, 1);
-	if (ret > 0) {
+	if (ret > 0)
 		ret = -ENOENT;
-		goto out;
-	}
 	if (ret < 0)
-		goto out;
+		return ret;
 
-	ret = btrfs_del_item(trans, root, path);
-out:
-	btrfs_free_path(path);
-	return ret;
+	return btrfs_del_item(trans, root, path);
 }
 
 static int free_dev_extent_item(struct btrfs_trans_handle *trans,
@@ -3176,42 +3167,25 @@ out:
 	return ret;
 }
 
-int btrfs_free_block_group(struct btrfs_trans_handle *trans,
-			   struct btrfs_fs_info *fs_info, u64 bytenr, u64 len)
+int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
+			     u64 bytenr, u64 len)
 {
-	struct btrfs_root *extent_root = fs_info->extent_root;
-	struct btrfs_path *path;
-	struct btrfs_block_group_item *bgi;
-	struct btrfs_key key;
+	struct btrfs_fs_info *fs_info = trans->fs_info;
+	struct btrfs_block_group *block_group;
+	struct btrfs_path path;
 	int ret = 0;
 
-	path = btrfs_alloc_path();
-	if (!path)
-		return -ENOMEM;
-
-	key.objectid = bytenr;
-	key.type = BTRFS_BLOCK_GROUP_ITEM_KEY;
-	key.offset = len;
-
+	block_group = btrfs_lookup_block_group(fs_info, bytenr);
+	if (!block_group || block_group->start != bytenr ||
+	    block_group->length != len)
+		return -ENOENT;
 	/* Double check the block group to ensure it's empty */
-	ret = btrfs_search_slot(trans, extent_root, &key, path, 0, 0);
-	if (ret > 0) {
-		ret = -ENONET;
-		goto out;
-	}
-	if (ret < 0)
-		goto out;
-
-	bgi = btrfs_item_ptr(path->nodes[0], path->slots[0],
-			     struct btrfs_block_group_item);
-	if (btrfs_block_group_used(path->nodes[0], bgi)) {
+	if (block_group->used) {
 		fprintf(stderr,
 			"WARNING: block group [%llu,%llu) is not empty\n",
 			bytenr, bytenr + len);
-		ret = -EINVAL;
-		goto out;
+		return -EUCLEAN;
 	}
-	btrfs_release_path(path);
 
 	/*
 	 * Now pin all space in the block group, to prevent further transaction
@@ -3220,14 +3194,16 @@ int btrfs_free_block_group(struct btrfs_trans_handle *trans,
 	 */
 	btrfs_pin_extent(fs_info, bytenr, len);
 
+	btrfs_init_path(&path);
 	/* delete block group item and chunk item */
-	ret = free_block_group_item(trans, fs_info, bytenr, len);
+	ret = remove_block_group_item(trans, &path, block_group);
+	btrfs_release_path(&path);
 	if (ret < 0) {
 		fprintf(stderr,
 			"failed to free block group item for [%llu,%llu)\n",
 			bytenr, bytenr + len);
 		btrfs_unpin_extent(fs_info, bytenr, len);
-		goto out;
+		return ret;
 	}
 
 	ret = free_chunk_dev_extent_items(trans, fs_info, bytenr);
@@ -3236,7 +3212,7 @@ int btrfs_free_block_group(struct btrfs_trans_handle *trans,
 			"failed to dev extents belongs to [%llu,%llu)\n",
 			bytenr, bytenr + len);
 		btrfs_unpin_extent(fs_info, bytenr, len);
-		goto out;
+		return ret;
 	}
 	ret = free_chunk_item(trans, fs_info, bytenr);
 	if (ret < 0) {
@@ -3244,15 +3220,13 @@ int btrfs_free_block_group(struct btrfs_trans_handle *trans,
 			"failed to free chunk for [%llu,%llu)\n",
 			bytenr, bytenr + len);
 		btrfs_unpin_extent(fs_info, bytenr, len);
-		goto out;
+		return ret;
 	}
 
 	/* Now release the block_group_cache */
 	ret = free_block_group_cache(trans, fs_info, bytenr, len);
 	btrfs_unpin_extent(fs_info, bytenr, len);
 
-out:
-	btrfs_free_path(path);
 	return ret;
 }
 
diff --git a/mkfs/main.c b/mkfs/main.c
index 89f3877fa3b2..2c28d0b159a6 100644
--- a/mkfs/main.c
+++ b/mkfs/main.c
@@ -644,7 +644,7 @@ static int cleanup_temp_chunks(struct btrfs_fs_info *fs_info,
 					sys_profile)) {
 			u64 flags = btrfs_block_group_flags(path.nodes[0], bgi);
 
-			ret = btrfs_free_block_group(trans, fs_info,
+			ret = btrfs_remove_block_group(trans,
 					found_key.objectid, found_key.offset);
 			if (ret < 0)
 				goto out;
-- 
2.26.2


* [PATCH v4 04/11] btrfs-progs: block-group: Refactor how we insert a block group item
  2020-05-05  0:02 [PATCH v4 00/11] btrfs-progs: Support for SKINNY_BG_TREE feature Qu Wenruo
                   ` (2 preceding siblings ...)
  2020-05-05  0:02 ` [PATCH v4 03/11] btrfs-progs: Rename btrfs_remove_block_group() and free_block_group_item() Qu Wenruo
@ 2020-05-05  0:02 ` Qu Wenruo
  2020-05-08 14:23   ` Johannes Thumshirn
  2020-05-05  0:02 ` [PATCH v4 05/11] btrfs-progs: block-group: Rename write_one_cache_group() Qu Wenruo
                   ` (7 subsequent siblings)
  11 siblings, 1 reply; 24+ messages in thread
From: Qu Wenruo @ 2020-05-05  0:02 UTC (permalink / raw)
  To: linux-btrfs

Currently the block group item insert is pretty straightforward: fill
the block group item structure and insert it into the extent tree.

However, the incoming skinny block group feature is going to change
this, so this patch refactors such inserts into a new function,
insert_block_group_item(), to make the incoming feature easier to add.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 extent-tree.c | 34 +++++++++++++++++++++-------------
 1 file changed, 21 insertions(+), 13 deletions(-)

diff --git a/extent-tree.c b/extent-tree.c
index 3de95052a645..911fd25f3c6e 100644
--- a/extent-tree.c
+++ b/extent-tree.c
@@ -2804,6 +2804,26 @@ int btrfs_make_block_group(struct btrfs_trans_handle *trans,
 	return 0;
 }
 
+static int insert_block_group_item(struct btrfs_trans_handle *trans,
+				   struct btrfs_block_group *block_group)
+{
+	struct btrfs_fs_info *fs_info = trans->fs_info;
+	struct btrfs_block_group_item bgi;
+	struct btrfs_root *root;
+	struct btrfs_key key;
+
+	btrfs_set_stack_block_group_used(&bgi, block_group->used);
+	btrfs_set_stack_block_group_chunk_objectid(&bgi,
+				BTRFS_FIRST_CHUNK_TREE_OBJECTID);
+	btrfs_set_stack_block_group_flags(&bgi, block_group->flags);
+	key.objectid = block_group->start;
+	key.type = BTRFS_BLOCK_GROUP_ITEM_KEY;
+	key.offset = block_group->length;
+
+	root = fs_info->extent_root;
+	return btrfs_insert_item(trans, root, &key, &bgi, sizeof(bgi));
+}
+
 /*
  * This is for converter use only.
  *
@@ -2822,7 +2842,6 @@ int btrfs_make_block_groups(struct btrfs_trans_handle *trans,
 	u64 total_data = 0;
 	u64 total_metadata = 0;
 	int ret;
-	struct btrfs_root *extent_root = fs_info->extent_root;
 	struct btrfs_block_group *cache;
 
 	total_bytes = btrfs_super_total_bytes(fs_info->super_copy);
@@ -2873,21 +2892,10 @@ int btrfs_make_block_groups(struct btrfs_trans_handle *trans,
 	/* then insert all the items */
 	cur_start = 0;
 	while(cur_start < total_bytes) {
-		struct btrfs_block_group_item bgi;
-		struct btrfs_key key;
-
 		cache = btrfs_lookup_block_group(fs_info, cur_start);
 		BUG_ON(!cache);
 
-		btrfs_set_stack_block_group_used(&bgi, cache->used);
-		btrfs_set_stack_block_group_flags(&bgi, cache->flags);
-		btrfs_set_stack_block_group_chunk_objectid(&bgi,
-				BTRFS_FIRST_CHUNK_TREE_OBJECTID);
-		key.objectid = cache->start;
-		key.type = BTRFS_BLOCK_GROUP_ITEM_KEY;
-		key.offset = cache->length;
-		ret = btrfs_insert_item(trans, extent_root, &key, &bgi,
-					sizeof(bgi));
+		ret = insert_block_group_item(trans, cache);
 		BUG_ON(ret);
 
 		cur_start = cache->start + cache->length;
-- 
2.26.2


* [PATCH v4 05/11] btrfs-progs: block-group: Rename write_one_cache_group()
  2020-05-05  0:02 [PATCH v4 00/11] btrfs-progs: Support for SKINNY_BG_TREE feature Qu Wenruo
                   ` (3 preceding siblings ...)
  2020-05-05  0:02 ` [PATCH v4 04/11] btrfs-progs: block-group: Refactor how we insert a block group item Qu Wenruo
@ 2020-05-05  0:02 ` Qu Wenruo
  2020-05-08 14:24   ` Johannes Thumshirn
  2020-05-05  0:02 ` [PATCH v4 06/11] btrfs-progs: Introduce rw support for skinny_bg_tree Qu Wenruo
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 24+ messages in thread
From: Qu Wenruo @ 2020-05-05  0:02 UTC (permalink / raw)
  To: linux-btrfs

The name of this function contains the word "cache", which is left over
from the era when btrfs_block_group was called btrfs_block_group_cache.

Now this "cache" doesn't match anything, and we have better naming for
functions like read/insert/remove_block_group_item().

So rename this function to update_block_group_item().

Since we're here, also reorder the local variable declarations into
reverse Christmas tree style, and rename @extent_root to @root for
later reuse.
And replace the BUG_ON() with proper error handling.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 extent-tree.c | 27 ++++++++++++++-------------
 1 file changed, 14 insertions(+), 13 deletions(-)

diff --git a/extent-tree.c b/extent-tree.c
index 911fd25f3c6e..89e38e2ed7ae 100644
--- a/extent-tree.c
+++ b/extent-tree.c
@@ -1527,25 +1527,28 @@ int btrfs_dec_ref(struct btrfs_trans_handle *trans, struct btrfs_root *root,
 	return __btrfs_mod_ref(trans, root, buf, record_parent, 0);
 }
 
-static int write_one_cache_group(struct btrfs_trans_handle *trans,
-				 struct btrfs_path *path,
-				 struct btrfs_block_group *cache)
+static int update_block_group_item(struct btrfs_trans_handle *trans,
+				   struct btrfs_path *path,
+				   struct btrfs_block_group *cache)
 {
-	int ret;
-	struct btrfs_root *extent_root = trans->fs_info->extent_root;
-	unsigned long bi;
+	struct btrfs_fs_info *fs_info = trans->fs_info;
 	struct btrfs_block_group_item bgi;
 	struct extent_buffer *leaf;
+	struct btrfs_root *root;
 	struct btrfs_key key;
+	unsigned long bi;
+	int ret;
 
 	key.objectid = cache->start;
 	key.type = BTRFS_BLOCK_GROUP_ITEM_KEY;
 	key.offset = cache->length;
+	root = fs_info->extent_root;
 
-	ret = btrfs_search_slot(trans, extent_root, &key, path, 0, 1);
+	ret = btrfs_search_slot(trans, root, &key, path, 0, 1);
+	if (ret > 0)
+		ret = -ENOENT;
 	if (ret < 0)
 		goto fail;
-	BUG_ON(ret);
 
 	leaf = path->nodes[0];
 	bi = btrfs_item_ptr_offset(leaf, path->slots[0]);
@@ -1555,11 +1558,9 @@ static int write_one_cache_group(struct btrfs_trans_handle *trans,
 			BTRFS_FIRST_CHUNK_TREE_OBJECTID);
 	write_extent_buffer(leaf, &bgi, bi, sizeof(bgi));
 	btrfs_mark_buffer_dirty(leaf);
-	btrfs_release_path(path);
 fail:
-	if (ret)
-		return ret;
-	return 0;
+	btrfs_release_path(path);
+	return ret;
 
 }
 
@@ -1577,7 +1578,7 @@ int btrfs_write_dirty_block_groups(struct btrfs_trans_handle *trans)
 		cache = list_first_entry(&trans->dirty_bgs,
 				 struct btrfs_block_group, dirty_list);
 		list_del_init(&cache->dirty_list);
-		ret = write_one_cache_group(trans, path, cache);
+		ret = update_block_group_item(trans, path, cache);
 		if (ret)
 			break;
 	}
-- 
2.26.2


* [PATCH v4 06/11] btrfs-progs: Introduce rw support for skinny_bg_tree
  2020-05-05  0:02 [PATCH v4 00/11] btrfs-progs: Support for SKINNY_BG_TREE feature Qu Wenruo
                   ` (4 preceding siblings ...)
  2020-05-05  0:02 ` [PATCH v4 05/11] btrfs-progs: block-group: Rename write_one_cache_group() Qu Wenruo
@ 2020-05-05  0:02 ` Qu Wenruo
  2020-05-05  0:02 ` [PATCH v4 07/11] btrfs-progs: mkfs: Introduce -O skinny-bg-tree Qu Wenruo
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 24+ messages in thread
From: Qu Wenruo @ 2020-05-05  0:02 UTC (permalink / raw)
  To: linux-btrfs

Add the ability to read/write a fs with the skinny_bg_tree feature.
The code is mostly synced from the kernel support.

Please note that currently the support is just open/read/write; there
is no conversion support yet.
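
As a quick orientation before the full diff: on the read side, the used
bytes come straight from the key offset, while the length and flags are
taken from the chunk mapping. A simplified sketch of what
read_block_group_item() below does (error handling omitted):

  ce = search_cache_extent(&fs_info->mapping_tree.cache_tree, key->objectid);
  map = container_of(ce, struct map_lookup, ce);

  cache->start  = key->objectid; /* block group bytenr */
  cache->length = ce->size;      /* length from the chunk mapping */
  cache->used   = key->offset;   /* used bytes stored in the key offset */
  cache->flags  = map->type;     /* block group flags from the chunk type */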

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 ctree.h       |  15 ++++-
 disk-io.c     |  20 ++++++
 extent-tree.c | 167 +++++++++++++++++++++++++++++++++++++++++++++++---
 3 files changed, 192 insertions(+), 10 deletions(-)

diff --git a/ctree.h b/ctree.h
index 7c7c992cd885..9ce73008a7e0 100644
--- a/ctree.h
+++ b/ctree.h
@@ -91,6 +91,9 @@ struct btrfs_free_space_ctl;
 /* tracks free space in block groups. */
 #define BTRFS_FREE_SPACE_TREE_OBJECTID 10ULL
 
+/* store SKINNY_BLOCK_GROUP_ITEMs in a separate tree */
+#define BTRFS_BLOCK_GROUP_TREE_OBJECTID 11ULL
+
 /* device stats in the device tree */
 #define BTRFS_DEV_STATS_OBJECTID 0ULL
 
@@ -495,6 +498,7 @@ struct btrfs_super_block {
 #define BTRFS_FEATURE_INCOMPAT_NO_HOLES		(1ULL << 9)
 #define BTRFS_FEATURE_INCOMPAT_METADATA_UUID    (1ULL << 10)
 #define BTRFS_FEATURE_INCOMPAT_RAID1C34		(1ULL << 11)
+#define BTRFS_FEATURE_INCOMPAT_SKINNY_BG_TREE	(1ULL << 12)
 
 #define BTRFS_FEATURE_COMPAT_SUPP		0ULL
 
@@ -519,7 +523,8 @@ struct btrfs_super_block {
 	 BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA |	\
 	 BTRFS_FEATURE_INCOMPAT_NO_HOLES |		\
 	 BTRFS_FEATURE_INCOMPAT_RAID1C34 |		\
-	 BTRFS_FEATURE_INCOMPAT_METADATA_UUID)
+	 BTRFS_FEATURE_INCOMPAT_METADATA_UUID |		\
+	 BTRFS_FEATURE_INCOMPAT_SKINNY_BG_TREE)
 
 /*
  * A leaf is full of items. offset and size tell us where to find
@@ -1147,6 +1152,7 @@ struct btrfs_fs_info {
 	struct btrfs_root *quota_root;
 	struct btrfs_root *free_space_root;
 	struct btrfs_root *uuid_root;
+	struct btrfs_root *bg_root;
 
 	struct rb_root fs_root_tree;
 
@@ -1355,6 +1361,13 @@ static inline u32 BTRFS_MAX_XATTR_SIZE(const struct btrfs_fs_info *info)
  */
 #define BTRFS_BLOCK_GROUP_ITEM_KEY 192
 
+/*
+ * More optimized block group item, use key.objectid for block group bytenr,
+ * key.offset for used bytes.
+ * No item data needed.
+ */
+#define BTRFS_SKINNY_BLOCK_GROUP_ITEM_KEY 193
+
 /*
  * Every block group is represented in the free space tree by a free space info
  * item, which stores some accounting information. It is keyed on
diff --git a/disk-io.c b/disk-io.c
index c895bd277491..4cfb48326e3b 100644
--- a/disk-io.c
+++ b/disk-io.c
@@ -751,6 +751,8 @@ struct btrfs_root *btrfs_read_fs_root(struct btrfs_fs_info *fs_info,
 	if (location->objectid == BTRFS_FREE_SPACE_TREE_OBJECTID)
 		return fs_info->free_space_root ? fs_info->free_space_root :
 						ERR_PTR(-ENOENT);
+	if (location->objectid == BTRFS_BLOCK_GROUP_TREE_OBJECTID)
+		return fs_info->bg_root ? fs_info->bg_root : ERR_PTR(-ENOENT);
 
 	BUG_ON(location->objectid == BTRFS_TREE_RELOC_OBJECTID ||
 	       location->offset != (u64)-1);
@@ -803,6 +805,7 @@ struct btrfs_fs_info *btrfs_new_fs_info(int writable, u64 sb_bytenr)
 	fs_info->quota_root = calloc(1, sizeof(struct btrfs_root));
 	fs_info->free_space_root = calloc(1, sizeof(struct btrfs_root));
 	fs_info->uuid_root = calloc(1, sizeof(struct btrfs_root));
+	fs_info->bg_root = calloc(1, sizeof(struct btrfs_root));
 	fs_info->super_copy = calloc(1, BTRFS_SUPER_INFO_SIZE);
 
 	if (!fs_info->tree_root || !fs_info->extent_root ||
@@ -968,6 +971,21 @@ int btrfs_setup_all_roots(struct btrfs_fs_info *fs_info, u64 root_tree_bytenr,
 		return ret;
 	fs_info->extent_root->track_dirty = 1;
 
+	if (btrfs_fs_incompat(fs_info, SKINNY_BG_TREE)) {
+		ret = setup_root_or_create_block(fs_info, flags,
+					fs_info->bg_root,
+					BTRFS_BLOCK_GROUP_TREE_OBJECTID, "bg");
+		if (ret < 0) {
+			error("Couldn't setup bg tree");
+			return ret;
+		}
+		fs_info->bg_root->track_dirty = 1;
+		fs_info->bg_root->ref_cows = 0;
+	} else {
+		free(fs_info->bg_root);
+		fs_info->bg_root = NULL;
+	}
+
 	ret = find_and_setup_root(root, fs_info, BTRFS_DEV_TREE_OBJECTID,
 				  fs_info->dev_root);
 	if (ret) {
@@ -1056,6 +1074,8 @@ void btrfs_release_all_roots(struct btrfs_fs_info *fs_info)
 		free_extent_buffer(fs_info->dev_root->node);
 	if (fs_info->extent_root)
 		free_extent_buffer(fs_info->extent_root->node);
+	if (fs_info->bg_root)
+		free_extent_buffer(fs_info->bg_root->node);
 	if (fs_info->tree_root)
 		free_extent_buffer(fs_info->tree_root->node);
 	if (fs_info->log_root_tree)
diff --git a/extent-tree.c b/extent-tree.c
index 89e38e2ed7ae..179fce4422cf 100644
--- a/extent-tree.c
+++ b/extent-tree.c
@@ -1527,6 +1527,36 @@ int btrfs_dec_ref(struct btrfs_trans_handle *trans, struct btrfs_root *root,
 	return __btrfs_mod_ref(trans, root, buf, record_parent, 0);
 }
 
+static int locate_skinny_bg_item(struct btrfs_fs_info *fs_info,
+				 struct btrfs_trans_handle *trans,
+				 struct btrfs_block_group *bg,
+				 struct btrfs_path *path,
+				 int ins_len, int cow)
+{
+	struct btrfs_root *bg_root = fs_info->bg_root;
+	struct btrfs_key key;
+	int ret;
+
+	key.objectid = bg->start;
+	key.type = BTRFS_SKINNY_BLOCK_GROUP_ITEM_KEY;
+	key.offset = (u64)-1;
+
+	ret = btrfs_search_slot(trans, bg_root, &key, path, ins_len, cow);
+	if (ret == 0)
+		ret = -EUCLEAN;
+	if (ret < 0)
+		goto error;
+	ret = btrfs_previous_item(bg_root, path, key.objectid, key.type);
+	if (ret > 0)
+		ret = -ENOENT;
+	if (ret < 0)
+		goto error;
+	return ret;
+error:
+	btrfs_release_path(path);
+	return ret;
+}
+
 static int update_block_group_item(struct btrfs_trans_handle *trans,
 				   struct btrfs_path *path,
 				   struct btrfs_block_group *cache)
@@ -1539,6 +1569,14 @@ static int update_block_group_item(struct btrfs_trans_handle *trans,
 	unsigned long bi;
 	int ret;
 
+	if (btrfs_fs_incompat(fs_info, SKINNY_BG_TREE)) {
+		ret = locate_skinny_bg_item(fs_info, trans, cache, path, 0, 1);
+		if (ret < 0)
+			goto fail;
+		key.offset = cache->used;
+		btrfs_set_item_key_safe(fs_info->bg_root, path, &key);
+		return 0;
+	}
 	key.objectid = cache->start;
 	key.type = BTRFS_BLOCK_GROUP_ITEM_KEY;
 	key.offset = cache->length;
@@ -2637,9 +2675,33 @@ static int read_block_group_item(struct btrfs_block_group *cache,
 				 const struct btrfs_key *key)
 {
 	struct extent_buffer *leaf = path->nodes[0];
+	struct btrfs_fs_info *fs_info = leaf->fs_info;
 	struct btrfs_block_group_item bgi;
 	int slot = path->slots[0];
 
+	if (btrfs_fs_incompat(fs_info, SKINNY_BG_TREE)) {
+		struct cache_extent *ce;
+		struct map_lookup *map;
+
+		ASSERT(key->type == BTRFS_SKINNY_BLOCK_GROUP_ITEM_KEY);
+		ce = search_cache_extent(&fs_info->mapping_tree.cache_tree,
+					 key->objectid);
+		if (!ce || ce->start != key->objectid)
+			return -ENOENT;
+		map = container_of(ce, struct map_lookup, ce);
+		cache->start = key->objectid;
+		cache->length = ce->size;
+		cache->used = key->offset;
+		cache->flags = map->type;
+		if (cache->used > cache->length) {
+			error(
+	"invalid used bytes for block group %llu, have %llu expect [0, %llu]",
+			      cache->start, cache->used, ce->size);
+			return -EUCLEAN;
+		}
+		return 0;
+	}
+
 	ASSERT(key->type == BTRFS_BLOCK_GROUP_ITEM_KEY);
 
 	cache->start = key->objectid;
@@ -2670,14 +2732,10 @@ static int read_one_block_group(struct btrfs_fs_info *fs_info,
 	int ret;
 
 	btrfs_item_key_to_cpu(leaf, &key, slot);
-	ASSERT(key.type == BTRFS_BLOCK_GROUP_ITEM_KEY);
-
-	/*
-	 * Skip 0 sized block group, don't insert them into block group cache
-	 * tree, as its length is 0, it won't get freed at close_ctree() time.
-	 */
-	if (key.offset == 0)
-		return 0;
+	ASSERT((!btrfs_fs_incompat(fs_info, SKINNY_BG_TREE) &&
+				key.type == BTRFS_BLOCK_GROUP_ITEM_KEY) ||
+	       (btrfs_fs_incompat(fs_info, SKINNY_BG_TREE) &&
+				key.type == BTRFS_SKINNY_BLOCK_GROUP_ITEM_KEY));
 
 	cache = kzalloc(sizeof(*cache), GFP_NOFS);
 	if (!cache)
@@ -2687,6 +2745,16 @@ static int read_one_block_group(struct btrfs_fs_info *fs_info,
 		free(cache);
 		return ret;
 	}
+
+	/*
+	 * Skip 0 sized block group, don't insert them into block group cache
+	 * tree, as its length is 0, it won't get freed at close_ctree() time.
+	 */
+	if (cache->length == 0) {
+		free(cache);
+		return 0;
+	}
+
 	INIT_LIST_HEAD(&cache->dirty_list);
 
 	set_avail_alloc_bits(fs_info, cache->flags);
@@ -2711,6 +2779,53 @@ static int read_one_block_group(struct btrfs_fs_info *fs_info,
 	return 0;
 }
 
+static int read_skinny_block_groups(struct btrfs_fs_info *fs_info)
+{
+	struct btrfs_root *root = fs_info->bg_root;
+	struct btrfs_path path;
+	struct btrfs_key key;
+	int ret;
+
+	key.objectid = 0;
+	key.type = 0;
+	key.offset = 0;
+	btrfs_init_path(&path);
+
+	ret = btrfs_search_slot(NULL, root, &key, &path, 0, 0);
+	if (ret < 0)
+		return ret;
+	if (ret == 0) {
+		error("found invalid key (0, 0, 0) in block group tree");
+		ret = -EUCLEAN;
+		goto out;
+	}
+	while (1) {
+		btrfs_item_key_to_cpu(path.nodes[0], &key, path.slots[0]);
+		if (key.type != BTRFS_SKINNY_BLOCK_GROUP_ITEM_KEY) {
+			error(
+		"found invalid key(%llu, %u, %llu) in block group tree",
+				key.objectid, key.type, key.offset);
+			ret = -EUCLEAN;
+			goto out;
+		}
+
+		ret = read_one_block_group(fs_info, &path);
+		if (ret < 0)
+			goto out;
+
+		ret = btrfs_next_item(root, &path);
+		if (ret < 0)
+			goto out;
+		if (ret > 0) {
+			ret = 0;
+			goto out;
+		}
+	}
+out:
+	btrfs_release_path(&path);
+	return ret;
+}
+
 int btrfs_read_block_groups(struct btrfs_fs_info *fs_info)
 {
 	struct btrfs_path path;
@@ -2718,6 +2833,9 @@ int btrfs_read_block_groups(struct btrfs_fs_info *fs_info)
 	int ret;
 	struct btrfs_key key;
 
+	if (btrfs_fs_incompat(fs_info, SKINNY_BG_TREE))
+		return read_skinny_block_groups(fs_info);
+
 	root = fs_info->extent_root;
 	key.objectid = 0;
 	key.offset = 0;
@@ -2813,6 +2931,14 @@ static int insert_block_group_item(struct btrfs_trans_handle *trans,
 	struct btrfs_root *root;
 	struct btrfs_key key;
 
+	if (btrfs_fs_incompat(fs_info, SKINNY_BG_TREE)) {
+		key.objectid = block_group->start;
+		key.type = BTRFS_SKINNY_BLOCK_GROUP_ITEM_KEY;
+		key.offset = block_group->used;
+		root = fs_info->bg_root;
+
+		return btrfs_insert_item(trans, root, &key, NULL, 0);
+	}
 	btrfs_set_stack_block_group_used(&bgi, block_group->used);
 	btrfs_set_stack_block_group_chunk_objectid(&bgi,
 				BTRFS_FIRST_CHUNK_TREE_OBJECTID);
@@ -2921,13 +3047,36 @@ static int remove_block_group_item(struct btrfs_trans_handle *trans,
 				   struct btrfs_block_group *block_group)
 {
 	struct btrfs_fs_info *fs_info = trans->fs_info;
-	struct btrfs_root *root = fs_info->extent_root;
+	struct btrfs_root *root;
 	struct btrfs_key key;
 	int ret = 0;
 
+	if (btrfs_fs_incompat(fs_info, SKINNY_BG_TREE)) {
+		key.objectid = block_group->start;
+		key.type = BTRFS_SKINNY_BLOCK_GROUP_ITEM_KEY;
+		key.offset = (u64)-1;
+		root = fs_info->bg_root;
+
+		ret = btrfs_search_slot(trans, root, &key, path, -1, 1);
+		if (ret == 0) {
+			btrfs_release_path(path);
+			ret = -EUCLEAN;
+		}
+		if (ret < 0)
+			return ret;
+
+		ret = btrfs_previous_item(root, path, key.objectid, key.type);
+		if (ret > 0)
+			ret = -ENOENT;
+		if (ret < 0)
+			return ret;
+		ret = btrfs_del_item(trans, root, path);
+		return ret;
+	}
 	key.objectid = block_group->start;
 	key.offset = block_group->length;
 	key.type = BTRFS_BLOCK_GROUP_ITEM_KEY;
+	root = fs_info->extent_root;
 
 	ret = btrfs_search_slot(trans, root, &key, path, -1, 1);
 	if (ret > 0)
-- 
2.26.2


* [PATCH v4 07/11] btrfs-progs: mkfs: Introduce -O skinny-bg-tree
  2020-05-05  0:02 [PATCH v4 00/11] btrfs-progs: Support for SKINNY_BG_TREE feature Qu Wenruo
                   ` (5 preceding siblings ...)
  2020-05-05  0:02 ` [PATCH v4 06/11] btrfs-progs: Introduce rw support for skinny_bg_tree Qu Wenruo
@ 2020-05-05  0:02 ` Qu Wenruo
  2020-05-05  0:02 ` [PATCH v4 08/11] btrfs-progs: dump-tree/dump-super: Introduce support for skinny bg tree Qu Wenruo
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 24+ messages in thread
From: Qu Wenruo @ 2020-05-05  0:02 UTC (permalink / raw)
  To: linux-btrfs

This allows mkfs.btrfs to create a btrfs with the skinny-bg-tree feature.

This patch introduces a new function, btrfs_convert_to_skinny_bg_tree()
in extent-tree.c, to do the work.

The conversion happens after the fs is created (with temp chunks cleaned
up), before we populate the root dir.

The workflow is simple:
- Create a new tree block for the bg tree
- Set the SKINNY_BG_TREE feature in the superblock
- Set the fs_info->convert_to_skinny_bg_tree flag
- Mark all block groups as dirty
- Commit the transaction
  * With fs_info->convert_to_skinny_bg_tree set, we will try to delete
    the BLOCK_GROUP_ITEM from the extent tree first, then write the new
    SKINNY_BLOCK_GROUP_ITEM into the bg tree.
    So at update_block_group_item() time, the old extent tree items get
    converted to skinny bg items (see the sketch below).
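
A condensed sketch of that convert-time path (simplified from the
update_block_group_item() hunk in the diff below; error handling
trimmed):

  if (fs_info->convert_to_skinny_bg_tree) {
          /* drop the old BLOCK_GROUP_ITEM from the extent tree, if any */
          ret = remove_block_group_item(trans, path, cache);
          /* -ENOENT is tolerated: the old item may already be gone */

          /* insert the new SKINNY_BLOCK_GROUP_ITEM into the bg tree */
          ret = insert_block_group_item(trans, cache);
  }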

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 common/fsfeatures.c |  6 +++
 ctree.h             |  2 +
 extent-tree.c       | 97 ++++++++++++++++++++++++++++++++++++++++++++-
 mkfs/common.c       |  3 +-
 mkfs/common.h       |  3 ++
 mkfs/main.c         | 11 +++++
 transaction.c       |  1 +
 7 files changed, 121 insertions(+), 2 deletions(-)

diff --git a/common/fsfeatures.c b/common/fsfeatures.c
index ac12d57b25a3..46666b34281d 100644
--- a/common/fsfeatures.c
+++ b/common/fsfeatures.c
@@ -92,6 +92,12 @@ static const struct btrfs_fs_feature {
 		NULL, 0,
 		NULL, 0,
 		"RAID1 with 3 or 4 copies" },
+	{ "skinny-bg-tree", BTRFS_FEATURE_INCOMPAT_SKINNY_BG_TREE,
+		"skinny_bg_tree",
+		VERSION_TO_STRING2(5, 9),
+		NULL, 0,
+		NULL, 0,
+		"store optimized block group items in dedicated tree" },
 	/* Keep this one last */
 	{ "list-all", BTRFS_FEATURE_LIST_ALL, NULL }
 };
diff --git a/ctree.h b/ctree.h
index 9ce73008a7e0..a84237a06609 100644
--- a/ctree.h
+++ b/ctree.h
@@ -1205,6 +1205,7 @@ struct btrfs_fs_info {
 	unsigned int avoid_sys_chunk_alloc:1;
 	unsigned int finalize_on_close:1;
 	unsigned int hide_names:1;
+	unsigned int convert_to_skinny_bg_tree:1;
 
 	int transaction_aborted;
 
@@ -2619,6 +2620,7 @@ int exclude_super_stripes(struct btrfs_fs_info *fs_info,
 u64 add_new_free_space(struct btrfs_block_group *block_group,
 		       struct btrfs_fs_info *info, u64 start, u64 end);
 u64 hash_extent_data_ref(u64 root_objectid, u64 owner, u64 offset);
+int btrfs_convert_to_skinny_bg_tree(struct btrfs_fs_info *fs_info);
 
 /* ctree.c */
 int btrfs_comp_cpu_keys(const struct btrfs_key *k1, const struct btrfs_key *k2);
diff --git a/extent-tree.c b/extent-tree.c
index 179fce4422cf..b6ac7b4caa2f 100644
--- a/extent-tree.c
+++ b/extent-tree.c
@@ -1557,6 +1557,11 @@ error:
 	return ret;
 }
 
+static int remove_block_group_item(struct btrfs_trans_handle *trans,
+				   struct btrfs_path *path,
+				   struct btrfs_block_group *block_group);
+static int insert_block_group_item(struct btrfs_trans_handle *trans,
+				   struct btrfs_block_group *block_group);
 static int update_block_group_item(struct btrfs_trans_handle *trans,
 				   struct btrfs_path *path,
 				   struct btrfs_block_group *cache)
@@ -1570,6 +1575,21 @@ static int update_block_group_item(struct btrfs_trans_handle *trans,
 	int ret;
 
 	if (btrfs_fs_incompat(fs_info, SKINNY_BG_TREE)) {
+		if (fs_info->convert_to_skinny_bg_tree) {
+			ret = remove_block_group_item(trans, path, cache);
+			btrfs_release_path(path);
+			if (ret < 0 && ret != -ENOENT)
+				goto fail;
+
+			ret = insert_block_group_item(trans, cache);
+			btrfs_release_path(path);
+			/* New one is inserted, no need to update */
+			if (ret == 0)
+				return ret;
+			if (ret < 0 && ret != -EEXIST)
+				return ret;
+			/* ret == -EEXIST case falls through */
+		}
 		ret = locate_skinny_bg_item(fs_info, trans, cache, path, 0, 1);
 		if (ret < 0)
 			goto fail;
@@ -2932,6 +2952,24 @@ static int insert_block_group_item(struct btrfs_trans_handle *trans,
 	struct btrfs_key key;
 
 	if (btrfs_fs_incompat(fs_info, SKINNY_BG_TREE)) {
+		/*
+		 * For convert case, check if there is already one skinny bg
+		 * item, to prevent duplicating items.
+		 */
+		if (fs_info->convert_to_skinny_bg_tree) {
+			struct btrfs_path path;
+			int ret;
+
+			btrfs_init_path(&path);
+			ret = locate_skinny_bg_item(fs_info, NULL, block_group,
+						    &path, 0, 0);
+			btrfs_release_path(&path);
+			if (ret == 0)
+				return -EEXIST;
+			if (ret < 0 && ret != -ENOENT)
+				return ret;
+			/* -ENOENT (no existing item) case falls through */
+		}
 		key.objectid = block_group->start;
 		key.type = BTRFS_SKINNY_BLOCK_GROUP_ITEM_KEY;
 		key.offset = block_group->used;
@@ -3051,7 +3089,8 @@ static int remove_block_group_item(struct btrfs_trans_handle *trans,
 	struct btrfs_key key;
 	int ret = 0;
 
-	if (btrfs_fs_incompat(fs_info, SKINNY_BG_TREE)) {
+	if (btrfs_fs_incompat(fs_info, SKINNY_BG_TREE) &&
+	    !fs_info->convert_to_skinny_bg_tree) {
 		key.objectid = block_group->start;
 		key.type = BTRFS_SKINNY_BLOCK_GROUP_ITEM_KEY;
 		key.offset = (u64)-1;
@@ -4033,3 +4072,59 @@ int btrfs_run_delayed_refs(struct btrfs_trans_handle *trans, unsigned long nr)
 
 	return 0;
 }
+
+int btrfs_convert_to_skinny_bg_tree(struct btrfs_fs_info *fs_info)
+{
+	struct btrfs_trans_handle *trans;
+	struct btrfs_block_group *bg;
+	struct btrfs_root *bg_root;
+	u64 features = btrfs_super_incompat_flags(fs_info->super_copy);
+	int ret;
+
+	ASSERT(fs_info->bg_root == NULL);
+
+	trans = btrfs_start_transaction(fs_info->tree_root, 1);
+	if (IS_ERR(trans)) {
+		ret = PTR_ERR(trans);
+		errno = -ret;
+		error("failed to start transaction: %m");
+		return ret;
+	}
+
+	/* Create bg tree first */
+	bg_root = btrfs_create_tree(trans, fs_info,
+				    BTRFS_BLOCK_GROUP_TREE_OBJECTID);
+	if (IS_ERR(bg_root)) {
+		ret = PTR_ERR(bg_root);
+		errno = -ret;
+		error("failed to create bg tree: %m");
+		goto error;
+	}
+	fs_info->bg_root = bg_root;
+	fs_info->bg_root->track_dirty = 1;
+	fs_info->bg_root->ref_cows = 0;
+	add_root_to_dirty_list(bg_root);
+
+	/* Set SKINNY_BG_FEATURE and convert status */
+	btrfs_set_super_incompat_flags(fs_info->super_copy,
+			features | BTRFS_FEATURE_INCOMPAT_SKINNY_BG_TREE);
+	fs_info->convert_to_skinny_bg_tree = 1;
+
+	/* Mark all bgs dirty so convert will happen at convert time */
+	for (bg = btrfs_lookup_first_block_group(fs_info, 0); bg;
+	     bg = btrfs_lookup_first_block_group(fs_info,
+		     bg->start + bg->length))
+		if (list_empty(&bg->dirty_list))
+			list_add_tail(&bg->dirty_list, &trans->dirty_bgs);
+
+	ret = btrfs_commit_transaction(trans, fs_info->tree_root);
+	if (ret < 0) {
+		errno = -ret;
+		error("failed to commit transaction: %m");
+		goto error;
+	}
+	return ret;
+error:
+	btrfs_abort_transaction(trans, ret);
+	return ret;
+}
diff --git a/mkfs/common.c b/mkfs/common.c
index 469b88d6a8d3..88df4c5ac46d 100644
--- a/mkfs/common.c
+++ b/mkfs/common.c
@@ -205,7 +205,8 @@ int make_btrfs(int fd, struct btrfs_mkfs_config *cfg)
 	btrfs_set_super_csum_type(&super, cfg->csum_type);
 	btrfs_set_super_chunk_root_generation(&super, 1);
 	btrfs_set_super_cache_generation(&super, -1);
-	btrfs_set_super_incompat_flags(&super, cfg->features);
+	btrfs_set_super_incompat_flags(&super, cfg->features &
+					~POST_MKFS_FEATURES);
 	if (cfg->label)
 		__strncpy_null(super.label, cfg->label, BTRFS_LABEL_SIZE - 1);
 
diff --git a/mkfs/common.h b/mkfs/common.h
index 426852bebf1d..217239345248 100644
--- a/mkfs/common.h
+++ b/mkfs/common.h
@@ -28,6 +28,9 @@
 #define BTRFS_MKFS_SYSTEM_GROUP_SIZE SZ_4M
 #define BTRFS_MKFS_SMALL_VOLUME_SIZE SZ_1G
 
+/* These features are handled after major mkfs work */
+#define POST_MKFS_FEATURES	(BTRFS_FEATURE_INCOMPAT_SKINNY_BG_TREE)
+
 /*
  * Tree root blocks created during mkfs
  */
diff --git a/mkfs/main.c b/mkfs/main.c
index 2c28d0b159a6..5a4c41bc9ce8 100644
--- a/mkfs/main.c
+++ b/mkfs/main.c
@@ -1332,6 +1332,17 @@ raid_groups:
 		goto out;
 	}
 
+	/* Handle post-mkfs features */
+	if (mkfs_cfg.features & POST_MKFS_FEATURES) {
+		if (mkfs_cfg.features & BTRFS_FEATURE_INCOMPAT_SKINNY_BG_TREE) {
+			ret = btrfs_convert_to_skinny_bg_tree(fs_info);
+			if (ret < 0) {
+				error("failed to convert to skinny bg tree");
+				goto out;
+			}
+		}
+	}
+
 	if (source_dir_set) {
 		ret = btrfs_mkfs_fill_dir(source_dir, root, verbose);
 		if (ret) {
diff --git a/transaction.c b/transaction.c
index 0917abcad705..4a00ff08d45a 100644
--- a/transaction.c
+++ b/transaction.c
@@ -226,6 +226,7 @@ commit_tree:
 	root->commit_root = NULL;
 	fs_info->running_transaction = NULL;
 	fs_info->last_trans_committed = transid;
+	fs_info->convert_to_skinny_bg_tree = 0;
 	list_for_each_entry(sinfo, &fs_info->space_info, list) {
 		if (sinfo->bytes_reserved) {
 			warning(
-- 
2.26.2


* [PATCH v4 08/11] btrfs-progs: dump-tree/dump-super: Introduce support for skinny bg tree
  2020-05-05  0:02 [PATCH v4 00/11] btrfs-progs: Support for SKINNY_BG_TREE feature Qu Wenruo
                   ` (6 preceding siblings ...)
  2020-05-05  0:02 ` [PATCH v4 07/11] btrfs-progs: mkfs: Introduce -O skinny-bg-tree Qu Wenruo
@ 2020-05-05  0:02 ` Qu Wenruo
  2020-05-05  0:02 ` [PATCH v4 09/11] btrfs-progs: check: Introduce support for bg-tree feature Qu Wenruo
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 24+ messages in thread
From: Qu Wenruo @ 2020-05-05  0:02 UTC (permalink / raw)
  To: linux-btrfs

Just a new tree called BLOCK_GROUP_TREE.

The new key type (SKINNY_BLOCK_GROUP_ITEM) doesn't carry any item data,
thus there is no need to add any extra output.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 cmds/inspect-dump-super.c | 1 +
 cmds/inspect-dump-tree.c  | 6 ++++++
 print-tree.c              | 4 ++++
 3 files changed, 11 insertions(+)

diff --git a/cmds/inspect-dump-super.c b/cmds/inspect-dump-super.c
index f22633b99390..bac50cb59391 100644
--- a/cmds/inspect-dump-super.c
+++ b/cmds/inspect-dump-super.c
@@ -229,6 +229,7 @@ static struct readable_flag_entry incompat_flags_array[] = {
 	DEF_INCOMPAT_FLAG_ENTRY(NO_HOLES),
 	DEF_INCOMPAT_FLAG_ENTRY(METADATA_UUID),
 	DEF_INCOMPAT_FLAG_ENTRY(RAID1C34),
+	DEF_INCOMPAT_FLAG_ENTRY(SKINNY_BG_TREE),
 };
 static const int incompat_flags_num = sizeof(incompat_flags_array) /
 				      sizeof(struct readable_flag_entry);
diff --git a/cmds/inspect-dump-tree.c b/cmds/inspect-dump-tree.c
index 1fdbb9a6b9b1..3a91fbe5ed79 100644
--- a/cmds/inspect-dump-tree.c
+++ b/cmds/inspect-dump-tree.c
@@ -151,6 +151,8 @@ static u64 treeid_from_string(const char *str, const char **end)
 		{ "CHECKSUM", BTRFS_CSUM_TREE_OBJECTID },
 		{ "QUOTA", BTRFS_QUOTA_TREE_OBJECTID },
 		{ "UUID", BTRFS_UUID_TREE_OBJECTID },
+		{ "BG", BTRFS_BLOCK_GROUP_TREE_OBJECTID},
+		{ "BLOCK_GROUP", BTRFS_BLOCK_GROUP_TREE_OBJECTID},
 		{ "FREE_SPACE", BTRFS_FREE_SPACE_TREE_OBJECTID },
 		{ "TREE_LOG_FIXUP", BTRFS_TREE_LOG_FIXUP_OBJECTID },
 		{ "TREE_LOG", BTRFS_TREE_LOG_OBJECTID },
@@ -668,6 +670,10 @@ again:
 				if (!skip)
 					printf("free space");
 				break;
+			case BTRFS_BLOCK_GROUP_TREE_OBJECTID:
+				if (!skip)
+					printf("block group");
+				break;
 			case BTRFS_MULTIPLE_OBJECTIDS:
 				if (!skip) {
 					printf("multiple");
diff --git a/print-tree.c b/print-tree.c
index 27acadb22205..ea9b35f604d6 100644
--- a/print-tree.c
+++ b/print-tree.c
@@ -683,6 +683,7 @@ void print_key_type(FILE *stream, u64 objectid, u8 type)
 		[BTRFS_EXTENT_CSUM_KEY]		= "EXTENT_CSUM",
 		[BTRFS_EXTENT_DATA_KEY]		= "EXTENT_DATA",
 		[BTRFS_BLOCK_GROUP_ITEM_KEY]	= "BLOCK_GROUP_ITEM",
+		[BTRFS_SKINNY_BLOCK_GROUP_ITEM_KEY] = "SKINNY_BLOCK_GROUP_ITEM",
 		[BTRFS_FREE_SPACE_INFO_KEY]	= "FREE_SPACE_INFO",
 		[BTRFS_FREE_SPACE_EXTENT_KEY]	= "FREE_SPACE_EXTENT",
 		[BTRFS_FREE_SPACE_BITMAP_KEY]	= "FREE_SPACE_BITMAP",
@@ -802,6 +803,9 @@ void print_objectid(FILE *stream, u64 objectid, u8 type)
 	case BTRFS_FREE_SPACE_TREE_OBJECTID:
 		fprintf(stream, "FREE_SPACE_TREE");
 		break;
+	case BTRFS_BLOCK_GROUP_TREE_OBJECTID:
+		fprintf(stream, "BLOCK_GROUP_TREE");
+		break;
 	case BTRFS_MULTIPLE_OBJECTIDS:
 		fprintf(stream, "MULTIPLE");
 		break;
-- 
2.26.2


* [PATCH v4 09/11] btrfs-progs: check: Introduce support for bg-tree feature
  2020-05-05  0:02 [PATCH v4 00/11] btrfs-progs: Support for SKINNY_BG_TREE feature Qu Wenruo
                   ` (7 preceding siblings ...)
  2020-05-05  0:02 ` [PATCH v4 08/11] btrfs-progs: dump-tree/dump-super: Introduce support for skinny bg tree Qu Wenruo
@ 2020-05-05  0:02 ` Qu Wenruo
  2020-05-05  0:02 ` [PATCH v4 10/11] btrfs-progs: btrfstune: Allow to enable bg-tree feature offline Qu Wenruo
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 24+ messages in thread
From: Qu Wenruo @ 2020-05-05  0:02 UTC (permalink / raw)
  To: linux-btrfs

Just some minor modifications.

- original mode:
  * skinny block group items can only occur in the bg tree
  * check the skinny block group item
    Introduce a new function, process_skinny_bgi(), for this check.
- lowmem mode:
  * search for skinny block group items in the bg tree if the
    SKINNY_BG_TREE feature is set.
  * check the skinny block group item
    This is done by reusing check_block_group_item().

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 check/common.h              |  4 +--
 check/main.c                | 60 ++++++++++++++++++++++++++-------
 check/mode-lowmem.c         | 66 +++++++++++++++++++++++++++++++------
 cmds/rescue-chunk-recover.c |  5 ++-
 4 files changed, 111 insertions(+), 24 deletions(-)

diff --git a/check/common.h b/check/common.h
index 62cdc1d934c7..060920f149d4 100644
--- a/check/common.h
+++ b/check/common.h
@@ -166,8 +166,8 @@ struct chunk_record *btrfs_new_chunk_record(struct extent_buffer *leaf,
 					    struct btrfs_key *key,
 					    int slot);
 struct block_group_record *
-btrfs_new_block_group_record(struct extent_buffer *leaf, struct btrfs_key *key,
-			     int slot);
+btrfs_new_block_group_record(struct extent_buffer *leaf, u64 start, u64 length,
+			     u64 flags);
 struct device_extent_record *
 btrfs_new_device_extent_record(struct extent_buffer *leaf,
 			       struct btrfs_key *key, int slot);
diff --git a/check/main.c b/check/main.c
index e7288e042dba..03b269404ba5 100644
--- a/check/main.c
+++ b/check/main.c
@@ -5210,10 +5210,9 @@ static int process_device_item(struct rb_root *dev_cache,
 }
 
 struct block_group_record *
-btrfs_new_block_group_record(struct extent_buffer *leaf, struct btrfs_key *key,
-			     int slot)
+btrfs_new_block_group_record(struct extent_buffer *leaf, u64 start, u64 length,
+			     u64 flags)
 {
-	struct btrfs_block_group_item *ptr;
 	struct block_group_record *rec;
 
 	rec = calloc(1, sizeof(*rec));
@@ -5222,17 +5221,16 @@ btrfs_new_block_group_record(struct extent_buffer *leaf, struct btrfs_key *key,
 		exit(-1);
 	}
 
-	rec->cache.start = key->objectid;
-	rec->cache.size = key->offset;
+	rec->cache.start = start;
+	rec->cache.size = length;
 
 	rec->generation = btrfs_header_generation(leaf);
 
-	rec->objectid = key->objectid;
-	rec->type = key->type;
-	rec->offset = key->offset;
+	rec->objectid = start;
+	rec->type = BTRFS_BLOCK_GROUP_ITEM_KEY;
+	rec->offset = length;
 
-	ptr = btrfs_item_ptr(leaf, slot, struct btrfs_block_group_item);
-	rec->flags = btrfs_block_group_flags(leaf, ptr);
+	rec->flags = flags;
 
 	INIT_LIST_HEAD(&rec->list);
 
@@ -5243,10 +5241,13 @@ static int process_block_group_item(struct block_group_tree *block_group_cache,
 				    struct btrfs_key *key,
 				    struct extent_buffer *eb, int slot)
 {
+	struct btrfs_block_group_item *bgi;
 	struct block_group_record *rec;
 	int ret = 0;
 
-	rec = btrfs_new_block_group_record(eb, key, slot);
+	bgi = btrfs_item_ptr(eb, slot, struct btrfs_block_group_item);
+	rec = btrfs_new_block_group_record(eb, key->objectid, key->offset,
+				btrfs_block_group_flags(eb, bgi));
 	ret = insert_block_group_record(block_group_cache, rec);
 	if (ret) {
 		fprintf(stderr, "Block Group[%llu, %llu] existed.\n",
@@ -5257,6 +5258,32 @@ static int process_block_group_item(struct block_group_tree *block_group_cache,
 	return ret;
 }
 
+static int process_skinny_bgi(struct block_group_tree *block_group_cache,
+			      struct btrfs_key *key, struct extent_buffer *eb)
+{
+	struct btrfs_mapping_tree *map_tree = &global_info->mapping_tree;
+	struct block_group_record *rec;
+	struct cache_extent *ce;
+	struct map_lookup *map;
+	int ret;
+
+	ce = search_cache_extent(&map_tree->cache_tree, key->objectid);
+	/* For a missing or mismatched chunk mapping, just skip this bgi */
+	if (!ce || ce->start != key->objectid)
+		return 0;
+
+	map = container_of(ce, struct map_lookup, ce);
+	rec = btrfs_new_block_group_record(eb, key->objectid, ce->size,
+					   map->type);
+	ret = insert_block_group_record(block_group_cache, rec);
+	if (ret) {
+		error("block group [%llu, %llu) existed.",
+			ce->start, ce->start + ce->size);
+		free(rec);
+	}
+	return ret;
+}
+
 struct device_extent_record *
 btrfs_new_device_extent_record(struct extent_buffer *leaf,
 			       struct btrfs_key *key, int slot)
@@ -6124,6 +6151,10 @@ static int check_type_with_root(u64 rootid, u8 key_type)
 		if (rootid != BTRFS_EXTENT_TREE_OBJECTID)
 			goto err;
 		break;
+	case BTRFS_SKINNY_BLOCK_GROUP_ITEM_KEY:
+		if (rootid != BTRFS_BLOCK_GROUP_TREE_OBJECTID)
+			goto err;
+		break;
 	case BTRFS_ROOT_ITEM_KEY:
 		if (rootid != BTRFS_ROOT_TREE_OBJECTID)
 			goto err;
@@ -6330,6 +6361,13 @@ static int run_next_block(struct btrfs_root *root,
 					&key, buf, i);
 				continue;
 			}
+			if (key.type == BTRFS_SKINNY_BLOCK_GROUP_ITEM_KEY) {
+				ret = process_skinny_bgi(block_group_cache,
+							 &key, buf);
+				if (ret < 0)
+					goto out;
+				continue;
+			}
 			if (key.type == BTRFS_DEV_EXTENT_KEY) {
 				process_device_extent_item(dev_extent_cache,
 					&key, buf, i);
diff --git a/check/mode-lowmem.c b/check/mode-lowmem.c
index dbb90895127d..828358d9b2c9 100644
--- a/check/mode-lowmem.c
+++ b/check/mode-lowmem.c
@@ -3553,16 +3553,39 @@ static int check_block_group_item(struct btrfs_fs_info *fs_info,
 	u32 nodesize = btrfs_super_nodesize(fs_info->super_copy);
 	u64 flags;
 	u64 bg_flags;
+	u64 bg_len;
 	u64 used;
 	u64 total = 0;
 	int ret;
 	int err = 0;
 
 	btrfs_item_key_to_cpu(eb, &bg_key, slot);
-	bi = btrfs_item_ptr(eb, slot, struct btrfs_block_group_item);
-	read_extent_buffer(eb, &bg_item, (unsigned long)bi, sizeof(bg_item));
-	used = btrfs_stack_block_group_used(&bg_item);
-	bg_flags = btrfs_stack_block_group_flags(&bg_item);
+	if (bg_key.type == BTRFS_BLOCK_GROUP_ITEM_KEY) {
+		bi = btrfs_item_ptr(eb, slot, struct btrfs_block_group_item);
+		read_extent_buffer(eb, &bg_item, (unsigned long)bi, sizeof(bg_item));
+		used = btrfs_stack_block_group_used(&bg_item);
+		bg_flags = btrfs_stack_block_group_flags(&bg_item);
+		bg_len = bg_key.offset;
+	} else {
+		struct btrfs_mapping_tree *map_tree = &fs_info->mapping_tree;
+		struct cache_extent *ce;
+		struct map_lookup *map;
+
+		ce = search_cache_extent(&map_tree->cache_tree,
+					 bg_key.objectid);
+		if (!ce || ce->start != bg_key.objectid) {
+			error(
+		"block group[%llu] did not find the related chunk item",
+				bg_key.objectid);
+			err |= REFERENCER_MISSING;
+			return err;
+		} else {
+			map = container_of(ce, struct map_lookup, ce);
+			bg_flags = map->type;
+		}
+		used = bg_key.offset;
+		bg_len = ce->size;
+	}
 
 	chunk_key.objectid = BTRFS_FIRST_CHUNK_TREE_OBJECTID;
 	chunk_key.type = BTRFS_CHUNK_ITEM_KEY;
@@ -3574,16 +3597,15 @@ static int check_block_group_item(struct btrfs_fs_info *fs_info,
 	if (ret) {
 		error(
 		"block group[%llu %llu] did not find the related chunk item",
-			bg_key.objectid, bg_key.offset);
+			bg_key.objectid, bg_len);
 		err |= REFERENCER_MISSING;
 	} else {
 		chunk = btrfs_item_ptr(path.nodes[0], path.slots[0],
 					struct btrfs_chunk);
-		if (btrfs_chunk_length(path.nodes[0], chunk) !=
-						bg_key.offset) {
+		if (btrfs_chunk_length(path.nodes[0], chunk) != bg_len) {
 			error(
 	"block group[%llu %llu] related chunk item length does not match",
-				bg_key.objectid, bg_key.offset);
+				bg_key.objectid, bg_len);
 			err |= REFERENCER_MISMATCH;
 		}
 	}
@@ -3608,7 +3630,7 @@ static int check_block_group_item(struct btrfs_fs_info *fs_info,
 			goto next;
 
 		btrfs_item_key_to_cpu(leaf, &extent_key, path.slots[0]);
-		if (extent_key.objectid >= bg_key.objectid + bg_key.offset)
+		if (extent_key.objectid >= bg_key.objectid + bg_len)
 			break;
 
 		if (extent_key.type != BTRFS_METADATA_ITEM_KEY &&
@@ -3655,7 +3677,7 @@ out:
 	if (total != used) {
 		error(
 		"block group[%llu %llu] used %llu but extent items used %llu",
-			bg_key.objectid, bg_key.offset, used, total);
+			bg_key.objectid, bg_len, used, total);
 		err |= BG_ACCOUNTING_ERROR;
 	}
 	return err;
@@ -4514,6 +4536,29 @@ static int find_block_group_item(struct btrfs_fs_info *fs_info,
 	struct btrfs_key key;
 	int ret;
 
+	if (btrfs_fs_incompat(fs_info, SKINNY_BG_TREE)) {
+		key.objectid = bytenr;
+		key.type = BTRFS_SKINNY_BLOCK_GROUP_ITEM_KEY;
+		key.offset = (u64)-1;
+
+		ret = btrfs_search_slot(NULL, fs_info->bg_root, &key, path, 0, 0);
+		if (ret < 0)
+			return ret;
+		if (ret == 0) {
+			ret = -EUCLEAN;
+			error("invalid skinny bg item found for chunk [%llu, %llu)",
+				bytenr, bytenr + len);
+			goto out;
+		}
+		ret = btrfs_previous_item(fs_info->bg_root, path, bytenr,
+					  BTRFS_SKINNY_BLOCK_GROUP_ITEM_KEY);
+		if (ret > 0) {
+			ret = -ENOENT;
+			error("can't find skinny bg item for chunk [%llu, %llu)",
+				bytenr, bytenr + len);
+		}
+		goto out;
+	}
 	key.objectid = bytenr;
 	key.type = BTRFS_BLOCK_GROUP_ITEM_KEY;
 	key.offset = len;
@@ -4722,6 +4767,7 @@ again:
 		err |= ret;
 		break;
 	case BTRFS_BLOCK_GROUP_ITEM_KEY:
+	case BTRFS_SKINNY_BLOCK_GROUP_ITEM_KEY:
 		ret = check_block_group_item(fs_info, eb, slot);
 		if (repair &&
 		    ret & REFERENCER_MISSING)
diff --git a/cmds/rescue-chunk-recover.c b/cmds/rescue-chunk-recover.c
index 8732324e7da0..6a7dba3bc8b8 100644
--- a/cmds/rescue-chunk-recover.c
+++ b/cmds/rescue-chunk-recover.c
@@ -226,12 +226,15 @@ static int process_block_group_item(struct block_group_tree *bg_cache,
 				    struct extent_buffer *leaf,
 				    struct btrfs_key *key, int slot)
 {
+	struct btrfs_block_group_item *bgi;
 	struct block_group_record *rec;
 	struct block_group_record *exist;
 	struct cache_extent *cache;
 	int ret = 0;
 
-	rec = btrfs_new_block_group_record(leaf, key, slot);
+	bgi = btrfs_item_ptr(leaf, slot, struct btrfs_block_group_item);
+	rec = btrfs_new_block_group_record(leaf, key->objectid, key->offset,
+				btrfs_block_group_flags(leaf, bgi));
 	if (!rec->cache.size)
 		goto free_out;
 again:
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v4 10/11] btrfs-progs: btrfstune: Allow to enable bg-tree feature offline
  2020-05-05  0:02 [PATCH v4 00/11] btrfs-progs: Support for SKINNY_BG_TREE feature Qu Wenruo
                   ` (8 preceding siblings ...)
  2020-05-05  0:02 ` [PATCH v4 09/11] btrfs-progs: check: Introduce support for bg-tree feature Qu Wenruo
@ 2020-05-05  0:02 ` Qu Wenruo
  2020-05-05  0:02 ` [PATCH v4 11/11] btrfs-progs: btrfstune: Allow user to rollback to regular extent tree Qu Wenruo
  2020-05-11 18:58 ` [PATCH v4 00/11] btrfs-progs: Support for SKINNY_BG_TREE feature David Sterba
  11 siblings, 0 replies; 24+ messages in thread
From: Qu Wenruo @ 2020-05-05  0:02 UTC (permalink / raw)
  To: linux-btrfs

Add a new option '-b' for btrfstune, to enable the bg-tree feature on an
unmounted fs.

This feature will convert all BLOCK_GROUP_ITEMs in the extent tree to the
new bg tree, by reusing the existing btrfs_convert_to_skinny_bg_tree()
function.
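
For example, converting an existing (unmounted) filesystem could look
like this (the device path is only illustrative):

  # btrfstune -b /dev/sdX
  # mount /dev/sdX /mnt    # needs a kernel with SKINNY_BG_TREE support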

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 Documentation/btrfstune.asciidoc |  6 ++++++
 btrfstune.c                      | 19 +++++++++++++++++--
 2 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/Documentation/btrfstune.asciidoc b/Documentation/btrfstune.asciidoc
index 1d6bc98deed8..9726cef02929 100644
--- a/Documentation/btrfstune.asciidoc
+++ b/Documentation/btrfstune.asciidoc
@@ -26,6 +26,12 @@ means.  Please refer to the 'FILESYSTEM FEATURES' in `btrfs`(5).
 OPTIONS
 -------
 
+-b::
+(since kernel: 5.9)
++
+Enable the skinny-bg-tree feature (faster mount time for large filesystems),
+the same feature that mkfs enables with '-O skinny-bg-tree'.
+
 -f::
 Allow dangerous changes, e.g. clear the seeding flag or change fsid. Make sure
 that you are aware of the dangers.
diff --git a/btrfstune.c b/btrfstune.c
index afa3aae35412..8926dd38798c 100644
--- a/btrfstune.c
+++ b/btrfstune.c
@@ -476,6 +476,8 @@ static void print_usage(void)
 	printf("\t-m          change fsid in metadata_uuid to a random UUID\n");
 	printf("\t            (incompat change, more lightweight than -u|-U)\n");
 	printf("\t-M UUID     change fsid in metadata_uuid to UUID\n");
+	printf("\t-b          enable skinny-bg-tree feature (mkfs: skinny-bg-tree)\n");
+	printf("\t            for faster mount time\n");
 	printf("  general:\n");
 	printf("\t-f          allow dangerous operations, make sure that you are aware of the dangers\n");
 	printf("\t--help      print this help\n");
@@ -485,6 +487,7 @@ int BOX_MAIN(btrfstune)(int argc, char *argv[])
 {
 	struct btrfs_root *root;
 	unsigned ctree_flags = OPEN_CTREE_WRITES;
+	bool to_skinny_bg_tree = false;
 	int success = 0;
 	int total = 0;
 	int seeding_flag = 0;
@@ -501,7 +504,8 @@ int BOX_MAIN(btrfstune)(int argc, char *argv[])
 			{ "help", no_argument, NULL, GETOPT_VAL_HELP},
 			{ NULL, 0, NULL, 0 }
 		};
-		int c = getopt_long(argc, argv, "S:rxfuU:nmM:", long_options, NULL);
+		int c = getopt_long(argc, argv, "S:rxfuU:nmM:b", long_options,
+				    NULL);
 
 		if (c < 0)
 			break;
@@ -539,6 +543,9 @@ int BOX_MAIN(btrfstune)(int argc, char *argv[])
 			ctree_flags |= OPEN_CTREE_IGNORE_FSID_MISMATCH;
 			change_metadata_uuid = 1;
 			break;
+		case 'b':
+			to_skinny_bg_tree = true;
+			break;
 		case GETOPT_VAL_HELP:
 		default:
 			print_usage();
@@ -556,7 +563,7 @@ int BOX_MAIN(btrfstune)(int argc, char *argv[])
 		return 1;
 	}
 	if (!super_flags && !seeding_flag && !(random_fsid || new_fsid_str) &&
-	    !change_metadata_uuid) {
+	    !change_metadata_uuid && !to_skinny_bg_tree) {
 		error("at least one option should be specified");
 		print_usage();
 		return 1;
@@ -602,6 +609,14 @@ int BOX_MAIN(btrfstune)(int argc, char *argv[])
 		return 1;
 	}
 
+	if (to_skinny_bg_tree) {
+		ret = btrfs_convert_to_skinny_bg_tree(root->fs_info);
+		if (ret < 0) {
+			errno = -ret;
+			error("failed to convert to bg-tree feature: %m");
+			goto out;
+		}
+	}
 	if (seeding_flag) {
 		if (btrfs_fs_incompat(root->fs_info, METADATA_UUID)) {
 			fprintf(stderr, "SEED flag cannot be changed on a metadata-uuid changed fs\n");
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v4 11/11] btrfs-progs: btrfstune: Allow user to rollback to regular extent tree
  2020-05-05  0:02 [PATCH v4 00/11] btrfs-progs: Support for SKINNY_BG_TREE feature Qu Wenruo
                   ` (9 preceding siblings ...)
  2020-05-05  0:02 ` [PATCH v4 10/11] btrfs-progs: btrfstune: Allow to enable bg-tree feature offline Qu Wenruo
@ 2020-05-05  0:02 ` Qu Wenruo
  2020-05-11 18:58 ` [PATCH v4 00/11] btrfs-progs: Support for SKINNY_BG_TREE feature David Sterba
  11 siblings, 0 replies; 24+ messages in thread
From: Qu Wenruo @ 2020-05-05  0:02 UTC (permalink / raw)
  To: linux-btrfs

Since the skinny bg tree will not be supported on older kernels, some
testers may want to roll back to the regular extent tree so that they
can use real-world data to test this feature.

So add such a rollback ability, to provide much wider test coverage
while still allowing testers to use their old data on older kernels.
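
For example, rolling back a previously converted (unmounted) filesystem
could look like this (the device path is only illustrative):

  # btrfstune -B /dev/sdX
  # mount /dev/sdX /mnt    # older kernels without SKINNY_BG_TREE can mount it again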

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 Documentation/btrfstune.asciidoc |   4 ++
 btrfstune.c                      |  21 +++++-
 ctree.h                          |   2 +
 extent-tree.c                    | 119 +++++++++++++++++++++++++++++++
 root-tree.c                      |   6 +-
 transaction.c                    |   1 +
 6 files changed, 150 insertions(+), 3 deletions(-)

diff --git a/Documentation/btrfstune.asciidoc b/Documentation/btrfstune.asciidoc
index 9726cef02929..ca76b077cba6 100644
--- a/Documentation/btrfstune.asciidoc
+++ b/Documentation/btrfstune.asciidoc
@@ -32,6 +32,10 @@ OPTIONS
 Enable the skinny-bg-tree feature (faster mount time for large filesystems),
 the same feature that mkfs enables with '-O skinny-bg-tree'.
 
+-B::
++
+Disable the skinny-bg-tree feature, converting back to the regular extent tree format.
+
 -f::
 Allow dangerous changes, e.g. clear the seeding flag or change fsid. Make sure
 that you are aware of the dangers.
diff --git a/btrfstune.c b/btrfstune.c
index 8926dd38798c..6595b0fef32e 100644
--- a/btrfstune.c
+++ b/btrfstune.c
@@ -478,6 +478,7 @@ static void print_usage(void)
 	printf("\t-M UUID     change fsid in metadata_uuid to UUID\n");
 	printf("\t-b          enable skinny-bg-tree feature (mkfs: skinny-bg-tree)\n");
 	printf("\t            for faster mount time\n");
+	printf("\t-B          disable skinny-bg-tree feature\n");
 	printf("  general:\n");
 	printf("\t-f          allow dangerous operations, make sure that you are aware of the dangers\n");
 	printf("\t--help      print this help\n");
@@ -488,6 +489,7 @@ int BOX_MAIN(btrfstune)(int argc, char *argv[])
 	struct btrfs_root *root;
 	unsigned ctree_flags = OPEN_CTREE_WRITES;
 	bool to_skinny_bg_tree = false;
+	bool to_extent_tree = false;
 	int success = 0;
 	int total = 0;
 	int seeding_flag = 0;
@@ -504,7 +506,7 @@ int BOX_MAIN(btrfstune)(int argc, char *argv[])
 			{ "help", no_argument, NULL, GETOPT_VAL_HELP},
 			{ NULL, 0, NULL, 0 }
 		};
-		int c = getopt_long(argc, argv, "S:rxfuU:nmM:b", long_options,
+		int c = getopt_long(argc, argv, "S:rxfuU:nmM:bB", long_options,
 				    NULL);
 
 		if (c < 0)
@@ -546,6 +548,9 @@ int BOX_MAIN(btrfstune)(int argc, char *argv[])
 		case 'b':
 			to_skinny_bg_tree = true;
 			break;
+		case 'B':
+			to_extent_tree = true;
+			break;
 		case GETOPT_VAL_HELP:
 		default:
 			print_usage();
@@ -563,11 +568,15 @@ int BOX_MAIN(btrfstune)(int argc, char *argv[])
 		return 1;
 	}
 	if (!super_flags && !seeding_flag && !(random_fsid || new_fsid_str) &&
-	    !change_metadata_uuid && !to_skinny_bg_tree) {
+	    !change_metadata_uuid && !to_skinny_bg_tree && !to_extent_tree) {
 		error("at least one option should be specified");
 		print_usage();
 		return 1;
 	}
+	if (to_extent_tree && to_skinny_bg_tree) {
+		error("'-b' and '-B' conflict with each other");
+		return 1;
+	}
 
 	if (new_fsid_str) {
 		uuid_t tmp;
@@ -617,6 +626,14 @@ int BOX_MAIN(btrfstune)(int argc, char *argv[])
 			goto out;
 		}
 	}
+	if (to_extent_tree) {
+		ret = btrfs_convert_to_extent_tree(root->fs_info);
+		if (ret < 0) {
+			errno = -ret;
+			error("failed to disable bg-tree feature: %m");
+			goto out;
+		}
+	}
 	if (seeding_flag) {
 		if (btrfs_fs_incompat(root->fs_info, METADATA_UUID)) {
 			fprintf(stderr, "SEED flag cannot be changed on a metadata-uuid changed fs\n");
diff --git a/ctree.h b/ctree.h
index a84237a06609..acae74a93833 100644
--- a/ctree.h
+++ b/ctree.h
@@ -1206,6 +1206,7 @@ struct btrfs_fs_info {
 	unsigned int finalize_on_close:1;
 	unsigned int hide_names:1;
 	unsigned int convert_to_skinny_bg_tree:1;
+	unsigned int convert_to_extent_tree:1;
 
 	int transaction_aborted;
 
@@ -2621,6 +2622,7 @@ u64 add_new_free_space(struct btrfs_block_group *block_group,
 		       struct btrfs_fs_info *info, u64 start, u64 end);
 u64 hash_extent_data_ref(u64 root_objectid, u64 owner, u64 offset);
 int btrfs_convert_to_skinny_bg_tree(struct btrfs_fs_info *fs_info);
+int btrfs_convert_to_extent_tree(struct btrfs_fs_info *fs_info);
 
 /* ctree.c */
 int btrfs_comp_cpu_keys(const struct btrfs_key *k1, const struct btrfs_key *k2);
diff --git a/extent-tree.c b/extent-tree.c
index b6ac7b4caa2f..df4c08ddc6f0 100644
--- a/extent-tree.c
+++ b/extent-tree.c
@@ -1597,6 +1597,14 @@ static int update_block_group_item(struct btrfs_trans_handle *trans,
 		btrfs_set_item_key_safe(fs_info->bg_root, path, &key);
 		return 0;
 	}
+	if (fs_info->convert_to_extent_tree) {
+		ret = insert_block_group_item(trans, cache);
+		if (ret == 0)
+			return ret;
+		if (ret < 0 && ret != -EEXIST)
+			goto fail;
+		/* -EEXIST case falls through */
+	}
 	key.objectid = cache->start;
 	key.type = BTRFS_BLOCK_GROUP_ITEM_KEY;
 	key.offset = cache->length;
@@ -4128,3 +4136,114 @@ error:
 	btrfs_abort_transaction(trans, ret);
 	return ret;
 }
+
+static int clear_bg_tree(struct btrfs_trans_handle *trans)
+{
+	struct btrfs_fs_info *fs_info = trans->fs_info;
+	struct btrfs_root *root = fs_info->bg_root;
+	struct btrfs_path path;
+	struct btrfs_key key;
+	int ret;
+	int nr;
+
+	btrfs_init_path(&path);
+	key.objectid = 0;
+	key.type = 0;
+	key.offset = 0;
+
+	while (1) {
+		ret = btrfs_search_slot(trans, root, &key, &path, -1, 1);
+		if (ret < 0)
+			goto out;
+		nr = btrfs_header_nritems(path.nodes[0]);
+		if (!nr)
+			break;
+		ret = btrfs_del_items(trans, root, &path, 0, nr);
+		if (ret < 0)
+			goto out;
+		btrfs_release_path(&path);
+	}
+	ret = 0;
+out:
+	btrfs_release_path(&path);
+	return ret;
+}
+
+int btrfs_convert_to_extent_tree(struct btrfs_fs_info *fs_info)
+{
+	struct btrfs_trans_handle *trans;
+	struct btrfs_root *bg_root = fs_info->bg_root;
+	struct btrfs_block_group *bg;
+	u64 features = btrfs_super_incompat_flags(fs_info->super_copy);
+	int ret;
+
+	if (bg_root == NULL) {
+		printf("The fs is not using skinny bg tree\n");
+		return 0;
+	}
+	trans = btrfs_start_transaction(fs_info->tree_root, 1);
+	if (IS_ERR(trans)) {
+		ret = PTR_ERR(trans);
+		errno = -ret;
+		error("failed to start transaction: %m");
+		return ret;
+	}
+
+	/*
+	 * Empty the bg tree but don't delete it yet, as btrfs-progs doesn't
+	 * have a good root deletion routine.
+	 */
+	ret = clear_bg_tree(trans);
+	if (ret < 0) {
+		errno = -ret;
+		error("failed to delete bg tree: %m");
+		goto error;
+	}
+	/* Clear the SKINNY_BG_TREE incompat flag and set the convert status */
+	btrfs_set_super_incompat_flags(fs_info->super_copy,
+			features & ~BTRFS_FEATURE_INCOMPAT_SKINNY_BG_TREE);
+	fs_info->convert_to_extent_tree = 1;
+
+	/* Mark all bgs dirty so the conversion happens at transaction commit */
+	for (bg = btrfs_lookup_first_block_group(fs_info, 0); bg;
+	     bg = btrfs_lookup_first_block_group(fs_info,
+		     bg->start + bg->length))
+		if (list_empty(&bg->dirty_list))
+			list_add_tail(&bg->dirty_list, &trans->dirty_bgs);
+
+	ret = btrfs_commit_transaction(trans, fs_info->tree_root);
+	if (ret < 0) {
+		errno = -ret;
+		error("failed to commit transaction: %m");
+		goto error;
+	}
+	trans = btrfs_start_transaction(fs_info->tree_root, 1);
+	if (IS_ERR(trans)) {
+		ret = PTR_ERR(trans);
+		errno = -ret;
+		error("failed to start transaction: %m");
+		return ret;
+	}
+
+	/* Now clean up the eb used by the bg tree and delete the bg root */
+	ret = btrfs_free_tree_block(trans, bg_root, bg_root->node, 0, 0);
+	if (ret < 0) {
+		errno = -ret;
+		error("failed to free bg tree root node: %m");
+		goto error;
+	}
+	free_extent_buffer(bg_root->node);
+	ret = btrfs_del_root(trans, fs_info->tree_root, &bg_root->root_key);
+	if (ret < 0) {
+		errno = -ret;
+		error("failed to delete bg root: %m");
+		goto error;
+	}
+	free(bg_root);
+	fs_info->bg_root = NULL;
+	ret = btrfs_commit_transaction(trans, fs_info->tree_root);
+	return ret;
+error:
+	btrfs_abort_transaction(trans, ret);
+	return ret;
+}
diff --git a/root-tree.c b/root-tree.c
index 6b8f8c1ce6c5..e39fe15fdd8e 100644
--- a/root-tree.c
+++ b/root-tree.c
@@ -83,7 +83,11 @@ int btrfs_update_root(struct btrfs_trans_handle *trans, struct btrfs_root
 	ret = btrfs_search_slot(trans, root, key, path, 0, 1);
 	if (ret < 0)
 		goto out;
-	BUG_ON(ret != 0);
+	/* The root has been deleted */
+	if (ret > 0) {
+		ret = 0;
+		goto out;
+	}
 	l = path->nodes[0];
 	slot = path->slots[0];
 	ptr = btrfs_item_ptr_offset(l, slot);
diff --git a/transaction.c b/transaction.c
index 4a00ff08d45a..b5f33e0900d4 100644
--- a/transaction.c
+++ b/transaction.c
@@ -227,6 +227,7 @@ commit_tree:
 	fs_info->running_transaction = NULL;
 	fs_info->last_trans_committed = transid;
 	fs_info->convert_to_skinny_bg_tree = 0;
+	fs_info->convert_to_extent_tree = 0;
 	list_for_each_entry(sinfo, &fs_info->space_info, list) {
 		if (sinfo->bytes_reserved) {
 			warning(
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 01/11] btrfs-progs: check/lowmem: Lookup block group item in a seperate function
  2020-05-05  0:02 ` [PATCH v4 01/11] btrfs-progs: check/lowmem: Lookup block group item in a seperate function Qu Wenruo
@ 2020-05-06 17:24   ` Johannes Thumshirn
  0 siblings, 0 replies; 24+ messages in thread
From: Johannes Thumshirn @ 2020-05-06 17:24 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs

Looks reasonable,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 02/11] btrfs-progs: block-group: Refactor how we read one block group item
  2020-05-05  0:02 ` [PATCH v4 02/11] btrfs-progs: block-group: Refactor how we read one block group item Qu Wenruo
@ 2020-05-06 17:27   ` Johannes Thumshirn
  2020-05-06 22:52     ` Qu Wenruo
  0 siblings, 1 reply; 24+ messages in thread
From: Johannes Thumshirn @ 2020-05-06 17:27 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs

On 05/05/2020 02:02, Qu Wenruo wrote:
> - Use btrfs_block_group::length  to replace key::offset
>    Since skinny block group item would have a different meaning for its
>    key offset.

Nope, you still use key->offset for cache->length

> +
> +	cache->start = key->objectid;
> +	cache->length = key->offset;


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 02/11] btrfs-progs: block-group: Refactor how we read one block group item
  2020-05-06 17:27   ` Johannes Thumshirn
@ 2020-05-06 22:52     ` Qu Wenruo
  2020-05-07  7:41       ` Johannes Thumshirn
  0 siblings, 1 reply; 24+ messages in thread
From: Qu Wenruo @ 2020-05-06 22:52 UTC (permalink / raw)
  To: Johannes Thumshirn, Qu Wenruo, linux-btrfs


On 2020/5/7 1:27 AM, Johannes Thumshirn wrote:
> On 05/05/2020 02:02, Qu Wenruo wrote:
>> - Use btrfs_block_group::length  to replace key::offset
>>    Since skinny block group item would have a different meaning for its
>>    key offset.
> 
> Nope, you still use key->offset for cache->length

That's no problem for the regular block group item, as in that case
key->offset is the block group length.

It looks like the sentence is not clear enough. What I mean is: after
read_block_group_item(), there shouldn't be any key->offset user;
callers should use block_group->length instead.

Thanks,
Qu

> 
>> +
>> +	cache->start = key->objectid;
>> +	cache->length = key->offset;
> 


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 02/11] btrfs-progs: block-group: Refactor how we read one block group item
  2020-05-06 22:52     ` Qu Wenruo
@ 2020-05-07  7:41       ` Johannes Thumshirn
  0 siblings, 0 replies; 24+ messages in thread
From: Johannes Thumshirn @ 2020-05-07  7:41 UTC (permalink / raw)
  To: Qu Wenruo, Qu Wenruo, linux-btrfs

On 07/05/2020 00:52, Qu Wenruo wrote:
> It looks like the sentence is not clear enough, what I mean is, after
> read_block_group_item(), there shouldn't be any key->offset user, but
> use block_group->length instead.

Ah ok, that makes more sense then.

Thanks,
	Johannes

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 03/11] btrfs-progs: Rename btrfs_remove_block_group() and free_block_group_item()
  2020-05-05  0:02 ` [PATCH v4 03/11] btrfs-progs: Rename btrfs_remove_block_group() and free_block_group_item() Qu Wenruo
@ 2020-05-07 11:05   ` Johannes Thumshirn
  0 siblings, 0 replies; 24+ messages in thread
From: Johannes Thumshirn @ 2020-05-07 11:05 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs

Looks good,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 04/11] btrfs-progs: block-group: Refactor how we insert a block group item
  2020-05-05  0:02 ` [PATCH v4 04/11] btrfs-progs: block-group: Refactor how we insert a block group item Qu Wenruo
@ 2020-05-08 14:23   ` Johannes Thumshirn
  0 siblings, 0 replies; 24+ messages in thread
From: Johannes Thumshirn @ 2020-05-08 14:23 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs

Looks good,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 05/11] btrfs-progs: block-group: Rename write_one_cahce_group()
  2020-05-05  0:02 ` [PATCH v4 05/11] btrfs-progs: block-group: Rename write_one_cahce_group() Qu Wenruo
@ 2020-05-08 14:24   ` Johannes Thumshirn
  0 siblings, 0 replies; 24+ messages in thread
From: Johannes Thumshirn @ 2020-05-08 14:24 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs

Looks good,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 00/11] btrfs-progs: Support for SKINNY_BG_TREE feature
  2020-05-05  0:02 [PATCH v4 00/11] btrfs-progs: Support for SKINNY_BG_TREE feature Qu Wenruo
                   ` (10 preceding siblings ...)
  2020-05-05  0:02 ` [PATCH v4 11/11] btrfs-progs: btrfstune: Allow user to rollback to regular extent tree Qu Wenruo
@ 2020-05-11 18:58 ` David Sterba
  2020-05-12  0:26   ` Qu Wenruo
  2020-05-12  2:30   ` Qu Wenruo
  11 siblings, 2 replies; 24+ messages in thread
From: David Sterba @ 2020-05-11 18:58 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Tue, May 05, 2020 at 08:02:19AM +0800, Qu Wenruo wrote:
> This patchset can be fetched from github:
> https://github.com/adam900710/btrfs-progs/tree/skinny_bg_tree
> Which is based on v5.6 tag, with extra cleanups (sent to mail list) applied.
> 
> This patchset provides the needed user space infrastructure for SKINNY_BG_TREE
> feature.
> 
> Since it's an new incompatible feature, unlike SKINNY_METADATA, btrfs-progs
> is needed to convert existing fs (unmounted) to new format, and
> vice-verse.
> 
> Now btrfstune can convert regular extent tree fs to bg tree fs to
> improve mount time.
> 
> For the performance improvement, please check the kernel patchset cover
> letter or the last patch.
> (SPOILER ALERT: It's super fast)
> 
> Changelog:
> v2:
> - Rebase to v5.2.2 tag
> - Add btrfstune ability to convert existing fs to BG_TREE feature
> 
> v3:
> - Fix a bug that temp chunks are not cleaned up properly
>   This is caused by wrong timing btrfs_convert_to_bg_tree() is called.
>   It should be called after temp chunks cleaned up.
> 
> - Fix a bug that an extent buffer get leaked
>   This is caused by newly created bg tree not added to dirty list.
> 
> v4:
> - Go with skinny bg tree other than regular block group item
>   We're introducing a new incompatible feature anyway, why not go
>   extreme?
> 
> - Use the same refactor as kernel.
>   To make code much cleaner and easier to read.
> 
> - Add the ability to rollback to regular extent tree.
>   So confident tester can try SKINNY_BG_TREE using their real world
>   data, and rollback if they still want to mount it using older kernels.
>
> Qu Wenruo (11):
>   btrfs-progs: check/lowmem: Lookup block group item in a seperate
>     function
>   btrfs-progs: block-group: Refactor how we read one block group item
>   btrfs-progs: Rename btrfs_remove_block_group() and
>     free_block_group_item()
>   btrfs-progs: block-group: Refactor how we insert a block group item
>   btrfs-progs: block-group: Rename write_one_cahce_group()

I'll add the above patches independently, for the rest I don't know. I
still think the separate tree is somehow wrong so have to convince
myself that it's not.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 00/11] btrfs-progs: Support for SKINNY_BG_TREE feature
  2020-05-11 18:58 ` [PATCH v4 00/11] btrfs-progs: Support for SKINNY_BG_TREE feature David Sterba
@ 2020-05-12  0:26   ` Qu Wenruo
  2020-05-12  2:30   ` Qu Wenruo
  1 sibling, 0 replies; 24+ messages in thread
From: Qu Wenruo @ 2020-05-12  0:26 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, linux-btrfs


On 2020/5/12 2:58 AM, David Sterba wrote:
> On Tue, May 05, 2020 at 08:02:19AM +0800, Qu Wenruo wrote:
>> This patchset can be fetched from github:
>> https://github.com/adam900710/btrfs-progs/tree/skinny_bg_tree
>> Which is based on v5.6 tag, with extra cleanups (sent to mail list) applied.
>>
>> This patchset provides the needed user space infrastructure for SKINNY_BG_TREE
>> feature.
>>
>> Since it's an new incompatible feature, unlike SKINNY_METADATA, btrfs-progs
>> is needed to convert existing fs (unmounted) to new format, and
>> vice-verse.
>>
>> Now btrfstune can convert regular extent tree fs to bg tree fs to
>> improve mount time.
>>
>> For the performance improvement, please check the kernel patchset cover
>> letter or the last patch.
>> (SPOILER ALERT: It's super fast)
>>
>> Changelog:
>> v2:
>> - Rebase to v5.2.2 tag
>> - Add btrfstune ability to convert existing fs to BG_TREE feature
>>
>> v3:
>> - Fix a bug that temp chunks are not cleaned up properly
>>   This is caused by wrong timing btrfs_convert_to_bg_tree() is called.
>>   It should be called after temp chunks cleaned up.
>>
>> - Fix a bug that an extent buffer get leaked
>>   This is caused by newly created bg tree not added to dirty list.
>>
>> v4:
>> - Go with skinny bg tree other than regular block group item
>>   We're introducing a new incompatible feature anyway, why not go
>>   extreme?
>>
>> - Use the same refactor as kernel.
>>   To make code much cleaner and easier to read.
>>
>> - Add the ability to rollback to regular extent tree.
>>   So confident tester can try SKINNY_BG_TREE using their real world
>>   data, and rollback if they still want to mount it using older kernels.
>>
>> Qu Wenruo (11):
>>   btrfs-progs: check/lowmem: Lookup block group item in a seperate
>>     function
>>   btrfs-progs: block-group: Refactor how we read one block group item
>>   btrfs-progs: Rename btrfs_remove_block_group() and
>>     free_block_group_item()
>>   btrfs-progs: block-group: Refactor how we insert a block group item
>>   btrfs-progs: block-group: Rename write_one_cahce_group()
> 
> I'll add the above patches independently, for the rest I don't know. I
> still think the separate tree is somehow wrong so have to convince
> myself that it's not.

No problem.

Since the refactor would be the basis for whatever final method we
choose, it should be pretty OK.
Even if we go with something like (0, NEW_BLOCK_GROUP_ITEM, bytenr) in
the extent tree, the refactor still makes a lot of sense.

BTW, if we merge the skip_bg mount option before this patchset, we could
even make the skinny_bg_tree feature RO compatible.

So it would be a good time to review that feature.

Thanks,
Qu


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 00/11] btrfs-progs: Support for SKINNY_BG_TREE feature
  2020-05-11 18:58 ` [PATCH v4 00/11] btrfs-progs: Support for SKINNY_BG_TREE feature David Sterba
  2020-05-12  0:26   ` Qu Wenruo
@ 2020-05-12  2:30   ` Qu Wenruo
  2020-05-12  8:21     ` Nikolay Borisov
  1 sibling, 1 reply; 24+ messages in thread
From: Qu Wenruo @ 2020-05-12  2:30 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, linux-btrfs


On 2020/5/12 2:58 AM, David Sterba wrote:
> On Tue, May 05, 2020 at 08:02:19AM +0800, Qu Wenruo wrote:
>> This patchset can be fetched from github:
>> https://github.com/adam900710/btrfs-progs/tree/skinny_bg_tree
>> Which is based on v5.6 tag, with extra cleanups (sent to mail list) applied.
>>
>> This patchset provides the needed user space infrastructure for SKINNY_BG_TREE
>> feature.
>>
>> Since it's an new incompatible feature, unlike SKINNY_METADATA, btrfs-progs
>> is needed to convert existing fs (unmounted) to new format, and
>> vice-verse.
>>
>> Now btrfstune can convert regular extent tree fs to bg tree fs to
>> improve mount time.
>>
>> For the performance improvement, please check the kernel patchset cover
>> letter or the last patch.
>> (SPOILER ALERT: It's super fast)
>>
>> Changelog:
>> v2:
>> - Rebase to v5.2.2 tag
>> - Add btrfstune ability to convert existing fs to BG_TREE feature
>>
>> v3:
>> - Fix a bug that temp chunks are not cleaned up properly
>>   This is caused by wrong timing btrfs_convert_to_bg_tree() is called.
>>   It should be called after temp chunks cleaned up.
>>
>> - Fix a bug that an extent buffer get leaked
>>   This is caused by newly created bg tree not added to dirty list.
>>
>> v4:
>> - Go with skinny bg tree other than regular block group item
>>   We're introducing a new incompatible feature anyway, why not go
>>   extreme?
>>
>> - Use the same refactor as kernel.
>>   To make code much cleaner and easier to read.
>>
>> - Add the ability to rollback to regular extent tree.
>>   So confident tester can try SKINNY_BG_TREE using their real world
>>   data, and rollback if they still want to mount it using older kernels.
>>
>> Qu Wenruo (11):
>>   btrfs-progs: check/lowmem: Lookup block group item in a seperate
>>     function
>>   btrfs-progs: block-group: Refactor how we read one block group item
>>   btrfs-progs: Rename btrfs_remove_block_group() and
>>     free_block_group_item()
>>   btrfs-progs: block-group: Refactor how we insert a block group item
>>   btrfs-progs: block-group: Rename write_one_cahce_group()
> 
> I'll add the above patches independently, for the rest I don't know. I
> still think the separate tree is somehow wrong so have to convince
> myself that it's not.
> 
One interesting advantage here is that a separate block group tree would
hugely reduce the chance of failing to mount due to a corrupted extent
tree.
There are already two reports of different extent tree corruption on the
mailing list in the last 24 hours.

Meanwhile the skinny bg tree hugely reduces the amount of metadata needed
for block group items, which means less surface to corrupt.

And since the block group tree has far fewer tree blocks, the COW cost is
also obviously reduced.
Once one BGI (which is just a key) gets modified, modifications to other
keys in that leaf won't lead to new COW until the next transaction.

So personally I believe it's much better than the regular extent tree.

Thanks,
Qu


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 00/11] btrfs-progs: Support for SKINNY_BG_TREE feature
  2020-05-12  2:30   ` Qu Wenruo
@ 2020-05-12  8:21     ` Nikolay Borisov
  2020-05-12  8:44       ` Qu Wenruo
  0 siblings, 1 reply; 24+ messages in thread
From: Nikolay Borisov @ 2020-05-12  8:21 UTC (permalink / raw)
  To: Qu Wenruo, dsterba, Qu Wenruo, linux-btrfs



On 12.05.20 at 5:30, Qu Wenruo wrote:
> 
> 
> On 2020/5/12 2:58 AM, David Sterba wrote:
>> On Tue, May 05, 2020 at 08:02:19AM +0800, Qu Wenruo wrote:
>>> This patchset can be fetched from github:
>>> https://github.com/adam900710/btrfs-progs/tree/skinny_bg_tree
>>> Which is based on v5.6 tag, with extra cleanups (sent to mail list) applied.
>>>
>>> This patchset provides the needed user space infrastructure for SKINNY_BG_TREE
>>> feature.
>>>
>>> Since it's an new incompatible feature, unlike SKINNY_METADATA, btrfs-progs
>>> is needed to convert existing fs (unmounted) to new format, and
>>> vice-verse.
>>>
>>> Now btrfstune can convert regular extent tree fs to bg tree fs to
>>> improve mount time.
>>>
>>> For the performance improvement, please check the kernel patchset cover
>>> letter or the last patch.
>>> (SPOILER ALERT: It's super fast)
>>>
>>> Changelog:
>>> v2:
>>> - Rebase to v5.2.2 tag
>>> - Add btrfstune ability to convert existing fs to BG_TREE feature
>>>
>>> v3:
>>> - Fix a bug that temp chunks are not cleaned up properly
>>>   This is caused by wrong timing btrfs_convert_to_bg_tree() is called.
>>>   It should be called after temp chunks cleaned up.
>>>
>>> - Fix a bug that an extent buffer get leaked
>>>   This is caused by newly created bg tree not added to dirty list.
>>>
>>> v4:
>>> - Go with skinny bg tree other than regular block group item
>>>   We're introducing a new incompatible feature anyway, why not go
>>>   extreme?
>>>
>>> - Use the same refactor as kernel.
>>>   To make code much cleaner and easier to read.
>>>
>>> - Add the ability to rollback to regular extent tree.
>>>   So confident tester can try SKINNY_BG_TREE using their real world
>>>   data, and rollback if they still want to mount it using older kernels.
>>>
>>> Qu Wenruo (11):
>>>   btrfs-progs: check/lowmem: Lookup block group item in a seperate
>>>     function
>>>   btrfs-progs: block-group: Refactor how we read one block group item
>>>   btrfs-progs: Rename btrfs_remove_block_group() and
>>>     free_block_group_item()
>>>   btrfs-progs: block-group: Refactor how we insert a block group item
>>>   btrfs-progs: block-group: Rename write_one_cahce_group()
>>
>> I'll add the above patches independently, for the rest I don't know. I
>> still think the separate tree is somehow wrong so have to convince
>> myself that it's not.
>>
> One interesting advantage here is, separate block group tree would
> hugely reduce the possibility to fail to mount due to corrupted extent tree.
> There are two reports of different corruption on extent tree already in
> the mail list in the last 24 hours.
> 
> While the skinny bg tree could hugely reduce the amount of block group
> items, which means less possibility to corrupt.
> 
> And since we have less tree blocks for block group tree, the cow cost
> would also be reduced obviously.
> As one BGI (just a key) get modified, all modification to other keys in
> that leaf won't lead to new COW until next transaction.
> 
> So personally I believe it's much better than regular extent tree.

Perhaps it would be more convincing if you could substantiate those
claims with numbers, i.e. run some benchmarks and show in which cases
the added complexity brings positives to the table.

> 
> Thanks,
> Qu
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 00/11] btrfs-progs: Support for SKINNY_BG_TREE feature
  2020-05-12  8:21     ` Nikolay Borisov
@ 2020-05-12  8:44       ` Qu Wenruo
  0 siblings, 0 replies; 24+ messages in thread
From: Qu Wenruo @ 2020-05-12  8:44 UTC (permalink / raw)
  To: Nikolay Borisov, Qu Wenruo, dsterba, linux-btrfs



On 2020/5/12 4:21 PM, Nikolay Borisov wrote:
> 
> 
> On 12.05.20 at 5:30, Qu Wenruo wrote:
>>
>>
>> On 2020/5/12 2:58 AM, David Sterba wrote:
>>> On Tue, May 05, 2020 at 08:02:19AM +0800, Qu Wenruo wrote:
>>>> This patchset can be fetched from github:
>>>> https://github.com/adam900710/btrfs-progs/tree/skinny_bg_tree
>>>> Which is based on v5.6 tag, with extra cleanups (sent to mail list) applied.
>>>>
>>>> This patchset provides the needed user space infrastructure for SKINNY_BG_TREE
>>>> feature.
>>>>
>>>> Since it's an new incompatible feature, unlike SKINNY_METADATA, btrfs-progs
>>>> is needed to convert existing fs (unmounted) to new format, and
>>>> vice-verse.
>>>>
>>>> Now btrfstune can convert regular extent tree fs to bg tree fs to
>>>> improve mount time.
>>>>
>>>> For the performance improvement, please check the kernel patchset cover
>>>> letter or the last patch.
>>>> (SPOILER ALERT: It's super fast)
>>>>
>>>> Changelog:
>>>> v2:
>>>> - Rebase to v5.2.2 tag
>>>> - Add btrfstune ability to convert existing fs to BG_TREE feature
>>>>
>>>> v3:
>>>> - Fix a bug that temp chunks are not cleaned up properly
>>>>   This is caused by wrong timing btrfs_convert_to_bg_tree() is called.
>>>>   It should be called after temp chunks cleaned up.
>>>>
>>>> - Fix a bug that an extent buffer get leaked
>>>>   This is caused by newly created bg tree not added to dirty list.
>>>>
>>>> v4:
>>>> - Go with skinny bg tree other than regular block group item
>>>>   We're introducing a new incompatible feature anyway, why not go
>>>>   extreme?
>>>>
>>>> - Use the same refactor as kernel.
>>>>   To make code much cleaner and easier to read.
>>>>
>>>> - Add the ability to rollback to regular extent tree.
>>>>   So confident tester can try SKINNY_BG_TREE using their real world
>>>>   data, and rollback if they still want to mount it using older kernels.
>>>>
>>>> Qu Wenruo (11):
>>>>   btrfs-progs: check/lowmem: Lookup block group item in a seperate
>>>>     function
>>>>   btrfs-progs: block-group: Refactor how we read one block group item
>>>>   btrfs-progs: Rename btrfs_remove_block_group() and
>>>>     free_block_group_item()
>>>>   btrfs-progs: block-group: Refactor how we insert a block group item
>>>>   btrfs-progs: block-group: Rename write_one_cahce_group()
>>>
>>> I'll add the above patches independently, for the rest I don't know. I
>>> still think the separate tree is somehow wrong so have to convince
>>> myself that it's not.
>>>
>> One interesting advantage here is, separate block group tree would
>> hugely reduce the possibility to fail to mount due to corrupted extent tree.
>> There are two reports of different corruption on extent tree already in
>> the mail list in the last 24 hours.
>>
>> While the skinny bg tree could hugely reduce the amount of block group
>> items, which means less possibility to corrupt.
>>
>> And since we have less tree blocks for block group tree, the cow cost
>> would also be reduced obviously.
>> As one BGI (just a key) get modified, all modification to other keys in
>> that leaf won't lead to new COW until next transaction.
>>
>> So personally I believe it's much better than regular extent tree.
> 
> Perhaps it will be more convincing if you could substantiate those
> claims with numbers. I.e run some benchmarks and show numbers under what
> cases the added complexity brings positives to the table.

The numbers are exactly in the patch implementing the feature:


          |  Extent tree  |  Skinny bg tree  |
----------------------------------------------
  nodes   |            55 |                1 |
  leaves  |          1025 |                7 |
  total   |          1080 |                8 |

That's a 1T used fs with 4K node size, which has at least 1024 data
block groups.

The above numbers are the tree blocks that need to be iterated to read
all block group items.
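
(As a rough back-of-envelope check, assuming the usual on-disk sizes: a
4K leaf has a ~101 byte header and even a key-only item still costs a
25 byte struct btrfs_item, so one leaf holds roughly (4096 - 101) / 25
~= 159 skinny block group keys; 1024 block groups then fit in about 7
leaves plus 1 node, matching the 8 blocks above. In the extent tree the
block group items are scattered among the extent items, so nearly every
leaf of the whole tree has to be read.)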

Thanks,
Qu
> 
>>
>> Thanks,
>> Qu
>>

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2020-05-12  8:45 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-05  0:02 [PATCH v4 00/11] btrfs-progs: Support for SKINNY_BG_TREE feature Qu Wenruo
2020-05-05  0:02 ` [PATCH v4 01/11] btrfs-progs: check/lowmem: Lookup block group item in a seperate function Qu Wenruo
2020-05-06 17:24   ` Johannes Thumshirn
2020-05-05  0:02 ` [PATCH v4 02/11] btrfs-progs: block-group: Refactor how we read one block group item Qu Wenruo
2020-05-06 17:27   ` Johannes Thumshirn
2020-05-06 22:52     ` Qu Wenruo
2020-05-07  7:41       ` Johannes Thumshirn
2020-05-05  0:02 ` [PATCH v4 03/11] btrfs-progs: Rename btrfs_remove_block_group() and free_block_group_item() Qu Wenruo
2020-05-07 11:05   ` Johannes Thumshirn
2020-05-05  0:02 ` [PATCH v4 04/11] btrfs-progs: block-group: Refactor how we insert a block group item Qu Wenruo
2020-05-08 14:23   ` Johannes Thumshirn
2020-05-05  0:02 ` [PATCH v4 05/11] btrfs-progs: block-group: Rename write_one_cahce_group() Qu Wenruo
2020-05-08 14:24   ` Johannes Thumshirn
2020-05-05  0:02 ` [PATCH v4 06/11] btrfs-progs: Introduce rw support for skinny_bg_tree Qu Wenruo
2020-05-05  0:02 ` [PATCH v4 07/11] btrfs-progs: mkfs: Introduce -O skinny-bg-tree Qu Wenruo
2020-05-05  0:02 ` [PATCH v4 08/11] btrfs-progs: dump-tree/dump-super: Introduce support for skinny bg tree Qu Wenruo
2020-05-05  0:02 ` [PATCH v4 09/11] btrfs-progs: check: Introduce support for bg-tree feature Qu Wenruo
2020-05-05  0:02 ` [PATCH v4 10/11] btrfs-progs: btrfstune: Allow to enable bg-tree feature offline Qu Wenruo
2020-05-05  0:02 ` [PATCH v4 11/11] btrfs-progs: btrfstune: Allow user to rollback to regular extent tree Qu Wenruo
2020-05-11 18:58 ` [PATCH v4 00/11] btrfs-progs: Support for SKINNY_BG_TREE feature David Sterba
2020-05-12  0:26   ` Qu Wenruo
2020-05-12  2:30   ` Qu Wenruo
2020-05-12  8:21     ` Nikolay Borisov
2020-05-12  8:44       ` Qu Wenruo
