linux-btrfs.vger.kernel.org archive mirror
* [PATCH v3 0/5] btrfs-progs: separate BLOCK_GROUP_TREE feature from extent-tree-v2
@ 2022-08-09  6:03 Qu Wenruo
  2022-08-09  6:03 ` [PATCH v3 1/5] btrfs-progs: mkfs: dynamically modify mkfs blocks array Qu Wenruo
                   ` (5 more replies)
  0 siblings, 6 replies; 16+ messages in thread
From: Qu Wenruo @ 2022-08-09  6:03 UTC
  To: linux-btrfs

[CHANGELOG]
v3:
- Add the artificial dependency for block group tree
  Now free-space-tree and no-holes must be enabled to use block group
  tree feature.
  This is for both mkfs and btrfstune.
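
The dependency rule above can be sketched as a simple mask check. This is
only an illustration: the SKETCH_* bit values and the helper name are
hypothetical and are not the flags or functions used by btrfs-progs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature bits for this sketch only, not btrfs-progs values. */
#define SKETCH_FEAT_FREE_SPACE_TREE	(1ULL << 0)
#define SKETCH_FEAT_NO_HOLES		(1ULL << 1)
#define SKETCH_FEAT_BLOCK_GROUP_TREE	(1ULL << 2)

/*
 * Return true if @features satisfies the artificial dependency:
 * block group tree requires both free space tree and no-holes.
 */
static bool sketch_bg_tree_deps_ok(uint64_t features)
{
	const uint64_t required = SKETCH_FEAT_FREE_SPACE_TREE |
				  SKETCH_FEAT_NO_HOLES;

	/* The dependency mask must never contain the feature itself. */
	assert(!(required & SKETCH_FEAT_BLOCK_GROUP_TREE));

	/* Nothing to enforce when block group tree is not requested. */
	if (!(features & SKETCH_FEAT_BLOCK_GROUP_TREE))
		return true;

	return (features & required) == required;
}
```

Both mkfs and btrfstune would run such a check before touching the fs, so
a rejected combination never reaches the disk.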

v2:
- Add the ability to convert to bg tree using btrfstune
  Unlike the original patches from years ago, which converted the whole
  fs to bg tree in one transaction, the new method uses multiple
  transactions.

  After converting every 64 block groups, we commit a transaction to
  avoid doing too much work in one transaction.

  The new conversion uses fs_info->last_convert_bg_byter to record
  which bgs have been converted.
  Any bg beyond that value goes to the new bg tree, while bgs before
  that threshold stay in the regular extent tree.

  The only concern is that the new method is pretty large for a single
  patch (+427/-27), which makes it hard to review.
  I hope to get some feedback before adding the reverse conversion
  (from bg tree back to the regular extent tree).
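
The multi-transaction conversion from the v2 changelog can be sketched
roughly as below. This is not the code from patch 4/5: the struct, the
helper names, the ">=" comparison and the high-to-low iteration order are
assumptions made for illustration. Only the batch size of 64 and the
threshold idea come from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_BGS_PER_TRANS	64	/* Batch size from the changelog. */

/* Hypothetical stand-in for the conversion state kept in fs_info. */
struct sketch_convert_state {
	uint64_t last_convert_bytenr;	/* Lowest bg start converted so far. */
	unsigned int commits;		/* Number of transactions committed. */
};

/* A bg at or beyond the threshold is looked up in the new bg tree. */
static bool sketch_bg_in_new_tree(const struct sketch_convert_state *st,
				  uint64_t bg_start)
{
	return bg_start >= st->last_convert_bytenr;
}

/*
 * Walk @bg_starts (ascending) from the highest bg down, moving the
 * threshold as we go and committing after every 64 block groups.
 * Returns the number of commits, including a final partial batch.
 */
static unsigned int sketch_convert_all(struct sketch_convert_state *st,
				       const uint64_t *bg_starts, size_t nr)
{
	unsigned int in_trans = 0;
	size_t i;

	assert(bg_starts != NULL || nr == 0);

	st->commits = 0;
	st->last_convert_bytenr = UINT64_MAX;	/* Nothing converted yet. */

	for (i = nr; i > 0; i--) {
		st->last_convert_bytenr = bg_starts[i - 1];
		if (++in_trans == SKETCH_BGS_PER_TRANS) {
			st->commits++;
			in_trans = 0;
		}
	}
	if (in_trans)
		st->commits++;
	return st->commits;
}
```

With 130 block groups this sketch commits three times (64 + 64 + 2), and
at any point in between every bg is found in exactly one of the two trees.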


The block group tree idea was introduced over seven years ago to greatly
reduce mount time for large filesystems.

Then 4 years ago, it was decided to let extent-tree-v2 implement the
feature.

However, extent tree v2 still has no consistent on-disk format, no
implementation of the real extent items, and no tests for its
independent sub-features.

I strongly doubt that was the right decision, especially considering
there is really no dependency between extent tree v2 and the block group
tree feature.

Not to mention that it goes against the common practice of incremental
improvement.

So now is the time to revive the independent block group compat RO flag.

[CHANGE FROM EXTENT-TREE-V2]
- Don't store block group root into super block
  There is no special reason for block group root to be stored in super
  block.

- Separate block-group-tree as a compat RO flag from extent-tree-v2
  The change to the extent tree does not affect read-only operations,
  so there is no reason to make it incompat.

- Fix a bug in extent-tree-v2 which doesn't initialize block group item
  correctly.
  Since we're re-using the existing block group item structure, we
  should properly initialize chunk_objectid to 256, or the tree-checker
  will reject it.

- Dynamically arrange the mkfs_block array
  Instead of a completely new array dedicated to extent-tree-v2, we now
  have proper helpers to add/delete a block from the array on-the-fly.

[TODO]
- Add btrfstune support to convert from block-group-tree feature
  The infrastructure is already done.


Qu Wenruo (5):
  btrfs-progs: mkfs: dynamically modify mkfs blocks array
  btrfs-progs: don't save block group root into super block
  btrfs-progs: separate block group tree from extent tree v2
  btrfs-progs: btrfstune: add the ability to convert to block group tree
    feature
  btrfs-progs: mkfs: add artificial dependency for block group tree

 Documentation/btrfstune.rst |   5 +
 btrfstune.c                 | 148 ++++++++++++++++++++-
 check/main.c                |   8 +-
 cmds/inspect-dump-tree.c    |  11 --
 common/fsfeatures.c         |   8 ++
 common/fsfeatures.h         |   2 +
 kernel-shared/ctree.c       |   8 ++
 kernel-shared/ctree.h       |  55 ++++----
 kernel-shared/disk-io.c     | 103 +++++----------
 kernel-shared/disk-io.h     |   5 +-
 kernel-shared/extent-tree.c | 247 ++++++++++++++++++++++++++++++++++--
 kernel-shared/print-tree.c  |  11 +-
 mkfs/common.c               | 113 ++++++++++++++---
 mkfs/common.h               |  20 +--
 mkfs/main.c                 |  10 +-
 15 files changed, 578 insertions(+), 176 deletions(-)

-- 
2.37.0



* [PATCH v3 1/5] btrfs-progs: mkfs: dynamically modify mkfs blocks array
  2022-08-09  6:03 [PATCH v3 0/5] btrfs-progs: separate BLOCK_GROUP_TREE feature from extent-tree-v2 Qu Wenruo
@ 2022-08-09  6:03 ` Qu Wenruo
  2022-08-09  6:03 ` [PATCH v3 2/5] btrfs-progs: don't save block group root into super block Qu Wenruo
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 16+ messages in thread
From: Qu Wenruo @ 2022-08-09  6:03 UTC
  To: linux-btrfs

In mkfs_btrfs(), we have a btrfs_mkfs_block array to store how many tree
blocks we need to reserve for the initial btrfs image.

Currently we have two very similar arrays, extent_tree_v1_blocks and
extent_tree_v2_blocks.

The only difference is that v2 has an extra block for the block group tree.

This patch will add two helpers, mkfs_blocks_add() and
mkfs_blocks_remove() to properly add/remove one block dynamically from
the array.

This allows 3 things:

- Merge extent_tree_v1_blocks and extent_tree_v2_blocks into one array
  The new array will be the same as extent_tree_v1_blocks.
  For extent-tree-v2, we just dynamically add MKFS_BLOCK_GROUP_TREE.

- Remove free space tree block on-demand
  This only works for the extent-tree-v1 case, as v2 has a hard
  requirement on the free space tree.
  But it still makes the code much cleaner, without any special hacks.

- Allow future expansion without introducing a new array
  I really wonder why this was not done properly in the extent-tree-v2
  preparation patches.
  We should not allow bad practice to sneak in just because it's in
  preparation patches for a larger feature.
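
As a standalone illustration of the add/remove idea, here is a minimal
sketch on a small sorted array. The enum and function names are
hypothetical and only mirror the helpers in this patch. Note that the
byte count passed to memmove() has to be scaled by the element size.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical block IDs, loosely mirroring enum btrfs_mkfs_block. */
enum sketch_block { SK_ROOT, SK_EXTENT, SK_CHUNK, SK_FST, SK_BGT, SK_COUNT };

/* Insert @to_add into the ascending, duplicate-free array @blocks. */
static void sketch_blocks_add(enum sketch_block *blocks, int *nr,
			      enum sketch_block to_add)
{
	int i;

	for (i = 0; i < *nr; i++) {
		if (blocks[i] == to_add)
			return;		/* Already present. */
		if (blocks[i] > to_add)
			break;		/* Found the insert position. */
	}
	/* The array must have room for one more element. */
	assert(*nr < SK_COUNT);
	/* Shift the tail one slot right; the size is in bytes. */
	memmove(blocks + i + 1, blocks + i, sizeof(*blocks) * (*nr - i));
	blocks[i] = to_add;
	(*nr)++;
}

/* Remove @to_remove from @blocks if present, keeping the order. */
static void sketch_blocks_remove(enum sketch_block *blocks, int *nr,
				 enum sketch_block to_remove)
{
	int i;

	for (i = 0; i < *nr; i++) {
		if (blocks[i] != to_remove)
			continue;
		/* Shift the tail one slot left; the size is in bytes. */
		memmove(blocks + i, blocks + i + 1,
			sizeof(*blocks) * (*nr - i - 1));
		(*nr)--;
		return;
	}
}
```

Starting from { SK_ROOT, SK_EXTENT, SK_CHUNK, SK_FST }, adding SK_BGT
appends it, and removing SK_FST afterwards leaves SK_BGT in the last
used slot.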

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 mkfs/common.c | 73 ++++++++++++++++++++++++++++++++++++++++++++++-----
 mkfs/common.h | 20 +++-----------
 2 files changed, 69 insertions(+), 24 deletions(-)

diff --git a/mkfs/common.c b/mkfs/common.c
index 218854491c14..d5a49ca11cde 100644
--- a/mkfs/common.c
+++ b/mkfs/common.c
@@ -260,6 +260,60 @@ next:
 	__builtin_unreachable();
 }
 
+/*
+ * Add @block into the @blocks array.
+ *
+ * The @blocks should already be in ascending order and no duplicate.
+ */
+static void mkfs_blocks_add(enum btrfs_mkfs_block *blocks, int *blocks_nr,
+			    enum btrfs_mkfs_block to_add)
+{
+	int i;
+
+	for (i = 0; i < *blocks_nr; i++) {
+		/* The target is already in the array. */
+		if (blocks[i] == to_add)
+			return;
+
+		/*
+		 * We find the first one past @to_add, move the array one slot
+		 * right, insert a new one.
+		 */
+		if (blocks[i] > to_add) {
+			memmove(blocks + i + 1, blocks + i, sizeof(*blocks) * (*blocks_nr - i));
+			blocks[i] = to_add;
+			(*blocks_nr)++;
+			return;
+		}
+		/* Current one still smaller than @to_add, go to next slot. */
+	}
+	/* All slots iterated and not match, insert into the last slot. */
+	blocks[i] = to_add;
+	(*blocks_nr)++;
+	return;
+}
+
+/*
+ * Remove @block from the @blocks array.
+ *
+ * The @blocks should already be in ascending order and no duplicate.
+ */
+static void mkfs_blocks_remove(enum btrfs_mkfs_block *blocks, int *blocks_nr,
+			       enum btrfs_mkfs_block to_remove)
+{
+	int i;
+
+	for (i = 0; i < *blocks_nr; i++) {
+		/* Found the target, move the array one slot left. */
+		if (blocks[i] == to_remove) {
+			memmove(blocks + i, blocks + i + 1, sizeof(*blocks) * (*blocks_nr - i - 1));
+			(*blocks_nr)--;
+		}
+	}
+	/* Nothing found, exit directly. */
+	return;
+}
+
 /*
  * @fs_uuid - if NULL, generates a UUID, returns back the new filesystem UUID
  *
@@ -290,12 +344,12 @@ int make_btrfs(int fd, struct btrfs_mkfs_config *cfg)
 	struct btrfs_chunk *chunk;
 	struct btrfs_dev_item *dev_item;
 	struct btrfs_dev_extent *dev_extent;
-	const enum btrfs_mkfs_block *blocks = extent_tree_v1_blocks;
+	enum btrfs_mkfs_block blocks[MKFS_BLOCK_COUNT];
 	u8 chunk_tree_uuid[BTRFS_UUID_SIZE];
 	u8 *ptr;
 	int i;
 	int ret;
-	int blocks_nr = ARRAY_SIZE(extent_tree_v1_blocks);
+	int blocks_nr;
 	int blk;
 	u32 itemoff;
 	u32 nritems = 0;
@@ -315,16 +369,21 @@ int make_btrfs(int fd, struct btrfs_mkfs_config *cfg)
 	bool extent_tree_v2 = !!(cfg->features &
 				 BTRFS_FEATURE_INCOMPAT_EXTENT_TREE_V2);
 
-	/* Don't include the free space tree in the blocks to process. */
-	if (!free_space_tree)
-		blocks_nr--;
+	memcpy(blocks, default_blocks,
+	       sizeof(enum btrfs_mkfs_block) * ARRAY_SIZE(default_blocks));
+	blocks_nr = ARRAY_SIZE(default_blocks);
 
+	/* Extent tree v2 needs an extra block for block group tree.*/
 	if (extent_tree_v2) {
-		blocks = extent_tree_v2_blocks;
-		blocks_nr = ARRAY_SIZE(extent_tree_v2_blocks);
+		mkfs_blocks_add(blocks, &blocks_nr, MKFS_BLOCK_GROUP_TREE);
 		add_block_group = false;
 	}
 
+	/* Don't include the free space tree in the blocks to process. */
+	if (!free_space_tree)
+		mkfs_blocks_remove(blocks, &blocks_nr, MKFS_FREE_SPACE_TREE);
+
+
 	if ((cfg->features & BTRFS_FEATURE_INCOMPAT_ZONED)) {
 		system_group_offset = zoned_system_group_offset(cfg->zone_size);
 		system_group_size = cfg->zone_size;
diff --git a/mkfs/common.h b/mkfs/common.h
index 3533e114e81c..47b14cdae2f3 100644
--- a/mkfs/common.h
+++ b/mkfs/common.h
@@ -52,25 +52,12 @@ enum btrfs_mkfs_block {
 	MKFS_CSUM_TREE,
 	MKFS_FREE_SPACE_TREE,
 	MKFS_BLOCK_GROUP_TREE,
-	MKFS_BLOCK_COUNT
-};
-
-static const enum btrfs_mkfs_block extent_tree_v1_blocks[] = {
-	MKFS_ROOT_TREE,
-	MKFS_EXTENT_TREE,
-	MKFS_CHUNK_TREE,
-	MKFS_DEV_TREE,
-	MKFS_FS_TREE,
-	MKFS_CSUM_TREE,
 
-	/*
-	 * Since the free space tree is optional with v1 it must always be last
-	 * in this array.
-	 */
-	MKFS_FREE_SPACE_TREE,
+	/* MKFS_BLOCK_COUNT should be the max blocks we can have at mkfs time. */
+	MKFS_BLOCK_COUNT
 };
 
-static const enum btrfs_mkfs_block extent_tree_v2_blocks[] = {
+static const enum btrfs_mkfs_block default_blocks[] = {
 	MKFS_ROOT_TREE,
 	MKFS_EXTENT_TREE,
 	MKFS_CHUNK_TREE,
@@ -78,7 +65,6 @@ static const enum btrfs_mkfs_block extent_tree_v2_blocks[] = {
 	MKFS_FS_TREE,
 	MKFS_CSUM_TREE,
 	MKFS_FREE_SPACE_TREE,
-	MKFS_BLOCK_GROUP_TREE,
 };
 
 struct btrfs_mkfs_config {
-- 
2.37.0



* [PATCH v3 2/5] btrfs-progs: don't save block group root into super block
  2022-08-09  6:03 [PATCH v3 0/5] btrfs-progs: separate BLOCK_GROUP_TREE feature from extent-tree-v2 Qu Wenruo
  2022-08-09  6:03 ` [PATCH v3 1/5] btrfs-progs: mkfs: dynamically modify mkfs blocks array Qu Wenruo
@ 2022-08-09  6:03 ` Qu Wenruo
  2022-08-09  6:03 ` [PATCH v3 3/5] btrfs-progs: separate block group tree from extent tree v2 Qu Wenruo
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 16+ messages in thread
From: Qu Wenruo @ 2022-08-09  6:03 UTC
  To: linux-btrfs

The extent tree v2 (thankfully not yet fully materialized) needs a
new root for storing all block group items.

My initial proposal years ago just added a new tree rootid and loaded it
from the tree root, just like we do for the quota/free space tree/uuid/extent
roots.

But the extent tree v2 patches introduced a completely new (and to me,
wasteful) way: storing the block group tree root in the super block.

Currently there are only 3 trees stored in super blocks, and they all
have their valid reasons:

- Chunk root
  Needed for bootstrap.

- Tree root
  Really the entrance of all trees.

- Log root
  This is special as log root has to be updated out of existing
  transaction mechanism.

There is no reason at all to put the block group root into the super block:
the block group tree is updated at the same time as the old extent tree,
so it needs no extra bootstrap or out-of-transaction update.

So just move block group root from super block into tree root.
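
To illustrate the difference, here is a toy model of looking up a root
through the tree root by objectid instead of reading a dedicated super
block field. The struct and helper are hypothetical; the objectid values
are meant to mirror the btrfs on-disk constants (extent tree 2, dev tree
4, block group tree 11), so please check them against ctree.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Believed to match the on-disk objectids; verify against ctree.h. */
#define SKETCH_EXTENT_TREE_OBJECTID		2ULL
#define SKETCH_DEV_TREE_OBJECTID		4ULL
#define SKETCH_BLOCK_GROUP_TREE_OBJECTID	11ULL

/* Hypothetical, heavily simplified root item stored in the tree root. */
struct sketch_root_item {
	uint64_t objectid;	/* Which tree this item describes. */
	uint64_t bytenr;	/* Where the root node of that tree lives. */
};

/*
 * Find the root item for @objectid among the items of the tree root.
 * Returns NULL if the tree does not exist on this filesystem.
 */
static const struct sketch_root_item *
sketch_find_root(const struct sketch_root_item *items, size_t nr,
		 uint64_t objectid)
{
	size_t i;

	assert(items != NULL || nr == 0);

	for (i = 0; i < nr; i++) {
		if (items[i].objectid == objectid)
			return &items[i];
	}
	return NULL;
}
```

With this layout the block group tree is found the same way as the dev
or uuid tree, and the super block needs no new fields.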

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 cmds/inspect-dump-tree.c   | 11 ------
 kernel-shared/ctree.h      | 26 +------------
 kernel-shared/disk-io.c    | 75 ++++++++------------------------------
 kernel-shared/print-tree.c |  6 ---
 mkfs/common.c              | 11 ++----
 5 files changed, 20 insertions(+), 109 deletions(-)

diff --git a/cmds/inspect-dump-tree.c b/cmds/inspect-dump-tree.c
index 73ffd57eb13d..6374f137f7fb 100644
--- a/cmds/inspect-dump-tree.c
+++ b/cmds/inspect-dump-tree.c
@@ -517,11 +517,6 @@ static int cmd_inspect_dump_tree(const struct cmd_struct *cmd,
 				       info->log_root_tree->node->start,
 					btrfs_header_level(
 						info->log_root_tree->node));
-			if (info->block_group_root)
-				printf("block group tree: %llu level %d\n",
-				       info->block_group_root->node->start,
-					btrfs_header_level(
-						info->block_group_root->node));
 		} else {
 			if (info->tree_root->node) {
 				printf("root tree\n");
@@ -540,12 +535,6 @@ static int cmd_inspect_dump_tree(const struct cmd_struct *cmd,
 				btrfs_print_tree(info->log_root_tree->node,
 					BTRFS_PRINT_TREE_FOLLOW | print_mode);
 			}
-
-			if (info->block_group_root) {
-				printf("block group tree\n");
-				btrfs_print_tree(info->block_group_root->node,
-					BTRFS_PRINT_TREE_FOLLOW | print_mode);
-			}
 		}
 	}
 	tree_root_scan = info->tree_root;
diff --git a/kernel-shared/ctree.h b/kernel-shared/ctree.h
index fc8b61eda829..c12076202577 100644
--- a/kernel-shared/ctree.h
+++ b/kernel-shared/ctree.h
@@ -457,13 +457,7 @@ struct btrfs_super_block {
 
 	__le64 nr_global_roots;
 
-	__le64 block_group_root;
-	__le64 block_group_root_generation;
-	u8 block_group_root_level;
-
-	/* future expansion */
-	u8 reserved8[7];
-	__le64 reserved[24];
+	__le64 reserved[27];
 	u8 sys_chunk_array[BTRFS_SYSTEM_CHUNK_ARRAY_SIZE];
 	struct btrfs_root_backup super_roots[BTRFS_NUM_BACKUP_ROOTS];
 	/* Padded to 4096 bytes */
@@ -2304,17 +2298,6 @@ BTRFS_SETGET_STACK_FUNCS(backup_bytes_used, struct btrfs_root_backup,
 BTRFS_SETGET_STACK_FUNCS(backup_num_devices, struct btrfs_root_backup,
 		   num_devices, 64);
 
-/*
- * Extent tree v2 doesn't have a global csum or extent root, so we use the
- * extent root slot for the block group root.
- */
-BTRFS_SETGET_STACK_FUNCS(backup_block_group_root, struct btrfs_root_backup,
-		   extent_root, 64);
-BTRFS_SETGET_STACK_FUNCS(backup_block_group_root_gen, struct btrfs_root_backup,
-		   extent_root_gen, 64);
-BTRFS_SETGET_STACK_FUNCS(backup_block_group_root_level, struct btrfs_root_backup,
-		   extent_root_level, 8);
-
 /* struct btrfs_super_block */
 
 BTRFS_SETGET_STACK_FUNCS(super_bytenr, struct btrfs_super_block, bytenr, 64);
@@ -2365,13 +2348,6 @@ BTRFS_SETGET_STACK_FUNCS(super_cache_generation, struct btrfs_super_block,
 BTRFS_SETGET_STACK_FUNCS(super_uuid_tree_generation, struct btrfs_super_block,
 			 uuid_tree_generation, 64);
 BTRFS_SETGET_STACK_FUNCS(super_magic, struct btrfs_super_block, magic, 64);
-BTRFS_SETGET_STACK_FUNCS(super_block_group_root, struct btrfs_super_block,
-			 block_group_root, 64);
-BTRFS_SETGET_STACK_FUNCS(super_block_group_root_generation,
-			 struct btrfs_super_block,
-			 block_group_root_generation, 64);
-BTRFS_SETGET_STACK_FUNCS(super_block_group_root_level,
-			 struct btrfs_super_block, block_group_root_level, 8);
 BTRFS_SETGET_STACK_FUNCS(super_nr_global_roots, struct btrfs_super_block,
 			 nr_global_roots, 64);
 
diff --git a/kernel-shared/disk-io.c b/kernel-shared/disk-io.c
index 26b1c9aa192a..80db5976cc3f 100644
--- a/kernel-shared/disk-io.c
+++ b/kernel-shared/disk-io.c
@@ -1209,33 +1209,9 @@ static int load_important_roots(struct btrfs_fs_info *fs_info,
 		goto tree_root;
 	}
 
-	if (backup) {
-		bytenr = btrfs_backup_block_group_root(backup);
-		gen = btrfs_backup_block_group_root_gen(backup);
-		level = btrfs_backup_block_group_root_level(backup);
-	} else {
-		bytenr = btrfs_super_block_group_root(sb);
-		gen = btrfs_super_block_group_root_generation(sb);
-		level = btrfs_super_block_group_root_level(sb);
-	}
 	root = fs_info->block_group_root;
 	btrfs_setup_root(root, fs_info, BTRFS_BLOCK_GROUP_TREE_OBJECTID);
 
-	ret = read_root_node(fs_info, root, bytenr, gen, level);
-	if (ret) {
-		fprintf(stderr, "Couldn't read block group root\n");
-		return -EIO;
-	}
-
-	if (maybe_load_block_groups(fs_info, flags)) {
-		int ret = btrfs_read_block_groups(fs_info);
-		if (ret < 0 && ret != -ENOENT) {
-			errno = -ret;
-			error("failed to read block groups: %m");
-			return ret;
-		}
-	}
-
 tree_root:
 	if (backup) {
 		bytenr = btrfs_backup_tree_root(backup);
@@ -1280,6 +1256,17 @@ int btrfs_setup_all_roots(struct btrfs_fs_info *fs_info, u64 root_tree_bytenr,
 	if (ret)
 		return ret;
 
+	if (btrfs_fs_incompat(fs_info, EXTENT_TREE_V2)) {
+		ret = find_and_setup_root(root, fs_info,
+				BTRFS_BLOCK_GROUP_TREE_OBJECTID,
+				fs_info->block_group_root);
+		if (ret) {
+			error("Couldn't load block group tree\n");
+			return -EIO;
+		}
+		fs_info->block_group_root->track_dirty = 1;
+	}
+
 	ret = find_and_setup_root(root, fs_info, BTRFS_DEV_TREE_OBJECTID,
 				  fs_info->dev_root);
 	if (ret) {
@@ -1288,6 +1275,7 @@ int btrfs_setup_all_roots(struct btrfs_fs_info *fs_info, u64 root_tree_bytenr,
 	}
 	fs_info->dev_root->track_dirty = 1;
 
+
 	ret = find_and_setup_root(root, fs_info, BTRFS_UUID_TREE_OBJECTID,
 				  fs_info->uuid_root);
 	if (ret) {
@@ -1313,8 +1301,7 @@ int btrfs_setup_all_roots(struct btrfs_fs_info *fs_info, u64 root_tree_bytenr,
 			return -EIO;
 	}
 
-	if (!btrfs_fs_incompat(fs_info, EXTENT_TREE_V2) &&
-	    maybe_load_block_groups(fs_info, flags)) {
+	if (maybe_load_block_groups(fs_info, flags)) {
 		ret = btrfs_read_block_groups(fs_info);
 		/*
 		 * If we don't find any blockgroups (ENOENT) we're either
@@ -1834,20 +1821,6 @@ int btrfs_check_super(struct btrfs_super_block *sb, unsigned sbflags)
 		goto error_out;
 	}
 
-	if (btrfs_super_incompat_flags(sb) & BTRFS_FEATURE_INCOMPAT_EXTENT_TREE_V2) {
-		if (btrfs_super_block_group_root_level(sb) >= BTRFS_MAX_LEVEL) {
-			error("block_group_root level too big: %d >= %d",
-			      btrfs_super_block_group_root_level(sb),
-			      BTRFS_MAX_LEVEL);
-			goto error_out;
-		}
-		if (!IS_ALIGNED(btrfs_super_block_group_root(sb), 4096)) {
-			error("block_group_root block unaligned: %llu",
-			      btrfs_super_block_group_root(sb));
-			goto error_out;
-		}
-	}
-
 	if (btrfs_super_incompat_flags(sb) & BTRFS_FEATURE_INCOMPAT_METADATA_UUID)
 		metadata_uuid = sb->metadata_uuid;
 	else
@@ -2165,16 +2138,9 @@ static void backup_super_roots(struct btrfs_fs_info *info)
 	btrfs_set_backup_num_devices(root_backup,
 			     btrfs_super_num_devices(info->super_copy));
 
-	if (btrfs_fs_incompat(info, EXTENT_TREE_V2)) {
-		btrfs_set_backup_block_group_root(root_backup,
-				info->block_group_root->node->start);
-		btrfs_set_backup_block_group_root_gen(root_backup,
-			btrfs_header_generation(info->block_group_root->node));
-		btrfs_set_backup_block_group_root_level(root_backup,
-			btrfs_header_level(info->block_group_root->node));
-	} else {
-		struct btrfs_root *csum_root = btrfs_csum_root(info, 0);
+	if (!btrfs_fs_incompat(info, EXTENT_TREE_V2)) {
 		struct btrfs_root *extent_root = btrfs_extent_root(info, 0);
+		struct btrfs_root *csum_root = btrfs_csum_root(info, 0);
 
 		btrfs_set_backup_csum_root(root_backup, csum_root->node->start);
 		btrfs_set_backup_csum_root_gen(root_backup,
@@ -2235,7 +2201,7 @@ int write_ctree_super(struct btrfs_trans_handle *trans)
 	struct btrfs_fs_info *fs_info = trans->fs_info;
 	struct btrfs_root *tree_root = fs_info->tree_root;
 	struct btrfs_root *chunk_root = fs_info->chunk_root;
-	struct btrfs_root *block_group_root = fs_info->block_group_root;
+
 	if (fs_info->readonly)
 		return 0;
 
@@ -2252,15 +2218,6 @@ int write_ctree_super(struct btrfs_trans_handle *trans)
 	btrfs_set_super_chunk_root_generation(fs_info->super_copy,
 				btrfs_header_generation(chunk_root->node));
 
-	if (btrfs_fs_incompat(fs_info, EXTENT_TREE_V2)) {
-		btrfs_set_super_block_group_root(fs_info->super_copy,
-						 block_group_root->node->start);
-		btrfs_set_super_block_group_root_generation(fs_info->super_copy,
-				btrfs_header_generation(block_group_root->node));
-		btrfs_set_super_block_group_root_level(fs_info->super_copy,
-				btrfs_header_level(block_group_root->node));
-	}
-
 	ret = write_all_supers(fs_info);
 	if (ret)
 		fprintf(stderr, "failed to write new super block err %d\n", ret);
diff --git a/kernel-shared/print-tree.c b/kernel-shared/print-tree.c
index a5886ff602ee..bffe30b405c7 100644
--- a/kernel-shared/print-tree.c
+++ b/kernel-shared/print-tree.c
@@ -2046,12 +2046,6 @@ void btrfs_print_superblock(struct btrfs_super_block *sb, int full)
 	       (unsigned long long)btrfs_super_cache_generation(sb));
 	printf("uuid_tree_generation\t%llu\n",
 	       (unsigned long long)btrfs_super_uuid_tree_generation(sb));
-	printf("block_group_root\t%llu\n",
-	       (unsigned long long)btrfs_super_block_group_root(sb));
-	printf("block_group_root_generation\t%llu\n",
-	       (unsigned long long)btrfs_super_block_group_root_generation(sb));
-	printf("block_group_root_level\t%llu\n",
-	       (unsigned long long)btrfs_super_block_group_root_level(sb));
 
 	uuid_unparse(sb->dev_item.uuid, buf);
 	printf("dev_item.uuid\t\t%s\n", buf);
diff --git a/mkfs/common.c b/mkfs/common.c
index d5a49ca11cde..b72338551dfb 100644
--- a/mkfs/common.c
+++ b/mkfs/common.c
@@ -98,8 +98,7 @@ static int btrfs_create_tree_root(int fd, struct btrfs_mkfs_config *cfg,
 
 	for (i = 0; i < blocks_nr; i++) {
 		blk = blocks[i];
-		if (blk == MKFS_ROOT_TREE || blk == MKFS_CHUNK_TREE ||
-		    blk == MKFS_BLOCK_GROUP_TREE)
+		if (blk == MKFS_ROOT_TREE || blk == MKFS_CHUNK_TREE)
 			continue;
 
 		btrfs_set_root_bytenr(&root_item, cfg->blocks[blk]);
@@ -440,13 +439,9 @@ int make_btrfs(int fd, struct btrfs_mkfs_config *cfg)
 		btrfs_set_super_compat_ro_flags(&super, ro_flags);
 		btrfs_set_super_cache_generation(&super, 0);
 	}
-	if (extent_tree_v2) {
+	if (extent_tree_v2)
 		btrfs_set_super_nr_global_roots(&super, 1);
-		btrfs_set_super_block_group_root(&super,
-						 cfg->blocks[MKFS_BLOCK_GROUP_TREE]);
-		btrfs_set_super_block_group_root_generation(&super, 1);
-		btrfs_set_super_block_group_root_level(&super, 0);
-	}
+
 	if (cfg->label)
 		__strncpy_null(super.label, cfg->label, BTRFS_LABEL_SIZE - 1);
 
-- 
2.37.0



* [PATCH v3 3/5] btrfs-progs: separate block group tree from extent tree v2
  2022-08-09  6:03 [PATCH v3 0/5] btrfs-progs: separate BLOCK_GROUP_TREE feature from extent-tree-v2 Qu Wenruo
  2022-08-09  6:03 ` [PATCH v3 1/5] btrfs-progs: mkfs: dynamically modify mkfs blocks array Qu Wenruo
  2022-08-09  6:03 ` [PATCH v3 2/5] btrfs-progs: don't save block group root into super block Qu Wenruo
@ 2022-08-09  6:03 ` Qu Wenruo
  2022-08-31 19:14   ` David Sterba
  2022-10-03 14:48   ` Anand Jain
  2022-08-09  6:03 ` [PATCH v3 4/5] btrfs-progs: btrfstune: add the ability to convert to block group tree feature Qu Wenruo
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 16+ messages in thread
From: Qu Wenruo @ 2022-08-09  6:03 UTC
  To: linux-btrfs

The block group tree is a completely standalone feature, and it has
been over 5 years since it was first introduced to solve the long
mount time problem.

I don't really want to waste another 5 years waiting for a feature which
may or may not work, and whose preparation patches were definitely not
properly reviewed.

So this patch will separate the block group tree feature into a
standalone compat RO feature.

There is a catch in mkfs create_block_group_tree(): the current
tree-checker only accepts block group items with a valid chunk_objectid,
but the existing code from extent-tree-v2 didn't properly initialize it.

This patch also fixes the above-mentioned problem so that the kernel can
mount the fs correctly.

Now mkfs/fsck should be able to handle the fs with block group tree.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 check/main.c               |  8 ++------
 common/fsfeatures.c        |  8 ++++++++
 common/fsfeatures.h        |  2 ++
 kernel-shared/ctree.h      |  9 ++++++++-
 kernel-shared/disk-io.c    |  4 ++--
 kernel-shared/disk-io.h    |  2 +-
 kernel-shared/print-tree.c |  5 ++---
 mkfs/common.c              | 31 ++++++++++++++++++++++++-------
 mkfs/main.c                |  3 ++-
 9 files changed, 51 insertions(+), 21 deletions(-)

diff --git a/check/main.c b/check/main.c
index 4f7ab8b29309..02abbd5289f9 100644
--- a/check/main.c
+++ b/check/main.c
@@ -6293,7 +6293,7 @@ static int check_type_with_root(u64 rootid, u8 key_type)
 			goto err;
 		break;
 	case BTRFS_BLOCK_GROUP_ITEM_KEY:
-		if (btrfs_fs_incompat(gfs_info, EXTENT_TREE_V2)) {
+		if (btrfs_fs_compat_ro(gfs_info, BLOCK_GROUP_TREE)) {
 			if (rootid != BTRFS_BLOCK_GROUP_TREE_OBJECTID)
 				goto err;
 		} else if (rootid != BTRFS_EXTENT_TREE_OBJECTID) {
@@ -9071,10 +9071,6 @@ again:
 	ret = load_super_root(&normal_trees, gfs_info->chunk_root);
 	if (ret < 0)
 		goto out;
-	ret = load_super_root(&normal_trees, gfs_info->block_group_root);
-	if (ret < 0)
-		goto out;
-
 	ret = parse_tree_roots(&normal_trees, &dropping_trees);
 	if (ret < 0)
 		goto out;
@@ -9574,7 +9570,7 @@ again:
 	 * If we are extent tree v2 then we can reint the block group root as
 	 * well.
 	 */
-	if (btrfs_fs_incompat(gfs_info, EXTENT_TREE_V2)) {
+	if (btrfs_fs_compat_ro(gfs_info, BLOCK_GROUP_TREE)) {
 		ret = btrfs_fsck_reinit_root(trans, gfs_info->block_group_root);
 		if (ret) {
 			fprintf(stderr, "block group initialization failed\n");
diff --git a/common/fsfeatures.c b/common/fsfeatures.c
index 23a92c21a2cc..90704959b13b 100644
--- a/common/fsfeatures.c
+++ b/common/fsfeatures.c
@@ -172,6 +172,14 @@ static const struct btrfs_feature runtime_features[] = {
 		VERSION_TO_STRING2(safe, 4,9),
 		VERSION_TO_STRING2(default, 5,15),
 		.desc		= "free space tree (space_cache=v2)"
+	}, {
+		.name		= "block-group-tree",
+		.flag		= BTRFS_RUNTIME_FEATURE_BLOCK_GROUP_TREE,
+		.sysfs_name = "block_group_tree",
+		VERSION_TO_STRING2(compat, 6,0),
+		VERSION_NULL(safe),
+		VERSION_NULL(default),
+		.desc		= "block group tree to reduce mount time"
 	},
 	/* Keep this one last */
 	{
diff --git a/common/fsfeatures.h b/common/fsfeatures.h
index 9e39c667b900..a8d77fd4da05 100644
--- a/common/fsfeatures.h
+++ b/common/fsfeatures.h
@@ -45,6 +45,8 @@
 
 #define BTRFS_RUNTIME_FEATURE_QUOTA		(1ULL << 0)
 #define BTRFS_RUNTIME_FEATURE_FREE_SPACE_TREE	(1ULL << 1)
+#define BTRFS_RUNTIME_FEATURE_BLOCK_GROUP_TREE	(1ULL << 2)
+
 
 void btrfs_list_all_fs_features(u64 mask_disallowed);
 void btrfs_list_all_runtime_features(u64 mask_disallowed);
diff --git a/kernel-shared/ctree.h b/kernel-shared/ctree.h
index c12076202577..d8909b3fdf20 100644
--- a/kernel-shared/ctree.h
+++ b/kernel-shared/ctree.h
@@ -479,6 +479,12 @@ BUILD_ASSERT(sizeof(struct btrfs_super_block) == BTRFS_SUPER_INFO_SIZE);
  */
 #define BTRFS_FEATURE_COMPAT_RO_FREE_SPACE_TREE_VALID	(1ULL << 1)
 
+/*
+ * Save all block group items into a dedicated block group tree, to greatly
+ * reduce mount time for large fs.
+ */
+#define BTRFS_FEATURE_COMPAT_RO_BLOCK_GROUP_TREE	(1ULL << 5)
+
 #define BTRFS_FEATURE_INCOMPAT_MIXED_BACKREF	(1ULL << 0)
 #define BTRFS_FEATURE_INCOMPAT_DEFAULT_SUBVOL	(1ULL << 1)
 #define BTRFS_FEATURE_INCOMPAT_MIXED_GROUPS	(1ULL << 2)
@@ -508,7 +514,8 @@ BUILD_ASSERT(sizeof(struct btrfs_super_block) == BTRFS_SUPER_INFO_SIZE);
  */
 #define BTRFS_FEATURE_COMPAT_RO_SUPP			\
 	(BTRFS_FEATURE_COMPAT_RO_FREE_SPACE_TREE |	\
-	 BTRFS_FEATURE_COMPAT_RO_FREE_SPACE_TREE_VALID)
+	 BTRFS_FEATURE_COMPAT_RO_FREE_SPACE_TREE_VALID| \
+	 BTRFS_FEATURE_COMPAT_RO_BLOCK_GROUP_TREE)
 
 #if EXPERIMENTAL
 #define BTRFS_FEATURE_INCOMPAT_SUPP			\
diff --git a/kernel-shared/disk-io.c b/kernel-shared/disk-io.c
index 80db5976cc3f..6eeb5ecd1d59 100644
--- a/kernel-shared/disk-io.c
+++ b/kernel-shared/disk-io.c
@@ -1203,7 +1203,7 @@ static int load_important_roots(struct btrfs_fs_info *fs_info,
 		backup = sb->super_roots + index;
 	}
 
-	if (!btrfs_fs_incompat(fs_info, EXTENT_TREE_V2)) {
+	if (!btrfs_fs_compat_ro(fs_info, BLOCK_GROUP_TREE)) {
 		free(fs_info->block_group_root);
 		fs_info->block_group_root = NULL;
 		goto tree_root;
@@ -1256,7 +1256,7 @@ int btrfs_setup_all_roots(struct btrfs_fs_info *fs_info, u64 root_tree_bytenr,
 	if (ret)
 		return ret;
 
-	if (btrfs_fs_incompat(fs_info, EXTENT_TREE_V2)) {
+	if (btrfs_fs_compat_ro(fs_info, BLOCK_GROUP_TREE)) {
 		ret = find_and_setup_root(root, fs_info,
 				BTRFS_BLOCK_GROUP_TREE_OBJECTID,
 				fs_info->block_group_root);
diff --git a/kernel-shared/disk-io.h b/kernel-shared/disk-io.h
index bba97fc1a814..6c8eaa2bd13d 100644
--- a/kernel-shared/disk-io.h
+++ b/kernel-shared/disk-io.h
@@ -232,7 +232,7 @@ int btrfs_global_root_insert(struct btrfs_fs_info *fs_info,
 static inline struct btrfs_root *btrfs_block_group_root(
 						struct btrfs_fs_info *fs_info)
 {
-	if (btrfs_fs_incompat(fs_info, EXTENT_TREE_V2))
+	if (btrfs_fs_compat_ro(fs_info, BLOCK_GROUP_TREE))
 		return fs_info->block_group_root;
 	return btrfs_extent_root(fs_info, 0);
 }
diff --git a/kernel-shared/print-tree.c b/kernel-shared/print-tree.c
index bffe30b405c7..b2ee77c2fb73 100644
--- a/kernel-shared/print-tree.c
+++ b/kernel-shared/print-tree.c
@@ -1668,6 +1668,7 @@ struct readable_flag_entry {
 static struct readable_flag_entry compat_ro_flags_array[] = {
 	DEF_COMPAT_RO_FLAG_ENTRY(FREE_SPACE_TREE),
 	DEF_COMPAT_RO_FLAG_ENTRY(FREE_SPACE_TREE_VALID),
+	DEF_COMPAT_RO_FLAG_ENTRY(BLOCK_GROUP_TREE),
 };
 static const int compat_ro_flags_num = sizeof(compat_ro_flags_array) /
 				       sizeof(struct readable_flag_entry);
@@ -1754,9 +1755,7 @@ static void print_readable_compat_ro_flag(u64 flag)
 	 */
 	return __print_readable_flag(flag, compat_ro_flags_array,
 				     compat_ro_flags_num,
-				     BTRFS_FEATURE_COMPAT_RO_SUPP |
-				     BTRFS_FEATURE_COMPAT_RO_FREE_SPACE_TREE |
-				     BTRFS_FEATURE_COMPAT_RO_FREE_SPACE_TREE_VALID);
+				     BTRFS_FEATURE_COMPAT_RO_SUPP);
 }
 
 static void print_readable_incompat_flag(u64 flag)
diff --git a/mkfs/common.c b/mkfs/common.c
index b72338551dfb..cb616f13ef9b 100644
--- a/mkfs/common.c
+++ b/mkfs/common.c
@@ -75,6 +75,8 @@ static int btrfs_create_tree_root(int fd, struct btrfs_mkfs_config *cfg,
 	int blk;
 	int i;
 	u8 uuid[BTRFS_UUID_SIZE];
+	bool block_group_tree = !!(cfg->runtime_features &
+				   BTRFS_RUNTIME_FEATURE_BLOCK_GROUP_TREE);
 
 	memset(buf->data + sizeof(struct btrfs_header), 0,
 		cfg->nodesize - sizeof(struct btrfs_header));
@@ -101,6 +103,9 @@ static int btrfs_create_tree_root(int fd, struct btrfs_mkfs_config *cfg,
 		if (blk == MKFS_ROOT_TREE || blk == MKFS_CHUNK_TREE)
 			continue;
 
+		if (!block_group_tree && blk == MKFS_BLOCK_GROUP_TREE)
+			continue;
+
 		btrfs_set_root_bytenr(&root_item, cfg->blocks[blk]);
 		btrfs_set_disk_key_objectid(&disk_key,
 			reference_root_table[blk]);
@@ -216,7 +221,8 @@ static int create_block_group_tree(int fd, struct btrfs_mkfs_config *cfg,
 
 	memset(buf->data + sizeof(struct btrfs_header), 0,
 		cfg->nodesize - sizeof(struct btrfs_header));
-	write_block_group_item(buf, 0, bg_offset, bg_size, bg_used, 0,
+	write_block_group_item(buf, 0, bg_offset, bg_size, bg_used,
+			       BTRFS_FIRST_CHUNK_TREE_OBJECTID,
 			       cfg->leaf_data_size -
 			       sizeof(struct btrfs_block_group_item));
 	btrfs_set_header_bytenr(buf, cfg->blocks[MKFS_BLOCK_GROUP_TREE]);
@@ -357,6 +363,7 @@ int make_btrfs(int fd, struct btrfs_mkfs_config *cfg)
 	u32 array_size;
 	u32 item_size;
 	u64 total_used = 0;
+	u64 ro_flags = 0;
 	int skinny_metadata = !!(cfg->features &
 				 BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA);
 	u64 num_bytes;
@@ -365,6 +372,8 @@ int make_btrfs(int fd, struct btrfs_mkfs_config *cfg)
 	bool add_block_group = true;
 	bool free_space_tree = !!(cfg->runtime_features &
 				  BTRFS_RUNTIME_FEATURE_FREE_SPACE_TREE);
+	bool block_group_tree = !!(cfg->runtime_features &
+				   BTRFS_RUNTIME_FEATURE_BLOCK_GROUP_TREE);
 	bool extent_tree_v2 = !!(cfg->features &
 				 BTRFS_FEATURE_INCOMPAT_EXTENT_TREE_V2);
 
@@ -372,8 +381,13 @@ int make_btrfs(int fd, struct btrfs_mkfs_config *cfg)
 	       sizeof(enum btrfs_mkfs_block) * ARRAY_SIZE(default_blocks));
 	blocks_nr = ARRAY_SIZE(default_blocks);
 
-	/* Extent tree v2 needs an extra block for block group tree.*/
-	if (extent_tree_v2) {
+	/*
+	 * Add one new block for block group tree.
+	 * And for block group tree, we don't need to add block group item
+	 * into extent tree, the item will be handled in block group tree
+	 * initialization.
+	 */
+	if (block_group_tree) {
 		mkfs_blocks_add(blocks, &blocks_nr, MKFS_BLOCK_GROUP_TREE);
 		add_block_group = false;
 	}
@@ -433,12 +447,15 @@ int make_btrfs(int fd, struct btrfs_mkfs_config *cfg)
 		btrfs_set_super_cache_generation(&super, -1);
 	btrfs_set_super_incompat_flags(&super, cfg->features);
 	if (free_space_tree) {
-		u64 ro_flags = BTRFS_FEATURE_COMPAT_RO_FREE_SPACE_TREE |
-			BTRFS_FEATURE_COMPAT_RO_FREE_SPACE_TREE_VALID;
+		ro_flags |= (BTRFS_FEATURE_COMPAT_RO_FREE_SPACE_TREE |
+			     BTRFS_FEATURE_COMPAT_RO_FREE_SPACE_TREE_VALID);
 
-		btrfs_set_super_compat_ro_flags(&super, ro_flags);
 		btrfs_set_super_cache_generation(&super, 0);
 	}
+	if (block_group_tree)
+		ro_flags |= BTRFS_FEATURE_COMPAT_RO_BLOCK_GROUP_TREE;
+	btrfs_set_super_compat_ro_flags(&super, ro_flags);
+
 	if (extent_tree_v2)
 		btrfs_set_super_nr_global_roots(&super, 1);
 
@@ -695,7 +712,7 @@ int make_btrfs(int fd, struct btrfs_mkfs_config *cfg)
 			goto out;
 	}
 
-	if (extent_tree_v2) {
+	if (block_group_tree) {
 		ret = create_block_group_tree(fd, cfg, buf,
 					      system_group_offset,
 					      system_group_size, total_used);
diff --git a/mkfs/main.c b/mkfs/main.c
index ce096d362171..518ce0fd7523 100644
--- a/mkfs/main.c
+++ b/mkfs/main.c
@@ -299,7 +299,8 @@ static int recow_roots(struct btrfs_trans_handle *trans,
 	ret = __recow_root(trans, info->dev_root);
 	if (ret)
 		return ret;
-        if (btrfs_fs_incompat(info, EXTENT_TREE_V2)) {
+
+	if (btrfs_fs_compat_ro(info, BLOCK_GROUP_TREE)) {
 		ret = __recow_root(trans, info->block_group_root);
 		if (ret)
 			return ret;
-- 
2.37.0



* [PATCH v3 4/5] btrfs-progs: btrfstune: add the ability to convert to block group tree feature
  2022-08-09  6:03 [PATCH v3 0/5] btrfs-progs: separate BLOCK_GROUP_TREE feature from extent-tree-v2 Qu Wenruo
                   ` (2 preceding siblings ...)
  2022-08-09  6:03 ` [PATCH v3 3/5] btrfs-progs: separate block group tree from extent tree v2 Qu Wenruo
@ 2022-08-09  6:03 ` Qu Wenruo
  2022-08-09  6:03 ` [PATCH v3 5/5] btrfs-progs: mkfs: add artificial dependency for block group tree Qu Wenruo
  2022-08-31 18:26 ` [PATCH v3 0/5] btrfs-progs: separate BLOCK_GROUP_TREE feature from extent-tree-v2 David Sterba
  5 siblings, 0 replies; 16+ messages in thread
From: Qu Wenruo @ 2022-08-09  6:03 UTC (permalink / raw)
  To: linux-btrfs

The new '-b' option converts an existing filesystem to the block group
tree compat ro feature.

The conversion workflow looks like this:

- Set the CHANGING_BG_TREE flag
  And initialize fs_info->last_converted_bg_bytenr to (u64)-1.

  Any bg with bytenr >= last_converted_bg_bytenr will have its bg item
  updates go to the new root (bg tree).

- Iterate each block group by their bytenr in descending order
  This involves:
  * Delete the old bg item from the old tree (extent tree)
  * Update last_converted_bg_bytenr to the bytenr of the bg
  * Add the new bg item into the new tree (bg tree)
  * If we have converted a bunch of bgs, commit current transaction

- Clear the CHANGING_BG_TREE flag
  And set the new BLOCK_GROUP_TREE compat ro flag and commit.
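
The workflow above can be sketched as a small self-contained C model.
This is not code from the patch: struct bg_model, bg_goes_to_new_tree()
and convert_all() are hypothetical names, and the block groups are
assumed to sit in an array sorted by ascending bytenr. Only the routing
rule (bytenr >= last_converted_bg_bytenr) and the batch size of 64 are
taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BG_BATCH 64 /* mirrors BLOCK_GROUP_BATCH in the patch */

/* Hypothetical model of where a block group item currently lives. */
enum bg_home { BG_IN_EXTENT_TREE, BG_IN_BG_TREE };

struct bg_model {
	uint64_t bytenr;
	enum bg_home home;
};

/* Routing rule: a bg at or beyond the marker uses the new tree. */
static bool bg_goes_to_new_tree(uint64_t bytenr, uint64_t last_converted)
{
	return bytenr >= last_converted;
}

/*
 * Walk bgs[] (sorted ascending, all bytenr < UINT64_MAX) from the highest
 * bytenr down to the lowest, moving each item to the new tree.
 * Returns the number of intermediate batch commits (the final commit is
 * not counted).
 */
static unsigned int convert_all(struct bg_model *bgs, size_t nr,
				uint64_t *last_converted)
{
	unsigned int commits = 0;
	size_t done = 0;

	*last_converted = UINT64_MAX; /* (u64)-1: nothing converted yet */
	for (size_t i = nr; i-- > 0; ) {
		/* Marker not moved yet: the delete hits the old tree. */
		assert(!bg_goes_to_new_tree(bgs[i].bytenr, *last_converted));
		*last_converted = bgs[i].bytenr;
		/* Marker moved: the insert hits the new tree. */
		assert(bg_goes_to_new_tree(bgs[i].bytenr, *last_converted));
		bgs[i].home = BG_IN_BG_TREE;
		if (++done % BG_BATCH == 0)
			commits++;
	}
	return commits;
}
```

With 130 block groups the model commits after the 64th and the 128th
conversion. In the patch itself the final step additionally clears the
CHANGING_BG_TREE flag, sets the compat ro flag and commits once more; the
model leaves that part out.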

Since the conversion spans multiple transactions, we also need to be
able to resume an interrupted conversion.

In that case, we just grab the last unconverted bg, and start from it.
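
Picking the resume point can be illustrated with the sketch below. It
is only a model under stated assumptions: resume_index() is a
hypothetical helper, and the block group start offsets are assumed to
be available as an ascending array, whereas the patch walks the chunk
mapping cache with search_cache_extent() and prev_cache_extent().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * starts[] holds block group start bytenrs in ascending order.
 * last_converted is the smallest bytenr already present in the bg tree,
 * or UINT64_MAX when the bg tree is still empty.
 *
 * Returns the index of the next block group to convert, -1 when no
 * unconverted block group is left, or -2 when the marker matches no
 * block group at all.
 */
static long resume_index(const uint64_t *starts, size_t nr,
			 uint64_t last_converted)
{
	assert(starts != NULL || nr == 0);

	/* Nothing converted yet: start from the highest block group. */
	if (last_converted == UINT64_MAX)
		return nr ? (long)nr - 1 : -1;

	for (size_t i = 0; i < nr; i++) {
		if (starts[i] == last_converted)
			return (long)i - 1; /* -1 when i == 0 */
	}
	return -2;
}
```

Note that the patch reports -ENOENT when no block group precedes the
marker; the sketch returns -1 for that case only to keep the two
outcomes distinguishable.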

To match the new kernel requirement for both the no-holes and
free-space-tree features, the convert tool checks for the
free-space-tree feature. If it is not enabled, the tool errors out with
a message explaining how to continue (by mounting with
"-o space_cache=v2").

A missing no-holes feature only needs its flag set during the
conversion.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 Documentation/btrfstune.rst |   5 +
 btrfstune.c                 | 148 ++++++++++++++++++++-
 kernel-shared/ctree.c       |   8 ++
 kernel-shared/ctree.h       |  20 +++
 kernel-shared/disk-io.c     |  32 +++--
 kernel-shared/disk-io.h     |   3 +
 kernel-shared/extent-tree.c | 247 ++++++++++++++++++++++++++++++++++--
 7 files changed, 436 insertions(+), 27 deletions(-)

diff --git a/Documentation/btrfstune.rst b/Documentation/btrfstune.rst
index 47caccc647b2..01c59d6dbf3b 100644
--- a/Documentation/btrfstune.rst
+++ b/Documentation/btrfstune.rst
@@ -24,6 +24,11 @@ means.  Please refer to the *FILESYSTEM FEATURES* in ``btrfs(5)``.
 OPTIONS
 -------
 
+-b
+        (since kernel 6.0)
+        Enable the block group tree feature (greatly reduces mount time),
+        enabled by mkfs feature *block-group-tree*.
+
 -f
         Allow dangerous changes, e.g. clear the seeding flag or change fsid.
         Make sure that you are aware of the dangers.
diff --git a/btrfstune.c b/btrfstune.c
index d1a1877ee45c..add7b1804400 100644
--- a/btrfstune.c
+++ b/btrfstune.c
@@ -775,12 +775,134 @@ out:
 	return ret;
 }
 
+/* After this many block groups we need to commit transaction. */
+#define BLOCK_GROUP_BATCH	64
+
+static int convert_to_bg_tree(struct btrfs_fs_info *fs_info)
+{
+	struct btrfs_super_block *sb = fs_info->super_copy;
+	struct btrfs_trans_handle *trans;
+	struct cache_extent *ce;
+	int converted_bgs = 0;
+	int ret;
+
+	trans = btrfs_start_transaction(fs_info->tree_root, 2);
+	if (IS_ERR(trans)) {
+		ret = PTR_ERR(trans);
+		error("failed to start transaction: %d", ret);
+		return ret;
+	}
+
+	/* Set NO_HOLES feature */
+	btrfs_set_super_incompat_flags(sb, btrfs_super_incompat_flags(sb) |
+				       BTRFS_FEATURE_INCOMPAT_NO_HOLES);
+
+	/* We're resuming from previous run. */
+	if (btrfs_super_flags(sb) & BTRFS_SUPER_FLAG_CHANGING_BG_TREE)
+		goto iterate_bgs;
+
+	ret = btrfs_create_root(trans, fs_info,
+				BTRFS_BLOCK_GROUP_TREE_OBJECTID);
+	if (ret < 0) {
+		error("failed to create block group root: %d", ret);
+		goto error;
+	}
+	btrfs_set_super_flags(sb,
+			btrfs_super_flags(sb) |
+			BTRFS_SUPER_FLAG_CHANGING_BG_TREE);
+	fs_info->last_converted_bg_bytenr = (u64)-1;
+
+	/* Now commit the transaction to make above changes to reach disks. */
+	ret = btrfs_commit_transaction(trans, fs_info->tree_root);
+	if (ret < 0) {
+		error("failed to commit transaction for the new bg root: %d",
+		      ret);
+		goto error;
+	}
+	trans = btrfs_start_transaction(fs_info->tree_root, 2);
+	if (IS_ERR(trans)) {
+		ret = PTR_ERR(trans);
+		error("failed to start transaction: %d", ret);
+		return ret;
+	}
+
+iterate_bgs:
+	if (fs_info->last_converted_bg_bytenr == (u64)-1) {
+		ce = last_cache_extent(&fs_info->mapping_tree.cache_tree);
+	} else {
+		ce = search_cache_extent(&fs_info->mapping_tree.cache_tree,
+					 fs_info->last_converted_bg_bytenr);
+		if (!ce) {
+			error("failed to find block group for bytenr %llu",
+			      fs_info->last_converted_bg_bytenr);
+			ret = -ENOENT;
+			goto error;
+		}
+		ce = prev_cache_extent(ce);
+		if (!ce) {
+			error("no more block group before bytenr %llu",
+			      fs_info->last_converted_bg_bytenr);
+			ret = -ENOENT;
+			goto error;
+		}
+	}
+
+	/* Now convert each block group */
+	while (ce) {
+		struct cache_extent *prev = prev_cache_extent(ce);
+		u64 bytenr = ce->start;
+
+		ret = btrfs_convert_one_bg(trans, bytenr);
+		if (ret < 0)
+			goto error;
+		converted_bgs++;
+		ce = prev;
+
+		if (converted_bgs % BLOCK_GROUP_BATCH == 0) {
+			ret = btrfs_commit_transaction(trans,
+							fs_info->tree_root);
+			if (ret < 0) {
+				error("failed to commit transaction: %d", ret);
+				return ret;
+			}
+			trans = btrfs_start_transaction(fs_info->tree_root, 2);
+			if (IS_ERR(trans)) {
+				ret = PTR_ERR(trans);
+				error("failed to start transaction: %d", ret);
+				return ret;
+			}
+		}
+	}
+	/*
+	 * All bgs converted, remove the CHANGING_BG flag and set the compat ro
+	 * flag.
+	 */
+	fs_info->last_converted_bg_bytenr = 0;
+	btrfs_set_super_flags(sb,
+		btrfs_super_flags(sb) &
+		~BTRFS_SUPER_FLAG_CHANGING_BG_TREE);
+	btrfs_set_super_compat_ro_flags(sb,
+			btrfs_super_compat_ro_flags(sb) |
+			BTRFS_FEATURE_COMPAT_RO_BLOCK_GROUP_TREE);
+	ret = btrfs_commit_transaction(trans, fs_info->tree_root);
+	if (ret < 0) {
+		error("failed to commit the final transaction: %d", ret);
+		return ret;
+	}
+	printf("Converted the filesystem to block group tree feature\n");
+	return 0;
+error:
+	btrfs_abort_transaction(trans, ret);
+	return ret;
+}
+
 static void print_usage(void)
 {
 	printf("usage: btrfstune [options] device\n");
 	printf("Tune settings of filesystem features on an unmounted device\n\n");
 	printf("Options:\n");
 	printf("  change feature status:\n");
+	printf("\t-b          enable block group tree (mkfs: block-group-tree, for less mount time)\n");
 	printf("\t-r          enable extended inode refs (mkfs: extref, for hardlink limits)\n");
 	printf("\t-x          enable skinny metadata extent refs (mkfs: skinny-metadata)\n");
 	printf("\t-n          enable no-holes feature (mkfs: no-holes, more efficient sparse file representation)\n");
@@ -811,6 +933,7 @@ int BOX_MAIN(btrfstune)(int argc, char *argv[])
 	u64 seeding_value = 0;
 	int random_fsid = 0;
 	int change_metadata_uuid = 0;
+	bool to_bg_tree = false;
 	int csum_type = -1;
 	char *new_fsid_str = NULL;
 	int ret;
@@ -826,11 +949,14 @@ int BOX_MAIN(btrfstune)(int argc, char *argv[])
 #endif
 			{ NULL, 0, NULL, 0 }
 		};
-		int c = getopt_long(argc, argv, "S:rxfuU:nmM:", long_options, NULL);
+		int c = getopt_long(argc, argv, "S:rxfuU:nmM:b", long_options, NULL);
 
 		if (c < 0)
 			break;
 		switch(c) {
+		case 'b':
+			to_bg_tree = true;
+			break;
 		case 'S':
 			seeding_flag = 1;
 			seeding_value = arg_strtou64(optarg);
@@ -890,7 +1016,7 @@ int BOX_MAIN(btrfstune)(int argc, char *argv[])
 		return 1;
 	}
 	if (!super_flags && !seeding_flag && !(random_fsid || new_fsid_str) &&
-	    !change_metadata_uuid && csum_type == -1) {
+	    !change_metadata_uuid && csum_type == -1 && !to_bg_tree) {
 		error("at least one option should be specified");
 		print_usage();
 		return 1;
@@ -936,6 +1062,24 @@ int BOX_MAIN(btrfstune)(int argc, char *argv[])
 		return 1;
 	}
 
+	if (to_bg_tree) {
+		if (btrfs_fs_compat_ro(root->fs_info, BLOCK_GROUP_TREE)) {
+			error("the filesystem already has block group tree feature");
+			ret = 1;
+			goto out;
+		}
+		if (!btrfs_fs_compat_ro(root->fs_info, FREE_SPACE_TREE_VALID)) {
+			error("the filesystem doesn't have space cache v2, needs to be mounted with \"-o space_cache=v2\" first");
+			ret = 1;
+			goto out;
+		}
+		ret = convert_to_bg_tree(root->fs_info);
+		if (ret < 0) {
+			error("failed to convert the filesystem to block group tree feature");
+			goto out;
+		}
+		goto out;
+	}
 	if (seeding_flag) {
 		if (btrfs_fs_incompat(root->fs_info, METADATA_UUID)) {
 			fprintf(stderr, "SEED flag cannot be changed on a metadata-uuid changed fs\n");
diff --git a/kernel-shared/ctree.c b/kernel-shared/ctree.c
index 2707e0e64f31..834dcf412e11 100644
--- a/kernel-shared/ctree.c
+++ b/kernel-shared/ctree.c
@@ -267,6 +267,14 @@ int btrfs_create_root(struct btrfs_trans_handle *trans,
 		fs_info->quota_root = new_root;
 		fs_info->quota_enabled = 1;
 		break;
+	case BTRFS_BLOCK_GROUP_TREE_OBJECTID:
+		if (fs_info->block_group_root) {
+			error("bg root already exists");
+			ret = -EEXIST;
+			goto free;
+		}
+		fs_info->block_group_root = new_root;
+		break;
 	/*
 	 * Essential trees can't be created by this function, yet.
 	 * As we expect such skeleton exists, or a lot of functions like
diff --git a/kernel-shared/ctree.h b/kernel-shared/ctree.h
index d8909b3fdf20..d9ec945690d6 100644
--- a/kernel-shared/ctree.h
+++ b/kernel-shared/ctree.h
@@ -324,6 +324,13 @@ static inline unsigned long btrfs_chunk_item_size(int num_stripes)
 #define BTRFS_SUPER_FLAG_CHANGING_FSID_V2	(1ULL << 36)
 #define BTRFS_SUPER_FLAG_CHANGING_CSUM		(1ULL << 37)
 
+/*
+ * The fs is undergoing block group tree feature change.
+ * If no BLOCK_GROUP_TREE compat ro flag, it's changing from regular
+ * bg item in extent tree to new bg tree.
+ */
+#define BTRFS_SUPER_FLAG_CHANGING_BG_TREE	(1ULL << 38)
+
 #define BTRFS_BACKREF_REV_MAX		256
 #define BTRFS_BACKREF_REV_SHIFT		56
 #define BTRFS_BACKREF_REV_MASK		(((u64)BTRFS_BACKREF_REV_MAX - 1) << \
@@ -1264,6 +1271,18 @@ struct btrfs_fs_info {
 	struct cache_tree *fsck_extent_cache;
 	struct cache_tree *corrupt_blocks;
 
+	/*
+	 * For converting to/from bg tree feature, this records the bytenr
+	 * of the last processed block group item.
+	 *
+	 * Any new block group item after this bytenr is using the target
+	 * block group item format. (e.g. if converting to bg tree, bg item
+	 * after this bytenr should go into block group tree).
+	 *
+	 * Thus the number should decrease as the conversion progresses.
+	 */
+	u64 last_converted_bg_bytenr;
+
 	/* Cached block sizes */
 	u32 nodesize;
 	u32 sectorsize;
@@ -2665,6 +2684,7 @@ int exclude_super_stripes(struct btrfs_fs_info *fs_info,
 u64 add_new_free_space(struct btrfs_block_group *block_group,
 		       struct btrfs_fs_info *info, u64 start, u64 end);
 u64 hash_extent_data_ref(u64 root_objectid, u64 owner, u64 offset);
+int btrfs_convert_one_bg(struct btrfs_trans_handle *trans, u64 bytenr);
 
 /* ctree.c */
 int btrfs_comp_cpu_keys(const struct btrfs_key *k1, const struct btrfs_key *k2);
diff --git a/kernel-shared/disk-io.c b/kernel-shared/disk-io.c
index 6eeb5ecd1d59..58030ebf16cd 100644
--- a/kernel-shared/disk-io.c
+++ b/kernel-shared/disk-io.c
@@ -562,9 +562,9 @@ err:
 	return -EIO;
 }
 
-static int find_and_setup_root(struct btrfs_root *tree_root,
-			       struct btrfs_fs_info *fs_info,
-			       u64 objectid, struct btrfs_root *root)
+int btrfs_find_and_setup_root(struct btrfs_root *tree_root,
+			      struct btrfs_fs_info *fs_info,
+			      u64 objectid, struct btrfs_root *root)
 {
 	int ret;
 
@@ -644,7 +644,7 @@ struct btrfs_root *btrfs_read_fs_root_no_cache(struct btrfs_fs_info *fs_info,
 	if (!root)
 		return ERR_PTR(-ENOMEM);
 	if (location->offset == (u64)-1) {
-		ret = find_and_setup_root(tree_root, fs_info,
+		ret = btrfs_find_and_setup_root(tree_root, fs_info,
 					  location->objectid, root);
 		if (ret) {
 			free(root);
@@ -1203,7 +1203,9 @@ static int load_important_roots(struct btrfs_fs_info *fs_info,
 		backup = sb->super_roots + index;
 	}
 
-	if (!btrfs_fs_compat_ro(fs_info, BLOCK_GROUP_TREE)) {
+	if (!btrfs_fs_compat_ro(fs_info, BLOCK_GROUP_TREE) &&
+	    !(btrfs_super_flags(fs_info->super_copy) &
+	      BTRFS_SUPER_FLAG_CHANGING_BG_TREE)) {
 		free(fs_info->block_group_root);
 		fs_info->block_group_root = NULL;
 		goto tree_root;
@@ -1256,8 +1258,9 @@ int btrfs_setup_all_roots(struct btrfs_fs_info *fs_info, u64 root_tree_bytenr,
 	if (ret)
 		return ret;
 
-	if (btrfs_fs_compat_ro(fs_info, BLOCK_GROUP_TREE)) {
-		ret = find_and_setup_root(root, fs_info,
+	if (btrfs_fs_compat_ro(fs_info, BLOCK_GROUP_TREE) ||
+	    btrfs_super_flags(sb) & BTRFS_SUPER_FLAG_CHANGING_BG_TREE) {
+		ret = btrfs_find_and_setup_root(root, fs_info,
 				BTRFS_BLOCK_GROUP_TREE_OBJECTID,
 				fs_info->block_group_root);
 		if (ret) {
@@ -1267,8 +1270,9 @@ int btrfs_setup_all_roots(struct btrfs_fs_info *fs_info, u64 root_tree_bytenr,
 		fs_info->block_group_root->track_dirty = 1;
 	}
 
-	ret = find_and_setup_root(root, fs_info, BTRFS_DEV_TREE_OBJECTID,
-				  fs_info->dev_root);
+	ret = btrfs_find_and_setup_root(root, fs_info,
+					BTRFS_DEV_TREE_OBJECTID,
+					fs_info->dev_root);
 	if (ret) {
 		printk("Couldn't setup device tree\n");
 		return -EIO;
@@ -1276,8 +1280,9 @@ int btrfs_setup_all_roots(struct btrfs_fs_info *fs_info, u64 root_tree_bytenr,
 	fs_info->dev_root->track_dirty = 1;
 
 
-	ret = find_and_setup_root(root, fs_info, BTRFS_UUID_TREE_OBJECTID,
-				  fs_info->uuid_root);
+	ret = btrfs_find_and_setup_root(root, fs_info,
+					BTRFS_UUID_TREE_OBJECTID,
+					fs_info->uuid_root);
 	if (ret) {
 		free(fs_info->uuid_root);
 		fs_info->uuid_root = NULL;
@@ -1285,8 +1290,9 @@ int btrfs_setup_all_roots(struct btrfs_fs_info *fs_info, u64 root_tree_bytenr,
 		fs_info->uuid_root->track_dirty = 1;
 	}
 
-	ret = find_and_setup_root(root, fs_info, BTRFS_QUOTA_TREE_OBJECTID,
-				  fs_info->quota_root);
+	ret = btrfs_find_and_setup_root(root, fs_info,
+					BTRFS_QUOTA_TREE_OBJECTID,
+					fs_info->quota_root);
 	if (ret) {
 		free(fs_info->quota_root);
 		fs_info->quota_root = NULL;
diff --git a/kernel-shared/disk-io.h b/kernel-shared/disk-io.h
index 6c8eaa2bd13d..2424060d705f 100644
--- a/kernel-shared/disk-io.h
+++ b/kernel-shared/disk-io.h
@@ -228,6 +228,9 @@ struct btrfs_root *btrfs_global_root(struct btrfs_fs_info *fs_info,
 u64 btrfs_global_root_id(struct btrfs_fs_info *fs_info, u64 bytenr);
 int btrfs_global_root_insert(struct btrfs_fs_info *fs_info,
 			     struct btrfs_root *root);
+int btrfs_find_and_setup_root(struct btrfs_root *tree_root,
+			      struct btrfs_fs_info *fs_info,
+			      u64 objectid, struct btrfs_root *root);
 
 static inline struct btrfs_root *btrfs_block_group_root(
 						struct btrfs_fs_info *fs_info)
diff --git a/kernel-shared/extent-tree.c b/kernel-shared/extent-tree.c
index 5807b11a7b1a..4e8cf635b7e8 100644
--- a/kernel-shared/extent-tree.c
+++ b/kernel-shared/extent-tree.c
@@ -1546,6 +1546,15 @@ static int update_block_group_item(struct btrfs_trans_handle *trans,
 	struct extent_buffer *leaf;
 	struct btrfs_key key;
 
+	/*
+	 * If we're doing convert and the bg is beyond our last converted bg,
+	 * it should go to the new root.
+	 */
+	if (btrfs_super_flags(fs_info->super_copy) &
+	    BTRFS_SUPER_FLAG_CHANGING_BG_TREE &&
+	    cache->start >= fs_info->last_converted_bg_bytenr)
+		root = fs_info->block_group_root;
+
 	key.objectid = cache->start;
 	key.type = BTRFS_BLOCK_GROUP_ITEM_KEY;
 	key.offset = cache->length;
@@ -2726,33 +2735,99 @@ static int read_one_block_group(struct btrfs_fs_info *fs_info,
 	return 0;
 }
 
-int btrfs_read_block_groups(struct btrfs_fs_info *fs_info)
+static int get_last_converted_bg(struct btrfs_fs_info *fs_info)
 {
-	struct btrfs_path path;
-	struct btrfs_root *root;
+	struct btrfs_root *bg_root = fs_info->block_group_root;
+	struct btrfs_path path = {0};
+	struct btrfs_key key = {0};
 	int ret;
+
+	/* Load the first bg in bg tree, that would be our last converted bg. */
+	ret = btrfs_search_slot(NULL, bg_root, &key, &path, 0, 0);
+	if (ret < 0)
+		return ret;
+	ASSERT(ret > 0);
+	/* We should always be at the slot 0 of the first leaf. */
+	ASSERT(path.slots[0] == 0);
+
+	/* Empty bg tree, no converted bg item at all. */
+	if (btrfs_header_nritems(path.nodes[0]) == 0) {
+		fs_info->last_converted_bg_bytenr = (u64)-1;
+		ret = 0;
+		goto out;
+	}
+	btrfs_item_key_to_cpu(path.nodes[0], &key, path.slots[0]);
+	ASSERT(key.type == BTRFS_BLOCK_GROUP_ITEM_KEY);
+	fs_info->last_converted_bg_bytenr = key.objectid;
+
+out:
+	btrfs_release_path(&path);
+	return ret;
+}
+
+/*
+ * Helper to read old block group items from the specified root.
+ *
+ * The difference between this and read_block_groups_from_root() is,
+ * we will exit if we have already read the last bg in the old root.
+ *
+ * This is to avoid wasting time finding bg items which should be in the
+ * new root.
+ */
+static int read_old_block_groups_from_root(struct btrfs_fs_info *fs_info,
+					   struct btrfs_root *root)
+{
+	struct btrfs_path path = {0};
 	struct btrfs_key key;
+	struct cache_extent *ce;
+	/* The last block group bytenr in the old root. */
+	u64 last_bg_in_old_root;
+	int ret;
+
+	if (fs_info->last_converted_bg_bytenr != (u64)-1) {
+		/*
+		 * We know the last converted bg in the other tree, load the chunk
+		 * before that last converted as our last bg in the tree.
+		 */
+		ce = search_cache_extent(&fs_info->mapping_tree.cache_tree,
+			         fs_info->last_converted_bg_bytenr);
+		if (!ce || ce->start != fs_info->last_converted_bg_bytenr) {
+			error("no chunk found for bytenr %llu",
+			      fs_info->last_converted_bg_bytenr);
+			return -ENOENT;
+		}
+		ce = prev_cache_extent(ce);
+		/*
+		 * We should have previous unconverted chunk, or we have
+		 * already finished the convert.
+		 */
+		ASSERT(ce);
+
+		last_bg_in_old_root = ce->start;
+	} else {
+		last_bg_in_old_root = (u64)-1;
+	}
 
-	root = btrfs_block_group_root(fs_info);
-	key.objectid = 0;
-	key.offset = 0;
 	key.type = BTRFS_BLOCK_GROUP_ITEM_KEY;
-	btrfs_init_path(&path);
 
-	while(1) {
+	while (true) {
 		ret = find_first_block_group(root, &path, &key);
 		if (ret > 0) {
 			ret = 0;
-			goto error;
+			goto out;
 		}
 		if (ret != 0) {
-			goto error;
+			goto out;
 		}
 		btrfs_item_key_to_cpu(path.nodes[0], &key, path.slots[0]);
 
 		ret = read_one_block_group(fs_info, &path);
 		if (ret < 0 && ret != -ENOENT)
-			goto error;
+			goto out;
+
+		/* We have reached last bg in the old root, no need to continue */
+		if (key.objectid >= last_bg_in_old_root)
+			break;
 
 		if (key.offset == 0)
 			key.objectid++;
@@ -2762,11 +2837,91 @@ int btrfs_read_block_groups(struct btrfs_fs_info *fs_info)
 		btrfs_release_path(&path);
 	}
 	ret = 0;
-error:
+out:
+	btrfs_release_path(&path);
+	return ret;
+}
+
+/* Helper to read all block group items from the specified root. */
+static int read_block_groups_from_root(struct btrfs_fs_info *fs_info,
+					   struct btrfs_root *root)
+{
+	struct btrfs_path path = {0};
+	struct btrfs_key key = {0};
+	int ret;
+
+	key.type = BTRFS_BLOCK_GROUP_ITEM_KEY;
+
+	while (true) {
+		ret = find_first_block_group(root, &path, &key);
+		if (ret > 0) {
+			ret = 0;
+			goto out;
+		}
+		if (ret != 0) {
+			goto out;
+		}
+		btrfs_item_key_to_cpu(path.nodes[0], &key, path.slots[0]);
+
+		ret = read_one_block_group(fs_info, &path);
+		if (ret < 0 && ret != -ENOENT)
+			goto out;
+
+		if (key.offset == 0)
+			key.objectid++;
+		else
+			key.objectid = key.objectid + key.offset;
+		key.offset = 0;
+		btrfs_release_path(&path);
+	}
+	ret = 0;
+out:
 	btrfs_release_path(&path);
 	return ret;
 }
 
+static int read_converting_block_groups(struct btrfs_fs_info *fs_info)
+{
+	struct btrfs_root *old_root = btrfs_extent_root(fs_info, 0);
+	struct btrfs_root *new_root = btrfs_block_group_root(fs_info);
+	int ret;
+
+	/* Currently we only support converting to bg tree feature. */
+	ASSERT(!btrfs_fs_compat_ro(fs_info, BLOCK_GROUP_TREE));
+
+	ret = get_last_converted_bg(fs_info);
+	if (ret < 0) {
+		error("failed to load the last converted bg: %d", ret);
+		return ret;
+	}
+
+	ret = read_old_block_groups_from_root(fs_info, old_root);
+	if (ret < 0) {
+		error("failed to load block groups from the old root: %d", ret);
+		return ret;
+	}
+
+	/* For block group items in the new tree, just read them all. */
+	ret = read_block_groups_from_root(fs_info, new_root);
+	if (ret < 0) {
+		error("failed to load block groups from the new root: %d", ret);
+		return ret;
+	}
+	return ret;
+}
+
+int btrfs_read_block_groups(struct btrfs_fs_info *fs_info)
+{
+	struct btrfs_root *root;
+
+	if (btrfs_super_flags(fs_info->super_copy) &
+	    BTRFS_SUPER_FLAG_CHANGING_BG_TREE)
+		return read_converting_block_groups(fs_info);
+
+	root = btrfs_block_group_root(fs_info);
+	return read_block_groups_from_root(fs_info, root);
+}
+
 /*
  * For extent tree v2 we use the block_group_item->chunk_offset to point at our
  * global root id.  For v1 it's always set to BTRFS_FIRST_CHUNK_TREE_OBJECTID.
@@ -2834,6 +2989,15 @@ static int insert_block_group_item(struct btrfs_trans_handle *trans,
 	key.offset = block_group->length;
 
 	root = btrfs_block_group_root(fs_info);
+	/*
+	 * If we're doing convert and the bg is beyond our last converted bg,
+	 * it should go to the new root.
+	 */
+	if (btrfs_super_flags(fs_info->super_copy) &
+	    BTRFS_SUPER_FLAG_CHANGING_BG_TREE &&
+	    block_group->start >= fs_info->last_converted_bg_bytenr)
+		root = fs_info->block_group_root;
+
 	return btrfs_insert_item(trans, root, &key, &bgi, sizeof(bgi));
 }
 
@@ -2958,6 +3122,15 @@ static int remove_block_group_item(struct btrfs_trans_handle *trans,
 	key.offset = block_group->length;
 	key.type = BTRFS_BLOCK_GROUP_ITEM_KEY;
 
+	/*
+	 * If we're doing convert and the bg is beyond our last converted bg,
+	 * it should go to the new root.
+	 */
+	if (btrfs_super_flags(fs_info->super_copy) &
+	    BTRFS_SUPER_FLAG_CHANGING_BG_TREE &&
+	    block_group->start >= fs_info->last_converted_bg_bytenr)
+		root = fs_info->block_group_root;
+
 	ret = btrfs_search_slot(trans, root, &key, path, -1, 1);
 	if (ret > 0)
 		ret = -ENOENT;
@@ -3849,3 +4022,53 @@ int btrfs_run_delayed_refs(struct btrfs_trans_handle *trans, unsigned long nr)
 
 	return 0;
 }
+
+int btrfs_convert_one_bg(struct btrfs_trans_handle *trans, u64 bytenr)
+{
+	struct btrfs_fs_info *fs_info = trans->fs_info;
+	struct btrfs_root *new_root = fs_info->block_group_root;
+	struct btrfs_root *old_root = btrfs_extent_root(fs_info, 0);
+	struct btrfs_block_group *bg;
+	struct btrfs_path path = {0};
+	int ret;
+
+	ASSERT(new_root);
+	ASSERT(old_root);
+	ASSERT(btrfs_super_flags(fs_info->super_copy) &
+	       BTRFS_SUPER_FLAG_CHANGING_BG_TREE);
+	/*
+	 * Only converting to bg tree is supported so far, thus the feature
+	 * should not be set.
+	 */
+	ASSERT(!btrfs_fs_compat_ro(fs_info, BLOCK_GROUP_TREE));
+
+	bg = btrfs_lookup_block_group(fs_info, bytenr);
+	if (!bg) {
+		error("failed to find block group for bytenr %llu", bytenr);
+		return -ENOENT;
+	}
+	/*
+	 * Delete the block group item from the old tree first.
+	 * As we haven't yet updated last_converted_bg_bytenr, the delete will
+	 * be done in the old tree.
+	 */
+	ret = remove_block_group_item(trans, &path, bg);
+	btrfs_release_path(&path);
+	if (ret < 0) {
+		error("failed to delete block group item from the old root: %d",
+		      ret);
+		return ret;
+	}
+	fs_info->last_converted_bg_bytenr = bytenr;
+	/*
+	 * Now last_converted_bg_bytenr is updated, the insert will happen for
+	 * the new root.
+	 */
+	ret = insert_block_group_item(trans, bg);
+	if (ret < 0) {
+		error("failed to insert block group item into the new root: %d",
+		      ret);
+		return ret;
+	}
+	return ret;
+}
-- 
2.37.0



* [PATCH v3 5/5] btrfs-progs: mkfs: add artificial dependency for block group tree
  2022-08-09  6:03 [PATCH v3 0/5] btrfs-progs: separate BLOCK_GROUP_TREE feature from extent-tree-v2 Qu Wenruo
                   ` (3 preceding siblings ...)
  2022-08-09  6:03 ` [PATCH v3 4/5] btrfs-progs: btrfstune: add the ability to convert to block group tree feature Qu Wenruo
@ 2022-08-09  6:03 ` Qu Wenruo
  2022-08-31 18:26 ` [PATCH v3 0/5] btrfs-progs: separate BLOCK_GROUP_TREE feature from extent-tree-v2 David Sterba
  5 siblings, 0 replies; 16+ messages in thread
From: Qu Wenruo @ 2022-08-09  6:03 UTC (permalink / raw)
  To: linux-btrfs

To reduce the test matrix and to follow the kernel behavior, require
the no-holes and free-space-tree features whenever the block-group-tree
feature is enabled.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 mkfs/main.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/mkfs/main.c b/mkfs/main.c
index 518ce0fd7523..54cd47a0cdc0 100644
--- a/mkfs/main.c
+++ b/mkfs/main.c
@@ -1303,6 +1303,13 @@ int BOX_MAIN(mkfs)(int argc, char **argv)
 		}
 	}
 
+	/* Block group tree feature requires no-holes and free space tree. */
+	if (runtime_features & BTRFS_RUNTIME_FEATURE_BLOCK_GROUP_TREE &&
+	    (!(features & BTRFS_FEATURE_INCOMPAT_NO_HOLES) ||
+	     !(runtime_features & BTRFS_RUNTIME_FEATURE_FREE_SPACE_TREE))) {
+		error("block group tree feature requires no-holes and free-space-tree features");
+		exit(1);
+	}
 	if (zoned) {
 		if (source_dir_set) {
 			error("the option -r and zoned mode are incompatible");
-- 
2.37.0



* Re: [PATCH v3 0/5] btrfs-progs: separate BLOCK_GROUP_TREE feature from extent-tree-v2
  2022-08-09  6:03 [PATCH v3 0/5] btrfs-progs: separate BLOCK_GROUP_TREE feature from extent-tree-v2 Qu Wenruo
                   ` (4 preceding siblings ...)
  2022-08-09  6:03 ` [PATCH v3 5/5] btrfs-progs: mkfs: add artificial dependency for block group tree Qu Wenruo
@ 2022-08-31 18:26 ` David Sterba
  5 siblings, 0 replies; 16+ messages in thread
From: David Sterba @ 2022-08-31 18:26 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Tue, Aug 09, 2022 at 02:03:50PM +0800, Qu Wenruo wrote:
> [CHANGELOG]
> [TODO]
> - Add btrfstune support to convert from block-group-tree feature
>   The infrastructure is already done.
> 
> Qu Wenruo (5):
>   btrfs-progs: mkfs: dynamically modify mkfs blocks array
>   btrfs-progs: don't save block group root into super block
>   btrfs-progs: separate block group tree from extent tree v2
>   btrfs-progs: btrfstune: add the ability to convert to block group tree
>     feature
>   btrfs-progs: mkfs: add artificial dependency for block group tree

The kernel part is in for-next so I'll add this to progs, so far not
under the experimental build but this should be resolved by the
final kernel release.


* Re: [PATCH v3 3/5] btrfs-progs: separate block group tree from extent tree v2
  2022-08-09  6:03 ` [PATCH v3 3/5] btrfs-progs: separate block group tree from extent tree v2 Qu Wenruo
@ 2022-08-31 19:14   ` David Sterba
  2022-08-31 21:43     ` Qu Wenruo
  2022-10-03 14:48   ` Anand Jain
  1 sibling, 1 reply; 16+ messages in thread
From: David Sterba @ 2022-08-31 19:14 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Tue, Aug 09, 2022 at 02:03:53PM +0800, Qu Wenruo wrote:
> Block group tree feature is completely a standalone feature, and it has
> been over 5 years before the initial introduction to solve the long
> mount time.
> 
> I don't really want to waste another 5 years waiting for a feature which
> may or may not work, but definitely not properly reviewed for its
> preparation patches.

This should go to the cover letter but in the commit such ranting does
not bring much information for the code change. And I rephrase or delete
such things unless it's somehow relevant.

> So this patch will separate the block group tree feature into a
> standalone compat RO feature.
> 
> There is a catch, in mkfs create_block_group_tree(), current
> tree-checker only accepts block group item with valid chunk_objectid,
> but the existing code from extent-tree-v2 didn't properly initialize it.
> 
> This patch will also fix above mentioned problem so kernel can mount it
> correctly.
> 
> Now mkfs/fsck should be able to handle the fs with block group tree.
> 
> --- a/common/fsfeatures.c
> +++ b/common/fsfeatures.c
> @@ -172,6 +172,14 @@ static const struct btrfs_feature runtime_features[] = {
>  		VERSION_TO_STRING2(safe, 4,9),
>  		VERSION_TO_STRING2(default, 5,15),
>  		.desc		= "free space tree (space_cache=v2)"
> +	}, {
> +		.name		= "block-group-tree",
> +		.flag		= BTRFS_RUNTIME_FEATURE_BLOCK_GROUP_TREE,
> +		.sysfs_name = "block_group_tree",
> +		VERSION_TO_STRING2(compat, 6,0),
> +		VERSION_NULL(safe),
> +		VERSION_NULL(default),
> +		.desc		= "block group tree to reduce mount time"

Like explaining that this is a runtime feature and I have not noticed
until I tried to test it expecting to see it among the mkfs-time
features but there was nothing in 'mkfs.btrfs -O list-all'.

This is a mkfs-time feature as it creates a fundamental on-disk
structure, basically a subset of extent tree.

As it's in one patch please send a fixup so I can fold it. Thanks.


* Re: [PATCH v3 3/5] btrfs-progs: separate block group tree from extent tree v2
  2022-08-31 19:14   ` David Sterba
@ 2022-08-31 21:43     ` Qu Wenruo
  2022-09-01 12:15       ` Qu Wenruo
  0 siblings, 1 reply; 16+ messages in thread
From: Qu Wenruo @ 2022-08-31 21:43 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, linux-btrfs



On 2022/9/1 03:14, David Sterba wrote:
> On Tue, Aug 09, 2022 at 02:03:53PM +0800, Qu Wenruo wrote:
>> Block group tree feature is completely a standalone feature, and it has
>> been over 5 years before the initial introduction to solve the long
>> mount time.
>>
>> I don't really want to waste another 5 years waiting for a feature which
>> may or may not work, but definitely not properly reviewed for its
>> preparation patches.
>
> This should go to the cover letter but in the commit such ranting does
> not bring much information for the code change. And I rephrase or delete
> such things unless it's somehow relevant.
>
>> So this patch will separate the block group tree feature into a
>> standalone compat RO feature.
>>
>> There is a catch, in mkfs create_block_group_tree(), current
>> tree-checker only accepts block group item with valid chunk_objectid,
>> but the existing code from extent-tree-v2 didn't properly initialize it.
>>
>> This patch will also fix above mentioned problem so kernel can mount it
>> correctly.
>>
>> Now mkfs/fsck should be able to handle the fs with block group tree.
>>
>> --- a/common/fsfeatures.c
>> +++ b/common/fsfeatures.c
>> @@ -172,6 +172,14 @@ static const struct btrfs_feature runtime_features[] = {
>>   		VERSION_TO_STRING2(safe, 4,9),
>>   		VERSION_TO_STRING2(default, 5,15),
>>   		.desc		= "free space tree (space_cache=v2)"
>> +	}, {
>> +		.name		= "block-group-tree",
>> +		.flag		= BTRFS_RUNTIME_FEATURE_BLOCK_GROUP_TREE,
>> +		.sysfs_name = "block_group_tree",
>> +		VERSION_TO_STRING2(compat, 6,0),
>> +		VERSION_NULL(safe),
>> +		VERSION_NULL(default),
>> +		.desc		= "block group tree to reduce mount time"
>
> Like explaining that this is a runtime feature and I have not noticed
> until I tried to test it expecting to see it among the mkfs-time
> features but there was nothing in 'mkfs.btrfs -O list-all'.
>
> This is a mkfs-time feature as it creates a fundamental on-disk
> structure, basically a subset of extent tree.

This comes down to the decision to make the bg-tree feature a compat RO
flag.

We didn't put free-space-tree into the "-O" options, but into the "-R"
options, so the same should be done for most compat RO flags.

Furthermore, I remember discussing this before: an extent tree change
should not need a full incompat flag, as pure read-only tools like
btrfs-fuse should still be able to read the subvolume/csum/chunk/root
trees without any problem.

So, following the above reasoning, bg-tree is compat RO, and compat RO
goes into the "-R" options; I see no reason to put it into the "-O"
options.
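
To illustrate the compat RO versus incompat distinction argued above,
here is a minimal sketch in C of how a reader might gate access on the
two flag fields. It is not btrfs-progs or kernel code:
demo_check_features(), enum demo_mount_mode and the DEMO_* bit values are
hypothetical names chosen for this example. The only thing assumed from
btrfs is the general rule that an unknown incompat bit blocks any access,
while an unknown compat RO bit still allows read-only access.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative bit values only; the real definitions live in ctree.h. */
#define DEMO_COMPAT_RO_FREE_SPACE_TREE	(1ULL << 0)
#define DEMO_COMPAT_RO_BLOCK_GROUP_TREE	(1ULL << 5)
#define DEMO_INCOMPAT_SOME_NEW_FORMAT	(1ULL << 0)

/* Hypothetical result of checking the superblock feature bits. */
enum demo_mount_mode {
	DEMO_MOUNT_REFUSED,	/* unknown incompat bit: cannot be read at all */
	DEMO_MOUNT_RO_ONLY,	/* unknown compat RO bit: reading is still safe */
	DEMO_MOUNT_RW,		/* every set bit is understood */
};

/*
 * Compare the bits found on disk with the bits this reader supports.
 * Incompat is checked first because it is the stricter condition.
 */
enum demo_mount_mode demo_check_features(uint64_t incompat, uint64_t compat_ro,
					 uint64_t incompat_supp,
					 uint64_t compat_ro_supp)
{
	if (incompat & ~incompat_supp)
		return DEMO_MOUNT_REFUSED;
	if (compat_ro & ~compat_ro_supp)
		return DEMO_MOUNT_RO_ONLY;
	return DEMO_MOUNT_RW;
}

/* Usage: an old reader that only knows the free space tree. */
void demo_old_reader(void)
{
	uint64_t old_ro_supp = DEMO_COMPAT_RO_FREE_SPACE_TREE;

	/* bg-tree as compat RO: the old reader can still open read-only. */
	assert(demo_check_features(0, DEMO_COMPAT_RO_BLOCK_GROUP_TREE,
				   0, old_ro_supp) == DEMO_MOUNT_RO_ONLY);
	/* The same change behind an incompat bit would lock it out. */
	assert(demo_check_features(DEMO_INCOMPAT_SOME_NEW_FORMAT, 0,
				   0, old_ro_supp) == DEMO_MOUNT_REFUSED);
}
```

Under these assumptions, a read-only tool that predates the block group
tree keeps working when the feature is compat RO, which is the point made
about btrfs-fuse above.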

Thanks,
Qu

>
> As it's in one patch please send a fixup so I can fold it. Thanks.


* Re: [PATCH v3 3/5] btrfs-progs: separate block group tree from extent tree v2
  2022-08-31 21:43     ` Qu Wenruo
@ 2022-09-01 12:15       ` Qu Wenruo
  2022-09-02  9:21         ` David Sterba
  0 siblings, 1 reply; 16+ messages in thread
From: Qu Wenruo @ 2022-09-01 12:15 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, linux-btrfs



On 2022/9/1 05:43, Qu Wenruo wrote:
>
>
> On 2022/9/1 03:14, David Sterba wrote:
>> On Tue, Aug 09, 2022 at 02:03:53PM +0800, Qu Wenruo wrote:
>>> Block group tree feature is completely a standalone feature, and it has
>>> been over 5 years before the initial introduction to solve the long
>>> mount time.
>>>
>>> I don't really want to waste another 5 years waiting for a feature which
>>> may or may not work, but definitely not properly reviewed for its
>>> preparation patches.
>>
>> This should go to the cover letter but in the commit such ranting does
>> not bring much information for the code change. And I rephrase or delete
>> such things unless it's somehow relevant.
>>
>>> So this patch will separate the block group tree feature into a
>>> standalone compat RO feature.
>>>
>>> There is a catch, in mkfs create_block_group_tree(), current
>>> tree-checker only accepts block group item with valid chunk_objectid,
>>> but the existing code from extent-tree-v2 didn't properly initialize it.
>>>
>>> This patch will also fix above mentioned problem so kernel can mount it
>>> correctly.
>>>
>>> Now mkfs/fsck should be able to handle the fs with block group tree.
>>>
>>> --- a/common/fsfeatures.c
>>> +++ b/common/fsfeatures.c
>>> @@ -172,6 +172,14 @@ static const struct btrfs_feature
>>> runtime_features[] = {
>>>           VERSION_TO_STRING2(safe, 4,9),
>>>           VERSION_TO_STRING2(default, 5,15),
>>>           .desc        = "free space tree (space_cache=v2)"
>>> +    }, {
>>> +        .name        = "block-group-tree",
>>> +        .flag        = BTRFS_RUNTIME_FEATURE_BLOCK_GROUP_TREE,
>>> +        .sysfs_name = "block_group_tree",
>>> +        VERSION_TO_STRING2(compat, 6,0),
>>> +        VERSION_NULL(safe),
>>> +        VERSION_NULL(default),
>>> +        .desc        = "block group tree to reduce mount time"
>>
>> Like explaining that this is a runtime feature and I have not noticed
>> until I tried to test it expecting to see it among the mkfs-time
>> features but there was nothing in 'mkfs.btrfs -O list-all'.
>>
>> This is a mkfs-time feature as it creates a fundamental on-disk
>> structure, basically a subset of extent tree.
>
> This comes to the decision to make bg-tree feature as a compat RO flag.
>
> As we didn't put free-space-tree into "-O" options, but "-R" options.
> So the same should be done for most compat RO flags.
>
> Furthermore I remember I discussed about this before, extent tree change
> should not need a full incompat flag, as pure read-only tools, like
> btrfs-fuse should still be able to read the subvolume/csum/chunk/root
> trees without any problem.
>
> So following above reasons, bg-tree is compat RO, and compat RO goes
> into "-R" options, I see no reason to put it into "-O" options.

After more consideration, I believe we shouldn't split the features
(including quota) between the "-O" and "-R" options at all.

Firstly, although free space tree is compat RO (and a lot of future
features will also be compat RO), it's still an on-disk format change (a
new tree, some new keys).

It's an even bigger change than the NO_HOLES feature.
Not to mention the block group tree.

Now we have a very bad split between -R and -O: some features are large
on-disk format changes, but are still compat RO.

Some of them should be compat RO, but are still set as incompat flags.

To me, end users should not need to care how a feature is implemented;
they only need to care about:

- What the feature does
- What its compatibility is
   Incompat versus compat RO doesn't make much difference for most
   users; they just care about which kernel version is compatible.

So from this point of view, the -O/-R split has not really been helpful
from the very beginning.

It may make sense for quota, which is the only exception: it has been
supported from the very beginning, without a compat RO/incompat flag.

But for more and more features, the -O/-R split doesn't make much sense.

Thanks,
Qu

>
> Thanks,
> Qu
>
>>
>> As it's in one patch please send a fixup so I can fold it. Thanks.


* Re: [PATCH v3 3/5] btrfs-progs: separate block group tree from extent tree v2
  2022-09-01 12:15       ` Qu Wenruo
@ 2022-09-02  9:21         ` David Sterba
  2022-09-02  9:37           ` Qu Wenruo
  0 siblings, 1 reply; 16+ messages in thread
From: David Sterba @ 2022-09-02  9:21 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: dsterba, Qu Wenruo, linux-btrfs

On Thu, Sep 01, 2022 at 08:15:07PM +0800, Qu Wenruo wrote:
> >>> --- a/common/fsfeatures.c
> >>> +++ b/common/fsfeatures.c
> >>> @@ -172,6 +172,14 @@ static const struct btrfs_feature
> >>> runtime_features[] = {
> >>>           VERSION_TO_STRING2(safe, 4,9),
> >>>           VERSION_TO_STRING2(default, 5,15),
> >>>           .desc        = "free space tree (space_cache=v2)"
> >>> +    }, {
> >>> +        .name        = "block-group-tree",
> >>> +        .flag        = BTRFS_RUNTIME_FEATURE_BLOCK_GROUP_TREE,
> >>> +        .sysfs_name = "block_group_tree",
> >>> +        VERSION_TO_STRING2(compat, 6,0),
> >>> +        VERSION_NULL(safe),
> >>> +        VERSION_NULL(default),
> >>> +        .desc        = "block group tree to reduce mount time"
> >>
> >> Like explaining that this is a runtime feature and I have not noticed
> >> until I tried to test it expecting to see it among the mkfs-time
> >> features but there was nothing in 'mkfs.btrfs -O list-all'.
> >>
> >> This is a mkfs-time feature as it creates a fundamental on-disk
> >> structure, basically a subset of extent tree.
> >
> > This comes to the decision to make bg-tree feature as a compat RO flag.
> >
> > As we didn't put free-space-tree into "-O" options, but "-R" options.
> > So the same should be done for most compat RO flags.
> >
> > Furthermore I remember I discussed about this before, extent tree change
> > should not need a full incompat flag, as pure read-only tools, like
> > btrfs-fuse should still be able to read the subvolume/csum/chunk/root
> > trees without any problem.
> >
> > So following above reasons, bg-tree is compat RO, and compat RO goes
> > into "-R" options, I see no reason to put it into "-O" options.
> 
> After more consideration, I believe we shouldn't split all the features
> (including quota) between "-O" and "-R" options.

After reading your previous mail I came to the same conclusion.

> Firstly, although free space tree is compat RO (and a lot of future
> features will also be compat RO), it's still a on-disk format change (a
> new tree, some new keys).
> 
> It's even a bigger change compared to NO_HOLES features.
> No to mention the block group tree.
> 
> Now we have a very bad split for -R and -O, some of them are on-disk
> format change that is large enough, but still compat RO.

Agreed.

> Some of them should be compat RO, but still set as incompt flags.
> 
> To me, end users should not really bother what the feature is
> implemented, they only need to bother:
> 
> - What the feature is doing
> - What is the compatibility
>    The incompat and compat RO doesn't make too much difference for most
>    users, they just care about which kernel version is compatible.
> 
> So from this point of view, -O/-R split it not really helpful from the
> very beginning.
> 
> It may make sense for quota, which is the only exception, it's supported
> from the very beginning, without a compat RO/incompat flag.
> 
> But for more and more features, -O/-R split doesn't make much sense.

Yeah, the free-space-tree is misplaced and I did not realize that back
then. That something can be switched on at run time by a mount option
should not be the only condition for putting it under the -R option.

Quotas are maybe still a good example of a runtime feature: there's a
command to enable and disable them. Additional structures are created
or deleted, but it's nothing fundamental. The distinction between the
options should hint at the question "what if I don't select this now,
can I turn it on later?"; perhaps the documentation should be more
explicit about that.

For compatibility we need to keep free-space-tree under -R, but we can
add an alias to -O and add everything of that sort there too, like the
block group tree.


* Re: [PATCH v3 3/5] btrfs-progs: separate block group tree from extent tree v2
  2022-09-02  9:21         ` David Sterba
@ 2022-09-02  9:37           ` Qu Wenruo
  2022-09-02 12:10             ` David Sterba
  0 siblings, 1 reply; 16+ messages in thread
From: Qu Wenruo @ 2022-09-02  9:37 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, linux-btrfs



On 2022/9/2 17:21, David Sterba wrote:
> On Thu, Sep 01, 2022 at 08:15:07PM +0800, Qu Wenruo wrote:
>>>>> --- a/common/fsfeatures.c
>>>>> +++ b/common/fsfeatures.c
>>>>> @@ -172,6 +172,14 @@ static const struct btrfs_feature
>>>>> runtime_features[] = {
>>>>>            VERSION_TO_STRING2(safe, 4,9),
>>>>>            VERSION_TO_STRING2(default, 5,15),
>>>>>            .desc        = "free space tree (space_cache=v2)"
>>>>> +    }, {
>>>>> +        .name        = "block-group-tree",
>>>>> +        .flag        = BTRFS_RUNTIME_FEATURE_BLOCK_GROUP_TREE,
>>>>> +        .sysfs_name = "block_group_tree",
>>>>> +        VERSION_TO_STRING2(compat, 6,0),
>>>>> +        VERSION_NULL(safe),
>>>>> +        VERSION_NULL(default),
>>>>> +        .desc        = "block group tree to reduce mount time"
>>>>
>>>> Like explaining that this is a runtime feature and I have not noticed
>>>> until I tried to test it expecting to see it among the mkfs-time
>>>> features but there was nothing in 'mkfs.btrfs -O list-all'.
>>>>
>>>> This is a mkfs-time feature as it creates a fundamental on-disk
>>>> structure, basically a subset of extent tree.
>>>
>>> This comes to the decision to make bg-tree feature as a compat RO flag.
>>>
>>> As we didn't put free-space-tree into "-O" options, but "-R" options.
>>> So the same should be done for most compat RO flags.
>>>
>>> Furthermore I remember I discussed about this before, extent tree change
>>> should not need a full incompat flag, as pure read-only tools, like
>>> btrfs-fuse should still be able to read the subvolume/csum/chunk/root
>>> trees without any problem.
>>>
>>> So following above reasons, bg-tree is compat RO, and compat RO goes
>>> into "-R" options, I see no reason to put it into "-O" options.
>>
>> After more consideration, I believe we shouldn't split all the features
>> (including quota) between "-O" and "-R" options.
> 
> After reading your previous I got to the same conclusion.
> 
>> Firstly, although free space tree is compat RO (and a lot of future
>> features will also be compat RO), it's still a on-disk format change (a
>> new tree, some new keys).
>>
>> It's even a bigger change compared to NO_HOLES features.
>> No to mention the block group tree.
>>
>> Now we have a very bad split for -R and -O, some of them are on-disk
>> format change that is large enough, but still compat RO.
> 
> Agreed.
> 
>> Some of them should be compat RO, but still set as incompt flags.
>>
>> To me, end users should not really bother what the feature is
>> implemented, they only need to bother:
>>
>> - What the feature is doing
>> - What is the compatibility
>>     The incompat and compat RO doesn't make too much difference for most
>>     users, they just care about which kernel version is compatible.
>>
>> So from this point of view, -O/-R split it not really helpful from the
>> very beginning.
>>
>> It may make sense for quota, which is the only exception, it's supported
>> from the very beginning, without a compat RO/incompat flag.
>>
>> But for more and more features, -O/-R split doesn't make much sense.
> 
> Yeah, the free-space-tree is misplaced and I did not realize that back
> then. That something is possible to switch on at run time by a mount
> option should not be the only condition to put the option to the -R option.
> 
> Quota are maybe still a good example of the runtime feature, there's a
> command to enable and disable it. There are additional structures
> created or deleted but it's not something fundamental. The distinction
> in the options should hint at what's the type "what if I don't select
> this now, can I turn it on later?", perhaps documentation should be more
> explicit about that.

Quota tree is a special case, just because it's been there from day one,
thus no compat/compat ro/incompat flags are needed at all.

To me, we can accept one exception.

> 
> For compatibility we need to keep free-space-tree under -R but we can
> add an alias to -O and everything of that sort add there too, like the
> block group tree.

That's simple: deprecate -R, treat -R just as -O internally, and put
all features including quota into -O.

Of course, we may need some small changes, as now one fs feature needs 1
or 0 compat/compat ro/incompat flags set.
But everything else, such as the compat/safe/default strings, can be
inherited from the existing format.

This way we have minimal code change while still keeping the same
compatibility (in fact, greatly enlarged -O options).
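
A rough sketch in C of what such a merged table could look like, where
each feature carries at most one flag of one kind. This is only an
illustration of the idea, not the btrfs-progs implementation: struct
demo_feature, demo_select_feature() and the bit values are hypothetical,
and the real table in common/fsfeatures.c also carries the
compat/safe/default version strings, which are left out here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Which superblock flag field a feature sets, if any (hypothetical). */
enum demo_flag_kind {
	DEMO_FLAG_NONE,		/* e.g. quota: no feature bit at all */
	DEMO_FLAG_COMPAT_RO,
	DEMO_FLAG_INCOMPAT,
};

struct demo_feature {
	const char *name;
	enum demo_flag_kind kind;
	uint64_t flag;		/* illustrative bit value, 0 for DEMO_FLAG_NONE */
};

/* One table for everything that -O and -R used to list separately. */
const struct demo_feature demo_features[] = {
	{ "no-holes",		DEMO_FLAG_INCOMPAT,	1ULL << 9 },
	{ "free-space-tree",	DEMO_FLAG_COMPAT_RO,	1ULL << 0 },
	{ "block-group-tree",	DEMO_FLAG_COMPAT_RO,	1ULL << 5 },
	{ "quota",		DEMO_FLAG_NONE,		0 },
};

#define DEMO_NR_FEATURES (sizeof(demo_features) / sizeof(demo_features[0]))

struct demo_selection {
	uint64_t incompat;
	uint64_t compat_ro;
	uint64_t enabled;	/* bit i set: demo_features[i] was requested */
};

/*
 * Handle one feature name, regardless of whether it came from -O or from
 * the deprecated -R. Returns 0 on success, -1 for an unknown name.
 */
int demo_select_feature(struct demo_selection *sel, const char *name)
{
	size_t i;

	assert(sel && name);
	for (i = 0; i < DEMO_NR_FEATURES; i++) {
		const struct demo_feature *f = &demo_features[i];

		if (strcmp(f->name, name) != 0)
			continue;
		sel->enabled |= 1ULL << i;
		if (f->kind == DEMO_FLAG_COMPAT_RO)
			sel->compat_ro |= f->flag;
		else if (f->kind == DEMO_FLAG_INCOMPAT)
			sel->incompat |= f->flag;
		return 0;
	}
	return -1;
}
```

With a layout like this, the option parser would only need to route both
-O and -R arguments into the same lookup, and quota fits in as the one
entry with no flag.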

Thanks,
Qu


* Re: [PATCH v3 3/5] btrfs-progs: separate block group tree from extent tree v2
  2022-09-02  9:37           ` Qu Wenruo
@ 2022-09-02 12:10             ` David Sterba
  0 siblings, 0 replies; 16+ messages in thread
From: David Sterba @ 2022-09-02 12:10 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: dsterba, Qu Wenruo, linux-btrfs

On Fri, Sep 02, 2022 at 05:37:53PM +0800, Qu Wenruo wrote:
> > Yeah, the free-space-tree is misplaced and I did not realize that back
> > then. That something is possible to switch on at run time by a mount
> > option should not be the only condition to put the option to the -R option.
> > 
> > Quota are maybe still a good example of the runtime feature, there's a
> > command to enable and disable it. There are additional structures
> > created or deleted but it's not something fundamental. The distinction
> > in the options should hint at what's the type "what if I don't select
> > this now, can I turn it on later?", perhaps documentation should be more
> > explicit about that.
> 
> Quota tree is a special case, just because it's from day-one, thus no 
> compat/compat ro/incompat flags needed at all.
> 
> To me, we can accept one exception.
> 
> > 
> > For compatibility we need to keep free-space-tree under -R but we can
> > add an alias to -O and everything of that sort add there too, like the
> > block group tree.
> 
> That's simple, make -R deprecated, and treat -R just as -O internally, 
> and put all features including quota into -O.
> 
> Of course, we may need some small changes, as now one fs feature needs 1 
> or 0 compat/compat ro/incompat flags set.
> But everything else, from the compat/safe/default string can be 
> inherited from the existing format.
> 
> By this, we have the minimal code change, while still keeps the same 
> compatibility (in fact, greatly enlarged -O options)

It's a change to the interface so it always has some consequences, but
I think a single option for features is indeed an improvement. We now
have only 2 under -R so it's not that bad yet.

I've looked at the manual pages of other filesystems' mkfs; there are
separate options, but only for specific features like the journal, or
for additional tunables. For the global features there's one option.

We can add quota and f-s-tree to -O in a minor release, as that doesn't
break compatibility, and add a warning to -R in some future major
release.


* Re: [PATCH v3 3/5] btrfs-progs: separate block group tree from extent tree v2
  2022-08-09  6:03 ` [PATCH v3 3/5] btrfs-progs: separate block group tree from extent tree v2 Qu Wenruo
  2022-08-31 19:14   ` David Sterba
@ 2022-10-03 14:48   ` Anand Jain
  2022-10-03 23:28     ` Qu Wenruo
  1 sibling, 1 reply; 16+ messages in thread
From: Anand Jain @ 2022-10-03 14:48 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs


This patch is causing regressions; mkfs with extent-tree-v2 now fails.


$ mkfs.btrfs -f -O block-group-tree  /dev/nvme0n1
btrfs-progs v5.19.1
See http://btrfs.wiki.kernel.org for more information.

ERROR: superblock magic doesn't match
ERROR: illegal nodesize 16384 (not equal to 4096 for mixed block group)



$ mkfs.btrfs -f -O extent-tree-v2  /dev/nvme0n1
btrfs-progs v5.19.1
See http://btrfs.wiki.kernel.org for more information.

ERROR: superblock magic doesn't match
NOTE: several default settings have changed in version 5.15, please make sure
       this does not affect your deployments:
       - DUP for metadata (-m dup)
       - enabled no-holes (-O no-holes)
       - enabled free-space-tree (-R free-space-tree)

Unable to find block group for 0
Unable to find block group for 0
Unable to find block group for 0
ERROR: no space to allocate metadata chunk
ERROR: failed to create default block groups: -28




On 09/08/2022 14:03, Qu Wenruo wrote:
> Block group tree feature is completely a standalone feature, and it has
> been over 5 years before the initial introduction to solve the long
> mount time.
> 
> I don't really want to waste another 5 years waiting for a feature which
> may or may not work, but definitely not properly reviewed for its
> preparation patches.
> 
> So this patch will separate the block group tree feature into a
> standalone compat RO feature.
> 
> There is a catch, in mkfs create_block_group_tree(), current
> tree-checker only accepts block group item with valid chunk_objectid,
> but the existing code from extent-tree-v2 didn't properly initialize it.
> 
> This patch will also fix above mentioned problem so kernel can mount it
> correctly.
> 
> Now mkfs/fsck should be able to handle the fs with block group tree.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>   check/main.c               |  8 ++------
>   common/fsfeatures.c        |  8 ++++++++
>   common/fsfeatures.h        |  2 ++
>   kernel-shared/ctree.h      |  9 ++++++++-
>   kernel-shared/disk-io.c    |  4 ++--
>   kernel-shared/disk-io.h    |  2 +-
>   kernel-shared/print-tree.c |  5 ++---
>   mkfs/common.c              | 31 ++++++++++++++++++++++++-------
>   mkfs/main.c                |  3 ++-
>   9 files changed, 51 insertions(+), 21 deletions(-)
> 
> diff --git a/check/main.c b/check/main.c
> index 4f7ab8b29309..02abbd5289f9 100644
> --- a/check/main.c
> +++ b/check/main.c
> @@ -6293,7 +6293,7 @@ static int check_type_with_root(u64 rootid, u8 key_type)
>   			goto err;
>   		break;
>   	case BTRFS_BLOCK_GROUP_ITEM_KEY:
> -		if (btrfs_fs_incompat(gfs_info, EXTENT_TREE_V2)) {
> +		if (btrfs_fs_compat_ro(gfs_info, BLOCK_GROUP_TREE)) {
>   			if (rootid != BTRFS_BLOCK_GROUP_TREE_OBJECTID)
>   				goto err;
>   		} else if (rootid != BTRFS_EXTENT_TREE_OBJECTID) {
> @@ -9071,10 +9071,6 @@ again:
>   	ret = load_super_root(&normal_trees, gfs_info->chunk_root);
>   	if (ret < 0)
>   		goto out;
> -	ret = load_super_root(&normal_trees, gfs_info->block_group_root);
> -	if (ret < 0)
> -		goto out;
> -
>   	ret = parse_tree_roots(&normal_trees, &dropping_trees);
>   	if (ret < 0)
>   		goto out;
> @@ -9574,7 +9570,7 @@ again:
>   	 * If we are extent tree v2 then we can reint the block group root as
>   	 * well.
>   	 */
> -	if (btrfs_fs_incompat(gfs_info, EXTENT_TREE_V2)) {
> +	if (btrfs_fs_compat_ro(gfs_info, BLOCK_GROUP_TREE)) {
>   		ret = btrfs_fsck_reinit_root(trans, gfs_info->block_group_root);
>   		if (ret) {
>   			fprintf(stderr, "block group initialization failed\n");
> diff --git a/common/fsfeatures.c b/common/fsfeatures.c
> index 23a92c21a2cc..90704959b13b 100644
> --- a/common/fsfeatures.c
> +++ b/common/fsfeatures.c
> @@ -172,6 +172,14 @@ static const struct btrfs_feature runtime_features[] = {
>   		VERSION_TO_STRING2(safe, 4,9),
>   		VERSION_TO_STRING2(default, 5,15),
>   		.desc		= "free space tree (space_cache=v2)"
> +	}, {
> +		.name		= "block-group-tree",
> +		.flag		= BTRFS_RUNTIME_FEATURE_BLOCK_GROUP_TREE,
> +		.sysfs_name = "block_group_tree",
> +		VERSION_TO_STRING2(compat, 6,0),
> +		VERSION_NULL(safe),
> +		VERSION_NULL(default),
> +		.desc		= "block group tree to reduce mount time"
>   	},
>   	/* Keep this one last */
>   	{
> diff --git a/common/fsfeatures.h b/common/fsfeatures.h
> index 9e39c667b900..a8d77fd4da05 100644
> --- a/common/fsfeatures.h
> +++ b/common/fsfeatures.h
> @@ -45,6 +45,8 @@
>   
>   #define BTRFS_RUNTIME_FEATURE_QUOTA		(1ULL << 0)
>   #define BTRFS_RUNTIME_FEATURE_FREE_SPACE_TREE	(1ULL << 1)
> +#define BTRFS_RUNTIME_FEATURE_BLOCK_GROUP_TREE	(1ULL << 2)
> +
>   
>   void btrfs_list_all_fs_features(u64 mask_disallowed);
>   void btrfs_list_all_runtime_features(u64 mask_disallowed);
> diff --git a/kernel-shared/ctree.h b/kernel-shared/ctree.h
> index c12076202577..d8909b3fdf20 100644
> --- a/kernel-shared/ctree.h
> +++ b/kernel-shared/ctree.h
> @@ -479,6 +479,12 @@ BUILD_ASSERT(sizeof(struct btrfs_super_block) == BTRFS_SUPER_INFO_SIZE);
>    */
>   #define BTRFS_FEATURE_COMPAT_RO_FREE_SPACE_TREE_VALID	(1ULL << 1)
>   
> +/*
> + * Save all block group items into a dedicated block group tree, to greatly
> + * reduce mount time for large fs.
> + */
> +#define BTRFS_FEATURE_COMPAT_RO_BLOCK_GROUP_TREE	(1ULL << 5)
> +
>   #define BTRFS_FEATURE_INCOMPAT_MIXED_BACKREF	(1ULL << 0)
>   #define BTRFS_FEATURE_INCOMPAT_DEFAULT_SUBVOL	(1ULL << 1)
>   #define BTRFS_FEATURE_INCOMPAT_MIXED_GROUPS	(1ULL << 2)
> @@ -508,7 +514,8 @@ BUILD_ASSERT(sizeof(struct btrfs_super_block) == BTRFS_SUPER_INFO_SIZE);
>    */
>   #define BTRFS_FEATURE_COMPAT_RO_SUPP			\
>   	(BTRFS_FEATURE_COMPAT_RO_FREE_SPACE_TREE |	\
> -	 BTRFS_FEATURE_COMPAT_RO_FREE_SPACE_TREE_VALID)
> +	 BTRFS_FEATURE_COMPAT_RO_FREE_SPACE_TREE_VALID| \
> +	 BTRFS_FEATURE_COMPAT_RO_BLOCK_GROUP_TREE)
>   
>   #if EXPERIMENTAL
>   #define BTRFS_FEATURE_INCOMPAT_SUPP			\
> diff --git a/kernel-shared/disk-io.c b/kernel-shared/disk-io.c
> index 80db5976cc3f..6eeb5ecd1d59 100644
> --- a/kernel-shared/disk-io.c
> +++ b/kernel-shared/disk-io.c
> @@ -1203,7 +1203,7 @@ static int load_important_roots(struct btrfs_fs_info *fs_info,
>   		backup = sb->super_roots + index;
>   	}
>   
> -	if (!btrfs_fs_incompat(fs_info, EXTENT_TREE_V2)) {
> +	if (!btrfs_fs_compat_ro(fs_info, BLOCK_GROUP_TREE)) {
>   		free(fs_info->block_group_root);
>   		fs_info->block_group_root = NULL;
>   		goto tree_root;
> @@ -1256,7 +1256,7 @@ int btrfs_setup_all_roots(struct btrfs_fs_info *fs_info, u64 root_tree_bytenr,
>   	if (ret)
>   		return ret;
>   
> -	if (btrfs_fs_incompat(fs_info, EXTENT_TREE_V2)) {
> +	if (btrfs_fs_compat_ro(fs_info, BLOCK_GROUP_TREE)) {
>   		ret = find_and_setup_root(root, fs_info,
>   				BTRFS_BLOCK_GROUP_TREE_OBJECTID,
>   				fs_info->block_group_root);
> diff --git a/kernel-shared/disk-io.h b/kernel-shared/disk-io.h
> index bba97fc1a814..6c8eaa2bd13d 100644
> --- a/kernel-shared/disk-io.h
> +++ b/kernel-shared/disk-io.h
> @@ -232,7 +232,7 @@ int btrfs_global_root_insert(struct btrfs_fs_info *fs_info,
>   static inline struct btrfs_root *btrfs_block_group_root(
>   						struct btrfs_fs_info *fs_info)
>   {
> -	if (btrfs_fs_incompat(fs_info, EXTENT_TREE_V2))
> +	if (btrfs_fs_compat_ro(fs_info, BLOCK_GROUP_TREE))
>   		return fs_info->block_group_root;
>   	return btrfs_extent_root(fs_info, 0);
>   }
> diff --git a/kernel-shared/print-tree.c b/kernel-shared/print-tree.c
> index bffe30b405c7..b2ee77c2fb73 100644
> --- a/kernel-shared/print-tree.c
> +++ b/kernel-shared/print-tree.c
> @@ -1668,6 +1668,7 @@ struct readable_flag_entry {
>   static struct readable_flag_entry compat_ro_flags_array[] = {
>   	DEF_COMPAT_RO_FLAG_ENTRY(FREE_SPACE_TREE),
>   	DEF_COMPAT_RO_FLAG_ENTRY(FREE_SPACE_TREE_VALID),
> +	DEF_COMPAT_RO_FLAG_ENTRY(BLOCK_GROUP_TREE),
>   };
>   static const int compat_ro_flags_num = sizeof(compat_ro_flags_array) /
>   				       sizeof(struct readable_flag_entry);
> @@ -1754,9 +1755,7 @@ static void print_readable_compat_ro_flag(u64 flag)
>   	 */
>   	return __print_readable_flag(flag, compat_ro_flags_array,
>   				     compat_ro_flags_num,
> -				     BTRFS_FEATURE_COMPAT_RO_SUPP |
> -				     BTRFS_FEATURE_COMPAT_RO_FREE_SPACE_TREE |
> -				     BTRFS_FEATURE_COMPAT_RO_FREE_SPACE_TREE_VALID);
> +				     BTRFS_FEATURE_COMPAT_RO_SUPP);
>   }
>   
>   static void print_readable_incompat_flag(u64 flag)
> diff --git a/mkfs/common.c b/mkfs/common.c
> index b72338551dfb..cb616f13ef9b 100644
> --- a/mkfs/common.c
> +++ b/mkfs/common.c
> @@ -75,6 +75,8 @@ static int btrfs_create_tree_root(int fd, struct btrfs_mkfs_config *cfg,
>   	int blk;
>   	int i;
>   	u8 uuid[BTRFS_UUID_SIZE];
> +	bool block_group_tree = !!(cfg->runtime_features &
> +				   BTRFS_RUNTIME_FEATURE_BLOCK_GROUP_TREE);
>   
>   	memset(buf->data + sizeof(struct btrfs_header), 0,
>   		cfg->nodesize - sizeof(struct btrfs_header));
> @@ -101,6 +103,9 @@ static int btrfs_create_tree_root(int fd, struct btrfs_mkfs_config *cfg,
>   		if (blk == MKFS_ROOT_TREE || blk == MKFS_CHUNK_TREE)
>   			continue;
>   
> +		if (!block_group_tree && blk == MKFS_BLOCK_GROUP_TREE)
> +			continue;
> +
>   		btrfs_set_root_bytenr(&root_item, cfg->blocks[blk]);
>   		btrfs_set_disk_key_objectid(&disk_key,
>   			reference_root_table[blk]);
> @@ -216,7 +221,8 @@ static int create_block_group_tree(int fd, struct btrfs_mkfs_config *cfg,
>   
>   	memset(buf->data + sizeof(struct btrfs_header), 0,
>   		cfg->nodesize - sizeof(struct btrfs_header));
> -	write_block_group_item(buf, 0, bg_offset, bg_size, bg_used, 0,
> +	write_block_group_item(buf, 0, bg_offset, bg_size, bg_used,
> +			       BTRFS_FIRST_CHUNK_TREE_OBJECTID,
>   			       cfg->leaf_data_size -
>   			       sizeof(struct btrfs_block_group_item));
>   	btrfs_set_header_bytenr(buf, cfg->blocks[MKFS_BLOCK_GROUP_TREE]);
> @@ -357,6 +363,7 @@ int make_btrfs(int fd, struct btrfs_mkfs_config *cfg)
>   	u32 array_size;
>   	u32 item_size;
>   	u64 total_used = 0;
> +	u64 ro_flags = 0;
>   	int skinny_metadata = !!(cfg->features &
>   				 BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA);
>   	u64 num_bytes;
> @@ -365,6 +372,8 @@ int make_btrfs(int fd, struct btrfs_mkfs_config *cfg)
>   	bool add_block_group = true;
>   	bool free_space_tree = !!(cfg->runtime_features &
>   				  BTRFS_RUNTIME_FEATURE_FREE_SPACE_TREE);
> +	bool block_group_tree = !!(cfg->runtime_features &
> +				   BTRFS_RUNTIME_FEATURE_BLOCK_GROUP_TREE);
>   	bool extent_tree_v2 = !!(cfg->features &
>   				 BTRFS_FEATURE_INCOMPAT_EXTENT_TREE_V2);
>   
> @@ -372,8 +381,13 @@ int make_btrfs(int fd, struct btrfs_mkfs_config *cfg)
>   	       sizeof(enum btrfs_mkfs_block) * ARRAY_SIZE(default_blocks));
>   	blocks_nr = ARRAY_SIZE(default_blocks);
>   
> -	/* Extent tree v2 needs an extra block for block group tree.*/
> -	if (extent_tree_v2) {
> +	/*
> +	 * Add one new block for block group tree.
> +	 * And for block group tree, we don't need to add block group item
> +	 * into extent tree, the item will be handled in block group tree
> +	 * initialization.
> +	 */
> +	if (block_group_tree) {
>   		mkfs_blocks_add(blocks, &blocks_nr, MKFS_BLOCK_GROUP_TREE);
>   		add_block_group = false;
>   	}
> @@ -433,12 +447,15 @@ int make_btrfs(int fd, struct btrfs_mkfs_config *cfg)
>   		btrfs_set_super_cache_generation(&super, -1);
>   	btrfs_set_super_incompat_flags(&super, cfg->features);
>   	if (free_space_tree) {
> -		u64 ro_flags = BTRFS_FEATURE_COMPAT_RO_FREE_SPACE_TREE |
> -			BTRFS_FEATURE_COMPAT_RO_FREE_SPACE_TREE_VALID;
> +		ro_flags |= (BTRFS_FEATURE_COMPAT_RO_FREE_SPACE_TREE |
> +			     BTRFS_FEATURE_COMPAT_RO_FREE_SPACE_TREE_VALID);
>   
> -		btrfs_set_super_compat_ro_flags(&super, ro_flags);
>   		btrfs_set_super_cache_generation(&super, 0);
>   	}
> +	if (block_group_tree)
> +		ro_flags |= BTRFS_FEATURE_COMPAT_RO_BLOCK_GROUP_TREE;
> +	btrfs_set_super_compat_ro_flags(&super, ro_flags);
> +
>   	if (extent_tree_v2)
>   		btrfs_set_super_nr_global_roots(&super, 1);
>   
> @@ -695,7 +712,7 @@ int make_btrfs(int fd, struct btrfs_mkfs_config *cfg)
>   			goto out;
>   	}
>   
> -	if (extent_tree_v2) {
> +	if (block_group_tree) {
>   		ret = create_block_group_tree(fd, cfg, buf,
>   					      system_group_offset,
>   					      system_group_size, total_used);
> diff --git a/mkfs/main.c b/mkfs/main.c
> index ce096d362171..518ce0fd7523 100644
> --- a/mkfs/main.c
> +++ b/mkfs/main.c
> @@ -299,7 +299,8 @@ static int recow_roots(struct btrfs_trans_handle *trans,
>   	ret = __recow_root(trans, info->dev_root);
>   	if (ret)
>   		return ret;
> -        if (btrfs_fs_incompat(info, EXTENT_TREE_V2)) {
> +
> +	if (btrfs_fs_compat_ro(info, BLOCK_GROUP_TREE)) {
>   		ret = __recow_root(trans, info->block_group_root);
>   		if (ret)
>   			return ret;



* Re: [PATCH v3 3/5] btrfs-progs: separate block group tree from extent tree v2
  2022-10-03 14:48   ` Anand Jain
@ 2022-10-03 23:28     ` Qu Wenruo
  2022-10-04  0:05       ` Qu Wenruo
  0 siblings, 1 reply; 16+ messages in thread
From: Qu Wenruo @ 2022-10-03 23:28 UTC (permalink / raw)
  To: Anand Jain, Qu Wenruo; +Cc: linux-btrfs



On 2022/10/3 22:48, Anand Jain wrote:
> 
> This patch is causing regressions; now can't mkfs with extent-tree-v2.

I'm already looking at it.

> 
> 
> $ mkfs.btrfs -f -O block-group-tree  /dev/nvme0n1
> btrfs-progs v5.19.1
> See http://btrfs.wiki.kernel.org for more information.
> 
> ERROR: superblock magic doesn't match
> ERROR: illegal nodesize 16384 (not equal to 4096 for mixed block group)
> 
> 
> 
> $ mkfs.btrfs -f -O extent-tree-v2  /dev/nvme0n1
> btrfs-progs v5.19.1
> See http://btrfs.wiki.kernel.org for more information.
> 
> ERROR: superblock magic doesn't match
> NOTE: several default settings have changed in version 5.15, please make sure
>        this does not affect your deployments:
>        - DUP for metadata (-m dup)
>        - enabled no-holes (-O no-holes)
>        - enabled free-space-tree (-R free-space-tree)
> 
> Unable to find block group for 0
> Unable to find block group for 0
> Unable to find block group for 0
> ERROR: no space to allocate metadata chunk
> ERROR: failed to create default block groups: -28
> 
> 
> 
> 
> On 09/08/2022 14:03, Qu Wenruo wrote:
>> Block group tree feature is completely a standalone feature, and it has
>> been over 5 years before the initial introduction to solve the long
>> mount time.
>>
>> I don't really want to waste another 5 years waiting for a feature which
>> may or may not work, but definitely not properly reviewed for its
>> preparation patches.
>>
>> So this patch will separate the block group tree feature into a
>> standalone compat RO feature.
>>
>> There is a catch, in mkfs create_block_group_tree(), current
>> tree-checker only accepts block group item with valid chunk_objectid,
>> but the existing code from extent-tree-v2 didn't properly initialize it.
>>
>> This patch will also fix above mentioned problem so kernel can mount it
>> correctly.
>>
>> Now mkfs/fsck should be able to handle the fs with block group tree.
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>>   check/main.c               |  8 ++------
>>   common/fsfeatures.c        |  8 ++++++++
>>   common/fsfeatures.h        |  2 ++
>>   kernel-shared/ctree.h      |  9 ++++++++-
>>   kernel-shared/disk-io.c    |  4 ++--
>>   kernel-shared/disk-io.h    |  2 +-
>>   kernel-shared/print-tree.c |  5 ++---
>>   mkfs/common.c              | 31 ++++++++++++++++++++++++-------
>>   mkfs/main.c                |  3 ++-
>>   9 files changed, 51 insertions(+), 21 deletions(-)
>>
>> diff --git a/check/main.c b/check/main.c
>> index 4f7ab8b29309..02abbd5289f9 100644
>> --- a/check/main.c
>> +++ b/check/main.c
>> @@ -6293,7 +6293,7 @@ static int check_type_with_root(u64 rootid, u8 key_type)
>>               goto err;
>>           break;
>>       case BTRFS_BLOCK_GROUP_ITEM_KEY:
>> -        if (btrfs_fs_incompat(gfs_info, EXTENT_TREE_V2)) {
>> +        if (btrfs_fs_compat_ro(gfs_info, BLOCK_GROUP_TREE)) {
>>               if (rootid != BTRFS_BLOCK_GROUP_TREE_OBJECTID)
>>                   goto err;
>>           } else if (rootid != BTRFS_EXTENT_TREE_OBJECTID) {
>> @@ -9071,10 +9071,6 @@ again:
>>       ret = load_super_root(&normal_trees, gfs_info->chunk_root);
>>       if (ret < 0)
>>           goto out;
>> -    ret = load_super_root(&normal_trees, gfs_info->block_group_root);
>> -    if (ret < 0)
>> -        goto out;
>> -
>>       ret = parse_tree_roots(&normal_trees, &dropping_trees);
>>       if (ret < 0)
>>           goto out;
>> @@ -9574,7 +9570,7 @@ again:
>>        * If we are extent tree v2 then we can reint the block group root as
>>        * well.
>>        */
>> -    if (btrfs_fs_incompat(gfs_info, EXTENT_TREE_V2)) {
>> +    if (btrfs_fs_compat_ro(gfs_info, BLOCK_GROUP_TREE)) {
>>           ret = btrfs_fsck_reinit_root(trans, gfs_info->block_group_root);
>>           if (ret) {
>>               fprintf(stderr, "block group initialization failed\n");
>> diff --git a/common/fsfeatures.c b/common/fsfeatures.c
>> index 23a92c21a2cc..90704959b13b 100644
>> --- a/common/fsfeatures.c
>> +++ b/common/fsfeatures.c
>> @@ -172,6 +172,14 @@ static const struct btrfs_feature runtime_features[] = {
>>           VERSION_TO_STRING2(safe, 4,9),
>>           VERSION_TO_STRING2(default, 5,15),
>>           .desc        = "free space tree (space_cache=v2)"
>> +    }, {
>> +        .name        = "block-group-tree",
>> +        .flag        = BTRFS_RUNTIME_FEATURE_BLOCK_GROUP_TREE,
>> +        .sysfs_name = "block_group_tree",
>> +        VERSION_TO_STRING2(compat, 6,0),
>> +        VERSION_NULL(safe),
>> +        VERSION_NULL(default),
>> +        .desc        = "block group tree to reduce mount time"
>>       },
>>       /* Keep this one last */
>>       {
>> diff --git a/common/fsfeatures.h b/common/fsfeatures.h
>> index 9e39c667b900..a8d77fd4da05 100644
>> --- a/common/fsfeatures.h
>> +++ b/common/fsfeatures.h
>> @@ -45,6 +45,8 @@
>>   #define BTRFS_RUNTIME_FEATURE_QUOTA        (1ULL << 0)
>>   #define BTRFS_RUNTIME_FEATURE_FREE_SPACE_TREE    (1ULL << 1)
>> +#define BTRFS_RUNTIME_FEATURE_BLOCK_GROUP_TREE    (1ULL << 2)
>> +
>>   void btrfs_list_all_fs_features(u64 mask_disallowed);
>>   void btrfs_list_all_runtime_features(u64 mask_disallowed);
>> diff --git a/kernel-shared/ctree.h b/kernel-shared/ctree.h
>> index c12076202577..d8909b3fdf20 100644
>> --- a/kernel-shared/ctree.h
>> +++ b/kernel-shared/ctree.h
>> @@ -479,6 +479,12 @@ BUILD_ASSERT(sizeof(struct btrfs_super_block) == BTRFS_SUPER_INFO_SIZE);
>>    */
>>   #define BTRFS_FEATURE_COMPAT_RO_FREE_SPACE_TREE_VALID    (1ULL << 1)
>> +/*
>> + * Save all block group items into a dedicated block group tree, to greatly
>> + * reduce mount time for large fs.
>> + */
>> +#define BTRFS_FEATURE_COMPAT_RO_BLOCK_GROUP_TREE    (1ULL << 5)
>> +
>>   #define BTRFS_FEATURE_INCOMPAT_MIXED_BACKREF    (1ULL << 0)
>>   #define BTRFS_FEATURE_INCOMPAT_DEFAULT_SUBVOL    (1ULL << 1)
>>   #define BTRFS_FEATURE_INCOMPAT_MIXED_GROUPS    (1ULL << 2)
>> @@ -508,7 +514,8 @@ BUILD_ASSERT(sizeof(struct btrfs_super_block) == BTRFS_SUPER_INFO_SIZE);
>>    */
>>   #define BTRFS_FEATURE_COMPAT_RO_SUPP            \
>>       (BTRFS_FEATURE_COMPAT_RO_FREE_SPACE_TREE |    \
>> -     BTRFS_FEATURE_COMPAT_RO_FREE_SPACE_TREE_VALID)
>> +     BTRFS_FEATURE_COMPAT_RO_FREE_SPACE_TREE_VALID| \
>> +     BTRFS_FEATURE_COMPAT_RO_BLOCK_GROUP_TREE)
>>   #if EXPERIMENTAL
>>   #define BTRFS_FEATURE_INCOMPAT_SUPP            \
>> diff --git a/kernel-shared/disk-io.c b/kernel-shared/disk-io.c
>> index 80db5976cc3f..6eeb5ecd1d59 100644
>> --- a/kernel-shared/disk-io.c
>> +++ b/kernel-shared/disk-io.c
>> @@ -1203,7 +1203,7 @@ static int load_important_roots(struct btrfs_fs_info *fs_info,
>>           backup = sb->super_roots + index;
>>       }
>> -    if (!btrfs_fs_incompat(fs_info, EXTENT_TREE_V2)) {
>> +    if (!btrfs_fs_compat_ro(fs_info, BLOCK_GROUP_TREE)) {
>>           free(fs_info->block_group_root);
>>           fs_info->block_group_root = NULL;
>>           goto tree_root;
>> @@ -1256,7 +1256,7 @@ int btrfs_setup_all_roots(struct btrfs_fs_info *fs_info, u64 root_tree_bytenr,
>>       if (ret)
>>           return ret;
>> -    if (btrfs_fs_incompat(fs_info, EXTENT_TREE_V2)) {
>> +    if (btrfs_fs_compat_ro(fs_info, BLOCK_GROUP_TREE)) {
>>           ret = find_and_setup_root(root, fs_info,
>>                   BTRFS_BLOCK_GROUP_TREE_OBJECTID,
>>                   fs_info->block_group_root);
>> diff --git a/kernel-shared/disk-io.h b/kernel-shared/disk-io.h
>> index bba97fc1a814..6c8eaa2bd13d 100644
>> --- a/kernel-shared/disk-io.h
>> +++ b/kernel-shared/disk-io.h
>> @@ -232,7 +232,7 @@ int btrfs_global_root_insert(struct btrfs_fs_info *fs_info,
>>   static inline struct btrfs_root *btrfs_block_group_root(
>>                           struct btrfs_fs_info *fs_info)
>>   {
>> -    if (btrfs_fs_incompat(fs_info, EXTENT_TREE_V2))
>> +    if (btrfs_fs_compat_ro(fs_info, BLOCK_GROUP_TREE))
>>           return fs_info->block_group_root;
>>       return btrfs_extent_root(fs_info, 0);
>>   }
>> diff --git a/kernel-shared/print-tree.c b/kernel-shared/print-tree.c
>> index bffe30b405c7..b2ee77c2fb73 100644
>> --- a/kernel-shared/print-tree.c
>> +++ b/kernel-shared/print-tree.c
>> @@ -1668,6 +1668,7 @@ struct readable_flag_entry {
>>   static struct readable_flag_entry compat_ro_flags_array[] = {
>>       DEF_COMPAT_RO_FLAG_ENTRY(FREE_SPACE_TREE),
>>       DEF_COMPAT_RO_FLAG_ENTRY(FREE_SPACE_TREE_VALID),
>> +    DEF_COMPAT_RO_FLAG_ENTRY(BLOCK_GROUP_TREE),
>>   };
>>   static const int compat_ro_flags_num = sizeof(compat_ro_flags_array) /
>>                          sizeof(struct readable_flag_entry);
>> @@ -1754,9 +1755,7 @@ static void print_readable_compat_ro_flag(u64 flag)
>>        */
>>       return __print_readable_flag(flag, compat_ro_flags_array,
>>                        compat_ro_flags_num,
>> -                     BTRFS_FEATURE_COMPAT_RO_SUPP |
>> -                     BTRFS_FEATURE_COMPAT_RO_FREE_SPACE_TREE |
>> -                     BTRFS_FEATURE_COMPAT_RO_FREE_SPACE_TREE_VALID);
>> +                     BTRFS_FEATURE_COMPAT_RO_SUPP);
>>   }
>>   static void print_readable_incompat_flag(u64 flag)
>> diff --git a/mkfs/common.c b/mkfs/common.c
>> index b72338551dfb..cb616f13ef9b 100644
>> --- a/mkfs/common.c
>> +++ b/mkfs/common.c
>> @@ -75,6 +75,8 @@ static int btrfs_create_tree_root(int fd, struct btrfs_mkfs_config *cfg,
>>       int blk;
>>       int i;
>>       u8 uuid[BTRFS_UUID_SIZE];
>> +    bool block_group_tree = !!(cfg->runtime_features &
>> +                   BTRFS_RUNTIME_FEATURE_BLOCK_GROUP_TREE);
>>       memset(buf->data + sizeof(struct btrfs_header), 0,
>>           cfg->nodesize - sizeof(struct btrfs_header));
>> @@ -101,6 +103,9 @@ static int btrfs_create_tree_root(int fd, struct btrfs_mkfs_config *cfg,
>>           if (blk == MKFS_ROOT_TREE || blk == MKFS_CHUNK_TREE)
>>               continue;
>> +        if (!block_group_tree && blk == MKFS_BLOCK_GROUP_TREE)
>> +            continue;
>> +
>>           btrfs_set_root_bytenr(&root_item, cfg->blocks[blk]);
>>           btrfs_set_disk_key_objectid(&disk_key,
>>               reference_root_table[blk]);
>> @@ -216,7 +221,8 @@ static int create_block_group_tree(int fd, struct btrfs_mkfs_config *cfg,
>>       memset(buf->data + sizeof(struct btrfs_header), 0,
>>           cfg->nodesize - sizeof(struct btrfs_header));
>> -    write_block_group_item(buf, 0, bg_offset, bg_size, bg_used, 0,
>> +    write_block_group_item(buf, 0, bg_offset, bg_size, bg_used,
>> +                   BTRFS_FIRST_CHUNK_TREE_OBJECTID,
>>                      cfg->leaf_data_size -
>>                      sizeof(struct btrfs_block_group_item));
>>       btrfs_set_header_bytenr(buf, cfg->blocks[MKFS_BLOCK_GROUP_TREE]);
>> @@ -357,6 +363,7 @@ int make_btrfs(int fd, struct btrfs_mkfs_config *cfg)
>>       u32 array_size;
>>       u32 item_size;
>>       u64 total_used = 0;
>> +    u64 ro_flags = 0;
>>       int skinny_metadata = !!(cfg->features &
>>                    BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA);
>>       u64 num_bytes;
>> @@ -365,6 +372,8 @@ int make_btrfs(int fd, struct btrfs_mkfs_config *cfg)
>>       bool add_block_group = true;
>>       bool free_space_tree = !!(cfg->runtime_features &
>>                     BTRFS_RUNTIME_FEATURE_FREE_SPACE_TREE);
>> +    bool block_group_tree = !!(cfg->runtime_features &
>> +                   BTRFS_RUNTIME_FEATURE_BLOCK_GROUP_TREE);
>>       bool extent_tree_v2 = !!(cfg->features &
>>                    BTRFS_FEATURE_INCOMPAT_EXTENT_TREE_V2);
>> @@ -372,8 +381,13 @@ int make_btrfs(int fd, struct btrfs_mkfs_config *cfg)
>>              sizeof(enum btrfs_mkfs_block) * ARRAY_SIZE(default_blocks));
>>       blocks_nr = ARRAY_SIZE(default_blocks);
>> -    /* Extent tree v2 needs an extra block for block group tree.*/
>> -    if (extent_tree_v2) {
>> +    /*
>> +     * Add one new block for block group tree.
>> +     * And for block group tree, we don't need to add block group item
>> +     * into extent tree, the item will be handled in block group tree
>> +     * initialization.
>> +     */
>> +    if (block_group_tree) {
>>           mkfs_blocks_add(blocks, &blocks_nr, MKFS_BLOCK_GROUP_TREE);
>>           add_block_group = false;
>>       }
>> @@ -433,12 +447,15 @@ int make_btrfs(int fd, struct btrfs_mkfs_config *cfg)
>>           btrfs_set_super_cache_generation(&super, -1);
>>       btrfs_set_super_incompat_flags(&super, cfg->features);
>>       if (free_space_tree) {
>> -        u64 ro_flags = BTRFS_FEATURE_COMPAT_RO_FREE_SPACE_TREE |
>> -            BTRFS_FEATURE_COMPAT_RO_FREE_SPACE_TREE_VALID;
>> +        ro_flags |= (BTRFS_FEATURE_COMPAT_RO_FREE_SPACE_TREE |
>> +                 BTRFS_FEATURE_COMPAT_RO_FREE_SPACE_TREE_VALID);
>> -        btrfs_set_super_compat_ro_flags(&super, ro_flags);
>>           btrfs_set_super_cache_generation(&super, 0);
>>       }
>> +    if (block_group_tree)
>> +        ro_flags |= BTRFS_FEATURE_COMPAT_RO_BLOCK_GROUP_TREE;
>> +    btrfs_set_super_compat_ro_flags(&super, ro_flags);
>> +
>>       if (extent_tree_v2)
>>           btrfs_set_super_nr_global_roots(&super, 1);
>> @@ -695,7 +712,7 @@ int make_btrfs(int fd, struct btrfs_mkfs_config *cfg)
>>               goto out;
>>       }
>> -    if (extent_tree_v2) {
>> +    if (block_group_tree) {
>>           ret = create_block_group_tree(fd, cfg, buf,
>>                             system_group_offset,
>>                             system_group_size, total_used);
>> diff --git a/mkfs/main.c b/mkfs/main.c
>> index ce096d362171..518ce0fd7523 100644
>> --- a/mkfs/main.c
>> +++ b/mkfs/main.c
>> @@ -299,7 +299,8 @@ static int recow_roots(struct btrfs_trans_handle *trans,
>>       ret = __recow_root(trans, info->dev_root);
>>       if (ret)
>>           return ret;
>> -        if (btrfs_fs_incompat(info, EXTENT_TREE_V2)) {
>> +
>> +    if (btrfs_fs_compat_ro(info, BLOCK_GROUP_TREE)) {
>>           ret = __recow_root(trans, info->block_group_root);
>>           if (ret)
>>               return ret;
> 


* Re: [PATCH v3 3/5] btrfs-progs: separate block group tree from extent tree v2
  2022-10-03 23:28     ` Qu Wenruo
@ 2022-10-04  0:05       ` Qu Wenruo
  0 siblings, 0 replies; 16+ messages in thread
From: Qu Wenruo @ 2022-10-04  0:05 UTC (permalink / raw)
  To: Anand Jain, Qu Wenruo; +Cc: linux-btrfs



On 2022/10/4 07:28, Qu Wenruo wrote:
> 
> 
> On 2022/10/3 22:48, Anand Jain wrote:
>>
>> This patch is causing regressions; now can't mkfs with extent-tree-v2.
> 
> I'm already looking at it.

It's more complex than a simple regression.

Firstly, commit "btrfs-progs: prepare merging compat feature lists"
tries to merge the -O and -R options, which is a good idea.

The problem is, we're still just using the initial u64 number for
btrfs_parse_fs_features(), from which we expect to get simple u64 bit
flags.

But unfortunately this means the u64 will have conflicting bits for
compat_ro and incompat flags.

For the block group tree case, the flag is 1<<2 in compat_ro, while
1<<2 in incompat is mixed bg.

Thus we trigger the problem.

I'll rework the merge patch to avoid the problem.
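
As a rough illustration of the collision (a minimal sketch, not the
actual btrfs-progs code): the two flag values are the ones quoted in this
thread, while parse_merged_features() and seen_as_mixed_groups() are
hypothetical helpers that only model a parser returning one u64 and a
caller that interprets that u64 as incompat flags.

```c
#include <stdint.h>

/* Bit values as quoted in this thread. */
#define BTRFS_FEATURE_INCOMPAT_MIXED_GROUPS    (1ULL << 2)
#define BTRFS_RUNTIME_FEATURE_BLOCK_GROUP_TREE (1ULL << 2)

/*
 * Hypothetical stand-in for a merged -O/-R parser: every requested
 * feature ends up in the same u64, no matter which flag set
 * (incompat or compat_ro/runtime) it belongs to.
 */
static uint64_t parse_merged_features(int want_block_group_tree)
{
	uint64_t flags = 0;

	if (want_block_group_tree)
		flags |= BTRFS_RUNTIME_FEATURE_BLOCK_GROUP_TREE;
	return flags;
}

/*
 * Hypothetical caller that treats the returned u64 as incompat flags.
 * Because both features use bit 2, a block-group-tree request is
 * indistinguishable from a mixed-block-groups request here.
 */
static int seen_as_mixed_groups(uint64_t flags)
{
	return (flags & BTRFS_FEATURE_INCOMPAT_MIXED_GROUPS) != 0;
}
```

With these definitions, seen_as_mixed_groups(parse_merged_features(1))
is nonzero, which would be consistent with the "mixed block group"
nodesize error in the report above.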

Thanks,
Qu



end of thread, other threads:[~2022-10-04  0:06 UTC | newest]

Thread overview: 16+ messages
2022-08-09  6:03 [PATCH v3 0/5] btrfs-progs: separate BLOCK_GROUP_TREE feature from extent-tree-v2 Qu Wenruo
2022-08-09  6:03 ` [PATCH v3 1/5] btrfs-progs: mkfs: dynamically modify mkfs blocks array Qu Wenruo
2022-08-09  6:03 ` [PATCH v3 2/5] btrfs-progs: don't save block group root into super block Qu Wenruo
2022-08-09  6:03 ` [PATCH v3 3/5] btrfs-progs: separate block group tree from extent tree v2 Qu Wenruo
2022-08-31 19:14   ` David Sterba
2022-08-31 21:43     ` Qu Wenruo
2022-09-01 12:15       ` Qu Wenruo
2022-09-02  9:21         ` David Sterba
2022-09-02  9:37           ` Qu Wenruo
2022-09-02 12:10             ` David Sterba
2022-10-03 14:48   ` Anand Jain
2022-10-03 23:28     ` Qu Wenruo
2022-10-04  0:05       ` Qu Wenruo
2022-08-09  6:03 ` [PATCH v3 4/5] btrfs-progs: btrfstune: add the ability to convert to block group tree feature Qu Wenruo
2022-08-09  6:03 ` [PATCH v3 5/5] btrfs-progs: mkfs: add artificial dependency for block group tree Qu Wenruo
2022-08-31 18:26 ` [PATCH v3 0/5] btrfs-progs: separate BLOCK_GROUP_TREE feature from extent-tree-v2 David Sterba
