linux-btrfs.vger.kernel.org archive mirror
* [PATCH 00/16] btrfs: make open_ctree() init/exit sequence strictly matched
@ 2022-09-22  0:06 Qu Wenruo
  2022-09-22  0:06 ` [PATCH 01/16] btrfs: make btrfs module init/exit match their sequence Qu Wenruo
                   ` (15 more replies)
  0 siblings, 16 replies; 18+ messages in thread
From: Qu Wenruo @ 2022-09-22  0:06 UTC (permalink / raw)
  To: linux-btrfs, Johannes Thumshirn, David Sterba

Just like init_btrfs_fs(), open_ctree() also has tons of different
labels for its error handling.

And unsurprisingly the error handling labels are not matched correctly,
e.g. we always call btrfs_mapping_tree_free() even if we never reached
the sys chunk array read.

And every time we need to add some new function, it is a struggle just
to figure out where the new call should go and how its error handling
should be done.

This patchset will follow the init_btrfs_fs() approach by introducing
an open_ctree_seq[] array, which contains the following sections:

- btree_inode init/exit
- super block read and verification
- mount options and features check
- workqueues init/exit
- chunk tree init/exit
- tree roots init/exit
- mount time check and various item load
- sysfs init/exit
- block group tree init/exit
- subvolume trees init/exit
- kthread init/exit
- qgroup init/exit

The remaining part of open_ctree() is less than 50 lines, all related
to the very end of the mount process, including log replay and the
uuid tree check.
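
For reference, here is a rough sketch of the end state this series aims
for.  The first four entries use the helper names introduced in the
patches below; the remaining entries are only summarized in a comment,
so treat this as an illustration rather than the final code:

    struct init_sequence {
        int (*init_func)(struct btrfs_fs_info *fs_info);
        void (*exit_func)(struct btrfs_fs_info *fs_info);
    };

    static const struct init_sequence open_ctree_seq[] = {
        { open_ctree_btree_inode_init, open_ctree_btree_inode_exit },
        { open_ctree_super_init,       NULL },
        { open_ctree_features_init,    NULL },
        { btrfs_init_workqueues,       btrfs_stop_all_workers },
        /*
         * Chunk tree, tree roots, sysfs, block groups, subvolume
         * trees, kthread and qgroup entries follow in later patches.
         */
    };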

Also, to allow better testing, DEBUG builds get a new mount option,
"fail_mount=%u", to make open_ctree() fail at a given stage of
open_ctree_seq[].

Unfortunately since that mount option can only be parsed in
open_ctree_features_init(), we can only fail after stage 2.
But this should still provide much better test coverage.
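
As a rough illustration only (the fail_mount_stage field and the exact
error code are made up here, the real implementation lives in the last
patch), the per-stage injection inside the open_ctree_seq[] loop could
look like this:

    for (i = 0; i < ARRAY_SIZE(open_ctree_seq); i++) {
    #ifdef CONFIG_BTRFS_DEBUG
        /* Hypothetical field storing the parsed fail_mount=%u value. */
        if (fs_info->fail_mount_stage == i + 1) {
            ret = -EINVAL;
            goto fail;
        }
    #endif
        ret = open_ctree_seq[i].init_func(fs_info);
        if (ret < 0)
            goto fail;
        open_ctree_res[i] = true;
    }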

To David:

This series is going to conflict with the regression fix on the block
group tree ("btrfs: loosen the block-group-tree feature dependency
check"), please let me rebase before merging.

To Johannes:

I'm not 100% sure about the zoned code, so some review/advice would be
much appreciated.

Maybe I can further extract all the zoned code into an init/exit pair?

Qu Wenruo (16):
  btrfs: make btrfs module init/exit match their sequence
  btrfs: initialize fs_info->sb at the very beginning of open_ctree()
  btrfs: remove @fs_devices argument from open_ctree()
  btrfs: extract btree inode init code into its own init/exit helpers
  btrfs: extract super block read code into its own init helper
  btrfs: extract mount options and features init code into its own init
    helper
  btrfs: move btrfs_init_workqueues() and btrfs_stop_all_workers() into
    open_ctree_seq[]
  btrfs: extract chunk tree read code into its own init/exit helpers
  btrfs: extract tree roots and zone info initialization into init/exit
    helpers
  btrfs: extract mount time checks and items load code into its init
    helper
  btrfs: extract sysfs init into its own helper
  btrfs: extract block groups read code into its own init/exit helpers
  btrfs: move the fs root related code into its own init/exit helpers
  btrfs: extract kthread code into its own init/exit helpers
  btrfs: move qgroup init/exit code into open_ctree_seq[] array
  btrfs: introduce a debug mount option to do error injection for each
    stage of open_ctree()

 fs/btrfs/compression.c |   3 +-
 fs/btrfs/compression.h |   2 +-
 fs/btrfs/ctree.h       |   7 +
 fs/btrfs/disk-io.c     | 660 ++++++++++++++++++++++++++---------------
 fs/btrfs/disk-io.h     |   4 +-
 fs/btrfs/props.c       |   3 +-
 fs/btrfs/props.h       |   2 +-
 fs/btrfs/super.c       | 229 +++++++-------
 8 files changed, 557 insertions(+), 353 deletions(-)

-- 
2.37.3



* [PATCH 01/16] btrfs: make btrfs module init/exit match their sequence
  2022-09-22  0:06 [PATCH 00/16] btrfs: make open_ctree() init/exit sequence strictly matched Qu Wenruo
@ 2022-09-22  0:06 ` Qu Wenruo
  2022-09-22  0:06 ` [PATCH 02/16] btrfs: initialize fs_info->sb at the very beginning of open_ctree() Qu Wenruo
                   ` (14 subsequent siblings)
  15 siblings, 0 replies; 18+ messages in thread
From: Qu Wenruo @ 2022-09-22  0:06 UTC (permalink / raw)
  To: linux-btrfs

[BACKGROUND]
In theory init_btrfs_fs() and exit_btrfs_fs() should mirror each
other's sequence, thus normally they should look like this:

    init_btrfs_fs()   |   exit_btrfs_fs()
----------------------+------------------------
    init_A();         |
    init_B();         |
    init_C();         |
                      |   exit_C();
                      |   exit_B();
                      |   exit_A();

The same applies to the error path of init_btrfs_fs().

But that's not the case: some exit functions don't follow the sequence
of their init functions in init_btrfs_fs().

Furthermore, init_btrfs_fs() needs a new error label for every init
function we add.
This does not scale well, especially as we may soon add several new
functions to init_btrfs_fs().

[ENHANCEMENT]
This patch introduces the following to improve the situation:

- struct init_sequence
  Just a wrapper of init and exit function pointers.

  The init function must return an int, thus some init functions need
  to be updated to return 0.

  The exit function can be NULL, as some init steps only output a
  message and need no cleanup.

- struct mod_init_seq[] array
  This is a const array, recording all the initialization we need to do
  in init_btrfs_fs(), and the order follows the old init_btrfs_fs().

  The only change in the order is that btrfs_print_mod_info() is now
  called after the sanity checks, as it makes no sense to print the
  module info and then fail the sanity tests.

- bool mod_init_result[] array
  This is a bool array, recording whether the corresponding entry in
  mod_init_seq[] has been initialized.

  The reason to split mod_init_seq[] and mod_init_result[] into two
  arrays is to avoid a section mismatch reference.

  All init functions are in .init.text, but if mod_init_seq[] also
  recorded an @initialized member it could no longer be const, thus
  would end up in the .data section and cause a modpost warning.
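
  To illustrate that point (this is not code from the patch, just a
  hypothetical variant showing why the split is needed):

      struct init_sequence {
          int (*init_func)(void);
          void (*exit_func)(void);
          bool initialized;       /* mutable state inside the table */
      };

      /*
       * With a mutable member the array could no longer be const, so
       * it would be placed in .data while its function pointers
       * reference .init.text, which is exactly the reference modpost
       * complains about.  Keeping the table const and tracking the
       * progress in a separate plain bool array avoids that.
       */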

In init_btrfs_fs() we just call all init functions in their order in
the mod_init_seq[] array, and after each call set the corresponding
mod_init_result[] entry to true.

In exit_btrfs_fs() and in the error handling path of init_btrfs_fs(),
we iterate mod_init_seq[] in reverse order and skip all uninitialized
entries.

With this patch, init_btrfs_fs()/exit_btrfs_fs() will be much easier to
expand and will always follow the strict order.
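
Outside the kernel context, the same table-driven init/exit pattern
boils down to the following self-contained sketch (plain userspace C,
purely illustrative, not btrfs code): only the steps that actually
finished get torn down, and always in reverse order.

    #include <stdbool.h>
    #include <stdio.h>

    static int  init_a(void) { puts("init A"); return 0; }
    static void exit_a(void) { puts("exit A"); }
    static int  init_b(void) { puts("init B"); return 0; }
    static int  init_c(void) { puts("init C (fails)"); return -1; }
    static void exit_c(void) { puts("exit C"); }

    struct init_sequence {
        int (*init_func)(void);
        void (*exit_func)(void);    /* may be NULL */
    };

    static const struct init_sequence seq[] = {
        { init_a, exit_a },
        { init_b, NULL },           /* B needs no cleanup */
        { init_c, exit_c },
    };
    static bool done[sizeof(seq) / sizeof(seq[0])];

    /* Tear down in reverse order, skipping steps that never finished. */
    static void teardown(void)
    {
        for (int i = (int)(sizeof(seq) / sizeof(seq[0])) - 1; i >= 0; i--) {
            if (!done[i])
                continue;
            if (seq[i].exit_func)
                seq[i].exit_func();
            done[i] = false;
        }
    }

    int main(void)
    {
        for (size_t i = 0; i < sizeof(seq) / sizeof(seq[0]); i++) {
            if (seq[i].init_func() < 0) {
                teardown();     /* prints "exit A" only */
                return 1;
            }
            done[i] = true;
        }
        teardown();
        return 0;
    }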

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
Changelog:
RFC->v1:
- Change the mod_init_seq[] array to static const
---
 fs/btrfs/compression.c |   3 +-
 fs/btrfs/compression.h |   2 +-
 fs/btrfs/props.c       |   3 +-
 fs/btrfs/props.h       |   2 +-
 fs/btrfs/super.c       | 211 +++++++++++++++++++++--------------------
 5 files changed, 113 insertions(+), 108 deletions(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 54caa00a2245..8d3e3218fe37 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -1232,12 +1232,13 @@ int btrfs_decompress(int type, unsigned char *data_in, struct page *dest_page,
 	return ret;
 }
 
-void __init btrfs_init_compress(void)
+int __init btrfs_init_compress(void)
 {
 	btrfs_init_workspace_manager(BTRFS_COMPRESS_NONE);
 	btrfs_init_workspace_manager(BTRFS_COMPRESS_ZLIB);
 	btrfs_init_workspace_manager(BTRFS_COMPRESS_LZO);
 	zstd_init_workspace_manager();
+	return 0;
 }
 
 void __cold btrfs_exit_compress(void)
diff --git a/fs/btrfs/compression.h b/fs/btrfs/compression.h
index 1aa02903de69..9da2343eeff5 100644
--- a/fs/btrfs/compression.h
+++ b/fs/btrfs/compression.h
@@ -77,7 +77,7 @@ static inline unsigned int btrfs_compress_level(unsigned int type_level)
 	return ((type_level & 0xF0) >> 4);
 }
 
-void __init btrfs_init_compress(void);
+int __init btrfs_init_compress(void);
 void __cold btrfs_exit_compress(void);
 
 int btrfs_compress_pages(unsigned int type_level, struct address_space *mapping,
diff --git a/fs/btrfs/props.c b/fs/btrfs/props.c
index 055a631276ce..abee92eb1ed6 100644
--- a/fs/btrfs/props.c
+++ b/fs/btrfs/props.c
@@ -453,7 +453,7 @@ int btrfs_inode_inherit_props(struct btrfs_trans_handle *trans,
 	return 0;
 }
 
-void __init btrfs_props_init(void)
+int __init btrfs_props_init(void)
 {
 	int i;
 
@@ -463,5 +463,6 @@ void __init btrfs_props_init(void)
 
 		hash_add(prop_handlers_ht, &p->node, h);
 	}
+	return 0;
 }
 
diff --git a/fs/btrfs/props.h b/fs/btrfs/props.h
index ca9dd3df129b..6e283196e38a 100644
--- a/fs/btrfs/props.h
+++ b/fs/btrfs/props.h
@@ -8,7 +8,7 @@
 
 #include "ctree.h"
 
-void __init btrfs_props_init(void);
+int __init btrfs_props_init(void);
 
 int btrfs_set_prop(struct btrfs_trans_handle *trans, struct inode *inode,
 		   const char *name, const char *value, size_t value_len,
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index be7df8d1d5b8..79dae3b0ff91 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -2691,7 +2691,7 @@ static __cold void btrfs_interface_exit(void)
 	misc_deregister(&btrfs_misc);
 }
 
-static void __init btrfs_print_mod_info(void)
+static int __init btrfs_print_mod_info(void)
 {
 	static const char options[] = ""
 #ifdef CONFIG_BTRFS_DEBUG
@@ -2718,122 +2718,125 @@ static void __init btrfs_print_mod_info(void)
 #endif
 			;
 	pr_info("Btrfs loaded, crc32c=%s%s\n", crc32c_impl(), options);
+	return 0;
 }
 
-static int __init init_btrfs_fs(void)
+static int register_btrfs(void)
 {
-	int err;
-
-	btrfs_props_init();
-
-	err = btrfs_init_sysfs();
-	if (err)
-		return err;
-
-	btrfs_init_compress();
-
-	err = btrfs_init_cachep();
-	if (err)
-		goto free_compress;
-
-	err = extent_state_init_cachep();
-	if (err)
-		goto free_cachep;
-
-	err = extent_buffer_init_cachep();
-	if (err)
-		goto free_extent_cachep;
-
-	err = btrfs_bioset_init();
-	if (err)
-		goto free_eb_cachep;
-
-	err = extent_map_init();
-	if (err)
-		goto free_bioset;
-
-	err = ordered_data_init();
-	if (err)
-		goto free_extent_map;
-
-	err = btrfs_delayed_inode_init();
-	if (err)
-		goto free_ordered_data;
-
-	err = btrfs_auto_defrag_init();
-	if (err)
-		goto free_delayed_inode;
+	return register_filesystem(&btrfs_fs_type);
+}
+static void unregister_btrfs(void)
+{
+	unregister_filesystem(&btrfs_fs_type);
+}
 
-	err = btrfs_delayed_ref_init();
-	if (err)
-		goto free_auto_defrag;
+/* Helper structure for long init/exit functions. */
+struct init_sequence {
+	int (*init_func)(void);
 
-	err = btrfs_prelim_ref_init();
-	if (err)
-		goto free_delayed_ref;
+	/* Can be NULL if the init_func doesn't need cleanup. */
+	void (*exit_func)(void);
+};
 
-	err = btrfs_interface_init();
-	if (err)
-		goto free_prelim_ref;
+static const struct init_sequence mod_init_seq[] = {
+	{
+		.init_func = btrfs_props_init,
+		.exit_func = NULL,
+	}, {
+		.init_func = btrfs_init_sysfs,
+		.exit_func = btrfs_exit_sysfs,
+	}, {
+		.init_func = btrfs_init_compress,
+		.exit_func = btrfs_exit_compress,
+	}, {
+		.init_func = btrfs_init_cachep,
+		.exit_func = btrfs_destroy_cachep,
+	}, {
+		.init_func = extent_state_init_cachep,
+		.exit_func = extent_state_free_cachep,
+	}, {
+		.init_func = extent_buffer_init_cachep,
+		.exit_func = extent_buffer_free_cachep,
+	}, {
+		.init_func = btrfs_bioset_init,
+		.exit_func = btrfs_bioset_exit,
+	}, {
+		.init_func = extent_map_init,
+		.exit_func = extent_map_exit,
+	}, {
+		.init_func = ordered_data_init,
+		.exit_func = ordered_data_exit,
+	}, {
+		.init_func = btrfs_delayed_inode_init,
+		.exit_func = btrfs_delayed_inode_exit,
+	}, {
+		.init_func = btrfs_auto_defrag_init,
+		.exit_func = btrfs_auto_defrag_exit,
+	}, {
+		.init_func = btrfs_delayed_ref_init,
+		.exit_func = btrfs_delayed_ref_exit,
+	}, {
+		.init_func = btrfs_prelim_ref_init,
+		.exit_func = btrfs_prelim_ref_exit,
+	}, {
+		.init_func = btrfs_interface_init,
+		.exit_func = btrfs_interface_exit,
+	}, {
+		.init_func = btrfs_run_sanity_tests,
+		.exit_func = NULL,
+	}, {
+		.init_func = btrfs_print_mod_info,
+		.exit_func = NULL,
+	}, {
+		.init_func = register_btrfs,
+		.exit_func = unregister_btrfs,
+	}
+};
 
-	btrfs_print_mod_info();
+static bool mod_init_result[ARRAY_SIZE(mod_init_seq)];
 
-	err = btrfs_run_sanity_tests();
-	if (err)
-		goto unregister_ioctl;
+static void __exit exit_btrfs_fs(void)
+{
+	int i;
 
-	err = register_filesystem(&btrfs_fs_type);
-	if (err)
-		goto unregister_ioctl;
+	for (i = ARRAY_SIZE(mod_init_seq) - 1; i >= 0; i--) {
+		if (!mod_init_result[i])
+			continue;
+		if (mod_init_seq[i].exit_func)
+			mod_init_seq[i].exit_func();
+		mod_init_result[i] = false;
+	}
+}
 
+static int __init init_btrfs_fs(void)
+{
+	int ret;
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(mod_init_seq); i++) {
+		ASSERT(!mod_init_result[i]);
+		ret = mod_init_seq[i].init_func();
+		if (ret < 0)
+			goto error;
+		mod_init_result[i] = true;
+	}
 	return 0;
 
-unregister_ioctl:
-	btrfs_interface_exit();
-free_prelim_ref:
-	btrfs_prelim_ref_exit();
-free_delayed_ref:
-	btrfs_delayed_ref_exit();
-free_auto_defrag:
-	btrfs_auto_defrag_exit();
-free_delayed_inode:
-	btrfs_delayed_inode_exit();
-free_ordered_data:
-	ordered_data_exit();
-free_extent_map:
-	extent_map_exit();
-free_bioset:
-	btrfs_bioset_exit();
-free_eb_cachep:
-	extent_buffer_free_cachep();
-free_extent_cachep:
-	extent_state_free_cachep();
-free_cachep:
-	btrfs_destroy_cachep();
-free_compress:
-	btrfs_exit_compress();
-	btrfs_exit_sysfs();
-
-	return err;
-}
+error:
+	/*
+	 * If we call exit_btrfs_fs() it would cause section mismatch.
+	 * As init_btrfs_fs() belongs to .init.text, while exit_btrfs_fs()
+	 * belongs to .exit.text.
+	 */
+	for (i = ARRAY_SIZE(mod_init_seq) - 1; i >= 0; i--) {
+		if (!mod_init_result[i])
+			continue;
+		if (mod_init_seq[i].exit_func)
+			mod_init_seq[i].exit_func();
+		mod_init_result[i] = false;
+	}
+	return ret;
 
-static void __exit exit_btrfs_fs(void)
-{
-	btrfs_destroy_cachep();
-	btrfs_delayed_ref_exit();
-	btrfs_auto_defrag_exit();
-	btrfs_delayed_inode_exit();
-	btrfs_prelim_ref_exit();
-	ordered_data_exit();
-	extent_map_exit();
-	btrfs_bioset_exit();
-	extent_state_free_cachep();
-	extent_buffer_free_cachep();
-	btrfs_interface_exit();
-	unregister_filesystem(&btrfs_fs_type);
-	btrfs_exit_sysfs();
-	btrfs_cleanup_fs_uuids();
-	btrfs_exit_compress();
 }
 
 late_initcall(init_btrfs_fs);
-- 
2.37.3



* [PATCH 02/16] btrfs: initialize fs_info->sb at the very beginning of open_ctree()
  2022-09-22  0:06 [PATCH 00/16] btrfs: make open_ctree() init/exit sequence strictly matched Qu Wenruo
  2022-09-22  0:06 ` [PATCH 01/16] btrfs: make btrfs module init/exit match their sequence Qu Wenruo
@ 2022-09-22  0:06 ` Qu Wenruo
  2022-09-22  0:06 ` [PATCH 03/16] btrfs: remove @fs_devices argument from open_ctree() Qu Wenruo
                   ` (13 subsequent siblings)
  15 siblings, 0 replies; 18+ messages in thread
From: Qu Wenruo @ 2022-09-22  0:06 UTC (permalink / raw)
  To: linux-btrfs

Currently at open_ctree(), sb->s_fs_info is already initialized to the
fs_info we want.

On the other hand, fs_info->sb is not initialized until
init_mount_fs_info().

This patch will initialize fs_info->sb at the very beginning of
open_ctree(), so later code can use fs_info->sb to grab the super block.

This not only removes the @sb parameter from init_mount_fs_info(), but
also provides the basis for the later open_ctree() refactor, which
requires everything to be accessible from a single @fs_info pointer.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c | 31 ++++++++++++++++---------------
 1 file changed, 16 insertions(+), 15 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index a887fe67a2a0..c4a8e684ee53 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3096,13 +3096,12 @@ void btrfs_init_fs_info(struct btrfs_fs_info *fs_info)
 	INIT_WORK(&fs_info->reclaim_bgs_work, btrfs_reclaim_bgs_work);
 }
 
-static int init_mount_fs_info(struct btrfs_fs_info *fs_info, struct super_block *sb)
+static int init_mount_fs_info(struct btrfs_fs_info *fs_info)
 {
 	int ret;
 
-	fs_info->sb = sb;
-	sb->s_blocksize = BTRFS_BDEV_BLOCKSIZE;
-	sb->s_blocksize_bits = blksize_bits(BTRFS_BDEV_BLOCKSIZE);
+	fs_info->sb->s_blocksize = BTRFS_BDEV_BLOCKSIZE;
+	fs_info->sb->s_blocksize_bits = blksize_bits(BTRFS_BDEV_BLOCKSIZE);
 
 	ret = percpu_counter_init(&fs_info->ordered_bytes, 0, GFP_KERNEL);
 	if (ret)
@@ -3130,7 +3129,7 @@ static int init_mount_fs_info(struct btrfs_fs_info *fs_info, struct super_block
 		return -ENOMEM;
 	btrfs_init_delayed_root(fs_info->delayed_root);
 
-	if (sb_rdonly(sb))
+	if (sb_rdonly(fs_info->sb))
 		set_bit(BTRFS_FS_STATE_RO, &fs_info->fs_state);
 
 	return btrfs_alloc_stripe_hash_table(fs_info);
@@ -3308,7 +3307,9 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
 	int err = -EINVAL;
 	int level;
 
-	ret = init_mount_fs_info(fs_info, sb);
+	fs_info->sb = sb;
+
+	ret = init_mount_fs_info(fs_info);
 	if (ret) {
 		err = ret;
 		goto fail;
@@ -3326,7 +3327,7 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
 		goto fail;
 	}
 
-	fs_info->btree_inode = new_inode(sb);
+	fs_info->btree_inode = new_inode(fs_info->sb);
 	if (!fs_info->btree_inode) {
 		err = -ENOMEM;
 		goto fail;
@@ -3434,7 +3435,7 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
 	fs_info->csums_per_leaf = BTRFS_MAX_ITEM_SIZE(fs_info) / fs_info->csum_size;
 	fs_info->stripesize = stripesize;
 
-	ret = btrfs_parse_options(fs_info, options, sb->s_flags);
+	ret = btrfs_parse_options(fs_info, options, fs_info->sb->s_flags);
 	if (ret) {
 		err = ret;
 		goto fail_alloc;
@@ -3484,7 +3485,7 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
 
 	features = btrfs_super_compat_ro_flags(disk_super) &
 		~BTRFS_FEATURE_COMPAT_RO_SUPP;
-	if (!sb_rdonly(sb) && features) {
+	if (!sb_rdonly(fs_info->sb) && features) {
 		btrfs_err(fs_info,
 	"cannot mount read-write because of unsupported optional features (0x%llx)",
 		       features);
@@ -3536,12 +3537,12 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
 		goto fail_sb_buffer;
 	}
 
-	sb->s_bdi->ra_pages *= btrfs_super_num_devices(disk_super);
-	sb->s_bdi->ra_pages = max(sb->s_bdi->ra_pages, SZ_4M / PAGE_SIZE);
+	fs_info->sb->s_bdi->ra_pages *= btrfs_super_num_devices(disk_super);
+	fs_info->sb->s_bdi->ra_pages = max(fs_info->sb->s_bdi->ra_pages, SZ_4M / PAGE_SIZE);
 
-	sb->s_blocksize = sectorsize;
-	sb->s_blocksize_bits = blksize_bits(sectorsize);
-	memcpy(&sb->s_uuid, fs_info->fs_devices->fsid, BTRFS_FSID_SIZE);
+	fs_info->sb->s_blocksize = sectorsize;
+	fs_info->sb->s_blocksize_bits = blksize_bits(sectorsize);
+	memcpy(&fs_info->sb->s_uuid, fs_info->fs_devices->fsid, BTRFS_FSID_SIZE);
 
 	mutex_lock(&fs_info->chunk_mutex);
 	ret = btrfs_read_sys_array(fs_info);
@@ -3738,7 +3739,7 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
 		goto fail_qgroup;
 	}
 
-	if (sb_rdonly(sb))
+	if (sb_rdonly(fs_info->sb))
 		goto clear_oneshot;
 
 	ret = btrfs_start_pre_rw_mount(fs_info);
-- 
2.37.3



* [PATCH 03/16] btrfs: remove @fs_devices argument from open_ctree()
  2022-09-22  0:06 [PATCH 00/16] btrfs: make open_ctree() init/exit sequence strictly matched Qu Wenruo
  2022-09-22  0:06 ` [PATCH 01/16] btrfs: make btrfs module init/exit match their sequence Qu Wenruo
  2022-09-22  0:06 ` [PATCH 02/16] btrfs: initialize fs_info->sb at the very beginning of open_ctree() Qu Wenruo
@ 2022-09-22  0:06 ` Qu Wenruo
  2022-09-22  0:06 ` [PATCH 04/16] btrfs: extract btree inode init code into its own init/exit helpers Qu Wenruo
                   ` (12 subsequent siblings)
  15 siblings, 0 replies; 18+ messages in thread
From: Qu Wenruo @ 2022-09-22  0:06 UTC (permalink / raw)
  To: linux-btrfs

By the time open_ctree() is called, fs_info->fs_devices has already
been initialized, thus there is no need to pass a dedicated @fs_devices
argument into open_ctree().

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c | 7 +++++--
 fs/btrfs/disk-io.h | 4 +---
 fs/btrfs/super.c   | 5 ++---
 3 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index c4a8e684ee53..5fbf73daa388 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3290,8 +3290,7 @@ int btrfs_start_pre_rw_mount(struct btrfs_fs_info *fs_info)
 	return ret;
 }
 
-int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_devices,
-		      char *options)
+int __cold open_ctree(struct super_block *sb, char *options)
 {
 	u32 sectorsize;
 	u32 nodesize;
@@ -3301,6 +3300,7 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
 	u16 csum_type;
 	struct btrfs_super_block *disk_super;
 	struct btrfs_fs_info *fs_info = btrfs_sb(sb);
+	struct btrfs_fs_devices *fs_devices = fs_info->fs_devices;
 	struct btrfs_root *tree_root;
 	struct btrfs_root *chunk_root;
 	int ret;
@@ -3309,6 +3309,9 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
 
 	fs_info->sb = sb;
 
+	/* Caller should have already initialized fs_info->fs_devices. */
+	ASSERT(fs_info->fs_devices);
+
 	ret = init_mount_fs_info(fs_info);
 	if (ret) {
 		err = ret;
diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h
index 7e545ec09a10..b360ff633f40 100644
--- a/fs/btrfs/disk-io.h
+++ b/fs/btrfs/disk-io.h
@@ -42,9 +42,7 @@ struct extent_buffer *btrfs_find_create_tree_block(
 void btrfs_clean_tree_block(struct extent_buffer *buf);
 void btrfs_clear_oneshot_options(struct btrfs_fs_info *fs_info);
 int btrfs_start_pre_rw_mount(struct btrfs_fs_info *fs_info);
-int __cold open_ctree(struct super_block *sb,
-	       struct btrfs_fs_devices *fs_devices,
-	       char *options);
+int __cold open_ctree(struct super_block *sb, char *options);
 void __cold close_ctree(struct btrfs_fs_info *fs_info);
 int btrfs_validate_super(struct btrfs_fs_info *fs_info,
 			 struct btrfs_super_block *sb, int mirror_num);
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 79dae3b0ff91..9d86b7ef9e43 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1429,7 +1429,6 @@ static int get_default_subvol_objectid(struct btrfs_fs_info *fs_info, u64 *objec
 }
 
 static int btrfs_fill_super(struct super_block *sb,
-			    struct btrfs_fs_devices *fs_devices,
 			    void *data)
 {
 	struct inode *inode;
@@ -1458,7 +1457,7 @@ static int btrfs_fill_super(struct super_block *sb,
 		return err;
 	}
 
-	err = open_ctree(sb, fs_devices, (char *)data);
+	err = open_ctree(sb, (char *)data);
 	if (err) {
 		btrfs_err(fs_info, "open_ctree failed");
 		return err;
@@ -1826,7 +1825,7 @@ static struct dentry *btrfs_mount_root(struct file_system_type *fs_type,
 		btrfs_sb(s)->bdev_holder = fs_type;
 		if (!strstr(crc32c_impl(), "generic"))
 			set_bit(BTRFS_FS_CSUM_IMPL_FAST, &fs_info->flags);
-		error = btrfs_fill_super(s, fs_devices, data);
+		error = btrfs_fill_super(s, data);
 	}
 	if (!error)
 		error = security_sb_set_mnt_opts(s, new_sec_opts, 0, NULL);
-- 
2.37.3



* [PATCH 04/16] btrfs: extract btree inode init code into its own init/exit helpers
  2022-09-22  0:06 [PATCH 00/16] btrfs: make open_ctree() init/exit sequence strictly matched Qu Wenruo
                   ` (2 preceding siblings ...)
  2022-09-22  0:06 ` [PATCH 03/16] btrfs: remove @fs_devices argument from open_ctree() Qu Wenruo
@ 2022-09-22  0:06 ` Qu Wenruo
  2022-09-22  0:06 ` [PATCH 05/16] btrfs: extract super block read code into its own init helper Qu Wenruo
                   ` (11 subsequent siblings)
  15 siblings, 0 replies; 18+ messages in thread
From: Qu Wenruo @ 2022-09-22  0:06 UTC (permalink / raw)
  To: linux-btrfs

Just like how we handle init_btrfs_fs(), here we also use an array to
drive the open_ctree() sequence.

This patch takes the first step by introducing the arrays and putting
the btree_inode init/exit code into two helpers:

- open_ctree_btree_inode_init
- open_ctree_btree_inode_exit

[EXPOSED LABEL MISMATCH]
- Bad btrfs_mapping_tree_free() call
  Currently under the fail_alloc label, we not only call iput() on
  btree_inode but also free fs_info->mapping_tree, which is not correct.

  Since the first time we touch mapping_tree is at btrfs_read_sys_array(),
  the btrfs_mapping_tree_free() call is already a label mismatch.

  This will be addressed when we refactor the chunk tree init code.

- Bad invalidate_inode_pages2() call
  After initializing the btree inode, we should invalidate all the
  metadata page cache before we put the btree inode.

  But the old code calls invalidate_inode_pages2() before stopping all
  workers.

  This is addressed by this patch: since this exit helper is really the
  last one to be executed, it is safe to call invalidate_inode_pages2()
  and then iput().

[SPECIAL HANDLING]
If we hit an error after init_mount_fs_info(), we don't need to undo
its work, as the caller will call btrfs_free_fs_info() to free
everything anyway.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c | 116 ++++++++++++++++++++++++++++++++-------------
 1 file changed, 82 insertions(+), 34 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 5fbf73daa388..79507c384146 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3290,6 +3290,69 @@ int btrfs_start_pre_rw_mount(struct btrfs_fs_info *fs_info)
 	return ret;
 }
 
+static int open_ctree_btree_inode_init(struct btrfs_fs_info *fs_info)
+{
+	int ret;
+
+	/*
+	 * The members initialized inside init_mount_fs_info() will be
+	 * handled in btrfs_free_fs_info() by the caller.
+	 * Thus we don't need to handle its error here.
+	 */
+	ret = init_mount_fs_info(fs_info);
+	if (ret < 0)
+		return ret;
+
+	/*
+	 * These need to be init'ed before we start creating inodes and such.
+	 *
+	 * And this 
+	 */
+	fs_info->tree_root = btrfs_alloc_root(fs_info, BTRFS_ROOT_TREE_OBJECTID,
+					      GFP_KERNEL);
+	fs_info->chunk_root = btrfs_alloc_root(fs_info, BTRFS_CHUNK_TREE_OBJECTID,
+					       GFP_KERNEL);
+	if (!fs_info->tree_root || !fs_info->chunk_root)
+		goto enomem;
+	fs_info->btree_inode = new_inode(fs_info->sb);
+	if (!fs_info->btree_inode)
+		goto enomem;
+
+	mapping_set_gfp_mask(fs_info->btree_inode->i_mapping, GFP_NOFS);
+	btrfs_init_btree_inode(fs_info);
+	invalidate_bdev(fs_info->fs_devices->latest_dev->bdev);
+	return 0;
+enomem:
+	btrfs_put_root(fs_info->tree_root);
+	btrfs_put_root(fs_info->chunk_root);
+	fs_info->tree_root = NULL;
+	fs_info->chunk_root = NULL;
+	return -ENOMEM;
+}
+
+static void open_ctree_btree_inode_exit(struct btrfs_fs_info *fs_info)
+{
+	invalidate_inode_pages2(fs_info->btree_inode->i_mapping);
+	iput(fs_info->btree_inode);
+	btrfs_put_root(fs_info->tree_root);
+	btrfs_put_root(fs_info->chunk_root);
+	fs_info->tree_root = NULL;
+	fs_info->chunk_root = NULL;
+}
+
+struct init_sequence {
+	int (*init_func)(struct btrfs_fs_info *fs_info);
+	void (*exit_func)(struct btrfs_fs_info *fs_info);
+};
+
+static const struct init_sequence open_ctree_seq[] = {
+	{
+		.init_func = open_ctree_btree_inode_init,
+		.exit_func = open_ctree_btree_inode_exit,
+	}
+};
+
+
 int __cold open_ctree(struct super_block *sb, char *options)
 {
 	u32 sectorsize;
@@ -3298,47 +3361,26 @@ int __cold open_ctree(struct super_block *sb, char *options)
 	u64 generation;
 	u64 features;
 	u16 csum_type;
+	bool open_ctree_res[ARRAY_SIZE(open_ctree_seq)] = {0};
 	struct btrfs_super_block *disk_super;
 	struct btrfs_fs_info *fs_info = btrfs_sb(sb);
 	struct btrfs_fs_devices *fs_devices = fs_info->fs_devices;
-	struct btrfs_root *tree_root;
-	struct btrfs_root *chunk_root;
 	int ret;
 	int err = -EINVAL;
 	int level;
+	int i;
 
 	fs_info->sb = sb;
 
 	/* Caller should have already initialized fs_info->fs_devices. */
 	ASSERT(fs_info->fs_devices);
 
-	ret = init_mount_fs_info(fs_info);
-	if (ret) {
-		err = ret;
-		goto fail;
-	}
-
-	/* These need to be init'ed before we start creating inodes and such. */
-	tree_root = btrfs_alloc_root(fs_info, BTRFS_ROOT_TREE_OBJECTID,
-				     GFP_KERNEL);
-	fs_info->tree_root = tree_root;
-	chunk_root = btrfs_alloc_root(fs_info, BTRFS_CHUNK_TREE_OBJECTID,
-				      GFP_KERNEL);
-	fs_info->chunk_root = chunk_root;
-	if (!tree_root || !chunk_root) {
-		err = -ENOMEM;
-		goto fail;
-	}
-
-	fs_info->btree_inode = new_inode(fs_info->sb);
-	if (!fs_info->btree_inode) {
-		err = -ENOMEM;
-		goto fail;
+	for (i = 0; i < ARRAY_SIZE(open_ctree_seq); i++) {
+		ret = open_ctree_seq[i].init_func(fs_info);
+		if (ret < 0)
+			goto fail;
+		open_ctree_res[i] = true;
 	}
-	mapping_set_gfp_mask(fs_info->btree_inode->i_mapping, GFP_NOFS);
-	btrfs_init_btree_inode(fs_info);
-
-	invalidate_bdev(fs_devices->latest_dev->bdev);
 
 	/*
 	 * Read super block and check the signature bytes only
@@ -3557,14 +3599,15 @@ int __cold open_ctree(struct super_block *sb, char *options)
 
 	generation = btrfs_super_chunk_root_generation(disk_super);
 	level = btrfs_super_chunk_root_level(disk_super);
-	ret = load_super_root(chunk_root, btrfs_super_chunk_root(disk_super),
+	ret = load_super_root(fs_info->chunk_root,
+			      btrfs_super_chunk_root(disk_super),
 			      generation, level);
 	if (ret) {
 		btrfs_err(fs_info, "failed to read chunk root");
 		goto fail_tree_roots;
 	}
 
-	read_extent_buffer(chunk_root->node, fs_info->chunk_tree_uuid,
+	read_extent_buffer(fs_info->chunk_root->node, fs_info->chunk_tree_uuid,
 			   offsetof(struct btrfs_header, chunk_tree_uuid),
 			   BTRFS_UUID_SIZE);
 
@@ -3688,7 +3731,7 @@ int __cold open_ctree(struct super_block *sb, char *options)
 		goto fail_sysfs;
 
 	fs_info->transaction_kthread = kthread_run(transaction_kthread,
-						   tree_root,
+						   fs_info->tree_root,
 						   "btrfs-transaction");
 	if (IS_ERR(fs_info->transaction_kthread))
 		goto fail_cleaner;
@@ -3803,17 +3846,22 @@ int __cold open_ctree(struct super_block *sb, char *options)
 	if (fs_info->data_reloc_root)
 		btrfs_drop_and_free_fs_root(fs_info, fs_info->data_reloc_root);
 	free_root_pointers(fs_info, true);
-	invalidate_inode_pages2(fs_info->btree_inode->i_mapping);
 
 fail_sb_buffer:
 	btrfs_stop_all_workers(fs_info);
 	btrfs_free_block_groups(fs_info);
 fail_alloc:
 	btrfs_mapping_tree_free(&fs_info->mapping_tree);
-
-	iput(fs_info->btree_inode);
 fail:
+	for (i = ARRAY_SIZE(open_ctree_seq) - 1; i >= 0; i--) {
+		if (!open_ctree_res[i] || !open_ctree_seq[i].exit_func)
+			continue;
+		open_ctree_seq[i].exit_func(fs_info);
+		open_ctree_res[i] = false;
+	}
 	btrfs_close_devices(fs_info->fs_devices);
+	if (ret < 0)
+		err = ret;
 	return err;
 }
 ALLOW_ERROR_INJECTION(open_ctree, ERRNO);
-- 
2.37.3



* [PATCH 05/16] btrfs: extract super block read code into its own init helper
  2022-09-22  0:06 [PATCH 00/16] btrfs: make open_ctree() init/exit sequence strictly matched Qu Wenruo
                   ` (3 preceding siblings ...)
  2022-09-22  0:06 ` [PATCH 04/16] btrfs: extract btree inode init code into its own init/exit helpers Qu Wenruo
@ 2022-09-22  0:06 ` Qu Wenruo
  2022-09-22  0:06 ` [PATCH 06/16] btrfs: extract mount options and features init " Qu Wenruo
                   ` (10 subsequent siblings)
  15 siblings, 0 replies; 18+ messages in thread
From: Qu Wenruo @ 2022-09-22  0:06 UTC (permalink / raw)
  To: linux-btrfs

This patch will extract the super block read and cached members
(sectorsize/nodesize/etc) initialization into a helper,
open_ctree_super_init().

This extraction also makes the following non-functional changes:

- Add an error message for the super_root == 0 case
  Previously we just did a goto to fail_alloc, with ret == 0.
  This can be very confusing and would cause problems since we didn't
  finish the mount at all.

- Move sb->s_blocksize and sb->s_bdi initialization into the new helper
  Since at this stage we already have a valid super block and its
  sectorsize, we can directly initialize them here.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c | 203 +++++++++++++++++++++++----------------------
 1 file changed, 102 insertions(+), 101 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 79507c384146..f4c04cb1c0d6 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3340,56 +3340,18 @@ static void open_ctree_btree_inode_exit(struct btrfs_fs_info *fs_info)
 	fs_info->chunk_root = NULL;
 }
 
-struct init_sequence {
-	int (*init_func)(struct btrfs_fs_info *fs_info);
-	void (*exit_func)(struct btrfs_fs_info *fs_info);
-};
-
-static const struct init_sequence open_ctree_seq[] = {
-	{
-		.init_func = open_ctree_btree_inode_init,
-		.exit_func = open_ctree_btree_inode_exit,
-	}
-};
-
-
-int __cold open_ctree(struct super_block *sb, char *options)
+static int open_ctree_super_init(struct btrfs_fs_info *fs_info)
 {
-	u32 sectorsize;
+	struct btrfs_super_block *disk_super;
 	u32 nodesize;
-	u32 stripesize;
-	u64 generation;
-	u64 features;
+	u32 sectorsize;
 	u16 csum_type;
-	bool open_ctree_res[ARRAY_SIZE(open_ctree_seq)] = {0};
-	struct btrfs_super_block *disk_super;
-	struct btrfs_fs_info *fs_info = btrfs_sb(sb);
-	struct btrfs_fs_devices *fs_devices = fs_info->fs_devices;
 	int ret;
-	int err = -EINVAL;
-	int level;
-	int i;
-
-	fs_info->sb = sb;
-
-	/* Caller should have already initialized fs_info->fs_devices. */
-	ASSERT(fs_info->fs_devices);
-
-	for (i = 0; i < ARRAY_SIZE(open_ctree_seq); i++) {
-		ret = open_ctree_seq[i].init_func(fs_info);
-		if (ret < 0)
-			goto fail;
-		open_ctree_res[i] = true;
-	}
 
-	/*
-	 * Read super block and check the signature bytes only
-	 */
-	disk_super = btrfs_read_dev_super(fs_devices->latest_dev->bdev);
-	if (IS_ERR(disk_super)) {
-		err = PTR_ERR(disk_super);
-		goto fail_alloc;
-	}
+	/* Read super block and check the signature bytes only */
+	disk_super = btrfs_read_dev_super(fs_info->fs_devices->latest_dev->bdev);
+	if (IS_ERR(disk_super))
+		return PTR_ERR(disk_super);
 
 	/*
 	 * Verify the type first, if that or the checksum value are
@@ -3399,19 +3361,15 @@ int __cold open_ctree(struct super_block *sb, char *options)
 	if (!btrfs_supported_super_csum(csum_type)) {
 		btrfs_err(fs_info, "unsupported checksum algorithm: %u",
 			  csum_type);
-		err = -EINVAL;
-		btrfs_release_disk_super(disk_super);
-		goto fail_alloc;
+		ret = -EINVAL;
+		goto error;
 	}
 
 	fs_info->csum_size = btrfs_super_csum_size(disk_super);
 
 	ret = btrfs_init_csum_hash(fs_info, csum_type);
-	if (ret) {
-		err = ret;
-		btrfs_release_disk_super(disk_super);
-		goto fail_alloc;
-	}
+	if (ret)
+		goto error;
 
 	/*
 	 * We want to check superblock checksum, the type is stored inside.
@@ -3419,9 +3377,8 @@ int __cold open_ctree(struct super_block *sb, char *options)
 	 */
 	if (btrfs_check_super_csum(fs_info, (u8 *)disk_super)) {
 		btrfs_err(fs_info, "superblock checksum mismatch");
-		err = -EINVAL;
-		btrfs_release_disk_super(disk_super);
-		goto fail_alloc;
+		ret = -EINVAL;
+		goto error;
 	}
 
 	/*
@@ -3429,33 +3386,30 @@ int __cold open_ctree(struct super_block *sb, char *options)
 	 * following bytes up to INFO_SIZE, the checksum is calculated from
 	 * the whole block of INFO_SIZE
 	 */
-	memcpy(fs_info->super_copy, disk_super, sizeof(*fs_info->super_copy));
+	memcpy(fs_info->super_copy, disk_super, BTRFS_SUPER_INFO_SIZE);
 	btrfs_release_disk_super(disk_super);
 
-	disk_super = fs_info->super_copy;
-
-
-	features = btrfs_super_flags(disk_super);
-	if (features & BTRFS_SUPER_FLAG_CHANGING_FSID_V2) {
-		features &= ~BTRFS_SUPER_FLAG_CHANGING_FSID_V2;
-		btrfs_set_super_flags(disk_super, features);
+	if (btrfs_super_flags(fs_info->super_copy) &
+	    BTRFS_SUPER_FLAG_CHANGING_FSID_V2) {
+		btrfs_set_super_flags(fs_info->super_copy,
+				btrfs_super_flags(fs_info->super_copy) &
+				~BTRFS_SUPER_FLAG_CHANGING_FSID_V2);
 		btrfs_info(fs_info,
 			"found metadata UUID change in progress flag, clearing");
 	}
-
 	memcpy(fs_info->super_for_commit, fs_info->super_copy,
-	       sizeof(*fs_info->super_for_commit));
-
+	       BTRFS_SUPER_INFO_SIZE);
 	ret = btrfs_validate_mount_super(fs_info);
-	if (ret) {
+	if (ret < 0) {
 		btrfs_err(fs_info, "superblock contains fatal errors");
-		err = -EINVAL;
-		goto fail_alloc;
+		return -EINVAL;
 	}
 
-	if (!btrfs_super_root(disk_super))
-		goto fail_alloc;
-
+	if (!btrfs_super_root(fs_info->super_copy)) {
+		btrfs_err(fs_info,
+		"invalid super root bytenr, should have non-zero bytenr");
+		return -EINVAL;
+	}
 	/* check FS state, whether FS is broken. */
 	if (btrfs_super_flags(disk_super) & BTRFS_SUPER_FLAG_ERROR)
 		set_bit(BTRFS_FS_STATE_ERROR, &fs_info->fs_state);
@@ -3466,11 +3420,9 @@ int __cold open_ctree(struct super_block *sb, char *options)
 	 */
 	fs_info->compress_type = BTRFS_COMPRESS_ZLIB;
 
-
 	/* Set up fs_info before parsing mount options */
 	nodesize = btrfs_super_nodesize(disk_super);
 	sectorsize = btrfs_super_sectorsize(disk_super);
-	stripesize = sectorsize;
 	fs_info->dirty_metadata_batch = nodesize * (1 + ilog2(nr_cpu_ids));
 	fs_info->delalloc_batch = sectorsize * 512 * (1 + ilog2(nr_cpu_ids));
 
@@ -3478,7 +3430,61 @@ int __cold open_ctree(struct super_block *sb, char *options)
 	fs_info->sectorsize = sectorsize;
 	fs_info->sectorsize_bits = ilog2(sectorsize);
 	fs_info->csums_per_leaf = BTRFS_MAX_ITEM_SIZE(fs_info) / fs_info->csum_size;
-	fs_info->stripesize = stripesize;
+	fs_info->stripesize = sectorsize;
+
+	fs_info->sb->s_bdi->ra_pages *= btrfs_super_num_devices(fs_info->super_copy);
+	fs_info->sb->s_bdi->ra_pages = max(fs_info->sb->s_bdi->ra_pages, SZ_4M / PAGE_SIZE);
+
+	fs_info->sb->s_blocksize = sectorsize;
+	fs_info->sb->s_blocksize_bits = blksize_bits(sectorsize);
+	memcpy(&fs_info->sb->s_uuid, fs_info->fs_devices->fsid, BTRFS_FSID_SIZE);
+
+	return 0;
+
+error:
+	btrfs_release_disk_super(disk_super);
+	return ret;
+}
+
+struct init_sequence {
+	int (*init_func)(struct btrfs_fs_info *fs_info);
+	void (*exit_func)(struct btrfs_fs_info *fs_info);
+};
+
+static const struct init_sequence open_ctree_seq[] = {
+	{
+		.init_func = open_ctree_btree_inode_init,
+		.exit_func = open_ctree_btree_inode_exit,
+	}, {
+		.init_func = open_ctree_super_init,
+		.exit_func = NULL,
+	}
+};
+
+
+int __cold open_ctree(struct super_block *sb, char *options)
+{
+	u64 generation;
+	u64 features;
+	bool open_ctree_res[ARRAY_SIZE(open_ctree_seq)] = {0};
+	struct btrfs_fs_info *fs_info = btrfs_sb(sb);
+	struct btrfs_fs_devices *fs_devices = fs_info->fs_devices;
+	int ret;
+	int err = -EINVAL;
+	int level;
+	int i;
+
+	fs_info->sb = sb;
+
+	/* Caller should have already initialized fs_info->fs_devices. */
+	ASSERT(fs_info->fs_devices);
+
+	for (i = 0; i < ARRAY_SIZE(open_ctree_seq); i++) {
+		ret = open_ctree_seq[i].init_func(fs_info);
+		if (ret < 0)
+			goto fail;
+		open_ctree_res[i] = true;
+	}
 
 	ret = btrfs_parse_options(fs_info, options, fs_info->sb->s_flags);
 	if (ret) {
@@ -3486,7 +3492,7 @@ int __cold open_ctree(struct super_block *sb, char *options)
 		goto fail_alloc;
 	}
 
-	features = btrfs_super_incompat_flags(disk_super) &
+	features = btrfs_super_incompat_flags(fs_info->super_copy) &
 		~BTRFS_FEATURE_INCOMPAT_SUPP;
 	if (features) {
 		btrfs_err(fs_info,
@@ -3496,7 +3502,7 @@ int __cold open_ctree(struct super_block *sb, char *options)
 		goto fail_alloc;
 	}
 
-	features = btrfs_super_incompat_flags(disk_super);
+	features = btrfs_super_incompat_flags(fs_info->super_copy);
 	features |= BTRFS_FEATURE_INCOMPAT_MIXED_BACKREF;
 	if (fs_info->compress_type == BTRFS_COMPRESS_LZO)
 		features |= BTRFS_FEATURE_INCOMPAT_COMPRESS_LZO;
@@ -3507,7 +3513,7 @@ int __cold open_ctree(struct super_block *sb, char *options)
 	 * Flag our filesystem as having big metadata blocks if they are bigger
 	 * than the page size.
 	 */
-	if (btrfs_super_nodesize(disk_super) > PAGE_SIZE)
+	if (btrfs_super_nodesize(fs_info->super_copy) > PAGE_SIZE)
 		features |= BTRFS_FEATURE_INCOMPAT_BIG_METADATA;
 
 	/*
@@ -3515,10 +3521,10 @@ int __cold open_ctree(struct super_block *sb, char *options)
 	 * extent buffers for the same range.  It leads to corruptions
 	 */
 	if ((features & BTRFS_FEATURE_INCOMPAT_MIXED_GROUPS) &&
-	    (sectorsize != nodesize)) {
+	    (fs_info->sectorsize != fs_info->nodesize)) {
 		btrfs_err(fs_info,
 "unequal nodesize/sectorsize (%u != %u) are not allowed for mixed block groups",
-			nodesize, sectorsize);
+			fs_info->nodesize, fs_info->sectorsize);
 		goto fail_alloc;
 	}
 
@@ -3526,9 +3532,9 @@ int __cold open_ctree(struct super_block *sb, char *options)
 	 * Needn't use the lock because there is no other task which will
 	 * update the flag.
 	 */
-	btrfs_set_super_incompat_flags(disk_super, features);
+	btrfs_set_super_incompat_flags(fs_info->super_copy, features);
 
-	features = btrfs_super_compat_ro_flags(disk_super) &
+	features = btrfs_super_compat_ro_flags(fs_info->super_copy) &
 		~BTRFS_FEATURE_COMPAT_RO_SUPP;
 	if (!sb_rdonly(fs_info->sb) && features) {
 		btrfs_err(fs_info,
@@ -3542,7 +3548,7 @@ int __cold open_ctree(struct super_block *sb, char *options)
 	 * should not cause any metadata write, including log replay.
 	 * Or we could screw up whatever the new feature requires.
 	 */
-	if (unlikely(features && btrfs_super_log_root(disk_super) &&
+	if (unlikely(features && btrfs_super_log_root(fs_info->super_copy) &&
 		     !btrfs_test_opt(fs_info, NOLOGREPLAY))) {
 		btrfs_err(fs_info,
 "cannot replay dirty log with unsupported compat_ro features (0x%llx), try rescue=nologreplay",
@@ -3552,7 +3558,7 @@ int __cold open_ctree(struct super_block *sb, char *options)
 	}
 
 
-	if (sectorsize < PAGE_SIZE) {
+	if (fs_info->sectorsize < PAGE_SIZE) {
 		struct btrfs_subpage_info *subpage_info;
 
 		/*
@@ -3564,15 +3570,15 @@ int __cold open_ctree(struct super_block *sb, char *options)
 		btrfs_clear_opt(fs_info->mount_opt, SPACE_CACHE);
 		btrfs_set_and_info(fs_info, FREE_SPACE_TREE,
 			"forcing free space tree for sector size %u with page size %lu",
-			sectorsize, PAGE_SIZE);
+			fs_info->sectorsize, PAGE_SIZE);
 
 		btrfs_warn(fs_info,
 		"read-write for sector size %u with page size %lu is experimental",
-			   sectorsize, PAGE_SIZE);
+			   fs_info->sectorsize, PAGE_SIZE);
 		subpage_info = kzalloc(sizeof(*subpage_info), GFP_KERNEL);
 		if (!subpage_info)
 			goto fail_alloc;
-		btrfs_init_subpage_info(subpage_info, sectorsize);
+		btrfs_init_subpage_info(subpage_info, fs_info->sectorsize);
 		fs_info->subpage_info = subpage_info;
 	}
 
@@ -3582,13 +3588,6 @@ int __cold open_ctree(struct super_block *sb, char *options)
 		goto fail_sb_buffer;
 	}
 
-	fs_info->sb->s_bdi->ra_pages *= btrfs_super_num_devices(disk_super);
-	fs_info->sb->s_bdi->ra_pages = max(fs_info->sb->s_bdi->ra_pages, SZ_4M / PAGE_SIZE);
-
-	fs_info->sb->s_blocksize = sectorsize;
-	fs_info->sb->s_blocksize_bits = blksize_bits(sectorsize);
-	memcpy(&fs_info->sb->s_uuid, fs_info->fs_devices->fsid, BTRFS_FSID_SIZE);
-
 	mutex_lock(&fs_info->chunk_mutex);
 	ret = btrfs_read_sys_array(fs_info);
 	mutex_unlock(&fs_info->chunk_mutex);
@@ -3597,10 +3596,10 @@ int __cold open_ctree(struct super_block *sb, char *options)
 		goto fail_sb_buffer;
 	}
 
-	generation = btrfs_super_chunk_root_generation(disk_super);
-	level = btrfs_super_chunk_root_level(disk_super);
+	generation = btrfs_super_chunk_root_generation(fs_info->super_copy);
+	level = btrfs_super_chunk_root_level(fs_info->super_copy);
 	ret = load_super_root(fs_info->chunk_root,
-			      btrfs_super_chunk_root(disk_super),
+			      btrfs_super_chunk_root(fs_info->super_copy),
 			      generation, level);
 	if (ret) {
 		btrfs_err(fs_info, "failed to read chunk root");
@@ -3656,7 +3655,8 @@ int __cold open_ctree(struct super_block *sb, char *options)
 	 * even though it was perfectly fine.
 	 */
 	if (fs_info->uuid_root && !btrfs_test_opt(fs_info, RESCAN_UUID_TREE) &&
-	    fs_info->generation == btrfs_super_uuid_tree_generation(disk_super))
+	    fs_info->generation ==
+	    btrfs_super_uuid_tree_generation(fs_info->super_copy))
 		set_bit(BTRFS_FS_UPDATE_UUID_TREE_GEN, &fs_info->flags);
 
 	ret = btrfs_verify_dev_extents(fs_info);
@@ -3767,7 +3767,7 @@ int __cold open_ctree(struct super_block *sb, char *options)
 		btrfs_err(fs_info, "couldn't build ref tree");
 
 	/* do not make disk changes in broken FS or nologreplay is given */
-	if (btrfs_super_log_root(disk_super) != 0 &&
+	if (btrfs_super_log_root(fs_info->super_copy) != 0 &&
 	    !btrfs_test_opt(fs_info, NOLOGREPLAY)) {
 		btrfs_info(fs_info, "start tree-log replay");
 		ret = btrfs_replay_log(fs_info, fs_devices);
@@ -3797,7 +3797,8 @@ int __cold open_ctree(struct super_block *sb, char *options)
 
 	if (fs_info->uuid_root &&
 	    (btrfs_test_opt(fs_info, RESCAN_UUID_TREE) ||
-	     fs_info->generation != btrfs_super_uuid_tree_generation(disk_super))) {
+	     fs_info->generation !=
+	     btrfs_super_uuid_tree_generation(fs_info->super_copy))) {
 		btrfs_info(fs_info, "checking UUID tree");
 		ret = btrfs_check_uuid_tree(fs_info);
 		if (ret) {
-- 
2.37.3



* [PATCH 06/16] btrfs: extract mount options and features init code into its own init helper
  2022-09-22  0:06 [PATCH 00/16] btrfs: make open_ctree() init/exit sequence strictly matched Qu Wenruo
                   ` (4 preceding siblings ...)
  2022-09-22  0:06 ` [PATCH 05/16] btrfs: extract super block read code into its own init helper Qu Wenruo
@ 2022-09-22  0:06 ` Qu Wenruo
  2022-09-22  0:28   ` Qu Wenruo
  2022-09-22  0:06 ` [PATCH 07/16] btrfs: move btrfs_init_workqueues() and btrfs_stop_all_workers() into open_ctree_seq[] Qu Wenruo
                   ` (9 subsequent siblings)
  15 siblings, 1 reply; 18+ messages in thread
From: Qu Wenruo @ 2022-09-22  0:06 UTC (permalink / raw)
  To: linux-btrfs

This patch will extract the mount option parsing and features checking
code into a new helper, open_ctree_features_init().

This extraction also makes the following non-functional changes:

- Add btrfs_fs_info::__options member
  Currently the open_ctree_* helpers can only accept a single @fs_info
  parameter, so to parse the mount options we have to stash a temporary
  pointer to them for this purpose.

  Thankfully we don't need to do anything special, like freeing it.

- Move ssd optimization check into open_ctree_features_init()

- Move check_int related code into open_ctree_features_init()
  The mount time integrity check doesn't rely on any tree blocks, thus
  it is safe to call it at feature init time.

- Separate the @features variable into @incompat and @compat_ro
  This avoids confusion, and the compiler should be clever enough to
  optimize them out anyway.

- Properly return an error for subpage initialization failure
  Previously we just did a goto to the fail_alloc label, relying on the
  default -EINVAL error.
  Now we return a proper -ENOMEM error instead.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/ctree.h   |   6 ++
 fs/btrfs/disk-io.c | 176 +++++++++++++++++++++++----------------------
 2 files changed, 96 insertions(+), 86 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 8b7b7a212da0..88dc3b2896d7 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1106,6 +1106,12 @@ struct btrfs_fs_info {
 	struct lockdep_map btrfs_trans_pending_ordered_map;
 	struct lockdep_map btrfs_ordered_extent_map;
 
+	/*
+	 * Temporary pointer to the mount option string.
+	 * This is to workaround the fact that all open_ctree() init
+	 * functions can only accept a single @fs_info pointer.
+	 */
+	char *__options;
 #ifdef CONFIG_BTRFS_FS_REF_VERIFY
 	spinlock_t ref_verify_lock;
 	struct rb_root block_tree;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index f4c04cb1c0d6..ec9038eeaa0f 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3446,118 +3446,78 @@ static int open_ctree_super_init(struct btrfs_fs_info *fs_info)
 	return ret;
 }
 
-struct init_sequence {
-	int (*init_func)(struct btrfs_fs_info *fs_info);
-	void (*exit_func)(struct btrfs_fs_info *fs_info);
-};
-
-static const struct init_sequence open_ctree_seq[] = {
-	{
-		.init_func = open_ctree_btree_inode_init,
-		.exit_func = open_ctree_btree_inode_exit,
-	}, {
-		.init_func = open_ctree_super_init,
-		.exit_func = NULL,
-	}
-};
-
-
-int __cold open_ctree(struct super_block *sb, char *options)
+static int open_ctree_features_init(struct btrfs_fs_info *fs_info)
 {
-	u64 generation;
-	u64 features;
-	bool open_ctree_res[ARRAY_SIZE(open_ctree_seq)] = {0};
-	struct btrfs_fs_info *fs_info = btrfs_sb(sb);
-	struct btrfs_fs_devices *fs_devices = fs_info->fs_devices;
 	int ret;
-	int err = -EINVAL;
-	int level;
-	int i;
-
-	fs_info->sb = sb;
-
-	/* Caller should have already initialized fs_info->fs_devices. */
-	ASSERT(fs_info->fs_devices);
-
-	for (i = 0; i < ARRAY_SIZE(open_ctree_seq); i++) {
-		ret = open_ctree_seq[i].init_func(fs_info);
-		if (ret < 0)
-			goto fail;
-		open_ctree_res[i] = true;
-	}
+	u64 incompat;
+	u64 compat_ro;
 
-	ret = btrfs_parse_options(fs_info, options, fs_info->sb->s_flags);
-	if (ret) {
-		err = ret;
-		goto fail_alloc;
-	}
+	ret = btrfs_parse_options(fs_info, fs_info->__options,
+				  fs_info->sb->s_flags);
+	if (ret)
+		return ret;
 
-	features = btrfs_super_incompat_flags(fs_info->super_copy) &
-		~BTRFS_FEATURE_INCOMPAT_SUPP;
-	if (features) {
+	incompat = btrfs_super_incompat_flags(fs_info->super_copy);
+	if (incompat & ~BTRFS_FEATURE_INCOMPAT_SUPP) {
 		btrfs_err(fs_info,
 		    "cannot mount because of unsupported optional features (0x%llx)",
-		    features);
-		err = -EINVAL;
-		goto fail_alloc;
+		    incompat & ~BTRFS_FEATURE_INCOMPAT_SUPP);
+		return -EINVAL;
 	}
 
-	features = btrfs_super_incompat_flags(fs_info->super_copy);
-	features |= BTRFS_FEATURE_INCOMPAT_MIXED_BACKREF;
+	incompat |= BTRFS_FEATURE_INCOMPAT_MIXED_BACKREF;
 	if (fs_info->compress_type == BTRFS_COMPRESS_LZO)
-		features |= BTRFS_FEATURE_INCOMPAT_COMPRESS_LZO;
+		incompat |= BTRFS_FEATURE_INCOMPAT_COMPRESS_LZO;
 	else if (fs_info->compress_type == BTRFS_COMPRESS_ZSTD)
-		features |= BTRFS_FEATURE_INCOMPAT_COMPRESS_ZSTD;
+		incompat |= BTRFS_FEATURE_INCOMPAT_COMPRESS_ZSTD;
 
 	/*
 	 * Flag our filesystem as having big metadata blocks if they are bigger
 	 * than the page size.
 	 */
 	if (btrfs_super_nodesize(fs_info->super_copy) > PAGE_SIZE)
-		features |= BTRFS_FEATURE_INCOMPAT_BIG_METADATA;
+		incompat |= BTRFS_FEATURE_INCOMPAT_BIG_METADATA;
 
 	/*
 	 * mixed block groups end up with duplicate but slightly offset
 	 * extent buffers for the same range.  It leads to corruptions
 	 */
-	if ((features & BTRFS_FEATURE_INCOMPAT_MIXED_GROUPS) &&
+	if ((incompat & BTRFS_FEATURE_INCOMPAT_MIXED_GROUPS) &&
 	    (fs_info->sectorsize != fs_info->nodesize)) {
 		btrfs_err(fs_info,
 "unequal nodesize/sectorsize (%u != %u) are not allowed for mixed block groups",
 			fs_info->nodesize, fs_info->sectorsize);
-		goto fail_alloc;
+		return -EINVAL;
 	}
 
 	/*
 	 * Needn't use the lock because there is no other task which will
 	 * update the flag.
 	 */
-	btrfs_set_super_incompat_flags(fs_info->super_copy, features);
+	btrfs_set_super_incompat_flags(fs_info->super_copy, incompat);
 
-	features = btrfs_super_compat_ro_flags(fs_info->super_copy) &
-		~BTRFS_FEATURE_COMPAT_RO_SUPP;
-	if (!sb_rdonly(fs_info->sb) && features) {
+	compat_ro = btrfs_super_compat_ro_flags(fs_info->super_copy);
+	if (!sb_rdonly(fs_info->sb) &&
+	    (compat_ro & ~BTRFS_FEATURE_COMPAT_RO_SUPP)) {
 		btrfs_err(fs_info,
 	"cannot mount read-write because of unsupported optional features (0x%llx)",
-		       features);
-		err = -EINVAL;
-		goto fail_alloc;
+		       compat_ro & ~BTRFS_FEATURE_COMPAT_RO_SUPP);
+		return -EINVAL;
 	}
 	/*
 	 * We have unsupported RO compat features, although RO mounted, we
 	 * should not cause any metadata write, including log replay.
 	 * Or we could screw up whatever the new feature requires.
 	 */
-	if (unlikely(features && btrfs_super_log_root(fs_info->super_copy) &&
+	if (unlikely((compat_ro & ~BTRFS_FEATURE_COMPAT_RO_SUPP)
+		     && btrfs_super_log_root(fs_info->super_copy) &&
 		     !btrfs_test_opt(fs_info, NOLOGREPLAY))) {
 		btrfs_err(fs_info,
 "cannot replay dirty log with unsupported compat_ro features (0x%llx), try rescue=nologreplay",
-			  features);
-		err = -EINVAL;
-		goto fail_alloc;
+			  compat_ro & ~BTRFS_FEATURE_COMPAT_RO_SUPP);
+		return -EINVAL;
 	}
 
-
 	if (fs_info->sectorsize < PAGE_SIZE) {
 		struct btrfs_subpage_info *subpage_info;
 
@@ -3577,11 +3537,73 @@ int __cold open_ctree(struct super_block *sb, char *options)
 			   fs_info->sectorsize, PAGE_SIZE);
 		subpage_info = kzalloc(sizeof(*subpage_info), GFP_KERNEL);
 		if (!subpage_info)
-			goto fail_alloc;
+			return -ENOMEM;
 		btrfs_init_subpage_info(subpage_info, fs_info->sectorsize);
 		fs_info->subpage_info = subpage_info;
 	}
 
+	if (!btrfs_test_opt(fs_info, NOSSD) &&
+	    !fs_info->fs_devices->rotating)
+		btrfs_set_and_info(fs_info, SSD, "enabling ssd optimizations");
+
+#ifdef CONFIG_BTRFS_FS_CHECK_INTEGRITY
+	if (btrfs_test_opt(fs_info, CHECK_INTEGRITY)) {
+		ret = btrfsic_mount(fs_info, fs_info->fs_devices,
+				    btrfs_test_opt(fs_info,
+					CHECK_INTEGRITY_DATA) ? 1 : 0,
+				    fs_info->check_integrity_print_mask);
+		if (ret)
+			btrfs_warn(fs_info,
+				"failed to initialize integrity check module: %d",
+				ret);
+	}
+#endif
+
+	return 0;
+}
+
+struct init_sequence {
+	int (*init_func)(struct btrfs_fs_info *fs_info);
+	void (*exit_func)(struct btrfs_fs_info *fs_info);
+};
+
+static const struct init_sequence open_ctree_seq[] = {
+	{
+		.init_func = open_ctree_btree_inode_init,
+		.exit_func = open_ctree_btree_inode_exit,
+	}, {
+		.init_func = open_ctree_super_init,
+		.exit_func = NULL,
+	}, {
+		.init_func = open_ctree_features_init,
+		.exit_func = NULL,
+	}
+};
+
+int __cold open_ctree(struct super_block *sb, char *options)
+{
+	u64 generation;
+	struct btrfs_fs_info *fs_info = btrfs_sb(sb);
+	struct btrfs_fs_devices *fs_devices = fs_info->fs_devices;
+	bool open_ctree_res[ARRAY_SIZE(open_ctree_seq)] = {0};
+	int ret;
+	int err = -EINVAL;
+	int level;
+	int i;
+
+	fs_info->sb = sb;
+	fs_info->__options = options;
+
+	/* Caller should have already initialized fs_info->fs_devices. */
+	ASSERT(fs_info->fs_devices);
+
+	for (i = 0; i < ARRAY_SIZE(open_ctree_seq); i++) {
+		ret = open_ctree_seq[i].init_func(fs_info);
+		if (ret < 0)
+			goto fail;
+		open_ctree_res[i] = true;
+	}
+
 	ret = btrfs_init_workqueues(fs_info);
 	if (ret) {
 		err = ret;
@@ -3736,29 +3758,12 @@ int __cold open_ctree(struct super_block *sb, char *options)
 	if (IS_ERR(fs_info->transaction_kthread))
 		goto fail_cleaner;
 
-	if (!btrfs_test_opt(fs_info, NOSSD) &&
-	    !fs_info->fs_devices->rotating) {
-		btrfs_set_and_info(fs_info, SSD, "enabling ssd optimizations");
-	}
-
 	/*
 	 * Mount does not set all options immediately, we can do it now and do
 	 * not have to wait for transaction commit
 	 */
 	btrfs_apply_pending_changes(fs_info);
 
-#ifdef CONFIG_BTRFS_FS_CHECK_INTEGRITY
-	if (btrfs_test_opt(fs_info, CHECK_INTEGRITY)) {
-		ret = btrfsic_mount(fs_info, fs_devices,
-				    btrfs_test_opt(fs_info,
-					CHECK_INTEGRITY_DATA) ? 1 : 0,
-				    fs_info->check_integrity_print_mask);
-		if (ret)
-			btrfs_warn(fs_info,
-				"failed to initialize integrity check module: %d",
-				ret);
-	}
-#endif
 	ret = btrfs_read_qgroup_config(fs_info);
 	if (ret)
 		goto fail_trans_kthread;
@@ -3851,7 +3856,6 @@ int __cold open_ctree(struct super_block *sb, char *options)
 fail_sb_buffer:
 	btrfs_stop_all_workers(fs_info);
 	btrfs_free_block_groups(fs_info);
-fail_alloc:
 	btrfs_mapping_tree_free(&fs_info->mapping_tree);
 fail:
 	for (i = ARRAY_SIZE(open_ctree_seq) - 1; i >= 0; i--) {
-- 
2.37.3



* [PATCH 07/16] btrfs: move btrfs_init_workqueues() and btrfs_stop_all_workers() into open_ctree_seq[]
  2022-09-22  0:06 [PATCH 00/16] btrfs: make open_ctree() init/exit sequence strictly matched Qu Wenruo
                   ` (5 preceding siblings ...)
  2022-09-22  0:06 ` [PATCH 06/16] btrfs: extract mount options and features init " Qu Wenruo
@ 2022-09-22  0:06 ` Qu Wenruo
  2022-09-22  0:06 ` [PATCH 08/16] btrfs: extract chunk tree read code into its own init/exit helpers Qu Wenruo
                   ` (8 subsequent siblings)
  15 siblings, 0 replies; 18+ messages in thread
From: Qu Wenruo @ 2022-09-22  0:06 UTC (permalink / raw)
  To: linux-btrfs

Those two helpers already do all the work; just move them into the
open_ctree_seq[] array.

There is only one small change:

- Call btrfs_stop_all_workers() inside btrfs_init_workqueues() for error
  handling
  open_ctree_seq[] makes every error path call the exit function if and
  only if the corresponding init function finished without error (see
  the condensed driver loop below).

  This means that if btrfs_init_workqueues() failed with -ENOMEM, we
  would never call btrfs_stop_all_workers() to clean up whatever was
  already allocated.

  To fix this problem, call btrfs_stop_all_workers() inside
  btrfs_init_workqueues() when we hit an error.
  btrfs_stop_all_workers() already has the checks to handle NULL
  pointers.
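
For reference, the driver loop in open_ctree() that enforces this
invariant looks roughly as follows (condensed from the code in this
series, not a verbatim quote):

	/* Forward pass: run each init_func, remember which stages succeeded. */
	for (i = 0; i < ARRAY_SIZE(open_ctree_seq); i++) {
		ret = open_ctree_seq[i].init_func(fs_info);
		if (ret < 0)
			goto fail;
		open_ctree_res[i] = true;
	}

	/* ... rest of open_ctree() ... */

fail:
	/* Reverse pass: only stages whose init_func succeeded get exit_func. */
	for (i = ARRAY_SIZE(open_ctree_seq) - 1; i >= 0; i--) {
		if (!open_ctree_res[i] || !open_ctree_seq[i].exit_func)
			continue;
		open_ctree_seq[i].exit_func(fs_info);
		open_ctree_res[i] = false;
	}

So if btrfs_init_workqueues() returns an error before its slot in
open_ctree_res[] is set, btrfs_stop_all_workers() would never run, hence
the cleanup inside the init function itself.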

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c | 19 ++++++++-----------
 1 file changed, 8 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index ec9038eeaa0f..0e1fe12c1c99 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2318,11 +2318,12 @@ static int btrfs_init_workqueues(struct btrfs_fs_info *fs_info)
 	      fs_info->endio_freespace_worker && fs_info->rmw_workers &&
 	      fs_info->caching_workers && fs_info->fixup_workers &&
 	      fs_info->delayed_workers && fs_info->qgroup_rescan_workers &&
-	      fs_info->discard_ctl.discard_workers)) {
-		return -ENOMEM;
-	}
-
+	      fs_info->discard_ctl.discard_workers))
+		goto error;
 	return 0;
+error:
+	btrfs_stop_all_workers(fs_info);
+	return -ENOMEM;
 }
 
 static int btrfs_init_csum_hash(struct btrfs_fs_info *fs_info, u16 csum_type)
@@ -3577,6 +3578,9 @@ static const struct init_sequence open_ctree_seq[] = {
 	}, {
 		.init_func = open_ctree_features_init,
 		.exit_func = NULL,
+	}, {
+		.init_func = btrfs_init_workqueues,
+		.exit_func = btrfs_stop_all_workers,
 	}
 };
 
@@ -3604,12 +3608,6 @@ int __cold open_ctree(struct super_block *sb, char *options)
 		open_ctree_res[i] = true;
 	}
 
-	ret = btrfs_init_workqueues(fs_info);
-	if (ret) {
-		err = ret;
-		goto fail_sb_buffer;
-	}
-
 	mutex_lock(&fs_info->chunk_mutex);
 	ret = btrfs_read_sys_array(fs_info);
 	mutex_unlock(&fs_info->chunk_mutex);
@@ -3854,7 +3852,6 @@ int __cold open_ctree(struct super_block *sb, char *options)
 	free_root_pointers(fs_info, true);
 
 fail_sb_buffer:
-	btrfs_stop_all_workers(fs_info);
 	btrfs_free_block_groups(fs_info);
 	btrfs_mapping_tree_free(&fs_info->mapping_tree);
 fail:
-- 
2.37.3



* [PATCH 08/16] btrfs: extract chunk tree read code into its own init/exit helpers
  2022-09-22  0:06 [PATCH 00/16] btrfs: make open_ctree() init/exit sequence strictly matched Qu Wenruo
                   ` (6 preceding siblings ...)
  2022-09-22  0:06 ` [PATCH 07/16] btrfs: move btrfs_init_workqueues() and btrfs_stop_all_workers() into open_ctree_seq[] Qu Wenruo
@ 2022-09-22  0:06 ` Qu Wenruo
  2022-09-22  0:06 ` [PATCH 09/16] btrfs: extract tree roots and zone info initialization into " Qu Wenruo
                   ` (7 subsequent siblings)
  15 siblings, 0 replies; 18+ messages in thread
From: Qu Wenruo @ 2022-09-22  0:06 UTC (permalink / raw)
  To: linux-btrfs

Involved functional changes:

- Properly free the chunk map and chunk root ebs in the error handling
  Previously we relied on the final close_ctree() to properly free the
  chunk root extent buffers.

  With the stricter open_ctree_seq[] requirement, since we're the first
  one to fully populate the chunk root extent buffers, we should also
  free those extent buffers on error.

  Note, the tree root and chunk root themselves are first allocated by
  open_ctree_btree_inode_init(), thus we should not free the chunk_root
  pointer, but just the extent buffers.

- Do the degradable check immediately after loading the chunk tree
  The degradable check only requires the full chunk mapping, so it can
  be done immediately after btrfs_read_chunk_tree().

This also exposed one existing label mismatch: at chunk tree read time
we haven't created any block group items at all, yet the old
fail_sb_buffer: label still calls btrfs_free_block_groups().

It doesn't hurt, but it shows how poorly the original error labels are
managed.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c | 127 ++++++++++++++++++++++++++-------------------
 1 file changed, 74 insertions(+), 53 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 0e1fe12c1c99..4fcdd15697d0 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3563,6 +3563,77 @@ static int open_ctree_features_init(struct btrfs_fs_info *fs_info)
 	return 0;
 }
 
+static int open_ctree_chunk_tree_init(struct btrfs_fs_info *fs_info)
+{
+	u64 generation;
+	int level;
+	int ret;
+
+	mutex_lock(&fs_info->chunk_mutex);
+	ret = btrfs_read_sys_array(fs_info);
+	mutex_unlock(&fs_info->chunk_mutex);
+	if (ret) {
+		btrfs_err(fs_info, "failed to read the system array: %d", ret);
+		goto free_mapping;
+	}
+
+	generation = btrfs_super_chunk_root_generation(fs_info->super_copy);
+	level = btrfs_super_chunk_root_level(fs_info->super_copy);
+	ret = load_super_root(fs_info->chunk_root,
+			      btrfs_super_chunk_root(fs_info->super_copy),
+			      generation, level);
+	if (ret) {
+		btrfs_err(fs_info, "failed to read chunk root");
+		goto free_root;
+	}
+
+	read_extent_buffer(fs_info->chunk_root->node, fs_info->chunk_tree_uuid,
+			   offsetof(struct btrfs_header, chunk_tree_uuid),
+			   BTRFS_UUID_SIZE);
+
+	ret = btrfs_read_chunk_tree(fs_info);
+	if (ret) {
+		btrfs_err(fs_info, "failed to read chunk tree: %d", ret);
+		goto free_root;
+	}
+
+	/*
+	 * At this point we know all the devices that make this filesystem,
+	 * including the seed devices but we don't know yet if the replace
+	 * target is required. So free devices that are not part of this
+	 * filesystem but skip the replace target device which is checked
+	 * below in btrfs_init_dev_replace().
+	 */
+	btrfs_free_extra_devids(fs_info->fs_devices);
+	if (!fs_info->fs_devices->latest_dev->bdev) {
+		btrfs_err(fs_info, "failed to read devices");
+		ret = -EINVAL;
+		goto free_root;
+	}
+
+	/* We have full chunk tree loaded, can do degradable check now. */
+	if (!sb_rdonly(fs_info->sb) && fs_info->fs_devices->missing_devices &&
+	    !btrfs_check_rw_degradable(fs_info, NULL)) {
+		btrfs_warn(fs_info,
+		"writable mount is not allowed due to too many missing devices");
+		ret = -EIO;
+		goto free_root;
+	}
+
+	return 0;
+
+free_root:
+	free_root_extent_buffers(fs_info->chunk_root);
+free_mapping:
+	btrfs_mapping_tree_free(&fs_info->mapping_tree);
+	return ret;
+}
+
+static void open_ctree_chunk_tree_exit(struct btrfs_fs_info *fs_info)
+{
+	free_root_extent_buffers(fs_info->chunk_root);
+	btrfs_mapping_tree_free(&fs_info->mapping_tree);
+}
+
 struct init_sequence {
 	int (*init_func)(struct btrfs_fs_info *fs_info);
 	void (*exit_func)(struct btrfs_fs_info *fs_info);
@@ -3581,18 +3652,19 @@ static const struct init_sequence open_ctree_seq[] = {
 	}, {
 		.init_func = btrfs_init_workqueues,
 		.exit_func = btrfs_stop_all_workers,
+	}, {
+		.init_func = open_ctree_chunk_tree_init,
+		.exit_func = open_ctree_chunk_tree_exit,
 	}
 };
 
 int __cold open_ctree(struct super_block *sb, char *options)
 {
-	u64 generation;
 	struct btrfs_fs_info *fs_info = btrfs_sb(sb);
 	struct btrfs_fs_devices *fs_devices = fs_info->fs_devices;
 	bool open_ctree_res[ARRAY_SIZE(open_ctree_seq)] = {0};
 	int ret;
 	int err = -EINVAL;
-	int level;
 	int i;
 
 	fs_info->sb = sb;
@@ -3608,47 +3680,6 @@ int __cold open_ctree(struct super_block *sb, char *options)
 		open_ctree_res[i] = true;
 	}
 
-	mutex_lock(&fs_info->chunk_mutex);
-	ret = btrfs_read_sys_array(fs_info);
-	mutex_unlock(&fs_info->chunk_mutex);
-	if (ret) {
-		btrfs_err(fs_info, "failed to read the system array: %d", ret);
-		goto fail_sb_buffer;
-	}
-
-	generation = btrfs_super_chunk_root_generation(fs_info->super_copy);
-	level = btrfs_super_chunk_root_level(fs_info->super_copy);
-	ret = load_super_root(fs_info->chunk_root,
-			      btrfs_super_chunk_root(fs_info->super_copy),
-			      generation, level);
-	if (ret) {
-		btrfs_err(fs_info, "failed to read chunk root");
-		goto fail_tree_roots;
-	}
-
-	read_extent_buffer(fs_info->chunk_root->node, fs_info->chunk_tree_uuid,
-			   offsetof(struct btrfs_header, chunk_tree_uuid),
-			   BTRFS_UUID_SIZE);
-
-	ret = btrfs_read_chunk_tree(fs_info);
-	if (ret) {
-		btrfs_err(fs_info, "failed to read chunk tree: %d", ret);
-		goto fail_tree_roots;
-	}
-
-	/*
-	 * At this point we know all the devices that make this filesystem,
-	 * including the seed devices but we don't know yet if the replace
-	 * target is required. So free devices that are not part of this
-	 * filesystem but skip the replace target device which is checked
-	 * below in btrfs_init_dev_replace().
-	 */
-	btrfs_free_extra_devids(fs_devices);
-	if (!fs_devices->latest_dev->bdev) {
-		btrfs_err(fs_info, "failed to read devices");
-		goto fail_tree_roots;
-	}
-
 	ret = init_tree_roots(fs_info);
 	if (ret)
 		goto fail_tree_roots;
@@ -3738,13 +3769,6 @@ int __cold open_ctree(struct super_block *sb, char *options)
 
 	btrfs_free_zone_cache(fs_info);
 
-	if (!sb_rdonly(sb) && fs_info->fs_devices->missing_devices &&
-	    !btrfs_check_rw_degradable(fs_info, NULL)) {
-		btrfs_warn(fs_info,
-		"writable mount is not allowed due to too many missing devices");
-		goto fail_sysfs;
-	}
-
 	fs_info->cleaner_kthread = kthread_run(cleaner_kthread, fs_info,
 					       "btrfs-cleaner");
 	if (IS_ERR(fs_info->cleaner_kthread))
@@ -3850,10 +3874,7 @@ int __cold open_ctree(struct super_block *sb, char *options)
 	if (fs_info->data_reloc_root)
 		btrfs_drop_and_free_fs_root(fs_info, fs_info->data_reloc_root);
 	free_root_pointers(fs_info, true);
-
-fail_sb_buffer:
 	btrfs_free_block_groups(fs_info);
-	btrfs_mapping_tree_free(&fs_info->mapping_tree);
 fail:
 	for (i = ARRAY_SIZE(open_ctree_seq) - 1; i >= 0; i--) {
 		if (!open_ctree_res[i] || !open_ctree_seq[i].exit_func)
-- 
2.37.3



* [PATCH 09/16] btrfs: extract tree roots and zone info initialization into init/exit helpers
  2022-09-22  0:06 [PATCH 00/16] btrfs: make open_ctree() init/exit sequence strictly matched Qu Wenruo
                   ` (7 preceding siblings ...)
  2022-09-22  0:06 ` [PATCH 08/16] btrfs: extract chunk tree read code into its own init/exit helpers Qu Wenruo
@ 2022-09-22  0:06 ` Qu Wenruo
  2022-09-22  0:06 ` [PATCH 10/16] btrfs: extract mount time checks and items load code into its init helper Qu Wenruo
                   ` (6 subsequent siblings)
  15 siblings, 0 replies; 18+ messages in thread
From: Qu Wenruo @ 2022-09-22  0:06 UTC (permalink / raw)
  To: linux-btrfs

No functional change, but one special thing to notice:

- No need to clean up zone info
  Since we call btrfs_close_devices() in the error path to clean up all
  zone device info, we don't need to handle it in
  open_ctree_tree_roots_init().

  This is a break of the init/exit layering, but since the device info
  is shared for the whole btrfs module rather than owned by a single
  init/exit stage, such a break should still be acceptable.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c | 88 ++++++++++++++++++++++++++++------------------
 1 file changed, 53 insertions(+), 35 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 4fcdd15697d0..e03927cb00b8 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3634,6 +3634,56 @@ static void open_ctree_chunk_tree_exit(struct btrfs_fs_info *fs_info)
 	btrfs_mapping_tree_free(&fs_info->mapping_tree);
 }
 
+static int open_ctree_tree_roots_init(struct btrfs_fs_info *fs_info)
+{
+	int ret;
+
+	ret = init_tree_roots(fs_info);
+	if (ret)
+		goto error;
+
+	/*
+	 * Get zone type information of zoned block devices. This will also
+	 * handle emulation of a zoned filesystem if a regular device has the
+	 * zoned incompat feature flag set.
+	 */
+	ret = btrfs_get_dev_zone_info_all_devices(fs_info);
+	if (ret) {
+		btrfs_err(fs_info,
+			  "zoned: failed to read device zone info: %d",
+			  ret);
+		goto error;
+	}
+
+	/*
+	 * If we have a uuid root and we're not being told to rescan we need to
+	 * check the generation here so we can set the
+	 * BTRFS_FS_UPDATE_UUID_TREE_GEN bit.  Otherwise we could commit the
+	 * transaction during a balance or the log replay without updating the
+	 * uuid generation, and then if we crash we would rescan the uuid tree,
+	 * even though it was perfectly fine.
+	 */
+	if (fs_info->uuid_root && !btrfs_test_opt(fs_info, RESCAN_UUID_TREE) &&
+	    fs_info->generation ==
+	    btrfs_super_uuid_tree_generation(fs_info->super_copy))
+		set_bit(BTRFS_FS_UPDATE_UUID_TREE_GEN, &fs_info->flags);
+
+	return 0;
+
+error:
+	if (fs_info->data_reloc_root)
+		btrfs_drop_and_free_fs_root(fs_info, fs_info->data_reloc_root);
+	free_root_pointers(fs_info, true);
+	return ret;
+}
+
+static void open_ctree_tree_roots_exit(struct btrfs_fs_info *fs_info)
+{
+	if (fs_info->data_reloc_root)
+		btrfs_drop_and_free_fs_root(fs_info, fs_info->data_reloc_root);
+	free_root_pointers(fs_info, true);
+}
+
 struct init_sequence {
 	int (*init_func)(struct btrfs_fs_info *fs_info);
 	void (*exit_func)(struct btrfs_fs_info *fs_info);
@@ -3655,6 +3705,9 @@ static const struct init_sequence open_ctree_seq[] = {
 	}, {
 		.init_func = open_ctree_chunk_tree_init,
 		.exit_func = open_ctree_chunk_tree_exit,
+	}, {
+		.init_func = open_ctree_tree_roots_init,
+		.exit_func = open_ctree_tree_roots_exit,
 	}
 };
 
@@ -3680,36 +3733,6 @@ int __cold open_ctree(struct super_block *sb, char *options)
 		open_ctree_res[i] = true;
 	}
 
-	ret = init_tree_roots(fs_info);
-	if (ret)
-		goto fail_tree_roots;
-
-	/*
-	 * Get zone type information of zoned block devices. This will also
-	 * handle emulation of a zoned filesystem if a regular device has the
-	 * zoned incompat feature flag set.
-	 */
-	ret = btrfs_get_dev_zone_info_all_devices(fs_info);
-	if (ret) {
-		btrfs_err(fs_info,
-			  "zoned: failed to read device zone info: %d",
-			  ret);
-		goto fail_block_groups;
-	}
-
-	/*
-	 * If we have a uuid root and we're not being told to rescan we need to
-	 * check the generation here so we can set the
-	 * BTRFS_FS_UPDATE_UUID_TREE_GEN bit.  Otherwise we could commit the
-	 * transaction during a balance or the log replay without updating the
-	 * uuid generation, and then if we crash we would rescan the uuid tree,
-	 * even though it was perfectly fine.
-	 */
-	if (fs_info->uuid_root && !btrfs_test_opt(fs_info, RESCAN_UUID_TREE) &&
-	    fs_info->generation ==
-	    btrfs_super_uuid_tree_generation(fs_info->super_copy))
-		set_bit(BTRFS_FS_UPDATE_UUID_TREE_GEN, &fs_info->flags);
-
 	ret = btrfs_verify_dev_extents(fs_info);
 	if (ret) {
 		btrfs_err(fs_info,
@@ -3869,11 +3892,6 @@ int __cold open_ctree(struct super_block *sb, char *options)
 
 fail_block_groups:
 	btrfs_put_block_group_cache(fs_info);
-
-fail_tree_roots:
-	if (fs_info->data_reloc_root)
-		btrfs_drop_and_free_fs_root(fs_info, fs_info->data_reloc_root);
-	free_root_pointers(fs_info, true);
 	btrfs_free_block_groups(fs_info);
 fail:
 	for (i = ARRAY_SIZE(open_ctree_seq) - 1; i >= 0; i--) {
-- 
2.37.3



* [PATCH 10/16] btrfs: extract mount time checks and items load code into its init helper
  2022-09-22  0:06 [PATCH 00/16] btrfs: make open_ctree() init/exit sequence strictly matched Qu Wenruo
                   ` (8 preceding siblings ...)
  2022-09-22  0:06 ` [PATCH 09/16] btrfs: extract tree roots and zone info initialization into " Qu Wenruo
@ 2022-09-22  0:06 ` Qu Wenruo
  2022-09-22  0:06 ` [PATCH 11/16] btrfs: extract sysfs init into its own helper Qu Wenruo
                   ` (5 subsequent siblings)
  15 siblings, 0 replies; 18+ messages in thread
From: Qu Wenruo @ 2022-09-22  0:06 UTC (permalink / raw)
  To: linux-btrfs

One thing to notice: since we're also initializing zoned mode here, the
later btrfs_free_zone_cache() call is moved into the helper as well, to
concentrate the zoned code in one place.

Otherwise I found it pretty hard to see any logical connection around
that btrfs_free_zone_cache() call at its old location.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c | 81 +++++++++++++++++++++++++++-------------------
 1 file changed, 47 insertions(+), 34 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index e03927cb00b8..6e96829b8140 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3684,6 +3684,50 @@ static void open_ctree_tree_roots_exit(struct btrfs_fs_info *fs_info)
 	free_root_pointers(fs_info, true);
 }
 
+/* Load various items for balance/replace, and do various mount time check. */
+static int open_ctree_load_items_init(struct btrfs_fs_info *fs_info)
+{
+	int ret;
+
+	/*
+	 * Dev extents can only be verified after both dev tree and chunk tree
+	 * being initialized.
+	 */
+	ret = btrfs_verify_dev_extents(fs_info);
+	if (ret) {
+		btrfs_err(fs_info,
+			  "failed to verify dev extents against chunks: %d",
+			  ret);
+		return ret;
+	}
+	ret = btrfs_recover_balance(fs_info);
+	if (ret) {
+		btrfs_err(fs_info, "failed to recover balance: %d", ret);
+		return ret;
+	}
+
+	ret = btrfs_init_dev_stats(fs_info);
+	if (ret) {
+		btrfs_err(fs_info, "failed to init dev_stats: %d", ret);
+		return ret;
+	}
+
+	ret = btrfs_init_dev_replace(fs_info);
+	if (ret) {
+		btrfs_err(fs_info, "failed to init dev_replace: %d", ret);
+		return ret;
+	}
+
+	ret = btrfs_check_zoned_mode(fs_info);
+	if (ret) {
+		btrfs_err(fs_info, "failed to initialize zoned mode: %d", ret);
+		return ret;
+	}
+	btrfs_free_zone_cache(fs_info);
+
+	return 0;
+}
+
 struct init_sequence {
 	int (*init_func)(struct btrfs_fs_info *fs_info);
 	void (*exit_func)(struct btrfs_fs_info *fs_info);
@@ -3708,6 +3752,9 @@ static const struct init_sequence open_ctree_seq[] = {
 	}, {
 		.init_func = open_ctree_tree_roots_init,
 		.exit_func = open_ctree_tree_roots_exit,
+	}, {
+		.init_func = open_ctree_load_items_init,
+		.exit_func = NULL,
 	}
 };
 
@@ -3733,38 +3780,6 @@ int __cold open_ctree(struct super_block *sb, char *options)
 		open_ctree_res[i] = true;
 	}
 
-	ret = btrfs_verify_dev_extents(fs_info);
-	if (ret) {
-		btrfs_err(fs_info,
-			  "failed to verify dev extents against chunks: %d",
-			  ret);
-		goto fail_block_groups;
-	}
-	ret = btrfs_recover_balance(fs_info);
-	if (ret) {
-		btrfs_err(fs_info, "failed to recover balance: %d", ret);
-		goto fail_block_groups;
-	}
-
-	ret = btrfs_init_dev_stats(fs_info);
-	if (ret) {
-		btrfs_err(fs_info, "failed to init dev_stats: %d", ret);
-		goto fail_block_groups;
-	}
-
-	ret = btrfs_init_dev_replace(fs_info);
-	if (ret) {
-		btrfs_err(fs_info, "failed to init dev_replace: %d", ret);
-		goto fail_block_groups;
-	}
-
-	ret = btrfs_check_zoned_mode(fs_info);
-	if (ret) {
-		btrfs_err(fs_info, "failed to initialize zoned mode: %d",
-			  ret);
-		goto fail_block_groups;
-	}
-
 	ret = btrfs_sysfs_add_fsid(fs_devices);
 	if (ret) {
 		btrfs_err(fs_info, "failed to init sysfs fsid interface: %d",
@@ -3790,8 +3805,6 @@ int __cold open_ctree(struct super_block *sb, char *options)
 		goto fail_sysfs;
 	}
 
-	btrfs_free_zone_cache(fs_info);
-
 	fs_info->cleaner_kthread = kthread_run(cleaner_kthread, fs_info,
 					       "btrfs-cleaner");
 	if (IS_ERR(fs_info->cleaner_kthread))
-- 
2.37.3



* [PATCH 11/16] btrfs: extract sysfs init into its own helper
  2022-09-22  0:06 [PATCH 00/16] btrfs: make open_ctree() init/exit sequence strictly matched Qu Wenruo
                   ` (9 preceding siblings ...)
  2022-09-22  0:06 ` [PATCH 10/16] btrfs: extract mount time checks and items load code into its init helper Qu Wenruo
@ 2022-09-22  0:06 ` Qu Wenruo
  2022-09-22  0:06 ` [PATCH 12/16] btrfs: extract block groups read code into its own init/exit helpers Qu Wenruo
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 18+ messages in thread
From: Qu Wenruo @ 2022-09-22  0:06 UTC (permalink / raw)
  To: linux-btrfs

The three functions btrfs_sysfs_add_fsid(), btrfs_sysfs_add_mounted()
and btrfs_init_space_info() are all dealing with sysfs related code.

The last one can only be called after the fsid sysfs entry has been
created, thus they are all put into the same helper.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c | 66 ++++++++++++++++++++++++++++------------------
 1 file changed, 41 insertions(+), 25 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 6e96829b8140..8f8a5fa62d1b 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3728,6 +3728,44 @@ static int open_ctree_load_items_init(struct btrfs_fs_info *fs_info)
 	return 0;
 }
 
+static int open_ctree_sysfs_init(struct btrfs_fs_info *fs_info)
+{
+	int ret;
+
+	ret = btrfs_sysfs_add_fsid(fs_info->fs_devices);
+	if (ret) {
+		btrfs_err(fs_info, "failed to init sysfs fsid interface: %d",
+				ret);
+		return ret;
+	}
+
+	ret = btrfs_sysfs_add_mounted(fs_info);
+	if (ret) {
+		btrfs_err(fs_info, "failed to init sysfs interface: %d", ret);
+		goto free_fsid;
+	}
+
+	/* This can only be called after the fsid entry being added. */
+	ret = btrfs_init_space_info(fs_info);
+	if (ret) {
+		btrfs_err(fs_info, "failed to initialize space info: %d", ret);
+		goto free_mounted;
+	}
+	return 0;
+
+free_mounted:
+	btrfs_sysfs_remove_mounted(fs_info);
+free_fsid:
+	btrfs_sysfs_remove_fsid(fs_info->fs_devices);
+	return ret;
+}
+
+static void open_ctree_sysfs_exit(struct btrfs_fs_info *fs_info)
+{
+	btrfs_sysfs_remove_mounted(fs_info);
+	btrfs_sysfs_remove_fsid(fs_info->fs_devices);
+}
+
 struct init_sequence {
 	int (*init_func)(struct btrfs_fs_info *fs_info);
 	void (*exit_func)(struct btrfs_fs_info *fs_info);
@@ -3755,6 +3793,9 @@ static const struct init_sequence open_ctree_seq[] = {
 	}, {
 		.init_func = open_ctree_load_items_init,
 		.exit_func = NULL,
+	}, {
+		.init_func = open_ctree_sysfs_init,
+		.exit_func = open_ctree_sysfs_exit,
 	}
 };
 
@@ -3780,25 +3821,6 @@ int __cold open_ctree(struct super_block *sb, char *options)
 		open_ctree_res[i] = true;
 	}
 
-	ret = btrfs_sysfs_add_fsid(fs_devices);
-	if (ret) {
-		btrfs_err(fs_info, "failed to init sysfs fsid interface: %d",
-				ret);
-		goto fail_block_groups;
-	}
-
-	ret = btrfs_sysfs_add_mounted(fs_info);
-	if (ret) {
-		btrfs_err(fs_info, "failed to init sysfs interface: %d", ret);
-		goto fail_fsdev_sysfs;
-	}
-
-	ret = btrfs_init_space_info(fs_info);
-	if (ret) {
-		btrfs_err(fs_info, "failed to initialize space info: %d", ret);
-		goto fail_sysfs;
-	}
-
 	ret = btrfs_read_block_groups(fs_info);
 	if (ret) {
 		btrfs_err(fs_info, "failed to read block groups: %d", ret);
@@ -3898,12 +3920,6 @@ int __cold open_ctree(struct super_block *sb, char *options)
 	filemap_write_and_wait(fs_info->btree_inode->i_mapping);
 
 fail_sysfs:
-	btrfs_sysfs_remove_mounted(fs_info);
-
-fail_fsdev_sysfs:
-	btrfs_sysfs_remove_fsid(fs_info->fs_devices);
-
-fail_block_groups:
 	btrfs_put_block_group_cache(fs_info);
 	btrfs_free_block_groups(fs_info);
 fail:
-- 
2.37.3



* [PATCH 12/16] btrfs: extract block groups read code into its own init/exit helpers
  2022-09-22  0:06 [PATCH 00/16] btrfs: make open_ctree() init/exit sequence strictly matched Qu Wenruo
                   ` (10 preceding siblings ...)
  2022-09-22  0:06 ` [PATCH 11/16] btrfs: extract sysfs init into its own helper Qu Wenruo
@ 2022-09-22  0:06 ` Qu Wenruo
  2022-09-22  0:06 ` [PATCH 13/16] btrfs: move the fs root related " Qu Wenruo
                   ` (3 subsequent siblings)
  15 siblings, 0 replies; 18+ messages in thread
From: Qu Wenruo @ 2022-09-22  0:06 UTC (permalink / raw)
  To: linux-btrfs

The only special handling is:

- Need error cleanup even inside open_ctree_block_groups_init()
  btrfs_read_block_groups() can error out with some block groups
  already inserted.

  Thus we have to do the cleanup manually in the error path there, as
  the exit helper will not be called if the init helper failed.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c | 39 +++++++++++++++++++++++++++++----------
 1 file changed, 29 insertions(+), 10 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 8f8a5fa62d1b..c034b017c316 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3766,6 +3766,31 @@ static void open_ctree_sysfs_exit(struct btrfs_fs_info *fs_info)
 	btrfs_sysfs_remove_fsid(fs_info->fs_devices);
 }
 
+static int open_ctree_block_groups_init(struct btrfs_fs_info *fs_info)
+{
+	int ret;
+
+	ret = btrfs_read_block_groups(fs_info);
+
+	/*
+	 * Even if btrfs_read_block_groups() failed, we may still have
+	 * inserted some block groups.
+	 * Thus we have to do cleanup here manually for error path, as
+	 * our exit function won't be executed for error path.
+	 */
+	if (ret < 0) {
+		btrfs_put_block_group_cache(fs_info);
+		btrfs_free_block_groups(fs_info);
+	}
+	return ret;
+}
+
+static void open_ctree_block_groups_exit(struct btrfs_fs_info *fs_info)
+{
+	btrfs_put_block_group_cache(fs_info);
+	btrfs_free_block_groups(fs_info);
+}
+
 struct init_sequence {
 	int (*init_func)(struct btrfs_fs_info *fs_info);
 	void (*exit_func)(struct btrfs_fs_info *fs_info);
@@ -3796,6 +3821,9 @@ static const struct init_sequence open_ctree_seq[] = {
 	}, {
 		.init_func = open_ctree_sysfs_init,
 		.exit_func = open_ctree_sysfs_exit,
+	}, {
+		.init_func = open_ctree_block_groups_init,
+		.exit_func = open_ctree_block_groups_exit,
 	}
 };
 
@@ -3821,16 +3849,10 @@ int __cold open_ctree(struct super_block *sb, char *options)
 		open_ctree_res[i] = true;
 	}
 
-	ret = btrfs_read_block_groups(fs_info);
-	if (ret) {
-		btrfs_err(fs_info, "failed to read block groups: %d", ret);
-		goto fail_sysfs;
-	}
-
 	fs_info->cleaner_kthread = kthread_run(cleaner_kthread, fs_info,
 					       "btrfs-cleaner");
 	if (IS_ERR(fs_info->cleaner_kthread))
-		goto fail_sysfs;
+		goto fail;
 
 	fs_info->transaction_kthread = kthread_run(transaction_kthread,
 						   fs_info->tree_root,
@@ -3919,9 +3941,6 @@ int __cold open_ctree(struct super_block *sb, char *options)
 	 */
 	filemap_write_and_wait(fs_info->btree_inode->i_mapping);
 
-fail_sysfs:
-	btrfs_put_block_group_cache(fs_info);
-	btrfs_free_block_groups(fs_info);
 fail:
 	for (i = ARRAY_SIZE(open_ctree_seq) - 1; i >= 0; i--) {
 		if (!open_ctree_res[i] || !open_ctree_seq[i].exit_func)
-- 
2.37.3



* [PATCH 13/16] btrfs: move the fs root related code into its own init/exit helpers
  2022-09-22  0:06 [PATCH 00/16] btrfs: make open_ctree() init/exit sequence strictly matched Qu Wenruo
                   ` (11 preceding siblings ...)
  2022-09-22  0:06 ` [PATCH 12/16] btrfs: extract block groups read code into its own init/exit helpers Qu Wenruo
@ 2022-09-22  0:06 ` Qu Wenruo
  2022-09-22  0:06 ` [PATCH 14/16] btrfs: extract kthread " Qu Wenruo
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 18+ messages in thread
From: Qu Wenruo @ 2022-09-22  0:06 UTC (permalink / raw)
  To: linux-btrfs

The most important change in this patch is the timing change.

The existing code reads the fs root very late, after
kthread/qgroup-rescan/log-replay, but puts btrfs_free_fs_roots() very
early, as the kthreads, qgroup rescan and log replay can all populate
fs roots.

Thus this patch changes the timing by reading the fs root early.
The fs root read part is not that important, but the cleanup part is.

After the timing change, the fs root is the first subvolume to be read,
and its exit call is ensured to cover all later possible subvolume
loads.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c | 33 ++++++++++++++++++++++++---------
 1 file changed, 24 insertions(+), 9 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index c034b017c316..a152899fa21a 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3791,6 +3791,19 @@ static void open_ctree_block_groups_exit(struct btrfs_fs_info *fs_info)
 	btrfs_free_block_groups(fs_info);
 }
 
+static int open_ctree_fs_root_init(struct btrfs_fs_info *fs_info)
+{
+	fs_info->fs_root = btrfs_get_fs_root(fs_info, BTRFS_FS_TREE_OBJECTID, true);
+	if (IS_ERR(fs_info->fs_root)) {
+		int ret = PTR_ERR(fs_info->fs_root);
+
+		btrfs_warn(fs_info, "failed to read fs tree: %d", ret);
+		fs_info->fs_root = NULL;
+		return ret;
+	}
+	return 0;
+}
+
 struct init_sequence {
 	int (*init_func)(struct btrfs_fs_info *fs_info);
 	void (*exit_func)(struct btrfs_fs_info *fs_info);
@@ -3824,6 +3837,17 @@ static const struct init_sequence open_ctree_seq[] = {
 	}, {
 		.init_func = open_ctree_block_groups_init,
 		.exit_func = open_ctree_block_groups_exit,
+	}, {
+		/*
+		 * This fs roots related code should be called before anything
+		 * which may try to read a subvolume, including cleanup/commit
+		 * kthread, qgroup rescan, log replay etc.
+		 *
+		 * The main reason is for the exit function to be called for
+		 * any stage which may read some subvolume trees.
+		 */
+		.init_func = open_ctree_fs_root_init,
+		.exit_func = btrfs_free_fs_roots,
 	}
 };
 
@@ -3884,14 +3908,6 @@ int __cold open_ctree(struct super_block *sb, char *options)
 		}
 	}
 
-	fs_info->fs_root = btrfs_get_fs_root(fs_info, BTRFS_FS_TREE_OBJECTID, true);
-	if (IS_ERR(fs_info->fs_root)) {
-		err = PTR_ERR(fs_info->fs_root);
-		btrfs_warn(fs_info, "failed to read fs tree: %d", err);
-		fs_info->fs_root = NULL;
-		goto fail_qgroup;
-	}
-
 	if (sb_rdonly(fs_info->sb))
 		goto clear_oneshot;
 
@@ -3931,7 +3947,6 @@ int __cold open_ctree(struct super_block *sb, char *options)
 fail_trans_kthread:
 	kthread_stop(fs_info->transaction_kthread);
 	btrfs_cleanup_transaction(fs_info);
-	btrfs_free_fs_roots(fs_info);
 fail_cleaner:
 	kthread_stop(fs_info->cleaner_kthread);
 
-- 
2.37.3



* [PATCH 14/16] btrfs: extract kthread code into its own init/exit helpers
  2022-09-22  0:06 [PATCH 00/16] btrfs: make open_ctree() init/exit sequence strictly matched Qu Wenruo
                   ` (12 preceding siblings ...)
  2022-09-22  0:06 ` [PATCH 13/16] btrfs: move the fs root related " Qu Wenruo
@ 2022-09-22  0:06 ` Qu Wenruo
  2022-09-22  0:06 ` [PATCH 15/16] btrfs: move qgroup init/exit code into open_ctree_seq[] array Qu Wenruo
  2022-09-22  0:06 ` [PATCH 16/16] btrfs: introduce a debug mount option to do error injection for each stage of open_ctree() Qu Wenruo
  15 siblings, 0 replies; 18+ messages in thread
From: Qu Wenruo @ 2022-09-22  0:06 UTC (permalink / raw)
  To: linux-btrfs

There are several changes involved:

- Change the timing of btrfs_cleanup_transaction()
  That call is there to address any unfinished transaction, mostly
  started by the cleaner/commit kthreads.

  Thus in the exit function and in the error handling path, we should
  stop all kthreads first, and only then clean up the unfinished
  transaction, instead of calling it before stopping the cleaner
  thread.

- Remove the filemap_write_and_wait() call
  Now we have the open_ctree_btree_inode_exit() call, which will
  invalidate all dirty pages of the btree inode.
  Thus there is no need to write back those dirtied tree blocks anymore.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c | 79 +++++++++++++++++++++++++++++-----------------
 1 file changed, 50 insertions(+), 29 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index a152899fa21a..7d46c9442b0f 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3804,6 +3804,52 @@ static int open_ctree_fs_root_init(struct btrfs_fs_info *fs_info)
 	return 0;
 }
 
+static int open_ctree_kthread_init(struct btrfs_fs_info *fs_info)
+{
+	int ret;
+
+	fs_info->cleaner_kthread = kthread_run(cleaner_kthread, fs_info,
+					       "btrfs-cleaner");
+	if (IS_ERR(fs_info->cleaner_kthread)) {
+		ret = PTR_ERR(fs_info->cleaner_kthread);
+		return ret;
+	}
+
+	fs_info->transaction_kthread = kthread_run(transaction_kthread,
+						   fs_info->tree_root,
+						   "btrfs-transaction");
+	if (IS_ERR(fs_info->transaction_kthread)) {
+		kthread_stop(fs_info->cleaner_kthread);
+
+		/*
+		 * Cleanup thread may have already started a trans.
+		 * The dirtied tree blocks will be invalidated at
+		 * open_ctree_btree_inode_exit() thus we don't need to bother.
+		 */
+		btrfs_cleanup_transaction(fs_info);
+
+		ret = PTR_ERR(fs_info->transaction_kthread);
+		return ret;
+	}
+	/*
+	 * Mount does not set all options immediately, we can do it now and do
+	 * not have to wait for transaction commit
+	 */
+	btrfs_apply_pending_changes(fs_info);
+	return 0;
+}
+
+static void open_ctree_kthread_exit(struct btrfs_fs_info *fs_info)
+{
+	kthread_stop(fs_info->transaction_kthread);
+	kthread_stop(fs_info->cleaner_kthread);
+	/*
+	 * Cleanup any unfinished transaction started by transaction/cleaner
+	 * kthread.
+	 */
+	btrfs_cleanup_transaction(fs_info);
+}
+
 struct init_sequence {
 	int (*init_func)(struct btrfs_fs_info *fs_info);
 	void (*exit_func)(struct btrfs_fs_info *fs_info);
@@ -3848,6 +3894,9 @@ static const struct init_sequence open_ctree_seq[] = {
 		 */
 		.init_func = open_ctree_fs_root_init,
 		.exit_func = btrfs_free_fs_roots,
+	}, {
+		.init_func = open_ctree_kthread_init,
+		.exit_func = open_ctree_kthread_exit,
 	}
 };
 
@@ -3873,26 +3922,9 @@ int __cold open_ctree(struct super_block *sb, char *options)
 		open_ctree_res[i] = true;
 	}
 
-	fs_info->cleaner_kthread = kthread_run(cleaner_kthread, fs_info,
-					       "btrfs-cleaner");
-	if (IS_ERR(fs_info->cleaner_kthread))
-		goto fail;
-
-	fs_info->transaction_kthread = kthread_run(transaction_kthread,
-						   fs_info->tree_root,
-						   "btrfs-transaction");
-	if (IS_ERR(fs_info->transaction_kthread))
-		goto fail_cleaner;
-
-	/*
-	 * Mount does not set all options immediately, we can do it now and do
-	 * not have to wait for transaction commit
-	 */
-	btrfs_apply_pending_changes(fs_info);
-
 	ret = btrfs_read_qgroup_config(fs_info);
 	if (ret)
-		goto fail_trans_kthread;
+		goto fail;
 
 	if (btrfs_build_ref_tree(fs_info))
 		btrfs_err(fs_info, "couldn't build ref tree");
@@ -3944,17 +3976,6 @@ int __cold open_ctree(struct super_block *sb, char *options)
 
 fail_qgroup:
 	btrfs_free_qgroup_config(fs_info);
-fail_trans_kthread:
-	kthread_stop(fs_info->transaction_kthread);
-	btrfs_cleanup_transaction(fs_info);
-fail_cleaner:
-	kthread_stop(fs_info->cleaner_kthread);
-
-	/*
-	 * make sure we're done with the btree inode before we stop our
-	 * kthreads
-	 */
-	filemap_write_and_wait(fs_info->btree_inode->i_mapping);
 
 fail:
 	for (i = ARRAY_SIZE(open_ctree_seq) - 1; i >= 0; i--) {
-- 
2.37.3



* [PATCH 15/16] btrfs: move qgroup init/exit code into open_ctree_seq[] array
  2022-09-22  0:06 [PATCH 00/16] btrfs: make open_ctree() init/exit sequence strictly matched Qu Wenruo
                   ` (13 preceding siblings ...)
  2022-09-22  0:06 ` [PATCH 14/16] btrfs: extract kthread " Qu Wenruo
@ 2022-09-22  0:06 ` Qu Wenruo
  2022-09-22  0:06 ` [PATCH 16/16] btrfs: introduce a debug mount option to do error injection for each stage of open_ctree() Qu Wenruo
  15 siblings, 0 replies; 18+ messages in thread
From: Qu Wenruo @ 2022-09-22  0:06 UTC (permalink / raw)
  To: linux-btrfs

The qgroup related code is already extracted into two functions,
btrfs_read_qgroup_config() and btrfs_free_qgroup_config().

They are a perfect match for open_ctree_seq[], so just move them into
the open_ctree_seq[] array.

And with the usage of open_ctree_seq[], there is no more need for the
@err variable, so just remove it.
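
Both functions already have the signatures that struct init_sequence
expects, so they can be plugged in directly; for reference (prototypes
as of this series):

	struct init_sequence {
		int (*init_func)(struct btrfs_fs_info *fs_info);
		void (*exit_func)(struct btrfs_fs_info *fs_info);
	};

	int btrfs_read_qgroup_config(struct btrfs_fs_info *fs_info);
	void btrfs_free_qgroup_config(struct btrfs_fs_info *fs_info);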

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c | 21 ++++++---------------
 1 file changed, 6 insertions(+), 15 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 7d46c9442b0f..14dcec9a23aa 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3897,6 +3897,9 @@ static const struct init_sequence open_ctree_seq[] = {
 	}, {
 		.init_func = open_ctree_kthread_init,
 		.exit_func = open_ctree_kthread_exit,
+	}, {
+		.init_func = btrfs_read_qgroup_config,
+		.exit_func = btrfs_free_qgroup_config,
 	}
 };
 
@@ -3906,7 +3909,6 @@ int __cold open_ctree(struct super_block *sb, char *options)
 	struct btrfs_fs_devices *fs_devices = fs_info->fs_devices;
 	bool open_ctree_res[ARRAY_SIZE(open_ctree_seq)] = {0};
 	int ret;
-	int err = -EINVAL;
 	int i;
 
 	fs_info->sb = sb;
@@ -3922,10 +3924,6 @@ int __cold open_ctree(struct super_block *sb, char *options)
 		open_ctree_res[i] = true;
 	}
 
-	ret = btrfs_read_qgroup_config(fs_info);
-	if (ret)
-		goto fail;
-
 	if (btrfs_build_ref_tree(fs_info))
 		btrfs_err(fs_info, "couldn't build ref tree");
 
@@ -3934,10 +3932,8 @@ int __cold open_ctree(struct super_block *sb, char *options)
 	    !btrfs_test_opt(fs_info, NOLOGREPLAY)) {
 		btrfs_info(fs_info, "start tree-log replay");
 		ret = btrfs_replay_log(fs_info, fs_devices);
-		if (ret) {
-			err = ret;
-			goto fail_qgroup;
-		}
+		if (ret)
+			goto fail;
 	}
 
 	if (sb_rdonly(fs_info->sb))
@@ -3974,9 +3970,6 @@ int __cold open_ctree(struct super_block *sb, char *options)
 	btrfs_clear_oneshot_options(fs_info);
 	return 0;
 
-fail_qgroup:
-	btrfs_free_qgroup_config(fs_info);
-
 fail:
 	for (i = ARRAY_SIZE(open_ctree_seq) - 1; i >= 0; i--) {
 		if (!open_ctree_res[i] || !open_ctree_seq[i].exit_func)
@@ -3985,9 +3978,7 @@ int __cold open_ctree(struct super_block *sb, char *options)
 		open_ctree_res[i] = false;
 	}
 	btrfs_close_devices(fs_info->fs_devices);
-	if (ret < 0)
-		err = ret;
-	return err;
+	return ret;
 }
 ALLOW_ERROR_INJECTION(open_ctree, ERRNO);
 
-- 
2.37.3



* [PATCH 16/16] btrfs: introduce a debug mount option to do error injection for each stage of open_ctree()
  2022-09-22  0:06 [PATCH 00/16] btrfs: make open_ctree() init/exit sequence strictly matched Qu Wenruo
                   ` (14 preceding siblings ...)
  2022-09-22  0:06 ` [PATCH 15/16] btrfs: move qgroup init/exit code into open_ctree_seq[] array Qu Wenruo
@ 2022-09-22  0:06 ` Qu Wenruo
  15 siblings, 0 replies; 18+ messages in thread
From: Qu Wenruo @ 2022-09-22  0:06 UTC (permalink / raw)
  To: linux-btrfs

With the new open_ctree_seq[] array, we can afford a debug mount option
to do error injection at each stage and get much better coverage of the
error paths.

The new "fail_mount=%u" mount option is hidden behind the
CONFIG_BTRFS_DEBUG option, and when enabled it will cause the mount to
fail just after the init function of the specified stage.

This can be verified by the following script:

 mkfs.btrfs -f $dev
 for (( i=0;; i++ )) do
	mount -o fail_mount=$i $dev $mnt
	ret=$?
	if [ $ret -eq 0 ]; then
		umount $mnt
		exit
	fi
 done
 umount $mnt
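
For reference, the stage number corresponds to the position in
open_ctree_seq[] as it stands at the end of this series; condensed here
with positional initializers just for brevity:

	static const struct init_sequence open_ctree_seq[] = {
		{ open_ctree_btree_inode_init,  open_ctree_btree_inode_exit },  /*  0 */
		{ open_ctree_super_init,        NULL },                         /*  1 */
		{ open_ctree_features_init,     NULL },                         /*  2 */
		{ btrfs_init_workqueues,        btrfs_stop_all_workers },       /*  3 */
		{ open_ctree_chunk_tree_init,   open_ctree_chunk_tree_exit },   /*  4 */
		{ open_ctree_tree_roots_init,   open_ctree_tree_roots_exit },   /*  5 */
		{ open_ctree_load_items_init,   NULL },                         /*  6 */
		{ open_ctree_sysfs_init,        open_ctree_sysfs_exit },        /*  7 */
		{ open_ctree_block_groups_init, open_ctree_block_groups_exit }, /*  8 */
		{ open_ctree_fs_root_init,      btrfs_free_fs_roots },          /*  9 */
		{ open_ctree_kthread_init,      open_ctree_kthread_exit },      /* 10 */
		{ btrfs_read_qgroup_config,     btrfs_free_qgroup_config },     /* 11 */
	};

Since fs_info->fail_stage is only set once open_ctree_features_init()
has parsed the mount options, any value below 2 effectively fails right
after stage 2 as well.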

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/ctree.h   |  1 +
 fs/btrfs/disk-io.c | 14 ++++++++++++++
 fs/btrfs/super.c   | 13 +++++++++++++
 3 files changed, 28 insertions(+)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 88dc3b2896d7..7908027a6bb6 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1123,6 +1123,7 @@ struct btrfs_fs_info {
 
 	spinlock_t eb_leak_lock;
 	struct list_head allocated_ebs;
+	int fail_stage;
 #endif
 };
 
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 14dcec9a23aa..66d45521a9df 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3013,6 +3013,7 @@ void btrfs_init_fs_info(struct btrfs_fs_info *fs_info)
 	INIT_LIST_HEAD(&fs_info->allocated_roots);
 	INIT_LIST_HEAD(&fs_info->allocated_ebs);
 	spin_lock_init(&fs_info->eb_leak_lock);
+	fs_info->fail_stage = -1;
 #endif
 	extent_map_tree_init(&fs_info->mapping_tree);
 	btrfs_init_block_rsv(&fs_info->global_block_rsv,
@@ -3922,6 +3923,19 @@ int __cold open_ctree(struct super_block *sb, char *options)
 		if (ret < 0)
 			goto fail;
 		open_ctree_res[i] = true;
+#ifdef CONFIG_BTRFS_DEBUG
+		/*
+		 * This is not the best timing, as fail_stage will only be
+		 * initialized after open_ctree_features_init().
+		 * But this is still better to cover more error paths.
+		 */
+		if (fs_info->fail_stage >= 0 && i >= fs_info->fail_stage) {
+			btrfs_info(fs_info,
+				"error injected at open ctree stage %u", i);
+			ret = -ECANCELED;
+			goto fail;
+		}
+#endif
 	}
 
 	if (btrfs_build_ref_tree(fs_info))
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 9d86b7ef9e43..76e6ca9ea98d 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -447,6 +447,7 @@ enum {
 	Opt_enospc_debug, Opt_noenospc_debug,
 #ifdef CONFIG_BTRFS_DEBUG
 	Opt_fragment_data, Opt_fragment_metadata, Opt_fragment_all,
+	Opt_fail_mount,
 #endif
 #ifdef CONFIG_BTRFS_FS_REF_VERIFY
 	Opt_ref_verify,
@@ -521,6 +522,7 @@ static const match_table_t tokens = {
 	{Opt_fragment_data, "fragment=data"},
 	{Opt_fragment_metadata, "fragment=metadata"},
 	{Opt_fragment_all, "fragment=all"},
+	{Opt_fail_mount, "fail_mount=%u"},
 #endif
 #ifdef CONFIG_BTRFS_FS_REF_VERIFY
 	{Opt_ref_verify, "ref_verify"},
@@ -1106,6 +1108,17 @@ int btrfs_parse_options(struct btrfs_fs_info *info, char *options,
 			btrfs_info(info, "fragmenting data");
 			btrfs_set_opt(info->mount_opt, FRAGMENT_DATA);
 			break;
+		case Opt_fail_mount:
+			ret = match_int(&args[0], &intarg);
+			if (ret) {
+				btrfs_err(info, "unrecognized fail_mount value %s",
+					  args[0].from);
+				goto out;
+			}
+			btrfs_info(info, "fail mount at open_ctree() stage %u",
+				   intarg);
+			info->fail_stage = intarg;
+			break;
 #endif
 #ifdef CONFIG_BTRFS_FS_REF_VERIFY
 		case Opt_ref_verify:
-- 
2.37.3



* Re: [PATCH 06/16] btrfs: extract mount options and features init code into its own init helper
  2022-09-22  0:06 ` [PATCH 06/16] btrfs: extract mount options and features init " Qu Wenruo
@ 2022-09-22  0:28   ` Qu Wenruo
  0 siblings, 0 replies; 18+ messages in thread
From: Qu Wenruo @ 2022-09-22  0:28 UTC (permalink / raw)
  To: linux-btrfs



On 2022/9/22 08:06, Qu Wenruo wrote:
> This patch will extract the mount option parsing and features checking
> code into a new helper, open_ctree_features_init().
> 
> This extraction also did the following non-functional change:
> 
> - Add btrfs_fs_info::__options member
>    Currently the open_ctree_* helpers can only accept a single @fs_info
>    parameter, to parse the mount options we have to use a temporary
>    pointer for this purpose.
> 
>    Thankfully we don't need to do anything like freeing it.
> 
> - Move ssd optimization check into open_ctree_features_init()
> 
> - Move check_int related code into open_ctree_features_init()
>    The mount time integrity check doesn't rely on any tree blocks, thus
>    is safe to be called at feature init time.
> 
> - Separate @features variable into @incompat and @compat_ro
>    So there will be no confusion, and compiler should be clever enough
>    to optimize them out anyway.
> 
> - Properly return error for subpage initialization failure
>    Previously we just goto the fail_alloc label, relying on the default
>    -EINVAL error.
>    Now we can return a proper -ENOMEM error instead.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>   fs/btrfs/ctree.h   |   6 ++
>   fs/btrfs/disk-io.c | 176 +++++++++++++++++++++++----------------------
>   2 files changed, 96 insertions(+), 86 deletions(-)
> 
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index 8b7b7a212da0..88dc3b2896d7 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -1106,6 +1106,12 @@ struct btrfs_fs_info {
>   	struct lockdep_map btrfs_trans_pending_ordered_map;
>   	struct lockdep_map btrfs_ordered_extent_map;
>   
> +	/*
> +	 * Temporary pointer to the mount option string.
> +	 * This is to workaround the fact that all open_ctree() init
> +	 * functions can only accept a single @fs_info pointer.
> +	 */
> +	char *__options;
>   #ifdef CONFIG_BTRFS_FS_REF_VERIFY
>   	spinlock_t ref_verify_lock;
>   	struct rb_root block_tree;
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index f4c04cb1c0d6..ec9038eeaa0f 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -3446,118 +3446,78 @@ static int open_ctree_super_init(struct btrfs_fs_info *fs_info)
>   	return ret;
>   }
>   
> -struct init_sequence {
> -	int (*init_func)(struct btrfs_fs_info *fs_info);
> -	void (*exit_func)(struct btrfs_fs_info *fs_info);
> -};
> -
> -static const struct init_sequence open_ctree_seq[] = {
> -	{
> -		.init_func = open_ctree_btree_inode_init,
> -		.exit_func = open_ctree_btree_inode_exit,
> -	}, {
> -		.init_func = open_ctree_super_init,
> -		.exit_func = NULL,
> -	}
> -};
> -
> -
> -int __cold open_ctree(struct super_block *sb, char *options)
> +static int open_ctree_features_init(struct btrfs_fs_info *fs_info)
>   {
> -	u64 generation;
> -	u64 features;
> -	bool open_ctree_res[ARRAY_SIZE(open_ctree_seq)] = {0};
> -	struct btrfs_fs_info *fs_info = btrfs_sb(sb);
> -	struct btrfs_fs_devices *fs_devices = fs_info->fs_devices;
>   	int ret;
> -	int err = -EINVAL;
> -	int level;
> -	int i;
> -
> -	fs_info->sb = sb;
> -
> -	/* Caller should have already initialized fs_info->fs_devices. */
> -	ASSERT(fs_info->fs_devices);
> -
> -	for (i = 0; i < ARRAY_SIZE(open_ctree_seq); i++) {
> -		ret = open_ctree_seq[i].init_func(fs_info);
> -		if (ret < 0)
> -			goto fail;
> -		open_ctree_res[i] = true;
> -	}
> +	u64 incompat;
> +	u64 compat_ro;
>   
> -	ret = btrfs_parse_options(fs_info, options, fs_info->sb->s_flags);
> -	if (ret) {
> -		err = ret;
> -		goto fail_alloc;
> -	}
> +	ret = btrfs_parse_options(fs_info, fs_info->__options,
> +				  fs_info->sb->s_flags);
> +	if (ret)
> +		return ret;
>   
> -	features = btrfs_super_incompat_flags(fs_info->super_copy) &
> -		~BTRFS_FEATURE_INCOMPAT_SUPP;
> -	if (features) {
> +	incompat = btrfs_super_incompat_flags(fs_info->super_copy);
> +	if (incompat & ~BTRFS_FEATURE_INCOMPAT_SUPP) {
>   		btrfs_err(fs_info,
>   		    "cannot mount because of unsupported optional features (0x%llx)",
> -		    features);
> -		err = -EINVAL;
> -		goto fail_alloc;
> +		    incompat & ~BTRFS_FEATURE_INCOMPAT_SUPP);
> +		return -EINVAL;
>   	}
>   
> -	features = btrfs_super_incompat_flags(fs_info->super_copy);
> -	features |= BTRFS_FEATURE_INCOMPAT_MIXED_BACKREF;
> +	incompat |= BTRFS_FEATURE_INCOMPAT_MIXED_BACKREF;
>   	if (fs_info->compress_type == BTRFS_COMPRESS_LZO)
> -		features |= BTRFS_FEATURE_INCOMPAT_COMPRESS_LZO;
> +		incompat |= BTRFS_FEATURE_INCOMPAT_COMPRESS_LZO;
>   	else if (fs_info->compress_type == BTRFS_COMPRESS_ZSTD)
> -		features |= BTRFS_FEATURE_INCOMPAT_COMPRESS_ZSTD;
> +		incompat |= BTRFS_FEATURE_INCOMPAT_COMPRESS_ZSTD;
>   
>   	/*
>   	 * Flag our filesystem as having big metadata blocks if they are bigger
>   	 * than the page size.
>   	 */
>   	if (btrfs_super_nodesize(fs_info->super_copy) > PAGE_SIZE)
> -		features |= BTRFS_FEATURE_INCOMPAT_BIG_METADATA;
> +		incompat |= BTRFS_FEATURE_INCOMPAT_BIG_METADATA;
>   
>   	/*
>   	 * mixed block groups end up with duplicate but slightly offset
>   	 * extent buffers for the same range.  It leads to corruptions
>   	 */
> -	if ((features & BTRFS_FEATURE_INCOMPAT_MIXED_GROUPS) &&
> +	if ((incompat & BTRFS_FEATURE_INCOMPAT_MIXED_GROUPS) &&
>   	    (fs_info->sectorsize != fs_info->nodesize)) {
>   		btrfs_err(fs_info,
>   "unequal nodesize/sectorsize (%u != %u) are not allowed for mixed block groups",
>   			fs_info->nodesize, fs_info->sectorsize);
> -		goto fail_alloc;
> +		return -EINVAL;
>   	}
>   
>   	/*
>   	 * Needn't use the lock because there is no other task which will
>   	 * update the flag.
>   	 */
> -	btrfs_set_super_incompat_flags(fs_info->super_copy, features);
> +	btrfs_set_super_incompat_flags(fs_info->super_copy, incompat);
>   
> -	features = btrfs_super_compat_ro_flags(fs_info->super_copy) &
> -		~BTRFS_FEATURE_COMPAT_RO_SUPP;
> -	if (!sb_rdonly(fs_info->sb) && features) {
> +	compat_ro = btrfs_super_compat_ro_flags(fs_info->super_copy);
> +	if (!sb_rdonly(fs_info->sb) &&
> +	    (compat_ro & ~BTRFS_FEATURE_COMPAT_RO_SUPP)) {
>   		btrfs_err(fs_info,
>   	"cannot mount read-write because of unsupported optional features (0x%llx)",
> -		       features);
> -		err = -EINVAL;
> -		goto fail_alloc;
> +		       compat_ro & ~BTRFS_FEATURE_COMPAT_RO_SUPP);
> +		return -EINVAL;
>   	}
>   	/*
>   	 * We have unsupported RO compat features, although RO mounted, we
>   	 * should not cause any metadata write, including log replay.
>   	 * Or we could screw up whatever the new feature requires.
>   	 */
> -	if (unlikely(features && btrfs_super_log_root(fs_info->super_copy) &&
> +	if (unlikely((compat_ro & ~BTRFS_FEATURE_COMPAT_RO_SUPP)
> +		     && btrfs_super_log_root(fs_info->super_copy) &&
>   		     !btrfs_test_opt(fs_info, NOLOGREPLAY))) {
>   		btrfs_err(fs_info,
>   "cannot replay dirty log with unsupported compat_ro features (0x%llx), try rescue=nologreplay",
> -			  features);
> -		err = -EINVAL;
> -		goto fail_alloc;
> +			  compat_ro & ~BTRFS_FEATURE_COMPAT_RO_SUPP);
> +		return -EINVAL;
>   	}
>   
> -
>   	if (fs_info->sectorsize < PAGE_SIZE) {
>   		struct btrfs_subpage_info *subpage_info;
>   
> @@ -3577,11 +3537,73 @@ int __cold open_ctree(struct super_block *sb, char *options)
>   			   fs_info->sectorsize, PAGE_SIZE);
>   		subpage_info = kzalloc(sizeof(*subpage_info), GFP_KERNEL);
>   		if (!subpage_info)
> -			goto fail_alloc;
> +			return -ENOMEM;
>   		btrfs_init_subpage_info(subpage_info, fs_info->sectorsize);
>   		fs_info->subpage_info = subpage_info;
>   	}
>   
> +	if (!btrfs_test_opt(fs_info, NOSSD) &&
> +	    !fs_info->fs_devices->rotating)
> +		btrfs_set_and_info(fs_info, SSD, "enabling ssd optimizations");
> +
> +#ifdef CONFIG_BTRFS_FS_CHECK_INTEGRITY
> +	if (btrfs_test_opt(fs_info, CHECK_INTEGRITY)) {

Btrfs/220 would crash at this stage.

> +		ret = btrfsic_mount(fs_info, fs_info->fs_devices,
> +				    btrfs_test_opt(fs_info,
> +					CHECK_INTEGRITY_DATA) ? 1 : 0,
> +				    fs_info->check_integrity_print_mask);

Surprisingly btrfsic_mount() needs at least the btrfs logical address
mapping, thus it can only be called after the chunk tree has been
initialized.

Would move this into open_ctree_chunk_tree_init() instead.
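
A minimal sketch of that move (illustrative only, not a tested patch):
drop the block from open_ctree_features_init() and append it just
before the final return 0 of open_ctree_chunk_tree_init(), after
btrfs_read_chunk_tree() has populated the mapping:

	/* ... existing open_ctree_chunk_tree_init() body ... */

#ifdef CONFIG_BTRFS_FS_CHECK_INTEGRITY
	/* The chunk mapping is fully populated at this point. */
	if (btrfs_test_opt(fs_info, CHECK_INTEGRITY)) {
		ret = btrfsic_mount(fs_info, fs_info->fs_devices,
				    btrfs_test_opt(fs_info,
					CHECK_INTEGRITY_DATA) ? 1 : 0,
				    fs_info->check_integrity_print_mask);
		if (ret)
			btrfs_warn(fs_info,
				"failed to initialize integrity check module: %d",
				ret);
	}
#endif
	return 0;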

Thanks,
Qu

> +		if (ret)
> +			btrfs_warn(fs_info,
> +				"failed to initialize integrity check module: %d",
> +				ret);
> +	}
> +#endif
> +
> +	return 0;
> +}
> +
> +struct init_sequence {
> +	int (*init_func)(struct btrfs_fs_info *fs_info);
> +	void (*exit_func)(struct btrfs_fs_info *fs_info);
> +};
> +
> +static const struct init_sequence open_ctree_seq[] = {
> +	{
> +		.init_func = open_ctree_btree_inode_init,
> +		.exit_func = open_ctree_btree_inode_exit,
> +	}, {
> +		.init_func = open_ctree_super_init,
> +		.exit_func = NULL,
> +	}, {
> +		.init_func = open_ctree_features_init,
> +		.exit_func = NULL,
> +	}
> +};
> +
> +int __cold open_ctree(struct super_block *sb, char *options)
> +{
> +	u64 generation;
> +	struct btrfs_fs_info *fs_info = btrfs_sb(sb);
> +	struct btrfs_fs_devices *fs_devices = fs_info->fs_devices;
> +	bool open_ctree_res[ARRAY_SIZE(open_ctree_seq)] = {0};
> +	int ret;
> +	int err = -EINVAL;
> +	int level;
> +	int i;
> +
> +	fs_info->sb = sb;
> +	fs_info->__options = options;
> +
> +	/* Caller should have already initialized fs_info->fs_devices. */
> +	ASSERT(fs_info->fs_devices);
> +
> +	for (i = 0; i < ARRAY_SIZE(open_ctree_seq); i++) {
> +		ret = open_ctree_seq[i].init_func(fs_info);
> +		if (ret < 0)
> +			goto fail;
> +		open_ctree_res[i] = true;
> +	}
> +
>   	ret = btrfs_init_workqueues(fs_info);
>   	if (ret) {
>   		err = ret;
> @@ -3736,29 +3758,12 @@ int __cold open_ctree(struct super_block *sb, char *options)
>   	if (IS_ERR(fs_info->transaction_kthread))
>   		goto fail_cleaner;
>   
> -	if (!btrfs_test_opt(fs_info, NOSSD) &&
> -	    !fs_info->fs_devices->rotating) {
> -		btrfs_set_and_info(fs_info, SSD, "enabling ssd optimizations");
> -	}
> -
>   	/*
>   	 * Mount does not set all options immediately, we can do it now and do
>   	 * not have to wait for transaction commit
>   	 */
>   	btrfs_apply_pending_changes(fs_info);
>   
> -#ifdef CONFIG_BTRFS_FS_CHECK_INTEGRITY
> -	if (btrfs_test_opt(fs_info, CHECK_INTEGRITY)) {
> -		ret = btrfsic_mount(fs_info, fs_devices,
> -				    btrfs_test_opt(fs_info,
> -					CHECK_INTEGRITY_DATA) ? 1 : 0,
> -				    fs_info->check_integrity_print_mask);
> -		if (ret)
> -			btrfs_warn(fs_info,
> -				"failed to initialize integrity check module: %d",
> -				ret);
> -	}
> -#endif
>   	ret = btrfs_read_qgroup_config(fs_info);
>   	if (ret)
>   		goto fail_trans_kthread;
> @@ -3851,7 +3856,6 @@ int __cold open_ctree(struct super_block *sb, char *options)
>   fail_sb_buffer:
>   	btrfs_stop_all_workers(fs_info);
>   	btrfs_free_block_groups(fs_info);
> -fail_alloc:
>   	btrfs_mapping_tree_free(&fs_info->mapping_tree);
>   fail:
>   	for (i = ARRAY_SIZE(open_ctree_seq) - 1; i >= 0; i--) {
