* [PATCH v7 0/2] btrfs: Introduce new rescue= mount options
@ 2020-06-04  7:18 Qu Wenruo
  2020-06-04  7:18 ` [PATCH v7 1/2] btrfs: Introduce "rescue=" mount option Qu Wenruo
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Qu Wenruo @ 2020-06-04  7:18 UTC (permalink / raw)
  To: linux-btrfs

There are quite a lot of btrfs extent tree corruption reports on the
mailing list.
Since btrfs searches for block group items at mount time, one corrupted
leaf containing a block group item will prevent the whole fs from being
mounted.

This patchset will try to address the problem by introducing a new mount
option, "rescue=skipbg", as a last-resort rescue.
With "rescue=skipbg", the whole extent tree will be skipped if we hit
some problems at mount time.
As a side effect, for very large filesystems this mount option can also
greatly reduce the mount time.

Of course this option will have a lot of restrictions to prevent further
screwing up the fs, including:

- Permanent RO
  No remount rw is allowed

- No dirty log
  Either clean the log or use rescue=nologreplay mount option
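
As a rough illustration of the restrictions above (not code from this
patchset), the checks could be modelled in plain userspace C as below.
struct mount_state and both function names are hypothetical; the real
checks are the ones added to open_ctree() and btrfs_remount() in patch 2.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified view of the mount state for this sketch. */
struct mount_state {
	bool read_only;      /* mounted with "ro" */
	bool log_tree_dirty; /* superblock records a log tree root */
	bool nologreplay;    /* rescue=nologreplay was given */
	bool skipbg;         /* rescue=skipbg was given */
};

/* Return 0 if the option combination is allowed at mount time. */
static int check_skipbg_mount(const struct mount_state *st)
{
	if (!st->skipbg)
		return 0;
	/* skipbg is only allowed on a read-only mount */
	if (!st->read_only)
		return -EINVAL;
	/* a dirty log must not be replayed, so nologreplay is required */
	if (st->log_tree_dirty && !st->nologreplay)
		return -EINVAL;
	return 0;
}

/* Return 0 if a remount with the new settings may proceed. */
static int check_skipbg_remount(const struct mount_state *old,
				bool new_skipbg, bool new_read_only)
{
	/* skipbg can be neither added nor dropped by a remount */
	if (old->skipbg != new_skipbg)
		return -EINVAL;
	/* with skipbg set, going read-write is refused */
	if (new_skipbg && !new_read_only)
		return -EINVAL;
	return 0;
}
```

Both helpers only model the decision, they do not touch any filesystem
state.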

This "rescue=skipbg" has some advantages over user space tools
like "btrfs-restore":
- Unified recovery tool
  User can use any tool they're familiar with, as long as the kernel
  doesn't panic.

- More info for subvolume.
  "btrfs subv list" can work now!


Also move the following mount options to "rescue=" group:
- nologreplay
  to rescue=nologreplay

- usebackuproot
  to rescue=usebackuproot

Old options are still available for compatibility purpose, but they are
deprecated in favor of new 'rescue=' super option.

Different rescue sub options can be separated by ':', like:
"rescue=nologreplay:skipbg:usebackuproot".
Or the traditional but longer way like:
"rescue=nologreplay,rescue=skipbg"

The separator character was chosen to meet two requirements:
- No conflicts with existing characters
  Especially no conflict with ','.

- No extra escaping/quoting
  The original plan was ';', but since it would be interpreted by bash,
  it was changed to the current ':'.
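
To make the two spellings concrete, here is a minimal userspace sketch
of splitting a "rescue=" value on ':'. It is not the kernel parser
(patch 1 uses strsep() and match_token()); the RESCUE_* flag values,
struct rescue_token and parse_rescue_value() are assumptions made up for
this illustration.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flag bits; these are not the kernel's BTRFS_MOUNT_* values. */
enum {
	RESCUE_USEBACKUPROOT = 1u << 0,
	RESCUE_NOLOGREPLAY   = 1u << 1,
	RESCUE_SKIPBG        = 1u << 2,
};

struct rescue_token {
	const char *name;
	unsigned int flag;
};

static const struct rescue_token rescue_tokens[] = {
	{ "usebackuproot", RESCUE_USEBACKUPROOT },
	{ "nologreplay",   RESCUE_NOLOGREPLAY },
	{ "skipbg",        RESCUE_SKIPBG },
};

/*
 * Parse the value of one "rescue=" option, e.g. "nologreplay:skipbg".
 * Matching bits are OR-ed into *flags. Returns 0 on success or -EINVAL
 * on the first unrecognized sub-option (bits set before that point are
 * kept). Empty tokens such as in "a::b" are skipped.
 */
static int parse_rescue_value(const char *value, unsigned int *flags)
{
	const size_t ntokens = sizeof(rescue_tokens) / sizeof(rescue_tokens[0]);
	const char *p = value;

	while (*p) {
		size_t len = strcspn(p, ":"); /* length of the current token */
		size_t i;
		int found = 0;

		for (i = 0; len > 0 && i < ntokens; i++) {
			if (strlen(rescue_tokens[i].name) == len &&
			    strncmp(p, rescue_tokens[i].name, len) == 0) {
				*flags |= rescue_tokens[i].flag;
				found = 1;
				break;
			}
		}
		if (len > 0 && !found)
			return -EINVAL;

		p += len;
		if (*p == ':')
			p++; /* step over the separator */
	}
	return 0;
}
```

Because the bits are OR-ed into the same flags word, calling the helper
once with "nologreplay:skipbg" or twice with "nologreplay" and "skipbg"
gives the same result, which mirrors the equivalence of the compact and
the longer spelling.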

With this ability to mount the fs without iterating the extent tree, we
have the possibility of making the skinny bg tree feature RO compat
rather than completely incompat.

Changelog:
v2:
- Introduce 'rescue=' super option.
- Rename original 'usebackuproot' and 'nologreplay'.
  It at least makes my vim spell check happier.
- Remove 'recovery' mount option.
  As its successor is now deprecated, there is no need to keep the
  predecessor.

v2.1:
- Rebase to v5.1-rc4.
- Fix the typos in the cover letter.

v3:
- Rebased to v5.2-rc2.
- Update commit message to include an example for "rescue=" options.
- Remove unnecessary exclusion of super blocks spaces and block group
  ro.
  This seems to cause incorrect df output.

v4:
- Rebased to v5.3-rc7
  Minor conflicts due to some function name and structure change.
- Keep the old 'recovery' mount option
- Keep the old 'usebackuproot' and 'nologreplay' naming for 'rescue='
  mount options
  So just append 'rescue=' to existing mount option.

v5:
- Rebased to v5.4-rc1
  Minor conflicts caused by block-group.[ch] code movement.
- Fix a bug of wrong prompt and check for log tree
  It should prompt user and check nologreplay option, not notreelog.

v6:
- Rebased to misc-next
  Minor conflicts caused by btrfs_block_group_cache rename.

v7:
- Rebased to misc-next
  Minor conflicts caused by btrfs_block_group initialization refactor.

- Mention the skinny bg tree feature in the cover letter

Qu Wenruo (2):
  btrfs: Introduce "rescue=" mount option
  btrfs: Introduce new mount option to skip block group items scan

 fs/btrfs/block-group.c |  49 ++++++++++++++++++
 fs/btrfs/ctree.h       |   1 +
 fs/btrfs/disk-io.c     |  29 +++++++++--
 fs/btrfs/super.c       | 109 +++++++++++++++++++++++++++++++++++++----
 fs/btrfs/volumes.c     |   7 +++
 5 files changed, 181 insertions(+), 14 deletions(-)

-- 
2.26.2



* [PATCH v7 1/2] btrfs: Introduce "rescue=" mount option
  2020-06-04  7:18 [PATCH v7 0/2] btrfs: Introduce new rescue= mount options Qu Wenruo
@ 2020-06-04  7:18 ` Qu Wenruo
  2020-06-04 13:15   ` Josef Bacik
                     ` (2 more replies)
  2020-06-04  7:18 ` [PATCH v7 2/2] btrfs: Introduce new mount option to skip block group items scan Qu Wenruo
  2020-06-05  9:22 ` [PATCH v7 0/2] btrfs: Introduce new rescue= mount options Anand Jain
  2 siblings, 3 replies; 13+ messages in thread
From: Qu Wenruo @ 2020-06-04  7:18 UTC (permalink / raw)
  To: linux-btrfs

This patch introduces a new "rescue=" mount option group for all those
mount options for data recovery.

Different rescue sub options are separated by ':'. E.g.
"ro,rescue=nologreplay:usebackuproot".
(The original plan is to use ';', but ';' needs to be escaped/quoted,
or it will be interpreted by bash)

And obviously, user can specify rescue options one by one like:
"ro,rescue=nologreplay,rescue=usebackuproot"

The following mount options are converted to "rescue=", old mount
options are deprecated but still available for compatibility purpose:

- usebackuproot
  Now it's "rescue=usebackuproot"

- nologreplay
  Now it's "rescue=nologreplay"

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/super.c | 79 +++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 71 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index bc73fd670702..ed6d5d55ee93 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -326,7 +326,6 @@ enum {
 	Opt_defrag, Opt_nodefrag,
 	Opt_discard, Opt_nodiscard,
 	Opt_discard_mode,
-	Opt_nologreplay,
 	Opt_norecovery,
 	Opt_ratio,
 	Opt_rescan_uuid_tree,
@@ -340,9 +339,13 @@ enum {
 	Opt_subvolid,
 	Opt_thread_pool,
 	Opt_treelog, Opt_notreelog,
-	Opt_usebackuproot,
 	Opt_user_subvol_rm_allowed,
 
+	/* Rescue options */
+	Opt_rescue,
+	Opt_usebackuproot,
+	Opt_nologreplay,
+
 	/* Deprecated options */
 	Opt_alloc_start,
 	Opt_recovery,
@@ -390,7 +393,6 @@ static const match_table_t tokens = {
 	{Opt_discard, "discard"},
 	{Opt_discard_mode, "discard=%s"},
 	{Opt_nodiscard, "nodiscard"},
-	{Opt_nologreplay, "nologreplay"},
 	{Opt_norecovery, "norecovery"},
 	{Opt_ratio, "metadata_ratio=%u"},
 	{Opt_rescan_uuid_tree, "rescan_uuid_tree"},
@@ -408,9 +410,13 @@ static const match_table_t tokens = {
 	{Opt_thread_pool, "thread_pool=%u"},
 	{Opt_treelog, "treelog"},
 	{Opt_notreelog, "notreelog"},
-	{Opt_usebackuproot, "usebackuproot"},
 	{Opt_user_subvol_rm_allowed, "user_subvol_rm_allowed"},
 
+	/* Rescue options */
+	{Opt_rescue, "rescue=%s"},
+	{Opt_nologreplay, "nologreplay"},
+	{Opt_usebackuproot, "usebackuproot"},
+
 	/* Deprecated options */
 	{Opt_alloc_start, "alloc_start=%s"},
 	{Opt_recovery, "recovery"},
@@ -433,6 +439,55 @@ static const match_table_t tokens = {
 	{Opt_err, NULL},
 };
 
+static const match_table_t rescue_tokens = {
+	{Opt_usebackuproot, "usebackuproot"},
+	{Opt_nologreplay, "nologreplay"},
+	{Opt_err, NULL},
+};
+
+static int parse_rescue_options(struct btrfs_fs_info *info, const char *options)
+{
+	char *opts;
+	char *orig;
+	char *p;
+	substring_t args[MAX_OPT_ARGS];
+	int ret = 0;
+
+	opts = kstrdup(options, GFP_KERNEL);
+	if (!opts)
+		return -ENOMEM;
+	orig = opts;
+
+	while ((p = strsep(&opts, ":")) != NULL) {
+		int token;
+
+		if (!*p)
+			continue;
+		token = match_token(p, rescue_tokens, args);
+		switch (token) {
+		case Opt_usebackuproot:
+			btrfs_info(info,
+				   "trying to use backup root at mount time");
+			btrfs_set_opt(info->mount_opt, USEBACKUPROOT);
+			break;
+		case Opt_nologreplay:
+			btrfs_set_and_info(info, NOLOGREPLAY,
+					   "disabling log replay at mount time");
+			break;
+		case Opt_err:
+			btrfs_info(info, "unrecognized rescue option '%s'", p);
+			ret = -EINVAL;
+			goto out;
+		default:
+			break;
+		}
+
+	}
+out:
+	kfree(orig);
+	return ret;
+}
+
 /*
  * Regular mount options parser.  Everything that is needed only when
  * reading in a new superblock is parsed here.
@@ -689,6 +744,8 @@ int btrfs_parse_options(struct btrfs_fs_info *info, char *options,
 			break;
 		case Opt_norecovery:
 		case Opt_nologreplay:
+			btrfs_warn(info,
+	"'nologreplay' is deprecated, use 'rescue=nologreplay' instead");
 			btrfs_set_and_info(info, NOLOGREPLAY,
 					   "disabling log replay at mount time");
 			break;
@@ -791,10 +848,11 @@ int btrfs_parse_options(struct btrfs_fs_info *info, char *options,
 					     "disabling auto defrag");
 			break;
 		case Opt_recovery:
-			btrfs_warn(info,
-				   "'recovery' is deprecated, use 'usebackuproot' instead");
-			/* fall through */
 		case Opt_usebackuproot:
+			btrfs_warn(info,
+		"'%s' is deprecated, use 'rescue=usebackuproot' instead",
+				   token == Opt_recovery ? "recovery" :
+				   "usebackuproot");
 			btrfs_info(info,
 				   "trying to use backup root at mount time");
 			btrfs_set_opt(info->mount_opt, USEBACKUPROOT);
@@ -881,6 +939,11 @@ int btrfs_parse_options(struct btrfs_fs_info *info, char *options,
 			btrfs_set_opt(info->mount_opt, REF_VERIFY);
 			break;
 #endif
+		case Opt_rescue:
+			ret = parse_rescue_options(info, args[0].from);
+			if (ret < 0)
+				goto out;
+			break;
 		case Opt_err:
 			btrfs_err(info, "unrecognized mount option '%s'", p);
 			ret = -EINVAL;
@@ -1344,7 +1407,7 @@ static int btrfs_show_options(struct seq_file *seq, struct dentry *dentry)
 	if (btrfs_test_opt(info, NOTREELOG))
 		seq_puts(seq, ",notreelog");
 	if (btrfs_test_opt(info, NOLOGREPLAY))
-		seq_puts(seq, ",nologreplay");
+		seq_puts(seq, ",rescue=nologreplay");
 	if (btrfs_test_opt(info, FLUSHONCOMMIT))
 		seq_puts(seq, ",flushoncommit");
 	if (btrfs_test_opt(info, DISCARD_SYNC))
-- 
2.26.2



* [PATCH v7 2/2] btrfs: Introduce new mount option to skip block group items scan
  2020-06-04  7:18 [PATCH v7 0/2] btrfs: Introduce new rescue= mount options Qu Wenruo
  2020-06-04  7:18 ` [PATCH v7 1/2] btrfs: Introduce "rescue=" mount option Qu Wenruo
@ 2020-06-04  7:18 ` Qu Wenruo
  2020-06-04 13:17   ` Josef Bacik
  2020-06-05 10:03   ` Anand Jain
  2020-06-05  9:22 ` [PATCH v7 0/2] btrfs: Introduce new rescue= mount options Anand Jain
  2 siblings, 2 replies; 13+ messages in thread
From: Qu Wenruo @ 2020-06-04  7:18 UTC (permalink / raw)
  To: linux-btrfs

[PROBLEM]
There are some reports of corrupted fs which can't be mounted due to
corrupted extent tree.

However, in such situations it's quite likely that the fs/subvolume
trees are still fine.

For such cases we normally use btrfs-restore and salvage as much as we
can. However, btrfs-restore can't list subvolumes the way "btrfs subv
list" does, making it harder to restore a fs.

[ENHANCEMENT]
This patch will introduce a new mount option "rescue=skipbg" to skip
the mount time block group scan, and use chunk info solely to populate
fake block group cache.

The mount option has the following dependencies:
- RO mount
  Obviously.

- No dirty log.
  Either there is no log, or use rescue=nologreplay mount option.

- No way to remount RW
  Similar to rescue=nologreplay option.

This allows the kernel to tolerate any extent tree corruption, even when
the whole extent tree is corrupted, and allows the user to salvage data
and subvolume info.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/block-group.c | 49 ++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/ctree.h       |  1 +
 fs/btrfs/disk-io.c     | 29 +++++++++++++++++++++----
 fs/btrfs/super.c       | 32 ++++++++++++++++++++++++---
 fs/btrfs/volumes.c     |  7 ++++++
 5 files changed, 111 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 176e8a292fd1..5ed9d7946ce3 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -2000,6 +2000,52 @@ static int read_one_block_group(struct btrfs_fs_info *info,
 	return ret;
 }
 
+static int fill_dummy_bgs(struct btrfs_fs_info *fs_info)
+{
+	struct extent_map_tree *em_tree = &fs_info->mapping_tree;
+	struct extent_map *em;
+	struct map_lookup *map;
+	struct btrfs_block_group *bg;
+	struct btrfs_space_info *space_info;
+	struct rb_node *node;
+	int ret = 0;
+
+	read_lock(&em_tree->lock);
+	for (node = rb_first_cached(&em_tree->map); node;
+	     node = rb_next(node)) {
+		em = rb_entry(node, struct extent_map, rb_node);
+		map = em->map_lookup;
+		bg = btrfs_create_block_group_cache(fs_info, em->start);
+		if (!bg) {
+			ret = -ENOMEM;
+			goto out;
+		}
+
+		/* Fill dummy cache as FULL */
+		bg->length = em->len;
+		bg->flags = map->type;
+		bg->last_byte_to_unpin = (u64)-1;
+		bg->cached = BTRFS_CACHE_FINISHED;
+		bg->used = em->len;
+		bg->flags = map->type;
+		ret = btrfs_add_block_group_cache(fs_info, bg);
+		if (ret) {
+			btrfs_remove_free_space_cache(bg);
+			btrfs_put_block_group(bg);
+			goto out;
+		}
+		btrfs_update_space_info(fs_info, bg->flags, em->len, em->len,
+					0, &space_info);
+		bg->space_info = space_info;
+		link_block_group(bg);
+
+		set_avail_alloc_bits(fs_info, bg->flags);
+	}
+out:
+	read_unlock(&em_tree->lock);
+	return ret;
+}
+
 int btrfs_read_block_groups(struct btrfs_fs_info *info)
 {
 	struct btrfs_path *path;
@@ -2010,6 +2056,9 @@ int btrfs_read_block_groups(struct btrfs_fs_info *info)
 	int need_clear = 0;
 	u64 cache_gen;
 
+	if (btrfs_test_opt(info, SKIPBG))
+		return fill_dummy_bgs(info);
+
 	key.objectid = 0;
 	key.offset = 0;
 	key.type = BTRFS_BLOCK_GROUP_ITEM_KEY;
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 161533040978..f756775018bf 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1265,6 +1265,7 @@ static inline u32 BTRFS_MAX_XATTR_SIZE(const struct btrfs_fs_info *info)
 #define BTRFS_MOUNT_NOLOGREPLAY		(1 << 27)
 #define BTRFS_MOUNT_REF_VERIFY		(1 << 28)
 #define BTRFS_MOUNT_DISCARD_ASYNC	(1 << 29)
+#define BTRFS_MOUNT_SKIPBG		(1 << 30)
 
 #define BTRFS_DEFAULT_COMMIT_INTERVAL	(30)
 #define BTRFS_DEFAULT_MAX_INLINE	(2048)
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index f8ec2d8606fd..84d62bd53940 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2269,11 +2269,15 @@ static int btrfs_read_roots(struct btrfs_fs_info *fs_info)
 
 	root = btrfs_read_tree_root(tree_root, &location);
 	if (IS_ERR(root)) {
-		ret = PTR_ERR(root);
-		goto out;
+		if (!btrfs_test_opt(fs_info, SKIPBG)) {
+			ret = PTR_ERR(root);
+			goto out;
+		}
+		fs_info->extent_root = NULL;
+	} else {
+		set_bit(BTRFS_ROOT_TRACK_DIRTY, &root->state);
+		fs_info->extent_root = root;
 	}
-	set_bit(BTRFS_ROOT_TRACK_DIRTY, &root->state);
-	fs_info->extent_root = root;
 
 	location.objectid = BTRFS_DEV_TREE_OBJECTID;
 	root = btrfs_read_tree_root(tree_root, &location);
@@ -3047,6 +3051,23 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
 		goto fail_alloc;
 	}
 
+	/* Skip bg needs RO and no log tree replay */
+	if (btrfs_test_opt(fs_info, SKIPBG)) {
+		if (!sb_rdonly(sb)) {
+			btrfs_err(fs_info,
+	"rescue=skipbg mount option can only be used with read-only mount");
+			err = -EINVAL;
+			goto fail_alloc;
+		}
+		if (btrfs_super_log_root(disk_super) &&
+		    !btrfs_test_opt(fs_info, NOLOGREPLAY)) {
+			btrfs_err(fs_info,
+"rescue=skipbg must be used with rescue=nologreplay mount option for dirty log");
+			err = -EINVAL;
+			goto fail_alloc;
+		}
+	}
+
 	ret = btrfs_init_workqueues(fs_info, fs_devices);
 	if (ret) {
 		err = ret;
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index ed6d5d55ee93..ae909df0dccc 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -341,10 +341,11 @@ enum {
 	Opt_treelog, Opt_notreelog,
 	Opt_user_subvol_rm_allowed,
 
-	/* Rescue options */
+	/* Rescue options, Opt_rescue_* is only for rescue= mount options */
 	Opt_rescue,
 	Opt_usebackuproot,
 	Opt_nologreplay,
+	Opt_rescue_skipbg,
 
 	/* Deprecated options */
 	Opt_alloc_start,
@@ -442,6 +443,7 @@ static const match_table_t tokens = {
 static const match_table_t rescue_tokens = {
 	{Opt_usebackuproot, "usebackuproot"},
 	{Opt_nologreplay, "nologreplay"},
+	{Opt_rescue_skipbg, "skipbg"},
 	{Opt_err, NULL},
 };
 
@@ -474,6 +476,10 @@ static int parse_rescue_options(struct btrfs_fs_info *info, const char *options)
 			btrfs_set_and_info(info, NOLOGREPLAY,
 					   "disabling log replay at mount time");
 			break;
+		case Opt_rescue_skipbg:
+			btrfs_set_and_info(info, SKIPBG,
+				"skip mount time block group searching");
+			break;
 		case Opt_err:
 			btrfs_info(info, "unrecognized rescue option '%s'", p);
 			ret = -EINVAL;
@@ -1408,6 +1414,8 @@ static int btrfs_show_options(struct seq_file *seq, struct dentry *dentry)
 		seq_puts(seq, ",notreelog");
 	if (btrfs_test_opt(info, NOLOGREPLAY))
 		seq_puts(seq, ",rescue=nologreplay");
+	if (btrfs_test_opt(info, SKIPBG))
+		seq_puts(seq, ",rescue=skipbg");
 	if (btrfs_test_opt(info, FLUSHONCOMMIT))
 		seq_puts(seq, ",flushoncommit");
 	if (btrfs_test_opt(info, DISCARD_SYNC))
@@ -1847,6 +1855,14 @@ static int btrfs_remount(struct super_block *sb, int *flags, char *data)
 	if (ret)
 		goto restore;
 
+	if (btrfs_test_opt(fs_info, SKIPBG) !=
+	    (old_opts & BTRFS_MOUNT_SKIPBG)) {
+		btrfs_err(fs_info,
+		"rescue=skipbg mount option can't be changed during remount");
+		ret = -EINVAL;
+		goto restore;
+	}
+
 	btrfs_remount_begin(fs_info, old_opts, *flags);
 	btrfs_resize_thread_pool(fs_info,
 		fs_info->thread_pool_size, old_thread_pool_size);
@@ -1912,6 +1928,13 @@ static int btrfs_remount(struct super_block *sb, int *flags, char *data)
 			goto restore;
 		}
 
+		if (btrfs_test_opt(fs_info, SKIPBG)) {
+			btrfs_err(fs_info,
+		"remounting read-write with rescue=skipbg is not allowed");
+			ret = -EINVAL;
+			goto restore;
+		}
+
 		ret = btrfs_cleanup_fs_roots(fs_info);
 		if (ret)
 			goto restore;
@@ -2215,9 +2238,12 @@ static int btrfs_statfs(struct dentry *dentry, struct kstatfs *buf)
 	 * not fit in the free metadata space.  If we aren't ->full then we
 	 * still can allocate chunks and thus are fine using the currently
 	 * calculated f_bavail.
+	 *
+	 * Or if we're rescuing, set available to 0 anyway.
 	 */
-	if (!mixed && block_rsv->space_info->full &&
-	    total_free_meta - thresh < block_rsv->size)
+	if (btrfs_test_opt(fs_info, SKIPBG) ||
+	    (!mixed && block_rsv->space_info->full &&
+	     total_free_meta - thresh < block_rsv->size))
 		buf->f_bavail = 0;
 
 	buf->f_type = BTRFS_SUPER_MAGIC;
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 0d6e785bcb98..f89625de1fff 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -7594,6 +7594,13 @@ int btrfs_verify_dev_extents(struct btrfs_fs_info *fs_info)
 	u64 prev_dev_ext_end = 0;
 	int ret = 0;
 
+	/*
+	 * For rescue=skipbg mount option, we're already RO and are salvaging
+	 * data, no need for such strict check.
+	 */
+	if (btrfs_test_opt(fs_info, SKIPBG))
+		return 0;
+
 	key.objectid = 1;
 	key.type = BTRFS_DEV_EXTENT_KEY;
 	key.offset = 0;
-- 
2.26.2



* Re: [PATCH v7 1/2] btrfs: Introduce "rescue=" mount option
  2020-06-04  7:18 ` [PATCH v7 1/2] btrfs: Introduce "rescue=" mount option Qu Wenruo
@ 2020-06-04 13:15   ` Josef Bacik
  2020-06-05 10:04   ` Anand Jain
  2020-06-10 15:11   ` David Sterba
  2 siblings, 0 replies; 13+ messages in thread
From: Josef Bacik @ 2020-06-04 13:15 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs

On 6/4/20 3:18 AM, Qu Wenruo wrote:
> This patch introduces a new "rescue=" mount option group for all those
> mount options for data recovery.
> 
> Different rescue sub options are seperated by ':'. E.g
> "ro,rescue=nologreplay:usebackuproot".
> (The original plan is to use ';', but ';' needs to be escaped/quoted,
> or it will be interpreted by bash)
> 
> And obviously, user can specify rescue options one by one like:
> "ro,rescue=nologreplay,rescue=usebackuproot"
> 
> The following mount options are converted to "rescue=", old mount
> options are deprecated but still available for compatibility purpose:
> 
> - usebackuproot
>    Now it's "rescue=usebackuproot"
> 
> - nologreplay
>    Now it's "rescue=nologreplay"
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef


* Re: [PATCH v7 2/2] btrfs: Introduce new mount option to skip block group items scan
  2020-06-04  7:18 ` [PATCH v7 2/2] btrfs: Introduce new mount option to skip block group items scan Qu Wenruo
@ 2020-06-04 13:17   ` Josef Bacik
  2020-06-05 10:03   ` Anand Jain
  1 sibling, 0 replies; 13+ messages in thread
From: Josef Bacik @ 2020-06-04 13:17 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs

On 6/4/20 3:18 AM, Qu Wenruo wrote:
> [PROBLEM]
> There are some reports of corrupted fs which can't be mounted due to
> corrupted extent tree.
> 
> However under such situation, it's more likely the fs/subvolume trees
> are still fine.
> 
> For such case we normally go btrfs-restore and salvage as much as we
> can. However btrfs-restore can't list subvolumes as "btrfs subv list",
> making it harder to restore a fs.
> 
> [ENHANCEMENT]
> This patch will introduce a new mount option "rescue=skipbg" to skip
> the mount time block group scan, and use chunk info solely to populate
> fake block group cache.
> 
> The mount option has the following dependency:
> - RO mount
>    Obviously.
> 
> - No dirty log.
>    Either there is no log, or use rescue=nologreplay mount option.
> 
> - No way to remoutn RW
>    Similar to rescue=nologreplay option.
> 
> This allow kernel to accept all extent tree corruption, even when the
> whole extent tree is corrupted, and allow user to salvage data and
> subvolume info.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef


* Re: [PATCH v7 0/2] btrfs: Introduce new rescue= mount options
  2020-06-04  7:18 [PATCH v7 0/2] btrfs: Introduce new rescue= mount options Qu Wenruo
  2020-06-04  7:18 ` [PATCH v7 1/2] btrfs: Introduce "rescue=" mount option Qu Wenruo
  2020-06-04  7:18 ` [PATCH v7 2/2] btrfs: Introduce new mount option to skip block group items scan Qu Wenruo
@ 2020-06-05  9:22 ` Anand Jain
  2 siblings, 0 replies; 13+ messages in thread
From: Anand Jain @ 2020-06-05  9:22 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs



On 4/6/20 3:18 pm, Qu Wenruo wrote:
> There are quite a lot btrfs extent tree corruption report in the mail
> list.

> Since btrfs will do mount time block group item search, one corrupted
> leaf containing block group item will prevent the whole fs to be
> mounted.
> 

  Can you add btrfs_info/warn() at those places to indicate
  -o rescue=skip.. might help to mount in RO.

> This patchset will try to address the problem by introducing a new mount
> option, "rescue=skipbg", as a last-resort rescue.

  Do you think there might be other checks to skip during mount at some
  point? If so, why not add a generic -o rescue=skipchecks? Of course
  dmesg -k must show what has been skipped.

> This "rescue=skipbg" has some advantage compared to user space tool
> like "btrfs-restore":
> - Unified recovery tool

  Yes.

Thanks, Anand


* Re: [PATCH v7 2/2] btrfs: Introduce new mount option to skip block group items scan
  2020-06-04  7:18 ` [PATCH v7 2/2] btrfs: Introduce new mount option to skip block group items scan Qu Wenruo
  2020-06-04 13:17   ` Josef Bacik
@ 2020-06-05 10:03   ` Anand Jain
  1 sibling, 0 replies; 13+ messages in thread
From: Anand Jain @ 2020-06-05 10:03 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs



> @@ -2010,6 +2056,9 @@ int btrfs_read_block_groups(struct btrfs_fs_info *info)
>   	int need_clear = 0;
>   	u64 cache_gen;
>   
> +	if (btrfs_test_opt(info, SKIPBG))
> +		return fill_dummy_bgs(info);
> +

   Could it first try to read the block groups and, only if that fails,
   check whether the skip mount option and the other required options
   are set, and then continue or abort?
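
To illustrate the ordering suggested here, a minimal userspace sketch
follows. All names in it are hypothetical and it is not kernel code; the
patch as posted instead returns fill_dummy_bgs() right away whenever
SKIPBG is set.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback type standing in for a block group loader. */
typedef int (*bg_loader_fn)(void *ctx);

/*
 * Try the real block group scan first. Only if it fails and the rescue
 * option was given, fall back to dummy block groups. *used_fallback
 * reports which path produced the result.
 */
static int load_block_groups(bg_loader_fn scan_extent_tree,
			     bg_loader_fn fill_dummy, void *ctx,
			     bool skipbg, bool *used_fallback)
{
	int ret = scan_extent_tree(ctx);

	*used_fallback = false;
	if (ret == 0 || !skipbg)
		return ret;

	*used_fallback = true;
	return fill_dummy(ctx);
}

/* Stand-in loaders for trying the sketch out. */
static int loader_ok(void *ctx)
{
	(void)ctx;
	return 0;
}

static int loader_eio(void *ctx)
{
	(void)ctx;
	return -EIO;
}
```

One consequence of this ordering is that the full scan is always
attempted, so the shorter mount time mentioned in the cover letter would
presumably no longer apply.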


> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index f8ec2d8606fd..84d62bd53940 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -2269,11 +2269,15 @@ static int btrfs_read_roots(struct btrfs_fs_info *fs_info)
>   
>   	root = btrfs_read_tree_root(tree_root, &location);
>   	if (IS_ERR(root)) {
> -		ret = PTR_ERR(root);
> -		goto out;
> +		if (!btrfs_test_opt(fs_info, SKIPBG)) {
> +			ret = PTR_ERR(root);
> +			goto out;
> +		}

  Needs a btrfs_warn().

> @@ -2215,9 +2238,12 @@ static int btrfs_statfs(struct dentry *dentry, struct kstatfs *buf)
>   	 * not fit in the free metadata space.  If we aren't ->full then we
>   	 * still can allocate chunks and thus are fine using the currently
>   	 * calculated f_bavail.
> +	 *
> +	 * Or if we're rescuing, set available to 0 anyway.
>   	 */
> -	if (!mixed && block_rsv->space_info->full &&
> -	    total_free_meta - thresh < block_rsv->size)
> +	if (btrfs_test_opt(fs_info, SKIPBG) ||
> +	    (!mixed && block_rsv->space_info->full &&
> +	     total_free_meta - thresh < block_rsv->size))
>   		buf->f_bavail = 0;
>

  I wonder why this is necessary, when the RO and nologreplay mount
  options are prerequisites of the skip mount option.

  Also, it's not a good idea for df to report 0 available size.


> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 0d6e785bcb98..f89625de1fff 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -7594,6 +7594,13 @@ int btrfs_verify_dev_extents(struct btrfs_fs_info *fs_info)
>   	u64 prev_dev_ext_end = 0;
>   	int ret = 0;
>   
> +	/*
> +	 * For rescue=skipbg mount option, we're already RO and are salvaging
> +	 * data, no need for such strict check.
> +	 */
> +	if (btrfs_test_opt(fs_info, SKIPBG))
> +		return 0;
> +

  Here too, can we first verify whether the dev extents actually fail,
  and then check whether skip + the other necessary mount options are
  set, to continue/abort the mount?

Thanks, Anand



* Re: [PATCH v7 1/2] btrfs: Introduce "rescue=" mount option
  2020-06-04  7:18 ` [PATCH v7 1/2] btrfs: Introduce "rescue=" mount option Qu Wenruo
  2020-06-04 13:15   ` Josef Bacik
@ 2020-06-05 10:04   ` Anand Jain
  2020-06-05 11:36     ` David Sterba
  2020-06-10 15:11   ` David Sterba
  2 siblings, 1 reply; 13+ messages in thread
From: Anand Jain @ 2020-06-05 10:04 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs

On 4/6/20 3:18 pm, Qu Wenruo wrote:
> This patch introduces a new "rescue=" mount option group for all those
> mount options for data recovery.
> 
> Different rescue sub options are seperated by ':'. E.g
> "ro,rescue=nologreplay:usebackuproot".
> (The original plan is to use ';', but ';' needs to be escaped/quoted,
> or it will be interpreted by bash)

  I feel ':' isn't suitable here.

> And obviously, user can specify rescue options one by one like:
> "ro,rescue=nologreplay,rescue=usebackuproot"

  This should suffice right?

Thanks, Anand


* Re: [PATCH v7 1/2] btrfs: Introduce "rescue=" mount option
  2020-06-05 10:04   ` Anand Jain
@ 2020-06-05 11:36     ` David Sterba
  2020-06-08  8:11       ` Anand Jain
  0 siblings, 1 reply; 13+ messages in thread
From: David Sterba @ 2020-06-05 11:36 UTC (permalink / raw)
  To: Anand Jain; +Cc: Qu Wenruo, linux-btrfs

On Fri, Jun 05, 2020 at 06:04:01PM +0800, Anand Jain wrote:
> On 4/6/20 3:18 pm, Qu Wenruo wrote:
> > This patch introduces a new "rescue=" mount option group for all those
> > mount options for data recovery.
> > 
> > Different rescue sub options are seperated by ':'. E.g
> > "ro,rescue=nologreplay:usebackuproot".
> > (The original plan is to use ';', but ';' needs to be escaped/quoted,
> > or it will be interpreted by bash)
> 
>   I fell ':' isn't suitable here.

What do you suggest then?

> > And obviously, user can specify rescue options one by one like:
> > "ro,rescue=nologreplay,rescue=usebackuproot"
> 
>   This should suffice right?

Setting the rescue= value separately should be supported, but requiring
the option name to be written for each value defeats the purpose of
making it compact and user friendly.


* Re: [PATCH v7 1/2] btrfs: Introduce "rescue=" mount option
  2020-06-05 11:36     ` David Sterba
@ 2020-06-08  8:11       ` Anand Jain
  2020-06-08  9:39         ` Qu Wenruo
  2020-06-10 14:47         ` David Sterba
  0 siblings, 2 replies; 13+ messages in thread
From: Anand Jain @ 2020-06-08  8:11 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, linux-btrfs

On 5/6/20 7:36 pm, David Sterba wrote:
> On Fri, Jun 05, 2020 at 06:04:01PM +0800, Anand Jain wrote:
>> On 4/6/20 3:18 pm, Qu Wenruo wrote:
>>> This patch introduces a new "rescue=" mount option group for all those
>>> mount options for data recovery.
>>>
>>> Different rescue sub options are seperated by ':'. E.g
>>> "ro,rescue=nologreplay:usebackuproot".
>>> (The original plan is to use ';', but ';' needs to be escaped/quoted,
>>> or it will be interpreted by bash)
>>
>>    I fell ':' isn't suitable here.
> 
> What do you suggest then?
> 

There isn't any other choice, right? That is probably why the values
for -o device still remain separate, as in -o device=dev1,device=dev2.
IMO if there isn't a choice it is OK to leave them separate.

But as I commented in the other thread, instead of
-o rescue=skipbg:another1:another2, why not just -o rescue,
where the mount thread skips the checks that fail and mounts the
fs RO if possible? dmesg -k must show the checks that failed and
had to be skipped to make the RO mount succeed. That makes clear
which errors led to the current RO mount, instead of the user having
to go through the logs to figure it out. This is a more user-friendly
approach as there is only one rescue option. But I am not sure if it
is possible?

Thanks, Anand


>>> And obviously, user can specify rescue options one by one like:
>>> "ro,rescue=nologreplay,rescue=usebackuproot"
>>
>>    This should suffice right?
> 
> Setting the rescue= value separately should be supported, but requiring
> to write the option name for each value defeats the purpose to make it
> compact and user friendly.
> 



* Re: [PATCH v7 1/2] btrfs: Introduce "rescue=" mount option
  2020-06-08  8:11       ` Anand Jain
@ 2020-06-08  9:39         ` Qu Wenruo
  2020-06-10 14:47         ` David Sterba
  1 sibling, 0 replies; 13+ messages in thread
From: Qu Wenruo @ 2020-06-08  9:39 UTC (permalink / raw)
  To: Anand Jain, dsterba, Qu Wenruo, linux-btrfs





On 2020/6/8 4:11 PM, Anand Jain wrote:
> On 5/6/20 7:36 pm, David Sterba wrote:
>> On Fri, Jun 05, 2020 at 06:04:01PM +0800, Anand Jain wrote:
>>> On 4/6/20 3:18 pm, Qu Wenruo wrote:
>>>> This patch introduces a new "rescue=" mount option group for all those
>>>> mount options for data recovery.
>>>>
>>>> Different rescue sub options are seperated by ':'. E.g
>>>> "ro,rescue=nologreplay:usebackuproot".
>>>> (The original plan is to use ';', but ';' needs to be escaped/quoted,
>>>> or it will be interpreted by bash)
>>>
>>>    I fell ':' isn't suitable here.
>>
>> What do you suggest then?
>>
> 
> There isn't any other choice, right? Probably that's the reason for
> -o device it is -o device=dev1,device=dev2 still remains separated?
> IMO if there isn't a choice it is ok to leave them separate.
> 
> But as I commented in the other thread instead of
> -o rescue=skipbg:another1:another2 why not just -o rescue
> and mount thread shall skip the checks that fail and mount the
> fs in RO if possible.

That would make the dependencies complex. skipbg already needs
nologreplay and RO, and usebackuproot sometimes doesn't work as expected
(in fact, that mount option succeeds less often than we thought).

I don't want to spend too much code on a salvage mount option group.

Thanks,
Qu

> dmesg -k must show the checks that
> failed and had to be skipped to make the RO mount succeed.
> That makes clear which errors led to the current RO
> mount, instead of going through the logs to figure it out. This is a more
> user-friendly approach as there is one rescue option. But I am not
> sure if it is possible?
> 
> Thanks, Anand
> 
> 
>>>> And obviously, user can specify rescue options one by one like:
>>>> "ro,rescue=nologreplay,rescue=usebackuproot"
>>>
>>>    This should suffice right?
>>
>> Setting the rescue= value separately should be supported, but requiring
>> the option name to be written for each value defeats the purpose of making
>> it compact and user friendly.
>>
> 




* Re: [PATCH v7 1/2] btrfs: Introduce "rescue=" mount option
  2020-06-08  8:11       ` Anand Jain
  2020-06-08  9:39         ` Qu Wenruo
@ 2020-06-10 14:47         ` David Sterba
  1 sibling, 0 replies; 13+ messages in thread
From: David Sterba @ 2020-06-10 14:47 UTC (permalink / raw)
  To: Anand Jain; +Cc: dsterba, Qu Wenruo, linux-btrfs

On Mon, Jun 08, 2020 at 04:11:57PM +0800, Anand Jain wrote:
> On 5/6/20 7:36 pm, David Sterba wrote:
> > On Fri, Jun 05, 2020 at 06:04:01PM +0800, Anand Jain wrote:
> >> On 4/6/20 3:18 pm, Qu Wenruo wrote:
> >>> This patch introduces a new "rescue=" mount option group for all those
> >>> mount options for data recovery.
> >>>
> >>> Different rescue sub options are separated by ':'. E.g.
> >>> "ro,rescue=nologreplay:usebackuproot".
> >>> (The original plan is to use ';', but ';' needs to be escaped/quoted,
> >>> or it will be interpreted by bash)
> >>
> >>    I feel ':' isn't suitable here.
> > 
> > What do you suggest then?
> 
> There isn't any other choice, right? That's probably why -o device
> still takes separate entries, as in -o device=dev1,device=dev2.
> IMO if there isn't a choice it is ok to leave them separate.

I don't think -o device is a good example to follow. We'd hardly find
a good separator for filenames, because a device path can contain
anything. Names under /dev/disk/by-id, e.g., contain ":", so we'd need
escaping.
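
To illustrate the point, here is a minimal sketch that counts how many
fields a string yields when split on a separator. count_fields() and the
device path used below are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Count the fields obtained by splitting s on every occurrence of sep. */
static size_t count_fields(const char *s, char sep)
{
	size_t n = 1;

	assert(s != NULL);

	for (; *s != '\0'; s++)
		if (*s == sep)
			n++;
	return n;
}
```

With ':' as the separator, a single hypothetical path such as
"/dev/disk/by-id/usb-Example_Disk-0:0" is counted as two fields, so it
would be taken for two devices. The rescue sub-options are fixed keywords
that never contain ':', so "nologreplay:usebackuproot" splits cleanly.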

> But as I commented in the other thread instead of
> -o rescue=skipbg:another1:another2 why not just -o rescue
> and mount thread shall skip the checks that fail and mount the
> fs in RO if possible. dmesg -k must show the checks that
> failed and had to be skipped to make the RO mount succeed.
> That makes clear which errors led to the current RO mount,
> instead of going through the logs to figure it out. This is a more
> user-friendly approach as there is one rescue option. But I am not
> sure if it is possible?

That could be a mode of rescue= that tries hard to get the
filesystem mounted, but by default it's better to keep the actions
separate, so that, e.g., usebackuproot is not applied when skipbg is the
one that makes the mount possible.


* Re: [PATCH v7 1/2] btrfs: Introduce "rescue=" mount option
  2020-06-04  7:18 ` [PATCH v7 1/2] btrfs: Introduce "rescue=" mount option Qu Wenruo
  2020-06-04 13:15   ` Josef Bacik
  2020-06-05 10:04   ` Anand Jain
@ 2020-06-10 15:11   ` David Sterba
  2 siblings, 0 replies; 13+ messages in thread
From: David Sterba @ 2020-06-10 15:11 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Thu, Jun 04, 2020 at 03:18:06PM +0800, Qu Wenruo wrote:
> This patch introduces a new "rescue=" mount option group for all those
> mount options for data recovery.
> 
> Different rescue sub options are separated by ':'. E.g.
> "ro,rescue=nologreplay:usebackuproot".
> (The original plan is to use ';', but ';' needs to be escaped/quoted,
> or it will be interpreted by bash)

The separators available:

- "," already used for mount options
- ";" needs shell escaping
- "|" same
- "+" that also looks ok
- * & # $ % @  all would be confusing I guess

so ":" seems like a good choice.
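
A minimal userspace sketch of splitting one rescue= value on ':' could look
like the following. It is not the code from the patch: the flag names, the
table and parse_rescue_value() are assumptions made for illustration.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical flag bits, for illustration only (not the kernel's names). */
enum {
	RESCUE_NOLOGREPLAY   = 1u << 0,
	RESCUE_USEBACKUPROOT = 1u << 1,
	RESCUE_SKIPBG        = 1u << 2,
};

static const struct {
	const char *name;
	unsigned int bit;
} rescue_opts[] = {
	{ "nologreplay",   RESCUE_NOLOGREPLAY },
	{ "usebackuproot", RESCUE_USEBACKUPROOT },
	{ "skipbg",        RESCUE_SKIPBG },
};

/*
 * Parse the value of one "rescue=" option, e.g. "nologreplay:skipbg".
 * On success, OR the recognised bits into *flags and return 0.
 * On an unknown or empty sub-option, return -1 and leave *flags untouched.
 */
static int parse_rescue_value(const char *value, unsigned int *flags)
{
	unsigned int found = 0;
	const char *p = value;

	assert(value != NULL && flags != NULL);

	for (;;) {
		size_t len = strcspn(p, ":"); /* length up to next ':' or end */
		size_t i, n = sizeof(rescue_opts) / sizeof(rescue_opts[0]);

		for (i = 0; i < n; i++) {
			if (strlen(rescue_opts[i].name) == len &&
			    memcmp(p, rescue_opts[i].name, len) == 0) {
				found |= rescue_opts[i].bit;
				break;
			}
		}
		if (i == n)
			return -1;      /* no table entry matched */
		if (p[len] == '\0')
			break;          /* last sub-option consumed */
		p += len + 1;           /* skip past the ':' */
	}

	*flags |= found;
	return 0;
}
```

Because the bits are ORed into the caller's flags, calling the function
once per occurrence gives "rescue=nologreplay,rescue=usebackuproot" the
same result as the compact "rescue=nologreplay:usebackuproot".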

> And obviously, user can specify rescue options one by one like:
> "ro,rescue=nologreplay,rescue=usebackuproot"
> 
> The following mount options are converted to "rescue=", old mount
> options are deprecated but still available for compatibility purpose:
> 
> - usebackuproot
>   Now it's "rescue=usebackuproot"
> 
> - nologreplay
>   Now it's "rescue=nologreplay"
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>

I'll add the patches as a topic branch to for-next and to misc-next
eventually. Thanks.


Thread overview: 13+ messages
2020-06-04  7:18 [PATCH v7 0/2] btrfs: Introduce new rescue= mount options Qu Wenruo
2020-06-04  7:18 ` [PATCH v7 1/2] btrfs: Introduce "rescue=" mount option Qu Wenruo
2020-06-04 13:15   ` Josef Bacik
2020-06-05 10:04   ` Anand Jain
2020-06-05 11:36     ` David Sterba
2020-06-08  8:11       ` Anand Jain
2020-06-08  9:39         ` Qu Wenruo
2020-06-10 14:47         ` David Sterba
2020-06-10 15:11   ` David Sterba
2020-06-04  7:18 ` [PATCH v7 2/2] btrfs: Introduce new mount option to skip block group items scan Qu Wenruo
2020-06-04 13:17   ` Josef Bacik
2020-06-05 10:03   ` Anand Jain
2020-06-05  9:22 ` [PATCH v7 0/2] btrfs: Introduce new rescue= mount options Anand Jain
