* [PATCH v2 0/6] Chunk level degradable check
@ 2017-03-06  8:58 Qu Wenruo
  2017-03-06  8:58 ` [PATCH v2 1/6] btrfs: Introduce a function to check if all chunks are OK for degraded rw mount Qu Wenruo
                   ` (7 more replies)
  0 siblings, 8 replies; 20+ messages in thread
From: Qu Wenruo @ 2017-03-06  8:58 UTC (permalink / raw)
  To: linux-btrfs; +Cc: lists

Btrfs currently uses num_tolerated_disk_barrier_failures to do a global
check for the number of tolerated missing devices.

Although this one-size-fits-all solution is quite safe, it's too strict
if data and metadata have different duplication levels.

For example, if one uses single data and RAID1 metadata on 2 disks, any
missing device makes the fs impossible to mount degraded.

But in fact, sometimes all the single chunks may reside on the remaining
device, and in that case we should allow a degraded rw mount.

Such a case can be easily reproduced with the following script:
 # mkfs.btrfs -f -m raid1 -d single /dev/sdb /dev/sdc
 # wipefs -a /dev/sdc
 # mount /dev/sdb /mnt/btrfs -o degraded,rw

Checking /dev/sdb with btrfs-debug-tree shows that the data chunks are
only on sdb, so a degraded mount should in fact be allowed.

This patchset introduces a new per-chunk degradable check for btrfs,
allowing the above case to succeed, and it's quite small anyway.

It also enhances the kernel error messages for missing devices, so the
kernel can at least report what made the mount fail, rather than the
meaningless "failed to read system chunk/chunk tree -5".

v2:
  Updated after almost 2 years.
  Added the last patch to enhance the kernel output, so users can tell
  that missing devices prevented btrfs from mounting.

Qu Wenruo (6):
  btrfs: Introduce a function to check if all chunks are OK for degraded
    rw mount
  btrfs: Do chunk level rw degrade check at mount time
  btrfs: Do chunk level degradation check for remount
  btrfs: Allow barrier_all_devices to do chunk level device check
  btrfs: Cleanup num_tolerated_disk_barrier_failures
  btrfs: Enhance missing device kernel message

 fs/btrfs/ctree.h   |  2 --
 fs/btrfs/disk-io.c | 80 ++++++--------------------------------------
 fs/btrfs/disk-io.h |  2 --
 fs/btrfs/super.c   |  5 ++-
 fs/btrfs/volumes.c | 97 ++++++++++++++++++++++++++++++++++++++++--------------
 fs/btrfs/volumes.h |  7 ++++
 6 files changed, 92 insertions(+), 101 deletions(-)

-- 
2.12.0




^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v2 1/6] btrfs: Introduce a function to check if all chunks are OK for degraded rw mount
  2017-03-06  8:58 [PATCH v2 0/6] Chunk level degradable check Qu Wenruo
@ 2017-03-06  8:58 ` Qu Wenruo
  2017-03-07  4:48   ` Anand Jain
  2017-03-06  8:58 ` [PATCH v2 2/6] btrfs: Do chunk level rw degrade check at mount time Qu Wenruo
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 20+ messages in thread
From: Qu Wenruo @ 2017-03-06  8:58 UTC (permalink / raw)
  To: linux-btrfs; +Cc: lists

Introduce a new function, btrfs_check_rw_degradable(), to check whether
all chunks in the fs are OK for a degraded rw mount.

It provides a basis for accurate degraded checks at mount, remount, and
even runtime, replacing the old one-size-fits-all method.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 fs/btrfs/volumes.c | 53 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/volumes.h |  1 +
 2 files changed, 54 insertions(+)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 7c8c7bbee197..dd9dd94d7043 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6765,6 +6765,59 @@ int btrfs_read_sys_array(struct btrfs_fs_info *fs_info)
 	return -EIO;
 }
 
+/*
+ * Check if all chunks in the fs are OK for read-write degraded mount
+ *
+ * Return true if the fs is OK to be mounted degraded read-write
+ * Return false if the fs is not OK to be mounted degraded
+ */
+bool btrfs_check_rw_degradable(struct btrfs_fs_info *fs_info)
+{
+	struct btrfs_mapping_tree *map_tree = &fs_info->mapping_tree;
+	struct extent_map *em;
+	u64 next_start = 0;
+	bool ret = true;
+
+	read_lock(&map_tree->map_tree.lock);
+	em = lookup_extent_mapping(&map_tree->map_tree, 0, (u64)-1);
+	/* No chunk at all? Return false anyway */
+	if (!em) {
+		ret = false;
+		goto out;
+	}
+	while (em) {
+		struct map_lookup *map;
+		int missing = 0;
+		int max_tolerated;
+		int i;
+
+		map = (struct map_lookup *) em->bdev;
+		max_tolerated =
+			btrfs_get_num_tolerated_disk_barrier_failures(
+					map->type);
+		for (i = 0; i < map->num_stripes; i++) {
+			if (map->stripes[i].dev->missing)
+				missing++;
+		}
+		if (missing > max_tolerated) {
+			ret = false;
+			btrfs_warn(fs_info,
+	"chunk %llu missing %d devices, max tolerance is %d for writable mount",
+				   em->start, missing, max_tolerated);
+			free_extent_map(em);
+			goto out;
+		}
+		next_start = extent_map_end(em);
+		free_extent_map(em);
+
+		em = lookup_extent_mapping(&map_tree->map_tree, next_start,
+					   (u64)(-1) - next_start);
+	}
+out:
+	read_unlock(&map_tree->map_tree.lock);
+	return ret;
+}
+
 int btrfs_read_chunk_tree(struct btrfs_fs_info *fs_info)
 {
 	struct btrfs_root *root = fs_info->chunk_root;
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 59be81206dd7..db1b5ef479cf 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -538,4 +538,5 @@ struct list_head *btrfs_get_fs_uuids(void);
 void btrfs_set_fs_info_ptr(struct btrfs_fs_info *fs_info);
 void btrfs_reset_fs_info_ptr(struct btrfs_fs_info *fs_info);
 
+bool btrfs_check_rw_degradable(struct btrfs_fs_info *fs_info);
 #endif
-- 
2.12.0





* [PATCH v2 2/6] btrfs: Do chunk level rw degrade check at mount time
  2017-03-06  8:58 [PATCH v2 0/6] Chunk level degradable check Qu Wenruo
  2017-03-06  8:58 ` [PATCH v2 1/6] btrfs: Introduce a function to check if all chunks are OK for degraded rw mount Qu Wenruo
@ 2017-03-06  8:58 ` Qu Wenruo
  2017-03-06  8:58 ` [PATCH v2 3/6] btrfs: Do chunk level degradation check for remount Qu Wenruo
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 20+ messages in thread
From: Qu Wenruo @ 2017-03-06  8:58 UTC (permalink / raw)
  To: linux-btrfs; +Cc: lists

Now use btrfs_check_rw_degradable() to do the mount-time degradation
check.

With this patch, we can now mount in the following case:
 # mkfs.btrfs -f -m raid1 -d single /dev/sdb /dev/sdc
 # wipefs -a /dev/sdc
 # mount /dev/sdb /mnt/btrfs -o degraded
 As the single data chunks are only on sdb, it's OK to mount degraded:
 losing one device is fine for the RAID1 metadata.

But we still fail in the following case, as expected:
 # mkfs.btrfs -f -m raid1 -d single /dev/sdb /dev/sdc
 # wipefs -a /dev/sdb
 # mount /dev/sdc /mnt/btrfs -o degraded
 As the single data chunks are only on the now-missing sdb, it's not OK
 to mount degraded.

Reported-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Reported-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 fs/btrfs/disk-io.c | 11 +++--------
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 73fdc6bdaea9..c26b8a0b121c 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3057,15 +3057,10 @@ int open_ctree(struct super_block *sb,
 		btrfs_err(fs_info, "failed to read block groups: %d", ret);
 		goto fail_sysfs;
 	}
-	fs_info->num_tolerated_disk_barrier_failures =
-		btrfs_calc_num_tolerated_disk_barrier_failures(fs_info);
-	if (fs_info->fs_devices->missing_devices >
-	     fs_info->num_tolerated_disk_barrier_failures &&
-	    !(sb->s_flags & MS_RDONLY)) {
+
+	if (!(sb->s_flags & MS_RDONLY) && !btrfs_check_rw_degradable(fs_info)) {
 		btrfs_warn(fs_info,
-"missing devices (%llu) exceeds the limit (%d), writeable mount is not allowed",
-			fs_info->fs_devices->missing_devices,
-			fs_info->num_tolerated_disk_barrier_failures);
+		"writeable mount is not allowed due to too many missing devices");
 		goto fail_sysfs;
 	}
 
-- 
2.12.0





* [PATCH v2 3/6] btrfs: Do chunk level degradation check for remount
  2017-03-06  8:58 [PATCH v2 0/6] Chunk level degradable check Qu Wenruo
  2017-03-06  8:58 ` [PATCH v2 1/6] btrfs: Introduce a function to check if all chunks are OK for degraded rw mount Qu Wenruo
  2017-03-06  8:58 ` [PATCH v2 2/6] btrfs: Do chunk level rw degrade check at mount time Qu Wenruo
@ 2017-03-06  8:58 ` Qu Wenruo
  2017-03-06  8:58 ` [PATCH v2 4/6] btrfs: Allow barrier_all_devices to do chunk level device check Qu Wenruo
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 20+ messages in thread
From: Qu Wenruo @ 2017-03-06  8:58 UTC (permalink / raw)
  To: linux-btrfs; +Cc: lists

Just as with the mount-time check, use btrfs_check_rw_degradable() to
check whether we are OK to be remounted rw.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 fs/btrfs/super.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index da687dc79cce..1f5772501c92 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1784,9 +1784,8 @@ static int btrfs_remount(struct super_block *sb, int *flags, char *data)
 			goto restore;
 		}
 
-		if (fs_info->fs_devices->missing_devices >
-		     fs_info->num_tolerated_disk_barrier_failures &&
-		    !(*flags & MS_RDONLY)) {
+		if (!(*flags & MS_RDONLY) &&
+		    !btrfs_check_rw_degradable(fs_info)) {
 			btrfs_warn(fs_info,
 				"too many missing devices, writeable remount is not allowed");
 			ret = -EACCES;
-- 
2.12.0





* [PATCH v2 4/6] btrfs: Allow barrier_all_devices to do chunk level device check
  2017-03-06  8:58 [PATCH v2 0/6] Chunk level degradable check Qu Wenruo
                   ` (2 preceding siblings ...)
  2017-03-06  8:58 ` [PATCH v2 3/6] btrfs: Do chunk level degradation check for remount Qu Wenruo
@ 2017-03-06  8:58 ` Qu Wenruo
  2017-03-07  4:48   ` Anand Jain
  2017-03-06  8:58 ` [PATCH v2 5/6] btrfs: Cleanup num_tolerated_disk_barrier_failures Qu Wenruo
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 20+ messages in thread
From: Qu Wenruo @ 2017-03-06  8:58 UTC (permalink / raw)
  To: linux-btrfs; +Cc: lists

The last user of num_tolerated_disk_barrier_failures is
barrier_all_devices(), but it can easily be converted to the new
per-chunk degradable check framework.

btrfs_device now carries two extra members recording flush send/wait
errors, set at write_dev_flush() time.
With these two new members, btrfs_check_rw_degradable() can check
whether the fs is still OK at the time it is committed to disk.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 fs/btrfs/disk-io.c | 15 +++++++--------
 fs/btrfs/volumes.c |  4 +++-
 fs/btrfs/volumes.h |  4 ++++
 3 files changed, 14 insertions(+), 9 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index c26b8a0b121c..f596bd130524 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3569,17 +3569,17 @@ static int barrier_all_devices(struct btrfs_fs_info *info)
 {
 	struct list_head *head;
 	struct btrfs_device *dev;
-	int errors_send = 0;
-	int errors_wait = 0;
 	int ret;
 
 	/* send down all the barriers */
 	head = &info->fs_devices->devices;
 	list_for_each_entry_rcu(dev, head, dev_list) {
+		dev->err_wait = false;
+		dev->err_send = false;
 		if (dev->missing)
 			continue;
 		if (!dev->bdev) {
-			errors_send++;
+			dev->err_send = true;
 			continue;
 		}
 		if (!dev->in_fs_metadata || !dev->writeable)
@@ -3587,7 +3587,7 @@ static int barrier_all_devices(struct btrfs_fs_info *info)
 
 		ret = write_dev_flush(dev, 0);
 		if (ret)
-			errors_send++;
+			dev->err_send = true;
 	}
 
 	/* wait for all the barriers */
@@ -3595,7 +3595,7 @@ static int barrier_all_devices(struct btrfs_fs_info *info)
 		if (dev->missing)
 			continue;
 		if (!dev->bdev) {
-			errors_wait++;
+			dev->err_wait = true;
 			continue;
 		}
 		if (!dev->in_fs_metadata || !dev->writeable)
@@ -3603,10 +3603,9 @@ static int barrier_all_devices(struct btrfs_fs_info *info)
 
 		ret = write_dev_flush(dev, 1);
 		if (ret)
-			errors_wait++;
+			dev->err_wait = true;
 	}
-	if (errors_send > info->num_tolerated_disk_barrier_failures ||
-	    errors_wait > info->num_tolerated_disk_barrier_failures)
+	if (!btrfs_check_rw_degradable(info))
 		return -EIO;
 	return 0;
 }
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index dd9dd94d7043..729cbd0d2b60 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6796,7 +6796,9 @@ bool btrfs_check_rw_degradable(struct btrfs_fs_info *fs_info)
 			btrfs_get_num_tolerated_disk_barrier_failures(
 					map->type);
 		for (i = 0; i < map->num_stripes; i++) {
-			if (map->stripes[i].dev->missing)
+			if (map->stripes[i].dev->missing ||
+			    map->stripes[i].dev->err_wait ||
+			    map->stripes[i].dev->err_send)
 				missing++;
 		}
 		if (missing > max_tolerated) {
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index db1b5ef479cf..112fccacdabc 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -75,6 +75,10 @@ struct btrfs_device {
 	int can_discard;
 	int is_tgtdev_for_dev_replace;
 
+	/* If this device fails to send/wait dev flush */
+	bool err_send;
+	bool err_wait;
+
 #ifdef __BTRFS_NEED_DEVICE_DATA_ORDERED
 	seqcount_t data_seqcount;
 #endif
-- 
2.12.0





* [PATCH v2 5/6] btrfs: Cleanup num_tolerated_disk_barrier_failures
  2017-03-06  8:58 [PATCH v2 0/6] Chunk level degradable check Qu Wenruo
                   ` (3 preceding siblings ...)
  2017-03-06  8:58 ` [PATCH v2 4/6] btrfs: Allow barrier_all_devices to do chunk level device check Qu Wenruo
@ 2017-03-06  8:58 ` Qu Wenruo
  2017-03-06  8:58 ` [PATCH v2 6/6] btrfs: Enhance missing device kernel message Qu Wenruo
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 20+ messages in thread
From: Qu Wenruo @ 2017-03-06  8:58 UTC (permalink / raw)
  To: linux-btrfs; +Cc: lists

As we now do the per-chunk degradable check, the global
num_tolerated_disk_barrier_failures is of no use.

Clean it up.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 fs/btrfs/ctree.h   |  2 --
 fs/btrfs/disk-io.c | 54 ------------------------------------------------------
 fs/btrfs/disk-io.h |  2 --
 fs/btrfs/volumes.c | 17 -----------------
 4 files changed, 75 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index f03c2f285eb1..068c1be83266 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1061,8 +1061,6 @@ struct btrfs_fs_info {
 	/* next backup root to be overwritten */
 	int backup_root_index;
 
-	int num_tolerated_disk_barrier_failures;
-
 	/* device replace state */
 	struct btrfs_dev_replace dev_replace;
 
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index f596bd130524..f86eaf63e5f5 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3639,60 +3639,6 @@ int btrfs_get_num_tolerated_disk_barrier_failures(u64 flags)
 	return min_tolerated;
 }
 
-int btrfs_calc_num_tolerated_disk_barrier_failures(
-	struct btrfs_fs_info *fs_info)
-{
-	struct btrfs_ioctl_space_info space;
-	struct btrfs_space_info *sinfo;
-	u64 types[] = {BTRFS_BLOCK_GROUP_DATA,
-		       BTRFS_BLOCK_GROUP_SYSTEM,
-		       BTRFS_BLOCK_GROUP_METADATA,
-		       BTRFS_BLOCK_GROUP_DATA | BTRFS_BLOCK_GROUP_METADATA};
-	int i;
-	int c;
-	int num_tolerated_disk_barrier_failures =
-		(int)fs_info->fs_devices->num_devices;
-
-	for (i = 0; i < ARRAY_SIZE(types); i++) {
-		struct btrfs_space_info *tmp;
-
-		sinfo = NULL;
-		rcu_read_lock();
-		list_for_each_entry_rcu(tmp, &fs_info->space_info, list) {
-			if (tmp->flags == types[i]) {
-				sinfo = tmp;
-				break;
-			}
-		}
-		rcu_read_unlock();
-
-		if (!sinfo)
-			continue;
-
-		down_read(&sinfo->groups_sem);
-		for (c = 0; c < BTRFS_NR_RAID_TYPES; c++) {
-			u64 flags;
-
-			if (list_empty(&sinfo->block_groups[c]))
-				continue;
-
-			btrfs_get_block_group_info(&sinfo->block_groups[c],
-						   &space);
-			if (space.total_bytes == 0 || space.used_bytes == 0)
-				continue;
-			flags = space.flags;
-
-			num_tolerated_disk_barrier_failures = min(
-				num_tolerated_disk_barrier_failures,
-				btrfs_get_num_tolerated_disk_barrier_failures(
-					flags));
-		}
-		up_read(&sinfo->groups_sem);
-	}
-
-	return num_tolerated_disk_barrier_failures;
-}
-
 int write_all_supers(struct btrfs_fs_info *fs_info, int max_mirrors)
 {
 	struct list_head *head;
diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h
index 2e0ec29bfd69..4522d2f11909 100644
--- a/fs/btrfs/disk-io.h
+++ b/fs/btrfs/disk-io.h
@@ -142,8 +142,6 @@ struct btrfs_root *btrfs_create_tree(struct btrfs_trans_handle *trans,
 int btree_lock_page_hook(struct page *page, void *data,
 				void (*flush_fn)(void *));
 int btrfs_get_num_tolerated_disk_barrier_failures(u64 flags);
-int btrfs_calc_num_tolerated_disk_barrier_failures(
-	struct btrfs_fs_info *fs_info);
 int __init btrfs_end_io_wq_init(void);
 void btrfs_end_io_wq_exit(void);
 
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 729cbd0d2b60..72c78f096755 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1973,9 +1973,6 @@ int btrfs_rm_device(struct btrfs_fs_info *fs_info, const char *device_path,
 		free_fs_devices(cur_devices);
 	}
 
-	fs_info->num_tolerated_disk_barrier_failures =
-		btrfs_calc_num_tolerated_disk_barrier_failures(fs_info);
-
 out:
 	mutex_unlock(&uuid_mutex);
 	return ret;
@@ -2474,8 +2471,6 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
 				   "sysfs: failed to create fsid for sprout");
 	}
 
-	fs_info->num_tolerated_disk_barrier_failures =
-		btrfs_calc_num_tolerated_disk_barrier_failures(fs_info);
 	ret = btrfs_commit_transaction(trans);
 
 	if (seeding_dev) {
@@ -3858,13 +3853,6 @@ int btrfs_balance(struct btrfs_balance_control *bctl,
 			   bctl->meta.target, bctl->data.target);
 	}
 
-	if (bctl->sys.flags & BTRFS_BALANCE_ARGS_CONVERT) {
-		fs_info->num_tolerated_disk_barrier_failures = min(
-			btrfs_calc_num_tolerated_disk_barrier_failures(fs_info),
-			btrfs_get_num_tolerated_disk_barrier_failures(
-				bctl->sys.target));
-	}
-
 	ret = insert_balance_item(fs_info, bctl);
 	if (ret && ret != -EEXIST)
 		goto out;
@@ -3887,11 +3875,6 @@ int btrfs_balance(struct btrfs_balance_control *bctl,
 	mutex_lock(&fs_info->balance_mutex);
 	atomic_dec(&fs_info->balance_running);
 
-	if (bctl->sys.flags & BTRFS_BALANCE_ARGS_CONVERT) {
-		fs_info->num_tolerated_disk_barrier_failures =
-			btrfs_calc_num_tolerated_disk_barrier_failures(fs_info);
-	}
-
 	if (bargs) {
 		memset(bargs, 0, sizeof(*bargs));
 		update_ioctl_balance_args(fs_info, 0, bargs);
-- 
2.12.0





* [PATCH v2 6/6] btrfs: Enhance missing device kernel message
  2017-03-06  8:58 [PATCH v2 0/6] Chunk level degradable check Qu Wenruo
                   ` (4 preceding siblings ...)
  2017-03-06  8:58 ` [PATCH v2 5/6] btrfs: Cleanup num_tolerated_disk_barrier_failures Qu Wenruo
@ 2017-03-06  8:58 ` Qu Wenruo
  2017-03-07  4:47   ` Anand Jain
  2017-03-06 18:49 ` [PATCH v2 0/6] Chunk level degradable check Dmitrii Tcvetkov
  2017-03-07  0:36 ` Adam Borowski
  7 siblings, 1 reply; 20+ messages in thread
From: Qu Wenruo @ 2017-03-06  8:58 UTC (permalink / raw)
  To: linux-btrfs; +Cc: lists

For a missing device, btrfs will just refuse to mount, with an almost
meaningless kernel message like:

 BTRFS info (device vdb6): disk space caching is enabled
 BTRFS info (device vdb6): has skinny extents
 BTRFS error (device vdb6): failed to read the system array: -5
 BTRFS error (device vdb6): open_ctree failed

This patch adds an extra message for each missing device, making the result:

 BTRFS info (device vdb6): disk space caching is enabled
 BTRFS info (device vdb6): has skinny extents
 BTRFS warning (device vdb6): devid 2 uuid 80470722-cad2-4b90-b7c3-fee294552f1b is missing
 BTRFS error (device vdb6): failed to read the system array: -5
 BTRFS error (device vdb6): open_ctree failed

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 fs/btrfs/volumes.c | 25 ++++++++++++++++++-------
 fs/btrfs/volumes.h |  2 ++
 2 files changed, 20 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 72c78f096755..b33e96495934 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6442,6 +6442,7 @@ static int read_one_chunk(struct btrfs_fs_info *fs_info, struct btrfs_key *key,
 		if (!map->stripes[i].dev &&
 		    !btrfs_test_opt(fs_info, DEGRADED)) {
 			free_extent_map(em);
+			btrfs_report_missing_device(fs_info, devid, uuid);
 			return -EIO;
 		}
 		if (!map->stripes[i].dev) {
@@ -6452,8 +6453,7 @@ static int read_one_chunk(struct btrfs_fs_info *fs_info, struct btrfs_key *key,
 				free_extent_map(em);
 				return -EIO;
 			}
-			btrfs_warn(fs_info, "devid %llu uuid %pU is missing",
-				   devid, uuid);
+			btrfs_report_missing_device(fs_info, devid, uuid);
 		}
 		map->stripes[i].dev->in_fs_metadata = 1;
 	}
@@ -6570,17 +6570,21 @@ static int read_one_dev(struct btrfs_fs_info *fs_info,
 
 	device = btrfs_find_device(fs_info, devid, dev_uuid, fs_uuid);
 	if (!device) {
-		if (!btrfs_test_opt(fs_info, DEGRADED))
+		if (!btrfs_test_opt(fs_info, DEGRADED)) {
+			btrfs_report_missing_device(fs_info, devid, dev_uuid);
 			return -EIO;
+		}
 
 		device = add_missing_dev(fs_devices, devid, dev_uuid);
 		if (!device)
 			return -ENOMEM;
-		btrfs_warn(fs_info, "devid %llu uuid %pU missing",
-				devid, dev_uuid);
+		btrfs_report_missing_device(fs_info, devid, dev_uuid);
 	} else {
-		if (!device->bdev && !btrfs_test_opt(fs_info, DEGRADED))
-			return -EIO;
+		if (!device->bdev) {
+			btrfs_report_missing_device(fs_info, devid, dev_uuid);
+			if (!btrfs_test_opt(fs_info, DEGRADED))
+				return -EIO;
+		}
 
 		if(!device->bdev && !device->missing) {
 			/*
@@ -6591,6 +6595,7 @@ static int read_one_dev(struct btrfs_fs_info *fs_info,
 			 */
 			device->fs_devices->missing_devices++;
 			device->missing = 1;
+			btrfs_report_missing_device(fs_info, devid, dev_uuid);
 		}
 
 		/* Move the device to its own fs_devices */
@@ -6748,6 +6753,12 @@ int btrfs_read_sys_array(struct btrfs_fs_info *fs_info)
 	return -EIO;
 }
 
+void btrfs_report_missing_device(struct btrfs_fs_info *fs_info, u64 devid,
+				 u8 *uuid)
+{
+	btrfs_warn_rl(fs_info, "devid %llu uuid %pU is missing", devid, uuid);
+}
+
 /*
 * Check if all chunks in the fs are OK for read-write degraded mount
  *
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 112fccacdabc..4f981b75958d 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -543,4 +543,6 @@ void btrfs_set_fs_info_ptr(struct btrfs_fs_info *fs_info);
 void btrfs_reset_fs_info_ptr(struct btrfs_fs_info *fs_info);
 
 bool btrfs_check_rw_degradable(struct btrfs_fs_info *fs_info);
+void btrfs_report_missing_device(struct btrfs_fs_info *fs_info, u64 devid,
+				 u8 *uuid);
 #endif
-- 
2.12.0





* Re: [PATCH v2 0/6] Chunk level degradable check
  2017-03-06  8:58 [PATCH v2 0/6] Chunk level degradable check Qu Wenruo
                   ` (5 preceding siblings ...)
  2017-03-06  8:58 ` [PATCH v2 6/6] btrfs: Enhance missing device kernel message Qu Wenruo
@ 2017-03-06 18:49 ` Dmitrii Tcvetkov
  2017-03-07  0:36 ` Adam Borowski
  7 siblings, 0 replies; 20+ messages in thread
From: Dmitrii Tcvetkov @ 2017-03-06 18:49 UTC (permalink / raw)
  To: linux-btrfs

On Mon, 6 Mar 2017 16:58:49 +0800
Qu Wenruo <quwenruo@cn.fujitsu.com> wrote:

> Btrfs currently uses num_tolerated_disk_barrier_failures to do a global
> check for the number of tolerated missing devices.
> 
> Although this one-size-fits-all solution is quite safe, it's too strict
> if data and metadata have different duplication levels.
> 
> For example, if one uses single data and RAID1 metadata on 2 disks, any
> missing device makes the fs impossible to mount degraded.
> 
> But in fact, sometimes all the single chunks may reside on the remaining
> device, and in that case we should allow a degraded rw mount.
> 
> Such a case can be easily reproduced with the following script:
>  # mkfs.btrfs -f -m raid1 -d single /dev/sdb /dev/sdc
>  # wipefs -a /dev/sdc
>  # mount /dev/sdb /mnt/btrfs -o degraded,rw
> 
> Checking /dev/sdb with btrfs-debug-tree shows that the data chunks are
> only on sdb, so a degraded mount should in fact be allowed.
> 
> This patchset introduces a new per-chunk degradable check for btrfs,
> allowing the above case to succeed, and it's quite small anyway.
> 
> It also enhances the kernel error messages for missing devices, so the
> kernel can at least report what made the mount fail, rather than the
> meaningless "failed to read system chunk/chunk tree -5".

Hello,

Tested the patchset for raid1 and raid10. It successfully allows a
degraded mount with single chunks on a filesystem missing one drive.

Feel free to add Tested-by: Dmitrii Tcvetkov <demfloro@demfloro.ru>


* Re: [PATCH v2 0/6] Chunk level degradable check
  2017-03-06  8:58 [PATCH v2 0/6] Chunk level degradable check Qu Wenruo
                   ` (6 preceding siblings ...)
  2017-03-06 18:49 ` [PATCH v2 0/6] Chunk level degradable check Dmitrii Tcvetkov
@ 2017-03-07  0:36 ` Adam Borowski
  2017-03-07  1:35   ` Qu Wenruo
  7 siblings, 1 reply; 20+ messages in thread
From: Adam Borowski @ 2017-03-07  0:36 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs, lists

On Mon, Mar 06, 2017 at 04:58:49PM +0800, Qu Wenruo wrote:
> Btrfs currently uses num_tolerated_disk_barrier_failures to do a global
> check for the number of tolerated missing devices.
> 
> Although this one-size-fits-all solution is quite safe, it's too strict
> if data and metadata have different duplication levels.
> 
> For example, if one uses single data and RAID1 metadata on 2 disks, any
> missing device makes the fs impossible to mount degraded.
> 
> But in fact, sometimes all the single chunks may reside on the remaining
> device, and in that case we should allow a degraded rw mount.
[...]
> This patchset introduces a new per-chunk degradable check for btrfs,
> allowing the above case to succeed, and it's quite small anyway.

I've tested it quite extensively.

As Dmitrii already tested the common case of a raid1/raid10 degraded mount,
I concentrated mostly on cases where the answer is negative.  For example:
raid1 (A,B).  Pull out A.  Add C, start resync but interrupt it halfway. 
Pull out B.  Obviously, C doesn't have every chunk, but this doesn't come
from a naive count; Qu's patch handles this case correctly.

So far, so good.

Not so for -draid5 -mraid1, unfortunately:

[/mnt/btr2/scratch]# btrfs fi us /mnt/vol1
WARNING: RAID56 detected, not implemented
Overall:
    Device size:		  12.00GiB
    Device allocated:		   2.02GiB
    Device unallocated:		   9.98GiB
    Device missing:		     0.00B
    Used:			   7.12MiB
    Free (estimated):		     0.00B	(min: 8.00EiB)
    Data ratio:			      0.00
    Metadata ratio:		      2.00
    Global reserve:		  16.00MiB	(used: 0.00B)

Data,RAID5: Size:2.02GiB, Used:1.21GiB
   /dev/loop0	   1.01GiB
   /dev/loop1	   1.01GiB
   /dev/loop2	   1.01GiB

Metadata,RAID1: Size:1.00GiB, Used:3.55MiB
   /dev/loop0	   1.00GiB
   /dev/loop2	   1.00GiB

System,RAID1: Size:8.00MiB, Used:16.00KiB
   /dev/loop1	   8.00MiB
   /dev/loop2	   8.00MiB

Unallocated:
   /dev/loop0	   1.99GiB
   /dev/loop1	   2.98GiB
   /dev/loop2	   1.98GiB
[/mnt/btr2/scratch]# umount /mnt/vol1
[/mnt/btr2/scratch]# losetup -D                                                                        ✔
[/mnt/btr2/scratch]# losetup -f rb
[/mnt/btr2/scratch]# losetup -f rc
[/mnt/btr2/scratch]# mount -noatime,degraded /dev/loop0 /mnt/vol1
[/mnt/btr2/scratch]# btrfs fi us /mnt/vol1
WARNING: RAID56 detected, not implemented
Overall:
    Device size:		  12.00GiB
    Device allocated:		   2.02GiB
    Device unallocated:		   9.98GiB
    Device missing:		     0.00B
    Used:			   7.12MiB
    Free (estimated):		     0.00B	(min: 8.00EiB)
    Data ratio:			      0.00
    Metadata ratio:		      2.00
    Global reserve:		  16.00MiB	(used: 0.00B)

Data,RAID5: Size:2.02GiB, Used:1.21GiB
   /dev/loop0	   1.01GiB
   /dev/loop0	   1.01GiB
   /dev/loop1	   1.01GiB

Metadata,RAID1: Size:1.00GiB, Used:3.55MiB
   /dev/loop0	   1.00GiB
   /dev/loop1	   1.00GiB

System,RAID1: Size:8.00MiB, Used:16.00KiB
   /dev/loop0	   8.00MiB
   /dev/loop1	   8.00MiB

Unallocated:
   /dev/loop0	   1.99GiB
   /dev/loop0	   2.98GiB
   /dev/loop1	   1.98GiB

Write something, mount degraded again.  Massive data corruption, both on
plain reads and on scrub, unrecoverable.


Obviously, this problem lies somewhere in RAID5 rather than in this patch
set, but the safety check can't be removed before that is fixed.


-- 
⢀⣴⠾⠻⢶⣦⠀ Meow!
⣾⠁⢠⠒⠀⣿⡁
⢿⡄⠘⠷⠚⠋⠀ Collisions shmolisions, let's see them find a collision or second
⠈⠳⣄⠀⠀⠀⠀ preimage for double rot13!


* Re: [PATCH v2 0/6] Chunk level degradable check
  2017-03-07  0:36 ` Adam Borowski
@ 2017-03-07  1:35   ` Qu Wenruo
  2017-03-07  2:23     ` Adam Borowski
  0 siblings, 1 reply; 20+ messages in thread
From: Qu Wenruo @ 2017-03-07  1:35 UTC (permalink / raw)
  To: Adam Borowski; +Cc: linux-btrfs, lists



At 03/07/2017 08:36 AM, Adam Borowski wrote:
> On Mon, Mar 06, 2017 at 04:58:49PM +0800, Qu Wenruo wrote:
>> Btrfs currently uses num_tolerated_disk_barrier_failures to do a global
>> check for the number of tolerated missing devices.
>>
>> Although this one-size-fits-all solution is quite safe, it's too strict
>> if data and metadata have different duplication levels.
>>
>> For example, if one uses single data and RAID1 metadata on 2 disks, any
>> missing device makes the fs impossible to mount degraded.
>>
>> But in fact, sometimes all the single chunks may reside on the remaining
>> device, and in that case we should allow a degraded rw mount.
> [...]
>> This patchset introduces a new per-chunk degradable check for btrfs,
>> allowing the above case to succeed, and it's quite small anyway.
>
> I've tested it quite extensively.
>
> As Dmitrii already tested the common case of a raid1/raid10 degraded mount,
> I concentrated mostly on cases where the answer is negative.  For example:
> raid1 (A,B).  Pull out A.  Add C, start resync but interrupt it halfway.
> Pull out B.  Obviously, C doesn't have every chunk, but this doesn't come
> from a naive count; Qu's patch handles this case correctly.
>
> So far, so good.

Thanks for your thorough testing.

>
> Not so for -draid5 -mraid1, unfortunately:

Unfortunately, raid5 still has unfixed bugs.
In fact, some raid5/6 bugs already have fixes, but they are not merged yet.

For example, the following patch fixes the RAID5/6 parity corruption:

https://patchwork.kernel.org/patch/9553581/

>
> [/mnt/btr2/scratch]# btrfs fi us /mnt/vol1
> WARNING: RAID56 detected, not implemented
> Overall:
>     Device size:		  12.00GiB
>     Device allocated:		   2.02GiB
>     Device unallocated:		   9.98GiB
>     Device missing:		     0.00B
>     Used:			   7.12MiB
>     Free (estimated):		     0.00B	(min: 8.00EiB)
>     Data ratio:			      0.00
>     Metadata ratio:		      2.00
>     Global reserve:		  16.00MiB	(used: 0.00B)
>
> Data,RAID5: Size:2.02GiB, Used:1.21GiB
>    /dev/loop0	   1.01GiB
>    /dev/loop1	   1.01GiB
>    /dev/loop2	   1.01GiB
>
> Metadata,RAID1: Size:1.00GiB, Used:3.55MiB
>    /dev/loop0	   1.00GiB
>    /dev/loop2	   1.00GiB
>
> System,RAID1: Size:8.00MiB, Used:16.00KiB
>    /dev/loop1	   8.00MiB
>    /dev/loop2	   8.00MiB
>
> Unallocated:
>    /dev/loop0	   1.99GiB
>    /dev/loop1	   2.98GiB
>    /dev/loop2	   1.98GiB
> [/mnt/btr2/scratch]# umount /mnt/vol1
> [/mnt/btr2/scratch]# losetup -D                                                                        ✔
> [/mnt/btr2/scratch]# losetup -f rb
> [/mnt/btr2/scratch]# losetup -f rc

So you're pulling out the first device.
In theory, that should be completely OK for RAID5,
and the degradable check agrees.

> [/mnt/btr2/scratch]# mount -noatime,degraded /dev/loop0 /mnt/vol1
> [/mnt/btr2/scratch]# btrfs fi us /mnt/vol1
> WARNING: RAID56 detected, not implemented
> Overall:
>     Device size:		  12.00GiB
>     Device allocated:		   2.02GiB
>     Device unallocated:		   9.98GiB
>     Device missing:		     0.00B
>     Used:			   7.12MiB
>     Free (estimated):		     0.00B	(min: 8.00EiB)
>     Data ratio:			      0.00
>     Metadata ratio:		      2.00
>     Global reserve:		  16.00MiB	(used: 0.00B)
>
> Data,RAID5: Size:2.02GiB, Used:1.21GiB
>    /dev/loop0	   1.01GiB
>    /dev/loop0	   1.01GiB
>    /dev/loop1	   1.01GiB

Two loop0 entries show up here; the second one should have been detected
as missing.

So it looks like a btrfs-progs bug, and it'll be much easier to fix than a
kernel one.

>
> Metadata,RAID1: Size:1.00GiB, Used:3.55MiB
>    /dev/loop0	   1.00GiB
>    /dev/loop1	   1.00GiB
>
> System,RAID1: Size:8.00MiB, Used:16.00KiB
>    /dev/loop0	   8.00MiB
>    /dev/loop1	   8.00MiB
>
> Unallocated:
>    /dev/loop0	   1.99GiB
>    /dev/loop0	   2.98GiB
>    /dev/loop1	   1.98GiB
>
> Write something, mount degraded again.  Massive data corruption, both on
> plain reads and on scrub, unrecoverable.

Yep, same thing here.
And you'll be surprised that even a 2-device RAID5, which is equivalent to
RAID1 (the parity is the same as the data), can still hit the problem.

So, RAID5/6 definitely has problems in degraded mode.
Still, I prefer to focus on fixing normal-mode RAID5/6 bugs first, until we
have solved them all with enough test cases covering them.

>
> Obviously, this problem is somewhere with RAID5 rather than this patch set,
> but the safety check can't be removed before that is fixed.

Do we have such a *safety check* in the original behavior?

At least as of v4.11-rc1, btrfs still allows mounting raid5/6 degraded.
So the patchset behaves just like the old code.

I'm completely fine with adding a new patch to prohibit degraded raid5/6
mounts, but that would be a separate enhancement.

Thanks,
Qu



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 0/6] Chunk level degradable check
  2017-03-07  1:35   ` Qu Wenruo
@ 2017-03-07  2:23     ` Adam Borowski
  0 siblings, 0 replies; 20+ messages in thread
From: Adam Borowski @ 2017-03-07  2:23 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs, lists

On Tue, Mar 07, 2017 at 09:35:56AM +0800, Qu Wenruo wrote:
> At 03/07/2017 08:36 AM, Adam Borowski wrote:
> > Not so for -draid5 -mraid1, unfortunately:
> 
> Unfortunately, raid5 still has unfixed bugs.
> In fact, some raid5/6 fixes already exist, but they are not merged yet.
> 
> > [/mnt/btr2/scratch]# btrfs fi us /mnt/vol1
> > Data,RAID5: Size:2.02GiB, Used:1.21GiB
> >    /dev/loop0	   1.01GiB
> >    /dev/loop1	   1.01GiB
> >    /dev/loop2	   1.01GiB

> > [/mnt/btr2/scratch]# umount /mnt/vol1
> > [/mnt/btr2/scratch]# losetup -D
> > [/mnt/btr2/scratch]# losetup -f rb
> > [/mnt/btr2/scratch]# losetup -f rc
> 
> So you're pulling out the first device.
> In theory, that should be completely OK for RAID5,
> and the degradable check agrees.
> 
> > [/mnt/btr2/scratch]# mount -noatime,degraded /dev/loop0 /mnt/vol1
> > [/mnt/btr2/scratch]# btrfs fi us /mnt/vol1
> > Data,RAID5: Size:2.02GiB, Used:1.21GiB
> >    /dev/loop0	   1.01GiB
> >    /dev/loop0	   1.01GiB
> >    /dev/loop1	   1.01GiB
> 
> Two loop0 entries show up here; the second one should have been detected
> as missing.
> 
> So it looks like a btrfs-progs bug, and it'll be much easier to fix than a
> kernel one.

Alas, it's not merely a display bug; mounting alone is enough to trigger it.

> > Write something, mount degraded again.  Massive data corruption, both on
> > plain reads and on scrub, unrecoverable.
> 
> Yep, same thing here.
> And you'll be surprised that even a 2-device RAID5, which is equivalent to
> RAID1 (the parity is the same as the data), can still hit the problem.
> 
> So, RAID5/6 definitely has problems in degraded mode.
> Still, I prefer to focus on fixing normal-mode RAID5/6 bugs first, until we
> have solved them all with enough test cases covering them.

Actually, it turns out that even the _first_ degraded mount goes bad, without
writing a single data byte.  So it's not related to our single-chunks bug.

> > Obviously, this problem is somewhere with RAID5 rather than this patch set,
> > but the safety check can't be removed before that is fixed.
> 
> Do we have *safety check* in original behavior?
> 
> At least v4.11-rc1, btrfs still allows us to mount raid5/6 degraded.
> So the patchset itself is behaving just as old one.

Right.  Thus, there's no regression.

As it's a strict improvement over the previous state (i.e., it fixes the
raid1 issues): Tested-by: Adam Borowski <kilobyte@angband.pl> (if you don't
mind spamming commits with too many tags).

> I'm completely fine with adding a new patch to prohibit degraded raid5/6
> mounts, but that would be a separate enhancement.

Yeah.  I guess it's more in the "don't use RAID5, there be dragons" land.


Thanks for these patches, they fix the #1 problem people have with RAID1.


[Apologies for that "✔" crap on some lines, my exit code on prompt thingy
is very paste-unfriendly; I keep forgetting it so often that I'd better get
rid of it...]

-- 
⢀⣴⠾⠻⢶⣦⠀ Meow!
⣾⠁⢠⠒⠀⣿⡁
⢿⡄⠘⠷⠚⠋⠀ Collisions shmolisions, let's see them find a collision or second
⠈⠳⣄⠀⠀⠀⠀ preimage for double rot13!

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 6/6] btrfs: Enhance missing device kernel message
  2017-03-06  8:58 ` [PATCH v2 6/6] btrfs: Enhance missing device kernel message Qu Wenruo
@ 2017-03-07  4:47   ` Anand Jain
  0 siblings, 0 replies; 20+ messages in thread
From: Anand Jain @ 2017-03-07  4:47 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs; +Cc: lists



On 03/06/2017 04:58 PM, Qu Wenruo wrote:
> For a missing device, btrfs just refuses to mount, with an almost
> meaningless kernel message like:
>
>  BTRFS info (device vdb6): disk space caching is enabled
>  BTRFS info (device vdb6): has skinny extents
>  BTRFS error (device vdb6): failed to read the system array: -5
>  BTRFS error (device vdb6): open_ctree failed
>
> This patch adds an extra missing-device message, making the output:
>
>  BTRFS info (device vdb6): disk space caching is enabled
>  BTRFS info (device vdb6): has skinny extents
>  BTRFS warning (device vdb6): devid 2 uuid 80470722-cad2-4b90-b7c3-fee294552f1b is missing
>  BTRFS error (device vdb6): failed to read the system array: -5
>  BTRFS error (device vdb6): open_ctree failed


Reviewed-by: Anand Jain <anand.jain@oracle.com>

Thanks, Anand


> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
> ---
>  fs/btrfs/volumes.c | 25 ++++++++++++++++++-------
>  fs/btrfs/volumes.h |  2 ++
>  2 files changed, 20 insertions(+), 7 deletions(-)
>
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 72c78f096755..b33e96495934 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -6442,6 +6442,7 @@ static int read_one_chunk(struct btrfs_fs_info *fs_info, struct btrfs_key *key,
>  		if (!map->stripes[i].dev &&
>  		    !btrfs_test_opt(fs_info, DEGRADED)) {
>  			free_extent_map(em);
> +			btrfs_report_missing_device(fs_info, devid, uuid);
>  			return -EIO;
>  		}
>  		if (!map->stripes[i].dev) {
> @@ -6452,8 +6453,7 @@ static int read_one_chunk(struct btrfs_fs_info *fs_info, struct btrfs_key *key,
>  				free_extent_map(em);
>  				return -EIO;
>  			}
> -			btrfs_warn(fs_info, "devid %llu uuid %pU is missing",
> -				   devid, uuid);
> +			btrfs_report_missing_device(fs_info, devid, uuid);
>  		}
>  		map->stripes[i].dev->in_fs_metadata = 1;
>  	}
> @@ -6570,17 +6570,21 @@ static int read_one_dev(struct btrfs_fs_info *fs_info,
>
>  	device = btrfs_find_device(fs_info, devid, dev_uuid, fs_uuid);
>  	if (!device) {
> -		if (!btrfs_test_opt(fs_info, DEGRADED))
> +		if (!btrfs_test_opt(fs_info, DEGRADED)) {
> +			btrfs_report_missing_device(fs_info, devid, dev_uuid);
>  			return -EIO;
> +		}
>
>  		device = add_missing_dev(fs_devices, devid, dev_uuid);
>  		if (!device)
>  			return -ENOMEM;
> -		btrfs_warn(fs_info, "devid %llu uuid %pU missing",
> -				devid, dev_uuid);
> +		btrfs_report_missing_device(fs_info, devid, dev_uuid);
>  	} else {
> -		if (!device->bdev && !btrfs_test_opt(fs_info, DEGRADED))
> -			return -EIO;
> +		if (!device->bdev) {
> +			btrfs_report_missing_device(fs_info, devid, dev_uuid);
> +			if (!btrfs_test_opt(fs_info, DEGRADED))
> +				return -EIO;
> +		}
>
>  		if(!device->bdev && !device->missing) {
>  			/*
> @@ -6591,6 +6595,7 @@ static int read_one_dev(struct btrfs_fs_info *fs_info,
>  			 */
>  			device->fs_devices->missing_devices++;
>  			device->missing = 1;
> +			btrfs_report_missing_device(fs_info, devid, dev_uuid);
>  		}
>
>  		/* Move the device to its own fs_devices */
> @@ -6748,6 +6753,12 @@ int btrfs_read_sys_array(struct btrfs_fs_info *fs_info)
>  	return -EIO;
>  }
>
> +void btrfs_report_missing_device(struct btrfs_fs_info *fs_info, u64 devid,
> +				 u8 *uuid)
> +{
> +	btrfs_warn_rl(fs_info, "devid %llu uuid %pU is missing", devid, uuid);
> +}
> +
>  /*
>   * Check if all chunks in the fs is OK for read-write degraded mount
>   *
> diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
> index 112fccacdabc..4f981b75958d 100644
> --- a/fs/btrfs/volumes.h
> +++ b/fs/btrfs/volumes.h
> @@ -543,4 +543,6 @@ void btrfs_set_fs_info_ptr(struct btrfs_fs_info *fs_info);
>  void btrfs_reset_fs_info_ptr(struct btrfs_fs_info *fs_info);
>
>  bool btrfs_check_rw_degradable(struct btrfs_fs_info *fs_info);
> +void btrfs_report_missing_device(struct btrfs_fs_info *fs_info, u64 devid,
> +				 u8 *uuid);
>  #endif
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 1/6] btrfs: Introduce a function to check if all chunks a OK for degraded rw mount
  2017-03-06  8:58 ` [PATCH v2 1/6] btrfs: Introduce a function to check if all chunks a OK for degraded rw mount Qu Wenruo
@ 2017-03-07  4:48   ` Anand Jain
  2017-03-08 18:26     ` Anand Jain
  0 siblings, 1 reply; 20+ messages in thread
From: Anand Jain @ 2017-03-07  4:48 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs, lists



On 03/06/2017 04:58 PM, Qu Wenruo wrote:
> Introduce a new function, btrfs_check_rw_degradable(), to check if all
> chunks in btrfs are OK for a degraded rw mount.
>
> It provides a new basis for accurate btrfs mount/remount and even runtime
> degraded-mount checks, rather than the old one-size-fits-all method.

  Looks good.
  Reviewed-by: Anand Jain <anand.jain@oracle.com>

Thanks, Anand


> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
> ---
>  fs/btrfs/volumes.c | 53 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>  fs/btrfs/volumes.h |  1 +
>  2 files changed, 54 insertions(+)
>
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 7c8c7bbee197..dd9dd94d7043 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -6765,6 +6765,59 @@ int btrfs_read_sys_array(struct btrfs_fs_info *fs_info)
>  	return -EIO;
>  }
>
> +/*
> + * Check if all chunks in the fs is OK for read-write degraded mount
> + *
> + * Return true if the fs is OK to be mounted degraded read-write
> + * Return false if the fs is not OK to be mounted degraded
> + */
> +bool btrfs_check_rw_degradable(struct btrfs_fs_info *fs_info)
> +{
> +	struct btrfs_mapping_tree *map_tree = &fs_info->mapping_tree;
> +	struct extent_map *em;
> +	u64 next_start = 0;
> +	bool ret = true;
> +
> +	read_lock(&map_tree->map_tree.lock);
> +	em = lookup_extent_mapping(&map_tree->map_tree, 0, (u64)-1);
> +	/* No chunk at all? Return false anyway */
> +	if (!em) {
> +		ret = false;
> +		goto out;
> +	}
> +	while (em) {
> +		struct map_lookup *map;
> +		int missing = 0;
> +		int max_tolerated;
> +		int i;
> +
> +		map = (struct map_lookup *) em->bdev;
> +		max_tolerated =
> +			btrfs_get_num_tolerated_disk_barrier_failures(
> +					map->type);
> +		for (i = 0; i < map->num_stripes; i++) {
> +			if (map->stripes[i].dev->missing)
> +				missing++;
> +		}
> +		if (missing > max_tolerated) {
> +			ret = false;
> +			btrfs_warn(fs_info,
> +	"chunk %llu missing %d devices, max tolerance is %d for writeble mount",
> +				   em->start, missing, max_tolerated);
> +			free_extent_map(em);
> +			goto out;
> +		}
> +		next_start = extent_map_end(em);
> +		free_extent_map(em);
> +
> +		em = lookup_extent_mapping(&map_tree->map_tree, next_start,
> +					   (u64)(-1) - next_start);
> +	}
> +out:
> +	read_unlock(&map_tree->map_tree.lock);
> +	return ret;
> +}
> +
>  int btrfs_read_chunk_tree(struct btrfs_fs_info *fs_info)
>  {
>  	struct btrfs_root *root = fs_info->chunk_root;
> diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
> index 59be81206dd7..db1b5ef479cf 100644
> --- a/fs/btrfs/volumes.h
> +++ b/fs/btrfs/volumes.h
> @@ -538,4 +538,5 @@ struct list_head *btrfs_get_fs_uuids(void);
>  void btrfs_set_fs_info_ptr(struct btrfs_fs_info *fs_info);
>  void btrfs_reset_fs_info_ptr(struct btrfs_fs_info *fs_info);
>
> +bool btrfs_check_rw_degradable(struct btrfs_fs_info *fs_info);
>  #endif
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 4/6] btrfs: Allow barrier_all_devices to do chunk level device check
  2017-03-06  8:58 ` [PATCH v2 4/6] btrfs: Allow barrier_all_devices to do chunk level device check Qu Wenruo
@ 2017-03-07  4:48   ` Anand Jain
  2017-03-07  5:36     ` Qu Wenruo
  0 siblings, 1 reply; 20+ messages in thread
From: Anand Jain @ 2017-03-07  4:48 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs, lists



On 03/06/2017 04:58 PM, Qu Wenruo wrote:
> The last user of num_tolerated_disk_barrier_failures is
> barrier_all_devices().
> But it can easily be changed to the new per-chunk degradable check
> framework.
>
> Now btrfs_device will have two extra members, representing send/wait
> error, set at write_dev_flush() time.
> With these 2 new members, btrfs_check_rw_degradable() can check if the
> fs is still OK when the fs is committed to disk.

  This logic isn't reentrant; earlier it was.  How about using
  stack variables instead?

Thanks, Anand


> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
> ---
>  fs/btrfs/disk-io.c | 15 +++++++--------
>  fs/btrfs/volumes.c |  4 +++-
>  fs/btrfs/volumes.h |  4 ++++
>  3 files changed, 14 insertions(+), 9 deletions(-)
>
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index c26b8a0b121c..f596bd130524 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -3569,17 +3569,17 @@ static int barrier_all_devices(struct btrfs_fs_info *info)
>  {
>  	struct list_head *head;
>  	struct btrfs_device *dev;
> -	int errors_send = 0;
> -	int errors_wait = 0;
>  	int ret;
>
>  	/* send down all the barriers */
>  	head = &info->fs_devices->devices;
>  	list_for_each_entry_rcu(dev, head, dev_list) {
> +		dev->err_wait = false;
> +		dev->err_send = false;
>  		if (dev->missing)
>  			continue;
>  		if (!dev->bdev) {
> -			errors_send++;
> +			dev->err_send = true;
>  			continue;
>  		}
>  		if (!dev->in_fs_metadata || !dev->writeable)
> @@ -3587,7 +3587,7 @@ static int barrier_all_devices(struct btrfs_fs_info *info)
>
>  		ret = write_dev_flush(dev, 0);
>  		if (ret)
> -			errors_send++;
> +			dev->err_send = true;
>  	}
>
>  	/* wait for all the barriers */
> @@ -3595,7 +3595,7 @@ static int barrier_all_devices(struct btrfs_fs_info *info)
>  		if (dev->missing)
>  			continue;
>  		if (!dev->bdev) {
> -			errors_wait++;
> +			dev->err_wait = true;
>  			continue;
>  		}
>  		if (!dev->in_fs_metadata || !dev->writeable)
> @@ -3603,10 +3603,9 @@ static int barrier_all_devices(struct btrfs_fs_info *info)
>
>  		ret = write_dev_flush(dev, 1);
>  		if (ret)
> -			errors_wait++;
> +			dev->err_wait = true;
>  	}
> -	if (errors_send > info->num_tolerated_disk_barrier_failures ||
> -	    errors_wait > info->num_tolerated_disk_barrier_failures)
> +	if (!btrfs_check_rw_degradable(info))
>  		return -EIO;
>  	return 0;
>  }
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index dd9dd94d7043..729cbd0d2b60 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -6796,7 +6796,9 @@ bool btrfs_check_rw_degradable(struct btrfs_fs_info *fs_info)
>  			btrfs_get_num_tolerated_disk_barrier_failures(
>  					map->type);
>  		for (i = 0; i < map->num_stripes; i++) {
> -			if (map->stripes[i].dev->missing)
> +			if (map->stripes[i].dev->missing ||
> +			    map->stripes[i].dev->err_wait ||
> +			    map->stripes[i].dev->err_send)
>  				missing++;
>  		}
>  		if (missing > max_tolerated) {
> diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
> index db1b5ef479cf..112fccacdabc 100644
> --- a/fs/btrfs/volumes.h
> +++ b/fs/btrfs/volumes.h
> @@ -75,6 +75,10 @@ struct btrfs_device {
>  	int can_discard;
>  	int is_tgtdev_for_dev_replace;
>
> +	/* If this devices fails to send/wait dev flush */
> +	bool err_send;
> +	bool err_wait;



>  #ifdef __BTRFS_NEED_DEVICE_DATA_ORDERED
>  	seqcount_t data_seqcount;
>  #endif
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 4/6] btrfs: Allow barrier_all_devices to do chunk level device check
  2017-03-07  4:48   ` Anand Jain
@ 2017-03-07  5:36     ` Qu Wenruo
  2017-03-07  6:55       ` Anand Jain
  0 siblings, 1 reply; 20+ messages in thread
From: Qu Wenruo @ 2017-03-07  5:36 UTC (permalink / raw)
  To: Anand Jain; +Cc: linux-btrfs, lists



At 03/07/2017 12:48 PM, Anand Jain wrote:
>
>
> On 03/06/2017 04:58 PM, Qu Wenruo wrote:
>> The last user of num_tolerated_disk_barrier_failures is
>> barrier_all_devices().
>> But it can easily be changed to the new per-chunk degradable check
>> framework.
>>
>> Now btrfs_device will have two extra members, representing send/wait
>> error, set at write_dev_flush() time.
>> With these 2 new members, btrfs_check_rw_degradable() can check if the
>> fs is still OK when the fs is committed to disk.
>
>  This logic isn't reentrant, earlier it was. How about using
>  stack variable instead ?
>
> Thanks, Anand

Thanks for the review.

However, I didn't quite get the point about reentrancy and stack
variables here.

1) About reentrancy
In the previous version, the err_* bits were also put into the btrfs_device
structure; only the timing of resetting these bits has changed.

So either way, it's not reentrant in theory.

But that isn't a problem, as we have other protection in place when
calling write_all_supers(), the only caller of barrier_all_devices().

So would you give me an example of why we need to make it reentrant?

2) About using stack variables
Did you mean building an array on the stack to record which devices fail to
send/wait, and using that array as the check condition instead of
btrfs_device->err_* and btrfs_device->missing?

The only problem is that it sounds more complex than necessary.
Besides the err_* flags, we also need to check already-missing devices, so I
prefer the easy way: just checking btrfs_device->err_* and
btrfs_device->missing.

Could you give a simple example to explain your suggestion?

Thanks,
Qu

>
>
>> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
>> ---
>>  fs/btrfs/disk-io.c | 15 +++++++--------
>>  fs/btrfs/volumes.c |  4 +++-
>>  fs/btrfs/volumes.h |  4 ++++
>>  3 files changed, 14 insertions(+), 9 deletions(-)
>>
>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>> index c26b8a0b121c..f596bd130524 100644
>> --- a/fs/btrfs/disk-io.c
>> +++ b/fs/btrfs/disk-io.c
>> @@ -3569,17 +3569,17 @@ static int barrier_all_devices(struct
>> btrfs_fs_info *info)
>>  {
>>      struct list_head *head;
>>      struct btrfs_device *dev;
>> -    int errors_send = 0;
>> -    int errors_wait = 0;
>>      int ret;
>>
>>      /* send down all the barriers */
>>      head = &info->fs_devices->devices;
>>      list_for_each_entry_rcu(dev, head, dev_list) {
>> +        dev->err_wait = false;
>> +        dev->err_send = false;
>>          if (dev->missing)
>>              continue;
>>          if (!dev->bdev) {
>> -            errors_send++;
>> +            dev->err_send = true;
>>              continue;
>>          }
>>          if (!dev->in_fs_metadata || !dev->writeable)
>> @@ -3587,7 +3587,7 @@ static int barrier_all_devices(struct
>> btrfs_fs_info *info)
>>
>>          ret = write_dev_flush(dev, 0);
>>          if (ret)
>> -            errors_send++;
>> +            dev->err_send = true;
>>      }
>>
>>      /* wait for all the barriers */
>> @@ -3595,7 +3595,7 @@ static int barrier_all_devices(struct
>> btrfs_fs_info *info)
>>          if (dev->missing)
>>              continue;
>>          if (!dev->bdev) {
>> -            errors_wait++;
>> +            dev->err_wait = true;
>>              continue;
>>          }
>>          if (!dev->in_fs_metadata || !dev->writeable)
>> @@ -3603,10 +3603,9 @@ static int barrier_all_devices(struct
>> btrfs_fs_info *info)
>>
>>          ret = write_dev_flush(dev, 1);
>>          if (ret)
>> -            errors_wait++;
>> +            dev->err_wait = true;
>>      }
>> -    if (errors_send > info->num_tolerated_disk_barrier_failures ||
>> -        errors_wait > info->num_tolerated_disk_barrier_failures)
>> +    if (!btrfs_check_rw_degradable(info))
>>          return -EIO;
>>      return 0;
>>  }
>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>> index dd9dd94d7043..729cbd0d2b60 100644
>> --- a/fs/btrfs/volumes.c
>> +++ b/fs/btrfs/volumes.c
>> @@ -6796,7 +6796,9 @@ bool btrfs_check_rw_degradable(struct
>> btrfs_fs_info *fs_info)
>>              btrfs_get_num_tolerated_disk_barrier_failures(
>>                      map->type);
>>          for (i = 0; i < map->num_stripes; i++) {
>> -            if (map->stripes[i].dev->missing)
>> +            if (map->stripes[i].dev->missing ||
>> +                map->stripes[i].dev->err_wait ||
>> +                map->stripes[i].dev->err_send)
>>                  missing++;
>>          }
>>          if (missing > max_tolerated) {
>> diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
>> index db1b5ef479cf..112fccacdabc 100644
>> --- a/fs/btrfs/volumes.h
>> +++ b/fs/btrfs/volumes.h
>> @@ -75,6 +75,10 @@ struct btrfs_device {
>>      int can_discard;
>>      int is_tgtdev_for_dev_replace;
>>
>> +    /* If this devices fails to send/wait dev flush */
>> +    bool err_send;
>> +    bool err_wait;
>
>
>
>>  #ifdef __BTRFS_NEED_DEVICE_DATA_ORDERED
>>      seqcount_t data_seqcount;
>>  #endif
>>
>
>



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 4/6] btrfs: Allow barrier_all_devices to do chunk level device check
  2017-03-07  5:36     ` Qu Wenruo
@ 2017-03-07  6:55       ` Anand Jain
  2017-03-07  7:08         ` Qu Wenruo
  0 siblings, 1 reply; 20+ messages in thread
From: Anand Jain @ 2017-03-07  6:55 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs, lists





> 1) About reentrancy
> In the previous version, the err_* bits were also put into the btrfs_device
> structure; only the timing of resetting these bits has changed.
>
> So either way, it's not reentrant in theory.
>
> But that isn't a problem, as we have other protection in place when
> calling write_all_supers(), the only caller of barrier_all_devices().
>
> So would you give me an example of why we need to make it reentrant?

  It's updating the device struct, which I would avoid for a change with
  this patch's purpose (I noticed that in v1 as well).
  Further, btrfs does not handle online intermittent device failure, and
  unless that complete story is taken care of, I would avoid such a change.

  Theoretically this patch is buggy: btrfs_check_rw_degradable() has
  more consumers than just barrier_all_devices(), yet dev->err_wait and
  dev->err_send are managed only by barrier_all_devices().  The bad news
  is that dev->err_wait and dev->err_send affect the "missing" count
  coming out of btrfs_check_rw_degradable(), which is wrong for threads
  other than barrier_all_devices().  Further, the only way dev->err_wait
  and dev->err_send get reset is by the next call to
  barrier_all_devices().  And if a lock is the answer, that would make
  things messy and complicated; I won't do that.

  There is a simple solution as below..

> 2) About using stack variable?

  Pass err_send and err_wait to btrfs_check_rw_degradable() through
  arguments, so degradability is computed specifically for
  barrier_all_devices().  That way the changes stay local to the calling
  thread.

Thanks, Anand


> Did you mean building an array on the stack to record which devices fail to
> send/wait, and using that array as the check condition instead of
> btrfs_device->err_* and btrfs_device->missing?
>
> The only problem is that it sounds more complex than necessary.
> Besides the err_* flags, we also need to check already-missing devices, so I
> prefer the easy way: just checking btrfs_device->err_* and
> btrfs_device->missing.
>
> Could you give a simple example to explain your suggestion?
>
> Thanks,
> Qu
>
>>
>>
>>> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
>>> ---
>>>  fs/btrfs/disk-io.c | 15 +++++++--------
>>>  fs/btrfs/volumes.c |  4 +++-
>>>  fs/btrfs/volumes.h |  4 ++++
>>>  3 files changed, 14 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>>> index c26b8a0b121c..f596bd130524 100644
>>> --- a/fs/btrfs/disk-io.c
>>> +++ b/fs/btrfs/disk-io.c
>>> @@ -3569,17 +3569,17 @@ static int barrier_all_devices(struct
>>> btrfs_fs_info *info)
>>>  {
>>>      struct list_head *head;
>>>      struct btrfs_device *dev;
>>> -    int errors_send = 0;
>>> -    int errors_wait = 0;
>>>      int ret;
>>>
>>>      /* send down all the barriers */
>>>      head = &info->fs_devices->devices;
>>>      list_for_each_entry_rcu(dev, head, dev_list) {
>>> +        dev->err_wait = false;
>>> +        dev->err_send = false;
>>>          if (dev->missing)
>>>              continue;
>>>          if (!dev->bdev) {
>>> -            errors_send++;
>>> +            dev->err_send = true;
>>>              continue;
>>>          }
>>>          if (!dev->in_fs_metadata || !dev->writeable)
>>> @@ -3587,7 +3587,7 @@ static int barrier_all_devices(struct
>>> btrfs_fs_info *info)
>>>
>>>          ret = write_dev_flush(dev, 0);
>>>          if (ret)
>>> -            errors_send++;
>>> +            dev->err_send = true;
>>>      }
>>>
>>>      /* wait for all the barriers */
>>> @@ -3595,7 +3595,7 @@ static int barrier_all_devices(struct
>>> btrfs_fs_info *info)
>>>          if (dev->missing)
>>>              continue;
>>>          if (!dev->bdev) {
>>> -            errors_wait++;
>>> +            dev->err_wait = true;
>>>              continue;
>>>          }
>>>          if (!dev->in_fs_metadata || !dev->writeable)
>>> @@ -3603,10 +3603,9 @@ static int barrier_all_devices(struct
>>> btrfs_fs_info *info)
>>>
>>>          ret = write_dev_flush(dev, 1);
>>>          if (ret)
>>> -            errors_wait++;
>>> +            dev->err_wait = true;
>>>      }
>>> -    if (errors_send > info->num_tolerated_disk_barrier_failures ||
>>> -        errors_wait > info->num_tolerated_disk_barrier_failures)
>>> +    if (!btrfs_check_rw_degradable(info))
>>>          return -EIO;
>>>      return 0;
>>>  }
>>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>>> index dd9dd94d7043..729cbd0d2b60 100644
>>> --- a/fs/btrfs/volumes.c
>>> +++ b/fs/btrfs/volumes.c
>>> @@ -6796,7 +6796,9 @@ bool btrfs_check_rw_degradable(struct
>>> btrfs_fs_info *fs_info)
>>>              btrfs_get_num_tolerated_disk_barrier_failures(
>>>                      map->type);
>>>          for (i = 0; i < map->num_stripes; i++) {
>>> -            if (map->stripes[i].dev->missing)
>>> +            if (map->stripes[i].dev->missing ||
>>> +                map->stripes[i].dev->err_wait ||
>>> +                map->stripes[i].dev->err_send)
>>>                  missing++;
>>>          }

  This is rather wrong.


>>>          if (missing > max_tolerated) {
>>> diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
>>> index db1b5ef479cf..112fccacdabc 100644
>>> --- a/fs/btrfs/volumes.h
>>> +++ b/fs/btrfs/volumes.h
>>> @@ -75,6 +75,10 @@ struct btrfs_device {
>>>      int can_discard;
>>>      int is_tgtdev_for_dev_replace;
>>>
>>> +    /* If this devices fails to send/wait dev flush */
>>> +    bool err_send;
>>> +    bool err_wait;
>>
>>
>>
>>>  #ifdef __BTRFS_NEED_DEVICE_DATA_ORDERED
>>>      seqcount_t data_seqcount;
>>>  #endif
>>>
>>
>>
>
>
> --

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 4/6] btrfs: Allow barrier_all_devices to do chunk level device check
  2017-03-07  6:55       ` Anand Jain
@ 2017-03-07  7:08         ` Qu Wenruo
  2017-03-07  8:07           ` Qu Wenruo
  0 siblings, 1 reply; 20+ messages in thread
From: Qu Wenruo @ 2017-03-07  7:08 UTC (permalink / raw)
  To: Anand Jain; +Cc: linux-btrfs, lists



At 03/07/2017 02:55 PM, Anand Jain wrote:
>
>
>
>
>> 1) About reentrancy
>> In the previous version, the err_* bits were also put into the btrfs_device
>> structure; only the timing of resetting these bits has changed.
>>
>> So either way, it's not reentrant in theory.
>>
>> But that isn't a problem, as we have other protection in place when
>> calling write_all_supers(), the only caller of barrier_all_devices().
>>
>> So would you give me an example of why we need to make it reentrant?
>
>  It's updating the device struct, which I would avoid for a change with
>  this patch's purpose (I noticed that in v1 as well).
>  Further, btrfs does not handle online intermittent device failure, and
>  unless that complete story is taken care of, I would avoid such a change.
>
>  Theoretically this patch is buggy: btrfs_check_rw_degradable() has
>  more consumers than just barrier_all_devices(), yet dev->err_wait and
>  dev->err_send are managed only by barrier_all_devices().  The bad news
>  is that dev->err_wait and dev->err_send affect the "missing" count
>  coming out of btrfs_check_rw_degradable(), which is wrong for threads
>  other than barrier_all_devices().  Further, the only way dev->err_wait
>  and dev->err_send get reset is by the next call to
>  barrier_all_devices().  And if a lock is the answer, that would make
>  things messy and complicated; I won't do that.
>
>  There is a simple solution as below..
>
>> 2) About using stack variable?
>
>  Pass err_send and err_wait to btrfs_check_rw_degradable() through
>  arguments, so degradability is computed specifically for
>  barrier_all_devices().  That way the changes stay local to the calling
>  thread.

In that case, we need to build an expandable structure to record the
devid <-> err_send/err_wait mapping.

A simple array is not suitable here, as the starting devid can be either 1
or 0, depending on whether we're replacing a device.
Furthermore, devids are not always sequential; we can have holes in the
devid allocation.

On the other hand, this approach will indeed have less impact on the
btrfs_device structure.

I'll update the patchset and try this method; I just hope it won't
introduce too much new code.

Thanks for the advice,
Qu

>
> Thanks, Anand
>
>
>> Did you mean build a array on stack to record which devices fails to
>> send/wait and use the array as check condition other than
>> btrfs_device->err_* and btrfs_device->missing ?
>>
>> The only problem is, it sounds more complex than needed.
>> Despite the err_*, we also needs to check already missing devices, so I
>> prefer the easy way, just checking btrfs_device->err_* and
>> btrfs_device->missing.
>>
>> Any simple example to explain your suggestion here?
>>
>> Thanks,
>> Qu
>>
>>>
>>>
>>>> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
>>>> ---
>>>>  fs/btrfs/disk-io.c | 15 +++++++--------
>>>>  fs/btrfs/volumes.c |  4 +++-
>>>>  fs/btrfs/volumes.h |  4 ++++
>>>>  3 files changed, 14 insertions(+), 9 deletions(-)
>>>>
>>>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>>>> index c26b8a0b121c..f596bd130524 100644
>>>> --- a/fs/btrfs/disk-io.c
>>>> +++ b/fs/btrfs/disk-io.c
>>>> @@ -3569,17 +3569,17 @@ static int barrier_all_devices(struct btrfs_fs_info *info)
>>>>  {
>>>>      struct list_head *head;
>>>>      struct btrfs_device *dev;
>>>> -    int errors_send = 0;
>>>> -    int errors_wait = 0;
>>>>      int ret;
>>>>
>>>>      /* send down all the barriers */
>>>>      head = &info->fs_devices->devices;
>>>>      list_for_each_entry_rcu(dev, head, dev_list) {
>>>> +        dev->err_wait = false;
>>>> +        dev->err_send = false;
>>>>          if (dev->missing)
>>>>              continue;
>>>>          if (!dev->bdev) {
>>>> -            errors_send++;
>>>> +            dev->err_send = true;
>>>>              continue;
>>>>          }
>>>>          if (!dev->in_fs_metadata || !dev->writeable)
>>>> @@ -3587,7 +3587,7 @@ static int barrier_all_devices(struct btrfs_fs_info *info)
>>>>
>>>>          ret = write_dev_flush(dev, 0);
>>>>          if (ret)
>>>> -            errors_send++;
>>>> +            dev->err_send = true;
>>>>      }
>>>>
>>>>      /* wait for all the barriers */
>>>> @@ -3595,7 +3595,7 @@ static int barrier_all_devices(struct btrfs_fs_info *info)
>>>>          if (dev->missing)
>>>>              continue;
>>>>          if (!dev->bdev) {
>>>> -            errors_wait++;
>>>> +            dev->err_wait = true;
>>>>              continue;
>>>>          }
>>>>          if (!dev->in_fs_metadata || !dev->writeable)
>>>> @@ -3603,10 +3603,9 @@ static int barrier_all_devices(struct btrfs_fs_info *info)
>>>>
>>>>          ret = write_dev_flush(dev, 1);
>>>>          if (ret)
>>>> -            errors_wait++;
>>>> +            dev->err_wait = true;
>>>>      }
>>>> -    if (errors_send > info->num_tolerated_disk_barrier_failures ||
>>>> -        errors_wait > info->num_tolerated_disk_barrier_failures)
>>>> +    if (!btrfs_check_rw_degradable(info))
>>>>          return -EIO;
>>>>      return 0;
>>>>  }
>>>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>>>> index dd9dd94d7043..729cbd0d2b60 100644
>>>> --- a/fs/btrfs/volumes.c
>>>> +++ b/fs/btrfs/volumes.c
>>>> @@ -6796,7 +6796,9 @@ bool btrfs_check_rw_degradable(struct btrfs_fs_info *fs_info)
>>>>              btrfs_get_num_tolerated_disk_barrier_failures(
>>>>                      map->type);
>>>>          for (i = 0; i < map->num_stripes; i++) {
>>>> -            if (map->stripes[i].dev->missing)
>>>> +            if (map->stripes[i].dev->missing ||
>>>> +                map->stripes[i].dev->err_wait ||
>>>> +                map->stripes[i].dev->err_send)
>>>>                  missing++;
>>>>          }
>
>  This is rather wrong.
>
>
>>>>          if (missing > max_tolerated) {
>>>> diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
>>>> index db1b5ef479cf..112fccacdabc 100644
>>>> --- a/fs/btrfs/volumes.h
>>>> +++ b/fs/btrfs/volumes.h
>>>> @@ -75,6 +75,10 @@ struct btrfs_device {
>>>>      int can_discard;
>>>>      int is_tgtdev_for_dev_replace;
>>>>
>>>> +    /* If this devices fails to send/wait dev flush */
>>>> +    bool err_send;
>>>> +    bool err_wait;
>>>
>>>
>>>
>>>>  #ifdef __BTRFS_NEED_DEVICE_DATA_ORDERED
>>>>      seqcount_t data_seqcount;
>>>>  #endif
>>>>
>>>
>>>
>>
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>




* Re: [PATCH v2 4/6] btrfs: Allow barrier_all_devices to do chunk level device check
  2017-03-07  7:08         ` Qu Wenruo
@ 2017-03-07  8:07           ` Qu Wenruo
  0 siblings, 0 replies; 20+ messages in thread
From: Qu Wenruo @ 2017-03-07  8:07 UTC (permalink / raw)
  To: Anand Jain; +Cc: linux-btrfs, lists



At 03/07/2017 03:08 PM, Qu Wenruo wrote:
>
>
> At 03/07/2017 02:55 PM, Anand Jain wrote:
>>
>>
>>
>>
>>> 1) About reentrancy
>>> In the previous version, the err_* bits were still put into the
>>> btrfs_device structure; just the timing of resetting these bits changed.
>>>
>>> So either way, it's not reentrant in theory.
>>>
>>> But that isn't a problem, as we have other protection when
>>> calling write_all_supers(), the only caller of barrier_all_devices().
>>>
>>> So would you give me an example why we need to make it reentrant?
>>
>>  It's updating the device struct; I would avoid such a change
>>  for the purposes of this patch (I noticed that in v1 as well).
>>  Further, btrfs does not handle online intermittent device failure;
>>  unless the complete story is taken care of, I would avoid such a change.
>>
>>  Theoretically this patch is buggy: btrfs_check_rw_degradable() has
>>  more consumers than just barrier_all_devices(). However,
>>  dev->err_wait and dev->err_send are managed only by
>>  barrier_all_devices(). And the bad news is that dev->err_wait and
>>  dev->err_send would affect the result of "missing" coming out of
>>  btrfs_check_rw_degradable(), which is wrong for threads other
>>  than barrier_all_devices(). Further, the only way dev->err_wait and
>>  dev->err_send get reset is by the next call to
>>  barrier_all_devices(). And if a lock is the answer, that would make
>>  it messy and complicated. I won't do that.
>>
>>  There is a simple solution as below..
>>
>>> 2) About using stack variable?
>>
>>  Pass err_send and err_wait to btrfs_check_rw_degradable() through
>>  arguments, so as to compute degradability for barrier_all_devices().
>>  In this way the changes are kept local and thread-specific.
>
> In this way, we need to make a expandable structure to record devid <->
s/a expandable/an expandable/

Sorry for the confusion.

Thanks,
Qu
> err_send/wait mapping.
>
> A simple array is not suitable here, as the starting devid can be either
> 1 or 0 depending on whether we're replacing a device.
> Furthermore, devids are not always sequential; there can be holes in
> devid allocation.
>
> That said, this approach does have less impact on the btrfs_device
> structure.
>
> I'll update the patchset and try this method; I just hope it won't
> introduce too much new code.
>
> Thanks for the advice,
> Qu
>
>>
>> Thanks, Anand
>>
>>
>>> Did you mean building an array on the stack to record which devices
>>> fail to send/wait, and using that array as the check condition rather
>>> than btrfs_device->err_* and btrfs_device->missing?
>>>
>>> The only problem is that it sounds more complex than needed.
>>> Besides the err_*, we also need to check already-missing devices, so I
>>> prefer the easy way: just checking btrfs_device->err_* and
>>> btrfs_device->missing.
>>>
>>> Any simple example to explain your suggestion here?
>>>
>>> Thanks,
>>> Qu
>>>
>>>>
>>>>
>>>>> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
>>>>> ---
>>>>>  fs/btrfs/disk-io.c | 15 +++++++--------
>>>>>  fs/btrfs/volumes.c |  4 +++-
>>>>>  fs/btrfs/volumes.h |  4 ++++
>>>>>  3 files changed, 14 insertions(+), 9 deletions(-)
>>>>>
>>>>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>>>>> index c26b8a0b121c..f596bd130524 100644
>>>>> --- a/fs/btrfs/disk-io.c
>>>>> +++ b/fs/btrfs/disk-io.c
>>>>> @@ -3569,17 +3569,17 @@ static int barrier_all_devices(struct btrfs_fs_info *info)
>>>>>  {
>>>>>      struct list_head *head;
>>>>>      struct btrfs_device *dev;
>>>>> -    int errors_send = 0;
>>>>> -    int errors_wait = 0;
>>>>>      int ret;
>>>>>
>>>>>      /* send down all the barriers */
>>>>>      head = &info->fs_devices->devices;
>>>>>      list_for_each_entry_rcu(dev, head, dev_list) {
>>>>> +        dev->err_wait = false;
>>>>> +        dev->err_send = false;
>>>>>          if (dev->missing)
>>>>>              continue;
>>>>>          if (!dev->bdev) {
>>>>> -            errors_send++;
>>>>> +            dev->err_send = true;
>>>>>              continue;
>>>>>          }
>>>>>          if (!dev->in_fs_metadata || !dev->writeable)
>>>>> @@ -3587,7 +3587,7 @@ static int barrier_all_devices(struct btrfs_fs_info *info)
>>>>>
>>>>>          ret = write_dev_flush(dev, 0);
>>>>>          if (ret)
>>>>> -            errors_send++;
>>>>> +            dev->err_send = true;
>>>>>      }
>>>>>
>>>>>      /* wait for all the barriers */
>>>>> @@ -3595,7 +3595,7 @@ static int barrier_all_devices(struct btrfs_fs_info *info)
>>>>>          if (dev->missing)
>>>>>              continue;
>>>>>          if (!dev->bdev) {
>>>>> -            errors_wait++;
>>>>> +            dev->err_wait = true;
>>>>>              continue;
>>>>>          }
>>>>>          if (!dev->in_fs_metadata || !dev->writeable)
>>>>> @@ -3603,10 +3603,9 @@ static int barrier_all_devices(struct btrfs_fs_info *info)
>>>>>
>>>>>          ret = write_dev_flush(dev, 1);
>>>>>          if (ret)
>>>>> -            errors_wait++;
>>>>> +            dev->err_wait = true;
>>>>>      }
>>>>> -    if (errors_send > info->num_tolerated_disk_barrier_failures ||
>>>>> -        errors_wait > info->num_tolerated_disk_barrier_failures)
>>>>> +    if (!btrfs_check_rw_degradable(info))
>>>>>          return -EIO;
>>>>>      return 0;
>>>>>  }
>>>>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>>>>> index dd9dd94d7043..729cbd0d2b60 100644
>>>>> --- a/fs/btrfs/volumes.c
>>>>> +++ b/fs/btrfs/volumes.c
>>>>> @@ -6796,7 +6796,9 @@ bool btrfs_check_rw_degradable(struct btrfs_fs_info *fs_info)
>>>>>              btrfs_get_num_tolerated_disk_barrier_failures(
>>>>>                      map->type);
>>>>>          for (i = 0; i < map->num_stripes; i++) {
>>>>> -            if (map->stripes[i].dev->missing)
>>>>> +            if (map->stripes[i].dev->missing ||
>>>>> +                map->stripes[i].dev->err_wait ||
>>>>> +                map->stripes[i].dev->err_send)
>>>>>                  missing++;
>>>>>          }
>>
>>  This is rather wrong.
>>
>>
>>>>>          if (missing > max_tolerated) {
>>>>> diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
>>>>> index db1b5ef479cf..112fccacdabc 100644
>>>>> --- a/fs/btrfs/volumes.h
>>>>> +++ b/fs/btrfs/volumes.h
>>>>> @@ -75,6 +75,10 @@ struct btrfs_device {
>>>>>      int can_discard;
>>>>>      int is_tgtdev_for_dev_replace;
>>>>>
>>>>> +    /* If this devices fails to send/wait dev flush */
>>>>> +    bool err_send;
>>>>> +    bool err_wait;
>>>>
>>>>
>>>>
>>>>>  #ifdef __BTRFS_NEED_DEVICE_DATA_ORDERED
>>>>>      seqcount_t data_seqcount;
>>>>>  #endif
>>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>
>




* Re: [PATCH v2 1/6] btrfs: Introduce a function to check if all chunks a OK for degraded rw mount
  2017-03-07  4:48   ` Anand Jain
@ 2017-03-08 18:26     ` Anand Jain
  2017-03-09  0:31       ` Qu Wenruo
  0 siblings, 1 reply; 20+ messages in thread
From: Anand Jain @ 2017-03-08 18:26 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs, lists



>> Introduce a new function, btrfs_check_rw_degradable(), to check if all
>> chunks in btrfs is OK for degraded rw mount.
>>
>> It provides the new basis for accurate btrfs mount/remount and even
>> runtime degraded mount check other than old one-size-fit-all method.

   Sorry for the late response, but this patch has a bug (it calls free
   with the lock held), which is fixed in this patch [1].
    [1]
     [PATCH] btrfs: fix btrfs_check_degradable() to free extent map
     https://patchwork.kernel.org/patch/8868761/

   The last version of this patch is here [2]; it did contain the
   above fix [1], but it missed my sign-off and the commit log
   update (sorry for that).
     [2] http://www.spinics.net/lists/linux-btrfs/msg55038.html

   Can you please squash [1] into this patch?

   With that.

>  Looks good.
>  Reviewed-by: Anand Jain <anand.jain@oracle.com>

  While you are here, can you also consider..

-----
@@ -6843,7 +6843,7 @@ bool btrfs_check_rw_degradable(struct btrfs_fs_info *fs_info,
                 int max_tolerated;
                 int i;

-               map = (struct map_lookup *) em->bdev;
+               map = em->map_lookup;
                 max_tolerated =
                         btrfs_get_num_tolerated_disk_barrier_failures(
                                         map->type);
------
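As background for the snippet above, the per-chunk check this series introduces boils down to: for every chunk, count the missing stripes and compare against the tolerance of that chunk's profile (e.g. 0 for SINGLE/RAID0, 1 for RAID1/RAID10/RAID5, 2 for RAID6). A simplified userspace sketch, with stand-in names rather than the actual kernel structures:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative model of a chunk: a per-profile tolerance plus the
 * stripes it is laid out on. */
struct stripe { bool dev_missing; };
struct chunk {
    int max_tolerated;          /* tolerance for this chunk's profile */
    size_t num_stripes;
    const struct stripe *stripes;
};

/* The fs is rw-degradable only if every chunk stays within its own
 * tolerance; a single chunk over the limit fails the whole check. */
static bool check_rw_degradable(const struct chunk *chunks, size_t nr)
{
    for (size_t c = 0; c < nr; c++) {
        int missing = 0;
        for (size_t i = 0; i < chunks[c].num_stripes; i++)
            if (chunks[c].stripes[i].dev_missing)
                missing++;
        if (missing > chunks[c].max_tolerated)
            return false;
    }
    return true;
}
```

This is what makes the RAID1-metadata/SINGLE-data case from the cover letter work: a RAID1 chunk tolerates one missing stripe, while a SINGLE chunk that happens to live entirely on the surviving device has no missing stripes at all.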


   Thanks, Anand

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 1/6] btrfs: Introduce a function to check if all chunks a OK for degraded rw mount
  2017-03-08 18:26     ` Anand Jain
@ 2017-03-09  0:31       ` Qu Wenruo
  0 siblings, 0 replies; 20+ messages in thread
From: Qu Wenruo @ 2017-03-09  0:31 UTC (permalink / raw)
  To: Anand Jain; +Cc: linux-btrfs, lists



At 03/09/2017 02:26 AM, Anand Jain wrote:
>
>
>>> Introduce a new function, btrfs_check_rw_degradable(), to check if all
>>> chunks in btrfs is OK for degraded rw mount.
>>>
>>> It provides the new basis for accurate btrfs mount/remount and even
>>> runtime degraded mount check other than old one-size-fit-all method.
>
>   Sorry for the late response, but this patch has a bug (it calls free
>   with the lock held), which is fixed in this patch [1].
>    [1]
>     [PATCH] btrfs: fix btrfs_check_degradable() to free extent map
>     https://patchwork.kernel.org/patch/8868761/

Thanks for the patch. The memory leak is already fixed in v3.

However, I still call free with the read lock held; I'll fix that in v3 as well.
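The pattern of that fix can be sketched in userspace as follows (illustrative names modeled on the kernel code, with the read lock replaced by a depth counter so the invariant is checkable): the extent map reference is dropped only after the read lock is released.

```c
#include <assert.h>
#include <stdlib.h>

/* Model of the mapping tree's read lock as a depth counter. */
static int read_lock_depth;

struct extent_map { int refs; };

static void free_extent_map(struct extent_map *em)
{
    /* The whole point of the fix: never drop the last reference
     * (and free) while the read lock is still held. */
    assert(read_lock_depth == 0);
    if (--em->refs == 0)
        free(em);
}

static int check_chunk(struct extent_map *em)
{
    read_lock_depth++;          /* read_lock(&map_tree->lock); */
    /* ... inspect the chunk's stripes under the lock ... */
    read_lock_depth--;          /* read_unlock(&map_tree->lock); */

    free_extent_map(em);        /* safe: lock already dropped */
    return 0;
}
```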
>
>   The last version of this patch is here [2]; it did contain the
>   above fix [1], but it missed my sign-off and the commit log
>   update (sorry for that).
>     [2] http://www.spinics.net/lists/linux-btrfs/msg55038.html
>
>   Can you please squash [1] into this patch?

I'll add your Signed-off-by tag in the next version too.

Thanks,
Qu
>
>   With that.
>
>>  Looks good.
>>  Reviewed-by: Anand Jain <anand.jain@oracle.com>
>
>  While you are here, can you also consider..
>
> -----
> @@ -6843,7 +6843,7 @@ bool btrfs_check_rw_degradable(struct btrfs_fs_info *fs_info,
>                 int max_tolerated;
>                 int i;
>
> -               map = (struct map_lookup *) em->bdev;
> +               map = em->map_lookup;
>                 max_tolerated =
>                         btrfs_get_num_tolerated_disk_barrier_failures(
>                                         map->type);
> ------
>
>
>   Thanks, Anand
>
>




end of thread, other threads:[~2017-03-09  0:36 UTC | newest]

Thread overview: 20+ messages
2017-03-06  8:58 [PATCH v2 0/6] Chunk level degradable check Qu Wenruo
2017-03-06  8:58 ` [PATCH v2 1/6] btrfs: Introduce a function to check if all chunks a OK for degraded rw mount Qu Wenruo
2017-03-07  4:48   ` Anand Jain
2017-03-08 18:26     ` Anand Jain
2017-03-09  0:31       ` Qu Wenruo
2017-03-06  8:58 ` [PATCH v2 2/6] btrfs: Do chunk level rw degrade check at mount time Qu Wenruo
2017-03-06  8:58 ` [PATCH v2 3/6] btrfs: Do chunk level degradation check for remount Qu Wenruo
2017-03-06  8:58 ` [PATCH v2 4/6] btrfs: Allow barrier_all_devices to do chunk level device check Qu Wenruo
2017-03-07  4:48   ` Anand Jain
2017-03-07  5:36     ` Qu Wenruo
2017-03-07  6:55       ` Anand Jain
2017-03-07  7:08         ` Qu Wenruo
2017-03-07  8:07           ` Qu Wenruo
2017-03-06  8:58 ` [PATCH v2 5/6] btrfs: Cleanup num_tolerated_disk_barrier_failures Qu Wenruo
2017-03-06  8:58 ` [PATCH v2 6/6] btrfs: Enhance missing device kernel message Qu Wenruo
2017-03-07  4:47   ` Anand Jain
2017-03-06 18:49 ` [PATCH v2 0/6] Chunk level degradable check Dmitrii Tcvetkov
2017-03-07  0:36 ` Adam Borowski
2017-03-07  1:35   ` Qu Wenruo
2017-03-07  2:23     ` Adam Borowski
