All of lore.kernel.org
* [PATCH v3.1 0/7] Chunk level degradable check
@ 2017-03-09  1:34 Qu Wenruo
  2017-03-09  1:34 ` [PATCH v3.1 1/7] btrfs: Introduce a function to check if all chunks are OK for degraded rw mount Qu Wenruo
                   ` (7 more replies)
  0 siblings, 8 replies; 20+ messages in thread
From: Qu Wenruo @ 2017-03-09  1:34 UTC (permalink / raw)
  To: linux-btrfs, anand.jain

Btrfs currently uses num_tolerated_disk_barrier_failures as a global
check for the number of tolerated missing devices.

Although this one-size-fits-all solution is quite safe, it is too
strict when data and metadata have different replication levels.

For example, with single data and RAID1 metadata on 2 disks, any
missing device makes the fs impossible to mount degraded.

But in fact, all the single chunks may reside on the remaining device,
and in that case we should allow a degraded rw mount.

Such a case can easily be reproduced with the following script:
 # mkfs.btrfs -f -m raid1 -d single /dev/sdb /dev/sdc
 # wipefs -a /dev/sdc
 # mount /dev/sdb -o degraded,rw

Checking /dev/sdb with btrfs-debug-tree shows that the data chunk
lives only on sdb, so the degraded mount should in fact be allowed.

This patchset introduces a new per-chunk degradable check for btrfs,
allowing the above case to succeed, and it is quite small anyway.

It also enhances the kernel error messages for missing devices, so the
user can at least tell what made the mount fail, instead of the
meaningless "failed to read system chunk/chunk tree -5".

v2:
  Update after almost 2 years.
  Add the last patch to enhance the kernel output, so the user can tell
  it is missing devices that prevent btrfs from mounting.
v3:
  Remove one duplicated missing-device message.
  Follow Anand Jain's advice: instead of adding new members to
  btrfs_device, use a new structure, extra_rw_degrade_errors, to record
  errors when sending down/waiting for device barriers.
v3.1:
  Reduce the critical section in btrfs_check_rw_degradable(), following
  other callers in acquiring the lock only while searching, as
  extent_map already uses refcounting to avoid concurrency issues.
  The modification itself does not affect behavior, so Tested-by tags
  are added to each patch.


Qu Wenruo (7):
  btrfs: Introduce a function to check if all chunks are OK for degraded
    rw mount
  btrfs: Do chunk level rw degrade check at mount time
  btrfs: Do chunk level degradation check for remount
  btrfs: Introduce extra_rw_degrade_errors parameter for
    btrfs_check_rw_degradable
  btrfs: Allow barrier_all_devices to do chunk level device check
  btrfs: Cleanup num_tolerated_disk_barrier_failures
  btrfs: Enhance missing device kernel message

 fs/btrfs/ctree.h   |   2 -
 fs/btrfs/disk-io.c |  87 ++++++-----------------------
 fs/btrfs/disk-io.h |   2 -
 fs/btrfs/super.c   |   5 +-
 fs/btrfs/volumes.c | 158 +++++++++++++++++++++++++++++++++++++++++++++--------
 fs/btrfs/volumes.h |  37 +++++++++++++
 6 files changed, 190 insertions(+), 101 deletions(-)

-- 
2.12.0




^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v3.1 1/7] btrfs: Introduce a function to check if all chunks are OK for degraded rw mount
  2017-03-09  1:34 [PATCH v3.1 0/7] Chunk level degradable check Qu Wenruo
@ 2017-03-09  1:34 ` Qu Wenruo
  2017-03-13  7:29   ` Anand Jain
  2017-03-09  1:34 ` [PATCH v3.1 2/7] btrfs: Do chunk level rw degrade check at mount time Qu Wenruo
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 20+ messages in thread
From: Qu Wenruo @ 2017-03-09  1:34 UTC (permalink / raw)
  To: linux-btrfs, anand.jain

Introduce a new function, btrfs_check_rw_degradable(), to check whether
all chunks in the fs are OK for a degraded rw mount.

It provides a new basis for accurate mount-time, remount and even
runtime degraded-mount checks, replacing the old one-size-fits-all
method.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Tested-by: Austin S. Hemmelgarn <ahferroin7@gmail.com>
Tested-by: Adam Borowski <kilobyte@angband.pl>
Tested-by: Dmitrii Tcvetkov <demfloro@demfloro.ru>
---
 fs/btrfs/volumes.c | 55 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/volumes.h |  1 +
 2 files changed, 56 insertions(+)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 73d56eef5e60..83613955e3c2 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6765,6 +6765,61 @@ int btrfs_read_sys_array(struct btrfs_fs_info *fs_info)
 	return -EIO;
 }
 
+/*
+ * Check if all chunks in the fs is OK for read-write degraded mount
+ *
+ * Return true if the fs is OK to be mounted degraded read-write
+ * Return false if the fs is not OK to be mounted degraded
+ */
+bool btrfs_check_rw_degradable(struct btrfs_fs_info *fs_info)
+{
+	struct btrfs_mapping_tree *map_tree = &fs_info->mapping_tree;
+	struct extent_map *em;
+	u64 next_start = 0;
+	bool ret = true;
+
+	read_lock(&map_tree->map_tree.lock);
+	em = lookup_extent_mapping(&map_tree->map_tree, 0, (u64)-1);
+	read_unlock(&map_tree->map_tree.lock);
+	/* No chunk at all? Return false anyway */
+	if (!em) {
+		ret = false;
+		goto out;
+	}
+	while (em) {
+		struct map_lookup *map;
+		int missing = 0;
+		int max_tolerated;
+		int i;
+
+		map = (struct map_lookup *) em->bdev;
+		max_tolerated =
+			btrfs_get_num_tolerated_disk_barrier_failures(
+					map->type);
+		for (i = 0; i < map->num_stripes; i++) {
+			if (map->stripes[i].dev->missing)
+				missing++;
+		}
+		if (missing > max_tolerated) {
+			ret = false;
+			btrfs_warn(fs_info,
+	"chunk %llu missing %d devices, max tolerance is %d for writeble mount",
+				   em->start, missing, max_tolerated);
+			free_extent_map(em);
+			goto out;
+		}
+		next_start = extent_map_end(em);
+		free_extent_map(em);
+
+		read_lock(&map_tree->map_tree.lock);
+		em = lookup_extent_mapping(&map_tree->map_tree, next_start,
+					   (u64)(-1) - next_start);
+		read_unlock(&map_tree->map_tree.lock);
+	}
+out:
+	return ret;
+}
+
 int btrfs_read_chunk_tree(struct btrfs_fs_info *fs_info)
 {
 	struct btrfs_root *root = fs_info->chunk_root;
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 59be81206dd7..db1b5ef479cf 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -538,4 +538,5 @@ struct list_head *btrfs_get_fs_uuids(void);
 void btrfs_set_fs_info_ptr(struct btrfs_fs_info *fs_info);
 void btrfs_reset_fs_info_ptr(struct btrfs_fs_info *fs_info);
 
+bool btrfs_check_rw_degradable(struct btrfs_fs_info *fs_info);
 #endif
-- 
2.12.0





* [PATCH v3.1 2/7] btrfs: Do chunk level rw degrade check at mount time
  2017-03-09  1:34 [PATCH v3.1 0/7] Chunk level degradable check Qu Wenruo
  2017-03-09  1:34 ` [PATCH v3.1 1/7] btrfs: Introduce a function to check if all chunks are OK for degraded rw mount Qu Wenruo
@ 2017-03-09  1:34 ` Qu Wenruo
  2017-03-09  1:34 ` [PATCH v3.1 3/7] btrfs: Do chunk level degradation check for remount Qu Wenruo
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 20+ messages in thread
From: Qu Wenruo @ 2017-03-09  1:34 UTC (permalink / raw)
  To: linux-btrfs, anand.jain

Now use btrfs_check_rw_degradable() to do the mount-time degradation
check.

With this patch, the following case can now be mounted:
 # mkfs.btrfs -f -m raid1 -d single /dev/sdb /dev/sdc
 # wipefs -a /dev/sdc
 # mount /dev/sdb /mnt/btrfs -o degraded
 The single data chunk lives only on sdb, so it is OK to mount
 degraded: missing one device is fine for RAID1.

But the following case still fails, as expected:
 # mkfs.btrfs -f -m raid1 -d single /dev/sdb /dev/sdc
 # wipefs -a /dev/sdb
 # mount /dev/sdc /mnt/btrfs -o degraded
 The data chunk lives only on sdb, so it is not OK to mount degraded.

Reported-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Reported-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Tested-by: Austin S. Hemmelgarn <ahferroin7@gmail.com>
Tested-by: Adam Borowski <kilobyte@angband.pl>
Tested-by: Dmitrii Tcvetkov <demfloro@demfloro.ru>
---
 fs/btrfs/disk-io.c | 11 +++--------
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 08b74daf35d0..3de89283d400 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3057,15 +3057,10 @@ int open_ctree(struct super_block *sb,
 		btrfs_err(fs_info, "failed to read block groups: %d", ret);
 		goto fail_sysfs;
 	}
-	fs_info->num_tolerated_disk_barrier_failures =
-		btrfs_calc_num_tolerated_disk_barrier_failures(fs_info);
-	if (fs_info->fs_devices->missing_devices >
-	     fs_info->num_tolerated_disk_barrier_failures &&
-	    !(sb->s_flags & MS_RDONLY)) {
+
+	if (!(sb->s_flags & MS_RDONLY) && !btrfs_check_rw_degradable(fs_info)) {
 		btrfs_warn(fs_info,
-"missing devices (%llu) exceeds the limit (%d), writeable mount is not allowed",
-			fs_info->fs_devices->missing_devices,
-			fs_info->num_tolerated_disk_barrier_failures);
+		"writeable mount is not allowed due to too many missing devices");
 		goto fail_sysfs;
 	}
 
-- 
2.12.0





* [PATCH v3.1 3/7] btrfs: Do chunk level degradation check for remount
  2017-03-09  1:34 [PATCH v3.1 0/7] Chunk level degradable check Qu Wenruo
  2017-03-09  1:34 ` [PATCH v3.1 1/7] btrfs: Introduce a function to check if all chunks are OK for degraded rw mount Qu Wenruo
  2017-03-09  1:34 ` [PATCH v3.1 2/7] btrfs: Do chunk level rw degrade check at mount time Qu Wenruo
@ 2017-03-09  1:34 ` Qu Wenruo
  2017-03-09  1:34 ` [PATCH v3.1 4/7] btrfs: Introduce extra_rw_degrade_errors parameter for btrfs_check_rw_degradable Qu Wenruo
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 20+ messages in thread
From: Qu Wenruo @ 2017-03-09  1:34 UTC (permalink / raw)
  To: linux-btrfs, anand.jain

Just as with the mount-time check, use btrfs_check_rw_degradable() to
check whether we are OK to be remounted rw.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Tested-by: Austin S. Hemmelgarn <ahferroin7@gmail.com>
Tested-by: Adam Borowski <kilobyte@angband.pl>
Tested-by: Dmitrii Tcvetkov <demfloro@demfloro.ru>
---
 fs/btrfs/super.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index da687dc79cce..1f5772501c92 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1784,9 +1784,8 @@ static int btrfs_remount(struct super_block *sb, int *flags, char *data)
 			goto restore;
 		}
 
-		if (fs_info->fs_devices->missing_devices >
-		     fs_info->num_tolerated_disk_barrier_failures &&
-		    !(*flags & MS_RDONLY)) {
+		if (!(*flags & MS_RDONLY) &&
+		    !btrfs_check_rw_degradable(fs_info)) {
 			btrfs_warn(fs_info,
 				"too many missing devices, writeable remount is not allowed");
 			ret = -EACCES;
-- 
2.12.0





* [PATCH v3.1 4/7] btrfs: Introduce extra_rw_degrade_errors parameter for btrfs_check_rw_degradable
  2017-03-09  1:34 [PATCH v3.1 0/7] Chunk level degradable check Qu Wenruo
                   ` (2 preceding siblings ...)
  2017-03-09  1:34 ` [PATCH v3.1 3/7] btrfs: Do chunk level degradation check for remount Qu Wenruo
@ 2017-03-09  1:34 ` Qu Wenruo
  2017-03-09  1:34 ` [PATCH v3.1 5/7] btrfs: Allow barrier_all_devices to do chunk level device check Qu Wenruo
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 20+ messages in thread
From: Qu Wenruo @ 2017-03-09  1:34 UTC (permalink / raw)
  To: linux-btrfs, anand.jain

Introduce a new structure, extra_rw_degrade_errors, to record the
devid<->error mapping.

This structure has an array to record runtime errors that affect
degraded mounting, such as a failure to flush or wait for one device.

Also allow btrfs_check_rw_degradable() to accept such a structure as an
error source in addition to btrfs_device->missing.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Tested-by: Austin S. Hemmelgarn <ahferroin7@gmail.com>
Tested-by: Adam Borowski <kilobyte@angband.pl>
Tested-by: Dmitrii Tcvetkov <demfloro@demfloro.ru>
---
 fs/btrfs/disk-io.c |  3 ++-
 fs/btrfs/super.c   |  2 +-
 fs/btrfs/volumes.c | 66 ++++++++++++++++++++++++++++++++++++++++++++++++++++--
 fs/btrfs/volumes.h | 36 ++++++++++++++++++++++++++++-
 4 files changed, 102 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 3de89283d400..658b8fab1d39 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3058,7 +3058,8 @@ int open_ctree(struct super_block *sb,
 		goto fail_sysfs;
 	}
 
-	if (!(sb->s_flags & MS_RDONLY) && !btrfs_check_rw_degradable(fs_info)) {
+	if (!(sb->s_flags & MS_RDONLY) &&
+	    !btrfs_check_rw_degradable(fs_info, NULL)) {
 		btrfs_warn(fs_info,
 		"writeable mount is not allowed due to too many missing devices");
 		goto fail_sysfs;
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 1f5772501c92..06bd9b332e18 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1785,7 +1785,7 @@ static int btrfs_remount(struct super_block *sb, int *flags, char *data)
 		}
 
 		if (!(*flags & MS_RDONLY) &&
-		    !btrfs_check_rw_degradable(fs_info)) {
+		    !btrfs_check_rw_degradable(fs_info, NULL)) {
 			btrfs_warn(fs_info,
 				"too many missing devices, writeable remount is not allowed");
 			ret = -EACCES;
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 83613955e3c2..46cf676be15a 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6765,13 +6765,72 @@ int btrfs_read_sys_array(struct btrfs_fs_info *fs_info)
 	return -EIO;
 }
 
+void record_extra_rw_degrade_error(struct extra_rw_degrade_errors *errors,
+				   u64 devid)
+{
+	int i;
+	bool inserted = false;
+
+	if (!errors)
+		return;
+
+	spin_lock(&errors->lock);
+	for (i = 0; i < errors->nr_devs; i++) {
+		struct rw_degrade_error *error = &errors->errors[i];
+
+		if (!error->initialized) {
+			error->devid = devid;
+			error->initialized = true;
+			error->err = true;
+			inserted = true;
+			break;
+		}
+		if (error->devid == devid) {
+			error->err = true;
+			inserted = true;
+			break;
+		}
+	}
+	spin_unlock(&errors->lock);
+	/*
+	 * We iterate all the error records but still found no empty slot
+	 * This means errors->nr_devs is not correct.
+	 */
+	WARN_ON(!inserted);
+}
+
+static bool device_has_rw_degrade_error(struct extra_rw_degrade_errors *errors,
+					u64 devid)
+{
+	int i;
+	bool ret = false;
+
+	if (!errors)
+		return ret;
+
+	spin_lock(&errors->lock);
+	for (i = 0; i < errors->nr_devs; i++) {
+		struct rw_degrade_error *error = &errors->errors[i];
+
+		if (!error->initialized)
+			break;
+		if (error->devid == devid) {
+			ret = true;
+			break;
+		}
+	}
+	spin_unlock(&errors->lock);
+	return ret;
+}
+
 /*
  * Check if all chunks in the fs is OK for read-write degraded mount
  *
  * Return true if the fs is OK to be mounted degraded read-write
  * Return false if the fs is not OK to be mounted degraded
  */
-bool btrfs_check_rw_degradable(struct btrfs_fs_info *fs_info)
+bool btrfs_check_rw_degradable(struct btrfs_fs_info *fs_info,
+			       struct extra_rw_degrade_errors *errors)
 {
 	struct btrfs_mapping_tree *map_tree = &fs_info->mapping_tree;
 	struct extent_map *em;
@@ -6797,7 +6856,10 @@ bool btrfs_check_rw_degradable(struct btrfs_fs_info *fs_info)
 			btrfs_get_num_tolerated_disk_barrier_failures(
 					map->type);
 		for (i = 0; i < map->num_stripes; i++) {
-			if (map->stripes[i].dev->missing)
+			struct btrfs_device *device = map->stripes[i].dev;
+
+			if (device->missing ||
+			    device_has_rw_degrade_error(errors, device->devid))
 				missing++;
 		}
 		if (missing > max_tolerated) {
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index db1b5ef479cf..67d7474e42a3 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -538,5 +538,39 @@ struct list_head *btrfs_get_fs_uuids(void);
 void btrfs_set_fs_info_ptr(struct btrfs_fs_info *fs_info);
 void btrfs_reset_fs_info_ptr(struct btrfs_fs_info *fs_info);
 
-bool btrfs_check_rw_degradable(struct btrfs_fs_info *fs_info);
+/*
+ * For btrfs_check_rw_degradable() to check extra error from
+ * barrier_all_devices()
+ */
+struct rw_degrade_error {
+	u64 devid;
+	bool initialized;
+	bool err;
+};
+
+struct extra_rw_degrade_errors {
+	int nr_devs;
+	spinlock_t lock;
+	struct rw_degrade_error errors[];
+};
+
+static inline struct extra_rw_degrade_errors *alloc_extra_rw_degrade_errors(
+		int nr_devs)
+{
+	struct extra_rw_degrade_errors *ret;
+
+	ret = kzalloc(sizeof(struct extra_rw_degrade_errors) + nr_devs *
+		      sizeof(struct rw_degrade_error), GFP_NOFS);
+	if (!ret)
+		return ret;
+	spin_lock_init(&ret->lock);
+	ret->nr_devs = nr_devs;
+	return ret;
+}
+
+void record_extra_rw_degrade_error(struct extra_rw_degrade_errors *errors,
+				   u64 devid);
+
+bool btrfs_check_rw_degradable(struct btrfs_fs_info *fs_info,
+			       struct extra_rw_degrade_errors *errors);
 #endif
-- 
2.12.0





* [PATCH v3.1 5/7] btrfs: Allow barrier_all_devices to do chunk level device check
  2017-03-09  1:34 [PATCH v3.1 0/7] Chunk level degradable check Qu Wenruo
                   ` (3 preceding siblings ...)
  2017-03-09  1:34 ` [PATCH v3.1 4/7] btrfs: Introduce extra_rw_degrade_errors parameter for btrfs_check_rw_degradable Qu Wenruo
@ 2017-03-09  1:34 ` Qu Wenruo
  2017-03-13  8:00   ` Anand Jain
  2017-03-09  1:34 ` [PATCH v3.1 6/7] btrfs: Cleanup num_tolerated_disk_barrier_failures Qu Wenruo
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 20+ messages in thread
From: Qu Wenruo @ 2017-03-09  1:34 UTC (permalink / raw)
  To: linux-btrfs, anand.jain

The last user of num_tolerated_disk_barrier_failures is
barrier_all_devices().
But it can easily be converted to the new per-chunk degradable check
framework.

Now barrier_all_devices() records send/wait errors per device in an
extra_rw_degrade_errors structure at write_dev_flush() time.
With these records, btrfs_check_rw_degradable() can check whether the
fs is still OK when it is committed to disk.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Tested-by: Austin S. Hemmelgarn <ahferroin7@gmail.com>
Tested-by: Adam Borowski <kilobyte@angband.pl>
Tested-by: Dmitrii Tcvetkov <demfloro@demfloro.ru>
---
 fs/btrfs/disk-io.c | 21 +++++++++++++--------
 1 file changed, 13 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 658b8fab1d39..549045a3e15f 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3570,17 +3570,20 @@ static int barrier_all_devices(struct btrfs_fs_info *info)
 {
 	struct list_head *head;
 	struct btrfs_device *dev;
-	int errors_send = 0;
-	int errors_wait = 0;
+	struct extra_rw_degrade_errors *errors;
 	int ret;
 
+	errors = alloc_extra_rw_degrade_errors(info->fs_devices->num_devices);
+	if (!errors)
+		return -ENOMEM;
+
 	/* send down all the barriers */
 	head = &info->fs_devices->devices;
 	list_for_each_entry_rcu(dev, head, dev_list) {
 		if (dev->missing)
 			continue;
 		if (!dev->bdev) {
-			errors_send++;
+			record_extra_rw_degrade_error(errors, dev->devid);
 			continue;
 		}
 		if (!dev->in_fs_metadata || !dev->writeable)
@@ -3588,7 +3591,7 @@ static int barrier_all_devices(struct btrfs_fs_info *info)
 
 		ret = write_dev_flush(dev, 0);
 		if (ret)
-			errors_send++;
+			record_extra_rw_degrade_error(errors, dev->devid);
 	}
 
 	/* wait for all the barriers */
@@ -3596,7 +3599,7 @@ static int barrier_all_devices(struct btrfs_fs_info *info)
 		if (dev->missing)
 			continue;
 		if (!dev->bdev) {
-			errors_wait++;
+			record_extra_rw_degrade_error(errors, dev->devid);
 			continue;
 		}
 		if (!dev->in_fs_metadata || !dev->writeable)
@@ -3604,11 +3607,13 @@ static int barrier_all_devices(struct btrfs_fs_info *info)
 
 		ret = write_dev_flush(dev, 1);
 		if (ret)
-			errors_wait++;
+			record_extra_rw_degrade_error(errors, dev->devid);
 	}
-	if (errors_send > info->num_tolerated_disk_barrier_failures ||
-	    errors_wait > info->num_tolerated_disk_barrier_failures)
+	if (!btrfs_check_rw_degradable(info, errors)) {
+		kfree(errors);
 		return -EIO;
+	}
+	kfree(errors);
 	return 0;
 }
 
-- 
2.12.0





* [PATCH v3.1 6/7] btrfs: Cleanup num_tolerated_disk_barrier_failures
  2017-03-09  1:34 [PATCH v3.1 0/7] Chunk level degradable check Qu Wenruo
                   ` (4 preceding siblings ...)
  2017-03-09  1:34 ` [PATCH v3.1 5/7] btrfs: Allow barrier_all_devices to do chunk level device check Qu Wenruo
@ 2017-03-09  1:34 ` Qu Wenruo
  2017-03-09  1:34 ` [PATCH v3.1 7/7] btrfs: Enhance missing device kernel message Qu Wenruo
  2017-06-26 18:59 ` [PATCH v3.1 0/7] Chunk level degradable check David Sterba
  7 siblings, 0 replies; 20+ messages in thread
From: Qu Wenruo @ 2017-03-09  1:34 UTC (permalink / raw)
  To: linux-btrfs, anand.jain

Now that the per-chunk degradable check is in use, the global
num_tolerated_disk_barrier_failures is of no use.

So clean it up.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Tested-by: Austin S. Hemmelgarn <ahferroin7@gmail.com>
Tested-by: Adam Borowski <kilobyte@angband.pl>
Tested-by: Dmitrii Tcvetkov <demfloro@demfloro.ru>
---
 fs/btrfs/ctree.h   |  2 --
 fs/btrfs/disk-io.c | 54 ------------------------------------------------------
 fs/btrfs/disk-io.h |  2 --
 fs/btrfs/volumes.c | 17 -----------------
 4 files changed, 75 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 29b7fc28c607..d688025c1ef0 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1062,8 +1062,6 @@ struct btrfs_fs_info {
 	/* next backup root to be overwritten */
 	int backup_root_index;
 
-	int num_tolerated_disk_barrier_failures;
-
 	/* device replace state */
 	struct btrfs_dev_replace dev_replace;
 
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 549045a3e15f..affd7aada057 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3646,60 +3646,6 @@ int btrfs_get_num_tolerated_disk_barrier_failures(u64 flags)
 	return min_tolerated;
 }
 
-int btrfs_calc_num_tolerated_disk_barrier_failures(
-	struct btrfs_fs_info *fs_info)
-{
-	struct btrfs_ioctl_space_info space;
-	struct btrfs_space_info *sinfo;
-	u64 types[] = {BTRFS_BLOCK_GROUP_DATA,
-		       BTRFS_BLOCK_GROUP_SYSTEM,
-		       BTRFS_BLOCK_GROUP_METADATA,
-		       BTRFS_BLOCK_GROUP_DATA | BTRFS_BLOCK_GROUP_METADATA};
-	int i;
-	int c;
-	int num_tolerated_disk_barrier_failures =
-		(int)fs_info->fs_devices->num_devices;
-
-	for (i = 0; i < ARRAY_SIZE(types); i++) {
-		struct btrfs_space_info *tmp;
-
-		sinfo = NULL;
-		rcu_read_lock();
-		list_for_each_entry_rcu(tmp, &fs_info->space_info, list) {
-			if (tmp->flags == types[i]) {
-				sinfo = tmp;
-				break;
-			}
-		}
-		rcu_read_unlock();
-
-		if (!sinfo)
-			continue;
-
-		down_read(&sinfo->groups_sem);
-		for (c = 0; c < BTRFS_NR_RAID_TYPES; c++) {
-			u64 flags;
-
-			if (list_empty(&sinfo->block_groups[c]))
-				continue;
-
-			btrfs_get_block_group_info(&sinfo->block_groups[c],
-						   &space);
-			if (space.total_bytes == 0 || space.used_bytes == 0)
-				continue;
-			flags = space.flags;
-
-			num_tolerated_disk_barrier_failures = min(
-				num_tolerated_disk_barrier_failures,
-				btrfs_get_num_tolerated_disk_barrier_failures(
-					flags));
-		}
-		up_read(&sinfo->groups_sem);
-	}
-
-	return num_tolerated_disk_barrier_failures;
-}
-
 int write_all_supers(struct btrfs_fs_info *fs_info, int max_mirrors)
 {
 	struct list_head *head;
diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h
index 2e0ec29bfd69..4522d2f11909 100644
--- a/fs/btrfs/disk-io.h
+++ b/fs/btrfs/disk-io.h
@@ -142,8 +142,6 @@ struct btrfs_root *btrfs_create_tree(struct btrfs_trans_handle *trans,
 int btree_lock_page_hook(struct page *page, void *data,
 				void (*flush_fn)(void *));
 int btrfs_get_num_tolerated_disk_barrier_failures(u64 flags);
-int btrfs_calc_num_tolerated_disk_barrier_failures(
-	struct btrfs_fs_info *fs_info);
 int __init btrfs_end_io_wq_init(void);
 void btrfs_end_io_wq_exit(void);
 
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 46cf676be15a..217cb149c5ff 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1973,9 +1973,6 @@ int btrfs_rm_device(struct btrfs_fs_info *fs_info, const char *device_path,
 		free_fs_devices(cur_devices);
 	}
 
-	fs_info->num_tolerated_disk_barrier_failures =
-		btrfs_calc_num_tolerated_disk_barrier_failures(fs_info);
-
 out:
 	mutex_unlock(&uuid_mutex);
 	return ret;
@@ -2474,8 +2471,6 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
 				   "sysfs: failed to create fsid for sprout");
 	}
 
-	fs_info->num_tolerated_disk_barrier_failures =
-		btrfs_calc_num_tolerated_disk_barrier_failures(fs_info);
 	ret = btrfs_commit_transaction(trans);
 
 	if (seeding_dev) {
@@ -3858,13 +3853,6 @@ int btrfs_balance(struct btrfs_balance_control *bctl,
 			   bctl->meta.target, bctl->data.target);
 	}
 
-	if (bctl->sys.flags & BTRFS_BALANCE_ARGS_CONVERT) {
-		fs_info->num_tolerated_disk_barrier_failures = min(
-			btrfs_calc_num_tolerated_disk_barrier_failures(fs_info),
-			btrfs_get_num_tolerated_disk_barrier_failures(
-				bctl->sys.target));
-	}
-
 	ret = insert_balance_item(fs_info, bctl);
 	if (ret && ret != -EEXIST)
 		goto out;
@@ -3887,11 +3875,6 @@ int btrfs_balance(struct btrfs_balance_control *bctl,
 	mutex_lock(&fs_info->balance_mutex);
 	atomic_dec(&fs_info->balance_running);
 
-	if (bctl->sys.flags & BTRFS_BALANCE_ARGS_CONVERT) {
-		fs_info->num_tolerated_disk_barrier_failures =
-			btrfs_calc_num_tolerated_disk_barrier_failures(fs_info);
-	}
-
 	if (bargs) {
 		memset(bargs, 0, sizeof(*bargs));
 		update_ioctl_balance_args(fs_info, 0, bargs);
-- 
2.12.0





* [PATCH v3.1 7/7] btrfs: Enhance missing device kernel message
  2017-03-09  1:34 [PATCH v3.1 0/7] Chunk level degradable check Qu Wenruo
                   ` (5 preceding siblings ...)
  2017-03-09  1:34 ` [PATCH v3.1 6/7] btrfs: Cleanup num_tolerated_disk_barrier_failures Qu Wenruo
@ 2017-03-09  1:34 ` Qu Wenruo
  2017-06-26 18:59 ` [PATCH v3.1 0/7] Chunk level degradable check David Sterba
  7 siblings, 0 replies; 20+ messages in thread
From: Qu Wenruo @ 2017-03-09  1:34 UTC (permalink / raw)
  To: linux-btrfs, anand.jain

For a missing device, btrfs just refuses to mount, with an almost
meaningless kernel message like:

 BTRFS info (device vdb6): disk space caching is enabled
 BTRFS info (device vdb6): has skinny extents
 BTRFS error (device vdb6): failed to read the system array: -5
 BTRFS error (device vdb6): open_ctree failed

This patch adds an extra missing-device message, so the output becomes:

 BTRFS info (device vdb6): disk space caching is enabled
 BTRFS info (device vdb6): has skinny extents
 BTRFS warning (device vdb6): devid 2 uuid 80470722-cad2-4b90-b7c3-fee294552f1b is missing
 BTRFS error (device vdb6): failed to read the system array: -5
 BTRFS error (device vdb6): open_ctree failed

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Tested-by: Austin S. Hemmelgarn <ahferroin7@gmail.com>
Tested-by: Adam Borowski <kilobyte@angband.pl>
Tested-by: Dmitrii Tcvetkov <demfloro@demfloro.ru>
---
 fs/btrfs/volumes.c | 24 +++++++++++++++++-------
 fs/btrfs/volumes.h |  2 ++
 2 files changed, 19 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 217cb149c5ff..38e0f1c00b56 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6442,6 +6442,7 @@ static int read_one_chunk(struct btrfs_fs_info *fs_info, struct btrfs_key *key,
 		if (!map->stripes[i].dev &&
 		    !btrfs_test_opt(fs_info, DEGRADED)) {
 			free_extent_map(em);
+			btrfs_report_missing_device(fs_info, devid, uuid);
 			return -EIO;
 		}
 		if (!map->stripes[i].dev) {
@@ -6452,8 +6453,7 @@ static int read_one_chunk(struct btrfs_fs_info *fs_info, struct btrfs_key *key,
 				free_extent_map(em);
 				return -EIO;
 			}
-			btrfs_warn(fs_info, "devid %llu uuid %pU is missing",
-				   devid, uuid);
+			btrfs_report_missing_device(fs_info, devid, uuid);
 		}
 		map->stripes[i].dev->in_fs_metadata = 1;
 	}
@@ -6570,17 +6570,21 @@ static int read_one_dev(struct btrfs_fs_info *fs_info,
 
 	device = btrfs_find_device(fs_info, devid, dev_uuid, fs_uuid);
 	if (!device) {
-		if (!btrfs_test_opt(fs_info, DEGRADED))
+		if (!btrfs_test_opt(fs_info, DEGRADED)) {
+			btrfs_report_missing_device(fs_info, devid, dev_uuid);
 			return -EIO;
+		}
 
 		device = add_missing_dev(fs_devices, devid, dev_uuid);
 		if (!device)
 			return -ENOMEM;
-		btrfs_warn(fs_info, "devid %llu uuid %pU missing",
-				devid, dev_uuid);
+		btrfs_report_missing_device(fs_info, devid, dev_uuid);
 	} else {
-		if (!device->bdev && !btrfs_test_opt(fs_info, DEGRADED))
-			return -EIO;
+		if (!device->bdev) {
+			btrfs_report_missing_device(fs_info, devid, dev_uuid);
+			if (!btrfs_test_opt(fs_info, DEGRADED))
+				return -EIO;
+		}
 
 		if(!device->bdev && !device->missing) {
 			/*
@@ -6806,6 +6810,12 @@ static bool device_has_rw_degrade_error(struct extra_rw_degrade_errors *errors,
 	return ret;
 }
 
+void btrfs_report_missing_device(struct btrfs_fs_info *fs_info, u64 devid,
+				 u8 *uuid)
+{
+	btrfs_warn_rl(fs_info, "devid %llu uuid %pU is missing", devid, uuid);
+}
+
 /*
  * Check if all chunks in the fs is OK for read-write degraded mount
  *
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 67d7474e42a3..1f6ab55640da 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -573,4 +573,6 @@ void record_extra_rw_degrade_error(struct extra_rw_degrade_errors *errors,
 
 bool btrfs_check_rw_degradable(struct btrfs_fs_info *fs_info,
 			       struct extra_rw_degrade_errors *errors);
+void btrfs_report_missing_device(struct btrfs_fs_info *fs_info, u64 devid,
+				 u8 *uuid);
 #endif
-- 
2.12.0





* Re: [PATCH v3.1 1/7] btrfs: Introduce a function to check if all chunks are OK for degraded rw mount
  2017-03-13  7:29   ` Anand Jain
@ 2017-03-13  7:25     ` Qu Wenruo
  2017-05-01 10:21       ` Dmitrii Tcvetkov
  0 siblings, 1 reply; 20+ messages in thread
From: Qu Wenruo @ 2017-03-13  7:25 UTC (permalink / raw)
  To: Anand Jain, linux-btrfs



At 03/13/2017 03:29 PM, Anand Jain wrote:
>
>
> On 03/09/2017 09:34 AM, Qu Wenruo wrote:
>> Introduce a new function, btrfs_check_rw_degradable(), to check if all
>> chunks in btrfs is OK for degraded rw mount.
>>
>> It provides the new basis for accurate btrfs mount/remount and even
>> runtime degraded mount check other than old one-size-fit-all method.
>>
>> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
>> Signed-off-by: Anand Jain <anand.jain@oracle.com>
>> Tested-by: Austin S. Hemmelgarn <ahferroin7@gmail.com>
>> Tested-by: Adam Borowski <kilobyte@angband.pl>
>> Tested-by: Dmitrii Tcvetkov <demfloro@demfloro.ru>
>> ---
>>  fs/btrfs/volumes.c | 55
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>  fs/btrfs/volumes.h |  1 +
>>  2 files changed, 56 insertions(+)
>>
>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>> index 73d56eef5e60..83613955e3c2 100644
>> --- a/fs/btrfs/volumes.c
>> +++ b/fs/btrfs/volumes.c
>> @@ -6765,6 +6765,61 @@ int btrfs_read_sys_array(struct btrfs_fs_info
>> *fs_info)
>>      return -EIO;
>>  }
>>
>> +/*
>> + * Check if all chunks in the fs is OK for read-write degraded mount
>> + *
>> + * Return true if the fs is OK to be mounted degraded read-write
>> + * Return false if the fs is not OK to be mounted degraded
>> + */
>> +bool btrfs_check_rw_degradable(struct btrfs_fs_info *fs_info)
>> +{
>> +    struct btrfs_mapping_tree *map_tree = &fs_info->mapping_tree;
>> +    struct extent_map *em;
>> +    u64 next_start = 0;
>> +    bool ret = true;
>> +
>> +    read_lock(&map_tree->map_tree.lock);
>> +    em = lookup_extent_mapping(&map_tree->map_tree, 0, (u64)-1);
>> +    read_unlock(&map_tree->map_tree.lock);
>> +    /* No chunk at all? Return false anyway */
>> +    if (!em) {
>> +        ret = false;
>> +        goto out;
>> +    }
>> +    while (em) {
>> +        struct map_lookup *map;
>> +        int missing = 0;
>> +        int max_tolerated;
>> +        int i;
>> +
>> +        map = (struct map_lookup *) em->bdev;
>
>
>    any idea why not   map = em->map_lookup;  here?


My fault, will update the patch.

Thanks,
Qu
>
> Thanks, Anand
>
>
>> +        max_tolerated =
>> +            btrfs_get_num_tolerated_disk_barrier_failures(
>> +                    map->type);
>> +        for (i = 0; i < map->num_stripes; i++) {
>> +            if (map->stripes[i].dev->missing)
>> +                missing++;
>> +        }
>> +        if (missing > max_tolerated) {
>> +            ret = false;
>> +            btrfs_warn(fs_info,
>> +    "chunk %llu missing %d devices, max tolerance is %d for writeble
>> mount",
>> +                   em->start, missing, max_tolerated);
>> +            free_extent_map(em);
>> +            goto out;
>> +        }
>> +        next_start = extent_map_end(em);
>> +        free_extent_map(em);
>> +
>> +        read_lock(&map_tree->map_tree.lock);
>> +        em = lookup_extent_mapping(&map_tree->map_tree, next_start,
>> +                       (u64)(-1) - next_start);
>> +        read_unlock(&map_tree->map_tree.lock);
>> +    }
>> +out:
>> +    return ret;
>> +}
>> +
>>  int btrfs_read_chunk_tree(struct btrfs_fs_info *fs_info)
>>  {
>>      struct btrfs_root *root = fs_info->chunk_root;
>> diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
>> index 59be81206dd7..db1b5ef479cf 100644
>> --- a/fs/btrfs/volumes.h
>> +++ b/fs/btrfs/volumes.h
>> @@ -538,4 +538,5 @@ struct list_head *btrfs_get_fs_uuids(void);
>>  void btrfs_set_fs_info_ptr(struct btrfs_fs_info *fs_info);
>>  void btrfs_reset_fs_info_ptr(struct btrfs_fs_info *fs_info);
>>
>> +bool btrfs_check_rw_degradable(struct btrfs_fs_info *fs_info);
>>  #endif
>>
>
>



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3.1 1/7] btrfs: Introduce a function to check if all chunks are OK for degraded rw mount
  2017-03-09  1:34 ` [PATCH v3.1 1/7] btrfs: Introduce a function to check if all chunks are OK for degraded rw mount Qu Wenruo
@ 2017-03-13  7:29   ` Anand Jain
  2017-03-13  7:25     ` Qu Wenruo
  0 siblings, 1 reply; 20+ messages in thread
From: Anand Jain @ 2017-03-13  7:29 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs



On 03/09/2017 09:34 AM, Qu Wenruo wrote:
> Introduce a new function, btrfs_check_rw_degradable(), to check if all
> chunks in btrfs are OK for degraded rw mount.
>
> It provides a new basis for accurate btrfs mount/remount and even
> runtime degraded mount checks, rather than the old one-size-fits-all method.
>
> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
> Signed-off-by: Anand Jain <anand.jain@oracle.com>
> Tested-by: Austin S. Hemmelgarn <ahferroin7@gmail.com>
> Tested-by: Adam Borowski <kilobyte@angband.pl>
> Tested-by: Dmitrii Tcvetkov <demfloro@demfloro.ru>
> ---
>  fs/btrfs/volumes.c | 55 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  fs/btrfs/volumes.h |  1 +
>  2 files changed, 56 insertions(+)
>
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 73d56eef5e60..83613955e3c2 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -6765,6 +6765,61 @@ int btrfs_read_sys_array(struct btrfs_fs_info *fs_info)
>  	return -EIO;
>  }
>
> +/*
> + * Check if all chunks in the fs is OK for read-write degraded mount
> + *
> + * Return true if the fs is OK to be mounted degraded read-write
> + * Return false if the fs is not OK to be mounted degraded
> + */
> +bool btrfs_check_rw_degradable(struct btrfs_fs_info *fs_info)
> +{
> +	struct btrfs_mapping_tree *map_tree = &fs_info->mapping_tree;
> +	struct extent_map *em;
> +	u64 next_start = 0;
> +	bool ret = true;
> +
> +	read_lock(&map_tree->map_tree.lock);
> +	em = lookup_extent_mapping(&map_tree->map_tree, 0, (u64)-1);
> +	read_unlock(&map_tree->map_tree.lock);
> +	/* No chunk at all? Return false anyway */
> +	if (!em) {
> +		ret = false;
> +		goto out;
> +	}
> +	while (em) {
> +		struct map_lookup *map;
> +		int missing = 0;
> +		int max_tolerated;
> +		int i;
> +
> +		map = (struct map_lookup *) em->bdev;


    any idea why not   map = em->map_lookup;  here?

Thanks, Anand


> +		max_tolerated =
> +			btrfs_get_num_tolerated_disk_barrier_failures(
> +					map->type);
> +		for (i = 0; i < map->num_stripes; i++) {
> +			if (map->stripes[i].dev->missing)
> +				missing++;
> +		}
> +		if (missing > max_tolerated) {
> +			ret = false;
> +			btrfs_warn(fs_info,
> +	"chunk %llu missing %d devices, max tolerance is %d for writeble mount",
> +				   em->start, missing, max_tolerated);
> +			free_extent_map(em);
> +			goto out;
> +		}
> +		next_start = extent_map_end(em);
> +		free_extent_map(em);
> +
> +		read_lock(&map_tree->map_tree.lock);
> +		em = lookup_extent_mapping(&map_tree->map_tree, next_start,
> +					   (u64)(-1) - next_start);
> +		read_unlock(&map_tree->map_tree.lock);
> +	}
> +out:
> +	return ret;
> +}
> +
>  int btrfs_read_chunk_tree(struct btrfs_fs_info *fs_info)
>  {
>  	struct btrfs_root *root = fs_info->chunk_root;
> diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
> index 59be81206dd7..db1b5ef479cf 100644
> --- a/fs/btrfs/volumes.h
> +++ b/fs/btrfs/volumes.h
> @@ -538,4 +538,5 @@ struct list_head *btrfs_get_fs_uuids(void);
>  void btrfs_set_fs_info_ptr(struct btrfs_fs_info *fs_info);
>  void btrfs_reset_fs_info_ptr(struct btrfs_fs_info *fs_info);
>
> +bool btrfs_check_rw_degradable(struct btrfs_fs_info *fs_info);
>  #endif
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3.1 5/7] btrfs: Allow barrier_all_devices to do chunk level device check
  2017-03-09  1:34 ` [PATCH v3.1 5/7] btrfs: Allow barrier_all_devices to do chunk level device check Qu Wenruo
@ 2017-03-13  8:00   ` Anand Jain
  0 siblings, 0 replies; 20+ messages in thread
From: Anand Jain @ 2017-03-13  8:00 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs


Qu,

patch 4/4 added a cleanup for barrier_all_devices() and introduced
a new function, check_barrier_error(), which will make it simpler to
integrate the per-chunk device check:

  [PATCH 4/4] btrfs: cleanup barrier_all_devices() to check dev stat flush error

Thanks, Anand



On 03/09/2017 09:34 AM, Qu Wenruo wrote:
> The last user of num_tolerated_disk_barrier_failures is
> barrier_all_devices().
> But it's can be easily changed to new per-chunk degradable check
> framework.
>
> Now btrfs_device will have two extra members, representing send/wait
> error, set at write_dev_flush() time.
> With these 2 new members, btrfs_check_rw_degradable() can check if the
> fs is still OK when the fs is committed to disk.
>
> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
> Tested-by: Austin S. Hemmelgarn <ahferroin7@gmail.com>
> Tested-by: Adam Borowski <kilobyte@angband.pl>
> Tested-by: Dmitrii Tcvetkov <demfloro@demfloro.ru>
> ---
>  fs/btrfs/disk-io.c | 21 +++++++++++++--------
>  1 file changed, 13 insertions(+), 8 deletions(-)
>
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 658b8fab1d39..549045a3e15f 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -3570,17 +3570,20 @@ static int barrier_all_devices(struct btrfs_fs_info *info)
>  {
>  	struct list_head *head;
>  	struct btrfs_device *dev;
> -	int errors_send = 0;
> -	int errors_wait = 0;
> +	struct extra_rw_degrade_errors *errors;
>  	int ret;
>
> +	errors = alloc_extra_rw_degrade_errors(info->fs_devices->num_devices);
> +	if (!errors)
> +		return -ENOMEM;
> +
>  	/* send down all the barriers */
>  	head = &info->fs_devices->devices;
>  	list_for_each_entry_rcu(dev, head, dev_list) {
>  		if (dev->missing)
>  			continue;
>  		if (!dev->bdev) {
> -			errors_send++;
> +			record_extra_rw_degrade_error(errors, dev->devid);
>  			continue;
>  		}
>  		if (!dev->in_fs_metadata || !dev->writeable)
> @@ -3588,7 +3591,7 @@ static int barrier_all_devices(struct btrfs_fs_info *info)
>
>  		ret = write_dev_flush(dev, 0);
>  		if (ret)
> -			errors_send++;
> +			record_extra_rw_degrade_error(errors, dev->devid);
>  	}
>
>  	/* wait for all the barriers */
> @@ -3596,7 +3599,7 @@ static int barrier_all_devices(struct btrfs_fs_info *info)
>  		if (dev->missing)
>  			continue;
>  		if (!dev->bdev) {
> -			errors_wait++;
> +			record_extra_rw_degrade_error(errors, dev->devid);
>  			continue;
>  		}
>  		if (!dev->in_fs_metadata || !dev->writeable)
> @@ -3604,11 +3607,13 @@ static int barrier_all_devices(struct btrfs_fs_info *info)
>
>  		ret = write_dev_flush(dev, 1);
>  		if (ret)
> -			errors_wait++;
> +			record_extra_rw_degrade_error(errors, dev->devid);
>  	}
> -	if (errors_send > info->num_tolerated_disk_barrier_failures ||
> -	    errors_wait > info->num_tolerated_disk_barrier_failures)
> +	if (!btrfs_check_rw_degradable(info, errors)) {
> +		kfree(errors);
>  		return -EIO;
> +	}
> +	kfree(errors);
>  	return 0;
>  }
>
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3.1 1/7] btrfs: Introduce a function to check if all chunks are OK for degraded rw mount
  2017-03-13  7:25     ` Qu Wenruo
@ 2017-05-01 10:21       ` Dmitrii Tcvetkov
  2017-05-02  0:20         ` Qu Wenruo
  0 siblings, 1 reply; 20+ messages in thread
From: Dmitrii Tcvetkov @ 2017-05-01 10:21 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs

> >> +bool btrfs_check_rw_degradable(struct btrfs_fs_info *fs_info)
> >> +{
> >> +    struct btrfs_mapping_tree *map_tree = &fs_info->mapping_tree;
> >> +    struct extent_map *em;
> >> +    u64 next_start = 0;
> >> +    bool ret = true;
> >> +
> >> +    read_lock(&map_tree->map_tree.lock);
> >> +    em = lookup_extent_mapping(&map_tree->map_tree, 0, (u64)-1);
> >> +    read_unlock(&map_tree->map_tree.lock);
> >> +    /* No chunk at all? Return false anyway */
> >> +    if (!em) {
> >> +        ret = false;
> >> +        goto out;
> >> +    }
> >> +    while (em) {
> >> +        struct map_lookup *map;
> >> +        int missing = 0;
> >> +        int max_tolerated;
> >> +        int i;
> >> +
> >> +        map = (struct map_lookup *) em->bdev;  
> >
> >
> >    any idea why not   map = em->map_lookup;  here?  
> 
> 
> My fault, will update the patch.
> 
> Thanks,
> Qu

Sorry to bother, but it looks like this patchset got forgotten.
It still applies to 4.11, but I'm afraid it won't after the 4.12 merge
window. Any update on it?

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3.1 1/7] btrfs: Introduce a function to check if all chunks are OK for degraded rw mount
  2017-05-01 10:21       ` Dmitrii Tcvetkov
@ 2017-05-02  0:20         ` Qu Wenruo
  2017-05-02  2:28           ` Anand Jain
  0 siblings, 1 reply; 20+ messages in thread
From: Qu Wenruo @ 2017-05-02  0:20 UTC (permalink / raw)
  To: Dmitrii Tcvetkov, linux-btrfs, Anand Jain



At 05/01/2017 06:21 PM, Dmitrii Tcvetkov wrote:
>>>> +bool btrfs_check_rw_degradable(struct btrfs_fs_info *fs_info)
>>>> +{
>>>> +    struct btrfs_mapping_tree *map_tree = &fs_info->mapping_tree;
>>>> +    struct extent_map *em;
>>>> +    u64 next_start = 0;
>>>> +    bool ret = true;
>>>> +
>>>> +    read_lock(&map_tree->map_tree.lock);
>>>> +    em = lookup_extent_mapping(&map_tree->map_tree, 0, (u64)-1);
>>>> +    read_unlock(&map_tree->map_tree.lock);
>>>> +    /* No chunk at all? Return false anyway */
>>>> +    if (!em) {
>>>> +        ret = false;
>>>> +        goto out;
>>>> +    }
>>>> +    while (em) {
>>>> +        struct map_lookup *map;
>>>> +        int missing = 0;
>>>> +        int max_tolerated;
>>>> +        int i;
>>>> +
>>>> +        map = (struct map_lookup *) em->bdev;
>>>
>>>
>>>     any idea why not   map = em->map_lookup;  here?
>>
>>
>> My fault, will update the patch.
>>
>> Thanks,
>> Qu
> 
> Sorry to bother, but looks like this patchset suddenly got forgotten.
> It still applies to 4.11 but I'm afraid it won't after 4.12 merge
> window. Any update on it?

Just waiting for the flush error rework from Anand Jain.

(Well, I still remember the original patchset had the same problem.)

Maybe Anand Jain has some idea on this.

Thanks,
Qu



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3.1 1/7] btrfs: Introduce a function to check if all chunks are OK for degraded rw mount
  2017-05-02  0:20         ` Qu Wenruo
@ 2017-05-02  2:28           ` Anand Jain
  0 siblings, 0 replies; 20+ messages in thread
From: Anand Jain @ 2017-05-02  2:28 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Dmitrii Tcvetkov, linux-btrfs



On 05/02/2017 08:20 AM, Qu Wenruo wrote:
>
>
> At 05/01/2017 06:21 PM, Dmitrii Tcvetkov wrote:
>>>>> +bool btrfs_check_rw_degradable(struct btrfs_fs_info *fs_info)
>>>>> +{
>>>>> +    struct btrfs_mapping_tree *map_tree = &fs_info->mapping_tree;
>>>>> +    struct extent_map *em;
>>>>> +    u64 next_start = 0;
>>>>> +    bool ret = true;
>>>>> +
>>>>> +    read_lock(&map_tree->map_tree.lock);
>>>>> +    em = lookup_extent_mapping(&map_tree->map_tree, 0, (u64)-1);
>>>>> +    read_unlock(&map_tree->map_tree.lock);
>>>>> +    /* No chunk at all? Return false anyway */
>>>>> +    if (!em) {
>>>>> +        ret = false;
>>>>> +        goto out;
>>>>> +    }
>>>>> +    while (em) {
>>>>> +        struct map_lookup *map;
>>>>> +        int missing = 0;
>>>>> +        int max_tolerated;
>>>>> +        int i;
>>>>> +
>>>>> +        map = (struct map_lookup *) em->bdev;
>>>>
>>>>
>>>>     any idea why not   map = em->map_lookup;  here?
>>>
>>>
>>> My fault, will update the patch.
>>>
>>> Thanks,
>>> Qu
>>
>> Sorry to bother, but looks like this patchset suddenly got forgotten.
>> It still applies to 4.11 but I'm afraid it won't after 4.12 merge
>> window. Any update on it?
>
> Just waiting for the flush error rework from Anand Jain.


   There were quite a number of attempts to get the btrfs dev flush
   handling correct. David reviewed the previous ones, and the current,
   probably final, version is titled [1], which is waiting for David.
     [1]
     [PATCH] btrfs: add framework to handle device flush error as a volume

Thanks, Anand


> (Well, I still remember the original patchset has the same thing problem)
>
> Maybe Anand Jain has some idea on this.
>
> Thanks,
> Qu





^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3.1 0/7] Chunk level degradable check
  2017-03-09  1:34 [PATCH v3.1 0/7] Chunk level degradable check Qu Wenruo
                   ` (6 preceding siblings ...)
  2017-03-09  1:34 ` [PATCH v3.1 7/7] btrfs: Enhance missing device kernel message Qu Wenruo
@ 2017-06-26 18:59 ` David Sterba
  2017-06-27  1:05   ` Qu Wenruo
  7 siblings, 1 reply; 20+ messages in thread
From: David Sterba @ 2017-06-26 18:59 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs, anand.jain

On Thu, Mar 09, 2017 at 09:34:35AM +0800, Qu Wenruo wrote:
> Btrfs currently uses num_tolerated_disk_barrier_failures to do global
> check for tolerated missing device.
> 
> Although the one-size-fit-all solution is quite safe, it's too strict
> if data and metadata has different duplication level.
> 
> For example, if one use Single data and RAID1 metadata for 2 disks, it
> means any missing device will make the fs unable to be degraded
> mounted.
> 
> But in fact, some times all single chunks may be in the existing
> device and in that case, we should allow it to be rw degraded mounted.
> 
> Such case can be easily reproduced using the following script:
>  # mkfs.btrfs -f -m raid1 -d single /dev/sdb /dev/sdc
>  # wipefs -f /dev/sdc
>  # mount /dev/sdb -o degraded,rw
> 
> If using btrfs-debug-tree to check /dev/sdb, one should find that the
> data chunk is only in sdb, so in fact it should allow degraded mount.
> 
> This patchset will introduce a new per-chunk degradable check for
> btrfs, allow above case to succeed, and it's quite small anyway.
> 
> And enhance kernel error message for missing device, at least kernel
> can know what's making mount failed, other than meaningless
> "failed to read system chunk/chunk tree -5".

I'd like to get this merged to 4.14. The flush bio changes are now done,
so the base code should be stable. I've read the previous iterations of
this patchset, the comments and user feedback. The usecase coverage
seems to be good and what users expect.

There are some bits in the implementation that I do not like, eg.
reintroducing memory allocation failure to the barrier check, but IIRC
no fundamental problems. Please refresh the patchset on top of current
code that's going to 4.13 (equivalent to the current for-next), I'll
review that and comment. One or more iterations might be needed, but
4.14 target is within reach.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3.1 0/7] Chunk level degradable check
  2017-06-26 18:59 ` [PATCH v3.1 0/7] Chunk level degradable check David Sterba
@ 2017-06-27  1:05   ` Qu Wenruo
  2017-06-27  1:59     ` Anand Jain
  0 siblings, 1 reply; 20+ messages in thread
From: Qu Wenruo @ 2017-06-27  1:05 UTC (permalink / raw)
  To: dsterba, linux-btrfs, anand.jain



At 06/27/2017 02:59 AM, David Sterba wrote:
> On Thu, Mar 09, 2017 at 09:34:35AM +0800, Qu Wenruo wrote:
>> Btrfs currently uses num_tolerated_disk_barrier_failures to do global
>> check for tolerated missing device.
>>
>> Although the one-size-fit-all solution is quite safe, it's too strict
>> if data and metadata has different duplication level.
>>
>> For example, if one use Single data and RAID1 metadata for 2 disks, it
>> means any missing device will make the fs unable to be degraded
>> mounted.
>>
>> But in fact, some times all single chunks may be in the existing
>> device and in that case, we should allow it to be rw degraded mounted.
>>
>> Such case can be easily reproduced using the following script:
>>   # mkfs.btrfs -f -m raid1 -d single /dev/sdb /dev/sdc
>>   # wipefs -f /dev/sdc
>>   # mount /dev/sdb -o degraded,rw
>>
>> If using btrfs-debug-tree to check /dev/sdb, one should find that the
>> data chunk is only in sdb, so in fact it should allow degraded mount.
>>
>> This patchset will introduce a new per-chunk degradable check for
>> btrfs, allow above case to succeed, and it's quite small anyway.
>>
>> And enhance kernel error message for missing device, at least kernel
>> can know what's making mount failed, other than meaningless
>> "failed to read system chunk/chunk tree -5".
> 
> I'd like to get this merged to 4.14. The flush bio changes are now done,
> so the base code should be stable. I've read the previous iterations of
> this patchset, the comments and user feedback. The usecase coverage
> seems to be good and what users expect.

Thank you for the kind reminder.

> 
> There are some bits in the implementation that I do not like, eg.
> reintroducing memory allocation failure to the barrier check, but IIRC
> no fundamental problems. Please refresh the patchset on top of current
>> code that's going to 4.13 (equivalent to the current for-next), I'll
> review that and comment. One or more iterations might be needed, but
> 4.14 target is within reach.

I'll check the new flush infrastructure and figure out if we can avoid 
re-introducing such memory allocation failure with the new infrastructure.

Thanks,
Qu



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3.1 0/7] Chunk level degradable check
  2017-06-27  1:05   ` Qu Wenruo
@ 2017-06-27  1:59     ` Anand Jain
  2017-06-27  2:49       ` Qu Wenruo
  0 siblings, 1 reply; 20+ messages in thread
From: Anand Jain @ 2017-06-27  1:59 UTC (permalink / raw)
  To: Qu Wenruo, dsterba, linux-btrfs



On 06/27/2017 09:05 AM, Qu Wenruo wrote:
> 
> 
> At 06/27/2017 02:59 AM, David Sterba wrote:
>> On Thu, Mar 09, 2017 at 09:34:35AM +0800, Qu Wenruo wrote:
>>> Btrfs currently uses num_tolerated_disk_barrier_failures to do global
>>> check for tolerated missing device.
>>>
>>> Although the one-size-fit-all solution is quite safe, it's too strict
>>> if data and metadata has different duplication level.
>>>
>>> For example, if one use Single data and RAID1 metadata for 2 disks, it
>>> means any missing device will make the fs unable to be degraded
>>> mounted.
>>>
>>> But in fact, some times all single chunks may be in the existing
>>> device and in that case, we should allow it to be rw degraded mounted.
>>>
>>> Such case can be easily reproduced using the following script:
>>>   # mkfs.btrfs -f -m raid1 -d single /dev/sdb /dev/sdc
>>>   # wipefs -f /dev/sdc
>>>   # mount /dev/sdb -o degraded,rw
>>>
>>> If using btrfs-debug-tree to check /dev/sdb, one should find that the
>>> data chunk is only in sdb, so in fact it should allow degraded mount.
>>>
>>> This patchset will introduce a new per-chunk degradable check for
>>> btrfs, allow above case to succeed, and it's quite small anyway.
>>>
>>> And enhance kernel error message for missing device, at least kernel
>>> can know what's making mount failed, other than meaningless
>>> "failed to read system chunk/chunk tree -5".
>>
>> I'd like to get this merged to 4.14. The flush bio changes are now done,
>> so the base code should be stable. I've read the previous iterations of
>> this patchset, the comments and user feedback. The usecase coverage
>> seems to be good and what users expect.
> 
> Thank you for the kind reminder.
> 
>>
>> There are some bits in the implementation that I do not like, eg.
>> reintroducing memory allocation failure to the barrier check, but IIRC
>> no fundamental problems. Please refresh the patchset on top of current
>>> code that's going to 4.13 (equivalent to the current for-next), I'll
>> review that and comment. One or more iterations might be needed, but
>> 4.14 target is within reach.
> 
> I'll check the new flush infrastructure and figure out if we can avoid 
> re-introducing such memory allocation failure with the new infrastructure.

  As this is going to address the RAID1 availability issue, it's better
  to mark this for stable, IMO. But I wonder if there is any objection?

Thanks, -Anand

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3.1 0/7] Chunk level degradable check
  2017-06-27  1:59     ` Anand Jain
@ 2017-06-27  2:49       ` Qu Wenruo
  2017-06-27 11:20         ` Austin S. Hemmelgarn
  0 siblings, 1 reply; 20+ messages in thread
From: Qu Wenruo @ 2017-06-27  2:49 UTC (permalink / raw)
  To: Anand Jain, dsterba, linux-btrfs



At 06/27/2017 09:59 AM, Anand Jain wrote:
> 
> 
> On 06/27/2017 09:05 AM, Qu Wenruo wrote:
>>
>>
>> At 06/27/2017 02:59 AM, David Sterba wrote:
>>> On Thu, Mar 09, 2017 at 09:34:35AM +0800, Qu Wenruo wrote:
>>>> Btrfs currently uses num_tolerated_disk_barrier_failures to do global
>>>> check for tolerated missing device.
>>>>
>>>> Although the one-size-fit-all solution is quite safe, it's too strict
>>>> if data and metadata has different duplication level.
>>>>
>>>> For example, if one use Single data and RAID1 metadata for 2 disks, it
>>>> means any missing device will make the fs unable to be degraded
>>>> mounted.
>>>>
>>>> But in fact, some times all single chunks may be in the existing
>>>> device and in that case, we should allow it to be rw degraded mounted.
>>>>
>>>> Such case can be easily reproduced using the following script:
>>>>   # mkfs.btrfs -f -m raid1 -d single /dev/sdb /dev/sdc
>>>>   # wipefs -f /dev/sdc
>>>>   # mount /dev/sdb -o degraded,rw
>>>>
>>>> If using btrfs-debug-tree to check /dev/sdb, one should find that the
>>>> data chunk is only in sdb, so in fact it should allow degraded mount.
>>>>
>>>> This patchset will introduce a new per-chunk degradable check for
>>>> btrfs, allow above case to succeed, and it's quite small anyway.
>>>>
>>>> And enhance kernel error message for missing device, at least kernel
>>>> can know what's making mount failed, other than meaningless
>>>> "failed to read system chunk/chunk tree -5".
>>>
>>> I'd like to get this merged to 4.14. The flush bio changes are now done,
>>> so the base code should be stable. I've read the previous iterations of
>>> this patchset, the comments and user feedback. The usecase coverage
>>> seems to be good and what users expect.
>>
>> Thank you for the kind reminder.
>>
>>>
>>> There are some bits in the implementation that I do not like, eg.
>>> reintroducing memory allocation failure to the barrier check, but IIRC
>>> no fundamental problems. Please refresh the patchset on top of current
>>>> code that's going to 4.13 (equivalent to the current for-next), I'll
>>> review that and comment. One or more iterations might be needed, but
>>> 4.14 target is within reach.
>>
>> I'll check the new flush infrastructure and figure out if we can avoid 
>> re-introducing such memory allocation failure with the new 
>> infrastructure.
> 
>   As this is going to address the RAID1 availability issue, it's better
>   to mark this for stable, IMO. But I wonder if there is any objection?

Not sure if stable maintainers (even normal subsystem maintainers) will 
like it, as it's quite a large modification, including dev flush 
infrastructure.

But since v4.14 will be an LTS kernel, we don't need to rush too much to 
push this feature to stable, as long as the feature is planned to reach 
v4.14.

Thanks,
Qu

> 
> Thanks, -Anand
> 
> 



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3.1 0/7] Chunk level degradable check
  2017-06-27  2:49       ` Qu Wenruo
@ 2017-06-27 11:20         ` Austin S. Hemmelgarn
  2017-06-27 12:20           ` David Sterba
  0 siblings, 1 reply; 20+ messages in thread
From: Austin S. Hemmelgarn @ 2017-06-27 11:20 UTC (permalink / raw)
  To: Qu Wenruo, Anand Jain, dsterba, linux-btrfs

On 2017-06-26 22:49, Qu Wenruo wrote:
> 
> 
> At 06/27/2017 09:59 AM, Anand Jain wrote:
>>
>>
>> On 06/27/2017 09:05 AM, Qu Wenruo wrote:
>>>
>>>
>>> At 06/27/2017 02:59 AM, David Sterba wrote:
>>>> On Thu, Mar 09, 2017 at 09:34:35AM +0800, Qu Wenruo wrote:
>>>>> Btrfs currently uses num_tolerated_disk_barrier_failures to do global
>>>>> check for tolerated missing device.
>>>>>
>>>>> Although the one-size-fit-all solution is quite safe, it's too strict
>>>>> if data and metadata has different duplication level.
>>>>>
>>>>> For example, if one use Single data and RAID1 metadata for 2 disks, it
>>>>> means any missing device will make the fs unable to be degraded
>>>>> mounted.
>>>>>
>>>>> But in fact, some times all single chunks may be in the existing
>>>>> device and in that case, we should allow it to be rw degraded mounted.
>>>>>
>>>>> Such case can be easily reproduced using the following script:
>>>>>   # mkfs.btrfs -f -m raid1 -d single /dev/sdb /dev/sdc
>>>>>   # wipefs -f /dev/sdc
>>>>>   # mount /dev/sdb -o degraded,rw
>>>>>
>>>>> If using btrfs-debug-tree to check /dev/sdb, one should find that the
>>>>> data chunk is only in sdb, so in fact it should allow degraded mount.
>>>>>
>>>>> This patchset will introduce a new per-chunk degradable check for
>>>>> btrfs, allow above case to succeed, and it's quite small anyway.
>>>>>
>>>>> And enhance kernel error message for missing device, at least kernel
>>>>> can know what's making mount failed, other than meaningless
>>>>> "failed to read system chunk/chunk tree -5".
>>>>
>>>> I'd like to get this merged to 4.14. The flush bio changes are now 
>>>> done,
>>>> so the base code should be stable. I've read the previous iterations of
>>>> this patchset, the comments and user feedback. The usecase coverage
>>>> seems to be good and what users expect.
>>>
>>> Thank you for the kind reminder.
>>>
>>>>
>>>> There are some bits in the implementation that I do not like, eg.
>>>> reintroducing memory allocation failure to the barrier check, but IIRC
>>>> no fundamental problems. Please refresh the patchset on top of current
>>>> code that's going to 4.13 (equivalent to the current for-next), I'll
>>>> review that and comment. One or more iterations might be needed, but
>>>> 4.14 target is within reach.
>>>
>>> I'll check the new flush infrastructure and figure out if we can 
>>> avoid re-introducing such memory allocation failure with the new 
>>> infrastructure.
>>
>>   As this is going to address the RAID1 availability issue, it's better
>>   to mark this for stable, IMO. But I wonder if there is any objection?
> 
> Not sure if stable maintainers (even normal subsystem maintainers) will 
> like it, as it's quite a large modification, including dev flush 
> infrastructure.
> 
> But since v4.14 will be an LTS kernel, we don't need to rush too much to 
> push this feature to stable, as long as the feature is planned to reach 
> v4.14.
I would personally tend to disagree.  It fixes a pretty severe data loss 
bug that arises from what most people for some reason think is 
acceptable normal usage.  I can understand not going too far back, but I 
do think it should probably be marked for at least 4.9.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3.1 0/7] Chunk level degradable check
  2017-06-27 11:20         ` Austin S. Hemmelgarn
@ 2017-06-27 12:20           ` David Sterba
  0 siblings, 0 replies; 20+ messages in thread
From: David Sterba @ 2017-06-27 12:20 UTC (permalink / raw)
  To: Austin S. Hemmelgarn; +Cc: Qu Wenruo, Anand Jain, linux-btrfs

On Tue, Jun 27, 2017 at 07:20:06AM -0400, Austin S. Hemmelgarn wrote:
> On 2017-06-26 22:49, Qu Wenruo wrote:
> > 
> > 
> > At 06/27/2017 09:59 AM, Anand Jain wrote:
> >>
> >>
> >> On 06/27/2017 09:05 AM, Qu Wenruo wrote:
> >>>
> >>>
> >>> At 06/27/2017 02:59 AM, David Sterba wrote:
> >>>> On Thu, Mar 09, 2017 at 09:34:35AM +0800, Qu Wenruo wrote:
> >>>>> Btrfs currently uses num_tolerated_disk_barrier_failures to do global
> >>>>> check for tolerated missing device.
> >>>>>
> >>>>> Although the one-size-fit-all solution is quite safe, it's too strict
> >>>>> if data and metadata has different duplication level.
> >>>>>
> >>>>> For example, if one use Single data and RAID1 metadata for 2 disks, it
> >>>>> means any missing device will make the fs unable to be degraded
> >>>>> mounted.
> >>>>>
> >>>>> But in fact, some times all single chunks may be in the existing
> >>>>> device and in that case, we should allow it to be rw degraded mounted.
> >>>>>
> >>>>> Such case can be easily reproduced using the following script:
> >>>>>   # mkfs.btrfs -f -m raid1 -d single /dev/sdb /dev/sdc
> >>>>>   # wipefs -f /dev/sdc
> >>>>>   # mount /dev/sdb -o degraded,rw
> >>>>>
> >>>>> If using btrfs-debug-tree to check /dev/sdb, one should find that the
> >>>>> data chunk is only in sdb, so in fact it should allow degraded mount.
> >>>>>
> >>>>> This patchset will introduce a new per-chunk degradable check for
> >>>>> btrfs, allow above case to succeed, and it's quite small anyway.
> >>>>>
> >>>>> And enhance kernel error message for missing device, at least kernel
> >>>>> can know what's making mount failed, other than meaningless
> >>>>> "failed to read system chunk/chunk tree -5".
> >>>>
> >>>> I'd like to get this merged to 4.14. The flush bio changes are now 
> >>>> done,
> >>>> so the base code should be stable. I've read the previous iterations of
> >>>> this patchset, the comments and user feedback. The usecase coverage
> >>>> seems to be good and what users expect.
> >>>
> >>> Thank you for the kind reminder.
> >>>
> >>>>
> >>>> There are some bits in the implementation that I do not like, eg.
> >>>> reintroducing memory allocation failure to the barrier check, but IIRC
> >>>> no fundamental problems. Please refresh the patchset on top of current
> >>>> code that's going to 4.13 (equivalent to the current for-next), I'll
> >>>> review that and comment. One or more iterations might be needed, but
> >>>> 4.14 target is within reach.
> >>>
> >>> I'll check the new flush infrastructure and figure out if we can 
> >>> avoid re-introducing such memory allocation failure with the new 
> >>> infrastructure.
> >>
> >>   As this is going to address the RAID1 availability issue, it's better
> >>   to mark this for stable, IMO. But I wonder if there is any objection?
> > 
> > Not sure if stable maintainers (or even normal subsystem maintainers) will
> > like it, as it's quite a large modification, including the dev flush
> > infrastructure.
> > 
> > But since v4.14 will be an LTS kernel, we don't need to rush too much to 
> > push this feature to stable, as long as the feature is planned to reach 
> > v4.14.
> I would personally tend to disagree.  It fixes a pretty severe data loss 
> bug that arises from what most people for some reason think is 
> acceptable normal usage.  I can understand not going too far back, but I 
> do think it should probably be marked for at least 4.9.

Depends on the amount of code, but we can make prep patches tailored
for 4.9, leaving out the cleanups and porting just the changes required
for this patchset.



Thread overview: 20+ messages
2017-03-09  1:34 [PATCH v3.1 0/7] Chunk level degradable check Qu Wenruo
2017-03-09  1:34 ` [PATCH v3.1 1/7] btrfs: Introduce a function to check if all chunks a OK for degraded rw mount Qu Wenruo
2017-03-13  7:29   ` Anand Jain
2017-03-13  7:25     ` Qu Wenruo
2017-05-01 10:21       ` Dmitrii Tcvetkov
2017-05-02  0:20         ` Qu Wenruo
2017-05-02  2:28           ` Anand Jain
2017-03-09  1:34 ` [PATCH v3.1 2/7] btrfs: Do chunk level rw degrade check at mount time Qu Wenruo
2017-03-09  1:34 ` [PATCH v3.1 3/7] btrfs: Do chunk level degradation check for remount Qu Wenruo
2017-03-09  1:34 ` [PATCH v3.1 4/7] btrfs: Introduce extra_rw_degrade_errors parameter for btrfs_check_rw_degradable Qu Wenruo
2017-03-09  1:34 ` [PATCH v3.1 5/7] btrfs: Allow barrier_all_devices to do chunk level device check Qu Wenruo
2017-03-13  8:00   ` Anand Jain
2017-03-09  1:34 ` [PATCH v3.1 6/7] btrfs: Cleanup num_tolerated_disk_barrier_failures Qu Wenruo
2017-03-09  1:34 ` [PATCH v3.1 7/7] btrfs: Enhance missing device kernel message Qu Wenruo
2017-06-26 18:59 ` [PATCH v3.1 0/7] Chunk level degradable check David Sterba
2017-06-27  1:05   ` Qu Wenruo
2017-06-27  1:59     ` Anand Jain
2017-06-27  2:49       ` Qu Wenruo
2017-06-27 11:20         ` Austin S. Hemmelgarn
2017-06-27 12:20           ` David Sterba
