* [PATCH 00/15] btrfs: Hot spare and Auto replace
@ 2015-11-09 10:56 Anand Jain
  2015-11-09 10:56 ` [PATCH 01/15] btrfs: Introduce a new function to check if all chunks a OK for degraded mount Anand Jain
                   ` (19 more replies)
  0 siblings, 20 replies; 43+ messages in thread
From: Anand Jain @ 2015-11-09 10:56 UTC (permalink / raw)
  To: linux-btrfs

This set of patches provides btrfs hot spare and auto replace support,
for your review and comments.

First, here are simple example steps to configure it:

Add a spare device:
    btrfs spare add /dev/sde -f

Or, if a spare device was already added earlier, just run

    btrfs dev scan [/dev/sde]

This registers the spare device with the kernel.

    btrfs fi show
    Label: none  uuid: 52f170c1-725c-457d-8cfd-d57090460091
	Total devices 2 FS bytes used 112.00KiB
	devid    1 size 2.00GiB used 417.50MiB path /dev/sdc
	devid    2 size 2.00GiB used 417.50MiB path /dev/sdd

    Global spare
	device size 3.00GiB path /dev/sde

That's it.

Auto replace:
 Replace happens automatically: when a write or flush fails, the
 device is marked as failed, which stops any further IO attempt to
 that device. In the next commit thread cycle, auto replace picks
 the spare device (/dev/sde in the above example) to replace the
 failed device, and so the btrfs volume is back to a healthy state.


It's a btrfs global spare:
 As of now only a global hot spare is supported, that is, the hot
 spare(s) serve all the btrfs filesystems in the system.

No spare when device failed:
 Btrfs scans for a spare device at the rate of transaction commit
 and triggers the auto replace whenever a spare device is added.
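
For illustration only, the behaviour described under "Auto replace" and
"No spare when device failed" can be sketched as a small user-space
model. This is not the kernel code from this series: struct model_dev,
struct spare_pool, spare_get() and commit_cycle_auto_replace() are
hypothetical names, and the real replace copies data, which the model
skips.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified device record; not struct btrfs_device. */
struct model_dev {
	const char *path;
	bool failed;	/* set after a write or flush error */
};

/* Hypothetical global spare pool shared by every filesystem. */
struct spare_pool {
	const char **paths;
	size_t count;
};

/* Take one spare from the pool, or return NULL if the pool is empty. */
static const char *spare_get(struct spare_pool *pool)
{
	if (pool->count == 0)
		return NULL;
	return pool->paths[--pool->count];
}

/*
 * Model of the work done once per transaction commit cycle: replace
 * each failed device with a spare. Returns the number of devices
 * replaced. Devices still marked failed are retried on the next call.
 */
static int commit_cycle_auto_replace(struct model_dev *devs, size_t ndevs,
				     struct spare_pool *pool)
{
	int replaced = 0;

	for (size_t i = 0; i < ndevs; i++) {
		const char *spare;

		if (!devs[i].failed)
			continue;
		spare = spare_get(pool);
		if (!spare)
			break;		/* no spare yet, try next cycle */
		devs[i].path = spare;	/* the spare takes over the slot */
		devs[i].failed = false;
		replaced++;
	}
	return replaced;
}
```

Calling commit_cycle_auto_replace() with an empty pool leaves the failed
device untouched; once a spare is added to the pool, the next call picks
it up, which mirrors the "retry at every commit" behaviour above.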

Priority:
 Future work could add a chronological order in which to pick a
 spare and a failed device.


Patches:

Kernel:
First, it needs Qu's per-chunk missing device patchset, which is
included in this set, plus a light optimization (patch 5/15) that
was required as part of this enhancement.

Next, patches 7,8/15 add support for managing the transition of
devices from online (no state) to the offline or failed state
dynamically, on top of static device states like the current
"missing" state.
Patch 9/15 fixes a bug: we should block an incompatible feature at
the device scan/add level instead of, or in addition to, the mount
level, because there is no need to bring a device into the device
list if it is incompatible.

Next, patches 10,11,12,13/15 add support for the spare device. For
details on how to add a spare device, see the example steps at the
top of this mail. A kernel without spare support keeps the spare
device away, and a kernel with spare support refuses to mount it.
Further, this patch set provides helper functions to pick a spare
device and to release a spare device back to the spare device pool.

Patch 14/15 provides the functions for auto replace. They are mainly
taken from the existing replace code, and in the long run I see an
opportunity to merge this code with the replace code that is
triggered from user space.

Last, 15/15 uses all these facilities: it picks a failed device and
triggers an auto replace in a kthread (casualty_kthread()).


Progs:
Would need 4 patches as listed below.


Known Bug:

As of now I see the stale kmem cache warning below during module
unload, which I am investigating.
------
BUG btrfs_path (Not tainted): Objects remaining in btrfs_path on kmem_cache_close()
------

Anand Jain (10):
  btrfs: optimize btrfs_check_degradable() for calls outside of barrier
  btrfs: introduce device dynamic state transition to offline or failed
  btrfs: check device for critical errors and mark failed
  btrfs: block incompatible optional features at scan
  btrfs: introduce BTRFS_FEATURE_INCOMPAT_SPARE_DEV
  btrfs: add check not to mount a spare device
  btrfs: support btrfs dev scan for spare device
  btrfs: provide framework to get and put a spare device
  btrfs: introduce helper functions to perform hot replace
  btrfs: check for failed device and hot replace

Qu Wenruo (5):
  btrfs: Introduce a new function to check if all chunks a OK for
    degraded mount
  btrfs: Do per-chunk check for mount time check
  btrfs: Do per-chunk degraded check for remount
  btrfs: Allow barrier_all_devices to do per-chunk device check
  btrfs: Cleanup num_tolerated_disk_barrier_failures

 fs/btrfs/ctree.h       |   7 +-
 fs/btrfs/dev-replace.c | 116 ++++++++++++++++++++
 fs/btrfs/dev-replace.h |   1 +
 fs/btrfs/disk-io.c     | 211 +++++++++++++++++++++++-------------
 fs/btrfs/disk-io.h     |   2 -
 fs/btrfs/super.c       |  20 +++-
 fs/btrfs/transaction.c |   3 +-
 fs/btrfs/volumes.c     | 283 ++++++++++++++++++++++++++++++++++++++++++++++---
 fs/btrfs/volumes.h     |  27 +++++
 9 files changed, 571 insertions(+), 99 deletions(-)

-- 
2.4.1



* [PATCH 01/15] btrfs: Introduce a new function to check if all chunks a OK for degraded mount
  2015-11-09 10:56 [PATCH 00/15] btrfs: Hot spare and Auto replace Anand Jain
@ 2015-11-09 10:56 ` Anand Jain
  2015-11-09 10:56 ` [PATCH 02/15] btrfs: Do per-chunk check for mount time check Anand Jain
                   ` (18 subsequent siblings)
  19 siblings, 0 replies; 43+ messages in thread
From: Anand Jain @ 2015-11-09 10:56 UTC (permalink / raw)
  To: linux-btrfs

From: Qu Wenruo <quwenruo@cn.fujitsu.com>

Introduce a new function, btrfs_check_degradable(), to judge whether all
chunks in a btrfs filesystem are OK for degraded mount.

It provides a new basis for accurate btrfs mount/remount and even
runtime degraded checks, replacing the old one-size-fits-all method.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 fs/btrfs/volumes.c | 63 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/volumes.h |  1 +
 2 files changed, 64 insertions(+)
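
As a rough user-space illustration of the return convention of
btrfs_check_degradable() (0, >0, <0), here is a minimal model. It is an
assumption-laden sketch, not the kernel function: struct model_dev,
struct model_chunk and check_degradable() are hypothetical stand-ins,
and the read-only shortcut and locking of the real function are left
out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <errno.h>

#define MAX_STRIPES 4

/* Hypothetical stand-ins for btrfs_device and map_lookup. */
struct model_dev {
	bool missing;
};

struct model_chunk {
	int max_tolerated;	/* e.g. 0 for single, 1 for RAID1 */
	int num_stripes;
	const struct model_dev *stripes[MAX_STRIPES];
};

/*
 * Returns 0 if no chunk is degraded, 1 if some chunk is degraded but
 * every chunk is within its own tolerance, and a negative errno if any
 * chunk has lost more devices than its profile tolerates.
 */
static int check_degradable(const struct model_chunk *chunks, size_t nchunks)
{
	int ret = 0;

	if (nchunks == 0)
		return -ENOENT;

	for (size_t c = 0; c < nchunks; c++) {
		int missing = 0;

		for (int i = 0; i < chunks[c].num_stripes; i++)
			if (chunks[c].stripes[i]->missing)
				missing++;

		if (missing > chunks[c].max_tolerated)
			return -EIO;
		if (missing)
			ret = 1;
	}
	return ret;
}
```

With a RAID1 chunk on two devices and a single chunk on the first one,
losing the second device yields 1 (degraded but usable), while losing
the first yields -EIO because the single chunk tolerates no loss.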

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index f1fb3df..cfbdf9a 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6805,3 +6805,66 @@ void btrfs_close_one_device(struct btrfs_device *device)
 
 	call_rcu(&device->rcu, free_device);
 }
+
+/*
+ * Check if all chunks in the fs are OK for degraded mount.
+ * For a >0 return value the caller must itself check whether the
+ * DEGRADED mount option is given.
+ *
+ * Return 0 if all chunks are OK.
+ * Return >0 if all chunks are degradable but not all OK.
+ * Return <0 if any chunk is not degradable or other bug.
+ */
+int btrfs_check_degradable(struct btrfs_fs_info *fs_info, unsigned flags)
+{
+	struct btrfs_mapping_tree *map_tree = &fs_info->mapping_tree;
+	struct extent_map *em;
+	u64 next_start = 0;
+	int ret = 0;
+
+	if (flags & MS_RDONLY)
+		return 0;
+
+	read_lock(&map_tree->map_tree.lock);
+	em = lookup_extent_mapping(&map_tree->map_tree, 0, (u64)(-1));
+	/* No chunk at all? Should be a huge bug */
+	if (!em) {
+		ret = -ENOENT;
+		goto out;
+	}
+
+	while (em) {
+		struct map_lookup *map;
+		int missing = 0;
+		int max_tolerated;
+		int i;
+
+		map = (struct map_lookup *) em->bdev;
+		max_tolerated =
+			btrfs_get_num_tolerated_disk_barrier_failures(
+					map->type);
+		for (i = 0; i < map->num_stripes; i++) {
+			if (map->stripes[i].dev->missing)
+				missing++;
+		}
+		if (missing > max_tolerated) {
+			ret = -EIO;
+			btrfs_warn(fs_info,
+				   "missing devices(%d) exceeds the limit(%d), writeable mount is not allowed",
+				   missing, max_tolerated);
+			goto out;
+		} else if (missing)
+			ret = 1;
+		next_start = extent_map_end(em);
+
+		/*
+		 * Always search range [next_start, (u64)-1) to find the next
+		 * chunk map
+		 */
+		em = lookup_extent_mapping(&map_tree->map_tree, next_start,
+					   (u64)(-1) - next_start);
+	}
+out:
+	read_unlock(&map_tree->map_tree.lock);
+	return ret;
+}
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 4150d9d..c875be9 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -552,5 +552,6 @@ struct list_head *btrfs_get_fs_uuids(void);
 void btrfs_set_fs_info_ptr(struct btrfs_fs_info *fs_info);
 void btrfs_reset_fs_info_ptr(struct btrfs_fs_info *fs_info);
 void btrfs_close_one_device(struct btrfs_device *device);
+int btrfs_check_degradable(struct btrfs_fs_info *fs_info, unsigned flags);
 
 #endif
-- 
2.4.1



* [PATCH 02/15] btrfs: Do per-chunk check for mount time check
  2015-11-09 10:56 [PATCH 00/15] btrfs: Hot spare and Auto replace Anand Jain
  2015-11-09 10:56 ` [PATCH 01/15] btrfs: Introduce a new function to check if all chunks a OK for degraded mount Anand Jain
@ 2015-11-09 10:56 ` Anand Jain
  2015-11-09 10:56 ` [PATCH 03/15] btrfs: Do per-chunk degraded check for remount Anand Jain
                   ` (17 subsequent siblings)
  19 siblings, 0 replies; 43+ messages in thread
From: Anand Jain @ 2015-11-09 10:56 UTC (permalink / raw)
  To: linux-btrfs

From: Qu Wenruo <quwenruo@cn.fujitsu.com>

Now use btrfs_check_degradable() to do the mount-time degraded check.

With this patch, we can now mount in the following case:
 # mkfs.btrfs -f -m raid1 -d single /dev/sdb /dev/sdc
 # wipefs -a /dev/sdc
 # mount /dev/sdb /mnt/btrfs -o degraded
 As the single data chunk is only on sdb, it's OK to mount as degraded,
 as missing one device is OK for RAID1.

But it still fails in the following case, as expected:
 # mkfs.btrfs -f -m raid1 -d single /dev/sdb /dev/sdc
 # wipefs -a /dev/sdb
 # mount /dev/sdc /mnt/btrfs -o degraded
 As the data chunk is only on sdb, it's not OK to mount it as degraded.

Reported-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Reported-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>

[Btrfs: use btrfs_error instead of btrfs_err during mount]
Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
 fs/btrfs/disk-io.c | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)
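
The mount-time decision made in the hunk below can be summarized with a
tiny sketch. mount_degraded_decision() is a hypothetical helper written
for illustration; in particular, passing the negative check result
through as the return value is an assumption of this model, not
something the patch states.

```c
#include <stdbool.h>
#include <errno.h>

/*
 * check_ret follows the btrfs_check_degradable() convention:
 * 0 = all chunks OK, >0 = degraded but tolerable, <0 = a chunk has
 * lost too many devices. Returns 0 if the mount may proceed, or a
 * negative errno.
 */
static int mount_degraded_decision(int check_ret, bool degraded_opt)
{
	if (check_ret < 0)
		return check_ret;	/* never mountable writable */
	if (check_ret > 0 && !degraded_opt)
		return -EACCES;		/* needs -o degraded */
	return 0;
}
```

So a degraded but tolerable filesystem mounts only with -o degraded,
and no mount option can override a chunk that is beyond its tolerance.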

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 1776bcd..d54cdcc 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2866,6 +2866,16 @@ int open_ctree(struct super_block *sb,
 		goto fail_tree_roots;
 	}
 
+	ret = btrfs_check_degradable(fs_info, fs_info->sb->s_flags);
+	if (ret < 0) {
+		btrfs_err(fs_info, "degraded writable mount failed %d", ret);
+		goto fail_tree_roots;
+	} else if (ret > 0 && !btrfs_test_opt(chunk_root, DEGRADED)) {
+		btrfs_warn(fs_info,
+			"Some device missing, but still degraded mountable, please mount with -o degraded option");
+		ret = -EACCES;
+		goto fail_tree_roots;
+	}
 	/*
 	 * keep the device that is marked to be the target device for the
 	 * dev_replace procedure
@@ -2957,14 +2967,6 @@ retry_root_backup:
 	}
 	fs_info->num_tolerated_disk_barrier_failures =
 		btrfs_calc_num_tolerated_disk_barrier_failures(fs_info);
-	if (fs_info->fs_devices->missing_devices >
-	     fs_info->num_tolerated_disk_barrier_failures &&
-	    !(sb->s_flags & MS_RDONLY)) {
-		pr_warn("BTRFS: missing devices(%llu) exceeds the limit(%d), writeable mount is not allowed\n",
-			fs_info->fs_devices->missing_devices,
-			fs_info->num_tolerated_disk_barrier_failures);
-		goto fail_sysfs;
-	}
 
 	fs_info->cleaner_kthread = kthread_run(cleaner_kthread, tree_root,
 					       "btrfs-cleaner");
-- 
2.4.1



* [PATCH 03/15] btrfs: Do per-chunk degraded check for remount
  2015-11-09 10:56 [PATCH 00/15] btrfs: Hot spare and Auto replace Anand Jain
  2015-11-09 10:56 ` [PATCH 01/15] btrfs: Introduce a new function to check if all chunks a OK for degraded mount Anand Jain
  2015-11-09 10:56 ` [PATCH 02/15] btrfs: Do per-chunk check for mount time check Anand Jain
@ 2015-11-09 10:56 ` Anand Jain
  2015-11-09 10:56 ` [PATCH 04/15] btrfs: Allow barrier_all_devices to do per-chunk device check Anand Jain
                   ` (16 subsequent siblings)
  19 siblings, 0 replies; 43+ messages in thread
From: Anand Jain @ 2015-11-09 10:56 UTC (permalink / raw)
  To: linux-btrfs

From: Qu Wenruo <quwenruo@cn.fujitsu.com>

Same as for the mount-time check, use the new btrfs_check_degradable()
to do a per-chunk check.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>

Btrfs: use btrfs_error instead of btrfs_err during remount

Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
 fs/btrfs/super.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index b23d49d..d495790 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1662,11 +1662,14 @@ static int btrfs_remount(struct super_block *sb, int *flags, char *data)
 			goto restore;
 		}
 
-		if (fs_info->fs_devices->missing_devices >
-		     fs_info->num_tolerated_disk_barrier_failures &&
-		    !(*flags & MS_RDONLY)) {
+		ret = btrfs_check_degradable(fs_info, *flags);
+		if (ret < 0) {
+			btrfs_err(fs_info,
+				"degraded writable remount failed %d", ret);
+			goto restore;
+		} else if (ret > 0 && !btrfs_test_opt(root, DEGRADED)) {
 			btrfs_warn(fs_info,
-				"too many missing devices, writeable remount is not allowed");
+				"some device missing, but still degraded mountable, please remount with -o degraded option");
 			ret = -EACCES;
 			goto restore;
 		}
-- 
2.4.1



* [PATCH 04/15] btrfs: Allow barrier_all_devices to do per-chunk device check
  2015-11-09 10:56 [PATCH 00/15] btrfs: Hot spare and Auto replace Anand Jain
                   ` (2 preceding siblings ...)
  2015-11-09 10:56 ` [PATCH 03/15] btrfs: Do per-chunk degraded check for remount Anand Jain
@ 2015-11-09 10:56 ` Anand Jain
  2015-11-09 10:56 ` [PATCH 05/15] btrfs: optimize btrfs_check_degradable() for calls outside of barrier Anand Jain
                   ` (15 subsequent siblings)
  19 siblings, 0 replies; 43+ messages in thread
From: Anand Jain @ 2015-11-09 10:56 UTC (permalink / raw)
  To: linux-btrfs

From: Qu Wenruo <quwenruo@cn.fujitsu.com>

The last user of num_tolerated_disk_barrier_failures is
barrier_all_devices(), but it can easily be changed to the new per-chunk
degradable check framework.

Now btrfs_device has two extra members representing send/wait errors,
set at write_dev_flush() time. They are then checked in a way that is
similar to, but more accurate than, the old code.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 fs/btrfs/disk-io.c | 13 +++++--------
 fs/btrfs/volumes.c |  6 +++++-
 fs/btrfs/volumes.h |  4 ++++
 3 files changed, 14 insertions(+), 9 deletions(-)
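
To show why the per-chunk check can be more accurate than a global
counter, here is a small user-space comparison. It is only a sketch:
the structs and both functions are hypothetical, and the old rule is
simplified to "compare the error counts with one global tolerance",
which the caller passes in.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_STRIPES 4

/* Hypothetical stand-in for btrfs_device with the two new members. */
struct model_dev {
	bool missing;
	bool err_send;	/* the flush could not be sent */
	bool err_wait;	/* waiting for the flush failed */
};

struct model_chunk {
	int max_tolerated;	/* e.g. 0 for single, 1 for RAID1 */
	int num_stripes;
	const struct model_dev *stripes[MAX_STRIPES];
};

/* Old style: count errors over all devices, one global limit. */
static bool old_barrier_ok(const struct model_dev *devs, size_t ndevs,
			   int global_tolerated)
{
	int errors_send = 0;
	int errors_wait = 0;

	for (size_t i = 0; i < ndevs; i++) {
		errors_send += devs[i].err_send;
		errors_wait += devs[i].err_wait;
	}
	return errors_send <= global_tolerated &&
	       errors_wait <= global_tolerated;
}

/* New style: each chunk is checked against its own tolerance. */
static bool new_barrier_ok(const struct model_chunk *chunks, size_t nchunks)
{
	for (size_t c = 0; c < nchunks; c++) {
		int bad = 0;

		for (int i = 0; i < chunks[c].num_stripes; i++) {
			const struct model_dev *d = chunks[c].stripes[i];

			if (d->missing || d->err_send || d->err_wait)
				bad++;
		}
		if (bad > chunks[c].max_tolerated)
			return false;
	}
	return true;
}
```

With RAID1 metadata on two devices and single data on the first device
only, a flush error on the second device fails the old check (global
tolerance 0) but passes the per-chunk check, since no chunk exceeds its
own limit.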

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index d54cdcc..958c2a6 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3433,8 +3433,6 @@ static int barrier_all_devices(struct btrfs_fs_info *info)
 {
 	struct list_head *head;
 	struct btrfs_device *dev;
-	int errors_send = 0;
-	int errors_wait = 0;
 	int ret;
 
 	/* send down all the barriers */
@@ -3443,7 +3441,7 @@ static int barrier_all_devices(struct btrfs_fs_info *info)
 		if (dev->missing)
 			continue;
 		if (!dev->bdev) {
-			errors_send++;
+			dev->err_send = 1;
 			continue;
 		}
 		if (!dev->in_fs_metadata || !dev->writeable)
@@ -3451,7 +3449,7 @@ static int barrier_all_devices(struct btrfs_fs_info *info)
 
 		ret = write_dev_flush(dev, 0);
 		if (ret)
-			errors_send++;
+			dev->err_send = 1;
 	}
 
 	/* wait for all the barriers */
@@ -3459,7 +3457,7 @@ static int barrier_all_devices(struct btrfs_fs_info *info)
 		if (dev->missing)
 			continue;
 		if (!dev->bdev) {
-			errors_wait++;
+			dev->err_wait = 1;
 			continue;
 		}
 		if (!dev->in_fs_metadata || !dev->writeable)
@@ -3467,10 +3465,9 @@ static int barrier_all_devices(struct btrfs_fs_info *info)
 
 		ret = write_dev_flush(dev, 1);
 		if (ret)
-			errors_wait++;
+			dev->err_wait = 1;
 	}
-	if (errors_send > info->num_tolerated_disk_barrier_failures ||
-	    errors_wait > info->num_tolerated_disk_barrier_failures)
+	if (btrfs_check_degradable(info, info->sb->s_flags) < 0)
 		return -EIO;
 	return 0;
 }
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index cfbdf9a..8acf69b 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6844,8 +6844,12 @@ int btrfs_check_degradable(struct btrfs_fs_info *fs_info, unsigned flags)
 			btrfs_get_num_tolerated_disk_barrier_failures(
 					map->type);
 		for (i = 0; i < map->num_stripes; i++) {
-			if (map->stripes[i].dev->missing)
+			if (map->stripes[i].dev->missing ||
+			    map->stripes[i].dev->err_wait ||
+			    map->stripes[i].dev->err_send)
 				missing++;
+			map->stripes[i].dev->err_wait = 0;
+			map->stripes[i].dev->err_send = 0;
 		}
 		if (missing > max_tolerated) {
 			ret = -EIO;
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index c875be9..d9a4579 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -76,6 +76,10 @@ struct btrfs_device {
 	int can_discard;
 	int is_tgtdev_for_dev_replace;
 
+	/* for barrier_all_devices() check */
+	int err_send;
+	int err_wait;
+
 #ifdef __BTRFS_NEED_DEVICE_DATA_ORDERED
 	seqcount_t data_seqcount;
 #endif
-- 
2.4.1



* [PATCH 05/15] btrfs: optimize btrfs_check_degradable() for calls outside of barrier
  2015-11-09 10:56 [PATCH 00/15] btrfs: Hot spare and Auto replace Anand Jain
                   ` (3 preceding siblings ...)
  2015-11-09 10:56 ` [PATCH 04/15] btrfs: Allow barrier_all_devices to do per-chunk device check Anand Jain
@ 2015-11-09 10:56 ` Anand Jain
  2015-11-09 10:56 ` [PATCH 06/15] btrfs: Cleanup num_tolerated_disk_barrier_failures Anand Jain
                   ` (14 subsequent siblings)
  19 siblings, 0 replies; 43+ messages in thread
From: Anand Jain @ 2015-11-09 10:56 UTC (permalink / raw)
  To: linux-btrfs

Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
 fs/btrfs/disk-io.c | 8 +++++++-
 fs/btrfs/volumes.c | 2 --
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 958c2a6..d3303f9 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3428,6 +3428,7 @@ static int write_dev_flush(struct btrfs_device *device, int wait)
 /*
  * send an empty flush down to each device in parallel,
  * then wait for them
+ * fixme: optimize err_wait, err_send.
  */
 static int barrier_all_devices(struct btrfs_fs_info *info)
 {
@@ -3467,8 +3468,13 @@ static int barrier_all_devices(struct btrfs_fs_info *info)
 		if (ret)
 			dev->err_wait = 1;
 	}
-	if (btrfs_check_degradable(info, info->sb->s_flags) < 0)
+	if (btrfs_check_degradable(info, info->sb->s_flags) < 0) {
+		dev->err_send = 0;
+		dev->err_wait = 0;
 		return -EIO;
+	}
+	dev->err_send = 0;
+	dev->err_wait = 0;
 	return 0;
 }
 
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 8acf69b..a5262bf 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6848,8 +6848,6 @@ int btrfs_check_degradable(struct btrfs_fs_info *fs_info, unsigned flags)
 			    map->stripes[i].dev->err_wait ||
 			    map->stripes[i].dev->err_send)
 				missing++;
-			map->stripes[i].dev->err_wait = 0;
-			map->stripes[i].dev->err_send = 0;
 		}
 		if (missing > max_tolerated) {
 			ret = -EIO;
-- 
2.4.1



* [PATCH 06/15] btrfs: Cleanup num_tolerated_disk_barrier_failures
  2015-11-09 10:56 [PATCH 00/15] btrfs: Hot spare and Auto replace Anand Jain
                   ` (4 preceding siblings ...)
  2015-11-09 10:56 ` [PATCH 05/15] btrfs: optimize btrfs_check_degradable() for calls outside of barrier Anand Jain
@ 2015-11-09 10:56 ` Anand Jain
  2015-12-05  7:16   ` Qu Wenruo
  2015-11-09 10:56 ` [PATCH 07/15] btrfs: introduce device dynamic state transition to offline or failed Anand Jain
                   ` (13 subsequent siblings)
  19 siblings, 1 reply; 43+ messages in thread
From: Anand Jain @ 2015-11-09 10:56 UTC (permalink / raw)
  To: linux-btrfs

From: Qu Wenruo <quwenruo@cn.fujitsu.com>

As we now use the per-chunk degradable check, the global
num_tolerated_disk_barrier_failures is of no use, so clean it up.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>

[Btrfs: resolve conflict to apply 'btrfs: Cleanup num_tolerated_disk_barrier_failures']
Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
 fs/btrfs/ctree.h   |  2 --
 fs/btrfs/disk-io.c | 56 ------------------------------------------------------
 fs/btrfs/disk-io.h |  2 --
 fs/btrfs/volumes.c | 17 -----------------
 4 files changed, 77 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index a86051e..dedd3e0 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1753,8 +1753,6 @@ struct btrfs_fs_info {
 	/* next backup root to be overwritten */
 	int backup_root_index;
 
-	int num_tolerated_disk_barrier_failures;
-
 	/* device replace state */
 	struct btrfs_dev_replace dev_replace;
 
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index d3303f9..d10ef2e 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2965,8 +2965,6 @@ retry_root_backup:
 		printk(KERN_ERR "BTRFS: Failed to read block groups: %d\n", ret);
 		goto fail_sysfs;
 	}
-	fs_info->num_tolerated_disk_barrier_failures =
-		btrfs_calc_num_tolerated_disk_barrier_failures(fs_info);
 
 	fs_info->cleaner_kthread = kthread_run(cleaner_kthread, tree_root,
 					       "btrfs-cleaner");
@@ -3498,60 +3496,6 @@ int btrfs_get_num_tolerated_disk_barrier_failures(u64 flags)
 	return 0;
 }
 
-int btrfs_calc_num_tolerated_disk_barrier_failures(
-	struct btrfs_fs_info *fs_info)
-{
-	struct btrfs_ioctl_space_info space;
-	struct btrfs_space_info *sinfo;
-	u64 types[] = {BTRFS_BLOCK_GROUP_DATA,
-		       BTRFS_BLOCK_GROUP_SYSTEM,
-		       BTRFS_BLOCK_GROUP_METADATA,
-		       BTRFS_BLOCK_GROUP_DATA | BTRFS_BLOCK_GROUP_METADATA};
-	int i;
-	int c;
-	int num_tolerated_disk_barrier_failures =
-		(int)fs_info->fs_devices->num_devices;
-
-	for (i = 0; i < ARRAY_SIZE(types); i++) {
-		struct btrfs_space_info *tmp;
-
-		sinfo = NULL;
-		rcu_read_lock();
-		list_for_each_entry_rcu(tmp, &fs_info->space_info, list) {
-			if (tmp->flags == types[i]) {
-				sinfo = tmp;
-				break;
-			}
-		}
-		rcu_read_unlock();
-
-		if (!sinfo)
-			continue;
-
-		down_read(&sinfo->groups_sem);
-		for (c = 0; c < BTRFS_NR_RAID_TYPES; c++) {
-			u64 flags;
-
-			if (list_empty(&sinfo->block_groups[c]))
-				continue;
-
-			btrfs_get_block_group_info(&sinfo->block_groups[c],
-						   &space);
-			if (space.total_bytes == 0 || space.used_bytes == 0)
-				continue;
-			flags = space.flags;
-
-			num_tolerated_disk_barrier_failures = min(
-				num_tolerated_disk_barrier_failures,
-				btrfs_get_num_tolerated_disk_barrier_failures(
-					flags));
-		}
-		up_read(&sinfo->groups_sem);
-	}
-
-	return num_tolerated_disk_barrier_failures;
-}
-
 static int write_all_supers(struct btrfs_root *root, int max_mirrors)
 {
 	struct list_head *head;
diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h
index adeb318..6dc5fd3 100644
--- a/fs/btrfs/disk-io.h
+++ b/fs/btrfs/disk-io.h
@@ -142,8 +142,6 @@ struct btrfs_root *btrfs_create_tree(struct btrfs_trans_handle *trans,
 int btree_lock_page_hook(struct page *page, void *data,
 				void (*flush_fn)(void *));
 int btrfs_get_num_tolerated_disk_barrier_failures(u64 flags);
-int btrfs_calc_num_tolerated_disk_barrier_failures(
-	struct btrfs_fs_info *fs_info);
 int __init btrfs_end_io_wq_init(void);
 void btrfs_end_io_wq_exit(void);
 
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index a5262bf..33ad42e 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1782,9 +1782,6 @@ int btrfs_rm_device(struct btrfs_root *root, char *device_path, u64 devid)
 		free_fs_devices(cur_devices);
 	}
 
-	root->fs_info->num_tolerated_disk_barrier_failures =
-		btrfs_calc_num_tolerated_disk_barrier_failures(root->fs_info);
-
 	/*
 	 * at this point, the device is zero sized.  We want to
 	 * remove it from the devices list and zero out the old super
@@ -2289,8 +2286,6 @@ int btrfs_init_new_device(struct btrfs_root *root, char *device_path)
 		}
 	}
 
-	root->fs_info->num_tolerated_disk_barrier_failures =
-		btrfs_calc_num_tolerated_disk_barrier_failures(root->fs_info);
 	ret = btrfs_commit_transaction(trans, root);
 
 	if (seeding_dev) {
@@ -3518,13 +3513,6 @@ int btrfs_balance(struct btrfs_balance_control *bctl,
 		}
 	} while (read_seqretry(&fs_info->profiles_lock, seq));
 
-	if (bctl->sys.flags & BTRFS_BALANCE_ARGS_CONVERT) {
-		fs_info->num_tolerated_disk_barrier_failures = min(
-			btrfs_calc_num_tolerated_disk_barrier_failures(fs_info),
-			btrfs_get_num_tolerated_disk_barrier_failures(
-				bctl->sys.target));
-	}
-
 	ret = insert_balance_item(fs_info->tree_root, bctl);
 	if (ret && ret != -EEXIST)
 		goto out;
@@ -3547,11 +3535,6 @@ int btrfs_balance(struct btrfs_balance_control *bctl,
 	mutex_lock(&fs_info->balance_mutex);
 	atomic_dec(&fs_info->balance_running);
 
-	if (bctl->sys.flags & BTRFS_BALANCE_ARGS_CONVERT) {
-		fs_info->num_tolerated_disk_barrier_failures =
-			btrfs_calc_num_tolerated_disk_barrier_failures(fs_info);
-	}
-
 	if (bargs) {
 		memset(bargs, 0, sizeof(*bargs));
 		update_ioctl_balance_args(fs_info, 0, bargs);
-- 
2.4.1



* [PATCH 07/15] btrfs: introduce device dynamic state transition to offline or failed
  2015-11-09 10:56 [PATCH 00/15] btrfs: Hot spare and Auto replace Anand Jain
                   ` (5 preceding siblings ...)
  2015-11-09 10:56 ` [PATCH 06/15] btrfs: Cleanup num_tolerated_disk_barrier_failures Anand Jain
@ 2015-11-09 10:56 ` Anand Jain
  2015-11-09 10:56 ` [PATCH 08/15] btrfs: check device for critical errors and mark failed Anand Jain
                   ` (12 subsequent siblings)
  19 siblings, 0 replies; 43+ messages in thread
From: Anand Jain @ 2015-11-09 10:56 UTC (permalink / raw)
  To: linux-btrfs

A forced device offline/failed feature is needed for the following
reasons:
1) a. to report that a device has failed when it does
   b. to close the device when it goes offline so that the block layer
      can clean up
2) to identify the candidate for the auto replace
3) to avoid further commit errors being reported against the failing
   device, and
4) because a device in a multi-device btrfs may go offline from the
   system (as of now, in some system configs btrfs gets unmounted in
   this context, which is not correct behavior)

Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
 fs/btrfs/volumes.c | 148 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/volumes.h |  14 +++++
 2 files changed, 162 insertions(+)
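
The guard conditions and the state change performed by
btrfs_force_device_close() can be modelled in user space roughly as
follows. This is a hedged sketch: struct model_dev and
force_device_close() are hypothetical, the seed device check, logging
and the degradable re-check are omitted, and the real code raises a
filesystem error when the last RW device is hit instead of just
returning.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the state bits of btrfs_device. */
struct model_dev {
	bool has_bdev;	/* block device still open */
	bool missing;
	bool offline;
	bool failed;
};

/*
 * Move a device to the "offline" or "failed" state.
 * Returns true if the device changed state, false if the request was
 * ignored. rw_devices is the number of writable devices in the fs.
 */
static bool force_device_close(struct model_dev *dev, const char *why,
			       unsigned int rw_devices)
{
	if (dev->missing || !dev->has_bdev)
		return false;		/* already gone */
	if (dev->offline || dev->failed)
		return false;		/* already transitioned */
	if (rw_devices == 1)
		return false;		/* last standing RW device */

	if (strcmp(why, "offline") == 0)
		dev->offline = true;
	else if (strcmp(why, "failed") == 0)
		dev->failed = true;
	else
		return false;		/* unknown reason */

	/* As in the patch, the closed device is also treated as missing. */
	dev->has_bdev = false;
	dev->missing = true;
	return true;
}
```

A second call on the same device is a no-op, so repeated error reports
against one device cannot move it between the offline and failed
states.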

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 33ad42e..7492733 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6853,3 +6853,151 @@ out:
 	read_unlock(&map_tree->map_tree.lock);
 	return ret;
 }
+
+static void __close_device(struct work_struct *work)
+{
+	struct btrfs_device *device;
+
+	device = container_of(work, struct btrfs_device, rcu_work);
+
+	if (device->bdev)
+		blkdev_put(device->bdev, device->mode);
+
+	device->bdev = NULL;
+}
+
+static void close_device(struct rcu_head *head)
+{
+	struct btrfs_device *device;
+
+	device = container_of(head, struct btrfs_device, rcu);
+
+	INIT_WORK(&device->rcu_work, __close_device);
+	schedule_work(&device->rcu_work);
+}
+
+void btrfs_close_one_device_dont_free(struct btrfs_device *device)
+{
+	struct btrfs_fs_devices *fs_devices = device->fs_devices;
+
+	if (device->bdev)
+		fs_devices->open_devices--;
+
+	if (device->writeable &&
+	    device->devid != BTRFS_DEV_REPLACE_DEVID) {
+		list_del_init(&device->dev_alloc_list);
+		fs_devices->rw_devices--;
+	}
+
+	device->writeable = 0;
+
+	call_rcu(&device->rcu, close_device);
+}
+
+void __force_device_close(struct btrfs_device *device)
+{
+	struct btrfs_device *next_device;
+	struct btrfs_fs_devices *fs_devices;
+
+	fs_devices = device->fs_devices;
+
+	mutex_lock(&fs_devices->device_list_mutex);
+	lock_chunks(fs_devices->fs_info->fs_root);
+
+	next_device = list_entry(fs_devices->devices.next,
+					struct btrfs_device, dev_list);
+	if (device->bdev == fs_devices->fs_info->sb->s_bdev)
+		fs_devices->fs_info->sb->s_bdev = next_device->bdev;
+
+	if (device->bdev == fs_devices->latest_bdev)
+		fs_devices->latest_bdev = next_device->bdev;
+
+	btrfs_close_one_device_dont_free(device);
+
+	/*
+	 * fixme: works for now, but it's better to keep the missing
+	 * and offline states distinct, and update the rest of the
+	 * places where we check for only missing.
+	 */
+	device->missing = 1;
+	fs_devices->missing_devices++;
+	device->writeable = 0;
+
+	rcu_barrier();
+
+	unlock_chunks(fs_devices->fs_info->fs_root);
+	mutex_unlock(&fs_devices->device_list_mutex);
+}
+
+void btrfs_force_device_close(struct btrfs_device *dev, char *why)
+{
+	bool degrade_option;
+	int tolerated_fail;
+	u64 rw_devices;
+	struct btrfs_fs_info *fs_info;
+	struct btrfs_fs_devices *fs_devices;
+
+	fs_devices = dev->fs_devices;
+	fs_info = fs_devices->fs_info;
+	tolerated_fail = btrfs_check_degradable(fs_info,
+						fs_info->sb->s_flags);
+	rw_devices = fs_devices->rw_devices;
+	degrade_option = btrfs_test_opt(fs_info->fs_root, DEGRADED);
+
+	/* todo: support seed later */
+	if (fs_devices->seeding)
+		return;
+
+	/* this shouldn't be called if device is already missing */
+	if (dev->missing || !dev->bdev)
+		return;
+
+	if (dev->offline || dev->failed)
+		return;
+
+	/* last standing device is being offlined */
+	if (rw_devices == 1) {
+		btrfs_std_error(fs_info, -EIO, "force offline last RW device");
+		return;
+	}
+
+	if (!strcmp(why, "offline"))
+		dev->offline = 1;
+	else if (!strcmp(why, "failed"))
+		dev->failed = 1;
+	else
+		return;
+
+	rcu_read_lock();
+	btrfs_info(fs_info,
+	"device %s %s num_devices %llu rw_devices %llu degraded %d -o degraded %s",
+		rcu_str_deref(dev->name), why, fs_devices->num_devices,
+		rw_devices, tolerated_fail,
+		degrade_option ? "set":"unset");
+	rcu_read_unlock();
+
+	btrfs_sysfs_rm_device_link(fs_devices, dev, 0);
+
+	__force_device_close(dev);
+	tolerated_fail = btrfs_check_degradable(fs_info,
+						fs_info->sb->s_flags);
+	if (tolerated_fail > 0) {
+		rcu_read_lock();
+		btrfs_warn(fs_info, "device %s %s, chunks degraded",
+					rcu_str_deref(dev->name), why);
+		rcu_read_unlock();
+		return;
+	} else if (tolerated_fail < 0) {
+		rcu_read_lock();
+		btrfs_warn(fs_info,
+			"device %s is %s, device(s) with critical chunk(s) missing",
+			rcu_str_deref(dev->name), why);
+		rcu_read_unlock();
+		btrfs_std_error(fs_info, -EIO, "devices below critical level");
+		return;
+	}
+	rcu_read_lock();
+	btrfs_warn(fs_info, "device %s %s, No chunks are degraded",
+		rcu_str_deref(dev->name), why);
+	rcu_read_unlock();
+}
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index d9a4579..1c6107a 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -72,7 +72,20 @@ struct btrfs_device {
 
 	int writeable;
 	int in_fs_metadata;
+	/* missing: device wasn't found at the time of mount */
+	/* fixme: correct usage of missing_devices and missing */
 	int missing;
+	/* failed: device confirmed to have experienced critical io failure */
+	int failed;
+	/*
+	 * offline: the system, the user or the block layer transport has
+	 * offlined a device which was once present, without going
+	 * through unmount. Implies an interim communication breakdown
+	 * and not necessarily a candidate for device replace. The
+	 * device might come back online after user intervention or after
+	 * block transport layer error recovery.
+	 */
+	int offline;
 	int can_discard;
 	int is_tgtdev_for_dev_replace;
 
@@ -557,5 +570,6 @@ void btrfs_set_fs_info_ptr(struct btrfs_fs_info *fs_info);
 void btrfs_reset_fs_info_ptr(struct btrfs_fs_info *fs_info);
 void btrfs_close_one_device(struct btrfs_device *device);
 int btrfs_check_degradable(struct btrfs_fs_info *fs_info, unsigned flags);
+void btrfs_force_device_close(struct btrfs_device *dev, char *why);
 
 #endif
-- 
2.4.1



* [PATCH 08/15] btrfs: check device for critical errors and mark failed
  2015-11-09 10:56 [PATCH 00/15] btrfs: Hot spare and Auto replace Anand Jain
                   ` (6 preceding siblings ...)
  2015-11-09 10:56 ` [PATCH 07/15] btrfs: introduce device dynamic state transition to offline or failed Anand Jain
@ 2015-11-09 10:56 ` Anand Jain
  2015-11-09 10:56 ` [PATCH 09/15] btrfs: block incompatible optional features at scan Anand Jain
                   ` (11 subsequent siblings)
  19 siblings, 0 replies; 43+ messages in thread
From: Anand Jain @ 2015-11-09 10:56 UTC (permalink / raw)
  To: linux-btrfs

Write and Flush errors are considered as critical errors,
upon which the device will be brought offline and marked as
failed. Write and Flush errors are identified using device
error statistics.
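
To illustrate the counting scheme described above, here is a minimal
userspace sketch, not the kernel code. The names struct dev_errs,
dev_err_inc() and dev_check() are made up for this example. It also
drains the counter with a single atomic exchange, whereas the patch
uses atomic_read() followed by atomic_sub(); that swap is a choice of
this sketch only.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the error bookkeeping in struct btrfs_device. */
struct dev_errs {
	atomic_int new_critical_errs;	/* write + flush errors since last check */
	bool failed;
};

enum err_kind { ERR_READ, ERR_WRITE, ERR_FLUSH };

static void dev_errs_init(struct dev_errs *d)
{
	atomic_init(&d->new_critical_errs, 0);
	d->failed = false;
}

/* I/O path: only write and flush errors count as critical. */
static void dev_err_inc(struct dev_errs *d, enum err_kind kind)
{
	if (kind == ERR_WRITE || kind == ERR_FLUSH)
		atomic_fetch_add(&d->new_critical_errs, 1);
}

/*
 * Periodic check: take every error recorded so far in one atomic step
 * and mark the device failed if there was at least one.
 * Returns the number of errors drained.
 */
static int dev_check(struct dev_errs *d)
{
	int seen = atomic_exchange(&d->new_critical_errs, 0);

	if (seen)
		d->failed = true;
	return seen;
}
```

Read errors leave the counter untouched, so only a device that saw a
write or flush error since the previous check gets marked as failed.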

Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
 fs/btrfs/disk-io.c | 43 +++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/volumes.c |  1 +
 fs/btrfs/volumes.h |  4 ++++
 3 files changed, 48 insertions(+)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index d10ef2e..38e0385 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1836,6 +1836,47 @@ sleep:
 	return 0;
 }
 
+static void btrfs_check_devices(struct btrfs_fs_devices *fs_devices)
+{
+	struct btrfs_fs_info *fs_info = fs_devices->fs_info;
+	struct btrfs_device *device;
+
+	if (btrfs_fs_closing(fs_info))
+		return;
+
+	/* mark disk(s) with write or flush error(s) as failed */
+	mutex_lock(&fs_info->volume_mutex);
+	list_for_each_entry_rcu(device, &fs_devices->devices, dev_list) {
+		int c_err;
+
+		/*
+		 * todo: replace target device's write/flush error,
+		 * skip for now
+		 */
+		if (device->is_tgtdev_for_dev_replace)
+			continue;
+
+		if (!device->dev_stats_valid)
+			continue;
+
+		c_err = atomic_read(&device->new_critical_errs);
+		atomic_sub(c_err, &device->new_critical_errs);
+		if (c_err) {
+			rcu_read_lock();
+			btrfs_warn(fs_info,
+				"new write errors on device %s",
+					rcu_str_deref(device->name));
+			rcu_read_unlock();
+
+			/* force close and mark device as failed */
+			btrfs_force_device_close(device, "failed");
+		}
+	}
+	mutex_unlock(&fs_info->volume_mutex);
+
+	return;
+}
+
 static int transaction_kthread(void *arg)
 {
 	struct btrfs_root *root = arg;
@@ -1882,6 +1923,8 @@ static int transaction_kthread(void *arg)
 			btrfs_end_transaction(trans, root);
 		}
 sleep:
+		btrfs_check_devices(root->fs_info->fs_devices);
+
 		wake_up_process(root->fs_info->cleaner_kthread);
 		mutex_unlock(&root->fs_info->transaction_kthread_mutex);
 
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 7492733..b52197b 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -157,6 +157,7 @@ static struct btrfs_device *__alloc_device(void)
 	spin_lock_init(&dev->reada_lock);
 	atomic_set(&dev->reada_in_flight, 0);
 	atomic_set(&dev->dev_stats_ccnt, 0);
+	atomic_set(&dev->new_critical_errs, 0);
 	INIT_RADIX_TREE(&dev->reada_zones, GFP_NOFS & ~__GFP_WAIT);
 	INIT_RADIX_TREE(&dev->reada_extents, GFP_NOFS & ~__GFP_WAIT);
 
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 1c6107a..827371e 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -167,6 +167,7 @@ struct btrfs_device {
 	/* Counter to record the change of device stats */
 	atomic_t dev_stats_ccnt;
 	atomic_t dev_stat_values[BTRFS_DEV_STAT_VALUES_MAX];
+	atomic_t new_critical_errs;
 };
 
 /*
@@ -518,6 +519,9 @@ static inline void btrfs_dev_stat_inc(struct btrfs_device *dev,
 	atomic_inc(dev->dev_stat_values + index);
 	smp_mb__before_atomic();
 	atomic_inc(&dev->dev_stats_ccnt);
+	if (index == BTRFS_DEV_STAT_WRITE_ERRS ||
+		index == BTRFS_DEV_STAT_FLUSH_ERRS)
+		atomic_inc(&dev->new_critical_errs);
 }
 
 static inline int btrfs_dev_stat_read(struct btrfs_device *dev,
-- 
2.4.1


* [PATCH 09/15] btrfs: block incompatible optional features at scan
  2015-11-09 10:56 [PATCH 00/15] btrfs: Hot spare and Auto replace Anand Jain
                   ` (7 preceding siblings ...)
  2015-11-09 10:56 ` [PATCH 08/15] btrfs: check device for critical errors and mark failed Anand Jain
@ 2015-11-09 10:56 ` Anand Jain
  2015-11-09 10:56 ` [PATCH 10/15] btrfs: introduce BTRFS_FEATURE_INCOMPAT_SPARE_DEV Anand Jain
                   ` (10 subsequent siblings)
  19 siblings, 0 replies; 43+ messages in thread
From: Anand Jain @ 2015-11-09 10:56 UTC (permalink / raw)
  To: linux-btrfs

For the sake of completeness, we need to check whether the device
being scanned has only features that are known to the kernel. As of
now, if it doesn't, the mount will fail, so there is no point in
having such devices added to the btrfs_fs_devices list at
device_list_add().

So block those devices at scan. This means the original check in
open_ctree() won't be reached for a device with an unsupported
feature, but I am leaving that code as it is, without deleting it.
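
The check itself is a plain mask operation. A small userspace sketch,
with the bit values copied from ctree.h but with a shortened,
hypothetical supported mask (INCOMPAT_SUPP below is not the kernel's
full BTRFS_FEATURE_INCOMPAT_SUPP) and a made-up helper name:

```c
#include <stdint.h>

/* Bit positions as defined in ctree.h. */
#define INCOMPAT_RAID56			(1ULL << 7)
#define INCOMPAT_SKINNY_METADATA	(1ULL << 8)
#define INCOMPAT_NO_HOLES		(1ULL << 9)

/* Hypothetical, shortened supported mask, for illustration only. */
#define INCOMPAT_SUPP \
	(INCOMPAT_RAID56 | INCOMPAT_SKINNY_METADATA | INCOMPAT_NO_HOLES)

/*
 * Returns the incompat bits set in the super block that are not in the
 * supported mask. Zero means the scan may proceed.
 */
static uint64_t unsupported_incompat(uint64_t sb_flags)
{
	return sb_flags & ~INCOMPAT_SUPP;
}
```

Any nonzero result identifies exactly the unknown bits, which is what
the error message in the patch prints.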

Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
 fs/btrfs/volumes.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index b52197b..fcc9e57 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -976,6 +976,7 @@ int btrfs_scan_one_device(const char *path, fmode_t flags, void *holder,
 	u64 transid;
 	u64 total_devices;
 	u64 bytenr;
+	u64 features;
 
 	/*
 	 * we would like to check all the supers, but that would make
@@ -996,6 +997,15 @@ int btrfs_scan_one_device(const char *path, fmode_t flags, void *holder,
 	if (btrfs_read_disk_super(bdev, bytenr, &page, &disk_super))
 		goto error_bdev_put;
 
+	features = btrfs_super_incompat_flags(disk_super) &
+				~BTRFS_FEATURE_INCOMPAT_SUPP;
+	if (features) {
+		printk(KERN_ERR
+			"BTRFS: couldn't scan, unsupported optional features (%llx)\n",
+			features);
+		ret = -EOPNOTSUPP;
+		goto error_disk_super;
+	}
 	devid = btrfs_stack_device_id(&disk_super->dev_item);
 	transid = btrfs_super_generation(disk_super);
 	total_devices = btrfs_super_num_devices(disk_super);
@@ -1010,6 +1020,7 @@ int btrfs_scan_one_device(const char *path, fmode_t flags, void *holder,
 	if (!ret && fs_devices_ret)
 		(*fs_devices_ret)->total_devices = total_devices;
 
+error_disk_super:
 	btrfs_release_disk_super(page);
 
 error_bdev_put:
-- 
2.4.1


* [PATCH 10/15] btrfs: introduce BTRFS_FEATURE_INCOMPAT_SPARE_DEV
  2015-11-09 10:56 [PATCH 00/15] btrfs: Hot spare and Auto replace Anand Jain
                   ` (8 preceding siblings ...)
  2015-11-09 10:56 ` [PATCH 09/15] btrfs: block incompatible optional features at scan Anand Jain
@ 2015-11-09 10:56 ` Anand Jain
  2015-11-09 10:56 ` [PATCH 11/15] btrfs: add check not to mount a spare device Anand Jain
                   ` (9 subsequent siblings)
  19 siblings, 0 replies; 43+ messages in thread
From: Anand Jain @ 2015-11-09 10:56 UTC (permalink / raw)
  To: linux-btrfs

Add the BTRFS_FEATURE_INCOMPAT_SPARE_DEV (0x400) flag to identify
a spare device.

Along with this, the mount context checks that a spare device
fails to mount, as spare devices aren't mountable.
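
As a rough userspace illustration of the intended behaviour (scan
accepts a spare, mount refuses it): the helper names below are invented
for this sketch, and it returns -EOPNOTSUPP because the kernel-internal
ENOTSUPP is not available in userspace <errno.h>.

```c
#include <errno.h>
#include <stdint.h>

#define INCOMPAT_SPARE_DEV	(1ULL << 10)	/* 0x400 */

/* Scan side: a spare is accepted and merely tagged as such. */
static int scan_marks_spare(uint64_t incompat_flags)
{
	return (incompat_flags & INCOMPAT_SPARE_DEV) ? 1 : 0;
}

/* Mount side: a device carrying the spare flag is refused. */
static int try_mount(uint64_t incompat_flags)
{
	if (incompat_flags & INCOMPAT_SPARE_DEV)
		return -EOPNOTSUPP;
	return 0;
}
```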

Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
 fs/btrfs/ctree.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index dedd3e0..4d25fd8 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -522,6 +522,7 @@ struct btrfs_super_block {
 #define BTRFS_FEATURE_INCOMPAT_RAID56		(1ULL << 7)
 #define BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA	(1ULL << 8)
 #define BTRFS_FEATURE_INCOMPAT_NO_HOLES		(1ULL << 9)
+#define BTRFS_FEATURE_INCOMPAT_SPARE_DEV	(1ULL << 10)
 
 #define BTRFS_FEATURE_COMPAT_SUPP		0ULL
 #define BTRFS_FEATURE_COMPAT_SAFE_SET		0ULL
@@ -539,7 +540,8 @@ struct btrfs_super_block {
 	 BTRFS_FEATURE_INCOMPAT_RAID56 |		\
 	 BTRFS_FEATURE_INCOMPAT_EXTENDED_IREF |		\
 	 BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA |	\
-	 BTRFS_FEATURE_INCOMPAT_NO_HOLES)
+	 BTRFS_FEATURE_INCOMPAT_NO_HOLES |		\
+	 BTRFS_FEATURE_INCOMPAT_SPARE_DEV)
 
 #define BTRFS_FEATURE_INCOMPAT_SAFE_SET			\
 	(BTRFS_FEATURE_INCOMPAT_EXTENDED_IREF)
-- 
2.4.1


* [PATCH 11/15] btrfs: add check not to mount a spare device
  2015-11-09 10:56 [PATCH 00/15] btrfs: Hot spare and Auto replace Anand Jain
                   ` (9 preceding siblings ...)
  2015-11-09 10:56 ` [PATCH 10/15] btrfs: introduce BTRFS_FEATURE_INCOMPAT_SPARE_DEV Anand Jain
@ 2015-11-09 10:56 ` Anand Jain
  2015-11-09 10:56 ` [PATCH 12/15] btrfs: support btrfs dev scan for " Anand Jain
                   ` (8 subsequent siblings)
  19 siblings, 0 replies; 43+ messages in thread
From: Anand Jain @ 2015-11-09 10:56 UTC (permalink / raw)
  To: linux-btrfs

Spare devices can be scanned but shouldn't be mountable.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
 fs/btrfs/disk-io.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 38e0385..3662c0a 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2768,6 +2768,14 @@ int open_ctree(struct super_block *sb,
 		goto fail_alloc;
 	}
 
+	if (btrfs_super_incompat_flags(disk_super) &
+			BTRFS_FEATURE_INCOMPAT_SPARE_DEV) {
+		/* A spare device can only be scanned, not mounted */
+		printk(KERN_ERR "BTRFS: You can't mount a spare device\n");
+		err = -ENOTSUPP;
+		goto fail_alloc;
+	}
+
 	/*
 	 * Leafsize and nodesize were always equal, this is only a sanity check.
 	 */
-- 
2.4.1


* [PATCH 12/15] btrfs: support btrfs dev scan for spare device
  2015-11-09 10:56 [PATCH 00/15] btrfs: Hot spare and Auto replace Anand Jain
                   ` (10 preceding siblings ...)
  2015-11-09 10:56 ` [PATCH 11/15] btrfs: add check not to mount a spare device Anand Jain
@ 2015-11-09 10:56 ` Anand Jain
  2015-11-09 10:56 ` [PATCH 13/15] btrfs: provide framework to get and put a " Anand Jain
                   ` (7 subsequent siblings)
  19 siblings, 0 replies; 43+ messages in thread
From: Anand Jain @ 2015-11-09 10:56 UTC (permalink / raw)
  To: linux-btrfs

When the user or the system calls the BTRFS_IOC_SCAN_DEV ioctl,
this patch makes sure the device is added to the device list and
marked as a spare.

The same happens for BTRFS_IOC_DEVICES_READY, since that ioctl
has historically performed a scan as well.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
 fs/btrfs/volumes.c | 4 ++++
 fs/btrfs/volumes.h | 2 ++
 2 files changed, 6 insertions(+)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index fcc9e57..28f549d 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -526,6 +526,10 @@ static noinline int device_list_add(const char *path,
 		if (IS_ERR(fs_devices))
 			return PTR_ERR(fs_devices);
 
+		if (btrfs_super_incompat_flags(disk_super) &
+				BTRFS_FEATURE_INCOMPAT_SPARE_DEV)
+			fs_devices->spare = 1;
+
 		list_add(&fs_devices->list, &fs_uuids);
 
 		device = NULL;
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 827371e..3d995b7 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -278,6 +278,8 @@ struct btrfs_fs_devices {
 	struct kobject fsid_kobj;
 	struct kobject *device_dir_kobj;
 	struct completion kobj_unregister;
+
+	int spare;
 };
 
 #define BTRFS_BIO_INLINE_CSUM_SIZE	64
-- 
2.4.1


* [PATCH 13/15] btrfs: provide framework to get and put a spare device
  2015-11-09 10:56 [PATCH 00/15] btrfs: Hot spare and Auto replace Anand Jain
                   ` (11 preceding siblings ...)
  2015-11-09 10:56 ` [PATCH 12/15] btrfs: support btrfs dev scan for " Anand Jain
@ 2015-11-09 10:56 ` Anand Jain
  2015-11-09 10:56 ` [PATCH 14/15] btrfs: introduce helper functions to perform hot replace Anand Jain
                   ` (6 subsequent siblings)
  19 siblings, 0 replies; 43+ messages in thread
From: Anand Jain @ 2015-11-09 10:56 UTC (permalink / raw)
  To: linux-btrfs

This adds functions to get and put a spare device from the list,
so that the hot replace code can pick a spare device when needed.
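
The get/put contract can be pictured with a small userspace model. It
uses a flat array instead of the fs_uuids list, and struct spare_pool,
spare_get() and spare_put() are hypothetical names for this sketch
only. As in the patch, get returns 0 on success and 1 when no spare is
available, and the caller owns the returned path.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_SPARES 4

/* Hypothetical flat pool standing in for the spare entries in fs_uuids. */
struct spare_pool {
	char *paths[MAX_SPARES];	/* NULL marks an empty slot */
};

static char *dup_path(const char *s)
{
	size_t n = strlen(s) + 1;
	char *copy = malloc(n);

	if (copy)
		memcpy(copy, s, n);
	return copy;
}

/* Register a spare (or hand one back). Returns 0 on success, -1 otherwise. */
static int spare_put(struct spare_pool *p, const char *path)
{
	for (int i = 0; i < MAX_SPARES; i++) {
		if (!p->paths[i]) {
			p->paths[i] = dup_path(path);
			return p->paths[i] ? 0 : -1;
		}
	}
	return -1;	/* pool is full */
}

/*
 * Take the first spare out of the pool. On success returns 0 and stores
 * the path in *path; the caller must free() it. Returns 1 when the pool
 * holds no spare, leaving *path untouched.
 */
static int spare_get(struct spare_pool *p, char **path)
{
	for (int i = 0; i < MAX_SPARES; i++) {
		if (p->paths[i]) {
			*path = p->paths[i];
			p->paths[i] = NULL;
			return 0;
		}
	}
	return 1;
}
```

Once a spare has been handed out it is no longer visible to other
callers, which mirrors the patch removing the spare's fs_devices entry
from the list.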

Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
 fs/btrfs/super.c   |  9 +++++++++
 fs/btrfs/volumes.c | 37 +++++++++++++++++++++++++++++++++++++
 fs/btrfs/volumes.h |  2 ++
 3 files changed, 48 insertions(+)

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index d495790..29836ca 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -2035,6 +2035,15 @@ static int btrfs_control_open(struct inode *inode, struct file *file)
 	return 0;
 }
 
+void btrfs_put_spare_device(char *path)
+{
+	struct btrfs_fs_devices *fs_devices;
+
+	if (btrfs_scan_one_device(path, FMODE_READ,
+				    &btrfs_fs_type, &fs_devices))
+		printk(KERN_INFO "failed to return spare device\n");
+}
+
 /*
  * used by btrfsctl to scan devices when no FS is mounted
  */
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 28f549d..3b90690 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -7017,3 +7017,40 @@ void btrfs_force_device_close(struct btrfs_device *dev, char *why)
 		rcu_str_deref(dev->name), why);
 	rcu_read_unlock();
 }
+
+int btrfs_get_spare_device(char **path)
+{
+	int ret = 1;
+	struct btrfs_fs_devices *fs_devices;
+	struct btrfs_device *device;
+	struct list_head *fs_uuids = btrfs_get_fs_uuids();
+
+	mutex_lock(&uuid_mutex);
+	list_for_each_entry(fs_devices, fs_uuids, list) {
+		if (!fs_devices->spare)
+			continue;
+
+		/* as of now there is only one device in the spare fs_devices */
+		device = list_entry(fs_devices->devices.next,
+					struct btrfs_device, dev_list);
+
+		if (!device || !device->name)
+			continue;
+
+		fs_devices->spare = 0;
+		rcu_read_lock();
+		*path = kstrdup(device->name->str, GFP_NOFS);
+		rcu_read_unlock();
+		ret = 0;
+		break;
+	}
+
+	if (!ret) {
+		btrfs_sysfs_remove_fsid(fs_devices);
+		list_del(&fs_devices->list);
+		free_fs_devices(fs_devices);
+	}
+	mutex_unlock(&uuid_mutex);
+
+	return ret;
+}
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 3d995b7..36184ec 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -577,5 +577,7 @@ void btrfs_reset_fs_info_ptr(struct btrfs_fs_info *fs_info);
 void btrfs_close_one_device(struct btrfs_device *device);
 int btrfs_check_degradable(struct btrfs_fs_info *fs_info, unsigned flags);
 void btrfs_force_device_close(struct btrfs_device *dev, char *why);
+int btrfs_get_spare_device(char **path);
+void btrfs_put_spare_device(char *path);
 
 #endif
-- 
2.4.1


* [PATCH 14/15] btrfs: introduce helper functions to perform hot replace
  2015-11-09 10:56 [PATCH 00/15] btrfs: Hot spare and Auto replace Anand Jain
                   ` (12 preceding siblings ...)
  2015-11-09 10:56 ` [PATCH 13/15] btrfs: provide framework to get and put a " Anand Jain
@ 2015-11-09 10:56 ` Anand Jain
  2015-11-09 10:56 ` [PATCH 15/15] btrfs: check for failed device and " Anand Jain
                   ` (5 subsequent siblings)
  19 siblings, 0 replies; 43+ messages in thread
From: Anand Jain @ 2015-11-09 10:56 UTC (permalink / raw)
  To: linux-btrfs

Hot replace / auto replace is an important volume manager feature
and is critical to data center operations, so that a degraded
volume can be brought back to a healthy state at the earliest and
without manual intervention.

This modifies the existing replace code to suit the needs of auto
replace; in the long run I hope both code paths can be merged.
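
The overall flow (take a spare, attempt the replace, hand the spare
back on failure) can be modelled in a few lines of userspace C. This is
only a sketch: struct replace_state and auto_replace() are invented
names, the replace itself is simulated by a preset result, and unlike
btrfs_auto_replace_start() below, which always returns 0, the sketch
passes the replace result up to the caller.

```c
#include <errno.h>

/* Hypothetical model state; the real code works on kernel structures. */
struct replace_state {
	int spares;		/* number of registered spare devices */
	int replace_result;	/* what the simulated replace returns */
};

static int auto_replace(struct replace_state *s)
{
	int ret;

	if (s->spares == 0)
		return -EINVAL;		/* no spare device configured */

	s->spares--;			/* "get" the spare */
	ret = s->replace_result;	/* simulated device replace */
	if (ret)
		s->spares++;		/* replace failed, "put" the spare back */

	return ret;
}
```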

Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
 fs/btrfs/dev-replace.c | 116 +++++++++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/dev-replace.h |   1 +
 2 files changed, 117 insertions(+)

diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index 02df419..3294b33 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -914,3 +914,119 @@ void btrfs_bio_counter_inc_blocked(struct btrfs_fs_info *fs_info)
 				     &fs_info->fs_state));
 	}
 }
+
+int btrfs_dev_replace_start_v2(struct btrfs_root *root, char *tgt_path,
+					struct btrfs_device *src_device)
+{
+	struct btrfs_trans_handle *trans;
+	struct btrfs_fs_info *fs_info = root->fs_info;
+	struct btrfs_dev_replace *dev_replace = &fs_info->dev_replace;
+	int ret;
+	struct btrfs_device *tgt_device = NULL;
+
+	/*
+	 * procedure here is the same as in the replace triggered from the
+	 * user land, some day we could merge this with it
+	 */
+	WARN_ON(!src_device);
+	mutex_lock(&fs_info->volume_mutex);
+	ret = btrfs_init_dev_replace_tgtdev(root, tgt_path,
+					    src_device, &tgt_device);
+	mutex_unlock(&fs_info->volume_mutex);
+	if (ret)
+		return ret;
+	WARN_ON(!tgt_device);
+
+	trans = btrfs_attach_transaction(root);
+	if (!IS_ERR(trans)) {
+		ret = btrfs_commit_transaction(trans, root);
+		if (ret)
+			return ret;
+	} else if (PTR_ERR(trans) != -ENOENT) {
+		return PTR_ERR(trans);
+	}
+
+	btrfs_dev_replace_lock(dev_replace);
+	if (dev_replace->replace_state ==
+		BTRFS_IOCTL_DEV_REPLACE_STATE_STARTED ||
+		dev_replace->replace_state ==
+		BTRFS_IOCTL_DEV_REPLACE_STATE_SUSPENDED)
+		goto leave;
+
+	dev_replace->cont_reading_from_srcdev_mode =
+		BTRFS_IOCTL_DEV_REPLACE_CONT_READING_FROM_SRCDEV_MODE_AVOID;
+	dev_replace->srcdev = src_device;
+	dev_replace->tgtdev = tgt_device;
+
+	dev_replace->replace_state = BTRFS_IOCTL_DEV_REPLACE_STATE_STARTED;
+	dev_replace->time_started = get_seconds();
+	dev_replace->cursor_left = 0;
+	dev_replace->committed_cursor_left = 0;
+	dev_replace->cursor_left_last_write_of_item = 0;
+	dev_replace->cursor_right = 0;
+	dev_replace->is_valid = 1;
+	dev_replace->item_needs_writeback = 1;
+
+	printk_in_rcu(KERN_INFO
+		"BTRFS: auto replace from %s (devid %llu) to %s started\n",
+		rcu_str_deref(src_device->name),
+		src_device->devid,
+		rcu_str_deref(tgt_device->name));
+
+	btrfs_dev_replace_unlock(dev_replace);
+
+	ret = btrfs_sysfs_add_device_link(tgt_device->fs_devices, tgt_device, 0);
+	if (ret)
+		btrfs_err(fs_info, "kobj add dev failed %d\n", ret);
+
+	btrfs_wait_ordered_roots(fs_info, -1);
+
+	trans = btrfs_start_transaction(root, 0);
+	if (IS_ERR(trans)) {
+		ret = PTR_ERR(trans);
+		btrfs_dev_replace_lock(dev_replace);
+		goto leave;
+	}
+	ret = btrfs_commit_transaction(trans, root);
+	WARN_ON(ret);
+
+	ret = btrfs_scrub_dev(fs_info, src_device->devid, 0,
+			      btrfs_device_get_total_bytes(src_device),
+			      &dev_replace->scrub_progress, 0, 1);
+
+	ret = btrfs_dev_replace_finishing(fs_info, ret);
+	if (ret == -EINPROGRESS)
+		ret = 0;
+	else
+		WARN_ON(ret);
+
+	return ret;
+
+leave:
+	dev_replace->srcdev = NULL;
+	dev_replace->tgtdev = NULL;
+	btrfs_dev_replace_unlock(dev_replace);
+	btrfs_destroy_dev_replace_tgtdev(fs_info, tgt_device);
+	return ret;
+}
+
+int btrfs_auto_replace_start(struct btrfs_root *root,
+				struct btrfs_device *src_device)
+{
+	char *tgt_path;
+	int ret;
+
+	if (btrfs_get_spare_device(&tgt_path)) {
+		btrfs_err(root->fs_info,
+			"No spare device found/configured in the kernel");
+		return -EINVAL;
+	}
+
+	ret = btrfs_dev_replace_start_v2(root, tgt_path, src_device);
+	if (ret)
+		btrfs_put_spare_device(tgt_path);
+
+	kfree(tgt_path);
+
+	return 0;
+}
diff --git a/fs/btrfs/dev-replace.h b/fs/btrfs/dev-replace.h
index 20035cb..2ead9a6 100644
--- a/fs/btrfs/dev-replace.h
+++ b/fs/btrfs/dev-replace.h
@@ -41,4 +41,5 @@ static inline void btrfs_dev_replace_stats_inc(atomic64_t *stat_value)
 {
 	atomic64_inc(stat_value);
 }
+int btrfs_auto_replace_start(struct btrfs_root *root, struct btrfs_device *src_device);
 #endif
-- 
2.4.1


* [PATCH 15/15] btrfs: check for failed device and hot replace
  2015-11-09 10:56 [PATCH 00/15] btrfs: Hot spare and Auto replace Anand Jain
                   ` (13 preceding siblings ...)
  2015-11-09 10:56 ` [PATCH 14/15] btrfs: introduce helper functions to perform hot replace Anand Jain
@ 2015-11-09 10:56 ` Anand Jain
  2015-11-09 10:58 ` [PATCH 0/4] btrfs-progs: Hot spare and Auto replace Anand Jain
                   ` (4 subsequent siblings)
  19 siblings, 0 replies; 43+ messages in thread
From: Anand Jain @ 2015-11-09 10:56 UTC (permalink / raw)
  To: linux-btrfs

This patch creates casualty_kthread to check for failed devices
and trigger device replace.
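
The decision made on each pass of the thread can be sketched in
userspace as follows. struct dev, find_failed() and casualty_pass() are
hypothetical names, and using 0 as the "nothing to do" return value is
an assumption of this sketch, not something the patch defines.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced view of a device. */
struct dev {
	unsigned long long devid;
	bool failed;
};

/* Return the first failed device in the array, or NULL if there is none. */
static struct dev *find_failed(struct dev *devs, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (devs[i].failed)
			return &devs[i];
	}
	return NULL;
}

/*
 * One pass of the checker: skip read-only filesystems and filesystems
 * with a replace already running, otherwise pick the first failed
 * device. Returns the devid to replace, or 0 when there is nothing to do.
 */
static unsigned long long casualty_pass(struct dev *devs, size_t n,
					bool read_only, bool replace_running)
{
	struct dev *d;

	if (read_only || replace_running)
		return 0;

	d = find_failed(devs, n);
	return d ? d->devid : 0;
}
```

Picking the first failed device matches the fixme in the patch: no
priority order among several failed devices exists yet.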

Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
 fs/btrfs/ctree.h       |  1 +
 fs/btrfs/disk-io.c     | 67 ++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/transaction.c |  3 ++-
 3 files changed, 70 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 4d25fd8..3e706ff 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1613,6 +1613,7 @@ struct btrfs_fs_info {
 	struct btrfs_workqueue *extent_workers;
 	struct task_struct *transaction_kthread;
 	struct task_struct *cleaner_kthread;
+	struct task_struct *casualty_kthread;
 	int thread_pool_size;
 
 	struct kobject *space_info_kobj;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 3662c0a..beefe35 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1836,6 +1836,64 @@ sleep:
 	return 0;
 }
 
+/*
+ * A kthread to check if any auto maintenance is required. This is
+ * multithread safe, and kthread is running only if
+ * fs_info->casualty_kthread is not NULL, fixme: atomic ?
+ */
+static int casualty_kthread(void *arg)
+{
+	struct btrfs_root *root = arg;
+	struct btrfs_fs_info *fs_info = root->fs_info;
+	struct btrfs_fs_devices *fs_devices = fs_info->fs_devices;
+	struct btrfs_device *device;
+	int found = 0;
+
+	if (root->fs_info->sb->s_flags & MS_RDONLY)
+		goto out;
+
+	btrfs_dev_replace_lock(&fs_info->dev_replace);
+	if (btrfs_dev_replace_is_ongoing(&fs_info->dev_replace)) {
+		btrfs_dev_replace_unlock(&fs_info->dev_replace);
+		goto out;
+	}
+	btrfs_dev_replace_unlock(&fs_info->dev_replace);
+
+	/*
+	 * Find failed device, if any. After the replace the failed
+	 * device is removed, so any failed device found here is new and
+	 * will be a candidate for the replace, if FS can't work without
+	 * the failed device then btrfs_std_error() will have put FS into
+	 * readonly
+	 */
+	/*
+	 * fixme: introduce a priority order to find failed device,
+	 * chronological order ?
+	 */
+	mutex_lock(&fs_devices->device_list_mutex);
+	rcu_read_lock();
+	list_for_each_entry_rcu(device, &fs_devices->devices, dev_list) {
+		if (device->failed) {
+			found = 1;
+			break;
+		}
+	}
+	rcu_read_unlock();
+	mutex_unlock(&fs_devices->device_list_mutex);
+
+	/*
+	 * We are using the replace code which should be interrupt-able
+	 * during unmount, and as of now there is no user land stop
+	 * request that we support
+	 */
+	if (found)
+		btrfs_auto_replace_start(root, device);
+
+out:
+	fs_info->casualty_kthread = NULL;
+	return 0;
+}
+
 static void btrfs_check_devices(struct btrfs_fs_devices *fs_devices)
 {
 	struct btrfs_fs_info *fs_info = fs_devices->fs_info;
@@ -1924,6 +1982,10 @@ static int transaction_kthread(void *arg)
 		}
 sleep:
 		btrfs_check_devices(root->fs_info->fs_devices);
+		if (!root->fs_info->casualty_kthread)
+			root->fs_info->casualty_kthread =
+				kthread_run(casualty_kthread, root,
+							"btrfs-casualty");
 
 		wake_up_process(root->fs_info->cleaner_kthread);
 		mutex_unlock(&root->fs_info->transaction_kthread_mutex);
@@ -3159,6 +3221,9 @@ fail_trans_kthread:
 	kthread_stop(fs_info->transaction_kthread);
 	btrfs_cleanup_transaction(fs_info->tree_root);
 	btrfs_free_fs_roots(fs_info);
+	if (fs_info->casualty_kthread)
+		kthread_stop(fs_info->casualty_kthread);
+
 fail_cleaner:
 	kthread_stop(fs_info->cleaner_kthread);
 
@@ -3807,6 +3872,8 @@ void close_ctree(struct btrfs_root *root)
 
 	kthread_stop(fs_info->transaction_kthread);
 	kthread_stop(fs_info->cleaner_kthread);
+	if (fs_info->casualty_kthread)
+		kthread_stop(fs_info->casualty_kthread);
 
 	fs_info->closing = 2;
 	smp_mb();
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 76354bb..ef4aaf5 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -2187,7 +2187,8 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans,
 	kmem_cache_free(btrfs_trans_handle_cachep, trans);
 
 	if (current != root->fs_info->transaction_kthread &&
-	    current != root->fs_info->cleaner_kthread)
+	    current != root->fs_info->cleaner_kthread &&
+	    current != root->fs_info->casualty_kthread)
 		btrfs_run_delayed_iputs(root);
 
 	return ret;
-- 
2.4.1


* [PATCH 0/4] btrfs-progs: Hot spare and Auto replace
  2015-11-09 10:56 [PATCH 00/15] btrfs: Hot spare and Auto replace Anand Jain
                   ` (14 preceding siblings ...)
  2015-11-09 10:56 ` [PATCH 15/15] btrfs: check for failed device and " Anand Jain
@ 2015-11-09 10:58 ` Anand Jain
  2015-11-09 10:58   ` [PATCH 1/4] btrfs-progs: Introduce BTRFS_FEATURE_INCOMPAT_SPARE_DEV SB flags Anand Jain
                     ` (3 more replies)
  2015-11-09 14:09 ` [PATCH 00/15] btrfs: Hot spare and Auto replace Austin S Hemmelgarn
                   ` (3 subsequent siblings)
  19 siblings, 4 replies; 43+ messages in thread
From: Anand Jain @ 2015-11-09 10:58 UTC (permalink / raw)
  To: linux-btrfs

Depends on the kernel patch set
  [PATCH 00/15] btrfs: Hot spare and Auto replace

This is btrfs-progs side of the patch set.

Anand Jain (4):
  btrfs-progs: Introduce BTRFS_FEATURE_INCOMPAT_SPARE_DEV SB flags
  btrfs-progs: Introduce btrfs spare subcommand
  btrfs-progs: add fi show for spare
  btrfs-progs: add global spare device list to filesystem show

 Android.mk         |   2 +-
 Makefile.in        |   2 +-
 btrfs-show-super.c |   3 +-
 btrfs.c            |   1 +
 cmds-filesystem.c  |   9 ++
 cmds-spare.c       | 291 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 commands.h         |   2 +
 ctree.h            |   4 +-
 utils.h            |   1 +
 volumes.c          |   4 +
 volumes.h          |   2 +
 11 files changed, 317 insertions(+), 4 deletions(-)
 create mode 100644 cmds-spare.c

-- 
2.4.1


* [PATCH 1/4] btrfs-progs: Introduce BTRFS_FEATURE_INCOMPAT_SPARE_DEV SB flags
  2015-11-09 10:58 ` [PATCH 0/4] btrfs-progs: Hot spare and Auto replace Anand Jain
@ 2015-11-09 10:58   ` Anand Jain
  2015-11-09 10:58   ` [PATCH 2/4] btrfs-progs: Introduce btrfs spare subcommand Anand Jain
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 43+ messages in thread
From: Anand Jain @ 2015-11-09 10:58 UTC (permalink / raw)
  To: linux-btrfs

Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
 btrfs-show-super.c | 3 ++-
 ctree.h            | 4 +++-
 volumes.c          | 4 ++++
 volumes.h          | 2 ++
 4 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/btrfs-show-super.c b/btrfs-show-super.c
index 27414c8..d9626cd 100644
--- a/btrfs-show-super.c
+++ b/btrfs-show-super.c
@@ -300,7 +300,8 @@ struct readable_flag_entry incompat_flags_array[] = {
 	DEF_INCOMPAT_FLAG_ENTRY(EXTENDED_IREF),
 	DEF_INCOMPAT_FLAG_ENTRY(RAID56),
 	DEF_INCOMPAT_FLAG_ENTRY(SKINNY_METADATA),
-	DEF_INCOMPAT_FLAG_ENTRY(NO_HOLES)
+	DEF_INCOMPAT_FLAG_ENTRY(NO_HOLES),
+	DEF_INCOMPAT_FLAG_ENTRY(SPARE_DEV)
 };
 static const int incompat_flags_num = sizeof(incompat_flags_array) /
 				      sizeof(struct readable_flag_entry);
diff --git a/ctree.h b/ctree.h
index c57f9ca..2c3aea6 100644
--- a/ctree.h
+++ b/ctree.h
@@ -475,6 +475,7 @@ struct btrfs_super_block {
 #define BTRFS_FEATURE_INCOMPAT_RAID56		(1ULL << 7)
 #define BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA	(1ULL << 8)
 #define BTRFS_FEATURE_INCOMPAT_NO_HOLES		(1ULL << 9)
+#define BTRFS_FEATURE_INCOMPAT_SPARE_DEV	(1ULL << 10)
 
 
 #define BTRFS_FEATURE_COMPAT_SUPP		0ULL
@@ -488,7 +489,8 @@ struct btrfs_super_block {
 	 BTRFS_FEATURE_INCOMPAT_RAID56 |		\
 	 BTRFS_FEATURE_INCOMPAT_MIXED_GROUPS |		\
 	 BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA |	\
-	 BTRFS_FEATURE_INCOMPAT_NO_HOLES)
+	 BTRFS_FEATURE_INCOMPAT_NO_HOLES |		\
+	 BTRFS_FEATURE_INCOMPAT_SPARE_DEV)
 
 /*
  * A leaf is full of items. offset and size tell us where to find
diff --git a/volumes.c b/volumes.c
index ca50f1c..beaeecf 100644
--- a/volumes.c
+++ b/volumes.c
@@ -101,6 +101,10 @@ static int device_list_add(const char *path,
 		fs_devices->latest_devid = devid;
 		fs_devices->latest_trans = found_transid;
 		fs_devices->lowest_devid = (u64)-1;
+		if (btrfs_super_incompat_flags(disk_super) &
+				BTRFS_FEATURE_INCOMPAT_SPARE_DEV)
+			fs_devices->spare = 1;
+
 		device = NULL;
 	} else {
 		device = __find_device(&fs_devices->devices, devid,
diff --git a/volumes.h b/volumes.h
index 4ecb993..3b56c1f 100644
--- a/volumes.h
+++ b/volumes.h
@@ -83,6 +83,8 @@ struct btrfs_fs_devices {
 
 	int seeding;
 	struct btrfs_fs_devices *seed;
+
+	int spare;
 };
 
 struct btrfs_bio_stripe {
-- 
2.4.1


* [PATCH 2/4] btrfs-progs: Introduce btrfs spare subcommand
  2015-11-09 10:58 ` [PATCH 0/4] btrfs-progs: Hot spare and Auto replace Anand Jain
  2015-11-09 10:58   ` [PATCH 1/4] btrfs-progs: Introduce BTRFS_FEATURE_INCOMPAT_SPARE_DEV SB flags Anand Jain
@ 2015-11-09 10:58   ` Anand Jain
  2015-11-09 10:58   ` [PATCH 3/4] btrfs-progs: add fi show for spare Anand Jain
  2015-11-09 10:58   ` [PATCH 4/4] btrfs-progs: add global spare device list to filesystem show Anand Jain
  3 siblings, 0 replies; 43+ messages in thread
From: Anand Jain @ 2015-11-09 10:58 UTC (permalink / raw)
  To: linux-btrfs

Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
 Android.mk   |   2 +-
 Makefile.in  |   2 +-
 btrfs.c      |   1 +
 cmds-spare.c | 291 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 commands.h   |   2 +
 5 files changed, 296 insertions(+), 2 deletions(-)
 create mode 100644 cmds-spare.c

diff --git a/Android.mk b/Android.mk
index fe3209b..baaf179 100644
--- a/Android.mk
+++ b/Android.mk
@@ -27,7 +27,7 @@ cmds_objects := cmds-subvolume.c cmds-filesystem.c cmds-device.c cmds-scrub.c \
                cmds-inspect.c cmds-balance.c cmds-send.c cmds-receive.c \
                cmds-quota.c cmds-qgroup.c cmds-replace.c cmds-check.c \
                cmds-restore.c cmds-rescue.c chunk-recover.c super-recover.c \
-               cmds-property.c cmds-fi-usage.c
+               cmds-property.c cmds-fi-usage.c cmds-spare.c
 libbtrfs_objects := send-stream.c send-utils.c rbtree.c btrfs-list.c crc32c.c \
                    uuid-tree.c utils-lib.c rbtree-utils.c
 libbtrfs_headers := send-stream.h send-utils.h send.h rbtree.h btrfs-list.h \
diff --git a/Makefile.in b/Makefile.in
index 514a76f..1b005b0 100644
--- a/Makefile.in
+++ b/Makefile.in
@@ -43,7 +43,7 @@ cmds_objects = cmds-subvolume.o cmds-filesystem.o cmds-device.o cmds-scrub.o \
 	       cmds-inspect.o cmds-balance.o cmds-send.o cmds-receive.o \
 	       cmds-quota.o cmds-qgroup.o cmds-replace.o cmds-check.o \
 	       cmds-restore.o cmds-rescue.o chunk-recover.o super-recover.o \
-	       cmds-property.o cmds-fi-usage.o
+	       cmds-property.o cmds-fi-usage.o cmds-spare.o
 libbtrfs_objects = send-stream.o send-utils.o rbtree.o btrfs-list.o crc32c.o \
 		   uuid-tree.o utils-lib.o rbtree-utils.o
 libbtrfs_headers = send-stream.h send-utils.h send.h rbtree.h btrfs-list.h \
diff --git a/btrfs.c b/btrfs.c
index 63df377..ba0dd02 100644
--- a/btrfs.c
+++ b/btrfs.c
@@ -204,6 +204,7 @@ static const struct cmd_group btrfs_cmd_group = {
 		{ "quota", cmd_quota, NULL, &quota_cmd_group, 0 },
 		{ "qgroup", cmd_qgroup, NULL, &qgroup_cmd_group, 0 },
 		{ "replace", cmd_replace, NULL, &replace_cmd_group, 0 },
+		{ "spare", cmd_spare, NULL, &spare_cmd_group, 0 },
 		{ "help", cmd_help, cmd_help_usage, NULL, 0 },
 		{ "version", cmd_version, cmd_version_usage, NULL, 0 },
 		NULL_CMD_STRUCT
diff --git a/cmds-spare.c b/cmds-spare.c
new file mode 100644
index 0000000..cd9c709
--- /dev/null
+++ b/cmds-spare.c
@@ -0,0 +1,291 @@
+/*
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public
+ * License v2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; if not, write to the
+ * Free Software Foundation, Inc., 59 Temple Place - Suite 330,
+ * Boston, MA 02111-1307, USA.
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <fcntl.h>
+#include <sys/ioctl.h>
+#include <errno.h>
+#include <getopt.h>
+
+#include "ctree.h"
+#include "utils.h"
+#include "volumes.h"
+#include "disk-io.h"
+
+#include "commands.h"
+
+int print_spare_device(unsigned unit_mode)
+{
+	int ret;
+	struct btrfs_fs_devices *fs_devices;
+	struct btrfs_device *device;
+	struct list_head *fs_uuids;
+
+	printf("Global spare\n");
+
+	ret = btrfs_scan_lblkid();
+	if (ret) {
+		fprintf(stderr, "scan_lblkid failed ret %d\n", ret);
+		return ret;
+	}
+
+	fs_uuids = btrfs_scanned_uuids();
+
+	list_for_each_entry(fs_devices, fs_uuids, list) {
+		if (!fs_devices->spare)
+			continue;
+
+		device = list_entry(fs_devices->devices.next,
+					struct btrfs_device, dev_list);
+		if (device->name)
+			printf("\tdevice size %s path %s\n",
+				pretty_size_mode(device->total_bytes,
+					unit_mode), device->name);
+
+	}
+
+	return 0;
+
+}
+
+static void btrfs_delete_spare(char *path)
+{
+	printf("Unscan the device (or don't run device scan after reboot) and run wipefs to wipe SB\n");
+
+}
+
+static void btrfs_add_spare(char *dev)
+{
+	struct stat st;
+	int fd;
+	int i;
+	int ret;
+	u64 block_cnt;
+	u64 blocks[7];
+	u32 nodesz = max_t(u32, sysconf(_SC_PAGESIZE), BTRFS_MKFS_DEFAULT_NODE_SIZE);
+	struct btrfs_mkfs_config mkfs_cfg;
+
+	fd = open(dev, O_RDWR);
+	if (fd < 0) {
+		fprintf(stderr, "unable to open %s: %s\n", dev, strerror(errno));
+		return;
+	}
+
+	if (fstat(fd, &st)) {
+		fprintf(stderr, "unable to stat %s\n", dev);
+		goto out;
+	}
+	block_cnt = btrfs_device_size(fd, &st);
+	if (!block_cnt) {
+		fprintf(stderr, "unable to find %s size\n", dev);
+		goto out;
+	}
+
+	if (block_cnt < BTRFS_MKFS_SYSTEM_GROUP_SIZE) {
+		fprintf(stderr, "device is too small to make filesystem\n");
+		goto out;
+	}
+
+	blocks[0] = BTRFS_SUPER_INFO_OFFSET;
+	for (i = 1; i < 7; i++)
+		blocks[i] = BTRFS_SUPER_INFO_OFFSET + 1024 * 1024 + nodesz * i;
+
+	memset(&mkfs_cfg, 0, sizeof(mkfs_cfg));
+	memcpy(mkfs_cfg.blocks, blocks, sizeof(blocks));
+	mkfs_cfg.num_bytes = block_cnt;
+	mkfs_cfg.nodesize = nodesz;
+	mkfs_cfg.sectorsize = 4096;
+	mkfs_cfg.stripesize = 4096;
+	mkfs_cfg.features = BTRFS_FEATURE_INCOMPAT_SPARE_DEV;
+	ret = make_btrfs(fd, &mkfs_cfg);
+	if (ret)
+		fprintf(stderr, "error during mkfs: %s\n", strerror(-ret));
+
+out:
+	close(fd);
+}
+
+static const char * const spare_cmd_group_usage[] = {
+	"btrfs spare <command> [<args>]",
+	NULL
+};
+
+static const char * const cmd_spare_add_usage[] = {
+	"btrfs spare add <device> [<device>...]",
+	"Add global spare device(s) to btrfs",
+	"-K|--nodiscard    do not perform whole device TRIM",
+	"-f|--force        force overwrite existing filesystem on the disk",
+	NULL
+};
+
+static const char * const cmd_spare_delete_usage[] = {
+	"btrfs spare delete <device> [<device>...]",
+	"Delete global spare device(s) from btrfs",
+	NULL
+};
+
+static const char * const cmd_spare_list_usage[] = {
+	"btrfs spare list",
+	"List spare device(s) both scanned and unscanned(*) for kernel",
+	NULL
+};
+
+static int cmd_spare_add(int argc, char **argv)
+{
+	int i;
+	int force = 0;
+	int discard = 1;
+	int ret = 0;
+
+	while (1) {
+		int c;
+		static const struct option long_options[] = {
+			{ "nodiscard", no_argument, NULL, 'K'},
+			{ "force", no_argument, NULL, 'f'},
+			{ NULL, 0, NULL, 0}
+		};
+
+		c = getopt_long(argc, argv, "Kf", long_options, NULL);
+		if (c < 0)
+			break;
+
+		switch (c) {
+		case 'K':
+			discard = 0;
+			break;
+		case 'f':
+			force = 1;
+			break;
+		default:
+			usage(cmd_spare_add_usage);
+		}
+	}
+
+	if (check_argc_min(argc - optind, 1))
+		usage(cmd_spare_add_usage);
+
+	for (i = optind; i < argc; i++) {
+		u64 dev_block_count = 0;
+		int devfd;
+		char *path;
+		int res;
+		int mixed;
+
+		if (test_dev_for_mkfs(argv[i], force)) {
+			ret++;
+			continue;
+		}
+
+		devfd = open(argv[i], O_RDWR);
+		if (devfd < 0) {
+			fprintf(stderr, "ERROR: Unable to open device '%s'\n", argv[i]);
+			ret++;
+			continue;
+		}
+
+		res = btrfs_prepare_device(devfd, argv[i], 1, &dev_block_count,
+								0, &mixed, discard);
+		close(devfd);
+		if (res) {
+			ret++;
+			goto error_out;
+		}
+
+		path = canonicalize_path(argv[i]);
+		if (!path) {
+			fprintf(stderr,
+				"ERROR: Could not canonicalize pathname '%s': %s\n",
+				argv[i], strerror(errno));
+			ret++;
+			goto error_out;
+		}
+
+		btrfs_add_spare(path);
+		free(path);
+	}
+error_out:
+	btrfs_close_all_devices();
+	return !!ret;
+}
+
+static int cmd_spare_delete(int argc, char **argv)
+{
+	int i;
+	char *path;
+	int ret = 0;
+
+	if (check_argc_min(argc - optind, 1))
+		usage(cmd_spare_delete_usage);
+
+	for (i = optind; i < argc; i++) {
+		int devfd;
+
+		devfd = open(argv[i], O_RDWR);
+		if (devfd < 0) {
+			fprintf(stderr, "ERROR: Unable to open device '%s'\n", argv[i]);
+			ret++;
+			continue;
+		}
+		close(devfd);
+
+		path = canonicalize_path(argv[i]);
+		if (!path) {
+			fprintf(stderr,
+				"ERROR: Could not canonicalize pathname '%s': %s\n",
+				argv[i], strerror(errno));
+			ret++;
+			goto error_out;
+		}
+
+		btrfs_delete_spare(path);
+		free(path);
+	}
+
+error_out:
+	btrfs_close_all_devices();
+	return !!ret;
+}
+
+int cmd_spare_list(int argc, char **argv)
+{
+	int ret;
+	unsigned unit_mode;
+
+	unit_mode = get_unit_mode_from_arg(&argc, argv, 0);
+
+	ret = print_spare_device(unit_mode);
+
+	return !!ret;
+}
+
+static const char spare_cmd_group_info[] =
+	"manage spare devices in the filesystem";
+
+const struct cmd_group spare_cmd_group = {
+	spare_cmd_group_usage, spare_cmd_group_info, {
+		{ "add", cmd_spare_add, cmd_spare_add_usage, NULL, 0 },
+		{ "delete", cmd_spare_delete, cmd_spare_delete_usage, NULL, 0},
+		{ "list", cmd_spare_list, cmd_spare_list_usage, NULL, 0},
+		NULL_CMD_STRUCT
+	}
+};
+
+int cmd_spare(int argc, char **argv)
+{
+	return handle_command_group(&spare_cmd_group, argc, argv);
+}
diff --git a/commands.h b/commands.h
index d2bb093..6f68ef1 100644
--- a/commands.h
+++ b/commands.h
@@ -95,6 +95,7 @@ extern const struct cmd_group quota_cmd_group;
 extern const struct cmd_group qgroup_cmd_group;
 extern const struct cmd_group replace_cmd_group;
 extern const struct cmd_group rescue_cmd_group;
+extern const struct cmd_group spare_cmd_group;
 
 extern const char * const cmd_send_usage[];
 extern const char * const cmd_receive_usage[];
@@ -119,6 +120,7 @@ int cmd_receive(int argc, char **argv);
 int cmd_quota(int argc, char **argv);
 int cmd_qgroup(int argc, char **argv);
 int cmd_replace(int argc, char **argv);
+int cmd_spare(int argc, char **argv);
 int cmd_restore(int argc, char **argv);
 int cmd_select_super(int argc, char **argv);
 int cmd_dump_super(int argc, char **argv);
-- 
2.4.1

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo

* [PATCH 3/4] btrfs-progs: add fi show for spare
  2015-11-09 10:58 ` [PATCH 0/4] btrfs-progs: Hot spare and Auto replace Anand Jain
  2015-11-09 10:58   ` [PATCH 1/4] btrfs-progs: Introduce BTRFS_FEATURE_INCOMPAT_SPARE_DEV SB flags Anand Jain
  2015-11-09 10:58   ` [PATCH 2/4] btrfs-progs: Introduce btrfs spare subcommand Anand Jain
@ 2015-11-09 10:58   ` Anand Jain
  2015-11-09 10:58   ` [PATCH 4/4] btrfs-progs: add global spare device list to filesystem show Anand Jain
  3 siblings, 0 replies; 43+ messages in thread
From: Anand Jain @ 2015-11-09 10:58 UTC (permalink / raw)
  To: linux-btrfs

Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
 cmds-filesystem.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/cmds-filesystem.c b/cmds-filesystem.c
index 4d3a9a4..11d0406 100644
--- a/cmds-filesystem.c
+++ b/cmds-filesystem.c
@@ -353,6 +353,9 @@ static void print_one_uuid(struct btrfs_fs_devices *fs_devices,
 	if (add_seen_fsid(fs_devices->fsid))
 		return;
 
+	if (fs_devices->spare)
+		return;
+
 	uuid_unparse(fs_devices->fsid, uuidbuf);
 	device = list_entry(fs_devices->devices.next, struct btrfs_device,
 			    dev_list);
@@ -610,6 +613,7 @@ static int copy_fs_devices(struct btrfs_fs_devices *dst,
 	memcpy(dst->fsid, src->fsid, BTRFS_FSID_SIZE);
 	INIT_LIST_HEAD(&dst->devices);
 	dst->seed = NULL;
+	dst->spare = src->spare;
 
 	list_for_each_entry(cur_dev, &src->devices, dev_list) {
 		dev_copy = malloc(sizeof(*dev_copy));
-- 
2.4.1

* [PATCH 4/4] btrfs-progs: add global spare device list to filesystem show
  2015-11-09 10:58 ` [PATCH 0/4] btrfs-progs: Hot spare and Auto replace Anand Jain
                     ` (2 preceding siblings ...)
  2015-11-09 10:58   ` [PATCH 3/4] btrfs-progs: add fi show for spare Anand Jain
@ 2015-11-09 10:58   ` Anand Jain
  3 siblings, 0 replies; 43+ messages in thread
From: Anand Jain @ 2015-11-09 10:58 UTC (permalink / raw)
  To: linux-btrfs

This patch adds the list of spare devices to the filesystem show
output, as shown in the example below.

btrfs fi show
Label: none  uuid: 17f7d403-17d7-4f0a-b8ba-de673fdd3f56
	Total devices 2 FS bytes used 15.88MiB
	devid    1 size 2.00GiB used 417.50MiB path /dev/sdc
	devid    2 size 2.00GiB used 417.50MiB path /dev/sdd

Global spare
	device size 3.00GiB path /dev/sde

btrfs-progs v4.2.3-12-gb5f4b68

Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
 cmds-filesystem.c | 5 +++++
 utils.h           | 1 +
 2 files changed, 6 insertions(+)

diff --git a/cmds-filesystem.c b/cmds-filesystem.c
index 11d0406..651ffe4 100644
--- a/cmds-filesystem.c
+++ b/cmds-filesystem.c
@@ -920,6 +920,11 @@ devs_only:
 					struct btrfs_fs_devices, list);
 		free_fs_devices(fs_devices);
 	}
+
+	if (where == -1 && search == NULL) {
+		ret = print_spare_device(unit_mode);
+		printf("\n");
+	}
 out:
 	printf("%s\n", PACKAGE_STRING);
 	free_seen_fsid();
diff --git a/utils.h b/utils.h
index a84cf2d..b833390 100644
--- a/utils.h
+++ b/utils.h
@@ -271,5 +271,6 @@ const char *get_argv0_buf(void);
 
 unsigned int get_unit_mode_from_arg(int *argc, char *argv[], int df_mode);
 int is_numerical(const char *str);
+int print_spare_device(unsigned unit_mode);
 
 #endif
-- 
2.4.1

* Re: [PATCH 00/15] btrfs: Hot spare and Auto replace
  2015-11-09 10:56 [PATCH 00/15] btrfs: Hot spare and Auto replace Anand Jain
                   ` (15 preceding siblings ...)
  2015-11-09 10:58 ` [PATCH 0/4] btrfs-progs: Hot spare and Auto replace Anand Jain
@ 2015-11-09 14:09 ` Austin S Hemmelgarn
  2015-11-09 21:29   ` Duncan
  2015-11-12  2:15 ` Qu Wenruo
                   ` (2 subsequent siblings)
  19 siblings, 1 reply; 43+ messages in thread
From: Austin S Hemmelgarn @ 2015-11-09 14:09 UTC (permalink / raw)
  To: Anand Jain, linux-btrfs

On 2015-11-09 05:56, Anand Jain wrote:
> These set of patches provides btrfs hot spare and auto replace support
> for you review and comments.
It's absolutely awesome to see that someone picked up this project; it's 
something that's very useful and helps BTRFS compete with many 
established storage technologies.  I've got some specific questions below.
>
> First, here below are the simple example steps to configure the same:
>
> Add a spare device:
>      btrfs spare add /dev/sde -f
>
> OR if there is a spare device which is already added before the, just
> run
>
>      btrfs dev scan [/dev/sde]
>
> this will register the spare device to the kernel.
>
>      btrfs fi show
>      Label: none  uuid: 52f170c1-725c-457d-8cfd-d57090460091
> 	Total devices 2 FS bytes used 112.00KiB
> 	devid    1 size 2.00GiB used 417.50MiB path /dev/sdc
> 	devid    2 size 2.00GiB used 417.50MiB path /dev/sdd
>
>      Global spare
> 	device size 3.00GiB path /dev/sde
Would I be correct in assuming that we can have more than one hot-spare 
device at a time?  If so, what method is used to select which one to use 
when one is needed?
>
> Thats it.
>
> Auto replace:
>   Replace happens automatically, that is when there is any write
>   failed or flush failed, the device will be marked as failed, which
>   will stop any further IO attempt to that device. And in the next commit
>   thread cycle the auto replace will pick the spare device (/dev/sde is
>   above example) to replace the failed device. And so the btrfs volume is
>   back to a healthy state.
Is there any possibility we could add a knob to control how many errors 
are needed before the device is marked as failed?  For an enterprise 
environment, immediately marking the device failed is the right thing to 
do, but for home usage it may make more sense to retry the I/O at least 
once before marking the device failed (especially considering that most 
home users don't have ECC memory, and a transient memory error can cause 
an I/O request to fail (I've actually had this happen on my laptop before)).
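A minimal sketch of such a knob, assuming a hypothetical per-device error
counter and threshold (none of these names come from the posted patches).
A threshold of 1 reproduces the fail-on-first-error behaviour described
above, while a larger value tolerates transient errors:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-device error tracking; not part of the posted patches. */
struct dev_err_state {
	unsigned int io_errors;	/* write/flush errors seen so far */
	unsigned int threshold;	/* errors tolerated before failing the device */
	bool failed;		/* set once the device is marked failed */
};

/*
 * Record one write or flush error. Returns true only for the error
 * that moves the device into the failed state; errors reported after
 * that are ignored.
 */
bool dev_record_error(struct dev_err_state *s)
{
	assert(s);
	if (s->failed)
		return false;
	s->io_errors++;
	if (s->io_errors >= s->threshold) {
		s->failed = true;
		return true;
	}
	return false;
}
```

With threshold set to 2, the first error is only counted and the second
one marks the device failed.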
>
>
> Its btrfs Global spare:
>   as of now only global hot spare is supported, that is hot spare(s)
>   are for all the btrfs FS in the system.
How hard would it be to eventually extend this to per-filesystem hot-spares?
>
> No spare when device failed:
>   It would scan for spare device at the rate of transaction commit
>   and will trigger the auto replace when ever spare device is added.
Does this absolutely have to be polled every commit?  This has serious 
potential to make running on a degraded array have a much bigger impact 
than it does now.  While we obviously want people to notice that their 
array is degraded, killing performance is not the proper way to do that. 
  Couldn't we have a callback when adding a hot-spare that would check 
for failed devices and initiate the replacement automatically for the 
first one found?  Ideally, we should keep the current behavior (assume 
the error was transient, and retry the I/O) when there is no hot-spare 
available.
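The callback idea could look roughly like the sketch below. It assumes a
hypothetical array of per-filesystem state and a hook that runs once when
a spare is registered; the structure and function names are illustrative
only and do not appear in the posted patches:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-filesystem state; the names are illustrative only. */
struct fs_state {
	int has_failed_dev;	/* a device is marked failed */
	int replace_running;	/* an auto replace is already in progress */
};

/*
 * Called once when a spare is registered, instead of polling on every
 * commit. Starts a replace on the first filesystem that has a failed
 * device and no replace in progress. Returns that filesystem's index,
 * or -1 if the spare stays in the pool.
 */
int on_spare_added(struct fs_state *fs, size_t nr_fs)
{
	size_t i;

	assert(fs || nr_fs == 0);
	for (i = 0; i < nr_fs; i++) {
		if (fs[i].has_failed_dev && !fs[i].replace_running) {
			fs[i].replace_running = 1;
			return (int)i;
		}
	}
	return -1;
}
```

The work is done only when a spare arrives, so a degraded array with no
spare pays nothing per commit.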


* Re: [PATCH 00/15] btrfs: Hot spare and Auto replace
  2015-11-09 14:09 ` [PATCH 00/15] btrfs: Hot spare and Auto replace Austin S Hemmelgarn
@ 2015-11-09 21:29   ` Duncan
  2015-11-10 12:13     ` Austin S Hemmelgarn
  0 siblings, 1 reply; 43+ messages in thread
From: Duncan @ 2015-11-09 21:29 UTC (permalink / raw)
  To: linux-btrfs

Austin S Hemmelgarn posted on Mon, 09 Nov 2015 09:09:07 -0500 as
excerpted:

>>   btrfs fi show
>>   Label: none  uuid: 52f170c1-725c-457d-8cfd-d57090460091
>>    Total devices 2 FS bytes used 112.00KiB
>>    devid    1 size 2.00GiB used 417.50MiB path /dev/sdc 
>>    devid    2 size 2.00GiB used 417.50MiB path /dev/sdd
>>
>>   Global spare
>>    device size 3.00GiB path /dev/sde

First of all, thanks from me too, AJ, for this very nice new feature. =:^)

> Would I be correct in assuming that we can have more than one hot-spare
> device at a time?  If so, what method is used to select which one to use
> when one is needed?

In the later patches overview section, patches 10,11,12,13/15 paragraph, 
AJ mentions a helper function to pick/release a spare device from/to the 
spare devices pool.  That would appear to be patch 13, provide framework 
to get and put a spare device.

Which means yes, multiple hot-spares are (at least planned to be) 
allowed. =:^)

While I'm not a coder and could very well be misinterpreting this, 
reading the btrfs_get_spare_device function in patch 13, I see 
a comment that goes like this:

>> /* as of now there is only one device in the spare fs_devices */

I don't read C well enough to know whether that's a comment on the 
internal progress in the function (tho I don't see any obvious hints to 
indicate that), or whether it can be taken at face value, that right now 
there's only provision for one in the "pool" (seems the more obvious 
interpretation).

So unless my lack of C skills is deceiving me, while a pool is intended, 
current patch implementation status simply assumes a spare pool of one, 
and the first spare found is picked. The put function in the same patch 
doesn't appear to have a limit on the number of spares that can be added, 
so assuming the current pool implementation allows it, more than one 
spare can be added to the pool, but as I said, the get function appears 
to assume just one in the pool, so picks the first spare it finds.


Which is quite reasonable for a first patch series posting that may well 
require additional iterations, particularly so given the get helper 
function is already nicely modularized so adding more complex picker 
logic should be relatively simple.


Not that targeting particular use-cases is appropriate at this point, but 
simply for information purposes, my particular use-case is a bunch of 
different size independent raid1 btrfs on partitions, but with the 
devices composing each raid1 of identical size.  I think a reasonably 
simple picker logic optimization would be to first check if there's a 
spare matching the size of the failing device, and use it in preference 
to others of different sizes if so.

Given my partitioned usage, a failing physical device will trigger a 
whole slew of failing btrfs logical devices (partitions on that physical 
device), so in order for this feature to be of much use to me I'd have 
to maintain a whole series of spares, one for each btrfs logical device 
on a partition on the failing physical device, since they'd all fail at 
once.

Since those partitions and the btrfs on top of them are different sizes, 
a size-matching logic lets me partition the physical spare identically to 
the operational devices and simply add all the partitions to the spare 
list, while without size-matching logic, to ensure a large enough spare 
was picked for the largest btrfs, I'd have to make all the spares that 
size, and they'd no longer all fit on a single physical device of the 
same size as the originals, possibly not even on two physical devices 
that size.

At least at the non-enterprise level, size-similar picking logic would 
seem to be pretty useful if not feature critical, then, and given that 
it /should/ be reasonably simple to implement, I'd hope that doing so 
becomes a priority, tho certainly an initial first-pick base 
implementation to which size-similar logic can be added later, is fine as 
well.  I'd just hope that "later" is within a couple kernel cycles, not a 
couple kernel major version cycles (~4 years each with bumps at .20).
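As a rough sketch of the size-matching picker described above, assuming a
hypothetical flat array of spares (the posted patches keep spares in
fs_devices, and every name here is illustrative): choosing the smallest
free spare that is still large enough prefers an exact size match
whenever one exists.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of one spare device; not the kernel's structure. */
struct spare {
	uint64_t size;	/* device size in bytes */
	int in_use;	/* non-zero once claimed by a replace */
};

/*
 * Return the index of the smallest free spare that is at least
 * failed_size bytes, or -1 if none fits. An exact size match wins
 * automatically because it is the smallest spare that fits. Ties go
 * to the spare that appears first in the pool.
 */
int pick_spare(const struct spare *pool, size_t n, uint64_t failed_size)
{
	int best = -1;
	size_t i;

	assert(pool || n == 0);
	for (i = 0; i < n; i++) {
		if (pool[i].in_use || pool[i].size < failed_size)
			continue;
		if (best < 0 || pool[i].size < pool[best].size)
			best = (int)i;
	}
	return best;
}
```

With this rule, a spare disk partitioned identically to the operational
disks would hand each failed partition its same-size counterpart.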

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


* Re: [PATCH 00/15] btrfs: Hot spare and Auto replace
  2015-11-09 21:29   ` Duncan
@ 2015-11-10 12:13     ` Austin S Hemmelgarn
  2015-11-13 10:17       ` Anand Jain
  0 siblings, 1 reply; 43+ messages in thread
From: Austin S Hemmelgarn @ 2015-11-10 12:13 UTC (permalink / raw)
  To: Duncan, linux-btrfs

On 2015-11-09 16:29, Duncan wrote:
> Austin S Hemmelgarn posted on Mon, 09 Nov 2015 09:09:07 -0500 as
> excerpted:
>
>>>    btrfs fi show
>>>    Label: none  uuid: 52f170c1-725c-457d-8cfd-d57090460091
>>>     Total devices 2 FS bytes used 112.00KiB
>>>     devid    1 size 2.00GiB used 417.50MiB path /dev/sdc
>>>     devid    2 size 2.00GiB used 417.50MiB path /dev/sdd
>>>
>>>    Global spare
>>>     device size 3.00GiB path /dev/sde
>
> First of all, thanks from me too, AJ, for this very nice new feature. =:^)
>
>> Would I be correct in assuming that we can have more than one hot-spare
>> device at a time?  If so, what method is used to select which one to use
>> when one is needed?
>
> In the later patches overview section, patches 10,11,12,13/15 paragraph,
> AJ mentions a helper function to pick/release a spare device from/to the
> spare devices pool.  That would appear to be patch 13, provide framework
> to get and put a spare device.
>
> Which means yes, multiple hot-spares are (at least planned to be)
> allowed. =:^)
Ah, you're right, somehow I missed that bit.
>
> While I'm not a coder and could very well be misinterpreting this,
> however, reading the btrfs_get_spare_device function in patch 13, there's
> a comment that goes like this:
>
>>> /* as of now there is only one device in the spare fs_devices */
>
> I don't read C well enough to know whether that's a comment on the
> internal progress in the function (tho I don't see any obvious hints to
> indicate that), or whether it can be taken at face value, that right now
> there's only provision for one in the "pool" (seems the more obvious
> interpretation).
>
> So unless my lack of C skills is deceiving me, while a pool is intended,
> current patch implementation status simply assumes a spare pool of one,
> and the first spare found is picked. The put function in the same patch
> doesn't appear to have a limit on the number of spares that can be added,
> so assuming the current pool implementation allows it, more than one
> spare can be added to the pool, but as I said, the get function appears
> to assume just one in the pool, so picks the first spare it finds.
AFAICT, you are correct.  I hadn't yet gotten a chance to look at the 
actual code, so I hadn't seen this yet.
>
> At least at the non-enterprise level, size-similar picking logic would
> seem to be pretty useful if not feature critical, then, and given that
> it /should/ be reasonably simple to implement, I'd hope that doing so
> becomes a priority, tho certainly an initial first-pick base
> implementation to which size-similar logic can be added later, is fine as
> well.  I'd just hope that "later" is within a couple kernel cycles, not a
> couple kernel major version cycles (~4 years each with bumps at .20).
>
Hopefully, per-filesystem hot-spares will be a high priority too, as 
that type of usage is pretty much required for many enterprise type 
uses, although that doesn't need to be the same code doing it (in fact, 
I could see having per-fs spares and global spares both available 
potentially being very useful).


* Re: [PATCH 00/15] btrfs: Hot spare and Auto replace
  2015-11-09 10:56 [PATCH 00/15] btrfs: Hot spare and Auto replace Anand Jain
                   ` (16 preceding siblings ...)
  2015-11-09 14:09 ` [PATCH 00/15] btrfs: Hot spare and Auto replace Austin S Hemmelgarn
@ 2015-11-12  2:15 ` Qu Wenruo
  2015-11-12  6:46   ` Duncan
                     ` (3 more replies)
  2015-11-12 19:21 ` Goffredo Baroncelli
  2015-11-16 13:41 ` Austin S Hemmelgarn
  19 siblings, 4 replies; 43+ messages in thread
From: Qu Wenruo @ 2015-11-12  2:15 UTC (permalink / raw)
  To: Anand Jain, linux-btrfs

Hi Anand,

Nice work.
But I have some small questions about it.

Anand Jain wrote on 2015/11/09 18:56 +0800:
> These set of patches provides btrfs hot spare and auto replace support
> for you review and comments.
>
> First, here below are the simple example steps to configure the same:
>
> Add a spare device:
>      btrfs spare add /dev/sde -f

I'm sorry but I didn't quite see the benefit of a spare device.

Let's take the following example:

1) 2 RAID1 + 1 spare
    (A + B) + C

2) 3 RAID1
    (A + B + C)
Let's assume each device is 12G in size, and there are 3 raid1 chunks,
each 3G in size.

In my understanding, in normal operation case:

For case 1), all raid1 chunks should only be allocated on the 2 RAID disks,
and the spare should contain no raid1 chunks.

   A       B       C
------  ------  ------
|free|  |free|  |free|
------  ------  |    |
|3Ga1|  |3Ga2|  |    |
------  ------  |    |
|3Gb1|  |3Gb2|  |    |
------  ------  |    |
|3Gc1|  |3Gc2|  |    |
------  ------  ------


For case 2), all raid1 chunks will be allocated into all 3 disks, making 
the allocation more fair.
   A       B       C
------  ------  ------
|free|  |free|  |free|
------  ------  ------
|free|  |free|  |free|
------  ------  ------
|3Gb2|  |3Ga1|  |3Ga2|
------  ------  ------
|3Gc1|  |3Gc2|  |3Gb1|
------  ------  ------


At least in the normal operation case, case 1) leaves device C idle and 
reduces the total usable space.

In disk B failure case:

For case 1), we can auto replace B with C.
That copies all data chunks from A to C,
so 9G of data needs to be copied.

And after replace:
   A       B       C
------  ------  ------
|free|  | X  |  |free|
------  ------  ------
|3Ga1|  | X  |->|3Ga2|
------  ------  ------
|3Gb1|  | X  |->|3Gb2|
------  ------  ------
|3Gc1|  | X  |->|3Gc2|
------  ------  ------



For case 2), we can just relocate and recover the bad chunks in B.
It should only need to copy 6G of data.

And after the "recovery", it should be much the same as case 1):
   A       B       C
------  ------  ------
|free|  | X  |  |free|
------  ------  ------
|3Ga1|<\| X  |/>|3Gc2|
------  ------  ------
|3Gb2| || X  |/ |3Ga2|
------  ------  ------
|3Gc1| \| X  |  |3Gb1|
------  ------  ------


IIRC, the only benefit of a spare device is that we can ensure there is 
enough space for a device replace (if the failing device is no larger 
than the spare).

But the cost is more data copied during replace and unfair chunk allocation.

So I am not sure the benefit is worth the cost.
At least, enhancing chunk relocation to handle case 2) would 
need a much smaller code base.
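
The copy amounts in the two cases follow from a simple count, sketched
here with a hypothetical helper (not part of any patch): every chunk
stripe that lived on the failed device must be re-created from its
surviving mirror, so the cost is the number of such stripes times the
chunk size.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bytes that must be rewritten after a RAID1 device fails. The target
 * may be a spare (case 1) or free space on the remaining devices
 * (case 2); only the number of stripes on the failed device differs.
 */
uint64_t rebuild_bytes(unsigned int stripes_on_failed_dev, uint64_t chunk_size)
{
	/* Guard against overflow of the 64-bit product. */
	assert(chunk_size == 0 ||
	       stripes_on_failed_dev <= UINT64_MAX / chunk_size);
	return (uint64_t)stripes_on_failed_dev * chunk_size;
}
```

In case 1) device B holds 3 stripes of 3G each, giving 9G; in case 2) it
holds 2 stripes, giving 6G.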

Thanks,
Qu
>
> OR if there is a spare device which is already added before the, just
> run
>
>      btrfs dev scan [/dev/sde]
>
> this will register the spare device to the kernel.
>
>      btrfs fi show
>      Label: none  uuid: 52f170c1-725c-457d-8cfd-d57090460091
> 	Total devices 2 FS bytes used 112.00KiB
> 	devid    1 size 2.00GiB used 417.50MiB path /dev/sdc
> 	devid    2 size 2.00GiB used 417.50MiB path /dev/sdd
>
>      Global spare
> 	device size 3.00GiB path /dev/sde
>
> Thats it.
>
> Auto replace:
>   Replace happens automatically, that is when there is any write
>   failed or flush failed, the device will be marked as failed, which
>   will stop any further IO attempt to that device. And in the next commit
>   thread cycle the auto replace will pick the spare device (/dev/sde is
>   above example) to replace the failed device. And so the btrfs volume is
>   back to a healthy state.
>
>
> Its btrfs Global spare:
>   as of now only global hot spare is supported, that is hot spare(s)
>   are for all the btrfs FS in the system.
>
> No spare when device failed:
>   It would scan for spare device at the rate of transaction commit
>   and will trigger the auto replace when ever spare device is added.
>
> Priority:
>   In some future work there can be some chronological order to pick
>   a spare and the failed device.
>
>
> Patches:
>
> Kernel:
> First, it needs, Qu's per chunk missing device patchset,
> which is part of the set here and also there is a light optimization
> (patch 5/15) which was required as part of this enhancement.
>
> Next patches 7,8/15 brings in support, to manage the transition of
> devices from online (no state) to offline OR failed state dynamically.
> On top of static device state like the current "missing" state.
>
> Patch 9/15 fixes a bug where in we should have blocked the incompatible
> feature at the device scan/add level instead/also at in the mount level.
> This is because we don't have to bring a device into the device list,
> if it is incompatible.
>
> Next patches 10,11,12,13/15 adds support for Spare device. For the
> details on how to add a spare device kindly see further below.
> For kernel with out spare feature supported the spare device
> is kept away. And when the kernel supports the spare device, it will
> inhibit from mounting it. Further these patch set provides helper
> function to pick a spare device and release a spare device back to
> the spare device pool.
>
> Patch 14/15 provides function for auto replace, this is mainly
> from the existing replace code, and in the long run I see opportunity
> to merge these code with the replace code that is triggered from
> the user spare.
>
> Last 15/15, uses all these facilities, picks a failed device and
> triggers a auto replace in a kthread (casualty_kthread())
>
>
> Progs:
> Would need 4 patches as listed below.
>
>
> Known Bug:
>
> As now I see below stale kmem cache during module unload. Which
> I am digging.
> ------
> BUG btrfs_path (Not tainted): Objects remaining in btrfs_path on kmem_cache_close()
> ------
>
> Anand Jain (10):
>    btrfs: optimize btrfs_check_degradable() for calls outside of barrier
>    btrfs: introduce device dynamic state transition to offline or failed
>    btrfs: check device for critical errors and mark failed
>    btrfs: block incompatible optional features at scan
>    btrfs: introduce BTRFS_FEATURE_INCOMPAT_SPARE_DEV
>    btrfs: add check not to mount a spare device
>    btrfs: support btrfs dev scan for spare device
>    btrfs: provide framework to get and put a spare device
>    btrfs: introduce helper functions to perform hot replace
>    btrfs: check for failed device and hot replace
>
> Qu Wenruo (5):
>    btrfs: Introduce a new function to check if all chunks a OK for
>      degraded mount
>    btrfs: Do per-chunk check for mount time check
>    btrfs: Do per-chunk degraded check for remount
>    btrfs: Allow barrier_all_devices to do per-chunk device check
>    btrfs: Cleanup num_tolerated_disk_barrier_failures
>
>   fs/btrfs/ctree.h       |   7 +-
>   fs/btrfs/dev-replace.c | 116 ++++++++++++++++++++
>   fs/btrfs/dev-replace.h |   1 +
>   fs/btrfs/disk-io.c     | 211 +++++++++++++++++++++++-------------
>   fs/btrfs/disk-io.h     |   2 -
>   fs/btrfs/super.c       |  20 +++-
>   fs/btrfs/transaction.c |   3 +-
>   fs/btrfs/volumes.c     | 283 ++++++++++++++++++++++++++++++++++++++++++++++---
>   fs/btrfs/volumes.h     |  27 +++++
>   9 files changed, 571 insertions(+), 99 deletions(-)
>

* Re: [PATCH 00/15] btrfs: Hot spare and Auto replace
  2015-11-12  2:15 ` Qu Wenruo
@ 2015-11-12  6:46   ` Duncan
  2015-11-12 13:04   ` Austin S Hemmelgarn
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 43+ messages in thread
From: Duncan @ 2015-11-12  6:46 UTC (permalink / raw)
  To: linux-btrfs

Qu Wenruo posted on Thu, 12 Nov 2015 10:15:09 +0800 as excerpted:

> Anand Jain wrote on 2015/11/09 18:56 +0800:
>> These set of patches provides btrfs hot spare and auto replace support
>> for you review and comments.
>>
>> First, here below are the simple example steps to configure the same:
>>
>> Add a spare device:
>>      btrfs spare add /dev/sde -f
> 
> I'm sorry but I didn't quite see the benefit of a spare device.

You could ask the mdraid folks much the same question about spares there, 
and the answer would I think be very much the same...  I'll just present 
a couple points of the several that can be made.

Perhaps the biggest point for this particular case...

What you're forgetting is that the work here introduces the _global_ 
spare -- one spare device (or pool of devices) for the whole set of 
btrfs, no matter how many independent btrfs there happen to be on a 
machine.

Your example used just one filesystem, in which case this point is moot, 
but what of the case where there are two?  You can't have the 
same device be part of *both* filesystems.  What if the device is part of 
btrfs A, but btrfs B is the one that loses a device?  In your example, 
you're out of luck.  But as a global spare, the "extra" device doesn't 
become attached to a specific btrfs until one of the existing devices 
goes bad.  With working global spares, the first btrfs to have a bad 
device will see the spare and be able to grab it, no matter which of the 
two (or 10 or 100) separate btrfs it happens to be, as it's a _global_ 
spare, not actually attached to a specific btrfs until it is needed as a 
replacement.

By extension, there's the spare _pool_.  Suppose you have three separate 
btrfs and three separate "extra" devices.  You can attach one to each 
btrfs and be fine... if the existing devices all play nice and a second 
one doesn't go out on any of them until all three have had one device go 
out.  But what happens if one btrfs gets some real heavy unexpected use 
and loses three devices before the other two btrfs lose any?  With global 
spares, the unlucky btrfs can call for one at a time, and assuming 
there's time for it to fully integrate before the next one dies, it can 
call for the next and the next, and get all three, one at a time, without 
the admin having to worry about manually device deleting the second and 
third devices from their other btrfs, to attach to the unlucky/greedy one.

And that three btrfs, three-device global-spare-pool scenario, with an 
unlucky/greedy btrfs ending up getting all three spares, brings up a 
second point...

In that scenario without global hot-spares, say you've added one more 
device to what ends up the unlucky btrfs than it'd need, so with auto-
repair it can detect a failing device and automatically device-delete it 
down to its device-minimum (either due to raid level or due to 
capacity).  Now another device fails.  Oops!  Can't auto-repair now!

But in the global hot-spare-pool scenario, with one repair done, there are 
still two spares in the pool, so at the second device failure, the btrfs 
can automatically pull a second spare from the pool (where it is still 
available, rather than already attached to one of the other btrfs) and 
complete the second repair, still without admin intervention.  Same again 
for the third.
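
To make the pool argument concrete, here is a rough Python sketch (not 
part of the patch set; the function names and the failure sequence are 
made up for illustration).  It assumes, as above, that each replace 
finishes before the next device fails, and simply counts how many 
failures get repaired without the admin stepping in.

```python
def repairs_with_dedicated_spares(failures, spares_per_fs):
    """Each spare is tied to one filesystem; return (repaired, stranded)."""
    remaining = dict(spares_per_fs)
    repaired = 0
    stranded = 0
    for fs in failures:
        if remaining.get(fs, 0) > 0:
            remaining[fs] -= 1
            repaired += 1
        else:
            # This filesystem has no spare left, even if others still do.
            stranded += 1
    return repaired, stranded


def repairs_with_global_pool(failures, pool_size):
    """Any filesystem may take any spare; return (repaired, stranded)."""
    repaired = min(len(failures), pool_size)
    return repaired, len(failures) - repaired


# Three btrfs (A, B, C), three extra devices, and the unlucky btrfs A
# loses three devices in a row.
failures = ["A", "A", "A"]
dedicated = repairs_with_dedicated_spares(failures, {"A": 1, "B": 1, "C": 1})
pooled = repairs_with_global_pool(failures, 3)
print(dedicated)  # (1, 2)
print(pooled)     # (3, 0)
```

With one spare attached to each btrfs, only the first failure on A is 
repaired automatically; with the same three devices in a global pool, all 
three are.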

So an admin who doesn't want to have to intervene when he's supposedly on 
vacation can set up a queue of spares, and sure, if he's a good admin, 
when a device goes bad and a spare is automatically pulled in to replace 
it, he'll be notified, and he'll probably log in to check logs and see 
what happened, but no problem, there are still others in the queue.

In fact, at least here in the US, folk wisdom says that bad events 
(someone you know getting a disease like cancer or dying, devices in your 
machines going bad, friends having their significant others leave them...) 
always happen in threes, so once two have happened, people start 
wondering where the third one is going to occur.  A somewhat 
superstitious admin could therefore ensure he had four, well, he's 
cautious too, so make it five, global spares set up, just in case.  Then 
it wouldn't matter if the three 
devices going bad were all on the same btrfs, or one each on the three, 
or two on one and a third elsewhere, he'd still have two additional 
devices in the pool, just to cover his a** if the three /did/ go out.

Now about the time he loses a fourth, he had better be on the phone 
confirming a ticket home, but even then, he still has one left in the 
pool, as he was cautious, too, hopefully giving him time to actually 
/make/ it home before two more go out, leaving the pool empty and a btrfs 
a device down.  
And if he's /that/ unlucky, well, maybe he better make a call to his 
lawyer confirming his last will and testament before he steps on that 
plane, too. =:^(

Just a short mention of a third point, too.

Devices in the pool presumably will be idle and thus spun down, thus not 
already wearing out like they would be if they were already in use all 
that time they're in the spare pool.

Those are the biggest and most obvious ones I know of.  Talk to any good 
admin who has handled lots of raid and I'm sure they'll provide a few 
more.

FWIW, there's also a case to be made for spare pools that may not be 
global, but that can still be attached to more than one btrfs/raid, if 
desired.  Consider the case for two pools, one with fast but small ssds 
while the other has slow but large spinning rust, with the ability to map 
individual btrfs to one or the other pool, or to neither, for instance.  
But this patch series simply introduces the global pool and 
functionality, leaving such fancy additional functionality for later.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 00/15] btrfs: Hot spare and Auto replace
  2015-11-12  2:15 ` Qu Wenruo
  2015-11-12  6:46   ` Duncan
@ 2015-11-12 13:04   ` Austin S Hemmelgarn
  2015-11-13  1:07     ` Qu Wenruo
  2015-11-12 19:08   ` Goffredo Baroncelli
  2015-11-13 10:18   ` Anand Jain
  3 siblings, 1 reply; 43+ messages in thread
From: Austin S Hemmelgarn @ 2015-11-12 13:04 UTC (permalink / raw)
  To: Qu Wenruo, Anand Jain, linux-btrfs

On 2015-11-11 21:15, Qu Wenruo wrote:
> Hi Anand,
>
> Nice work.
> But I have some small questions about it.
>
> Anand Jain wrote on 2015/11/09 18:56 +0800:
>> These set of patches provides btrfs hot spare and auto replace support
>> for you review and comments.
>>
>> First, here below are the simple example steps to configure the same:
>>
>> Add a spare device:
>>      btrfs spare add /dev/sde -f
>
> I'm sorry but I didn't quite see the benefit of a spare device.
Aside from what Duncan said (and I happen to agree with him), there is 
also the fact that hot-spares are (at least traditionally in most RAID 
systems) usually used with RAID5 or RAID6 (or some other parity scheme).

So, to summarize:
1. Hot spares are more useful for most users in a global context, and in 
that case only if they have more than one filesystem.
2. A pool of hot spares is even more useful.
3. Assuming whole disk usage (as opposed to partitioning), the hot spare 
will have no load on it until it gets used, at which point it will 
almost always be in better physical condition than the device it 
replaced (which is important for HA systems, in such cases you replace 
the disk that failed, and make the new disk a hot spare)
4. Hot spares are more often used (at least from what I've seen) on 
parity based raid systems than raid1.

In the rather limited case you outlined, I would probably just use raid1 
across all three devices myself (unless they were whole disks and not 
individual partitions, in which case I'd use a hot spare), but looking 
beyond that at my actual usage of BTRFS (multiple filesystems with 
multiple different raid profiles, spread across various disks), hot 
spares are definitely useful (although they would be more useful if I 
could specify that a given hot spare be used only for a given set of 
filesystems).



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 00/15] btrfs: Hot spare and Auto replace
  2015-11-12  2:15 ` Qu Wenruo
  2015-11-12  6:46   ` Duncan
  2015-11-12 13:04   ` Austin S Hemmelgarn
@ 2015-11-12 19:08   ` Goffredo Baroncelli
  2015-11-13 10:18   ` Anand Jain
  3 siblings, 0 replies; 43+ messages in thread
From: Goffredo Baroncelli @ 2015-11-12 19:08 UTC (permalink / raw)
  To: Qu Wenruo, Anand Jain, linux-btrfs

On 2015-11-12 03:15, Qu Wenruo wrote:
> Hi Anand,
> 
> Nice work.
> But I have some small questions about it.
> 
> Anand Jain wrote on 2015/11/09 18:56 +0800:
>> These set of patches provides btrfs hot spare and auto replace support
>> for you review and comments.
>>
>> First, here below are the simple example steps to configure the same:
>>
>> Add a spare device:
>>      btrfs spare add /dev/sde -f
> 
> I'm sorry but I didn't quite see the benefit of a spare device.
> 
> Let's take the following example:
> 
> 1) 2 RAID1 + 1 spare
>    (A + B) + C
> 
> 2) 3 RAID1
>    (A + B + C)
> Let's assume they are all 12G size, and there are 3 raid1 chunks.
> Each one is 3G size.
> 
> In my understanding, in normal operation case:
> 
> For case 1), all raid chunks should only be allocated into 2 RAID disks,
> and spare one should contains no raid1 chunks.
> 
>   A       B       C
> ------  ------  ------
> |free|  |free|  |free|
> ------  ------  |    |
> |3Ga1|  |3Ga2|  |    |
> ------  ------  |    |
> |3Gb1|  |3Gb2|  |    |
> ------  ------  |    |
> |3Gc1|  |3Gc2|  |    |
> ------  ------  ------
> 
> 
> For case 2), all raid1 chunks will be allocated into all 3 disks, making the allocation more fair.
>   A       B       C
> ------  ------  ------
> |free|  |free|  |free|
> ------  ------  ------
> |free|  |free|  |free|
> ------  ------  ------
> |3Gb2|  |3Ga1|  |3Ga2|
> ------  ------  ------
> |3Gc1|  |3Gc2|  |3Gb1|
> ------  ------  ------
> 
> 
> At least in normal operation case, case 1) makes device C useless, and reduce the total usable space.
> 
> In disk B failure case:
> 
> For case 1), we can auto replace B with C.
> And it will copy all data chunks from A to C.
> Need to copy 9G data.
> 
> And after replace:
>   A       B       C
> ------  ------  ------
> |free|  | X  |  |free|
> ------  ------  ------
> |3Ga1|  | X  |->|3Ga2|
> ------  ------  ------
> |3Gb1|  | X  |->|3Gb2|
> ------  ------  ------
> |3Gc1|  | X  |->|3Gc2|
> ------  ------  ------
> 
> 
> 
> For case 2), we can just relocate and recover the bad chunks in B.
> It it should only need to copy 6G data.
> 
> And after the "recovery", it should be much the same as case 1):
>   A       B       C
> ------  ------  ------
> |free|  | X  |  |free|
> ------  ------  ------
> |3Ga1|<\| X  |/>|3Gc1|
> ------  ------  ------
> |3Gb2| || X  |/ |3Ga2|
> ------  ------  ------
> |3Gc1| \| X  |  |3Gb1|
> ------  ------  ------
> 
> 
> IIRC, the only benefit of a spare device is, we can ensure there is enough space for a device place.(If the failing one is no larger than spare).
> 
> But the cost is, increase in replace data copy and unfair chunk allocation.
> 
> So I am not sure if the cost is good enough for the case.
> At least, enhancing the chunk relocation to fulfill the case 2) will bring a much smaller code base.
> 
> Thanks,
> Qu

Interesting analysis. Another difference between the two scenarios is that in the first case (A+B+spare) the spare doesn't work until it is needed: less power consumption, and when it is needed you are using a new disk instead of a used one.
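
For reference, the 9G vs 6G copy figures in the quoted analysis can be reproduced with a few lines of Python. This is only an illustration: the chunk placement is transcribed from the two diagrams above, and the names (`case1`, `case2`, `gib_to_copy`) are made up here.

```python
CHUNK_GIB = 3  # every raid1 chunk copy in the example is 3G

# device -> chunk copies stored on it, as drawn in the quoted diagrams
case1 = {"A": ["a1", "b1", "c1"], "B": ["a2", "b2", "c2"], "C": []}  # C is the idle spare
case2 = {"A": ["b2", "c1"], "B": ["a1", "c2"], "C": ["a2", "b1"]}    # 3-device raid1


def gib_to_copy(layout, failed):
    """Data that must be re-mirrored elsewhere when `failed` dies."""
    return len(layout[failed]) * CHUNK_GIB


print(gib_to_copy(case1, "B"))  # 9
print(gib_to_copy(case2, "B"))  # 6
```

The difference comes only from how many chunk copies lived on the failed device: three in the spare layout, two when the chunks are spread over all three disks.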

[...]


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 00/15] btrfs: Hot spare and Auto replace
  2015-11-09 10:56 [PATCH 00/15] btrfs: Hot spare and Auto replace Anand Jain
                   ` (17 preceding siblings ...)
  2015-11-12  2:15 ` Qu Wenruo
@ 2015-11-12 19:21 ` Goffredo Baroncelli
  2015-11-13 10:20   ` Anand Jain
  2015-11-16 13:41 ` Austin S Hemmelgarn
  19 siblings, 1 reply; 43+ messages in thread
From: Goffredo Baroncelli @ 2015-11-12 19:21 UTC (permalink / raw)
  To: Anand Jain, linux-btrfs

On 2015-11-09 11:56, Anand Jain wrote:
> These set of patches provides btrfs hot spare and auto replace support
> for you review and comments.

Hi Anand,

Is there any reason to put this kind of logic in kernel space? I think that it could be simpler to create a daemon which checks the disks and starts a replace when needed...

The pool policy could be more sophisticated: some filesystems could require a "dedicated" pool (for example because the disks are in the same enclosure); in other cases a global pool may be more useful.

Another feature of this daemon could be to add a disk when the disk space is too low, or to start a balance when there is no space to allocate further chunks.....

Of course all this logic could be implemented in kernel space, but I think that we should avoid that when possible. Moreover, logging is easier in user space....
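
A very rough sketch of the decision part of such a daemon, in Python. The assumptions are mine and would need checking: it parses text in the `[device].counter value` form that `btrfs device stats` prints, treats any write or flush error as a failure (mirroring the rule in the cover letter), and only builds `btrfs replace start` command lines without running them. The helper names and the spare list are hypothetical, and a real daemon would also have to deal with devices that have vanished completely.

```python
import re

# One line of `btrfs device stats` output, e.g. "[/dev/sdc].write_io_errs   0"
STAT_LINE = re.compile(r"^\[(?P<dev>[^\]]+)\]\.(?P<counter>\w+)\s+(?P<value>\d+)$")

# Counters treated as fatal here (an assumption following the cover letter:
# a failed write or flush marks the device as failed).
FATAL_COUNTERS = {"write_io_errs", "flush_io_errs"}


def failed_devices(stats_text):
    """Return devices with a non-zero fatal counter, in order of appearance."""
    failed = []
    for line in stats_text.splitlines():
        match = STAT_LINE.match(line.strip())
        if match is None:
            continue
        if (match.group("counter") in FATAL_COUNTERS
                and int(match.group("value")) > 0
                and match.group("dev") not in failed):
            failed.append(match.group("dev"))
    return failed


def plan_replacements(failed, spare_pool, mount_point):
    """Pair each failed device with the next free spare; return argv lists."""
    spares = list(spare_pool)
    commands = []
    for dev in failed:
        if not spares:
            break  # pool exhausted, the remaining failures need the admin
        commands.append(["btrfs", "replace", "start", dev, spares.pop(0), mount_point])
    return commands


sample = """\
[/dev/sdc].write_io_errs    0
[/dev/sdc].read_io_errs     0
[/dev/sdc].flush_io_errs    0
[/dev/sdd].write_io_errs    7
[/dev/sdd].flush_io_errs    1
"""
print(plan_replacements(failed_devices(sample), ["/dev/sde"], "/mnt"))
```

The pool policy (dedicated vs global) would then just be a question of which `spare_pool` list is passed in for which filesystem.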


Only my 2¢...

BR
G.Baroncelli
[...]

-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 00/15] btrfs: Hot spare and Auto replace
  2015-11-12 13:04   ` Austin S Hemmelgarn
@ 2015-11-13  1:07     ` Qu Wenruo
  2015-11-13 10:20       ` Anand Jain
  0 siblings, 1 reply; 43+ messages in thread
From: Qu Wenruo @ 2015-11-13  1:07 UTC (permalink / raw)
  To: Austin S Hemmelgarn, Anand Jain, linux-btrfs



Austin S Hemmelgarn wrote on 2015/11/12 08:04 -0500:
> On 2015-11-11 21:15, Qu Wenruo wrote:
>> Hi Anand,
>>
>> Nice work.
>> But I have some small questions about it.
>>
>> Anand Jain wrote on 2015/11/09 18:56 +0800:
>>> These set of patches provides btrfs hot spare and auto replace support
>>> for you review and comments.
>>>
>>> First, here below are the simple example steps to configure the same:
>>>
>>> Add a spare device:
>>>      btrfs spare add /dev/sde -f
>>
>> I'm sorry but I didn't quite see the benefit of a spare device.
> Aside from what Duncan said (and I happen to agree with him), there is
> also the fact that hot-spares are (at least traditionally in most RAID
> systems) usually used with RAID5 or RAID6 (or some other parity scheme).
>
> So, to summarize:
> 1. Hot spares are more useful for most users in global context, and in
> that case only if they have more than one filesystem.
> 2. A pool of hot spares is even more useful.

Agreed, just as Duncan said.
Although only one spare device is supported so far.

> 3. Assuming whole disk usage (as opposed to partitioning), the hot spare
> will have no load on it until it gets used, at which point it will
> almost always be in better physical condition than the device it
> replaced (which is important for HA systems, in such cases you replace
> the disk that failed, and make the new disk a hot spare)

OK, that's also right, as long as no one is calling btrfs dev scan at an interval.

> 4. Hot spares are more often used (at least from what I've seen) on
> parity based raid systems than raid1.

I'm not familiar with parity based RAID5/6 in btrfs, so I can't say for 
sure.
But considering the chunk based RAID feature of btrfs, I think parity 
based RAID in btrfs is not that different from the current btrfs RAID1.
The only difference is stripe size: whole chunk size (RAID1) vs real 
stripe size (btrfs RAID5/6).

And if btrfs supported specifying the number of disks used in raid5/6 
chunk allocation, for example using only 3 devices to allocate a raid5 
chunk even when there are 4 devices, it would be much the same case.

I chose btrfs raid1 as the example in my mail just because btrfs raid1 
will only use 2 devices per chunk no matter how many devices are in the 
filesystem.

So I'm very curious why parity based RAID is often used with hot spares.

Thanks,
Qu

>
> In the rather limited case you outlined, I would probably just use raid1
> across all three devices myself (unless they were whole disks and not
> individual partitions, in which case I'd use a hot spare), but looking
> beyond that at my actual usage of BTRFS (multiple filesystems with
> multiple different raid profiles, spread across various disks), hot
> spares are definitely useful (although they would be more useful if I
> could specify that a given hot spare be used only for a given set of
> filesystems).
>
>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 00/15] btrfs: Hot spare and Auto replace
  2015-11-10 12:13     ` Austin S Hemmelgarn
@ 2015-11-13 10:17       ` Anand Jain
  2015-11-13 12:25         ` Austin S Hemmelgarn
  2015-11-15 18:10         ` Christoph Anton Mitterer
  0 siblings, 2 replies; 43+ messages in thread
From: Anand Jain @ 2015-11-13 10:17 UTC (permalink / raw)
  To: Austin S Hemmelgarn, Duncan; +Cc: linux-btrfs



Thanks for the comments.

  Sorry for the delay.
  Trying to find out if there is any pending concerns...

> Hopefully, per-filesystem hot-spares will be a high priority too, as
> that type of usage is pretty much required for many enterprise type
> uses, although that doesn't need to be the same code doing it (in fact,
> I could see having per-fs spares and global spares both available
> potentially being very useful).

  That's doable within the current design as well; however, stability
  and hardening (fixing the possible loopholes) are the priority.

Thanks, Anand

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 00/15] btrfs: Hot spare and Auto replace
  2015-11-12  2:15 ` Qu Wenruo
                     ` (2 preceding siblings ...)
  2015-11-12 19:08   ` Goffredo Baroncelli
@ 2015-11-13 10:18   ` Anand Jain
  3 siblings, 0 replies; 43+ messages in thread
From: Anand Jain @ 2015-11-13 10:18 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs



Thanks for the comments.

> Let's take the following example:
>
> 1) 2 RAID1 + 1 spare
>     (A + B) + C
>
> 2) 3 RAID1
>     (A + B + C)

> At least in normal operation case, case 1) makes device C useless, and

  Yes.

> For case 2), we can just relocate and recover the bad chunks in B.
> It it should only need to copy 6G data.

  Case 2 is wrong in the context of a spare.

  Unless space usage is limited to 1/3 of total space.
  But when you do that, the case 1 drawback applies equally
  to case 2 as well.


 > It it should only need to copy 6G data.

   That's true as long as you don't replace the failed B
   and bring the configuration back to its original state. However,
   when you do that, the data moved will be more than in case 1.
   So this is not fully correct.
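
   One way to read the 1/3 figure, sketched in Python. This is an
   interpretation, not something stated in the patches: it assumes three
   equal 12G disks as in the example, and that raid1 on equal-sized
   devices can hold half of the active raw space. The function name is
   made up for the illustration.

```python
from fractions import Fraction

DISK_GIB = 12
DISKS = 3
RAW_GIB = DISK_GIB * DISKS  # 36G of raw space in both cases


def raid1_usable_gib(active_disks, disk_gib=DISK_GIB):
    """Data that fits with two copies on `active_disks` equal-sized devices."""
    if active_disks < 2:
        return 0
    return active_disks * disk_gib // 2


case1_usable = raid1_usable_gib(2)       # A + B active, C idle as spare
case2_usable = raid1_usable_gib(3)       # A + B + C all active
case2_recoverable = raid1_usable_gib(2)  # what still fits mirrored after one disk dies

print(Fraction(case1_usable, RAW_GIB))       # 1/3
print(Fraction(case2_usable, RAW_GIB))       # 1/2
print(Fraction(case2_recoverable, RAW_GIB))  # 1/3
```

   So case 2 can only re-mirror everything after a failure if it was
   holding no more data than case 1 could hold in the first place.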

Thanks, Anand

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 00/15] btrfs: Hot spare and Auto replace
  2015-11-12 19:21 ` Goffredo Baroncelli
@ 2015-11-13 10:20   ` Anand Jain
  2015-11-14 11:05     ` Goffredo Baroncelli
  0 siblings, 1 reply; 43+ messages in thread
From: Anand Jain @ 2015-11-13 10:20 UTC (permalink / raw)
  To: kreijack, linux-btrfs


Thanks for comments.

On 11/13/2015 03:21 AM, Goffredo Baroncelli wrote:
> On 2015-11-09 11:56, Anand Jain wrote:
>> These set of patches provides btrfs hot spare and auto replace support
>> for you review and comments.
>
> Hi Anand,
>
> is there any reason to put this kind of logic in the kernel space ?
> I think that it could be more simply to create a daemon which checks
 > the disks and when needed it starts a replace...

> The pool policy could be more sophisticated: some filesystem could
 > require a "dedicated" pool (for example because the disks are in the
 > same enclosure); in other case a global pool may be more useful.

  That's true. It can be added as an enhancement on top of the current
  implementation; I will, if time permits. The current priority is
  stability: working out what could possibly go wrong (in configuring)
  and how robust the code is against it.

> Another feature of this daemon could be to add a disk when the disk
 > space is too low,

  That will be at the cost of a spare device, so should the user review
  the trade-offs and do it manually? I am not sure.

> or to start a balance when there is no space to
 > allocate further chunk.....

  Yep. As you may have noticed, the thread created here is
  casualty_kthread() (instead of replace_kthread()); in the long run
  I wish to provide that feature in this thread, as it is mutually
  exclusive with replace.

> Of course all these logic could be implemented in kernel space,
 > but I think that we should avoid that when possible.

  It is easy to handle the mutually exclusive parts within the kernel,
  and it's better to have the important logic in one place. Two heads
  running one organization, each seeing and feeling different things,
  will lead to wrong decisions.

> Moreover in  user space the logging is more easy....

Thanks, Anand

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 00/15] btrfs: Hot spare and Auto replace
  2015-11-13  1:07     ` Qu Wenruo
@ 2015-11-13 10:20       ` Anand Jain
  2015-11-14  0:54         ` Qu Wenruo
  0 siblings, 1 reply; 43+ messages in thread
From: Anand Jain @ 2015-11-13 10:20 UTC (permalink / raw)
  To: Qu Wenruo, Austin S Hemmelgarn, linux-btrfs



Thanks for commenting.

>>> I'm sorry but I didn't quite see the benefit of a spare device.
>> Aside from what Duncan said (and I happen to agree with him), there is
>> also the fact that hot-spares are (at least traditionally in most RAID
>> systems) usually used with RAID5 or RAID6 (or some other parity scheme).
>>
>> So, to summarize:
>> 1. Hot spares are more useful for most users in global context, and in
>> that case only if they have more than one filesystem.
>> 2. A pool of hot spares is even more useful.
>
> Agreed, just as Ducan said.
> Although only one spare device is supported yet.

  You can add more than one spare device currently.

>> 3. Assuming whole disk usage (as opposed to partitioning), the hot spare
>> will have no load on it until it gets used, at which point it will
>> almost always be in better physical condition than the device it
>> replaced (which is important for HA systems, in such cases you replace
>> the disk that failed, and make the new disk a hot spare)
>
> OK, that's also right, if no one is calling btrfs dev scan with a interval.

   Not too sure what you mean about the scan part.

Thanks, Anand

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 00/15] btrfs: Hot spare and Auto replace
  2015-11-13 10:17       ` Anand Jain
@ 2015-11-13 12:25         ` Austin S Hemmelgarn
  2015-11-15 18:10         ` Christoph Anton Mitterer
  1 sibling, 0 replies; 43+ messages in thread
From: Austin S Hemmelgarn @ 2015-11-13 12:25 UTC (permalink / raw)
  To: Anand Jain, Duncan; +Cc: linux-btrfs

On 2015-11-13 05:17, Anand Jain wrote:
>
>
> Thanks for the comments.
>
>   Sorry for the delay.
>   Trying to find out if there is any pending concerns...
FWIW, I'm planning on setting up a VM to test this over the weekend (I 
would have already, but I've been kind of busy at work this week), so 
I'll hopefully have some more feedback on Monday.
>
>> Hopefully, per-filesystem hot-spares will be a high priority too, as
>> that type of usage is pretty much required for many enterprise type
>> uses, although that doesn't need to be the same code doing it (in fact,
>> I could see having per-fs spares and global spares both available
>> potentially being very useful).
>
>   That's doable with in the current design as well, however stability
>   and hardening (fixing the possible loop holes) is kind of priority.
Entirely understandable, I would actually be somewhat worried if 
stability and hardening weren't the priority right now.



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 00/15] btrfs: Hot spare and Auto replace
  2015-11-13 10:20       ` Anand Jain
@ 2015-11-14  0:54         ` Qu Wenruo
  2015-11-16 13:39           ` Austin S Hemmelgarn
  0 siblings, 1 reply; 43+ messages in thread
From: Qu Wenruo @ 2015-11-14  0:54 UTC (permalink / raw)
  To: Anand Jain, Qu Wenruo, Austin S Hemmelgarn, linux-btrfs



在 2015年11月13日 18:20, Anand Jain 写道:
>
>
> Thanks for commenting.
>
>>>> I'm sorry but I didn't quite see the benefit of a spare device.
>>> Aside from what Duncan said (and I happen to agree with him), there is
>>> also the fact that hot-spares are (at least traditionally in most RAID
>>> systems) usually used with RAID5 or RAID6 (or some other parity scheme).
>>>
>>> So, to summarize:
>>> 1. Hot spares are more useful for most users in global context, and in
>>> that case only if they have more than one filesystem.
>>> 2. A pool of hot spares is even more useful.
>>
>> Agreed, just as Ducan said.
>> Although only one spare device is supported yet.
>
>   You can add more than one spare device currently.
>
>>> 3. Assuming whole disk usage (as opposed to partitioning), the hot spare
>>> will have no load on it until it gets used, at which point it will
>>> almost always be in better physical condition than the device it
>>> replaced (which is important for HA systems, in such cases you replace
>>> the disk that failed, and make the new disk a hot spare)
>>
>> OK, that's also right, if no one is calling btrfs dev scan with a
>> interval.
>
>    Not too sure what you mean about the scan part.

Btrfs device scan needs to read the superblock of the device.
So the hot spare device won't really sleep for long, as each time 
btrfs scans devices, it will wake up the device.

Not sure about soft raid hot spares. Maybe they don't cause any IO on the 
device? Or maybe it's just the same as with btrfs hot spares.

Thanks,
Qu

>
> Thanks, Anand

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 00/15] btrfs: Hot spare and Auto replace
  2015-11-13 10:20   ` Anand Jain
@ 2015-11-14 11:05     ` Goffredo Baroncelli
  0 siblings, 0 replies; 43+ messages in thread
From: Goffredo Baroncelli @ 2015-11-14 11:05 UTC (permalink / raw)
  To: Anand Jain, linux-btrfs

On 2015-11-13 11:20, Anand Jain wrote:
> 
> Thanks for comments.
> 
> On 11/13/2015 03:21 AM, Goffredo Baroncelli wrote:
>> On 2015-11-09 11:56, Anand Jain wrote:
>>> These set of patches provides btrfs hot spare and auto replace support
>>> for you review and comments.
>>
>> Hi Anand,
>>
>> is there any reason to put this kind of logic in the kernel space ?
[...]
> 
>> Another feature of this daemon could be to add a disk when the disk
>> space is too low,
> 
>  That will be at the cost of a spare device which user should review
>  the trade-offs and do it manually ? I am not sure.

If you have more than one spare, you can do both automatically: a new
disk is added when space is low, and a disk is replaced in case of
failure. If you have only one spare, you may decide to reserve it for
replacing a failed disk. But this should be a configurable option: low
space leads to an unavailable filesystem, while a failed disk means a
higher likelihood of losing the whole filesystem. I am not sure which
is more critical.
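
As a rough illustration of the configurable policy described above, here
is a minimal sketch in C. All names (spare_policy, choose_spare_action,
reserve_last_for_replace) are hypothetical and exist neither in btrfs
nor in btrfs-progs. The sketch also assumes that a failed disk takes
precedence over low space when both happen at once, which is exactly
the point left open above.

```c
#include <stdbool.h>

/* Hypothetical actions a spare manager could take. */
enum spare_action {
	SPARE_NONE,		/* nothing to do, or no usable spare */
	SPARE_REPLACE_FAILED,	/* use a spare to replace a failed disk */
	SPARE_ADD_FOR_SPACE,	/* add a spare to grow a nearly full fs */
};

/* Hypothetical policy input; none of these fields exist in btrfs. */
struct spare_policy {
	int spares_available;
	bool reserve_last_for_replace;	/* the configurable option */
};

static enum spare_action choose_spare_action(const struct spare_policy *p,
					     bool disk_failed, bool space_low)
{
	if (p->spares_available <= 0)
		return SPARE_NONE;

	/* Assumption: a failed disk always wins over low space. */
	if (disk_failed)
		return SPARE_REPLACE_FAILED;

	if (space_low) {
		/* Keep the last spare for failures if configured to. */
		if (p->spares_available == 1 && p->reserve_last_for_replace)
			return SPARE_NONE;
		return SPARE_ADD_FOR_SPACE;
	}

	return SPARE_NONE;
}
```

With one spare and reserve_last_for_replace set, a low-space event is
ignored and the spare is kept for a later failure; with two spares the
same event consumes one of them.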

>> or to start a balance when there is no space to
>> allocate further chunk.....
> 
>  Yep. As you notice, the thread created here is casualty_kthread()
>  (instead of replace_kthread()) over the long run I wish to provide
>  that feature in this thread, as it is a mutually exclusive operations
>  with replace.

Replacing a disk should be a higher-priority operation. If a disk fails
during a balance/defrag, these operations should be stopped to allow a
replace.
If you want to start a replace, you should stop other long-running
operations like balance and defrag.
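
The priority rule above could be sketched roughly as follows. This is
only an illustration: the enum, op_priority() and may_start() are
made-up names, not btrfs identifiers, and the ordering (replace above
balance and defrag, which are equal) is an assumption taken from the
paragraph above.

```c
#include <stdbool.h>

/* Hypothetical set of long-running operations; not btrfs identifiers. */
enum long_op {
	OP_NONE,
	OP_DEFRAG,
	OP_BALANCE,
	OP_REPLACE,
};

/* Assumed ordering: replace outranks balance and defrag. */
static int op_priority(enum long_op op)
{
	switch (op) {
	case OP_REPLACE:
		return 2;
	case OP_BALANCE:
	case OP_DEFRAG:
		return 1;
	default:
		return 0;
	}
}

/*
 * Returns true if 'requested' may start. *cancel_running is set when
 * the currently running operation has to be stopped first.
 */
static bool may_start(enum long_op running, enum long_op requested,
		      bool *cancel_running)
{
	*cancel_running = false;

	if (requested == OP_NONE)
		return false;
	if (running == OP_NONE)
		return true;
	if (op_priority(requested) > op_priority(running)) {
		*cancel_running = true;
		return true;
	}
	return false;
}
```

A replace requested during a balance returns true with *cancel_running
set; a balance requested during a replace is refused.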

> 
>> Of course all these logic could be implemented in kernel space,
>> but I think that we should avoid that when possible.
> 
>  Easy to handle the mutually_exclusive parts with in the kernel
>  and Its better to have the important logic at one place. Two heads
>  operating on an org looking and feeling different things will lead
>  to wrong decisions.

Which other logic are you referring to?

> 
>> Moreover in  user space the logging is more easy....
> 
> Thanks, Anand


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5

* Re: [PATCH 00/15] btrfs: Hot spare and Auto replace
  2015-11-13 10:17       ` Anand Jain
  2015-11-13 12:25         ` Austin S Hemmelgarn
@ 2015-11-15 18:10         ` Christoph Anton Mitterer
  1 sibling, 0 replies; 43+ messages in thread
From: Christoph Anton Mitterer @ 2015-11-15 18:10 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Anand Jain

Hey.

You guys may want to update:
https://btrfs.wiki.kernel.org/index.php/Project_ideas#Hot_spare_support

Cheers,
Chris.

* Re: [PATCH 00/15] btrfs: Hot spare and Auto replace
  2015-11-14  0:54         ` Qu Wenruo
@ 2015-11-16 13:39           ` Austin S Hemmelgarn
  0 siblings, 0 replies; 43+ messages in thread
From: Austin S Hemmelgarn @ 2015-11-16 13:39 UTC (permalink / raw)
  To: Qu Wenruo, Anand Jain, Qu Wenruo, linux-btrfs

On 2015-11-13 19:54, Qu Wenruo wrote:
>
>
> On 2015-11-13 18:20, Anand Jain wrote:
>>
>>
>> Thanks for commenting.
>>
>>>>> I'm sorry but I didn't quite see the benefit of a spare device.
>>>> Aside from what Duncan said (and I happen to agree with him), there is
>>>> also the fact that hot-spares are (at least traditionally in most RAID
>>>> systems) usually used with RAID5 or RAID6 (or some other parity
>>>> scheme).
>>>>
>>>> So, to summarize:
>>>> 1. Hot spares are more useful for most users in global context, and in
>>>> that case only if they have more than one filesystem.
>>>> 2. A pool of hot spares is even more useful.
>>>
>>> Agreed, just as Duncan said.
>>> Although only one spare device is supported yet.
>>
>>   You can add more than one spare device currently.
>>
>>>> 3. Assuming whole disk usage (as opposed to partitioning), the hot
>>>> spare
>>>> will have no load on it until it gets used, at which point it will
>>>> almost always be in better physical condition than the device it
>>>> replaced (which is important for HA systems, in such cases you replace
>>>> the disk that failed, and make the new disk a hot spare)
>>>
>>> OK, that's also right, if no one is calling btrfs dev scan with a
>>> interval.
>>
>>    Not too sure what you mean about the scan part.
>
> Btrfs device scan will need to read the sb of the device.
> So the hot spare device won't really sleep for a long time as each time
> btrfs scan devices, it will wakeup the device.
Um, no, unless you have a device scan on a cron job, you will only scan 
at boot (and the disk will usually be running then anyway, because most 
firmware spins up all disks at boot), because using mkfs or (I assume) 
registering the hot spare the first time automatically registers it with 
the kernel module.
>
> Not sure about soft raid hot spare. Maybe they won't cause any IO on the
> device? Or just the same with btrfs hot spare.
That depends on the type of hot spare.  Most soft RAID systems use a
policy similar to this patch-set (let the hot spare sit there until it
is needed, then auto-replace when a device fails), but some use it
actively in the set without counting it as part of the capacity (I see
this mostly in RAID6 setups, where it just reshapes the array online to
exclude the failed device).




* Re: [PATCH 00/15] btrfs: Hot spare and Auto replace
  2015-11-09 10:56 [PATCH 00/15] btrfs: Hot spare and Auto replace Anand Jain
                   ` (18 preceding siblings ...)
  2015-11-12 19:21 ` Goffredo Baroncelli
@ 2015-11-16 13:41 ` Austin S Hemmelgarn
  2015-11-16 22:07   ` Anand Jain
  19 siblings, 1 reply; 43+ messages in thread
From: Austin S Hemmelgarn @ 2015-11-16 13:41 UTC (permalink / raw)
  To: Anand Jain, linux-btrfs

On 2015-11-09 05:56, Anand Jain wrote:
> These set of patches provides btrfs hot spare and auto replace support
> for you review and comments.
>
> First, here below are the simple example steps to configure the same:
>
> Add a spare device:
>      btrfs spare add /dev/sde -f
>
> OR if there is a spare device which is already added before the, just
> run
>
>      btrfs dev scan [/dev/sde]
>
> this will register the spare device to the kernel.
>
>      btrfs fi show
>      Label: none  uuid: 52f170c1-725c-457d-8cfd-d57090460091
> 	Total devices 2 FS bytes used 112.00KiB
> 	devid    1 size 2.00GiB used 417.50MiB path /dev/sdc
> 	devid    2 size 2.00GiB used 417.50MiB path /dev/sdd
>
>      Global spare
> 	device size 3.00GiB path /dev/sde
>
> Thats it.
>
> Auto replace:
>   Replace happens automatically, that is when there is any write
>   failed or flush failed, the device will be marked as failed, which
>   will stop any further IO attempt to that device. And in the next commit
>   thread cycle the auto replace will pick the spare device (/dev/sde is
>   above example) to replace the failed device. And so the btrfs volume is
>   back to a healthy state.
>
>
> Its btrfs Global spare:
>   as of now only global hot spare is supported, that is hot spare(s)
>   are for all the btrfs FS in the system.
>
> No spare when device failed:
>   It would scan for spare device at the rate of transaction commit
>   and will trigger the auto replace when ever spare device is added.
>
> Priority:
>   In some future work there can be some chronological order to pick
>   a spare and the failed device.
>
>
> Patches:
>
> Kernel:
> First, it needs, Qu's per chunk missing device patchset,
> which is part of the set here and also there is a light optimization
> (patch 5/15) which was required as part of this enhancement.
>
> Next patches 7,8/15 brings in support, to manage the transition of
> devices from online (no state) to offline OR failed state dynamically.
> On top of static device state like the current "missing" state.
>
> Patch 9/15 fixes a bug where in we should have blocked the incompatible
> feature at the device scan/add level instead/also at in the mount level.
> This is because we don't have to bring a device into the device list,
> if it is incompatible.
>
> Next patches 10,11,12,13/15 adds support for Spare device. For the
> details on how to add a spare device kindly see further below.
> For kernel with out spare feature supported the spare device
> is kept away. And when the kernel supports the spare device, it will
> inhibit from mounting it. Further these patch set provides helper
> function to pick a spare device and release a spare device back to
> the spare device pool.
>
> Patch 14/15 provides function for auto replace, this is mainly
> from the existing replace code, and in the long run I see opportunity
> to merge these code with the replace code that is triggered from
> the user spare.
>
> Last 15/15, uses all these facilities, picks a failed device and
> triggers a auto replace in a kthread (casualty_kthread())
>
>
> Progs:
> Would need 4 patches as listed below.
>
>
> Known Bug:
>
> As now I see below stale kmem cache during module unload. Which
> I am digging.
> ------
> BUG btrfs_path (Not tainted): Objects remaining in btrfs_path on kmem_cache_close()
> ------
>
> Anand Jain (10):
>    btrfs: optimize btrfs_check_degradable() for calls outside of barrier
>    btrfs: introduce device dynamic state transition to offline or failed
>    btrfs: check device for critical errors and mark failed
>    btrfs: block incompatible optional features at scan
>    btrfs: introduce BTRFS_FEATURE_INCOMPAT_SPARE_DEV
>    btrfs: add check not to mount a spare device
>    btrfs: support btrfs dev scan for spare device
>    btrfs: provide framework to get and put a spare device
>    btrfs: introduce helper functions to perform hot replace
>    btrfs: check for failed device and hot replace
>
> Qu Wenruo (5):
>    btrfs: Introduce a new function to check if all chunks a OK for
>      degraded mount
>    btrfs: Do per-chunk check for mount time check
>    btrfs: Do per-chunk degraded check for remount
>    btrfs: Allow barrier_all_devices to do per-chunk device check
>    btrfs: Cleanup num_tolerated_disk_barrier_failures
>
>   fs/btrfs/ctree.h       |   7 +-
>   fs/btrfs/dev-replace.c | 116 ++++++++++++++++++++
>   fs/btrfs/dev-replace.h |   1 +
>   fs/btrfs/disk-io.c     | 211 +++++++++++++++++++++++-------------
>   fs/btrfs/disk-io.h     |   2 -
>   fs/btrfs/super.c       |  20 +++-
>   fs/btrfs/transaction.c |   3 +-
>   fs/btrfs/volumes.c     | 283 ++++++++++++++++++++++++++++++++++++++++++++++---
>   fs/btrfs/volumes.h     |  27 +++++
>   9 files changed, 571 insertions(+), 99 deletions(-)
>
I've thrown everything I can think of at this over the weekend, and 
nothing broke (at least, nothing broke that had anything to do with 
these patches, I ended up triggering a couple of known bugs that I had 
completely forgotten about), so you can add:
Tested-by: Austin S. Hemmelgarn <ahferroin7@gmail.com>
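
For readers following the thread, the auto-replace flow described in the
quoted cover letter (a failed write or flush marks the device failed,
and the next commit cycle picks a spare if one is registered) can be
sketched as a small state machine. This is a simplified illustration,
not the code from the patches: sketch_dev, on_io_error() and
on_commit_cycle() are hypothetical names, and the spare pool is reduced
to a counter.

```c
#include <stdbool.h>

/* Simplified device states for illustration only. */
enum sketch_dev_state {
	DEV_ONLINE,
	DEV_FAILED,	/* no further IO is attempted */
	DEV_REPLACING,	/* a spare has been picked */
};

struct sketch_dev {
	enum sketch_dev_state state;
};

/* A failed write or flush marks the device failed; reads do not. */
static void on_io_error(struct sketch_dev *d, bool is_write_or_flush)
{
	if (is_write_or_flush && d->state == DEV_ONLINE)
		d->state = DEV_FAILED;
}

/*
 * Called once per commit cycle. Returns true if a replace was started.
 * With no spare registered it does nothing, so the check is simply
 * repeated on the next cycle until a spare is added.
 */
static bool on_commit_cycle(struct sketch_dev *d, int *spares_free)
{
	if (d->state != DEV_FAILED)
		return false;
	if (*spares_free <= 0)
		return false;

	(*spares_free)--;
	d->state = DEV_REPLACING;
	return true;
}
```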



* Re: [PATCH 00/15] btrfs: Hot spare and Auto replace
  2015-11-16 13:41 ` Austin S Hemmelgarn
@ 2015-11-16 22:07   ` Anand Jain
  2015-11-17 12:28     ` Austin S Hemmelgarn
  0 siblings, 1 reply; 43+ messages in thread
From: Anand Jain @ 2015-11-16 22:07 UTC (permalink / raw)
  To: Austin S Hemmelgarn; +Cc: linux-btrfs



On 11/16/2015 09:41 PM, Austin S Hemmelgarn wrote:
> On 2015-11-09 05:56, Anand Jain wrote:
>> [...]
> I've thrown everything I can think of at this over the weekend, and
> nothing broke (at least, nothing broke that had anything to do with
> these patches, I ended up triggering a couple of known bugs that I had
> completely forgotten about), so you can add:
> Tested-by: Austin S. Hemmelgarn <ahferroin7@gmail.com>
>

Thanks Austin.
Yeah I should fix the known bug as listed above.


* Re: [PATCH 00/15] btrfs: Hot spare and Auto replace
  2015-11-16 22:07   ` Anand Jain
@ 2015-11-17 12:28     ` Austin S Hemmelgarn
  0 siblings, 0 replies; 43+ messages in thread
From: Austin S Hemmelgarn @ 2015-11-17 12:28 UTC (permalink / raw)
  To: Anand Jain; +Cc: linux-btrfs

On 2015-11-16 17:07, Anand Jain wrote:
>
>
> On 11/16/2015 09:41 PM, Austin S Hemmelgarn wrote:
>> On 2015-11-09 05:56, Anand Jain wrote:
>>> [...]
>> I've thrown everything I can think of at this over the weekend, and
>> nothing broke (at least, nothing broke that had anything to do with
>> these patches, I ended up triggering a couple of known bugs that I had
>> completely forgotten about), so you can add:
>> Tested-by: Austin S. Hemmelgarn <ahferroin7@gmail.com>
>>
>
> Thanks Austin.
> Yeah I should fix the known bug as listed above.
>
Actually, while I did see that, I also ran into a couple of other bugs 
that are unrelated to these patches (including the balance related bug I 
was recently discussing in another thread on the ML, which (like 
everyone else it's hit) I've sadly been unable to reproduce).  None of 
the ones I hit other than the one you mentioned in the cover letter were 
anything new with these patches, and they didn't happen any more 
frequently with the patches.



* Re: [PATCH 06/15] btrfs: Cleanup num_tolerated_disk_barrier_failures
  2015-11-09 10:56 ` [PATCH 06/15] btrfs: Cleanup num_tolerated_disk_barrier_failures Anand Jain
@ 2015-12-05  7:16   ` Qu Wenruo
  0 siblings, 0 replies; 43+ messages in thread
From: Qu Wenruo @ 2015-12-05  7:16 UTC (permalink / raw)
  To: Anand Jain, linux-btrfs; +Cc: Chris Mason

Hi Anand,

Would you please push patches 1-6 of your hot spare patchset to Chris first?

In my opinion, it will take some time before details such as whether
to do hot-spare in the kernel or in user space are settled.

And these 6 patches are quite independent of the rest of the hot spare patchset,
so it would be OK to push them into mainline in this or the next merge window.

Thanks,
Qu

On 11/09/2015 06:56 PM, Anand Jain wrote:
> From: Qu Wenruo <quwenruo@cn.fujitsu.com>
>
> As we use per-chunk degradable check, now the global
> num_tolerated_disk_barrier_failures is of no use. So cleanup it.
>
> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
>
> [Btrfs: resolve conflict to apply 'btrfs: Cleanup num_tolerated_disk_barrier_failures']
> Signed-off-by: Anand Jain <anand.jain@oracle.com>
> ---
>   fs/btrfs/ctree.h   |  2 --
>   fs/btrfs/disk-io.c | 56 ------------------------------------------------------
>   fs/btrfs/disk-io.h |  2 --
>   fs/btrfs/volumes.c | 17 -----------------
>   4 files changed, 77 deletions(-)
>
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index a86051e..dedd3e0 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -1753,8 +1753,6 @@ struct btrfs_fs_info {
>   	/* next backup root to be overwritten */
>   	int backup_root_index;
>
> -	int num_tolerated_disk_barrier_failures;
> -
>   	/* device replace state */
>   	struct btrfs_dev_replace dev_replace;
>
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index d3303f9..d10ef2e 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -2965,8 +2965,6 @@ retry_root_backup:
>   		printk(KERN_ERR "BTRFS: Failed to read block groups: %d\n", ret);
>   		goto fail_sysfs;
>   	}
> -	fs_info->num_tolerated_disk_barrier_failures =
> -		btrfs_calc_num_tolerated_disk_barrier_failures(fs_info);
>
>   	fs_info->cleaner_kthread = kthread_run(cleaner_kthread, tree_root,
>   					       "btrfs-cleaner");
> @@ -3498,60 +3496,6 @@ int btrfs_get_num_tolerated_disk_barrier_failures(u64 flags)
>   	return 0;
>   }
>
> -int btrfs_calc_num_tolerated_disk_barrier_failures(
> -	struct btrfs_fs_info *fs_info)
> -{
> -	struct btrfs_ioctl_space_info space;
> -	struct btrfs_space_info *sinfo;
> -	u64 types[] = {BTRFS_BLOCK_GROUP_DATA,
> -		       BTRFS_BLOCK_GROUP_SYSTEM,
> -		       BTRFS_BLOCK_GROUP_METADATA,
> -		       BTRFS_BLOCK_GROUP_DATA | BTRFS_BLOCK_GROUP_METADATA};
> -	int i;
> -	int c;
> -	int num_tolerated_disk_barrier_failures =
> -		(int)fs_info->fs_devices->num_devices;
> -
> -	for (i = 0; i < ARRAY_SIZE(types); i++) {
> -		struct btrfs_space_info *tmp;
> -
> -		sinfo = NULL;
> -		rcu_read_lock();
> -		list_for_each_entry_rcu(tmp, &fs_info->space_info, list) {
> -			if (tmp->flags == types[i]) {
> -				sinfo = tmp;
> -				break;
> -			}
> -		}
> -		rcu_read_unlock();
> -
> -		if (!sinfo)
> -			continue;
> -
> -		down_read(&sinfo->groups_sem);
> -		for (c = 0; c < BTRFS_NR_RAID_TYPES; c++) {
> -			u64 flags;
> -
> -			if (list_empty(&sinfo->block_groups[c]))
> -				continue;
> -
> -			btrfs_get_block_group_info(&sinfo->block_groups[c],
> -						   &space);
> -			if (space.total_bytes == 0 || space.used_bytes == 0)
> -				continue;
> -			flags = space.flags;
> -
> -			num_tolerated_disk_barrier_failures = min(
> -				num_tolerated_disk_barrier_failures,
> -				btrfs_get_num_tolerated_disk_barrier_failures(
> -					flags));
> -		}
> -		up_read(&sinfo->groups_sem);
> -	}
> -
> -	return num_tolerated_disk_barrier_failures;
> -}
> -
>   static int write_all_supers(struct btrfs_root *root, int max_mirrors)
>   {
>   	struct list_head *head;
> diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h
> index adeb318..6dc5fd3 100644
> --- a/fs/btrfs/disk-io.h
> +++ b/fs/btrfs/disk-io.h
> @@ -142,8 +142,6 @@ struct btrfs_root *btrfs_create_tree(struct btrfs_trans_handle *trans,
>   int btree_lock_page_hook(struct page *page, void *data,
>   				void (*flush_fn)(void *));
>   int btrfs_get_num_tolerated_disk_barrier_failures(u64 flags);
> -int btrfs_calc_num_tolerated_disk_barrier_failures(
> -	struct btrfs_fs_info *fs_info);
>   int __init btrfs_end_io_wq_init(void);
>   void btrfs_end_io_wq_exit(void);
>
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index a5262bf..33ad42e 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -1782,9 +1782,6 @@ int btrfs_rm_device(struct btrfs_root *root, char *device_path, u64 devid)
>   		free_fs_devices(cur_devices);
>   	}
>
> -	root->fs_info->num_tolerated_disk_barrier_failures =
> -		btrfs_calc_num_tolerated_disk_barrier_failures(root->fs_info);
> -
>   	/*
>   	 * at this point, the device is zero sized.  We want to
>   	 * remove it from the devices list and zero out the old super
> @@ -2289,8 +2286,6 @@ int btrfs_init_new_device(struct btrfs_root *root, char *device_path)
>   		}
>   	}
>
> -	root->fs_info->num_tolerated_disk_barrier_failures =
> -		btrfs_calc_num_tolerated_disk_barrier_failures(root->fs_info);
>   	ret = btrfs_commit_transaction(trans, root);
>
>   	if (seeding_dev) {
> @@ -3518,13 +3513,6 @@ int btrfs_balance(struct btrfs_balance_control *bctl,
>   		}
>   	} while (read_seqretry(&fs_info->profiles_lock, seq));
>
> -	if (bctl->sys.flags & BTRFS_BALANCE_ARGS_CONVERT) {
> -		fs_info->num_tolerated_disk_barrier_failures = min(
> -			btrfs_calc_num_tolerated_disk_barrier_failures(fs_info),
> -			btrfs_get_num_tolerated_disk_barrier_failures(
> -				bctl->sys.target));
> -	}
> -
>   	ret = insert_balance_item(fs_info->tree_root, bctl);
>   	if (ret && ret != -EEXIST)
>   		goto out;
> @@ -3547,11 +3535,6 @@ int btrfs_balance(struct btrfs_balance_control *bctl,
>   	mutex_lock(&fs_info->balance_mutex);
>   	atomic_dec(&fs_info->balance_running);
>
> -	if (bctl->sys.flags & BTRFS_BALANCE_ARGS_CONVERT) {
> -		fs_info->num_tolerated_disk_barrier_failures =
> -			btrfs_calc_num_tolerated_disk_barrier_failures(fs_info);
> -	}
> -
>   	if (bargs) {
>   		memset(bargs, 0, sizeof(*bargs));
>   		update_ioctl_balance_args(fs_info, 0, bargs);
>
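
To illustrate why a single global num_tolerated_disk_barrier_failures
becomes redundant once a per-chunk check exists, here is a simplified
sketch. It is not the kernel code from patches 1-5 (not shown in this
message): chunk_info, global_check_ok() and per_chunk_check_ok() are
hypothetical names, and each chunk is reduced to two integers.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-chunk summary, for illustration only. */
struct chunk_info {
	int tolerated_failures;	/* allowed by this chunk's profile */
	int missing_devices;	/* stripes that sit on failed devices */
};

/*
 * Old-style check: take the minimum tolerance over all chunks and
 * compare it with the number of failed devices in the filesystem.
 */
static bool global_check_ok(const struct chunk_info *chunks, size_t n,
			    int num_devices, int failed_devices)
{
	int min_tolerated = num_devices;

	for (size_t i = 0; i < n; i++)
		if (chunks[i].tolerated_failures < min_tolerated)
			min_tolerated = chunks[i].tolerated_failures;

	return failed_devices <= min_tolerated;
}

/* Per-chunk check: each chunk is judged against its own tolerance. */
static bool per_chunk_check_ok(const struct chunk_info *chunks, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (chunks[i].missing_devices > chunks[i].tolerated_failures)
			return false;

	return true;
}
```

Take a two-device filesystem with one single-profile chunk that lives
only on the healthy device ({0, 0}) and one RAID1 chunk spanning both
devices ({1, 1}). If the second device fails, the global check refuses
because the minimum tolerance is 0, while the per-chunk check passes
because no chunk has lost more than it can tolerate.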

end of thread, other threads:[~2015-12-05  7:16 UTC | newest]

Thread overview: 43+ messages
2015-11-09 10:56 [PATCH 00/15] btrfs: Hot spare and Auto replace Anand Jain
2015-11-09 10:56 ` [PATCH 01/15] btrfs: Introduce a new function to check if all chunks a OK for degraded mount Anand Jain
2015-11-09 10:56 ` [PATCH 02/15] btrfs: Do per-chunk check for mount time check Anand Jain
2015-11-09 10:56 ` [PATCH 03/15] btrfs: Do per-chunk degraded check for remount Anand Jain
2015-11-09 10:56 ` [PATCH 04/15] btrfs: Allow barrier_all_devices to do per-chunk device check Anand Jain
2015-11-09 10:56 ` [PATCH 05/15] btrfs: optimize btrfs_check_degradable() for calls outside of barrier Anand Jain
2015-11-09 10:56 ` [PATCH 06/15] btrfs: Cleanup num_tolerated_disk_barrier_failures Anand Jain
2015-12-05  7:16   ` Qu Wenruo
2015-11-09 10:56 ` [PATCH 07/15] btrfs: introduce device dynamic state transition to offline or failed Anand Jain
2015-11-09 10:56 ` [PATCH 08/15] btrfs: check device for critical errors and mark failed Anand Jain
2015-11-09 10:56 ` [PATCH 09/15] btrfs: block incompatible optional features at scan Anand Jain
2015-11-09 10:56 ` [PATCH 10/15] btrfs: introduce BTRFS_FEATURE_INCOMPAT_SPARE_DEV Anand Jain
2015-11-09 10:56 ` [PATCH 11/15] btrfs: add check not to mount a spare device Anand Jain
2015-11-09 10:56 ` [PATCH 12/15] btrfs: support btrfs dev scan for " Anand Jain
2015-11-09 10:56 ` [PATCH 13/15] btrfs: provide framework to get and put a " Anand Jain
2015-11-09 10:56 ` [PATCH 14/15] btrfs: introduce helper functions to perform hot replace Anand Jain
2015-11-09 10:56 ` [PATCH 15/15] btrfs: check for failed device and " Anand Jain
2015-11-09 10:58 ` [PATCH 0/4] btrfs-progs: Hot spare and Auto replace Anand Jain
2015-11-09 10:58   ` [PATCH 1/4] btrfs-progs: Introduce BTRFS_FEATURE_INCOMPAT_SPARE_DEV SB flags Anand Jain
2015-11-09 10:58   ` [PATCH 2/4] btrfs-progs: Introduce btrfs spare subcommand Anand Jain
2015-11-09 10:58   ` [PATCH 3/4] btrfs-progs: add fi show for spare Anand Jain
2015-11-09 10:58   ` [PATCH 4/4] btrfs-progs: add global spare device list to filesystem show Anand Jain
2015-11-09 14:09 ` [PATCH 00/15] btrfs: Hot spare and Auto replace Austin S Hemmelgarn
2015-11-09 21:29   ` Duncan
2015-11-10 12:13     ` Austin S Hemmelgarn
2015-11-13 10:17       ` Anand Jain
2015-11-13 12:25         ` Austin S Hemmelgarn
2015-11-15 18:10         ` Christoph Anton Mitterer
2015-11-12  2:15 ` Qu Wenruo
2015-11-12  6:46   ` Duncan
2015-11-12 13:04   ` Austin S Hemmelgarn
2015-11-13  1:07     ` Qu Wenruo
2015-11-13 10:20       ` Anand Jain
2015-11-14  0:54         ` Qu Wenruo
2015-11-16 13:39           ` Austin S Hemmelgarn
2015-11-12 19:08   ` Goffredo Baroncelli
2015-11-13 10:18   ` Anand Jain
2015-11-12 19:21 ` Goffredo Baroncelli
2015-11-13 10:20   ` Anand Jain
2015-11-14 11:05     ` Goffredo Baroncelli
2015-11-16 13:41 ` Austin S Hemmelgarn
2015-11-16 22:07   ` Anand Jain
2015-11-17 12:28     ` Austin S Hemmelgarn
