* [PATCH v2 00/15] Introduce device state 'failed', Hot spare and Auto replace
From: Anand Jain @ 2016-03-29 14:22 UTC
  To: linux-btrfs; +Cc: clm, dsterba

Thanks for the various comments, tests and feedback.

Background: Hot spare and Auto replace:
 A hot spare is predominantly used to narrow the time window during
 which storage runs in degraded mode, when any further disk failure
 might lead to catastrophic data loss. Data center storage generally
 keeps a couple of disks reserved as spares. This is mainly an
 enterprise storage feature rather than a FS feature; I believe people
 acquainted with enterprise storage use cases will appreciate the need
 for it, which is why most/all enterprise storage offers a hot spare
 feature.

Btrfs device states:
 This patch set adds a 'failed' state and makes provision for an
 'offline' state, as two new device states. To summarize the various
 device states and their meanings:

 /* missing: device wasn't found at the time of mount */
 int missing;

 /*
  * failed: device confirmed to have experienced critical
  * io failure
  */
 int failed;

 /*
  * offline: there is no confirmation that the disk has failed;
  * it is an interim communication breakdown, and the device is
  * not necessarily a candidate for device replace. The device
  * might come back online after user intervention or after
  * block transport layer error recovery.
  */
 int offline;


Device state transition tuning and visualization:
 Sysfs interfaces are planned to provide tuning of the device state
 transition sensitivity and visualization of the device states.
 However, the sysfs framework that would provide such an interface is
 still being reviewed/tested and is not ready yet. So for testing and
 debugging these features, I have used an updated version of the
 procfs patch that is on the ML:

      [PATCH] btrfs: debug: procfs-devlist: introduce procfs interface for the device list for debugging

 I find the above patch very useful, and more stable than sysfs, for
 visualizing the device state.

This patch set does not depend on any of the sysfs patches as such.

Cross compatibility:
 Adds a new incompatible feature flag
 (BTRFS_FEATURE_INCOMPAT_SPARE_DEV) to manage the spare device
 when older kernels are used. It has been tested to work fine
 with older kernel/progs versions.


Auto replace:
 Replace happens automatically: when any write or flush fails, the
 device is marked as failed, which stops any further IO attempts to
 that device. In the next commit cycle, auto replace picks a spare
 device to replace the failed device, and so the btrfs volume is back
 to a healthy state.

Per FSID spare vs Global spare:
 As of now, only global hot spares are supported; that is, the hot
 spare(s) serve all the btrfs filesystems in the system. In the future
 there will be an fs_info->no_auto_replace tunable that the user can
 set to limit the use of the global spare.


Example use case:
 Below is an example use case of the hot spare setup.

 Add a spare device:
        btrfs spare add /dev/sde -f

 If a spare device was already added before, just run

        btrfs dev scan [/dev/sde]

 which registers the spare device with the kernel.

        btrfs fi show
         Label: none uuid: 52f170c1-725c-457d-8cfd-d57090460091
          Total devices 2 FS bytes used 112.00KiB
          devid 1 size 2.00GiB used 417.50MiB path /dev/sdc
          devid 2 size 2.00GiB used 417.50MiB path /dev/sdd

        Global spare
          device size 3.00GiB path /dev/sde


Patches:

Kernel:
 First, it needs Qu's per-chunk missing device patch set, which is
 included in this set (patches 1-5/12).

 Next, patch 6/12 adds support for dynamically transitioning devices
 from online (no state) to the offline or failed state, on top of
 static device states such as the current "missing" state.
 
 Next, patches 7-10/12 add support for spare devices. Kernels without
 the spare feature keep away from the spare device, and kernels that
 support it prevent it from being mounted. Further, these patches
 provide helper functions to pick a spare device and to release a
 spare device back to the spare device pool.

 Patch 11/12 provides the functions for auto replace, mostly derived
 from the existing replace code.
 Finally, patch 12/12 uses all these facilities: it picks a failed
 device and triggers an auto replace in a kthread (casualty_kthread()).


Progs:
 Needs the 4 patches below, which add the 'spare' subcommand to manage
 spare devices. As of now, deleting a spare device has to be done
 using wipefs. However, in the long run we would need a proper btrfs
 command to do that job.


V1->V2:
Kernel:
 (Based on tests and comments provided on the ML)
 a. transition_kthread() now wakes up casualty_kthread to check the
    device states, instead of doing that in transition_kthread()
    itself. This is cleaner and puts less pressure on transition_kthread().
 b. Dropped
     [PATCH 05/15] btrfs: optimize btrfs_check_degradable() for calls outside of barrier
    as it was the wrong patch and the optimization was incomplete.
 c. Merged patch
    btrfs: check for failed device and hot replace
      into
    btrfs: check device for critical errors and mark failed
    to make the change described in (a) above.

Progs:
 a. Call btrfs_register_one_device() when doing 'btrfs spare add'.


Anand Jain (7):
  btrfs: introduce device dynamic state transition to offline or failed
  btrfs: introduce BTRFS_FEATURE_INCOMPAT_SPARE_DEV
  btrfs: add check not to mount a spare device
  btrfs: support btrfs dev scan for spare device
  btrfs: provide framework to get and put a spare device
  btrfs: introduce helper functions to perform hot replace
  btrfs: check device for critical errors and mark failed

Qu Wenruo (5):
  btrfs: Introduce a new function to check if all chunks a OK for
    degraded mount
  btrfs: Do per-chunk check for mount time check
  btrfs: Do per-chunk degraded check for remount
  btrfs: Allow barrier_all_devices to do per-chunk device check
  btrfs: Cleanup num_tolerated_disk_barrier_failures

 fs/btrfs/ctree.h       |   8 +-
 fs/btrfs/dev-replace.c |  24 +++++
 fs/btrfs/dev-replace.h |   1 +
 fs/btrfs/disk-io.c     | 256 +++++++++++++++++++++++++++++++++--------------
 fs/btrfs/disk-io.h     |   4 +-
 fs/btrfs/super.c       |  20 +++-
 fs/btrfs/volumes.c     | 263 +++++++++++++++++++++++++++++++++++++++++++++----
 fs/btrfs/volumes.h     |  27 +++++
 8 files changed, 504 insertions(+), 99 deletions(-)

Anand Jain (4):
  btrfs-progs: Introduce BTRFS_FEATURE_INCOMPAT_SPARE_DEV SB flags
  btrfs-progs: Introduce btrfs spare subcommand
  btrfs-progs: add fi show for spare
  btrfs-progs: add global spare device list to filesystem show

 Android.mk        |   2 +-
 Makefile.in       |   3 +-
 btrfs.c           |   1 +
 cmds-filesystem.c |   9 ++
 cmds-spare.c      | 292 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 commands.h        |   2 +
 ctree.h           |   4 +-
 utils.h           |   1 +
 volumes.c         |   4 +
 volumes.h         |   2 +
 10 files changed, 317 insertions(+), 3 deletions(-)
 create mode 100644 cmds-spare.c

-- 
2.7.0



* [PATCH 01/12] btrfs: Introduce a new function to check if all chunks a OK for degraded mount
From: Anand Jain @ 2016-03-29 14:22 UTC
  To: linux-btrfs; +Cc: clm, dsterba

From: Qu Wenruo <quwenruo@cn.fujitsu.com>

Introduce a new function, btrfs_check_degradable(), to judge whether all
chunks in btrfs are OK for a degraded mount.

It provides a new basis for accurate degraded checks at mount, remount
and even runtime, replacing the old one-size-fits-all method.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 fs/btrfs/volumes.c | 63 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/volumes.h |  1 +
 2 files changed, 64 insertions(+)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index e2b54d546b7c..dd3dc53a302a 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -7042,3 +7042,66 @@ static void btrfs_close_one_device(struct btrfs_device *device)
 
 	call_rcu(&device->rcu, free_device);
 }
+
+/*
+ * Check if all chunks in the fs are OK for degraded mount
+ * Caller itself should do extra check if DEGRADED mount option is given
+ * for >0 return value.
+ *
+ * Return 0 if all chunks are OK.
+ * Return >0 if all chunks are degradable but not all OK.
+ * Return <0 if any chunk is not degradable or other bug.
+ */
+int btrfs_check_degradable(struct btrfs_fs_info *fs_info, unsigned flags)
+{
+	struct btrfs_mapping_tree *map_tree = &fs_info->mapping_tree;
+	struct extent_map *em;
+	u64 next_start = 0;
+	int ret = 0;
+
+	if (flags & MS_RDONLY)
+		return 0;
+
+	read_lock(&map_tree->map_tree.lock);
+	em = lookup_extent_mapping(&map_tree->map_tree, 0, (u64)(-1));
+	/* No chunk at all? That would be a huge bug */
+	if (!em) {
+		ret = -ENOENT;
+		goto out;
+	}
+
+	while (em) {
+		struct map_lookup *map;
+		int missing = 0;
+		int max_tolerated;
+		int i;
+
+		map = (struct map_lookup *) em->bdev;
+		max_tolerated =
+			btrfs_get_num_tolerated_disk_barrier_failures(
+					map->type);
+		for (i = 0; i < map->num_stripes; i++) {
+			if (map->stripes[i].dev->missing)
+				missing++;
+		}
+		if (missing > max_tolerated) {
+			ret = -EIO;
+			btrfs_warn(fs_info,
+				   "missing devices(%d) exceeds the limit(%d), writable mount is not allowed",
+				   missing, max_tolerated);
+			goto out;
+		} else if (missing)
+			ret = 1;
+		next_start = extent_map_end(em);
+
+		/*
+		 * Always search range [next_start, (u64)-1) to find the next
+		 * chunk map
+		 */
+		em = lookup_extent_mapping(&map_tree->map_tree, next_start,
+					   (u64)(-1) - next_start);
+	}
+out:
+	read_unlock(&map_tree->map_tree.lock);
+	return ret;
+}
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 1939ebde63df..351431a3f5aa 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -566,5 +566,6 @@ static inline void unlock_chunks(struct btrfs_root *root)
 struct list_head *btrfs_get_fs_uuids(void);
 void btrfs_set_fs_info_ptr(struct btrfs_fs_info *fs_info);
 void btrfs_reset_fs_info_ptr(struct btrfs_fs_info *fs_info);
+int btrfs_check_degradable(struct btrfs_fs_info *fs_info, unsigned flags);
 
 #endif
-- 
2.7.0



* [PATCH 02/12] btrfs: Do per-chunk check for mount time check
From: Anand Jain @ 2016-03-29 14:22 UTC
  To: linux-btrfs; +Cc: clm, dsterba

From: Qu Wenruo <quwenruo@cn.fujitsu.com>

Now use btrfs_check_degradable() to do the mount time degraded check.

With this patch, we can now mount in the following case:
 # mkfs.btrfs -f -m raid1 -d single /dev/sdb /dev/sdc
 # wipefs -a /dev/sdc
 # mount /dev/sdb /mnt/btrfs -o degraded
 As the single data chunk is only on sdb, it's OK to mount degraded,
 since missing one device is OK for RAID1.

But it still fails, as expected, in the following case:
 # mkfs.btrfs -f -m raid1 -d single /dev/sdb /dev/sdc
 # wipefs -a /dev/sdb
 # mount /dev/sdc /mnt/btrfs -o degraded
 As the data chunk is only on sdb, it's not OK to mount degraded.

Reported-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Reported-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>

[Btrfs: use btrfs_error instead of btrfs_err during mount]
Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
 fs/btrfs/disk-io.c | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index c95e3ce9f22e..bfea0f8f6a87 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2880,6 +2880,16 @@ int open_ctree(struct super_block *sb,
 		goto fail_tree_roots;
 	}
 
+	ret = btrfs_check_degradable(fs_info, fs_info->sb->s_flags);
+	if (ret < 0) {
+		btrfs_err(fs_info, "degraded writable mount failed %d", ret);
+		goto fail_tree_roots;
+	} else if (ret > 0 && !btrfs_test_opt(chunk_root, DEGRADED)) {
+		btrfs_warn(fs_info,
+			"Some device missing, but still degraded mountable, please mount with -o degraded option");
+		ret = -EACCES;
+		goto fail_tree_roots;
+	}
 	/*
 	 * keep the device that is marked to be the target device for the
 	 * dev_replace procedure
@@ -2983,14 +2993,6 @@ retry_root_backup:
 	}
 	fs_info->num_tolerated_disk_barrier_failures =
 		btrfs_calc_num_tolerated_disk_barrier_failures(fs_info);
-	if (fs_info->fs_devices->missing_devices >
-	     fs_info->num_tolerated_disk_barrier_failures &&
-	    !(sb->s_flags & MS_RDONLY)) {
-		pr_warn("BTRFS: missing devices(%llu) exceeds the limit(%d), writeable mount is not allowed\n",
-			fs_info->fs_devices->missing_devices,
-			fs_info->num_tolerated_disk_barrier_failures);
-		goto fail_sysfs;
-	}
 
 	fs_info->cleaner_kthread = kthread_run(cleaner_kthread, tree_root,
 					       "btrfs-cleaner");
-- 
2.7.0



* [PATCH 03/12] btrfs: Do per-chunk degraded check for remount
From: Anand Jain @ 2016-03-29 14:22 UTC
  To: linux-btrfs; +Cc: clm, dsterba

From: Qu Wenruo <quwenruo@cn.fujitsu.com>

Just as for the mount time check, use the new btrfs_check_degradable()
to do a per-chunk check.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>

Btrfs: use btrfs_error instead of btrfs_err during remount

Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
 fs/btrfs/super.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 00b8f37cc306..87639fa53b10 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1767,11 +1767,14 @@ static int btrfs_remount(struct super_block *sb, int *flags, char *data)
 			goto restore;
 		}
 
-		if (fs_info->fs_devices->missing_devices >
-		     fs_info->num_tolerated_disk_barrier_failures &&
-		    !(*flags & MS_RDONLY)) {
+		ret = btrfs_check_degradable(fs_info, *flags);
+		if (ret < 0) {
+			btrfs_err(fs_info,
+				"degraded writable remount failed %d", ret);
+			goto restore;
+		} else if (ret > 0 && !btrfs_test_opt(root, DEGRADED)) {
 			btrfs_warn(fs_info,
-				"too many missing devices, writeable remount is not allowed");
+				"some device missing, but still degraded mountable, please remount with -o degraded option");
 			ret = -EACCES;
 			goto restore;
 		}
-- 
2.7.0



* [PATCH 04/12] btrfs: Allow barrier_all_devices to do per-chunk device check
From: Anand Jain @ 2016-03-29 14:22 UTC
  To: linux-btrfs; +Cc: clm, dsterba

From: Qu Wenruo <quwenruo@cn.fujitsu.com>

The last user of num_tolerated_disk_barrier_failures is
barrier_all_devices(). But it can easily be changed to use the new
per-chunk degradable check framework.

Now btrfs_device has two extra members, representing send/wait errors,
set at write_dev_flush() time. They are then checked in a similar, but
more accurate, way than in the old code.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 fs/btrfs/disk-io.c | 13 +++++--------
 fs/btrfs/volumes.c |  6 +++++-
 fs/btrfs/volumes.h |  4 ++++
 3 files changed, 14 insertions(+), 9 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index bfea0f8f6a87..85e26d62c089 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3491,8 +3491,6 @@ static int barrier_all_devices(struct btrfs_fs_info *info)
 {
 	struct list_head *head;
 	struct btrfs_device *dev;
-	int errors_send = 0;
-	int errors_wait = 0;
 	int ret;
 
 	/* send down all the barriers */
@@ -3501,7 +3499,7 @@ static int barrier_all_devices(struct btrfs_fs_info *info)
 		if (dev->missing)
 			continue;
 		if (!dev->bdev) {
-			errors_send++;
+			dev->err_send = 1;
 			continue;
 		}
 		if (!dev->in_fs_metadata || !dev->writeable)
@@ -3509,7 +3507,7 @@ static int barrier_all_devices(struct btrfs_fs_info *info)
 
 		ret = write_dev_flush(dev, 0);
 		if (ret)
-			errors_send++;
+			dev->err_send = 1;
 	}
 
 	/* wait for all the barriers */
@@ -3517,7 +3515,7 @@ static int barrier_all_devices(struct btrfs_fs_info *info)
 		if (dev->missing)
 			continue;
 		if (!dev->bdev) {
-			errors_wait++;
+			dev->err_wait = 1;
 			continue;
 		}
 		if (!dev->in_fs_metadata || !dev->writeable)
@@ -3525,10 +3523,9 @@ static int barrier_all_devices(struct btrfs_fs_info *info)
 
 		ret = write_dev_flush(dev, 1);
 		if (ret)
-			errors_wait++;
+			dev->err_wait = 1;
 	}
-	if (errors_send > info->num_tolerated_disk_barrier_failures ||
-	    errors_wait > info->num_tolerated_disk_barrier_failures)
+	if (btrfs_check_degradable(info, info->sb->s_flags) < 0)
 		return -EIO;
 	return 0;
 }
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index dd3dc53a302a..a840d78ba127 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -7081,8 +7081,12 @@ int btrfs_check_degradable(struct btrfs_fs_info *fs_info, unsigned flags)
 			btrfs_get_num_tolerated_disk_barrier_failures(
 					map->type);
 		for (i = 0; i < map->num_stripes; i++) {
-			if (map->stripes[i].dev->missing)
+			if (map->stripes[i].dev->missing ||
+			    map->stripes[i].dev->err_wait ||
+			    map->stripes[i].dev->err_send)
 				missing++;
+			map->stripes[i].dev->err_wait = 0;
+			map->stripes[i].dev->err_send = 0;
 		}
 		if (missing > max_tolerated) {
 			ret = -EIO;
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 351431a3f5aa..48ced5cc09e4 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -76,6 +76,10 @@ struct btrfs_device {
 	int can_discard;
 	int is_tgtdev_for_dev_replace;
 
+	/* for barrier_all_devices() check */
+	int err_send;
+	int err_wait;
+
 #ifdef __BTRFS_NEED_DEVICE_DATA_ORDERED
 	seqcount_t data_seqcount;
 #endif
-- 
2.7.0



* [PATCH 05/12] btrfs: Cleanup num_tolerated_disk_barrier_failures
From: Anand Jain @ 2016-03-29 14:22 UTC
  To: linux-btrfs; +Cc: clm, dsterba

From: Qu Wenruo <quwenruo@cn.fujitsu.com>

As we now use the per-chunk degradable check, the global
num_tolerated_disk_barrier_failures is of no use. So clean it up.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>

[Btrfs: resolve conflict to apply 'btrfs: Cleanup num_tolerated_disk_barrier_failures']
Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
 fs/btrfs/ctree.h   |  2 --
 fs/btrfs/disk-io.c | 56 ------------------------------------------------------
 fs/btrfs/disk-io.h |  2 --
 fs/btrfs/volumes.c | 17 -----------------
 4 files changed, 77 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 84a6a5b3384a..e0a50f478e01 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1829,8 +1829,6 @@ struct btrfs_fs_info {
 	/* next backup root to be overwritten */
 	int backup_root_index;
 
-	int num_tolerated_disk_barrier_failures;
-
 	/* device replace state */
 	struct btrfs_dev_replace dev_replace;
 
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 85e26d62c089..7f02f1766037 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2991,8 +2991,6 @@ retry_root_backup:
 		printk(KERN_ERR "BTRFS: Failed to read block groups: %d\n", ret);
 		goto fail_sysfs;
 	}
-	fs_info->num_tolerated_disk_barrier_failures =
-		btrfs_calc_num_tolerated_disk_barrier_failures(fs_info);
 
 	fs_info->cleaner_kthread = kthread_run(cleaner_kthread, tree_root,
 					       "btrfs-cleaner");
@@ -3559,60 +3557,6 @@ int btrfs_get_num_tolerated_disk_barrier_failures(u64 flags)
 	return min_tolerated;
 }
 
-int btrfs_calc_num_tolerated_disk_barrier_failures(
-	struct btrfs_fs_info *fs_info)
-{
-	struct btrfs_ioctl_space_info space;
-	struct btrfs_space_info *sinfo;
-	u64 types[] = {BTRFS_BLOCK_GROUP_DATA,
-		       BTRFS_BLOCK_GROUP_SYSTEM,
-		       BTRFS_BLOCK_GROUP_METADATA,
-		       BTRFS_BLOCK_GROUP_DATA | BTRFS_BLOCK_GROUP_METADATA};
-	int i;
-	int c;
-	int num_tolerated_disk_barrier_failures =
-		(int)fs_info->fs_devices->num_devices;
-
-	for (i = 0; i < ARRAY_SIZE(types); i++) {
-		struct btrfs_space_info *tmp;
-
-		sinfo = NULL;
-		rcu_read_lock();
-		list_for_each_entry_rcu(tmp, &fs_info->space_info, list) {
-			if (tmp->flags == types[i]) {
-				sinfo = tmp;
-				break;
-			}
-		}
-		rcu_read_unlock();
-
-		if (!sinfo)
-			continue;
-
-		down_read(&sinfo->groups_sem);
-		for (c = 0; c < BTRFS_NR_RAID_TYPES; c++) {
-			u64 flags;
-
-			if (list_empty(&sinfo->block_groups[c]))
-				continue;
-
-			btrfs_get_block_group_info(&sinfo->block_groups[c],
-						   &space);
-			if (space.total_bytes == 0 || space.used_bytes == 0)
-				continue;
-			flags = space.flags;
-
-			num_tolerated_disk_barrier_failures = min(
-				num_tolerated_disk_barrier_failures,
-				btrfs_get_num_tolerated_disk_barrier_failures(
-					flags));
-		}
-		up_read(&sinfo->groups_sem);
-	}
-
-	return num_tolerated_disk_barrier_failures;
-}
-
 static int write_all_supers(struct btrfs_root *root, int max_mirrors)
 {
 	struct list_head *head;
diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h
index 8e79d0070bcf..dd155621f95f 100644
--- a/fs/btrfs/disk-io.h
+++ b/fs/btrfs/disk-io.h
@@ -141,8 +141,6 @@ struct btrfs_root *btrfs_create_tree(struct btrfs_trans_handle *trans,
 int btree_lock_page_hook(struct page *page, void *data,
 				void (*flush_fn)(void *));
 int btrfs_get_num_tolerated_disk_barrier_failures(u64 flags);
-int btrfs_calc_num_tolerated_disk_barrier_failures(
-	struct btrfs_fs_info *fs_info);
 int __init btrfs_end_io_wq_init(void);
 void btrfs_end_io_wq_exit(void);
 
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index a840d78ba127..dff2deaf88d3 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1876,9 +1876,6 @@ int btrfs_rm_device(struct btrfs_root *root, char *device_path)
 		free_fs_devices(cur_devices);
 	}
 
-	root->fs_info->num_tolerated_disk_barrier_failures =
-		btrfs_calc_num_tolerated_disk_barrier_failures(root->fs_info);
-
 	/*
 	 * at this point, the device is zero sized.  We want to
 	 * remove it from the devices list and zero out the old super
@@ -2405,8 +2402,6 @@ int btrfs_init_new_device(struct btrfs_root *root, char *device_path)
 				"sysfs: failed to create fsid for sprout");
 	}
 
-	root->fs_info->num_tolerated_disk_barrier_failures =
-		btrfs_calc_num_tolerated_disk_barrier_failures(root->fs_info);
 	ret = btrfs_commit_transaction(trans, root);
 
 	if (seeding_dev) {
@@ -3757,13 +3752,6 @@ int btrfs_balance(struct btrfs_balance_control *bctl,
 			bctl->meta.target, bctl->data.target);
 	}
 
-	if (bctl->sys.flags & BTRFS_BALANCE_ARGS_CONVERT) {
-		fs_info->num_tolerated_disk_barrier_failures = min(
-			btrfs_calc_num_tolerated_disk_barrier_failures(fs_info),
-			btrfs_get_num_tolerated_disk_barrier_failures(
-				bctl->sys.target));
-	}
-
 	ret = insert_balance_item(fs_info->tree_root, bctl);
 	if (ret && ret != -EEXIST)
 		goto out;
@@ -3786,11 +3774,6 @@ int btrfs_balance(struct btrfs_balance_control *bctl,
 	mutex_lock(&fs_info->balance_mutex);
 	atomic_dec(&fs_info->balance_running);
 
-	if (bctl->sys.flags & BTRFS_BALANCE_ARGS_CONVERT) {
-		fs_info->num_tolerated_disk_barrier_failures =
-			btrfs_calc_num_tolerated_disk_barrier_failures(fs_info);
-	}
-
 	if (bargs) {
 		memset(bargs, 0, sizeof(*bargs));
 		update_ioctl_balance_args(fs_info, 0, bargs);
-- 
2.7.0



* [PATCH 06/12] btrfs: introduce device dynamic state transition to offline or failed
From: Anand Jain @ 2016-03-29 14:22 UTC
  To: linux-btrfs; +Cc: clm, dsterba

A forced device offline/failed feature is needed for the following reasons:
1) a. to report that a device has failed when it does
   b. to close the device when it goes offline, so that the block layer
      can clean up
2) to identify the candidate for auto replace
3) to avoid further commit errors reported against the failing device
4) a device in a multi-device btrfs may go offline from the system
   (but as of now, in some system configs btrfs gets unmounted in this
    context, which is not correct behavior)

Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
 fs/btrfs/volumes.c | 137 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/volumes.h |  14 ++++++
 2 files changed, 151 insertions(+)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index dff2deaf88d3..a662701d4f22 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -7092,3 +7092,140 @@ out:
 	read_unlock(&map_tree->map_tree.lock);
 	return ret;
 }
+
+static void __close_device(struct work_struct *work)
+{
+	struct btrfs_device *device;
+
+	device = container_of(work, struct btrfs_device, rcu_work);
+
+	if (device->bdev)
+		blkdev_put(device->bdev, device->mode);
+
+	device->bdev = NULL;
+}
+
+static void close_device(struct rcu_head *head)
+{
+	struct btrfs_device *device;
+
+	device = container_of(head, struct btrfs_device, rcu);
+
+	INIT_WORK(&device->rcu_work, __close_device);
+	schedule_work(&device->rcu_work);
+}
+
+void btrfs_close_one_device_dont_free(struct btrfs_device *device)
+{
+	struct btrfs_fs_devices *fs_devices = device->fs_devices;
+
+	if (device->bdev)
+		fs_devices->open_devices--;
+
+	if (device->writeable &&
+	    device->devid != BTRFS_DEV_REPLACE_DEVID) {
+		list_del_init(&device->dev_alloc_list);
+		fs_devices->rw_devices--;
+	}
+
+	device->writeable = 0;
+
+	call_rcu(&device->rcu, close_device);
+}
+
+void force_device_close(struct btrfs_device *device)
+{
+	struct btrfs_device *next_device;
+	struct btrfs_fs_devices *fs_devices;
+
+	fs_devices = device->fs_devices;
+
+	mutex_lock(&fs_devices->device_list_mutex);
+	lock_chunks(fs_devices->fs_info->fs_root);
+
+	next_device = list_entry(fs_devices->devices.next,
+					struct btrfs_device, dev_list);
+	if (device->bdev == fs_devices->fs_info->sb->s_bdev)
+		fs_devices->fs_info->sb->s_bdev = next_device->bdev;
+
+	if (device->bdev == fs_devices->latest_bdev)
+		fs_devices->latest_bdev = next_device->bdev;
+
+	btrfs_close_one_device_dont_free(device);
+
+	/*
+	 * TODO: works for now, but it's better to keep the state of
+	 * missing and offline different, and update rest of the
+	 * places where we check for only missing and not for failed
+	 * or offline as of now.
+	 */
+	device->missing = 1;
+	fs_devices->missing_devices++;
+	device->writeable = 0;
+
+	rcu_barrier();
+
+	unlock_chunks(fs_devices->fs_info->fs_root);
+	mutex_unlock(&fs_devices->device_list_mutex);
+}
+
+void btrfs_force_device_close(struct btrfs_device *dev, char *why)
+{
+	bool degrade_option;
+	int tolerated_fail;
+	struct btrfs_fs_info *fs_info;
+	struct btrfs_fs_devices *fs_devices;
+
+	fs_devices = dev->fs_devices;
+	fs_info = fs_devices->fs_info;
+	degrade_option = btrfs_test_opt(fs_info->fs_root, DEGRADED);
+
+	/* todo: support seed later */
+	if (fs_devices->seeding)
+		return;
+
+	/* this shouldn't be called if device is already missing */
+	if (dev->missing || !dev->bdev)
+		return;
+
+	if (dev->offline || dev->failed)
+		return;
+
+	/* The only RW device is requested to force close; let FS handle it */
+	if (fs_devices->rw_devices == 1) {
+		btrfs_std_error(fs_info, -EIO,
+			"force offline last RW device");
+		return;
+	}
+
+	if (!strcmp(why, "offline"))
+		dev->offline = 1;
+	else if (!strcmp(why, "failed"))
+		dev->failed = 1;
+	else
+		return;
+
+	btrfs_sysfs_rm_device_link(fs_devices, dev);
+
+	force_device_close(dev);
+
+	tolerated_fail = btrfs_check_degradable(fs_info,
+						fs_info->sb->s_flags);
+	if (tolerated_fail > 0) {
+		btrfs_warn_in_rcu(fs_info, "device %s %s, chunks degraded",
+					rcu_str_deref(dev->name), why);
+	} else if (tolerated_fail < 0) {
+		btrfs_warn_in_rcu(fs_info,
+		"device %s %s, chunks failed",
+			rcu_str_deref(dev->name), why);
+		btrfs_std_error(fs_info, -EIO, "devices below critical level");
+	} else {
+		btrfs_warn_in_rcu(fs_info,
+			"device %s %s, No chunks are degraded",
+			rcu_str_deref(dev->name), why);
+	}
+	btrfs_info_in_rcu(fs_info,
+		"num_devices %llu rw_devices %llu degraded-option: %s",
+		fs_devices->num_devices, fs_devices->rw_devices,
+		degrade_option ? "set":"unset");
+}
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 48ced5cc09e4..ccc716b3c419 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -72,7 +72,20 @@ struct btrfs_device {
 
 	int writeable;
 	int in_fs_metadata;
+	/* missing: device wasn't found at the time of mount */
+	/* fixme: correct usage of missing_devices and missing */
 	int missing;
+	/* failed: device confirmed to have experienced critical io failure */
+	int failed;
+	/*
+	 * offline: the system, the user or the block layer transport has
+	 * removed or offlined the device, which was once present, without
+	 * going through unmount. Implies an interim communication breakdown
+	 * and not necessarily a candidate for the device replace. The
+	 * device might come back online after user intervention or after
+	 * block transport layer error recovery.
+	 */
+	int offline;
 	int can_discard;
 	int is_tgtdev_for_dev_replace;
 
@@ -571,5 +584,6 @@ struct list_head *btrfs_get_fs_uuids(void);
 void btrfs_set_fs_info_ptr(struct btrfs_fs_info *fs_info);
 void btrfs_reset_fs_info_ptr(struct btrfs_fs_info *fs_info);
 int btrfs_check_degradable(struct btrfs_fs_info *fs_info, unsigned flags);
+void btrfs_force_device_close(struct btrfs_device *dev, char *why);
 
 #endif
-- 
2.7.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 07/12] btrfs: introduce BTRFS_FEATURE_INCOMPAT_SPARE_DEV
  2016-03-29 14:22 [PATCH v2 00/15] Introduce device state 'failed', Hot spare and Auto replace Anand Jain
                   ` (5 preceding siblings ...)
  2016-03-29 14:22 ` [PATCH 06/12] btrfs: introduce device dynamic state transition to offline or failed Anand Jain
@ 2016-03-29 14:22 ` Anand Jain
  2016-03-29 14:22 ` [PATCH 08/12] btrfs: add check not to mount a spare device Anand Jain
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 25+ messages in thread
From: Anand Jain @ 2016-03-29 14:22 UTC (permalink / raw)
  To: linux-btrfs; +Cc: clm, dsterba

Add the BTRFS_FEATURE_INCOMPAT_SPARE_DEV flag (bit 10, 0x400) to
identify a spare device.

Along with this, the mount context checks that a spare device
fails to mount, as spare devices aren't mountable.
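For reference, bit 10 is 1 << 10 = 1024, i.e. 0x400 in hex. The following is a minimal userspace sketch of testing the flag, assuming the superblock's incompat flags have already been read into a plain uint64_t; the helper is_spare_device() is hypothetical and not part of this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same bit as the new ctree.h definition */
#define BTRFS_FEATURE_INCOMPAT_SPARE_DEV (1ULL << 10)

/* Bit 10 of the incompat flags: 1 << 10 == 1024 == 0x400 */
static_assert(BTRFS_FEATURE_INCOMPAT_SPARE_DEV == 0x400, "bit 10 is 0x400");

/* Hypothetical helper: does this superblock describe a spare device? */
static bool is_spare_device(uint64_t incompat_flags)
{
	return (incompat_flags & BTRFS_FEATURE_INCOMPAT_SPARE_DEV) != 0;
}
```

Other incompat bits (for example NO_HOLES, bit 9) do not affect the result; only bit 10 marks a spare.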

Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
 fs/btrfs/ctree.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index e0a50f478e01..2c185a8e92f0 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -531,6 +531,7 @@ struct btrfs_super_block {
 #define BTRFS_FEATURE_INCOMPAT_RAID56		(1ULL << 7)
 #define BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA	(1ULL << 8)
 #define BTRFS_FEATURE_INCOMPAT_NO_HOLES		(1ULL << 9)
+#define BTRFS_FEATURE_INCOMPAT_SPARE_DEV	(1ULL << 10)
 
 #define BTRFS_FEATURE_COMPAT_SUPP		0ULL
 #define BTRFS_FEATURE_COMPAT_SAFE_SET		0ULL
@@ -551,7 +552,8 @@ struct btrfs_super_block {
 	 BTRFS_FEATURE_INCOMPAT_RAID56 |		\
 	 BTRFS_FEATURE_INCOMPAT_EXTENDED_IREF |		\
 	 BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA |	\
-	 BTRFS_FEATURE_INCOMPAT_NO_HOLES)
+	 BTRFS_FEATURE_INCOMPAT_NO_HOLES |		\
+	 BTRFS_FEATURE_INCOMPAT_SPARE_DEV)
 
 #define BTRFS_FEATURE_INCOMPAT_SAFE_SET			\
 	(BTRFS_FEATURE_INCOMPAT_EXTENDED_IREF)
-- 
2.7.0



* [PATCH 08/12] btrfs: add check not to mount a spare device
  2016-03-29 14:22 [PATCH v2 00/15] Introduce device state 'failed', Hot spare and Auto replace Anand Jain
                   ` (6 preceding siblings ...)
  2016-03-29 14:22 ` [PATCH 07/12] btrfs: introduce BTRFS_FEATURE_INCOMPAT_SPARE_DEV Anand Jain
@ 2016-03-29 14:22 ` Anand Jain
  2016-03-29 14:22 ` [PATCH 09/12] btrfs: support btrfs dev scan for " Anand Jain
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 25+ messages in thread
From: Anand Jain @ 2016-03-29 14:22 UTC (permalink / raw)
  To: linux-btrfs; +Cc: clm, dsterba

Spare devices can be scanned but shouldn't be mountable.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
 fs/btrfs/disk-io.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 7f02f1766037..b99329e37965 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2806,6 +2806,14 @@ int open_ctree(struct super_block *sb,
 		goto fail_alloc;
 	}
 
+	if (btrfs_super_incompat_flags(disk_super) &
+			BTRFS_FEATURE_INCOMPAT_SPARE_DEV) {
+		/* A spare device can only be scanned, not mounted */
+		printk(KERN_ERR "BTRFS: You can't mount a spare device\n");
+		err = -ENOTSUPP;
+		goto fail_alloc;
+	}
+
 	/*
 	 * Needn't use the lock because there is no other task which will
 	 * update the flag.
-- 
2.7.0



* [PATCH 09/12] btrfs: support btrfs dev scan for spare device
  2016-03-29 14:22 [PATCH v2 00/15] Introduce device state 'failed', Hot spare and Auto replace Anand Jain
                   ` (7 preceding siblings ...)
  2016-03-29 14:22 ` [PATCH 08/12] btrfs: add check not to mount a spare device Anand Jain
@ 2016-03-29 14:22 ` Anand Jain
  2016-03-29 14:22 ` [PATCH 10/12] btrfs: provide framework to get and put a " Anand Jain
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 25+ messages in thread
From: Anand Jain @ 2016-03-29 14:22 UTC (permalink / raw)
  To: linux-btrfs; +Cc: clm, dsterba

When the user or system calls the BTRFS_IOC_SCAN_DEV ioctl on a
spare device, this patch makes sure the device is added to the
device list and marked as spare.

The same happens for BTRFS_IOC_DEVICES_READY, since that ioctl
has always performed the same device registration.
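A rough userspace sketch of the scan path, similar in spirit to btrfs-progs' btrfs_register_one_device(): open /dev/btrfs-control and issue BTRFS_IOC_SCAN_DEV with the device path, using struct btrfs_ioctl_vol_args from the UAPI header <linux/btrfs.h>. The names fill_scan_args() and scan_one_device() are illustrative, and the ioctl normally needs root.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/btrfs.h>

/* Fill the ioctl argument with a device path, rejecting paths that don't fit. */
static int fill_scan_args(struct btrfs_ioctl_vol_args *args, const char *path)
{
	size_t len;

	assert(args != NULL && path != NULL);
	memset(args, 0, sizeof(*args));
	len = strlen(path);
	if (len > BTRFS_PATH_NAME_MAX)
		return -ENAMETOOLONG;
	/* memset() above leaves name[BTRFS_PATH_NAME_MAX] as the terminator */
	memcpy(args->name, path, len);
	return 0;
}

/*
 * Ask the kernel to scan one device through the btrfs control node.
 * With this patch applied, a device whose superblock carries
 * BTRFS_FEATURE_INCOMPAT_SPARE_DEV ends up with fs_devices->spare = 1.
 */
int scan_one_device(const char *path)
{
	struct btrfs_ioctl_vol_args args;
	int fd;
	int ret;

	ret = fill_scan_args(&args, path);
	if (ret)
		return ret;

	fd = open("/dev/btrfs-control", O_RDWR);
	if (fd < 0)
		return -errno;

	ret = ioctl(fd, BTRFS_IOC_SCAN_DEV, &args) < 0 ? -errno : 0;
	close(fd);
	return ret;
}
```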

Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
 fs/btrfs/volumes.c | 4 ++++
 fs/btrfs/volumes.h | 2 ++
 2 files changed, 6 insertions(+)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index a662701d4f22..26ae4fd39ce7 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -604,6 +604,10 @@ static noinline int device_list_add(const char *path,
 		if (IS_ERR(fs_devices))
 			return PTR_ERR(fs_devices);
 
+		if (btrfs_super_incompat_flags(disk_super) &
+				BTRFS_FEATURE_INCOMPAT_SPARE_DEV)
+			fs_devices->spare = 1;
+
 		list_add(&fs_devices->list, &fs_uuids);
 
 		device = NULL;
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index ccc716b3c419..6b3b730c2727 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -276,6 +276,8 @@ struct btrfs_fs_devices {
 	struct kobject fsid_kobj;
 	struct kobject *device_dir_kobj;
 	struct completion kobj_unregister;
+
+	int spare;
 };
 
 #define BTRFS_BIO_INLINE_CSUM_SIZE	64
-- 
2.7.0



* [PATCH 10/12] btrfs: provide framework to get and put a spare device
  2016-03-29 14:22 [PATCH v2 00/15] Introduce device state 'failed', Hot spare and Auto replace Anand Jain
                   ` (8 preceding siblings ...)
  2016-03-29 14:22 ` [PATCH 09/12] btrfs: support btrfs dev scan for " Anand Jain
@ 2016-03-29 14:22 ` Anand Jain
  2016-03-29 14:22 ` [PATCH 11/12] btrfs: introduce helper functions to perform hot replace Anand Jain
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 25+ messages in thread
From: Anand Jain @ 2016-03-29 14:22 UTC (permalink / raw)
  To: linux-btrfs; +Cc: clm, dsterba

This adds functions to get and put a spare device from the list,
so that the hot replace code can pick a spare device when needed.
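The get/put semantics can be modeled in userspace as below. This is a simplified sketch, not the kernel code: a fixed array stands in for the fs_uuids list, and scan_device(), get_spare_device() and put_spare_device() are hypothetical names. As in the patch, get takes the first spare off the list and returns a copy of its path, and put hands it back by scanning it again. Unlike the patch, the sketch reports an allocation failure instead of returning success with a NULL path.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_SCANNED 8
#define PATH_LEN 64

/* Hypothetical stand-in for one entry of the kernel's fs_uuids list. */
struct scanned_fs {
	char path[PATH_LEN];
	int used;	/* slot holds a scanned filesystem */
	int spare;	/* superblock carries BTRFS_FEATURE_INCOMPAT_SPARE_DEV */
};

static struct scanned_fs scanned[MAX_SCANNED];

static char *dup_path(const char *s)
{
	size_t n = strlen(s) + 1;
	char *p = malloc(n);

	if (p)
		memcpy(p, s, n);
	return p;
}

/* Device scan: remember the path and whether it is a spare. */
static int scan_device(const char *path, int spare)
{
	assert(strlen(path) < PATH_LEN);
	for (int i = 0; i < MAX_SCANNED; i++) {
		if (scanned[i].used)
			continue;
		strcpy(scanned[i].path, path);
		scanned[i].used = 1;
		scanned[i].spare = spare;
		return 0;
	}
	return -1;	/* list full */
}

/* Take the first spare off the list: 0 on success, 1 if there is none. */
static int get_spare_device(char **path)
{
	for (int i = 0; i < MAX_SCANNED; i++) {
		if (!scanned[i].used || !scanned[i].spare)
			continue;
		*path = dup_path(scanned[i].path);
		if (!*path)
			return -1;	/* out of memory: leave the spare listed */
		scanned[i].used = 0;	/* like list_del() + free_fs_devices() */
		return 0;
	}
	return 1;
}

/* Return an unused spare by scanning it again. */
static void put_spare_device(const char *path)
{
	scan_device(path, 1);
}
```

Because get removes the entry, two concurrent replaces cannot pick the same spare; the kernel version gets the same guarantee by doing the lookup and removal under uuid_mutex.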

Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
 fs/btrfs/super.c   |  9 +++++++++
 fs/btrfs/volumes.c | 37 +++++++++++++++++++++++++++++++++++++
 fs/btrfs/volumes.h |  2 ++
 3 files changed, 48 insertions(+)

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 87639fa53b10..138fca39ffbb 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -2164,6 +2164,15 @@ static int btrfs_control_open(struct inode *inode, struct file *file)
 	return 0;
 }
 
+void btrfs_put_spare_device(char *path)
+{
+	struct btrfs_fs_devices *fs_devices;
+
+	if (btrfs_scan_one_device(path, FMODE_READ,
+				    &btrfs_fs_type, &fs_devices))
+		printk(KERN_INFO "BTRFS: failed to return spare device %s\n", path);
+}
+
 /*
  * used by btrfsctl to scan devices when no FS is mounted
  */
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 26ae4fd39ce7..308fcb55f2a1 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -7233,3 +7233,40 @@ void btrfs_force_device_close(struct btrfs_device *dev, char *why)
 		fs_devices->num_devices, fs_devices->rw_devices,
 		degrade_option ? "set":"unset");
 }
+
+int btrfs_get_spare_device(char **path)
+{
+	int ret = 1;
+	struct btrfs_fs_devices *fs_devices;
+	struct btrfs_device *device;
+	struct list_head *fs_uuids = btrfs_get_fs_uuids();
+
+	mutex_lock(&uuid_mutex);
+	list_for_each_entry(fs_devices, fs_uuids, list) {
+		if (!fs_devices->spare)
+			continue;
+
+		/* as of now there is only one device in the spare fs_devices */
+		device = list_entry(fs_devices->devices.next,
+					struct btrfs_device, dev_list);
+
+		if (!device || !device->name)
+			continue;
+
+		fs_devices->spare = 0;
+		rcu_read_lock();
+		*path = kstrdup(device->name->str, GFP_NOFS);
+		rcu_read_unlock();
+		ret = 0;
+		break;
+	}
+
+	if (!ret) {
+		btrfs_sysfs_remove_fsid(fs_devices);
+		list_del(&fs_devices->list);
+		free_fs_devices(fs_devices);
+	}
+	mutex_unlock(&uuid_mutex);
+
+	return ret;
+}
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 6b3b730c2727..b9c04fdf7166 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -587,5 +587,7 @@ void btrfs_set_fs_info_ptr(struct btrfs_fs_info *fs_info);
 void btrfs_reset_fs_info_ptr(struct btrfs_fs_info *fs_info);
 int btrfs_check_degradable(struct btrfs_fs_info *fs_info, unsigned flags);
 void btrfs_force_device_close(struct btrfs_device *dev, char *why);
+int btrfs_get_spare_device(char **path);
+void btrfs_put_spare_device(char *path);
 
 #endif
-- 
2.7.0



* [PATCH 11/12] btrfs: introduce helper functions to perform hot replace
  2016-03-29 14:22 [PATCH v2 00/15] Introduce device state 'failed', Hot spare and Auto replace Anand Jain
                   ` (9 preceding siblings ...)
  2016-03-29 14:22 ` [PATCH 10/12] btrfs: provide framework to get and put a " Anand Jain
@ 2016-03-29 14:22 ` Anand Jain
  2016-03-29 14:45   ` kbuild test robot
  2016-03-29 14:22 ` [PATCH 12/12] btrfs: check device for critical errors and mark failed Anand Jain
                   ` (2 subsequent siblings)
  13 siblings, 1 reply; 25+ messages in thread
From: Anand Jain @ 2016-03-29 14:22 UTC (permalink / raw)
  To: linux-btrfs; +Cc: clm, dsterba

Hot replace / auto replace is an important volume manager feature
and is critical to data center operations, so that a degraded
volume can be brought back to a healthy state as soon as possible
and without manual intervention.

This modifies the existing replace code to suit the needs of auto
replace; in the long run I hope the two code paths can be merged.
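The control flow of the new helper can be sketched with the kernel calls abstracted behind hypothetical function pointers. struct replace_ops and the fake_* stubs are illustrative only. The sketch propagates the result of the replace, so a caller can tell whether the spare was actually used.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical hooks standing in for btrfs_get_spare_device(),
 * btrfs_dev_replace_start() and btrfs_put_spare_device().
 */
struct replace_ops {
	int (*get_spare)(char **path);
	int (*replace_start)(const char *tgt_path, unsigned long long src_devid);
	void (*put_spare)(const char *path);
};

/* Pick a spare, start the replace, and give the spare back on failure. */
static int auto_replace_start(const struct replace_ops *ops,
			      unsigned long long src_devid)
{
	char *tgt_path = NULL;
	int ret;

	assert(ops != NULL);
	if (ops->get_spare(&tgt_path))
		return -EINVAL;		/* no spare found or configured */

	ret = ops->replace_start(tgt_path, src_devid);
	if (ret)
		ops->put_spare(tgt_path);	/* spare unused: list it again */

	free(tgt_path);
	return ret;	/* report the replace result to the caller */
}

/* Example fakes: one spare is always available; replace result is settable. */
static int fake_replace_result;
static int fake_put_calls;

static int fake_get_spare(char **path)
{
	static const char spare[] = "/dev/sdc";

	*path = malloc(sizeof(spare));
	if (!*path)
		return 1;
	memcpy(*path, spare, sizeof(spare));
	return 0;
}

static int fake_replace_start(const char *tgt_path, unsigned long long src_devid)
{
	(void)tgt_path;
	(void)src_devid;
	return fake_replace_result;
}

static void fake_put_spare(const char *path)
{
	(void)path;
	fake_put_calls++;
}

static const struct replace_ops fake_ops = {
	fake_get_spare, fake_replace_start, fake_put_spare
};
```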

Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
 fs/btrfs/dev-replace.c | 24 ++++++++++++++++++++++++
 fs/btrfs/dev-replace.h |  1 +
 2 files changed, 25 insertions(+)

diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index 2b926867d136..d6b768cf121f 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -957,3 +957,27 @@ void btrfs_bio_counter_inc_blocked(struct btrfs_fs_info *fs_info)
 				     &fs_info->fs_state));
 	}
 }
+
+int btrfs_auto_replace_start(struct btrfs_root *root,
+				struct btrfs_device *src_device)
+{
+	int ret;
+	char *tgt_path;
+
+	if (btrfs_get_spare_device(&tgt_path)) {
+		btrfs_err(root->fs_info,
+			"No spare device found/configured in the kernel");
+		return -EINVAL;
+	}
+
+	ret = btrfs_dev_replace_start(root, tgt_path,
+					src_device->devid,
+					rcu_str_deref(src_device->name),
+		BTRFS_IOCTL_DEV_REPLACE_CONT_READING_FROM_SRCDEV_MODE_AVOID);
+	if (ret)
+		btrfs_put_spare_device(tgt_path);
+
+	kfree(tgt_path);
+
+	return ret;
+}
diff --git a/fs/btrfs/dev-replace.h b/fs/btrfs/dev-replace.h
index e922b42d91df..b918b9d6e5df 100644
--- a/fs/btrfs/dev-replace.h
+++ b/fs/btrfs/dev-replace.h
@@ -46,4 +46,5 @@ static inline void btrfs_dev_replace_stats_inc(atomic64_t *stat_value)
 {
 	atomic64_inc(stat_value);
 }
+int btrfs_auto_replace_start(struct btrfs_root *root, struct btrfs_device *src_device);
 #endif
-- 
2.7.0



* [PATCH 12/12] btrfs: check device for critical errors and mark failed
  2016-03-29 14:22 [PATCH v2 00/15] Introduce device state 'failed', Hot spare and Auto replace Anand Jain
                   ` (10 preceding siblings ...)
  2016-03-29 14:22 ` [PATCH 11/12] btrfs: introduce helper functions to perform hot replace Anand Jain
@ 2016-03-29 14:22 ` Anand Jain
  2016-03-29 22:41   ` Yauhen Kharuzhy
  2016-03-30  0:49   ` Yauhen Kharuzhy
  2016-03-29 14:27 ` [PATCH 1/4] btrfs-progs: Introduce BTRFS_FEATURE_INCOMPAT_SPARE_DEV SB flags Anand Jain
  2016-03-29 17:30 ` [PATCH v2 00/15] Introduce device state 'failed', Hot spare and Auto replace Austin S. Hemmelgarn
  13 siblings, 2 replies; 25+ messages in thread
From: Anand Jain @ 2016-03-29 14:22 UTC (permalink / raw)
  To: linux-btrfs; +Cc: clm, dsterba

Write and flush errors are considered critical errors, upon
which the device is brought offline and marked as failed. Write
and flush errors are identified using the device error
statistics.
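A minimal C11 sketch of the critical-error accounting, assuming local stand-ins (enum dev_stat, struct device) for the btrfs device statistics. Write and flush errors bump a separate counter, and the checker consumes it and marks the device failed. Here a single atomic_exchange() reads and clears the counter. The patch's atomic_read() followed by atomic_sub() has the same effect, since increments arriving in between remain counted.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Local stand-ins for the BTRFS_DEV_STAT_* indexes. */
enum dev_stat {
	STAT_WRITE_ERRS,
	STAT_READ_ERRS,
	STAT_FLUSH_ERRS,
	STAT_CORRUPTION_ERRS,
	STAT_MAX
};

/* Simplified device: error counters plus the new critical-error counter. */
struct device {
	atomic_int stat[STAT_MAX];
	atomic_int new_critical_errs;
	bool failed;
};

/* Like btrfs_dev_stat_inc() after this patch: write/flush errors are critical. */
static void dev_stat_inc(struct device *dev, enum dev_stat index)
{
	assert(index < STAT_MAX);
	atomic_fetch_add(&dev->stat[index], 1);
	if (index == STAT_WRITE_ERRS || index == STAT_FLUSH_ERRS)
		atomic_fetch_add(&dev->new_critical_errs, 1);
}

/* Checker pass: consume new critical errors; return true if the device failed. */
static bool check_device(struct device *dev)
{
	/* read and clear in one step; later increments stay counted */
	int c_err = atomic_exchange(&dev->new_critical_errs, 0);

	if (c_err)
		dev->failed = true;	/* the patch force-closes the device here */
	return c_err != 0;
}
```

Read errors are still recorded in the statistics but never mark the device failed, which matches the patch's choice of treating only write and flush errors as critical.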

Signed-off-by: Anand Jain <anand.jain@oracle.com>

btrfs: check for failed device and hot replace

This patch creates casualty_kthread to check for the failed
devices, and triggers device replace.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
 fs/btrfs/ctree.h   |   2 +
 fs/btrfs/disk-io.c | 161 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
 fs/btrfs/disk-io.h |   2 +
 fs/btrfs/volumes.c |   1 +
 fs/btrfs/volumes.h |   4 ++
 5 files changed, 169 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 2c185a8e92f0..36f1c29e00a0 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1569,6 +1569,7 @@ struct btrfs_fs_info {
 	struct mutex tree_log_mutex;
 	struct mutex transaction_kthread_mutex;
 	struct mutex cleaner_mutex;
+	struct mutex casualty_mutex;
 	struct mutex chunk_mutex;
 	struct mutex volume_mutex;
 
@@ -1686,6 +1687,7 @@ struct btrfs_fs_info {
 	struct btrfs_workqueue *extent_workers;
 	struct task_struct *transaction_kthread;
 	struct task_struct *cleaner_kthread;
+	struct task_struct *casualty_kthread;
 	int thread_pool_size;
 
 	struct kobject *space_info_kobj;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index b99329e37965..650e26e0acda 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1869,6 +1869,153 @@ sleep:
 	return 0;
 }
 
+static int btrfs_check_and_handle_casualty(void *arg)
+{
+	int ret;
+	int found = 0;
+	struct btrfs_device *device;
+	struct btrfs_root *root = arg;
+	struct btrfs_fs_info *fs_info = root->fs_info;
+	struct btrfs_fs_devices *fs_devices = fs_info->fs_devices;
+
+	btrfs_dev_replace_lock(&fs_info->dev_replace, 0);
+	if (btrfs_dev_replace_is_ongoing(&fs_info->dev_replace)) {
+		btrfs_dev_replace_unlock(&fs_info->dev_replace, 0);
+		return -EBUSY;
+	}
+	btrfs_dev_replace_unlock(&fs_info->dev_replace, 0);
+
+	ret = btrfs_check_devices(fs_devices);
+	if (ret == 1) {
+		/*
+		 * There were some casualties; if they are beyond what a
+		 * chunk group can tolerate, the FS will already be
+		 * read-only, so check that. That is the best btrfs can
+		 * do for now, and no replace will help.
+		 */
+		if (fs_info->sb->s_flags & MS_RDONLY)
+			return -EROFS;
+
+		mutex_lock(&fs_devices->device_list_mutex);
+		rcu_read_lock();
+		list_for_each_entry_rcu(device,
+				&fs_devices->devices, dev_list) {
+			if (device->failed) {
+				found = 1;
+				break;
+			}
+		}
+		rcu_read_unlock();
+		mutex_unlock(&fs_devices->device_list_mutex);
+	}
+
+	/*
+	 * We are using the replace code, which should be interruptible
+	 * during unmount. As of now there is no userland stop request
+	 * that we support, so this will run until it's complete.
+	 */
+	if (found)
+		ret = btrfs_auto_replace_start(root, device);
+
+	return ret;
+}
+
+/*
+ * A kthread to check whether any auto maintenance is required. This is
+ * multithread safe, and the kthread is running only if
+ * fs_info->casualty_kthread is not NULL, fixme: atomic ?
+ */
+static int casualty_kthread(void *arg)
+{
+	int ret;
+	int again;
+	struct btrfs_root *root = arg;
+
+	do {
+		again = 0;
+
+		if (btrfs_need_cleaner_sleep(root))
+			goto sleep;
+
+		if (!mutex_trylock(&root->fs_info->casualty_mutex))
+			goto sleep;
+
+		if (btrfs_need_cleaner_sleep(root)) {
+			mutex_unlock(&root->fs_info->casualty_mutex);
+			goto sleep;
+		}
+
+		ret = btrfs_check_and_handle_casualty(arg);
+		if (ret == -EROFS) {
+			/*
+			 * When checking and fixing the devices, the
+			 * FS may be marked as RO in some situations.
+			 * And on ROFS casualty thread has no work.
+			 * So optimize here, to stop this thread until
+			 * FS is back to RW.
+			 */
+		}
+		mutex_unlock(&root->fs_info->casualty_mutex);
+
+sleep:
+		if (!try_to_freeze() && !again) {
+			set_current_state(TASK_INTERRUPTIBLE);
+			if (!kthread_should_stop())
+				schedule();
+			__set_current_state(TASK_RUNNING);
+		}
+	} while (!kthread_should_stop());
+
+	return 0;
+}
+
+/*
+ * returns:
+ * < 0 : Check didn't run, std error
+ *   0 : No errors found
+ * > 0 : # of devices having fatal errors
+ */
+int btrfs_check_devices(struct btrfs_fs_devices *fs_devices)
+{
+	int ret = 0;
+	struct btrfs_fs_info *fs_info = fs_devices->fs_info;
+	struct btrfs_device *device;
+
+	if (btrfs_fs_closing(fs_info))
+		return -EBUSY;
+
+	/* mark disk(s) with write or flush error(s) as failed */
+	mutex_lock(&fs_info->volume_mutex);
+	list_for_each_entry_rcu(device, &fs_devices->devices, dev_list) {
+		int c_err;
+
+		/*
+		 * todo: replace target device's write/flush error,
+		 * skip for now
+		 */
+		if (device->is_tgtdev_for_dev_replace)
+			continue;
+
+		if (!device->dev_stats_valid)
+			continue;
+
+		c_err = atomic_read(&device->new_critical_errs);
+		atomic_sub(c_err, &device->new_critical_errs);
+		if (c_err) {
+			btrfs_crit_in_rcu(fs_info,
+				"Fatal error on device %s",
+					rcu_str_deref(device->name));
+
+			/* force close and mark device as failed */
+			btrfs_force_device_close(device, "failed");
+			ret = 1;
+		}
+	}
+	mutex_unlock(&fs_info->volume_mutex);
+
+	return ret;
+}
+
 static int transaction_kthread(void *arg)
 {
 	struct btrfs_root *root = arg;
@@ -1915,6 +2062,7 @@ static int transaction_kthread(void *arg)
 			btrfs_end_transaction(trans, root);
 		}
 sleep:
+		wake_up_process(root->fs_info->casualty_kthread);
 		wake_up_process(root->fs_info->cleaner_kthread);
 		mutex_unlock(&root->fs_info->transaction_kthread_mutex);
 
@@ -2663,6 +2811,7 @@ int open_ctree(struct super_block *sb,
 	mutex_init(&fs_info->chunk_mutex);
 	mutex_init(&fs_info->transaction_kthread_mutex);
 	mutex_init(&fs_info->cleaner_mutex);
+	mutex_init(&fs_info->casualty_mutex);
 	mutex_init(&fs_info->volume_mutex);
 	mutex_init(&fs_info->ro_block_group_mutex);
 	init_rwsem(&fs_info->commit_root_sem);
@@ -3005,11 +3154,16 @@ retry_root_backup:
 	if (IS_ERR(fs_info->cleaner_kthread))
 		goto fail_sysfs;
 
+	fs_info->casualty_kthread = kthread_run(casualty_kthread, tree_root,
+					       "btrfs-casualty");
+	if (IS_ERR(fs_info->casualty_kthread))
+		goto fail_cleaner;
+
 	fs_info->transaction_kthread = kthread_run(transaction_kthread,
 						   tree_root,
 						   "btrfs-transaction");
 	if (IS_ERR(fs_info->transaction_kthread))
-		goto fail_cleaner;
+		goto fail_casualty;
 
 	if (!btrfs_test_opt(tree_root, SSD) &&
 	    !btrfs_test_opt(tree_root, NOSSD) &&
@@ -3173,6 +3327,10 @@ fail_trans_kthread:
 	kthread_stop(fs_info->transaction_kthread);
 	btrfs_cleanup_transaction(fs_info->tree_root);
 	btrfs_free_fs_roots(fs_info);
+
+fail_casualty:
+	kthread_stop(fs_info->casualty_kthread);
+
 fail_cleaner:
 	kthread_stop(fs_info->cleaner_kthread);
 
@@ -3828,6 +3986,7 @@ void close_ctree(struct btrfs_root *root)
 
 	kthread_stop(fs_info->transaction_kthread);
 	kthread_stop(fs_info->cleaner_kthread);
+	kthread_stop(fs_info->casualty_kthread);
 
 	fs_info->closing = 2;
 	smp_mb();
diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h
index dd155621f95f..0a58b0c2bc46 100644
--- a/fs/btrfs/disk-io.h
+++ b/fs/btrfs/disk-io.h
@@ -156,4 +156,6 @@ static inline void btrfs_set_buffer_lockdep_class(u64 objectid,
 {
 }
 #endif
+
+int btrfs_check_devices(struct btrfs_fs_devices *fs_devices);
 #endif
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 308fcb55f2a1..95a530af8145 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -233,6 +233,7 @@ static struct btrfs_device *__alloc_device(void)
 	spin_lock_init(&dev->reada_lock);
 	atomic_set(&dev->reada_in_flight, 0);
 	atomic_set(&dev->dev_stats_ccnt, 0);
+	atomic_set(&dev->new_critical_errs, 0);
 	btrfs_device_data_ordered_init(dev);
 	INIT_RADIX_TREE(&dev->reada_zones, GFP_NOFS & ~__GFP_DIRECT_RECLAIM);
 	INIT_RADIX_TREE(&dev->reada_extents, GFP_NOFS & ~__GFP_DIRECT_RECLAIM);
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index b9c04fdf7166..9fc4c1734ba7 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -167,6 +167,7 @@ struct btrfs_device {
 	/* Counter to record the change of device stats */
 	atomic_t dev_stats_ccnt;
 	atomic_t dev_stat_values[BTRFS_DEV_STAT_VALUES_MAX];
+	atomic_t new_critical_errs;
 };
 
 /*
@@ -535,6 +536,9 @@ static inline void btrfs_dev_stat_inc(struct btrfs_device *dev,
 	atomic_inc(dev->dev_stat_values + index);
 	smp_mb__before_atomic();
 	atomic_inc(&dev->dev_stats_ccnt);
+	if (index == BTRFS_DEV_STAT_WRITE_ERRS ||
+		index == BTRFS_DEV_STAT_FLUSH_ERRS)
+		atomic_inc(&dev->new_critical_errs);
 }
 
 static inline int btrfs_dev_stat_read(struct btrfs_device *dev,
-- 
2.7.0



* [PATCH 1/4] btrfs-progs: Introduce BTRFS_FEATURE_INCOMPAT_SPARE_DEV SB flags
  2016-03-29 14:22 [PATCH v2 00/15] Introduce device state 'failed', Hot spare and Auto replace Anand Jain
                   ` (11 preceding siblings ...)
  2016-03-29 14:22 ` [PATCH 12/12] btrfs: check device for critical errors and mark failed Anand Jain
@ 2016-03-29 14:27 ` Anand Jain
  2016-03-29 14:27   ` [PATCH v2 2/4] btrfs-progs: Introduce btrfs spare subcommand Anand Jain
                     ` (2 more replies)
  2016-03-29 17:30 ` [PATCH v2 00/15] Introduce device state 'failed', Hot spare and Auto replace Austin S. Hemmelgarn
  13 siblings, 3 replies; 25+ messages in thread
From: Anand Jain @ 2016-03-29 14:27 UTC (permalink / raw)
  To: linux-btrfs; +Cc: clm, dsterba

Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
 ctree.h   | 4 +++-
 volumes.c | 4 ++++
 volumes.h | 2 ++
 3 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/ctree.h b/ctree.h
index 5ab0f4a45a15..97cbd032fbb1 100644
--- a/ctree.h
+++ b/ctree.h
@@ -480,6 +480,7 @@ struct btrfs_super_block {
 #define BTRFS_FEATURE_INCOMPAT_RAID56		(1ULL << 7)
 #define BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA	(1ULL << 8)
 #define BTRFS_FEATURE_INCOMPAT_NO_HOLES		(1ULL << 9)
+#define BTRFS_FEATURE_INCOMPAT_SPARE_DEV	(1ULL << 10)
 
 #define BTRFS_FEATURE_COMPAT_SUPP		0ULL
 
@@ -495,7 +496,8 @@ struct btrfs_super_block {
 	 BTRFS_FEATURE_INCOMPAT_RAID56 |		\
 	 BTRFS_FEATURE_INCOMPAT_MIXED_GROUPS |		\
 	 BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA |	\
-	 BTRFS_FEATURE_INCOMPAT_NO_HOLES)
+	 BTRFS_FEATURE_INCOMPAT_NO_HOLES |		\
+	 BTRFS_FEATURE_INCOMPAT_SPARE_DEV)
 
 /*
  * A leaf is full of items. offset and size tell us where to find
diff --git a/volumes.c b/volumes.c
index 4d22db25be1d..2a5dcd40c092 100644
--- a/volumes.c
+++ b/volumes.c
@@ -101,6 +101,10 @@ static int device_list_add(const char *path,
 		fs_devices->latest_devid = devid;
 		fs_devices->latest_trans = found_transid;
 		fs_devices->lowest_devid = (u64)-1;
+		if (btrfs_super_incompat_flags(disk_super) &
+				BTRFS_FEATURE_INCOMPAT_SPARE_DEV)
+			fs_devices->spare = 1;
+
 		device = NULL;
 	} else {
 		device = __find_device(&fs_devices->devices, devid,
diff --git a/volumes.h b/volumes.h
index c0007adc6a24..79cec37e9194 100644
--- a/volumes.h
+++ b/volumes.h
@@ -83,6 +83,8 @@ struct btrfs_fs_devices {
 
 	int seeding;
 	struct btrfs_fs_devices *seed;
+
+	int spare;
 };
 
 struct btrfs_bio_stripe {
-- 
2.7.0



* [PATCH v2 2/4] btrfs-progs: Introduce btrfs spare subcommand
  2016-03-29 14:27 ` [PATCH 1/4] btrfs-progs: Introduce BTRFS_FEATURE_INCOMPAT_SPARE_DEV SB flags Anand Jain
@ 2016-03-29 14:27   ` Anand Jain
  2016-03-29 14:27   ` [PATCH 3/4] btrfs-progs: add fi show for spare Anand Jain
  2016-03-29 14:27   ` [PATCH 4/4] btrfs-progs: add global spare device list to filesystem show Anand Jain
  2 siblings, 0 replies; 25+ messages in thread
From: Anand Jain @ 2016-03-29 14:27 UTC (permalink / raw)
  To: linux-btrfs; +Cc: clm, dsterba

Add a new subcommand so that a global spare device can be
added. A separate subcommand group is better because it can
later be extended to provide per-FSID spares.

btrfs spare add <dev> ..

This creates a btrfs on the device with the newly introduced
flag, BTRFS_FEATURE_INCOMPAT_SPARE_DEV, and then calls
btrfs_register_one_device() to let the kernel know about it.

This is compatible with older kernels: they fail to mount the
device because of the unknown incompat flag.
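Why older kernels refuse a spare can be shown with the generic incompat-feature rule: any on-disk incompat bit that is not in the kernel's supported mask makes the mount fail (open_ctree() reports unsupported optional features and bails out). The sketch below assumes simplified supported masks; check_incompat() is hypothetical, and -EINVAL is used here as the error.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define BTRFS_FEATURE_INCOMPAT_SPARE_DEV (1ULL << 10)

/*
 * Generic rule: any on-disk incompat bit missing from the kernel's
 * supported mask makes the mount fail. The supported mask is a
 * parameter so an old and a new kernel can be compared.
 */
static int check_incompat(uint64_t disk_incompat, uint64_t supp_mask)
{
	uint64_t unsupported = disk_incompat & ~supp_mask;

	return unsupported ? -EINVAL : 0;
}
```

With a hypothetical old mask covering only bits 0-9, a spare's superblock is rejected. Once SPARE_DEV is in the mask, the generic check passes, and the explicit spare check added to open_ctree() in PATCH 08/12 is what keeps the device from being mounted.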


Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
v2:
  Commit log updated
  Changes as per mixed patch from Chandan
  Call btrfs_register_one_device() so that the user need not run
   btrfs dev scan again
  Use error() instead of fprintf(stderr,
  
 Android.mk   |   2 +-
 Makefile.in  |   3 +-
 btrfs.c      |   1 +
 cmds-spare.c | 292 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 commands.h   |   2 +
 5 files changed, 298 insertions(+), 2 deletions(-)
 create mode 100644 cmds-spare.c

diff --git a/Android.mk b/Android.mk
index fe3209b63dfe..baaf17967864 100644
--- a/Android.mk
+++ b/Android.mk
@@ -27,7 +27,7 @@ cmds_objects := cmds-subvolume.c cmds-filesystem.c cmds-device.c cmds-scrub.c \
                cmds-inspect.c cmds-balance.c cmds-send.c cmds-receive.c \
                cmds-quota.c cmds-qgroup.c cmds-replace.c cmds-check.c \
                cmds-restore.c cmds-rescue.c chunk-recover.c super-recover.c \
-               cmds-property.c cmds-fi-usage.c
+               cmds-property.c cmds-fi-usage.c cmds-spare.c
 libbtrfs_objects := send-stream.c send-utils.c rbtree.c btrfs-list.c crc32c.c \
                    uuid-tree.c utils-lib.c rbtree-utils.c
 libbtrfs_headers := send-stream.h send-utils.h send.h rbtree.h btrfs-list.h \
diff --git a/Makefile.in b/Makefile.in
index 71ef76d4fd4e..f7a1e7dc11f8 100644
--- a/Makefile.in
+++ b/Makefile.in
@@ -76,7 +76,8 @@ cmds_objects = cmds-subvolume.o cmds-filesystem.o cmds-device.o cmds-scrub.o \
 	       cmds-quota.o cmds-qgroup.o cmds-replace.o cmds-check.o \
 	       cmds-restore.o cmds-rescue.o chunk-recover.o super-recover.o \
 	       cmds-property.o cmds-fi-usage.o cmds-inspect-dump-tree.o \
-	       cmds-inspect-dump-super.o cmds-inspect-tree-stats.o cmds-fi-du.o
+	       cmds-inspect-dump-super.o cmds-inspect-tree-stats.o cmds-fi-du.o \
+	       cmds-spare.o
 libbtrfs_objects = send-stream.o send-utils.o rbtree.o btrfs-list.o crc32c.o \
 		   uuid-tree.o utils-lib.o rbtree-utils.o
 libbtrfs_headers = send-stream.h send-utils.h send.h rbtree.h btrfs-list.h \
diff --git a/btrfs.c b/btrfs.c
index cc7051531824..455f78ec8012 100644
--- a/btrfs.c
+++ b/btrfs.c
@@ -200,6 +200,7 @@ static const struct cmd_group btrfs_cmd_group = {
 		{ "quota", cmd_quota, NULL, &quota_cmd_group, 0 },
 		{ "qgroup", cmd_qgroup, NULL, &qgroup_cmd_group, 0 },
 		{ "replace", cmd_replace, NULL, &replace_cmd_group, 0 },
+		{ "spare", cmd_spare, NULL, &spare_cmd_group, 0 },
 		{ "help", cmd_help, cmd_help_usage, NULL, 0 },
 		{ "version", cmd_version, cmd_version_usage, NULL, 0 },
 		NULL_CMD_STRUCT
diff --git a/cmds-spare.c b/cmds-spare.c
new file mode 100644
index 000000000000..2a5c4b4d7308
--- /dev/null
+++ b/cmds-spare.c
@@ -0,0 +1,292 @@
+/*
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public
+ * License v2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; if not, write to the
+ * Free Software Foundation, Inc., 59 Temple Place - Suite 330,
+ * Boston, MA 021110-1307, USA.
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <fcntl.h>
+#include <sys/ioctl.h>
+#include <errno.h>
+#include <getopt.h>
+
+#include "ctree.h"
+#include "utils.h"
+#include "volumes.h"
+#include "disk-io.h"
+
+#include "commands.h"
+
+int print_spare_device(unsigned unit_mode)
+{
+	int ret;
+	struct btrfs_fs_devices *fs_devices;
+	struct btrfs_device *device;
+	struct list_head *fs_uuids;
+
+	printf("Global spare\n");
+
+	ret = btrfs_scan_lblkid();
+	if (ret) {
+		error("scan_lblkid failed ret %d", ret);
+		return ret;
+	}
+
+	fs_uuids = btrfs_scanned_uuids();
+
+	list_for_each_entry(fs_devices, fs_uuids, list) {
+		if (!fs_devices->spare)
+			continue;
+
+		device = list_entry(fs_devices->devices.next,
+					struct btrfs_device, dev_list);
+		if (device->name)
+			printf("\tdevice size %s path %s\n",
+				pretty_size_mode(device->total_bytes,
+					unit_mode), device->name);
+
+	}
+
+	return 0;
+
+}
+
+static void btrfs_delete_spare(char *path)
+{
+	printf("Unscan the device (or don't run device scan after reboot) and run wipefs to wipe SB\n");
+
+}
+
+static void btrfs_add_spare(char *dev)
+{
+	struct stat st;
+	int fd;
+	int i;
+	int ret;
+	u64 block_cnt;
+	u64 blocks[7];
+	u32 nodesz = max_t(u32, sysconf(_SC_PAGESIZE), BTRFS_MKFS_DEFAULT_NODE_SIZE);
+	struct btrfs_mkfs_config mkfs_cfg;
+
+	fd = open(dev, O_RDWR);
+	if (fd < 0) {
+		error("unable to open %s: %s", dev, strerror(errno));
+		return;
+	}
+
+	if (fstat(fd, &st)) {
+		error("unable to stat %s", dev);
+		goto out;
+	}
+	block_cnt = btrfs_device_size(fd, &st);
+	if (!block_cnt) {
+		error("unable to find %s size", dev);
+		goto out;
+	}
+
+	if (block_cnt < BTRFS_MKFS_SYSTEM_GROUP_SIZE) {
+		error("device is too small to make filesystem");
+		goto out;
+	}
+
+	blocks[0] = BTRFS_SUPER_INFO_OFFSET;
+	for (i = 1; i < 7; i++)
+		blocks[i] = BTRFS_SUPER_INFO_OFFSET + 1024 * 1024 + nodesz * i;
+
+	memset(&mkfs_cfg, 0, sizeof(mkfs_cfg));
+	memcpy(mkfs_cfg.blocks, blocks, sizeof(blocks));
+	mkfs_cfg.num_bytes = block_cnt;
+	mkfs_cfg.nodesize = nodesz;
+	mkfs_cfg.sectorsize = 4096;
+	mkfs_cfg.stripesize = 4096;
+	mkfs_cfg.features = BTRFS_FEATURE_INCOMPAT_SPARE_DEV;
+	ret = make_btrfs(fd, &mkfs_cfg);
+	if (!ret) {
+		close(fd);
+		btrfs_register_one_device(dev);
+		return;
+	}
+	error("during mkfs: %s", strerror(-ret));
+
+out:
+	close(fd);
+}
+
+static const char * const spare_cmd_group_usage[] = {
+	"btrfs spare <command> [<args>]",
+	NULL
+};
+
+static const char * const cmd_spare_add_usage[] = {
+	"btrfs spare add <device> [<device>...]",
+	"Add global spare device(s) to btrfs",
+	"-K|--nodiscard    do not perform whole device TRIM",
+	"-f|--force        force overwrite existing filesystem on the disk",
+	NULL
+};
+
+static const char * const cmd_spare_delete_usage[] = {
+	"btrfs spare delete <device> [<device>...]",
+	"Delete global spare device(s) from btrfs",
+	NULL
+};
+
+static const char * const cmd_spare_list_usage[] = {
+	"btrfs spare list",
+	"List spare device(s), both scanned and unscanned (*) by the kernel",
+	NULL
+};
+
+static int cmd_spare_add(int argc, char **argv)
+{
+	int i;
+	int force = 0;
+	int discard = 1;
+	int ret = 0;
+
+	while (1) {
+		int c;
+		static const struct option long_options[] = {
+			{ "nodiscard", no_argument, NULL, 'K'},
+			{ "force", no_argument, NULL, 'f'},
+			{ NULL, 0, NULL, 0}
+		};
+
+		c = getopt_long(argc, argv, "fK", long_options, NULL);
+		if (c < 0)
+			break;
+
+		switch (c) {
+		case 'K':
+			discard = 0;
+			break;
+		case 'f':
+			force = 1;
+			break;
+		default:
+			usage(cmd_spare_add_usage);
+		}
+	}
+
+	if (check_argc_min(argc - optind, 1))
+		usage(cmd_spare_add_usage);
+
+	for (i = optind; i < argc; i++) {
+		u64 dev_block_count = 0;
+		int devfd;
+		char *path;
+		int res;
+
+		if (test_dev_for_mkfs(argv[i], force)) {
+			ret++;
+			continue;
+		}
+
+		devfd = open(argv[i], O_RDWR);
+		if (devfd < 0) {
+			error("Unable to open device '%s'", argv[i]);
+			ret++;
+			continue;
+		}
+
+		res = btrfs_prepare_device(devfd, argv[i], 1,
+					&dev_block_count, 0, discard);
+		close(devfd);
+		if (res) {
+			ret++;
+			goto error_out;
+		}
+
+		path = canonicalize_path(argv[i]);
+		if (!path) {
+			error("Could not canonicalize pathname '%s': %s\n",
+				argv[i], strerror(errno));
+			ret++;
+			goto error_out;
+		}
+
+		btrfs_add_spare(path);
+		free(path);
+	}
+error_out:
+	btrfs_close_all_devices();
+	return !!ret;
+}
+
+static int cmd_spare_delete(int argc, char **argv)
+{
+	int i;
+	char *path;
+	int ret = 0;
+
+	if (check_argc_min(argc - optind, 1))
+		usage(cmd_spare_delete_usage);
+
+	for (i = optind; i < argc; i++) {
+		int devfd;
+
+		devfd = open(argv[i], O_RDWR);
+		if (devfd < 0) {
+			error("Unable to open device '%s'\n", argv[i]);
+			ret++;
+			continue;
+		}
+		close(devfd);
+
+		path = canonicalize_path(argv[i]);
+		if (!path) {
+			error("Could not canonicalize pathname '%s': %s\n",
+				argv[i], strerror(errno));
+			ret++;
+			goto error_out;
+		}
+
+		btrfs_delete_spare(path);
+		free(path);
+	}
+
+error_out:
+	btrfs_close_all_devices();
+	return !!ret;
+}
+
+int cmd_spare_list(int argc, char **argv)
+{
+	int ret;
+	unsigned unit_mode;
+
+	unit_mode = get_unit_mode_from_arg(&argc, argv, 0);
+
+	ret = print_spare_device(unit_mode);
+
+	return !!ret;
+}
+
+static const char spare_cmd_group_info[] =
+	"manage spare devices in the filesystem";
+
+const struct cmd_group spare_cmd_group = {
+	spare_cmd_group_usage, spare_cmd_group_info, {
+		{ "add", cmd_spare_add, cmd_spare_add_usage, NULL, 0 },
+		{ "delete", cmd_spare_delete, cmd_spare_delete_usage, NULL, 0},
+		{ "list", cmd_spare_list, cmd_spare_list_usage, NULL, 0},
+		NULL_CMD_STRUCT
+	}
+};
+
+int cmd_spare(int argc, char **argv)
+{
+	return handle_command_group(&spare_cmd_group, argc, argv);
+}
diff --git a/commands.h b/commands.h
index 2da093bf81a3..c9be13fb15ba 100644
--- a/commands.h
+++ b/commands.h
@@ -95,6 +95,7 @@ extern const struct cmd_group quota_cmd_group;
 extern const struct cmd_group qgroup_cmd_group;
 extern const struct cmd_group replace_cmd_group;
 extern const struct cmd_group rescue_cmd_group;
+extern const struct cmd_group spare_cmd_group;
 
 extern const char * const cmd_send_usage[];
 extern const char * const cmd_receive_usage[];
@@ -119,6 +120,7 @@ int cmd_receive(int argc, char **argv);
 int cmd_quota(int argc, char **argv);
 int cmd_qgroup(int argc, char **argv);
 int cmd_replace(int argc, char **argv);
+int cmd_spare(int argc, char **argv);
 int cmd_restore(int argc, char **argv);
 int cmd_select_super(int argc, char **argv);
 int cmd_dump_super(int argc, char **argv);
-- 
2.7.0


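The option handling in cmd_spare_add() above can be exercised on its own. Below is a minimal standalone sketch, assuming -K/--nodiscard and -f/--force both take no argument, as the usage text describes. parse_spare_add_opts() is a hypothetical helper, not part of btrfs-progs. Every short option has to appear in the optstring passed to getopt_long(); a short option listed only in the long_options table is reported as unknown.

```c
#include <getopt.h>
#include <stddef.h>

/*
 * Hypothetical helper mirroring the option handling of "btrfs spare add".
 * On success returns the argv index of the first device path; returns -1
 * on an unknown option or when no device follows the options.
 */
static int parse_spare_add_opts(int argc, char **argv, int *force, int *discard)
{
	static const struct option long_options[] = {
		{ "nodiscard", no_argument, NULL, 'K' },
		{ "force", no_argument, NULL, 'f' },
		{ NULL, 0, NULL, 0 }
	};
	int c;

	*force = 0;
	*discard = 1;
	optind = 0;	/* 0 requests a full rescan (glibc, musl and BSD honour it) */
	opterr = 0;	/* let the caller print usage instead of getopt's message */

	/* Both short options must be listed here, not just "f". */
	while ((c = getopt_long(argc, argv, "Kf", long_options, NULL)) != -1) {
		switch (c) {
		case 'K':
			*discard = 0;
			break;
		case 'f':
			*force = 1;
			break;
		default:
			return -1;
		}
	}

	/* At least one device path must follow the options. */
	if (optind >= argc)
		return -1;
	return optind;
}
```

A caller would loop from the returned index to argc, handling one device per iteration, just as cmd_spare_add() does from optind.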

* [PATCH 3/4] btrfs-progs: add fi show for spare
  2016-03-29 14:27 ` [PATCH 1/4] btrfs-progs: Introduce BTRFS_FEATURE_INCOMPAT_SPARE_DEV SB flags Anand Jain
  2016-03-29 14:27   ` [PATCH v2 2/4] btrfs-progs: Introduce btrfs spare subcommand Anand Jain
@ 2016-03-29 14:27   ` Anand Jain
  2016-03-29 14:27   ` [PATCH 4/4] btrfs-progs: add global spare device list to filesystem show Anand Jain
  2 siblings, 0 replies; 25+ messages in thread
From: Anand Jain @ 2016-03-29 14:27 UTC (permalink / raw)
  To: linux-btrfs; +Cc: clm, dsterba

Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
 cmds-filesystem.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/cmds-filesystem.c b/cmds-filesystem.c
index 38404d29026e..0901c47e8679 100644
--- a/cmds-filesystem.c
+++ b/cmds-filesystem.c
@@ -351,6 +351,9 @@ static void print_one_uuid(struct btrfs_fs_devices *fs_devices,
 	if (add_seen_fsid(fs_devices->fsid))
 		return;
 
+	if (fs_devices->spare)
+		return;
+
 	uuid_unparse(fs_devices->fsid, uuidbuf);
 	device = list_entry(fs_devices->devices.next, struct btrfs_device,
 			    dev_list);
@@ -597,6 +600,7 @@ static int copy_fs_devices(struct btrfs_fs_devices *dst,
 	memcpy(dst->fsid, src->fsid, BTRFS_FSID_SIZE);
 	INIT_LIST_HEAD(&dst->devices);
 	dst->seed = NULL;
+	dst->spare = src->spare;
 
 	list_for_each_entry(cur_dev, &src->devices, dev_list) {
 		dev_copy = malloc(sizeof(*dev_copy));
-- 
2.7.0



* [PATCH 4/4] btrfs-progs: add global spare device list to filesystem show
  2016-03-29 14:27 ` [PATCH 1/4] btrfs-progs: Introduce BTRFS_FEATURE_INCOMPAT_SPARE_DEV SB flags Anand Jain
  2016-03-29 14:27   ` [PATCH v2 2/4] btrfs-progs: Introduce btrfs spare subcommand Anand Jain
  2016-03-29 14:27   ` [PATCH 3/4] btrfs-progs: add fi show for spare Anand Jain
@ 2016-03-29 14:27   ` Anand Jain
  2 siblings, 0 replies; 25+ messages in thread
From: Anand Jain @ 2016-03-29 14:27 UTC (permalink / raw)
  To: linux-btrfs; +Cc: clm, dsterba

This patch adds the list of spare devices to the filesystem show
output, as shown in the example below.

btrfs fi show
Label: none  uuid: 17f7d403-17d7-4f0a-b8ba-de673fdd3f56
	Total devices 2 FS bytes used 15.88MiB
	devid    1 size 2.00GiB used 417.50MiB path /dev/sdc
	devid    2 size 2.00GiB used 417.50MiB path /dev/sdd

Global spare
	device size 3.00GiB path /dev/sde

btrfs-progs v4.2.3-12-gb5f4b68

Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
 cmds-filesystem.c | 5 +++++
 utils.h           | 1 +
 2 files changed, 6 insertions(+)

diff --git a/cmds-filesystem.c b/cmds-filesystem.c
index 0901c47e8679..78e5eba624dd 100644
--- a/cmds-filesystem.c
+++ b/cmds-filesystem.c
@@ -903,6 +903,11 @@ devs_only:
 					struct btrfs_fs_devices, list);
 		free_fs_devices(fs_devices);
 	}
+
+	if (where == -1 && search == NULL) {
+		ret = print_spare_device(unit_mode);
+		printf("\n");
+	}
 out:
 	free_seen_fsid();
 	return ret;
diff --git a/utils.h b/utils.h
index ff5966d11913..ea54692aca86 100644
--- a/utils.h
+++ b/utils.h
@@ -280,6 +280,7 @@ const char *get_argv0_buf(void);
 unsigned int get_unit_mode_from_arg(int *argc, char *argv[], int df_mode);
 void clean_args_no_options(int argc, char *argv[], const char * const *usage);
 int string_is_numerical(const char *str);
+int print_spare_device(unsigned unit_mode);
 
 __attribute__ ((format (printf, 1, 2)))
 static inline void warning(const char *fmt, ...)
-- 
2.7.0


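Taken together, patches 3/4 and 4/4 make "fi show" skip fs_devices entries flagged as spare in the per-filesystem listing, and append the global spare list only when no filter was given (where == -1 && search == NULL). A simplified userspace model of that control flow, for illustration only: struct fs_entry and show_filesystems() are hypothetical, and the output format is abbreviated.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical, much-simplified stand-in for struct btrfs_fs_devices. */
struct fs_entry {
	const char *name;	/* fsid for a filesystem, path for a spare */
	int spare;		/* nonzero if this entry is a global spare */
};

/*
 * Print regular filesystems, skipping spares (as print_one_uuid() does
 * after patch 3/4), then list spares under "Global spare" only when the
 * output is unfiltered (the where == -1 && search == NULL check in 4/4).
 * Returns the number of entries printed.
 */
static int show_filesystems(FILE *out, const struct fs_entry *e, size_t n,
			    int filtered)
{
	int printed = 0;
	int header_done = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (e[i].spare)
			continue;
		fprintf(out, "uuid: %s\n", e[i].name);
		printed++;
	}

	if (filtered)
		return printed;

	for (i = 0; i < n; i++) {
		if (!e[i].spare)
			continue;
		if (!header_done) {
			fprintf(out, "\nGlobal spare\n");
			header_done = 1;
		}
		fprintf(out, "\tpath %s\n", e[i].name);
		printed++;
	}
	return printed;
}
```

Printing the "Global spare" header lazily keeps the section out of the output entirely when no spare is registered.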

* Re: [PATCH 11/12] btrfs: introduce helper functions to perform hot replace
  2016-03-29 14:22 ` [PATCH 11/12] btrfs: introduce helper functions to perform hot replace Anand Jain
@ 2016-03-29 14:45   ` kbuild test robot
  2016-03-30 10:13     ` Anand Jain
  0 siblings, 1 reply; 25+ messages in thread
From: kbuild test robot @ 2016-03-29 14:45 UTC (permalink / raw)
  To: Anand Jain; +Cc: kbuild-all, linux-btrfs, clm, dsterba

[-- Attachment #1: Type: text/plain, Size: 2119 bytes --]

Hi Anand,

[auto build test ERROR on btrfs/next]
[also build test ERROR on v4.6-rc1 next-20160329]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Anand-Jain/btrfs-Introduce-a-new-function-to-check-if-all-chunks-a-OK-for-degraded-mount/20160329-222724
base:   https://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git next
config: sparc64-allmodconfig (attached as .config)
reproduce:
        wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=sparc64 

All error/warnings (new ones prefixed by >>):

   fs/btrfs/dev-replace.c: In function 'btrfs_auto_replace_start':
>> fs/btrfs/dev-replace.c:962:8: warning: passing argument 2 of 'btrfs_dev_replace_start' from incompatible pointer type
     ret = btrfs_dev_replace_start(root, tgt_path,
           ^
   fs/btrfs/dev-replace.c:308:5: note: expected 'struct btrfs_ioctl_dev_replace_args *' but argument is of type 'char *'
    int btrfs_dev_replace_start(struct btrfs_root *root,
        ^
>> fs/btrfs/dev-replace.c:962:8: error: too many arguments to function 'btrfs_dev_replace_start'
     ret = btrfs_dev_replace_start(root, tgt_path,
           ^
   fs/btrfs/dev-replace.c:308:5: note: declared here
    int btrfs_dev_replace_start(struct btrfs_root *root,
        ^

vim +/btrfs_dev_replace_start +962 fs/btrfs/dev-replace.c

   956		if (btrfs_get_spare_device(&tgt_path)) {
   957			btrfs_err(root->fs_info,
   958				"No spare device found/configured in the kernel");
   959			return -EINVAL;
   960		}
   961	
 > 962		ret = btrfs_dev_replace_start(root, tgt_path,
   963						src_device->devid,
   964						rcu_str_deref(src_device->name),
   965			BTRFS_IOCTL_DEV_REPLACE_CONT_READING_FROM_SRCDEV_MODE_AVOID);

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 44805 bytes --]


* Re: [PATCH v2 00/15] Introduce device state 'failed', Hot spare and Auto replace
  2016-03-29 14:22 [PATCH v2 00/15] Introduce device state 'failed', Hot spare and Auto replace Anand Jain
                   ` (12 preceding siblings ...)
  2016-03-29 14:27 ` [PATCH 1/4] btrfs-progs: Introduce BTRFS_FEATURE_INCOMPAT_SPARE_DEV SB flags Anand Jain
@ 2016-03-29 17:30 ` Austin S. Hemmelgarn
  13 siblings, 0 replies; 25+ messages in thread
From: Austin S. Hemmelgarn @ 2016-03-29 17:30 UTC (permalink / raw)
  To: Anand Jain, linux-btrfs; +Cc: clm, dsterba

On 2016-03-29 10:22, Anand Jain wrote:
> Thanks for various comments, tests and feedback.
>
> Background: Hot spare and Auto replace:
>   Hot spare is predominantly used to mitigate or narrow the time
>   window during which storage runs in degraded mode, when any further
>   disk failure might lead to catastrophic data loss. Data center
>   storage generally has a couple of disks reserved as spares. This
>   is mainly an enterprise storage feature rather than a FS feature;
>   I believe people acquainted with enterprise storage use cases will
>   appreciate the need for it, which is why most/all enterprise
>   storage has a hot spare feature.
>
> Btrfs device states:
>   This patch set adds a 'failed' state and makes provision to use an
>   'offline' state, as two new device states. To summarize the
>   various device states and their meanings:
>
>   /* missing: device wasn't found at the time of mount */
>   int missing;
>
>   /*
>    * failed: device confirmed to have experienced critical
>    * io failure
>    */
>   int failed;
>
>   /*
>    * offline: When there is no confirmation that a disk has
>    * failed. But an interim communication breakdown
>    * and not necessarily a candidate for the device replace.
>    * Device might be online after user intervention or after
>    * block transport layer error recovery.
>    */
>   int offline;
>
>
> Device state transition Tuning and visualization:
>   Sysfs interfaces are planned to provide the required tuning for
>   device state transition sensitivities and visualization of device
>   states. However, the sysfs framework which could provide such an
>   interface is still being reviewed/tested and is not yet ready. So
>   for testing and debugging these features I have used an updated
>   version of the procfs patch which is on the ML:
>
>        [PATCH] btrfs: debug: procfs-devlist: introduce procfs interface for the device list for debugging
>
>   I find the above patch very useful and stable compared to sysfs
>   for visualizing the device state.
>
> This patch set does not depend on any of the sysfs patches as such.
>
> Cross compatibility:
>   Adds a new incompat feature flag
>   (BTRFS_FEATURE_INCOMPAT_SPARE_DEV) to manage the spare device
>   when older kernels are used. It has been tested to work fine
>   with older kernel/progs versions.
>
>
> Auto replace:
>   Replace happens automatically: when any write or flush fails,
>   the device is marked as failed, which stops any further IO
>   attempts to that device. In the next commit cycle, auto replace
>   picks the spare device to replace the failed device, and so the
>   btrfs volume is brought back to a healthy state.
>
> Per FSID spare vs Global spare:
>   As of now only global hot spares are supported, that is, hot
>   spare(s) serve all the btrfs filesystems in the system. In the
>   future there will be a fs_info->no_auto_replace tunable which the
>   user can set to limit the use of the global spare.
>
>
> Example use case:
>   Here below is an example use case of the hot spare setup.
>
>   Add a spare device:
>          btrfs spare add /dev/sde -f
>
>   If a spare device was already added before, just run
>
>          btrfs dev scan [/dev/sde]
>
>   which will register the spare device with the kernel.
>
>          btrfs fi show
>           Label: none uuid: 52f170c1-725c-457d-8cfd-d57090460091
>            Total devices 2 FS bytes used 112.00KiB
>            devid 1 size 2.00GiB used 417.50MiB path /dev/sdc
>            devid 2 size 2.00GiB used 417.50MiB path /dev/sdd
>
>          Global spare
>            device size 3.00GiB path /dev/sde
>
>
> Patches:
>
> Kernel:
>   First, it needs Qu's per-chunk missing device patchset, which is
>   part of this set.
>
>   Next, patch 6/12 brings in support to dynamically manage the
>   transition of devices from online (no state) to offline OR failed
>   state, on top of static device states like the current "missing" state.
>
>   Next, patches 7-11/12 add support for spare devices. Kernels without
>   the spare feature keep away from the spare device, and kernels that
>   support it will refuse to mount a spare device. Further, these
>   patches provide helper functions to pick a spare device and to
>   release a spare device back to the spare device pool.
>
>   Patch 11/12 provides the function for auto replace, mostly derived
>   from the existing replace code.
>   The last patch, 12/12, uses all these facilities: it picks a failed
>   device and triggers an auto replace in a kthread (casualty_kthread()).
>
>
> Progs:
>   Needs the 4 patches below, which add the 'spare' sub-command to
>   manage spare devices. As of now, deleting a spare device has to be
>   done using wipefs; in the long run we would like a proper btrfs
>   command to do that job.
>
>
> V1->V2:
> Kernel:
>   (Based on tests and comments provided on the ML)
>   a. transition_kthread() now wakes up casualty_kthread to check
>      the device states, instead of doing that in transition_kthread()
>      itself. This is cleaner and puts less pressure on transition_kthread().
>   b. Dropped
>       [PATCH 05/15] btrfs: optimize btrfs_check_degradable() for calls outside of barrier
>      as it was the wrong patch and the optimization was incomplete.
>   c. Merged patch
>      btrfs: check for failed device and hot replace
>        into
>      btrfs: check device for critical errors and mark failed
>      to make the change described in (a) above.
>
> Progs:
>   a. Call btrfs_register_one_device() when doing btrfs
>      spare add
>
>
> Anand Jain (7):
>    btrfs: introduce device dynamic state transition to offline or failed
>    btrfs: introduce BTRFS_FEATURE_INCOMPAT_SPARE_DEV
>    btrfs: add check not to mount a spare device
>    btrfs: support btrfs dev scan for spare device
>    btrfs: provide framework to get and put a spare device
>    btrfs: introduce helper functions to perform hot replace
>    btrfs: check device for critical errors and mark failed
>
> Qu Wenruo (5):
>    btrfs: Introduce a new function to check if all chunks a OK for
>      degraded mount
>    btrfs: Do per-chunk check for mount time check
>    btrfs: Do per-chunk degraded check for remount
>    btrfs: Allow barrier_all_devices to do per-chunk device check
>    btrfs: Cleanup num_tolerated_disk_barrier_failures
>
>   fs/btrfs/ctree.h       |   8 +-
>   fs/btrfs/dev-replace.c |  24 +++++
>   fs/btrfs/dev-replace.h |   1 +
>   fs/btrfs/disk-io.c     | 256 +++++++++++++++++++++++++++++++++--------------
>   fs/btrfs/disk-io.h     |   4 +-
>   fs/btrfs/super.c       |  20 +++-
>   fs/btrfs/volumes.c     | 263 +++++++++++++++++++++++++++++++++++++++++++++----
>   fs/btrfs/volumes.h     |  27 +++++
>   8 files changed, 504 insertions(+), 99 deletions(-)
>
> Anand Jain (4):
>    btrfs-progs: Introduce BTRFS_FEATURE_INCOMPAT_SPARE_DEV SB flags
>    btrfs-progs: Introduce btrfs spare subcommand
>    btrfs-progs: add fi show for spare
>    btrfs-progs: add global spare device list to filesystem show
>
>   Android.mk        |   2 +-
>   Makefile.in       |   3 +-
>   btrfs.c           |   1 +
>   cmds-filesystem.c |   9 ++
>   cmds-spare.c      | 292 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
>   commands.h        |   2 +
>   ctree.h           |   4 +-
>   utils.h           |   1 +
>   volumes.c         |   4 +
>   volumes.h         |   2 +
>   10 files changed, 317 insertions(+), 3 deletions(-)
>   create mode 100644 cmds-spare.c
>
I can't provide the same degree of testing this time that I did for the 
previous version (the system I had set up with my normal testing harness 
is offline for the foreseeable future).  That said, I've built and 
booted a kernel with these patches in a VM on my laptop and tested the 
new functionality, and everything appears to work like it's supposed to 
without breaking any existing code, so for the patch-set as a whole:

Tested-by: Austin S. Hemmelgarn <ahferroin7@gmail.com>

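The device states summarized in the quoted cover letter (missing, failed, offline), together with the rule from patch 12/12 that write and flush errors are critical, can be modelled as a small state machine. This is a hypothetical sketch for illustration only. The patch set itself keeps separate int flags in the device structure. The enum, the event names and the choice to treat 'failed' as sticky are assumptions made here.

```c
/* Hypothetical single-enum model of the per-device flags in the patch set. */
enum dev_state { DEV_ONLINE, DEV_OFFLINE, DEV_FAILED, DEV_MISSING };

/* Hypothetical transport-level events seen for a device. */
enum dev_event { EV_NONE, EV_TRANSPORT_LOST, EV_TRANSPORT_BACK };

/*
 * Only the two error counters treated as critical are modelled; btrfs
 * device stats also track read, corruption and generation errors.
 */
struct dev_err_stats {
	unsigned long write_errs;
	unsigned long flush_errs;
};

static enum dev_state dev_next_state(enum dev_state cur, enum dev_event ev,
				     const struct dev_err_stats *st)
{
	/* 'missing' is decided at mount time and is not changed here. */
	if (cur == DEV_MISSING)
		return cur;
	/* Assumption: a failed device stays failed until it is replaced. */
	if (cur == DEV_FAILED)
		return cur;
	/* Write and flush errors are critical: the device is marked failed. */
	if (st->write_errs || st->flush_errs)
		return DEV_FAILED;
	/* Otherwise a communication breakdown only takes it offline. */
	switch (ev) {
	case EV_TRANSPORT_LOST:
		return DEV_OFFLINE;
	case EV_TRANSPORT_BACK:
		return DEV_ONLINE;
	default:
		return cur;
	}
}

/* Only a confirmed failure makes a device a candidate for auto replace. */
static int dev_is_replace_candidate(enum dev_state s)
{
	return s == DEV_FAILED;
}
```

Keeping 'offline' out of dev_is_replace_candidate() reflects the cover letter's point that an offline device may come back after transport-level recovery and should not consume a spare.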

* Re: [PATCH 12/12] btrfs: check device for critical errors and mark failed
  2016-03-29 14:22 ` [PATCH 12/12] btrfs: check device for critical errors and mark failed Anand Jain
@ 2016-03-29 22:41   ` Yauhen Kharuzhy
  2016-04-01 23:53     ` Anand Jain
  2016-03-30  0:49   ` Yauhen Kharuzhy
  1 sibling, 1 reply; 25+ messages in thread
From: Yauhen Kharuzhy @ 2016-03-29 22:41 UTC (permalink / raw)
  To: Anand Jain; +Cc: linux-btrfs

On Tue, Mar 29, 2016 at 10:22:29PM +0800, Anand Jain wrote:
> Write and Flush errors are considered as critical errors,
> upon which the device will be brought offline and marked as
> failed. Write and Flush errors are identified using device
> error statistics.
> 
> Signed-off-by: Anand Jain <anand.jain@oracle.com>
> 
> btrfs: check for failed device and hot replace
> 
> This patch creates casualty_kthread to check for the failed
> devices, and triggers device replace.
> 
> Signed-off-by: Anand Jain <anand.jain@oracle.com>
> ---
>  fs/btrfs/ctree.h   |   2 +
>  fs/btrfs/disk-io.c | 161 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
>  fs/btrfs/disk-io.h |   2 +
>  fs/btrfs/volumes.c |   1 +
>  fs/btrfs/volumes.h |   4 ++
>  5 files changed, 169 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index 2c185a8e92f0..36f1c29e00a0 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -1569,6 +1569,7 @@ struct btrfs_fs_info {
>  	struct mutex tree_log_mutex;
>  	struct mutex transaction_kthread_mutex;
>  	struct mutex cleaner_mutex;
> +	struct mutex casualty_mutex;
>  	struct mutex chunk_mutex;
>  	struct mutex volume_mutex;
>  
> @@ -1686,6 +1687,7 @@ struct btrfs_fs_info {
>  	struct btrfs_workqueue *extent_workers;
>  	struct task_struct *transaction_kthread;
>  	struct task_struct *cleaner_kthread;
> +	struct task_struct *casualty_kthread;
>  	int thread_pool_size;
>  
>  	struct kobject *space_info_kobj;
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index b99329e37965..650e26e0acda 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -1869,6 +1869,153 @@ sleep:
>  	return 0;
>  }
>  
> +static int btrfs_check_and_handle_casualty(void *arg)
> +{
> +	int ret;
> +	int found = 0;
> +	struct btrfs_device *device;
> +	struct btrfs_root *root = arg;
> +	struct btrfs_fs_info *fs_info = root->fs_info;
> +	struct btrfs_fs_devices *fs_devices = fs_info->fs_devices;
> +
> +	btrfs_dev_replace_lock(&fs_info->dev_replace, 0);
> +	if (btrfs_dev_replace_is_ongoing(&fs_info->dev_replace)) {
> +		btrfs_dev_replace_unlock(&fs_info->dev_replace, 0);
> +		return -EBUSY;
> +	}
> +	btrfs_dev_replace_unlock(&fs_info->dev_replace, 0);
> +
> +	ret = btrfs_check_devices(fs_devices);
> +	if (ret == 1) {
> +		/*
> +		 * There were some casualties, and if its beyond a
> +		 * chunk group can tolerate, then FS will already
> +		 * be in readonly, so check that. And that's best
> +		 * btrfs could do as of now and no replace will help.
> +		 */
> +		if (fs_info->sb->s_flags & MS_RDONLY)
> +			return -EROFS;
> +
> +		mutex_lock(&fs_devices->device_list_mutex);
> +		rcu_read_lock();
> +		list_for_each_entry_rcu(device,
> +				&fs_devices->devices, dev_list) {
> +			if (device->failed) {
> +				found = 1;
> +				break;
> +			}
> +		}
> +		rcu_read_unlock();
> +		mutex_unlock(&fs_devices->device_list_mutex);
> +	}
> +
> +	/*
> +	 * We are using the replace code which should be interrupt-able
> +	 * during unmount, and as of now there is no user land stop
> +	 * request that we support and this will run until its complete
> +	 */
> +	if (found)
> +		ret = btrfs_auto_replace_start(root, device);
> +
> +	return ret;
> +}
> +
> +/*
> + * A kthread to check if any auto maintenance be required. This is
> + * multithread safe, and kthread is running only if
> + * fs_info->casualty_kthread is not NULL, fixme: atomic ?
> + */
> +static int casualty_kthread(void *arg)
> +{
> +	int ret;
> +	int again;
> +	struct btrfs_root *root = arg;
> +
> +	do {
> +		again = 0;
> +
> +		if (btrfs_need_cleaner_sleep(root))
> +			goto sleep;
> +
> +		if (!mutex_trylock(&root->fs_info->casualty_mutex))
> +			goto sleep;
> +
> +		if (btrfs_need_cleaner_sleep(root)) {
> +			mutex_unlock(&root->fs_info->casualty_mutex);
> +			goto sleep;
> +		}
> +
> +		ret = btrfs_check_and_handle_casualty(arg);
> +		if (ret == -EROFS) {
> +			/*
> +			 * When checking and fixing the devices, the
> +			 * FS may be marked as RO in some situations.
> +			 * And on ROFS casualty thread has no work.
> +			 * So optimize here, to stop this thread until
> +			 * FS is back to RW.
> +			 */
> +		}
> +		mutex_unlock(&root->fs_info->casualty_mutex);
> +
> +sleep:
> +		if (!try_to_freeze() && !again) {

This block was copy-pasted from cleaner_kthread(). The 'again' variable
is never actually used, and the use of try_to_freeze() in cleaner_kthread()
was eliminated in Mason's 'for-linus-4.6' branch by commit
838fe188 'btrfs: cleaner_kthread() doesn't need explicit freeze'.
casualty_kthread() isn't marked as freezable either,
so this check can be removed entirely.


> +			set_current_state(TASK_INTERRUPTIBLE);
> +			if (!kthread_should_stop())
> +				schedule();
> +			__set_current_state(TASK_RUNNING);
> +		}
> +	} while (!kthread_should_stop());
> +
> +	return 0;
> +}
> +

-- 
Yauhen Kharuzhy

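With the unused 'again' flag and the freezer check dropped, the loop reduces to "check, then sleep until woken or asked to stop". The kernel version would keep the set_current_state() / kthread_should_stop() / schedule() sequence quoted above. Since that cannot run outside the kernel, here is a hypothetical userspace analogue using a POSIX condition variable. In this analogue, casualty_wake() stands in for waking the kthread, and casualty_stop() stands in for kthread_stop(). All names are illustrative.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the casualty kthread's state. */
struct casualty_worker {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	int wake_pending;	/* set by casualty_wake() */
	int should_stop;	/* set by casualty_stop() */
	int checks_run;		/* written only by the worker thread */
};

static void casualty_init(struct casualty_worker *w)
{
	pthread_mutex_init(&w->lock, NULL);
	pthread_cond_init(&w->cond, NULL);
	w->wake_pending = 0;
	w->should_stop = 0;
	w->checks_run = 0;
}

/* Placeholder for btrfs_check_and_handle_casualty(). */
static void check_devices_once(struct casualty_worker *w)
{
	w->checks_run++;
}

static void *casualty_loop(void *arg)
{
	struct casualty_worker *w = arg;
	int stop;

	do {
		check_devices_once(w);

		/* Sleep until woken or asked to stop: no freezer, no 'again'. */
		pthread_mutex_lock(&w->lock);
		while (!w->wake_pending && !w->should_stop)
			pthread_cond_wait(&w->cond, &w->lock);
		w->wake_pending = 0;
		stop = w->should_stop;
		pthread_mutex_unlock(&w->lock);
	} while (!stop);

	return NULL;
}

static void casualty_wake(struct casualty_worker *w)
{
	pthread_mutex_lock(&w->lock);
	w->wake_pending = 1;
	pthread_cond_signal(&w->cond);
	pthread_mutex_unlock(&w->lock);
}

/* Ask the worker to exit and wait for it, like kthread_stop(). */
static void casualty_stop(struct casualty_worker *w, pthread_t tid)
{
	pthread_mutex_lock(&w->lock);
	w->should_stop = 1;
	pthread_cond_signal(&w->cond);
	pthread_mutex_unlock(&w->lock);
	pthread_join(tid, NULL);
}
```

The wait is wrapped in a while loop so spurious wake-ups are harmless. The stop flag is re-checked under the lock, which mirrors checking kthread_should_stop() before schedule().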

* Re: [PATCH 12/12] btrfs: check device for critical errors and mark failed
  2016-03-29 14:22 ` [PATCH 12/12] btrfs: check device for critical errors and mark failed Anand Jain
  2016-03-29 22:41   ` Yauhen Kharuzhy
@ 2016-03-30  0:49   ` Yauhen Kharuzhy
  2016-04-01 23:59     ` Anand Jain
  1 sibling, 1 reply; 25+ messages in thread
From: Yauhen Kharuzhy @ 2016-03-30  0:49 UTC (permalink / raw)
  To: Anand Jain; +Cc: linux-btrfs, clm, dsterba

On Tue, Mar 29, 2016 at 10:22:29PM +0800, Anand Jain wrote:
> Write and Flush errors are considered as critical errors,
> upon which the device will be brought offline and marked as
> failed. Write and Flush errors are identified using device
> error statistics.
> 
> Signed-off-by: Anand Jain <anand.jain@oracle.com>
> 
> btrfs: check for failed device and hot replace
> 
> This patch creates casualty_kthread to check for the failed
> devices, and triggers device replace.
> 
> Signed-off-by: Anand Jain <anand.jain@oracle.com>
> ---
>  fs/btrfs/ctree.h   |   2 +
>  fs/btrfs/disk-io.c | 161 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
>  fs/btrfs/disk-io.h |   2 +
>  fs/btrfs/volumes.c |   1 +
>  fs/btrfs/volumes.h |   4 ++
>  5 files changed, 169 insertions(+), 1 deletion(-)

btrfs_check_and_handle_casualty() tries to perform auto-replacement
only once after each failure. If no hot spare was added to the system
before the failure, the only remaining way to replace the drive is to
perform the replace manually. This sounds reasonable, so just a
clarification: are you sure that we shouldn't start auto-replacement
if a hot spare is added after the drive failure?

V1 of the patchset tried to perform auto-replace endlessly until a
replacement drive was added.



-- 
Yauhen Kharuzhy

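One way to get the V1-style behaviour discussed above without retrying in a tight loop is to leave the failed device pending and re-evaluate it whenever the casualty thread is woken, including when a new spare is registered. Below is a hypothetical sketch of that decision step. It mirrors the early returns in btrfs_check_and_handle_casualty(): a replace already running, or the filesystem gone read-only. The enum and function names are illustrative, not from the patch.

```c
/* Hypothetical outcomes of one pass of the casualty check. */
enum casualty_action {
	CASUALTY_NOTHING,		/* no failed device found */
	CASUALTY_BUSY,			/* a replace is already running (-EBUSY) */
	CASUALTY_READONLY,		/* FS went read-only; replace won't help (-EROFS) */
	CASUALTY_START_REPLACE,		/* failed device and a spare available */
	CASUALTY_WAIT_FOR_SPARE,	/* failed device but no spare yet */
};

static enum casualty_action casualty_decide(int replace_running, int fs_readonly,
					    int have_failed_dev, int spare_available)
{
	if (replace_running)
		return CASUALTY_BUSY;
	if (!have_failed_dev)
		return CASUALTY_NOTHING;
	if (fs_readonly)
		return CASUALTY_READONLY;
	if (!spare_available)
		/*
		 * Keep the device marked failed; a later wake-up (for example
		 * after "btrfs spare add") re-runs this check instead of the
		 * failure being handled only once.
		 */
		return CASUALTY_WAIT_FOR_SPARE;
	return CASUALTY_START_REPLACE;
}
```

With this split, registering a spare only needs to wake the thread, and the decision itself stays cheap and safe to repeat.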

* Re: [PATCH 11/12] btrfs: introduce helper functions to perform hot replace
  2016-03-29 14:45   ` kbuild test robot
@ 2016-03-30 10:13     ` Anand Jain
  2016-03-31  2:14       ` [kbuild-all] " Fengguang Wu
  0 siblings, 1 reply; 25+ messages in thread
From: Anand Jain @ 2016-03-30 10:13 UTC (permalink / raw)
  To: kbuild test robot; +Cc: kbuild-all, linux-btrfs, clm, dsterba



Hi,

  You are missing the patch set which includes
    https://patchwork.kernel.org/patch/8659651/

  btrfs: refactor btrfs_dev_replace_start for reuse


Thanks, Anand




* Re: [kbuild-all] [PATCH 11/12] btrfs: introduce helper functions to perform hot replace
  2016-03-30 10:13     ` Anand Jain
@ 2016-03-31  2:14       ` Fengguang Wu
  0 siblings, 0 replies; 25+ messages in thread
From: Fengguang Wu @ 2016-03-31  2:14 UTC (permalink / raw)
  To: Anand Jain; +Cc: clm, kbuild-all, linux-btrfs, dsterba

On Wed, Mar 30, 2016 at 06:13:43PM +0800, Anand Jain wrote:
> 
> 
> Hi,
> 
>  You are missing the patch set which includes
>    https://patchwork.kernel.org/patch/8659651/
> 
>  btrfs: refactor btrfs_dev_replace_start for reuse

Sorry, that patch comes in another patchset, and the robot currently is
not smart enough to understand the relationship between two patchsets.

Thanks,
Fengguang



* Re: [PATCH 12/12] btrfs: check device for critical errors and mark failed
  2016-03-29 22:41   ` Yauhen Kharuzhy
@ 2016-04-01 23:53     ` Anand Jain
  0 siblings, 0 replies; 25+ messages in thread
From: Anand Jain @ 2016-04-01 23:53 UTC (permalink / raw)
  To: Yauhen Kharuzhy; +Cc: linux-btrfs



On 03/30/2016 06:41 AM, Yauhen Kharuzhy wrote:
> On Tue, Mar 29, 2016 at 10:22:29PM +0800, Anand Jain wrote:
>> Write and Flush errors are considered as critical errors,
>> upon which the device will be brought offline and marked as
>> failed. Write and Flush errors are identified using device
>> error statistics.
>>
>> Signed-off-by: Anand Jain <anand.jain@oracle.com>
>>
>> btrfs: check for failed device and hot replace
>>
>> This patch creates casualty_kthread to check for the failed
>> devices, and triggers device replace.
>>
>> Signed-off-by: Anand Jain <anand.jain@oracle.com>
>> ---
>>   fs/btrfs/ctree.h   |   2 +
>>   fs/btrfs/disk-io.c | 161 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
>>   fs/btrfs/disk-io.h |   2 +
>>   fs/btrfs/volumes.c |   1 +
>>   fs/btrfs/volumes.h |   4 ++
>>   5 files changed, 169 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
>> index 2c185a8e92f0..36f1c29e00a0 100644
>> --- a/fs/btrfs/ctree.h
>> +++ b/fs/btrfs/ctree.h
>> @@ -1569,6 +1569,7 @@ struct btrfs_fs_info {
>>   	struct mutex tree_log_mutex;
>>   	struct mutex transaction_kthread_mutex;
>>   	struct mutex cleaner_mutex;
>> +	struct mutex casualty_mutex;
>>   	struct mutex chunk_mutex;
>>   	struct mutex volume_mutex;
>>
>> @@ -1686,6 +1687,7 @@ struct btrfs_fs_info {
>>   	struct btrfs_workqueue *extent_workers;
>>   	struct task_struct *transaction_kthread;
>>   	struct task_struct *cleaner_kthread;
>> +	struct task_struct *casualty_kthread;
>>   	int thread_pool_size;
>>
>>   	struct kobject *space_info_kobj;
>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>> index b99329e37965..650e26e0acda 100644
>> --- a/fs/btrfs/disk-io.c
>> +++ b/fs/btrfs/disk-io.c
>> @@ -1869,6 +1869,153 @@ sleep:
>>   	return 0;
>>   }
>>
>> +static int btrfs_check_and_handle_casualty(void *arg)
>> +{
>> +	int ret;
>> +	int found = 0;
>> +	struct btrfs_device *device;
>> +	struct btrfs_root *root = arg;
>> +	struct btrfs_fs_info *fs_info = root->fs_info;
>> +	struct btrfs_fs_devices *fs_devices = fs_info->fs_devices;
>> +
>> +	btrfs_dev_replace_lock(&fs_info->dev_replace, 0);
>> +	if (btrfs_dev_replace_is_ongoing(&fs_info->dev_replace)) {
>> +		btrfs_dev_replace_unlock(&fs_info->dev_replace, 0);
>> +		return -EBUSY;
>> +	}
>> +	btrfs_dev_replace_unlock(&fs_info->dev_replace, 0);
>> +
>> +	ret = btrfs_check_devices(fs_devices);
>> +	if (ret == 1) {
>> +		/*
>> +		 * There were some casualties, and if its beyond a
>> +		 * chunk group can tolerate, then FS will already
>> +		 * be in readonly, so check that. And that's best
>> +		 * btrfs could do as of now and no replace will help.
>> +		 */
>> +		if (fs_info->sb->s_flags & MS_RDONLY)
>> +			return -EROFS;
>> +
>> +		mutex_lock(&fs_devices->device_list_mutex);
>> +		rcu_read_lock();
>> +		list_for_each_entry_rcu(device,
>> +				&fs_devices->devices, dev_list) {
>> +			if (device->failed) {
>> +				found = 1;
>> +				break;
>> +			}
>> +		}
>> +		rcu_read_unlock();
>> +		mutex_unlock(&fs_devices->device_list_mutex);
>> +	}
>> +
>> +	/*
>> +	 * We are using the replace code which should be interrupt-able
>> +	 * during unmount, and as of now there is no user land stop
>> +	 * request that we support and this will run until its complete
>> +	 */
>> +	if (found)
>> +		ret = btrfs_auto_replace_start(root, device);
>> +
>> +	return ret;
>> +}
>> +
>> +/*
>> + * A kthread to check if any auto maintenance be required. This is
>> + * multithread safe, and kthread is running only if
>> + * fs_info->casualty_kthread is not NULL, fixme: atomic ?
>> + */
>> +static int casualty_kthread(void *arg)
>> +{
>> +	int ret;
>> +	int again;
>> +	struct btrfs_root *root = arg;
>> +
>> +	do {
>> +		again = 0;
>> +
>> +		if (btrfs_need_cleaner_sleep(root))
>> +			goto sleep;
>> +
>> +		if (!mutex_trylock(&root->fs_info->casualty_mutex))
>> +			goto sleep;
>> +
>> +		if (btrfs_need_cleaner_sleep(root)) {
>> +			mutex_unlock(&root->fs_info->casualty_mutex);
>> +			goto sleep;
>> +		}
>> +
>> +		ret = btrfs_check_and_handle_casualty(arg);
>> +		if (ret == -EROFS) {
>> +			/*
>> +			 * When checking and fixing the devices, the
>> +			 * FS may be marked as RO in some situations.
>> +			 * And on ROFS casualty thread has no work.
>> +			 * So optimize here, to stop this thread until
>> +			 * FS is back to RW.
>> +			 */
>> +		}
>> +		mutex_unlock(&root->fs_info->casualty_mutex);
>> +
>> +sleep:
>> +		if (!try_to_freeze() && !again) {
>
> This block was copy-pasted from cleaner_kthread(). The 'again' variable
> is never actually used, and the use of try_to_freeze() in cleaner_kthread()
> was eliminated in Mason's 'for-linus-4.6' branch by commit
> 838fe188 'btrfs: cleaner_kthread() doesn't need explicit freeze'.
> casualty_kthread() isn't marked as freezable either,
> so this check can be removed entirely.


Thanks, this is fixed in v3.

Anand

>
>> +			set_current_state(TASK_INTERRUPTIBLE);
>> +			if (!kthread_should_stop())
>> +				schedule();
>> +			__set_current_state(TASK_RUNNING);
>> +		}
>> +	} while (!kthread_should_stop());
>> +
>> +	return 0;
>> +}
>> +
>
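For readers following the patch logic, here is a small userspace model of
the decision order in btrfs_check_and_handle_casualty(). It is a sketch, not
the patch's code: it assumes, as the commit message says, that only write and
flush errors count as critical, and the marking loop merely stands in for
btrfs_check_devices(), whose body is not quoted in this message.
struct model_device, model_has_critical_errors() and model_check_casualty()
are hypothetical names. The model hands back a devid rather than a device
pointer, so nothing refers to a list entry after the list locks would have
been dropped.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct btrfs_device. */
struct model_device {
	unsigned long long devid;
	unsigned int write_errs;	/* cf. BTRFS_DEV_STAT_WRITE_ERRS */
	unsigned int flush_errs;	/* cf. BTRFS_DEV_STAT_FLUSH_ERRS */
	unsigned int read_errs;		/* counted, but not treated as critical */
	bool failed;
};

/* Assumption from the patch description: write/flush errors are critical. */
static bool model_has_critical_errors(const struct model_device *dev)
{
	return dev->write_errs > 0 || dev->flush_errs > 0;
}

/*
 * Mirrors the order of checks in btrfs_check_and_handle_casualty():
 * -EBUSY if a replace is already running, 0 if nothing failed,
 * -EROFS if the FS already went read-only, and 1 (with *devid set)
 * when an auto replace would be started for that device.
 */
static int model_check_casualty(struct model_device *devs, size_t n,
				bool replace_ongoing, bool fs_readonly,
				unsigned long long *devid)
{
	bool casualty = false;
	size_t i;

	if (replace_ongoing)
		return -EBUSY;

	/* Hypothetical stand-in for btrfs_check_devices(). */
	for (i = 0; i < n; i++) {
		if (model_has_critical_errors(&devs[i])) {
			devs[i].failed = true;
			casualty = true;
		}
	}
	if (!casualty)
		return 0;

	/* Beyond what the chunks tolerate: replace cannot help. */
	if (fs_readonly)
		return -EROFS;

	/* Pick the first failed device; copy its id, not a pointer. */
	for (i = 0; i < n; i++) {
		if (devs[i].failed) {
			*devid = devs[i].devid;
			return 1;
		}
	}
	return 0;
}
```

With devices 1 (read errors only), 2 (flush errors) and 3 (write errors) on a
writable FS, the model picks devid 2, the first device marked failed.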


* Re: [PATCH 12/12] btrfs: check device for critical errors and mark failed
  2016-03-30  0:49   ` Yauhen Kharuzhy
@ 2016-04-01 23:59     ` Anand Jain
  0 siblings, 0 replies; 25+ messages in thread
From: Anand Jain @ 2016-04-01 23:59 UTC (permalink / raw)
  To: Yauhen Kharuzhy; +Cc: linux-btrfs, clm, dsterba



On 03/30/2016 08:49 AM, Yauhen Kharuzhy wrote:
> On Tue, Mar 29, 2016 at 10:22:29PM +0800, Anand Jain wrote:
>> Write and Flush errors are considered as critical errors,
>> upon which the device will be brought offline and marked as
>> failed. Write and Flush errors are identified using device
>> error statistics.
>>
>> Signed-off-by: Anand Jain <anand.jain@oracle.com>
>>
>> btrfs: check for failed device and hot replace
>>
>> This patch creates casualty_kthread to check for the failed
>> devices, and triggers device replace.
>>
>> Signed-off-by: Anand Jain <anand.jain@oracle.com>
>> ---
>>   fs/btrfs/ctree.h   |   2 +
>>   fs/btrfs/disk-io.c | 161 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
>>   fs/btrfs/disk-io.h |   2 +
>>   fs/btrfs/volumes.c |   1 +
>>   fs/btrfs/volumes.h |   4 ++
>>   5 files changed, 169 insertions(+), 1 deletion(-)
>
> btrfs_check_and_handle_casualty() tries to perform auto-replacement
> only once after each failure. If no hot spare was added to the system
> before the failure, the only remaining way to replace the drive is to
> perform the replace manually. This sounds reasonable, so just a
> clarification: are you sure that we shouldn't start auto-replacement
> if a hot spare is added after the drive failure?
>
> V1 of the patchset retried auto-replace endlessly until a replacement
> drive was added.

Yeah, I made that change on purpose, but I have reverted it in V3,
so that the code is more flexible and the design is easier to control
and change.

Thanks, Anand
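The two retry policies discussed above can be compared with a tiny
simulation. It is a sketch under stated assumptions: the casualty thread is
assumed to wake once per tick after a failure, spare_at[t] says whether a
spare is configured at tick t, and enum replace_policy and
first_replace_tick() are hypothetical names, not code from the series.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical labels for the two behaviours discussed in the thread. */
enum replace_policy {
	POLICY_ONE_SHOT,	/* v2: one automatic attempt per failure */
	POLICY_RETRY,		/* v1 (and v3): retry until a spare appears */
};

/*
 * Returns the tick at which auto replace would start, or -1 if it never
 * does, in which case only a manual replace is left.
 */
static int first_replace_tick(enum replace_policy policy,
			      const bool *spare_at, size_t ticks)
{
	bool attempted = false;
	size_t t;

	for (t = 0; t < ticks; t++) {
		/* One-shot: after the first attempt, give up for good. */
		if (policy == POLICY_ONE_SHOT && attempted)
			break;
		attempted = true;
		if (spare_at[t])
			return (int)t;
	}
	return -1;
}
```

With a spare that only appears at tick 2, the one-shot policy never starts a
replace (it returns -1), while the retry policy starts one at tick 2. If the
spare is already present at tick 0, both policies start immediately.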




end of thread

Thread overview: 25+ messages
2016-03-29 14:22 [PATCH v2 00/15] Introduce device state 'failed', Hot spare and Auto replace Anand Jain
2016-03-29 14:22 ` [PATCH 01/12] btrfs: Introduce a new function to check if all chunks a OK for degraded mount Anand Jain
2016-03-29 14:22 ` [PATCH 02/12] btrfs: Do per-chunk check for mount time check Anand Jain
2016-03-29 14:22 ` [PATCH 03/12] btrfs: Do per-chunk degraded check for remount Anand Jain
2016-03-29 14:22 ` [PATCH 04/12] btrfs: Allow barrier_all_devices to do per-chunk device check Anand Jain
2016-03-29 14:22 ` [PATCH 05/12] btrfs: Cleanup num_tolerated_disk_barrier_failures Anand Jain
2016-03-29 14:22 ` [PATCH 06/12] btrfs: introduce device dynamic state transition to offline or failed Anand Jain
2016-03-29 14:22 ` [PATCH 07/12] btrfs: introduce BTRFS_FEATURE_INCOMPAT_SPARE_DEV Anand Jain
2016-03-29 14:22 ` [PATCH 08/12] btrfs: add check not to mount a spare device Anand Jain
2016-03-29 14:22 ` [PATCH 09/12] btrfs: support btrfs dev scan for " Anand Jain
2016-03-29 14:22 ` [PATCH 10/12] btrfs: provide framework to get and put a " Anand Jain
2016-03-29 14:22 ` [PATCH 11/12] btrfs: introduce helper functions to perform hot replace Anand Jain
2016-03-29 14:45   ` kbuild test robot
2016-03-30 10:13     ` Anand Jain
2016-03-31  2:14       ` [kbuild-all] " Fengguang Wu
2016-03-29 14:22 ` [PATCH 12/12] btrfs: check device for critical errors and mark failed Anand Jain
2016-03-29 22:41   ` Yauhen Kharuzhy
2016-04-01 23:53     ` Anand Jain
2016-03-30  0:49   ` Yauhen Kharuzhy
2016-04-01 23:59     ` Anand Jain
2016-03-29 14:27 ` [PATCH 1/4] btrfs-progs: Introduce BTRFS_FEATURE_INCOMPAT_SPARE_DEV SB flags Anand Jain
2016-03-29 14:27   ` [PATCH v2 2/4] btrfs-progs: Introduce btrfs spare subcommand Anand Jain
2016-03-29 14:27   ` [PATCH 3/4] btrfs-progs: add fi show for spare Anand Jain
2016-03-29 14:27   ` [PATCH 4/4] btrfs-progs: add global spare device list to filesystem show Anand Jain
2016-03-29 17:30 ` [PATCH v2 00/15] Introduce device state 'failed', Hot spare and Auto replace Austin S. Hemmelgarn
