* [PATCH 00/15] btrfs: Hot spare and Auto replace
@ 2015-11-09 10:56 Anand Jain
2015-11-09 10:56 ` [PATCH 01/15] btrfs: Introduce a new function to check if all chunks a OK for degraded mount Anand Jain
` (19 more replies)
0 siblings, 20 replies; 43+ messages in thread
From: Anand Jain @ 2015-11-09 10:56 UTC
To: linux-btrfs
This patch set provides btrfs hot spare and auto replace support,
for your review and comments.
First, here are simple example steps to configure it:
Add a spare device:
btrfs spare add /dev/sde -f
Or, if a spare device was already added earlier, just run
btrfs dev scan [/dev/sde]
This registers the spare device with the kernel.
btrfs fi show
Label: none uuid: 52f170c1-725c-457d-8cfd-d57090460091
Total devices 2 FS bytes used 112.00KiB
devid 1 size 2.00GiB used 417.50MiB path /dev/sdc
devid 2 size 2.00GiB used 417.50MiB path /dev/sdd
Global spare
device size 3.00GiB path /dev/sde
That's it.
Auto replace:
Replace happens automatically: when any write or flush fails, the
device is marked as failed, which stops any further IO attempts to
that device. In the next transaction commit thread cycle, auto replace
picks the spare device (/dev/sde in the above example) to replace the
failed device, and so the btrfs volume is back to a healthy state.
Btrfs global spare:
As of now only a global hot spare is supported, that is, the hot
spare(s) serve all the btrfs filesystems in the system.
No spare when a device fails:
It scans for a spare device at every transaction commit and triggers
the auto replace whenever a spare device is added.
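The replace cycle described above can be modelled in userspace as a rough sketch. All names here (model_dev, model_spare_pool, model_commit_cycle) are hypothetical and do not exist in btrfs; the sketch only assumes what the text states: failed devices are picked up once per commit cycle, and a failed device with no spare available simply waits for a later cycle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names exist in btrfs. */
struct model_dev {
	int devid;
	bool failed;   /* set when a write or flush error is seen */
	bool replaced; /* set once a spare has taken over */
};

struct model_spare_pool {
	int spares[8]; /* devids of registered global spares */
	size_t count;
};

/* Take one spare from the global pool; returns -1 when the pool is empty. */
static int model_get_spare(struct model_spare_pool *pool)
{
	if (pool->count == 0)
		return -1;
	return pool->spares[--pool->count];
}

/*
 * One "transaction commit cycle": start a replace for every failed device
 * for which a spare is available. Failed devices without a spare stay
 * failed and are retried on a later cycle, after a spare has been added.
 * Returns the number of replacements started in this cycle.
 */
static int model_commit_cycle(struct model_dev *devs, size_t ndevs,
			      struct model_spare_pool *pool)
{
	int started = 0;

	for (size_t i = 0; i < ndevs; i++) {
		if (!devs[i].failed || devs[i].replaced)
			continue;
		int spare = model_get_spare(pool);
		if (spare < 0)
			break; /* no spare yet: try again next cycle */
		devs[i].replaced = true;
		started++;
	}
	return started;
}
```

With a failed device and an empty pool, a cycle starts nothing; after a spare is added, the next cycle starts exactly one replace.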
Priority:
Future work may add some order for picking a spare and the failed
device.
Patches:
Kernel:
First, it needs Qu's per-chunk missing device patch set, which is
part of this set, plus a light optimization (patch 5/15) which was
required as part of this enhancement.
Next, patches 7,8/15 add support to dynamically transition devices
from online (no state) to the offline or failed state, on top of
static device states like the current "missing" state.
Patch 9/15 fixes a bug: incompatible features should have been
blocked at the device scan/add level, in addition to the mount level,
because an incompatible device should not be brought into the device
list at all.
Next, patches 10,11,12,13/15 add support for the spare device. For
details on how to add a spare device, see the example above.
Kernels without spare feature support keep the spare device away,
and kernels that support it refuse to mount it. Further, this patch
set provides helper functions to pick a spare device and to release a
spare device back to the spare device pool.
Patch 14/15 provides the function for auto replace. It is mainly
derived from the existing replace code, and in the long run I see an
opportunity to merge this code with the replace code that is triggered
from user space.
Finally, 15/15 uses all these facilities: it picks a failed device and
triggers an auto replace in a kthread (casualty_kthread()).
Progs:
Needs 4 patches, as listed below.
Known Bug:
As of now I see the stale kmem cache below during module unload,
which I am digging into.
------
BUG btrfs_path (Not tainted): Objects remaining in btrfs_path on kmem_cache_close()
------
Anand Jain (10):
btrfs: optimize btrfs_check_degradable() for calls outside of barrier
btrfs: introduce device dynamic state transition to offline or failed
btrfs: check device for critical errors and mark failed
btrfs: block incompatible optional features at scan
btrfs: introduce BTRFS_FEATURE_INCOMPAT_SPARE_DEV
btrfs: add check not to mount a spare device
btrfs: support btrfs dev scan for spare device
btrfs: provide framework to get and put a spare device
btrfs: introduce helper functions to perform hot replace
btrfs: check for failed device and hot replace
Qu Wenruo (5):
btrfs: Introduce a new function to check if all chunks a OK for
degraded mount
btrfs: Do per-chunk check for mount time check
btrfs: Do per-chunk degraded check for remount
btrfs: Allow barrier_all_devices to do per-chunk device check
btrfs: Cleanup num_tolerated_disk_barrier_failures
fs/btrfs/ctree.h | 7 +-
fs/btrfs/dev-replace.c | 116 ++++++++++++++++++++
fs/btrfs/dev-replace.h | 1 +
fs/btrfs/disk-io.c | 211 +++++++++++++++++++++++-------------
fs/btrfs/disk-io.h | 2 -
fs/btrfs/super.c | 20 +++-
fs/btrfs/transaction.c | 3 +-
fs/btrfs/volumes.c | 283 ++++++++++++++++++++++++++++++++++++++++++++++---
fs/btrfs/volumes.h | 27 +++++
9 files changed, 571 insertions(+), 99 deletions(-)
--
2.4.1
* [PATCH 01/15] btrfs: Introduce a new function to check if all chunks a OK for degraded mount
2015-11-09 10:56 [PATCH 00/15] btrfs: Hot spare and Auto replace Anand Jain
@ 2015-11-09 10:56 ` Anand Jain
2015-11-09 10:56 ` [PATCH 02/15] btrfs: Do per-chunk check for mount time check Anand Jain
` (18 subsequent siblings)
19 siblings, 0 replies; 43+ messages in thread
From: Anand Jain @ 2015-11-09 10:56 UTC
To: linux-btrfs
From: Qu Wenruo <quwenruo@cn.fujitsu.com>
Introduce a new function, btrfs_check_degradable(), to judge whether
all chunks in btrfs are OK for a degraded mount.
It provides a new basis for accurate btrfs mount/remount and even
runtime degraded checks, replacing the old one-size-fits-all method.
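To make the return convention concrete, here is a minimal userspace sketch of the per-chunk check. The struct and function names are hypothetical; each chunk simply carries the list of devices holding its stripes and how many lost stripes its profile tolerates. The MS_RDONLY shortcut and the extent map locking of the real function are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_STRIPES 4

/* Hypothetical model of a chunk map: which devices hold its stripes. */
struct model_chunk {
	int num_stripes;
	int stripe_dev[MODEL_MAX_STRIPES]; /* index into the missing[] array */
	int max_tolerated;                 /* e.g. 1 for RAID1, 0 for single */
};

/*
 * Follows the return convention of btrfs_check_degradable():
 *   0       all chunks fully present
 *   1       some chunk lost stripes, but every chunk is within tolerance
 *   -EIO    at least one chunk lost more stripes than it tolerates
 *   -ENOENT no chunks at all
 */
static int model_check_degradable(const struct model_chunk *chunks,
				  size_t nchunks, const bool *missing)
{
	int ret = 0;

	if (nchunks == 0)
		return -ENOENT;

	for (size_t c = 0; c < nchunks; c++) {
		int lost = 0;

		for (int i = 0; i < chunks[c].num_stripes; i++)
			if (missing[chunks[c].stripe_dev[i]])
				lost++;
		if (lost > chunks[c].max_tolerated)
			return -EIO;
		if (lost)
			ret = 1;
	}
	return ret;
}
```

The key point is that tolerance is judged per chunk, so a missing device only matters for the chunks that actually have stripes on it.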
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
fs/btrfs/volumes.c | 63 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
fs/btrfs/volumes.h | 1 +
2 files changed, 64 insertions(+)
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index f1fb3df..cfbdf9a 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6805,3 +6805,66 @@ void btrfs_close_one_device(struct btrfs_device *device)
call_rcu(&device->rcu, free_device);
}
+
+/*
+ * Check if all chunks in the fs are OK for degraded mount.
+ * The caller should do an extra check for the DEGRADED mount option
+ * on a >0 return value.
+ *
+ * Return 0 if all chunks are OK.
+ * Return >0 if all chunks are degradable but not all OK.
+ * Return <0 if any chunk is not degradable or other bug.
+ */
+int btrfs_check_degradable(struct btrfs_fs_info *fs_info, unsigned flags)
+{
+ struct btrfs_mapping_tree *map_tree = &fs_info->mapping_tree;
+ struct extent_map *em;
+ u64 next_start = 0;
+ int ret = 0;
+
+ if (flags & MS_RDONLY)
+ return 0;
+
+ read_lock(&map_tree->map_tree.lock);
+ em = lookup_extent_mapping(&map_tree->map_tree, 0, (u64)(-1));
+ /* No chunk at all? That would be a huge bug */
+ if (!em) {
+ ret = -ENOENT;
+ goto out;
+ }
+
+ while (em) {
+ struct map_lookup *map;
+ int missing = 0;
+ int max_tolerated;
+ int i;
+
+ map = (struct map_lookup *) em->bdev;
+ max_tolerated =
+ btrfs_get_num_tolerated_disk_barrier_failures(
+ map->type);
+ for (i = 0; i < map->num_stripes; i++) {
+ if (map->stripes[i].dev->missing)
+ missing++;
+ }
+ if (missing > max_tolerated) {
+ ret = -EIO;
+ btrfs_warn(fs_info,
+ "missing devices(%d) exceeds the limit(%d), writeable mount is not allowed",
+ missing, max_tolerated);
+ goto out;
+ } else if (missing)
+ ret = 1;
+ next_start = extent_map_end(em);
+
+ /*
+ * Always search range [next_start, (u64)-1) to find the next
+ * chunk map
+ */
+ em = lookup_extent_mapping(&map_tree->map_tree, next_start,
+ (u64)(-1) - next_start);
+ }
+out:
+ read_unlock(&map_tree->map_tree.lock);
+ return ret;
+}
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 4150d9d..c875be9 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -552,5 +552,6 @@ struct list_head *btrfs_get_fs_uuids(void);
void btrfs_set_fs_info_ptr(struct btrfs_fs_info *fs_info);
void btrfs_reset_fs_info_ptr(struct btrfs_fs_info *fs_info);
void btrfs_close_one_device(struct btrfs_device *device);
+int btrfs_check_degradable(struct btrfs_fs_info *fs_info, unsigned flags);
#endif
--
2.4.1
* [PATCH 02/15] btrfs: Do per-chunk check for mount time check
2015-11-09 10:56 [PATCH 00/15] btrfs: Hot spare and Auto replace Anand Jain
2015-11-09 10:56 ` [PATCH 01/15] btrfs: Introduce a new function to check if all chunks a OK for degraded mount Anand Jain
@ 2015-11-09 10:56 ` Anand Jain
2015-11-09 10:56 ` [PATCH 03/15] btrfs: Do per-chunk degraded check for remount Anand Jain
` (17 subsequent siblings)
19 siblings, 0 replies; 43+ messages in thread
From: Anand Jain @ 2015-11-09 10:56 UTC
To: linux-btrfs
From: Qu Wenruo <quwenruo@cn.fujitsu.com>
Now use btrfs_check_degradable() to do the mount time degraded check.
With this patch, we can now mount in the following case:
# mkfs.btrfs -f -m raid1 -d single /dev/sdb /dev/sdc
# wipefs -a /dev/sdc
# mount /dev/sdb /mnt/btrfs -o degraded
As the single data chunk is only on sdb, it's OK to mount degraded,
since missing one device is OK for RAID1.
But it still fails, as expected, in the following case:
# mkfs.btrfs -f -m raid1 -d single /dev/sdb /dev/sdc
# wipefs -a /dev/sdb
# mount /dev/sdc /mnt/btrfs -o degraded
As the data chunk is only on sdb, it's not OK to mount it degraded.
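The mount-time decision in the open_ctree() hunk of this patch boils down to three outcomes. The helper below is a hypothetical restatement of that branch for illustration only; check_ret stands for whatever btrfs_check_degradable() returned, and the comments tie each outcome to the two wipefs cases above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical restatement of the open_ctree() branch in this patch.
 * check_ret is the value returned by btrfs_check_degradable().
 */
static int model_mount_decision(int check_ret, bool degraded_opt)
{
	if (check_ret < 0)
		return check_ret;  /* some chunk lost too many stripes: fail */
	if (check_ret > 0 && !degraded_opt)
		return -EACCES;    /* mountable, but only with -o degraded */
	return 0;                  /* healthy, or degraded was requested */
}
```

Wiping sdc leaves every chunk within tolerance (check returns 1), so the mount succeeds only with -o degraded; wiping sdb loses the single data chunk (check returns -EIO), so even -o degraded fails.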
Reported-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Reported-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
[Btrfs: use btrfs_error instead of btrfs_err during mount]
Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
fs/btrfs/disk-io.c | 18 ++++++++++--------
1 file changed, 10 insertions(+), 8 deletions(-)
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 1776bcd..d54cdcc 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2866,6 +2866,16 @@ int open_ctree(struct super_block *sb,
goto fail_tree_roots;
}
+ ret = btrfs_check_degradable(fs_info, fs_info->sb->s_flags);
+ if (ret < 0) {
+ btrfs_err(fs_info, "degraded writable mount failed %d", ret);
+ goto fail_tree_roots;
+ } else if (ret > 0 && !btrfs_test_opt(chunk_root, DEGRADED)) {
+ btrfs_warn(fs_info,
+ "Some device missing, but still degraded mountable, please mount with -o degraded option");
+ ret = -EACCES;
+ goto fail_tree_roots;
+ }
/*
* keep the device that is marked to be the target device for the
* dev_replace procedure
@@ -2957,14 +2967,6 @@ retry_root_backup:
}
fs_info->num_tolerated_disk_barrier_failures =
btrfs_calc_num_tolerated_disk_barrier_failures(fs_info);
- if (fs_info->fs_devices->missing_devices >
- fs_info->num_tolerated_disk_barrier_failures &&
- !(sb->s_flags & MS_RDONLY)) {
- pr_warn("BTRFS: missing devices(%llu) exceeds the limit(%d), writeable mount is not allowed\n",
- fs_info->fs_devices->missing_devices,
- fs_info->num_tolerated_disk_barrier_failures);
- goto fail_sysfs;
- }
fs_info->cleaner_kthread = kthread_run(cleaner_kthread, tree_root,
"btrfs-cleaner");
--
2.4.1
* [PATCH 03/15] btrfs: Do per-chunk degraded check for remount
2015-11-09 10:56 [PATCH 00/15] btrfs: Hot spare and Auto replace Anand Jain
2015-11-09 10:56 ` [PATCH 01/15] btrfs: Introduce a new function to check if all chunks a OK for degraded mount Anand Jain
2015-11-09 10:56 ` [PATCH 02/15] btrfs: Do per-chunk check for mount time check Anand Jain
@ 2015-11-09 10:56 ` Anand Jain
2015-11-09 10:56 ` [PATCH 04/15] btrfs: Allow barrier_all_devices to do per-chunk device check Anand Jain
` (16 subsequent siblings)
19 siblings, 0 replies; 43+ messages in thread
From: Anand Jain @ 2015-11-09 10:56 UTC
To: linux-btrfs
From: Qu Wenruo <quwenruo@cn.fujitsu.com>
Just as for the mount time check, use the new btrfs_check_degradable()
to do a per-chunk check.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Btrfs: use btrfs_error instead of btrfs_err during remount
Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
fs/btrfs/super.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index b23d49d..d495790 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1662,11 +1662,14 @@ static int btrfs_remount(struct super_block *sb, int *flags, char *data)
goto restore;
}
- if (fs_info->fs_devices->missing_devices >
- fs_info->num_tolerated_disk_barrier_failures &&
- !(*flags & MS_RDONLY)) {
+ ret = btrfs_check_degradable(fs_info, *flags);
+ if (ret < 0) {
+ btrfs_err(fs_info,
+ "degraded writable remount failed %d", ret);
+ goto restore;
+ } else if (ret > 0 && !btrfs_test_opt(root, DEGRADED)) {
btrfs_warn(fs_info,
- "too many missing devices, writeable remount is not allowed");
+ "some device missing, but still degraded mountable, please remount with -o degraded option");
ret = -EACCES;
goto restore;
}
--
2.4.1
* [PATCH 04/15] btrfs: Allow barrier_all_devices to do per-chunk device check
2015-11-09 10:56 [PATCH 00/15] btrfs: Hot spare and Auto replace Anand Jain
` (2 preceding siblings ...)
2015-11-09 10:56 ` [PATCH 03/15] btrfs: Do per-chunk degraded check for remount Anand Jain
@ 2015-11-09 10:56 ` Anand Jain
2015-11-09 10:56 ` [PATCH 05/15] btrfs: optimize btrfs_check_degradable() for calls outside of barrier Anand Jain
` (15 subsequent siblings)
19 siblings, 0 replies; 43+ messages in thread
From: Anand Jain @ 2015-11-09 10:56 UTC
To: linux-btrfs
From: Qu Wenruo <quwenruo@cn.fujitsu.com>
The last user of num_tolerated_disk_barrier_failures is
barrier_all_devices(), but it can easily be changed to the new
per-chunk degradable check framework.
Now btrfs_device has two extra members, representing send/wait
errors, set at write_dev_flush() time. They are then checked in a way
similar to, but more accurate than, the old code.
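A small sketch of how the new flags feed the per-chunk check: a stripe is counted as lost when its device is missing or when either phase of its flush barrier failed. The names below are hypothetical userspace stand-ins for the btrfs_device fields this patch adds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-device state, modelled on the fields this patch adds. */
struct model_dev {
	bool missing;  /* not found at mount time */
	bool err_send; /* submitting the flush failed */
	bool err_wait; /* waiting for the flush failed */
};

/* A stripe counts as lost if its device is missing or its barrier failed. */
static bool model_stripe_lost(const struct model_dev *dev)
{
	return dev->missing || dev->err_send || dev->err_wait;
}

/* Count lost stripes of one chunk whose stripes live on devs[idx[i]]. */
static int model_lost_stripes(const struct model_dev *devs,
			      const int *idx, size_t nstripes)
{
	int lost = 0;

	for (size_t i = 0; i < nstripes; i++)
		if (model_stripe_lost(&devs[idx[i]]))
			lost++;
	return lost;
}
```

barrier_all_devices() then fails with -EIO only if some chunk's lost count exceeds what its profile tolerates, instead of comparing a filesystem-wide error count against one global limit.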
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
fs/btrfs/disk-io.c | 13 +++++--------
fs/btrfs/volumes.c | 6 +++++-
fs/btrfs/volumes.h | 4 ++++
3 files changed, 14 insertions(+), 9 deletions(-)
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index d54cdcc..958c2a6 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3433,8 +3433,6 @@ static int barrier_all_devices(struct btrfs_fs_info *info)
{
struct list_head *head;
struct btrfs_device *dev;
- int errors_send = 0;
- int errors_wait = 0;
int ret;
/* send down all the barriers */
@@ -3443,7 +3441,7 @@ static int barrier_all_devices(struct btrfs_fs_info *info)
if (dev->missing)
continue;
if (!dev->bdev) {
- errors_send++;
+ dev->err_send = 1;
continue;
}
if (!dev->in_fs_metadata || !dev->writeable)
@@ -3451,7 +3449,7 @@ static int barrier_all_devices(struct btrfs_fs_info *info)
ret = write_dev_flush(dev, 0);
if (ret)
- errors_send++;
+ dev->err_send = 1;
}
/* wait for all the barriers */
@@ -3459,7 +3457,7 @@ static int barrier_all_devices(struct btrfs_fs_info *info)
if (dev->missing)
continue;
if (!dev->bdev) {
- errors_wait++;
+ dev->err_wait = 1;
continue;
}
if (!dev->in_fs_metadata || !dev->writeable)
@@ -3467,10 +3465,9 @@ static int barrier_all_devices(struct btrfs_fs_info *info)
ret = write_dev_flush(dev, 1);
if (ret)
- errors_wait++;
+ dev->err_wait = 1;
}
- if (errors_send > info->num_tolerated_disk_barrier_failures ||
- errors_wait > info->num_tolerated_disk_barrier_failures)
+ if (btrfs_check_degradable(info, info->sb->s_flags) < 0)
return -EIO;
return 0;
}
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index cfbdf9a..8acf69b 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6844,8 +6844,12 @@ int btrfs_check_degradable(struct btrfs_fs_info *fs_info, unsigned flags)
btrfs_get_num_tolerated_disk_barrier_failures(
map->type);
for (i = 0; i < map->num_stripes; i++) {
- if (map->stripes[i].dev->missing)
+ if (map->stripes[i].dev->missing ||
+ map->stripes[i].dev->err_wait ||
+ map->stripes[i].dev->err_send)
missing++;
+ map->stripes[i].dev->err_wait = 0;
+ map->stripes[i].dev->err_send = 0;
}
if (missing > max_tolerated) {
ret = -EIO;
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index c875be9..d9a4579 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -76,6 +76,10 @@ struct btrfs_device {
int can_discard;
int is_tgtdev_for_dev_replace;
+ /* for barrier_all_devices() check */
+ int err_send;
+ int err_wait;
+
#ifdef __BTRFS_NEED_DEVICE_DATA_ORDERED
seqcount_t data_seqcount;
#endif
--
2.4.1
* [PATCH 05/15] btrfs: optimize btrfs_check_degradable() for calls outside of barrier
2015-11-09 10:56 [PATCH 00/15] btrfs: Hot spare and Auto replace Anand Jain
` (3 preceding siblings ...)
2015-11-09 10:56 ` [PATCH 04/15] btrfs: Allow barrier_all_devices to do per-chunk device check Anand Jain
@ 2015-11-09 10:56 ` Anand Jain
2015-11-09 10:56 ` [PATCH 06/15] btrfs: Cleanup num_tolerated_disk_barrier_failures Anand Jain
` (14 subsequent siblings)
19 siblings, 0 replies; 43+ messages in thread
From: Anand Jain @ 2015-11-09 10:56 UTC
To: linux-btrfs
Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
fs/btrfs/disk-io.c | 8 +++++++-
fs/btrfs/volumes.c | 2 --
2 files changed, 7 insertions(+), 3 deletions(-)
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 958c2a6..d3303f9 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3428,6 +3428,7 @@ static int write_dev_flush(struct btrfs_device *device, int wait)
/*
* send an empty flush down to each device in parallel,
* then wait for them
+ * fixme: optimize err_wait, err_send.
*/
static int barrier_all_devices(struct btrfs_fs_info *info)
{
@@ -3467,8 +3468,13 @@ static int barrier_all_devices(struct btrfs_fs_info *info)
if (ret)
dev->err_wait = 1;
}
- if (btrfs_check_degradable(info, info->sb->s_flags) < 0)
+ if (btrfs_check_degradable(info, info->sb->s_flags) < 0) {
+ dev->err_send = 0;
+ dev->err_wait = 0;
return -EIO;
+ }
+ dev->err_send = 0;
+ dev->err_wait = 0;
return 0;
}
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 8acf69b..a5262bf 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6848,8 +6848,6 @@ int btrfs_check_degradable(struct btrfs_fs_info *fs_info, unsigned flags)
map->stripes[i].dev->err_wait ||
map->stripes[i].dev->err_send)
missing++;
- map->stripes[i].dev->err_wait = 0;
- map->stripes[i].dev->err_send = 0;
}
if (missing > max_tolerated) {
ret = -EIO;
--
2.4.1
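In the barrier_all_devices() hunk above, the flags are cleared through `dev` after list_for_each_entry() has finished. At that point the cursor no longer refers to a device on the list (the macro leaves it computed from the list head itself), so, as far as I can tell, the writes land outside any btrfs_device and the real devices keep their flags. Below is a minimal userspace sketch of clearing the flags for every device inside the iteration instead. It uses a hypothetical NULL-terminated list in place of the kernel's struct list_head, and all names are illustrative.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct btrfs_device on fs_devices->devices. */
struct model_dev {
	bool err_send;
	bool err_wait;
	struct model_dev *next; /* NULL-terminated, unlike the kernel's list */
};

/* Clear the barrier error flags on every device, inside the loop. */
static void model_clear_barrier_errors(struct model_dev *head)
{
	for (struct model_dev *dev = head; dev; dev = dev->next) {
		dev->err_send = false;
		dev->err_wait = false;
	}
}

/*
 * Shape of the barrier_all_devices() epilogue: check_ret is the result of
 * the degradable check, computed while the flags were still set; the
 * flags are reset on all devices before returning.
 */
static int model_barrier_epilogue(struct model_dev *head, int check_ret)
{
	int ret = check_ret < 0 ? -EIO : 0;

	model_clear_barrier_errors(head);
	return ret;
}
```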
* [PATCH 06/15] btrfs: Cleanup num_tolerated_disk_barrier_failures
2015-11-09 10:56 [PATCH 00/15] btrfs: Hot spare and Auto replace Anand Jain
` (4 preceding siblings ...)
2015-11-09 10:56 ` [PATCH 05/15] btrfs: optimize btrfs_check_degradable() for calls outside of barrier Anand Jain
@ 2015-11-09 10:56 ` Anand Jain
2015-12-05 7:16 ` Qu Wenruo
2015-11-09 10:56 ` [PATCH 07/15] btrfs: introduce device dynamic state transition to offline or failed Anand Jain
` (13 subsequent siblings)
19 siblings, 1 reply; 43+ messages in thread
From: Anand Jain @ 2015-11-09 10:56 UTC
To: linux-btrfs
From: Qu Wenruo <quwenruo@cn.fujitsu.com>
As we now use the per-chunk degradable check, the global
num_tolerated_disk_barrier_failures is of no use, so clean it up.
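To show why the global value is too coarse, here is a sketch of the old approach: a single filesystem-wide limit equal to the minimum tolerance over all profiles in use. The profile-to-tolerance mapping below is an assumption, believed to match btrfs_get_num_tolerated_disk_barrier_failures() of this era (RAID1/RAID10/RAID5 tolerate 1, RAID6 tolerates 2, single/DUP/RAID0 tolerate 0). The enum and function names are hypothetical, and the skipping of empty block groups done by the removed helper is not modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical profile enum; the kernel uses BTRFS_BLOCK_GROUP_* bits. */
enum model_profile { P_SINGLE, P_DUP, P_RAID0, P_RAID1, P_RAID10, P_RAID5, P_RAID6 };

/* Assumed mapping of profile to tolerated device failures. */
static int model_tolerated(enum model_profile p)
{
	switch (p) {
	case P_RAID1:
	case P_RAID10:
	case P_RAID5:
		return 1;
	case P_RAID6:
		return 2;
	default: /* single, DUP, RAID0 */
		return 0;
	}
}

/* Old approach: one limit for the whole fs, the minimum over all profiles. */
static int model_global_limit(const enum model_profile *used, size_t n,
			      int num_devices)
{
	int limit = num_devices;

	for (size_t i = 0; i < n; i++)
		if (model_tolerated(used[i]) < limit)
			limit = model_tolerated(used[i]);
	return limit;
}
```

For the "-m raid1 -d single" filesystem used in patch 02, the global limit is 0, so losing the device that holds only a RAID1 metadata copy was rejected; the per-chunk check accepts that case because no single chunk lives there.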
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
[Btrfs: resolve conflict to apply 'btrfs: Cleanup num_tolerated_disk_barrier_failures']
Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
fs/btrfs/ctree.h | 2 --
fs/btrfs/disk-io.c | 56 ------------------------------------------------------
fs/btrfs/disk-io.h | 2 --
fs/btrfs/volumes.c | 17 -----------------
4 files changed, 77 deletions(-)
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index a86051e..dedd3e0 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1753,8 +1753,6 @@ struct btrfs_fs_info {
/* next backup root to be overwritten */
int backup_root_index;
- int num_tolerated_disk_barrier_failures;
-
/* device replace state */
struct btrfs_dev_replace dev_replace;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index d3303f9..d10ef2e 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2965,8 +2965,6 @@ retry_root_backup:
printk(KERN_ERR "BTRFS: Failed to read block groups: %d\n", ret);
goto fail_sysfs;
}
- fs_info->num_tolerated_disk_barrier_failures =
- btrfs_calc_num_tolerated_disk_barrier_failures(fs_info);
fs_info->cleaner_kthread = kthread_run(cleaner_kthread, tree_root,
"btrfs-cleaner");
@@ -3498,60 +3496,6 @@ int btrfs_get_num_tolerated_disk_barrier_failures(u64 flags)
return 0;
}
-int btrfs_calc_num_tolerated_disk_barrier_failures(
- struct btrfs_fs_info *fs_info)
-{
- struct btrfs_ioctl_space_info space;
- struct btrfs_space_info *sinfo;
- u64 types[] = {BTRFS_BLOCK_GROUP_DATA,
- BTRFS_BLOCK_GROUP_SYSTEM,
- BTRFS_BLOCK_GROUP_METADATA,
- BTRFS_BLOCK_GROUP_DATA | BTRFS_BLOCK_GROUP_METADATA};
- int i;
- int c;
- int num_tolerated_disk_barrier_failures =
- (int)fs_info->fs_devices->num_devices;
-
- for (i = 0; i < ARRAY_SIZE(types); i++) {
- struct btrfs_space_info *tmp;
-
- sinfo = NULL;
- rcu_read_lock();
- list_for_each_entry_rcu(tmp, &fs_info->space_info, list) {
- if (tmp->flags == types[i]) {
- sinfo = tmp;
- break;
- }
- }
- rcu_read_unlock();
-
- if (!sinfo)
- continue;
-
- down_read(&sinfo->groups_sem);
- for (c = 0; c < BTRFS_NR_RAID_TYPES; c++) {
- u64 flags;
-
- if (list_empty(&sinfo->block_groups[c]))
- continue;
-
- btrfs_get_block_group_info(&sinfo->block_groups[c],
- &space);
- if (space.total_bytes == 0 || space.used_bytes == 0)
- continue;
- flags = space.flags;
-
- num_tolerated_disk_barrier_failures = min(
- num_tolerated_disk_barrier_failures,
- btrfs_get_num_tolerated_disk_barrier_failures(
- flags));
- }
- up_read(&sinfo->groups_sem);
- }
-
- return num_tolerated_disk_barrier_failures;
-}
-
static int write_all_supers(struct btrfs_root *root, int max_mirrors)
{
struct list_head *head;
diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h
index adeb318..6dc5fd3 100644
--- a/fs/btrfs/disk-io.h
+++ b/fs/btrfs/disk-io.h
@@ -142,8 +142,6 @@ struct btrfs_root *btrfs_create_tree(struct btrfs_trans_handle *trans,
int btree_lock_page_hook(struct page *page, void *data,
void (*flush_fn)(void *));
int btrfs_get_num_tolerated_disk_barrier_failures(u64 flags);
-int btrfs_calc_num_tolerated_disk_barrier_failures(
- struct btrfs_fs_info *fs_info);
int __init btrfs_end_io_wq_init(void);
void btrfs_end_io_wq_exit(void);
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index a5262bf..33ad42e 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1782,9 +1782,6 @@ int btrfs_rm_device(struct btrfs_root *root, char *device_path, u64 devid)
free_fs_devices(cur_devices);
}
- root->fs_info->num_tolerated_disk_barrier_failures =
- btrfs_calc_num_tolerated_disk_barrier_failures(root->fs_info);
-
/*
* at this point, the device is zero sized. We want to
* remove it from the devices list and zero out the old super
@@ -2289,8 +2286,6 @@ int btrfs_init_new_device(struct btrfs_root *root, char *device_path)
}
}
- root->fs_info->num_tolerated_disk_barrier_failures =
- btrfs_calc_num_tolerated_disk_barrier_failures(root->fs_info);
ret = btrfs_commit_transaction(trans, root);
if (seeding_dev) {
@@ -3518,13 +3513,6 @@ int btrfs_balance(struct btrfs_balance_control *bctl,
}
} while (read_seqretry(&fs_info->profiles_lock, seq));
- if (bctl->sys.flags & BTRFS_BALANCE_ARGS_CONVERT) {
- fs_info->num_tolerated_disk_barrier_failures = min(
- btrfs_calc_num_tolerated_disk_barrier_failures(fs_info),
- btrfs_get_num_tolerated_disk_barrier_failures(
- bctl->sys.target));
- }
-
ret = insert_balance_item(fs_info->tree_root, bctl);
if (ret && ret != -EEXIST)
goto out;
@@ -3547,11 +3535,6 @@ int btrfs_balance(struct btrfs_balance_control *bctl,
mutex_lock(&fs_info->balance_mutex);
atomic_dec(&fs_info->balance_running);
- if (bctl->sys.flags & BTRFS_BALANCE_ARGS_CONVERT) {
- fs_info->num_tolerated_disk_barrier_failures =
- btrfs_calc_num_tolerated_disk_barrier_failures(fs_info);
- }
-
if (bargs) {
memset(bargs, 0, sizeof(*bargs));
update_ioctl_balance_args(fs_info, 0, bargs);
--
2.4.1
* [PATCH 07/15] btrfs: introduce device dynamic state transition to offline or failed
2015-11-09 10:56 [PATCH 00/15] btrfs: Hot spare and Auto replace Anand Jain
` (5 preceding siblings ...)
2015-11-09 10:56 ` [PATCH 06/15] btrfs: Cleanup num_tolerated_disk_barrier_failures Anand Jain
@ 2015-11-09 10:56 ` Anand Jain
2015-11-09 10:56 ` [PATCH 08/15] btrfs: check device for critical errors and mark failed Anand Jain
` (12 subsequent siblings)
19 siblings, 0 replies; 43+ messages in thread
From: Anand Jain @ 2015-11-09 10:56 UTC
To: linux-btrfs
We need a forced device offline/failed feature for the following reasons:
1) a. to report that a device has failed when it does
   b. to close the device when it goes offline, so that the block layer
      can clean up
2) to identify the candidate for auto replace
3) to avoid further commit errors reported against the failing device
4) a device in a multi-device btrfs may go offline from the system
   (as of now, in some system configs btrfs gets unmounted in this
   context, which is not correct behavior)
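The guard order in btrfs_force_device_close() below decides when a state transition actually happens. A userspace sketch of just that decision logic follows; the struct, enum and function names are hypothetical, the seeding check, logging, sysfs and locking are omitted, and the asynchronous RCU-deferred close is reduced to clearing a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical subset of the btrfs_device state flags from this patch. */
struct model_dev {
	bool missing;
	bool has_bdev;
	bool offline;
	bool failed;
};

enum model_result { MR_IGNORED, MR_LAST_RW, MR_CLOSED };

/* Restates the guard order of btrfs_force_device_close(), seeding aside. */
static enum model_result model_force_close(struct model_dev *dev,
					   unsigned long long rw_devices,
					   const char *why)
{
	if (dev->missing || !dev->has_bdev)
		return MR_IGNORED;      /* already gone */
	if (dev->offline || dev->failed)
		return MR_IGNORED;      /* already transitioned */
	if (rw_devices == 1)
		return MR_LAST_RW;      /* never close the last RW device */
	if (strcmp(why, "offline") == 0)
		dev->offline = true;
	else if (strcmp(why, "failed") == 0)
		dev->failed = true;
	else
		return MR_IGNORED;      /* unknown reason string */
	/* The patch then closes the bdev and, for now, also sets missing. */
	dev->missing = true;
	dev->has_bdev = false;
	return MR_CLOSED;
}
```

Because the closed device is also marked missing, a second call with a different reason is ignored, which matches the "already missing" guard in the patch.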
Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
fs/btrfs/volumes.c | 148 +++++++++++++++++++++++++++++++++++++++++++++++++++++
fs/btrfs/volumes.h | 14 +++++
2 files changed, 162 insertions(+)
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 33ad42e..7492733 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6853,3 +6853,151 @@ out:
read_unlock(&map_tree->map_tree.lock);
return ret;
}
+
+static void __close_device(struct work_struct *work)
+{
+ struct btrfs_device *device;
+
+ device = container_of(work, struct btrfs_device, rcu_work);
+
+ if (device->bdev)
+ blkdev_put(device->bdev, device->mode);
+
+ device->bdev = NULL;
+}
+
+static void close_device(struct rcu_head *head)
+{
+ struct btrfs_device *device;
+
+ device = container_of(head, struct btrfs_device, rcu);
+
+ INIT_WORK(&device->rcu_work, __close_device);
+ schedule_work(&device->rcu_work);
+}
+
+void btrfs_close_one_device_dont_free(struct btrfs_device *device)
+{
+ struct btrfs_fs_devices *fs_devices = device->fs_devices;
+
+ if (device->bdev)
+ fs_devices->open_devices--;
+
+ if (device->writeable &&
+ device->devid != BTRFS_DEV_REPLACE_DEVID) {
+ list_del_init(&device->dev_alloc_list);
+ fs_devices->rw_devices--;
+ }
+
+ device->writeable = 0;
+
+ call_rcu(&device->rcu, close_device);
+}
+
+void __force_device_close(struct btrfs_device *device)
+{
+ struct btrfs_device *next_device;
+ struct btrfs_fs_devices *fs_devices;
+
+ fs_devices = device->fs_devices;
+
+ mutex_lock(&fs_devices->device_list_mutex);
+ lock_chunks(fs_devices->fs_info->fs_root);
+
+ next_device = list_entry(fs_devices->devices.next,
+ struct btrfs_device, dev_list);
+ if (device->bdev == fs_devices->fs_info->sb->s_bdev)
+ fs_devices->fs_info->sb->s_bdev = next_device->bdev;
+
+ if (device->bdev == fs_devices->latest_bdev)
+ fs_devices->latest_bdev = next_device->bdev;
+
+ btrfs_close_one_device_dont_free(device);
+
+ /*
+ * fixme: works for now, but it's better to keep the state
+ * missing and offline separate, and update the rest of the
+ * places where we check for only missing.
+ */
+ device->missing = 1;
+ fs_devices->missing_devices++;
+ device->writeable = 0;
+
+ rcu_barrier();
+
+ unlock_chunks(fs_devices->fs_info->fs_root);
+ mutex_unlock(&fs_devices->device_list_mutex);
+}
+
+void btrfs_force_device_close(struct btrfs_device *dev, char *why)
+{
+ bool degrade_option;
+ int tolerated_fail;
+ u64 rw_devices;
+ struct btrfs_fs_info *fs_info;
+ struct btrfs_fs_devices *fs_devices;
+
+ fs_devices = dev->fs_devices;
+ fs_info = fs_devices->fs_info;
+ tolerated_fail = btrfs_check_degradable(fs_info,
+ fs_info->sb->s_flags);
+ rw_devices = fs_devices->rw_devices;
+ degrade_option = btrfs_test_opt(fs_info->fs_root, DEGRADED);
+
+ /* todo: support seed later */
+ if (fs_devices->seeding)
+ return;
+
+ /* this shouldn't be called if device is already missing */
+ if (dev->missing || !dev->bdev)
+ return;
+
+ if (dev->offline || dev->failed)
+ return;
+
+ /* last standing device is being offlined */
+ if (rw_devices == 1) {
+ btrfs_std_error(fs_info, -EIO, "force offline last RW device");
+ return;
+ }
+
+ if (!strcmp(why, "offline"))
+ dev->offline = 1;
+ else if (!strcmp(why, "failed"))
+ dev->failed = 1;
+ else
+ return;
+
+ rcu_read_lock();
+ btrfs_info(fs_info,
+ "device %s %s num_devices %llu rw_devices %llu degraded %d -o degraded %s",
+ rcu_str_deref(dev->name), why, fs_devices->num_devices,
+ rw_devices, tolerated_fail,
+ degrade_option ? "set":"unset");
+ rcu_read_unlock();
+
+ btrfs_sysfs_rm_device_link(fs_devices, dev, 0);
+
+ __force_device_close(dev);
+ tolerated_fail = btrfs_check_degradable(fs_info,
+ fs_info->sb->s_flags);
+ if (tolerated_fail > 0) {
+ rcu_read_lock();
+ btrfs_warn(fs_info, "device %s %s, chunks degraded",
+ rcu_str_deref(dev->name), why);
+ rcu_read_unlock();
+ return;
+ } else if (tolerated_fail < 0) {
+ rcu_read_lock();
+ btrfs_warn(fs_info,
+ "device %s is %s, device(s) with critical chunk(s) missing",
+ rcu_str_deref(dev->name), why);
+ rcu_read_unlock();
+ btrfs_std_error(fs_info, -EIO, "devices below critical level");
+ return;
+ }
+ rcu_read_lock();
+ btrfs_warn(fs_info, "device %s %s, No chunks are degraded",
+ rcu_str_deref(dev->name), why);
+ rcu_read_unlock();
+}
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index d9a4579..1c6107a 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -72,7 +72,20 @@ struct btrfs_device {
int writeable;
int in_fs_metadata;
+ /* missing: device wasn't found at the time of mount */
+ /* fixme: correct usage of missing_devices and missing */
int missing;
+ /* failed: device confirmed to have experienced critical io failure */
+ int failed;
+ /*
+ * offline: the system, the user or the block layer transport has
+ * removed or offlined a device that was once present, without going
+ * through unmount. Implies an interim communication breakdown,
+ * not necessarily a candidate for device replace. The device
+ * might come back online after user intervention or after
+ * block transport layer error recovery.
+ */
+ int offline;
int can_discard;
int is_tgtdev_for_dev_replace;
@@ -557,5 +570,6 @@ void btrfs_set_fs_info_ptr(struct btrfs_fs_info *fs_info);
void btrfs_reset_fs_info_ptr(struct btrfs_fs_info *fs_info);
void btrfs_close_one_device(struct btrfs_device *device);
int btrfs_check_degradable(struct btrfs_fs_info *fs_info, unsigned flags);
+void btrfs_force_device_close(struct btrfs_device *dev, char *why);
#endif
--
2.4.1
* [PATCH 08/15] btrfs: check device for critical errors and mark failed
2015-11-09 10:56 [PATCH 00/15] btrfs: Hot spare and Auto replace Anand Jain
` (6 preceding siblings ...)
2015-11-09 10:56 ` [PATCH 07/15] btrfs: introduce device dynamic state transition to offline or failed Anand Jain
@ 2015-11-09 10:56 ` Anand Jain
2015-11-09 10:56 ` [PATCH 09/15] btrfs: block incompatible optional features at scan Anand Jain
` (11 subsequent siblings)
19 siblings, 0 replies; 43+ messages in thread
From: Anand Jain @ 2015-11-09 10:56 UTC
To: linux-btrfs
Write and flush errors are considered critical errors, upon which
the device is brought offline and marked as failed. Write and flush
errors are identified using the device error statistics.
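The new_critical_errs counter is drained once per transaction-kthread cycle by btrfs_check_devices() below. A C11 userspace sketch of that read-then-subtract pattern follows; the names are hypothetical, and the increment side is assumed to happen on write/flush errors as the message above describes (the hunk that bumps the counter is cut off in this excerpt).

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for device->new_critical_errs. */
static atomic_int model_new_critical_errs;

/* Assumed to be called from the IO error path on a write or flush error. */
static void model_record_critical_error(void)
{
	atomic_fetch_add(&model_new_critical_errs, 1);
}

/*
 * Called once per transaction-kthread cycle. Like the patch's
 * atomic_read() + atomic_sub(), subtracting only what was read keeps any
 * error that races in between for the next cycle.
 */
static int model_drain_critical_errors(void)
{
	int seen = atomic_load(&model_new_critical_errs);

	atomic_fetch_sub(&model_new_critical_errs, seen);
	return seen;
}
```

A nonzero drained value is what triggers btrfs_force_device_close(device, "failed") in the patch.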
Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
fs/btrfs/disk-io.c | 43 +++++++++++++++++++++++++++++++++++++++++++
fs/btrfs/volumes.c | 1 +
fs/btrfs/volumes.h | 4 ++++
3 files changed, 48 insertions(+)
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index d10ef2e..38e0385 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1836,6 +1836,47 @@ sleep:
return 0;
}
+static void btrfs_check_devices(struct btrfs_fs_devices *fs_devices)
+{
+ struct btrfs_fs_info *fs_info = fs_devices->fs_info;
+ struct btrfs_device *device;
+
+ if (btrfs_fs_closing(fs_info))
+ return;
+
+ /* mark disk(s) with write or flush error(s) as failed */
+ mutex_lock(&fs_info->volume_mutex);
+ list_for_each_entry_rcu(device, &fs_devices->devices, dev_list) {
+ int c_err;
+
+ /*
+ * todo: replace target device's write/flush error,
+ * skip for now
+ */
+ if (device->is_tgtdev_for_dev_replace)
+ continue;
+
+ if (!device->dev_stats_valid)
+ continue;
+
+ c_err = atomic_read(&device->new_critical_errs);
+ atomic_sub(c_err, &device->new_critical_errs);
+ if (c_err) {
+ rcu_read_lock();
+ btrfs_warn(fs_info,
+ "new write errors on device %s",
+ rcu_str_deref(device->name));
+ rcu_read_unlock();
+
+ /* force close and mark device as failed */
+ btrfs_force_device_close(device, "failed");
+ }
+ }
+ mutex_unlock(&fs_info->volume_mutex);
+
+ return;
+}
+
static int transaction_kthread(void *arg)
{
struct btrfs_root *root = arg;
@@ -1882,6 +1923,8 @@ static int transaction_kthread(void *arg)
btrfs_end_transaction(trans, root);
}
sleep:
+ btrfs_check_devices(root->fs_info->fs_devices);
+
wake_up_process(root->fs_info->cleaner_kthread);
mutex_unlock(&root->fs_info->transaction_kthread_mutex);
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 7492733..b52197b 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -157,6 +157,7 @@ static struct btrfs_device *__alloc_device(void)
spin_lock_init(&dev->reada_lock);
atomic_set(&dev->reada_in_flight, 0);
atomic_set(&dev->dev_stats_ccnt, 0);
+ atomic_set(&dev->new_critical_errs, 0);
INIT_RADIX_TREE(&dev->reada_zones, GFP_NOFS & ~__GFP_WAIT);
INIT_RADIX_TREE(&dev->reada_extents, GFP_NOFS & ~__GFP_WAIT);
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 1c6107a..827371e 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -167,6 +167,7 @@ struct btrfs_device {
/* Counter to record the change of device stats */
atomic_t dev_stats_ccnt;
atomic_t dev_stat_values[BTRFS_DEV_STAT_VALUES_MAX];
+ atomic_t new_critical_errs;
};
/*
@@ -518,6 +519,9 @@ static inline void btrfs_dev_stat_inc(struct btrfs_device *dev,
atomic_inc(dev->dev_stat_values + index);
smp_mb__before_atomic();
atomic_inc(&dev->dev_stats_ccnt);
+ if (index == BTRFS_DEV_STAT_WRITE_ERRS ||
+ index == BTRFS_DEV_STAT_FLUSH_ERRS)
+ atomic_inc(&dev->new_critical_errs);
}
static inline int btrfs_dev_stat_read(struct btrfs_device *dev,
--
2.4.1
* [PATCH 09/15] btrfs: block incompatible optional features at scan
2015-11-09 10:56 [PATCH 00/15] btrfs: Hot spare and Auto replace Anand Jain
` (7 preceding siblings ...)
2015-11-09 10:56 ` [PATCH 08/15] btrfs: check device for critical errors and mark failed Anand Jain
@ 2015-11-09 10:56 ` Anand Jain
2015-11-09 10:56 ` [PATCH 10/15] btrfs: introduce BTRFS_FEATURE_INCOMPAT_SPARE_DEV Anand Jain
` (10 subsequent siblings)
19 siblings, 0 replies; 43+ messages in thread
From: Anand Jain @ 2015-11-09 10:56 UTC (permalink / raw)
To: linux-btrfs
For completeness, we need to check whether the device being
scanned has only features known to the kernel. Currently, if it
doesn't, the mount fails anyway, so there is no point in adding
such devices to the btrfs_fs_devices list in device_list_add().
So block those devices at scan time. This means the original check
in open_ctree() will not be reached for a device with unsupported
features, but I am leaving that code as it is, without deleting it.
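The scan-time check reduces to masking the superblock's incompat flags with the set the kernel supports; any bit left over is a feature this kernel does not understand. A minimal sketch under that reading: the `SKETCH_*` flag values mirror bits 9 and 10 from ctree.h, but the reduced "supported" mask and `check_incompat_at_scan()` are hypothetical.

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

#define SKETCH_INCOMPAT_NO_HOLES   (1ULL << 9)
#define SKETCH_INCOMPAT_SPARE_DEV  (1ULL << 10)
/* Hypothetical mask; the real BTRFS_FEATURE_INCOMPAT_SUPP has more bits. */
#define SKETCH_INCOMPAT_SUPP \
	(SKETCH_INCOMPAT_NO_HOLES | SKETCH_INCOMPAT_SPARE_DEV)

/* Return 0 if every incompat bit is known, -EOPNOTSUPP otherwise. */
static int check_incompat_at_scan(uint64_t incompat_flags)
{
	uint64_t unknown = incompat_flags & ~SKETCH_INCOMPAT_SUPP;

	if (unknown) {
		fprintf(stderr,
			"couldn't scan, unsupported optional features (0x%llx)\n",
			(unsigned long long)unknown);
		return -EOPNOTSUPP;
	}
	return 0;
}
```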
Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
fs/btrfs/volumes.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index b52197b..fcc9e57 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -976,6 +976,7 @@ int btrfs_scan_one_device(const char *path, fmode_t flags, void *holder,
u64 transid;
u64 total_devices;
u64 bytenr;
+ u64 features;
/*
* we would like to check all the supers, but that would make
@@ -996,6 +997,15 @@ int btrfs_scan_one_device(const char *path, fmode_t flags, void *holder,
if (btrfs_read_disk_super(bdev, bytenr, &page, &disk_super))
goto error_bdev_put;
+ features = btrfs_super_incompat_flags(disk_super) &
+ ~BTRFS_FEATURE_INCOMPAT_SUPP;
+ if (features) {
+ printk(KERN_ERR \
+ "BTRFS: couldn't scan, unsupported optional features (%Lx)\n",
+ features);
+ ret = -EOPNOTSUPP;
+ goto error_disk_super;
+ }
devid = btrfs_stack_device_id(&disk_super->dev_item);
transid = btrfs_super_generation(disk_super);
total_devices = btrfs_super_num_devices(disk_super);
@@ -1010,6 +1020,7 @@ int btrfs_scan_one_device(const char *path, fmode_t flags, void *holder,
if (!ret && fs_devices_ret)
(*fs_devices_ret)->total_devices = total_devices;
+error_disk_super:
btrfs_release_disk_super(page);
error_bdev_put:
--
2.4.1
* [PATCH 10/15] btrfs: introduce BTRFS_FEATURE_INCOMPAT_SPARE_DEV
2015-11-09 10:56 [PATCH 00/15] btrfs: Hot spare and Auto replace Anand Jain
` (8 preceding siblings ...)
2015-11-09 10:56 ` [PATCH 09/15] btrfs: block incompatible optional features at scan Anand Jain
@ 2015-11-09 10:56 ` Anand Jain
2015-11-09 10:56 ` [PATCH 11/15] btrfs: add check not to mount a spare device Anand Jain
` (9 subsequent siblings)
19 siblings, 0 replies; 43+ messages in thread
From: Anand Jain @ 2015-11-09 10:56 UTC (permalink / raw)
To: linux-btrfs
Add the BTRFS_FEATURE_INCOMPAT_SPARE_DEV flag (1ULL << 10, i.e. 0x400)
to identify a spare device.
Spare devices aren't mountable; the mount-time check that makes a
spare device fail to mount is added in patch 11/15.
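The flag is bit 10 of the superblock's incompat_flags, i.e. the value 0x400. Because the bit lives in the superblock, the scan path can recognise a spare while the mount path refuses it. A minimal sketch of both tests; `sb_is_spare()` and `mount_check()` are hypothetical helpers, and returning -EOPNOTSUPP is this sketch's choice.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_INCOMPAT_SPARE_DEV (1ULL << 10)	/* == 0x400 */

/* True if the superblock's incompat flags mark the device as a spare. */
static bool sb_is_spare(uint64_t incompat_flags)
{
	return (incompat_flags & SKETCH_INCOMPAT_SPARE_DEV) != 0;
}

/* Mount-time policy: a spare can be scanned but never mounted. */
static int mount_check(uint64_t incompat_flags)
{
	return sb_is_spare(incompat_flags) ? -EOPNOTSUPP : 0;
}
```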
Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
fs/btrfs/ctree.h | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index dedd3e0..4d25fd8 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -522,6 +522,7 @@ struct btrfs_super_block {
#define BTRFS_FEATURE_INCOMPAT_RAID56 (1ULL << 7)
#define BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA (1ULL << 8)
#define BTRFS_FEATURE_INCOMPAT_NO_HOLES (1ULL << 9)
+#define BTRFS_FEATURE_INCOMPAT_SPARE_DEV (1ULL << 10)
#define BTRFS_FEATURE_COMPAT_SUPP 0ULL
#define BTRFS_FEATURE_COMPAT_SAFE_SET 0ULL
@@ -539,7 +540,8 @@ struct btrfs_super_block {
BTRFS_FEATURE_INCOMPAT_RAID56 | \
BTRFS_FEATURE_INCOMPAT_EXTENDED_IREF | \
BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA | \
- BTRFS_FEATURE_INCOMPAT_NO_HOLES)
+ BTRFS_FEATURE_INCOMPAT_NO_HOLES | \
+ BTRFS_FEATURE_INCOMPAT_SPARE_DEV)
#define BTRFS_FEATURE_INCOMPAT_SAFE_SET \
(BTRFS_FEATURE_INCOMPAT_EXTENDED_IREF)
--
2.4.1
* [PATCH 11/15] btrfs: add check not to mount a spare device
2015-11-09 10:56 [PATCH 00/15] btrfs: Hot spare and Auto replace Anand Jain
` (9 preceding siblings ...)
2015-11-09 10:56 ` [PATCH 10/15] btrfs: introduce BTRFS_FEATURE_INCOMPAT_SPARE_DEV Anand Jain
@ 2015-11-09 10:56 ` Anand Jain
2015-11-09 10:56 ` [PATCH 12/15] btrfs: support btrfs dev scan for " Anand Jain
` (8 subsequent siblings)
19 siblings, 0 replies; 43+ messages in thread
From: Anand Jain @ 2015-11-09 10:56 UTC (permalink / raw)
To: linux-btrfs
Spare devices can be scanned but shouldn't be mountable.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
fs/btrfs/disk-io.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 38e0385..3662c0a 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2768,6 +2768,14 @@ int open_ctree(struct super_block *sb,
goto fail_alloc;
}
+ if (btrfs_super_incompat_flags(disk_super) &
+ BTRFS_FEATURE_INCOMPAT_SPARE_DEV) {
+ /*You can only scan a spare device but not mount*/
+ printk(KERN_ERR "BTRFS: You can't mount a spare device\n");
+ err = -ENOTSUPP;
+ goto fail_alloc;
+ }
+
/*
* Leafsize and nodesize were always equal, this is only a sanity check.
*/
--
2.4.1
* [PATCH 12/15] btrfs: support btrfs dev scan for spare device
2015-11-09 10:56 [PATCH 00/15] btrfs: Hot spare and Auto replace Anand Jain
` (10 preceding siblings ...)
2015-11-09 10:56 ` [PATCH 11/15] btrfs: add check not to mount a spare device Anand Jain
@ 2015-11-09 10:56 ` Anand Jain
2015-11-09 10:56 ` [PATCH 13/15] btrfs: provide framework to get and put a " Anand Jain
` (7 subsequent siblings)
19 siblings, 0 replies; 43+ messages in thread
From: Anand Jain @ 2015-11-09 10:56 UTC (permalink / raw)
To: linux-btrfs
When the user or system calls the BTRFS_IOC_SCAN_DEV ioctl,
this patch makes sure the device is added to the device
list and marked as spare.
The same happens for the BTRFS_IOC_DEVICES_READY ioctl,
since BTRFS_IOC_DEVICES_READY has historically performed
the same device scan.
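In device_list_add() the spare bit is recorded only when a new fs_devices entry is created for an fsid seen for the first time. A minimal user-space sketch of that registration, assuming a fixed-size array in place of the kernel's fs_uuids list; `struct fs_devices_sketch`, `scan_register()` and `MAX_FSIDS` are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_INCOMPAT_SPARE_DEV (1ULL << 10)
#define MAX_FSIDS 8

/* Hypothetical, flattened stand-in for struct btrfs_fs_devices. */
struct fs_devices_sketch {
	char fsid[37];		/* UUID string */
	char path[64];		/* single device path */
	bool spare;
};

static struct fs_devices_sketch registry[MAX_FSIDS];
static int nr_registered;

/*
 * Scan path: register a device under its fsid the first time it is seen
 * and mark the entry as spare if the superblock carries the SPARE_DEV bit.
 * Returns the entry, or NULL when the registry is full.
 */
static struct fs_devices_sketch *scan_register(const char *fsid,
					       const char *path,
					       uint64_t incompat_flags)
{
	for (int i = 0; i < nr_registered; i++)
		if (strcmp(registry[i].fsid, fsid) == 0)
			return &registry[i];	/* already known */

	if (nr_registered == MAX_FSIDS)
		return NULL;

	struct fs_devices_sketch *fs = &registry[nr_registered++];

	snprintf(fs->fsid, sizeof(fs->fsid), "%s", fsid);
	snprintf(fs->path, sizeof(fs->path), "%s", path);
	fs->spare = (incompat_flags & SKETCH_INCOMPAT_SPARE_DEV) != 0;
	return fs;
}
```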
Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
fs/btrfs/volumes.c | 4 ++++
fs/btrfs/volumes.h | 2 ++
2 files changed, 6 insertions(+)
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index fcc9e57..28f549d 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -526,6 +526,10 @@ static noinline int device_list_add(const char *path,
if (IS_ERR(fs_devices))
return PTR_ERR(fs_devices);
+ if (btrfs_super_incompat_flags(disk_super) &
+ BTRFS_FEATURE_INCOMPAT_SPARE_DEV)
+ fs_devices->spare = 1;
+
list_add(&fs_devices->list, &fs_uuids);
device = NULL;
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 827371e..3d995b7 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -278,6 +278,8 @@ struct btrfs_fs_devices {
struct kobject fsid_kobj;
struct kobject *device_dir_kobj;
struct completion kobj_unregister;
+
+ int spare;
};
#define BTRFS_BIO_INLINE_CSUM_SIZE 64
--
2.4.1
* [PATCH 13/15] btrfs: provide framework to get and put a spare device
2015-11-09 10:56 [PATCH 00/15] btrfs: Hot spare and Auto replace Anand Jain
` (11 preceding siblings ...)
2015-11-09 10:56 ` [PATCH 12/15] btrfs: support btrfs dev scan for " Anand Jain
@ 2015-11-09 10:56 ` Anand Jain
2015-11-09 10:56 ` [PATCH 14/15] btrfs: introduce helper functions to perform hot replace Anand Jain
` (6 subsequent siblings)
19 siblings, 0 replies; 43+ messages in thread
From: Anand Jain @ 2015-11-09 10:56 UTC (permalink / raw)
To: linux-btrfs
This adds functions to get and put a spare device from the list,
so that the hot replace code can pick a spare device when needed.
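btrfs_get_spare_device() walks the scanned list under uuid_mutex, takes the first entry flagged as spare, copies its path and drops the entry so the same spare cannot be handed out twice; btrfs_put_spare_device() gives it back by rescanning the path. A user-space sketch of that claim/return cycle, with a singly linked list and malloc()/strdup() standing in for the kernel list, kstrdup() and free_fs_devices(). All names are hypothetical and locking is omitted.

```c
#define _POSIX_C_SOURCE 200809L	/* for strdup() */
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for one scanned fs_devices entry. */
struct spare_node {
	char *path;
	bool spare;
	struct spare_node *next;
};

static struct spare_node *scanned;	/* head of the scanned list */

/* Record a scanned device; "put" is scan_add(path, true). */
static int scan_add(const char *path, bool spare)
{
	struct spare_node *n = malloc(sizeof(*n));

	if (!n)
		return -1;
	n->path = strdup(path);
	if (!n->path) {
		free(n);
		return -1;
	}
	n->spare = spare;
	n->next = scanned;
	scanned = n;
	return 0;
}

/*
 * Claim the first spare: return a caller-owned copy of its path and unlink
 * the entry so it cannot be claimed again. Returns NULL if there is none.
 */
static char *get_spare(void)
{
	for (struct spare_node **pp = &scanned; *pp; pp = &(*pp)->next) {
		struct spare_node *n = *pp;

		if (!n->spare)
			continue;
		*pp = n->next;		/* unlink */
		char *path = n->path;	/* ownership moves to the caller */
		free(n);
		return path;
	}
	return NULL;
}
```

A failed replace would return the spare with `scan_add(path, true)` before freeing the claimed path, mirroring btrfs_put_spare_device() followed by kfree().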
Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
fs/btrfs/super.c | 9 +++++++++
fs/btrfs/volumes.c | 37 +++++++++++++++++++++++++++++++++++++
fs/btrfs/volumes.h | 2 ++
3 files changed, 48 insertions(+)
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index d495790..29836ca 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -2035,6 +2035,15 @@ static int btrfs_control_open(struct inode *inode, struct file *file)
return 0;
}
+void btrfs_put_spare_device(char *path)
+{
+ struct btrfs_fs_devices *fs_devices;
+
+ if (btrfs_scan_one_device(path, FMODE_READ,
+ &btrfs_fs_type, &fs_devices))
+ printk(KERN_INFO "failed to return spare device\n");
+}
+
/*
* used by btrfsctl to scan devices when no FS is mounted
*/
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 28f549d..3b90690 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -7017,3 +7017,40 @@ void btrfs_force_device_close(struct btrfs_device *dev, char *why)
rcu_str_deref(dev->name), why);
rcu_read_unlock();
}
+
+int btrfs_get_spare_device(char **path)
+{
+ int ret = 1;
+ struct btrfs_fs_devices *fs_devices;
+ struct btrfs_device *device;
+ struct list_head *fs_uuids = btrfs_get_fs_uuids();
+
+ mutex_lock(&uuid_mutex);
+ list_for_each_entry(fs_devices, fs_uuids, list) {
+ if (!fs_devices->spare)
+ continue;
+
+ /* as of now there is only one device in the spare fs_devices */
+ device = list_entry(fs_devices->devices.next,
+ struct btrfs_device, dev_list);
+
+ if (!device || !device->name)
+ continue;
+
+ fs_devices->spare = 0;
+ rcu_read_lock();
+ *path = kstrdup(device->name->str, GFP_NOFS);
+ rcu_read_unlock();
+ ret = 0;
+ break;
+ }
+
+ if (!ret) {
+ btrfs_sysfs_remove_fsid(fs_devices);
+ list_del(&fs_devices->list);
+ free_fs_devices(fs_devices);
+ }
+ mutex_unlock(&uuid_mutex);
+
+ return ret;
+}
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 3d995b7..36184ec 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -577,5 +577,7 @@ void btrfs_reset_fs_info_ptr(struct btrfs_fs_info *fs_info);
void btrfs_close_one_device(struct btrfs_device *device);
int btrfs_check_degradable(struct btrfs_fs_info *fs_info, unsigned flags);
void btrfs_force_device_close(struct btrfs_device *dev, char *why);
+int btrfs_get_spare_device(char **path);
+void btrfs_put_spare_device(char *path);
#endif
--
2.4.1
* [PATCH 14/15] btrfs: introduce helper functions to perform hot replace
2015-11-09 10:56 [PATCH 00/15] btrfs: Hot spare and Auto replace Anand Jain
` (12 preceding siblings ...)
2015-11-09 10:56 ` [PATCH 13/15] btrfs: provide framework to get and put a " Anand Jain
@ 2015-11-09 10:56 ` Anand Jain
2015-11-09 10:56 ` [PATCH 15/15] btrfs: check for failed device and " Anand Jain
` (5 subsequent siblings)
19 siblings, 0 replies; 43+ messages in thread
From: Anand Jain @ 2015-11-09 10:56 UTC (permalink / raw)
To: linux-btrfs
Hot replace (auto replace) is an important volume manager feature
and is critical to data center operations: it brings a degraded
volume back to a healthy state as early as possible and without
manual intervention.
This modifies the existing replace code to suit the needs of auto
replace; in the long run I hope the two code paths can be merged.
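The heart of btrfs_dev_replace_start_v2() is a guarded state transition: under the dev_replace lock it refuses to start when a replace is already STARTED or SUSPENDED; otherwise it records source and target and moves to STARTED before the scrub-based copy begins. A minimal sketch of that transition; the enum loosely mirrors the BTRFS_IOCTL_DEV_REPLACE_STATE_* values, `struct dev_replace_sketch` and `replace_try_start()` are hypothetical, locking is omitted, and returning -EBUSY is this sketch's choice.

```c
#include <errno.h>
#include <stddef.h>

enum replace_state { REPLACE_NEVER_STARTED, REPLACE_STARTED,
		     REPLACE_FINISHED, REPLACE_CANCELED, REPLACE_SUSPENDED };

struct dev_replace_sketch {
	enum replace_state state;
	const char *srcdev;
	const char *tgtdev;
};

/* Start a replace unless one is already running or suspended. */
static int replace_try_start(struct dev_replace_sketch *r,
			     const char *src, const char *tgt)
{
	if (r->state == REPLACE_STARTED || r->state == REPLACE_SUSPENDED)
		return -EBUSY;	/* caller should hand the spare back */

	r->srcdev = src;
	r->tgtdev = tgt;
	r->state = REPLACE_STARTED;
	return 0;
}
```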
Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
fs/btrfs/dev-replace.c | 116 +++++++++++++++++++++++++++++++++++++++++++++++++
fs/btrfs/dev-replace.h | 1 +
2 files changed, 117 insertions(+)
diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index 02df419..3294b33 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -914,3 +914,119 @@ void btrfs_bio_counter_inc_blocked(struct btrfs_fs_info *fs_info)
&fs_info->fs_state));
}
}
+
+int btrfs_dev_replace_start_v2(struct btrfs_root *root, char *tgt_path,
+ struct btrfs_device *src_device)
+{
+ struct btrfs_trans_handle *trans;
+ struct btrfs_fs_info *fs_info = root->fs_info;
+ struct btrfs_dev_replace *dev_replace = &fs_info->dev_replace;
+ int ret;
+ struct btrfs_device *tgt_device = NULL;
+
+ /*
+ * procedure here is the same as for a replace triggered from
+ * user land; some day we could merge this with it
+ */
+ WARN_ON(!src_device);
+ mutex_lock(&fs_info->volume_mutex);
+ ret = btrfs_init_dev_replace_tgtdev(root, tgt_path,
+ src_device, &tgt_device);
+ mutex_unlock(&fs_info->volume_mutex);
+ if (ret)
+ return ret;
+ WARN_ON(!tgt_device);
+
+ trans = btrfs_attach_transaction(root);
+ if (!IS_ERR(trans)) {
+ ret = btrfs_commit_transaction(trans, root);
+ if (ret)
+ return ret;
+ } else if (PTR_ERR(trans) != -ENOENT) {
+ return PTR_ERR(trans);
+ }
+
+ btrfs_dev_replace_lock(dev_replace);
+ if (dev_replace->replace_state ==
+ BTRFS_IOCTL_DEV_REPLACE_STATE_STARTED ||
+ dev_replace->replace_state ==
+ BTRFS_IOCTL_DEV_REPLACE_STATE_SUSPENDED)
+ goto leave;
+
+ dev_replace->cont_reading_from_srcdev_mode =
+ BTRFS_IOCTL_DEV_REPLACE_CONT_READING_FROM_SRCDEV_MODE_AVOID;
+ dev_replace->srcdev = src_device;
+ dev_replace->tgtdev = tgt_device;
+
+ dev_replace->replace_state = BTRFS_IOCTL_DEV_REPLACE_STATE_STARTED;
+ dev_replace->time_started = get_seconds();
+ dev_replace->cursor_left = 0;
+ dev_replace->committed_cursor_left = 0;
+ dev_replace->cursor_left_last_write_of_item = 0;
+ dev_replace->cursor_right = 0;
+ dev_replace->is_valid = 1;
+ dev_replace->item_needs_writeback = 1;
+
+ printk_in_rcu(KERN_INFO
+ "BTRFS: auto replace from %s (devid %llu) to %s started\n",
+ rcu_str_deref(src_device->name),
+ src_device->devid,
+ rcu_str_deref(tgt_device->name));
+
+ btrfs_dev_replace_unlock(dev_replace);
+
+ ret = btrfs_sysfs_add_device_link(tgt_device->fs_devices, tgt_device, 0);
+ if (ret)
+ btrfs_err(fs_info, "kobj add dev failed %d\n", ret);
+
+ btrfs_wait_ordered_roots(fs_info, -1);
+
+ trans = btrfs_start_transaction(root, 0);
+ if (IS_ERR(trans)) {
+ ret = PTR_ERR(trans);
+ btrfs_dev_replace_lock(dev_replace);
+ goto leave;
+ }
+ ret = btrfs_commit_transaction(trans, root);
+ WARN_ON(ret);
+
+ ret = btrfs_scrub_dev(fs_info, src_device->devid, 0,
+ btrfs_device_get_total_bytes(src_device),
+ &dev_replace->scrub_progress, 0, 1);
+
+ ret = btrfs_dev_replace_finishing(fs_info, ret);
+ if (ret == -EINPROGRESS)
+ ret = 0;
+ else
+ WARN_ON(ret);
+
+ return ret;
+
+leave:
+ dev_replace->srcdev = NULL;
+ dev_replace->tgtdev = NULL;
+ btrfs_dev_replace_unlock(dev_replace);
+ btrfs_destroy_dev_replace_tgtdev(fs_info, tgt_device);
+ return ret;
+}
+
+int btrfs_auto_replace_start(struct btrfs_root *root,
+ struct btrfs_device *src_device)
+{
+ char *tgt_path;
+ int ret;
+
+ if (btrfs_get_spare_device(&tgt_path)) {
+ btrfs_err(root->fs_info,
+ "No spare device found/configured in the kernel");
+ return -EINVAL;
+ }
+
+ ret = btrfs_dev_replace_start_v2(root, tgt_path, src_device);
+ if (ret)
+ btrfs_put_spare_device(tgt_path);
+
+ kfree(tgt_path);
+
+ return ret;
+}
diff --git a/fs/btrfs/dev-replace.h b/fs/btrfs/dev-replace.h
index 20035cb..2ead9a6 100644
--- a/fs/btrfs/dev-replace.h
+++ b/fs/btrfs/dev-replace.h
@@ -41,4 +41,5 @@ static inline void btrfs_dev_replace_stats_inc(atomic64_t *stat_value)
{
atomic64_inc(stat_value);
}
+int btrfs_auto_replace_start(struct btrfs_root *root, struct btrfs_device *src_device);
#endif
--
2.4.1
* [PATCH 15/15] btrfs: check for failed device and hot replace
2015-11-09 10:56 [PATCH 00/15] btrfs: Hot spare and Auto replace Anand Jain
` (13 preceding siblings ...)
2015-11-09 10:56 ` [PATCH 14/15] btrfs: introduce helper functions to perform hot replace Anand Jain
@ 2015-11-09 10:56 ` Anand Jain
2015-11-09 10:58 ` [PATCH 0/4] btrfs-progs: Hot spare and Auto replace Anand Jain
` (4 subsequent siblings)
19 siblings, 0 replies; 43+ messages in thread
From: Anand Jain @ 2015-11-09 10:56 UTC (permalink / raw)
To: linux-btrfs
This patch creates casualty_kthread to check for failed
devices and trigger a device replace.
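Each pass of the casualty thread is a short decision: do nothing on a read-only mount or while a replace is already in progress; otherwise pick the first device flagged as failed and hand it to the auto replace path. A minimal sketch of that selection; `struct dev_flags` and `pick_casualty()` are hypothetical, and the kernel version additionally takes the dev_replace lock and device_list_mutex/RCU around these checks.

```c
#include <stdbool.h>
#include <stddef.h>

struct dev_flags {
	int devid;
	bool failed;
};

/*
 * Return the index of the first failed device to auto-replace, or -1 when
 * no replace should start (read-only, replace ongoing, nothing failed).
 */
static int pick_casualty(const struct dev_flags *devs, size_t n,
			 bool read_only, bool replace_ongoing)
{
	if (read_only || replace_ongoing)
		return -1;
	for (size_t i = 0; i < n; i++)
		if (devs[i].failed)
			return (int)i;
	return -1;
}
```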
Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
fs/btrfs/ctree.h | 1 +
fs/btrfs/disk-io.c | 67 ++++++++++++++++++++++++++++++++++++++++++++++++++
fs/btrfs/transaction.c | 3 ++-
3 files changed, 70 insertions(+), 1 deletion(-)
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 4d25fd8..3e706ff 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1613,6 +1613,7 @@ struct btrfs_fs_info {
struct btrfs_workqueue *extent_workers;
struct task_struct *transaction_kthread;
struct task_struct *cleaner_kthread;
+ struct task_struct *casualty_kthread;
int thread_pool_size;
struct kobject *space_info_kobj;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 3662c0a..beefe35 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1836,6 +1836,64 @@ sleep:
return 0;
}
+/*
+ * A kthread to check whether any auto maintenance is required. This is
+ * multithread safe, and kthread is running only if
+ * fs_info->casualty_kthread is not NULL, fixme: atomic ?
+ */
+static int casualty_kthread(void *arg)
+{
+ struct btrfs_root *root = arg;
+ struct btrfs_fs_info *fs_info = root->fs_info;
+ struct btrfs_fs_devices *fs_devices = fs_info->fs_devices;
+ struct btrfs_device *device;
+ int found = 0;
+
+ if (root->fs_info->sb->s_flags & MS_RDONLY)
+ goto out;
+
+ btrfs_dev_replace_lock(&fs_info->dev_replace);
+ if (btrfs_dev_replace_is_ongoing(&fs_info->dev_replace)) {
+ btrfs_dev_replace_unlock(&fs_info->dev_replace);
+ goto out;
+ }
+ btrfs_dev_replace_unlock(&fs_info->dev_replace);
+
+ /*
+ * Find failed device, if any. After the replace the failed
+ * device is removed, so any failed device found here is new and
+ * will be a candidate for the replace, if FS can't work without
+ * the failed device then btrfs_std_error() will have put FS into
+ * readonly
+ */
+ /*
+ * fixme: introduce a priority order to find failed device,
+ * chronological order ?
+ */
+ mutex_lock(&fs_devices->device_list_mutex);
+ rcu_read_lock();
+ list_for_each_entry_rcu(device, &fs_devices->devices, dev_list) {
+ if (device->failed) {
+ found = 1;
+ break;
+ }
+ }
+ rcu_read_unlock();
+ mutex_unlock(&fs_devices->device_list_mutex);
+
+ /*
+ * We are using the replace code, which should be interruptible
+ * during unmount, and as of now there is no user land stop
+ * request that we support
+ */
+ if (found)
+ btrfs_auto_replace_start(root, device);
+
+out:
+ fs_info->casualty_kthread = NULL;
+ return 0;
+}
+
static void btrfs_check_devices(struct btrfs_fs_devices *fs_devices)
{
struct btrfs_fs_info *fs_info = fs_devices->fs_info;
@@ -1924,6 +1982,10 @@ static int transaction_kthread(void *arg)
}
sleep:
btrfs_check_devices(root->fs_info->fs_devices);
+ if (!root->fs_info->casualty_kthread)
+ root->fs_info->casualty_kthread =
+ kthread_run(casualty_kthread, root,
+ "btrfs-casualty");
wake_up_process(root->fs_info->cleaner_kthread);
mutex_unlock(&root->fs_info->transaction_kthread_mutex);
@@ -3159,6 +3221,9 @@ fail_trans_kthread:
kthread_stop(fs_info->transaction_kthread);
btrfs_cleanup_transaction(fs_info->tree_root);
btrfs_free_fs_roots(fs_info);
+ if (fs_info->casualty_kthread)
+ kthread_stop(fs_info->casualty_kthread);
+
fail_cleaner:
kthread_stop(fs_info->cleaner_kthread);
@@ -3807,6 +3872,8 @@ void close_ctree(struct btrfs_root *root)
kthread_stop(fs_info->transaction_kthread);
kthread_stop(fs_info->cleaner_kthread);
+ if (fs_info->casualty_kthread)
+ kthread_stop(fs_info->casualty_kthread);
fs_info->closing = 2;
smp_mb();
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 76354bb..ef4aaf5 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -2187,7 +2187,8 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans,
kmem_cache_free(btrfs_trans_handle_cachep, trans);
if (current != root->fs_info->transaction_kthread &&
- current != root->fs_info->cleaner_kthread)
+ current != root->fs_info->cleaner_kthread &&
+ current != root->fs_info->casualty_kthread)
btrfs_run_delayed_iputs(root);
return ret;
--
2.4.1
* [PATCH 0/4] btrfs-progs: Hot spare and Auto replace
2015-11-09 10:56 [PATCH 00/15] btrfs: Hot spare and Auto replace Anand Jain
` (14 preceding siblings ...)
2015-11-09 10:56 ` [PATCH 15/15] btrfs: check for failed device and " Anand Jain
@ 2015-11-09 10:58 ` Anand Jain
2015-11-09 10:58 ` [PATCH 1/4] btrfs-progs: Introduce BTRFS_FEATURE_INCOMPAT_SPARE_DEV SB flags Anand Jain
` (3 more replies)
2015-11-09 14:09 ` [PATCH 00/15] btrfs: Hot spare and Auto replace Austin S Hemmelgarn
` (3 subsequent siblings)
19 siblings, 4 replies; 43+ messages in thread
From: Anand Jain @ 2015-11-09 10:58 UTC (permalink / raw)
To: linux-btrfs
Depends on the kernel patch set
[PATCH 00/15] btrfs: Hot spare and Auto replace
This is the btrfs-progs side of the patch set.
Anand Jain (4):
btrfs-progs: Introduce BTRFS_FEATURE_INCOMPAT_SPARE_DEV SB flags
btrfs-progs: Introduce btrfs spare subcommand
btrfs-progs: add fi show for spare
btrfs-progs: add global spare device list to filesystem show
Android.mk | 2 +-
Makefile.in | 2 +-
btrfs-show-super.c | 3 +-
btrfs.c | 1 +
cmds-filesystem.c | 9 ++
cmds-spare.c | 291 +++++++++++++++++++++++++++++++++++++++++++++++++++++
commands.h | 2 +
ctree.h | 4 +-
utils.h | 1 +
volumes.c | 4 +
volumes.h | 2 +
11 files changed, 317 insertions(+), 4 deletions(-)
create mode 100644 cmds-spare.c
--
2.4.1
* [PATCH 1/4] btrfs-progs: Introduce BTRFS_FEATURE_INCOMPAT_SPARE_DEV SB flags
2015-11-09 10:58 ` [PATCH 0/4] btrfs-progs: Hot spare and Auto replace Anand Jain
@ 2015-11-09 10:58 ` Anand Jain
2015-11-09 10:58 ` [PATCH 2/4] btrfs-progs: Introduce btrfs spare subcommand Anand Jain
` (2 subsequent siblings)
3 siblings, 0 replies; 43+ messages in thread
From: Anand Jain @ 2015-11-09 10:58 UTC (permalink / raw)
To: linux-btrfs
Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
btrfs-show-super.c | 3 ++-
ctree.h | 4 +++-
volumes.c | 4 ++++
volumes.h | 2 ++
4 files changed, 11 insertions(+), 2 deletions(-)
diff --git a/btrfs-show-super.c b/btrfs-show-super.c
index 27414c8..d9626cd 100644
--- a/btrfs-show-super.c
+++ b/btrfs-show-super.c
@@ -300,7 +300,8 @@ struct readable_flag_entry incompat_flags_array[] = {
DEF_INCOMPAT_FLAG_ENTRY(EXTENDED_IREF),
DEF_INCOMPAT_FLAG_ENTRY(RAID56),
DEF_INCOMPAT_FLAG_ENTRY(SKINNY_METADATA),
- DEF_INCOMPAT_FLAG_ENTRY(NO_HOLES)
+ DEF_INCOMPAT_FLAG_ENTRY(NO_HOLES),
+ DEF_INCOMPAT_FLAG_ENTRY(SPARE_DEV)
};
static const int incompat_flags_num = sizeof(incompat_flags_array) /
sizeof(struct readable_flag_entry);
diff --git a/ctree.h b/ctree.h
index c57f9ca..2c3aea6 100644
--- a/ctree.h
+++ b/ctree.h
@@ -475,6 +475,7 @@ struct btrfs_super_block {
#define BTRFS_FEATURE_INCOMPAT_RAID56 (1ULL << 7)
#define BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA (1ULL << 8)
#define BTRFS_FEATURE_INCOMPAT_NO_HOLES (1ULL << 9)
+#define BTRFS_FEATURE_INCOMPAT_SPARE_DEV (1ULL << 10)
#define BTRFS_FEATURE_COMPAT_SUPP 0ULL
@@ -488,7 +489,8 @@ struct btrfs_super_block {
BTRFS_FEATURE_INCOMPAT_RAID56 | \
BTRFS_FEATURE_INCOMPAT_MIXED_GROUPS | \
BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA | \
- BTRFS_FEATURE_INCOMPAT_NO_HOLES)
+ BTRFS_FEATURE_INCOMPAT_NO_HOLES | \
+ BTRFS_FEATURE_INCOMPAT_SPARE_DEV)
/*
* A leaf is full of items. offset and size tell us where to find
diff --git a/volumes.c b/volumes.c
index ca50f1c..beaeecf 100644
--- a/volumes.c
+++ b/volumes.c
@@ -101,6 +101,10 @@ static int device_list_add(const char *path,
fs_devices->latest_devid = devid;
fs_devices->latest_trans = found_transid;
fs_devices->lowest_devid = (u64)-1;
+ if (btrfs_super_incompat_flags(disk_super) &
+ BTRFS_FEATURE_INCOMPAT_SPARE_DEV)
+ fs_devices->spare = 1;
+
device = NULL;
} else {
device = __find_device(&fs_devices->devices, devid,
diff --git a/volumes.h b/volumes.h
index 4ecb993..3b56c1f 100644
--- a/volumes.h
+++ b/volumes.h
@@ -83,6 +83,8 @@ struct btrfs_fs_devices {
int seeding;
struct btrfs_fs_devices *seed;
+
+ int spare;
};
struct btrfs_bio_stripe {
--
2.4.1
* [PATCH 2/4] btrfs-progs: Introduce btrfs spare subcommand
2015-11-09 10:58 ` [PATCH 0/4] btrfs-progs: Hot spare and Auto replace Anand Jain
2015-11-09 10:58 ` [PATCH 1/4] btrfs-progs: Introduce BTRFS_FEATURE_INCOMPAT_SPARE_DEV SB flags Anand Jain
@ 2015-11-09 10:58 ` Anand Jain
2015-11-09 10:58 ` [PATCH 3/4] btrfs-progs: add fi show for spare Anand Jain
2015-11-09 10:58 ` [PATCH 4/4] btrfs-progs: add global spare device list to filesystem show Anand Jain
3 siblings, 0 replies; 43+ messages in thread
From: Anand Jain @ 2015-11-09 10:58 UTC (permalink / raw)
To: linux-btrfs
Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
Android.mk | 2 +-
Makefile.in | 2 +-
btrfs.c | 1 +
cmds-spare.c | 291 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
commands.h | 2 +
5 files changed, 296 insertions(+), 2 deletions(-)
create mode 100644 cmds-spare.c
diff --git a/Android.mk b/Android.mk
index fe3209b..baaf179 100644
--- a/Android.mk
+++ b/Android.mk
@@ -27,7 +27,7 @@ cmds_objects := cmds-subvolume.c cmds-filesystem.c cmds-device.c cmds-scrub.c \
cmds-inspect.c cmds-balance.c cmds-send.c cmds-receive.c \
cmds-quota.c cmds-qgroup.c cmds-replace.c cmds-check.c \
cmds-restore.c cmds-rescue.c chunk-recover.c super-recover.c \
- cmds-property.c cmds-fi-usage.c
+ cmds-property.c cmds-fi-usage.c cmds-spare.c
libbtrfs_objects := send-stream.c send-utils.c rbtree.c btrfs-list.c crc32c.c \
uuid-tree.c utils-lib.c rbtree-utils.c
libbtrfs_headers := send-stream.h send-utils.h send.h rbtree.h btrfs-list.h \
diff --git a/Makefile.in b/Makefile.in
index 514a76f..1b005b0 100644
--- a/Makefile.in
+++ b/Makefile.in
@@ -43,7 +43,7 @@ cmds_objects = cmds-subvolume.o cmds-filesystem.o cmds-device.o cmds-scrub.o \
cmds-inspect.o cmds-balance.o cmds-send.o cmds-receive.o \
cmds-quota.o cmds-qgroup.o cmds-replace.o cmds-check.o \
cmds-restore.o cmds-rescue.o chunk-recover.o super-recover.o \
- cmds-property.o cmds-fi-usage.o
+ cmds-property.o cmds-fi-usage.o cmds-spare.o
libbtrfs_objects = send-stream.o send-utils.o rbtree.o btrfs-list.o crc32c.o \
uuid-tree.o utils-lib.o rbtree-utils.o
libbtrfs_headers = send-stream.h send-utils.h send.h rbtree.h btrfs-list.h \
diff --git a/btrfs.c b/btrfs.c
index 63df377..ba0dd02 100644
--- a/btrfs.c
+++ b/btrfs.c
@@ -204,6 +204,7 @@ static const struct cmd_group btrfs_cmd_group = {
{ "quota", cmd_quota, NULL, "a_cmd_group, 0 },
{ "qgroup", cmd_qgroup, NULL, &qgroup_cmd_group, 0 },
{ "replace", cmd_replace, NULL, &replace_cmd_group, 0 },
+ { "spare", cmd_spare, NULL, &spare_cmd_group, 0 },
{ "help", cmd_help, cmd_help_usage, NULL, 0 },
{ "version", cmd_version, cmd_version_usage, NULL, 0 },
NULL_CMD_STRUCT
diff --git a/cmds-spare.c b/cmds-spare.c
new file mode 100644
index 0000000..cd9c709
--- /dev/null
+++ b/cmds-spare.c
@@ -0,0 +1,291 @@
+/*
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public
+ * License v2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; if not, write to the
+ * Free Software Foundation, Inc., 59 Temple Place - Suite 330,
+ * Boston, MA 021110-1307, USA.
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <fcntl.h>
+#include <sys/ioctl.h>
+#include <errno.h>
+#include <getopt.h>
+
+#include "ctree.h"
+#include "utils.h"
+#include "volumes.h"
+#include "disk-io.h"
+
+#include "commands.h"
+
+int print_spare_device(unsigned unit_mode)
+{
+ int ret;
+ struct btrfs_fs_devices *fs_devices;
+ struct btrfs_device *device;
+ struct list_head *fs_uuids;
+
+ printf("Global spare\n");
+
+ ret = btrfs_scan_lblkid();
+ if (ret) {
+ fprintf(stderr, "scan_lblkid failed ret %d\n", ret);
+ return ret;
+ }
+
+ fs_uuids = btrfs_scanned_uuids();
+
+ list_for_each_entry(fs_devices, fs_uuids, list) {
+ if (!fs_devices->spare)
+ continue;
+
+ device = list_entry(fs_devices->devices.next,
+ struct btrfs_device, dev_list);
+ if (device->name)
+ printf("\tdevice size %s path %s\n",
+ pretty_size_mode(device->total_bytes,
+ unit_mode), device->name);
+
+ }
+
+ return 0;
+
+}
+
+static void btrfs_delete_spare(char *path)
+{
+ printf("Unscan the device (or don't run device scan after reboot) and run wipefs to wipe SB\n");
+
+}
+
+static void btrfs_add_spare(char *dev)
+{
+ struct stat st;
+ int fd;
+ int i;
+ int ret;
+ u64 block_cnt;
+ u64 blocks[7];
+ u32 nodesz = max_t(u32, sysconf(_SC_PAGESIZE), BTRFS_MKFS_DEFAULT_NODE_SIZE);
+ struct btrfs_mkfs_config mkfs_cfg;
+
+ fd = open(dev, O_RDWR);
+ if (fd < 0) {
+ fprintf(stderr, "unable to open %s: %s\n", dev, strerror(errno));
+ return;
+ }
+
+ if (fstat(fd, &st)) {
+ fprintf(stderr, "unable to stat %s\n", dev);
+ goto out;
+ }
+ block_cnt = btrfs_device_size(fd, &st);
+ if (!block_cnt) {
+ fprintf(stderr, "unable to find %s size\n", dev);
+ goto out;
+ }
+
+ if (block_cnt < BTRFS_MKFS_SYSTEM_GROUP_SIZE) {
+ fprintf(stderr, "device is too small to make filesystem\n");
+ goto out;
+ }
+
+ blocks[0] = BTRFS_SUPER_INFO_OFFSET;
+ for (i = 1; i < 7; i++)
+ blocks[i] = BTRFS_SUPER_INFO_OFFSET + 1024 * 1024 + nodesz * i;
+
+ memset(&mkfs_cfg, 0, sizeof(mkfs_cfg));
+ memcpy(mkfs_cfg.blocks, blocks, sizeof(blocks));
+ mkfs_cfg.num_bytes = block_cnt;
+ mkfs_cfg.nodesize = nodesz;
+ mkfs_cfg.sectorsize = 4096;
+ mkfs_cfg.stripesize = 4096;
+ mkfs_cfg.features = BTRFS_FEATURE_INCOMPAT_SPARE_DEV;
+ ret = make_btrfs(fd, &mkfs_cfg);
+ if (ret)
+ fprintf(stderr, "error during mkfs: %s\n", strerror(-ret));
+
+out:
+ close(fd);
+}
+
+static const char * const spare_cmd_group_usage[] = {
+ "btrfs spare <command> [<args>]",
+ NULL
+};
+
+static const char * const cmd_spare_add_usage[] = {
+ "btrfs spare add <device> [<device>...]",
+ "Add global spare device(s) to btrfs",
+ "-K|--nodiscard do not perform whole device TRIM",
+ "-f|--force force overwrite existing filesystem on the disk",
+ NULL
+};
+
+static const char * const cmd_spare_delete_usage[] = {
+ "btrfs spare delete <device> [<device>...]",
+ "Delete global spare device(s) from btrfs",
+ NULL
+};
+
+static const char * const cmd_spare_list_usage[] = {
+ "btrfs spare list",
+ "List spare device(s) both scanned and unscanned(*) for kernel",
+ NULL
+};
+
+static int cmd_spare_add(int argc, char **argv)
+{
+ int i;
+ int force = 0;
+ int discard = 1;
+ int ret = 0;
+
+ while (1) {
+ int c;
+ static const struct option long_options[] = {
+ { "nodiscard", no_argument, NULL, 'K'},
+ { "force", no_argument, NULL, 'f'},
+ { NULL, 0, NULL, 0}
+ };
+
+ c = getopt_long(argc, argv, "Kf", long_options, NULL);
+ if (c < 0)
+ break;
+
+ switch (c) {
+ case 'K':
+ discard = 0;
+ break;
+ case 'f':
+ force = 1;
+ break;
+ default:
+ usage(cmd_spare_add_usage);
+ }
+ }
+
+ if (check_argc_min(argc - optind, 1))
+ usage(cmd_spare_add_usage);
+
+ for (i = optind; i < argc; i++) {
+ u64 dev_block_count = 0;
+ int devfd;
+ char *path;
+ int res;
+ int mixed;
+
+ if (test_dev_for_mkfs(argv[i], force)) {
+ ret++;
+ continue;
+ }
+
+ devfd = open(argv[i], O_RDWR);
+ if (devfd < 0) {
+ fprintf(stderr, "ERROR: Unable to open device '%s'\n", argv[i]);
+ ret++;
+ continue;
+ }
+
+ res = btrfs_prepare_device(devfd, argv[i], 1, &dev_block_count,
+ 0, &mixed, discard);
+ close(devfd);
+ if (res) {
+ ret++;
+ goto error_out;
+ }
+
+ path = canonicalize_path(argv[i]);
+ if (!path) {
+ fprintf(stderr,
+ "ERROR: Could not canonicalize pathname '%s': %s\n",
+ argv[i], strerror(errno));
+ ret++;
+ goto error_out;
+ }
+
+ btrfs_add_spare(path);
+ free(path);
+ }
+error_out:
+ btrfs_close_all_devices();
+ return !!ret;
+}
+
+static int cmd_spare_delete(int argc, char **argv)
+{
+ int i;
+ char *path;
+ int ret = 0;
+
+ if (check_argc_min(argc - optind, 1))
+ usage(cmd_spare_delete_usage);
+
+ for (i = optind; i < argc; i++) {
+ int devfd;
+
+ devfd = open(argv[i], O_RDWR);
+ if (devfd < 0) {
+ fprintf(stderr, "ERROR: Unable to open device '%s'\n", argv[i]);
+ ret++;
+ continue;
+ }
+ close(devfd);
+
+ path = canonicalize_path(argv[i]);
+ if (!path) {
+ fprintf(stderr,
+ "ERROR: Could not canonicalize pathname '%s': %s\n",
+ argv[i], strerror(errno));
+ ret++;
+ goto error_out;
+ }
+
+ btrfs_delete_spare(path);
+ free(path);
+ }
+
+error_out:
+ btrfs_close_all_devices();
+ return !!ret;
+}
+
+static int cmd_spare_list(int argc, char **argv)
+{
+ int ret;
+ unsigned unit_mode;
+
+ unit_mode = get_unit_mode_from_arg(&argc, argv, 0);
+
+ ret = print_spare_device(unit_mode);
+
+ return !!ret;
+}
+
+static const char spare_cmd_group_info[] =
+ "manage spare devices in the filesystem";
+
+const struct cmd_group spare_cmd_group = {
+ spare_cmd_group_usage, spare_cmd_group_info, {
+ { "add", cmd_spare_add, cmd_spare_add_usage, NULL, 0 },
+ { "delete", cmd_spare_delete, cmd_spare_delete_usage, NULL, 0},
+ { "list", cmd_spare_list, cmd_spare_list_usage, NULL, 0},
+ NULL_CMD_STRUCT
+ }
+};
+
+int cmd_spare(int argc, char **argv)
+{
+ return handle_command_group(&spare_cmd_group, argc, argv);
+}
diff --git a/commands.h b/commands.h
index d2bb093..6f68ef1 100644
--- a/commands.h
+++ b/commands.h
@@ -95,6 +95,7 @@ extern const struct cmd_group quota_cmd_group;
extern const struct cmd_group qgroup_cmd_group;
extern const struct cmd_group replace_cmd_group;
extern const struct cmd_group rescue_cmd_group;
+extern const struct cmd_group spare_cmd_group;
extern const char * const cmd_send_usage[];
extern const char * const cmd_receive_usage[];
@@ -119,6 +120,7 @@ int cmd_receive(int argc, char **argv);
int cmd_quota(int argc, char **argv);
int cmd_qgroup(int argc, char **argv);
int cmd_replace(int argc, char **argv);
+int cmd_spare(int argc, char **argv);
int cmd_restore(int argc, char **argv);
int cmd_select_super(int argc, char **argv);
int cmd_dump_super(int argc, char **argv);
--
2.4.1
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo
^ permalink raw reply related [flat|nested] 43+ messages in thread
* [PATCH 3/4] btrfs-progs: add fi show for spare
2015-11-09 10:58 ` [PATCH 0/4] btrfs-progs: Hot spare and Auto replace Anand Jain
2015-11-09 10:58 ` [PATCH 1/4] btrfs-progs: Introduce BTRFS_FEATURE_INCOMPAT_SPARE_DEV SB flags Anand Jain
2015-11-09 10:58 ` [PATCH 2/4] btrfs-progs: Introduce btrfs spare subcommand Anand Jain
@ 2015-11-09 10:58 ` Anand Jain
2015-11-09 10:58 ` [PATCH 4/4] btrfs-progs: add global spare device list to filesystem show Anand Jain
3 siblings, 0 replies; 43+ messages in thread
From: Anand Jain @ 2015-11-09 10:58 UTC (permalink / raw)
To: linux-btrfs
Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
cmds-filesystem.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/cmds-filesystem.c b/cmds-filesystem.c
index 4d3a9a4..11d0406 100644
--- a/cmds-filesystem.c
+++ b/cmds-filesystem.c
@@ -353,6 +353,9 @@ static void print_one_uuid(struct btrfs_fs_devices *fs_devices,
if (add_seen_fsid(fs_devices->fsid))
return;
+ if (fs_devices->spare)
+ return;
+
uuid_unparse(fs_devices->fsid, uuidbuf);
device = list_entry(fs_devices->devices.next, struct btrfs_device,
dev_list);
@@ -610,6 +613,7 @@ static int copy_fs_devices(struct btrfs_fs_devices *dst,
memcpy(dst->fsid, src->fsid, BTRFS_FSID_SIZE);
INIT_LIST_HEAD(&dst->devices);
dst->seed = NULL;
+ dst->spare = src->spare;
list_for_each_entry(cur_dev, &src->devices, dev_list) {
dev_copy = malloc(sizeof(*dev_copy));
--
2.4.1
^ permalink raw reply related [flat|nested] 43+ messages in thread
* [PATCH 4/4] btrfs-progs: add global spare device list to filesystem show
2015-11-09 10:58 ` [PATCH 0/4] btrfs-progs: Hot spare and Auto replace Anand Jain
` (2 preceding siblings ...)
2015-11-09 10:58 ` [PATCH 3/4] btrfs-progs: add fi show for spare Anand Jain
@ 2015-11-09 10:58 ` Anand Jain
3 siblings, 0 replies; 43+ messages in thread
From: Anand Jain @ 2015-11-09 10:58 UTC (permalink / raw)
To: linux-btrfs
This patch adds the list of spare devices to the filesystem show
output, as shown in the example below.
btrfs fi show
Label: none uuid: 17f7d403-17d7-4f0a-b8ba-de673fdd3f56
Total devices 2 FS bytes used 15.88MiB
devid 1 size 2.00GiB used 417.50MiB path /dev/sdc
devid 2 size 2.00GiB used 417.50MiB path /dev/sdd
Global spare
device size 3.00GiB path /dev/sde
btrfs-progs v4.2.3-12-gb5f4b68
Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
cmds-filesystem.c | 5 +++++
utils.h | 1 +
2 files changed, 6 insertions(+)
diff --git a/cmds-filesystem.c b/cmds-filesystem.c
index 11d0406..651ffe4 100644
--- a/cmds-filesystem.c
+++ b/cmds-filesystem.c
@@ -920,6 +920,11 @@ devs_only:
struct btrfs_fs_devices, list);
free_fs_devices(fs_devices);
}
+
+ if (where == -1 && search == NULL) {
+ ret = print_spare_device(unit_mode);
+ printf("\n");
+ }
out:
printf("%s\n", PACKAGE_STRING);
free_seen_fsid();
diff --git a/utils.h b/utils.h
index a84cf2d..b833390 100644
--- a/utils.h
+++ b/utils.h
@@ -271,5 +271,6 @@ const char *get_argv0_buf(void);
unsigned int get_unit_mode_from_arg(int *argc, char *argv[], int df_mode);
int is_numerical(const char *str);
+int print_spare_device(unsigned unit_mode);
#endif
--
2.4.1
^ permalink raw reply related [flat|nested] 43+ messages in thread
* Re: [PATCH 00/15] btrfs: Hot spare and Auto replace
2015-11-09 10:56 [PATCH 00/15] btrfs: Hot spare and Auto replace Anand Jain
` (15 preceding siblings ...)
2015-11-09 10:58 ` [PATCH 0/4] btrfs-progs: Hot spare and Auto replace Anand Jain
@ 2015-11-09 14:09 ` Austin S Hemmelgarn
2015-11-09 21:29 ` Duncan
2015-11-12 2:15 ` Qu Wenruo
` (2 subsequent siblings)
19 siblings, 1 reply; 43+ messages in thread
From: Austin S Hemmelgarn @ 2015-11-09 14:09 UTC (permalink / raw)
To: Anand Jain, linux-btrfs
On 2015-11-09 05:56, Anand Jain wrote:
> These set of patches provides btrfs hot spare and auto replace support
> for you review and comments.
It's absolutely awesome to see that someone picked up this project; it's
something that's very useful and helps BTRFS to compete with many
established storage technologies. I've got some specific questions below.
>
> First, here below are the simple example steps to configure the same:
>
> Add a spare device:
> btrfs spare add /dev/sde -f
>
> OR if there is a spare device which is already added before the, just
> run
>
> btrfs dev scan [/dev/sde]
>
> this will register the spare device to the kernel.
>
> btrfs fi show
> Label: none uuid: 52f170c1-725c-457d-8cfd-d57090460091
> Total devices 2 FS bytes used 112.00KiB
> devid 1 size 2.00GiB used 417.50MiB path /dev/sdc
> devid 2 size 2.00GiB used 417.50MiB path /dev/sdd
>
> Global spare
> device size 3.00GiB path /dev/sde
Would I be correct in assuming that we can have more than one hot-spare
device at a time? If so, what method is used to select which one to use
when one is needed?
>
> Thats it.
>
> Auto replace:
> Replace happens automatically, that is when there is any write
> failed or flush failed, the device will be marked as failed, which
> will stop any further IO attempt to that device. And in the next commit
> thread cycle the auto replace will pick the spare device (/dev/sde is
> above example) to replace the failed device. And so the btrfs volume is
> back to a healthy state.
Is there any possibility we could add a knob to control how many errors
are needed before the device is marked as failed? For an enterprise
environment, immediately marking the device failed is the right thing to
do, but for home usage it may make more sense to retry the I/O at least
once before marking the device failed (especially considering that most
home users don't have ECC memory, and a transient memory error can cause
an I/O request to fail (I've actually had this happen on my laptop before)).
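For illustration only, such a knob could look like the sketch below. It assumes a hypothetical per-device counter (`struct spare_dev_state`) and a hypothetical tunable `fail_threshold`; neither exists in the posted patches. A threshold of 1 reproduces the fail-on-first-error behaviour described in the cover letter, larger values tolerate transient errors, and any successful I/O resets the count.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-device state; not the btrfs_device layout. */
struct spare_dev_state {
	unsigned int consecutive_errors; /* write/flush failures in a row */
	bool failed;                     /* set once the threshold is hit */
};

/*
 * Record the result of one write or flush. With fail_threshold == 1
 * the first error marks the device failed; larger values tolerate
 * that many consecutive errors minus one before giving up.
 */
static void record_io_result(struct spare_dev_state *dev, bool io_ok,
			     unsigned int fail_threshold)
{
	assert(fail_threshold >= 1);
	if (dev->failed)
		return; /* already failed: no further accounting */
	if (io_ok) {
		dev->consecutive_errors = 0;
		return;
	}
	dev->consecutive_errors++;
	if (dev->consecutive_errors >= fail_threshold)
		dev->failed = true;
}
```

Whether the count should reset on success, or decay over time instead, is a policy choice this sketch does not settle.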
>
>
> Its btrfs Global spare:
> as of now only global hot spare is supported, that is hot spare(s)
> are for all the btrfs FS in the system.
How hard would it be to eventually extend this to per-filesystem hot-spares?
>
> No spare when device failed:
> It would scan for spare device at the rate of transaction commit
> and will trigger the auto replace when ever spare device is added.
Does this absolutely have to be polled every commit? This has serious
potential to make running on a degraded array have a much bigger impact
than it does now. While we obviously want people to notice that their
array is degraded, killing performance is not the proper way to do that.
Couldn't we have a callback when adding a hot-spare that would check
for failed devices and initiate the replacement automatically for the
first one found? Ideally, we should keep the current behavior (assume
the error was transient, and retry the I/O) when there is no hot-spare
available.
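A rough sketch of the event-driven alternative, using a hypothetical in-memory model (`struct model_fs`, `struct model_dev`, `start_replace()`) rather than real btrfs structures: the scan for failed devices runs once when a spare is added instead of on every transaction commit, and the spare is consumed only if a failed device is waiting for it. A real implementation would also need the mirror-image hook (a device failing while spares are available) and would have to walk every mounted filesystem, since the spares are global.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of one filesystem's devices. */
struct model_dev {
	int devid;
	bool failed;
	int replaced_by; /* spare id, or -1 while not replaced */
};

struct model_fs {
	struct model_dev *devs;
	size_t ndevs;
};

/* Stand-in for starting a replace: just record which spare is used. */
static void start_replace(struct model_dev *dev, int spare_id)
{
	dev->replaced_by = spare_id;
}

/*
 * Called once when a spare is added. Hands the spare to the first
 * failed, not-yet-replaced device. Returns true if the spare was
 * consumed, false if it should stay in the pool.
 */
static bool on_spare_added(struct model_fs *fs, int spare_id)
{
	assert(fs != NULL);
	for (size_t i = 0; i < fs->ndevs; i++) {
		struct model_dev *dev = &fs->devs[i];

		if (dev->failed && dev->replaced_by < 0) {
			start_replace(dev, spare_id);
			return true;
		}
	}
	return false;
}
```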
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH 00/15] btrfs: Hot spare and Auto replace
2015-11-09 14:09 ` [PATCH 00/15] btrfs: Hot spare and Auto replace Austin S Hemmelgarn
@ 2015-11-09 21:29 ` Duncan
2015-11-10 12:13 ` Austin S Hemmelgarn
0 siblings, 1 reply; 43+ messages in thread
From: Duncan @ 2015-11-09 21:29 UTC (permalink / raw)
To: linux-btrfs
Austin S Hemmelgarn posted on Mon, 09 Nov 2015 09:09:07 -0500 as
excerpted:
>> btrfs fi show
>> Label: none uuid: 52f170c1-725c-457d-8cfd-d57090460091
>> Total devices 2 FS bytes used 112.00KiB
>> devid 1 size 2.00GiB used 417.50MiB path /dev/sdc
>> devid 2 size 2.00GiB used 417.50MiB path /dev/sdd
>>
>> Global spare
>> device size 3.00GiB path /dev/sde
First of all, thanks from me too, AJ, for this very nice new feature. =:^)
> Would I be correct in assuming that we can have more than one hot-spare
> device at a time? If so, what method is used to select which one to use
> when one is needed?
In the later patches overview section, patches 10,11,12,13/15 paragraph,
AJ mentions a helper function to pick/release a spare device from/to the
spare devices pool. That would appear to be patch 13, provide framework
to get and put a spare device.
Which means yes, multiple hot-spares are (at least planned to be)
allowed. =:^)
I'm not a coder and could very well be misinterpreting this, but
reading the btrfs_get_spare_device function in patch 13, there's
a comment that goes like this:
>> /* as of now there is only one device in the spare fs_devices */
I don't read C well enough to know whether that's a comment on the
internal progress in the function (tho I don't see any obvious hints to
indicate that), or whether it can be taken at face value, that right now
there's only provision for one in the "pool" (seems the more obvious
interpretation).
So unless my lack of C skills is deceiving me, while a pool is intended,
current patch implementation status simply assumes a spare pool of one,
and the first spare found is picked. The put function in the same patch
doesn't appear to have a limit on the number of spares that can be added,
so assuming the current pool implementation allows it, more than one
spare can be added to the pool, but as I said, the get function appears
to assume just one in the pool, so picks the first spare it finds.
Which is quite reasonable for a first patch series posting that may well
require additional iterations, particularly so given the get helper
function is already nicely modularized so adding more complex picker
logic should be relatively simple.
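To make the described behaviour concrete, here is a userspace model of a first-found pool. It is not the code from patch 13: `struct spare`, `spare_get()` and `spare_put()` are hypothetical names, and the pool is just a singly linked list. Because `spare_put()` prepends, the most recently returned spare is the first one found.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical spare descriptor; not struct btrfs_device. */
struct spare {
	const char *path;
	unsigned long long bytes;
	struct spare *next;
};

struct spare_pool {
	struct spare *head;
};

/* Return a spare to the pool; there is no limit on the pool size. */
static void spare_put(struct spare_pool *pool, struct spare *s)
{
	assert(pool != NULL && s != NULL);
	s->next = pool->head;
	pool->head = s;
}

/* First-found policy: detach whichever spare is first, ignoring size. */
static struct spare *spare_get(struct spare_pool *pool)
{
	struct spare *s = pool->head;

	if (s) {
		pool->head = s->next;
		s->next = NULL;
	}
	return s;
}
```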
Not that targeting particular use-cases is appropriate at this point, but
simply for information purposes, my particular use-case is a bunch of
different size independent raid1 btrfs on partitions, but with the
devices composing each raid1 of identical size. I think a reasonably
simple picker logic optimization would be to first check if there's a
spare matching the size of the failing device, and use it in preference
to others of different sizes if so.
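A sketch of that size-preferring picker, with hypothetical names (`struct spare_slot`, `pick_spare()`): an exact size match wins, otherwise the smallest free spare that is still large enough is used. The fallback rule is an assumption added here; it reflects that a btrfs replace target may not be smaller than the device it replaces.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical spare entry: capacity in bytes plus an in-use flag. */
struct spare_slot {
	unsigned long long bytes;
	int in_use;
};

/*
 * Pick a spare for a failed device of failed_bytes:
 *  1. prefer a free spare whose size matches exactly;
 *  2. otherwise take the smallest free spare that is still large enough.
 * Returns the index of the chosen spare, or -1 if none fits.
 */
static int pick_spare(const struct spare_slot *spares, size_t n,
		      unsigned long long failed_bytes)
{
	int best = -1;

	assert(spares != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (spares[i].in_use || spares[i].bytes < failed_bytes)
			continue;
		if (spares[i].bytes == failed_bytes)
			return (int)i; /* exact match wins immediately */
		if (best < 0 || spares[i].bytes < spares[best].bytes)
			best = (int)i;
	}
	return best;
}
```

With partitioned spares like the ones described here, each failing partition would then be paired with the identically sized spare partition whenever one is free.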
Given my partitioned usage, a failing physical device will trigger a
whole slew of failing btrfs logical devices (partitions on that physical
device), so in order for this feature to be of much use to me I'd have
to maintain a whole series of spares, one for each btrfs logical device
on a partition on the failing physical device, since they'd all fail at
once.
Since those partitions and the btrfs on top of them are different sizes,
a size-matching logic lets me partition the physical spare identically to
the operational devices and simply add all the partitions to the spare
list, while without size-matching logic, to ensure a large enough spare
was picked for the largest btrfs, I'd have to make all the spares that
size, and they'd no longer all fit on a single physical device of the
same size as the originals, possibly not even on two physical devices
that size.
At least at the non-enterprise level, size-similar picking logic would
seem to be pretty useful if not feature critical, then, and given that
it /should/ be reasonably simple to implement, I'd hope that doing so
becomes a priority, tho certainly an initial first-pick base
implementation to which size-similar logic can be added later, is fine as
well. I'd just hope that "later" is within a couple kernel cycles, not a
couple kernel major version cycles (~4 years each with bumps at .20).
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH 00/15] btrfs: Hot spare and Auto replace
2015-11-09 21:29 ` Duncan
@ 2015-11-10 12:13 ` Austin S Hemmelgarn
2015-11-13 10:17 ` Anand Jain
0 siblings, 1 reply; 43+ messages in thread
From: Austin S Hemmelgarn @ 2015-11-10 12:13 UTC (permalink / raw)
To: Duncan, linux-btrfs
On 2015-11-09 16:29, Duncan wrote:
> Austin S Hemmelgarn posted on Mon, 09 Nov 2015 09:09:07 -0500 as
> excerpted:
>
>>> btrfs fi show
>>> Label: none uuid: 52f170c1-725c-457d-8cfd-d57090460091
>>> Total devices 2 FS bytes used 112.00KiB
>>> devid 1 size 2.00GiB used 417.50MiB path /dev/sdc
>>> devid 2 size 2.00GiB used 417.50MiB path /dev/sdd
>>>
>>> Global spare
>>> device size 3.00GiB path /dev/sde
>
> First of all, thanks from me too, AJ, for this very nice new feature. =:^)
>
>> Would I be correct in assuming that we can have more than one hot-spare
>> device at a time? If so, what method is used to select which one to use
>> when one is needed?
>
> In the later patches overview section, patches 10,11,12,13/15 paragraph,
> AJ mentions a helper function to pick/release a spare device from/to the
> spare devices pool. That would appear to be patch 13, provide framework
> to get and put a spare device.
>
> Which means yes, multiple hot-spares are (at least planned to be)
> allowed. =:^)
Ah, you're right, somehow I missed that bit.
>
> While I'm not a coder and could very well be misinterpreting this,
> however, reading the btrfs_get_spare_device function in patch 13, there's
> a comment that goes like this:
>
>>> /* as of now there is only one device in the spare fs_devices */
>
> I don't read C well enough to know whether that's a comment on the
> internal progress in the function (tho I don't see any obvious hints to
> indicate that), or whether it can be taken at face value, that right now
> there's only provision for one in the "pool" (seems the more obvious
> interpretation).
>
> So unless my lack of C skills is deceiving me, while a pool is intended,
> current patch implementation status simply assumes a spare pool of one,
> and the first spare found is picked. The put function in the same patch
> doesn't appear to have a limit on the number of spares that can be added,
> so assuming the current pool implementation allows it, more than one
> spare can be added to the pool, but as I said, the get function appears
> to assume just one in the pool, so picks the first spare it finds.
AFAICT, you are correct. I hadn't yet gotten a chance to look at the
actual code, so I hadn't seen this yet.
>
> At least at the non-enterprise level, size-similar picking logic would
> seem to be pretty useful if not feature critical, then, and given that
> it /should/ be reasonably simple to implement, I'd hope that doing so
> becomes a priority, tho certainly an initial first-pick base
> implementation to which size-similar logic can be added later, is fine as
> well. I'd just hope that "later" is within a couple kernel cycles, not a
> couple kernel major version cycles (~4 years each with bumps at .20).
>
Hopefully, per-filesystem hot-spares will be a high priority too, as
that type of usage is pretty much required for many enterprise type
uses, although that doesn't need to be the same code doing it (in fact,
I could see having per-fs spares and global spares both available
potentially being very useful).
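One hypothetical way to have both: consult the filesystem's own spares first and fall back to the global pool only when none are left. The types and functions below (`struct spare_list`, `take_spare()`, `take_spare_for_fs()`) are illustrative only; the posted series implements the global pool alone.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fixed-size spare list (spare ids only, for brevity). */
struct spare_list {
	int ids[8];
	size_t count;
};

/* Remove and return the last spare id from a list, or -1 if empty. */
static int take_spare(struct spare_list *list)
{
	if (list->count == 0)
		return -1;
	return list->ids[--list->count];
}

/*
 * Prefer a spare reserved for this filesystem (fs_spares may be NULL
 * if it has none); otherwise fall back to the shared global pool.
 */
static int take_spare_for_fs(struct spare_list *fs_spares,
			     struct spare_list *global)
{
	int id = -1;

	assert(global != NULL);
	if (fs_spares)
		id = take_spare(fs_spares);
	if (id < 0)
		id = take_spare(global);
	return id;
}
```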
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH 00/15] btrfs: Hot spare and Auto replace
2015-11-09 10:56 [PATCH 00/15] btrfs: Hot spare and Auto replace Anand Jain
` (16 preceding siblings ...)
2015-11-09 14:09 ` [PATCH 00/15] btrfs: Hot spare and Auto replace Austin S Hemmelgarn
@ 2015-11-12 2:15 ` Qu Wenruo
2015-11-12 6:46 ` Duncan
` (3 more replies)
2015-11-12 19:21 ` Goffredo Baroncelli
2015-11-16 13:41 ` Austin S Hemmelgarn
19 siblings, 4 replies; 43+ messages in thread
From: Qu Wenruo @ 2015-11-12 2:15 UTC (permalink / raw)
To: Anand Jain, linux-btrfs
Hi Anand,
Nice work.
But I have some small questions about it.
Anand Jain wrote on 2015/11/09 18:56 +0800:
> These set of patches provides btrfs hot spare and auto replace support
> for you review and comments.
>
> First, here below are the simple example steps to configure the same:
>
> Add a spare device:
> btrfs spare add /dev/sde -f
I'm sorry but I didn't quite see the benefit of a spare device.
Let's take the following example:
1) 2 RAID1 + 1 spare
(A + B) + C
2) 3 RAID1
(A + B + C)
Let's assume the disks are all 12G in size, and there are 3 raid1 chunks,
each 3G in size.
As I understand it, in the normal operation case:
For case 1), all raid1 chunks should be allocated only on the 2 RAID disks,
and the spare should contain no raid1 chunks.
  A       B       C
------  ------  ------
|free|  |free|  |free|
------  ------  |    |
|3Ga1|  |3Ga2|  |    |
------  ------  |    |
|3Gb1|  |3Gb2|  |    |
------  ------  |    |
|3Gc1|  |3Gc2|  |    |
------  ------  ------
For case 2), the raid1 chunks will be allocated across all 3 disks, making
the allocation fairer.
  A       B       C
------  ------  ------
|free|  |free|  |free|
------  ------  ------
|free|  |free|  |free|
------  ------  ------
|3Gb2|  |3Ga1|  |3Ga2|
------  ------  ------
|3Gc1|  |3Gc2|  |3Gb1|
------  ------  ------
At least in the normal operation case, case 1) leaves device C unused and
reduces the total usable space.
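The capacity difference is simple arithmetic: every RAID1 chunk needs copies on two different devices, so usable space is half the raw total, capped by how much the other devices can mirror of the largest one. A sketch of that estimate, ignoring metadata and allocation granularity (`raid1_usable()` is a hypothetical helper): two 12G disks give 12G, three give 18G.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Approximate RAID1 usable capacity: half of the total, but never more
 * than the space on the other devices available to mirror the largest.
 */
static unsigned long long raid1_usable(const unsigned long long *sizes,
				       size_t n)
{
	unsigned long long total = 0, largest = 0;

	assert(sizes != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		total += sizes[i];
		if (sizes[i] > largest)
			largest = sizes[i];
	}
	if (total - largest < total / 2)
		return total - largest;
	return total / 2;
}
```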
In the disk B failure case:
For case 1), we can auto replace B with C.
That copies all data chunks from A to C, 9G of data in total.
After the replace:
  A       B       C
------  ------  ------
|free|  | X  |  |free|
------  ------  ------
|3Ga1|  | X  |->|3Ga2|
------  ------  ------
|3Gb1|  | X  |->|3Gb2|
------  ------  ------
|3Gc1|  | X  |->|3Gc2|
------  ------  ------
For case 2), we can just relocate and recover the lost chunks from B.
That should only need to copy 6G of data.
After the "recovery", the layout should look much the same as case 1):
  A       B       C
------  ------  ------
|free|  | X  |  |free|
------  ------  ------
|3Ga1|<\| X  |/>|3Gc2|
------  ------  ------
|3Gb2| || X  |/ |3Ga2|
------  ------  ------
|3Gc1| \| X  |  |3Gb1|
------  ------  ------
IIRC, the only benefit of a spare device is that we can ensure there is
enough space to replace a device (if the failing one is no larger than
the spare).
But the cost is more data to copy during the replace, plus unfair chunk
allocation.
So I am not sure the benefit is worth the cost in this case.
At the very least, enhancing chunk relocation to handle case 2) would need
a much smaller code base.
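The 9G versus 6G figures follow from counting the chunks that had a copy on the failed device. A minimal sketch, using a hypothetical `struct mirrored_chunk` that records each chunk's size and the two devices holding its copies:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical RAID1 chunk: size and the two devices holding copies. */
struct mirrored_chunk {
	unsigned size_gb;
	int dev[2];
};

/*
 * Data that must be re-mirrored after failed_dev is lost: the size of
 * every chunk that had one of its two copies on that device.
 */
static unsigned rebuild_gb(const struct mirrored_chunk *chunks, size_t n,
			   int failed_dev)
{
	unsigned total = 0;

	assert(chunks != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (chunks[i].dev[0] == failed_dev ||
		    chunks[i].dev[1] == failed_dev)
			total += chunks[i].size_gb;
	return total;
}
```

Both strategies rebuild exactly the chunks that lived on B; case 2) copies less only because spreading the same three chunks over three disks left just two of them on B.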
Thanks,
Qu
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH 00/15] btrfs: Hot spare and Auto replace
2015-11-12 2:15 ` Qu Wenruo
@ 2015-11-12 6:46 ` Duncan
2015-11-12 13:04 ` Austin S Hemmelgarn
` (2 subsequent siblings)
3 siblings, 0 replies; 43+ messages in thread
From: Duncan @ 2015-11-12 6:46 UTC (permalink / raw)
To: linux-btrfs
Qu Wenruo posted on Thu, 12 Nov 2015 10:15:09 +0800 as excerpted:
> Anand Jain wrote on 2015/11/09 18:56 +0800:
>> These set of patches provides btrfs hot spare and auto replace support
>> for you review and comments.
>>
>> First, here below are the simple example steps to configure the same:
>>
>> Add a spare device:
>> btrfs spare add /dev/sde -f
>
> I'm sorry but I didn't quite see the benefit of a spare device.
You could ask the mdraid folks much the same question about spares there,
and the answer would I think be very much the same... I'll just present
a couple points of the several that can be made.
Perhaps the biggest point for this particular case...
What you're forgetting is that the work here introduces the _global_
spare -- one spare device (or pool of devices) for the whole set of
btrfs, no matter how many independent btrfs there happen to be on a
machine.
Your example used just one filesystem, in which case this point is null
and void, but what of the case where there's two? You can't have the
same device be part of *both* filesystems. What if the device is part of
btrfs A, but btrfs B is the one that loses a device? In your example,
you're out of luck. But as a global spare, the "extra" device doesn't
become attached to a specific btrfs until one of the existing devices
goes bad. With working global spares, the first btrfs to have a bad
device will see the spare and be able to grab it, no matter which of the
two (or 10 or 100) separate btrfs it happens to be, as it's a _global_
spare, not actually attached to a specific btrfs until it is needed as a
replacement.
By extension, there's the spare _pool_. Suppose you have three separate
btrfs and three separate "extra" devices. You can attach one to each
btrfs and be fine... if the existing devices all play nice and a second
one doesn't go out on any of them until all three have had one device go
out. But what happens if one btrfs gets some real heavy unexpected use
and loses three devices before the other two btrfs lose any? With global
spares, the unlucky btrfs can call for one at a time, and assuming
there's time for it to fully integrate before the next one dies, it can
call for the next and the next, and get all three, one at a time, without
the admin having to worry about manually device deleting the second and
third devices from their other btrfs, to attach to the unlucky/greedy one.
And that three btrfs, three-device global-spare-pool scenario, with an
unlucky/greedy btrfs ending up getting all three spares, brings up a
second point...
In that scenario without global hot-spares, say you've added one more
device than it needs to what ends up being the unlucky btrfs, so with
auto-repair it can detect a failing device and automatically device-delete
it down to its device-minimum (either due to raid level or due to
capacity). Now another device fails. Oops! Can't auto-repair now!
But in the global hot-spare-pool scenario, with one repair done, there are
still two spares in the pool, so at the second device failure, it can
automatically pull a second from the pool (where, given a pool, the spares
can wait instead of already being attached to one of the other btrfs) and
complete the second repair, still without admin intervention. Same again
for the third.
So an admin who doesn't want to have to intervene when he's supposedly on
vacation can setup a queue of spares, and sure, if he's a good admin,
when a device goes bad and a spare is automatically pulled in to replace
it, he'll be notified, and he'll probably login to check logs and see
what happened, but no problem, there's still others in the queue.
In fact, since the common folk wisdom says this sort of bad event
(someone you know getting a disease like cancer or dying, devices in your
machines going bad, friends having their significant others leave them...
at least here in the US, folk wisdom says it always happens in threes, so
particularly once two happen, people start wondering who/where the third
one is going to occur) happens in threes, a somewhat superstitious admin
could ensure he had four, well, he's cautious too, so make it five,
global spares setup, just in case. Then it wouldn't matter if the three
devices going bad were all on the same btrfs, or one each on the three,
or two on one and a third elsewhere, he'd still have two additional
devices in the pool, just to cover his a** if the three /did/ go out.
Now, by the time he loses a fourth, he'd better be on the phone confirming
a ticket home, but even then he still has one left in the pool, since he
was cautious, too, hopefully giving him time to actually /make/ it home
before two more go out, leaving the pool empty and a btrfs a device down.
And if he's /that/ unlucky, well, maybe he'd better make a call to his
lawyer confirming his last will and testament before he steps on that
plane, too. =:^(
Just a short mention of a third point, too.
Devices in the pool will presumably be idle and thus spun down, so they
aren't wearing out the way they would if they were in use the whole time
they sit in the spare pool.
Those are the biggest and most obvious ones I know of. Talk to any good
admin who has handled lots of raid and I'm sure they'll provide a few
more.
FWIW, there's also a case to be made for spare pools that may not be
global, but that can still be attached to more than one btrfs/raid, if
desired. Consider the case of two pools, one with fast but small ssds
while the other has slow but large spinning rust, with the ability to map
individual btrfs to one or the other pool, or to neither, for instance.
But this patch series simply introduces the global pool and its basic
functionality, leaving such fancy additions for later.
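The two-pool idea could be modelled as a per-filesystem pool assignment plus a pick function. This is a hypothetical sketch (`struct spare`, `enum pool_id` and `pick_spare()` are invented here); the posted patches implement only the single global pool. It also assumes a spare must be at least as large as the device it replaces, and picks the smallest one that fits so that large spares stay free for large failures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum pool_id { POOL_NONE = 0, POOL_SSD, POOL_HDD };

struct spare {
    const char *path;
    enum pool_id pool;
    uint64_t size_bytes;
    int in_use;
};

/* Claim the smallest unused spare in the filesystem's pool that is at least
 * as large as the failed device. Returns NULL if nothing fits or if the
 * filesystem is mapped to no pool at all. */
static struct spare *pick_spare(struct spare *spares, size_t n,
                                enum pool_id fs_pool, uint64_t failed_size)
{
    struct spare *best = NULL;

    if (fs_pool == POOL_NONE)
        return NULL;
    for (size_t i = 0; i < n; i++) {
        struct spare *s = &spares[i];
        if (s->in_use || s->pool != fs_pool || s->size_bytes < failed_size)
            continue;
        if (best == NULL || s->size_bytes < best->size_bytes)
            best = s;
    }
    if (best != NULL)
        best->in_use = 1;
    return best;
}
```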
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
* Re: [PATCH 00/15] btrfs: Hot spare and Auto replace
2015-11-12 2:15 ` Qu Wenruo
2015-11-12 6:46 ` Duncan
@ 2015-11-12 13:04 ` Austin S Hemmelgarn
2015-11-13 1:07 ` Qu Wenruo
2015-11-12 19:08 ` Goffredo Baroncelli
2015-11-13 10:18 ` Anand Jain
3 siblings, 1 reply; 43+ messages in thread
From: Austin S Hemmelgarn @ 2015-11-12 13:04 UTC (permalink / raw)
To: Qu Wenruo, Anand Jain, linux-btrfs
On 2015-11-11 21:15, Qu Wenruo wrote:
> Hi Anand,
>
> Nice work.
> But I have some small questions about it.
>
> Anand Jain wrote on 2015/11/09 18:56 +0800:
>> These set of patches provides btrfs hot spare and auto replace support
>> for you review and comments.
>>
>> First, here below are the simple example steps to configure the same:
>>
>> Add a spare device:
>> btrfs spare add /dev/sde -f
>
> I'm sorry but I didn't quite see the benefit of a spare device.
Aside from what Duncan said (and I happen to agree with him), there is
also the fact that hot-spares are (at least traditionally in most RAID
systems) usually used with RAID5 or RAID6 (or some other parity scheme).
So, to summarize:
1. Hot spares are more useful for most users in a global context, and in
that case only if they have more than one filesystem.
2. A pool of hot spares is even more useful.
3. Assuming whole-disk usage (as opposed to partitioning), the hot spare
will have no load on it until it gets used, at which point it will
almost always be in better physical condition than the device it
replaced (which is important for HA systems; in such cases you replace
the disk that failed and make the new disk the hot spare).
4. Hot spares are more often used (at least from what I've seen) on
parity-based raid systems than on raid1.
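The HA routine in point 3 (a failure consumes the oldest spare, and the disk bought to replace the failed one goes back in as the new spare) behaves like a FIFO. A minimal sketch with hypothetical names; the series' own "framework to get and put a spare device" is not shown in this thread and may work differently.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_CAP 8

struct spare_pool {
    const char *dev[POOL_CAP]; /* circular buffer of spare device names */
    int head;                  /* index of the oldest spare */
    int count;                 /* spares currently queued */
};

/* Take the oldest spare out of the pool; NULL when the pool is empty. */
static const char *spare_get(struct spare_pool *p)
{
    const char *d;

    if (p->count == 0)
        return NULL;
    d = p->dev[p->head];
    p->head = (p->head + 1) % POOL_CAP;
    p->count--;
    return d;
}

/* Queue a device (e.g. the newly installed replacement disk) as a spare.
 * Returns 0 on success, -1 if the pool is full. */
static int spare_put(struct spare_pool *p, const char *dev)
{
    if (p->count == POOL_CAP)
        return -1;
    p->dev[(p->head + p->count) % POOL_CAP] = dev;
    p->count++;
    return 0;
}
```

For example, with `sde` and `sdf` queued, a failure takes `sde`; putting the new disk `sdg` back leaves `sdf, sdg` queued.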
In the rather limited case you outlined, I would probably just use raid1
across all three devices myself (unless they were whole disks and not
individual partitions, in which case I'd use a hot spare), but looking
beyond that at my actual usage of BTRFS (multiple filesystems with
multiple different raid profiles, spread across various disks), hot
spares are definitely useful (although they would be more useful if I
could specify that a given hot spare be used only for a given set of
filesystems).
* Re: [PATCH 00/15] btrfs: Hot spare and Auto replace
2015-11-12 2:15 ` Qu Wenruo
2015-11-12 6:46 ` Duncan
2015-11-12 13:04 ` Austin S Hemmelgarn
@ 2015-11-12 19:08 ` Goffredo Baroncelli
2015-11-13 10:18 ` Anand Jain
3 siblings, 0 replies; 43+ messages in thread
From: Goffredo Baroncelli @ 2015-11-12 19:08 UTC (permalink / raw)
To: Qu Wenruo, Anand Jain, linux-btrfs
On 2015-11-12 03:15, Qu Wenruo wrote:
> Hi Anand,
>
> Nice work.
> But I have some small questions about it.
>
> Anand Jain wrote on 2015/11/09 18:56 +0800:
>> These set of patches provides btrfs hot spare and auto replace support
>> for you review and comments.
>>
>> First, here below are the simple example steps to configure the same:
>>
>> Add a spare device:
>> btrfs spare add /dev/sde -f
>
> I'm sorry but I didn't quite see the benefit of a spare device.
>
> Let's take the following example:
>
> 1) 2 RAID1 + 1 spare
> (A + B) + C
>
> 2) 3 RAID1
> (A + B + C)
> Let's assume they are all 12G size, and there are 3 raid1 chunks.
> Each one is 3G size.
>
> In my understanding, in normal operation case:
>
> For case 1), all raid chunks should only be allocated into 2 RAID disks,
> and spare one should contains no raid1 chunks.
>
> A B C
> ------ ------ ------
> |free| |free| |free|
> ------ ------ | |
> |3Ga1| |3Ga2| | |
> ------ ------ | |
> |3Gb1| |3Gb2| | |
> ------ ------ | |
> |3Gc1| |3Gc2| | |
> ------ ------ ------
>
>
> For case 2), all raid1 chunks will be allocated into all 3 disks, making the allocation more fair.
> A B C
> ------ ------ ------
> |free| |free| |free|
> ------ ------ ------
> |free| |free| |free|
> ------ ------ ------
> |3Gb2| |3Ga1| |3Ga2|
> ------ ------ ------
> |3Gc1| |3Gc2| |3Gb1|
> ------ ------ ------
>
>
> At least in normal operation case, case 1) makes device C useless, and reduce the total usable space.
>
> In disk B failure case:
>
> For case 1), we can auto replace B with C.
> And it will copy all data chunks from A to C.
> Need to copy 9G data.
>
> And after replace:
> A B C
> ------ ------ ------
> |free| | X | |free|
> ------ ------ ------
> |3Ga1| | X |->|3Ga2|
> ------ ------ ------
> |3Gb1| | X |->|3Gb2|
> ------ ------ ------
> |3Gc1| | X |->|3Gc2|
> ------ ------ ------
>
>
>
> For case 2), we can just relocate and recover the bad chunks in B.
> It it should only need to copy 6G data.
>
> And after the "recovery", it should be much the same as case 1):
> A B C
> ------ ------ ------
> |free| | X | |free|
> ------ ------ ------
> |3Ga1|<\| X |/>|3Gc1|
> ------ ------ ------
> |3Gb2| || X |/ |3Ga2|
> ------ ------ ------
> |3Gc1| \| X | |3Gb1|
> ------ ------ ------
>
>
> IIRC, the only benefit of a spare device is, we can ensure there is enough space for a device place.(If the failing one is no larger than spare).
>
> But the cost is, increase in replace data copy and unfair chunk allocation.
>
> So I am not sure if the cost is good enough for the case.
> At least, enhancing the chunk relocation to fulfill the case 2) will bring a much smaller code base.
>
> Thanks,
> Qu
Interesting analysis. Another difference between the two scenarios is that in the first case (A+B+spare) the spare doesn't work until it is needed: less power consumption, and when it is needed you are using a new disk instead of a used one.
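Qu's 9G vs 6G figures follow from counting how many RAID1 chunk copies sat on the failed device. The sketch below does that count for the two layouts in his diagrams (transcribed from them, with his 3G chunk size). It only models the immediate repair; it does not model the later step of physically replacing B and restoring the original layout.

```c
#include <assert.h>

enum { DEV_A, DEV_B, DEV_C, NDEV };
#define CHUNK_GIB 3 /* chunk size used in Qu's example */

/* A RAID1 chunk has two copies; record which device holds each copy. */
struct raid1_chunk {
    int copy[2];
};

/* GiB to rewrite when device `failed` is lost: one chunk's worth for every
 * chunk with a copy on it (the surviving mirror is read and re-written). */
static int gib_to_copy(const struct raid1_chunk *chunks, int n, int failed)
{
    int gib = 0;

    for (int i = 0; i < n; i++)
        if (chunks[i].copy[0] == failed || chunks[i].copy[1] == failed)
            gib += CHUNK_GIB;
    return gib;
}
```

Case 1 puts all three chunks on A and B, so losing B costs 9; in case 2 (a on B+C, b on A+C, c on A+B) only a and c touch B, so it costs 6.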
>>
>> OR if there is a spare device which is already added before the, just
>> run
>>
>> btrfs dev scan [/dev/sde]
>>
>> this will register the spare device to the kernel.
>>
>> btrfs fi show
>> Label: none uuid: 52f170c1-725c-457d-8cfd-d57090460091
>> Total devices 2 FS bytes used 112.00KiB
>> devid 1 size 2.00GiB used 417.50MiB path /dev/sdc
>> devid 2 size 2.00GiB used 417.50MiB path /dev/sdd
>>
>> Global spare
>> device size 3.00GiB path /dev/sde
>>
>> Thats it.
>>
>> Auto replace:
>> Replace happens automatically, that is when there is any write
>> failed or flush failed, the device will be marked as failed, which
>> will stop any further IO attempt to that device. And in the next commit
>> thread cycle the auto replace will pick the spare device (/dev/sde is
>> above example) to replace the failed device. And so the btrfs volume is
>> back to a healthy state.
>>
>>
>> Its btrfs Global spare:
>> as of now only global hot spare is supported, that is hot spare(s)
>> are for all the btrfs FS in the system.
>>
>> No spare when device failed:
>> It would scan for spare device at the rate of transaction commit
>> and will trigger the auto replace when ever spare device is added.
>>
>> Priority:
>> In some future work there can be some chronological order to pick
>> a spare and the failed device.
>>
>>
>> Patches:
>>
>> Kernel:
>> First, it needs, Qu's per chunk missing device patchset,
>> which is part of the set here and also there is a light optimization
>> (patch 5/15) which was required as part of this enhancement.
>>
>> Next patches 7,8/15 brings in support, to manage the transition of
>> devices from online (no state) to offline OR failed state dynamically.
>> On top of static device state like the current "missing" state.
>>
>> Patch 9/15 fixes a bug where in we should have blocked the incompatible
>> feature at the device scan/add level instead/also at in the mount level.
>> This is because we don't have to bring a device into the device list,
>> if it is incompatible.
>>
>> Next patches 10,11,12,13/15 adds support for Spare device. For the
>> details on how to add a spare device kindly see further below.
>> For kernel with out spare feature supported the spare device
>> is kept away. And when the kernel supports the spare device, it will
>> inhibit from mounting it. Further these patch set provides helper
>> function to pick a spare device and release a spare device back to
>> the spare device pool.
>>
>> Patch 14/15 provides function for auto replace, this is mainly
>> from the existing replace code, and in the long run I see opportunity
>> to merge these code with the replace code that is triggered from
>> the user spare.
>>
>> Last 15/15, uses all these facilities, picks a failed device and
>> triggers a auto replace in a kthread (casualty_kthread())
>>
>>
>> Progs:
>> Would need 4 patches as listed below.
>>
>>
>> Known Bug:
>>
>> As now I see below stale kmem cache during module unload. Which
>> I am digging.
>> ------
>> BUG btrfs_path (Not tainted): Objects remaining in btrfs_path on kmem_cache_close()
>> ------
>>
>> Anand Jain (10):
>> btrfs: optimize btrfs_check_degradable() for calls outside of barrier
>> btrfs: introduce device dynamic state transition to offline or failed
>> btrfs: check device for critical errors and mark failed
>> btrfs: block incompatible optional features at scan
>> btrfs: introduce BTRFS_FEATURE_INCOMPAT_SPARE_DEV
>> btrfs: add check not to mount a spare device
>> btrfs: support btrfs dev scan for spare device
>> btrfs: provide framework to get and put a spare device
>> btrfs: introduce helper functions to perform hot replace
>> btrfs: check for failed device and hot replace
>>
>> Qu Wenruo (5):
>> btrfs: Introduce a new function to check if all chunks a OK for
>> degraded mount
>> btrfs: Do per-chunk check for mount time check
>> btrfs: Do per-chunk degraded check for remount
>> btrfs: Allow barrier_all_devices to do per-chunk device check
>> btrfs: Cleanup num_tolerated_disk_barrier_failures
>>
>> fs/btrfs/ctree.h | 7 +-
>> fs/btrfs/dev-replace.c | 116 ++++++++++++++++++++
>> fs/btrfs/dev-replace.h | 1 +
>> fs/btrfs/disk-io.c | 211 +++++++++++++++++++++++-------------
>> fs/btrfs/disk-io.h | 2 -
>> fs/btrfs/super.c | 20 +++-
>> fs/btrfs/transaction.c | 3 +-
>> fs/btrfs/volumes.c | 283 ++++++++++++++++++++++++++++++++++++++++++++++---
>> fs/btrfs/volumes.h | 27 +++++
>> 9 files changed, 571 insertions(+), 99 deletions(-)
>>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
--
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D 17B2 0EDA 9B37 8B82 E0B5
* Re: [PATCH 00/15] btrfs: Hot spare and Auto replace
2015-11-09 10:56 [PATCH 00/15] btrfs: Hot spare and Auto replace Anand Jain
` (17 preceding siblings ...)
2015-11-12 2:15 ` Qu Wenruo
@ 2015-11-12 19:21 ` Goffredo Baroncelli
2015-11-13 10:20 ` Anand Jain
2015-11-16 13:41 ` Austin S Hemmelgarn
19 siblings, 1 reply; 43+ messages in thread
From: Goffredo Baroncelli @ 2015-11-12 19:21 UTC (permalink / raw)
To: Anand Jain, linux-btrfs
On 2015-11-09 11:56, Anand Jain wrote:
> These set of patches provides btrfs hot spare and auto replace support
> for you review and comments.
Hi Anand,
is there any reason to put this kind of logic in kernel space? I think it would be simpler to create a daemon which checks the disks and, when needed, starts a replace...
The pool policy could be more sophisticated: some filesystems could require a "dedicated" pool (for example because the disks are in the same enclosure); in other cases a global pool may be more useful.
Another feature of this daemon could be to add a disk when disk space is too low, or to start a balance when there is no space to allocate further chunks.....
Of course all of this logic could be implemented in kernel space, but I think we should avoid that when possible. Moreover, logging is easier in user space....
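The daemon approach amounts to a per-poll policy decision. Below is a sketch of just that decision step, with hypothetical `struct fs_status` fields and action names; a real daemon would still have to collect the state (for instance from `btrfs device stats` and `btrfs filesystem usage`) and then run the existing `btrfs replace start` or `btrfs balance start` commands. Adding a disk on low space is left out here because it competes with replace for the same spares.

```c
#include <assert.h>
#include <stdint.h>

enum fs_action { ACT_NONE, ACT_REPLACE, ACT_BALANCE };

struct fs_status {
    int failed_devices;          /* devices with write/flush errors */
    int spares_available;        /* spares this filesystem may use */
    uint64_t unallocated_bytes;  /* device space not yet in any chunk */
    uint64_t free_in_chunks;     /* free space inside allocated chunks */
    uint64_t min_free_bytes;     /* below this, a balance would not help */
};

/* One decision per polling cycle. Repair comes first; while a device is
 * failed, never start a balance, just wait for a spare to appear. */
static enum fs_action decide(const struct fs_status *s)
{
    if (s->failed_devices > 0)
        return s->spares_available > 0 ? ACT_REPLACE : ACT_NONE;
    /* No room for new chunks, but enough slack inside existing ones that
     * compacting them with a balance can free whole chunks. */
    if (s->unallocated_bytes == 0 && s->free_in_chunks >= s->min_free_bytes)
        return ACT_BALANCE;
    return ACT_NONE;
}
```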
Only my 2¢...
BR
G.Baroncelli
[...]
* Re: [PATCH 00/15] btrfs: Hot spare and Auto replace
2015-11-12 13:04 ` Austin S Hemmelgarn
@ 2015-11-13 1:07 ` Qu Wenruo
2015-11-13 10:20 ` Anand Jain
0 siblings, 1 reply; 43+ messages in thread
From: Qu Wenruo @ 2015-11-13 1:07 UTC (permalink / raw)
To: Austin S Hemmelgarn, Anand Jain, linux-btrfs
Austin S Hemmelgarn wrote on 2015/11/12 08:04 -0500:
> On 2015-11-11 21:15, Qu Wenruo wrote:
>> Hi Anand,
>>
>> Nice work.
>> But I have some small questions about it.
>>
>> Anand Jain wrote on 2015/11/09 18:56 +0800:
>>> These set of patches provides btrfs hot spare and auto replace support
>>> for you review and comments.
>>>
>>> First, here below are the simple example steps to configure the same:
>>>
>>> Add a spare device:
>>> btrfs spare add /dev/sde -f
>>
>> I'm sorry but I didn't quite see the benefit of a spare device.
> Aside from what Duncan said (and I happen to agree with him), there is
> also the fact that hot-spares are (at least traditionally in most RAID
> systems) usually used with RAID5 or RAID6 (or some other parity scheme).
>
> So, to summarize:
> 1. Hot spares are more useful for most users in global context, and in
> that case only if they have more than one filesystem.
> 2. A pool of hot spares is even more useful.
Agreed, just as Duncan said,
although only one spare device is supported so far.
> 3. Assuming whole disk usage (as opposed to partitioning), the hot spare
> will have no load on it until it gets used, at which point it will
> almost always be in better physical condition than the device it
> replaced (which is important for HA systems, in such cases you replace
> the disk that failed, and make the new disk a hot spare)
OK, that's also right, provided no one runs btrfs dev scan at regular intervals.
> 4. Hot spares are more often used (at least from what I've seen) on
> parity based raid systems than raid1.
I'm not familiar with parity-based RAID5/6 in btrfs, so I can't say for
sure.
But considering btrfs's chunk-based RAID, I think btrfs parity-based RAID
is not that different from the current btrfs RAID1; the only difference
is the stripe size, i.e. the whole chunk size (RAID1) vs. a real stripe
size (btrfs RAID5/6).
And if btrfs supported specifying the number of disks used for raid5/6
chunk allocation (for example, using only 3 devices for a raid5 chunk
even when there are 4), it would be much the same case.
I chose btrfs RAID1 as the example in my mail just because btrfs RAID1
only uses 2 devices, no matter how many devices are in the filesystem.
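The "2 devices per RAID1 chunk" rule is what produces the even spread in Qu's case 2. A simplified model of it: for each new chunk, pick the two devices with the most free space. This follows the general idea of the btrfs chunk allocator (candidate devices ordered by available space) but ignores stripe sizing, device extents and every other detail, so treat it as an illustration only; the function name is hypothetical.

```c
#include <assert.h>

/* Allocate one RAID1 chunk of `chunk` units: pick the two devices with the
 * most free space (ties go to the lower index) and charge both.
 * Returns 0 on success, -1 if fewer than two devices can hold the chunk. */
static int alloc_raid1_chunk(long free_space[], int ndev, long chunk,
                             int *dev0, int *dev1)
{
    int best = -1, second = -1;

    assert(chunk > 0);
    for (int i = 0; i < ndev; i++) {
        if (free_space[i] < chunk)
            continue;
        if (best < 0 || free_space[i] > free_space[best]) {
            second = best;
            best = i;
        } else if (second < 0 || free_space[i] > free_space[second]) {
            second = i;
        }
    }
    if (second < 0)
        return -1;
    free_space[best] -= chunk;
    free_space[second] -= chunk;
    *dev0 = best;
    *dev1 = second;
    return 0;
}
```

Allocating three 3-unit chunks on three 12-unit devices leaves 6 units free on each (two chunk copies per disk, as in case 2); on two devices it leaves 3 units on each, as in case 1.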
So I'm very curious about why parity-based RAID is so often used with hot spares.
Thanks,
Qu
>
> In the rather limited case you outlined, I would probably just use raid1
> across all three devices myself (unless they were whole disks and not
> individual partitions, in which case I'd use a hot spare), but looking
> beyond that at my actual usage of BTRFS (multiple filesystems with
> multiple different raid profiles, spread across various disks), hot
> spares are definitely useful (although they would be more useful if I
> could specify that a given hot spare be used only for a given set of
> filesystems).
>
>
* Re: [PATCH 00/15] btrfs: Hot spare and Auto replace
2015-11-10 12:13 ` Austin S Hemmelgarn
@ 2015-11-13 10:17 ` Anand Jain
2015-11-13 12:25 ` Austin S Hemmelgarn
2015-11-15 18:10 ` Christoph Anton Mitterer
0 siblings, 2 replies; 43+ messages in thread
From: Anand Jain @ 2015-11-13 10:17 UTC (permalink / raw)
To: Austin S Hemmelgarn, Duncan; +Cc: linux-btrfs
Thanks for the comments.
Sorry for the delay.
Trying to find out if there are any pending concerns...
> Hopefully, per-filesystem hot-spares will be a high priority too, as
> that type of usage is pretty much required for many enterprise type
> uses, although that doesn't need to be the same code doing it (in fact,
> I could see having per-fs spares and global spares both available
> potentially being very useful).
That's doable within the current design as well; however, stability
and hardening (fixing the possible loopholes) are kind of the priority.
Thanks, Anand
* Re: [PATCH 00/15] btrfs: Hot spare and Auto replace
2015-11-12 2:15 ` Qu Wenruo
` (2 preceding siblings ...)
2015-11-12 19:08 ` Goffredo Baroncelli
@ 2015-11-13 10:18 ` Anand Jain
3 siblings, 0 replies; 43+ messages in thread
From: Anand Jain @ 2015-11-13 10:18 UTC (permalink / raw)
To: Qu Wenruo; +Cc: linux-btrfs
Thanks for the comments.
> Let's take the following example:
>
> 1) 2 RAID1 + 1 spare
> (A + B) + C
>
> 2) 3 RAID1
> (A + B + C)
> At least in normal operation case, case 1) makes device C useless, and
Yes.
> For case 2), we can just relocate and recover the bad chunks in B.
> It it should only need to copy 6G data.
Case 2 is wrong in the context of a spare,
unless space usage is limited to 1/3 of the total space.
But when you do that, the drawback of case 1 applies
equally to case 2.
> It it should only need to copy 6G data.
It's true as long as you don't replace the failed B
and restore the original configuration. However,
when you do that, more data is moved than in case 1.
So this is not fully correct.
Thanks, Anand
* Re: [PATCH 00/15] btrfs: Hot spare and Auto replace
2015-11-12 19:21 ` Goffredo Baroncelli
@ 2015-11-13 10:20 ` Anand Jain
2015-11-14 11:05 ` Goffredo Baroncelli
0 siblings, 1 reply; 43+ messages in thread
From: Anand Jain @ 2015-11-13 10:20 UTC (permalink / raw)
To: kreijack, linux-btrfs
Thanks for comments.
On 11/13/2015 03:21 AM, Goffredo Baroncelli wrote:
> On 2015-11-09 11:56, Anand Jain wrote:
>> These set of patches provides btrfs hot spare and auto replace support
>> for you review and comments.
>
> Hi Anand,
>
> is there any reason to put this kind of logic in the kernel space ?
> I think that it could be more simply to create a daemon which checks
> the disks and when needed it starts a replace...
> The pool policy could be more sophisticated: some filesystem could
> require a "dedicated" pool (for example because the disks are in the
> same enclosure); in other case a global pool may be more useful.
That's true. It can be added as an enhancement on top of the current
implementation; I will, if time permits. The current priority is
stability: working out what could possibly go wrong (in configuring)
and making the code robust against it.
> Another feature of this daemon could be to add a disk when the disk
> space is too low,
That would come at the cost of a spare device; shouldn't the user review
the trade-offs and do it manually? I am not sure.
> or to start a balance when there is no space to
> allocate further chunk.....
Yep. As you noticed, the thread created here is casualty_kthread()
(instead of replace_kthread()); in the long run I wish to provide
that feature in this thread, as it is mutually exclusive
with replace.
> Of course all these logic could be implemented in kernel space,
> but I think that we should avoid that when possible.
It is easier to handle the mutually exclusive parts within the kernel,
and it's better to have the important logic in one place. Two heads
running one organization, each seeing and feeling different things,
will lead to wrong decisions.
> Moreover in user space the logging is more easy....
Thanks, Anand
* Re: [PATCH 00/15] btrfs: Hot spare and Auto replace
2015-11-13 1:07 ` Qu Wenruo
@ 2015-11-13 10:20 ` Anand Jain
2015-11-14 0:54 ` Qu Wenruo
0 siblings, 1 reply; 43+ messages in thread
From: Anand Jain @ 2015-11-13 10:20 UTC (permalink / raw)
To: Qu Wenruo, Austin S Hemmelgarn, linux-btrfs
Thanks for commenting.
>>> I'm sorry but I didn't quite see the benefit of a spare device.
>> Aside from what Duncan said (and I happen to agree with him), there is
>> also the fact that hot-spares are (at least traditionally in most RAID
>> systems) usually used with RAID5 or RAID6 (or some other parity scheme).
>>
>> So, to summarize:
>> 1. Hot spares are more useful for most users in global context, and in
>> that case only if they have more than one filesystem.
>> 2. A pool of hot spares is even more useful.
>
> Agreed, just as Ducan said.
> Although only one spare device is supported yet.
You can add more than one spare device currently.
>> 3. Assuming whole disk usage (as opposed to partitioning), the hot spare
>> will have no load on it until it gets used, at which point it will
>> almost always be in better physical condition than the device it
>> replaced (which is important for HA systems, in such cases you replace
>> the disk that failed, and make the new disk a hot spare)
>
> OK, that's also right, if no one is calling btrfs dev scan with a interval.
Not too sure what you mean about the scan part.
Thanks, Anand
* Re: [PATCH 00/15] btrfs: Hot spare and Auto replace
2015-11-13 10:17 ` Anand Jain
@ 2015-11-13 12:25 ` Austin S Hemmelgarn
2015-11-15 18:10 ` Christoph Anton Mitterer
1 sibling, 0 replies; 43+ messages in thread
From: Austin S Hemmelgarn @ 2015-11-13 12:25 UTC (permalink / raw)
To: Anand Jain, Duncan; +Cc: linux-btrfs
On 2015-11-13 05:17, Anand Jain wrote:
>
>
> Thanks for the comments.
>
> Sorry for the delay.
> Trying to find out if there is any pending concerns...
FWIW, I'm planning on setting up a VM to test this over the weekend (I
would have already, but I've been kind of busy at work this week), so
I'll hopefully have some more feedback on Monday.
>
>> Hopefully, per-filesystem hot-spares will be a high priority too, as
>> that type of usage is pretty much required for many enterprise type
>> uses, although that doesn't need to be the same code doing it (in fact,
>> I could see having per-fs spares and global spares both available
>> potentially being very useful).
>
> That's doable with in the current design as well, however stability
> and hardening (fixing the possible loop holes) is kind of priority.
Entirely understandable, I would actually be somewhat worried if
stability and hardening weren't the priority right now.
* Re: [PATCH 00/15] btrfs: Hot spare and Auto replace
2015-11-13 10:20 ` Anand Jain
@ 2015-11-14 0:54 ` Qu Wenruo
2015-11-16 13:39 ` Austin S Hemmelgarn
0 siblings, 1 reply; 43+ messages in thread
From: Qu Wenruo @ 2015-11-14 0:54 UTC (permalink / raw)
To: Anand Jain, Qu Wenruo, Austin S Hemmelgarn, linux-btrfs
在 2015年11月13日 18:20, Anand Jain 写道:
>
>
> Thanks for commenting.
>
>>>> I'm sorry but I didn't quite see the benefit of a spare device.
>>> Aside from what Duncan said (and I happen to agree with him), there is
>>> also the fact that hot-spares are (at least traditionally in most RAID
>>> systems) usually used with RAID5 or RAID6 (or some other parity scheme).
>>>
>>> So, to summarize:
>>> 1. Hot spares are more useful for most users in global context, and in
>>> that case only if they have more than one filesystem.
>>> 2. A pool of hot spares is even more useful.
>>
>> Agreed, just as Ducan said.
>> Although only one spare device is supported yet.
>
> You can add more than one spare device currently.
>
>>> 3. Assuming whole disk usage (as opposed to partitioning), the hot spare
>>> will have no load on it until it gets used, at which point it will
>>> almost always be in better physical condition than the device it
>>> replaced (which is important for HA systems, in such cases you replace
>>> the disk that failed, and make the new disk a hot spare)
>>
>> OK, that's also right, if no one is calling btrfs dev scan with a
>> interval.
>
> Not too sure what you mean about the scan part.
Btrfs device scan needs to read the superblock (sb) of the device,
so the hot spare device won't really sleep for long: each time
btrfs scans devices, it will wake the device up.
Not sure about soft raid hot spares. Maybe they don't cause any IO on
the device? Or maybe they behave just the same as the btrfs hot spare.
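To make Qu's point concrete: recognising a btrfs device means reading its superblock. The primary btrfs superblock sits at byte offset 64 KiB, and its magic `_BHRfS_M` is 64 bytes into it (after the 32-byte checksum, 16-byte fsid, and the 8-byte bytenr and flags fields). The sketch below performs that kind of read, a single `pread()` that a spun-down disk would have to wake up to serve. It is illustrative only, not the kernel's or btrfs-progs' scan code, and the macro and function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define SUPER_OFFSET 65536 /* primary btrfs superblock: 64 KiB */
#define MAGIC_OFFSET 64    /* csum[32] + fsid[16] + bytenr[8] + flags[8] */
#define MAGIC "_BHRfS_M"

/* Return 1 if `path` carries the btrfs primary superblock magic, 0 if not,
 * -1 on open/read error. Every call issues real I/O to the device. */
static int has_btrfs_magic(const char *path)
{
    char buf[8];
    ssize_t n;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    n = pread(fd, buf, sizeof(buf), (off_t)SUPER_OFFSET + MAGIC_OFFSET);
    close(fd);
    if (n < 0)
        return -1;
    if (n != (ssize_t)sizeof(buf))
        return 0; /* too small to hold a superblock */
    return memcmp(buf, MAGIC, sizeof(buf)) == 0;
}
```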
Thanks,
Qu
>
> Thanks, Anand
* Re: [PATCH 00/15] btrfs: Hot spare and Auto replace
2015-11-13 10:20 ` Anand Jain
@ 2015-11-14 11:05 ` Goffredo Baroncelli
0 siblings, 0 replies; 43+ messages in thread
From: Goffredo Baroncelli @ 2015-11-14 11:05 UTC (permalink / raw)
To: Anand Jain, linux-btrfs
On 2015-11-13 11:20, Anand Jain wrote:
>
> Thanks for comments.
>
> On 11/13/2015 03:21 AM, Goffredo Baroncelli wrote:
>> On 2015-11-09 11:56, Anand Jain wrote:
>>> These set of patches provides btrfs hot spare and auto replace support
>>> for you review and comments.
>>
>> Hi Anand,
>>
>> is there any reason to put this kind of logic in the kernel space ?
[...]
>
>> Another feature of this daemon could be to add a disk when the disk
>> space is too low,
>
> That will be at the cost of a spare device which user should review
> the trade-offs and do it manually ? I am not sure.
If you have more than one spare, you can do both automatically: add a new disk when space is low, and replace a disk in case of failure. If you have only one spare, you may decide to reserve it for replacing a failed disk. But this should be a configurable option: low space leads to an unavailable filesystem, while a failed disk means a higher likelihood of losing the whole filesystem. I am not sure which should be considered more critical.
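The configurable split Goffredo describes (some spares always held back for failures, only the surplus usable for growing a filesystem that is running out of space) reduces to a small policy check. Hypothetical names throughout; nothing like this exists in the posted patches.

```c
#include <assert.h>

/* Policy knob: how many spares must always remain for replacing failed
 * disks. Setting it to the pool size means spares are never used for growth. */
struct spare_policy {
    int reserve_for_failure;
};

/* May a low-space condition consume a spare right now? Only spares beyond
 * the failure reserve may be spent on capacity. */
static int may_add_spare_for_space(const struct spare_policy *p,
                                   int spares_available, int low_space)
{
    if (!low_space)
        return 0;
    return spares_available > p->reserve_for_failure;
}

/* A failed disk may always take any spare that is left. */
static int may_use_spare_for_failure(int spares_available)
{
    return spares_available > 0;
}
```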
>> or to start a balance when there is no space to
>> allocate further chunk.....
>
> Yep. As you notice, the thread created here is casualty_kthread()
> (instead of replace_kthread()) over the long run I wish to provide
> that feature in this thread, as it is a mutually exclusive operations
> with replace.
A disk replace should be a higher-priority operation. In case of a disk failure during a balance/defrag, these operations should be stopped to allow a replace.
If you want to start a replace, you should stop other (long-running) operations like balance and defrag.
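This priority rule, combined with Anand's note that replace and balance are mutually exclusive, can be modelled as a single exclusive-operation slot in which a higher-priority request (replace) cancels the running one (balance or defrag), while lower-priority requests must wait. A sketch with hypothetical names; it is not how the btrfs kernel code actually tracks exclusive operations.

```c
#include <assert.h>

enum excl_op { OP_NONE = 0, OP_DEFRAG, OP_BALANCE, OP_REPLACE };

/* Higher value means higher priority; replace outranks everything. */
static int op_priority(enum excl_op op)
{
    switch (op) {
    case OP_REPLACE: return 3;
    case OP_BALANCE: return 2;
    case OP_DEFRAG:  return 1;
    default:         return 0;
    }
}

struct excl_slot {
    enum excl_op running;   /* operation currently holding the slot */
    enum excl_op cancelled; /* last operation that was preempted */
};

/* Try to start `want`. Returns 1 if it now runs, 0 if it must wait.
 * A strictly higher-priority request cancels the running operation. */
static int excl_start(struct excl_slot *s, enum excl_op want)
{
    if (s->running == OP_NONE) {
        s->running = want;
        return 1;
    }
    if (op_priority(want) > op_priority(s->running)) {
        s->cancelled = s->running; /* e.g. stop a balance to allow replace */
        s->running = want;
        return 1;
    }
    return 0;
}

static void excl_finish(struct excl_slot *s)
{
    s->running = OP_NONE;
}
```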
>
>> Of course all these logic could be implemented in kernel space,
>> but I think that we should avoid that when possible.
>
> Easy to handle the mutually_exclusive parts with in the kernel
> and Its better to have the important logic at one place. Two heads
> operating on an org looking and feeling different things will lead
> to wrong decisions.
Which other logic are you referring to?
>
>> Moreover in user space the logging is more easy....
>
> Thanks, Anand
* Re: [PATCH 00/15] btrfs: Hot spare and Auto replace
2015-11-13 10:17 ` Anand Jain
2015-11-13 12:25 ` Austin S Hemmelgarn
@ 2015-11-15 18:10 ` Christoph Anton Mitterer
1 sibling, 0 replies; 43+ messages in thread
From: Christoph Anton Mitterer @ 2015-11-15 18:10 UTC (permalink / raw)
To: linux-btrfs; +Cc: Anand Jain
Hey.
You guys may want to update:
https://btrfs.wiki.kernel.org/index.php/Project_ideas#Hot_spare_support
Cheers,
Chris.
* Re: [PATCH 00/15] btrfs: Hot spare and Auto replace
2015-11-14 0:54 ` Qu Wenruo
@ 2015-11-16 13:39 ` Austin S Hemmelgarn
0 siblings, 0 replies; 43+ messages in thread
From: Austin S Hemmelgarn @ 2015-11-16 13:39 UTC (permalink / raw)
To: Qu Wenruo, Anand Jain, Qu Wenruo, linux-btrfs
On 2015-11-13 19:54, Qu Wenruo wrote:
>
>
> 在 2015年11月13日 18:20, Anand Jain 写道:
>>
>>
>> Thanks for commenting.
>>
>>>>> I'm sorry but I didn't quite see the benefit of a spare device.
>>>> Aside from what Duncan said (and I happen to agree with him), there is
>>>> also the fact that hot-spares are (at least traditionally in most RAID
>>>> systems) usually used with RAID5 or RAID6 (or some other parity
>>>> scheme).
>>>>
>>>> So, to summarize:
>>>> 1. Hot spares are more useful for most users in global context, and in
>>>> that case only if they have more than one filesystem.
>>>> 2. A pool of hot spares is even more useful.
>>>
>>> Agreed, just as Ducan said.
>>> Although only one spare device is supported yet.
>>
>> You can add more than one spare device currently.
>>
>>>> 3. Assuming whole disk usage (as opposed to partitioning), the hot
>>>> spare
>>>> will have no load on it until it gets used, at which point it will
>>>> almost always be in better physical condition than the device it
>>>> replaced (which is important for HA systems, in such cases you replace
>>>> the disk that failed, and make the new disk a hot spare)
>>>
>>> OK, that's also right, if no one is calling btrfs dev scan with a
>>> interval.
>>
>> Not too sure what you mean about the scan part.
>
> Btrfs device scan will need to read the sb of the device.
> So the hot spare device won't really sleep for a long time as each time
> btrfs scan devices, it will wakeup the device.
Um, no. Unless you have a device scan on a cron job, you will only scan
at boot (and the disk will usually be running then anyway, because most
firmware spins up all disks at boot), since running mkfs or (I assume)
registering the hot spare the first time automatically registers it with
the kernel module.
>
> Not sure about soft raid hot spare. Maybe they won't cause any IO on the
> device? Or just the same with btrfs hot spare.
That depends on the type of hot spare. Most soft raid systems use a
policy similar to this patch-set's (let the hot spare sit there until it's
needed, then auto-replace when a device fails), but some use it actively
in the set without counting it as part of the capacity (I see this
mostly in RAID6 setups, where the array is simply reshaped online to
exclude the failed device).
* Re: [PATCH 00/15] btrfs: Hot spare and Auto replace
2015-11-09 10:56 [PATCH 00/15] btrfs: Hot spare and Auto replace Anand Jain
` (18 preceding siblings ...)
2015-11-12 19:21 ` Goffredo Baroncelli
@ 2015-11-16 13:41 ` Austin S Hemmelgarn
2015-11-16 22:07 ` Anand Jain
19 siblings, 1 reply; 43+ messages in thread
From: Austin S Hemmelgarn @ 2015-11-16 13:41 UTC (permalink / raw)
To: Anand Jain, linux-btrfs
On 2015-11-09 05:56, Anand Jain wrote:
> This set of patches provides btrfs hot spare and auto replace support
> for your review and comments.
>
> First, below are simple example steps to configure it:
>
> Add a spare device:
> btrfs spare add /dev/sde -f
>
> OR, if a spare device has already been added before, just
> run
>
> btrfs dev scan [/dev/sde]
>
> this will register the spare device to the kernel.
>
> btrfs fi show
> Label: none uuid: 52f170c1-725c-457d-8cfd-d57090460091
> Total devices 2 FS bytes used 112.00KiB
> devid 1 size 2.00GiB used 417.50MiB path /dev/sdc
> devid 2 size 2.00GiB used 417.50MiB path /dev/sdd
>
> Global spare
> device size 3.00GiB path /dev/sde
>
> That's it.
>
> Auto replace:
> Replace happens automatically: when any write or flush
> fails, the device is marked as failed, which
> stops any further IO attempt to that device. Then, in the next commit
> thread cycle, auto replace picks the spare device (/dev/sde in the
> above example) to replace the failed device, and so the btrfs volume is
> back to a healthy state.
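A minimal sketch of the failure handling described above, assuming a simple per-device state field. The names `enum dev_state`, `struct sketch_device`, `on_write_or_flush_done()` and `submit_io()` are hypothetical and not taken from the patches: a write or flush error moves the device to a failed state, and the submission path refuses further I/O to it.

```c
#include <errno.h>

/* Hypothetical names; the patches' actual identifiers may differ. */
enum dev_state { DEV_ONLINE, DEV_OFFLINE, DEV_FAILED };

struct sketch_device {
    const char *path;
    enum dev_state state;
};

/* Completion hook for a write or a cache flush: any error marks the
 * device failed. The state stays set until the device is replaced. */
void on_write_or_flush_done(struct sketch_device *dev, int err)
{
    if (err != 0)
        dev->state = DEV_FAILED;
}

/* Checked before submitting new I/O: offline or failed devices get no
 * further I/O attempts. */
int submit_io(struct sketch_device *dev)
{
    if (dev->state != DEV_ONLINE)
        return -EIO;
    /* Real code would build and submit the bio here. */
    return 0;
}
```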
>
>
> It's a btrfs global spare:
> as of now only global hot spares are supported, that is, hot spare(s)
> serve all the btrfs filesystems in the system.
>
> No spare when device failed:
> It scans for a spare device at the rate of transaction commits
> and triggers the auto replace whenever a spare device is added.
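The per-commit check, including the case where no spare exists yet, might look roughly like this. It is a simplified sketch: `struct dev`, `casualty_check()` and the single `spare` slot are hypothetical stand-ins for the device list, the casualty kthread and the global spare pool, and "replace" here only swaps the path instead of rebuilding data.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of one filesystem's devices. */
struct dev {
    const char *path;
    bool failed;
};

/* Run once per transaction commit. '*spare' is the next available
 * spare path (NULL if none) and is consumed when a replace starts.
 * Returns true if a replace was started. */
bool casualty_check(struct dev *devs, size_t ndevs, const char **spare)
{
    for (size_t i = 0; i < ndevs; i++) {
        if (!devs[i].failed)
            continue;
        if (*spare == NULL)
            return false;        /* no spare yet: try again next commit */
        devs[i].path = *spare;   /* stand-in for starting a real replace */
        devs[i].failed = false;
        *spare = NULL;
        return true;
    }
    return false;
}
```

Because the failed flag is never cleared without a spare, calling this on every commit is enough to start the replace as soon as a spare shows up.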
>
> Priority:
> Future work could add some chronological order for picking
> a spare and the failed device.
>
>
> Patches:
>
> Kernel:
> First, it needs Qu's per-chunk missing device patchset,
> which is part of the set here; there is also a light optimization
> (patch 5/15) which was required as part of this enhancement.
>
> Next, patches 7,8/15 bring in support to manage the transition of
> devices from online (no state) to offline OR failed state dynamically,
> on top of static device states like the current "missing" state.
>
> Patch 9/15 fixes a bug wherein we should have blocked the incompatible
> feature at the device scan/add level instead of (or as well as) at the
> mount level. This is because we don't have to bring a device into the
> device list if it is incompatible.
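The scan-time rejection can be sketched as a mask test on the superblock's incompat flags. The bit values and macro names below are illustrative only (the kernel keeps its own supported mask, `BTRFS_FEATURE_INCOMPAT_SUPP`). Treating the spare flag as an unknown incompat bit also suggests how a kernel without spare support could keep a spare device out of its device list.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical bit values, for illustration only. */
#define FEAT_INCOMPAT_A     (1ULL << 0)
#define FEAT_INCOMPAT_B     (1ULL << 1)
#define FEAT_INCOMPAT_SPARE (1ULL << 2)   /* not supported by this "kernel" */
#define SUPPORTED_INCOMPAT  (FEAT_INCOMPAT_A | FEAT_INCOMPAT_B)

/* Called when a device is scanned or added, before it is linked into
 * the in-memory device list. Rejecting unknown incompat bits here means
 * an unsupported device never enters the list, instead of only failing
 * later at mount time. */
int check_incompat_at_scan(uint64_t sb_incompat_flags)
{
    uint64_t unsupported = sb_incompat_flags & ~(uint64_t)SUPPORTED_INCOMPAT;

    return unsupported ? -EOPNOTSUPP : 0;
}
```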
>
> Next, patches 10,11,12,13/15 add support for the spare device. For the
> details on how to add a spare device, kindly see further below.
> For kernels without spare feature support, the spare device
> is kept away. And when the kernel supports the spare device, it will
> refuse to mount it. Further, this patch set provides helper
> functions to pick a spare device and release a spare device back to
> the spare device pool.
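A sketch of the pick/release helpers and the mount refusal, assuming a fixed-size array as the global pool. `struct spare_pool`, `spare_get()`, `spare_put()`, `check_mountable()` and the `-EINVAL` return value are hypothetical; a kernel version would also need locking, since the pool is shared by every btrfs filesystem on the host.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_SPARES 8

/* Hypothetical global pool of registered spare devices. */
struct spare_pool {
    const char *dev[MAX_SPARES];
    size_t count;
};

/* Register a scanned spare, or release one back after use. */
int spare_put(struct spare_pool *p, const char *path)
{
    if (p->count == MAX_SPARES)
        return -ENOSPC;
    p->dev[p->count++] = path;
    return 0;
}

/* Take a spare out of the pool for a replace; NULL when none is left. */
const char *spare_get(struct spare_pool *p)
{
    return p->count ? p->dev[--p->count] : NULL;
}

/* Mount-time check: a device whose superblock marks it as a spare must
 * not be mounted as a filesystem in its own right. */
int check_mountable(bool sb_is_spare)
{
    return sb_is_spare ? -EINVAL : 0;
}
```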
>
> Patch 14/15 provides a function for auto replace; this is mainly
> from the existing replace code, and in the long run I see an opportunity
> to merge this code with the replace code that is triggered from
> user space.
>
> Finally, 15/15 uses all these facilities, picks a failed device and
> triggers an auto replace in a kthread (casualty_kthread()).
>
>
> Progs:
> Needs 4 patches, as listed below.
>
>
> Known Bug:
>
> As of now I see the stale kmem cache below during module unload, which
> I am digging into.
> ------
> BUG btrfs_path (Not tainted): Objects remaining in btrfs_path on kmem_cache_close()
> ------
>
> Anand Jain (10):
> btrfs: optimize btrfs_check_degradable() for calls outside of barrier
> btrfs: introduce device dynamic state transition to offline or failed
> btrfs: check device for critical errors and mark failed
> btrfs: block incompatible optional features at scan
> btrfs: introduce BTRFS_FEATURE_INCOMPAT_SPARE_DEV
> btrfs: add check not to mount a spare device
> btrfs: support btrfs dev scan for spare device
> btrfs: provide framework to get and put a spare device
> btrfs: introduce helper functions to perform hot replace
> btrfs: check for failed device and hot replace
>
> Qu Wenruo (5):
> btrfs: Introduce a new function to check if all chunks a OK for
> degraded mount
> btrfs: Do per-chunk check for mount time check
> btrfs: Do per-chunk degraded check for remount
> btrfs: Allow barrier_all_devices to do per-chunk device check
> btrfs: Cleanup num_tolerated_disk_barrier_failures
>
> fs/btrfs/ctree.h | 7 +-
> fs/btrfs/dev-replace.c | 116 ++++++++++++++++++++
> fs/btrfs/dev-replace.h | 1 +
> fs/btrfs/disk-io.c | 211 +++++++++++++++++++++++-------------
> fs/btrfs/disk-io.h | 2 -
> fs/btrfs/super.c | 20 +++-
> fs/btrfs/transaction.c | 3 +-
> fs/btrfs/volumes.c | 283 ++++++++++++++++++++++++++++++++++++++++++++++---
> fs/btrfs/volumes.h | 27 +++++
> 9 files changed, 571 insertions(+), 99 deletions(-)
>
I've thrown everything I can think of at this over the weekend, and
nothing broke (at least, nothing that had anything to do with
these patches; I ended up triggering a couple of known bugs that I had
completely forgotten about), so you can add:
Tested-by: Austin S. Hemmelgarn <ahferroin7@gmail.com>
* Re: [PATCH 00/15] btrfs: Hot spare and Auto replace
2015-11-16 13:41 ` Austin S Hemmelgarn
@ 2015-11-16 22:07 ` Anand Jain
2015-11-17 12:28 ` Austin S Hemmelgarn
0 siblings, 1 reply; 43+ messages in thread
From: Anand Jain @ 2015-11-16 22:07 UTC (permalink / raw)
To: Austin S Hemmelgarn; +Cc: linux-btrfs
On 11/16/2015 09:41 PM, Austin S Hemmelgarn wrote:
> On 2015-11-09 05:56, Anand Jain wrote:
>> [...]
> I've thrown everything I can think of at this over the weekend, and
> nothing broke (at least, nothing broke that had anything to do with
> these patches, I ended up triggering a couple of known bugs that I had
> completely forgotten about), so you can add:
> Tested-by: Austin S. Hemmelgarn <ahferroin7@gmail.com>
>
Thanks, Austin.
Yeah, I should fix the known bug listed above.
* Re: [PATCH 00/15] btrfs: Hot spare and Auto replace
2015-11-16 22:07 ` Anand Jain
@ 2015-11-17 12:28 ` Austin S Hemmelgarn
0 siblings, 0 replies; 43+ messages in thread
From: Austin S Hemmelgarn @ 2015-11-17 12:28 UTC (permalink / raw)
To: Anand Jain; +Cc: linux-btrfs
On 2015-11-16 17:07, Anand Jain wrote:
>
>
> On 11/16/2015 09:41 PM, Austin S Hemmelgarn wrote:
>> On 2015-11-09 05:56, Anand Jain wrote:
>>> [...]
>> I've thrown everything I can think of at this over the weekend, and
>> nothing broke (at least, nothing broke that had anything to do with
>> these patches, I ended up triggering a couple of known bugs that I had
>> completely forgotten about), so you can add:
>> Tested-by: Austin S. Hemmelgarn <ahferroin7@gmail.com>
>>
>
> Thanks Austin.
> Yeah I should fix the known bug as listed above.
>
Actually, while I did see that, I also ran into a couple of other bugs
that are unrelated to these patches (including the balance-related bug I
was recently discussing in another thread on the ML, which, like
everyone else it has hit, I have sadly been unable to reproduce). Other
than the one you mentioned in the cover letter, none of the bugs I hit
were new with these patches, and they didn't happen any more
frequently with the patches applied.
* Re: [PATCH 06/15] btrfs: Cleanup num_tolerated_disk_barrier_failures
2015-11-09 10:56 ` [PATCH 06/15] btrfs: Cleanup num_tolerated_disk_barrier_failures Anand Jain
@ 2015-12-05 7:16 ` Qu Wenruo
0 siblings, 0 replies; 43+ messages in thread
From: Qu Wenruo @ 2015-12-05 7:16 UTC (permalink / raw)
To: Anand Jain, linux-btrfs; +Cc: Chris Mason
Hi Anand,
Would you please push patches 1-6 of your hot spare patchset to Chris first?
In my opinion, it will take some time before details such as whether
to do hot spare in kernel space or in user space are settled.
All six of these patches are largely independent of the hot spare patchset,
so it should be OK to push them into mainline in this or the next merge window.
Thanks,
Qu
On 11/09/2015 06:56 PM, Anand Jain wrote:
> From: Qu Wenruo <quwenruo@cn.fujitsu.com>
>
> Since we now use a per-chunk degradable check, the global
> num_tolerated_disk_barrier_failures is no longer needed, so clean it up.
>
> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
>
> [Btrfs: resolve conflict to apply 'btrfs: Cleanup num_tolerated_disk_barrier_failures']
> Signed-off-by: Anand Jain <anand.jain@oracle.com>
> ---
> fs/btrfs/ctree.h | 2 --
> fs/btrfs/disk-io.c | 56 ------------------------------------------------------
> fs/btrfs/disk-io.h | 2 --
> fs/btrfs/volumes.c | 17 -----------------
> 4 files changed, 77 deletions(-)
>
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index a86051e..dedd3e0 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -1753,8 +1753,6 @@ struct btrfs_fs_info {
> /* next backup root to be overwritten */
> int backup_root_index;
>
> - int num_tolerated_disk_barrier_failures;
> -
> /* device replace state */
> struct btrfs_dev_replace dev_replace;
>
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index d3303f9..d10ef2e 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -2965,8 +2965,6 @@ retry_root_backup:
> printk(KERN_ERR "BTRFS: Failed to read block groups: %d\n", ret);
> goto fail_sysfs;
> }
> - fs_info->num_tolerated_disk_barrier_failures =
> - btrfs_calc_num_tolerated_disk_barrier_failures(fs_info);
>
> fs_info->cleaner_kthread = kthread_run(cleaner_kthread, tree_root,
> "btrfs-cleaner");
> @@ -3498,60 +3496,6 @@ int btrfs_get_num_tolerated_disk_barrier_failures(u64 flags)
> return 0;
> }
>
> -int btrfs_calc_num_tolerated_disk_barrier_failures(
> - struct btrfs_fs_info *fs_info)
> -{
> - struct btrfs_ioctl_space_info space;
> - struct btrfs_space_info *sinfo;
> - u64 types[] = {BTRFS_BLOCK_GROUP_DATA,
> - BTRFS_BLOCK_GROUP_SYSTEM,
> - BTRFS_BLOCK_GROUP_METADATA,
> - BTRFS_BLOCK_GROUP_DATA | BTRFS_BLOCK_GROUP_METADATA};
> - int i;
> - int c;
> - int num_tolerated_disk_barrier_failures =
> - (int)fs_info->fs_devices->num_devices;
> -
> - for (i = 0; i < ARRAY_SIZE(types); i++) {
> - struct btrfs_space_info *tmp;
> -
> - sinfo = NULL;
> - rcu_read_lock();
> - list_for_each_entry_rcu(tmp, &fs_info->space_info, list) {
> - if (tmp->flags == types[i]) {
> - sinfo = tmp;
> - break;
> - }
> - }
> - rcu_read_unlock();
> -
> - if (!sinfo)
> - continue;
> -
> - down_read(&sinfo->groups_sem);
> - for (c = 0; c < BTRFS_NR_RAID_TYPES; c++) {
> - u64 flags;
> -
> - if (list_empty(&sinfo->block_groups[c]))
> - continue;
> -
> - btrfs_get_block_group_info(&sinfo->block_groups[c],
> - &space);
> - if (space.total_bytes == 0 || space.used_bytes == 0)
> - continue;
> - flags = space.flags;
> -
> - num_tolerated_disk_barrier_failures = min(
> - num_tolerated_disk_barrier_failures,
> - btrfs_get_num_tolerated_disk_barrier_failures(
> - flags));
> - }
> - up_read(&sinfo->groups_sem);
> - }
> -
> - return num_tolerated_disk_barrier_failures;
> -}
> -
> static int write_all_supers(struct btrfs_root *root, int max_mirrors)
> {
> struct list_head *head;
> diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h
> index adeb318..6dc5fd3 100644
> --- a/fs/btrfs/disk-io.h
> +++ b/fs/btrfs/disk-io.h
> @@ -142,8 +142,6 @@ struct btrfs_root *btrfs_create_tree(struct btrfs_trans_handle *trans,
> int btree_lock_page_hook(struct page *page, void *data,
> void (*flush_fn)(void *));
> int btrfs_get_num_tolerated_disk_barrier_failures(u64 flags);
> -int btrfs_calc_num_tolerated_disk_barrier_failures(
> - struct btrfs_fs_info *fs_info);
> int __init btrfs_end_io_wq_init(void);
> void btrfs_end_io_wq_exit(void);
>
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index a5262bf..33ad42e 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -1782,9 +1782,6 @@ int btrfs_rm_device(struct btrfs_root *root, char *device_path, u64 devid)
> free_fs_devices(cur_devices);
> }
>
> - root->fs_info->num_tolerated_disk_barrier_failures =
> - btrfs_calc_num_tolerated_disk_barrier_failures(root->fs_info);
> -
> /*
> * at this point, the device is zero sized. We want to
> * remove it from the devices list and zero out the old super
> @@ -2289,8 +2286,6 @@ int btrfs_init_new_device(struct btrfs_root *root, char *device_path)
> }
> }
>
> - root->fs_info->num_tolerated_disk_barrier_failures =
> - btrfs_calc_num_tolerated_disk_barrier_failures(root->fs_info);
> ret = btrfs_commit_transaction(trans, root);
>
> if (seeding_dev) {
> @@ -3518,13 +3513,6 @@ int btrfs_balance(struct btrfs_balance_control *bctl,
> }
> } while (read_seqretry(&fs_info->profiles_lock, seq));
>
> - if (bctl->sys.flags & BTRFS_BALANCE_ARGS_CONVERT) {
> - fs_info->num_tolerated_disk_barrier_failures = min(
> - btrfs_calc_num_tolerated_disk_barrier_failures(fs_info),
> - btrfs_get_num_tolerated_disk_barrier_failures(
> - bctl->sys.target));
> - }
> -
> ret = insert_balance_item(fs_info->tree_root, bctl);
> if (ret && ret != -EEXIST)
> goto out;
> @@ -3547,11 +3535,6 @@ int btrfs_balance(struct btrfs_balance_control *bctl,
> mutex_lock(&fs_info->balance_mutex);
> atomic_dec(&fs_info->balance_running);
>
> - if (bctl->sys.flags & BTRFS_BALANCE_ARGS_CONVERT) {
> - fs_info->num_tolerated_disk_barrier_failures =
> - btrfs_calc_num_tolerated_disk_barrier_failures(fs_info);
> - }
> -
> if (bargs) {
> memset(bargs, 0, sizeof(*bargs));
> update_ioctl_balance_args(fs_info, 0, bargs);
>
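To illustrate why the global counter could go: the removed `btrfs_calc_num_tolerated_disk_barrier_failures()` took the minimum tolerance over every block group profile in use, so a filesystem with single-profile data and RAID1 metadata tolerated zero failures overall. A per-chunk check instead asks, for each chunk, whether the missing devices exceed what that chunk's own profile can lose. The sketch below is a simplified model under that assumption, not Qu's actual code; `struct chunk`, `is_missing()` and `all_chunks_degradable()` are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_STRIPES 4

/* Hypothetical, simplified chunk: its profile's tolerance and the
 * devices that hold its stripes. */
struct chunk {
    int tolerated;              /* missing devices this profile survives */
    int nstripes;
    int devids[MAX_STRIPES];
};

static bool is_missing(int devid, const int *missing, size_t nmissing)
{
    for (size_t i = 0; i < nmissing; i++)
        if (missing[i] == devid)
            return true;
    return false;
}

/* The filesystem stays usable (degraded) as long as no chunk has lost
 * more stripes than its own profile tolerates. */
bool all_chunks_degradable(const struct chunk *chunks, size_t nchunks,
                           const int *missing, size_t nmissing)
{
    for (size_t c = 0; c < nchunks; c++) {
        int lost = 0;

        for (int s = 0; s < chunks[c].nstripes; s++)
            if (is_missing(chunks[c].devids[s], missing, nmissing))
                lost++;
        if (lost > chunks[c].tolerated)
            return false;
    }
    return true;
}
```

With a RAID1 chunk on devids 1 and 2 plus a single-profile chunk on devid 1, losing devid 2 passes this check while losing devid 1 fails it; the global minimum of zero would have tolerated neither.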
Thread overview: 43+ messages
2015-11-09 10:56 [PATCH 00/15] btrfs: Hot spare and Auto replace Anand Jain
2015-11-09 10:56 ` [PATCH 01/15] btrfs: Introduce a new function to check if all chunks a OK for degraded mount Anand Jain
2015-11-09 10:56 ` [PATCH 02/15] btrfs: Do per-chunk check for mount time check Anand Jain
2015-11-09 10:56 ` [PATCH 03/15] btrfs: Do per-chunk degraded check for remount Anand Jain
2015-11-09 10:56 ` [PATCH 04/15] btrfs: Allow barrier_all_devices to do per-chunk device check Anand Jain
2015-11-09 10:56 ` [PATCH 05/15] btrfs: optimize btrfs_check_degradable() for calls outside of barrier Anand Jain
2015-11-09 10:56 ` [PATCH 06/15] btrfs: Cleanup num_tolerated_disk_barrier_failures Anand Jain
2015-12-05 7:16 ` Qu Wenruo
2015-11-09 10:56 ` [PATCH 07/15] btrfs: introduce device dynamic state transition to offline or failed Anand Jain
2015-11-09 10:56 ` [PATCH 08/15] btrfs: check device for critical errors and mark failed Anand Jain
2015-11-09 10:56 ` [PATCH 09/15] btrfs: block incompatible optional features at scan Anand Jain
2015-11-09 10:56 ` [PATCH 10/15] btrfs: introduce BTRFS_FEATURE_INCOMPAT_SPARE_DEV Anand Jain
2015-11-09 10:56 ` [PATCH 11/15] btrfs: add check not to mount a spare device Anand Jain
2015-11-09 10:56 ` [PATCH 12/15] btrfs: support btrfs dev scan for " Anand Jain
2015-11-09 10:56 ` [PATCH 13/15] btrfs: provide framework to get and put a " Anand Jain
2015-11-09 10:56 ` [PATCH 14/15] btrfs: introduce helper functions to perform hot replace Anand Jain
2015-11-09 10:56 ` [PATCH 15/15] btrfs: check for failed device and " Anand Jain
2015-11-09 10:58 ` [PATCH 0/4] btrfs-progs: Hot spare and Auto replace Anand Jain
2015-11-09 10:58 ` [PATCH 1/4] btrfs-progs: Introduce BTRFS_FEATURE_INCOMPAT_SPARE_DEV SB flags Anand Jain
2015-11-09 10:58 ` [PATCH 2/4] btrfs-progs: Introduce btrfs spare subcommand Anand Jain
2015-11-09 10:58 ` [PATCH 3/4] btrfs-progs: add fi show for spare Anand Jain
2015-11-09 10:58 ` [PATCH 4/4] btrfs-progs: add global spare device list to filesystem show Anand Jain
2015-11-09 14:09 ` [PATCH 00/15] btrfs: Hot spare and Auto replace Austin S Hemmelgarn
2015-11-09 21:29 ` Duncan
2015-11-10 12:13 ` Austin S Hemmelgarn
2015-11-13 10:17 ` Anand Jain
2015-11-13 12:25 ` Austin S Hemmelgarn
2015-11-15 18:10 ` Christoph Anton Mitterer
2015-11-12 2:15 ` Qu Wenruo
2015-11-12 6:46 ` Duncan
2015-11-12 13:04 ` Austin S Hemmelgarn
2015-11-13 1:07 ` Qu Wenruo
2015-11-13 10:20 ` Anand Jain
2015-11-14 0:54 ` Qu Wenruo
2015-11-16 13:39 ` Austin S Hemmelgarn
2015-11-12 19:08 ` Goffredo Baroncelli
2015-11-13 10:18 ` Anand Jain
2015-11-12 19:21 ` Goffredo Baroncelli
2015-11-13 10:20 ` Anand Jain
2015-11-14 11:05 ` Goffredo Baroncelli
2015-11-16 13:41 ` Austin S Hemmelgarn
2015-11-16 22:07 ` Anand Jain
2015-11-17 12:28 ` Austin S Hemmelgarn