All of lore.kernel.org
* [PATCH 0/2] [RFC] Introduce device state 'failed'
@ 2017-05-03 13:34 Anand Jain
  2017-05-03 13:34 ` [PATCH 1/2 v7] btrfs: introduce device dynamic state transition to offline or failed Anand Jain
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Anand Jain @ 2017-05-03 13:34 UTC (permalink / raw)
  To: linux-btrfs

As the two patches below are about managing a failed disk,
I have separated them from the spare disk and auto replace
support patch set which was sent earlier here [1].

    [1] https://lwn.net/Articles/684195/

The v7 changes are very limited in these individual patches,
but they add the degraded mount option without going through a
remount. This is sent for your comments (RFC).

That set [1] was extensively tested; however, as these two patches
are now out of the set, and carry a minor change from v6 to v7, I
have removed the Tested-by: tags, sorry.

The script which I have used to verify v7 is below [2]; it will
eventually become part of fstests once we settle on the RFC.

[2]
----------------
create_err_dev_raid1()
{
        dm_backing_dev="/dev/sdd"
        blk_dev_size=`blockdev --getsz $dm_backing_dev`
        dmerror_dev="/dev/mapper/dm-sdd"
        dmlinear_table="0 $blk_dev_size linear $dm_backing_dev 0"
        dmerror_table="0 $blk_dev_size error $dm_backing_dev 0"

        echo -e dm_backing_dev'\t'= $dm_backing_dev
        echo -e blk_dev_size'\t'= $blk_dev_size
        [[ $blk_dev_size ]] || exit
        echo -e dmerror_dev'\t'= $dmerror_dev
        echo -e dmlinear_table'\t'= $dmlinear_table
        echo -e dmerror_table'\t'= $dmerror_table
        echo

        runnt "dmsetup remove dm-sdd > /dev/null 2>&1"
        run "dmsetup create dm-sdd --table '${dmlinear_table}'"

        run "mkfs.btrfs -f -draid1 -mraid1 /dev/sdc $dmerror_dev > /dev/null 2>&1"
        run mount /dev/sdc /btrfs
        run "dd if=/dev/zero of=/btrfs/tf1 bs=4096 count=100 > /dev/null 2>&1"

        run btrfs fi show -m /btrfs

        run dmsetup suspend dm-sdd
        run "dmsetup load dm-sdd --table '$dmerror_table'"
        run dmsetup resume dm-sdd
        run "dd if=/dev/zero of=/btrfs/tf1 bs=4096 count=100 > /dev/null 2>&1"

        run sleep 32
        run btrfs fi show -m /btrfs
        run cat /proc/self/mounts
}
---------------
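The script relies on `run` and `runnt` helpers whose definitions were not
posted; a minimal sketch of what they are assumed to do (the names and
behavior are guesses, `runnt` presumably tolerating failure, e.g. when
removing a dm target that may not exist yet):

```shell
# Hypothetical definitions of the helpers used in the script above --
# the actual ones were not included in the post. 'run' prints the
# command, executes it, and aborts on failure; 'runnt' executes it
# but tolerates failure.
run()
{
        echo "# $*"
        eval "$@" || { echo "failed: $*" >&2; exit 1; }
}

runnt()
{
        eval "$@" || true
}
```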

Anand Jain (2):
  btrfs: introduce device dynamic state transition to offline or failed
  btrfs: check device for critical errors and mark failed

 fs/btrfs/ctree.h   |   2 +
 fs/btrfs/disk-io.c | 101 ++++++++++++++++++++++++++++++++++++++-
 fs/btrfs/volumes.c | 135 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/volumes.h |  18 +++++++
 4 files changed, 255 insertions(+), 1 deletion(-)

-- 
2.10.0


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH 1/2 v7] btrfs: introduce device dynamic state transition to offline or failed
  2017-05-03 13:34 [PATCH 0/2] [RFC] Introduce device state 'failed' Anand Jain
@ 2017-05-03 13:34 ` Anand Jain
  2017-05-03 13:34 ` [PATCH 2/2 v7] btrfs: check device for critical errors and mark failed Anand Jain
  2017-05-03 15:31 ` [PATCH 0/2] [RFC] Introduce device state 'failed' Austin S. Hemmelgarn
  2 siblings, 0 replies; 5+ messages in thread
From: Anand Jain @ 2017-05-03 13:34 UTC (permalink / raw)
  To: linux-btrfs

From: Anand Jain <Anand.Jain@oracle.com>

This patch provides helper functions to force a device to offline
or failed, and we need these device states for the following reasons:
1) a. it can be reported that the device has failed when it does
   b. close the device when it goes offline so that the block layer
      can clean up
2) identify the candidate for auto replace
3) avoid further commit errors being reported against the failing
   device
4) a device in a multi-device btrfs may go offline from the system
   (but as of now, in some system configurations, btrfs gets unmounted
    in this context, which is not correct behavior)

Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
v7:
 . Set degraded mount flag when volume is degraded due to disk failure
 . Use fs_info->num_tolerated_disk_barrier_failures for now and update
   this later based on which approach finally makes it to the mainline.
 . Removed: (as this is out of the set)
 Tested-by: Austin S. Hemmelgarn <ahferroin7@gmail.com>
 Tested-by: Yauhen Kharuzhy <yauhen.kharuzhy@zavadatar.com>

v6: Changes on top of
    btrfs: rename btrfs_std_error to btrfs_handle_fs_error

 fs/btrfs/volumes.c | 134 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/volumes.h |  14 ++++++
 2 files changed, 148 insertions(+)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index ab8a66d852f9..609ed3d924c3 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -7200,3 +7200,137 @@ void btrfs_reset_fs_info_ptr(struct btrfs_fs_info *fs_info)
 		fs_devices = fs_devices->seed;
 	}
 }
+
+static void __close_device(struct work_struct *work)
+{
+	struct btrfs_device *device;
+
+	device = container_of(work, struct btrfs_device, rcu_work);
+
+	if (device->closing_bdev)
+		blkdev_put(device->closing_bdev, device->mode);
+
+	device->closing_bdev = NULL;
+}
+
+static void close_device(struct rcu_head *head)
+{
+	struct btrfs_device *device;
+
+	device = container_of(head, struct btrfs_device, rcu);
+
+	INIT_WORK(&device->rcu_work, __close_device);
+	schedule_work(&device->rcu_work);
+}
+
+void device_force_close(struct btrfs_device *device)
+{
+	struct btrfs_fs_devices *fs_devices;
+
+	fs_devices = device->fs_devices;
+
+	mutex_lock(&fs_devices->device_list_mutex);
+	mutex_lock(&fs_devices->fs_info->chunk_mutex);
+	spin_lock(&fs_devices->fs_info->free_chunk_lock);
+
+	btrfs_assign_next_active_device(fs_devices->fs_info, device, NULL);
+
+	if (device->bdev)
+		fs_devices->open_devices--;
+
+	if (device->writeable) {
+		list_del_init(&device->dev_alloc_list);
+		fs_devices->rw_devices--;
+	}
+	device->writeable = 0;
+
+	/*
+	 * fixme: works for now, but it is better to keep the missing
+	 * and offline states distinct, and to update the rest of the
+	 * places which currently check only for missing and not for
+	 * failed or offline.
+	 */
+	device->missing = 1;
+	fs_devices->missing_devices++;
+	device->closing_bdev = device->bdev;
+	device->bdev = NULL;
+
+	call_rcu(&device->rcu, close_device);
+
+	spin_unlock(&fs_devices->fs_info->free_chunk_lock);
+	mutex_unlock(&fs_devices->fs_info->chunk_mutex);
+	mutex_unlock(&fs_devices->device_list_mutex);
+
+	rcu_barrier();
+}
+
+void btrfs_device_enforce_state(struct btrfs_device *dev, char *why)
+{
+	int tolerance;
+	bool degrade_option;
+	char dev_status[10];
+	char chunk_status[25];
+	struct btrfs_fs_info *fs_info;
+	struct btrfs_fs_devices *fs_devices;
+
+	fs_devices = dev->fs_devices;
+	fs_info = fs_devices->fs_info;
+	degrade_option = btrfs_test_opt(fs_info, DEGRADED);
+
+	/* todo: support seed later */
+	if (fs_devices->seeding)
+		return;
+
+	/* this shouldn't be called if device is already missing */
+	if (dev->missing || !dev->bdev)
+		return;
+
+	if (dev->offline || dev->failed)
+		return;
+
+	/* The last RW device is asked to force close; let the FS handle it */
+	if (fs_devices->rw_devices == 1) {
+		btrfs_handle_fs_error(fs_info, -EIO,
+			"force offline last RW device");
+		return;
+	}
+
+	if (!strcmp(why, "offline"))
+		dev->offline = 1;
+	else if (!strcmp(why, "failed"))
+		dev->failed = 1;
+	else
+		return;
+
+	/*
+	 * From here on, there shouldn't be any reason why we can't
+	 * force close this device
+	 */
+	btrfs_sysfs_rm_device_link(fs_devices, dev);
+	device_force_close(dev);
+	strcpy(dev_status, "closed");
+
+	tolerance = fs_info->num_tolerated_disk_barrier_failures -
+				fs_info->fs_devices->missing_devices;
+	if (tolerance < 0) {
+		strncpy(chunk_status, "chunk(s) failed", 25);
+	} else {
+		strncpy(chunk_status, "chunk(s) degraded", 25);
+		/*
+		 * don't remount, that will jitter the application
+		 * IO workload performance, which is not acceptable
+		 */
+		btrfs_set_opt(fs_info->mount_opt, DEGRADED);
+	}
+
+	btrfs_warn_in_rcu(fs_info, "device %s marked %s, %s, %s",
+		rcu_str_deref(dev->name), why, dev_status, chunk_status);
+	btrfs_info_in_rcu(fs_info,
+		"num_devices %llu rw_devices %llu degraded-option: %s",
+		fs_devices->num_devices, fs_devices->rw_devices,
+		degrade_option ? "set" : "unset");
+
+	if (tolerance < 0)
+		btrfs_handle_fs_error(fs_info, -EIO,
+					"devices below critical level");
+}
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 9c09dcd96e5d..10818974ed07 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -65,13 +65,26 @@ struct btrfs_device {
 	struct btrfs_pending_bios pending_sync_bios;
 
 	struct block_device *bdev;
+	struct block_device *closing_bdev;
 
 	/* the mode sent to blkdev_get */
 	fmode_t mode;
 
 	int writeable;
 	int in_fs_metadata;
+	/* missing: device wasn't found at the time of mount */
 	int missing;
+	/* failed: device confirmed to have experienced critical io failure */
+	int failed;
+	/*
+	 * offline: the system, the user, or the block layer transport
+	 * has offlined a device which was once present, without going
+	 * through unmount. Implies an interim communication breakdown
+	 * and not necessarily a candidate for device replace; the
+	 * device might come back online after user intervention or
+	 * after block transport layer error recovery.
+	 */
+	int offline;
 	int can_discard;
 	int is_tgtdev_for_dev_replace;
 	int last_flush_error;
@@ -538,5 +551,6 @@ void btrfs_update_commit_device_bytes_used(struct btrfs_fs_info *fs_info,
 struct list_head *btrfs_get_fs_uuids(void);
 void btrfs_set_fs_info_ptr(struct btrfs_fs_info *fs_info);
 void btrfs_reset_fs_info_ptr(struct btrfs_fs_info *fs_info);
+void btrfs_device_enforce_state(struct btrfs_device *dev, char *why);
 
 #endif
-- 
2.10.0



* [PATCH 2/2 v7] btrfs: check device for critical errors and mark failed
  2017-05-03 13:34 [PATCH 0/2] [RFC] Introduce device state 'failed' Anand Jain
  2017-05-03 13:34 ` [PATCH 1/2 v7] btrfs: introduce device dynamic state transition to offline or failed Anand Jain
@ 2017-05-03 13:34 ` Anand Jain
  2017-05-03 15:31 ` [PATCH 0/2] [RFC] Introduce device state 'failed' Austin S. Hemmelgarn
  2 siblings, 0 replies; 5+ messages in thread
From: Anand Jain @ 2017-05-03 13:34 UTC (permalink / raw)
  To: linux-btrfs

From: Anand Jain <Anand.Jain@oracle.com>

Write and flush errors are considered critical errors; upon
them the device will be brought offline and marked as failed.
Write and flush errors are identified using the device error
statistics. This is monitored by a kthread, btrfs_health.
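For background (not part of the patch): the write/flush counters consulted
here are the same per-device statistics that btrfs-progs exposes via
`btrfs device stats`. A sketch of how a user-land script might sum the
critical counters; the sample output below is illustrative and hard-coded
so the snippet is self-contained:

```shell
# Sum the critical (write/flush) error counters from
# 'btrfs device stats'-style output. Sample output is hard-coded here;
# on a real system replace it with: stats="$(btrfs device stats /btrfs)"
stats='[/dev/sdc].write_io_errs   0
[/dev/sdc].read_io_errs    0
[/dev/sdc].flush_io_errs   3
[/dev/sdc].corruption_errs 0
[/dev/sdc].generation_errs 0'

critical=$(printf '%s\n' "$stats" |
        awk '/write_io_errs|flush_io_errs/ { sum += $NF } END { print sum + 0 }')
echo "critical errors: $critical"   # -> critical errors: 3
```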

Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
V7:
 . Now out of its set.
 . Removed:
 Tested-by: Austin S. Hemmelgarn <ahferroin7@gmail.com>
 Tested-by: Yauhen Kharuzhy <yauhen.kharuzhy@zavadatar.com>

V6: Fix the case where the fail monitor could clash with a user
    initiated device operation.

 fs/btrfs/ctree.h   |   2 ++
 fs/btrfs/disk-io.c | 101 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
 fs/btrfs/volumes.c |   1 +
 fs/btrfs/volumes.h |   4 +++
 4 files changed, 107 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index c4115901d906..db519f0ebcb0 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -814,6 +814,7 @@ struct btrfs_fs_info {
 	struct mutex tree_log_mutex;
 	struct mutex transaction_kthread_mutex;
 	struct mutex cleaner_mutex;
+	struct mutex health_mutex;
 	struct mutex chunk_mutex;
 	struct mutex volume_mutex;
 
@@ -931,6 +932,7 @@ struct btrfs_fs_info {
 	struct btrfs_workqueue *extent_workers;
 	struct task_struct *transaction_kthread;
 	struct task_struct *cleaner_kthread;
+	struct task_struct *health_kthread;
 	int thread_pool_size;
 
 	struct kobject *space_info_kobj;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index efdd16294e60..9f16c7d2c191 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1936,6 +1936,93 @@ static int cleaner_kthread(void *arg)
 	return 0;
 }
 
+/*
+ * returns:
+ * < 0 : Check didn't run, std error
+ *   0 : No errors found
+ * > 0 : # of devices having fatal errors
+ */
+static int btrfs_update_devices_health(struct btrfs_root *root)
+{
+	int ret = 0;
+	struct btrfs_device *device;
+	struct btrfs_fs_info *fs_info = root->fs_info;
+
+	if (btrfs_fs_closing(fs_info))
+		return -EBUSY;
+
+	/* mark disk(s) with write or flush error(s) as failed */
+	mutex_lock(&fs_info->volume_mutex);
+	list_for_each_entry_rcu(device,
+			&fs_info->fs_devices->devices, dev_list) {
+		int c_err;
+
+		if (device->failed) {
+			ret++;
+			continue;
+		}
+
+		/*
+		 * todo: replace target device's write/flush error,
+		 * skip for now
+		 */
+		if (device->is_tgtdev_for_dev_replace)
+			continue;
+
+		if (!device->dev_stats_valid)
+			continue;
+
+		c_err = atomic_read(&device->new_critical_errs);
+		atomic_sub(c_err, &device->new_critical_errs);
+		if (c_err) {
+			btrfs_crit_in_rcu(fs_info,
+				"fatal error on device %s",
+					rcu_str_deref(device->name));
+			btrfs_device_enforce_state(device, "failed");
+			ret++;
+		}
+	}
+	mutex_unlock(&fs_info->volume_mutex);
+
+	return ret;
+}
+
+/*
+ * Device health maintenance kthread; woken up by the transaction
+ * kthread. Once sysfs support is ready, this should publish the
+ * report through sysfs so that user-land scripts can invoke actions.
+ */
+static int health_kthread(void *arg)
+{
+	struct btrfs_root *root = arg;
+
+	do {
+		if (btrfs_need_cleaner_sleep(root->fs_info))
+			goto sleep;
+
+		if (!mutex_trylock(&root->fs_info->health_mutex))
+			goto sleep;
+
+		if (btrfs_need_cleaner_sleep(root->fs_info)) {
+			mutex_unlock(&root->fs_info->health_mutex);
+			goto sleep;
+		}
+
+		/* Check devices health */
+		btrfs_update_devices_health(root);
+
+		mutex_unlock(&root->fs_info->health_mutex);
+
+sleep:
+		set_current_state(TASK_INTERRUPTIBLE);
+		if (!kthread_should_stop())
+			schedule();
+		__set_current_state(TASK_RUNNING);
+	} while (!kthread_should_stop());
+
+	return 0;
+}
+
 static int transaction_kthread(void *arg)
 {
 	struct btrfs_root *root = arg;
@@ -1983,6 +2070,7 @@ static int transaction_kthread(void *arg)
 			btrfs_end_transaction(trans);
 		}
 sleep:
+		wake_up_process(fs_info->health_kthread);
 		wake_up_process(fs_info->cleaner_kthread);
 		mutex_unlock(&fs_info->transaction_kthread_mutex);
 
@@ -2738,6 +2826,7 @@ int open_ctree(struct super_block *sb,
 	mutex_init(&fs_info->chunk_mutex);
 	mutex_init(&fs_info->transaction_kthread_mutex);
 	mutex_init(&fs_info->cleaner_mutex);
+	mutex_init(&fs_info->health_mutex);
 	mutex_init(&fs_info->volume_mutex);
 	mutex_init(&fs_info->ro_block_group_mutex);
 	init_rwsem(&fs_info->commit_root_sem);
@@ -3074,11 +3163,16 @@ int open_ctree(struct super_block *sb,
 	if (IS_ERR(fs_info->cleaner_kthread))
 		goto fail_sysfs;
 
+	fs_info->health_kthread = kthread_run(health_kthread, tree_root,
+					       "btrfs-health");
+	if (IS_ERR(fs_info->health_kthread))
+		goto fail_cleaner;
+
 	fs_info->transaction_kthread = kthread_run(transaction_kthread,
 						   tree_root,
 						   "btrfs-transaction");
 	if (IS_ERR(fs_info->transaction_kthread))
-		goto fail_cleaner;
+		goto fail_health;
 
 	if (!btrfs_test_opt(fs_info, SSD) &&
 	    !btrfs_test_opt(fs_info, NOSSD) &&
@@ -3249,6 +3343,10 @@ int open_ctree(struct super_block *sb,
 	kthread_stop(fs_info->transaction_kthread);
 	btrfs_cleanup_transaction(fs_info);
 	btrfs_free_fs_roots(fs_info);
+
+fail_health:
+	kthread_stop(fs_info->health_kthread);
+
 fail_cleaner:
 	kthread_stop(fs_info->cleaner_kthread);
 
@@ -4022,6 +4120,7 @@ void close_ctree(struct btrfs_fs_info *fs_info)
 
 	kthread_stop(fs_info->transaction_kthread);
 	kthread_stop(fs_info->cleaner_kthread);
+	kthread_stop(fs_info->health_kthread);
 
 	set_bit(BTRFS_FS_CLOSING_DONE, &fs_info->flags);
 
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 609ed3d924c3..e0ca956a2994 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -246,6 +246,7 @@ static struct btrfs_device *__alloc_device(void)
 	spin_lock_init(&dev->reada_lock);
 	atomic_set(&dev->reada_in_flight, 0);
 	atomic_set(&dev->dev_stats_ccnt, 0);
+	atomic_set(&dev->new_critical_errs, 0);
 	btrfs_device_data_ordered_init(dev);
 	INIT_RADIX_TREE(&dev->reada_zones, GFP_NOFS & ~__GFP_DIRECT_RECLAIM);
 	INIT_RADIX_TREE(&dev->reada_extents, GFP_NOFS & ~__GFP_DIRECT_RECLAIM);
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 10818974ed07..9bb248b3fa0e 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -163,6 +163,7 @@ struct btrfs_device {
 	/* Counter to record the change of device stats */
 	atomic_t dev_stats_ccnt;
 	atomic_t dev_stat_values[BTRFS_DEV_STAT_VALUES_MAX];
+	atomic_t new_critical_errs;
 };
 
 /*
@@ -511,6 +512,9 @@ static inline void btrfs_dev_stat_inc(struct btrfs_device *dev,
 	atomic_inc(dev->dev_stat_values + index);
 	smp_mb__before_atomic();
 	atomic_inc(&dev->dev_stats_ccnt);
+	if (index == BTRFS_DEV_STAT_WRITE_ERRS ||
+		index == BTRFS_DEV_STAT_FLUSH_ERRS)
+		atomic_inc(&dev->new_critical_errs);
 }
 
 static inline int btrfs_dev_stat_read(struct btrfs_device *dev,
-- 
2.10.0



* Re: [PATCH 0/2] [RFC] Introduce device state 'failed'
  2017-05-03 13:34 [PATCH 0/2] [RFC] Introduce device state 'failed' Anand Jain
  2017-05-03 13:34 ` [PATCH 1/2 v7] btrfs: introduce device dynamic state transition to offline or failed Anand Jain
  2017-05-03 13:34 ` [PATCH 2/2 v7] btrfs: check device for critical errors and mark failed Anand Jain
@ 2017-05-03 15:31 ` Austin S. Hemmelgarn
  2017-05-03 20:57   ` Anand Jain
  2 siblings, 1 reply; 5+ messages in thread
From: Austin S. Hemmelgarn @ 2017-05-03 15:31 UTC (permalink / raw)
  To: Anand Jain, linux-btrfs

On 2017-05-03 09:34, Anand Jain wrote:
> As the two patches below are about managing a failed disk,
> I have separated them from the spare disk and auto replace
> support patch set which was sent earlier here [1].
>
>     [1] https://lwn.net/Articles/684195/
>
> The v7 changes are very limited in these individual patches,
> but they add the degraded mount option without going through a
> remount. This is sent for your comments (RFC).
>
> That set [1] was extensively tested; however, as these two patches
> are now out of the set, and carry a minor change from v6 to v7, I
> have removed the Tested-by: tags, sorry.
Entirely understood.
> [...]
>
All my tests passed, and manual testing shows that it does as 
advertised, so for the series as a whole you can add:
Tested-by: Austin S. Hemmelgarn <ahferroin7@gmail.com>


* Re: [PATCH 0/2] [RFC] Introduce device state 'failed'
  2017-05-03 15:31 ` [PATCH 0/2] [RFC] Introduce device state 'failed' Austin S. Hemmelgarn
@ 2017-05-03 20:57   ` Anand Jain
  0 siblings, 0 replies; 5+ messages in thread
From: Anand Jain @ 2017-05-03 20:57 UTC (permalink / raw)
  To: Austin S. Hemmelgarn; +Cc: linux-btrfs



On 05/03/2017 11:31 PM, Austin S. Hemmelgarn wrote:
> On 2017-05-03 09:34, Anand Jain wrote:
>> [...]
>>
> All my tests passed, and manual testing shows that it does as
> advertised, so for the series as a whole you can add:
> Tested-by: Austin S. Hemmelgarn <ahferroin7@gmail.com>

Thanks!
  Anand


