* [PATCH v2 0/3] dm-zoned: harden error handling
@ 2019-08-10 21:43 Dmitry Fomichev
  2019-08-10 21:43 ` [PATCH v2 1/3] dm-zoned: improve error handling in reclaim Dmitry Fomichev
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Dmitry Fomichev @ 2019-08-10 21:43 UTC
  To: dm-devel, Mike Snitzer; +Cc: Damien Le Moal

This series contains a few patches aimed at making dm-zoned properly
handle failures of the backing zoned storage device.

The following test environment was used to trigger device failures:
a 100% write file I/O fio workload was run against an XFS volume
created on top of a dm-zoned target device. The dm-zoned device,
in turn, was created on top of a zoned device provisioned by the
tcmu-runner ZBC file handler. The backing storage file for the
tcmu-runner handler was created on a separate ext4 file system.

Two different failure modes were tested:
1. Removal of the tcmu-runner device under I/O
2. Complete failure of the backing storage of tcmu-runner under I/O

In both of these test cases, dm-zoned was observed to hang, and the
only way to recover from that state was to reboot the test system.
The fio test script would also hang indefinitely.

With these patches in place, the test script properly reports the
error and exits. After that, it is possible to remove the dm-zoned
target device and then recreate it on top of working backing storage
without rebooting the system.

v1 -> v2:	- don't set the dm-zoned disk read-only after a backing device failure

Dmitry Fomichev (3):
  dm-zoned: improve error handling in reclaim
  dm-zoned: improve error handling in i/o map code
  dm-zoned: properly handle backing device failure

 drivers/md/dm-zoned-metadata.c | 55 ++++++++++++++++++++++------
 drivers/md/dm-zoned-reclaim.c  | 44 +++++++++++++++++-----
 drivers/md/dm-zoned-target.c   | 67 ++++++++++++++++++++++++++++++----
 drivers/md/dm-zoned.h          | 10 +++++
 4 files changed, 146 insertions(+), 30 deletions(-)

-- 
2.21.0


* [PATCH v2 1/3] dm-zoned: improve error handling in reclaim
  2019-08-10 21:43 [PATCH v2 0/3] dm-zoned: harden error handling Dmitry Fomichev
@ 2019-08-10 21:43 ` Dmitry Fomichev
  2019-08-10 21:43 ` [PATCH v2 2/3] dm-zoned: improve error handling in i/o map code Dmitry Fomichev
  2019-08-10 21:43 ` [PATCH v2 3/3] dm-zoned: properly handle backing device failure Dmitry Fomichev
  2 siblings, 0 replies; 5+ messages in thread
From: Dmitry Fomichev @ 2019-08-10 21:43 UTC
  To: dm-devel, Mike Snitzer; +Cc: Damien Le Moal

There are several places in the reclaim code where errors are not
propagated to the main function, dmz_reclaim(). This function is
responsible for unlocking zones that might still be locked at the
end of a failed reclaim iteration. As a result, some device zones
may be left permanently locked for reclaim, degrading the target's
ability to reclaim zones.

This patch fixes these issues as follows:

Make sure that dmz_reclaim_buf(), dmz_reclaim_seq_data() and
dmz_reclaim_rnd_data() return error codes to the caller.

The dmz_reclaim() function is renamed to dmz_do_reclaim() to avoid
clashing with "struct dmz_reclaim" and is modified to return the
error to the caller.

dmz_get_zone_for_reclaim() now returns an error code instead of a
NULL pointer, and the reclaim code checks for that error.

Error logging/debug messages are added where necessary.
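
As an illustration of the idiom this patch switches to, here is a
minimal sketch (the helper names are hypothetical, not code from this
series) of carrying an error code inside a pointer return value with
the ERR_PTR()/IS_ERR()/PTR_ERR() helpers from <linux/err.h>:

  #include <linux/err.h>
  #include <linux/list.h>

  /* Return a zone, or an error code encoded in the pointer value. */
  static struct dm_zone *pick_zone(struct dmz_metadata *zmd)
  {
          if (list_empty(&zmd->map_rnd_list))
                  return ERR_PTR(-EBUSY); /* no reclaim candidate */
          return list_first_entry(&zmd->map_rnd_list,
                                  struct dm_zone, link);
  }

  static int reclaim_one_zone(struct dmz_metadata *zmd)
  {
          struct dm_zone *zone = pick_zone(zmd);

          if (IS_ERR(zone))
                  return PTR_ERR(zone); /* propagate -EBUSY upwards */
          /* ... perform the actual reclaim work ... */
          return 0;
  }

This lets the caller distinguish "no candidate zone available" from
success, instead of silently bailing out on a NULL pointer.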

Fixes: 3b1a94c88b79 ("dm zoned: drive-managed zoned block device target")
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
---
 drivers/md/dm-zoned-metadata.c |  4 ++--
 drivers/md/dm-zoned-reclaim.c  | 28 +++++++++++++++++++---------
 2 files changed, 21 insertions(+), 11 deletions(-)

diff --git a/drivers/md/dm-zoned-metadata.c b/drivers/md/dm-zoned-metadata.c
index ded4984d18c9..6b7fbbd735ef 100644
--- a/drivers/md/dm-zoned-metadata.c
+++ b/drivers/md/dm-zoned-metadata.c
@@ -1543,7 +1543,7 @@ static struct dm_zone *dmz_get_rnd_zone_for_reclaim(struct dmz_metadata *zmd)
 	struct dm_zone *zone;
 
 	if (list_empty(&zmd->map_rnd_list))
-		return NULL;
+		return ERR_PTR(-EBUSY);
 
 	list_for_each_entry(zone, &zmd->map_rnd_list, link) {
 		if (dmz_is_buf(zone))
@@ -1554,7 +1554,7 @@ static struct dm_zone *dmz_get_rnd_zone_for_reclaim(struct dmz_metadata *zmd)
 			return dzone;
 	}
 
-	return NULL;
+	return ERR_PTR(-EBUSY);
 }
 
 /*
diff --git a/drivers/md/dm-zoned-reclaim.c b/drivers/md/dm-zoned-reclaim.c
index 260e3598199e..26e34493a2db 100644
--- a/drivers/md/dm-zoned-reclaim.c
+++ b/drivers/md/dm-zoned-reclaim.c
@@ -216,7 +216,7 @@ static int dmz_reclaim_buf(struct dmz_reclaim *zrc, struct dm_zone *dzone)
 
 	dmz_unlock_flush(zmd);
 
-	return 0;
+	return ret;
 }
 
 /*
@@ -260,7 +260,7 @@ static int dmz_reclaim_seq_data(struct dmz_reclaim *zrc, struct dm_zone *dzone)
 
 	dmz_unlock_flush(zmd);
 
-	return 0;
+	return ret;
 }
 
 /*
@@ -313,7 +313,7 @@ static int dmz_reclaim_rnd_data(struct dmz_reclaim *zrc, struct dm_zone *dzone)
 
 	dmz_unlock_flush(zmd);
 
-	return 0;
+	return ret;
 }
 
 /*
@@ -335,7 +335,7 @@ static void dmz_reclaim_empty(struct dmz_reclaim *zrc, struct dm_zone *dzone)
 /*
  * Find a candidate zone for reclaim and process it.
  */
-static void dmz_reclaim(struct dmz_reclaim *zrc)
+static int dmz_do_reclaim(struct dmz_reclaim *zrc)
 {
 	struct dmz_metadata *zmd = zrc->metadata;
 	struct dm_zone *dzone;
@@ -345,8 +345,8 @@ static void dmz_reclaim(struct dmz_reclaim *zrc)
 
 	/* Get a data zone */
 	dzone = dmz_get_zone_for_reclaim(zmd);
-	if (!dzone)
-		return;
+	if (IS_ERR(dzone))
+		return PTR_ERR(dzone);
 
 	start = jiffies;
 
@@ -392,13 +392,20 @@ static void dmz_reclaim(struct dmz_reclaim *zrc)
 out:
 	if (ret) {
 		dmz_unlock_zone_reclaim(dzone);
-		return;
+		return ret;
 	}
 
-	(void) dmz_flush_metadata(zrc->metadata);
+	ret = dmz_flush_metadata(zrc->metadata);
+	if (ret) {
+		dmz_dev_debug(zrc->dev,
+			      "Metadata flush for zone %u failed, err %d\n",
+			      dmz_id(zmd, rzone), ret);
+		return ret;
+	}
 
 	dmz_dev_debug(zrc->dev, "Reclaimed zone %u in %u ms",
 		      dmz_id(zmd, rzone), jiffies_to_msecs(jiffies - start));
+	return 0;
 }
 
 /*
@@ -443,6 +450,7 @@ static void dmz_reclaim_work(struct work_struct *work)
 	struct dmz_metadata *zmd = zrc->metadata;
 	unsigned int nr_rnd, nr_unmap_rnd;
 	unsigned int p_unmap_rnd;
+	int ret;
 
 	if (!dmz_should_reclaim(zrc)) {
 		mod_delayed_work(zrc->wq, &zrc->work, DMZ_IDLE_PERIOD);
@@ -472,7 +480,9 @@ static void dmz_reclaim_work(struct work_struct *work)
 		      (dmz_target_idle(zrc) ? "Idle" : "Busy"),
 		      p_unmap_rnd, nr_unmap_rnd, nr_rnd);
 
-	dmz_reclaim(zrc);
+	ret = dmz_do_reclaim(zrc);
+	if (ret)
+		dmz_dev_debug(zrc->dev, "Reclaim error %d\n", ret);
 
 	dmz_schedule_reclaim(zrc);
 }
-- 
2.21.0


* [PATCH v2 2/3] dm-zoned: improve error handling in i/o map code
  2019-08-10 21:43 [PATCH v2 0/3] dm-zoned: harden error handling Dmitry Fomichev
  2019-08-10 21:43 ` [PATCH v2 1/3] dm-zoned: improve error handling in reclaim Dmitry Fomichev
@ 2019-08-10 21:43 ` Dmitry Fomichev
  2019-08-10 21:43 ` [PATCH v2 3/3] dm-zoned: properly handle backing device failure Dmitry Fomichev
  2 siblings, 0 replies; 5+ messages in thread
From: Dmitry Fomichev @ 2019-08-10 21:43 UTC
  To: dm-devel, Mike Snitzer; +Cc: Damien Le Moal

Some errors are ignored in the I/O path while queueing chunks for
processing by chunk works. Since at least some of these errors are
transient in nature, it should be possible to retry the failed
incoming commands.

The fix:

Errors that can happen while queueing chunks are carried upwards to
the main mapping function, which now returns DM_MAPIO_REQUEUE for
any incoming request that cannot be properly queued.

Error logging/debug messages are added where needed.
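
For reference, the resulting control flow in the map path looks
roughly like this (a condensed sketch; example_map() is illustrative,
while dmz_queue_chunk_work() and the return codes are the real ones
used in the diff below):

  #include <linux/device-mapper.h>

  /* Condensed sketch of the map path after this patch. */
  static int example_map(struct dm_target *ti, struct bio *bio)
  {
          struct dmz_target *dmz = ti->private;
          int ret;

          ret = dmz_queue_chunk_work(dmz, bio); /* may fail, e.g. -ENOMEM */
          if (ret)
                  return DM_MAPIO_REQUEUE; /* let DM core retry the BIO */

          return DM_MAPIO_SUBMITTED; /* the target now owns the BIO */
  }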

Fixes: 3b1a94c88b79 ("dm zoned: drive-managed zoned block device target")
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
---
 drivers/md/dm-zoned-target.c | 22 ++++++++++++++++------
 1 file changed, 16 insertions(+), 6 deletions(-)

diff --git a/drivers/md/dm-zoned-target.c b/drivers/md/dm-zoned-target.c
index 1bf6e6eebee1..c1992034c099 100644
--- a/drivers/md/dm-zoned-target.c
+++ b/drivers/md/dm-zoned-target.c
@@ -514,22 +514,24 @@ static void dmz_flush_work(struct work_struct *work)
  * Get a chunk work and start it to process a new BIO.
  * If the BIO chunk has no work yet, create one.
  */
-static void dmz_queue_chunk_work(struct dmz_target *dmz, struct bio *bio)
+static int dmz_queue_chunk_work(struct dmz_target *dmz, struct bio *bio)
 {
 	unsigned int chunk = dmz_bio_chunk(dmz->dev, bio);
 	struct dm_chunk_work *cw;
+	int ret = 0;
 
 	mutex_lock(&dmz->chunk_lock);
 
 	/* Get the BIO chunk work. If one is not active yet, create one */
 	cw = radix_tree_lookup(&dmz->chunk_rxtree, chunk);
 	if (!cw) {
-		int ret;
 
 		/* Create a new chunk work */
 		cw = kmalloc(sizeof(struct dm_chunk_work), GFP_NOIO);
-		if (!cw)
+		if (unlikely(!cw)) {
+			ret = -ENOMEM;
 			goto out;
+		}
 
 		INIT_WORK(&cw->work, dmz_chunk_work);
 		refcount_set(&cw->refcount, 0);
@@ -540,7 +542,6 @@ static void dmz_queue_chunk_work(struct dmz_target *dmz, struct bio *bio)
 		ret = radix_tree_insert(&dmz->chunk_rxtree, chunk, cw);
 		if (unlikely(ret)) {
 			kfree(cw);
-			cw = NULL;
 			goto out;
 		}
 	}
@@ -548,10 +549,12 @@ static void dmz_queue_chunk_work(struct dmz_target *dmz, struct bio *bio)
 	bio_list_add(&cw->bio_list, bio);
 	dmz_get_chunk_work(cw);
 
+	dmz_reclaim_bio_acc(dmz->reclaim);
 	if (queue_work(dmz->chunk_wq, &cw->work))
 		dmz_get_chunk_work(cw);
 out:
 	mutex_unlock(&dmz->chunk_lock);
+	return ret;
 }
 
 /*
@@ -565,6 +568,7 @@ static int dmz_map(struct dm_target *ti, struct bio *bio)
 	sector_t sector = bio->bi_iter.bi_sector;
 	unsigned int nr_sectors = bio_sectors(bio);
 	sector_t chunk_sector;
+	int ret;
 
 	dmz_dev_debug(dev, "BIO op %d sector %llu + %u => chunk %llu, block %llu, %u blocks",
 		      bio_op(bio), (unsigned long long)sector, nr_sectors,
@@ -602,8 +606,14 @@ static int dmz_map(struct dm_target *ti, struct bio *bio)
 		dm_accept_partial_bio(bio, dev->zone_nr_sectors - chunk_sector);
 
 	/* Now ready to handle this BIO */
-	dmz_reclaim_bio_acc(dmz->reclaim);
-	dmz_queue_chunk_work(dmz, bio);
+	ret = dmz_queue_chunk_work(dmz, bio);
+	if (ret) {
+		dmz_dev_debug(dmz->dev,
+			      "BIO op %d, can't process chunk %llu, err %i\n",
+			      bio_op(bio), (u64)dmz_bio_chunk(dmz->dev, bio),
+			      ret);
+		return DM_MAPIO_REQUEUE;
+	}
 
 	return DM_MAPIO_SUBMITTED;
 }
-- 
2.21.0


* [PATCH v2 3/3] dm-zoned: properly handle backing device failure
  2019-08-10 21:43 [PATCH v2 0/3] dm-zoned: harden error handling Dmitry Fomichev
  2019-08-10 21:43 ` [PATCH v2 1/3] dm-zoned: improve error handling in reclaim Dmitry Fomichev
  2019-08-10 21:43 ` [PATCH v2 2/3] dm-zoned: improve error handling in i/o map code Dmitry Fomichev
@ 2019-08-10 21:43 ` Dmitry Fomichev
  2019-08-12 18:43   ` Damien Le Moal
  2 siblings, 1 reply; 5+ messages in thread
From: Dmitry Fomichev @ 2019-08-10 21:43 UTC
  To: dm-devel, Mike Snitzer; +Cc: Damien Le Moal

dm-zoned has been observed to lock up or livelock in case of a
hardware failure or misconfiguration of the backing zoned device.

This patch adds a new dm-zoned target function that checks the status
of the backing device. If the request queue of the backing device is
found to be in the dying state, or the SCSI backing device enters the
offline state, the health check code sets a dm-zoned target flag
prompting all further incoming I/O to be rejected. In order to detect
backing device failures in a timely manner, this new function is
called in the request mapping path, at the beginning of every reclaim
run and before performing any metadata I/O.
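
In sketch form, the new check boils down to the following (condensed
from dmz_bdev_is_dying() in the diff below; the helper name is
illustrative and the SCSI offline detection is omitted here):

  #include <linux/blkdev.h>

  /*
   * Once the backing request queue reports that it is dying, cache
   * that fact in a device flag so that every later call can fail
   * I/O quickly without touching the queue again.
   */
  static bool backing_dev_dying(struct dmz_dev *dev)
  {
          if (!(dev->flags & DMZ_BDEV_DYING) &&
              blk_queue_dying(bdev_get_queue(dev->bdev)))
                  dev->flags |= DMZ_BDEV_DYING;

          return dev->flags & DMZ_BDEV_DYING;
  }

Entry points then gate on this check: dmz_map() returns DM_MAPIO_KILL
and the metadata and reclaim paths return -EIO once the flag is set.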

The proper way out of this situation is to run

dmsetup remove <dm-zoned target>

and then recreate the target once the problem with the backing device
is resolved.

Fixes: 3b1a94c88b79 ("dm zoned: drive-managed zoned block device target")
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
---
 drivers/md/dm-zoned-metadata.c | 51 +++++++++++++++++++++++++++-------
 drivers/md/dm-zoned-reclaim.c  | 18 ++++++++++--
 drivers/md/dm-zoned-target.c   | 45 ++++++++++++++++++++++++++++--
 drivers/md/dm-zoned.h          | 10 +++++++
 4 files changed, 110 insertions(+), 14 deletions(-)

diff --git a/drivers/md/dm-zoned-metadata.c b/drivers/md/dm-zoned-metadata.c
index 6b7fbbd735ef..2a5bc51fd6d5 100644
--- a/drivers/md/dm-zoned-metadata.c
+++ b/drivers/md/dm-zoned-metadata.c
@@ -403,15 +403,18 @@ static struct dmz_mblock *dmz_get_mblock_slow(struct dmz_metadata *zmd,
 	sector_t block = zmd->sb[zmd->mblk_primary].block + mblk_no;
 	struct bio *bio;
 
+	if (dmz_bdev_is_dying(zmd->dev))
+		return ERR_PTR(-EIO);
+
 	/* Get a new block and a BIO to read it */
 	mblk = dmz_alloc_mblock(zmd, mblk_no);
 	if (!mblk)
-		return NULL;
+		return ERR_PTR(-ENOMEM);
 
 	bio = bio_alloc(GFP_NOIO, 1);
 	if (!bio) {
 		dmz_free_mblock(zmd, mblk);
-		return NULL;
+		return ERR_PTR(-ENOMEM);
 	}
 
 	spin_lock(&zmd->mblk_lock);
@@ -542,8 +545,8 @@ static struct dmz_mblock *dmz_get_mblock(struct dmz_metadata *zmd,
 	if (!mblk) {
 		/* Cache miss: read the block from disk */
 		mblk = dmz_get_mblock_slow(zmd, mblk_no);
-		if (!mblk)
-			return ERR_PTR(-ENOMEM);
+		if (IS_ERR(mblk))
+			return mblk;
 	}
 
 	/* Wait for on-going read I/O and check for error */
@@ -571,16 +574,19 @@ static void dmz_dirty_mblock(struct dmz_metadata *zmd, struct dmz_mblock *mblk)
 /*
  * Issue a metadata block write BIO.
  */
-static void dmz_write_mblock(struct dmz_metadata *zmd, struct dmz_mblock *mblk,
-			     unsigned int set)
+static int dmz_write_mblock(struct dmz_metadata *zmd, struct dmz_mblock *mblk,
+			    unsigned int set)
 {
 	sector_t block = zmd->sb[set].block + mblk->no;
 	struct bio *bio;
 
+	if (dmz_bdev_is_dying(zmd->dev))
+		return -EIO;
+
 	bio = bio_alloc(GFP_NOIO, 1);
 	if (!bio) {
 		set_bit(DMZ_META_ERROR, &mblk->state);
-		return;
+		return -ENOMEM;
 	}
 
 	set_bit(DMZ_META_WRITING, &mblk->state);
@@ -592,6 +598,8 @@ static void dmz_write_mblock(struct dmz_metadata *zmd, struct dmz_mblock *mblk,
 	bio_set_op_attrs(bio, REQ_OP_WRITE, REQ_META | REQ_PRIO);
 	bio_add_page(bio, mblk->page, DMZ_BLOCK_SIZE, 0);
 	submit_bio(bio);
+
+	return 0;
 }
 
 /*
@@ -603,6 +611,9 @@ static int dmz_rdwr_block(struct dmz_metadata *zmd, int op, sector_t block,
 	struct bio *bio;
 	int ret;
 
+	if (dmz_bdev_is_dying(zmd->dev))
+		return -EIO;
+
 	bio = bio_alloc(GFP_NOIO, 1);
 	if (!bio)
 		return -ENOMEM;
@@ -660,22 +671,29 @@ static int dmz_write_dirty_mblocks(struct dmz_metadata *zmd,
 {
 	struct dmz_mblock *mblk;
 	struct blk_plug plug;
-	int ret = 0;
+	int ret = 0, nr_mblks_submitted = 0;
 
 	/* Issue writes */
 	blk_start_plug(&plug);
-	list_for_each_entry(mblk, write_list, link)
-		dmz_write_mblock(zmd, mblk, set);
+	list_for_each_entry(mblk, write_list, link) {
+		ret = dmz_write_mblock(zmd, mblk, set);
+		if (ret)
+			break;
+		nr_mblks_submitted++;
+	}
 	blk_finish_plug(&plug);
 
 	/* Wait for completion */
 	list_for_each_entry(mblk, write_list, link) {
+		if (!nr_mblks_submitted)
+			break;
 		wait_on_bit_io(&mblk->state, DMZ_META_WRITING,
 			       TASK_UNINTERRUPTIBLE);
 		if (test_bit(DMZ_META_ERROR, &mblk->state)) {
 			clear_bit(DMZ_META_ERROR, &mblk->state);
 			ret = -EIO;
 		}
+		nr_mblks_submitted--;
 	}
 
 	/* Flush drive cache (this will also sync data) */
@@ -737,6 +755,11 @@ int dmz_flush_metadata(struct dmz_metadata *zmd)
 	 */
 	dmz_lock_flush(zmd);
 
+	if (dmz_bdev_is_dying(zmd->dev)) {
+		ret = -EIO;
+		goto out;
+	}
+
 	/* Get dirty blocks */
 	spin_lock(&zmd->mblk_lock);
 	list_splice_init(&zmd->mblk_dirty_list, &write_list);
@@ -1632,6 +1655,10 @@ struct dm_zone *dmz_get_chunk_mapping(struct dmz_metadata *zmd, unsigned int chu
 		/* Allocate a random zone */
 		dzone = dmz_alloc_zone(zmd, DMZ_ALLOC_RND);
 		if (!dzone) {
+			if (dmz_bdev_is_dying(zmd->dev)) {
+				dzone = ERR_PTR(-EIO);
+				goto out;
+			}
 			dmz_wait_for_free_zones(zmd);
 			goto again;
 		}
@@ -1729,6 +1756,10 @@ struct dm_zone *dmz_get_chunk_buffer(struct dmz_metadata *zmd,
 	/* Allocate a random zone */
 	bzone = dmz_alloc_zone(zmd, DMZ_ALLOC_RND);
 	if (!bzone) {
+		if (dmz_bdev_is_dying(zmd->dev)) {
+			bzone = ERR_PTR(-EIO);
+			goto out;
+		}
 		dmz_wait_for_free_zones(zmd);
 		goto again;
 	}
diff --git a/drivers/md/dm-zoned-reclaim.c b/drivers/md/dm-zoned-reclaim.c
index 26e34493a2db..d240d7ca8a8a 100644
--- a/drivers/md/dm-zoned-reclaim.c
+++ b/drivers/md/dm-zoned-reclaim.c
@@ -38,7 +38,7 @@ enum {
 /*
  * Number of seconds of target BIO inactivity to consider the target idle.
  */
-#define DMZ_IDLE_PERIOD		(10UL * HZ)
+#define DMZ_IDLE_PERIOD			(10UL * HZ)
 
 /*
  * Percentage of unmapped (free) random zones below which reclaim starts
@@ -135,6 +135,9 @@ static int dmz_reclaim_copy(struct dmz_reclaim *zrc,
 		set_bit(DM_KCOPYD_WRITE_SEQ, &flags);
 
 	while (block < end_block) {
+		if (dev->flags & DMZ_BDEV_DYING)
+			return -EIO;
+
 		/* Get a valid region from the source zone */
 		ret = dmz_first_valid_block(zmd, src_zone, &block);
 		if (ret <= 0)
@@ -452,6 +455,9 @@ static void dmz_reclaim_work(struct work_struct *work)
 	unsigned int p_unmap_rnd;
 	int ret;
 
+	if (dmz_bdev_is_dying(zrc->dev))
+		return;
+
 	if (!dmz_should_reclaim(zrc)) {
 		mod_delayed_work(zrc->wq, &zrc->work, DMZ_IDLE_PERIOD);
 		return;
@@ -481,8 +487,16 @@ static void dmz_reclaim_work(struct work_struct *work)
 		      p_unmap_rnd, nr_unmap_rnd, nr_rnd);
 
 	ret = dmz_do_reclaim(zrc);
-	if (ret)
+	if (ret) {
 		dmz_dev_debug(zrc->dev, "Reclaim error %d\n", ret);
+		if (ret == -EIO)
+			/*
+			 * LLD might be performing some error handling sequence
+			 * at the underlying device. To not interfere, do not
+			 * attempt to schedule the next reclaim run immediately.
+			 */
+			return;
+	}
 
 	dmz_schedule_reclaim(zrc);
 }
diff --git a/drivers/md/dm-zoned-target.c b/drivers/md/dm-zoned-target.c
index c1992034c099..31478fef6032 100644
--- a/drivers/md/dm-zoned-target.c
+++ b/drivers/md/dm-zoned-target.c
@@ -134,6 +134,8 @@ static int dmz_submit_bio(struct dmz_target *dmz, struct dm_zone *zone,
 
 	refcount_inc(&bioctx->ref);
 	generic_make_request(clone);
+	if (clone->bi_status == BLK_STS_IOERR)
+		return -EIO;
 
 	if (bio_op(bio) == REQ_OP_WRITE && dmz_is_seq(zone))
 		zone->wp_block += nr_blocks;
@@ -278,8 +280,8 @@ static int dmz_handle_buffered_write(struct dmz_target *dmz,
 
 	/* Get the buffer zone. One will be allocated if needed */
 	bzone = dmz_get_chunk_buffer(zmd, zone);
-	if (!bzone)
-		return -ENOSPC;
+	if (IS_ERR(bzone))
+		return PTR_ERR(bzone);
 
 	if (dmz_is_readonly(bzone))
 		return -EROFS;
@@ -390,6 +392,11 @@ static void dmz_handle_bio(struct dmz_target *dmz, struct dm_chunk_work *cw,
 
 	dmz_lock_metadata(zmd);
 
+	if (dmz->dev->flags & DMZ_BDEV_DYING) {
+		ret = -EIO;
+		goto out;
+	}
+
 	/*
 	 * Get the data zone mapping the chunk. There may be no
 	 * mapping for read and discard. If a mapping is obtained,
@@ -494,6 +501,8 @@ static void dmz_flush_work(struct work_struct *work)
 
 	/* Flush dirty metadata blocks */
 	ret = dmz_flush_metadata(dmz->metadata);
+	if (ret)
+		dmz_dev_debug(dmz->dev, "Metadata flush failed, rc=%d\n", ret);
 
 	/* Process queued flush requests */
 	while (1) {
@@ -557,6 +566,32 @@ static int dmz_queue_chunk_work(struct dmz_target *dmz, struct bio *bio)
 	return ret;
 }
 
+/*
+ * Check the backing device availability. If it's on the way out,
+ * start failing I/O. Reclaim and metadata components also call this
+ * function to cleanly abort operation in the event of such failure.
+ */
+bool dmz_bdev_is_dying(struct dmz_dev *dmz_dev)
+{
+	struct gendisk *disk;
+
+	if (!(dmz_dev->flags & DMZ_BDEV_DYING)) {
+		disk = dmz_dev->bdev->bd_disk;
+		if (blk_queue_dying(bdev_get_queue(dmz_dev->bdev))) {
+			dmz_dev_warn(dmz_dev, "Backing device queue dying");
+			dmz_dev->flags |= DMZ_BDEV_DYING;
+		} else if (disk->fops->check_events) {
+			if (disk->fops->check_events(disk, 0) &
+					DISK_EVENT_MEDIA_CHANGE) {
+				dmz_dev_warn(dmz_dev, "Backing device offline");
+				dmz_dev->flags |= DMZ_BDEV_DYING;
+			}
+		}
+	}
+
+	return dmz_dev->flags & DMZ_BDEV_DYING;
+}
+
 /*
  * Process a new BIO.
  */
@@ -570,6 +605,9 @@ static int dmz_map(struct dm_target *ti, struct bio *bio)
 	sector_t chunk_sector;
 	int ret;
 
+	if (dmz_bdev_is_dying(dmz->dev))
+		return DM_MAPIO_KILL;
+
 	dmz_dev_debug(dev, "BIO op %d sector %llu + %u => chunk %llu, block %llu, %u blocks",
 		      bio_op(bio), (unsigned long long)sector, nr_sectors,
 		      (unsigned long long)dmz_bio_chunk(dmz->dev, bio),
@@ -866,6 +904,9 @@ static int dmz_prepare_ioctl(struct dm_target *ti, struct block_device **bdev)
 {
 	struct dmz_target *dmz = ti->private;
 
+	if (dmz_bdev_is_dying(dmz->dev))
+		return -ENODEV;
+
 	*bdev = dmz->dev->bdev;
 
 	return 0;
diff --git a/drivers/md/dm-zoned.h b/drivers/md/dm-zoned.h
index 1a3b06de2c00..d8e70b0ade35 100644
--- a/drivers/md/dm-zoned.h
+++ b/drivers/md/dm-zoned.h
@@ -57,6 +57,8 @@ struct dmz_dev {
 
 	unsigned int		nr_zones;
 
+	unsigned int		flags;
+
 	sector_t		zone_nr_sectors;
 	unsigned int		zone_nr_sectors_shift;
 
@@ -68,6 +70,9 @@ struct dmz_dev {
 				 (dev)->zone_nr_sectors_shift)
 #define dmz_chunk_block(dev, b)	((b) & ((dev)->zone_nr_blocks - 1))
 
+/* Device flags. */
+#define DMZ_BDEV_DYING		(1 << 0)
+
 /*
  * Zone descriptor.
  */
@@ -246,4 +251,9 @@ void dmz_resume_reclaim(struct dmz_reclaim *zrc);
 void dmz_reclaim_bio_acc(struct dmz_reclaim *zrc);
 void dmz_schedule_reclaim(struct dmz_reclaim *zrc);
 
+/*
+ * Functions defined in dm-zoned-target.c
+ */
+bool dmz_bdev_is_dying(struct dmz_dev *dmz_dev);
+
 #endif /* DM_ZONED_H */
-- 
2.21.0


* Re: [PATCH v2 3/3] dm-zoned: properly handle backing device failure
  2019-08-10 21:43 ` [PATCH v2 3/3] dm-zoned: properly handle backing device failure Dmitry Fomichev
@ 2019-08-12 18:43   ` Damien Le Moal
  0 siblings, 0 replies; 5+ messages in thread
From: Damien Le Moal @ 2019-08-12 18:43 UTC
  To: Dmitry Fomichev, dm-devel, Mike Snitzer

On 2019/08/10 14:43, Dmitry Fomichev wrote:
> dm-zoned has been observed to lock up or livelock in case of a
> hardware failure or misconfiguration of the backing zoned device.
>
> [...]

Looks good. For this patch and the entire series:

Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>

-- 
Damien Le Moal
Western Digital Research


