* [PATCH V6 0/6] block/scsi: safe SCSI quiescing
@ 2017-09-27  5:48 Ming Lei
  2017-09-27  5:48 ` [PATCH V6 1/6] blk-mq: only run hw queues for blk-mq Ming Lei
                   ` (6 more replies)
  0 siblings, 7 replies; 21+ messages in thread
From: Ming Lei @ 2017-09-27  5:48 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Christoph Hellwig, linux-scsi,
	Martin K . Petersen, James E . J . Bottomley
  Cc: Bart Van Assche, Oleksandr Natalenko, Johannes Thumshirn,
	Cathy Avery, Martin Steigerwald, Ming Lei

Hi,

The current SCSI quiesce isn't safe, and it is easy to trigger an I/O
deadlock.

Once a SCSI device is put into QUIESCE, no new request except for
RQF_PREEMPT can be dispatched to SCSI successfully, and
scsi_device_quiesce() simply waits for completion of the I/Os
already dispatched to the SCSI stack. That isn't enough.

New requests can still come in, but none of the allocated
requests can be dispatched successfully, so the request pool is
easily exhausted.

Then a request with RQF_PREEMPT can't be allocated and waits forever,
while scsi_device_resume() waits for completion of RQF_PREEMPT,
so the system hangs forever, for example during system suspend or
while sending SCSI domain validation.

I/O hangs during both system suspend[1] and SCSI domain validation
have been reported before.

This patchset introduces preempt only mode, and solves the issue
by allowing only RQF_PREEMPT requests during SCSI quiesce.

Both SCSI and SCSI_MQ have this I/O deadlock issue; this patchset
fixes them all.
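
To make the failure mode concrete, here is a small userspace model of
the scenario described above. It is only a sketch under stated
assumptions: the pool depth, struct deadlock_queue and all function
names are hypothetical and are not kernel code, and "allocation
failed" stands in for "the caller would sleep".

```c
#include <stdbool.h>

#define POOL_SIZE 4	/* hypothetical request pool depth */

struct deadlock_queue {
	int free_requests;	/* requests left in the pool */
	bool quiesced;		/* device refuses non-preempt dispatch */
	bool preempt_only;	/* gate proposed by this series */
};

/* Returns true if a request was taken from the pool. */
static bool model_alloc(struct deadlock_queue *q, bool preempt)
{
	/* With the gate, normal requests never touch the pool. */
	if (q->preempt_only && !preempt)
		return false;
	if (q->free_requests == 0)
		return false;
	q->free_requests--;
	return true;
}

/* Returns true if the request was dispatched; it then completes
 * and goes back to the pool. */
static bool model_dispatch(struct deadlock_queue *q, bool preempt)
{
	if (q->quiesced && !preempt)
		return false;	/* stays allocated and stuck */
	q->free_requests++;
	return true;
}

/* Flood a quiesced device with n normal submissions, then try one
 * preempt request. Returns true if the preempt request gets through. */
bool preempt_survives(bool use_gate, int n)
{
	struct deadlock_queue q = { POOL_SIZE, true, use_gate };

	for (int i = 0; i < n; i++)
		if (model_alloc(&q, false))
			model_dispatch(&q, false);

	return model_alloc(&q, true) && model_dispatch(&q, true);
}
```

In this model, preempt_survives(false, 8) fails because the four pool
entries are held by requests that can never be dispatched, while
preempt_survives(true, 8) succeeds because the gate turns normal
requests away before they consume the pool.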

V6:
	- borrow Bart's idea of preempt only, with clean
	  implementation(patch 5/patch 6)
	- needn't any external driver's dependency, such as MD's
	change

V5:
	- fix one tiny race by introducing blk_queue_enter_preempt_freeze()
	given this change is small enough compared with V4, I added
	tested-by directly

V4:
	- reorganize patch order to make it more reasonable
	- support nested preempt freeze, as required by SCSI transport spi
	- check preempt freezing in slow path of blk_queue_enter()
	- add "SCSI: transport_spi: resume a quiesced device"
	- wake up freeze queue in setting dying for both blk-mq and legacy
	- rename blk_mq_[freeze|unfreeze]_queue() in one patch
	- rename .mq_freeze_wq and .mq_freeze_depth
	- improve comment

V3:
	- introduce q->preempt_unfreezing to fix one bug of preempt freeze
	- call blk_queue_enter_live() only when queue is preempt frozen
	- cleanup a bit on the implementation of preempt freeze
	- only patch 6 and 7 are changed

V2:
	- drop the 1st patch in V1 because percpu_ref_is_dying() is
	enough as pointed by Tejun
	- introduce preempt version of blk_[freeze|unfreeze]_queue
	- sync between preempt freeze and normal freeze
	- fix warning from percpu-refcount as reported by Oleksandr


[1] https://marc.info/?t=150340250100013&r=3&w=2


Thanks,
Ming

Ming Lei (6):
  blk-mq: only run hw queues for blk-mq
  block: tracking request allocation with q_usage_counter
  block: pass flags to blk_queue_enter()
  block: prepare for passing RQF_PREEMPT to request allocation
  block: support PREEMPT_ONLY
  SCSI: set block queue at preempt only when SCSI device is put into
    quiesce

 block/blk-core.c        | 62 ++++++++++++++++++++++++++++++++++++++-----------
 block/blk-mq.c          | 14 ++++-------
 block/blk-timeout.c     |  2 +-
 drivers/scsi/scsi_lib.c | 25 +++++++++++++++++---
 fs/block_dev.c          |  4 ++--
 include/linux/blk-mq.h  |  7 +++---
 include/linux/blkdev.h  | 27 ++++++++++++++++++---
 7 files changed, 106 insertions(+), 35 deletions(-)

-- 
2.9.5

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH V6 1/6] blk-mq: only run hw queues for blk-mq
  2017-09-27  5:48 [PATCH V6 0/6] block/scsi: safe SCSI quiescing Ming Lei
@ 2017-09-27  5:48 ` Ming Lei
  2017-09-27  5:48 ` [PATCH V6 2/6] block: tracking request allocation with q_usage_counter Ming Lei
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 21+ messages in thread
From: Ming Lei @ 2017-09-27  5:48 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Christoph Hellwig, linux-scsi,
	Martin K . Petersen, James E . J . Bottomley
  Cc: Bart Van Assche, Oleksandr Natalenko, Johannes Thumshirn,
	Cathy Avery, Martin Steigerwald, Ming Lei

blk_mq_run_hw_queues() only makes sense for blk-mq queues, so this
patch makes that explicit by checking q->mq_ops before calling it.

Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 98a18609755e..6fd9f86fc86d 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -125,7 +125,8 @@ void blk_freeze_queue_start(struct request_queue *q)
 	freeze_depth = atomic_inc_return(&q->mq_freeze_depth);
 	if (freeze_depth == 1) {
 		percpu_ref_kill(&q->q_usage_counter);
-		blk_mq_run_hw_queues(q, false);
+		if (q->mq_ops)
+			blk_mq_run_hw_queues(q, false);
 	}
 }
 EXPORT_SYMBOL_GPL(blk_freeze_queue_start);
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH V6 2/6] block: tracking request allocation with q_usage_counter
  2017-09-27  5:48 [PATCH V6 0/6] block/scsi: safe SCSI quiescing Ming Lei
  2017-09-27  5:48 ` [PATCH V6 1/6] blk-mq: only run hw queues for blk-mq Ming Lei
@ 2017-09-27  5:48 ` Ming Lei
  2017-09-27  6:10   ` Hannes Reinecke
  2017-09-27  5:48 ` [PATCH V6 3/6] block: pass flags to blk_queue_enter() Ming Lei
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 21+ messages in thread
From: Ming Lei @ 2017-09-27  5:48 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Christoph Hellwig, linux-scsi,
	Martin K . Petersen, James E . J . Bottomley
  Cc: Bart Van Assche, Oleksandr Natalenko, Johannes Thumshirn,
	Cathy Avery, Martin Steigerwald, Ming Lei

This usage is basically the same as in blk-mq, so that we can
support freezing the legacy queue easily.

Also 'wake_up_all(&q->mq_freeze_wq)' has to be moved
into blk_set_queue_dying() since both legacy and blk-mq
may wait on the wait queue of .mq_freeze_wq.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
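Note (not part of the commit message): the pairing this patch relies
on can be sketched in userspace as below. This is an illustrative
model only; struct usage_queue and the model_* names are hypothetical,
a plain int stands in for the percpu q_usage_counter, and a refused
enter stands in for the NOWAIT case.

```c
#include <stdbool.h>

struct usage_queue {
	int usage;	/* stands in for q_usage_counter */
	bool frozen;	/* stands in for a killed percpu ref */
};

/* Like blk_queue_enter() with NOWAIT: refuse instead of sleeping. */
int model_enter(struct usage_queue *q)
{
	if (q->frozen)
		return -1;
	q->usage++;
	return 0;
}

void model_exit(struct usage_queue *q)
{
	q->usage--;
}

/* Allocation takes a reference; pool_empty models get_request()
 * returning an error. */
int model_get_request(struct usage_queue *q, bool pool_empty)
{
	if (model_enter(q))
		return -1;
	if (pool_empty) {
		model_exit(q);	/* error path must drop the reference */
		return -1;
	}
	return 0;
}

/* Freeing the request drops the reference taken at allocation. */
void model_put_request(struct usage_queue *q)
{
	model_exit(q);
}

/* A freeze has drained the queue once no reference remains. */
bool model_freeze_done(const struct usage_queue *q)
{
	return q->frozen && q->usage == 0;
}
```

Every successful allocation holds exactly one reference until the
request is put, so a freeze can only complete after all allocated
requests have been freed.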
 block/blk-core.c | 14 ++++++++++++++
 block/blk-mq.c   |  7 -------
 2 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index aebe676225e6..abfba798ee03 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -610,6 +610,12 @@ void blk_set_queue_dying(struct request_queue *q)
 		}
 		spin_unlock_irq(q->queue_lock);
 	}
+
+	/*
+	 * We need to ensure that processes currently waiting on
+	 * the queue are notified as well.
+	 */
+	wake_up_all(&q->mq_freeze_wq);
 }
 EXPORT_SYMBOL_GPL(blk_set_queue_dying);
 
@@ -1392,16 +1398,21 @@ static struct request *blk_old_get_request(struct request_queue *q,
 					   unsigned int op, gfp_t gfp_mask)
 {
 	struct request *rq;
+	int ret = 0;
 
 	WARN_ON_ONCE(q->mq_ops);
 
 	/* create ioc upfront */
 	create_io_context(gfp_mask, q->node);
 
+	ret = blk_queue_enter(q, !(gfp_mask & __GFP_DIRECT_RECLAIM));
+	if (ret)
+		return ERR_PTR(ret);
 	spin_lock_irq(q->queue_lock);
 	rq = get_request(q, op, NULL, gfp_mask);
 	if (IS_ERR(rq)) {
 		spin_unlock_irq(q->queue_lock);
+		blk_queue_exit(q);
 		return rq;
 	}
 
@@ -1573,6 +1584,7 @@ void __blk_put_request(struct request_queue *q, struct request *req)
 		blk_free_request(rl, req);
 		freed_request(rl, sync, rq_flags);
 		blk_put_rl(rl);
+		blk_queue_exit(q);
 	}
 }
 EXPORT_SYMBOL_GPL(__blk_put_request);
@@ -1854,8 +1866,10 @@ static blk_qc_t blk_queue_bio(struct request_queue *q, struct bio *bio)
 	 * Grab a free request. This is might sleep but can not fail.
 	 * Returns with the queue unlocked.
 	 */
+	blk_queue_enter_live(q);
 	req = get_request(q, bio->bi_opf, bio, GFP_NOIO);
 	if (IS_ERR(req)) {
+		blk_queue_exit(q);
 		__wbt_done(q->rq_wb, wb_acct);
 		if (PTR_ERR(req) == -ENOMEM)
 			bio->bi_status = BLK_STS_RESOURCE;
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 6fd9f86fc86d..10c1f49f663d 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -256,13 +256,6 @@ void blk_mq_wake_waiters(struct request_queue *q)
 	queue_for_each_hw_ctx(q, hctx, i)
 		if (blk_mq_hw_queue_mapped(hctx))
 			blk_mq_tag_wakeup_all(hctx->tags, true);
-
-	/*
-	 * If we are called because the queue has now been marked as
-	 * dying, we need to ensure that processes currently waiting on
-	 * the queue are notified as well.
-	 */
-	wake_up_all(&q->mq_freeze_wq);
 }
 
 bool blk_mq_can_queue(struct blk_mq_hw_ctx *hctx)
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH V6 3/6] block: pass flags to blk_queue_enter()
  2017-09-27  5:48 [PATCH V6 0/6] block/scsi: safe SCSI quiescing Ming Lei
  2017-09-27  5:48 ` [PATCH V6 1/6] blk-mq: only run hw queues for blk-mq Ming Lei
  2017-09-27  5:48 ` [PATCH V6 2/6] block: tracking request allocation with q_usage_counter Ming Lei
@ 2017-09-27  5:48 ` Ming Lei
  2017-09-27  5:48 ` [PATCH V6 4/6] block: prepare for passing RQF_PREEMPT to request allocation Ming Lei
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 21+ messages in thread
From: Ming Lei @ 2017-09-27  5:48 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Christoph Hellwig, linux-scsi,
	Martin K . Petersen, James E . J . Bottomley
  Cc: Bart Van Assche, Oleksandr Natalenko, Johannes Thumshirn,
	Cathy Avery, Martin Steigerwald, Ming Lei

We need to pass a PREEMPT flag to blk_queue_enter()
so that a request with RQF_PREEMPT can be allocated
in the following patch.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-core.c       | 10 ++++++----
 block/blk-mq.c         |  5 +++--
 block/blk-timeout.c    |  2 +-
 fs/block_dev.c         |  4 ++--
 include/linux/blkdev.h |  7 ++++++-
 5 files changed, 18 insertions(+), 10 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index abfba798ee03..be17b5bcf6e7 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -766,7 +766,7 @@ struct request_queue *blk_alloc_queue(gfp_t gfp_mask)
 }
 EXPORT_SYMBOL(blk_alloc_queue);
 
-int blk_queue_enter(struct request_queue *q, bool nowait)
+int blk_queue_enter(struct request_queue *q, unsigned flags)
 {
 	while (true) {
 		int ret;
@@ -774,7 +774,7 @@ int blk_queue_enter(struct request_queue *q, bool nowait)
 		if (percpu_ref_tryget_live(&q->q_usage_counter))
 			return 0;
 
-		if (nowait)
+		if (flags & BLK_REQ_NOWAIT)
 			return -EBUSY;
 
 		/*
@@ -1405,7 +1405,8 @@ static struct request *blk_old_get_request(struct request_queue *q,
 	/* create ioc upfront */
 	create_io_context(gfp_mask, q->node);
 
-	ret = blk_queue_enter(q, !(gfp_mask & __GFP_DIRECT_RECLAIM));
+	ret = blk_queue_enter(q, !(gfp_mask & __GFP_DIRECT_RECLAIM) ?
+			BLK_REQ_NOWAIT : 0);
 	if (ret)
 		return ERR_PTR(ret);
 	spin_lock_irq(q->queue_lock);
@@ -2212,7 +2213,8 @@ blk_qc_t generic_make_request(struct bio *bio)
 	do {
 		struct request_queue *q = bio->bi_disk->queue;
 
-		if (likely(blk_queue_enter(q, bio->bi_opf & REQ_NOWAIT) == 0)) {
+		if (likely(blk_queue_enter(q, (bio->bi_opf & REQ_NOWAIT) ?
+						BLK_REQ_NOWAIT : 0) == 0)) {
 			struct bio_list lower, same;
 
 			/* Create a fresh bio_list for all subordinate requests */
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 10c1f49f663d..45bff90e08f7 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -384,7 +384,8 @@ struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op,
 	struct request *rq;
 	int ret;
 
-	ret = blk_queue_enter(q, flags & BLK_MQ_REQ_NOWAIT);
+	ret = blk_queue_enter(q, (flags & BLK_MQ_REQ_NOWAIT) ?
+			BLK_REQ_NOWAIT : 0);
 	if (ret)
 		return ERR_PTR(ret);
 
@@ -423,7 +424,7 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q,
 	if (hctx_idx >= q->nr_hw_queues)
 		return ERR_PTR(-EIO);
 
-	ret = blk_queue_enter(q, true);
+	ret = blk_queue_enter(q, BLK_REQ_NOWAIT);
 	if (ret)
 		return ERR_PTR(ret);
 
diff --git a/block/blk-timeout.c b/block/blk-timeout.c
index 17ec83bb0900..e803106a5e5b 100644
--- a/block/blk-timeout.c
+++ b/block/blk-timeout.c
@@ -134,7 +134,7 @@ void blk_timeout_work(struct work_struct *work)
 	struct request *rq, *tmp;
 	int next_set = 0;
 
-	if (blk_queue_enter(q, true))
+	if (blk_queue_enter(q, BLK_REQ_NOWAIT))
 		return;
 	spin_lock_irqsave(q->queue_lock, flags);
 
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 93d088ffc05c..98cf2d7ee9d3 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -674,7 +674,7 @@ int bdev_read_page(struct block_device *bdev, sector_t sector,
 	if (!ops->rw_page || bdev_get_integrity(bdev))
 		return result;
 
-	result = blk_queue_enter(bdev->bd_queue, false);
+	result = blk_queue_enter(bdev->bd_queue, 0);
 	if (result)
 		return result;
 	result = ops->rw_page(bdev, sector + get_start_sect(bdev), page, false);
@@ -710,7 +710,7 @@ int bdev_write_page(struct block_device *bdev, sector_t sector,
 
 	if (!ops->rw_page || bdev_get_integrity(bdev))
 		return -EOPNOTSUPP;
-	result = blk_queue_enter(bdev->bd_queue, false);
+	result = blk_queue_enter(bdev->bd_queue, 0);
 	if (result)
 		return result;
 
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 460294bb0fa5..107e2fd48486 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -857,6 +857,11 @@ enum {
 	BLKPREP_INVALID,	/* invalid command, kill, return -EREMOTEIO */
 };
 
+/* passed to blk_queue_enter */
+enum {
+	BLK_REQ_NOWAIT = (1 << 0),
+};
+
 extern unsigned long blk_max_low_pfn, blk_max_pfn;
 
 /*
@@ -962,7 +967,7 @@ extern int scsi_cmd_ioctl(struct request_queue *, struct gendisk *, fmode_t,
 extern int sg_scsi_ioctl(struct request_queue *, struct gendisk *, fmode_t,
 			 struct scsi_ioctl_command __user *);
 
-extern int blk_queue_enter(struct request_queue *q, bool nowait);
+extern int blk_queue_enter(struct request_queue *q, unsigned flags);
 extern void blk_queue_exit(struct request_queue *q);
 extern void blk_start_queue(struct request_queue *q);
 extern void blk_start_queue_async(struct request_queue *q);
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH V6 4/6] block: prepare for passing RQF_PREEMPT to request allocation
  2017-09-27  5:48 [PATCH V6 0/6] block/scsi: safe SCSI quiescing Ming Lei
                   ` (2 preceding siblings ...)
  2017-09-27  5:48 ` [PATCH V6 3/6] block: pass flags to blk_queue_enter() Ming Lei
@ 2017-09-27  5:48 ` Ming Lei
  2017-09-27  5:48 ` [PATCH V6 5/6] block: support PREEMPT_ONLY Ming Lei
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 21+ messages in thread
From: Ming Lei @ 2017-09-27  5:48 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Christoph Hellwig, linux-scsi,
	Martin K . Petersen, James E . J . Bottomley
  Cc: Bart Van Assche, Oleksandr Natalenko, Johannes Thumshirn,
	Cathy Avery, Martin Steigerwald, Ming Lei

RQF_PREEMPT is a bit special because the request is required
to be dispatched to the LLD even when the SCSI device is quiesced.

So this patch introduces __blk_get_request() and allows users to pass
a preempt flag (BLK_REQ_PREEMPT) in, so that a request for RQF_PREEMPT
can be allocated when the queue is in PREEMPT ONLY mode, which will be
introduced in the following patch.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
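Note (not part of the commit message): a worked example of the flag
layout. The enum values are copied from this patch; enter_flags() is a
hypothetical helper that only mirrors the 'flags & BLK_REQ_BITS_MASK'
expression passed to blk_queue_enter().

```c
/* Values copied from include/linux/blkdev.h in this patch. */
enum {
	BLK_REQ_NOWAIT		= (1 << 0),
	BLK_REQ_PREEMPT		= (1 << 1),
	BLK_REQ_MQ_START_BIT	= 2,
	BLK_REQ_BITS_MASK	= (1U << BLK_REQ_MQ_START_BIT) - 1,
};

/* Values copied from include/linux/blk-mq.h in this patch. */
enum {
	BLK_MQ_REQ_NOWAIT	= BLK_REQ_NOWAIT,
	BLK_MQ_REQ_PREEMPT	= BLK_REQ_PREEMPT,
	BLK_MQ_REQ_RESERVED	= (1 << BLK_REQ_MQ_START_BIT),
	BLK_MQ_REQ_INTERNAL	= (1 << (BLK_REQ_MQ_START_BIT + 1)),
};

/* Hypothetical helper: the flags blk_queue_enter() sees once the
 * blk-mq specific bits have been masked off. */
unsigned int enter_flags(unsigned int alloc_flags)
{
	return alloc_flags & BLK_REQ_BITS_MASK;
}
```

The two low bits are shared between the legacy and blk-mq paths, so
the mask is 3, RESERVED becomes 4 and INTERNAL becomes 8. Masking
strips the blk-mq only bits and keeps NOWAIT and PREEMPT.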
 block/blk-core.c       | 19 +++++++++----------
 block/blk-mq.c         |  3 +--
 include/linux/blk-mq.h |  7 ++++---
 include/linux/blkdev.h | 17 ++++++++++++++---
 4 files changed, 28 insertions(+), 18 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index be17b5bcf6e7..0a8396e8e4ff 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1395,7 +1395,8 @@ static struct request *get_request(struct request_queue *q, unsigned int op,
 }
 
 static struct request *blk_old_get_request(struct request_queue *q,
-					   unsigned int op, gfp_t gfp_mask)
+					   unsigned int op, gfp_t gfp_mask,
+					   unsigned int flags)
 {
 	struct request *rq;
 	int ret = 0;
@@ -1405,8 +1406,7 @@ static struct request *blk_old_get_request(struct request_queue *q,
 	/* create ioc upfront */
 	create_io_context(gfp_mask, q->node);
 
-	ret = blk_queue_enter(q, !(gfp_mask & __GFP_DIRECT_RECLAIM) ?
-			BLK_REQ_NOWAIT : 0);
+	ret = blk_queue_enter(q, flags & BLK_REQ_BITS_MASK);
 	if (ret)
 		return ERR_PTR(ret);
 	spin_lock_irq(q->queue_lock);
@@ -1424,26 +1424,25 @@ static struct request *blk_old_get_request(struct request_queue *q,
 	return rq;
 }
 
-struct request *blk_get_request(struct request_queue *q, unsigned int op,
-				gfp_t gfp_mask)
+struct request *__blk_get_request(struct request_queue *q, unsigned int op,
+				  gfp_t gfp_mask, unsigned int flags)
 {
 	struct request *req;
 
+	flags |= gfp_mask & __GFP_DIRECT_RECLAIM ? 0 : BLK_REQ_NOWAIT;
 	if (q->mq_ops) {
-		req = blk_mq_alloc_request(q, op,
-			(gfp_mask & __GFP_DIRECT_RECLAIM) ?
-				0 : BLK_MQ_REQ_NOWAIT);
+		req = blk_mq_alloc_request(q, op, flags);
 		if (!IS_ERR(req) && q->mq_ops->initialize_rq_fn)
 			q->mq_ops->initialize_rq_fn(req);
 	} else {
-		req = blk_old_get_request(q, op, gfp_mask);
+		req = blk_old_get_request(q, op, gfp_mask, flags);
 		if (!IS_ERR(req) && q->initialize_rq_fn)
 			q->initialize_rq_fn(req);
 	}
 
 	return req;
 }
-EXPORT_SYMBOL(blk_get_request);
+EXPORT_SYMBOL(__blk_get_request);
 
 /**
  * blk_requeue_request - put a request back on queue
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 45bff90e08f7..90b43f607e3c 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -384,8 +384,7 @@ struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op,
 	struct request *rq;
 	int ret;
 
-	ret = blk_queue_enter(q, (flags & BLK_MQ_REQ_NOWAIT) ?
-			BLK_REQ_NOWAIT : 0);
+	ret = blk_queue_enter(q, flags & BLK_REQ_BITS_MASK);
 	if (ret)
 		return ERR_PTR(ret);
 
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 50c6485cb04f..066a676d7749 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -197,9 +197,10 @@ void blk_mq_free_request(struct request *rq);
 bool blk_mq_can_queue(struct blk_mq_hw_ctx *);
 
 enum {
-	BLK_MQ_REQ_NOWAIT	= (1 << 0), /* return when out of requests */
-	BLK_MQ_REQ_RESERVED	= (1 << 1), /* allocate from reserved pool */
-	BLK_MQ_REQ_INTERNAL	= (1 << 2), /* allocate internal/sched tag */
+	BLK_MQ_REQ_NOWAIT	= BLK_REQ_NOWAIT, /* return when out of requests */
+	BLK_MQ_REQ_PREEMPT	= BLK_REQ_PREEMPT, /* allocate for RQF_PREEMPT */
+	BLK_MQ_REQ_RESERVED	= (1 << BLK_REQ_MQ_START_BIT), /* allocate from reserved pool */
+	BLK_MQ_REQ_INTERNAL	= (1 << (BLK_REQ_MQ_START_BIT + 1)), /* allocate internal/sched tag */
 };
 
 struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op,
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 107e2fd48486..d1ab950a7f72 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -859,7 +859,10 @@ enum {
 
 /* passed to blk_queue_enter */
 enum {
-	BLK_REQ_NOWAIT = (1 << 0),
+	BLK_REQ_NOWAIT		= (1 << 0),
+	BLK_REQ_PREEMPT		= (1 << 1),
+	BLK_REQ_MQ_START_BIT	= 2,
+	BLK_REQ_BITS_MASK	= (1U << BLK_REQ_MQ_START_BIT) - 1,
 };
 
 extern unsigned long blk_max_low_pfn, blk_max_pfn;
@@ -944,8 +947,9 @@ extern void blk_rq_init(struct request_queue *q, struct request *rq);
 extern void blk_init_request_from_bio(struct request *req, struct bio *bio);
 extern void blk_put_request(struct request *);
 extern void __blk_put_request(struct request_queue *, struct request *);
-extern struct request *blk_get_request(struct request_queue *, unsigned int op,
-				       gfp_t gfp_mask);
+extern struct request *__blk_get_request(struct request_queue *,
+					 unsigned int op, gfp_t gfp_mask,
+					 unsigned int flags);
 extern void blk_requeue_request(struct request_queue *, struct request *);
 extern int blk_lld_busy(struct request_queue *q);
 extern int blk_rq_prep_clone(struct request *rq, struct request *rq_src,
@@ -996,6 +1000,13 @@ blk_status_t errno_to_blk_status(int errno);
 
 bool blk_mq_poll(struct request_queue *q, blk_qc_t cookie);
 
+static inline struct request *blk_get_request(struct request_queue *q,
+					      unsigned int op,
+					      gfp_t gfp_mask)
+{
+	return __blk_get_request(q, op, gfp_mask, 0);
+}
+
 static inline struct request_queue *bdev_get_queue(struct block_device *bdev)
 {
 	return bdev->bd_disk->queue;	/* this is never NULL */
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH V6 5/6] block: support PREEMPT_ONLY
  2017-09-27  5:48 [PATCH V6 0/6] block/scsi: safe SCSI quiescing Ming Lei
                   ` (3 preceding siblings ...)
  2017-09-27  5:48 ` [PATCH V6 4/6] block: prepare for passing RQF_PREEMPT to request allocation Ming Lei
@ 2017-09-27  5:48 ` Ming Lei
  2017-09-27  5:48 ` [PATCH V6 6/6] SCSI: set block queue at preempt only when SCSI device is put into quiesce Ming Lei
  2017-09-27  7:57   ` Martin Steigerwald
  6 siblings, 0 replies; 21+ messages in thread
From: Ming Lei @ 2017-09-27  5:48 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Christoph Hellwig, linux-scsi,
	Martin K . Petersen, James E . J . Bottomley
  Cc: Bart Van Assche, Oleksandr Natalenko, Johannes Thumshirn,
	Cathy Avery, Martin Steigerwald, Ming Lei

When the queue is in PREEMPT_ONLY mode, only RQF_PREEMPT requests
can be allocated and dispatched; other requests aren't allowed
to enter the I/O path.

This is useful for supporting safe SCSI quiesce.

Part of this patch is from Bart's '[PATCH v4 4/7] block: Add the
QUEUE_FLAG_PREEMPT_ONLY request queue flag'.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
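Note (not part of the commit message): the decision made by
blk_queue_enter() after this patch can be read as a small truth
table. The sketch below is a userspace model of that control flow
only; struct gate_state, the *_FLAG constants and enum enter_result
are hypothetical names, and ENTER_SLEEP stands for the
wait_event_interruptible() branch.

```c
#include <stdbool.h>

/* Mirror BLK_REQ_NOWAIT and BLK_REQ_PREEMPT for the model. */
enum { REQ_NOWAIT_FLAG = 1 << 0, REQ_PREEMPT_FLAG = 1 << 1 };

enum enter_result { ENTER_OK, ENTER_BUSY, ENTER_SLEEP };

struct gate_state {
	bool frozen;		/* percpu_ref_tryget_live() would fail */
	bool preempt_only;	/* QUEUE_FLAG_PREEMPT_ONLY is set */
};

enum enter_result model_queue_enter(const struct gate_state *s,
				    unsigned int flags)
{
	/* Non-preempt callers skip the tryget while the gate is set. */
	bool skip_tryget = s->preempt_only && !(flags & REQ_PREEMPT_FLAG);

	if (!skip_tryget && !s->frozen)
		return ENTER_OK;

	/* slow_path: either fail fast or wait on mq_freeze_wq */
	if (flags & REQ_NOWAIT_FLAG)
		return ENTER_BUSY;
	return ENTER_SLEEP;
}
```

With the gate set and the queue not frozen, only a caller passing the
preempt flag enters; a NOWAIT caller gets the busy result and
everyone else sleeps.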
 block/blk-core.c       | 25 +++++++++++++++++++++++--
 include/linux/blkdev.h |  5 +++++
 2 files changed, 28 insertions(+), 2 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 0a8396e8e4ff..3c0a7e7f172f 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -346,6 +346,17 @@ void blk_sync_queue(struct request_queue *q)
 }
 EXPORT_SYMBOL(blk_sync_queue);
 
+void blk_set_preempt_only(struct request_queue *q, bool preempt_only)
+{
+	blk_mq_freeze_queue(q);
+	if (preempt_only)
+		queue_flag_set_unlocked(QUEUE_FLAG_PREEMPT_ONLY, q);
+	else
+		queue_flag_clear_unlocked(QUEUE_FLAG_PREEMPT_ONLY, q);
+	blk_mq_unfreeze_queue(q);
+}
+EXPORT_SYMBOL(blk_set_preempt_only);
+
 /**
  * __blk_run_queue_uncond - run a queue whether or not it has been stopped
  * @q:	The queue to run
@@ -771,9 +782,18 @@ int blk_queue_enter(struct request_queue *q, unsigned flags)
 	while (true) {
 		int ret;
 
+		/*
+		 * preempt_only flag has to be set after queue is frozen,
+		 * so it can be checked here lockless and safely
+		 */
+		if (blk_queue_preempt_only(q)) {
+			if (!(flags & BLK_REQ_PREEMPT))
+				goto slow_path;
+		}
+
 		if (percpu_ref_tryget_live(&q->q_usage_counter))
 			return 0;
-
+ slow_path:
 		if (flags & BLK_REQ_NOWAIT)
 			return -EBUSY;
 
@@ -787,7 +807,8 @@ int blk_queue_enter(struct request_queue *q, unsigned flags)
 		smp_rmb();
 
 		ret = wait_event_interruptible(q->mq_freeze_wq,
-				!atomic_read(&q->mq_freeze_depth) ||
+				(!atomic_read(&q->mq_freeze_depth) &&
+				!blk_queue_preempt_only(q)) ||
 				blk_queue_dying(q));
 		if (blk_queue_dying(q))
 			return -ENODEV;
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index d1ab950a7f72..8f5b15b2bf06 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -630,6 +630,7 @@ struct request_queue {
 #define QUEUE_FLAG_REGISTERED  26	/* queue has been registered to a disk */
 #define QUEUE_FLAG_SCSI_PASSTHROUGH 27	/* queue supports SCSI commands */
 #define QUEUE_FLAG_QUIESCED    28	/* queue has been quiesced */
+#define QUEUE_FLAG_PREEMPT_ONLY	29	/* only process RQF_PREEMPT requests */
 
 #define QUEUE_FLAG_DEFAULT	((1 << QUEUE_FLAG_IO_STAT) |		\
 				 (1 << QUEUE_FLAG_STACKABLE)	|	\
@@ -734,6 +735,10 @@ static inline void queue_flag_clear(unsigned int flag, struct request_queue *q)
 	((rq)->cmd_flags & (REQ_FAILFAST_DEV|REQ_FAILFAST_TRANSPORT| \
 			     REQ_FAILFAST_DRIVER))
 #define blk_queue_quiesced(q)	test_bit(QUEUE_FLAG_QUIESCED, &(q)->queue_flags)
+#define blk_queue_preempt_only(q)				\
+	test_bit(QUEUE_FLAG_PREEMPT_ONLY, &(q)->queue_flags)
+
+extern void blk_set_preempt_only(struct request_queue *q, bool preempt_only);
 
 static inline bool blk_account_rq(struct request *rq)
 {
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH V6 6/6] SCSI: set block queue at preempt only when SCSI device is put into quiesce
  2017-09-27  5:48 [PATCH V6 0/6] block/scsi: safe SCSI quiescing Ming Lei
                   ` (4 preceding siblings ...)
  2017-09-27  5:48 ` [PATCH V6 5/6] block: support PREEMPT_ONLY Ming Lei
@ 2017-09-27  5:48 ` Ming Lei
  2017-09-27  9:54     ` Bart Van Assche
  2017-09-27  7:57   ` Martin Steigerwald
  6 siblings, 1 reply; 21+ messages in thread
From: Ming Lei @ 2017-09-27  5:48 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Christoph Hellwig, linux-scsi,
	Martin K . Petersen, James E . J . Bottomley
  Cc: Bart Van Assche, Oleksandr Natalenko, Johannes Thumshirn,
	Cathy Avery, Martin Steigerwald, Ming Lei

Simply quiescing a SCSI device and waiting for completion of the I/O
dispatched to the SCSI queue isn't safe: it is easy to use up the
request pool, because none of the previously allocated requests can
be dispatched while the device is in QUIESCE. Then no request
can be allocated for RQF_PREEMPT, and the system may hang somewhere,
such as when sending sync_cache or start_stop commands in the
system suspend path.

Before quiescing SCSI, this patch puts the block queue into preempt
only mode first, so no new normal request can enter the queue any more,
and all pending requests are drained too once blk_set_preempt_only(true)
returns. Then RQF_PREEMPT requests can be allocated successfully while
the queue is in preempt only mode.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
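Note (not part of the commit message): the ordering and the error
rollback in scsi_device_quiesce() and scsi_device_resume() can be
sketched as follows. This is a userspace model, not kernel code;
struct model_sdev, its can_quiesce field and the model_* functions
are hypothetical.

```c
#include <stdbool.h>

struct model_sdev {
	bool preempt_only;	/* block-layer gate */
	bool quiesced;		/* device is in SDEV_QUIESCE */
	bool can_quiesce;	/* hypothetical: state change is legal */
};

int model_quiesce(struct model_sdev *d)
{
	/* Gate first: normal I/O is drained and kept out. */
	d->preempt_only = true;

	if (!d->can_quiesce) {
		/* State change failed: reopen the gate. */
		d->preempt_only = false;
		return -1;
	}

	d->quiesced = true;
	return 0;
}

void model_resume(struct model_sdev *d)
{
	d->quiesced = false;		/* device runs again */
	d->preempt_only = false;	/* then let normal I/O back in */
}
```

The gate is closed before the device state changes and reopened only
after the device is running again, so normal requests never pile up
against a quiesced device.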
 drivers/scsi/scsi_lib.c | 25 ++++++++++++++++++++++---
 1 file changed, 22 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 9cf6a80fe297..82c51619f1b7 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -252,9 +252,10 @@ int scsi_execute(struct scsi_device *sdev, const unsigned char *cmd,
 	struct scsi_request *rq;
 	int ret = DRIVER_ERROR << 24;
 
-	req = blk_get_request(sdev->request_queue,
+	req = __blk_get_request(sdev->request_queue,
 			data_direction == DMA_TO_DEVICE ?
-			REQ_OP_SCSI_OUT : REQ_OP_SCSI_IN, __GFP_RECLAIM);
+			REQ_OP_SCSI_OUT : REQ_OP_SCSI_IN, __GFP_RECLAIM,
+			BLK_REQ_PREEMPT);
 	if (IS_ERR(req))
 		return ret;
 	rq = scsi_req(req);
@@ -2928,12 +2929,28 @@ scsi_device_quiesce(struct scsi_device *sdev)
 {
 	int err;
 
+	/*
+	 * Simply quiescing SCSI device isn't safe, it is easy
+	 * to use up requests because all these allocated requests
+	 * can't be dispatched when device is put in QUIESCE.
+	 * Then no request can be allocated and we may hang
+	 * somewhere, such as system suspend/resume.
+	 *
+	 * So we set block queue in preempt only first, no new
+	 * normal request can enter queue any more, and all pending
+	 * requests are drained once blk_set_preempt_only()
+	 * returns. Only RQF_PREEMPT is allowed in preempt only mode.
+	 */
+	blk_set_preempt_only(sdev->request_queue, true);
+
 	mutex_lock(&sdev->state_mutex);
 	err = scsi_device_set_state(sdev, SDEV_QUIESCE);
 	mutex_unlock(&sdev->state_mutex);
 
-	if (err)
+	if (err) {
+		blk_set_preempt_only(sdev->request_queue, false);
 		return err;
+	}
 
 	scsi_run_queue(sdev->request_queue);
 	while (atomic_read(&sdev->device_busy)) {
@@ -2964,6 +2981,8 @@ void scsi_device_resume(struct scsi_device *sdev)
 	    scsi_device_set_state(sdev, SDEV_RUNNING) == 0)
 		scsi_run_queue(sdev->request_queue);
 	mutex_unlock(&sdev->state_mutex);
+
+	blk_set_preempt_only(sdev->request_queue, false);
 }
 EXPORT_SYMBOL(scsi_device_resume);
 
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH V6 2/6] block: tracking request allocation with q_usage_counter
  2017-09-27  5:48 ` [PATCH V6 2/6] block: tracking request allocation with q_usage_counter Ming Lei
@ 2017-09-27  6:10   ` Hannes Reinecke
  0 siblings, 0 replies; 21+ messages in thread
From: Hannes Reinecke @ 2017-09-27  6:10 UTC (permalink / raw)
  To: Ming Lei, Jens Axboe, linux-block, Christoph Hellwig, linux-scsi,
	Martin K . Petersen, James E . J . Bottomley
  Cc: Bart Van Assche, Oleksandr Natalenko, Johannes Thumshirn,
	Cathy Avery, Martin Steigerwald

On 09/27/2017 07:48 AM, Ming Lei wrote:
> This usage is basically the same as in blk-mq, so that we can
> support freezing the legacy queue easily.
> 
> Also 'wake_up_all(&q->mq_freeze_wq)' has to be moved
> into blk_set_queue_dying() since both legacy and blk-mq
> may wait on the wait queue of .mq_freeze_wq.
> 
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
>  block/blk-core.c | 14 ++++++++++++++
>  block/blk-mq.c   |  7 -------
>  2 files changed, 14 insertions(+), 7 deletions(-)
> 
As indicated in the other (similar) patch from Bart, we have a customer
report running into a q_usage_counter underflow with legacy-sq.
So this patch is actually a bugfix.

Reviewed-by: Hannes Reinecke <hare@suse.com>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH V6 0/6] block/scsi: safe SCSI quiescing
  2017-09-27  5:48 [PATCH V6 0/6] block/scsi: safe SCSI quiescing Ming Lei
@ 2017-09-27  7:57   ` Martin Steigerwald
  2017-09-27  5:48 ` [PATCH V6 2/6] block: tracking request allocation with q_usage_counter Ming Lei
                     ` (5 subsequent siblings)
  6 siblings, 0 replies; 21+ messages in thread
From: Martin Steigerwald @ 2017-09-27  7:57 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, Christoph Hellwig, linux-scsi,
	Martin K . Petersen, James E . J . Bottomley, Bart Van Assche,
	Oleksandr Natalenko, Johannes Thumshirn, Cathy Avery

Hi Ming.

Ming Lei - 27.09.17, 13:48:
> Hi,
> 
> The current SCSI quiesce isn't safe, and it is easy to trigger an I/O
> deadlock.
> 
> Once a SCSI device is put into QUIESCE, no new request except for
> RQF_PREEMPT can be dispatched to SCSI successfully, and
> scsi_device_quiesce() simply waits for completion of the I/Os
> already dispatched to the SCSI stack. That isn't enough.
> 
> New requests can still come in, but none of the allocated
> requests can be dispatched successfully, so the request pool is
> easily exhausted.
> 
> Then a request with RQF_PREEMPT can't be allocated and waits forever,
> while scsi_device_resume() waits for completion of RQF_PREEMPT,
> so the system hangs forever, for example during system suspend or
> while sending SCSI domain validation.
> 
> I/O hangs during both system suspend[1] and SCSI domain validation
> have been reported before.
> 
> This patchset introduces preempt only mode, and solves the issue
> by allowing only RQF_PREEMPT requests during SCSI quiesce.
> 
> Both SCSI and SCSI_MQ have this I/O deadlock issue; this patchset
> fixes them all.
> 
> V6:
> 	- borrow Bart's idea of preempt only, with clean
> 	  implementation(patch 5/patch 6)
> 	- needn't any external driver's dependency, such as MD's
> 	change

Do you want me to test with v6 of the patch set? If so, it would be nice if 
you'd make a v6 branch in your git repo.

After an uptime of almost 6 days I am pretty confident that the V5 one fixes the 
issue for me. So

Tested-by: Martin Steigerwald <martin@lichtvoll.de>

for V5.

Thanks,
Martin

> V5:
> 	- fix one tiny race by introducing blk_queue_enter_preempt_freeze()
> 	given this change is small enough compared with V4, I added
> 	tested-by directly
> 
> V4:
> 	- reorganize patch order to make it more reasonable
> 	- support nested preempt freeze, as required by SCSI transport spi
> 	- check preempt freezing in slow path of blk_queue_enter()
> 	- add "SCSI: transport_spi: resume a quiesced device"
> 	- wake up freeze queue in setting dying for both blk-mq and legacy
> 	- rename blk_mq_[freeze|unfreeze]_queue() in one patch
> 	- rename .mq_freeze_wq and .mq_freeze_depth
> 	- improve comment
> 
> V3:
> 	- introduce q->preempt_unfreezing to fix one bug of preempt freeze
> 	- call blk_queue_enter_live() only when queue is preempt frozen
> 	- cleanup a bit on the implementation of preempt freeze
> 	- only patch 6 and 7 are changed
> 
> V2:
> 	- drop the 1st patch in V1 because percpu_ref_is_dying() is
> 	enough as pointed by Tejun
> 	- introduce preempt version of blk_[freeze|unfreeze]_queue
> 	- sync between preempt freeze and normal freeze
> 	- fix warning from percpu-refcount as reported by Oleksandr
> 
> 
> [1] https://marc.info/?t=150340250100013&r=3&w=2
> 
> 
> Thanks,
> Ming
> 
> Ming Lei (6):
>   blk-mq: only run hw queues for blk-mq
>   block: tracking request allocation with q_usage_counter
>   block: pass flags to blk_queue_enter()
>   block: prepare for passing RQF_PREEMPT to request allocation
>   block: support PREEMPT_ONLY
>   SCSI: set block queue at preempt only when SCSI device is put into
>     quiesce
> 
>  block/blk-core.c        | 62 ++++++++++++++++++++++++++++++++++++++-----------
>  block/blk-mq.c          | 14 ++++-------
>  block/blk-timeout.c     |  2 +-
>  drivers/scsi/scsi_lib.c | 25 +++++++++++++++++---
>  fs/block_dev.c          |  4 ++--
>  include/linux/blk-mq.h  |  7 +++---
>  include/linux/blkdev.h  | 27 ++++++++++++++++++---
>  7 files changed, 106 insertions(+), 35 deletions(-)


-- 
Martin

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH V6 0/6] block/scsi: safe SCSI quiescing
  2017-09-27  7:57   ` Martin Steigerwald
@ 2017-09-27  8:27     ` Ming Lei
  -1 siblings, 0 replies; 21+ messages in thread
From: Ming Lei @ 2017-09-27  8:27 UTC (permalink / raw)
  To: Martin Steigerwald
  Cc: Jens Axboe, linux-block, Christoph Hellwig, linux-scsi,
	Martin K . Petersen, James E . J . Bottomley, Bart Van Assche,
	Oleksandr Natalenko, Johannes Thumshirn, Cathy Avery

On Wed, Sep 27, 2017 at 09:57:37AM +0200, Martin Steigerwald wrote:
> Hi Ming.
> 
> Ming Lei - 27.09.17, 13:48:
> > Hi,
> > 
> > The current SCSI quiesce isn't safe, and it is easy to trigger an I/O deadlock.
> > 
> > Once a SCSI device is put into QUIESCE, no new request except for
> > RQF_PREEMPT can be dispatched to SCSI successfully, and
> > scsi_device_quiesce() simply waits for completion of the I/Os already
> > dispatched to the SCSI stack. That isn't enough.
> > 
> > New requests can still arrive, but none of the allocated requests
> > can be dispatched successfully, so the request pool is easily
> > exhausted.
> > 
> > Then a request with RQF_PREEMPT can't be allocated and waits forever,
> > while scsi_device_resume() waits for completion of RQF_PREEMPT, so
> > the system hangs forever, for example during system suspend or
> > while sending SCSI domain validation.
> > 
> > I/O hangs during both system suspend[1] and SCSI domain validation
> > have been reported before.
> > 
> > This patchset introduces preempt-only mode and solves the issue
> > by allowing only RQF_PREEMPT requests during SCSI quiesce.
> > 
> > Both SCSI and SCSI_MQ have this I/O deadlock issue; this patchset fixes
> > both.
> > 
> > V6:
> > 	- borrow Bart's idea of preempt only, with a clean
> > 	  implementation (patch 5/patch 6)
> > 	- no dependency on external driver changes, such as the MD
> > 	  change
> 
> Do you want me to test with v6 of the patch set? If so, it would be nice if 
> you'd make a v6 branch in your git repo.

Hi Martin,

I would much appreciate it if you could run V6 and share your test results.
The branch is here:

https://github.com/ming1/linux/tree/blk_safe_scsi_quiesce_V6

https://github.com/ming1/linux.git #blk_safe_scsi_quiesce_V6


> 
> After an uptime of almost 6 days I am pretty confident that the V5 one fixes the 
> issue for me. So
> 
> Tested-by: Martin Steigerwald <martin@lichtvoll.de>
> 
> for V5.

Thanks for your test!


-- 
Ming

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH V6 0/6] block/scsi: safe SCSI quiescing
  2017-09-27  8:27     ` Ming Lei
@ 2017-09-27  8:52       ` Ming Lei
  -1 siblings, 0 replies; 21+ messages in thread
From: Ming Lei @ 2017-09-27  8:52 UTC (permalink / raw)
  To: Martin Steigerwald
  Cc: Jens Axboe, linux-block, Christoph Hellwig, linux-scsi,
	Martin K . Petersen, James E . J . Bottomley, Bart Van Assche,
	Oleksandr Natalenko, Johannes Thumshirn, Cathy Avery

On Wed, Sep 27, 2017 at 04:27:51PM +0800, Ming Lei wrote:
> On Wed, Sep 27, 2017 at 09:57:37AM +0200, Martin Steigerwald wrote:
> > Hi Ming.
> > 
> > Ming Lei - 27.09.17, 13:48:
> > > Hi,
> > > 
> > > The current SCSI quiesce isn't safe, and it is easy to trigger an I/O deadlock.
> > > 
> > > Once a SCSI device is put into QUIESCE, no new request except for
> > > RQF_PREEMPT can be dispatched to SCSI successfully, and
> > > scsi_device_quiesce() simply waits for completion of the I/Os already
> > > dispatched to the SCSI stack. That isn't enough.
> > > 
> > > New requests can still arrive, but none of the allocated requests
> > > can be dispatched successfully, so the request pool is easily
> > > exhausted.
> > > 
> > > Then a request with RQF_PREEMPT can't be allocated and waits forever,
> > > while scsi_device_resume() waits for completion of RQF_PREEMPT, so
> > > the system hangs forever, for example during system suspend or
> > > while sending SCSI domain validation.
> > > 
> > > I/O hangs during both system suspend[1] and SCSI domain validation
> > > have been reported before.
> > > 
> > > This patchset introduces preempt-only mode and solves the issue
> > > by allowing only RQF_PREEMPT requests during SCSI quiesce.
> > > 
> > > Both SCSI and SCSI_MQ have this I/O deadlock issue; this patchset fixes
> > > both.
> > > 
> > > V6:
> > > 	- borrow Bart's idea of preempt only, with a clean
> > > 	  implementation (patch 5/patch 6)
> > > 	- no dependency on external driver changes, such as the MD
> > > 	  change
> > 
> > Do you want me to test with v6 of the patch set? If so, it would be nice if 
> > you'd make a v6 branch in your git repo.
> 
> Hi Martin,
> 
> I would much appreciate it if you could run V6 and share your test results.
> The branch is here:
> 
> https://github.com/ming1/linux/tree/blk_safe_scsi_quiesce_V6
> 
> https://github.com/ming1/linux.git #blk_safe_scsi_quiesce_V6
> 

There is also a branch against v4.13:

https://github.com/ming1/linux/tree/v4.13-safe-scsi-quiesce_V6_for_test

https://github.com/ming1/linux.git #v4.13-safe-scsi-quiesce_V6_for_test

-- 
Ming

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH V6 6/6] SCSI: set block queue at preempt only when SCSI device is put into quiesce
@ 2017-09-27  9:54     ` Bart Van Assche
  0 siblings, 0 replies; 21+ messages in thread
From: Bart Van Assche @ 2017-09-27  9:54 UTC (permalink / raw)
  To: linux-scsi, hch, jejb, linux-block, axboe, ming.lei, martin.petersen
  Cc: Bart Van Assche, martin, jthumshirn, oleksandr, cavery

On Wed, 2017-09-27 at 13:48 +0800, Ming Lei wrote:
> @@ -2928,12 +2929,28 @@ scsi_device_quiesce(struct scsi_device *sdev)
>  {
>  	int err;
>  
> +	/*
> +	 * Simply quiescing SCSI device isn't safe, it is easy
> +	 * to use up requests because all these allocated requests
> +	 * can't be dispatched when device is put in QUIESCE.
> +	 * Then no request can be allocated and we may hang
> +	 * somewhere, such as system suspend/resume.
> +	 *
> +	 * So we set block queue in preempt only first, no new
> +	 * normal request can enter queue any more, and all pending
> +	 * requests are drained once blk_set_preempt_only()
> +	 * returns. Only RQF_PREEMPT is allowed in preempt only mode.
> +	 */
> +	blk_set_preempt_only(sdev->request_queue, true);
> +
>  	mutex_lock(&sdev->state_mutex);
>  	err = scsi_device_set_state(sdev, SDEV_QUIESCE);
>  	mutex_unlock(&sdev->state_mutex);
>  
> -	if (err)
> +	if (err) {
> +		blk_set_preempt_only(sdev->request_queue, false);
>  		return err;
> +	}
>  
>  	scsi_run_queue(sdev->request_queue);
>  	while (atomic_read(&sdev->device_busy)) {
> @@ -2964,6 +2981,8 @@ void scsi_device_resume(struct scsi_device *sdev)
>  	    scsi_device_set_state(sdev, SDEV_RUNNING) == 0)
>  		scsi_run_queue(sdev->request_queue);
>  	mutex_unlock(&sdev->state_mutex);
> +
> +	blk_set_preempt_only(sdev->request_queue, false);

You should have realized yourself that this code is racy. If a request is
allocated just before scsi_device_quiesce() is called and dispatched just
after the device state has been changed into SDEV_QUIESCE then the loop that
waits for all commands to complete will wait forever due to the SCSI prep
function returning BLKPREP_DEFER.

Bart.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH V6 6/6] SCSI: set block queue at preempt only when SCSI device is put into quiesce
  2017-09-27  9:54     ` Bart Van Assche
  (?)
@ 2017-09-27 10:14     ` Ming Lei
  -1 siblings, 0 replies; 21+ messages in thread
From: Ming Lei @ 2017-09-27 10:14 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: linux-scsi, hch, jejb, linux-block, axboe, martin.petersen,
	martin, jthumshirn, oleksandr, cavery

On Wed, Sep 27, 2017 at 09:54:09AM +0000, Bart Van Assche wrote:
> On Wed, 2017-09-27 at 13:48 +0800, Ming Lei wrote:
> > @@ -2928,12 +2929,28 @@ scsi_device_quiesce(struct scsi_device *sdev)
> >  {
> >  	int err;
> >  
> > +	/*
> > +	 * Simply quiescing SCSI device isn't safe, it is easy
> > +	 * to use up requests because all these allocated requests
> > +	 * can't be dispatched when device is put in QUIESCE.
> > +	 * Then no request can be allocated and we may hang
> > +	 * somewhere, such as system suspend/resume.
> > +	 *
> > +	 * So we set block queue in preempt only first, no new
> > +	 * normal request can enter queue any more, and all pending
> > +	 * requests are drained once blk_set_preempt_only()
> > +	 * returns. Only RQF_PREEMPT is allowed in preempt only mode.
> > +	 */
> > +	blk_set_preempt_only(sdev->request_queue, true);
> > +
> >  	mutex_lock(&sdev->state_mutex);
> >  	err = scsi_device_set_state(sdev, SDEV_QUIESCE);
> >  	mutex_unlock(&sdev->state_mutex);
> >  
> > -	if (err)
> > +	if (err) {
> > +		blk_set_preempt_only(sdev->request_queue, false);
> >  		return err;
> > +	}
> >  
> >  	scsi_run_queue(sdev->request_queue);
> >  	while (atomic_read(&sdev->device_busy)) {
> > @@ -2964,6 +2981,8 @@ void scsi_device_resume(struct scsi_device *sdev)
> >  	    scsi_device_set_state(sdev, SDEV_RUNNING) == 0)
> >  		scsi_run_queue(sdev->request_queue);
> >  	mutex_unlock(&sdev->state_mutex);
> > +
> > +	blk_set_preempt_only(sdev->request_queue, false);
> 
> You should have realized yourself that this code is racy. If a request is
> allocated just before scsi_device_quiesce() is called and dispatched just
> after the device state has been changed into SDEV_QUIESCE then the loop that

That won't happen: any requests allocated before blk_set_preempt_only(true)
will be drained, and normal requests are prevented from entering the
queue once blk_set_preempt_only(true) returns.

Please look at blk_set_preempt_only():

	+void blk_set_preempt_only(struct request_queue *q, bool preempt_only)
	+{
	+       blk_mq_freeze_queue(q);
	+       if (preempt_only)
	+               queue_flag_set_unlocked(QUEUE_FLAG_PREEMPT_ONLY, q);
	+       else
	+               queue_flag_clear_unlocked(QUEUE_FLAG_PREEMPT_ONLY, q);
	+       blk_mq_unfreeze_queue(q);
	+}
	+EXPORT_SYMBOL(blk_set_preempt_only);

blk_set_preempt_only(true) is called before scsi_device_set_state(sdev, SDEV_QUIESCE),
so any outstanding requests are drained by blk_mq_freeze_queue() inside
blk_set_preempt_only(), while new normal requests are prevented from
entering the queue.

Once blk_set_preempt_only() returns, only RQF_PREEMPT requests are allowed
to enter the queue.
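
To make the ordering argued above concrete, here is a minimal user-space
sketch. It is only a toy model under stated assumptions, not kernel code:
toy_queue, toy_get_request(), toy_put_request() and toy_set_preempt_only()
are hypothetical names, the pool size of 4 is arbitrary, and where the real
blk_mq_freeze_queue() would sleep until the queue drains, this
single-threaded model simply returns false.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: none of these names are kernel APIs. */
enum { TOY_POOL_SIZE = 4 };	/* arbitrary size of the request pool */

struct toy_queue {
	int in_use;		/* requests currently allocated */
	bool preempt_only;	/* stands in for QUEUE_FLAG_PREEMPT_ONLY */
};

/* Try to allocate a request; returns true on success. */
bool toy_get_request(struct toy_queue *q, bool preempt)
{
	/* In preempt-only mode, normal requests may not enter the queue. */
	if (q->preempt_only && !preempt)
		return false;
	/* Pool exhausted: a real caller would sleep here, possibly forever. */
	if (q->in_use == TOY_POOL_SIZE)
		return false;
	q->in_use++;
	return true;
}

/* Release a previously allocated request. */
void toy_put_request(struct toy_queue *q)
{
	assert(q->in_use > 0);
	q->in_use--;
}

/*
 * Stands in for blk_set_preempt_only(): the flag is only flipped while
 * the queue is drained. The real function blocks in the freeze until
 * that is true; this sketch reports false and lets the caller retry.
 */
bool toy_set_preempt_only(struct toy_queue *q, bool on)
{
	if (q->in_use != 0)
		return false;	/* the freeze would still be waiting */
	q->preempt_only = on;
	return true;
}
```

In this model, with four normal requests outstanding, toy_get_request(&q, true)
fails, which corresponds to the pool exhaustion described in the cover letter.
Once those requests are released and the flag is set, normal allocations are
refused and the preempt allocation succeeds.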


-- 
Ming

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH V6 0/6] block/scsi: safe SCSI quiescing
@ 2017-09-28  8:11         ` Oleksandr Natalenko
  0 siblings, 0 replies; 21+ messages in thread
From: Oleksandr Natalenko @ 2017-09-28  8:11 UTC (permalink / raw)
  To: Ming Lei
  Cc: Martin Steigerwald, Jens Axboe, linux-block, Christoph Hellwig,
	linux-scsi, Martin K . Petersen, James E . J . Bottomley,
	Bart Van Assche, Johannes Thumshirn, Cathy Avery

Hey.

I can confirm that v6 of your patchset still works well for me. Tested on 
v4.13 kernel.

Thanks.

On Wednesday, 27 September 2017 10:52:41 CEST Ming Lei wrote:
> On Wed, Sep 27, 2017 at 04:27:51PM +0800, Ming Lei wrote:
> > On Wed, Sep 27, 2017 at 09:57:37AM +0200, Martin Steigerwald wrote:
> > > Hi Ming.
> > > 
> > > Ming Lei - 27.09.17, 13:48:
> > > > Hi,
> > > > 
> > > > The current SCSI quiesce isn't safe, and it is easy to trigger an I/O deadlock.
> > > > 
> > > > Once a SCSI device is put into QUIESCE, no new request except for
> > > > RQF_PREEMPT can be dispatched to SCSI successfully, and
> > > > scsi_device_quiesce() simply waits for completion of the I/Os already
> > > > dispatched to the SCSI stack. That isn't enough.
> > > > 
> > > > New requests can still arrive, but none of the allocated requests
> > > > can be dispatched successfully, so the request pool is easily
> > > > exhausted.
> > > > 
> > > > Then a request with RQF_PREEMPT can't be allocated and waits forever,
> > > > while scsi_device_resume() waits for completion of RQF_PREEMPT, so
> > > > the system hangs forever, for example during system suspend or
> > > > while sending SCSI domain validation.
> > > > 
> > > > I/O hangs during both system suspend[1] and SCSI domain validation
> > > > have been reported before.
> > > > 
> > > > This patchset introduces preempt-only mode and solves the issue
> > > > by allowing only RQF_PREEMPT requests during SCSI quiesce.
> > > > 
> > > > Both SCSI and SCSI_MQ have this I/O deadlock issue; this patchset fixes
> > > > both.
> > > > 
> > > > V6:
> > > > 	- borrow Bart's idea of preempt only, with a clean
> > > > 	  implementation (patch 5/patch 6)
> > > > 	- no dependency on external driver changes, such as the MD
> > > > 	  change
> > > 
> > > Do you want me to test with v6 of the patch set? If so, it would be nice
> > > if you'd make a v6 branch in your git repo.
> > 
> > Hi Martin,
> > 
> > I would much appreciate it if you could run V6 and share your test results.
> > The branch is here:
> > 
> > https://github.com/ming1/linux/tree/blk_safe_scsi_quiesce_V6
> > 
> > https://github.com/ming1/linux.git #blk_safe_scsi_quiesce_V6
> 
> There is also a branch against v4.13:
> 
> https://github.com/ming1/linux/tree/v4.13-safe-scsi-quiesce_V6_for_test
> 
> https://github.com/ming1/linux.git #v4.13-safe-scsi-quiesce_V6_for_test

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH V6 0/6] block/scsi: safe SCSI quiescing
  2017-09-27  8:27     ` Ming Lei
@ 2017-09-29 18:47       ` Martin Steigerwald
  -1 siblings, 0 replies; 21+ messages in thread
From: Martin Steigerwald @ 2017-09-29 18:47 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, Christoph Hellwig, linux-scsi,
	Martin K . Petersen, James E . J . Bottomley, Bart Van Assche,
	Oleksandr Natalenko, Johannes Thumshirn, Cathy Avery

Ming Lei - 27.09.17, 16:27:
> On Wed, Sep 27, 2017 at 09:57:37AM +0200, Martin Steigerwald wrote:
> > Hi Ming.
> > 
> > Ming Lei - 27.09.17, 13:48:
> > > Hi,
> > > 
> > > The current SCSI quiesce isn't safe, and it is easy to trigger an I/O deadlock.
> > > 
> > > Once a SCSI device is put into QUIESCE, no new request except for
> > > RQF_PREEMPT can be dispatched to SCSI successfully, and
> > > scsi_device_quiesce() simply waits for completion of the I/Os already
> > > dispatched to the SCSI stack. That isn't enough.
> > > 
> > > New requests can still arrive, but none of the allocated requests
> > > can be dispatched successfully, so the request pool is easily
> > > exhausted.
> > > 
> > > Then a request with RQF_PREEMPT can't be allocated and waits forever,
> > > while scsi_device_resume() waits for completion of RQF_PREEMPT, so
> > > the system hangs forever, for example during system suspend or
> > > while sending SCSI domain validation.
> > > 
> > > I/O hangs during both system suspend[1] and SCSI domain validation
> > > have been reported before.
> > > 
> > > This patchset introduces preempt-only mode and solves the issue
> > > by allowing only RQF_PREEMPT requests during SCSI quiesce.
> > > 
> > > Both SCSI and SCSI_MQ have this I/O deadlock issue; this patchset fixes
> > > both.
> > > 
> > > V6:
> > > 	- borrow Bart's idea of preempt only, with a clean
> > > 	  implementation (patch 5/patch 6)
> > > 	- no dependency on external driver changes, such as the MD
> > > 	  change
> > 
> > Do you want me to test with v6 of the patch set? If so, it would be nice
> > if you'd make a v6 branch in your git repo.
> 
> Hi Martin,
> 
> I would much appreciate it if you could run V6 and share your test results.
> The branch is here:
> 
> https://github.com/ming1/linux/tree/blk_safe_scsi_quiesce_V6
> 
> https://github.com/ming1/linux.git #blk_safe_scsi_quiesce_V6
> 
> > After an uptime of almost 6 days I am pretty confident that the V5 one
> > fixes the issue for me. So
> > 
> > Tested-by: Martin Steigerwald <martin@lichtvoll.de>
> > 
> > for V5.
> 
> Thanks for your test!

Two days and almost 6 hours, no hang yet. I bet the whole thing works. 
(3e45474d7df3bfdabe4801b5638d197df9810a79)

Tested-By: Martin Steigerwald <martin@lichtvoll.de>

(It could still hang after three days, but usually I got the first hang within 
the first two days.)

Thanks,
-- 
Martin


end of thread, other threads:[~2017-09-29 18:47 UTC | newest]

Thread overview: 21+ messages
2017-09-27  5:48 [PATCH V6 0/6] block/scsi: safe SCSI quiescing Ming Lei
2017-09-27  5:48 ` [PATCH V6 1/6] blk-mq: only run hw queues for blk-mq Ming Lei
2017-09-27  5:48 ` [PATCH V6 2/6] block: tracking request allocation with q_usage_counter Ming Lei
2017-09-27  6:10   ` Hannes Reinecke
2017-09-27  5:48 ` [PATCH V6 3/6] block: pass flags to blk_queue_enter() Ming Lei
2017-09-27  5:48 ` [PATCH V6 4/6] block: prepare for passing RQF_PREEMPT to request allocation Ming Lei
2017-09-27  5:48 ` [PATCH V6 5/6] block: support PREEMPT_ONLY Ming Lei
2017-09-27  5:48 ` [PATCH V6 6/6] SCSI: set block queue at preempt only when SCSI device is put into quiesce Ming Lei
2017-09-27  9:54   ` Bart Van Assche
2017-09-27  9:54     ` Bart Van Assche
2017-09-27 10:14     ` Ming Lei
2017-09-27  7:57 ` [PATCH V6 0/6] block/scsi: safe SCSI quiescing Martin Steigerwald
2017-09-27  7:57   ` Martin Steigerwald
2017-09-27  8:27   ` Ming Lei
2017-09-27  8:27     ` Ming Lei
2017-09-27  8:52     ` Ming Lei
2017-09-27  8:52       ` Ming Lei
2017-09-28  8:11       ` Oleksandr Natalenko
2017-09-28  8:11         ` Oleksandr Natalenko
2017-09-29 18:47     ` Martin Steigerwald
2017-09-29 18:47       ` Martin Steigerwald
