* [PATCH 0/9] block/scsi: safe SCSI quiescing
@ 2017-08-31 17:27 Ming Lei
  2017-08-31 17:27 ` [PATCH 1/9] percpu-refcount: introduce percpu_ref_is_dead() Ming Lei
                   ` (9 more replies)
  0 siblings, 10 replies; 45+ messages in thread
From: Ming Lei @ 2017-08-31 17:27 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Christoph Hellwig, Bart Van Assche,
	linux-scsi, Martin K . Petersen, James E . J . Bottomley
  Cc: Oleksandr Natalenko, Ming Lei

The current SCSI quiesce implementation isn't safe and can easily trigger I/O deadlock.

Once a SCSI device is put into QUIESCE, no new request except for RQF_PREEMPT
can be dispatched to SCSI successfully, and scsi_device_quiesce() simply
waits for completion of the I/Os already dispatched to the SCSI stack. That
isn't enough at all.

New requests can still be allocated, but none of them can be dispatched
successfully, so the request pool is easily exhausted.

At that point a request with RQF_PREEMPT can't be allocated either, and the
system may hang forever, for example during system suspend or SCSI domain
validation.

I/O hangs during both system suspend[1] and SCSI domain validation
have been reported before.

This series solves the issue by freezing the block queue during
SCSI quiescing, and allowing requests with RQF_PREEMPT to be allocated
while the queue is frozen.

Both SCSI and SCSI_MQ have this I/O deadlock issue; this series fixes
both by unifying blk_freeze_queue() and blk_unfreeze_queue().
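
The pool-exhaustion deadlock, and the effect of the frozen-queue rule, can
be sketched with a toy single-threaded model (illustrative C only;
`toy_queue`, `alloc_unsafe` and friends are made-up names, not block-layer
symbols):

```c
#include <stdbool.h>
#include <assert.h>

/*
 * Toy model of the deadlock described above.  A "queue" has a fixed
 * request pool.  While the device is quiesced, only PREEMPT requests
 * are dispatched (and thus freed); plain requests are allocated but
 * never complete, so allocations are never returned to the pool.
 */
enum { POOL_SIZE = 4 };

struct toy_queue {
	int free_reqs;		/* requests left in the pool */
	bool quiesced;		/* device in QUIESCE */
	bool frozen;		/* queue frozen (the fix) */
};

/* Allocation rule before the fix: any caller may take from the pool. */
static bool alloc_unsafe(struct toy_queue *q, bool preempt)
{
	(void)preempt;
	if (q->free_reqs == 0)
		return false;
	q->free_reqs--;
	return true;
}

/* With the fix: a frozen queue only admits PREEMPT allocations. */
static bool alloc_safe(struct toy_queue *q, bool preempt)
{
	if (q->frozen && !preempt)
		return false;
	if (q->free_reqs == 0)
		return false;
	q->free_reqs--;
	return true;
}

/* Plain I/O submitted while quiesced hammers the pool, then a
 * PREEMPT request (e.g. for suspend) tries to allocate. */
static bool preempt_req_available(struct toy_queue *q,
				  bool (*alloc)(struct toy_queue *, bool))
{
	int i;

	for (i = 0; i < POOL_SIZE + 1; i++)
		alloc(q, false);	/* normal I/O, never completes */
	return alloc(q, true);		/* the PREEMPT request */
}
```

Before the fix, normal I/O submitted during quiesce drains the pool and the
PREEMPT allocation fails, hanging the system; with the frozen-queue rule the
pool stays available for the PREEMPT request.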


[1] https://marc.info/?t=150340250100013&r=3&w=2

Ming Lei (9):
  percpu-refcount: introduce percpu_ref_is_dead()
  blk-mq: rename blk_mq_unfreeze_queue as blk_unfreeze_queue
  blk-mq: rename blk_mq_freeze_queue as blk_freeze_queue
  blk-mq: only run hw queues for blk-mq
  block: introduce blk_drain_queue()
  blk-mq: rename blk_mq_freeze_queue_wait as blk_freeze_queue_wait
  block: tracking request allocation with q_usage_counter
  block: allow to allocate req with REQF_PREEMPT when queue is frozen
  SCSI: freeze block queue when SCSI device is put into quiesce

 block/bfq-iosched.c             |  2 +-
 block/blk-cgroup.c              |  8 +++----
 block/blk-core.c                | 49 +++++++++++++++++++++++++++++++++--------
 block/blk-mq.c                  | 49 +++++++++++++++++++++--------------------
 block/blk-mq.h                  |  1 -
 block/blk.h                     |  5 +++++
 block/elevator.c                |  4 ++--
 drivers/block/loop.c            | 16 +++++++-------
 drivers/block/rbd.c             |  2 +-
 drivers/ide/ide-pm.c            |  3 ++-
 drivers/nvme/host/core.c        |  8 +++----
 drivers/scsi/scsi_lib.c         | 23 +++++++++++++++++--
 include/linux/blk-mq.h          | 13 ++++++-----
 include/linux/blkdev.h          | 18 +++++++++++++--
 include/linux/percpu-refcount.h | 17 ++++++++++++++
 15 files changed, 153 insertions(+), 65 deletions(-)

-- 
2.9.5

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH 1/9] percpu-refcount: introduce percpu_ref_is_dead()
  2017-08-31 17:27 [PATCH 0/9] block/scsi: safe SCSI quiescing Ming Lei
@ 2017-08-31 17:27 ` Ming Lei
  2017-08-31 23:07     ` Bart Van Assche
  2017-09-01 13:59   ` Tejun Heo
  2017-08-31 17:27 ` [PATCH 2/9] blk-mq: rename blk_mq_unfreeze_queue as blk_unfreeze_queue Ming Lei
                   ` (8 subsequent siblings)
  9 siblings, 2 replies; 45+ messages in thread
From: Ming Lei @ 2017-08-31 17:27 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Christoph Hellwig, Bart Van Assche,
	linux-scsi, Martin K . Petersen, James E . J . Bottomley
  Cc: Oleksandr Natalenko, Ming Lei, Tejun Heo

Inside the block layer, we need to support allocating requests with
RQF_PREEMPT even when the queue is frozen.

So introduce this helper for checking whether a percpu_ref has been
killed; the block layer will use it to tell whether a queue is frozen.
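
The distinction the helper draws can be shown with a toy single-threaded
sketch (illustrative only; `toy_ref` is a made-up name and this is not the
real percpu_ref layout): a killed ref becomes dead immediately, but only
reaches zero once all outstanding users drop their references, which is why
"frozen" will be tested with dead rather than zero.

```c
#include <stdbool.h>
#include <assert.h>

/* Toy model of the two states: "dead" (killed) vs "zero" (drained). */
struct toy_ref {
	long count;	/* outstanding references; init holds one */
	bool dead;	/* set by kill, like __PERCPU_REF_DEAD */
};

static void toy_ref_get(struct toy_ref *r) { r->count++; }
static void toy_ref_put(struct toy_ref *r) { r->count--; }

/* Kill marks the ref dead and drops the initial reference. */
static void toy_ref_kill(struct toy_ref *r)
{
	r->dead = true;
	toy_ref_put(r);
}

static bool toy_ref_is_dead(const struct toy_ref *r) { return r->dead; }
static bool toy_ref_is_zero(const struct toy_ref *r) { return r->count == 0; }
```

Right after kill the ref is dead but still nonzero while users remain in
flight; it only becomes zero once the last user drops its reference.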

Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 include/linux/percpu-refcount.h | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/include/linux/percpu-refcount.h b/include/linux/percpu-refcount.h
index c13dceb87b60..08549bbf4080 100644
--- a/include/linux/percpu-refcount.h
+++ b/include/linux/percpu-refcount.h
@@ -324,4 +324,21 @@ static inline bool percpu_ref_is_zero(struct percpu_ref *ref)
 	return !atomic_long_read(&ref->count);
 }
 
+/**
+ * percpu_ref_is_dead - test whether a percpu refcount is killed
+ * @ref: percpu_ref to test
+ *
+ * Returns %true if @ref is dead
+ *
+ * This function is safe to call as long as @ref is between init and exit.
+ */
+static inline bool percpu_ref_is_dead(struct percpu_ref *ref)
+{
+	unsigned long __percpu *percpu_count;
+
+	if (__ref_is_percpu(ref, &percpu_count))
+		return false;
+	return ref->percpu_count_ptr & __PERCPU_REF_DEAD;
+}
+
 #endif
-- 
2.9.5


* [PATCH 2/9] blk-mq: rename blk_mq_unfreeze_queue as blk_unfreeze_queue
  2017-08-31 17:27 [PATCH 0/9] block/scsi: safe SCSI quiescing Ming Lei
  2017-08-31 17:27 ` [PATCH 1/9] percpu-refcount: introduce percpu_ref_is_dead() Ming Lei
@ 2017-08-31 17:27 ` Ming Lei
  2017-08-31 17:27 ` [PATCH 3/9] blk-mq: rename blk_mq_freeze_queue as blk_freeze_queue Ming Lei
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 45+ messages in thread
From: Ming Lei @ 2017-08-31 17:27 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Christoph Hellwig, Bart Van Assche,
	linux-scsi, Martin K . Petersen, James E . J . Bottomley
  Cc: Oleksandr Natalenko, Ming Lei

We will support freezing the queue on the legacy block path too.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-cgroup.c       |  4 ++--
 block/blk-mq.c           | 10 +++++-----
 block/elevator.c         |  2 +-
 drivers/block/loop.c     |  8 ++++----
 drivers/nvme/host/core.c |  4 ++--
 include/linux/blk-mq.h   |  2 +-
 6 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 0480892e97e5..02e8a47ac77c 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -1337,7 +1337,7 @@ int blkcg_activate_policy(struct request_queue *q,
 	spin_unlock_irq(q->queue_lock);
 out_bypass_end:
 	if (q->mq_ops)
-		blk_mq_unfreeze_queue(q);
+		blk_unfreeze_queue(q);
 	else
 		blk_queue_bypass_end(q);
 	if (pd_prealloc)
@@ -1388,7 +1388,7 @@ void blkcg_deactivate_policy(struct request_queue *q,
 	spin_unlock_irq(q->queue_lock);
 
 	if (q->mq_ops)
-		blk_mq_unfreeze_queue(q);
+		blk_unfreeze_queue(q);
 	else
 		blk_queue_bypass_end(q);
 }
diff --git a/block/blk-mq.c b/block/blk-mq.c
index d935f15c54da..82136e83951d 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -172,7 +172,7 @@ void blk_mq_freeze_queue(struct request_queue *q)
 }
 EXPORT_SYMBOL_GPL(blk_mq_freeze_queue);
 
-void blk_mq_unfreeze_queue(struct request_queue *q)
+void blk_unfreeze_queue(struct request_queue *q)
 {
 	int freeze_depth;
 
@@ -183,7 +183,7 @@ void blk_mq_unfreeze_queue(struct request_queue *q)
 		wake_up_all(&q->mq_freeze_wq);
 	}
 }
-EXPORT_SYMBOL_GPL(blk_mq_unfreeze_queue);
+EXPORT_SYMBOL_GPL(blk_unfreeze_queue);
 
 /*
  * FIXME: replace the scsi_internal_device_*block_nowait() calls in the
@@ -2250,7 +2250,7 @@ static void blk_mq_update_tag_set_depth(struct blk_mq_tag_set *set,
 	list_for_each_entry(q, &set->tag_list, tag_set_list) {
 		blk_mq_freeze_queue(q);
 		queue_set_hctx_shared(q, shared);
-		blk_mq_unfreeze_queue(q);
+		blk_unfreeze_queue(q);
 	}
 }
 
@@ -2708,7 +2708,7 @@ static int __blk_mq_update_nr_requests(struct request_queue *q,
 	if (!ret)
 		q->nr_requests = nr;
 
-	blk_mq_unfreeze_queue(q);
+	blk_unfreeze_queue(q);
 
 	return ret;
 }
@@ -2757,7 +2757,7 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set,
 	}
 
 	list_for_each_entry(q, &set->tag_list, tag_set_list)
-		blk_mq_unfreeze_queue(q);
+		blk_unfreeze_queue(q);
 }
 
 void blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set, int nr_hw_queues)
diff --git a/block/elevator.c b/block/elevator.c
index 0e465809d3f3..371c8165c9e8 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -994,7 +994,7 @@ static int elevator_switch_mq(struct request_queue *q,
 		blk_add_trace_msg(q, "elv switch: none");
 
 out:
-	blk_mq_unfreeze_queue(q);
+	blk_unfreeze_queue(q);
 	return ret;
 }
 
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 2fbd4089c20e..5c11ea44d470 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -217,7 +217,7 @@ static void __loop_update_dio(struct loop_device *lo, bool dio)
 		lo->lo_flags |= LO_FLAGS_DIRECT_IO;
 	else
 		lo->lo_flags &= ~LO_FLAGS_DIRECT_IO;
-	blk_mq_unfreeze_queue(lo->lo_queue);
+	blk_unfreeze_queue(lo->lo_queue);
 }
 
 static int
@@ -605,7 +605,7 @@ static int loop_switch(struct loop_device *lo, struct file *file)
 	do_loop_switch(lo, &w);
 
 	/* unfreeze */
-	blk_mq_unfreeze_queue(lo->lo_queue);
+	blk_unfreeze_queue(lo->lo_queue);
 
 	return 0;
 }
@@ -1079,7 +1079,7 @@ static int loop_clr_fd(struct loop_device *lo)
 	lo->lo_state = Lo_unbound;
 	/* This is safe: open() is still holding a reference. */
 	module_put(THIS_MODULE);
-	blk_mq_unfreeze_queue(lo->lo_queue);
+	blk_unfreeze_queue(lo->lo_queue);
 
 	if (lo->lo_flags & LO_FLAGS_PARTSCAN && bdev)
 		loop_reread_partitions(lo, bdev);
@@ -1191,7 +1191,7 @@ loop_set_status(struct loop_device *lo, const struct loop_info64 *info)
 	__loop_update_dio(lo, lo->use_dio);
 
  exit:
-	blk_mq_unfreeze_queue(lo->lo_queue);
+	blk_unfreeze_queue(lo->lo_queue);
 
 	if (!err && (info->lo_flags & LO_FLAGS_PARTSCAN) &&
 	     !(lo->lo_flags & LO_FLAGS_PARTSCAN)) {
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 37046ac2c441..5c76b0a96be2 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1226,7 +1226,7 @@ static void __nvme_revalidate_disk(struct gendisk *disk, struct nvme_id_ns *id)
 
 	if (ctrl->oncs & NVME_CTRL_ONCS_DSM)
 		nvme_config_discard(ns);
-	blk_mq_unfreeze_queue(disk->queue);
+	blk_unfreeze_queue(disk->queue);
 }
 
 static int nvme_revalidate_disk(struct gendisk *disk)
@@ -2753,7 +2753,7 @@ void nvme_unfreeze(struct nvme_ctrl *ctrl)
 
 	mutex_lock(&ctrl->namespaces_mutex);
 	list_for_each_entry(ns, &ctrl->namespaces, list)
-		blk_mq_unfreeze_queue(ns->queue);
+		blk_unfreeze_queue(ns->queue);
 	mutex_unlock(&ctrl->namespaces_mutex);
 }
 EXPORT_SYMBOL_GPL(nvme_unfreeze);
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 13f6c25fa461..2572e5641568 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -257,7 +257,7 @@ void blk_mq_delay_queue(struct blk_mq_hw_ctx *hctx, unsigned long msecs);
 void blk_mq_tagset_busy_iter(struct blk_mq_tag_set *tagset,
 		busy_tag_iter_fn *fn, void *priv);
 void blk_mq_freeze_queue(struct request_queue *q);
-void blk_mq_unfreeze_queue(struct request_queue *q);
+void blk_unfreeze_queue(struct request_queue *q);
 void blk_freeze_queue_start(struct request_queue *q);
 void blk_mq_freeze_queue_wait(struct request_queue *q);
 int blk_mq_freeze_queue_wait_timeout(struct request_queue *q,
-- 
2.9.5


* [PATCH 3/9] blk-mq: rename blk_mq_freeze_queue as blk_freeze_queue
  2017-08-31 17:27 [PATCH 0/9] block/scsi: safe SCSI quiescing Ming Lei
  2017-08-31 17:27 ` [PATCH 1/9] percpu-refcount: introduce percpu_ref_is_dead() Ming Lei
  2017-08-31 17:27 ` [PATCH 2/9] blk-mq: rename blk_mq_unfreeze_queue as blk_unfreeze_queue Ming Lei
@ 2017-08-31 17:27 ` Ming Lei
  2017-08-31 22:35     ` Bart Van Assche
  2017-08-31 17:27 ` [PATCH 4/9] blk-mq: only run hw queues for blk-mq Ming Lei
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 45+ messages in thread
From: Ming Lei @ 2017-08-31 17:27 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Christoph Hellwig, Bart Van Assche,
	linux-scsi, Martin K . Petersen, James E . J . Bottomley
  Cc: Oleksandr Natalenko, Ming Lei

This API will be used by the legacy path too.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/bfq-iosched.c      |  2 +-
 block/blk-cgroup.c       |  4 ++--
 block/blk-mq.c           | 17 ++++-------------
 block/blk-mq.h           |  1 -
 block/elevator.c         |  2 +-
 drivers/block/loop.c     |  8 ++++----
 drivers/block/rbd.c      |  2 +-
 drivers/nvme/host/core.c |  2 +-
 include/linux/blk-mq.h   |  2 +-
 9 files changed, 15 insertions(+), 25 deletions(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 509f39998011..ce2b00e897e2 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -4757,7 +4757,7 @@ static int bfq_init_queue(struct request_queue *q, struct elevator_type *e)
 	 * The invocation of the next bfq_create_group_hierarchy
 	 * function is the head of a chain of function calls
 	 * (bfq_create_group_hierarchy->blkcg_activate_policy->
-	 * blk_mq_freeze_queue) that may lead to the invocation of the
+	 * blk_freeze_queue) that may lead to the invocation of the
 	 * has_work hook function. For this reason,
 	 * bfq_create_group_hierarchy is invoked only after all
 	 * scheduler data has been initialized, apart from the fields
diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 02e8a47ac77c..87c15f3947d5 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -1296,7 +1296,7 @@ int blkcg_activate_policy(struct request_queue *q,
 		return 0;
 
 	if (q->mq_ops)
-		blk_mq_freeze_queue(q);
+		blk_freeze_queue(q);
 	else
 		blk_queue_bypass_start(q);
 pd_prealloc:
@@ -1363,7 +1363,7 @@ void blkcg_deactivate_policy(struct request_queue *q,
 		return;
 
 	if (q->mq_ops)
-		blk_mq_freeze_queue(q);
+		blk_freeze_queue(q);
 	else
 		blk_queue_bypass_start(q);
 
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 82136e83951d..8cf1f7cbef2b 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -161,16 +161,7 @@ void blk_freeze_queue(struct request_queue *q)
 	blk_freeze_queue_start(q);
 	blk_mq_freeze_queue_wait(q);
 }
-
-void blk_mq_freeze_queue(struct request_queue *q)
-{
-	/*
-	 * ...just an alias to keep freeze and unfreeze actions balanced
-	 * in the blk_mq_* namespace
-	 */
-	blk_freeze_queue(q);
-}
-EXPORT_SYMBOL_GPL(blk_mq_freeze_queue);
+EXPORT_SYMBOL_GPL(blk_freeze_queue);
 
 void blk_unfreeze_queue(struct request_queue *q)
 {
@@ -2248,7 +2239,7 @@ static void blk_mq_update_tag_set_depth(struct blk_mq_tag_set *set,
 	lockdep_assert_held(&set->tag_list_lock);
 
 	list_for_each_entry(q, &set->tag_list, tag_set_list) {
-		blk_mq_freeze_queue(q);
+		blk_freeze_queue(q);
 		queue_set_hctx_shared(q, shared);
 		blk_unfreeze_queue(q);
 	}
@@ -2683,7 +2674,7 @@ static int __blk_mq_update_nr_requests(struct request_queue *q,
 	if (!set)
 		return -EINVAL;
 
-	blk_mq_freeze_queue(q);
+	blk_freeze_queue(q);
 
 	ret = 0;
 	queue_for_each_hw_ctx(q, hctx, i) {
@@ -2747,7 +2738,7 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set,
 		return;
 
 	list_for_each_entry(q, &set->tag_list, tag_set_list)
-		blk_mq_freeze_queue(q);
+		blk_freeze_queue(q);
 
 	set->nr_hw_queues = nr_hw_queues;
 	blk_mq_update_queue_map(set);
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 1b9742eb7399..7ce29ef1e6f3 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -30,7 +30,6 @@ struct blk_mq_ctx {
 } ____cacheline_aligned_in_smp;
 
 void blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async);
-void blk_mq_freeze_queue(struct request_queue *q);
 void blk_mq_free_queue(struct request_queue *q);
 int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr);
 void blk_mq_wake_waiters(struct request_queue *q);
diff --git a/block/elevator.c b/block/elevator.c
index 371c8165c9e8..1164c8a3720f 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -967,7 +967,7 @@ static int elevator_switch_mq(struct request_queue *q,
 {
 	int ret;
 
-	blk_mq_freeze_queue(q);
+	blk_freeze_queue(q);
 
 	if (q->elevator) {
 		if (q->elevator->registered)
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 5c11ea44d470..b2e708b7e1e6 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -211,7 +211,7 @@ static void __loop_update_dio(struct loop_device *lo, bool dio)
 	 * LO_FLAGS_READ_ONLY, both are set from kernel, and losetup
 	 * will get updated by ioctl(LOOP_GET_STATUS)
 	 */
-	blk_mq_freeze_queue(lo->lo_queue);
+	blk_freeze_queue(lo->lo_queue);
 	lo->use_dio = use_dio;
 	if (use_dio)
 		lo->lo_flags |= LO_FLAGS_DIRECT_IO;
@@ -599,7 +599,7 @@ static int loop_switch(struct loop_device *lo, struct file *file)
 	w.file = file;
 
 	/* freeze queue and wait for completion of scheduled requests */
-	blk_mq_freeze_queue(lo->lo_queue);
+	blk_freeze_queue(lo->lo_queue);
 
 	/* do the switch action */
 	do_loop_switch(lo, &w);
@@ -1046,7 +1046,7 @@ static int loop_clr_fd(struct loop_device *lo)
 		return -EINVAL;
 
 	/* freeze request queue during the transition */
-	blk_mq_freeze_queue(lo->lo_queue);
+	blk_freeze_queue(lo->lo_queue);
 
 	spin_lock_irq(&lo->lo_lock);
 	lo->lo_state = Lo_rundown;
@@ -1116,7 +1116,7 @@ loop_set_status(struct loop_device *lo, const struct loop_info64 *info)
 		return -EINVAL;
 
 	/* I/O need to be drained during transfer transition */
-	blk_mq_freeze_queue(lo->lo_queue);
+	blk_freeze_queue(lo->lo_queue);
 
 	err = loop_release_xfer(lo);
 	if (err)
diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index b008b6a98098..3a97ffcb3a81 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -6347,7 +6347,7 @@ static ssize_t do_rbd_remove(struct bus_type *bus,
 		 * Prevent new IO from being queued and wait for existing
 		 * IO to complete/fail.
 		 */
-		blk_mq_freeze_queue(rbd_dev->disk->queue);
+		blk_freeze_queue(rbd_dev->disk->queue);
 		blk_set_queue_dying(rbd_dev->disk->queue);
 	}
 
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 5c76b0a96be2..986f2b4f9760 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1210,7 +1210,7 @@ static void __nvme_revalidate_disk(struct gendisk *disk, struct nvme_id_ns *id)
 	bs = 1 << ns->lba_shift;
 	ns->noiob = le16_to_cpu(id->noiob);
 
-	blk_mq_freeze_queue(disk->queue);
+	blk_freeze_queue(disk->queue);
 
 	if (ctrl->ops->flags & NVME_F_METADATA_SUPPORTED)
 		nvme_prep_integrity(disk, id, bs);
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 2572e5641568..8ae77e088c01 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -256,7 +256,7 @@ void blk_mq_run_hw_queues(struct request_queue *q, bool async);
 void blk_mq_delay_queue(struct blk_mq_hw_ctx *hctx, unsigned long msecs);
 void blk_mq_tagset_busy_iter(struct blk_mq_tag_set *tagset,
 		busy_tag_iter_fn *fn, void *priv);
-void blk_mq_freeze_queue(struct request_queue *q);
+void blk_freeze_queue(struct request_queue *q);
 void blk_unfreeze_queue(struct request_queue *q);
 void blk_freeze_queue_start(struct request_queue *q);
 void blk_mq_freeze_queue_wait(struct request_queue *q);
-- 
2.9.5


* [PATCH 4/9] blk-mq: only run hw queues for blk-mq
  2017-08-31 17:27 [PATCH 0/9] block/scsi: safe SCSI quiescing Ming Lei
                   ` (2 preceding siblings ...)
  2017-08-31 17:27 ` [PATCH 3/9] blk-mq: rename blk_mq_freeze_queue as blk_freeze_queue Ming Lei
@ 2017-08-31 17:27 ` Ming Lei
  2017-09-01  8:16     ` Johannes Thumshirn
  2017-08-31 17:27 ` [PATCH 5/9] block: introduce blk_drain_queue() Ming Lei
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 45+ messages in thread
From: Ming Lei @ 2017-08-31 17:27 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Christoph Hellwig, Bart Van Assche,
	linux-scsi, Martin K . Petersen, James E . J . Bottomley
  Cc: Oleksandr Natalenko, Ming Lei

blk_freeze_queue_start() will be called on the legacy path too, where
there are no blk-mq hardware queues to run, so this patch makes the
q->mq_ops check explicit.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 8cf1f7cbef2b..4c532d8612e1 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -125,7 +125,8 @@ void blk_freeze_queue_start(struct request_queue *q)
 	freeze_depth = atomic_inc_return(&q->mq_freeze_depth);
 	if (freeze_depth == 1) {
 		percpu_ref_kill(&q->q_usage_counter);
-		blk_mq_run_hw_queues(q, false);
+		if (q->mq_ops)
+			blk_mq_run_hw_queues(q, false);
 	}
 }
 EXPORT_SYMBOL_GPL(blk_freeze_queue_start);
-- 
2.9.5


* [PATCH 5/9] block: introduce blk_drain_queue()
  2017-08-31 17:27 [PATCH 0/9] block/scsi: safe SCSI quiescing Ming Lei
                   ` (3 preceding siblings ...)
  2017-08-31 17:27 ` [PATCH 4/9] blk-mq: only run hw queues for blk-mq Ming Lei
@ 2017-08-31 17:27 ` Ming Lei
  2017-08-31 22:34     ` Bart Van Assche
  2017-08-31 17:27 ` [PATCH 6/9] blk-mq: rename blk_mq_freeze_queue_wait as blk_freeze_queue_wait Ming Lei
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 45+ messages in thread
From: Ming Lei @ 2017-08-31 17:27 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Christoph Hellwig, Bart Van Assche,
	linux-scsi, Martin K . Petersen, James E . J . Bottomley
  Cc: Oleksandr Natalenko, Ming Lei

Introduce this helper so that we can support freezing a legacy request
queue, which is required for safe SCSI quiescing.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-core.c       | 16 ++++++++++++++++
 include/linux/blkdev.h |  1 +
 2 files changed, 17 insertions(+)

diff --git a/block/blk-core.c b/block/blk-core.c
index d579501f24ba..636452f151ea 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -530,6 +530,22 @@ static void __blk_drain_queue(struct request_queue *q, bool drain_all)
 }
 
 /**
+ * blk_drain_queue - drain requests from request_queue
+ * @q: queue to drain
+ *
+ * Drain requests from @q.  All pending requests are drained.
+ * The caller is responsible for ensuring that no new requests
+ * which need to be drained are queued.
+ */
+void blk_drain_queue(struct request_queue *q)
+{
+	spin_lock_irq(q->queue_lock);
+	__blk_drain_queue(q, true);
+	spin_unlock_irq(q->queue_lock);
+}
+EXPORT_SYMBOL(blk_drain_queue);
+
+/**
  * blk_queue_bypass_start - enter queue bypass mode
  * @q: queue of interest
  *
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index f45f157b2910..02959ca03b66 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1146,6 +1146,7 @@ extern struct request_queue *blk_init_queue_node(request_fn_proc *rfn,
 extern struct request_queue *blk_init_queue(request_fn_proc *, spinlock_t *);
 extern int blk_init_allocated_queue(struct request_queue *);
 extern void blk_cleanup_queue(struct request_queue *);
+extern void blk_drain_queue(struct request_queue *);
 extern void blk_queue_make_request(struct request_queue *, make_request_fn *);
 extern void blk_queue_bounce_limit(struct request_queue *, u64);
 extern void blk_queue_max_hw_sectors(struct request_queue *, unsigned int);
-- 
2.9.5


* [PATCH 6/9] blk-mq: rename blk_mq_freeze_queue_wait as blk_freeze_queue_wait
  2017-08-31 17:27 [PATCH 0/9] block/scsi: safe SCSI quiescing Ming Lei
                   ` (4 preceding siblings ...)
  2017-08-31 17:27 ` [PATCH 5/9] block: introduce blk_drain_queue() Ming Lei
@ 2017-08-31 17:27 ` Ming Lei
  2017-08-31 22:37     ` Bart Van Assche
  2017-08-31 17:27 ` [PATCH 7/9] block: tracking request allocation with q_usage_counter Ming Lei
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 45+ messages in thread
From: Ming Lei @ 2017-08-31 17:27 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Christoph Hellwig, Bart Van Assche,
	linux-scsi, Martin K . Petersen, James E . J . Bottomley
  Cc: Oleksandr Natalenko, Ming Lei

The only change on the legacy path is that blk_drain_queue() is now run
from blk_freeze_queue(), which is called in blk_cleanup_queue().

So this patch removes the explicit __blk_drain_queue() call from
blk_cleanup_queue().

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-core.c         | 2 --
 block/blk-mq.c           | 8 +++++---
 drivers/nvme/host/core.c | 2 +-
 include/linux/blk-mq.h   | 2 +-
 4 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 636452f151ea..7a9f93889669 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -669,8 +669,6 @@ void blk_cleanup_queue(struct request_queue *q)
 	 */
 	blk_freeze_queue(q);
 	spin_lock_irq(lock);
-	if (!q->mq_ops)
-		__blk_drain_queue(q, true);
 	queue_flag_set(QUEUE_FLAG_DEAD, q);
 	spin_unlock_irq(lock);
 
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 4c532d8612e1..24de78afbe9a 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -131,11 +131,13 @@ void blk_freeze_queue_start(struct request_queue *q)
 }
 EXPORT_SYMBOL_GPL(blk_freeze_queue_start);
 
-void blk_mq_freeze_queue_wait(struct request_queue *q)
+void blk_freeze_queue_wait(struct request_queue *q)
 {
+	if (!q->mq_ops)
+		blk_drain_queue(q);
 	wait_event(q->mq_freeze_wq, percpu_ref_is_zero(&q->q_usage_counter));
 }
-EXPORT_SYMBOL_GPL(blk_mq_freeze_queue_wait);
+EXPORT_SYMBOL_GPL(blk_freeze_queue_wait);
 
 int blk_mq_freeze_queue_wait_timeout(struct request_queue *q,
 				     unsigned long timeout)
@@ -160,7 +162,7 @@ void blk_freeze_queue(struct request_queue *q)
 	 * exported to drivers as the only user for unfreeze is blk_mq.
 	 */
 	blk_freeze_queue_start(q);
-	blk_mq_freeze_queue_wait(q);
+	blk_freeze_queue_wait(q);
 }
 EXPORT_SYMBOL_GPL(blk_freeze_queue);
 
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 986f2b4f9760..d34a9ffaa940 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2778,7 +2778,7 @@ void nvme_wait_freeze(struct nvme_ctrl *ctrl)
 
 	mutex_lock(&ctrl->namespaces_mutex);
 	list_for_each_entry(ns, &ctrl->namespaces, list)
-		blk_mq_freeze_queue_wait(ns->queue);
+		blk_freeze_queue_wait(ns->queue);
 	mutex_unlock(&ctrl->namespaces_mutex);
 }
 EXPORT_SYMBOL_GPL(nvme_wait_freeze);
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 8ae77e088c01..f90d78eb85df 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -259,7 +259,7 @@ void blk_mq_tagset_busy_iter(struct blk_mq_tag_set *tagset,
 void blk_freeze_queue(struct request_queue *q);
 void blk_unfreeze_queue(struct request_queue *q);
 void blk_freeze_queue_start(struct request_queue *q);
-void blk_mq_freeze_queue_wait(struct request_queue *q);
+void blk_freeze_queue_wait(struct request_queue *q);
 int blk_mq_freeze_queue_wait_timeout(struct request_queue *q,
 				     unsigned long timeout);
 int blk_mq_reinit_tagset(struct blk_mq_tag_set *set,
-- 
2.9.5


* [PATCH 7/9] block: tracking request allocation with q_usage_counter
  2017-08-31 17:27 [PATCH 0/9] block/scsi: safe SCSI quiescing Ming Lei
                   ` (5 preceding siblings ...)
  2017-08-31 17:27 ` [PATCH 6/9] blk-mq: rename blk_mq_freeze_queue_wait as blk_freeze_queue_wait Ming Lei
@ 2017-08-31 17:27 ` Ming Lei
  2017-08-31 17:27 ` [PATCH 8/9] block: allow to allocate req with REQF_PREEMPT when queue is frozen Ming Lei
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 45+ messages in thread
From: Ming Lei @ 2017-08-31 17:27 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Christoph Hellwig, Bart Van Assche,
	linux-scsi, Martin K . Petersen, James E . J . Bottomley
  Cc: Oleksandr Natalenko, Ming Lei

Track request allocation on the legacy path with q_usage_counter,
basically the same way blk-mq does, so that we can easily support
freezing the queue.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-core.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/block/blk-core.c b/block/blk-core.c
index 7a9f93889669..a84672f4db32 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1406,16 +1406,21 @@ static struct request *blk_old_get_request(struct request_queue *q,
 					   unsigned int op, gfp_t gfp_mask)
 {
 	struct request *rq;
+	int ret = 0;
 
 	WARN_ON_ONCE(q->mq_ops);
 
 	/* create ioc upfront */
 	create_io_context(gfp_mask, q->node);
 
+	ret = blk_queue_enter(q, !(gfp_mask & __GFP_DIRECT_RECLAIM));
+	if (ret)
+		return ERR_PTR(ret);
 	spin_lock_irq(q->queue_lock);
 	rq = get_request(q, op, NULL, gfp_mask);
 	if (IS_ERR(rq)) {
 		spin_unlock_irq(q->queue_lock);
+		blk_queue_exit(q);
 		return rq;
 	}
 
@@ -1587,6 +1592,7 @@ void __blk_put_request(struct request_queue *q, struct request *req)
 		blk_free_request(rl, req);
 		freed_request(rl, sync, rq_flags);
 		blk_put_rl(rl);
+		blk_queue_exit(q);
 	}
 }
 EXPORT_SYMBOL_GPL(__blk_put_request);
@@ -1868,8 +1874,10 @@ static blk_qc_t blk_queue_bio(struct request_queue *q, struct bio *bio)
 	 * Grab a free request. This is might sleep but can not fail.
 	 * Returns with the queue unlocked.
 	 */
+	blk_queue_enter_live(q);
 	req = get_request(q, bio->bi_opf, bio, GFP_NOIO);
 	if (IS_ERR(req)) {
+		blk_queue_exit(q);
 		__wbt_done(q->rq_wb, wb_acct);
 		if (PTR_ERR(req) == -ENOMEM)
 			bio->bi_status = BLK_STS_RESOURCE;
-- 
2.9.5


* [PATCH 8/9] block: allow to allocate req with REQF_PREEMPT when queue is frozen
  2017-08-31 17:27 [PATCH 0/9] block/scsi: safe SCSI quiescing Ming Lei
                   ` (6 preceding siblings ...)
  2017-08-31 17:27 ` [PATCH 7/9] block: tracking request allocation with q_usage_counter Ming Lei
@ 2017-08-31 17:27 ` Ming Lei
  2017-08-31 22:50     ` Bart Van Assche
  2017-08-31 17:27 ` [PATCH 9/9] SCSI: freeze block queue when SCSI device is put into quiesce Ming Lei
  2017-08-31 17:34   ` Oleksandr Natalenko
  9 siblings, 1 reply; 45+ messages in thread
From: Ming Lei @ 2017-08-31 17:27 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Christoph Hellwig, Bart Van Assche,
	linux-scsi, Martin K . Petersen, James E . J . Bottomley
  Cc: Oleksandr Natalenko, Ming Lei

RQF_PREEMPT is a bit special because such requests must still be
dispatched to the LLD even while the SCSI device is quiesced.

So this patch introduces __blk_get_request() to allow the block layer
to allocate a request while the queue is frozen, since the following
patch will freeze the queue before quiescing the SCSI device in order
to support safe SCSI quiescing.
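
The gatekeeping this patch adds to both allocation paths can be summarized
as a small decision function (an illustrative sketch, not kernel code; in
the real patch the non-PREEMPT case goes through blk_queue_enter(), which
may wait or fail with BLK_MQ_REQ_NOWAIT rather than simply reject):

```c
#include <stdbool.h>
#include <assert.h>

enum enter_path {
	ENTER_NORMAL,	/* blk_queue_enter() succeeds */
	ENTER_LIVE,	/* blk_queue_enter_live(): bypass the freeze */
	ENTER_REJECTED,	/* blk_queue_enter() fails or waits */
};

/* Which enter path an allocation takes under the new rule. */
static enum enter_path queue_enter_path(bool frozen, bool preempt)
{
	if (preempt && frozen)
		return ENTER_LIVE;	/* PREEMPT req on a frozen queue */
	if (frozen)
		return ENTER_REJECTED;	/* ordinary req, queue frozen */
	return ENTER_NORMAL;		/* queue not frozen */
}
```

Only the PREEMPT-on-frozen combination takes the live path; ordinary
allocations keep the existing blk_queue_enter() behavior.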

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-core.c       | 25 +++++++++++++++++--------
 block/blk-mq.c         | 11 +++++++++--
 block/blk.h            |  5 +++++
 include/linux/blk-mq.h |  7 ++++---
 include/linux/blkdev.h | 17 +++++++++++++++--
 5 files changed, 50 insertions(+), 15 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index a84672f4db32..4e048cbf82e1 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1403,7 +1403,8 @@ static struct request *get_request(struct request_queue *q, unsigned int op,
 }
 
 static struct request *blk_old_get_request(struct request_queue *q,
-					   unsigned int op, gfp_t gfp_mask)
+					   unsigned int op, gfp_t gfp_mask,
+					   unsigned int flags)
 {
 	struct request *rq;
 	int ret = 0;
@@ -1413,9 +1414,17 @@ static struct request *blk_old_get_request(struct request_queue *q,
 	/* create ioc upfront */
 	create_io_context(gfp_mask, q->node);
 
-	ret = blk_queue_enter(q, !(gfp_mask & __GFP_DIRECT_RECLAIM));
+	/*
+	 * When queue is frozen, we still need to allocate req for
+	 * REQF_PREEMPT.
+	 */
+	if ((flags & BLK_MQ_REQ_PREEMPT) && blk_queue_is_frozen(q))
+		blk_queue_enter_live(q);
+	else
+		ret = blk_queue_enter(q, !(gfp_mask & __GFP_DIRECT_RECLAIM));
 	if (ret)
 		return ERR_PTR(ret);
+
 	spin_lock_irq(q->queue_lock);
 	rq = get_request(q, op, NULL, gfp_mask);
 	if (IS_ERR(rq)) {
@@ -1431,26 +1440,26 @@ static struct request *blk_old_get_request(struct request_queue *q,
 	return rq;
 }
 
-struct request *blk_get_request(struct request_queue *q, unsigned int op,
-				gfp_t gfp_mask)
+struct request *__blk_get_request(struct request_queue *q, unsigned int op,
+				  gfp_t gfp_mask, unsigned int flags)
 {
 	struct request *req;
 
 	if (q->mq_ops) {
 		req = blk_mq_alloc_request(q, op,
-			(gfp_mask & __GFP_DIRECT_RECLAIM) ?
-				0 : BLK_MQ_REQ_NOWAIT);
+			flags | ((gfp_mask & __GFP_DIRECT_RECLAIM) ?
+				0 : BLK_MQ_REQ_NOWAIT));
 		if (!IS_ERR(req) && q->mq_ops->initialize_rq_fn)
 			q->mq_ops->initialize_rq_fn(req);
 	} else {
-		req = blk_old_get_request(q, op, gfp_mask);
+		req = blk_old_get_request(q, op, gfp_mask, flags);
 		if (!IS_ERR(req) && q->initialize_rq_fn)
 			q->initialize_rq_fn(req);
 	}
 
 	return req;
 }
-EXPORT_SYMBOL(blk_get_request);
+EXPORT_SYMBOL(__blk_get_request);
 
 /**
  * blk_requeue_request - put a request back on queue
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 24de78afbe9a..bd0a498e04df 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -382,9 +382,16 @@ struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op,
 {
 	struct blk_mq_alloc_data alloc_data = { .flags = flags };
 	struct request *rq;
-	int ret;
+	int ret = 0;
 
-	ret = blk_queue_enter(q, flags & BLK_MQ_REQ_NOWAIT);
+	/*
+	 * When the queue is frozen, we still need to allocate
+	 * a request for RQF_PREEMPT.
+	 */
+	if ((flags & BLK_MQ_REQ_PREEMPT) && blk_queue_is_frozen(q))
+		blk_queue_enter_live(q);
+	else
+		ret = blk_queue_enter(q, flags & BLK_MQ_REQ_NOWAIT);
 	if (ret)
 		return ERR_PTR(ret);
 
diff --git a/block/blk.h b/block/blk.h
index 6847c5435cca..784e487c7bb7 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -79,6 +79,11 @@ static inline void blk_queue_enter_live(struct request_queue *q)
 	percpu_ref_get(&q->q_usage_counter);
 }
 
+static inline bool blk_queue_is_frozen(struct request_queue *q)
+{
+	return percpu_ref_is_dead(&q->q_usage_counter);
+}
+
 #ifdef CONFIG_BLK_DEV_INTEGRITY
 void blk_flush_integrity(void);
 bool __bio_integrity_endio(struct bio *);
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index f90d78eb85df..0ba5cb043172 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -200,9 +200,10 @@ void blk_mq_free_request(struct request *rq);
 bool blk_mq_can_queue(struct blk_mq_hw_ctx *);
 
 enum {
-	BLK_MQ_REQ_NOWAIT	= (1 << 0), /* return when out of requests */
-	BLK_MQ_REQ_RESERVED	= (1 << 1), /* allocate from reserved pool */
-	BLK_MQ_REQ_INTERNAL	= (1 << 2), /* allocate internal/sched tag */
+	BLK_MQ_REQ_PREEMPT	= BLK_REQ_PREEMPT, /* allocate for RQF_PREEMPT */
+	BLK_MQ_REQ_NOWAIT	= (1 << 8), /* return when out of requests */
+	BLK_MQ_REQ_RESERVED	= (1 << 9), /* allocate from reserved pool */
+	BLK_MQ_REQ_INTERNAL	= (1 << 10), /* allocate internal/sched tag */
 };
 
 struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op,
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 02959ca03b66..10369ba86102 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -858,6 +858,11 @@ enum {
 	BLKPREP_INVALID,	/* invalid command, kill, return -EREMOTEIO */
 };
 
+/* passed to __blk_get_request */
+enum {
+	BLK_REQ_PREEMPT	= (1 << 0), /* allocate for RQF_PREEMPT */
+};
+
 extern unsigned long blk_max_low_pfn, blk_max_pfn;
 
 /*
@@ -940,8 +945,9 @@ extern void blk_rq_init(struct request_queue *q, struct request *rq);
 extern void blk_init_request_from_bio(struct request *req, struct bio *bio);
 extern void blk_put_request(struct request *);
 extern void __blk_put_request(struct request_queue *, struct request *);
-extern struct request *blk_get_request(struct request_queue *, unsigned int op,
-				       gfp_t gfp_mask);
+extern struct request *__blk_get_request(struct request_queue *,
+					 unsigned int op, gfp_t gfp_mask,
+					 unsigned int flags);
 extern void blk_requeue_request(struct request_queue *, struct request *);
 extern int blk_lld_busy(struct request_queue *q);
 extern int blk_rq_prep_clone(struct request *rq, struct request *rq_src,
@@ -992,6 +998,13 @@ blk_status_t errno_to_blk_status(int errno);
 
 bool blk_mq_poll(struct request_queue *q, blk_qc_t cookie);
 
+static inline struct request *blk_get_request(struct request_queue *q,
+					      unsigned int op,
+					      gfp_t gfp_mask)
+{
+	return __blk_get_request(q, op, gfp_mask, 0);
+}
+
 static inline struct request_queue *bdev_get_queue(struct block_device *bdev)
 {
 	return bdev->bd_disk->queue;	/* this is never NULL */
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 9/9] SCSI: freeze block queue when SCSI device is put into quiesce
  2017-08-31 17:27 [PATCH 0/9] block/scsi: safe SCSI quiescing Ming Lei
                   ` (7 preceding siblings ...)
  2017-08-31 17:27 ` [PATCH 8/9] block: allow to allocate req with REQF_PREEMPT when queue is frozen Ming Lei
@ 2017-08-31 17:27 ` Ming Lei
  2017-08-31 17:34   ` Oleksandr Natalenko
  9 siblings, 0 replies; 45+ messages in thread
From: Ming Lei @ 2017-08-31 17:27 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Christoph Hellwig, Bart Van Assche,
	linux-scsi, Martin K . Petersen, James E . J . Bottomley
  Cc: Oleksandr Natalenko, Ming Lei

Simply quiescing the SCSI device and waiting for completion of the
I/O dispatched to the SCSI queue isn't safe: requests can still be
allocated, but none of them can be dispatched once the device is
put into QUIESCE, so the request pool is easily used up. Then no
request can be allocated for RQF_PREEMPT, and the system may hang
somewhere, such as when sending sync_cache or start_stop commands
during the system suspend path.

Before quiescing the SCSI device, this patch freezes the block
queue first, so no new request can enter the queue any more, and
all pending requests are drained once blk_freeze_queue() returns.

This patch also uses __blk_get_request() to allocate requests
with RQF_PREEMPT, so that the allocation can succeed even when
the block queue is frozen.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/ide/ide-pm.c    |  3 ++-
 drivers/scsi/scsi_lib.c | 23 +++++++++++++++++++++--
 2 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/drivers/ide/ide-pm.c b/drivers/ide/ide-pm.c
index 544f02d673ca..4970e3561b13 100644
--- a/drivers/ide/ide-pm.c
+++ b/drivers/ide/ide-pm.c
@@ -89,7 +89,8 @@ int generic_ide_resume(struct device *dev)
 	}
 
 	memset(&rqpm, 0, sizeof(rqpm));
-	rq = blk_get_request(drive->queue, REQ_OP_DRV_IN, __GFP_RECLAIM);
+	rq = __blk_get_request(drive->queue, REQ_OP_DRV_IN,
+			       __GFP_RECLAIM, BLK_REQ_PREEMPT);
 	ide_req(rq)->type = ATA_PRIV_PM_RESUME;
 	rq->rq_flags |= RQF_PREEMPT;
 	rq->special = &rqpm;
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index f6097b89d5d3..b9ad8aeb456f 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -244,9 +244,10 @@ int scsi_execute(struct scsi_device *sdev, const unsigned char *cmd,
 	struct scsi_request *rq;
 	int ret = DRIVER_ERROR << 24;
 
-	req = blk_get_request(sdev->request_queue,
+	req = __blk_get_request(sdev->request_queue,
 			data_direction == DMA_TO_DEVICE ?
-			REQ_OP_SCSI_OUT : REQ_OP_SCSI_IN, __GFP_RECLAIM);
+			REQ_OP_SCSI_OUT : REQ_OP_SCSI_IN, __GFP_RECLAIM,
+			BLK_REQ_PREEMPT);
 	if (IS_ERR(req))
 		return ret;
 	rq = scsi_req(req);
@@ -2890,6 +2891,19 @@ scsi_device_quiesce(struct scsi_device *sdev)
 {
 	int err;
 
+	/*
+	 * Simply quiescing the SCSI device isn't safe: requests
+	 * can still be allocated, but none of them can be
+	 * dispatched while the device is in QUIESCE, so the
+	 * request pool is easily used up and we may hang
+	 * somewhere, such as in system suspend/resume.
+	 *
+	 * So freeze the block queue first: no new request can
+	 * enter the queue any more, and pending requests are
+	 * drained once blk_freeze_queue() returns.
+	 */
+	blk_freeze_queue(sdev->request_queue);
+
 	mutex_lock(&sdev->state_mutex);
 	err = scsi_device_set_state(sdev, SDEV_QUIESCE);
 	mutex_unlock(&sdev->state_mutex);
@@ -2917,6 +2931,11 @@ EXPORT_SYMBOL(scsi_device_quiesce);
  */
 void scsi_device_resume(struct scsi_device *sdev)
 {
+	/* wait for completion of IO issued during SCSI quiesce */
+	blk_freeze_queue_wait(sdev->request_queue);
+
+	blk_unfreeze_queue(sdev->request_queue);
+
 	/* check if the device state was mutated prior to resume, and if
 	 * so assume the state is being managed elsewhere (for example
 	 * device deleted during suspend)
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 45+ messages in thread


* Re: [PATCH 0/9] block/scsi: safe SCSI quiescing
@ 2017-08-31 17:34   ` Oleksandr Natalenko
  0 siblings, 0 replies; 45+ messages in thread
From: Oleksandr Natalenko @ 2017-08-31 17:34 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Bart Van Assche,
	linux-scsi, Martin K . Petersen, James E . J . Bottomley

Since I'm in CC, does this series aim to replace 2 patches I've tested before:

blk-mq: add requests in the tail of hctx->dispatch
blk-mq: align to legacy's implementation of blk_execute_rq

?

On čtvrtek 31. srpna 2017 19:27:19 CEST Ming Lei wrote:
> The current SCSI quiesce isn't safe and easy to trigger I/O deadlock.
> 
> Once SCSI device is put into QUIESCE, no new request except for RQF_PREEMPT
> can be dispatched to SCSI successfully, and scsi_device_quiesce() just
> simply waits for completion of I/Os dispatched to SCSI stack. It isn't
> enough at all.
> 
> Because new request still can be allocated, but all the allocated
> requests can't be dispatched successfully, so request pool can be
> consumed up easily.
> 
> Then request with RQF_PREEMPT can't be allocated, and system may
> hang forever, such as during system suspend or SCSI domain alidation.
> 
> Both IO hang inside system suspend[1] or SCSI domain validation
> were reported before.
> 
> This patch tries to solve the issue by freezing block queue during
> SCSI quiescing, and allowing to allocate request of RQF_PREEMPT
> when queue is frozen.
> 
> Both SCSI and SCSI_MQ have this IO deadlock issue, this patch fixes
> them all by unifying blk_freeze_queue() and blk_unfreeze_queue().
> 
> 
> [1] https://marc.info/?t=150340250100013&r=3&w=2
> 
> Ming Lei (9):
>   percpu-refcount: introduce percpu_ref_is_dead()
>   blk-mq: rename blk_mq_unfreeze_queue as blk_unfreeze_queue
>   blk-mq: rename blk_mq_freeze_queue as blk_freeze_queue
>   blk-mq: only run hw queues for blk-mq
>   block: introduce blk_drain_queue()
>   blk-mq: rename blk_mq_freeze_queue_wait as blk_freeze_queue_wait
>   block: tracking request allocation with q_usage_counter
>   block: allow to allocate req with REQF_PREEMPT when queue is frozen
>   SCSI: freeze block queue when SCSI device is put into quiesce
> 
>  block/bfq-iosched.c             |  2 +-
>  block/blk-cgroup.c              |  8 +++----
>  block/blk-core.c                | 49
> +++++++++++++++++++++++++++++++++-------- block/blk-mq.c                  |
> 49 +++++++++++++++++++++-------------------- block/blk-mq.h                
>  |  1 -
>  block/blk.h                     |  5 +++++
>  block/elevator.c                |  4 ++--
>  drivers/block/loop.c            | 16 +++++++-------
>  drivers/block/rbd.c             |  2 +-
>  drivers/ide/ide-pm.c            |  3 ++-
>  drivers/nvme/host/core.c        |  8 +++----
>  drivers/scsi/scsi_lib.c         | 23 +++++++++++++++++--
>  include/linux/blk-mq.h          | 13 ++++++-----
>  include/linux/blkdev.h          | 18 +++++++++++++--
>  include/linux/percpu-refcount.h | 17 ++++++++++++++
>  15 files changed, 153 insertions(+), 65 deletions(-)

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 0/9] block/scsi: safe SCSI quiescing
  2017-08-31 17:34   ` Oleksandr Natalenko
  (?)
@ 2017-08-31 17:38   ` Ming Lei
  2017-08-31 18:31       ` Oleksandr Natalenko
  -1 siblings, 1 reply; 45+ messages in thread
From: Ming Lei @ 2017-08-31 17:38 UTC (permalink / raw)
  To: Oleksandr Natalenko
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Bart Van Assche,
	linux-scsi, Martin K . Petersen, James E . J . Bottomley

On Thu, Aug 31, 2017 at 07:34:06PM +0200, Oleksandr Natalenko wrote:
> Since I'm in CC, does this series aim to replace 2 patches I've tested before:
> 
> blk-mq: add requests in the tail of hctx->dispatch
> blk-mq: align to legacy's implementation of blk_execute_rq
> 
> ?

Yeah, this solution is more generic, and the old one in above
two patches may run into the same deadlock inevitably.

Oleksandr, could you test this patchset and provide the feedback?

BTW, it fixes the I/O hang in my raid10 test, but I just write
'devices' to pm_test.

Thanks!

-- 
Ming

^ permalink raw reply	[flat|nested] 45+ messages in thread


* Re: [PATCH 0/9] block/scsi: safe SCSI quiescing
@ 2017-08-31 18:31       ` Oleksandr Natalenko
  0 siblings, 0 replies; 45+ messages in thread
From: Oleksandr Natalenko @ 2017-08-31 18:31 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Bart Van Assche,
	linux-scsi, Martin K . Petersen, James E . J . Bottomley

Tested against v4.13-rc7. With this patchset it looks like I/O doesn't hang, 
but once (just once, not each time) I've got the following stacktrace on 
resume:

===
[   55.577173] ata1.00: Security Log not supported
[   55.580690] ata1.00: configured for UDMA/100
[   55.582257] ------------[ cut here ]------------
[   55.583924] usb 1-1: reset high-speed USB device number 2 using xhci_hcd
[   55.587489] WARNING: CPU: 3 PID: 646 at lib/percpu-refcount.c:361 
percpu_ref_reinit+0x21/0x80
[   55.590073] Modules linked in: nls_iso8859_1 nls_cp437 vfat fat iTCO_wdt 
kvm_intel bochs_drm ppdev kvm ttm iTCO_vendor_support drm_kms_helper irqbypass 
8139too input_leds drm evdev psmouse led_class pcspkr syscopyarea joydev 
sysfillrect lpc_ich 8139cp parport_pc sysimgblt mousedev intel_agp i2c_i801 
fb_sys_fops mii mac_hid intel_gtt parport qemu_fw_cfg button sch_fq_codel 
ip_tables x_tables xfs dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio 
libcrc32c crc32c_generic algif_skcipher af_alg dm_crypt dm_mod dax raid10 
md_mod sr_mod cdrom sd_mod hid_generic usbhid hid uhci_hcd crct10dif_pclmul 
crc32_pclmul crc32c_intel ghash_clmulni_intel virtio_rng ahci xhci_pci 
serio_raw pcbc ehci_pci xhci_hcd rng_core atkbd libps2 libahci ehci_hcd libata 
aesni_intel aes_x86_64 crypto_simd glue_helper cryptd
[   55.611580]  usbcore virtio_pci scsi_mod usb_common virtio_ring virtio 
i8042 serio
[   55.614305] CPU: 3 PID: 646 Comm: kworker/u8:23 Not tainted 4.13.0-pf1 #1
[   55.616611] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 
02/06/2015
[   55.619903] Workqueue: events_unbound async_run_entry_fn
[   55.621888] task: ffff88001b271e00 task.stack: ffffc90000a2c000
[   55.623674] RIP: 0010:percpu_ref_reinit+0x21/0x80
[   55.625751] RSP: 0000:ffffc90000a2fdb0 EFLAGS: 00010002
[   55.628687] RAX: 0000000000000002 RBX: ffff88001dd80768 RCX: 
ffff88001dd80758
[   55.631475] RDX: 0000000000000001 RSI: 0000000000000212 RDI: 
ffffffff81f3e2f0
[   55.633694] RBP: ffffc90000a2fdc0 R08: 0000000cc61e7800 R09: 
ffff88001f9929c0
[   55.637144] R10: ffffffffffec3296 R11: 7fffffffffffffff R12: 
0000000000000246
[   55.642456] R13: ffff88001f410800 R14: ffff88001f414300 R15: 
0000000000000000
[   55.644832] FS:  0000000000000000(0000) GS:ffff88001f980000(0000) knlGS:
0000000000000000
[   55.647388] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   55.649608] CR2: 00000000ffffffff CR3: 000000001aa50000 CR4: 
00000000001406e0
[   55.652688] Call Trace:
[   55.654597]  blk_unfreeze_queue+0x2f/0x50
[   55.656794]  scsi_device_resume+0x28/0x70 [scsi_mod]
[   55.659059]  scsi_dev_type_resume+0x38/0x90 [scsi_mod]
[   55.660875]  async_sdev_resume+0x15/0x20 [scsi_mod]
[   55.662564]  async_run_entry_fn+0x36/0x150
[   55.664241]  process_one_work+0x1de/0x430
[   55.666018]  worker_thread+0x47/0x3f0
[   55.667387]  kthread+0x125/0x140
[   55.672740]  ? process_one_work+0x430/0x430
[   55.674971]  ? kthread_create_on_node+0x70/0x70
[   55.677110]  ret_from_fork+0x25/0x30
[   55.679098] Code: 5b 41 5c 5d c3 0f 1f 44 00 00 55 48 89 e5 41 54 53 48 89 
fb 48 c7 c7 f0 e2 f3 81 e8 0a de 32 00 49 89 c4 48 8b 43 08 a8 03 75 42 <0f> 
ff 48 83 63 08 fd 65 ff 05 31 7d cc 7e 48 8b 53 08 f6 c2 03 
[   55.684822] ---[ end trace dbbf5aed3cf35331 ]---
[   55.714306] PM: resume of devices complete after 500.175 msecs
[   55.717299] OOM killer enabled.
===

Here:

===
355 void percpu_ref_reinit(struct percpu_ref *ref)
356 {
357     unsigned long flags;
358 
359     spin_lock_irqsave(&percpu_ref_switch_lock, flags);
360 
361     WARN_ON_ONCE(!percpu_ref_is_zero(ref));   // <--
362 
363     ref->percpu_count_ptr &= ~__PERCPU_REF_DEAD;
364     percpu_ref_get(ref);
365     __percpu_ref_switch_mode(ref, NULL);
366 
367     spin_unlock_irqrestore(&percpu_ref_switch_lock, flags);
368 }
===

On čtvrtek 31. srpna 2017 19:38:34 CEST Ming Lei wrote:
> On Thu, Aug 31, 2017 at 07:34:06PM +0200, Oleksandr Natalenko wrote:
> > Since I'm in CC, does this series aim to replace 2 patches I've tested
> > before:
> > 
> > blk-mq: add requests in the tail of hctx->dispatch
> > blk-mq: align to legacy's implementation of blk_execute_rq
> > 
> > ?
> 
> Yeah, this solution is more generic, and the old one in above
> two patches may run into the same deadlock inevitably.
> 
> Oleksandr, could you test this patchset and provide the feedback?
> 
> BTW, it fixes the I/O hang in my raid10 test, but I just write
> 'devices' to pm_test.
> 
> Thanks!

^ permalink raw reply	[flat|nested] 45+ messages in thread


* Re: [PATCH 5/9] block: introduce blk_drain_queue()
@ 2017-08-31 22:34     ` Bart Van Assche
  0 siblings, 0 replies; 45+ messages in thread
From: Bart Van Assche @ 2017-08-31 22:34 UTC (permalink / raw)
  To: linux-block, hch, Bart Van Assche, martin.petersen, ming.lei,
	linux-scsi, axboe, jejb
  Cc: oleksandr

On Fri, 2017-09-01 at 01:27 +0800, Ming Lei wrote:
> So that we can support legacy version of freezing queue,
> which is required by safe SCSI quiescing.
> 
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
>  block/blk-core.c       | 16 ++++++++++++++++
>  include/linux/blkdev.h |  1 +
>  2 files changed, 17 insertions(+)
> 
> diff --git a/block/blk-core.c b/block/blk-core.c
> index d579501f24ba..636452f151ea 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -530,6 +530,22 @@ static void __blk_drain_queue(struct request_queue *q, bool drain_all)
>  }
>  
>  /**
> + * blk_drain_queue - drain requests from request_queue
> + * @q: queue to drain
> + *
> + * Drain requests from @q.  All pending requests are drained.
> + * The caller is responsible for ensuring that no new requests
> + * which need to be drained are queued.
> + */
> +void blk_drain_queue(struct request_queue *q)
> +{
> +	spin_lock_irq(q->queue_lock);
> +	__blk_drain_queue(q, true);
> +	spin_unlock_irq(q->queue_lock);
> +}
> +EXPORT_SYMBOL(blk_drain_queue);

Hello Ming,

Please drop this patch. As far as I can see this patch series introduces only
one call to blk_drain_queue(), namely in blk_freeze_queue_wait() in patch 6.
My proposal is to inline the body of blk_drain_queue() in blk_freeze_queue_wait()
instead of reintroducing blk_drain_queue() (see also commit 807592a4fafb;
"block: Let blk_drain_queue() caller obtain the queue lock").

Bart.

^ permalink raw reply	[flat|nested] 45+ messages in thread


* Re: [PATCH 3/9] blk-mq: rename blk_mq_freeze_queue as blk_freeze_queue
@ 2017-08-31 22:35     ` Bart Van Assche
  0 siblings, 0 replies; 45+ messages in thread
From: Bart Van Assche @ 2017-08-31 22:35 UTC (permalink / raw)
  To: linux-block, hch, Bart Van Assche, martin.petersen, ming.lei,
	linux-scsi, axboe, jejb
  Cc: oleksandr

On Fri, 2017-09-01 at 01:27 +0800, Ming Lei wrote:
> This APIs will be used by legacy path too.

Hello Ming,

Had you already considered to combine patches 2/9 and 3/9 into a single patch?

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 45+ messages in thread


* Re: [PATCH 6/9] blk-mq: rename blk_mq_freeze_queue_wait as blk_freeze_queue_wait
@ 2017-08-31 22:37     ` Bart Van Assche
  0 siblings, 0 replies; 45+ messages in thread
From: Bart Van Assche @ 2017-08-31 22:37 UTC (permalink / raw)
  To: linux-block, hch, Bart Van Assche, martin.petersen, ming.lei,
	linux-scsi, axboe, jejb
  Cc: oleksandr

On Fri, 2017-09-01 at 01:27 +0800, Ming Lei wrote:
> The only change on the legacy path is that blk_drain_queue() is run
> from blk_freeze_queue(), which is called in blk_cleanup_queue().
> 
> So this patch removes the explicit __blk_drain_queue() call in
> blk_cleanup_queue().

Hello Ming,

Shouldn't the rename and the move of the __blk_drain_queue() be
separate patches?

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 8/9] block: allow to allocate req with REQF_PREEMPT when queue is frozen
@ 2017-08-31 22:50     ` Bart Van Assche
  0 siblings, 0 replies; 45+ messages in thread
From: Bart Van Assche @ 2017-08-31 22:50 UTC (permalink / raw)
  To: linux-block, hch, Bart Van Assche, martin.petersen, ming.lei,
	linux-scsi, axboe, jejb
  Cc: oleksandr

On Fri, 2017-09-01 at 01:27 +0800, Ming Lei wrote:
> @@ -1413,9 +1414,17 @@ static struct request *blk_old_get_request(struct request_queue *q,
>  	/* create ioc upfront */
>  	create_io_context(gfp_mask, q->node);
>  
> -	ret = blk_queue_enter(q, !(gfp_mask & __GFP_DIRECT_RECLAIM));
> +	/*
> +	 * When queue is frozen, we still need to allocate req for
> +	 * REQF_PREEMPT.
> +	 */
> +	if ((flags & BLK_MQ_REQ_PREEMPT) && blk_queue_is_frozen(q))
> +		blk_queue_enter_live(q);
> +	else
> +		ret = blk_queue_enter(q, !(gfp_mask & __GFP_DIRECT_RECLAIM));

Hello Ming,

Sorry but I doubt that calling blk_queue_enter_live() from inside
blk_old_get_request() is safe. Calling blk_queue_enter_live() is only safe
before a request queue has been marked dead. What prevents a kernel thread
that holds a reference on a request queue and that is running concurrently
with blk_cleanup_queue() from calling blk_old_get_request() after a queue has
been marked dead?

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/9] percpu-refcount: introduce percpu_ref_is_dead()
@ 2017-08-31 23:07     ` Bart Van Assche
  0 siblings, 0 replies; 45+ messages in thread
From: Bart Van Assche @ 2017-08-31 23:07 UTC (permalink / raw)
  To: linux-block, hch, Bart Van Assche, martin.petersen, ming.lei,
	linux-scsi, axboe, jejb
  Cc: tj, oleksandr

On Fri, 2017-09-01 at 01:27 +0800, Ming Lei wrote:
> Inside the block layer, we need to support allocating a request with
> RQF_PREEMPT even though the queue is frozen.

Hello Ming,

Can patch 8/9 be reworked such that it doesn't need the new
percpu_ref_is_dead() helper function, e.g. by examining q->mq_freeze_depth
to check whether or not request processing has been frozen? Have you noticed
that several blk-mq code paths already check q->mq_freeze_depth to check
whether or not a queue is frozen?

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 0/9] block/scsi: safe SCSI quiescing
  2017-08-31 18:31       ` Oleksandr Natalenko
  (?)
@ 2017-09-01  3:45       ` Ming Lei
  2017-09-01  6:24         ` oleksandr
  -1 siblings, 1 reply; 45+ messages in thread
From: Ming Lei @ 2017-09-01  3:45 UTC (permalink / raw)
  To: Oleksandr Natalenko
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Bart Van Assche,
	linux-scsi, Martin K . Petersen, James E . J . Bottomley

On Thu, Aug 31, 2017 at 08:31:54PM +0200, Oleksandr Natalenko wrote:
Tested against v4.13-rc7. With this patchset it looks like I/O doesn't hang,
but once (just once, not each time) I got the following stack trace on
resume:

That doesn't look bad, and thanks for your test!

Could you try the following patch against this patchset to see
if there is still the warning?


diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index b9ad8aeb456f..bbd85ddad18c 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -243,11 +243,12 @@ int scsi_execute(struct scsi_device *sdev, const unsigned char *cmd,
 	struct request *req;
 	struct scsi_request *rq;
 	int ret = DRIVER_ERROR << 24;
+	unsigned flag = sdev->sdev_state == SDEV_QUIESCE ? BLK_REQ_PREEMPT : 0;
 
 	req = __blk_get_request(sdev->request_queue,
 			data_direction == DMA_TO_DEVICE ?
 			REQ_OP_SCSI_OUT : REQ_OP_SCSI_IN, __GFP_RECLAIM,
-			BLK_REQ_PREEMPT);
+			flag);
 	if (IS_ERR(req))
 		return ret;
 	rq = scsi_req(req);
@@ -2931,11 +2932,6 @@ EXPORT_SYMBOL(scsi_device_quiesce);
  */
 void scsi_device_resume(struct scsi_device *sdev)
 {
-	/* wait for completion of IO issued during SCSI quiese */
-	blk_freeze_queue_wait(sdev->request_queue);
-
-	blk_unfreeze_queue(sdev->request_queue);
-
 	/* check if the device state was mutated prior to resume, and if
 	 * so assume the state is being managed elsewhere (for example
 	 * device deleted during suspend)
@@ -2945,6 +2941,11 @@ void scsi_device_resume(struct scsi_device *sdev)
 	    scsi_device_set_state(sdev, SDEV_RUNNING) == 0)
 		scsi_run_queue(sdev->request_queue);
 	mutex_unlock(&sdev->state_mutex);
+
+	/* wait for completion of IO issued during SCSI quiese */
+	blk_freeze_queue_wait(sdev->request_queue);
+
+	blk_unfreeze_queue(sdev->request_queue);
 }
 EXPORT_SYMBOL(scsi_device_resume);
 
-- 
Ming

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/9] percpu-refcount: introduce percpu_ref_is_dead()
  2017-08-31 23:07     ` Bart Van Assche
  (?)
@ 2017-09-01  3:49     ` Ming Lei
  -1 siblings, 0 replies; 45+ messages in thread
From: Ming Lei @ 2017-09-01  3:49 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: linux-block, hch, martin.petersen, linux-scsi, axboe, jejb, tj,
	oleksandr

On Thu, Aug 31, 2017 at 11:07:30PM +0000, Bart Van Assche wrote:
> On Fri, 2017-09-01 at 01:27 +0800, Ming Lei wrote:
> > Inside the block layer, we need to support allocating a request with
> > RQF_PREEMPT even though the queue is frozen.
> 
> Hello Ming,
> 
> Can patch 8/9 be reworked such that it doesn't need the new
> percpu_ref_is_dead() helper function, e.g. by checking q->mq_freeze_depth
> to check whether or not request processing has been frozen? Have you noticed
> that several blk-mq code paths already check q->mq_freeze_depth to check
> whether or not a queue is frozen?

Strictly speaking, that approach is wrong, or at least a lock would be
required to check the counter, because the percpu-refcount hasn't been
killed yet at the point where the atomic counter is incremented.

So I prefer to use this new helper.

-- 
Ming

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 3/9] blk-mq: rename blk_mq_freeze_queue as blk_freeze_queue
  2017-08-31 22:35     ` Bart Van Assche
  (?)
@ 2017-09-01  3:51     ` Ming Lei
  -1 siblings, 0 replies; 45+ messages in thread
From: Ming Lei @ 2017-09-01  3:51 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: linux-block, hch, martin.petersen, linux-scsi, axboe, jejb, oleksandr

On Thu, Aug 31, 2017 at 10:35:50PM +0000, Bart Van Assche wrote:
> On Fri, 2017-09-01 at 01:27 +0800, Ming Lei wrote:
> > These APIs will be used by the legacy path too.
> 
> Hello Ming,
> 
> Had you already considered to combine patches 2/9 and 3/9 into a single patch?

Either way is fine with me, and splitting them makes review easier.

-- 
Ming

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 5/9] block: introduce blk_drain_queue()
  2017-08-31 22:34     ` Bart Van Assche
  (?)
@ 2017-09-01  3:52     ` Ming Lei
  -1 siblings, 0 replies; 45+ messages in thread
From: Ming Lei @ 2017-09-01  3:52 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: linux-block, hch, martin.petersen, linux-scsi, axboe, jejb, oleksandr

On Thu, Aug 31, 2017 at 10:34:49PM +0000, Bart Van Assche wrote:
> On Fri, 2017-09-01 at 01:27 +0800, Ming Lei wrote:
> > So that we can support a legacy version of queue freezing,
> > which is required for safe SCSI quiescing.
> > 
> > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > ---
> >  block/blk-core.c       | 16 ++++++++++++++++
> >  include/linux/blkdev.h |  1 +
> >  2 files changed, 17 insertions(+)
> > 
> > diff --git a/block/blk-core.c b/block/blk-core.c
> > index d579501f24ba..636452f151ea 100644
> > --- a/block/blk-core.c
> > +++ b/block/blk-core.c
> > @@ -530,6 +530,22 @@ static void __blk_drain_queue(struct request_queue *q, bool drain_all)
> >  }
> >  
> >  /**
> > + * blk_drain_queue - drain requests from request_queue
> > + * @q: queue to drain
> > + *
> > + * Drain requests from @q.  All pending requests are drained.
> > + * The caller is responsible for ensuring that no new requests
> > + * which need to be drained are queued.
> > + */
> > +void blk_drain_queue(struct request_queue *q)
> > +{
> > +	spin_lock_irq(q->queue_lock);
> > +	__blk_drain_queue(q, true);
> > +	spin_unlock_irq(q->queue_lock);
> > +}
> > +EXPORT_SYMBOL(blk_drain_queue);
> 
> Hello Ming,
> 
> Please drop this patch. As far as I can see this patch series introduces only
> one call to blk_drain_queue(), namely in blk_freeze_queue_wait() in patch 6.
> My proposal is to inline the body of blk_drain_queue() in blk_freeze_queue_wait()
> instead of reintroducing blk_drain_queue() (see also commit 807592a4fafb;
> "block: Let blk_drain_queue() caller obtain the queue lock").

Looks good, will do it in V2.

-- 
Ming

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 6/9] blk-mq: rename blk_mq_freeze_queue_wait as blk_freeze_queue_wait
  2017-08-31 22:37     ` Bart Van Assche
  (?)
@ 2017-09-01  3:53     ` Ming Lei
  -1 siblings, 0 replies; 45+ messages in thread
From: Ming Lei @ 2017-09-01  3:53 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: linux-block, hch, martin.petersen, linux-scsi, axboe, jejb, oleksandr

On Thu, Aug 31, 2017 at 10:37:46PM +0000, Bart Van Assche wrote:
> On Fri, 2017-09-01 at 01:27 +0800, Ming Lei wrote:
> > The only change on the legacy path is that blk_drain_queue() is run
> > from blk_freeze_queue(), which is called in blk_cleanup_queue().
> > 
> > So this patch removes the explicit __blk_drain_queue() call in
> > blk_cleanup_queue().
> 
> Hello Ming,
> 
> Shouldn't the rename and the move of the __blk_drain_queue() be
> separate patches?

OK.

-- 
Ming

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 8/9] block: allow to allocate req with REQF_PREEMPT when queue is frozen
  2017-08-31 22:50     ` Bart Van Assche
  (?)
@ 2017-09-01  3:55     ` Ming Lei
  2017-09-01 15:43         ` Bart Van Assche
  -1 siblings, 1 reply; 45+ messages in thread
From: Ming Lei @ 2017-09-01  3:55 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: linux-block, hch, martin.petersen, linux-scsi, axboe, jejb, oleksandr

On Thu, Aug 31, 2017 at 10:50:25PM +0000, Bart Van Assche wrote:
> On Fri, 2017-09-01 at 01:27 +0800, Ming Lei wrote:
> > @@ -1413,9 +1414,17 @@ static struct request *blk_old_get_request(struct request_queue *q,
> >  	/* create ioc upfront */
> >  	create_io_context(gfp_mask, q->node);
> >  
> > -	ret = blk_queue_enter(q, !(gfp_mask & __GFP_DIRECT_RECLAIM));
> > +	/*
> > +	 * When queue is frozen, we still need to allocate req for
> > +	 * REQF_PREEMPT.
> > +	 */
> > +	if ((flags & BLK_MQ_REQ_PREEMPT) && blk_queue_is_frozen(q))
> > +		blk_queue_enter_live(q);
> > +	else
> > +		ret = blk_queue_enter(q, !(gfp_mask & __GFP_DIRECT_RECLAIM));
> 
> Hello Ming,
> 
> Sorry but I doubt that calling blk_queue_enter_live() from inside
> blk_old_get_request() is safe. Calling blk_queue_enter_live() is only safe
> before a request queue has been marked dead. What prevents a kernel thread
> that holds a reference on a request queue and that is running concurrently
> with blk_cleanup_queue() from calling blk_old_get_request() after a queue has
> been marked dead?

I have sent a delta patch to the list, which will only call
blk_queue_enter_live() iff the SCSI device is in QUIESCE.

-- 
Ming

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 0/9] block/scsi: safe SCSI quiescing
  2017-09-01  3:45       ` Ming Lei
@ 2017-09-01  6:24         ` oleksandr
  2017-09-01  6:30           ` Ming Lei
  0 siblings, 1 reply; 45+ messages in thread
From: oleksandr @ 2017-09-01  6:24 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Bart Van Assche,
	linux-scsi, Martin K . Petersen, James E . J . Bottomley

Hi.

01.09.2017 05:45, Ming Lei wrote:
> Could you try the following patch against this patchset to see
> if there is still the warning?

With this patch I wasn't able to trigger the per-CPU-related warning.

Also, for 8th patch you've written:

> I have sent a delta patch to the list, which will only call
> blk_queue_enter_live() iff the SCSI device is in QUIESCE.

This refers to the patch I've just tested, right?

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 0/9] block/scsi: safe SCSI quiescing
  2017-09-01  6:24         ` oleksandr
@ 2017-09-01  6:30           ` Ming Lei
  0 siblings, 0 replies; 45+ messages in thread
From: Ming Lei @ 2017-09-01  6:30 UTC (permalink / raw)
  To: oleksandr
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Bart Van Assche,
	linux-scsi, Martin K . Petersen, James E . J . Bottomley

On Fri, Sep 01, 2017 at 08:24:46AM +0200, oleksandr@natalenko.name wrote:
> Hi.
> 
> 01.09.2017 05:45, Ming Lei wrote:
> > Could you try the following patch against this patchset to see
> > if there is still the warning?
> 
> With this patch I wasn't able to trigger the per-CPU-related warning.

That is great!

> 
> Also, for 8th patch you've written:
> 
> > I have sent one delta patch in list, which will only call
> > blk_queue_enter_live() iff SCSI device is in QUIESCE.
> 
> This refers to the patch I've just tested, right?

Yeah, you are right, thanks again for your test!

I will prepare V2 and send it out soon.

-- 
Ming

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 4/9] blk-mq: only run hw queues for blk-mq
@ 2017-09-01  8:16     ` Johannes Thumshirn
  0 siblings, 0 replies; 45+ messages in thread
From: Johannes Thumshirn @ 2017-09-01  8:16 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Bart Van Assche,
	linux-scsi, Martin K . Petersen, James E . J . Bottomley,
	Oleksandr Natalenko

Hi Ming,

On Fri, Sep 01, 2017 at 01:27:23AM +0800, Ming Lei wrote:
> -		blk_mq_run_hw_queues(q, false);
> +		if (q->mq_ops)
> +			blk_mq_run_hw_queues(q, false);

What speaks against putting the if (q->mq_ops) directly into
blk_mq_run_hw_queues() so we can't accidentally call it from the sq path?

Just an idea, no hard preferences here.

Byte,
	Johannes

-- 
Johannes Thumshirn                                          Storage
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 4/9] blk-mq: only run hw queues for blk-mq
  2017-09-01  8:16     ` Johannes Thumshirn
  (?)
@ 2017-09-01 12:33     ` Ming Lei
  2017-09-01 12:44         ` Johannes Thumshirn
  -1 siblings, 1 reply; 45+ messages in thread
From: Ming Lei @ 2017-09-01 12:33 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Bart Van Assche,
	linux-scsi, Martin K . Petersen, James E . J . Bottomley,
	Oleksandr Natalenko

On Fri, Sep 01, 2017 at 10:16:34AM +0200, Johannes Thumshirn wrote:
> Hi Ming,
> 
> On Fri, Sep 01, 2017 at 01:27:23AM +0800, Ming Lei wrote:
> > -		blk_mq_run_hw_queues(q, false);
> > +		if (q->mq_ops)
> > +			blk_mq_run_hw_queues(q, false);
> 
> What speaks against putting the if (q->mq_ops) directly into
> blk_mq_run_hw_queues() so we can't accidentally call it from the sq path?
> 
> Just an idea, no hard preferences here.

Given this is the only case where blk_mq_run_hw_queues() is run on a
!q->mq_ops queue now, I suggest checking q->mq_ops at the call site;
otherwise it would be a bit of overkill.

-- 
Ming

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 4/9] blk-mq: only run hw queues for blk-mq
@ 2017-09-01 12:44         ` Johannes Thumshirn
  0 siblings, 0 replies; 45+ messages in thread
From: Johannes Thumshirn @ 2017-09-01 12:44 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Bart Van Assche,
	linux-scsi, Martin K . Petersen, James E . J . Bottomley,
	Oleksandr Natalenko

On Fri, Sep 01, 2017 at 08:33:40PM +0800, Ming Lei wrote:
> Given this is the only case where blk_mq_run_hw_queues() is run on a
> !q->mq_ops queue now, I suggest checking q->mq_ops at the call site;
> otherwise it would be a bit of overkill.

Fair enough,
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
-- 
Johannes Thumshirn                                          Storage
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/9] percpu-refcount: introduce percpu_ref_is_dead()
  2017-08-31 17:27 ` [PATCH 1/9] percpu-refcount: introduce percpu_ref_is_dead() Ming Lei
  2017-08-31 23:07     ` Bart Van Assche
@ 2017-09-01 13:59   ` Tejun Heo
  2017-09-01 15:44       ` Bart Van Assche
  2017-09-01 16:05     ` Ming Lei
  1 sibling, 2 replies; 45+ messages in thread
From: Tejun Heo @ 2017-09-01 13:59 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Bart Van Assche,
	linux-scsi, Martin K . Petersen, James E . J . Bottomley,
	Oleksandr Natalenko

Hello, Ming.

> +/**
> + * percpu_ref_is_dead - test whether a percpu refcount is killed
> + * @ref: percpu_ref to test
> + *
> + * Returns %true if @ref is dead
> + *
> + * This function is safe to call as long as @ref is between init and exit.
> + */
> +static inline bool percpu_ref_is_dead(struct percpu_ref *ref)
> +{
> +	unsigned long __percpu *percpu_count;
> +
> +	if (__ref_is_percpu(ref, &percpu_count))
> +		return false;
> +	return ref->percpu_count_ptr & __PERCPU_REF_DEAD;
> +}

Can you please explain why percpu_ref_is_dying() isn't enough for your
use case?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 8/9] block: allow to allocate req with REQF_PREEMPT when queue is frozen
@ 2017-09-01 15:43         ` Bart Van Assche
  0 siblings, 0 replies; 45+ messages in thread
From: Bart Van Assche @ 2017-09-01 15:43 UTC (permalink / raw)
  To: ming.lei
  Cc: linux-scsi, hch, jejb, linux-block, oleksandr, martin.petersen, axboe

On Fri, 2017-09-01 at 11:55 +0800, Ming Lei wrote:
> On Thu, Aug 31, 2017 at 10:50:25PM +0000, Bart Van Assche wrote:
> > On Fri, 2017-09-01 at 01:27 +0800, Ming Lei wrote:
> > > @@ -1413,9 +1414,17 @@ static struct request *blk_old_get_request(struct request_queue *q,
> > >  	/* create ioc upfront */
> > >  	create_io_context(gfp_mask, q->node);
> > >  
> > > -	ret = blk_queue_enter(q, !(gfp_mask & __GFP_DIRECT_RECLAIM));
> > > +	/*
> > > +	 * When queue is frozen, we still need to allocate req for
> > > +	 * REQF_PREEMPT.
> > > +	 */
> > > +	if ((flags & BLK_MQ_REQ_PREEMPT) && blk_queue_is_frozen(q))
> > > +		blk_queue_enter_live(q);
> > > +	else
> > > +		ret = blk_queue_enter(q, !(gfp_mask & __GFP_DIRECT_RECLAIM));
> > 
> > Sorry but I doubt that calling blk_queue_enter_live() from inside
> > blk_old_get_request() is safe. Calling blk_queue_enter_live() is only safe
> > before a request queue has been marked dead. What prevents a kernel thread
> > that holds a reference on a request queue and that is running concurrently
> > with blk_cleanup_queue() from calling blk_old_get_request() after a queue has
> > been marked dead?
> 
> I have sent a delta patch to the list, which will only call
> blk_queue_enter_live() iff the SCSI device is in QUIESCE.

That wouldn't make this hack less ugly.

It is possible to trigger the running -> quiesce state transition through
/sys/class/scsi_device/*/device/state and the quiesce -> removed transition
through /sys/class/scsi_device/*/device/delete. An example:

modprobe scsi_debug delay=0 dev_size_mb=16
lsscsi | grep scsi_debug
cd /sys/class/scsi_device/8:0:0:0/device
echo quiesce > state
echo 1 > delete

So the code from your patch 8/9 can race against device removal.

I think we need a better approach than the REQ_PREEMPT hack. How about
implementing resume as a callback by adding an additional function pointer
to struct request_queue / struct blk_mq_ops instead of implementing it as
a request? For SCSI devices, races of resume against removal can e.g. be
handled by checking the scsi_device_set_state() return value, since that
function does not allow the removing/removed to running state transition.

Bart.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/9] percpu-refcount: introduce percpu_ref_is_dead()
@ 2017-09-01 15:44       ` Bart Van Assche
  0 siblings, 0 replies; 45+ messages in thread
From: Bart Van Assche @ 2017-09-01 15:44 UTC (permalink / raw)
  To: tj, ming.lei
  Cc: linux-block, hch, Bart Van Assche, martin.petersen, axboe,
	linux-scsi, oleksandr, jejb

On Fri, 2017-09-01 at 06:59 -0700, Tejun Heo wrote:
> > +/**
> > + * percpu_ref_is_dead - test whether a percpu refcount is killed
> > + * @ref: percpu_ref to test
> > + *
> > + * Returns %true if @ref is dead
> > + *
> > + * This function is safe to call as long as @ref is between init and exit.
> > + */
> > +static inline bool percpu_ref_is_dead(struct percpu_ref *ref)
> > +{
> > +	unsigned long __percpu *percpu_count;
> > +
> > +	if (__ref_is_percpu(ref, &percpu_count))
> > +		return false;
> > +	return ref->percpu_count_ptr & __PERCPU_REF_DEAD;
> > +}
> 
> Can you please explain why percpu_ref_is_dying() isn't enough for your
> use case?

Hello Tejun,

The approach of the whole series is wrong so I think that you can ignore this patch.

Bart.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/9] percpu-refcount: introduce percpu_ref_is_dead()
  2017-09-01 13:59   ` Tejun Heo
  2017-09-01 15:44       ` Bart Van Assche
@ 2017-09-01 16:05     ` Ming Lei
  1 sibling, 0 replies; 45+ messages in thread
From: Ming Lei @ 2017-09-01 16:05 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Bart Van Assche,
	linux-scsi, Martin K . Petersen, James E . J . Bottomley,
	Oleksandr Natalenko

On Fri, Sep 01, 2017 at 06:59:15AM -0700, Tejun Heo wrote:
> Hello, Ming.
> 
> > +/**
> > + * percpu_ref_is_dead - test whether a percpu refcount is killed
> > + * @ref: percpu_ref to test
> > + *
> > + * Returns %true if @ref is dead
> > + *
> > + * This function is safe to call as long as @ref is between init and exit.
> > + */
> > +static inline bool percpu_ref_is_dead(struct percpu_ref *ref)
> > +{
> > +	unsigned long __percpu *percpu_count;
> > +
> > +	if (__ref_is_percpu(ref, &percpu_count))
> > +		return false;
> > +	return ref->percpu_count_ptr & __PERCPU_REF_DEAD;
> > +}
> 
> Can you please explain why percpu_ref_is_dying() isn't enough for your
> use case?

Hi Tejun,

You are right, it looks like percpu_ref_is_dying() is enough;
percpu_ref_get_many() works fine whether or not the ref is dying.

Thanks!

-- 
Ming

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 8/9] block: allow to allocate req with REQF_PREEMPT when queue is frozen
  2017-09-01 15:43         ` Bart Van Assche
  (?)
@ 2017-09-01 16:56         ` Ming Lei
  -1 siblings, 0 replies; 45+ messages in thread
From: Ming Lei @ 2017-09-01 16:56 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: linux-scsi, hch, jejb, linux-block, oleksandr, martin.petersen, axboe

On Fri, Sep 01, 2017 at 03:43:39PM +0000, Bart Van Assche wrote:
> On Fri, 2017-09-01 at 11:55 +0800, Ming Lei wrote:
> > On Thu, Aug 31, 2017 at 10:50:25PM +0000, Bart Van Assche wrote:
> > > On Fri, 2017-09-01 at 01:27 +0800, Ming Lei wrote:
> > > > @@ -1413,9 +1414,17 @@ static struct request *blk_old_get_request(struct request_queue *q,
> > > >  	/* create ioc upfront */
> > > >  	create_io_context(gfp_mask, q->node);
> > > >  
> > > > -	ret = blk_queue_enter(q, !(gfp_mask & __GFP_DIRECT_RECLAIM));
> > > > +	/*
> > > > +	 * When queue is frozen, we still need to allocate req for
> > > > +	 * REQF_PREEMPT.
> > > > +	 */
> > > > +	if ((flags & BLK_MQ_REQ_PREEMPT) && blk_queue_is_frozen(q))
> > > > +		blk_queue_enter_live(q);
> > > > +	else
> > > > +		ret = blk_queue_enter(q, !(gfp_mask & __GFP_DIRECT_RECLAIM));
> > > 
> > > Sorry but I doubt that calling blk_queue_enter_live() from inside
> > > blk_old_get_request() is safe. Calling blk_queue_enter_live() is only safe
> > > before a request queue has been marked dead. What prevents a kernel thread
> > > that holds a reference on a request queue and that is running concurrently
> > > with blk_cleanup_queue() to call blk_old_get_request() after a queue has
> > > been marked dead?
> > 
> > I have sent one delta patch in list, which will only call
> > blk_queue_enter_live() iff SCSI device is in QUIESCE.
> 
> That wouldn't make this hack less ugly.
> 
> It is possible to trigger the running -> quiesce state transition through
> /sys/class/scsi_device/*/device/state and the quiesce -> removed transition
> through /sys/class/scsi_device/*/device/delete. An example:
> 
> modprobe scsi_debug delay=0 dev_size_mb=16
> lsscsi | grep scsi_debug
> cd /sys/class/scsi_device/8:0:0:0/device
> echo quiesce > state
> echo 1 > delete
> 
> So the code from your patch 8/9 can race against device removal.
> 
> I think we need a better approach than the REQ_PREEMPT hack. How about
> implementing resume as a callback by adding an additional function pointer
> to struct request_queue / struct blk_mq_ops instead of implementing it as
> a request? For SCSI devices races of resume against removal can e.g. be
> handled by checking the scsi_device_set_state() return value. That function
> namely does not allow the removing/removed to running state transition.

If there is a race between resume and removal, it has nothing to do
with this patchset.

We definitely need to prevent new requests from being allocated after
the SCSI device is put into quiesce. I don't see a better way than
freezing the queue, because that is the only available way to prevent
new request allocation.

Actually there is a race between normal freezing and the preempt
freezing in this patchset; it will be fixed in V2.


-- 
Ming

^ permalink raw reply	[flat|nested] 45+ messages in thread

end of thread, other threads:[~2017-09-01 16:56 UTC | newest]

Thread overview: 45+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-08-31 17:27 [PATCH 0/9] block/scsi: safe SCSI quiescing Ming Lei
2017-08-31 17:27 ` [PATCH 1/9] percpu-refcount: introduce percpu_ref_is_dead() Ming Lei
2017-08-31 23:07   ` Bart Van Assche
2017-08-31 23:07     ` Bart Van Assche
2017-09-01  3:49     ` Ming Lei
2017-09-01 13:59   ` Tejun Heo
2017-09-01 15:44     ` Bart Van Assche
2017-09-01 15:44       ` Bart Van Assche
2017-09-01 16:05     ` Ming Lei
2017-08-31 17:27 ` [PATCH 2/9] blk-mq: rename blk_mq_unfreeze_queue as blk_unfreeze_queue Ming Lei
2017-08-31 17:27 ` [PATCH 3/9] blk-mq: rename blk_mq_freeze_queue as blk_freeze_queue Ming Lei
2017-08-31 22:35   ` Bart Van Assche
2017-08-31 22:35     ` Bart Van Assche
2017-09-01  3:51     ` Ming Lei
2017-08-31 17:27 ` [PATCH 4/9] blk-mq: only run hw queues for blk-mq Ming Lei
2017-09-01  8:16   ` Johannes Thumshirn
2017-09-01  8:16     ` Johannes Thumshirn
2017-09-01 12:33     ` Ming Lei
2017-09-01 12:44       ` Johannes Thumshirn
2017-09-01 12:44         ` Johannes Thumshirn
2017-08-31 17:27 ` [PATCH 5/9] block: introduce blk_drain_queue() Ming Lei
2017-08-31 22:34   ` Bart Van Assche
2017-08-31 22:34     ` Bart Van Assche
2017-09-01  3:52     ` Ming Lei
2017-08-31 17:27 ` [PATCH 6/9] blk-mq: rename blk_mq_freeze_queue_wait as blk_freeze_queue_wait Ming Lei
2017-08-31 22:37   ` Bart Van Assche
2017-08-31 22:37     ` Bart Van Assche
2017-09-01  3:53     ` Ming Lei
2017-08-31 17:27 ` [PATCH 7/9] block: tracking request allocation with q_usage_counter Ming Lei
2017-08-31 17:27 ` [PATCH 8/9] block: allow to allocate req with REQF_PREEMPT when queue is frozen Ming Lei
2017-08-31 22:50   ` Bart Van Assche
2017-08-31 22:50     ` Bart Van Assche
2017-09-01  3:55     ` Ming Lei
2017-09-01 15:43       ` Bart Van Assche
2017-09-01 15:43         ` Bart Van Assche
2017-09-01 16:56         ` Ming Lei
2017-08-31 17:27 ` [PATCH 9/9] SCSI: freeze block queue when SCSI device is put into quiesce Ming Lei
2017-08-31 17:34 ` [PATCH 0/9] block/scsi: safe SCSI quiescing Oleksandr Natalenko
2017-08-31 17:34   ` Oleksandr Natalenko
2017-08-31 17:38   ` Ming Lei
2017-08-31 18:31     ` Oleksandr Natalenko
2017-08-31 18:31       ` Oleksandr Natalenko
2017-09-01  3:45       ` Ming Lei
2017-09-01  6:24         ` oleksandr
2017-09-01  6:30           ` Ming Lei
