All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH 00/14] SCSI: introduce per-host admin queue & enable runtime PM
@ 2018-08-07 17:44 Ming Lei
  2018-08-07 17:44 ` [RFC PATCH 01/14] blk-mq: allow to pass default queue flags for creating & initializing queue Ming Lei
                   ` (14 more replies)
  0 siblings, 15 replies; 28+ messages in thread
From: Ming Lei @ 2018-08-07 17:44 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Ming Lei, Alan Stern, Christoph Hellwig,
	Bart Van Assche, Jianchao Wang, Hannes Reinecke,
	Johannes Thumshirn, Adrian Hunter, James E.J. Bottomley,
	Martin K. Petersen, linux-scsi

Hi,

This patchset introduces per-host admin request queue for submitting
admin request only, and uses this approach to implement both SCSI
quiesce and runtime PM in one very simply way. Also runtime PM deadlock
can be avoided in case that request pool is used up.

The idea is borrowed from NVMe.

Admin request is submitted via per-host admin queue, and it is still
associated with the same scsi_device as before, and respects this
scsi_device's all kinds of limits. Admin queue shares host tags with
other IO queues.

One core idea is that for admin request submitted from this admin queue,
this request won't be called back to block layer via the associated IO
queue(scsi_device). And this is done in the 3rd patch. So once IO queue
is frozen, it can be observed as really frozen from block layer view.

SCSI quiesce is implemented by admin queue in very simple way, see patch
12.

Also runtime PM for legacy path can be simplified too, see patch 13.

Finally blk-mq simply follows legacy's approach for supporting runtime PM.

Any comments are welcome!

Thanks,

Ming Lei (14):
  blk-mq: allow to pass default queue flags for creating & initializing
    queue
  blk-mq: convert BLK_MQ_F_NO_SCHED into per-queue flag
  SCSI: try to retrieve request_queue via 'scsi_cmnd' if possible
  SCSI: pass 'scsi_device' instance from 'scsi_request'
  SCSI: prepare for introducing admin queue for legacy path
  SCSI: pass scsi_device to scsi_mq_prep_fn
  SCSI: don't set .queuedata in scsi_mq_alloc_queue()
  SCSI: deal with admin queue busy
  SCSI: create admin queue for each host
  SCSI: use the dedicated admin queue to send admin commands
  SCSI: transport_spi: resume a quiesced device
  SCSI: use admin queue to implement queue QUIESCE
  block: simplify runtime PM support
  block: enable runtime PM for blk-mq

 block/blk-core.c                    | 122 ++++++------------
 block/blk-mq-debugfs.c              |   3 +-
 block/blk-mq.c                      |  22 ++--
 block/elevator.c                    |  10 +-
 drivers/block/null_blk_main.c       |   7 +-
 drivers/nvme/host/fc.c              |   4 +-
 drivers/nvme/host/pci.c             |   4 +-
 drivers/nvme/host/rdma.c            |   4 +-
 drivers/nvme/target/loop.c          |   4 +-
 drivers/scsi/hosts.c                |   9 ++
 drivers/scsi/libsas/sas_ata.c       |   2 +-
 drivers/scsi/libsas/sas_scsi_host.c |   2 +-
 drivers/scsi/scsi_error.c           |   2 +-
 drivers/scsi/scsi_lib.c             | 250 ++++++++++++++++++++++++++----------
 drivers/scsi/scsi_priv.h            |   1 +
 drivers/scsi/scsi_scan.c            |   1 +
 drivers/scsi/scsi_sysfs.c           |   1 +
 drivers/scsi/scsi_transport_spi.c   |   3 +
 include/linux/blk-mq.h              |  20 ++-
 include/linux/blkdev.h              |  13 +-
 include/scsi/scsi_device.h          |   5 +-
 include/scsi/scsi_host.h            |   2 +
 include/scsi/scsi_request.h         |   5 +-
 23 files changed, 306 insertions(+), 190 deletions(-)

Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: linux-scsi@vger.kernel.org

-- 
2.9.5

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [RFC PATCH 01/14] blk-mq: allow to pass default queue flags for creating & initializing queue
  2018-08-07 17:44 [RFC PATCH 00/14] SCSI: introduce per-host admin queue & enable runtime PM Ming Lei
@ 2018-08-07 17:44 ` Ming Lei
  2018-08-07 17:44 ` [RFC PATCH 02/14] blk-mq: convert BLK_MQ_F_NO_SCHED into per-queue flag Ming Lei
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 28+ messages in thread
From: Ming Lei @ 2018-08-07 17:44 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Ming Lei, Alan Stern, Christoph Hellwig,
	Bart Van Assche, Jianchao Wang, Hannes Reinecke,
	Johannes Thumshirn, Adrian Hunter, James E.J. Bottomley,
	Martin K. Petersen, linux-scsi

Prepare for converting the flag of BLK_MQ_F_NO_SCHED into per-queue
flag, since the following patches need this way for supporting per-host
admin queue.

Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: linux-scsi@vger.kernel.org
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq.c         | 16 +++++++++-------
 include/linux/blk-mq.h | 19 ++++++++++++++++---
 2 files changed, 25 insertions(+), 10 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index c92ce06fd565..e2d0fe503d8d 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2479,7 +2479,8 @@ void blk_mq_release(struct request_queue *q)
 	free_percpu(q->queue_ctx);
 }
 
-struct request_queue *blk_mq_init_queue(struct blk_mq_tag_set *set)
+struct request_queue *__blk_mq_init_queue(struct blk_mq_tag_set *set,
+					  unsigned long def_flags)
 {
 	struct request_queue *uninit_q, *q;
 
@@ -2487,13 +2488,13 @@ struct request_queue *blk_mq_init_queue(struct blk_mq_tag_set *set)
 	if (!uninit_q)
 		return ERR_PTR(-ENOMEM);
 
-	q = blk_mq_init_allocated_queue(set, uninit_q);
+	q = __blk_mq_init_allocated_queue(set, uninit_q, def_flags);
 	if (IS_ERR(q))
 		blk_cleanup_queue(uninit_q);
 
 	return q;
 }
-EXPORT_SYMBOL(blk_mq_init_queue);
+EXPORT_SYMBOL(__blk_mq_init_queue);
 
 static int blk_mq_hw_ctx_size(struct blk_mq_tag_set *tag_set)
 {
@@ -2567,8 +2568,9 @@ static void blk_mq_realloc_hw_ctxs(struct blk_mq_tag_set *set,
 	blk_mq_sysfs_register(q);
 }
 
-struct request_queue *blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
-						  struct request_queue *q)
+struct request_queue *__blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
+						    struct request_queue *q,
+						    unsigned long def_flags)
 {
 	/* mark the queue as mq asap */
 	q->mq_ops = set->ops;
@@ -2602,7 +2604,7 @@ struct request_queue *blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
 
 	q->nr_queues = nr_cpu_ids;
 
-	q->queue_flags |= QUEUE_FLAG_MQ_DEFAULT;
+	q->queue_flags |= def_flags;
 
 	if (!(set->flags & BLK_MQ_F_SG_MERGE))
 		queue_flag_set_unlocked(QUEUE_FLAG_NO_SG_MERGE, q);
@@ -2652,7 +2654,7 @@ struct request_queue *blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
 	q->mq_ops = NULL;
 	return ERR_PTR(-ENOMEM);
 }
-EXPORT_SYMBOL(blk_mq_init_allocated_queue);
+EXPORT_SYMBOL(__blk_mq_init_allocated_queue);
 
 void blk_mq_free_queue(struct request_queue *q)
 {
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 1da59c16f637..7f6ecd7b35ce 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -200,9 +200,22 @@ enum {
 	((policy & ((1 << BLK_MQ_F_ALLOC_POLICY_BITS) - 1)) \
 		<< BLK_MQ_F_ALLOC_POLICY_START_BIT)
 
-struct request_queue *blk_mq_init_queue(struct blk_mq_tag_set *);
-struct request_queue *blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
-						  struct request_queue *q);
+struct request_queue *__blk_mq_init_queue(struct blk_mq_tag_set *, unsigned long);
+struct request_queue *__blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
+						    struct request_queue *q,
+						    unsigned long def_flags);
+
+static inline struct request_queue *blk_mq_init_queue(struct blk_mq_tag_set *set)
+{
+	return __blk_mq_init_queue(set, QUEUE_FLAG_MQ_DEFAULT);
+}
+
+static inline struct request_queue *
+blk_mq_init_allocated_queue(struct blk_mq_tag_set *set, struct request_queue *q)
+{
+	return __blk_mq_init_allocated_queue(set, q, QUEUE_FLAG_MQ_DEFAULT);
+}
+
 int blk_mq_register_dev(struct device *, struct request_queue *);
 void blk_mq_unregister_dev(struct device *, struct request_queue *);
 
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [RFC PATCH 02/14] blk-mq: convert BLK_MQ_F_NO_SCHED into per-queue flag
  2018-08-07 17:44 [RFC PATCH 00/14] SCSI: introduce per-host admin queue & enable runtime PM Ming Lei
  2018-08-07 17:44 ` [RFC PATCH 01/14] blk-mq: allow to pass default queue flags for creating & initializing queue Ming Lei
@ 2018-08-07 17:44 ` Ming Lei
  2018-08-07 17:44 ` [RFC PATCH 03/14] SCSI: try to retrieve request_queue via 'scsi_cmnd' if possible Ming Lei
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 28+ messages in thread
From: Ming Lei @ 2018-08-07 17:44 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Ming Lei, Alan Stern, Christoph Hellwig,
	Bart Van Assche, Jianchao Wang, Hannes Reinecke,
	Johannes Thumshirn, Adrian Hunter, James E.J. Bottomley,
	Martin K. Petersen, linux-scsi

We need to support admin queue for scsi host, and not like NVMe,
this support is only from logic view, and the admin queue still has
to share same tags with IO queues.

Convert BLK_MQ_F_NO_SCHED into per-queue flag so that we can support
admin queue for SCSI.

Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: linux-scsi@vger.kernel.org
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq-debugfs.c        | 2 +-
 block/blk-mq.c                | 2 +-
 block/elevator.c              | 3 +--
 drivers/block/null_blk_main.c | 7 ++++---
 drivers/nvme/host/fc.c        | 4 ++--
 drivers/nvme/host/pci.c       | 4 ++--
 drivers/nvme/host/rdma.c      | 4 ++--
 drivers/nvme/target/loop.c    | 4 ++--
 include/linux/blk-mq.h        | 1 -
 include/linux/blkdev.h        | 5 +++++
 10 files changed, 20 insertions(+), 16 deletions(-)

diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index cb1e6cf7ac48..246c9afb6f5d 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -133,6 +133,7 @@ static const char *const blk_queue_flag_name[] = {
 	QUEUE_FLAG_NAME(SCSI_PASSTHROUGH),
 	QUEUE_FLAG_NAME(QUIESCED),
 	QUEUE_FLAG_NAME(PREEMPT_ONLY),
+	QUEUE_FLAG_NAME(NO_SCHED),
 };
 #undef QUEUE_FLAG_NAME
 
@@ -246,7 +247,6 @@ static const char *const hctx_flag_name[] = {
 	HCTX_FLAG_NAME(TAG_SHARED),
 	HCTX_FLAG_NAME(SG_MERGE),
 	HCTX_FLAG_NAME(BLOCKING),
-	HCTX_FLAG_NAME(NO_SCHED),
 };
 #undef HCTX_FLAG_NAME
 
diff --git a/block/blk-mq.c b/block/blk-mq.c
index e2d0fe503d8d..cf0790b628e4 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2636,7 +2636,7 @@ struct request_queue *__blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
 	blk_mq_add_queue_tag_set(set, q);
 	blk_mq_map_swqueue(q);
 
-	if (!(set->flags & BLK_MQ_F_NO_SCHED)) {
+	if (!blk_queue_no_sched(q)) {
 		int ret;
 
 		ret = elevator_init_mq(q);
diff --git a/block/elevator.c b/block/elevator.c
index fa828b5bfd4b..a34fecbe7e81 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -1108,8 +1108,7 @@ static int __elevator_change(struct request_queue *q, const char *name)
 
 static inline bool elv_support_iosched(struct request_queue *q)
 {
-	if (q->mq_ops && q->tag_set && (q->tag_set->flags &
-				BLK_MQ_F_NO_SCHED))
+	if (q->mq_ops && blk_queue_no_sched(q))
 		return false;
 	return true;
 }
diff --git a/drivers/block/null_blk_main.c b/drivers/block/null_blk_main.c
index 86cafa6d3b41..9a0170a808f6 100644
--- a/drivers/block/null_blk_main.c
+++ b/drivers/block/null_blk_main.c
@@ -1615,8 +1615,6 @@ static int null_init_tag_set(struct nullb *nullb, struct blk_mq_tag_set *set)
 	set->numa_node = nullb ? nullb->dev->home_node : g_home_node;
 	set->cmd_size	= sizeof(struct nullb_cmd);
 	set->flags = BLK_MQ_F_SHOULD_MERGE;
-	if (g_no_sched)
-		set->flags |= BLK_MQ_F_NO_SCHED;
 	set->driver_data = NULL;
 
 	if ((nullb && nullb->dev->blocking) || g_blocking)
@@ -1701,6 +1699,9 @@ static int null_add_dev(struct nullb_device *dev)
 		goto out_free_nullb;
 
 	if (dev->queue_mode == NULL_Q_MQ) {
+		unsigned long q_flags = g_no_sched ?
+			QUEUE_FLAG_MQ_NO_SCHED_DEFAULT : QUEUE_FLAG_MQ_DEFAULT;
+
 		if (shared_tags) {
 			nullb->tag_set = &tag_set;
 			rv = 0;
@@ -1716,7 +1717,7 @@ static int null_add_dev(struct nullb_device *dev)
 			goto out_cleanup_queues;
 
 		nullb->tag_set->timeout = 5 * HZ;
-		nullb->q = blk_mq_init_queue(nullb->tag_set);
+		nullb->q = __blk_mq_init_queue(nullb->tag_set, q_flags);
 		if (IS_ERR(nullb->q)) {
 			rv = -ENOMEM;
 			goto out_cleanup_tags;
diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index 611e70cae754..7048e1444210 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -3034,14 +3034,14 @@ nvme_fc_init_ctrl(struct device *dev, struct nvmf_ctrl_options *opts,
 	ctrl->admin_tag_set.driver_data = ctrl;
 	ctrl->admin_tag_set.nr_hw_queues = 1;
 	ctrl->admin_tag_set.timeout = ADMIN_TIMEOUT;
-	ctrl->admin_tag_set.flags = BLK_MQ_F_NO_SCHED;
 
 	ret = blk_mq_alloc_tag_set(&ctrl->admin_tag_set);
 	if (ret)
 		goto out_free_queues;
 	ctrl->ctrl.admin_tagset = &ctrl->admin_tag_set;
 
-	ctrl->ctrl.admin_q = blk_mq_init_queue(&ctrl->admin_tag_set);
+	ctrl->ctrl.admin_q = __blk_mq_init_queue(&ctrl->admin_tag_set,
+			QUEUE_FLAG_MQ_NO_SCHED_DEFAULT);
 	if (IS_ERR(ctrl->ctrl.admin_q)) {
 		ret = PTR_ERR(ctrl->ctrl.admin_q);
 		goto out_free_admin_tag_set;
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 1b9951d2067e..d7aabd87d57e 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1484,14 +1484,14 @@ static int nvme_alloc_admin_tags(struct nvme_dev *dev)
 		dev->admin_tagset.timeout = ADMIN_TIMEOUT;
 		dev->admin_tagset.numa_node = dev_to_node(dev->dev);
 		dev->admin_tagset.cmd_size = nvme_pci_cmd_size(dev, false);
-		dev->admin_tagset.flags = BLK_MQ_F_NO_SCHED;
 		dev->admin_tagset.driver_data = dev;
 
 		if (blk_mq_alloc_tag_set(&dev->admin_tagset))
 			return -ENOMEM;
 		dev->ctrl.admin_tagset = &dev->admin_tagset;
 
-		dev->ctrl.admin_q = blk_mq_init_queue(&dev->admin_tagset);
+		dev->ctrl.admin_q = __blk_mq_init_queue(&dev->admin_tagset,
+				QUEUE_FLAG_MQ_NO_SCHED_DEFAULT);
 		if (IS_ERR(dev->ctrl.admin_q)) {
 			blk_mq_free_tag_set(&dev->admin_tagset);
 			return -ENOMEM;
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 0805fa6215ee..447c6d5c55d5 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -692,7 +692,6 @@ static struct blk_mq_tag_set *nvme_rdma_alloc_tagset(struct nvme_ctrl *nctrl,
 		set->driver_data = ctrl;
 		set->nr_hw_queues = 1;
 		set->timeout = ADMIN_TIMEOUT;
-		set->flags = BLK_MQ_F_NO_SCHED;
 	} else {
 		set = &ctrl->tag_set;
 		memset(set, 0, sizeof(*set));
@@ -770,7 +769,8 @@ static int nvme_rdma_configure_admin_queue(struct nvme_rdma_ctrl *ctrl,
 			goto out_free_async_qe;
 		}
 
-		ctrl->ctrl.admin_q = blk_mq_init_queue(&ctrl->admin_tag_set);
+		ctrl->ctrl.admin_q = __blk_mq_init_queue(&ctrl->admin_tag_set,
+				QUEUE_FLAG_MQ_NO_SCHED_DEFAULT);
 		if (IS_ERR(ctrl->ctrl.admin_q)) {
 			error = PTR_ERR(ctrl->ctrl.admin_q);
 			goto out_free_tagset;
diff --git a/drivers/nvme/target/loop.c b/drivers/nvme/target/loop.c
index 9908082b32c4..c689621c2187 100644
--- a/drivers/nvme/target/loop.c
+++ b/drivers/nvme/target/loop.c
@@ -368,7 +368,6 @@ static int nvme_loop_configure_admin_queue(struct nvme_loop_ctrl *ctrl)
 	ctrl->admin_tag_set.driver_data = ctrl;
 	ctrl->admin_tag_set.nr_hw_queues = 1;
 	ctrl->admin_tag_set.timeout = ADMIN_TIMEOUT;
-	ctrl->admin_tag_set.flags = BLK_MQ_F_NO_SCHED;
 
 	ctrl->queues[0].ctrl = ctrl;
 	error = nvmet_sq_init(&ctrl->queues[0].nvme_sq);
@@ -381,7 +380,8 @@ static int nvme_loop_configure_admin_queue(struct nvme_loop_ctrl *ctrl)
 		goto out_free_sq;
 	ctrl->ctrl.admin_tagset = &ctrl->admin_tag_set;
 
-	ctrl->ctrl.admin_q = blk_mq_init_queue(&ctrl->admin_tag_set);
+	ctrl->ctrl.admin_q = __blk_mq_init_queue(&ctrl->admin_tag_set,
+			QUEUE_FLAG_MQ_NO_SCHED_DEFAULT);
 	if (IS_ERR(ctrl->ctrl.admin_q)) {
 		error = PTR_ERR(ctrl->ctrl.admin_q);
 		goto out_free_tagset;
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 7f6ecd7b35ce..afde18ac5b31 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -181,7 +181,6 @@ enum {
 	BLK_MQ_F_TAG_SHARED	= 1 << 1,
 	BLK_MQ_F_SG_MERGE	= 1 << 2,
 	BLK_MQ_F_BLOCKING	= 1 << 5,
-	BLK_MQ_F_NO_SCHED	= 1 << 6,
 	BLK_MQ_F_ALLOC_POLICY_START_BIT = 8,
 	BLK_MQ_F_ALLOC_POLICY_BITS = 1,
 
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 050d599f5ea9..962945579f2a 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -701,6 +701,7 @@ struct request_queue {
 #define QUEUE_FLAG_SCSI_PASSTHROUGH 27	/* queue supports SCSI commands */
 #define QUEUE_FLAG_QUIESCED    28	/* queue has been quiesced */
 #define QUEUE_FLAG_PREEMPT_ONLY	29	/* only process REQ_PREEMPT requests */
+#define QUEUE_FLAG_NO_SCHED	30	/* no scheduler allowed */
 
 #define QUEUE_FLAG_DEFAULT	((1 << QUEUE_FLAG_IO_STAT) |		\
 				 (1 << QUEUE_FLAG_SAME_COMP)	|	\
@@ -710,6 +711,9 @@ struct request_queue {
 				 (1 << QUEUE_FLAG_SAME_COMP)	|	\
 				 (1 << QUEUE_FLAG_POLL))
 
+#define QUEUE_FLAG_MQ_NO_SCHED_DEFAULT	 (QUEUE_FLAG_MQ_DEFAULT |	\
+					  (1 << QUEUE_FLAG_NO_SCHED))
+
 void blk_queue_flag_set(unsigned int flag, struct request_queue *q);
 void blk_queue_flag_clear(unsigned int flag, struct request_queue *q);
 bool blk_queue_flag_test_and_set(unsigned int flag, struct request_queue *q);
@@ -741,6 +745,7 @@ bool blk_queue_flag_test_and_clear(unsigned int flag, struct request_queue *q);
 #define blk_queue_preempt_only(q)				\
 	test_bit(QUEUE_FLAG_PREEMPT_ONLY, &(q)->queue_flags)
 #define blk_queue_fua(q)	test_bit(QUEUE_FLAG_FUA, &(q)->queue_flags)
+#define blk_queue_no_sched(q)	test_bit(QUEUE_FLAG_NO_SCHED, &(q)->queue_flags)
 
 extern int blk_set_preempt_only(struct request_queue *q);
 extern void blk_clear_preempt_only(struct request_queue *q);
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [RFC PATCH 03/14] SCSI: try to retrieve request_queue via 'scsi_cmnd' if possible
  2018-08-07 17:44 [RFC PATCH 00/14] SCSI: introduce per-host admin queue & enable runtime PM Ming Lei
  2018-08-07 17:44 ` [RFC PATCH 01/14] blk-mq: allow to pass default queue flags for creating & initializing queue Ming Lei
  2018-08-07 17:44 ` [RFC PATCH 02/14] blk-mq: convert BLK_MQ_F_NO_SCHED into per-queue flag Ming Lei
@ 2018-08-07 17:44 ` Ming Lei
  2018-08-07 17:44 ` [RFC PATCH 04/14] SCSI: pass 'scsi_device' instance from 'scsi_request' Ming Lei
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 28+ messages in thread
From: Ming Lei @ 2018-08-07 17:44 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Ming Lei, Alan Stern, Christoph Hellwig,
	Bart Van Assche, Jianchao Wang, Hannes Reinecke,
	Johannes Thumshirn, Adrian Hunter, James E.J. Bottomley,
	Martin K. Petersen, linux-scsi

Prepare for introduing per-host admin queue.

The most important part is that the request originated from admin queue
can't be called back to the IO queue associated with scsi_device, especially,
one request may be requeued, timedout or completed via block layer helper, so
what we should do is to use 'scsi_cmnd->request->q' to retrieve the request
queue, and pass that to block layer helper, instead of
sdev->request_queue.

Fortunately most of users of 'scsi_device->request_queue' aren't in related IO
path(requeue, timeout, complete, run queue), so the audit isn't more difficult
than I thought of.

Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: linux-scsi@vger.kernel.org
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/scsi/libsas/sas_ata.c       | 2 +-
 drivers/scsi/libsas/sas_scsi_host.c | 2 +-
 drivers/scsi/scsi_error.c           | 2 +-
 drivers/scsi/scsi_lib.c             | 6 +++---
 4 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/scsi/libsas/sas_ata.c b/drivers/scsi/libsas/sas_ata.c
index ff1d612f6fb9..590dffe6e960 100644
--- a/drivers/scsi/libsas/sas_ata.c
+++ b/drivers/scsi/libsas/sas_ata.c
@@ -594,7 +594,7 @@ void sas_ata_task_abort(struct sas_task *task)
 
 	/* Bounce SCSI-initiated commands to the SCSI EH */
 	if (qc->scsicmd) {
-		struct request_queue *q = qc->scsicmd->device->request_queue;
+		struct request_queue *q = qc->scsicmd->request->q;
 		unsigned long flags;
 
 		spin_lock_irqsave(q->queue_lock, flags);
diff --git a/drivers/scsi/libsas/sas_scsi_host.c b/drivers/scsi/libsas/sas_scsi_host.c
index ceab5e5c41c2..42994ad9549c 100644
--- a/drivers/scsi/libsas/sas_scsi_host.c
+++ b/drivers/scsi/libsas/sas_scsi_host.c
@@ -933,7 +933,7 @@ void sas_task_abort(struct sas_task *task)
 	if (dev_is_sata(task->dev)) {
 		sas_ata_task_abort(task);
 	} else {
-		struct request_queue *q = sc->device->request_queue;
+		struct request_queue *q = sc->request->q;
 		unsigned long flags;
 
 		spin_lock_irqsave(q->queue_lock, flags);
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 2715cdaa669c..9260b267fe43 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -1360,7 +1360,7 @@ static int scsi_eh_try_stu(struct scsi_cmnd *scmd)
 		int i, rtn = NEEDS_RETRY;
 
 		for (i = 0; rtn == NEEDS_RETRY && i < 2; i++)
-			rtn = scsi_send_eh_cmnd(scmd, stu_command, 6, scmd->device->request_queue->rq_timeout, 0);
+			rtn = scsi_send_eh_cmnd(scmd, stu_command, 6, scmd->request->q->rq_timeout, 0);
 
 		if (rtn == SUCCESS)
 			return 0;
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 9cb9a166fa0c..4341cf8a7322 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -168,7 +168,7 @@ static void scsi_mq_requeue_cmd(struct scsi_cmnd *cmd)
 static void __scsi_queue_insert(struct scsi_cmnd *cmd, int reason, bool unbusy)
 {
 	struct scsi_device *device = cmd->device;
-	struct request_queue *q = device->request_queue;
+	struct request_queue *q = cmd->request->q;
 	unsigned long flags;
 
 	SCSI_LOG_MLQUEUE(1, scmd_printk(KERN_INFO, cmd,
@@ -667,7 +667,7 @@ static bool scsi_end_request(struct request *req, blk_status_t error,
 {
 	struct scsi_cmnd *cmd = blk_mq_rq_to_pdu(req);
 	struct scsi_device *sdev = cmd->device;
-	struct request_queue *q = sdev->request_queue;
+	struct request_queue *q = cmd->request->q;
 
 	if (blk_update_request(req, error, bytes))
 		return true;
@@ -792,7 +792,7 @@ static blk_status_t scsi_result_to_blk_status(struct scsi_cmnd *cmd, int result)
 void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int good_bytes)
 {
 	int result = cmd->result;
-	struct request_queue *q = cmd->device->request_queue;
+	struct request_queue *q = cmd->request->q;
 	struct request *req = cmd->request;
 	blk_status_t error = BLK_STS_OK;
 	struct scsi_sense_hdr sshdr;
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [RFC PATCH 04/14] SCSI: pass 'scsi_device' instance from 'scsi_request'
  2018-08-07 17:44 [RFC PATCH 00/14] SCSI: introduce per-host admin queue & enable runtime PM Ming Lei
                   ` (2 preceding siblings ...)
  2018-08-07 17:44 ` [RFC PATCH 03/14] SCSI: try to retrieve request_queue via 'scsi_cmnd' if possible Ming Lei
@ 2018-08-07 17:44 ` Ming Lei
  2018-08-07 17:44 ` [RFC PATCH 05/14] SCSI: prepare for introducing admin queue for legacy path Ming Lei
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 28+ messages in thread
From: Ming Lei @ 2018-08-07 17:44 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Ming Lei, Alan Stern, Christoph Hellwig,
	Bart Van Assche, Jianchao Wang, Hannes Reinecke,
	Johannes Thumshirn, Adrian Hunter, James E.J. Bottomley,
	Martin K. Petersen, linux-scsi

This patch prepares for introducing SCSI per-host admin queue, which
is only used for queuing admin requests, which are now submitted via
__scsi_execute().

Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: linux-scsi@vger.kernel.org
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/scsi/scsi_lib.c     | 2 ++
 include/scsi/scsi_request.h | 5 ++++-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 4341cf8a7322..62699adaef61 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -279,6 +279,8 @@ int __scsi_execute(struct scsi_device *sdev, const unsigned char *cmd,
 	rq->cmd_len = COMMAND_SIZE(cmd[0]);
 	memcpy(rq->cmd, cmd, rq->cmd_len);
 	rq->retries = retries;
+	rq->sdev = sdev;	/* only valid in submit path */
+
 	req->timeout = timeout;
 	req->cmd_flags |= flags;
 	req->rq_flags |= rq_flags | RQF_QUIET;
diff --git a/include/scsi/scsi_request.h b/include/scsi/scsi_request.h
index b06f28c74908..0de6901b48ab 100644
--- a/include/scsi/scsi_request.h
+++ b/include/scsi/scsi_request.h
@@ -14,7 +14,10 @@ struct scsi_request {
 	unsigned int	sense_len;
 	unsigned int	resid_len;	/* residual count */
 	int		retries;
-	void		*sense;
+	union {
+		void		*sense;
+		struct scsi_device *sdev;
+	};
 };
 
 static inline struct scsi_request *scsi_req(struct request *rq)
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [RFC PATCH 05/14] SCSI: prepare for introducing admin queue for legacy path
  2018-08-07 17:44 [RFC PATCH 00/14] SCSI: introduce per-host admin queue & enable runtime PM Ming Lei
                   ` (3 preceding siblings ...)
  2018-08-07 17:44 ` [RFC PATCH 04/14] SCSI: pass 'scsi_device' instance from 'scsi_request' Ming Lei
@ 2018-08-07 17:44 ` Ming Lei
  2018-08-07 17:44 ` [RFC PATCH 06/14] SCSI: pass scsi_device to scsi_mq_prep_fn Ming Lei
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 28+ messages in thread
From: Ming Lei @ 2018-08-07 17:44 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Ming Lei, Alan Stern, Christoph Hellwig,
	Bart Van Assche, Jianchao Wang, Hannes Reinecke,
	Johannes Thumshirn, Adrian Hunter, James E.J. Bottomley,
	Martin K. Petersen, linux-scsi

Uses scsi_is_admin_queue() and scsi_get_scsi_dev() to retrieve
'scsi_device' for legacy path.

The same approach can be used in SCSI_MQ path too, just not
efficiently, and will deal with this issue when introducing admin
queue for SCSI_MQ.

Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: linux-scsi@vger.kernel.org
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/scsi/scsi_lib.c | 32 +++++++++++++++++++++++++++-----
 1 file changed, 27 insertions(+), 5 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 62699adaef61..435347f58328 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -46,6 +46,20 @@ static DEFINE_MUTEX(scsi_sense_cache_mutex);
 
 static void scsi_mq_uninit_cmd(struct scsi_cmnd *cmd);
 
+/* For admin queue, its queuedata is NULL */
+static inline bool scsi_is_admin_queue(struct request_queue *q)
+{
+	return !q->queuedata;
+}
+
+/* This helper can only be used in req prep stage */
+static inline struct scsi_device *scsi_get_scsi_dev(struct request *rq)
+{
+	if (scsi_is_admin_queue(rq->q))
+		return scsi_req(rq)->sdev;
+	return rq->q->queuedata;
+}
+
 static inline struct kmem_cache *
 scsi_select_sense_cache(bool unchecked_isa_dma)
 {
@@ -1378,7 +1392,7 @@ scsi_prep_state_check(struct scsi_device *sdev, struct request *req)
 static int
 scsi_prep_return(struct request_queue *q, struct request *req, int ret)
 {
-	struct scsi_device *sdev = q->queuedata;
+	struct scsi_device *sdev = scsi_get_scsi_dev(req);
 
 	switch (ret) {
 	case BLKPREP_KILL:
@@ -1411,7 +1425,7 @@ scsi_prep_return(struct request_queue *q, struct request *req, int ret)
 
 static int scsi_prep_fn(struct request_queue *q, struct request *req)
 {
-	struct scsi_device *sdev = q->queuedata;
+	struct scsi_device *sdev = scsi_get_scsi_dev(req);
 	struct scsi_cmnd *cmd = blk_mq_rq_to_pdu(req);
 	int ret;
 
@@ -1613,6 +1627,9 @@ static int scsi_lld_busy(struct request_queue *q)
 	if (blk_queue_dying(q))
 		return 0;
 
+	if (WARN_ON_ONCE(scsi_is_admin_queue(q)))
+		return 0;
+
 	shost = sdev->host;
 
 	/*
@@ -1816,7 +1833,7 @@ static void scsi_request_fn(struct request_queue *q)
 	__releases(q->queue_lock)
 	__acquires(q->queue_lock)
 {
-	struct scsi_device *sdev = q->queuedata;
+	struct scsi_device *sdev;
 	struct Scsi_Host *shost;
 	struct scsi_cmnd *cmd;
 	struct request *req;
@@ -1825,7 +1842,6 @@ static void scsi_request_fn(struct request_queue *q)
 	 * To start with, we keep looping until the queue is empty, or until
 	 * the host is no longer able to accept any more requests.
 	 */
-	shost = sdev->host;
 	for (;;) {
 		int rtn;
 		/*
@@ -1837,6 +1853,10 @@ static void scsi_request_fn(struct request_queue *q)
 		if (!req)
 			break;
 
+		cmd = blk_mq_rq_to_pdu(req);
+		sdev = cmd->device;
+		shost = sdev->host;
+
 		if (unlikely(!scsi_device_online(sdev))) {
 			sdev_printk(KERN_ERR, sdev,
 				    "rejecting I/O to offline device\n");
@@ -1854,7 +1874,6 @@ static void scsi_request_fn(struct request_queue *q)
 			blk_start_request(req);
 
 		spin_unlock_irq(q->queue_lock);
-		cmd = blk_mq_rq_to_pdu(req);
 		if (cmd != req->special) {
 			printk(KERN_CRIT "impossible request in %s.\n"
 					 "please mail a stack trace to "
@@ -2332,6 +2351,9 @@ struct scsi_device *scsi_device_from_queue(struct request_queue *q)
 {
 	struct scsi_device *sdev = NULL;
 
+	/* admin queue won't be exposed to external users */
+	WARN_ON_ONCE(scsi_is_admin_queue(q));
+
 	if (q->mq_ops) {
 		if (q->mq_ops == &scsi_mq_ops)
 			sdev = q->queuedata;
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [RFC PATCH 06/14] SCSI: pass scsi_device to scsi_mq_prep_fn
  2018-08-07 17:44 [RFC PATCH 00/14] SCSI: introduce per-host admin queue & enable runtime PM Ming Lei
                   ` (4 preceding siblings ...)
  2018-08-07 17:44 ` [RFC PATCH 05/14] SCSI: prepare for introducing admin queue for legacy path Ming Lei
@ 2018-08-07 17:44 ` Ming Lei
  2018-08-07 23:24   ` Bart Van Assche
  2018-08-07 17:44 ` [RFC PATCH 07/14] SCSI: don't set .queuedata in scsi_mq_alloc_queue() Ming Lei
                   ` (8 subsequent siblings)
  14 siblings, 1 reply; 28+ messages in thread
From: Ming Lei @ 2018-08-07 17:44 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Ming Lei, Alan Stern, Christoph Hellwig,
	Bart Van Assche, Jianchao Wang, Hannes Reinecke,
	Johannes Thumshirn, Adrian Hunter, James E.J. Bottomley,
	Martin K. Petersen, linux-scsi

This patchset will introduce per-host admin queue, so it may not to get
'scsi_device' via q->queuedata.

Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: linux-scsi@vger.kernel.org
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/scsi/scsi_lib.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 435347f58328..52e498fb6280 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1971,10 +1971,9 @@ static unsigned int scsi_mq_sgl_size(struct Scsi_Host *shost)
 		sizeof(struct scatterlist);
 }
 
-static int scsi_mq_prep_fn(struct request *req)
+static int scsi_mq_prep_fn(struct scsi_device *sdev, struct request *req)
 {
 	struct scsi_cmnd *cmd = blk_mq_rq_to_pdu(req);
-	struct scsi_device *sdev = req->q->queuedata;
 	struct Scsi_Host *shost = sdev->host;
 	struct scatterlist *sg;
 
@@ -2070,7 +2069,7 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx,
 		goto out_dec_target_busy;
 
 	if (!(req->rq_flags & RQF_DONTPREP)) {
-		ret = prep_to_mq(scsi_mq_prep_fn(req));
+		ret = prep_to_mq(scsi_mq_prep_fn(sdev, req));
 		if (ret != BLK_STS_OK)
 			goto out_dec_host_busy;
 		req->rq_flags |= RQF_DONTPREP;
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [RFC PATCH 07/14] SCSI: don't set .queuedata in scsi_mq_alloc_queue()
  2018-08-07 17:44 [RFC PATCH 00/14] SCSI: introduce per-host admin queue & enable runtime PM Ming Lei
                   ` (5 preceding siblings ...)
  2018-08-07 17:44 ` [RFC PATCH 06/14] SCSI: pass scsi_device to scsi_mq_prep_fn Ming Lei
@ 2018-08-07 17:44 ` Ming Lei
  2018-08-07 17:44 ` [RFC PATCH 08/14] SCSI: deal with admin queue busy Ming Lei
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 28+ messages in thread
From: Ming Lei @ 2018-08-07 17:44 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Ming Lei, Alan Stern, Christoph Hellwig,
	Bart Van Assche, Jianchao Wang, Hannes Reinecke,
	Johannes Thumshirn, Adrian Hunter, James E.J. Bottomley,
	Martin K. Petersen, linux-scsi

.queuedata is set in scsi_alloc_sdev() for both non-mq and scsi_mq,
so not necessary to do it in scsi_mq_alloc_queue().

Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: linux-scsi@vger.kernel.org
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/scsi/scsi_lib.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 52e498fb6280..0c9c38a82cc8 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -2305,7 +2305,6 @@ struct request_queue *scsi_mq_alloc_queue(struct scsi_device *sdev)
 	if (IS_ERR(sdev->request_queue))
 		return NULL;
 
-	sdev->request_queue->queuedata = sdev;
 	__scsi_init_queue(sdev->host, sdev->request_queue);
 	blk_queue_flag_set(QUEUE_FLAG_SCSI_PASSTHROUGH, sdev->request_queue);
 	return sdev->request_queue;
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [RFC PATCH 08/14] SCSI: deal with admin queue busy
  2018-08-07 17:44 [RFC PATCH 00/14] SCSI: introduce per-host admin queue & enable runtime PM Ming Lei
                   ` (6 preceding siblings ...)
  2018-08-07 17:44 ` [RFC PATCH 07/14] SCSI: don't set .queuedata in scsi_mq_alloc_queue() Ming Lei
@ 2018-08-07 17:44 ` Ming Lei
  2018-08-07 17:44 ` [RFC PATCH 09/14] SCSI: create admin queue for each host Ming Lei
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 28+ messages in thread
From: Ming Lei @ 2018-08-07 17:44 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Ming Lei, Alan Stern, Christoph Hellwig,
	Bart Van Assche, Jianchao Wang, Hannes Reinecke,
	Johannes Thumshirn, Adrian Hunter, James E.J. Bottomley,
	Martin K. Petersen, linux-scsi

When request originated from admin queue isn't queued successfully, we
deal with it just like for normal requests, that said the admin queue
will be rerun after one request in this host is completed.

Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: linux-scsi@vger.kernel.org
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/scsi/scsi_lib.c  | 46 +++++++++++++++++++++++++++++++++-------------
 include/scsi/scsi_host.h |  2 ++
 2 files changed, 35 insertions(+), 13 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 0c9c38a82cc8..d0467c8cb996 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -475,10 +475,14 @@ static void scsi_starved_list_run(struct Scsi_Host *shost)
 	LIST_HEAD(starved_list);
 	struct scsi_device *sdev;
 	unsigned long flags;
+	bool run_admin;
 
 	spin_lock_irqsave(shost->host_lock, flags);
 	list_splice_init(&shost->starved_list, &starved_list);
 
+	run_admin = shost->run_admin_queue;
+	shost->run_admin_queue = false;
+
 	while (!list_empty(&starved_list)) {
 		struct request_queue *slq;
 
@@ -527,6 +531,10 @@ static void scsi_starved_list_run(struct Scsi_Host *shost)
 	/* put any unprocessed entries back */
 	list_splice(&starved_list, &shost->starved_list);
 	spin_unlock_irqrestore(shost->host_lock, flags);
+
+	/* no need to get queue for admin_q */
+	if (run_admin)
+		scsi_kick_queue(shost->admin_q);
 }
 
 /*
@@ -541,14 +549,16 @@ static void scsi_starved_list_run(struct Scsi_Host *shost)
  * Notes:      The previous command was completely finished, start
  *             a new one if possible.
  */
-static void scsi_run_queue(struct request_queue *q)
+static void scsi_run_queue(struct Scsi_Host *shost, struct request_queue *q)
 {
-	struct scsi_device *sdev = q->queuedata;
+	if (!scsi_is_admin_queue(q)) {
+		struct scsi_device *sdev = q->queuedata;
+		if (scsi_target(sdev)->single_lun)
+			scsi_single_lun_run(sdev);
+	}
 
-	if (scsi_target(sdev)->single_lun)
-		scsi_single_lun_run(sdev);
-	if (!list_empty(&sdev->host->starved_list))
-		scsi_starved_list_run(sdev->host);
+	if (!list_empty(&shost->starved_list) || shost->run_admin_queue)
+		scsi_starved_list_run(shost);
 
 	if (q->mq_ops)
 		blk_mq_run_hw_queues(q, false);
@@ -563,7 +573,7 @@ void scsi_requeue_run_queue(struct work_struct *work)
 
 	sdev = container_of(work, struct scsi_device, requeue_work);
 	q = sdev->request_queue;
-	scsi_run_queue(q);
+	scsi_run_queue(sdev->host, q);
 }
 
 /*
@@ -597,7 +607,7 @@ static void scsi_requeue_command(struct request_queue *q, struct scsi_cmnd *cmd)
 	blk_requeue_request(q, req);
 	spin_unlock_irqrestore(q->queue_lock, flags);
 
-	scsi_run_queue(q);
+	scsi_run_queue(sdev->host, q);
 
 	put_device(&sdev->sdev_gendev);
 }
@@ -607,7 +617,7 @@ void scsi_run_host_queues(struct Scsi_Host *shost)
 	struct scsi_device *sdev;
 
 	shost_for_each_device(sdev, shost)
-		scsi_run_queue(sdev->request_queue);
+		scsi_run_queue(shost, sdev->request_queue);
 }
 
 static void scsi_uninit_cmd(struct scsi_cmnd *cmd)
@@ -731,7 +741,7 @@ static bool scsi_end_request(struct request *req, blk_status_t error,
 		blk_finish_request(req, error);
 		spin_unlock_irqrestore(q->queue_lock, flags);
 
-		scsi_run_queue(q);
+		scsi_run_queue(sdev->host, q);
 	}
 
 	put_device(&sdev->sdev_gendev);
@@ -1495,6 +1505,12 @@ static inline int scsi_dev_queue_ready(struct request_queue *q,
 	return 1;
 out_dec:
 	atomic_dec(&sdev->device_busy);
+
+	if (unlikely(scsi_is_admin_queue(q))) {
+		spin_lock_irq(sdev->host->host_lock);
+		sdev->host->run_admin_queue = true;
+		spin_unlock_irq(sdev->host->host_lock);
+	}
 	return 0;
 }
 
@@ -1503,7 +1519,7 @@ static inline int scsi_dev_queue_ready(struct request_queue *q,
  * @sdev: scsi device on starget to check.
  */
 static inline int scsi_target_queue_ready(struct Scsi_Host *shost,
-					   struct scsi_device *sdev)
+					   struct scsi_device *sdev, bool admin)
 {
 	struct scsi_target *starget = scsi_target(sdev);
 	unsigned int busy;
@@ -1545,6 +1561,8 @@ static inline int scsi_target_queue_ready(struct Scsi_Host *shost,
 starved:
 	spin_lock_irq(shost->host_lock);
 	list_move_tail(&sdev->starved_entry, &shost->starved_list);
+	if (admin)
+		shost->run_admin_queue = true;
 	spin_unlock_irq(shost->host_lock);
 out_dec:
 	if (starget->can_queue > 0)
@@ -1601,6 +1619,8 @@ static inline int scsi_host_queue_ready(struct request_queue *q,
 	spin_lock_irq(shost->host_lock);
 	if (list_empty(&sdev->starved_entry))
 		list_add_tail(&sdev->starved_entry, &shost->starved_list);
+	if (scsi_is_admin_queue(q))
+		shost->run_admin_queue = true;
 	spin_unlock_irq(shost->host_lock);
 out_dec:
 	scsi_dec_host_busy(shost);
@@ -1900,7 +1920,7 @@ static void scsi_request_fn(struct request_queue *q)
 			goto not_ready;
 		}
 
-		if (!scsi_target_queue_ready(shost, sdev))
+		if (!scsi_target_queue_ready(shost, sdev, scsi_is_admin_queue(q)))
 			goto not_ready;
 
 		if (!scsi_host_queue_ready(q, shost, sdev))
@@ -2063,7 +2083,7 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx,
 		goto out_put_budget;
 
 	ret = BLK_STS_RESOURCE;
-	if (!scsi_target_queue_ready(shost, sdev))
+	if (!scsi_target_queue_ready(shost, sdev, scsi_is_admin_queue(q)))
 		goto out_put_budget;
 	if (!scsi_host_queue_ready(q, shost, sdev))
 		goto out_dec_target_busy;
diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
index 53b485fe9b67..7479923c4a2a 100644
--- a/include/scsi/scsi_host.h
+++ b/include/scsi/scsi_host.h
@@ -532,6 +532,8 @@ struct Scsi_Host {
 	struct list_head	__devices;
 	struct list_head	__targets;
 	
+	struct request_queue	*admin_q;
+	bool                    run_admin_queue;
 	struct list_head	starved_list;
 
 	spinlock_t		default_lock;
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [RFC PATCH 09/14] SCSI: create admin queue for each host
  2018-08-07 17:44 [RFC PATCH 00/14] SCSI: introduce per-host admin queue & enable runtime PM Ming Lei
                   ` (7 preceding siblings ...)
  2018-08-07 17:44 ` [RFC PATCH 08/14] SCSI: deal with admin queue busy Ming Lei
@ 2018-08-07 17:44 ` Ming Lei
  2018-08-08  5:57   ` jianchao.wang
  2018-08-07 17:44 ` [RFC PATCH 10/14] SCSI: use the dedicated admin queue to send admin commands Ming Lei
                   ` (5 subsequent siblings)
  14 siblings, 1 reply; 28+ messages in thread
From: Ming Lei @ 2018-08-07 17:44 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Ming Lei, Alan Stern, Christoph Hellwig,
	Bart Van Assche, Jianchao Wang, Hannes Reinecke,
	Johannes Thumshirn, Adrian Hunter, James E.J. Bottomley,
	Martin K. Petersen, linux-scsi

The created admin queue will be used to send internal admin commands,
so we can simplify the sync between some admin commands and IO requests,
typical examples are system suspend and runtime PM.

Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: linux-scsi@vger.kernel.org
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/scsi/hosts.c     |   9 ++++
 drivers/scsi/scsi_lib.c  | 111 +++++++++++++++++++++++++++++++++++++++++------
 drivers/scsi/scsi_priv.h |   1 +
 3 files changed, 108 insertions(+), 13 deletions(-)

diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
index 3771e59a9fae..e09f9e5a75da 100644
--- a/drivers/scsi/hosts.c
+++ b/drivers/scsi/hosts.c
@@ -242,6 +242,9 @@ int scsi_add_host_with_dma(struct Scsi_Host *shost, struct device *dev,
 
 	shost->dma_dev = dma_dev;
 
+	if (!scsi_init_admin_queue(shost))
+		goto out_remove_tags;
+
 	/*
 	 * Increase usage count temporarily here so that calling
 	 * scsi_autopm_put_host() will trigger runtime idle if there is
@@ -309,6 +312,9 @@ int scsi_add_host_with_dma(struct Scsi_Host *shost, struct device *dev,
 	pm_runtime_disable(&shost->shost_gendev);
 	pm_runtime_set_suspended(&shost->shost_gendev);
 	pm_runtime_put_noidle(&shost->shost_gendev);
+	blk_cleanup_queue(shost->admin_q);
+	blk_put_queue(shost->admin_q);
+ out_remove_tags:
 	if (shost_use_blk_mq(shost))
 		scsi_mq_destroy_tags(shost);
  fail:
@@ -344,6 +350,9 @@ static void scsi_host_dev_release(struct device *dev)
 		kfree(dev_name(&shost->shost_dev));
 	}
 
+	blk_cleanup_queue(shost->admin_q);
+	blk_put_queue(shost->admin_q);
+
 	if (shost_use_blk_mq(shost)) {
 		if (shost->tag_set.tags)
 			scsi_mq_destroy_tags(shost);
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index d0467c8cb996..dbb355e23262 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -2038,19 +2038,22 @@ static void scsi_mq_done(struct scsi_cmnd *cmd)
 	blk_mq_complete_request(cmd->request);
 }
 
-static void scsi_mq_put_budget(struct blk_mq_hw_ctx *hctx)
+static void __scsi_mq_put_budget(struct blk_mq_hw_ctx *hctx,
+		struct scsi_device *sdev)
 {
-	struct request_queue *q = hctx->queue;
-	struct scsi_device *sdev = q->queuedata;
-
 	atomic_dec(&sdev->device_busy);
 	put_device(&sdev->sdev_gendev);
 }
 
-static bool scsi_mq_get_budget(struct blk_mq_hw_ctx *hctx)
+static void scsi_mq_put_budget(struct blk_mq_hw_ctx *hctx)
+{
+	__scsi_mq_put_budget(hctx, hctx->queue->queuedata);
+}
+
+static bool __scsi_mq_get_budget(struct blk_mq_hw_ctx *hctx,
+		struct scsi_device *sdev)
 {
 	struct request_queue *q = hctx->queue;
-	struct scsi_device *sdev = q->queuedata;
 
 	if (!get_device(&sdev->sdev_gendev))
 		goto out;
@@ -2067,12 +2070,17 @@ static bool scsi_mq_get_budget(struct blk_mq_hw_ctx *hctx)
 	return false;
 }
 
-static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx,
-			 const struct blk_mq_queue_data *bd)
+static bool scsi_mq_get_budget(struct blk_mq_hw_ctx *hctx)
+{
+	return __scsi_mq_get_budget(hctx, hctx->queue->queuedata);
+}
+
+static blk_status_t __scsi_queue_rq(struct blk_mq_hw_ctx *hctx,
+			 const struct blk_mq_queue_data *bd,
+			 struct scsi_device *sdev)
 {
 	struct request *req = bd->rq;
 	struct request_queue *q = req->q;
-	struct scsi_device *sdev = q->queuedata;
 	struct Scsi_Host *shost = sdev->host;
 	struct scsi_cmnd *cmd = blk_mq_rq_to_pdu(req);
 	blk_status_t ret;
@@ -2120,7 +2128,7 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx,
 	if (scsi_target(sdev)->can_queue > 0)
 		atomic_dec(&scsi_target(sdev)->target_busy);
 out_put_budget:
-	scsi_mq_put_budget(hctx);
+	__scsi_mq_put_budget(hctx, sdev);
 	switch (ret) {
 	case BLK_STS_OK:
 		break;
@@ -2142,6 +2150,23 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx,
 	return ret;
 }
 
+static blk_status_t scsi_admin_queue_rq(struct blk_mq_hw_ctx *hctx,
+			 const struct blk_mq_queue_data *bd)
+{
+	struct scsi_device *sdev = scsi_req(bd->rq)->sdev;
+
+	if (!__scsi_mq_get_budget(hctx, sdev))
+		return BLK_STS_RESOURCE;
+
+	return __scsi_queue_rq(hctx, bd, sdev);
+}
+
+static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx,
+			 const struct blk_mq_queue_data *bd)
+{
+	return __scsi_queue_rq(hctx, bd, hctx->queue->queuedata);
+}
+
 static enum blk_eh_timer_return scsi_timeout(struct request *req,
 		bool reserved)
 {
@@ -2274,9 +2299,9 @@ static void scsi_old_exit_rq(struct request_queue *q, struct request *rq)
 			       cmd->sense_buffer);
 }
 
-struct request_queue *scsi_old_alloc_queue(struct scsi_device *sdev)
+static struct request_queue *__scsi_old_alloc_queue(struct Scsi_Host *shost,
+		bool admin)
 {
-	struct Scsi_Host *shost = sdev->host;
 	struct request_queue *q;
 
 	q = blk_alloc_queue_node(GFP_KERNEL, NUMA_NO_NODE, NULL);
@@ -2295,7 +2320,9 @@ struct request_queue *scsi_old_alloc_queue(struct scsi_device *sdev)
 	}
 
 	__scsi_init_queue(shost, q);
-	blk_queue_flag_set(QUEUE_FLAG_SCSI_PASSTHROUGH, q);
+
+	if (!admin)
+		blk_queue_flag_set(QUEUE_FLAG_SCSI_PASSTHROUGH, q);
 	blk_queue_prep_rq(q, scsi_prep_fn);
 	blk_queue_unprep_rq(q, scsi_unprep_fn);
 	blk_queue_softirq_done(q, scsi_softirq_done);
@@ -2304,6 +2331,23 @@ struct request_queue *scsi_old_alloc_queue(struct scsi_device *sdev)
 	return q;
 }
 
+struct request_queue *scsi_old_alloc_queue(struct scsi_device *sdev)
+{
+	return __scsi_old_alloc_queue(sdev->host, false);
+}
+
+static struct request_queue *scsi_old_alloc_admin_queue(struct Scsi_Host *shost)
+{
+	struct request_queue *q = __scsi_old_alloc_queue(shost, true);
+
+	if (!q)
+		return NULL;
+
+	blk_queue_init_tags(q, shost->cmd_per_lun, shost->bqt,
+			    shost->hostt->tag_alloc_policy);
+	return q;
+}
+
 static const struct blk_mq_ops scsi_mq_ops = {
 	.get_budget	= scsi_mq_get_budget,
 	.put_budget	= scsi_mq_put_budget,
@@ -2319,6 +2363,16 @@ static const struct blk_mq_ops scsi_mq_ops = {
 	.map_queues	= scsi_map_queues,
 };
 
+static const struct blk_mq_ops scsi_mq_admin_ops = {
+	.queue_rq	= scsi_admin_queue_rq,
+	.complete	= scsi_softirq_done,
+	.timeout	= scsi_timeout,
+	.init_request	= scsi_mq_init_request,
+	.exit_request	= scsi_mq_exit_request,
+	.initialize_rq_fn = scsi_initialize_rq,
+	.map_queues	= scsi_map_queues,
+};
+
 struct request_queue *scsi_mq_alloc_queue(struct scsi_device *sdev)
 {
 	sdev->request_queue = blk_mq_init_queue(&sdev->host->tag_set);
@@ -2330,6 +2384,37 @@ struct request_queue *scsi_mq_alloc_queue(struct scsi_device *sdev)
 	return sdev->request_queue;
 }
 
+static struct request_queue *scsi_mq_alloc_admin_queue(struct Scsi_Host *shost)
+{
+	struct request_queue *q = __blk_mq_init_queue(&shost->tag_set,
+			QUEUE_FLAG_MQ_NO_SCHED_DEFAULT);
+	if (IS_ERR(q))
+		return NULL;
+
+	q->mq_ops = &scsi_mq_admin_ops;
+
+	__scsi_init_queue(shost, q);
+
+	return q;
+}
+
+struct request_queue *scsi_init_admin_queue(struct Scsi_Host *shost)
+{
+	struct request_queue *q;
+
+	if (shost_use_blk_mq(shost))
+		q = scsi_mq_alloc_admin_queue(shost);
+	else
+		q = scsi_old_alloc_admin_queue(shost);
+
+	if (!q)
+		return NULL;
+
+	WARN_ON(!blk_get_queue(q));
+	shost->admin_q = q;
+	return q;
+}
+
 int scsi_mq_setup_tags(struct Scsi_Host *shost)
 {
 	unsigned int cmd_size, sgl_size;
diff --git a/drivers/scsi/scsi_priv.h b/drivers/scsi/scsi_priv.h
index 99f1db5e467e..0553acbc3f65 100644
--- a/drivers/scsi/scsi_priv.h
+++ b/drivers/scsi/scsi_priv.h
@@ -94,6 +94,7 @@ extern void scsi_run_host_queues(struct Scsi_Host *shost);
 extern void scsi_requeue_run_queue(struct work_struct *work);
 extern struct request_queue *scsi_old_alloc_queue(struct scsi_device *sdev);
 extern struct request_queue *scsi_mq_alloc_queue(struct scsi_device *sdev);
+extern struct request_queue *scsi_init_admin_queue(struct Scsi_Host *shost);
 extern void scsi_start_queue(struct scsi_device *sdev);
 extern int scsi_mq_setup_tags(struct Scsi_Host *shost);
 extern void scsi_mq_destroy_tags(struct Scsi_Host *shost);
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [RFC PATCH 10/14] SCSI: use the dedicated admin queue to send admin commands
  2018-08-07 17:44 [RFC PATCH 00/14] SCSI: introduce per-host admin queue & enable runtime PM Ming Lei
                   ` (8 preceding siblings ...)
  2018-08-07 17:44 ` [RFC PATCH 09/14] SCSI: create admin queue for each host Ming Lei
@ 2018-08-07 17:44 ` Ming Lei
  2018-08-07 23:33   ` Bart Van Assche
  2018-08-07 17:44 ` [RFC PATCH 11/14] SCSI: transport_spi: resume a quiesced device Ming Lei
                   ` (4 subsequent siblings)
  14 siblings, 1 reply; 28+ messages in thread
From: Ming Lei @ 2018-08-07 17:44 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Ming Lei, Alan Stern, Christoph Hellwig,
	Bart Van Assche, Jianchao Wang, Hannes Reinecke,
	Johannes Thumshirn, Adrian Hunter, James E.J. Bottomley,
	Martin K. Petersen, linux-scsi

Now the per-host dedicated admin queue is ready, so use this queue to
send admin commands only.

Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: linux-scsi@vger.kernel.org
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/scsi/scsi_lib.c    | 11 ++++++++---
 drivers/scsi/scsi_scan.c   |  1 +
 drivers/scsi/scsi_sysfs.c  |  1 +
 include/scsi/scsi_device.h |  4 ++++
 4 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index dbb355e23262..79475a679750 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -278,16 +278,16 @@ int __scsi_execute(struct scsi_device *sdev, const unsigned char *cmd,
 	struct request *req;
 	struct scsi_request *rq;
 	int ret = DRIVER_ERROR << 24;
+	struct request_queue *q = sdev->host->admin_q;
 
-	req = blk_get_request(sdev->request_queue,
+	req = blk_get_request(q,
 			data_direction == DMA_TO_DEVICE ?
 			REQ_OP_SCSI_OUT : REQ_OP_SCSI_IN, BLK_MQ_REQ_PREEMPT);
 	if (IS_ERR(req))
 		return ret;
 	rq = scsi_req(req);
 
-	if (bufflen &&	blk_rq_map_kern(sdev->request_queue, req,
-					buffer, bufflen, GFP_NOIO))
+	if (bufflen && blk_rq_map_kern(q, req, buffer, bufflen, GFP_NOIO))
 		goto out;
 
 	rq->cmd_len = COMMAND_SIZE(cmd[0]);
@@ -299,6 +299,8 @@ int __scsi_execute(struct scsi_device *sdev, const unsigned char *cmd,
 	req->cmd_flags |= flags;
 	req->rq_flags |= rq_flags | RQF_QUIET;
 
+	atomic_inc(&sdev->nr_admin_pending);
+
 	/*
 	 * head injection *required* here otherwise quiesce won't work
 	 */
@@ -323,6 +325,9 @@ int __scsi_execute(struct scsi_device *sdev, const unsigned char *cmd,
  out:
 	blk_put_request(req);
 
+	atomic_dec(&sdev->nr_admin_pending);
+	wake_up_all(&sdev->admin_wq);
+
 	return ret;
 }
 EXPORT_SYMBOL(__scsi_execute);
diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
index 0880d975eed3..b10af1692ddd 100644
--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ -243,6 +243,7 @@ static struct scsi_device *scsi_alloc_sdev(struct scsi_target *starget,
 	mutex_init(&sdev->inquiry_mutex);
 	INIT_WORK(&sdev->event_work, scsi_evt_thread);
 	INIT_WORK(&sdev->requeue_work, scsi_requeue_run_queue);
+	init_waitqueue_head(&sdev->admin_wq);
 
 	sdev->sdev_gendev.parent = get_device(&starget->dev);
 	sdev->sdev_target = starget;
diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index 7943b762c12d..7e03a420dfe7 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -1377,6 +1377,7 @@ void __scsi_remove_device(struct scsi_device *sdev)
 
 	blk_cleanup_queue(sdev->request_queue);
 	cancel_work_sync(&sdev->requeue_work);
+	wait_event(sdev->admin_wq, !atomic_read(&sdev->nr_admin_pending));
 
 	if (sdev->host->hostt->slave_destroy)
 		sdev->host->hostt->slave_destroy(sdev);
diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index 202f4d6a4342..f6820da1dc37 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -227,6 +227,10 @@ struct scsi_device {
 	struct mutex		state_mutex;
 	enum scsi_device_state sdev_state;
 	struct task_struct	*quiesced_by;
+
+	atomic_t		nr_admin_pending;
+	wait_queue_head_t       admin_wq;
+
 	unsigned long		sdev_data[0];
 } __attribute__((aligned(sizeof(unsigned long))));
 
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [RFC PATCH 11/14] SCSI: transport_spi: resume a quiesced device
  2018-08-07 17:44 [RFC PATCH 00/14] SCSI: introduce per-host admin queue & enable runtime PM Ming Lei
                   ` (9 preceding siblings ...)
  2018-08-07 17:44 ` [RFC PATCH 10/14] SCSI: use the dedicated admin queue to send admin commands Ming Lei
@ 2018-08-07 17:44 ` Ming Lei
  2018-08-07 17:44 ` [RFC PATCH 12/14] SCSI: use admin queue to implement queue QUIESCE Ming Lei
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 28+ messages in thread
From: Ming Lei @ 2018-08-07 17:44 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Ming Lei, Alan Stern, Christoph Hellwig,
	Bart Van Assche, Jianchao Wang, Hannes Reinecke,
	Johannes Thumshirn, Adrian Hunter, James E.J. Bottomley,
	Martin K. Petersen, linux-scsi

We have to preempt freeze queue in scsi_device_quiesce(),
and unfreeze in scsi_device_resume(), so call scsi_device_resume()
for the device which is quiesced by scsi_device_quiesce().

Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: linux-scsi@vger.kernel.org
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/scsi/scsi_transport_spi.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/scsi/scsi_transport_spi.c b/drivers/scsi/scsi_transport_spi.c
index 2ca150b16764..10854c1848e2 100644
--- a/drivers/scsi/scsi_transport_spi.c
+++ b/drivers/scsi/scsi_transport_spi.c
@@ -1052,6 +1052,9 @@ spi_dv_device(struct scsi_device *sdev)
 
 	scsi_target_resume(starget);
 
+	/* undo what scsi_device_quiesce() did */
+	scsi_device_resume(sdev);
+
 	spi_initial_dv(starget) = 1;
 
  out_free:
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [RFC PATCH 12/14] SCSI: use admin queue to implement queue QUIESCE
  2018-08-07 17:44 [RFC PATCH 00/14] SCSI: introduce per-host admin queue & enable runtime PM Ming Lei
                   ` (10 preceding siblings ...)
  2018-08-07 17:44 ` [RFC PATCH 11/14] SCSI: transport_spi: resume a quiesced device Ming Lei
@ 2018-08-07 17:44 ` Ming Lei
  2018-08-07 17:44 ` [RFC PATCH 13/14] block: simplify runtime PM support Ming Lei
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 28+ messages in thread
From: Ming Lei @ 2018-08-07 17:44 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Ming Lei, Alan Stern, Christoph Hellwig,
	Bart Van Assche, Jianchao Wang, Hannes Reinecke,
	Johannes Thumshirn, Adrian Hunter, James E.J. Bottomley,
	Martin K. Petersen, linux-scsi

All admin commands are sent via per-host admin queue, so we can simply
freeze the IO queue for quiescing scsi device.

Also the current SCSI stack guarantees that any request originated from
admin queue won't be called back to block layer via the associated IO queue,
and it is always dealt with by the admin queue.

So it is safe to submit admin request via admin queue when the associated IO
queue is frozen, and this way matches the PREEMPT flag perfectly.

Finally, we can remove the preempt_only approach for supporting SCSI
quiesce, then the code in block fast path is simplified a lot.

Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: linux-scsi@vger.kernel.org
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-core.c           | 44 ++------------------------------------------
 block/blk-mq-debugfs.c     |  1 -
 drivers/scsi/scsi_lib.c    | 29 +++--------------------------
 include/linux/blkdev.h     |  6 ------
 include/scsi/scsi_device.h |  1 -
 5 files changed, 5 insertions(+), 76 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 5e0f4bee3588..ea12e3fcfa11 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -421,26 +421,6 @@ void blk_sync_queue(struct request_queue *q)
 EXPORT_SYMBOL(blk_sync_queue);
 
 /**
- * blk_set_preempt_only - set QUEUE_FLAG_PREEMPT_ONLY
- * @q: request queue pointer
- *
- * Returns the previous value of the PREEMPT_ONLY flag - 0 if the flag was not
- * set and 1 if the flag was already set.
- */
-int blk_set_preempt_only(struct request_queue *q)
-{
-	return blk_queue_flag_test_and_set(QUEUE_FLAG_PREEMPT_ONLY, q);
-}
-EXPORT_SYMBOL_GPL(blk_set_preempt_only);
-
-void blk_clear_preempt_only(struct request_queue *q)
-{
-	blk_queue_flag_clear(QUEUE_FLAG_PREEMPT_ONLY, q);
-	wake_up_all(&q->mq_freeze_wq);
-}
-EXPORT_SYMBOL_GPL(blk_clear_preempt_only);
-
-/**
  * __blk_run_queue_uncond - run a queue whether or not it has been stopped
  * @q:	The queue to run
  *
@@ -911,27 +891,8 @@ EXPORT_SYMBOL(blk_alloc_queue);
  */
 int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags)
 {
-	const bool preempt = flags & BLK_MQ_REQ_PREEMPT;
-
 	while (true) {
-		bool success = false;
-
-		rcu_read_lock();
-		if (percpu_ref_tryget_live(&q->q_usage_counter)) {
-			/*
-			 * The code that sets the PREEMPT_ONLY flag is
-			 * responsible for ensuring that that flag is globally
-			 * visible before the queue is unfrozen.
-			 */
-			if (preempt || !blk_queue_preempt_only(q)) {
-				success = true;
-			} else {
-				percpu_ref_put(&q->q_usage_counter);
-			}
-		}
-		rcu_read_unlock();
-
-		if (success)
+		if (percpu_ref_tryget_live(&q->q_usage_counter))
 			return 0;
 
 		if (flags & BLK_MQ_REQ_NOWAIT)
@@ -947,8 +908,7 @@ int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags)
 		smp_rmb();
 
 		wait_event(q->mq_freeze_wq,
-			   (atomic_read(&q->mq_freeze_depth) == 0 &&
-			    (preempt || !blk_queue_preempt_only(q))) ||
+			   atomic_read(&q->mq_freeze_depth) == 0 ||
 			   blk_queue_dying(q));
 		if (blk_queue_dying(q))
 			return -ENODEV;
diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index 246c9afb6f5d..899654eb10b6 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -132,7 +132,6 @@ static const char *const blk_queue_flag_name[] = {
 	QUEUE_FLAG_NAME(REGISTERED),
 	QUEUE_FLAG_NAME(SCSI_PASSTHROUGH),
 	QUEUE_FLAG_NAME(QUIESCED),
-	QUEUE_FLAG_NAME(PREEMPT_ONLY),
 	QUEUE_FLAG_NAME(NO_SCHED),
 };
 #undef QUEUE_FLAG_NAME
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 79475a679750..c78602f1a425 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -3118,34 +3118,12 @@ static void scsi_wait_for_queuecommand(struct scsi_device *sdev)
 int
 scsi_device_quiesce(struct scsi_device *sdev)
 {
-	struct request_queue *q = sdev->request_queue;
 	int err;
 
-	/*
-	 * It is allowed to call scsi_device_quiesce() multiple times from
-	 * the same context but concurrent scsi_device_quiesce() calls are
-	 * not allowed.
-	 */
-	WARN_ON_ONCE(sdev->quiesced_by && sdev->quiesced_by != current);
-
-	blk_set_preempt_only(q);
-
-	blk_mq_freeze_queue(q);
-	/*
-	 * Ensure that the effect of blk_set_preempt_only() will be visible
-	 * for percpu_ref_tryget() callers that occur after the queue
-	 * unfreeze even if the queue was already frozen before this function
-	 * was called. See also https://lwn.net/Articles/573497/.
-	 */
-	synchronize_rcu();
-	blk_mq_unfreeze_queue(q);
+	blk_mq_freeze_queue(sdev->request_queue);
 
 	mutex_lock(&sdev->state_mutex);
 	err = scsi_device_set_state(sdev, SDEV_QUIESCE);
-	if (err == 0)
-		sdev->quiesced_by = current;
-	else
-		blk_clear_preempt_only(q);
 	mutex_unlock(&sdev->state_mutex);
 
 	return err;
@@ -3168,12 +3146,11 @@ void scsi_device_resume(struct scsi_device *sdev)
 	 * device deleted during suspend)
 	 */
 	mutex_lock(&sdev->state_mutex);
-	WARN_ON_ONCE(!sdev->quiesced_by);
-	sdev->quiesced_by = NULL;
-	blk_clear_preempt_only(sdev->request_queue);
 	if (sdev->sdev_state == SDEV_QUIESCE)
 		scsi_device_set_state(sdev, SDEV_RUNNING);
 	mutex_unlock(&sdev->state_mutex);
+
+	blk_mq_unfreeze_queue(sdev->request_queue);
 }
 EXPORT_SYMBOL(scsi_device_resume);
 
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 962945579f2a..a9d371f55ca5 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -700,7 +700,6 @@ struct request_queue {
 #define QUEUE_FLAG_REGISTERED  26	/* queue has been registered to a disk */
 #define QUEUE_FLAG_SCSI_PASSTHROUGH 27	/* queue supports SCSI commands */
 #define QUEUE_FLAG_QUIESCED    28	/* queue has been quiesced */
-#define QUEUE_FLAG_PREEMPT_ONLY	29	/* only process REQ_PREEMPT requests */
 #define QUEUE_FLAG_NO_SCHED	30	/* no scheduler allowed */
 
 #define QUEUE_FLAG_DEFAULT	((1 << QUEUE_FLAG_IO_STAT) |		\
@@ -742,14 +741,9 @@ bool blk_queue_flag_test_and_clear(unsigned int flag, struct request_queue *q);
 	((rq)->cmd_flags & (REQ_FAILFAST_DEV|REQ_FAILFAST_TRANSPORT| \
 			     REQ_FAILFAST_DRIVER))
 #define blk_queue_quiesced(q)	test_bit(QUEUE_FLAG_QUIESCED, &(q)->queue_flags)
-#define blk_queue_preempt_only(q)				\
-	test_bit(QUEUE_FLAG_PREEMPT_ONLY, &(q)->queue_flags)
 #define blk_queue_fua(q)	test_bit(QUEUE_FLAG_FUA, &(q)->queue_flags)
 #define blk_queue_no_sched(q)	test_bit(QUEUE_FLAG_NO_SCHED, &(q)->queue_flags)
 
-extern int blk_set_preempt_only(struct request_queue *q);
-extern void blk_clear_preempt_only(struct request_queue *q);
-
 static inline int queue_in_flight(struct request_queue *q)
 {
 	return q->in_flight[0] + q->in_flight[1];
diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index f6820da1dc37..666b58799cec 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -226,7 +226,6 @@ struct scsi_device {
 	unsigned char		access_state;
 	struct mutex		state_mutex;
 	enum scsi_device_state sdev_state;
-	struct task_struct	*quiesced_by;
 
 	atomic_t		nr_admin_pending;
 	wait_queue_head_t       admin_wq;
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [RFC PATCH 13/14] block: simplify runtime PM support
  2018-08-07 17:44 [RFC PATCH 00/14] SCSI: introduce per-host admin queue & enable runtime PM Ming Lei
                   ` (11 preceding siblings ...)
  2018-08-07 17:44 ` [RFC PATCH 12/14] SCSI: use admin queue to implement queue QUIESCE Ming Lei
@ 2018-08-07 17:44 ` Ming Lei
  2018-08-07 19:54   ` Bart Van Assche
  2018-08-07 17:44 ` [RFC PATCH 14/14] block: enable runtime PM for blk-mq Ming Lei
  2018-08-08 14:06 ` [RFC PATCH 00/14] SCSI: introduce per-host admin queue & enable runtime PM Alan Stern
  14 siblings, 1 reply; 28+ messages in thread
From: Ming Lei @ 2018-08-07 17:44 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Ming Lei, Alan Stern, Christoph Hellwig,
	Bart Van Assche, Jianchao Wang, Hannes Reinecke,
	Johannes Thumshirn, Adrian Hunter, James E.J. Bottomley,
	Martin K. Petersen, linux-scsi

This patch simplifies runtime PM support by the following approach:

1) resume device in blk_queue_enter() if this device isn't active

2) freeze queue in blk_pre_runtime_suspend()

3) unfreeze queue in blk_pre_runtime_resume()

4) remove checking on RRF_PM because now we requires out-of-band PM
request to resume device

Then we can remove blk_pm_allow_request(), and more importantly this way
can be applied to blk-mq path too.

Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: linux-scsi@vger.kernel.org
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-core.c        | 72 ++++++++++++++++++++++++++-----------------------
 block/elevator.c        |  7 +++--
 drivers/scsi/scsi_lib.c |  7 +++++
 include/linux/blkdev.h  |  2 ++
 4 files changed, 51 insertions(+), 37 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index ea12e3fcfa11..7390149f4fd1 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -884,6 +884,24 @@ struct request_queue *blk_alloc_queue(gfp_t gfp_mask)
 }
 EXPORT_SYMBOL(blk_alloc_queue);
 
+#ifdef CONFIG_PM
+static void blk_resume_queue(struct request_queue *q)
+{
+	if (!q->dev)
+		return;
+
+	/* PM request needs to be dealt with out of band */
+	mutex_lock(&q->pm_lock);
+	if (q->rpm_status == RPM_SUSPENDED || q->rpm_status == RPM_SUSPENDING)
+		pm_runtime_resume(q->dev);
+	mutex_unlock(&q->pm_lock);
+}
+#else
+static void blk_resume_queue(struct request_queue *q)
+{
+}
+#endif
+
 /**
  * blk_queue_enter() - try to increase q->q_usage_counter
  * @q: request queue pointer
@@ -907,6 +925,8 @@ int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags)
 		 */
 		smp_rmb();
 
+		blk_resume_queue(q);
+
 		wait_event(q->mq_freeze_wq,
 			   atomic_read(&q->mq_freeze_depth) == 0 ||
 			   blk_queue_dying(q));
@@ -1684,7 +1704,7 @@ EXPORT_SYMBOL_GPL(part_round_stats);
 #ifdef CONFIG_PM
 static void blk_pm_put_request(struct request *rq)
 {
-	if (rq->q->dev && !(rq->rq_flags & RQF_PM) && !--rq->q->nr_pending)
+	if (rq->q->dev && !--rq->q->nr_pending)
 		pm_runtime_mark_last_busy(rq->q->dev);
 }
 #else
@@ -2702,30 +2722,6 @@ void blk_account_io_done(struct request *req, u64 now)
 	}
 }
 
-#ifdef CONFIG_PM
-/*
- * Don't process normal requests when queue is suspended
- * or in the process of suspending/resuming
- */
-static bool blk_pm_allow_request(struct request *rq)
-{
-	switch (rq->q->rpm_status) {
-	case RPM_RESUMING:
-	case RPM_SUSPENDING:
-		return rq->rq_flags & RQF_PM;
-	case RPM_SUSPENDED:
-		return false;
-	default:
-		return true;
-	}
-}
-#else
-static bool blk_pm_allow_request(struct request *rq)
-{
-	return true;
-}
-#endif
-
 void blk_account_io_start(struct request *rq, bool new_io)
 {
 	struct hd_struct *part;
@@ -2770,13 +2766,8 @@ static struct request *elv_next_request(struct request_queue *q)
 	WARN_ON_ONCE(q->mq_ops);
 
 	while (1) {
-		list_for_each_entry(rq, &q->queue_head, queuelist) {
-			if (blk_pm_allow_request(rq))
-				return rq;
-
-			if (rq->rq_flags & RQF_SOFTBARRIER)
-				break;
-		}
+		list_for_each_entry(rq, &q->queue_head, queuelist)
+			return rq;
 
 		/*
 		 * Flush request is running and flush request isn't queueable
@@ -3737,6 +3728,7 @@ void blk_pm_runtime_init(struct request_queue *q, struct device *dev)
 		return;
 	}
 
+	mutex_init(&q->pm_lock);
 	q->dev = dev;
 	q->rpm_status = RPM_ACTIVE;
 	pm_runtime_set_autosuspend_delay(q->dev, -1);
@@ -3772,6 +3764,7 @@ int blk_pre_runtime_suspend(struct request_queue *q)
 	if (!q->dev)
 		return ret;
 
+	mutex_lock(&q->pm_lock);
 	spin_lock_irq(q->queue_lock);
 	if (q->nr_pending) {
 		ret = -EBUSY;
@@ -3780,6 +3773,13 @@ int blk_pre_runtime_suspend(struct request_queue *q)
 		q->rpm_status = RPM_SUSPENDING;
 	}
 	spin_unlock_irq(q->queue_lock);
+
+	if (!ret) {
+		blk_freeze_queue(q);
+		q->rpm_q_frozen = true;
+	}
+	mutex_unlock(&q->pm_lock);
+
 	return ret;
 }
 EXPORT_SYMBOL(blk_pre_runtime_suspend);
@@ -3854,16 +3854,22 @@ void blk_post_runtime_resume(struct request_queue *q, int err)
 	if (!q->dev)
 		return;
 
+	lockdep_assert_held(&q->pm_lock);
+
 	spin_lock_irq(q->queue_lock);
 	if (!err) {
 		q->rpm_status = RPM_ACTIVE;
-		__blk_run_queue(q);
 		pm_runtime_mark_last_busy(q->dev);
 		pm_request_autosuspend(q->dev);
 	} else {
 		q->rpm_status = RPM_SUSPENDED;
 	}
 	spin_unlock_irq(q->queue_lock);
+
+	if (!err && q->rpm_q_frozen) {
+		blk_mq_unfreeze_queue(q);
+		q->rpm_q_frozen = false;
+	}
 }
 EXPORT_SYMBOL(blk_post_runtime_resume);
 
diff --git a/block/elevator.c b/block/elevator.c
index a34fecbe7e81..d389b942378b 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -560,15 +560,14 @@ void elv_bio_merged(struct request_queue *q, struct request *rq,
 #ifdef CONFIG_PM
 static void blk_pm_requeue_request(struct request *rq)
 {
-	if (rq->q->dev && !(rq->rq_flags & RQF_PM))
+	if (rq->q->dev)
 		rq->q->nr_pending--;
 }
 
 static void blk_pm_add_request(struct request_queue *q, struct request *rq)
 {
-	if (q->dev && !(rq->rq_flags & RQF_PM) && q->nr_pending++ == 0 &&
-	    (q->rpm_status == RPM_SUSPENDED || q->rpm_status == RPM_SUSPENDING))
-		pm_request_resume(q->dev);
+	if (q->dev)
+		q->nr_pending++;
 }
 #else
 static inline void blk_pm_requeue_request(struct request *rq) {}
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index c78602f1a425..0aee332fbb63 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -279,6 +279,10 @@ int __scsi_execute(struct scsi_device *sdev, const unsigned char *cmd,
 	struct scsi_request *rq;
 	int ret = DRIVER_ERROR << 24;
 	struct request_queue *q = sdev->host->admin_q;
+	bool pm_rq = rq_flags & RQF_PM;
+
+	if (!pm_rq)
+		scsi_autopm_get_device(sdev);
 
 	req = blk_get_request(q,
 			data_direction == DMA_TO_DEVICE ?
@@ -328,6 +332,9 @@ int __scsi_execute(struct scsi_device *sdev, const unsigned char *cmd,
 	atomic_dec(&sdev->nr_admin_pending);
 	wake_up_all(&sdev->admin_wq);
 
+	if (!pm_rq)
+		scsi_autopm_put_device(sdev);
+
 	return ret;
 }
 EXPORT_SYMBOL(__scsi_execute);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index a9d371f55ca5..b3dcba83a8d7 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -544,6 +544,8 @@ struct request_queue {
 	struct device		*dev;
 	int			rpm_status;
 	unsigned int		nr_pending;
+	bool			rpm_q_frozen;
+	struct mutex		pm_lock;
 #endif
 
 	/*
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [RFC PATCH 14/14] block: enable runtime PM for blk-mq
  2018-08-07 17:44 [RFC PATCH 00/14] SCSI: introduce per-host admin queue & enable runtime PM Ming Lei
                   ` (12 preceding siblings ...)
  2018-08-07 17:44 ` [RFC PATCH 13/14] block: simplify runtime PM support Ming Lei
@ 2018-08-07 17:44 ` Ming Lei
  2018-08-08 14:06 ` [RFC PATCH 00/14] SCSI: introduce per-host admin queue & enable runtime PM Alan Stern
  14 siblings, 0 replies; 28+ messages in thread
From: Ming Lei @ 2018-08-07 17:44 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Ming Lei, Alan Stern, Christoph Hellwig,
	Bart Van Assche, Jianchao Wang, Hannes Reinecke,
	Johannes Thumshirn, Adrian Hunter, James E.J. Bottomley,
	Martin K. Petersen, linux-scsi

Now blk-mq can borrow the runtime PM approach from legacy path, so
enable it simply.

Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: linux-scsi@vger.kernel.org
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-core.c | 6 ------
 block/blk-mq.c   | 4 ++++
 2 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 7390149f4fd1..26f9ceb85318 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -3722,12 +3722,6 @@ EXPORT_SYMBOL(blk_finish_plug);
  */
 void blk_pm_runtime_init(struct request_queue *q, struct device *dev)
 {
-	/* Don't enable runtime PM for blk-mq until it is ready */
-	if (q->mq_ops) {
-		pm_runtime_disable(dev);
-		return;
-	}
-
 	mutex_init(&q->pm_lock);
 	q->dev = dev;
 	q->rpm_status = RPM_ACTIVE;
diff --git a/block/blk-mq.c b/block/blk-mq.c
index cf0790b628e4..4feb3f484c6a 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -25,6 +25,7 @@
 #include <linux/delay.h>
 #include <linux/crash_dump.h>
 #include <linux/prefetch.h>
+#include <linux/pm_runtime.h>
 
 #include <trace/events/block.h>
 
@@ -479,6 +480,9 @@ static void __blk_mq_free_request(struct request *rq)
 		blk_mq_put_tag(hctx, hctx->sched_tags, ctx, sched_tag);
 	blk_mq_sched_restart(hctx);
 	blk_queue_exit(q);
+
+	if (q->dev)
+		pm_runtime_mark_last_busy(q->dev);
 }
 
 void blk_mq_free_request(struct request *rq)
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH 13/14] block: simplify runtime PM support
  2018-08-07 17:44 ` [RFC PATCH 13/14] block: simplify runtime PM support Ming Lei
@ 2018-08-07 19:54   ` Bart Van Assche
  2018-08-08  3:50     ` Ming Lei
  0 siblings, 1 reply; 28+ messages in thread
From: Bart Van Assche @ 2018-08-07 19:54 UTC (permalink / raw)
  To: ming.lei, axboe
  Cc: jthumshirn, linux-block, hch, martin.petersen, hare, linux-scsi,
	stern, jianchao.w.wang, jejb, adrian.hunter

On Wed, 2018-08-08 at 01:44 +0800, Ming Lei wrote:
> @@ -3772,6 +3764,7 @@ int blk_pre_runtime_su=
spend(struct request_queue *q)
>         if (!q->dev)
>                 return ret;
> =20
> +       mutex_lock(&q->pm_lock);
>         spin_lock_irq(q->queue_lock);
>         if (q->nr_pending) {
>                 ret = -EBUSY;
> @@ -3780,6 +3773,13 @@ int blk_pre_runtime_s=
uspend(struct request_queue *q)
>                 q->rpm_status = RPM_SUSPENDING;
>         }

Hello Ming,

As far as I can see none of the patches in this series adds a call to
blk_pm_add_request() in the blk-mq code. Does that mean that q-=
>nr_pending
will always be zero for blk-mq code with your approach and hence that runti=
me
suspend can get triggered while I/O is in progress, e.g. if blk_queue=
F8-enter()
is called concurrently with blk_pre_runtime_suspend()?

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH 06/14] SCSI: pass scsi_device to scsi_mq_prep_fn
  2018-08-07 17:44 ` [RFC PATCH 06/14] SCSI: pass scsi_device to scsi_mq_prep_fn Ming Lei
@ 2018-08-07 23:24   ` Bart Van Assche
  2018-08-08  3:37     ` Ming Lei
  0 siblings, 1 reply; 28+ messages in thread
From: Bart Van Assche @ 2018-08-07 23:24 UTC (permalink / raw)
  To: ming.lei, axboe
  Cc: jthumshirn, linux-block, hch, martin.petersen, hare, linux-scsi,
	stern, jianchao.w.wang, jejb, adrian.hunter

On Wed, 2018-08-08 at 01:44 +0800, Ming Lei wrote:
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.=
c
> index 435347f58328..52e498fb6280 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -1971,10 +1971,9 @@ static unsigned int scsi_mq=
F8-sgl_size(struct Scsi_Host *shost)
>  		sizeof(struct scatterlist);
>  }
> =20
> -static int scsi_mq_prep_fn(struct request *req)
> +static int scsi_mq_prep_fn(struct scsi_device *=
sdev, struct request *req)
>  {
>  	struct scsi_cmnd *cmd = blk_mq_rq_to_pd=
u(req);
> -	struct scsi_device *sdev = req->q->queuedata=
s-
>  	struct Scsi_Host *shost = sdev->host;
>  	struct scatterlist *sg;
> =20
> @@ -2070,7 +2069,7 @@ static blk_status_t scsi=
F8-queue_rq(struct blk_mq_hw_ctx *hctx,
>  		goto out_dec_target_busy;
> =20
>  	if (!(req->rq_flags & RQF_DONTPREP)) {
> -		ret = prep_to_mq(scsi_mq_prep_fn(req));=
-
> +		ret = prep_to_mq(scsi_mq_prep_fn(sdev, re=
q));
>  		if (ret != BLK_STS_OK)
>  			goto out_dec_host_busy;
>  		req->rq_flags |= RQF_DONTPREP;

This patch looks useful to me since it probably realizes a (small) performa=
nce
improvement. I think this patch does not depend on any of the previous patc=
hes
in this series. Is that correct?

Bart.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH 10/14] SCSI: use the dedicated admin queue to send admin commands
  2018-08-07 17:44 ` [RFC PATCH 10/14] SCSI: use the dedicated admin queue to send admin commands Ming Lei
@ 2018-08-07 23:33   ` Bart Van Assche
  2018-08-08  3:36     ` Ming Lei
  0 siblings, 1 reply; 28+ messages in thread
From: Bart Van Assche @ 2018-08-07 23:33 UTC (permalink / raw)
  To: ming.lei, axboe
  Cc: jthumshirn, linux-block, hch, martin.petersen, hare, linux-scsi,
	stern, jianchao.w.wang, jejb, adrian.hunter

On Wed, 2018-08-08 at 01:44 +0800, Ming Lei wrote:
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -278,16 +278,16 @@ int __scsi_execute(struc=
t scsi_device *sdev, const unsigned char *cmd,
>  	struct request *req;
>  	struct scsi_request *rq;
>  	int ret = DRIVER_ERROR << 24;
> +	struct request_queue *q = sdev->host->admin=
8-q;
> =20
> -	req = blk_get_request(sdev->request_queue,
> +	req = blk_get_request(q,
>  			data_direction == DMA_TO_DEVICE ?
>  			REQ_OP_SCSI_OUT : REQ_OP_SCSI_IN, BLK_=
-MQ_REQ_PREEMPT);

The above looks weird to me. Why are all RQF_PREEMPT requests sent to t=
he admin
queue instead of only RQF_PM requests?

> @@ -299,6 +299,8 @@ int __scsi_execute(struct =
scsi_device *sdev, const unsigned char *cmd,
>  	req->cmd_flags |= flags;
>  	req->rq_flags |= rq_flags | RQF_QUIET;=
-
> =20
> +	atomic_inc(&sdev->nr_admin_pending);

Why has a new counter been introduced to keep track of admin requests inste=
ad of
using q_usage_counter?

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH 10/14] SCSI: use the dedicated admin queue to send admin commands
  2018-08-07 23:33   ` Bart Van Assche
@ 2018-08-08  3:36     ` Ming Lei
  0 siblings, 0 replies; 28+ messages in thread
From: Ming Lei @ 2018-08-08  3:36 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: axboe, jthumshirn, linux-block, hch, martin.petersen, hare,
	linux-scsi, stern, jianchao.w.wang, jejb, adrian.hunter

On Tue, Aug 07, 2018 at 11:33:15PM +0000, Bart Van Assche wrote:
> On Wed, 2018-08-08 at 01:44 +0800, Ming Lei wrote:
> > --- a/drivers/scsi/scsi_lib.c
> > +++ b/drivers/scsi/scsi_lib.c
> > @@ -278,16 +278,16 @@ int __scsi_execute(struct scsi_device *sdev, const unsigned char *cmd,
> >  	struct request *req;
> >  	struct scsi_request *rq;
> >  	int ret = DRIVER_ERROR << 24;
> > +	struct request_queue *q = sdev->host->admin_q;
> >  
> > -	req = blk_get_request(sdev->request_queue,
> > +	req = blk_get_request(q,
> >  			data_direction == DMA_TO_DEVICE ?
> >  			REQ_OP_SCSI_OUT : REQ_OP_SCSI_IN, BLK_MQ_REQ_PREEMPT);
> 
> The above looks weird to me. Why are all RQF_PREEMPT requests sent to the admin
> queue instead of only RQF_PM requests?

The motivation is to use the dedicated admin queue for sending any admin
request, and that is why it isn't named as pm_queue, :-)

Also usually RQF_PREEMPT request is allowed when queue is quiesced, which
can be used in both system suspend and sending domain validation.

> 
> > @@ -299,6 +299,8 @@ int __scsi_execute(struct scsi_device *sdev, const unsigned char *cmd,
> >  	req->cmd_flags |= flags;
> >  	req->rq_flags |= rq_flags | RQF_QUIET;
> >  
> > +	atomic_inc(&sdev->nr_admin_pending);
> 
> Why has a new counter been introduced to keep track of admin requests instead of
> using q_usage_counter?

Just for making sure all admin requests sent to this scsi_device can be
completed before removing this scsi_device, given this patch switches to
use per-host admin queue to send admin requests to all scsi_devices in this host.

So we can't use the q_usage_counter of scsi_device's queue or the per-host
admin queue.

It won't be one big deal since scsi_execute() won't be run in fast path.

Thanks,
Ming

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH 06/14] SCSI: pass scsi_device to scsi_mq_prep_fn
  2018-08-07 23:24   ` Bart Van Assche
@ 2018-08-08  3:37     ` Ming Lei
  0 siblings, 0 replies; 28+ messages in thread
From: Ming Lei @ 2018-08-08  3:37 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: axboe, jthumshirn, linux-block, hch, martin.petersen, hare,
	linux-scsi, stern, jianchao.w.wang, jejb, adrian.hunter

On Tue, Aug 07, 2018 at 11:24:06PM +0000, Bart Van Assche wrote:
> On Wed, 2018-08-08 at 01:44 +0800, Ming Lei wrote:
> > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> > index 435347f58328..52e498fb6280 100644
> > --- a/drivers/scsi/scsi_lib.c
> > +++ b/drivers/scsi/scsi_lib.c
> > @@ -1971,10 +1971,9 @@ static unsigned int scsi_mq_sgl_size(struct Scsi_Host *shost)
> >  		sizeof(struct scatterlist);
> >  }
> >  
> > -static int scsi_mq_prep_fn(struct request *req)
> > +static int scsi_mq_prep_fn(struct scsi_device *sdev, struct request *req)
> >  {
> >  	struct scsi_cmnd *cmd = blk_mq_rq_to_pdu(req);
> > -	struct scsi_device *sdev = req->q->queuedata;
> >  	struct Scsi_Host *shost = sdev->host;
> >  	struct scatterlist *sg;
> >  
> > @@ -2070,7 +2069,7 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx,
> >  		goto out_dec_target_busy;
> >  
> >  	if (!(req->rq_flags & RQF_DONTPREP)) {
> > -		ret = prep_to_mq(scsi_mq_prep_fn(req));
> > +		ret = prep_to_mq(scsi_mq_prep_fn(sdev, req));
> >  		if (ret != BLK_STS_OK)
> >  			goto out_dec_host_busy;
> >  		req->rq_flags |= RQF_DONTPREP;
> 
> This patch looks useful to me since it probably realizes a (small) performance
> improvement. I think this patch does not depend on any of the previous patches
> in this series. Is that correct?

Yes, it is.

Thanks,
Ming

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH 13/14] block: simplify runtime PM support
  2018-08-07 19:54   ` Bart Van Assche
@ 2018-08-08  3:50     ` Ming Lei
  2018-08-08  7:57       ` jianchao.wang
  0 siblings, 1 reply; 28+ messages in thread
From: Ming Lei @ 2018-08-08  3:50 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: axboe, jthumshirn, linux-block, hch, martin.petersen, hare,
	linux-scsi, stern, jianchao.w.wang, jejb, adrian.hunter

On Tue, Aug 07, 2018 at 07:54:44PM +0000, Bart Van Assche wrote:
> On Wed, 2018-08-08 at 01:44 +0800, Ming Lei wrote:
> > @@ -3772,6 +3764,7 @@ int blk_pre_runtime_suspend(struct request_queue *q)
> >         if (!q->dev)
> >                 return ret;
> >  
> > +       mutex_lock(&q->pm_lock);
> >         spin_lock_irq(q->queue_lock);
> >         if (q->nr_pending) {
> >                 ret = -EBUSY;
> > @@ -3780,6 +3773,13 @@ int blk_pre_runtime_suspend(struct request_queue *q)
> >                 q->rpm_status = RPM_SUSPENDING;
> >         }
> 
> Hello Ming,
> 
> As far as I can see none of the patches in this series adds a call to
> blk_pm_add_request() in the blk-mq code. Does that mean that q->nr_pending
> will always be zero for blk-mq code with your approach and hence that runtime

The counter of q->nr_pending is legacy only, and I just forgot to check
blk-mq queue idle in next patch, but the runtime PM still works in this
way for blk-mq, :-)

> suspend can get triggered while I/O is in progress, e.g. if blk_queue_enter()
> is called concurrently with blk_pre_runtime_suspend()?

In this patchset, for blk-mq, runtime suspend is tried when the auto_suspend
period is expired.

Yes, blk_queue_enter() can run concurrently with blk_pre_runtime_suspend().

	1) if queue isn't frozen, blk_pre_runtime_suspend() will wait for
	completion of the coming request

	2) if queue is frozen, blk_queue_enter() will try to resume the device
	via blk_resume_queue(), and q->pm_lock is use for covering the two paths.

But I should have checked the inflight request counter in blk_pre_runtime_suspend()
like the following way before freezing queue, will add it in V2 if no
one objects this approach.

diff --git a/block/blk-core.c b/block/blk-core.c
index 26f9ceb85318..d1a5cd1da861 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -3730,6 +3730,24 @@ void blk_pm_runtime_init(struct request_queue *q, struct device *dev)
 }
 EXPORT_SYMBOL(blk_pm_runtime_init);
 
+static void blk_mq_pm_check_idle(struct blk_mq_hw_ctx *hctx,
+		struct request *rq, void *priv, bool reserved)
+{
+	unsigned long *cnt = priv;
+
+	(*cnt)++;
+}
+
+static bool blk_mq_pm_queue_idle(struct request_queue *q)
+{
+	unsigned long idle_cnt;
+
+	idle_cnt = 0;
+	blk_mq_queue_tag_busy_iter(q, blk_mq_pm_check_idle, &idle_cnt);
+
+	return idle_cnt == 0;
+}
+
 /**
  * blk_pre_runtime_suspend - Pre runtime suspend check
  * @q: the queue of the device
@@ -3754,13 +3772,18 @@ EXPORT_SYMBOL(blk_pm_runtime_init);
 int blk_pre_runtime_suspend(struct request_queue *q)
 {
 	int ret = 0;
+	bool mq_idle = false;
 
 	if (!q->dev)
 		return ret;
 
 	mutex_lock(&q->pm_lock);
+
+	if (q->mq_ops)
+		mq_idle = blk_mq_pm_queue_idle(q);
+
 	spin_lock_irq(q->queue_lock);
-	if (q->nr_pending) {
+	if (q->nr_pending || !mq_idle) {
 		ret = -EBUSY;
 		pm_runtime_mark_last_busy(q->dev);
 	} else {

Thanks,
Ming

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH 09/14] SCSI: create admin queue for each host
  2018-08-07 17:44 ` [RFC PATCH 09/14] SCSI: create admin queue for each host Ming Lei
@ 2018-08-08  5:57   ` jianchao.wang
  2018-08-08  7:11     ` Ming Lei
  0 siblings, 1 reply; 28+ messages in thread
From: jianchao.wang @ 2018-08-08  5:57 UTC (permalink / raw)
  To: Ming Lei, Jens Axboe
  Cc: linux-block, Alan Stern, Christoph Hellwig, Bart Van Assche,
	Hannes Reinecke, Johannes Thumshirn, Adrian Hunter,
	James E.J. Bottomley, Martin K. Petersen, linux-scsi

Hi Ming

On 08/08/2018 01:44 AM, Ming Lei wrote:
>  
> +static struct request_queue *scsi_mq_alloc_admin_queue(struct Scsi_Host *shost)
> +{
> +	struct request_queue *q = __blk_mq_init_queue(&shost->tag_set,
> +			QUEUE_FLAG_MQ_NO_SCHED_DEFAULT);
> +	if (IS_ERR(q))
> +		return NULL;
> +
> +	q->mq_ops = &scsi_mq_admin_ops;
> +
> +	__scsi_init_queue(shost, q);
> +
> +	return q;
> +}

In your patch set, the logical adminq per host is standalone request_queue which share
the tagset with other request_queue.

Due to the hctx_may_queue, 
If only one LUN, the adminq will take away half of the driver tags when
the adminq is active.
And when multiple LUNs, all of the LUNs have to share the limited budget of tags
of the adminq.
This is unacceptable.

And also, not all the admin request is send out through scsi_execute, maybe SG_IO.
So the admin queue here looks more like the pm queue ?

Thanks
Jianchao

 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH 09/14] SCSI: create admin queue for each host
  2018-08-08  5:57   ` jianchao.wang
@ 2018-08-08  7:11     ` Ming Lei
  2018-08-08  7:34       ` jianchao.wang
  0 siblings, 1 reply; 28+ messages in thread
From: Ming Lei @ 2018-08-08  7:11 UTC (permalink / raw)
  To: jianchao.wang
  Cc: Ming Lei, Jens Axboe, linux-block, Alan Stern, Christoph Hellwig,
	Bart Van Assche, Hannes Reinecke, Johannes Thumshirn,
	Adrian Hunter, James E.J. Bottomley, Martin K. Petersen,
	Linux SCSI List

On Wed, Aug 8, 2018 at 1:57 PM, jianchao.wang
<jianchao.w.wang@oracle.com> wrote:
> Hi Ming
>
> On 08/08/2018 01:44 AM, Ming Lei wrote:
>>
>> +static struct request_queue *scsi_mq_alloc_admin_queue(struct Scsi_Host *shost)
>> +{
>> +     struct request_queue *q = __blk_mq_init_queue(&shost->tag_set,
>> +                     QUEUE_FLAG_MQ_NO_SCHED_DEFAULT);
>> +     if (IS_ERR(q))
>> +             return NULL;
>> +
>> +     q->mq_ops = &scsi_mq_admin_ops;
>> +
>> +     __scsi_init_queue(shost, q);
>> +
>> +     return q;
>> +}
>
> In your patch set, the logical adminq per host is standalone request_queue which share
> the tagset with other request_queue.
>
> Due to the hctx_may_queue,
> If only one LUN, the adminq will take away half of the driver tags when
> the adminq is active.

Most of times, the admin queue is inactive, so it shouldn't be a big deal.

> And when multiple LUNs, all of the LUNs have to share the limited budget of tags
> of the adminq.
> This is unacceptable.

That can be solved easily, such as, let __blk_mq_tag_busy() not take
account of admin queue, will do it in V2.

>
> And also, not all the admin request is send out through scsi_execute, maybe SG_IO.
> So the admin queue here looks more like the pm queue ?

SG_IO should be covered by IO queue in which SCSI_PASSTHROUGH
is enabled. It is reasonable too since SG_IO can be normal IO from
userspace.

thanks,
Ming Lei

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH 09/14] SCSI: create admin queue for each host
  2018-08-08  7:11     ` Ming Lei
@ 2018-08-08  7:34       ` jianchao.wang
  2018-08-08  7:46         ` Ming Lei
  0 siblings, 1 reply; 28+ messages in thread
From: jianchao.wang @ 2018-08-08  7:34 UTC (permalink / raw)
  To: Ming Lei
  Cc: Ming Lei, Jens Axboe, linux-block, Alan Stern, Christoph Hellwig,
	Bart Van Assche, Hannes Reinecke, Johannes Thumshirn,
	Adrian Hunter, James E.J. Bottomley, Martin K. Petersen,
	Linux SCSI List



On 08/08/2018 03:11 PM, Ming Lei wrote:
> On Wed, Aug 8, 2018 at 1:57 PM, jianchao.wang
> <jianchao.w.wang@oracle.com> wrote:
>> Hi Ming
>>
>> On 08/08/2018 01:44 AM, Ming Lei wrote:
>>>
>>> +static struct request_queue *scsi_mq_alloc_admin_queue(struct Scsi_Host *shost)
>>> +{
>>> +     struct request_queue *q = __blk_mq_init_queue(&shost->tag_set,
>>> +                     QUEUE_FLAG_MQ_NO_SCHED_DEFAULT);
>>> +     if (IS_ERR(q))
>>> +             return NULL;
>>> +
>>> +     q->mq_ops = &scsi_mq_admin_ops;
>>> +
>>> +     __scsi_init_queue(shost, q);
>>> +
>>> +     return q;
>>> +}
>>
>> In your patch set, the logical adminq per host is standalone request_queue which share
>> the tagset with other request_queue.
>>
>> Due to the hctx_may_queue,
>> If only one LUN, the adminq will take away half of the driver tags when
>> the adminq is active.
> 
> Most of times, the admin queue is inactive, so it shouldn't be a big deal.
> 
>> And when multiple LUNs, all of the LUNs have to share the limited budget of tags
>> of the adminq.
>> This is unacceptable.
> 
> That can be solved easily, such as, let __blk_mq_tag_busy() not take
> account of admin queue, will do it in V2.
> 
>>
>> And also, not all the admin request is send out through scsi_execute, maybe SG_IO.
>> So the admin queue here looks more like the pm queue ?
> 
> SG_IO should be covered by IO queue in which SCSI_PASSTHROUGH
> is enabled. It is reasonable too since SG_IO can be normal IO from
> userspace.
> 

I just concern too much complexity would be introduced by this logical admin queue. ;)

Thanks
Jianchao

 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH 09/14] SCSI: create admin queue for each host
  2018-08-08  7:34       ` jianchao.wang
@ 2018-08-08  7:46         ` Ming Lei
  0 siblings, 0 replies; 28+ messages in thread
From: Ming Lei @ 2018-08-08  7:46 UTC (permalink / raw)
  To: jianchao.wang
  Cc: Ming Lei, Jens Axboe, linux-block, Alan Stern, Christoph Hellwig,
	Bart Van Assche, Hannes Reinecke, Johannes Thumshirn,
	Adrian Hunter, James E.J. Bottomley, Martin K. Petersen,
	Linux SCSI List

On Wed, Aug 8, 2018 at 3:34 PM, jianchao.wang
<jianchao.w.wang@oracle.com> wrote:
>
>
> On 08/08/2018 03:11 PM, Ming Lei wrote:
>> On Wed, Aug 8, 2018 at 1:57 PM, jianchao.wang
>> <jianchao.w.wang@oracle.com> wrote:
>>> Hi Ming
>>>
>>> On 08/08/2018 01:44 AM, Ming Lei wrote:
>>>>
>>>> +static struct request_queue *scsi_mq_alloc_admin_queue(struct Scsi_Host *shost)
>>>> +{
>>>> +     struct request_queue *q = __blk_mq_init_queue(&shost->tag_set,
>>>> +                     QUEUE_FLAG_MQ_NO_SCHED_DEFAULT);
>>>> +     if (IS_ERR(q))
>>>> +             return NULL;
>>>> +
>>>> +     q->mq_ops = &scsi_mq_admin_ops;
>>>> +
>>>> +     __scsi_init_queue(shost, q);
>>>> +
>>>> +     return q;
>>>> +}
>>>
>>> In your patch set, the logical adminq per host is standalone request_queue which share
>>> the tagset with other request_queue.
>>>
>>> Due to the hctx_may_queue,
>>> If only one LUN, the adminq will take away half of the driver tags when
>>> the adminq is active.
>>
>> Most of times, the admin queue is inactive, so it shouldn't be a big deal.
>>
>>> And when multiple LUNs, all of the LUNs have to share the limited budget of tags
>>> of the adminq.
>>> This is unacceptable.
>>
>> That can be solved easily, such as, let __blk_mq_tag_busy() not take
>> account of admin queue, will do it in V2.
>>
>>>
>>> And also, not all the admin request is send out through scsi_execute, maybe SG_IO.
>>> So the admin queue here looks more like the pm queue ?
>>
>> SG_IO should be covered by IO queue in which SCSI_PASSTHROUGH
>> is enabled. It is reasonable too since SG_IO can be normal IO from
>> userspace.
>>
>
> I just concern too much complexity would be introduced by this logical admin queue. ;)

About this concern, I am pretty sure the logic is simple, :-)

If no one objects the admin queue approach.  I plan to rename
the queue flag of NO_SCHED into ADMIN in V2, then we can simply
not increase/decrease the counter of 'active_queues' in case of one
ADMIN queue.

As you saw, the complexity is reduced a lot about implementing
runtime PM and SCSI quiesce with this patch.

Thanks,
Ming Lei

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH 13/14] block: simplify runtime PM support
  2018-08-08  3:50     ` Ming Lei
@ 2018-08-08  7:57       ` jianchao.wang
  0 siblings, 0 replies; 28+ messages in thread
From: jianchao.wang @ 2018-08-08  7:57 UTC (permalink / raw)
  To: Ming Lei, Bart Van Assche
  Cc: axboe, jthumshirn, linux-block, hch, martin.petersen, hare,
	linux-scsi, stern, jejb, adrian.hunter

Hi Ming and Bart

Would you mind to combine your solution together ? ;)

It could be like this:

blk_pre_runtime_suspend


	if (q->mq_ops) {
		if (!blk_mq_pm_queue_idle(q)) {
 		    ret = -EBUSY;
  		    pm_runtime_mark_last_busy(q->dev);
                } else {
                    blk_set_preempt_only(q);
                    synchronize_rcu()
                    if (!blk_mq_pm_queue_idle(q)) {
                        blk_clear_preempt_only(q);
                        ret = -EBUSY;
                    } else {
                    q->rpm_status = RPM_SUSPENDING;
                    }
                }
        } else {
            spin_lock_irq(q->queue_lock);
            if (q->nr_pending) {
                ret = -EBUSY;
                pm_runtime_mark_last_busy(q->dev);
            } else {
                q->rpm_status = RPM_SUSPENDING;
            }
            spin_unlock_irq(q->queue_lock);
        }

blk_queue_enter
 
    blk_resume_queue(q);
    wait_event(q->mq_freeze_wq,
 	  atomic_read(&q->mq_freeze_depth) == 0 ||
 	  blk_queue_dying(q));


Thanks
Jianchao

On 08/08/2018 11:50 AM, Ming Lei wrote:
> iff --git a/block/blk-core.c b/block/blk-core.c
> index 26f9ceb85318..d1a5cd1da861 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -3730,6 +3730,24 @@ void blk_pm_runtime_init(struct request_queue *q, struct device *dev)
>  }
>  EXPORT_SYMBOL(blk_pm_runtime_init);
>  
> +static void blk_mq_pm_check_idle(struct blk_mq_hw_ctx *hctx,
> +		struct request *rq, void *priv, bool reserved)
> +{
> +	unsigned long *cnt = priv;
> +
> +	(*cnt)++;
> +}
> +
> +static bool blk_mq_pm_queue_idle(struct request_queue *q)
> +{
> +	unsigned long idle_cnt;
> +
> +	idle_cnt = 0;
> +	blk_mq_queue_tag_busy_iter(q, blk_mq_pm_check_idle, &idle_cnt);
> +
> +	return idle_cnt == 0;
> +}
> +
>  /**
>   * blk_pre_runtime_suspend - Pre runtime suspend check
>   * @q: the queue of the device
> @@ -3754,13 +3772,18 @@ EXPORT_SYMBOL(blk_pm_runtime_init);
>  int blk_pre_runtime_suspend(struct request_queue *q)
>  {
>  	int ret = 0;
> +	bool mq_idle = false;
>  
>  	if (!q->dev)
>  		return ret;
>  
>  	mutex_lock(&q->pm_lock);
> +
> +	if (q->mq_ops)
> +		mq_idle = blk_mq_pm_queue_idle(q);
> +
>  	spin_lock_irq(q->queue_lock);
> -	if (q->nr_pending) {
> +	if (q->nr_pending || !mq_idle) {
>  		ret = -EBUSY;
>  		pm_runtime_mark_last_busy(q->dev);
>  	} else {

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH 00/14] SCSI: introduce per-host admin queue & enable runtime PM
  2018-08-07 17:44 [RFC PATCH 00/14] SCSI: introduce per-host admin queue & enable runtime PM Ming Lei
                   ` (13 preceding siblings ...)
  2018-08-07 17:44 ` [RFC PATCH 14/14] block: enable runtime PM for blk-mq Ming Lei
@ 2018-08-08 14:06 ` Alan Stern
  2018-08-08 15:25   ` Ming Lei
  14 siblings, 1 reply; 28+ messages in thread
From: Alan Stern @ 2018-08-08 14:06 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Bart Van Assche,
	Jianchao Wang, Hannes Reinecke, Johannes Thumshirn,
	Adrian Hunter, James E.J. Bottomley, Martin K. Petersen,
	linux-scsi

On Wed, 8 Aug 2018, Ming Lei wrote:

> Hi,
> 
> This patchset introduces per-host admin request queue for submitting
> admin request only, and uses this approach to implement both SCSI
> quiesce and runtime PM in one very simply way. Also runtime PM deadlock
> can be avoided in case that request pool is used up.
> 
> The idea is borrowed from NVMe.
> 
> Admin request is submitted via per-host admin queue, and it is still
> associated with the same scsi_device as before, and respects this
> scsi_device's all kinds of limits. Admin queue shares host tags with
> other IO queues.
> 
> One core idea is that for admin request submitted from this admin queue,
> this request won't be called back to block layer via the associated IO
> queue(scsi_device). And this is done in the 3rd patch. So once IO queue
> is frozen, it can be observed as really frozen from block layer view.
> 
> SCSI quiesce is implemented by admin queue in very simple way, see patch
> 12.
> 
> Also runtime PM for legacy path can be simplified too, see patch 13.
> 
> Finally blk-mq simply follows legacy's approach for supporting runtime PM.
> 
> Any comments are welcome!

The admin queue is meant for a few other types of request, not just PM
requests, right?

Which raises a question: How do you prevent those other types of 
request, once they are added to the admin queue, from being sent to the 
device while it is in low-power mode?

Or turn the question around: Suppose you prevent all requests, even 
those on the admin queue, from being sent to the device while it is in 
low-power mode.  Then how do you send the request which tells the 
device to go back to full power?

It seems to me that any queue-based approach needs to be aware of which
requests will actually change the device's power level.

Alan Stern

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH 00/14] SCSI: introduce per-host admin queue & enable runtime PM
  2018-08-08 14:06 ` [RFC PATCH 00/14] SCSI: introduce per-host admin queue & enable runtime PM Alan Stern
@ 2018-08-08 15:25   ` Ming Lei
  0 siblings, 0 replies; 28+ messages in thread
From: Ming Lei @ 2018-08-08 15:25 UTC (permalink / raw)
  To: Alan Stern
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Bart Van Assche,
	Jianchao Wang, Hannes Reinecke, Johannes Thumshirn,
	Adrian Hunter, James E.J. Bottomley, Martin K. Petersen,
	linux-scsi

On Wed, Aug 08, 2018 at 10:06:00AM -0400, Alan Stern wrote:
> On Wed, 8 Aug 2018, Ming Lei wrote:
> 
> > Hi,
> > 
> > This patchset introduces per-host admin request queue for submitting
> > admin request only, and uses this approach to implement both SCSI
> > quiesce and runtime PM in one very simply way. Also runtime PM deadlock
> > can be avoided in case that request pool is used up.
> > 
> > The idea is borrowed from NVMe.
> > 
> > Admin request is submitted via per-host admin queue, and it is still
> > associated with the same scsi_device as before, and respects this
> > scsi_device's all kinds of limits. Admin queue shares host tags with
> > other IO queues.
> > 
> > One core idea is that for admin request submitted from this admin queue,
> > this request won't be called back to block layer via the associated IO
> > queue(scsi_device). And this is done in the 3rd patch. So once IO queue
> > is frozen, it can be observed as really frozen from block layer view.
> > 
> > SCSI quiesce is implemented by admin queue in very simple way, see patch
> > 12.
> > 
> > Also runtime PM for legacy path can be simplified too, see patch 13.
> > 
> > Finally blk-mq simply follows legacy's approach for supporting runtime PM.
> > 
> > Any comments are welcome!
> 
> The admin queue is meant for a few other types of request, not just PM
> requests, right?

Yes, they are all requests sent via scsi_execute() actually.

> 
> Which raises a question: How do you prevent those other types of 
> request, once they are added to the admin queue, from being sent to the 
> device while it is in low-power mode?

If other non-PM types of request needs to be submitted via admin queue, the
related IO queue will be resumed first, which is done via scsi_autopm_get_device()
in scsi_execute().

> 
> Or turn the question around: Suppose you prevent all requests, even 
> those on the admin queue, from being sent to the device while it is in 
> low-power mode.  Then how do you send the request which tells the 
> device to go back to full power?

For normal IO request, the IO queue is resumed before allocating the IO request.

For other non-PM request, the related IO queue is resumed via scsi_autopm_get_device()
before allocating this request from admin queue.

> 
> It seems to me that any queue-based approach needs to be aware of which
> requests will actually change the device's power level.

Now looks we suppose it is only done by RQF_PM request.


Thanks,
Ming

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2018-08-08 15:25 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-08-07 17:44 [RFC PATCH 00/14] SCSI: introduce per-host admin queue & enable runtime PM Ming Lei
2018-08-07 17:44 ` [RFC PATCH 01/14] blk-mq: allow to pass default queue flags for creating & initializing queue Ming Lei
2018-08-07 17:44 ` [RFC PATCH 02/14] blk-mq: convert BLK_MQ_F_NO_SCHED into per-queue flag Ming Lei
2018-08-07 17:44 ` [RFC PATCH 03/14] SCSI: try to retrieve request_queue via 'scsi_cmnd' if possible Ming Lei
2018-08-07 17:44 ` [RFC PATCH 04/14] SCSI: pass 'scsi_device' instance from 'scsi_request' Ming Lei
2018-08-07 17:44 ` [RFC PATCH 05/14] SCSI: prepare for introducing admin queue for legacy path Ming Lei
2018-08-07 17:44 ` [RFC PATCH 06/14] SCSI: pass scsi_device to scsi_mq_prep_fn Ming Lei
2018-08-07 23:24   ` Bart Van Assche
2018-08-08  3:37     ` Ming Lei
2018-08-07 17:44 ` [RFC PATCH 07/14] SCSI: don't set .queuedata in scsi_mq_alloc_queue() Ming Lei
2018-08-07 17:44 ` [RFC PATCH 08/14] SCSI: deal with admin queue busy Ming Lei
2018-08-07 17:44 ` [RFC PATCH 09/14] SCSI: create admin queue for each host Ming Lei
2018-08-08  5:57   ` jianchao.wang
2018-08-08  7:11     ` Ming Lei
2018-08-08  7:34       ` jianchao.wang
2018-08-08  7:46         ` Ming Lei
2018-08-07 17:44 ` [RFC PATCH 10/14] SCSI: use the dedicated admin queue to send admin commands Ming Lei
2018-08-07 23:33   ` Bart Van Assche
2018-08-08  3:36     ` Ming Lei
2018-08-07 17:44 ` [RFC PATCH 11/14] SCSI: transport_spi: resume a quiesced device Ming Lei
2018-08-07 17:44 ` [RFC PATCH 12/14] SCSI: use admin queue to implement queue QUIESCE Ming Lei
2018-08-07 17:44 ` [RFC PATCH 13/14] block: simplify runtime PM support Ming Lei
2018-08-07 19:54   ` Bart Van Assche
2018-08-08  3:50     ` Ming Lei
2018-08-08  7:57       ` jianchao.wang
2018-08-07 17:44 ` [RFC PATCH 14/14] block: enable runtime PM for blk-mq Ming Lei
2018-08-08 14:06 ` [RFC PATCH 00/14] SCSI: introduce per-host admin queue & enable runtime PM Alan Stern
2018-08-08 15:25   ` Ming Lei

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.