All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH RFC 0/4] blk-mq: support runtime PM
@ 2018-07-11 16:29 Ming Lei
  2018-07-11 16:29 ` [PATCH RFC 1/4] blk-mq: introduce blk_mq_support_runtime_pm() Ming Lei
                   ` (3 more replies)
  0 siblings, 4 replies; 23+ messages in thread
From: Ming Lei @ 2018-07-11 16:29 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Ming Lei, Rafael J. Wysocki, Alan Stern, linux-pm,
	Greg Kroah-Hartman, Christoph Hellwig, Bart Van Assche,
	James E.J. Bottomley, Martin K. Petersen, linux-scsi

Hi Guys,

Runtime PM is usually enabled for SCSI devices, and we are switching to
SCSI_MQ recently, but runtime PM isn't supported yet by blk-mq, and
people may complain that.

This patch tries to support runtime PM for blk-mq. And one chanllenge is
that it can be quite expensive to account the active in-flight IOs for
figuring out when to mark the last busy. This patch simply marks busy
after each non-PM IO is done, and this way is workable because:

1) pm_runtime_mark_last_busy() is very cheap

2) in-flight non-PM IO is checked in blk_pre_runtime_suspend(), so
if there is any IO queued, the device will be prevented from being
suspened.

3) Generally speaking, autosuspend_delay_ms is often big, and should
be in unit of second, so it shouldn't be a big deal to check if queue
is idle in blk_pre_runtime_suspend().

Ming Lei (4):
  blk-mq: introduce blk_mq_support_runtime_pm()
  blk-mq: introduce blk_mq_pm_queue_idle()
  blk-mq: prepare for supporting runtime PM
  scsi_mq: enable runtime PM

 block/blk-core.c        | 14 +++++++++++---
 block/blk-mq.c          | 46 ++++++++++++++++++++++++++++++++++++++++++++++
 block/blk-mq.h          |  3 +++
 drivers/scsi/scsi_lib.c |  3 ++-
 include/linux/blk-mq.h  | 10 ++++++++++
 5 files changed, 72 insertions(+), 4 deletions(-)

Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: linux-pm@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Bart Van Assche <bart.vanassche@wdc.com>
Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: linux-scsi@vger.kernel.org


-- 
2.9.5

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH RFC 1/4] blk-mq: introduce blk_mq_support_runtime_pm()
  2018-07-11 16:29 [PATCH RFC 0/4] blk-mq: support runtime PM Ming Lei
@ 2018-07-11 16:29 ` Ming Lei
  2018-07-11 17:10   ` Christoph Hellwig
  2018-07-11 16:29 ` [PATCH RFC 2/4] blk-mq: introduce blk_mq_pm_queue_idle() Ming Lei
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 23+ messages in thread
From: Ming Lei @ 2018-07-11 16:29 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Ming Lei, Rafael J. Wysocki, Alan Stern, linux-pm,
	Greg Kroah-Hartman, Christoph Hellwig, Bart Van Assche,
	James E.J. Bottomley, Martin K. Petersen, linux-scsi

A bit cost has to be added in fast IO path for supporting runtime PM, and
introduce this helper so that drivers incapable of runtime PM may not be
affected by this feature.

Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: linux-pm@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Bart Van Assche <bart.vanassche@wdc.com>
Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: linux-scsi@vger.kernel.org
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 include/linux/blk-mq.h | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index d710e92874cc..998ad6602448 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -180,6 +180,7 @@ enum {
 	BLK_MQ_F_SHOULD_MERGE	= 1 << 0,
 	BLK_MQ_F_TAG_SHARED	= 1 << 1,
 	BLK_MQ_F_SG_MERGE	= 1 << 2,
+	BLK_MQ_F_SUPPORT_RPM	= 1 << 3,
 	BLK_MQ_F_BLOCKING	= 1 << 5,
 	BLK_MQ_F_NO_SCHED	= 1 << 6,
 	BLK_MQ_F_ALLOC_POLICY_START_BIT = 8,
@@ -249,6 +250,15 @@ static inline u16 blk_mq_unique_tag_to_tag(u32 unique_tag)
 	return unique_tag & BLK_MQ_UNIQUE_TAG_MASK;
 }
 
+static inline bool blk_mq_support_runtime_pm(struct request_queue *q)
+{
+	struct blk_mq_tag_set *set = q->tag_set;
+
+	if (!q->dev || !set || !(set->flags & BLK_MQ_F_SUPPORT_RPM))
+		return false;
+
+	return true;
+}
 
 int blk_mq_request_started(struct request *rq);
 void blk_mq_start_request(struct request *rq);
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH RFC 2/4] blk-mq: introduce blk_mq_pm_queue_idle()
  2018-07-11 16:29 [PATCH RFC 0/4] blk-mq: support runtime PM Ming Lei
  2018-07-11 16:29 ` [PATCH RFC 1/4] blk-mq: introduce blk_mq_support_runtime_pm() Ming Lei
@ 2018-07-11 16:29 ` Ming Lei
  2018-07-11 17:11   ` Christoph Hellwig
  2018-07-11 16:29 ` [PATCH RFC 3/4] blk-mq: prepare for supporting runtime PM Ming Lei
  2018-07-11 16:29 ` [PATCH RFC 4/4] scsi_mq: enable " Ming Lei
  3 siblings, 1 reply; 23+ messages in thread
From: Ming Lei @ 2018-07-11 16:29 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Ming Lei, Rafael J. Wysocki, Alan Stern, linux-pm,
	Greg Kroah-Hartman, Christoph Hellwig, Bart Van Assche,
	James E.J. Bottomley, Martin K. Petersen, linux-scsi

This helper will be used to decide if this block device can be
put into runtime suspend.

Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: linux-pm@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Bart Van Assche <bart.vanassche@wdc.com>
Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: linux-scsi@vger.kernel.org
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq.c | 22 ++++++++++++++++++++++
 block/blk-mq.h |  3 +++
 2 files changed, 25 insertions(+)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 73a43b81b17d..3a78fed87959 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -58,6 +58,28 @@ static int blk_mq_poll_stats_bkt(const struct request *rq)
 	return bucket;
 }
 
+static void blk_mq_pm_check_idle(struct blk_mq_hw_ctx *hctx,
+		struct request *rq, void *priv, bool reserved)
+{
+	unsigned long *cnt = priv;
+
+	if (!(rq->rq_flags & RQF_PM))
+		(*cnt)++;
+}
+
+bool blk_mq_pm_queue_idle(struct request_queue *q)
+{
+	unsigned long idle_cnt;
+
+	if (!blk_mq_support_runtime_pm(q))
+		return false;
+
+	idle_cnt = 0;
+	blk_mq_queue_tag_busy_iter(q, blk_mq_pm_check_idle, &idle_cnt);
+
+	return idle_cnt == 0;
+}
+
 /*
  * Check if any of the ctx's have pending work in this hardware queue
  */
diff --git a/block/blk-mq.h b/block/blk-mq.h
index bc2b24735ed4..b64f6f69176a 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -88,6 +88,9 @@ extern void blk_mq_hctx_kobj_init(struct blk_mq_hw_ctx *hctx);
 
 void blk_mq_release(struct request_queue *q);
 
+/* blk-mq pm helpers */
+bool blk_mq_pm_queue_idle(struct request_queue *q);
+
 /**
  * blk_mq_rq_state() - read the current MQ_RQ_* state of a request
  * @rq: target request.
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH RFC 3/4] blk-mq: prepare for supporting runtime PM
  2018-07-11 16:29 [PATCH RFC 0/4] blk-mq: support runtime PM Ming Lei
  2018-07-11 16:29 ` [PATCH RFC 1/4] blk-mq: introduce blk_mq_support_runtime_pm() Ming Lei
  2018-07-11 16:29 ` [PATCH RFC 2/4] blk-mq: introduce blk_mq_pm_queue_idle() Ming Lei
@ 2018-07-11 16:29 ` Ming Lei
  2018-07-11 17:19   ` Christoph Hellwig
  2018-07-12  9:58   ` Ming Lei
  2018-07-11 16:29 ` [PATCH RFC 4/4] scsi_mq: enable " Ming Lei
  3 siblings, 2 replies; 23+ messages in thread
From: Ming Lei @ 2018-07-11 16:29 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Ming Lei, Rafael J. Wysocki, Alan Stern, linux-pm,
	Greg Kroah-Hartman, Christoph Hellwig, Bart Van Assche,
	James E.J. Bottomley, Martin K. Petersen, linux-scsi

This patch introduces blk_mq_pm_add_request() which is called after
allocating one request. Also blk_mq_pm_put_request() is introduced
and called after one request is freed.

For blk-mq, it can be quite expensive to accounting in-flight IOs,
so this patch calls pm_runtime_mark_last_busy() simply after each IO
is done, instead of doing that only after the last in-flight IO is done.
This way is still workable, since the active non-PM IO will be checked
in blk_pre_runtime_suspend(), and runtime suspend will be prevented
if there is any active non-PM IO.

Also makes blk_post_runtime_resume() to cover blk-mq.

Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: linux-pm@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Bart Van Assche <bart.vanassche@wdc.com>
Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: linux-scsi@vger.kernel.org
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-core.c | 12 ++++++++++--
 block/blk-mq.c   | 24 ++++++++++++++++++++++++
 2 files changed, 34 insertions(+), 2 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index c4b57d8806fe..bf66d561980d 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -3804,12 +3804,17 @@ EXPORT_SYMBOL(blk_pm_runtime_init);
 int blk_pre_runtime_suspend(struct request_queue *q)
 {
 	int ret = 0;
+	bool active;
 
 	if (!q->dev)
 		return ret;
 
 	spin_lock_irq(q->queue_lock);
-	if (q->nr_pending) {
+	if (!q->mq_ops)
+		active = !!q->nr_pending;
+	else
+		active = !blk_mq_pm_queue_idle(q);
+	if (active) {
 		ret = -EBUSY;
 		pm_runtime_mark_last_busy(q->dev);
 	} else {
@@ -3893,7 +3898,10 @@ void blk_post_runtime_resume(struct request_queue *q, int err)
 	spin_lock_irq(q->queue_lock);
 	if (!err) {
 		q->rpm_status = RPM_ACTIVE;
-		__blk_run_queue(q);
+		if (!q->mq_ops)
+			__blk_run_queue(q);
+		else
+			blk_mq_run_hw_queues(q, true);
 		pm_runtime_mark_last_busy(q->dev);
 		pm_request_autosuspend(q->dev);
 	} else {
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 3a78fed87959..50dd259f798f 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -25,6 +25,7 @@
 #include <linux/delay.h>
 #include <linux/crash_dump.h>
 #include <linux/prefetch.h>
+#include <linux/pm_runtime.h>
 
 #include <trace/events/block.h>
 
@@ -80,6 +81,25 @@ bool blk_mq_pm_queue_idle(struct request_queue *q)
 	return idle_cnt == 0;
 }
 
+static void blk_mq_pm_add_request(struct request_queue *q, struct request *rq)
+{
+	if (!blk_mq_support_runtime_pm(q))
+		return;
+
+	if (q->dev && !(rq->rq_flags & RQF_PM) &&
+	    (q->rpm_status == RPM_SUSPENDED || q->rpm_status == RPM_SUSPENDING))
+		pm_request_resume(q->dev);
+}
+
+static void blk_mq_pm_put_request(struct request_queue *q, struct request *rq)
+{
+	if (!blk_mq_support_runtime_pm(q))
+		return;
+
+	if (q->dev && !(rq->rq_flags & RQF_PM))
+		pm_runtime_mark_last_busy(q->dev);
+}
+
 /*
  * Check if any of the ctx's have pending work in this hardware queue
  */
@@ -531,6 +551,8 @@ void blk_mq_free_request(struct request *rq)
 	if (blk_rq_rl(rq))
 		blk_put_rl(blk_rq_rl(rq));
 
+	blk_mq_pm_put_request(q, rq);
+
 	WRITE_ONCE(rq->state, MQ_RQ_IDLE);
 	if (refcount_dec_and_test(&rq->ref))
 		__blk_mq_free_request(rq);
@@ -1841,6 +1863,8 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
 
 	rq_qos_track(q, rq, bio);
 
+	blk_mq_pm_add_request(q, rq);
+
 	cookie = request_to_qc_t(data.hctx, rq);
 
 	plug = current->plug;
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH RFC 4/4] scsi_mq: enable runtime PM
  2018-07-11 16:29 [PATCH RFC 0/4] blk-mq: support runtime PM Ming Lei
                   ` (2 preceding siblings ...)
  2018-07-11 16:29 ` [PATCH RFC 3/4] blk-mq: prepare for supporting runtime PM Ming Lei
@ 2018-07-11 16:29 ` Ming Lei
  2018-07-11 17:14   ` Christoph Hellwig
  2018-07-11 17:28   ` Alan Stern
  3 siblings, 2 replies; 23+ messages in thread
From: Ming Lei @ 2018-07-11 16:29 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Ming Lei, Rafael J. Wysocki, Alan Stern, linux-pm,
	Greg Kroah-Hartman, Christoph Hellwig, Bart Van Assche,
	James E.J. Bottomley, Martin K. Petersen, linux-scsi

Usually SCSI supports runtime PM, so pass BLK_MQ_F_SUPPORT_RPM to blk-mq
core for enabling block runtime PM.

Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: linux-pm@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Bart Van Assche <bart.vanassche@wdc.com>
Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: linux-scsi@vger.kernel.org
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-core.c        | 2 +-
 drivers/scsi/scsi_lib.c | 3 ++-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index bf66d561980d..9e47205366ab 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -3770,7 +3770,7 @@ EXPORT_SYMBOL(blk_finish_plug);
 void blk_pm_runtime_init(struct request_queue *q, struct device *dev)
 {
 	/* not support for RQF_PM and ->rpm_status in blk-mq yet */
-	if (q->mq_ops)
+	if (q->mq_ops && !(q->tag_set->flags & BLK_MQ_F_SUPPORT_RPM))
 		return;
 
 	q->dev = dev;
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 41e9ac9fc138..fa4667aa4732 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -2306,7 +2306,8 @@ int scsi_mq_setup_tags(struct Scsi_Host *shost)
 	shost->tag_set.queue_depth = shost->can_queue;
 	shost->tag_set.cmd_size = cmd_size;
 	shost->tag_set.numa_node = NUMA_NO_NODE;
-	shost->tag_set.flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_SG_MERGE;
+	shost->tag_set.flags = BLK_MQ_F_SHOULD_MERGE |
+		BLK_MQ_F_SG_MERGE | BLK_MQ_F_SUPPORT_RPM;
 	shost->tag_set.flags |=
 		BLK_ALLOC_POLICY_TO_MQ_FLAG(shost->hostt->tag_alloc_policy);
 	shost->tag_set.driver_data = shost;
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH RFC 1/4] blk-mq: introduce blk_mq_support_runtime_pm()
  2018-07-11 16:29 ` [PATCH RFC 1/4] blk-mq: introduce blk_mq_support_runtime_pm() Ming Lei
@ 2018-07-11 17:10   ` Christoph Hellwig
  0 siblings, 0 replies; 23+ messages in thread
From: Christoph Hellwig @ 2018-07-11 17:10 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, Rafael J. Wysocki, Alan Stern, linux-pm,
	Greg Kroah-Hartman, Christoph Hellwig, Bart Van Assche,
	James E.J. Bottomley, Martin K. Petersen, linux-scsi

On Thu, Jul 12, 2018 at 12:29:03AM +0800, Ming Lei wrote:
> A bit cost has to be added in fast IO path for supporting runtime PM, and
> introduce this helper so that drivers incapable of runtime PM may not be
> affected by this feature.

I don't think this is useful on it's own and should go with the
patch that uses it.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH RFC 2/4] blk-mq: introduce blk_mq_pm_queue_idle()
  2018-07-11 16:29 ` [PATCH RFC 2/4] blk-mq: introduce blk_mq_pm_queue_idle() Ming Lei
@ 2018-07-11 17:11   ` Christoph Hellwig
  0 siblings, 0 replies; 23+ messages in thread
From: Christoph Hellwig @ 2018-07-11 17:11 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, Rafael J. Wysocki, Alan Stern, linux-pm,
	Greg Kroah-Hartman, Christoph Hellwig, Bart Van Assche,
	James E.J. Bottomley, Martin K. Petersen, linux-scsi

On Thu, Jul 12, 2018 at 12:29:04AM +0800, Ming Lei wrote:
> This helper will be used to decide if this block device can be
> put into runtime suspend.

This also seems like a rather odd building block, and would probably
make more sense with a user..

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH RFC 4/4] scsi_mq: enable runtime PM
  2018-07-11 16:29 ` [PATCH RFC 4/4] scsi_mq: enable " Ming Lei
@ 2018-07-11 17:14   ` Christoph Hellwig
  2018-07-12  1:36     ` Ming Lei
  2018-07-11 17:28   ` Alan Stern
  1 sibling, 1 reply; 23+ messages in thread
From: Christoph Hellwig @ 2018-07-11 17:14 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, Rafael J. Wysocki, Alan Stern, linux-pm,
	Greg Kroah-Hartman, Christoph Hellwig, Bart Van Assche,
	James E.J. Bottomley, Martin K. Petersen, linux-scsi

> @@ -3770,7 +3770,7 @@ EXPORT_SYMBOL(blk_finish_plug);
>  void blk_pm_runtime_init(struct request_queue *q, struct device *dev)
>  {
>  	/* not support for RQF_PM and ->rpm_status in blk-mq yet */
> -	if (q->mq_ops)
> +	if (q->mq_ops && !(q->tag_set->flags & BLK_MQ_F_SUPPORT_RPM))
>  		return;
>  
>  	q->dev = dev;

This should go into a block layer patch, not a scsi one.

> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index 41e9ac9fc138..fa4667aa4732 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -2306,7 +2306,8 @@ int scsi_mq_setup_tags(struct Scsi_Host *shost)
>  	shost->tag_set.queue_depth = shost->can_queue;
>  	shost->tag_set.cmd_size = cmd_size;
>  	shost->tag_set.numa_node = NUMA_NO_NODE;
> -	shost->tag_set.flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_SG_MERGE;
> +	shost->tag_set.flags = BLK_MQ_F_SHOULD_MERGE |
> +		BLK_MQ_F_SG_MERGE | BLK_MQ_F_SUPPORT_RPM;

As far as I can tell only ufs and libata support runtime PM, so
we should probably only enable it for those.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH RFC 3/4] blk-mq: prepare for supporting runtime PM
  2018-07-11 16:29 ` [PATCH RFC 3/4] blk-mq: prepare for supporting runtime PM Ming Lei
@ 2018-07-11 17:19   ` Christoph Hellwig
  2018-07-12  9:58   ` Ming Lei
  1 sibling, 0 replies; 23+ messages in thread
From: Christoph Hellwig @ 2018-07-11 17:19 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, Rafael J. Wysocki, Alan Stern, linux-pm,
	Greg Kroah-Hartman, Christoph Hellwig, Bart Van Assche,
	James E.J. Bottomley, Martin K. Petersen, linux-scsi

> index c4b57d8806fe..bf66d561980d 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -3804,12 +3804,17 @@ EXPORT_SYMBOL(blk_pm_runtime_init);
>  int blk_pre_runtime_suspend(struct request_queue *q)
>  {
>  	int ret = 0;
> +	bool active;
>  
>  	if (!q->dev)
>  		return ret;
>  
>  	spin_lock_irq(q->queue_lock);
> -	if (q->nr_pending) {
> +	if (!q->mq_ops)
> +		active = !!q->nr_pending;
> +	else
> +		active = !blk_mq_pm_queue_idle(q);
> +	if (active) {

We shouldn't really need queue_lock for blk-mq.  Also the !! is not
really needed when assigning to a bool.

> +static void blk_mq_pm_add_request(struct request_queue *q, struct request *rq)
> +{
> +	if (!blk_mq_support_runtime_pm(q))
> +		return;
> +
> +	if (q->dev && !(rq->rq_flags & RQF_PM) &&
> +	    (q->rpm_status == RPM_SUSPENDED || q->rpm_status == RPM_SUSPENDING))
> +		pm_request_resume(q->dev);

blk_mq_support_runtime_pm already checks for q->dev.   Also to mee
it sems just opencoding blk_mq_pm_add_request / blk_mq_pm_put_request
in the callers would seems more obvious.

> @@ -1841,6 +1863,8 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
>  
>  	rq_qos_track(q, rq, bio);
>  
> +	blk_mq_pm_add_request(q, rq);

This doesn't seem to handle passthrough requests and not actually
pair with blk_mq_free_request, is that intentional?

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH RFC 4/4] scsi_mq: enable runtime PM
  2018-07-11 16:29 ` [PATCH RFC 4/4] scsi_mq: enable " Ming Lei
  2018-07-11 17:14   ` Christoph Hellwig
@ 2018-07-11 17:28   ` Alan Stern
  1 sibling, 0 replies; 23+ messages in thread
From: Alan Stern @ 2018-07-11 17:28 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, Rafael J. Wysocki, linux-pm,
	Greg Kroah-Hartman, Christoph Hellwig, Bart Van Assche,
	James E.J. Bottomley, Martin K. Petersen, linux-scsi

On Thu, 12 Jul 2018, Ming Lei wrote:

> Usually SCSI supports runtime PM, so pass BLK_MQ_F_SUPPORT_RPM to blk-mq
> core for enabling block runtime PM.
> 
> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
> Cc: Alan Stern <stern@rowland.harvard.edu>
> Cc: linux-pm@vger.kernel.org
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Bart Van Assche <bart.vanassche@wdc.com>
> Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com>
> Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
> Cc: linux-scsi@vger.kernel.org
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
>  block/blk-core.c        | 2 +-
>  drivers/scsi/scsi_lib.c | 3 ++-
>  2 files changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/block/blk-core.c b/block/blk-core.c
> index bf66d561980d..9e47205366ab 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -3770,7 +3770,7 @@ EXPORT_SYMBOL(blk_finish_plug);
>  void blk_pm_runtime_init(struct request_queue *q, struct device *dev)
>  {
>  	/* not support for RQF_PM and ->rpm_status in blk-mq yet */

Shouldn't that comment be removed?

Alan Stern

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH RFC 4/4] scsi_mq: enable runtime PM
  2018-07-11 17:14   ` Christoph Hellwig
@ 2018-07-12  1:36     ` Ming Lei
  2018-07-12  7:17       ` Christoph Hellwig
  0 siblings, 1 reply; 23+ messages in thread
From: Ming Lei @ 2018-07-12  1:36 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, linux-block, Rafael J. Wysocki, Alan Stern, linux-pm,
	Greg Kroah-Hartman, Bart Van Assche, James E.J. Bottomley,
	Martin K. Petersen, linux-scsi

Hi Christoph,

On Wed, Jul 11, 2018 at 07:14:09PM +0200, Christoph Hellwig wrote:
> > @@ -3770,7 +3770,7 @@ EXPORT_SYMBOL(blk_finish_plug);
> >  void blk_pm_runtime_init(struct request_queue *q, struct device *dev)
> >  {
> >  	/* not support for RQF_PM and ->rpm_status in blk-mq yet */
> > -	if (q->mq_ops)
> > +	if (q->mq_ops && !(q->tag_set->flags & BLK_MQ_F_SUPPORT_RPM))
> >  		return;
> >  
> >  	q->dev = dev;
> 
> This should go into a block layer patch, not a scsi one.

OK.

> 
> > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> > index 41e9ac9fc138..fa4667aa4732 100644
> > --- a/drivers/scsi/scsi_lib.c
> > +++ b/drivers/scsi/scsi_lib.c
> > @@ -2306,7 +2306,8 @@ int scsi_mq_setup_tags(struct Scsi_Host *shost)
> >  	shost->tag_set.queue_depth = shost->can_queue;
> >  	shost->tag_set.cmd_size = cmd_size;
> >  	shost->tag_set.numa_node = NUMA_NO_NODE;
> > -	shost->tag_set.flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_SG_MERGE;
> > +	shost->tag_set.flags = BLK_MQ_F_SHOULD_MERGE |
> > +		BLK_MQ_F_SG_MERGE | BLK_MQ_F_SUPPORT_RPM;
> 
> As far as I can tell only ufs and libata support runtime PM, so
> we should probably only enable it for those.

For legacy path, blk_pm_runtime_init() is always run for all SCSI devices,
and this patch just follows the old way, so that we can keep runtime PM
behaviour not changed from user view. That means if we want to only enable
for ufs & libata, it should be in another standalone patch, instead of
this one.

I will address all your other comments and Alan's in V2.

Thanks,
Ming

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH RFC 4/4] scsi_mq: enable runtime PM
  2018-07-12  1:36     ` Ming Lei
@ 2018-07-12  7:17       ` Christoph Hellwig
  2018-07-12 12:21         ` Ming Lei
  0 siblings, 1 reply; 23+ messages in thread
From: Christoph Hellwig @ 2018-07-12  7:17 UTC (permalink / raw)
  To: Ming Lei
  Cc: Christoph Hellwig, Jens Axboe, linux-block, Rafael J. Wysocki,
	Alan Stern, linux-pm, Greg Kroah-Hartman, Bart Van Assche,
	James E.J. Bottomley, Martin K. Petersen, linux-scsi

On Thu, Jul 12, 2018 at 09:36:49AM +0800, Ming Lei wrote:
> > As far as I can tell only ufs and libata support runtime PM, so
> > we should probably only enable it for those.
> 
> For legacy path, blk_pm_runtime_init() is always run for all SCSI devices,
> and this patch just follows the old way, so that we can keep runtime PM
> behaviour not changed from user view. That means if we want to only enable
> for ufs & libata, it should be in another standalone patch, instead of
> this one.

Please add this to the series as a separate patch then.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH RFC 3/4] blk-mq: prepare for supporting runtime PM
  2018-07-11 16:29 ` [PATCH RFC 3/4] blk-mq: prepare for supporting runtime PM Ming Lei
  2018-07-11 17:19   ` Christoph Hellwig
@ 2018-07-12  9:58   ` Ming Lei
  2018-07-12 12:28     ` Ming Lei
  1 sibling, 1 reply; 23+ messages in thread
From: Ming Lei @ 2018-07-12  9:58 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Rafael J. Wysocki, Alan Stern, linux-pm,
	Greg Kroah-Hartman, Christoph Hellwig, Bart Van Assche,
	James E.J. Bottomley, Martin K. Petersen, linux-scsi

On Thu, Jul 12, 2018 at 12:29:05AM +0800, Ming Lei wrote:
> This patch introduces blk_mq_pm_add_request() which is called after
> allocating one request. Also blk_mq_pm_put_request() is introduced
> and called after one request is freed.
> 
> For blk-mq, it can be quite expensive to accounting in-flight IOs,
> so this patch calls pm_runtime_mark_last_busy() simply after each IO
> is done, instead of doing that only after the last in-flight IO is done.
> This way is still workable, since the active non-PM IO will be checked
> in blk_pre_runtime_suspend(), and runtime suspend will be prevented
> if there is any active non-PM IO.
> 
> Also makes blk_post_runtime_resume() to cover blk-mq.
> 
> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
> Cc: Alan Stern <stern@rowland.harvard.edu>
> Cc: linux-pm@vger.kernel.org
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Bart Van Assche <bart.vanassche@wdc.com>
> Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com>
> Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
> Cc: linux-scsi@vger.kernel.org
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
>  block/blk-core.c | 12 ++++++++++--
>  block/blk-mq.c   | 24 ++++++++++++++++++++++++
>  2 files changed, 34 insertions(+), 2 deletions(-)
> 
> diff --git a/block/blk-core.c b/block/blk-core.c
> index c4b57d8806fe..bf66d561980d 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -3804,12 +3804,17 @@ EXPORT_SYMBOL(blk_pm_runtime_init);
>  int blk_pre_runtime_suspend(struct request_queue *q)
>  {
>  	int ret = 0;
> +	bool active;
>  
>  	if (!q->dev)
>  		return ret;
>  
>  	spin_lock_irq(q->queue_lock);
> -	if (q->nr_pending) {
> +	if (!q->mq_ops)
> +		active = !!q->nr_pending;
> +	else
> +		active = !blk_mq_pm_queue_idle(q);
> +	if (active) {
>  		ret = -EBUSY;
>  		pm_runtime_mark_last_busy(q->dev);
>  	} else {

Looks there is one big issue, one new IO may come just after reading
'active' and before writing RPM_SUSPENDING to q->rpm_status, and both
the suspending and the new IO may be in-progress at the same time.


Thanks,
Ming

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH RFC 4/4] scsi_mq: enable runtime PM
  2018-07-12  7:17       ` Christoph Hellwig
@ 2018-07-12 12:21         ` Ming Lei
  0 siblings, 0 replies; 23+ messages in thread
From: Ming Lei @ 2018-07-12 12:21 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, linux-block, Rafael J. Wysocki, Alan Stern, linux-pm,
	Greg Kroah-Hartman, Bart Van Assche, James E.J. Bottomley,
	Martin K. Petersen, linux-scsi

On Thu, Jul 12, 2018 at 09:17:58AM +0200, Christoph Hellwig wrote:
> On Thu, Jul 12, 2018 at 09:36:49AM +0800, Ming Lei wrote:
> > > As far as I can tell only ufs and libata support runtime PM, so
> > > we should probably only enable it for those.
> > 
> > For legacy path, blk_pm_runtime_init() is always run for all SCSI devices,
> > and this patch just follows the old way, so that we can keep runtime PM
> > behaviour not changed from user view. That means if we want to only enable
> > for ufs & libata, it should be in another standalone patch, instead of
> > this one.
> 
> Please add this to the series as a separate patch then.

blk_pm_runtime_init() is called from sd/sr, that means all sd/sr device
may support runtime PM. And the command of START_STOP will be run for
runtime suspend/resume at least even though the host isn't involved.

Thanks,
Ming

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH RFC 3/4] blk-mq: prepare for supporting runtime PM
  2018-07-12  9:58   ` Ming Lei
@ 2018-07-12 12:28     ` Ming Lei
  2018-07-12 14:00       ` Jens Axboe
  0 siblings, 1 reply; 23+ messages in thread
From: Ming Lei @ 2018-07-12 12:28 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Rafael J. Wysocki, Alan Stern, linux-pm,
	Greg Kroah-Hartman, Christoph Hellwig, Bart Van Assche,
	James E.J. Bottomley, Martin K. Petersen, linux-scsi

On Thu, Jul 12, 2018 at 05:58:28PM +0800, Ming Lei wrote:
> On Thu, Jul 12, 2018 at 12:29:05AM +0800, Ming Lei wrote:
> > This patch introduces blk_mq_pm_add_request() which is called after
> > allocating one request. Also blk_mq_pm_put_request() is introduced
> > and called after one request is freed.
> > 
> > For blk-mq, it can be quite expensive to accounting in-flight IOs,
> > so this patch calls pm_runtime_mark_last_busy() simply after each IO
> > is done, instead of doing that only after the last in-flight IO is done.
> > This way is still workable, since the active non-PM IO will be checked
> > in blk_pre_runtime_suspend(), and runtime suspend will be prevented
> > if there is any active non-PM IO.
> > 
> > Also makes blk_post_runtime_resume() to cover blk-mq.
> > 
> > Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
> > Cc: Alan Stern <stern@rowland.harvard.edu>
> > Cc: linux-pm@vger.kernel.org
> > Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > Cc: Christoph Hellwig <hch@lst.de>
> > Cc: Bart Van Assche <bart.vanassche@wdc.com>
> > Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com>
> > Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
> > Cc: linux-scsi@vger.kernel.org
> > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > ---
> >  block/blk-core.c | 12 ++++++++++--
> >  block/blk-mq.c   | 24 ++++++++++++++++++++++++
> >  2 files changed, 34 insertions(+), 2 deletions(-)
> > 
> > diff --git a/block/blk-core.c b/block/blk-core.c
> > index c4b57d8806fe..bf66d561980d 100644
> > --- a/block/blk-core.c
> > +++ b/block/blk-core.c
> > @@ -3804,12 +3804,17 @@ EXPORT_SYMBOL(blk_pm_runtime_init);
> >  int blk_pre_runtime_suspend(struct request_queue *q)
> >  {
> >  	int ret = 0;
> > +	bool active;
> >  
> >  	if (!q->dev)
> >  		return ret;
> >  
> >  	spin_lock_irq(q->queue_lock);
> > -	if (q->nr_pending) {
> > +	if (!q->mq_ops)
> > +		active = !!q->nr_pending;
> > +	else
> > +		active = !blk_mq_pm_queue_idle(q);
> > +	if (active) {
> >  		ret = -EBUSY;
> >  		pm_runtime_mark_last_busy(q->dev);
> >  	} else {
> 
> Looks there is one big issue, one new IO may come just after reading
> 'active' and before writing RPM_SUSPENDING to q->rpm_status, and both
> the suspending and the new IO may be in-progress at the same time.

One idea I thought of is to use seqlock to sync changing & reading q->rpm_status,
and looks read lock(read_seqcount_begin/read_seqcount_retry) shouldn't introduce
big cost in fast path.

Thanks,
Ming

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH RFC 3/4] blk-mq: prepare for supporting runtime PM
  2018-07-12 12:28     ` Ming Lei
@ 2018-07-12 14:00       ` Jens Axboe
  2018-07-12 21:32         ` Ming Lei
  0 siblings, 1 reply; 23+ messages in thread
From: Jens Axboe @ 2018-07-12 14:00 UTC (permalink / raw)
  To: Ming Lei
  Cc: linux-block, Rafael J. Wysocki, Alan Stern, linux-pm,
	Greg Kroah-Hartman, Christoph Hellwig, Bart Van Assche,
	James E.J. Bottomley, Martin K. Petersen, linux-scsi

On 7/12/18 6:28 AM, Ming Lei wrote:
> On Thu, Jul 12, 2018 at 05:58:28PM +0800, Ming Lei wrote:
>> On Thu, Jul 12, 2018 at 12:29:05AM +0800, Ming Lei wrote:
>>> This patch introduces blk_mq_pm_add_request() which is called after
>>> allocating one request. Also blk_mq_pm_put_request() is introduced
>>> and called after one request is freed.
>>>
>>> For blk-mq, it can be quite expensive to accounting in-flight IOs,
>>> so this patch calls pm_runtime_mark_last_busy() simply after each IO
>>> is done, instead of doing that only after the last in-flight IO is done.
>>> This way is still workable, since the active non-PM IO will be checked
>>> in blk_pre_runtime_suspend(), and runtime suspend will be prevented
>>> if there is any active non-PM IO.
>>>
>>> Also makes blk_post_runtime_resume() to cover blk-mq.
>>>
>>> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
>>> Cc: Alan Stern <stern@rowland.harvard.edu>
>>> Cc: linux-pm@vger.kernel.org
>>> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>>> Cc: Christoph Hellwig <hch@lst.de>
>>> Cc: Bart Van Assche <bart.vanassche@wdc.com>
>>> Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com>
>>> Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
>>> Cc: linux-scsi@vger.kernel.org
>>> Signed-off-by: Ming Lei <ming.lei@redhat.com>
>>> ---
>>>  block/blk-core.c | 12 ++++++++++--
>>>  block/blk-mq.c   | 24 ++++++++++++++++++++++++
>>>  2 files changed, 34 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/block/blk-core.c b/block/blk-core.c
>>> index c4b57d8806fe..bf66d561980d 100644
>>> --- a/block/blk-core.c
>>> +++ b/block/blk-core.c
>>> @@ -3804,12 +3804,17 @@ EXPORT_SYMBOL(blk_pm_runtime_init);
>>>  int blk_pre_runtime_suspend(struct request_queue *q)
>>>  {
>>>  	int ret = 0;
>>> +	bool active;
>>>  
>>>  	if (!q->dev)
>>>  		return ret;
>>>  
>>>  	spin_lock_irq(q->queue_lock);
>>> -	if (q->nr_pending) {
>>> +	if (!q->mq_ops)
>>> +		active = !!q->nr_pending;
>>> +	else
>>> +		active = !blk_mq_pm_queue_idle(q);
>>> +	if (active) {
>>>  		ret = -EBUSY;
>>>  		pm_runtime_mark_last_busy(q->dev);
>>>  	} else {
>>
>> Looks there is one big issue, one new IO may come just after reading
>> 'active' and before writing RPM_SUSPENDING to q->rpm_status, and both
>> the suspending and the new IO may be in-progress at the same time.
> 
> One idea I thought of is to use seqlock to sync changing & reading q->rpm_status,
> and looks read lock(read_seqcount_begin/read_seqcount_retry) shouldn't introduce
> big cost in fast path.

Let's please keep in mind that this is runtime pm stuff. Better to
make the rules relaxed around it, instead of adding synchronization.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH RFC 3/4] blk-mq: prepare for supporting runtime PM
  2018-07-12 14:00       ` Jens Axboe
@ 2018-07-12 21:32         ` Ming Lei
  2018-07-12 21:44           ` Jens Axboe
  0 siblings, 1 reply; 23+ messages in thread
From: Ming Lei @ 2018-07-12 21:32 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Ming Lei, linux-block, Rafael J. Wysocki, Alan Stern,
	Linux PM List, Greg Kroah-Hartman, Christoph Hellwig,
	Bart Van Assche, James E.J. Bottomley, Martin K. Petersen,
	Linux SCSI List

On Thu, Jul 12, 2018 at 10:00 PM, Jens Axboe <axboe@kernel.dk> wrote:
> On 7/12/18 6:28 AM, Ming Lei wrote:
>> On Thu, Jul 12, 2018 at 05:58:28PM +0800, Ming Lei wrote:
>>> On Thu, Jul 12, 2018 at 12:29:05AM +0800, Ming Lei wrote:
>>>> This patch introduces blk_mq_pm_add_request() which is called after
>>>> allocating one request. Also blk_mq_pm_put_request() is introduced
>>>> and called after one request is freed.
>>>>
>>>> For blk-mq, it can be quite expensive to accounting in-flight IOs,
>>>> so this patch calls pm_runtime_mark_last_busy() simply after each IO
>>>> is done, instead of doing that only after the last in-flight IO is done.
>>>> This way is still workable, since the active non-PM IO will be checked
>>>> in blk_pre_runtime_suspend(), and runtime suspend will be prevented
>>>> if there is any active non-PM IO.
>>>>
>>>> Also makes blk_post_runtime_resume() to cover blk-mq.
>>>>
>>>> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
>>>> Cc: Alan Stern <stern@rowland.harvard.edu>
>>>> Cc: linux-pm@vger.kernel.org
>>>> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>>>> Cc: Christoph Hellwig <hch@lst.de>
>>>> Cc: Bart Van Assche <bart.vanassche@wdc.com>
>>>> Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com>
>>>> Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
>>>> Cc: linux-scsi@vger.kernel.org
>>>> Signed-off-by: Ming Lei <ming.lei@redhat.com>
>>>> ---
>>>>  block/blk-core.c | 12 ++++++++++--
>>>>  block/blk-mq.c   | 24 ++++++++++++++++++++++++
>>>>  2 files changed, 34 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/block/blk-core.c b/block/blk-core.c
>>>> index c4b57d8806fe..bf66d561980d 100644
>>>> --- a/block/blk-core.c
>>>> +++ b/block/blk-core.c
>>>> @@ -3804,12 +3804,17 @@ EXPORT_SYMBOL(blk_pm_runtime_init);
>>>>  int blk_pre_runtime_suspend(struct request_queue *q)
>>>>  {
>>>>     int ret = 0;
>>>> +   bool active;
>>>>
>>>>     if (!q->dev)
>>>>             return ret;
>>>>
>>>>     spin_lock_irq(q->queue_lock);
>>>> -   if (q->nr_pending) {
>>>> +   if (!q->mq_ops)
>>>> +           active = !!q->nr_pending;
>>>> +   else
>>>> +           active = !blk_mq_pm_queue_idle(q);
>>>> +   if (active) {
>>>>             ret = -EBUSY;
>>>>             pm_runtime_mark_last_busy(q->dev);
>>>>     } else {
>>>
>>> Looks there is one big issue, one new IO may come just after reading
>>> 'active' and before writing RPM_SUSPENDING to q->rpm_status, and both
>>> the suspending and the new IO may be in-progress at the same time.
>>
>> One idea I thought of is to use seqlock to sync changing & reading q->rpm_status,
>> and looks read lock(read_seqcount_begin/read_seqcount_retry) shouldn't introduce
>> big cost in fast path.
>
> Let's please keep in mind that this is runtime pm stuff. Better to
> make the rules relaxed around it, instead of adding synchronization.

But the race has to be avoided, otherwise IO may be failed. I don't
find any simple solution yet for avoiding the race without adding sync.

Any idea for avoiding the race without using sync like seqlock or others?


Thanks,
Ming Lei

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH RFC 3/4] blk-mq: prepare for supporting runtime PM
  2018-07-12 21:32         ` Ming Lei
@ 2018-07-12 21:44           ` Jens Axboe
  2018-07-12 23:15             ` Ming Lei
  0 siblings, 1 reply; 23+ messages in thread
From: Jens Axboe @ 2018-07-12 21:44 UTC (permalink / raw)
  To: Ming Lei
  Cc: Ming Lei, linux-block, Rafael J. Wysocki, Alan Stern,
	Linux PM List, Greg Kroah-Hartman, Christoph Hellwig,
	Bart Van Assche, James E.J. Bottomley, Martin K. Petersen,
	Linux SCSI List

On 7/12/18 3:32 PM, Ming Lei wrote:
> On Thu, Jul 12, 2018 at 10:00 PM, Jens Axboe <axboe@kernel.dk> wrote:
>> On 7/12/18 6:28 AM, Ming Lei wrote:
>>> On Thu, Jul 12, 2018 at 05:58:28PM +0800, Ming Lei wrote:
>>>> On Thu, Jul 12, 2018 at 12:29:05AM +0800, Ming Lei wrote:
>>>>> This patch introduces blk_mq_pm_add_request() which is called after
>>>>> allocating one request. Also blk_mq_pm_put_request() is introduced
>>>>> and called after one request is freed.
>>>>>
>>>>> For blk-mq, it can be quite expensive to accounting in-flight IOs,
>>>>> so this patch calls pm_runtime_mark_last_busy() simply after each IO
>>>>> is done, instead of doing that only after the last in-flight IO is done.
>>>>> This way is still workable, since the active non-PM IO will be checked
>>>>> in blk_pre_runtime_suspend(), and runtime suspend will be prevented
>>>>> if there is any active non-PM IO.
>>>>>
>>>>> Also makes blk_post_runtime_resume() to cover blk-mq.
>>>>>
>>>>> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
>>>>> Cc: Alan Stern <stern@rowland.harvard.edu>
>>>>> Cc: linux-pm@vger.kernel.org
>>>>> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>>>>> Cc: Christoph Hellwig <hch@lst.de>
>>>>> Cc: Bart Van Assche <bart.vanassche@wdc.com>
>>>>> Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com>
>>>>> Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
>>>>> Cc: linux-scsi@vger.kernel.org
>>>>> Signed-off-by: Ming Lei <ming.lei@redhat.com>
>>>>> ---
>>>>>  block/blk-core.c | 12 ++++++++++--
>>>>>  block/blk-mq.c   | 24 ++++++++++++++++++++++++
>>>>>  2 files changed, 34 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/block/blk-core.c b/block/blk-core.c
>>>>> index c4b57d8806fe..bf66d561980d 100644
>>>>> --- a/block/blk-core.c
>>>>> +++ b/block/blk-core.c
>>>>> @@ -3804,12 +3804,17 @@ EXPORT_SYMBOL(blk_pm_runtime_init);
>>>>>  int blk_pre_runtime_suspend(struct request_queue *q)
>>>>>  {
>>>>>     int ret = 0;
>>>>> +   bool active;
>>>>>
>>>>>     if (!q->dev)
>>>>>             return ret;
>>>>>
>>>>>     spin_lock_irq(q->queue_lock);
>>>>> -   if (q->nr_pending) {
>>>>> +   if (!q->mq_ops)
>>>>> +           active = !!q->nr_pending;
>>>>> +   else
>>>>> +           active = !blk_mq_pm_queue_idle(q);
>>>>> +   if (active) {
>>>>>             ret = -EBUSY;
>>>>>             pm_runtime_mark_last_busy(q->dev);
>>>>>     } else {
>>>>
>>>> Looks there is one big issue, one new IO may come just after reading
>>>> 'active' and before writing RPM_SUSPENDING to q->rpm_status, and both
>>>> the suspending and the new IO may be in-progress at the same time.
>>>
>>> One idea I thought of is to use seqlock to sync changing & reading q->rpm_status,
>>> and looks read lock(read_seqcount_begin/read_seqcount_retry) shouldn't introduce
>>> big cost in fast path.
>>
>> Let's please keep in mind that this is runtime pm stuff. Better to
>> make the rules relaxed around it, instead of adding synchronization.
> 
> But the race has to be avoided, otherwise IO may be failed. I don't
> find any simple solution yet for avoiding the race without adding sync.
> 
> Any idea for avoiding the race without using sync like seqlock or others?

I just don't want anything like this in the hot path. Why can't we
handle this similarly to how we handle request timeouts? It'll
potentially delay the suspend by a few seconds, but surely that can't be
a big deal. I don't see why we need to track this on a per-request
basis.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH RFC 3/4] blk-mq: prepare for supporting runtime PM
  2018-07-12 21:44           ` Jens Axboe
@ 2018-07-12 23:15             ` Ming Lei
  2018-07-13 14:20               ` Jens Axboe
  0 siblings, 1 reply; 23+ messages in thread
From: Ming Lei @ 2018-07-12 23:15 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Ming Lei, linux-block, Rafael J. Wysocki, Alan Stern,
	Linux PM List, Greg Kroah-Hartman, Christoph Hellwig,
	Bart Van Assche, James E.J. Bottomley, Martin K. Petersen,
	Linux SCSI List

On Thu, Jul 12, 2018 at 03:44:05PM -0600, Jens Axboe wrote:
> On 7/12/18 3:32 PM, Ming Lei wrote:
> > On Thu, Jul 12, 2018 at 10:00 PM, Jens Axboe <axboe@kernel.dk> wrote:
> >> On 7/12/18 6:28 AM, Ming Lei wrote:
> >>> On Thu, Jul 12, 2018 at 05:58:28PM +0800, Ming Lei wrote:
> >>>> On Thu, Jul 12, 2018 at 12:29:05AM +0800, Ming Lei wrote:
> >>>>> This patch introduces blk_mq_pm_add_request() which is called after
> >>>>> allocating one request. Also blk_mq_pm_put_request() is introduced
> >>>>> and called after one request is freed.
> >>>>>
> >>>>> For blk-mq, it can be quite expensive to accounting in-flight IOs,
> >>>>> so this patch calls pm_runtime_mark_last_busy() simply after each IO
> >>>>> is done, instead of doing that only after the last in-flight IO is done.
> >>>>> This way is still workable, since the active non-PM IO will be checked
> >>>>> in blk_pre_runtime_suspend(), and runtime suspend will be prevented
> >>>>> if there is any active non-PM IO.
> >>>>>
> >>>>> Also makes blk_post_runtime_resume() to cover blk-mq.
> >>>>>
> >>>>> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
> >>>>> Cc: Alan Stern <stern@rowland.harvard.edu>
> >>>>> Cc: linux-pm@vger.kernel.org
> >>>>> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> >>>>> Cc: Christoph Hellwig <hch@lst.de>
> >>>>> Cc: Bart Van Assche <bart.vanassche@wdc.com>
> >>>>> Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com>
> >>>>> Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
> >>>>> Cc: linux-scsi@vger.kernel.org
> >>>>> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> >>>>> ---
> >>>>>  block/blk-core.c | 12 ++++++++++--
> >>>>>  block/blk-mq.c   | 24 ++++++++++++++++++++++++
> >>>>>  2 files changed, 34 insertions(+), 2 deletions(-)
> >>>>>
> >>>>> diff --git a/block/blk-core.c b/block/blk-core.c
> >>>>> index c4b57d8806fe..bf66d561980d 100644
> >>>>> --- a/block/blk-core.c
> >>>>> +++ b/block/blk-core.c
> >>>>> @@ -3804,12 +3804,17 @@ EXPORT_SYMBOL(blk_pm_runtime_init);
> >>>>>  int blk_pre_runtime_suspend(struct request_queue *q)
> >>>>>  {
> >>>>>     int ret = 0;
> >>>>> +   bool active;
> >>>>>
> >>>>>     if (!q->dev)
> >>>>>             return ret;
> >>>>>
> >>>>>     spin_lock_irq(q->queue_lock);
> >>>>> -   if (q->nr_pending) {
> >>>>> +   if (!q->mq_ops)
> >>>>> +           active = !!q->nr_pending;
> >>>>> +   else
> >>>>> +           active = !blk_mq_pm_queue_idle(q);
> >>>>> +   if (active) {
> >>>>>             ret = -EBUSY;
> >>>>>             pm_runtime_mark_last_busy(q->dev);
> >>>>>     } else {
> >>>>
> >>>> Looks there is one big issue, one new IO may come just after reading
> >>>> 'active' and before writing RPM_SUSPENDING to q->rpm_status, and both
> >>>> the suspending and the new IO may be in-progress at the same time.
> >>>
> >>> One idea I thought of is to use seqlock to sync changing & reading q->rpm_status,
> >>> and looks read lock(read_seqcount_begin/read_seqcount_retry) shouldn't introduce
> >>> big cost in fast path.
> >>
> >> Let's please keep in mind that this is runtime pm stuff. Better to
> >> make the rules relaxed around it, instead of adding synchronization.
> > 
> > But the race has to be avoided, otherwise IO may be failed. I don't
> > find any simple solution yet for avoiding the race without adding sync.
> > 
> > Any idea for avoiding the race without using sync like seqlock or others?
> 
> I just don't want anything like this in the hot path. Why can't we
> handle this similarly to how we handle request timeouts? It'll
> potentially delay the suspend by a few seconds, but surely that can't be
> a big deal. I don't see why we need to track this on a per-request
> basis.

For legacy path, there is the queue lock, so no the race mentioned.

I guess you mean why we can't use RCU style to deal with this issue, so
we don't introduce cost in fast path, but the problem is that IO has
to be submitted to one active hardware, that is one invariant of runtime
PM.

So RCU/SRCU won't fix this issue because the rcu_read_lock sync nothing,
and we have to make sure that hardware is ready before dispatching IO to
hardware/driver. That is why I think sort of sync is required in IO path.

Thanks,
Ming

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH RFC 3/4] blk-mq: prepare for supporting runtime PM
  2018-07-12 23:15             ` Ming Lei
@ 2018-07-13 14:20               ` Jens Axboe
  2018-07-13 20:27                 ` Alan Stern
  0 siblings, 1 reply; 23+ messages in thread
From: Jens Axboe @ 2018-07-13 14:20 UTC (permalink / raw)
  To: Ming Lei
  Cc: Ming Lei, linux-block, Rafael J. Wysocki, Alan Stern,
	Linux PM List, Greg Kroah-Hartman, Christoph Hellwig,
	Bart Van Assche, James E.J. Bottomley, Martin K. Petersen,
	Linux SCSI List

On 7/12/18 5:15 PM, Ming Lei wrote:
> On Thu, Jul 12, 2018 at 03:44:05PM -0600, Jens Axboe wrote:
>> On 7/12/18 3:32 PM, Ming Lei wrote:
>>> On Thu, Jul 12, 2018 at 10:00 PM, Jens Axboe <axboe@kernel.dk> wrote:
>>>> On 7/12/18 6:28 AM, Ming Lei wrote:
>>>>> On Thu, Jul 12, 2018 at 05:58:28PM +0800, Ming Lei wrote:
>>>>>> On Thu, Jul 12, 2018 at 12:29:05AM +0800, Ming Lei wrote:
>>>>>>> This patch introduces blk_mq_pm_add_request() which is called after
>>>>>>> allocating one request. Also blk_mq_pm_put_request() is introduced
>>>>>>> and called after one request is freed.
>>>>>>>
>>>>>>> For blk-mq, it can be quite expensive to accounting in-flight IOs,
>>>>>>> so this patch calls pm_runtime_mark_last_busy() simply after each IO
>>>>>>> is done, instead of doing that only after the last in-flight IO is done.
>>>>>>> This way is still workable, since the active non-PM IO will be checked
>>>>>>> in blk_pre_runtime_suspend(), and runtime suspend will be prevented
>>>>>>> if there is any active non-PM IO.
>>>>>>>
>>>>>>> Also makes blk_post_runtime_resume() to cover blk-mq.
>>>>>>>
>>>>>>> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
>>>>>>> Cc: Alan Stern <stern@rowland.harvard.edu>
>>>>>>> Cc: linux-pm@vger.kernel.org
>>>>>>> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>>>>>>> Cc: Christoph Hellwig <hch@lst.de>
>>>>>>> Cc: Bart Van Assche <bart.vanassche@wdc.com>
>>>>>>> Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com>
>>>>>>> Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
>>>>>>> Cc: linux-scsi@vger.kernel.org
>>>>>>> Signed-off-by: Ming Lei <ming.lei@redhat.com>
>>>>>>> ---
>>>>>>>  block/blk-core.c | 12 ++++++++++--
>>>>>>>  block/blk-mq.c   | 24 ++++++++++++++++++++++++
>>>>>>>  2 files changed, 34 insertions(+), 2 deletions(-)
>>>>>>>
>>>>>>> diff --git a/block/blk-core.c b/block/blk-core.c
>>>>>>> index c4b57d8806fe..bf66d561980d 100644
>>>>>>> --- a/block/blk-core.c
>>>>>>> +++ b/block/blk-core.c
>>>>>>> @@ -3804,12 +3804,17 @@ EXPORT_SYMBOL(blk_pm_runtime_init);
>>>>>>>  int blk_pre_runtime_suspend(struct request_queue *q)
>>>>>>>  {
>>>>>>>     int ret = 0;
>>>>>>> +   bool active;
>>>>>>>
>>>>>>>     if (!q->dev)
>>>>>>>             return ret;
>>>>>>>
>>>>>>>     spin_lock_irq(q->queue_lock);
>>>>>>> -   if (q->nr_pending) {
>>>>>>> +   if (!q->mq_ops)
>>>>>>> +           active = !!q->nr_pending;
>>>>>>> +   else
>>>>>>> +           active = !blk_mq_pm_queue_idle(q);
>>>>>>> +   if (active) {
>>>>>>>             ret = -EBUSY;
>>>>>>>             pm_runtime_mark_last_busy(q->dev);
>>>>>>>     } else {
>>>>>>
>>>>>> Looks there is one big issue, one new IO may come just after reading
>>>>>> 'active' and before writing RPM_SUSPENDING to q->rpm_status, and both
>>>>>> the suspending and the new IO may be in-progress at the same time.
>>>>>
>>>>> One idea I thought of is to use seqlock to sync changing & reading q->rpm_status,
>>>>> and looks read lock(read_seqcount_begin/read_seqcount_retry) shouldn't introduce
>>>>> big cost in fast path.
>>>>
>>>> Let's please keep in mind that this is runtime pm stuff. Better to
>>>> make the rules relaxed around it, instead of adding synchronization.
>>>
>>> But the race has to be avoided, otherwise IO may be failed. I don't
>>> find any simple solution yet for avoiding the race without adding sync.
>>>
>>> Any idea for avoiding the race without using sync like seqlock or others?
>>
>> I just don't want anything like this in the hot path. Why can't we
>> handle this similarly to how we handle request timeouts? It'll
>> potentially delay the suspend by a few seconds, but surely that can't be
>> a big deal. I don't see why we need to track this on a per-request
>> basis.
> 
> For legacy path, there is the queue lock, so no the race mentioned.
> 
> I guess you mean why we can't use RCU style to deal with this issue, so
> we don't introduce cost in fast path, but the problem is that IO has
> to be submitted to one active hardware, that is one invariant of runtime
> PM.
> 
> So RCU/SRCU won't fix this issue because the rcu_read_lock sync nothing,
> and we have to make sure that hardware is ready before dispatching IO to
> hardware/driver. That is why I think sort of sync is required in IO path.

That's not what I meant at all. As I wrote above, I don't want it in the
hot path at all, and certainly not as a per-request thing. We already
have a timer on blk-mq that runs while requests are pending, by
definition the last time that timer triggers, the device is idle. If you
need to do heavier lifting to ensure we only runtime suspend at that
point, then THAT'S the place to do it, not adding extra code per
request. I don't care how cheap the perceived locking is, it's still
extra code and checks for each and every request. That is what I am
objecting to.

Who even uses the runtime pm stuff?

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH RFC 3/4] blk-mq: prepare for supporting runtime PM
  2018-07-13 14:20               ` Jens Axboe
@ 2018-07-13 20:27                 ` Alan Stern
  2018-07-13 20:39                   ` Jens Axboe
  0 siblings, 1 reply; 23+ messages in thread
From: Alan Stern @ 2018-07-13 20:27 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Ming Lei, Ming Lei, linux-block, Rafael J. Wysocki,
	Linux PM List, Greg Kroah-Hartman, Christoph Hellwig,
	Bart Van Assche, James E.J. Bottomley, Martin K. Petersen,
	Linux SCSI List

On Fri, 13 Jul 2018, Jens Axboe wrote:

> >>> Any idea for avoiding the race without using sync like seqlock or others?
> >>
> >> I just don't want anything like this in the hot path. Why can't we
> >> handle this similarly to how we handle request timeouts? It'll
> >> potentially delay the suspend by a few seconds, but surely that can't be
> >> a big deal. I don't see why we need to track this on a per-request
> >> basis.
> > 
> > For legacy path, there is the queue lock, so no the race mentioned.
> > 
> > I guess you mean why we can't use RCU style to deal with this issue, so
> > we don't introduce cost in fast path, but the problem is that IO has
> > to be submitted to one active hardware, that is one invariant of runtime
> > PM.
> > 
> > So RCU/SRCU won't fix this issue because the rcu_read_lock sync nothing,
> > and we have to make sure that hardware is ready before dispatching IO to
> > hardware/driver. That is why I think sort of sync is required in IO path.
> 
> That's not what I meant at all. As I wrote above, I don't want it in the
> hot path at all, and certainly not as a per-request thing. We already
> have a timer on blk-mq that runs while requests are pending, by
> definition the last time that timer triggers, the device is idle. If you
> need to do heavier lifting to ensure we only runtime suspend at that
> point, then THAT'S the place to do it, not adding extra code per
> request. I don't care how cheap the perceived locking is, it's still
> extra code and checks for each and every request. That is what I am
> objecting to.

The problem occurs on the opposite side: when a new request is added,
we don't want it to race with a just-started suspend transition.  Can
you suggest a way to prevent this without adding any overhead to the
hot path?

For that matter, we also have the issue of checking whether the device
is already suspended when a request is added; in that case we have to
resume the device before issuing the request.  I'm not aware of any way
to avoid performing this check in the hot path.

Is there already some synchronization in place for plugging or stopping 
a request queue?  If there is, could the runtime-PM code make use of 
it?  We might need to add a state in which a queue is blocked for 
normal requests but allows PM-related request to run.

Alan Stern

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH RFC 3/4] blk-mq: prepare for supporting runtime PM
  2018-07-13 20:27                 ` Alan Stern
@ 2018-07-13 20:39                   ` Jens Axboe
  2018-07-13 22:47                     ` Ming Lei
  0 siblings, 1 reply; 23+ messages in thread
From: Jens Axboe @ 2018-07-13 20:39 UTC (permalink / raw)
  To: Alan Stern
  Cc: Ming Lei, Ming Lei, linux-block, Rafael J. Wysocki,
	Linux PM List, Greg Kroah-Hartman, Christoph Hellwig,
	Bart Van Assche, James E.J. Bottomley, Martin K. Petersen,
	Linux SCSI List

On 7/13/18 2:27 PM, Alan Stern wrote:
> On Fri, 13 Jul 2018, Jens Axboe wrote:
> 
>>>>> Any idea for avoiding the race without using sync like seqlock or others?
>>>>
>>>> I just don't want anything like this in the hot path. Why can't we
>>>> handle this similarly to how we handle request timeouts? It'll
>>>> potentially delay the suspend by a few seconds, but surely that can't be
>>>> a big deal. I don't see why we need to track this on a per-request
>>>> basis.
>>>
>>> For legacy path, there is the queue lock, so no the race mentioned.
>>>
>>> I guess you mean why we can't use RCU style to deal with this issue, so
>>> we don't introduce cost in fast path, but the problem is that IO has
>>> to be submitted to one active hardware, that is one invariant of runtime
>>> PM.
>>>
>>> So RCU/SRCU won't fix this issue because the rcu_read_lock sync nothing,
>>> and we have to make sure that hardware is ready before dispatching IO to
>>> hardware/driver. That is why I think sort of sync is required in IO path.
>>
>> That's not what I meant at all. As I wrote above, I don't want it in the
>> hot path at all, and certainly not as a per-request thing. We already
>> have a timer on blk-mq that runs while requests are pending, by
>> definition the last time that timer triggers, the device is idle. If you
>> need to do heavier lifting to ensure we only runtime suspend at that
>> point, then THAT'S the place to do it, not adding extra code per
>> request. I don't care how cheap the perceived locking is, it's still
>> extra code and checks for each and every request. That is what I am
>> objecting to.
> 
> The problem occurs on the opposite side: when a new request is added,
> we don't want it to race with a just-started suspend transition.  Can
> you suggest a way to prevent this without adding any overhead to the
> hot path?
> 
> For that matter, we also have the issue of checking whether the device
> is already suspended when a request is added; in that case we have to
> resume the device before issuing the request.  I'm not aware of any way
> to avoid performing this check in the hot path.

The issue is on both sides, of course. The problem, to me, appears to be
attempting to retrofit the old pre-suspend checking to blk-mq, where it
could be done a lot more optimally by having the suspend side be driven
by the timeout timer, and resume could be driven by first request
entering on an idle queue.

Doing a per-request inc/dec type of tracking with synchronization is the
easy approach/lazy approach, but it's also woefully inefficient. Any
sort of per-queue tracking for blk-mq is not a great idea.

> Is there already some synchronization in place for plugging or stopping 
> a request queue?  If there is, could the runtime-PM code make use of 
> it?  We might need to add a state in which a queue is blocked for 
> normal requests but allows PM-related request to run.

Sure, blk-mq has a plethora of helpers for that, since we use it in
other places as well. And it might not be a bad idea to extend that to
cover this case as well. See blk_queue_enter() and the queue freeze and
queuescing. This is already per-request overhead we're paying.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH RFC 3/4] blk-mq: prepare for supporting runtime PM
  2018-07-13 20:39                   ` Jens Axboe
@ 2018-07-13 22:47                     ` Ming Lei
  0 siblings, 0 replies; 23+ messages in thread
From: Ming Lei @ 2018-07-13 22:47 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Alan Stern, Ming Lei, linux-block, Rafael J. Wysocki,
	Linux PM List, Greg Kroah-Hartman, Christoph Hellwig,
	Bart Van Assche, James E.J. Bottomley, Martin K. Petersen,
	Linux SCSI List

On Fri, Jul 13, 2018 at 02:39:15PM -0600, Jens Axboe wrote:
> On 7/13/18 2:27 PM, Alan Stern wrote:
> > On Fri, 13 Jul 2018, Jens Axboe wrote:
> > 
> >>>>> Any idea for avoiding the race without using sync like seqlock or others?
> >>>>
> >>>> I just don't want anything like this in the hot path. Why can't we
> >>>> handle this similarly to how we handle request timeouts? It'll
> >>>> potentially delay the suspend by a few seconds, but surely that can't be
> >>>> a big deal. I don't see why we need to track this on a per-request
> >>>> basis.
> >>>
> >>> For legacy path, there is the queue lock, so no the race mentioned.
> >>>
> >>> I guess you mean why we can't use RCU style to deal with this issue, so
> >>> we don't introduce cost in fast path, but the problem is that IO has
> >>> to be submitted to one active hardware, that is one invariant of runtime
> >>> PM.
> >>>
> >>> So RCU/SRCU won't fix this issue because the rcu_read_lock sync nothing,
> >>> and we have to make sure that hardware is ready before dispatching IO to
> >>> hardware/driver. That is why I think sort of sync is required in IO path.
> >>
> >> That's not what I meant at all. As I wrote above, I don't want it in the
> >> hot path at all, and certainly not as a per-request thing. We already
> >> have a timer on blk-mq that runs while requests are pending, by
> >> definition the last time that timer triggers, the device is idle. If you
> >> need to do heavier lifting to ensure we only runtime suspend at that
> >> point, then THAT'S the place to do it, not adding extra code per
> >> request. I don't care how cheap the perceived locking is, it's still
> >> extra code and checks for each and every request. That is what I am
> >> objecting to.
> > 
> > The problem occurs on the opposite side: when a new request is added,
> > we don't want it to race with a just-started suspend transition.  Can
> > you suggest a way to prevent this without adding any overhead to the
> > hot path?
> > 
> > For that matter, we also have the issue of checking whether the device
> > is already suspended when a request is added; in that case we have to
> > resume the device before issuing the request.  I'm not aware of any way
> > to avoid performing this check in the hot path.
> 
> The issue is on both sides, of course. The problem, to me, appears to be
> attempting to retrofit the old pre-suspend checking to blk-mq, where it
> could be done a lot more optimally by having the suspend side be driven
> by the timeout timer, and

Timeout timer won't be one accepted way from user view since the default
timeout is 30s, but autosuspend delay may be just several seconds, but
that isn't the current problem.

Now the problem isn't in suspend side, I did think about this way.
The issue is that we could enter suspending when queue isn't idle, like
there are only RQF_PM requests pended.

> resume could be driven by first request entering on an idle queue.

The 1st request may be RQF_PM request.

> 
> Doing a per-request inc/dec type of tracking with synchronization is the
> easy approach/lazy approach, but it's also woefully inefficient. Any
> sort of per-queue tracking for blk-mq is not a great idea.
> 
> > Is there already some synchronization in place for plugging or stopping 
> > a request queue?  If there is, could the runtime-PM code make use of 
> > it?  We might need to add a state in which a queue is blocked for 
> > normal requests but allows PM-related request to run.
> 
> Sure, blk-mq has a plethora of helpers for that, since we use it in
> other places as well. And it might not be a bad idea to extend that to
> cover this case as well. See blk_queue_enter() and the queue freeze and
> queuescing. This is already per-request overhead we're paying.

Before entering runtime suspend, there can't be any normal in-flight IO, so
quiesce won't work since quiesce only prevents new dispatching and doesn't
drain queue.

Freeze might work, but blk_queue_enter() will becomes a bit complicated:

1) queue freeze can be done before runtime suspending

2) when any new IO comes, the queue has to be unfrozen.

3) when any RQF_PM request comes, the queue has to be kept in freezing,
and allows this request to be crossed.

I will think about this approach further and see if it can be done in
easy way.


Thanks,
Ming

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2018-07-13 22:47 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-07-11 16:29 [PATCH RFC 0/4] blk-mq: support runtime PM Ming Lei
2018-07-11 16:29 ` [PATCH RFC 1/4] blk-mq: introduce blk_mq_support_runtime_pm() Ming Lei
2018-07-11 17:10   ` Christoph Hellwig
2018-07-11 16:29 ` [PATCH RFC 2/4] blk-mq: introduce blk_mq_pm_queue_idle() Ming Lei
2018-07-11 17:11   ` Christoph Hellwig
2018-07-11 16:29 ` [PATCH RFC 3/4] blk-mq: prepare for supporting runtime PM Ming Lei
2018-07-11 17:19   ` Christoph Hellwig
2018-07-12  9:58   ` Ming Lei
2018-07-12 12:28     ` Ming Lei
2018-07-12 14:00       ` Jens Axboe
2018-07-12 21:32         ` Ming Lei
2018-07-12 21:44           ` Jens Axboe
2018-07-12 23:15             ` Ming Lei
2018-07-13 14:20               ` Jens Axboe
2018-07-13 20:27                 ` Alan Stern
2018-07-13 20:39                   ` Jens Axboe
2018-07-13 22:47                     ` Ming Lei
2018-07-11 16:29 ` [PATCH RFC 4/4] scsi_mq: enable " Ming Lei
2018-07-11 17:14   ` Christoph Hellwig
2018-07-12  1:36     ` Ming Lei
2018-07-12  7:17       ` Christoph Hellwig
2018-07-12 12:21         ` Ming Lei
2018-07-11 17:28   ` Alan Stern

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.