* [PATCH RFC V2 0/3] blk-mq: support runtime PM
@ 2018-07-13  8:05 Ming Lei
  2018-07-13  8:06 ` [PATCH RFC V2 1/3] block: put runtime PM code into common helpers Ming Lei
                   ` (3 more replies)
  0 siblings, 4 replies; 34+ messages in thread
From: Ming Lei @ 2018-07-13  8:05 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Ming Lei, Rafael J. Wysocki, Alan Stern, linux-pm,
	Greg Kroah-Hartman, Christoph Hellwig, Bart Van Assche,
	Hannes Reinecke, Johannes Thumshirn, Adrian Hunter,
	James E.J. Bottomley, Martin K. Petersen, linux-scsi

Hi Guys,

Runtime PM is usually enabled for SCSI devices, and we have been
switching to SCSI_MQ recently, but runtime PM isn't supported by blk-mq
yet, and people may complain about that.

This patchset tries to support runtime PM for blk-mq. One challenge is
that it can be quite expensive to account for the active in-flight IOs
when figuring out when to mark the device as last busy. This patchset
simply marks the device busy after each non-PM IO completes, and this
approach is workable because:

1) pm_runtime_mark_last_busy() is very cheap

2) in-flight non-PM IO is checked in blk_pre_runtime_suspend(), so
if there is any IO queued, the device will be prevented from being
suspended.

3) Generally speaking, autosuspend_delay_ms is often large, typically
on the order of seconds, so it shouldn't be a big deal to check whether
the queue is idle in blk_pre_runtime_suspend().
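The three points above can be modelled with a small userspace sketch (plain C, not kernel code; every name here is an illustrative stand-in for the real blk-mq/runtime-PM hooks, chosen only to mirror the policy described):

```c
#include <assert.h>
#include <stdbool.h>

/* Toy userspace model of the suspend policy above (not kernel code):
 * marking "last busy" after every completed non-PM IO is just a
 * timestamp store, and the pre-suspend check refuses to suspend while
 * any non-PM IO is still queued.  All names are illustrative. */

struct toy_queue {
	unsigned long last_busy;	/* models pm_runtime_mark_last_busy() */
	unsigned int inflight;		/* non-PM IOs currently in flight */
	bool suspended;
};

static void toy_io_start(struct toy_queue *q)
{
	q->inflight++;
}

static void toy_io_done(struct toy_queue *q, unsigned long now)
{
	q->inflight--;
	q->last_busy = now;	/* O(1) per IO: cheap, per point 1) */
}

/* Models blk_pre_runtime_suspend(): fail while IO is queued, per point 2). */
static int toy_pre_runtime_suspend(struct toy_queue *q, unsigned long now)
{
	if (q->inflight) {
		q->last_busy = now;	/* push the autosuspend deadline out */
		return -16;		/* -EBUSY */
	}
	q->suspended = true;
	return 0;
}
```

With a large autosuspend delay (point 3), the idle check in the pre-suspend hook runs rarely, so its cost is not on the IO fast path.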


V2:
	- re-organize code as suggested by Christoph
	- use seqlock to sync runtime PM and IO path

Ming Lei (3):
  block: put runtime PM code into common helpers
  blk-mq: prepare for supporting runtime PM
  scsi_mq: enable runtime PM

 block/blk-core.c        | 176 +++++++++++++++++++++++++++++++++++++++---------
 block/blk-mq.c          |  71 +++++++++++++++++++
 block/blk-mq.h          |  10 +++
 drivers/scsi/scsi_lib.c |   3 +-
 include/linux/blk-mq.h  |   1 +
 include/linux/blkdev.h  |   1 +
 6 files changed, 228 insertions(+), 34 deletions(-)

Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: linux-pm@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: linux-scsi@vger.kernel.org


-- 
2.9.5

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH RFC V2 1/3] block: put runtime PM code into common helpers
  2018-07-13  8:05 [PATCH RFC V2 0/3] blk-mq: support runtime PM Ming Lei
@ 2018-07-13  8:06 ` Ming Lei
  2018-07-17 13:21   ` Christoph Hellwig
  2018-07-13  8:06 ` [PATCH RFC V2 2/3] blk-mq: prepare for supporting runtime PM Ming Lei
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 34+ messages in thread
From: Ming Lei @ 2018-07-13  8:06 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Ming Lei, Rafael J. Wysocki, Alan Stern, linux-pm,
	Greg Kroah-Hartman, Christoph Hellwig, Bart Van Assche,
	Hannes Reinecke, Johannes Thumshirn, Adrian Hunter,
	James E.J. Bottomley, Martin K. Petersen, linux-scsi

So that the following patch can reuse these helpers for supporting
runtime PM on blk-mq.

No functional change.

Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: linux-pm@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: linux-scsi@vger.kernel.org
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-core.c | 71 ++++++++++++++++++++++++++++++++++++++------------------
 1 file changed, 48 insertions(+), 23 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index c4b57d8806fe..1087a58590f1 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -3746,6 +3746,50 @@ void blk_finish_plug(struct blk_plug *plug)
 EXPORT_SYMBOL(blk_finish_plug);
 
 #ifdef CONFIG_PM
+static int __blk_pre_runtime_suspend(struct request_queue *q, bool active)
+{
+	int ret;
+
+	if (active) {
+		ret = -EBUSY;
+		pm_runtime_mark_last_busy(q->dev);
+	} else {
+		ret = 0;
+		q->rpm_status = RPM_SUSPENDING;
+	}
+
+	return ret;
+}
+
+static void __blk_post_runtime_suspend(struct request_queue *q, int err)
+{
+	if (!err) {
+		q->rpm_status = RPM_SUSPENDED;
+	} else {
+		q->rpm_status = RPM_ACTIVE;
+		pm_runtime_mark_last_busy(q->dev);
+	}
+}
+
+static void __blk_post_runtime_resume(struct request_queue *q, int err)
+{
+	if (!err) {
+		q->rpm_status = RPM_ACTIVE;
+		__blk_run_queue(q);
+		pm_runtime_mark_last_busy(q->dev);
+		pm_request_autosuspend(q->dev);
+	} else {
+		q->rpm_status = RPM_SUSPENDED;
+	}
+}
+
+static void __blk_set_runtime_active(struct request_queue *q)
+{
+	q->rpm_status = RPM_ACTIVE;
+	pm_runtime_mark_last_busy(q->dev);
+	pm_request_autosuspend(q->dev);
+}
+
 /**
  * blk_pm_runtime_init - Block layer runtime PM initialization routine
  * @q: the queue of the device
@@ -3809,12 +3853,7 @@ int blk_pre_runtime_suspend(struct request_queue *q)
 		return ret;
 
 	spin_lock_irq(q->queue_lock);
-	if (q->nr_pending) {
-		ret = -EBUSY;
-		pm_runtime_mark_last_busy(q->dev);
-	} else {
-		q->rpm_status = RPM_SUSPENDING;
-	}
+	ret = __blk_pre_runtime_suspend(q, q->nr_pending);
 	spin_unlock_irq(q->queue_lock);
 	return ret;
 }
@@ -3839,12 +3878,7 @@ void blk_post_runtime_suspend(struct request_queue *q, int err)
 		return;
 
 	spin_lock_irq(q->queue_lock);
-	if (!err) {
-		q->rpm_status = RPM_SUSPENDED;
-	} else {
-		q->rpm_status = RPM_ACTIVE;
-		pm_runtime_mark_last_busy(q->dev);
-	}
+	__blk_post_runtime_suspend(q, err);
 	spin_unlock_irq(q->queue_lock);
 }
 EXPORT_SYMBOL(blk_post_runtime_suspend);
@@ -3891,14 +3925,7 @@ void blk_post_runtime_resume(struct request_queue *q, int err)
 		return;
 
 	spin_lock_irq(q->queue_lock);
-	if (!err) {
-		q->rpm_status = RPM_ACTIVE;
-		__blk_run_queue(q);
-		pm_runtime_mark_last_busy(q->dev);
-		pm_request_autosuspend(q->dev);
-	} else {
-		q->rpm_status = RPM_SUSPENDED;
-	}
+	__blk_post_runtime_resume(q, err);
 	spin_unlock_irq(q->queue_lock);
 }
 EXPORT_SYMBOL(blk_post_runtime_resume);
@@ -3920,9 +3947,7 @@ EXPORT_SYMBOL(blk_post_runtime_resume);
 void blk_set_runtime_active(struct request_queue *q)
 {
 	spin_lock_irq(q->queue_lock);
-	q->rpm_status = RPM_ACTIVE;
-	pm_runtime_mark_last_busy(q->dev);
-	pm_request_autosuspend(q->dev);
+	__blk_set_runtime_active(q);
 	spin_unlock_irq(q->queue_lock);
 }
 EXPORT_SYMBOL(blk_set_runtime_active);
-- 
2.9.5


* [PATCH RFC V2 2/3] blk-mq: prepare for supporting runtime PM
  2018-07-13  8:05 [PATCH RFC V2 0/3] blk-mq: support runtime PM Ming Lei
  2018-07-13  8:06 ` [PATCH RFC V2 1/3] block: put runtime PM code into common helpers Ming Lei
@ 2018-07-13  8:06 ` Ming Lei
  2018-07-13 20:16   ` Alan Stern
  2018-07-17 13:23   ` Christoph Hellwig
  2018-07-13  8:06 ` [PATCH RFC V2 3/3] scsi_mq: enable " Ming Lei
  2018-07-13 14:21 ` [PATCH RFC V2 0/3] blk-mq: support " Jens Axboe
  3 siblings, 2 replies; 34+ messages in thread
From: Ming Lei @ 2018-07-13  8:06 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Ming Lei, Rafael J. Wysocki, Alan Stern, linux-pm,
	Greg Kroah-Hartman, Christoph Hellwig, Bart Van Assche,
	Hannes Reinecke, Johannes Thumshirn, Adrian Hunter,
	James E.J. Bottomley, Martin K. Petersen, linux-scsi

This patch introduces blk_mq_pm_add_request(), which is called after
a request is allocated, and blk_mq_pm_put_request(), which is called
after a request is freed.

For blk-mq, it can be quite expensive to account for in-flight IOs,
so this patch simply calls pm_runtime_mark_last_busy() after each IO
completes, instead of doing so only after the last in-flight IO completes.
This approach is still workable, since any active non-PM IO will be
checked in blk_pre_runtime_suspend(), and runtime suspend will be
prevented if there is any active non-PM IO.

It turns out that synchronization between the runtime PM and IO paths
is required to avoid races, so this patch applies a seqlock for that
purpose. The cost introduced in the fast IO path is thus minimized,
given that seqlocks are commonly used in fast paths, such as when
reading jiffies/the tick, or in d_walk().
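As context for the seqlock choice, the read/retry pattern can be shown with a minimal single-threaded userspace model (illustrative only; this is not the kernel's <linux/seqlock.h> implementation, just the same idea in miniature):

```c
#include <assert.h>

/* Minimal userspace model of a seqcount: a writer makes the counter odd
 * while updating and even again when done; a reader snapshots the
 * counter before and after reading and retries if a write intervened.
 * The reader's fast path is just two counter loads, which is why the
 * cost added to the IO path stays small.  Illustrative names only. */

struct toy_seqlock {
	unsigned seq;		/* odd while a write is in progress */
	int rpm_status;		/* the protected data (models q->rpm_status) */
};

static void toy_write_seqlock(struct toy_seqlock *l)   { l->seq++; }
static void toy_write_sequnlock(struct toy_seqlock *l) { l->seq++; }

static unsigned toy_read_seqbegin(const struct toy_seqlock *l)
{
	return l->seq & ~1u;	/* a real reader would spin while odd */
}

static int toy_read_seqretry(const struct toy_seqlock *l, unsigned start)
{
	return l->seq != start;
}

/* Reader side, mirroring the retry loop in blk_mq_pm_add_request(). */
static int toy_read_status(struct toy_seqlock *l)
{
	unsigned start;
	int status;

	do {
		start = toy_read_seqbegin(l);
		status = l->rpm_status;
	} while (toy_read_seqretry(l, start));

	return status;
}
```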

Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: linux-pm@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: linux-scsi@vger.kernel.org
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-core.c       | 121 +++++++++++++++++++++++++++++++++++++++++--------
 block/blk-mq.c         |  71 +++++++++++++++++++++++++++++
 block/blk-mq.h         |  10 ++++
 include/linux/blk-mq.h |   1 +
 include/linux/blkdev.h |   1 +
 5 files changed, 186 insertions(+), 18 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 1087a58590f1..cd73db90d1e3 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -3775,7 +3775,10 @@ static void __blk_post_runtime_resume(struct request_queue *q, int err)
 {
 	if (!err) {
 		q->rpm_status = RPM_ACTIVE;
-		__blk_run_queue(q);
+		if (!q->mq_ops)
+			__blk_run_queue(q);
+		else
+			blk_mq_run_hw_queues(q, true);
 		pm_runtime_mark_last_busy(q->dev);
 		pm_request_autosuspend(q->dev);
 	} else {
@@ -3790,6 +3793,69 @@ static void __blk_set_runtime_active(struct request_queue *q)
 	pm_request_autosuspend(q->dev);
 }
 
+static bool blk_mq_support_runtime_pm(struct request_queue *q)
+{
+	if (!q->tag_set || !(q->tag_set->flags & BLK_MQ_F_SUPPORT_RPM))
+		return false;
+	return true;
+}
+
+static int blk_mq_pre_runtime_suspend(struct request_queue *q)
+{
+	bool active;
+	int ret = 0;
+
+	if (!blk_mq_support_runtime_pm(q))
+		return ret;
+
+	write_seqlock_irq(&q->rpm_lock);
+	active = blk_mq_pm_queue_idle(q);
+	ret = __blk_pre_runtime_suspend(q, active);
+	write_sequnlock_irq(&q->rpm_lock);
+
+	return ret;
+}
+
+static void blk_mq_post_runtime_suspend(struct request_queue *q, int err)
+{
+	if (!blk_mq_support_runtime_pm(q))
+		return;
+
+	write_seqlock_irq(&q->rpm_lock);
+	__blk_post_runtime_suspend(q, err);
+	write_sequnlock_irq(&q->rpm_lock);
+}
+
+static void blk_mq_pre_runtime_resume(struct request_queue *q)
+{
+	if (!blk_mq_support_runtime_pm(q))
+		return;
+
+	write_seqlock_irq(&q->rpm_lock);
+	q->rpm_status = RPM_RESUMING;
+	write_sequnlock_irq(&q->rpm_lock);
+}
+
+static void blk_mq_post_runtime_resume(struct request_queue *q, int err)
+{
+	if (!blk_mq_support_runtime_pm(q))
+		return;
+
+	write_seqlock_irq(&q->rpm_lock);
+	__blk_post_runtime_resume(q, err);
+	write_sequnlock_irq(&q->rpm_lock);
+}
+
+static void blk_mq_set_runtime_active(struct request_queue *q)
+{
+	if (!blk_mq_support_runtime_pm(q))
+		return;
+
+	write_seqlock_irq(&q->rpm_lock);
+	__blk_set_runtime_active(q);
+	write_sequnlock_irq(&q->rpm_lock);
+}
+
 /**
  * blk_pm_runtime_init - Block layer runtime PM initialization routine
  * @q: the queue of the device
@@ -3813,8 +3879,7 @@ static void __blk_set_runtime_active(struct request_queue *q)
  */
 void blk_pm_runtime_init(struct request_queue *q, struct device *dev)
 {
-	/* not support for RQF_PM and ->rpm_status in blk-mq yet */
-	if (q->mq_ops)
+	if (q->mq_ops && !blk_mq_support_runtime_pm(q))
 		return;
 
 	q->dev = dev;
@@ -3852,9 +3917,13 @@ int blk_pre_runtime_suspend(struct request_queue *q)
 	if (!q->dev)
 		return ret;
 
-	spin_lock_irq(q->queue_lock);
-	ret = __blk_pre_runtime_suspend(q, q->nr_pending);
-	spin_unlock_irq(q->queue_lock);
+	if (q->mq_ops)
+		ret = blk_mq_pre_runtime_suspend(q);
+	else {
+		spin_lock_irq(q->queue_lock);
+		ret = __blk_pre_runtime_suspend(q, q->nr_pending);
+		spin_unlock_irq(q->queue_lock);
+	}
 	return ret;
 }
 EXPORT_SYMBOL(blk_pre_runtime_suspend);
@@ -3877,9 +3946,13 @@ void blk_post_runtime_suspend(struct request_queue *q, int err)
 	if (!q->dev)
 		return;
 
-	spin_lock_irq(q->queue_lock);
-	__blk_post_runtime_suspend(q, err);
-	spin_unlock_irq(q->queue_lock);
+	if (q->mq_ops)
+		blk_mq_post_runtime_suspend(q, err);
+	else {
+		spin_lock_irq(q->queue_lock);
+		__blk_post_runtime_suspend(q, err);
+		spin_unlock_irq(q->queue_lock);
+	}
 }
 EXPORT_SYMBOL(blk_post_runtime_suspend);
 
@@ -3899,9 +3972,13 @@ void blk_pre_runtime_resume(struct request_queue *q)
 	if (!q->dev)
 		return;
 
-	spin_lock_irq(q->queue_lock);
-	q->rpm_status = RPM_RESUMING;
-	spin_unlock_irq(q->queue_lock);
+	if (q->mq_ops)
+		blk_mq_pre_runtime_resume(q);
+	else {
+		spin_lock_irq(q->queue_lock);
+		q->rpm_status = RPM_RESUMING;
+		spin_unlock_irq(q->queue_lock);
+	}
 }
 EXPORT_SYMBOL(blk_pre_runtime_resume);
 
@@ -3924,9 +4001,13 @@ void blk_post_runtime_resume(struct request_queue *q, int err)
 	if (!q->dev)
 		return;
 
-	spin_lock_irq(q->queue_lock);
-	__blk_post_runtime_resume(q, err);
-	spin_unlock_irq(q->queue_lock);
+	if (q->mq_ops)
+		blk_mq_post_runtime_resume(q, err);
+	else {
+		spin_lock_irq(q->queue_lock);
+		__blk_post_runtime_resume(q, err);
+		spin_unlock_irq(q->queue_lock);
+	}
 }
 EXPORT_SYMBOL(blk_post_runtime_resume);
 
@@ -3946,9 +4027,13 @@ EXPORT_SYMBOL(blk_post_runtime_resume);
  */
 void blk_set_runtime_active(struct request_queue *q)
 {
-	spin_lock_irq(q->queue_lock);
-	__blk_set_runtime_active(q);
-	spin_unlock_irq(q->queue_lock);
+	if (q->mq_ops)
+		blk_mq_set_runtime_active(q);
+	else {
+		spin_lock_irq(q->queue_lock);
+		__blk_set_runtime_active(q);
+		spin_unlock_irq(q->queue_lock);
+	}
 }
 EXPORT_SYMBOL(blk_set_runtime_active);
 #endif
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 73a43b81b17d..8eb6ea1a7410 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -25,6 +25,8 @@
 #include <linux/delay.h>
 #include <linux/crash_dump.h>
 #include <linux/prefetch.h>
+#include <linux/pm_runtime.h>
+#include <linux/seqlock.h>
 
 #include <trace/events/block.h>
 
@@ -58,6 +60,66 @@ static int blk_mq_poll_stats_bkt(const struct request *rq)
 	return bucket;
 }
 
+#ifdef CONFIG_PM
+static void blk_mq_pm_check_idle(struct blk_mq_hw_ctx *hctx,
+		struct request *rq, void *priv, bool reserved)
+{
+	unsigned long *cnt = priv;
+
+	if (!(rq->rq_flags & RQF_PM))
+		(*cnt)++;
+}
+
+bool blk_mq_pm_queue_idle(struct request_queue *q)
+{
+	unsigned long idle_cnt;
+
+	if (!q->tag_set || !(q->tag_set->flags & BLK_MQ_F_SUPPORT_RPM))
+		return false;
+
+	idle_cnt = 0;
+	blk_mq_queue_tag_busy_iter(q, blk_mq_pm_check_idle, &idle_cnt);
+
+	return idle_cnt == 0;
+}
+
+static void blk_mq_pm_init(struct request_queue *q)
+{
+	seqlock_init(&q->rpm_lock);
+}
+
+static void blk_mq_pm_add_request(struct request_queue *q, struct request *rq)
+{
+	unsigned int seq;
+	bool need_resume;
+
+	do {
+		seq = read_seqbegin(&q->rpm_lock);
+		need_resume = q->dev && !(rq->rq_flags & RQF_PM) &&
+			(q->rpm_status == RPM_SUSPENDED ||
+			 q->rpm_status == RPM_SUSPENDING);
+	} while (read_seqretry(&q->rpm_lock, seq));
+
+	if (need_resume)
+		pm_runtime_resume(q->dev);
+}
+
+static void blk_mq_pm_put_request(struct request_queue *q, struct request *rq)
+{
+	if (q->dev && !(rq->rq_flags & RQF_PM))
+		pm_runtime_mark_last_busy(q->dev);
+}
+#else
+static void blk_mq_pm_init(struct request_queue *q)
+{}
+
+static void blk_mq_pm_add_request(struct request_queue *q, struct request *rq)
+{}
+
+static void blk_mq_pm_put_request(struct request_queue *q, struct request *rq)
+{}
+#endif
+
 /*
  * Check if any of the ctx's have pending work in this hardware queue
  */
@@ -391,6 +453,10 @@ static struct request *blk_mq_get_request(struct request_queue *q,
 		}
 	}
 	data->hctx->queued++;
+
+	if (data->hctx->flags & BLK_MQ_F_SUPPORT_RPM)
+		blk_mq_pm_add_request(q, rq);
+
 	return rq;
 }
 
@@ -509,6 +575,9 @@ void blk_mq_free_request(struct request *rq)
 	if (blk_rq_rl(rq))
 		blk_put_rl(blk_rq_rl(rq));
 
+	if (hctx->flags & BLK_MQ_F_SUPPORT_RPM)
+		blk_mq_pm_put_request(q, rq);
+
 	WRITE_ONCE(rq->state, MQ_RQ_IDLE);
 	if (refcount_dec_and_test(&rq->ref))
 		__blk_mq_free_request(rq);
@@ -2619,6 +2688,8 @@ struct request_queue *blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
 			return ERR_PTR(ret);
 	}
 
+	blk_mq_pm_init(q);
+
 	return q;
 
 err_hctxs:
diff --git a/block/blk-mq.h b/block/blk-mq.h
index bc2b24735ed4..886e09b07628 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -88,6 +88,16 @@ extern void blk_mq_hctx_kobj_init(struct blk_mq_hw_ctx *hctx);
 
 void blk_mq_release(struct request_queue *q);
 
+/* blk-mq pm helpers */
+#ifdef CONFIG_PM
+extern bool blk_mq_pm_queue_idle(struct request_queue *q);
+#else
+static inline bool blk_mq_pm_queue_idle(struct request_queue *q)
+{
+	return false;
+}
+#endif
+
 /**
  * blk_mq_rq_state() - read the current MQ_RQ_* state of a request
  * @rq: target request.
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index d710e92874cc..f88639478d30 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -180,6 +180,7 @@ enum {
 	BLK_MQ_F_SHOULD_MERGE	= 1 << 0,
 	BLK_MQ_F_TAG_SHARED	= 1 << 1,
 	BLK_MQ_F_SG_MERGE	= 1 << 2,
+	BLK_MQ_F_SUPPORT_RPM	= 1 << 3,
 	BLK_MQ_F_BLOCKING	= 1 << 5,
 	BLK_MQ_F_NO_SCHED	= 1 << 6,
 	BLK_MQ_F_ALLOC_POLICY_START_BIT = 8,
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 137759862f07..16113921519d 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -542,6 +542,7 @@ struct request_queue {
 
 #ifdef CONFIG_PM
 	struct device		*dev;
+	seqlock_t		rpm_lock;
 	int			rpm_status;
 	unsigned int		nr_pending;
 #endif
-- 
2.9.5


* [PATCH RFC V2 3/3] scsi_mq: enable runtime PM
  2018-07-13  8:05 [PATCH RFC V2 0/3] blk-mq: support runtime PM Ming Lei
  2018-07-13  8:06 ` [PATCH RFC V2 1/3] block: put runtime PM code into common helpers Ming Lei
  2018-07-13  8:06 ` [PATCH RFC V2 2/3] blk-mq: prepare for supporting runtime PM Ming Lei
@ 2018-07-13  8:06 ` Ming Lei
  2018-07-17 13:24   ` Christoph Hellwig
  2018-07-13 14:21 ` [PATCH RFC V2 0/3] blk-mq: support " Jens Axboe
  3 siblings, 1 reply; 34+ messages in thread
From: Ming Lei @ 2018-07-13  8:06 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Ming Lei, Rafael J. Wysocki, Alan Stern, linux-pm,
	Greg Kroah-Hartman, Christoph Hellwig, Bart Van Assche,
	Hannes Reinecke, Johannes Thumshirn, Adrian Hunter,
	James E.J. Bottomley, Martin K. Petersen, linux-scsi

SCSI devices usually support runtime PM, so pass BLK_MQ_F_SUPPORT_RPM
to the blk-mq core to enable block-layer runtime PM.

Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: linux-pm@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: linux-scsi@vger.kernel.org
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/scsi/scsi_lib.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 41e9ac9fc138..fa4667aa4732 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -2306,7 +2306,8 @@ int scsi_mq_setup_tags(struct Scsi_Host *shost)
 	shost->tag_set.queue_depth = shost->can_queue;
 	shost->tag_set.cmd_size = cmd_size;
 	shost->tag_set.numa_node = NUMA_NO_NODE;
-	shost->tag_set.flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_SG_MERGE;
+	shost->tag_set.flags = BLK_MQ_F_SHOULD_MERGE |
+		BLK_MQ_F_SG_MERGE | BLK_MQ_F_SUPPORT_RPM;
 	shost->tag_set.flags |=
 		BLK_ALLOC_POLICY_TO_MQ_FLAG(shost->hostt->tag_alloc_policy);
 	shost->tag_set.driver_data = shost;
-- 
2.9.5


* Re: [PATCH RFC V2 0/3] blk-mq: support runtime PM
  2018-07-13  8:05 [PATCH RFC V2 0/3] blk-mq: support runtime PM Ming Lei
                   ` (2 preceding siblings ...)
  2018-07-13  8:06 ` [PATCH RFC V2 3/3] scsi_mq: enable " Ming Lei
@ 2018-07-13 14:21 ` Jens Axboe
  2018-07-14  2:37   ` Ming Lei
  3 siblings, 1 reply; 34+ messages in thread
From: Jens Axboe @ 2018-07-13 14:21 UTC (permalink / raw)
  To: Ming Lei
  Cc: linux-block, Rafael J. Wysocki, Alan Stern, linux-pm,
	Greg Kroah-Hartman, Christoph Hellwig, Bart Van Assche,
	Hannes Reinecke, Johannes Thumshirn, Adrian Hunter,
	James E.J. Bottomley, Martin K. Petersen, linux-scsi

On 7/13/18 2:05 AM, Ming Lei wrote:
> Hi Guys,
> 
> Runtime PM is usually enabled for SCSI devices, and we have been
> switching to SCSI_MQ recently, but runtime PM isn't supported by blk-mq
> yet, and people may complain about that.
> 
> This patchset tries to support runtime PM for blk-mq. One challenge is
> that it can be quite expensive to account for the active in-flight IOs
> when figuring out when to mark the device as last busy. This patchset
> simply marks the device busy after each non-PM IO completes, and this
> approach is workable because:
> 
> 1) pm_runtime_mark_last_busy() is very cheap
> 
> 2) in-flight non-PM IO is checked in blk_pre_runtime_suspend(), so
> if there is any IO queued, the device will be prevented from being
> suspended.
> 
> 3) Generally speaking, autosuspend_delay_ms is often large, typically
> on the order of seconds, so it shouldn't be a big deal to check whether
> the queue is idle in blk_pre_runtime_suspend().
> 
> 
> V2:
> 	- re-organize code as suggested by Christoph
> 	- use seqlock to sync runtime PM and IO path

See other mail on why this is not going to be acceptable.

-- 
Jens Axboe


* Re: [PATCH RFC V2 2/3] blk-mq: prepare for supporting runtime PM
  2018-07-13  8:06 ` [PATCH RFC V2 2/3] blk-mq: prepare for supporting runtime PM Ming Lei
@ 2018-07-13 20:16   ` Alan Stern
  2018-07-17 13:23   ` Christoph Hellwig
  1 sibling, 0 replies; 34+ messages in thread
From: Alan Stern @ 2018-07-13 20:16 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, Rafael J. Wysocki, linux-pm,
	Greg Kroah-Hartman, Christoph Hellwig, Bart Van Assche,
	Hannes Reinecke, Johannes Thumshirn, Adrian Hunter,
	James E.J. Bottomley, Martin K. Petersen, linux-scsi

On Fri, 13 Jul 2018, Ming Lei wrote:

> This patch introduces blk_mq_pm_add_request(), which is called after
> a request is allocated, and blk_mq_pm_put_request(), which is called
> after a request is freed.
> 
> For blk-mq, it can be quite expensive to account for in-flight IOs,
> so this patch simply calls pm_runtime_mark_last_busy() after each IO
> completes, instead of doing so only after the last in-flight IO completes.
> This approach is still workable, since any active non-PM IO will be
> checked in blk_pre_runtime_suspend(), and runtime suspend will be
> prevented if there is any active non-PM IO.
> 
> It turns out that synchronization between the runtime PM and IO paths
> is required to avoid races, so this patch applies a seqlock for that
> purpose. The cost introduced in the fast IO path is thus minimized,
> given that seqlocks are commonly used in fast paths, such as when
> reading jiffies/the tick, or in d_walk().
> 
> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
> Cc: Alan Stern <stern@rowland.harvard.edu>
> Cc: linux-pm@vger.kernel.org
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Bart Van Assche <bart.vanassche@wdc.com>
> Cc: Hannes Reinecke <hare@suse.de>
> Cc: Johannes Thumshirn <jthumshirn@suse.de>
> Cc: Adrian Hunter <adrian.hunter@intel.com>
> Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com>
> Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
> Cc: linux-scsi@vger.kernel.org
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---


> +static void blk_mq_post_runtime_suspend(struct request_queue *q, int err)
> +{
> +	if (!blk_mq_support_runtime_pm(q))
> +		return;
> +
> +	write_seqlock_irq(&q->rpm_lock);
> +	__blk_post_runtime_suspend(q, err);
> +	write_sequnlock_irq(&q->rpm_lock);
> +}
> +
> +static void blk_mq_pre_runtime_resume(struct request_queue *q)
> +{
> +	if (!blk_mq_support_runtime_pm(q))
> +		return;
> +
> +	write_seqlock_irq(&q->rpm_lock);
> +	q->rpm_status = RPM_RESUMING;
> +	write_sequnlock_irq(&q->rpm_lock);
> +}
> +
> +static void blk_mq_post_runtime_resume(struct request_queue *q, int err)
> +{
> +	if (!blk_mq_support_runtime_pm(q))
> +		return;
> +
> +	write_seqlock_irq(&q->rpm_lock);
> +	__blk_post_runtime_resume(q, err);
> +	write_sequnlock_irq(&q->rpm_lock);
> +}
> +
> +static void blk_mq_set_runtime_active(struct request_queue *q)
> +{
> +	if (!blk_mq_support_runtime_pm(q))
> +		return;
> +
> +	write_seqlock_irq(&q->rpm_lock);
> +	__blk_set_runtime_active(q);
> +	write_sequnlock_irq(&q->rpm_lock);
> +}

Would the code be cleaner if these routines were written inline,
like their non-mq counterparts?

Alan Stern


* Re: [PATCH RFC V2 0/3] blk-mq: support runtime PM
  2018-07-13 14:21 ` [PATCH RFC V2 0/3] blk-mq: support " Jens Axboe
@ 2018-07-14  2:37   ` Ming Lei
  2018-07-14  2:54     ` Jens Axboe
  2018-07-16 16:03     ` Bart Van Assche
  0 siblings, 2 replies; 34+ messages in thread
From: Ming Lei @ 2018-07-14  2:37 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Rafael J. Wysocki, Alan Stern, linux-pm,
	Greg Kroah-Hartman, Christoph Hellwig, Bart Van Assche,
	Hannes Reinecke, Johannes Thumshirn, Adrian Hunter,
	James E.J. Bottomley, Martin K. Petersen, linux-scsi

On Fri, Jul 13, 2018 at 08:21:09AM -0600, Jens Axboe wrote:
> On 7/13/18 2:05 AM, Ming Lei wrote:
> > Hi Guys,
> > 
> > Runtime PM is usually enabled for SCSI devices, and we have been
> > switching to SCSI_MQ recently, but runtime PM isn't supported by blk-mq
> > yet, and people may complain about that.
> > 
> > This patchset tries to support runtime PM for blk-mq. One challenge is
> > that it can be quite expensive to account for the active in-flight IOs
> > when figuring out when to mark the device as last busy. This patchset
> > simply marks the device busy after each non-PM IO completes, and this
> > approach is workable because:
> > 
> > 1) pm_runtime_mark_last_busy() is very cheap
> > 
> > 2) in-flight non-PM IO is checked in blk_pre_runtime_suspend(), so
> > if there is any IO queued, the device will be prevented from being
> > suspended.
> > 
> > 3) Generally speaking, autosuspend_delay_ms is often large, typically
> > on the order of seconds, so it shouldn't be a big deal to check whether
> > the queue is idle in blk_pre_runtime_suspend().
> > 
> > 
> > V2:
> > 	- re-organize code as suggested by Christoph
> > 	- use seqlock to sync runtime PM and IO path
> 
> See other mail on why this is not going to be acceptable.

OK, I am thinking another idea for addressing this issue.

We may introduce one logical admin(pm) request queue for each scsi_device,
and this queue shares tag with IO queue, with NO_SCHED set, and always
use atomic mode of the queue usage refcounter. Then we may send PM
command to device after the IO queue is frozen.

Also PREEMPT_ONLY can be removed too in this way.

Even in future, all pass-through commands may be sent to this admin queue.

If no one objects, I will cook patches towards this direction.

Thanks,
Ming


* Re: [PATCH RFC V2 0/3] blk-mq: support runtime PM
  2018-07-14  2:37   ` Ming Lei
@ 2018-07-14  2:54     ` Jens Axboe
  2018-07-16 16:21       ` Bart Van Assche
  2018-07-16 16:03     ` Bart Van Assche
  1 sibling, 1 reply; 34+ messages in thread
From: Jens Axboe @ 2018-07-14  2:54 UTC (permalink / raw)
  To: Ming Lei
  Cc: linux-block, Rafael J. Wysocki, Alan Stern, linux-pm,
	Greg Kroah-Hartman, Christoph Hellwig, Bart Van Assche,
	Hannes Reinecke, Johannes Thumshirn, Adrian Hunter,
	James E.J. Bottomley, Martin K. Petersen, linux-scsi

On 7/13/18 8:37 PM, Ming Lei wrote:
> On Fri, Jul 13, 2018 at 08:21:09AM -0600, Jens Axboe wrote:
>> On 7/13/18 2:05 AM, Ming Lei wrote:
>>> Hi Guys,
>>>
>>> Runtime PM is usually enabled for SCSI devices, and we have been
>>> switching to SCSI_MQ recently, but runtime PM isn't supported by blk-mq
>>> yet, and people may complain about that.
>>>
>>> This patchset tries to support runtime PM for blk-mq. One challenge is
>>> that it can be quite expensive to account for the active in-flight IOs
>>> when figuring out when to mark the device as last busy. This patchset
>>> simply marks the device busy after each non-PM IO completes, and this
>>> approach is workable because:
>>>
>>> 1) pm_runtime_mark_last_busy() is very cheap
>>>
>>> 2) in-flight non-PM IO is checked in blk_pre_runtime_suspend(), so
>>> if there is any IO queued, the device will be prevented from being
>>> suspended.
>>>
>>> 3) Generally speaking, autosuspend_delay_ms is often large, typically
>>> on the order of seconds, so it shouldn't be a big deal to check whether
>>> the queue is idle in blk_pre_runtime_suspend().
>>>
>>>
>>> V2:
>>> 	- re-organize code as suggested by Christoph
>>> 	- use seqlock to sync runtime PM and IO path
>>
>> See other mail on why this is not going to be acceptable.
> 
> OK, I am thinking another idea for addressing this issue.
> 
> We may introduce one logical admin(pm) request queue for each scsi_device,
> and this queue shares tag with IO queue, with NO_SCHED set, and always
> use atomic mode of the queue usage refcounter. Then we may send PM
> command to device after the IO queue is frozen.
> 
> Also PREEMPT_ONLY can be removed too in this way.
> 
> Even in future, all pass-through commands may be sent to this admin queue.
> 
> If no one objects, I will cook patches towards this direction.

Yes, this seems like a fine idea. It's essentially the same as handling
the enter differently, but the abstraction is nicer.

-- 
Jens Axboe


* Re: [PATCH RFC V2 0/3] blk-mq: support runtime PM
  2018-07-14  2:37   ` Ming Lei
  2018-07-14  2:54     ` Jens Axboe
@ 2018-07-16 16:03     ` Bart Van Assche
  2018-07-17  1:12       ` Ming Lei
  1 sibling, 1 reply; 34+ messages in thread
From: Bart Van Assche @ 2018-07-16 16:03 UTC (permalink / raw)
  To: ming.lei, axboe
  Cc: jthumshirn, linux-block, linux-pm, hch, martin.petersen, hare,
	linux-scsi, rjw, stern, gregkh, jejb, adrian.hunter

On Sat, 2018-07-14 at 10:37 +0800, Ming Lei wrote:
> OK, I am thinking another idea for addressing this issue.
> 
> We may introduce one logical admin(pm) request queue for each scsi_device,
> and this queue shares tag with IO queue, with NO_SCHED set, and always
> use atomic mode of the queue usage refcounter. Then we may send PM
> command to device after the IO queue is frozen.
> 
> Also PREEMPT_ONLY can be removed too in this way.
> 
> Even in future, all pass-through commands may be sent to this admin queue.
> 
> If no one objects, I will cook patches towards this direction.

Wouldn't it be overkill to introduce a separate admin queue for SCSI devices?
Will that new queue be visible under /sys/block as a request queue? If so,
how many scripts will break due to that sysfs change? If the new queue won't
be visible under /sys/block, how many changes will have to be made to the
block layer to avoid that it becomes visible?

Regarding the device_busy, device_blocked, cmd_list and other attributes in
struct scsi_device: will the SCSI commands submitted to the admin queue affect
these attributes? If not, which assumptions in the SCSI core will break? And
if SCSI commands submitted to the admin queue will affect these attributes,
how will these attributes be shared between the regular and the admin request
queues?

Since the power management core guarantees that runtime PM operations do not
occur during suspend nor during resume operations: has it been considered to
drop the RQF_PM flag and to use the RQF_PREEMPT flag instead and also to
set and reset QUEUE_FLAG_PREEMPT_ONLY as needed?

Bart.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH RFC V2 0/3] blk-mq: support runtime PM
  2018-07-14  2:54     ` Jens Axboe
@ 2018-07-16 16:21       ` Bart Van Assche
  0 siblings, 0 replies; 34+ messages in thread
From: Bart Van Assche @ 2018-07-16 16:21 UTC (permalink / raw)
  To: axboe, ming.lei
  Cc: jthumshirn, linux-block, linux-pm, hch, martin.petersen, hare,
	linux-scsi, rjw, stern, gregkh, jejb, adrian.hunter

On Fri, 2018-07-13 at 20:54 -0600, Jens Axboe wrote:
> On 7/13/18 8:37 PM, Ming Lei wrote:
> > OK, I am thinking another idea for addressing this issue.
> > 
> > We may introduce one logical admin(pm) request queue for each scsi_device,
> > and this queue shares tag with IO queue, with NO_SCHED set, and always
> > use atomic mode of the queue usage refcounter. Then we may send PM
> > command to device after the IO queue is frozen.
> > 
> > Also PREEMPT_ONLY can be removed too in this way.
> > 
> > Even in future, all pass-through commands may be sent to this admin queue.
> > 
> > If no one objects, I will cook patches towards this direction.
> 
> Yes, this seems like a fine idea. It's essentially the same as handling
> the enter differently, but the abstraction is nicer.

Hello Jens,

Why do you support the idea of removing the PREEMPT_ONLY flag? Are you
perhaps concerned about the performance impact of that flag for NVMe request
processing? If that is your concern and if it can be shown that the
processing of that flag has a measurable performance impact, how about using
jump labels to ensure that only request queues that need the PREEMPT_ONLY
flag pay the overhead of processing that flag? See also
Documentation/static-keys.txt.
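
A minimal sketch of what the jump-label idea could look like (illustrative
only, not compilable standalone kernel code; the helper name
blk_queue_enter_preempt_check() is hypothetical):

```c
#include <linux/jump_label.h>

/* Patched out of the hot path until the first queue opts in. */
static DEFINE_STATIC_KEY_FALSE(blk_preempt_only_used);

static inline bool blk_queue_enter_preempt_check(struct request_queue *q)
{
	/* Compiles to a NOP while no queue has ever set PREEMPT_ONLY. */
	if (static_branch_unlikely(&blk_preempt_only_used))
		return blk_queue_preempt_only(q);
	return false;
}

void blk_set_preempt_only(struct request_queue *q)
{
	blk_queue_flag_set(QUEUE_FLAG_PREEMPT_ONLY, q);
	/* One-way switch: once any queue needs the flag, all pay the check. */
	static_branch_enable(&blk_preempt_only_used);
}
```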

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH RFC V2 0/3] blk-mq: support runtime PM
  2018-07-16 16:03     ` Bart Van Assche
@ 2018-07-17  1:12       ` Ming Lei
  0 siblings, 0 replies; 34+ messages in thread
From: Ming Lei @ 2018-07-17  1:12 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: axboe, jthumshirn, linux-block, linux-pm, hch, martin.petersen,
	hare, linux-scsi, rjw, stern, gregkh, jejb, adrian.hunter

On Mon, Jul 16, 2018 at 04:03:04PM +0000, Bart Van Assche wrote:
> On Sat, 2018-07-14 at 10:37 +0800, Ming Lei wrote:
> > OK, I am thinking another idea for addressing this issue.
> > 
> > We may introduce one logical admin(pm) request queue for each scsi_device,
> > and this queue shares tag with IO queue, with NO_SCHED set, and always
> > use atomic mode of the queue usage refcounter. Then we may send PM
> > command to device after the IO queue is frozen.
> > 
> > Also PREEMPT_ONLY can be removed too in this way.
> > 
> > Even in future, all pass-through commands may be sent to this admin queue.
> > 
> > If no one objects, I will cook patches towards this direction.
> 
> Wouldn't it be overkill to introduce a separate admin queue for SCSI devices?
> Will that new queue be visible under /sys/block as a request queue? If so,

The admin queue won't be visible under /sys/block because there isn't a
disk attached to it.

> how many scripts will break due to that sysfs change? If the new queue won't
> be visible under /sys/block, how many changes will have to be made to the
> block layer to avoid that it becomes visible?

That is easy enough; please see NVMe's admin queue, which is where the
concept actually comes from. The difference is just that NVMe's admin queue
has standalone tags, but that is easy to solve with TAG_SHARED for SCSI.

> 
> Regarding the device_busy, device_blocked, cmd_list and other attributes in
> struct scsi_device: will the SCSI commands submitted to the admin queue affect
> these attributes? If not, which assumptions in the SCSI core will break? And
> if SCSI commands submitted to the admin queue will affect these attributes,
> how will these attributes be shared between the regular and the admin request
> queues?

Per my current design, requests submitted to one scsi_device via the admin
queue will share this scsi_device's attributes, just for the sake of easy
implementation.

Sharing won't be very difficult, and one way is to pass 'scsi_device'
via scsi_request from scsi_execute().

BTW, I am trying to make the admin queue per-host first.

> 
> Since the power management core guarantees that runtime PM operations do not
> occur during suspend nor during resume operations: has it been considered to
> drop the RQF_PM flag and to use the RQF_PREEMPT flag instead and also to
> set and reset QUEUE_FLAG_PREEMPT_ONLY as needed?

That can't work for runtime PM, which requires new IO to wake up the queue.
If we add this to blk_queue_enter(), it will become quite complicated:

1) queue freeze has to be done before runtime suspending

2) when any new IO comes, the queue has to be unfrozen.

3) when any RQF_PM request comes, the queue has to be kept frozen while
still allowing this request to pass through.
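
The three points could be sketched roughly like this (pseudocode in C
syntax; queue_is_frozen() and enter_frozen_queue_for_pm() are hypothetical
names, not real kernel helpers):

```c
/* Hypothetical sketch of the extra states blk_queue_enter() would need
 * if runtime suspend were implemented via queue freezing. */
int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags)
{
	for (;;) {
		if (!queue_is_frozen(q))		/* 1) frozen before suspend */
			return percpu_ref_tryget_live(&q->q_usage_counter) ?
				0 : -EBUSY;

		if (flags & BLK_MQ_REQ_PM)
			/* 3) PM request must cross the frozen queue */
			return enter_frozen_queue_for_pm(q);

		/* 2) normal IO while runtime-suspended: trigger resume,
		 * which unfreezes the queue, then retry. */
		blk_pm_request_resume(q);
		wait_event(q->mq_freeze_wq, !queue_is_frozen(q));
	}
}
```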


There are at least two benefits with this suggested approach:

1) simplifying blk_queue_enter(): we can simply freeze the IO queue before
runtime suspend.

2) solving the RQF_PM request starvation issue in a simple way. Currently
blk_pm_add_request() is called from __elv_add_request(), but at that time
the request pool may have been consumed up already.

Also, separating admin requests from the IO queue may create a chance to
simplify & optimize the IO path in the future too.

Thanks,
Ming

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH RFC V2 1/3] block: put runtime PM code into common helpers
  2018-07-13  8:06 ` [PATCH RFC V2 1/3] block: put runtime PM code into common helpers Ming Lei
@ 2018-07-17 13:21   ` Christoph Hellwig
  0 siblings, 0 replies; 34+ messages in thread
From: Christoph Hellwig @ 2018-07-17 13:21 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, Rafael J. Wysocki, Alan Stern, linux-pm,
	Greg Kroah-Hartman, Christoph Hellwig, Bart Van Assche,
	Hannes Reinecke, Johannes Thumshirn, Adrian Hunter,
	James E.J. Bottomley, Martin K. Petersen, linux-scsi

>  #ifdef CONFIG_PM
> +static int __blk_pre_runtime_suspend(struct request_queue *q, bool active)
> +{
> +	int ret;
> +
> +	if (active) {
> +		ret = -EBUSY;
> +		pm_runtime_mark_last_busy(q->dev);
> +	} else {
> +		ret = 0;
> +		q->rpm_status = RPM_SUSPENDING;
> +	}
> +
> +	return ret;
> +}

Why not:

	if (active) {
		pm_runtime_mark_last_busy(q->dev);
		return -EBUSY;
	}

	q->rpm_status = RPM_SUSPENDING;
	return 0;
		

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH RFC V2 2/3] blk-mq: prepare for supporting runtime PM
  2018-07-13  8:06 ` [PATCH RFC V2 2/3] blk-mq: prepare for supporting runtime PM Ming Lei
  2018-07-13 20:16   ` Alan Stern
@ 2018-07-17 13:23   ` Christoph Hellwig
  1 sibling, 0 replies; 34+ messages in thread
From: Christoph Hellwig @ 2018-07-17 13:23 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, Rafael J. Wysocki, Alan Stern, linux-pm,
	Greg Kroah-Hartman, Christoph Hellwig, Bart Van Assche,
	Hannes Reinecke, Johannes Thumshirn, Adrian Hunter,
	James E.J. Bottomley, Martin K. Petersen, linux-scsi

On Fri, Jul 13, 2018 at 04:06:01PM +0800, Ming Lei wrote:
> This patch introduces blk_mq_pm_add_request() which is called after
> allocating one request. Also blk_mq_pm_put_request() is introduced
> and called after one request is freed.
> 
> For blk-mq, it can be quite expensive to account in-flight IOs,
> so this patch calls pm_runtime_mark_last_busy() simply after each IO
> is done, instead of doing that only after the last in-flight IO is done.
> This way is still workable, since the active non-PM IO will be checked
> in blk_pre_runtime_suspend(), and runtime suspend will be prevented
> if there is any active non-PM IO.
> 
> Turns out that sync between the runtime PM and IO paths has to be done
> to avoid races, so this patch applies one seqlock for this purpose.
> The cost introduced in the fast IO path can thus be minimized, given
> that seqlock is often used in fast paths, such as reading jiffies &
> tick, or d_walk(), ...
> 
> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
> Cc: Alan Stern <stern@rowland.harvard.edu>
> Cc: linux-pm@vger.kernel.org
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Bart Van Assche <bart.vanassche@wdc.com>
> Cc: Hannes Reinecke <hare@suse.de>
> Cc: Johannes Thumshirn <jthumshirn@suse.de>
> Cc: Adrian Hunter <adrian.hunter@intel.com>
> Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com>
> Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
> Cc: linux-scsi@vger.kernel.org
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
>  block/blk-core.c       | 121 +++++++++++++++++++++++++++++++++++++++++--------
>  block/blk-mq.c         |  71 +++++++++++++++++++++++++++++
>  block/blk-mq.h         |  10 ++++
>  include/linux/blk-mq.h |   1 +
>  include/linux/blkdev.h |   1 +
>  5 files changed, 186 insertions(+), 18 deletions(-)
> 
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 1087a58590f1..cd73db90d1e3 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -3775,7 +3775,10 @@ static void __blk_post_runtime_resume(struct request_queue *q, int err)
>  {
>  	if (!err) {
>  		q->rpm_status = RPM_ACTIVE;
> -		__blk_run_queue(q);
> +		if (!q->mq_ops)
> +			__blk_run_queue(q);
> +		else
> +			blk_mq_run_hw_queues(q, true);
>  		pm_runtime_mark_last_busy(q->dev);
>  		pm_request_autosuspend(q->dev);
>  	} else {
> @@ -3790,6 +3793,69 @@ static void __blk_set_runtime_active(struct request_queue *q)
>  	pm_request_autosuspend(q->dev);
>  }
>  
> +static bool blk_mq_support_runtime_pm(struct request_queue *q)
> +{
> +	if (!q->tag_set || !(q->tag_set->flags & BLK_MQ_F_SUPPORT_RPM))
> +		return false;
> +	return true;

	return q->tag_set && (q->tag_set->flags & BLK_MQ_F_SUPPORT_RPM);

?

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH RFC V2 3/3] scsi_mq: enable runtime PM
  2018-07-13  8:06 ` [PATCH RFC V2 3/3] scsi_mq: enable " Ming Lei
@ 2018-07-17 13:24   ` Christoph Hellwig
  2018-07-17 15:30     ` Ming Lei
  0 siblings, 1 reply; 34+ messages in thread
From: Christoph Hellwig @ 2018-07-17 13:24 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, Rafael J. Wysocki, Alan Stern, linux-pm,
	Greg Kroah-Hartman, Christoph Hellwig, Bart Van Assche,
	Hannes Reinecke, Johannes Thumshirn, Adrian Hunter,
	James E.J. Bottomley, Martin K. Petersen, linux-scsi

On Fri, Jul 13, 2018 at 04:06:02PM +0800, Ming Lei wrote:
> Usually SCSI supports runtime PM, so pass BLK_MQ_F_SUPPORT_RPM to blk-mq
> core for enabling block runtime PM.

I still think enabling this unconditionally for any SCSI device was
a mistake, and it is even more so for blk-mq.

Please only opt in for ufs, ATA first, adding others if wanted by
maintainers.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH RFC V2 3/3] scsi_mq: enable runtime PM
  2018-07-17 13:24   ` Christoph Hellwig
@ 2018-07-17 15:30     ` Ming Lei
  2018-07-17 15:34       ` Bart Van Assche
  0 siblings, 1 reply; 34+ messages in thread
From: Ming Lei @ 2018-07-17 15:30 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, linux-block, Rafael J. Wysocki, Alan Stern, linux-pm,
	Greg Kroah-Hartman, Bart Van Assche, Hannes Reinecke,
	Johannes Thumshirn, Adrian Hunter, James E.J. Bottomley,
	Martin K. Petersen, linux-scsi

On Tue, Jul 17, 2018 at 03:24:11PM +0200, Christoph Hellwig wrote:
> On Fri, Jul 13, 2018 at 04:06:02PM +0800, Ming Lei wrote:
> > Usually SCSI supports runtime PM, so pass BLK_MQ_F_SUPPORT_RPM to blk-mq
> > core for enabling block runtime PM.
> 
> I still think enabling this unconditionally for any SCSI device was
> a mistake, and it is even more so for blk-mq.
> 
> Please only opt in for ufs, ATA first, adding others if wanted by
> maintainers.

No, this way will cause a regression, because runtime PM actually works
for all sd/sr devices, and it isn't related to the SCSI host.


Thanks,
Ming

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH RFC V2 3/3] scsi_mq: enable runtime PM
  2018-07-17 15:30     ` Ming Lei
@ 2018-07-17 15:34       ` Bart Van Assche
  2018-07-17 15:38         ` Ming Lei
  0 siblings, 1 reply; 34+ messages in thread
From: Bart Van Assche @ 2018-07-17 15:34 UTC (permalink / raw)
  To: hch, ming.lei
  Cc: jthumshirn, linux-block, linux-pm, martin.petersen, hare, axboe,
	linux-scsi, rjw, stern, gregkh, jejb, adrian.hunter

On Tue, 2018-07-17 at 23:30 +0800, Ming Lei wrote:
> On Tue, Jul 17, 2018 at 03:24:11PM +0200, Christoph Hellwig wrote:
> > On Fri, Jul 13, 2018 at 04:06:02PM +0800, Ming Lei wrote:
> > > Usually SCSI supports runtime PM, so pass BLK_MQ_F_SUPPORT_RPM to blk-mq
> > > core for enabling block runtime PM.
> > 
> > I still think enabling this unconditionally for any SCSI device was
> > a mistake, and it is even more so for blk-mq.
> > 
> > Please only opt in for ufs, ATA first, adding others if wanted by
> > maintainers.
> 
> No, this way will cause regression because runtime PM works for
> all sd/sr device actually, and it isn't related with scsi host.

For which SCSI devices other than ufs and ATA do we need PM support?

Bart.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH RFC V2 3/3] scsi_mq: enable runtime PM
  2018-07-17 15:34       ` Bart Van Assche
@ 2018-07-17 15:38         ` Ming Lei
  2018-07-17 19:50           ` hch
  2018-07-17 21:49           ` Jens Axboe
  0 siblings, 2 replies; 34+ messages in thread
From: Ming Lei @ 2018-07-17 15:38 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: hch, jthumshirn, linux-block, linux-pm, martin.petersen, hare,
	axboe, linux-scsi, rjw, stern, gregkh, jejb, adrian.hunter

On Tue, Jul 17, 2018 at 03:34:35PM +0000, Bart Van Assche wrote:
> On Tue, 2018-07-17 at 23:30 +0800, Ming Lei wrote:
> > On Tue, Jul 17, 2018 at 03:24:11PM +0200, Christoph Hellwig wrote:
> > > On Fri, Jul 13, 2018 at 04:06:02PM +0800, Ming Lei wrote:
> > > > Usually SCSI supports runtime PM, so pass BLK_MQ_F_SUPPORT_RPM to blk-mq
> > > > core for enabling block runtime PM.
> > > 
> > > I still think enabling this unconditionally for any SCSI device was
> > > a mistake, and it is even more so for blk-mq.
> > > 
> > > Please only opt in for ufs, ATA first, adding others if wanted by
> > > maintainers.
> > 
> > No, this way will cause regression because runtime PM works for
> > all sd/sr device actually, and it isn't related with scsi host.
> 
> For which SCSI devices other than ufs and ATA do we need PM support?

As I said, it is any sd/sr device, which can be spun down by runtime PM
via the START_STOP command if it isn't used for a while.

Thanks,
Ming

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH RFC V2 3/3] scsi_mq: enable runtime PM
  2018-07-17 15:38         ` Ming Lei
@ 2018-07-17 19:50           ` hch
  2018-07-17 20:54             ` Alan Stern
  2018-07-17 21:49           ` Jens Axboe
  1 sibling, 1 reply; 34+ messages in thread
From: hch @ 2018-07-17 19:50 UTC (permalink / raw)
  To: Ming Lei
  Cc: Bart Van Assche, hch, jthumshirn, linux-block, linux-pm,
	martin.petersen, hare, axboe, linux-scsi, rjw, stern, gregkh,
	jejb, adrian.hunter

On Tue, Jul 17, 2018 at 11:38:53PM +0800, Ming Lei wrote:
> As I said, it is any sd/sr device, which can be put down by runtime PM
> via START_STOP command if it isn't used for a while.

And in which case do we care that this happens?

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH RFC V2 3/3] scsi_mq: enable runtime PM
  2018-07-17 19:50           ` hch
@ 2018-07-17 20:54             ` Alan Stern
  0 siblings, 0 replies; 34+ messages in thread
From: Alan Stern @ 2018-07-17 20:54 UTC (permalink / raw)
  To: hch
  Cc: Ming Lei, Bart Van Assche, jthumshirn, linux-block, linux-pm,
	martin.petersen, hare, axboe, linux-scsi, rjw, gregkh, jejb,
	adrian.hunter

On Tue, 17 Jul 2018, hch@lst.de wrote:

> On Tue, Jul 17, 2018 at 11:38:53PM +0800, Ming Lei wrote:
> > As I said, it is any sd/sr device, which can be put down by runtime PM
> > via START_STOP command if it isn't used for a while.
> 
> And in which case do we care that this happens?

Consider a low-power system (e.g., embedded applications, tablet 
computer, etc.) using a USB flash drive.  Consider a high-power system 
with lots of disk drives that consume a lot of power while they are 
running but not while they are stopped.

It's safe to assume that if the kernel can provide a capability like 
this for users, some people will take advantage of it.

Alan Stern

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH RFC V2 3/3] scsi_mq: enable runtime PM
  2018-07-17 15:38         ` Ming Lei
  2018-07-17 19:50           ` hch
@ 2018-07-17 21:49           ` Jens Axboe
  2018-07-18 12:06             ` Ming Lei
  1 sibling, 1 reply; 34+ messages in thread
From: Jens Axboe @ 2018-07-17 21:49 UTC (permalink / raw)
  To: Ming Lei, Bart Van Assche
  Cc: hch, jthumshirn, linux-block, linux-pm, martin.petersen, hare,
	linux-scsi, rjw, stern, gregkh, jejb, adrian.hunter

On 7/17/18 9:38 AM, Ming Lei wrote:
> On Tue, Jul 17, 2018 at 03:34:35PM +0000, Bart Van Assche wrote:
>> On Tue, 2018-07-17 at 23:30 +0800, Ming Lei wrote:
>>> On Tue, Jul 17, 2018 at 03:24:11PM +0200, Christoph Hellwig wrote:
>>>> On Fri, Jul 13, 2018 at 04:06:02PM +0800, Ming Lei wrote:
>>>>> Usually SCSI supports runtime PM, so pass BLK_MQ_F_SUPPORT_RPM to blk-mq
>>>>> core for enabling block runtime PM.
>>>>
>>>> I still think enabling this unconditionally for any SCSI device was
>>>> a mistake, and it is even more so for blk-mq.
>>>>
>>>> Please only opt in for ufs, ATA first, adding others if wanted by
>>>> maintainers.
>>>
>>> No, this way will cause regression because runtime PM works for
>>> all sd/sr device actually, and it isn't related with scsi host.
>>
>> For which SCSI devices other than ufs and ATA do we need PM support?
> 
> As I said, it is any sd/sr device, which can be put down by runtime PM
> via START_STOP command if it isn't used for a while.

Christoph is basically echoing my concerns. Why don't we just enable
it on slower devices, similarly to what we do for adding
randomness? Nobody wants to pay this overhead for faster devices,
since most people won't care.

That said, we should still make it as efficient as we can, as per
previous discussions.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH RFC V2 3/3] scsi_mq: enable runtime PM
  2018-07-17 21:49           ` Jens Axboe
@ 2018-07-18 12:06             ` Ming Lei
  2018-07-18 12:28               ` Johannes Thumshirn
  2018-07-18 12:43               ` Hannes Reinecke
  0 siblings, 2 replies; 34+ messages in thread
From: Ming Lei @ 2018-07-18 12:06 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Ming Lei, Bart Van Assche, hch, jthumshirn, linux-block,
	linux-pm, martin.petersen, hare, linux-scsi, rjw, stern, gregkh,
	jejb, adrian.hunter

On Wed, Jul 18, 2018 at 5:49 AM, Jens Axboe <axboe@kernel.dk> wrote:
> On 7/17/18 9:38 AM, Ming Lei wrote:
>> On Tue, Jul 17, 2018 at 03:34:35PM +0000, Bart Van Assche wrote:
>>> On Tue, 2018-07-17 at 23:30 +0800, Ming Lei wrote:
>>>> On Tue, Jul 17, 2018 at 03:24:11PM +0200, Christoph Hellwig wrote:
>>>>> On Fri, Jul 13, 2018 at 04:06:02PM +0800, Ming Lei wrote:
>>>>>> Usually SCSI supports runtime PM, so pass BLK_MQ_F_SUPPORT_RPM to blk-mq
>>>>>> core for enabling block runtime PM.
>>>>>
>>>>> I still think enabling this unconditionally for any SCSI device was
>>>>> a mistake, and it is even more so for blk-mq.
>>>>>
>>>>> Please only opt in for ufs, ATA first, adding others if wanted by
>>>>> maintainers.
>>>>
>>>> No, this way will cause regression because runtime PM works for
>>>> all sd/sr device actually, and it isn't related with scsi host.
>>>
>>> For which SCSI devices other than ufs and ATA do we need PM support?
>>
>> As I said, it is any sd/sr device, which can be put down by runtime PM
>> via START_STOP command if it isn't used for a while.
>
> Christoph is basically echoing my concerns. Why don't we just enable
> it on slower devices, similarly to what we do for adding
> randomness? Nobody wants to pay this overhead for faster devices,
> since most people won't care.

IMO the problem isn't related to slow vs. fast devices; it is related to
the system, especially when it cares about power consumption, such as a
mobile phone, a laptop, or a server with lots of disks attached. And we
often see fast disks shipped in laptops, such as NVMe or other SSDs.

Thanks,
Ming Lei

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH RFC V2 3/3] scsi_mq: enable runtime PM
  2018-07-18 12:06             ` Ming Lei
@ 2018-07-18 12:28               ` Johannes Thumshirn
  2018-07-18 12:37                 ` Ming Lei
  2018-07-18 14:12                 ` Alan Stern
  2018-07-18 12:43               ` Hannes Reinecke
  1 sibling, 2 replies; 34+ messages in thread
From: Johannes Thumshirn @ 2018-07-18 12:28 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, Ming Lei, Bart Van Assche, hch, linux-block,
	linux-pm, martin.petersen, hare, linux-scsi, rjw, stern, gregkh,
	jejb, adrian.hunter

On Wed, Jul 18, 2018 at 08:06:10PM +0800, Ming Lei wrote:
> IMO the problem isn't related with slow or quick device, it is related with
> the system, especially when it cares about power consumption, such as
> mobile phone, or laptop or servers with lots of disks attached. And we know
> it is often to see some fast disks shipped in laptop, such as NVMe, or other
> SSD.

Yes, but you're only taking locally attached devices into account.
This is very likely harmful for sd devices attached to a
SRP/FC/iSCSI/SAS target in a SAN environment. These can be really fast,
and people investing this money are likely to not want additional
performance penalties.

Furthermore, I don't know if any array vendors actually implement
START/STOP DISK, etc.

-- 
Johannes Thumshirn                                          Storage
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH RFC V2 3/3] scsi_mq: enable runtime PM
  2018-07-18 12:28               ` Johannes Thumshirn
@ 2018-07-18 12:37                 ` Ming Lei
  2018-07-18 14:12                 ` Alan Stern
  1 sibling, 0 replies; 34+ messages in thread
From: Ming Lei @ 2018-07-18 12:37 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: Jens Axboe, Ming Lei, Bart Van Assche, hch, linux-block,
	linux-pm, martin.petersen, hare, linux-scsi, rjw, stern, gregkh,
	jejb, adrian.hunter

On Wed, Jul 18, 2018 at 8:28 PM, Johannes Thumshirn <jthumshirn@suse.de> wrote:
> On Wed, Jul 18, 2018 at 08:06:10PM +0800, Ming Lei wrote:
>> IMO the problem isn't related with slow or quick device, it is related with
>> the system, especially when it cares about power consumption, such as
>> mobile phone, or laptop or servers with lots of disks attached. And we know
>> it is often to see some fast disks shipped in laptop, such as NVMe, or other
>> SSD.
>
> Yes but you're only taking locally attached devices into account.
> This is very likely harmful on sd devices attached to a
> SRP/FC/iSCSI/SAS Target in a SAN environment. These can be really fast
> and people investing this money are likely to not like additional
> performance penalties.

Runtime PM provides options for all these kinds of cases. If you care
about performance, just keep the default setting (the device is always
active). If you care about power consumption, you may run a PM utility to
switch runtime PM on for the devices of interest.

We are definitely working on a solution in which the runtime PM code
doesn't introduce obvious extra cost in the fast path, so there is no need
to worry about its effect if you never want to use it.

>
> Furthermore I'd don't know if any array vendors actually implement
> START/STOP DISK, etc..

That won't be an issue, as I mentioned above.

Thanks,
Ming Lei

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH RFC V2 3/3] scsi_mq: enable runtime PM
  2018-07-18 12:06             ` Ming Lei
  2018-07-18 12:28               ` Johannes Thumshirn
@ 2018-07-18 12:43               ` Hannes Reinecke
  2018-07-18 13:05                 ` Ming Lei
  1 sibling, 1 reply; 34+ messages in thread
From: Hannes Reinecke @ 2018-07-18 12:43 UTC (permalink / raw)
  To: Ming Lei, Jens Axboe
  Cc: Ming Lei, Bart Van Assche, hch, jthumshirn, linux-block,
	linux-pm, martin.petersen, linux-scsi, rjw, stern, gregkh, jejb,
	adrian.hunter

On 07/18/2018 02:06 PM, Ming Lei wrote:
> On Wed, Jul 18, 2018 at 5:49 AM, Jens Axboe <axboe@kernel.dk> wrote:
>> On 7/17/18 9:38 AM, Ming Lei wrote:
>>> On Tue, Jul 17, 2018 at 03:34:35PM +0000, Bart Van Assche wrote:
>>>> On Tue, 2018-07-17 at 23:30 +0800, Ming Lei wrote:
>>>>> On Tue, Jul 17, 2018 at 03:24:11PM +0200, Christoph Hellwig wrote:
>>>>>> On Fri, Jul 13, 2018 at 04:06:02PM +0800, Ming Lei wrote:
>>>>>>> Usually SCSI supports runtime PM, so pass BLK_MQ_F_SUPPORT_RPM to blk-mq
>>>>>>> core for enabling block runtime PM.
>>>>>>
>>>>>> I still think enabling this unconditionally for any SCSI device was
>>>>>> a mistake, and it is even more so for blk-mq.
>>>>>>
>>>>>> Please only opt in for ufs, ATA first, adding others if wanted by
>>>>>> maintainers.
>>>>>
>>>>> No, this way will cause regression because runtime PM works for
>>>>> all sd/sr device actually, and it isn't related with scsi host.
>>>>
>>>> For which SCSI devices other than ufs and ATA do we need PM support?
>>>
>>> As I said, it is any sd/sr device, which can be put down by runtime PM
>>> via START_STOP command if it isn't used for a while.
>>
>> Christoph is basically echoing my concerns. Why don't we just enable
>> it on slower devices, similarly to what we do for adding
>> randomness? Nobody wants to pay this overhead for faster devices,
>> since most people won't care.
> 
> IMO the problem isn't related with slow or quick device, it is related with
> the system, especially when it cares about power consumption, such as
> mobile phone, or laptop or servers with lots of disks attached. And we know
> it is often to see some fast disks shipped in laptop, such as NVMe, or other
> SSD.
> 
But those typically have dedicated (transport/driver) mechanisms to put
the device to sleep, _and_ START STOP UNIT will be emulated for those
devices anyway.
I'd rather advocate moving runtime PM into the individual
subsystems/drivers, and _not_ abuse START STOP UNIT here.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH RFC V2 3/3] scsi_mq: enable runtime PM
  2018-07-18 12:43               ` Hannes Reinecke
@ 2018-07-18 13:05                 ` Ming Lei
  0 siblings, 0 replies; 34+ messages in thread
From: Ming Lei @ 2018-07-18 13:05 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Jens Axboe, Ming Lei, Bart Van Assche, hch, jthumshirn,
	linux-block, linux-pm, martin.petersen, linux-scsi, rjw, stern,
	gregkh, jejb, adrian.hunter

On Wed, Jul 18, 2018 at 8:43 PM, Hannes Reinecke <hare@suse.de> wrote:
> On 07/18/2018 02:06 PM, Ming Lei wrote:
>> On Wed, Jul 18, 2018 at 5:49 AM, Jens Axboe <axboe@kernel.dk> wrote:
>>> On 7/17/18 9:38 AM, Ming Lei wrote:
>>>> On Tue, Jul 17, 2018 at 03:34:35PM +0000, Bart Van Assche wrote:
>>>>> On Tue, 2018-07-17 at 23:30 +0800, Ming Lei wrote:
>>>>>> On Tue, Jul 17, 2018 at 03:24:11PM +0200, Christoph Hellwig wrote:
>>>>>>> On Fri, Jul 13, 2018 at 04:06:02PM +0800, Ming Lei wrote:
>>>>>>>> Usually SCSI supports runtime PM, so pass BLK_MQ_F_SUPPORT_RPM to blk-mq
>>>>>>>> core for enabling block runtime PM.
>>>>>>>
>>>>>>> I still think enabling this unconditionally for any SCSI device was
>>>>>>> a mistake, and it is even more so for blk-mq.
>>>>>>>
>>>>>>> Please only opt in for ufs, ATA first, adding others if wanted by
>>>>>>> maintainers.
>>>>>>
>>>>>> No, this way will cause regression because runtime PM works for
>>>>>> all sd/sr device actually, and it isn't related with scsi host.
>>>>>
>>>>> For which SCSI devices other than ufs and ATA do we need PM support?
>>>>
>>>> As I said, it is any sd/sr device, which can be put down by runtime PM
>>>> via START_STOP command if it isn't used for a while.
>>>
>>> Christoph is basically echoing my concerns. Why don't we just enable
>>> it on slower devices, similarly to what we do for adding
>>> randomness? Nobody wants to pay this overhead for faster devices,
>>> since most people won't care.
>>
>> IMO the problem isn't related with slow or quick device, it is related with
>> the system, especially when it cares about power consumption, such as
>> mobile phone, or laptop or servers with lots of disks attached. And we know
>> it is often to see some fast disks shipped in laptop, such as NVMe, or other
>> SSD.
>>
> But those typically have dedicated (transport/driver) mechanism to put
> the device to sleep, _and_ START STOP UNIT will be emulated for those
> devices anyway.
> I'd rather advocate to move runtime PM into the individual
> subsystems/drivers, and _not_ abuse START STOP UNIT here.

Firstly, START_STOP isn't an abuse for PM purposes; please see the
description from t10.org:

"The START STOP UNIT command (see 5.17) allows an application client to
control the power condition of a logical unit. This method includes specifying
that the logical unit transition to a power condition."

Secondly, runtime PM at the LUN level isn't contradictory with a
subsystem's/driver's runtime PM. The two can co-exist, for example on a
USB storage host, ufshcd, ...


Thanks,
Ming Lei

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH RFC V2 3/3] scsi_mq: enable runtime PM
  2018-07-18 12:28               ` Johannes Thumshirn
  2018-07-18 12:37                 ` Ming Lei
@ 2018-07-18 14:12                 ` Alan Stern
  2018-07-18 14:18                   ` Johannes Thumshirn
  2018-07-18 14:50                   ` Jens Axboe
  1 sibling, 2 replies; 34+ messages in thread
From: Alan Stern @ 2018-07-18 14:12 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: Ming Lei, Jens Axboe, Ming Lei, Bart Van Assche, hch,
	linux-block, linux-pm, martin.petersen, hare, linux-scsi, rjw,
	gregkh, jejb, adrian.hunter

On Wed, 18 Jul 2018, Johannes Thumshirn wrote:

> On Wed, Jul 18, 2018 at 08:06:10PM +0800, Ming Lei wrote:
> > IMO the problem isn't related with slow or quick device, it is related with
> > the system, especially when it cares about power consumption, such as
> > mobile phone, or laptop or servers with lots of disks attached. And we know
> > it is often to see some fast disks shipped in laptop, such as NVMe, or other
> > SSD.
> 
> Yes but you're only taking locally attached devices into account.
> This is very likely harmful on sd devices attached to a
> SRP/FC/iSCSI/SAS Target in a SAN environment. These can be really fast
> and people investing this money are likely to not like additional
> performance penalties. 

As one extra reminder...  People who care about extreme performance can 
perfectly well disable CONFIG_PM in their kernels.  That should remove 
any overhead due to runtime PM.
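
As a hedged userspace sketch of why a !CONFIG_PM build pays nothing:
when the option is off, the PM helpers constant-fold to no-ops and the
compiler removes them from the request path. CONFIG_PM_ENABLED and all
names below are illustrative stand-ins, not the kernel's real Kconfig
machinery.

```c
#include <assert.h>

#define CONFIG_PM_ENABLED 0  /* flip to 1 to model a PM-enabled kernel */

static unsigned long last_busy;
static unsigned long mark_calls;

static inline void mark_last_busy(unsigned long now)
{
	if (!CONFIG_PM_ENABLED)
		return;         /* constant-folded away when PM is off */
	last_busy = now;
	mark_calls++;
}

/* Request completion path: in a !CONFIG_PM build this call compiles
 * down to nothing, so there is no per-IO overhead to measure. */
static void complete_request(unsigned long now)
{
	mark_last_busy(now);
}
```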

Alan Stern

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH RFC V2 3/3] scsi_mq: enable runtime PM
  2018-07-18 14:12                 ` Alan Stern
@ 2018-07-18 14:18                   ` Johannes Thumshirn
  2018-07-18 15:01                     ` Alan Stern
  2018-07-18 14:50                   ` Jens Axboe
  1 sibling, 1 reply; 34+ messages in thread
From: Johannes Thumshirn @ 2018-07-18 14:18 UTC (permalink / raw)
  To: Alan Stern
  Cc: Ming Lei, Jens Axboe, Ming Lei, Bart Van Assche, hch,
	linux-block, linux-pm, martin.petersen, hare, linux-scsi, rjw,
	gregkh, jejb, adrian.hunter

On Wed, Jul 18, 2018 at 10:12:37AM -0400, Alan Stern wrote:
> As one extra reminder...  People who care about extreme performance can 
> perfectly well disable CONFIG_PM in their kernels.  That should remove 
> any overhead due to runtime PM.

Yes, but how likely is that for people who are running their data
center with a distribution kernel?

-- 
Johannes Thumshirn                                          Storage
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH RFC V2 3/3] scsi_mq: enable runtime PM
  2018-07-18 14:12                 ` Alan Stern
  2018-07-18 14:18                   ` Johannes Thumshirn
@ 2018-07-18 14:50                   ` Jens Axboe
  2018-07-18 18:46                     ` Alan Stern
  2018-07-18 23:08                     ` Ming Lei
  1 sibling, 2 replies; 34+ messages in thread
From: Jens Axboe @ 2018-07-18 14:50 UTC (permalink / raw)
  To: Alan Stern, Johannes Thumshirn
  Cc: Ming Lei, Ming Lei, Bart Van Assche, hch, linux-block, linux-pm,
	martin.petersen, hare, linux-scsi, rjw, gregkh, jejb,
	adrian.hunter

On 7/18/18 8:12 AM, Alan Stern wrote:
> On Wed, 18 Jul 2018, Johannes Thumshirn wrote:
> 
>> On Wed, Jul 18, 2018 at 08:06:10PM +0800, Ming Lei wrote:
>>> IMO the problem isn't related with slow or quick device, it is related with
>>> the system, especially when it cares about power consumption, such as
>>> mobile phone, or laptop or servers with lots of disks attached. And we know
>>> it is often to see some fast disks shipped in laptop, such as NVMe, or other
>>> SSD.
>>
>> Yes but you're only taking locally attached devices into account.
>> This is very likely harmful on sd devices attached to a
>> SRP/FC/iSCSI/SAS Target in a SAN environment. These can be really fast
>> and people investing this money are likely to not like additional
>> performance penalties. 
> 
> As one extra reminder...  People who care about extreme performance can 
> perfectly well disable CONFIG_PM in their kernels.  That should remove 
> any overhead due to runtime PM.

This is a kernel fallacy that needs to be shot down. Most folks just run
what the distro provides, so no, they can't just turn off various kernel
options. This is NEVER an argument that carries any weight at all for
me. If we have the implementation, it needs to be fast enough to be on
by default. No "well just turn off the option if you think the overhead
is too high". Never.

That's exactly why I didn't like the first implementation that Ming did.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH RFC V2 3/3] scsi_mq: enable runtime PM
  2018-07-18 14:18                   ` Johannes Thumshirn
@ 2018-07-18 15:01                     ` Alan Stern
  2018-07-19  6:41                       ` Johannes Thumshirn
  0 siblings, 1 reply; 34+ messages in thread
From: Alan Stern @ 2018-07-18 15:01 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: Ming Lei, Jens Axboe, Ming Lei, Bart Van Assche, hch,
	linux-block, linux-pm, martin.petersen, hare, linux-scsi, rjw,
	gregkh, jejb, adrian.hunter

On Wed, 18 Jul 2018, Johannes Thumshirn wrote:

> On Wed, Jul 18, 2018 at 10:12:37AM -0400, Alan Stern wrote:
> > As one extra reminder...  People who care about extreme performance can 
> > perfectly well disable CONFIG_PM in their kernels.  That should remove 
> > any overhead due to runtime PM.
> 
> Yes but how likely is this for people who are running their data
> center with a distribution kernel.

Strictly speaking, that's a nonsense question.  People who run 
distribution kernels are 100% unlikely to change the kernel 
configuration, by definition.

That's not the point.  If people really care about getting the utmost 
performance, to the extent that they resent the minimal overhead 
required for runtime PM, then they should not be running distribution 
kernels in the first place.

Alan Stern

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH RFC V2 3/3] scsi_mq: enable runtime PM
  2018-07-18 14:50                   ` Jens Axboe
@ 2018-07-18 18:46                     ` Alan Stern
  2018-07-18 23:08                     ` Ming Lei
  1 sibling, 0 replies; 34+ messages in thread
From: Alan Stern @ 2018-07-18 18:46 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Johannes Thumshirn, Ming Lei, Ming Lei, Bart Van Assche, hch,
	linux-block, linux-pm, martin.petersen, hare, linux-scsi, rjw,
	gregkh, jejb, adrian.hunter

On Wed, 18 Jul 2018, Jens Axboe wrote:

> On 7/18/18 8:12 AM, Alan Stern wrote:
> > On Wed, 18 Jul 2018, Johannes Thumshirn wrote:
> > 
> >> On Wed, Jul 18, 2018 at 08:06:10PM +0800, Ming Lei wrote:
> >>> IMO the problem isn't related with slow or quick device, it is related with
> >>> the system, especially when it cares about power consumption, such as
> >>> mobile phone, or laptop or servers with lots of disks attached. And we know
> >>> it is often to see some fast disks shipped in laptop, such as NVMe, or other
> >>> SSD.
> >>
> >> Yes but you're only taking locally attached devices into account.
> >> This is very likely harmful on sd devices attached to a
> >> SRP/FC/iSCSI/SAS Target in a SAN environment. These can be really fast
> >> and people investing this money are likely to not like additional
> >> performance penalties. 
> > 
> > As one extra reminder...  People who care about extreme performance can 
> > perfectly well disable CONFIG_PM in their kernels.  That should remove 
> > any overhead due to runtime PM.
> 
> This is a kernel fallacy that needs to be shot down. Most folks just run
> what the distro provides, so no, they can't just turn off various kernel
> options. This is NEVER an argument that carries any weight at all for
> me. If we have the implementation, it needs to be fast enough to be on
> by default. No "well just turn off the option if you think the overhead
> is too high". Never.

I certainly agree that it needs to be fast enough to be on by default.  
That was never an issue, and it is the goal we are working toward.

But that isn't what Johannes wrote.  To rule out all improvements that
carry even the smallest runtime overhead is an extreme attitude.  It
does have its place, though -- and people who are in that place can
afford to build their own kernels with their own configurations.

> That's exactly why I didn't like the first implementation that Ming did.

Fine.  I'm happy to leave the technical details up to the two of you,
since I'm not really up to speed on this part of the kernel and Ming 
does have a keen understanding of the runtime PM implementation.  And
hopefully -- perhaps with a little outside brainstorming inspiration --
you'll be able to come up with something that is acceptable to
everyone.

Alan Stern

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH RFC V2 3/3] scsi_mq: enable runtime PM
  2018-07-18 14:50                   ` Jens Axboe
  2018-07-18 18:46                     ` Alan Stern
@ 2018-07-18 23:08                     ` Ming Lei
  1 sibling, 0 replies; 34+ messages in thread
From: Ming Lei @ 2018-07-18 23:08 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Alan Stern, Johannes Thumshirn, Ming Lei, Bart Van Assche, hch,
	linux-block, linux-pm, martin.petersen, hare, linux-scsi, rjw,
	gregkh, jejb, adrian.hunter

On Wed, Jul 18, 2018 at 08:50:39AM -0600, Jens Axboe wrote:
> On 7/18/18 8:12 AM, Alan Stern wrote:
> > On Wed, 18 Jul 2018, Johannes Thumshirn wrote:
> > 
> >> On Wed, Jul 18, 2018 at 08:06:10PM +0800, Ming Lei wrote:
> >>> IMO the problem isn't related with slow or quick device, it is related with
> >>> the system, especially when it cares about power consumption, such as
> >>> mobile phone, or laptop or servers with lots of disks attached. And we know
> >>> it is often to see some fast disks shipped in laptop, such as NVMe, or other
> >>> SSD.
> >>
> >> Yes but you're only taking locally attached devices into account.
> >> This is very likely harmful on sd devices attached to a
> >> SRP/FC/iSCSI/SAS Target in a SAN environment. These can be really fast
> >> and people investing this money are likely to not like additional
> >> performance penalties. 
> > 
> > As one extra reminder...  People who care about extreme performance can 
> > perfectly well disable CONFIG_PM in their kernels.  That should remove 
> > any overhead due to runtime PM.
> 
> This is a kernel fallacy that needs to be shot down. Most folks just run
> what the distro provides, so no, they can't just turn off various kernel
> options. This is NEVER an argument that carries any weight at all for
> me. If we have the implementation, it needs to be fast enough to be on
> by default. No "well just turn off the option if you think the overhead
> is too high". Never.
> 
> That's exactly why I didn't like the first implementation that Ming did.

Hi Jens,

As far as I can think of, pm_runtime_mark_last_busy() may have to be
called when completing each request, to help figure out whether the
queue is idle. Is this a cost you can accept?

static inline void pm_runtime_mark_last_busy(struct device *dev)
{
        WRITE_ONCE(dev->power.last_busy, jiffies);
}

BTW, last time you mentioned that we might reuse the approach used in
timeout handling for deciding idleness, but that period is too big.
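
The cost being discussed above can be simulated in userspace: one
relaxed store of "jiffies" per non-PM request completion, mirroring the
quoted pm_runtime_mark_last_busy(). This is a mock-up; jiffies, the
structs, and the function names here only imitate the kernel's, and
the PM-request skip is my reading of the proposed patch.

```c
#include <assert.h>
#include <stdatomic.h>

static unsigned long jiffies;   /* stand-in for the kernel's tick counter */

struct dev_pm_info { _Atomic unsigned long last_busy; };
struct device { struct dev_pm_info power; };

/* Equivalent of WRITE_ONCE(dev->power.last_busy, jiffies): a single
 * relaxed store, no locking, which is why the per-completion cost is
 * argued to be negligible. */
static inline void mark_last_busy(struct device *dev)
{
	atomic_store_explicit(&dev->power.last_busy, jiffies,
			      memory_order_relaxed);
}

/* Completion path: only non-PM requests refresh the idle timestamp,
 * so PM requests themselves never postpone autosuspend. */
static void complete_request(struct device *dev, int is_pm_request)
{
	if (!is_pm_request)
		mark_last_busy(dev);
}
```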

Thanks,
Ming

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH RFC V2 3/3] scsi_mq: enable runtime PM
  2018-07-18 15:01                     ` Alan Stern
@ 2018-07-19  6:41                       ` Johannes Thumshirn
  2018-07-19 14:35                         ` Alan Stern
  0 siblings, 1 reply; 34+ messages in thread
From: Johannes Thumshirn @ 2018-07-19  6:41 UTC (permalink / raw)
  To: Alan Stern
  Cc: Ming Lei, Jens Axboe, Ming Lei, Bart Van Assche, hch,
	linux-block, linux-pm, martin.petersen, hare, linux-scsi, rjw,
	gregkh, jejb, adrian.hunter

On Wed, Jul 18, 2018 at 11:01:52AM -0400, Alan Stern wrote:
> On Wed, 18 Jul 2018, Johannes Thumshirn wrote:
> 
> > On Wed, Jul 18, 2018 at 10:12:37AM -0400, Alan Stern wrote:
> > > As one extra reminder...  People who care about extreme performance can 
> > > perfectly well disable CONFIG_PM in their kernels.  That should remove 
> > > any overhead due to runtime PM.
> > 
> > Yes but how likely is this for people who are running their data
> > center with a distribution kernel.
> 
> Strictly speaking, that's a nonsense question.  People who run 
> distribution kernels are 100% unlikely to change the kernel 
> configuration, by definition.
> 
> That's not the point.  If people really care about getting the utmost 
> performance, to the extent that they resent the minimal overhead 
> required for runtime PM, then they should not be running distribution 
> kernels in the first place.

Can we agree to disagree here?

Some of the people I know in HPC, HFT, etc. run stock SLE/RHEL, and
they file bugs for every single 3-5% performance regression. This is
not something I made up; it's the kind of thing I have to deal with
in my day job.

So long,
   Johannes
-- 
Johannes Thumshirn                                          Storage
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH RFC V2 3/3] scsi_mq: enable runtime PM
  2018-07-19  6:41                       ` Johannes Thumshirn
@ 2018-07-19 14:35                         ` Alan Stern
  2018-07-19 14:43                           ` Johannes Thumshirn
  0 siblings, 1 reply; 34+ messages in thread
From: Alan Stern @ 2018-07-19 14:35 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: Ming Lei, Jens Axboe, Ming Lei, Bart Van Assche, hch,
	linux-block, linux-pm, martin.petersen, hare, linux-scsi, rjw,
	gregkh, jejb, adrian.hunter

On Thu, 19 Jul 2018, Johannes Thumshirn wrote:

> On Wed, Jul 18, 2018 at 11:01:52AM -0400, Alan Stern wrote:
> > On Wed, 18 Jul 2018, Johannes Thumshirn wrote:
> > 
> > > On Wed, Jul 18, 2018 at 10:12:37AM -0400, Alan Stern wrote:
> > > > As one extra reminder...  People who care about extreme performance can 
> > > > perfectly well disable CONFIG_PM in their kernels.  That should remove 
> > > > any overhead due to runtime PM.
> > > 
> > > Yes but how likely is this for people who are running their data
> > > center with a distribution kernel.
> > 
> > Strictly speaking, that's a nonsense question.  People who run 
> > distribution kernels are 100% unlikely to change the kernel 
> > configuration, by definition.
> > 
> > That's not the point.  If people really care about getting the utmost 
> > performance, to the extent that they resent the minimal overhead 
> > required for runtime PM, then they should not be running distribution 
> > kernels in the first place.
> 
> Can we agree to disagree here?
> 
> Some of the people I know in HPC, HFT, etc... run stock SLE/RHEL and
> they do file bugs for every single 3-5% performance regression. This
> is not something I made up, this is the kind of things I have to deal
> with on my day job.

The performance effect of runtime PM ought to be less than 1%.  If it
is larger, we'll figure out a different approach.

Alan Stern

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH RFC V2 3/3] scsi_mq: enable runtime PM
  2018-07-19 14:35                         ` Alan Stern
@ 2018-07-19 14:43                           ` Johannes Thumshirn
  0 siblings, 0 replies; 34+ messages in thread
From: Johannes Thumshirn @ 2018-07-19 14:43 UTC (permalink / raw)
  To: Alan Stern
  Cc: Ming Lei, Jens Axboe, Ming Lei, Bart Van Assche, hch,
	linux-block, linux-pm, martin.petersen, hare, linux-scsi, rjw,
	gregkh, jejb, adrian.hunter

On Thu, Jul 19, 2018 at 10:35:39AM -0400, Alan Stern wrote:
> On Thu, 19 Jul 2018, Johannes Thumshirn wrote:
> 
> The performance affect of runtime PM ought to be less than 1%.  If it
> is larger, we'll figure out a different approach.

OK, let's see then.

-- 
Johannes Thumshirn                                          Storage
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2018-07-19 14:43 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-07-13  8:05 [PATCH RFC V2 0/3] blk-mq: support runtime PM Ming Lei
2018-07-13  8:06 ` [PATCH RFC V2 1/3] block: put runtime PM code into common helpers Ming Lei
2018-07-17 13:21   ` Christoph Hellwig
2018-07-13  8:06 ` [PATCH RFC V2 2/3] blk-mq: prepare for supporting runtime PM Ming Lei
2018-07-13 20:16   ` Alan Stern
2018-07-17 13:23   ` Christoph Hellwig
2018-07-13  8:06 ` [PATCH RFC V2 3/3] scsi_mq: enable " Ming Lei
2018-07-17 13:24   ` Christoph Hellwig
2018-07-17 15:30     ` Ming Lei
2018-07-17 15:34       ` Bart Van Assche
2018-07-17 15:38         ` Ming Lei
2018-07-17 19:50           ` hch
2018-07-17 20:54             ` Alan Stern
2018-07-17 21:49           ` Jens Axboe
2018-07-18 12:06             ` Ming Lei
2018-07-18 12:28               ` Johannes Thumshirn
2018-07-18 12:37                 ` Ming Lei
2018-07-18 14:12                 ` Alan Stern
2018-07-18 14:18                   ` Johannes Thumshirn
2018-07-18 15:01                     ` Alan Stern
2018-07-19  6:41                       ` Johannes Thumshirn
2018-07-19 14:35                         ` Alan Stern
2018-07-19 14:43                           ` Johannes Thumshirn
2018-07-18 14:50                   ` Jens Axboe
2018-07-18 18:46                     ` Alan Stern
2018-07-18 23:08                     ` Ming Lei
2018-07-18 12:43               ` Hannes Reinecke
2018-07-18 13:05                 ` Ming Lei
2018-07-13 14:21 ` [PATCH RFC V2 0/3] blk-mq: support " Jens Axboe
2018-07-14  2:37   ` Ming Lei
2018-07-14  2:54     ` Jens Axboe
2018-07-16 16:21       ` Bart Van Assche
2018-07-16 16:03     ` Bart Van Assche
2018-07-17  1:12       ` Ming Lei
