* [PATCH v9 0/8] blk-mq: Implement runtime power management
@ 2018-09-19 22:45 Bart Van Assche
  2018-09-19 22:45 ` [PATCH v9 1/8] blk-mq: Document the functions that iterate over requests Bart Van Assche
                   ` (7 more replies)
  0 siblings, 8 replies; 14+ messages in thread
From: Bart Van Assche @ 2018-09-19 22:45 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, Christoph Hellwig, Bart Van Assche

Hello Jens,

One of the pieces that is missing before blk-mq can be made the default is
implementing runtime power management support for blk-mq.  This patch series
not only implements runtime power management for blk-mq but also fixes a
starvation issue in the power management code for the legacy block
layer. Please consider this patch series for the upstream kernel.

Thanks,

Bart.

Changes compared to v8:
- Fixed the race that was reported by Jianchao and Ming.
- Fixed another spelling issue in a source code comment.

Changes compared to v7:
- Addressed Jianchao's feedback about patch "Make blk_get_request() block for
  non-PM requests while suspended".
- Added two new patches - one that documents the functions that iterate over
  requests and one that introduces a new function that iterates over all
  requests associated with a queue.

Changes compared to v6:
- Left out the patches that split RQF_PREEMPT into three flags.
- Left out the patch that introduces the SCSI device state SDEV_SUSPENDED.
- Left out the patch that introduces blk_pm_runtime_exit().
- Restored the patch that changes the PREEMPT_ONLY flag into a counter.

Changes compared to v5:
- Introduced a new flag RQF_DV that replaces RQF_PREEMPT for SCSI domain
  validation.
- Introduced a new request queue state QUEUE_FLAG_DV_ONLY for SCSI domain
  validation.
- Instead of using SDEV_QUIESCE for both runtime suspend and SCSI domain
  validation, use that state for domain validation only and introduce a new
  state for runtime suspend, namely SDEV_SUSPENDED.
- Reallow system suspend during SCSI domain validation.
- Moved the runtime resume call from the request allocation code into
  blk_queue_enter().
- Instead of relying on q_usage_counter, iterate over the tag set to determine
  whether or not any requests are in flight.

Changes compared to v4:
- Dropped the patches "Give RQF_PREEMPT back its original meaning" and
  "Serialize queue freezing and blk_pre_runtime_suspend()".
- Replaced "percpu_ref_read()" with "percpu_is_in_use()".
- Inserted pm_request_resume() calls in the block layer request allocation
  code such that the context that submits a request no longer has to call
  pm_runtime_get().

Changes compared to v3:
- Avoid adverse interactions between system-wide suspend/resume and runtime
  power management by changing the PREEMPT_ONLY flag into a counter.
- Give RQF_PREEMPT back its original meaning, namely that it is only set for
  ide_preempt requests.
- Remove the flag BLK_MQ_REQ_PREEMPT.
- Removed the pm_request_resume() call.

Changes compared to v2:
- Fixed the build for CONFIG_BLOCK=n.
- Added a patch that introduces percpu_ref_read() in the percpu-refcount code.
- Added a patch that makes it easier to detect missing pm_runtime_get*() calls.
- Addressed Jianchao's feedback including the comment about runtime overhead
  of switching a per-cpu counter to atomic mode.

Changes compared to v1:
- Moved the runtime power management code into a separate file.
- Addressed Ming's feedback.

Bart Van Assche (8):
  blk-mq: Document the functions that iterate over requests
  blk-mq: Introduce blk_mq_queue_rq_iter()
  block: Move power management code into a new source file
  block, scsi: Change the preempt-only flag into a counter
  block: Split blk_pm_add_request() and blk_pm_put_request()
  block: Schedule runtime resume earlier
  block: Make blk_get_request() block for non-PM requests while
    suspended
  blk-mq: Enable support for runtime power management

 block/Kconfig           |   3 +
 block/Makefile          |   1 +
 block/blk-core.c        | 271 +++++-----------------------------------
 block/blk-mq-debugfs.c  |  10 +-
 block/blk-mq-tag.c      |  74 ++++++++++-
 block/blk-mq-tag.h      |   2 +
 block/blk-mq.c          |   2 +
 block/blk-pm.c          | 253 +++++++++++++++++++++++++++++++++++++
 block/blk-pm.h          |  69 ++++++++++
 block/elevator.c        |  22 +---
 drivers/scsi/scsi_lib.c |  11 +-
 drivers/scsi/scsi_pm.c  |   1 +
 drivers/scsi/sd.c       |   1 +
 drivers/scsi/sr.c       |   1 +
 include/linux/blk-pm.h  |  24 ++++
 include/linux/blkdev.h  |  37 ++----
 16 files changed, 484 insertions(+), 298 deletions(-)
 create mode 100644 block/blk-pm.c
 create mode 100644 block/blk-pm.h
 create mode 100644 include/linux/blk-pm.h

-- 
2.19.0.397.gdd90340f6a-goog

* [PATCH v9 1/8] blk-mq: Document the functions that iterate over requests
  2018-09-19 22:45 [PATCH v9 0/8] blk-mq: Implement runtime power management Bart Van Assche
@ 2018-09-19 22:45 ` Bart Van Assche
  2018-09-20  7:35   ` Johannes Thumshirn
  2018-09-19 22:45 ` [PATCH v9 2/8] blk-mq: Introduce blk_mq_queue_rq_iter() Bart Van Assche
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 14+ messages in thread
From: Bart Van Assche @ 2018-09-19 22:45 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Bart Van Assche, Ming Lei,
	Jianchao Wang, Hannes Reinecke, Johannes Thumshirn

Make the functions that iterate over requests easier to understand by
documenting their purpose. Also fix three minor spelling mistakes in
comments inside these functions.
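
As a usage illustration (not part of the patch): a blk-mq driver can pass
a callback to blk_mq_tagset_busy_iter() to inspect every started request
it owns. The sketch below uses made-up my_dev_* names; only the iterator
and callback signatures are taken from the code documented here.

#include <linux/blk-mq.h>

/* Sketch only: count the started requests in a driver's tag set. */
static void my_dev_count_started(struct request *rq, void *priv,
				 bool reserved)
{
	unsigned int *count = priv;

	/* @reserved tells us whether @rq carries a reserved tag. */
	(*count)++;
}

static unsigned int my_dev_busy_count(struct blk_mq_tag_set *set)
{
	unsigned int count = 0;

	blk_mq_tagset_busy_iter(set, my_dev_count_started, &count);
	return count;
}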

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
---
 block/blk-mq-tag.c | 30 +++++++++++++++++++++++++++---
 1 file changed, 27 insertions(+), 3 deletions(-)

diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 94e1ed667b6e..cb5db0c3cc32 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -232,13 +232,19 @@ static bool bt_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
 
 	/*
 	 * We can hit rq == NULL here, because the tagging functions
-	 * test and set the bit before assining ->rqs[].
+	 * test and set the bit before assigning ->rqs[].
 	 */
 	if (rq && rq->q == hctx->queue)
 		iter_data->fn(hctx, rq, iter_data->data, reserved);
 	return true;
 }
 
+/*
+ * Call function @fn(@hctx, rq, @data, @reserved) for each request queued on
+ * @hctx that has been assigned a driver tag. @reserved indicates whether @bt
+ * is the breserved_tags member or the bitmap_tags member of struct
+ * blk_mq_tags.
+ */
 static void bt_for_each(struct blk_mq_hw_ctx *hctx, struct sbitmap_queue *bt,
 			busy_iter_fn *fn, void *data, bool reserved)
 {
@@ -280,6 +286,11 @@ static bool bt_tags_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
 	return true;
 }
 
+/*
+ * Call function @fn(rq, @data, @reserved) for each request in @tags that has
+ * been started. @reserved indicates whether @bt is the breserved_tags member
+ * or the bitmap_tags member of struct blk_mq_tags.
+ */
 static void bt_tags_for_each(struct blk_mq_tags *tags, struct sbitmap_queue *bt,
 			     busy_tag_iter_fn *fn, void *data, bool reserved)
 {
@@ -294,6 +305,10 @@ static void bt_tags_for_each(struct blk_mq_tags *tags, struct sbitmap_queue *bt,
 		sbitmap_for_each_set(&bt->sb, bt_tags_iter, &iter_data);
 }
 
+/*
+ * Call @fn(rq, @priv, reserved) for each started request in @tags. 'reserved'
+ * indicates whether or not 'rq' is a reserved request.
+ */
 static void blk_mq_all_tag_busy_iter(struct blk_mq_tags *tags,
 		busy_tag_iter_fn *fn, void *priv)
 {
@@ -302,6 +317,10 @@ static void blk_mq_all_tag_busy_iter(struct blk_mq_tags *tags,
 	bt_tags_for_each(tags, &tags->bitmap_tags, fn, priv, false);
 }
 
+/*
+ * Call @fn(rq, @priv, reserved) for each request in @tagset. 'reserved'
+ * indicates whether or not 'rq' is a reserved request.
+ */
 void blk_mq_tagset_busy_iter(struct blk_mq_tag_set *tagset,
 		busy_tag_iter_fn *fn, void *priv)
 {
@@ -314,6 +333,11 @@ void blk_mq_tagset_busy_iter(struct blk_mq_tag_set *tagset,
 }
 EXPORT_SYMBOL(blk_mq_tagset_busy_iter);
 
+/*
+ * Call @fn(rq, @priv, reserved) for each request associated with request
+ * queue @q or any queue it shares tags with and that has been assigned a
+ * driver tag. 'reserved' indicates whether or not 'rq' is a reserved request.
+ */
 void blk_mq_queue_tag_busy_iter(struct request_queue *q, busy_iter_fn *fn,
 		void *priv)
 {
@@ -323,7 +347,7 @@ void blk_mq_queue_tag_busy_iter(struct request_queue *q, busy_iter_fn *fn,
 	/*
 	 * __blk_mq_update_nr_hw_queues will update the nr_hw_queues and
 	 * queue_hw_ctx after freeze the queue. So we could use q_usage_counter
-	 * to avoid race with it. __blk_mq_update_nr_hw_queues will users
+	 * to avoid race with it. __blk_mq_update_nr_hw_queues will use
 	 * synchronize_rcu to ensure all of the users go out of the critical
 	 * section below and see zeroed q_usage_counter.
 	 */
@@ -337,7 +361,7 @@ void blk_mq_queue_tag_busy_iter(struct request_queue *q, busy_iter_fn *fn,
 		struct blk_mq_tags *tags = hctx->tags;
 
 		/*
-		 * If not software queues are currently mapped to this
+		 * If no software queues are currently mapped to this
 		 * hardware queue, there's nothing to check
 		 */
 		if (!blk_mq_hw_queue_mapped(hctx))
-- 
2.19.0.397.gdd90340f6a-goog

* [PATCH v9 2/8] blk-mq: Introduce blk_mq_queue_rq_iter()
  2018-09-19 22:45 [PATCH v9 0/8] blk-mq: Implement runtime power management Bart Van Assche
  2018-09-19 22:45 ` [PATCH v9 1/8] blk-mq: Document the functions that iterate over requests Bart Van Assche
@ 2018-09-19 22:45 ` Bart Van Assche
  2018-09-19 22:45 ` [PATCH v9 3/8] block: Move power management code into a new source file Bart Van Assche
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 14+ messages in thread
From: Bart Van Assche @ 2018-09-19 22:45 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Bart Van Assche, Ming Lei,
	Jianchao Wang, Hannes Reinecke, Johannes Thumshirn, Alan Stern

This function will be used in the patch "Make blk_get_request() block for
non-PM requests while suspended".
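
As a preview of the intended use (a hedged sketch, not code from this
patch; patch 7 of this series adds essentially this helper to blk-pm.c):
a caller counts the requests that currently hold a tag, including
requests that only hold a scheduler tag.

static void count_rq(struct blk_mq_hw_ctx *hctx, struct request *rq,
		     void *priv, bool reserved)
{
	unsigned int *count = priv;

	if (rq->q == hctx->queue)
		(*count)++;
}

static unsigned int requests_allocated(struct request_queue *q)
{
	unsigned int count = 0;

	blk_mq_queue_rq_iter(q, count_rq, &count);
	return count;
}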

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Alan Stern <stern@rowland.harvard.edu>
---
 block/blk-mq-tag.c | 44 ++++++++++++++++++++++++++++++++++++++++++++
 block/blk-mq-tag.h |  2 ++
 2 files changed, 46 insertions(+)

diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index cb5db0c3cc32..f95b41b5f07a 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -374,6 +374,50 @@ void blk_mq_queue_tag_busy_iter(struct request_queue *q, busy_iter_fn *fn,
 	rcu_read_unlock();
 }
 
+/*
+ * Call @fn(rq, @priv, reserved) for each request associated with request
+ * queue @q or any queue that it shares tags with and that has been assigned a
+ * tag. 'reserved' indicates whether or not 'rq' is a reserved request. In
+ * contrast to blk_mq_queue_tag_busy_iter(), if an I/O scheduler has been
+ * associated with @q, this function also iterates over requests that have
+ * been assigned a scheduler tag but that have not yet been assigned a driver
+ * tag.
+ */
+void blk_mq_queue_rq_iter(struct request_queue *q, busy_iter_fn *fn, void *priv)
+{
+	struct blk_mq_hw_ctx *hctx;
+	int i;
+
+	/*
+	 * __blk_mq_update_nr_hw_queues() will update nr_hw_queues and
+	 * queue_hw_ctx after having frozen the request queue. So we can use
+	 * q_usage_counter to avoid a race with that
+	 * function. __blk_mq_update_nr_hw_queues() uses synchronize_rcu() to
+	 * ensure that this function leaves the critical section below.
+	 */
+	rcu_read_lock();
+	if (percpu_ref_is_zero(&q->q_usage_counter)) {
+		rcu_read_unlock();
+		return;
+	}
+
+	queue_for_each_hw_ctx(q, hctx, i) {
+		struct blk_mq_tags *tags = hctx->sched_tags ? : hctx->tags;
+
+		/*
+		 * If no software queues are currently mapped to this
+		 * hardware queue, there's nothing to check
+		 */
+		if (!blk_mq_hw_queue_mapped(hctx))
+			continue;
+
+		if (tags->nr_reserved_tags)
+			bt_for_each(hctx, &tags->breserved_tags, fn, priv, true);
+		bt_for_each(hctx, &tags->bitmap_tags, fn, priv, false);
+	}
+	rcu_read_unlock();
+}
+
 static int bt_alloc(struct sbitmap_queue *bt, unsigned int depth,
 		    bool round_robin, int node)
 {
diff --git a/block/blk-mq-tag.h b/block/blk-mq-tag.h
index 61deab0b5a5a..25e62997ed6c 100644
--- a/block/blk-mq-tag.h
+++ b/block/blk-mq-tag.h
@@ -35,6 +35,8 @@ extern int blk_mq_tag_update_depth(struct blk_mq_hw_ctx *hctx,
 extern void blk_mq_tag_wakeup_all(struct blk_mq_tags *tags, bool);
 void blk_mq_queue_tag_busy_iter(struct request_queue *q, busy_iter_fn *fn,
 		void *priv);
+void blk_mq_queue_rq_iter(struct request_queue *q, busy_iter_fn *fn,
+			  void *priv);
 
 static inline struct sbq_wait_state *bt_wait_ptr(struct sbitmap_queue *bt,
 						 struct blk_mq_hw_ctx *hctx)
-- 
2.19.0.397.gdd90340f6a-goog

* [PATCH v9 3/8] block: Move power management code into a new source file
  2018-09-19 22:45 [PATCH v9 0/8] blk-mq: Implement runtime power management Bart Van Assche
  2018-09-19 22:45 ` [PATCH v9 1/8] blk-mq: Document the functions that iterate over requests Bart Van Assche
  2018-09-19 22:45 ` [PATCH v9 2/8] blk-mq: Introduce blk_mq_queue_rq_iter() Bart Van Assche
@ 2018-09-19 22:45 ` Bart Van Assche
  2018-09-19 22:45 ` [PATCH v9 4/8] block, scsi: Change the preempt-only flag into a counter Bart Van Assche
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 14+ messages in thread
From: Bart Van Assche @ 2018-09-19 22:45 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Bart Van Assche, Ming Lei,
	Jianchao Wang, Hannes Reinecke, Johannes Thumshirn, Alan Stern

Move the code for runtime power management from blk-core.c into the
new source file blk-pm.c. Move the corresponding declarations from
<linux/blkdev.h> into <linux/blk-pm.h>. For CONFIG_PM=n, leave out
the declarations of the functions that are not used in that mode.
This patch not only reduces the number of #ifdefs in the block layer
core code but also shrinks the header file <linux/blkdev.h>, and hence
should help reduce the build time of the Linux kernel when CONFIG_PM
is not defined.
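
The moved functions keep their documented contract: call
blk_pre_runtime_suspend() near the start and blk_post_runtime_suspend()
near the end of a runtime_suspend callback. A minimal sketch of such a
callback follows; the mydev_* structure and functions are hypothetical,
and only the blk_*_runtime_suspend() calls reflect the API moved here.

#include <linux/blk-pm.h>

struct mydev {				/* hypothetical driver data */
	struct request_queue *queue;
};

static int mydev_quiesce_hw(struct mydev *md);	/* hypothetical */

static int mydev_runtime_suspend(struct device *dev)
{
	struct mydev *md = dev_get_drvdata(dev);
	int err;

	err = blk_pre_runtime_suspend(md->queue);
	if (err)
		return err;		/* -EBUSY: requests are pending */
	err = mydev_quiesce_hw(md);	/* hypothetical hardware suspend */
	blk_post_runtime_suspend(md->queue, err);
	return err;
}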

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Alan Stern <stern@rowland.harvard.edu>
---
 block/Kconfig          |   3 +
 block/Makefile         |   1 +
 block/blk-core.c       | 196 +----------------------------------------
 block/blk-pm.c         | 188 +++++++++++++++++++++++++++++++++++++++
 block/blk-pm.h         |  43 +++++++++
 block/elevator.c       |  22 +----
 drivers/scsi/scsi_pm.c |   1 +
 drivers/scsi/sd.c      |   1 +
 drivers/scsi/sr.c      |   1 +
 include/linux/blk-pm.h |  24 +++++
 include/linux/blkdev.h |  23 -----
 11 files changed, 264 insertions(+), 239 deletions(-)
 create mode 100644 block/blk-pm.c
 create mode 100644 block/blk-pm.h
 create mode 100644 include/linux/blk-pm.h

diff --git a/block/Kconfig b/block/Kconfig
index 1f2469a0123c..85263e7bded6 100644
--- a/block/Kconfig
+++ b/block/Kconfig
@@ -228,4 +228,7 @@ config BLK_MQ_RDMA
 	depends on BLOCK && INFINIBAND
 	default y
 
+config BLK_PM
+	def_bool BLOCK && PM
+
 source block/Kconfig.iosched
diff --git a/block/Makefile b/block/Makefile
index 572b33f32c07..27eac600474f 100644
--- a/block/Makefile
+++ b/block/Makefile
@@ -37,3 +37,4 @@ obj-$(CONFIG_BLK_WBT)		+= blk-wbt.o
 obj-$(CONFIG_BLK_DEBUG_FS)	+= blk-mq-debugfs.o
 obj-$(CONFIG_BLK_DEBUG_FS_ZONED)+= blk-mq-debugfs-zoned.o
 obj-$(CONFIG_BLK_SED_OPAL)	+= sed-opal.o
+obj-$(CONFIG_BLK_PM)		+= blk-pm.o
diff --git a/block/blk-core.c b/block/blk-core.c
index 4dbc93f43b38..6d4dd176bd9d 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -42,6 +42,7 @@
 #include "blk.h"
 #include "blk-mq.h"
 #include "blk-mq-sched.h"
+#include "blk-pm.h"
 #include "blk-rq-qos.h"
 
 #ifdef CONFIG_DEBUG_FS
@@ -1726,16 +1727,6 @@ void part_round_stats(struct request_queue *q, int cpu, struct hd_struct *part)
 }
 EXPORT_SYMBOL_GPL(part_round_stats);
 
-#ifdef CONFIG_PM
-static void blk_pm_put_request(struct request *rq)
-{
-	if (rq->q->dev && !(rq->rq_flags & RQF_PM) && !--rq->q->nr_pending)
-		pm_runtime_mark_last_busy(rq->q->dev);
-}
-#else
-static inline void blk_pm_put_request(struct request *rq) {}
-#endif
-
 void __blk_put_request(struct request_queue *q, struct request *req)
 {
 	req_flags_t rq_flags = req->rq_flags;
@@ -3757,191 +3748,6 @@ void blk_finish_plug(struct blk_plug *plug)
 }
 EXPORT_SYMBOL(blk_finish_plug);
 
-#ifdef CONFIG_PM
-/**
- * blk_pm_runtime_init - Block layer runtime PM initialization routine
- * @q: the queue of the device
- * @dev: the device the queue belongs to
- *
- * Description:
- *    Initialize runtime-PM-related fields for @q and start auto suspend for
- *    @dev. Drivers that want to take advantage of request-based runtime PM
- *    should call this function after @dev has been initialized, and its
- *    request queue @q has been allocated, and runtime PM for it can not happen
- *    yet(either due to disabled/forbidden or its usage_count > 0). In most
- *    cases, driver should call this function before any I/O has taken place.
- *
- *    This function takes care of setting up using auto suspend for the device,
- *    the autosuspend delay is set to -1 to make runtime suspend impossible
- *    until an updated value is either set by user or by driver. Drivers do
- *    not need to touch other autosuspend settings.
- *
- *    The block layer runtime PM is request based, so only works for drivers
- *    that use request as their IO unit instead of those directly use bio's.
- */
-void blk_pm_runtime_init(struct request_queue *q, struct device *dev)
-{
-	/* Don't enable runtime PM for blk-mq until it is ready */
-	if (q->mq_ops) {
-		pm_runtime_disable(dev);
-		return;
-	}
-
-	q->dev = dev;
-	q->rpm_status = RPM_ACTIVE;
-	pm_runtime_set_autosuspend_delay(q->dev, -1);
-	pm_runtime_use_autosuspend(q->dev);
-}
-EXPORT_SYMBOL(blk_pm_runtime_init);
-
-/**
- * blk_pre_runtime_suspend - Pre runtime suspend check
- * @q: the queue of the device
- *
- * Description:
- *    This function will check if runtime suspend is allowed for the device
- *    by examining if there are any requests pending in the queue. If there
- *    are requests pending, the device can not be runtime suspended; otherwise,
- *    the queue's status will be updated to SUSPENDING and the driver can
- *    proceed to suspend the device.
- *
- *    For the not allowed case, we mark last busy for the device so that
- *    runtime PM core will try to autosuspend it some time later.
- *
- *    This function should be called near the start of the device's
- *    runtime_suspend callback.
- *
- * Return:
- *    0		- OK to runtime suspend the device
- *    -EBUSY	- Device should not be runtime suspended
- */
-int blk_pre_runtime_suspend(struct request_queue *q)
-{
-	int ret = 0;
-
-	if (!q->dev)
-		return ret;
-
-	spin_lock_irq(q->queue_lock);
-	if (q->nr_pending) {
-		ret = -EBUSY;
-		pm_runtime_mark_last_busy(q->dev);
-	} else {
-		q->rpm_status = RPM_SUSPENDING;
-	}
-	spin_unlock_irq(q->queue_lock);
-	return ret;
-}
-EXPORT_SYMBOL(blk_pre_runtime_suspend);
-
-/**
- * blk_post_runtime_suspend - Post runtime suspend processing
- * @q: the queue of the device
- * @err: return value of the device's runtime_suspend function
- *
- * Description:
- *    Update the queue's runtime status according to the return value of the
- *    device's runtime suspend function and mark last busy for the device so
- *    that PM core will try to auto suspend the device at a later time.
- *
- *    This function should be called near the end of the device's
- *    runtime_suspend callback.
- */
-void blk_post_runtime_suspend(struct request_queue *q, int err)
-{
-	if (!q->dev)
-		return;
-
-	spin_lock_irq(q->queue_lock);
-	if (!err) {
-		q->rpm_status = RPM_SUSPENDED;
-	} else {
-		q->rpm_status = RPM_ACTIVE;
-		pm_runtime_mark_last_busy(q->dev);
-	}
-	spin_unlock_irq(q->queue_lock);
-}
-EXPORT_SYMBOL(blk_post_runtime_suspend);
-
-/**
- * blk_pre_runtime_resume - Pre runtime resume processing
- * @q: the queue of the device
- *
- * Description:
- *    Update the queue's runtime status to RESUMING in preparation for the
- *    runtime resume of the device.
- *
- *    This function should be called near the start of the device's
- *    runtime_resume callback.
- */
-void blk_pre_runtime_resume(struct request_queue *q)
-{
-	if (!q->dev)
-		return;
-
-	spin_lock_irq(q->queue_lock);
-	q->rpm_status = RPM_RESUMING;
-	spin_unlock_irq(q->queue_lock);
-}
-EXPORT_SYMBOL(blk_pre_runtime_resume);
-
-/**
- * blk_post_runtime_resume - Post runtime resume processing
- * @q: the queue of the device
- * @err: return value of the device's runtime_resume function
- *
- * Description:
- *    Update the queue's runtime status according to the return value of the
- *    device's runtime_resume function. If it is successfully resumed, process
- *    the requests that are queued into the device's queue when it is resuming
- *    and then mark last busy and initiate autosuspend for it.
- *
- *    This function should be called near the end of the device's
- *    runtime_resume callback.
- */
-void blk_post_runtime_resume(struct request_queue *q, int err)
-{
-	if (!q->dev)
-		return;
-
-	spin_lock_irq(q->queue_lock);
-	if (!err) {
-		q->rpm_status = RPM_ACTIVE;
-		__blk_run_queue(q);
-		pm_runtime_mark_last_busy(q->dev);
-		pm_request_autosuspend(q->dev);
-	} else {
-		q->rpm_status = RPM_SUSPENDED;
-	}
-	spin_unlock_irq(q->queue_lock);
-}
-EXPORT_SYMBOL(blk_post_runtime_resume);
-
-/**
- * blk_set_runtime_active - Force runtime status of the queue to be active
- * @q: the queue of the device
- *
- * If the device is left runtime suspended during system suspend the resume
- * hook typically resumes the device and corrects runtime status
- * accordingly. However, that does not affect the queue runtime PM status
- * which is still "suspended". This prevents processing requests from the
- * queue.
- *
- * This function can be used in driver's resume hook to correct queue
- * runtime PM status and re-enable peeking requests from the queue. It
- * should be called before first request is added to the queue.
- */
-void blk_set_runtime_active(struct request_queue *q)
-{
-	spin_lock_irq(q->queue_lock);
-	q->rpm_status = RPM_ACTIVE;
-	pm_runtime_mark_last_busy(q->dev);
-	pm_request_autosuspend(q->dev);
-	spin_unlock_irq(q->queue_lock);
-}
-EXPORT_SYMBOL(blk_set_runtime_active);
-#endif
-
 int __init blk_dev_init(void)
 {
 	BUILD_BUG_ON(REQ_OP_LAST >= (1 << REQ_OP_BITS));
diff --git a/block/blk-pm.c b/block/blk-pm.c
new file mode 100644
index 000000000000..9b636960d285
--- /dev/null
+++ b/block/blk-pm.c
@@ -0,0 +1,188 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/blk-pm.h>
+#include <linux/blkdev.h>
+#include <linux/pm_runtime.h>
+
+/**
+ * blk_pm_runtime_init - Block layer runtime PM initialization routine
+ * @q: the queue of the device
+ * @dev: the device the queue belongs to
+ *
+ * Description:
+ *    Initialize runtime-PM-related fields for @q and start auto suspend for
+ *    @dev. Drivers that want to take advantage of request-based runtime PM
+ *    should call this function after @dev has been initialized, and its
+ *    request queue @q has been allocated, and runtime PM for it can not happen
+ *    yet(either due to disabled/forbidden or its usage_count > 0). In most
+ *    cases, driver should call this function before any I/O has taken place.
+ *
+ *    This function takes care of setting up using auto suspend for the device,
+ *    the autosuspend delay is set to -1 to make runtime suspend impossible
+ *    until an updated value is either set by user or by driver. Drivers do
+ *    not need to touch other autosuspend settings.
+ *
+ *    The block layer runtime PM is request based, so only works for drivers
+ *    that use request as their IO unit instead of those directly use bio's.
+ */
+void blk_pm_runtime_init(struct request_queue *q, struct device *dev)
+{
+	/* Don't enable runtime PM for blk-mq until it is ready */
+	if (q->mq_ops) {
+		pm_runtime_disable(dev);
+		return;
+	}
+
+	q->dev = dev;
+	q->rpm_status = RPM_ACTIVE;
+	pm_runtime_set_autosuspend_delay(q->dev, -1);
+	pm_runtime_use_autosuspend(q->dev);
+}
+EXPORT_SYMBOL(blk_pm_runtime_init);
+
+/**
+ * blk_pre_runtime_suspend - Pre runtime suspend check
+ * @q: the queue of the device
+ *
+ * Description:
+ *    This function will check if runtime suspend is allowed for the device
+ *    by examining if there are any requests pending in the queue. If there
+ *    are requests pending, the device can not be runtime suspended; otherwise,
+ *    the queue's status will be updated to SUSPENDING and the driver can
+ *    proceed to suspend the device.
+ *
+ *    For the not allowed case, we mark last busy for the device so that
+ *    runtime PM core will try to autosuspend it some time later.
+ *
+ *    This function should be called near the start of the device's
+ *    runtime_suspend callback.
+ *
+ * Return:
+ *    0		- OK to runtime suspend the device
+ *    -EBUSY	- Device should not be runtime suspended
+ */
+int blk_pre_runtime_suspend(struct request_queue *q)
+{
+	int ret = 0;
+
+	if (!q->dev)
+		return ret;
+
+	spin_lock_irq(q->queue_lock);
+	if (q->nr_pending) {
+		ret = -EBUSY;
+		pm_runtime_mark_last_busy(q->dev);
+	} else {
+		q->rpm_status = RPM_SUSPENDING;
+	}
+	spin_unlock_irq(q->queue_lock);
+	return ret;
+}
+EXPORT_SYMBOL(blk_pre_runtime_suspend);
+
+/**
+ * blk_post_runtime_suspend - Post runtime suspend processing
+ * @q: the queue of the device
+ * @err: return value of the device's runtime_suspend function
+ *
+ * Description:
+ *    Update the queue's runtime status according to the return value of the
+ *    device's runtime suspend function and mark last busy for the device so
+ *    that PM core will try to auto suspend the device at a later time.
+ *
+ *    This function should be called near the end of the device's
+ *    runtime_suspend callback.
+ */
+void blk_post_runtime_suspend(struct request_queue *q, int err)
+{
+	if (!q->dev)
+		return;
+
+	spin_lock_irq(q->queue_lock);
+	if (!err) {
+		q->rpm_status = RPM_SUSPENDED;
+	} else {
+		q->rpm_status = RPM_ACTIVE;
+		pm_runtime_mark_last_busy(q->dev);
+	}
+	spin_unlock_irq(q->queue_lock);
+}
+EXPORT_SYMBOL(blk_post_runtime_suspend);
+
+/**
+ * blk_pre_runtime_resume - Pre runtime resume processing
+ * @q: the queue of the device
+ *
+ * Description:
+ *    Update the queue's runtime status to RESUMING in preparation for the
+ *    runtime resume of the device.
+ *
+ *    This function should be called near the start of the device's
+ *    runtime_resume callback.
+ */
+void blk_pre_runtime_resume(struct request_queue *q)
+{
+	if (!q->dev)
+		return;
+
+	spin_lock_irq(q->queue_lock);
+	q->rpm_status = RPM_RESUMING;
+	spin_unlock_irq(q->queue_lock);
+}
+EXPORT_SYMBOL(blk_pre_runtime_resume);
+
+/**
+ * blk_post_runtime_resume - Post runtime resume processing
+ * @q: the queue of the device
+ * @err: return value of the device's runtime_resume function
+ *
+ * Description:
+ *    Update the queue's runtime status according to the return value of the
+ *    device's runtime_resume function. If it is successfully resumed, process
+ *    the requests that are queued into the device's queue when it is resuming
+ *    and then mark last busy and initiate autosuspend for it.
+ *
+ *    This function should be called near the end of the device's
+ *    runtime_resume callback.
+ */
+void blk_post_runtime_resume(struct request_queue *q, int err)
+{
+	if (!q->dev)
+		return;
+
+	spin_lock_irq(q->queue_lock);
+	if (!err) {
+		q->rpm_status = RPM_ACTIVE;
+		__blk_run_queue(q);
+		pm_runtime_mark_last_busy(q->dev);
+		pm_request_autosuspend(q->dev);
+	} else {
+		q->rpm_status = RPM_SUSPENDED;
+	}
+	spin_unlock_irq(q->queue_lock);
+}
+EXPORT_SYMBOL(blk_post_runtime_resume);
+
+/**
+ * blk_set_runtime_active - Force runtime status of the queue to be active
+ * @q: the queue of the device
+ *
+ * If the device is left runtime suspended during system suspend the resume
+ * hook typically resumes the device and corrects runtime status
+ * accordingly. However, that does not affect the queue runtime PM status
+ * which is still "suspended". This prevents processing requests from the
+ * queue.
+ *
+ * This function can be used in driver's resume hook to correct queue
+ * runtime PM status and re-enable peeking requests from the queue. It
+ * should be called before first request is added to the queue.
+ */
+void blk_set_runtime_active(struct request_queue *q)
+{
+	spin_lock_irq(q->queue_lock);
+	q->rpm_status = RPM_ACTIVE;
+	pm_runtime_mark_last_busy(q->dev);
+	pm_request_autosuspend(q->dev);
+	spin_unlock_irq(q->queue_lock);
+}
+EXPORT_SYMBOL(blk_set_runtime_active);
diff --git a/block/blk-pm.h b/block/blk-pm.h
new file mode 100644
index 000000000000..1ffc8ef203ec
--- /dev/null
+++ b/block/blk-pm.h
@@ -0,0 +1,43 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef _BLOCK_BLK_PM_H_
+#define _BLOCK_BLK_PM_H_
+
+#include <linux/pm_runtime.h>
+
+#ifdef CONFIG_PM
+static inline void blk_pm_requeue_request(struct request *rq)
+{
+	if (rq->q->dev && !(rq->rq_flags & RQF_PM))
+		rq->q->nr_pending--;
+}
+
+static inline void blk_pm_add_request(struct request_queue *q,
+				      struct request *rq)
+{
+	if (q->dev && !(rq->rq_flags & RQF_PM) && q->nr_pending++ == 0 &&
+	    (q->rpm_status == RPM_SUSPENDED || q->rpm_status == RPM_SUSPENDING))
+		pm_request_resume(q->dev);
+}
+
+static inline void blk_pm_put_request(struct request *rq)
+{
+	if (rq->q->dev && !(rq->rq_flags & RQF_PM) && !--rq->q->nr_pending)
+		pm_runtime_mark_last_busy(rq->q->dev);
+}
+#else
+static inline void blk_pm_requeue_request(struct request *rq)
+{
+}
+
+static inline void blk_pm_add_request(struct request_queue *q,
+				      struct request *rq)
+{
+}
+
+static inline void blk_pm_put_request(struct request *rq)
+{
+}
+#endif
+
+#endif /* _BLOCK_BLK_PM_H_ */
diff --git a/block/elevator.c b/block/elevator.c
index 6a06b5d040e5..e18ac68626e3 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -41,6 +41,7 @@
 
 #include "blk.h"
 #include "blk-mq-sched.h"
+#include "blk-pm.h"
 #include "blk-wbt.h"
 
 static DEFINE_SPINLOCK(elv_list_lock);
@@ -557,27 +558,6 @@ void elv_bio_merged(struct request_queue *q, struct request *rq,
 		e->type->ops.sq.elevator_bio_merged_fn(q, rq, bio);
 }
 
-#ifdef CONFIG_PM
-static void blk_pm_requeue_request(struct request *rq)
-{
-	if (rq->q->dev && !(rq->rq_flags & RQF_PM))
-		rq->q->nr_pending--;
-}
-
-static void blk_pm_add_request(struct request_queue *q, struct request *rq)
-{
-	if (q->dev && !(rq->rq_flags & RQF_PM) && q->nr_pending++ == 0 &&
-	    (q->rpm_status == RPM_SUSPENDED || q->rpm_status == RPM_SUSPENDING))
-		pm_request_resume(q->dev);
-}
-#else
-static inline void blk_pm_requeue_request(struct request *rq) {}
-static inline void blk_pm_add_request(struct request_queue *q,
-				      struct request *rq)
-{
-}
-#endif
-
 void elv_requeue_request(struct request_queue *q, struct request *rq)
 {
 	/*
diff --git a/drivers/scsi/scsi_pm.c b/drivers/scsi/scsi_pm.c
index b44c1bb687a2..a2b4179bfdf7 100644
--- a/drivers/scsi/scsi_pm.c
+++ b/drivers/scsi/scsi_pm.c
@@ -8,6 +8,7 @@
 #include <linux/pm_runtime.h>
 #include <linux/export.h>
 #include <linux/async.h>
+#include <linux/blk-pm.h>
 
 #include <scsi/scsi.h>
 #include <scsi/scsi_device.h>
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index b79b366a94f7..64514e8359e4 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -45,6 +45,7 @@
 #include <linux/init.h>
 #include <linux/blkdev.h>
 #include <linux/blkpg.h>
+#include <linux/blk-pm.h>
 #include <linux/delay.h>
 #include <linux/mutex.h>
 #include <linux/string_helpers.h>
diff --git a/drivers/scsi/sr.c b/drivers/scsi/sr.c
index d0389b20574d..4f07b3410595 100644
--- a/drivers/scsi/sr.c
+++ b/drivers/scsi/sr.c
@@ -43,6 +43,7 @@
 #include <linux/interrupt.h>
 #include <linux/init.h>
 #include <linux/blkdev.h>
+#include <linux/blk-pm.h>
 #include <linux/mutex.h>
 #include <linux/slab.h>
 #include <linux/pm_runtime.h>
diff --git a/include/linux/blk-pm.h b/include/linux/blk-pm.h
new file mode 100644
index 000000000000..b80c65aba249
--- /dev/null
+++ b/include/linux/blk-pm.h
@@ -0,0 +1,24 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef _BLK_PM_H_
+#define _BLK_PM_H_
+
+struct device;
+struct request_queue;
+
+/*
+ * block layer runtime pm functions
+ */
+#ifdef CONFIG_PM
+extern void blk_pm_runtime_init(struct request_queue *q, struct device *dev);
+extern int blk_pre_runtime_suspend(struct request_queue *q);
+extern void blk_post_runtime_suspend(struct request_queue *q, int err);
+extern void blk_pre_runtime_resume(struct request_queue *q);
+extern void blk_post_runtime_resume(struct request_queue *q, int err);
+extern void blk_set_runtime_active(struct request_queue *q);
+#else
+static inline void blk_pm_runtime_init(struct request_queue *q,
+				       struct device *dev) {}
+#endif
+
+#endif /* _BLK_PM_H_ */
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 6980014357d4..e25ce42c982d 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1280,29 +1280,6 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id,
 extern void blk_put_queue(struct request_queue *);
 extern void blk_set_queue_dying(struct request_queue *);
 
-/*
- * block layer runtime pm functions
- */
-#ifdef CONFIG_PM
-extern void blk_pm_runtime_init(struct request_queue *q, struct device *dev);
-extern int blk_pre_runtime_suspend(struct request_queue *q);
-extern void blk_post_runtime_suspend(struct request_queue *q, int err);
-extern void blk_pre_runtime_resume(struct request_queue *q);
-extern void blk_post_runtime_resume(struct request_queue *q, int err);
-extern void blk_set_runtime_active(struct request_queue *q);
-#else
-static inline void blk_pm_runtime_init(struct request_queue *q,
-	struct device *dev) {}
-static inline int blk_pre_runtime_suspend(struct request_queue *q)
-{
-	return -ENOSYS;
-}
-static inline void blk_post_runtime_suspend(struct request_queue *q, int err) {}
-static inline void blk_pre_runtime_resume(struct request_queue *q) {}
-static inline void blk_post_runtime_resume(struct request_queue *q, int err) {}
-static inline void blk_set_runtime_active(struct request_queue *q) {}
-#endif
-
 /*
  * blk_plug permits building a queue of related requests by holding the I/O
  * fragments for a short period. This allows merging of sequential requests
-- 
2.19.0.397.gdd90340f6a-goog

* [PATCH v9 4/8] block, scsi: Change the preempt-only flag into a counter
  2018-09-19 22:45 [PATCH v9 0/8] blk-mq: Implement runtime power management Bart Van Assche
                   ` (2 preceding siblings ...)
  2018-09-19 22:45 ` [PATCH v9 3/8] block: Move power management code into a new source file Bart Van Assche
@ 2018-09-19 22:45 ` Bart Van Assche
  2018-09-19 22:45 ` [PATCH v9 5/8] block: Split blk_pm_add_request() and blk_pm_put_request() Bart Van Assche
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 14+ messages in thread
From: Bart Van Assche @ 2018-09-19 22:45 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Bart Van Assche,
	Martin K . Petersen, Ming Lei, Jianchao Wang, Johannes Thumshirn,
	Alan Stern

The RQF_PREEMPT flag is used for three purposes:
- In the SCSI core, for making sure that power management requests
  are executed even if a device is in the "quiesced" state.
- For domain validation by SCSI drivers that use the parallel port.
- In the IDE driver, for IDE preempt requests.
Rename "preempt-only" to "pm-only" because the primary purpose of
this mode is power management. The power management core may, but
does not have to, resume a runtime-suspended device before performing
a system-wide suspend, and a later patch will keep "pm-only" mode set
for as long as a block device is runtime suspended. Hence make it
possible to set "pm-only" mode from more than one context. Since with
this change scsi_device_quiesce() is no longer idempotent, make that
function return early if it is called for an already-quiesced queue.
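
Because pm_only is now a counter rather than a flag, independent
contexts can nest their "pm-only" periods. An illustration of the
semantics (a sketch, not code from this patch):

/* Two contexts each enter pm-only mode. */
blk_set_pm_only(q);	/* e.g. scsi_device_quiesce() */
blk_set_pm_only(q);	/* e.g. runtime suspend, added by a later patch */

/* blk_queue_enter() without BLK_MQ_REQ_PREEMPT blocks while pm_only > 0. */

blk_clear_pm_only(q);	/* counter 2 -> 1: still pm-only */
blk_clear_pm_only(q);	/* counter 1 -> 0: mq_freeze_wq waiters are woken */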

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Alan Stern <stern@rowland.harvard.edu>
---
 block/blk-core.c        | 35 ++++++++++++++++++-----------------
 block/blk-mq-debugfs.c  | 10 +++++++++-
 drivers/scsi/scsi_lib.c | 11 +++++++----
 include/linux/blkdev.h  | 14 +++++++++-----
 4 files changed, 43 insertions(+), 27 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 6d4dd176bd9d..1a691f5269bb 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -422,24 +422,25 @@ void blk_sync_queue(struct request_queue *q)
 EXPORT_SYMBOL(blk_sync_queue);
 
 /**
- * blk_set_preempt_only - set QUEUE_FLAG_PREEMPT_ONLY
+ * blk_set_pm_only - increment pm_only counter
  * @q: request queue pointer
- *
- * Returns the previous value of the PREEMPT_ONLY flag - 0 if the flag was not
- * set and 1 if the flag was already set.
  */
-int blk_set_preempt_only(struct request_queue *q)
+void blk_set_pm_only(struct request_queue *q)
 {
-	return blk_queue_flag_test_and_set(QUEUE_FLAG_PREEMPT_ONLY, q);
+	atomic_inc(&q->pm_only);
 }
-EXPORT_SYMBOL_GPL(blk_set_preempt_only);
+EXPORT_SYMBOL_GPL(blk_set_pm_only);
 
-void blk_clear_preempt_only(struct request_queue *q)
+void blk_clear_pm_only(struct request_queue *q)
 {
-	blk_queue_flag_clear(QUEUE_FLAG_PREEMPT_ONLY, q);
-	wake_up_all(&q->mq_freeze_wq);
+	int pm_only;
+
+	pm_only = atomic_dec_return(&q->pm_only);
+	WARN_ON_ONCE(pm_only < 0);
+	if (pm_only == 0)
+		wake_up_all(&q->mq_freeze_wq);
 }
-EXPORT_SYMBOL_GPL(blk_clear_preempt_only);
+EXPORT_SYMBOL_GPL(blk_clear_pm_only);
 
 /**
  * __blk_run_queue_uncond - run a queue whether or not it has been stopped
@@ -918,7 +919,7 @@ EXPORT_SYMBOL(blk_alloc_queue);
  */
 int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags)
 {
-	const bool preempt = flags & BLK_MQ_REQ_PREEMPT;
+	const bool pm = flags & BLK_MQ_REQ_PREEMPT;
 
 	while (true) {
 		bool success = false;
@@ -926,11 +927,11 @@ int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags)
 		rcu_read_lock();
 		if (percpu_ref_tryget_live(&q->q_usage_counter)) {
 			/*
-			 * The code that sets the PREEMPT_ONLY flag is
-			 * responsible for ensuring that that flag is globally
-			 * visible before the queue is unfrozen.
+			 * The code that increments the pm_only counter is
+			 * responsible for ensuring that that counter is
+			 * globally visible before the queue is unfrozen.
 			 */
-			if (preempt || !blk_queue_preempt_only(q)) {
+			if (pm || !blk_queue_pm_only(q)) {
 				success = true;
 			} else {
 				percpu_ref_put(&q->q_usage_counter);
@@ -955,7 +956,7 @@ int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags)
 
 		wait_event(q->mq_freeze_wq,
 			   (atomic_read(&q->mq_freeze_depth) == 0 &&
-			    (preempt || !blk_queue_preempt_only(q))) ||
+			    (pm || !blk_queue_pm_only(q))) ||
 			   blk_queue_dying(q));
 		if (blk_queue_dying(q))
 			return -ENODEV;
diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index cb1e6cf7ac48..a5ea86835fcb 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -102,6 +102,14 @@ static int blk_flags_show(struct seq_file *m, const unsigned long flags,
 	return 0;
 }
 
+static int queue_pm_only_show(void *data, struct seq_file *m)
+{
+	struct request_queue *q = data;
+
+	seq_printf(m, "%d\n", atomic_read(&q->pm_only));
+	return 0;
+}
+
 #define QUEUE_FLAG_NAME(name) [QUEUE_FLAG_##name] = #name
 static const char *const blk_queue_flag_name[] = {
 	QUEUE_FLAG_NAME(QUEUED),
@@ -132,7 +140,6 @@ static const char *const blk_queue_flag_name[] = {
 	QUEUE_FLAG_NAME(REGISTERED),
 	QUEUE_FLAG_NAME(SCSI_PASSTHROUGH),
 	QUEUE_FLAG_NAME(QUIESCED),
-	QUEUE_FLAG_NAME(PREEMPT_ONLY),
 };
 #undef QUEUE_FLAG_NAME
 
@@ -209,6 +216,7 @@ static ssize_t queue_write_hint_store(void *data, const char __user *buf,
 static const struct blk_mq_debugfs_attr blk_mq_debugfs_queue_attrs[] = {
 	{ "poll_stat", 0400, queue_poll_stat_show },
 	{ "requeue_list", 0400, .seq_ops = &queue_requeue_list_seq_ops },
+	{ "pm_only", 0600, queue_pm_only_show, NULL },
 	{ "state", 0600, queue_state_show, queue_state_write },
 	{ "write_hints", 0600, queue_write_hint_show, queue_write_hint_store },
 	{ "zone_wlock", 0400, queue_zone_wlock_show, NULL },
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index eb97d2dd3651..62348412ed1b 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -3046,11 +3046,14 @@ scsi_device_quiesce(struct scsi_device *sdev)
 	 */
 	WARN_ON_ONCE(sdev->quiesced_by && sdev->quiesced_by != current);
 
-	blk_set_preempt_only(q);
+	if (sdev->quiesced_by == current)
+		return 0;
+
+	blk_set_pm_only(q);
 
 	blk_mq_freeze_queue(q);
 	/*
-	 * Ensure that the effect of blk_set_preempt_only() will be visible
+	 * Ensure that the effect of blk_set_pm_only() will be visible
 	 * for percpu_ref_tryget() callers that occur after the queue
 	 * unfreeze even if the queue was already frozen before this function
 	 * was called. See also https://lwn.net/Articles/573497/.
@@ -3063,7 +3066,7 @@ scsi_device_quiesce(struct scsi_device *sdev)
 	if (err == 0)
 		sdev->quiesced_by = current;
 	else
-		blk_clear_preempt_only(q);
+		blk_clear_pm_only(q);
 	mutex_unlock(&sdev->state_mutex);
 
 	return err;
@@ -3088,7 +3091,7 @@ void scsi_device_resume(struct scsi_device *sdev)
 	mutex_lock(&sdev->state_mutex);
 	WARN_ON_ONCE(!sdev->quiesced_by);
 	sdev->quiesced_by = NULL;
-	blk_clear_preempt_only(sdev->request_queue);
+	blk_clear_pm_only(sdev->request_queue);
 	if (sdev->sdev_state == SDEV_QUIESCE)
 		scsi_device_set_state(sdev, SDEV_RUNNING);
 	mutex_unlock(&sdev->state_mutex);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index e25ce42c982d..5f6c36b5058c 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -504,6 +504,12 @@ struct request_queue {
 	 * various queue flags, see QUEUE_* below
 	 */
 	unsigned long		queue_flags;
+	/*
+	 * Number of contexts that have called blk_set_pm_only(). If this
+	 * counter is above zero then only RQF_PM and RQF_PREEMPT requests are
+	 * processed.
+	 */
+	atomic_t		pm_only;
 
 	/*
 	 * ida allocated id for this queue.  Used to index queues from
@@ -698,7 +704,6 @@ struct request_queue {
 #define QUEUE_FLAG_REGISTERED  26	/* queue has been registered to a disk */
 #define QUEUE_FLAG_SCSI_PASSTHROUGH 27	/* queue supports SCSI commands */
 #define QUEUE_FLAG_QUIESCED    28	/* queue has been quiesced */
-#define QUEUE_FLAG_PREEMPT_ONLY	29	/* only process REQ_PREEMPT requests */
 
 #define QUEUE_FLAG_DEFAULT	((1 << QUEUE_FLAG_IO_STAT) |		\
 				 (1 << QUEUE_FLAG_SAME_COMP)	|	\
@@ -736,12 +741,11 @@ bool blk_queue_flag_test_and_clear(unsigned int flag, struct request_queue *q);
 	((rq)->cmd_flags & (REQ_FAILFAST_DEV|REQ_FAILFAST_TRANSPORT| \
 			     REQ_FAILFAST_DRIVER))
 #define blk_queue_quiesced(q)	test_bit(QUEUE_FLAG_QUIESCED, &(q)->queue_flags)
-#define blk_queue_preempt_only(q)				\
-	test_bit(QUEUE_FLAG_PREEMPT_ONLY, &(q)->queue_flags)
+#define blk_queue_pm_only(q)	atomic_read(&(q)->pm_only)
 #define blk_queue_fua(q)	test_bit(QUEUE_FLAG_FUA, &(q)->queue_flags)
 
-extern int blk_set_preempt_only(struct request_queue *q);
-extern void blk_clear_preempt_only(struct request_queue *q);
+extern void blk_set_pm_only(struct request_queue *q);
+extern void blk_clear_pm_only(struct request_queue *q);
 
 static inline int queue_in_flight(struct request_queue *q)
 {
-- 
2.19.0.397.gdd90340f6a-goog

* [PATCH v9 5/8] block: Split blk_pm_add_request() and blk_pm_put_request()
  2018-09-19 22:45 [PATCH v9 0/8] blk-mq: Implement runtime power management Bart Van Assche
                   ` (3 preceding siblings ...)
  2018-09-19 22:45 ` [PATCH v9 4/8] block, scsi: Change the preempt-only flag into a counter Bart Van Assche
@ 2018-09-19 22:45 ` Bart Van Assche
  2018-09-19 22:45 ` [PATCH v9 6/8] block: Schedule runtime resume earlier Bart Van Assche
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 14+ messages in thread
From: Bart Van Assche @ 2018-09-19 22:45 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Bart Van Assche,
	Martin K . Petersen, Ming Lei, Jianchao Wang, Hannes Reinecke,
	Johannes Thumshirn, Alan Stern

Move the pm_request_resume() and pm_runtime_mark_last_busy() calls into
two new functions and thereby separate legacy block layer code from code
that works for both the legacy block layer and blk-mq. A later patch will
add calls to the new functions in the blk-mq code.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Alan Stern <stern@rowland.harvard.edu>
---
 block/blk-core.c |  1 +
 block/blk-pm.h   | 36 +++++++++++++++++++++++++++++++-----
 block/elevator.c |  1 +
 3 files changed, 33 insertions(+), 5 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 1a691f5269bb..fd91e9bf2893 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1744,6 +1744,7 @@ void __blk_put_request(struct request_queue *q, struct request *req)
 
 	blk_req_zone_write_unlock(req);
 	blk_pm_put_request(req);
+	blk_pm_mark_last_busy(req);
 
 	elv_completed_request(q, req);
 
diff --git a/block/blk-pm.h b/block/blk-pm.h
index 1ffc8ef203ec..a8564ea72a41 100644
--- a/block/blk-pm.h
+++ b/block/blk-pm.h
@@ -6,8 +6,23 @@
 #include <linux/pm_runtime.h>
 
 #ifdef CONFIG_PM
+static inline void blk_pm_request_resume(struct request_queue *q)
+{
+	if (q->dev && (q->rpm_status == RPM_SUSPENDED ||
+		       q->rpm_status == RPM_SUSPENDING))
+		pm_request_resume(q->dev);
+}
+
+static inline void blk_pm_mark_last_busy(struct request *rq)
+{
+	if (rq->q->dev && !(rq->rq_flags & RQF_PM))
+		pm_runtime_mark_last_busy(rq->q->dev);
+}
+
 static inline void blk_pm_requeue_request(struct request *rq)
 {
+	lockdep_assert_held(rq->q->queue_lock);
+
 	if (rq->q->dev && !(rq->rq_flags & RQF_PM))
 		rq->q->nr_pending--;
 }
@@ -15,17 +30,28 @@ static inline void blk_pm_requeue_request(struct request *rq)
 static inline void blk_pm_add_request(struct request_queue *q,
 				      struct request *rq)
 {
-	if (q->dev && !(rq->rq_flags & RQF_PM) && q->nr_pending++ == 0 &&
-	    (q->rpm_status == RPM_SUSPENDED || q->rpm_status == RPM_SUSPENDING))
-		pm_request_resume(q->dev);
+	lockdep_assert_held(q->queue_lock);
+
+	if (q->dev && !(rq->rq_flags & RQF_PM))
+		q->nr_pending++;
 }
 
 static inline void blk_pm_put_request(struct request *rq)
 {
-	if (rq->q->dev && !(rq->rq_flags & RQF_PM) && !--rq->q->nr_pending)
-		pm_runtime_mark_last_busy(rq->q->dev);
+	lockdep_assert_held(rq->q->queue_lock);
+
+	if (rq->q->dev && !(rq->rq_flags & RQF_PM))
+		--rq->q->nr_pending;
 }
 #else
+static inline void blk_pm_request_resume(struct request_queue *q)
+{
+}
+
+static inline void blk_pm_mark_last_busy(struct request *rq)
+{
+}
+
 static inline void blk_pm_requeue_request(struct request *rq)
 {
 }
diff --git a/block/elevator.c b/block/elevator.c
index e18ac68626e3..1c992bf6cfb1 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -601,6 +601,7 @@ void __elv_add_request(struct request_queue *q, struct request *rq, int where)
 	trace_block_rq_insert(q, rq);
 
 	blk_pm_add_request(q, rq);
+	blk_pm_request_resume(q);
 
 	rq->q = q;
 
-- 
2.19.0.397.gdd90340f6a-goog

* [PATCH v9 6/8] block: Schedule runtime resume earlier
  2018-09-19 22:45 [PATCH v9 0/8] blk-mq: Implement runtime power management Bart Van Assche
                   ` (4 preceding siblings ...)
  2018-09-19 22:45 ` [PATCH v9 5/8] block: Split blk_pm_add_request() and blk_pm_put_request() Bart Van Assche
@ 2018-09-19 22:45 ` Bart Van Assche
  2018-09-19 22:45 ` [PATCH v9 7/8] block: Make blk_get_request() block for non-PM requests while suspended Bart Van Assche
  2018-09-19 22:45 ` [PATCH v9 8/8] blk-mq: Enable support for runtime power management Bart Van Assche
  7 siblings, 0 replies; 14+ messages in thread
From: Bart Van Assche @ 2018-09-19 22:45 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Bart Van Assche, Ming Lei,
	Jianchao Wang, Hannes Reinecke, Johannes Thumshirn, Alan Stern

Instead of scheduling runtime resume of a request queue after a
request has been queued, schedule asynchronous resume during request
allocation. The new pm_request_resume() calls occur after
blk_queue_enter() has increased the q_usage_counter request queue
member. This change is needed for a later patch that will make request
allocation block while the queue status is not RPM_ACTIVE.
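
The resulting control flow can be summarized as follows (a sketch with
hypothetical helper names, not the literal code; see the diff below for
the actual change):

int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags)
{
	for (;;) {
		if (try_enter(q))	/* tryget q_usage_counter + pm-only check */
			return 0;
		blk_pm_request_resume(q);	/* schedule async runtime resume */
		if (flags & BLK_MQ_REQ_NOWAIT)
			return -EBUSY;
		wait_for_unfreeze_or_death(q);	/* wait_event() on mq_freeze_wq */
	}
}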

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Alan Stern <stern@rowland.harvard.edu>
---
 block/blk-core.c | 2 ++
 block/elevator.c | 1 -
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index fd91e9bf2893..18b874d5c9c9 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -942,6 +942,8 @@ int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags)
 		if (success)
 			return 0;
 
+		blk_pm_request_resume(q);
+
 		if (flags & BLK_MQ_REQ_NOWAIT)
 			return -EBUSY;
 
diff --git a/block/elevator.c b/block/elevator.c
index 1c992bf6cfb1..e18ac68626e3 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -601,7 +601,6 @@ void __elv_add_request(struct request_queue *q, struct request *rq, int where)
 	trace_block_rq_insert(q, rq);
 
 	blk_pm_add_request(q, rq);
-	blk_pm_request_resume(q);
 
 	rq->q = q;
 
-- 
2.19.0.397.gdd90340f6a-goog

* [PATCH v9 7/8] block: Make blk_get_request() block for non-PM requests while suspended
  2018-09-19 22:45 [PATCH v9 0/8] blk-mq: Implement runtime power management Bart Van Assche
                   ` (5 preceding siblings ...)
  2018-09-19 22:45 ` [PATCH v9 6/8] block: Schedule runtime resume earlier Bart Van Assche
@ 2018-09-19 22:45 ` Bart Van Assche
  2018-09-20  3:48   ` Ming Lei
  2018-09-19 22:45 ` [PATCH v9 8/8] blk-mq: Enable support for runtime power management Bart Van Assche
  7 siblings, 1 reply; 14+ messages in thread
From: Bart Van Assche @ 2018-09-19 22:45 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Bart Van Assche, Ming Lei,
	Jianchao Wang, Hannes Reinecke, Johannes Thumshirn, Alan Stern

Instead of allowing requests that are not power management requests
to enter the queue while it is runtime suspended (RPM_SUSPENDED), make
the blk_get_request() caller block. This change fixes a starvation
issue: it is now guaranteed that power management requests will be
executed no matter how many blk_get_request() callers are waiting.
For blk-mq, instead of maintaining the q->nr_pending counter, rely
on q->q_usage_counter. Call pm_runtime_mark_last_busy() every time a
request finishes instead of only when the queue depth drops to zero.
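
The effect is visible at allocation time. A hedged sketch (the flag
usage mirrors how the SCSI core marks power management requests):

struct request *normal_rq, *pm_rq;

/* A normal allocation now blocks until the queue has been resumed. */
normal_rq = blk_get_request(q, REQ_OP_DRV_IN, 0);

/* A PM request passes blk_queue_enter() even while pm_only is set. */
pm_rq = blk_get_request(q, REQ_OP_DRV_IN, BLK_MQ_REQ_PREEMPT);
pm_rq->rq_flags |= RQF_PM;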

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Alan Stern <stern@rowland.harvard.edu>
---
 block/blk-core.c | 37 +++++-----------------
 block/blk-pm.c   | 81 +++++++++++++++++++++++++++++++++++++++++++++---
 2 files changed, 84 insertions(+), 34 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 18b874d5c9c9..ae092ca121d5 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -2747,30 +2747,6 @@ void blk_account_io_done(struct request *req, u64 now)
 	}
 }
 
-#ifdef CONFIG_PM
-/*
- * Don't process normal requests when queue is suspended
- * or in the process of suspending/resuming
- */
-static bool blk_pm_allow_request(struct request *rq)
-{
-	switch (rq->q->rpm_status) {
-	case RPM_RESUMING:
-	case RPM_SUSPENDING:
-		return rq->rq_flags & RQF_PM;
-	case RPM_SUSPENDED:
-		return false;
-	default:
-		return true;
-	}
-}
-#else
-static bool blk_pm_allow_request(struct request *rq)
-{
-	return true;
-}
-#endif
-
 void blk_account_io_start(struct request *rq, bool new_io)
 {
 	struct hd_struct *part;
@@ -2816,11 +2792,14 @@ static struct request *elv_next_request(struct request_queue *q)
 
 	while (1) {
 		list_for_each_entry(rq, &q->queue_head, queuelist) {
-			if (blk_pm_allow_request(rq))
-				return rq;
-
-			if (rq->rq_flags & RQF_SOFTBARRIER)
-				break;
+#ifdef CONFIG_PM
+			/*
+			 * If a request gets queued in state RPM_SUSPENDED
+			 * then that's a kernel bug.
+			 */
+			WARN_ON_ONCE(q->rpm_status == RPM_SUSPENDED);
+#endif
+			return rq;
 		}
 
 		/*
diff --git a/block/blk-pm.c b/block/blk-pm.c
index 9b636960d285..3ad5a7334baa 100644
--- a/block/blk-pm.c
+++ b/block/blk-pm.c
@@ -1,8 +1,11 @@
 // SPDX-License-Identifier: GPL-2.0
 
+#include <linux/blk-mq.h>
 #include <linux/blk-pm.h>
 #include <linux/blkdev.h>
 #include <linux/pm_runtime.h>
+#include "blk-mq.h"
+#include "blk-mq-tag.h"
 
 /**
  * blk_pm_runtime_init - Block layer runtime PM initialization routine
@@ -40,6 +43,34 @@ void blk_pm_runtime_init(struct request_queue *q, struct device *dev)
 }
 EXPORT_SYMBOL(blk_pm_runtime_init);
 
+struct in_flight_data {
+	struct request_queue	*q;
+	int			in_flight;
+};
+
+static void blk_count_in_flight(struct blk_mq_hw_ctx *hctx, struct request *rq,
+				void *priv, bool reserved)
+{
+	struct in_flight_data *in_flight = priv;
+
+	if (rq->q == in_flight->q)
+		in_flight->in_flight++;
+}
+
+/*
+ * Count the number of requests that are in flight for request queue @q. Use
+ * @q->nr_pending for legacy queues. Iterate over the tag set for blk-mq
+ * queues.
+ */
+static int blk_requests_in_flight(struct request_queue *q)
+{
+	struct in_flight_data in_flight = { .q = q };
+
+	if (q->mq_ops)
+		blk_mq_queue_rq_iter(q, blk_count_in_flight, &in_flight);
+	return q->nr_pending + in_flight.in_flight;
+}
+
 /**
  * blk_pre_runtime_suspend - Pre runtime suspend check
  * @q: the queue of the device
@@ -68,14 +99,49 @@ int blk_pre_runtime_suspend(struct request_queue *q)
 	if (!q->dev)
 		return ret;
 
+	WARN_ON_ONCE(q->rpm_status != RPM_ACTIVE);
+
+	/*
+	 * Increase the pm_only counter before checking whether any
+	 * non-PM blk_queue_enter() calls are in progress to avoid that any
+	 * new non-PM blk_queue_enter() calls succeed before the pm_only
+	 * counter is decreased again.
+	 */
+	blk_set_pm_only(q);
+	/*
+	 * This function only gets called if the most recent request activity
+	 * occurred at least autosuspend_delay_ms ago. Since blk_queue_enter()
+	 * is called by the request allocation code before
+	 * pm_request_resume(), if no requests have a tag assigned it is safe
+	 * to suspend the device.
+	 */
+	ret = -EBUSY;
+	if (blk_requests_in_flight(q) == 0) {
+		blk_freeze_queue_start(q);
+		/*
+		 * Freezing a queue starts a transition of the queue
+		 * usage counter to atomic mode. Wait until atomic
+		 * mode has been reached. This involves calling
+		 * call_rcu(). That call guarantees that later
+		 * blk_queue_enter() calls see the pm-only state. See
+		 * also http://lwn.net/Articles/573497/.
+		 */
+		percpu_ref_switch_to_atomic_sync(&q->q_usage_counter);
+		if (percpu_ref_is_zero(&q->q_usage_counter))
+			ret = 0;
+		blk_mq_unfreeze_queue(q);
+	}
+
 	spin_lock_irq(q->queue_lock);
-	if (q->nr_pending) {
-		ret = -EBUSY;
+	if (ret < 0)
 		pm_runtime_mark_last_busy(q->dev);
-	} else {
+	else
 		q->rpm_status = RPM_SUSPENDING;
-	}
 	spin_unlock_irq(q->queue_lock);
+
+	if (ret)
+		blk_clear_pm_only(q);
+
 	return ret;
 }
 EXPORT_SYMBOL(blk_pre_runtime_suspend);
@@ -106,6 +172,9 @@ void blk_post_runtime_suspend(struct request_queue *q, int err)
 		pm_runtime_mark_last_busy(q->dev);
 	}
 	spin_unlock_irq(q->queue_lock);
+
+	if (err)
+		blk_clear_pm_only(q);
 }
 EXPORT_SYMBOL(blk_post_runtime_suspend);
 
@@ -153,13 +222,15 @@ void blk_post_runtime_resume(struct request_queue *q, int err)
 	spin_lock_irq(q->queue_lock);
 	if (!err) {
 		q->rpm_status = RPM_ACTIVE;
-		__blk_run_queue(q);
 		pm_runtime_mark_last_busy(q->dev);
 		pm_request_autosuspend(q->dev);
 	} else {
 		q->rpm_status = RPM_SUSPENDED;
 	}
 	spin_unlock_irq(q->queue_lock);
+
+	if (!err)
+		blk_clear_pm_only(q);
 }
 EXPORT_SYMBOL(blk_post_runtime_resume);
 
-- 
2.19.0.397.gdd90340f6a-goog
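
For context, a driver is expected to invoke these hooks from its
runtime-PM callbacks. A minimal sketch of that calling convention,
where dev_to_q() and hw_suspend() are hypothetical stand-ins for the
driver's own queue lookup and hardware quiesce code:

static int example_runtime_suspend(struct device *dev)
{
	struct request_queue *q = dev_to_q(dev);	/* hypothetical */
	int err;

	err = blk_pre_runtime_suspend(q);
	if (err)
		return err;		/* e.g. -EBUSY: requests in flight */
	err = hw_suspend(dev);		/* hypothetical device quiesce */
	blk_post_runtime_suspend(q, err);
	return err;
}

On success the queue ends up in RPM_SUSPENDED with pm-only mode left
set; on failure blk_post_runtime_suspend() restores RPM_ACTIVE and
clears the pm-only state again.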

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v9 8/8] blk-mq: Enable support for runtime power management
  2018-09-19 22:45 [PATCH v9 0/8] blk-mq: Implement runtime power management Bart Van Assche
                   ` (6 preceding siblings ...)
  2018-09-19 22:45 ` [PATCH v9 7/8] block: Make blk_get_request() block for non-PM requests while suspended Bart Van Assche
@ 2018-09-19 22:45 ` Bart Van Assche
  7 siblings, 0 replies; 14+ messages in thread
From: Bart Van Assche @ 2018-09-19 22:45 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Bart Van Assche, Ming Lei,
	Jianchao Wang, Hannes Reinecke, Johannes Thumshirn, Alan Stern

Now that the blk-mq core processes power management requests
(marked with RQF_PREEMPT) in states other than RPM_ACTIVE, enable
runtime power management for blk-mq.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Alan Stern <stern@rowland.harvard.edu>
---
 block/blk-mq.c | 2 ++
 block/blk-pm.c | 6 ------
 2 files changed, 2 insertions(+), 6 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 85a1c1a59c72..20fdd78b75c7 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -33,6 +33,7 @@
 #include "blk-mq.h"
 #include "blk-mq-debugfs.h"
 #include "blk-mq-tag.h"
+#include "blk-pm.h"
 #include "blk-stat.h"
 #include "blk-mq-sched.h"
 #include "blk-rq-qos.h"
@@ -475,6 +476,7 @@ static void __blk_mq_free_request(struct request *rq)
 	struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(q, ctx->cpu);
 	const int sched_tag = rq->internal_tag;
 
+	blk_pm_mark_last_busy(rq);
 	if (rq->tag != -1)
 		blk_mq_put_tag(hctx, hctx->tags, ctx, rq->tag);
 	if (sched_tag != -1)
diff --git a/block/blk-pm.c b/block/blk-pm.c
index 3ad5a7334baa..7c0ac1424f46 100644
--- a/block/blk-pm.c
+++ b/block/blk-pm.c
@@ -30,12 +30,6 @@
  */
 void blk_pm_runtime_init(struct request_queue *q, struct device *dev)
 {
-	/* Don't enable runtime PM for blk-mq until it is ready */
-	if (q->mq_ops) {
-		pm_runtime_disable(dev);
-		return;
-	}
-
 	q->dev = dev;
 	q->rpm_status = RPM_ACTIVE;
 	pm_runtime_set_autosuspend_delay(q->dev, -1);
-- 
2.19.0.397.gdd90340f6a-goog
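
blk_pm_mark_last_busy() is introduced earlier in this series in
block/blk-pm.h; reconstructed from patch 7/8's description ("call
pm_runtime_mark_last_busy() every time a request finishes"), it reduces
to roughly this sketch:

static inline void blk_pm_mark_last_busy(struct request *rq)
{
	if (rq->q->dev && !(rq->rq_flags & RQF_PM))
		pm_runtime_mark_last_busy(rq->q->dev);
}

i.e. completing any non-PM request refreshes the autosuspend timer, so
the device is only considered idle once no normal request has completed
for autosuspend_delay_ms.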

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH v9 7/8] block: Make blk_get_request() block for non-PM requests while suspended
  2018-09-19 22:45 ` [PATCH v9 7/8] block: Make blk_get_request() block for non-PM requests while suspended Bart Van Assche
@ 2018-09-20  3:48   ` Ming Lei
  2018-09-20 15:23     ` Bart Van Assche
  0 siblings, 1 reply; 14+ messages in thread
From: Ming Lei @ 2018-09-20  3:48 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Jianchao Wang,
	Hannes Reinecke, Johannes Thumshirn, Alan Stern

On Wed, Sep 19, 2018 at 03:45:29PM -0700, Bart Van Assche wrote:
> Instead of allowing requests that are not power management requests
> to enter the queue in runtime suspended status (RPM_SUSPENDED), make
> the blk_get_request() caller block. This change fixes a starvation
> issue: it is now guaranteed that power management requests will be
> executed no matter how many blk_get_request() callers are waiting.
> For blk-mq, instead of maintaining the q->nr_pending counter, rely
> on q->q_usage_counter. Call pm_runtime_mark_last_busy() every time a
> request finishes instead of only if the queue depth drops to zero.
> 
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Ming Lei <ming.lei@redhat.com>
> Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
> Cc: Hannes Reinecke <hare@suse.com>
> Cc: Johannes Thumshirn <jthumshirn@suse.de>
> Cc: Alan Stern <stern@rowland.harvard.edu>
> ---
>  block/blk-core.c | 37 +++++-----------------
>  block/blk-pm.c   | 81 +++++++++++++++++++++++++++++++++++++++++++++---
>  2 files changed, 84 insertions(+), 34 deletions(-)
> 
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 18b874d5c9c9..ae092ca121d5 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -2747,30 +2747,6 @@ void blk_account_io_done(struct request *req, u64 now)
>  	}
>  }
>  
> -#ifdef CONFIG_PM
> -/*
> - * Don't process normal requests when queue is suspended
> - * or in the process of suspending/resuming
> - */
> -static bool blk_pm_allow_request(struct request *rq)
> -{
> -	switch (rq->q->rpm_status) {
> -	case RPM_RESUMING:
> -	case RPM_SUSPENDING:
> -		return rq->rq_flags & RQF_PM;
> -	case RPM_SUSPENDED:
> -		return false;
> -	default:
> -		return true;
> -	}
> -}
> -#else
> -static bool blk_pm_allow_request(struct request *rq)
> -{
> -	return true;
> -}
> -#endif
> -
>  void blk_account_io_start(struct request *rq, bool new_io)
>  {
>  	struct hd_struct *part;
> @@ -2816,11 +2792,14 @@ static struct request *elv_next_request(struct request_queue *q)
>  
>  	while (1) {
>  		list_for_each_entry(rq, &q->queue_head, queuelist) {
> -			if (blk_pm_allow_request(rq))
> -				return rq;
> -
> -			if (rq->rq_flags & RQF_SOFTBARRIER)
> -				break;
> +#ifdef CONFIG_PM
> +			/*
> +			 * If a request gets queued in state RPM_SUSPENDED
> +			 * then that's a kernel bug.
> +			 */
> +			WARN_ON_ONCE(q->rpm_status == RPM_SUSPENDED);
> +#endif
> +			return rq;
>  		}
>  
>  		/*
> diff --git a/block/blk-pm.c b/block/blk-pm.c
> index 9b636960d285..3ad5a7334baa 100644
> --- a/block/blk-pm.c
> +++ b/block/blk-pm.c
> @@ -1,8 +1,11 @@
>  // SPDX-License-Identifier: GPL-2.0
>  
> +#include <linux/blk-mq.h>
>  #include <linux/blk-pm.h>
>  #include <linux/blkdev.h>
>  #include <linux/pm_runtime.h>
> +#include "blk-mq.h"
> +#include "blk-mq-tag.h"
>  
>  /**
>   * blk_pm_runtime_init - Block layer runtime PM initialization routine
> @@ -40,6 +43,34 @@ void blk_pm_runtime_init(struct request_queue *q, struct device *dev)
>  }
>  EXPORT_SYMBOL(blk_pm_runtime_init);
>  
> +struct in_flight_data {
> +	struct request_queue	*q;
> +	int			in_flight;
> +};
> +
> +static void blk_count_in_flight(struct blk_mq_hw_ctx *hctx, struct request *rq,
> +				void *priv, bool reserved)
> +{
> +	struct in_flight_data *in_flight = priv;
> +
> +	if (rq->q == in_flight->q)
> +		in_flight->in_flight++;
> +}
> +
> +/*
> + * Count the number of requests that are in flight for request queue @q. Use
> + * @q->nr_pending for legacy queues. Iterate over the tag set for blk-mq
> + * queues.
> + */
> +static int blk_requests_in_flight(struct request_queue *q)
> +{
> +	struct in_flight_data in_flight = { .q = q };
> +
> +	if (q->mq_ops)
> +		blk_mq_queue_rq_iter(q, blk_count_in_flight, &in_flight);
> +	return q->nr_pending + in_flight.in_flight;
> +}
> +
>  /**
>   * blk_pre_runtime_suspend - Pre runtime suspend check
>   * @q: the queue of the device
> @@ -68,14 +99,49 @@ int blk_pre_runtime_suspend(struct request_queue *q)
>  	if (!q->dev)
>  		return ret;
>  
> +	WARN_ON_ONCE(q->rpm_status != RPM_ACTIVE);
> +
> +	/*
> +	 * Increase the pm_only counter before checking whether any
> +	 * non-PM blk_queue_enter() calls are in progress, so that no
> +	 * new non-PM blk_queue_enter() call can succeed before the
> +	 * pm_only counter is decreased again.
> +	 */
> +	blk_set_pm_only(q);
> +	/*
> +	 * This function only gets called if the most recent request activity
> +	 * occurred at least autosuspend_delay_ms ago. Since blk_queue_enter()
> +	 * is called by the request allocation code before
> +	 * pm_request_resume(), if no requests have a tag assigned it is safe
> +	 * to suspend the device.
> +	 */
> +	ret = -EBUSY;
> +	if (blk_requests_in_flight(q) == 0) {
> +		blk_freeze_queue_start(q);
> +		/*
> +		 * Freezing a queue starts a transition of the queue
> +		 * usage counter to atomic mode. Wait until atomic
> +		 * mode has been reached. This involves calling
> +		 * call_rcu(). That call guarantees that later
> +		 * blk_queue_enter() calls see the pm-only state. See
> +		 * also http://lwn.net/Articles/573497/.
> +		 */
> +		percpu_ref_switch_to_atomic_sync(&q->q_usage_counter);
> +		if (percpu_ref_is_zero(&q->q_usage_counter))
> +			ret = 0;
> +		blk_mq_unfreeze_queue(q);

Tejun doesn't agree with this kind of usage yet, so the ref has to be
dropped before blk_mq_unfreeze_queue() is called.

Also, this approach still doesn't address the race described in the following link:

https://marc.info/?l=linux-block&m=153732992701093&w=2

Thanks,
Ming

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v9 1/8] blk-mq: Document the functions that iterate over requests
  2018-09-19 22:45 ` [PATCH v9 1/8] blk-mq: Document the functions that iterate over requests Bart Van Assche
@ 2018-09-20  7:35   ` Johannes Thumshirn
  2018-09-20 15:30     ` Bart Van Assche
  0 siblings, 1 reply; 14+ messages in thread
From: Johannes Thumshirn @ 2018-09-20  7:35 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Ming Lei,
	Jianchao Wang, Hannes Reinecke

On Wed, Sep 19, 2018 at 03:45:23PM -0700, Bart Van Assche wrote:
> Make it easier to understand the purpose of the functions that iterate
> over requests by documenting their purpose. Fix three minor spelling
> mistakes in comments in these functions.

[...]

>  
> +/*
> + * Call function @fn(@hctx, rq, @data, @reserved) for each request queued on
> + * @hctx that has been assigned a driver tag. @reserved indicates whether @bt
> + * is the breserved_tags member or the bitmap_tags member of struct
> + * blk_mq_tags.
> + */
>  static void bt_for_each(struct blk_mq_hw_ctx *hctx, struct sbitmap_queue *bt,
>  			busy_iter_fn *fn, void *data, bool reserved)

One minor nit, and only if you have to re-send the series: can you
please also add the usual kernel-doc headers to the comments, i.e.:


/**
 * bt_for_each() - run callback on each hctx with a driver tag
 * @hctx:	the hardware context
 * @bt:		the sbitmap_queue
 * @fn:		the callback function to run
 * @data:	the data for the callback function
 * @reserved:	is reserved or not
 *
 * Call function @fn(@hctx, rq, @data, @reserved) for each request queued on
 * @hctx that has been assigned a driver tag. @reserved indicates whether @bt
 * is the breserved_tags member or the bitmap_tags member of struct
 * blk_mq_tags.
 */
static void bt_for_each(struct blk_mq_hw_ctx *hctx, struct sbitmap_queue *bt,
  			busy_iter_fn *fn, void *data, bool reserved)


Otherwise:
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
-- 
Johannes Thumshirn                                          Storage
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v9 7/8] block: Make blk_get_request() block for non-PM requests while suspended
  2018-09-20  3:48   ` Ming Lei
@ 2018-09-20 15:23     ` Bart Van Assche
  2018-09-21  0:54       ` Ming Lei
  0 siblings, 1 reply; 14+ messages in thread
From: Bart Van Assche @ 2018-09-20 15:23 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Jianchao Wang,
	Hannes Reinecke, Johannes Thumshirn, Alan Stern

On Thu, 2018-09-20 at 11:48 +0800, Ming Lei wrote:
> On Wed, Sep 19, 2018 at 03:45:29PM -0700, Bart Van Assche wrote:
> > +	ret = -EBUSY;
> > +	if (blk_requests_in_flight(q) == 0) {
> > +		blk_freeze_queue_start(q);
> > +		/*
> > +		 * Freezing a queue starts a transition of the queue
> > +		 * usage counter to atomic mode. Wait until atomic
> > +		 * mode has been reached. This involves calling
> > +		 * call_rcu(). That call guarantees that later
> > +		 * blk_queue_enter() calls see the pm-only state. See
> > +		 * also http://lwn.net/Articles/573497/.
> > +		 */
> > +		percpu_ref_switch_to_atomic_sync(&q->q_usage_counter);
> > +		if (percpu_ref_is_zero(&q->q_usage_counter))
> > +			ret = 0;
> > +		blk_mq_unfreeze_queue(q);
> 
> Tejun doesn't agree with this kind of usage yet, so the ref has to be
> dropped before blk_mq_unfreeze_queue() is called.

I read all of Tejun's recent e-mails but have not found any e-mail in
which he writes that he disagrees with the above pattern.

> Also, this approach still doesn't address the race described in the following link:
> 
> https://marc.info/?l=linux-block&m=153732992701093&w=2

I think that the following patch is sufficient to fix that race:

diff --git a/block/blk-core.c b/block/blk-core.c
index ae092ca121d5..16dd3a989753 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -942,8 +942,6 @@ int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags)
 		if (success)
 			return 0;
 
-		blk_pm_request_resume(q);
-
 		if (flags & BLK_MQ_REQ_NOWAIT)
 			return -EBUSY;
 
@@ -958,7 +956,8 @@ int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags)
 
 		wait_event(q->mq_freeze_wq,
 			   (atomic_read(&q->mq_freeze_depth) == 0 &&
-			    (pm || !blk_queue_pm_only(q))) ||
+			    (pm || (blk_pm_request_resume(q),
+				    !blk_queue_pm_only(q)))) ||
 			   blk_queue_dying(q));
 		if (blk_queue_dying(q))
 			return -ENODEV;
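
For clarity: this condition relies on the C comma operator, so
blk_pm_request_resume(q) is evaluated purely for its side effect and
the parenthesized expression takes the value of !blk_queue_pm_only(q).
A standalone sketch of that evaluation order, using stand-in functions
rather than the real kernel APIs:

#include <stdbool.h>
#include <stdio.h>

static bool resumed;

static void request_resume(void)	/* stands in for blk_pm_request_resume() */
{
	resumed = true;
}

static bool pm_only(void)		/* stands in for blk_queue_pm_only() */
{
	return !resumed;
}

int main(void)
{
	bool pm = false;

	/* request_resume() runs first for its side effect; the value of
	 * the condition is !pm_only(), evaluated afterwards. */
	if (pm || (request_resume(), !pm_only()))
		puts("resume requested before the pm-only recheck");
	return 0;
}

The effect is that every pass through the wait loop triggers a runtime
resume before re-checking the pm-only state, so a waiter can no longer
sleep forever on a queue that nothing else will resume.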

Bart.

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH v9 1/8] blk-mq: Document the functions that iterate over requests
  2018-09-20  7:35   ` Johannes Thumshirn
@ 2018-09-20 15:30     ` Bart Van Assche
  0 siblings, 0 replies; 14+ messages in thread
From: Bart Van Assche @ 2018-09-20 15:30 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Ming Lei,
	Jianchao Wang, Hannes Reinecke

On Thu, 2018-09-20 at 09:35 +0200, Johannes Thumshirn wrote:
> On Wed, Sep 19, 2018 at 03:45:23PM -0700, Bart Van Assche wrote: 
> > +/*
> > + * Call function @fn(@hctx, rq, @data, @reserved) for each request queued on
> > + * @hctx that has been assigned a driver tag. @reserved indicates whether @bt
> > + * is the breserved_tags member or the bitmap_tags member of struct
> > + * blk_mq_tags.
> > + */
> >  static void bt_for_each(struct blk_mq_hw_ctx *hctx, struct sbitmap_queue *bt,
> >  			busy_iter_fn *fn, void *data, bool reserved)
> 
> One minor nit and only if you have to re-send the series, can you
> please also add the usual kernel-doc headers to the comments, i.e.:

OK, I will use the kernel-doc format.

> Otherwise:
> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>

Thanks!

Bart.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v9 7/8] block: Make blk_get_request() block for non-PM requests while suspended
  2018-09-20 15:23     ` Bart Van Assche
@ 2018-09-21  0:54       ` Ming Lei
  0 siblings, 0 replies; 14+ messages in thread
From: Ming Lei @ 2018-09-21  0:54 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Jianchao Wang,
	Hannes Reinecke, Johannes Thumshirn, Alan Stern

On Thu, Sep 20, 2018 at 08:23:10AM -0700, Bart Van Assche wrote:
> On Thu, 2018-09-20 at 11:48 +0800, Ming Lei wrote:
> > On Wed, Sep 19, 2018 at 03:45:29PM -0700, Bart Van Assche wrote:
> > > +	ret = -EBUSY;
> > > +	if (blk_requests_in_flight(q) == 0) {
> > > +		blk_freeze_queue_start(q);
> > > +		/*
> > > +		 * Freezing a queue starts a transition of the queue
> > > +		 * usage counter to atomic mode. Wait until atomic
> > > +		 * mode has been reached. This involves calling
> > > +		 * call_rcu(). That call guarantees that later
> > > +		 * blk_queue_enter() calls see the pm-only state. See
> > > +		 * also http://lwn.net/Articles/573497/.
> > > +		 */
> > > +		percpu_ref_switch_to_atomic_sync(&q->q_usage_counter);
> > > +		if (percpu_ref_is_zero(&q->q_usage_counter))
> > > +			ret = 0;
> > > +		blk_mq_unfreeze_queue(q);
> > 
> > Tejun doesn't agree with this kind of usage yet, so the ref has to be
> > dropped before blk_mq_unfreeze_queue() is called.
> 
> I read all of Tejun's recent e-mails but have not found any e-mail in
> which he writes that he disagrees with the above pattern.

As it stands, blk_mq_unfreeze_queue() may only be called when
percpu_ref_is_zero(&q->q_usage_counter) is true.

I am trying to relax this restriction in the following patch, but Tejun
objects to this approach.

https://marc.info/?l=linux-block&m=153726600813090&w=2
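
For reference, the pairing that blk_mq_unfreeze_queue() currently
assumes looks like the following minimal sketch, where
blk_mq_freeze_queue_wait() blocks until q_usage_counter has dropped to
zero:

#include <linux/blk-mq.h>
#include <linux/blkdev.h>

static void example_full_freeze(struct request_queue *q)
{
	blk_freeze_queue_start(q);
	blk_mq_freeze_queue_wait(q);	/* wait for q_usage_counter == 0 */
	/* ... the queue is fully frozen here ... */
	blk_mq_unfreeze_queue(q);
}

whereas blk_pre_runtime_suspend() above may call blk_mq_unfreeze_queue()
while the counter is still non-zero.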

BTW, the race with .release() may be left to the user to handle by
removing the atomic-mode ref grab from this patch, and it won't be an
issue for blk-mq.

> 
> > Also, this approach still doesn't address the race described in the following link:
> > 
> > https://marc.info/?l=linux-block&m=153732992701093&w=2
> 
> I think that the following patch is sufficient to fix that race:
> 
> diff --git a/block/blk-core.c b/block/blk-core.c
> index ae092ca121d5..16dd3a989753 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -942,8 +942,6 @@ int blk_queue_enter(struct request_queue *q,
> blk_mq_req_flags_t flags)
>  		if (success)
>  			return 0;
>  
> -		blk_pm_request_resume(q);
> -
>  		if (flags & BLK_MQ_REQ_NOWAIT)
>  			return -EBUSY;
>  
> @@ -958,7 +956,8 @@ int blk_queue_enter(struct request_queue *q,
> blk_mq_req_flags_t flags)
>  
>  		wait_event(q->mq_freeze_wq,
>  			   (atomic_read(&q->mq_freeze_depth) == 0 &&
> -			    (pm || !blk_queue_pm_only(q))) ||
> +			    (pm || (blk_pm_request_resume(q),
> +				    !blk_queue_pm_only(q)))) ||
>  			   blk_queue_dying(q));
>  		if (blk_queue_dying(q))
>  			return -ENODEV;

This fix looks clever.


Thanks,
Ming

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2018-09-21  0:54 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-09-19 22:45 [PATCH v9 0/8] blk-mq: Implement runtime power management Bart Van Assche
2018-09-19 22:45 ` [PATCH v9 1/8] blk-mq: Document the functions that iterate over requests Bart Van Assche
2018-09-20  7:35   ` Johannes Thumshirn
2018-09-20 15:30     ` Bart Van Assche
2018-09-19 22:45 ` [PATCH v9 2/8] blk-mq: Introduce blk_mq_queue_rq_iter() Bart Van Assche
2018-09-19 22:45 ` [PATCH v9 3/8] block: Move power management code into a new source file Bart Van Assche
2018-09-19 22:45 ` [PATCH v9 4/8] block, scsi: Change the preempt-only flag into a counter Bart Van Assche
2018-09-19 22:45 ` [PATCH v9 5/8] block: Split blk_pm_add_request() and blk_pm_put_request() Bart Van Assche
2018-09-19 22:45 ` [PATCH v9 6/8] block: Schedule runtime resume earlier Bart Van Assche
2018-09-19 22:45 ` [PATCH v9 7/8] block: Make blk_get_request() block for non-PM requests while suspended Bart Van Assche
2018-09-20  3:48   ` Ming Lei
2018-09-20 15:23     ` Bart Van Assche
2018-09-21  0:54       ` Ming Lei
2018-09-19 22:45 ` [PATCH v9 8/8] blk-mq: Enable support for runtime power management Bart Van Assche
