* [PATCH 0/3] blk-mq: Enable runtime power management
@ 2018-07-17 23:49 Bart Van Assche
  2018-07-17 23:49 ` [PATCH 1/3] block: Fix a comment in a header file Bart Van Assche
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Bart Van Assche @ 2018-07-17 23:49 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, Christoph Hellwig, Bart Van Assche

Hello Jens,

This patch series not only enables runtime power management for blk-mq but also
simplifies power management for the legacy block layer. Please consider this
patch series for kernel v4.19 (I am aware that you are on vacation).

Thanks,

Bart.

Bart Van Assche (3):
  block: Fix a comment in a header file
  block, scsi: Rework runtime power management
  blk-mq: Enable support for runtime power management

 block/blk-core.c          | 60 ++++++++++++++++-----------------------
 block/blk-mq-debugfs.c    |  1 -
 block/blk-mq.c            | 12 ++++----
 block/elevator.c          | 11 +------
 drivers/scsi/sd.c         |  4 +--
 drivers/scsi/ufs/ufshcd.c | 10 +++----
 include/linux/blk-mq.h    |  1 +
 include/linux/blkdev.h    |  9 ++----
 8 files changed, 42 insertions(+), 66 deletions(-)

-- 
2.18.0


* [PATCH 1/3] block: Fix a comment in a header file
  2018-07-17 23:49 [PATCH 0/3] blk-mq: Enable runtime power management Bart Van Assche
@ 2018-07-17 23:49 ` Bart Van Assche
  2018-07-18  6:47   ` Johannes Thumshirn
  2018-07-17 23:49 ` [PATCH 2/3] block, scsi: Rework runtime power management Bart Van Assche
  2018-07-17 23:49 ` [PATCH 3/3] blk-mq: Enable support for " Bart Van Assche
  2 siblings, 1 reply; 10+ messages in thread
From: Bart Van Assche @ 2018-07-17 23:49 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Bart Van Assche, Ming Lei,
	Alan Stern, Johannes Thumshirn

Since the REQ_PREEMPT flag has been renamed to RQF_PREEMPT, update the
comment that refers to that flag.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
---
 include/linux/blkdev.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 79226ca8f80f..e35ed66d744e 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -700,7 +700,7 @@ struct request_queue {
 #define QUEUE_FLAG_REGISTERED  26	/* queue has been registered to a disk */
 #define QUEUE_FLAG_SCSI_PASSTHROUGH 27	/* queue supports SCSI commands */
 #define QUEUE_FLAG_QUIESCED    28	/* queue has been quiesced */
-#define QUEUE_FLAG_PREEMPT_ONLY	29	/* only process REQ_PREEMPT requests */
+#define QUEUE_FLAG_PREEMPT_ONLY	29	/* only process RQF_PREEMPT requests */
 
 #define QUEUE_FLAG_DEFAULT	((1 << QUEUE_FLAG_IO_STAT) |		\
 				 (1 << QUEUE_FLAG_SAME_COMP)	|	\
-- 
2.18.0


* [PATCH 2/3] block, scsi: Rework runtime power management
  2018-07-17 23:49 [PATCH 0/3] blk-mq: Enable runtime power management Bart Van Assche
  2018-07-17 23:49 ` [PATCH 1/3] block: Fix a comment in a header file Bart Van Assche
@ 2018-07-17 23:49 ` Bart Van Assche
  2018-07-18 12:16   ` Ming Lei
  2018-07-17 23:49 ` [PATCH 3/3] blk-mq: Enable support for " Bart Van Assche
  2 siblings, 1 reply; 10+ messages in thread
From: Bart Van Assche @ 2018-07-17 23:49 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Bart Van Assche, Ming Lei,
	Alan Stern, Johannes Thumshirn

Instead of allowing requests that are not power management requests
to enter the queue in runtime suspended status (RPM_SUSPENDED), make
the blk_get_request() caller block. This change fixes a starvation
issue: it is now guaranteed that power management requests will be
executed no matter how many blk_get_request() callers are waiting.
Instead of maintaining the q->nr_pending counter, rely on
q->q_usage_counter. Call pm_runtime_mark_last_busy() every time a
request finishes instead of only if the queue depth drops to zero.
Use RQF_PREEMPT to mark power management requests instead of RQF_PM.
This is safe because the power management core serializes system-wide
suspend/resume and runtime power management state changes.
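The admission and suspend logic described above can be sketched as a small
userspace model. This is purely illustrative: pm_queue, pm_get_request(),
pm_put_request() and pm_try_suspend() are invented names standing in for
q->q_usage_counter, QUEUE_FLAG_PREEMPT_ONLY, blk_get_request() and
blk_pre_runtime_suspend(); they are not kernel APIs.

```c
#include <stdbool.h>

/*
 * Userspace sketch of the reworked suspend flow (illustrative names,
 * not kernel APIs).
 */
struct pm_queue {
	long usage;		/* stands in for q->q_usage_counter */
	bool preempt_only;	/* stands in for QUEUE_FLAG_PREEMPT_ONLY */
};

/*
 * blk_get_request(): PM requests (RQF_PREEMPT) always enter; normal
 * requests enter only while the queue is not marked preempt-only.
 */
static bool pm_get_request(struct pm_queue *q, bool rqf_preempt)
{
	if (q->preempt_only && !rqf_preempt)
		return false;	/* the caller would block here */
	q->usage++;
	return true;
}

/* Request completion; pm_runtime_mark_last_busy() runs on every put. */
static void pm_put_request(struct pm_queue *q)
{
	q->usage--;
}

/*
 * blk_pre_runtime_suspend(): mark the queue preempt-only, then suspend
 * only if no request is in flight; otherwise back off with -EBUSY, as
 * blk_post_runtime_suspend() does on error.
 */
static int pm_try_suspend(struct pm_queue *q)
{
	q->preempt_only = true;
	if (q->usage != 0) {
		q->preempt_only = false;
		return -16;	/* -EBUSY */
	}
	return 0;
}
```

A normal request in flight makes suspend fail with -EBUSY; once the queue is
idle, suspend succeeds and subsequent normal requests block while PM requests
still get through.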

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
---
 block/blk-core.c          | 56 +++++++++++++++++----------------------
 block/blk-mq-debugfs.c    |  1 -
 block/blk-mq.c            | 12 ++++-----
 block/elevator.c          | 11 +-------
 drivers/scsi/sd.c         |  4 +--
 drivers/scsi/ufs/ufshcd.c | 10 +++----
 include/linux/blk-mq.h    |  1 +
 include/linux/blkdev.h    |  7 ++---
 8 files changed, 41 insertions(+), 61 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index f84a9b7b6f5a..65d7f27c8c22 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1719,7 +1719,7 @@ EXPORT_SYMBOL_GPL(part_round_stats);
 #ifdef CONFIG_PM
 static void blk_pm_put_request(struct request *rq)
 {
-	if (rq->q->dev && !(rq->rq_flags & RQF_PM) && !--rq->q->nr_pending)
+	if (rq->q->dev && !(rq->rq_flags & RQF_PREEMPT))
 		pm_runtime_mark_last_busy(rq->q->dev);
 }
 #else
@@ -2737,30 +2737,6 @@ void blk_account_io_done(struct request *req, u64 now)
 	}
 }
 
-#ifdef CONFIG_PM
-/*
- * Don't process normal requests when queue is suspended
- * or in the process of suspending/resuming
- */
-static bool blk_pm_allow_request(struct request *rq)
-{
-	switch (rq->q->rpm_status) {
-	case RPM_RESUMING:
-	case RPM_SUSPENDING:
-		return rq->rq_flags & RQF_PM;
-	case RPM_SUSPENDED:
-		return false;
-	}
-
-	return true;
-}
-#else
-static bool blk_pm_allow_request(struct request *rq)
-{
-	return true;
-}
-#endif
-
 void blk_account_io_start(struct request *rq, bool new_io)
 {
 	struct hd_struct *part;
@@ -2806,11 +2782,12 @@ static struct request *elv_next_request(struct request_queue *q)
 
 	while (1) {
 		list_for_each_entry(rq, &q->queue_head, queuelist) {
-			if (blk_pm_allow_request(rq))
-				return rq;
-
-			if (rq->rq_flags & RQF_SOFTBARRIER)
-				break;
+			/*
+			 * If a request gets queued in state RPM_SUSPENDED
+			 * then that's a kernel bug.
+			 */
+			WARN_ON_ONCE(q->rpm_status == RPM_SUSPENDED);
+			return rq;
 		}
 
 		/*
@@ -3801,8 +3778,11 @@ int blk_pre_runtime_suspend(struct request_queue *q)
 	if (!q->dev)
 		return ret;
 
+	blk_set_preempt_only(q);
+	blk_freeze_queue_start(q);
+
 	spin_lock_irq(q->queue_lock);
-	if (q->nr_pending) {
+	if (!percpu_ref_is_zero(&q->q_usage_counter)) {
 		ret = -EBUSY;
 		pm_runtime_mark_last_busy(q->dev);
 	} else {
@@ -3837,8 +3817,15 @@ void blk_post_runtime_suspend(struct request_queue *q, int err)
 	} else {
 		q->rpm_status = RPM_ACTIVE;
 		pm_runtime_mark_last_busy(q->dev);
+		queue_flag_clear(QUEUE_FLAG_PREEMPT_ONLY, q);
 	}
 	spin_unlock_irq(q->queue_lock);
+
+	if (err) {
+		blk_unfreeze_queue(q);
+		/* Because QUEUE_FLAG_PREEMPT_ONLY has been cleared. */
+		wake_up_all(&q->mq_freeze_wq);
+	}
 }
 EXPORT_SYMBOL(blk_post_runtime_suspend);
 
@@ -3888,11 +3875,18 @@ void blk_post_runtime_resume(struct request_queue *q, int err)
 		q->rpm_status = RPM_ACTIVE;
 		__blk_run_queue(q);
 		pm_runtime_mark_last_busy(q->dev);
+		queue_flag_clear(QUEUE_FLAG_PREEMPT_ONLY, q);
 		pm_request_autosuspend(q->dev);
 	} else {
 		q->rpm_status = RPM_SUSPENDED;
 	}
 	spin_unlock_irq(q->queue_lock);
+
+	if (!err) {
+		blk_unfreeze_queue(q);
+		/* Because QUEUE_FLAG_PREEMPT_ONLY has been cleared. */
+		wake_up_all(&q->mq_freeze_wq);
+	}
 }
 EXPORT_SYMBOL(blk_post_runtime_resume);
 
diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index 1c4532e92938..74c9b17811e2 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -339,7 +339,6 @@ static const char *const rqf_name[] = {
 	RQF_NAME(ELVPRIV),
 	RQF_NAME(IO_STAT),
 	RQF_NAME(ALLOCED),
-	RQF_NAME(PM),
 	RQF_NAME(HASHED),
 	RQF_NAME(STATS),
 	RQF_NAME(SPECIAL_PAYLOAD),
diff --git a/block/blk-mq.c b/block/blk-mq.c
index d394cdd8d8c6..5b11dc474470 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -168,13 +168,6 @@ EXPORT_SYMBOL_GPL(blk_mq_freeze_queue_wait_timeout);
  */
 void blk_freeze_queue(struct request_queue *q)
 {
-	/*
-	 * In the !blk_mq case we are only calling this to kill the
-	 * q_usage_counter, otherwise this increases the freeze depth
-	 * and waits for it to return to zero.  For this reason there is
-	 * no blk_unfreeze_queue(), and blk_freeze_queue() is not
-	 * exported to drivers as the only user for unfreeze is blk_mq.
-	 */
 	blk_freeze_queue_start(q);
 	if (!q->mq_ops)
 		blk_drain_queue(q);
@@ -191,6 +184,11 @@ void blk_mq_freeze_queue(struct request_queue *q)
 }
 EXPORT_SYMBOL_GPL(blk_mq_freeze_queue);
 
+void blk_unfreeze_queue(struct request_queue *q)
+{
+	blk_mq_unfreeze_queue(q);
+}
+
 void blk_mq_unfreeze_queue(struct request_queue *q)
 {
 	int freeze_depth;
diff --git a/block/elevator.c b/block/elevator.c
index fa828b5bfd4b..68174953e730 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -558,20 +558,13 @@ void elv_bio_merged(struct request_queue *q, struct request *rq,
 }
 
 #ifdef CONFIG_PM
-static void blk_pm_requeue_request(struct request *rq)
-{
-	if (rq->q->dev && !(rq->rq_flags & RQF_PM))
-		rq->q->nr_pending--;
-}
-
 static void blk_pm_add_request(struct request_queue *q, struct request *rq)
 {
-	if (q->dev && !(rq->rq_flags & RQF_PM) && q->nr_pending++ == 0 &&
+	if (q->dev && !(rq->rq_flags & RQF_PREEMPT) &&
 	    (q->rpm_status == RPM_SUSPENDED || q->rpm_status == RPM_SUSPENDING))
 		pm_request_resume(q->dev);
 }
 #else
-static inline void blk_pm_requeue_request(struct request *rq) {}
 static inline void blk_pm_add_request(struct request_queue *q,
 				      struct request *rq)
 {
@@ -592,8 +585,6 @@ void elv_requeue_request(struct request_queue *q, struct request *rq)
 
 	rq->rq_flags &= ~RQF_STARTED;
 
-	blk_pm_requeue_request(rq);
-
 	__elv_add_request(q, rq, ELEVATOR_INSERT_REQUEUE);
 }
 
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 9421d9877730..1a994c0840c9 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -1627,7 +1627,7 @@ static int sd_sync_cache(struct scsi_disk *sdkp, struct scsi_sense_hdr *sshdr)
 		 * flush everything.
 		 */
 		res = scsi_execute(sdp, cmd, DMA_NONE, NULL, 0, NULL, sshdr,
-				timeout, SD_MAX_RETRIES, 0, RQF_PM, NULL);
+				timeout, SD_MAX_RETRIES, 0, RQF_PREEMPT, NULL);
 		if (res == 0)
 			break;
 	}
@@ -3487,7 +3487,7 @@ static int sd_start_stop_device(struct scsi_disk *sdkp, int start)
 		return -ENODEV;
 
 	res = scsi_execute(sdp, cmd, DMA_NONE, NULL, 0, NULL, &sshdr,
-			SD_TIMEOUT, SD_MAX_RETRIES, 0, RQF_PM, NULL);
+			   SD_TIMEOUT, SD_MAX_RETRIES, 0, RQF_PREEMPT, NULL);
 	if (res) {
 		sd_print_result(sdkp, "Start/Stop Unit failed", res);
 		if (driver_byte(res) & DRIVER_SENSE)
diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index 3a811c5f70ba..93562ce85854 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -7219,7 +7219,7 @@ ufshcd_send_request_sense(struct ufs_hba *hba, struct scsi_device *sdp)
 
 	ret = scsi_execute(sdp, cmd, DMA_FROM_DEVICE, buffer,
 			UFSHCD_REQ_SENSE_SIZE, NULL, NULL,
-			msecs_to_jiffies(1000), 3, 0, RQF_PM, NULL);
+			msecs_to_jiffies(1000), 3, 0, RQF_PREEMPT, NULL);
 	if (ret)
 		pr_err("%s: failed with err %d\n", __func__, ret);
 
@@ -7280,12 +7280,12 @@ static int ufshcd_set_dev_pwr_mode(struct ufs_hba *hba,
 	cmd[4] = pwr_mode << 4;
 
 	/*
-	 * Current function would be generally called from the power management
-	 * callbacks hence set the RQF_PM flag so that it doesn't resume the
-	 * already suspended childs.
+	 * Current function would be generally called from the power
+	 * management callbacks hence set the RQF_PREEMPT flag so that it
+	 * doesn't resume the already suspended childs.
 	 */
 	ret = scsi_execute(sdp, cmd, DMA_NONE, NULL, 0, NULL, &sshdr,
-			START_STOP_TIMEOUT, 0, 0, RQF_PM, NULL);
+			   START_STOP_TIMEOUT, 0, 0, RQF_PREEMPT, NULL);
 	if (ret) {
 		sdev_printk(KERN_WARNING, sdp,
 			    "START_STOP failed for power mode: %d, result %x\n",
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index e3147eb74222..c615eb72a5f2 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -278,6 +278,7 @@ void blk_mq_tagset_busy_iter(struct blk_mq_tag_set *tagset,
 void blk_mq_freeze_queue(struct request_queue *q);
 void blk_mq_unfreeze_queue(struct request_queue *q);
 void blk_freeze_queue_start(struct request_queue *q);
+void blk_unfreeze_queue(struct request_queue *q);
 void blk_mq_freeze_queue_wait(struct request_queue *q);
 int blk_mq_freeze_queue_wait_timeout(struct request_queue *q,
 				     unsigned long timeout);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index e35ed66d744e..9ba90fa67887 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -99,8 +99,8 @@ typedef __u32 __bitwise req_flags_t;
 #define RQF_MQ_INFLIGHT		((__force req_flags_t)(1 << 6))
 /* don't call prep for this one */
 #define RQF_DONTPREP		((__force req_flags_t)(1 << 7))
-/* set for "ide_preempt" requests and also for requests for which the SCSI
-   "quiesce" state must be ignored. */
+/* set for requests that must be processed even if QUEUE_FLAG_PREEMPT_ONLY has
+   been set, e.g. power management requests and "ide_preempt" requests. */
 #define RQF_PREEMPT		((__force req_flags_t)(1 << 8))
 /* contains copies of user pages */
 #define RQF_COPY_USER		((__force req_flags_t)(1 << 9))
@@ -114,8 +114,6 @@ typedef __u32 __bitwise req_flags_t;
 #define RQF_IO_STAT		((__force req_flags_t)(1 << 13))
 /* request came from our alloc pool */
 #define RQF_ALLOCED		((__force req_flags_t)(1 << 14))
-/* runtime pm request */
-#define RQF_PM			((__force req_flags_t)(1 << 15))
 /* on IO scheduler merge hash */
 #define RQF_HASHED		((__force req_flags_t)(1 << 16))
 /* IO stats tracking on */
@@ -545,7 +543,6 @@ struct request_queue {
 #ifdef CONFIG_PM
 	struct device		*dev;
 	int			rpm_status;
-	unsigned int		nr_pending;
 #endif
 
 	/*
-- 
2.18.0


* [PATCH 3/3] blk-mq: Enable support for runtime power management
  2018-07-17 23:49 [PATCH 0/3] blk-mq: Enable runtime power management Bart Van Assche
  2018-07-17 23:49 ` [PATCH 1/3] block: Fix a comment in a header file Bart Van Assche
  2018-07-17 23:49 ` [PATCH 2/3] block, scsi: Rework runtime power management Bart Van Assche
@ 2018-07-17 23:49 ` Bart Van Assche
  2 siblings, 0 replies; 10+ messages in thread
From: Bart Van Assche @ 2018-07-17 23:49 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Bart Van Assche, Ming Lei,
	Alan Stern, Johannes Thumshirn

Now that the blk-mq core processes power management requests (marked
with RQF_PREEMPT) in states other than RPM_ACTIVE, enable runtime
power management for blk-mq.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
---
 block/blk-core.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 65d7f27c8c22..5aade9fa96be 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -3739,10 +3739,6 @@ EXPORT_SYMBOL(blk_finish_plug);
  */
 void blk_pm_runtime_init(struct request_queue *q, struct device *dev)
 {
-	/* not support for RQF_PM and ->rpm_status in blk-mq yet */
-	if (q->mq_ops)
-		return;
-
 	q->dev = dev;
 	q->rpm_status = RPM_ACTIVE;
 	pm_runtime_set_autosuspend_delay(q->dev, -1);
-- 
2.18.0


* Re: [PATCH 1/3] block: Fix a comment in a header file
  2018-07-17 23:49 ` [PATCH 1/3] block: Fix a comment in a header file Bart Van Assche
@ 2018-07-18  6:47   ` Johannes Thumshirn
  0 siblings, 0 replies; 10+ messages in thread
From: Johannes Thumshirn @ 2018-07-18  6:47 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Ming Lei, Alan Stern

Looks good,
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
-- 
Johannes Thumshirn                                          Storage
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850


* Re: [PATCH 2/3] block, scsi: Rework runtime power management
  2018-07-17 23:49 ` [PATCH 2/3] block, scsi: Rework runtime power management Bart Van Assche
@ 2018-07-18 12:16   ` Ming Lei
  2018-07-18 15:45     ` Bart Van Assche
  0 siblings, 1 reply; 10+ messages in thread
From: Ming Lei @ 2018-07-18 12:16 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Ming Lei, Alan Stern,
	Johannes Thumshirn

On Wed, Jul 18, 2018 at 7:49 AM, Bart Van Assche <bart.vanassche@wdc.com> wrote:
> Instead of allowing requests that are not power management requests
> to enter the queue in runtime suspended status (RPM_SUSPENDED), make
> the blk_get_request() caller block. This change fixes a starvation
> issue: it is now guaranteed that power management requests will be
> executed no matter how many blk_get_request() callers are waiting.
> Instead of maintaining the q->nr_pending counter, rely on
> q->q_usage_counter. Call pm_runtime_mark_last_busy() every time a
> request finishes instead of only if the queue depth drops to zero.
> Use RQF_PREEMPT to mark power management requests instead of RQF_PM.
> This is safe because the power management core serializes system-wide
> suspend/resume and runtime power management state changes.
>
> Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Ming Lei <ming.lei@redhat.com>
> Cc: Alan Stern <stern@rowland.harvard.edu>
> Cc: Johannes Thumshirn <jthumshirn@suse.de>
> ---
>  block/blk-core.c          | 56 +++++++++++++++++----------------------
>  block/blk-mq-debugfs.c    |  1 -
>  block/blk-mq.c            | 12 ++++-----
>  block/elevator.c          | 11 +-------
>  drivers/scsi/sd.c         |  4 +--
>  drivers/scsi/ufs/ufshcd.c | 10 +++----
>  include/linux/blk-mq.h    |  1 +
>  include/linux/blkdev.h    |  7 ++---
>  8 files changed, 41 insertions(+), 61 deletions(-)
>
> diff --git a/block/blk-core.c b/block/blk-core.c
> index f84a9b7b6f5a..65d7f27c8c22 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -1719,7 +1719,7 @@ EXPORT_SYMBOL_GPL(part_round_stats);
>  #ifdef CONFIG_PM
>  static void blk_pm_put_request(struct request *rq)
>  {
> -       if (rq->q->dev && !(rq->rq_flags & RQF_PM) && !--rq->q->nr_pending)
> +       if (rq->q->dev && !(rq->rq_flags & RQF_PREEMPT))
>                 pm_runtime_mark_last_busy(rq->q->dev);
>  }
>  #else
> @@ -2737,30 +2737,6 @@ void blk_account_io_done(struct request *req, u64 now)
>         }
>  }
>
> -#ifdef CONFIG_PM
> -/*
> - * Don't process normal requests when queue is suspended
> - * or in the process of suspending/resuming
> - */
> -static bool blk_pm_allow_request(struct request *rq)
> -{
> -       switch (rq->q->rpm_status) {
> -       case RPM_RESUMING:
> -       case RPM_SUSPENDING:
> -               return rq->rq_flags & RQF_PM;
> -       case RPM_SUSPENDED:
> -               return false;
> -       }
> -
> -       return true;
> -}
> -#else
> -static bool blk_pm_allow_request(struct request *rq)
> -{
> -       return true;
> -}
> -#endif
> -
>  void blk_account_io_start(struct request *rq, bool new_io)
>  {
>         struct hd_struct *part;
> @@ -2806,11 +2782,12 @@ static struct request *elv_next_request(struct request_queue *q)
>
>         while (1) {
>                 list_for_each_entry(rq, &q->queue_head, queuelist) {
> -                       if (blk_pm_allow_request(rq))
> -                               return rq;
> -
> -                       if (rq->rq_flags & RQF_SOFTBARRIER)
> -                               break;
> +                       /*
> +                        * If a request gets queued in state RPM_SUSPENDED
> +                        * then that's a kernel bug.
> +                        */
> +                       WARN_ON_ONCE(q->rpm_status == RPM_SUSPENDED);
> +                       return rq;
>                 }
>
>                 /*
> @@ -3801,8 +3778,11 @@ int blk_pre_runtime_suspend(struct request_queue *q)
>         if (!q->dev)
>                 return ret;
>
> +       blk_set_preempt_only(q);
> +       blk_freeze_queue_start(q);
> +
>         spin_lock_irq(q->queue_lock);
> -       if (q->nr_pending) {
> +       if (!percpu_ref_is_zero(&q->q_usage_counter)) {

This way can't work reliably because the percpu ref isn't in atomic mode
yet after blk_freeze_queue_start() returns, then percpu_ref_is_zero() won't
see accurate value of the counter, finally the device may be put down before
in-flight requests are completed by hardware.

Thanks,
Ming Lei


* Re: [PATCH 2/3] block, scsi: Rework runtime power management
  2018-07-18 12:16   ` Ming Lei
@ 2018-07-18 15:45     ` Bart Van Assche
  2018-07-18 22:45       ` Ming Lei
  0 siblings, 1 reply; 10+ messages in thread
From: Bart Van Assche @ 2018-07-18 15:45 UTC (permalink / raw)
  To: tom.leiming; +Cc: hch, linux-block, jthumshirn, stern, axboe, ming.lei

On Wed, 2018-07-18 at 20:16 +0800, Ming Lei wrote:
> On Wed, Jul 18, 2018 at 7:49 AM, Bart Van Assche <bart.vanassche@wdc.com> wrote:
> > @@ -3801,8 +3778,11 @@ int blk_pre_runtime_suspend(struct request_queue *q)
> >         if (!q->dev)
> >                 return ret;
> > 
> > +       blk_set_preempt_only(q);
> > +       blk_freeze_queue_start(q);
> > +
> >         spin_lock_irq(q->queue_lock);
> > -       if (q->nr_pending) {
> > +       if (!percpu_ref_is_zero(&q->q_usage_counter)) {
> 
> This way can't work reliably because the percpu ref isn't in atomic mode
> yet after blk_freeze_queue_start() returns, then percpu_ref_is_zero() won't
> see accurate value of the counter, finally the device may be put down before
> in-flight requests are completed by hardware.

Hello Ming,

The blk_freeze_queue_start() implementation is as follows:

void blk_freeze_queue_start(struct request_queue *q)
{
	int freeze_depth;

	freeze_depth = atomic_inc_return(&q->mq_freeze_depth);
	if (freeze_depth == 1) {
		percpu_ref_kill(&q->q_usage_counter);
		if (q->mq_ops)
			blk_mq_run_hw_queues(q, false);
	}
}

From the documentation header in include/linux/percpu-refcount.h above
percpu_ref_kill():

 * Switches @ref into atomic mode before gathering up the percpu counters
 * and dropping the initial ref.

In other words, I think that after blk_freeze_queue_start() returns it is
guaranteed that q->q_usage_counter is in atomic mode. However, we may need to
serialize concurrent blk_freeze_queue_start() calls to guarantee that this is
always the case if multiple threads call blk_freeze_queue_start() concurrently.

Bart.


* Re: [PATCH 2/3] block, scsi: Rework runtime power management
  2018-07-18 15:45     ` Bart Van Assche
@ 2018-07-18 22:45       ` Ming Lei
  2018-07-19 15:54         ` Bart Van Assche
  0 siblings, 1 reply; 10+ messages in thread
From: Ming Lei @ 2018-07-18 22:45 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: tom.leiming, hch, linux-block, jthumshirn, stern, axboe

On Wed, Jul 18, 2018 at 03:45:15PM +0000, Bart Van Assche wrote:
> On Wed, 2018-07-18 at 20:16 +0800, Ming Lei wrote:
> > On Wed, Jul 18, 2018 at 7:49 AM, Bart Van Assche <bart.vanassche@wdc.com> wrote:
> > > @@ -3801,8 +3778,11 @@ int blk_pre_runtime_suspend(struct request_queue *q)
> > >         if (!q->dev)
> > >                 return ret;
> > > 
> > > +       blk_set_preempt_only(q);
> > > +       blk_freeze_queue_start(q);
> > > +
> > >         spin_lock_irq(q->queue_lock);
> > > -       if (q->nr_pending) {
> > > +       if (!percpu_ref_is_zero(&q->q_usage_counter)) {
> > 
> > This way can't work reliably because the percpu ref isn't in atomic mode
> > yet after blk_freeze_queue_start() returns, then percpu_ref_is_zero() won't
> > see accurate value of the counter, finally the device may be put down before
> > in-flight requests are completed by hardware.
> 
> Hello Ming,
> 
> The blk_freeze_queue_start() implementation is as follows:
> 
> void blk_freeze_queue_start(struct request_queue *q)
> {
> 	int freeze_depth;
> 
> 	freeze_depth = atomic_inc_return(&q->mq_freeze_depth);
> 	if (freeze_depth == 1) {
> 		percpu_ref_kill(&q->q_usage_counter);
> 		if (q->mq_ops)
> 			blk_mq_run_hw_queues(q, false);
> 	}
> }
> 
> From the documentation header in include/linux/percpu-refcount.h above
> percpu_ref_kill():
> 
>  * Switches @ref into atomic mode before gathering up the percpu counters
>  * and dropping the initial ref.

IMO, the comment isn't accurate enough. Yes, the counter switches to atomic
mode after percpu_ref_kill() returns, but the counter can't be read
accurately before the RCU confirmation is done.

One extra get is done in __percpu_ref_switch_to_atomic(), and the matching put
is run in percpu_ref_call_confirm_rcu(), which is scheduled via call_rcu_sched().

So once blk_freeze_queue_start() returns, percpu_ref_is_zero() won't
return true only until the rcu confirmation is done. That means this
approach may not put device down.

Thanks,
Ming
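The window Ming describes above — an extra get taken at kill time that is
only dropped after the RCU grace period — can be sketched with a small
userspace model. This is illustrative only: ref_model and its helpers are
invented names, with the RCU grace period reduced to an explicit confirm
step; this is not the kernel's percpu_ref implementation.

```c
#include <stdbool.h>

/*
 * Userspace model of the kill/confirm timing (illustrative names, not
 * the kernel percpu_ref implementation).
 */
struct ref_model {
	long count;		/* user references */
	bool killed;		/* atomic mode entered */
	bool confirmed;		/* RCU confirmation completed */
};

static void ref_kill(struct ref_model *r)
{
	/* __percpu_ref_switch_to_atomic() takes one extra get ... */
	r->count++;
	r->killed = true;
}

static void ref_confirm_rcu(struct ref_model *r)
{
	/* ... and percpu_ref_call_confirm_rcu() drops it later, after
	 * the RCU grace period (modeled here as an explicit call). */
	r->count--;
	r->confirmed = true;
}

static bool ref_is_zero(const struct ref_model *r)
{
	return r->killed && r->count == 0;
}
```

Even with no user references held, ref_is_zero() stays false between
ref_kill() and ref_confirm_rcu(), which is the window in which
blk_pre_runtime_suspend() would see -EBUSY.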


* Re: [PATCH 2/3] block, scsi: Rework runtime power management
  2018-07-18 22:45       ` Ming Lei
@ 2018-07-19 15:54         ` Bart Van Assche
  2018-07-19 23:26           ` Ming Lei
  0 siblings, 1 reply; 10+ messages in thread
From: Bart Van Assche @ 2018-07-19 15:54 UTC (permalink / raw)
  To: ming.lei; +Cc: hch, linux-block, tom.leiming, stern, jthumshirn, axboe

On Thu, 2018-07-19 at 06:45 +0800, Ming Lei wrote:
> So once blk_freeze_queue_start() returns, percpu_ref_is_zero() won't
> return true only until the rcu confirmation is done. That means this
> approach may not put device down.

Hello Ming,

I agree with your conclusion that it is not guaranteed that q->q_usage_counter
is in atomic mode when percpu_ref_is_zero() is called. However, I think that's
fine: if blk_pre_runtime_suspend() returns -EBUSY then the runtime core will
try again at a later time to perform runtime suspend.

Bart.


* Re: [PATCH 2/3] block, scsi: Rework runtime power management
  2018-07-19 15:54         ` Bart Van Assche
@ 2018-07-19 23:26           ` Ming Lei
  0 siblings, 0 replies; 10+ messages in thread
From: Ming Lei @ 2018-07-19 23:26 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: hch, linux-block, tom.leiming, stern, jthumshirn, axboe

On Thu, Jul 19, 2018 at 03:54:53PM +0000, Bart Van Assche wrote:
> On Thu, 2018-07-19 at 06:45 +0800, Ming Lei wrote:
> > So once blk_freeze_queue_start() returns, percpu_ref_is_zero() won't
> > return true only until the rcu confirmation is done. That means this
> > approach may not put device down.
> 
> Hello Ming,
> 
> I agree with your conclusion that it is not guaranteed that q->q_usage_counter
> is in atomic mode when percpu_ref_is_zero() is called. However, I think that's
> fine: if blk_pre_runtime_suspend() returns -EBUSY then the runtime core will
> try again at a later time to perform runtime suspend.

This behaviour will persist across retries, since percpu_ref_is_zero()
will keep returning false after blk_freeze_queue_start() has been run on
an un-frozen queue.

Thanks,
Ming

