All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH V3 0/7] blk-mq: don't allocate driver tag beforehand for flush rq
@ 2017-11-02 15:24 Ming Lei
  2017-11-02 15:24 ` [PATCH V3 1/7] blk-mq: put the driver tag of nxt rq before first one is requeued Ming Lei
                   ` (7 more replies)
  0 siblings, 8 replies; 11+ messages in thread
From: Ming Lei @ 2017-11-02 15:24 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Christoph Hellwig
  Cc: Omar Sandoval, Bart Van Assche, Hannes Reinecke, linux-kernel, Ming Lei

Hi Jens,

This patchset avoids to allocate driver tag beforehand for flush rq
in case of I/O scheduler, then flush rq isn't treated specially
wrt. get/put driver tag, code gets cleanup much, such as,
reorder_tags_to_front() is removed, and we needn't to worry
about request order in dispatch list for avoiding I/O deadlock.

'dbench -t 30 -s 64' has been run on different devices(shared tag,
multi-queue, singele queue, ...), and no issues are observed,
even very low queue depth test are run, debench still works well.

Please consider it for V4.15, thanks!


V3:
	- rebase on the latest for-next of block tree
	- clean up commit log and comment
	- include Jianchao's fix on put driver tag

V2:
	- release driver tag before requeue
	- warning on inserting a req with driver tag


Jianchao Wang (1):
  blk-mq: put the driver tag of nxt rq before first one is requeued

Ming Lei (6):
  blk-flush: don't run queue for requests of bypassing flush
  block: pass 'run_queue' to blk_mq_request_bypass_insert
  blk-flush: use blk_mq_request_bypass_insert()
  blk-mq-sched: decide how to handle flush rq via RQF_FLUSH_SEQ
  blk-mq: move blk_mq_put_driver_tag*() into blk-mq.h
  blk-mq: don't allocate driver tag beforehand for flush rq

 block/blk-core.c     |  2 +-
 block/blk-flush.c    | 37 ++++++++++++++------
 block/blk-mq-sched.c | 63 +++++++++-------------------------
 block/blk-mq.c       | 97 ++++++++++------------------------------------------
 block/blk-mq.h       | 35 ++++++++++++++++++-
 5 files changed, 97 insertions(+), 137 deletions(-)

-- 
2.9.5

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH V3 1/7] blk-mq: put the driver tag of nxt rq before first one is requeued
  2017-11-02 15:24 [PATCH V3 0/7] blk-mq: don't allocate driver tag beforehand for flush rq Ming Lei
@ 2017-11-02 15:24 ` Ming Lei
  2017-11-02 15:24 ` [PATCH V3 2/7] blk-flush: don't run queue for requests of bypassing flush Ming Lei
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: Ming Lei @ 2017-11-02 15:24 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Christoph Hellwig
  Cc: Omar Sandoval, Bart Van Assche, Hannes Reinecke, linux-kernel,
	Jianchao Wang

From: Jianchao Wang <jianchao.w.wang@oracle.com>

When free the driver tag of the next rq with I/O scheduler
configured, it get the first entry of the list, however, at the
moment, the failed rq has been requeued at the head of the list.
The rq it gets is the failed rq not the next rq.
Free the driver tag of next rq before the failed one is requeued
in the failure branch of queue_rq callback and it is just needed
there.

Signed-off-by: Jianchao Wang <jianchao.w.wang@oracle.com>
---
 block/blk-mq.c | 19 +++++++++----------
 1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index e4d2490f4e7e..fe14b28760eb 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1091,7 +1091,7 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list,
 		bool got_budget)
 {
 	struct blk_mq_hw_ctx *hctx;
-	struct request *rq;
+	struct request *rq, *nxt;
 	int errors, queued;
 
 	if (list_empty(list))
@@ -1153,14 +1153,20 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list,
 		if (list_empty(list))
 			bd.last = true;
 		else {
-			struct request *nxt;
-
 			nxt = list_first_entry(list, struct request, queuelist);
 			bd.last = !blk_mq_get_driver_tag(nxt, NULL, false);
 		}
 
 		ret = q->mq_ops->queue_rq(hctx, &bd);
 		if (ret == BLK_STS_RESOURCE) {
+			/*
+			 * If an I/O scheduler has been configured and we got a
+			 * driver tag for the next request already, free it again.
+			 */
+			if (!list_empty(list)) {
+				nxt = list_first_entry(list, struct request, queuelist);
+				blk_mq_put_driver_tag(nxt);
+			}
 			blk_mq_put_driver_tag_hctx(hctx, rq);
 			list_add(&rq->queuelist, list);
 			__blk_mq_requeue_request(rq);
@@ -1184,13 +1190,6 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list,
 	 * that is where we will continue on next queue run.
 	 */
 	if (!list_empty(list)) {
-		/*
-		 * If an I/O scheduler has been configured and we got a driver
-		 * tag for the next request already, free it again.
-		 */
-		rq = list_first_entry(list, struct request, queuelist);
-		blk_mq_put_driver_tag(rq);
-
 		spin_lock(&hctx->lock);
 		list_splice_init(list, &hctx->dispatch);
 		spin_unlock(&hctx->lock);
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH V3 2/7] blk-flush: don't run queue for requests of bypassing flush
  2017-11-02 15:24 [PATCH V3 0/7] blk-mq: don't allocate driver tag beforehand for flush rq Ming Lei
  2017-11-02 15:24 ` [PATCH V3 1/7] blk-mq: put the driver tag of nxt rq before first one is requeued Ming Lei
@ 2017-11-02 15:24 ` Ming Lei
  2017-11-02 15:24 ` [PATCH V3 3/7] block: pass 'run_queue' to blk_mq_request_bypass_insert Ming Lei
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: Ming Lei @ 2017-11-02 15:24 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Christoph Hellwig
  Cc: Omar Sandoval, Bart Van Assche, Hannes Reinecke, linux-kernel, Ming Lei

blk_insert_flush() should only insert request since run queue always
follows it.

In case of bypassing flush, we don't need to run queue because every
blk_insert_flush() follows one run queue.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-flush.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/blk-flush.c b/block/blk-flush.c
index 4938bec8cfef..81bd1a843043 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -463,7 +463,7 @@ void blk_insert_flush(struct request *rq)
 	if ((policy & REQ_FSEQ_DATA) &&
 	    !(policy & (REQ_FSEQ_PREFLUSH | REQ_FSEQ_POSTFLUSH))) {
 		if (q->mq_ops)
-			blk_mq_sched_insert_request(rq, false, true, false, false);
+			blk_mq_sched_insert_request(rq, false, false, false, false);
 		else
 			list_add_tail(&rq->queuelist, &q->queue_head);
 		return;
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH V3 3/7] block: pass 'run_queue' to blk_mq_request_bypass_insert
  2017-11-02 15:24 [PATCH V3 0/7] blk-mq: don't allocate driver tag beforehand for flush rq Ming Lei
  2017-11-02 15:24 ` [PATCH V3 1/7] blk-mq: put the driver tag of nxt rq before first one is requeued Ming Lei
  2017-11-02 15:24 ` [PATCH V3 2/7] blk-flush: don't run queue for requests of bypassing flush Ming Lei
@ 2017-11-02 15:24 ` Ming Lei
  2017-11-02 15:24 ` [PATCH V3 4/7] blk-flush: use blk_mq_request_bypass_insert() Ming Lei
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: Ming Lei @ 2017-11-02 15:24 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Christoph Hellwig
  Cc: Omar Sandoval, Bart Van Assche, Hannes Reinecke, linux-kernel, Ming Lei

Block flush need this function without running queue, so introduce the
parameter.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-core.c | 2 +-
 block/blk-mq.c   | 5 +++--
 block/blk-mq.h   | 2 +-
 3 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index bb4fce694a60..279d9f41a492 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -2352,7 +2352,7 @@ blk_status_t blk_insert_cloned_request(struct request_queue *q, struct request *
 		 * bypass a potential scheduler on the bottom device for
 		 * insert.
 		 */
-		blk_mq_request_bypass_insert(rq);
+		blk_mq_request_bypass_insert(rq, true);
 		return BLK_STS_OK;
 	}
 
diff --git a/block/blk-mq.c b/block/blk-mq.c
index fe14b28760eb..5e2866b83305 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1495,7 +1495,7 @@ void __blk_mq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
  * Should only be used carefully, when the caller knows we want to
  * bypass a potential IO scheduler on the target device.
  */
-void blk_mq_request_bypass_insert(struct request *rq)
+void blk_mq_request_bypass_insert(struct request *rq, bool run_queue)
 {
 	struct blk_mq_ctx *ctx = rq->mq_ctx;
 	struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(rq->q, ctx->cpu);
@@ -1504,7 +1504,8 @@ void blk_mq_request_bypass_insert(struct request *rq)
 	list_add_tail(&rq->queuelist, &hctx->dispatch);
 	spin_unlock(&hctx->lock);
 
-	blk_mq_run_hw_queue(hctx, false);
+	if (run_queue)
+		blk_mq_run_hw_queue(hctx, false);
 }
 
 void blk_mq_insert_requests(struct blk_mq_hw_ctx *hctx, struct blk_mq_ctx *ctx,
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 522b420dedc0..713be5ae83ba 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -56,7 +56,7 @@ int blk_mq_alloc_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
  */
 void __blk_mq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
 				bool at_head);
-void blk_mq_request_bypass_insert(struct request *rq);
+void blk_mq_request_bypass_insert(struct request *rq, bool run_queue);
 void blk_mq_insert_requests(struct blk_mq_hw_ctx *hctx, struct blk_mq_ctx *ctx,
 				struct list_head *list);
 
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH V3 4/7] blk-flush: use blk_mq_request_bypass_insert()
  2017-11-02 15:24 [PATCH V3 0/7] blk-mq: don't allocate driver tag beforehand for flush rq Ming Lei
                   ` (2 preceding siblings ...)
  2017-11-02 15:24 ` [PATCH V3 3/7] block: pass 'run_queue' to blk_mq_request_bypass_insert Ming Lei
@ 2017-11-02 15:24 ` Ming Lei
  2017-11-02 15:24 ` [PATCH V3 5/7] blk-mq-sched: decide how to handle flush rq via RQF_FLUSH_SEQ Ming Lei
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: Ming Lei @ 2017-11-02 15:24 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Christoph Hellwig
  Cc: Omar Sandoval, Bart Van Assche, Hannes Reinecke, linux-kernel, Ming Lei

In the following patch, we will use RQF_FLUSH_SEQ to decide:

1) if the flag isn't set, the flush rq need to be inserted via
blk_insert_flush()

2) otherwise, the flush rq need to be dispatched directly since
it is in flush machinery now.

So we use blk_mq_request_bypass_insert() for requests of bypassing
flush machinery, just like what the legacy path did.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-flush.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/blk-flush.c b/block/blk-flush.c
index 81bd1a843043..a9773d2075ac 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -463,7 +463,7 @@ void blk_insert_flush(struct request *rq)
 	if ((policy & REQ_FSEQ_DATA) &&
 	    !(policy & (REQ_FSEQ_PREFLUSH | REQ_FSEQ_POSTFLUSH))) {
 		if (q->mq_ops)
-			blk_mq_sched_insert_request(rq, false, false, false, false);
+			blk_mq_request_bypass_insert(rq, false);
 		else
 			list_add_tail(&rq->queuelist, &q->queue_head);
 		return;
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH V3 5/7] blk-mq-sched: decide how to handle flush rq via RQF_FLUSH_SEQ
  2017-11-02 15:24 [PATCH V3 0/7] blk-mq: don't allocate driver tag beforehand for flush rq Ming Lei
                   ` (3 preceding siblings ...)
  2017-11-02 15:24 ` [PATCH V3 4/7] blk-flush: use blk_mq_request_bypass_insert() Ming Lei
@ 2017-11-02 15:24 ` Ming Lei
  2017-11-02 15:24 ` [PATCH V3 6/7] blk-mq: move blk_mq_put_driver_tag*() into blk-mq.h Ming Lei
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: Ming Lei @ 2017-11-02 15:24 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Christoph Hellwig
  Cc: Omar Sandoval, Bart Van Assche, Hannes Reinecke, linux-kernel, Ming Lei

In case of IO scheduler we always pre-allocate one driver tag before
calling blk_insert_flush(), and flush request will be marked as
RQF_FLUSH_SEQ once it is in flush machinary.

So if RQF_FLUSH_SEQ isn't set, we call blk_insert_flush() to handle
the request, otherwise the flush request is dispatched to ->dispatch
list directly.

This is a prepare patch for not preallocating driver tag on flush rq,
and don't treat flush rq as special case, just what the legacy path
did.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq-sched.c | 29 ++++++++++++++++-------------
 1 file changed, 16 insertions(+), 13 deletions(-)

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 7775f6b12fa9..32e4ea2d0a77 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -355,21 +355,23 @@ void blk_mq_sched_request_inserted(struct request *rq)
 EXPORT_SYMBOL_GPL(blk_mq_sched_request_inserted);
 
 static bool blk_mq_sched_bypass_insert(struct blk_mq_hw_ctx *hctx,
+				       bool has_sched,
 				       struct request *rq)
 {
-	if (rq->tag == -1) {
+	/* dispatch flush rq directly */
+	if (rq->rq_flags & RQF_FLUSH_SEQ) {
+		spin_lock(&hctx->lock);
+		list_add(&rq->queuelist, &hctx->dispatch);
+		spin_unlock(&hctx->lock);
+		return true;
+	}
+
+	if (has_sched) {
 		rq->rq_flags |= RQF_SORTED;
-		return false;
+		WARN_ON(rq->tag != -1);
 	}
 
-	/*
-	 * If we already have a real request tag, send directly to
-	 * the dispatch list.
-	 */
-	spin_lock(&hctx->lock);
-	list_add(&rq->queuelist, &hctx->dispatch);
-	spin_unlock(&hctx->lock);
-	return true;
+	return false;
 }
 
 /*
@@ -395,12 +397,13 @@ void blk_mq_sched_insert_request(struct request *rq, bool at_head,
 	struct blk_mq_ctx *ctx = rq->mq_ctx;
 	struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(q, ctx->cpu);
 
-	if (rq->tag == -1 && op_is_flush(rq->cmd_flags)) {
+	/* flush rq in flush machinery need to be dispatched directly */
+	if (!(rq->rq_flags & RQF_FLUSH_SEQ) && op_is_flush(rq->cmd_flags)) {
 		blk_mq_sched_insert_flush(hctx, rq, can_block);
 		return;
 	}
 
-	if (e && blk_mq_sched_bypass_insert(hctx, rq))
+	if (blk_mq_sched_bypass_insert(hctx, !!e, rq))
 		goto run;
 
 	if (e && e->type->ops.mq.insert_requests) {
@@ -438,7 +441,7 @@ void blk_mq_sched_insert_requests(struct request_queue *q,
 		list_for_each_entry_safe(rq, next, list, queuelist) {
 			if (WARN_ON_ONCE(rq->tag != -1)) {
 				list_del_init(&rq->queuelist);
-				blk_mq_sched_bypass_insert(hctx, rq);
+				blk_mq_sched_bypass_insert(hctx, true, rq);
 			}
 		}
 	}
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH V3 6/7] blk-mq: move blk_mq_put_driver_tag*() into blk-mq.h
  2017-11-02 15:24 [PATCH V3 0/7] blk-mq: don't allocate driver tag beforehand for flush rq Ming Lei
                   ` (4 preceding siblings ...)
  2017-11-02 15:24 ` [PATCH V3 5/7] blk-mq-sched: decide how to handle flush rq via RQF_FLUSH_SEQ Ming Lei
@ 2017-11-02 15:24 ` Ming Lei
  2017-11-02 15:24 ` [PATCH V3 7/7] blk-mq: don't allocate driver tag beforehand for flush rq Ming Lei
  2017-11-04  4:17 ` [PATCH V3 0/7] " Ming Lei
  7 siblings, 0 replies; 11+ messages in thread
From: Ming Lei @ 2017-11-02 15:24 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Christoph Hellwig
  Cc: Omar Sandoval, Bart Van Assche, Hannes Reinecke, linux-kernel, Ming Lei

We need this helper to put the driver tag for flush rq, since we will
not share tag in the flush request sequence in the following patch
in case that I/O scheduler is applied.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq.c | 32 --------------------------------
 block/blk-mq.h | 33 +++++++++++++++++++++++++++++++++
 2 files changed, 33 insertions(+), 32 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 5e2866b83305..bf62065a15ec 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -993,38 +993,6 @@ bool blk_mq_get_driver_tag(struct request *rq, struct blk_mq_hw_ctx **hctx,
 	return rq->tag != -1;
 }
 
-static void __blk_mq_put_driver_tag(struct blk_mq_hw_ctx *hctx,
-				    struct request *rq)
-{
-	blk_mq_put_tag(hctx, hctx->tags, rq->mq_ctx, rq->tag);
-	rq->tag = -1;
-
-	if (rq->rq_flags & RQF_MQ_INFLIGHT) {
-		rq->rq_flags &= ~RQF_MQ_INFLIGHT;
-		atomic_dec(&hctx->nr_active);
-	}
-}
-
-static void blk_mq_put_driver_tag_hctx(struct blk_mq_hw_ctx *hctx,
-				       struct request *rq)
-{
-	if (rq->tag == -1 || rq->internal_tag == -1)
-		return;
-
-	__blk_mq_put_driver_tag(hctx, rq);
-}
-
-static void blk_mq_put_driver_tag(struct request *rq)
-{
-	struct blk_mq_hw_ctx *hctx;
-
-	if (rq->tag == -1 || rq->internal_tag == -1)
-		return;
-
-	hctx = blk_mq_map_queue(rq->q, rq->mq_ctx->cpu);
-	__blk_mq_put_driver_tag(hctx, rq);
-}
-
 /*
  * If we fail getting a driver tag because all the driver tags are already
  * assigned and on the dispatch list, BUT the first entry does not have a
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 713be5ae83ba..f55d39b0c6ed 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -2,6 +2,7 @@
 #define INT_BLK_MQ_H
 
 #include "blk-stat.h"
+#include "blk-mq-tag.h"
 
 struct blk_mq_tag_set;
 
@@ -157,4 +158,36 @@ static inline blk_status_t blk_mq_get_dispatch_budget(
 	return BLK_STS_OK;
 }
 
+static inline void __blk_mq_put_driver_tag(struct blk_mq_hw_ctx *hctx,
+					   struct request *rq)
+{
+	blk_mq_put_tag(hctx, hctx->tags, rq->mq_ctx, rq->tag);
+	rq->tag = -1;
+
+	if (rq->rq_flags & RQF_MQ_INFLIGHT) {
+		rq->rq_flags &= ~RQF_MQ_INFLIGHT;
+		atomic_dec(&hctx->nr_active);
+	}
+}
+
+static inline void blk_mq_put_driver_tag_hctx(struct blk_mq_hw_ctx *hctx,
+				       struct request *rq)
+{
+	if (rq->tag == -1 || rq->internal_tag == -1)
+		return;
+
+	__blk_mq_put_driver_tag(hctx, rq);
+}
+
+static inline void blk_mq_put_driver_tag(struct request *rq)
+{
+	struct blk_mq_hw_ctx *hctx;
+
+	if (rq->tag == -1 || rq->internal_tag == -1)
+		return;
+
+	hctx = blk_mq_map_queue(rq->q, rq->mq_ctx->cpu);
+	__blk_mq_put_driver_tag(hctx, rq);
+}
+
 #endif
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH V3 7/7] blk-mq: don't allocate driver tag beforehand for flush rq
  2017-11-02 15:24 [PATCH V3 0/7] blk-mq: don't allocate driver tag beforehand for flush rq Ming Lei
                   ` (5 preceding siblings ...)
  2017-11-02 15:24 ` [PATCH V3 6/7] blk-mq: move blk_mq_put_driver_tag*() into blk-mq.h Ming Lei
@ 2017-11-02 15:24 ` Ming Lei
  2017-11-04  4:17 ` [PATCH V3 0/7] " Ming Lei
  7 siblings, 0 replies; 11+ messages in thread
From: Ming Lei @ 2017-11-02 15:24 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Christoph Hellwig
  Cc: Omar Sandoval, Bart Van Assche, Hannes Reinecke, linux-kernel, Ming Lei

The behind idea is simple:

1) for none scheduler, driver tag has to be borrowed for flush rq, otherwise
we may run out of tag, and IO hang is caused. And get/put driver tag is
actually noop, so reorder tags isn't necessary at all.

2) for real I/O scheduler, we needn't to allocate driver tag beforehand for
flush rq, and it works just fine to follow the way for normal requests: allocate
driver tag for each rq just before calling .queue_rq().

One visible change to driver is that the driver tag isn't shared in the flush
request sequence, that won't be a problem, since we always do that in legacy path.

Then flush rq needn't to be treated specially wrt. get/put driver tag, code gets
cleanup much, such as, reorder_tags_to_front() is removed, and we needn't to worry
about request order in dispatch list for avoiding I/O deadlock.

Also we have to put the driver tag before requeueing.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-flush.c    | 35 ++++++++++++++++++++++++++---------
 block/blk-mq-sched.c | 42 +++++-------------------------------------
 block/blk-mq.c       | 41 ++++++-----------------------------------
 3 files changed, 37 insertions(+), 81 deletions(-)

diff --git a/block/blk-flush.c b/block/blk-flush.c
index a9773d2075ac..f17170675917 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -231,8 +231,13 @@ static void flush_end_io(struct request *flush_rq, blk_status_t error)
 		/* release the tag's ownership to the req cloned from */
 		spin_lock_irqsave(&fq->mq_flush_lock, flags);
 		hctx = blk_mq_map_queue(q, flush_rq->mq_ctx->cpu);
-		blk_mq_tag_set_rq(hctx, flush_rq->tag, fq->orig_rq);
-		flush_rq->tag = -1;
+		if (!q->elevator) {
+			blk_mq_tag_set_rq(hctx, flush_rq->tag, fq->orig_rq);
+			flush_rq->tag = -1;
+		} else {
+			blk_mq_put_driver_tag_hctx(hctx, flush_rq);
+			flush_rq->internal_tag = -1;
+		}
 	}
 
 	running = &fq->flush_queue[fq->flush_running_idx];
@@ -318,19 +323,26 @@ static bool blk_kick_flush(struct request_queue *q, struct blk_flush_queue *fq)
 	blk_rq_init(q, flush_rq);
 
 	/*
-	 * Borrow tag from the first request since they can't
-	 * be in flight at the same time. And acquire the tag's
-	 * ownership for flush req.
+	 * In case of none scheduler, borrow tag from the first request
+	 * since they can't be in flight at the same time. And acquire
+	 * the tag's ownership for flush req.
+	 *
+	 * In case of IO scheduler, flush rq need to borrow scheduler tag
+	 * just for cheating put/get driver tag.
 	 */
 	if (q->mq_ops) {
 		struct blk_mq_hw_ctx *hctx;
 
 		flush_rq->mq_ctx = first_rq->mq_ctx;
-		flush_rq->tag = first_rq->tag;
-		fq->orig_rq = first_rq;
 
-		hctx = blk_mq_map_queue(q, first_rq->mq_ctx->cpu);
-		blk_mq_tag_set_rq(hctx, first_rq->tag, flush_rq);
+		if (!q->elevator) {
+			fq->orig_rq = first_rq;
+			flush_rq->tag = first_rq->tag;
+			hctx = blk_mq_map_queue(q, first_rq->mq_ctx->cpu);
+			blk_mq_tag_set_rq(hctx, first_rq->tag, flush_rq);
+		} else {
+			flush_rq->internal_tag = first_rq->internal_tag;
+		}
 	}
 
 	flush_rq->cmd_flags = REQ_OP_FLUSH | REQ_PREFLUSH;
@@ -394,6 +406,11 @@ static void mq_flush_data_end_io(struct request *rq, blk_status_t error)
 
 	hctx = blk_mq_map_queue(q, ctx->cpu);
 
+	if (q->elevator) {
+		WARN_ON(rq->tag < 0);
+		blk_mq_put_driver_tag_hctx(hctx, rq);
+	}
+
 	/*
 	 * After populating an empty queue, kick it to avoid stall.  Read
 	 * the comment in flush_end_io().
diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 32e4ea2d0a77..eb082f82b470 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -366,29 +366,12 @@ static bool blk_mq_sched_bypass_insert(struct blk_mq_hw_ctx *hctx,
 		return true;
 	}
 
-	if (has_sched) {
+	if (has_sched)
 		rq->rq_flags |= RQF_SORTED;
-		WARN_ON(rq->tag != -1);
-	}
 
 	return false;
 }
 
-/*
- * Add flush/fua to the queue. If we fail getting a driver tag, then
- * punt to the requeue list. Requeue will re-invoke us from a context
- * that's safe to block from.
- */
-static void blk_mq_sched_insert_flush(struct blk_mq_hw_ctx *hctx,
-				      struct request *rq, bool can_block)
-{
-	if (blk_mq_get_driver_tag(rq, &hctx, can_block)) {
-		blk_insert_flush(rq);
-		blk_mq_run_hw_queue(hctx, true);
-	} else
-		blk_mq_add_to_requeue_list(rq, false, true);
-}
-
 void blk_mq_sched_insert_request(struct request *rq, bool at_head,
 				 bool run_queue, bool async, bool can_block)
 {
@@ -399,10 +382,12 @@ void blk_mq_sched_insert_request(struct request *rq, bool at_head,
 
 	/* flush rq in flush machinery need to be dispatched directly */
 	if (!(rq->rq_flags & RQF_FLUSH_SEQ) && op_is_flush(rq->cmd_flags)) {
-		blk_mq_sched_insert_flush(hctx, rq, can_block);
-		return;
+		blk_insert_flush(rq);
+		goto run;
 	}
 
+	WARN_ON(e && (rq->tag != -1));
+
 	if (blk_mq_sched_bypass_insert(hctx, !!e, rq))
 		goto run;
 
@@ -429,23 +414,6 @@ void blk_mq_sched_insert_requests(struct request_queue *q,
 	struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(q, ctx->cpu);
 	struct elevator_queue *e = hctx->queue->elevator;
 
-	if (e) {
-		struct request *rq, *next;
-
-		/*
-		 * We bypass requests that already have a driver tag assigned,
-		 * which should only be flushes. Flushes are only ever inserted
-		 * as single requests, so we shouldn't ever hit the
-		 * WARN_ON_ONCE() below (but let's handle it just in case).
-		 */
-		list_for_each_entry_safe(rq, next, list, queuelist) {
-			if (WARN_ON_ONCE(rq->tag != -1)) {
-				list_del_init(&rq->queuelist);
-				blk_mq_sched_bypass_insert(hctx, true, rq);
-			}
-		}
-	}
-
 	if (e && e->type->ops.mq.insert_requests)
 		e->type->ops.mq.insert_requests(hctx, list, false);
 	else
diff --git a/block/blk-mq.c b/block/blk-mq.c
index bf62065a15ec..445c74e0f5e5 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -650,6 +650,8 @@ static void __blk_mq_requeue_request(struct request *rq)
 {
 	struct request_queue *q = rq->q;
 
+	blk_mq_put_driver_tag(rq);
+
 	trace_block_rq_requeue(q, rq);
 	wbt_requeue(q->rq_wb, &rq->issue_stat);
 	blk_mq_sched_requeue_request(rq);
@@ -993,30 +995,6 @@ bool blk_mq_get_driver_tag(struct request *rq, struct blk_mq_hw_ctx **hctx,
 	return rq->tag != -1;
 }
 
-/*
- * If we fail getting a driver tag because all the driver tags are already
- * assigned and on the dispatch list, BUT the first entry does not have a
- * tag, then we could deadlock. For that case, move entries with assigned
- * driver tags to the front, leaving the set of tagged requests in the
- * same order, and the untagged set in the same order.
- */
-static bool reorder_tags_to_front(struct list_head *list)
-{
-	struct request *rq, *tmp, *first = NULL;
-
-	list_for_each_entry_safe_reverse(rq, tmp, list, queuelist) {
-		if (rq == first)
-			break;
-		if (rq->tag != -1) {
-			list_move(&rq->queuelist, list);
-			if (!first)
-				first = rq;
-		}
-	}
-
-	return first != NULL;
-}
-
 static int blk_mq_dispatch_wake(wait_queue_entry_t *wait, unsigned mode, int flags,
 				void *key)
 {
@@ -1077,9 +1055,6 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list,
 
 		rq = list_first_entry(list, struct request, queuelist);
 		if (!blk_mq_get_driver_tag(rq, &hctx, false)) {
-			if (!queued && reorder_tags_to_front(list))
-				continue;
-
 			/*
 			 * The initial allocation attempt failed, so we need to
 			 * rerun the hardware queue when a tag is freed.
@@ -1135,7 +1110,6 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list,
 				nxt = list_first_entry(list, struct request, queuelist);
 				blk_mq_put_driver_tag(nxt);
 			}
-			blk_mq_put_driver_tag_hctx(hctx, rq);
 			list_add(&rq->queuelist, list);
 			__blk_mq_requeue_request(rq);
 			break;
@@ -1704,13 +1678,10 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
 	if (unlikely(is_flush_fua)) {
 		blk_mq_put_ctx(data.ctx);
 		blk_mq_bio_to_request(rq, bio);
-		if (q->elevator) {
-			blk_mq_sched_insert_request(rq, false, true, true,
-					true);
-		} else {
-			blk_insert_flush(rq);
-			blk_mq_run_hw_queue(data.hctx, true);
-		}
+
+		/* bypass scheduler for flush rq */
+		blk_insert_flush(rq);
+		blk_mq_run_hw_queue(data.hctx, true);
 	} else if (plug && q->nr_hw_queues == 1) {
 		struct request *last = NULL;
 
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH V3 0/7] blk-mq: don't allocate driver tag beforehand for flush rq
  2017-11-02 15:24 [PATCH V3 0/7] blk-mq: don't allocate driver tag beforehand for flush rq Ming Lei
                   ` (6 preceding siblings ...)
  2017-11-02 15:24 ` [PATCH V3 7/7] blk-mq: don't allocate driver tag beforehand for flush rq Ming Lei
@ 2017-11-04  4:17 ` Ming Lei
  2017-11-04 14:18   ` Jens Axboe
  7 siblings, 1 reply; 11+ messages in thread
From: Ming Lei @ 2017-11-04  4:17 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Christoph Hellwig
  Cc: Omar Sandoval, Bart Van Assche, Hannes Reinecke, linux-kernel

On Thu, Nov 02, 2017 at 11:24:31PM +0800, Ming Lei wrote:
> Hi Jens,
> 
> This patchset avoids to allocate driver tag beforehand for flush rq
> in case of I/O scheduler, then flush rq isn't treated specially
> wrt. get/put driver tag, code gets cleanup much, such as,
> reorder_tags_to_front() is removed, and we needn't to worry
> about request order in dispatch list for avoiding I/O deadlock.
> 
> 'dbench -t 30 -s 64' has been run on different devices(shared tag,
> multi-queue, singele queue, ...), and no issues are observed,
> even very low queue depth test are run, debench still works well.
> 
> Please consider it for V4.15, thanks!

Hi Jens,

As we discussed before, this patch is a good cleanup on handling flush
request, could you share your opinion on V3?

thanks,
Ming

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH V3 0/7] blk-mq: don't allocate driver tag beforehand for flush rq
  2017-11-04  4:17 ` [PATCH V3 0/7] " Ming Lei
@ 2017-11-04 14:18   ` Jens Axboe
  2017-11-04 18:49     ` Jens Axboe
  0 siblings, 1 reply; 11+ messages in thread
From: Jens Axboe @ 2017-11-04 14:18 UTC (permalink / raw)
  To: Ming Lei, linux-block, Christoph Hellwig
  Cc: Omar Sandoval, Bart Van Assche, Hannes Reinecke, linux-kernel

On 11/03/2017 10:17 PM, Ming Lei wrote:
> On Thu, Nov 02, 2017 at 11:24:31PM +0800, Ming Lei wrote:
>> Hi Jens,
>>
>> This patchset avoids to allocate driver tag beforehand for flush rq
>> in case of I/O scheduler, then flush rq isn't treated specially
>> wrt. get/put driver tag, code gets cleanup much, such as,
>> reorder_tags_to_front() is removed, and we needn't to worry
>> about request order in dispatch list for avoiding I/O deadlock.
>>
>> 'dbench -t 30 -s 64' has been run on different devices(shared tag,
>> multi-queue, singele queue, ...), and no issues are observed,
>> even very low queue depth test are run, debench still works well.
>>
>> Please consider it for V4.15, thanks!
> 
> Hi Jens,
> 
> As we discussed before, this patch is a good cleanup on handling flush
> request, could you share your opinion on V3?

It looks fine to me. But I'd really like to have the potential hang
in the current 4.15 branch ironed out, before we pile more stuff on
top. Meanwhile, I'll see if this passes my testing.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH V3 0/7] blk-mq: don't allocate driver tag beforehand for flush rq
  2017-11-04 14:18   ` Jens Axboe
@ 2017-11-04 18:49     ` Jens Axboe
  0 siblings, 0 replies; 11+ messages in thread
From: Jens Axboe @ 2017-11-04 18:49 UTC (permalink / raw)
  To: Ming Lei, linux-block, Christoph Hellwig
  Cc: Omar Sandoval, Bart Van Assche, Hannes Reinecke, linux-kernel

On 11/04/2017 08:18 AM, Jens Axboe wrote:
> On 11/03/2017 10:17 PM, Ming Lei wrote:
>> On Thu, Nov 02, 2017 at 11:24:31PM +0800, Ming Lei wrote:
>>> Hi Jens,
>>>
>>> This patchset avoids to allocate driver tag beforehand for flush rq
>>> in case of I/O scheduler, then flush rq isn't treated specially
>>> wrt. get/put driver tag, code gets cleanup much, such as,
>>> reorder_tags_to_front() is removed, and we needn't to worry
>>> about request order in dispatch list for avoiding I/O deadlock.
>>>
>>> 'dbench -t 30 -s 64' has been run on different devices(shared tag,
>>> multi-queue, singele queue, ...), and no issues are observed,
>>> even very low queue depth test are run, debench still works well.
>>>
>>> Please consider it for V4.15, thanks!
>>
>> Hi Jens,
>>
>> As we discussed before, this patch is a good cleanup on handling flush
>> request, could you share your opinion on V3?
> 
> It looks fine to me. But I'd really like to have the potential hang
> in the current 4.15 branch ironed out, before we pile more stuff on
> top. Meanwhile, I'll see if this passes my testing.

Passes testing. I applied it, but a heads-up that I rewrote most of
your changelogs and also changed formatting. Please at least follow
the 72 char line rule for commit messages, otherwise git log ends
up looking like a mess really quick.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2017-11-04 18:49 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-11-02 15:24 [PATCH V3 0/7] blk-mq: don't allocate driver tag beforehand for flush rq Ming Lei
2017-11-02 15:24 ` [PATCH V3 1/7] blk-mq: put the driver tag of nxt rq before first one is requeued Ming Lei
2017-11-02 15:24 ` [PATCH V3 2/7] blk-flush: don't run queue for requests of bypassing flush Ming Lei
2017-11-02 15:24 ` [PATCH V3 3/7] block: pass 'run_queue' to blk_mq_request_bypass_insert Ming Lei
2017-11-02 15:24 ` [PATCH V3 4/7] blk-flush: use blk_mq_request_bypass_insert() Ming Lei
2017-11-02 15:24 ` [PATCH V3 5/7] blk-mq-sched: decide how to handle flush rq via RQF_FLUSH_SEQ Ming Lei
2017-11-02 15:24 ` [PATCH V3 6/7] blk-mq: move blk_mq_put_driver_tag*() into blk-mq.h Ming Lei
2017-11-02 15:24 ` [PATCH V3 7/7] blk-mq: don't allocate driver tag beforehand for flush rq Ming Lei
2017-11-04  4:17 ` [PATCH V3 0/7] " Ming Lei
2017-11-04 14:18   ` Jens Axboe
2017-11-04 18:49     ` Jens Axboe

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.