* [PATCH V2 0/7] blk-mq: don't allocate driver tag beforehand for flush rq
@ 2017-09-16  3:03 Ming Lei
  2017-09-16  3:03 ` Ming Lei
                   ` (7 more replies)
  0 siblings, 8 replies; 10+ messages in thread
From: Ming Lei @ 2017-09-16  3:03 UTC (permalink / raw)
  To: Jens Axboe, linux-block; +Cc: Omar Sandoval, Ming Lei

Hi,

This patchset avoids allocating a driver tag beforehand for the flush rq
when an I/O scheduler is in use, so the flush rq is no longer treated
specially wrt. getting/putting the driver tag. The code gets cleaned up
a lot: for example, reorder_tags_to_front() is removed, and we no longer
need to worry about request order in the dispatch list to avoid I/O
deadlock.

'dbench -t 30 -s -F 64' has been run on different devices (shared tags,
multi-queue, single queue, ...), and no issues were observed; even when
the test is run at a very low queue depth (1), dbench still works well.

Any comments are welcome!

V2:
	- release driver tag before requeue
	- warning on inserting a req with driver tag


Ming Lei (7):
  blk-flush: don't run queue for requests of bypassing flush
  block: pass 'run_queue' to blk_mq_request_bypass_insert
  blk-flush: use blk_mq_request_bypass_insert()
  blk-mq: decide how to handle flush rq via RQF_FLUSH_SEQ
  blk-mq: move blk_mq_put_driver_tag*() into blk-mq.h
  blk-mq: don't allocate driver tag beforehand for flush rq
  blk-mq-sched: warning on inserting a req with driver tag allocated

 block/blk-core.c     |  2 +-
 block/blk-flush.c    | 43 +++++++++++++++++++++++-----
 block/blk-mq-sched.c | 65 ++++++++++++-----------------------------
 block/blk-mq.c       | 81 +++++++++-------------------------------------------
 block/blk-mq.h       | 35 ++++++++++++++++++++++-
 5 files changed, 102 insertions(+), 124 deletions(-)

-- 
2.9.5



* [PATCH V2 1/7] blk-flush: don't run queue for requests of bypassing flush
  2017-09-16  3:03 [PATCH V2 0/7] blk-mq: don't allocate driver tag beforehand for flush rq Ming Lei
  2017-09-16  3:03 ` Ming Lei
@ 2017-09-16  3:03 ` Ming Lei
  2017-09-16  3:03 ` [PATCH V2 2/7] block: pass 'run_queue' to blk_mq_request_bypass_insert Ming Lei
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: Ming Lei @ 2017-09-16  3:03 UTC (permalink / raw)
  To: Jens Axboe, linux-block; +Cc: Omar Sandoval, Ming Lei

blk_insert_flush() should only insert the request, since a queue run
always follows it.

For the case of bypassing the flush machinery, we don't need to run the
queue here, because every call to blk_insert_flush() is already followed
by a queue run.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-flush.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/blk-flush.c b/block/blk-flush.c
index 4938bec8cfef..81bd1a843043 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -463,7 +463,7 @@ void blk_insert_flush(struct request *rq)
 	if ((policy & REQ_FSEQ_DATA) &&
 	    !(policy & (REQ_FSEQ_PREFLUSH | REQ_FSEQ_POSTFLUSH))) {
 		if (q->mq_ops)
-			blk_mq_sched_insert_request(rq, false, true, false, false);
+			blk_mq_sched_insert_request(rq, false, false, false, false);
 		else
 			list_add_tail(&rq->queuelist, &q->queue_head);
 		return;
-- 
2.9.5


* [PATCH V2 2/7] block: pass 'run_queue' to blk_mq_request_bypass_insert
  2017-09-16  3:03 [PATCH V2 0/7] blk-mq: don't allocate driver tag beforehand for flush rq Ming Lei
  2017-09-16  3:03 ` Ming Lei
  2017-09-16  3:03 ` [PATCH V2 1/7] blk-flush: don't run queue for requests of bypassing flush Ming Lei
@ 2017-09-16  3:03 ` Ming Lei
  2017-09-16  3:03 ` [PATCH V2 3/7] blk-flush: use blk_mq_request_bypass_insert() Ming Lei
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: Ming Lei @ 2017-09-16  3:03 UTC (permalink / raw)
  To: Jens Axboe, linux-block; +Cc: Omar Sandoval, Ming Lei

The block flush code needs this function without the queue run, so
introduce a 'run_queue' parameter to make the run optional.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-core.c | 2 +-
 block/blk-mq.c   | 5 +++--
 block/blk-mq.h   | 2 +-
 3 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index aebe676225e6..bb9ec0013582 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -2347,7 +2347,7 @@ blk_status_t blk_insert_cloned_request(struct request_queue *q, struct request *
 		 * bypass a potential scheduler on the bottom device for
 		 * insert.
 		 */
-		blk_mq_request_bypass_insert(rq);
+		blk_mq_request_bypass_insert(rq, true);
 		return BLK_STS_OK;
 	}
 
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 98a18609755e..21331e785919 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1405,7 +1405,7 @@ void __blk_mq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
  * Should only be used carefully, when the caller knows we want to
  * bypass a potential IO scheduler on the target device.
  */
-void blk_mq_request_bypass_insert(struct request *rq)
+void blk_mq_request_bypass_insert(struct request *rq, bool run_queue)
 {
 	struct blk_mq_ctx *ctx = rq->mq_ctx;
 	struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(rq->q, ctx->cpu);
@@ -1414,7 +1414,8 @@ void blk_mq_request_bypass_insert(struct request *rq)
 	list_add_tail(&rq->queuelist, &hctx->dispatch);
 	spin_unlock(&hctx->lock);
 
-	blk_mq_run_hw_queue(hctx, false);
+	if (run_queue)
+		blk_mq_run_hw_queue(hctx, false);
 }
 
 void blk_mq_insert_requests(struct blk_mq_hw_ctx *hctx, struct blk_mq_ctx *ctx,
diff --git a/block/blk-mq.h b/block/blk-mq.h
index ef15b3414da5..038bdeb84629 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -54,7 +54,7 @@ int blk_mq_alloc_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
  */
 void __blk_mq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
 				bool at_head);
-void blk_mq_request_bypass_insert(struct request *rq);
+void blk_mq_request_bypass_insert(struct request *rq, bool run_queue);
 void blk_mq_insert_requests(struct blk_mq_hw_ctx *hctx, struct blk_mq_ctx *ctx,
 				struct list_head *list);
 
-- 
2.9.5


* [PATCH V2 3/7] blk-flush: use blk_mq_request_bypass_insert()
  2017-09-16  3:03 [PATCH V2 0/7] blk-mq: don't allocate driver tag beforehand for flush rq Ming Lei
                   ` (2 preceding siblings ...)
  2017-09-16  3:03 ` [PATCH V2 2/7] block: pass 'run_queue' to blk_mq_request_bypass_insert Ming Lei
@ 2017-09-16  3:03 ` Ming Lei
  2017-09-16  3:03 ` [PATCH V2 4/7] blk-mq: decide how to handle flush rq via RQF_FLUSH_SEQ Ming Lei
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: Ming Lei @ 2017-09-16  3:03 UTC (permalink / raw)
  To: Jens Axboe, linux-block; +Cc: Omar Sandoval, Ming Lei

In a following patch, we will use RQF_FLUSH_SEQ
to decide:

- if the flag isn't set, the flush rq needs to be inserted
via blk_insert_flush();
- otherwise, the flush rq needs to be dispatched directly,
since it is already in the flush machinery.

So use blk_mq_request_bypass_insert() for requests that
bypass the flush machinery, like the legacy path does.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-flush.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/blk-flush.c b/block/blk-flush.c
index 81bd1a843043..a9773d2075ac 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -463,7 +463,7 @@ void blk_insert_flush(struct request *rq)
 	if ((policy & REQ_FSEQ_DATA) &&
 	    !(policy & (REQ_FSEQ_PREFLUSH | REQ_FSEQ_POSTFLUSH))) {
 		if (q->mq_ops)
-			blk_mq_sched_insert_request(rq, false, false, false, false);
+			blk_mq_request_bypass_insert(rq, false);
 		else
 			list_add_tail(&rq->queuelist, &q->queue_head);
 		return;
-- 
2.9.5


* [PATCH V2 4/7] blk-mq: decide how to handle flush rq via RQF_FLUSH_SEQ
  2017-09-16  3:03 [PATCH V2 0/7] blk-mq: don't allocate driver tag beforehand for flush rq Ming Lei
                   ` (3 preceding siblings ...)
  2017-09-16  3:03 ` [PATCH V2 3/7] blk-flush: use blk_mq_request_bypass_insert() Ming Lei
@ 2017-09-16  3:03 ` Ming Lei
  2017-09-16  3:03 ` [PATCH V2 5/7] blk-mq: move blk_mq_put_driver_tag*() into blk-mq.h Ming Lei
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: Ming Lei @ 2017-09-16  3:03 UTC (permalink / raw)
  To: Jens Axboe, linux-block; +Cc: Omar Sandoval, Ming Lei

Currently we always preallocate one driver tag before blk_insert_flush(),
and the flush request is marked with RQF_FLUSH_SEQ once it enters the
flush machinery.

So if RQF_FLUSH_SEQ isn't set, call blk_insert_flush() to handle the
request; otherwise the flush request is dispatched to the ->dispatch
list directly.

This is a preparation patch for not preallocating the driver tag for
the flush rq and for removing that special case.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq-sched.c | 29 ++++++++++++++++-------------
 1 file changed, 16 insertions(+), 13 deletions(-)

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 4ab69435708c..fa0f33d7efb9 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -260,21 +260,23 @@ void blk_mq_sched_request_inserted(struct request *rq)
 EXPORT_SYMBOL_GPL(blk_mq_sched_request_inserted);
 
 static bool blk_mq_sched_bypass_insert(struct blk_mq_hw_ctx *hctx,
+				       bool has_sched,
 				       struct request *rq)
 {
-	if (rq->tag == -1) {
+	/* dispatch flush rq directly */
+	if (rq->rq_flags & RQF_FLUSH_SEQ) {
+		spin_lock(&hctx->lock);
+		list_add(&rq->queuelist, &hctx->dispatch);
+		spin_unlock(&hctx->lock);
+		return true;
+	}
+
+	if (has_sched) {
 		rq->rq_flags |= RQF_SORTED;
-		return false;
+		WARN_ON(rq->tag != -1);
 	}
 
-	/*
-	 * If we already have a real request tag, send directly to
-	 * the dispatch list.
-	 */
-	spin_lock(&hctx->lock);
-	list_add(&rq->queuelist, &hctx->dispatch);
-	spin_unlock(&hctx->lock);
-	return true;
+	return false;
 }
 
 /**
@@ -362,12 +364,13 @@ void blk_mq_sched_insert_request(struct request *rq, bool at_head,
 	struct blk_mq_ctx *ctx = rq->mq_ctx;
 	struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(q, ctx->cpu);
 
-	if (rq->tag == -1 && op_is_flush(rq->cmd_flags)) {
+	/* flush rq in flush machinery need to be dispatched directly */
+	if (!(rq->rq_flags & RQF_FLUSH_SEQ) && op_is_flush(rq->cmd_flags)) {
 		blk_mq_sched_insert_flush(hctx, rq, can_block);
 		return;
 	}
 
-	if (e && blk_mq_sched_bypass_insert(hctx, rq))
+	if (blk_mq_sched_bypass_insert(hctx, !!e, rq))
 		goto run;
 
 	if (e && e->type->ops.mq.insert_requests) {
@@ -405,7 +408,7 @@ void blk_mq_sched_insert_requests(struct request_queue *q,
 		list_for_each_entry_safe(rq, next, list, queuelist) {
 			if (WARN_ON_ONCE(rq->tag != -1)) {
 				list_del_init(&rq->queuelist);
-				blk_mq_sched_bypass_insert(hctx, rq);
+				blk_mq_sched_bypass_insert(hctx, true, rq);
 			}
 		}
 	}
-- 
2.9.5


* [PATCH V2 5/7] blk-mq: move blk_mq_put_driver_tag*() into blk-mq.h
  2017-09-16  3:03 [PATCH V2 0/7] blk-mq: don't allocate driver tag beforehand for flush rq Ming Lei
                   ` (4 preceding siblings ...)
  2017-09-16  3:03 ` [PATCH V2 4/7] blk-mq: decide how to handle flush rq via RQF_FLUSH_SEQ Ming Lei
@ 2017-09-16  3:03 ` Ming Lei
  2017-09-16  3:03 ` [PATCH V2 6/7] blk-mq: don't allocate driver tag beforehand for flush rq Ming Lei
  2017-09-16  3:03 ` [PATCH V2 7/7] blk-mq-sched: warning on inserting a req with driver tag allocated Ming Lei
  7 siblings, 0 replies; 10+ messages in thread
From: Ming Lei @ 2017-09-16  3:03 UTC (permalink / raw)
  To: Jens Axboe, linux-block; +Cc: Omar Sandoval, Ming Lei

We need these helpers to put the driver tag for the flush rq, since we
will not share the tag in the flush request sequence when an I/O
scheduler is in use.

Also, the driver tag needs to be released before requeueing.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq.c | 32 --------------------------------
 block/blk-mq.h | 33 +++++++++++++++++++++++++++++++++
 2 files changed, 33 insertions(+), 32 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 21331e785919..b799d4e22c0b 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -920,38 +920,6 @@ bool blk_mq_get_driver_tag(struct request *rq, struct blk_mq_hw_ctx **hctx,
 	return rq->tag != -1;
 }
 
-static void __blk_mq_put_driver_tag(struct blk_mq_hw_ctx *hctx,
-				    struct request *rq)
-{
-	blk_mq_put_tag(hctx, hctx->tags, rq->mq_ctx, rq->tag);
-	rq->tag = -1;
-
-	if (rq->rq_flags & RQF_MQ_INFLIGHT) {
-		rq->rq_flags &= ~RQF_MQ_INFLIGHT;
-		atomic_dec(&hctx->nr_active);
-	}
-}
-
-static void blk_mq_put_driver_tag_hctx(struct blk_mq_hw_ctx *hctx,
-				       struct request *rq)
-{
-	if (rq->tag == -1 || rq->internal_tag == -1)
-		return;
-
-	__blk_mq_put_driver_tag(hctx, rq);
-}
-
-static void blk_mq_put_driver_tag(struct request *rq)
-{
-	struct blk_mq_hw_ctx *hctx;
-
-	if (rq->tag == -1 || rq->internal_tag == -1)
-		return;
-
-	hctx = blk_mq_map_queue(rq->q, rq->mq_ctx->cpu);
-	__blk_mq_put_driver_tag(hctx, rq);
-}
-
 /*
  * If we fail getting a driver tag because all the driver tags are already
  * assigned and on the dispatch list, BUT the first entry does not have a
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 038bdeb84629..2bef2d0cca7b 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -2,6 +2,7 @@
 #define INT_BLK_MQ_H
 
 #include "blk-stat.h"
+#include "blk-mq-tag.h"
 
 struct blk_mq_tag_set;
 
@@ -137,4 +138,36 @@ static inline bool blk_mq_hw_queue_mapped(struct blk_mq_hw_ctx *hctx)
 void blk_mq_in_flight(struct request_queue *q, struct hd_struct *part,
 			unsigned int inflight[2]);
 
+static inline void __blk_mq_put_driver_tag(struct blk_mq_hw_ctx *hctx,
+					   struct request *rq)
+{
+	blk_mq_put_tag(hctx, hctx->tags, rq->mq_ctx, rq->tag);
+	rq->tag = -1;
+
+	if (rq->rq_flags & RQF_MQ_INFLIGHT) {
+		rq->rq_flags &= ~RQF_MQ_INFLIGHT;
+		atomic_dec(&hctx->nr_active);
+	}
+}
+
+static inline void blk_mq_put_driver_tag_hctx(struct blk_mq_hw_ctx *hctx,
+				       struct request *rq)
+{
+	if (rq->tag == -1 || rq->internal_tag == -1)
+		return;
+
+	__blk_mq_put_driver_tag(hctx, rq);
+}
+
+static inline void blk_mq_put_driver_tag(struct request *rq)
+{
+	struct blk_mq_hw_ctx *hctx;
+
+	if (rq->tag == -1 || rq->internal_tag == -1)
+		return;
+
+	hctx = blk_mq_map_queue(rq->q, rq->mq_ctx->cpu);
+	__blk_mq_put_driver_tag(hctx, rq);
+}
+
 #endif
-- 
2.9.5


* [PATCH V2 6/7] blk-mq: don't allocate driver tag beforehand for flush rq
  2017-09-16  3:03 [PATCH V2 0/7] blk-mq: don't allocate driver tag beforehand for flush rq Ming Lei
                   ` (5 preceding siblings ...)
  2017-09-16  3:03 ` [PATCH V2 5/7] blk-mq: move blk_mq_put_driver_tag*() into blk-mq.h Ming Lei
@ 2017-09-16  3:03 ` Ming Lei
  2017-09-16  3:03 ` [PATCH V2 7/7] blk-mq-sched: warning on inserting a req with driver tag allocated Ming Lei
  7 siblings, 0 replies; 10+ messages in thread
From: Ming Lei @ 2017-09-16  3:03 UTC (permalink / raw)
  To: Jens Axboe, linux-block; +Cc: Omar Sandoval, Ming Lei

The idea behind this is simple:

1) for the 'none' scheduler, the driver tag has to be borrowed for the
flush rq, otherwise we may run out of tags and cause an I/O hang.
Getting/putting the driver tag is actually a nop there, so reordering
tags isn't necessary at all.

2) for a real I/O scheduler, we don't need to allocate a driver tag
beforehand for the flush rq; it works just fine to follow the same
path as normal requests: allocate the driver tag for each rq just
before calling .queue_rq().

One visible change to drivers is that the driver tag isn't shared
across the flush request sequence; that won't be a problem, since the
legacy path has always behaved this way.

Then the flush rq is no longer treated specially wrt. getting/putting
the driver tag, and the code gets cleaned up a lot: for example,
reorder_tags_to_front() is removed, and we no longer need to worry
about request order in the dispatch list to avoid I/O deadlock.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-flush.c    | 41 +++++++++++++++++++++++++++++++++++------
 block/blk-mq-sched.c | 36 ++----------------------------------
 block/blk-mq.c       | 44 ++++++++++----------------------------------
 3 files changed, 47 insertions(+), 74 deletions(-)

diff --git a/block/blk-flush.c b/block/blk-flush.c
index a9773d2075ac..65914a23fa77 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -216,6 +216,17 @@ static bool blk_flush_complete_seq(struct request *rq,
 	return kicked | queued;
 }
 
+/*
+ * We don't share tag between flush rq and data rq in case of
+ * IO scheduler, so have to release tag and restart queue
+ */
+static void put_flush_driver_tag(struct blk_mq_hw_ctx *hctx,
+				 struct request *rq)
+{
+	blk_mq_put_driver_tag_hctx(hctx, rq);
+	blk_mq_sched_restart(hctx);
+}
+
 static void flush_end_io(struct request *flush_rq, blk_status_t error)
 {
 	struct request_queue *q = flush_rq->q;
@@ -231,8 +242,13 @@ static void flush_end_io(struct request *flush_rq, blk_status_t error)
 		/* release the tag's ownership to the req cloned from */
 		spin_lock_irqsave(&fq->mq_flush_lock, flags);
 		hctx = blk_mq_map_queue(q, flush_rq->mq_ctx->cpu);
-		blk_mq_tag_set_rq(hctx, flush_rq->tag, fq->orig_rq);
-		flush_rq->tag = -1;
+		if (!q->elevator) {
+			blk_mq_tag_set_rq(hctx, flush_rq->tag, fq->orig_rq);
+			flush_rq->tag = -1;
+		} else {
+			put_flush_driver_tag(hctx, flush_rq);
+			flush_rq->internal_tag = -1;
+		}
 	}
 
 	running = &fq->flush_queue[fq->flush_running_idx];
@@ -321,16 +337,24 @@ static bool blk_kick_flush(struct request_queue *q, struct blk_flush_queue *fq)
 	 * Borrow tag from the first request since they can't
 	 * be in flight at the same time. And acquire the tag's
 	 * ownership for flush req.
+	 *
+	 * In case of io scheduler, flush rq need to borrow
+	 * scheduler tag for making put/get driver tag workable.
+	 * In case of none, flush rq need to borrow driver tag.
 	 */
 	if (q->mq_ops) {
 		struct blk_mq_hw_ctx *hctx;
 
 		flush_rq->mq_ctx = first_rq->mq_ctx;
-		flush_rq->tag = first_rq->tag;
-		fq->orig_rq = first_rq;
 
-		hctx = blk_mq_map_queue(q, first_rq->mq_ctx->cpu);
-		blk_mq_tag_set_rq(hctx, first_rq->tag, flush_rq);
+		if (!q->elevator) {
+			fq->orig_rq = first_rq;
+			flush_rq->tag = first_rq->tag;
+			hctx = blk_mq_map_queue(q, first_rq->mq_ctx->cpu);
+			blk_mq_tag_set_rq(hctx, first_rq->tag, flush_rq);
+		} else {
+			flush_rq->internal_tag = first_rq->internal_tag;
+		}
 	}
 
 	flush_rq->cmd_flags = REQ_OP_FLUSH | REQ_PREFLUSH;
@@ -394,6 +418,11 @@ static void mq_flush_data_end_io(struct request *rq, blk_status_t error)
 
 	hctx = blk_mq_map_queue(q, ctx->cpu);
 
+	if (q->elevator) {
+		WARN_ON(rq->tag < 0);
+		put_flush_driver_tag(hctx, rq);
+	}
+
 	/*
 	 * After populating an empty queue, kick it to avoid stall.  Read
 	 * the comment in flush_end_io().
diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index fa0f33d7efb9..c2ea8c2b896b 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -341,21 +341,6 @@ void blk_mq_sched_restart(struct blk_mq_hw_ctx *const hctx)
 	}
 }
 
-/*
- * Add flush/fua to the queue. If we fail getting a driver tag, then
- * punt to the requeue list. Requeue will re-invoke us from a context
- * that's safe to block from.
- */
-static void blk_mq_sched_insert_flush(struct blk_mq_hw_ctx *hctx,
-				      struct request *rq, bool can_block)
-{
-	if (blk_mq_get_driver_tag(rq, &hctx, can_block)) {
-		blk_insert_flush(rq);
-		blk_mq_run_hw_queue(hctx, true);
-	} else
-		blk_mq_add_to_requeue_list(rq, false, true);
-}
-
 void blk_mq_sched_insert_request(struct request *rq, bool at_head,
 				 bool run_queue, bool async, bool can_block)
 {
@@ -366,8 +351,8 @@ void blk_mq_sched_insert_request(struct request *rq, bool at_head,
 
 	/* flush rq in flush machinery need to be dispatched directly */
 	if (!(rq->rq_flags & RQF_FLUSH_SEQ) && op_is_flush(rq->cmd_flags)) {
-		blk_mq_sched_insert_flush(hctx, rq, can_block);
-		return;
+		blk_insert_flush(rq);
+		goto run;
 	}
 
 	if (blk_mq_sched_bypass_insert(hctx, !!e, rq))
@@ -396,23 +381,6 @@ void blk_mq_sched_insert_requests(struct request_queue *q,
 	struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(q, ctx->cpu);
 	struct elevator_queue *e = hctx->queue->elevator;
 
-	if (e) {
-		struct request *rq, *next;
-
-		/*
-		 * We bypass requests that already have a driver tag assigned,
-		 * which should only be flushes. Flushes are only ever inserted
-		 * as single requests, so we shouldn't ever hit the
-		 * WARN_ON_ONCE() below (but let's handle it just in case).
-		 */
-		list_for_each_entry_safe(rq, next, list, queuelist) {
-			if (WARN_ON_ONCE(rq->tag != -1)) {
-				list_del_init(&rq->queuelist);
-				blk_mq_sched_bypass_insert(hctx, true, rq);
-			}
-		}
-	}
-
 	if (e && e->type->ops.mq.insert_requests)
 		e->type->ops.mq.insert_requests(hctx, list, false);
 	else
diff --git a/block/blk-mq.c b/block/blk-mq.c
index b799d4e22c0b..c02dea2e3c1e 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -634,6 +634,12 @@ static void __blk_mq_requeue_request(struct request *rq)
 {
 	struct request_queue *q = rq->q;
 
+	/*
+	 * release driver tag if the rq is being requeued from
+	 * completion path
+	 */
+	blk_mq_put_driver_tag(rq);
+
 	trace_block_rq_requeue(q, rq);
 	wbt_requeue(q->rq_wb, &rq->issue_stat);
 	blk_mq_sched_requeue_request(rq);
@@ -920,30 +926,6 @@ bool blk_mq_get_driver_tag(struct request *rq, struct blk_mq_hw_ctx **hctx,
 	return rq->tag != -1;
 }
 
-/*
- * If we fail getting a driver tag because all the driver tags are already
- * assigned and on the dispatch list, BUT the first entry does not have a
- * tag, then we could deadlock. For that case, move entries with assigned
- * driver tags to the front, leaving the set of tagged requests in the
- * same order, and the untagged set in the same order.
- */
-static bool reorder_tags_to_front(struct list_head *list)
-{
-	struct request *rq, *tmp, *first = NULL;
-
-	list_for_each_entry_safe_reverse(rq, tmp, list, queuelist) {
-		if (rq == first)
-			break;
-		if (rq->tag != -1) {
-			list_move(&rq->queuelist, list);
-			if (!first)
-				first = rq;
-		}
-	}
-
-	return first != NULL;
-}
-
 static int blk_mq_dispatch_wake(wait_queue_entry_t *wait, unsigned mode, int flags,
 				void *key)
 {
@@ -1001,9 +983,6 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list)
 
 		rq = list_first_entry(list, struct request, queuelist);
 		if (!blk_mq_get_driver_tag(rq, &hctx, false)) {
-			if (!queued && reorder_tags_to_front(list))
-				continue;
-
 			/*
 			 * The initial allocation attempt failed, so we need to
 			 * rerun the hardware queue when a tag is freed.
@@ -1610,13 +1589,10 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
 	if (unlikely(is_flush_fua)) {
 		blk_mq_put_ctx(data.ctx);
 		blk_mq_bio_to_request(rq, bio);
-		if (q->elevator) {
-			blk_mq_sched_insert_request(rq, false, true, true,
-					true);
-		} else {
-			blk_insert_flush(rq);
-			blk_mq_run_hw_queue(data.hctx, true);
-		}
+
+		/* bypass scheduler for flush rq */
+		blk_insert_flush(rq);
+		blk_mq_run_hw_queue(data.hctx, true);
 	} else if (plug && q->nr_hw_queues == 1) {
 		struct request *last = NULL;
 
-- 
2.9.5


* [PATCH V2 7/7] blk-mq-sched: warning on inserting a req with driver tag allocated
  2017-09-16  3:03 [PATCH V2 0/7] blk-mq: don't allocate driver tag beforehand for flush rq Ming Lei
                   ` (6 preceding siblings ...)
  2017-09-16  3:03 ` [PATCH V2 6/7] blk-mq: don't allocate driver tag beforehand for flush rq Ming Lei
@ 2017-09-16  3:03 ` Ming Lei
  7 siblings, 0 replies; 10+ messages in thread
From: Ming Lei @ 2017-09-16  3:03 UTC (permalink / raw)
  To: Jens Axboe, linux-block; +Cc: Omar Sandoval, Ming Lei

When an I/O scheduler is in use, no request should have a driver tag
assigned before dispatch, so add a warning to catch possible bugs.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq-sched.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index c2ea8c2b896b..b6934a9424ff 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -271,10 +271,8 @@ static bool blk_mq_sched_bypass_insert(struct blk_mq_hw_ctx *hctx,
 		return true;
 	}
 
-	if (has_sched) {
+	if (has_sched)
 		rq->rq_flags |= RQF_SORTED;
-		WARN_ON(rq->tag != -1);
-	}
 
 	return false;
 }
@@ -355,6 +353,8 @@ void blk_mq_sched_insert_request(struct request *rq, bool at_head,
 		goto run;
 	}
 
+	WARN_ON(e && (rq->tag != -1));
+
 	if (blk_mq_sched_bypass_insert(hctx, !!e, rq))
 		goto run;
 
-- 
2.9.5


* Re: [PATCH V2 0/7] blk-mq: don't allocate driver tag beforehand for flush rq
  2017-09-16  3:03 ` Ming Lei
@ 2017-09-22  0:36   ` Ming Lei
  0 siblings, 0 replies; 10+ messages in thread
From: Ming Lei @ 2017-09-22  0:36 UTC (permalink / raw)
  To: Jens Axboe, linux-block; +Cc: Omar Sandoval

On Sat, Sep 16, 2017 at 11:03:44AM +0800, Ming Lei wrote:
> Hi,
> 
> This patchset avoids allocating a driver tag beforehand for the flush rq
> when an I/O scheduler is in use, so the flush rq is no longer treated
> specially wrt. getting/putting the driver tag. The code gets cleaned up
> a lot: for example, reorder_tags_to_front() is removed, and we no longer
> need to worry about request order in the dispatch list to avoid I/O
> deadlock.
> 
> 'dbench -t 30 -s -F 64' has been run on different devices (shared tags,
> multi-queue, single queue, ...), and no issues were observed; even when
> the test is run at a very low queue depth (1), dbench still works well.
> 
> Any comments are welcome!
> 
> V2:
> 	- release driver tag before requeue
> 	- warning on inserting a req with driver tag

Hi Jens,

Did V2 pass your tests?

If so, and you are fine with this patchset, could you move forward and
land it in your for-next branch?

-- 
Ming

