linux-block.vger.kernel.org archive mirror
* [PATCH V2 0/5] dm-rq: improve sequential I/O performance
@ 2017-11-27  5:07 Ming Lei
  2017-11-27  5:07 ` [PATCH V2 1/5] dm-mpath: don't call blk_mq_delay_run_hw_queue() in case of BLK_STS_RESOURCE Ming Lei
                   ` (4 more replies)
  0 siblings, 5 replies; 8+ messages in thread
From: Ming Lei @ 2017-11-27  5:07 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Mike Snitzer, dm-devel
  Cc: Christoph Hellwig, Bart Van Assche, linux-kernel,
	Hannes Reinecke, Omar Sandoval, Ming Lei

Hi Guys,

The 1st patch removes the blk_mq_delay_run_hw_queue() workaround in
case of requeue. This workaround isn't necessary; worse, it makes
BLK_MQ_S_SCHED_RESTART not work and degrades I/O performance.

The 2nd patch returns DM_MAPIO_REQUEUE to dm-rq if underlying request
allocation fails, so that we can return BLK_STS_RESOURCE from dm-rq to
blk-mq and blk-mq can hold the requests to be dequeued.

The other 3 patches change the blk-mq part of blk_insert_cloned_request():
we switch to blk_mq_try_issue_directly(), so that both dm-rq
and blk-mq can get the dispatch result of the underlying queue. With
this information, blk-mq can handle IO merge much better, and
sequential I/O performance is improved significantly. In my dm-mpath over
virtio-scsi test, the improvement is 3X ~ 5X.

V2:
	- drop 'dm-mpath: cache ti->clone during requeue', which is a bit
	too complicated and did not show an obvious performance improvement.
	- make change on blk-mq part cleaner

Ming Lei (5):
  dm-mpath: don't call blk_mq_delay_run_hw_queue() in case of
    BLK_STS_RESOURCE
  dm-mpath: return DM_MAPIO_REQUEUE in case of rq allocation failure
  blk-mq: move actual issue into one helper
  blk-mq: return dispatch result to caller in blk_mq_try_issue_directly
  blk-mq: issue request directly for blk_insert_cloned_request

 block/blk-core.c      |  3 +-
 block/blk-mq.c        | 88 +++++++++++++++++++++++++++++++++++++++------------
 block/blk-mq.h        |  3 ++
 drivers/md/dm-mpath.c | 18 ++++++++---
 drivers/md/dm-rq.c    | 20 +++++++++---
 5 files changed, 102 insertions(+), 30 deletions(-)

-- 
2.9.5

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH V2 1/5] dm-mpath: don't call blk_mq_delay_run_hw_queue() in case of BLK_STS_RESOURCE
  2017-11-27  5:07 [PATCH V2 0/5] dm-rq: improve sequential I/O performance Ming Lei
@ 2017-11-27  5:07 ` Ming Lei
  2017-11-27 17:14   ` Bart Van Assche
  2017-11-27  5:07 ` [PATCH V2 2/5] dm-mpath: return DM_MAPIO_REQUEUE in case of rq allocation failure Ming Lei
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 8+ messages in thread
From: Ming Lei @ 2017-11-27  5:07 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Mike Snitzer, dm-devel
  Cc: Christoph Hellwig, Bart Van Assche, linux-kernel,
	Hannes Reinecke, Omar Sandoval, Ming Lei

If .queue_rq() returns BLK_STS_RESOURCE, blk-mq will rerun the queue in
three situations:

1) if BLK_MQ_S_SCHED_RESTART is set
- queue is rerun after one rq is completed, see blk_mq_sched_restart()
which is run from blk_mq_free_request()

2) run out of driver tag
- queue is rerun after one tag is freed

3) otherwise
- queue is run immediately in blk_mq_dispatch_rq_list()
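
The three cases above can be sketched as a small stand-alone toy model (all
names below are illustrative only, not actual kernel symbols; this is a sketch
of the rerun policy, not kernel code):

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of when blk-mq reruns a hw queue after BLK_STS_RESOURCE.
 * The enum values and function are hypothetical, for illustration only. */
enum rerun_trigger {
	RERUN_ON_RQ_COMPLETION,	/* case 1: SCHED_RESTART is set */
	RERUN_ON_TAG_FREE,	/* case 2: out of driver tags */
	RERUN_IMMEDIATELY	/* case 3: rerun in the dispatch loop */
};

static enum rerun_trigger pick_rerun(bool sched_restart, bool no_driver_tag)
{
	if (sched_restart)
		return RERUN_ON_RQ_COMPLETION;	/* blk_mq_sched_restart() path */
	if (no_driver_tag)
		return RERUN_ON_TAG_FREE;	/* woken when a tag is freed */
	return RERUN_IMMEDIATELY;		/* blk_mq_dispatch_rq_list() */
}
```

An extra delayed run injected from inside .queue_rq() sits outside all three
cases, which is why the commit message below calls it a workaround.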

This random delay in running the hw queue was introduced by commit 6077c2d706097c0
(dm rq: Avoid that request processing stalls sporadically), which claimed to
fix a request processing stall but never explained the idea behind the fix;
it is a workaround at most. The question wasn't answered by anyone in recent
discussion either.

Also, calling blk_mq_delay_run_hw_queue() inside .queue_rq() is a horrible
hack because it makes BLK_MQ_S_SCHED_RESTART not work and degrades I/O
performance a lot.

Finally, this patch makes sure that dm-rq returns BLK_STS_RESOURCE to blk-mq
only when the underlying queue is out of resources, so we switch to returning
DM_MAPIO_DELAY_REQUEUE if either MPATHF_QUEUE_IO or MPATHF_PG_INIT_REQUIRED
is set in multipath_clone_and_map().

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/md/dm-mpath.c | 4 +---
 drivers/md/dm-rq.c    | 1 -
 2 files changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
index c8faa2b85842..8fe3f45407ce 100644
--- a/drivers/md/dm-mpath.c
+++ b/drivers/md/dm-mpath.c
@@ -484,9 +484,7 @@ static int multipath_clone_and_map(struct dm_target *ti, struct request *rq,
 		return DM_MAPIO_KILL;
 	} else if (test_bit(MPATHF_QUEUE_IO, &m->flags) ||
 		   test_bit(MPATHF_PG_INIT_REQUIRED, &m->flags)) {
-		if (pg_init_all_paths(m))
-			return DM_MAPIO_DELAY_REQUEUE;
-		return DM_MAPIO_REQUEUE;
+		return DM_MAPIO_DELAY_REQUEUE;
 	}
 
 	memset(mpio, 0, sizeof(*mpio));
diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c
index 9d32f25489c2..cbe8a06ef8b0 100644
--- a/drivers/md/dm-rq.c
+++ b/drivers/md/dm-rq.c
@@ -758,7 +758,6 @@ static blk_status_t dm_mq_queue_rq(struct blk_mq_hw_ctx *hctx,
 		/* Undo dm_start_request() before requeuing */
 		rq_end_stats(md, rq);
 		rq_completed(md, rq_data_dir(rq), false);
-		blk_mq_delay_run_hw_queue(hctx, 100/*ms*/);
 		return BLK_STS_RESOURCE;
 	}
 
-- 
2.9.5


* [PATCH V2 2/5] dm-mpath: return DM_MAPIO_REQUEUE in case of rq allocation failure
  2017-11-27  5:07 [PATCH V2 0/5] dm-rq: improve sequential I/O performance Ming Lei
  2017-11-27  5:07 ` [PATCH V2 1/5] dm-mpath: don't call blk_mq_delay_run_hw_queue() in case of BLK_STS_RESOURCE Ming Lei
@ 2017-11-27  5:07 ` Ming Lei
  2017-11-27  5:07 ` [PATCH V2 3/5] blk-mq: move actual issue into one helper Ming Lei
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 8+ messages in thread
From: Ming Lei @ 2017-11-27  5:07 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Mike Snitzer, dm-devel
  Cc: Christoph Hellwig, Bart Van Assche, linux-kernel,
	Hannes Reinecke, Omar Sandoval, Ming Lei

blk-mq will rerun the queue via RESTART or a dispatch wakeup after one request
is completed, so it is not necessary to wait a random time before requeuing;
we should trust blk-mq to do it.

More importantly, we need to return BLK_STS_RESOURCE to blk-mq so that
dequeuing from the I/O scheduler can be stopped, which improves I/O merging.
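
To illustrate why stopping the dequeue helps merging, here is a minimal
stand-alone sketch (hypothetical struct and function names, not DM or blk-mq
code): while a request is held back in the scheduler because the driver
reported busy, a later contiguous request can merge into it instead of being
dispatched separately.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative only: a held-back request absorbing a contiguous neighbour. */
struct toy_rq {
	unsigned long start;	/* first sector */
	unsigned long nr;	/* number of sectors */
};

/* Back-merge 'nr' sectors starting at 'start' into 'held' if contiguous. */
static bool toy_back_merge(struct toy_rq *held, unsigned long start,
			   unsigned long nr)
{
	if (held->start + held->nr != start)
		return false;
	held->nr += nr;
	return true;
}

/* Demo: an 8-sector request held in the scheduler grows to 16 sectors
 * when the next contiguous 8-sector request arrives. */
static unsigned long demo_merged_nr(void)
{
	struct toy_rq rq = { .start = 0, .nr = 8 };
	bool merged = toy_back_merge(&rq, 8, 8);

	return merged ? rq.nr : 0;
}
```

Without the busy feedback, both toy requests would have been dequeued and
issued separately, and the merge opportunity would be lost.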

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/md/dm-mpath.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
index 8fe3f45407ce..1fa1ff5bfe73 100644
--- a/drivers/md/dm-mpath.c
+++ b/drivers/md/dm-mpath.c
@@ -500,8 +500,20 @@ static int multipath_clone_and_map(struct dm_target *ti, struct request *rq,
 		if (queue_dying) {
 			atomic_inc(&m->pg_init_in_progress);
 			activate_or_offline_path(pgpath);
+			return DM_MAPIO_DELAY_REQUEUE;
 		}
-		return DM_MAPIO_DELAY_REQUEUE;
+
+		/*
+		 * blk-mq's SCHED_RESTART can cover this requeue, so
+		 * we needn't to deal with it by DELAY_REQUEUE. More
+		 * importantly, we have to return DM_MAPIO_REQUEUE
+		 * so that blk-mq can get the queue busy feedback,
+		 * otherwise I/O merge can be hurt.
+		 */
+		if (q->mq_ops)
+			return DM_MAPIO_REQUEUE;
+		else
+			return DM_MAPIO_DELAY_REQUEUE;
 	}
 	clone->bio = clone->biotail = NULL;
 	clone->rq_disk = bdev->bd_disk;
-- 
2.9.5


* [PATCH V2 3/5] blk-mq: move actual issue into one helper
  2017-11-27  5:07 [PATCH V2 0/5] dm-rq: improve sequential I/O performance Ming Lei
  2017-11-27  5:07 ` [PATCH V2 1/5] dm-mpath: don't call blk_mq_delay_run_hw_queue() in case of BLK_STS_RESOURCE Ming Lei
  2017-11-27  5:07 ` [PATCH V2 2/5] dm-mpath: return DM_MAPIO_REQUEUE in case of rq allocation failure Ming Lei
@ 2017-11-27  5:07 ` Ming Lei
  2017-11-27  5:07 ` [PATCH V2 4/5] blk-mq: return dispatch result to caller in blk_mq_try_issue_directly Ming Lei
  2017-11-27  5:07 ` [PATCH V2 5/5] blk-mq: issue request directly for blk_insert_cloned_request Ming Lei
  4 siblings, 0 replies; 8+ messages in thread
From: Ming Lei @ 2017-11-27  5:07 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Mike Snitzer, dm-devel
  Cc: Christoph Hellwig, Bart Van Assche, linux-kernel,
	Hannes Reinecke, Omar Sandoval, Ming Lei

No functional change; just clean up the code a bit, so that the following
change of using direct issue for blk_mq_request_bypass_insert(), which is
needed by DM, is easier to make.
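
The refactoring pattern applied here, moving resource acquisition and the
actual issue into one helper whose return value tells the caller whether to
fall back to insertion, can be sketched as follows (toy names and a simplified
signature, not the actual kernel interfaces):

```c
#include <assert.h>
#include <stdbool.h>

enum toy_status { TOY_STS_OK, TOY_STS_RESOURCE };

/* Returns true when the caller must insert the request instead of issuing
 * it; on a real issue, the dispatch status comes back via *ret. */
static bool toy_issue_req(bool got_tag, bool got_budget, enum toy_status *ret)
{
	if (!got_tag)
		return true;		/* no driver tag: fall back to insert */
	if (!got_budget)
		return true;		/* no dispatch budget: fall back to insert */
	*ret = TOY_STS_OK;		/* the real helper calls ->queue_rq() here */
	return false;
}

/* Demo wrapper: did this issue attempt require a fallback insert? */
static bool demo_needs_insert(bool got_tag, bool got_budget)
{
	enum toy_status s = TOY_STS_RESOURCE;

	return toy_issue_req(got_tag, got_budget, &s);
}
```

Packing the early-exit conditions into one helper lets a later caller reuse
the same issue path and branch on whether insertion is needed.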

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq.c | 39 +++++++++++++++++++++++++++------------
 1 file changed, 27 insertions(+), 12 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 11097477eeab..fe82d7a47b35 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1593,15 +1593,38 @@ static blk_qc_t request_to_qc_t(struct blk_mq_hw_ctx *hctx, struct request *rq)
 	return blk_tag_to_qc_t(rq->internal_tag, hctx->queue_num, true);
 }
 
-static void __blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx,
-					struct request *rq,
-					blk_qc_t *cookie, bool may_sleep)
+/* return true if insert is needed */
+static bool __blk_mq_issue_req(struct blk_mq_hw_ctx *hctx,
+			       struct request *rq,
+			       blk_qc_t *new_cookie,
+			       blk_status_t *ret)
 {
 	struct request_queue *q = rq->q;
 	struct blk_mq_queue_data bd = {
 		.rq = rq,
 		.last = true,
 	};
+
+	if (!blk_mq_get_driver_tag(rq, NULL, false))
+		return true;
+
+	if (!blk_mq_get_dispatch_budget(hctx)) {
+		blk_mq_put_driver_tag(rq);
+		return true;
+	}
+
+	*new_cookie = request_to_qc_t(hctx, rq);
+
+	*ret = q->mq_ops->queue_rq(hctx, &bd);
+
+	return false;
+}
+
+static void __blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx,
+					struct request *rq,
+					blk_qc_t *cookie, bool may_sleep)
+{
+	struct request_queue *q = rq->q;
 	blk_qc_t new_cookie;
 	blk_status_t ret;
 	bool run_queue = true;
@@ -1615,22 +1638,14 @@ static void __blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx,
 	if (q->elevator)
 		goto insert;
 
-	if (!blk_mq_get_driver_tag(rq, NULL, false))
+	if (__blk_mq_issue_req(hctx, rq, &new_cookie, &ret))
 		goto insert;
 
-	if (!blk_mq_get_dispatch_budget(hctx)) {
-		blk_mq_put_driver_tag(rq);
-		goto insert;
-	}
-
-	new_cookie = request_to_qc_t(hctx, rq);
-
 	/*
 	 * For OK queue, we are done. For error, kill it. Any other
 	 * error (busy), just add it to our list as we previously
 	 * would have done
 	 */
-	ret = q->mq_ops->queue_rq(hctx, &bd);
 	switch (ret) {
 	case BLK_STS_OK:
 		*cookie = new_cookie;
-- 
2.9.5


* [PATCH V2 4/5] blk-mq: return dispatch result to caller in blk_mq_try_issue_directly
  2017-11-27  5:07 [PATCH V2 0/5] dm-rq: improve sequential I/O performance Ming Lei
                   ` (2 preceding siblings ...)
  2017-11-27  5:07 ` [PATCH V2 3/5] blk-mq: move actual issue into one helper Ming Lei
@ 2017-11-27  5:07 ` Ming Lei
  2017-11-27  5:07 ` [PATCH V2 5/5] blk-mq: issue request directly for blk_insert_cloned_request Ming Lei
  4 siblings, 0 replies; 8+ messages in thread
From: Ming Lei @ 2017-11-27  5:07 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Mike Snitzer, dm-devel
  Cc: Christoph Hellwig, Bart Van Assche, linux-kernel,
	Hannes Reinecke, Omar Sandoval, Ming Lei

In the following patch, we will use blk_mq_try_issue_directly() for DM
to return the dispatch result, and DM needs this information to improve
IO merging.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq.c | 27 +++++++++++++++++----------
 1 file changed, 17 insertions(+), 10 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index fe82d7a47b35..fd4fb6316ea1 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1620,13 +1620,14 @@ static bool __blk_mq_issue_req(struct blk_mq_hw_ctx *hctx,
 	return false;
 }
 
-static void __blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx,
-					struct request *rq,
-					blk_qc_t *cookie, bool may_sleep)
+static blk_status_t __blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx,
+						struct request *rq,
+						blk_qc_t *cookie,
+						bool may_sleep)
 {
 	struct request_queue *q = rq->q;
 	blk_qc_t new_cookie;
-	blk_status_t ret;
+	blk_status_t ret = BLK_STS_OK;
 	bool run_queue = true;
 
 	/* RCU or SRCU read lock is needed before checking quiesced flag */
@@ -1649,26 +1650,30 @@ static void __blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx,
 	switch (ret) {
 	case BLK_STS_OK:
 		*cookie = new_cookie;
-		return;
+		return ret;
 	case BLK_STS_RESOURCE:
 		__blk_mq_requeue_request(rq);
 		goto insert;
 	default:
 		*cookie = BLK_QC_T_NONE;
 		blk_mq_end_request(rq, ret);
-		return;
+		return ret;
 	}
 
 insert:
 	blk_mq_sched_insert_request(rq, false, run_queue, false, may_sleep);
+	return ret;
 }
 
-static void blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx,
-		struct request *rq, blk_qc_t *cookie)
+static blk_status_t blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx,
+					      struct request *rq,
+					      blk_qc_t *cookie)
 {
+	blk_status_t ret;
+
 	if (!(hctx->flags & BLK_MQ_F_BLOCKING)) {
 		rcu_read_lock();
-		__blk_mq_try_issue_directly(hctx, rq, cookie, false);
+		ret = __blk_mq_try_issue_directly(hctx, rq, cookie, false);
 		rcu_read_unlock();
 	} else {
 		unsigned int srcu_idx;
@@ -1676,9 +1681,11 @@ static void blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx,
 		might_sleep();
 
 		srcu_idx = srcu_read_lock(hctx->queue_rq_srcu);
-		__blk_mq_try_issue_directly(hctx, rq, cookie, true);
+		ret = __blk_mq_try_issue_directly(hctx, rq, cookie, true);
 		srcu_read_unlock(hctx->queue_rq_srcu, srcu_idx);
 	}
+
+	return ret;
 }
 
 static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
-- 
2.9.5


* [PATCH V2 5/5] blk-mq: issue request directly for blk_insert_cloned_request
  2017-11-27  5:07 [PATCH V2 0/5] dm-rq: improve sequential I/O performance Ming Lei
                   ` (3 preceding siblings ...)
  2017-11-27  5:07 ` [PATCH V2 4/5] blk-mq: return dispatch result to caller in blk_mq_try_issue_directly Ming Lei
@ 2017-11-27  5:07 ` Ming Lei
  4 siblings, 0 replies; 8+ messages in thread
From: Ming Lei @ 2017-11-27  5:07 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Mike Snitzer, dm-devel
  Cc: Christoph Hellwig, Bart Van Assche, linux-kernel,
	Hannes Reinecke, Omar Sandoval, Ming Lei

blk_insert_cloned_request() is called in the fast path of the dm-rq driver,
and in this function we append the request to the hctx->dispatch_list of the
underlying queue directly.

1) This way isn't efficient enough because the hctx lock is always required.

2) With blk_insert_cloned_request(), we bypass the underlying queue's IO
scheduler totally and depend on the DM rq driver to do all IO scheduling.
But the DM rq driver can't get the underlying queue's dispatch feedback at
all, and this information is extremely useful for IO merging. Without it,
blk-mq basically can't do IO merging, which causes very bad sequential IO
performance.

This patch makes use of blk_mq_try_issue_directly() to dispatch the rq to the
underlying queue and provides the dispatch result to dm-rq and blk-mq, which
improves the above situations very much.
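
The feedback loop on the dm-rq side can be sketched as follows (a simplified
model with made-up names; the real code additionally releases the clone
before requeueing the original request):

```c
#include <assert.h>

enum toy_sts { TOY_OK, TOY_RESOURCE, TOY_IOERR };
enum toy_action { TOY_DONE, TOY_REQUEUE, TOY_COMPLETE_WITH_ERROR };

/* Map the dispatch status returned by the lower queue to what the
 * upper (dm-rq) layer should do with the original request. */
static enum toy_action handle_dispatch_result(enum toy_sts r)
{
	switch (r) {
	case TOY_OK:
		return TOY_DONE;		/* issued; completion comes later */
	case TOY_RESOURCE:
		return TOY_REQUEUE;		/* busy: requeue, rely on RESTART */
	default:
		return TOY_COMPLETE_WITH_ERROR;	/* hard error: end the request */
	}
}
```

The key point is the middle case: the busy status now propagates all the way
up instead of being swallowed by a bypass insert, so the upper layer can hold
back further dequeuing.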

With this patch, sequential IO is improved by 3X ~ 5X in my test over
mpath/virtio-scsi.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-core.c   |  3 +--
 block/blk-mq.c     | 32 +++++++++++++++++++++++++++++---
 block/blk-mq.h     |  3 +++
 drivers/md/dm-rq.c | 19 ++++++++++++++++---
 4 files changed, 49 insertions(+), 8 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index b8881750a3ac..e5a623b45a1d 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -2488,8 +2488,7 @@ blk_status_t blk_insert_cloned_request(struct request_queue *q, struct request *
 		 * bypass a potential scheduler on the bottom device for
 		 * insert.
 		 */
-		blk_mq_request_bypass_insert(rq, true);
-		return BLK_STS_OK;
+		return blk_mq_request_direct_issue(rq);
 	}
 
 	spin_lock_irqsave(q->queue_lock, flags);
diff --git a/block/blk-mq.c b/block/blk-mq.c
index fd4fb6316ea1..c94a8d225b63 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1629,6 +1629,12 @@ static blk_status_t __blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx,
 	blk_qc_t new_cookie;
 	blk_status_t ret = BLK_STS_OK;
 	bool run_queue = true;
+	/*
+	 * This function is called from blk_insert_cloned_request() if
+	 * 'cookie' is NULL, and for dispatching this request only.
+	 */
+	bool dispatch_only = !cookie;
+	bool need_insert;
 
 	/* RCU or SRCU read lock is needed before checking quiesced flag */
 	if (blk_mq_hctx_stopped(hctx) || blk_queue_quiesced(q)) {
@@ -1636,10 +1642,19 @@ static blk_status_t __blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx,
 		goto insert;
 	}
 
-	if (q->elevator)
+	if (q->elevator && !dispatch_only)
 		goto insert;
 
-	if (__blk_mq_issue_req(hctx, rq, &new_cookie, &ret))
+	need_insert = __blk_mq_issue_req(hctx, rq, &new_cookie, &ret);
+	if (dispatch_only) {
+		if (need_insert)
+			return BLK_STS_RESOURCE;
+		if (ret == BLK_STS_RESOURCE)
+			__blk_mq_requeue_request(rq);
+		return ret;
+	}
+
+	if (need_insert)
 		goto insert;
 
 	/*
@@ -1661,7 +1676,10 @@ static blk_status_t __blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx,
 	}
 
 insert:
-	blk_mq_sched_insert_request(rq, false, run_queue, false, may_sleep);
+	if (!dispatch_only)
+		blk_mq_sched_insert_request(rq, false, run_queue, false, may_sleep);
+	else
+		blk_mq_request_bypass_insert(rq, run_queue);
 	return ret;
 }
 
@@ -1688,6 +1706,14 @@ static blk_status_t blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx,
 	return ret;
 }
 
+blk_status_t blk_mq_request_direct_issue(struct request *rq)
+{
+	struct blk_mq_ctx *ctx = rq->mq_ctx;
+	struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(rq->q, ctx->cpu);
+
+	return blk_mq_try_issue_directly(hctx, rq, NULL);
+}
+
 static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
 {
 	const int is_sync = op_is_sync(bio->bi_opf);
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 6c7c3ff5bf62..81df35fbce77 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -60,6 +60,9 @@ void blk_mq_request_bypass_insert(struct request *rq, bool run_queue);
 void blk_mq_insert_requests(struct blk_mq_hw_ctx *hctx, struct blk_mq_ctx *ctx,
 				struct list_head *list);
 
+/* Used by DM for issuing req directly */
+blk_status_t blk_mq_request_direct_issue(struct request *rq);
+
 /*
  * CPU -> queue mappings
  */
diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c
index cbe8a06ef8b0..b96aa208e5cc 100644
--- a/drivers/md/dm-rq.c
+++ b/drivers/md/dm-rq.c
@@ -395,7 +395,7 @@ static void end_clone_request(struct request *clone, blk_status_t error)
 	dm_complete_request(tio->orig, error);
 }
 
-static void dm_dispatch_clone_request(struct request *clone, struct request *rq)
+static blk_status_t dm_dispatch_clone_request(struct request *clone, struct request *rq)
 {
 	blk_status_t r;
 
@@ -404,9 +404,10 @@ static void dm_dispatch_clone_request(struct request *clone, struct request *rq)
 
 	clone->start_time = jiffies;
 	r = blk_insert_cloned_request(clone->q, clone);
-	if (r)
+	if (r != BLK_STS_OK && r != BLK_STS_RESOURCE)
 		/* must complete clone in terms of original request */
 		dm_complete_request(rq, r);
+	return r;
 }
 
 static int dm_rq_bio_constructor(struct bio *bio, struct bio *bio_orig,
@@ -476,8 +477,10 @@ static int map_request(struct dm_rq_target_io *tio)
 	struct mapped_device *md = tio->md;
 	struct request *rq = tio->orig;
 	struct request *clone = NULL;
+	blk_status_t ret;
 
 	r = ti->type->clone_and_map_rq(ti, rq, &tio->info, &clone);
+ check_again:
 	switch (r) {
 	case DM_MAPIO_SUBMITTED:
 		/* The target has taken the I/O to submit by itself later */
@@ -492,7 +495,17 @@ static int map_request(struct dm_rq_target_io *tio)
 		/* The target has remapped the I/O so dispatch it */
 		trace_block_rq_remap(clone->q, clone, disk_devt(dm_disk(md)),
 				     blk_rq_pos(rq));
-		dm_dispatch_clone_request(clone, rq);
+		ret = dm_dispatch_clone_request(clone, rq);
+		if (ret == BLK_STS_RESOURCE) {
+			blk_rq_unprep_clone(clone);
+			tio->ti->type->release_clone_rq(clone);
+			tio->clone = NULL;
+			if (!rq->q->mq_ops)
+				r = DM_MAPIO_DELAY_REQUEUE;
+			else
+				r = DM_MAPIO_REQUEUE;
+			goto check_again;
+		}
 		break;
 	case DM_MAPIO_REQUEUE:
 		/* The target wants to requeue the I/O */
-- 
2.9.5


* Re: [PATCH V2 1/5] dm-mpath: don't call blk_mq_delay_run_hw_queue() in case of BLK_STS_RESOURCE
  2017-11-27  5:07 ` [PATCH V2 1/5] dm-mpath: don't call blk_mq_delay_run_hw_queue() in case of BLK_STS_RESOURCE Ming Lei
@ 2017-11-27 17:14   ` Bart Van Assche
  2017-12-01  2:01     ` Ming Lei
  0 siblings, 1 reply; 8+ messages in thread
From: Bart Van Assche @ 2017-11-27 17:14 UTC (permalink / raw)
  To: dm-devel, linux-block, axboe, ming.lei, snitzer
  Cc: hch, linux-kernel, hare, osandov

On Mon, 2017-11-27 at 13:07 +0800, Ming Lei wrote:
> If .queue_rq() returns BLK_STS_RESOURCE, blk-mq will rerun the queue in
> the three situations:
> 
> 1) if BLK_MQ_S_SCHED_RESTART is set
> - queue is rerun after one rq is completed, see blk_mq_sched_restart()
> which is run from blk_mq_free_request()
> 
> 2) run out of driver tag
> - queue is rerun after one tag is freed
> 
> 3) otherwise
> - queue is run immediately in blk_mq_dispatch_rq_list()
> 
> This random dealy of running hw queue is introduced by commit 6077c2d706097c0
> (dm rq: Avoid that request processing stalls sporadically), which claimed
> one request processing stalling is fixed, but never explained the behind
> idea, and it is a workaound at most. Even the question isn't explained by
> anyone in recent discussion.
> 
> Also calling blk_mq_delay_run_hw_queue() inside .queue_rq() is a horrible
> hack because it makes BLK_MQ_S_SCHED_RESTART not working, and degrades I/O
> peformance a lot.
> 
> Finally this patch makes sure that dm-rq returns BLK_STS_RESOURCE to blk-mq
> only when underlying queue is out of resource, so we switch to return
> DM_MAPIO_DELAY_REQUEU if either MPATHF_QUEUE_IO or MPATHF_PG_INIT_REQUIRED
> is set in multipath_clone_and_map().

Sorry but in my opinion the above description shows that you don't understand
the dm-mpath driver completely.

> diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
> index c8faa2b85842..8fe3f45407ce 100644
> --- a/drivers/md/dm-mpath.c
> +++ b/drivers/md/dm-mpath.c
> @@ -484,9 +484,7 @@ static int multipath_clone_and_map(struct dm_target *ti, struct request *rq,
>  		return DM_MAPIO_KILL;
>  	} else if (test_bit(MPATHF_QUEUE_IO, &m->flags) ||
>  		   test_bit(MPATHF_PG_INIT_REQUIRED, &m->flags)) {
> -		if (pg_init_all_paths(m))
> -			return DM_MAPIO_DELAY_REQUEUE;
> -		return DM_MAPIO_REQUEUE;
> +		return DM_MAPIO_DELAY_REQUEUE;
>  	}

This patch removes a pg_init_all_paths() call but you don't explain why you
think it is allowed to remove that call. Did you perhaps remove that call by
mistake?

Bart.


* Re: [PATCH V2 1/5] dm-mpath: don't call blk_mq_delay_run_hw_queue() in case of BLK_STS_RESOURCE
  2017-11-27 17:14   ` Bart Van Assche
@ 2017-12-01  2:01     ` Ming Lei
  0 siblings, 0 replies; 8+ messages in thread
From: Ming Lei @ 2017-12-01  2:01 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: dm-devel, linux-block, axboe, snitzer, hch, linux-kernel, hare, osandov

On Mon, Nov 27, 2017 at 05:14:46PM +0000, Bart Van Assche wrote:
> On Mon, 2017-11-27 at 13:07 +0800, Ming Lei wrote:
> > If .queue_rq() returns BLK_STS_RESOURCE, blk-mq will rerun the queue in
> > the three situations:
> > 
> > 1) if BLK_MQ_S_SCHED_RESTART is set
> > - queue is rerun after one rq is completed, see blk_mq_sched_restart()
> > which is run from blk_mq_free_request()
> > 
> > 2) run out of driver tag
> > - queue is rerun after one tag is freed
> > 
> > 3) otherwise
> > - queue is run immediately in blk_mq_dispatch_rq_list()
> > 
> > This random dealy of running hw queue is introduced by commit 6077c2d706097c0
> > (dm rq: Avoid that request processing stalls sporadically), which claimed
> > one request processing stalling is fixed, but never explained the behind
> > idea, and it is a workaound at most. Even the question isn't explained by
> > anyone in recent discussion.
> > 
> > Also calling blk_mq_delay_run_hw_queue() inside .queue_rq() is a horrible
> > hack because it makes BLK_MQ_S_SCHED_RESTART not working, and degrades I/O
> > peformance a lot.
> > 
> > Finally this patch makes sure that dm-rq returns BLK_STS_RESOURCE to blk-mq
> > only when underlying queue is out of resource, so we switch to return
> > DM_MAPIO_DELAY_REQUEU if either MPATHF_QUEUE_IO or MPATHF_PG_INIT_REQUIRED
> > is set in multipath_clone_and_map().
> 
> Sorry but in my opinion the above description shows that you don't understand
> the dm-mpath driver completely.

I have to treat your above comment as a noop, since you never provided an explanation.

Also, I don't think it is wrong to deal with MPATHF_QUEUE_IO/MPATHF_PG_INIT_REQUIRED
via DM_MAPIO_DELAY_REQUEUE, since both seldom happen, and the delay
won't cause a performance issue.

The idea behind this change is that this patchset switches to returning BLK_STS_RESOURCE
to blk-mq only when we run out of resources, and the above two (MPATHF_QUEUE_IO and
MPATHF_PG_INIT_REQUIRED) don't fall under 'running out of resources'.

> 
> > diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
> > index c8faa2b85842..8fe3f45407ce 100644
> > --- a/drivers/md/dm-mpath.c
> > +++ b/drivers/md/dm-mpath.c
> > @@ -484,9 +484,7 @@ static int multipath_clone_and_map(struct dm_target *ti, struct request *rq,
> >  		return DM_MAPIO_KILL;
> >  	} else if (test_bit(MPATHF_QUEUE_IO, &m->flags) ||
> >  		   test_bit(MPATHF_PG_INIT_REQUIRED, &m->flags)) {
> > -		if (pg_init_all_paths(m))
> > -			return DM_MAPIO_DELAY_REQUEUE;
> > -		return DM_MAPIO_REQUEUE;
> > +		return DM_MAPIO_DELAY_REQUEUE;
> >  	}
> 
> This patch removes a pg_init_all_paths() call but you don't explain why you
> think it is allowed to remove that call. Did you perhaps remove that call by
> mistake?

OK, that is a problem, will fix it in V2.

-- 
Ming


end of thread, other threads:[~2017-12-01  2:01 UTC | newest]

Thread overview: 8+ messages
2017-11-27  5:07 [PATCH V2 0/5] dm-rq: improve sequential I/O performance Ming Lei
2017-11-27  5:07 ` [PATCH V2 1/5] dm-mpath: don't call blk_mq_delay_run_hw_queue() in case of BLK_STS_RESOURCE Ming Lei
2017-11-27 17:14   ` Bart Van Assche
2017-12-01  2:01     ` Ming Lei
2017-11-27  5:07 ` [PATCH V2 2/5] dm-mpath: return DM_MAPIO_REQUEUE in case of rq allocation failure Ming Lei
2017-11-27  5:07 ` [PATCH V2 3/5] blk-mq: move actual issue into one helper Ming Lei
2017-11-27  5:07 ` [PATCH V2 4/5] blk-mq: return dispatch result to caller in blk_mq_try_issue_directly Ming Lei
2017-11-27  5:07 ` [PATCH V2 5/5] blk-mq: issue request directly for blk_insert_cloned_request Ming Lei
