From: Ming Lei <ming.lei@redhat.com>
To: Jens Axboe <axboe@kernel.dk>
Cc: linux-block@vger.kernel.org, Ming Lei <ming.lei@redhat.com>,
	Sagi Grimberg <sagi@grimberg.me>,
	Baolin Wang <baolin.wang7@gmail.com>,
	Christoph Hellwig <hch@infradead.org>
Subject: [PATCH 9/9] blk-mq: support batching dispatch in case of io scheduler
Date: Wed, 13 May 2020 17:54:43 +0800
Message-ID: <20200513095443.2038859-10-ming.lei@redhat.com>
In-Reply-To: <20200513095443.2038859-1-ming.lei@redhat.com>

More and more drivers, such as mmc and TCP-based storage drivers, want
the block layer to queue requests to them in batches. Current in-tree
users of batching include virtio-scsi, virtio-blk and nvme.

For the none scheduler, batching dispatch is already supported.

With an io scheduler, however, we take only one request from the
scheduler at a time and pass that single request to
blk_mq_dispatch_rq_list(), which makes batching dispatch impossible.
One reason is that we don't want to hurt sequential IO performance:
dequeuing more requests from the scheduler queue reduces the chance
of IO merging.

Try to support batching dispatch for io scheduler by starting with the
following simple approach:

1) still make sure we can get budget before dequeuing each request

2) use hctx->dispatch_busy to evaluate whether the queue is busy. If it
is busy, fall back to non-batching dispatch; otherwise dequeue as many
requests as possible (up to q->nr_requests) from the scheduler, and pass
them to blk_mq_dispatch_rq_list().

Wrt. 2), none already uses a similar policy, and it turned out to
improve SCSI SSD performance considerably.
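The two rules above can be modelled as a small user-space C sketch. It
is an illustration under assumptions, not the kernel implementation:
struct sim_hctx and the sim_* helpers are hypothetical, budget is a
plain counter standing in for the driver's ->get_budget() callback, and
"scheduler has work" is just a count of queued requests. It shows budget
being taken before every dequeue, and the batch size dropping to one
while dispatch_busy is set.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of the state the batching policy looks at; none of
 * these names exist in the kernel.
 */
struct sim_hctx {
	unsigned int dispatch_busy;	/* nonzero: device was recently busy */
	unsigned int nr_requests;	/* stands in for q->nr_requests */
	unsigned int budget;		/* budget the driver can still grant */
	unsigned int sched_queued;	/* requests held by the scheduler */
};

/* Same rule as blk_mq_sched_get_batching_nr(): 1 if busy, else nr_requests. */
static unsigned int sim_batching_nr(const struct sim_hctx *h)
{
	return h->dispatch_busy ? 1 : h->nr_requests;
}

/* Assumed budget model: a plain counter instead of ->get_budget(). */
static bool sim_get_budget(struct sim_hctx *h)
{
	if (h->budget == 0)
		return false;
	h->budget--;
	return true;
}

/*
 * Collect one batch: stop when the scheduler is empty (like a failed
 * ->has_work()), when no budget is left, or when the batch limit is
 * reached. Budget is always taken before a request is dequeued.
 * Returns the number of requests in the batch.
 */
static unsigned int sim_collect_batch(struct sim_hctx *h)
{
	unsigned int max_dispatch = sim_batching_nr(h);
	unsigned int cnt = 0;

	assert(max_dispatch >= 1);	/* the model assumes a nonzero queue depth */
	do {
		if (h->sched_queued == 0)
			break;
		if (!sim_get_budget(h))
			break;
		h->sched_queued--;	/* dequeue one request */
		cnt++;
	} while (cnt < max_dispatch);

	return cnt;
}
```

For example, an idle hctx (dispatch_busy == 0, nr_requests == 64) with
10 queued requests and enough budget yields one batch of 10, while the
same state with dispatch_busy set yields a batch of 1, which is the
previous one-request-at-a-time behaviour.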

In the future, we may develop a more intelligent algorithm for batching
dispatch.

[1] https://lore.kernel.org/linux-block/20200512075501.GF1531898@T590/#r
[2] https://lore.kernel.org/linux-block/fe6bd8b9-6ed9-b225-f80c-314746133722@grimberg.me/

Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Baolin Wang <baolin.wang7@gmail.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq-sched.c | 76 +++++++++++++++++++++++++++++++++++++++++++-
 block/blk-mq.c       |  2 --
 2 files changed, 75 insertions(+), 3 deletions(-)

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 78fc8d80caaf..77d5093916b7 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -7,6 +7,7 @@
 #include <linux/kernel.h>
 #include <linux/module.h>
 #include <linux/blk-mq.h>
+#include <linux/list_sort.h>
 
 #include <trace/events/block.h>
 
@@ -80,6 +81,69 @@ void blk_mq_sched_restart(struct blk_mq_hw_ctx *hctx)
 	blk_mq_run_hw_queue(hctx, true);
 }
 
+/*
+ * bfq and deadline use a single scheduler queue rather than one queue
+ * per hctx. However, both are usually used on single-queue devices,
+ * and by the principle of locality the current @hctx should reflect
+ * the real device status most of the time.
+ *
+ * So use the current hctx->dispatch_busy directly to decide the
+ * batching dispatch count.
+ */
+static unsigned int blk_mq_sched_get_batching_nr(struct blk_mq_hw_ctx *hctx)
+{
+	if (hctx->dispatch_busy)
+		return 1;
+	return hctx->queue->nr_requests;
+}
+
+static int sched_rq_cmp(void *priv, struct list_head *a, struct list_head *b)
+{
+	struct request *rqa = container_of(a, struct request, queuelist);
+	struct request *rqb = container_of(b, struct request, queuelist);
+
+	return rqa->mq_hctx > rqb->mq_hctx;
+}
+
+static inline void blk_mq_do_dispatch_rq_lists(struct blk_mq_hw_ctx *hctx,
+		struct list_head *lists, bool multi_hctxs, unsigned count)
+{
+
+	if (likely(!multi_hctxs)) {
+		blk_mq_dispatch_rq_list(hctx, lists, true, count);
+		return;
+	}
+
+	/*
+	 * Some schedulers, such as bfq and deadline, may dequeue requests
+	 * that belong to different hctxs.
+	 *
+	 * Sort the requests in the list by hctx, then dispatch each batch
+	 * of requests that share the same hctx.
+	 */
+	list_sort(NULL, lists, sched_rq_cmp);
+
+	while (!list_empty(lists)) {
+		LIST_HEAD(list);
+		struct request *new, *rq = list_first_entry(lists,
+				struct request, queuelist);
+		unsigned cnt = 0;
+
+		list_for_each_entry(new, lists, queuelist) {
+			if (new->mq_hctx != rq->mq_hctx)
+				break;
+			cnt++;
+		}
+
+		if (&new->queuelist == lists)
+			list_splice_tail_init(lists, &list);
+		else
+			list_cut_before(&list, lists, &new->queuelist);
+
+		blk_mq_dispatch_rq_list(rq->mq_hctx, &list, true, cnt);
+	}
+}
+
 #define BLK_MQ_BUDGET_DELAY	3		/* ms units */
 
 /*
@@ -97,6 +161,9 @@ static int blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
 	LIST_HEAD(rq_list);
 	int ret = 0;
 	struct request *rq;
+	int cnt = 0;
+	unsigned int max_dispatch = blk_mq_sched_get_batching_nr(hctx);
+	bool multi_hctxs = false;
 
 	do {
 		if (e->type->ops.has_work && !e->type->ops.has_work(hctx))
@@ -130,7 +197,14 @@ static int blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
 		 * in blk_mq_dispatch_rq_list().
 		 */
 		list_add(&rq->queuelist, &rq_list);
-	} while (blk_mq_dispatch_rq_list(rq->mq_hctx, &rq_list, true, 1));
+		cnt++;
+
+		if (rq->mq_hctx != hctx && !multi_hctxs)
+			multi_hctxs = true;
+	} while (cnt < max_dispatch);
+
+	if (cnt)
+		blk_mq_do_dispatch_rq_lists(hctx, &rq_list, multi_hctxs, cnt);
 
 	return ret;
 }
diff --git a/block/blk-mq.c b/block/blk-mq.c
index bfdfdd61e663..b83c59d640b4 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1319,8 +1319,6 @@ bool blk_mq_dispatch_rq_list(struct blk_mq_hw_ctx *hctx, struct list_head *list,
 	if (list_empty(list))
 		return false;
 
-	WARN_ON(!list_is_singular(list) && got_budget);
-
 	/*
 	 * Now process all the entries, sending them to the driver.
 	 */
-- 
2.25.2
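The grouping done by blk_mq_do_dispatch_rq_lists() can be modelled in
user space as follows. This is a hedged sketch under stated assumptions,
not kernel code: struct sim_rq, struct sim_batch and the sim_* helpers
are hypothetical, a plain array with a stable insertion sort stands in
for the request list and list_sort(), and hctxs are small integers
instead of struct blk_mq_hw_ctx pointers. Each recorded batch
corresponds to one blk_mq_dispatch_rq_list() call with its count.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct request; only the hctx matters here. */
struct sim_rq {
	int hctx;	/* hardware queue index the request is mapped to */
	int seq;	/* dequeue order, kept to show that sorting is stable */
};

/* Hypothetical record of one dispatch call: an hctx and a request count. */
struct sim_batch {
	int hctx;
	size_t cnt;
};

/*
 * Stable insertion sort on hctx. Like sched_rq_cmp() used with
 * list_sort(), an element only moves past a strictly greater hctx, so
 * requests for the same hctx keep their dequeue order.
 */
static void sim_sort_by_hctx(struct sim_rq *rqs, size_t n)
{
	for (size_t i = 1; i < n; i++) {
		struct sim_rq cur = rqs[i];
		size_t j = i;

		while (j > 0 && rqs[j - 1].hctx > cur.hctx) {
			rqs[j] = rqs[j - 1];
			j--;
		}
		rqs[j] = cur;
	}
}

/*
 * Sort, then split the array into runs of equal hctx, recording one
 * batch per run. Returns the number of batches written to @out, which
 * must have room for @n entries (worst case: every hctx differs).
 */
static size_t sim_split_by_hctx(struct sim_rq *rqs, size_t n,
				struct sim_batch *out)
{
	size_t nr_batches = 0;
	size_t start = 0;

	sim_sort_by_hctx(rqs, n);
	while (start < n) {
		size_t end = start;

		/* Check the bound before reading rqs[end]. */
		while (end < n && rqs[end].hctx == rqs[start].hctx)
			end++;
		out[nr_batches].hctx = rqs[start].hctx;
		out[nr_batches].cnt = end - start;
		nr_batches++;
		assert(nr_batches <= n);	/* never more batches than requests */
		start = end;
	}
	return nr_batches;
}
```

For example, requests dequeued for hctxs 2, 0, 2, 1, 0 become three
batches: two requests for hctx 0, one for hctx 1 and two for hctx 2,
each keeping its dequeue order. Checking the bound before reading an
element keeps the scan from inspecting anything past the last request;
in list-based code the equivalent test after list_for_each_entry()
finishes is whether the cursor has wrapped back to the list head.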

