* [PATCH V3 00/14] blk-mq-sched: improve SCSI-MQ performance
@ 2017-08-26 16:33 Ming Lei
  2017-08-26 16:33 ` [PATCH V3 01/14] blk-mq-sched: fix scheduler bad performance Ming Lei
                   ` (13 more replies)
  0 siblings, 14 replies; 28+ messages in thread
From: Ming Lei @ 2017-08-26 16:33 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Christoph Hellwig
  Cc: Bart Van Assche, Laurence Oberman, Paolo Valente, Mel Gorman, Ming Lei

In Red Hat internal storage tests of the blk-mq scheduler, we
found that I/O performance is quite bad with mq-deadline, especially
for sequential I/O on some multi-queue SCSI devices (lpfc, qla2xxx,
SRP...).

It turns out that one big issue causes the performance regression:
requests are still dequeued from the sw queue/scheduler queue even
when the LLD's queue is busy, so I/O merging becomes quite difficult
and sequential I/O degrades a lot.

The first six patches improve this situation and bring back
some of the lost performance.

Patches 7 and 8 use q->queue_depth as a hint for setting up
the scheduler queue depth.

Patches 9 ~ 14 improve bio merging by using a hash table in the sw
queue, which makes bio merging more efficient than the current
approach, in which only the last 8 requests are checked. Since
patch 5 switches SCSI devices to the scheduler's way of dequeuing
one request at a time from the sw queue, ctx->lock is acquired more
often; merging bios via the hash table shortens the ctx->lock hold
time and should offset that effect.
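
As a rough sketch of the difference (condensed from the
blk_mq_attempt_merge() scan that patch 14 removes and from the
rqhash_find()/elv_merge_ctx() helpers added later in this series):

	/* today: reverse-scan at most 8 requests in ctx->rq_list */
	list_for_each_entry_reverse(rq, &ctx->rq_list, queuelist) {
		if (!checked--)
			break;
		if (!blk_rq_merge_ok(rq, bio))
			continue;
		/* then try back/front/discard merge */
	}

	/* with this series: one hash lookup, keyed by the bio's start
	 * sector, finds the back-merge candidate directly */
	rq = rqhash_find(ctx->hash, bio->bi_iter.bi_sector);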

With these changes, SCSI-MQ sequential I/O performance improves
significantly. Paolo reported that mq-deadline performance improved
a lot[2] in his dbench test against V2, and performance improvements
on lpfc/qla2xxx were observed with V1.[1]

Bart was also concerned that this patchset might affect SRP, so
test data on SCSI SRP is provided this time:

- fio(libaio, bs:4k, dio, queue_depth:64, 64 jobs)
- system(16 cores, dual sockets, mem: 96G)

          |v4.13-rc6+*  |v4.13-rc6+   | patched v4.13-rc6+ 
-----------------------------------------------------
 IOPS(K)  |  DEADLINE   |    NONE     |    NONE     
-----------------------------------------------------
read      |      587.81 |      511.96 |      518.51 
-----------------------------------------------------
randread  |      116.44 |      142.99 |      142.46 
-----------------------------------------------------
write     |      580.87 |       536.4 |      582.15 
-----------------------------------------------------
randwrite |      104.95 |      124.89 |      123.99 
-----------------------------------------------------


          |v4.13-rc6+   |v4.13-rc6+   | patched v4.13-rc6+ 
-----------------------------------------------------
 IOPS(K)  |  DEADLINE   |MQ-DEADLINE  |MQ-DEADLINE  
-----------------------------------------------------
read      |      587.81 |       158.7 |      450.41 
-----------------------------------------------------
randread  |      116.44 |      142.04 |      142.72 
-----------------------------------------------------
write     |      580.87 |      136.61 |      569.37 
-----------------------------------------------------
randwrite |      104.95 |      123.14 |      124.36 
-----------------------------------------------------

*: v4.13-rc6+ means v4.13-rc6 with block for-next


[1] http://marc.info/?l=linux-block&m=150151989915776&w=2
[2] https://marc.info/?l=linux-block&m=150217980602843&w=2

V3:
	- pick requests from ctx in a fully round-robin way, as suggested
	by Bart
	- remove one local variable in __sbitmap_for_each_set()
	- drop the single dispatch list patches, which can improve
	performance on mq-deadline but cause a slight degradation on
	none because all hctxs need to be checked after ->dispatch
	is flushed; will post them again once they are mature
	- rebase on v4.13-rc6 with block for-next

V2:
	- dequeue requests from sw queues in round-robin style,
	as suggested by Bart, and introduce one helper in sbitmap
	for this purpose
	- improve bio merging via a hash table in the sw queue
	- add comments about using the DISPATCH_BUSY state in a lockless
	way, simplifying handling of the busy state
	- hold ctx->lock when clearing ctx busy bit as suggested
	by Bart

Ming Lei (14):
  blk-mq-sched: fix scheduler bad performance
  sbitmap: introduce __sbitmap_for_each_set()
  blk-mq: introduce blk_mq_dispatch_rq_from_ctx()
  blk-mq-sched: move actual dispatching into one helper
  blk-mq-sched: improve dispatching from sw queue
  blk-mq-sched: don't dequeue request until all in ->dispatch are
    flushed
  blk-mq-sched: introduce blk_mq_sched_queue_depth()
  blk-mq-sched: use q->queue_depth as hint for q->nr_requests
  block: introduce rqhash helpers
  block: move actual bio merge code into __elv_merge
  block: add check on elevator for supporting bio merge via hashtable
    from blk-mq sw queue
  block: introduce .last_merge and .hash to blk_mq_ctx
  blk-mq-sched: refactor blk_mq_sched_try_merge()
  blk-mq: improve bio merge from blk-mq sw queue

 block/blk-mq-debugfs.c  |   1 +
 block/blk-mq-sched.c    | 186 ++++++++++++++++++++++++++++++++----------------
 block/blk-mq-sched.h    |  23 ++++++
 block/blk-mq.c          |  93 +++++++++++++++++++++++-
 block/blk-mq.h          |   7 ++
 block/blk-settings.c    |   2 +
 block/blk.h             |  55 ++++++++++++++
 block/elevator.c        |  93 ++++++++++++++----------
 include/linux/blk-mq.h  |   3 +
 include/linux/sbitmap.h |  56 +++++++++++----
 10 files changed, 401 insertions(+), 118 deletions(-)

-- 
2.9.5

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH V3 01/14] blk-mq-sched: fix scheduler bad performance
  2017-08-26 16:33 [PATCH V3 00/14] blk-mq-sched: improve SCSI-MQ performance Ming Lei
@ 2017-08-26 16:33 ` Ming Lei
  2017-08-26 16:33 ` [PATCH V3 02/14] sbitmap: introduce __sbitmap_for_each_set() Ming Lei
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 28+ messages in thread
From: Ming Lei @ 2017-08-26 16:33 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Christoph Hellwig
  Cc: Bart Van Assche, Laurence Oberman, Paolo Valente, Mel Gorman, Ming Lei

When the hw queue is busy, we shouldn't take requests from the
scheduler queue any more, otherwise it is difficult to do
I/O merging.

This patch fixes the awful I/O performance on some
SCSI devices (lpfc, qla2xxx, ...) when mq-deadline/kyber
is used, by not taking requests while the hw queue is busy.

Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq-sched.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 4ab69435708c..845e5baf8af1 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -94,7 +94,7 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
 	struct request_queue *q = hctx->queue;
 	struct elevator_queue *e = q->elevator;
 	const bool has_sched_dispatch = e && e->type->ops.mq.dispatch_request;
-	bool did_work = false;
+	bool do_sched_dispatch = true;
 	LIST_HEAD(rq_list);
 
 	/* RCU or SRCU read lock is needed before checking quiesced flag */
@@ -125,7 +125,7 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
 	 */
 	if (!list_empty(&rq_list)) {
 		blk_mq_sched_mark_restart_hctx(hctx);
-		did_work = blk_mq_dispatch_rq_list(q, &rq_list);
+		do_sched_dispatch = blk_mq_dispatch_rq_list(q, &rq_list);
 	} else if (!has_sched_dispatch) {
 		blk_mq_flush_busy_ctxs(hctx, &rq_list);
 		blk_mq_dispatch_rq_list(q, &rq_list);
@@ -136,7 +136,7 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
 	 * on the dispatch list, OR if we did have work but weren't able
 	 * to make progress.
 	 */
-	if (!did_work && has_sched_dispatch) {
+	if (do_sched_dispatch && has_sched_dispatch) {
 		do {
 			struct request *rq;
 
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH V3 02/14] sbitmap: introduce __sbitmap_for_each_set()
  2017-08-26 16:33 [PATCH V3 00/14] blk-mq-sched: improve SCSI-MQ performance Ming Lei
  2017-08-26 16:33 ` [PATCH V3 01/14] blk-mq-sched: fix scheduler bad performance Ming Lei
@ 2017-08-26 16:33 ` Ming Lei
  2017-08-30 15:55   ` Bart Van Assche
  2017-08-26 16:33 ` [PATCH V3 03/14] blk-mq: introduce blk_mq_dispatch_rq_from_ctx() Ming Lei
                   ` (11 subsequent siblings)
  13 siblings, 1 reply; 28+ messages in thread
From: Ming Lei @ 2017-08-26 16:33 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Christoph Hellwig
  Cc: Bart Van Assche, Laurence Oberman, Paolo Valente, Mel Gorman,
	Ming Lei, Omar Sandoval

We need to iterate over ctxs starting from any given ctx in a
round-robin way, so introduce this helper.
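
For reference, the next patch (03/14) uses it roughly like this to
start the walk at a given ctx and wrap around the ctx map (condensed
from that patch):

	struct dispatch_rq_data data = {
		.hctx = hctx,
		.rq   = NULL,
	};
	unsigned off = start ? start->index_hw : 0;

	/* visit set bits beginning at 'off', wrapping past the last
	 * word, until the callback returns false */
	__sbitmap_for_each_set(&hctx->ctx_map, off,
			       dispatch_rq_from_ctx, &data);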

Cc: Omar Sandoval <osandov@fb.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 include/linux/sbitmap.h | 56 ++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 41 insertions(+), 15 deletions(-)

diff --git a/include/linux/sbitmap.h b/include/linux/sbitmap.h
index a1904aadbc45..55b4a762c8b7 100644
--- a/include/linux/sbitmap.h
+++ b/include/linux/sbitmap.h
@@ -211,10 +211,14 @@ bool sbitmap_any_bit_set(const struct sbitmap *sb);
  */
 bool sbitmap_any_bit_clear(const struct sbitmap *sb);
 
+#define SB_NR_TO_INDEX(sb, bitnr) ((bitnr) >> (sb)->shift)
+#define SB_NR_TO_BIT(sb, bitnr) ((bitnr) & ((1U << (sb)->shift) - 1U))
+
 typedef bool (*sb_for_each_fn)(struct sbitmap *, unsigned int, void *);
 
 /**
  * sbitmap_for_each_set() - Iterate over each set bit in a &struct sbitmap.
+ * @start: Where to start the iteration
  * @sb: Bitmap to iterate over.
  * @fn: Callback. Should return true to continue or false to break early.
  * @data: Pointer to pass to callback.
@@ -222,35 +226,57 @@ typedef bool (*sb_for_each_fn)(struct sbitmap *, unsigned int, void *);
  * This is inline even though it's non-trivial so that the function calls to the
  * callback will hopefully get optimized away.
  */
-static inline void sbitmap_for_each_set(struct sbitmap *sb, sb_for_each_fn fn,
-					void *data)
+static inline void __sbitmap_for_each_set(struct sbitmap *sb,
+					  unsigned int start,
+					  sb_for_each_fn fn, void *data)
 {
-	unsigned int i;
+	unsigned int index = SB_NR_TO_INDEX(sb, start);
+	unsigned int nr = SB_NR_TO_BIT(sb, start);
+	unsigned int scanned = 0;
 
-	for (i = 0; i < sb->map_nr; i++) {
-		struct sbitmap_word *word = &sb->map[i];
-		unsigned int off, nr;
+	while (1) {
+		struct sbitmap_word *word = &sb->map[index];
+		unsigned int depth = min_t(unsigned int, word->depth - nr,
+					   sb->depth - scanned);
 
+		scanned += depth;
 		if (!word->word)
-			continue;
+			goto next;
 
-		nr = 0;
-		off = i << sb->shift;
+		depth += nr;
+		start = index << sb->shift;
 		while (1) {
-			nr = find_next_bit(&word->word, word->depth, nr);
-			if (nr >= word->depth)
+			nr = find_next_bit(&word->word, depth, nr);
+			if (nr >= depth)
 				break;
-
-			if (!fn(sb, off + nr, data))
+			if (!fn(sb, start + nr, data))
 				return;
 
 			nr++;
 		}
+ next:
+		if (scanned >= sb->depth)
+			break;
+		nr = 0;
+		if (++index >= sb->map_nr)
+			index = 0;
 	}
 }
 
-#define SB_NR_TO_INDEX(sb, bitnr) ((bitnr) >> (sb)->shift)
-#define SB_NR_TO_BIT(sb, bitnr) ((bitnr) & ((1U << (sb)->shift) - 1U))
+/**
+ * sbitmap_for_each_set() - Iterate over each set bit in a &struct sbitmap.
+ * @sb: Bitmap to iterate over.
+ * @fn: Callback. Should return true to continue or false to break early.
+ * @data: Pointer to pass to callback.
+ *
+ * This is inline even though it's non-trivial so that the function calls to the
+ * callback will hopefully get optimized away.
+ */
+static inline void sbitmap_for_each_set(struct sbitmap *sb, sb_for_each_fn fn,
+					void *data)
+{
+	__sbitmap_for_each_set(sb, 0, fn, data);
+}
 
 static inline unsigned long *__sbitmap_word(struct sbitmap *sb,
 					    unsigned int bitnr)
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH V3 03/14] blk-mq: introduce blk_mq_dispatch_rq_from_ctx()
  2017-08-26 16:33 [PATCH V3 00/14] blk-mq-sched: improve SCSI-MQ performance Ming Lei
  2017-08-26 16:33 ` [PATCH V3 01/14] blk-mq-sched: fix scheduler bad performance Ming Lei
  2017-08-26 16:33 ` [PATCH V3 02/14] sbitmap: introduce __sbitmap_for_each_set() Ming Lei
@ 2017-08-26 16:33 ` Ming Lei
  2017-08-30 16:01   ` Bart Van Assche
  2017-08-26 16:33 ` [PATCH V3 04/14] blk-mq-sched: move actual dispatching into one helper Ming Lei
                   ` (10 subsequent siblings)
  13 siblings, 1 reply; 28+ messages in thread
From: Ming Lei @ 2017-08-26 16:33 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Christoph Hellwig
  Cc: Bart Van Assche, Laurence Oberman, Paolo Valente, Mel Gorman, Ming Lei

This function is introduced for dequeuing requests
from the sw queue so that we can dispatch them in
the scheduler's way, i.e. one by one.

More importantly, some SCSI devices may set
q->queue_depth, which is a per-request_queue limit
applied to pending I/O across all hctxs. This
function helps avoid dequeuing too many requests
from the sw queue while ->dispatch isn't
flushed completely.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq.c | 38 ++++++++++++++++++++++++++++++++++++++
 block/blk-mq.h |  2 ++
 2 files changed, 40 insertions(+)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 3f18cff80050..f063dd0f197f 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -880,6 +880,44 @@ void blk_mq_flush_busy_ctxs(struct blk_mq_hw_ctx *hctx, struct list_head *list)
 }
 EXPORT_SYMBOL_GPL(blk_mq_flush_busy_ctxs);
 
+struct dispatch_rq_data {
+	struct blk_mq_hw_ctx *hctx;
+	struct request *rq;
+};
+
+static bool dispatch_rq_from_ctx(struct sbitmap *sb, unsigned int bitnr, void *data)
+{
+	struct dispatch_rq_data *dispatch_data = data;
+	struct blk_mq_hw_ctx *hctx = dispatch_data->hctx;
+	struct blk_mq_ctx *ctx = hctx->ctxs[bitnr];
+
+	spin_lock(&ctx->lock);
+	if (unlikely(!list_empty(&ctx->rq_list))) {
+		dispatch_data->rq = list_entry_rq(ctx->rq_list.next);
+		list_del_init(&dispatch_data->rq->queuelist);
+		if (list_empty(&ctx->rq_list))
+			sbitmap_clear_bit(sb, bitnr);
+	}
+	spin_unlock(&ctx->lock);
+
+	return !dispatch_data->rq;
+}
+
+struct request *blk_mq_dispatch_rq_from_ctx(struct blk_mq_hw_ctx *hctx,
+					    struct blk_mq_ctx *start)
+{
+	unsigned off = start ? start->index_hw : 0;
+	struct dispatch_rq_data data = {
+		.hctx = hctx,
+		.rq   = NULL,
+	};
+
+	__sbitmap_for_each_set(&hctx->ctx_map, off,
+			       dispatch_rq_from_ctx, &data);
+
+	return data.rq;
+}
+
 static inline unsigned int queued_to_index(unsigned int queued)
 {
 	if (!queued)
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 98252b79b80b..e42748bfb959 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -35,6 +35,8 @@ void blk_mq_flush_busy_ctxs(struct blk_mq_hw_ctx *hctx, struct list_head *list);
 bool blk_mq_hctx_has_pending(struct blk_mq_hw_ctx *hctx);
 bool blk_mq_get_driver_tag(struct request *rq, struct blk_mq_hw_ctx **hctx,
 				bool wait);
+struct request *blk_mq_dispatch_rq_from_ctx(struct blk_mq_hw_ctx *hctx,
+					    struct blk_mq_ctx *start);
 
 /*
  * Internal helpers for allocating/freeing the request map
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH V3 04/14] blk-mq-sched: move actual dispatching into one helper
  2017-08-26 16:33 [PATCH V3 00/14] blk-mq-sched: improve SCSI-MQ performance Ming Lei
                   ` (2 preceding siblings ...)
  2017-08-26 16:33 ` [PATCH V3 03/14] blk-mq: introduce blk_mq_dispatch_rq_from_ctx() Ming Lei
@ 2017-08-26 16:33 ` Ming Lei
  2017-08-26 16:33 ` [PATCH V3 05/14] blk-mq-sched: improve dispatching from sw queue Ming Lei
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 28+ messages in thread
From: Ming Lei @ 2017-08-26 16:33 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Christoph Hellwig
  Cc: Bart Van Assche, Laurence Oberman, Paolo Valente, Mel Gorman, Ming Lei

This makes it easy to support dispatching from the
sw queue in the following patch.

No functional change.

Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq-sched.c | 28 ++++++++++++++++++----------
 1 file changed, 18 insertions(+), 10 deletions(-)

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 845e5baf8af1..f69752961a34 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -89,6 +89,22 @@ static bool blk_mq_sched_restart_hctx(struct blk_mq_hw_ctx *hctx)
 	return false;
 }
 
+static void blk_mq_do_dispatch(struct request_queue *q,
+			       struct elevator_queue *e,
+			       struct blk_mq_hw_ctx *hctx)
+{
+	LIST_HEAD(rq_list);
+
+	do {
+		struct request *rq;
+
+		rq = e->type->ops.mq.dispatch_request(hctx);
+		if (!rq)
+			break;
+		list_add(&rq->queuelist, &rq_list);
+	} while (blk_mq_dispatch_rq_list(q, &rq_list));
+}
+
 void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
 {
 	struct request_queue *q = hctx->queue;
@@ -136,16 +152,8 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
 	 * on the dispatch list, OR if we did have work but weren't able
 	 * to make progress.
 	 */
-	if (do_sched_dispatch && has_sched_dispatch) {
-		do {
-			struct request *rq;
-
-			rq = e->type->ops.mq.dispatch_request(hctx);
-			if (!rq)
-				break;
-			list_add(&rq->queuelist, &rq_list);
-		} while (blk_mq_dispatch_rq_list(q, &rq_list));
-	}
+	if (do_sched_dispatch && has_sched_dispatch)
+		blk_mq_do_dispatch(q, e, hctx);
 }
 
 bool blk_mq_sched_try_merge(struct request_queue *q, struct bio *bio,
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH V3 05/14] blk-mq-sched: improve dispatching from sw queue
  2017-08-26 16:33 [PATCH V3 00/14] blk-mq-sched: improve SCSI-MQ performance Ming Lei
                   ` (3 preceding siblings ...)
  2017-08-26 16:33 ` [PATCH V3 04/14] blk-mq-sched: move actual dispatching into one helper Ming Lei
@ 2017-08-26 16:33 ` Ming Lei
  2017-08-30 16:34   ` Bart Van Assche
  2017-08-26 16:33 ` [PATCH V3 06/14] blk-mq-sched: don't dequeue request until all in ->dispatch are flushed Ming Lei
                   ` (8 subsequent siblings)
  13 siblings, 1 reply; 28+ messages in thread
From: Ming Lei @ 2017-08-26 16:33 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Christoph Hellwig
  Cc: Bart Van Assche, Laurence Oberman, Paolo Valente, Mel Gorman, Ming Lei

SCSI devices use a host-wide tagset, and the shared
driver tag space is often quite big. Meanwhile there
is also a per-LUN queue depth (.cmd_per_lun), which
is often small.

So lots of requests may stay in the sw queue, and we
always flush all of them belonging to the same hw queue
and dispatch them all to the driver. Unfortunately this
easily makes the queue busy because of the small
per-LUN queue depth. Once these requests are flushed
out, they have to stay in hctx->dispatch, no bio can
be merged into them any more, and sequential I/O
performance is hurt.
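
As a hypothetical illustration (the numbers are made up, not
measured): with .cmd_per_lun = 3 and 32 requests sitting in the sw
queues of one hw queue, a full flush can keep only about 3 of them
in flight; the rest come back busy (BLK_STS_RESOURCE) and are parked
in hctx->dispatch, where later bios can no longer be merged into
them.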

This patch improves dispatching from the sw queue when
there is a per-request_queue queue depth, by taking
requests one by one from the sw queue, just like an
I/O scheduler does.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq-sched.c   | 61 +++++++++++++++++++++++++++++++++++++++++++++-----
 include/linux/blk-mq.h |  2 ++
 2 files changed, 57 insertions(+), 6 deletions(-)

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index f69752961a34..735e432294ab 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -89,9 +89,9 @@ static bool blk_mq_sched_restart_hctx(struct blk_mq_hw_ctx *hctx)
 	return false;
 }
 
-static void blk_mq_do_dispatch(struct request_queue *q,
-			       struct elevator_queue *e,
-			       struct blk_mq_hw_ctx *hctx)
+static void blk_mq_do_dispatch_sched(struct request_queue *q,
+				     struct elevator_queue *e,
+				     struct blk_mq_hw_ctx *hctx)
 {
 	LIST_HEAD(rq_list);
 
@@ -105,6 +105,42 @@ static void blk_mq_do_dispatch(struct request_queue *q,
 	} while (blk_mq_dispatch_rq_list(q, &rq_list));
 }
 
+static struct blk_mq_ctx *blk_mq_next_ctx(struct blk_mq_hw_ctx *hctx,
+					  struct blk_mq_ctx *ctx)
+{
+	unsigned idx = ctx->index_hw;
+
+	if (++idx == hctx->nr_ctx)
+		idx = 0;
+
+	return hctx->ctxs[idx];
+}
+
+static void blk_mq_do_dispatch_ctx(struct request_queue *q,
+				   struct blk_mq_hw_ctx *hctx)
+{
+	LIST_HEAD(rq_list);
+	struct blk_mq_ctx *ctx = READ_ONCE(hctx->dispatch_from);
+	bool dispatched;
+
+	do {
+		struct request *rq;
+
+		rq = blk_mq_dispatch_rq_from_ctx(hctx, ctx);
+		if (!rq)
+			break;
+		list_add(&rq->queuelist, &rq_list);
+
+		/* round robin for fair dispatch */
+		ctx = blk_mq_next_ctx(hctx, rq->mq_ctx);
+
+		dispatched = blk_mq_dispatch_rq_list(q, &rq_list);
+	} while (dispatched);
+
+	if (!dispatched)
+		WRITE_ONCE(hctx->dispatch_from, blk_mq_next_ctx(hctx, ctx));
+}
+
 void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
 {
 	struct request_queue *q = hctx->queue;
@@ -142,18 +178,31 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
 	if (!list_empty(&rq_list)) {
 		blk_mq_sched_mark_restart_hctx(hctx);
 		do_sched_dispatch = blk_mq_dispatch_rq_list(q, &rq_list);
-	} else if (!has_sched_dispatch) {
+	} else if (!has_sched_dispatch && !q->queue_depth) {
+		/*
+		 * If there is no per-request_queue depth, we
+		 * flush all requests in this hw queue, otherwise
+		 * pick up request one by one from sw queue for
+		 * avoiding to mess up I/O merge when dispatch
+		 * is busy, which can be triggered easily by
+		 * per-request_queue queue depth
+		 */
 		blk_mq_flush_busy_ctxs(hctx, &rq_list);
 		blk_mq_dispatch_rq_list(q, &rq_list);
 	}
 
+	if (!do_sched_dispatch)
+		return;
+
 	/*
 	 * We want to dispatch from the scheduler if we had no work left
 	 * on the dispatch list, OR if we did have work but weren't able
 	 * to make progress.
 	 */
-	if (do_sched_dispatch && has_sched_dispatch)
-		blk_mq_do_dispatch(q, e, hctx);
+	if (has_sched_dispatch)
+		blk_mq_do_dispatch_sched(q, e, hctx);
+	else
+		blk_mq_do_dispatch_ctx(q, hctx);
 }
 
 bool blk_mq_sched_try_merge(struct request_queue *q, struct bio *bio,
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 50c6485cb04f..7b7a366a97f3 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -30,6 +30,8 @@ struct blk_mq_hw_ctx {
 
 	struct sbitmap		ctx_map;
 
+	struct blk_mq_ctx	*dispatch_from;
+
 	struct blk_mq_ctx	**ctxs;
 	unsigned int		nr_ctx;
 
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH V3 06/14] blk-mq-sched: don't dequeue request until all in ->dispatch are flushed
  2017-08-26 16:33 [PATCH V3 00/14] blk-mq-sched: improve SCSI-MQ performance Ming Lei
                   ` (4 preceding siblings ...)
  2017-08-26 16:33 ` [PATCH V3 05/14] blk-mq-sched: improve dispatching from sw queue Ming Lei
@ 2017-08-26 16:33 ` Ming Lei
  2017-08-30 17:11   ` Bart Van Assche
  2017-08-26 16:33 ` [PATCH V3 07/14] blk-mq-sched: introduce blk_mq_sched_queue_depth() Ming Lei
                   ` (7 subsequent siblings)
  13 siblings, 1 reply; 28+ messages in thread
From: Ming Lei @ 2017-08-26 16:33 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Christoph Hellwig
  Cc: Bart Van Assche, Laurence Oberman, Paolo Valente, Mel Gorman, Ming Lei

During dispatching, we move all requests from hctx->dispatch to
a temporary list, then dispatch them one by one from that list.
Unfortunately, during this period a queue run from another context
may think the queue is idle, start to dequeue from the sw/scheduler
queue and still try to dispatch because ->dispatch is empty. This
hurts sequential I/O performance because requests are dequeued while
the LLD's queue is busy.

This patch introduces the BLK_MQ_S_DISPATCH_BUSY state to
make sure that requests aren't dequeued until ->dispatch is
flushed.
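
Condensed from the hunks below, the core of the scheme is:

	/* set whenever requests are (re)queued to hctx->dispatch */
	set_bit(BLK_MQ_S_DISPATCH_BUSY, &hctx->state);

	/* dispatch paths then bail out early instead of dequeuing from
	 * the sw/scheduler queues until ->dispatch has been flushed */
	if (test_bit(BLK_MQ_S_DISPATCH_BUSY, &hctx->state))
		return;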

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq-debugfs.c |  1 +
 block/blk-mq-sched.c   | 58 +++++++++++++++++++++++++++++++++++---------------
 block/blk-mq.c         |  6 ++++++
 include/linux/blk-mq.h |  1 +
 4 files changed, 49 insertions(+), 17 deletions(-)

diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index e53b6129ca5a..64a6b34b402c 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -180,6 +180,7 @@ static const char *const hctx_state_name[] = {
 	HCTX_STATE_NAME(SCHED_RESTART),
 	HCTX_STATE_NAME(TAG_WAITING),
 	HCTX_STATE_NAME(START_ON_RUN),
+	HCTX_STATE_NAME(DISPATCH_BUSY),
 };
 #undef HCTX_STATE_NAME
 
diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 735e432294ab..4d7bea8c2594 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -146,7 +146,6 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
 	struct request_queue *q = hctx->queue;
 	struct elevator_queue *e = q->elevator;
 	const bool has_sched_dispatch = e && e->type->ops.mq.dispatch_request;
-	bool do_sched_dispatch = true;
 	LIST_HEAD(rq_list);
 
 	/* RCU or SRCU read lock is needed before checking quiesced flag */
@@ -177,8 +176,33 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
 	 */
 	if (!list_empty(&rq_list)) {
 		blk_mq_sched_mark_restart_hctx(hctx);
-		do_sched_dispatch = blk_mq_dispatch_rq_list(q, &rq_list);
-	} else if (!has_sched_dispatch && !q->queue_depth) {
+		blk_mq_dispatch_rq_list(q, &rq_list);
+
+		/*
+		 * We may clear DISPATCH_BUSY just after it
+		 * is set from another context, the only cost
+		 * is that one request is dequeued a bit early,
+		 * we can survive that. Given the window is
+		 * too small, no need to worry about performance
+		 * effect.
+		 */
+		if (list_empty_careful(&hctx->dispatch))
+			clear_bit(BLK_MQ_S_DISPATCH_BUSY, &hctx->state);
+	}
+
+	/*
+	 * If DISPATCH_BUSY is set, that means hw queue is busy
+	 * and requests in the list of hctx->dispatch need to
+	 * be flushed first, so return early.
+	 *
+	 * Wherever DISPATCH_BUSY is set, blk_mq_run_hw_queue()
+	 * will be run to try to make progress, so it is always
+	 * safe to check the state here.
+	 */
+	if (test_bit(BLK_MQ_S_DISPATCH_BUSY, &hctx->state))
+		return;
+
+	if (!has_sched_dispatch) {
 		/*
 		 * If there is no per-request_queue depth, we
 		 * flush all requests in this hw queue, otherwise
@@ -187,22 +211,21 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
 		 * is busy, which can be triggered easily by
 		 * per-request_queue queue depth
 		 */
-		blk_mq_flush_busy_ctxs(hctx, &rq_list);
-		blk_mq_dispatch_rq_list(q, &rq_list);
-	}
-
-	if (!do_sched_dispatch)
-		return;
+		if (!q->queue_depth) {
+			blk_mq_flush_busy_ctxs(hctx, &rq_list);
+			blk_mq_dispatch_rq_list(q, &rq_list);
+		} else {
+			blk_mq_do_dispatch_ctx(q, hctx);
+		}
+	} else {
 
-	/*
-	 * We want to dispatch from the scheduler if we had no work left
-	 * on the dispatch list, OR if we did have work but weren't able
-	 * to make progress.
-	 */
-	if (has_sched_dispatch)
+		/*
+		 * We want to dispatch from the scheduler if we had no work left
+		 * on the dispatch list, OR if we did have work but weren't able
+		 * to make progress.
+		 */
 		blk_mq_do_dispatch_sched(q, e, hctx);
-	else
-		blk_mq_do_dispatch_ctx(q, hctx);
+	}
 }
 
 bool blk_mq_sched_try_merge(struct request_queue *q, struct bio *bio,
@@ -330,6 +353,7 @@ static bool blk_mq_sched_bypass_insert(struct blk_mq_hw_ctx *hctx,
 	 */
 	spin_lock(&hctx->lock);
 	list_add(&rq->queuelist, &hctx->dispatch);
+	set_bit(BLK_MQ_S_DISPATCH_BUSY, &hctx->state);
 	spin_unlock(&hctx->lock);
 	return true;
 }
diff --git a/block/blk-mq.c b/block/blk-mq.c
index f063dd0f197f..6af56a71c1cd 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1140,6 +1140,11 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list)
 
 		spin_lock(&hctx->lock);
 		list_splice_init(list, &hctx->dispatch);
+		/*
+		 * DISPATCH_BUSY won't be cleared until all requests
+		 * in hctx->dispatch are dispatched successfully
+		 */
+		set_bit(BLK_MQ_S_DISPATCH_BUSY, &hctx->state);
 		spin_unlock(&hctx->lock);
 
 		/*
@@ -1927,6 +1932,7 @@ static int blk_mq_hctx_notify_dead(unsigned int cpu, struct hlist_node *node)
 
 	spin_lock(&hctx->lock);
 	list_splice_tail_init(&tmp, &hctx->dispatch);
+	set_bit(BLK_MQ_S_DISPATCH_BUSY, &hctx->state);
 	spin_unlock(&hctx->lock);
 
 	blk_mq_run_hw_queue(hctx, true);
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 7b7a366a97f3..13f6c25fa461 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -172,6 +172,7 @@ enum {
 	BLK_MQ_S_SCHED_RESTART	= 2,
 	BLK_MQ_S_TAG_WAITING	= 3,
 	BLK_MQ_S_START_ON_RUN	= 4,
+	BLK_MQ_S_DISPATCH_BUSY	= 5,
 
 	BLK_MQ_MAX_DEPTH	= 10240,
 
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH V3 07/14] blk-mq-sched: introduce blk_mq_sched_queue_depth()
  2017-08-26 16:33 [PATCH V3 00/14] blk-mq-sched: improve SCSI-MQ performance Ming Lei
                   ` (5 preceding siblings ...)
  2017-08-26 16:33 ` [PATCH V3 06/14] blk-mq-sched: don't dequeue request until all in ->dispatch are flushed Ming Lei
@ 2017-08-26 16:33 ` Ming Lei
  2017-08-26 16:33 ` [PATCH V3 08/14] blk-mq-sched: use q->queue_depth as hint for q->nr_requests Ming Lei
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 28+ messages in thread
From: Ming Lei @ 2017-08-26 16:33 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Christoph Hellwig
  Cc: Bart Van Assche, Laurence Oberman, Paolo Valente, Mel Gorman, Ming Lei

The following patch will use one hint to figure out the
default queue depth for the scheduler queue, so introduce
the helper blk_mq_sched_queue_depth() for this purpose.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq-sched.c |  8 +-------
 block/blk-mq-sched.h | 11 +++++++++++
 2 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 4d7bea8c2594..28054e49df5d 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -595,13 +595,7 @@ int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e)
 		return 0;
 	}
 
-	/*
-	 * Default to double of smaller one between hw queue_depth and 128,
-	 * since we don't split into sync/async like the old code did.
-	 * Additionally, this is a per-hw queue depth.
-	 */
-	q->nr_requests = 2 * min_t(unsigned int, q->tag_set->queue_depth,
-				   BLKDEV_MAX_RQ);
+	q->nr_requests = blk_mq_sched_queue_depth(q);
 
 	queue_for_each_hw_ctx(q, hctx, i) {
 		ret = blk_mq_sched_alloc_tags(q, hctx, i);
diff --git a/block/blk-mq-sched.h b/block/blk-mq-sched.h
index 9267d0b7c197..1d47f3fda1d0 100644
--- a/block/blk-mq-sched.h
+++ b/block/blk-mq-sched.h
@@ -96,4 +96,15 @@ static inline bool blk_mq_sched_needs_restart(struct blk_mq_hw_ctx *hctx)
 	return test_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state);
 }
 
+static inline unsigned blk_mq_sched_queue_depth(struct request_queue *q)
+{
+	/*
+	 * Default to double of smaller one between hw queue_depth and 128,
+	 * since we don't split into sync/async like the old code did.
+	 * Additionally, this is a per-hw queue depth.
+	 */
+	return 2 * min_t(unsigned int, q->tag_set->queue_depth,
+				   BLKDEV_MAX_RQ);
+}
+
 #endif
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH V3 08/14] blk-mq-sched: use q->queue_depth as hint for q->nr_requests
  2017-08-26 16:33 [PATCH V3 00/14] blk-mq-sched: improve SCSI-MQ performance Ming Lei
                   ` (6 preceding siblings ...)
  2017-08-26 16:33 ` [PATCH V3 07/14] blk-mq-sched: introduce blk_mq_sched_queue_depth() Ming Lei
@ 2017-08-26 16:33 ` Ming Lei
  2017-08-26 16:33 ` [PATCH V3 09/14] block: introduce rqhash helpers Ming Lei
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 28+ messages in thread
From: Ming Lei @ 2017-08-26 16:33 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Christoph Hellwig
  Cc: Bart Van Assche, Laurence Oberman, Paolo Valente, Mel Gorman, Ming Lei

SCSI sets q->queue_depth from shost->cmd_per_lun. q->queue_depth
is per request_queue and more closely related to the scheduler
queue than the hw queue depth is, since the hw queue depth can be
shared by several queues, such as with TAG_SHARED.

This patch uses q->queue_depth as a hint for computing
q->nr_requests, which should be more effective than the
current way.
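
For example, with the formula in this patch (BLKDEV_MAX_RQ is 128):
a device setting q->queue_depth = 3 gets max(2 * min(3, 128), 128) =
128 scheduler requests, matching the legacy default, while
q->queue_depth = 256 gets max(2 * min(256, 128), 128) = 256.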

Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq-sched.h | 18 +++++++++++++++---
 block/blk-mq.c       | 27 +++++++++++++++++++++++++--
 block/blk-mq.h       |  1 +
 block/blk-settings.c |  2 ++
 4 files changed, 43 insertions(+), 5 deletions(-)

diff --git a/block/blk-mq-sched.h b/block/blk-mq-sched.h
index 1d47f3fda1d0..906b10c54f78 100644
--- a/block/blk-mq-sched.h
+++ b/block/blk-mq-sched.h
@@ -99,12 +99,24 @@ static inline bool blk_mq_sched_needs_restart(struct blk_mq_hw_ctx *hctx)
 static inline unsigned blk_mq_sched_queue_depth(struct request_queue *q)
 {
 	/*
-	 * Default to double of smaller one between hw queue_depth and 128,
+	 * q->queue_depth is more close to scheduler queue, so use it
+	 * as hint for computing scheduler queue depth if it is valid
+	 */
+	unsigned q_depth = q->queue_depth ?: q->tag_set->queue_depth;
+
+	/*
+	 * Default to double of smaller one between queue depth and 128,
 	 * since we don't split into sync/async like the old code did.
 	 * Additionally, this is a per-hw queue depth.
 	 */
-	return 2 * min_t(unsigned int, q->tag_set->queue_depth,
-				   BLKDEV_MAX_RQ);
+	q_depth = 2 * min_t(unsigned int, q_depth, BLKDEV_MAX_RQ);
+
+	/*
+	 * when queue depth of driver is too small, we set queue depth
+	 * of scheduler queue as 128 which is the default setting of
+	 * block legacy code.
+	 */
+	return max_t(unsigned, q_depth, BLKDEV_MAX_RQ);
 }
 
 #endif
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 6af56a71c1cd..fc3d26bbfc1a 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2650,7 +2650,9 @@ void blk_mq_free_tag_set(struct blk_mq_tag_set *set)
 }
 EXPORT_SYMBOL(blk_mq_free_tag_set);
 
-int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
+static int __blk_mq_update_nr_requests(struct request_queue *q,
+				       bool sched_only,
+				       unsigned int nr)
 {
 	struct blk_mq_tag_set *set = q->tag_set;
 	struct blk_mq_hw_ctx *hctx;
@@ -2669,7 +2671,7 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
 		 * If we're using an MQ scheduler, just update the scheduler
 		 * queue depth. This is similar to what the old code would do.
 		 */
-		if (!hctx->sched_tags) {
+		if (!sched_only && !hctx->sched_tags) {
 			ret = blk_mq_tag_update_depth(hctx, &hctx->tags,
 							min(nr, set->queue_depth),
 							false);
@@ -2689,6 +2691,27 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
 	return ret;
 }
 
+int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
+{
+	return __blk_mq_update_nr_requests(q, false, nr);
+}
+
+/*
+ * When drivers update q->queue_depth, this API is called so that
+ * we can use this queue depth as hint for adjusting scheduler
+ * queue depth.
+ */
+int blk_mq_update_sched_queue_depth(struct request_queue *q)
+{
+	unsigned nr;
+
+	if (!q->mq_ops || !q->elevator)
+		return 0;
+
+	nr = blk_mq_sched_queue_depth(q);
+	return __blk_mq_update_nr_requests(q, true, nr);
+}
+
 static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set,
 							int nr_hw_queues)
 {
diff --git a/block/blk-mq.h b/block/blk-mq.h
index e42748bfb959..0277f9771fab 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -37,6 +37,7 @@ bool blk_mq_get_driver_tag(struct request *rq, struct blk_mq_hw_ctx **hctx,
 				bool wait);
 struct request *blk_mq_dispatch_rq_from_ctx(struct blk_mq_hw_ctx *hctx,
 					    struct blk_mq_ctx *start);
+int blk_mq_update_sched_queue_depth(struct request_queue *q);
 
 /*
  * Internal helpers for allocating/freeing the request map
diff --git a/block/blk-settings.c b/block/blk-settings.c
index be1f115b538b..94a349601545 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -877,6 +877,8 @@ void blk_set_queue_depth(struct request_queue *q, unsigned int depth)
 {
 	q->queue_depth = depth;
 	wbt_set_queue_depth(q->rq_wb, depth);
+
+	WARN_ON(blk_mq_update_sched_queue_depth(q));
 }
 EXPORT_SYMBOL(blk_set_queue_depth);
 
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH V3 09/14] block: introduce rqhash helpers
  2017-08-26 16:33 [PATCH V3 00/14] blk-mq-sched: improve SCSI-MQ performance Ming Lei
                   ` (7 preceding siblings ...)
  2017-08-26 16:33 ` [PATCH V3 08/14] blk-mq-sched: use q->queue_depth as hint for q->nr_requests Ming Lei
@ 2017-08-26 16:33 ` Ming Lei
  2017-08-26 16:33 ` [PATCH V3 10/14] block: move actual bio merge code into __elv_merge Ming Lei
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 28+ messages in thread
From: Ming Lei @ 2017-08-26 16:33 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Christoph Hellwig
  Cc: Bart Van Assche, Laurence Oberman, Paolo Valente, Mel Gorman, Ming Lei

We need these helpers to support using a hash table to improve
bio merging from the sw queue in the following patches.

No functional change.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk.h      | 52 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 block/elevator.c | 36 +++++++-----------------------------
 2 files changed, 59 insertions(+), 29 deletions(-)

diff --git a/block/blk.h b/block/blk.h
index 3a3d715bd725..313002c2f666 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -147,6 +147,58 @@ static inline void blk_clear_rq_complete(struct request *rq)
  */
 #define ELV_ON_HASH(rq) ((rq)->rq_flags & RQF_HASHED)
 
+/*
+ * Merge hash stuff.
+ */
+#define rq_hash_key(rq)		(blk_rq_pos(rq) + blk_rq_sectors(rq))
+
+#define bucket(head, key)	&((head)[hash_min((key), ELV_HASH_BITS)])
+
+static inline void __rqhash_del(struct request *rq)
+{
+	hash_del(&rq->hash);
+	rq->rq_flags &= ~RQF_HASHED;
+}
+
+static inline void rqhash_del(struct request *rq)
+{
+	if (ELV_ON_HASH(rq))
+		__rqhash_del(rq);
+}
+
+static inline void rqhash_add(struct hlist_head *hash, struct request *rq)
+{
+	BUG_ON(ELV_ON_HASH(rq));
+	hlist_add_head(&rq->hash, bucket(hash, rq_hash_key(rq)));
+	rq->rq_flags |= RQF_HASHED;
+}
+
+static inline void rqhash_reposition(struct hlist_head *hash, struct request *rq)
+{
+	__rqhash_del(rq);
+	rqhash_add(hash, rq);
+}
+
+static inline struct request *rqhash_find(struct hlist_head *hash, sector_t offset)
+{
+	struct hlist_node *next;
+	struct request *rq = NULL;
+
+	hlist_for_each_entry_safe(rq, next, bucket(hash, offset), hash) {
+		BUG_ON(!ELV_ON_HASH(rq));
+
+		if (unlikely(!rq_mergeable(rq))) {
+			__rqhash_del(rq);
+			continue;
+		}
+
+		if (rq_hash_key(rq) == offset)
+			return rq;
+	}
+
+	return NULL;
+}
+
 void blk_insert_flush(struct request *rq);
 
 static inline struct request *__elv_next_request(struct request_queue *q)
diff --git a/block/elevator.c b/block/elevator.c
index 4bb2f0c93fa6..7423e9c9cb27 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -47,11 +47,6 @@ static DEFINE_SPINLOCK(elv_list_lock);
 static LIST_HEAD(elv_list);
 
 /*
- * Merge hash stuff.
- */
-#define rq_hash_key(rq)		(blk_rq_pos(rq) + blk_rq_sectors(rq))
-
-/*
  * Query io scheduler to see if the current process issuing bio may be
  * merged with rq.
  */
@@ -268,14 +263,12 @@ EXPORT_SYMBOL(elevator_exit);
 
 static inline void __elv_rqhash_del(struct request *rq)
 {
-	hash_del(&rq->hash);
-	rq->rq_flags &= ~RQF_HASHED;
+	__rqhash_del(rq);
 }
 
 void elv_rqhash_del(struct request_queue *q, struct request *rq)
 {
-	if (ELV_ON_HASH(rq))
-		__elv_rqhash_del(rq);
+	rqhash_del(rq);
 }
 EXPORT_SYMBOL_GPL(elv_rqhash_del);
 
@@ -283,37 +276,22 @@ void elv_rqhash_add(struct request_queue *q, struct request *rq)
 {
 	struct elevator_queue *e = q->elevator;
 
-	BUG_ON(ELV_ON_HASH(rq));
-	hash_add(e->hash, &rq->hash, rq_hash_key(rq));
-	rq->rq_flags |= RQF_HASHED;
+	rqhash_add(e->hash, rq);
 }
 EXPORT_SYMBOL_GPL(elv_rqhash_add);
 
 void elv_rqhash_reposition(struct request_queue *q, struct request *rq)
 {
-	__elv_rqhash_del(rq);
-	elv_rqhash_add(q, rq);
+	struct elevator_queue *e = q->elevator;
+
+	rqhash_reposition(e->hash, rq);
 }
 
 struct request *elv_rqhash_find(struct request_queue *q, sector_t offset)
 {
 	struct elevator_queue *e = q->elevator;
-	struct hlist_node *next;
-	struct request *rq;
-
-	hash_for_each_possible_safe(e->hash, rq, next, hash, offset) {
-		BUG_ON(!ELV_ON_HASH(rq));
 
-		if (unlikely(!rq_mergeable(rq))) {
-			__elv_rqhash_del(rq);
-			continue;
-		}
-
-		if (rq_hash_key(rq) == offset)
-			return rq;
-	}
-
-	return NULL;
+	return rqhash_find(e->hash, offset);
 }
 
 /*
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH V3 10/14] block: move actual bio merge code into __elv_merge
  2017-08-26 16:33 [PATCH V3 00/14] blk-mq-sched: improve SCSI-MQ performance Ming Lei
                   ` (8 preceding siblings ...)
  2017-08-26 16:33 ` [PATCH V3 09/14] block: introduce rqhash helpers Ming Lei
@ 2017-08-26 16:33 ` Ming Lei
  2017-08-26 16:33 ` [PATCH V3 11/14] block: add check on elevator for supporting bio merge via hashtable from blk-mq sw queue Ming Lei
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 28+ messages in thread
From: Ming Lei @ 2017-08-26 16:33 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Christoph Hellwig
  Cc: Bart Van Assche, Laurence Oberman, Paolo Valente, Mel Gorman, Ming Lei

This lets us reuse __elv_merge() to merge a bio into
requests from the sw queue in the following patches.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/elevator.c | 19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/block/elevator.c b/block/elevator.c
index 7423e9c9cb27..4ac437b5fc0c 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -409,8 +409,9 @@ void elv_dispatch_add_tail(struct request_queue *q, struct request *rq)
 }
 EXPORT_SYMBOL(elv_dispatch_add_tail);
 
-enum elv_merge elv_merge(struct request_queue *q, struct request **req,
-		struct bio *bio)
+static enum elv_merge __elv_merge(struct request_queue *q,
+		struct request **req, struct bio *bio,
+		struct hlist_head *hash, struct request *last_merge)
 {
 	struct elevator_queue *e = q->elevator;
 	struct request *__rq;
@@ -427,11 +428,11 @@ enum elv_merge elv_merge(struct request_queue *q, struct request **req,
 	/*
 	 * First try one-hit cache.
 	 */
-	if (q->last_merge && elv_bio_merge_ok(q->last_merge, bio)) {
-		enum elv_merge ret = blk_try_merge(q->last_merge, bio);
+	if (last_merge && elv_bio_merge_ok(last_merge, bio)) {
+		enum elv_merge ret = blk_try_merge(last_merge, bio);
 
 		if (ret != ELEVATOR_NO_MERGE) {
-			*req = q->last_merge;
+			*req = last_merge;
 			return ret;
 		}
 	}
@@ -442,7 +443,7 @@ enum elv_merge elv_merge(struct request_queue *q, struct request **req,
 	/*
 	 * See if our hash lookup can find a potential backmerge.
 	 */
-	__rq = elv_rqhash_find(q, bio->bi_iter.bi_sector);
+	__rq = rqhash_find(hash, bio->bi_iter.bi_sector);
 	if (__rq && elv_bio_merge_ok(__rq, bio)) {
 		*req = __rq;
 		return ELEVATOR_BACK_MERGE;
@@ -456,6 +457,12 @@ enum elv_merge elv_merge(struct request_queue *q, struct request **req,
 	return ELEVATOR_NO_MERGE;
 }
 
+enum elv_merge elv_merge(struct request_queue *q, struct request **req,
+		struct bio *bio)
+{
+	return __elv_merge(q, req, bio, q->elevator->hash, q->last_merge);
+}
+
 /*
  * Attempt to do an insertion back merge. Only check for the case where
  * we can append 'rq' to an existing request, so we can throw 'rq' away
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH V3 11/14] block: add check on elevator for supporting bio merge via hashtable from blk-mq sw queue
  2017-08-26 16:33 [PATCH V3 00/14] blk-mq-sched: improve SCSI-MQ performance Ming Lei
                   ` (9 preceding siblings ...)
  2017-08-26 16:33 ` [PATCH V3 10/14] block: move actual bio merge code into __elv_merge Ming Lei
@ 2017-08-26 16:33 ` Ming Lei
  2017-08-26 16:33 ` [PATCH V3 12/14] block: introduce .last_merge and .hash to blk_mq_ctx Ming Lei
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 28+ messages in thread
From: Ming Lei @ 2017-08-26 16:33 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Christoph Hellwig
  Cc: Bart Van Assche, Laurence Oberman, Paolo Valente, Mel Gorman, Ming Lei

blk_mq_sched_try_merge() will be reused in the following patches
to support merging bios into the blk-mq sw queue, so add checks to
the related functions that are called from blk_mq_sched_try_merge().

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/elevator.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/block/elevator.c b/block/elevator.c
index 4ac437b5fc0c..a4e35419cf9a 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -71,6 +71,10 @@ bool elv_bio_merge_ok(struct request *rq, struct bio *bio)
 	if (!blk_rq_merge_ok(rq, bio))
 		return false;
 
+	/* We need to support to merge bio from sw queue */
+	if (!rq->q->elevator)
+		return true;
+
 	if (!elv_iosched_allow_bio_merge(rq, bio))
 		return false;
 
@@ -449,6 +453,10 @@ static enum elv_merge __elv_merge(struct request_queue *q,
 		return ELEVATOR_BACK_MERGE;
 	}
 
+	/* no elevator when merging bio to blk-mq sw queue */
+	if (!e)
+		return ELEVATOR_NO_MERGE;
+
 	if (e->uses_mq && e->type->ops.mq.request_merge)
 		return e->type->ops.mq.request_merge(q, req, bio);
 	else if (!e->uses_mq && e->type->ops.sq.elevator_merge_fn)
@@ -711,6 +719,10 @@ struct request *elv_latter_request(struct request_queue *q, struct request *rq)
 {
 	struct elevator_queue *e = q->elevator;
 
+	/* no elevator when merging bio to blk-mq sw queue */
+	if (!e)
+		return NULL;
+
 	if (e->uses_mq && e->type->ops.mq.next_request)
 		return e->type->ops.mq.next_request(q, rq);
 	else if (!e->uses_mq && e->type->ops.sq.elevator_latter_req_fn)
@@ -723,6 +735,10 @@ struct request *elv_former_request(struct request_queue *q, struct request *rq)
 {
 	struct elevator_queue *e = q->elevator;
 
+	/* no elevator when merging bio to blk-mq sw queue */
+	if (!e)
+		return NULL;
+
 	if (e->uses_mq && e->type->ops.mq.former_request)
 		return e->type->ops.mq.former_request(q, rq);
 	if (!e->uses_mq && e->type->ops.sq.elevator_former_req_fn)
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH V3 12/14] block: introduce .last_merge and .hash to blk_mq_ctx
  2017-08-26 16:33 [PATCH V3 00/14] blk-mq-sched: improve SCSI-MQ performance Ming Lei
                   ` (10 preceding siblings ...)
  2017-08-26 16:33 ` [PATCH V3 11/14] block: add check on elevator for supporting bio merge via hashtable from blk-mq sw queue Ming Lei
@ 2017-08-26 16:33 ` Ming Lei
  2017-08-26 16:33 ` [PATCH V3 13/14] blk-mq-sched: refactor blk_mq_sched_try_merge() Ming Lei
  2017-08-26 16:33 ` [PATCH V3 14/14] blk-mq: improve bio merge from blk-mq sw queue Ming Lei
  13 siblings, 0 replies; 28+ messages in thread
From: Ming Lei @ 2017-08-26 16:33 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Christoph Hellwig
  Cc: Bart Van Assche, Laurence Oberman, Paolo Valente, Mel Gorman, Ming Lei

Prepare for supporting bio merging into the sw queue when no
blk-mq I/O scheduler is in use.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq.h   |  4 ++++
 block/blk.h      |  3 +++
 block/elevator.c | 22 +++++++++++++++++++---
 3 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/block/blk-mq.h b/block/blk-mq.h
index 0277f9771fab..1b9742eb7399 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -18,6 +18,10 @@ struct blk_mq_ctx {
 	unsigned long		rq_dispatched[2];
 	unsigned long		rq_merged;
 
+	/* bio merge via request hash table */
+	struct request		*last_merge;
+	DECLARE_HASHTABLE(hash, ELV_HASH_BITS);
+
 	/* incremented at completion time */
 	unsigned long		____cacheline_aligned_in_smp rq_completed[2];
 
diff --git a/block/blk.h b/block/blk.h
index 313002c2f666..6847c5435cca 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -199,6 +199,9 @@ static inline struct request *rqhash_find(struct hlist_head *hash, sector_t offs
 	return NULL;
 }
 
+enum elv_merge elv_merge_ctx(struct request_queue *q, struct request **req,
+                struct bio *bio, struct blk_mq_ctx *ctx);
+
 void blk_insert_flush(struct request *rq);
 
 static inline struct request *__elv_next_request(struct request_queue *q)
diff --git a/block/elevator.c b/block/elevator.c
index a4e35419cf9a..0e465809d3f3 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -471,6 +471,13 @@ enum elv_merge elv_merge(struct request_queue *q, struct request **req,
 	return __elv_merge(q, req, bio, q->elevator->hash, q->last_merge);
 }
 
+enum elv_merge elv_merge_ctx(struct request_queue *q, struct request **req,
+		struct bio *bio, struct blk_mq_ctx *ctx)
+{
+	WARN_ON_ONCE(!q->mq_ops);
+	return __elv_merge(q, req, bio, ctx->hash, ctx->last_merge);
+}
+
 /*
  * Attempt to do an insertion back merge. Only check for the case where
  * we can append 'rq' to an existing request, so we can throw 'rq' away
@@ -516,16 +523,25 @@ void elv_merged_request(struct request_queue *q, struct request *rq,
 		enum elv_merge type)
 {
 	struct elevator_queue *e = q->elevator;
+	struct hlist_head *hash = e->hash;
+
+	/* we do bio merge on blk-mq sw queue */
+	if (q->mq_ops && !e) {
+		rq->mq_ctx->last_merge = rq;
+		hash = rq->mq_ctx->hash;
+		goto reposition;
+	}
+
+	q->last_merge = rq;
 
 	if (e->uses_mq && e->type->ops.mq.request_merged)
 		e->type->ops.mq.request_merged(q, rq, type);
 	else if (!e->uses_mq && e->type->ops.sq.elevator_merged_fn)
 		e->type->ops.sq.elevator_merged_fn(q, rq, type);
 
+ reposition:
 	if (type == ELEVATOR_BACK_MERGE)
-		elv_rqhash_reposition(q, rq);
-
-	q->last_merge = rq;
+		rqhash_reposition(hash, rq);
 }
 
 void elv_merge_requests(struct request_queue *q, struct request *rq,
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH V3 13/14] blk-mq-sched: refactor blk_mq_sched_try_merge()
  2017-08-26 16:33 [PATCH V3 00/14] blk-mq-sched: improve SCSI-MQ performance Ming Lei
                   ` (11 preceding siblings ...)
  2017-08-26 16:33 ` [PATCH V3 12/14] block: introduce .last_merge and .hash to blk_mq_ctx Ming Lei
@ 2017-08-26 16:33 ` Ming Lei
  2017-08-30 17:17   ` Bart Van Assche
  2017-08-26 16:33 ` [PATCH V3 14/14] blk-mq: improve bio merge from blk-mq sw queue Ming Lei
  13 siblings, 1 reply; 28+ messages in thread
From: Ming Lei @ 2017-08-26 16:33 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Christoph Hellwig
  Cc: Bart Van Assche, Laurence Oberman, Paolo Valente, Mel Gorman, Ming Lei

This patch introduces the function __blk_mq_try_merge(),
which will be reused for merging bios into the sw queue in
the following patch.

No functional change.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq-sched.c | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 28054e49df5d..5af0ff71730c 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -228,12 +228,13 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
 	}
 }
 
-bool blk_mq_sched_try_merge(struct request_queue *q, struct bio *bio,
-			    struct request **merged_request)
+static bool __blk_mq_try_merge(struct request_queue *q,
+		struct bio *bio, struct request **merged_request,
+		struct request *candidate, enum elv_merge type)
 {
-	struct request *rq;
+	struct request *rq = candidate;
 
-	switch (elv_merge(q, &rq, bio)) {
+	switch (type) {
 	case ELEVATOR_BACK_MERGE:
 		if (!blk_mq_sched_allow_merge(q, rq, bio))
 			return false;
@@ -256,6 +257,15 @@ bool blk_mq_sched_try_merge(struct request_queue *q, struct bio *bio,
 		return false;
 	}
 }
+
+bool blk_mq_sched_try_merge(struct request_queue *q, struct bio *bio,
+		struct request **merged_request)
+{
+	struct request *rq;
+	enum elv_merge type = elv_merge(q, &rq, bio);
+
+	return __blk_mq_try_merge(q, bio, merged_request, rq, type);
+}
 EXPORT_SYMBOL_GPL(blk_mq_sched_try_merge);
 
 /*
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH V3 14/14] blk-mq: improve bio merge from blk-mq sw queue
  2017-08-26 16:33 [PATCH V3 00/14] blk-mq-sched: improve SCSI-MQ performance Ming Lei
                   ` (12 preceding siblings ...)
  2017-08-26 16:33 ` [PATCH V3 13/14] blk-mq-sched: refactor blk_mq_sched_try_merge() Ming Lei
@ 2017-08-26 16:33 ` Ming Lei
  13 siblings, 0 replies; 28+ messages in thread
From: Ming Lei @ 2017-08-26 16:33 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Christoph Hellwig
  Cc: Bart Van Assche, Laurence Oberman, Paolo Valente, Mel Gorman, Ming Lei

This patch uses a hash table for bio merging from the sw queue,
so that we align with the way the blk-mq scheduler and the legacy
block path do bio merging.

It turns out that bio merging via a hash table is more efficient
than a simple merge against the last 8 requests in the sw queue.
On SCSI SRP, an IOPS increase of ~10% is observed in a sequential
I/O test with this patch.

It is also one step towards a real 'none' scheduler, with which
the blk-mq scheduler framework can become cleaner.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq-sched.c | 49 ++++++++++++-------------------------------------
 block/blk-mq.c       | 28 +++++++++++++++++++++++++---
 2 files changed, 37 insertions(+), 40 deletions(-)

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 5af0ff71730c..b958caa8bccb 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -268,50 +268,25 @@ bool blk_mq_sched_try_merge(struct request_queue *q, struct bio *bio,
 }
 EXPORT_SYMBOL_GPL(blk_mq_sched_try_merge);
 
-/*
- * Reverse check our software queue for entries that we could potentially
- * merge with. Currently includes a hand-wavy stop count of 8, to not spend
- * too much time checking for merges.
- */
-static bool blk_mq_attempt_merge(struct request_queue *q,
+static bool blk_mq_ctx_try_merge(struct request_queue *q,
 				 struct blk_mq_ctx *ctx, struct bio *bio)
 {
-	struct request *rq;
-	int checked = 8;
+	struct request *rq, *free = NULL;
+	enum elv_merge type;
+	bool merged;
 
 	lockdep_assert_held(&ctx->lock);
 
-	list_for_each_entry_reverse(rq, &ctx->rq_list, queuelist) {
-		bool merged = false;
-
-		if (!checked--)
-			break;
-
-		if (!blk_rq_merge_ok(rq, bio))
-			continue;
+	type = elv_merge_ctx(q, &rq, bio, ctx);
+	merged = __blk_mq_try_merge(q, bio, &free, rq, type);
 
-		switch (blk_try_merge(rq, bio)) {
-		case ELEVATOR_BACK_MERGE:
-			if (blk_mq_sched_allow_merge(q, rq, bio))
-				merged = bio_attempt_back_merge(q, rq, bio);
-			break;
-		case ELEVATOR_FRONT_MERGE:
-			if (blk_mq_sched_allow_merge(q, rq, bio))
-				merged = bio_attempt_front_merge(q, rq, bio);
-			break;
-		case ELEVATOR_DISCARD_MERGE:
-			merged = bio_attempt_discard_merge(q, rq, bio);
-			break;
-		default:
-			continue;
-		}
+	if (free)
+		blk_mq_free_request(free);
 
-		if (merged)
-			ctx->rq_merged++;
-		return merged;
-	}
+	if (merged)
+		ctx->rq_merged++;
 
-	return false;
+	return merged;
 }
 
 bool __blk_mq_sched_bio_merge(struct request_queue *q, struct bio *bio)
@@ -329,7 +304,7 @@ bool __blk_mq_sched_bio_merge(struct request_queue *q, struct bio *bio)
 	if (hctx->flags & BLK_MQ_F_SHOULD_MERGE) {
 		/* default per sw-queue merge */
 		spin_lock(&ctx->lock);
-		ret = blk_mq_attempt_merge(q, ctx, bio);
+		ret = blk_mq_ctx_try_merge(q, ctx, bio);
 		spin_unlock(&ctx->lock);
 	}
 
diff --git a/block/blk-mq.c b/block/blk-mq.c
index fc3d26bbfc1a..d935f15c54da 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -847,6 +847,18 @@ static void blk_mq_timeout_work(struct work_struct *work)
 	blk_queue_exit(q);
 }
 
+static void blk_mq_ctx_remove_rq_list(struct blk_mq_ctx *ctx,
+		struct list_head *head)
+{
+	struct request *rq;
+
+	lockdep_assert_held(&ctx->lock);
+
+	list_for_each_entry(rq, head, queuelist)
+		rqhash_del(rq);
+	ctx->last_merge = NULL;
+}
+
 struct flush_busy_ctx_data {
 	struct blk_mq_hw_ctx *hctx;
 	struct list_head *list;
@@ -861,6 +873,7 @@ static bool flush_busy_ctx(struct sbitmap *sb, unsigned int bitnr, void *data)
 	sbitmap_clear_bit(sb, bitnr);
 	spin_lock(&ctx->lock);
 	list_splice_tail_init(&ctx->rq_list, flush_data->list);
+	blk_mq_ctx_remove_rq_list(ctx, flush_data->list);
 	spin_unlock(&ctx->lock);
 	return true;
 }
@@ -890,17 +903,23 @@ static bool dispatch_rq_from_ctx(struct sbitmap *sb, unsigned int bitnr, void *d
 	struct dispatch_rq_data *dispatch_data = data;
 	struct blk_mq_hw_ctx *hctx = dispatch_data->hctx;
 	struct blk_mq_ctx *ctx = hctx->ctxs[bitnr];
+	struct request *rq = NULL;
 
 	spin_lock(&ctx->lock);
 	if (unlikely(!list_empty(&ctx->rq_list))) {
-		dispatch_data->rq = list_entry_rq(ctx->rq_list.next);
-		list_del_init(&dispatch_data->rq->queuelist);
+		rq = list_entry_rq(ctx->rq_list.next);
+		list_del_init(&rq->queuelist);
+		rqhash_del(rq);
 		if (list_empty(&ctx->rq_list))
 			sbitmap_clear_bit(sb, bitnr);
 	}
+	if (ctx->last_merge == rq)
+		ctx->last_merge = NULL;
 	spin_unlock(&ctx->lock);
 
-	return !dispatch_data->rq;
+	dispatch_data->rq = rq;
+
+	return !rq;
 }
 
 struct request *blk_mq_dispatch_rq_from_ctx(struct blk_mq_hw_ctx *hctx,
@@ -1431,6 +1450,8 @@ static inline void __blk_mq_insert_req_list(struct blk_mq_hw_ctx *hctx,
 		list_add(&rq->queuelist, &ctx->rq_list);
 	else
 		list_add_tail(&rq->queuelist, &ctx->rq_list);
+
+	rqhash_add(ctx->hash, rq);
 }
 
 void __blk_mq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
@@ -1923,6 +1944,7 @@ static int blk_mq_hctx_notify_dead(unsigned int cpu, struct hlist_node *node)
 	spin_lock(&ctx->lock);
 	if (!list_empty(&ctx->rq_list)) {
 		list_splice_init(&ctx->rq_list, &tmp);
+		blk_mq_ctx_remove_rq_list(ctx, &tmp);
 		blk_mq_hctx_clear_pending(hctx, ctx);
 	}
 	spin_unlock(&ctx->lock);
-- 
2.9.5
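
As an aside for readers skimming the diff above: the bookkeeping it maintains
can be summarized as a small invariant. The wrapper functions below are only an
illustrative sketch built on the rqhash helpers introduced earlier in the
series; they are hypothetical and merely restate what the hunks above do inline.

	/*
	 * Sketch (assumed, simplified): every request on ctx->rq_list is
	 * also hashed in ctx->hash so it can be found as a merge candidate,
	 * and ctx->last_merge may cache one of those requests.
	 */
	static void ctx_add_rq(struct blk_mq_ctx *ctx, struct request *rq)
	{
		list_add_tail(&rq->queuelist, &ctx->rq_list);
		rqhash_add(ctx->hash, rq);	/* becomes a merge candidate */
	}

	static void ctx_del_rq(struct blk_mq_ctx *ctx, struct request *rq)
	{
		list_del_init(&rq->queuelist);
		rqhash_del(rq);			/* no longer a merge candidate */
		if (ctx->last_merge == rq)
			ctx->last_merge = NULL;
	}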

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH V3 02/14] sbitmap: introduce __sbitmap_for_each_set()
  2017-08-26 16:33 ` [PATCH V3 02/14] sbitmap: introduce __sbitmap_for_each_set() Ming Lei
@ 2017-08-30 15:55   ` Bart Van Assche
  2017-08-31  3:33     ` Ming Lei
  0 siblings, 1 reply; 28+ messages in thread
From: Bart Van Assche @ 2017-08-30 15:55 UTC (permalink / raw)
  To: hch, linux-block, axboe, ming.lei
  Cc: Bart Van Assche, osandov, mgorman, paolo.valente, loberman

On Sun, 2017-08-27 at 00:33 +0800, Ming Lei wrote:
>  /**
>   * sbitmap_for_each_set() - Iterate over each set bit in a &struct sbitmap.
> + * @start: Where to start the iteration

Thanks for having changed the name of this argument ...

> -static inline void sbitmap_for_each_set(struct sbitmap *sb, sb_for_each_fn fn,
> -					void *data)
> +static inline void __sbitmap_for_each_set(struct sbitmap *sb,
> +					  unsigned int start,
> +					  sb_for_each_fn fn, void *data)
>  {
> -	unsigned int i;
> +	unsigned int index = SB_NR_TO_INDEX(sb, start);
> +	unsigned int nr = SB_NR_TO_BIT(sb, start);
> +	unsigned int scanned = 0;

... but I'm still missing a check here whether or not index >= sb->map_nr.

> -	for (i = 0; i < sb->map_nr; i++) {
> -		struct sbitmap_word *word = &sb->map[i];
> -		unsigned int off, nr;
> +	while (1) {
> +		struct sbitmap_word *word = &sb->map[index];
> +		unsigned int depth = min_t(unsigned int, word->depth - nr,
> +					   sb->depth - scanned);
>  
> +		scanned += depth;
>  		if (!word->word)
> -			continue;
> +			goto next;
>  
> -		nr = 0;
> -		off = i << sb->shift;
> +		depth += nr;
> +		start = index << sb->shift;

The above statement reuses the argument 'start' for a new purpose. This is
confusing - please don't do this. Why not to keep the name 'off'? That will
keep the changes minimal.

Bart.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH V3 03/14] blk-mq: introduce blk_mq_dispatch_rq_from_ctx()
  2017-08-26 16:33 ` [PATCH V3 03/14] blk-mq: introduce blk_mq_dispatch_rq_from_ctx() Ming Lei
@ 2017-08-30 16:01   ` Bart Van Assche
  0 siblings, 0 replies; 28+ messages in thread
From: Bart Van Assche @ 2017-08-30 16:01 UTC (permalink / raw)
  To: hch, linux-block, axboe, ming.lei
  Cc: Bart Van Assche, mgorman, paolo.valente, loberman

On Sun, 2017-08-27 at 00:33 +0800, Ming Lei wrote:
> This function is introduced for dequeuing request
> from sw queue so that we can dispatch it in
> scheduler's way.
> 
> More importantly, some SCSI devices may set
> q->queue_depth, which is a per-request_queue limit,
> and applied on pending I/O from all hctxs. This
> function is introduced for avoiding to dequeue too
> many requests from sw queue when ->dispatch isn't
> flushed completely.

Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH V3 05/14] blk-mq-sched: improve dispatching from sw queue
  2017-08-26 16:33 ` [PATCH V3 05/14] blk-mq-sched: improve dispatching from sw queue Ming Lei
@ 2017-08-30 16:34   ` Bart Van Assche
  2017-08-31  3:43     ` Ming Lei
  0 siblings, 1 reply; 28+ messages in thread
From: Bart Van Assche @ 2017-08-30 16:34 UTC (permalink / raw)
  To: hch, linux-block, axboe, ming.lei
  Cc: Bart Van Assche, mgorman, paolo.valente, loberman

On Sun, 2017-08-27 at 00:33 +0800, Ming Lei wrote:
> diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
> index 50c6485cb04f..7b7a366a97f3 100644
> --- a/include/linux/blk-mq.h
> +++ b/include/linux/blk-mq.h
> @@ -30,6 +30,8 @@ struct blk_mq_hw_ctx {
>  
>  	struct sbitmap		ctx_map;
>  
> +	struct blk_mq_ctx	*dispatch_from;
> +
>  	struct blk_mq_ctx	**ctxs;
>  	unsigned int		nr_ctx;

Hello Ming,

Are you relying here on the fact that the per-CPU queues are never
reallocated, even if CPU hot-plugging occurs? Sorry but that seems fragile
to me. I would like to see 'dispatch_from' be converted into an integer. It
is easy to check whether an integer software queue index is out of range
but it's not that easy to check whether a pointer became invalid.

Thanks,

Bart.
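
(For concreteness, the suggestion above would amount to something like the
following hypothetical sketch, which stores a software queue index instead of
a pointer and validates it before use; none of this is code from the series,
it only illustrates the range check an integer index would make possible:

	unsigned int		dispatch_from;	/* sw queue index, < nr_ctx */

	/* on the dispatch side, clamp before dereferencing */
	if (hctx->dispatch_from >= hctx->nr_ctx)
		hctx->dispatch_from = 0;
	ctx = hctx->ctxs[hctx->dispatch_from];
)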

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH V3 06/14] blk-mq-sched: don't dequeue request until all in ->dispatch are flushed
  2017-08-26 16:33 ` [PATCH V3 06/14] blk-mq-sched: don't dequeue request until all in ->dispatch are flushed Ming Lei
@ 2017-08-30 17:11   ` Bart Van Assche
  2017-08-31  4:01     ` Ming Lei
  0 siblings, 1 reply; 28+ messages in thread
From: Bart Van Assche @ 2017-08-30 17:11 UTC (permalink / raw)
  To: hch, linux-block, axboe, ming.lei
  Cc: Bart Van Assche, mgorman, paolo.valente, loberman

On Sun, 2017-08-27 at 00:33 +0800, Ming Lei wrote:
> During dispatching, we moved all requests from hctx->dispatch to
> one temporary list, then dispatch them one by one from this list.
> Unfortunately duirng this period, run queue from other contexts
                ^^^^^^
                during?
> may think the queue is idle, then start to dequeue from sw/scheduler
> queue and still try to dispatch because ->dispatch is empty. This way
> hurts sequential I/O performance because requests are dequeued when
> lld queue is busy.
> [ ... ]
> diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
> index 735e432294ab..4d7bea8c2594 100644
> --- a/block/blk-mq-sched.c
> +++ b/block/blk-mq-sched.c
> @@ -146,7 +146,6 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
>  	struct request_queue *q = hctx->queue;
>  	struct elevator_queue *e = q->elevator;
>  	const bool has_sched_dispatch = e && e->type->ops.mq.dispatch_request;
> -	bool do_sched_dispatch = true;
>  	LIST_HEAD(rq_list);
>  
>  	/* RCU or SRCU read lock is needed before checking quiesced flag */

Shouldn't blk_mq_sched_dispatch_requests() set BLK_MQ_S_DISPATCH_BUSY just after
the following statement because this statement makes the dispatch list empty?

			list_splice_init(&hctx->dispatch, &rq_list);

> @@ -177,8 +176,33 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
>  	 */
>  	if (!list_empty(&rq_list)) {
>  		blk_mq_sched_mark_restart_hctx(hctx);
> -		do_sched_dispatch = blk_mq_dispatch_rq_list(q, &rq_list);
> -	} else if (!has_sched_dispatch && !q->queue_depth) {
> +		blk_mq_dispatch_rq_list(q, &rq_list);
> +
> +		/*
> +		 * We may clear DISPATCH_BUSY just after it
> +		 * is set from another context, the only cost
> +		 * is that one request is dequeued a bit early,
> +		 * we can survive that. Given the window is
> +		 * too small, no need to worry about performance
                   ^^^
The word "too" seems extraneous to me in this sentence.

>  bool blk_mq_sched_try_merge(struct request_queue *q, struct bio *bio,
> @@ -330,6 +353,7 @@ static bool blk_mq_sched_bypass_insert(struct blk_mq_hw_ctx *hctx,
>  	 */
>  	spin_lock(&hctx->lock);
>  	list_add(&rq->queuelist, &hctx->dispatch);
> +	set_bit(BLK_MQ_S_DISPATCH_BUSY, &hctx->state);
>  	spin_unlock(&hctx->lock);
>  	return true;
>  }

Is it necessary to make blk_mq_sched_bypass_insert() set BLK_MQ_S_DISPATCH_BUSY?
My understanding is that only code that makes the dispatch list empty should
set BLK_MQ_S_DISPATCH_BUSY. However, blk_mq_sched_bypass_insert() adds an element
to the dispatch list so that guarantees that that list is not empty.

> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index f063dd0f197f..6af56a71c1cd 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -1140,6 +1140,11 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list)
>  
>  		spin_lock(&hctx->lock);
>  		list_splice_init(list, &hctx->dispatch);
> +		/*
> +		 * DISPATCH_BUSY won't be cleared until all requests
> +		 * in hctx->dispatch are dispatched successfully
> +		 */
> +		set_bit(BLK_MQ_S_DISPATCH_BUSY, &hctx->state);
>  		spin_unlock(&hctx->lock);

Same comment here - since this code adds one or more requests to the dispatch list,
is it really needed to set the DISPATCH_BUSY flag?

Bart.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH V3 13/14] blk-mq-sched: refactor blk_mq_sched_try_merge()
  2017-08-26 16:33 ` [PATCH V3 13/14] blk-mq-sched: refactor blk_mq_sched_try_merge() Ming Lei
@ 2017-08-30 17:17   ` Bart Van Assche
  2017-08-31  4:03     ` Ming Lei
  0 siblings, 1 reply; 28+ messages in thread
From: Bart Van Assche @ 2017-08-30 17:17 UTC (permalink / raw)
  To: hch, linux-block, axboe, ming.lei
  Cc: Bart Van Assche, mgorman, paolo.valente, loberman

On Sun, 2017-08-27 at 00:33 +0800, Ming Lei wrote:
> -bool blk_mq_sched_try_merge(struct request_queue *q, struct bio *bio,
> -			    struct request **merged_request)
> +static bool __blk_mq_try_merge(struct request_queue *q,
> +		struct bio *bio, struct request **merged_request,
> +		struct request *candidate, enum elv_merge type)
>  {
> -	struct request *rq;
> +	struct request *rq = candidate;

It seems weird to me that the argument 'candidate' is not used other than to
copy its value into the local variable 'rq'? How about removing that local
variable and renaming the 'candidate' argument into 'rq'? Anyway, with or
without that change:

Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH V3 02/14] sbitmap: introduce __sbitmap_for_each_set()
  2017-08-30 15:55   ` Bart Van Assche
@ 2017-08-31  3:33     ` Ming Lei
  0 siblings, 0 replies; 28+ messages in thread
From: Ming Lei @ 2017-08-31  3:33 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: hch, linux-block, axboe, osandov, mgorman, paolo.valente, loberman

On Wed, Aug 30, 2017 at 03:55:13PM +0000, Bart Van Assche wrote:
> On Sun, 2017-08-27 at 00:33 +0800, Ming Lei wrote:
> >  /**
> >   * sbitmap_for_each_set() - Iterate over each set bit in a &struct sbitmap.
> > + * @start: Where to start the iteration
> 
> Thanks for having changed the name of this argument ...
> 
> > -static inline void sbitmap_for_each_set(struct sbitmap *sb, sb_for_each_fn fn,
> > -					void *data)
> > +static inline void __sbitmap_for_each_set(struct sbitmap *sb,
> > +					  unsigned int start,
> > +					  sb_for_each_fn fn, void *data)
> >  {
> > -	unsigned int i;
> > +	unsigned int index = SB_NR_TO_INDEX(sb, start);
> > +	unsigned int nr = SB_NR_TO_BIT(sb, start);
> > +	unsigned int scanned = 0;
> 
> ... but I'm still missing a check here whether or not index >= sb->map_nr.

The reason I didn't do that is that the only user always passes a
correct 'start'.

OK, will add it.

>  
> > -	for (i = 0; i < sb->map_nr; i++) {
> > -		struct sbitmap_word *word = &sb->map[i];
> > -		unsigned int off, nr;
> > +	while (1) {
> > +		struct sbitmap_word *word = &sb->map[index];
> > +		unsigned int depth = min_t(unsigned int, word->depth - nr,
> > +					   sb->depth - scanned);
> >  
> > +		scanned += depth;
> >  		if (!word->word)
> > -			continue;
> > +			goto next;
> >  
> > -		nr = 0;
> > -		off = i << sb->shift;
> > +		depth += nr;
> > +		start = index << sb->shift;
> 
> The above statement reuses the argument 'start' for a new purpose. This is
> confusing - please don't do this. Why not to keep the name 'off'? That will
> keep the changes minimal.

OK, I will rename the parameter 'start' to 'off', so that we can
save one local variable.
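
For illustration, the agreed-on prologue might end up looking roughly like
this (a sketch only, not the code that was actually posted later):

	static inline void __sbitmap_for_each_set(struct sbitmap *sb,
						  unsigned int off,
						  sb_for_each_fn fn, void *data)
	{
		unsigned int index, nr, scanned = 0;

		/* keep 'index' inside sb->map_nr even for a bogus start offset */
		if (off >= sb->depth)
			off = 0;
		index = SB_NR_TO_INDEX(sb, off);
		nr = SB_NR_TO_BIT(sb, off);

		/* ... iterate over the words as in the posted patch ... */
	}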

-- 
Ming

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH V3 05/14] blk-mq-sched: improve dispatching from sw queue
  2017-08-30 16:34   ` Bart Van Assche
@ 2017-08-31  3:43     ` Ming Lei
  2017-08-31 20:36       ` Bart Van Assche
  0 siblings, 1 reply; 28+ messages in thread
From: Ming Lei @ 2017-08-31  3:43 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: hch, linux-block, axboe, mgorman, paolo.valente, loberman

On Wed, Aug 30, 2017 at 04:34:47PM +0000, Bart Van Assche wrote:
> On Sun, 2017-08-27 at 00:33 +0800, Ming Lei wrote:
> > diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
> > index 50c6485cb04f..7b7a366a97f3 100644
> > --- a/include/linux/blk-mq.h
> > +++ b/include/linux/blk-mq.h
> > @@ -30,6 +30,8 @@ struct blk_mq_hw_ctx {
> >  
> >  	struct sbitmap		ctx_map;
> >  
> > +	struct blk_mq_ctx	*dispatch_from;
> > +
> >  	struct blk_mq_ctx	**ctxs;
> >  	unsigned int		nr_ctx;
> 
> Hello Ming,
> 
> Are you relying here on the fact that the per-CPU queues are never
> reallocated, even if CPU hot-plugging occurs? Sorry but that seems fragile
> to me. I would like to see 'dispatch_from' be converted into an integer. It
> is easy to check whether an integer software queue index is out of range
> but it's not that easy to check whether a pointer became invalid.

If CPU hotplug happens, the instance of 'struct blk_mq_ctx' for that
CPU is still there and its index won't change from being setup because
its lifetime is same with 'struct request_queue', blk_mq_hctx_notify_dead()
just flushes the sw queue when the CPU is going away.

So we don't need to pay special attention to CPU hotplug here, please let
me know if you are fine now.
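
For readers without the full patch at hand, the dequeue loop under discussion
is roughly of the following shape (a simplified sketch; blk_mq_next_ctx() is
assumed here as the round-robin helper and may not match the posted code
exactly):

	struct blk_mq_ctx *ctx = hctx->dispatch_from;
	struct request_queue *q = hctx->queue;
	LIST_HEAD(rq_list);

	do {
		struct request *rq = blk_mq_dispatch_rq_from_ctx(hctx, ctx);

		if (!rq)
			break;			/* all sw queues are empty */
		list_add(&rq->queuelist, &rq_list);
		/* move on to the next sw queue for fairness */
		ctx = blk_mq_next_ctx(hctx, rq->mq_ctx);
	} while (blk_mq_dispatch_rq_list(q, &rq_list));

	hctx->dispatch_from = ctx;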


-- 
Ming

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH V3 06/14] blk-mq-sched: don't dequeue request until all in ->dispatch are flushed
  2017-08-30 17:11   ` Bart Van Assche
@ 2017-08-31  4:01     ` Ming Lei
  2017-08-31 21:00       ` Bart Van Assche
  0 siblings, 1 reply; 28+ messages in thread
From: Ming Lei @ 2017-08-31  4:01 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: hch, linux-block, axboe, mgorman, paolo.valente, loberman

On Wed, Aug 30, 2017 at 05:11:00PM +0000, Bart Van Assche wrote:
> On Sun, 2017-08-27 at 00:33 +0800, Ming Lei wrote:
> > During dispatching, we moved all requests from hctx->dispatch to
> > one temporary list, then dispatch them one by one from this list.
> > Unfortunately duirng this period, run queue from other contexts
>                 ^^^^^^
>                 during?

OK.

> > may think the queue is idle, then start to dequeue from sw/scheduler
> > queue and still try to dispatch because ->dispatch is empty. This way
> > hurts sequential I/O performance because requests are dequeued when
> > lld queue is busy.
> > [ ... ]
> > diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
> > index 735e432294ab..4d7bea8c2594 100644
> > --- a/block/blk-mq-sched.c
> > +++ b/block/blk-mq-sched.c
> > @@ -146,7 +146,6 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
> >  	struct request_queue *q = hctx->queue;
> >  	struct elevator_queue *e = q->elevator;
> >  	const bool has_sched_dispatch = e && e->type->ops.mq.dispatch_request;
> > -	bool do_sched_dispatch = true;
> >  	LIST_HEAD(rq_list);
> >  
> >  	/* RCU or SRCU read lock is needed before checking quiesced flag */
> 
> Shouldn't blk_mq_sched_dispatch_requests() set BLK_MQ_S_DISPATCH_BUSY just after
> the following statement because this statement makes the dispatch list empty?

Actually that is what I did in V1.

I changed to this way because setting the BUSY flag here will increase
the race window a bit, for example, if one request is added to ->dispatch
just after it is flushed(), the check on the BUSY bit won't catch this
case. Then we can avoid to check both the bit and list_empty_careful(&hctx->dispatch)
in blk_mq_sched_dispatch_requests(), so code becomes simpler and more
readable if we set the flag simply from the beginning.

> 
> 			list_splice_init(&hctx->dispatch, &rq_list);
> 
> > @@ -177,8 +176,33 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
> >  	 */
> >  	if (!list_empty(&rq_list)) {
> >  		blk_mq_sched_mark_restart_hctx(hctx);
> > -		do_sched_dispatch = blk_mq_dispatch_rq_list(q, &rq_list);
> > -	} else if (!has_sched_dispatch && !q->queue_depth) {
> > +		blk_mq_dispatch_rq_list(q, &rq_list);
> > +
> > +		/*
> > +		 * We may clear DISPATCH_BUSY just after it
> > +		 * is set from another context, the only cost
> > +		 * is that one request is dequeued a bit early,
> > +		 * we can survive that. Given the window is
> > +		 * too small, no need to worry about performance
>                    ^^^
> The word "too" seems extraneous to me in this sentence.

Maybe 'extremely' is better, or just remove it?

> 
> >  bool blk_mq_sched_try_merge(struct request_queue *q, struct bio *bio,
> > @@ -330,6 +353,7 @@ static bool blk_mq_sched_bypass_insert(struct blk_mq_hw_ctx *hctx,
> >  	 */
> >  	spin_lock(&hctx->lock);
> >  	list_add(&rq->queuelist, &hctx->dispatch);
> > +	set_bit(BLK_MQ_S_DISPATCH_BUSY, &hctx->state);
> >  	spin_unlock(&hctx->lock);
> >  	return true;
> >  }
> 
> Is it necessary to make blk_mq_sched_bypass_insert() set BLK_MQ_S_DISPATCH_BUSY?
> My understanding is that only code that makes the dispatch list empty should
> set BLK_MQ_S_DISPATCH_BUSY. However, blk_mq_sched_bypass_insert() adds an element
> to the dispatch list so that guarantees that that list is not empty.

I believe my comment above has already explained this.

> 
> > diff --git a/block/blk-mq.c b/block/blk-mq.c
> > index f063dd0f197f..6af56a71c1cd 100644
> > --- a/block/blk-mq.c
> > +++ b/block/blk-mq.c
> > @@ -1140,6 +1140,11 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list)
> >  
> >  		spin_lock(&hctx->lock);
> >  		list_splice_init(list, &hctx->dispatch);
> > +		/*
> > +		 * DISPATCH_BUSY won't be cleared until all requests
> > +		 * in hctx->dispatch are dispatched successfully
> > +		 */
> > +		set_bit(BLK_MQ_S_DISPATCH_BUSY, &hctx->state);
> >  		spin_unlock(&hctx->lock);
> 
> Same comment here - since this code adds one or more requests to the dispatch list,
> is it really needed to set the DISPATCH_BUSY flag?

See same comment above, :-)

-- 
Ming

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH V3 13/14] blk-mq-sched: refactor blk_mq_sched_try_merge()
  2017-08-30 17:17   ` Bart Van Assche
@ 2017-08-31  4:03     ` Ming Lei
  0 siblings, 0 replies; 28+ messages in thread
From: Ming Lei @ 2017-08-31  4:03 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: hch, linux-block, axboe, mgorman, paolo.valente, loberman

On Wed, Aug 30, 2017 at 05:17:05PM +0000, Bart Van Assche wrote:
> On Sun, 2017-08-27 at 00:33 +0800, Ming Lei wrote:
> > -bool blk_mq_sched_try_merge(struct request_queue *q, struct bio *bio,
> > -			    struct request **merged_request)
> > +static bool __blk_mq_try_merge(struct request_queue *q,
> > +		struct bio *bio, struct request **merged_request,
> > +		struct request *candidate, enum elv_merge type)
> >  {
> > -	struct request *rq;
> > +	struct request *rq = candidate;
> 
> It seems weird to me that the argument 'candidate' is not used other than to
> copy its value into the local variable 'rq'? How about removing that local

OK, will simply use 'rq' as parameter.

> variable and renaming the 'candidate' argument into 'rq'? Anyway, with or
> without that change:
> 
> Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>

OK, thanks!
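
The change is mechanical; the resulting signature would look roughly like this
(a sketch of the agreed-on rename, not the posted follow-up patch):

	static bool __blk_mq_try_merge(struct request_queue *q,
			struct bio *bio, struct request **merged_request,
			struct request *rq, enum elv_merge type)
	{
		/* 'rq' is the merge candidate chosen by the caller */
		/* ... body unchanged from the posted patch ... */
	}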

-- 
Ming

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH V3 05/14] blk-mq-sched: improve dispatching from sw queue
  2017-08-31  3:43     ` Ming Lei
@ 2017-08-31 20:36       ` Bart Van Assche
  0 siblings, 0 replies; 28+ messages in thread
From: Bart Van Assche @ 2017-08-31 20:36 UTC (permalink / raw)
  To: ming.lei; +Cc: hch, linux-block, mgorman, axboe, loberman, paolo.valente

On Thu, 2017-08-31 at 11:43 +0800, Ming Lei wrote:
> On Wed, Aug 30, 2017 at 04:34:47PM +0000, Bart Van Assche wrote:
> > On Sun, 2017-08-27 at 00:33 +0800, Ming Lei wrote:
> > > diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
> > > index 50c6485cb04f..7b7a366a97f3 100644
> > > --- a/include/linux/blk-mq.h
> > > +++ b/include/linux/blk-mq.h
> > > @@ -30,6 +30,8 @@ struct blk_mq_hw_ctx {
> > >  
> > >  	struct sbitmap		ctx_map;
> > >  
> > > +	struct blk_mq_ctx	*dispatch_from;
> > > +
> > >  	struct blk_mq_ctx	**ctxs;
> > >  	unsigned int		nr_ctx;
> > 
> > Hello Ming,
> > 
> > Are you relying here on the fact that the per-CPU queues are never
> > reallocated, even if CPU hot-plugging occurs? Sorry but that seems fragile
> > to me. I would like to see 'dispatch_from' be converted into an integer. It
> > is easy to check whether an integer software queue index is out of range
> > but it's not that easy to check whether a pointer became invalid.
> 
> If CPU hotplug happens, the instance of 'struct blk_mq_ctx' for that
> CPU is still there and its index won't change from being setup because
> its lifetime is same with 'struct request_queue', blk_mq_hctx_notify_dead()
> just flushes the sw queue when the CPU is going away.
> 
> So we don't need to pay special attention to CPU hotplug here, please let
> me know if you are fine now.

Hello Ming,

That's also how I interpret the blk-mq source code. But it's probably a good
idea to add a short comment that the dispatch_from pointer is not affected by
CPU hot-plugging. Anyway, with or without such a comment:

Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH V3 06/14] blk-mq-sched: don't dequeue request until all in ->dispatch are flushed
  2017-08-31  4:01     ` Ming Lei
@ 2017-08-31 21:00       ` Bart Van Assche
  2017-09-01  3:02         ` Ming Lei
  0 siblings, 1 reply; 28+ messages in thread
From: Bart Van Assche @ 2017-08-31 21:00 UTC (permalink / raw)
  To: ming.lei; +Cc: hch, linux-block, mgorman, axboe, loberman, paolo.valente

On Thu, 2017-08-31 at 12:01 +0800, Ming Lei wrote:
> On Wed, Aug 30, 2017 at 05:11:00PM +0000, Bart Van Assche wrote:
> > On Sun, 2017-08-27 at 00:33 +0800, Ming Lei wrote:
> > [ ... ]
> > Shouldn't blk_mq_sched_dispatch_requests() set BLK_MQ_S_DISPATCH_BUSY just after
> > the following statement because this statement makes the dispatch list empty?
> 
> Actually that is what I did in V1.
> 
> I changed to this way because setting the BUSY flag here will increase
> the race window a bit, for example, if one request is added to ->dispatch
> just after it is flushed(), the check on the BUSY bit won't catch this
> case. Then we can avoid to check both the bit and list_empty_careful(&hctx->dispatch)
> in blk_mq_sched_dispatch_requests(), so code becomes simpler and more
> readable if we set the flag simply from the beginning.

Hello Ming,

My understanding is that blk_mq_sched_dispatch_requests() will only work
correctly if the code that sets the DISPATCH_BUSY flag does that after having
inserted one or more elements in the dispatch list. Although x86 CPUs do not
reorder store operations I think that the functions that set the DISPATCH_BUSY
flag need a memory barrier these two store operations. I'm referring to the
blk_mq_sched_bypass_insert(), blk_mq_dispatch_wait_add() and
blk_mq_hctx_notify_dead() functions.

> > > +		 * too small, no need to worry about performance
> > 
> >                    ^^^
> > The word "too" seems extraneous to me in this sentence.
> 
> Maybe 'extremely' is better, or just remove it?

If the word "too" would be removed I think the comment is still clear.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH V3 06/14] blk-mq-sched: don't dequeue request until all in ->dispatch are flushed
  2017-08-31 21:00       ` Bart Van Assche
@ 2017-09-01  3:02         ` Ming Lei
  2017-09-01 18:19           ` Bart Van Assche
  0 siblings, 1 reply; 28+ messages in thread
From: Ming Lei @ 2017-09-01  3:02 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: hch, linux-block, mgorman, axboe, loberman, paolo.valente

On Thu, Aug 31, 2017 at 09:00:19PM +0000, Bart Van Assche wrote:
> On Thu, 2017-08-31 at 12:01 +0800, Ming Lei wrote:
> > On Wed, Aug 30, 2017 at 05:11:00PM +0000, Bart Van Assche wrote:
> > > On Sun, 2017-08-27 at 00:33 +0800, Ming Lei wrote:
> > > [ ... ]
> > > Shouldn't blk_mq_sched_dispatch_requests() set BLK_MQ_S_DISPATCH_BUSY just after
> > > the following statement because this statement makes the dispatch list empty?
> > 
> > Actually that is what I did in V1.
> > 
> > I changed to this way because setting the BUSY flag here will increase
> > the race window a bit, for example, if one request is added to ->dispatch
> > just after it is flushed(), the check on the BUSY bit won't catch this
> > case. Then we can avoid to check both the bit and list_empty_careful(&hctx->dispatch)
> > in blk_mq_sched_dispatch_requests(), so code becomes simpler and more
> > readable if we set the flag simply from the beginning.
> 
> Hello Ming,
> 
> My understanding is that blk_mq_sched_dispatch_requests() will only work
> correctly if the code that sets the DISPATCH_BUSY flag does that after having
> inserted one or more elements in the dispatch list. Although x86 CPUs do not
> reorder store operations I think that the functions that set the DISPATCH_BUSY
> flag need a memory barrier these two store operations. I'm referring to the
> blk_mq_sched_bypass_insert(), blk_mq_dispatch_wait_add() and
> blk_mq_hctx_notify_dead() functions.

That is a good question, but it has been answered by the comment
before checking the DISPATCH_BUSY state in blk_mq_sched_dispatch_requests():

	/*
	 * If DISPATCH_BUSY is set, that means hw queue is busy
	 * and requests in the list of hctx->dispatch need to
	 * be flushed first, so return early.
	 *
	 * Wherever DISPATCH_BUSY is set, blk_mq_run_hw_queue()
	 * will be run to try to make progress, so it is always
	 * safe to check the state here.
	 */

Suppose the two writes are reordered, sooner or latter, the
list_empty_careful(&hctx->dispatch) will be observed, and
will dispatch request from this hw queue after '->dispatch'
is flushed.

Since now the setting of DISPATCH_BUSY requires to hold
hctx->lock, we should avoid to add barrier there.
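
To make that concrete, the behaviour the comment describes is conceptually of
the following shape (an illustrative sketch, not the posted diff):

	/*
	 * Hw queue is marked busy: only flush what is already on
	 * hctx->dispatch (spliced into rq_list above), and do not
	 * dequeue anything new from the sw/scheduler queues, so
	 * bios can still be merged there while the LLD queue is busy.
	 */
	if (test_bit(BLK_MQ_S_DISPATCH_BUSY, &hctx->state)) {
		blk_mq_dispatch_rq_list(q, &rq_list);
		return;
	}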

> 
> > > > +		 * too small, no need to worry about performance
> > > 
> > >                    ^^^
> > > The word "too" seems extraneous to me in this sentence.
> > 
> > Maybe 'extremely' is better, or just remove it?
> 
> If the word "too" would be removed I think the comment is still clear.

OK.

-- 
Ming

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH V3 06/14] blk-mq-sched: don't dequeue request until all in ->dispatch are flushed
  2017-09-01  3:02         ` Ming Lei
@ 2017-09-01 18:19           ` Bart Van Assche
  0 siblings, 0 replies; 28+ messages in thread
From: Bart Van Assche @ 2017-09-01 18:19 UTC (permalink / raw)
  To: ming.lei; +Cc: hch, linux-block, mgorman, axboe, loberman, paolo.valente

On Fri, 2017-09-01 at 11:02 +0800, Ming Lei wrote:
> That is a good question, but it has been answered by the comment
> before checking the DISPATCH_BUSY state in blk_mq_sched_dispatch_requests():
> 
> 	/*
> 	 * If DISPATCH_BUSY is set, that means hw queue is busy
> 	 * and requests in the list of hctx->dispatch need to
> 	 * be flushed first, so return early.
> 	 *
> 	 * Wherever DISPATCH_BUSY is set, blk_mq_run_hw_queue()
> 	 * will be run to try to make progress, so it is always
> 	 * safe to check the state here.
> 	 */
> 
> Suppose the two writes are reordered, sooner or latter, the
> list_empty_careful(&hctx->dispatch) will be observed, and
> will dispatch request from this hw queue after '->dispatch'
> is flushed.
> 
> Since now the setting of DISPATCH_BUSY requires to hold
> hctx->lock, we should avoid to add barrier there.

Although it is not clear to me how anyone who has not followed this discussion
is assumed to figure out all these subtleties, if the other comments get
addressed:

Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2017-09-01 18:20 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-08-26 16:33 [PATCH V3 00/14] blk-mq-sched: improve SCSI-MQ performance Ming Lei
2017-08-26 16:33 ` [PATCH V3 01/14] blk-mq-sched: fix scheduler bad performance Ming Lei
2017-08-26 16:33 ` [PATCH V3 02/14] sbitmap: introduce __sbitmap_for_each_set() Ming Lei
2017-08-30 15:55   ` Bart Van Assche
2017-08-31  3:33     ` Ming Lei
2017-08-26 16:33 ` [PATCH V3 03/14] blk-mq: introduce blk_mq_dispatch_rq_from_ctx() Ming Lei
2017-08-30 16:01   ` Bart Van Assche
2017-08-26 16:33 ` [PATCH V3 04/14] blk-mq-sched: move actual dispatching into one helper Ming Lei
2017-08-26 16:33 ` [PATCH V3 05/14] blk-mq-sched: improve dispatching from sw queue Ming Lei
2017-08-30 16:34   ` Bart Van Assche
2017-08-31  3:43     ` Ming Lei
2017-08-31 20:36       ` Bart Van Assche
2017-08-26 16:33 ` [PATCH V3 06/14] blk-mq-sched: don't dequeue request until all in ->dispatch are flushed Ming Lei
2017-08-30 17:11   ` Bart Van Assche
2017-08-31  4:01     ` Ming Lei
2017-08-31 21:00       ` Bart Van Assche
2017-09-01  3:02         ` Ming Lei
2017-09-01 18:19           ` Bart Van Assche
2017-08-26 16:33 ` [PATCH V3 07/14] blk-mq-sched: introduce blk_mq_sched_queue_depth() Ming Lei
2017-08-26 16:33 ` [PATCH V3 08/14] blk-mq-sched: use q->queue_depth as hint for q->nr_requests Ming Lei
2017-08-26 16:33 ` [PATCH V3 09/14] block: introduce rqhash helpers Ming Lei
2017-08-26 16:33 ` [PATCH V3 10/14] block: move actual bio merge code into __elv_merge Ming Lei
2017-08-26 16:33 ` [PATCH V3 11/14] block: add check on elevator for supporting bio merge via hashtable from blk-mq sw queue Ming Lei
2017-08-26 16:33 ` [PATCH V3 12/14] block: introduce .last_merge and .hash to blk_mq_ctx Ming Lei
2017-08-26 16:33 ` [PATCH V3 13/14] blk-mq-sched: refactor blk_mq_sched_try_merge() Ming Lei
2017-08-30 17:17   ` Bart Van Assche
2017-08-31  4:03     ` Ming Lei
2017-08-26 16:33 ` [PATCH V3 14/14] blk-mq: improve bio merge from blk-mq sw queue Ming Lei
