* [PATCH 00/14] blk-mq-sched: fix SCSI-MQ performance regression
@ 2017-07-31 16:50 Ming Lei
  2017-07-31 16:50 ` Ming Lei
                   ` (14 more replies)
  0 siblings, 15 replies; 47+ messages in thread
From: Ming Lei @ 2017-07-31 16:50 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Christoph Hellwig
  Cc: Bart Van Assche, linux-scsi, Martin K . Petersen,
	James E . J . Bottomley, Ming Lei

In Red Hat internal storage tests of the blk-mq scheduler, we
found that its performance is quite bad, especially
for sequential I/O on some multi-queue SCSI devices.

It turns out that one big issue causes the performance regression:
requests are still dequeued from the sw queue/scheduler queue even
when the LLD's queue is busy, so I/O merging becomes quite difficult
and sequential I/O degrades a lot.

The first five patches improve this situation and recover
some of the lost performance.

But they are still not enough: the remaining regression is caused
by the queue depth shared among all hw queues. For SCSI devices,
.cmd_per_lun defines the max number of pending I/Os on one
request queue, i.e. a per-request_queue depth. So during
dispatch, if one hctx is too busy to make progress, no other hctx
can dispatch either, because of this per-request_queue depth.

Patches 6 ~ 14 use a per-request_queue dispatch list to avoid
dequeuing requests from the sw/scheduler queue while the LLD
queue is busy.
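
Roughly speaking, the dispatch path ends up gated like this (a
simplified sketch of what patches 5 and 10 ~ 14 implement, not the
literal code):

	/* leftover requests in ->dispatch mean the lld queue is busy */
	if (blk_mq_has_dispatch_rqs(hctx))
		blk_mq_take_list_from_dispatch(q, hctx, &rq_list); /* sets BUSY */

	if (!list_empty(&rq_list)) {
		blk_mq_sched_mark_restart_hctx(hctx);
		if (blk_mq_dispatch_rq_list(q, &rq_list))
			blk_mq_hctx_clear_busy(q, hctx);
	}

	/* don't dequeue from sw/scheduler queue until ->dispatch is flushed */
	if (blk_mq_hctx_is_busy(q, hctx))
		return;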

With these changes, SCSI-MQ performance is brought back to the
level of the legacy block path; test results on lpfc follow:

- fio(libaio, bs:4k, dio, queue_depth:64, 20 jobs)


                   |v4.13-rc3       | v4.13-rc3   | patched v4.13-rc3
                   |legacy deadline | mq-none     | mq-none
---------------------------------------------------------------------
read        "iops" | 401749.4001    | 346237.5025 | 387536.4427
randread    "iops" | 25175.07121    | 21688.64067 | 25578.50374
write       "iops" | 376168.7578    | 335262.0475 | 370132.4735
randwrite   "iops" | 25235.46163    | 24982.63819 | 23934.95610

                   |v4.13-rc3       | v4.13-rc3   | patched v4.13-rc3
                   |legacy deadline | mq-deadline | mq-deadline
------------------------------------------------------------------------------
read        "iops" | 401749.4001    | 35592.48901 | 401681.1137
randread    "iops" | 25175.07121    | 30029.52618 | 21446.68731
write       "iops" | 376168.7578    | 27340.56777 | 377356.7286
randwrite   "iops" | 25235.46163    | 24395.02969 | 24885.66152

Ming Lei (14):
  blk-mq-sched: fix scheduler bad performance
  blk-mq: rename flush_busy_ctx_data as ctx_iter_data
  blk-mq: introduce blk_mq_dispatch_rq_from_ctxs()
  blk-mq-sched: improve dispatching from sw queue
  blk-mq-sched: don't dequeue request until all in ->dispatch are
    flushed
  blk-mq-sched: introduce blk_mq_sched_queue_depth()
  blk-mq-sched: use q->queue_depth as hint for q->nr_requests
  blk-mq: introduce BLK_MQ_F_SHARED_DEPTH
  blk-mq-sched: cleanup blk_mq_sched_dispatch_requests()
  blk-mq-sched: introduce helpers for query, change busy state
  blk-mq: introduce helpers for operating ->dispatch list
  blk-mq: introduce pointers to dispatch lock & list
  blk-mq: pass 'request_queue *' to several helpers of operating BUSY
  blk-mq-sched: improve IO scheduling on SCSI device

 block/blk-mq-debugfs.c |  11 ++---
 block/blk-mq-sched.c   |  70 +++++++++++++++--------------
 block/blk-mq-sched.h   |  23 ++++++++++
 block/blk-mq.c         | 117 +++++++++++++++++++++++++++++++++++++++++++------
 block/blk-mq.h         |  72 ++++++++++++++++++++++++++++++
 block/blk-settings.c   |   2 +
 include/linux/blk-mq.h |   5 +++
 include/linux/blkdev.h |   5 +++
 8 files changed, 255 insertions(+), 50 deletions(-)

-- 
2.9.4

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH 01/14] blk-mq-sched: fix scheduler bad performance
  2017-07-31 16:50 [PATCH 00/14] blk-mq-sched: fix SCSI-MQ performance regression Ming Lei
  2017-07-31 16:50 ` Ming Lei
@ 2017-07-31 16:50 ` Ming Lei
  2017-07-31 23:00     ` Bart Van Assche
  2017-07-31 16:50 ` [PATCH 02/14] blk-mq: rename flush_busy_ctx_data as ctx_iter_data Ming Lei
                   ` (12 subsequent siblings)
  14 siblings, 1 reply; 47+ messages in thread
From: Ming Lei @ 2017-07-31 16:50 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Christoph Hellwig
  Cc: Bart Van Assche, linux-scsi, Martin K . Petersen,
	James E . J . Bottomley, Ming Lei

When the hw queue is busy, we shouldn't take requests from
the scheduler queue any more, otherwise I/O merging becomes
difficult.

This patch fixes the awful I/O performance on some
SCSI devices (lpfc, qla2xxx, ...) when mq-deadline/kyber
is used, by not taking requests while the hw queue is busy.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq-sched.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 4ab69435708c..47a25333a136 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -94,7 +94,7 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
 	struct request_queue *q = hctx->queue;
 	struct elevator_queue *e = q->elevator;
 	const bool has_sched_dispatch = e && e->type->ops.mq.dispatch_request;
-	bool did_work = false;
+	bool can_go = true;
 	LIST_HEAD(rq_list);
 
 	/* RCU or SRCU read lock is needed before checking quiesced flag */
@@ -125,7 +125,7 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
 	 */
 	if (!list_empty(&rq_list)) {
 		blk_mq_sched_mark_restart_hctx(hctx);
-		did_work = blk_mq_dispatch_rq_list(q, &rq_list);
+		can_go = blk_mq_dispatch_rq_list(q, &rq_list);
 	} else if (!has_sched_dispatch) {
 		blk_mq_flush_busy_ctxs(hctx, &rq_list);
 		blk_mq_dispatch_rq_list(q, &rq_list);
@@ -136,7 +136,7 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
 	 * on the dispatch list, OR if we did have work but weren't able
 	 * to make progress.
 	 */
-	if (!did_work && has_sched_dispatch) {
+	if (can_go && has_sched_dispatch) {
 		do {
 			struct request *rq;
 
-- 
2.9.4

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 02/14] blk-mq: rename flush_busy_ctx_data as ctx_iter_data
  2017-07-31 16:50 [PATCH 00/14] blk-mq-sched: fix SCSI-MQ performance regression Ming Lei
  2017-07-31 16:50 ` Ming Lei
  2017-07-31 16:50 ` [PATCH 01/14] blk-mq-sched: fix scheduler bad performance Ming Lei
@ 2017-07-31 16:50 ` Ming Lei
  2017-07-31 23:03     ` Bart Van Assche
  2017-07-31 16:51 ` [PATCH 03/14] blk-mq: introduce blk_mq_dispatch_rq_from_ctxs() Ming Lei
                   ` (11 subsequent siblings)
  14 siblings, 1 reply; 47+ messages in thread
From: Ming Lei @ 2017-07-31 16:50 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Christoph Hellwig
  Cc: Bart Van Assche, linux-scsi, Martin K . Petersen,
	James E . J . Bottomley, Ming Lei

The following patch needs to reuse this data structure,
so rename it to a more generic name.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index b70a4ad78b63..94818f78c099 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -808,14 +808,14 @@ static void blk_mq_timeout_work(struct work_struct *work)
 	blk_queue_exit(q);
 }
 
-struct flush_busy_ctx_data {
+struct ctx_iter_data {
 	struct blk_mq_hw_ctx *hctx;
 	struct list_head *list;
 };
 
 static bool flush_busy_ctx(struct sbitmap *sb, unsigned int bitnr, void *data)
 {
-	struct flush_busy_ctx_data *flush_data = data;
+	struct ctx_iter_data *flush_data = data;
 	struct blk_mq_hw_ctx *hctx = flush_data->hctx;
 	struct blk_mq_ctx *ctx = hctx->ctxs[bitnr];
 
@@ -832,7 +832,7 @@ static bool flush_busy_ctx(struct sbitmap *sb, unsigned int bitnr, void *data)
  */
 void blk_mq_flush_busy_ctxs(struct blk_mq_hw_ctx *hctx, struct list_head *list)
 {
-	struct flush_busy_ctx_data data = {
+	struct ctx_iter_data data = {
 		.hctx = hctx,
 		.list = list,
 	};
-- 
2.9.4

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 03/14] blk-mq: introduce blk_mq_dispatch_rq_from_ctxs()
  2017-07-31 16:50 [PATCH 00/14] blk-mq-sched: fix SCSI-MQ performance regression Ming Lei
                   ` (2 preceding siblings ...)
  2017-07-31 16:50 ` [PATCH 02/14] blk-mq: rename flush_busy_ctx_data as ctx_iter_data Ming Lei
@ 2017-07-31 16:51 ` Ming Lei
  2017-07-31 23:09     ` Bart Van Assche
  2017-08-02 17:19     ` kbuild test robot
  2017-07-31 16:51 ` [PATCH 04/14] blk-mq-sched: improve dispatching from sw queue Ming Lei
                   ` (10 subsequent siblings)
  14 siblings, 2 replies; 47+ messages in thread
From: Ming Lei @ 2017-07-31 16:51 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Christoph Hellwig
  Cc: Bart Van Assche, linux-scsi, Martin K . Petersen,
	James E . J . Bottomley, Ming Lei

This function is introduced for picking up one request
from the sw queue so that we can dispatch it in the
scheduler's way.

More importantly, for some SCSI devices driver tags
are host wide and their number is quite big, but each
LUN has a very limited queue depth. This function
avoids taking too many requests from the sw queue
when the queue is busy, and only tries to dispatch
a request when the queue isn't busy.
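
A caller is expected to pull requests one by one and stop as soon as
dispatch fails, roughly like the loop the next patch adds (sketch, not
the exact code):

	do {
		struct request *rq = blk_mq_dispatch_rq_from_ctxs(hctx);

		if (!rq)
			break;
		list_add(&rq->queuelist, &rq_list);
	} while (blk_mq_dispatch_rq_list(q, &rq_list));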

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq.c | 38 +++++++++++++++++++++++++++++++++++++-
 block/blk-mq.h |  1 +
 2 files changed, 38 insertions(+), 1 deletion(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 94818f78c099..86b8fdcb8434 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -810,7 +810,11 @@ static void blk_mq_timeout_work(struct work_struct *work)
 
 struct ctx_iter_data {
 	struct blk_mq_hw_ctx *hctx;
-	struct list_head *list;
+
+	union {
+		struct list_head *list;
+		struct request *rq;
+	};
 };
 
 static bool flush_busy_ctx(struct sbitmap *sb, unsigned int bitnr, void *data)
@@ -826,6 +830,26 @@ static bool flush_busy_ctx(struct sbitmap *sb, unsigned int bitnr, void *data)
 	return true;
 }
 
+static bool dispatch_rq_from_ctx(struct sbitmap *sb, unsigned int bitnr, void *data)
+{
+	struct ctx_iter_data *dispatch_data = data;
+	struct blk_mq_hw_ctx *hctx = dispatch_data->hctx;
+	struct blk_mq_ctx *ctx = hctx->ctxs[bitnr];
+	bool empty = true;
+
+	spin_lock(&ctx->lock);
+	if (unlikely(!list_empty(&ctx->rq_list))) {
+		dispatch_data->rq = list_entry_rq(ctx->rq_list.next);
+		list_del_init(&dispatch_data->rq->queuelist);
+		empty = list_empty(&ctx->rq_list);
+	}
+	spin_unlock(&ctx->lock);
+	if (empty)
+		sbitmap_clear_bit(sb, bitnr);
+
+	return !dispatch_data->rq;
+}
+
 /*
  * Process software queues that have been marked busy, splicing them
  * to the for-dispatch
@@ -841,6 +865,18 @@ void blk_mq_flush_busy_ctxs(struct blk_mq_hw_ctx *hctx, struct list_head *list)
 }
 EXPORT_SYMBOL_GPL(blk_mq_flush_busy_ctxs);
 
+struct request *blk_mq_dispatch_rq_from_ctxs(struct blk_mq_hw_ctx *hctx)
+{
+	struct ctx_iter_data data = {
+		.hctx = hctx,
+		.rq   = NULL,
+	};
+
+	sbitmap_for_each_set(&hctx->ctx_map, dispatch_rq_from_ctx, &data);
+
+	return data.rq;
+}
+
 static inline unsigned int queued_to_index(unsigned int queued)
 {
 	if (!queued)
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 60b01c0309bc..0c398f29dc4b 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -35,6 +35,7 @@ void blk_mq_flush_busy_ctxs(struct blk_mq_hw_ctx *hctx, struct list_head *list);
 bool blk_mq_hctx_has_pending(struct blk_mq_hw_ctx *hctx);
 bool blk_mq_get_driver_tag(struct request *rq, struct blk_mq_hw_ctx **hctx,
 				bool wait);
+struct request *blk_mq_dispatch_rq_from_ctxs(struct blk_mq_hw_ctx *hctx);
 
 /*
  * Internal helpers for allocating/freeing the request map
-- 
2.9.4

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 04/14] blk-mq-sched: improve dispatching from sw queue
  2017-07-31 16:50 [PATCH 00/14] blk-mq-sched: fix SCSI-MQ performance regression Ming Lei
                   ` (3 preceding siblings ...)
  2017-07-31 16:51 ` [PATCH 03/14] blk-mq: introduce blk_mq_dispatch_rq_from_ctxs() Ming Lei
@ 2017-07-31 16:51 ` Ming Lei
  2017-07-31 23:34     ` Bart Van Assche
  2017-07-31 16:51 ` [PATCH 05/14] blk-mq-sched: don't dequeue request until all in ->dispatch are flushed Ming Lei
                   ` (9 subsequent siblings)
  14 siblings, 1 reply; 47+ messages in thread
From: Ming Lei @ 2017-07-31 16:51 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Christoph Hellwig
  Cc: Bart Van Assche, linux-scsi, Martin K . Petersen,
	James E . J . Bottomley, Ming Lei

SCSI devices use a host-wide tagset, and the shared
driver tag space is often quite big. Meanwhile there is
also a queue depth for each LUN (.cmd_per_lun), which is
often small.

So lots of requests may stay in the sw queue, and we
always flush all of those belonging to the same hw queue
and dispatch them all to the driver. Unfortunately it is
easy to make the queue busy because of the small
per-LUN queue depth. Once these requests are flushed
out, they have to stay in hctx->dispatch, no bio can be
merged into them any more, and sequential I/O
performance is hurt.

This patch improves dispatching from the sw queue when
there is a per-request_queue queue depth, by taking
requests one by one from the sw queue, just like an
I/O scheduler does.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq-sched.c | 25 +++++++++++++++----------
 1 file changed, 15 insertions(+), 10 deletions(-)

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 47a25333a136..3510c01cb17b 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -96,6 +96,9 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
 	const bool has_sched_dispatch = e && e->type->ops.mq.dispatch_request;
 	bool can_go = true;
 	LIST_HEAD(rq_list);
+	struct request *(*dispatch_fn)(struct blk_mq_hw_ctx *) =
+		has_sched_dispatch ? e->type->ops.mq.dispatch_request :
+			blk_mq_dispatch_rq_from_ctxs;
 
 	/* RCU or SRCU read lock is needed before checking quiesced flag */
 	if (unlikely(blk_mq_hctx_stopped(hctx) || blk_queue_quiesced(q)))
@@ -126,26 +129,28 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
 	if (!list_empty(&rq_list)) {
 		blk_mq_sched_mark_restart_hctx(hctx);
 		can_go = blk_mq_dispatch_rq_list(q, &rq_list);
-	} else if (!has_sched_dispatch) {
+	} else if (!has_sched_dispatch && !q->queue_depth) {
 		blk_mq_flush_busy_ctxs(hctx, &rq_list);
 		blk_mq_dispatch_rq_list(q, &rq_list);
+		can_go = false;
 	}
 
+	if (!can_go)
+		return;
+
 	/*
 	 * We want to dispatch from the scheduler if we had no work left
 	 * on the dispatch list, OR if we did have work but weren't able
 	 * to make progress.
 	 */
-	if (can_go && has_sched_dispatch) {
-		do {
-			struct request *rq;
+	do {
+		struct request *rq;
 
-			rq = e->type->ops.mq.dispatch_request(hctx);
-			if (!rq)
-				break;
-			list_add(&rq->queuelist, &rq_list);
-		} while (blk_mq_dispatch_rq_list(q, &rq_list));
-	}
+		rq = dispatch_fn(hctx);
+		if (!rq)
+			break;
+		list_add(&rq->queuelist, &rq_list);
+	} while (blk_mq_dispatch_rq_list(q, &rq_list));
 }
 
 bool blk_mq_sched_try_merge(struct request_queue *q, struct bio *bio,
-- 
2.9.4

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 05/14] blk-mq-sched: don't dequeue request until all in ->dispatch are flushed
  2017-07-31 16:50 [PATCH 00/14] blk-mq-sched: fix SCSI-MQ performance regression Ming Lei
                   ` (4 preceding siblings ...)
  2017-07-31 16:51 ` [PATCH 04/14] blk-mq-sched: improve dispatching from sw queue Ming Lei
@ 2017-07-31 16:51 ` Ming Lei
  2017-07-31 23:42     ` Bart Van Assche
  2017-07-31 16:51 ` [PATCH 06/14] blk-mq-sched: introduce blk_mq_sched_queue_depth() Ming Lei
                   ` (8 subsequent siblings)
  14 siblings, 1 reply; 47+ messages in thread
From: Ming Lei @ 2017-07-31 16:51 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Christoph Hellwig
  Cc: Bart Van Assche, linux-scsi, Martin K . Petersen,
	James E . J . Bottomley, Ming Lei

During dispatch, we move all requests from hctx->dispatch to
a temporary list, then dispatch them one by one from that list.
Unfortunately, during this period a queue run from another context
may think the queue is idle and start to dequeue from the
sw/scheduler queue and try to dispatch, because ->dispatch is empty.

This hurts sequential I/O performance because requests are
dequeued while the queue is busy.
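
As a concrete (hypothetical) example: CPU0 splices hctx->dispatch into
its local list and starts dispatching; meanwhile CPU1 runs the same
queue, sees ->dispatch empty, dequeues more requests from the
sw/scheduler queue and hands them to the busy driver, where they only
end up back on ->dispatch with no chance of being merged.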

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq-sched.c   | 24 ++++++++++++++++++------
 include/linux/blk-mq.h |  1 +
 2 files changed, 19 insertions(+), 6 deletions(-)

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 3510c01cb17b..eb638063673f 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -112,8 +112,15 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
 	 */
 	if (!list_empty_careful(&hctx->dispatch)) {
 		spin_lock(&hctx->lock);
-		if (!list_empty(&hctx->dispatch))
+		if (!list_empty(&hctx->dispatch)) {
 			list_splice_init(&hctx->dispatch, &rq_list);
+
+			/*
+			 * BUSY won't be cleared until all requests
+			 * in hctx->dispatch are dispatched successfully
+			 */
+			set_bit(BLK_MQ_S_BUSY, &hctx->state);
+		}
 		spin_unlock(&hctx->lock);
 	}
 
@@ -129,15 +136,20 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
 	if (!list_empty(&rq_list)) {
 		blk_mq_sched_mark_restart_hctx(hctx);
 		can_go = blk_mq_dispatch_rq_list(q, &rq_list);
-	} else if (!has_sched_dispatch && !q->queue_depth) {
-		blk_mq_flush_busy_ctxs(hctx, &rq_list);
-		blk_mq_dispatch_rq_list(q, &rq_list);
-		can_go = false;
+		if (can_go)
+			clear_bit(BLK_MQ_S_BUSY, &hctx->state);
 	}
 
-	if (!can_go)
+	/* can't go until ->dispatch is flushed */
+	if (!can_go || test_bit(BLK_MQ_S_BUSY, &hctx->state))
 		return;
 
+	if (!has_sched_dispatch && !q->queue_depth) {
+		blk_mq_flush_busy_ctxs(hctx, &rq_list);
+		blk_mq_dispatch_rq_list(q, &rq_list);
+		return;
+	}
+
 	/*
 	 * We want to dispatch from the scheduler if we had no work left
 	 * on the dispatch list, OR if we did have work but weren't able
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 14542308d25b..6d44b242b495 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -172,6 +172,7 @@ enum {
 	BLK_MQ_S_SCHED_RESTART	= 2,
 	BLK_MQ_S_TAG_WAITING	= 3,
 	BLK_MQ_S_START_ON_RUN	= 4,
+	BLK_MQ_S_BUSY		= 5,
 
 	BLK_MQ_MAX_DEPTH	= 10240,
 
-- 
2.9.4

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 06/14] blk-mq-sched: introduce blk_mq_sched_queue_depth()
  2017-07-31 16:50 [PATCH 00/14] blk-mq-sched: fix SCSI-MQ performance regression Ming Lei
                   ` (5 preceding siblings ...)
  2017-07-31 16:51 ` [PATCH 05/14] blk-mq-sched: don't dequeue request until all in ->dispatch are flushed Ming Lei
@ 2017-07-31 16:51 ` Ming Lei
  2017-07-31 16:51 ` [PATCH 07/14] blk-mq-sched: use q->queue_depth as hint for q->nr_requests Ming Lei
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 47+ messages in thread
From: Ming Lei @ 2017-07-31 16:51 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Christoph Hellwig
  Cc: Bart Van Assche, linux-scsi, Martin K . Petersen,
	James E . J . Bottomley, Ming Lei

The following patch will propose some hints for figuring out
the default queue depth of the scheduler queue, so introduce
the helper blk_mq_sched_queue_depth() for this purpose.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq-sched.c |  8 +-------
 block/blk-mq-sched.h | 11 +++++++++++
 2 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index eb638063673f..3eb524ccb7aa 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -531,13 +531,7 @@ int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e)
 		return 0;
 	}
 
-	/*
-	 * Default to double of smaller one between hw queue_depth and 128,
-	 * since we don't split into sync/async like the old code did.
-	 * Additionally, this is a per-hw queue depth.
-	 */
-	q->nr_requests = 2 * min_t(unsigned int, q->tag_set->queue_depth,
-				   BLKDEV_MAX_RQ);
+	q->nr_requests = blk_mq_sched_queue_depth(q);
 
 	queue_for_each_hw_ctx(q, hctx, i) {
 		ret = blk_mq_sched_alloc_tags(q, hctx, i);
diff --git a/block/blk-mq-sched.h b/block/blk-mq-sched.h
index 9267d0b7c197..1d47f3fda1d0 100644
--- a/block/blk-mq-sched.h
+++ b/block/blk-mq-sched.h
@@ -96,4 +96,15 @@ static inline bool blk_mq_sched_needs_restart(struct blk_mq_hw_ctx *hctx)
 	return test_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state);
 }
 
+static inline unsigned blk_mq_sched_queue_depth(struct request_queue *q)
+{
+	/*
+	 * Default to double of smaller one between hw queue_depth and 128,
+	 * since we don't split into sync/async like the old code did.
+	 * Additionally, this is a per-hw queue depth.
+	 */
+	return 2 * min_t(unsigned int, q->tag_set->queue_depth,
+				   BLKDEV_MAX_RQ);
+}
+
 #endif
-- 
2.9.4

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 07/14] blk-mq-sched: use q->queue_depth as hint for q->nr_requests
  2017-07-31 16:50 [PATCH 00/14] blk-mq-sched: fix SCSI-MQ performance regression Ming Lei
                   ` (6 preceding siblings ...)
  2017-07-31 16:51 ` [PATCH 06/14] blk-mq-sched: introduce blk_mq_sched_queue_depth() Ming Lei
@ 2017-07-31 16:51 ` Ming Lei
  2017-07-31 16:51 ` [PATCH 08/14] blk-mq: introduce BLK_MQ_F_SHARED_DEPTH Ming Lei
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 47+ messages in thread
From: Ming Lei @ 2017-07-31 16:51 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Christoph Hellwig
  Cc: Bart Van Assche, linux-scsi, Martin K . Petersen,
	James E . J . Bottomley, Ming Lei

SCSI sets q->queue_depth from shost->cmd_per_lun. q->queue_depth
is per request_queue and is more closely related to the scheduler
queue than the hw queue depth, which can be shared by several
queues (e.g. with TAG_SHARED).

This patch tries to use q->queue_depth as a hint for computing
q->nr_requests, which should be more effective than the
current way.
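
For example (hypothetical numbers): a SCSI device with .cmd_per_lun = 3
gets q->nr_requests = max(2 * min(3, 128), 32) = 32, so it can still
benefit from I/O merging, while a device with .cmd_per_lun = 64 gets
2 * min(64, 128) = 128, instead of a value derived from the much larger
host-wide tag space.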

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq-sched.h | 18 +++++++++++++++---
 block/blk-mq.c       | 27 +++++++++++++++++++++++++--
 block/blk-mq.h       |  1 +
 block/blk-settings.c |  2 ++
 4 files changed, 43 insertions(+), 5 deletions(-)

diff --git a/block/blk-mq-sched.h b/block/blk-mq-sched.h
index 1d47f3fda1d0..bb772e680e01 100644
--- a/block/blk-mq-sched.h
+++ b/block/blk-mq-sched.h
@@ -99,12 +99,24 @@ static inline bool blk_mq_sched_needs_restart(struct blk_mq_hw_ctx *hctx)
 static inline unsigned blk_mq_sched_queue_depth(struct request_queue *q)
 {
 	/*
-	 * Default to double of smaller one between hw queue_depth and 128,
+	 * q->queue_depth is more close to scheduler queue, so use it
+	 * as hint for computing scheduler queue depth if it is valid
+	 */
+	unsigned q_depth = q->queue_depth ?: q->tag_set->queue_depth;
+
+	/*
+	 * Default to double of smaller one between queue depth and 128,
 	 * since we don't split into sync/async like the old code did.
 	 * Additionally, this is a per-hw queue depth.
 	 */
-	return 2 * min_t(unsigned int, q->tag_set->queue_depth,
-				   BLKDEV_MAX_RQ);
+	q_depth = 2 * min_t(unsigned int, q_depth, BLKDEV_MAX_RQ);
+
+	/*
+	 * when queue depth of driver is too small, we set queue depth
+	 * of scheduler queue as 32 so that small queue device still
+	 * can benefit from IO merging.
+	 */
+	return max_t(unsigned, q_depth, 32);
 }
 
 #endif
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 86b8fdcb8434..7df68d31bc23 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2593,7 +2593,9 @@ void blk_mq_free_tag_set(struct blk_mq_tag_set *set)
 }
 EXPORT_SYMBOL(blk_mq_free_tag_set);
 
-int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
+static int __blk_mq_update_nr_requests(struct request_queue *q,
+				       bool sched_only,
+				       unsigned int nr)
 {
 	struct blk_mq_tag_set *set = q->tag_set;
 	struct blk_mq_hw_ctx *hctx;
@@ -2612,7 +2614,7 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
 		 * If we're using an MQ scheduler, just update the scheduler
 		 * queue depth. This is similar to what the old code would do.
 		 */
-		if (!hctx->sched_tags) {
+		if (!sched_only && !hctx->sched_tags) {
 			ret = blk_mq_tag_update_depth(hctx, &hctx->tags,
 							min(nr, set->queue_depth),
 							false);
@@ -2632,6 +2634,27 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
 	return ret;
 }
 
+int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
+{
+	return __blk_mq_update_nr_requests(q, false, nr);
+}
+
+/*
+ * When drivers update q->queue_depth, this API is called so that
+ * we can use this queue depth as hint for adjusting scheduler
+ * queue depth.
+ */
+int blk_mq_update_sched_queue_depth(struct request_queue *q)
+{
+	unsigned nr;
+
+	if (!q->mq_ops || !q->elevator)
+		return 0;
+
+	nr = blk_mq_sched_queue_depth(q);
+	return __blk_mq_update_nr_requests(q, true, nr);
+}
+
 static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set,
 							int nr_hw_queues)
 {
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 0c398f29dc4b..44d3aaa03d7c 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -36,6 +36,7 @@ bool blk_mq_hctx_has_pending(struct blk_mq_hw_ctx *hctx);
 bool blk_mq_get_driver_tag(struct request *rq, struct blk_mq_hw_ctx **hctx,
 				bool wait);
 struct request *blk_mq_dispatch_rq_from_ctxs(struct blk_mq_hw_ctx *hctx);
+int blk_mq_update_sched_queue_depth(struct request_queue *q);
 
 /*
  * Internal helpers for allocating/freeing the request map
diff --git a/block/blk-settings.c b/block/blk-settings.c
index be1f115b538b..94a349601545 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -877,6 +877,8 @@ void blk_set_queue_depth(struct request_queue *q, unsigned int depth)
 {
 	q->queue_depth = depth;
 	wbt_set_queue_depth(q->rq_wb, depth);
+
+	WARN_ON(blk_mq_update_sched_queue_depth(q));
 }
 EXPORT_SYMBOL(blk_set_queue_depth);
 
-- 
2.9.4

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 08/14] blk-mq: introduce BLK_MQ_F_SHARED_DEPTH
  2017-07-31 16:50 [PATCH 00/14] blk-mq-sched: fix SCSI-MQ performance regression Ming Lei
                   ` (7 preceding siblings ...)
  2017-07-31 16:51 ` [PATCH 07/14] blk-mq-sched: use q->queue_depth as hint for q->nr_requests Ming Lei
@ 2017-07-31 16:51 ` Ming Lei
  2017-07-31 16:51 ` [PATCH 09/14] blk-mq-sched: cleanup blk_mq_sched_dispatch_requests() Ming Lei
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 47+ messages in thread
From: Ming Lei @ 2017-07-31 16:51 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Christoph Hellwig
  Cc: Bart Van Assche, linux-scsi, Martin K . Petersen,
	James E . J . Bottomley, Ming Lei

SCSI devices often provide a per-request_queue depth via
q->queue_depth (.cmd_per_lun), which is a global limit across
all hw queues. Once the pending I/O submitted to one request
queue reaches this limit, BLK_STS_RESOURCE is returned to
every dispatch path. That means when one hw queue is stuck,
all hctxs are actually stuck too.

This flag is introduced for improving blk-mq I/O scheduling
on this kind of device.
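
As an example (hypothetical numbers): with 4 hw queues and
.cmd_per_lun = 64, once 64 commands are outstanding on the LUN, a
dispatch attempt from any of the 4 hctxs gets BLK_STS_RESOURCE, no
matter which hctx the requests were queued on.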

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq-debugfs.c |  1 +
 block/blk-mq-sched.c   |  2 +-
 block/blk-mq.c         | 25 ++++++++++++++++++++++---
 include/linux/blk-mq.h |  1 +
 4 files changed, 25 insertions(+), 4 deletions(-)

diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index 9ebc2945f991..c4f70b453c76 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -209,6 +209,7 @@ static const char *const hctx_flag_name[] = {
 	HCTX_FLAG_NAME(SG_MERGE),
 	HCTX_FLAG_NAME(BLOCKING),
 	HCTX_FLAG_NAME(NO_SCHED),
+	HCTX_FLAG_NAME(SHARED_DEPTH),
 };
 #undef HCTX_FLAG_NAME
 
diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 3eb524ccb7aa..cc0687a4d0ab 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -144,7 +144,7 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
 	if (!can_go || test_bit(BLK_MQ_S_BUSY, &hctx->state))
 		return;
 
-	if (!has_sched_dispatch && !q->queue_depth) {
+	if (!has_sched_dispatch && !(hctx->flags & BLK_MQ_F_SHARED_DEPTH)) {
 		blk_mq_flush_busy_ctxs(hctx, &rq_list);
 		blk_mq_dispatch_rq_list(q, &rq_list);
 		return;
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 7df68d31bc23..db635ef06a72 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2647,12 +2647,31 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
 int blk_mq_update_sched_queue_depth(struct request_queue *q)
 {
 	unsigned nr;
+	struct blk_mq_hw_ctx *hctx;
+	unsigned int i;
+	int ret = 0;
 
-	if (!q->mq_ops || !q->elevator)
-		return 0;
+	if (!q->mq_ops)
+		return ret;
+
+	blk_mq_freeze_queue(q);
+	/*
+	 * if there is q->queue_depth, all hw queues share
+	 * this queue depth limit
+	 */
+	if (q->queue_depth) {
+		queue_for_each_hw_ctx(q, hctx, i)
+			hctx->flags |= BLK_MQ_F_SHARED_DEPTH;
+	}
+
+	if (!q->elevator)
+		goto exit;
 
 	nr = blk_mq_sched_queue_depth(q);
-	return __blk_mq_update_nr_requests(q, true, nr);
+	ret = __blk_mq_update_nr_requests(q, true, nr);
+ exit:
+	blk_mq_unfreeze_queue(q);
+	return ret;
 }
 
 static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set,
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 6d44b242b495..14f2ad3af31f 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -164,6 +164,7 @@ enum {
 	BLK_MQ_F_SG_MERGE	= 1 << 2,
 	BLK_MQ_F_BLOCKING	= 1 << 5,
 	BLK_MQ_F_NO_SCHED	= 1 << 6,
+	BLK_MQ_F_SHARED_DEPTH	= 1 << 7,
 	BLK_MQ_F_ALLOC_POLICY_START_BIT = 8,
 	BLK_MQ_F_ALLOC_POLICY_BITS = 1,
 
-- 
2.9.4

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 09/14] blk-mq-sched: cleanup blk_mq_sched_dispatch_requests()
  2017-07-31 16:50 [PATCH 00/14] blk-mq-sched: fix SCSI-MQ performance regression Ming Lei
                   ` (8 preceding siblings ...)
  2017-07-31 16:51 ` [PATCH 08/14] blk-mq: introduce BLK_MQ_F_SHARED_DEPTH Ming Lei
@ 2017-07-31 16:51 ` Ming Lei
  2017-07-31 16:51 ` [PATCH 10/14] blk-mq-sched: introduce helpers for query, change busy state Ming Lei
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 47+ messages in thread
From: Ming Lei @ 2017-07-31 16:51 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Christoph Hellwig
  Cc: Bart Van Assche, linux-scsi, Martin K . Petersen,
	James E . J . Bottomley, Ming Lei

This patch splits blk_mq_sched_dispatch_requests()
into two parts:

1) the 1st part checks whether the queue is busy, and
handles the busy situation

2) the 2nd part is moved into __blk_mq_sched_dispatch_requests(),
which focuses on dispatching from the sw queue or scheduler queue.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq-sched.c | 42 +++++++++++++++++++++++++-----------------
 1 file changed, 25 insertions(+), 17 deletions(-)

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index cc0687a4d0ab..07ff53187617 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -89,16 +89,37 @@ static bool blk_mq_sched_restart_hctx(struct blk_mq_hw_ctx *hctx)
 	return false;
 }
 
-void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
+static void __blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
 {
 	struct request_queue *q = hctx->queue;
 	struct elevator_queue *e = q->elevator;
 	const bool has_sched_dispatch = e && e->type->ops.mq.dispatch_request;
-	bool can_go = true;
-	LIST_HEAD(rq_list);
 	struct request *(*dispatch_fn)(struct blk_mq_hw_ctx *) =
 		has_sched_dispatch ? e->type->ops.mq.dispatch_request :
 			blk_mq_dispatch_rq_from_ctxs;
+	LIST_HEAD(rq_list);
+
+	if (!has_sched_dispatch && !(hctx->flags & BLK_MQ_F_SHARED_DEPTH)) {
+		blk_mq_flush_busy_ctxs(hctx, &rq_list);
+		blk_mq_dispatch_rq_list(q, &rq_list);
+		return;
+	}
+
+	do {
+		struct request *rq;
+
+		rq = dispatch_fn(hctx);
+		if (!rq)
+			break;
+		list_add(&rq->queuelist, &rq_list);
+	} while (blk_mq_dispatch_rq_list(q, &rq_list));
+}
+
+void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
+{
+	struct request_queue *q = hctx->queue;
+	bool can_go = true;
+	LIST_HEAD(rq_list);
 
 	/* RCU or SRCU read lock is needed before checking quiesced flag */
 	if (unlikely(blk_mq_hctx_stopped(hctx) || blk_queue_quiesced(q)))
@@ -144,25 +165,12 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
 	if (!can_go || test_bit(BLK_MQ_S_BUSY, &hctx->state))
 		return;
 
-	if (!has_sched_dispatch && !(hctx->flags & BLK_MQ_F_SHARED_DEPTH)) {
-		blk_mq_flush_busy_ctxs(hctx, &rq_list);
-		blk_mq_dispatch_rq_list(q, &rq_list);
-		return;
-	}
-
 	/*
 	 * We want to dispatch from the scheduler if we had no work left
 	 * on the dispatch list, OR if we did have work but weren't able
 	 * to make progress.
 	 */
-	do {
-		struct request *rq;
-
-		rq = dispatch_fn(hctx);
-		if (!rq)
-			break;
-		list_add(&rq->queuelist, &rq_list);
-	} while (blk_mq_dispatch_rq_list(q, &rq_list));
+	__blk_mq_sched_dispatch_requests(hctx);
 }
 
 bool blk_mq_sched_try_merge(struct request_queue *q, struct bio *bio,
-- 
2.9.4

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 10/14] blk-mq-sched: introduce helpers for query, change busy state
  2017-07-31 16:50 [PATCH 00/14] blk-mq-sched: fix SCSI-MQ performance regression Ming Lei
                   ` (9 preceding siblings ...)
  2017-07-31 16:51 ` [PATCH 09/14] blk-mq-sched: cleanup blk_mq_sched_dispatch_requests() Ming Lei
@ 2017-07-31 16:51 ` Ming Lei
  2017-07-31 16:51 ` [PATCH 11/14] blk-mq: introduce helpers for operating ->dispatch list Ming Lei
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 47+ messages in thread
From: Ming Lei @ 2017-07-31 16:51 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Christoph Hellwig
  Cc: Bart Van Assche, linux-scsi, Martin K . Petersen,
	James E . J . Bottomley, Ming Lei

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq-sched.c |  6 +++---
 block/blk-mq.h       | 15 +++++++++++++++
 2 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 07ff53187617..112270961af0 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -140,7 +140,7 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
 			 * BUSY won't be cleared until all requests
 			 * in hctx->dispatch are dispatched successfully
 			 */
-			set_bit(BLK_MQ_S_BUSY, &hctx->state);
+			blk_mq_hctx_set_busy(hctx);
 		}
 		spin_unlock(&hctx->lock);
 	}
@@ -158,11 +158,11 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
 		blk_mq_sched_mark_restart_hctx(hctx);
 		can_go = blk_mq_dispatch_rq_list(q, &rq_list);
 		if (can_go)
-			clear_bit(BLK_MQ_S_BUSY, &hctx->state);
+			blk_mq_hctx_clear_busy(hctx);
 	}
 
 	/* can't go until ->dispatch is flushed */
-	if (!can_go || test_bit(BLK_MQ_S_BUSY, &hctx->state))
+	if (!can_go || blk_mq_hctx_is_busy(hctx))
 		return;
 
 	/*
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 44d3aaa03d7c..d9f875093613 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -135,4 +135,19 @@ static inline bool blk_mq_hw_queue_mapped(struct blk_mq_hw_ctx *hctx)
 	return hctx->nr_ctx && hctx->tags;
 }
 
+static inline bool blk_mq_hctx_is_busy(struct blk_mq_hw_ctx *hctx)
+{
+	return test_bit(BLK_MQ_S_BUSY, &hctx->state);
+}
+
+static inline void blk_mq_hctx_set_busy(struct blk_mq_hw_ctx *hctx)
+{
+	set_bit(BLK_MQ_S_BUSY, &hctx->state);
+}
+
+static inline void blk_mq_hctx_clear_busy(struct blk_mq_hw_ctx *hctx)
+{
+	clear_bit(BLK_MQ_S_BUSY, &hctx->state);
+}
+
 #endif
-- 
2.9.4

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 11/14] blk-mq: introduce helpers for operating ->dispatch list
  2017-07-31 16:50 [PATCH 00/14] blk-mq-sched: fix SCSI-MQ performance regression Ming Lei
                   ` (10 preceding siblings ...)
  2017-07-31 16:51 ` [PATCH 10/14] blk-mq-sched: introduce helpers for query, change busy state Ming Lei
@ 2017-07-31 16:51 ` Ming Lei
  2017-07-31 16:51 ` [PATCH 12/14] blk-mq: introduce pointers to dispatch lock & list Ming Lei
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 47+ messages in thread
From: Ming Lei @ 2017-07-31 16:51 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Christoph Hellwig
  Cc: Bart Van Assche, linux-scsi, Martin K . Petersen,
	James E . J . Bottomley, Ming Lei

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq-sched.c | 19 +++----------------
 block/blk-mq.c       | 18 +++++++++++-------
 block/blk-mq.h       | 44 ++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 58 insertions(+), 23 deletions(-)

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 112270961af0..8ff74efe4172 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -131,19 +131,8 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
 	 * If we have previous entries on our dispatch list, grab them first for
 	 * more fair dispatch.
 	 */
-	if (!list_empty_careful(&hctx->dispatch)) {
-		spin_lock(&hctx->lock);
-		if (!list_empty(&hctx->dispatch)) {
-			list_splice_init(&hctx->dispatch, &rq_list);
-
-			/*
-			 * BUSY won't be cleared until all requests
-			 * in hctx->dispatch are dispatched successfully
-			 */
-			blk_mq_hctx_set_busy(hctx);
-		}
-		spin_unlock(&hctx->lock);
-	}
+	if (blk_mq_has_dispatch_rqs(hctx))
+		blk_mq_take_list_from_dispatch(hctx, &rq_list);
 
 	/*
 	 * Only ask the scheduler for requests, if we didn't have residual
@@ -296,9 +285,7 @@ static bool blk_mq_sched_bypass_insert(struct blk_mq_hw_ctx *hctx,
 	 * If we already have a real request tag, send directly to
 	 * the dispatch list.
 	 */
-	spin_lock(&hctx->lock);
-	list_add(&rq->queuelist, &hctx->dispatch);
-	spin_unlock(&hctx->lock);
+	blk_mq_add_rq_to_dispatch(hctx, rq);
 	return true;
 }
 
diff --git a/block/blk-mq.c b/block/blk-mq.c
index db635ef06a72..785145f60c1d 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -63,7 +63,7 @@ static int blk_mq_poll_stats_bkt(const struct request *rq)
 bool blk_mq_hctx_has_pending(struct blk_mq_hw_ctx *hctx)
 {
 	return sbitmap_any_bit_set(&hctx->ctx_map) ||
-			!list_empty_careful(&hctx->dispatch) ||
+			blk_mq_has_dispatch_rqs(hctx) ||
 			blk_mq_sched_has_work(hctx);
 }
 
@@ -1097,9 +1097,7 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list)
 		rq = list_first_entry(list, struct request, queuelist);
 		blk_mq_put_driver_tag(rq);
 
-		spin_lock(&hctx->lock);
-		list_splice_init(list, &hctx->dispatch);
-		spin_unlock(&hctx->lock);
+		blk_mq_add_list_to_dispatch(hctx, list);
 
 		/*
 		 * If SCHED_RESTART was set by the caller of this function and
@@ -1874,9 +1872,7 @@ static int blk_mq_hctx_notify_dead(unsigned int cpu, struct hlist_node *node)
 	if (list_empty(&tmp))
 		return 0;
 
-	spin_lock(&hctx->lock);
-	list_splice_tail_init(&tmp, &hctx->dispatch);
-	spin_unlock(&hctx->lock);
+	blk_mq_add_list_to_dispatch_tail(hctx, &tmp);
 
 	blk_mq_run_hw_queue(hctx, true);
 	return 0;
@@ -1926,6 +1922,13 @@ static void blk_mq_exit_hw_queues(struct request_queue *q,
 	}
 }
 
+static void blk_mq_init_dispatch(struct request_queue *q,
+		struct blk_mq_hw_ctx *hctx)
+{
+	spin_lock_init(&hctx->lock);
+	INIT_LIST_HEAD(&hctx->dispatch);
+}
+
 static int blk_mq_init_hctx(struct request_queue *q,
 		struct blk_mq_tag_set *set,
 		struct blk_mq_hw_ctx *hctx, unsigned hctx_idx)
@@ -1939,6 +1942,7 @@ static int blk_mq_init_hctx(struct request_queue *q,
 	INIT_DELAYED_WORK(&hctx->run_work, blk_mq_run_work_fn);
 	spin_lock_init(&hctx->lock);
 	INIT_LIST_HEAD(&hctx->dispatch);
+	blk_mq_init_dispatch(q, hctx);
 	hctx->queue = q;
 	hctx->flags = set->flags & ~BLK_MQ_F_TAG_SHARED;
 
diff --git a/block/blk-mq.h b/block/blk-mq.h
index d9f875093613..2ed355881996 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -150,4 +150,48 @@ static inline void blk_mq_hctx_clear_busy(struct blk_mq_hw_ctx *hctx)
 	clear_bit(BLK_MQ_S_BUSY, &hctx->state);
 }
 
+static inline bool blk_mq_has_dispatch_rqs(struct blk_mq_hw_ctx *hctx)
+{
+	return !list_empty_careful(&hctx->dispatch);
+}
+
+static inline void blk_mq_add_rq_to_dispatch(struct blk_mq_hw_ctx *hctx,
+		struct request *rq)
+{
+	spin_lock(&hctx->lock);
+	list_add(&rq->queuelist, &hctx->dispatch);
+	spin_unlock(&hctx->lock);
+}
+
+static inline void blk_mq_add_list_to_dispatch(struct blk_mq_hw_ctx *hctx,
+		struct list_head *list)
+{
+	spin_lock(&hctx->lock);
+	list_splice_init(list, &hctx->dispatch);
+	spin_unlock(&hctx->lock);
+}
+
+static inline void blk_mq_add_list_to_dispatch_tail(struct blk_mq_hw_ctx *hctx,
+						    struct list_head *list)
+{
+	spin_lock(&hctx->lock);
+	list_splice_tail_init(list, &hctx->dispatch);
+	spin_unlock(&hctx->lock);
+}
+
+static inline void blk_mq_take_list_from_dispatch(struct blk_mq_hw_ctx *hctx,
+		struct list_head *list)
+{
+	spin_lock(&hctx->lock);
+	list_splice_init(&hctx->dispatch, list);
+
+	/*
+	 * BUSY won't be cleared until all requests
+	 * in hctx->dispatch are dispatched successfully
+	 */
+	if (!list_empty(list))
+		blk_mq_hctx_set_busy(hctx);
+	spin_unlock(&hctx->lock);
+}
+
 #endif
-- 
2.9.4

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 12/14] blk-mq: introduce pointers to dispatch lock & list
  2017-07-31 16:50 [PATCH 00/14] blk-mq-sched: fix SCSI-MQ performance regression Ming Lei
                   ` (11 preceding siblings ...)
  2017-07-31 16:51 ` [PATCH 11/14] blk-mq: introduce helpers for operating ->dispatch list Ming Lei
@ 2017-07-31 16:51 ` Ming Lei
  2017-07-31 16:51 ` [PATCH 13/14] blk-mq: pass 'request_queue *' to several helpers of operating BUSY Ming Lei
  2017-07-31 16:51 ` [PATCH 14/14] blk-mq-sched: improve IO scheduling on SCSI device Ming Lei
  14 siblings, 0 replies; 47+ messages in thread
From: Ming Lei @ 2017-07-31 16:51 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Christoph Hellwig
  Cc: Bart Van Assche, linux-scsi, Martin K . Petersen,
	James E . J . Bottomley, Ming Lei

Prepare for supporting a per-request_queue dispatch list
by introducing pointers to the dispatch lock and list, so
that runtime checks can be avoided.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq-debugfs.c | 10 +++++-----
 block/blk-mq.c         |  7 +++++--
 block/blk-mq.h         | 26 +++++++++++++-------------
 include/linux/blk-mq.h |  3 +++
 4 files changed, 26 insertions(+), 20 deletions(-)

diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index c4f70b453c76..4f8cddb8505f 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -370,23 +370,23 @@ static void *hctx_dispatch_start(struct seq_file *m, loff_t *pos)
 {
 	struct blk_mq_hw_ctx *hctx = m->private;
 
-	spin_lock(&hctx->lock);
-	return seq_list_start(&hctx->dispatch, *pos);
+	spin_lock(hctx->dispatch_lock);
+	return seq_list_start(hctx->dispatch_list, *pos);
 }
 
 static void *hctx_dispatch_next(struct seq_file *m, void *v, loff_t *pos)
 {
 	struct blk_mq_hw_ctx *hctx = m->private;
 
-	return seq_list_next(v, &hctx->dispatch, pos);
+	return seq_list_next(v, hctx->dispatch_list, pos);
 }
 
 static void hctx_dispatch_stop(struct seq_file *m, void *v)
-	__releases(&hctx->lock)
+	__releases(hctx->dispatch_lock)
 {
 	struct blk_mq_hw_ctx *hctx = m->private;
 
-	spin_unlock(&hctx->lock);
+	spin_unlock(hctx->dispatch_lock);
 }
 
 static const struct seq_operations hctx_dispatch_seq_ops = {
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 785145f60c1d..9b8b3a740d18 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1925,8 +1925,11 @@ static void blk_mq_exit_hw_queues(struct request_queue *q,
 static void blk_mq_init_dispatch(struct request_queue *q,
 		struct blk_mq_hw_ctx *hctx)
 {
-	spin_lock_init(&hctx->lock);
-	INIT_LIST_HEAD(&hctx->dispatch);
+	hctx->dispatch_lock = &hctx->lock;
+	hctx->dispatch_list = &hctx->dispatch;
+
+	spin_lock_init(hctx->dispatch_lock);
+	INIT_LIST_HEAD(hctx->dispatch_list);
 }
 
 static int blk_mq_init_hctx(struct request_queue *q,
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 2ed355881996..d9795cbba1bb 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -152,38 +152,38 @@ static inline void blk_mq_hctx_clear_busy(struct blk_mq_hw_ctx *hctx)
 
 static inline bool blk_mq_has_dispatch_rqs(struct blk_mq_hw_ctx *hctx)
 {
-	return !list_empty_careful(&hctx->dispatch);
+	return !list_empty_careful(hctx->dispatch_list);
 }
 
 static inline void blk_mq_add_rq_to_dispatch(struct blk_mq_hw_ctx *hctx,
 		struct request *rq)
 {
-	spin_lock(&hctx->lock);
-	list_add(&rq->queuelist, &hctx->dispatch);
-	spin_unlock(&hctx->lock);
+	spin_lock(hctx->dispatch_lock);
+	list_add(&rq->queuelist, hctx->dispatch_list);
+	spin_unlock(hctx->dispatch_lock);
 }
 
 static inline void blk_mq_add_list_to_dispatch(struct blk_mq_hw_ctx *hctx,
 		struct list_head *list)
 {
-	spin_lock(&hctx->lock);
-	list_splice_init(list, &hctx->dispatch);
-	spin_unlock(&hctx->lock);
+	spin_lock(hctx->dispatch_lock);
+	list_splice_init(list, hctx->dispatch_list);
+	spin_unlock(hctx->dispatch_lock);
 }
 
 static inline void blk_mq_add_list_to_dispatch_tail(struct blk_mq_hw_ctx *hctx,
 						    struct list_head *list)
 {
-	spin_lock(&hctx->lock);
-	list_splice_tail_init(list, &hctx->dispatch);
-	spin_unlock(&hctx->lock);
+	spin_lock(hctx->dispatch_lock);
+	list_splice_tail_init(list, hctx->dispatch_list);
+	spin_unlock(hctx->dispatch_lock);
 }
 
 static inline void blk_mq_take_list_from_dispatch(struct blk_mq_hw_ctx *hctx,
 		struct list_head *list)
 {
-	spin_lock(&hctx->lock);
-	list_splice_init(&hctx->dispatch, list);
+	spin_lock(hctx->dispatch_lock);
+	list_splice_init(hctx->dispatch_list, list);
 
 	/*
 	 * BUSY won't be cleared until all requests
@@ -191,7 +191,7 @@ static inline void blk_mq_take_list_from_dispatch(struct blk_mq_hw_ctx *hctx,
 	 */
 	if (!list_empty(list))
 		blk_mq_hctx_set_busy(hctx);
-	spin_unlock(&hctx->lock);
+	spin_unlock(hctx->dispatch_lock);
 }
 
 #endif
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 14f2ad3af31f..016f16c48f72 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -22,6 +22,9 @@ struct blk_mq_hw_ctx {
 
 	unsigned long		flags;		/* BLK_MQ_F_* flags */
 
+	spinlock_t		*dispatch_lock;
+	struct list_head	*dispatch_list;
+
 	void			*sched_data;
 	struct request_queue	*queue;
 	struct blk_flush_queue	*fq;
-- 
2.9.4

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 13/14] blk-mq: pass 'request_queue *' to several helpers of operating BUSY
  2017-07-31 16:50 [PATCH 00/14] blk-mq-sched: fix SCSI-MQ performance regression Ming Lei
                   ` (12 preceding siblings ...)
  2017-07-31 16:51 ` [PATCH 12/14] blk-mq: introduce pointers to dispatch lock & list Ming Lei
@ 2017-07-31 16:51 ` Ming Lei
  2017-07-31 16:51 ` [PATCH 14/14] blk-mq-sched: improve IO scheduling on SCSI device Ming Lei
  14 siblings, 0 replies; 47+ messages in thread
From: Ming Lei @ 2017-07-31 16:51 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Christoph Hellwig
  Cc: Bart Van Assche, linux-scsi, Martin K . Petersen,
	James E . J . Bottomley, Ming Lei

We need to support a per-request_queue dispatch list to avoid
early dispatch in the shared queue depth case.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq-sched.c |  6 +++---
 block/blk-mq.h       | 15 +++++++++------
 2 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 8ff74efe4172..37702786c6d1 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -132,7 +132,7 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
 	 * more fair dispatch.
 	 */
 	if (blk_mq_has_dispatch_rqs(hctx))
-		blk_mq_take_list_from_dispatch(hctx, &rq_list);
+		blk_mq_take_list_from_dispatch(q, hctx, &rq_list);
 
 	/*
 	 * Only ask the scheduler for requests, if we didn't have residual
@@ -147,11 +147,11 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
 		blk_mq_sched_mark_restart_hctx(hctx);
 		can_go = blk_mq_dispatch_rq_list(q, &rq_list);
 		if (can_go)
-			blk_mq_hctx_clear_busy(hctx);
+			blk_mq_hctx_clear_busy(q, hctx);
 	}
 
 	/* can't go until ->dispatch is flushed */
-	if (!can_go || blk_mq_hctx_is_busy(hctx))
+	if (!can_go || blk_mq_hctx_is_busy(q, hctx))
 		return;
 
 	/*
diff --git a/block/blk-mq.h b/block/blk-mq.h
index d9795cbba1bb..a8788058da56 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -135,17 +135,20 @@ static inline bool blk_mq_hw_queue_mapped(struct blk_mq_hw_ctx *hctx)
 	return hctx->nr_ctx && hctx->tags;
 }
 
-static inline bool blk_mq_hctx_is_busy(struct blk_mq_hw_ctx *hctx)
+static inline bool blk_mq_hctx_is_busy(struct request_queue *q,
+		struct blk_mq_hw_ctx *hctx)
 {
 	return test_bit(BLK_MQ_S_BUSY, &hctx->state);
 }
 
-static inline void blk_mq_hctx_set_busy(struct blk_mq_hw_ctx *hctx)
+static inline void blk_mq_hctx_set_busy(struct request_queue *q,
+		struct blk_mq_hw_ctx *hctx)
 {
 	set_bit(BLK_MQ_S_BUSY, &hctx->state);
 }
 
-static inline void blk_mq_hctx_clear_busy(struct blk_mq_hw_ctx *hctx)
+static inline void blk_mq_hctx_clear_busy(struct request_queue *q,
+		struct blk_mq_hw_ctx *hctx)
 {
 	clear_bit(BLK_MQ_S_BUSY, &hctx->state);
 }
@@ -179,8 +182,8 @@ static inline void blk_mq_add_list_to_dispatch_tail(struct blk_mq_hw_ctx *hctx,
 	spin_unlock(hctx->dispatch_lock);
 }
 
-static inline void blk_mq_take_list_from_dispatch(struct blk_mq_hw_ctx *hctx,
-		struct list_head *list)
+static inline void blk_mq_take_list_from_dispatch(struct request_queue *q,
+		struct blk_mq_hw_ctx *hctx, struct list_head *list)
 {
 	spin_lock(hctx->dispatch_lock);
 	list_splice_init(hctx->dispatch_list, list);
@@ -190,7 +193,7 @@ static inline void blk_mq_take_list_from_dispatch(struct blk_mq_hw_ctx *hctx,
 	 * in hctx->dispatch are dispatched successfully
 	 */
 	if (!list_empty(list))
-		blk_mq_hctx_set_busy(hctx);
+		blk_mq_hctx_set_busy(q, hctx);
 	spin_unlock(hctx->dispatch_lock);
 }
 
-- 
2.9.4

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 14/14] blk-mq-sched: improve IO scheduling on SCSI device
  2017-07-31 16:50 [PATCH 00/14] blk-mq-sched: fix SCSI-MQ performance regression Ming Lei
                   ` (13 preceding siblings ...)
  2017-07-31 16:51 ` [PATCH 13/14] blk-mq: pass 'request_queue *' to several helpers of operating BUSY Ming Lei
@ 2017-07-31 16:51 ` Ming Lei
  14 siblings, 0 replies; 47+ messages in thread
From: Ming Lei @ 2017-07-31 16:51 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Christoph Hellwig
  Cc: Bart Van Assche, linux-scsi, Martin K . Petersen,
	James E . J . Bottomley, Ming Lei

SCSI devices often have a per-request_queue queue depth
(.cmd_per_lun), which is actually applied across all hw
queues; this patchset calls it a shared queue depth.

One principle of scheduling is that we shouldn't dequeue
a request from the sw/scheduler queue and dispatch it to
the driver when the low-level queue is busy.

For SCSI devices, whether the queue is busy depends on
the per-request_queue limit, so we should hold all
hw queues when the request queue is busy.

This patch introduces a per-request_queue dispatch
list for this purpose. Only when all requests in this
list have been dispatched successfully do we restart
dequeuing requests from the sw/scheduler queue and
dispatching them to the LLD.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq.c         |  8 +++++++-
 block/blk-mq.h         | 14 +++++++++++---
 include/linux/blkdev.h |  5 +++++
 3 files changed, 23 insertions(+), 4 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 9b8b3a740d18..6d02901d798e 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2667,8 +2667,14 @@ int blk_mq_update_sched_queue_depth(struct request_queue *q)
 	 * this queue depth limit
 	 */
 	if (q->queue_depth) {
-		queue_for_each_hw_ctx(q, hctx, i)
+		queue_for_each_hw_ctx(q, hctx, i) {
 			hctx->flags |= BLK_MQ_F_SHARED_DEPTH;
+			hctx->dispatch_lock = &q->__mq_dispatch_lock;
+			hctx->dispatch_list = &q->__mq_dispatch_list;
+
+			spin_lock_init(hctx->dispatch_lock);
+			INIT_LIST_HEAD(hctx->dispatch_list);
+		}
 	}
 
 	if (!q->elevator)
diff --git a/block/blk-mq.h b/block/blk-mq.h
index a8788058da56..4853d422836f 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -138,19 +138,27 @@ static inline bool blk_mq_hw_queue_mapped(struct blk_mq_hw_ctx *hctx)
 static inline bool blk_mq_hctx_is_busy(struct request_queue *q,
 		struct blk_mq_hw_ctx *hctx)
 {
-	return test_bit(BLK_MQ_S_BUSY, &hctx->state);
+	if (!(hctx->flags & BLK_MQ_F_SHARED_DEPTH))
+		return test_bit(BLK_MQ_S_BUSY, &hctx->state);
+	return q->mq_dispatch_busy;
 }
 
 static inline void blk_mq_hctx_set_busy(struct request_queue *q,
 		struct blk_mq_hw_ctx *hctx)
 {
-	set_bit(BLK_MQ_S_BUSY, &hctx->state);
+	if (!(hctx->flags & BLK_MQ_F_SHARED_DEPTH))
+		set_bit(BLK_MQ_S_BUSY, &hctx->state);
+	else
+		q->mq_dispatch_busy = 1;
 }
 
 static inline void blk_mq_hctx_clear_busy(struct request_queue *q,
 		struct blk_mq_hw_ctx *hctx)
 {
-	clear_bit(BLK_MQ_S_BUSY, &hctx->state);
+	if (!(hctx->flags & BLK_MQ_F_SHARED_DEPTH))
+		clear_bit(BLK_MQ_S_BUSY, &hctx->state);
+	else
+		q->mq_dispatch_busy = 0;
 }
 
 static inline bool blk_mq_has_dispatch_rqs(struct blk_mq_hw_ctx *hctx)
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 25f6a0cb27d3..bc0e607710f2 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -395,6 +395,11 @@ struct request_queue {
 
 	atomic_t		shared_hctx_restart;
 
+	/* blk-mq dispatch list and lock for shared queue depth case */
+	struct list_head	__mq_dispatch_list;
+	spinlock_t		__mq_dispatch_lock;
+	unsigned int		mq_dispatch_busy;
+
 	struct blk_queue_stats	*stats;
 	struct rq_wb		*rq_wb;
 
-- 
2.9.4

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* Re: [PATCH 01/14] blk-mq-sched: fix scheduler bad performance
@ 2017-07-31 23:00     ` Bart Van Assche
  0 siblings, 0 replies; 47+ messages in thread
From: Bart Van Assche @ 2017-07-31 23:00 UTC (permalink / raw)
  To: hch, linux-block, axboe, ming.lei
  Cc: Bart Van Assche, linux-scsi, jejb, martin.petersen

On Tue, 2017-08-01 at 00:50 +0800, Ming Lei wrote:
> When hw queue is busy, we shouldn't take requests from
> scheduler queue any more, otherwise IO merge will be
> difficult to do.
> 
> This patch fixes the awful IO performance on some
> SCSI devices(lpfc, qla2xxx, ...) when mq-deadline/kyber
> is used by not taking requests if hw queue is busy.
> 
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
>  block/blk-mq-sched.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
> index 4ab69435708c..47a25333a136 100644
> --- a/block/blk-mq-sched.c
> +++ b/block/blk-mq-sched.c
> @@ -94,7 +94,7 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
>  	struct request_queue *q = hctx->queue;
>  	struct elevator_queue *e = q->elevator;
>  	const bool has_sched_dispatch = e && e->type->ops.mq.dispatch_request;
> -	bool did_work = false;
> +	bool can_go = true;
>  	LIST_HEAD(rq_list);
>  
>  	/* RCU or SRCU read lock is needed before checking quiesced flag */
> @@ -125,7 +125,7 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
>  	 */
>  	if (!list_empty(&rq_list)) {
>  		blk_mq_sched_mark_restart_hctx(hctx);
> -		did_work = blk_mq_dispatch_rq_list(q, &rq_list);
> +		can_go = blk_mq_dispatch_rq_list(q, &rq_list);
>  	} else if (!has_sched_dispatch) {
>  		blk_mq_flush_busy_ctxs(hctx, &rq_list);
>  		blk_mq_dispatch_rq_list(q, &rq_list);
> @@ -136,7 +136,7 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
>  	 * on the dispatch list, OR if we did have work but weren't able
>  	 * to make progress.
>  	 */
> -	if (!did_work && has_sched_dispatch) {
> +	if (can_go && has_sched_dispatch) {
>  		do {
>  			struct request *rq;

Hello Ming,

Please choose a better name for the new variable, e.g. "do_sched_dispatch". 
Otherwise this patch looks fine to me. Hence:

Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>

Bart.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 02/14] blk-mq: rename flush_busy_ctx_data as ctx_iter_data
@ 2017-07-31 23:03     ` Bart Van Assche
  0 siblings, 0 replies; 47+ messages in thread
From: Bart Van Assche @ 2017-07-31 23:03 UTC (permalink / raw)
  To: hch, linux-block, axboe, ming.lei
  Cc: Bart Van Assche, linux-scsi, jejb, martin.petersen

On Tue, 2017-08-01 at 00:50 +0800, Ming Lei wrote:
> The following patch need to reuse this data structure,
> so rename as one generic name.

Hello Ming,

Please drop this patch (see also my comments on the next patch).

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 03/14] blk-mq: introduce blk_mq_dispatch_rq_from_ctxs()
@ 2017-07-31 23:09     ` Bart Van Assche
  0 siblings, 0 replies; 47+ messages in thread
From: Bart Van Assche @ 2017-07-31 23:09 UTC (permalink / raw)
  To: hch, linux-block, axboe, ming.lei
  Cc: Bart Van Assche, linux-scsi, jejb, martin.petersen

On Tue, 2017-08-01 at 00:51 +0800, Ming Lei wrote:
> @@ -810,7 +810,11 @@ static void blk_mq_timeout_work(struct work_struct *work)
>  
>  struct ctx_iter_data {
>  	struct blk_mq_hw_ctx *hctx;
> -	struct list_head *list;
> +
> +	union {
> +		struct list_head *list;
> +		struct request *rq;
> +	};
>  };

Hello Ming,

Please introduce a new data structure for dispatch_rq_from_ctx() /
blk_mq_dispatch_rq_from_ctxs() instead of introducing a union in struct
ctx_iter_data. That will prevent .list from being used in a context where
a struct request * pointer has been stored in the structure and vice versa.
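
Something along these lines would do (just a sketch; the structure name
is made up):

	struct dispatch_rq_data {
		struct blk_mq_hw_ctx	*hctx;
		struct request		*rq;
	};

dispatch_rq_from_ctx() / blk_mq_dispatch_rq_from_ctxs() would then take a
struct dispatch_rq_data * instead of reusing struct ctx_iter_data.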
 
>  static bool flush_busy_ctx(struct sbitmap *sb, unsigned int bitnr, void *data)
> @@ -826,6 +830,26 @@ static bool flush_busy_ctx(struct sbitmap *sb, unsigned int bitnr, void *data)
>  	return true;
>  }
>  
> +static bool dispatch_rq_from_ctx(struct sbitmap *sb, unsigned int bitnr, void *data)
> +{
> +	struct ctx_iter_data *dispatch_data = data;
> +	struct blk_mq_hw_ctx *hctx = dispatch_data->hctx;
> +	struct blk_mq_ctx *ctx = hctx->ctxs[bitnr];
> +	bool empty = true;
> +
> +	spin_lock(&ctx->lock);
> +	if (unlikely(!list_empty(&ctx->rq_list))) {
> +		dispatch_data->rq = list_entry_rq(ctx->rq_list.next);
> +		list_del_init(&dispatch_data->rq->queuelist);
> +		empty = list_empty(&ctx->rq_list);
> +	}
> +	spin_unlock(&ctx->lock);
> +	if (empty)
> +		sbitmap_clear_bit(sb, bitnr);

This sbitmap_clear_bit() occurs without holding blk_mq_ctx.lock. Sorry but
I don't think this is safe. Please either remove this sbitmap_clear_bit() call
or make sure that it happens with blk_mq_ctx.lock held.
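
For example something like this (untested sketch, only to show the clear
moved under the lock):

	spin_lock(&ctx->lock);
	if (!list_empty(&ctx->rq_list)) {
		dispatch_data->rq = list_entry_rq(ctx->rq_list.next);
		list_del_init(&dispatch_data->rq->queuelist);
		if (list_empty(&ctx->rq_list))
			sbitmap_clear_bit(sb, bitnr);
	}
	spin_unlock(&ctx->lock);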

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 04/14] blk-mq-sched: improve dispatching from sw queue
@ 2017-07-31 23:34     ` Bart Van Assche
  0 siblings, 0 replies; 47+ messages in thread
From: Bart Van Assche @ 2017-07-31 23:34 UTC (permalink / raw)
  To: hch, linux-block, axboe, ming.lei; +Cc: linux-scsi, jejb, martin.petersen

On Tue, 2017-08-01 at 00:51 +0800, Ming Lei wrote:
> SCSI devices use host-wide tagset, and the shared
> driver tag space is often quite big. Meantime
> there is also queue depth for each lun(.cmd_per_lun),
> which is often small.
> 
> So lots of requests may stay in sw queue, and we
> always flush all belonging to same hw queue and
> dispatch them all to driver, unfortunately it is
> easy to cause queue busy becasue of the small
> per-lun queue depth. Once these requests are flushed
> out, they have to stay in hctx->dispatch, and no bio
> merge can participate into these requests, and
> sequential IO performance is hurted.
> 
> This patch improves dispatching from sw queue when
> there is per-request-queue queue depth by taking
> request one by one from sw queue, just like the way
> of IO scheduler.
> 
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
>  block/blk-mq-sched.c | 25 +++++++++++++++----------
>  1 file changed, 15 insertions(+), 10 deletions(-)
> 
> diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
> index 47a25333a136..3510c01cb17b 100644
> --- a/block/blk-mq-sched.c
> +++ b/block/blk-mq-sched.c
> @@ -96,6 +96,9 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
>  	const bool has_sched_dispatch = e && e->type->ops.mq.dispatch_request;
>  	bool can_go = true;
>  	LIST_HEAD(rq_list);
> +	struct request *(*dispatch_fn)(struct blk_mq_hw_ctx *) =
> +		has_sched_dispatch ? e->type->ops.mq.dispatch_request :
> +			blk_mq_dispatch_rq_from_ctxs;
>  
>  	/* RCU or SRCU read lock is needed before checking quiesced flag */
>  	if (unlikely(blk_mq_hctx_stopped(hctx) || blk_queue_quiesced(q)))
> @@ -126,26 +129,28 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
>  	if (!list_empty(&rq_list)) {
>  		blk_mq_sched_mark_restart_hctx(hctx);
>  		can_go = blk_mq_dispatch_rq_list(q, &rq_list);
> -	} else if (!has_sched_dispatch) {
> +	} else if (!has_sched_dispatch && !q->queue_depth) {
>  		blk_mq_flush_busy_ctxs(hctx, &rq_list);
>  		blk_mq_dispatch_rq_list(q, &rq_list);
> +		can_go = false;
>  	}
>  
> +	if (!can_go)
> +		return;
> +
>  	/*
>  	 * We want to dispatch from the scheduler if we had no work left
>  	 * on the dispatch list, OR if we did have work but weren't able
>  	 * to make progress.
>  	 */
> -	if (can_go && has_sched_dispatch) {
> -		do {
> -			struct request *rq;
> +	do {
> +		struct request *rq;
>  
> -			rq = e->type->ops.mq.dispatch_request(hctx);
> -			if (!rq)
> -				break;
> -			list_add(&rq->queuelist, &rq_list);
> -		} while (blk_mq_dispatch_rq_list(q, &rq_list));
> -	}
> +		rq = dispatch_fn(hctx);
> +		if (!rq)
> +			break;
> +		list_add(&rq->queuelist, &rq_list);
> +	} while (blk_mq_dispatch_rq_list(q, &rq_list));
>  }

Hello Ming,

Although I like the idea behind this patch, I'm afraid that it will
cause a performance regression for high-performance SCSI LLD drivers, e.g.
ib_srp. Have you considered reworking this patch as follows:
* Remove the code under "else if (!has_sched_dispatch && !q->queue_depth) {".
* Modify all blk_mq_dispatch_rq_list() functions such that they dispatch up
  to cmd_per_lun - (number of requests in progress) requests at once (a rough
  sketch follows below).
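
A very rough sketch of what I have in mind (blk_mq_nr_requests_in_flight()
does not exist and is only a placeholder for whatever in-flight accounting
we end up with):

	unsigned int in_flight = blk_mq_nr_requests_in_flight(q); /* hypothetical */
	unsigned int budget = q->queue_depth > in_flight ?
			q->queue_depth - in_flight : 0;

	while (budget--) {
		struct request *rq = blk_mq_dispatch_rq_from_ctxs(hctx);

		if (!rq)
			break;
		list_add_tail(&rq->queuelist, &rq_list);
	}
	blk_mq_dispatch_rq_list(q, &rq_list);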

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 05/14] blk-mq-sched: don't dequeue request until all in ->dispatch are flushed
@ 2017-07-31 23:42     ` Bart Van Assche
  0 siblings, 0 replies; 47+ messages in thread
From: Bart Van Assche @ 2017-07-31 23:42 UTC (permalink / raw)
  To: hch, linux-block, axboe, ming.lei; +Cc: linux-scsi, jejb, martin.petersen

On Tue, 2017-08-01 at 00:51 +0800, Ming Lei wrote:
> During dispatch, we moved all requests from hctx->dispatch to
> one temporary list, then dispatch them one by one from this list.
> Unfortunately duirng this period, run queue from other contexts
> may think the queue is idle and start to dequeue from sw/scheduler
> queue and try to dispatch because ->dispatch is empty.
> 
> This way will hurt sequential I/O performance because requests are
> dequeued when queue is busy.
> 
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
>  block/blk-mq-sched.c   | 24 ++++++++++++++++++------
>  include/linux/blk-mq.h |  1 +
>  2 files changed, 19 insertions(+), 6 deletions(-)
> 
> diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
> index 3510c01cb17b..eb638063673f 100644
> --- a/block/blk-mq-sched.c
> +++ b/block/blk-mq-sched.c
> @@ -112,8 +112,15 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
>  	 */
>  	if (!list_empty_careful(&hctx->dispatch)) {
>  		spin_lock(&hctx->lock);
> -		if (!list_empty(&hctx->dispatch))
> +		if (!list_empty(&hctx->dispatch)) {
>  			list_splice_init(&hctx->dispatch, &rq_list);
> +
> +			/*
> +			 * BUSY won't be cleared until all requests
> +			 * in hctx->dispatch are dispatched successfully
> +			 */
> +			set_bit(BLK_MQ_S_BUSY, &hctx->state);
> +		}
>  		spin_unlock(&hctx->lock);
>  	}
>  
> @@ -129,15 +136,20 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
>  	if (!list_empty(&rq_list)) {
>  		blk_mq_sched_mark_restart_hctx(hctx);
>  		can_go = blk_mq_dispatch_rq_list(q, &rq_list);
> -	} else if (!has_sched_dispatch && !q->queue_depth) {
> -		blk_mq_flush_busy_ctxs(hctx, &rq_list);
> -		blk_mq_dispatch_rq_list(q, &rq_list);
> -		can_go = false;
> +		if (can_go)
> +			clear_bit(BLK_MQ_S_BUSY, &hctx->state);
>  	}
>  
> -	if (!can_go)
> +	/* can't go until ->dispatch is flushed */
> +	if (!can_go || test_bit(BLK_MQ_S_BUSY, &hctx->state))
>  		return;
>  
> +	if (!has_sched_dispatch && !q->queue_depth) {
> +		blk_mq_flush_busy_ctxs(hctx, &rq_list);
> +		blk_mq_dispatch_rq_list(q, &rq_list);
> +		return;
> +	}

Hello Ming,

Since setting, clearing and testing of BLK_MQ_S_BUSY can happen concurrently,
and since clearing and testing happen without any locks held, I'm afraid this
patch introduces the following race conditions:
* Clearing of BLK_MQ_S_BUSY immediately after this bit has been set, resulting
  in this bit not being set although there are requests on the dispatch list.
* Checking BLK_MQ_S_BUSY after requests have been added to the dispatch list
  but before that bit is set, resulting in test_bit(BLK_MQ_S_BUSY, &hctx->state)
  reporting that the BLK_MQ_S_BUSY has not been set although there are requests
  on the dispatch list.
* Checking BLK_MQ_S_BUSY after requests have been removed from the dispatch list
  but before that bit is cleared, resulting in test_bit(BLK_MQ_S_BUSY, &hctx->state)
  reporting that the BLK_MQ_S_BUSY
has been set although there are no requests
  on the dispatch list.

Bart.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 03/14] blk-mq: introduce blk_mq_dispatch_rq_from_ctxs()
  2017-07-31 23:09     ` Bart Van Assche
  (?)
@ 2017-08-01 10:07     ` Ming Lei
  -1 siblings, 0 replies; 47+ messages in thread
From: Ming Lei @ 2017-08-01 10:07 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: hch, linux-block, axboe, linux-scsi, jejb, martin.petersen

On Mon, Jul 31, 2017 at 11:09:38PM +0000, Bart Van Assche wrote:
> On Tue, 2017-08-01 at 00:51 +0800, Ming Lei wrote:
> > @@ -810,7 +810,11 @@ static void blk_mq_timeout_work(struct work_struct *work)
> >  
> >  struct ctx_iter_data {
> >  	struct blk_mq_hw_ctx *hctx;
> > -	struct list_head *list;
> > +
> > +	union {
> > +		struct list_head *list;
> > +		struct request *rq;
> > +	};
> >  };
> 
> Hello Ming,
> 
> Please introduce a new data structure for dispatch_rq_from_ctx() /
> blk_mq_dispatch_rq_from_ctxs() instead of introducing a union in struct
> ctx_iter_data. That will avoid that .list can be used in a context where
> a struct request * pointer has been stored in the structure and vice versa.

Looks like there isn't such usage now, or we can simply keep both 'list' and
'rq' as separate fields in this data structure if there is.

>  
> >  static bool flush_busy_ctx(struct sbitmap *sb, unsigned int bitnr, void *data)
> > @@ -826,6 +830,26 @@ static bool flush_busy_ctx(struct sbitmap *sb, unsigned int bitnr, void *data)
> >  	return true;
> >  }
> >  
> > +static bool dispatch_rq_from_ctx(struct sbitmap *sb, unsigned int bitnr, void *data)
> > +{
> > +	struct ctx_iter_data *dispatch_data = data;
> > +	struct blk_mq_hw_ctx *hctx = dispatch_data->hctx;
> > +	struct blk_mq_ctx *ctx = hctx->ctxs[bitnr];
> > +	bool empty = true;
> > +
> > +	spin_lock(&ctx->lock);
> > +	if (unlikely(!list_empty(&ctx->rq_list))) {
> > +		dispatch_data->rq = list_entry_rq(ctx->rq_list.next);
> > +		list_del_init(&dispatch_data->rq->queuelist);
> > +		empty = list_empty(&ctx->rq_list);
> > +	}
> > +	spin_unlock(&ctx->lock);
> > +	if (empty)
> > +		sbitmap_clear_bit(sb, bitnr);
> 
> This sbitmap_clear_bit() occurs without holding blk_mq_ctx.lock. Sorry but
> I don't think this is safe. Please either remove this sbitmap_clear_bit() call
> or make sure that it happens with blk_mq_ctx.lock held.

Good catch, sbitmap_clear_bit() should have been done while holding
ctx->lock, otherwise a newly set pending bit may be cleared.

-- 
Ming

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 04/14] blk-mq-sched: improve dispatching from sw queue
  2017-07-31 23:34     ` Bart Van Assche
  (?)
@ 2017-08-01 10:17     ` Ming Lei
  2017-08-01 10:50       ` Ming Lei
  -1 siblings, 1 reply; 47+ messages in thread
From: Ming Lei @ 2017-08-01 10:17 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: hch, linux-block, axboe, linux-scsi, jejb, martin.petersen

On Mon, Jul 31, 2017 at 11:34:35PM +0000, Bart Van Assche wrote:
> On Tue, 2017-08-01 at 00:51 +0800, Ming Lei wrote:
> > SCSI devices use host-wide tagset, and the shared
> > driver tag space is often quite big. Meantime
> > there is also queue depth for each lun(.cmd_per_lun),
> > which is often small.
> > 
> > So lots of requests may stay in sw queue, and we
> > always flush all belonging to same hw queue and
> > dispatch them all to driver, unfortunately it is
> > easy to cause queue busy becasue of the small
> > per-lun queue depth. Once these requests are flushed
> > out, they have to stay in hctx->dispatch, and no bio
> > merge can participate into these requests, and
> > sequential IO performance is hurted.
> > 
> > This patch improves dispatching from sw queue when
> > there is per-request-queue queue depth by taking
> > request one by one from sw queue, just like the way
> > of IO scheduler.
> > 
> > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > ---
> >  block/blk-mq-sched.c | 25 +++++++++++++++----------
> >  1 file changed, 15 insertions(+), 10 deletions(-)
> > 
> > diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
> > index 47a25333a136..3510c01cb17b 100644
> > --- a/block/blk-mq-sched.c
> > +++ b/block/blk-mq-sched.c
> > @@ -96,6 +96,9 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
> >  	const bool has_sched_dispatch = e && e->type->ops.mq.dispatch_request;
> >  	bool can_go = true;
> >  	LIST_HEAD(rq_list);
> > +	struct request *(*dispatch_fn)(struct blk_mq_hw_ctx *) =
> > +		has_sched_dispatch ? e->type->ops.mq.dispatch_request :
> > +			blk_mq_dispatch_rq_from_ctxs;
> >  
> >  	/* RCU or SRCU read lock is needed before checking quiesced flag */
> >  	if (unlikely(blk_mq_hctx_stopped(hctx) || blk_queue_quiesced(q)))
> > @@ -126,26 +129,28 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
> >  	if (!list_empty(&rq_list)) {
> >  		blk_mq_sched_mark_restart_hctx(hctx);
> >  		can_go = blk_mq_dispatch_rq_list(q, &rq_list);
> > -	} else if (!has_sched_dispatch) {
> > +	} else if (!has_sched_dispatch && !q->queue_depth) {
> >  		blk_mq_flush_busy_ctxs(hctx, &rq_list);
> >  		blk_mq_dispatch_rq_list(q, &rq_list);
> > +		can_go = false;
> >  	}
> >  
> > +	if (!can_go)
> > +		return;
> > +
> >  	/*
> >  	 * We want to dispatch from the scheduler if we had no work left
> >  	 * on the dispatch list, OR if we did have work but weren't able
> >  	 * to make progress.
> >  	 */
> > -	if (can_go && has_sched_dispatch) {
> > -		do {
> > -			struct request *rq;
> > +	do {
> > +		struct request *rq;
> >  
> > -			rq = e->type->ops.mq.dispatch_request(hctx);
> > -			if (!rq)
> > -				break;
> > -			list_add(&rq->queuelist, &rq_list);
> > -		} while (blk_mq_dispatch_rq_list(q, &rq_list));
> > -	}
> > +		rq = dispatch_fn(hctx);
> > +		if (!rq)
> > +			break;
> > +		list_add(&rq->queuelist, &rq_list);
> > +	} while (blk_mq_dispatch_rq_list(q, &rq_list));
> >  }
> 
> Hello Ming,
> 
> Although I like the idea behind this patch, I'm afraid that this patch will
> cause a performance regression for high-performance SCSI LLD drivers, e.g.
> ib_srp. Have you considered to rework this patch as follows:
> * Remove the code under "else if (!has_sched_dispatch && !q->queue_depth) {".

This will affect devices such as NVMe, on which busy is basically never
triggered, so it's better not to do this.

> * Modify all blk_mq_dispatch_rq_list() functions such that these dispatch up
>   to cmd_per_lun - (number of requests in progress) at once.

How can we get the accurate 'number of requests in progress' efficiently?

And we have done it in this way for blk-mq scheduler already, so it
shouldn't be a problem.

From my test data of mq-deadline on lpfc, the performance is good,
please see it in the cover letter.


Thanks,
Ming

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 05/14] blk-mq-sched: don't dequeue request until all in ->dispatch are flushed
  2017-07-31 23:42     ` Bart Van Assche
  (?)
@ 2017-08-01 10:44     ` Ming Lei
  2017-08-01 16:14         ` Bart Van Assche
  -1 siblings, 1 reply; 47+ messages in thread
From: Ming Lei @ 2017-08-01 10:44 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: hch, linux-block, axboe, linux-scsi, jejb, martin.petersen

On Mon, Jul 31, 2017 at 11:42:21PM +0000, Bart Van Assche wrote:
> On Tue, 2017-08-01 at 00:51 +0800, Ming Lei wrote:
> > During dispatch, we moved all requests from hctx->dispatch to
> > one temporary list, then dispatch them one by one from this list.
> > Unfortunately duirng this period, run queue from other contexts
> > may think the queue is idle and start to dequeue from sw/scheduler
> > queue and try to dispatch because ->dispatch is empty.
> > 
> > This way will hurt sequential I/O performance because requests are
> > dequeued when queue is busy.
> > 
> > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > ---
> >  block/blk-mq-sched.c   | 24 ++++++++++++++++++------
> >  include/linux/blk-mq.h |  1 +
> >  2 files changed, 19 insertions(+), 6 deletions(-)
> > 
> > diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
> > index 3510c01cb17b..eb638063673f 100644
> > --- a/block/blk-mq-sched.c
> > +++ b/block/blk-mq-sched.c
> > @@ -112,8 +112,15 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
> >  	 */
> >  	if (!list_empty_careful(&hctx->dispatch)) {
> >  		spin_lock(&hctx->lock);
> > -		if (!list_empty(&hctx->dispatch))
> > +		if (!list_empty(&hctx->dispatch)) {
> >  			list_splice_init(&hctx->dispatch, &rq_list);
> > +
> > +			/*
> > +			 * BUSY won't be cleared until all requests
> > +			 * in hctx->dispatch are dispatched successfully
> > +			 */
> > +			set_bit(BLK_MQ_S_BUSY, &hctx->state);
> > +		}
> >  		spin_unlock(&hctx->lock);
> >  	}
> >  
> > @@ -129,15 +136,20 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
> >  	if (!list_empty(&rq_list)) {
> >  		blk_mq_sched_mark_restart_hctx(hctx);
> >  		can_go = blk_mq_dispatch_rq_list(q, &rq_list);
> > -	} else if (!has_sched_dispatch && !q->queue_depth) {
> > -		blk_mq_flush_busy_ctxs(hctx, &rq_list);
> > -		blk_mq_dispatch_rq_list(q, &rq_list);
> > -		can_go = false;
> > +		if (can_go)
> > +			clear_bit(BLK_MQ_S_BUSY, &hctx->state);
> >  	}
> >  
> > -	if (!can_go)
> > +	/* can't go until ->dispatch is flushed */
> > +	if (!can_go || test_bit(BLK_MQ_S_BUSY, &hctx->state))
> >  		return;
> >  
> > +	if (!has_sched_dispatch && !q->queue_depth) {
> > +		blk_mq_flush_busy_ctxs(hctx, &rq_list);
> > +		blk_mq_dispatch_rq_list(q, &rq_list);
> > +		return;
> > +	}
> 
> Hello Ming,
> 
> Since setting, clearing and testing of BLK_MQ_S_BUSY can happen concurrently
> and since clearing and testing happens without any locks held I'm afraid this

Yes, I really want to avoid a lock.

> patch introduces the following race conditions:
> * Clearing of BLK_MQ_S_BUSY immediately after this bit has been set, resulting
>   in this bit not being set although there are requests on the dispatch list.

The window is small enough.

And in the context of setting the BUSY bit, dispatch still can't move on
because 'can_go' will stop that.

Even if it happens it is no big deal; it just means one request is dequeued
a bit early. What we really need to avoid is an I/O hang.


> * Checking BLK_MQ_S_BUSY after requests have been added to the dispatch list
>   but before that bit is set, resulting in test_bit(BLK_MQ_S_BUSY, &hctx->state)
>   reporting that the BLK_MQ_S_BUSY has not been set although there are requests
>   on the dispatch list.

Same as above, no big deal, we can survive that.


> * Checking BLK_MQ_S_BUSY after requests have been removed from the dispatch list
>   but before that bit is cleared, resulting in test_bit(BLK_MQ_S_BUSY, &hctx->state)
>   reporting that the BLK_MQ_S_BUSY
> has been set although there are no requests
>   on the dispatch list.

That won't be a problem, because dispatch will be started in the
context in which the dispatch list is flushed, since the BUSY bit
is cleared after blk_mq_dispatch_rq_list() returns. So there is no
I/O hang.


-- 
Ming

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 04/14] blk-mq-sched: improve dispatching from sw queue
  2017-08-01 10:17     ` Ming Lei
@ 2017-08-01 10:50       ` Ming Lei
  2017-08-01 15:11           ` Bart Van Assche
  0 siblings, 1 reply; 47+ messages in thread
From: Ming Lei @ 2017-08-01 10:50 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: hch, linux-block, axboe, linux-scsi, jejb, martin.petersen

On Tue, Aug 01, 2017 at 06:17:18PM +0800, Ming Lei wrote:
> On Mon, Jul 31, 2017 at 11:34:35PM +0000, Bart Van Assche wrote:
> > On Tue, 2017-08-01 at 00:51 +0800, Ming Lei wrote:
> > > SCSI devices use host-wide tagset, and the shared
> > > driver tag space is often quite big. Meantime
> > > there is also queue depth for each lun(.cmd_per_lun),
> > > which is often small.
> > > 
> > > So lots of requests may stay in sw queue, and we
> > > always flush all belonging to same hw queue and
> > > dispatch them all to driver, unfortunately it is
> > > easy to cause queue busy becasue of the small
> > > per-lun queue depth. Once these requests are flushed
> > > out, they have to stay in hctx->dispatch, and no bio
> > > merge can participate into these requests, and
> > > sequential IO performance is hurted.
> > > 
> > > This patch improves dispatching from sw queue when
> > > there is per-request-queue queue depth by taking
> > > request one by one from sw queue, just like the way
> > > of IO scheduler.
> > > 
> > > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > > ---
> > >  block/blk-mq-sched.c | 25 +++++++++++++++----------
> > >  1 file changed, 15 insertions(+), 10 deletions(-)
> > > 
> > > diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
> > > index 47a25333a136..3510c01cb17b 100644
> > > --- a/block/blk-mq-sched.c
> > > +++ b/block/blk-mq-sched.c
> > > @@ -96,6 +96,9 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
> > >  	const bool has_sched_dispatch = e && e->type->ops.mq.dispatch_request;
> > >  	bool can_go = true;
> > >  	LIST_HEAD(rq_list);
> > > +	struct request *(*dispatch_fn)(struct blk_mq_hw_ctx *) =
> > > +		has_sched_dispatch ? e->type->ops.mq.dispatch_request :
> > > +			blk_mq_dispatch_rq_from_ctxs;
> > >  
> > >  	/* RCU or SRCU read lock is needed before checking quiesced flag */
> > >  	if (unlikely(blk_mq_hctx_stopped(hctx) || blk_queue_quiesced(q)))
> > > @@ -126,26 +129,28 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
> > >  	if (!list_empty(&rq_list)) {
> > >  		blk_mq_sched_mark_restart_hctx(hctx);
> > >  		can_go = blk_mq_dispatch_rq_list(q, &rq_list);
> > > -	} else if (!has_sched_dispatch) {
> > > +	} else if (!has_sched_dispatch && !q->queue_depth) {
> > >  		blk_mq_flush_busy_ctxs(hctx, &rq_list);
> > >  		blk_mq_dispatch_rq_list(q, &rq_list);
> > > +		can_go = false;
> > >  	}
> > >  
> > > +	if (!can_go)
> > > +		return;
> > > +
> > >  	/*
> > >  	 * We want to dispatch from the scheduler if we had no work left
> > >  	 * on the dispatch list, OR if we did have work but weren't able
> > >  	 * to make progress.
> > >  	 */
> > > -	if (can_go && has_sched_dispatch) {
> > > -		do {
> > > -			struct request *rq;
> > > +	do {
> > > +		struct request *rq;
> > >  
> > > -			rq = e->type->ops.mq.dispatch_request(hctx);
> > > -			if (!rq)
> > > -				break;
> > > -			list_add(&rq->queuelist, &rq_list);
> > > -		} while (blk_mq_dispatch_rq_list(q, &rq_list));
> > > -	}
> > > +		rq = dispatch_fn(hctx);
> > > +		if (!rq)
> > > +			break;
> > > +		list_add(&rq->queuelist, &rq_list);
> > > +	} while (blk_mq_dispatch_rq_list(q, &rq_list));
> > >  }
> > 
> > Hello Ming,
> > 
> > Although I like the idea behind this patch, I'm afraid that this patch will
> > cause a performance regression for high-performance SCSI LLD drivers, e.g.
> > ib_srp. Have you considered to rework this patch as follows:
> > * Remove the code under "else if (!has_sched_dispatch && !q->queue_depth) {".
> 
> This will affect devices such as NVMe in which busy isn't triggered
> basically, so better to not do this.
> 
> > * Modify all blk_mq_dispatch_rq_list() functions such that these dispatch up
> >   to cmd_per_lun - (number of requests in progress) at once.
> 
> How can we get the accurate 'number of requests in progress' efficiently?
> 
> And we have done it in this way for blk-mq scheduler already, so it
> shouldn't be a problem.
> 
> From my test data of mq-deadline on lpfc, the performance is good,
> please see it in cover letter.

Forgot to mention: ctx->list is a per-cpu list and the lock is a per-cpu
lock, so changing to this way shouldn't be a performance issue.

-- 
Ming

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 04/14] blk-mq-sched: improve dispatching from sw queue
@ 2017-08-01 15:11           ` Bart Van Assche
  0 siblings, 0 replies; 47+ messages in thread
From: Bart Van Assche @ 2017-08-01 15:11 UTC (permalink / raw)
  To: ming.lei; +Cc: linux-scsi, hch, linux-block, axboe, jejb, martin.petersen

On Tue, 2017-08-01 at 18:50 +0800, Ming Lei wrote:
> On Tue, Aug 01, 2017 at 06:17:18PM +0800, Ming Lei wrote:
> > How can we get the accurate 'number of requests in progress' efficiently?

Hello Ming,

How about counting the number of bits that have been set in the tag set?
I am aware that these bits can be set and/or cleared concurrently with the
dispatch code but that count is probably a good starting point.
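
For example (rough, untested, ignoring reserved tags, and assuming the
driver tag bitmap is reachable through hctx->tags):

	/* approximate number of driver tags currently in use */
	unsigned int in_flight = sbitmap_weight(&hctx->tags->bitmap_tags.sb);

That count races with tag allocation and freeing, but as said it is probably
a good starting point.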

> > From my test data of mq-deadline on lpfc, the performance is good,
> > please see it in cover letter.
> 
> Forget to mention, ctx->list is one per-cpu list and the lock is percpu
> lock, so changing to this way shouldn't be a performance issue.

Sorry, but I don't consider this reply sufficient. The latency of IB HCAs
is significantly lower than that of any FC hardware I ran performance
measurements on myself. The fact that this patch series improves performance
for lpfc does not guarantee that there won't be a performance regression for
ib_srp, ib_iser or any other low-latency initiator driver for which
q->queue_depth != 0.

Additionally, patch 03/14 most likely introduces a fairness problem. Shouldn't
blk_mq_dispatch_rq_from_ctxs() dequeue requests from the per-CPU queues in a
round-robin fashion instead of always starting at the first per-CPU queue in
hctx->ctx_map?

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 05/14] blk-mq-sched: don't dequeue request until all in ->dispatch are flushed
@ 2017-08-01 16:14         ` Bart Van Assche
  0 siblings, 0 replies; 47+ messages in thread
From: Bart Van Assche @ 2017-08-01 16:14 UTC (permalink / raw)
  To: ming.lei; +Cc: linux-scsi, hch, linux-block, axboe, jejb, martin.petersen

On Tue, 2017-08-01 at 18:44 +0800, Ming Lei wrote:
> On Mon, Jul 31, 2017 at 11:42:21PM +0000, Bart Van Assche wrote:
> > Since setting, clearing and testing of BLK_MQ_S_BUSY can happen concurrently
> > and since clearing and testing happens without any locks held I'm afraid this
> > patch introduces the following race conditions:
> > [ ... ]
> > * Checking BLK_MQ_S_BUSY after requests have been removed from the dispatch list
> >   but before that bit is cleared, resulting in test_bit(BLK_MQ_S_BUSY, &hctx->state)
> >   reporting that the BLK_MQ_S_BUSY
> > has been set although there are no requests
> >   on the dispatch list.
> 
> That won't be a problem, because dispatch will be started in the
> context in which dispatch list is flushed, since the BUSY bit
> is cleared after blk_mq_dispatch_rq_list() returns. So no I/O
> hang.

Hello Ming,

Please consider changing the name of the BLK_MQ_S_BUSY constant. That bit
is used to serialize dispatching requests from the hctx dispatch list but
that's not clear from the name of that constant.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 05/14] blk-mq-sched: don't dequeue request until all in ->dispatch are flushed
  2017-08-01 16:14         ` Bart Van Assche
  (?)
@ 2017-08-02  3:01         ` Ming Lei
  2017-08-03  1:33             ` Bart Van Assche
  -1 siblings, 1 reply; 47+ messages in thread
From: Ming Lei @ 2017-08-02  3:01 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: linux-scsi, hch, linux-block, axboe, jejb, martin.petersen

On Tue, Aug 01, 2017 at 04:14:07PM +0000, Bart Van Assche wrote:
> On Tue, 2017-08-01 at 18:44 +0800, Ming Lei wrote:
> > On Mon, Jul 31, 2017 at 11:42:21PM +0000, Bart Van Assche wrote:
> > > Since setting, clearing and testing of BLK_MQ_S_BUSY can happen concurrently
> > > and since clearing and testing happens without any locks held I'm afraid this
> > > patch introduces the following race conditions:
> > > [ ... ]
> > > * Checking BLK_MQ_S_BUSY after requests have been removed from the dispatch list
> > >   but before that bit is cleared, resulting in test_bit(BLK_MQ_S_BUSY, &hctx->state)
> > >   reporting that the BLK_MQ_S_BUSY
> > > has been set although there are no requests
> > >   on the dispatch list.
> > 
> > That won't be a problem, because dispatch will be started in the
> > context in which dispatch list is flushed, since the BUSY bit
> > is cleared after blk_mq_dispatch_rq_list() returns. So no I/O
> > hang.
> 
> Hello Ming,
> 
> Please consider changing the name of the BLK_MQ_S_BUSY constant. That bit
> is used to serialize dispatching requests from the hctx dispatch list but
> that's not clear from the name of that constant.

Actually what we want to do is to stop taking requests from the sw/scheduler
queue while ->dispatch isn't flushed completely. I think BUSY isn't
a bad name for this case, but how about DISPATCH_BUSY or
FLUSHING_DISPATCH?

After thinking about the handling further, we can set the
BUSY bit just when adding requests to ->dispatch, and clear the
bit after returning from blk_mq_dispatch_rq_list() once the
current local list (from ->dispatch) is flushed completely and
->dispatch is empty. This minimizes the race window and is
still safe, because we always move on with dispatch whenever
either a new request is added to ->dispatch or ->dispatch is
flushed completely.

In any case a comment should be added to clarify this.
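
Something like the following, just as a sketch of the idea (not the final
patch):

	/* where requests are put back on ->dispatch, e.g. in
	 * blk_mq_dispatch_rq_list() when the queue becomes busy */
	spin_lock(&hctx->lock);
	list_splice_init(list, &hctx->dispatch);
	set_bit(BLK_MQ_S_BUSY, &hctx->state);
	spin_unlock(&hctx->lock);

	/* in blk_mq_sched_dispatch_requests(), after the local list taken
	 * from ->dispatch has been dispatched completely */
	if (can_go && list_empty_careful(&hctx->dispatch))
		clear_bit(BLK_MQ_S_BUSY, &hctx->state);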

-- 
Ming

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 04/14] blk-mq-sched: improve dispatching from sw queue
  2017-08-01 15:11           ` Bart Van Assche
  (?)
@ 2017-08-02  3:31           ` Ming Lei
  2017-08-03  1:35               ` Bart Van Assche
  -1 siblings, 1 reply; 47+ messages in thread
From: Ming Lei @ 2017-08-02  3:31 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: linux-scsi, hch, linux-block, axboe, jejb, martin.petersen,
	Laurence Oberman

On Tue, Aug 01, 2017 at 03:11:42PM +0000, Bart Van Assche wrote:
> On Tue, 2017-08-01 at 18:50 +0800, Ming Lei wrote:
> > On Tue, Aug 01, 2017 at 06:17:18PM +0800, Ming Lei wrote:
> > > How can we get the accurate 'number of requests in progress' efficiently?
> 
> Hello Ming,
> 
> How about counting the number of bits that have been set in the tag set?
> I am aware that these bits can be set and/or cleared concurrently with the
> dispatch code but that count is probably a good starting point.

It has to be an atomic_t, which is too heavy for us; please see the report:

	http://marc.info/?t=149868448400003&r=1&w=2

Both Jens and I want to kill hd_struct.in_flight, but it looks like there is
still no good way. 

> 
> > > From my test data of mq-deadline on lpfc, the performance is good,
> > > please see it in cover letter.
> > 
> > Forget to mention, ctx->list is one per-cpu list and the lock is percpu
> > lock, so changing to this way shouldn't be a performance issue.
> 
> Sorry but I don't consider this reply as sufficient. The latency of IB HCA's
> is significantly lower than that of any FC hardware I ran performance
> measurements on myself. It's not because this patch series improves performance
> for lpfc that that guarantees that there won't be a performance regression for
> ib_srp, ib_iser or any other low-latency initiator driver for which q->depth
> != 0.

If the IB HCA has lower latency than FC hardware, there should be fewer
chances to trigger BUSY; otherwise IB should have the same sequential I/O
performance issue, I guess.

The ctx list is a per-cpu list, so in theory there shouldn't be issues with
this change, and the only change for IB is the following:

V4.13-rc3:
	blk_mq_flush_busy_ctxs(hctx, &rq_list);
	blk_mq_dispatch_rq_list(q, &rq_list);

v4.13-rc3 patched:
	do {
		struct request *rq;

		/* pick up one request from one ctx list */
		rq = blk_mq_dispatch_rq_from_ctxs(hctx);
		if (!rq)
			break;
		list_add(&rq->queuelist, &rq_list);
	} while (blk_mq_dispatch_rq_list(q, &rq_list));

I doubt that the change will be observable in an actual test.

In any case, we will provide test data on an IB HCA with V2, and Laurence
will help me run the test on IB.

> 
> Additionally, patch 03/14 most likely introduces a fairness problem. Shouldn't
> blk_mq_dispatch_rq_from_ctxs() dequeue requests from the per-CPU queues in a
> round-robin fashion instead of always starting at the first per-CPU queue in
> hctx->ctx_map?

That is a good question. It looks like round-robin should be fairer; I will
change to it in V2, and the implementation should be simple.
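
Something like the following is what I have in mind (untested sketch; the
'dispatch_from' field would be a new addition to blk_mq_hw_ctx):

	struct request *rq = NULL;
	unsigned int i;

	for (i = 0; i < hctx->nr_ctx && !rq; i++) {
		unsigned int bit = (hctx->dispatch_from + i) % hctx->nr_ctx;
		struct blk_mq_ctx *ctx = hctx->ctxs[bit];

		if (!sbitmap_test_bit(&hctx->ctx_map, bit))
			continue;

		spin_lock(&ctx->lock);
		if (!list_empty(&ctx->rq_list)) {
			rq = list_entry_rq(ctx->rq_list.next);
			list_del_init(&rq->queuelist);
			if (list_empty(&ctx->rq_list))
				sbitmap_clear_bit(&hctx->ctx_map, bit);
			/* next time, start from the following sw queue */
			hctx->dispatch_from = (bit + 1) % hctx->nr_ctx;
		}
		spin_unlock(&ctx->lock);
	}
	return rq;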

-- 
Ming

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 05/14] blk-mq-sched: don't dequeue request until all in ->dispatch are flushed
@ 2017-08-03  1:33             ` Bart Van Assche
  0 siblings, 0 replies; 47+ messages in thread
From: Bart Van Assche @ 2017-08-03  1:33 UTC (permalink / raw)
  To: ming.lei; +Cc: linux-scsi, hch, linux-block, axboe, jejb, martin.petersen

On Wed, 2017-08-02 at 11:01 +0800, Ming Lei wrote:
> On Tue, Aug 01, 2017 at 04:14:07PM +0000, Bart Van Assche wrote:
> > On Tue, 2017-08-01 at 18:44 +0800, Ming Lei wrote:
> > > On Mon, Jul 31, 2017 at 11:42:21PM +0000, Bart Van Assche wrote:
> > > > Since setting, clearing and testing of BLK_MQ_S_BUSY can happen concurrently
> > > > and since clearing and testing happens without any locks held I'm afraid this
> > > > patch introduces the following race conditions:
> > > > [ ... ]
> > > > * Checking BLK_MQ_S_BUSY after requests have been removed from the dispatch list
> > > >   but before that bit is cleared, resulting in test_bit(BLK_MQ_S_BUSY, &hctx->state)
> > > >   reporting that the BLK_MQ_S_BUSY
> > > > has been set although there are no requests
> > > >   on the dispatch list.
> > > 
> > > That won't be a problem, because dispatch will be started in the
> > > context in which dispatch list is flushed, since the BUSY bit
> > > is cleared after blk_mq_dispatch_rq_list() returns. So no I/O
> > > hang.
> > 
> > Hello Ming,
> > 
> > Please consider changing the name of the BLK_MQ_S_BUSY constant. That bit
> > is used to serialize dispatching requests from the hctx dispatch list but
> > that's not clear from the name of that constant.
> 
> Actually what we want to do is to stop taking request from sw/scheduler
> queue when ->dispatch aren't flushed completely, I think BUSY isn't
> a bad name for this case, or how about DISPATCH_BUSY? or
> FLUSHING_DISPATCH?

Hello Ming,

FLUSHING_DISPATCH sounds fine to me. In case you would prefer a shorter name,
how about BLK_MQ_S_DISPATCHING (refers to dispatching requests to the driver)?

Bart.
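For readers following the race discussion above, here is a rough, purely
illustrative sketch of the ordering Ming describes; the bit name comes from
the discussion, and the code below is not taken from the posted patch:

static void flush_hctx_dispatch(struct request_queue *q,
				struct blk_mq_hw_ctx *hctx)
{
	LIST_HEAD(rq_list);

	set_bit(BLK_MQ_S_BUSY, &hctx->state);

	spin_lock(&hctx->lock);
	list_splice_init(&hctx->dispatch, &rq_list);
	spin_unlock(&hctx->lock);

	/*
	 * The bit is cleared only after blk_mq_dispatch_rq_list() returns,
	 * and by the same context that flushed ->dispatch; that context
	 * keeps dispatching afterwards, so a racing test_bit() that still
	 * sees the bit set merely skips work this context will do anyway.
	 */
	blk_mq_dispatch_rq_list(q, &rq_list);
	clear_bit(BLK_MQ_S_BUSY, &hctx->state);
}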

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 04/14] blk-mq-sched: improve dispatching from sw queue
@ 2017-08-03  1:35               ` Bart Van Assche
  0 siblings, 0 replies; 47+ messages in thread
From: Bart Van Assche @ 2017-08-03  1:35 UTC (permalink / raw)
  To: ming.lei
  Cc: linux-scsi, hch, linux-block, loberman, axboe, jejb, martin.petersen

On Wed, 2017-08-02 at 11:31 +0800, Ming Lei wrote:
> On Tue, Aug 01, 2017 at 03:11:42PM +0000, Bart Van Assche wrote:
> > On Tue, 2017-08-01 at 18:50 +0800, Ming Lei wrote:
> > > On Tue, Aug 01, 2017 at 06:17:18PM +0800, Ming Lei wrote:
> > > > How can we get the accurate 'number of requests in progress' efficiently?
> > 
> > Hello Ming,
> > 
> > How about counting the number of bits that have been set in the tag set?
> > I am aware that these bits can be set and/or cleared concurrently with the
> > dispatch code but that count is probably a good starting point.
> 
> It has to be atomic_t, which is too too heavy for us, please see the report:
> 
> 	http://marc.info/?t=149868448400003&r=1&w=2
> 
> Both Jens and I want to kill hd_struct.in_flight, but looks still no
> good way. 

Hello Ming,

Sorry but I disagree that a new atomic variable should be added to keep track
of the number of busy requests. Counting the number of bits that are set in
the tag set should be good enough in this context.

Thanks,

Bart.
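For concreteness, counting the set bits in an hctx's tag bitmaps can be done
with the existing sbitmap_weight() helper. The function below is only a
sketch of that suggestion (the name is made up), and as Ming notes in his
reply, the resulting count is host-wide rather than per-LUN:

/* Sketch only: approximate in-flight requests via allocated tags. */
static unsigned int approx_busy_tags(struct blk_mq_hw_ctx *hctx)
{
	struct blk_mq_tags *tags = hctx->tags;

	return sbitmap_weight(&tags->bitmap_tags.sb) +
	       sbitmap_weight(&tags->breserved_tags.sb);
}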

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 04/14] blk-mq-sched: improve dispatching from sw queue
  2017-08-03  1:35               ` Bart Van Assche
  (?)
@ 2017-08-03  3:13               ` Ming Lei
  2017-08-03 17:33                   ` Bart Van Assche
  -1 siblings, 1 reply; 47+ messages in thread
From: Ming Lei @ 2017-08-03  3:13 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: linux-scsi, hch, linux-block, loberman, axboe, jejb, martin.petersen

On Thu, Aug 03, 2017 at 01:35:29AM +0000, Bart Van Assche wrote:
> On Wed, 2017-08-02 at 11:31 +0800, Ming Lei wrote:
> > On Tue, Aug 01, 2017 at 03:11:42PM +0000, Bart Van Assche wrote:
> > > On Tue, 2017-08-01 at 18:50 +0800, Ming Lei wrote:
> > > > On Tue, Aug 01, 2017 at 06:17:18PM +0800, Ming Lei wrote:
> > > > > How can we get the accurate 'number of requests in progress' efficiently?
> > > 
> > > Hello Ming,
> > > 
> > > How about counting the number of bits that have been set in the tag set?
> > > I am aware that these bits can be set and/or cleared concurrently with the
> > > dispatch code but that count is probably a good starting point.
> > 
> > It has to be atomic_t, which is too too heavy for us, please see the report:
> > 
> > 	http://marc.info/?t=149868448400003&r=1&w=2
> > 
> > Both Jens and I want to kill hd_struct.in_flight, but looks still no
> > good way. 
> 
> Hello Ming,
> 
> Sorry but I disagree that a new atomic variable should be added to keep track
> of the number of busy requests. Counting the number of bits that are set in
> the tag set should be good enough in this context.

That won't work because the tag set is host wide and shared by all LUNs.


-- 
Ming

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 04/14] blk-mq-sched: improve dispatching from sw queue
@ 2017-08-03 17:33                   ` Bart Van Assche
  0 siblings, 0 replies; 47+ messages in thread
From: Bart Van Assche @ 2017-08-03 17:33 UTC (permalink / raw)
  To: ming.lei
  Cc: linux-scsi, hch, jejb, linux-block, axboe, loberman, martin.petersen

On Thu, 2017-08-03 at 11:13 +0800, Ming Lei wrote:
> On Thu, Aug 03, 2017 at 01:35:29AM +0000, Bart Van Assche wrote:
> > On Wed, 2017-08-02 at 11:31 +0800, Ming Lei wrote:
> > > On Tue, Aug 01, 2017 at 03:11:42PM +0000, Bart Van Assche wrote:
> > > > On Tue, 2017-08-01 at 18:50 +0800, Ming Lei wrote:
> > > > > On Tue, Aug 01, 2017 at 06:17:18PM +0800, Ming Lei wrote:
> > > > > > How can we get the accurate 'number of requests in progress' efficiently?
> > > > 
> > > > Hello Ming,
> > > > 
> > > > How about counting the number of bits that have been set in the tag set?
> > > > I am aware that these bits can be set and/or cleared concurrently with the
> > > > dispatch code but that count is probably a good starting point.
> > > 
> > > It has to be atomic_t, which is too too heavy for us, please see the report:
> > > 
> > > 	http://marc.info/?t=149868448400003&r=1&w=2
> > > 
> > > Both Jens and I want to kill hd_struct.in_flight, but looks still no
> > > good way. 
> > 
> > Hello Ming,
> > 
> > Sorry but I disagree that a new atomic variable should be added to keep track
> > of the number of busy requests. Counting the number of bits that are set in
> > the tag set should be good enough in this context.
> 
> That won't work because the tag set is host wide and shared by all LUNs.

Hello Ming,

Are you aware that the SCSI core already keeps track of the number of busy requests
per LUN? See also the device_busy member of struct scsi_device. How about giving the
block layer core access in some way to that counter?

Bart.
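For reference, the per-LUN counter Bart refers to is scsi_device.device_busy,
an atomic_t in this kernel generation. A trivial accessor would look like the
sketch below; the helper name and the idea of handing it to the block layer
are illustrative only, not part of the posted series:

#include <scsi/scsi_device.h>

/* Illustrative only: read the per-LUN busy count the SCSI core keeps. */
static inline unsigned int sdev_busy_count(struct scsi_device *sdev)
{
	return atomic_read(&sdev->device_busy);
}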

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 04/14] blk-mq-sched: improve dispatching from sw queue
  2017-08-03 17:33                   ` Bart Van Assche
  (?)
@ 2017-08-05  8:40                   ` hch
  -1 siblings, 0 replies; 47+ messages in thread
From: hch @ 2017-08-05  8:40 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: ming.lei, linux-scsi, hch, jejb, linux-block, axboe, loberman,
	martin.petersen

On Thu, Aug 03, 2017 at 05:33:13PM +0000, Bart Van Assche wrote:
> Are you aware that the SCSI core already keeps track of the number of busy requests
> per LUN? See also the device_busy member of struct scsi_device. How about giving the
> block layer core access in some way to that counter?

I'd love to move it to blk-mq in a scalable way eventually, as
we'll run into the same problems with NVMe systems with multiple
namespaces.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 04/14] blk-mq-sched: improve dispatching from sw queue
  2017-08-03 17:33                   ` Bart Van Assche
  (?)
  (?)
@ 2017-08-05 13:40                   ` Ming Lei
  -1 siblings, 0 replies; 47+ messages in thread
From: Ming Lei @ 2017-08-05 13:40 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: linux-scsi, hch, jejb, linux-block, axboe, loberman, martin.petersen

On Thu, Aug 03, 2017 at 05:33:13PM +0000, Bart Van Assche wrote:
> On Thu, 2017-08-03 at 11:13 +0800, Ming Lei wrote:
> > On Thu, Aug 03, 2017 at 01:35:29AM +0000, Bart Van Assche wrote:
> > > On Wed, 2017-08-02 at 11:31 +0800, Ming Lei wrote:
> > > > On Tue, Aug 01, 2017 at 03:11:42PM +0000, Bart Van Assche wrote:
> > > > > On Tue, 2017-08-01 at 18:50 +0800, Ming Lei wrote:
> > > > > > On Tue, Aug 01, 2017 at 06:17:18PM +0800, Ming Lei wrote:
> > > > > > > How can we get the accurate 'number of requests in progress' efficiently?
> > > > > 
> > > > > Hello Ming,
> > > > > 
> > > > > How about counting the number of bits that have been set in the tag set?
> > > > > I am aware that these bits can be set and/or cleared concurrently with the
> > > > > dispatch code but that count is probably a good starting point.
> > > > 
> > > > It has to be atomic_t, which is too too heavy for us, please see the report:
> > > > 
> > > > 	http://marc.info/?t=149868448400003&r=1&w=2
> > > > 
> > > > Both Jens and I want to kill hd_struct.in_flight, but looks still no
> > > > good way. 
> > > 
> > > Hello Ming,
> > > 
> > > Sorry but I disagree that a new atomic variable should be added to keep track
> > > of the number of busy requests. Counting the number of bits that are set in
> > > the tag set should be good enough in this context.
> > 
> > That won't work because the tag set is host wide and shared by all LUNs.
> 
> Hello Ming,
> 
> Are you aware that the SCSI core already keeps track of the number of busy requests
> per LUN? See also the device_busy member of struct scsi_device. How about giving the
> block layer core access in some way to that counter?

Yes, I know that.

Last time I mentioned to Christoph that this counter could be used for
implementing runtime PM, to avoid introducing a new counter for
accounting pending I/O.

But for this purpose (estimating how many requests to dequeue from the
hctxs), it isn't a good idea:

1) strictly speaking, an atomic counter isn't enough and a lock is
needed, because we need to make sure that the counter can't change
while requests are being dequeued, so simply exporting the counter to
the block layer won't work

2) even if it is treated as only an estimate and no lock is taken, it
still isn't good, because for some SCSI devices q->queue_depth is very
small; both qla2xxx's and lpfc's .cmd_per_lun are 3. So the count can
be very inaccurate, since it is normal to dequeue requests from all
hctxs at the same time.

Also, I have posted V2 today. From the test results on SRP, it looks
good to dequeue one request at a time, so I suggest that we follow the
mq scheduler's way of dequeuing requests (pick up one at a time) for
blk-mq 'none' in this patchset; a rough sketch of that loop is appended
below. We may consider improving it in the future if there is a better
and more mature idea.


Thanks,
Ming
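
As an editorial aside, here is a minimal sketch of the "pick up one request
at a time" loop described above, built on the blk_mq_dispatch_rq_from_ctxs()
helper shown earlier in the thread. It is illustrative only, not the code
posted in V2:

static void dispatch_one_by_one(struct blk_mq_hw_ctx *hctx)
{
	struct request_queue *q = hctx->queue;
	LIST_HEAD(rq_list);

	/*
	 * Hand one request at a time to the driver; the loop stops when
	 * blk_mq_dispatch_rq_list() makes no progress (e.g. the LLD
	 * reported busy), leaving the remaining requests in the sw queues
	 * where they can still be merged with new I/O.
	 */
	do {
		struct request *rq = blk_mq_dispatch_rq_from_ctxs(hctx);

		if (!rq)
			break;
		list_add(&rq->queuelist, &rq_list);
	} while (blk_mq_dispatch_rq_list(q, &rq_list));
}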

^ permalink raw reply	[flat|nested] 47+ messages in thread

end of thread, other threads:[~2017-08-05 13:40 UTC | newest]

Thread overview: 47+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-07-31 16:50 [PATCH 00/14] blk-mq-sched: fix SCSI-MQ performance regression Ming Lei
2017-07-31 16:50 ` Ming Lei
2017-07-31 16:50 ` [PATCH 01/14] blk-mq-sched: fix scheduler bad performance Ming Lei
2017-07-31 23:00   ` Bart Van Assche
2017-07-31 23:00     ` Bart Van Assche
2017-07-31 16:50 ` [PATCH 02/14] blk-mq: rename flush_busy_ctx_data as ctx_iter_data Ming Lei
2017-07-31 23:03   ` Bart Van Assche
2017-07-31 23:03     ` Bart Van Assche
2017-07-31 16:51 ` [PATCH 03/14] blk-mq: introduce blk_mq_dispatch_rq_from_ctxs() Ming Lei
2017-07-31 23:09   ` Bart Van Assche
2017-07-31 23:09     ` Bart Van Assche
2017-08-01 10:07     ` Ming Lei
2017-08-02 17:19   ` kbuild test robot
2017-08-02 17:19     ` kbuild test robot
2017-07-31 16:51 ` [PATCH 04/14] blk-mq-sched: improve dispatching from sw queue Ming Lei
2017-07-31 23:34   ` Bart Van Assche
2017-07-31 23:34     ` Bart Van Assche
2017-08-01 10:17     ` Ming Lei
2017-08-01 10:50       ` Ming Lei
2017-08-01 15:11         ` Bart Van Assche
2017-08-01 15:11           ` Bart Van Assche
2017-08-02  3:31           ` Ming Lei
2017-08-03  1:35             ` Bart Van Assche
2017-08-03  1:35               ` Bart Van Assche
2017-08-03  3:13               ` Ming Lei
2017-08-03 17:33                 ` Bart Van Assche
2017-08-03 17:33                   ` Bart Van Assche
2017-08-05  8:40                   ` hch
2017-08-05 13:40                   ` Ming Lei
2017-07-31 16:51 ` [PATCH 05/14] blk-mq-sched: don't dequeue request until all in ->dispatch are flushed Ming Lei
2017-07-31 23:42   ` Bart Van Assche
2017-07-31 23:42     ` Bart Van Assche
2017-08-01 10:44     ` Ming Lei
2017-08-01 16:14       ` Bart Van Assche
2017-08-01 16:14         ` Bart Van Assche
2017-08-02  3:01         ` Ming Lei
2017-08-03  1:33           ` Bart Van Assche
2017-08-03  1:33             ` Bart Van Assche
2017-07-31 16:51 ` [PATCH 06/14] blk-mq-sched: introduce blk_mq_sched_queue_depth() Ming Lei
2017-07-31 16:51 ` [PATCH 07/14] blk-mq-sched: use q->queue_depth as hint for q->nr_requests Ming Lei
2017-07-31 16:51 ` [PATCH 08/14] blk-mq: introduce BLK_MQ_F_SHARED_DEPTH Ming Lei
2017-07-31 16:51 ` [PATCH 09/14] blk-mq-sched: cleanup blk_mq_sched_dispatch_requests() Ming Lei
2017-07-31 16:51 ` [PATCH 10/14] blk-mq-sched: introduce helpers for query, change busy state Ming Lei
2017-07-31 16:51 ` [PATCH 11/14] blk-mq: introduce helpers for operating ->dispatch list Ming Lei
2017-07-31 16:51 ` [PATCH 12/14] blk-mq: introduce pointers to dispatch lock & list Ming Lei
2017-07-31 16:51 ` [PATCH 13/14] blk-mq: pass 'request_queue *' to several helpers of operating BUSY Ming Lei
2017-07-31 16:51 ` [PATCH 14/14] blk-mq-sched: improve IO scheduling on SCSI devcie Ming Lei

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.