* [PATCHSET 0/8] block plugging improvements
@ 2018-11-26 16:35 ` Jens Axboe
  0 siblings, 0 replies; 84+ messages in thread
From: Jens Axboe @ 2018-11-26 16:35 UTC (permalink / raw)
  To: linux-block, linux-nvme

Series improving plugging for fast devices, but some fixes in here too.

1-2 are improvements around plugging accounting. This changes the
behavior a bit, but works fine for me.

3-6 add a ->commit_rqs() hook and implement it in drivers that use (or
will use) bd->last to optimize IO submission. If a driver currently uses
bd->last to know when it needs to kick the hardware into action, there
are cases where we flag bd->last == false, but then fail to submit any
further IO due to other resource constraints. We probably get saved by
the fact that this only happens when we have pending IO, which will
eventually guarantee forward progress, but we really should kick IO into
gear at that point. A rough sketch of the problem is included below.
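
As a simplified illustration (a sketch of the dispatch loop, not the
actual blk-mq code), the case that 3-6 address looks like this:

	while (!list_empty(list)) {
		struct request *rq = list_first_entry(list, struct request,
						      queuelist);

		list_del_init(&rq->queuelist);
		bd.rq = rq;
		bd.last = list_empty(list);	/* more requests to come? */

		ret = q->mq_ops->queue_rq(hctx, &bd);
		if (ret != BLK_STS_OK) {
			/*
			 * Earlier requests were issued with bd.last == false,
			 * so the driver may still be deferring its doorbell
			 * write. Nothing kicks the hardware here, which is
			 * exactly what ->commit_rqs() is for.
			 */
			break;
		}
	}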

7-8 improve plugging for blk-mq.

In terms of improvements, for virtualized nvme I've seen a 2x IOPS
improvement from proper handling of bd->last with this series.

-- 
Jens Axboe



^ permalink raw reply	[flat|nested] 84+ messages in thread

* [PATCH 1/8] block: sum requests in the plug structure
  2018-11-26 16:35 ` Jens Axboe
@ 2018-11-26 16:35   ` Jens Axboe
  -1 siblings, 0 replies; 84+ messages in thread
From: Jens Axboe @ 2018-11-26 16:35 UTC (permalink / raw)
  To: linux-block, linux-nvme; +Cc: Jens Axboe

This isn't exactly the same as the previous count, as it includes
requests for all devices. But that really doesn't matter: if we have
more than the threshold (16) queued up, flush it. It's not worth having
an expensive list loop for this.
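
For context, a minimal sketch of how a submitter drives a plug (this is
standard plugging usage, not new code in this patch; bio1/bio2 are
placeholders):

	struct blk_plug plug;

	blk_start_plug(&plug);
	/*
	 * Requests queued while the plug is active accumulate on
	 * plug->mq_list; plug->rq_count now tracks how many are pending
	 * across all devices, replacing the old per-queue list walk.
	 */
	submit_bio(bio1);
	submit_bio(bio2);
	blk_finish_plug(&plug);	/* flush anything left on the plug list */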

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 block/blk-core.c       | 30 ++++--------------------------
 block/blk-mq.c         | 16 +++++-----------
 block/blk.h            |  2 --
 include/linux/blkdev.h |  1 +
 4 files changed, 10 insertions(+), 39 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 9af56dbb84f1..be9233400314 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -736,7 +736,6 @@ bool bio_attempt_discard_merge(struct request_queue *q, struct request *req,
  * Caller must ensure !blk_queue_nomerges(q) beforehand.
  */
 bool blk_attempt_plug_merge(struct request_queue *q, struct bio *bio,
-			    unsigned int *request_count,
 			    struct request **same_queue_rq)
 {
 	struct blk_plug *plug;
@@ -746,22 +745,19 @@ bool blk_attempt_plug_merge(struct request_queue *q, struct bio *bio,
 	plug = current->plug;
 	if (!plug)
 		return false;
-	*request_count = 0;
 
 	plug_list = &plug->mq_list;
 
 	list_for_each_entry_reverse(rq, plug_list, queuelist) {
 		bool merged = false;
 
-		if (rq->q == q) {
-			(*request_count)++;
+		if (rq->q == q && same_queue_rq) {
 			/*
 			 * Only blk-mq multiple hardware queues case checks the
 			 * rq in the same queue, there should be only one such
 			 * rq in a queue
 			 **/
-			if (same_queue_rq)
-				*same_queue_rq = rq;
+			*same_queue_rq = rq;
 		}
 
 		if (rq->q != q || !blk_rq_merge_ok(rq, bio))
@@ -788,26 +784,6 @@ bool blk_attempt_plug_merge(struct request_queue *q, struct bio *bio,
 	return false;
 }
 
-unsigned int blk_plug_queued_count(struct request_queue *q)
-{
-	struct blk_plug *plug;
-	struct request *rq;
-	struct list_head *plug_list;
-	unsigned int ret = 0;
-
-	plug = current->plug;
-	if (!plug)
-		goto out;
-
-	plug_list = &plug->mq_list;
-	list_for_each_entry(rq, plug_list, queuelist) {
-		if (rq->q == q)
-			ret++;
-	}
-out:
-	return ret;
-}
-
 void blk_init_request_from_bio(struct request *req, struct bio *bio)
 {
 	if (bio->bi_opf & REQ_RAHEAD)
@@ -1803,6 +1779,8 @@ void blk_start_plug(struct blk_plug *plug)
 
 	INIT_LIST_HEAD(&plug->mq_list);
 	INIT_LIST_HEAD(&plug->cb_list);
+	plug->rq_count = 0;
+
 	/*
 	 * Store ordering should not be needed here, since a potential
 	 * preempt will imply a full memory barrier
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 37674c1766a7..99c66823d52f 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1676,6 +1676,7 @@ void blk_mq_flush_plug_list(struct blk_plug *plug, bool from_schedule)
 	unsigned int depth;
 
 	list_splice_init(&plug->mq_list, &list);
+	plug->rq_count = 0;
 
 	list_sort(NULL, &list, plug_rq_cmp);
 
@@ -1872,7 +1873,6 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
 	const int is_flush_fua = op_is_flush(bio->bi_opf);
 	struct blk_mq_alloc_data data = { .flags = 0, .cmd_flags = bio->bi_opf };
 	struct request *rq;
-	unsigned int request_count = 0;
 	struct blk_plug *plug;
 	struct request *same_queue_rq = NULL;
 	blk_qc_t cookie;
@@ -1885,7 +1885,7 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
 		return BLK_QC_T_NONE;
 
 	if (!is_flush_fua && !blk_queue_nomerges(q) &&
-	    blk_attempt_plug_merge(q, bio, &request_count, &same_queue_rq))
+	    blk_attempt_plug_merge(q, bio, &same_queue_rq))
 		return BLK_QC_T_NONE;
 
 	if (blk_mq_sched_bio_merge(q, bio))
@@ -1916,20 +1916,12 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
 		blk_insert_flush(rq);
 		blk_mq_run_hw_queue(data.hctx, true);
 	} else if (plug && q->nr_hw_queues == 1) {
+		unsigned int request_count = plug->rq_count;
 		struct request *last = NULL;
 
 		blk_mq_put_ctx(data.ctx);
 		blk_mq_bio_to_request(rq, bio);
 
-		/*
-		 * @request_count may become stale because of schedule
-		 * out, so check the list again.
-		 */
-		if (list_empty(&plug->mq_list))
-			request_count = 0;
-		else if (blk_queue_nomerges(q))
-			request_count = blk_plug_queued_count(q);
-
 		if (!request_count)
 			trace_block_plug(q);
 		else
@@ -1942,6 +1934,7 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
 		}
 
 		list_add_tail(&rq->queuelist, &plug->mq_list);
+		plug->rq_count++;
 	} else if (plug && !blk_queue_nomerges(q)) {
 		blk_mq_bio_to_request(rq, bio);
 
@@ -1957,6 +1950,7 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
 		if (same_queue_rq)
 			list_del_init(&same_queue_rq->queuelist);
 		list_add_tail(&rq->queuelist, &plug->mq_list);
+		plug->rq_count++;
 
 		blk_mq_put_ctx(data.ctx);
 
diff --git a/block/blk.h b/block/blk.h
index 610948157a5b..848278c52030 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -161,9 +161,7 @@ bool bio_attempt_back_merge(struct request_queue *q, struct request *req,
 bool bio_attempt_discard_merge(struct request_queue *q, struct request *req,
 		struct bio *bio);
 bool blk_attempt_plug_merge(struct request_queue *q, struct bio *bio,
-			    unsigned int *request_count,
 			    struct request **same_queue_rq);
-unsigned int blk_plug_queued_count(struct request_queue *q);
 
 void blk_account_io_start(struct request *req, bool new_io);
 void blk_account_io_completion(struct request *req, unsigned int bytes);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index e3c0a8ec16a7..02732cae6080 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1130,6 +1130,7 @@ extern void blk_set_queue_dying(struct request_queue *);
 struct blk_plug {
 	struct list_head mq_list; /* blk-mq requests */
 	struct list_head cb_list; /* md requires an unplug callback */
+	unsigned short rq_count;
 };
 #define BLK_MAX_REQUEST_COUNT 16
 #define BLK_PLUG_FLUSH_SIZE (128 * 1024)
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 2/8] block: improve logic around when to sort a plug list
  2018-11-26 16:35 ` Jens Axboe
@ 2018-11-26 16:35   ` Jens Axboe
  -1 siblings, 0 replies; 84+ messages in thread
From: Jens Axboe @ 2018-11-26 16:35 UTC (permalink / raw)
  To: linux-block, linux-nvme; +Cc: Jens Axboe

Sort the plug list for the nr_hw_queues == 1 case, but only do it for
the multi queue case if we have requests for multiple devices in the
plug.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 block/blk-core.c       | 1 +
 block/blk-mq.c         | 7 +++++--
 include/linux/blkdev.h | 1 +
 3 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index be9233400314..c9758d185357 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1780,6 +1780,7 @@ void blk_start_plug(struct blk_plug *plug)
 	INIT_LIST_HEAD(&plug->mq_list);
 	INIT_LIST_HEAD(&plug->cb_list);
 	plug->rq_count = 0;
+	plug->do_sort = false;
 
 	/*
 	 * Store ordering should not be needed here, since a potential
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 99c66823d52f..6a249bf6ed00 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1678,7 +1678,8 @@ void blk_mq_flush_plug_list(struct blk_plug *plug, bool from_schedule)
 	list_splice_init(&plug->mq_list, &list);
 	plug->rq_count = 0;
 
-	list_sort(NULL, &list, plug_rq_cmp);
+	if (plug->do_sort)
+		list_sort(NULL, &list, plug_rq_cmp);
 
 	this_q = NULL;
 	this_hctx = NULL;
@@ -1935,6 +1936,7 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
 
 		list_add_tail(&rq->queuelist, &plug->mq_list);
 		plug->rq_count++;
+		plug->do_sort = true;
 	} else if (plug && !blk_queue_nomerges(q)) {
 		blk_mq_bio_to_request(rq, bio);
 
@@ -1958,7 +1960,8 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
 			data.hctx = same_queue_rq->mq_hctx;
 			blk_mq_try_issue_directly(data.hctx, same_queue_rq,
 					&cookie);
-		}
+		} else if (plug->rq_count > 1)
+			plug->do_sort = true;
 	} else if ((q->nr_hw_queues > 1 && is_sync) || (!q->elevator &&
 			!data.hctx->dispatch_busy)) {
 		blk_mq_put_ctx(data.ctx);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 02732cae6080..b6d0b33feee7 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1131,6 +1131,7 @@ struct blk_plug {
 	struct list_head mq_list; /* blk-mq requests */
 	struct list_head cb_list; /* md requires an unplug callback */
 	unsigned short rq_count;
+	bool do_sort;
 };
 #define BLK_MAX_REQUEST_COUNT 16
 #define BLK_PLUG_FLUSH_SIZE (128 * 1024)
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 3/8] blk-mq: add mq_ops->commit_rqs()
  2018-11-26 16:35 ` Jens Axboe
@ 2018-11-26 16:35   ` Jens Axboe
  -1 siblings, 0 replies; 84+ messages in thread
From: Jens Axboe @ 2018-11-26 16:35 UTC (permalink / raw)
  To: linux-block, linux-nvme; +Cc: Jens Axboe

blk-mq passes information to the hardware about any given request being
the last that we will issue in this sequence. The point is that hardware
can defer costly doorbell type writes to the last request. But if we run
into errors issuing a sequence of requests, we may never send the request
with bd->last == true set. For that case, we need a hook that tells the
hardware that nothing else is coming right now.

For failures returned by the driver's ->queue_rq() hook, the driver is
responsible for flushing pending requests, if it uses bd->last to
optimize that part. This works like before, no changes there.
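
To make the contract concrete, a driver pairing the two hooks would look
roughly like the sketch below (foo_* names are hypothetical; the real
implementations follow in the next patches):

	static blk_status_t foo_queue_rq(struct blk_mq_hw_ctx *hctx,
					 const struct blk_mq_queue_data *bd)
	{
		foo_write_cmd(hctx->driver_data, bd->rq);
		if (bd->last)
			foo_ring_doorbell(hctx->driver_data);
		return BLK_STS_OK;
	}

	static void foo_commit_rqs(struct blk_mq_hw_ctx *hctx)
	{
		/* issue stopped early after bd->last == false was passed */
		foo_ring_doorbell(hctx->driver_data);
	}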

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 include/linux/blk-mq.h | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index ca0520ca6437..1fd139b65a6e 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -117,6 +117,7 @@ struct blk_mq_queue_data {
 
 typedef blk_status_t (queue_rq_fn)(struct blk_mq_hw_ctx *,
 		const struct blk_mq_queue_data *);
+typedef void (commit_rqs_fn)(struct blk_mq_hw_ctx *);
 /* takes rq->cmd_flags as input, returns a hardware type index */
 typedef int (rq_flags_to_type_fn)(struct request_queue *, unsigned int);
 typedef bool (get_budget_fn)(struct blk_mq_hw_ctx *);
@@ -144,6 +145,15 @@ struct blk_mq_ops {
 	 */
 	queue_rq_fn		*queue_rq;
 
+	/*
+	 * If a driver uses bd->last to judge when to submit requests to
+	 * hardware, it must define this function. In case of errors that
+	 * make us stop issuing further requests, this hook serves the
+	 * purpose of kicking the hardware (which the last request otherwise
+	 * would have done).
+	 */
+	commit_rqs_fn		*commit_rqs;
+
 	/*
 	 * Return a queue map type for the given request/bio flags
 	 */
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 4/8] nvme: implement mq_ops->commit_rqs() hook
  2018-11-26 16:35 ` Jens Axboe
@ 2018-11-26 16:35   ` Jens Axboe
  -1 siblings, 0 replies; 84+ messages in thread
From: Jens Axboe @ 2018-11-26 16:35 UTC (permalink / raw)
  To: linux-block, linux-nvme; +Cc: Jens Axboe

Split the command submission and the SQ doorbell ring, and add the
doorbell ring as our ->commit_rqs() hook. This allows a list of
requests to be issued, with nvme only writing the SQ update when
it's necessary. This is more efficient if we have lists of requests
to issue, particularly on virtualized hardware, where writing the
SQ doorbell is more expensive than on real hardware. For those cases,
performance increases of 2-3x have been observed.

The use case for this is plugged IO, where blk-mq flushes a batch of
requests at a time.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 drivers/nvme/host/pci.c | 52 +++++++++++++++++++++++++++++++++--------
 1 file changed, 42 insertions(+), 10 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 73effe586e5f..d503bf6cd8ba 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -203,6 +203,7 @@ struct nvme_queue {
 	u16 q_depth;
 	s16 cq_vector;
 	u16 sq_tail;
+	u16 last_sq_tail;
 	u16 cq_head;
 	u16 last_cq_head;
 	u16 qid;
@@ -522,22 +523,52 @@ static int nvme_pci_map_queues(struct blk_mq_tag_set *set)
 	return 0;
 }
 
+static inline void nvme_write_sq_db(struct nvme_queue *nvmeq)
+{
+	if (nvme_dbbuf_update_and_check_event(nvmeq->sq_tail,
+			nvmeq->dbbuf_sq_db, nvmeq->dbbuf_sq_ei))
+		writel(nvmeq->sq_tail, nvmeq->q_db);
+	nvmeq->last_sq_tail = nvmeq->sq_tail;
+}
+
+static inline int nvme_next_ring_index(struct nvme_queue *nvmeq, u16 index)
+{
+	if (++index == nvmeq->q_depth)
+		return 0;
+
+	return index;
+}
+
 /**
  * nvme_submit_cmd() - Copy a command into a queue and ring the doorbell
  * @nvmeq: The queue to use
  * @cmd: The command to send
+ * @write_sq: whether to write to the SQ doorbell
  */
-static void nvme_submit_cmd(struct nvme_queue *nvmeq, struct nvme_command *cmd)
+static void nvme_submit_cmd(struct nvme_queue *nvmeq, struct nvme_command *cmd,
+			    bool write_sq)
 {
 	spin_lock(&nvmeq->sq_lock);
 
 	memcpy(&nvmeq->sq_cmds[nvmeq->sq_tail], cmd, sizeof(*cmd));
 
-	if (++nvmeq->sq_tail == nvmeq->q_depth)
-		nvmeq->sq_tail = 0;
-	if (nvme_dbbuf_update_and_check_event(nvmeq->sq_tail,
-			nvmeq->dbbuf_sq_db, nvmeq->dbbuf_sq_ei))
-		writel(nvmeq->sq_tail, nvmeq->q_db);
+	/*
+	 * Write sq tail if we have to, OR if the next command would wrap
+	 */
+	nvmeq->sq_tail = nvme_next_ring_index(nvmeq, nvmeq->sq_tail);
+	if (write_sq ||
+	    nvme_next_ring_index(nvmeq, nvmeq->sq_tail) == nvmeq->last_sq_tail)
+		nvme_write_sq_db(nvmeq);
+	spin_unlock(&nvmeq->sq_lock);
+}
+
+static void nvme_commit_rqs(struct blk_mq_hw_ctx *hctx)
+{
+	struct nvme_queue *nvmeq = hctx->driver_data;
+
+	spin_lock(&nvmeq->sq_lock);
+	if (nvmeq->sq_tail != nvmeq->last_sq_tail)
+		nvme_write_sq_db(nvmeq);
 	spin_unlock(&nvmeq->sq_lock);
 }
 
@@ -923,7 +954,7 @@ static blk_status_t nvme_queue_rq(struct blk_mq_hw_ctx *hctx,
 	}
 
 	blk_mq_start_request(req);
-	nvme_submit_cmd(nvmeq, &cmnd);
+	nvme_submit_cmd(nvmeq, &cmnd, bd->last);
 	return BLK_STS_OK;
 out_cleanup_iod:
 	nvme_free_iod(dev, req);
@@ -999,8 +1030,7 @@ static void nvme_complete_cqes(struct nvme_queue *nvmeq, u16 start, u16 end)
 {
 	while (start != end) {
 		nvme_handle_cqe(nvmeq, start);
-		if (++start == nvmeq->q_depth)
-			start = 0;
+		start = nvme_next_ring_index(nvmeq, start);
 	}
 }
 
@@ -1108,7 +1138,7 @@ static void nvme_pci_submit_async_event(struct nvme_ctrl *ctrl)
 	memset(&c, 0, sizeof(c));
 	c.common.opcode = nvme_admin_async_event;
 	c.common.command_id = NVME_AQ_BLK_MQ_DEPTH;
-	nvme_submit_cmd(nvmeq, &c);
+	nvme_submit_cmd(nvmeq, &c, true);
 }
 
 static int adapter_delete_queue(struct nvme_dev *dev, u8 opcode, u16 id)
@@ -1531,6 +1561,7 @@ static void nvme_init_queue(struct nvme_queue *nvmeq, u16 qid)
 
 	spin_lock_irq(&nvmeq->cq_lock);
 	nvmeq->sq_tail = 0;
+	nvmeq->last_sq_tail = 0;
 	nvmeq->cq_head = 0;
 	nvmeq->cq_phase = 1;
 	nvmeq->q_db = &dev->dbs[qid * 2 * dev->db_stride];
@@ -1603,6 +1634,7 @@ static const struct blk_mq_ops nvme_mq_admin_ops = {
 
 #define NVME_SHARED_MQ_OPS					\
 	.queue_rq		= nvme_queue_rq,		\
+	.commit_rqs		= nvme_commit_rqs,		\
 	.rq_flags_to_type	= nvme_rq_flags_to_type,	\
 	.complete		= nvme_pci_complete_rq,		\
 	.init_hctx		= nvme_init_hctx,		\
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 5/8] virtio_blk: implement mq_ops->commit_rqs() hook
  2018-11-26 16:35 ` Jens Axboe
@ 2018-11-26 16:35   ` Jens Axboe
  -1 siblings, 0 replies; 84+ messages in thread
From: Jens Axboe @ 2018-11-26 16:35 UTC (permalink / raw)
  To: linux-block, linux-nvme; +Cc: Jens Axboe

We need this hook for blk-mq to kick things into gear if we told the
driver that more IO was coming, but then failed to deliver on that
promise.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 drivers/block/virtio_blk.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 6e869d05f91e..b49c57e77780 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -214,6 +214,20 @@ static void virtblk_done(struct virtqueue *vq)
 	spin_unlock_irqrestore(&vblk->vqs[qid].lock, flags);
 }
 
+static void virtio_commit_rqs(struct blk_mq_hw_ctx *hctx)
+{
+	struct virtio_blk *vblk = hctx->queue->queuedata;
+	int qid = hctx->queue_num;
+	bool kick;
+
+	spin_lock_irq(&vblk->vqs[qid].lock);
+	kick = virtqueue_kick_prepare(vblk->vqs[qid].vq);
+	spin_unlock_irq(&vblk->vqs[qid].lock);
+
+	if (kick)
+		virtqueue_notify(vblk->vqs[qid].vq);
+}
+
 static blk_status_t virtio_queue_rq(struct blk_mq_hw_ctx *hctx,
 			   const struct blk_mq_queue_data *bd)
 {
@@ -638,6 +652,7 @@ static void virtblk_initialize_rq(struct request *req)
 
 static const struct blk_mq_ops virtio_mq_ops = {
 	.queue_rq	= virtio_queue_rq,
+	.commit_rqs	= virtio_commit_rqs,
 	.complete	= virtblk_request_done,
 	.init_request	= virtblk_init_request,
 #ifdef CONFIG_VIRTIO_BLK_SCSI
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 6/8] ataflop: implement mq_ops->commit_rqs() hook
  2018-11-26 16:35 ` Jens Axboe
@ 2018-11-26 16:35   ` Jens Axboe
  -1 siblings, 0 replies; 84+ messages in thread
From: Jens Axboe @ 2018-11-26 16:35 UTC (permalink / raw)
  To: linux-block, linux-nvme; +Cc: Jens Axboe

We need this hook for blk-mq to kick things into gear if we told the
driver that more IO was coming, but then failed to deliver on that
promise.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 drivers/block/ataflop.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/block/ataflop.c b/drivers/block/ataflop.c
index f88b4c26d422..475cb972f324 100644
--- a/drivers/block/ataflop.c
+++ b/drivers/block/ataflop.c
@@ -1471,6 +1471,13 @@ static void setup_req_params( int drive )
 			ReqTrack, ReqSector, (unsigned long)ReqData ));
 }
 
+static void ataflop_commit_rqs(struct blk_mq_hw_ctx *hctx)
+{
+	spin_lock_irq(&ataflop_lock);
+	finish_fdc();
+	spin_unlock_irq(&ataflop_lock);
+}
+
 static blk_status_t ataflop_queue_rq(struct blk_mq_hw_ctx *hctx,
 				     const struct blk_mq_queue_data *bd)
 {
@@ -1947,6 +1954,7 @@ static const struct block_device_operations floppy_fops = {
 
 static const struct blk_mq_ops ataflop_mq_ops = {
 	.queue_rq = ataflop_queue_rq,
+	.commit_rqs = ataflop_commit_rqs,
 };
 
 static struct kobject *floppy_find(dev_t dev, int *part, void *data)
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 7/8] blk-mq: use bd->last == true for list inserts
  2018-11-26 16:35 ` Jens Axboe
@ 2018-11-26 16:35   ` Jens Axboe
  -1 siblings, 0 replies; 84+ messages in thread
From: Jens Axboe @ 2018-11-26 16:35 UTC (permalink / raw)
  To: linux-block, linux-nvme; +Cc: Jens Axboe

If we are issuing a list of requests, we know when we're at the last
one. If issuing fails, ensure that we call ->commit_rqs() to flush any
previously issued requests.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 block/blk-core.c |  2 +-
 block/blk-mq.c   | 32 ++++++++++++++++++++++++--------
 block/blk-mq.h   |  2 +-
 3 files changed, 26 insertions(+), 10 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index c9758d185357..808a65d23f1a 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1334,7 +1334,7 @@ blk_status_t blk_insert_cloned_request(struct request_queue *q, struct request *
 	 * bypass a potential scheduler on the bottom device for
 	 * insert.
 	 */
-	return blk_mq_request_issue_directly(rq);
+	return blk_mq_request_issue_directly(rq, true);
 }
 EXPORT_SYMBOL_GPL(blk_insert_cloned_request);
 
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 6a249bf6ed00..0a12cec0b426 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1260,6 +1260,14 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list,
 	if (!list_empty(list)) {
 		bool needs_restart;
 
+		/*
+		 * If we didn't flush the entire list, we could have told
+		 * the driver there was more coming, but that turned out to
+		 * be a lie.
+		 */
+		if (q->mq_ops->commit_rqs)
+			q->mq_ops->commit_rqs(hctx);
+
 		spin_lock(&hctx->lock);
 		list_splice_init(list, &hctx->dispatch);
 		spin_unlock(&hctx->lock);
@@ -1736,12 +1744,12 @@ static blk_qc_t request_to_qc_t(struct blk_mq_hw_ctx *hctx, struct request *rq)
 
 static blk_status_t __blk_mq_issue_directly(struct blk_mq_hw_ctx *hctx,
 					    struct request *rq,
-					    blk_qc_t *cookie)
+					    blk_qc_t *cookie, bool last)
 {
 	struct request_queue *q = rq->q;
 	struct blk_mq_queue_data bd = {
 		.rq = rq,
-		.last = true,
+		.last = last,
 	};
 	blk_qc_t new_cookie;
 	blk_status_t ret;
@@ -1776,7 +1784,7 @@ static blk_status_t __blk_mq_issue_directly(struct blk_mq_hw_ctx *hctx,
 static blk_status_t __blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx,
 						struct request *rq,
 						blk_qc_t *cookie,
-						bool bypass_insert)
+						bool bypass_insert, bool last)
 {
 	struct request_queue *q = rq->q;
 	bool run_queue = true;
@@ -1805,7 +1813,7 @@ static blk_status_t __blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx,
 		goto insert;
 	}
 
-	return __blk_mq_issue_directly(hctx, rq, cookie);
+	return __blk_mq_issue_directly(hctx, rq, cookie, last);
 insert:
 	if (bypass_insert)
 		return BLK_STS_RESOURCE;
@@ -1824,7 +1832,7 @@ static void blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx,
 
 	hctx_lock(hctx, &srcu_idx);
 
-	ret = __blk_mq_try_issue_directly(hctx, rq, cookie, false);
+	ret = __blk_mq_try_issue_directly(hctx, rq, cookie, false, true);
 	if (ret == BLK_STS_RESOURCE || ret == BLK_STS_DEV_RESOURCE)
 		blk_mq_sched_insert_request(rq, false, true, false);
 	else if (ret != BLK_STS_OK)
@@ -1833,7 +1841,7 @@ static void blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx,
 	hctx_unlock(hctx, srcu_idx);
 }
 
-blk_status_t blk_mq_request_issue_directly(struct request *rq)
+blk_status_t blk_mq_request_issue_directly(struct request *rq, bool last)
 {
 	blk_status_t ret;
 	int srcu_idx;
@@ -1841,7 +1849,7 @@ blk_status_t blk_mq_request_issue_directly(struct request *rq)
 	struct blk_mq_hw_ctx *hctx = rq->mq_hctx;
 
 	hctx_lock(hctx, &srcu_idx);
-	ret = __blk_mq_try_issue_directly(hctx, rq, &unused_cookie, true);
+	ret = __blk_mq_try_issue_directly(hctx, rq, &unused_cookie, true, last);
 	hctx_unlock(hctx, srcu_idx);
 
 	return ret;
@@ -1856,7 +1864,7 @@ void blk_mq_try_issue_list_directly(struct blk_mq_hw_ctx *hctx,
 				queuelist);
 
 		list_del_init(&rq->queuelist);
-		ret = blk_mq_request_issue_directly(rq);
+		ret = blk_mq_request_issue_directly(rq, list_empty(list));
 		if (ret != BLK_STS_OK) {
 			if (ret == BLK_STS_RESOURCE ||
 					ret == BLK_STS_DEV_RESOURCE) {
@@ -1866,6 +1874,14 @@ void blk_mq_try_issue_list_directly(struct blk_mq_hw_ctx *hctx,
 			blk_mq_end_request(rq, ret);
 		}
 	}
+
+	/*
+	 * If we didn't flush the entire list, we could have told
+	 * the driver there was more coming, but that turned out to
+	 * be a lie.
+	 */
+	if (!list_empty(list) && hctx->queue->mq_ops->commit_rqs)
+		hctx->queue->mq_ops->commit_rqs(hctx);
 }
 
 static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 9ae8e9f8f8b1..7291e5379358 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -69,7 +69,7 @@ void blk_mq_insert_requests(struct blk_mq_hw_ctx *hctx, struct blk_mq_ctx *ctx,
 				struct list_head *list);
 
 /* Used by blk_insert_cloned_request() to issue request directly */
-blk_status_t blk_mq_request_issue_directly(struct request *rq);
+blk_status_t blk_mq_request_issue_directly(struct request *rq, bool last);
 void blk_mq_try_issue_list_directly(struct blk_mq_hw_ctx *hctx,
 				    struct list_head *list);
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 8/8] blk-mq: add plug case for devices that implement ->commit_rqs()
  2018-11-26 16:35 ` Jens Axboe
@ 2018-11-26 16:35   ` Jens Axboe
  -1 siblings, 0 replies; 84+ messages in thread
From: Jens Axboe @ 2018-11-26 16:35 UTC (permalink / raw)
  To: linux-block, linux-nvme; +Cc: Jens Axboe

If we have that hook, we know the driver handles bd->last == true in
a smart fashion. In that case, even for multiple hardware queues, it's
a good idea to flush batches of requests to the device if we get
batches of requests from the submitter.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 block/blk-mq.c | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 0a12cec0b426..232f4914c8db 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1953,6 +1953,32 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
 		list_add_tail(&rq->queuelist, &plug->mq_list);
 		plug->rq_count++;
 		plug->do_sort = true;
+	} else if (plug && q->mq_ops->commit_rqs) {
+		/*
+		 * If we have a ->commit_rqs(), then we know the driver can
+		 * batch submission doorbell updates. Add rq to plug list,
+		 * and flush if we exceed the plug count only.
+		 */
+		blk_mq_bio_to_request(rq, bio);
+		blk_mq_put_ctx(data.ctx);
+
+		/*
+		 * If we have requests for more than one queue here, we
+		 * should sort the list when it's flushed.
+		 */
+		if (!plug->do_sort && !list_empty(&plug->mq_list)) {
+			same_queue_rq = list_first_entry(&plug->mq_list,
+						struct request, queuelist);
+			if (same_queue_rq->q != q)
+				plug->do_sort = true;
+		}
+
+		list_add_tail(&rq->queuelist, &plug->mq_list);
+		plug->rq_count++;
+		if (plug->rq_count >= BLK_MAX_REQUEST_COUNT) {
+			blk_flush_plug_list(plug, false);
+			trace_block_plug(q);
+		}
 	} else if (plug && !blk_queue_nomerges(q)) {
 		blk_mq_bio_to_request(rq, bio);
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 84+ messages in thread

* Re: [PATCH 1/8] block: sum requests in the plug structure
  2018-11-26 16:35   ` Jens Axboe
@ 2018-11-26 17:02     ` Christoph Hellwig
  -1 siblings, 0 replies; 84+ messages in thread
From: Christoph Hellwig @ 2018-11-26 17:02 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, linux-nvme

On Mon, Nov 26, 2018 at 09:35:49AM -0700, Jens Axboe wrote:
> This isn't exactly the same as the previous count, as it includes
> requests for all devices. But that really doesn't matter, if we have
> more than the threshold (16) queued up, flush it. It's not worth it
> to have an expensive list loop for this.
> 
> Signed-off-by: Jens Axboe <axboe@kernel.dk>

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 2/8] block: improve logic around when to sort a plug list
  2018-11-26 16:35   ` Jens Axboe
@ 2018-11-27 23:31     ` Omar Sandoval
  -1 siblings, 0 replies; 84+ messages in thread
From: Omar Sandoval @ 2018-11-27 23:31 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, linux-nvme

On Mon, Nov 26, 2018 at 09:35:50AM -0700, Jens Axboe wrote:
> Do it for the nr_hw_queues == 1 case, but only do it for the multi queue
> case if we have requests for multiple devices in the plug.
> 
> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> ---
>  block/blk-core.c       | 1 +
>  block/blk-mq.c         | 7 +++++--
>  include/linux/blkdev.h | 1 +
>  3 files changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/block/blk-core.c b/block/blk-core.c
> index be9233400314..c9758d185357 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -1780,6 +1780,7 @@ void blk_start_plug(struct blk_plug *plug)
>  	INIT_LIST_HEAD(&plug->mq_list);
>  	INIT_LIST_HEAD(&plug->cb_list);
>  	plug->rq_count = 0;
> +	plug->do_sort = false;
>  
>  	/*
>  	 * Store ordering should not be needed here, since a potential
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 99c66823d52f..6a249bf6ed00 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -1678,7 +1678,8 @@ void blk_mq_flush_plug_list(struct blk_plug *plug, bool from_schedule)
>  	list_splice_init(&plug->mq_list, &list);
>  	plug->rq_count = 0;
>  
> -	list_sort(NULL, &list, plug_rq_cmp);
> +	if (plug->do_sort)
> +		list_sort(NULL, &list, plug_rq_cmp);
>  
>  	this_q = NULL;
>  	this_hctx = NULL;
> @@ -1935,6 +1936,7 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
>  
>  		list_add_tail(&rq->queuelist, &plug->mq_list);
>  		plug->rq_count++;
> +		plug->do_sort = true;
>  	} else if (plug && !blk_queue_nomerges(q)) {
>  		blk_mq_bio_to_request(rq, bio);
>  
> @@ -1958,7 +1960,8 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
>  			data.hctx = same_queue_rq->mq_hctx;
>  			blk_mq_try_issue_directly(data.hctx, same_queue_rq,
>  					&cookie);
> -		}
> +		} else if (plug->rq_count > 1)
> +			plug->do_sort = true;

If plug->rq_count == 2, there's no benefit to sorting, either. The
nr_hw_queues == 1 case could also avoid sorting in that case. So maybe
this whole patch could just be replaced with:

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 7b7dff85cf6c..de93c14e4700 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1675,9 +1675,10 @@ void blk_mq_flush_plug_list(struct blk_plug *plug, bool from_schedule)
 	unsigned int depth;
 
 	list_splice_init(&plug->mq_list, &list);
-	plug->rq_count = 0;
 
-	list_sort(NULL, &list, plug_rq_cmp);
+	if (plug->rq_count > 2)
+		list_sort(NULL, &list, plug_rq_cmp);
+	plug->rq_count = 0;
 
 	this_q = NULL;
 	this_hctx = NULL;

That will also handle the multi-queue case since we only plug one
request per multi-queue device. Thoughts?

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* Re: [PATCH 3/8] blk-mq: add mq_ops->commit_rqs()
  2018-11-26 16:35   ` Jens Axboe
@ 2018-11-27 23:43     ` Omar Sandoval
  -1 siblings, 0 replies; 84+ messages in thread
From: Omar Sandoval @ 2018-11-27 23:43 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, linux-nvme

On Mon, Nov 26, 2018 at 09:35:51AM -0700, Jens Axboe wrote:
> blk-mq passes information to the hardware about any given request being
> the last that we will issue in this sequence. The point is that hardware
> can defer costly doorbell type writes to the last request. But if we run
> into errors issuing a sequence of requests, we may never send the request
> with bd->last == true set. For that case, we need a hook that tells the
> hardware that nothing else is coming right now.
> 
> For failures returned by the drivers ->queue_rq() hook, the driver is
> responsible for flushing pending requests, if it uses bd->last to
> optimize that part. This works like before, no changes there.

Reviewed-by: Omar Sandoval <osandov@fb.com>

> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> ---
>  include/linux/blk-mq.h | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
> index ca0520ca6437..1fd139b65a6e 100644
> --- a/include/linux/blk-mq.h
> +++ b/include/linux/blk-mq.h
> @@ -117,6 +117,7 @@ struct blk_mq_queue_data {
>  
>  typedef blk_status_t (queue_rq_fn)(struct blk_mq_hw_ctx *,
>  		const struct blk_mq_queue_data *);
> +typedef void (commit_rqs_fn)(struct blk_mq_hw_ctx *);
>  /* takes rq->cmd_flags as input, returns a hardware type index */
>  typedef int (rq_flags_to_type_fn)(struct request_queue *, unsigned int);
>  typedef bool (get_budget_fn)(struct blk_mq_hw_ctx *);
> @@ -144,6 +145,15 @@ struct blk_mq_ops {
>  	 */
>  	queue_rq_fn		*queue_rq;
>  
> +	/*
> +	 * If a driver uses bd->last to judge when to submit requests to
> +	 * hardware, it must define this function. In case of errors that
> +	 * make us stop issuing further requests, this hook serves the
> +	 * purpose of kicking the hardware (which the last request otherwise
> +	 * would have done).
> +	 */
> +	commit_rqs_fn		*commit_rqs;
> +
>  	/*
>  	 * Return a queue map type for the given request/bio flags
>  	 */
> -- 
> 2.17.1
> 

^ permalink raw reply	[flat|nested] 84+ messages in thread
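
To make the contract above concrete, a minimal driver-side sketch of how
bd->last and ->commit_rqs() pair up. This is not a real driver: the
sketch_* type and helpers are hypothetical; only the blk-mq structures,
status codes and the two mq_ops hooks are the real interfaces.

	static blk_status_t sketch_queue_rq(struct blk_mq_hw_ctx *hctx,
					    const struct blk_mq_queue_data *bd)
	{
		struct sketch_queue *sq = hctx->driver_data;

		if (!sketch_has_room(sq)) {
			/*
			 * On a failure out of ->queue_rq() the driver itself
			 * must flush whatever it already posted with
			 * bd->last == false.
			 */
			sketch_ring_doorbell(sq);
			return BLK_STS_RESOURCE;
		}

		sketch_post_request(sq, bd->rq);

		/* Defer the expensive doorbell write to the last request. */
		if (bd->last)
			sketch_ring_doorbell(sq);

		return BLK_STS_OK;
	}

	static void sketch_commit_rqs(struct blk_mq_hw_ctx *hctx)
	{
		struct sketch_queue *sq = hctx->driver_data;

		/*
		 * blk-mq said more requests were coming (bd->last == false)
		 * but then stopped issuing for reasons of its own; kick the
		 * hardware for whatever is already posted.
		 */
		sketch_ring_doorbell(sq);
	}

	static const struct blk_mq_ops sketch_mq_ops = {
		.queue_rq	= sketch_queue_rq,
		.commit_rqs	= sketch_commit_rqs,
	};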

* Re: [PATCH 5/8] virtio_blk: implement mq_ops->commit_rqs() hook
  2018-11-26 16:35   ` Jens Axboe
@ 2018-11-27 23:45     ` Omar Sandoval
  -1 siblings, 0 replies; 84+ messages in thread
From: Omar Sandoval @ 2018-11-27 23:45 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, linux-nvme, Michael S. Tsirkin, Jason Wang

On Mon, Nov 26, 2018 at 09:35:53AM -0700, Jens Axboe wrote:
> We need this for blk-mq to kick things into gear, if we told it that
> we had more IO coming, but then failed to deliver on that promise.

Reviewed-by: Omar Sandoval <osandov@fb.com>

But also cc'd the virtio-blk maintainers.

> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> ---
>  drivers/block/virtio_blk.c | 15 +++++++++++++++
>  1 file changed, 15 insertions(+)
> 
> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> index 6e869d05f91e..b49c57e77780 100644
> --- a/drivers/block/virtio_blk.c
> +++ b/drivers/block/virtio_blk.c
> @@ -214,6 +214,20 @@ static void virtblk_done(struct virtqueue *vq)
>  	spin_unlock_irqrestore(&vblk->vqs[qid].lock, flags);
>  }
>  
> +static void virtio_commit_rqs(struct blk_mq_hw_ctx *hctx)
> +{
> +	struct virtio_blk *vblk = hctx->queue->queuedata;
> +	int qid = hctx->queue_num;
> +	bool kick;
> +
> +	spin_lock_irq(&vblk->vqs[qid].lock);
> +	kick = virtqueue_kick_prepare(vblk->vqs[qid].vq);
> +	spin_unlock_irq(&vblk->vqs[qid].lock);
> +
> +	if (kick)
> +		virtqueue_notify(vblk->vqs[qid].vq);
> +}
> +
>  static blk_status_t virtio_queue_rq(struct blk_mq_hw_ctx *hctx,
>  			   const struct blk_mq_queue_data *bd)
>  {
> @@ -638,6 +652,7 @@ static void virtblk_initialize_rq(struct request *req)
>  
>  static const struct blk_mq_ops virtio_mq_ops = {
>  	.queue_rq	= virtio_queue_rq,
> +	.commit_rqs	= virtio_commit_rqs,
>  	.complete	= virtblk_request_done,
>  	.init_request	= virtblk_init_request,
>  #ifdef CONFIG_VIRTIO_BLK_SCSI
> -- 
> 2.17.1
> 

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 6/8] ataflop: implement mq_ops->commit_rqs() hook
  2018-11-26 16:35   ` Jens Axboe
@ 2018-11-27 23:46     ` Omar Sandoval
  -1 siblings, 0 replies; 84+ messages in thread
From: Omar Sandoval @ 2018-11-27 23:46 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, linux-nvme

On Mon, Nov 26, 2018 at 09:35:54AM -0700, Jens Axboe wrote:
> We need this for blk-mq to kick things into gear, if we told it that
> we had more IO coming, but then failed to deliver on that promise.

Who converted this one? Oh yeah, it was me...

Reviewed-by: Omar Sandoval <osandov@fb.com>

> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> ---
>  drivers/block/ataflop.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/drivers/block/ataflop.c b/drivers/block/ataflop.c
> index f88b4c26d422..475cb972f324 100644
> --- a/drivers/block/ataflop.c
> +++ b/drivers/block/ataflop.c
> @@ -1471,6 +1471,13 @@ static void setup_req_params( int drive )
>  			ReqTrack, ReqSector, (unsigned long)ReqData ));
>  }
>  
> +static void ataflop_commit_rqs(struct blk_mq_hw_ctx *hctx)
> +{
> +	spin_lock_irq(&ataflop_lock);
> +	finish_fdc();
> +	spin_unlock_irq(&ataflop_lock);
> +}
> +
>  static blk_status_t ataflop_queue_rq(struct blk_mq_hw_ctx *hctx,
>  				     const struct blk_mq_queue_data *bd)
>  {
> @@ -1947,6 +1954,7 @@ static const struct block_device_operations floppy_fops = {
>  
>  static const struct blk_mq_ops ataflop_mq_ops = {
>  	.queue_rq = ataflop_queue_rq,
> +	.commit_rqs = ataflop_commit_rqs,
>  };
>  
>  static struct kobject *floppy_find(dev_t dev, int *part, void *data)
> -- 
> 2.17.1
> 

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 2/8] block: improve logic around when to sort a plug list
  2018-11-27 23:31     ` Omar Sandoval
@ 2018-11-27 23:49       ` Jens Axboe
  -1 siblings, 0 replies; 84+ messages in thread
From: Jens Axboe @ 2018-11-27 23:49 UTC (permalink / raw)
  To: Omar Sandoval; +Cc: linux-block, linux-nvme

On 11/27/18 4:31 PM, Omar Sandoval wrote:
> On Mon, Nov 26, 2018 at 09:35:50AM -0700, Jens Axboe wrote:
>> Do it for the nr_hw_queues == 1 case, but only do it for the multi queue
>> case if we have requests for multiple devices in the plug.
>>
>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
>> ---
>>  block/blk-core.c       | 1 +
>>  block/blk-mq.c         | 7 +++++--
>>  include/linux/blkdev.h | 1 +
>>  3 files changed, 7 insertions(+), 2 deletions(-)
>>
>> diff --git a/block/blk-core.c b/block/blk-core.c
>> index be9233400314..c9758d185357 100644
>> --- a/block/blk-core.c
>> +++ b/block/blk-core.c
>> @@ -1780,6 +1780,7 @@ void blk_start_plug(struct blk_plug *plug)
>>  	INIT_LIST_HEAD(&plug->mq_list);
>>  	INIT_LIST_HEAD(&plug->cb_list);
>>  	plug->rq_count = 0;
>> +	plug->do_sort = false;
>>  
>>  	/*
>>  	 * Store ordering should not be needed here, since a potential
>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>> index 99c66823d52f..6a249bf6ed00 100644
>> --- a/block/blk-mq.c
>> +++ b/block/blk-mq.c
>> @@ -1678,7 +1678,8 @@ void blk_mq_flush_plug_list(struct blk_plug *plug, bool from_schedule)
>>  	list_splice_init(&plug->mq_list, &list);
>>  	plug->rq_count = 0;
>>  
>> -	list_sort(NULL, &list, plug_rq_cmp);
>> +	if (plug->do_sort)
>> +		list_sort(NULL, &list, plug_rq_cmp);
>>  
>>  	this_q = NULL;
>>  	this_hctx = NULL;
>> @@ -1935,6 +1936,7 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
>>  
>>  		list_add_tail(&rq->queuelist, &plug->mq_list);
>>  		plug->rq_count++;
>> +		plug->do_sort = true;
>>  	} else if (plug && !blk_queue_nomerges(q)) {
>>  		blk_mq_bio_to_request(rq, bio);
>>  
>> @@ -1958,7 +1960,8 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
>>  			data.hctx = same_queue_rq->mq_hctx;
>>  			blk_mq_try_issue_directly(data.hctx, same_queue_rq,
>>  					&cookie);
>> -		}
>> +		} else if (plug->rq_count > 1)
>> +			plug->do_sort = true;
> 
> If plug->rq_count == 2, there's no benefit to sorting, either. The
> nr_hw_queues == 1 case could also avoid sorting in that case. So maybe
> this whole patch could just be replaced with:

Heh yes, good point, it should be 3 at least. But if you look at the
later mq plug patch, we only sort for that one if we have multiple
queues. So the logic should be something ala:

if (plug->rq_count > 2 && plug->has_multiple_queues)

since that's the only case we want to sort for.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 7/8] blk-mq: use bd->last == true for list inserts
  2018-11-26 16:35   ` Jens Axboe
@ 2018-11-27 23:49     ` Omar Sandoval
  -1 siblings, 0 replies; 84+ messages in thread
From: Omar Sandoval @ 2018-11-27 23:49 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, linux-nvme

On Mon, Nov 26, 2018 at 09:35:55AM -0700, Jens Axboe wrote:
> If we are issuing a list of requests, we know if we're at the last one.
> If we fail issuing, ensure that we call ->commit_rqs() to flush any
> potential previous requests.

One comment below, otherwise

Reviewed-by: Omar Sandoval <osandov@fb.com>

> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> ---
>  block/blk-core.c |  2 +-
>  block/blk-mq.c   | 32 ++++++++++++++++++++++++--------
>  block/blk-mq.h   |  2 +-
>  3 files changed, 26 insertions(+), 10 deletions(-)
> 
> diff --git a/block/blk-core.c b/block/blk-core.c
> index c9758d185357..808a65d23f1a 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -1334,7 +1334,7 @@ blk_status_t blk_insert_cloned_request(struct request_queue *q, struct request *
>  	 * bypass a potential scheduler on the bottom device for
>  	 * insert.
>  	 */
> -	return blk_mq_request_issue_directly(rq);
> +	return blk_mq_request_issue_directly(rq, true);
>  }
>  EXPORT_SYMBOL_GPL(blk_insert_cloned_request);
>  
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 6a249bf6ed00..0a12cec0b426 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -1260,6 +1260,14 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list,
>  	if (!list_empty(list)) {
>  		bool needs_restart;
>  
> +		/*
> +		 * If we didn't flush the entire list, we could have told
> +		 * the driver there was more coming, but that turned out to
> +		 * be a lie.
> +		 */
> +		if (q->mq_ops->commit_rqs)
> +			q->mq_ops->commit_rqs(hctx);
> +

This hunk seems like it should go with the patch adding commit_rqs.

>  		spin_lock(&hctx->lock);
>  		list_splice_init(list, &hctx->dispatch);
>  		spin_unlock(&hctx->lock);
> @@ -1736,12 +1744,12 @@ static blk_qc_t request_to_qc_t(struct blk_mq_hw_ctx *hctx, struct request *rq)
>  
>  static blk_status_t __blk_mq_issue_directly(struct blk_mq_hw_ctx *hctx,
>  					    struct request *rq,
> -					    blk_qc_t *cookie)
> +					    blk_qc_t *cookie, bool last)
>  {
>  	struct request_queue *q = rq->q;
>  	struct blk_mq_queue_data bd = {
>  		.rq = rq,
> -		.last = true,
> +		.last = last,
>  	};
>  	blk_qc_t new_cookie;
>  	blk_status_t ret;
> @@ -1776,7 +1784,7 @@ static blk_status_t __blk_mq_issue_directly(struct blk_mq_hw_ctx *hctx,
>  static blk_status_t __blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx,
>  						struct request *rq,
>  						blk_qc_t *cookie,
> -						bool bypass_insert)
> +						bool bypass_insert, bool last)
>  {
>  	struct request_queue *q = rq->q;
>  	bool run_queue = true;
> @@ -1805,7 +1813,7 @@ static blk_status_t __blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx,
>  		goto insert;
>  	}
>  
> -	return __blk_mq_issue_directly(hctx, rq, cookie);
> +	return __blk_mq_issue_directly(hctx, rq, cookie, last);
>  insert:
>  	if (bypass_insert)
>  		return BLK_STS_RESOURCE;
> @@ -1824,7 +1832,7 @@ static void blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx,
>  
>  	hctx_lock(hctx, &srcu_idx);
>  
> -	ret = __blk_mq_try_issue_directly(hctx, rq, cookie, false);
> +	ret = __blk_mq_try_issue_directly(hctx, rq, cookie, false, true);
>  	if (ret == BLK_STS_RESOURCE || ret == BLK_STS_DEV_RESOURCE)
>  		blk_mq_sched_insert_request(rq, false, true, false);
>  	else if (ret != BLK_STS_OK)
> @@ -1833,7 +1841,7 @@ static void blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx,
>  	hctx_unlock(hctx, srcu_idx);
>  }
>  
> -blk_status_t blk_mq_request_issue_directly(struct request *rq)
> +blk_status_t blk_mq_request_issue_directly(struct request *rq, bool last)
>  {
>  	blk_status_t ret;
>  	int srcu_idx;
> @@ -1841,7 +1849,7 @@ blk_status_t blk_mq_request_issue_directly(struct request *rq)
>  	struct blk_mq_hw_ctx *hctx = rq->mq_hctx;
>  
>  	hctx_lock(hctx, &srcu_idx);
> -	ret = __blk_mq_try_issue_directly(hctx, rq, &unused_cookie, true);
> +	ret = __blk_mq_try_issue_directly(hctx, rq, &unused_cookie, true, last);
>  	hctx_unlock(hctx, srcu_idx);
>  
>  	return ret;
> @@ -1856,7 +1864,7 @@ void blk_mq_try_issue_list_directly(struct blk_mq_hw_ctx *hctx,
>  				queuelist);
>  
>  		list_del_init(&rq->queuelist);
> -		ret = blk_mq_request_issue_directly(rq);
> +		ret = blk_mq_request_issue_directly(rq, list_empty(list));
>  		if (ret != BLK_STS_OK) {
>  			if (ret == BLK_STS_RESOURCE ||
>  					ret == BLK_STS_DEV_RESOURCE) {
> @@ -1866,6 +1874,14 @@ void blk_mq_try_issue_list_directly(struct blk_mq_hw_ctx *hctx,
>  			blk_mq_end_request(rq, ret);
>  		}
>  	}
> +
> +	/*
> +	 * If we didn't flush the entire list, we could have told
> +	 * the driver there was more coming, but that turned out to
> +	 * be a lie.
> +	 */
> +	if (!list_empty(list) && hctx->queue->mq_ops->commit_rqs)
> +		hctx->queue->mq_ops->commit_rqs(hctx);
>  }
>  
>  static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
> diff --git a/block/blk-mq.h b/block/blk-mq.h
> index 9ae8e9f8f8b1..7291e5379358 100644
> --- a/block/blk-mq.h
> +++ b/block/blk-mq.h
> @@ -69,7 +69,7 @@ void blk_mq_insert_requests(struct blk_mq_hw_ctx *hctx, struct blk_mq_ctx *ctx,
>  				struct list_head *list);
>  
>  /* Used by blk_insert_cloned_request() to issue request directly */
> -blk_status_t blk_mq_request_issue_directly(struct request *rq);
> +blk_status_t blk_mq_request_issue_directly(struct request *rq, bool last);
>  void blk_mq_try_issue_list_directly(struct blk_mq_hw_ctx *hctx,
>  				    struct list_head *list);
>  
> -- 
> 2.17.1
> 

^ permalink raw reply	[flat|nested] 84+ messages in thread
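
To make the failure case concrete, a short hypothetical trace of
blk_mq_try_issue_list_directly() on a plugged list [A, B, C] (the request
names are made up):

	issue A: bd->last == false	/* B and C are still on the list */
	issue B: ->queue_rq() returns BLK_STS_RESOURCE,
		 B (and the still-unissued C) are left for later dispatch

A reached the driver with bd->last == false, so a driver that batches its
doorbell writes may still be holding on to it; the ->commit_rqs() call at
the end of the loop is what flushes A in that situation.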

* Re: [PATCH 7/8] blk-mq: use bd->last == true for list inserts
  2018-11-27 23:49     ` Omar Sandoval
@ 2018-11-27 23:51       ` Jens Axboe
  -1 siblings, 0 replies; 84+ messages in thread
From: Jens Axboe @ 2018-11-27 23:51 UTC (permalink / raw)
  To: Omar Sandoval; +Cc: linux-block, linux-nvme

On 11/27/18 4:49 PM, Omar Sandoval wrote:
> On Mon, Nov 26, 2018 at 09:35:55AM -0700, Jens Axboe wrote:
>> If we are issuing a list of requests, we know if we're at the last one.
>> If we fail issuing, ensure that we call ->commits_rqs() to flush any
>> potential previous requests.
> 
> One comment below, otherwise
> 
> Reviewed-by: Omar Sandoval <osandov@fb.com>

>> @@ -1260,6 +1260,14 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list,
>>  	if (!list_empty(list)) {
>>  		bool needs_restart;
>>  
>> +		/*
>> +		 * If we didn't flush the entire list, we could have told
>> +		 * the driver there was more coming, but that turned out to
>> +		 * be a lie.
>> +		 */
>> +		if (q->mq_ops->commit_rqs)
>> +			q->mq_ops->commit_rqs(hctx);
>> +
> 
> This hunk seems like it should go with the patch adding commit_rqs.

Agree, that would be better, since that also makes that patch fix an
actual issue instead of just being a prep patch. I'll shuffle that hunk
to that patch.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 2/8] block: improve logic around when to sort a plug list
  2018-11-27 23:49       ` Jens Axboe
@ 2018-11-27 23:55         ` Omar Sandoval
  -1 siblings, 0 replies; 84+ messages in thread
From: Omar Sandoval @ 2018-11-27 23:55 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, linux-nvme

On Tue, Nov 27, 2018 at 04:49:27PM -0700, Jens Axboe wrote:
> On 11/27/18 4:31 PM, Omar Sandoval wrote:
> > On Mon, Nov 26, 2018 at 09:35:50AM -0700, Jens Axboe wrote:
> >> Do it for the nr_hw_queues == 1 case, but only do it for the multi queue
> >> case if we have requests for multiple devices in the plug.
> >>
> >> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> >> ---
> >>  block/blk-core.c       | 1 +
> >>  block/blk-mq.c         | 7 +++++--
> >>  include/linux/blkdev.h | 1 +
> >>  3 files changed, 7 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/block/blk-core.c b/block/blk-core.c
> >> index be9233400314..c9758d185357 100644
> >> --- a/block/blk-core.c
> >> +++ b/block/blk-core.c
> >> @@ -1780,6 +1780,7 @@ void blk_start_plug(struct blk_plug *plug)
> >>  	INIT_LIST_HEAD(&plug->mq_list);
> >>  	INIT_LIST_HEAD(&plug->cb_list);
> >>  	plug->rq_count = 0;
> >> +	plug->do_sort = false;
> >>  
> >>  	/*
> >>  	 * Store ordering should not be needed here, since a potential
> >> diff --git a/block/blk-mq.c b/block/blk-mq.c
> >> index 99c66823d52f..6a249bf6ed00 100644
> >> --- a/block/blk-mq.c
> >> +++ b/block/blk-mq.c
> >> @@ -1678,7 +1678,8 @@ void blk_mq_flush_plug_list(struct blk_plug *plug, bool from_schedule)
> >>  	list_splice_init(&plug->mq_list, &list);
> >>  	plug->rq_count = 0;
> >>  
> >> -	list_sort(NULL, &list, plug_rq_cmp);
> >> +	if (plug->do_sort)
> >> +		list_sort(NULL, &list, plug_rq_cmp);
> >>  
> >>  	this_q = NULL;
> >>  	this_hctx = NULL;
> >> @@ -1935,6 +1936,7 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
> >>  
> >>  		list_add_tail(&rq->queuelist, &plug->mq_list);
> >>  		plug->rq_count++;
> >> +		plug->do_sort = true;
> >>  	} else if (plug && !blk_queue_nomerges(q)) {
> >>  		blk_mq_bio_to_request(rq, bio);
> >>  
> >> @@ -1958,7 +1960,8 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
> >>  			data.hctx = same_queue_rq->mq_hctx;
> >>  			blk_mq_try_issue_directly(data.hctx, same_queue_rq,
> >>  					&cookie);
> >> -		}
> >> +		} else if (plug->rq_count > 1)
> >> +			plug->do_sort = true;
> > 
> > If plug->rq_count == 2, there's no benefit to sorting, either. The
> > nr_hw_queues == 1 case could also avoid sorting in that case. So maybe
> > this whole patch could just be replaced with:
> 
> Heh yes, good point, it should be 3 at least. But if you look at the
> later mq plug patch, we only sort for that one if we have multiple
> queues. So the logic should be something ala:
> 
> if (plug->rq_count > 2 && plug->has_multiple_queues)
> 
> since that's the only case we want to sort for.

Yeah, just got to that one. If you're going to respin this, I'll wait
for you to change that around before really looking at it.

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 2/8] block: improve logic around when to sort a plug list
  2018-11-27 23:49       ` Jens Axboe
@ 2018-11-27 23:59         ` Jens Axboe
  -1 siblings, 0 replies; 84+ messages in thread
From: Jens Axboe @ 2018-11-27 23:59 UTC (permalink / raw)
  To: Omar Sandoval; +Cc: linux-block, linux-nvme

On 11/27/18 4:49 PM, Jens Axboe wrote:
> On 11/27/18 4:31 PM, Omar Sandoval wrote:
>> On Mon, Nov 26, 2018 at 09:35:50AM -0700, Jens Axboe wrote:
>>> Do it for the nr_hw_queues == 1 case, but only do it for the multi queue
>>> case if we have requests for multiple devices in the plug.
>>>
>>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
>>> ---
>>>  block/blk-core.c       | 1 +
>>>  block/blk-mq.c         | 7 +++++--
>>>  include/linux/blkdev.h | 1 +
>>>  3 files changed, 7 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/block/blk-core.c b/block/blk-core.c
>>> index be9233400314..c9758d185357 100644
>>> --- a/block/blk-core.c
>>> +++ b/block/blk-core.c
>>> @@ -1780,6 +1780,7 @@ void blk_start_plug(struct blk_plug *plug)
>>>  	INIT_LIST_HEAD(&plug->mq_list);
>>>  	INIT_LIST_HEAD(&plug->cb_list);
>>>  	plug->rq_count = 0;
>>> +	plug->do_sort = false;
>>>  
>>>  	/*
>>>  	 * Store ordering should not be needed here, since a potential
>>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>>> index 99c66823d52f..6a249bf6ed00 100644
>>> --- a/block/blk-mq.c
>>> +++ b/block/blk-mq.c
>>> @@ -1678,7 +1678,8 @@ void blk_mq_flush_plug_list(struct blk_plug *plug, bool from_schedule)
>>>  	list_splice_init(&plug->mq_list, &list);
>>>  	plug->rq_count = 0;
>>>  
>>> -	list_sort(NULL, &list, plug_rq_cmp);
>>> +	if (plug->do_sort)
>>> +		list_sort(NULL, &list, plug_rq_cmp);
>>>  
>>>  	this_q = NULL;
>>>  	this_hctx = NULL;
>>> @@ -1935,6 +1936,7 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
>>>  
>>>  		list_add_tail(&rq->queuelist, &plug->mq_list);
>>>  		plug->rq_count++;
>>> +		plug->do_sort = true;
>>>  	} else if (plug && !blk_queue_nomerges(q)) {
>>>  		blk_mq_bio_to_request(rq, bio);
>>>  
>>> @@ -1958,7 +1960,8 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
>>>  			data.hctx = same_queue_rq->mq_hctx;
>>>  			blk_mq_try_issue_directly(data.hctx, same_queue_rq,
>>>  					&cookie);
>>> -		}
>>> +		} else if (plug->rq_count > 1)
>>> +			plug->do_sort = true;
>>
>> If plug->rq_count == 2, there's no benefit to sorting, either. The
>> nr_hw_queues == 1 case could also avoid sorting in that case. So maybe
>> this whole patch could just be replaced with:
> 
> Heh yes, good point, it should be 3 at least. But if you look at the
> later mq plug patch, we only sort for that one if we have multiple
> queues. So the logic should be something ala:
> 
> if (plug->rq_count > 2 && plug->has_multiple_queues)
> 
> since that's the only case we want to sort for.

How about something like this?


diff --git a/block/blk-core.c b/block/blk-core.c
index be9233400314..d107d016b92b 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1780,6 +1780,7 @@ void blk_start_plug(struct blk_plug *plug)
 	INIT_LIST_HEAD(&plug->mq_list);
 	INIT_LIST_HEAD(&plug->cb_list);
 	plug->rq_count = 0;
+	plug->multiple_queues = false;
 
 	/*
 	 * Store ordering should not be needed here, since a potential
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 7b7dff85cf6c..02daa32c5d77 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1677,7 +1677,8 @@ void blk_mq_flush_plug_list(struct blk_plug *plug, bool from_schedule)
 	list_splice_init(&plug->mq_list, &list);
 	plug->rq_count = 0;
 
-	list_sort(NULL, &list, plug_rq_cmp);
+	if (plug->rq_count > 2 && plug->multiple_queues)
+		list_sort(NULL, &list, plug_rq_cmp);
 
 	this_q = NULL;
 	this_hctx = NULL;
@@ -1866,6 +1867,20 @@ void blk_mq_try_issue_list_directly(struct blk_mq_hw_ctx *hctx,
 	}
 }
 
+static void blk_add_rq_to_plug(struct blk_plug *plug, struct request *rq)
+{
+	list_add_tail(&rq->queuelist, &plug->mq_list);
+	plug->rq_count++;
+	if (!plug->multiple_queues && !list_is_singular(&plug->mq_list)) {
+		struct request *tmp;
+
+		tmp = list_first_entry(&plug->mq_list, struct request,
+						queuelist);
+		if (tmp->q != rq->q)
+			plug->multiple_queues = true;
+	}
+}
+
 static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
 {
 	const int is_sync = op_is_sync(bio->bi_opf);
@@ -1932,8 +1947,7 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
 			trace_block_plug(q);
 		}
 
-		list_add_tail(&rq->queuelist, &plug->mq_list);
-		plug->rq_count++;
+		blk_add_rq_to_plug(plug, rq);
 	} else if (plug && !blk_queue_nomerges(q)) {
 		blk_mq_bio_to_request(rq, bio);
 
@@ -1948,8 +1962,7 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
 			same_queue_rq = NULL;
 		if (same_queue_rq)
 			list_del_init(&same_queue_rq->queuelist);
-		list_add_tail(&rq->queuelist, &plug->mq_list);
-		plug->rq_count++;
+		blk_add_rq_to_plug(plug, rq);
 
 		blk_mq_put_ctx(data.ctx);
 
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 02732cae6080..08d940f85fa0 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1131,6 +1131,7 @@ struct blk_plug {
 	struct list_head mq_list; /* blk-mq requests */
 	struct list_head cb_list; /* md requires an unplug callback */
 	unsigned short rq_count;
+	bool multiple_queues;
 };
 #define BLK_MAX_REQUEST_COUNT 16
 #define BLK_PLUG_FLUSH_SIZE (128 * 1024)

-- 
Jens Axboe


^ permalink raw reply related	[flat|nested] 84+ messages in thread

* Re: [PATCH 2/8] block: improve logic around when to sort a plug list
  2018-11-27 23:59         ` Jens Axboe
@ 2018-11-28  0:05           ` Omar Sandoval
  -1 siblings, 0 replies; 84+ messages in thread
From: Omar Sandoval @ 2018-11-28  0:05 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, linux-nvme

On Tue, Nov 27, 2018 at 04:59:14PM -0700, Jens Axboe wrote:
> On 11/27/18 4:49 PM, Jens Axboe wrote:
> > On 11/27/18 4:31 PM, Omar Sandoval wrote:
> >> On Mon, Nov 26, 2018 at 09:35:50AM -0700, Jens Axboe wrote:
> >>> Do it for the nr_hw_queues == 1 case, but only do it for the multi queue
> >>> case if we have requests for multiple devices in the plug.
> >>>
> >>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> >>> ---
> >>>  block/blk-core.c       | 1 +
> >>>  block/blk-mq.c         | 7 +++++--
> >>>  include/linux/blkdev.h | 1 +
> >>>  3 files changed, 7 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/block/blk-core.c b/block/blk-core.c
> >>> index be9233400314..c9758d185357 100644
> >>> --- a/block/blk-core.c
> >>> +++ b/block/blk-core.c
> >>> @@ -1780,6 +1780,7 @@ void blk_start_plug(struct blk_plug *plug)
> >>>  	INIT_LIST_HEAD(&plug->mq_list);
> >>>  	INIT_LIST_HEAD(&plug->cb_list);
> >>>  	plug->rq_count = 0;
> >>> +	plug->do_sort = false;
> >>>  
> >>>  	/*
> >>>  	 * Store ordering should not be needed here, since a potential
> >>> diff --git a/block/blk-mq.c b/block/blk-mq.c
> >>> index 99c66823d52f..6a249bf6ed00 100644
> >>> --- a/block/blk-mq.c
> >>> +++ b/block/blk-mq.c
> >>> @@ -1678,7 +1678,8 @@ void blk_mq_flush_plug_list(struct blk_plug *plug, bool from_schedule)
> >>>  	list_splice_init(&plug->mq_list, &list);
> >>>  	plug->rq_count = 0;
> >>>  
> >>> -	list_sort(NULL, &list, plug_rq_cmp);
> >>> +	if (plug->do_sort)
> >>> +		list_sort(NULL, &list, plug_rq_cmp);
> >>>  
> >>>  	this_q = NULL;
> >>>  	this_hctx = NULL;
> >>> @@ -1935,6 +1936,7 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
> >>>  
> >>>  		list_add_tail(&rq->queuelist, &plug->mq_list);
> >>>  		plug->rq_count++;
> >>> +		plug->do_sort = true;
> >>>  	} else if (plug && !blk_queue_nomerges(q)) {
> >>>  		blk_mq_bio_to_request(rq, bio);
> >>>  
> >>> @@ -1958,7 +1960,8 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
> >>>  			data.hctx = same_queue_rq->mq_hctx;
> >>>  			blk_mq_try_issue_directly(data.hctx, same_queue_rq,
> >>>  					&cookie);
> >>> -		}
> >>> +		} else if (plug->rq_count > 1)
> >>> +			plug->do_sort = true;
> >>
> >> If plug->rq_count == 2, there's no benefit to sorting, either. The
> >> nr_hw_queues == 1 case could also avoid sorting in that case. So maybe
> >> this whole patch could just be replaced with:
> > 
> > Heh yes, good point, it should be 3 at least. But if you look at the
> > later mq plug patch, we only sort for that one if we have multiple
> > queues. So the logic should be something ala:
> > 
> > if (plug->rq_count > 2 && plug->has_multiple_queues)
> > 
> > since that's the only case we want to sort for.
> 
> How about something like this?
> 
> 
> diff --git a/block/blk-core.c b/block/blk-core.c
> index be9233400314..d107d016b92b 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -1780,6 +1780,7 @@ void blk_start_plug(struct blk_plug *plug)
>  	INIT_LIST_HEAD(&plug->mq_list);
>  	INIT_LIST_HEAD(&plug->cb_list);
>  	plug->rq_count = 0;
> +	plug->multiple_queues = false;
>  
>  	/*
>  	 * Store ordering should not be needed here, since a potential
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 7b7dff85cf6c..02daa32c5d77 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -1677,7 +1677,8 @@ void blk_mq_flush_plug_list(struct blk_plug *plug, bool from_schedule)
>  	list_splice_init(&plug->mq_list, &list);
>  	plug->rq_count = 0;
>  
> -	list_sort(NULL, &list, plug_rq_cmp);
> +	if (plug->rq_count > 2 && plug->multiple_queues)
> +		list_sort(NULL, &list, plug_rq_cmp);
>  
>  	this_q = NULL;
>  	this_hctx = NULL;
> @@ -1866,6 +1867,20 @@ void blk_mq_try_issue_list_directly(struct blk_mq_hw_ctx *hctx,
>  	}
>  }
>  
> +static void blk_add_rq_to_plug(struct blk_plug *plug, struct request *rq)
> +{
> +	list_add_tail(&rq->queuelist, &plug->mq_list);
> +	plug->rq_count++;
> +	if (!plug->multiple_queues && !list_is_singular(&plug->mq_list)) {
> +		struct request *tmp;
> +
> +		tmp = list_first_entry(&plug->mq_list, struct request,
> +						queuelist);
> +		if (tmp->q != rq->q)
> +			plug->multiple_queues = true;

Actually, I thought we want to sort whenever there are different
software/hardware queues on the plug list, even if they're on the same
queue?

> +	}
> +}
> +
>  static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
>  {
>  	const int is_sync = op_is_sync(bio->bi_opf);
> @@ -1932,8 +1947,7 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
>  			trace_block_plug(q);
>  		}
>  
> -		list_add_tail(&rq->queuelist, &plug->mq_list);
> -		plug->rq_count++;
> +		blk_add_rq_to_plug(plug, rq);
>  	} else if (plug && !blk_queue_nomerges(q)) {
>  		blk_mq_bio_to_request(rq, bio);
>  
> @@ -1948,8 +1962,7 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
>  			same_queue_rq = NULL;
>  		if (same_queue_rq)
>  			list_del_init(&same_queue_rq->queuelist);
> -		list_add_tail(&rq->queuelist, &plug->mq_list);
> -		plug->rq_count++;
> +		blk_add_rq_to_plug(plug, rq);
>  
>  		blk_mq_put_ctx(data.ctx);
>  
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 02732cae6080..08d940f85fa0 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -1131,6 +1131,7 @@ struct blk_plug {
>  	struct list_head mq_list; /* blk-mq requests */
>  	struct list_head cb_list; /* md requires an unplug callback */
>  	unsigned short rq_count;
> +	bool multiple_queues;
>  };
>  #define BLK_MAX_REQUEST_COUNT 16
>  #define BLK_PLUG_FLUSH_SIZE (128 * 1024)

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 2/8] block: improve logic around when to sort a plug list
  2018-11-28  0:05           ` Omar Sandoval
@ 2018-11-28  0:16             ` Jens Axboe
  -1 siblings, 0 replies; 84+ messages in thread
From: Jens Axboe @ 2018-11-28  0:16 UTC (permalink / raw)
  To: Omar Sandoval; +Cc: linux-block, linux-nvme

On 11/27/18 5:05 PM, Omar Sandoval wrote:
>> +static void blk_add_rq_to_plug(struct blk_plug *plug, struct request *rq)
>> +{
>> +	list_add_tail(&rq->queuelist, &plug->mq_list);
>> +	plug->rq_count++;
>> +	if (!plug->multiple_queues && !list_is_singular(&plug->mq_list)) {
>> +		struct request *tmp;
>> +
>> +		tmp = list_first_entry(&plug->mq_list, struct request,
>> +						queuelist);
>> +		if (tmp->q != rq->q)
>> +			plug->multiple_queues = true;
> 
> Actually, I thought we want to sort whenever there are different
> software/hardware queues on the plug list, even if they're on the same
> queue?

That was the original intent, but with the tons of testing I've done
lately, the cost of sorting batches is higher than the cost we pay for
having to do multiple inserts. So I think the only sane case is >= 3
requests, and multiple queues.
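
As a rough sketch of where that would sit (assuming the plug->multiple_queues
flag from the hunk above, and sampling the count before it gets cleared;
the rest of the function is elided):

void blk_mq_flush_plug_list(struct blk_plug *plug, bool from_schedule)
{
	unsigned short rq_count = plug->rq_count;
	LIST_HEAD(list);

	list_splice_init(&plug->mq_list, &list);
	plug->rq_count = 0;

	/* sorting only pays off for 3+ requests spanning multiple queues */
	if (rq_count > 2 && plug->multiple_queues)
		list_sort(NULL, &list, plug_rq_cmp);

	/* ... dispatch per hctx as before ... */
}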

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 3/8] blk-mq: add mq_ops->commit_rqs()
  2018-11-26 16:35   ` Jens Axboe
@ 2018-11-28  1:38     ` Ming Lei
  -1 siblings, 0 replies; 84+ messages in thread
From: Ming Lei @ 2018-11-28  1:38 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, linux-nvme

On Mon, Nov 26, 2018 at 09:35:51AM -0700, Jens Axboe wrote:
> blk-mq passes information to the hardware about any given request being
> the last that we will issue in this sequence. The point is that hardware
> can defer costly doorbell type writes to the last request. But if we run
> into errors issuing a sequence of requests, we may never send the request
> with bd->last == true set. For that case, we need a hook that tells the
> hardware that nothing else is coming right now.
> 
> For failures returned by the drivers ->queue_rq() hook, the driver is
> responsible for flushing pending requests, if it uses bd->last to
> optimize that part. This works like before, no changes there.
> 
> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> ---
>  include/linux/blk-mq.h | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
> index ca0520ca6437..1fd139b65a6e 100644
> --- a/include/linux/blk-mq.h
> +++ b/include/linux/blk-mq.h
> @@ -117,6 +117,7 @@ struct blk_mq_queue_data {
>  
>  typedef blk_status_t (queue_rq_fn)(struct blk_mq_hw_ctx *,
>  		const struct blk_mq_queue_data *);
> +typedef void (commit_rqs_fn)(struct blk_mq_hw_ctx *);
>  /* takes rq->cmd_flags as input, returns a hardware type index */
>  typedef int (rq_flags_to_type_fn)(struct request_queue *, unsigned int);
>  typedef bool (get_budget_fn)(struct blk_mq_hw_ctx *);
> @@ -144,6 +145,15 @@ struct blk_mq_ops {
>  	 */
>  	queue_rq_fn		*queue_rq;
>  
> +	/*
> +	 * If a driver uses bd->last to judge when to submit requests to
> +	 * hardware, it must define this function. In case of errors that
> +	 * make us stop issuing further requests, this hook serves the
> +	 * purpose of kicking the hardware (which the last request otherwise
> +	 * would have done).
> +	 */
> +	commit_rqs_fn		*commit_rqs;
> +
>  	/*
>  	 * Return a queue map type for the given request/bio flags
>  	 */
> -- 
> 2.17.1
>

Looks fine,

Reviewed-by: Ming Lei <ming.lei@redhat.com>

Thanks,
Ming

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 7/8] blk-mq: use bd->last == true for list inserts
  2018-11-26 16:35   ` Jens Axboe
@ 2018-11-28  1:49     ` Ming Lei
  -1 siblings, 0 replies; 84+ messages in thread
From: Ming Lei @ 2018-11-28  1:49 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, linux-nvme

On Mon, Nov 26, 2018 at 09:35:55AM -0700, Jens Axboe wrote:
> If we are issuing a list of requests, we know if we're at the last one.
> If we fail issuing, ensure that we call ->commits_rqs() to flush any
> potential previous requests.
> 
> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> ---
>  block/blk-core.c |  2 +-
>  block/blk-mq.c   | 32 ++++++++++++++++++++++++--------
>  block/blk-mq.h   |  2 +-
>  3 files changed, 26 insertions(+), 10 deletions(-)
> 
> diff --git a/block/blk-core.c b/block/blk-core.c
> index c9758d185357..808a65d23f1a 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -1334,7 +1334,7 @@ blk_status_t blk_insert_cloned_request(struct request_queue *q, struct request *
>  	 * bypass a potential scheduler on the bottom device for
>  	 * insert.
>  	 */
> -	return blk_mq_request_issue_directly(rq);
> +	return blk_mq_request_issue_directly(rq, true);
>  }
>  EXPORT_SYMBOL_GPL(blk_insert_cloned_request);
>  
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 6a249bf6ed00..0a12cec0b426 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -1260,6 +1260,14 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list,
>  	if (!list_empty(list)) {
>  		bool needs_restart;
>  
> +		/*
> +		 * If we didn't flush the entire list, we could have told
> +		 * the driver there was more coming, but that turned out to
> +		 * be a lie.
> +		 */
> +		if (q->mq_ops->commit_rqs)
> +			q->mq_ops->commit_rqs(hctx);
> +

Looks like you missed doing this for blk_mq_do_dispatch_sched() and
blk_mq_do_dispatch_ctx(), in which only one request is added to
the rq_list.

Maybe we can call .commit_rqs(hctx) just at the end of
blk_mq_sched_dispatch_requests() to cover all the non-direct-issue
cases.
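
Something along these lines is what that would look like (just a sketch
of the idea, not a tested patch):

void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
{
	struct request_queue *q = hctx->queue;

	/* ... existing dispatch logic, unchanged ... */

	/*
	 * One notification per dispatch round, covering every path that
	 * may have issued requests with bd->last == false.
	 */
	if (q->mq_ops->commit_rqs)
		q->mq_ops->commit_rqs(hctx);
}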

Thanks,
Ming

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 5/8] virtio_blk: implement mq_ops->commit_rqs() hook
  2018-11-26 16:35   ` Jens Axboe
@ 2018-11-28  2:10     ` Ming Lei
  -1 siblings, 0 replies; 84+ messages in thread
From: Ming Lei @ 2018-11-28  2:10 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, linux-nvme

On Mon, Nov 26, 2018 at 09:35:53AM -0700, Jens Axboe wrote:
> We need this for blk-mq to kick things into gear, if we told it that
> we had more IO coming, but then failed to deliver on that promise.
> 
> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> ---
>  drivers/block/virtio_blk.c | 15 +++++++++++++++
>  1 file changed, 15 insertions(+)
> 
> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> index 6e869d05f91e..b49c57e77780 100644
> --- a/drivers/block/virtio_blk.c
> +++ b/drivers/block/virtio_blk.c
> @@ -214,6 +214,20 @@ static void virtblk_done(struct virtqueue *vq)
>  	spin_unlock_irqrestore(&vblk->vqs[qid].lock, flags);
>  }
>  
> +static void virtio_commit_rqs(struct blk_mq_hw_ctx *hctx)
> +{
> +	struct virtio_blk *vblk = hctx->queue->queuedata;
> +	int qid = hctx->queue_num;
> +	bool kick;
> +
> +	spin_lock_irq(&vblk->vqs[qid].lock);
> +	kick = virtqueue_kick_prepare(vblk->vqs[qid].vq);
> +	spin_unlock_irq(&vblk->vqs[qid].lock);
> +
> +	if (kick)
> +		virtqueue_notify(vblk->vqs[qid].vq);
> +}
> +
>  static blk_status_t virtio_queue_rq(struct blk_mq_hw_ctx *hctx,
>  			   const struct blk_mq_queue_data *bd)
>  {
> @@ -638,6 +652,7 @@ static void virtblk_initialize_rq(struct request *req)
>  
>  static const struct blk_mq_ops virtio_mq_ops = {
>  	.queue_rq	= virtio_queue_rq,
> +	.commit_rqs	= virtio_commit_rqs,
>  	.complete	= virtblk_request_done,
>  	.init_request	= virtblk_init_request,
>  #ifdef CONFIG_VIRTIO_BLK_SCSI
> -- 
> 2.17.1
> 

If .commit_rqs() is implemented, the virtqueue_notify() in virtio_queue_rq()
should be removed as well, to save the VM world switch per .queue_rq() call.

thanks,
Ming

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 5/8] virtio_blk: implement mq_ops->commit_rqs() hook
  2018-11-28  2:10     ` Ming Lei
@ 2018-11-28  2:34       ` Jens Axboe
  -1 siblings, 0 replies; 84+ messages in thread
From: Jens Axboe @ 2018-11-28  2:34 UTC (permalink / raw)
  To: Ming Lei; +Cc: linux-block, linux-nvme

On 11/27/18 7:10 PM, Ming Lei wrote:
> On Mon, Nov 26, 2018 at 09:35:53AM -0700, Jens Axboe wrote:
>> We need this for blk-mq to kick things into gear, if we told it that
>> we had more IO coming, but then failed to deliver on that promise.
>>
>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
>> ---
>>  drivers/block/virtio_blk.c | 15 +++++++++++++++
>>  1 file changed, 15 insertions(+)
>>
>> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
>> index 6e869d05f91e..b49c57e77780 100644
>> --- a/drivers/block/virtio_blk.c
>> +++ b/drivers/block/virtio_blk.c
>> @@ -214,6 +214,20 @@ static void virtblk_done(struct virtqueue *vq)
>>  	spin_unlock_irqrestore(&vblk->vqs[qid].lock, flags);
>>  }
>>  
>> +static void virtio_commit_rqs(struct blk_mq_hw_ctx *hctx)
>> +{
>> +	struct virtio_blk *vblk = hctx->queue->queuedata;
>> +	int qid = hctx->queue_num;
>> +	bool kick;
>> +
>> +	spin_lock_irq(&vblk->vqs[qid].lock);
>> +	kick = virtqueue_kick_prepare(vblk->vqs[qid].vq);
>> +	spin_unlock_irq(&vblk->vqs[qid].lock);
>> +
>> +	if (kick)
>> +		virtqueue_notify(vblk->vqs[qid].vq);
>> +}
>> +
>>  static blk_status_t virtio_queue_rq(struct blk_mq_hw_ctx *hctx,
>>  			   const struct blk_mq_queue_data *bd)
>>  {
>> @@ -638,6 +652,7 @@ static void virtblk_initialize_rq(struct request *req)
>>  
>>  static const struct blk_mq_ops virtio_mq_ops = {
>>  	.queue_rq	= virtio_queue_rq,
>> +	.commit_rqs	= virtio_commit_rqs,
>>  	.complete	= virtblk_request_done,
>>  	.init_request	= virtblk_init_request,
>>  #ifdef CONFIG_VIRTIO_BLK_SCSI
>> -- 
>> 2.17.1
>>
> 
> If .commit_rqs() is implemented, virtqueue_notify() in virtio_queue_rq()
> should have been removed for saving the world switch per .queue_rq()

->commit_rqs() is only for the case where bd->last is set to false,
and we never make it to the end to flag bd->last == true. If bd->last
is true, the driver should kick things into gear itself.
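
For virtio-blk that is roughly what the existing tail of virtio_queue_rq()
already does (a from-memory sketch, details may differ) - the notify is
gated on bd->last, so batched submissions only pay for one kick:

	/* only prepare a kick when this is the last request of the batch */
	if (bd->last && virtqueue_kick_prepare(vblk->vqs[qid].vq))
		notify = true;
	spin_unlock_irqrestore(&vblk->vqs[qid].lock, flags);

	if (notify)
		virtqueue_notify(vblk->vqs[qid].vq);
	return BLK_STS_OK;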


-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 7/8] blk-mq: use bd->last == true for list inserts
  2018-11-28  1:49     ` Ming Lei
@ 2018-11-28  2:37       ` Jens Axboe
  -1 siblings, 0 replies; 84+ messages in thread
From: Jens Axboe @ 2018-11-28  2:37 UTC (permalink / raw)
  To: Ming Lei; +Cc: linux-block, linux-nvme

On 11/27/18 6:49 PM, Ming Lei wrote:
> On Mon, Nov 26, 2018 at 09:35:55AM -0700, Jens Axboe wrote:
>> If we are issuing a list of requests, we know if we're at the last one.
>> If we fail issuing, ensure that we call ->commits_rqs() to flush any
>> potential previous requests.
>>
>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
>> ---
>>  block/blk-core.c |  2 +-
>>  block/blk-mq.c   | 32 ++++++++++++++++++++++++--------
>>  block/blk-mq.h   |  2 +-
>>  3 files changed, 26 insertions(+), 10 deletions(-)
>>
>> diff --git a/block/blk-core.c b/block/blk-core.c
>> index c9758d185357..808a65d23f1a 100644
>> --- a/block/blk-core.c
>> +++ b/block/blk-core.c
>> @@ -1334,7 +1334,7 @@ blk_status_t blk_insert_cloned_request(struct request_queue *q, struct request *
>>  	 * bypass a potential scheduler on the bottom device for
>>  	 * insert.
>>  	 */
>> -	return blk_mq_request_issue_directly(rq);
>> +	return blk_mq_request_issue_directly(rq, true);
>>  }
>>  EXPORT_SYMBOL_GPL(blk_insert_cloned_request);
>>  
>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>> index 6a249bf6ed00..0a12cec0b426 100644
>> --- a/block/blk-mq.c
>> +++ b/block/blk-mq.c
>> @@ -1260,6 +1260,14 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list,
>>  	if (!list_empty(list)) {
>>  		bool needs_restart;
>>  
>> +		/*
>> +		 * If we didn't flush the entire list, we could have told
>> +		 * the driver there was more coming, but that turned out to
>> +		 * be a lie.
>> +		 */
>> +		if (q->mq_ops->commit_rqs)
>> +			q->mq_ops->commit_rqs(hctx);
>> +
> 
> Looks you miss to do it for blk_mq_do_dispatch_sched() and
> blk_mq_do_dispatch_ctx(), in which only one request is added to
> the rq_list.

blk_mq_do_dispatch_sched() is handled through blk_mq_dispatch_rq_list();
it doesn't need to do this on its own, since it's only working on
single requests.

Ditto for blk_mq_do_dispatch_ctx().

> Maybe we can call the .commit_rqs(hctx) just at the end of
> blk_mq_sched_dispatch_requests() for cover all non-direct-issue
> cases.

I think you're missing the point of ->commit_rqs() - it's only meant to
catch the case where we set bd->last == false, and never got to
the end. Like not getting a driver tag, for instance. It's not
meant to be the general way of kicking things into gear; it's more
efficient to use bd->last for the common case, as most drivers require
some sort of locking for the submission.
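
The issue-side pattern is roughly this (a simplified sketch with the
error handling trimmed; q, hctx and list come from the dispatch context):

	while (!list_empty(list)) {
		struct request *rq = list_first_entry(list, struct request,
						      queuelist);

		list_del_init(&rq->queuelist);
		/* bd->last == true only for the final request of the batch */
		if (blk_mq_request_issue_directly(rq, list_empty(list)) != BLK_STS_OK)
			break;
	}

	/*
	 * If we stopped early, the last request we did issue carried
	 * bd->last == false, so tell the driver nothing else is coming.
	 */
	if (!list_empty(list) && q->mq_ops->commit_rqs)
		q->mq_ops->commit_rqs(hctx);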

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 5/8] virtio_blk: implement mq_ops->commit_rqs() hook
  2018-11-27 23:45     ` Omar Sandoval
@ 2018-11-28  3:05       ` Michael S. Tsirkin
  -1 siblings, 0 replies; 84+ messages in thread
From: Michael S. Tsirkin @ 2018-11-28  3:05 UTC (permalink / raw)
  To: Omar Sandoval; +Cc: Jens Axboe, linux-block, linux-nvme, Jason Wang

On Tue, Nov 27, 2018 at 03:45:38PM -0800, Omar Sandoval wrote:
> On Mon, Nov 26, 2018 at 09:35:53AM -0700, Jens Axboe wrote:
> > We need this for blk-mq to kick things into gear, if we told it that
> > we had more IO coming, but then failed to deliver on that promise.
> 
> Reviewed-by: Omar Sandoval <osandov@fb.com>
> 
> But also cc'd the virtio-blk maintainers.

Acked-by: Michael S. Tsirkin <mst@redhat.com>


Feel free to queue with other changes.


> > Signed-off-by: Jens Axboe <axboe@kernel.dk>
> > ---
> >  drivers/block/virtio_blk.c | 15 +++++++++++++++
> >  1 file changed, 15 insertions(+)
> > 
> > diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> > index 6e869d05f91e..b49c57e77780 100644
> > --- a/drivers/block/virtio_blk.c
> > +++ b/drivers/block/virtio_blk.c
> > @@ -214,6 +214,20 @@ static void virtblk_done(struct virtqueue *vq)
> >  	spin_unlock_irqrestore(&vblk->vqs[qid].lock, flags);
> >  }
> >  
> > +static void virtio_commit_rqs(struct blk_mq_hw_ctx *hctx)
> > +{
> > +	struct virtio_blk *vblk = hctx->queue->queuedata;
> > +	int qid = hctx->queue_num;
> > +	bool kick;
> > +
> > +	spin_lock_irq(&vblk->vqs[qid].lock);
> > +	kick = virtqueue_kick_prepare(vblk->vqs[qid].vq);
> > +	spin_unlock_irq(&vblk->vqs[qid].lock);
> > +
> > +	if (kick)
> > +		virtqueue_notify(vblk->vqs[qid].vq);
> > +}
> > +
> >  static blk_status_t virtio_queue_rq(struct blk_mq_hw_ctx *hctx,
> >  			   const struct blk_mq_queue_data *bd)
> >  {
> > @@ -638,6 +652,7 @@ static void virtblk_initialize_rq(struct request *req)
> >  
> >  static const struct blk_mq_ops virtio_mq_ops = {
> >  	.queue_rq	= virtio_queue_rq,
> > +	.commit_rqs	= virtio_commit_rqs,
> >  	.complete	= virtblk_request_done,
> >  	.init_request	= virtblk_init_request,
> >  #ifdef CONFIG_VIRTIO_BLK_SCSI
> > -- 
> > 2.17.1
> > 

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 3/8] blk-mq: add mq_ops->commit_rqs()
  2018-11-26 16:35   ` Jens Axboe
@ 2018-11-28  7:16     ` Christoph Hellwig
  -1 siblings, 0 replies; 84+ messages in thread
From: Christoph Hellwig @ 2018-11-28  7:16 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, linux-nvme

On Mon, Nov 26, 2018 at 09:35:51AM -0700, Jens Axboe wrote:
> blk-mq passes information to the hardware about any given request being
> the last that we will issue in this sequence. The point is that hardware
> can defer costly doorbell type writes to the last request. But if we run
> into errors issuing a sequence of requests, we may never send the request
> with bd->last == true set. For that case, we need a hook that tells the
> hardware that nothing else is coming right now.
> 
> For failures returned by the drivers ->queue_rq() hook, the driver is
> responsible for flushing pending requests, if it uses bd->last to
> optimize that part. This works like before, no changes there.
> 
> Signed-off-by: Jens Axboe <axboe@kernel.dk>

This looks fine, but normally I would only add the method together with
the actual user..

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 4/8] nvme: implement mq_ops->commit_rqs() hook
  2018-11-26 16:35   ` Jens Axboe
@ 2018-11-28  7:20     ` Christoph Hellwig
  -1 siblings, 0 replies; 84+ messages in thread
From: Christoph Hellwig @ 2018-11-28  7:20 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, linux-nvme

> +static inline int nvme_next_ring_index(struct nvme_queue *nvmeq, u16 index)
> +{
> +	if (++index == nvmeq->q_depth)
> +		return 0;
> +
> +	return index;

Can you please just drop this helper?  It makes the code not only
less readable but also longer.

Otherwise the change looks fine to me.
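
Open-coded at the submission site, the wrap-around is just this (sketch,
assuming the tail index is nvmeq->sq_tail as in the existing driver):

	if (++nvmeq->sq_tail == nvmeq->q_depth)
		nvmeq->sq_tail = 0;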

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 5/8] virtio_blk: implement mq_ops->commit_rqs() hook
  2018-11-26 16:35   ` Jens Axboe
@ 2018-11-28  7:21     ` Christoph Hellwig
  -1 siblings, 0 replies; 84+ messages in thread
From: Christoph Hellwig @ 2018-11-28  7:21 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, linux-nvme

On Mon, Nov 26, 2018 at 09:35:53AM -0700, Jens Axboe wrote:
> We need this for blk-mq to kick things into gear, if we told it that
> we had more IO coming, but then failed to deliver on that promise.
> 
> Signed-off-by: Jens Axboe <axboe@kernel.dk>

Looks fine,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 6/8] ataflop: implement mq_ops->commit_rqs() hook
  2018-11-26 16:35   ` Jens Axboe
@ 2018-11-28  7:22     ` Christoph Hellwig
  -1 siblings, 0 replies; 84+ messages in thread
From: Christoph Hellwig @ 2018-11-28  7:22 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, linux-nvme

On Mon, Nov 26, 2018 at 09:35:54AM -0700, Jens Axboe wrote:
> We need this for blk-mq to kick things into gear, if we told it that
> we had more IO coming, but then failed to deliver on that promise.

queue_rq also calls finish_fdc() under atari_disable_irq(IRQ_MFP_FDC);
do we need that here as well?

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 8/8] blk-mq: add plug case for devices that implement ->commits_rqs()
  2018-11-26 16:35   ` Jens Axboe
@ 2018-11-28  7:26     ` Christoph Hellwig
  -1 siblings, 0 replies; 84+ messages in thread
From: Christoph Hellwig @ 2018-11-28  7:26 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, linux-nvme

> +	} else if (plug && q->mq_ops->commit_rqs) {
> +		/*
> +		 * If we have a ->commit_rqs(), then we know the driver can
> +		 * batch submission doorbell updates. Add rq to plug list,
> +		 * and flush if we exceed the plug count only.
> +		 */
> +		blk_mq_bio_to_request(rq, bio);
> +		blk_mq_put_ctx(data.ctx);
> +
> +		/*
> +		 * If we have requests for more than one queue here, we
> +		 * should sort the list when it's flushed.
> +		 */
> +		if (!plug->do_sort && !list_empty(&plug->mq_list)) {
> +			same_queue_rq = list_first_entry(&plug->mq_list,
> +						struct request, queuelist);
> +			if (same_queue_rq->q != q)
> +				plug->do_sort = true;
> +		}
> +
> +		list_add_tail(&rq->queuelist, &plug->mq_list);
> +		plug->rq_count++;
> +		if (plug->rq_count >= BLK_MAX_REQUEST_COUNT) {
> +			blk_flush_plug_list(plug, false);
> +			trace_block_plug(q);
> +		}

This seems like a slight variation of the single queue plug
case.  Can we share the code instead of having slightly differing
logic?

Also, shouldn't this respect blk_queue_nomerges?

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 3/8] blk-mq: add mq_ops->commit_rqs()
  2018-11-28  7:16     ` Christoph Hellwig
@ 2018-11-28 12:54       ` Jens Axboe
  -1 siblings, 0 replies; 84+ messages in thread
From: Jens Axboe @ 2018-11-28 12:54 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-block, linux-nvme

On 11/28/18 12:16 AM, Christoph Hellwig wrote:
> On Mon, Nov 26, 2018 at 09:35:51AM -0700, Jens Axboe wrote:
>> blk-mq passes information to the hardware about any given request being
>> the last that we will issue in this sequence. The point is that hardware
>> can defer costly doorbell type writes to the last request. But if we run
>> into errors issuing a sequence of requests, we may never send the request
>> with bd->last == true set. For that case, we need a hook that tells the
>> hardware that nothing else is coming right now.
>>
>> For failures returned by the drivers ->queue_rq() hook, the driver is
>> responsible for flushing pending requests, if it uses bd->last to
>> optimize that part. This works like before, no changes there.
>>
>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> 
> This looks fine, but normally I would only add the method together with
> the actual user..

I included the two hunks from patch 7 in this one, so there's a real
use (and fix) with it.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 4/8] nvme: implement mq_ops->commit_rqs() hook
  2018-11-28  7:20     ` Christoph Hellwig
@ 2018-11-28 13:07       ` Jens Axboe
  -1 siblings, 0 replies; 84+ messages in thread
From: Jens Axboe @ 2018-11-28 13:07 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-block, linux-nvme

On 11/28/18 12:20 AM, Christoph Hellwig wrote:
>> +static inline int nvme_next_ring_index(struct nvme_queue *nvmeq, u16 index)
>> +{
>> +	if (++index == nvmeq->q_depth)
>> +		return 0;
>> +
>> +	return index;
> 
> Can you please just drop this helper?  It makes the code not only
> less readable but also longer.
> 
> Otherwise the change looks fine to me.

Alright, dropped the helper.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 6/8] ataflop: implement mq_ops->commit_rqs() hook
  2018-11-28  7:22     ` Christoph Hellwig
@ 2018-11-28 13:09       ` Jens Axboe
  -1 siblings, 0 replies; 84+ messages in thread
From: Jens Axboe @ 2018-11-28 13:09 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-block, linux-nvme

On 11/28/18 12:22 AM, Christoph Hellwig wrote:
> On Mon, Nov 26, 2018 at 09:35:54AM -0700, Jens Axboe wrote:
>> We need this for blk-mq to kick things into gear, if we told it that
>> we had more IO coming, but then failed to deliver on that promise.
> 
> queue_rq also calls finish_fdc under atari_disable_irq( IRQ_MFP_FDC ),
> do we need that here as well?

Probably the safer choice... I'll fix that up.
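
Something like this, mirroring what queue_rq already does around
finish_fdc() (untested sketch, the hook name is just a placeholder):

static void ataflop_commit_rqs(struct blk_mq_hw_ctx *hctx)
{
	/* same discipline as queue_rq: run finish_fdc() with the FDC irq masked */
	atari_disable_irq(IRQ_MFP_FDC);
	finish_fdc();
	atari_enable_irq(IRQ_MFP_FDC);
}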

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 8/8] blk-mq: add plug case for devices that implement ->commits_rqs()
  2018-11-28  7:26     ` Christoph Hellwig
@ 2018-11-28 13:11       ` Jens Axboe
  -1 siblings, 0 replies; 84+ messages in thread
From: Jens Axboe @ 2018-11-28 13:11 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-block, linux-nvme

On 11/28/18 12:26 AM, Christoph Hellwig wrote:
>> +	} else if (plug && q->mq_ops->commit_rqs) {
>> +		/*
>> +		 * If we have a ->commit_rqs(), then we know the driver can
>> +		 * batch submission doorbell updates. Add rq to plug list,
>> +		 * and flush if we exceed the plug count only.
>> +		 */
>> +		blk_mq_bio_to_request(rq, bio);
>> +		blk_mq_put_ctx(data.ctx);
>> +
>> +		/*
>> +		 * If we have requests for more than one queue here, we
>> +		 * should sort the list when it's flushed.
>> +		 */
>> +		if (!plug->do_sort && !list_empty(&plug->mq_list)) {
>> +			same_queue_rq = list_first_entry(&plug->mq_list,
>> +						struct request, queuelist);
>> +			if (same_queue_rq->q != q)
>> +				plug->do_sort = true;
>> +		}
>> +
>> +		list_add_tail(&rq->queuelist, &plug->mq_list);
>> +		plug->rq_count++;
>> +		if (plug->rq_count >= BLK_MAX_REQUEST_COUNT) {
>> +			blk_flush_plug_list(plug, false);
>> +			trace_block_plug(q);
>> +		}
> 
> This seems like a slight variation of the the single queue plug
> case.  Can we share the code instead of having slightly differing
> logic?

I'll see if I can fold them.

> Also that should this respect blk_queue_nomerges?

It doesn't do any merging at all, it's just batching for the list
inserts. In terms of plug merging, if nomerges isn't true, then we would
have done that further up.
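
One way the two branches could fold together, reusing the
blk_add_rq_to_plug() helper from the 2/8 discussion (rough sketch only;
the sort decision and flush threshold stay as described above):

	} else if (plug && (q->nr_hw_queues == 1 || q->mq_ops->commit_rqs)) {
		/*
		 * Single hw queue, or a driver that can batch doorbell
		 * updates via ->commit_rqs(): accumulate on the plug list
		 * and flush once the request count threshold is exceeded.
		 */
		blk_mq_bio_to_request(rq, bio);
		blk_mq_put_ctx(data.ctx);

		if (plug->rq_count >= BLK_MAX_REQUEST_COUNT) {
			blk_flush_plug_list(plug, false);
			trace_block_plug(q);
		}
		blk_add_rq_to_plug(plug, rq);
	}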

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 5/8] virtio_blk: implement mq_ops->commit_rqs() hook
  2018-11-28  2:34       ` Jens Axboe
@ 2018-11-29  1:23         ` Ming Lei
  -1 siblings, 0 replies; 84+ messages in thread
From: Ming Lei @ 2018-11-29  1:23 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, linux-nvme

On Tue, Nov 27, 2018 at 07:34:51PM -0700, Jens Axboe wrote:
> On 11/27/18 7:10 PM, Ming Lei wrote:
> > On Mon, Nov 26, 2018 at 09:35:53AM -0700, Jens Axboe wrote:
> >> We need this for blk-mq to kick things into gear, if we told it that
> >> we had more IO coming, but then failed to deliver on that promise.
> >>
> >> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> >> ---
> >>  drivers/block/virtio_blk.c | 15 +++++++++++++++
> >>  1 file changed, 15 insertions(+)
> >>
> >> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> >> index 6e869d05f91e..b49c57e77780 100644
> >> --- a/drivers/block/virtio_blk.c
> >> +++ b/drivers/block/virtio_blk.c
> >> @@ -214,6 +214,20 @@ static void virtblk_done(struct virtqueue *vq)
> >>  	spin_unlock_irqrestore(&vblk->vqs[qid].lock, flags);
> >>  }
> >>  
> >> +static void virtio_commit_rqs(struct blk_mq_hw_ctx *hctx)
> >> +{
> >> +	struct virtio_blk *vblk = hctx->queue->queuedata;
> >> +	int qid = hctx->queue_num;
> >> +	bool kick;
> >> +
> >> +	spin_lock_irq(&vblk->vqs[qid].lock);
> >> +	kick = virtqueue_kick_prepare(vblk->vqs[qid].vq);
> >> +	spin_unlock_irq(&vblk->vqs[qid].lock);
> >> +
> >> +	if (kick)
> >> +		virtqueue_notify(vblk->vqs[qid].vq);
> >> +}
> >> +
> >>  static blk_status_t virtio_queue_rq(struct blk_mq_hw_ctx *hctx,
> >>  			   const struct blk_mq_queue_data *bd)
> >>  {
> >> @@ -638,6 +652,7 @@ static void virtblk_initialize_rq(struct request *req)
> >>  
> >>  static const struct blk_mq_ops virtio_mq_ops = {
> >>  	.queue_rq	= virtio_queue_rq,
> >> +	.commit_rqs	= virtio_commit_rqs,
> >>  	.complete	= virtblk_request_done,
> >>  	.init_request	= virtblk_init_request,
> >>  #ifdef CONFIG_VIRTIO_BLK_SCSI
> >> -- 
> >> 2.17.1
> >>
> > 
> > If .commit_rqs() is implemented, virtqueue_notify() in virtio_queue_rq()
> > should have been removed for saving the world switch per .queue_rq()
> 
> ->commits_rqs() is only for the case where bd->last is set to false,
> and we never make it to the end and flag bd->last == true. If bd->last
> is true, the driver should kick things into gear.

OK, looks like I misunderstood it. However, virtio-blk doesn't need this
change since virtio_queue_rq() can handle it well. This patch may introduce
one unnecessary VM world switch in the case of a busy queue.

IMO bd->last won't work well with an IO scheduler, given that the rq_list
only includes a single request.

I wrote this kind of patch (never posted) before, using a sort of
->commit_rqs() to replace the current bd->last mechanism, which needs one
extra driver tag. That may improve the above case, and the code gets
cleaned up as well.

Thanks,
Ming

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 5/8] virtio_blk: implement mq_ops->commit_rqs() hook
  2018-11-29  1:23         ` Ming Lei
@ 2018-11-29  2:19           ` Jens Axboe
  -1 siblings, 0 replies; 84+ messages in thread
From: Jens Axboe @ 2018-11-29  2:19 UTC (permalink / raw)
  To: Ming Lei; +Cc: linux-block, linux-nvme

On 11/28/18 6:23 PM, Ming Lei wrote:
> On Tue, Nov 27, 2018 at 07:34:51PM -0700, Jens Axboe wrote:
>> On 11/27/18 7:10 PM, Ming Lei wrote:
>>> On Mon, Nov 26, 2018 at 09:35:53AM -0700, Jens Axboe wrote:
>>>> We need this for blk-mq to kick things into gear, if we told it that
>>>> we had more IO coming, but then failed to deliver on that promise.
>>>>
>>>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
>>>> ---
>>>>  drivers/block/virtio_blk.c | 15 +++++++++++++++
>>>>  1 file changed, 15 insertions(+)
>>>>
>>>> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
>>>> index 6e869d05f91e..b49c57e77780 100644
>>>> --- a/drivers/block/virtio_blk.c
>>>> +++ b/drivers/block/virtio_blk.c
>>>> @@ -214,6 +214,20 @@ static void virtblk_done(struct virtqueue *vq)
>>>>  	spin_unlock_irqrestore(&vblk->vqs[qid].lock, flags);
>>>>  }
>>>>  
>>>> +static void virtio_commit_rqs(struct blk_mq_hw_ctx *hctx)
>>>> +{
>>>> +	struct virtio_blk *vblk = hctx->queue->queuedata;
>>>> +	int qid = hctx->queue_num;
>>>> +	bool kick;
>>>> +
>>>> +	spin_lock_irq(&vblk->vqs[qid].lock);
>>>> +	kick = virtqueue_kick_prepare(vblk->vqs[qid].vq);
>>>> +	spin_unlock_irq(&vblk->vqs[qid].lock);
>>>> +
>>>> +	if (kick)
>>>> +		virtqueue_notify(vblk->vqs[qid].vq);
>>>> +}
>>>> +
>>>>  static blk_status_t virtio_queue_rq(struct blk_mq_hw_ctx *hctx,
>>>>  			   const struct blk_mq_queue_data *bd)
>>>>  {
>>>> @@ -638,6 +652,7 @@ static void virtblk_initialize_rq(struct request *req)
>>>>  
>>>>  static const struct blk_mq_ops virtio_mq_ops = {
>>>>  	.queue_rq	= virtio_queue_rq,
>>>> +	.commit_rqs	= virtio_commit_rqs,
>>>>  	.complete	= virtblk_request_done,
>>>>  	.init_request	= virtblk_init_request,
>>>>  #ifdef CONFIG_VIRTIO_BLK_SCSI
>>>> -- 
>>>> 2.17.1
>>>>
>>>
>>> If .commit_rqs() is implemented, virtqueue_notify() in virtio_queue_rq()
>>> should have been removed for saving the world switch per .queue_rq()
>>
>> ->commits_rqs() is only for the case where bd->last is set to false,
>> and we never make it to the end and flag bd->last == true. If bd->last
>> is true, the driver should kick things into gear.
> 
> OK, looks I misunderstood it. However, virtio-blk doesn't need this
> change since virtio_queue_rq() can handle it well. This patch may introduce
> one unnecessary VM world switch in case of queue busy.

No it won't, it may in the case of some failure outside of the driver.
The only reason that virtio-blk doesn't currently hang is because it
has restart logic, and the failure case only happens if we already
have IO in-flight. For the NVMe variant, that's not going to be the
case.

> IMO bd->last won't work well in case of io scheduler given the rq_list
> only includes one single request.

But that's an artificial limitation that definitely should just be
lifted; the fact that blk-mq-sched is _currently_ doing just single
requests is woefully inefficient.

> I wrote this kind of patch(never posted) before to use sort of
> ->commits_rqs() to replace the current bd->last mechanism which need
> one extra driver tag, which may improve the above case, also code gets
> cleaned up.

It doesn't need one extra driver tag; we currently get an extra one just
to flag ->last correctly. That's not a requirement, that's a limitation
of the current implementation. We could get rid of that, and if it
proves to be an issue, that's not hard to do.
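
To spell out what the current implementation does (roughly paraphrased
from the dispatch loop, not the exact code), we grab a driver tag for
the *next* request purely to decide whether the current one is last:

	/* inside blk_mq_dispatch_rq_list(), roughly */
	if (list_empty(list)) {
		bd.last = true;
	} else {
		struct request *nxt;

		/* peek ahead: if we can't get a tag for the next request,
		 * the current one is effectively the last we'll send */
		nxt = list_first_entry(list, struct request, queuelist);
		bd.last = !blk_mq_get_driver_tag(nxt);
	}

	ret = q->mq_ops->queue_rq(hctx, &bd);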

I prefer to deal with actual issues and fix those, not deal in
hypotheticals.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 5/8] virtio_blk: implement mq_ops->commit_rqs() hook
  2018-11-29  2:19           ` Jens Axboe
@ 2018-11-29  2:51             ` Ming Lei
  -1 siblings, 0 replies; 84+ messages in thread
From: Ming Lei @ 2018-11-29  2:51 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, linux-nvme

On Wed, Nov 28, 2018 at 07:19:09PM -0700, Jens Axboe wrote:
> On 11/28/18 6:23 PM, Ming Lei wrote:
> > On Tue, Nov 27, 2018 at 07:34:51PM -0700, Jens Axboe wrote:
> >> On 11/27/18 7:10 PM, Ming Lei wrote:
> >>> On Mon, Nov 26, 2018 at 09:35:53AM -0700, Jens Axboe wrote:
> >>>> We need this for blk-mq to kick things into gear, if we told it that
> >>>> we had more IO coming, but then failed to deliver on that promise.
> >>>>
> >>>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> >>>> ---
> >>>>  drivers/block/virtio_blk.c | 15 +++++++++++++++
> >>>>  1 file changed, 15 insertions(+)
> >>>>
> >>>> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> >>>> index 6e869d05f91e..b49c57e77780 100644
> >>>> --- a/drivers/block/virtio_blk.c
> >>>> +++ b/drivers/block/virtio_blk.c
> >>>> @@ -214,6 +214,20 @@ static void virtblk_done(struct virtqueue *vq)
> >>>>  	spin_unlock_irqrestore(&vblk->vqs[qid].lock, flags);
> >>>>  }
> >>>>  
> >>>> +static void virtio_commit_rqs(struct blk_mq_hw_ctx *hctx)
> >>>> +{
> >>>> +	struct virtio_blk *vblk = hctx->queue->queuedata;
> >>>> +	int qid = hctx->queue_num;
> >>>> +	bool kick;
> >>>> +
> >>>> +	spin_lock_irq(&vblk->vqs[qid].lock);
> >>>> +	kick = virtqueue_kick_prepare(vblk->vqs[qid].vq);
> >>>> +	spin_unlock_irq(&vblk->vqs[qid].lock);
> >>>> +
> >>>> +	if (kick)
> >>>> +		virtqueue_notify(vblk->vqs[qid].vq);
> >>>> +}
> >>>> +
> >>>>  static blk_status_t virtio_queue_rq(struct blk_mq_hw_ctx *hctx,
> >>>>  			   const struct blk_mq_queue_data *bd)
> >>>>  {
> >>>> @@ -638,6 +652,7 @@ static void virtblk_initialize_rq(struct request *req)
> >>>>  
> >>>>  static const struct blk_mq_ops virtio_mq_ops = {
> >>>>  	.queue_rq	= virtio_queue_rq,
> >>>> +	.commit_rqs	= virtio_commit_rqs,
> >>>>  	.complete	= virtblk_request_done,
> >>>>  	.init_request	= virtblk_init_request,
> >>>>  #ifdef CONFIG_VIRTIO_BLK_SCSI
> >>>> -- 
> >>>> 2.17.1
> >>>>
> >>>
> >>> If .commit_rqs() is implemented, virtqueue_notify() in virtio_queue_rq()
> >>> should have been removed for saving the world switch per .queue_rq()
> >>
> >> ->commits_rqs() is only for the case where bd->last is set to false,
> >> and we never make it to the end and flag bd->last == true. If bd->last
> >> is true, the driver should kick things into gear.
> > 
> > OK, looks I misunderstood it. However, virtio-blk doesn't need this
> > change since virtio_queue_rq() can handle it well. This patch may introduce
> > one unnecessary VM world switch in case of queue busy.
> 
> Not it won't, it may in the case of some failure outside of the driver.

If the failure is because we're out of tags, blk_mq_dispatch_wake() will
rerun the queue, and bd->last will be set eventually. Or is there some
other failure (outside of the driver) that isn't covered?

> The only reason that virtio-blk doesn't currently hang is because it
> has restart logic, and the failure case only happens in the if we
> already have IO in-flight.

Yeah, virtqueue_kick() is called in case of any error in virtio_queue_rq(),
so I am still wondering why we have to implement .commit_rqs() for virtio-blk.

> For the NVMe variant, that's not going to be the case.

OK.

> 
> > IMO bd->last won't work well in case of io scheduler given the rq_list
> > only includes one single request.
> 
> But that's a fake limitation that definitely should just be lifted,
> the fact that blk-mq-sched is _currently_ just doing single requests
> is woefully inefficient.

I agree, but it seems a bit hard given that we have to consider request
merging.

> 
> > I wrote this kind of patch(never posted) before to use sort of
> > ->commits_rqs() to replace the current bd->last mechanism which need
> > one extra driver tag, which may improve the above case, also code gets
> > cleaned up.
> 
> It doesn't need one extra driver tag, we currently get an extra one just
> to flag ->last correctly. That's not a requirement, that's a limitation
> of the current implementation. We could get rid of that, and it it
> proves to be an issue, that's not hard to do.

What do you think about using .commit_rqs() to replace ->last? For
example, just call .commit_rqs() after the last request has been queued
to the driver successfully. Then we can remove bd->last and avoid
getting the extra tag for figuring out bd->last.
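
Something like the following hand-written sketch -- purely hypothetical,
never posted, just to show the shape I have in mind (no bd->last at all,
one doorbell per batch via .commit_rqs()):

	struct blk_mq_queue_data bd = { };
	struct request *rq;
	blk_status_t ret;
	int queued = 0;

	while (!list_empty(list)) {
		rq = list_first_entry(list, struct request, queuelist);
		if (!blk_mq_get_driver_tag(rq))
			break;
		list_del_init(&rq->queuelist);

		bd.rq = rq;		/* note: no bd.last needed */
		ret = q->mq_ops->queue_rq(hctx, &bd);
		if (ret != BLK_STS_OK)
			break;
		queued++;
	}

	/* one kick for everything queued above */
	if (queued && q->mq_ops->commit_rqs)
		q->mq_ops->commit_rqs(hctx);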

Thanks,
Ming

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 5/8] virtio_blk: implement mq_ops->commit_rqs() hook
  2018-11-29  2:51             ` Ming Lei
@ 2018-11-29  3:13               ` Jens Axboe
  -1 siblings, 0 replies; 84+ messages in thread
From: Jens Axboe @ 2018-11-29  3:13 UTC (permalink / raw)
  To: Ming Lei; +Cc: linux-block, linux-nvme

On 11/28/18 7:51 PM, Ming Lei wrote:
> On Wed, Nov 28, 2018 at 07:19:09PM -0700, Jens Axboe wrote:
>> On 11/28/18 6:23 PM, Ming Lei wrote:
>>> On Tue, Nov 27, 2018 at 07:34:51PM -0700, Jens Axboe wrote:
>>>> On 11/27/18 7:10 PM, Ming Lei wrote:
>>>>> On Mon, Nov 26, 2018 at 09:35:53AM -0700, Jens Axboe wrote:
>>>>>> We need this for blk-mq to kick things into gear, if we told it that
>>>>>> we had more IO coming, but then failed to deliver on that promise.
>>>>>>
>>>>>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
>>>>>> ---
>>>>>>  drivers/block/virtio_blk.c | 15 +++++++++++++++
>>>>>>  1 file changed, 15 insertions(+)
>>>>>>
>>>>>> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
>>>>>> index 6e869d05f91e..b49c57e77780 100644
>>>>>> --- a/drivers/block/virtio_blk.c
>>>>>> +++ b/drivers/block/virtio_blk.c
>>>>>> @@ -214,6 +214,20 @@ static void virtblk_done(struct virtqueue *vq)
>>>>>>  	spin_unlock_irqrestore(&vblk->vqs[qid].lock, flags);
>>>>>>  }
>>>>>>  
>>>>>> +static void virtio_commit_rqs(struct blk_mq_hw_ctx *hctx)
>>>>>> +{
>>>>>> +	struct virtio_blk *vblk = hctx->queue->queuedata;
>>>>>> +	int qid = hctx->queue_num;
>>>>>> +	bool kick;
>>>>>> +
>>>>>> +	spin_lock_irq(&vblk->vqs[qid].lock);
>>>>>> +	kick = virtqueue_kick_prepare(vblk->vqs[qid].vq);
>>>>>> +	spin_unlock_irq(&vblk->vqs[qid].lock);
>>>>>> +
>>>>>> +	if (kick)
>>>>>> +		virtqueue_notify(vblk->vqs[qid].vq);
>>>>>> +}
>>>>>> +
>>>>>>  static blk_status_t virtio_queue_rq(struct blk_mq_hw_ctx *hctx,
>>>>>>  			   const struct blk_mq_queue_data *bd)
>>>>>>  {
>>>>>> @@ -638,6 +652,7 @@ static void virtblk_initialize_rq(struct request *req)
>>>>>>  
>>>>>>  static const struct blk_mq_ops virtio_mq_ops = {
>>>>>>  	.queue_rq	= virtio_queue_rq,
>>>>>> +	.commit_rqs	= virtio_commit_rqs,
>>>>>>  	.complete	= virtblk_request_done,
>>>>>>  	.init_request	= virtblk_init_request,
>>>>>>  #ifdef CONFIG_VIRTIO_BLK_SCSI
>>>>>> -- 
>>>>>> 2.17.1
>>>>>>
>>>>>
>>>>> If .commit_rqs() is implemented, virtqueue_notify() in virtio_queue_rq()
>>>>> should have been removed for saving the world switch per .queue_rq()
>>>>
>>>> ->commits_rqs() is only for the case where bd->last is set to false,
>>>> and we never make it to the end and flag bd->last == true. If bd->last
>>>> is true, the driver should kick things into gear.
>>>
>>> OK, looks I misunderstood it. However, virtio-blk doesn't need this
>>> change since virtio_queue_rq() can handle it well. This patch may introduce
>>> one unnecessary VM world switch in case of queue busy.
>>
>> Not it won't, it may in the case of some failure outside of the driver.
> 
> If the failure is because of out of tag, blk_mq_dispatch_wake() will
> rerun the queue, and the bd->last will be set finally. Or is there
> other failure(outside of driver) not covered?

The point is to make this happen when we commit the IOs, without needing
to do a restart (or relying on IO being in-flight). If we're submitting a
string of requests, we should not rely on failures happening only due to
IO already being in flight and thus restarting us. That defeats the
purpose of even having ->last in the first place.
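
That's what the new hook in patch 3/8 takes care of -- roughly this
(trimmed, not the exact patch):

	/*
	 * End of blk_mq_dispatch_rq_list(): if anything is left over, we
	 * never got to send a request with bd.last == true, so tell the
	 * driver to commit what it already received.
	 */
	if (!list_empty(list)) {
		if (q->mq_ops->commit_rqs)
			q->mq_ops->commit_rqs(hctx);

		/* ... then requeue the leftovers and handle restart ... */
	}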

>> The only reason that virtio-blk doesn't currently hang is because it
>> has restart logic, and the failure case only happens in the if we
>> already have IO in-flight.
> 
> Yeah, virtqueue_kick() is called in case of any error in
> virtio_queue_rq(), so I am still wondering why we have to implement
> .commit_rqs() for virtio-blk.

It's not strictly needed for virtio-blk with the restart logic that it
has, but I think it'd be nicer to kill that since we have other real use
cases of bd->last at this point.

>>> IMO bd->last won't work well in case of io scheduler given the rq_list
>>> only includes one single request.
>>
>> But that's a fake limitation that definitely should just be lifted,
>> the fact that blk-mq-sched is _currently_ just doing single requests
>> is woefully inefficient.
> 
> I agree, but seems a bit hard given we have to consider request
> merge.

We don't have to drain everything; it should still be feasible to submit
at least a batch of requests. For basic sequential IO, you want to leave
the last one in the queue if you have IOs going, for instance. But
doing each and every request individually adds a huge amount of extra
work. IOPS comparisons of kyber versus no scheduler show that very
clearly.

>>> I wrote this kind of patch(never posted) before to use sort of
>>> ->commits_rqs() to replace the current bd->last mechanism which need
>>> one extra driver tag, which may improve the above case, also code gets
>>> cleaned up.
>>
>> It doesn't need one extra driver tag, we currently get an extra one just
>> to flag ->last correctly. That's not a requirement, that's a limitation
>> of the current implementation. We could get rid of that, and it it
>> proves to be an issue, that's not hard to do.
> 
> What do you think about using .commit_rqs() to replace ->last? For
> example, just call .commit_rqs() after the last request is queued to
> driver successfully. Then we can remove bd->last and avoid to get the
> extra tag for figuring out bd->last.

I don't want to make ->commit_rqs() part of the regular execution path;
it should be relegated to the "failure" case of not being able to fulfil
our promise of sending a request with bd->last == true. Reasons were
mentioned earlier, but basically it's more efficient to commit from
inside ->queue_rq() if we can, so we don't have to re-grab the
submission lock needlessly.
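
For virtio-blk that's the difference between the normal path below
(paraphrased from virtio_queue_rq(), not verbatim), where the kick
decision rides on the lock we already hold to add the request:

	unsigned long flags;
	bool notify = false;

	spin_lock_irqsave(&vblk->vqs[qid].lock, flags);
	/* ... add the request to the virtqueue ... */
	if (bd->last && virtqueue_kick_prepare(vblk->vqs[qid].vq))
		notify = true;
	spin_unlock_irqrestore(&vblk->vqs[qid].lock, flags);

	if (notify)
		virtqueue_notify(vblk->vqs[qid].vq);

and virtio_commit_rqs() from this patch, which has to take the lock
again just to run virtqueue_kick_prepare().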

I like the idea of separate ->queue and ->commit, but in practice I
don't see it working out without a performance penalty.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 5/8] virtio_blk: implement mq_ops->commit_rqs() hook
  2018-11-29  3:13               ` Jens Axboe
@ 2018-11-29  3:27                 ` Ming Lei
  -1 siblings, 0 replies; 84+ messages in thread
From: Ming Lei @ 2018-11-29  3:27 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, linux-nvme

On Wed, Nov 28, 2018 at 08:13:43PM -0700, Jens Axboe wrote:
> On 11/28/18 7:51 PM, Ming Lei wrote:
> > On Wed, Nov 28, 2018 at 07:19:09PM -0700, Jens Axboe wrote:
> >> On 11/28/18 6:23 PM, Ming Lei wrote:
> >>> On Tue, Nov 27, 2018 at 07:34:51PM -0700, Jens Axboe wrote:
> >>>> On 11/27/18 7:10 PM, Ming Lei wrote:
> >>>>> On Mon, Nov 26, 2018 at 09:35:53AM -0700, Jens Axboe wrote:
> >>>>>> We need this for blk-mq to kick things into gear, if we told it that
> >>>>>> we had more IO coming, but then failed to deliver on that promise.
> >>>>>>
> >>>>>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> >>>>>> ---
> >>>>>>  drivers/block/virtio_blk.c | 15 +++++++++++++++
> >>>>>>  1 file changed, 15 insertions(+)
> >>>>>>
> >>>>>> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> >>>>>> index 6e869d05f91e..b49c57e77780 100644
> >>>>>> --- a/drivers/block/virtio_blk.c
> >>>>>> +++ b/drivers/block/virtio_blk.c
> >>>>>> @@ -214,6 +214,20 @@ static void virtblk_done(struct virtqueue *vq)
> >>>>>>  	spin_unlock_irqrestore(&vblk->vqs[qid].lock, flags);
> >>>>>>  }
> >>>>>>  
> >>>>>> +static void virtio_commit_rqs(struct blk_mq_hw_ctx *hctx)
> >>>>>> +{
> >>>>>> +	struct virtio_blk *vblk = hctx->queue->queuedata;
> >>>>>> +	int qid = hctx->queue_num;
> >>>>>> +	bool kick;
> >>>>>> +
> >>>>>> +	spin_lock_irq(&vblk->vqs[qid].lock);
> >>>>>> +	kick = virtqueue_kick_prepare(vblk->vqs[qid].vq);
> >>>>>> +	spin_unlock_irq(&vblk->vqs[qid].lock);
> >>>>>> +
> >>>>>> +	if (kick)
> >>>>>> +		virtqueue_notify(vblk->vqs[qid].vq);
> >>>>>> +}
> >>>>>> +
> >>>>>>  static blk_status_t virtio_queue_rq(struct blk_mq_hw_ctx *hctx,
> >>>>>>  			   const struct blk_mq_queue_data *bd)
> >>>>>>  {
> >>>>>> @@ -638,6 +652,7 @@ static void virtblk_initialize_rq(struct request *req)
> >>>>>>  
> >>>>>>  static const struct blk_mq_ops virtio_mq_ops = {
> >>>>>>  	.queue_rq	= virtio_queue_rq,
> >>>>>> +	.commit_rqs	= virtio_commit_rqs,
> >>>>>>  	.complete	= virtblk_request_done,
> >>>>>>  	.init_request	= virtblk_init_request,
> >>>>>>  #ifdef CONFIG_VIRTIO_BLK_SCSI
> >>>>>> -- 
> >>>>>> 2.17.1
> >>>>>>
> >>>>>
> >>>>> If .commit_rqs() is implemented, virtqueue_notify() in virtio_queue_rq()
> >>>>> should have been removed for saving the world switch per .queue_rq()
> >>>>
> >>>> ->commits_rqs() is only for the case where bd->last is set to false,
> >>>> and we never make it to the end and flag bd->last == true. If bd->last
> >>>> is true, the driver should kick things into gear.
> >>>
> >>> OK, looks I misunderstood it. However, virtio-blk doesn't need this
> >>> change since virtio_queue_rq() can handle it well. This patch may introduce
> >>> one unnecessary VM world switch in case of queue busy.
> >>
> >> Not it won't, it may in the case of some failure outside of the driver.
> > 
> > If the failure is because of out of tag, blk_mq_dispatch_wake() will
> > rerun the queue, and the bd->last will be set finally. Or is there
> > other failure(outside of driver) not covered?
> 
> The point is to make this happen when we commit the IOs, not needing to
> do a restart (or relying on IO being in-flight). If we're submitting a
> string of requests, we should not rely on failures happening only due to
> IO being going and thus restarting us. It defeats the purpose of even
> having ->last in the first place.

OK, it makes sense.

> 
> >> The only reason that virtio-blk doesn't currently hang is because it
> >> has restart logic, and the failure case only happens in the if we
> >> already have IO in-flight.
> > 
> > Yeah, virtqueue_kick() is called in case of any error in
> > virtio_queue_rq(), so I am still wondering why we have to implement
> > .commit_rqs() for virtio-blk.
> 
> It's not strictly needed for virtio-blk with the restart logic that it
> has, but I think it'd be nicer to kill that since we have other real use
> cases of bd->last at this point.
> 
> >>> IMO bd->last won't work well in case of io scheduler given the rq_list
> >>> only includes one single request.
> >>
> >> But that's a fake limitation that definitely should just be lifted,
> >> the fact that blk-mq-sched is _currently_ just doing single requests
> >> is woefully inefficient.
> > 
> > I agree, but seems a bit hard given we have to consider request
> > merge.
> 
> We don't have to drain everything, it should still be feasible to submit
> at least a batch of requests. For basic sequential IO, you want to leave
> the last one in the queue, if you have IOs going, for instance. But
> doing each and every request individually is a huge extra task. Doing
> IOPS comparisons of kyber and no scheduler reveals that to be very true.
> 
> >>> I wrote this kind of patch(never posted) before to use sort of
> >>> ->commits_rqs() to replace the current bd->last mechanism which need
> >>> one extra driver tag, which may improve the above case, also code gets
> >>> cleaned up.
> >>
> >> It doesn't need one extra driver tag, we currently get an extra one just
> >> to flag ->last correctly. That's not a requirement, that's a limitation
> >> of the current implementation. We could get rid of that, and it it
> >> proves to be an issue, that's not hard to do.
> > 
> > What do you think about using .commit_rqs() to replace ->last? For
> > example, just call .commit_rqs() after the last request is queued to
> > driver successfully. Then we can remove bd->last and avoid to get the
> > extra tag for figuring out bd->last.
> 
> I don't want to make ->commit_rqs() part of the regular execution, it
> should be relegated to the "failure" case of not being able to fulfil
> our promise of sending a request with bd->last == true. Reasons
> mentioned earlier, but basically it's more efficient to commit from
> inside ->queue_rq() if we can, so we don't have to re-grab the
> submission lock needlessly.
> 
> I like the idea of separate ->queue and ->commit, but in practice I
> don't see it working out without a performance penalty.

Thanks for your detailed explanation, this patch looks fine:

Reviewed-by: Ming Lei <ming.lei@redhat.com>

Thanks,
Ming

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 5/8] virtio_blk: implement mq_ops->commit_rqs() hook
  2018-11-29  3:27                 ` Ming Lei
@ 2018-11-29  3:53                   ` Jens Axboe
  -1 siblings, 0 replies; 84+ messages in thread
From: Jens Axboe @ 2018-11-29  3:53 UTC (permalink / raw)
  To: Ming Lei; +Cc: linux-block, linux-nvme

On 11/28/18 8:27 PM, Ming Lei wrote:
> On Wed, Nov 28, 2018 at 08:13:43PM -0700, Jens Axboe wrote:
>> On 11/28/18 7:51 PM, Ming Lei wrote:
>>> On Wed, Nov 28, 2018 at 07:19:09PM -0700, Jens Axboe wrote:
>>>> On 11/28/18 6:23 PM, Ming Lei wrote:
>>>>> On Tue, Nov 27, 2018 at 07:34:51PM -0700, Jens Axboe wrote:
>>>>>> On 11/27/18 7:10 PM, Ming Lei wrote:
>>>>>>> On Mon, Nov 26, 2018 at 09:35:53AM -0700, Jens Axboe wrote:
>>>>>>>> We need this for blk-mq to kick things into gear, if we told it that
>>>>>>>> we had more IO coming, but then failed to deliver on that promise.
>>>>>>>>
>>>>>>>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
>>>>>>>> ---
>>>>>>>>  drivers/block/virtio_blk.c | 15 +++++++++++++++
>>>>>>>>  1 file changed, 15 insertions(+)
>>>>>>>>
>>>>>>>> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
>>>>>>>> index 6e869d05f91e..b49c57e77780 100644
>>>>>>>> --- a/drivers/block/virtio_blk.c
>>>>>>>> +++ b/drivers/block/virtio_blk.c
>>>>>>>> @@ -214,6 +214,20 @@ static void virtblk_done(struct virtqueue *vq)
>>>>>>>>  	spin_unlock_irqrestore(&vblk->vqs[qid].lock, flags);
>>>>>>>>  }
>>>>>>>>  
>>>>>>>> +static void virtio_commit_rqs(struct blk_mq_hw_ctx *hctx)
>>>>>>>> +{
>>>>>>>> +	struct virtio_blk *vblk = hctx->queue->queuedata;
>>>>>>>> +	int qid = hctx->queue_num;
>>>>>>>> +	bool kick;
>>>>>>>> +
>>>>>>>> +	spin_lock_irq(&vblk->vqs[qid].lock);
>>>>>>>> +	kick = virtqueue_kick_prepare(vblk->vqs[qid].vq);
>>>>>>>> +	spin_unlock_irq(&vblk->vqs[qid].lock);
>>>>>>>> +
>>>>>>>> +	if (kick)
>>>>>>>> +		virtqueue_notify(vblk->vqs[qid].vq);
>>>>>>>> +}
>>>>>>>> +
>>>>>>>>  static blk_status_t virtio_queue_rq(struct blk_mq_hw_ctx *hctx,
>>>>>>>>  			   const struct blk_mq_queue_data *bd)
>>>>>>>>  {
>>>>>>>> @@ -638,6 +652,7 @@ static void virtblk_initialize_rq(struct request *req)
>>>>>>>>  
>>>>>>>>  static const struct blk_mq_ops virtio_mq_ops = {
>>>>>>>>  	.queue_rq	= virtio_queue_rq,
>>>>>>>> +	.commit_rqs	= virtio_commit_rqs,
>>>>>>>>  	.complete	= virtblk_request_done,
>>>>>>>>  	.init_request	= virtblk_init_request,
>>>>>>>>  #ifdef CONFIG_VIRTIO_BLK_SCSI
>>>>>>>> -- 
>>>>>>>> 2.17.1
>>>>>>>>
>>>>>>>
>>>>>>> If .commit_rqs() is implemented, virtqueue_notify() in virtio_queue_rq()
>>>>>>> should have been removed for saving the world switch per .queue_rq()
>>>>>>
>>>>>> ->commits_rqs() is only for the case where bd->last is set to false,
>>>>>> and we never make it to the end and flag bd->last == true. If bd->last
>>>>>> is true, the driver should kick things into gear.
>>>>>
>>>>> OK, looks I misunderstood it. However, virtio-blk doesn't need this
>>>>> change since virtio_queue_rq() can handle it well. This patch may introduce
>>>>> one unnecessary VM world switch in case of queue busy.
>>>>
>>>> Not it won't, it may in the case of some failure outside of the driver.
>>>
>>> If the failure is because of out of tag, blk_mq_dispatch_wake() will
>>> rerun the queue, and the bd->last will be set finally. Or is there
>>> other failure(outside of driver) not covered?
>>
>> The point is to make this happen when we commit the IOs, not needing to
>> do a restart (or relying on IO being in-flight). If we're submitting a
>> string of requests, we should not rely on failures happening only due to
>> IO being going and thus restarting us. It defeats the purpose of even
>> having ->last in the first place.
> 
> OK, it makes sense.
> 
>>
>>>> The only reason that virtio-blk doesn't currently hang is because it
>>>> has restart logic, and the failure case only happens in the if we
>>>> already have IO in-flight.
>>>
>>> Yeah, virtqueue_kick() is called in case of any error in
>>> virtio_queue_rq(), so I am still wondering why we have to implement
>>> .commit_rqs() for virtio-blk.
>>
>> It's not strictly needed for virtio-blk with the restart logic that it
>> has, but I think it'd be nicer to kill that since we have other real use
>> cases of bd->last at this point.
>>
>>>>> IMO bd->last won't work well in case of io scheduler given the rq_list
>>>>> only includes one single request.
>>>>
>>>> But that's a fake limitation that definitely should just be lifted,
>>>> the fact that blk-mq-sched is _currently_ just doing single requests
>>>> is woefully inefficient.
>>>
>>> I agree, but seems a bit hard given we have to consider request
>>> merge.
>>
>> We don't have to drain everything, it should still be feasible to submit
>> at least a batch of requests. For basic sequential IO, you want to leave
>> the last one in the queue, if you have IOs going, for instance. But
>> doing each and every request individually is a huge extra task. Doing
>> IOPS comparisons of kyber and no scheduler reveals that to be very true.
>>
>>>>> I wrote this kind of patch(never posted) before to use sort of
>>>>> ->commits_rqs() to replace the current bd->last mechanism which need
>>>>> one extra driver tag, which may improve the above case, also code gets
>>>>> cleaned up.
>>>>
>>>> It doesn't need one extra driver tag, we currently get an extra one just
>>>> to flag ->last correctly. That's not a requirement, that's a limitation
>>>> of the current implementation. We could get rid of that, and it it
>>>> proves to be an issue, that's not hard to do.
>>>
>>> What do you think about using .commit_rqs() to replace ->last? For
>>> example, just call .commit_rqs() after the last request is queued to
>>> driver successfully. Then we can remove bd->last and avoid to get the
>>> extra tag for figuring out bd->last.
>>
>> I don't want to make ->commit_rqs() part of the regular execution, it
>> should be relegated to the "failure" case of not being able to fulfil
>> our promise of sending a request with bd->last == true. Reasons
>> mentioned earlier, but basically it's more efficient to commit from
>> inside ->queue_rq() if we can, so we don't have to re-grab the
>> submission lock needlessly.
>>
>> I like the idea of separate ->queue and ->commit, but in practice I
>> don't see it working out without a performance penalty.
> 
> Thanks for your detailed explanation, this patch looks fine:
> 
> Reviewed-by: Ming Lei <ming.lei@redhat.com>

Thanks Ming.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 84+ messages in thread

* [PATCH 5/8] virtio_blk: implement mq_ops->commit_rqs() hook
@ 2018-11-29  3:53                   ` Jens Axboe
  0 siblings, 0 replies; 84+ messages in thread
From: Jens Axboe @ 2018-11-29  3:53 UTC (permalink / raw)


On 11/28/18 8:27 PM, Ming Lei wrote:
> On Wed, Nov 28, 2018@08:13:43PM -0700, Jens Axboe wrote:
>> On 11/28/18 7:51 PM, Ming Lei wrote:
>>> On Wed, Nov 28, 2018@07:19:09PM -0700, Jens Axboe wrote:
>>>> On 11/28/18 6:23 PM, Ming Lei wrote:
>>>>> On Tue, Nov 27, 2018@07:34:51PM -0700, Jens Axboe wrote:
>>>>>> On 11/27/18 7:10 PM, Ming Lei wrote:
>>>>>>> On Mon, Nov 26, 2018@09:35:53AM -0700, Jens Axboe wrote:
>>>>>>>> We need this for blk-mq to kick things into gear, if we told it that
>>>>>>>> we had more IO coming, but then failed to deliver on that promise.
>>>>>>>>
>>>>>>>> Signed-off-by: Jens Axboe <axboe at kernel.dk>
>>>>>>>> ---
>>>>>>>>  drivers/block/virtio_blk.c | 15 +++++++++++++++
>>>>>>>>  1 file changed, 15 insertions(+)
>>>>>>>>
>>>>>>>> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
>>>>>>>> index 6e869d05f91e..b49c57e77780 100644
>>>>>>>> --- a/drivers/block/virtio_blk.c
>>>>>>>> +++ b/drivers/block/virtio_blk.c
>>>>>>>> @@ -214,6 +214,20 @@ static void virtblk_done(struct virtqueue *vq)
>>>>>>>>  	spin_unlock_irqrestore(&vblk->vqs[qid].lock, flags);
>>>>>>>>  }
>>>>>>>>  
>>>>>>>> +static void virtio_commit_rqs(struct blk_mq_hw_ctx *hctx)
>>>>>>>> +{
>>>>>>>> +	struct virtio_blk *vblk = hctx->queue->queuedata;
>>>>>>>> +	int qid = hctx->queue_num;
>>>>>>>> +	bool kick;
>>>>>>>> +
>>>>>>>> +	spin_lock_irq(&vblk->vqs[qid].lock);
>>>>>>>> +	kick = virtqueue_kick_prepare(vblk->vqs[qid].vq);
>>>>>>>> +	spin_unlock_irq(&vblk->vqs[qid].lock);
>>>>>>>> +
>>>>>>>> +	if (kick)
>>>>>>>> +		virtqueue_notify(vblk->vqs[qid].vq);
>>>>>>>> +}
>>>>>>>> +
>>>>>>>>  static blk_status_t virtio_queue_rq(struct blk_mq_hw_ctx *hctx,
>>>>>>>>  			   const struct blk_mq_queue_data *bd)
>>>>>>>>  {
>>>>>>>> @@ -638,6 +652,7 @@ static void virtblk_initialize_rq(struct request *req)
>>>>>>>>  
>>>>>>>>  static const struct blk_mq_ops virtio_mq_ops = {
>>>>>>>>  	.queue_rq	= virtio_queue_rq,
>>>>>>>> +	.commit_rqs	= virtio_commit_rqs,
>>>>>>>>  	.complete	= virtblk_request_done,
>>>>>>>>  	.init_request	= virtblk_init_request,
>>>>>>>>  #ifdef CONFIG_VIRTIO_BLK_SCSI
>>>>>>>> -- 
>>>>>>>> 2.17.1
>>>>>>>>
>>>>>>>
>>>>>>> If .commit_rqs() is implemented, virtqueue_notify() in virtio_queue_rq()
>>>>>>> should have been removed for saving the world switch per .queue_rq()
>>>>>>
>>>>>> ->commits_rqs() is only for the case where bd->last is set to false,
>>>>>> and we never make it to the end and flag bd->last == true. If bd->last
>>>>>> is true, the driver should kick things into gear.
>>>>>
>>>>> OK, looks I misunderstood it. However, virtio-blk doesn't need this
>>>>> change since virtio_queue_rq() can handle it well. This patch may introduce
>>>>> one unnecessary VM world switch in case of queue busy.
>>>>
>>>> Not it won't, it may in the case of some failure outside of the driver.
>>>
>>> If the failure is because of out of tag, blk_mq_dispatch_wake() will
>>> rerun the queue, and the bd->last will be set finally. Or is there
>>> other failure(outside of driver) not covered?
>>
>> The point is to make this happen when we commit the IOs, not needing to
>> do a restart (or relying on IO being in-flight). If we're submitting a
>> string of requests, we should not rely on failures happening only due to
>> IO being going and thus restarting us. It defeats the purpose of even
>> having ->last in the first place.
> 
> OK, it makes sense.
> 
>>
>>>> The only reason that virtio-blk doesn't currently hang is because it
>>>> has restart logic, and the failure case only happens in the if we
>>>> already have IO in-flight.
>>>
>>> Yeah, virtqueue_kick() is called in case of any error in
>>> virtio_queue_rq(), so I am still wondering why we have to implement
>>> .commit_rqs() for virtio-blk.
>>
>> It's not strictly needed for virtio-blk with the restart logic that it
>> has, but I think it'd be nicer to kill that since we have other real use
>> cases of bd->last at this point.
>>
>>>>> IMO bd->last won't work well in case of io scheduler given the rq_list
>>>>> only includes one single request.
>>>>
>>>> But that's a fake limitation that definitely should just be lifted,
>>>> the fact that blk-mq-sched is _currently_ just doing single requests
>>>> is woefully inefficient.
>>>
>>> I agree, but seems a bit hard given we have to consider request
>>> merge.
>>
>> We don't have to drain everything, it should still be feasible to submit
>> at least a batch of requests. For basic sequential IO, you want to leave
>> the last one in the queue, if you have IOs going, for instance. But
>> doing each and every request individually is a huge extra task. Doing
>> IOPS comparisons of kyber and no scheduler reveals that to be very true.
>>
>>>>> I wrote this kind of patch (never posted) before, using something like
>>>>> ->commit_rqs() to replace the current bd->last mechanism, which needs
>>>>> one extra driver tag. That may improve the above case, and the code
>>>>> gets cleaned up too.
>>>>
>>>> It doesn't need one extra driver tag; we currently get an extra one just
>>>> to flag ->last correctly. That's not a requirement, that's a limitation
>>>> of the current implementation. We could get rid of that, and if it
>>>> proves to be an issue, that's not hard to do.
>>>
>>> What do you think about using .commit_rqs() to replace ->last? For
>>> example, just call .commit_rqs() after the last request is queued to
>>> the driver successfully. Then we can remove bd->last and avoid getting
>>> the extra tag just to figure out bd->last.
>>
>> I don't want to make ->commit_rqs() part of the regular execution path; it
>> should be relegated to the "failure" case of not being able to fulfil
>> our promise of sending a request with bd->last == true. Reasons
>> mentioned earlier, but basically it's more efficient to commit from
>> inside ->queue_rq() if we can, so we don't have to re-grab the
>> submission lock needlessly.
>>
>> I like the idea of separate ->queue and ->commit, but in practice I
>> don't see it working out without a performance penalty.
> 
> Thanks for your detailed explanation, this patch looks fine:
> 
> Reviewed-by: Ming Lei <ming.lei at redhat.com>

Thanks Ming.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 84+ messages in thread

end of thread, other threads:[~2018-11-29  3:53 UTC | newest]

Thread overview: 84+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-11-26 16:35 [PATCHSET 0/8] block plugging improvements Jens Axboe
2018-11-26 16:35 ` Jens Axboe
2018-11-26 16:35 ` [PATCH 1/8] block: sum requests in the plug structure Jens Axboe
2018-11-26 16:35   ` Jens Axboe
2018-11-26 17:02   ` Christoph Hellwig
2018-11-26 17:02     ` Christoph Hellwig
2018-11-26 16:35 ` [PATCH 2/8] block: improve logic around when to sort a plug list Jens Axboe
2018-11-26 16:35   ` Jens Axboe
2018-11-27 23:31   ` Omar Sandoval
2018-11-27 23:31     ` Omar Sandoval
2018-11-27 23:49     ` Jens Axboe
2018-11-27 23:49       ` Jens Axboe
2018-11-27 23:55       ` Omar Sandoval
2018-11-27 23:55         ` Omar Sandoval
2018-11-27 23:59       ` Jens Axboe
2018-11-27 23:59         ` Jens Axboe
2018-11-28  0:05         ` Omar Sandoval
2018-11-28  0:05           ` Omar Sandoval
2018-11-28  0:16           ` Jens Axboe
2018-11-28  0:16             ` Jens Axboe
2018-11-26 16:35 ` [PATCH 3/8] blk-mq: add mq_ops->commit_rqs() Jens Axboe
2018-11-26 16:35   ` Jens Axboe
2018-11-27 23:43   ` Omar Sandoval
2018-11-27 23:43     ` Omar Sandoval
2018-11-28  1:38   ` Ming Lei
2018-11-28  1:38     ` Ming Lei
2018-11-28  7:16   ` Christoph Hellwig
2018-11-28  7:16     ` Christoph Hellwig
2018-11-28 12:54     ` Jens Axboe
2018-11-28 12:54       ` Jens Axboe
2018-11-26 16:35 ` [PATCH 4/8] nvme: implement mq_ops->commit_rqs() hook Jens Axboe
2018-11-26 16:35   ` Jens Axboe
2018-11-28  7:20   ` Christoph Hellwig
2018-11-28  7:20     ` Christoph Hellwig
2018-11-28 13:07     ` Jens Axboe
2018-11-28 13:07       ` Jens Axboe
2018-11-26 16:35 ` [PATCH 5/8] virtio_blk: " Jens Axboe
2018-11-26 16:35   ` Jens Axboe
2018-11-27 23:45   ` Omar Sandoval
2018-11-27 23:45     ` Omar Sandoval
2018-11-28  3:05     ` Michael S. Tsirkin
2018-11-28  3:05       ` Michael S. Tsirkin
2018-11-28  2:10   ` Ming Lei
2018-11-28  2:10     ` Ming Lei
2018-11-28  2:34     ` Jens Axboe
2018-11-28  2:34       ` Jens Axboe
2018-11-29  1:23       ` Ming Lei
2018-11-29  1:23         ` Ming Lei
2018-11-29  2:19         ` Jens Axboe
2018-11-29  2:19           ` Jens Axboe
2018-11-29  2:51           ` Ming Lei
2018-11-29  2:51             ` Ming Lei
2018-11-29  3:13             ` Jens Axboe
2018-11-29  3:13               ` Jens Axboe
2018-11-29  3:27               ` Ming Lei
2018-11-29  3:27                 ` Ming Lei
2018-11-29  3:53                 ` Jens Axboe
2018-11-29  3:53                   ` Jens Axboe
2018-11-28  7:21   ` Christoph Hellwig
2018-11-28  7:21     ` Christoph Hellwig
2018-11-26 16:35 ` [PATCH 6/8] ataflop: " Jens Axboe
2018-11-26 16:35   ` Jens Axboe
2018-11-27 23:46   ` Omar Sandoval
2018-11-27 23:46     ` Omar Sandoval
2018-11-28  7:22   ` Christoph Hellwig
2018-11-28  7:22     ` Christoph Hellwig
2018-11-28 13:09     ` Jens Axboe
2018-11-28 13:09       ` Jens Axboe
2018-11-26 16:35 ` [PATCH 7/8] blk-mq: use bd->last == true for list inserts Jens Axboe
2018-11-26 16:35   ` Jens Axboe
2018-11-27 23:49   ` Omar Sandoval
2018-11-27 23:49     ` Omar Sandoval
2018-11-27 23:51     ` Jens Axboe
2018-11-27 23:51       ` Jens Axboe
2018-11-28  1:49   ` Ming Lei
2018-11-28  1:49     ` Ming Lei
2018-11-28  2:37     ` Jens Axboe
2018-11-28  2:37       ` Jens Axboe
2018-11-26 16:35 ` [PATCH 8/8] blk-mq: add plug case for devices that implement ->commits_rqs() Jens Axboe
2018-11-26 16:35   ` Jens Axboe
2018-11-28  7:26   ` Christoph Hellwig
2018-11-28  7:26     ` Christoph Hellwig
2018-11-28 13:11     ` Jens Axboe
2018-11-28 13:11       ` Jens Axboe
