linux-block.vger.kernel.org archive mirror
* [PATCHSET v2 0/4] Add support for list issue
@ 2021-12-03 21:45 Jens Axboe
  2021-12-03 21:45 ` [PATCH 1/4] block: add mq_ops->queue_rqs hook Jens Axboe
                   ` (3 more replies)
  0 siblings, 4 replies; 16+ messages in thread
From: Jens Axboe @ 2021-12-03 21:45 UTC (permalink / raw)
  To: linux-block, linux-nvme

Hi,

With the support in 5.16-rc1 for allocating and completing batches of
IO, the one missing piece is passing down a list of requests for issue.
Drivers can take advantage of this by defining an mq_ops->queue_rqs()
hook.

This implements it for NVMe, allowing multiple commands to be copied into
the submission queue in one go.
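
As an aside, and purely as an illustration of the hook (this is not part of
the series; mydrv_issue_one() and mydrv_queue_rq() are hypothetical), a
driver-side implementation boils down to draining the passed-in list. The
real NVMe version in patch 4 additionally handles per-hardware-queue
batching and error requeue:

	static void mydrv_queue_rqs(struct request **rqlist)
	{
		struct request *req;

		/*
		 * Issue each request in the plug list. Anything left on
		 * *rqlist when this returns is dispatched one at a time
		 * by the block layer via ->queue_rq() instead.
		 */
		while ((req = rq_list_pop(rqlist)))
			mydrv_issue_one(req);
	}

	static const struct blk_mq_ops mydrv_mq_ops = {
		.queue_rq	= mydrv_queue_rq,
		.queue_rqs	= mydrv_queue_rqs,
		/* remaining ops omitted for brevity */
	};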

In my testing, this is good for around a 500K IOPS/core improvement, or
roughly a 5-6% gain in efficiency.

Changes since v1:

- Addressed review comments
- Rebase on top of Ming's hctx lock change
- Clean ups
- Bypass for shared tags

-- 
Jens Axboe



^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 1/4] block: add mq_ops->queue_rqs hook
  2021-12-03 21:45 [PATCHSET v2 0/4] Add support for list issue Jens Axboe
@ 2021-12-03 21:45 ` Jens Axboe
  2021-12-04 10:43   ` Hannes Reinecke
  2021-12-03 21:45 ` [PATCH 2/4] nvme: split command copy into a helper Jens Axboe
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 16+ messages in thread
From: Jens Axboe @ 2021-12-03 21:45 UTC (permalink / raw)
  To: linux-block, linux-nvme; +Cc: Jens Axboe

If we have a list of requests in our plug list, send it to the driver in
one go, if possible. The driver must set mq_ops->queue_rqs() to support
this; if it doesn't, the usual one-by-one path is used.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 block/blk-mq.c         | 24 +++++++++++++++++++++---
 include/linux/blk-mq.h |  8 ++++++++
 2 files changed, 29 insertions(+), 3 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 22ec21aa0c22..9ac9174a2ba4 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2513,6 +2513,7 @@ void blk_mq_flush_plug_list(struct blk_plug *plug, bool from_schedule)
 {
 	struct blk_mq_hw_ctx *this_hctx;
 	struct blk_mq_ctx *this_ctx;
+	struct request *rq;
 	unsigned int depth;
 	LIST_HEAD(list);
 
@@ -2521,7 +2522,26 @@ void blk_mq_flush_plug_list(struct blk_plug *plug, bool from_schedule)
 	plug->rq_count = 0;
 
 	if (!plug->multiple_queues && !plug->has_elevator && !from_schedule) {
-		blk_mq_run_dispatch_ops(plug->mq_list->q,
+		struct request_queue *q;
+
+		rq = plug->mq_list;
+		q = rq->q;
+
+		/*
+		 * Peek first request and see if we have a ->queue_rqs() hook.
+		 * If we do, we can dispatch the whole plug list in one go. We
+		 * already know at this point that all requests belong to the
+		 * same queue, caller must ensure that's the case.
+		 */
+		if (q->mq_ops->queue_rqs &&
+		    !(rq->mq_hctx->flags & BLK_MQ_F_TAG_QUEUE_SHARED)) {
+			blk_mq_run_dispatch_ops(q,
+				q->mq_ops->queue_rqs(&plug->mq_list));
+			if (rq_list_empty(plug->mq_list))
+				return;
+		}
+
+		blk_mq_run_dispatch_ops(q,
 				blk_mq_plug_issue_direct(plug, false));
 		if (rq_list_empty(plug->mq_list))
 			return;
@@ -2531,8 +2551,6 @@ void blk_mq_flush_plug_list(struct blk_plug *plug, bool from_schedule)
 	this_ctx = NULL;
 	depth = 0;
 	do {
-		struct request *rq;
-
 		rq = rq_list_pop(&plug->mq_list);
 
 		if (!this_hctx) {
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index ecdc049b52fa..cdd183757ea0 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -494,6 +494,14 @@ struct blk_mq_ops {
 	 */
 	void (*commit_rqs)(struct blk_mq_hw_ctx *);
 
+	/**
+	 * @queue_rqs: Queue a list of new requests. Driver is guaranteed
+	 * that each request belongs to the same queue. If the driver doesn't
+	 * empty the @rqlist completely, then the rest will be queued
+	 * individually by the block layer upon return.
+	 */
+	void (*queue_rqs)(struct request **rqlist);
+
 	/**
 	 * @get_budget: Reserve budget before queue request, once .queue_rq is
 	 * run, it is driver's responsibility to release the
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 2/4] nvme: split command copy into a helper
  2021-12-03 21:45 [PATCHSET v2 0/4] Add support for list issue Jens Axboe
  2021-12-03 21:45 ` [PATCH 1/4] block: add mq_ops->queue_rqs hook Jens Axboe
@ 2021-12-03 21:45 ` Jens Axboe
  2021-12-04 10:44   ` Hannes Reinecke
  2021-12-06  7:38   ` Christoph Hellwig
  2021-12-03 21:45 ` [PATCH 3/4] nvme: separate command prep and issue Jens Axboe
  2021-12-03 21:45 ` [PATCH 4/4] nvme: add support for mq_ops->queue_rqs() Jens Axboe
  3 siblings, 2 replies; 16+ messages in thread
From: Jens Axboe @ 2021-12-03 21:45 UTC (permalink / raw)
  To: linux-block, linux-nvme; +Cc: Jens Axboe, Chaitanya Kulkarni

We'll need it for batched submit as well.

Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 drivers/nvme/host/pci.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 8637538f3fd5..09ea21f75439 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -500,6 +500,15 @@ static inline void nvme_write_sq_db(struct nvme_queue *nvmeq, bool write_sq)
 	nvmeq->last_sq_tail = nvmeq->sq_tail;
 }
 
+static inline void nvme_sq_copy_cmd(struct nvme_queue *nvmeq,
+				    struct nvme_command *cmd)
+{
+	memcpy(nvmeq->sq_cmds + (nvmeq->sq_tail << nvmeq->sqes), cmd,
+		sizeof(*cmd));
+	if (++nvmeq->sq_tail == nvmeq->q_depth)
+		nvmeq->sq_tail = 0;
+}
+
 /**
  * nvme_submit_cmd() - Copy a command into a queue and ring the doorbell
  * @nvmeq: The queue to use
@@ -510,10 +519,7 @@ static void nvme_submit_cmd(struct nvme_queue *nvmeq, struct nvme_command *cmd,
 			    bool write_sq)
 {
 	spin_lock(&nvmeq->sq_lock);
-	memcpy(nvmeq->sq_cmds + (nvmeq->sq_tail << nvmeq->sqes),
-	       cmd, sizeof(*cmd));
-	if (++nvmeq->sq_tail == nvmeq->q_depth)
-		nvmeq->sq_tail = 0;
+	nvme_sq_copy_cmd(nvmeq, cmd);
 	nvme_write_sq_db(nvmeq, write_sq);
 	spin_unlock(&nvmeq->sq_lock);
 }
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 3/4] nvme: separate command prep and issue
  2021-12-03 21:45 [PATCHSET v2 0/4] Add support for list issue Jens Axboe
  2021-12-03 21:45 ` [PATCH 1/4] block: add mq_ops->queue_rqs hook Jens Axboe
  2021-12-03 21:45 ` [PATCH 2/4] nvme: split command copy into a helper Jens Axboe
@ 2021-12-03 21:45 ` Jens Axboe
  2021-12-04 10:45   ` Hannes Reinecke
  2021-12-03 21:45 ` [PATCH 4/4] nvme: add support for mq_ops->queue_rqs() Jens Axboe
  3 siblings, 1 reply; 16+ messages in thread
From: Jens Axboe @ 2021-12-03 21:45 UTC (permalink / raw)
  To: linux-block, linux-nvme; +Cc: Jens Axboe

Add an nvme_prep_rq() helper to set up a command, and adapt nvme_queue_rq()
to use it.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 drivers/nvme/host/pci.c | 57 ++++++++++++++++++++++++-----------------
 1 file changed, 33 insertions(+), 24 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 09ea21f75439..6be6b1ab4285 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -918,52 +918,32 @@ static blk_status_t nvme_map_metadata(struct nvme_dev *dev, struct request *req,
 	return BLK_STS_OK;
 }
 
-/*
- * NOTE: ns is NULL when called on the admin queue.
- */
-static blk_status_t nvme_queue_rq(struct blk_mq_hw_ctx *hctx,
-			 const struct blk_mq_queue_data *bd)
+static blk_status_t nvme_prep_rq(struct nvme_dev *dev, struct request *req)
 {
-	struct nvme_ns *ns = hctx->queue->queuedata;
-	struct nvme_queue *nvmeq = hctx->driver_data;
-	struct nvme_dev *dev = nvmeq->dev;
-	struct request *req = bd->rq;
 	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
-	struct nvme_command *cmnd = &iod->cmd;
 	blk_status_t ret;
 
 	iod->aborted = 0;
 	iod->npages = -1;
 	iod->nents = 0;
 
-	/*
-	 * We should not need to do this, but we're still using this to
-	 * ensure we can drain requests on a dying queue.
-	 */
-	if (unlikely(!test_bit(NVMEQ_ENABLED, &nvmeq->flags)))
-		return BLK_STS_IOERR;
-
-	if (!nvme_check_ready(&dev->ctrl, req, true))
-		return nvme_fail_nonready_command(&dev->ctrl, req);
-
-	ret = nvme_setup_cmd(ns, req);
+	ret = nvme_setup_cmd(req->q->queuedata, req);
 	if (ret)
 		return ret;
 
 	if (blk_rq_nr_phys_segments(req)) {
-		ret = nvme_map_data(dev, req, cmnd);
+		ret = nvme_map_data(dev, req, &iod->cmd);
 		if (ret)
 			goto out_free_cmd;
 	}
 
 	if (blk_integrity_rq(req)) {
-		ret = nvme_map_metadata(dev, req, cmnd);
+		ret = nvme_map_metadata(dev, req, &iod->cmd);
 		if (ret)
 			goto out_unmap_data;
 	}
 
 	blk_mq_start_request(req);
-	nvme_submit_cmd(nvmeq, cmnd, bd->last);
 	return BLK_STS_OK;
 out_unmap_data:
 	nvme_unmap_data(dev, req);
@@ -972,6 +952,35 @@ static blk_status_t nvme_queue_rq(struct blk_mq_hw_ctx *hctx,
 	return ret;
 }
 
+/*
+ * NOTE: ns is NULL when called on the admin queue.
+ */
+static blk_status_t nvme_queue_rq(struct blk_mq_hw_ctx *hctx,
+			 const struct blk_mq_queue_data *bd)
+{
+	struct nvme_queue *nvmeq = hctx->driver_data;
+	struct nvme_dev *dev = nvmeq->dev;
+	struct request *req = bd->rq;
+	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
+	blk_status_t ret;
+
+	/*
+	 * We should not need to do this, but we're still using this to
+	 * ensure we can drain requests on a dying queue.
+	 */
+	if (unlikely(!test_bit(NVMEQ_ENABLED, &nvmeq->flags)))
+		return BLK_STS_IOERR;
+
+	if (unlikely(!nvme_check_ready(&dev->ctrl, req, true)))
+		return nvme_fail_nonready_command(&dev->ctrl, req);
+
+	ret = nvme_prep_rq(dev, req);
+	if (unlikely(ret))
+		return ret;
+	nvme_submit_cmd(nvmeq, &iod->cmd, bd->last);
+	return BLK_STS_OK;
+}
+
 static __always_inline void nvme_pci_unmap_rq(struct request *req)
 {
 	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 4/4] nvme: add support for mq_ops->queue_rqs()
  2021-12-03 21:45 [PATCHSET v2 0/4] Add support for list issue Jens Axboe
                   ` (2 preceding siblings ...)
  2021-12-03 21:45 ` [PATCH 3/4] nvme: separate command prep and issue Jens Axboe
@ 2021-12-03 21:45 ` Jens Axboe
  2021-12-04 10:47   ` Hannes Reinecke
  2021-12-06  7:40   ` Christoph Hellwig
  3 siblings, 2 replies; 16+ messages in thread
From: Jens Axboe @ 2021-12-03 21:45 UTC (permalink / raw)
  To: linux-block, linux-nvme; +Cc: Jens Axboe

This enables the block layer to send us a full plug list of requests
that need submitting. The block layer guarantees that they all belong
to the same queue, but we do have to check the hardware queue mapping
for each request.

If errors are encountered, those requests are left in the passed-in list,
and the block layer will then handle them individually.

This is good for about a 4% improvement in peak performance, taking us
from 9.6M to 10M IOPS/core.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 drivers/nvme/host/pci.c | 61 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 61 insertions(+)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 6be6b1ab4285..197aa45ef7ef 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -981,6 +981,66 @@ static blk_status_t nvme_queue_rq(struct blk_mq_hw_ctx *hctx,
 	return BLK_STS_OK;
 }
 
+static void nvme_submit_cmds(struct nvme_queue *nvmeq, struct request **rqlist)
+{
+	spin_lock(&nvmeq->sq_lock);
+	while (!rq_list_empty(*rqlist)) {
+		struct request *req = rq_list_pop(rqlist);
+		struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
+
+		memcpy(nvmeq->sq_cmds + (nvmeq->sq_tail << nvmeq->sqes),
+				absolute_pointer(&iod->cmd), sizeof(iod->cmd));
+		if (++nvmeq->sq_tail == nvmeq->q_depth)
+			nvmeq->sq_tail = 0;
+	}
+	nvme_write_sq_db(nvmeq, true);
+	spin_unlock(&nvmeq->sq_lock);
+}
+
+static bool nvme_prep_rq_batch(struct nvme_queue *nvmeq, struct request *req)
+{
+	/*
+	 * We should not need to do this, but we're still using this to
+	 * ensure we can drain requests on a dying queue.
+	 */
+	if (unlikely(!test_bit(NVMEQ_ENABLED, &nvmeq->flags)))
+		return false;
+	if (unlikely(!nvme_check_ready(&nvmeq->dev->ctrl, req, true)))
+		return false;
+
+	req->mq_hctx->tags->rqs[req->tag] = req;
+	return nvme_prep_rq(nvmeq->dev, req) == BLK_STS_OK;
+}
+
+static void nvme_queue_rqs(struct request **rqlist)
+{
+	struct request *req = rq_list_peek(rqlist), *prev = NULL;
+	struct request *requeue_list = NULL;
+
+	do {
+		struct nvme_queue *nvmeq = req->mq_hctx->driver_data;
+
+		if (!nvme_prep_rq_batch(nvmeq, req)) {
+			/* detach 'req' and add to remainder list */
+			if (prev)
+				prev->rq_next = req->rq_next;
+			rq_list_add(&requeue_list, req);
+		} else {
+			prev = req;
+		}
+
+		req = rq_list_next(req);
+		if (!req || (prev && req->mq_hctx != prev->mq_hctx)) {
+			/* detach rest of list, and submit */
+			prev->rq_next = NULL;
+			nvme_submit_cmds(nvmeq, rqlist);
+			*rqlist = req;
+		}
+	} while (req);
+
+	*rqlist = requeue_list;
+}
+
 static __always_inline void nvme_pci_unmap_rq(struct request *req)
 {
 	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
@@ -1678,6 +1738,7 @@ static const struct blk_mq_ops nvme_mq_admin_ops = {
 
 static const struct blk_mq_ops nvme_mq_ops = {
 	.queue_rq	= nvme_queue_rq,
+	.queue_rqs	= nvme_queue_rqs,
 	.complete	= nvme_pci_complete_rq,
 	.commit_rqs	= nvme_commit_rqs,
 	.init_hctx	= nvme_init_hctx,
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/4] block: add mq_ops->queue_rqs hook
  2021-12-03 21:45 ` [PATCH 1/4] block: add mq_ops->queue_rqs hook Jens Axboe
@ 2021-12-04 10:43   ` Hannes Reinecke
  2021-12-04 20:13     ` Jens Axboe
  0 siblings, 1 reply; 16+ messages in thread
From: Hannes Reinecke @ 2021-12-04 10:43 UTC (permalink / raw)
  To: Jens Axboe, linux-block, linux-nvme

On 12/3/21 10:45 PM, Jens Axboe wrote:
> If we have a list of requests in our plug list, send it to the driver in
> one go, if possible. The driver must set mq_ops->queue_rqs() to support
> this, if not the usual one-by-one path is used.
> 
> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> ---
>   block/blk-mq.c         | 24 +++++++++++++++++++++---
>   include/linux/blk-mq.h |  8 ++++++++
>   2 files changed, 29 insertions(+), 3 deletions(-)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 22ec21aa0c22..9ac9174a2ba4 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -2513,6 +2513,7 @@ void blk_mq_flush_plug_list(struct blk_plug *plug, bool from_schedule)
>   {
>   	struct blk_mq_hw_ctx *this_hctx;
>   	struct blk_mq_ctx *this_ctx;
> +	struct request *rq;
>   	unsigned int depth;
>   	LIST_HEAD(list);
>   
> @@ -2521,7 +2522,26 @@ void blk_mq_flush_plug_list(struct blk_plug *plug, bool from_schedule)
>   	plug->rq_count = 0;
>   
>   	if (!plug->multiple_queues && !plug->has_elevator && !from_schedule) {
> -		blk_mq_run_dispatch_ops(plug->mq_list->q,
> +		struct request_queue *q;
> +
> +		rq = plug->mq_list;
> +		q = rq->q;
> +
> +		/*
> +		 * Peek first request and see if we have a ->queue_rqs() hook.
> +		 * If we do, we can dispatch the whole plug list in one go. We
> +		 * already know at this point that all requests belong to the
> +		 * same queue, caller must ensure that's the case.
> +		 */
> +		if (q->mq_ops->queue_rqs &&
> +		    !(rq->mq_hctx->flags & BLK_MQ_F_TAG_QUEUE_SHARED)) {

What is the dependency on shared tags here?
From what I've seen it's just about submitting requests; the only 
difference to shared tags is the way the tags are allocated.
Care to explain?

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/4] nvme: split command copy into a helper
  2021-12-03 21:45 ` [PATCH 2/4] nvme: split command copy into a helper Jens Axboe
@ 2021-12-04 10:44   ` Hannes Reinecke
  2021-12-06  7:38   ` Christoph Hellwig
  1 sibling, 0 replies; 16+ messages in thread
From: Hannes Reinecke @ 2021-12-04 10:44 UTC (permalink / raw)
  To: Jens Axboe, linux-block, linux-nvme; +Cc: Chaitanya Kulkarni

On 12/3/21 10:45 PM, Jens Axboe wrote:
> We'll need it for batched submit as well.
> 
> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> ---
>   drivers/nvme/host/pci.c | 14 ++++++++++----
>   1 file changed, 10 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index 8637538f3fd5..09ea21f75439 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -500,6 +500,15 @@ static inline void nvme_write_sq_db(struct nvme_queue *nvmeq, bool write_sq)
>   	nvmeq->last_sq_tail = nvmeq->sq_tail;
>   }
>   
> +static inline void nvme_sq_copy_cmd(struct nvme_queue *nvmeq,
> +				    struct nvme_command *cmd)
> +{
> +	memcpy(nvmeq->sq_cmds + (nvmeq->sq_tail << nvmeq->sqes), cmd,
> +		sizeof(*cmd));
> +	if (++nvmeq->sq_tail == nvmeq->q_depth)
> +		nvmeq->sq_tail = 0;
> +}
> +
>   /**
>    * nvme_submit_cmd() - Copy a command into a queue and ring the doorbell
>    * @nvmeq: The queue to use
> @@ -510,10 +519,7 @@ static void nvme_submit_cmd(struct nvme_queue *nvmeq, struct nvme_command *cmd,
>   			    bool write_sq)
>   {
>   	spin_lock(&nvmeq->sq_lock);
> -	memcpy(nvmeq->sq_cmds + (nvmeq->sq_tail << nvmeq->sqes),
> -	       cmd, sizeof(*cmd));
> -	if (++nvmeq->sq_tail == nvmeq->q_depth)
> -		nvmeq->sq_tail = 0;
> +	nvme_sq_copy_cmd(nvmeq, cmd);
>   	nvme_write_sq_db(nvmeq, write_sq);
>   	spin_unlock(&nvmeq->sq_lock);
>   }
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/4] nvme: separate command prep and issue
  2021-12-03 21:45 ` [PATCH 3/4] nvme: separate command prep and issue Jens Axboe
@ 2021-12-04 10:45   ` Hannes Reinecke
  0 siblings, 0 replies; 16+ messages in thread
From: Hannes Reinecke @ 2021-12-04 10:45 UTC (permalink / raw)
  To: Jens Axboe, linux-block, linux-nvme

On 12/3/21 10:45 PM, Jens Axboe wrote:
> Add a nvme_prep_rq() helper to setup a command, and nvme_queue_rq() is
> adapted to use this helper.
> 
> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> ---
>   drivers/nvme/host/pci.c | 57 ++++++++++++++++++++++++-----------------
>   1 file changed, 33 insertions(+), 24 deletions(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 4/4] nvme: add support for mq_ops->queue_rqs()
  2021-12-03 21:45 ` [PATCH 4/4] nvme: add support for mq_ops->queue_rqs() Jens Axboe
@ 2021-12-04 10:47   ` Hannes Reinecke
  2021-12-06  7:40   ` Christoph Hellwig
  1 sibling, 0 replies; 16+ messages in thread
From: Hannes Reinecke @ 2021-12-04 10:47 UTC (permalink / raw)
  To: Jens Axboe, linux-block, linux-nvme

On 12/3/21 10:45 PM, Jens Axboe wrote:
> This enables the block layer to send us a full plug list of requests
> that need submitting. The block layer guarantees that they all belong
> to the same queue, but we do have to check the hardware queue mapping
> for each request.
> 
> If errors are encountered, leave them in the passed in list. Then the
> block layer will handle them individually.
> 
> This is good for about a 4% improvement in peak performance, taking us
> from 9.6M to 10M IOPS/core.
> 
> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> ---
>   drivers/nvme/host/pci.c | 61 +++++++++++++++++++++++++++++++++++++++++
>   1 file changed, 61 insertions(+)
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/4] block: add mq_ops->queue_rqs hook
  2021-12-04 10:43   ` Hannes Reinecke
@ 2021-12-04 20:13     ` Jens Axboe
  2021-12-05  9:07       ` Hannes Reinecke
  0 siblings, 1 reply; 16+ messages in thread
From: Jens Axboe @ 2021-12-04 20:13 UTC (permalink / raw)
  To: Hannes Reinecke, linux-block, linux-nvme

On 12/4/21 3:43 AM, Hannes Reinecke wrote:
> On 12/3/21 10:45 PM, Jens Axboe wrote:
>> If we have a list of requests in our plug list, send it to the driver in
>> one go, if possible. The driver must set mq_ops->queue_rqs() to support
>> this, if not the usual one-by-one path is used.
>>
>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
>> ---
>>   block/blk-mq.c         | 24 +++++++++++++++++++++---
>>   include/linux/blk-mq.h |  8 ++++++++
>>   2 files changed, 29 insertions(+), 3 deletions(-)
>>
>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>> index 22ec21aa0c22..9ac9174a2ba4 100644
>> --- a/block/blk-mq.c
>> +++ b/block/blk-mq.c
>> @@ -2513,6 +2513,7 @@ void blk_mq_flush_plug_list(struct blk_plug *plug, bool from_schedule)
>>   {
>>   	struct blk_mq_hw_ctx *this_hctx;
>>   	struct blk_mq_ctx *this_ctx;
>> +	struct request *rq;
>>   	unsigned int depth;
>>   	LIST_HEAD(list);
>>   
>> @@ -2521,7 +2522,26 @@ void blk_mq_flush_plug_list(struct blk_plug *plug, bool from_schedule)
>>   	plug->rq_count = 0;
>>   
>>   	if (!plug->multiple_queues && !plug->has_elevator && !from_schedule) {
>> -		blk_mq_run_dispatch_ops(plug->mq_list->q,
>> +		struct request_queue *q;
>> +
>> +		rq = plug->mq_list;
>> +		q = rq->q;
>> +
>> +		/*
>> +		 * Peek first request and see if we have a ->queue_rqs() hook.
>> +		 * If we do, we can dispatch the whole plug list in one go. We
>> +		 * already know at this point that all requests belong to the
>> +		 * same queue, caller must ensure that's the case.
>> +		 */
>> +		if (q->mq_ops->queue_rqs &&
>> +		    !(rq->mq_hctx->flags & BLK_MQ_F_TAG_QUEUE_SHARED)) {
> 
> What is the dependency on shared tags here?
>  From what I've seen it's just about submitting requests; the only 
> difference to shared tags is the way the tags are allocated.
> Care to explain?

For shared tags, we need to actively increment the use count per
request. This path doesn't do that, so it's disabled for now. It could
be done, but then it'd have to be in the caller, so I'd rather leave it
for a future optimization if anyone cares enough about this for shared
tags.

I can add a comment about it if that helps.
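
For reference, the accounting in question is roughly what the single-request
path does when it takes a driver tag on a queue marked with
BLK_MQ_F_TAG_QUEUE_SHARED (a sketch along the lines of blk_mq_get_driver_tag(),
not the exact code):

	if ((hctx->flags & BLK_MQ_F_TAG_QUEUE_SHARED) &&
	    !(rq->rq_flags & RQF_MQ_INFLIGHT)) {
		/* count this request against the shared tag set */
		rq->rq_flags |= RQF_MQ_INFLIGHT;
		__blk_mq_inc_active_requests(hctx);
	}

With the bypass in this patch, requests on such queues always take the normal
path, which does that bookkeeping.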

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/4] block: add mq_ops->queue_rqs hook
  2021-12-04 20:13     ` Jens Axboe
@ 2021-12-05  9:07       ` Hannes Reinecke
  2021-12-05 13:09         ` Jens Axboe
  0 siblings, 1 reply; 16+ messages in thread
From: Hannes Reinecke @ 2021-12-05  9:07 UTC (permalink / raw)
  To: Jens Axboe, linux-block, linux-nvme

On 12/4/21 9:13 PM, Jens Axboe wrote:
> On 12/4/21 3:43 AM, Hannes Reinecke wrote:
>> On 12/3/21 10:45 PM, Jens Axboe wrote:
>>> If we have a list of requests in our plug list, send it to the driver in
>>> one go, if possible. The driver must set mq_ops->queue_rqs() to support
>>> this, if not the usual one-by-one path is used.
>>>
>>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
>>> ---
>>>    block/blk-mq.c         | 24 +++++++++++++++++++++---
>>>    include/linux/blk-mq.h |  8 ++++++++
>>>    2 files changed, 29 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>>> index 22ec21aa0c22..9ac9174a2ba4 100644
>>> --- a/block/blk-mq.c
>>> +++ b/block/blk-mq.c
>>> @@ -2513,6 +2513,7 @@ void blk_mq_flush_plug_list(struct blk_plug *plug, bool from_schedule)
>>>    {
>>>    	struct blk_mq_hw_ctx *this_hctx;
>>>    	struct blk_mq_ctx *this_ctx;
>>> +	struct request *rq;
>>>    	unsigned int depth;
>>>    	LIST_HEAD(list);
>>>    
>>> @@ -2521,7 +2522,26 @@ void blk_mq_flush_plug_list(struct blk_plug *plug, bool from_schedule)
>>>    	plug->rq_count = 0;
>>>    
>>>    	if (!plug->multiple_queues && !plug->has_elevator && !from_schedule) {
>>> -		blk_mq_run_dispatch_ops(plug->mq_list->q,
>>> +		struct request_queue *q;
>>> +
>>> +		rq = plug->mq_list;
>>> +		q = rq->q;
>>> +
>>> +		/*
>>> +		 * Peek first request and see if we have a ->queue_rqs() hook.
>>> +		 * If we do, we can dispatch the whole plug list in one go. We
>>> +		 * already know at this point that all requests belong to the
>>> +		 * same queue, caller must ensure that's the case.
>>> +		 */
>>> +		if (q->mq_ops->queue_rqs &&
>>> +		    !(rq->mq_hctx->flags & BLK_MQ_F_TAG_QUEUE_SHARED)) {
>>
>> What is the dependency on shared tags here?
>>   From what I've seen it's just about submitting requests; the only
>> difference to shared tags is the way the tags are allocated.
>> Care to explain?
> 
> For shared tags, we need to actively increment the use count per
> request. This path doesn't do that, so it's disabled for now. It could
> be done, but then it'd have to be in the caller, so I'd rather leave it
> for a future optimization if anyone cares enough about this for shared
> tags.
> 
> I can add a comment about it if that helps.
> 
Please do.
It'll act as a reminder of what needs to be done if and when one of the 
drivers requiring shared tags is looking at implementing queue_rqs.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/4] block: add mq_ops->queue_rqs hook
  2021-12-05  9:07       ` Hannes Reinecke
@ 2021-12-05 13:09         ` Jens Axboe
  0 siblings, 0 replies; 16+ messages in thread
From: Jens Axboe @ 2021-12-05 13:09 UTC (permalink / raw)
  To: Hannes Reinecke, linux-block, linux-nvme

On 12/5/21 2:07 AM, Hannes Reinecke wrote:
> On 12/4/21 9:13 PM, Jens Axboe wrote:
>> On 12/4/21 3:43 AM, Hannes Reinecke wrote:
>>> On 12/3/21 10:45 PM, Jens Axboe wrote:
>>>> If we have a list of requests in our plug list, send it to the driver in
>>>> one go, if possible. The driver must set mq_ops->queue_rqs() to support
>>>> this, if not the usual one-by-one path is used.
>>>>
>>>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
>>>> ---
>>>>    block/blk-mq.c         | 24 +++++++++++++++++++++---
>>>>    include/linux/blk-mq.h |  8 ++++++++
>>>>    2 files changed, 29 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>>>> index 22ec21aa0c22..9ac9174a2ba4 100644
>>>> --- a/block/blk-mq.c
>>>> +++ b/block/blk-mq.c
>>>> @@ -2513,6 +2513,7 @@ void blk_mq_flush_plug_list(struct blk_plug *plug, bool from_schedule)
>>>>    {
>>>>    	struct blk_mq_hw_ctx *this_hctx;
>>>>    	struct blk_mq_ctx *this_ctx;
>>>> +	struct request *rq;
>>>>    	unsigned int depth;
>>>>    	LIST_HEAD(list);
>>>>    
>>>> @@ -2521,7 +2522,26 @@ void blk_mq_flush_plug_list(struct blk_plug *plug, bool from_schedule)
>>>>    	plug->rq_count = 0;
>>>>    
>>>>    	if (!plug->multiple_queues && !plug->has_elevator && !from_schedule) {
>>>> -		blk_mq_run_dispatch_ops(plug->mq_list->q,
>>>> +		struct request_queue *q;
>>>> +
>>>> +		rq = plug->mq_list;
>>>> +		q = rq->q;
>>>> +
>>>> +		/*
>>>> +		 * Peek first request and see if we have a ->queue_rqs() hook.
>>>> +		 * If we do, we can dispatch the whole plug list in one go. We
>>>> +		 * already know at this point that all requests belong to the
>>>> +		 * same queue, caller must ensure that's the case.
>>>> +		 */
>>>> +		if (q->mq_ops->queue_rqs &&
>>>> +		    !(rq->mq_hctx->flags & BLK_MQ_F_TAG_QUEUE_SHARED)) {
>>>
>>> What is the dependency on shared tags here?
>>>   From what I've seen it's just about submitting requests; the only
>>> difference to shared tags is the way the tags are allocated.
>>> Care to explain?
>>
>> For shared tags, we need to actively increment the use count per
>> request. This path doesn't do that, so it's disabled for now. It could
>> be done, but then it'd have to be in the caller, so I'd rather leave it
>> for a future optimization if anyone cares enough about this for shared
>> tags.
>>
>> I can add a comment about it if that helps.
>>
> Please do.
> It'll act as a reminder what needs to be done if and when one of the 
> drivers requiring shared tags is looking at implementing queue_rqs.

I added to the comment yesterday:

https://git.kernel.dk/cgit/linux-block/commit/?h=perf-wip&id=f9f526700607bf804fa8541c824ea54253f4241a

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/4] nvme: split command copy into a helper
  2021-12-03 21:45 ` [PATCH 2/4] nvme: split command copy into a helper Jens Axboe
  2021-12-04 10:44   ` Hannes Reinecke
@ 2021-12-06  7:38   ` Christoph Hellwig
  2021-12-06 16:33     ` Jens Axboe
  1 sibling, 1 reply; 16+ messages in thread
From: Christoph Hellwig @ 2021-12-06  7:38 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, linux-nvme, Chaitanya Kulkarni

As mentioned last time I really don't like the proliferation of
tiny helpers with just two callers.  See my patch I responded with
for my alternative.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 4/4] nvme: add support for mq_ops->queue_rqs()
  2021-12-03 21:45 ` [PATCH 4/4] nvme: add support for mq_ops->queue_rqs() Jens Axboe
  2021-12-04 10:47   ` Hannes Reinecke
@ 2021-12-06  7:40   ` Christoph Hellwig
  2021-12-06 16:33     ` Jens Axboe
  1 sibling, 1 reply; 16+ messages in thread
From: Christoph Hellwig @ 2021-12-06  7:40 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, linux-nvme

On Fri, Dec 03, 2021 at 02:45:44PM -0700, Jens Axboe wrote:
> This enables the block layer to send us a full plug list of requests
> that need submitting. The block layer guarantees that they all belong
> to the same queue, but we do have to check the hardware queue mapping
> for each request.
> 
> If errors are encountered, leave them in the passed in list. Then the
> block layer will handle them individually.
> 
> This is good for about a 4% improvement in peak performance, taking us
> from 9.6M to 10M IOPS/core.

This looks pretty similar to my proposed cleanups (which is nice), but
back then you mentioned the cleaner version was much slower.  Do you
know what brought the speed back in this version?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/4] nvme: split command copy into a helper
  2021-12-06  7:38   ` Christoph Hellwig
@ 2021-12-06 16:33     ` Jens Axboe
  0 siblings, 0 replies; 16+ messages in thread
From: Jens Axboe @ 2021-12-06 16:33 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-block, linux-nvme, Chaitanya Kulkarni

On 12/6/21 12:38 AM, Christoph Hellwig wrote:
> As mentioned last time I really don't like the proliferation of
> tiny helpers with just two callers.  See my patch I responded with
> for my alternative.

I tend to agree with you, but for this particular case I really don't
think it's better to not have the helper. I did try that version too,
but you end up with several sites duplicating it. So while I agree in
general, I don't think this one applies.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 4/4] nvme: add support for mq_ops->queue_rqs()
  2021-12-06  7:40   ` Christoph Hellwig
@ 2021-12-06 16:33     ` Jens Axboe
  0 siblings, 0 replies; 16+ messages in thread
From: Jens Axboe @ 2021-12-06 16:33 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-block, linux-nvme

On 12/6/21 12:40 AM, Christoph Hellwig wrote:
> On Fri, Dec 03, 2021 at 02:45:44PM -0700, Jens Axboe wrote:
>> This enables the block layer to send us a full plug list of requests
>> that need submitting. The block layer guarantees that they all belong
>> to the same queue, but we do have to check the hardware queue mapping
>> for each request.
>>
>> If errors are encountered, leave them in the passed in list. Then the
>> block layer will handle them individually.
>>
>> This is good for about a 4% improvement in peak performance, taking us
>> from 9.6M to 10M IOPS/core.
> 
> This looks pretty similar to my proposed cleanups (which is nice), but
> back then you mentioned the cleaner version was much slower.  Do you
> know what brought the speed back in this version?

Yes, it has that folded in and tweaked on top. Current version seems
to be fine.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2021-12-06 16:33 UTC | newest]

Thread overview: 16+ messages
2021-12-03 21:45 [PATCHSET v2 0/4] Add support for list issue Jens Axboe
2021-12-03 21:45 ` [PATCH 1/4] block: add mq_ops->queue_rqs hook Jens Axboe
2021-12-04 10:43   ` Hannes Reinecke
2021-12-04 20:13     ` Jens Axboe
2021-12-05  9:07       ` Hannes Reinecke
2021-12-05 13:09         ` Jens Axboe
2021-12-03 21:45 ` [PATCH 2/4] nvme: split command copy into a helper Jens Axboe
2021-12-04 10:44   ` Hannes Reinecke
2021-12-06  7:38   ` Christoph Hellwig
2021-12-06 16:33     ` Jens Axboe
2021-12-03 21:45 ` [PATCH 3/4] nvme: separate command prep and issue Jens Axboe
2021-12-04 10:45   ` Hannes Reinecke
2021-12-03 21:45 ` [PATCH 4/4] nvme: add support for mq_ops->queue_rqs() Jens Axboe
2021-12-04 10:47   ` Hannes Reinecke
2021-12-06  7:40   ` Christoph Hellwig
2021-12-06 16:33     ` Jens Axboe
