Linux-Block Archive on lore.kernel.org
* [RFC] optimize nvme single segment I/O
@ 2019-03-21 23:10 Christoph Hellwig
  2019-03-21 23:10 ` [PATCH 01/15] block: add a req_bvec helper Christoph Hellwig
                   ` (17 more replies)
  0 siblings, 18 replies; 46+ messages in thread
From: Christoph Hellwig @ 2019-03-21 23:10 UTC (permalink / raw)
  To: Jens Axboe, Keith Busch, Sagi Grimberg; +Cc: linux-nvme, linux-block

Hi all,

with all the discussion on small I/O performance lately I thought
it was time to dust off my old idea to optimize this path a bit
by avoiding the need to build a scatterlist.  I've only done very basic
testing because I've been a bit busy, but I thought it might be
worthwhile to get it out for feedback.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH 01/15] block: add a req_bvec helper
  2019-03-21 23:10 [RFC] optimize nvme single segment I/O Christoph Hellwig
@ 2019-03-21 23:10 ` Christoph Hellwig
  2019-03-25  5:07   ` Chaitanya Kulkarni
  2019-03-21 23:10 ` [PATCH 02/15] block: add a rq_integrity_vec helper Christoph Hellwig
                   ` (16 subsequent siblings)
  17 siblings, 1 reply; 46+ messages in thread
From: Christoph Hellwig @ 2019-03-21 23:10 UTC (permalink / raw)
  To: Jens Axboe, Keith Busch, Sagi Grimberg; +Cc: linux-nvme, linux-block

Return the currently active bvec segment, potentially spanning multiple
pages.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 include/linux/blkdev.h | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 0de92b29f589..255e20313cde 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -929,6 +929,13 @@ static inline unsigned int blk_rq_payload_bytes(struct request *rq)
 	return blk_rq_bytes(rq);
 }
 
+static inline struct bio_vec req_bvec(struct request *rq)
+{
+	if (rq->rq_flags & RQF_SPECIAL_PAYLOAD)
+		return rq->special_vec;
+	return mp_bvec_iter_bvec(rq->bio->bi_io_vec, rq->bio->bi_iter);
+}
+
 static inline unsigned int blk_queue_get_max_sectors(struct request_queue *q,
 						     int op)
 {
-- 
2.20.1


* [PATCH 02/15] block: add a rq_integrity_vec helper
  2019-03-21 23:10 [RFC] optimize nvme single segment I/O Christoph Hellwig
  2019-03-21 23:10 ` [PATCH 01/15] block: add a req_bvec helper Christoph Hellwig
@ 2019-03-21 23:10 ` Christoph Hellwig
  2019-03-25  5:10   ` Chaitanya Kulkarni
  2019-03-21 23:10 ` [PATCH 03/15] block: add a rq_dma_dir helper Christoph Hellwig
                   ` (15 subsequent siblings)
  17 siblings, 1 reply; 46+ messages in thread
From: Christoph Hellwig @ 2019-03-21 23:10 UTC (permalink / raw)
  To: Jens Axboe, Keith Busch, Sagi Grimberg; +Cc: linux-nvme, linux-block

This provides a nice little shortcut to get the integrity data for
drivers like NVMe that only support a single integrity segment.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 include/linux/blkdev.h | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 255e20313cde..f9a072610d28 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1552,6 +1552,16 @@ static inline unsigned int bio_integrity_bytes(struct blk_integrity *bi,
 	return bio_integrity_intervals(bi, sectors) * bi->tuple_size;
 }
 
+/*
+ * Return the first bvec that contains integrity data.  In general only
+ * drivers that are limited to a single integrity segment should use this
+ * helper.
+ */
+static inline struct bio_vec *rq_integrity_vec(struct request *rq)
+{
+	return rq->bio->bi_integrity->bip_vec;
+}
+
 #else /* CONFIG_BLK_DEV_INTEGRITY */
 
 struct bio;
@@ -1626,6 +1636,11 @@ static inline unsigned int bio_integrity_bytes(struct blk_integrity *bi,
 	return 0;
 }
 
+static inline struct bio_vec *rq_integrity_vec(struct request *rq)
+{
+	return NULL;
+}
+
 #endif /* CONFIG_BLK_DEV_INTEGRITY */
 
 struct block_device_operations {
-- 
2.20.1


* [PATCH 03/15] block: add a rq_dma_dir helper
  2019-03-21 23:10 [RFC] optimize nvme single segment I/O Christoph Hellwig
  2019-03-21 23:10 ` [PATCH 01/15] block: add a req_bvec helper Christoph Hellwig
  2019-03-21 23:10 ` [PATCH 02/15] block: add a rq_integrity_vec helper Christoph Hellwig
@ 2019-03-21 23:10 ` Christoph Hellwig
  2019-03-22 13:06   ` Johannes Thumshirn
  2019-03-25  5:11   ` Chaitanya Kulkarni
  2019-03-21 23:10 ` [PATCH 04/15] block: add dma_map_bvec helper Christoph Hellwig
                   ` (14 subsequent siblings)
  17 siblings, 2 replies; 46+ messages in thread
From: Christoph Hellwig @ 2019-03-21 23:10 UTC (permalink / raw)
  To: Jens Axboe, Keith Busch, Sagi Grimberg; +Cc: linux-nvme, linux-block

In a lot of places we want to know the DMA direction for a given
struct request.  Add a little helper to make that easier.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 include/linux/blkdev.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index f9a072610d28..5279104527ad 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -638,6 +638,9 @@ static inline bool blk_account_rq(struct request *rq)
 
 #define rq_data_dir(rq)		(op_is_write(req_op(rq)) ? WRITE : READ)
 
+#define rq_dma_dir(rq) \
+	(op_is_write(req_op(rq)) ? DMA_TO_DEVICE : DMA_FROM_DEVICE)
+
 static inline bool queue_is_mq(struct request_queue *q)
 {
 	return q->mq_ops;
-- 
2.20.1


* [PATCH 04/15] block: add dma_map_bvec helper
  2019-03-21 23:10 [RFC] optimize nvme single segment I/O Christoph Hellwig
                   ` (2 preceding siblings ...)
  2019-03-21 23:10 ` [PATCH 03/15] block: add a rq_dma_dir helper Christoph Hellwig
@ 2019-03-21 23:10 ` Christoph Hellwig
  2019-03-25  5:13   ` Chaitanya Kulkarni
  2019-03-21 23:10 ` [PATCH 05/15] nvme-pci: remove the unused iod->length field Christoph Hellwig
                   ` (13 subsequent siblings)
  17 siblings, 1 reply; 46+ messages in thread
From: Christoph Hellwig @ 2019-03-21 23:10 UTC (permalink / raw)
  To: Jens Axboe, Keith Busch, Sagi Grimberg; +Cc: linux-nvme, linux-block

Provide a nice little shortcut for mapping a single bvec.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 include/linux/blkdev.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 5279104527ad..322ff969659c 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -641,6 +641,10 @@ static inline bool blk_account_rq(struct request *rq)
 #define rq_dma_dir(rq) \
 	(op_is_write(req_op(rq)) ? DMA_TO_DEVICE : DMA_FROM_DEVICE)
 
+#define dma_map_bvec(dev, bv, dir, attrs) \
+	dma_map_page_attrs(dev, (bv)->bv_page, (bv)->bv_offset, (bv)->bv_len, \
+	(dir), (attrs))
+
 static inline bool queue_is_mq(struct request_queue *q)
 {
 	return q->mq_ops;
-- 
2.20.1


* [PATCH 05/15] nvme-pci: remove the unused iod->length field
  2019-03-21 23:10 [RFC] optimize nvme single segment I/O Christoph Hellwig
                   ` (3 preceding siblings ...)
  2019-03-21 23:10 ` [PATCH 04/15] block: add dma_map_bvec helper Christoph Hellwig
@ 2019-03-21 23:10 ` Christoph Hellwig
  2019-03-25  5:14   ` Chaitanya Kulkarni
  2019-03-21 23:10 ` [PATCH 06/15] nvme-pci: remove nvme_init_iod Christoph Hellwig
                   ` (12 subsequent siblings)
  17 siblings, 1 reply; 46+ messages in thread
From: Christoph Hellwig @ 2019-03-21 23:10 UTC (permalink / raw)
  To: Jens Axboe, Keith Busch, Sagi Grimberg; +Cc: linux-nvme, linux-block

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/nvme/host/pci.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index a90cf5d63aac..bf0d71fe243e 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -220,7 +220,6 @@ struct nvme_iod {
 	int aborted;
 	int npages;		/* In the PRP list. 0 means small pool in use */
 	int nents;		/* Used in scatterlist */
-	int length;		/* Of data, in bytes */
 	dma_addr_t first_dma;
 	struct scatterlist meta_sg; /* metadata requires single contiguous buffer */
 	struct scatterlist *sg;
@@ -603,7 +602,6 @@ static blk_status_t nvme_init_iod(struct request *rq, struct nvme_dev *dev)
 	iod->aborted = 0;
 	iod->npages = -1;
 	iod->nents = 0;
-	iod->length = size;
 
 	return BLK_STS_OK;
 }
-- 
2.20.1


* [PATCH 06/15] nvme-pci: remove nvme_init_iod
  2019-03-21 23:10 [RFC] optimize nvme single segment I/O Christoph Hellwig
                   ` (4 preceding siblings ...)
  2019-03-21 23:10 ` [PATCH 05/15] nvme-pci: remove the unused iod->length field Christoph Hellwig
@ 2019-03-21 23:10 ` Christoph Hellwig
  2019-03-25  5:19   ` Chaitanya Kulkarni
  2019-03-21 23:10 ` [PATCH 07/15] nvme-pci: move the call to nvme_cleanup_cmd out of nvme_unmap_data Christoph Hellwig
                   ` (11 subsequent siblings)
  17 siblings, 1 reply; 46+ messages in thread
From: Christoph Hellwig @ 2019-03-21 23:10 UTC (permalink / raw)
  To: Jens Axboe, Keith Busch, Sagi Grimberg; +Cc: linux-nvme, linux-block

nvme_init_iod should really be split into two parts: initializing a few
general iod fields, which can easily be done at the beginning of
nvme_queue_rq, and allocating the scatterlist if needed, which logically
belongs in nvme_map_data together with the code making use of it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/nvme/host/pci.c | 56 ++++++++++++++++-------------------------
 1 file changed, 22 insertions(+), 34 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index bf0d71fe243e..3f06e942fb47 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -208,10 +208,10 @@ struct nvme_queue {
 };
 
 /*
- * The nvme_iod describes the data in an I/O, including the list of PRP
- * entries.  You can't see it in this data structure because C doesn't let
- * me express that.  Use nvme_init_iod to ensure there's enough space
- * allocated to store the PRP list.
+ * The nvme_iod describes the data in an I/O.
+ *
+ * The sg pointer contains the list of PRP/SGL chunk allocations in addition
+ * to the actual struct scatterlist.
  */
 struct nvme_iod {
 	struct nvme_request req;
@@ -583,29 +583,6 @@ static inline bool nvme_pci_use_sgls(struct nvme_dev *dev, struct request *req)
 	return true;
 }
 
-static blk_status_t nvme_init_iod(struct request *rq, struct nvme_dev *dev)
-{
-	struct nvme_iod *iod = blk_mq_rq_to_pdu(rq);
-	int nseg = blk_rq_nr_phys_segments(rq);
-	unsigned int size = blk_rq_payload_bytes(rq);
-
-	iod->use_sgl = nvme_pci_use_sgls(dev, rq);
-
-	if (nseg > NVME_INT_PAGES || size > NVME_INT_BYTES(dev)) {
-		iod->sg = mempool_alloc(dev->iod_mempool, GFP_ATOMIC);
-		if (!iod->sg)
-			return BLK_STS_RESOURCE;
-	} else {
-		iod->sg = iod->inline_sg;
-	}
-
-	iod->aborted = 0;
-	iod->npages = -1;
-	iod->nents = 0;
-
-	return BLK_STS_OK;
-}
-
 static void nvme_free_iod(struct nvme_dev *dev, struct request *req)
 {
 	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
@@ -837,6 +814,17 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
 	blk_status_t ret = BLK_STS_IOERR;
 	int nr_mapped;
 
+	if (blk_rq_payload_bytes(req) > NVME_INT_BYTES(dev) ||
+	    blk_rq_nr_phys_segments(req) > NVME_INT_PAGES) {
+		iod->sg = mempool_alloc(dev->iod_mempool, GFP_ATOMIC);
+		if (!iod->sg)
+			return BLK_STS_RESOURCE;
+	} else {
+		iod->sg = iod->inline_sg;
+	}
+
+	iod->use_sgl = nvme_pci_use_sgls(dev, req);
+
 	sg_init_table(iod->sg, blk_rq_nr_phys_segments(req));
 	iod->nents = blk_rq_map_sg(q, req, iod->sg);
 	if (!iod->nents)
@@ -881,6 +869,7 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
 out_unmap:
 	dma_unmap_sg(dev->dev, iod->sg, iod->nents, dma_dir);
 out:
+	nvme_free_iod(dev, req);
 	return ret;
 }
 
@@ -913,9 +902,14 @@ static blk_status_t nvme_queue_rq(struct blk_mq_hw_ctx *hctx,
 	struct nvme_queue *nvmeq = hctx->driver_data;
 	struct nvme_dev *dev = nvmeq->dev;
 	struct request *req = bd->rq;
+	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
 	struct nvme_command cmnd;
 	blk_status_t ret;
 
+	iod->aborted = 0;
+	iod->npages = -1;
+	iod->nents = 0;
+
 	/*
 	 * We should not need to do this, but we're still using this to
 	 * ensure we can drain requests on a dying queue.
@@ -927,21 +921,15 @@ static blk_status_t nvme_queue_rq(struct blk_mq_hw_ctx *hctx,
 	if (ret)
 		return ret;
 
-	ret = nvme_init_iod(req, dev);
-	if (ret)
-		goto out_free_cmd;
-
 	if (blk_rq_nr_phys_segments(req)) {
 		ret = nvme_map_data(dev, req, &cmnd);
 		if (ret)
-			goto out_cleanup_iod;
+			goto out_free_cmd;
 	}
 
 	blk_mq_start_request(req);
 	nvme_submit_cmd(nvmeq, &cmnd, bd->last);
 	return BLK_STS_OK;
-out_cleanup_iod:
-	nvme_free_iod(dev, req);
 out_free_cmd:
 	nvme_cleanup_cmd(req);
 	return ret;
-- 
2.20.1


* [PATCH 07/15] nvme-pci: move the call to nvme_cleanup_cmd out of nvme_unmap_data
  2019-03-21 23:10 [RFC] optimize nvme single segment I/O Christoph Hellwig
                   ` (5 preceding siblings ...)
  2019-03-21 23:10 ` [PATCH 06/15] nvme-pci: remove nvme_init_iod Christoph Hellwig
@ 2019-03-21 23:10 ` Christoph Hellwig
  2019-03-25  5:21   ` Chaitanya Kulkarni
  2019-03-21 23:10 ` [PATCH 08/15] nvme-pci: merge nvme_free_iod into nvme_unmap_data Christoph Hellwig
                   ` (10 subsequent siblings)
  17 siblings, 1 reply; 46+ messages in thread
From: Christoph Hellwig @ 2019-03-21 23:10 UTC (permalink / raw)
  To: Jens Axboe, Keith Busch, Sagi Grimberg; +Cc: linux-nvme, linux-block

Cleaning up the command setup isn't related to unmapping data, and
disentangling them will simplify error handling a bit down the road.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/nvme/host/pci.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 3f06e942fb47..2fb35d44010a 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -888,7 +888,6 @@ static void nvme_unmap_data(struct nvme_dev *dev, struct request *req)
 			dma_unmap_sg(dev->dev, &iod->meta_sg, 1, dma_dir);
 	}
 
-	nvme_cleanup_cmd(req);
 	nvme_free_iod(dev, req);
 }
 
@@ -939,6 +938,7 @@ static void nvme_pci_complete_rq(struct request *req)
 {
 	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
 
+	nvme_cleanup_cmd(req);
 	nvme_unmap_data(iod->nvmeq->dev, req);
 	nvme_complete_rq(req);
 }
-- 
2.20.1


* [PATCH 08/15] nvme-pci: merge nvme_free_iod into nvme_unmap_data
  2019-03-21 23:10 [RFC] optimize nvme single segment I/O Christoph Hellwig
                   ` (6 preceding siblings ...)
  2019-03-21 23:10 ` [PATCH 07/15] nvme-pci: move the call to nvme_cleanup_cmd out of nvme_unmap_data Christoph Hellwig
@ 2019-03-21 23:10 ` Christoph Hellwig
  2019-03-25  5:22   ` Chaitanya Kulkarni
  2019-03-21 23:10 ` [PATCH 09/15] nvme-pci: only call nvme_unmap_data for requests transferring data Christoph Hellwig
                   ` (9 subsequent siblings)
  17 siblings, 1 reply; 46+ messages in thread
From: Christoph Hellwig @ 2019-03-21 23:10 UTC (permalink / raw)
  To: Jens Axboe, Keith Busch, Sagi Grimberg; +Cc: linux-nvme, linux-block

This means we now have a function that undoes everything nvme_map_data
does and we can simplify the error handling a bit.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/nvme/host/pci.c | 44 ++++++++++++++++-------------------------
 1 file changed, 17 insertions(+), 27 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 2fb35d44010a..ce8bdc15ec3c 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -583,14 +583,24 @@ static inline bool nvme_pci_use_sgls(struct nvme_dev *dev, struct request *req)
 	return true;
 }
 
-static void nvme_free_iod(struct nvme_dev *dev, struct request *req)
+static void nvme_unmap_data(struct nvme_dev *dev, struct request *req)
 {
 	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
+	enum dma_data_direction dma_dir = rq_data_dir(req) ?
+			DMA_TO_DEVICE : DMA_FROM_DEVICE;
 	const int last_prp = dev->ctrl.page_size / sizeof(__le64) - 1;
 	dma_addr_t dma_addr = iod->first_dma, next_dma_addr;
-
 	int i;
 
+	if (iod->nents) {
+		/* P2PDMA requests do not need to be unmapped */
+		if (!is_pci_p2pdma_page(sg_page(iod->sg)))
+			dma_unmap_sg(dev->dev, iod->sg, iod->nents, dma_dir);
+
+		if (blk_integrity_rq(req))
+			dma_unmap_sg(dev->dev, &iod->meta_sg, 1, dma_dir);
+	}
+
 	if (iod->npages == 0)
 		dma_pool_free(dev->prp_small_pool, nvme_pci_iod_list(req)[0],
 			dma_addr);
@@ -847,50 +857,30 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
 		ret = nvme_pci_setup_prps(dev, req, &cmnd->rw);
 
 	if (ret != BLK_STS_OK)
-		goto out_unmap;
+		goto out;
 
 	ret = BLK_STS_IOERR;
 	if (blk_integrity_rq(req)) {
 		if (blk_rq_count_integrity_sg(q, req->bio) != 1)
-			goto out_unmap;
+			goto out;
 
 		sg_init_table(&iod->meta_sg, 1);
 		if (blk_rq_map_integrity_sg(q, req->bio, &iod->meta_sg) != 1)
-			goto out_unmap;
+			goto out;
 
 		if (!dma_map_sg(dev->dev, &iod->meta_sg, 1, dma_dir))
-			goto out_unmap;
+			goto out;
 
 		cmnd->rw.metadata = cpu_to_le64(sg_dma_address(&iod->meta_sg));
 	}
 
 	return BLK_STS_OK;
 
-out_unmap:
-	dma_unmap_sg(dev->dev, iod->sg, iod->nents, dma_dir);
 out:
-	nvme_free_iod(dev, req);
+	nvme_unmap_data(dev, req);
 	return ret;
 }
 
-static void nvme_unmap_data(struct nvme_dev *dev, struct request *req)
-{
-	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
-	enum dma_data_direction dma_dir = rq_data_dir(req) ?
-			DMA_TO_DEVICE : DMA_FROM_DEVICE;
-
-	if (iod->nents) {
-		/* P2PDMA requests do not need to be unmapped */
-		if (!is_pci_p2pdma_page(sg_page(iod->sg)))
-			dma_unmap_sg(dev->dev, iod->sg, iod->nents, dma_dir);
-
-		if (blk_integrity_rq(req))
-			dma_unmap_sg(dev->dev, &iod->meta_sg, 1, dma_dir);
-	}
-
-	nvme_free_iod(dev, req);
-}
-
 /*
  * NOTE: ns is NULL when called on the admin queue.
  */
-- 
2.20.1


* [PATCH 09/15] nvme-pci: only call nvme_unmap_data for requests transferring data
  2019-03-21 23:10 [RFC] optimize nvme single segment I/O Christoph Hellwig
                   ` (7 preceding siblings ...)
  2019-03-21 23:10 ` [PATCH 08/15] nvme-pci: merge nvme_free_iod into nvme_unmap_data Christoph Hellwig
@ 2019-03-21 23:10 ` Christoph Hellwig
  2019-03-25  5:23   ` Chaitanya Kulkarni
  2019-03-21 23:10 ` [PATCH 10/15] nvme-pci: do not build a scatterlist to map metadata Christoph Hellwig
                   ` (8 subsequent siblings)
  17 siblings, 1 reply; 46+ messages in thread
From: Christoph Hellwig @ 2019-03-21 23:10 UTC (permalink / raw)
  To: Jens Axboe, Keith Busch, Sagi Grimberg; +Cc: linux-nvme, linux-block

This mirrors how nvme_map_data is called and will allow simplifying some
checks in nvme_unmap_data later on.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/nvme/host/pci.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index ce8bdc15ec3c..bc4ee869fe82 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -929,7 +929,8 @@ static void nvme_pci_complete_rq(struct request *req)
 	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
 
 	nvme_cleanup_cmd(req);
-	nvme_unmap_data(iod->nvmeq->dev, req);
+	if (blk_rq_nr_phys_segments(req))
+		nvme_unmap_data(iod->nvmeq->dev, req);
 	nvme_complete_rq(req);
 }
 
-- 
2.20.1


* [PATCH 10/15] nvme-pci: do not build a scatterlist to map metadata
  2019-03-21 23:10 [RFC] optimize nvme single segment I/O Christoph Hellwig
                   ` (8 preceding siblings ...)
  2019-03-21 23:10 ` [PATCH 09/15] nvme-pci: only call nvme_unmap_data for requests transferring data Christoph Hellwig
@ 2019-03-21 23:10 ` Christoph Hellwig
  2019-03-25  5:27   ` Chaitanya Kulkarni
  2019-08-28  9:20   ` Ming Lei
  2019-03-21 23:10 ` [PATCH 11/15] nvme-pci: split metadata handling from nvme_map_data / nvme_unmap_data Christoph Hellwig
                   ` (7 subsequent siblings)
  17 siblings, 2 replies; 46+ messages in thread
From: Christoph Hellwig @ 2019-03-21 23:10 UTC (permalink / raw)
  To: Jens Axboe, Keith Busch, Sagi Grimberg; +Cc: linux-nvme, linux-block

We always have exactly one segment, so we can simply call dma_map_bvec.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/nvme/host/pci.c | 23 ++++++++++-------------
 1 file changed, 10 insertions(+), 13 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index bc4ee869fe82..a7dad24e0406 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -221,7 +221,7 @@ struct nvme_iod {
 	int npages;		/* In the PRP list. 0 means small pool in use */
 	int nents;		/* Used in scatterlist */
 	dma_addr_t first_dma;
-	struct scatterlist meta_sg; /* metadata requires single contiguous buffer */
+	dma_addr_t meta_dma;
 	struct scatterlist *sg;
 	struct scatterlist inline_sg[0];
 };
@@ -592,13 +592,16 @@ static void nvme_unmap_data(struct nvme_dev *dev, struct request *req)
 	dma_addr_t dma_addr = iod->first_dma, next_dma_addr;
 	int i;
 
+	if (blk_integrity_rq(req)) {
+		dma_unmap_page(dev->dev, iod->meta_dma,
+				rq_integrity_vec(req)->bv_len, dma_dir);
+	}
+
 	if (iod->nents) {
 		/* P2PDMA requests do not need to be unmapped */
 		if (!is_pci_p2pdma_page(sg_page(iod->sg)))
 			dma_unmap_sg(dev->dev, iod->sg, iod->nents, dma_dir);
 
-		if (blk_integrity_rq(req))
-			dma_unmap_sg(dev->dev, &iod->meta_sg, 1, dma_dir);
 	}
 
 	if (iod->npages == 0)
@@ -861,17 +864,11 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
 
 	ret = BLK_STS_IOERR;
 	if (blk_integrity_rq(req)) {
-		if (blk_rq_count_integrity_sg(q, req->bio) != 1)
-			goto out;
-
-		sg_init_table(&iod->meta_sg, 1);
-		if (blk_rq_map_integrity_sg(q, req->bio, &iod->meta_sg) != 1)
-			goto out;
-
-		if (!dma_map_sg(dev->dev, &iod->meta_sg, 1, dma_dir))
+		iod->meta_dma = dma_map_bvec(dev->dev, rq_integrity_vec(req),
+				dma_dir, 0);
+		if (dma_mapping_error(dev->dev, iod->meta_dma))
 			goto out;
-
-		cmnd->rw.metadata = cpu_to_le64(sg_dma_address(&iod->meta_sg));
+		cmnd->rw.metadata = cpu_to_le64(iod->meta_dma);
 	}
 
 	return BLK_STS_OK;
-- 
2.20.1


* [PATCH 11/15] nvme-pci: split metadata handling from nvme_map_data / nvme_unmap_data
  2019-03-21 23:10 [RFC] optimize nvme single segment I/O Christoph Hellwig
                   ` (9 preceding siblings ...)
  2019-03-21 23:10 ` [PATCH 10/15] nvme-pci: do not build a scatterlist to map metadata Christoph Hellwig
@ 2019-03-21 23:10 ` Christoph Hellwig
  2019-03-25  5:29   ` Chaitanya Kulkarni
  2019-03-21 23:10 ` [PATCH 12/15] nvme-pci: remove the inline scatterlist optimization Christoph Hellwig
                   ` (6 subsequent siblings)
  17 siblings, 1 reply; 46+ messages in thread
From: Christoph Hellwig @ 2019-03-21 23:10 UTC (permalink / raw)
  To: Jens Axboe, Keith Busch, Sagi Grimberg; +Cc: linux-nvme, linux-block

This prepares for some bigger changes to the data mapping helpers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/nvme/host/pci.c | 48 +++++++++++++++++++++++------------------
 1 file changed, 27 insertions(+), 21 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index a7dad24e0406..cf29d079ad5b 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -592,11 +592,6 @@ static void nvme_unmap_data(struct nvme_dev *dev, struct request *req)
 	dma_addr_t dma_addr = iod->first_dma, next_dma_addr;
 	int i;
 
-	if (blk_integrity_rq(req)) {
-		dma_unmap_page(dev->dev, iod->meta_dma,
-				rq_integrity_vec(req)->bv_len, dma_dir);
-	}
-
 	if (iod->nents) {
 		/* P2PDMA requests do not need to be unmapped */
 		if (!is_pci_p2pdma_page(sg_page(iod->sg)))
@@ -858,24 +853,23 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
 		ret = nvme_pci_setup_sgls(dev, req, &cmnd->rw, nr_mapped);
 	else
 		ret = nvme_pci_setup_prps(dev, req, &cmnd->rw);
-
+out:
 	if (ret != BLK_STS_OK)
-		goto out;
-
-	ret = BLK_STS_IOERR;
-	if (blk_integrity_rq(req)) {
-		iod->meta_dma = dma_map_bvec(dev->dev, rq_integrity_vec(req),
-				dma_dir, 0);
-		if (dma_mapping_error(dev->dev, iod->meta_dma))
-			goto out;
-		cmnd->rw.metadata = cpu_to_le64(iod->meta_dma);
-	}
+		nvme_unmap_data(dev, req);
+	return ret;
+}
 
-	return BLK_STS_OK;
+static blk_status_t nvme_map_metadata(struct nvme_dev *dev, struct request *req,
+		struct nvme_command *cmnd)
+{
+	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
 
-out:
-	nvme_unmap_data(dev, req);
-	return ret;
+	iod->meta_dma = dma_map_bvec(dev->dev, rq_integrity_vec(req),
+			rq_dma_dir(req), 0);
+	if (dma_mapping_error(dev->dev, iod->meta_dma))
+		return BLK_STS_IOERR;
+	cmnd->rw.metadata = cpu_to_le64(iod->meta_dma);
+	return 0;
 }
 
 /*
@@ -913,9 +907,17 @@ static blk_status_t nvme_queue_rq(struct blk_mq_hw_ctx *hctx,
 			goto out_free_cmd;
 	}
 
+	if (blk_integrity_rq(req)) {
+		ret = nvme_map_metadata(dev, req, &cmnd);
+		if (ret)
+			goto out_unmap_data;
+	}
+
 	blk_mq_start_request(req);
 	nvme_submit_cmd(nvmeq, &cmnd, bd->last);
 	return BLK_STS_OK;
+out_unmap_data:
+	nvme_unmap_data(dev, req);
 out_free_cmd:
 	nvme_cleanup_cmd(req);
 	return ret;
@@ -924,10 +926,14 @@ static blk_status_t nvme_queue_rq(struct blk_mq_hw_ctx *hctx,
 static void nvme_pci_complete_rq(struct request *req)
 {
 	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
+	struct nvme_dev *dev = iod->nvmeq->dev;
 
 	nvme_cleanup_cmd(req);
+	if (blk_integrity_rq(req))
+		dma_unmap_page(dev->dev, iod->meta_dma,
+			       rq_integrity_vec(req)->bv_len, rq_data_dir(req));
 	if (blk_rq_nr_phys_segments(req))
-		nvme_unmap_data(iod->nvmeq->dev, req);
+		nvme_unmap_data(dev, req);
 	nvme_complete_rq(req);
 }
 
-- 
2.20.1


* [PATCH 12/15] nvme-pci: remove the inline scatterlist optimization
  2019-03-21 23:10 [RFC] optimize nvme single segment I/O Christoph Hellwig
                   ` (10 preceding siblings ...)
  2019-03-21 23:10 ` [PATCH 11/15] nvme-pci: split metadata handling from nvme_map_data / nvme_unmap_data Christoph Hellwig
@ 2019-03-21 23:10 ` Christoph Hellwig
  2019-03-25  5:30   ` Chaitanya Kulkarni
  2019-03-21 23:10 ` [PATCH 13/15] nvme-pci: optimize mapping of small single segment requests Christoph Hellwig
                   ` (5 subsequent siblings)
  17 siblings, 1 reply; 46+ messages in thread
From: Christoph Hellwig @ 2019-03-21 23:10 UTC (permalink / raw)
  To: Jens Axboe, Keith Busch, Sagi Grimberg; +Cc: linux-nvme, linux-block

We'll soon have a better way to optimize for small I/O that doesn't
require the inline scatterlist, so remove the existing inline_sg case
to make that optimization easier to implement.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/nvme/host/pci.c | 38 ++++++--------------------------------
 1 file changed, 6 insertions(+), 32 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index cf29d079ad5b..c6047935e825 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -223,7 +223,6 @@ struct nvme_iod {
 	dma_addr_t first_dma;
 	dma_addr_t meta_dma;
 	struct scatterlist *sg;
-	struct scatterlist inline_sg[0];
 };
 
 /*
@@ -370,12 +369,6 @@ static bool nvme_dbbuf_update_and_check_event(u16 value, u32 *dbbuf_db,
 	return true;
 }
 
-/*
- * Max size of iod being embedded in the request payload
- */
-#define NVME_INT_PAGES		2
-#define NVME_INT_BYTES(dev)	(NVME_INT_PAGES * (dev)->ctrl.page_size)
-
 /*
  * Will slightly overestimate the number of pages needed.  This is OK
  * as it only leads to a small amount of wasted memory for the lifetime of
@@ -410,15 +403,6 @@ static unsigned int nvme_pci_iod_alloc_size(struct nvme_dev *dev,
 	return alloc_size + sizeof(struct scatterlist) * nseg;
 }
 
-static unsigned int nvme_pci_cmd_size(struct nvme_dev *dev, bool use_sgl)
-{
-	unsigned int alloc_size = nvme_pci_iod_alloc_size(dev,
-				    NVME_INT_BYTES(dev), NVME_INT_PAGES,
-				    use_sgl);
-
-	return sizeof(struct nvme_iod) + alloc_size;
-}
-
 static int nvme_admin_init_hctx(struct blk_mq_hw_ctx *hctx, void *data,
 				unsigned int hctx_idx)
 {
@@ -621,8 +605,7 @@ static void nvme_unmap_data(struct nvme_dev *dev, struct request *req)
 		dma_addr = next_dma_addr;
 	}
 
-	if (iod->sg != iod->inline_sg)
-		mempool_free(iod->sg, dev->iod_mempool);
+	mempool_free(iod->sg, dev->iod_mempool);
 }
 
 static void nvme_print_sgl(struct scatterlist *sgl, int nents)
@@ -822,14 +805,9 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
 	blk_status_t ret = BLK_STS_IOERR;
 	int nr_mapped;
 
-	if (blk_rq_payload_bytes(req) > NVME_INT_BYTES(dev) ||
-	    blk_rq_nr_phys_segments(req) > NVME_INT_PAGES) {
-		iod->sg = mempool_alloc(dev->iod_mempool, GFP_ATOMIC);
-		if (!iod->sg)
-			return BLK_STS_RESOURCE;
-	} else {
-		iod->sg = iod->inline_sg;
-	}
+	iod->sg = mempool_alloc(dev->iod_mempool, GFP_ATOMIC);
+	if (!iod->sg)
+		return BLK_STS_RESOURCE;
 
 	iod->use_sgl = nvme_pci_use_sgls(dev, req);
 
@@ -1619,7 +1597,7 @@ static int nvme_alloc_admin_tags(struct nvme_dev *dev)
 		dev->admin_tagset.queue_depth = NVME_AQ_MQ_TAG_DEPTH;
 		dev->admin_tagset.timeout = ADMIN_TIMEOUT;
 		dev->admin_tagset.numa_node = dev_to_node(dev->dev);
-		dev->admin_tagset.cmd_size = nvme_pci_cmd_size(dev, false);
+		dev->admin_tagset.cmd_size = sizeof(struct nvme_iod);
 		dev->admin_tagset.flags = BLK_MQ_F_NO_SCHED;
 		dev->admin_tagset.driver_data = dev;
 
@@ -2266,11 +2244,7 @@ static int nvme_dev_add(struct nvme_dev *dev)
 		dev->tagset.numa_node = dev_to_node(dev->dev);
 		dev->tagset.queue_depth =
 				min_t(int, dev->q_depth, BLK_MQ_MAX_DEPTH) - 1;
-		dev->tagset.cmd_size = nvme_pci_cmd_size(dev, false);
-		if ((dev->ctrl.sgls & ((1 << 0) | (1 << 1))) && sgl_threshold) {
-			dev->tagset.cmd_size = max(dev->tagset.cmd_size,
-					nvme_pci_cmd_size(dev, true));
-		}
+		dev->tagset.cmd_size = sizeof(struct nvme_iod);
 		dev->tagset.flags = BLK_MQ_F_SHOULD_MERGE;
 		dev->tagset.driver_data = dev;
 
-- 
2.20.1


* [PATCH 13/15] nvme-pci: optimize mapping of small single segment requests
  2019-03-21 23:10 [RFC] optimize nvme single segment I/O Christoph Hellwig
                   ` (11 preceding siblings ...)
  2019-03-21 23:10 ` [PATCH 12/15] nvme-pci: remove the inline scatterlist optimization Christoph Hellwig
@ 2019-03-21 23:10 ` Christoph Hellwig
  2019-03-25  5:36   ` Chaitanya Kulkarni
  2019-03-21 23:10 ` [PATCH 14/15] nvme-pci: optimize mapping single segment requests using SGLs Christoph Hellwig
                   ` (4 subsequent siblings)
  17 siblings, 1 reply; 46+ messages in thread
From: Christoph Hellwig @ 2019-03-21 23:10 UTC (permalink / raw)
  To: Jens Axboe, Keith Busch, Sagi Grimberg; +Cc: linux-nvme, linux-block

If a request has a single segment that fits into one or two PRP entries,
we do not have to create a scatterlist for it and can instead map the
bio_vec directly.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/nvme/host/pci.c | 45 ++++++++++++++++++++++++++++++++++++-----
 1 file changed, 40 insertions(+), 5 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index c6047935e825..47fc4d653961 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -221,6 +221,7 @@ struct nvme_iod {
 	int npages;		/* In the PRP list. 0 means small pool in use */
 	int nents;		/* Used in scatterlist */
 	dma_addr_t first_dma;
+	unsigned int dma_len;	/* length of single DMA segment mapping */
 	dma_addr_t meta_dma;
 	struct scatterlist *sg;
 };
@@ -576,13 +577,18 @@ static void nvme_unmap_data(struct nvme_dev *dev, struct request *req)
 	dma_addr_t dma_addr = iod->first_dma, next_dma_addr;
 	int i;
 
-	if (iod->nents) {
-		/* P2PDMA requests do not need to be unmapped */
-		if (!is_pci_p2pdma_page(sg_page(iod->sg)))
-			dma_unmap_sg(dev->dev, iod->sg, iod->nents, dma_dir);
-
+	if (iod->dma_len) {
+		dma_unmap_page(dev->dev, dma_addr, iod->dma_len, dma_dir);
+		return;
 	}
 
+	WARN_ON_ONCE(!iod->nents);
+
+	/* P2PDMA requests do not need to be unmapped */
+	if (!is_pci_p2pdma_page(sg_page(iod->sg)))
+		dma_unmap_sg(dev->dev, iod->sg, iod->nents, rq_dma_dir(req));
+
+
 	if (iod->npages == 0)
 		dma_pool_free(dev->prp_small_pool, nvme_pci_iod_list(req)[0],
 			dma_addr);
@@ -795,6 +801,24 @@ static blk_status_t nvme_pci_setup_sgls(struct nvme_dev *dev,
 	return BLK_STS_OK;
 }
 
+static blk_status_t nvme_setup_prp_simple(struct nvme_dev *dev,
+		struct request *req, struct nvme_rw_command *cmnd,
+		struct bio_vec *bv)
+{
+	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
+	unsigned int first_prp_len = dev->ctrl.page_size - bv->bv_offset;
+
+	iod->first_dma = dma_map_bvec(dev->dev, bv, rq_dma_dir(req), 0);
+	if (dma_mapping_error(dev->dev, iod->first_dma))
+		return BLK_STS_RESOURCE;
+	iod->dma_len = bv->bv_len;
+
+	cmnd->dptr.prp1 = cpu_to_le64(iod->first_dma);
+	if (bv->bv_len > first_prp_len)
+		cmnd->dptr.prp2 = cpu_to_le64(iod->first_dma + first_prp_len);
+	return 0;
+}
+
 static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
 		struct nvme_command *cmnd)
 {
@@ -805,6 +829,17 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
 	blk_status_t ret = BLK_STS_IOERR;
 	int nr_mapped;
 
+	if (blk_rq_nr_phys_segments(req) == 1) {
+		struct bio_vec bv = req_bvec(req);
+
+		if (!is_pci_p2pdma_page(bv.bv_page)) {
+			if (bv.bv_offset + bv.bv_len <= dev->ctrl.page_size * 2)
+				return nvme_setup_prp_simple(dev, req,
+							     &cmnd->rw, &bv);
+		}
+	}
+
+	iod->dma_len = 0;
 	iod->sg = mempool_alloc(dev->iod_mempool, GFP_ATOMIC);
 	if (!iod->sg)
 		return BLK_STS_RESOURCE;
-- 
2.20.1


* [PATCH 14/15] nvme-pci: optimize mapping single segment requests using SGLs
  2019-03-21 23:10 [RFC] optimize nvme single segment I/O Christoph Hellwig
                   ` (12 preceding siblings ...)
  2019-03-21 23:10 ` [PATCH 13/15] nvme-pci: optimize mapping of small single segment requests Christoph Hellwig
@ 2019-03-21 23:10 ` Christoph Hellwig
  2019-03-25  5:39   ` Chaitanya Kulkarni
  2019-04-30 14:17   ` Klaus Birkelund
  2019-03-21 23:10 ` [PATCH 15/15] nvme-pci: tidy up nvme_map_data Christoph Hellwig
                   ` (3 subsequent siblings)
  17 siblings, 2 replies; 46+ messages in thread
From: Christoph Hellwig @ 2019-03-21 23:10 UTC (permalink / raw)
  To: Jens Axboe, Keith Busch, Sagi Grimberg; +Cc: linux-nvme, linux-block

If the controller supports SGLs, we can take another shortcut for single
segment requests: those can always be mapped without an additional
indirection structure, so we do not need to create a scatterlist.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/nvme/host/pci.c | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 47fc4d653961..38869f59c296 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -819,6 +819,23 @@ static blk_status_t nvme_setup_prp_simple(struct nvme_dev *dev,
 	return 0;
 }
 
+static blk_status_t nvme_setup_sgl_simple(struct nvme_dev *dev,
+		struct request *req, struct nvme_rw_command *cmnd,
+		struct bio_vec *bv)
+{
+	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
+
+	iod->first_dma = dma_map_bvec(dev->dev, bv, rq_dma_dir(req), 0);
+	if (dma_mapping_error(dev->dev, iod->first_dma))
+		return BLK_STS_RESOURCE;
+	iod->dma_len = bv->bv_len;
+
+	cmnd->dptr.sgl.addr = cpu_to_le64(iod->first_dma);
+	cmnd->dptr.sgl.length = cpu_to_le32(iod->dma_len);
+	cmnd->dptr.sgl.type = NVME_SGL_FMT_DATA_DESC << 4;
+	return 0;
+}
+
 static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
 		struct nvme_command *cmnd)
 {
@@ -836,6 +853,11 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
 			if (bv.bv_offset + bv.bv_len <= dev->ctrl.page_size * 2)
 				return nvme_setup_prp_simple(dev, req,
 							     &cmnd->rw, &bv);
+
+			if (iod->nvmeq->qid &&
+			    dev->ctrl.sgls & ((1 << 0) | (1 << 1)))
+				return nvme_setup_sgl_simple(dev, req,
+							     &cmnd->rw, &bv);
 		}
 	}
 
-- 
2.20.1


* [PATCH 15/15] nvme-pci: tidy up nvme_map_data
  2019-03-21 23:10 [RFC] optimize nvme single segment I/O Christoph Hellwig
                   ` (13 preceding siblings ...)
  2019-03-21 23:10 ` [PATCH 14/15] nvme-pci: optimize mapping single segment requests using SGLs Christoph Hellwig
@ 2019-03-21 23:10 ` Christoph Hellwig
  2019-03-25  5:40   ` Chaitanya Kulkarni
  2019-03-22 15:44 ` [RFC] optimize nvme single segment I/O Jens Axboe
                   ` (2 subsequent siblings)
  17 siblings, 1 reply; 46+ messages in thread
From: Christoph Hellwig @ 2019-03-21 23:10 UTC (permalink / raw)
  To: Jens Axboe, Keith Busch, Sagi Grimberg; +Cc: linux-nvme, linux-block

Remove two pointless local variables, remove a ret assignment that is
never used, and move the use_sgl initialization closer to where it is used.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/nvme/host/pci.c | 17 +++++------------
 1 file changed, 5 insertions(+), 12 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 38869f59c296..9ec0704d5f78 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -840,10 +840,7 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
 		struct nvme_command *cmnd)
 {
 	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
-	struct request_queue *q = req->q;
-	enum dma_data_direction dma_dir = rq_data_dir(req) ?
-			DMA_TO_DEVICE : DMA_FROM_DEVICE;
-	blk_status_t ret = BLK_STS_IOERR;
+	blk_status_t ret = BLK_STS_RESOURCE;
 	int nr_mapped;
 
 	if (blk_rq_nr_phys_segments(req) == 1) {
@@ -865,25 +862,21 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
 	iod->sg = mempool_alloc(dev->iod_mempool, GFP_ATOMIC);
 	if (!iod->sg)
 		return BLK_STS_RESOURCE;
-
-	iod->use_sgl = nvme_pci_use_sgls(dev, req);
-
 	sg_init_table(iod->sg, blk_rq_nr_phys_segments(req));
-	iod->nents = blk_rq_map_sg(q, req, iod->sg);
+	iod->nents = blk_rq_map_sg(req->q, req, iod->sg);
 	if (!iod->nents)
 		goto out;
 
-	ret = BLK_STS_RESOURCE;
-
 	if (is_pci_p2pdma_page(sg_page(iod->sg)))
 		nr_mapped = pci_p2pdma_map_sg(dev->dev, iod->sg, iod->nents,
-					  dma_dir);
+					      rq_dma_dir(req));
 	else
 		nr_mapped = dma_map_sg_attrs(dev->dev, iod->sg, iod->nents,
-					     dma_dir,  DMA_ATTR_NO_WARN);
+					     rq_dma_dir(req), DMA_ATTR_NO_WARN);
 	if (!nr_mapped)
 		goto out;
 
+	iod->use_sgl = nvme_pci_use_sgls(dev, req);
 	if (iod->use_sgl)
 		ret = nvme_pci_setup_sgls(dev, req, &cmnd->rw, nr_mapped);
 	else
-- 
2.20.1



* Re: [PATCH 03/15] block: add a rq_dma_dir helper
  2019-03-21 23:10 ` [PATCH 03/15] block: add a rq_dma_dir helper Christoph Hellwig
@ 2019-03-22 13:06   ` Johannes Thumshirn
  2019-03-27 14:20     ` Christoph Hellwig
  2019-03-25  5:11   ` Chaitanya Kulkarni
  1 sibling, 1 reply; 46+ messages in thread
From: Johannes Thumshirn @ 2019-03-22 13:06 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe, Keith Busch, Sagi Grimberg
  Cc: linux-nvme, linux-block

On 22/03/2019 00:10, Christoph Hellwig wrote:
> In a lot of places we want to know the DMA direction for a given
> struct request.  Add a little helper to make it a little easier.

You introduce the helper here, but you're using it in 8/15 for the first time.

Could you convert the possible users to use the new helper in this patch
as well?



-- 
Johannes Thumshirn                            SUSE Labs Filesystems
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850


* Re: [RFC] optimize nvme single segment I/O
  2019-03-21 23:10 [RFC] optimize nvme single segment I/O Christoph Hellwig
                   ` (14 preceding siblings ...)
  2019-03-21 23:10 ` [PATCH 15/15] nvme-pci: tidy up nvme_map_data Christoph Hellwig
@ 2019-03-22 15:44 ` Jens Axboe
  2019-03-27 14:24   ` Christoph Hellwig
  2019-03-22 17:37 ` Keith Busch
  2019-03-22 18:55 ` Sagi Grimberg
  17 siblings, 1 reply; 46+ messages in thread
From: Jens Axboe @ 2019-03-22 15:44 UTC (permalink / raw)
  To: Christoph Hellwig, Keith Busch, Sagi Grimberg; +Cc: linux-nvme, linux-block

On 3/21/19 5:10 PM, Christoph Hellwig wrote:
> Hi all,
> 
> with all the discussion on small I/O performance lately I thought
> it was time to dust off my old idea to optimize this path a bit
> by avoiding building a scatterlist.  I've only done very basic
> testing because I've been a bit busy, but I thought it might be
> worthwhile to get it out for feedback.

I ran this through some quick testing, and it's a big win for 4k
IOs here. Around an 8% improvement for cases that are CPU bound.

-- 
Jens Axboe



* Re: [RFC] optimize nvme single segment I/O
  2019-03-21 23:10 [RFC] optimize nvme single segment I/O Christoph Hellwig
                   ` (15 preceding siblings ...)
  2019-03-22 15:44 ` [RFC] optimize nvme single segment I/O Jens Axboe
@ 2019-03-22 17:37 ` Keith Busch
  2019-03-22 18:55 ` Sagi Grimberg
  17 siblings, 0 replies; 46+ messages in thread
From: Keith Busch @ 2019-03-22 17:37 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Keith Busch, Sagi Grimberg, linux-nvme, linux-block

On Thu, Mar 21, 2019 at 04:10:22PM -0700, Christoph Hellwig wrote:
> Hi all,
> 
> with all the discussion on small I/O performance lately I thought
> it was time to dust off my old idea to optimize this path a bit
> by avoiding building a scatterlist.  I've only done very basic
> testing because I've been a bit busy, but I thought it might be
> worthwhile to get it out for feedback.

Tests well here with a measurable IOPs improvement at lower queue depths.

Series looks good to me, especially patch 5! :p

Reviewed-by: Keith Busch <keith.busch@intel.com>


* Re: [RFC] optimize nvme single segment I/O
  2019-03-21 23:10 [RFC] optimize nvme single segment I/O Christoph Hellwig
                   ` (16 preceding siblings ...)
  2019-03-22 17:37 ` Keith Busch
@ 2019-03-22 18:55 ` Sagi Grimberg
  17 siblings, 0 replies; 46+ messages in thread
From: Sagi Grimberg @ 2019-03-22 18:55 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe, Keith Busch; +Cc: linux-block, linux-nvme


> Hi all,
> 
> with all the discussion on small I/O performance lately I thought
> it was time to dust off my old idea to optimize this path a bit
> by avoiding building a scatterlist.  I've only done very basic
> testing because I've been a bit busy, but I thought it might be
> worthwhile to get it out for feedback.

This looks good Christoph! for the series,

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>


* Re: [PATCH 01/15] block: add a req_bvec helper
  2019-03-21 23:10 ` [PATCH 01/15] block: add a req_bvec helper Christoph Hellwig
@ 2019-03-25  5:07   ` Chaitanya Kulkarni
  2019-03-27 14:16     ` Christoph Hellwig
  0 siblings, 1 reply; 46+ messages in thread
From: Chaitanya Kulkarni @ 2019-03-25  5:07 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe, Keith Busch, Sagi Grimberg
  Cc: linux-nvme, linux-block

On 3/21/19 4:11 PM, Christoph Hellwig wrote:
> Return the currently active bvec segment, potentially spanning multiple
> pages.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   include/linux/blkdev.h | 7 +++++++
>   1 file changed, 7 insertions(+)
> 
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 0de92b29f589..255e20313cde 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -929,6 +929,13 @@ static inline unsigned int blk_rq_payload_bytes(struct request *rq)
>   	return blk_rq_bytes(rq);
>   }
>   
> +static inline struct bio_vec req_bvec(struct request *rq)
> +{
> +	if (rq->rq_flags & RQF_SPECIAL_PAYLOAD)
> +		return rq->special_vec;
Quick question here, mostly for my understanding: do we also have to
check nr_phys_segments here for operations such as write-zeroes? Or
may this never get called in a write-zeroes context? Or is it not
applicable?

> +	return mp_bvec_iter_bvec(rq->bio->bi_io_vec, rq->bio->bi_iter);
> +}
> +
>   static inline unsigned int blk_queue_get_max_sectors(struct request_queue *q,
>   						     int op)
>   {
> 

Otherwise, looks good.

Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>


* Re: [PATCH 02/15] block: add a rq_integrity_vec helper
  2019-03-21 23:10 ` [PATCH 02/15] block: add a rq_integrity_vec helper Christoph Hellwig
@ 2019-03-25  5:10   ` Chaitanya Kulkarni
  2019-03-27 14:19     ` Christoph Hellwig
  0 siblings, 1 reply; 46+ messages in thread
From: Chaitanya Kulkarni @ 2019-03-25  5:10 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe, Keith Busch, Sagi Grimberg
  Cc: linux-nvme, linux-block

On 3/21/19 4:11 PM, Christoph Hellwig wrote:
> This provides a nice little shortcut to get the integrity data for
> drivers like NVMe that only support a single integrity segment.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   include/linux/blkdev.h | 15 +++++++++++++++
>   1 file changed, 15 insertions(+)
> 
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 255e20313cde..f9a072610d28 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -1552,6 +1552,16 @@ static inline unsigned int bio_integrity_bytes(struct blk_integrity *bi,
>   	return bio_integrity_intervals(bi, sectors) * bi->tuple_size;
>   }
>   
> +/*
> + * Return the first bvec that contains integrity data.  In general only
> + * drivers that are limited to a single integrity segment should use this
> + * helper.
> + */
> +static inline struct bio_vec *rq_integrity_vec(struct request *rq)
> +{
Wrt the comment, should we add a check here to make sure the underlying
driver is limited to a single integrity segment?
> +	return rq->bio->bi_integrity->bip_vec;
> +}
> +
>   #else /* CONFIG_BLK_DEV_INTEGRITY */
>   
>   struct bio;
> @@ -1626,6 +1636,11 @@ static inline unsigned int bio_integrity_bytes(struct blk_integrity *bi,
>   	return 0;
>   }
>   
> +static inline struct bio_vec *rq_integrity_vec(struct request *rq)
> +{
> +	return NULL;
> +}
> +
>   #endif /* CONFIG_BLK_DEV_INTEGRITY */
>   
>   struct block_device_operations {
> 

Otherwise looks good.

Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>


* Re: [PATCH 03/15] block: add a rq_dma_dir helper
  2019-03-21 23:10 ` [PATCH 03/15] block: add a rq_dma_dir helper Christoph Hellwig
  2019-03-22 13:06   ` Johannes Thumshirn
@ 2019-03-25  5:11   ` Chaitanya Kulkarni
  1 sibling, 0 replies; 46+ messages in thread
From: Chaitanya Kulkarni @ 2019-03-25  5:11 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe, Keith Busch, Sagi Grimberg
  Cc: linux-nvme, linux-block

On 3/21/19 4:11 PM, Christoph Hellwig wrote:
> In a lot of places we want to know the DMA direction for a given
> struct request.  Add a little helper to make it a little easier.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   include/linux/blkdev.h | 3 +++
>   1 file changed, 3 insertions(+)
> 
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index f9a072610d28..5279104527ad 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -638,6 +638,9 @@ static inline bool blk_account_rq(struct request *rq)
>   
>   #define rq_data_dir(rq)		(op_is_write(req_op(rq)) ? WRITE : READ)
>   
> +#define rq_dma_dir(rq) \
> +	(op_is_write(req_op(rq)) ? DMA_TO_DEVICE : DMA_FROM_DEVICE)
> +
>   static inline bool queue_is_mq(struct request_queue *q)
>   {
>   	return q->mq_ops;
> 

Apart from Johannes's comment, this looks good.

Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>


* Re: [PATCH 04/15] block: add dma_map_bvec helper
  2019-03-21 23:10 ` [PATCH 04/15] block: add dma_map_bvec helper Christoph Hellwig
@ 2019-03-25  5:13   ` Chaitanya Kulkarni
  0 siblings, 0 replies; 46+ messages in thread
From: Chaitanya Kulkarni @ 2019-03-25  5:13 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe, Keith Busch, Sagi Grimberg
  Cc: linux-nvme, linux-block

On 3/21/19 4:11 PM, Christoph Hellwig wrote:
> Provide a nice little shortcut for mapping a single bvec.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   include/linux/blkdev.h | 4 ++++
>   1 file changed, 4 insertions(+)
> 
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 5279104527ad..322ff969659c 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -641,6 +641,10 @@ static inline bool blk_account_rq(struct request *rq)
>   #define rq_dma_dir(rq) \
>   	(op_is_write(req_op(rq)) ? DMA_TO_DEVICE : DMA_FROM_DEVICE)
>   
> +#define dma_map_bvec(dev, bv, dir, attrs) \
> +	dma_map_page_attrs(dev, (bv)->bv_page, (bv)->bv_offset, (bv)->bv_len, \
> +	(dir), (attrs))
> +
>   static inline bool queue_is_mq(struct request_queue *q)
>   {
>   	return q->mq_ops;
> 

Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>


* Re: [PATCH 05/15] nvme-pci: remove the unused iod->length field
  2019-03-21 23:10 ` [PATCH 05/15] nvme-pci: remove the unused iod->length field Christoph Hellwig
@ 2019-03-25  5:14   ` Chaitanya Kulkarni
  0 siblings, 0 replies; 46+ messages in thread
From: Chaitanya Kulkarni @ 2019-03-25  5:14 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe, Keith Busch, Sagi Grimberg
  Cc: linux-nvme, linux-block

On 3/21/19 4:11 PM, Christoph Hellwig wrote:
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   drivers/nvme/host/pci.c | 2 --
>   1 file changed, 2 deletions(-)
> 
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index a90cf5d63aac..bf0d71fe243e 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -220,7 +220,6 @@ struct nvme_iod {
>   	int aborted;
>   	int npages;		/* In the PRP list. 0 means small pool in use */
>   	int nents;		/* Used in scatterlist */
> -	int length;		/* Of data, in bytes */
>   	dma_addr_t first_dma;
>   	struct scatterlist meta_sg; /* metadata requires single contiguous buffer */
>   	struct scatterlist *sg;
> @@ -603,7 +602,6 @@ static blk_status_t nvme_init_iod(struct request *rq, struct nvme_dev *dev)
>   	iod->aborted = 0;
>   	iod->npages = -1;
>   	iod->nents = 0;
> -	iod->length = size;
>   
>   	return BLK_STS_OK;
>   }
> 

Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>


* Re: [PATCH 06/15] nvme-pci: remove nvme_init_iod
  2019-03-21 23:10 ` [PATCH 06/15] nvme-pci: remove nvme_init_iod Christoph Hellwig
@ 2019-03-25  5:19   ` Chaitanya Kulkarni
  2019-03-27 14:21     ` Christoph Hellwig
  0 siblings, 1 reply; 46+ messages in thread
From: Chaitanya Kulkarni @ 2019-03-25  5:19 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe, Keith Busch, Sagi Grimberg
  Cc: linux-nvme, linux-block

On 3/21/19 4:11 PM, Christoph Hellwig wrote:
> nvme_init_iod should really be split into two parts: initialize a few
> general iod fields, which can easily be done at the beginning of
> nvme_queue_rq, and allocating the scatterlist if needed, which logically
> belongs into nvme_map_data with the code making use of it.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   drivers/nvme/host/pci.c | 56 ++++++++++++++++-------------------------
>   1 file changed, 22 insertions(+), 34 deletions(-)
> 
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index bf0d71fe243e..3f06e942fb47 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -208,10 +208,10 @@ struct nvme_queue {
>   };
>   
>   /*
> - * The nvme_iod describes the data in an I/O, including the list of PRP
> - * entries.  You can't see it in this data structure because C doesn't let
> - * me express that.  Use nvme_init_iod to ensure there's enough space
> - * allocated to store the PRP list.
> + * The nvme_iod describes the data in an I/O.
> + *
> + * The sg pointer contains the list of PRP/SGL chunk allocations in addition
> + * to the actual struct scatterlist.
>    */
>   struct nvme_iod {
>   	struct nvme_request req;
> @@ -583,29 +583,6 @@ static inline bool nvme_pci_use_sgls(struct nvme_dev *dev, struct request *req)
>   	return true;
>   }
>   
> -static blk_status_t nvme_init_iod(struct request *rq, struct nvme_dev *dev)
> -{
> -	struct nvme_iod *iod = blk_mq_rq_to_pdu(rq);
> -	int nseg = blk_rq_nr_phys_segments(rq);
> -	unsigned int size = blk_rq_payload_bytes(rq);
> -
> -	iod->use_sgl = nvme_pci_use_sgls(dev, rq);
> -
> -	if (nseg > NVME_INT_PAGES || size > NVME_INT_BYTES(dev)) {
> -		iod->sg = mempool_alloc(dev->iod_mempool, GFP_ATOMIC);
> -		if (!iod->sg)
> -			return BLK_STS_RESOURCE;
> -	} else {
> -		iod->sg = iod->inline_sg;
> -	}
> -
> -	iod->aborted = 0;
> -	iod->npages = -1;
> -	iod->nents = 0;
> -
> -	return BLK_STS_OK;
> -}
> -
>   static void nvme_free_iod(struct nvme_dev *dev, struct request *req)
>   {
>   	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
> @@ -837,6 +814,17 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
>   	blk_status_t ret = BLK_STS_IOERR;
>   	int nr_mapped;
>   
> +	if (blk_rq_payload_bytes(req) > NVME_INT_BYTES(dev) ||
> +	    blk_rq_nr_phys_segments(req) > NVME_INT_PAGES) {
> +		iod->sg = mempool_alloc(dev->iod_mempool, GFP_ATOMIC);
> +		if (!iod->sg)
> +			return BLK_STS_RESOURCE;
> +	} else {
> +		iod->sg = iod->inline_sg;
> +	}
> +
> +	iod->use_sgl = nvme_pci_use_sgls(dev, req);
> +
>   	sg_init_table(iod->sg, blk_rq_nr_phys_segments(req));
>   	iod->nents = blk_rq_map_sg(q, req, iod->sg);
>   	if (!iod->nents)
> @@ -881,6 +869,7 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
>   out_unmap:
>   	dma_unmap_sg(dev->dev, iod->sg, iod->nents, dma_dir);
>   out:
> +	nvme_free_iod(dev, req);
>   	return ret;
>   }
>   
> @@ -913,9 +902,14 @@ static blk_status_t nvme_queue_rq(struct blk_mq_hw_ctx *hctx,
>   	struct nvme_queue *nvmeq = hctx->driver_data;
>   	struct nvme_dev *dev = nvmeq->dev;
>   	struct request *req = bd->rq;
> +	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
>   	struct nvme_command cmnd;
>   	blk_status_t ret;
>   
> +	iod->aborted = 0;
> +	iod->npages = -1;
> +	iod->nents = 0;
> +
Maybe add an inline helper for the above default IOD initialization?
>   	/*
>   	 * We should not need to do this, but we're still using this to
>   	 * ensure we can drain requests on a dying queue.
> @@ -927,21 +921,15 @@ static blk_status_t nvme_queue_rq(struct blk_mq_hw_ctx *hctx,
>   	if (ret)
>   		return ret;
>   
> -	ret = nvme_init_iod(req, dev);
> -	if (ret)
> -		goto out_free_cmd;
> -
>   	if (blk_rq_nr_phys_segments(req)) {
>   		ret = nvme_map_data(dev, req, &cmnd);
>   		if (ret)
> -			goto out_cleanup_iod;
> +			goto out_free_cmd;
>   	}
>   
>   	blk_mq_start_request(req);
>   	nvme_submit_cmd(nvmeq, &cmnd, bd->last);
>   	return BLK_STS_OK;
> -out_cleanup_iod:
> -	nvme_free_iod(dev, req);
>   out_free_cmd:
>   	nvme_cleanup_cmd(req);
>   	return ret;
> 

Otherwise, looks good.
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>


* Re: [PATCH 07/15] nvme-pci: move the call to nvme_cleanup_cmd out of nvme_unmap_data
  2019-03-21 23:10 ` [PATCH 07/15] nvme-pci: move the call to nvme_cleanup_cmd out of nvme_unmap_data Christoph Hellwig
@ 2019-03-25  5:21   ` Chaitanya Kulkarni
  0 siblings, 0 replies; 46+ messages in thread
From: Chaitanya Kulkarni @ 2019-03-25  5:21 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe, Keith Busch, Sagi Grimberg
  Cc: linux-nvme, linux-block

On 3/21/19 4:11 PM, Christoph Hellwig wrote:
> Cleaning up the command setup isn't related to unmapping data, and
> disentangling them will simplify error handling a bit down the road.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   drivers/nvme/host/pci.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index 3f06e942fb47..2fb35d44010a 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -888,7 +888,6 @@ static void nvme_unmap_data(struct nvme_dev *dev, struct request *req)
>   			dma_unmap_sg(dev->dev, &iod->meta_sg, 1, dma_dir);
>   	}
>   
> -	nvme_cleanup_cmd(req);
>   	nvme_free_iod(dev, req);
>   }
>   
> @@ -939,6 +938,7 @@ static void nvme_pci_complete_rq(struct request *req)
>   {
>   	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
>   
> +	nvme_cleanup_cmd(req);
>   	nvme_unmap_data(iod->nvmeq->dev, req);
>   	nvme_complete_rq(req);
>   }
> 

Also, it is much easier to debug now that it is separated.

Looks good.

Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>



* Re: [PATCH 08/15] nvme-pci: merge nvme_free_iod into nvme_unmap_data
  2019-03-21 23:10 ` [PATCH 08/15] nvme-pci: merge nvme_free_iod into nvme_unmap_data Christoph Hellwig
@ 2019-03-25  5:22   ` Chaitanya Kulkarni
  0 siblings, 0 replies; 46+ messages in thread
From: Chaitanya Kulkarni @ 2019-03-25  5:22 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe, Keith Busch, Sagi Grimberg
  Cc: linux-nvme, linux-block

On 3/21/19 4:11 PM, Christoph Hellwig wrote:
> This means we now have a function that undoes everything nvme_map_data
> does and we can simplify the error handling a bit.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   drivers/nvme/host/pci.c | 44 ++++++++++++++++-------------------------
>   1 file changed, 17 insertions(+), 27 deletions(-)
> 
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index 2fb35d44010a..ce8bdc15ec3c 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -583,14 +583,24 @@ static inline bool nvme_pci_use_sgls(struct nvme_dev *dev, struct request *req)
>   	return true;
>   }
>   
> -static void nvme_free_iod(struct nvme_dev *dev, struct request *req)
> +static void nvme_unmap_data(struct nvme_dev *dev, struct request *req)
>   {
>   	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
> +	enum dma_data_direction dma_dir = rq_data_dir(req) ?
> +			DMA_TO_DEVICE : DMA_FROM_DEVICE;
>   	const int last_prp = dev->ctrl.page_size / sizeof(__le64) - 1;
>   	dma_addr_t dma_addr = iod->first_dma, next_dma_addr;
> -
>   	int i;
>   
> +	if (iod->nents) {
> +		/* P2PDMA requests do not need to be unmapped */
> +		if (!is_pci_p2pdma_page(sg_page(iod->sg)))
> +			dma_unmap_sg(dev->dev, iod->sg, iod->nents, dma_dir);
> +
> +		if (blk_integrity_rq(req))
> +			dma_unmap_sg(dev->dev, &iod->meta_sg, 1, dma_dir);
> +	}
> +
>   	if (iod->npages == 0)
>   		dma_pool_free(dev->prp_small_pool, nvme_pci_iod_list(req)[0],
>   			dma_addr);
> @@ -847,50 +857,30 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
>   		ret = nvme_pci_setup_prps(dev, req, &cmnd->rw);
>   
>   	if (ret != BLK_STS_OK)
> -		goto out_unmap;
> +		goto out;
>   
>   	ret = BLK_STS_IOERR;
>   	if (blk_integrity_rq(req)) {
>   		if (blk_rq_count_integrity_sg(q, req->bio) != 1)
> -			goto out_unmap;
> +			goto out;
>   
>   		sg_init_table(&iod->meta_sg, 1);
>   		if (blk_rq_map_integrity_sg(q, req->bio, &iod->meta_sg) != 1)
> -			goto out_unmap;
> +			goto out;
>   
>   		if (!dma_map_sg(dev->dev, &iod->meta_sg, 1, dma_dir))
> -			goto out_unmap;
> +			goto out;
>   
>   		cmnd->rw.metadata = cpu_to_le64(sg_dma_address(&iod->meta_sg));
>   	}
>   
>   	return BLK_STS_OK;
>   
> -out_unmap:
> -	dma_unmap_sg(dev->dev, iod->sg, iod->nents, dma_dir);
>   out:
> -	nvme_free_iod(dev, req);
> +	nvme_unmap_data(dev, req);
>   	return ret;
>   }
>   
> -static void nvme_unmap_data(struct nvme_dev *dev, struct request *req)
> -{
> -	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
> -	enum dma_data_direction dma_dir = rq_data_dir(req) ?
> -			DMA_TO_DEVICE : DMA_FROM_DEVICE;
> -
> -	if (iod->nents) {
> -		/* P2PDMA requests do not need to be unmapped */
> -		if (!is_pci_p2pdma_page(sg_page(iod->sg)))
> -			dma_unmap_sg(dev->dev, iod->sg, iod->nents, dma_dir);
> -
> -		if (blk_integrity_rq(req))
> -			dma_unmap_sg(dev->dev, &iod->meta_sg, 1, dma_dir);
> -	}
> -
> -	nvme_free_iod(dev, req);
> -}
> -
>   /*
>    * NOTE: ns is NULL when called on the admin queue.
>    */
> 
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>


* Re: [PATCH 09/15] nvme-pci: only call nvme_unmap_data for requests transferring data
  2019-03-21 23:10 ` [PATCH 09/15] nvme-pci: only call nvme_unmap_data for requests transferring data Christoph Hellwig
@ 2019-03-25  5:23   ` Chaitanya Kulkarni
  0 siblings, 0 replies; 46+ messages in thread
From: Chaitanya Kulkarni @ 2019-03-25  5:23 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe, Keith Busch, Sagi Grimberg
  Cc: linux-nvme, linux-block

On 3/21/19 4:11 PM, Christoph Hellwig wrote:
> This mirrors how nvme_map_data is called and will allow simplifying some
> checks in nvme_unmap_data later on.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   drivers/nvme/host/pci.c | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index ce8bdc15ec3c..bc4ee869fe82 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -929,7 +929,8 @@ static void nvme_pci_complete_rq(struct request *req)
>   	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
>   
>   	nvme_cleanup_cmd(req);
> -	nvme_unmap_data(iod->nvmeq->dev, req);
> +	if (blk_rq_nr_phys_segments(req))
> +		nvme_unmap_data(iod->nvmeq->dev, req);
>   	nvme_complete_rq(req);
>   }
>   
> 

Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>


* Re: [PATCH 10/15] nvme-pci: do not build a scatterlist to map metadata
  2019-03-21 23:10 ` [PATCH 10/15] nvme-pci: do not build a scatterlist to map metadata Christoph Hellwig
@ 2019-03-25  5:27   ` Chaitanya Kulkarni
  2019-08-28  9:20   ` Ming Lei
  1 sibling, 0 replies; 46+ messages in thread
From: Chaitanya Kulkarni @ 2019-03-25  5:27 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe, Keith Busch, Sagi Grimberg
  Cc: linux-nvme, linux-block

On 3/21/19 4:12 PM, Christoph Hellwig wrote:
> We always have exactly one segment, so we can simply call dma_map_bvec.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   drivers/nvme/host/pci.c | 23 ++++++++++-------------
>   1 file changed, 10 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index bc4ee869fe82..a7dad24e0406 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -221,7 +221,7 @@ struct nvme_iod {
>   	int npages;		/* In the PRP list. 0 means small pool in use */
>   	int nents;		/* Used in scatterlist */
>   	dma_addr_t first_dma;
> -	struct scatterlist meta_sg; /* metadata requires single contiguous buffer */
> +	dma_addr_t meta_dma;
>   	struct scatterlist *sg;
>   	struct scatterlist inline_sg[0];
>   };
> @@ -592,13 +592,16 @@ static void nvme_unmap_data(struct nvme_dev *dev, struct request *req)
>   	dma_addr_t dma_addr = iod->first_dma, next_dma_addr;
>   	int i;
>   
> +	if (blk_integrity_rq(req)) {
> +		dma_unmap_page(dev->dev, iod->meta_dma,
> +				rq_integrity_vec(req)->bv_len, dma_dir);
> +	}
> +
>   	if (iod->nents) {
>   		/* P2PDMA requests do not need to be unmapped */
>   		if (!is_pci_p2pdma_page(sg_page(iod->sg)))
>   			dma_unmap_sg(dev->dev, iod->sg, iod->nents, dma_dir);
>   
> -		if (blk_integrity_rq(req))
> -			dma_unmap_sg(dev->dev, &iod->meta_sg, 1, dma_dir);
>   	}
>   
>   	if (iod->npages == 0)
> @@ -861,17 +864,11 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
>   
>   	ret = BLK_STS_IOERR;
>   	if (blk_integrity_rq(req)) {
> -		if (blk_rq_count_integrity_sg(q, req->bio) != 1)
> -			goto out;
> -
> -		sg_init_table(&iod->meta_sg, 1);
> -		if (blk_rq_map_integrity_sg(q, req->bio, &iod->meta_sg) != 1)
> -			goto out;
> -
> -		if (!dma_map_sg(dev->dev, &iod->meta_sg, 1, dma_dir))
> +		iod->meta_dma = dma_map_bvec(dev->dev, rq_integrity_vec(req),
> +				dma_dir, 0);
> +		if (dma_mapping_error(dev->dev, iod->meta_dma))
>   			goto out;
> -
> -		cmnd->rw.metadata = cpu_to_le64(sg_dma_address(&iod->meta_sg));
> +		cmnd->rw.metadata = cpu_to_le64(iod->meta_dma);
>   	}
>   
>   	return BLK_STS_OK;
>

Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>

* Re: [PATCH 11/15] nvme-pci: split metadata handling from nvme_map_data / nvme_unmap_data
  2019-03-21 23:10 ` [PATCH 11/15] nvme-pci: split metadata handling from nvme_map_data / nvme_unmap_data Christoph Hellwig
@ 2019-03-25  5:29   ` Chaitanya Kulkarni
  0 siblings, 0 replies; 46+ messages in thread
From: Chaitanya Kulkarni @ 2019-03-25  5:29 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe, Keith Busch, Sagi Grimberg
  Cc: linux-nvme, linux-block

On 3/21/19 4:12 PM, Christoph Hellwig wrote:
> This prepares for some bigger changes to the data mapping helpers.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   drivers/nvme/host/pci.c | 48 +++++++++++++++++++++++------------------
>   1 file changed, 27 insertions(+), 21 deletions(-)
> 
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index a7dad24e0406..cf29d079ad5b 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -592,11 +592,6 @@ static void nvme_unmap_data(struct nvme_dev *dev, struct request *req)
>   	dma_addr_t dma_addr = iod->first_dma, next_dma_addr;
>   	int i;
>   
> -	if (blk_integrity_rq(req)) {
> -		dma_unmap_page(dev->dev, iod->meta_dma,
> -				rq_integrity_vec(req)->bv_len, dma_dir);
> -	}
> -
>   	if (iod->nents) {
>   		/* P2PDMA requests do not need to be unmapped */
>   		if (!is_pci_p2pdma_page(sg_page(iod->sg)))
> @@ -858,24 +853,23 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
>   		ret = nvme_pci_setup_sgls(dev, req, &cmnd->rw, nr_mapped);
>   	else
>   		ret = nvme_pci_setup_prps(dev, req, &cmnd->rw);
> -
> +out:
>   	if (ret != BLK_STS_OK)
> -		goto out;
> -
> -	ret = BLK_STS_IOERR;
> -	if (blk_integrity_rq(req)) {
> -		iod->meta_dma = dma_map_bvec(dev->dev, rq_integrity_vec(req),
> -				dma_dir, 0);
> -		if (dma_mapping_error(dev->dev, iod->meta_dma))
> -			goto out;
> -		cmnd->rw.metadata = cpu_to_le64(iod->meta_dma);
> -	}
> +		nvme_unmap_data(dev, req);
> +	return ret;
> +}
>   
> -	return BLK_STS_OK;
> +static blk_status_t nvme_map_metadata(struct nvme_dev *dev, struct request *req,
> +		struct nvme_command *cmnd)
> +{
> +	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
>   
> -out:
> -	nvme_unmap_data(dev, req);
> -	return ret;
> +	iod->meta_dma = dma_map_bvec(dev->dev, rq_integrity_vec(req),
> +			rq_dma_dir(req), 0);
> +	if (dma_mapping_error(dev->dev, iod->meta_dma))
> +		return BLK_STS_IOERR;
> +	cmnd->rw.metadata = cpu_to_le64(iod->meta_dma);
> +	return 0;
>   }
>   
>   /*
> @@ -913,9 +907,17 @@ static blk_status_t nvme_queue_rq(struct blk_mq_hw_ctx *hctx,
>   			goto out_free_cmd;
>   	}
>   
> +	if (blk_integrity_rq(req)) {
> +		ret = nvme_map_metadata(dev, req, &cmnd);
> +		if (ret)
> +			goto out_unmap_data;
> +	}
> +
>   	blk_mq_start_request(req);
>   	nvme_submit_cmd(nvmeq, &cmnd, bd->last);
>   	return BLK_STS_OK;
> +out_unmap_data:
> +	nvme_unmap_data(dev, req);
>   out_free_cmd:
>   	nvme_cleanup_cmd(req);
>   	return ret;
> @@ -924,10 +926,14 @@ static blk_status_t nvme_queue_rq(struct blk_mq_hw_ctx *hctx,
>   static void nvme_pci_complete_rq(struct request *req)
>   {
>   	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
> +	struct nvme_dev *dev = iod->nvmeq->dev;
>   
>   	nvme_cleanup_cmd(req);
> +	if (blk_integrity_rq(req))
> +		dma_unmap_page(dev->dev, iod->meta_dma,
> +			       rq_integrity_vec(req)->bv_len, rq_data_dir(req));
>   	if (blk_rq_nr_phys_segments(req))
> -		nvme_unmap_data(iod->nvmeq->dev, req);
> +		nvme_unmap_data(dev, req);
>   	nvme_complete_rq(req);
>   }
>   
> 

Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>

* Re: [PATCH 12/15] nvme-pci: remove the inline scatterlist optimization
  2019-03-21 23:10 ` [PATCH 12/15] nvme-pci: remove the inline scatterlist optimization Christoph Hellwig
@ 2019-03-25  5:30   ` Chaitanya Kulkarni
  0 siblings, 0 replies; 46+ messages in thread
From: Chaitanya Kulkarni @ 2019-03-25  5:30 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe, Keith Busch, Sagi Grimberg
  Cc: linux-nvme, linux-block

On 3/21/19 4:12 PM, Christoph Hellwig wrote:
> We'll have a better way to optimize for small I/O that doesn't
> require it soon, so remove the existing inline_sg case to make that
> optimization easier to implement.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   drivers/nvme/host/pci.c | 38 ++++++--------------------------------
>   1 file changed, 6 insertions(+), 32 deletions(-)
> 
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index cf29d079ad5b..c6047935e825 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -223,7 +223,6 @@ struct nvme_iod {
>   	dma_addr_t first_dma;
>   	dma_addr_t meta_dma;
>   	struct scatterlist *sg;
> -	struct scatterlist inline_sg[0];
>   };
>   
>   /*
> @@ -370,12 +369,6 @@ static bool nvme_dbbuf_update_and_check_event(u16 value, u32 *dbbuf_db,
>   	return true;
>   }
>   
> -/*
> - * Max size of iod being embedded in the request payload
> - */
> -#define NVME_INT_PAGES		2
> -#define NVME_INT_BYTES(dev)	(NVME_INT_PAGES * (dev)->ctrl.page_size)
> -
>   /*
>    * Will slightly overestimate the number of pages needed.  This is OK
>    * as it only leads to a small amount of wasted memory for the lifetime of
> @@ -410,15 +403,6 @@ static unsigned int nvme_pci_iod_alloc_size(struct nvme_dev *dev,
>   	return alloc_size + sizeof(struct scatterlist) * nseg;
>   }
>   
> -static unsigned int nvme_pci_cmd_size(struct nvme_dev *dev, bool use_sgl)
> -{
> -	unsigned int alloc_size = nvme_pci_iod_alloc_size(dev,
> -				    NVME_INT_BYTES(dev), NVME_INT_PAGES,
> -				    use_sgl);
> -
> -	return sizeof(struct nvme_iod) + alloc_size;
> -}
> -
>   static int nvme_admin_init_hctx(struct blk_mq_hw_ctx *hctx, void *data,
>   				unsigned int hctx_idx)
>   {
> @@ -621,8 +605,7 @@ static void nvme_unmap_data(struct nvme_dev *dev, struct request *req)
>   		dma_addr = next_dma_addr;
>   	}
>   
> -	if (iod->sg != iod->inline_sg)
> -		mempool_free(iod->sg, dev->iod_mempool);
> +	mempool_free(iod->sg, dev->iod_mempool);
>   }
>   
>   static void nvme_print_sgl(struct scatterlist *sgl, int nents)
> @@ -822,14 +805,9 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
>   	blk_status_t ret = BLK_STS_IOERR;
>   	int nr_mapped;
>   
> -	if (blk_rq_payload_bytes(req) > NVME_INT_BYTES(dev) ||
> -	    blk_rq_nr_phys_segments(req) > NVME_INT_PAGES) {
> -		iod->sg = mempool_alloc(dev->iod_mempool, GFP_ATOMIC);
> -		if (!iod->sg)
> -			return BLK_STS_RESOURCE;
> -	} else {
> -		iod->sg = iod->inline_sg;
> -	}
> +	iod->sg = mempool_alloc(dev->iod_mempool, GFP_ATOMIC);
> +	if (!iod->sg)
> +		return BLK_STS_RESOURCE;
>   
>   	iod->use_sgl = nvme_pci_use_sgls(dev, req);
>   
> @@ -1619,7 +1597,7 @@ static int nvme_alloc_admin_tags(struct nvme_dev *dev)
>   		dev->admin_tagset.queue_depth = NVME_AQ_MQ_TAG_DEPTH;
>   		dev->admin_tagset.timeout = ADMIN_TIMEOUT;
>   		dev->admin_tagset.numa_node = dev_to_node(dev->dev);
> -		dev->admin_tagset.cmd_size = nvme_pci_cmd_size(dev, false);
> +		dev->admin_tagset.cmd_size = sizeof(struct nvme_iod);
>   		dev->admin_tagset.flags = BLK_MQ_F_NO_SCHED;
>   		dev->admin_tagset.driver_data = dev;
>   
> @@ -2266,11 +2244,7 @@ static int nvme_dev_add(struct nvme_dev *dev)
>   		dev->tagset.numa_node = dev_to_node(dev->dev);
>   		dev->tagset.queue_depth =
>   				min_t(int, dev->q_depth, BLK_MQ_MAX_DEPTH) - 1;
> -		dev->tagset.cmd_size = nvme_pci_cmd_size(dev, false);
> -		if ((dev->ctrl.sgls & ((1 << 0) | (1 << 1))) && sgl_threshold) {
> -			dev->tagset.cmd_size = max(dev->tagset.cmd_size,
> -					nvme_pci_cmd_size(dev, true));
> -		}
> +		dev->tagset.cmd_size = sizeof(struct nvme_iod);
>   		dev->tagset.flags = BLK_MQ_F_SHOULD_MERGE;
>   		dev->tagset.driver_data = dev;
>   
>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>

* Re: [PATCH 13/15] nvme-pci: optimize mapping of small single segment requests
  2019-03-21 23:10 ` [PATCH 13/15] nvme-pci: optimize mapping of small single segment requests Christoph Hellwig
@ 2019-03-25  5:36   ` Chaitanya Kulkarni
  0 siblings, 0 replies; 46+ messages in thread
From: Chaitanya Kulkarni @ 2019-03-25  5:36 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe, Keith Busch, Sagi Grimberg
  Cc: linux-nvme, linux-block

On 3/21/19 4:12 PM, Christoph Hellwig wrote:
> If a request is single segment and fits into one or two PRP entries we
> do not have to create a scatterlist for it, but can just map the bio_vec
> directly.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   drivers/nvme/host/pci.c | 45 ++++++++++++++++++++++++++++++++++++-----
>   1 file changed, 40 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index c6047935e825..47fc4d653961 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -221,6 +221,7 @@ struct nvme_iod {
>   	int npages;		/* In the PRP list. 0 means small pool in use */
>   	int nents;		/* Used in scatterlist */
>   	dma_addr_t first_dma;
> +	unsigned int dma_len;	/* length of single DMA segment mapping */
>   	dma_addr_t meta_dma;
>   	struct scatterlist *sg;
>   };
> @@ -576,13 +577,18 @@ static void nvme_unmap_data(struct nvme_dev *dev, struct request *req)
>   	dma_addr_t dma_addr = iod->first_dma, next_dma_addr;
>   	int i;
>   
> -	if (iod->nents) {
> -		/* P2PDMA requests do not need to be unmapped */
> -		if (!is_pci_p2pdma_page(sg_page(iod->sg)))
> -			dma_unmap_sg(dev->dev, iod->sg, iod->nents, dma_dir);
> -
> +	if (iod->dma_len) {
> +		dma_unmap_page(dev->dev, dma_addr, iod->dma_len, dma_dir);
> +		return;
>   	}
>   
> +	WARN_ON_ONCE(!iod->nents);
> +
> +	/* P2PDMA requests do not need to be unmapped */
> +	if (!is_pci_p2pdma_page(sg_page(iod->sg)))
> +		dma_unmap_sg(dev->dev, iod->sg, iod->nents, rq_dma_dir(req));
> +
> +
>   	if (iod->npages == 0)
>   		dma_pool_free(dev->prp_small_pool, nvme_pci_iod_list(req)[0],
>   			dma_addr);
> @@ -795,6 +801,24 @@ static blk_status_t nvme_pci_setup_sgls(struct nvme_dev *dev,
>   	return BLK_STS_OK;
>   }
>   
> +static blk_status_t nvme_setup_prp_simple(struct nvme_dev *dev,
> +		struct request *req, struct nvme_rw_command *cmnd,
> +		struct bio_vec *bv)
> +{
> +	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
> +	unsigned int first_prp_len = dev->ctrl.page_size - bv->bv_offset;
> +
> +	iod->first_dma = dma_map_bvec(dev->dev, bv, rq_dma_dir(req), 0);
> +	if (dma_mapping_error(dev->dev, iod->first_dma))
> +		return BLK_STS_RESOURCE;
> +	iod->dma_len = bv->bv_len;
> +
> +	cmnd->dptr.prp1 = cpu_to_le64(iod->first_dma);
> +	if (bv->bv_len > first_prp_len)
> +		cmnd->dptr.prp2 = cpu_to_le64(iod->first_dma + first_prp_len);
> +	return 0;
> +}
> +
>   static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
>   		struct nvme_command *cmnd)
>   {
> @@ -805,6 +829,17 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
>   	blk_status_t ret = BLK_STS_IOERR;
>   	int nr_mapped;
>   
> +	if (blk_rq_nr_phys_segments(req) == 1) {
> +		struct bio_vec bv = req_bvec(req);
> +
> +		if (!is_pci_p2pdma_page(bv.bv_page)) {
> +			if (bv.bv_offset + bv.bv_len <= dev->ctrl.page_size * 2)
> +				return nvme_setup_prp_simple(dev, req,
> +							     &cmnd->rw, &bv);
> +		}
> +	}
> +
> +	iod->dma_len = 0;
>   	iod->sg = mempool_alloc(dev->iod_mempool, GFP_ATOMIC);
>   	if (!iod->sg)
>   		return BLK_STS_RESOURCE;
> 
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>

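The two-page bound and the PRP1/PRP2 split in the patch above come down to simple offset arithmetic.  The following standalone sketch illustrates it; the fixed 4 KiB page size and both function names are simplified stand-ins for the kernel's `dev->ctrl.page_size` and driver code, not the real implementation:

```c
#include <assert.h>
#include <stdbool.h>

/* A fixed 4 KiB page standing in for dev->ctrl.page_size. */
#define CTRL_PAGE_SIZE 4096u

/* Mirrors the gate in nvme_map_data: the simple PRP path applies only
 * when the single segment fits in at most two PRP entries, i.e. within
 * two controller pages counted from the segment's offset. */
static bool fits_simple_prp(unsigned int bv_offset, unsigned int bv_len)
{
	return bv_offset + bv_len <= CTRL_PAGE_SIZE * 2;
}

/* Mirrors nvme_setup_prp_simple's arithmetic: PRP1 covers from the
 * offset to the end of the first page; PRP2 is only needed when the
 * segment spills into a second page.  Returns the number of PRP
 * entries used (1 or 2). */
static int prp_entries_needed(unsigned int bv_offset, unsigned int bv_len)
{
	unsigned int first_prp_len = CTRL_PAGE_SIZE - bv_offset;

	return bv_len > first_prp_len ? 2 : 1;
}
```

Note that a page-aligned 4 KiB segment needs only PRP1, while the same length at a non-zero offset spills into a second page and needs PRP2 as well.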

* Re: [PATCH 14/15] nvme-pci: optimize mapping single segment requests using SGLs
  2019-03-21 23:10 ` [PATCH 14/15] nvme-pci: optimize mapping single segment requests using SGLs Christoph Hellwig
@ 2019-03-25  5:39   ` Chaitanya Kulkarni
  2019-04-30 14:17   ` Klaus Birkelund
  1 sibling, 0 replies; 46+ messages in thread
From: Chaitanya Kulkarni @ 2019-03-25  5:39 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe, Keith Busch, Sagi Grimberg
  Cc: linux-nvme, linux-block

On 3/21/19 4:12 PM, Christoph Hellwig wrote:
> If the controller supports SGLs we can take another short cut for single
> segment request, given that we can always map those without another
> indirection structure, and thus don't need to create a scatterlist
> structure.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   drivers/nvme/host/pci.c | 22 ++++++++++++++++++++++
>   1 file changed, 22 insertions(+)
> 
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index 47fc4d653961..38869f59c296 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -819,6 +819,23 @@ static blk_status_t nvme_setup_prp_simple(struct nvme_dev *dev,
>   	return 0;
>   }
>   
> +static blk_status_t nvme_setup_sgl_simple(struct nvme_dev *dev,
> +		struct request *req, struct nvme_rw_command *cmnd,
> +		struct bio_vec *bv)
> +{
> +	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
> +
> +	iod->first_dma = dma_map_bvec(dev->dev, bv, rq_dma_dir(req), 0);
> +	if (dma_mapping_error(dev->dev, iod->first_dma))
> +		return BLK_STS_RESOURCE;
> +	iod->dma_len = bv->bv_len;
> +
> +	cmnd->dptr.sgl.addr = cpu_to_le64(iod->first_dma);
> +	cmnd->dptr.sgl.length = cpu_to_le32(iod->dma_len);
> +	cmnd->dptr.sgl.type = NVME_SGL_FMT_DATA_DESC << 4;
> +	return 0;
> +}
> +
>   static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
>   		struct nvme_command *cmnd)
>   {
> @@ -836,6 +853,11 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
>   			if (bv.bv_offset + bv.bv_len <= dev->ctrl.page_size * 2)
>   				return nvme_setup_prp_simple(dev, req,
>   							     &cmnd->rw, &bv);
> +
> +			if (iod->nvmeq->qid &&
> +			    dev->ctrl.sgls & ((1 << 0) | (1 << 1)))
> +				return nvme_setup_sgl_simple(dev, req,
> +							     &cmnd->rw, &bv);
>   		}
>   	}
>   
> 

Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>

* Re: [PATCH 15/15] nvme-pci: tidy up nvme_map_data
  2019-03-21 23:10 ` [PATCH 15/15] nvme-pci: tidy up nvme_map_data Christoph Hellwig
@ 2019-03-25  5:40   ` Chaitanya Kulkarni
  0 siblings, 0 replies; 46+ messages in thread
From: Chaitanya Kulkarni @ 2019-03-25  5:40 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe, Keith Busch, Sagi Grimberg
  Cc: linux-nvme, linux-block

On 3/21/19 4:12 PM, Christoph Hellwig wrote:
> Remove two pointless local variables, remove ret assignment that is
> never used, move the use_sgl initialization closer to where it is used.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   drivers/nvme/host/pci.c | 17 +++++------------
>   1 file changed, 5 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index 38869f59c296..9ec0704d5f78 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -840,10 +840,7 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
>   		struct nvme_command *cmnd)
>   {
>   	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
> -	struct request_queue *q = req->q;
> -	enum dma_data_direction dma_dir = rq_data_dir(req) ?
> -			DMA_TO_DEVICE : DMA_FROM_DEVICE;
> -	blk_status_t ret = BLK_STS_IOERR;
> +	blk_status_t ret = BLK_STS_RESOURCE;
>   	int nr_mapped;
>   
>   	if (blk_rq_nr_phys_segments(req) == 1) {
> @@ -865,25 +862,21 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
>   	iod->sg = mempool_alloc(dev->iod_mempool, GFP_ATOMIC);
>   	if (!iod->sg)
>   		return BLK_STS_RESOURCE;
> -
> -	iod->use_sgl = nvme_pci_use_sgls(dev, req);
> -
>   	sg_init_table(iod->sg, blk_rq_nr_phys_segments(req));
> -	iod->nents = blk_rq_map_sg(q, req, iod->sg);
> +	iod->nents = blk_rq_map_sg(req->q, req, iod->sg);
>   	if (!iod->nents)
>   		goto out;
>   
> -	ret = BLK_STS_RESOURCE;
> -
>   	if (is_pci_p2pdma_page(sg_page(iod->sg)))
>   		nr_mapped = pci_p2pdma_map_sg(dev->dev, iod->sg, iod->nents,
> -					  dma_dir);
> +					      rq_dma_dir(req));
>   	else
>   		nr_mapped = dma_map_sg_attrs(dev->dev, iod->sg, iod->nents,
> -					     dma_dir,  DMA_ATTR_NO_WARN);
> +					     rq_dma_dir(req), DMA_ATTR_NO_WARN);
>   	if (!nr_mapped)
>   		goto out;
>   
> +	iod->use_sgl = nvme_pci_use_sgls(dev, req);
>   	if (iod->use_sgl)
>   		ret = nvme_pci_setup_sgls(dev, req, &cmnd->rw, nr_mapped);
>   	else
>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>

* Re: [PATCH 01/15] block: add a req_bvec helper
  2019-03-25  5:07   ` Chaitanya Kulkarni
@ 2019-03-27 14:16     ` Christoph Hellwig
  0 siblings, 0 replies; 46+ messages in thread
From: Christoph Hellwig @ 2019-03-27 14:16 UTC (permalink / raw)
  To: Chaitanya Kulkarni
  Cc: Christoph Hellwig, Jens Axboe, Keith Busch, Sagi Grimberg,
	linux-nvme, linux-block

On Mon, Mar 25, 2019 at 05:07:44AM +0000, Chaitanya Kulkarni wrote:
> > +static inline struct bio_vec req_bvec(struct request *rq)
> > +{
> > +	if (rq->rq_flags & RQF_SPECIAL_PAYLOAD)
> > +		return rq->special_vec;
> Quick question here, mostly for my understanding: do we also have to
> check nr_phys_segments here for operations such as write-zeroes, or may
> this never get called in a write-zeroes context, or is it not applicable?

We assume the caller checks that we have nr_phys_segments.  I'll
add a comment to document that.

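The caller contract discussed above can be sketched in userspace C.  All types, fields, and names below are simplified stand-ins for the kernel's structures, not the real implementation:

```c
#include <assert.h>

/* Toy analogue of struct bio_vec. */
struct bio_vec {
	unsigned int bv_len;
	unsigned int bv_offset;
};

/* Toy analogue of struct request, keeping only what the helper needs. */
struct toy_request {
	unsigned int nr_phys_segments;	/* caller must check this is != 0 */
	int special_payload;		/* RQF_SPECIAL_PAYLOAD analogue */
	struct bio_vec special_vec;
	struct bio_vec first_bvec;	/* stands in for the bio's iterator */
};

/* Like req_bvec(): only valid when the request carries data.  A request
 * such as Write Zeroes has nr_phys_segments == 0, so callers must not
 * reach this helper for it -- that is the contract being documented. */
static struct bio_vec toy_req_bvec(const struct toy_request *rq)
{
	assert(rq->nr_phys_segments != 0);
	if (rq->special_payload)
		return rq->special_vec;
	return rq->first_bvec;
}
```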

* Re: [PATCH 02/15] block: add a rq_integrity_vec helper
  2019-03-25  5:10   ` Chaitanya Kulkarni
@ 2019-03-27 14:19     ` Christoph Hellwig
  0 siblings, 0 replies; 46+ messages in thread
From: Christoph Hellwig @ 2019-03-27 14:19 UTC (permalink / raw)
  To: Chaitanya Kulkarni
  Cc: Christoph Hellwig, Jens Axboe, Keith Busch, Sagi Grimberg,
	linux-nvme, linux-block

On Mon, Mar 25, 2019 at 05:10:20AM +0000, Chaitanya Kulkarni wrote:
> > +/*
> > + * Return the first bvec that contains integrity data.  In general only
> > + * drivers that are limited to a single integrity segment should use this
> > + * helper.
> > + */
> > +static inline struct bio_vec *rq_integrity_vec(struct request *rq)
> > +{
> Regarding the comment, should we add a check here to make sure the
> underlying driver is limited to a single integrity segment?

I've added a WARN_ON_ONCE.

* Re: [PATCH 03/15] block: add a rq_dma_dir helper
  2019-03-22 13:06   ` Johannes Thumshirn
@ 2019-03-27 14:20     ` Christoph Hellwig
  2019-03-28 10:26       ` Johannes Thumshirn
  0 siblings, 1 reply; 46+ messages in thread
From: Christoph Hellwig @ 2019-03-27 14:20 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: Christoph Hellwig, Jens Axboe, Keith Busch, Sagi Grimberg,
	linux-nvme, linux-block

On Fri, Mar 22, 2019 at 02:06:43PM +0100, Johannes Thumshirn wrote:
> On 22/03/2019 00:10, Christoph Hellwig wrote:
> > In a lot of places we want to know the DMA direction for a given
> > struct request.  Add a little helper to make it a little easier.
> 
> You introduce the helper here but only use it for the first time in 8/15.
> 
> Could you convert the possible users to use the new helper in this patch
> as well?

I'd rather not throw that into an nvme series.  Those are easy cleanups
for later.  Maybe we can even encourage people who would otherwise
do whitespace cleanups or incorrect refcount_t conversions to take care
of it :)

* Re: [PATCH 06/15] nvme-pci: remove nvme_init_iod
  2019-03-25  5:19   ` Chaitanya Kulkarni
@ 2019-03-27 14:21     ` Christoph Hellwig
  0 siblings, 0 replies; 46+ messages in thread
From: Christoph Hellwig @ 2019-03-27 14:21 UTC (permalink / raw)
  To: Chaitanya Kulkarni
  Cc: Christoph Hellwig, Jens Axboe, Keith Busch, Sagi Grimberg,
	linux-nvme, linux-block

On Mon, Mar 25, 2019 at 05:19:34AM +0000, Chaitanya Kulkarni wrote:
> > @@ -913,9 +902,14 @@ static blk_status_t nvme_queue_rq(struct blk_mq_hw_ctx *hctx,
> >   	struct nvme_queue *nvmeq = hctx->driver_data;
> >   	struct nvme_dev *dev = nvmeq->dev;
> >   	struct request *req = bd->rq;
> > +	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
> >   	struct nvme_command cmnd;
> >   	blk_status_t ret;
> >   
> > +	iod->aborted = 0;
> > +	iod->npages = -1;
> > +	iod->nents = 0;
> > +
> Maybe add an inline helper for the default IOD initialization above?

Why?  It is a few lines of code and we only ever do it here.

* Re: [RFC] optimize nvme single segment I/O
  2019-03-22 15:44 ` [RFC] optimize nvme single segment I/O Jens Axboe
@ 2019-03-27 14:24   ` Christoph Hellwig
  0 siblings, 0 replies; 46+ messages in thread
From: Christoph Hellwig @ 2019-03-27 14:24 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Christoph Hellwig, Keith Busch, Sagi Grimberg, linux-nvme, linux-block

FYI, I've pulled the series including the block helpers into nvme-5.2,
with a few tiny changes for the review comments from Chaitanya.

* Re: [PATCH 03/15] block: add a rq_dma_dir helper
  2019-03-27 14:20     ` Christoph Hellwig
@ 2019-03-28 10:26       ` Johannes Thumshirn
  0 siblings, 0 replies; 46+ messages in thread
From: Johannes Thumshirn @ 2019-03-28 10:26 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Keith Busch, Sagi Grimberg, linux-nvme, Jens Axboe, linux-block

On 27/03/2019 15:20, Christoph Hellwig wrote:
[...]
> 
> I'd rather not throw that into an nvme series.  Those are easy cleanups
> for later.  Maybe we can even encourage people who would otherwise
> do whitespace cleanups or incorrect refcount_t conversions to take care
> of it :)

Fair enough.

Byte,
	Johannes
-- 
Johannes Thumshirn                            SUSE Labs Filesystems
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

* Re: [PATCH 14/15] nvme-pci: optimize mapping single segment requests using SGLs
  2019-03-21 23:10 ` [PATCH 14/15] nvme-pci: optimize mapping single segment requests using SGLs Christoph Hellwig
  2019-03-25  5:39   ` Chaitanya Kulkarni
@ 2019-04-30 14:17   ` Klaus Birkelund
  2019-04-30 14:32     ` Christoph Hellwig
  1 sibling, 1 reply; 46+ messages in thread
From: Klaus Birkelund @ 2019-04-30 14:17 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Keith Busch, Sagi Grimberg, linux-nvme, linux-block

On Thu, Mar 21, 2019 at 04:10:36PM -0700, Christoph Hellwig wrote:
> If the controller supports SGLs we can take another short cut for single
> segment request, given that we can always map those without another
> indirection structure, and thus don't need to create a scatterlist
> structure.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  drivers/nvme/host/pci.c | 22 ++++++++++++++++++++++
>  1 file changed, 22 insertions(+)
> 
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index 47fc4d653961..38869f59c296 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -819,6 +819,23 @@ static blk_status_t nvme_setup_prp_simple(struct nvme_dev *dev,
>  	return 0;
>  }
>  
> +static blk_status_t nvme_setup_sgl_simple(struct nvme_dev *dev,
> +		struct request *req, struct nvme_rw_command *cmnd,
> +		struct bio_vec *bv)
> +{
> +	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
> +
> +	iod->first_dma = dma_map_bvec(dev->dev, bv, rq_dma_dir(req), 0);
> +	if (dma_mapping_error(dev->dev, iod->first_dma))
> +		return BLK_STS_RESOURCE;
> +	iod->dma_len = bv->bv_len;
> +
> +	cmnd->dptr.sgl.addr = cpu_to_le64(iod->first_dma);
> +	cmnd->dptr.sgl.length = cpu_to_le32(iod->dma_len);
> +	cmnd->dptr.sgl.type = NVME_SGL_FMT_DATA_DESC << 4;
> +	return 0;
> +}
> +
>  static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
>  		struct nvme_command *cmnd)
>  {
> @@ -836,6 +853,11 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
>  			if (bv.bv_offset + bv.bv_len <= dev->ctrl.page_size * 2)
>  				return nvme_setup_prp_simple(dev, req,
>  							     &cmnd->rw, &bv);
> +
> +			if (iod->nvmeq->qid &&
> +			    dev->ctrl.sgls & ((1 << 0) | (1 << 1)))
> +				return nvme_setup_sgl_simple(dev, req,
> +							     &cmnd->rw, &bv);
>  		}
>  	}
>  
> -- 
> 2.20.1
> 

Hi Christoph,

nvme_setup_sgl_simple does not set the PSDT field when it bypasses
nvme_pci_setup_sgls, causing controllers to assume PRPs.

Adding `cmnd->flags = NVME_CMD_SGL_METABUF` in nvme_setup_sgl_simple
fixes the issue.


-- 
Klaus

* Re: [PATCH 14/15] nvme-pci: optimize mapping single segment requests using SGLs
  2019-04-30 14:17   ` Klaus Birkelund
@ 2019-04-30 14:32     ` Christoph Hellwig
  0 siblings, 0 replies; 46+ messages in thread
From: Christoph Hellwig @ 2019-04-30 14:32 UTC (permalink / raw)
  To: Klaus Birkelund, Jens Axboe, Keith Busch, Sagi Grimberg,
	linux-nvme, linux-block

On Tue, Apr 30, 2019 at 04:17:22PM +0200, Klaus Birkelund wrote:
> nvme_setup_sgl_simple does not set the PSDT field, so bypassing
> nvme_pci_setup_sgls causing controllers to assume PRP.
> 
> Adding `cmnd->flags = NVME_CMD_SGL_METABUF` in nvme_setup_sgl_simple
> fixes the issue.

Indeed.  Can you send the patch?


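A userspace sketch of the fix Klaus describes: build the inline data descriptor *and* hand back the PSDT flags value for cmnd->flags.  The constant values and the 16-byte descriptor layout mirror the kernel's nvme.h definitions but are reproduced here as assumptions, and the helper name is hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values mirroring include/linux/nvme.h. */
#define NVME_CMD_SGL_METABUF   (1 << 6)	/* PSDT: SGL used for data */
#define NVME_SGL_FMT_DATA_DESC 0x00

/* NVMe SGL data block descriptor: 16 bytes, descriptor type in the
 * upper nibble of the last byte. */
struct sgl_data_desc {
	uint64_t addr;
	uint32_t length;
	uint8_t  rsvd[3];
	uint8_t  type;
};

/* Fill the inline descriptor and return the flags value the caller
 * must store in cmnd->flags, so the PSDT field tells the controller
 * the data pointer holds an SGL rather than PRPs -- the missing step
 * that caused the bug discussed above. */
static uint8_t setup_sgl_simple(struct sgl_data_desc *sgl,
				uint64_t dma_addr, uint32_t dma_len)
{
	sgl->addr = dma_addr;
	sgl->length = dma_len;
	sgl->type = NVME_SGL_FMT_DATA_DESC << 4;
	return NVME_CMD_SGL_METABUF;
}
```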

* Re: [PATCH 10/15] nvme-pci: do not build a scatterlist to map metadata
  2019-03-21 23:10 ` [PATCH 10/15] nvme-pci: do not build a scatterlist to map metadata Christoph Hellwig
  2019-03-25  5:27   ` Chaitanya Kulkarni
@ 2019-08-28  9:20   ` Ming Lei
  2019-09-12  1:02     ` Ming Lei
  1 sibling, 1 reply; 46+ messages in thread
From: Ming Lei @ 2019-08-28  9:20 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Keith Busch, Sagi Grimberg, linux-nvme, linux-block,
	Gopal Tiwari, dmilburn

On Thu, Mar 21, 2019 at 04:10:32PM -0700, Christoph Hellwig wrote:
> We always have exactly one segment, so we can simply call dma_map_bvec.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  drivers/nvme/host/pci.c | 23 ++++++++++-------------
>  1 file changed, 10 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index bc4ee869fe82..a7dad24e0406 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -221,7 +221,7 @@ struct nvme_iod {
>  	int npages;		/* In the PRP list. 0 means small pool in use */
>  	int nents;		/* Used in scatterlist */
>  	dma_addr_t first_dma;
> -	struct scatterlist meta_sg; /* metadata requires single contiguous buffer */
> +	dma_addr_t meta_dma;
>  	struct scatterlist *sg;
>  	struct scatterlist inline_sg[0];
>  };
> @@ -592,13 +592,16 @@ static void nvme_unmap_data(struct nvme_dev *dev, struct request *req)
>  	dma_addr_t dma_addr = iod->first_dma, next_dma_addr;
>  	int i;
>  
> +	if (blk_integrity_rq(req)) {
> +		dma_unmap_page(dev->dev, iod->meta_dma,
> +				rq_integrity_vec(req)->bv_len, dma_dir);
> +	}
> +
>  	if (iod->nents) {
>  		/* P2PDMA requests do not need to be unmapped */
>  		if (!is_pci_p2pdma_page(sg_page(iod->sg)))
>  			dma_unmap_sg(dev->dev, iod->sg, iod->nents, dma_dir);
>  
> -		if (blk_integrity_rq(req))
> -			dma_unmap_sg(dev->dev, &iod->meta_sg, 1, dma_dir);
>  	}
>  
>  	if (iod->npages == 0)
> @@ -861,17 +864,11 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
>  
>  	ret = BLK_STS_IOERR;
>  	if (blk_integrity_rq(req)) {
> -		if (blk_rq_count_integrity_sg(q, req->bio) != 1)
> -			goto out;
> -
> -		sg_init_table(&iod->meta_sg, 1);
> -		if (blk_rq_map_integrity_sg(q, req->bio, &iod->meta_sg) != 1)
> -			goto out;
> -
> -		if (!dma_map_sg(dev->dev, &iod->meta_sg, 1, dma_dir))
> +		iod->meta_dma = dma_map_bvec(dev->dev, rq_integrity_vec(req),
> +				dma_dir, 0);

Hi Christoph,

When one bio is big enough, the generated integrity data can cross more
than one page even though the data itself is still in a single segment.

However, we don't convert to multi-page bvecs in bio_integrity_prep(),
and each page may consume one bvec, so is it possible for this patch to
cause issues with NVMe's protection information?  This patch assumes
that there is only one bvec for the integrity data.

BTW, I haven't seen such a report; this is just a theoretical concern.

thanks,
Ming

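The concern above can be quantified with a back-of-envelope sketch.  The geometry below (8 bytes of protection information per 512-byte sector, 4 KiB pages) is the common DIF case, not the only one; real values depend on the namespace format:

```c
#include <assert.h>

/* Assumed common DIF geometry. */
#define SECTOR_SIZE   512u
#define PI_PER_SECTOR 8u
#define PAGE_SIZE_4K  4096u

/* Bytes of protection information (PI) a single-segment I/O of
 * io_bytes generates. */
static unsigned int pi_bytes(unsigned int io_bytes)
{
	return io_bytes / SECTOR_SIZE * PI_PER_SECTOR;
}

/* The PI buffer stays within one 4 KiB page up to a 256 KiB I/O; a
 * larger single-segment I/O generates PI that can span pages, which is
 * exactly the case a single-bvec metadata mapping would have to cover. */
static int pi_fits_one_page(unsigned int io_bytes)
{
	return pi_bytes(io_bytes) <= PAGE_SIZE_4K;
}
```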

* Re: [PATCH 10/15] nvme-pci: do not build a scatterlist to map metadata
  2019-08-28  9:20   ` Ming Lei
@ 2019-09-12  1:02     ` Ming Lei
  2019-09-12  8:20       ` Christoph Hellwig
  0 siblings, 1 reply; 46+ messages in thread
From: Ming Lei @ 2019-09-12  1:02 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Keith Busch, Sagi Grimberg, linux-nvme, linux-block,
	Gopal Tiwari, dmilburn

On Wed, Aug 28, 2019 at 05:20:57PM +0800, Ming Lei wrote:
> On Thu, Mar 21, 2019 at 04:10:32PM -0700, Christoph Hellwig wrote:
> > We always have exactly one segment, so we can simply call dma_map_bvec.
> > 
> > Signed-off-by: Christoph Hellwig <hch@lst.de>
> > ---
> >  drivers/nvme/host/pci.c | 23 ++++++++++-------------
> >  1 file changed, 10 insertions(+), 13 deletions(-)
> > 
> > diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> > index bc4ee869fe82..a7dad24e0406 100644
> > --- a/drivers/nvme/host/pci.c
> > +++ b/drivers/nvme/host/pci.c
> > @@ -221,7 +221,7 @@ struct nvme_iod {
> >  	int npages;		/* In the PRP list. 0 means small pool in use */
> >  	int nents;		/* Used in scatterlist */
> >  	dma_addr_t first_dma;
> > -	struct scatterlist meta_sg; /* metadata requires single contiguous buffer */
> > +	dma_addr_t meta_dma;
> >  	struct scatterlist *sg;
> >  	struct scatterlist inline_sg[0];
> >  };
> > @@ -592,13 +592,16 @@ static void nvme_unmap_data(struct nvme_dev *dev, struct request *req)
> >  	dma_addr_t dma_addr = iod->first_dma, next_dma_addr;
> >  	int i;
> >  
> > +	if (blk_integrity_rq(req)) {
> > +		dma_unmap_page(dev->dev, iod->meta_dma,
> > +				rq_integrity_vec(req)->bv_len, dma_dir);
> > +	}
> > +
> >  	if (iod->nents) {
> >  		/* P2PDMA requests do not need to be unmapped */
> >  		if (!is_pci_p2pdma_page(sg_page(iod->sg)))
> >  			dma_unmap_sg(dev->dev, iod->sg, iod->nents, dma_dir);
> >  
> > -		if (blk_integrity_rq(req))
> > -			dma_unmap_sg(dev->dev, &iod->meta_sg, 1, dma_dir);
> >  	}
> >  
> >  	if (iod->npages == 0)
> > @@ -861,17 +864,11 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
> >  
> >  	ret = BLK_STS_IOERR;
> >  	if (blk_integrity_rq(req)) {
> > -		if (blk_rq_count_integrity_sg(q, req->bio) != 1)
> > -			goto out;
> > -
> > -		sg_init_table(&iod->meta_sg, 1);
> > -		if (blk_rq_map_integrity_sg(q, req->bio, &iod->meta_sg) != 1)
> > -			goto out;
> > -
> > -		if (!dma_map_sg(dev->dev, &iod->meta_sg, 1, dma_dir))
> > +		iod->meta_dma = dma_map_bvec(dev->dev, rq_integrity_vec(req),
> > +				dma_dir, 0);
> 
> Hi Christoph,
> 
> When one bio is big enough, the generated integrity data can cross
> more than one page even though the data is still in a single segment.
> 
> However, we don't convert to multi-page bvecs in bio_integrity_prep(),
> and each page may consume one bvec, so is it possible for this patch to
> cause issues with NVMe's protection information? This patch assumes
> that there is only one bvec for the integrity data.
> 
> BTW, I haven't seen such a report; this is just a theoretical concern.

Hello Christoph,

Gently ping...


Thanks,
Ming

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 10/15] nvme-pci: do not build a scatterlist to map metadata
  2019-09-12  1:02     ` Ming Lei
@ 2019-09-12  8:20       ` Christoph Hellwig
  0 siblings, 0 replies; 46+ messages in thread
From: Christoph Hellwig @ 2019-09-12  8:20 UTC (permalink / raw)
  To: Ming Lei
  Cc: Christoph Hellwig, Jens Axboe, Keith Busch, Sagi Grimberg,
	linux-nvme, linux-block, Gopal Tiwari, dmilburn

This is in my todo queue, which is big...

^ permalink raw reply	[flat|nested] 46+ messages in thread

end of thread, back to index

Thread overview: 46+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-03-21 23:10 [RFC] optimize nvme single segment I/O Christoph Hellwig
2019-03-21 23:10 ` [PATCH 01/15] block: add a req_bvec helper Christoph Hellwig
2019-03-25  5:07   ` Chaitanya Kulkarni
2019-03-27 14:16     ` Christoph Hellwig
2019-03-21 23:10 ` [PATCH 02/15] block: add a rq_integrity_vec helper Christoph Hellwig
2019-03-25  5:10   ` Chaitanya Kulkarni
2019-03-27 14:19     ` Christoph Hellwig
2019-03-21 23:10 ` [PATCH 03/15] block: add a rq_dma_dir helper Christoph Hellwig
2019-03-22 13:06   ` Johannes Thumshirn
2019-03-27 14:20     ` Christoph Hellwig
2019-03-28 10:26       ` Johannes Thumshirn
2019-03-25  5:11   ` Chaitanya Kulkarni
2019-03-21 23:10 ` [PATCH 04/15] block: add dma_map_bvec helper Christoph Hellwig
2019-03-25  5:13   ` Chaitanya Kulkarni
2019-03-21 23:10 ` [PATCH 05/15] nvme-pci: remove the unused iod->length field Christoph Hellwig
2019-03-25  5:14   ` Chaitanya Kulkarni
2019-03-21 23:10 ` [PATCH 06/15] nvme-pci: remove nvme_init_iod Christoph Hellwig
2019-03-25  5:19   ` Chaitanya Kulkarni
2019-03-27 14:21     ` Christoph Hellwig
2019-03-21 23:10 ` [PATCH 07/15] nvme-pci: move the call to nvme_cleanup_cmd out of nvme_unmap_data Christoph Hellwig
2019-03-25  5:21   ` Chaitanya Kulkarni
2019-03-21 23:10 ` [PATCH 08/15] nvme-pci: merge nvme_free_iod into nvme_unmap_data Christoph Hellwig
2019-03-25  5:22   ` Chaitanya Kulkarni
2019-03-21 23:10 ` [PATCH 09/15] nvme-pci: only call nvme_unmap_data for requests transferring data Christoph Hellwig
2019-03-25  5:23   ` Chaitanya Kulkarni
2019-03-21 23:10 ` [PATCH 10/15] nvme-pci: do not build a scatterlist to map metadata Christoph Hellwig
2019-03-25  5:27   ` Chaitanya Kulkarni
2019-08-28  9:20   ` Ming Lei
2019-09-12  1:02     ` Ming Lei
2019-09-12  8:20       ` Christoph Hellwig
2019-03-21 23:10 ` [PATCH 11/15] nvme-pci: split metadata handling from nvme_map_data / nvme_unmap_data Christoph Hellwig
2019-03-25  5:29   ` Chaitanya Kulkarni
2019-03-21 23:10 ` [PATCH 12/15] nvme-pci: remove the inline scatterlist optimization Christoph Hellwig
2019-03-25  5:30   ` Chaitanya Kulkarni
2019-03-21 23:10 ` [PATCH 13/15] nvme-pci: optimize mapping of small single segment requests Christoph Hellwig
2019-03-25  5:36   ` Chaitanya Kulkarni
2019-03-21 23:10 ` [PATCH 14/15] nvme-pci: optimize mapping single segment requests using SGLs Christoph Hellwig
2019-03-25  5:39   ` Chaitanya Kulkarni
2019-04-30 14:17   ` Klaus Birkelund
2019-04-30 14:32     ` Christoph Hellwig
2019-03-21 23:10 ` [PATCH 15/15] nvme-pci: tidy up nvme_map_data Christoph Hellwig
2019-03-25  5:40   ` Chaitanya Kulkarni
2019-03-22 15:44 ` [RFC] optimize nvme single segment I/O Jens Axboe
2019-03-27 14:24   ` Christoph Hellwig
2019-03-22 17:37 ` Keith Busch
2019-03-22 18:55 ` Sagi Grimberg
