Linux-NVME Archive on lore.kernel.org
* [RFC] optimize nvme single segment I/O
@ 2019-03-21 23:10 hch
  2019-03-21 23:10 ` [PATCH 01/15] block: add a req_bvec helper hch
                   ` (17 more replies)
  0 siblings, 18 replies; 45+ messages in thread
From: hch @ 2019-03-21 23:10 UTC (permalink / raw)


Hi all,

with all the discussion on small I/O performance lately I thought
it was time to dust off my old idea to optimize this path a bit
by avoiding building a scatterlist.  I've only done very basic
testing because I've been a bit busy, but I thought it might be
worthwhile to get it out for feedback.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH 01/15] block: add a req_bvec helper
  2019-03-21 23:10 [RFC] optimize nvme single segment I/O hch
@ 2019-03-21 23:10 ` hch
  2019-03-25  5:07   ` Chaitanya.Kulkarni
  2019-03-21 23:10 ` [PATCH 02/15] block: add a rq_integrity_vec helper hch
                   ` (16 subsequent siblings)
  17 siblings, 1 reply; 45+ messages in thread
From: hch @ 2019-03-21 23:10 UTC (permalink / raw)


Return the currently active bvec segment, potentially spanning multiple
pages.
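
For illustration, the helper's selection logic can be modelled in
userspace with stub types (the struct layouts below are simplified
stand-ins for the kernel ones, and the bvec iterator handling is reduced
to taking the first segment):

```c
#include <assert.h>
#include <stddef.h>

struct bio_vec {
	void *bv_page;
	unsigned int bv_len;
	unsigned int bv_offset;
};

#define RQF_SPECIAL_PAYLOAD (1u << 0)

struct bio {
	struct bio_vec *bi_io_vec;
};

struct request {
	unsigned int rq_flags;
	struct bio_vec special_vec;
	struct bio *bio;
};

/* Special payloads (e.g. discard) carry their own bvec in the request;
 * everything else takes the bio's current segment (here: the first one,
 * since this sketch drops the bvec iterator state). */
static struct bio_vec req_bvec_sketch(struct request *rq)
{
	if (rq->rq_flags & RQF_SPECIAL_PAYLOAD)
		return rq->special_vec;
	return rq->bio->bi_io_vec[0];
}
```

The real helper uses mp_bvec_iter_bvec() so the returned segment follows
bi_iter and may span multiple pages; the branch structure is the same.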

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 include/linux/blkdev.h | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 0de92b29f589..255e20313cde 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -929,6 +929,13 @@ static inline unsigned int blk_rq_payload_bytes(struct request *rq)
 	return blk_rq_bytes(rq);
 }
 
+static inline struct bio_vec req_bvec(struct request *rq)
+{
+	if (rq->rq_flags & RQF_SPECIAL_PAYLOAD)
+		return rq->special_vec;
+	return mp_bvec_iter_bvec(rq->bio->bi_io_vec, rq->bio->bi_iter);
+}
+
 static inline unsigned int blk_queue_get_max_sectors(struct request_queue *q,
 						     int op)
 {
-- 
2.20.1


* [PATCH 02/15] block: add a rq_integrity_vec helper
  2019-03-21 23:10 [RFC] optimize nvme single segment I/O hch
  2019-03-21 23:10 ` [PATCH 01/15] block: add a req_bvec helper hch
@ 2019-03-21 23:10 ` hch
  2019-03-25  5:10   ` Chaitanya.Kulkarni
  2019-03-21 23:10 ` [PATCH 03/15] block: add a rq_dma_dir helper hch
                   ` (15 subsequent siblings)
  17 siblings, 1 reply; 45+ messages in thread
From: hch @ 2019-03-21 23:10 UTC (permalink / raw)


This provides a nice little shortcut to get the integrity data for
drivers like NVMe that only support a single integrity segment.
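
The patch also adds a NULL stub for the !CONFIG_BLK_DEV_INTEGRITY case,
a common kernel idiom that keeps callers free of #ifdefs.  A userspace
sketch of the pattern, with stub types in place of the real structures
(CONFIG_BLK_DEV_INTEGRITY is just a plain macro here):

```c
#include <assert.h>
#include <stddef.h>

struct bio_vec { unsigned int bv_len; };
struct bio_integrity_payload { struct bio_vec bip_vec[1]; };
struct bio { struct bio_integrity_payload *bi_integrity; };
struct request { struct bio *bio; };

#define CONFIG_BLK_DEV_INTEGRITY 1

#ifdef CONFIG_BLK_DEV_INTEGRITY
/* Return the first (and, for NVMe, only) integrity bvec. */
static struct bio_vec *rq_integrity_vec(struct request *rq)
{
	return rq->bio->bi_integrity->bip_vec;
}
#else
/* Feature configured out: collapse to a constant so callers still link. */
static struct bio_vec *rq_integrity_vec(struct request *rq)
{
	return NULL;
}
#endif
```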

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 include/linux/blkdev.h | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 255e20313cde..f9a072610d28 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1552,6 +1552,16 @@ static inline unsigned int bio_integrity_bytes(struct blk_integrity *bi,
 	return bio_integrity_intervals(bi, sectors) * bi->tuple_size;
 }
 
+/*
+ * Return the first bvec that contains integrity data.  In general only
+ * drivers that are limited to a single integrity segment should use this
+ * helper.
+ */
+static inline struct bio_vec *rq_integrity_vec(struct request *rq)
+{
+	return rq->bio->bi_integrity->bip_vec;
+}
+
 #else /* CONFIG_BLK_DEV_INTEGRITY */
 
 struct bio;
@@ -1626,6 +1636,11 @@ static inline unsigned int bio_integrity_bytes(struct blk_integrity *bi,
 	return 0;
 }
 
+static inline struct bio_vec *rq_integrity_vec(struct request *rq)
+{
+	return NULL;
+}
+
 #endif /* CONFIG_BLK_DEV_INTEGRITY */
 
 struct block_device_operations {
-- 
2.20.1


* [PATCH 03/15] block: add a rq_dma_dir helper
  2019-03-21 23:10 [RFC] optimize nvme single segment I/O hch
  2019-03-21 23:10 ` [PATCH 01/15] block: add a req_bvec helper hch
  2019-03-21 23:10 ` [PATCH 02/15] block: add a rq_integrity_vec helper hch
@ 2019-03-21 23:10 ` hch
  2019-03-22 13:06   ` jthumshirn
  2019-03-25  5:11   ` Chaitanya.Kulkarni
  2019-03-21 23:10 ` [PATCH 04/15] block: add dma_map_bvec helper hch
                   ` (14 subsequent siblings)
  17 siblings, 2 replies; 45+ messages in thread
From: hch @ 2019-03-21 23:10 UTC (permalink / raw)


In a lot of places we want to know the DMA direction for a given
struct request.  Add a little helper to make that easier.
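
A minimal userspace model of the mapping (the constants and op encoding
below are stand-ins; in the kernel, op_is_write() tests the low bit of
the REQ_OP value in the same way):

```c
#include <assert.h>

enum dma_data_direction { DMA_TO_DEVICE = 1, DMA_FROM_DEVICE = 2 };

#define REQ_OP_READ  0
#define REQ_OP_WRITE 1

struct request { unsigned int op; };

/* Write-type ops have the low bit set in the kernel's op encoding. */
static inline int op_is_write(unsigned int op) { return op & 1; }
#define req_op(rq) ((rq)->op)

/* Writes move data to the device, reads move data from it. */
#define rq_dma_dir(rq) \
	(op_is_write(req_op(rq)) ? DMA_TO_DEVICE : DMA_FROM_DEVICE)
```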

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 include/linux/blkdev.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index f9a072610d28..5279104527ad 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -638,6 +638,9 @@ static inline bool blk_account_rq(struct request *rq)
 
 #define rq_data_dir(rq)		(op_is_write(req_op(rq)) ? WRITE : READ)
 
+#define rq_dma_dir(rq) \
+	(op_is_write(req_op(rq)) ? DMA_TO_DEVICE : DMA_FROM_DEVICE)
+
 static inline bool queue_is_mq(struct request_queue *q)
 {
 	return q->mq_ops;
-- 
2.20.1


* [PATCH 04/15] block: add dma_map_bvec helper
  2019-03-21 23:10 [RFC] optimize nvme single segment I/O hch
                   ` (2 preceding siblings ...)
  2019-03-21 23:10 ` [PATCH 03/15] block: add a rq_dma_dir helper hch
@ 2019-03-21 23:10 ` hch
  2019-03-25  5:13   ` Chaitanya.Kulkarni
  2019-03-21 23:10 ` [PATCH 05/15] nvme-pci: remove the unused iod->length field hch
                   ` (13 subsequent siblings)
  17 siblings, 1 reply; 45+ messages in thread
From: hch @ 2019-03-21 23:10 UTC (permalink / raw)


Provide a nice little shortcut for mapping a single bvec.
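
To see what the macro forwards, here is a userspace sketch where a stub
dma_map_page_attrs() (an assumption standing in for the real DMA API
call) records its arguments and returns a fake bus address:

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long long dma_addr_t;

struct bio_vec {
	void *bv_page;
	unsigned int bv_len;
	unsigned int bv_offset;
};

/* Stand-in for the DMA API entry point; records what it was passed so
 * the macro's argument forwarding can be observed. */
static size_t last_offset, last_len;
static dma_addr_t dma_map_page_attrs(void *dev, void *page, size_t offset,
				     size_t size, int dir, unsigned long attrs)
{
	(void)dev; (void)page; (void)dir; (void)attrs;
	last_offset = offset;
	last_len = size;
	return 0x1000 + offset;	/* fake bus address for the sketch */
}

/* Same shape as the patch: map exactly the bvec's page/offset/length. */
#define dma_map_bvec(dev, bv, dir, attrs) \
	dma_map_page_attrs(dev, (bv)->bv_page, (bv)->bv_offset, (bv)->bv_len, \
	(dir), (attrs))
```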

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 include/linux/blkdev.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 5279104527ad..322ff969659c 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -641,6 +641,10 @@ static inline bool blk_account_rq(struct request *rq)
 #define rq_dma_dir(rq) \
 	(op_is_write(req_op(rq)) ? DMA_TO_DEVICE : DMA_FROM_DEVICE)
 
+#define dma_map_bvec(dev, bv, dir, attrs) \
+	dma_map_page_attrs(dev, (bv)->bv_page, (bv)->bv_offset, (bv)->bv_len, \
+	(dir), (attrs))
+
 static inline bool queue_is_mq(struct request_queue *q)
 {
 	return q->mq_ops;
-- 
2.20.1


* [PATCH 05/15] nvme-pci: remove the unused iod->length field
  2019-03-21 23:10 [RFC] optimize nvme single segment I/O hch
                   ` (3 preceding siblings ...)
  2019-03-21 23:10 ` [PATCH 04/15] block: add dma_map_bvec helper hch
@ 2019-03-21 23:10 ` hch
  2019-03-25  5:14   ` Chaitanya.Kulkarni
  2019-03-21 23:10 ` [PATCH 06/15] nvme-pci: remove nvme_init_iod hch
                   ` (12 subsequent siblings)
  17 siblings, 1 reply; 45+ messages in thread
From: hch @ 2019-03-21 23:10 UTC (permalink / raw)


Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/nvme/host/pci.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index a90cf5d63aac..bf0d71fe243e 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -220,7 +220,6 @@ struct nvme_iod {
 	int aborted;
 	int npages;		/* In the PRP list. 0 means small pool in use */
 	int nents;		/* Used in scatterlist */
-	int length;		/* Of data, in bytes */
 	dma_addr_t first_dma;
 	struct scatterlist meta_sg; /* metadata requires single contiguous buffer */
 	struct scatterlist *sg;
@@ -603,7 +602,6 @@ static blk_status_t nvme_init_iod(struct request *rq, struct nvme_dev *dev)
 	iod->aborted = 0;
 	iod->npages = -1;
 	iod->nents = 0;
-	iod->length = size;
 
 	return BLK_STS_OK;
 }
-- 
2.20.1


* [PATCH 06/15] nvme-pci: remove nvme_init_iod
  2019-03-21 23:10 [RFC] optimize nvme single segment I/O hch
                   ` (4 preceding siblings ...)
  2019-03-21 23:10 ` [PATCH 05/15] nvme-pci: remove the unused iod->length field hch
@ 2019-03-21 23:10 ` hch
  2019-03-25  5:19   ` Chaitanya.Kulkarni
  2019-03-21 23:10 ` [PATCH 07/15] nvme-pci: move the call to nvme_cleanup_cmd out of nvme_unmap_data hch
                   ` (11 subsequent siblings)
  17 siblings, 1 reply; 45+ messages in thread
From: hch @ 2019-03-21 23:10 UTC (permalink / raw)


nvme_init_iod should really be split into two parts: initializing a few
general iod fields, which can easily be done at the beginning of
nvme_queue_rq, and allocating the scatterlist if needed, which logically
belongs in nvme_map_data with the code that makes use of it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/nvme/host/pci.c | 56 ++++++++++++++++-------------------------
 1 file changed, 22 insertions(+), 34 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index bf0d71fe243e..3f06e942fb47 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -208,10 +208,10 @@ struct nvme_queue {
 };
 
 /*
- * The nvme_iod describes the data in an I/O, including the list of PRP
- * entries.  You can't see it in this data structure because C doesn't let
- * me express that.  Use nvme_init_iod to ensure there's enough space
- * allocated to store the PRP list.
+ * The nvme_iod describes the data in an I/O.
+ *
+ * The sg pointer contains the list of PRP/SGL chunk allocations in addition
+ * to the actual struct scatterlist.
  */
 struct nvme_iod {
 	struct nvme_request req;
@@ -583,29 +583,6 @@ static inline bool nvme_pci_use_sgls(struct nvme_dev *dev, struct request *req)
 	return true;
 }
 
-static blk_status_t nvme_init_iod(struct request *rq, struct nvme_dev *dev)
-{
-	struct nvme_iod *iod = blk_mq_rq_to_pdu(rq);
-	int nseg = blk_rq_nr_phys_segments(rq);
-	unsigned int size = blk_rq_payload_bytes(rq);
-
-	iod->use_sgl = nvme_pci_use_sgls(dev, rq);
-
-	if (nseg > NVME_INT_PAGES || size > NVME_INT_BYTES(dev)) {
-		iod->sg = mempool_alloc(dev->iod_mempool, GFP_ATOMIC);
-		if (!iod->sg)
-			return BLK_STS_RESOURCE;
-	} else {
-		iod->sg = iod->inline_sg;
-	}
-
-	iod->aborted = 0;
-	iod->npages = -1;
-	iod->nents = 0;
-
-	return BLK_STS_OK;
-}
-
 static void nvme_free_iod(struct nvme_dev *dev, struct request *req)
 {
 	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
@@ -837,6 +814,17 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
 	blk_status_t ret = BLK_STS_IOERR;
 	int nr_mapped;
 
+	if (blk_rq_payload_bytes(req) > NVME_INT_BYTES(dev) ||
+	    blk_rq_nr_phys_segments(req) > NVME_INT_PAGES) {
+		iod->sg = mempool_alloc(dev->iod_mempool, GFP_ATOMIC);
+		if (!iod->sg)
+			return BLK_STS_RESOURCE;
+	} else {
+		iod->sg = iod->inline_sg;
+	}
+
+	iod->use_sgl = nvme_pci_use_sgls(dev, req);
+
 	sg_init_table(iod->sg, blk_rq_nr_phys_segments(req));
 	iod->nents = blk_rq_map_sg(q, req, iod->sg);
 	if (!iod->nents)
@@ -881,6 +869,7 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
 out_unmap:
 	dma_unmap_sg(dev->dev, iod->sg, iod->nents, dma_dir);
 out:
+	nvme_free_iod(dev, req);
 	return ret;
 }
 
@@ -913,9 +902,14 @@ static blk_status_t nvme_queue_rq(struct blk_mq_hw_ctx *hctx,
 	struct nvme_queue *nvmeq = hctx->driver_data;
 	struct nvme_dev *dev = nvmeq->dev;
 	struct request *req = bd->rq;
+	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
 	struct nvme_command cmnd;
 	blk_status_t ret;
 
+	iod->aborted = 0;
+	iod->npages = -1;
+	iod->nents = 0;
+
 	/*
 	 * We should not need to do this, but we're still using this to
 	 * ensure we can drain requests on a dying queue.
@@ -927,21 +921,15 @@ static blk_status_t nvme_queue_rq(struct blk_mq_hw_ctx *hctx,
 	if (ret)
 		return ret;
 
-	ret = nvme_init_iod(req, dev);
-	if (ret)
-		goto out_free_cmd;
-
 	if (blk_rq_nr_phys_segments(req)) {
 		ret = nvme_map_data(dev, req, &cmnd);
 		if (ret)
-			goto out_cleanup_iod;
+			goto out_free_cmd;
 	}
 
 	blk_mq_start_request(req);
 	nvme_submit_cmd(nvmeq, &cmnd, bd->last);
 	return BLK_STS_OK;
-out_cleanup_iod:
-	nvme_free_iod(dev, req);
 out_free_cmd:
 	nvme_cleanup_cmd(req);
 	return ret;
-- 
2.20.1


* [PATCH 07/15] nvme-pci: move the call to nvme_cleanup_cmd out of nvme_unmap_data
  2019-03-21 23:10 [RFC] optimize nvme single segment I/O hch
                   ` (5 preceding siblings ...)
  2019-03-21 23:10 ` [PATCH 06/15] nvme-pci: remove nvme_init_iod hch
@ 2019-03-21 23:10 ` hch
  2019-03-25  5:21   ` Chaitanya.Kulkarni
  2019-03-21 23:10 ` [PATCH 08/15] nvme-pci: merge nvme_free_iod into nvme_unmap_data hch
                   ` (10 subsequent siblings)
  17 siblings, 1 reply; 45+ messages in thread
From: hch @ 2019-03-21 23:10 UTC (permalink / raw)


Cleaning up the command setup isn't related to unmapping data, and
disentangling them will simplify error handling a bit down the road.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/nvme/host/pci.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 3f06e942fb47..2fb35d44010a 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -888,7 +888,6 @@ static void nvme_unmap_data(struct nvme_dev *dev, struct request *req)
 			dma_unmap_sg(dev->dev, &iod->meta_sg, 1, dma_dir);
 	}
 
-	nvme_cleanup_cmd(req);
 	nvme_free_iod(dev, req);
 }
 
@@ -939,6 +938,7 @@ static void nvme_pci_complete_rq(struct request *req)
 {
 	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
 
+	nvme_cleanup_cmd(req);
 	nvme_unmap_data(iod->nvmeq->dev, req);
 	nvme_complete_rq(req);
 }
-- 
2.20.1


* [PATCH 08/15] nvme-pci: merge nvme_free_iod into nvme_unmap_data
  2019-03-21 23:10 [RFC] optimize nvme single segment I/O hch
                   ` (6 preceding siblings ...)
  2019-03-21 23:10 ` [PATCH 07/15] nvme-pci: move the call to nvme_cleanup_cmd out of nvme_unmap_data hch
@ 2019-03-21 23:10 ` hch
  2019-03-25  5:22   ` Chaitanya.Kulkarni
  2019-03-21 23:10 ` [PATCH 09/15] nvme-pci: only call nvme_unmap_data for requests transferring data hch
                   ` (9 subsequent siblings)
  17 siblings, 1 reply; 45+ messages in thread
From: hch @ 2019-03-21 23:10 UTC (permalink / raw)


This means we now have a function that undoes everything nvme_map_data
does and we can simplify the error handling a bit.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/nvme/host/pci.c | 44 ++++++++++++++++-------------------------
 1 file changed, 17 insertions(+), 27 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 2fb35d44010a..ce8bdc15ec3c 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -583,14 +583,24 @@ static inline bool nvme_pci_use_sgls(struct nvme_dev *dev, struct request *req)
 	return true;
 }
 
-static void nvme_free_iod(struct nvme_dev *dev, struct request *req)
+static void nvme_unmap_data(struct nvme_dev *dev, struct request *req)
 {
 	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
+	enum dma_data_direction dma_dir = rq_data_dir(req) ?
+			DMA_TO_DEVICE : DMA_FROM_DEVICE;
 	const int last_prp = dev->ctrl.page_size / sizeof(__le64) - 1;
 	dma_addr_t dma_addr = iod->first_dma, next_dma_addr;
-
 	int i;
 
+	if (iod->nents) {
+		/* P2PDMA requests do not need to be unmapped */
+		if (!is_pci_p2pdma_page(sg_page(iod->sg)))
+			dma_unmap_sg(dev->dev, iod->sg, iod->nents, dma_dir);
+
+		if (blk_integrity_rq(req))
+			dma_unmap_sg(dev->dev, &iod->meta_sg, 1, dma_dir);
+	}
+
 	if (iod->npages == 0)
 		dma_pool_free(dev->prp_small_pool, nvme_pci_iod_list(req)[0],
 			dma_addr);
@@ -847,50 +857,30 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
 		ret = nvme_pci_setup_prps(dev, req, &cmnd->rw);
 
 	if (ret != BLK_STS_OK)
-		goto out_unmap;
+		goto out;
 
 	ret = BLK_STS_IOERR;
 	if (blk_integrity_rq(req)) {
 		if (blk_rq_count_integrity_sg(q, req->bio) != 1)
-			goto out_unmap;
+			goto out;
 
 		sg_init_table(&iod->meta_sg, 1);
 		if (blk_rq_map_integrity_sg(q, req->bio, &iod->meta_sg) != 1)
-			goto out_unmap;
+			goto out;
 
 		if (!dma_map_sg(dev->dev, &iod->meta_sg, 1, dma_dir))
-			goto out_unmap;
+			goto out;
 
 		cmnd->rw.metadata = cpu_to_le64(sg_dma_address(&iod->meta_sg));
 	}
 
 	return BLK_STS_OK;
 
-out_unmap:
-	dma_unmap_sg(dev->dev, iod->sg, iod->nents, dma_dir);
 out:
-	nvme_free_iod(dev, req);
+	nvme_unmap_data(dev, req);
 	return ret;
 }
 
-static void nvme_unmap_data(struct nvme_dev *dev, struct request *req)
-{
-	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
-	enum dma_data_direction dma_dir = rq_data_dir(req) ?
-			DMA_TO_DEVICE : DMA_FROM_DEVICE;
-
-	if (iod->nents) {
-		/* P2PDMA requests do not need to be unmapped */
-		if (!is_pci_p2pdma_page(sg_page(iod->sg)))
-			dma_unmap_sg(dev->dev, iod->sg, iod->nents, dma_dir);
-
-		if (blk_integrity_rq(req))
-			dma_unmap_sg(dev->dev, &iod->meta_sg, 1, dma_dir);
-	}
-
-	nvme_free_iod(dev, req);
-}
-
 /*
  * NOTE: ns is NULL when called on the admin queue.
  */
-- 
2.20.1


* [PATCH 09/15] nvme-pci: only call nvme_unmap_data for requests transferring data
  2019-03-21 23:10 [RFC] optimize nvme single segment I/O hch
                   ` (7 preceding siblings ...)
  2019-03-21 23:10 ` [PATCH 08/15] nvme-pci: merge nvme_free_iod into nvme_unmap_data hch
@ 2019-03-21 23:10 ` hch
  2019-03-25  5:23   ` Chaitanya.Kulkarni
  2019-03-21 23:10 ` [PATCH 10/15] nvme-pci: do not build a scatterlist to map metadata hch
                   ` (8 subsequent siblings)
  17 siblings, 1 reply; 45+ messages in thread
From: hch @ 2019-03-21 23:10 UTC (permalink / raw)


This mirrors how nvme_map_data is called and will allow simplifying some
checks in nvme_unmap_data later on.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/nvme/host/pci.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index ce8bdc15ec3c..bc4ee869fe82 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -929,7 +929,8 @@ static void nvme_pci_complete_rq(struct request *req)
 	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
 
 	nvme_cleanup_cmd(req);
-	nvme_unmap_data(iod->nvmeq->dev, req);
+	if (blk_rq_nr_phys_segments(req))
+		nvme_unmap_data(iod->nvmeq->dev, req);
 	nvme_complete_rq(req);
 }
 
-- 
2.20.1


* [PATCH 10/15] nvme-pci: do not build a scatterlist to map metadata
  2019-03-21 23:10 [RFC] optimize nvme single segment I/O hch
                   ` (8 preceding siblings ...)
  2019-03-21 23:10 ` [PATCH 09/15] nvme-pci: only call nvme_unmap_data for requests transferring data hch
@ 2019-03-21 23:10 ` hch
  2019-03-25  5:27   ` Chaitanya.Kulkarni
       [not found]   ` <20190828092057.GA15524@ming.t460p>
  2019-03-21 23:10 ` [PATCH 11/15] nvme-pci: split metadata handling from nvme_map_data / nvme_unmap_data hch
                   ` (7 subsequent siblings)
  17 siblings, 2 replies; 45+ messages in thread
From: hch @ 2019-03-21 23:10 UTC (permalink / raw)


We always have exactly one segment, so we can simply call dma_map_bvec.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/nvme/host/pci.c | 23 ++++++++++-------------
 1 file changed, 10 insertions(+), 13 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index bc4ee869fe82..a7dad24e0406 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -221,7 +221,7 @@ struct nvme_iod {
 	int npages;		/* In the PRP list. 0 means small pool in use */
 	int nents;		/* Used in scatterlist */
 	dma_addr_t first_dma;
-	struct scatterlist meta_sg; /* metadata requires single contiguous buffer */
+	dma_addr_t meta_dma;
 	struct scatterlist *sg;
 	struct scatterlist inline_sg[0];
 };
@@ -592,13 +592,16 @@ static void nvme_unmap_data(struct nvme_dev *dev, struct request *req)
 	dma_addr_t dma_addr = iod->first_dma, next_dma_addr;
 	int i;
 
+	if (blk_integrity_rq(req)) {
+		dma_unmap_page(dev->dev, iod->meta_dma,
+				rq_integrity_vec(req)->bv_len, dma_dir);
+	}
+
 	if (iod->nents) {
 		/* P2PDMA requests do not need to be unmapped */
 		if (!is_pci_p2pdma_page(sg_page(iod->sg)))
 			dma_unmap_sg(dev->dev, iod->sg, iod->nents, dma_dir);
 
-		if (blk_integrity_rq(req))
-			dma_unmap_sg(dev->dev, &iod->meta_sg, 1, dma_dir);
 	}
 
 	if (iod->npages == 0)
@@ -861,17 +864,11 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
 
 	ret = BLK_STS_IOERR;
 	if (blk_integrity_rq(req)) {
-		if (blk_rq_count_integrity_sg(q, req->bio) != 1)
-			goto out;
-
-		sg_init_table(&iod->meta_sg, 1);
-		if (blk_rq_map_integrity_sg(q, req->bio, &iod->meta_sg) != 1)
-			goto out;
-
-		if (!dma_map_sg(dev->dev, &iod->meta_sg, 1, dma_dir))
+		iod->meta_dma = dma_map_bvec(dev->dev, rq_integrity_vec(req),
+				dma_dir, 0);
+		if (dma_mapping_error(dev->dev, iod->meta_dma))
 			goto out;
-
-		cmnd->rw.metadata = cpu_to_le64(sg_dma_address(&iod->meta_sg));
+		cmnd->rw.metadata = cpu_to_le64(iod->meta_dma);
 	}
 
 	return BLK_STS_OK;
-- 
2.20.1


* [PATCH 11/15] nvme-pci: split metadata handling from nvme_map_data / nvme_unmap_data
  2019-03-21 23:10 [RFC] optimize nvme single segment I/O hch
                   ` (9 preceding siblings ...)
  2019-03-21 23:10 ` [PATCH 10/15] nvme-pci: do not build a scatterlist to map metadata hch
@ 2019-03-21 23:10 ` hch
  2019-03-25  5:29   ` Chaitanya.Kulkarni
  2019-03-21 23:10 ` [PATCH 12/15] nvme-pci: remove the inline scatterlist optimization hch
                   ` (6 subsequent siblings)
  17 siblings, 1 reply; 45+ messages in thread
From: hch @ 2019-03-21 23:10 UTC (permalink / raw)


This prepares for some bigger changes to the data mapping helpers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/nvme/host/pci.c | 48 +++++++++++++++++++++++------------------
 1 file changed, 27 insertions(+), 21 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index a7dad24e0406..cf29d079ad5b 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -592,11 +592,6 @@ static void nvme_unmap_data(struct nvme_dev *dev, struct request *req)
 	dma_addr_t dma_addr = iod->first_dma, next_dma_addr;
 	int i;
 
-	if (blk_integrity_rq(req)) {
-		dma_unmap_page(dev->dev, iod->meta_dma,
-				rq_integrity_vec(req)->bv_len, dma_dir);
-	}
-
 	if (iod->nents) {
 		/* P2PDMA requests do not need to be unmapped */
 		if (!is_pci_p2pdma_page(sg_page(iod->sg)))
@@ -858,24 +853,23 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
 		ret = nvme_pci_setup_sgls(dev, req, &cmnd->rw, nr_mapped);
 	else
 		ret = nvme_pci_setup_prps(dev, req, &cmnd->rw);
-
+out:
 	if (ret != BLK_STS_OK)
-		goto out;
-
-	ret = BLK_STS_IOERR;
-	if (blk_integrity_rq(req)) {
-		iod->meta_dma = dma_map_bvec(dev->dev, rq_integrity_vec(req),
-				dma_dir, 0);
-		if (dma_mapping_error(dev->dev, iod->meta_dma))
-			goto out;
-		cmnd->rw.metadata = cpu_to_le64(iod->meta_dma);
-	}
+		nvme_unmap_data(dev, req);
+	return ret;
+}
 
-	return BLK_STS_OK;
+static blk_status_t nvme_map_metadata(struct nvme_dev *dev, struct request *req,
+		struct nvme_command *cmnd)
+{
+	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
 
-out:
-	nvme_unmap_data(dev, req);
-	return ret;
+	iod->meta_dma = dma_map_bvec(dev->dev, rq_integrity_vec(req),
+			rq_dma_dir(req), 0);
+	if (dma_mapping_error(dev->dev, iod->meta_dma))
+		return BLK_STS_IOERR;
+	cmnd->rw.metadata = cpu_to_le64(iod->meta_dma);
+	return 0;
 }
 
 /*
@@ -913,9 +907,17 @@ static blk_status_t nvme_queue_rq(struct blk_mq_hw_ctx *hctx,
 			goto out_free_cmd;
 	}
 
+	if (blk_integrity_rq(req)) {
+		ret = nvme_map_metadata(dev, req, &cmnd);
+		if (ret)
+			goto out_unmap_data;
+	}
+
 	blk_mq_start_request(req);
 	nvme_submit_cmd(nvmeq, &cmnd, bd->last);
 	return BLK_STS_OK;
+out_unmap_data:
+	nvme_unmap_data(dev, req);
 out_free_cmd:
 	nvme_cleanup_cmd(req);
 	return ret;
@@ -924,10 +926,14 @@ static blk_status_t nvme_queue_rq(struct blk_mq_hw_ctx *hctx,
 static void nvme_pci_complete_rq(struct request *req)
 {
 	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
+	struct nvme_dev *dev = iod->nvmeq->dev;
 
 	nvme_cleanup_cmd(req);
+	if (blk_integrity_rq(req))
+		dma_unmap_page(dev->dev, iod->meta_dma,
+			       rq_integrity_vec(req)->bv_len, rq_data_dir(req));
 	if (blk_rq_nr_phys_segments(req))
-		nvme_unmap_data(iod->nvmeq->dev, req);
+		nvme_unmap_data(dev, req);
 	nvme_complete_rq(req);
 }
 
-- 
2.20.1


* [PATCH 12/15] nvme-pci: remove the inline scatterlist optimization
  2019-03-21 23:10 [RFC] optimize nvme single segment I/O hch
                   ` (10 preceding siblings ...)
  2019-03-21 23:10 ` [PATCH 11/15] nvme-pci: split metadata handling from nvme_map_data / nvme_unmap_data hch
@ 2019-03-21 23:10 ` hch
  2019-03-25  5:30   ` Chaitanya.Kulkarni
  2019-03-21 23:10 ` [PATCH 13/15] nvme-pci: optimize mapping of small single segment requests hch
                   ` (5 subsequent siblings)
  17 siblings, 1 reply; 45+ messages in thread
From: hch @ 2019-03-21 23:10 UTC (permalink / raw)


We'll soon have a better way to optimize for small I/O that doesn't
require a scatterlist, so remove the existing inline_sg case to make
that optimization easier to implement.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/nvme/host/pci.c | 38 ++++++--------------------------------
 1 file changed, 6 insertions(+), 32 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index cf29d079ad5b..c6047935e825 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -223,7 +223,6 @@ struct nvme_iod {
 	dma_addr_t first_dma;
 	dma_addr_t meta_dma;
 	struct scatterlist *sg;
-	struct scatterlist inline_sg[0];
 };
 
 /*
@@ -370,12 +369,6 @@ static bool nvme_dbbuf_update_and_check_event(u16 value, u32 *dbbuf_db,
 	return true;
 }
 
-/*
- * Max size of iod being embedded in the request payload
- */
-#define NVME_INT_PAGES		2
-#define NVME_INT_BYTES(dev)	(NVME_INT_PAGES * (dev)->ctrl.page_size)
-
 /*
  * Will slightly overestimate the number of pages needed.  This is OK
  * as it only leads to a small amount of wasted memory for the lifetime of
@@ -410,15 +403,6 @@ static unsigned int nvme_pci_iod_alloc_size(struct nvme_dev *dev,
 	return alloc_size + sizeof(struct scatterlist) * nseg;
 }
 
-static unsigned int nvme_pci_cmd_size(struct nvme_dev *dev, bool use_sgl)
-{
-	unsigned int alloc_size = nvme_pci_iod_alloc_size(dev,
-				    NVME_INT_BYTES(dev), NVME_INT_PAGES,
-				    use_sgl);
-
-	return sizeof(struct nvme_iod) + alloc_size;
-}
-
 static int nvme_admin_init_hctx(struct blk_mq_hw_ctx *hctx, void *data,
 				unsigned int hctx_idx)
 {
@@ -621,8 +605,7 @@ static void nvme_unmap_data(struct nvme_dev *dev, struct request *req)
 		dma_addr = next_dma_addr;
 	}
 
-	if (iod->sg != iod->inline_sg)
-		mempool_free(iod->sg, dev->iod_mempool);
+	mempool_free(iod->sg, dev->iod_mempool);
 }
 
 static void nvme_print_sgl(struct scatterlist *sgl, int nents)
@@ -822,14 +805,9 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
 	blk_status_t ret = BLK_STS_IOERR;
 	int nr_mapped;
 
-	if (blk_rq_payload_bytes(req) > NVME_INT_BYTES(dev) ||
-	    blk_rq_nr_phys_segments(req) > NVME_INT_PAGES) {
-		iod->sg = mempool_alloc(dev->iod_mempool, GFP_ATOMIC);
-		if (!iod->sg)
-			return BLK_STS_RESOURCE;
-	} else {
-		iod->sg = iod->inline_sg;
-	}
+	iod->sg = mempool_alloc(dev->iod_mempool, GFP_ATOMIC);
+	if (!iod->sg)
+		return BLK_STS_RESOURCE;
 
 	iod->use_sgl = nvme_pci_use_sgls(dev, req);
 
@@ -1619,7 +1597,7 @@ static int nvme_alloc_admin_tags(struct nvme_dev *dev)
 		dev->admin_tagset.queue_depth = NVME_AQ_MQ_TAG_DEPTH;
 		dev->admin_tagset.timeout = ADMIN_TIMEOUT;
 		dev->admin_tagset.numa_node = dev_to_node(dev->dev);
-		dev->admin_tagset.cmd_size = nvme_pci_cmd_size(dev, false);
+		dev->admin_tagset.cmd_size = sizeof(struct nvme_iod);
 		dev->admin_tagset.flags = BLK_MQ_F_NO_SCHED;
 		dev->admin_tagset.driver_data = dev;
 
@@ -2266,11 +2244,7 @@ static int nvme_dev_add(struct nvme_dev *dev)
 		dev->tagset.numa_node = dev_to_node(dev->dev);
 		dev->tagset.queue_depth =
 				min_t(int, dev->q_depth, BLK_MQ_MAX_DEPTH) - 1;
-		dev->tagset.cmd_size = nvme_pci_cmd_size(dev, false);
-		if ((dev->ctrl.sgls & ((1 << 0) | (1 << 1))) && sgl_threshold) {
-			dev->tagset.cmd_size = max(dev->tagset.cmd_size,
-					nvme_pci_cmd_size(dev, true));
-		}
+		dev->tagset.cmd_size = sizeof(struct nvme_iod);
 		dev->tagset.flags = BLK_MQ_F_SHOULD_MERGE;
 		dev->tagset.driver_data = dev;
 
-- 
2.20.1


* [PATCH 13/15] nvme-pci: optimize mapping of small single segment requests
  2019-03-21 23:10 [RFC] optimize nvme single segment I/O hch
                   ` (11 preceding siblings ...)
  2019-03-21 23:10 ` [PATCH 12/15] nvme-pci: remove the inline scatterlist optimization hch
@ 2019-03-21 23:10 ` hch
  2019-03-25  5:36   ` Chaitanya.Kulkarni
  2019-03-21 23:10 ` [PATCH 14/15] nvme-pci: optimize mapping single segment requests using SGLs hch
                   ` (4 subsequent siblings)
  17 siblings, 1 reply; 45+ messages in thread
From: hch @ 2019-03-21 23:10 UTC (permalink / raw)


If a request is single segment and fits into one or two PRP entries we
do not have to create a scatterlist for it, but can just map the bio_vec
directly.
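
The eligibility test and PRP count can be modelled in userspace (stub
bio_vec; arithmetic matches the patch: a bvec qualifies when
bv_offset + bv_len fits in two controller pages, and prp2 is only
needed when the data crosses the first page boundary):

```c
#include <assert.h>

struct bio_vec {
	unsigned int bv_len;
	unsigned int bv_offset;
};

/* A single bvec can take the fast path if it spans at most two pages. */
static int prp_simple_eligible(const struct bio_vec *bv,
			       unsigned int page_size)
{
	return bv->bv_offset + bv->bv_len <= page_size * 2;
}

/* Number of PRP entries needed (1 or 2): the first PRP covers the rest
 * of the page after bv_offset; anything beyond needs a second entry. */
static int prp_entries_needed(const struct bio_vec *bv,
			      unsigned int page_size)
{
	unsigned int first_prp_len = page_size - bv->bv_offset;

	return bv->bv_len > first_prp_len ? 2 : 1;
}
```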

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/nvme/host/pci.c | 45 ++++++++++++++++++++++++++++++++++++-----
 1 file changed, 40 insertions(+), 5 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index c6047935e825..47fc4d653961 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -221,6 +221,7 @@ struct nvme_iod {
 	int npages;		/* In the PRP list. 0 means small pool in use */
 	int nents;		/* Used in scatterlist */
 	dma_addr_t first_dma;
+	unsigned int dma_len;	/* length of single DMA segment mapping */
 	dma_addr_t meta_dma;
 	struct scatterlist *sg;
 };
@@ -576,13 +577,18 @@ static void nvme_unmap_data(struct nvme_dev *dev, struct request *req)
 	dma_addr_t dma_addr = iod->first_dma, next_dma_addr;
 	int i;
 
-	if (iod->nents) {
-		/* P2PDMA requests do not need to be unmapped */
-		if (!is_pci_p2pdma_page(sg_page(iod->sg)))
-			dma_unmap_sg(dev->dev, iod->sg, iod->nents, dma_dir);
-
+	if (iod->dma_len) {
+		dma_unmap_page(dev->dev, dma_addr, iod->dma_len, dma_dir);
+		return;
 	}
 
+	WARN_ON_ONCE(!iod->nents);
+
+	/* P2PDMA requests do not need to be unmapped */
+	if (!is_pci_p2pdma_page(sg_page(iod->sg)))
+		dma_unmap_sg(dev->dev, iod->sg, iod->nents, rq_dma_dir(req));
+
+
 	if (iod->npages == 0)
 		dma_pool_free(dev->prp_small_pool, nvme_pci_iod_list(req)[0],
 			dma_addr);
@@ -795,6 +801,24 @@ static blk_status_t nvme_pci_setup_sgls(struct nvme_dev *dev,
 	return BLK_STS_OK;
 }
 
+static blk_status_t nvme_setup_prp_simple(struct nvme_dev *dev,
+		struct request *req, struct nvme_rw_command *cmnd,
+		struct bio_vec *bv)
+{
+	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
+	unsigned int first_prp_len = dev->ctrl.page_size - bv->bv_offset;
+
+	iod->first_dma = dma_map_bvec(dev->dev, bv, rq_dma_dir(req), 0);
+	if (dma_mapping_error(dev->dev, iod->first_dma))
+		return BLK_STS_RESOURCE;
+	iod->dma_len = bv->bv_len;
+
+	cmnd->dptr.prp1 = cpu_to_le64(iod->first_dma);
+	if (bv->bv_len > first_prp_len)
+		cmnd->dptr.prp2 = cpu_to_le64(iod->first_dma + first_prp_len);
+	return 0;
+}
+
 static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
 		struct nvme_command *cmnd)
 {
@@ -805,6 +829,17 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
 	blk_status_t ret = BLK_STS_IOERR;
 	int nr_mapped;
 
+	if (blk_rq_nr_phys_segments(req) == 1) {
+		struct bio_vec bv = req_bvec(req);
+
+		if (!is_pci_p2pdma_page(bv.bv_page)) {
+			if (bv.bv_offset + bv.bv_len <= dev->ctrl.page_size * 2)
+				return nvme_setup_prp_simple(dev, req,
+							     &cmnd->rw, &bv);
+		}
+	}
+
+	iod->dma_len = 0;
 	iod->sg = mempool_alloc(dev->iod_mempool, GFP_ATOMIC);
 	if (!iod->sg)
 		return BLK_STS_RESOURCE;
-- 
2.20.1

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH 14/15] nvme-pci: optimize mapping single segment requests using SGLs
  2019-03-21 23:10 [RFC] optimize nvme single segment I/O hch
                   ` (12 preceding siblings ...)
  2019-03-21 23:10 ` [PATCH 13/15] nvme-pci: optimize mapping of small single segment requests hch
@ 2019-03-21 23:10 ` hch
  2019-03-25  5:39   ` Chaitanya.Kulkarni
  2019-04-30 14:17   ` klaus
  2019-03-21 23:10 ` [PATCH 15/15] nvme-pci: tidy up nvme_map_data hch
                   ` (3 subsequent siblings)
  17 siblings, 2 replies; 45+ messages in thread
From: hch @ 2019-03-21 23:10 UTC (permalink / raw)


If the controller supports SGLs we can take another short cut for single
segment requests, given that we can always map those without another
indirection structure, and thus don't need to create a scatterlist
structure.

Signed-off-by: Christoph Hellwig <hch at lst.de>
---
 drivers/nvme/host/pci.c | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 47fc4d653961..38869f59c296 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -819,6 +819,23 @@ static blk_status_t nvme_setup_prp_simple(struct nvme_dev *dev,
 	return 0;
 }
 
+static blk_status_t nvme_setup_sgl_simple(struct nvme_dev *dev,
+		struct request *req, struct nvme_rw_command *cmnd,
+		struct bio_vec *bv)
+{
+	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
+
+	iod->first_dma = dma_map_bvec(dev->dev, bv, rq_dma_dir(req), 0);
+	if (dma_mapping_error(dev->dev, iod->first_dma))
+		return BLK_STS_RESOURCE;
+	iod->dma_len = bv->bv_len;
+
+	cmnd->dptr.sgl.addr = cpu_to_le64(iod->first_dma);
+	cmnd->dptr.sgl.length = cpu_to_le32(iod->dma_len);
+	cmnd->dptr.sgl.type = NVME_SGL_FMT_DATA_DESC << 4;
+	return 0;
+}
+
 static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
 		struct nvme_command *cmnd)
 {
@@ -836,6 +853,11 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
 			if (bv.bv_offset + bv.bv_len <= dev->ctrl.page_size * 2)
 				return nvme_setup_prp_simple(dev, req,
 							     &cmnd->rw, &bv);
+
+			if (iod->nvmeq->qid &&
+			    dev->ctrl.sgls & ((1 << 0) | (1 << 1)))
+				return nvme_setup_sgl_simple(dev, req,
+							     &cmnd->rw, &bv);
 		}
 	}
 
-- 
2.20.1

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH 15/15] nvme-pci: tidy up nvme_map_data
  2019-03-21 23:10 [RFC] optimize nvme single segment I/O hch
                   ` (13 preceding siblings ...)
  2019-03-21 23:10 ` [PATCH 14/15] nvme-pci: optimize mapping single segment requests using SGLs hch
@ 2019-03-21 23:10 ` hch
  2019-03-25  5:40   ` Chaitanya.Kulkarni
  2019-03-22 15:44 ` [RFC] optimize nvme single segment I/O axboe
                   ` (2 subsequent siblings)
  17 siblings, 1 reply; 45+ messages in thread
From: hch @ 2019-03-21 23:10 UTC (permalink / raw)


Remove two pointless local variables, remove ret assignment that is
never used, move the use_sgl initialization closer to where it is used.

Signed-off-by: Christoph Hellwig <hch at lst.de>
---
 drivers/nvme/host/pci.c | 17 +++++------------
 1 file changed, 5 insertions(+), 12 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 38869f59c296..9ec0704d5f78 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -840,10 +840,7 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
 		struct nvme_command *cmnd)
 {
 	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
-	struct request_queue *q = req->q;
-	enum dma_data_direction dma_dir = rq_data_dir(req) ?
-			DMA_TO_DEVICE : DMA_FROM_DEVICE;
-	blk_status_t ret = BLK_STS_IOERR;
+	blk_status_t ret = BLK_STS_RESOURCE;
 	int nr_mapped;
 
 	if (blk_rq_nr_phys_segments(req) == 1) {
@@ -865,25 +862,21 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
 	iod->sg = mempool_alloc(dev->iod_mempool, GFP_ATOMIC);
 	if (!iod->sg)
 		return BLK_STS_RESOURCE;
-
-	iod->use_sgl = nvme_pci_use_sgls(dev, req);
-
 	sg_init_table(iod->sg, blk_rq_nr_phys_segments(req));
-	iod->nents = blk_rq_map_sg(q, req, iod->sg);
+	iod->nents = blk_rq_map_sg(req->q, req, iod->sg);
 	if (!iod->nents)
 		goto out;
 
-	ret = BLK_STS_RESOURCE;
-
 	if (is_pci_p2pdma_page(sg_page(iod->sg)))
 		nr_mapped = pci_p2pdma_map_sg(dev->dev, iod->sg, iod->nents,
-					  dma_dir);
+					      rq_dma_dir(req));
 	else
 		nr_mapped = dma_map_sg_attrs(dev->dev, iod->sg, iod->nents,
-					     dma_dir,  DMA_ATTR_NO_WARN);
+					     rq_dma_dir(req), DMA_ATTR_NO_WARN);
 	if (!nr_mapped)
 		goto out;
 
+	iod->use_sgl = nvme_pci_use_sgls(dev, req);
 	if (iod->use_sgl)
 		ret = nvme_pci_setup_sgls(dev, req, &cmnd->rw, nr_mapped);
 	else
-- 
2.20.1

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH 03/15] block: add a rq_dma_dir helper
  2019-03-21 23:10 ` [PATCH 03/15] block: add a rq_dma_dir helper hch
@ 2019-03-22 13:06   ` jthumshirn
  2019-03-27 14:20     ` hch
  2019-03-25  5:11   ` Chaitanya.Kulkarni
  1 sibling, 1 reply; 45+ messages in thread
From: jthumshirn @ 2019-03-22 13:06 UTC (permalink / raw)


On 22/03/2019 00:10, Christoph Hellwig wrote:
> In a lot of places we want to know the DMA direction for a given
> struct request.  Add a little helper to make it a little easier.

You introduce the helper here, but you're using it in 8/15 for the first time.

Could you convert the possible users to use the new helper in this patch
as well?



-- 
Johannes Thumshirn                            SUSE Labs Filesystems
jthumshirn at suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [RFC] optimize nvme single segment I/O
  2019-03-21 23:10 [RFC] optimize nvme single segment I/O hch
                   ` (14 preceding siblings ...)
  2019-03-21 23:10 ` [PATCH 15/15] nvme-pci: tidy up nvme_map_data hch
@ 2019-03-22 15:44 ` axboe
  2019-03-27 14:24   ` hch
  2019-03-22 17:37 ` kbusch
  2019-03-22 18:55 ` sagi
  17 siblings, 1 reply; 45+ messages in thread
From: axboe @ 2019-03-22 15:44 UTC (permalink / raw)


On 3/21/19 5:10 PM, Christoph Hellwig wrote:
> Hi all,
> 
> with all the discussion on small I/O performance lately I thought
> it was time to dust off my old idea to optimize this path a bit
> by avoiding building a scatterlist.  I've only done very basic
> testing because I've been a bit busy, but I thought it might be
> worthwhile to get it out for feedback.

I ran this through some quick testing, and it's a big win for 4k
IOs here. Around an 8% improvement for cases that are CPU bound.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [RFC] optimize nvme single segment I/O
  2019-03-21 23:10 [RFC] optimize nvme single segment I/O hch
                   ` (15 preceding siblings ...)
  2019-03-22 15:44 ` [RFC] optimize nvme single segment I/O axboe
@ 2019-03-22 17:37 ` kbusch
  2019-03-22 18:55 ` sagi
  17 siblings, 0 replies; 45+ messages in thread
From: kbusch @ 2019-03-22 17:37 UTC (permalink / raw)


On Thu, Mar 21, 2019@04:10:22PM -0700, Christoph Hellwig wrote:
> Hi all,
> 
> with all the discussion on small I/O performance lately I thought
> it was time to dust off my old idea to optimize this path a bit
> by avoiding building a scatterlist.  I've only done very basic
> testing because I've been a bit busy, but I thought it might be
> worthwhile to get it out for feedback.

Tests well here with a measurable IOPs improvement at lower queue depths.

Series looks good to me, especially patch 5! :p

Reviewed-by: Keith Busch <keith.busch at intel.com>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [RFC] optimize nvme single segment I/O
  2019-03-21 23:10 [RFC] optimize nvme single segment I/O hch
                   ` (16 preceding siblings ...)
  2019-03-22 17:37 ` kbusch
@ 2019-03-22 18:55 ` sagi
  17 siblings, 0 replies; 45+ messages in thread
From: sagi @ 2019-03-22 18:55 UTC (permalink / raw)



> Hi all,
> 
> with all the discussion on small I/O performance lately I thought
> it was time to dust off my old idea to optimize this path a bit
> by avoiding building a scatterlist.  I've only done very basic
> testing because I've been a bit busy, but I thought it might be
> worthwhile to get it out for feedback.

This looks good Christoph! for the series,

Reviewed-by: Sagi Grimberg <sagi at grimberg.me>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH 01/15] block: add a req_bvec helper
  2019-03-21 23:10 ` [PATCH 01/15] block: add a req_bvec helper hch
@ 2019-03-25  5:07   ` Chaitanya.Kulkarni
  2019-03-27 14:16     ` hch
  0 siblings, 1 reply; 45+ messages in thread
From: Chaitanya.Kulkarni @ 2019-03-25  5:07 UTC (permalink / raw)


On 3/21/19 4:11 PM, Christoph Hellwig wrote:
> Return the currently active bvec segment, potentially spanning multiple
> pages.
> 
> Signed-off-by: Christoph Hellwig <hch at lst.de>
> ---
>   include/linux/blkdev.h | 7 +++++++
>   1 file changed, 7 insertions(+)
> 
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 0de92b29f589..255e20313cde 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -929,6 +929,13 @@ static inline unsigned int blk_rq_payload_bytes(struct request *rq)
>   	return blk_rq_bytes(rq);
>   }
>   
> +static inline struct bio_vec req_bvec(struct request *rq)
> +{
> +	if (rq->rq_flags & RQF_SPECIAL_PAYLOAD)
> +		return rq->special_vec;
Quick question here, mostly for my understanding: do we also have to
check nr_phys_segments here for operations such as write-zeroes, or
can this never be called in a write-zeroes context, or is it simply
not applicable?

> +	return mp_bvec_iter_bvec(rq->bio->bi_io_vec, rq->bio->bi_iter);
> +}
> +
>   static inline unsigned int blk_queue_get_max_sectors(struct request_queue *q,
>   						     int op)
>   {
> 

Otherwise, looks good.

Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni at wdc.com>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH 02/15] block: add a rq_integrity_vec helper
  2019-03-21 23:10 ` [PATCH 02/15] block: add a rq_integrity_vec helper hch
@ 2019-03-25  5:10   ` Chaitanya.Kulkarni
  2019-03-27 14:19     ` hch
  0 siblings, 1 reply; 45+ messages in thread
From: Chaitanya.Kulkarni @ 2019-03-25  5:10 UTC (permalink / raw)


On 3/21/19 4:11 PM, Christoph Hellwig wrote:
> This provides a nice little shortcut to get the integrity data for
> drivers like NVMe that only support a single integrity segment.
> 
> Signed-off-by: Christoph Hellwig <hch at lst.de>
> ---
>   include/linux/blkdev.h | 15 +++++++++++++++
>   1 file changed, 15 insertions(+)
> 
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 255e20313cde..f9a072610d28 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -1552,6 +1552,16 @@ static inline unsigned int bio_integrity_bytes(struct blk_integrity *bi,
>   	return bio_integrity_intervals(bi, sectors) * bi->tuple_size;
>   }
>   
> +/*
> + * Return the first bvec that contains integrity data.  In general only
> + * drivers that are limited to a single integrity segment should use this
> + * helper.
> + */
> +static inline struct bio_vec *rq_integrity_vec(struct request *rq)
> +{
Regarding the comment, should we add a check here to make sure the
underlying driver is limited to a single integrity segment?
> +	return rq->bio->bi_integrity->bip_vec;
> +}
> +
>   #else /* CONFIG_BLK_DEV_INTEGRITY */
>   
>   struct bio;
> @@ -1626,6 +1636,11 @@ static inline unsigned int bio_integrity_bytes(struct blk_integrity *bi,
>   	return 0;
>   }
>   
> +static inline struct bio_vec *rq_integrity_vec(struct request *rq)
> +{
> +	return NULL;
> +}
> +
>   #endif /* CONFIG_BLK_DEV_INTEGRITY */
>   
>   struct block_device_operations {
> 

Otherwise looks good.

Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni at wdc.com>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH 03/15] block: add a rq_dma_dir helper
  2019-03-21 23:10 ` [PATCH 03/15] block: add a rq_dma_dir helper hch
  2019-03-22 13:06   ` jthumshirn
@ 2019-03-25  5:11   ` Chaitanya.Kulkarni
  1 sibling, 0 replies; 45+ messages in thread
From: Chaitanya.Kulkarni @ 2019-03-25  5:11 UTC (permalink / raw)


On 3/21/19 4:11 PM, Christoph Hellwig wrote:
> In a lot of places we want to know the DMA direction for a given
> struct request.  Add a little helper to make it a little easier.
> 
> Signed-off-by: Christoph Hellwig <hch at lst.de>
> ---
>   include/linux/blkdev.h | 3 +++
>   1 file changed, 3 insertions(+)
> 
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index f9a072610d28..5279104527ad 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -638,6 +638,9 @@ static inline bool blk_account_rq(struct request *rq)
>   
>   #define rq_data_dir(rq)		(op_is_write(req_op(rq)) ? WRITE : READ)
>   
> +#define rq_dma_dir(rq) \
> +	(op_is_write(req_op(rq)) ? DMA_TO_DEVICE : DMA_FROM_DEVICE)
> +
>   static inline bool queue_is_mq(struct request_queue *q)
>   {
>   	return q->mq_ops;
> 

Apart from Johannes's comment looks good.

Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni at wdc.com>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH 04/15] block: add dma_map_bvec helper
  2019-03-21 23:10 ` [PATCH 04/15] block: add dma_map_bvec helper hch
@ 2019-03-25  5:13   ` Chaitanya.Kulkarni
  0 siblings, 0 replies; 45+ messages in thread
From: Chaitanya.Kulkarni @ 2019-03-25  5:13 UTC (permalink / raw)


On 3/21/19 4:11 PM, Christoph Hellwig wrote:
> Provide a nice little shortcut for mapping a single bvec.
> 
> Signed-off-by: Christoph Hellwig <hch at lst.de>
> ---
>   include/linux/blkdev.h | 4 ++++
>   1 file changed, 4 insertions(+)
> 
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 5279104527ad..322ff969659c 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -641,6 +641,10 @@ static inline bool blk_account_rq(struct request *rq)
>   #define rq_dma_dir(rq) \
>   	(op_is_write(req_op(rq)) ? DMA_TO_DEVICE : DMA_FROM_DEVICE)
>   
> +#define dma_map_bvec(dev, bv, dir, attrs) \
> +	dma_map_page_attrs(dev, (bv)->bv_page, (bv)->bv_offset, (bv)->bv_len, \
> +	(dir), (attrs))
> +
>   static inline bool queue_is_mq(struct request_queue *q)
>   {
>   	return q->mq_ops;
> 

Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni at wdc.com>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH 05/15] nvme-pci: remove the unused iod->length field
  2019-03-21 23:10 ` [PATCH 05/15] nvme-pci: remove the unused iod->length field hch
@ 2019-03-25  5:14   ` Chaitanya.Kulkarni
  0 siblings, 0 replies; 45+ messages in thread
From: Chaitanya.Kulkarni @ 2019-03-25  5:14 UTC (permalink / raw)


On 3/21/19 4:11 PM, Christoph Hellwig wrote:
> Signed-off-by: Christoph Hellwig <hch at lst.de>
> ---
>   drivers/nvme/host/pci.c | 2 --
>   1 file changed, 2 deletions(-)
> 
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index a90cf5d63aac..bf0d71fe243e 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -220,7 +220,6 @@ struct nvme_iod {
>   	int aborted;
>   	int npages;		/* In the PRP list. 0 means small pool in use */
>   	int nents;		/* Used in scatterlist */
> -	int length;		/* Of data, in bytes */
>   	dma_addr_t first_dma;
>   	struct scatterlist meta_sg; /* metadata requires single contiguous buffer */
>   	struct scatterlist *sg;
> @@ -603,7 +602,6 @@ static blk_status_t nvme_init_iod(struct request *rq, struct nvme_dev *dev)
>   	iod->aborted = 0;
>   	iod->npages = -1;
>   	iod->nents = 0;
> -	iod->length = size;
>   
>   	return BLK_STS_OK;
>   }
> 

Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni at wdc.com>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH 06/15] nvme-pci: remove nvme_init_iod
  2019-03-21 23:10 ` [PATCH 06/15] nvme-pci: remove nvme_init_iod hch
@ 2019-03-25  5:19   ` Chaitanya.Kulkarni
  2019-03-27 14:21     ` hch
  0 siblings, 1 reply; 45+ messages in thread
From: Chaitanya.Kulkarni @ 2019-03-25  5:19 UTC (permalink / raw)


On 3/21/19 4:11 PM, Christoph Hellwig wrote:
> nvme_init_iod should really be split into two parts: initialize a few
> general iod fields, which can easily be done at the beginning of
> nvme_queue_rq, and allocating the scatterlist if needed, which logically
> belongs into nvme_map_data with the code making use of it.
> 
> Signed-off-by: Christoph Hellwig <hch at lst.de>
> ---
>   drivers/nvme/host/pci.c | 56 ++++++++++++++++-------------------------
>   1 file changed, 22 insertions(+), 34 deletions(-)
> 
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index bf0d71fe243e..3f06e942fb47 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -208,10 +208,10 @@ struct nvme_queue {
>   };
>   
>   /*
> - * The nvme_iod describes the data in an I/O, including the list of PRP
> - * entries.  You can't see it in this data structure because C doesn't let
> - * me express that.  Use nvme_init_iod to ensure there's enough space
> - * allocated to store the PRP list.
> + * The nvme_iod describes the data in an I/O.
> + *
> + * The sg pointer contains the list of PRP/SGL chunk allocations in addition
> + * to the actual struct scatterlist.
>    */
>   struct nvme_iod {
>   	struct nvme_request req;
> @@ -583,29 +583,6 @@ static inline bool nvme_pci_use_sgls(struct nvme_dev *dev, struct request *req)
>   	return true;
>   }
>   
> -static blk_status_t nvme_init_iod(struct request *rq, struct nvme_dev *dev)
> -{
> -	struct nvme_iod *iod = blk_mq_rq_to_pdu(rq);
> -	int nseg = blk_rq_nr_phys_segments(rq);
> -	unsigned int size = blk_rq_payload_bytes(rq);
> -
> -	iod->use_sgl = nvme_pci_use_sgls(dev, rq);
> -
> -	if (nseg > NVME_INT_PAGES || size > NVME_INT_BYTES(dev)) {
> -		iod->sg = mempool_alloc(dev->iod_mempool, GFP_ATOMIC);
> -		if (!iod->sg)
> -			return BLK_STS_RESOURCE;
> -	} else {
> -		iod->sg = iod->inline_sg;
> -	}
> -
> -	iod->aborted = 0;
> -	iod->npages = -1;
> -	iod->nents = 0;
> -
> -	return BLK_STS_OK;
> -}
> -
>   static void nvme_free_iod(struct nvme_dev *dev, struct request *req)
>   {
>   	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
> @@ -837,6 +814,17 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
>   	blk_status_t ret = BLK_STS_IOERR;
>   	int nr_mapped;
>   
> +	if (blk_rq_payload_bytes(req) > NVME_INT_BYTES(dev) ||
> +	    blk_rq_nr_phys_segments(req) > NVME_INT_PAGES) {
> +		iod->sg = mempool_alloc(dev->iod_mempool, GFP_ATOMIC);
> +		if (!iod->sg)
> +			return BLK_STS_RESOURCE;
> +	} else {
> +		iod->sg = iod->inline_sg;
> +	}
> +
> +	iod->use_sgl = nvme_pci_use_sgls(dev, req);
> +
>   	sg_init_table(iod->sg, blk_rq_nr_phys_segments(req));
>   	iod->nents = blk_rq_map_sg(q, req, iod->sg);
>   	if (!iod->nents)
> @@ -881,6 +869,7 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
>   out_unmap:
>   	dma_unmap_sg(dev->dev, iod->sg, iod->nents, dma_dir);
>   out:
> +	nvme_free_iod(dev, req);
>   	return ret;
>   }
>   
> @@ -913,9 +902,14 @@ static blk_status_t nvme_queue_rq(struct blk_mq_hw_ctx *hctx,
>   	struct nvme_queue *nvmeq = hctx->driver_data;
>   	struct nvme_dev *dev = nvmeq->dev;
>   	struct request *req = bd->rq;
> +	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
>   	struct nvme_command cmnd;
>   	blk_status_t ret;
>   
> +	iod->aborted = 0;
> +	iod->npages = -1;
> +	iod->nents = 0;
> +
Maybe add an inline helper for the above default iod initialization?
>   	/*
>   	 * We should not need to do this, but we're still using this to
>   	 * ensure we can drain requests on a dying queue.
> @@ -927,21 +921,15 @@ static blk_status_t nvme_queue_rq(struct blk_mq_hw_ctx *hctx,
>   	if (ret)
>   		return ret;
>   
> -	ret = nvme_init_iod(req, dev);
> -	if (ret)
> -		goto out_free_cmd;
> -
>   	if (blk_rq_nr_phys_segments(req)) {
>   		ret = nvme_map_data(dev, req, &cmnd);
>   		if (ret)
> -			goto out_cleanup_iod;
> +			goto out_free_cmd;
>   	}
>   
>   	blk_mq_start_request(req);
>   	nvme_submit_cmd(nvmeq, &cmnd, bd->last);
>   	return BLK_STS_OK;
> -out_cleanup_iod:
> -	nvme_free_iod(dev, req);
>   out_free_cmd:
>   	nvme_cleanup_cmd(req);
>   	return ret;
> 

Otherwise, looks good.
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni at wdc.com>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH 07/15] nvme-pci: move the call to nvme_cleanup_cmd out of nvme_unmap_data
  2019-03-21 23:10 ` [PATCH 07/15] nvme-pci: move the call to nvme_cleanup_cmd out of nvme_unmap_data hch
@ 2019-03-25  5:21   ` Chaitanya.Kulkarni
  0 siblings, 0 replies; 45+ messages in thread
From: Chaitanya.Kulkarni @ 2019-03-25  5:21 UTC (permalink / raw)


On 3/21/19 4:11 PM, Christoph Hellwig wrote:
> Cleaning up the command setup isn't related to unmapping data, and
> disentangling them will simplify error handling a bit down the road.
> 
> Signed-off-by: Christoph Hellwig <hch at lst.de>
> ---
>   drivers/nvme/host/pci.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index 3f06e942fb47..2fb35d44010a 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -888,7 +888,6 @@ static void nvme_unmap_data(struct nvme_dev *dev, struct request *req)
>   			dma_unmap_sg(dev->dev, &iod->meta_sg, 1, dma_dir);
>   	}
>   
> -	nvme_cleanup_cmd(req);
>   	nvme_free_iod(dev, req);
>   }
>   
> @@ -939,6 +938,7 @@ static void nvme_pci_complete_rq(struct request *req)
>   {
>   	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
>   
> +	nvme_cleanup_cmd(req);
>   	nvme_unmap_data(iod->nvmeq->dev, req);
>   	nvme_complete_rq(req);
>   }
> 

Also it is much easier to debug now that it is separated.

Looks good.

Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni at wdc.com>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH 08/15] nvme-pci: merge nvme_free_iod into nvme_unmap_data
  2019-03-21 23:10 ` [PATCH 08/15] nvme-pci: merge nvme_free_iod into nvme_unmap_data hch
@ 2019-03-25  5:22   ` Chaitanya.Kulkarni
  0 siblings, 0 replies; 45+ messages in thread
From: Chaitanya.Kulkarni @ 2019-03-25  5:22 UTC (permalink / raw)


On 3/21/19 4:11 PM, Christoph Hellwig wrote:
> This means we now have a function that undoes everything nvme_map_data
> does and we can simplify the error handling a bit.
> 
> Signed-off-by: Christoph Hellwig <hch at lst.de>
> ---
>   drivers/nvme/host/pci.c | 44 ++++++++++++++++-------------------------
>   1 file changed, 17 insertions(+), 27 deletions(-)
> 
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index 2fb35d44010a..ce8bdc15ec3c 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -583,14 +583,24 @@ static inline bool nvme_pci_use_sgls(struct nvme_dev *dev, struct request *req)
>   	return true;
>   }
>   
> -static void nvme_free_iod(struct nvme_dev *dev, struct request *req)
> +static void nvme_unmap_data(struct nvme_dev *dev, struct request *req)
>   {
>   	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
> +	enum dma_data_direction dma_dir = rq_data_dir(req) ?
> +			DMA_TO_DEVICE : DMA_FROM_DEVICE;
>   	const int last_prp = dev->ctrl.page_size / sizeof(__le64) - 1;
>   	dma_addr_t dma_addr = iod->first_dma, next_dma_addr;
> -
>   	int i;
>   
> +	if (iod->nents) {
> +		/* P2PDMA requests do not need to be unmapped */
> +		if (!is_pci_p2pdma_page(sg_page(iod->sg)))
> +			dma_unmap_sg(dev->dev, iod->sg, iod->nents, dma_dir);
> +
> +		if (blk_integrity_rq(req))
> +			dma_unmap_sg(dev->dev, &iod->meta_sg, 1, dma_dir);
> +	}
> +
>   	if (iod->npages == 0)
>   		dma_pool_free(dev->prp_small_pool, nvme_pci_iod_list(req)[0],
>   			dma_addr);
> @@ -847,50 +857,30 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
>   		ret = nvme_pci_setup_prps(dev, req, &cmnd->rw);
>   
>   	if (ret != BLK_STS_OK)
> -		goto out_unmap;
> +		goto out;
>   
>   	ret = BLK_STS_IOERR;
>   	if (blk_integrity_rq(req)) {
>   		if (blk_rq_count_integrity_sg(q, req->bio) != 1)
> -			goto out_unmap;
> +			goto out;
>   
>   		sg_init_table(&iod->meta_sg, 1);
>   		if (blk_rq_map_integrity_sg(q, req->bio, &iod->meta_sg) != 1)
> -			goto out_unmap;
> +			goto out;
>   
>   		if (!dma_map_sg(dev->dev, &iod->meta_sg, 1, dma_dir))
> -			goto out_unmap;
> +			goto out;
>   
>   		cmnd->rw.metadata = cpu_to_le64(sg_dma_address(&iod->meta_sg));
>   	}
>   
>   	return BLK_STS_OK;
>   
> -out_unmap:
> -	dma_unmap_sg(dev->dev, iod->sg, iod->nents, dma_dir);
>   out:
> -	nvme_free_iod(dev, req);
> +	nvme_unmap_data(dev, req);
>   	return ret;
>   }
>   
> -static void nvme_unmap_data(struct nvme_dev *dev, struct request *req)
> -{
> -	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
> -	enum dma_data_direction dma_dir = rq_data_dir(req) ?
> -			DMA_TO_DEVICE : DMA_FROM_DEVICE;
> -
> -	if (iod->nents) {
> -		/* P2PDMA requests do not need to be unmapped */
> -		if (!is_pci_p2pdma_page(sg_page(iod->sg)))
> -			dma_unmap_sg(dev->dev, iod->sg, iod->nents, dma_dir);
> -
> -		if (blk_integrity_rq(req))
> -			dma_unmap_sg(dev->dev, &iod->meta_sg, 1, dma_dir);
> -	}
> -
> -	nvme_free_iod(dev, req);
> -}
> -
>   /*
>    * NOTE: ns is NULL when called on the admin queue.
>    */
> 
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni at wdc.com>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH 09/15] nvme-pci: only call nvme_unmap_data for requests transferring data
  2019-03-21 23:10 ` [PATCH 09/15] nvme-pci: only call nvme_unmap_data for requests transferring data hch
@ 2019-03-25  5:23   ` Chaitanya.Kulkarni
  0 siblings, 0 replies; 45+ messages in thread
From: Chaitanya.Kulkarni @ 2019-03-25  5:23 UTC (permalink / raw)


On 3/21/19 4:11 PM, Christoph Hellwig wrote:
> This mirrors how nvme_map_data is called and will allow simplifying some
> checks in nvme_unmap_data later on.
> 
> Signed-off-by: Christoph Hellwig <hch at lst.de>
> ---
>   drivers/nvme/host/pci.c | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index ce8bdc15ec3c..bc4ee869fe82 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -929,7 +929,8 @@ static void nvme_pci_complete_rq(struct request *req)
>   	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
>   
>   	nvme_cleanup_cmd(req);
> -	nvme_unmap_data(iod->nvmeq->dev, req);
> +	if (blk_rq_nr_phys_segments(req))
> +		nvme_unmap_data(iod->nvmeq->dev, req);
>   	nvme_complete_rq(req);
>   }
>   
> 

Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni at wdc.com>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH 10/15] nvme-pci: do not build a scatterlist to map metadata
  2019-03-21 23:10 ` [PATCH 10/15] nvme-pci: do not build a scatterlist to map metadata hch
@ 2019-03-25  5:27   ` Chaitanya.Kulkarni
       [not found]   ` <20190828092057.GA15524@ming.t460p>
  1 sibling, 0 replies; 45+ messages in thread
From: Chaitanya.Kulkarni @ 2019-03-25  5:27 UTC (permalink / raw)


On 3/21/19 4:12 PM, Christoph Hellwig wrote:
> We always have exactly one segment, so we can simply call dma_map_bvec.
> 
> Signed-off-by: Christoph Hellwig <hch at lst.de>
> ---
>   drivers/nvme/host/pci.c | 23 ++++++++++-------------
>   1 file changed, 10 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index bc4ee869fe82..a7dad24e0406 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -221,7 +221,7 @@ struct nvme_iod {
>   	int npages;		/* In the PRP list. 0 means small pool in use */
>   	int nents;		/* Used in scatterlist */
>   	dma_addr_t first_dma;
> -	struct scatterlist meta_sg; /* metadata requires single contiguous buffer */
> +	dma_addr_t meta_dma;
>   	struct scatterlist *sg;
>   	struct scatterlist inline_sg[0];
>   };
> @@ -592,13 +592,16 @@ static void nvme_unmap_data(struct nvme_dev *dev, struct request *req)
>   	dma_addr_t dma_addr = iod->first_dma, next_dma_addr;
>   	int i;
>   
> +	if (blk_integrity_rq(req)) {
> +		dma_unmap_page(dev->dev, iod->meta_dma,
> +				rq_integrity_vec(req)->bv_len, dma_dir);
> +	}
> +
>   	if (iod->nents) {
>   		/* P2PDMA requests do not need to be unmapped */
>   		if (!is_pci_p2pdma_page(sg_page(iod->sg)))
>   			dma_unmap_sg(dev->dev, iod->sg, iod->nents, dma_dir);
>   
> -		if (blk_integrity_rq(req))
> -			dma_unmap_sg(dev->dev, &iod->meta_sg, 1, dma_dir);
>   	}
>   
>   	if (iod->npages == 0)
> @@ -861,17 +864,11 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
>   
>   	ret = BLK_STS_IOERR;
>   	if (blk_integrity_rq(req)) {
> -		if (blk_rq_count_integrity_sg(q, req->bio) != 1)
> -			goto out;
> -
> -		sg_init_table(&iod->meta_sg, 1);
> -		if (blk_rq_map_integrity_sg(q, req->bio, &iod->meta_sg) != 1)
> -			goto out;
> -
> -		if (!dma_map_sg(dev->dev, &iod->meta_sg, 1, dma_dir))
> +		iod->meta_dma = dma_map_bvec(dev->dev, rq_integrity_vec(req),
> +				dma_dir, 0);
> +		if (dma_mapping_error(dev->dev, iod->meta_dma))
>   			goto out;
> -
> -		cmnd->rw.metadata = cpu_to_le64(sg_dma_address(&iod->meta_sg));
> +		cmnd->rw.metadata = cpu_to_le64(iod->meta_dma);
>   	}
>   
>   	return BLK_STS_OK;
>

Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni at wdc.com>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH 11/15] nvme-pci: split metadata handling from nvme_map_data / nvme_unmap_data
  2019-03-21 23:10 ` [PATCH 11/15] nvme-pci: split metadata handling from nvme_map_data / nvme_unmap_data hch
@ 2019-03-25  5:29   ` Chaitanya.Kulkarni
  0 siblings, 0 replies; 45+ messages in thread
From: Chaitanya.Kulkarni @ 2019-03-25  5:29 UTC (permalink / raw)


On 3/21/19 4:12 PM, Christoph Hellwig wrote:
> This prepares for some bigger changes to the data mapping helpers.
> 
> Signed-off-by: Christoph Hellwig <hch at lst.de>
> ---
>   drivers/nvme/host/pci.c | 48 +++++++++++++++++++++++------------------
>   1 file changed, 27 insertions(+), 21 deletions(-)
> 
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index a7dad24e0406..cf29d079ad5b 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -592,11 +592,6 @@ static void nvme_unmap_data(struct nvme_dev *dev, struct request *req)
>   	dma_addr_t dma_addr = iod->first_dma, next_dma_addr;
>   	int i;
>   
> -	if (blk_integrity_rq(req)) {
> -		dma_unmap_page(dev->dev, iod->meta_dma,
> -				rq_integrity_vec(req)->bv_len, dma_dir);
> -	}
> -
>   	if (iod->nents) {
>   		/* P2PDMA requests do not need to be unmapped */
>   		if (!is_pci_p2pdma_page(sg_page(iod->sg)))
> @@ -858,24 +853,23 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
>   		ret = nvme_pci_setup_sgls(dev, req, &cmnd->rw, nr_mapped);
>   	else
>   		ret = nvme_pci_setup_prps(dev, req, &cmnd->rw);
> -
> +out:
>   	if (ret != BLK_STS_OK)
> -		goto out;
> -
> -	ret = BLK_STS_IOERR;
> -	if (blk_integrity_rq(req)) {
> -		iod->meta_dma = dma_map_bvec(dev->dev, rq_integrity_vec(req),
> -				dma_dir, 0);
> -		if (dma_mapping_error(dev->dev, iod->meta_dma))
> -			goto out;
> -		cmnd->rw.metadata = cpu_to_le64(iod->meta_dma);
> -	}
> +		nvme_unmap_data(dev, req);
> +	return ret;
> +}
>   
> -	return BLK_STS_OK;
> +static blk_status_t nvme_map_metadata(struct nvme_dev *dev, struct request *req,
> +		struct nvme_command *cmnd)
> +{
> +	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
>   
> -out:
> -	nvme_unmap_data(dev, req);
> -	return ret;
> +	iod->meta_dma = dma_map_bvec(dev->dev, rq_integrity_vec(req),
> +			rq_dma_dir(req), 0);
> +	if (dma_mapping_error(dev->dev, iod->meta_dma))
> +		return BLK_STS_IOERR;
> +	cmnd->rw.metadata = cpu_to_le64(iod->meta_dma);
> +	return 0;
>   }
>   
>   /*
> @@ -913,9 +907,17 @@ static blk_status_t nvme_queue_rq(struct blk_mq_hw_ctx *hctx,
>   			goto out_free_cmd;
>   	}
>   
> +	if (blk_integrity_rq(req)) {
> +		ret = nvme_map_metadata(dev, req, &cmnd);
> +		if (ret)
> +			goto out_unmap_data;
> +	}
> +
>   	blk_mq_start_request(req);
>   	nvme_submit_cmd(nvmeq, &cmnd, bd->last);
>   	return BLK_STS_OK;
> +out_unmap_data:
> +	nvme_unmap_data(dev, req);
>   out_free_cmd:
>   	nvme_cleanup_cmd(req);
>   	return ret;
> @@ -924,10 +926,14 @@ static blk_status_t nvme_queue_rq(struct blk_mq_hw_ctx *hctx,
>   static void nvme_pci_complete_rq(struct request *req)
>   {
>   	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
> +	struct nvme_dev *dev = iod->nvmeq->dev;
>   
>   	nvme_cleanup_cmd(req);
> +	if (blk_integrity_rq(req))
> +		dma_unmap_page(dev->dev, iod->meta_dma,
> +			       rq_integrity_vec(req)->bv_len, rq_data_dir(req));
>   	if (blk_rq_nr_phys_segments(req))
> -		nvme_unmap_data(iod->nvmeq->dev, req);
> +		nvme_unmap_data(dev, req);
>   	nvme_complete_rq(req);
>   }
>   
> 

Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni at wdc.com>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH 12/15] nvme-pci: remove the inline scatterlist optimization
  2019-03-21 23:10 ` [PATCH 12/15] nvme-pci: remove the inline scatterlist optimization hch
@ 2019-03-25  5:30   ` Chaitanya.Kulkarni
  0 siblings, 0 replies; 45+ messages in thread
From: Chaitanya.Kulkarni @ 2019-03-25  5:30 UTC (permalink / raw)


On 3/21/19 4:12 PM, Christoph Hellwig wrote:
> We'll have a better way to optimize for small I/O that doesn't
> require it soon, so remove the existing inline_sg case to make that
> optimization easier to implement.
> 
> Signed-off-by: Christoph Hellwig <hch at lst.de>
> ---
>   drivers/nvme/host/pci.c | 38 ++++++--------------------------------
>   1 file changed, 6 insertions(+), 32 deletions(-)
> 
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index cf29d079ad5b..c6047935e825 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -223,7 +223,6 @@ struct nvme_iod {
>   	dma_addr_t first_dma;
>   	dma_addr_t meta_dma;
>   	struct scatterlist *sg;
> -	struct scatterlist inline_sg[0];
>   };
>   
>   /*
> @@ -370,12 +369,6 @@ static bool nvme_dbbuf_update_and_check_event(u16 value, u32 *dbbuf_db,
>   	return true;
>   }
>   
> -/*
> - * Max size of iod being embedded in the request payload
> - */
> -#define NVME_INT_PAGES		2
> -#define NVME_INT_BYTES(dev)	(NVME_INT_PAGES * (dev)->ctrl.page_size)
> -
>   /*
>    * Will slightly overestimate the number of pages needed.  This is OK
>    * as it only leads to a small amount of wasted memory for the lifetime of
> @@ -410,15 +403,6 @@ static unsigned int nvme_pci_iod_alloc_size(struct nvme_dev *dev,
>   	return alloc_size + sizeof(struct scatterlist) * nseg;
>   }
>   
> -static unsigned int nvme_pci_cmd_size(struct nvme_dev *dev, bool use_sgl)
> -{
> -	unsigned int alloc_size = nvme_pci_iod_alloc_size(dev,
> -				    NVME_INT_BYTES(dev), NVME_INT_PAGES,
> -				    use_sgl);
> -
> -	return sizeof(struct nvme_iod) + alloc_size;
> -}
> -
>   static int nvme_admin_init_hctx(struct blk_mq_hw_ctx *hctx, void *data,
>   				unsigned int hctx_idx)
>   {
> @@ -621,8 +605,7 @@ static void nvme_unmap_data(struct nvme_dev *dev, struct request *req)
>   		dma_addr = next_dma_addr;
>   	}
>   
> -	if (iod->sg != iod->inline_sg)
> -		mempool_free(iod->sg, dev->iod_mempool);
> +	mempool_free(iod->sg, dev->iod_mempool);
>   }
>   
>   static void nvme_print_sgl(struct scatterlist *sgl, int nents)
> @@ -822,14 +805,9 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
>   	blk_status_t ret = BLK_STS_IOERR;
>   	int nr_mapped;
>   
> -	if (blk_rq_payload_bytes(req) > NVME_INT_BYTES(dev) ||
> -	    blk_rq_nr_phys_segments(req) > NVME_INT_PAGES) {
> -		iod->sg = mempool_alloc(dev->iod_mempool, GFP_ATOMIC);
> -		if (!iod->sg)
> -			return BLK_STS_RESOURCE;
> -	} else {
> -		iod->sg = iod->inline_sg;
> -	}
> +	iod->sg = mempool_alloc(dev->iod_mempool, GFP_ATOMIC);
> +	if (!iod->sg)
> +		return BLK_STS_RESOURCE;
>   
>   	iod->use_sgl = nvme_pci_use_sgls(dev, req);
>   
> @@ -1619,7 +1597,7 @@ static int nvme_alloc_admin_tags(struct nvme_dev *dev)
>   		dev->admin_tagset.queue_depth = NVME_AQ_MQ_TAG_DEPTH;
>   		dev->admin_tagset.timeout = ADMIN_TIMEOUT;
>   		dev->admin_tagset.numa_node = dev_to_node(dev->dev);
> -		dev->admin_tagset.cmd_size = nvme_pci_cmd_size(dev, false);
> +		dev->admin_tagset.cmd_size = sizeof(struct nvme_iod);
>   		dev->admin_tagset.flags = BLK_MQ_F_NO_SCHED;
>   		dev->admin_tagset.driver_data = dev;
>   
> @@ -2266,11 +2244,7 @@ static int nvme_dev_add(struct nvme_dev *dev)
>   		dev->tagset.numa_node = dev_to_node(dev->dev);
>   		dev->tagset.queue_depth =
>   				min_t(int, dev->q_depth, BLK_MQ_MAX_DEPTH) - 1;
> -		dev->tagset.cmd_size = nvme_pci_cmd_size(dev, false);
> -		if ((dev->ctrl.sgls & ((1 << 0) | (1 << 1))) && sgl_threshold) {
> -			dev->tagset.cmd_size = max(dev->tagset.cmd_size,
> -					nvme_pci_cmd_size(dev, true));
> -		}
> +		dev->tagset.cmd_size = sizeof(struct nvme_iod);
>   		dev->tagset.flags = BLK_MQ_F_SHOULD_MERGE;
>   		dev->tagset.driver_data = dev;
>   
>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni at wdc.com>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH 13/15] nvme-pci: optimize mapping of small single segment requests
  2019-03-21 23:10 ` [PATCH 13/15] nvme-pci: optimize mapping of small single segment requests hch
@ 2019-03-25  5:36   ` Chaitanya.Kulkarni
  0 siblings, 0 replies; 45+ messages in thread
From: Chaitanya.Kulkarni @ 2019-03-25  5:36 UTC (permalink / raw)


On 3/21/19 4:12 PM, Christoph Hellwig wrote:
> If a request is single segment and fits into one or two PRP entries we
> do not have to create a scatterlist for it, but can just map the bio_vec
> directly.
> 
> Signed-off-by: Christoph Hellwig <hch at lst.de>
> ---
>   drivers/nvme/host/pci.c | 45 ++++++++++++++++++++++++++++++++++++-----
>   1 file changed, 40 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index c6047935e825..47fc4d653961 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -221,6 +221,7 @@ struct nvme_iod {
>   	int npages;		/* In the PRP list. 0 means small pool in use */
>   	int nents;		/* Used in scatterlist */
>   	dma_addr_t first_dma;
> +	unsigned int dma_len;	/* length of single DMA segment mapping */
>   	dma_addr_t meta_dma;
>   	struct scatterlist *sg;
>   };
> @@ -576,13 +577,18 @@ static void nvme_unmap_data(struct nvme_dev *dev, struct request *req)
>   	dma_addr_t dma_addr = iod->first_dma, next_dma_addr;
>   	int i;
>   
> -	if (iod->nents) {
> -		/* P2PDMA requests do not need to be unmapped */
> -		if (!is_pci_p2pdma_page(sg_page(iod->sg)))
> -			dma_unmap_sg(dev->dev, iod->sg, iod->nents, dma_dir);
> -
> +	if (iod->dma_len) {
> +		dma_unmap_page(dev->dev, dma_addr, iod->dma_len, dma_dir);
> +		return;
>   	}
>   
> +	WARN_ON_ONCE(!iod->nents);
> +
> +	/* P2PDMA requests do not need to be unmapped */
> +	if (!is_pci_p2pdma_page(sg_page(iod->sg)))
> +		dma_unmap_sg(dev->dev, iod->sg, iod->nents, rq_dma_dir(req));
> +
> +
>   	if (iod->npages == 0)
>   		dma_pool_free(dev->prp_small_pool, nvme_pci_iod_list(req)[0],
>   			dma_addr);
> @@ -795,6 +801,24 @@ static blk_status_t nvme_pci_setup_sgls(struct nvme_dev *dev,
>   	return BLK_STS_OK;
>   }
>   
> +static blk_status_t nvme_setup_prp_simple(struct nvme_dev *dev,
> +		struct request *req, struct nvme_rw_command *cmnd,
> +		struct bio_vec *bv)
> +{
> +	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
> +	unsigned int first_prp_len = dev->ctrl.page_size - bv->bv_offset;
> +
> +	iod->first_dma = dma_map_bvec(dev->dev, bv, rq_dma_dir(req), 0);
> +	if (dma_mapping_error(dev->dev, iod->first_dma))
> +		return BLK_STS_RESOURCE;
> +	iod->dma_len = bv->bv_len;
> +
> +	cmnd->dptr.prp1 = cpu_to_le64(iod->first_dma);
> +	if (bv->bv_len > first_prp_len)
> +		cmnd->dptr.prp2 = cpu_to_le64(iod->first_dma + first_prp_len);
> +	return 0;
> +}
> +
>   static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
>   		struct nvme_command *cmnd)
>   {
> @@ -805,6 +829,17 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
>   	blk_status_t ret = BLK_STS_IOERR;
>   	int nr_mapped;
>   
> +	if (blk_rq_nr_phys_segments(req) == 1) {
> +		struct bio_vec bv = req_bvec(req);
> +
> +		if (!is_pci_p2pdma_page(bv.bv_page)) {
> +			if (bv.bv_offset + bv.bv_len <= dev->ctrl.page_size * 2)
> +				return nvme_setup_prp_simple(dev, req,
> +							     &cmnd->rw, &bv);
> +		}
> +	}
> +
> +	iod->dma_len = 0;
>   	iod->sg = mempool_alloc(dev->iod_mempool, GFP_ATOMIC);
>   	if (!iod->sg)
>   		return BLK_STS_RESOURCE;
> 
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni at wdc.com>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH 14/15] nvme-pci: optimize mapping single segment requests using SGLs
  2019-03-21 23:10 ` [PATCH 14/15] nvme-pci: optimize mapping single segment requests using SGLs hch
@ 2019-03-25  5:39   ` Chaitanya.Kulkarni
  2019-04-30 14:17   ` klaus
  1 sibling, 0 replies; 45+ messages in thread
From: Chaitanya.Kulkarni @ 2019-03-25  5:39 UTC (permalink / raw)


On 3/21/19 4:12 PM, Christoph Hellwig wrote:
> If the controller supports SGLs we can take another short cut for single
> segment request, given that we can always map those without another
> indirection structure, and thus don't need to create a scatterlist
> structure.
> 
> Signed-off-by: Christoph Hellwig <hch at lst.de>
> ---
>   drivers/nvme/host/pci.c | 22 ++++++++++++++++++++++
>   1 file changed, 22 insertions(+)
> 
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index 47fc4d653961..38869f59c296 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -819,6 +819,23 @@ static blk_status_t nvme_setup_prp_simple(struct nvme_dev *dev,
>   	return 0;
>   }
>   
> +static blk_status_t nvme_setup_sgl_simple(struct nvme_dev *dev,
> +		struct request *req, struct nvme_rw_command *cmnd,
> +		struct bio_vec *bv)
> +{
> +	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
> +
> +	iod->first_dma = dma_map_bvec(dev->dev, bv, rq_dma_dir(req), 0);
> +	if (dma_mapping_error(dev->dev, iod->first_dma))
> +		return BLK_STS_RESOURCE;
> +	iod->dma_len = bv->bv_len;
> +
> +	cmnd->dptr.sgl.addr = cpu_to_le64(iod->first_dma);
> +	cmnd->dptr.sgl.length = cpu_to_le32(iod->dma_len);
> +	cmnd->dptr.sgl.type = NVME_SGL_FMT_DATA_DESC << 4;
> +	return 0;
> +}
> +
>   static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
>   		struct nvme_command *cmnd)
>   {
> @@ -836,6 +853,11 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
>   			if (bv.bv_offset + bv.bv_len <= dev->ctrl.page_size * 2)
>   				return nvme_setup_prp_simple(dev, req,
>   							     &cmnd->rw, &bv);
> +
> +			if (iod->nvmeq->qid &&
> +			    dev->ctrl.sgls & ((1 << 0) | (1 << 1)))
> +				return nvme_setup_sgl_simple(dev, req,
> +							     &cmnd->rw, &bv);
>   		}
>   	}
>   
> 

Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni at wdc.com>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH 15/15] nvme-pci: tidy up nvme_map_data
  2019-03-21 23:10 ` [PATCH 15/15] nvme-pci: tidy up nvme_map_data hch
@ 2019-03-25  5:40   ` Chaitanya.Kulkarni
  0 siblings, 0 replies; 45+ messages in thread
From: Chaitanya.Kulkarni @ 2019-03-25  5:40 UTC (permalink / raw)


On 3/21/19 4:12 PM, Christoph Hellwig wrote:
> Remove two pointless local variables, remove ret assignment that is
> never used, move the use_sgl initialization closer to where it is used.
> 
> Signed-off-by: Christoph Hellwig <hch at lst.de>
> ---
>   drivers/nvme/host/pci.c | 17 +++++------------
>   1 file changed, 5 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index 38869f59c296..9ec0704d5f78 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -840,10 +840,7 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
>   		struct nvme_command *cmnd)
>   {
>   	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
> -	struct request_queue *q = req->q;
> -	enum dma_data_direction dma_dir = rq_data_dir(req) ?
> -			DMA_TO_DEVICE : DMA_FROM_DEVICE;
> -	blk_status_t ret = BLK_STS_IOERR;
> +	blk_status_t ret = BLK_STS_RESOURCE;
>   	int nr_mapped;
>   
>   	if (blk_rq_nr_phys_segments(req) == 1) {
> @@ -865,25 +862,21 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
>   	iod->sg = mempool_alloc(dev->iod_mempool, GFP_ATOMIC);
>   	if (!iod->sg)
>   		return BLK_STS_RESOURCE;
> -
> -	iod->use_sgl = nvme_pci_use_sgls(dev, req);
> -
>   	sg_init_table(iod->sg, blk_rq_nr_phys_segments(req));
> -	iod->nents = blk_rq_map_sg(q, req, iod->sg);
> +	iod->nents = blk_rq_map_sg(req->q, req, iod->sg);
>   	if (!iod->nents)
>   		goto out;
>   
> -	ret = BLK_STS_RESOURCE;
> -
>   	if (is_pci_p2pdma_page(sg_page(iod->sg)))
>   		nr_mapped = pci_p2pdma_map_sg(dev->dev, iod->sg, iod->nents,
> -					  dma_dir);
> +					      rq_dma_dir(req));
>   	else
>   		nr_mapped = dma_map_sg_attrs(dev->dev, iod->sg, iod->nents,
> -					     dma_dir,  DMA_ATTR_NO_WARN);
> +					     rq_dma_dir(req), DMA_ATTR_NO_WARN);
>   	if (!nr_mapped)
>   		goto out;
>   
> +	iod->use_sgl = nvme_pci_use_sgls(dev, req);
>   	if (iod->use_sgl)
>   		ret = nvme_pci_setup_sgls(dev, req, &cmnd->rw, nr_mapped);
>   	else
>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni at wdc.com>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH 01/15] block: add a req_bvec helper
  2019-03-25  5:07   ` Chaitanya.Kulkarni
@ 2019-03-27 14:16     ` hch
  0 siblings, 0 replies; 45+ messages in thread
From: hch @ 2019-03-27 14:16 UTC (permalink / raw)


On Mon, Mar 25, 2019@05:07:44AM +0000, Chaitanya Kulkarni wrote:
> > +static inline struct bio_vec req_bvec(struct request *rq)
> > +{
> > +	if (rq->rq_flags & RQF_SPECIAL_PAYLOAD)
> > +		return rq->special_vec;
> Quick question here, mostly for my understanding: do we also have to
> check nr_phys_segments here for operations such as write-zeroes, or
> may this never get called in a write-zeroes context, or is it not
> applicable?

We assume the caller checks that we have nr_phys_segments.  I'll
add a comment to document that.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH 02/15] block: add a rq_integrity_vec helper
  2019-03-25  5:10   ` Chaitanya.Kulkarni
@ 2019-03-27 14:19     ` hch
  0 siblings, 0 replies; 45+ messages in thread
From: hch @ 2019-03-27 14:19 UTC (permalink / raw)


On Mon, Mar 25, 2019@05:10:20AM +0000, Chaitanya Kulkarni wrote:
> > +/*
> > + * Return the first bvec that contains integrity data.  In general only
> > + * drivers that are limited to a single integrity segment should use this
> > + * helper.
> > + */
> > +static inline struct bio_vec *rq_integrity_vec(struct request *rq)
> > +{
> Wrt the comment, should we add a check here to make sure the underlying
> driver is limited to a single integrity segment?

I've added a WARN_ON_ONCE.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH 03/15] block: add a rq_dma_dir helper
  2019-03-22 13:06   ` jthumshirn
@ 2019-03-27 14:20     ` hch
  2019-03-28 10:26       ` jthumshirn
  0 siblings, 1 reply; 45+ messages in thread
From: hch @ 2019-03-27 14:20 UTC (permalink / raw)


On Fri, Mar 22, 2019@02:06:43PM +0100, Johannes Thumshirn wrote:
> On 22/03/2019 00:10, Christoph Hellwig wrote:
> > In a lot of places we want to know the DMA direction for a given
> > struct request.  Add a little helper to make it a little easier.
> 
> You introduce the helper here, but its first use is in 8/15.
> 
> Could you convert the possible users to use the new helper in this patch
> as well?

I'd rather not throw that into an nvme series.  Those are easy cleanups
for later.  Maybe we can even encourage people that would otherwise
do whitespace cleanups or incorrect refcount_t conversions to take care
of it :)

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH 06/15] nvme-pci: remove nvme_init_iod
  2019-03-25  5:19   ` Chaitanya.Kulkarni
@ 2019-03-27 14:21     ` hch
  0 siblings, 0 replies; 45+ messages in thread
From: hch @ 2019-03-27 14:21 UTC (permalink / raw)


On Mon, Mar 25, 2019@05:19:34AM +0000, Chaitanya Kulkarni wrote:
> > @@ -913,9 +902,14 @@ static blk_status_t nvme_queue_rq(struct blk_mq_hw_ctx *hctx,
> >   	struct nvme_queue *nvmeq = hctx->driver_data;
> >   	struct nvme_dev *dev = nvmeq->dev;
> >   	struct request *req = bd->rq;
> > +	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
> >   	struct nvme_command cmnd;
> >   	blk_status_t ret;
> >   
> > +	iod->aborted = 0;
> > +	iod->npages = -1;
> > +	iod->nents = 0;
> > +
> Maybe add an inline helper for the above default IOD initialization?

Why?  It is a few lines of code and we only ever do it here.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [RFC] optimize nvme single segment I/O
  2019-03-22 15:44 ` [RFC] optimize nvme single segment I/O axboe
@ 2019-03-27 14:24   ` hch
  0 siblings, 0 replies; 45+ messages in thread
From: hch @ 2019-03-27 14:24 UTC (permalink / raw)


FYI, I've pulled the series including the block helpers into nvme-5.2,
with a few tiny changes for the review comments from Chaitanya.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH 03/15] block: add a rq_dma_dir helper
  2019-03-27 14:20     ` hch
@ 2019-03-28 10:26       ` jthumshirn
  0 siblings, 0 replies; 45+ messages in thread
From: jthumshirn @ 2019-03-28 10:26 UTC (permalink / raw)


On 27/03/2019 15:20, Christoph Hellwig wrote:
[...]
> 
> I'd rather not throw that into a nvme series.  Those are easily cleanups
> for later.  Maybe we can even encourage people that would otherwise
> do whitespace cleanups or incorrect refcount_t conversions to take care
> of it :)

Fair enough.

Byte,
	Johannes
-- 
Johannes Thumshirn                            SUSE Labs Filesystems
jthumshirn at suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH 14/15] nvme-pci: optimize mapping single segment requests using SGLs
  2019-03-21 23:10 ` [PATCH 14/15] nvme-pci: optimize mapping single segment requests using SGLs hch
  2019-03-25  5:39   ` Chaitanya.Kulkarni
@ 2019-04-30 14:17   ` klaus
  2019-04-30 14:32     ` hch
  1 sibling, 1 reply; 45+ messages in thread
From: klaus @ 2019-04-30 14:17 UTC (permalink / raw)


On Thu, Mar 21, 2019@04:10:36PM -0700, Christoph Hellwig wrote:
> If the controller supports SGLs we can take another short cut for single
> segment request, given that we can always map those without another
> indirection structure, and thus don't need to create a scatterlist
> structure.
> 
> Signed-off-by: Christoph Hellwig <hch at lst.de>
> ---
>  drivers/nvme/host/pci.c | 22 ++++++++++++++++++++++
>  1 file changed, 22 insertions(+)
> 
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index 47fc4d653961..38869f59c296 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -819,6 +819,23 @@ static blk_status_t nvme_setup_prp_simple(struct nvme_dev *dev,
>  	return 0;
>  }
>  
> +static blk_status_t nvme_setup_sgl_simple(struct nvme_dev *dev,
> +		struct request *req, struct nvme_rw_command *cmnd,
> +		struct bio_vec *bv)
> +{
> +	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
> +
> +	iod->first_dma = dma_map_bvec(dev->dev, bv, rq_dma_dir(req), 0);
> +	if (dma_mapping_error(dev->dev, iod->first_dma))
> +		return BLK_STS_RESOURCE;
> +	iod->dma_len = bv->bv_len;
> +
> +	cmnd->dptr.sgl.addr = cpu_to_le64(iod->first_dma);
> +	cmnd->dptr.sgl.length = cpu_to_le32(iod->dma_len);
> +	cmnd->dptr.sgl.type = NVME_SGL_FMT_DATA_DESC << 4;
> +	return 0;
> +}
> +
>  static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
>  		struct nvme_command *cmnd)
>  {
> @@ -836,6 +853,11 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
>  			if (bv.bv_offset + bv.bv_len <= dev->ctrl.page_size * 2)
>  				return nvme_setup_prp_simple(dev, req,
>  							     &cmnd->rw, &bv);
> +
> +			if (iod->nvmeq->qid &&
> +			    dev->ctrl.sgls & ((1 << 0) | (1 << 1)))
> +				return nvme_setup_sgl_simple(dev, req,
> +							     &cmnd->rw, &bv);
>  		}
>  	}
>  
> -- 
> 2.20.1
> 

Hi Christoph,

nvme_setup_sgl_simple does not set the PSDT field, so bypassing
nvme_pci_setup_sgls causes controllers to assume PRPs.

Adding `cmnd->flags = NVME_CMD_SGL_METABUF` in nvme_setup_sgl_simple
fixes the issue.


-- 
Klaus

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH 14/15] nvme-pci: optimize mapping single segment requests using SGLs
  2019-04-30 14:17   ` klaus
@ 2019-04-30 14:32     ` hch
  0 siblings, 0 replies; 45+ messages in thread
From: hch @ 2019-04-30 14:32 UTC (permalink / raw)


On Tue, Apr 30, 2019@04:17:22PM +0200, Klaus Birkelund wrote:
> nvme_setup_sgl_simple does not set the PSDT field, so bypassing
> nvme_pci_setup_sgls causes controllers to assume PRPs.
> 
> Adding `cmnd->flags = NVME_CMD_SGL_METABUF` in nvme_setup_sgl_simple
> fixes the issue.

Indeed.  Can you send the patch?

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 10/15] nvme-pci: do not build a scatterlist to map metadata
       [not found]   ` <20190828092057.GA15524@ming.t460p>
@ 2019-09-12  1:02     ` Ming Lei
  2019-09-12  8:20       ` Christoph Hellwig
  0 siblings, 1 reply; 45+ messages in thread
From: Ming Lei @ 2019-09-12  1:02 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Keith Busch, Sagi Grimberg, Gopal Tiwari, linux-nvme, Jens Axboe,
	dmilburn, linux-block

On Wed, Aug 28, 2019 at 05:20:57PM +0800, Ming Lei wrote:
> On Thu, Mar 21, 2019 at 04:10:32PM -0700, Christoph Hellwig wrote:
> > We always have exactly one segment, so we can simply call dma_map_bvec.
> > 
> > Signed-off-by: Christoph Hellwig <hch@lst.de>
> > ---
> >  drivers/nvme/host/pci.c | 23 ++++++++++-------------
> >  1 file changed, 10 insertions(+), 13 deletions(-)
> > 
> > diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> > index bc4ee869fe82..a7dad24e0406 100644
> > --- a/drivers/nvme/host/pci.c
> > +++ b/drivers/nvme/host/pci.c
> > @@ -221,7 +221,7 @@ struct nvme_iod {
> >  	int npages;		/* In the PRP list. 0 means small pool in use */
> >  	int nents;		/* Used in scatterlist */
> >  	dma_addr_t first_dma;
> > -	struct scatterlist meta_sg; /* metadata requires single contiguous buffer */
> > +	dma_addr_t meta_dma;
> >  	struct scatterlist *sg;
> >  	struct scatterlist inline_sg[0];
> >  };
> > @@ -592,13 +592,16 @@ static void nvme_unmap_data(struct nvme_dev *dev, struct request *req)
> >  	dma_addr_t dma_addr = iod->first_dma, next_dma_addr;
> >  	int i;
> >  
> > +	if (blk_integrity_rq(req)) {
> > +		dma_unmap_page(dev->dev, iod->meta_dma,
> > +				rq_integrity_vec(req)->bv_len, dma_dir);
> > +	}
> > +
> >  	if (iod->nents) {
> >  		/* P2PDMA requests do not need to be unmapped */
> >  		if (!is_pci_p2pdma_page(sg_page(iod->sg)))
> >  			dma_unmap_sg(dev->dev, iod->sg, iod->nents, dma_dir);
> >  
> > -		if (blk_integrity_rq(req))
> > -			dma_unmap_sg(dev->dev, &iod->meta_sg, 1, dma_dir);
> >  	}
> >  
> >  	if (iod->npages == 0)
> > @@ -861,17 +864,11 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
> >  
> >  	ret = BLK_STS_IOERR;
> >  	if (blk_integrity_rq(req)) {
> > -		if (blk_rq_count_integrity_sg(q, req->bio) != 1)
> > -			goto out;
> > -
> > -		sg_init_table(&iod->meta_sg, 1);
> > -		if (blk_rq_map_integrity_sg(q, req->bio, &iod->meta_sg) != 1)
> > -			goto out;
> > -
> > -		if (!dma_map_sg(dev->dev, &iod->meta_sg, 1, dma_dir))
> > +		iod->meta_dma = dma_map_bvec(dev->dev, rq_integrity_vec(req),
> > +				dma_dir, 0);
> 
> Hi Christoph,
> 
> When one bio is big enough, the generated integrity data can cross
> more than one page even though the data is still a single segment.
> 
> However, we don't convert to multi-page bvecs for bio_integrity_prep(),
> and each page may consume one bvec, so is it possible for this patch to
> cause issues with NVMe's protection information, since it assumes that
> there is only one bvec for the integrity data?
> 
> BTW, I have not seen such a report, just a concern in theory.

Hello Christoph,

Gently ping...


Thanks,
Ming

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 10/15] nvme-pci: do not build a scatterlist to map metadata
  2019-09-12  1:02     ` Ming Lei
@ 2019-09-12  8:20       ` Christoph Hellwig
  0 siblings, 0 replies; 45+ messages in thread
From: Christoph Hellwig @ 2019-09-12  8:20 UTC (permalink / raw)
  To: Ming Lei
  Cc: Keith Busch, Sagi Grimberg, Gopal Tiwari, linux-nvme, Jens Axboe,
	dmilburn, linux-block, Christoph Hellwig

This is in my todo queue, which is big..


^ permalink raw reply	[flat|nested] 45+ messages in thread

end of thread, back to index

Thread overview: 45+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-03-21 23:10 [RFC] optimize nvme single segment I/O hch
2019-03-21 23:10 ` [PATCH 01/15] block: add a req_bvec helper hch
2019-03-25  5:07   ` Chaitanya.Kulkarni
2019-03-27 14:16     ` hch
2019-03-21 23:10 ` [PATCH 02/15] block: add a rq_integrity_vec helper hch
2019-03-25  5:10   ` Chaitanya.Kulkarni
2019-03-27 14:19     ` hch
2019-03-21 23:10 ` [PATCH 03/15] block: add a rq_dma_dir helper hch
2019-03-22 13:06   ` jthumshirn
2019-03-27 14:20     ` hch
2019-03-28 10:26       ` jthumshirn
2019-03-25  5:11   ` Chaitanya.Kulkarni
2019-03-21 23:10 ` [PATCH 04/15] block: add dma_map_bvec helper hch
2019-03-25  5:13   ` Chaitanya.Kulkarni
2019-03-21 23:10 ` [PATCH 05/15] nvme-pci: remove the unused iod->length field hch
2019-03-25  5:14   ` Chaitanya.Kulkarni
2019-03-21 23:10 ` [PATCH 06/15] nvme-pci: remove nvme_init_iod hch
2019-03-25  5:19   ` Chaitanya.Kulkarni
2019-03-27 14:21     ` hch
2019-03-21 23:10 ` [PATCH 07/15] nvme-pci: move the call to nvme_cleanup_cmd out of nvme_unmap_data hch
2019-03-25  5:21   ` Chaitanya.Kulkarni
2019-03-21 23:10 ` [PATCH 08/15] nvme-pci: merge nvme_free_iod into nvme_unmap_data hch
2019-03-25  5:22   ` Chaitanya.Kulkarni
2019-03-21 23:10 ` [PATCH 09/15] nvme-pci: only call nvme_unmap_data for requests transferring data hch
2019-03-25  5:23   ` Chaitanya.Kulkarni
2019-03-21 23:10 ` [PATCH 10/15] nvme-pci: do not build a scatterlist to map metadata hch
2019-03-25  5:27   ` Chaitanya.Kulkarni
     [not found]   ` <20190828092057.GA15524@ming.t460p>
2019-09-12  1:02     ` Ming Lei
2019-09-12  8:20       ` Christoph Hellwig
2019-03-21 23:10 ` [PATCH 11/15] nvme-pci: split metadata handling from nvme_map_data / nvme_unmap_data hch
2019-03-25  5:29   ` Chaitanya.Kulkarni
2019-03-21 23:10 ` [PATCH 12/15] nvme-pci: remove the inline scatterlist optimization hch
2019-03-25  5:30   ` Chaitanya.Kulkarni
2019-03-21 23:10 ` [PATCH 13/15] nvme-pci: optimize mapping of small single segment requests hch
2019-03-25  5:36   ` Chaitanya.Kulkarni
2019-03-21 23:10 ` [PATCH 14/15] nvme-pci: optimize mapping single segment requests using SGLs hch
2019-03-25  5:39   ` Chaitanya.Kulkarni
2019-04-30 14:17   ` klaus
2019-04-30 14:32     ` hch
2019-03-21 23:10 ` [PATCH 15/15] nvme-pci: tidy up nvme_map_data hch
2019-03-25  5:40   ` Chaitanya.Kulkarni
2019-03-22 15:44 ` [RFC] optimize nvme single segment I/O axboe
2019-03-27 14:24   ` hch
2019-03-22 17:37 ` kbusch
2019-03-22 18:55 ` sagi
