linux-nvme.lists.infradead.org archive mirror
* [PATCH V2 0/2] nvme: fix iod size calculation in nvme_probe()
@ 2020-07-09 23:40 Chaitanya Kulkarni
  2020-07-09 23:40 ` [PATCH V2 1/2] nvme-core: replace ctrl page size with a macro Chaitanya Kulkarni
  2020-07-09 23:40 ` [PATCH V2 2/2] nvme-pci: use max of PRP or SGL for iod size Chaitanya Kulkarni
  0 siblings, 2 replies; 10+ messages in thread
From: Chaitanya Kulkarni @ 2020-07-09 23:40 UTC (permalink / raw)
  To: hch, kbusch, sagi; +Cc: Chaitanya Kulkarni, linux-nvme

Hi,

This is a small patch series which fixes the iod size calculation in
nvme_probe().

The first patch replaces ctrl->page_size with a macro. The second
patch calculates the iod allocation size as the maximum of the NVMe PRP
and SGL sizes.
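
Roughly (a sketch only; the actual change is in patch 2 below), the iod
allocation size becomes the worst case of the two descriptor formats:

	alloc_size = sizeof(__le64 *) *
		max_t(size_t, nvme_pci_npages_sgl(nseg), nvme_npages(size, dev));
	alloc_size += sizeof(struct scatterlist) * nseg;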

I've tested this series with different block sizes (4k-128k) on NVMe
QEMU and on an NVMe PCIe (non-SGL) controller.

Regards,
Chaitanya

* Changes from V1:-
-------------------

1. Remove ctrl->page_size and use a macro instead.
2. Get rid of the conditional operator and use max_t() for the SGL vs
   PRP size calculation.

Chaitanya Kulkarni (2):
  nvme-core: replace ctrl page size with a macro
  nvme-pci: use max of PRP or SGL for iod size

 drivers/nvme/host/core.c | 19 +++++--------
 drivers/nvme/host/nvme.h |  9 ++++++-
 drivers/nvme/host/pci.c  | 58 +++++++++++++++++++---------------------
 3 files changed, 42 insertions(+), 44 deletions(-)

-- 
2.26.0


* [PATCH V2 1/2] nvme-core: replace ctrl page size with a macro
  2020-07-09 23:40 [PATCH V2 0/2] nvme: fix iod size calculation in nvme_probe() Chaitanya Kulkarni
@ 2020-07-09 23:40 ` Chaitanya Kulkarni
  2020-07-10 14:57   ` Keith Busch
  2020-07-13  7:42   ` Christoph Hellwig
  2020-07-09 23:40 ` [PATCH V2 2/2] nvme-pci: use max of PRP or SGL for iod size Chaitanya Kulkarni
  1 sibling, 2 replies; 10+ messages in thread
From: Chaitanya Kulkarni @ 2020-07-09 23:40 UTC (permalink / raw)
  To: hch, kbusch, sagi; +Cc: Chaitanya Kulkarni, linux-nvme

In nvme_probe(), when calculating the mempool size, nvme_npages() uses
dev->ctrl.page_size, which is initialized in the following code path:-

Ctx insmod :
nvme_probe()
 nvme_reset_ctrl()
  queue_work(nvme_reset_work()

Ctx Workqueue :
nvme_reset_work()
 nvme_pci_configure_admin_queue()
  nvme_enable_ctrl()
   ctrl->page_size = 1 << NVME_CTRL_PAGE_SHIFT.

When nvme_pci_iod_alloc_size() is called with false as the last argument,
it results in the following oops, since dev->ctrl.page_size is used
before we set it in nvme_enable_ctrl() in the above path:-

Entering kdb (current=0xffff88881fc0c980, pid 339) on processor 0 Oops: (null)
due to oops @ 0xffffffffa05f1723
CPU: 0 PID: 339 Comm: kworker/0:2 Tainted: G        W  OE     5.8.0-rc1nvme-5.9+ #20
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.4Workqueue: events work_for_cpu_fn
RIP: 0010:nvme_probe+0x263/0x502 [nvme]
Code: 00 00 66 41 81 7c 24 3c 4d 14 0f 84 98 01 00 00 3d 0f 1e 01 00 0f 84 aa 01 00 00 8b 8b a0 0c 00 2
RSP: 0018:ffffc90000d9bdd8 EFLAGS: 00010246
RAX: 0000000000000fff RBX: ffff8887da128000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000246
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: ffff8887c29f5570 R12: ffff888813c37000
R13: 0000000000000202 R14: 0000000fffffffe0 R15: ffff888813c370b0
FS:  0000000000000000(0000) GS:ffff88880fe00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f23185131a0 CR3: 0000000811c54000 CR4: 00000000003406f0
Call Trace:
 local_pci_probe+0x42/0x80
 work_for_cpu_fn+0x16/0x20
 process_one_work+0x24e/0x5a0
 ? __schedule+0x353/0x840
 worker_thread+0x1d5/0x380
 ? process_one_work+0x5a0/0x5a0
 kthread+0x135/0x150
 ? kthread_create_on_node+0x60/0x60
 ret_from_fork+0x22/0x30

This patch removes the dev->ctrl.page_size variable and replaces it with
a macro, which avoids the above oops.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
---
 drivers/nvme/host/core.c | 19 ++++++-----------
 drivers/nvme/host/nvme.h |  9 +++++++-
 drivers/nvme/host/pci.c  | 45 ++++++++++++++++++++--------------------
 3 files changed, 36 insertions(+), 37 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 3d00ea4e7146..b9b4aa7b53ce 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2345,12 +2345,7 @@ EXPORT_SYMBOL_GPL(nvme_disable_ctrl);
 
 int nvme_enable_ctrl(struct nvme_ctrl *ctrl)
 {
-	/*
-	 * Default to a 4K page size, with the intention to update this
-	 * path in the future to accomodate architectures with differing
-	 * kernel and IO page sizes.
-	 */
-	unsigned dev_page_min, page_shift = 12;
+	unsigned dev_page_min;
 	int ret;
 
 	ret = ctrl->ops->reg_read64(ctrl, NVME_REG_CAP, &ctrl->cap);
@@ -2360,20 +2355,18 @@ int nvme_enable_ctrl(struct nvme_ctrl *ctrl)
 	}
 	dev_page_min = NVME_CAP_MPSMIN(ctrl->cap) + 12;
 
-	if (page_shift < dev_page_min) {
+	if (NVME_CTRL_PAGE_SHIFT < dev_page_min) {
 		dev_err(ctrl->device,
 			"Minimum device page size %u too large for host (%u)\n",
-			1 << dev_page_min, 1 << page_shift);
+			1 << dev_page_min, 1 << NVME_CTRL_PAGE_SHIFT);
 		return -ENODEV;
 	}
 
-	ctrl->page_size = 1 << page_shift;
-
 	if (NVME_CAP_CSS(ctrl->cap) & NVME_CAP_CSS_CSI)
 		ctrl->ctrl_config = NVME_CC_CSS_CSI;
 	else
 		ctrl->ctrl_config = NVME_CC_CSS_NVM;
-	ctrl->ctrl_config |= (page_shift - 12) << NVME_CC_MPS_SHIFT;
+	ctrl->ctrl_config |= (NVME_CTRL_PAGE_SHIFT - 12) << NVME_CC_MPS_SHIFT;
 	ctrl->ctrl_config |= NVME_CC_AMS_RR | NVME_CC_SHN_NONE;
 	ctrl->ctrl_config |= NVME_CC_IOSQES | NVME_CC_IOCQES;
 	ctrl->ctrl_config |= NVME_CC_ENABLE;
@@ -2423,13 +2416,13 @@ static void nvme_set_queue_limits(struct nvme_ctrl *ctrl,
 
 	if (ctrl->max_hw_sectors) {
 		u32 max_segments =
-			(ctrl->max_hw_sectors / (ctrl->page_size >> 9)) + 1;
+			(ctrl->max_hw_sectors / (NVME_CTRL_PAGE_SIZE >> 9)) + 1;
 
 		max_segments = min_not_zero(max_segments, ctrl->max_segments);
 		blk_queue_max_hw_sectors(q, ctrl->max_hw_sectors);
 		blk_queue_max_segments(q, min_t(u32, max_segments, USHRT_MAX));
 	}
-	blk_queue_virt_boundary(q, ctrl->page_size - 1);
+	blk_queue_virt_boundary(q, NVME_CTRL_PAGE_SIZE - 1);
 	blk_queue_dma_alignment(q, 7);
 	if (ctrl->vwc & NVME_CTRL_VWC_PRESENT)
 		vwc = true;
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 13ca90bcd352..25dae702e0a2 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -37,6 +37,14 @@ extern unsigned int admin_timeout;
 #define  NVME_INLINE_METADATA_SG_CNT  1
 #endif
 
+/*
+ * Default to a 4K page size, with the intention to update this
+ * path in the future to accommodate architectures with differing
+ * kernel and IO page sizes.
+ */
+#define NVME_CTRL_PAGE_SHIFT	12
+#define NVME_CTRL_PAGE_SIZE	4096
+
 extern struct workqueue_struct *nvme_wq;
 extern struct workqueue_struct *nvme_reset_wq;
 extern struct workqueue_struct *nvme_delete_wq;
@@ -234,7 +242,6 @@ struct nvme_ctrl {
 	u32 queue_count;
 
 	u64 cap;
-	u32 page_size;
 	u32 max_hw_sectors;
 	u32 max_segments;
 	u32 max_integrity_segments;
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 45e94f016ec2..68f7c090cf51 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -348,8 +348,8 @@ static bool nvme_dbbuf_update_and_check_event(u16 value, u32 *dbbuf_db,
  */
 static int nvme_npages(unsigned size, struct nvme_dev *dev)
 {
-	unsigned nprps = DIV_ROUND_UP(size + dev->ctrl.page_size,
-				      dev->ctrl.page_size);
+	unsigned nprps = DIV_ROUND_UP(size + NVME_CTRL_PAGE_SIZE,
+						NVME_CTRL_PAGE_SIZE);
 	return DIV_ROUND_UP(8 * nprps, PAGE_SIZE - 8);
 }
 
@@ -515,7 +515,7 @@ static inline bool nvme_pci_use_sgls(struct nvme_dev *dev, struct request *req)
 static void nvme_unmap_data(struct nvme_dev *dev, struct request *req)
 {
 	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
-	const int last_prp = dev->ctrl.page_size / sizeof(__le64) - 1;
+	const int last_prp = NVME_CTRL_PAGE_SIZE / sizeof(__le64) - 1;
 	dma_addr_t dma_addr = iod->first_dma, next_dma_addr;
 	int i;
 
@@ -582,34 +582,33 @@ static blk_status_t nvme_pci_setup_prps(struct nvme_dev *dev,
 	struct scatterlist *sg = iod->sg;
 	int dma_len = sg_dma_len(sg);
 	u64 dma_addr = sg_dma_address(sg);
-	u32 page_size = dev->ctrl.page_size;
-	int offset = dma_addr & (page_size - 1);
+	int offset = dma_addr & (NVME_CTRL_PAGE_SIZE - 1);
 	__le64 *prp_list;
 	void **list = nvme_pci_iod_list(req);
 	dma_addr_t prp_dma;
 	int nprps, i;
 
-	length -= (page_size - offset);
+	length -= (NVME_CTRL_PAGE_SIZE - offset);
 	if (length <= 0) {
 		iod->first_dma = 0;
 		goto done;
 	}
 
-	dma_len -= (page_size - offset);
+	dma_len -= (NVME_CTRL_PAGE_SIZE - offset);
 	if (dma_len) {
-		dma_addr += (page_size - offset);
+		dma_addr += (NVME_CTRL_PAGE_SIZE - offset);
 	} else {
 		sg = sg_next(sg);
 		dma_addr = sg_dma_address(sg);
 		dma_len = sg_dma_len(sg);
 	}
 
-	if (length <= page_size) {
+	if (length <= NVME_CTRL_PAGE_SIZE) {
 		iod->first_dma = dma_addr;
 		goto done;
 	}
 
-	nprps = DIV_ROUND_UP(length, page_size);
+	nprps = DIV_ROUND_UP(length, NVME_CTRL_PAGE_SIZE);
 	if (nprps <= (256 / 8)) {
 		pool = dev->prp_small_pool;
 		iod->npages = 0;
@@ -628,7 +627,7 @@ static blk_status_t nvme_pci_setup_prps(struct nvme_dev *dev,
 	iod->first_dma = prp_dma;
 	i = 0;
 	for (;;) {
-		if (i == page_size >> 3) {
+		if (i == NVME_CTRL_PAGE_SIZE >> 3) {
 			__le64 *old_prp_list = prp_list;
 			prp_list = dma_pool_alloc(pool, GFP_ATOMIC, &prp_dma);
 			if (!prp_list)
@@ -639,9 +638,9 @@ static blk_status_t nvme_pci_setup_prps(struct nvme_dev *dev,
 			i = 1;
 		}
 		prp_list[i++] = cpu_to_le64(dma_addr);
-		dma_len -= page_size;
-		dma_addr += page_size;
-		length -= page_size;
+		dma_len -= NVME_CTRL_PAGE_SIZE;
+		dma_addr += NVME_CTRL_PAGE_SIZE;
+		length -= NVME_CTRL_PAGE_SIZE;
 		if (length <= 0)
 			break;
 		if (dma_len > 0)
@@ -751,8 +750,8 @@ static blk_status_t nvme_setup_prp_simple(struct nvme_dev *dev,
 		struct bio_vec *bv)
 {
 	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
-	unsigned int offset = bv->bv_offset & (dev->ctrl.page_size - 1);
-	unsigned int first_prp_len = dev->ctrl.page_size - offset;
+	unsigned int offset = bv->bv_offset & (NVME_CTRL_PAGE_SIZE - 1);
+	unsigned int first_prp_len = NVME_CTRL_PAGE_SIZE - offset;
 
 	iod->first_dma = dma_map_bvec(dev->dev, bv, rq_dma_dir(req), 0);
 	if (dma_mapping_error(dev->dev, iod->first_dma))
@@ -794,7 +793,7 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
 		struct bio_vec bv = req_bvec(req);
 
 		if (!is_pci_p2pdma_page(bv.bv_page)) {
-			if (bv.bv_offset + bv.bv_len <= dev->ctrl.page_size * 2)
+			if (bv.bv_offset + bv.bv_len <= NVME_CTRL_PAGE_SIZE * 2)
 				return nvme_setup_prp_simple(dev, req,
 							     &cmnd->rw, &bv);
 
@@ -1396,12 +1395,12 @@ static int nvme_cmb_qdepth(struct nvme_dev *dev, int nr_io_queues,
 {
 	int q_depth = dev->q_depth;
 	unsigned q_size_aligned = roundup(q_depth * entry_size,
-					  dev->ctrl.page_size);
+					  NVME_CTRL_PAGE_SIZE);
 
 	if (q_size_aligned * nr_io_queues > dev->cmb_size) {
 		u64 mem_per_q = div_u64(dev->cmb_size, nr_io_queues);
 
-		mem_per_q = round_down(mem_per_q, dev->ctrl.page_size);
+		mem_per_q = round_down(mem_per_q, NVME_CTRL_PAGE_SIZE);
 		q_depth = div_u64(mem_per_q, entry_size);
 
 		/*
@@ -1825,7 +1824,7 @@ static int nvme_set_host_mem(struct nvme_dev *dev, u32 bits)
 	c.features.fid		= cpu_to_le32(NVME_FEAT_HOST_MEM_BUF);
 	c.features.dword11	= cpu_to_le32(bits);
 	c.features.dword12	= cpu_to_le32(dev->host_mem_size >>
-					      ilog2(dev->ctrl.page_size));
+					      ilog2(NVME_CTRL_PAGE_SIZE));
 	c.features.dword13	= cpu_to_le32(lower_32_bits(dma_addr));
 	c.features.dword14	= cpu_to_le32(upper_32_bits(dma_addr));
 	c.features.dword15	= cpu_to_le32(dev->nr_host_mem_descs);
@@ -1845,7 +1844,7 @@ static void nvme_free_host_mem(struct nvme_dev *dev)
 
 	for (i = 0; i < dev->nr_host_mem_descs; i++) {
 		struct nvme_host_mem_buf_desc *desc = &dev->host_mem_descs[i];
-		size_t size = le32_to_cpu(desc->size) * dev->ctrl.page_size;
+		size_t size = le32_to_cpu(desc->size) * NVME_CTRL_PAGE_SIZE;
 
 		dma_free_attrs(dev->dev, size, dev->host_mem_desc_bufs[i],
 			       le64_to_cpu(desc->addr),
@@ -1897,7 +1896,7 @@ static int __nvme_alloc_host_mem(struct nvme_dev *dev, u64 preferred,
 			break;
 
 		descs[i].addr = cpu_to_le64(dma_addr);
-		descs[i].size = cpu_to_le32(len / dev->ctrl.page_size);
+		descs[i].size = cpu_to_le32(len / NVME_CTRL_PAGE_SIZE);
 		i++;
 	}
 
@@ -1913,7 +1912,7 @@ static int __nvme_alloc_host_mem(struct nvme_dev *dev, u64 preferred,
 
 out_free_bufs:
 	while (--i >= 0) {
-		size_t size = le32_to_cpu(descs[i].size) * dev->ctrl.page_size;
+		size_t size = le32_to_cpu(descs[i].size) * NVME_CTRL_PAGE_SIZE;
 
 		dma_free_attrs(dev->dev, size, bufs[i],
 			       le64_to_cpu(descs[i].addr),
-- 
2.26.0


* [PATCH V2 2/2] nvme-pci: use max of PRP or SGL for iod size
  2020-07-09 23:40 [PATCH V2 0/2] nvme: fix iod size calculation in nvme_probe() Chaitanya Kulkarni
  2020-07-09 23:40 ` [PATCH V2 1/2] nvme-core: replace ctrl page size with a macro Chaitanya Kulkarni
@ 2020-07-09 23:40 ` Chaitanya Kulkarni
  2020-07-13  7:42   ` Christoph Hellwig
  1 sibling, 1 reply; 10+ messages in thread
From: Chaitanya Kulkarni @ 2020-07-09 23:40 UTC (permalink / raw)
  To: hch, kbusch, sagi; +Cc: Chaitanya Kulkarni, linux-nvme

Since the initial NVMe SGL support in commit a7a7cbe353a5 ("nvme-pci:
add SGL support") and the addition of commit 943e942e6266 ("nvme-pci:
limit max IO size and segments to avoid high order allocations"), there
is only one caller left for nvme_pci_iod_alloc_size(), and it statically
passes true for the last parameter so that the allocation size is
calculated based on SGLs, since we need the size of the biggest command
supported for the mempool allocation.

This patch modifies the helper function nvme_pci_iod_alloc_size() so
that it now uses the maximum of the PRP and SGL sizes for the iod
allocation size calculation.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
---
 drivers/nvme/host/pci.c | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 68f7c090cf51..8b6792005360 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -363,14 +363,13 @@ static int nvme_pci_npages_sgl(unsigned int num_seg)
 }
 
 static size_t nvme_pci_iod_alloc_size(struct nvme_dev *dev,
-		unsigned int size, unsigned int nseg, bool use_sgl)
+		unsigned int size, unsigned int nseg)
 {
-	size_t alloc_size;
+	size_t npages_sgl = nvme_pci_npages_sgl(nseg);
+	size_t npages = nvme_npages(size, dev);
+	size_t alloc_size = sizeof(__le64 *);
 
-	if (use_sgl)
-		alloc_size = sizeof(__le64 *) * nvme_pci_npages_sgl(nseg);
-	else
-		alloc_size = sizeof(__le64 *) * nvme_npages(size, dev);
+	alloc_size *= max_t(size_t, npages_sgl, npages);
 
 	return alloc_size + sizeof(struct scatterlist) * nseg;
 }
@@ -2812,7 +2811,7 @@ static int nvme_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	 * command we support.
 	 */
 	alloc_size = nvme_pci_iod_alloc_size(dev, NVME_MAX_KB_SZ,
-						NVME_MAX_SEGS, true);
+						NVME_MAX_SEGS);
 	WARN_ON_ONCE(alloc_size > PAGE_SIZE);
 
 	dev->iod_mempool = mempool_create_node(1, mempool_kmalloc,
-- 
2.26.0


* Re: [PATCH V2 1/2] nvme-core: replace ctrl page size with a macro
  2020-07-09 23:40 ` [PATCH V2 1/2] nvme-core: replace ctrl page size with a macro Chaitanya Kulkarni
@ 2020-07-10 14:57   ` Keith Busch
  2020-07-11 18:20     ` Chaitanya Kulkarni
  2020-07-13  7:42   ` Christoph Hellwig
  1 sibling, 1 reply; 10+ messages in thread
From: Keith Busch @ 2020-07-10 14:57 UTC (permalink / raw)
  To: Chaitanya Kulkarni; +Cc: hch, linux-nvme, sagi

On Thu, Jul 09, 2020 at 04:40:24PM -0700, Chaitanya Kulkarni wrote:
> When nvme_pci_iod_alloc_size() is called with false as the last argument,
> it results in the following oops, since dev->ctrl.page_size is used
> before we set it in nvme_enable_ctrl() in the above path:-
> 
> Entering kdb (current=0xffff88881fc0c980, pid 339) on processor 0 Oops: (null)
> due to oops @ 0xffffffffa05f1723
> CPU: 0 PID: 339 Comm: kworker/0:2 Tainted: G        W  OE     5.8.0-rc1nvme-5.9+ #20
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.4Workqueue: events work_for_cpu_fn
> RIP: 0010:nvme_probe+0x263/0x502 [nvme]
> Code: 00 00 66 41 81 7c 24 3c 4d 14 0f 84 98 01 00 00 3d 0f 1e 01 00 0f 84 aa 01 00 00 8b 8b a0 0c 00 2
> RSP: 0018:ffffc90000d9bdd8 EFLAGS: 00010246
> RAX: 0000000000000fff RBX: ffff8887da128000 RCX: 0000000000000000
> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000246
> RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: ffff8887c29f5570 R12: ffff888813c37000
> R13: 0000000000000202 R14: 0000000fffffffe0 R15: ffff888813c370b0
> FS:  0000000000000000(0000) GS:ffff88880fe00000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f23185131a0 CR3: 0000000811c54000 CR4: 00000000003406f0
> Call Trace:
>  local_pci_probe+0x42/0x80
>  work_for_cpu_fn+0x16/0x20
>  process_one_work+0x24e/0x5a0
>  ? __schedule+0x353/0x840
>  worker_thread+0x1d5/0x380
>  ? process_one_work+0x5a0/0x5a0
>  kthread+0x135/0x150
>  ? kthread_create_on_node+0x60/0x60
>  ret_from_fork+0x22/0x30
> 
> This patch removes the dev->ctrl.page_size variable and replaces it with
> a macro, which avoids the above oops.

This makes it sound like we'd actually trigger this bug, but we
currently don't do that. I think the changelog should be more about why
we can remove 'ctrl->page_size', something like:

  Saving the nvme controller's page size was from a time when the driver
  tried to use different sized pages, but this value is always set to
  a constant, and has been this way for some time. Remove the
  'page_size' field and replace its usage with the constant value.

This also lets the compiler make some micro-optimizations in the io
path, and that's always a good thing.


> @@ -1825,7 +1824,7 @@ static int nvme_set_host_mem(struct nvme_dev *dev, u32 bits)
>  	c.features.fid		= cpu_to_le32(NVME_FEAT_HOST_MEM_BUF);
>  	c.features.dword11	= cpu_to_le32(bits);
>  	c.features.dword12	= cpu_to_le32(dev->host_mem_size >>
> -					      ilog2(dev->ctrl.page_size));
> +					      ilog2(NVME_CTRL_PAGE_SIZE));

You could replace the ilog2() with NVME_CTRL_PAGE_SHIFT. Both compile to
the same result, though.
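
I.e., just to illustrate the suggestion:

	c.features.dword12	= cpu_to_le32(dev->host_mem_size >>
					      NVME_CTRL_PAGE_SHIFT);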

* Re: [PATCH V2 1/2] nvme-core: replace ctrl page size with a macro
  2020-07-10 14:57   ` Keith Busch
@ 2020-07-11 18:20     ` Chaitanya Kulkarni
  0 siblings, 0 replies; 10+ messages in thread
From: Chaitanya Kulkarni @ 2020-07-11 18:20 UTC (permalink / raw)
  To: Keith Busch; +Cc: hch, linux-nvme, sagi

On 7/10/20 07:57, Keith Busch wrote:
> This makes it sound like we'd actually trigger this bug, but we
> currently don't do that. I think the changelog should be more about why
> we can remove 'ctrl->page_size', something like:
> 
>    Saving the nvme controller's page size was from a time when the driver
>    tried to use different sized pages, but this value is always set to
>    a constant, and has been this way for some time. Remove the
>    'page_size' field and replace its usage with the constant value.
> 
> This also lets the compiler make some micro-optimizations in the io
> path, and that's always a good thing.
> 
Sure, I felt that since it is not triggered yet, it makes sense to
document what might actually happen with the current code.

I'll use your log.
> 
>> @@ -1825,7 +1824,7 @@ static int nvme_set_host_mem(struct nvme_dev *dev, u32 bits)
>>   	c.features.fid		= cpu_to_le32(NVME_FEAT_HOST_MEM_BUF);
>>   	c.features.dword11	= cpu_to_le32(bits);
>>   	c.features.dword12	= cpu_to_le32(dev->host_mem_size >>
>> -					      ilog2(dev->ctrl.page_size));
>> +					      ilog2(NVME_CTRL_PAGE_SIZE));
> You could replace the ilog2() with NVME_CTRL_PAGE_SHIFT. Both compile to
> the same result, though.
> 
Makes sense.

I'll send V3 shortly.


* Re: [PATCH V2 1/2] nvme-core: replace ctrl page size with a macro
  2020-07-09 23:40 ` [PATCH V2 1/2] nvme-core: replace ctrl page size with a macro Chaitanya Kulkarni
  2020-07-10 14:57   ` Keith Busch
@ 2020-07-13  7:42   ` Christoph Hellwig
  2020-07-14  0:02     ` Chaitanya Kulkarni
  2020-07-14 16:44     ` Keith Busch
  1 sibling, 2 replies; 10+ messages in thread
From: Christoph Hellwig @ 2020-07-13  7:42 UTC (permalink / raw)
  To: Chaitanya Kulkarni; +Cc: kbusch, hch, linux-nvme, sagi

On Thu, Jul 09, 2020 at 04:40:24PM -0700, Chaitanya Kulkarni wrote:
> +/*
> + * Default to a 4K page size, with the intention to update this
> + * path in the future to accommodate architectures with differing
> + * kernel and IO page sizes.
> + */
> +#define NVME_CTRL_PAGE_SHIFT	12
> +#define NVME_CTRL_PAGE_SIZE	4096

NVME_CTRL_PAGE_SIZE can be defined as

(1 << NVME_CTRL_PAGE_SHIFT) instead of duplicating the value.
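
i.e. something like:

#define NVME_CTRL_PAGE_SHIFT	12
#define NVME_CTRL_PAGE_SIZE	(1 << NVME_CTRL_PAGE_SHIFT)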

> index 45e94f016ec2..68f7c090cf51 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -348,8 +348,8 @@ static bool nvme_dbbuf_update_and_check_event(u16 value, u32 *dbbuf_db,
>   */
>  static int nvme_npages(unsigned size, struct nvme_dev *dev)
>  {
> -	unsigned nprps = DIV_ROUND_UP(size + dev->ctrl.page_size,
> -				      dev->ctrl.page_size);
> +	unsigned nprps = DIV_ROUND_UP(size + NVME_CTRL_PAGE_SIZE,
> +						NVME_CTRL_PAGE_SIZE);
>  	return DIV_ROUND_UP(8 * nprps, PAGE_SIZE - 8);

This looks like the existing code here is wrong, as DIV_ROUND_UP already
adds the page size itself.  But that is better left for another patch..

* Re: [PATCH V2 2/2] nvme-pci: use max of PRP or SGL for iod size
  2020-07-09 23:40 ` [PATCH V2 2/2] nvme-pci: use max of PRP or SGL for iod size Chaitanya Kulkarni
@ 2020-07-13  7:42   ` Christoph Hellwig
  2020-07-13 23:59     ` Chaitanya Kulkarni
  0 siblings, 1 reply; 10+ messages in thread
From: Christoph Hellwig @ 2020-07-13  7:42 UTC (permalink / raw)
  To: Chaitanya Kulkarni; +Cc: kbusch, hch, linux-nvme, sagi

On Thu, Jul 09, 2020 at 04:40:25PM -0700, Chaitanya Kulkarni wrote:
> Since the initial NVMe SGL support in commit a7a7cbe353a5 ("nvme-pci:
> add SGL support") and the addition of commit 943e942e6266 ("nvme-pci:
> limit max IO size and segments to avoid high order allocations"), there
> is only one caller left for nvme_pci_iod_alloc_size(), and it statically
> passes true for the last parameter so that the allocation size is
> calculated based on SGLs, since we need the size of the biggest command
> supported for the mempool allocation.
> 
> This patch modifies the helper function nvme_pci_iod_alloc_size() so
> that it now uses the maximum of the PRP and SGL sizes for the iod
> allocation size calculation.
> 
> Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
> ---
>  drivers/nvme/host/pci.c | 13 ++++++-------
>  1 file changed, 6 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index 68f7c090cf51..8b6792005360 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -363,14 +363,13 @@ static int nvme_pci_npages_sgl(unsigned int num_seg)
>  }
>  
>  static size_t nvme_pci_iod_alloc_size(struct nvme_dev *dev,
> -		unsigned int size, unsigned int nseg, bool use_sgl)
> +		unsigned int size, unsigned int nseg)
>  {
> -	size_t alloc_size;
> +	size_t npages_sgl = nvme_pci_npages_sgl(nseg);
> +	size_t npages = nvme_npages(size, dev);
> +	size_t alloc_size = sizeof(__le64 *);
>  
> -	if (use_sgl)
> -		alloc_size = sizeof(__le64 *) * nvme_pci_npages_sgl(nseg);
> -	else
> -		alloc_size = sizeof(__le64 *) * nvme_npages(size, dev);
> +	alloc_size *= max_t(size_t, npages_sgl, npages);
>  
>  	return alloc_size + sizeof(struct scatterlist) * nseg;
>  }
> @@ -2812,7 +2811,7 @@ static int nvme_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>  	 * command we support.
>  	 */
>  	alloc_size = nvme_pci_iod_alloc_size(dev, NVME_MAX_KB_SZ,
> -						NVME_MAX_SEGS, true);
> +						NVME_MAX_SEGS);

I think we can just remove the size and nseg arguments and hard code them
in the function itself.
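
Rough sketch of what that could look like (untested; assumes a
hypothetical nvme_pci_npages_prp() helper that hard codes
NVME_MAX_KB_SZ):

static size_t nvme_pci_iod_alloc_size(void)
{
	/* worst case: whichever descriptor format needs more pages */
	size_t npages = max_t(size_t, nvme_pci_npages_prp(),
			      nvme_pci_npages_sgl(NVME_MAX_SEGS));

	return sizeof(__le64 *) * npages +
		sizeof(struct scatterlist) * NVME_MAX_SEGS;
}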

* Re: [PATCH V2 2/2] nvme-pci: use max of PRP or SGL for iod size
  2020-07-13  7:42   ` Christoph Hellwig
@ 2020-07-13 23:59     ` Chaitanya Kulkarni
  0 siblings, 0 replies; 10+ messages in thread
From: Chaitanya Kulkarni @ 2020-07-13 23:59 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: kbusch, sagi, linux-nvme

On 7/13/20 00:43, Christoph Hellwig wrote:
> I think we can just remove the size and nseg arguments and hard code them
> in the function itself.
Okay, will add this in V3.

* Re: [PATCH V2 1/2] nvme-core: replace ctrl page size with a macro
  2020-07-13  7:42   ` Christoph Hellwig
@ 2020-07-14  0:02     ` Chaitanya Kulkarni
  2020-07-14 16:44     ` Keith Busch
  1 sibling, 0 replies; 10+ messages in thread
From: Chaitanya Kulkarni @ 2020-07-14  0:02 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: kbusch, sagi, linux-nvme

On 7/13/20 00:42, Christoph Hellwig wrote:
> On Thu, Jul 09, 2020 at 04:40:24PM -0700, Chaitanya Kulkarni wrote:
>> +/*
>> + * Default to a 4K page size, with the intention to update this
>> + * path in the future to accommodate architectures with differing
>> + * kernel and IO page sizes.
>> + */
>> +#define NVME_CTRL_PAGE_SHIFT	12
>> +#define NVME_CTRL_PAGE_SIZE	4096
> NVME_CTRL_PAGE_SIZE can be defined as
> 
> (1 << NVME_CTRL_PAGE_SHIFT) instead of duplicating the value.
This adds a calculation every time we use the macro and makes us rely
on the compiler to substitute the value. I guess we are okay with this;
I will send V3.
> 
>> index 45e94f016ec2..68f7c090cf51 100644
>> --- a/drivers/nvme/host/pci.c
>> +++ b/drivers/nvme/host/pci.c
>> @@ -348,8 +348,8 @@ static bool nvme_dbbuf_update_and_check_event(u16 value, u32 *dbbuf_db,
>>    */
>>   static int nvme_npages(unsigned size, struct nvme_dev *dev)
>>   {
>> -	unsigned nprps = DIV_ROUND_UP(size + dev->ctrl.page_size,
>> -				      dev->ctrl.page_size);
>> +	unsigned nprps = DIV_ROUND_UP(size + NVME_CTRL_PAGE_SIZE,
>> +						NVME_CTRL_PAGE_SIZE);
>>   	return DIV_ROUND_UP(8 * nprps, PAGE_SIZE - 8);
> This looks like the existing code here is wrong, as DIV_ROUND_UP already
> adds the page size itself.  But that is better left for another patch..
> 

Should this nprps calculation just be:

DIV_ROUND_UP(size, dev->ctrl.page_size);

Should I add a 3rd patch to fix it, since we are touching this code
in this series?

* Re: [PATCH V2 1/2] nvme-core: replace ctrl page size with a macro
  2020-07-13  7:42   ` Christoph Hellwig
  2020-07-14  0:02     ` Chaitanya Kulkarni
@ 2020-07-14 16:44     ` Keith Busch
  1 sibling, 0 replies; 10+ messages in thread
From: Keith Busch @ 2020-07-14 16:44 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-nvme, sagi, Chaitanya Kulkarni

On Mon, Jul 13, 2020 at 09:42:10AM +0200, Christoph Hellwig wrote:
> On Thu, Jul 09, 2020 at 04:40:24PM -0700, Chaitanya Kulkarni wrote:
> >  static int nvme_npages(unsigned size, struct nvme_dev *dev)
> >  {
> > -	unsigned nprps = DIV_ROUND_UP(size + dev->ctrl.page_size,
> > -				      dev->ctrl.page_size);
> > +	unsigned nprps = DIV_ROUND_UP(size + NVME_CTRL_PAGE_SIZE,
> > +						NVME_CTRL_PAGE_SIZE);
> >  	return DIV_ROUND_UP(8 * nprps, PAGE_SIZE - 8);
> 
> This looks like the existing code here is wrong, as DIV_ROUND_UP already
> adds the page size itself.  But that is better left for another patch..

The extra addition is for when the starting address has a page offset,
which may require one more PRP than DIV_ROUND_UP would return without
the addition.
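
For example, an 8k transfer that starts at a non-zero offset into a 4k
controller page spans three controller pages and so needs three PRP
entries: DIV_ROUND_UP(8192, 4096) only gives 2, while
DIV_ROUND_UP(8192 + 4096, 4096) gives 3.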

