* [PATCH V2 0/2] nvme: fix iod size calculation in nvme_probe() @ 2020-07-09 23:40 Chaitanya Kulkarni 2020-07-09 23:40 ` [PATCH V2 1/2] nvme-core: replace ctrl page size with a macro Chaitanya Kulkarni 2020-07-09 23:40 ` [PATCH V2 2/2] nvme-pci: use max of PRP or SGL for iod size Chaitanya Kulkarni 0 siblings, 2 replies; 10+ messages in thread From: Chaitanya Kulkarni @ 2020-07-09 23:40 UTC (permalink / raw) To: hch, kbusch, sagi; +Cc: Chaitanya Kulkarni, linux-nvme Hi, This is a small patch series which fixes the iod size calculation in nvme_probe(). The first patch replaces ctrl->page_size with a macro. The second patch calculates the maximum value based on the NVMe PRP and SGL sizes. I've tested this patch with different block sizes (4k-128k) on NVMe QEMU and on an NVMe PCIe (non-SGL) controller. Regards, Chaitanya * Changes from V1:- ------------------- 1. Remove ctrl->page_size and use a macro instead. 2. Get rid of the conditional operator and use max_t() for the SGL vs PRP size calculation. Chaitanya Kulkarni (2): nvme-core: replace ctrl page size with a macro nvme-pci: use max of PRP or SGL for iod size drivers/nvme/host/core.c | 19 +++++-------- drivers/nvme/host/nvme.h | 9 ++++++- drivers/nvme/host/pci.c | 58 +++++++++++++++++++--------------------- 3 files changed, 42 insertions(+), 44 deletions(-) -- 2.26.0 _______________________________________________ Linux-nvme mailing list Linux-nvme@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-nvme ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH V2 1/2] nvme-core: replace ctrl page size with a macro 2020-07-09 23:40 [PATCH V2 0/2] nvme: fix iod size calculation in nvme_probe() Chaitanya Kulkarni @ 2020-07-09 23:40 ` Chaitanya Kulkarni 2020-07-10 14:57 ` Keith Busch 2020-07-13 7:42 ` Christoph Hellwig 2020-07-09 23:40 ` [PATCH V2 2/2] nvme-pci: use max of PRP or SGL for iod size Chaitanya Kulkarni 1 sibling, 2 replies; 10+ messages in thread From: Chaitanya Kulkarni @ 2020-07-09 23:40 UTC (permalink / raw) To: hch, kbusch, sagi; +Cc: Chaitanya Kulkarni, linux-nvme In nvme_probe(), when calculating the mempool size, nvme_npages() uses dev->ctrl.page_size, which is initialized in the following code path:- Ctx insmod : nvme_probe() nvme_reset_ctrl() queue_work(nvme_reset_work) Ctx Workqueue : nvme_reset_work() nvme_pci_configure_admin_queue() nvme_enable_ctrl() ctrl->page_size = 1 << NVME_CTRL_PAGE_SHIFT. When nvme_pci_iod_alloc_size() is called with false as the last argument, it results in the following oops, since dev->ctrl.page_size is used before we set it in nvme_enable_ctrl() in the above path:- Entering kdb (current=0xffff88881fc0c980, pid 339) on processor 0 Oops: (null) due to oops @ 0xffffffffa05f1723 CPU: 0 PID: 339 Comm: kworker/0:2 Tainted: G W OE 5.8.0-rc1nvme-5.9+ #20 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.4 Workqueue: events work_for_cpu_fn RIP: 0010:nvme_probe+0x263/0x502 [nvme] Code: 00 00 66 41 81 7c 24 3c 4d 14 0f 84 98 01 00 00 3d 0f 1e 01 00 0f 84 aa 01 00 00 8b 8b a0 0c 00 2 RSP: 0018:ffffc90000d9bdd8 EFLAGS: 00010246 RAX: 0000000000000fff RBX: ffff8887da128000 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000246 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: ffff8887c29f5570 R12: ffff888813c37000 R13: 0000000000000202 R14: 0000000fffffffe0 R15: ffff888813c370b0 FS: 0000000000000000(0000) GS:ffff88880fe00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 
ES: 0000 CR0: 0000000080050033 CR2: 00007f23185131a0 CR3: 0000000811c54000 CR4: 00000000003406f0 Call Trace: local_pci_probe+0x42/0x80 work_for_cpu_fn+0x16/0x20 process_one_work+0x24e/0x5a0 ? __schedule+0x353/0x840 worker_thread+0x1d5/0x380 ? process_one_work+0x5a0/0x5a0 kthread+0x135/0x150 ? kthread_create_on_node+0x60/0x60 ret_from_fork+0x22/0x30 This patch removes the dev->ctrl.page_size variable and replaces it with a macro, which avoids the above oops. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> --- drivers/nvme/host/core.c | 19 ++++++----------- drivers/nvme/host/nvme.h | 9 +++++++- drivers/nvme/host/pci.c | 45 ++++++++++++++++++++-------------------- 3 files changed, 36 insertions(+), 37 deletions(-) diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c index 3d00ea4e7146..b9b4aa7b53ce 100644 --- a/drivers/nvme/host/core.c +++ b/drivers/nvme/host/core.c @@ -2345,12 +2345,7 @@ EXPORT_SYMBOL_GPL(nvme_disable_ctrl); int nvme_enable_ctrl(struct nvme_ctrl *ctrl) { - /* - * Default to a 4K page size, with the intention to update this - * path in the future to accomodate architectures with differing - * kernel and IO page sizes. 
- */ - unsigned dev_page_min, page_shift = 12; + unsigned dev_page_min; int ret; ret = ctrl->ops->reg_read64(ctrl, NVME_REG_CAP, &ctrl->cap); @@ -2360,20 +2355,18 @@ int nvme_enable_ctrl(struct nvme_ctrl *ctrl) } dev_page_min = NVME_CAP_MPSMIN(ctrl->cap) + 12; - if (page_shift < dev_page_min) { + if (NVME_CTRL_PAGE_SHIFT < dev_page_min) { dev_err(ctrl->device, "Minimum device page size %u too large for host (%u)\n", - 1 << dev_page_min, 1 << page_shift); + 1 << dev_page_min, 1 << NVME_CTRL_PAGE_SHIFT); return -ENODEV; } - ctrl->page_size = 1 << page_shift; - if (NVME_CAP_CSS(ctrl->cap) & NVME_CAP_CSS_CSI) ctrl->ctrl_config = NVME_CC_CSS_CSI; else ctrl->ctrl_config = NVME_CC_CSS_NVM; - ctrl->ctrl_config |= (page_shift - 12) << NVME_CC_MPS_SHIFT; + ctrl->ctrl_config |= (NVME_CTRL_PAGE_SHIFT - 12) << NVME_CC_MPS_SHIFT; ctrl->ctrl_config |= NVME_CC_AMS_RR | NVME_CC_SHN_NONE; ctrl->ctrl_config |= NVME_CC_IOSQES | NVME_CC_IOCQES; ctrl->ctrl_config |= NVME_CC_ENABLE; @@ -2423,13 +2416,13 @@ static void nvme_set_queue_limits(struct nvme_ctrl *ctrl, if (ctrl->max_hw_sectors) { u32 max_segments = - (ctrl->max_hw_sectors / (ctrl->page_size >> 9)) + 1; + (ctrl->max_hw_sectors / (NVME_CTRL_PAGE_SIZE >> 9)) + 1; max_segments = min_not_zero(max_segments, ctrl->max_segments); blk_queue_max_hw_sectors(q, ctrl->max_hw_sectors); blk_queue_max_segments(q, min_t(u32, max_segments, USHRT_MAX)); } - blk_queue_virt_boundary(q, ctrl->page_size - 1); + blk_queue_virt_boundary(q, NVME_CTRL_PAGE_SIZE - 1); blk_queue_dma_alignment(q, 7); if (ctrl->vwc & NVME_CTRL_VWC_PRESENT) vwc = true; diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h index 13ca90bcd352..25dae702e0a2 100644 --- a/drivers/nvme/host/nvme.h +++ b/drivers/nvme/host/nvme.h @@ -37,6 +37,14 @@ extern unsigned int admin_timeout; #define NVME_INLINE_METADATA_SG_CNT 1 #endif +/* + * Default to a 4K page size, with the intention to update this + * path in the future to accommodate architectures with differing + * kernel 
and IO page sizes. + */ +#define NVME_CTRL_PAGE_SHIFT 12 +#define NVME_CTRL_PAGE_SIZE 4096 + extern struct workqueue_struct *nvme_wq; extern struct workqueue_struct *nvme_reset_wq; extern struct workqueue_struct *nvme_delete_wq; @@ -234,7 +242,6 @@ struct nvme_ctrl { u32 queue_count; u64 cap; - u32 page_size; u32 max_hw_sectors; u32 max_segments; u32 max_integrity_segments; diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c index 45e94f016ec2..68f7c090cf51 100644 --- a/drivers/nvme/host/pci.c +++ b/drivers/nvme/host/pci.c @@ -348,8 +348,8 @@ static bool nvme_dbbuf_update_and_check_event(u16 value, u32 *dbbuf_db, */ static int nvme_npages(unsigned size, struct nvme_dev *dev) { - unsigned nprps = DIV_ROUND_UP(size + dev->ctrl.page_size, - dev->ctrl.page_size); + unsigned nprps = DIV_ROUND_UP(size + NVME_CTRL_PAGE_SIZE, + NVME_CTRL_PAGE_SIZE); return DIV_ROUND_UP(8 * nprps, PAGE_SIZE - 8); } @@ -515,7 +515,7 @@ static inline bool nvme_pci_use_sgls(struct nvme_dev *dev, struct request *req) static void nvme_unmap_data(struct nvme_dev *dev, struct request *req) { struct nvme_iod *iod = blk_mq_rq_to_pdu(req); - const int last_prp = dev->ctrl.page_size / sizeof(__le64) - 1; + const int last_prp = NVME_CTRL_PAGE_SIZE / sizeof(__le64) - 1; dma_addr_t dma_addr = iod->first_dma, next_dma_addr; int i; @@ -582,34 +582,33 @@ static blk_status_t nvme_pci_setup_prps(struct nvme_dev *dev, struct scatterlist *sg = iod->sg; int dma_len = sg_dma_len(sg); u64 dma_addr = sg_dma_address(sg); - u32 page_size = dev->ctrl.page_size; - int offset = dma_addr & (page_size - 1); + int offset = dma_addr & (NVME_CTRL_PAGE_SIZE - 1); __le64 *prp_list; void **list = nvme_pci_iod_list(req); dma_addr_t prp_dma; int nprps, i; - length -= (page_size - offset); + length -= (NVME_CTRL_PAGE_SIZE - offset); if (length <= 0) { iod->first_dma = 0; goto done; } - dma_len -= (page_size - offset); + dma_len -= (NVME_CTRL_PAGE_SIZE - offset); if (dma_len) { - dma_addr += (page_size - offset); + 
dma_addr += (NVME_CTRL_PAGE_SIZE - offset); } else { sg = sg_next(sg); dma_addr = sg_dma_address(sg); dma_len = sg_dma_len(sg); } - if (length <= page_size) { + if (length <= NVME_CTRL_PAGE_SIZE) { iod->first_dma = dma_addr; goto done; } - nprps = DIV_ROUND_UP(length, page_size); + nprps = DIV_ROUND_UP(length, NVME_CTRL_PAGE_SIZE); if (nprps <= (256 / 8)) { pool = dev->prp_small_pool; iod->npages = 0; @@ -628,7 +627,7 @@ static blk_status_t nvme_pci_setup_prps(struct nvme_dev *dev, iod->first_dma = prp_dma; i = 0; for (;;) { - if (i == page_size >> 3) { + if (i == NVME_CTRL_PAGE_SIZE >> 3) { __le64 *old_prp_list = prp_list; prp_list = dma_pool_alloc(pool, GFP_ATOMIC, &prp_dma); if (!prp_list) @@ -639,9 +638,9 @@ static blk_status_t nvme_pci_setup_prps(struct nvme_dev *dev, i = 1; } prp_list[i++] = cpu_to_le64(dma_addr); - dma_len -= page_size; - dma_addr += page_size; - length -= page_size; + dma_len -= NVME_CTRL_PAGE_SIZE; + dma_addr += NVME_CTRL_PAGE_SIZE; + length -= NVME_CTRL_PAGE_SIZE; if (length <= 0) break; if (dma_len > 0) @@ -751,8 +750,8 @@ static blk_status_t nvme_setup_prp_simple(struct nvme_dev *dev, struct bio_vec *bv) { struct nvme_iod *iod = blk_mq_rq_to_pdu(req); - unsigned int offset = bv->bv_offset & (dev->ctrl.page_size - 1); - unsigned int first_prp_len = dev->ctrl.page_size - offset; + unsigned int offset = bv->bv_offset & (NVME_CTRL_PAGE_SIZE - 1); + unsigned int first_prp_len = NVME_CTRL_PAGE_SIZE - offset; iod->first_dma = dma_map_bvec(dev->dev, bv, rq_dma_dir(req), 0); if (dma_mapping_error(dev->dev, iod->first_dma)) @@ -794,7 +793,7 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req, struct bio_vec bv = req_bvec(req); if (!is_pci_p2pdma_page(bv.bv_page)) { - if (bv.bv_offset + bv.bv_len <= dev->ctrl.page_size * 2) + if (bv.bv_offset + bv.bv_len <= NVME_CTRL_PAGE_SIZE * 2) return nvme_setup_prp_simple(dev, req, &cmnd->rw, &bv); @@ -1396,12 +1395,12 @@ static int nvme_cmb_qdepth(struct nvme_dev *dev, int 
nr_io_queues, { int q_depth = dev->q_depth; unsigned q_size_aligned = roundup(q_depth * entry_size, - dev->ctrl.page_size); + NVME_CTRL_PAGE_SIZE); if (q_size_aligned * nr_io_queues > dev->cmb_size) { u64 mem_per_q = div_u64(dev->cmb_size, nr_io_queues); - mem_per_q = round_down(mem_per_q, dev->ctrl.page_size); + mem_per_q = round_down(mem_per_q, NVME_CTRL_PAGE_SIZE); q_depth = div_u64(mem_per_q, entry_size); /* @@ -1825,7 +1824,7 @@ static int nvme_set_host_mem(struct nvme_dev *dev, u32 bits) c.features.fid = cpu_to_le32(NVME_FEAT_HOST_MEM_BUF); c.features.dword11 = cpu_to_le32(bits); c.features.dword12 = cpu_to_le32(dev->host_mem_size >> - ilog2(dev->ctrl.page_size)); + ilog2(NVME_CTRL_PAGE_SIZE)); c.features.dword13 = cpu_to_le32(lower_32_bits(dma_addr)); c.features.dword14 = cpu_to_le32(upper_32_bits(dma_addr)); c.features.dword15 = cpu_to_le32(dev->nr_host_mem_descs); @@ -1845,7 +1844,7 @@ static void nvme_free_host_mem(struct nvme_dev *dev) for (i = 0; i < dev->nr_host_mem_descs; i++) { struct nvme_host_mem_buf_desc *desc = &dev->host_mem_descs[i]; - size_t size = le32_to_cpu(desc->size) * dev->ctrl.page_size; + size_t size = le32_to_cpu(desc->size) * NVME_CTRL_PAGE_SIZE; dma_free_attrs(dev->dev, size, dev->host_mem_desc_bufs[i], le64_to_cpu(desc->addr), @@ -1897,7 +1896,7 @@ static int __nvme_alloc_host_mem(struct nvme_dev *dev, u64 preferred, break; descs[i].addr = cpu_to_le64(dma_addr); - descs[i].size = cpu_to_le32(len / dev->ctrl.page_size); + descs[i].size = cpu_to_le32(len / NVME_CTRL_PAGE_SIZE); i++; } @@ -1913,7 +1912,7 @@ static int __nvme_alloc_host_mem(struct nvme_dev *dev, u64 preferred, out_free_bufs: while (--i >= 0) { - size_t size = le32_to_cpu(descs[i].size) * dev->ctrl.page_size; + size_t size = le32_to_cpu(descs[i].size) * NVME_CTRL_PAGE_SIZE; dma_free_attrs(dev->dev, size, bufs[i], le64_to_cpu(descs[i].addr), -- 2.26.0 _______________________________________________ Linux-nvme mailing list Linux-nvme@lists.infradead.org 
http://lists.infradead.org/mailman/listinfo/linux-nvme
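The constants this patch introduces feed two pieces of register arithmetic in nvme_enable_ctrl(): the CAP.MPSMIN sanity check and the CC.MPS encoding. The sketch below is standalone, not the kernel source; the field macros mirror the driver's definitions (CAP.MPSMIN in bits 51:48, CC.MPS at bit 7), while check_page_size() is a hypothetical stand-in for the check in the hunk above:

```c
#include <assert.h>
#include <stdint.h>

#define NVME_CTRL_PAGE_SHIFT 12
#define NVME_CTRL_PAGE_SIZE  (1 << NVME_CTRL_PAGE_SHIFT)

#define NVME_CAP_MPSMIN(cap) (((cap) >> 48) & 0xf)  /* CAP bits 51:48 */
#define NVME_CC_MPS_SHIFT    7

/* Fail if the controller's minimum page size exceeds the fixed 4K host page. */
static int check_page_size(uint64_t cap)
{
	unsigned int dev_page_min = NVME_CAP_MPSMIN(cap) + 12;

	return NVME_CTRL_PAGE_SHIFT >= dev_page_min ? 0 : -19; /* -ENODEV */
}

/* CC.MPS stores (page shift - 12), so a 4K controller page encodes as 0. */
static uint32_t cc_mps_bits(void)
{
	return (NVME_CTRL_PAGE_SHIFT - 12) << NVME_CC_MPS_SHIFT;
}
```

With MPSMIN = 0 (a 4K minimum) the check passes; a controller advertising MPSMIN = 1 (8K minimum) is rejected, matching the dev_err() path in the patch. Deriving the size as `(1 << NVME_CTRL_PAGE_SHIFT)` also keeps the two constants in sync by construction.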
* Re: [PATCH V2 1/2] nvme-core: replace ctrl page size with a macro 2020-07-09 23:40 ` [PATCH V2 1/2] nvme-core: replace ctrl page size with a macro Chaitanya Kulkarni @ 2020-07-10 14:57 ` Keith Busch 2020-07-11 18:20 ` Chaitanya Kulkarni 2020-07-13 7:42 ` Christoph Hellwig 1 sibling, 1 reply; 10+ messages in thread From: Keith Busch @ 2020-07-10 14:57 UTC (permalink / raw) To: Chaitanya Kulkarni; +Cc: hch, linux-nvme, sagi On Thu, Jul 09, 2020 at 04:40:24PM -0700, Chaitanya Kulkarni wrote: > When nvme_pci_iod_alloc_size() is called with false as last argument it > results in following oops since dev->ctrl.page_size used before we set > it in the nvme_enable_ctrl() in above path :- > > Entering kdb (current=0xffff88881fc0c980, pid 339) on processor 0 Oops: (null) > due to oops @ 0xffffffffa05f1723 > CPU: 0 PID: 339 Comm: kworker/0:2 Tainted: G W OE 5.8.0-rc1nvme-5.9+ #20 > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.4Workqueue: events work_for_cpu_fn > RIP: 0010:nvme_probe+0x263/0x502 [nvme] > Code: 00 00 66 41 81 7c 24 3c 4d 14 0f 84 98 01 00 00 3d 0f 1e 01 00 0f 84 aa 01 00 00 8b 8b a0 0c 00 2 > RSP: 0018:ffffc90000d9bdd8 EFLAGS: 00010246 > RAX: 0000000000000fff RBX: ffff8887da128000 RCX: 0000000000000000 > RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000246 > RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 > R10: 0000000000000000 R11: ffff8887c29f5570 R12: ffff888813c37000 > R13: 0000000000000202 R14: 0000000fffffffe0 R15: ffff888813c370b0 > FS: 0000000000000000(0000) GS:ffff88880fe00000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 00007f23185131a0 CR3: 0000000811c54000 CR4: 00000000003406f0 > Call Trace: > local_pci_probe+0x42/0x80 > work_for_cpu_fn+0x16/0x20 > process_one_work+0x24e/0x5a0 > ? __schedule+0x353/0x840 > worker_thread+0x1d5/0x380 > ? process_one_work+0x5a0/0x5a0 > kthread+0x135/0x150 > ? 
kthread_create_on_node+0x60/0x60 > ret_from_fork+0x22/0x30 > > This patch removes the dev->ctrl.page_size variable and replaces with > the macro which avoids above oops. This makes it sound like we'd actually trigger this bug, but we currently don't do that. I think the changelog should be more about why we can remove 'ctrl->page_size', something like: Saving the nvme controller's page size was from a time when the driver tried to use different sized pages, but this value is always set to a constant, and has been this way for some time. Remove the 'page_size' field and replace its usage with the constant value. This also lets the compiler make some micro-optimizations in the io path, and that's always a good thing. > @@ -1825,7 +1824,7 @@ static int nvme_set_host_mem(struct nvme_dev *dev, u32 bits) > c.features.fid = cpu_to_le32(NVME_FEAT_HOST_MEM_BUF); > c.features.dword11 = cpu_to_le32(bits); > c.features.dword12 = cpu_to_le32(dev->host_mem_size >> > - ilog2(dev->ctrl.page_size)); > + ilog2(NVME_CTRL_PAGE_SIZE)); You could replace the ilog2() with NVME_CTRL_PAGE_SHIFT. Both compile to the same result, though. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH V2 1/2] nvme-core: replace ctrl page size with a macro 2020-07-10 14:57 ` Keith Busch @ 2020-07-11 18:20 ` Chaitanya Kulkarni 0 siblings, 0 replies; 10+ messages in thread From: Chaitanya Kulkarni @ 2020-07-11 18:20 UTC (permalink / raw) To: Keith Busch; +Cc: hch, linux-nvme, sagi On 7/10/20 07:57, Keith Busch wrote: > This makes it sound like we'd actually trigger this bug, but we > currently don't do that. I think the changelog should be more about why > we can remove of 'ctrl->page_size', something like: > > Saving the nvme controller's page size was from a time when the driver > tried to use different sized pages, but this value is always set to > a constant, and has been this way for some time. Remove the > 'page_size' field and replace its usage with the constant value. > > This also lets the compiler make some micro-optimizations in the io > path, and that's always a good thing. > Sure. Since it is not triggered yet, I felt it makes sense to document what might actually happen due to the current code. I'll use your log. > >> @@ -1825,7 +1824,7 @@ static int nvme_set_host_mem(struct nvme_dev *dev, u32 bits) >> c.features.fid = cpu_to_le32(NVME_FEAT_HOST_MEM_BUF); >> c.features.dword11 = cpu_to_le32(bits); >> c.features.dword12 = cpu_to_le32(dev->host_mem_size >> >> - ilog2(dev->ctrl.page_size)); >> + ilog2(NVME_CTRL_PAGE_SIZE)); > You could replace the ilog2() with NVME_CTRL_PAGE_SHIFT. Both compile to > the same result, though. > Makes sense. I'll send V3 shortly. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH V2 1/2] nvme-core: replace ctrl page size with a macro 2020-07-09 23:40 ` [PATCH V2 1/2] nvme-core: replace ctrl page size with a macro Chaitanya Kulkarni 2020-07-10 14:57 ` Keith Busch @ 2020-07-13 7:42 ` Christoph Hellwig 2020-07-14 0:02 ` Chaitanya Kulkarni 2020-07-14 16:44 ` Keith Busch 1 sibling, 2 replies; 10+ messages in thread From: Christoph Hellwig @ 2020-07-13 7:42 UTC (permalink / raw) To: Chaitanya Kulkarni; +Cc: kbusch, hch, linux-nvme, sagi On Thu, Jul 09, 2020 at 04:40:24PM -0700, Chaitanya Kulkarni wrote: > +/* > + * Default to a 4K page size, with the intention to update this > + * path in the future to accommodate architectures with differing > + * kernel and IO page sizes. > + */ > +#define NVME_CTRL_PAGE_SHIFT 12 > +#define NVME_CTRL_PAGE_SIZE 4096 NVME_CTRL_PAGE_SIZE can be defined as (1 << NVME_CTRL_PAGE_SHIFT) instead of duplicating the value. > index 45e94f016ec2..68f7c090cf51 100644 > --- a/drivers/nvme/host/pci.c > +++ b/drivers/nvme/host/pci.c > @@ -348,8 +348,8 @@ static bool nvme_dbbuf_update_and_check_event(u16 value, u32 *dbbuf_db, > */ > static int nvme_npages(unsigned size, struct nvme_dev *dev) > { > - unsigned nprps = DIV_ROUND_UP(size + dev->ctrl.page_size, > - dev->ctrl.page_size); > + unsigned nprps = DIV_ROUND_UP(size + NVME_CTRL_PAGE_SIZE, > + NVME_CTRL_PAGE_SIZE); > return DIV_ROUND_UP(8 * nprps, PAGE_SIZE - 8); This looks like the existing code here is wrong, as DIV_ROUND_UP already adds the page size itself. But that is better left for another patch. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH V2 1/2] nvme-core: replace ctrl page size with a macro 2020-07-13 7:42 ` Christoph Hellwig @ 2020-07-14 0:02 ` Chaitanya Kulkarni 2020-07-14 16:44 ` Keith Busch 1 sibling, 0 replies; 10+ messages in thread From: Chaitanya Kulkarni @ 2020-07-14 0:02 UTC (permalink / raw) To: Christoph Hellwig; +Cc: kbusch, sagi, linux-nvme On 7/13/20 00:42, Christoph Hellwig wrote: > On Thu, Jul 09, 2020 at 04:40:24PM -0700, Chaitanya Kulkarni wrote: >> +/* >> + * Default to a 4K page size, with the intention to update this >> + * path in the future to accommodate architectures with differing >> + * kernel and IO page sizes. >> + */ >> +#define NVME_CTRL_PAGE_SHIFT 12 >> +#define NVME_CTRL_PAGE_SIZE 4096 > NVME_CTRL_PAGE_SIZE can be defined as > > (1 << NVME_CTRL_PAGE_SHIFT) instead of duplicating the value. This adds a calculation every time we use the macro and makes us rely on the compiler to substitute the value, but I guess we are okay with that; will send V3. > >> index 45e94f016ec2..68f7c090cf51 100644 >> --- a/drivers/nvme/host/pci.c >> +++ b/drivers/nvme/host/pci.c >> @@ -348,8 +348,8 @@ static bool nvme_dbbuf_update_and_check_event(u16 value, u32 *dbbuf_db, >> */ >> static int nvme_npages(unsigned size, struct nvme_dev *dev) >> { >> - unsigned nprps = DIV_ROUND_UP(size + dev->ctrl.page_size, >> - dev->ctrl.page_size); >> + unsigned nprps = DIV_ROUND_UP(size + NVME_CTRL_PAGE_SIZE, >> + NVME_CTRL_PAGE_SIZE); >> return DIV_ROUND_UP(8 * nprps, PAGE_SIZE - 8); > This looks like the existing code here is wrong, as DIV_ROUND_UP already > adds the page size itself. But that is better left for another patch.. > Should this nprps calculation be DIV_ROUND_UP(size, dev->ctrl.page_size)? Should I add a 3rd patch to fix it, since we are touching this code in this series? 
* Re: [PATCH V2 1/2] nvme-core: replace ctrl page size with a macro 2020-07-13 7:42 ` Christoph Hellwig 2020-07-14 0:02 ` Chaitanya Kulkarni @ 2020-07-14 16:44 ` Keith Busch 1 sibling, 0 replies; 10+ messages in thread From: Keith Busch @ 2020-07-14 16:44 UTC (permalink / raw) To: Christoph Hellwig; +Cc: linux-nvme, sagi, Chaitanya Kulkarni On Mon, Jul 13, 2020 at 09:42:10AM +0200, Christoph Hellwig wrote: > On Thu, Jul 09, 2020 at 04:40:24PM -0700, Chaitanya Kulkarni wrote: > > static int nvme_npages(unsigned size, struct nvme_dev *dev) > > { > > - unsigned nprps = DIV_ROUND_UP(size + dev->ctrl.page_size, > > - dev->ctrl.page_size); > > + unsigned nprps = DIV_ROUND_UP(size + NVME_CTRL_PAGE_SIZE, > > + NVME_CTRL_PAGE_SIZE); > > return DIV_ROUND_UP(8 * nprps, PAGE_SIZE - 8); > > This looks like the existing code here is wrong, as DIV_ROUND_UP already > adds the page size itself. But that is better left for another patch.. The extra addition is for when the starting address has a page offset, which may require one more PRP than DIV_ROUND_UP would return without the addition. ^ permalink raw reply [flat|nested] 10+ messages in thread
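The point about the page offset can be checked with small numbers: a buffer of `size` bytes starting partway into a controller page can span one more page than `DIV_ROUND_UP(size, page_size)` accounts for, and adding a full page to the dividend covers that worst case. A small sketch with hypothetical helper names, assuming 4K controller pages:

```c
#include <assert.h>

#define CTRL_PAGE_SIZE 4096
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Controller pages actually touched by a buffer at a given in-page offset. */
static unsigned int pages_spanned(unsigned int offset, unsigned int size)
{
	return DIV_ROUND_UP(offset + size, CTRL_PAGE_SIZE);
}

/* The driver's offset-independent worst-case estimate, as in nvme_npages(). */
static unsigned int nprps_worst_case(unsigned int size)
{
	return DIV_ROUND_UP(size + CTRL_PAGE_SIZE, CTRL_PAGE_SIZE);
}
```

An 8K transfer starting 512 bytes into a page touches three pages, while DIV_ROUND_UP(8192, 4096) alone would say two; nprps_worst_case(8192) returns 3 and remains an upper bound for every possible offset.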
* [PATCH V2 2/2] nvme-pci: use max of PRP or SGL for iod size 2020-07-09 23:40 [PATCH V2 0/2] nvme: fix iod size calculation in nvme_probe() Chaitanya Kulkarni 2020-07-09 23:40 ` [PATCH V2 1/2] nvme-core: replace ctrl page size with a macro Chaitanya Kulkarni @ 2020-07-09 23:40 ` Chaitanya Kulkarni 2020-07-13 7:42 ` Christoph Hellwig 1 sibling, 1 reply; 10+ messages in thread From: Chaitanya Kulkarni @ 2020-07-09 23:40 UTC (permalink / raw) To: hch, kbusch, sagi; +Cc: Chaitanya Kulkarni, linux-nvme Since the initial implementation of NVMe SGL kernel support in commit a7a7cbe353a5 ("nvme-pci: add SGL support"), and with the addition of commit 943e942e6266 ("nvme-pci: limit max IO size and segments to avoid high order allocations"), there is now only one caller left for nvme_pci_iod_alloc_size(), which statically passes true for the last parameter so that the allocation size is calculated based on SGLs, since we need the size of the biggest command supported for mempool allocation. This patch modifies the helper function nvme_pci_iod_alloc_size() such that it now uses the maximum of the PRP and SGL sizes for the iod allocation size calculation. 
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> --- drivers/nvme/host/pci.c | 13 ++++++------- 1 file changed, 6 insertions(+), 7 deletions(-) diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c index 68f7c090cf51..8b6792005360 100644 --- a/drivers/nvme/host/pci.c +++ b/drivers/nvme/host/pci.c @@ -363,14 +363,13 @@ static int nvme_pci_npages_sgl(unsigned int num_seg) } static size_t nvme_pci_iod_alloc_size(struct nvme_dev *dev, - unsigned int size, unsigned int nseg, bool use_sgl) + unsigned int size, unsigned int nseg) { - size_t alloc_size; + size_t npages_sgl = nvme_pci_npages_sgl(nseg); + size_t npages = nvme_npages(size, dev); + size_t alloc_size = sizeof(__le64 *); - if (use_sgl) - alloc_size = sizeof(__le64 *) * nvme_pci_npages_sgl(nseg); - else - alloc_size = sizeof(__le64 *) * nvme_npages(size, dev); + alloc_size *= max_t(size_t, npages_sgl, npages); return alloc_size + sizeof(struct scatterlist) * nseg; } @@ -2812,7 +2811,7 @@ static int nvme_probe(struct pci_dev *pdev, const struct pci_device_id *id) * command we support. */ alloc_size = nvme_pci_iod_alloc_size(dev, NVME_MAX_KB_SZ, - NVME_MAX_SEGS, true); + NVME_MAX_SEGS); WARN_ON_ONCE(alloc_size > PAGE_SIZE); dev->iod_mempool = mempool_create_node(1, mempool_kmalloc, -- 2.26.0 _______________________________________________ Linux-nvme mailing list Linux-nvme@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-nvme ^ permalink raw reply related [flat|nested] 10+ messages in thread
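The reworked sizing can be modeled outside the driver. In the sketch below, the limits mirror the driver's constants at the time (NVME_MAX_KB_SZ = 4096 KB, NVME_MAX_SEGS = 127); the SGL descriptor size matches sizeof(struct nvme_sgl_desc), while the scatterlist entry size is an assumed illustrative value for a 64-bit build, and the helpers are simplified stand-ins for nvme_npages() and nvme_pci_npages_sgl():

```c
#include <assert.h>
#include <stddef.h>

#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))
#define CTRL_PAGE_SIZE 4096
#define HOST_PAGE_SIZE 4096
#define SGL_DESC_SIZE  16   /* sizeof(struct nvme_sgl_desc) */
#define SG_ENTRY_SIZE  32   /* assumed sizeof(struct scatterlist), illustrative */

/* Pages of PRP-list entries needed for a transfer of 'size' bytes. */
static size_t npages_prp(size_t size)
{
	size_t nprps = DIV_ROUND_UP(size + CTRL_PAGE_SIZE, CTRL_PAGE_SIZE);

	return DIV_ROUND_UP(8 * nprps, HOST_PAGE_SIZE - 8);
}

/* Pages of SGL descriptors needed for 'nseg' segments. */
static size_t npages_sgl(size_t nseg)
{
	return DIV_ROUND_UP(nseg * SGL_DESC_SIZE, HOST_PAGE_SIZE);
}

/* Size the mempool element for whichever descriptor format needs more room. */
static size_t iod_alloc_size(size_t size, size_t nseg)
{
	size_t npages = npages_sgl(nseg) > npages_prp(size) ?
			npages_sgl(nseg) : npages_prp(size);

	return sizeof(void *) * npages + SG_ENTRY_SIZE * nseg;
}
```

For the maximum 4 MB command with 127 segments, npages_prp() is 3 while npages_sgl(127) is only 1, so the PRP side dominates here; taking the max makes the pool element safe regardless of which descriptor format a given request ends up using.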
* Re: [PATCH V2 2/2] nvme-pci: use max of PRP or SGL for iod size 2020-07-09 23:40 ` [PATCH V2 2/2] nvme-pci: use max of PRP or SGL for iod size Chaitanya Kulkarni @ 2020-07-13 7:42 ` Christoph Hellwig 2020-07-13 23:59 ` Chaitanya Kulkarni 0 siblings, 1 reply; 10+ messages in thread From: Christoph Hellwig @ 2020-07-13 7:42 UTC (permalink / raw) To: Chaitanya Kulkarni; +Cc: kbusch, hch, linux-nvme, sagi On Thu, Jul 09, 2020 at 04:40:25PM -0700, Chaitanya Kulkarni wrote: > >From the initial implementation of NVMe SGL kernel support > commit a7a7cbe353a5 ("nvme-pci: add SGL support") with addition of the > commit 943e942e6266 ("nvme-pci: limit max IO size and segments to avoid > high order allocations") now there is only caller left for > nvme_pci_iod_alloc_size() which statically passes true for last > parameter that calculates allocation size based on SGL since we need > size of biggest command supported for mempool allocation. > > This patch modifies the helper functions nvme_pci_iod_alloc_size() such > that it is now uses maximum of PRP and SGL size for iod allocation size > calculation. 
> > Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> > --- > drivers/nvme/host/pci.c | 13 ++++++------- > 1 file changed, 6 insertions(+), 7 deletions(-) > > diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c > index 68f7c090cf51..8b6792005360 100644 > --- a/drivers/nvme/host/pci.c > +++ b/drivers/nvme/host/pci.c > @@ -363,14 +363,13 @@ static int nvme_pci_npages_sgl(unsigned int num_seg) > } > > static size_t nvme_pci_iod_alloc_size(struct nvme_dev *dev, > - unsigned int size, unsigned int nseg, bool use_sgl) > + unsigned int size, unsigned int nseg) > { > - size_t alloc_size; > + size_t npages_sgl = nvme_pci_npages_sgl(nseg); > + size_t npages = nvme_npages(size, dev); > + size_t alloc_size = sizeof(__le64 *); > > - if (use_sgl) > - alloc_size = sizeof(__le64 *) * nvme_pci_npages_sgl(nseg); > - else > - alloc_size = sizeof(__le64 *) * nvme_npages(size, dev); > + alloc_size *= max_t(size_t, npages_sgl, npages); > > return alloc_size + sizeof(struct scatterlist) * nseg; > } > @@ -2812,7 +2811,7 @@ static int nvme_probe(struct pci_dev *pdev, const struct pci_device_id *id) > * command we support. > */ > alloc_size = nvme_pci_iod_alloc_size(dev, NVME_MAX_KB_SZ, > - NVME_MAX_SEGS, true); > + NVME_MAX_SEGS); I think we can just remove the size and nseg arguments and hard code them in the function itself. _______________________________________________ Linux-nvme mailing list Linux-nvme@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-nvme ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH V2 2/2] nvme-pci: use max of PRP or SGL for iod size 2020-07-13 7:42 ` Christoph Hellwig @ 2020-07-13 23:59 ` Chaitanya Kulkarni 0 siblings, 0 replies; 10+ messages in thread From: Chaitanya Kulkarni @ 2020-07-13 23:59 UTC (permalink / raw) To: Christoph Hellwig; +Cc: kbusch, sagi, linux-nvme On 7/13/20 00:43, Christoph Hellwig wrote: > I think we can just remove the size and nseg arguments and hard code them > in the function itself. Okay, will add this in V3. ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2020-07-14 16:44 UTC | newest] Thread overview: 10+ messages -- 2020-07-09 23:40 [PATCH V2 0/2] nvme: fix iod size calculation in nvme_probe() Chaitanya Kulkarni 2020-07-09 23:40 ` [PATCH V2 1/2] nvme-core: replace ctrl page size with a macro Chaitanya Kulkarni 2020-07-10 14:57 ` Keith Busch 2020-07-11 18:20 ` Chaitanya Kulkarni 2020-07-13 7:42 ` Christoph Hellwig 2020-07-14 0:02 ` Chaitanya Kulkarni 2020-07-14 16:44 ` Keith Busch 2020-07-09 23:40 ` [PATCH V2 2/2] nvme-pci: use max of PRP or SGL for iod size Chaitanya Kulkarni 2020-07-13 7:42 ` Christoph Hellwig 2020-07-13 23:59 ` Chaitanya Kulkarni