* host memory buffer support @ 2017-05-20 13:13 Christoph Hellwig 2017-05-20 13:13 ` [PATCH 1/4] nvme.h: add struct nvme_host_mem_buf_desc and HMB flags Christoph Hellwig ` (3 more replies) 0 siblings, 4 replies; 18+ messages in thread From: Christoph Hellwig @ 2017-05-20 13:13 UTC (permalink / raw) This series adds support for the NVMe host memory buffer feature, which is a way for resource-constrained devices to use some host computer memory for their own purposes. ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH 1/4] nvme.h: add struct nvme_host_mem_buf_desc and HMB flags 2017-05-20 13:13 host memory buffer support Christoph Hellwig @ 2017-05-20 13:13 ` Christoph Hellwig 2017-05-30 14:31 ` Sagi Grimberg 2017-05-20 13:13 ` [PATCH 2/4] nvme.h: add dword 12 - 15 fields to struct nvme_features Christoph Hellwig ` (2 subsequent siblings) 3 siblings, 1 reply; 18+ messages in thread From: Christoph Hellwig @ 2017-05-20 13:13 UTC (permalink / raw) Signed-off-by: Christoph Hellwig <hch at lst.de> --- include/linux/nvme.h | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/include/linux/nvme.h b/include/linux/nvme.h index b625bacf37ef..9fd77fde4ae0 100644 --- a/include/linux/nvme.h +++ b/include/linux/nvme.h @@ -586,6 +586,11 @@ struct nvme_feat_auto_pst { __le64 entries[32]; }; +enum { + NVME_HOST_MEM_ENABLE = (1 << 0), + NVME_HOST_MEM_RETURN = (1 << 1), +}; + /* Admin commands */ enum nvme_admin_opcode { @@ -670,6 +675,12 @@ struct nvme_features { __u32 rsvd12[4]; }; +struct nvme_host_mem_buf_desc { + __le64 addr; + __le32 size; + __u32 rsvd; +}; + struct nvme_create_cq { __u8 opcode; __u8 flags; -- 2.11.0 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH 1/4] nvme.h: add struct nvme_host_mem_buf_desc and HMB flags 2017-05-20 13:13 ` [PATCH 1/4] nvme.h: add struct nvme_host_mem_buf_desc and HMB flags Christoph Hellwig @ 2017-05-30 14:31 ` Sagi Grimberg 0 siblings, 0 replies; 18+ messages in thread From: Sagi Grimberg @ 2017-05-30 14:31 UTC (permalink / raw) Reviewed-by: Sagi Grimberg <sagi at grimberg.me> ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH 2/4] nvme.h: add dword 12 - 15 fields to struct nvme_features 2017-05-20 13:13 host memory buffer support Christoph Hellwig 2017-05-20 13:13 ` [PATCH 1/4] nvme.h: add struct nvme_host_mem_buf_desc and HMB flags Christoph Hellwig @ 2017-05-20 13:13 ` Christoph Hellwig 2017-05-30 14:31 ` Sagi Grimberg 2017-05-20 13:13 ` [PATCH 3/4] nvme: save hmpre and hmmin in struct nvme_ctrl Christoph Hellwig 2017-05-20 13:13 ` [PATCH 4/4] nvme-pci: implement host memory buffer support Christoph Hellwig 3 siblings, 1 reply; 18+ messages in thread From: Christoph Hellwig @ 2017-05-20 13:13 UTC (permalink / raw) From: Arnav Dawn <a.dawn@samsung.com> Signed-off-by: Arnav Dawn <a.dawn at samsung.com> [hch: split from a larger patch, new changelog] Signed-off-by: Christoph Hellwig <hch at lst.de> --- include/linux/nvme.h | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/include/linux/nvme.h b/include/linux/nvme.h index 9fd77fde4ae0..beaefe822104 100644 --- a/include/linux/nvme.h +++ b/include/linux/nvme.h @@ -672,7 +672,10 @@ struct nvme_features { union nvme_data_ptr dptr; __le32 fid; __le32 dword11; - __u32 rsvd12[4]; + __le32 dword12; + __le32 dword13; + __le32 dword14; + __le32 dword15; }; struct nvme_host_mem_buf_desc { -- 2.11.0 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH 2/4] nvme.h: add dword 12 - 15 fields to struct nvme_features 2017-05-20 13:13 ` [PATCH 2/4] nvme.h: add dword 12 - 15 fields to struct nvme_features Christoph Hellwig @ 2017-05-30 14:31 ` Sagi Grimberg 0 siblings, 0 replies; 18+ messages in thread From: Sagi Grimberg @ 2017-05-30 14:31 UTC (permalink / raw) Reviewed-by: Sagi Grimberg <sagi at grimberg.me> ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH 3/4] nvme: save hmpre and hmmin in struct nvme_ctrl 2017-05-20 13:13 host memory buffer support Christoph Hellwig 2017-05-20 13:13 ` [PATCH 1/4] nvme.h: add struct nvme_host_mem_buf_desc and HMB flags Christoph Hellwig 2017-05-20 13:13 ` [PATCH 2/4] nvme.h: add dword 12 - 15 fields to struct nvme_features Christoph Hellwig @ 2017-05-20 13:13 ` Christoph Hellwig 2017-05-30 14:31 ` Sagi Grimberg 2017-05-20 13:13 ` [PATCH 4/4] nvme-pci: implement host memory buffer support Christoph Hellwig 3 siblings, 1 reply; 18+ messages in thread From: Christoph Hellwig @ 2017-05-20 13:13 UTC (permalink / raw) We'll need them later for the HMB support. Signed-off-by: Christoph Hellwig <hch at lst.de> --- drivers/nvme/host/core.c | 2 ++ drivers/nvme/host/nvme.h | 3 +++ 2 files changed, 5 insertions(+) diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c index d5e0906262ea..0f7fada284da 100644 --- a/drivers/nvme/host/core.c +++ b/drivers/nvme/host/core.c @@ -1625,6 +1625,8 @@ int nvme_init_identify(struct nvme_ctrl *ctrl) } } else { ctrl->cntlid = le16_to_cpu(id->cntlid); + ctrl->hmpre = le32_to_cpu(id->hmpre); + ctrl->hmmin = le32_to_cpu(id->hmmin); } kfree(id); diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h index 29c708ca9621..605f25b5145e 100644 --- a/drivers/nvme/host/nvme.h +++ b/drivers/nvme/host/nvme.h @@ -166,6 +166,9 @@ struct nvme_ctrl { /* Power saving configuration */ u64 ps_max_latency_us; + u32 hmpre; + u32 hmmin; + /* Fabrics only */ u16 sqsize; u32 ioccsz; -- 2.11.0 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH 3/4] nvme: save hmpre and hmmin in struct nvme_ctrl 2017-05-20 13:13 ` [PATCH 3/4] nvme: save hmpre and hmmin in struct nvme_ctrl Christoph Hellwig @ 2017-05-30 14:31 ` Sagi Grimberg 0 siblings, 0 replies; 18+ messages in thread From: Sagi Grimberg @ 2017-05-30 14:31 UTC (permalink / raw) Reviewed-by: Sagi Grimberg <sagi at grimberg.me> ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH 4/4] nvme-pci: implement host memory buffer support 2017-05-20 13:13 host memory buffer support Christoph Hellwig ` (2 preceding siblings ...) 2017-05-20 13:13 ` [PATCH 3/4] nvme: save hmpre and hmmin in struct nvme_ctrl Christoph Hellwig @ 2017-05-20 13:13 ` Christoph Hellwig 2017-05-22 7:22 ` Keith Busch ` (2 more replies) 3 siblings, 3 replies; 18+ messages in thread From: Christoph Hellwig @ 2017-05-20 13:13 UTC (permalink / raw) If a controller supports the host memory buffer we try to provide it with the requested size up to an upper cap set as a module parameter. We try to give as few as possible descriptors, eventually working our way down. Signed-off-by: Christoph Hellwig <hch at lst.de> --- drivers/nvme/host/pci.c | 187 +++++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 185 insertions(+), 2 deletions(-) diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c index fed803232edc..93868b9750cd 100644 --- a/drivers/nvme/host/pci.c +++ b/drivers/nvme/host/pci.c @@ -66,6 +66,11 @@ static bool use_cmb_sqes = true; module_param(use_cmb_sqes, bool, 0644); MODULE_PARM_DESC(use_cmb_sqes, "use controller's memory buffer for I/O SQes"); +static unsigned int max_host_mem_size_mb = 128; +module_param(max_host_mem_size_mb, uint, 0444); +MODULE_PARM_DESC(max_host_mem_size_mb, + "Maximum Host Memory Buffer (HMB) size per controller (in MiB)"); + static struct workqueue_struct *nvme_workq; struct nvme_dev; @@ -104,10 +109,18 @@ struct nvme_dev { u32 cmbloc; struct nvme_ctrl ctrl; struct completion ioq_wait; + + /* shadow doorbell buffer support: */ u32 *dbbuf_dbs; dma_addr_t dbbuf_dbs_dma_addr; u32 *dbbuf_eis; dma_addr_t dbbuf_eis_dma_addr; + + /* host memory buffer support: */ + u64 host_mem_size; + u32 nr_host_mem_descs; + struct nvme_host_mem_buf_desc *host_mem_descs; + void **host_mem_desc_bufs; }; static inline unsigned int sq_idx(unsigned int qid, u32 stride) @@ -1509,6 +1522,160 @@ static inline void nvme_release_cmb(struct nvme_dev 
*dev) } } +static int nvme_set_host_mem(struct nvme_dev *dev, u32 bits) +{ + size_t len = dev->nr_host_mem_descs * sizeof(*dev->host_mem_descs); + struct nvme_command c; + u64 dma_addr; + int ret; + + dma_addr = dma_map_single(dev->dev, dev->host_mem_descs, len, + DMA_TO_DEVICE); + if (dma_mapping_error(dev->dev, dma_addr)) + return -ENOMEM; + + memset(&c, 0, sizeof(c)); + c.features.opcode = nvme_admin_set_features; + c.features.fid = cpu_to_le32(NVME_FEAT_HOST_MEM_BUF); + c.features.dword11 = cpu_to_le32(bits); + c.features.dword12 = cpu_to_le32(dev->host_mem_size / + dev->ctrl.page_size); + c.features.dword13 = cpu_to_le32(lower_32_bits(dma_addr)); + c.features.dword14 = cpu_to_le32(upper_32_bits(dma_addr)); + c.features.dword15 = cpu_to_le32(dev->nr_host_mem_descs); + + ret = nvme_submit_sync_cmd(dev->ctrl.admin_q, &c, NULL, 0); + if (ret) { + dev_warn(dev->ctrl.device, + "failed to set host mem (err %d, flags %#x).\n", + ret, bits); + } + dma_unmap_single(dev->dev, dma_addr, len, DMA_TO_DEVICE); + return ret; +} + +static void nvme_free_host_mem(struct nvme_dev *dev) +{ + int i; + + for (i = 0; i < dev->nr_host_mem_descs; i++) { + struct nvme_host_mem_buf_desc *desc = &dev->host_mem_descs[i]; + size_t size = le32_to_cpu(desc->size) * dev->ctrl.page_size; + + dma_free_coherent(dev->dev, size, dev->host_mem_desc_bufs[i], + le64_to_cpu(desc->addr)); + } + + kfree(dev->host_mem_desc_bufs); + dev->host_mem_desc_bufs = NULL; + kfree(dev->host_mem_descs); + dev->host_mem_descs = NULL; +} + +static int nvme_alloc_host_mem(struct nvme_dev *dev, u64 min, u64 preferred) +{ + struct nvme_host_mem_buf_desc *descs; + u32 chunk_size, max_entries, i = 0; + void **bufs; + u64 size; + + /* start big and work our way down */ + chunk_size = min(preferred, (u64)PAGE_SIZE << MAX_ORDER); +retry: + max_entries = DIV_ROUND_UP(preferred, chunk_size); + descs = kcalloc(max_entries, sizeof(*descs), GFP_KERNEL); + if (!descs) + goto out; + + bufs = kcalloc(max_entries, sizeof(*bufs), 
GFP_KERNEL); + if (!bufs) + goto out_free_descs; + + for (size = 0; size < preferred; size += chunk_size) { + u32 len = min_t(u64, chunk_size, preferred - size); + dma_addr_t dma_addr; + + bufs[i] = dma_alloc_attrs(dev->dev, len, &dma_addr, GFP_KERNEL, + DMA_ATTR_NO_KERNEL_MAPPING | DMA_ATTR_NO_WARN); + if (!bufs[i]) + break; + + descs[i].addr = cpu_to_le64(dma_addr); + descs[i].size = cpu_to_le32(len / dev->ctrl.page_size); + i++; + } + + if (!size || (min && size < min)) { + dev_warn(dev->ctrl.device, + "failed to allocate host memory buffer.\n"); + goto out_free_bufs; + } + + dev_info(dev->ctrl.device, + "allocated %lld MiB host memory buffer.\n", size / 1024 / 1024); + dev->nr_host_mem_descs = i; + dev->host_mem_size = size; + dev->host_mem_descs = descs; + dev->host_mem_desc_bufs = bufs; + return 0; + +out_free_bufs: + while (--i >= 0) { + struct nvme_host_mem_buf_desc *desc = &dev->host_mem_descs[i]; + size_t size = le32_to_cpu(desc->size) * dev->ctrl.page_size; + + dma_free_coherent(dev->dev, size, dev->host_mem_desc_bufs[i], + le64_to_cpu(desc->addr)); + } + + kfree(bufs); +out_free_descs: + kfree(descs); +out: + /* try a smaller chunk size if we failed early */ + if (chunk_size >= PAGE_SIZE * 2 && (i == 0 || size < min)) { + chunk_size /= 2; + goto retry; + } + dev->host_mem_descs = NULL; + return -ENOMEM; +} + +static void nvme_setup_host_mem(struct nvme_dev *dev) +{ + u64 max = (u64)max_host_mem_size_mb * 1024 * 1024; + u64 preferred = (u64)dev->ctrl.hmpre * 4096; + u64 min = (u64)dev->ctrl.hmmin * 4096; + u32 enable_bits = NVME_HOST_MEM_ENABLE; + + preferred = min(preferred, max); + if (min > max) { + dev_warn(dev->ctrl.device, + "min host memory (%lld MiB) above limit (%d MiB).\n", + preferred / 1024 / 1024, max_host_mem_size_mb); + nvme_free_host_mem(dev); + return; + } + + /* + * If we already have a buffer allocated check if we can reuse it. 
+ */ + if (dev->host_mem_descs) { + if (dev->host_mem_size >= min) + enable_bits |= NVME_HOST_MEM_RETURN; + else + nvme_free_host_mem(dev); + } + + if (!dev->host_mem_descs) { + if (nvme_alloc_host_mem(dev, min, preferred)) + return; + } + + if (nvme_set_host_mem(dev, enable_bits)) + nvme_free_host_mem(dev); +} + static size_t db_bar_size(struct nvme_dev *dev, unsigned nr_io_queues) { return 4096 + ((nr_io_queues + 1) * 8 * dev->db_stride); @@ -1809,8 +1976,20 @@ static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown) * Give the controller a chance to complete all entered requests if * doing a safe shutdown. */ - if (!dead && shutdown) - nvme_wait_freeze_timeout(&dev->ctrl, NVME_IO_TIMEOUT); + if (!dead) { + if (shutdown) + nvme_wait_freeze_timeout(&dev->ctrl, NVME_IO_TIMEOUT); + + /* + * If the controller is still alive tell it to stop using the + * host memory buffer. In theory the shutdown / reset should + * make sure that it doesn't access the host memoery anymore, + * but I'd rather be safe than sorry.. + */ + if (dev->host_mem_descs) + nvme_set_host_mem(dev, 0); + + } nvme_stop_queues(&dev->ctrl); queues = dev->online_queues - 1; @@ -1945,6 +2124,9 @@ static void nvme_reset_work(struct work_struct *work) "unable to allocate dma for dbbuf\n"); } + if (dev->ctrl.hmpre) + nvme_setup_host_mem(dev); + result = nvme_setup_io_queues(dev); if (result) goto out; @@ -2182,6 +2364,7 @@ static void nvme_remove(struct pci_dev *pdev) flush_work(&dev->reset_work); nvme_uninit_ctrl(&dev->ctrl); nvme_dev_disable(dev, true); + nvme_free_host_mem(dev); nvme_dev_remove_admin(dev); nvme_free_queues(dev, 0); nvme_release_cmb(dev); -- 2.11.0 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH 4/4] nvme-pci: implement host memory buffer support 2017-05-20 13:13 ` [PATCH 4/4] nvme-pci: implement host memory buffer support Christoph Hellwig @ 2017-05-22 7:22 ` Keith Busch 2017-05-22 12:48 ` Christoph Hellwig 2017-05-30 14:35 ` Sagi Grimberg 2017-05-30 21:39 ` Max Gurtovoy 2 siblings, 1 reply; 18+ messages in thread From: Keith Busch @ 2017-05-22 7:22 UTC (permalink / raw) On Sat, May 20, 2017@03:13:57PM +0200, Christoph Hellwig wrote: > + for (size = 0; size < preferred; size += chunk_size) { > + u32 len = min_t(u64, chunk_size, preferred - size); > + dma_addr_t dma_addr; > + > + bufs[i] = dma_alloc_attrs(dev->dev, len, &dma_addr, GFP_KERNEL, > + DMA_ATTR_NO_KERNEL_MAPPING | DMA_ATTR_NO_WARN); > + if (!bufs[i]) > + break; > + > + descs[i].addr = cpu_to_le64(dma_addr); > + descs[i].size = cpu_to_le32(len / dev->ctrl.page_size); > + i++; > + } > + > + if (!size || (min && size < min)) { > + dev_warn(dev->ctrl.device, > + "failed to allocate host memory buffer.\n"); > + goto out_free_bufs; > + } > + > + dev_info(dev->ctrl.device, > + "allocated %lld MiB host memory buffer.\n", size / 1024 / 1024); > + dev->nr_host_mem_descs = i; > + dev->host_mem_size = size; > + dev->host_mem_descs = descs; > + dev->host_mem_desc_bufs = bufs; > + return 0; > + > +out_free_bufs: > + while (--i >= 0) { > + struct nvme_host_mem_buf_desc *desc = &dev->host_mem_descs[i]; I think you want to use the local desc's variable since dev->host_mem_descs isn't set if you goto this out_free_bufs label. ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH 4/4] nvme-pci: implement host memory buffer support 2017-05-22 7:22 ` Keith Busch @ 2017-05-22 12:48 ` Christoph Hellwig 0 siblings, 0 replies; 18+ messages in thread From: Christoph Hellwig @ 2017-05-22 12:48 UTC (permalink / raw) On Mon, May 22, 2017@03:22:30AM -0400, Keith Busch wrote: > > +out_free_bufs: > > + while (--i >= 0) { > > + struct nvme_host_mem_buf_desc *desc = &dev->host_mem_descs[i]; > > I think you want to use the local desc's variable since > dev->host_mem_descs isn't set if you goto this out_free_bufs label. Indeed.. ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH 4/4] nvme-pci: implement host memory buffer support 2017-05-20 13:13 ` [PATCH 4/4] nvme-pci: implement host memory buffer support Christoph Hellwig 2017-05-22 7:22 ` Keith Busch @ 2017-05-30 14:35 ` Sagi Grimberg 2017-05-30 21:39 ` Max Gurtovoy 2 siblings, 0 replies; 18+ messages in thread From: Sagi Grimberg @ 2017-05-30 14:35 UTC (permalink / raw) > + dev_info(dev->ctrl.device, > + "allocated %lld MiB host memory buffer.\n", size / 1024 / 1024); super-nit: SZ_1M Looks good, Reviewed-by: Sagi Grimberg <sagi at grimberg.me> ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH 4/4] nvme-pci: implement host memory buffer support 2017-05-20 13:13 ` [PATCH 4/4] nvme-pci: implement host memory buffer support Christoph Hellwig 2017-05-22 7:22 ` Keith Busch 2017-05-30 14:35 ` Sagi Grimberg @ 2017-05-30 21:39 ` Max Gurtovoy 2017-06-01 7:18 ` Christoph Hellwig 2 siblings, 1 reply; 18+ messages in thread From: Max Gurtovoy @ 2017-05-30 21:39 UTC (permalink / raw) On 5/20/2017 4:13 PM, Christoph Hellwig wrote: > If a controller supports the host memory buffer we try to provide > it with the requested size up to an upper cap set as a module > parameter. We try to give as few as possible descriptors, eventually > working our way down. > > Signed-off-by: Christoph Hellwig <hch at lst.de> > --- > drivers/nvme/host/pci.c | 187 +++++++++++++++++++++++++++++++++++++++++++++++- > 1 file changed, 185 insertions(+), 2 deletions(-) > > diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c > index fed803232edc..93868b9750cd 100644 > --- a/drivers/nvme/host/pci.c > +++ b/drivers/nvme/host/pci.c > @@ -66,6 +66,11 @@ static bool use_cmb_sqes = true; > module_param(use_cmb_sqes, bool, 0644); > MODULE_PARM_DESC(use_cmb_sqes, "use controller's memory buffer for I/O SQes"); > > +static unsigned int max_host_mem_size_mb = 128; > +module_param(max_host_mem_size_mb, uint, 0444); > +MODULE_PARM_DESC(max_host_mem_size_mb, > + "Maximum Host Memory Buffer (HMB) size per controller (in MiB)"); > + > static struct workqueue_struct *nvme_workq; > > struct nvme_dev; > @@ -104,10 +109,18 @@ struct nvme_dev { > u32 cmbloc; > struct nvme_ctrl ctrl; > struct completion ioq_wait; > + > + /* shadow doorbell buffer support: */ > u32 *dbbuf_dbs; > dma_addr_t dbbuf_dbs_dma_addr; > u32 *dbbuf_eis; > dma_addr_t dbbuf_eis_dma_addr; > + > + /* host memory buffer support: */ > + u64 host_mem_size; > + u32 nr_host_mem_descs; > + struct nvme_host_mem_buf_desc *host_mem_descs; > + void **host_mem_desc_bufs; > }; > > static inline unsigned int sq_idx(unsigned int qid, u32 
stride) > @@ -1509,6 +1522,160 @@ static inline void nvme_release_cmb(struct nvme_dev *dev) > } > } > > +static int nvme_set_host_mem(struct nvme_dev *dev, u32 bits) > +{ > + size_t len = dev->nr_host_mem_descs * sizeof(*dev->host_mem_descs); > + struct nvme_command c; > + u64 dma_addr; > + int ret; > + > + dma_addr = dma_map_single(dev->dev, dev->host_mem_descs, len, > + DMA_TO_DEVICE); > + if (dma_mapping_error(dev->dev, dma_addr)) > + return -ENOMEM; > + > + memset(&c, 0, sizeof(c)); > + c.features.opcode = nvme_admin_set_features; > + c.features.fid = cpu_to_le32(NVME_FEAT_HOST_MEM_BUF); > + c.features.dword11 = cpu_to_le32(bits); > + c.features.dword12 = cpu_to_le32(dev->host_mem_size / > + dev->ctrl.page_size); > + c.features.dword13 = cpu_to_le32(lower_32_bits(dma_addr)); > + c.features.dword14 = cpu_to_le32(upper_32_bits(dma_addr)); > + c.features.dword15 = cpu_to_le32(dev->nr_host_mem_descs); > + > + ret = nvme_submit_sync_cmd(dev->ctrl.admin_q, &c, NULL, 0); > + if (ret) { > + dev_warn(dev->ctrl.device, > + "failed to set host mem (err %d, flags %#x).\n", > + ret, bits); > + } > + dma_unmap_single(dev->dev, dma_addr, len, DMA_TO_DEVICE); > + return ret; > +} > + > +static void nvme_free_host_mem(struct nvme_dev *dev) > +{ > + int i; > + > + for (i = 0; i < dev->nr_host_mem_descs; i++) { > + struct nvme_host_mem_buf_desc *desc = &dev->host_mem_descs[i]; > + size_t size = le32_to_cpu(desc->size) * dev->ctrl.page_size; > + > + dma_free_coherent(dev->dev, size, dev->host_mem_desc_bufs[i], > + le64_to_cpu(desc->addr)); > + } > + > + kfree(dev->host_mem_desc_bufs); > + dev->host_mem_desc_bufs = NULL; > + kfree(dev->host_mem_descs); > + dev->host_mem_descs = NULL; > +} > + > +static int nvme_alloc_host_mem(struct nvme_dev *dev, u64 min, u64 preferred) > +{ > + struct nvme_host_mem_buf_desc *descs; > + u32 chunk_size, max_entries, i = 0; > + void **bufs; > + u64 size; > + > + /* start big and work our way down */ > + chunk_size = min(preferred, (u64)PAGE_SIZE << 
MAX_ORDER); > +retry: > + max_entries = DIV_ROUND_UP(preferred, chunk_size); > + descs = kcalloc(max_entries, sizeof(*descs), GFP_KERNEL); > + if (!descs) > + goto out; > + > + bufs = kcalloc(max_entries, sizeof(*bufs), GFP_KERNEL); > + if (!bufs) > + goto out_free_descs; > + > + for (size = 0; size < preferred; size += chunk_size) { > + u32 len = min_t(u64, chunk_size, preferred - size); > + dma_addr_t dma_addr; > + > + bufs[i] = dma_alloc_attrs(dev->dev, len, &dma_addr, GFP_KERNEL, > + DMA_ATTR_NO_KERNEL_MAPPING | DMA_ATTR_NO_WARN); > + if (!bufs[i]) > + break; > + > + descs[i].addr = cpu_to_le64(dma_addr); > + descs[i].size = cpu_to_le32(len / dev->ctrl.page_size); > + i++; > + } > + > + if (!size || (min && size < min)) { > + dev_warn(dev->ctrl.device, > + "failed to allocate host memory buffer.\n"); > + goto out_free_bufs; > + } > + > + dev_info(dev->ctrl.device, > + "allocated %lld MiB host memory buffer.\n", size / 1024 / 1024); > + dev->nr_host_mem_descs = i; > + dev->host_mem_size = size; > + dev->host_mem_descs = descs; > + dev->host_mem_desc_bufs = bufs; > + return 0; > + > +out_free_bufs: > + while (--i >= 0) { > + struct nvme_host_mem_buf_desc *desc = &dev->host_mem_descs[i]; > + size_t size = le32_to_cpu(desc->size) * dev->ctrl.page_size; > + > + dma_free_coherent(dev->dev, size, dev->host_mem_desc_bufs[i], > + le64_to_cpu(desc->addr)); > + } > + > + kfree(bufs); > +out_free_descs: > + kfree(descs); > +out: > + /* try a smaller chunk size if we failed early */ > + if (chunk_size >= PAGE_SIZE * 2 && (i == 0 || size < min)) { > + chunk_size /= 2; > + goto retry; > + } > + dev->host_mem_descs = NULL; > + return -ENOMEM; > +} > + > +static void nvme_setup_host_mem(struct nvme_dev *dev) > +{ > + u64 max = (u64)max_host_mem_size_mb * 1024 * 1024; > + u64 preferred = (u64)dev->ctrl.hmpre * 4096; > + u64 min = (u64)dev->ctrl.hmmin * 4096; > + u32 enable_bits = NVME_HOST_MEM_ENABLE; > + > + preferred = min(preferred, max); > + if (min > max) { Should it be: if 
(min > preferred) ? for more intuitive code without assumptions? > + dev_warn(dev->ctrl.device, > + "min host memory (%lld MiB) above limit (%d MiB).\n", > + preferred / 1024 / 1024, max_host_mem_size_mb); here we can: preferred ==> min max_host_mem_size_mb ==> preferred / 1024 / 1024 ? other option is change the warning print from "preferred" to "min" since we assume that dev->ctrl.hmpre >= dev->ctrl.hmmin. your call. Otherwise Looks good, Reviewed-by: Max Gurtovoy <maxg at mellanox.com> ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH 4/4] nvme-pci: implement host memory buffer support 2017-05-30 21:39 ` Max Gurtovoy @ 2017-06-01 7:18 ` Christoph Hellwig 2017-06-01 9:19 ` Max Gurtovoy 0 siblings, 1 reply; 18+ messages in thread From: Christoph Hellwig @ 2017-06-01 7:18 UTC (permalink / raw) > > +static void nvme_setup_host_mem(struct nvme_dev *dev) > > +{ > > + u64 max = (u64)max_host_mem_size_mb * 1024 * 1024; > > + u64 preferred = (u64)dev->ctrl.hmpre * 4096; > > + u64 min = (u64)dev->ctrl.hmmin * 4096; > > + u32 enable_bits = NVME_HOST_MEM_ENABLE; > > + > > + preferred = min(preferred, max); > > + if (min > max) { > > Should it be: > if (min > preferred) ? No. max is the maxium value the kernel allows a device to take. If the minimum required value is bigger than that we have to fail, it has nothing to do with the preferred value. > > + dev_warn(dev->ctrl.device, > > + "min host memory (%lld MiB) above limit (%d MiB).\n", > > + preferred / 1024 / 1024, max_host_mem_size_mb); > > here we can: > preferred ==> min > max_host_mem_size_mb ==> preferred / 1024 / 1024 ? > > other option is change the warning print from "preferred" to "min" since > we assume that dev->ctrl.hmpre >= dev->ctrl.hmmin. The first one should indeed be min, but I think that's the only change we need. ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH 4/4] nvme-pci: implement host memory buffer support 2017-06-01 7:18 ` Christoph Hellwig @ 2017-06-01 9:19 ` Max Gurtovoy 0 siblings, 0 replies; 18+ messages in thread From: Max Gurtovoy @ 2017-06-01 9:19 UTC (permalink / raw) On 6/1/2017 10:18 AM, Christoph Hellwig wrote: >>> +static void nvme_setup_host_mem(struct nvme_dev *dev) >>> +{ >>> + u64 max = (u64)max_host_mem_size_mb * 1024 * 1024; >>> + u64 preferred = (u64)dev->ctrl.hmpre * 4096; >>> + u64 min = (u64)dev->ctrl.hmmin * 4096; >>> + u32 enable_bits = NVME_HOST_MEM_ENABLE; >>> + >>> + preferred = min(preferred, max); >>> + if (min > max) { >> >> Should it be: >> if (min > preferred) ? > > No. max is the maxium value the kernel allows a device to take. > If the minimum required value is bigger than that we have to > fail, it has nothing to do with the preferred value. > Ok, I probably got confused here because the "preferred = min(preferred, max);" statement was just before the "if" clause. >>> + dev_warn(dev->ctrl.device, >>> + "min host memory (%lld MiB) above limit (%d MiB).\n", >>> + preferred / 1024 / 1024, max_host_mem_size_mb); >> >> here we can: >> preferred ==> min >> max_host_mem_size_mb ==> preferred / 1024 / 1024 ? >> >> other option is change the warning print from "preferred" to "min" since >> we assume that dev->ctrl.hmpre >= dev->ctrl.hmmin. > > The first one should indeed be min, but I think that's the only change > we need. > Sure. ^ permalink raw reply [flat|nested] 18+ messages in thread
* host memory buffer support V2 @ 2017-06-01 11:27 Christoph Hellwig 2017-06-01 11:27 ` [PATCH 4/4] nvme-pci: implement host memory buffer support Christoph Hellwig 0 siblings, 1 reply; 18+ messages in thread From: Christoph Hellwig @ 2017-06-01 11:27 UTC (permalink / raw) This series adds support for the NVMe host memory buffer feature, which is a way for resource-constrained devices to use some host computer memory for their own purposes. Changes since V1: - properly unwind on allocation failures - minor cleanups ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH 4/4] nvme-pci: implement host memory buffer support 2017-06-01 11:27 host memory buffer support V2 Christoph Hellwig @ 2017-06-01 11:27 ` Christoph Hellwig 2017-06-01 12:58 ` Keith Busch ` (2 more replies) 0 siblings, 3 replies; 18+ messages in thread From: Christoph Hellwig @ 2017-06-01 11:27 UTC (permalink / raw) If a controller supports the host memory buffer we try to provide it with the requested size up to an upper cap set as a module parameter. We try to give as few as possible descriptors, eventually working our way down. Signed-off-by: Christoph Hellwig <hch at lst.de> --- drivers/nvme/host/pci.c | 186 +++++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 184 insertions(+), 2 deletions(-) diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c index d52701df7245..b4235ffacd83 100644 --- a/drivers/nvme/host/pci.c +++ b/drivers/nvme/host/pci.c @@ -66,6 +66,11 @@ static bool use_cmb_sqes = true; module_param(use_cmb_sqes, bool, 0644); MODULE_PARM_DESC(use_cmb_sqes, "use controller's memory buffer for I/O SQes"); +static unsigned int max_host_mem_size_mb = 128; +module_param(max_host_mem_size_mb, uint, 0444); +MODULE_PARM_DESC(max_host_mem_size_mb, + "Maximum Host Memory Buffer (HMB) size per controller (in MiB)"); + static struct workqueue_struct *nvme_workq; struct nvme_dev; @@ -104,10 +109,18 @@ struct nvme_dev { u32 cmbloc; struct nvme_ctrl ctrl; struct completion ioq_wait; + + /* shadow doorbell buffer support: */ u32 *dbbuf_dbs; dma_addr_t dbbuf_dbs_dma_addr; u32 *dbbuf_eis; dma_addr_t dbbuf_eis_dma_addr; + + /* host memory buffer support: */ + u64 host_mem_size; + u32 nr_host_mem_descs; + struct nvme_host_mem_buf_desc *host_mem_descs; + void **host_mem_desc_bufs; }; static inline unsigned int sq_idx(unsigned int qid, u32 stride) @@ -1514,6 +1527,159 @@ static inline void nvme_release_cmb(struct nvme_dev *dev) } } +static int nvme_set_host_mem(struct nvme_dev *dev, u32 bits) +{ + size_t len = dev->nr_host_mem_descs * 
sizeof(*dev->host_mem_descs); + struct nvme_command c; + u64 dma_addr; + int ret; + + dma_addr = dma_map_single(dev->dev, dev->host_mem_descs, len, + DMA_TO_DEVICE); + if (dma_mapping_error(dev->dev, dma_addr)) + return -ENOMEM; + + memset(&c, 0, sizeof(c)); + c.features.opcode = nvme_admin_set_features; + c.features.fid = cpu_to_le32(NVME_FEAT_HOST_MEM_BUF); + c.features.dword11 = cpu_to_le32(bits); + c.features.dword12 = cpu_to_le32(dev->host_mem_size / + dev->ctrl.page_size); + c.features.dword13 = cpu_to_le32(lower_32_bits(dma_addr)); + c.features.dword14 = cpu_to_le32(upper_32_bits(dma_addr)); + c.features.dword15 = cpu_to_le32(dev->nr_host_mem_descs); + + ret = nvme_submit_sync_cmd(dev->ctrl.admin_q, &c, NULL, 0); + if (ret) { + dev_warn(dev->ctrl.device, + "failed to set host mem (err %d, flags %#x).\n", + ret, bits); + } + dma_unmap_single(dev->dev, dma_addr, len, DMA_TO_DEVICE); + return ret; +} + +static void nvme_free_host_mem(struct nvme_dev *dev) +{ + int i; + + for (i = 0; i < dev->nr_host_mem_descs; i++) { + struct nvme_host_mem_buf_desc *desc = &dev->host_mem_descs[i]; + size_t size = le32_to_cpu(desc->size) * dev->ctrl.page_size; + + dma_free_coherent(dev->dev, size, dev->host_mem_desc_bufs[i], + le64_to_cpu(desc->addr)); + } + + kfree(dev->host_mem_desc_bufs); + dev->host_mem_desc_bufs = NULL; + kfree(dev->host_mem_descs); + dev->host_mem_descs = NULL; +} + +static int nvme_alloc_host_mem(struct nvme_dev *dev, u64 min, u64 preferred) +{ + struct nvme_host_mem_buf_desc *descs; + u32 chunk_size, max_entries, i = 0; + void **bufs; + u64 size; + + /* start big and work our way down */ + chunk_size = min(preferred, (u64)PAGE_SIZE << MAX_ORDER); +retry: + max_entries = DIV_ROUND_UP(preferred, chunk_size); + descs = kcalloc(max_entries, sizeof(*descs), GFP_KERNEL); + if (!descs) + goto out; + + bufs = kcalloc(max_entries, sizeof(*bufs), GFP_KERNEL); + if (!bufs) + goto out_free_descs; + + for (size = 0; size < preferred; size += chunk_size) { + u32 len = 
min_t(u64, chunk_size, preferred - size); + dma_addr_t dma_addr; + + bufs[i] = dma_alloc_attrs(dev->dev, len, &dma_addr, GFP_KERNEL, + DMA_ATTR_NO_KERNEL_MAPPING | DMA_ATTR_NO_WARN); + if (!bufs[i]) + break; + + descs[i].addr = cpu_to_le64(dma_addr); + descs[i].size = cpu_to_le32(len / dev->ctrl.page_size); + i++; + } + + if (!size || (min && size < min)) { + dev_warn(dev->ctrl.device, + "failed to allocate host memory buffer.\n"); + goto out_free_bufs; + } + + dev_info(dev->ctrl.device, + "allocated %lld MiB host memory buffer.\n", size / SZ_1M); + dev->nr_host_mem_descs = i; + dev->host_mem_size = size; + dev->host_mem_descs = descs; + dev->host_mem_desc_bufs = bufs; + return 0; + +out_free_bufs: + while (--i >= 0) { + size_t size = le32_to_cpu(descs[i].size) * dev->ctrl.page_size; + + dma_free_coherent(dev->dev, size, bufs[i], + le64_to_cpu(descs[i].addr)); + } + + kfree(bufs); +out_free_descs: + kfree(descs); +out: + /* try a smaller chunk size if we failed early */ + if (chunk_size >= PAGE_SIZE * 2 && (i == 0 || size < min)) { + chunk_size /= 2; + goto retry; + } + dev->host_mem_descs = NULL; + return -ENOMEM; +} + +static void nvme_setup_host_mem(struct nvme_dev *dev) +{ + u64 max = (u64)max_host_mem_size_mb * SZ_1M; + u64 preferred = (u64)dev->ctrl.hmpre * 4096; + u64 min = (u64)dev->ctrl.hmmin * 4096; + u32 enable_bits = NVME_HOST_MEM_ENABLE; + + preferred = min(preferred, max); + if (min > max) { + dev_warn(dev->ctrl.device, + "min host memory (%lld MiB) above limit (%d MiB).\n", + min / SZ_1M, max_host_mem_size_mb); + nvme_free_host_mem(dev); + return; + } + + /* + * If we already have a buffer allocated check if we can reuse it. 
+ */ + if (dev->host_mem_descs) { + if (dev->host_mem_size >= min) + enable_bits |= NVME_HOST_MEM_RETURN; + else + nvme_free_host_mem(dev); + } + + if (!dev->host_mem_descs) { + if (nvme_alloc_host_mem(dev, min, preferred)) + return; + } + + if (nvme_set_host_mem(dev, enable_bits)) + nvme_free_host_mem(dev); +} + static size_t db_bar_size(struct nvme_dev *dev, unsigned nr_io_queues) { return 4096 + ((nr_io_queues + 1) * 8 * dev->db_stride); @@ -1815,8 +1981,20 @@ static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown) * Give the controller a chance to complete all entered requests if * doing a safe shutdown. */ - if (!dead && shutdown) - nvme_wait_freeze_timeout(&dev->ctrl, NVME_IO_TIMEOUT); + if (!dead) { + if (shutdown) + nvme_wait_freeze_timeout(&dev->ctrl, NVME_IO_TIMEOUT); + + /* + * If the controller is still alive tell it to stop using the + * host memory buffer. In theory the shutdown / reset should + * make sure that it doesn't access the host memoery anymore, + * but I'd rather be safe than sorry.. + */ + if (dev->host_mem_descs) + nvme_set_host_mem(dev, 0); + + } nvme_stop_queues(&dev->ctrl); queues = dev->online_queues - 1; @@ -1951,6 +2129,9 @@ static void nvme_reset_work(struct work_struct *work) "unable to allocate dma for dbbuf\n"); } + if (dev->ctrl.hmpre) + nvme_setup_host_mem(dev); + result = nvme_setup_io_queues(dev); if (result) goto out; @@ -2189,6 +2370,7 @@ static void nvme_remove(struct pci_dev *pdev) flush_work(&dev->reset_work); nvme_uninit_ctrl(&dev->ctrl); nvme_dev_disable(dev, true); + nvme_free_host_mem(dev); nvme_dev_remove_admin(dev); nvme_free_queues(dev, 0); nvme_release_prp_pools(dev); -- 2.11.0 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH 4/4] nvme-pci: implement host memory buffer support
  2017-06-01 11:27 ` [PATCH 4/4] nvme-pci: implement host memory buffer support Christoph Hellwig
@ 2017-06-01 12:58   ` Keith Busch
  2017-06-01 14:56   ` Max Gurtovoy
  2017-06-02  7:52   ` Johannes Thumshirn
  2 siblings, 0 replies; 18+ messages in thread
From: Keith Busch @ 2017-06-01 12:58 UTC (permalink / raw)

Looks good.

Reviewed-by: Keith Busch <keith.busch at intel.com>

^ permalink raw reply	[flat|nested] 18+ messages in thread
* [PATCH 4/4] nvme-pci: implement host memory buffer support
  2017-06-01 11:27 ` [PATCH 4/4] nvme-pci: implement host memory buffer support Christoph Hellwig
  2017-06-01 12:58   ` Keith Busch
@ 2017-06-01 14:56   ` Max Gurtovoy
  2017-06-02  7:52   ` Johannes Thumshirn
  2 siblings, 0 replies; 18+ messages in thread
From: Max Gurtovoy @ 2017-06-01 14:56 UTC (permalink / raw)

Looks good,

Reviewed-by: Max Gurtovoy <maxg at mellanox.com>

^ permalink raw reply	[flat|nested] 18+ messages in thread
* [PATCH 4/4] nvme-pci: implement host memory buffer support
  2017-06-01 11:27 ` [PATCH 4/4] nvme-pci: implement host memory buffer support Christoph Hellwig
  2017-06-01 12:58   ` Keith Busch
  2017-06-01 14:56   ` Max Gurtovoy
@ 2017-06-02  7:52   ` Johannes Thumshirn
  2 siblings, 0 replies; 18+ messages in thread
From: Johannes Thumshirn @ 2017-06-02  7:52 UTC (permalink / raw)

On 06/01/2017 01:27 PM, Christoph Hellwig wrote:
> If a controller supports the host memory buffer we try to provide
> it with the requested size up to an upper cap set as a module
> parameter. We try to give as few as possible descriptors, eventually
> working our way down.
>
> Signed-off-by: Christoph Hellwig <hch at lst.de>
> ---

I don't really think the goto back and forth jumping is very intuitive,
but I can't come up with a better solution either, so

Reviewed-by: Johannes Thumshirn <jthumshirn at suse.de>

-- 
Johannes Thumshirn                                          Storage
jthumshirn at suse.de                               +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply	[flat|nested] 18+ messages in thread
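For what it's worth, the back-and-forth jumping can be flattened into a plain loop. This user-space sketch (the function name and the alloc_limit failure stand-in are invented for illustration, and the descriptor bookkeeping and unwinding of a failed pass are elided) keeps the same retry semantics without any goto:

```c
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096ULL

/*
 * Loop-structured sketch of the retry in nvme_alloc_host_mem():
 * each outer iteration is one allocation pass; after a failed pass
 * the chunk size is halved, until it reaches a single page.
 * alloc_limit models the largest allocation the system will grant.
 */
uint64_t hmb_alloc_loop(uint64_t preferred, uint64_t min_size,
			unsigned max_entries, uint64_t chunk_size,
			uint64_t alloc_limit, unsigned *ndescs)
{
	for (;;) {
		uint64_t size = 0;
		unsigned i = 0;

		while (size < preferred && i < max_entries) {
			uint64_t len = chunk_size < preferred - size ?
					chunk_size : preferred - size;

			if (len > alloc_limit)	/* allocation failed */
				break;
			size += len;
			i++;
		}

		if (size && (!min_size || size >= min_size)) {
			*ndescs = i;
			return size;		/* success */
		}

		/* a real version would free the partial pass here */
		if (chunk_size < MODEL_PAGE_SIZE * 2)
			break;			/* nothing smaller to try */
		chunk_size /= 2;
	}
	*ndescs = 0;
	return 0;
}
```

The trade-off is an extra level of indentation for the allocation pass, which may be why the kernel code prefers the label-based layout.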
end of thread, other threads:[~2017-06-02  7:52 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-05-20 13:13 host memory buffer support Christoph Hellwig
2017-05-20 13:13 ` [PATCH 1/4] nvme.h: add struct nvme_host_mem_buf_desc and HMB flags Christoph Hellwig
2017-05-30 14:31   ` Sagi Grimberg
2017-05-20 13:13 ` [PATCH 2/4] nvme.h: add dword 12 - 15 fields to struct nvme_features Christoph Hellwig
2017-05-30 14:31   ` Sagi Grimberg
2017-05-20 13:13 ` [PATCH 3/4] nvme: save hmpre and hmmin in struct nvme_ctrl Christoph Hellwig
2017-05-30 14:31   ` Sagi Grimberg
2017-05-20 13:13 ` [PATCH 4/4] nvme-pci: implement host memory buffer support Christoph Hellwig
2017-05-22  7:22   ` Keith Busch
2017-05-22 12:48     ` Christoph Hellwig
2017-05-30 14:35   ` Sagi Grimberg
2017-05-30 21:39     ` Max Gurtovoy
2017-06-01  7:18       ` Christoph Hellwig
2017-06-01  9:19         ` Max Gurtovoy
2017-06-01 11:27 host memory buffer support V2 Christoph Hellwig
2017-06-01 11:27 ` [PATCH 4/4] nvme-pci: implement host memory buffer support Christoph Hellwig
2017-06-01 12:58   ` Keith Busch
2017-06-01 14:56   ` Max Gurtovoy
2017-06-02  7:52   ` Johannes Thumshirn