* host memory buffer support
@ 2017-05-20 13:13 Christoph Hellwig
2017-05-20 13:13 ` [PATCH 1/4] nvme.h: add struct nvme_host_mem_buf_desc and HMB flags Christoph Hellwig
` (3 more replies)
0 siblings, 4 replies; 14+ messages in thread
From: Christoph Hellwig @ 2017-05-20 13:13 UTC (permalink / raw)
This series adds support for the NVMe host memory buffer feature, which
is a way for resource-constrained devices to use some host memory for
their own purposes.
* [PATCH 1/4] nvme.h: add struct nvme_host_mem_buf_desc and HMB flags
2017-05-20 13:13 host memory buffer support Christoph Hellwig
@ 2017-05-20 13:13 ` Christoph Hellwig
2017-05-30 14:31 ` Sagi Grimberg
2017-05-20 13:13 ` [PATCH 2/4] nvme.h: add dword 12 - 15 fields to struct nvme_features Christoph Hellwig
` (2 subsequent siblings)
3 siblings, 1 reply; 14+ messages in thread
From: Christoph Hellwig @ 2017-05-20 13:13 UTC (permalink / raw)
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
include/linux/nvme.h | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/include/linux/nvme.h b/include/linux/nvme.h
index b625bacf37ef..9fd77fde4ae0 100644
--- a/include/linux/nvme.h
+++ b/include/linux/nvme.h
@@ -586,6 +586,11 @@ struct nvme_feat_auto_pst {
__le64 entries[32];
};
+enum {
+ NVME_HOST_MEM_ENABLE = (1 << 0),
+ NVME_HOST_MEM_RETURN = (1 << 1),
+};
+
/* Admin commands */
enum nvme_admin_opcode {
@@ -670,6 +675,12 @@ struct nvme_features {
__u32 rsvd12[4];
};
+struct nvme_host_mem_buf_desc {
+ __le64 addr;
+ __le32 size;
+ __u32 rsvd;
+};
+
struct nvme_create_cq {
__u8 opcode;
__u8 flags;
--
2.11.0
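
For illustration, here is how a single descriptor entry might be filled
in. A minimal sketch, not driver code: dma_addr and ctrl_page_size are
assumptions standing in for values the driver obtains elsewhere (the
DMA mapping of the chunk and the controller page size):

	struct nvme_host_mem_buf_desc desc;

	/*
	 * addr is the bus address of the chunk; size counts controller
	 * pages, not bytes.  Both fields are little-endian.
	 */
	desc.addr = cpu_to_le64(dma_addr);
	desc.size = cpu_to_le32(SZ_1M / ctrl_page_size);	/* a 1 MiB chunk */
	desc.rsvd = 0;

Patch 4 builds an array of these descriptors and hands it to the
controller via Set Features.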
* [PATCH 2/4] nvme.h: add dword 12 - 15 fields to struct nvme_features
2017-05-20 13:13 host memory buffer support Christoph Hellwig
2017-05-20 13:13 ` [PATCH 1/4] nvme.h: add struct nvme_host_mem_buf_desc and HMB flags Christoph Hellwig
@ 2017-05-20 13:13 ` Christoph Hellwig
2017-05-30 14:31 ` Sagi Grimberg
2017-05-20 13:13 ` [PATCH 3/4] nvme: save hmpre and hmmin in struct nvme_ctrl Christoph Hellwig
2017-05-20 13:13 ` [PATCH 4/4] nvme-pci: implement host memory buffer support Christoph Hellwig
3 siblings, 1 reply; 14+ messages in thread
From: Christoph Hellwig @ 2017-05-20 13:13 UTC (permalink / raw)
From: Arnav Dawn <a.dawn@samsung.com>
Signed-off-by: Arnav Dawn <a.dawn@samsung.com>
[hch: split from a larger patch, new changelog]
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
include/linux/nvme.h | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/include/linux/nvme.h b/include/linux/nvme.h
index 9fd77fde4ae0..beaefe822104 100644
--- a/include/linux/nvme.h
+++ b/include/linux/nvme.h
@@ -672,7 +672,10 @@ struct nvme_features {
union nvme_data_ptr dptr;
__le32 fid;
__le32 dword11;
- __u32 rsvd12[4];
+ __le32 dword12;
+ __le32 dword13;
+ __le32 dword14;
+ __le32 dword15;
};
struct nvme_host_mem_buf_desc {
--
2.11.0
* [PATCH 3/4] nvme: save hmpre and hmmin in struct nvme_ctrl
2017-05-20 13:13 host memory buffer support Christoph Hellwig
2017-05-20 13:13 ` [PATCH 1/4] nvme.h: add struct nvme_host_mem_buf_desc and HMB flags Christoph Hellwig
2017-05-20 13:13 ` [PATCH 2/4] nvme.h: add dword 12 - 15 fields to struct nvme_features Christoph Hellwig
@ 2017-05-20 13:13 ` Christoph Hellwig
2017-05-30 14:31 ` Sagi Grimberg
2017-05-20 13:13 ` [PATCH 4/4] nvme-pci: implement host memory buffer support Christoph Hellwig
3 siblings, 1 reply; 14+ messages in thread
From: Christoph Hellwig @ 2017-05-20 13:13 UTC (permalink / raw)
We'll need them later for the HMB support.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
drivers/nvme/host/core.c | 2 ++
drivers/nvme/host/nvme.h | 3 +++
2 files changed, 5 insertions(+)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index d5e0906262ea..0f7fada284da 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1625,6 +1625,8 @@ int nvme_init_identify(struct nvme_ctrl *ctrl)
}
} else {
ctrl->cntlid = le16_to_cpu(id->cntlid);
+ ctrl->hmpre = le32_to_cpu(id->hmpre);
+ ctrl->hmmin = le32_to_cpu(id->hmmin);
}
kfree(id);
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 29c708ca9621..605f25b5145e 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -166,6 +166,9 @@ struct nvme_ctrl {
/* Power saving configuration */
u64 ps_max_latency_us;
+ u32 hmpre;
+ u32 hmmin;
+
/* Fabrics only */
u16 sqsize;
u32 ioccsz;
--
2.11.0
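
HMPRE and HMMIN are reported by Identify Controller in units of 4 KiB,
so turning the values saved above into byte counts is a simple
multiplication; a sketch of the conversion patch 4 performs in
nvme_setup_host_mem():

	/* hmpre/hmmin are in 4 KiB units per the NVMe specification */
	u64 preferred = (u64)ctrl->hmpre * 4096;
	u64 min = (u64)ctrl->hmmin * 4096;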
* [PATCH 4/4] nvme-pci: implement host memory buffer support
2017-05-20 13:13 host memory buffer support Christoph Hellwig
` (2 preceding siblings ...)
2017-05-20 13:13 ` [PATCH 3/4] nvme: save hmpre and hmmin in struct nvme_ctrl Christoph Hellwig
@ 2017-05-20 13:13 ` Christoph Hellwig
2017-05-22 7:22 ` Keith Busch
` (2 more replies)
3 siblings, 3 replies; 14+ messages in thread
From: Christoph Hellwig @ 2017-05-20 13:13 UTC (permalink / raw)
If a controller supports the host memory buffer feature, we try to
provide it with the requested size, up to an upper cap set as a module
parameter. We try to use as few descriptors as possible, working our
way down to smaller chunk sizes when large allocations fail.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
drivers/nvme/host/pci.c | 187 +++++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 185 insertions(+), 2 deletions(-)
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index fed803232edc..93868b9750cd 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -66,6 +66,11 @@ static bool use_cmb_sqes = true;
module_param(use_cmb_sqes, bool, 0644);
MODULE_PARM_DESC(use_cmb_sqes, "use controller's memory buffer for I/O SQes");
+static unsigned int max_host_mem_size_mb = 128;
+module_param(max_host_mem_size_mb, uint, 0444);
+MODULE_PARM_DESC(max_host_mem_size_mb,
+ "Maximum Host Memory Buffer (HMB) size per controller (in MiB)");
+
static struct workqueue_struct *nvme_workq;
struct nvme_dev;
@@ -104,10 +109,18 @@ struct nvme_dev {
u32 cmbloc;
struct nvme_ctrl ctrl;
struct completion ioq_wait;
+
+ /* shadow doorbell buffer support: */
u32 *dbbuf_dbs;
dma_addr_t dbbuf_dbs_dma_addr;
u32 *dbbuf_eis;
dma_addr_t dbbuf_eis_dma_addr;
+
+ /* host memory buffer support: */
+ u64 host_mem_size;
+ u32 nr_host_mem_descs;
+ struct nvme_host_mem_buf_desc *host_mem_descs;
+ void **host_mem_desc_bufs;
};
static inline unsigned int sq_idx(unsigned int qid, u32 stride)
@@ -1509,6 +1522,160 @@ static inline void nvme_release_cmb(struct nvme_dev *dev)
}
}
+static int nvme_set_host_mem(struct nvme_dev *dev, u32 bits)
+{
+ size_t len = dev->nr_host_mem_descs * sizeof(*dev->host_mem_descs);
+ struct nvme_command c;
+ u64 dma_addr;
+ int ret;
+
+ dma_addr = dma_map_single(dev->dev, dev->host_mem_descs, len,
+ DMA_TO_DEVICE);
+ if (dma_mapping_error(dev->dev, dma_addr))
+ return -ENOMEM;
+
+ memset(&c, 0, sizeof(c));
+ c.features.opcode = nvme_admin_set_features;
+ c.features.fid = cpu_to_le32(NVME_FEAT_HOST_MEM_BUF);
+ c.features.dword11 = cpu_to_le32(bits);
+ c.features.dword12 = cpu_to_le32(dev->host_mem_size /
+ dev->ctrl.page_size);
+ c.features.dword13 = cpu_to_le32(lower_32_bits(dma_addr));
+ c.features.dword14 = cpu_to_le32(upper_32_bits(dma_addr));
+ c.features.dword15 = cpu_to_le32(dev->nr_host_mem_descs);
+
+ ret = nvme_submit_sync_cmd(dev->ctrl.admin_q, &c, NULL, 0);
+ if (ret) {
+ dev_warn(dev->ctrl.device,
+ "failed to set host mem (err %d, flags %#x).\n",
+ ret, bits);
+ }
+ dma_unmap_single(dev->dev, dma_addr, len, DMA_TO_DEVICE);
+ return ret;
+}
+
+static void nvme_free_host_mem(struct nvme_dev *dev)
+{
+ int i;
+
+ for (i = 0; i < dev->nr_host_mem_descs; i++) {
+ struct nvme_host_mem_buf_desc *desc = &dev->host_mem_descs[i];
+ size_t size = le32_to_cpu(desc->size) * dev->ctrl.page_size;
+
+ dma_free_coherent(dev->dev, size, dev->host_mem_desc_bufs[i],
+ le64_to_cpu(desc->addr));
+ }
+
+ kfree(dev->host_mem_desc_bufs);
+ dev->host_mem_desc_bufs = NULL;
+ kfree(dev->host_mem_descs);
+ dev->host_mem_descs = NULL;
+}
+
+static int nvme_alloc_host_mem(struct nvme_dev *dev, u64 min, u64 preferred)
+{
+ struct nvme_host_mem_buf_desc *descs;
+ u32 chunk_size, max_entries, i = 0;
+ void **bufs;
+ u64 size;
+
+ /* start big and work our way down */
+ chunk_size = min(preferred, (u64)PAGE_SIZE << MAX_ORDER);
+retry:
+ max_entries = DIV_ROUND_UP(preferred, chunk_size);
+ descs = kcalloc(max_entries, sizeof(*descs), GFP_KERNEL);
+ if (!descs)
+ goto out;
+
+ bufs = kcalloc(max_entries, sizeof(*bufs), GFP_KERNEL);
+ if (!bufs)
+ goto out_free_descs;
+
+ for (size = 0; size < preferred; size += chunk_size) {
+ u32 len = min_t(u64, chunk_size, preferred - size);
+ dma_addr_t dma_addr;
+
+ bufs[i] = dma_alloc_attrs(dev->dev, len, &dma_addr, GFP_KERNEL,
+ DMA_ATTR_NO_KERNEL_MAPPING | DMA_ATTR_NO_WARN);
+ if (!bufs[i])
+ break;
+
+ descs[i].addr = cpu_to_le64(dma_addr);
+ descs[i].size = cpu_to_le32(len / dev->ctrl.page_size);
+ i++;
+ }
+
+ if (!size || (min && size < min)) {
+ dev_warn(dev->ctrl.device,
+ "failed to allocate host memory buffer.\n");
+ goto out_free_bufs;
+ }
+
+ dev_info(dev->ctrl.device,
+ "allocated %lld MiB host memory buffer.\n", size / 1024 / 1024);
+ dev->nr_host_mem_descs = i;
+ dev->host_mem_size = size;
+ dev->host_mem_descs = descs;
+ dev->host_mem_desc_bufs = bufs;
+ return 0;
+
+out_free_bufs:
+ while (--i >= 0) {
+ struct nvme_host_mem_buf_desc *desc = &dev->host_mem_descs[i];
+ size_t size = le32_to_cpu(desc->size) * dev->ctrl.page_size;
+
+ dma_free_coherent(dev->dev, size, dev->host_mem_desc_bufs[i],
+ le64_to_cpu(desc->addr));
+ }
+
+ kfree(bufs);
+out_free_descs:
+ kfree(descs);
+out:
+ /* try a smaller chunk size if we failed early */
+ if (chunk_size >= PAGE_SIZE * 2 && (i == 0 || size < min)) {
+ chunk_size /= 2;
+ goto retry;
+ }
+ dev->host_mem_descs = NULL;
+ return -ENOMEM;
+}
+
+static void nvme_setup_host_mem(struct nvme_dev *dev)
+{
+ u64 max = (u64)max_host_mem_size_mb * 1024 * 1024;
+ u64 preferred = (u64)dev->ctrl.hmpre * 4096;
+ u64 min = (u64)dev->ctrl.hmmin * 4096;
+ u32 enable_bits = NVME_HOST_MEM_ENABLE;
+
+ preferred = min(preferred, max);
+ if (min > max) {
+ dev_warn(dev->ctrl.device,
+ "min host memory (%lld MiB) above limit (%d MiB).\n",
+ preferred / 1024 / 1024, max_host_mem_size_mb);
+ nvme_free_host_mem(dev);
+ return;
+ }
+
+ /*
+ * If we already have a buffer allocated check if we can reuse it.
+ */
+ if (dev->host_mem_descs) {
+ if (dev->host_mem_size >= min)
+ enable_bits |= NVME_HOST_MEM_RETURN;
+ else
+ nvme_free_host_mem(dev);
+ }
+
+ if (!dev->host_mem_descs) {
+ if (nvme_alloc_host_mem(dev, min, preferred))
+ return;
+ }
+
+ if (nvme_set_host_mem(dev, enable_bits))
+ nvme_free_host_mem(dev);
+}
+
static size_t db_bar_size(struct nvme_dev *dev, unsigned nr_io_queues)
{
return 4096 + ((nr_io_queues + 1) * 8 * dev->db_stride);
@@ -1809,8 +1976,20 @@ static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown)
* Give the controller a chance to complete all entered requests if
* doing a safe shutdown.
*/
- if (!dead && shutdown)
- nvme_wait_freeze_timeout(&dev->ctrl, NVME_IO_TIMEOUT);
+ if (!dead) {
+ if (shutdown)
+ nvme_wait_freeze_timeout(&dev->ctrl, NVME_IO_TIMEOUT);
+
+ /*
+ * If the controller is still alive tell it to stop using the
+ * host memory buffer. In theory the shutdown / reset should
+ * make sure that it doesn't access the host memory anymore,
+ * but I'd rather be safe than sorry..
+ */
+ if (dev->host_mem_descs)
+ nvme_set_host_mem(dev, 0);
+
+ }
nvme_stop_queues(&dev->ctrl);
queues = dev->online_queues - 1;
@@ -1945,6 +2124,9 @@ static void nvme_reset_work(struct work_struct *work)
"unable to allocate dma for dbbuf\n");
}
+ if (dev->ctrl.hmpre)
+ nvme_setup_host_mem(dev);
+
result = nvme_setup_io_queues(dev);
if (result)
goto out;
@@ -2182,6 +2364,7 @@ static void nvme_remove(struct pci_dev *pdev)
flush_work(&dev->reset_work);
nvme_uninit_ctrl(&dev->ctrl);
nvme_dev_disable(dev, true);
+ nvme_free_host_mem(dev);
nvme_dev_remove_admin(dev);
nvme_free_queues(dev, 0);
nvme_release_cmb(dev);
--
2.11.0
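
To restate the allocation strategy from the changelog: start with the
largest chunk the page allocator can hand out, try to cover the
preferred size with as few chunks as possible, and halve the chunk size
and retry whenever that fails, until either the minimum is met or the
chunks would drop below a page. A condensed sketch of just that control
flow, with try_alloc_in_chunks() as a hypothetical stand-in for the
descriptor-filling loop in nvme_alloc_host_mem():

	u64 chunk_size = min_t(u64, preferred, (u64)PAGE_SIZE << MAX_ORDER);

	for (;;) {
		/* allocate up to "preferred" bytes in chunk_size pieces */
		u64 allocated = try_alloc_in_chunks(dev, preferred, chunk_size);

		if (allocated && allocated >= min)
			return 0;		/* large enough, keep it */
		if (chunk_size < PAGE_SIZE * 2)
			return -ENOMEM;		/* nothing smaller to try */
		chunk_size /= 2;		/* retry with smaller chunks */
	}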
* [PATCH 4/4] nvme-pci: implement host memory buffer support
2017-05-20 13:13 ` [PATCH 4/4] nvme-pci: implement host memory buffer support Christoph Hellwig
@ 2017-05-22 7:22 ` Keith Busch
2017-05-22 12:48 ` Christoph Hellwig
2017-05-30 14:35 ` Sagi Grimberg
2017-05-30 21:39 ` Max Gurtovoy
2 siblings, 1 reply; 14+ messages in thread
From: Keith Busch @ 2017-05-22 7:22 UTC (permalink / raw)
On Sat, May 20, 2017 at 03:13:57PM +0200, Christoph Hellwig wrote:
> + for (size = 0; size < preferred; size += chunk_size) {
> + u32 len = min_t(u64, chunk_size, preferred - size);
> + dma_addr_t dma_addr;
> +
> + bufs[i] = dma_alloc_attrs(dev->dev, len, &dma_addr, GFP_KERNEL,
> + DMA_ATTR_NO_KERNEL_MAPPING | DMA_ATTR_NO_WARN);
> + if (!bufs[i])
> + break;
> +
> + descs[i].addr = cpu_to_le64(dma_addr);
> + descs[i].size = cpu_to_le32(len / dev->ctrl.page_size);
> + i++;
> + }
> +
> + if (!size || (min && size < min)) {
> + dev_warn(dev->ctrl.device,
> + "failed to allocate host memory buffer.\n");
> + goto out_free_bufs;
> + }
> +
> + dev_info(dev->ctrl.device,
> + "allocated %lld MiB host memory buffer.\n", size / 1024 / 1024);
> + dev->nr_host_mem_descs = i;
> + dev->host_mem_size = size;
> + dev->host_mem_descs = descs;
> + dev->host_mem_desc_bufs = bufs;
> + return 0;
> +
> +out_free_bufs:
> + while (--i >= 0) {
> + struct nvme_host_mem_buf_desc *desc = &dev->host_mem_descs[i];
I think you want to use the local descs variable, since
dev->host_mem_descs isn't set if you goto this out_free_bufs label.
* [PATCH 4/4] nvme-pci: implement host memory buffer support
2017-05-22 7:22 ` Keith Busch
@ 2017-05-22 12:48 ` Christoph Hellwig
0 siblings, 0 replies; 14+ messages in thread
From: Christoph Hellwig @ 2017-05-22 12:48 UTC (permalink / raw)
On Mon, May 22, 2017 at 03:22:30AM -0400, Keith Busch wrote:
> > +out_free_bufs:
> > + while (--i >= 0) {
> > + struct nvme_host_mem_buf_desc *desc = &dev->host_mem_descs[i];
>
> I think you want to use the local descs variable, since
> dev->host_mem_descs isn't set if you goto this out_free_bufs label.
Indeed..
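
For reference, the fix is to unwind through the local descs and bufs
arrays, since dev->host_mem_descs and dev->host_mem_desc_bufs are only
assigned on the success path. A sketch of the corrected cleanup (note
that i also needs a signed type here: the patch declares it u32, and
--i >= 0 can never become false for an unsigned value):

	out_free_bufs:
		while (--i >= 0) {
			size_t size = le32_to_cpu(descs[i].size) * dev->ctrl.page_size;

			dma_free_coherent(dev->dev, size, bufs[i],
					le64_to_cpu(descs[i].addr));
		}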
* [PATCH 1/4] nvme.h: add struct nvme_host_mem_buf_desc and HMB flags
2017-05-20 13:13 ` [PATCH 1/4] nvme.h: add struct nvme_host_mem_buf_desc and HMB flags Christoph Hellwig
@ 2017-05-30 14:31 ` Sagi Grimberg
0 siblings, 0 replies; 14+ messages in thread
From: Sagi Grimberg @ 2017-05-30 14:31 UTC (permalink / raw)
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
* [PATCH 2/4] nvme.h: add dword 12 - 15 fields to struct nvme_features
2017-05-20 13:13 ` [PATCH 2/4] nvme.h: add dword 12 - 15 fields to struct nvme_features Christoph Hellwig
@ 2017-05-30 14:31 ` Sagi Grimberg
0 siblings, 0 replies; 14+ messages in thread
From: Sagi Grimberg @ 2017-05-30 14:31 UTC (permalink / raw)
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
* [PATCH 3/4] nvme: save hmpre and hmmin in struct nvme_ctrl
2017-05-20 13:13 ` [PATCH 3/4] nvme: save hmpre and hmmin in struct nvme_ctrl Christoph Hellwig
@ 2017-05-30 14:31 ` Sagi Grimberg
0 siblings, 0 replies; 14+ messages in thread
From: Sagi Grimberg @ 2017-05-30 14:31 UTC (permalink / raw)
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
* [PATCH 4/4] nvme-pci: implement host memory buffer support
2017-05-20 13:13 ` [PATCH 4/4] nvme-pci: implement host memory buffer support Christoph Hellwig
2017-05-22 7:22 ` Keith Busch
@ 2017-05-30 14:35 ` Sagi Grimberg
2017-05-30 21:39 ` Max Gurtovoy
2 siblings, 0 replies; 14+ messages in thread
From: Sagi Grimberg @ 2017-05-30 14:35 UTC (permalink / raw)
> + dev_info(dev->ctrl.device,
> + "allocated %lld MiB host memory buffer.\n", size / 1024 / 1024);
super-nit: SZ_1M
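That is, divide by the SZ_1M constant from include/linux/sizes.h
instead of open-coding 1024 * 1024:

	dev_info(dev->ctrl.device,
		"allocated %lld MiB host memory buffer.\n", size / SZ_1M);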
Looks good,
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
* [PATCH 4/4] nvme-pci: implement host memory buffer support
2017-05-20 13:13 ` [PATCH 4/4] nvme-pci: implement host memory buffer support Christoph Hellwig
2017-05-22 7:22 ` Keith Busch
2017-05-30 14:35 ` Sagi Grimberg
@ 2017-05-30 21:39 ` Max Gurtovoy
2017-06-01 7:18 ` Christoph Hellwig
2 siblings, 1 reply; 14+ messages in thread
From: Max Gurtovoy @ 2017-05-30 21:39 UTC (permalink / raw)
On 5/20/2017 4:13 PM, Christoph Hellwig wrote:
> [...]
> +static void nvme_setup_host_mem(struct nvme_dev *dev)
> +{
> + u64 max = (u64)max_host_mem_size_mb * 1024 * 1024;
> + u64 preferred = (u64)dev->ctrl.hmpre * 4096;
> + u64 min = (u64)dev->ctrl.hmmin * 4096;
> + u32 enable_bits = NVME_HOST_MEM_ENABLE;
> +
> + preferred = min(preferred, max);
> + if (min > max) {
Should it be:
if (min > preferred) ?
It would make the code more intuitive and avoid hidden assumptions.
> + dev_warn(dev->ctrl.device,
> + "min host memory (%lld MiB) above limit (%d MiB).\n",
> + preferred / 1024 / 1024, max_host_mem_size_mb);
Here we could change:
preferred ==> min
max_host_mem_size_mb ==> preferred / 1024 / 1024 ?
The other option is to change the warning print from "preferred" to
"min", since we assume that dev->ctrl.hmpre >= dev->ctrl.hmmin.
Your call.
Otherwise Looks good,
Reviewed-by: Max Gurtovoy <maxg at mellanox.com>
* [PATCH 4/4] nvme-pci: implement host memory buffer support
2017-05-30 21:39 ` Max Gurtovoy
@ 2017-06-01 7:18 ` Christoph Hellwig
2017-06-01 9:19 ` Max Gurtovoy
0 siblings, 1 reply; 14+ messages in thread
From: Christoph Hellwig @ 2017-06-01 7:18 UTC (permalink / raw)
> > +static void nvme_setup_host_mem(struct nvme_dev *dev)
> > +{
> > + u64 max = (u64)max_host_mem_size_mb * 1024 * 1024;
> > + u64 preferred = (u64)dev->ctrl.hmpre * 4096;
> > + u64 min = (u64)dev->ctrl.hmmin * 4096;
> > + u32 enable_bits = NVME_HOST_MEM_ENABLE;
> > +
> > + preferred = min(preferred, max);
> > + if (min > max) {
>
> Should it be:
> if (min > preferred) ?
No. max is the maximum value the kernel allows a device to take.
If the minimum required value is bigger than that we have to
fail; it has nothing to do with the preferred value.
> > + dev_warn(dev->ctrl.device,
> > + "min host memory (%lld MiB) above limit (%d MiB).\n",
> > + preferred / 1024 / 1024, max_host_mem_size_mb);
>
> Here we could change:
> preferred ==> min
> max_host_mem_size_mb ==> preferred / 1024 / 1024 ?
>
> The other option is to change the warning print from "preferred" to
> "min", since we assume that dev->ctrl.hmpre >= dev->ctrl.hmmin.
The first one should indeed be min, but I think that's the only change
we need.
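
A sketch of the check with that change applied:

	if (min > max) {
		dev_warn(dev->ctrl.device,
			"min host memory (%lld MiB) above limit (%d MiB).\n",
			min / 1024 / 1024, max_host_mem_size_mb);
		nvme_free_host_mem(dev);
		return;
	}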
* [PATCH 4/4] nvme-pci: implement host memory buffer support
2017-06-01 7:18 ` Christoph Hellwig
@ 2017-06-01 9:19 ` Max Gurtovoy
0 siblings, 0 replies; 14+ messages in thread
From: Max Gurtovoy @ 2017-06-01 9:19 UTC (permalink / raw)
On 6/1/2017 10:18 AM, Christoph Hellwig wrote:
>>> +static void nvme_setup_host_mem(struct nvme_dev *dev)
>>> +{
>>> + u64 max = (u64)max_host_mem_size_mb * 1024 * 1024;
>>> + u64 preferred = (u64)dev->ctrl.hmpre * 4096;
>>> + u64 min = (u64)dev->ctrl.hmmin * 4096;
>>> + u32 enable_bits = NVME_HOST_MEM_ENABLE;
>>> +
>>> + preferred = min(preferred, max);
>>> + if (min > max) {
>>
>> Should it be:
>> if (min > preferred) ?
>
> No. max is the maximum value the kernel allows a device to take.
> If the minimum required value is bigger than that we have to
> fail; it has nothing to do with the preferred value.
>
Ok, I probably got confused here because the "preferred = min(preferred,
max);" statement was just before the "if" clause.
>>> + dev_warn(dev->ctrl.device,
>>> + "min host memory (%lld MiB) above limit (%d MiB).\n",
>>> + preferred / 1024 / 1024, max_host_mem_size_mb);
>>
>> Here we could change:
>> preferred ==> min
>> max_host_mem_size_mb ==> preferred / 1024 / 1024 ?
>>
>> The other option is to change the warning print from "preferred" to
>> "min", since we assume that dev->ctrl.hmpre >= dev->ctrl.hmmin.
>
> The first one should indeed be min, but I think that's the only change
> we need.
>
Sure.