* host memory buffer support
@ 2017-05-20 13:13 Christoph Hellwig
  2017-05-20 13:13 ` [PATCH 1/4] nvme.h: add struct nvme_host_mem_buf_desc and HMB flags Christoph Hellwig
                   ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: Christoph Hellwig @ 2017-05-20 13:13 UTC (permalink / raw)


This series adds support for the NVMe host memory buffer feature, which
is a way for resource-constrained devices to borrow some host memory
for their own purposes.


* [PATCH 1/4] nvme.h: add struct nvme_host_mem_buf_desc and HMB flags
  2017-05-20 13:13 host memory buffer support Christoph Hellwig
@ 2017-05-20 13:13 ` Christoph Hellwig
  2017-05-30 14:31   ` Sagi Grimberg
  2017-05-20 13:13 ` [PATCH 2/4] nvme.h: add dword 12 - 15 fields to struct nvme_features Christoph Hellwig
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 14+ messages in thread
From: Christoph Hellwig @ 2017-05-20 13:13 UTC (permalink / raw)


Signed-off-by: Christoph Hellwig <hch at lst.de>
---
 include/linux/nvme.h | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/include/linux/nvme.h b/include/linux/nvme.h
index b625bacf37ef..9fd77fde4ae0 100644
--- a/include/linux/nvme.h
+++ b/include/linux/nvme.h
@@ -586,6 +586,11 @@ struct nvme_feat_auto_pst {
 	__le64 entries[32];
 };
 
+enum {
+	NVME_HOST_MEM_ENABLE	= (1 << 0),
+	NVME_HOST_MEM_RETURN	= (1 << 1),
+};
+
 /* Admin commands */
 
 enum nvme_admin_opcode {
@@ -670,6 +675,12 @@ struct nvme_features {
 	__u32			rsvd12[4];
 };
 
+struct nvme_host_mem_buf_desc {
+	__le64			addr;
+	__le32			size;
+	__u32			rsvd;
+};
+
 struct nvme_create_cq {
 	__u8			opcode;
 	__u8			flags;
-- 
2.11.0
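
For reference: each nvme_host_mem_buf_desc describes one physically
contiguous chunk of host memory, with addr holding the chunk's DMA
address and size its length in controller pages.  A rough sketch of how
a single entry might be filled (dma_addr, len and ctrl_page_size stand
in for values obtained elsewhere):

	struct nvme_host_mem_buf_desc desc;

	desc.addr = cpu_to_le64(dma_addr);		/* chunk DMA address */
	desc.size = cpu_to_le32(len / ctrl_page_size);	/* length in controller pages */
	desc.rsvd = 0;

Patch 4 builds an array of these descriptors and hands it to the
controller with a Set Features command.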


* [PATCH 2/4] nvme.h: add dword 12 - 15 fields to struct nvme_features
  2017-05-20 13:13 host memory buffer support Christoph Hellwig
  2017-05-20 13:13 ` [PATCH 1/4] nvme.h: add struct nvme_host_mem_buf_desc and HMB flags Christoph Hellwig
@ 2017-05-20 13:13 ` Christoph Hellwig
  2017-05-30 14:31   ` Sagi Grimberg
  2017-05-20 13:13 ` [PATCH 3/4] nvme: save hmpre and hmmin in struct nvme_ctrl Christoph Hellwig
  2017-05-20 13:13 ` [PATCH 4/4] nvme-pci: implement host memory buffer support Christoph Hellwig
  3 siblings, 1 reply; 14+ messages in thread
From: Christoph Hellwig @ 2017-05-20 13:13 UTC (permalink / raw)


From: Arnav Dawn <a.dawn@samsung.com>

Signed-off-by: Arnav Dawn <a.dawn at samsung.com>
[hch: split from a larger patch, new changelog]
Signed-off-by: Christoph Hellwig <hch at lst.de>
---
 include/linux/nvme.h | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/include/linux/nvme.h b/include/linux/nvme.h
index 9fd77fde4ae0..beaefe822104 100644
--- a/include/linux/nvme.h
+++ b/include/linux/nvme.h
@@ -672,7 +672,10 @@ struct nvme_features {
 	union nvme_data_ptr	dptr;
 	__le32			fid;
 	__le32			dword11;
-	__u32			rsvd12[4];
+	__le32                  dword12;
+	__le32                  dword13;
+	__le32                  dword14;
+	__le32                  dword15;
 };
 
 struct nvme_host_mem_buf_desc {
-- 
2.11.0
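
The four new dwords carry the extra arguments the HMB feature needs
beyond fid/dword11.  A rough sketch of how they end up being filled for
a Set Features (NVME_FEAT_HOST_MEM_BUF) command, mirroring what patch 4
does; hsize, desc_list_dma and nr_descs are placeholders for values
computed elsewhere:

	struct nvme_command c = { };

	c.features.opcode  = nvme_admin_set_features;
	c.features.fid     = cpu_to_le32(NVME_FEAT_HOST_MEM_BUF);
	c.features.dword11 = cpu_to_le32(NVME_HOST_MEM_ENABLE);	/* HMB flags */
	c.features.dword12 = cpu_to_le32(hsize);		/* size in controller pages */
	c.features.dword13 = cpu_to_le32(lower_32_bits(desc_list_dma));
	c.features.dword14 = cpu_to_le32(upper_32_bits(desc_list_dma));
	c.features.dword15 = cpu_to_le32(nr_descs);		/* number of descriptors */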


* [PATCH 3/4] nvme: save hmpre and hmmin in struct nvme_ctrl
  2017-05-20 13:13 host memory buffer support Christoph Hellwig
  2017-05-20 13:13 ` [PATCH 1/4] nvme.h: add struct nvme_host_mem_buf_desc and HMB flags Christoph Hellwig
  2017-05-20 13:13 ` [PATCH 2/4] nvme.h: add dword 12 - 15 fields to struct nvme_features Christoph Hellwig
@ 2017-05-20 13:13 ` Christoph Hellwig
  2017-05-30 14:31   ` Sagi Grimberg
  2017-05-20 13:13 ` [PATCH 4/4] nvme-pci: implement host memory buffer support Christoph Hellwig
  3 siblings, 1 reply; 14+ messages in thread
From: Christoph Hellwig @ 2017-05-20 13:13 UTC (permalink / raw)


We'll need these later for the HMB support.

Signed-off-by: Christoph Hellwig <hch at lst.de>
---
 drivers/nvme/host/core.c | 2 ++
 drivers/nvme/host/nvme.h | 3 +++
 2 files changed, 5 insertions(+)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index d5e0906262ea..0f7fada284da 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1625,6 +1625,8 @@ int nvme_init_identify(struct nvme_ctrl *ctrl)
 		}
 	} else {
 		ctrl->cntlid = le16_to_cpu(id->cntlid);
+		ctrl->hmpre = le32_to_cpu(id->hmpre);
+		ctrl->hmmin = le32_to_cpu(id->hmmin);
 	}
 
 	kfree(id);
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 29c708ca9621..605f25b5145e 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -166,6 +166,9 @@ struct nvme_ctrl {
 	/* Power saving configuration */
 	u64 ps_max_latency_us;
 
+	u32 hmpre;
+	u32 hmmin;
+
 	/* Fabrics only */
 	u16 sqsize;
 	u32 ioccsz;
-- 
2.11.0
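
Both identify-controller fields (HMPRE and HMMIN) are reported in units
of 4 KiB, so the PCIe driver converts them to byte counts before sizing
the buffer, roughly as the next patch does:

	u64 preferred = (u64)ctrl->hmpre * 4096;	/* preferred HMB size in bytes */
	u64 min = (u64)ctrl->hmmin * 4096;		/* minimum HMB size in bytes */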


* [PATCH 4/4] nvme-pci: implement host memory buffer support
  2017-05-20 13:13 host memory buffer support Christoph Hellwig
                   ` (2 preceding siblings ...)
  2017-05-20 13:13 ` [PATCH 3/4] nvme: save hmpre and hmmin in struct nvme_ctrl Christoph Hellwig
@ 2017-05-20 13:13 ` Christoph Hellwig
  2017-05-22  7:22   ` Keith Busch
                     ` (2 more replies)
  3 siblings, 3 replies; 14+ messages in thread
From: Christoph Hellwig @ 2017-05-20 13:13 UTC (permalink / raw)


If a controller supports the host memory buffer we try to provide
it with the requested size, up to an upper cap set as a module
parameter.  We try to use as few descriptors as possible, retrying
with smaller chunk sizes if the large allocations fail.

Signed-off-by: Christoph Hellwig <hch at lst.de>
---
 drivers/nvme/host/pci.c | 187 +++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 185 insertions(+), 2 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index fed803232edc..93868b9750cd 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -66,6 +66,11 @@ static bool use_cmb_sqes = true;
 module_param(use_cmb_sqes, bool, 0644);
 MODULE_PARM_DESC(use_cmb_sqes, "use controller's memory buffer for I/O SQes");
 
+static unsigned int max_host_mem_size_mb = 128;
+module_param(max_host_mem_size_mb, uint, 0444);
+MODULE_PARM_DESC(max_host_mem_size_mb,
+	"Maximum Host Memory Buffer (HMB) size per controller (in MiB)");
+
 static struct workqueue_struct *nvme_workq;
 
 struct nvme_dev;
@@ -104,10 +109,18 @@ struct nvme_dev {
 	u32 cmbloc;
 	struct nvme_ctrl ctrl;
 	struct completion ioq_wait;
+
+	/* shadow doorbell buffer support: */
 	u32 *dbbuf_dbs;
 	dma_addr_t dbbuf_dbs_dma_addr;
 	u32 *dbbuf_eis;
 	dma_addr_t dbbuf_eis_dma_addr;
+
+	/* host memory buffer support: */
+	u64 host_mem_size;
+	u32 nr_host_mem_descs;
+	struct nvme_host_mem_buf_desc *host_mem_descs;
+	void **host_mem_desc_bufs;
 };
 
 static inline unsigned int sq_idx(unsigned int qid, u32 stride)
@@ -1509,6 +1522,160 @@ static inline void nvme_release_cmb(struct nvme_dev *dev)
 	}
 }
 
+static int nvme_set_host_mem(struct nvme_dev *dev, u32 bits)
+{
+	size_t len = dev->nr_host_mem_descs * sizeof(*dev->host_mem_descs);
+	struct nvme_command c;
+	u64 dma_addr;
+	int ret;
+
+	dma_addr = dma_map_single(dev->dev, dev->host_mem_descs, len,
+			DMA_TO_DEVICE);
+	if (dma_mapping_error(dev->dev, dma_addr))
+		return -ENOMEM;
+
+	memset(&c, 0, sizeof(c));
+	c.features.opcode	= nvme_admin_set_features;
+	c.features.fid		= cpu_to_le32(NVME_FEAT_HOST_MEM_BUF);
+	c.features.dword11	= cpu_to_le32(bits);
+	c.features.dword12	= cpu_to_le32(dev->host_mem_size /
+					      dev->ctrl.page_size);
+	c.features.dword13	= cpu_to_le32(lower_32_bits(dma_addr));
+	c.features.dword14	= cpu_to_le32(upper_32_bits(dma_addr));
+	c.features.dword15	= cpu_to_le32(dev->nr_host_mem_descs);
+
+	ret = nvme_submit_sync_cmd(dev->ctrl.admin_q, &c, NULL, 0);
+	if (ret) {
+		dev_warn(dev->ctrl.device,
+			 "failed to set host mem (err %d, flags %#x).\n",
+			 ret, bits);
+	}
+	dma_unmap_single(dev->dev, dma_addr, len, DMA_TO_DEVICE);
+	return ret;
+}
+
+static void nvme_free_host_mem(struct nvme_dev *dev)
+{
+	int i;
+
+	for (i = 0; i < dev->nr_host_mem_descs; i++) {
+		struct nvme_host_mem_buf_desc *desc = &dev->host_mem_descs[i];
+		size_t size = le32_to_cpu(desc->size) * dev->ctrl.page_size;
+
+		dma_free_coherent(dev->dev, size, dev->host_mem_desc_bufs[i],
+				le64_to_cpu(desc->addr));
+	}
+
+	kfree(dev->host_mem_desc_bufs);
+	dev->host_mem_desc_bufs = NULL;
+	kfree(dev->host_mem_descs);
+	dev->host_mem_descs = NULL;
+}
+
+static int nvme_alloc_host_mem(struct nvme_dev *dev, u64 min, u64 preferred)
+{
+	struct nvme_host_mem_buf_desc *descs;
+	u32 chunk_size, max_entries, i = 0;
+	void **bufs;
+	u64 size;
+
+	/* start big and work our way down */
+	chunk_size = min(preferred, (u64)PAGE_SIZE << MAX_ORDER);
+retry:
+	max_entries = DIV_ROUND_UP(preferred, chunk_size);
+	descs = kcalloc(max_entries, sizeof(*descs), GFP_KERNEL);
+	if (!descs)
+		goto out;
+
+	bufs = kcalloc(max_entries, sizeof(*bufs), GFP_KERNEL);
+	if (!bufs)
+		goto out_free_descs;
+
+	for (size = 0; size < preferred; size += chunk_size) {
+		u32 len = min_t(u64, chunk_size, preferred - size);
+		dma_addr_t dma_addr;
+
+		bufs[i] = dma_alloc_attrs(dev->dev, len, &dma_addr, GFP_KERNEL,
+				DMA_ATTR_NO_KERNEL_MAPPING | DMA_ATTR_NO_WARN);
+		if (!bufs[i])
+			break;
+
+		descs[i].addr = cpu_to_le64(dma_addr);
+		descs[i].size = cpu_to_le32(len / dev->ctrl.page_size);
+		i++;
+	}
+
+	if (!size || (min && size < min)) {
+		dev_warn(dev->ctrl.device,
+			"failed to allocate host memory buffer.\n");
+		goto out_free_bufs;
+	}
+
+	dev_info(dev->ctrl.device,
+		"allocated %lld MiB host memory buffer.\n", size / 1024 / 1024);
+	dev->nr_host_mem_descs = i;
+	dev->host_mem_size = size;
+	dev->host_mem_descs = descs;
+	dev->host_mem_desc_bufs = bufs;
+	return 0;
+
+out_free_bufs:
+	while (--i >= 0) {
+		struct nvme_host_mem_buf_desc *desc = &dev->host_mem_descs[i];
+		size_t size = le32_to_cpu(desc->size) * dev->ctrl.page_size;
+
+		dma_free_coherent(dev->dev, size, dev->host_mem_desc_bufs[i],
+				le64_to_cpu(desc->addr));
+	}
+
+	kfree(bufs);
+out_free_descs:
+	kfree(descs);
+out:
+	/* try a smaller chunk size if we failed early */
+	if (chunk_size >= PAGE_SIZE * 2 && (i == 0 || size < min)) {
+		chunk_size /= 2;
+		goto retry;
+	}
+	dev->host_mem_descs = NULL;
+	return -ENOMEM;
+}
+
+static void nvme_setup_host_mem(struct nvme_dev *dev)
+{
+	u64 max = (u64)max_host_mem_size_mb * 1024 * 1024;
+	u64 preferred = (u64)dev->ctrl.hmpre * 4096;
+	u64 min = (u64)dev->ctrl.hmmin * 4096;
+	u32 enable_bits = NVME_HOST_MEM_ENABLE;
+
+	preferred = min(preferred, max);
+	if (min > max) {
+		dev_warn(dev->ctrl.device,
+			"min host memory (%lld MiB) above limit (%d MiB).\n",
+			preferred / 1024 / 1024, max_host_mem_size_mb);
+		nvme_free_host_mem(dev);
+		return;
+	}
+
+	/*
+	 * If we already have a buffer allocated check if we can reuse it.
+	 */
+	if (dev->host_mem_descs) {
+		if (dev->host_mem_size >= min)
+			enable_bits |= NVME_HOST_MEM_RETURN;
+		else
+			nvme_free_host_mem(dev);
+	}
+
+	if (!dev->host_mem_descs) {
+		if (nvme_alloc_host_mem(dev, min, preferred))
+			return;
+	}
+
+	if (nvme_set_host_mem(dev, enable_bits))
+		nvme_free_host_mem(dev);
+}
+
 static size_t db_bar_size(struct nvme_dev *dev, unsigned nr_io_queues)
 {
 	return 4096 + ((nr_io_queues + 1) * 8 * dev->db_stride);
@@ -1809,8 +1976,20 @@ static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown)
 	 * Give the controller a chance to complete all entered requests if
 	 * doing a safe shutdown.
 	 */
-	if (!dead && shutdown)
-		nvme_wait_freeze_timeout(&dev->ctrl, NVME_IO_TIMEOUT);
+	if (!dead) {
+		if (shutdown)
+			nvme_wait_freeze_timeout(&dev->ctrl, NVME_IO_TIMEOUT);
+
+		/*
+		 * If the controller is still alive tell it to stop using the
+		 * host memory buffer.  In theory the shutdown / reset should
+		 * make sure that it doesn't access the host memory anymore,
+		 * but I'd rather be safe than sorry..
+		 */
+		if (dev->host_mem_descs)
+			nvme_set_host_mem(dev, 0);
+
+	}
 	nvme_stop_queues(&dev->ctrl);
 
 	queues = dev->online_queues - 1;
@@ -1945,6 +2124,9 @@ static void nvme_reset_work(struct work_struct *work)
 				 "unable to allocate dma for dbbuf\n");
 	}
 
+	if (dev->ctrl.hmpre)
+		nvme_setup_host_mem(dev);
+
 	result = nvme_setup_io_queues(dev);
 	if (result)
 		goto out;
@@ -2182,6 +2364,7 @@ static void nvme_remove(struct pci_dev *pdev)
 	flush_work(&dev->reset_work);
 	nvme_uninit_ctrl(&dev->ctrl);
 	nvme_dev_disable(dev, true);
+	nvme_free_host_mem(dev);
 	nvme_dev_remove_admin(dev);
 	nvme_free_queues(dev, 0);
 	nvme_release_cmb(dev);
-- 
2.11.0
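
Note that max_host_mem_size_mb is registered with 0444 permissions, so
the 128 MiB default cap can only be changed when the module is loaded,
e.g. (illustrative):

	# on the kernel command line, or via a modprobe.d options file
	nvme.max_host_mem_size_mb=32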


* [PATCH 4/4] nvme-pci: implement host memory buffer support
  2017-05-20 13:13 ` [PATCH 4/4] nvme-pci: implement host memory buffer support Christoph Hellwig
@ 2017-05-22  7:22   ` Keith Busch
  2017-05-22 12:48     ` Christoph Hellwig
  2017-05-30 14:35   ` Sagi Grimberg
  2017-05-30 21:39   ` Max Gurtovoy
  2 siblings, 1 reply; 14+ messages in thread
From: Keith Busch @ 2017-05-22  7:22 UTC (permalink / raw)


On Sat, May 20, 2017 at 03:13:57PM +0200, Christoph Hellwig wrote:
> +	for (size = 0; size < preferred; size += chunk_size) {
> +		u32 len = min_t(u64, chunk_size, preferred - size);
> +		dma_addr_t dma_addr;
> +
> +		bufs[i] = dma_alloc_attrs(dev->dev, len, &dma_addr, GFP_KERNEL,
> +				DMA_ATTR_NO_KERNEL_MAPPING | DMA_ATTR_NO_WARN);
> +		if (!bufs[i])
> +			break;
> +
> +		descs[i].addr = cpu_to_le64(dma_addr);
> +		descs[i].size = cpu_to_le32(len / dev->ctrl.page_size);
> +		i++;
> +	}
> +
> +	if (!size || (min && size < min)) {
> +		dev_warn(dev->ctrl.device,
> +			"failed to allocate host memory buffer.\n");
> +		goto out_free_bufs;
> +	}
> +
> +	dev_info(dev->ctrl.device,
> +		"allocated %lld MiB host memory buffer.\n", size / 1024 / 1024);
> +	dev->nr_host_mem_descs = i;
> +	dev->host_mem_size = size;
> +	dev->host_mem_descs = descs;
> +	dev->host_mem_desc_bufs = bufs;
> +	return 0;
> +
> +out_free_bufs:
> +	while (--i >= 0) {
> +		struct nvme_host_mem_buf_desc *desc = &dev->host_mem_descs[i];

I think you want to use the local 'descs' variable since
dev->host_mem_descs isn't set if you goto this out_free_bufs label.


* [PATCH 4/4] nvme-pci: implement host memory buffer support
  2017-05-22  7:22   ` Keith Busch
@ 2017-05-22 12:48     ` Christoph Hellwig
  0 siblings, 0 replies; 14+ messages in thread
From: Christoph Hellwig @ 2017-05-22 12:48 UTC (permalink / raw)


On Mon, May 22, 2017 at 03:22:30AM -0400, Keith Busch wrote:
> > +out_free_bufs:
> > +	while (--i >= 0) {
> > +		struct nvme_host_mem_buf_desc *desc = &dev->host_mem_descs[i];
> 
> I think you want to use the local 'descs' variable since
> dev->host_mem_descs isn't set if you goto this out_free_bufs label.

Indeed..
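
One way the unwind path could look once it operates on the local
variables instead of the not-yet-assigned dev fields (a rough sketch,
not the final patch):

out_free_bufs:
	while (i--) {
		size_t size = le32_to_cpu(descs[i].size) * dev->ctrl.page_size;

		dma_free_coherent(dev->dev, size, bufs[i],
				le64_to_cpu(descs[i].addr));
	}

	kfree(bufs);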


* [PATCH 1/4] nvme.h: add struct nvme_host_mem_buf_desc and HMB flags
  2017-05-20 13:13 ` [PATCH 1/4] nvme.h: add struct nvme_host_mem_buf_desc and HMB flags Christoph Hellwig
@ 2017-05-30 14:31   ` Sagi Grimberg
  0 siblings, 0 replies; 14+ messages in thread
From: Sagi Grimberg @ 2017-05-30 14:31 UTC (permalink / raw)


Reviewed-by: Sagi Grimberg <sagi at grimberg.me>


* [PATCH 2/4] nvme.h: add dword 12 - 15 fields to struct nvme_features
  2017-05-20 13:13 ` [PATCH 2/4] nvme.h: add dword 12 - 15 fields to struct nvme_features Christoph Hellwig
@ 2017-05-30 14:31   ` Sagi Grimberg
  0 siblings, 0 replies; 14+ messages in thread
From: Sagi Grimberg @ 2017-05-30 14:31 UTC (permalink / raw)


Reviewed-by: Sagi Grimberg <sagi at grimberg.me>


* [PATCH 3/4] nvme: save hmpre and hmmin in struct nvme_ctrl
  2017-05-20 13:13 ` [PATCH 3/4] nvme: save hmpre and hmmin in struct nvme_ctrl Christoph Hellwig
@ 2017-05-30 14:31   ` Sagi Grimberg
  0 siblings, 0 replies; 14+ messages in thread
From: Sagi Grimberg @ 2017-05-30 14:31 UTC (permalink / raw)


Reviewed-by: Sagi Grimberg <sagi at grimberg.me>


* [PATCH 4/4] nvme-pci: implement host memory buffer support
  2017-05-20 13:13 ` [PATCH 4/4] nvme-pci: implement host memory buffer support Christoph Hellwig
  2017-05-22  7:22   ` Keith Busch
@ 2017-05-30 14:35   ` Sagi Grimberg
  2017-05-30 21:39   ` Max Gurtovoy
  2 siblings, 0 replies; 14+ messages in thread
From: Sagi Grimberg @ 2017-05-30 14:35 UTC (permalink / raw)



> +	dev_info(dev->ctrl.device,
> +		"allocated %lld MiB host memory buffer.\n", size / 1024 / 1024);

super-nit: SZ_1M

Looks good,

Reviewed-by: Sagi Grimberg <sagi at grimberg.me>
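
SZ_1M here refers to the 1 MiB constant from include/linux/sizes.h, so
the suggestion amounts to roughly:

	dev_info(dev->ctrl.device,
		"allocated %lld MiB host memory buffer.\n", size / SZ_1M);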


* [PATCH 4/4] nvme-pci: implement host memory buffer support
  2017-05-20 13:13 ` [PATCH 4/4] nvme-pci: implement host memory buffer support Christoph Hellwig
  2017-05-22  7:22   ` Keith Busch
  2017-05-30 14:35   ` Sagi Grimberg
@ 2017-05-30 21:39   ` Max Gurtovoy
  2017-06-01  7:18     ` Christoph Hellwig
  2 siblings, 1 reply; 14+ messages in thread
From: Max Gurtovoy @ 2017-05-30 21:39 UTC (permalink / raw)




On 5/20/2017 4:13 PM, Christoph Hellwig wrote:
> [ ... full patch quote trimmed; only the hunk under discussion follows ... ]
>
> +static void nvme_setup_host_mem(struct nvme_dev *dev)
> +{
> +	u64 max = (u64)max_host_mem_size_mb * 1024 * 1024;
> +	u64 preferred = (u64)dev->ctrl.hmpre * 4096;
> +	u64 min = (u64)dev->ctrl.hmmin * 4096;
> +	u32 enable_bits = NVME_HOST_MEM_ENABLE;
> +
> +	preferred = min(preferred, max);
> +	if (min > max) {

Should it be:
if (min > preferred) ?

for more intuitive code without assumptions?

> +		dev_warn(dev->ctrl.device,
> +			"min host memory (%lld MiB) above limit (%d MiB).\n",
> +			preferred / 1024 / 1024, max_host_mem_size_mb);

here we can:
preferred ==> min
max_host_mem_size_mb ==> preferred / 1024 / 1024 ?

other option is change the warning print from "preferred" to "min" since
we assume that dev->ctrl.hmpre >= dev->ctrl.hmmin.

your call.

Otherwise looks good,
Reviewed-by: Max Gurtovoy <maxg at mellanox.com>


* [PATCH 4/4] nvme-pci: implement host memory buffer support
  2017-05-30 21:39   ` Max Gurtovoy
@ 2017-06-01  7:18     ` Christoph Hellwig
  2017-06-01  9:19       ` Max Gurtovoy
  0 siblings, 1 reply; 14+ messages in thread
From: Christoph Hellwig @ 2017-06-01  7:18 UTC (permalink / raw)


> > +static void nvme_setup_host_mem(struct nvme_dev *dev)
> > +{
> > +	u64 max = (u64)max_host_mem_size_mb * 1024 * 1024;
> > +	u64 preferred = (u64)dev->ctrl.hmpre * 4096;
> > +	u64 min = (u64)dev->ctrl.hmmin * 4096;
> > +	u32 enable_bits = NVME_HOST_MEM_ENABLE;
> > +
> > +	preferred = min(preferred, max);
> > +	if (min > max) {
> 
> Should it be:
> if (min > preferred) ?

No.  max is the maximum value the kernel allows a device to take.
If the minimum required value is bigger than that we have to
fail; it has nothing to do with the preferred value.

> > +		dev_warn(dev->ctrl.device,
> > +			"min host memory (%lld MiB) above limit (%d MiB).\n",
> > +			preferred / 1024 / 1024, max_host_mem_size_mb);
> 
> here we can:
> preferred ==> min
> max_host_mem_size_mb ==> preferred / 1024 / 1024 ?
> 
> other option is change the warning print from "preferred" to "min" since
> we assume that dev->ctrl.hmpre >= dev->ctrl.hmmin.

The first one should indeed be min, but I think that's the only change
we need.
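
With that change the warning would read roughly:

	dev_warn(dev->ctrl.device,
		"min host memory (%lld MiB) above limit (%d MiB).\n",
		min / 1024 / 1024, max_host_mem_size_mb);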


* [PATCH 4/4] nvme-pci: implement host memory buffer support
  2017-06-01  7:18     ` Christoph Hellwig
@ 2017-06-01  9:19       ` Max Gurtovoy
  0 siblings, 0 replies; 14+ messages in thread
From: Max Gurtovoy @ 2017-06-01  9:19 UTC (permalink / raw)




On 6/1/2017 10:18 AM, Christoph Hellwig wrote:
>>> +static void nvme_setup_host_mem(struct nvme_dev *dev)
>>> +{
>>> +	u64 max = (u64)max_host_mem_size_mb * 1024 * 1024;
>>> +	u64 preferred = (u64)dev->ctrl.hmpre * 4096;
>>> +	u64 min = (u64)dev->ctrl.hmmin * 4096;
>>> +	u32 enable_bits = NVME_HOST_MEM_ENABLE;
>>> +
>>> +	preferred = min(preferred, max);
>>> +	if (min > max) {
>>
>> Should it be:
>> if (min > preferred) ?
>
> No.  max is the maximum value the kernel allows a device to take.
> If the minimum required value is bigger than that we have to
> fail; it has nothing to do with the preferred value.
>

Ok, I probably got confused here because the "preferred = min(preferred, 
max);" statement was just before the "if" clause.


>>> +		dev_warn(dev->ctrl.device,
>>> +			"min host memory (%lld MiB) above limit (%d MiB).\n",
>>> +			preferred / 1024 / 1024, max_host_mem_size_mb);
>>
>> here we can:
>> preferred ==> min
>> max_host_mem_size_mb ==> preferred / 1024 / 1024 ?
>>
>> other option is change the warning print from "preferred" to "min" since
>> we assume that dev->ctrl.hmpre >= dev->ctrl.hmmin.
>
> The first one should indeed be min, but I think that's the only change
> we need.
>

Sure.



Thread overview: 14+ messages
2017-05-20 13:13 host memory buffer support Christoph Hellwig
2017-05-20 13:13 ` [PATCH 1/4] nvme.h: add struct nvme_host_mem_buf_desc and HMB flags Christoph Hellwig
2017-05-30 14:31   ` Sagi Grimberg
2017-05-20 13:13 ` [PATCH 2/4] nvme.h: add dword 12 - 15 fields to struct nvme_features Christoph Hellwig
2017-05-30 14:31   ` Sagi Grimberg
2017-05-20 13:13 ` [PATCH 3/4] nvme: save hmpre and hmmin in struct nvme_ctrl Christoph Hellwig
2017-05-30 14:31   ` Sagi Grimberg
2017-05-20 13:13 ` [PATCH 4/4] nvme-pci: implement host memory buffer support Christoph Hellwig
2017-05-22  7:22   ` Keith Busch
2017-05-22 12:48     ` Christoph Hellwig
2017-05-30 14:35   ` Sagi Grimberg
2017-05-30 21:39   ` Max Gurtovoy
2017-06-01  7:18     ` Christoph Hellwig
2017-06-01  9:19       ` Max Gurtovoy
