linux-nvme.lists.infradead.org archive mirror
* [PATCH v2 0/3] update RDMA controllers queue depth
@ 2021-09-22 21:55 Max Gurtovoy
  2021-09-22 21:55 ` [PATCH 1/3] nvme-rdma: limit the maximal queue size for RDMA controllers Max Gurtovoy
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: Max Gurtovoy @ 2021-09-22 21:55 UTC (permalink / raw)
  To: linux-nvme, hch, kbusch, sagi, chaitanyak
  Cc: israelr, mruijter, oren, nitzanc, jgg, Max Gurtovoy

Hi all,
This series solves an issue reported by Mark Ruijter while testing
SPDK initiators on VMware 7.x connecting to a Linux RDMA target
running on an NVIDIA ConnectX-6 adapter. During connection
establishment, the NVMf target controller exposed a queue depth
capability of 1024 but was unable to satisfy that depth in practice.
The reason is that the NVMf driver didn't take the underlying HW
capabilities into consideration. For now, limit the RDMA queue depth
to 128 (the default, which should work for all RDMA controllers). For
that, introduce a new controller operation that returns the possible
queue size for a given HW. Other transports will continue with their
old behaviour.

Also fix the other side of the coin: in case a target is capable of
exposing a queue depth of 1024 (or any value > 128), limit the
initiator side to open queues of at most 128 entries, to avoid
failing to allocate a larger QP.

In the future, in order to increase this size, we'll need to create a
dedicated RDMA API to calculate a possible queue depth for ULPs. RDMA
IO operations are sometimes built from multiple WRs (such as memory
registrations and invalidations), and the ULP driver should take this
into consideration during device discovery and queue depth
calculations.
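
As a rough, hypothetical sketch (this helper and its wrs_per_io
parameter don't exist; only dev->attrs.max_qp_wr is a real device
attribute), such a calculation could look like:

#include <rdma/ib_verbs.h>

/*
 * Hypothetical helper: derive a ULP queue depth from the device
 * limit on work requests per QP.  Each NVMe IO may post several WRs
 * (send, memory registration, local invalidate), so divide the
 * device limit by the worst-case WR count per IO.
 */
static u32 ulp_max_queue_size(struct ib_device *dev, u32 wrs_per_io)
{
	return dev->attrs.max_qp_wr / wrs_per_io;
}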

Changes from V1:
 - added Reviewed-by signatures (Chaitanya)
 - rename get_queue_size to get_max_queue_size (Sagi)
 - add a fix to initiator side as well (Jason)
 - move NVME_RDMA_MAX_QUEUE_SIZE to common header file for nvme-rdma

Max Gurtovoy (3):
  nvme-rdma: limit the maximal queue size for RDMA controllers
  nvmet: add get_max_queue_size op for controllers
  nvmet-rdma: implement get_max_queue_size controller op

 drivers/nvme/host/rdma.c    | 7 +++++++
 drivers/nvme/target/core.c  | 8 +++++---
 drivers/nvme/target/nvmet.h | 1 +
 drivers/nvme/target/rdma.c  | 6 ++++++
 include/linux/nvme-rdma.h   | 2 ++
 5 files changed, 21 insertions(+), 3 deletions(-)

-- 
2.18.1



* [PATCH 1/3] nvme-rdma: limit the maximal queue size for RDMA controllers
  2021-09-22 21:55 [PATCH v2 0/3] update RDMA controllers queue depth Max Gurtovoy
@ 2021-09-22 21:55 ` Max Gurtovoy
  2021-09-22 21:55 ` [PATCH 2/3] nvmet: add get_max_queue_size op for controllers Max Gurtovoy
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: Max Gurtovoy @ 2021-09-22 21:55 UTC (permalink / raw)
  To: linux-nvme, hch, kbusch, sagi, chaitanyak
  Cc: israelr, mruijter, oren, nitzanc, jgg, Max Gurtovoy

The current limit of 1024 isn't valid for some of the RDMA based
ctrls. In case the target exposes a capability with a larger number
of entries (e.g. 1024), the initiator may fail to create a QP of that
size. Thus, limit the queue size to a value that works for all RDMA
adapters.

A future general solution should use the RDMA/core API to calculate
this size according to device capabilities and the number of WRs
needed per NVMe IO request.

Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
---
 drivers/nvme/host/rdma.c  | 7 +++++++
 include/linux/nvme-rdma.h | 2 ++
 2 files changed, 9 insertions(+)

diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index a68704e39084..a2f62eefccd9 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -1111,6 +1111,13 @@ static int nvme_rdma_setup_ctrl(struct nvme_rdma_ctrl *ctrl, bool new)
 			ctrl->ctrl.opts->queue_size, ctrl->ctrl.sqsize + 1);
 	}
 
+	if (ctrl->ctrl.sqsize + 1 > NVME_RDMA_MAX_QUEUE_SIZE) {
+		dev_warn(ctrl->ctrl.device,
+			"ctrl sqsize %u > max queue size %u, clamping down\n",
+			ctrl->ctrl.sqsize + 1, NVME_RDMA_MAX_QUEUE_SIZE);
+		ctrl->ctrl.sqsize = NVME_RDMA_MAX_QUEUE_SIZE - 1;
+	}
+
 	if (ctrl->ctrl.sqsize + 1 > ctrl->ctrl.maxcmd) {
 		dev_warn(ctrl->ctrl.device,
 			"sqsize %u > ctrl maxcmd %u, clamping down\n",
diff --git a/include/linux/nvme-rdma.h b/include/linux/nvme-rdma.h
index 3ec8e50efa16..4dd7e6fe92fb 100644
--- a/include/linux/nvme-rdma.h
+++ b/include/linux/nvme-rdma.h
@@ -6,6 +6,8 @@
 #ifndef _LINUX_NVME_RDMA_H
 #define _LINUX_NVME_RDMA_H
 
+#define NVME_RDMA_MAX_QUEUE_SIZE	128
+
 enum nvme_rdma_cm_fmt {
 	NVME_RDMA_CM_FMT_1_0 = 0x0,
 };
-- 
2.18.1



* [PATCH 2/3] nvmet: add get_max_queue_size op for controllers
  2021-09-22 21:55 [PATCH v2 0/3] update RDMA controllers queue depth Max Gurtovoy
  2021-09-22 21:55 ` [PATCH 1/3] nvme-rdma: limit the maximal queue size for RDMA controllers Max Gurtovoy
@ 2021-09-22 21:55 ` Max Gurtovoy
  2021-09-22 21:55 ` [PATCH 3/3] nvmet-rdma: implement get_max_queue_size controller op Max Gurtovoy
  2021-10-12 13:04 ` [PATCH v2 0/3] update RDMA controllers queue depth Christoph Hellwig
  3 siblings, 0 replies; 5+ messages in thread
From: Max Gurtovoy @ 2021-09-22 21:55 UTC (permalink / raw)
  To: linux-nvme, hch, kbusch, sagi, chaitanyak
  Cc: israelr, mruijter, oren, nitzanc, jgg, Max Gurtovoy

Some transports, such as RDMA, would like to set the queue size
according to device/port/ctrl characteristics. Add a new nvmet
transport op that is called during ctrl initialization. This will not
affect transports that don't implement this op.
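
For illustration only, a minimal implementation of the new optional
op in a transport driver could be a one-liner returning a fixed limit
(patch 3 adds the real RDMA one; the names below are placeholders):

static u16 nvmet_foo_get_max_queue_size(const struct nvmet_ctrl *ctrl)
{
	return NVMET_FOO_MAX_QUEUE_SIZE;	/* placeholder constant */
}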

Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
---
 drivers/nvme/target/core.c  | 8 +++++---
 drivers/nvme/target/nvmet.h | 1 +
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c
index b8425fa34300..93107af3310d 100644
--- a/drivers/nvme/target/core.c
+++ b/drivers/nvme/target/core.c
@@ -1205,7 +1205,10 @@ static void nvmet_init_cap(struct nvmet_ctrl *ctrl)
 	/* CC.EN timeout in 500msec units: */
 	ctrl->cap |= (15ULL << 24);
 	/* maximum queue entries supported: */
-	ctrl->cap |= NVMET_QUEUE_SIZE - 1;
+	if (ctrl->ops->get_max_queue_size)
+		ctrl->cap |= ctrl->ops->get_max_queue_size(ctrl) - 1;
+	else
+		ctrl->cap |= NVMET_QUEUE_SIZE - 1;
 
 	if (nvmet_is_passthru_subsys(ctrl->subsys))
 		nvmet_passthrough_override_cap(ctrl);
@@ -1367,6 +1370,7 @@ u16 nvmet_alloc_ctrl(const char *subsysnqn, const char *hostnqn,
 	mutex_init(&ctrl->lock);
 
 	ctrl->port = req->port;
+	ctrl->ops = req->ops;
 
 	INIT_WORK(&ctrl->async_event_work, nvmet_async_event_work);
 	INIT_LIST_HEAD(&ctrl->async_events);
@@ -1405,8 +1409,6 @@ u16 nvmet_alloc_ctrl(const char *subsysnqn, const char *hostnqn,
 	}
 	ctrl->cntlid = ret;
 
-	ctrl->ops = req->ops;
-
 	/*
 	 * Discovery controllers may use some arbitrary high value
 	 * in order to cleanup stale discovery sessions
diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
index 7143c7fa7464..f8e0ee131dc6 100644
--- a/drivers/nvme/target/nvmet.h
+++ b/drivers/nvme/target/nvmet.h
@@ -309,6 +309,7 @@ struct nvmet_fabrics_ops {
 	u16 (*install_queue)(struct nvmet_sq *nvme_sq);
 	void (*discovery_chg)(struct nvmet_port *port);
 	u8 (*get_mdts)(const struct nvmet_ctrl *ctrl);
+	u16 (*get_max_queue_size)(const struct nvmet_ctrl *ctrl);
 };
 
 #define NVMET_MAX_INLINE_BIOVEC	8
-- 
2.18.1



* [PATCH 3/3] nvmet-rdma: implement get_max_queue_size controller op
  2021-09-22 21:55 [PATCH v2 0/3] update RDMA controllers queue depth Max Gurtovoy
  2021-09-22 21:55 ` [PATCH 1/3] nvme-rdma: limit the maximal queue size for RDMA controllers Max Gurtovoy
  2021-09-22 21:55 ` [PATCH 2/3] nvmet: add get_max_queue_size op for controllers Max Gurtovoy
@ 2021-09-22 21:55 ` Max Gurtovoy
  2021-10-12 13:04 ` [PATCH v2 0/3] update RDMA controllers queue depth Christoph Hellwig
  3 siblings, 0 replies; 5+ messages in thread
From: Max Gurtovoy @ 2021-09-22 21:55 UTC (permalink / raw)
  To: linux-nvme, hch, kbusch, sagi, chaitanyak
  Cc: israelr, mruijter, oren, nitzanc, jgg, Max Gurtovoy

Limit the maximal queue size for RDMA controllers. Today the target
reports a limit of 1024 entries, which isn't valid for some of the
RDMA based controllers. For now, limit the RDMA transport to 128
entries (the maximal queue depth configured for the Linux NVMe/RDMA
host).

A future general solution should use the RDMA/core API to calculate
this size according to device capabilities and the number of WRs
needed per NVMe IO request.
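
A hedged sketch of what that future, per-device calculation might
look like (the helper name and the worst-case factor of 4 WRs per IO
are illustrative, not existing code):

static u16 nvmet_rdma_dev_max_queue_size(struct ib_device *dev)
{
	/* Assume up to 4 WRs per NVMe IO (illustrative worst case:
	 * send + memory registration + local invalidate + spare). */
	u32 depth = dev->attrs.max_qp_wr / 4;

	return min_t(u32, depth, NVME_RDMA_MAX_QUEUE_SIZE);
}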

Reported-by: Mark Ruijter <mruijter@primelogic.nl>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
---
 drivers/nvme/target/rdma.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
index 891174ccd44b..8bf2fad429f9 100644
--- a/drivers/nvme/target/rdma.c
+++ b/drivers/nvme/target/rdma.c
@@ -1975,6 +1975,11 @@ static u8 nvmet_rdma_get_mdts(const struct nvmet_ctrl *ctrl)
 	return NVMET_RDMA_MAX_MDTS;
 }
 
+static u16 nvmet_rdma_get_max_queue_size(const struct nvmet_ctrl *ctrl)
+{
+	return NVME_RDMA_MAX_QUEUE_SIZE;
+}
+
 static const struct nvmet_fabrics_ops nvmet_rdma_ops = {
 	.owner			= THIS_MODULE,
 	.type			= NVMF_TRTYPE_RDMA,
@@ -1986,6 +1991,7 @@ static const struct nvmet_fabrics_ops nvmet_rdma_ops = {
 	.delete_ctrl		= nvmet_rdma_delete_ctrl,
 	.disc_traddr		= nvmet_rdma_disc_port_addr,
 	.get_mdts		= nvmet_rdma_get_mdts,
+	.get_max_queue_size	= nvmet_rdma_get_max_queue_size,
 };
 
 static void nvmet_rdma_remove_one(struct ib_device *ib_device, void *client_data)
-- 
2.18.1



* Re: [PATCH v2 0/3] update RDMA controllers queue depth
  2021-09-22 21:55 [PATCH v2 0/3] update RDMA controllers queue depth Max Gurtovoy
                   ` (2 preceding siblings ...)
  2021-09-22 21:55 ` [PATCH 3/3] nvmet-rdma: implement get_max_queue_size controller op Max Gurtovoy
@ 2021-10-12 13:04 ` Christoph Hellwig
  3 siblings, 0 replies; 5+ messages in thread
From: Christoph Hellwig @ 2021-10-12 13:04 UTC (permalink / raw)
  To: Max Gurtovoy
  Cc: linux-nvme, hch, kbusch, sagi, chaitanyak, israelr, mruijter,
	oren, nitzanc, jgg

Thanks,

applied to nvme-5.16.

