All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v1 0/2] update RDMA controllers queue depth
@ 2021-09-21 19:04 Max Gurtovoy
  2021-09-21 19:04 ` [PATCH 1/2] nvmet: add get_queue_size op for controllers Max Gurtovoy
                   ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Max Gurtovoy @ 2021-09-21 19:04 UTC (permalink / raw)
  To: linux-nvme, hch, kbusch, sagi, chaitanyak
  Cc: israelr, mruijter, oren, nitzanc, Max Gurtovoy

Hi all,
This series is solving the issue that was introduced by Mark Ruijter
while testing SPDK initiators on Vmware-7.x while connecting to Linux
RDMA target running on NVIDIA's ConnectX-6 Mellanox Technologies
adapter. During connection establishment, the NVMf target controller
expose a 1024 queue depth capability but wasn't able to satisfy this
depth in reality. The reason for that is that the NVMf driver didn't
take the underlying HW capabilities into consideration. For now, limit
the RDMA queue depth to a value of 128 (that is the default and works
for all the RDMA controllers probably). For that, introduce a new
controller operation to return the possible queue size for a given HW.
Other transport will continue with thier old behaviour.

In the future, in order to increase this size, we'll need to create a
special RDMA API to calculate a possible queue depth for ULPs. As we
know, the RDMA IO operations sometimes are built from multiple WRs (such
as memory registrations and invalidations) that the ULP driver should
take this into consideration during device discovery and queue depth
calculations.

Max Gurtovoy (2):
  nvmet: add get_queue_size op for controllers
  nvmet-rdma: implement get_queue_size controller op

 drivers/nvme/target/core.c  | 8 +++++---
 drivers/nvme/target/nvmet.h | 1 +
 drivers/nvme/target/rdma.c  | 8 ++++++++
 3 files changed, 14 insertions(+), 3 deletions(-)

-- 
2.18.1


_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 1/2] nvmet: add get_queue_size op for controllers
  2021-09-21 19:04 [PATCH v1 0/2] update RDMA controllers queue depth Max Gurtovoy
@ 2021-09-21 19:04 ` Max Gurtovoy
  2021-09-21 19:20   ` Chaitanya Kulkarni
  2021-09-21 22:47   ` Sagi Grimberg
  2021-09-21 19:04 ` [PATCH 2/2] nvmet-rdma: implement get_queue_size controller op Max Gurtovoy
  2021-09-21 19:22 ` [PATCH v1 0/2] update RDMA controllers queue depth Chaitanya Kulkarni
  2 siblings, 2 replies; 19+ messages in thread
From: Max Gurtovoy @ 2021-09-21 19:04 UTC (permalink / raw)
  To: linux-nvme, hch, kbusch, sagi, chaitanyak
  Cc: israelr, mruijter, oren, nitzanc, Max Gurtovoy

Some transports, such as RDMA, would like to set the queue size
according to device/port/ctrl characteristics. Add a new nvmet transport
op that is called during ctrl initialization. This will not effect
transports that don't implement this option.

Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
---
 drivers/nvme/target/core.c  | 8 +++++---
 drivers/nvme/target/nvmet.h | 1 +
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c
index b8425fa34300..cd42655d2980 100644
--- a/drivers/nvme/target/core.c
+++ b/drivers/nvme/target/core.c
@@ -1205,7 +1205,10 @@ static void nvmet_init_cap(struct nvmet_ctrl *ctrl)
 	/* CC.EN timeout in 500msec units: */
 	ctrl->cap |= (15ULL << 24);
 	/* maximum queue entries supported: */
-	ctrl->cap |= NVMET_QUEUE_SIZE - 1;
+	if (ctrl->ops->get_queue_size)
+		ctrl->cap |= ctrl->ops->get_queue_size(ctrl) - 1;
+	else
+		ctrl->cap |= NVMET_QUEUE_SIZE - 1;
 
 	if (nvmet_is_passthru_subsys(ctrl->subsys))
 		nvmet_passthrough_override_cap(ctrl);
@@ -1367,6 +1370,7 @@ u16 nvmet_alloc_ctrl(const char *subsysnqn, const char *hostnqn,
 	mutex_init(&ctrl->lock);
 
 	ctrl->port = req->port;
+	ctrl->ops = req->ops;
 
 	INIT_WORK(&ctrl->async_event_work, nvmet_async_event_work);
 	INIT_LIST_HEAD(&ctrl->async_events);
@@ -1405,8 +1409,6 @@ u16 nvmet_alloc_ctrl(const char *subsysnqn, const char *hostnqn,
 	}
 	ctrl->cntlid = ret;
 
-	ctrl->ops = req->ops;
-
 	/*
 	 * Discovery controllers may use some arbitrary high value
 	 * in order to cleanup stale discovery sessions
diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
index 7143c7fa7464..79a4127f5ec9 100644
--- a/drivers/nvme/target/nvmet.h
+++ b/drivers/nvme/target/nvmet.h
@@ -309,6 +309,7 @@ struct nvmet_fabrics_ops {
 	u16 (*install_queue)(struct nvmet_sq *nvme_sq);
 	void (*discovery_chg)(struct nvmet_port *port);
 	u8 (*get_mdts)(const struct nvmet_ctrl *ctrl);
+	u16 (*get_queue_size)(const struct nvmet_ctrl *ctrl);
 };
 
 #define NVMET_MAX_INLINE_BIOVEC	8
-- 
2.18.1


_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 2/2] nvmet-rdma: implement get_queue_size controller op
  2021-09-21 19:04 [PATCH v1 0/2] update RDMA controllers queue depth Max Gurtovoy
  2021-09-21 19:04 ` [PATCH 1/2] nvmet: add get_queue_size op for controllers Max Gurtovoy
@ 2021-09-21 19:04 ` Max Gurtovoy
  2021-09-21 19:21   ` Chaitanya Kulkarni
  2021-09-21 22:52   ` Sagi Grimberg
  2021-09-21 19:22 ` [PATCH v1 0/2] update RDMA controllers queue depth Chaitanya Kulkarni
  2 siblings, 2 replies; 19+ messages in thread
From: Max Gurtovoy @ 2021-09-21 19:04 UTC (permalink / raw)
  To: linux-nvme, hch, kbusch, sagi, chaitanyak
  Cc: israelr, mruijter, oren, nitzanc, Max Gurtovoy

Limit the maximal queue size for RDMA controllers. Today, the target
reports a limit of 1024 and this limit isn't valid for some of the RDMA
based controllers. For now, limit RDMA transport to 128 entries (the
default queue depth configured for Linux NVMeoF host fabric drivers).
Future general solution should use RDMA/core API to calculate this size
according to device capabilities and number of WRs needed per NVMe IO
request.

Reported-by: Mark Ruijter <mruijter@primelogic.nl>
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
---
 drivers/nvme/target/rdma.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
index 891174ccd44b..16342c5f31e8 100644
--- a/drivers/nvme/target/rdma.c
+++ b/drivers/nvme/target/rdma.c
@@ -36,6 +36,8 @@
 #define NVMET_RDMA_MAX_MDTS			8
 #define NVMET_RDMA_MAX_METADATA_MDTS		5
 
+#define NVMET_RDMA_QUEUE_SIZE	128
+
 struct nvmet_rdma_srq;
 
 struct nvmet_rdma_cmd {
@@ -1975,6 +1977,11 @@ static u8 nvmet_rdma_get_mdts(const struct nvmet_ctrl *ctrl)
 	return NVMET_RDMA_MAX_MDTS;
 }
 
+static u16 nvmet_rdma_get_queue_size(const struct nvmet_ctrl *ctrl)
+{
+	return NVMET_RDMA_QUEUE_SIZE;
+}
+
 static const struct nvmet_fabrics_ops nvmet_rdma_ops = {
 	.owner			= THIS_MODULE,
 	.type			= NVMF_TRTYPE_RDMA,
@@ -1986,6 +1993,7 @@ static const struct nvmet_fabrics_ops nvmet_rdma_ops = {
 	.delete_ctrl		= nvmet_rdma_delete_ctrl,
 	.disc_traddr		= nvmet_rdma_disc_port_addr,
 	.get_mdts		= nvmet_rdma_get_mdts,
+	.get_queue_size		= nvmet_rdma_get_queue_size,
 };
 
 static void nvmet_rdma_remove_one(struct ib_device *ib_device, void *client_data)
-- 
2.18.1


_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/2] nvmet: add get_queue_size op for controllers
  2021-09-21 19:04 ` [PATCH 1/2] nvmet: add get_queue_size op for controllers Max Gurtovoy
@ 2021-09-21 19:20   ` Chaitanya Kulkarni
  2021-09-21 22:47   ` Sagi Grimberg
  1 sibling, 0 replies; 19+ messages in thread
From: Chaitanya Kulkarni @ 2021-09-21 19:20 UTC (permalink / raw)
  To: Max Gurtovoy, linux-nvme, hch, kbusch, sagi
  Cc: Israel Rukshin, mruijter, Oren Duer, Nitzan Carmi

On 9/21/2021 12:04 PM, Max Gurtovoy wrote:
> Some transports, such as RDMA, would like to set the queue size
> according to device/port/ctrl characteristics. Add a new nvmet transport
> op that is called during ctrl initialization. This will not effect
> transports that don't implement this option.
> 
> Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>

I guess there is no way other than making it generic API, looks good.

Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/2] nvmet-rdma: implement get_queue_size controller op
  2021-09-21 19:04 ` [PATCH 2/2] nvmet-rdma: implement get_queue_size controller op Max Gurtovoy
@ 2021-09-21 19:21   ` Chaitanya Kulkarni
  2021-09-21 22:52   ` Sagi Grimberg
  1 sibling, 0 replies; 19+ messages in thread
From: Chaitanya Kulkarni @ 2021-09-21 19:21 UTC (permalink / raw)
  To: Max Gurtovoy, linux-nvme, hch, kbusch, sagi
  Cc: Israel Rukshin, mruijter, Oren Duer, Nitzan Carmi

On 9/21/2021 12:04 PM, Max Gurtovoy wrote:
> Limit the maximal queue size for RDMA controllers. Today, the target
> reports a limit of 1024 and this limit isn't valid for some of the RDMA
> based controllers. For now, limit RDMA transport to 128 entries (the
> default queue depth configured for Linux NVMeoF host fabric drivers).
> Future general solution should use RDMA/core API to calculate this size
> according to device capabilities and number of WRs needed per NVMe IO
> request.
> 
> Reported-by: Mark Ruijter <mruijter@primelogic.nl>
> Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>

Looks good.

Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v1 0/2] update RDMA controllers queue depth
  2021-09-21 19:04 [PATCH v1 0/2] update RDMA controllers queue depth Max Gurtovoy
  2021-09-21 19:04 ` [PATCH 1/2] nvmet: add get_queue_size op for controllers Max Gurtovoy
  2021-09-21 19:04 ` [PATCH 2/2] nvmet-rdma: implement get_queue_size controller op Max Gurtovoy
@ 2021-09-21 19:22 ` Chaitanya Kulkarni
  2021-09-21 19:42   ` Mark Ruijter
  2 siblings, 1 reply; 19+ messages in thread
From: Chaitanya Kulkarni @ 2021-09-21 19:22 UTC (permalink / raw)
  To: mruijter
  Cc: Israel Rukshin, Oren Duer, Nitzan Carmi, Max Gurtovoy,
	linux-nvme, kbusch, sagi, hch

Mark,

On 9/21/2021 12:04 PM, Max Gurtovoy wrote:
> Hi all,
> This series is solving the issue that was introduced by Mark Ruijter
> while testing SPDK initiators on Vmware-7.x while connecting to Linux
> RDMA target running on NVIDIA's ConnectX-6 Mellanox Technologies
> adapter. During connection establishment, the NVMf target controller
> expose a 1024 queue depth capability but wasn't able to satisfy this
> depth in reality. The reason for that is that the NVMf driver didn't
> take the underlying HW capabilities into consideration. For now, limit
> the RDMA queue depth to a value of 128 (that is the default and works
> for all the RDMA controllers probably). For that, introduce a new
> controller operation to return the possible queue size for a given HW.
> Other transport will continue with thier old behaviour.
> 
> In the future, in order to increase this size, we'll need to create a
> special RDMA API to calculate a possible queue depth for ULPs. As we
> know, the RDMA IO operations sometimes are built from multiple WRs (such
> as memory registrations and invalidations) that the ULP driver should
> take this into consideration during device discovery and queue depth
> calculations.
> 
> Max Gurtovoy (2):
>    nvmet: add get_queue_size op for controllers
>    nvmet-rdma: implement get_queue_size controller op
> 
>   drivers/nvme/target/core.c  | 8 +++++---
>   drivers/nvme/target/nvmet.h | 1 +
>   drivers/nvme/target/rdma.c  | 8 ++++++++
>   3 files changed, 14 insertions(+), 3 deletions(-)
> 

It will be great if you can provide tested by tag.

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v1 0/2] update RDMA controllers queue depth
  2021-09-21 19:22 ` [PATCH v1 0/2] update RDMA controllers queue depth Chaitanya Kulkarni
@ 2021-09-21 19:42   ` Mark Ruijter
  0 siblings, 0 replies; 19+ messages in thread
From: Mark Ruijter @ 2021-09-21 19:42 UTC (permalink / raw)
  To: Chaitanya Kulkarni
  Cc: Israel Rukshin, Oren Duer, Nitzan Carmi, Max Gurtovoy,
	linux-nvme, kbusch, sagi, hch

Hi Chaitanya,

I’ll try to run some tests asap and let you know.

—Mark

> Op 21 sep. 2021 om 21:22 heeft Chaitanya Kulkarni <chaitanyak@nvidia.com> het volgende geschreven:
> 
> Mark,
> 
>> On 9/21/2021 12:04 PM, Max Gurtovoy wrote:
>> Hi all,
>> This series is solving the issue that was introduced by Mark Ruijter
>> while testing SPDK initiators on Vmware-7.x while connecting to Linux
>> RDMA target running on NVIDIA's ConnectX-6 Mellanox Technologies
>> adapter. During connection establishment, the NVMf target controller
>> expose a 1024 queue depth capability but wasn't able to satisfy this
>> depth in reality. The reason for that is that the NVMf driver didn't
>> take the underlying HW capabilities into consideration. For now, limit
>> the RDMA queue depth to a value of 128 (that is the default and works
>> for all the RDMA controllers probably). For that, introduce a new
>> controller operation to return the possible queue size for a given HW.
>> Other transport will continue with thier old behaviour.
>> 
>> In the future, in order to increase this size, we'll need to create a
>> special RDMA API to calculate a possible queue depth for ULPs. As we
>> know, the RDMA IO operations sometimes are built from multiple WRs (such
>> as memory registrations and invalidations) that the ULP driver should
>> take this into consideration during device discovery and queue depth
>> calculations.
>> 
>> Max Gurtovoy (2):
>>   nvmet: add get_queue_size op for controllers
>>   nvmet-rdma: implement get_queue_size controller op
>> 
>>  drivers/nvme/target/core.c  | 8 +++++---
>>  drivers/nvme/target/nvmet.h | 1 +
>>  drivers/nvme/target/rdma.c  | 8 ++++++++
>>  3 files changed, 14 insertions(+), 3 deletions(-)
>> 
> 
> It will be great if you can provide tested by tag.
> 
_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/2] nvmet: add get_queue_size op for controllers
  2021-09-21 19:04 ` [PATCH 1/2] nvmet: add get_queue_size op for controllers Max Gurtovoy
  2021-09-21 19:20   ` Chaitanya Kulkarni
@ 2021-09-21 22:47   ` Sagi Grimberg
  2021-09-22  7:35     ` Max Gurtovoy
  1 sibling, 1 reply; 19+ messages in thread
From: Sagi Grimberg @ 2021-09-21 22:47 UTC (permalink / raw)
  To: Max Gurtovoy, linux-nvme, hch, kbusch, chaitanyak
  Cc: israelr, mruijter, oren, nitzanc


> diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c
> index b8425fa34300..cd42655d2980 100644
> --- a/drivers/nvme/target/core.c
> +++ b/drivers/nvme/target/core.c
> @@ -1205,7 +1205,10 @@ static void nvmet_init_cap(struct nvmet_ctrl *ctrl)
>   	/* CC.EN timeout in 500msec units: */
>   	ctrl->cap |= (15ULL << 24);
>   	/* maximum queue entries supported: */
> -	ctrl->cap |= NVMET_QUEUE_SIZE - 1;
> +	if (ctrl->ops->get_queue_size)

Maybe call it max_queue_size? it is a maximum supported in essence.

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/2] nvmet-rdma: implement get_queue_size controller op
  2021-09-21 19:04 ` [PATCH 2/2] nvmet-rdma: implement get_queue_size controller op Max Gurtovoy
  2021-09-21 19:21   ` Chaitanya Kulkarni
@ 2021-09-21 22:52   ` Sagi Grimberg
  2021-09-22  7:44     ` Max Gurtovoy
  1 sibling, 1 reply; 19+ messages in thread
From: Sagi Grimberg @ 2021-09-21 22:52 UTC (permalink / raw)
  To: Max Gurtovoy, linux-nvme, hch, kbusch, chaitanyak
  Cc: israelr, mruijter, oren, nitzanc


> Limit the maximal queue size for RDMA controllers. Today, the target
> reports a limit of 1024 and this limit isn't valid for some of the RDMA
> based controllers. For now, limit RDMA transport to 128 entries (the
> default queue depth configured for Linux NVMeoF host fabric drivers).
> Future general solution should use RDMA/core API to calculate this size
> according to device capabilities and number of WRs needed per NVMe IO
> request.

What is preventing you from doing that today? You have the device,
can't you check attr.max_qp_wr?

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/2] nvmet: add get_queue_size op for controllers
  2021-09-21 22:47   ` Sagi Grimberg
@ 2021-09-22  7:35     ` Max Gurtovoy
  0 siblings, 0 replies; 19+ messages in thread
From: Max Gurtovoy @ 2021-09-22  7:35 UTC (permalink / raw)
  To: Sagi Grimberg, linux-nvme, hch, kbusch, chaitanyak
  Cc: israelr, mruijter, oren, nitzanc


On 9/22/2021 1:47 AM, Sagi Grimberg wrote:
>
>> diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c
>> index b8425fa34300..cd42655d2980 100644
>> --- a/drivers/nvme/target/core.c
>> +++ b/drivers/nvme/target/core.c
>> @@ -1205,7 +1205,10 @@ static void nvmet_init_cap(struct nvmet_ctrl 
>> *ctrl)
>>       /* CC.EN timeout in 500msec units: */
>>       ctrl->cap |= (15ULL << 24);
>>       /* maximum queue entries supported: */
>> -    ctrl->cap |= NVMET_QUEUE_SIZE - 1;
>> +    if (ctrl->ops->get_queue_size)
>
> Maybe call it max_queue_size? it is a maximum supported in essence.

Yes, I can call it get_max_queue_size.



_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/2] nvmet-rdma: implement get_queue_size controller op
  2021-09-21 22:52   ` Sagi Grimberg
@ 2021-09-22  7:44     ` Max Gurtovoy
  2021-09-22  7:45       ` Christoph Hellwig
  0 siblings, 1 reply; 19+ messages in thread
From: Max Gurtovoy @ 2021-09-22  7:44 UTC (permalink / raw)
  To: Sagi Grimberg, linux-nvme, hch, kbusch, chaitanyak
  Cc: israelr, mruijter, oren, nitzanc


On 9/22/2021 1:52 AM, Sagi Grimberg wrote:
>
>> Limit the maximal queue size for RDMA controllers. Today, the target
>> reports a limit of 1024 and this limit isn't valid for some of the RDMA
>> based controllers. For now, limit RDMA transport to 128 entries (the
>> default queue depth configured for Linux NVMeoF host fabric drivers).
>> Future general solution should use RDMA/core API to calculate this size
>> according to device capabilities and number of WRs needed per NVMe IO
>> request.
>
> What is preventing you from doing that today? You have the device,
> can't you check attr.max_qp_wr?

max_qp_wr is giving me the maximal amount of wqe's one can issue (the 
minimal unit). In reality we have WRs that constructed from multiple WQEs.

Initially, I wanted to divide max_qp_wr by the maximal WR operation size 
in the low level driver. But that would cause ULPs that don't use this 
maximal WR to suffer.

So for now, as mentioned, till we have some ib_ API, lets set it to 128.


_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/2] nvmet-rdma: implement get_queue_size controller op
  2021-09-22  7:44     ` Max Gurtovoy
@ 2021-09-22  7:45       ` Christoph Hellwig
  2021-09-22  8:10         ` Max Gurtovoy
  0 siblings, 1 reply; 19+ messages in thread
From: Christoph Hellwig @ 2021-09-22  7:45 UTC (permalink / raw)
  To: Max Gurtovoy
  Cc: Sagi Grimberg, linux-nvme, hch, kbusch, chaitanyak, israelr,
	mruijter, oren, nitzanc

On Wed, Sep 22, 2021 at 10:44:20AM +0300, Max Gurtovoy wrote:
> So for now, as mentioned, till we have some ib_ API, lets set it to 128.

Please just add the proper ib_ API, it should not be a whole lot of
work as we already do that calculation anyway for the R/W API setup.


_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/2] nvmet-rdma: implement get_queue_size controller op
  2021-09-22  7:45       ` Christoph Hellwig
@ 2021-09-22  8:10         ` Max Gurtovoy
  2021-09-22  9:18           ` Sagi Grimberg
  0 siblings, 1 reply; 19+ messages in thread
From: Max Gurtovoy @ 2021-09-22  8:10 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Sagi Grimberg, linux-nvme, kbusch, chaitanyak, israelr, mruijter,
	oren, nitzanc, Jason Gunthorpe


On 9/22/2021 10:45 AM, Christoph Hellwig wrote:
> On Wed, Sep 22, 2021 at 10:44:20AM +0300, Max Gurtovoy wrote:
>> So for now, as mentioned, till we have some ib_ API, lets set it to 128.
> Please just add the proper ib_ API, it should not be a whole lot of
> work as we already do that calculation anyway for the R/W API setup.

We don't do this exact calculation since only the low level driver knows 
that number of WQEs we need for some sophisticated WR.

The API we need is like ib_get_qp_limits when one provides some input on 
the operations it will issue and will receive an output for it.

Then we need to divide it by some factor that will reflect the amount of 
max WRs per NVMe request (e.g mem_reg + mem_invalidation + rdma_op + 
pi_yes_no).

I spoke with Jason on that and we decided that it's not a trivial patch.

is it necessary for this submission or can we live with 128 depth for 
now ? with and without new ib_ API the queue depth will be in these sizes.

>

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/2] nvmet-rdma: implement get_queue_size controller op
  2021-09-22  8:10         ` Max Gurtovoy
@ 2021-09-22  9:18           ` Sagi Grimberg
  2021-09-22  9:35             ` Max Gurtovoy
  2021-09-22 12:10             ` Jason Gunthorpe
  0 siblings, 2 replies; 19+ messages in thread
From: Sagi Grimberg @ 2021-09-22  9:18 UTC (permalink / raw)
  To: Max Gurtovoy, Christoph Hellwig
  Cc: linux-nvme, kbusch, chaitanyak, israelr, mruijter, oren, nitzanc,
	Jason Gunthorpe


>>> So for now, as mentioned, till we have some ib_ API, lets set it to 128.
>> Please just add the proper ib_ API, it should not be a whole lot of
>> work as we already do that calculation anyway for the R/W API setup.
> 
> We don't do this exact calculation since only the low level driver knows 
> that number of WQEs we need for some sophisticated WR.
> 
> The API we need is like ib_get_qp_limits when one provides some input on 
> the operations it will issue and will receive an output for it.
> 
> Then we need to divide it by some factor that will reflect the amount of 
> max WRs per NVMe request (e.g mem_reg + mem_invalidation + rdma_op + 
> pi_yes_no).
> 
> I spoke with Jason on that and we decided that it's not a trivial patch.

Can't you do this in rdmw_rw? all of the users of it will need the
exact same value right?

> is it necessary for this submission or can we live with 128 depth for 
> now ? with and without new ib_ API the queue depth will be in these sizes.

I am not sure I see the entire complexity. Even if this calc is not
accurate, you are already proposing to hard-code it to 128, so you
can do this to account for the boundaries there.

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/2] nvmet-rdma: implement get_queue_size controller op
  2021-09-22  9:18           ` Sagi Grimberg
@ 2021-09-22  9:35             ` Max Gurtovoy
  2021-09-22 12:10             ` Jason Gunthorpe
  1 sibling, 0 replies; 19+ messages in thread
From: Max Gurtovoy @ 2021-09-22  9:35 UTC (permalink / raw)
  To: Sagi Grimberg, Christoph Hellwig
  Cc: linux-nvme, kbusch, chaitanyak, israelr, mruijter, oren, nitzanc,
	Jason Gunthorpe


On 9/22/2021 12:18 PM, Sagi Grimberg wrote:
>
>>>> So for now, as mentioned, till we have some ib_ API, lets set it to 
>>>> 128.
>>> Please just add the proper ib_ API, it should not be a whole lot of
>>> work as we already do that calculation anyway for the R/W API setup.
>>
>> We don't do this exact calculation since only the low level driver 
>> knows that number of WQEs we need for some sophisticated WR.
>>
>> The API we need is like ib_get_qp_limits when one provides some input 
>> on the operations it will issue and will receive an output for it.
>>
>> Then we need to divide it by some factor that will reflect the amount 
>> of max WRs per NVMe request (e.g mem_reg + mem_invalidation + rdma_op 
>> + pi_yes_no).
>>
>> I spoke with Jason on that and we decided that it's not a trivial patch.
>
> Can't you do this in rdmw_rw? all of the users of it will need the
> exact same value right?

The factor of the operations per IO req is to be added to RW API.

The factor of WR to WQE is in low level driver and is ib_ API.


>
>> is it necessary for this submission or can we live with 128 depth for 
>> now ? with and without new ib_ API the queue depth will be in these 
>> sizes.
>
> I am not sure I see the entire complexity. Even if this calc is not
> accurate, you are already proposing to hard-code it to 128, so you
> can do this to account for the boundaries there.

How does the ULP know the BBs per max WR operation ?

I prepared a patch to solve the case we say we support X but we actually 
support less than X.

128 Value is supported by mlx device and I assume by other RDMA devices 
as well since its the default value for the initiator.

The full solution include changes in RDMA_RW, ib_, low level drivers to 
implement ib_ API.

I wanted to divide it to early solution (this series) and full solution 
(the above).



_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/2] nvmet-rdma: implement get_queue_size controller op
  2021-09-22  9:18           ` Sagi Grimberg
  2021-09-22  9:35             ` Max Gurtovoy
@ 2021-09-22 12:10             ` Jason Gunthorpe
  2021-09-22 12:57               ` Max Gurtovoy
  1 sibling, 1 reply; 19+ messages in thread
From: Jason Gunthorpe @ 2021-09-22 12:10 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Max Gurtovoy, Christoph Hellwig, linux-nvme, kbusch, chaitanyak,
	israelr, mruijter, oren, nitzanc

On Wed, Sep 22, 2021 at 12:18:15PM +0300, Sagi Grimberg wrote:

> Can't you do this in rdmw_rw? all of the users of it will need the
> exact same value right?

No, it depends on what ops the user is going to use.
 
> > is it necessary for this submission or can we live with 128 depth for
> > now ? with and without new ib_ API the queue depth will be in these
> > sizes.
> 
> I am not sure I see the entire complexity. Even if this calc is not
> accurate, you are already proposing to hard-code it to 128, so you
> can do this to account for the boundaries there.

As I understood it the 128 is to match what the initiator hardcodes
its limit to - both sides have the same basic problem with allocating
the RDMA QP, they just had different hard coded limits. Due to this we
know that 128 is OK for all RDMA HW as the initiator has proven it
already.

For a stable fix to the interop problem this is a good approach.

If someone wants to add all sorts of complexity to try and figure out
the actual device specific limit then they should probably also show
that there is a performance win (or at least not a loss) to increasing
this number further..

Jason

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/2] nvmet-rdma: implement get_queue_size controller op
  2021-09-22 12:10             ` Jason Gunthorpe
@ 2021-09-22 12:57               ` Max Gurtovoy
  2021-09-22 13:31                 ` Jason Gunthorpe
  0 siblings, 1 reply; 19+ messages in thread
From: Max Gurtovoy @ 2021-09-22 12:57 UTC (permalink / raw)
  To: Jason Gunthorpe, Sagi Grimberg
  Cc: Christoph Hellwig, linux-nvme, kbusch, chaitanyak, israelr,
	mruijter, oren, nitzanc


On 9/22/2021 3:10 PM, Jason Gunthorpe wrote:
> On Wed, Sep 22, 2021 at 12:18:15PM +0300, Sagi Grimberg wrote:
>
>> Can't you do this in rdmw_rw? all of the users of it will need the
>> exact same value right?
> No, it depends on what ops the user is going to use.
>   
>>> is it necessary for this submission or can we live with 128 depth for
>>> now ? with and without new ib_ API the queue depth will be in these
>>> sizes.
>> I am not sure I see the entire complexity. Even if this calc is not
>> accurate, you are already proposing to hard-code it to 128, so you
>> can do this to account for the boundaries there.
> As I understood it the 128 is to match what the initiator hardcodes
> its limit to - both sides have the same basic problem with allocating
> the RDMA QP, they just had different hard coded limits. Due to this we
> know that 128 is OK for all RDMA HW as the initiator has proven it
> already.

Not exactly. The initiator 128 is the default value if not set 
differently in the connect command.

Probably this value can be bigger in initiator since it doesn't perform 
RDMA operation but only sends descriptors to the target.

So we'll need the ib_ future API for initiator as well, but not the RW 
API since the factor for NVMe IO will be 3 (MEM_REG, MEM_INVALID, SEND).


>
> For a stable fix to the interop problem this is a good approach.
>
> If someone wants to add all sorts of complexity to try and figure out
> the actual device specific limit then they should probably also show
> that there is a performance win (or at least not a loss) to increasing
> this number further..

Correct.


> Jason

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/2] nvmet-rdma: implement get_queue_size controller op
  2021-09-22 12:57               ` Max Gurtovoy
@ 2021-09-22 13:31                 ` Jason Gunthorpe
  2021-09-22 14:00                   ` Max Gurtovoy
  0 siblings, 1 reply; 19+ messages in thread
From: Jason Gunthorpe @ 2021-09-22 13:31 UTC (permalink / raw)
  To: Max Gurtovoy
  Cc: Sagi Grimberg, Christoph Hellwig, linux-nvme, kbusch, chaitanyak,
	israelr, mruijter, oren, nitzanc

On Wed, Sep 22, 2021 at 03:57:17PM +0300, Max Gurtovoy wrote:
> 
> On 9/22/2021 3:10 PM, Jason Gunthorpe wrote:
> > On Wed, Sep 22, 2021 at 12:18:15PM +0300, Sagi Grimberg wrote:
> > 
> > > Can't you do this in rdmw_rw? all of the users of it will need the
> > > exact same value right?
> > No, it depends on what ops the user is going to use.
> > > > is it necessary for this submission or can we live with 128 depth for
> > > > now ? with and without new ib_ API the queue depth will be in these
> > > > sizes.
> > > I am not sure I see the entire complexity. Even if this calc is not
> > > accurate, you are already proposing to hard-code it to 128, so you
> > > can do this to account for the boundaries there.
> > As I understood it the 128 is to match what the initiator hardcodes
> > its limit to - both sides have the same basic problem with allocating
> > the RDMA QP, they just had different hard coded limits. Due to this we
> > know that 128 is OK for all RDMA HW as the initiator has proven it
> > already.
> 
> Not exactly. The initiator 128 is the default value if not set differently
> in the connect command.
> 
> Probably this value can be bigger in initiator since it doesn't perform RDMA
> operation but only sends descriptors to the target.

Well, that means the initiator side needs fixing too. I see this:

			if (token < NVMF_MIN_QUEUE_SIZE ||
			    token > NVMF_MAX_QUEUE_SIZE)
                            ERR
			opts->queue_size = token;

Which is probably still too big for what some HW can do.

Both host and target need to bring in an upper limit of queue_size
from the RDMA layer. A ULP should not pass in a value to
ib_qp_init_attr::max_send_wr that will cause QP creation to fail if
the queue_size is programmable.

Currently there is no way to to get the device limit for QPs using
IB_QP_CREATE_INTEGRITY_EN. We know at least that 128 works on all RDMA
devices.

In any case I still view it as two tasks, fix the various interop
problems by adjusting the current hardwired limits to something that
works on all RDMA HW and computing the actual HW limit, adjusted by
RW, etc.

Jason

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/2] nvmet-rdma: implement get_queue_size controller op
  2021-09-22 13:31                 ` Jason Gunthorpe
@ 2021-09-22 14:00                   ` Max Gurtovoy
  0 siblings, 0 replies; 19+ messages in thread
From: Max Gurtovoy @ 2021-09-22 14:00 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Sagi Grimberg, Christoph Hellwig, linux-nvme, kbusch, chaitanyak,
	israelr, mruijter, oren, nitzanc


On 9/22/2021 4:31 PM, Jason Gunthorpe wrote:
> On Wed, Sep 22, 2021 at 03:57:17PM +0300, Max Gurtovoy wrote:
>> On 9/22/2021 3:10 PM, Jason Gunthorpe wrote:
>>> On Wed, Sep 22, 2021 at 12:18:15PM +0300, Sagi Grimberg wrote:
>>>
>>>> Can't you do this in rdmw_rw? all of the users of it will need the
>>>> exact same value right?
>>> No, it depends on what ops the user is going to use.
>>>>> is it necessary for this submission or can we live with 128 depth for
>>>>> now ? with and without new ib_ API the queue depth will be in these
>>>>> sizes.
>>>> I am not sure I see the entire complexity. Even if this calc is not
>>>> accurate, you are already proposing to hard-code it to 128, so you
>>>> can do this to account for the boundaries there.
>>> As I understood it the 128 is to match what the initiator hardcodes
>>> its limit to - both sides have the same basic problem with allocating
>>> the RDMA QP, they just had different hard coded limits. Due to this we
>>> know that 128 is OK for all RDMA HW as the initiator has proven it
>>> already.
>> Not exactly. The initiator 128 is the default value if not set differently
>> in the connect command.
>>
>> Probably this value can be bigger in initiator since it doesn't perform RDMA
>> operation but only sends descriptors to the target.
> Well, that means the initiator side needs fixing too. I see this:
>
> 			if (token < NVMF_MIN_QUEUE_SIZE ||
> 			    token > NVMF_MAX_QUEUE_SIZE)
>                              ERR
> 			opts->queue_size = token;
>
> Which is probably still too big for what some HW can do.

Yes. I'll fix it in V2.

>
> Both host and target need to bring in an upper limit of queue_size
> from the RDMA layer. A ULP should not pass in a value to
> ib_qp_init_attr::max_send_wr that will cause QP creation to fail if
> the queue_size is programmable.
>
> Currently there is no way to to get the device limit for QPs using
> IB_QP_CREATE_INTEGRITY_EN. We know at least that 128 works on all RDMA
> devices.
>
> In any case I still view it as two tasks, fix the various interop
> problems by adjusting the current hardwired limits to something that
> works on all RDMA HW and computing the actual HW limit, adjusted by
> RW, etc.
>
> Jason

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2021-09-22 14:03 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-21 19:04 [PATCH v1 0/2] update RDMA controllers queue depth Max Gurtovoy
2021-09-21 19:04 ` [PATCH 1/2] nvmet: add get_queue_size op for controllers Max Gurtovoy
2021-09-21 19:20   ` Chaitanya Kulkarni
2021-09-21 22:47   ` Sagi Grimberg
2021-09-22  7:35     ` Max Gurtovoy
2021-09-21 19:04 ` [PATCH 2/2] nvmet-rdma: implement get_queue_size controller op Max Gurtovoy
2021-09-21 19:21   ` Chaitanya Kulkarni
2021-09-21 22:52   ` Sagi Grimberg
2021-09-22  7:44     ` Max Gurtovoy
2021-09-22  7:45       ` Christoph Hellwig
2021-09-22  8:10         ` Max Gurtovoy
2021-09-22  9:18           ` Sagi Grimberg
2021-09-22  9:35             ` Max Gurtovoy
2021-09-22 12:10             ` Jason Gunthorpe
2021-09-22 12:57               ` Max Gurtovoy
2021-09-22 13:31                 ` Jason Gunthorpe
2021-09-22 14:00                   ` Max Gurtovoy
2021-09-21 19:22 ` [PATCH v1 0/2] update RDMA controllers queue depth Chaitanya Kulkarni
2021-09-21 19:42   ` Mark Ruijter

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.