All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] nvme-rdma: Remove timeout for getting RDMA-CM established event
@ 2022-05-15 15:04 Israel Rukshin
  2022-05-16 22:28 ` Chaitanya Kulkarni
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Israel Rukshin @ 2022-05-15 15:04 UTC (permalink / raw)
  To: Linux-nvme, Sagi Grimberg, Christoph Hellwig
  Cc: Israel Rukshin, Nitzan Carmi, Max Gurtovoy

In case many controllers start error recovery at the same time (i.e.,
when port is down and up), they may never succeed to reconnect again.
This is because the target can't handle all the connect requests at
three seconds (the arbitrary value set today). Even if some of the
connections are established, when a single queue fails to connect,
all the controller's queues are destroyed as well. So, on the
following reconnection attempts the number of connect requests may
remain the same. To fix this, remove the timeout and wait for RDMA-CM
event to abort/complete the connect request. RDMA-CM sends unreachable
event when a timeout of ~90 seconds is expired. This approach is used
at other RDMA-CM users like SRP and iSER at blocking mode. The commit
also renames NVME_RDMA_CONNECT_TIMEOUT_MS to NVME_RDMA_CM_TIMEOUT_MS.

Signed-off-by: Israel Rukshin <israelr@nvidia.com>
Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
---
 drivers/nvme/host/rdma.c | 13 +++++--------
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index d9f19d901313..0ccd8f65b50b 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -29,7 +29,7 @@
 #include "fabrics.h"
 
 
-#define NVME_RDMA_CONNECT_TIMEOUT_MS	3000		/* 3 second */
+#define NVME_RDMA_CM_TIMEOUT_MS		3000		/* 3 second */
 
 #define NVME_RDMA_MAX_SEGMENTS		256
 
@@ -248,12 +248,9 @@ static int nvme_rdma_wait_for_cm(struct nvme_rdma_queue *queue)
 {
 	int ret;
 
-	ret = wait_for_completion_interruptible_timeout(&queue->cm_done,
-			msecs_to_jiffies(NVME_RDMA_CONNECT_TIMEOUT_MS) + 1);
-	if (ret < 0)
+	ret = wait_for_completion_interruptible(&queue->cm_done);
+	if (ret)
 		return ret;
-	if (ret == 0)
-		return -ETIMEDOUT;
 	WARN_ON_ONCE(queue->cm_error > 0);
 	return queue->cm_error;
 }
@@ -612,7 +609,7 @@ static int nvme_rdma_alloc_queue(struct nvme_rdma_ctrl *ctrl,
 	queue->cm_error = -ETIMEDOUT;
 	ret = rdma_resolve_addr(queue->cm_id, src_addr,
 			(struct sockaddr *)&ctrl->addr,
-			NVME_RDMA_CONNECT_TIMEOUT_MS);
+			NVME_RDMA_CM_TIMEOUT_MS);
 	if (ret) {
 		dev_info(ctrl->ctrl.device,
 			"rdma_resolve_addr failed (%d).\n", ret);
@@ -1886,7 +1883,7 @@ static int nvme_rdma_addr_resolved(struct nvme_rdma_queue *queue)
 
 	if (ctrl->opts->tos >= 0)
 		rdma_set_service_type(queue->cm_id, ctrl->opts->tos);
-	ret = rdma_resolve_route(queue->cm_id, NVME_RDMA_CONNECT_TIMEOUT_MS);
+	ret = rdma_resolve_route(queue->cm_id, NVME_RDMA_CM_TIMEOUT_MS);
 	if (ret) {
 		dev_err(ctrl->device, "rdma_resolve_route failed (%d).\n",
 			queue->cm_error);
-- 
2.18.2



^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] nvme-rdma: Remove timeout for getting RDMA-CM established event
  2022-05-15 15:04 [PATCH] nvme-rdma: Remove timeout for getting RDMA-CM established event Israel Rukshin
@ 2022-05-16 22:28 ` Chaitanya Kulkarni
  2022-05-17 10:11 ` Sagi Grimberg
  2022-07-12 15:35 ` Christoph Hellwig
  2 siblings, 0 replies; 6+ messages in thread
From: Chaitanya Kulkarni @ 2022-05-16 22:28 UTC (permalink / raw)
  To: Israel Rukshin
  Cc: Nitzan Carmi, Max Gurtovoy, Linux-nvme, Christoph Hellwig, Sagi Grimberg

On 5/15/22 08:04, Israel Rukshin wrote:
> In case many controllers start error recovery at the same time (i.e.,
> when port is down and up), they may never succeed to reconnect again.
> This is because the target can't handle all the connect requests at
> three seconds (the arbitrary value set today). Even if some of the
> connections are established, when a single queue fails to connect,
> all the controller's queues are destroyed as well. So, on the
> following reconnection attempts the number of connect requests may
> remain the same. To fix this, remove the timeout and wait for RDMA-CM
> event to abort/complete the connect request. RDMA-CM sends unreachable
> event when a timeout of ~90 seconds is expired. This approach is used
> at other RDMA-CM users like SRP and iSER at blocking mode. The commit
> also renames NVME_RDMA_CONNECT_TIMEOUT_MS to NVME_RDMA_CM_TIMEOUT_MS.
> 
> Signed-off-by: Israel Rukshin <israelr@nvidia.com>
> Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
> ---

Based on the complexity of components are involved in this, please write
a blktests for rdma transport to make sure this gets tested on each
release.

-ck



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] nvme-rdma: Remove timeout for getting RDMA-CM established event
  2022-05-15 15:04 [PATCH] nvme-rdma: Remove timeout for getting RDMA-CM established event Israel Rukshin
  2022-05-16 22:28 ` Chaitanya Kulkarni
@ 2022-05-17 10:11 ` Sagi Grimberg
  2022-05-18 12:14   ` Israel Rukshin
       [not found]   ` <55da512f-03af-aa15-3e5a-754cebbde4d6@nvidia.com>
  2022-07-12 15:35 ` Christoph Hellwig
  2 siblings, 2 replies; 6+ messages in thread
From: Sagi Grimberg @ 2022-05-17 10:11 UTC (permalink / raw)
  To: Israel Rukshin, Linux-nvme, Christoph Hellwig; +Cc: Nitzan Carmi, Max Gurtovoy


> In case many controllers start error recovery at the same time (i.e.,
> when port is down and up), they may never succeed to reconnect again.
> This is because the target can't handle all the connect requests at
> three seconds (the arbitrary value set today). Even if some of the
> connections are established, when a single queue fails to connect,
> all the controller's queues are destroyed as well. So, on the
> following reconnection attempts the number of connect requests may
> remain the same. To fix this, remove the timeout and wait for RDMA-CM
> event to abort/complete the connect request. RDMA-CM sends unreachable
> event when a timeout of ~90 seconds is expired. This approach is used
> at other RDMA-CM users like SRP and iSER at blocking mode.

So with this connecting to an unreachable controller will take 90
seconds?


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] nvme-rdma: Remove timeout for getting RDMA-CM established event
  2022-05-17 10:11 ` Sagi Grimberg
@ 2022-05-18 12:14   ` Israel Rukshin
       [not found]   ` <55da512f-03af-aa15-3e5a-754cebbde4d6@nvidia.com>
  1 sibling, 0 replies; 6+ messages in thread
From: Israel Rukshin @ 2022-05-18 12:14 UTC (permalink / raw)
  To: Linux-nvme

Hi Sagi,

On 5/17/2022 1:11 PM, Sagi Grimberg wrote:
>
>> In case many controllers start error recovery at the same time (i.e.,
>> when port is down and up), they may never succeed to reconnect again.
>> This is because the target can't handle all the connect requests at
>> three seconds (the arbitrary value set today). Even if some of the
>> connections are established, when a single queue fails to connect,
>> all the controller's queues are destroyed as well. So, on the
>> following reconnection attempts the number of connect requests may
>> remain the same. To fix this, remove the timeout and wait for RDMA-CM
>> event to abort/complete the connect request. RDMA-CM sends unreachable
>> event when a timeout of ~90 seconds is expired. This approach is used
>> at other RDMA-CM users like SRP and iSER at blocking mode.
>
> So with this connecting to an unreachable controller will take 90
> seconds?

The answer is yes.
An unreachable controller is only when adders/route resolve passed 
successfully, so bad IP will fail immediately.
When running nvme connect, the user can press (Ctrl + C) to fail the 
connection immediately. On error recovery
it is not possible to fail it by pressing (Ctrl + C), but it doesn't 
block others.
I ran "nvme connect" with and without this patch and you can see the 
results at the table below:

Test                                                       |    With 
Patch                                                      | Without patch
Target is busy with other connections/disconnections               |  
succeed                   |Fail after 3 seconds
Kill target after successful resolve address and boot after 30 seconds   
|  Get reject when target machine is up         |Fail after 3 seconds
Kill target after successful resolve address and boot after 120 seconds 
|  Get unreachable event after ~90 seconds  |Fail after 3 seconds
Port down after successful resolve address and up after 30 seconds     
|  succeed       |Fail after 3 seconds
Port down after successful resolve address and up after 120 seconds   |  
Get unreachable event after ~90 seconds  |Fail after 3 seconds
At error recovery, one reconnect attempt time is                      |  
Up to 90 seconds per controller.        |Up to 3 seconds per controller
Error recovery with many connections                         |   succeed 
                              |Never

Israel



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] nvme-rdma: Remove timeout for getting RDMA-CM established event
       [not found]   ` <55da512f-03af-aa15-3e5a-754cebbde4d6@nvidia.com>
@ 2022-07-07  7:14     ` Sagi Grimberg
  0 siblings, 0 replies; 6+ messages in thread
From: Sagi Grimberg @ 2022-07-07  7:14 UTC (permalink / raw)
  To: Israel Rukshin, Linux-nvme, Christoph Hellwig; +Cc: Nitzan Carmi, Max Gurtovoy


> Hi Sagi
> 
> On 5/17/2022 1:11 PM, Sagi Grimberg wrote:
>>
>>> In case many controllers start error recovery at the same time (i.e.,
>>> when port is down and up), they may never succeed to reconnect again.
>>> This is because the target can't handle all the connect requests at
>>> three seconds (the arbitrary value set today). Even if some of the
>>> connections are established, when a single queue fails to connect,
>>> all the controller's queues are destroyed as well. So, on the
>>> following reconnection attempts the number of connect requests may
>>> remain the same. To fix this, remove the timeout and wait for RDMA-CM
>>> event to abort/complete the connect request. RDMA-CM sends unreachable
>>> event when a timeout of ~90 seconds is expired. This approach is used
>>> at other RDMA-CM users like SRP and iSER at blocking mode.

If we are aligning to srp/iser, we can also align to their cm timeout
as well (1s). Given that it is not the full connection establishment.

>>
>> So with this connecting to an unreachable controller will take 90
>> seconds?
> The answer is yes.
> An unreachable controller is only when adders/route resolve passed 
> successfully, so bad IP will fail immediately.
> When running nvme connect, the user can press (Ctrl + C) to fail the 
> connection immediately. On error recovery
> it is not possible to fail it by pressing (Ctrl + C), but it doesn't 
> block others.
> I ran "nvme connect" with and without this patch and you can see the 
> results at the table below:
> Test
>      With Patch
>      Without patch
> Target is busy with other connections/disconnections     succeed     
> Fail after 3 seconds
> Kill target after successful resolve address and boot after 30 seconds 
> Get reject when target machine is up (30 seconds)     Fail after 3 seconds
> Kill target after successful resolve address and boot after 120 seconds 
> Get unreachable event after ~90 seconds     Fail after 3 seconds
> Port down after successful resolve address and up after 30 seconds 
> succeed     Fail after 3 seconds
> Port down after successful resolve address and up after 120 seconds     
> Get unreachable event after ~90 seconds     Fail after 3 seconds
> At error recovery, one reconnect attempt time is     Up to 90 seconds 
> per controller.     Up to 3 seconds per controller
> Error recovery with many connections     succeed     Never
> 
> Israel
> 


I think its a bit problematic that a (re)connect may take 90 seconds
just to fail, that will block the reconnect_work thread for 90+ seconds...

But I guess its fine for now, and it happens with other transports for
years and was never a concern, so I guess I'm fine with it:

Acked-by: Sagi Grimberg <sagi@grimberg.me>


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] nvme-rdma: Remove timeout for getting RDMA-CM established event
  2022-05-15 15:04 [PATCH] nvme-rdma: Remove timeout for getting RDMA-CM established event Israel Rukshin
  2022-05-16 22:28 ` Chaitanya Kulkarni
  2022-05-17 10:11 ` Sagi Grimberg
@ 2022-07-12 15:35 ` Christoph Hellwig
  2 siblings, 0 replies; 6+ messages in thread
From: Christoph Hellwig @ 2022-07-12 15:35 UTC (permalink / raw)
  To: Israel Rukshin
  Cc: Linux-nvme, Sagi Grimberg, Christoph Hellwig, Nitzan Carmi, Max Gurtovoy

Thanks,

applied to nvme-5.20.


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2022-07-12 15:35 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-15 15:04 [PATCH] nvme-rdma: Remove timeout for getting RDMA-CM established event Israel Rukshin
2022-05-16 22:28 ` Chaitanya Kulkarni
2022-05-17 10:11 ` Sagi Grimberg
2022-05-18 12:14   ` Israel Rukshin
     [not found]   ` <55da512f-03af-aa15-3e5a-754cebbde4d6@nvidia.com>
2022-07-07  7:14     ` Sagi Grimberg
2022-07-12 15:35 ` Christoph Hellwig

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.