* [PATCH 1/2] cifs: smbd: avoid reconnect lockup
@ 2018-03-30 22:16 Long Li
  2018-03-30 22:16 ` [PATCH 2/2] cifs: smbd: disconnect transport on RDMA errors Long Li
  2018-03-30 22:23 ` [PATCH 1/2] cifs: smbd: avoid reconnect lockup ronnie sahlberg
  0 siblings, 2 replies; 4+ messages in thread
From: Long Li @ 2018-03-30 22:16 UTC
  To: Steve French, linux-cifs, samba-technical, linux-kernel; +Cc: Long Li

From: Long Li <longli@microsoft.com>

During transport reconnect, other processes may have registered memory
and be blocked on the transport. This creates a deadlock: the transport
resources can't be freed while that memory is still registered, so the
reconnect blocks forever waiting for the transport to be destroyed.

Fix this by returning to the upper layer on timeout. Before returning,
the transport status is set to reconnecting so that other processes
will release their memory registration resources.

The upper layer will retry the reconnect. Since this is not in the fast
I/O path, set the timeout to 5 seconds.

Signed-off-by: Long Li <longli@microsoft.com>
---
 fs/cifs/smbdirect.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/fs/cifs/smbdirect.c b/fs/cifs/smbdirect.c
index 5aa0b54..3f7883e 100644
--- a/fs/cifs/smbdirect.c
+++ b/fs/cifs/smbdirect.c
@@ -1498,8 +1498,8 @@ int smbd_reconnect(struct TCP_Server_Info *server)
 	log_rdma_event(INFO, "reconnecting rdma session\n");
 
 	if (!server->smbd_conn) {
-		log_rdma_event(ERR, "rdma session already destroyed\n");
-		return -EINVAL;
+		log_rdma_event(INFO, "rdma session already destroyed\n");
+		goto create_conn;
 	}
 
 	/*
@@ -1512,15 +1512,19 @@ int smbd_reconnect(struct TCP_Server_Info *server)
 	}
 
 	/* wait until the transport is destroyed */
-	wait_event(server->smbd_conn->wait_destroy,
-		server->smbd_conn->transport_status == SMBD_DESTROYED);
+	if (!wait_event_timeout(server->smbd_conn->wait_destroy,
+		server->smbd_conn->transport_status == SMBD_DESTROYED, 5*HZ))
+		return -EAGAIN;
 
 	destroy_workqueue(server->smbd_conn->workqueue);
 	kfree(server->smbd_conn);
 
+create_conn:
 	log_rdma_event(INFO, "creating rdma session\n");
 	server->smbd_conn = smbd_get_connection(
 		server, (struct sockaddr *) &server->dstaddr);
+	log_rdma_event(INFO, "created rdma session info=%p\n",
+		server->smbd_conn);
 
 	return server->smbd_conn ? 0 : -ENOENT;
 }
-- 
2.7.4
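The control flow that patch 1 gives smbd_reconnect() can be modeled in userspace. This is an analogy, not kernel code. A pthread mutex and condition variable, waited on with pthread_cond_timedwait(), stand in for wait_event_timeout() on wait_destroy. The names conn, new_conn(), free_conn(), mark_destroyed(), wait_destroyed_timeout() and reconnect() are hypothetical. As in the patch, a missing connection skips straight to creation. A wait that times out returns -EAGAIN, so the caller can retry instead of blocking forever. (wait_event_timeout() itself returns 0 if the condition is still false when the timeout expires, and the remaining jiffies otherwise; 5*HZ is 5 seconds in jiffies.)

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical transport states (the kernel uses SMBD_* values). */
enum conn_status { CONNECTED, DESTROYED };

/* Hypothetical connection: the lock and condvar play the role of wait_destroy. */
struct conn {
	pthread_mutex_t lock;
	pthread_cond_t wait_destroy;
	enum conn_status status;
};

static struct conn *new_conn(void)
{
	struct conn *c = calloc(1, sizeof(*c));

	if (!c)
		return NULL;
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->wait_destroy, NULL);
	c->status = CONNECTED;
	return c;
}

static void free_conn(struct conn *c)
{
	pthread_cond_destroy(&c->wait_destroy);
	pthread_mutex_destroy(&c->lock);
	free(c);
}

/* Teardown side: publish DESTROYED and wake any reconnect waiter. */
static void mark_destroyed(struct conn *c)
{
	pthread_mutex_lock(&c->lock);
	c->status = DESTROYED;
	pthread_cond_broadcast(&c->wait_destroy);
	pthread_mutex_unlock(&c->lock);
}

/*
 * Like wait_event_timeout(): returns nonzero if status reached DESTROYED,
 * 0 if timeout_ms elapsed (or the wait failed) first.
 */
static int wait_destroyed_timeout(struct conn *c, long timeout_ms)
{
	struct timespec deadline;
	int rc = 0, done;

	clock_gettime(CLOCK_REALTIME, &deadline);  /* default condvar clock */
	deadline.tv_sec += timeout_ms / 1000;
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec++;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&c->lock);
	while (c->status != DESTROYED && rc == 0)
		rc = pthread_cond_timedwait(&c->wait_destroy, &c->lock, &deadline);
	done = (c->status == DESTROYED);
	pthread_mutex_unlock(&c->lock);
	return done;
}

/* Control flow of the patched smbd_reconnect(), in miniature. */
static int reconnect(struct conn **connp, long timeout_ms)
{
	if (!*connp)
		goto create_conn;  /* already destroyed: just build a new one */

	if (!wait_destroyed_timeout(*connp, timeout_ms))
		return -EAGAIN;    /* give up for now; the caller retries */

	free_conn(*connp);
create_conn:
	*connp = new_conn();
	return *connp ? 0 : -ENOENT;
}
```

A caller would loop on -EAGAIN, possibly with a delay, which mirrors "the upper layer will retry the reconnect". Passing 5000 as timeout_ms here corresponds to the patch's 5*HZ.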


* [PATCH 2/2] cifs: smbd: disconnect transport on RDMA errors
  2018-03-30 22:16 [PATCH 1/2] cifs: smbd: avoid reconnect lockup Long Li
@ 2018-03-30 22:16 ` Long Li
  2018-03-30 22:23 ` [PATCH 1/2] cifs: smbd: avoid reconnect lockup ronnie sahlberg
  1 sibling, 0 replies; 4+ messages in thread
From: Long Li @ 2018-03-30 22:16 UTC
  To: Steve French, linux-cifs, samba-technical, linux-kernel; +Cc: Long Li

From: Long Li <longli@microsoft.com>

On RDMA errors, the transport should disconnect the RDMA CM connection.
This notifies the upper layer, which will then attempt a transport
reconnect.

Signed-off-by: Long Li <longli@microsoft.com>
---
 fs/cifs/smbdirect.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/fs/cifs/smbdirect.c b/fs/cifs/smbdirect.c
index 3f7883e..5008af5 100644
--- a/fs/cifs/smbdirect.c
+++ b/fs/cifs/smbdirect.c
@@ -862,6 +862,8 @@ static int smbd_post_send_negotiate_req(struct smbd_connection *info)
 	ib_dma_unmap_single(info->id->device, request->sge[0].addr,
 		request->sge[0].length, DMA_TO_DEVICE);
 
+	smbd_disconnect_rdma_connection(info);
+
 dma_mapping_failed:
 	mempool_free(request, info->request_mempool);
 	return rc;
@@ -1061,6 +1063,7 @@ static int smbd_post_send(struct smbd_connection *info,
 			if (atomic_dec_and_test(&info->send_pending))
 				wake_up(&info->wait_send_pending);
 		}
+		smbd_disconnect_rdma_connection(info);
 	} else
 		/* Reset timer for idle connection after packet is sent */
 		mod_delayed_work(info->workqueue, &info->idle_timer_work,
@@ -1202,7 +1205,7 @@ static int smbd_post_recv(
 	if (rc) {
 		ib_dma_unmap_single(info->id->device, response->sge.addr,
 				    response->sge.length, DMA_FROM_DEVICE);
-
+		smbd_disconnect_rdma_connection(info);
 		log_rdma_recv(ERR, "ib_post_recv failed rc=%d\n", rc);
 	}
 
@@ -2546,6 +2549,8 @@ struct smbd_mr *smbd_register_mr(
 	if (atomic_dec_and_test(&info->mr_used_count))
 		wake_up(&info->wait_for_mr_cleanup);
 
+	smbd_disconnect_rdma_connection(info);
+
 	return NULL;
 }
 
-- 
2.7.4
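Patch 2 applies one pattern at every failure site. After a failed post, and after any DMA unmapping, it calls smbd_disconnect_rdma_connection() so the RDMA CM disconnect propagates up and triggers a reconnect. Below is a minimal single-threaded sketch of that pattern. It assumes hypothetical names: struct transport, disconnect_transport(), post_with_disconnect() and needs_reconnect(). The reconnect_requests counter stands in for the upper-layer notification. Making the disconnect a no-op once the transport has left the connected state is an assumption of this sketch.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical transport states for this sketch. */
enum transport_status { T_CONNECTED, T_DISCONNECTING };

struct transport {
	enum transport_status status;
	int reconnect_requests;  /* stands in for notifying the upper layer */
};

/* Hypothetical post operation: returns 0 or a negative errno. */
typedef int (*post_fn)(void *ctx);

/*
 * Stand-in for smbd_disconnect_rdma_connection(). In the kernel the
 * RDMA CM disconnect is what eventually reaches the upper layer; here
 * the request is recorded directly, at most once per connection.
 */
static void disconnect_transport(struct transport *t)
{
	if (t->status != T_CONNECTED)
		return;
	t->status = T_DISCONNECTING;
	t->reconnect_requests++;
}

/* Error path shared by every post site: clean up, then disconnect. */
static int post_with_disconnect(struct transport *t, post_fn post, void *ctx)
{
	int rc = post(ctx);

	if (rc) {
		/* DMA unmapping / buffer release would happen here (omitted). */
		disconnect_transport(t);
	}
	return rc;
}

/* What the upper layer checks before deciding to reconnect. */
static bool needs_reconnect(const struct transport *t)
{
	return t->status != T_CONNECTED;
}

/* Example post operations for exercising the success and error paths. */
static int post_ok(void *ctx)
{
	(void)ctx;
	return 0;
}

static int post_fail(void *ctx)
{
	(void)ctx;
	return -EIO;
}
```

Routing every failure through the same disconnect call means no single failed post can leave the transport half-working without the upper layer being told to reconnect.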


* Re: [PATCH 1/2] cifs: smbd: avoid reconnect lockup
  2018-03-30 22:16 [PATCH 1/2] cifs: smbd: avoid reconnect lockup Long Li
  2018-03-30 22:16 ` [PATCH 2/2] cifs: smbd: disconnect transport on RDMA errors Long Li
@ 2018-03-30 22:23 ` ronnie sahlberg
  2018-03-31  0:19   ` Steve French
  1 sibling, 1 reply; 4+ messages in thread
From: ronnie sahlberg @ 2018-03-30 22:23 UTC
  To: Long Li; +Cc: Steve French, linux-cifs, samba-technical, LKML

Looks good to me (both patches)

Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>



* Re: [PATCH 1/2] cifs: smbd: avoid reconnect lockup
  2018-03-30 22:23 ` [PATCH 1/2] cifs: smbd: avoid reconnect lockup ronnie sahlberg
@ 2018-03-31  0:19   ` Steve French
  0 siblings, 0 replies; 4+ messages in thread
From: Steve French @ 2018-03-31  0:19 UTC
  To: ronnie sahlberg; +Cc: Long Li, Steve French, linux-cifs, samba-technical, LKML

merged into cifs-2.6.git for-next

added cc:stable

On Fri, Mar 30, 2018 at 5:23 PM, ronnie sahlberg
<ronniesahlberg@gmail.com> wrote:
> Looks good to me (both patches)
>
> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>



-- 
Thanks,

Steve

