* [PATCH] xprtrdma: Wake up re_connect_wait on disconnect
From: Dan Aloni @ 2020-06-20 17:18 UTC
To: Chuck Lever; +Cc: linux-rdma, linux-nfs
Given that rpcrdma_xprt_connect() runs in workqueue context, something needs to
wake it up in cases where the connection does not succeed. In my case, this was
observed when the CM callback received `RDMA_CM_EVENT_REJECTED` and
`rpcrdma_xprt_connect()` slept forever.
This continues the fix in commit 58bd6656f808 ('xprtrdma: Restore wake-up-all to
rpcrdma_cm_event_handler()').
Signed-off-by: Dan Aloni <dan@kernelim.com>
CC: Chuck Lever <chuck.lever@oracle.com>
---
Notes:
Hi Chuck,
Maybe I missed something, as it is not clear to me how else (without this
patch) re_connect_wait can be woken up in this situation. Please explain?
net/sunrpc/xprtrdma/verbs.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 2ae348377806..8bd76a47a91f 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -289,6 +289,7 @@ rpcrdma_cm_event_handler(struct rdma_cm_id *id, struct rdma_cm_event *event)
 		ep->re_connect_status = -ECONNABORTED;
 disconnected:
 		xprt_force_disconnect(xprt);
+		wake_up_all(&ep->re_connect_wait);
 		return rpcrdma_ep_destroy(ep);
 	default:
 		break;
--
2.25.4
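
To make the failure mode in the patch description concrete, here is a minimal
single-threaded sketch in userspace C. It is a toy model, not the xprtrdma code:
`struct demo_ep`, every `demo_*` function, and the use of -ECONNREFUSED and
-EINPROGRESS as values are assumptions made purely for illustration. A boolean
stands in for the worker sleeping on re_connect_wait, and clearing it stands in
for wake_up_all().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the endpoint; not the kernel's struct rpcrdma_ep. */
struct demo_ep {
	int connect_status;	/* 0 while the connect attempt is pending */
	bool worker_asleep;	/* true while the worker sits on the wait queue */
};

/* Worker side: start a connect attempt, then go to sleep waiting for news. */
static void demo_connect_begin(struct demo_ep *ep)
{
	assert(ep != NULL);
	ep->connect_status = 0;
	ep->worker_asleep = true;
}

/* Handler side: record the outcome and, only if asked, wake the sleeper. */
static void demo_cm_event(struct demo_ep *ep, int status, bool wake)
{
	ep->connect_status = status;
	if (wake)
		ep->worker_asleep = false;	/* models wake_up_all() */
}

/*
 * Worker side: a sleeping worker never gets to look at the status.
 * Returns the recorded status if the worker was woken, or -EINPROGRESS
 * (an illustrative marker) if it is still asleep.
 */
static int demo_connect_result(const struct demo_ep *ep)
{
	if (ep->worker_asleep)
		return -EINPROGRESS;
	return ep->connect_status;
}

/* One rejected connect attempt, with or without the wake-up. */
static int demo_run(bool wake)
{
	struct demo_ep ep;

	demo_connect_begin(&ep);
	demo_cm_event(&ep, -ECONNREFUSED, wake);	/* illustrative errno */
	return demo_connect_result(&ep);
}
```

Under these assumptions, demo_run(true) hands the recorded status back to the
worker, while demo_run(false) leaves the worker asleep even though the status
was set, which is the hang described in the patch description.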
* Re: [PATCH] xprtrdma: Wake up re_connect_wait on disconnect
From: Chuck Lever @ 2020-06-20 18:46 UTC
To: Dan Aloni; +Cc: linux-rdma, Linux NFS Mailing List
Hi Dan-
> On Jun 20, 2020, at 1:18 PM, Dan Aloni <dan@kernelim.com> wrote:
>
> Given that rpcrdma_xprt_connect() runs in workqueue context, something needs to
> wake it up in cases where the connection does not succeed. In my case, this was
> observed when the CM callback received `RDMA_CM_EVENT_REJECTED` and
> `rpcrdma_xprt_connect()` slept forever.
Interesting. My development and testing generates plenty of REJECTED connection
requests, but I never saw this particular failure mode.
> This continues the fix in commit 58bd6656f808 ('xprtrdma: Restore wake-up-all to
> rpcrdma_cm_event_handler()').
The patch looks sensible. I'll pull it into my test harness.
> Signed-off-by: Dan Aloni <dan@kernelim.com>
> CC: Chuck Lever <chuck.lever@oracle.com>
> ---
>
> Notes:
> Hi Chuck,
>
> Maybe I missed something, as it is not clear to me how else (without this
> patch) re_connect_wait can be woken up in this situation. Please explain?
>
> net/sunrpc/xprtrdma/verbs.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
> index 2ae348377806..8bd76a47a91f 100644
> --- a/net/sunrpc/xprtrdma/verbs.c
> +++ b/net/sunrpc/xprtrdma/verbs.c
> @@ -289,6 +289,7 @@ rpcrdma_cm_event_handler(struct rdma_cm_id *id, struct rdma_cm_event *event)
>  		ep->re_connect_status = -ECONNABORTED;
>  disconnected:
>  		xprt_force_disconnect(xprt);
> +		wake_up_all(&ep->re_connect_wait);
>  		return rpcrdma_ep_destroy(ep);
>  	default:
>  		break;
> --
> 2.25.4
>
--
Chuck Lever
* Re: [PATCH] xprtrdma: Wake up re_connect_wait on disconnect
From: Chuck Lever @ 2020-06-21 14:49 UTC
To: Dan Aloni; +Cc: linux-rdma, Linux NFS Mailing List
Hi Dan-
> On Jun 20, 2020, at 2:46 PM, Chuck Lever <chuck.lever@oracle.com> wrote:
>
> Hi Dan-
>
>> On Jun 20, 2020, at 1:18 PM, Dan Aloni <dan@kernelim.com> wrote:
>>
>> Given that rpcrdma_xprt_connect() runs in workqueue context, something needs to
>> wake it up in cases where the connection does not succeed. In my case, this was
>> observed when the CM callback received `RDMA_CM_EVENT_REJECTED` and
>> `rpcrdma_xprt_connect()` slept forever.
>
> Interesting. My development and testing generates plenty of REJECTED connection
> requests, but I never saw this particular failure mode.
Correction: My testing _used_ _to_ generate REJECTED events regularly. It does
not seem to any more, even after client crashes. So that explains why I haven't
seen this before.
I haven't reproduced the problem here, but the fix still looks proper to me,
and doesn't appear to introduce any regressions. I do have some issues with your
proposed patch, though.
The first paragraph of the patch description is incorrect. RDMA_CM_EVENT_DISCONNECTED
can occur only once a connection has been established. That guarantees there are no
waiters on re_connect_wait in that case. It's connect errors that need to wake up
the connect worker.
>> This continues the fix in commit 58bd6656f808 ('xprtrdma: Restore wake-up-all to
>> rpcrdma_cm_event_handler()').
IMO this paragraph needs to be replaced by:
Fixes: e28ce90083f0 ("xprtrdma: kmalloc rpcrdma_ep separate from rpcrdma_xprt")
>> Signed-off-by: Dan Aloni <dan@kernelim.com>
>> CC: Chuck Lever <chuck.lever@oracle.com>
>> ---
>>
>> Notes:
>> Hi Chuck,
>>
>> Maybe I missed something, as it is not clear to me how else (without this
>> patch) re_connect_wait can be woken up in this situation. Please explain?
>>
>> net/sunrpc/xprtrdma/verbs.c | 1 +
>> 1 file changed, 1 insertion(+)
>>
>> diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
>> index 2ae348377806..8bd76a47a91f 100644
>> --- a/net/sunrpc/xprtrdma/verbs.c
>> +++ b/net/sunrpc/xprtrdma/verbs.c
>> @@ -289,6 +289,7 @@ rpcrdma_cm_event_handler(struct rdma_cm_id *id, struct rdma_cm_event *event)
>>  		ep->re_connect_status = -ECONNABORTED;
>>  disconnected:
>>  		xprt_force_disconnect(xprt);
>> +		wake_up_all(&ep->re_connect_wait);
>>  		return rpcrdma_ep_destroy(ep);
>>  	default:
>>  		break;
This hunk does not apply on top of fixes I've already sent to Anna for 5.8-rc1.
So, if you don't object, I'll adjust your patch (this hunk and the description)
before sending it along to Anna.
--
Chuck Lever
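
The point made above about which events can find a sleeper on re_connect_wait
can be sketched as a small state machine in userspace C. This is a toy model
under stated assumptions: the `demo_*` names, the convention that status 1
means "connected", and the -ECONNREFUSED value are illustrative and are not
taken from verbs.c. Only the -ECONNABORTED status for a disconnect comes from
the hunk quoted in this thread.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical event names for this sketch; the kernel uses RDMA_CM_EVENT_*. */
enum demo_event {
	DEMO_EV_ESTABLISHED,
	DEMO_EV_REJECTED,
	DEMO_EV_DISCONNECTED,
};

struct demo_conn {
	int status;	/* 0 pending, 1 connected (assumed), negative errno on failure */
	int waiters;	/* connect workers currently asleep */
	int woken;	/* sleepers released by the most recent event */
};

/* A connect attempt starts with exactly one worker asleep on the wait queue. */
static void demo_connect_start(struct demo_conn *c)
{
	c->status = 0;
	c->waiters = 1;
	c->woken = 0;
}

/* Models wake_up_all(): release every sleeper and count them. */
static void demo_wake_all(struct demo_conn *c)
{
	c->woken = c->waiters;
	c->waiters = 0;
}

static void demo_handle(struct demo_conn *c, enum demo_event ev)
{
	c->woken = 0;
	switch (ev) {
	case DEMO_EV_ESTABLISHED:
		c->status = 1;
		demo_wake_all(c);	/* success releases the connect worker */
		break;
	case DEMO_EV_REJECTED:
		c->status = -ECONNREFUSED;	/* illustrative errno */
		demo_wake_all(c);	/* a connect error must release it too */
		break;
	case DEMO_EV_DISCONNECTED:
		/* Reachable only after ESTABLISHED, so nobody is asleep here. */
		assert(c->status == 1 && c->waiters == 0);
		c->status = -ECONNABORTED;
		break;
	}
}

/* Replay events on a fresh attempt; return sleepers woken by the last one. */
static int demo_replay(const enum demo_event *evs, int n)
{
	struct demo_conn c;
	int i;

	demo_connect_start(&c);
	for (i = 0; i < n; i++)
		demo_handle(&c, evs[i]);
	return c.woken;
}
```

In this model a REJECTED event finds one sleeper to release, whereas a
DISCONNECTED event that follows ESTABLISHED finds none, which is why the
description should talk about connect errors rather than disconnects.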
* Re: [PATCH] xprtrdma: Wake up re_connect_wait on disconnect
From: Dan Aloni @ 2020-06-21 15:11 UTC
To: Chuck Lever; +Cc: linux-rdma, Linux NFS Mailing List
On Sun, Jun 21, 2020 at 10:49:53AM -0400, Chuck Lever wrote:
> >> diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
> >> index 2ae348377806..8bd76a47a91f 100644
> >> --- a/net/sunrpc/xprtrdma/verbs.c
> >> +++ b/net/sunrpc/xprtrdma/verbs.c
> >> @@ -289,6 +289,7 @@ rpcrdma_cm_event_handler(struct rdma_cm_id *id, struct rdma_cm_event *event)
> >>  		ep->re_connect_status = -ECONNABORTED;
> >>  disconnected:
> >>  		xprt_force_disconnect(xprt);
> >> +		wake_up_all(&ep->re_connect_wait);
> >>  		return rpcrdma_ep_destroy(ep);
> >>  	default:
> >>  		break;
>
> This hunk does not apply on top of fixes I've already sent to Anna for 5.8-rc1.
>
> So, if you don't object, I'll adjust your patch (this hunk and the description)
> before sending it along to Anna.
Sure, go ahead. Thanks for working on this!
--
Dan Aloni