All of lore.kernel.org
 help / color / mirror / Atom feed
From: Chuck Lever III <chuck.lever@oracle.com>
To: Trond Myklebust <trondmy@hammerspace.com>
Cc: Linux NFS Mailing List <linux-nfs@vger.kernel.org>
Subject: Re: [PATCH v1] SUNRPC: Ensure backchannel transports are marked connected
Date: Thu, 19 Aug 2021 14:34:49 +0000	[thread overview]
Message-ID: <24B4FABF-C6D5-4FF6-89B7-68EFDC3AB415@oracle.com> (raw)
In-Reply-To: <69e8d4c3ae302ba51a92e20c927006ad33b2486c.camel@hammerspace.com>



> On Aug 19, 2021, at 10:14 AM, Trond Myklebust <trondmy@hammerspace.com> wrote:
> 
> On Thu, 2021-08-19 at 13:16 +0000, Chuck Lever III wrote:
>> 
>> 
>>> On Aug 19, 2021, at 9:01 AM, Trond Myklebust
>>> <trondmy@hammerspace.com> wrote:
>>> 
>>> On Thu, 2021-08-19 at 08:29 -0400, Chuck Lever wrote:
>>>> With NFSv4.1+ on RDMA, backchannel recovery appears not to work.
>>>> 
>>>> xprt_setup_xxx_bc() is invoked by the client's first
>>>> CREATE_SESSION
>>>> operation, and it always marks the rpc_clnt's transport as
>>>> connected.
>>>> 
>>>> On a subsequent CREATE_SESSION, if rpc_create() is called and
>>>> xpt_bc_xprt is populated, it might not be connected (for
>>>> instance,
>>>> if a backchannel fault has occurred). Ensure that code path
>>>> returns
>>>> a connected xprt also.
>>>> 
>>>> Reported-by: Timo Rothenpieler <timo@rothenpieler.org>
>>>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>>>> ---
>>>>  net/sunrpc/clnt.c |    1 +
>>>>  1 file changed, 1 insertion(+)
>>>> 
>>>> diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
>>>> index 8b4de70e8ead..570480a649a3 100644
>>>> --- a/net/sunrpc/clnt.c
>>>> +++ b/net/sunrpc/clnt.c
>>>> @@ -535,6 +535,7 @@ struct rpc_clnt *rpc_create(struct
>>>> rpc_create_args *args)
>>>>                 xprt = args->bc_xprt->xpt_bc_xprt;
>>>>                 if (xprt) {
>>>>                         xprt_get(xprt);
>>>> +                       xprt_set_connected(xprt);
>>>>                         return rpc_create_xprt(args, xprt);
>>>>                 }
>>>>         }
>>>> 
>>>> 
>>> 
>>> No. This is wrong. If the connection got disconnected, then the
>>> client
>>> needs to reconnect and build a new connection altogether. We can't
>>> just
>>> make pretend that the old connection still exists.
>> 
>> The patch description is not clear: the client has not disconnected.
>> The forward channel is functioning properly, and the server has set
>> SEQ4_STATUS_BACKCHANNEL_FAULT.
>> 
>> To recover, the client sends a DESTROY_SESSION / CREATE_SESSION pair
>> on the existing connection. On the server, setup_callback_client()
>> invokes rpc_create() again -- it's this step that is failing during
>> the second CREATE_SESSION on a connection because the old xprt
>> is returned but it's still marked disconnected.
> 
> How is that happening?

The RPC client's autodisconnect is firing. This was fixed in 6820bf77864d
("svcrdma: disable timeouts on rdma backchannel").

However we're still seeing cases where backchannel recovery is not
working. See:

https://lore.kernel.org/linux-nfs/c3e31e57-15fa-8f70-bbb8-af5452c1f5fc@rothenpieler.org/T/#t

As an experiment, we've reverted 6820bf77864d to try to provoke the bug.


> As far as I can tell, the only thing that can
> cause the backchannel to be marked as closed is a call to
> svc_delete_xprt(), which explicitly calls xprt->xpt_bc_xprt->ops->close(xprt->xpt_bc_xprt).
> 
> The server shouldn't be reusing anything from that svc_xprt after that.

>> An alternative would be to ensure that setup_callback_client()
>> always puts xpt_bc_xprt before it invokes rpc_create(). But it
>> looked to me like rpc_create() already has a bunch of logic to
>> deal with an existing xpt_bc_xprt.
> 
> The connection status is managed by the transport layer, not the RPC
> client layer.

If it's a bug to mark a backchannel transport disconnected, then
asserts should be added to capture the problem.

What is the purpose of this code in rpc_create() ?

 533         if (args->bc_xprt) {
 534                 WARN_ON_ONCE(!(args->protocol & XPRT_TRANSPORT_BC));
 535                 xprt = args->bc_xprt->xpt_bc_xprt;
 536                 if (xprt) {
 537                         xprt_get(xprt);
 538                         return rpc_create_xprt(args, xprt);
 539                 }
 540         }


--
Chuck Lever




  reply	other threads:[~2021-08-19 14:34 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-19 12:29 [PATCH v1] SUNRPC: Ensure backchannel transports are marked connected Chuck Lever
2021-08-19 13:01 ` Trond Myklebust
2021-08-19 13:16   ` Chuck Lever III
2021-08-19 14:14     ` Trond Myklebust
2021-08-19 14:34       ` Chuck Lever III [this message]
2021-08-20  0:14         ` Trond Myklebust
2021-08-20 13:58           ` Chuck Lever III
2021-08-20 14:31             ` Trond Myklebust

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=24B4FABF-C6D5-4FF6-89B7-68EFDC3AB415@oracle.com \
    --to=chuck.lever@oracle.com \
    --cc=linux-nfs@vger.kernel.org \
    --cc=trondmy@hammerspace.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.