From: Trond Myklebust <trondmy@hammerspace.com>
To: "bfields@redhat.com" <bfields@redhat.com>,
	"chuck.lever@oracle.com" <chuck.lever@oracle.com>
Cc: "linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>
Subject: Re: [PATCH v1] SUNRPC: Ensure backchannel transports are marked connected
Date: Fri, 20 Aug 2021 14:31:37 +0000
Message-ID: <ae55e152e9b7f409eb2b8c92e9840d34c2d0b49b.camel@hammerspace.com>
In-Reply-To: <B5C44073-6391-4310-B89A-FCF31A5A5E22@oracle.com>

On Fri, 2021-08-20 at 13:58 +0000, Chuck Lever III wrote:
> 
> 
> > On Aug 19, 2021, at 8:14 PM, Trond Myklebust
> > <trondmy@hammerspace.com> wrote:
> > 
> > On Thu, 2021-08-19 at 14:34 +0000, Chuck Lever III wrote:
> > > 
> > > 
> > > > On Aug 19, 2021, at 10:14 AM, Trond Myklebust
> > > > <trondmy@hammerspace.com> wrote:
> > > > 
> > > > On Thu, 2021-08-19 at 13:16 +0000, Chuck Lever III wrote:
> > > > > 
> > > > > 
> > > > > > On Aug 19, 2021, at 9:01 AM, Trond Myklebust
> > > > > > <trondmy@hammerspace.com> wrote:
> > > > > > 
> > > > > > On Thu, 2021-08-19 at 08:29 -0400, Chuck Lever wrote:
> > > > > > > With NFSv4.1+ on RDMA, backchannel recovery appears not to work.
> > > > > > > 
> > > > > > > xprt_setup_xxx_bc() is invoked by the client's first CREATE_SESSION
> > > > > > > operation, and it always marks the rpc_clnt's transport as connected.
> > > > > > > 
> > > > > > > On a subsequent CREATE_SESSION, if rpc_create() is called and
> > > > > > > xpt_bc_xprt is populated, it might not be connected (for instance,
> > > > > > > if a backchannel fault has occurred). Ensure that code path returns
> > > > > > > a connected xprt also.
> > > > > > > 
> > > > > > > Reported-by: Timo Rothenpieler <timo@rothenpieler.org>
> > > > > > > Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> > > > > > > ---
> > > > > > >  net/sunrpc/clnt.c |    1 +
> > > > > > >  1 file changed, 1 insertion(+)
> > > > > > > 
> > > > > > > diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
> > > > > > > index 8b4de70e8ead..570480a649a3 100644
> > > > > > > --- a/net/sunrpc/clnt.c
> > > > > > > +++ b/net/sunrpc/clnt.c
> > > > > > > @@ -535,6 +535,7 @@ struct rpc_clnt *rpc_create(struct rpc_create_args *args)
> > > > > > >                 xprt = args->bc_xprt->xpt_bc_xprt;
> > > > > > >                 if (xprt) {
> > > > > > >                         xprt_get(xprt);
> > > > > > > +                       xprt_set_connected(xprt);
> > > > > > >                         return rpc_create_xprt(args, xprt);
> > > > > > >                 }
> > > > > > >         }
> > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > > No. This is wrong. If the connection got disconnected, then the
> > > > > > client needs to reconnect and build a new connection altogether.
> > > > > > We can't just pretend that the old connection still exists.
> > > > > 
> > > > > The patch description is not clear: the client has not disconnected.
> > > > > The forward channel is functioning properly, and the server has set
> > > > > SEQ4_STATUS_BACKCHANNEL_FAULT.
> > > > > 
> > > > > To recover, the client sends a DESTROY_SESSION / CREATE_SESSION pair
> > > > > on the existing connection. On the server, setup_callback_client()
> > > > > invokes rpc_create() again -- it's this step that is failing during
> > > > > the second CREATE_SESSION on a connection because the old xprt is
> > > > > returned but it's still marked disconnected.
> > > > 
> > > > How is that happening?
> > > 
> > > The RPC client's autodisconnect is firing. This was fixed in
> > > 6820bf77864d ("svcrdma: disable timeouts on rdma backchannel").
> > > 
> > > However we're still seeing cases where backchannel recovery is not
> > > working. See:
> > > 
> > > https://lore.kernel.org/linux-nfs/c3e31e57-15fa-8f70-bbb8-af5452c1f5fc@rothenpieler.org/T/#t
> > > 
> > > As an experiment, we've reverted 6820bf77864d to try to provoke the
> > > bug.
> > > 
> > > 
> > > > As far as I can tell, the only thing that can cause the backchannel
> > > > to be marked as closed is a call to svc_delete_xprt(), which
> > > > explicitly calls xprt->xpt_bc_xprt->ops->close(xprt->xpt_bc_xprt).
> > > > 
> > > > The server shouldn't be reusing anything from that svc_xprt after
> > > > that.
> > > 
> > > > > An alternative would be to ensure that setup_callback_client()
> > > > > always puts xpt_bc_xprt before it invokes rpc_create(). But it
> > > > > looked to me like rpc_create() already has a bunch of logic to
> > > > > deal with an existing xpt_bc_xprt.
> > > > 
> > > > The connection status is managed by the transport layer, not the
> > > > RPC client layer.
> > > 
> > > If it's a bug to mark a backchannel transport disconnected, then
> > > asserts should be added to capture the problem.
> > 
> > It isn't a bug to mark the transport as disconnected. It is a bug to
> > mark it as disconnected and then to continue using it as if it were
> > connected.
> 
> Are you suggesting the server's backchannel logic should
> look for -107 status after posting an RPC, and then recover
> somehow? Perhaps it could put xpt_bc_xprt, then set it to
> NULL, and then invoke setup_callback_client() again?
> 

What I'm saying is that nothing should be calling ->close() on the
struct rpc_xprt held by a struct svc_xprt without doing so through a
call to svc_delete_xprt(), and that nothing should be calling
svc_delete_xprt() without it being part of an effort to close the
svc_xprt.
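
To put it concretely, that teardown looks roughly like this (paraphrasing
net/sunrpc/svc_xprt.c from memory; exact details vary by kernel version):

	static void svc_delete_xprt(struct svc_xprt *xprt)
	{
		/* Tear the svc_xprt down exactly once */
		if (test_and_set_bit(XPT_DEAD, &xprt->xpt_flags))
			return;

		xprt->xpt_ops->xpo_detach(xprt);
		/* This is the one place that is supposed to close the
		 * backchannel rpc_xprt riding on this svc_xprt: */
		if (xprt->xpt_bc_xprt)
			xprt->xpt_bc_xprt->ops->close(xprt->xpt_bc_xprt);
		/* ... remainder of the teardown omitted ... */
	}

The backchannel close is tied to the death of the svc_xprt itself;
nothing else should be closing that rpc_xprt, or un-closing it, behind
the transport's back.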

> 
> > > What is the purpose of this code in rpc_create()?
> > > 
> > >  533         if (args->bc_xprt) {
> > >  534                 WARN_ON_ONCE(!(args->protocol & XPRT_TRANSPORT_BC));
> > >  535                 xprt = args->bc_xprt->xpt_bc_xprt;
> > >  536                 if (xprt) {
> > >  537                         xprt_get(xprt);
> > >  538                         return rpc_create_xprt(args, xprt);
> > >  539                 }
> > >  540         }
> > > 
> > 
> > Are you asking me or Bruce?
> 
> Fair enough, adding Bruce.
> 
> 
> > I'm not much of a fan of this code snippet either, because it doesn't
> > really appear to have much to do with the normal function of
> > rpc_create().
> > However as I understand it, the purpose is to create an instance of
> > struct rpc_clnt when presented with an existing instance of svc_xprt.
> > It does not currently modify that svc_xprt instance in any way that
> > breaks layering. All it does is take a reference to the xpt_bc_xprt
> > field.
> 
> I see that the initial addition of this logic was done by
> d531c008d7d9 ("NFSD/SUNRPC: Check rpc_xprt out of xs_setup_bc_tcp").
> It was later moved into rpc_create() by d50039ea5ee6 ("nfsd4/rpc:
> move backchannel create logic into rpc code").
> 
> Probably the essential purpose of this code is explained by a comment
> that was deleted by d531c008d7d9:
> 
> -       if (args->bc_xprt->xpt_bc_xprt) {
> -               /*
> -                * This server connection already has a backchannel
> -                * transport; we can't create a new one, as we wouldn't
> -                * be able to match replies based on xid any more.  So,
> -                * reuse the already-existing one:
> -                */
> -                return args->bc_xprt->xpt_bc_xprt;
> -       }
> 
> Is this still a sensible expectation?
> 

OK, let me clarify where my statements above are coming from, and
perhaps sharpen my language.

IMO, we need to distinguish between a closed _transport_ as understood
by the RPC layer, and a closed _channel_ as understood by the NFS
layer.

When the transport is closed at the RPC layer, it means that the
actual transport connection needs to be broken. Both the forward and
backward channels on that transport connection are affected, and the
client needs to re-establish a transport connection in order to resume
communication with the server.
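
At the RPC level, "connected" is nothing more than a bit in the
transport's state. From include/linux/sunrpc/xprt.h, approximately:

	static inline void xprt_set_connected(struct rpc_xprt *xprt)
	{
		set_bit(XPRT_CONNECTED, &xprt->state);
	}

	static inline int xprt_connected(struct rpc_xprt *xprt)
	{
		return test_bit(XPRT_CONNECTED, &xprt->state);
	}

Setting that bit establishes nothing; an actual reconnect has to go
through the transport's ->connect() method. That is why simply calling
xprt_set_connected() in rpc_create() only papers over the problem.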

Sessions and their channels are managed by the NFS layer, so if only
the back channel is 'closed', then that is a matter for the NFS layer
to deal with. It may decide that it needs to shut down the transport,
or not. However, if it does keep the transport open, then it needs to
manage the consequences should something come in over the 'closed'
back channel.
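
On the client side, for instance, my understanding is that a
SEQ4_STATUS_BACKCHANNEL_FAULT is handled entirely at the session level,
along these lines (a sketch of the fs/nfs/nfs4state.c logic, not exact
code):

	static void nfs41_handle_backchannel_fault(struct nfs_client *clp)
	{
		/* Rebuild the session (DESTROY_SESSION + CREATE_SESSION)
		 * instead of breaking the transport connection: */
		set_bit(NFS4CLNT_SESSION_RESET, &clp->cl_state);
		nfs4_schedule_state_manager(clp);
	}

The transport connection stays up throughout; only the session, and
with it the back channel, is torn down and rebuilt.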

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com


