From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtp-fw-33001.amazon.com ([207.171.190.10]:20648 "EHLO smtp-fw-33001.amazon.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S935410AbeEJQWK (ORCPT ); Thu, 10 May 2018 12:22:10 -0400 Date: Thu, 10 May 2018 16:22:02 +0000 From: Vallish Vaidyeshwara To: Trond Myklebust CC: "bfields@fieldses.org" , "anna.schumaker@netapp.com" , "linux-nfs@vger.kernel.org" , "jlayton@kernel.org" , "jsstraus@amazon.com" Subject: Re: [PATCH 2/2] SUNRPC: Reconnect with new port on server initiated connection termination Message-ID: <20180510162202.GA24317@amazon.com> References: <1525932774-98736-1-git-send-email-vallish@amazon.com> <1525932774-98736-3-git-send-email-vallish@amazon.com> MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" In-Reply-To: Sender: linux-nfs-owner@vger.kernel.org List-ID: On Thu, May 10, 2018 at 03:25:14PM +0000, Trond Myklebust wrote: > On Thu, 2018-05-10 at 06:12 +0000, Vallish Vaidyeshwara wrote: > > Server initiated socket close can corrupt connection state tracking > > table in conjunction with other network middle boxes. In situations > > like these, client connection hangs till connection state tracking > > table entries age out and get purged. Client reconnection with a new > > port in such a situation will avoid connection hang. > > > > Reviewed-by: Jacob Strauss > > Reviewed-by: Alakesh Haloi > > Signed-off-by: Vallish Vaidyeshwara > > --- > > net/sunrpc/xprtsock.c | 5 +++++ > > 1 file changed, 5 insertions(+) > > > > diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c > > index 5bf75b3..d293c8d 100644 > > --- a/net/sunrpc/xprtsock.c > > +++ b/net/sunrpc/xprtsock.c > > @@ -1629,6 +1629,8 @@ static void xs_tcp_state_change(struct sock > > *sk) > > /* The server initiated a shutdown of the socket */ > > xprt->connect_cookie++; > > clear_bit(XPRT_CONNECTED, &xprt->state); > > + /* Server sent FIN, reconnect with a new port */ > > + transport->srcport = 0; > > xs_tcp_force_close(xprt); > > /* fall through */ > > case TCP_CLOSING: > > @@ -1650,6 +1652,9 @@ static void xs_tcp_state_change(struct sock > > *sk) > > &transport->sock_state)) > > xprt_clear_connecting(xprt); > > clear_bit(XPRT_CLOSING, &xprt->state); > > + /* Server sent RST, reconnect with a new port */ > > + if (sk->sk_err == ECONNRESET) > > + transport->srcport = 0; > > if (sk->sk_err) > > xprt_wake_pending_tasks(xprt, -sk->sk_err); > > /* Trigger the socket release */ > > NACK. This will utterly break NFSv2, NFSv3 and NFSv4.0 duplicate replay > cache semantics. > > Cheers > Trond > -- > Trond Myklebust > Linux NFS client maintainer, Hammerspace > trond.myklebust@hammerspace.com Hello Trond, The first patch in this series is actually helping restore DRC behavior in cases like network partition where packets are dropped: [PATCH 1/2] SUNRPC: Need to reuse non-reserved port for reconnect Patch 1 starts reusing port in all cases. The second patch is still not breaking DRC semantics, to quote from source code: /** * xs_close - close a socket * @xprt: transport * * This is used when all requests are complete; ie, no DRC state remains * on the server we want to save. * * The caller _must_ be holding XPRT_LOCKED in order to avoid issues with * xs_reset_transport() zeroing the socket from underneath a writer. */ static void xs_close(struct rpc_xprt *xprt) If the server has closed a connection, then no DRC state remains on the server we want to use. The second patch is exploiting this semantics and using a new port in following 2 cases: a) RST from server implies the connection was torn down & no useful DRC exists on server b) FIN from server implies that server is shutting down the connection as part of close and no DRC state remains Please let me know if I have missed something obvious, I definitely do not want to break DRC as that is not the intention of this patch series. Is there a situation where server can close a connection and still keep DRC? Thanks. -Vallish