From: Trond Myklebust <trondmy@hammerspace.com>
To: "tom@talpey.com" <tom@talpey.com>,
	"rmacklem@uoguelph.ca" <rmacklem@uoguelph.ca>,
	"aglo@umich.edu" <aglo@umich.edu>,
	"neilb@suse.com" <neilb@suse.com>
Cc: "linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>,
	"Anna.Schumaker@netapp.com" <Anna.Schumaker@netapp.com>,
	"chuck.lever@oracle.com" <chuck.lever@oracle.com>
Subject: Re: [PATCH 0/9] Multiple network connections for a single NFS mount.
Date: Fri, 31 May 2019 13:33:31 +0000
Message-ID: <be61bb21f6a208ef01b5a5addeb42d17c52c621a.camel@hammerspace.com>
In-Reply-To: <4031093f-2044-da8b-9ba7-7b2a2000847c@talpey.com>

On Fri, 2019-05-31 at 08:36 -0400, Tom Talpey wrote:
> On 5/30/2019 10:20 PM, Rick Macklem wrote:
> > NeilBrown wrote:
> > > On Thu, May 30 2019, Rick Macklem wrote:
> > > 
> > > > Olga Kornievskaia wrote:
> > > > > On Thu, May 30, 2019 at 1:05 PM Tom Talpey <tom@talpey.com> wrote:
> > > > > > On 5/29/2019 8:41 PM, NeilBrown wrote:
> > > > > > > I've also re-arranged the patches a bit, merged two, and
> > > > > > > removed the restriction to TCP and NFSv4.x, x>=1.
> > > > > > > Discussions seemed to suggest these restrictions were not
> > > > > > > needed, and I can see no need for them.
> > > > > > 
> > > > > > I believe the need is for the correctness of retries.
> > > > > > Because NFSv2, NFSv3 and NFSv4.0 have no exactly-once
> > > > > > semantics of their own, server duplicate request caches are
> > > > > > important (although often imperfect).  These caches use
> > > > > > client XIDs, source ports and addresses, sometimes in
> > > > > > addition to other methods, to detect a retry.  Existing
> > > > > > clients are careful to reconnect with the same source port
> > > > > > to ensure this, and existing servers won't change.
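
(For concreteness, the kind of per-request key such a cache matches on
is roughly the sketch below.  The struct and field names are invented
for this illustration and are not taken from any real server; some
servers also add a checksum over the request body.)

    #include <stdint.h>
    #include <netinet/in.h>

    /* Illustrative only: a traditional DRC entry key. */
    struct drc_key {
            uint32_t        xid;          /* RPC transaction id from the client */
            struct in_addr  client_addr;  /* source IP address */
            uint16_t        client_port;  /* source TCP/UDP port */
            uint32_t        prog;         /* RPC program number */
            uint32_t        vers;         /* program version */
            uint32_t        proc;         /* procedure number */
    };
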
> > > > > 
> > > > > Retries are already bound to the same connection so there
> > > > > shouldn't be an issue of a retransmission coming from a
> > > > > different source port.
> > > > I don't think the above is correct for NFSv4.0 (it may very
> > > > well be true for NFSv3).
> > > 
> > > It is correct for the Linux implementation of NFS, though the term
> > > "xprt" is more accurate than "connection".
> > > 
> > > A "task" is bound to a specific "xprt" which, in the case of TCP,
> > > has a fixed source port.  If the TCP connection breaks, a new one
> > > is created with the same addresses and ports, and this new
> > > connection serves the same xprt.
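
(In userspace terms, the reconnect pattern Neil describes looks roughly
like the sketch below.  This is illustrative only; the real logic lives
in the kernel's SUNRPC socket transport and handles address families,
errors and backoff that are omitted here.)

    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <stdint.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    static int reconnect_same_port(const struct sockaddr_in *server,
                                   uint16_t saved_src_port)
    {
            struct sockaddr_in local;
            int one = 1;
            int fd = socket(AF_INET, SOCK_STREAM, 0);

            if (fd < 0)
                    return -1;

            /* Ask the stack to let us rebind the port the previous
             * connection used, even if that connection lingers. */
            setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one));

            memset(&local, 0, sizeof(local));
            local.sin_family = AF_INET;
            local.sin_addr.s_addr = htonl(INADDR_ANY);
            local.sin_port = htons(saved_src_port);

            /* Bind to the remembered source port, then reconnect. */
            if (bind(fd, (const struct sockaddr *)&local, sizeof(local)) < 0 ||
                connect(fd, (const struct sockaddr *)server, sizeof(*server)) < 0) {
                    close(fd);
                    return -1;
            }
            return fd;
    }

Whether that bind/connect succeeds while the old connection is still in
a FIN_WAIT/TIME_WAIT state depends on the TCP stack, which is exactly
the concern raised below.
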
> > Ok, that's interesting.  The FreeBSD client-side krpc uses "xprt"s
> > too (I assume they came from some old Sun open sources for RPC),
> > but it just creates a new socket and binds it to any available
> > port#.  When this happens in the FreeBSD client, the old connection
> > is sometimes still sitting around in some FIN_WAIT state.  My TCP
> > knowledge is pretty minimal, but I didn't think you could safely
> > create a new connection using the same port#s at that point, or at
> > least the old BSD TCP stack code won't allow it.
> > 
> > Anyhow, the FreeBSD client doesn't use the same source port# for
> > the new connection.
> > 
> > > > Here's what RFC 7530 Sec. 3.1.1 says:
> > > > 3.1.1.  Client Retransmission Behavior
> > > > 
> > > >     When processing an NFSv4 request received over a reliable
> > > >     transport such as TCP, the NFSv4 server MUST NOT silently
> > > >     drop the request, except if the established transport
> > > >     connection has been broken.  Given such a contract between
> > > >     NFSv4 clients and servers, clients MUST NOT retry a request
> > > >     unless one or both of the following are true:
> > > > 
> > > >     o  The transport connection has been broken
> > > > 
> > > >     o  The procedure being retried is the NULL procedure
> > > > 
> > > > If the transport connection is broken, the retry needs to be
> > > > done on a new TCP connection, does it not?  (I'm assuming you
> > > > are referring to a retry of an RPC here.)
> > > > (My interpretation of "broken" is "can't be fixed, so the client
> > > > must use a different TCP connection.")
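
(That rule reduces to a very small predicate for an NFSv4.0 client over
a reliable transport.  The sketch below simply restates the two
conditions from the RFC text quoted above; it is not anyone's
implementation.)

    #include <stdbool.h>

    /* RFC 7530, Section 3.1.1: over a reliable transport an NFSv4.0
     * client may retransmit only if the connection broke or the call
     * being retried is the NULL procedure. */
    static bool nfs40_may_retransmit(bool connection_broken, bool is_null_proc)
    {
            return connection_broken || is_null_proc;
    }
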
> > > 
> > > Yes, a new connection.  But the Linux client makes sure to use
> > > the same source port.
> > Ok. I guess my DRC code that expects "different source port#" for
> > NFSv4.0 is broken.  It will result in a DRC miss, which isn't
> > great, but is always possible for any DRC design.  (Not nearly as
> > bad as a false hit.)
> > 
> > > > Also, NFSv4.0 cannot use Sun RPC over UDP, whereas some DRCs
> > > > only work for UDP traffic.  (The FreeBSD server does have DRC
> > > > support for TCP, but the algorithm is very different from what
> > > > is used for UDP, due to the long delay before a retried RPC
> > > > request is received.  This can result in significant server
> > > > overheads, so some sites choose to disable the DRC for TCP
> > > > traffic or tune it in such a way that it becomes almost
> > > > useless.)
> > > > The FreeBSD DRC code for NFS over TCP expects the retry to be
> > > > from a different port# (due to a new connection re: the above)
> > > > for NFSv4.0.  For NFSv3, my best recollection is that it doesn't
> > > > care what the source port# is.  (It basically uses a hash on the
> > > > RPC request, excluding the TCP/IP header, to recognize possible
> > > > duplicates.)
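
(A sketch of that style of port-agnostic matching, with made-up type
and helper names rather than actual FreeBSD code: the entry still
requires a matching xid, plus a checksum over the RPC message itself,
so the source port never enters into the comparison.)

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Purely illustrative checksum (FNV-1a); a real server would use
     * its own hash over the RPC call body. */
    static uint32_t hash_rpc_body(const void *p, size_t len)
    {
            const unsigned char *b = p;
            uint32_t h = 2166136261u;

            while (len--)
                    h = (h ^ *b++) * 16777619u;
            return h;
    }

    struct drc_tcp_entry {
            uint32_t xid;        /* still must match */
            uint32_t body_csum;  /* hash of the RPC call, not the TCP/IP headers */
    };

    static bool drc_tcp_match(const struct drc_tcp_entry *e, uint32_t xid,
                              const void *rpc_body, size_t len)
    {
            return e->xid == xid &&
                   e->body_csum == hash_rpc_body(rpc_body, len);
    }
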
> > > 
> > > Interesting .... hopefully the hash is sufficiently strong.
> > It doesn't just use the hash (it still expects the same xid,
> > etc.); it just doesn't use the TCP source port#.
> > 
> > To be honest, when I played with this many years ago, unless the
> > size of the DRC is very large and entries persist in the cache for
> > a long time, the entries always fall out of the cache before the
> > retry happens over TCP.  At least that was so for the cases I tried
> > back then, where the RPC retry timeout for TCP was pretty large.
> > (Sites that use FreeBSD servers under heavy load usually find the
> > DRC grows too large and tune it down until it would no longer work
> > for TCP anyhow.)
> > 
> > My position is that this all got fixed by sessions, and if someone
> > uses NFSv4.0 instead of NFSv4.1, they may just have to live with
> > the limitations of no "exactly once" semantics.  (Personally, I
> > think NFSv4.0 should just be deprecated.  I know people still have
> > good uses for NFSv3, but I have trouble believing NFSv4.0 is
> > preferred over NFSv4.1, although Bruce did note a case where there
> > was a performance difference.)
> > 
> > > I think it is best to assume same source port, but there is no
> > > formal standard.
> > I'd say you can't assume "same port#" or "different port#", since
> > there is no standard.  But I would agree that assuming "same port#"
> > will just result in false misses for clients that don't use the
> > same port#.
> 
> Hey Rick.  I think the best summary is to say the traditional DRC is
> deeply flawed and can't fully protect against this.  Many of us, you
> and I included, have tried various ways to fix this, with varying
> degrees of success.
> 
> My point here is not perfection, however.  My point is that there are
> servers out there which will behave quite differently in the face of
> this proposed client behavior, and I'm raising the issue.

Tom, this set of patches does _not_ change client behaviour w.r.t.
replays in any way compared to before. I deliberately designed it not
to.

As others have already explained, the design does not change the
behaviour of reusing the same port when reconnecting any given xprt.
The client reuses exactly the same code that is currently used (where
there is only one xprt) to ensure that we first try to bind to the same
port we used before the connection was broken.

Furthermore, there is never a case where the client deliberately tries
to break the connection when there are outstanding RPC requests
(including when replaying NFSv2/v3 requests).  Requests are always
replayed on the same xprt on which they were originally sent, because
the purpose of this patchset has not been to provide fail-over
redundancy, but to attempt to improve performance in the case where the
server is responsive and able to scale.
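
To make that "requests stay on their original xprt" point concrete, the
per-task transport selection amounts to something like the sketch
below.  The type and field names are invented for this illustration;
this is not the actual SUNRPC code:

    /* A new task picks a transport round-robin on its first transmission;
     * any retransmission of that task stays on the transport it picked. */
    struct sketch_xprt {
            int id;                      /* stands in for connection state */
    };

    struct sketch_clnt {
            struct sketch_xprt *xprts;   /* the nconnect transports */
            unsigned int        nr_xprts;
            unsigned int        next;    /* round-robin cursor */
    };

    struct sketch_task {
            struct sketch_xprt *xprt;    /* NULL until first transmission */
    };

    static void sketch_task_bind_xprt(struct sketch_clnt *clnt,
                                      struct sketch_task *task)
    {
            if (task->xprt == NULL)
                    task->xprt = &clnt->xprts[clnt->next++ % clnt->nr_xprts];
            /* else: a retransmission reuses task->xprt unchanged */
    }

The retransmit path never reassigns the sketch's task->xprt, which is
why the DRC considerations above are unchanged by this series.
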
Any TCP connection breakage happens from the server side (or from the
network itself), meaning that TIME_WAIT states are generally not a
problem. Any other issues with TCP reconnection are common to both the
existing code and the new code.

When we add dynamic management of the number of xprts per client (and
yes, I do still want to do that) then there will be DRC replay issues
with NFSv2/v3/v4.0 if we start removing xprts which have active
requests associated with them, so that needs to be done with care.
However, the current patchset does not do dynamic management, so that
point is moot for now (using the word "moot" in the American, and not
the British, sense).

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com


