From: Chuck Lever <chuck.lever@oracle.com>
To: NeilBrown <neilb@suse.com>
Cc: Olga Kornievskaia <aglo@umich.edu>,
	Anna Schumaker <Anna.Schumaker@netapp.com>,
	Trond Myklebust <trondmy@hammerspace.com>,
	Linux NFS Mailing List <linux-nfs@vger.kernel.org>
Subject: Re: [PATCH 0/9] Multiple network connections for a single NFS mount.
Date: Fri, 31 May 2019 09:46:32 -0400	[thread overview]
Message-ID: <4316E30B-1BD7-4F0E-8375-03E9F85FFD2B@oracle.com> (raw)
In-Reply-To: <87muj3tuuk.fsf@notabene.neil.brown.name>



> On May 30, 2019, at 6:56 PM, NeilBrown <neilb@suse.com> wrote:
> 
> On Thu, May 30 2019, Chuck Lever wrote:
> 
>> Hi Neil-
>> 
>> Thanks for chasing this a little further.
>> 
>> 
>>> On May 29, 2019, at 8:41 PM, NeilBrown <neilb@suse.com> wrote:
>>> 
>>> This patch set is based on the patches in the multipath_tcp branch of
>>> git://git.linux-nfs.org/projects/trondmy/nfs-2.6.git
>>> 
>>> I'd like to add my voice to those supporting this work and wanting to
>>> see it land.
>>> We have had customers/partners wanting this sort of functionality for
>>> years.  In SLES releases prior to SLE15, we've provided a
>>> "nosharetransport" mount option, so that several filesystems could be
>>> mounted from the same server and each would get its own TCP
>>> connection.
>> 
>> Is it well understood why splitting up the TCP connections results
>> in better performance?
>> 
>> 
>>> In SLE15 we are using this 'nconnect' feature, which is much nicer.
>>> 
>>> Partners have assured us that it improves total throughput,
>>> particularly with bonded networks, but we haven't had any concrete
>>> data until Olga Kornievskaia provided some concrete test data - thanks
>>> Olga!
>>> 
>>> My understanding, as I explain in one of the patches, is that parallel
>>> hardware is normally utilized by distributing flows, rather than
>>> packets.  This avoids out-of-order delivery of packets in a flow.
>>> So multiple flows are needed to utilize parallel hardware.
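
To make the flow-distribution point concrete: receive-side scaling hashes each
packet's 4-tuple to pick a hardware queue, so every packet of one connection
lands on the same queue, while distinct connections can spread out. A rough
sketch, with an ordinary digest standing in for the Toeplitz hash real NICs
use, and a made-up queue count:

```python
import hashlib

NUM_QUEUES = 4  # hypothetical number of NIC receive queues

def rx_queue(src_ip, src_port, dst_ip, dst_port):
    """Pick a receive queue from a hash of the flow's 4-tuple."""
    key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}".encode()
    return hashlib.sha256(key).digest()[0] % NUM_QUEUES

# Every packet of a single connection hashes identically, so one flow
# can keep only one queue (one receive engine / CPU) busy ...
assert len({rx_queue("192.0.2.1", 50001, "198.51.100.7", 2049)
            for _ in range(100)}) == 1

# ... while multiple connections (distinct source ports) hash
# independently and can spread across queues.
queues = {rx_queue("192.0.2.1", port, "198.51.100.7", 2049)
          for port in range(50001, 50033)}
assert len(queues) > 1
```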
>> 
>> Indeed.
>> 
>> However I think one of the problems is what happens in simpler scenarios.
>> We had reports that using nconnect > 1 on virtual clients made things
>> go slower. It's not always wise to establish multiple connections
>> between the same two IP addresses. It depends on the hardware on each
>> end, and the network conditions.
> 
> This is a good argument for leaving the default at '1'.  When
> documentation is added to nfs(5), we can make it clear that the optimal
> number is dependent on hardware.

Is there any visibility into the NIC hardware that can guide this setting?
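
One readily available signal is the number of hardware queues the kernel
exposes under /sys/class/net/<ifname>/queues/. A sketch of reading it — the
path layout is the real sysfs one, but it is demonstrated here on a fabricated
tree so the sketch runs anywhere:

```python
import os
import tempfile
from pathlib import Path

def rx_queue_count(ifname, sysfs_root="/sys/class/net"):
    """Count the rx-* queue directories the kernel exposes for a NIC."""
    qdir = Path(sysfs_root) / ifname / "queues"
    if not qdir.is_dir():
        return None  # no such interface
    return sum(1 for entry in qdir.iterdir() if entry.name.startswith("rx-"))

# Fabricated sysfs layout; on a live system you would call
# rx_queue_count("eth0") directly, no privileges required.
root = tempfile.mkdtemp()
for q in ("rx-0", "rx-1", "rx-2", "rx-3", "tx-0"):
    os.makedirs(os.path.join(root, "eth0", "queues", q))
assert rx_queue_count("eth0", sysfs_root=root) == 4
```

`ethtool -l <iface>` reports the same information as channel counts, including
the hardware maximums.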


>> What about situations where the network capabilities between server and
>> client change? Problem is that neither endpoint can detect that; TCP
>> usually just deals with it.
> 
> Being able to manually change (-o remount) the number of connections
> might be useful...

Ugh. I have problems with the administrative interface for this feature,
and this is one of them.

Another is what prevents your client from using a different nconnect=
setting on concurrent mounts of the same server? It's another case of a
per-mount setting being used to control a resource that is shared across
mounts.

Adding user tunables has never been known to increase the aggregate
amount of happiness in the universe. I really hope we can come up with
a better administrative interface... ideally, none would be best.
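
A toy model of the sharing problem, assuming "first mount wins" semantics
(an illustration only, not necessarily what these patches implement):

```python
class TransportSwitch:
    """Stand-in for the shared pool of connections to one server."""
    def __init__(self, nconnect):
        self.nconnect = nconnect

switches = {}  # server address -> shared TransportSwitch

def mount(server, nconnect):
    """Create the server's transport switch on first mount; share it after."""
    return switches.setdefault(server, TransportSwitch(nconnect))

first = mount("server.example", nconnect=4)
second = mount("server.example", nconnect=8)     # asks for 8 ...
assert second is first and second.nconnect == 4  # ... silently gets 4
```

Whatever the real resolution is, the per-mount option ends up controlling a
per-server resource, and the second administrator has no indication that the
requested value was ignored.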


>> Related Work:
>> 
>> We now have a protocol (more like conventions) for clients to discover
>> when a server has additional endpoints so that it can establish
>> connections to each of them.
>> 
>> https://datatracker.ietf.org/doc/rfc8587/
>> 
>> and
>> 
>> https://datatracker.ietf.org/doc/draft-ietf-nfsv4-rfc5661-msns-update/
>> 
>> Boiled down, the client uses fs_locations and trunking detection to
>> figure out when two IP addresses are the same server instance.
>> 
>> This facility can also be used to establish a connection over a
>> different path if network connectivity is lost.
>> 
>> There has also been some exploration of MP-TCP. The magic happens
>> under the transport socket in the network layer, and the RPC client
>> is not involved.
> 
> I would think that SCTP would be the best protocol for NFS to use as it
> supports multi-streaming - several independent streams.  That would
> require hardware that understands it, of course.
> 
> Though I haven't examined MP-TCP closely, it looks like it is still fully
> sequenced, so it would be tricky for two RPC messages to be assembled
> into TCP frames completely independently - at least you would need
> synchronization on the sequence number.
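
That synchronization point can be seen in miniature: senders filling one fully
sequenced stream must at least serialize the allocation of sequence ranges. A
toy sketch, with threads standing in for independent RPC senders:

```python
import threading

seq_lock = threading.Lock()
next_seq = 0
placements = []  # (start_seq, length) recorded per message

def send(length):
    """Reserve a contiguous sequence range for one RPC message."""
    global next_seq
    with seq_lock:              # the unavoidable synchronization point
        start = next_seq
        next_seq += length
    placements.append((start, length))

threads = [threading.Thread(target=send, args=(100,)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# The reserved ranges tile the stream exactly: every sender had to
# coordinate on the one shared sequence counter.
assert sorted(s for s, _ in placements) == [i * 100 for i in range(8)]
```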
> 
> Thanks for your thoughts,
> NeilBrown
> 
> 
>> 
>> 
>>> Comments most welcome.  I'd love to see this, or something similar,
>>> merged.
>>> 
>>> Thanks,
>>> NeilBrown
>>> 
>>> ---
>>> 
>>> NeilBrown (4):
>>>     NFS: send state management on a single connection.
>>>     SUNRPC: enhance rpc_clnt_show_stats() to report on all xprts.
>>>     SUNRPC: add links for all client xprts to debugfs
>>>     NFS: Allow multiple connections to a NFSv2 or NFSv3 server
>>> 
>>> Trond Myklebust (5):
>>>     SUNRPC: Add basic load balancing to the transport switch
>>>     SUNRPC: Allow creation of RPC clients with multiple connections
>>>     NFS: Add a mount option to specify number of TCP connections to use
>>>     NFSv4: Allow multiple connections to NFSv4.x servers
>>>     pNFS: Allow multiple connections to the DS
>>> 
>>> 
>>> fs/nfs/client.c                      |    3 +
>>> fs/nfs/internal.h                    |    2 +
>>> fs/nfs/nfs3client.c                  |    1 
>>> fs/nfs/nfs4client.c                  |   13 ++++-
>>> fs/nfs/nfs4proc.c                    |   22 +++++---
>>> fs/nfs/super.c                       |   12 ++++
>>> include/linux/nfs_fs_sb.h            |    1 
>>> include/linux/sunrpc/clnt.h          |    1 
>>> include/linux/sunrpc/sched.h         |    1 
>>> include/linux/sunrpc/xprt.h          |    1 
>>> include/linux/sunrpc/xprtmultipath.h |    2 +
>>> net/sunrpc/clnt.c                    |   98 ++++++++++++++++++++++++++++++++--
>>> net/sunrpc/debugfs.c                 |   46 ++++++++++------
>>> net/sunrpc/sched.c                   |    3 +
>>> net/sunrpc/stats.c                   |   15 +++--
>>> net/sunrpc/sunrpc.h                  |    3 +
>>> net/sunrpc/xprtmultipath.c           |   23 +++++++-
>>> 17 files changed, 204 insertions(+), 43 deletions(-)
>>> 
>>> --
>>> Signature
>>> 
>> 
>> --
>> Chuck Lever

--
Chuck Lever



