* [RFC PATCH 0/5] Fun with the multipathing code
@ 2017-04-28 17:25 Trond Myklebust
  2017-04-28 17:25 ` [RFC PATCH 1/5] SUNRPC: Allow creation of RPC clients with multiple connections Trond Myklebust
  ` (3 more replies)
  0 siblings, 4 replies; 23+ messages in thread
From: Trond Myklebust @ 2017-04-28 17:25 UTC (permalink / raw)
  To: linux-nfs

In the spirit of experimentation, I've put together a set of patches that
implement setting up multiple TCP connections to the server. The connections
all go to the same server IP address, so they do not provide support for
multiple IP addresses (which I believe is something Andy Adamson is working
on).

The feature is only enabled for NFSv4.1 and NFSv4.2 for now; I don't feel
comfortable subjecting the NFSv3/v4 replay caches to this treatment yet. It
relies on the mount option "nconnect" to specify the number of connections
to set up. So you can do something like

  mount -t nfs -o vers=4.1,nconnect=8 foo:/bar /mnt

to set up 8 TCP connections to server 'foo'.

Anyhow, feel free to test and give me feedback as to whether or not this
helps performance on your system.

Trond Myklebust (5):
  SUNRPC: Allow creation of RPC clients with multiple connections
  NFS: Add a mount option to specify number of TCP connections to use
  NFSv4: Allow multiple connections to NFSv4.x (x>0) servers
  pNFS: Allow multiple connections to the DS
  NFS: Display the "nconnect" mount option if it is set.

 fs/nfs/client.c             |  2 ++
 fs/nfs/internal.h           |  2 ++
 fs/nfs/nfs3client.c         |  3 +++
 fs/nfs/nfs4client.c         | 13 +++++++++++--
 fs/nfs/super.c              | 12 ++++++++++++
 include/linux/nfs_fs_sb.h   |  1 +
 include/linux/sunrpc/clnt.h |  1 +
 net/sunrpc/clnt.c           | 17 ++++++++++++++++-
 net/sunrpc/xprtmultipath.c  |  3 +--
 9 files changed, 49 insertions(+), 5 deletions(-)

-- 
2.9.3
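To make the idea concrete, here is a minimal userspace sketch of how RPC traffic could be spread round-robin over the nconnect transports once they are all added to a transport switch. This is an illustration only: `xprt_switch` and `pick_next_xprt` are invented names, not the kernel's `rpc_xprt_switch` API.

```c
/* Hypothetical sketch of round-robin transport selection over
 * nconnect connections. Names are illustrative, not kernel API. */
#include <assert.h>

struct xprt_switch {
	unsigned int nxprts;	/* number of connections (nconnect) */
	unsigned int cursor;	/* next transport to hand out */
};

/* Return the index of the transport the next RPC task should use:
 * a simple modulo walk over the available connections. */
static unsigned int pick_next_xprt(struct xprt_switch *xps)
{
	return xps->cursor++ % xps->nxprts;
}
```

With nconnect=8, sixteen consecutive RPCs would land two on each connection under this policy.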
* [RFC PATCH 1/5] SUNRPC: Allow creation of RPC clients with multiple connections
  2017-04-28 17:25 [RFC PATCH 0/5] Fun with the multipathing code Trond Myklebust
@ 2017-04-28 17:25 ` Trond Myklebust
  2017-04-28 17:25   ` [RFC PATCH 2/5] NFS: Add a mount option to specify number of TCP connections to use Trond Myklebust
  2017-04-28 17:45   ` [RFC PATCH 0/5] Fun with the multipathing code Chuck Lever
  ` (2 subsequent siblings)
  3 siblings, 1 reply; 23+ messages in thread
From: Trond Myklebust @ 2017-04-28 17:25 UTC (permalink / raw)
  To: linux-nfs

Add an argument to struct rpc_create_args that allows the specification
of how many transport connections you want to set up to the server.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
---
 include/linux/sunrpc/clnt.h |  1 +
 net/sunrpc/clnt.c           | 17 ++++++++++++++++-
 net/sunrpc/xprtmultipath.c  |  3 +--
 3 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/include/linux/sunrpc/clnt.h b/include/linux/sunrpc/clnt.h
index 6095ecba0dde..8c3cb38a385b 100644
--- a/include/linux/sunrpc/clnt.h
+++ b/include/linux/sunrpc/clnt.h
@@ -120,6 +120,7 @@ struct rpc_create_args {
 	u32			prognumber;	/* overrides program->number */
 	u32			version;
 	rpc_authflavor_t	authflavor;
+	u32			nconnect;
 	unsigned long		flags;
 	char			*client_name;
 	struct svc_xprt		*bc_xprt;	/* NFSv4.1 backchannel */
diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index 673046c64e48..0ff97288b43f 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -522,6 +522,8 @@ struct rpc_clnt *rpc_create(struct rpc_create_args *args)
 		.bc_xprt = args->bc_xprt,
 	};
 	char servername[48];
+	struct rpc_clnt *clnt;
+	int i;
 
 	if (args->bc_xprt) {
 		WARN_ON_ONCE(!(args->protocol & XPRT_TRANSPORT_BC));
@@ -584,7 +586,15 @@ struct rpc_clnt *rpc_create(struct rpc_create_args *args)
 	if (args->flags & RPC_CLNT_CREATE_NONPRIVPORT)
 		xprt->resvport = 0;
 
-	return rpc_create_xprt(args, xprt);
+	clnt = rpc_create_xprt(args, xprt);
+	if (IS_ERR(clnt) || args->nconnect <= 1)
+		return clnt;
+
+	for (i = 0; i < args->nconnect - 1; i++) {
+		if (rpc_clnt_add_xprt(clnt, &xprtargs, NULL, NULL) < 0)
+			break;
+	}
+	return clnt;
 }
 EXPORT_SYMBOL_GPL(rpc_create);
 
@@ -2605,6 +2615,10 @@ int rpc_clnt_test_and_add_xprt(struct rpc_clnt *clnt,
 		return -ENOMEM;
 	data->xps = xprt_switch_get(xps);
 	data->xprt = xprt_get(xprt);
+	if (rpc_xprt_switch_has_addr(data->xps, (struct sockaddr *)&xprt->addr)) {
+		rpc_cb_add_xprt_release(data);
+		goto success;
+	}
 
 	cred = authnull_ops.lookup_cred(NULL, NULL, 0);
 	task = rpc_call_null_helper(clnt, xprt, cred,
@@ -2614,6 +2628,7 @@ int rpc_clnt_test_and_add_xprt(struct rpc_clnt *clnt,
 	if (IS_ERR(task))
 		return PTR_ERR(task);
 	rpc_put_task(task);
+success:
 	return 1;
 }
 EXPORT_SYMBOL_GPL(rpc_clnt_test_and_add_xprt);
diff --git a/net/sunrpc/xprtmultipath.c b/net/sunrpc/xprtmultipath.c
index 95064d510ce6..486819d0c58b 100644
--- a/net/sunrpc/xprtmultipath.c
+++ b/net/sunrpc/xprtmultipath.c
@@ -51,8 +51,7 @@ void rpc_xprt_switch_add_xprt(struct rpc_xprt_switch *xps,
 	if (xprt == NULL)
 		return;
 	spin_lock(&xps->xps_lock);
-	if ((xps->xps_net == xprt->xprt_net || xps->xps_net == NULL) &&
-	    !rpc_xprt_switch_has_addr(xps, (struct sockaddr *)&xprt->addr))
+	if (xps->xps_net == xprt->xprt_net || xps->xps_net == NULL)
 		xprt_switch_add_xprt_locked(xps, xprt);
 	spin_unlock(&xps->xps_lock);
 }
-- 
2.9.3
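The error-handling policy in the rpc_create() hunk above is worth noting: the first transport must succeed, while the nconnect-1 extra connections are best-effort and failures merely stop the loop. A standalone userspace sketch of that policy (stand-in names, not the SUNRPC functions themselves):

```c
/* Sketch of the "first connection is mandatory, extras are best-effort"
 * policy in rpc_create() above. create_first()/add_xprt()/setup() are
 * hypothetical stand-ins for rpc_create_xprt()/rpc_clnt_add_xprt(). */
#include <assert.h>
#include <stddef.h>

struct clnt { int nxprts; };

static struct clnt *create_first(struct clnt *c)
{
	c->nxprts = 1;		/* primary transport always succeeds here */
	return c;
}

/* Simulate an extra connection attempt that fails once the client
 * already holds @fail_after transports. */
static int add_xprt(struct clnt *c, int fail_after)
{
	if (c->nxprts >= fail_after)
		return -1;
	c->nxprts++;
	return 0;
}

/* Mirror of the loop: ask for @nconnect transports, stop quietly on
 * the first failure, and report how many were actually set up. */
static int setup(struct clnt *c, int nconnect, int fail_after)
{
	int i;

	if (!create_first(c))
		return -1;	/* a failed first connection is fatal */
	for (i = 0; i < nconnect - 1; i++)
		if (add_xprt(c, fail_after) < 0)
			break;	/* extras are best-effort */
	return c->nxprts;
}
```

So a mount with nconnect=8 that can only establish three connections still comes up, just with fewer transports than requested.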
* [RFC PATCH 2/5] NFS: Add a mount option to specify number of TCP connections to use
  2017-04-28 17:25 ` [RFC PATCH 1/5] SUNRPC: Allow creation of RPC clients with multiple connections Trond Myklebust
@ 2017-04-28 17:25   ` Trond Myklebust
  2017-04-28 17:25     ` [RFC PATCH 3/5] NFSv4: Allow multiple connections to NFSv4.x (x>0) servers Trond Myklebust
  2017-05-04 13:45     ` [RFC PATCH 2/5] NFS: Add a mount option to specify number of TCP connections to use Chuck Lever
  0 siblings, 2 replies; 23+ messages in thread
From: Trond Myklebust @ 2017-04-28 17:25 UTC (permalink / raw)
  To: linux-nfs

Allow the user to specify that the client should use multiple connections
to the server. For the moment, this functionality will be limited to
TCP and to NFSv4.x (x>0).

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
---
 fs/nfs/internal.h |  1 +
 fs/nfs/super.c    | 10 ++++++++++
 2 files changed, 11 insertions(+)

diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 31b26cf1b476..31757a742e9b 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -117,6 +117,7 @@ struct nfs_parsed_mount_data {
 		char			*export_path;
 		int			port;
 		unsigned short		protocol;
+		unsigned short		nconnect;
 	} nfs_server;
 
 	struct security_mnt_opts lsm_opts;
diff --git a/fs/nfs/super.c b/fs/nfs/super.c
index 54e0f9f2dd94..7eb48934dc79 100644
--- a/fs/nfs/super.c
+++ b/fs/nfs/super.c
@@ -76,6 +76,8 @@
 #define NFS_DEFAULT_VERSION 2
 #endif
 
+#define NFS_MAX_CONNECTIONS 16
+
 enum {
 	/* Mount options that take no arguments */
 	Opt_soft, Opt_hard,
@@ -107,6 +109,7 @@ enum {
 	Opt_nfsvers,
 	Opt_sec, Opt_proto, Opt_mountproto, Opt_mounthost,
 	Opt_addr, Opt_mountaddr, Opt_clientaddr,
+	Opt_nconnect,
 	Opt_lookupcache,
 	Opt_fscache_uniq,
 	Opt_local_lock,
@@ -179,6 +182,8 @@ static const match_table_t nfs_mount_option_tokens = {
 	{ Opt_mounthost, "mounthost=%s" },
 	{ Opt_mountaddr, "mountaddr=%s" },
 
+	{ Opt_nconnect, "nconnect=%s" },
+
 	{ Opt_lookupcache, "lookupcache=%s" },
 	{ Opt_fscache_uniq, "fsc=%s" },
 	{ Opt_local_lock, "local_lock=%s" },
@@ -1544,6 +1549,11 @@ static int nfs_parse_mount_options(char *raw,
 			if (mnt->mount_server.addrlen == 0)
 				goto out_invalid_address;
 			break;
+		case Opt_nconnect:
+			if (nfs_get_option_ul_bound(args, &option, 1, NFS_MAX_CONNECTIONS))
+				goto out_invalid_value;
+			mnt->nfs_server.nconnect = option;
+			break;
 		case Opt_lookupcache:
 			string = match_strdup(args);
 			if (string == NULL)
-- 
2.9.3
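The Opt_nconnect case above rejects anything outside 1..NFS_MAX_CONNECTIONS. A self-contained userspace sketch of that bounded parse (the `parse_nconnect` helper is invented for illustration; it is not the kernel's `nfs_get_option_ul_bound`):

```c
/* Sketch of the bounded numeric parse performed for nconnect=%s:
 * accept 1..NFS_MAX_CONNECTIONS, reject junk and out-of-range values.
 * parse_nconnect() is a hypothetical stand-in, not the kernel helper. */
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define NFS_MAX_CONNECTIONS 16

/* Return 0 and store the value, or -EINVAL on bad input. */
static int parse_nconnect(const char *arg, unsigned long *out)
{
	char *end;
	unsigned long v;

	errno = 0;
	v = strtoul(arg, &end, 10);
	if (errno || end == arg || *end != '\0')
		return -EINVAL;		/* not a clean decimal number */
	if (v < 1 || v > NFS_MAX_CONNECTIONS)
		return -EINVAL;		/* out of the accepted range */
	*out = v;
	return 0;
}
```

So "nconnect=0", "nconnect=17", or "nconnect=abc" would all fail the mount with an invalid-value error, mirroring the `goto out_invalid_value` path.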
* [RFC PATCH 3/5] NFSv4: Allow multiple connections to NFSv4.x (x>0) servers
  2017-04-28 17:25   ` [RFC PATCH 2/5] NFS: Add a mount option to specify number of TCP connections to use Trond Myklebust
@ 2017-04-28 17:25     ` Trond Myklebust
  2017-04-28 17:25       ` [RFC PATCH 4/5] pNFS: Allow multiple connections to the DS Trond Myklebust
  2017-05-04 13:45       ` [RFC PATCH 2/5] NFS: Add a mount option to specify number of TCP connections to use Chuck Lever
  1 sibling, 1 reply; 23+ messages in thread
From: Trond Myklebust @ 2017-04-28 17:25 UTC (permalink / raw)
  To: linux-nfs

If the user specifies the -o nconnect=<number> mount option, and the
transport protocol is TCP, then set up <number> connections to the
server. The connections will all go to the same IP address.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
---
 fs/nfs/client.c           |  2 ++
 fs/nfs/internal.h         |  1 +
 fs/nfs/nfs4client.c       | 10 ++++++++--
 include/linux/nfs_fs_sb.h |  1 +
 4 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index e0302101e18a..c5b0f3e270a3 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -180,6 +180,7 @@ struct nfs_client *nfs_alloc_client(const struct nfs_client_initdata *cl_init)
 	clp->cl_rpcclient = ERR_PTR(-EINVAL);
 
 	clp->cl_proto = cl_init->proto;
+	clp->cl_nconnect = cl_init->nconnect;
 	clp->cl_net = get_net(cl_init->net);
 
 	cred = rpc_lookup_machine_cred("*");
@@ -488,6 +489,7 @@ int nfs_create_rpc_client(struct nfs_client *clp,
 	struct rpc_create_args args = {
 		.net		= clp->cl_net,
 		.protocol	= clp->cl_proto,
+		.nconnect	= clp->cl_nconnect,
 		.address	= (struct sockaddr *)&clp->cl_addr,
 		.addrsize	= clp->cl_addrlen,
 		.timeout	= cl_init->timeparms,
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 31757a742e9b..abe5d3934eaf 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -77,6 +77,7 @@ struct nfs_client_initdata {
 	struct nfs_subversion *nfs_mod;
 	int proto;
 	u32 minorversion;
+	unsigned int nconnect;
 	struct net *net;
 	const struct rpc_timeout *timeparms;
 };
diff --git a/fs/nfs/nfs4client.c b/fs/nfs/nfs4client.c
index 692a7a8bfc7a..c9b10b7829f0 100644
--- a/fs/nfs/nfs4client.c
+++ b/fs/nfs/nfs4client.c
@@ -834,7 +834,8 @@ static int nfs4_set_client(struct nfs_server *server,
 		const size_t addrlen,
 		const char *ip_addr, int proto,
 		const struct rpc_timeout *timeparms,
-		u32 minorversion, struct net *net)
+		u32 minorversion, unsigned int nconnect,
+		struct net *net)
 {
 	struct nfs_client_initdata cl_init = {
 		.hostname = hostname,
@@ -849,6 +850,8 @@ static int nfs4_set_client(struct nfs_server *server,
 	};
 	struct nfs_client *clp;
 
+	if (minorversion > 0 && proto == XPRT_TRANSPORT_TCP)
+		cl_init.nconnect = nconnect;
 	if (server->flags & NFS_MOUNT_NORESVPORT)
 		set_bit(NFS_CS_NORESVPORT, &cl_init.init_flags);
 	if (server->options & NFS_OPTION_MIGRATION)
@@ -1040,6 +1043,7 @@ static int nfs4_init_server(struct nfs_server *server,
 			data->nfs_server.protocol,
 			&timeparms,
 			data->minorversion,
+			data->nfs_server.nconnect,
 			data->net);
 	if (error < 0)
 		return error;
@@ -1124,6 +1128,7 @@ struct nfs_server *nfs4_create_referral_server(struct nfs_clone_mount *data,
 				rpc_protocol(parent_server->client),
 				parent_server->client->cl_timeout,
 				parent_client->cl_mvops->minor_version,
+				parent_client->cl_nconnect,
 				parent_client->cl_net);
 	if (error < 0)
 		goto error;
@@ -1215,7 +1220,8 @@ int nfs4_update_server(struct nfs_server *server, const char *hostname,
 	nfs_server_remove_lists(server);
 	error = nfs4_set_client(server, hostname, sap, salen, buf,
 				clp->cl_proto, clnt->cl_timeout,
-				clp->cl_minorversion, net);
+				clp->cl_minorversion,
+				clp->cl_nconnect, net);
 	nfs_put_client(clp);
 	if (error != 0) {
 		nfs_server_insert_lists(server);
diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
index 2a70f34dffe8..b7e6b94d1246 100644
--- a/include/linux/nfs_fs_sb.h
+++ b/include/linux/nfs_fs_sb.h
@@ -55,6 +55,7 @@ struct nfs_client {
 	struct nfs_subversion *	cl_nfs_mod;	/* pointer to nfs version module */
 
 	u32			cl_minorversion;/* NFSv4 minorversion */
+	unsigned int		cl_nconnect;	/* Number of connections */
 	struct rpc_cred		*cl_machine_cred;
 
 #if IS_ENABLED(CONFIG_NFS_V4)
-- 
2.9.3
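The gate added in nfs4_set_client() above can be summarized as a single predicate: multiple connections are requested only for NFSv4.1+ over TCP, and everything else falls back to one connection. A userspace sketch (the function name and the transport constants are copied here for a standalone build; treat them as illustrative):

```c
/* Sketch of the nconnect gate in nfs4_set_client(): honour the option
 * only for minorversion > 0 over TCP. effective_nconnect() is an
 * invented name; the constants mirror the kernel's xprt.h values but
 * are redefined here so the sketch compiles on its own. */
#include <assert.h>

#define XPRT_TRANSPORT_TCP	6	/* IPPROTO_TCP */
#define XPRT_TRANSPORT_RDMA	256

static unsigned int effective_nconnect(unsigned int minorversion,
				       int proto, unsigned int nconnect)
{
	if (minorversion > 0 && proto == XPRT_TRANSPORT_TCP)
		return nconnect;
	return 1;	/* NFSv4.0 or non-TCP: single connection */
}
```

This is why an "nconnect=8" mount over RDMA, or with vers=4.0, still ends up with one transport in this series.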
* [RFC PATCH 4/5] pNFS: Allow multiple connections to the DS
  2017-04-28 17:25     ` [RFC PATCH 3/5] NFSv4: Allow multiple connections to NFSv4.x (x>0) servers Trond Myklebust
@ 2017-04-28 17:25       ` Trond Myklebust
  2017-04-28 17:25         ` [RFC PATCH 5/5] NFS: Display the "nconnect" mount option if it is set Trond Myklebust
  0 siblings, 1 reply; 23+ messages in thread
From: Trond Myklebust @ 2017-04-28 17:25 UTC (permalink / raw)
  To: linux-nfs

If the user specifies the -o nconnect=<number> mount option, and the
transport protocol is TCP, then set up <number> connections to the
pNFS data server as well. The connections will all go to the same IP
address.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
---
 fs/nfs/nfs3client.c | 3 +++
 fs/nfs/nfs4client.c | 3 +++
 2 files changed, 6 insertions(+)

diff --git a/fs/nfs/nfs3client.c b/fs/nfs/nfs3client.c
index 7879f2a0fcfd..8c624c74ddbe 100644
--- a/fs/nfs/nfs3client.c
+++ b/fs/nfs/nfs3client.c
@@ -100,6 +100,9 @@ struct nfs_client *nfs3_set_ds_client(struct nfs_server *mds_srv,
 		return ERR_PTR(-EINVAL);
 	cl_init.hostname = buf;
 
+	if (mds_clp->cl_nconnect > 1 && ds_proto == XPRT_TRANSPORT_TCP)
+		cl_init.nconnect = mds_clp->cl_nconnect;
+
 	if (mds_srv->flags & NFS_MOUNT_NORESVPORT)
 		set_bit(NFS_CS_NORESVPORT, &cl_init.init_flags);
 
diff --git a/fs/nfs/nfs4client.c b/fs/nfs/nfs4client.c
index c9b10b7829f0..bfea1b232dd2 100644
--- a/fs/nfs/nfs4client.c
+++ b/fs/nfs/nfs4client.c
@@ -912,6 +912,9 @@ struct nfs_client *nfs4_set_ds_client(struct nfs_server *mds_srv,
 		return ERR_PTR(-EINVAL);
 	cl_init.hostname = buf;
 
+	if (mds_clp->cl_nconnect > 1 && ds_proto == XPRT_TRANSPORT_TCP)
+		cl_init.nconnect = mds_clp->cl_nconnect;
+
 	if (mds_srv->flags & NFS_MOUNT_NORESVPORT)
 		__set_bit(NFS_CS_NORESVPORT, &cl_init.init_flags);
 
-- 
2.9.3
* [RFC PATCH 5/5] NFS: Display the "nconnect" mount option if it is set.
  2017-04-28 17:25       ` [RFC PATCH 4/5] pNFS: Allow multiple connections to the DS Trond Myklebust
@ 2017-04-28 17:25         ` Trond Myklebust
  0 siblings, 0 replies; 23+ messages in thread
From: Trond Myklebust @ 2017-04-28 17:25 UTC (permalink / raw)
  To: linux-nfs

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
---
 fs/nfs/super.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/nfs/super.c b/fs/nfs/super.c
index 7eb48934dc79..0e07a6684235 100644
--- a/fs/nfs/super.c
+++ b/fs/nfs/super.c
@@ -673,6 +673,8 @@ static void nfs_show_mount_options(struct seq_file *m, struct nfs_server *nfss,
 	seq_printf(m, ",proto=%s",
 		   rpc_peeraddr2str(nfss->client, RPC_DISPLAY_NETID));
 	rcu_read_unlock();
+	if (clp->cl_nconnect > 0)
+		seq_printf(m, ",nconnect=%u", clp->cl_nconnect);
 	if (version == 4) {
 		if (nfss->port != NFS_PORT)
 			seq_printf(m, ",port=%u", nfss->port);
-- 
2.9.3
* Re: [RFC PATCH 2/5] NFS: Add a mount option to specify number of TCP connections to use
  2017-04-28 17:25   ` [RFC PATCH 2/5] NFS: Add a mount option to specify number of TCP connections to use Trond Myklebust
  2017-04-28 17:25     ` [RFC PATCH 3/5] NFSv4: Allow multiple connections to NFSv4.x (x>0) servers Trond Myklebust
@ 2017-05-04 13:45     ` Chuck Lever
  2017-05-04 13:53       ` Chuck Lever
  2017-05-04 16:01       ` Chuck Lever
  1 sibling, 2 replies; 23+ messages in thread
From: Chuck Lever @ 2017-05-04 13:45 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Linux NFS Mailing List

Hi Trond-

> On Apr 28, 2017, at 1:25 PM, Trond Myklebust <trond.myklebust@primarydata.com> wrote:
> 
> Allow the user to specify that the client should use multiple connections
> to the server. For the moment, this functionality will be limited to
> TCP and to NFSv4.x (x>0).

Some initial reactions:

- 5/5 could be squashed into this patch (2/5).

- 4/5 adds support for using NFSv3 with a DS. Why can't you add NFSv3
support for multiple connections in the non-pNFS case? If there is a
good reason, it should be stated in 4/5's patch description or added
as a comment somewhere in the source code.

- Testing with a Linux server shows that the basic NFS/RDMA pieces
work, but any OPEN operation gets NFS4ERR_GRACE, forever, when I use
nconnect > 1. I'm looking into it.

- Testing with a Solaris 12 server prototype that supports NFSv4.1
works fine with nconnect=[23]. Not seeing much performance improvement
there because the server is using QDR and a single SATA SSD.

Thus I don't see a strong need to keep the TCP-only limitation. However,
if you do keep it, the logic that implements the second sentence in the
patch description above is added in 3/5. Should this sentence be in that
patch description instead? Or, instead:

s/For the moment/In a subsequent patch

> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
> ---
>  fs/nfs/internal.h |  1 +
>  fs/nfs/super.c    | 10 ++++++++++
>  2 files changed, 11 insertions(+)
[...]

-- 
Chuck Lever

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
* Re: [RFC PATCH 2/5] NFS: Add a mount option to specify number of TCP connections to use
  2017-05-04 13:45     ` [RFC PATCH 2/5] NFS: Add a mount option to specify number of TCP connections to use Chuck Lever
@ 2017-05-04 13:53       ` Chuck Lever
  2017-05-04 16:01       ` Chuck Lever
  1 sibling, 0 replies; 23+ messages in thread
From: Chuck Lever @ 2017-05-04 13:53 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Linux NFS Mailing List

> On May 4, 2017, at 9:45 AM, Chuck Lever <chuck.lever@oracle.com> wrote:
> 
> Hi Trond-
> 
>> On Apr 28, 2017, at 1:25 PM, Trond Myklebust <trond.myklebust@primarydata.com> wrote:
>> 
>> Allow the user to specify that the client should use multiple connections
>> to the server. For the moment, this functionality will be limited to
>> TCP and to NFSv4.x (x>0).
> 
> Some initial reactions:
> 
> - 5/5 could be squashed into this patch (2/5).
> 
> - 4/5 adds support for using NFSv3 with a DS. Why can't you add NFSv3
> support for multiple connections in the non-pNFS case? If there is a
> good reason, it should be stated in 4/5's patch description or added
> as a comment somewhere in the source code.
> 
> - Testing with a Linux server shows that the basic NFS/RDMA pieces
> work, but any OPEN operation gets NFS4ERR_GRACE, forever, when I use
> nconnect > 1. I'm looking into it.
> 
> - Testing with a Solaris 12 server prototype that supports NFSv4.1
> works fine with nconnect=[23]. Not seeing much performance improvement
> there because the server is using QDR and a single SATA SSD.
> 
> Thus I don't see a strong need to keep the TCP-only limitation. However,
> if you do keep it, the logic that implements the second sentence in the
> patch description above is added in 3/5. Should this sentence be in that
> patch description instead? Or, instead:
> 
> s/For the moment/In a subsequent patch

Oops, I forgot to mention: mountstats data looks a little confused
when nconnect > 1. For example:

WRITE:
	3075342 ops (131%)
	avg bytes sent per op: 26829	avg bytes received per op: 113
	backlog wait: 162.375128	RTT: 1.481101
	total execute time: 163.861735 (milliseconds)

Haven't looked closely at that 131%, but it could be either the
kernel or the script itself is assuming one connection per mount.

>> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
>> ---
>>  fs/nfs/internal.h |  1 +
>>  fs/nfs/super.c    | 10 ++++++++++
>>  2 files changed, 11 insertions(+)
[...]

-- 
Chuck Lever
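One guess at the arithmetic behind a >100% figure like the 131% above: if the per-operation count is aggregated across all connections while the divisor (total RPCs) still comes from a single transport's counters, the ratio exceeds 100%. The numbers below are invented purely to illustrate that mismatch; this is not the actual mountstats code.

```c
/* Hypothetical illustration of how a per-op percentage can exceed
 * 100% when numerator and denominator come from different scopes
 * (aggregate vs. single connection). Invented numbers. */
#include <assert.h>

static int op_percent(long op_count, long total_ops)
{
	return (int)(op_count * 100 / total_ops);
}
```

For example, with two connections each carrying ~1.5M WRITEs plus ~1M other ops, dividing the aggregated WRITE count by one connection's total gives a percentage well above 100, while dividing by the aggregate total stays below it.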
* Re: [RFC PATCH 2/5] NFS: Add a mount option to specify number of TCP connections to use
  2017-05-04 13:45     ` [RFC PATCH 2/5] NFS: Add a mount option to specify number of TCP connections to use Chuck Lever
  2017-05-04 13:53       ` Chuck Lever
@ 2017-05-04 16:01       ` Chuck Lever
  2017-05-04 17:36         ` J. Bruce Fields
  1 sibling, 1 reply; 23+ messages in thread
From: Chuck Lever @ 2017-05-04 16:01 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Linux NFS Mailing List

> On May 4, 2017, at 9:45 AM, Chuck Lever <chuck.lever@oracle.com> wrote:
> 
> - Testing with a Linux server shows that the basic NFS/RDMA pieces
> work, but any OPEN operation gets NFS4ERR_GRACE, forever, when I use
> nconnect > 1. I'm looking into it.

Reproduced with NFSv4.1, TCP, and nconnect=2.

   363         /*
   364          * RFC5661 18.51.3
   365          * Before RECLAIM_COMPLETE done, server should deny new lock
   366          */
   367         if (nfsd4_has_session(cstate) &&
   368             !test_bit(NFSD4_CLIENT_RECLAIM_COMPLETE,
   369                       &cstate->session->se_client->cl_flags) &&
   370             open->op_claim_type != NFS4_OPEN_CLAIM_PREVIOUS)
   371                 return nfserr_grace;

Server-side instrumentation confirms:

May  4 11:28:29 klimt kernel: nfsd4_open: has_session returns true
May  4 11:28:29 klimt kernel: nfsd4_open: RECLAIM_COMPLETE is false
May  4 11:28:29 klimt kernel: nfsd4_open: claim_type is 0

Network capture shows the RPCs are interleaved between the two
connections as the client establishes its lease, and that appears
to be confusing the server.

C1: NULL -> NFS4_OK
C1: EXCHANGE_ID -> NFS4_OK
C2: CREATE_SESSION -> NFS4_OK
C1: RECLAIM_COMPLETE -> NFS4ERR_CONN_NOT_BOUND_TO_SESSION
C1: PUTROOTFH | GETATTR -> NFS4ERR_SEQ_MISORDERED
C2: SEQUENCE -> NFS4_OK
C1: PUTROOTFH | GETATTR -> NFS4ERR_CONN_NOT_BOUND_TO_SESSION
C1: BIND_CONN_TO_SESSION -> NFS4_OK
C2: BIND_CONN_TO_SESSION -> NFS4_OK
C2: PUTROOTFH | GETATTR -> NFS4ERR_SEQ_MISORDERED

.... mix of GETATTRs and other simple requests ....

C1: OPEN -> NFS4ERR_GRACE
C2: OPEN -> NFS4ERR_GRACE

The RECLAIM_COMPLETE operation failed, and the client does not
retry it. That leaves its lease stuck in GRACE.

-- 
Chuck Lever
* Re: [RFC PATCH 2/5] NFS: Add a mount option to specify number of TCP connections to use
  2017-05-04 16:01       ` Chuck Lever
@ 2017-05-04 17:36         ` J. Bruce Fields
  2017-05-04 17:38           ` Chuck Lever
  0 siblings, 1 reply; 23+ messages in thread
From: J. Bruce Fields @ 2017-05-04 17:36 UTC (permalink / raw)
  To: Chuck Lever; +Cc: Trond Myklebust, Linux NFS Mailing List

On Thu, May 04, 2017 at 12:01:29PM -0400, Chuck Lever wrote:
> 
> > On May 4, 2017, at 9:45 AM, Chuck Lever <chuck.lever@oracle.com> wrote:
> > 
> > - Testing with a Linux server shows that the basic NFS/RDMA pieces
> > work, but any OPEN operation gets NFS4ERR_GRACE, forever, when I use
> > nconnect > 1. I'm looking into it.
> 
> Reproduced with NFSv4.1, TCP, and nconnect=2.
> 
[...]
> 
> Network capture shows the RPCs are interleaved between the two
> connections as the client establishes its lease, and that appears
> to be confusing the server.
> 
> C1: NULL -> NFS4_OK
> C1: EXCHANGE_ID -> NFS4_OK
> C2: CREATE_SESSION -> NFS4_OK
> C1: RECLAIM_COMPLETE -> NFS4ERR_CONN_NOT_BOUND_TO_SESSION

What security flavors are involved?  I believe the correct behavior
depends on whether gss is in use or not.

--b.

[...]
* Re: [RFC PATCH 2/5] NFS: Add a mount option to specify number of TCP connections to use
  2017-05-04 17:36         ` J. Bruce Fields
@ 2017-05-04 17:38           ` Chuck Lever
  2017-05-04 17:45             ` J. Bruce Fields
  0 siblings, 1 reply; 23+ messages in thread
From: Chuck Lever @ 2017-05-04 17:38 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Trond Myklebust, Linux NFS Mailing List

> On May 4, 2017, at 1:36 PM, bfields@fieldses.org wrote:
> 
> On Thu, May 04, 2017 at 12:01:29PM -0400, Chuck Lever wrote:
>> 
>> Network capture shows the RPCs are interleaved between the two
>> connections as the client establishes its lease, and that appears
>> to be confusing the server.
>> 
>> C1: NULL -> NFS4_OK
>> C1: EXCHANGE_ID -> NFS4_OK
>> C2: CREATE_SESSION -> NFS4_OK
>> C1: RECLAIM_COMPLETE -> NFS4ERR_CONN_NOT_BOUND_TO_SESSION
> 
> What security flavors are involved?  I believe the correct behavior
> depends on whether gss is in use or not.

The mount options are "sec=sys" but both sides have a keytab.
So the lease management operations are done with krb5i.

[...]

-- 
Chuck Lever
* Re: [RFC PATCH 2/5] NFS: Add a mount option to specify number of TCP connections to use
  2017-05-04 17:38           ` Chuck Lever
@ 2017-05-04 17:45             ` J. Bruce Fields
  2017-05-04 18:55               ` Chuck Lever
  2017-05-04 20:40               ` Trond Myklebust
  0 siblings, 2 replies; 23+ messages in thread
From: J. Bruce Fields @ 2017-05-04 17:45 UTC (permalink / raw)
  To: Chuck Lever; +Cc: Trond Myklebust, Linux NFS Mailing List

On Thu, May 04, 2017 at 01:38:35PM -0400, Chuck Lever wrote:
> 
> > On May 4, 2017, at 1:36 PM, bfields@fieldses.org wrote:
> > 
> > On Thu, May 04, 2017 at 12:01:29PM -0400, Chuck Lever wrote:
> >> 
> >> C1: NULL -> NFS4_OK
> >> C1: EXCHANGE_ID -> NFS4_OK
> >> C2: CREATE_SESSION -> NFS4_OK
> >> C1: RECLAIM_COMPLETE -> NFS4ERR_CONN_NOT_BOUND_TO_SESSION
> > 
> > What security flavors are involved?  I believe the correct behavior
> > depends on whether gss is in use or not.
> 
> The mount options are "sec=sys" but both sides have a keytab.
> So the lease management operations are done with krb5i.

OK.  I'm pretty sure the client needs to send BIND_CONN_TO_SESSION
before step C1.

My memory is that over auth_sys you're allowed to treat any SEQUENCE
over a new connection as implicitly binding that connection to the
referenced session, but over krb5 the server is required to return that
NOT_BOUND error if the client skips the BIND_CONN_TO_SESSION.

I think CREATE_SESSION is allowed as long as the principals agree, and
that's why the call at C2 succeeds.  Seems a little weird, though.

--b.

[...]
* Re: [RFC PATCH 2/5] NFS: Add a mount option to specify number of TCP connections to use 2017-05-04 17:45 ` J. Bruce Fields @ 2017-05-04 18:55 ` Chuck Lever 2017-05-04 19:58 ` J. Bruce Fields 2017-05-04 20:40 ` Trond Myklebust 1 sibling, 1 reply; 23+ messages in thread From: Chuck Lever @ 2017-05-04 18:55 UTC (permalink / raw) To: J. Bruce Fields; +Cc: Trond Myklebust, Linux NFS Mailing List > On May 4, 2017, at 1:45 PM, J. Bruce Fields <bfields@fieldses.org> wrote: > > On Thu, May 04, 2017 at 01:38:35PM -0400, Chuck Lever wrote: >> >>> On May 4, 2017, at 1:36 PM, bfields@fieldses.org wrote: >>> >>> On Thu, May 04, 2017 at 12:01:29PM -0400, Chuck Lever wrote: >>>> >>>>> On May 4, 2017, at 9:45 AM, Chuck Lever <chuck.lever@oracle.com> wrote: >>>>> >>>>> - Testing with a Linux server shows that the basic NFS/RDMA pieces >>>>> work, but any OPEN operation gets NFS4ERR_GRACE, forever, when I use >>>>> nconnect > 1. I'm looking into it. >>>> >>>> Reproduced with NFSv4.1, TCP, and nconnect=2. >>>> >>>> 363 /* >>>> 364 * RFC5661 18.51.3 >>>> 365 * Before RECLAIM_COMPLETE done, server should deny new lock >>>> 366 */ >>>> 367 if (nfsd4_has_session(cstate) && >>>> 368 !test_bit(NFSD4_CLIENT_RECLAIM_COMPLETE, >>>> 369 &cstate->session->se_client->cl_flags) && >>>> 370 open->op_claim_type != NFS4_OPEN_CLAIM_PREVIOUS) >>>> 371 return nfserr_grace; >>>> >>>> Server-side instrumentation confirms: >>>> >>>> May 4 11:28:29 klimt kernel: nfsd4_open: has_session returns true >>>> May 4 11:28:29 klimt kernel: nfsd4_open: RECLAIM_COMPLETE is false >>>> May 4 11:28:29 klimt kernel: nfsd4_open: claim_type is 0 >>>> >>>> Network capture shows the RPCs are interleaved between the two >>>> connections as the client establishes its lease, and that appears >>>> to be confusing the server. >>>> >>>> C1: NULL -> NFS4_OK >>>> C1: EXCHANGE_ID -> NFS4_OK >>>> C2: CREATE_SESSION -> NFS4_OK >>>> C1: RECLAIM_COMPLETE -> NFS4ERR_CONN_NOT_BOUND_TO_SESSION >>> >>> What security flavors are involved? 
I believe the correct behavior >>> depends on whether gss is in use or not. >> >> The mount options are "sec=sys" but both sides have a keytab. >> So the lease management operations are done with krb5i. > > OK. I'm pretty sure the client needs to send BIND_CONN_TO_SESSION > before step C1. > > My memory is that over auth_sys you're allowed to treat any SEQUENCE > over a new connection as implicitly binding that connection to the > referenced session, but over krb5 the server's required to return that > NOT_BOUND error if the server skips the BIND_CONN_TO_SESSION. Ah, that would explain why nconnect=[234] is working against my Solaris 12 server: no keytab on that server means lease management is done using plain-old AUTH_SYS. Multiple connections are now handled entirely by the RPC layer, and are opened and used at rpc_clnt creation time. The NFS client is not aware (except for allowing more than one connection to be used) and relies on its own recovery mechanisms to deal with exceptions that might arise. IOW it doesn't seem to know that an extra BC2S is needed, nor does it know where in the RPC stream to insert that operation. Seems to me a good approach would be to handle server trunking discovery and lease establishment using a single connection, and then open more connections. A conservative approach might actually hold off on opening additional connections until there are enough RPC transactions being initiated in parallel to warrant it. Or, if @nconnect > 1, use a single connection to perform lease management, and open @nconnect additional connections that handle only per- mount I/O activity. > I think CREATE_SESSION is allowed as long as the principals agree, and > that's why the call at C2 succeeds. Seems a little weird, though. Well, there's no SEQUENCE operation in that COMPOUND. No session or connection to use there, I think the principal and client ID are the only way to recognize the target of the operation? > --b. > >> >> >>> --b. 
>>> >>>> C1: PUTROOTFH | GETATTR -> NFS4ERR_SEQ_MISORDERED >>>> C2: SEQUENCE -> NFS4_OK >>>> C1: PUTROOTFH | GETATTR -> NFS4ERR_CONN_NOT_BOUND_TO_SESSION >>>> C1: BIND_CONN_TO_SESSION -> NFS4_OK >>>> C2: BIND_CONN_TO_SESSION -> NFS4_OK >>>> C2: PUTROOTFH | GETATTR -> NFS4ERR_SEQ_MISORDERED >>>> >>>> .... mix of GETATTRs and other simple requests .... >>>> >>>> C1: OPEN -> NFS4ERR_GRACE >>>> C2: OPEN -> NFS4ERR_GRACE >>>> >>>> The RECLAIM_COMPLETE operation failed, and the client does not >>>> retry it. That leaves its lease stuck in GRACE. >>>> >>>> -- >>>> Chuck Lever -- Chuck Lever ^ permalink raw reply [flat|nested] 23+ messages in thread
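[Annotation] Chuck's suggested approach above — perform trunking discovery and lease establishment over a single connection, then open and bind the remaining nconnect-1 connections — can be sketched as a toy model. Every name below (toy_session, toy_mount, etc.) is an illustrative stand-in, not the real sunrpc/NFS client API:

```c
#include <stdbool.h>

/* Toy model of the staged bring-up suggested above: establish the
 * lease over one connection first (EXCHANGE_ID, CREATE_SESSION,
 * RECLAIM_COMPLETE), and only then bind the remaining nconnect-1
 * connections. Names are stand-ins, not the sunrpc API. */
struct toy_session {
	bool lease_established;
	int bound_conns;
};

static void toy_establish_lease(struct toy_session *s)
{
	/* All lease management happens on connection 0, so the
	 * interleaving seen in the trace above cannot occur. */
	s->lease_established = true;
	s->bound_conns = 1;	/* CREATE_SESSION binds its own connection */
}

/* Returns the number of connections bound to the session. */
int toy_mount(struct toy_session *s, int nconnect)
{
	int i;

	toy_establish_lease(s);
	for (i = 1; i < nconnect; i++)
		s->bound_conns++;	/* BIND_CONN_TO_SESSION per extra conn */
	return s->bound_conns;
}
```

The point of the staging is ordering: no RPC ever uses a connection before that connection is bound to the session.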
* Re: [RFC PATCH 2/5] NFS: Add a mount option to specify number of TCP connections to use 2017-05-04 18:55 ` Chuck Lever @ 2017-05-04 19:58 ` J. Bruce Fields 0 siblings, 0 replies; 23+ messages in thread From: J. Bruce Fields @ 2017-05-04 19:58 UTC (permalink / raw) To: Chuck Lever; +Cc: Trond Myklebust, Linux NFS Mailing List On Thu, May 04, 2017 at 02:55:06PM -0400, Chuck Lever wrote: > > > On May 4, 2017, at 1:45 PM, J. Bruce Fields <bfields@fieldses.org> wrote: > > > > On Thu, May 04, 2017 at 01:38:35PM -0400, Chuck Lever wrote: > >> > >>> On May 4, 2017, at 1:36 PM, bfields@fieldses.org wrote: > >>> > >>> On Thu, May 04, 2017 at 12:01:29PM -0400, Chuck Lever wrote: > >>>> > >>>>> On May 4, 2017, at 9:45 AM, Chuck Lever <chuck.lever@oracle.com> wrote: > >>>>> > >>>>> - Testing with a Linux server shows that the basic NFS/RDMA pieces > >>>>> work, but any OPEN operation gets NFS4ERR_GRACE, forever, when I use > >>>>> nconnect > 1. I'm looking into it. > >>>> > >>>> Reproduced with NFSv4.1, TCP, and nconnect=2. > >>>> > >>>> 363 /* > >>>> 364 * RFC5661 18.51.3 > >>>> 365 * Before RECLAIM_COMPLETE done, server should deny new lock > >>>> 366 */ > >>>> 367 if (nfsd4_has_session(cstate) && > >>>> 368 !test_bit(NFSD4_CLIENT_RECLAIM_COMPLETE, > >>>> 369 &cstate->session->se_client->cl_flags) && > >>>> 370 open->op_claim_type != NFS4_OPEN_CLAIM_PREVIOUS) > >>>> 371 return nfserr_grace; > >>>> > >>>> Server-side instrumentation confirms: > >>>> > >>>> May 4 11:28:29 klimt kernel: nfsd4_open: has_session returns true > >>>> May 4 11:28:29 klimt kernel: nfsd4_open: RECLAIM_COMPLETE is false > >>>> May 4 11:28:29 klimt kernel: nfsd4_open: claim_type is 0 > >>>> > >>>> Network capture shows the RPCs are interleaved between the two > >>>> connections as the client establishes its lease, and that appears > >>>> to be confusing the server. 
> >>>> > >>>> C1: NULL -> NFS4_OK > >>>> C1: EXCHANGE_ID -> NFS4_OK > >>>> C2: CREATE_SESSION -> NFS4_OK > >>>> C1: RECLAIM_COMPLETE -> NFS4ERR_CONN_NOT_BOUND_TO_SESSION > >>> > >>> What security flavors are involved? I believe the correct behavior > >>> depends on whether gss is in use or not. > >> > >> The mount options are "sec=sys" but both sides have a keytab. > >> So the lease management operations are done with krb5i. > > > > OK. I'm pretty sure the client needs to send BIND_CONN_TO_SESSION > > before step C1. > > > > My memory is that over auth_sys you're allowed to treat any SEQUENCE > > over a new connection as implicitly binding that connection to the > > referenced session, but over krb5 the server's required to return that > > NOT_BOUND error if the server skips the BIND_CONN_TO_SESSION. > > Ah, that would explain why nconnect=[234] is working against my > Solaris 12 server: no keytab on that server means lease management > is done using plain-old AUTH_SYS. > > Multiple connections are now handled entirely by the RPC layer, > and are opened and used at rpc_clnt creation time. The NFS client > is not aware (except for allowing more than one connection to be > used) and relies on its own recovery mechanisms to deal with > exceptions that might arise. IOW it doesn't seem to know that an > extra BC2S is needed, nor does it know where in the RPC stream > to insert that operation. > > Seems to me a good approach would be to handle server trunking > discovery and lease establishment using a single connection, and > then open more connections. A conservative approach might actually > hold off on opening additional connections until there are enough > RPC transactions being initiated in parallel to warrant it. Or, if > @nconnect > 1, use a single connection to perform lease management, > and open @nconnect additional connections that handle only per- > mount I/O activity. 
> > > > I think CREATE_SESSION is allowed as long as the principals agree, and > > that's why the call at C2 succeeds. Seems a little weird, though. > > Well, there's no SEQUENCE operation in that COMPOUND. No session > or connection to use there, I think the principal and client ID > are the only way to recognize the target of the operation? I'm just not clear why the explicit BIND_CONN_TO_SESSION is required in the gss case. Actually, it's not gss exactly, it's the state protection level: If, when the client ID was created, the client opted for SP4_NONE state protection, the client is not required to use BIND_CONN_TO_SESSION to associate the connection with the session, unless the client wishes to associate the connection with the backchannel. When SP4_NONE protection is used, simply sending a COMPOUND request with a SEQUENCE operation is sufficient to associate the connection with the session specified in SEQUENCE. Anyway. --b. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC PATCH 2/5] NFS: Add a mount option to specify number of TCP connections to use 2017-05-04 17:45 ` J. Bruce Fields 2017-05-04 18:55 ` Chuck Lever @ 2017-05-04 20:40 ` Trond Myklebust 2017-05-04 20:42 ` bfields 1 sibling, 1 reply; 23+ messages in thread From: Trond Myklebust @ 2017-05-04 20:40 UTC (permalink / raw) To: bfields, chuck.lever; +Cc: linux-nfs

[This message was base64-encoded in the archive; decoded below.]

On Thu, 2017-05-04 at 13:45 -0400, J. Bruce Fields wrote:
> On Thu, May 04, 2017 at 01:38:35PM -0400, Chuck Lever wrote:
> > [...]
> > The mount options are "sec=sys" but both sides have a keytab.
> > So the lease management operations are done with krb5i.
>
> OK. I'm pretty sure the client needs to send BIND_CONN_TO_SESSION
> before step C1.
>
> My memory is that over auth_sys you're allowed to treat any SEQUENCE
> over a new connection as implicitly binding that connection to the
> referenced session, but over krb5 the server's required to return
> that NOT_BOUND error if the server skips the BIND_CONN_TO_SESSION.
>
> I think CREATE_SESSION is allowed as long as the principals agree,
> and that's why the call at C2 succeeds. Seems a little weird, though.

See https://tools.ietf.org/html/rfc5661#section-2.10.3.1

So, we probably should send the BIND_CONN_TO_SESSION after creating the
session, but since that involves figuring out whether or not state
protection was successfully negotiated, and since we have to support
handling NFS4ERR_CONN_NOT_BOUND_TO_SESSION anyway, I'm all for just
waiting for the server to send the error.

--
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@primarydata.com

^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC PATCH 2/5] NFS: Add a mount option to specify number of TCP connections to use 2017-05-04 20:40 ` Trond Myklebust @ 2017-05-04 20:42 ` bfields 0 siblings, 0 replies; 23+ messages in thread From: bfields @ 2017-05-04 20:42 UTC (permalink / raw) To: Trond Myklebust; +Cc: chuck.lever, linux-nfs On Thu, May 04, 2017 at 08:40:07PM +0000, Trond Myklebust wrote: > See https://tools.ietf.org/html/rfc5661#section-2.10.3.1 > > So, we probably should send the BIND_CONN_TO_SESSION after creating the > session, but since that involves figuring out whether or not state > protection was successfully negotiated, and since we have to support > handling NFS4ERR_CONN_NOT_BOUND_TO_SESSION anyway, I'm all for just > waiting for the server to send the error. Makes sense to me. --b. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC PATCH 0/5] Fun with the multipathing code 2017-04-28 17:25 [RFC PATCH 0/5] Fun with the multipathing code Trond Myklebust 2017-04-28 17:25 ` [RFC PATCH 1/5] SUNRPC: Allow creation of RPC clients with multiple connections Trond Myklebust @ 2017-04-28 17:45 ` Chuck Lever 2017-04-28 18:08 ` Trond Myklebust 2017-05-04 19:09 ` Anna Schumaker 2019-01-09 19:39 ` Olga Kornievskaia 3 siblings, 1 reply; 23+ messages in thread From: Chuck Lever @ 2017-04-28 17:45 UTC (permalink / raw) To: Trond Myklebust; +Cc: Linux NFS Mailing List > On Apr 28, 2017, at 10:25 AM, Trond Myklebust <trond.myklebust@primarydata.com> wrote: > > In the spirit of experimentation, I've put together a set of patches > that implement setting up multiple TCP connections to the server. > The connections all go to the same server IP address, so do not > provide support for multiple IP addresses (which I believe is > something Andy Adamson is working on). > > The feature is only enabled for NFSv4.1 and NFSv4.2 for now; I don't > feel comfortable subjecting NFSv3/v4 replay caches to this > treatment yet. It relies on the mount option "nconnect" to specify > the number of connections to set up. So you can do something like > 'mount -t nfs -overs=4.1,nconnect=8 foo:/bar /mnt' > to set up 8 TCP connections to server 'foo'. IMO this setting should eventually be set dynamically by the client, or should be global (eg., a module parameter). Since mount points to the same server share the same transport, what happens if you specify a different "nconnect" setting on two mount points to the same server? What will the client do if there are not enough resources (eg source ports) to create that many? Or is this an "up to N" kind of setting? I can imagine a big client having to reduce the number of connections to each server to help it scale in number of server connections.
Other storage protocols have a mechanism for determining how transport connections are provisioned: One connection per CPU core (or one CPU per NUMA node) on the client. This gives a clear way to decide which connection to use for each RPC, and guarantees the reply will arrive at the same compute domain that sent the call. And of course: RPC-over-RDMA really loves this kind of feature (multiple connections between same IP tuples) to spread the workload over multiple QPs. There isn't anything special needed for RDMA, I hope, but I'll have a look at the SUNRPC pieces. Thanks for posting, I'm looking forward to seeing this capability in the Linux client. > Anyhow, feel free to test and give me feedback as to whether or not > this helps performance on your system. > > Trond Myklebust (5): > SUNRPC: Allow creation of RPC clients with multiple connections > NFS: Add a mount option to specify number of TCP connections to use > NFSv4: Allow multiple connections to NFSv4.x (x>0) servers > pNFS: Allow multiple connections to the DS > NFS: Display the "nconnect" mount option if it is set. > > fs/nfs/client.c | 2 ++ > fs/nfs/internal.h | 2 ++ > fs/nfs/nfs3client.c | 3 +++ > fs/nfs/nfs4client.c | 13 +++++++++++-- > fs/nfs/super.c | 12 ++++++++++++ > include/linux/nfs_fs_sb.h | 1 + > include/linux/sunrpc/clnt.h | 1 + > net/sunrpc/clnt.c | 17 ++++++++++++++++- > net/sunrpc/xprtmultipath.c | 3 +-- > 9 files changed, 49 insertions(+), 5 deletions(-) > > -- > 2.9.3 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Chuck Lever ^ permalink raw reply [flat|nested] 23+ messages in thread
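[Annotation] The provisioning policy Chuck describes — one connection per CPU core, or one per NUMA node, with a fixed call-to-connection mapping so the reply lands in the compute domain that issued the call — might look like the trivial selectors below. Neither helper exists in sunrpc; they only illustrate the mapping:

```c
/* Hypothetical transport selectors for the scheme described above.
 * nconn is the number of open connections to the server. */

/* One connection per core: pin the RPC to the submitting CPU. */
unsigned int xprt_for_cpu(unsigned int cpu, unsigned int nconn)
{
	return cpu % nconn;
}

/* One connection per NUMA node: derive the node from the CPU id
 * (assuming a simple contiguous cpu->node layout) and pin to it,
 * keeping cross-node memory traffic to a minimum. */
unsigned int xprt_for_node(unsigned int cpu, unsigned int cores_per_node,
			   unsigned int nconn)
{
	unsigned int node = cpu / cores_per_node;

	return node % nconn;
}
```

A static mapping like this also makes retransmission behavior predictable, since a given RPC always rides the same connection.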
* Re: [RFC PATCH 0/5] Fun with the multipathing code 2017-04-28 17:45 ` [RFC PATCH 0/5] Fun with the multipathing code Chuck Lever @ 2017-04-28 18:08 ` Trond Myklebust 2017-04-29 17:53 ` Chuck Lever 0 siblings, 1 reply; 23+ messages in thread From: Trond Myklebust @ 2017-04-28 18:08 UTC (permalink / raw) To: chuck.lever; +Cc: linux-nfs

[This message was base64-encoded in the archive; decoded below.]

On Fri, 2017-04-28 at 10:45 -0700, Chuck Lever wrote:
> > On Apr 28, 2017, at 10:25 AM, Trond Myklebust <trond.myklebust@primarydata.com> wrote:
> > [...]
>
> IMO this setting should eventually be set dynamically by the
> client, or should be global (eg., a module parameter).

There is an argument for making it a per-server value (which is what
this patchset does). It allows the admin a certain control to limit the
number of connections to specific servers that are need to serve larger
numbers of clients. However I'm open to counter arguments. I've no
strong opinions yet.

> Since mount points to the same server share the same transport,
> what happens if you specify a different "nconnect" setting on
> two mount points to the same server?

Currently, the first one wins.

> What will the client do if there are not enough resources
> (eg source ports) to create that many? Or is this an "up to N"
> kind of setting? I can imagine a big client having to reduce
> the number of connections to each server to help it scale in
> number of server connections.

There is an arbitrary (compile time) limit of 16. The use of the
SO_REUSEPORT socket option ensures that we should almost always be able
to satisfy that number of source ports, since they can be shared with
connections to other servers.

> Other storage protocols have a mechanism for determining how
> transport connections are provisioned: One connection per
> CPU core (or one CPU per NUMA node) on the client. This gives
> a clear way to decide which connection to use for each RPC,
> and guarantees the reply will arrive at the same compute
> domain that sent the call.

Can we perhaps lay out a case for which mechanisms are useful as far as
hardware is concerned? I understand the socket code is already
affinitised to CPU caches, so that one's easy. I'm less familiar with
the various features of the underlying offloaded NICs and how they tend
to react when you add/subtract TCP connections.

> And of course: RPC-over-RDMA really loves this kind of feature
> (multiple connections between same IP tuples) to spread the
> workload over multiple QPs. There isn't anything special needed
> for RDMA, I hope, but I'll have a look at the SUNRPC pieces.

I haven't yet enabled it for RPC/RDMA, but I imagine you can help out
if you find it useful (as you appear to do).

> Thanks for posting, I'm looking forward to seeing this
> capability in the Linux client.

--
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@primarydata.com

^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC PATCH 0/5] Fun with the multipathing code 2017-04-28 18:08 ` Trond Myklebust @ 2017-04-29 17:53 ` Chuck Lever 0 siblings, 0 replies; 23+ messages in thread From: Chuck Lever @ 2017-04-29 17:53 UTC (permalink / raw) To: Trond Myklebust; +Cc: Linux NFS Mailing List > On Apr 28, 2017, at 2:08 PM, Trond Myklebust <trondmy@primarydata.com> wrote: > > On Fri, 2017-04-28 at 10:45 -0700, Chuck Lever wrote: >>> On Apr 28, 2017, at 10:25 AM, Trond Myklebust <trond.myklebust@prim >>> arydata.com> wrote: >>> >>> In the spirit of experimentation, I've put together a set of >>> patches >>> that implement setting up multiple TCP connections to the server. >>> The connections all go to the same server IP address, so do not >>> provide support for multiple IP addresses (which I believe is >>> something Andy Adamson is working on). >>> >>> The feature is only enabled for NFSv4.1 and NFSv4.2 for now; I >>> don't >>> feel comfortable subjecting NFSv3/v4 replay caches to this >>> treatment yet. It relies on the mount option "nconnect" to specify >>> the number of connections to set up. So you can do something like >>> 'mount -t nfs -overs=4.1,nconnect=8 foo:/bar /mnt' >>> to set up 8 TCP connections to server 'foo'. >> >> IMO this setting should eventually be set dynamically by the >> client, or should be global (eg., a module parameter). > > There is an argument for making it a per-server value (which is what > this patchset does). It allows the admin a certain control to limit the > number of connections to specific servers that are need to serve larger > numbers of clients. However I'm open to counter arguments. I've no > strong opinions yet. Like direct I/O, this kind of setting could allow a single client to DoS a server. One additional concern might be how to deal with servers who have no more ability to accept connections during certain periods, but are able to support a lot of connections at other times.
>> Since mount points to the same server share the same transport, >> what happens if you specify a different "nconnect" setting on >> two mount points to the same server? > > Currently, the first one wins. > >> What will the client do if there are not enough resources >> (e.g. source ports) to create that many? Or is this an "up to N" >> kind of setting? I can imagine a big client having to reduce >> the number of connections to each server to help it scale in >> number of server connections. > > There is an arbitrary (compile time) limit of 16. The use of the > SO_REUSEPORT socket option ensures that we should almost always be able > to satisfy that number of source ports, since they can be shared with > connections to other servers. FWIW, Solaris limits this setting to 8. I think past that value, there is only incremental and diminishing gain. That could be apples to pears, though. I'm not aware of a mount option, but there might be a system tunable that controls this setting on each client. >> Other storage protocols have a mechanism for determining how >> transport connections are provisioned: one connection per >> CPU core (or one connection per NUMA node) on the client. This gives >> a clear way to decide which connection to use for each RPC, >> and guarantees the reply will arrive at the same compute >> domain that sent the call. > > Can we perhaps lay out a case for which mechanisms are useful as far as > hardware is concerned? I understand the socket code is already > affinitised to CPU caches, so that one's easy. I'm less familiar with > the various features of the underlying offloaded NICs and how they tend > to react when you add/subtract TCP connections. Well, the optimal number of connections varies depending on the NIC hardware design. I don't think there's a hard-and-fast rule, but typically the server-class NICs have multiple DMA engines and multiple cores. Thus they benefit from having multiple sockets, up to a point. 
Smaller clients would have a handful of cores, a single memory hierarchy, and one NIC. I would guess optimizing for the NIC (or server) would be best in that case. I'd bet two connections would be a very good default. For large clients, a connection per NUMA node makes sense. This keeps the amount of cross-node memory traffic to a minimum, which improves system scalability. The issues with "socket per CPU core" are: there can be a lot of cores, and it might be wasteful to open that many sockets to each NFS server; and what do you do with a socket when a CPU core is taken offline? >> And of course: RPC-over-RDMA really loves this kind of feature >> (multiple connections between same IP tuples) to spread the >> workload over multiple QPs. There isn't anything special needed >> for RDMA, I hope, but I'll have a look at the SUNRPC pieces. > > I haven't yet enabled it for RPC/RDMA, but I imagine you can help out > if you find it useful (as you appear to do). I can give the patch set a try this week. I haven't seen anything that would exclude proto=rdma from playing in this sandbox. >> Thanks for posting, I'm looking forward to seeing this >> capability in the Linux client. >> >> >>> Anyhow, feel free to test and give me feedback as to whether or not >>> this helps performance on your system. >>> >>> Trond Myklebust (5): >>>  SUNRPC: Allow creation of RPC clients with multiple connections >>>  NFS: Add a mount option to specify number of TCP connections to >>> use >>>  NFSv4: Allow multiple connections to NFSv4.x (x>0) servers >>>  pNFS: Allow multiple connections to the DS >>>  NFS: Display the "nconnect" mount option if it is set. 
>>> >>> fs/nfs/client.c             |  2 ++ >>> fs/nfs/internal.h           |  2 ++ >>> fs/nfs/nfs3client.c         |  3 +++ >>> fs/nfs/nfs4client.c         | 13 +++++++++++-- >>> fs/nfs/super.c              | 12 ++++++++++++ >>> include/linux/nfs_fs_sb.h   |  1 + >>> include/linux/sunrpc/clnt.h |  1 + >>> net/sunrpc/clnt.c           | 17 ++++++++++++++++- >>> net/sunrpc/xprtmultipath.c  |  3 +-- >>> 9 files changed, 49 insertions(+), 5 deletions(-) >>> >>> -- >>> 2.9.3 >>> >>> -- >>> To unsubscribe from this list: send the line "unsubscribe linux- >>> nfs" in >>> the body of a message to majordomo@vger.kernel.org >>> More majordomo info at http://vger.kernel.org/majordomo-info.html >> >> -- >> Chuck Lever >> >> >> > -- > Trond Myklebust > Linux NFS client maintainer, PrimaryData > trond.myklebust@primarydata.com -- Chuck Lever ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC PATCH 0/5] Fun with the multipathing code 2017-04-28 17:25 [RFC PATCH 0/5] Fun with the multipathing code Trond Myklebust 2017-04-28 17:25 ` [RFC PATCH 1/5] SUNRPC: Allow creation of RPC clients with multiple connections Trond Myklebust 2017-04-28 17:45 ` [RFC PATCH 0/5] Fun with the multipathing code Chuck Lever @ 2017-05-04 19:09 ` Anna Schumaker 2019-01-09 19:39 ` Olga Kornievskaia 3 siblings, 0 replies; 23+ messages in thread From: Anna Schumaker @ 2017-05-04 19:09 UTC (permalink / raw) To: Trond Myklebust, linux-nfs Hi Trond, I'm testing these on two VMs with a single core each, so probably not the use case you had in mind for these patches. I ran my script that runs connectathon tests on every NFS version, and I'm seeing it consistently take about a minute longer with "nconnect=2" than it does without the option. Thanks for working on this! Anna On 04/28/2017 01:25 PM, Trond Myklebust wrote: > In the spirit of experimentation, I've put together a set of patches > that implement setting up multiple TCP connections to the server. > The connections all go to the same server IP address, so do not > provide support for multiple IP addresses (which I believe is > something Andy Adamson is working on). > > The feature is only enabled for NFSv4.1 and NFSv4.2 for now; I don't > feel comfortable subjecting NFSv3/v4 replay caches to this > treatment yet. It relies on the mount option "nconnect" to specify > the number of connections to set up. So you can do something like > 'mount -t nfs -o vers=4.1,nconnect=8 foo:/bar /mnt' > to set up 8 TCP connections to server 'foo'. > > Anyhow, feel free to test and give me feedback as to whether or not > this helps performance on your system. 
> > Trond Myklebust (5): > SUNRPC: Allow creation of RPC clients with multiple connections > NFS: Add a mount option to specify number of TCP connections to use > NFSv4: Allow multiple connections to NFSv4.x (x>0) servers > pNFS: Allow multiple connections to the DS > NFS: Display the "nconnect" mount option if it is set. > > fs/nfs/client.c | 2 ++ > fs/nfs/internal.h | 2 ++ > fs/nfs/nfs3client.c | 3 +++ > fs/nfs/nfs4client.c | 13 +++++++++++-- > fs/nfs/super.c | 12 ++++++++++++ > include/linux/nfs_fs_sb.h | 1 + > include/linux/sunrpc/clnt.h | 1 + > net/sunrpc/clnt.c | 17 ++++++++++++++++- > net/sunrpc/xprtmultipath.c | 3 +-- > 9 files changed, 49 insertions(+), 5 deletions(-) > ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC PATCH 0/5] Fun with the multipathing code 2017-04-28 17:25 [RFC PATCH 0/5] Fun with the multipathing code Trond Myklebust ` (2 preceding siblings ...) 2017-05-04 19:09 ` Anna Schumaker @ 2019-01-09 19:39 ` Olga Kornievskaia 2019-01-09 20:38 ` Trond Myklebust 3 siblings, 1 reply; 23+ messages in thread From: Olga Kornievskaia @ 2019-01-09 19:39 UTC (permalink / raw) To: Trond Myklebust; +Cc: linux-nfs

Hi Trond,

Do you have any plans for this patch set?

I applied the patches on top of a 4.20-rc7 kernel I had and tested it (linux to linux) with iozone on the hardware (40G link with Mellanox CX-5 card).

Results seem to show read I/O improvement from 1.9GB/s to 3.9GB/s. Write I/O speed seems to be the same (disk bound, I'm guessing). I also tried mounting tmpfs. Same thing.

Seems like a useful feature to include?

Some raw numbers I got. Each nconnect=X value is just a single data point.

Command line used: /home/kolga/iozone3_482/src/current/iozone -i0 -i1 -s52m -y2k -az -I
Output is in kBytes/sec.

With nconnect=10
kB reclen write rewrite read reread
53248 2 7820 10956 20960 20927
53248 4 14871 21305 38743 38242
53248 8 27803 35001 75568 75830
53248 16 47452 59596 132513 130921
53248 32 70572 84940 234902 233423
53248 64 94774 101237 355664 354372
53248 128 114667 119413 523245 524855
53248 256 132340 137530 682411 681260
53248 512 143172 146157 784144 356064
53248 1024 148874 154177 1013764 982943
53248 2048 144311 161233 1282095 1592057
53248 4096 164679 169837 1637788 2438329
53248 8192 159221 142882 188536 1523659
53248 16384 169236 96996 3914910 1875398

With nconnect=9
kB reclen write rewrite read reread
53248 2 7833 10991 20893 20910
53248 4 15254 21136 40030 37510
53248 8 28077 37834 76688 67560
53248 16 47850 60174 137175 135266
53248 32 70653 85120 240219 235160
53248 64 96742 103856 364931 363556
53248 128 115002 119222 526446 517589
53248 256 132349 137254 684606 693748
53248 512 142849 147385 838735 876868
53248 1024 149612 152187 1060375 968514
53248 2048 150830 156006 1476364 1689987
53248 4096 163228 168421 1000338 1645183
53248 8192 165049 151047 3168655 3274393
53248 16384 166007 175972 743835 3817903

With nconnect=8
kB reclen write rewrite read reread
53248 2 7118 10321 20281 20353
53248 4 13960 20445 39233 39160
53248 8 24688 36543 74964 75111
53248 16 44674 57346 131362 130294
53248 32 67547 82716 231881 228998
53248 64 94195 103270 345326 343389
53248 128 116830 119816 521772 511537
53248 256 133709 137917 682126 693098
53248 512 143913 148801 878939 860046
53248 1024 150329 154027 1041977 1028612
53248 2048 157680 158844 7378 1486753
53248 4096 159543 160027 2441901 2168589
53248 8192 165155 160193 2515452 3142285
53248 16384 169411 176009 2385325 3894130

With nconnect=7
kB reclen write rewrite read reread
53248 2 7574 10593 20459 20381
53248 4 15064 20928 39865 39760
53248 8 27696 36864 74300 65721
53248 16 46960 59010 128354 127600
53248 32 68841 83578 230226 227369
53248 64 93114 100612 342303 331331
53248 128 112599 116108 498004 508645
53248 256 130668 136554 653718 634570
53248 512 142318 146749 805566 807056
53248 1024 148693 152493 965095 974736
53248 2048 157342 161170 1794490 1697579
53248 4096 144672 161154 2371227 2089308
53248 8192 148515 172814 3098132 766539
53248 16384 152801 143075 3799398 3778023

With nconnect=6
kB reclen write rewrite read reread
53248 2 7832 11103 21119 21254
53248 4 15490 21607 40520 40215
53248 8 25519 37333 78626 77118
53248 16 47885 54596 139343 138482
53248 32 71914 85094 239720 237024
53248 64 93901 100491 383238 377849
53248 128 95497 119289 545658 533312
53248 256 131614 137665 726717 716209
53248 512 143397 147452 896038 869623
53248 1024 149938 153885 1057554 1062727
53248 2048 157542 159369 1750302 1691100
53248 4096 163450 162691 2524086 2622917
53248 8192 162439 153065 3320433 3286189
53248 16384 153553 166918 3873279 3855965

With nconnect=5
kB reclen write rewrite read reread
53248 2 7592 10794 20382 20251
53248 4 15068 21096 41136 41865
53248 8 27606 37260 74947 74655
53248 16 47387 59806 137103 135962
53248 32 70402 83767 244301 241492
53248 64 95702 103042 361709 356424
53248 128 114189 118505 564857 556585
53248 256 132799 137856 751432 726667
53248 512 143233 146747 900493 921180
53248 1024 150787 154337 1106200 1088739
53248 2048 156873 161403 1133588 1709520
53248 4096 163741 166672 2468622 2275947
53248 8192 147689 165501 2969179 2943782
53248 16384 157076 143898 3468473 3580892

With nconnect=4
kB reclen write rewrite read reread
53248 2 7280 10499 20140 21610
53248 4 15003 20658 39282 39084
53248 8 27440 36211 72983 74006
53248 16 46702 58114 130113 129372
53248 32 67942 81592 237173 246333
53248 64 92098 98403 351618 349844
53248 128 117327 120451 492681 480222
53248 256 134457 137616 676207 666874
53248 512 144648 148179 853880 855267
53248 1024 151171 156382 1108038 1075847
53248 2048 157698 161736 1704862 1659547
53248 4096 164955 163237 9991 2274603
53248 8192 167987 173542 3189440 1304661
53248 16384 160230 158367 616211 1008327

With nconnect=3
kB reclen write rewrite read reread
53248 2 7954 11188 21786 21304
53248 4 15574 21973 41739 40116
53248 8 26917 38019 77460 77323
53248 16 47879 60593 140885 139938
53248 32 69304 83709 250196 247017
53248 64 95273 102929 371638 362578
53248 128 113436 118636 504672 495772
53248 256 131659 136857 749558 739310
53248 512 142581 146588 933209 907939
53248 1024 149502 152321 1092066 1093344
53248 2048 156992 162151 1821551 1772388
53248 4096 164692 170124 2530693 2442783
53248 8192 169409 175014 2795110 2795262
53248 16384 171873 176216 3088432 3172946

With nconnect=2
kB reclen write rewrite read reread
53248 2 7653 10723 20632 20970
53248 4 15232 21710 43017 42909
53248 8 27894 38009 80566 80249
53248 16 47392 60132 140226 138809
53248 32 72166 84713 240219 240935
53248 64 95449 102520 392916 387097
53248 128 113915 118447 592994 579702
53248 256 132337 136397 808895 782690
53248 512 142757 147276 1023450 980987
53248 1024 149803 153748 1232539 1200873
53248 2048 117144 142496 1726862 1846521
53248 4096 129211 168913 2327366 2035403
53248 8192 168842 173977 2079450 859542
53248 16384 170514 133000 2450596 856588

With nconnect=1
kB reclen write rewrite read reread
53248 2 7287 10482 20808 20586
53248 4 14282 20916 41216 40532
53248 8 26230 36606 76589 79005
53248 16 45838 59445 142976 141382
53248 32 70513 84601 250468 247247
53248 64 95128 103600 373719 377915
53248 128 116702 121174 571526 558482
53248 256 133131 137286 720249 702101
53248 512 140870 145269 907632 894129
53248 1024 148632 152558 1025853 1071471
53248 2048 69684 68052 1640169 1587587
53248 4096 57389 65044 1932496 1923277
53248 8192 65201 75412 1896445 1880839
53248 16384 86395 109635 1784491 1777077

Mounting a tmpfs instead of the disk

nconnect=10
kB reclen write rewrite read reread
53248 2 20766 21097 21248 21096
53248 4 38718 39837 40282 40562
53248 8 70787 73029 75134 75473
53248 16 129871 135244 137464 137202
53248 32 206931 225844 246440 243423
53248 64 307101 324226 362781 363964
53248 128 423743 437825 533503 539324
53248 256 549566 600099 726419 756622
53248 512 658211 723361 890941 902508
53248 1024 771731 898627 1079691 1125845
53248 2048 904072 1047097 1746060 1814433
53248 4096 1197609 1278558 1780285 2390797
53248 8192 1022231 1523377 1463727 1304735
53248 16384 1321716 1716730 3913052 3861092

nconnect=9
kB reclen write rewrite read reread
53248 2 18595 19418 19935 19555
53248 4 38048 38871 39015 39058
53248 8 70431 73903 73787 73437
53248 16 115428 120146 108439 132652
53248 32 189369 208458 238736 239319
53248 64 310172 326099 351834 350228
53248 128 419917 443973 540968 538233
53248 256 542390 578625 724630 721654
53248 512 636801 692928 876813 886978
53248 1024 740769 807593 1023254 1038803
53248 2048 900703 977706 1744465 1795702
53248 4096 991434 1218405 2312809 1534298
53248 8192 172671 1556220 3210650 1240208
53248 16384 1135860 1732470 3855099 3912755

nconnect=8
kB reclen write rewrite read reread
53248 2 20164 20622 20499 21020
53248 4 38006 39090 40008 40093
53248 8 70803 72965 75611 75827
53248 16 125845 132516 135011 135602
53248 32 216442 232697 239348 239241
53248 64 288013 297895 356983 363912
53248 128 418932 441833 520451 513015
53248 256 560464 616810 726013 730965
53248 512 674367 722693 903227 936461
53248 1024 761283 840974 1089472 1128827
53248 2048 943060 924299 1467459 1666917
53248 4096 970724 1052788 2433414 1938400
53248 8192 1342030 1089869 464917 3304996
53248 16384 1458436 1095725 3794363 1635401

nconnect=7
kB reclen write rewrite read reread
53248 2 20482 21154 21481 21409
53248 4 39328 40445 41006 40581
53248 8 75042 77753 80518 79727
53248 16 131785 136573 139394 138978
53248 32 150097 209044 249709 250655
53248 64 316353 333310 380193 383393
53248 128 427594 453668 573614 573235
53248 256 568166 611842 751230 753997
53248 512 655601 718936 909862 920353
53248 1024 749337 824988 1073221 1092846
53248 2048 959526 991769 1722507 1835308
53248 4096 1114485 1273084 824029 2244745
53248 8192 1096944 1590424 3208102 1612757
53248 16384 186085 1777460 2446002 3071636

nconnect=6
kB reclen write rewrite read reread
53248 2 19954 20159 20472 20692
53248 4 38829 39657 40025 39943
53248 8 70936 73492 74566 74764
53248 16 119267 123319 136927 136591
53248 32 193462 227254 239441 240293
53248 64 280700 280861 348085 352502
53248 128 410708 433280 268324 480572
53248 256 549707 599025 705775 721743
53248 512 694691 777286 834676 831794
53248 1024 796161 899669 985672 1011762
53248 2048 660219 1095097 1442643 1536969
53248 4096 713024 1097287 2375110 2278199
53248 8192 961825 814827 1414807 1073586
53248 16384 1302666 188459 789169 3799328

nconnect=5
kB reclen write rewrite read reread
53248 2 20083 20853 21387 21790
53248 4 39346 40634 41595 41911
53248 8 72275 75203 78950 79016
53248 16 110484 128308 135731 131166
53248 32 202718 216528 239493 240653
53248 64 293191 298468 379034 382413
53248 128 457944 496666 551294 555308
53248 256 595181 641156 750500 751126
53248 512 694337 787317 895434 898956
53248 1024 761906 854799 1064769 1073980
53248 2048 946967 1116994 1735369 1746934
53248 4096 392953 1355423 2615086 2455756
53248 8192 1356030 1578369 3033668 3172360
53248 16384 1454587 1743974 3562513 3540975

nconnect=4
kB reclen write rewrite read reread
53248 2 20228 20092 21908 22120
53248 4 38694 39986 41089 41153
53248 8 75699 78465 80083 80017
53248 16 102728 130883 135680 141924
53248 32 220118 231684 240910 249315
53248 64 302994 321295 385046 386325
53248 128 457099 488792 564420 563577
53248 256 586191 676053 767127 776559
53248 512 715344 782611 899003 906520
53248 1024 771923 874051 1182348 1256440
53248 2048 969607 1104706 1557321 1911278
53248 4096 1179644 981022 1722069 2709534
53248 8192 1216820 1556373 3159477 3254646
53248 16384 1508198 605894 3517653 3571029

nconnect=3
kB reclen write rewrite read reread
53248 2 21481 21763 21988 21311
53248 4 39828 40888 41669 41768
53248 8 65010 76085 80491 80466
53248 16 123527 135609 143423 144154
53248 32 225695 236990 250957 251665
53248 64 320309 348847 396364 396967
53248 128 426707 452220 565097 565103
53248 256 558951 600620 763477 767196
53248 512 668986 726410 972622 989905
53248 1024 782668 839173 1183444 1149741
53248 2048 974740 1075588 1853002 1885892
53248 4096 1198605 1308529 1270347 1624458
53248 8192 936760 1609546 2008581 2949932
53248 16384 579957 1068755 1254678 1268465

nconnect=2
kB reclen write rewrite read reread
53248 2 20386 21137 21406 21519
53248 4 38273 39530 40406 41521
53248 8 73789 73972 78914 79116
53248 16 127961 133436 138270 137096
53248 32 213333 231143 238689 239144
53248 64 292544 301586 372603 374027
53248 128 449001 480655 552909 532209
53248 256 551713 611455 726627 738374
53248 512 652788 745258 845863 848531
53248 1024 822491 904270 1080454 1024272
53248 2048 829847 948519 2001870 1985974
53248 4096 1198116 1387247 2519900 2503433
53248 8192 1345305 1475502 2918073 3259019
53248 16384 634718 475630 3128884 2969906

nconnect=1
kB reclen write rewrite read reread
53248 2 21288 21799 21638 21763
53248 4 40599 42412 42758 42762
53248 8 75734 78713 80072 81414
53248 16 124331 133874 148128 148421
53248 32 229738 242286 261479 262589
53248 64 337174 357993 385598 391051
53248 128 428862 462394 582345 576003
53248 256 527788 530506 780829 790614
53248 512 668147 732605 1071388 1058561
53248 1024 823391 921211 1218651 1223558
53248 2048 1016144 1111789 1600626 1585513
53248 4096 1251567 1436417 1818215 1868426
53248 8192 1479547 1716916 1804469 1789697
53248 16384 1435145 1954500 1796230 1799570

On Sun, Apr 30, 2017 at 8:49 AM Trond Myklebust <trond.myklebust@primarydata.com> wrote: > > In the spirit of experimentation, I've put together a set of patches > that implement setting up multiple TCP connections to the server. > The connections all go to the same server IP address, so do not > provide support for multiple IP addresses (which I believe is > something Andy Adamson is working on). > > The feature is only enabled for NFSv4.1 and NFSv4.2 for now; I don't > feel comfortable subjecting NFSv3/v4 replay caches to this > treatment yet. It relies on the mount option "nconnect" to specify > the number of connections to set up. So you can do something like > 'mount -t nfs -o vers=4.1,nconnect=8 foo:/bar /mnt' > to set up 8 TCP connections to server 'foo'. > > Anyhow, feel free to test and give me feedback as to whether or not > this helps performance on your system. > > Trond Myklebust (5): > SUNRPC: Allow creation of RPC clients with multiple connections > NFS: Add a mount option to specify number of TCP connections to use > NFSv4: Allow multiple connections to NFSv4.x (x>0) servers > pNFS: Allow multiple connections to the DS > NFS: Display the "nconnect" mount option if it is set. 
> > fs/nfs/client.c | 2 ++ > fs/nfs/internal.h | 2 ++ > fs/nfs/nfs3client.c | 3 +++ > fs/nfs/nfs4client.c | 13 +++++++++++-- > fs/nfs/super.c | 12 ++++++++++++ > include/linux/nfs_fs_sb.h | 1 + > include/linux/sunrpc/clnt.h | 1 + > net/sunrpc/clnt.c | 17 ++++++++++++++++- > net/sunrpc/xprtmultipath.c | 3 +-- > 9 files changed, 49 insertions(+), 5 deletions(-) > > -- > 2.9.3 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC PATCH 0/5] Fun with the multipathing code 2019-01-09 19:39 ` Olga Kornievskaia @ 2019-01-09 20:38 ` Trond Myklebust 2019-01-09 22:18 ` Olga Kornievskaia 0 siblings, 1 reply; 23+ messages in thread From: Trond Myklebust @ 2019-01-09 20:38 UTC (permalink / raw) To: aglo; +Cc: linux-nfs Hi Olga On Wed, 2019-01-09 at 14:39 -0500, Olga Kornievskaia wrote: > Hi Trond, > > Do you have any plans for this patch set? > > I applied the patches on top of 4.20-rc7 kernel I had and tested it > (linux to linux) with iozone on the hardware (40G link with Mellanox > CX-5 card). > > Results seem to show read I/O improvement from 1.9GB/s to 3.9GB/s. Write > I/O > speed seems to be the same (disk bound I'm guessing). I also tried > mounting tmpfs. Same thing. > > Seems like a useful feature to include? Thanks for testing this. Was this your own port of the original patches, or have you taken my branch from http://git.linux-nfs.org/?p=trondmy/linux-nfs.git;a=shortlog;h=refs/heads/multipath_tcp ? Either way I appreciate the data point. I haven't seen too many other reports of performance improvements, and that's the main reason why this patchset has languished. 3.9GB/s would be about 31Gbps, so that is not quite wire speed, but certainly a big improvement on 1.9GB/s. I'm a little surprised, though, that the write performance did not improve with the tmpfs. Was all this using aio+dio on the client? -- Trond Myklebust Linux NFS client maintainer, Hammerspace trond.myklebust@hammerspace.com ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC PATCH 0/5] Fun with the multipathing code 2019-01-09 20:38 ` Trond Myklebust @ 2019-01-09 22:18 ` Olga Kornievskaia 0 siblings, 0 replies; 23+ messages in thread From: Olga Kornievskaia @ 2019-01-09 22:18 UTC (permalink / raw) To: Trond Myklebust; +Cc: linux-nfs On Wed, Jan 9, 2019 at 3:38 PM Trond Myklebust <trondmy@hammerspace.com> wrote: > > Hi Olga > > On Wed, 2019-01-09 at 14:39 -0500, Olga Kornievskaia wrote: > > Hi Trond, > > > > Do you have any plans for this patch set? > > > > I applied the patches on top of 4.20-rc7 kernel I had and tested it > > (linux to linux) with iozone on the hardware (40G link with Mellanox > > CX-5 card). > > > > Results seem to show read I/O improvement from 1.9GB/s to 3.9GB/s. Write > > I/O > > speed seems to be the same (disk bound I'm guessing). I also tried > > mounting tmpfs. Same thing. > > > > Seems like a useful feature to include? > > Thanks for testing this. > > Was this your own port of the original patches, or have you taken my > branch from > http://git.linux-nfs.org/?p=trondmy/linux-nfs.git;a=shortlog;h=refs/heads/multipath_tcp > ? I didn't know one existed. I just took the original patches from the mailing list and applied them to 4.20-rc7 (they applied without issues that I recall). > Either way I appreciate the data point. I haven't seen too many other > reports of performance improvements, and that's the main reason why > this patchset has languished. > > 3.9GB/s would be about 31Gbps, so that is not quite wire speed, but > certainly a big improvement on 1.9GB/s. Maybe it's the lab setup that's not tuned to achieve max performance. > I'm a little surprised, though, > that the write performance did not improve with the tmpfs. Was all this > using aio+dio on the client? It is whatever "iozone -i0 -i1 -s52m -y2k -az -I" translates to. To clarify: by "didn't improve" I didn't mean that the write speed with disk is the same as the write speed with tmpfs (disk write speed is ~168MB/s and tmpfs write speed is ~1.47GB/s). 
I meant that it seems with nconnect=1 it achieves the "max" performance of disk/tmpfs. > > -- > Trond Myklebust > Linux NFS client maintainer, Hammerspace > trond.myklebust@hammerspace.com > > ^ permalink raw reply [flat|nested] 23+ messages in thread
end of thread, other threads:[~2019-01-09 22:18 UTC | newest] Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2017-04-28 17:25 [RFC PATCH 0/5] Fun with the multipathing code Trond Myklebust 2017-04-28 17:25 ` [RFC PATCH 1/5] SUNRPC: Allow creation of RPC clients with multiple connections Trond Myklebust 2017-04-28 17:25 ` [RFC PATCH 2/5] NFS: Add a mount option to specify number of TCP connections to use Trond Myklebust 2017-04-28 17:25 ` [RFC PATCH 3/5] NFSv4: Allow multiple connections to NFSv4.x (x>0) servers Trond Myklebust 2017-04-28 17:25 ` [RFC PATCH 4/5] pNFS: Allow multiple connections to the DS Trond Myklebust 2017-04-28 17:25 ` [RFC PATCH 5/5] NFS: Display the "nconnect" mount option if it is set Trond Myklebust 2017-05-04 13:45 ` [RFC PATCH 2/5] NFS: Add a mount option to specify number of TCP connections to use Chuck Lever 2017-05-04 13:53 ` Chuck Lever 2017-05-04 16:01 ` Chuck Lever 2017-05-04 17:36 ` J. Bruce Fields 2017-05-04 17:38 ` Chuck Lever 2017-05-04 17:45 ` J. Bruce Fields 2017-05-04 18:55 ` Chuck Lever 2017-05-04 19:58 ` J. Bruce Fields 2017-05-04 20:40 ` Trond Myklebust 2017-05-04 20:42 ` bfields 2017-04-28 17:45 ` [RFC PATCH 0/5] Fun with the multipathing code Chuck Lever 2017-04-28 18:08 ` Trond Myklebust 2017-04-29 17:53 ` Chuck Lever 2017-05-04 19:09 ` Anna Schumaker 2019-01-09 19:39 ` Olga Kornievskaia 2019-01-09 20:38 ` Trond Myklebust 2019-01-09 22:18 ` Olga Kornievskaia