linux-nfs.vger.kernel.org archive mirror
* [RFC PATCH 0/5] Fun with the multipathing code
@ 2017-04-28 17:25 Trond Myklebust
  2017-04-28 17:25 ` [RFC PATCH 1/5] SUNRPC: Allow creation of RPC clients with multiple connections Trond Myklebust
                   ` (3 more replies)
  0 siblings, 4 replies; 24+ messages in thread
From: Trond Myklebust @ 2017-04-28 17:25 UTC (permalink / raw)
  To: linux-nfs

In the spirit of experimentation, I've put together a set of patches
that implement setting up multiple TCP connections to the server.
The connections all go to the same server IP address, so do not
provide support for multiple IP addresses (which I believe is
something Andy Adamson is working on).

The feature is only enabled for NFSv4.1 and NFSv4.2 for now; I don't
feel comfortable subjecting NFSv3/v4 replay caches to this
treatment yet. It relies on the mount option "nconnect" to specify
the number of connections to set up. So you can do something like
  'mount -t nfs -overs=4.1,nconnect=8 foo:/bar /mnt'
to set up 8 TCP connections to server 'foo'.
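One way to confirm the result after mounting (a hypothetical check using `ss`; not part of the patches, and the port/count shown are assumptions):

```shell
# Hypothetical post-mount check: count established TCP connections to
# the NFS port (2049); with nconnect=8 this should report 8.
ss -tn state established '( dport = :2049 )' | tail -n +2 | wc -l
```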

Anyhow, feel free to test and give me feedback as to whether or not
this helps performance on your system.

Trond Myklebust (5):
  SUNRPC: Allow creation of RPC clients with multiple connections
  NFS: Add a mount option to specify number of TCP connections to use
  NFSv4: Allow multiple connections to NFSv4.x (x>0) servers
  pNFS: Allow multiple connections to the DS
  NFS: Display the "nconnect" mount option if it is set.

 fs/nfs/client.c             |  2 ++
 fs/nfs/internal.h           |  2 ++
 fs/nfs/nfs3client.c         |  3 +++
 fs/nfs/nfs4client.c         | 13 +++++++++++--
 fs/nfs/super.c              | 12 ++++++++++++
 include/linux/nfs_fs_sb.h   |  1 +
 include/linux/sunrpc/clnt.h |  1 +
 net/sunrpc/clnt.c           | 17 ++++++++++++++++-
 net/sunrpc/xprtmultipath.c  |  3 +--
 9 files changed, 49 insertions(+), 5 deletions(-)

-- 
2.9.3


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [RFC PATCH 1/5] SUNRPC: Allow creation of RPC clients with multiple connections
  2017-04-28 17:25 [RFC PATCH 0/5] Fun with the multipathing code Trond Myklebust
@ 2017-04-28 17:25 ` Trond Myklebust
  2017-04-28 17:25   ` [RFC PATCH 2/5] NFS: Add a mount option to specify number of TCP connections to use Trond Myklebust
  2017-04-28 17:45 ` [RFC PATCH 0/5] Fun with the multipathing code Chuck Lever
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 24+ messages in thread
From: Trond Myklebust @ 2017-04-28 17:25 UTC (permalink / raw)
  To: linux-nfs

Add an argument to struct rpc_create_args that allows the specification
of how many transport connections you want to set up to the server.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
---
 include/linux/sunrpc/clnt.h |  1 +
 net/sunrpc/clnt.c           | 17 ++++++++++++++++-
 net/sunrpc/xprtmultipath.c  |  3 +--
 3 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/include/linux/sunrpc/clnt.h b/include/linux/sunrpc/clnt.h
index 6095ecba0dde..8c3cb38a385b 100644
--- a/include/linux/sunrpc/clnt.h
+++ b/include/linux/sunrpc/clnt.h
@@ -120,6 +120,7 @@ struct rpc_create_args {
 	u32			prognumber;	/* overrides program->number */
 	u32			version;
 	rpc_authflavor_t	authflavor;
+	u32			nconnect;
 	unsigned long		flags;
 	char			*client_name;
 	struct svc_xprt		*bc_xprt;	/* NFSv4.1 backchannel */
diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index 673046c64e48..0ff97288b43f 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -522,6 +522,8 @@ struct rpc_clnt *rpc_create(struct rpc_create_args *args)
 		.bc_xprt = args->bc_xprt,
 	};
 	char servername[48];
+	struct rpc_clnt *clnt;
+	int i;
 
 	if (args->bc_xprt) {
 		WARN_ON_ONCE(!(args->protocol & XPRT_TRANSPORT_BC));
@@ -584,7 +586,15 @@ struct rpc_clnt *rpc_create(struct rpc_create_args *args)
 	if (args->flags & RPC_CLNT_CREATE_NONPRIVPORT)
 		xprt->resvport = 0;
 
-	return rpc_create_xprt(args, xprt);
+	clnt = rpc_create_xprt(args, xprt);
+	if (IS_ERR(clnt) || args->nconnect <= 1)
+		return clnt;
+
+	for (i = 0; i < args->nconnect - 1; i++) {
+		if (rpc_clnt_add_xprt(clnt, &xprtargs, NULL, NULL) < 0)
+			break;
+	}
+	return clnt;
 }
 EXPORT_SYMBOL_GPL(rpc_create);
 
@@ -2605,6 +2615,10 @@ int rpc_clnt_test_and_add_xprt(struct rpc_clnt *clnt,
 		return -ENOMEM;
 	data->xps = xprt_switch_get(xps);
 	data->xprt = xprt_get(xprt);
+	if (rpc_xprt_switch_has_addr(data->xps, (struct sockaddr *)&xprt->addr)) {
+		rpc_cb_add_xprt_release(data);
+		goto success;
+	}
 
 	cred = authnull_ops.lookup_cred(NULL, NULL, 0);
 	task = rpc_call_null_helper(clnt, xprt, cred,
@@ -2614,6 +2628,7 @@ int rpc_clnt_test_and_add_xprt(struct rpc_clnt *clnt,
 	if (IS_ERR(task))
 		return PTR_ERR(task);
 	rpc_put_task(task);
+success:
 	return 1;
 }
 EXPORT_SYMBOL_GPL(rpc_clnt_test_and_add_xprt);
diff --git a/net/sunrpc/xprtmultipath.c b/net/sunrpc/xprtmultipath.c
index 95064d510ce6..486819d0c58b 100644
--- a/net/sunrpc/xprtmultipath.c
+++ b/net/sunrpc/xprtmultipath.c
@@ -51,8 +51,7 @@ void rpc_xprt_switch_add_xprt(struct rpc_xprt_switch *xps,
 	if (xprt == NULL)
 		return;
 	spin_lock(&xps->xps_lock);
-	if ((xps->xps_net == xprt->xprt_net || xps->xps_net == NULL) &&
-	    !rpc_xprt_switch_has_addr(xps, (struct sockaddr *)&xprt->addr))
+	if (xps->xps_net == xprt->xprt_net || xps->xps_net == NULL)
 		xprt_switch_add_xprt_locked(xps, xprt);
 	spin_unlock(&xps->xps_lock);
 }
-- 
2.9.3



* [RFC PATCH 2/5] NFS: Add a mount option to specify number of TCP connections to use
  2017-04-28 17:25 ` [RFC PATCH 1/5] SUNRPC: Allow creation of RPC clients with multiple connections Trond Myklebust
@ 2017-04-28 17:25   ` Trond Myklebust
  2017-04-28 17:25     ` [RFC PATCH 3/5] NFSv4: Allow multiple connections to NFSv4.x (x>0) servers Trond Myklebust
  2017-05-04 13:45     ` [RFC PATCH 2/5] NFS: Add a mount option to specify number of TCP connections to use Chuck Lever
  0 siblings, 2 replies; 24+ messages in thread
From: Trond Myklebust @ 2017-04-28 17:25 UTC (permalink / raw)
  To: linux-nfs

Allow the user to specify that the client should use multiple connections
to the server. For the moment, this functionality will be limited to
TCP and to NFSv4.x (x>0).

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
---
 fs/nfs/internal.h |  1 +
 fs/nfs/super.c    | 10 ++++++++++
 2 files changed, 11 insertions(+)

diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 31b26cf1b476..31757a742e9b 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -117,6 +117,7 @@ struct nfs_parsed_mount_data {
 		char			*export_path;
 		int			port;
 		unsigned short		protocol;
+		unsigned short		nconnect;
 	} nfs_server;
 
 	struct security_mnt_opts lsm_opts;
diff --git a/fs/nfs/super.c b/fs/nfs/super.c
index 54e0f9f2dd94..7eb48934dc79 100644
--- a/fs/nfs/super.c
+++ b/fs/nfs/super.c
@@ -76,6 +76,8 @@
 #define NFS_DEFAULT_VERSION 2
 #endif
 
+#define NFS_MAX_CONNECTIONS 16
+
 enum {
 	/* Mount options that take no arguments */
 	Opt_soft, Opt_hard,
@@ -107,6 +109,7 @@ enum {
 	Opt_nfsvers,
 	Opt_sec, Opt_proto, Opt_mountproto, Opt_mounthost,
 	Opt_addr, Opt_mountaddr, Opt_clientaddr,
+	Opt_nconnect,
 	Opt_lookupcache,
 	Opt_fscache_uniq,
 	Opt_local_lock,
@@ -179,6 +182,8 @@ static const match_table_t nfs_mount_option_tokens = {
 	{ Opt_mounthost, "mounthost=%s" },
 	{ Opt_mountaddr, "mountaddr=%s" },
 
+	{ Opt_nconnect, "nconnect=%s" },
+
 	{ Opt_lookupcache, "lookupcache=%s" },
 	{ Opt_fscache_uniq, "fsc=%s" },
 	{ Opt_local_lock, "local_lock=%s" },
@@ -1544,6 +1549,11 @@ static int nfs_parse_mount_options(char *raw,
 			if (mnt->mount_server.addrlen == 0)
 				goto out_invalid_address;
 			break;
+		case Opt_nconnect:
+			if (nfs_get_option_ul_bound(args, &option, 1, NFS_MAX_CONNECTIONS))
+				goto out_invalid_value;
+			mnt->nfs_server.nconnect = option;
+			break;
 		case Opt_lookupcache:
 			string = match_strdup(args);
 			if (string == NULL)
-- 
2.9.3



* [RFC PATCH 3/5] NFSv4: Allow multiple connections to NFSv4.x (x>0) servers
  2017-04-28 17:25   ` [RFC PATCH 2/5] NFS: Add a mount option to specify number of TCP connections to use Trond Myklebust
@ 2017-04-28 17:25     ` Trond Myklebust
  2017-04-28 17:25       ` [RFC PATCH 4/5] pNFS: Allow multiple connections to the DS Trond Myklebust
  2017-05-04 13:45     ` [RFC PATCH 2/5] NFS: Add a mount option to specify number of TCP connections to use Chuck Lever
  1 sibling, 1 reply; 24+ messages in thread
From: Trond Myklebust @ 2017-04-28 17:25 UTC (permalink / raw)
  To: linux-nfs

If the user specifies the "-o nconnect=<number>" mount option, and the transport
protocol is TCP, then set up <number> connections to the server. The
connections will all go to the same IP address.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
---
 fs/nfs/client.c           |  2 ++
 fs/nfs/internal.h         |  1 +
 fs/nfs/nfs4client.c       | 10 ++++++++--
 include/linux/nfs_fs_sb.h |  1 +
 4 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index e0302101e18a..c5b0f3e270a3 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -180,6 +180,7 @@ struct nfs_client *nfs_alloc_client(const struct nfs_client_initdata *cl_init)
 	clp->cl_rpcclient = ERR_PTR(-EINVAL);
 
 	clp->cl_proto = cl_init->proto;
+	clp->cl_nconnect = cl_init->nconnect;
 	clp->cl_net = get_net(cl_init->net);
 
 	cred = rpc_lookup_machine_cred("*");
@@ -488,6 +489,7 @@ int nfs_create_rpc_client(struct nfs_client *clp,
 	struct rpc_create_args args = {
 		.net		= clp->cl_net,
 		.protocol	= clp->cl_proto,
+		.nconnect	= clp->cl_nconnect,
 		.address	= (struct sockaddr *)&clp->cl_addr,
 		.addrsize	= clp->cl_addrlen,
 		.timeout	= cl_init->timeparms,
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 31757a742e9b..abe5d3934eaf 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -77,6 +77,7 @@ struct nfs_client_initdata {
 	struct nfs_subversion *nfs_mod;
 	int proto;
 	u32 minorversion;
+	unsigned int nconnect;
 	struct net *net;
 	const struct rpc_timeout *timeparms;
 };
diff --git a/fs/nfs/nfs4client.c b/fs/nfs/nfs4client.c
index 692a7a8bfc7a..c9b10b7829f0 100644
--- a/fs/nfs/nfs4client.c
+++ b/fs/nfs/nfs4client.c
@@ -834,7 +834,8 @@ static int nfs4_set_client(struct nfs_server *server,
 		const size_t addrlen,
 		const char *ip_addr,
 		int proto, const struct rpc_timeout *timeparms,
-		u32 minorversion, struct net *net)
+		u32 minorversion, unsigned int nconnect,
+		struct net *net)
 {
 	struct nfs_client_initdata cl_init = {
 		.hostname = hostname,
@@ -849,6 +850,8 @@ static int nfs4_set_client(struct nfs_server *server,
 	};
 	struct nfs_client *clp;
 
+	if (minorversion > 0 && proto == XPRT_TRANSPORT_TCP)
+		cl_init.nconnect = nconnect;
 	if (server->flags & NFS_MOUNT_NORESVPORT)
 		set_bit(NFS_CS_NORESVPORT, &cl_init.init_flags);
 	if (server->options & NFS_OPTION_MIGRATION)
@@ -1040,6 +1043,7 @@ static int nfs4_init_server(struct nfs_server *server,
 			data->nfs_server.protocol,
 			&timeparms,
 			data->minorversion,
+			data->nfs_server.nconnect,
 			data->net);
 	if (error < 0)
 		return error;
@@ -1124,6 +1128,7 @@ struct nfs_server *nfs4_create_referral_server(struct nfs_clone_mount *data,
 				rpc_protocol(parent_server->client),
 				parent_server->client->cl_timeout,
 				parent_client->cl_mvops->minor_version,
+				parent_client->cl_nconnect,
 				parent_client->cl_net);
 	if (error < 0)
 		goto error;
@@ -1215,7 +1220,8 @@ int nfs4_update_server(struct nfs_server *server, const char *hostname,
 	nfs_server_remove_lists(server);
 	error = nfs4_set_client(server, hostname, sap, salen, buf,
 				clp->cl_proto, clnt->cl_timeout,
-				clp->cl_minorversion, net);
+				clp->cl_minorversion,
+				clp->cl_nconnect, net);
 	nfs_put_client(clp);
 	if (error != 0) {
 		nfs_server_insert_lists(server);
diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
index 2a70f34dffe8..b7e6b94d1246 100644
--- a/include/linux/nfs_fs_sb.h
+++ b/include/linux/nfs_fs_sb.h
@@ -55,6 +55,7 @@ struct nfs_client {
 	struct nfs_subversion *	cl_nfs_mod;	/* pointer to nfs version module */
 
 	u32			cl_minorversion;/* NFSv4 minorversion */
+	unsigned int		cl_nconnect;	/* Number of connections */
 	struct rpc_cred		*cl_machine_cred;
 
 #if IS_ENABLED(CONFIG_NFS_V4)
-- 
2.9.3



* [RFC PATCH 4/5] pNFS: Allow multiple connections to the DS
  2017-04-28 17:25     ` [RFC PATCH 3/5] NFSv4: Allow multiple connections to NFSv4.x (x>0) servers Trond Myklebust
@ 2017-04-28 17:25       ` Trond Myklebust
  2017-04-28 17:25         ` [RFC PATCH 5/5] NFS: Display the "nconnect" mount option if it is set Trond Myklebust
  0 siblings, 1 reply; 24+ messages in thread
From: Trond Myklebust @ 2017-04-28 17:25 UTC (permalink / raw)
  To: linux-nfs

If the user specifies the "-o nconnect=<number>" mount option, and the transport
protocol is TCP, then set up <number> connections to the pNFS data server
as well. The connections will all go to the same IP address.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
---
 fs/nfs/nfs3client.c | 3 +++
 fs/nfs/nfs4client.c | 3 +++
 2 files changed, 6 insertions(+)

diff --git a/fs/nfs/nfs3client.c b/fs/nfs/nfs3client.c
index 7879f2a0fcfd..8c624c74ddbe 100644
--- a/fs/nfs/nfs3client.c
+++ b/fs/nfs/nfs3client.c
@@ -100,6 +100,9 @@ struct nfs_client *nfs3_set_ds_client(struct nfs_server *mds_srv,
 		return ERR_PTR(-EINVAL);
 	cl_init.hostname = buf;
 
+	if (mds_clp->cl_nconnect > 1 && ds_proto == XPRT_TRANSPORT_TCP)
+		cl_init.nconnect = mds_clp->cl_nconnect;
+
 	if (mds_srv->flags & NFS_MOUNT_NORESVPORT)
 		set_bit(NFS_CS_NORESVPORT, &cl_init.init_flags);
 
diff --git a/fs/nfs/nfs4client.c b/fs/nfs/nfs4client.c
index c9b10b7829f0..bfea1b232dd2 100644
--- a/fs/nfs/nfs4client.c
+++ b/fs/nfs/nfs4client.c
@@ -912,6 +912,9 @@ struct nfs_client *nfs4_set_ds_client(struct nfs_server *mds_srv,
 		return ERR_PTR(-EINVAL);
 	cl_init.hostname = buf;
 
+	if (mds_clp->cl_nconnect > 1 && ds_proto == XPRT_TRANSPORT_TCP)
+		cl_init.nconnect = mds_clp->cl_nconnect;
+
 	if (mds_srv->flags & NFS_MOUNT_NORESVPORT)
 		__set_bit(NFS_CS_NORESVPORT, &cl_init.init_flags);
 
-- 
2.9.3



* [RFC PATCH 5/5] NFS: Display the "nconnect" mount option if it is set.
  2017-04-28 17:25       ` [RFC PATCH 4/5] pNFS: Allow multiple connections to the DS Trond Myklebust
@ 2017-04-28 17:25         ` Trond Myklebust
  0 siblings, 0 replies; 24+ messages in thread
From: Trond Myklebust @ 2017-04-28 17:25 UTC (permalink / raw)
  To: linux-nfs

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
---
 fs/nfs/super.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/nfs/super.c b/fs/nfs/super.c
index 7eb48934dc79..0e07a6684235 100644
--- a/fs/nfs/super.c
+++ b/fs/nfs/super.c
@@ -673,6 +673,8 @@ static void nfs_show_mount_options(struct seq_file *m, struct nfs_server *nfss,
 	seq_printf(m, ",proto=%s",
 		   rpc_peeraddr2str(nfss->client, RPC_DISPLAY_NETID));
 	rcu_read_unlock();
+	if (clp->cl_nconnect > 0)
+		seq_printf(m, ",nconnect=%u", clp->cl_nconnect);
 	if (version == 4) {
 		if (nfss->port != NFS_PORT)
 			seq_printf(m, ",port=%u", nfss->port);
-- 
2.9.3



* Re: [RFC PATCH 0/5] Fun with the multipathing code
  2017-04-28 17:25 [RFC PATCH 0/5] Fun with the multipathing code Trond Myklebust
  2017-04-28 17:25 ` [RFC PATCH 1/5] SUNRPC: Allow creation of RPC clients with multiple connections Trond Myklebust
@ 2017-04-28 17:45 ` Chuck Lever
  2017-04-28 18:08   ` Trond Myklebust
  2017-05-04 19:09 ` Anna Schumaker
  2019-01-09 19:39 ` Olga Kornievskaia
  3 siblings, 1 reply; 24+ messages in thread
From: Chuck Lever @ 2017-04-28 17:45 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Linux NFS Mailing List


> On Apr 28, 2017, at 10:25 AM, Trond Myklebust <trond.myklebust@primarydata.com> wrote:
> 
> In the spirit of experimentation, I've put together a set of patches
> that implement setting up multiple TCP connections to the server.
> The connections all go to the same server IP address, so do not
> provide support for multiple IP addresses (which I believe is
> something Andy Adamson is working on).
> 
> The feature is only enabled for NFSv4.1 and NFSv4.2 for now; I don't
> feel comfortable subjecting NFSv3/v4 replay caches to this
> treatment yet. It relies on the mount option "nconnect" to specify
> the number of connections to set up. So you can do something like
>  'mount -t nfs -overs=4.1,nconnect=8 foo:/bar /mnt'
> to set up 8 TCP connections to server 'foo'.

IMO this setting should eventually be set dynamically by the
client, or should be global (eg., a module parameter).

Since mount points to the same server share the same transport,
what happens if you specify a different "nconnect" setting on
two mount points to the same server?

What will the client do if there are not enough resources
(eg source ports) to create that many? Or is this an "up to N"
kind of setting? I can imagine a big client having to reduce
the number of connections to each server to help it scale in
number of server connections.

Other storage protocols have a mechanism for determining how
transport connections are provisioned: One connection per
CPU core (or one CPU per NUMA node) on the client. This gives
a clear way to decide which connection to use for each RPC,
and guarantees the reply will arrive at the same compute
domain that sent the call.

And of course: RPC-over-RDMA really loves this kind of feature
(multiple connections between same IP tuples) to spread the
workload over multiple QPs. There isn't anything special needed
for RDMA, I hope, but I'll have a look at the SUNRPC pieces.

Thanks for posting, I'm looking forward to seeing this
capability in the Linux client.


> Anyhow, feel free to test and give me feedback as to whether or not
> this helps performance on your system.
> 
> Trond Myklebust (5):
>  SUNRPC: Allow creation of RPC clients with multiple connections
>  NFS: Add a mount option to specify number of TCP connections to use
>  NFSv4: Allow multiple connections to NFSv4.x (x>0) servers
>  pNFS: Allow multiple connections to the DS
>  NFS: Display the "nconnect" mount option if it is set.
> 
> fs/nfs/client.c             |  2 ++
> fs/nfs/internal.h           |  2 ++
> fs/nfs/nfs3client.c         |  3 +++
> fs/nfs/nfs4client.c         | 13 +++++++++++--
> fs/nfs/super.c              | 12 ++++++++++++
> include/linux/nfs_fs_sb.h   |  1 +
> include/linux/sunrpc/clnt.h |  1 +
> net/sunrpc/clnt.c           | 17 ++++++++++++++++-
> net/sunrpc/xprtmultipath.c  |  3 +--
> 9 files changed, 49 insertions(+), 5 deletions(-)
> 
> -- 
> 2.9.3
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
Chuck Lever





* Re: [RFC PATCH 0/5] Fun with the multipathing code
  2017-04-28 17:45 ` [RFC PATCH 0/5] Fun with the multipathing code Chuck Lever
@ 2017-04-28 18:08   ` Trond Myklebust
  2017-04-29 17:53     ` Chuck Lever
  0 siblings, 1 reply; 24+ messages in thread
From: Trond Myklebust @ 2017-04-28 18:08 UTC (permalink / raw)
  To: chuck.lever; +Cc: linux-nfs

On Fri, 2017-04-28 at 10:45 -0700, Chuck Lever wrote:
> > On Apr 28, 2017, at 10:25 AM, Trond Myklebust <trond.myklebust@primarydata.com> wrote:
> > 
> > In the spirit of experimentation, I've put together a set of patches
> > that implement setting up multiple TCP connections to the server.
> > The connections all go to the same server IP address, so do not
> > provide support for multiple IP addresses (which I believe is
> > something Andy Adamson is working on).
> > 
> > The feature is only enabled for NFSv4.1 and NFSv4.2 for now; I don't
> > feel comfortable subjecting NFSv3/v4 replay caches to this
> > treatment yet. It relies on the mount option "nconnect" to specify
> > the number of connections to set up. So you can do something like
> >  'mount -t nfs -overs=4.1,nconnect=8 foo:/bar /mnt'
> > to set up 8 TCP connections to server 'foo'.
> 
> IMO this setting should eventually be set dynamically by the
> client, or should be global (eg., a module parameter).

There is an argument for making it a per-server value (which is what
this patchset does). It allows the admin a certain control to limit the
number of connections to specific servers that are needed to serve larger
numbers of clients. However I'm open to counter arguments. I've no
strong opinions yet.

> Since mount points to the same server share the same transport,
> what happens if you specify a different "nconnect" setting on
> two mount points to the same server?

Currently, the first one wins.

> What will the client do if there are not enough resources
> (eg source ports) to create that many? Or is this an "up to N"
> kind of setting? I can imagine a big client having to reduce
> the number of connections to each server to help it scale in
> number of server connections.

There is an arbitrary (compile time) limit of 16. The use of the
SO_REUSEPORT socket option ensures that we should almost always be able
to satisfy that number of source ports, since they can be shared with
connections to other servers.

> Other storage protocols have a mechanism for determining how
> transport connections are provisioned: One connection per
> CPU core (or one CPU per NUMA node) on the client. This gives
> a clear way to decide which connection to use for each RPC,
> and guarantees the reply will arrive at the same compute
> domain that sent the call.

Can we perhaps lay out a case for which mechanisms are useful as far as
hardware is concerned? I understand the socket code is already
affinitised to CPU caches, so that one's easy. I'm less familiar with
the various features of the underlying offloaded NICs and how they tend
to react when you add/subtract TCP connections.

> And of course: RPC-over-RDMA really loves this kind of feature
> (multiple connections between same IP tuples) to spread the
> workload over multiple QPs. There isn't anything special needed
> for RDMA, I hope, but I'll have a look at the SUNRPC pieces.

I haven't yet enabled it for RPC/RDMA, but I imagine you can help out
if you find it useful (as you appear to do).

> Thanks for posting, I'm looking forward to seeing this
> capability in the Linux client.

-- 
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@primarydata.com



* Re: [RFC PATCH 0/5] Fun with the multipathing code
  2017-04-28 18:08   ` Trond Myklebust
@ 2017-04-29 17:53     ` Chuck Lever
  0 siblings, 0 replies; 24+ messages in thread
From: Chuck Lever @ 2017-04-29 17:53 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Linux NFS Mailing List


> On Apr 28, 2017, at 2:08 PM, Trond Myklebust <trondmy@primarydata.com> wrote:
> 
> On Fri, 2017-04-28 at 10:45 -0700, Chuck Lever wrote:
>>> On Apr 28, 2017, at 10:25 AM, Trond Myklebust <trond.myklebust@prim
>>> arydata.com> wrote:
>>> 
>>> In the spirit of experimentation, I've put together a set of
>>> patches
>>> that implement setting up multiple TCP connections to the server.
>>> The connections all go to the same server IP address, so do not
>>> provide support for multiple IP addresses (which I believe is
>>> something Andy Adamson is working on).
>>> 
>>> The feature is only enabled for NFSv4.1 and NFSv4.2 for now; I
>>> don't
>>> feel comfortable subjecting NFSv3/v4 replay caches to this
>>> treatment yet. It relies on the mount option "nconnect" to specify
>>> the number of connections to set up. So you can do something like
>>>  'mount -t nfs -overs=4.1,nconnect=8 foo:/bar /mnt'
>>> to set up 8 TCP connections to server 'foo'.
>> 
>> IMO this setting should eventually be set dynamically by the
>> client, or should be global (eg., a module parameter).
> 
> There is an argument for making it a per-server value (which is what
> this patchset does). It allows the admin a certain control to limit the
> number of connections to specific servers that are needed to serve larger
> numbers of clients. However I'm open to counter arguments. I've no
> strong opinions yet.

Like direct I/O, this kind of setting could allow a single
client to DoS a server.

One additional concern might be how to deal with servers who
have no more ability to accept connections during certain
periods, but are able to support a lot of connections at
other times.


>> Since mount points to the same server share the same transport,
>> what happens if you specify a different "nconnect" setting on
>> two mount points to the same server?
> 
> Currently, the first one wins.
> 
>> What will the client do if there are not enough resources
>> (eg source ports) to create that many? Or is this an "up to N"
>> kind of setting? I can imagine a big client having to reduce
>> the number of connections to each server to help it scale in
>> number of server connections.
> 
> There is an arbitrary (compile time) limit of 16. The use of the
> SO_REUSEPORT socket option ensures that we should almost always be able
> to satisfy that number of source ports, since they can be shared with
> connections to other servers.

FWIW, Solaris limits this setting to 8. I think past that
value, there is only incremental and diminishing gain.
That could be apples to pears, though.

I'm not aware of a mount option, but there might be a
system tunable that controls this setting on each client.


>> Other storage protocols have a mechanism for determining how
>> transport connections are provisioned: One connection per
>> CPU core (or one CPU per NUMA node) on the client. This gives
>> a clear way to decide which connection to use for each RPC,
>> and guarantees the reply will arrive at the same compute
>> domain that sent the call.
> 
> Can we perhaps lay out a case for which mechanisms are useful as far as
> hardware is concerned? I understand the socket code is already
> affinitised to CPU caches, so that one's easy. I'm less familiar with
> the various features of the underlying offloaded NICs and how they tend
> to react when you add/subtract TCP connections.

Well, the optimal number of connections varies depending on
the NIC hardware design. I don't think there's a hard-and-fast
rule, but typically the server-class NICs have multiple DMA
engines and multiple cores. Thus they benefit from having
multiple sockets, up to a point.

Smaller clients would have a handful of cores, a single
memory hierarchy, and one NIC. I would guess optimizing for
the NIC (or server) would be best in that case. I'd bet
two connections would be a very good default.

For large clients, a connection per NUMA node makes sense.
This keeps the amount of cross-node memory traffic to a
minimum, which improves system scalability.

The issues with "socket per CPU core" are: there can be a lot
of cores, and it might be wasteful to open that many sockets
to each NFS server; and what do you do with a socket when
a CPU core is taken offline?


>> And of course: RPC-over-RDMA really loves this kind of feature
>> (multiple connections between same IP tuples) to spread the
>> workload over multiple QPs. There isn't anything special needed
>> for RDMA, I hope, but I'll have a look at the SUNRPC pieces.
> 
> I haven't yet enabled it for RPC/RDMA, but I imagine you can help out
> if you find it useful (as you appear to do).

I can give the patch set a try this week. I haven't seen any
thing that would exclude proto=rdma from playing in this
sandbox.


>> Thanks for posting, I'm looking forward to seeing this
>> capability in the Linux client.
>> 
>> 
>>> Anyhow, feel free to test and give me feedback as to whether or not
>>> this helps performance on your system.
>>> 
>>> Trond Myklebust (5):
>>>  SUNRPC: Allow creation of RPC clients with multiple connections
>>>  NFS: Add a mount option to specify number of TCP connections to
>>> use
>>>  NFSv4: Allow multiple connections to NFSv4.x (x>0) servers
>>>  pNFS: Allow multiple connections to the DS
>>>  NFS: Display the "nconnect" mount option if it is set.
>>> 
>>> fs/nfs/client.c             |  2 ++
>>> fs/nfs/internal.h           |  2 ++
>>> fs/nfs/nfs3client.c         |  3 +++
>>> fs/nfs/nfs4client.c         | 13 +++++++++++--
>>> fs/nfs/super.c              | 12 ++++++++++++
>>> include/linux/nfs_fs_sb.h   |  1 +
>>> include/linux/sunrpc/clnt.h |  1 +
>>> net/sunrpc/clnt.c           | 17 ++++++++++++++++-
>>> net/sunrpc/xprtmultipath.c  |  3 +--
>>> 9 files changed, 49 insertions(+), 5 deletions(-)
>>> 
>>> -- 
>>> 2.9.3
>>> 
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-
>>> nfs" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> 
>> --
>> Chuck Lever
>> 
>> 
>> 
> -- 
> Trond Myklebust
> Linux NFS client maintainer, PrimaryData
> trond.myklebust@primarydata.com
> �N嫥叉靣笡y氊b瞂千v豝�)藓{.n�+壏{睗�"炟^n噐■��侂h櫒璀�&Ⅷ�瓽珴閔��(殠娸"濟���m��飦赇z罐枈帼f"穐殘坢

--
Chuck Lever




^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH 2/5] NFS: Add a mount option to specify number of TCP connections to use
  2017-04-28 17:25   ` [RFC PATCH 2/5] NFS: Add a mount option to specify number of TCP connections to use Trond Myklebust
  2017-04-28 17:25     ` [RFC PATCH 3/5] NFSv4: Allow multiple connections to NFSv4.x (x>0) servers Trond Myklebust
@ 2017-05-04 13:45     ` Chuck Lever
  2017-05-04 13:53       ` Chuck Lever
  2017-05-04 16:01       ` Chuck Lever
  1 sibling, 2 replies; 24+ messages in thread
From: Chuck Lever @ 2017-05-04 13:45 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Linux NFS Mailing List

Hi Trond-


> On Apr 28, 2017, at 1:25 PM, Trond Myklebust <trond.myklebust@primarydata.com> wrote:
> 
> Allow the user to specify that the client should use multiple connections
> to the server. For the moment, this functionality will be limited to
> TCP and to NFSv4.x (x>0).

Some initial reactions:

- 5/5 could be squashed into this patch (2/5).

- 4/5 adds support for using NFSv3 with a DS. Why can't you add NFSv3
support for multiple connections in the non-pNFS case? If there is a
good reason, it should be stated in 4/5's patch description or added
as a comment somewhere in the source code.

- Testing with a Linux server shows that the basic NFS/RDMA pieces
work, but any OPEN operation gets NFS4ERR_GRACE, forever, when I use
nconnect > 1. I'm looking into it.

- Testing with a Solaris 12 server prototype that supports NFSv4.1
works fine with nconnect=[23]. Not seeing much performance improvement
there because the server is using QDR and a single SATA SSD.

Thus I don't see a strong need to keep the TCP-only limitation. However,
if you do keep it, the logic that implements the second sentence in the
patch description above is added in 3/5. Should this sentence be in that
patch description instead? Or, instead:

s/For the moment/In a subsequent patch


> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
> ---
> fs/nfs/internal.h |  1 +
> fs/nfs/super.c    | 10 ++++++++++
> 2 files changed, 11 insertions(+)
> 
> diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
> index 31b26cf1b476..31757a742e9b 100644
> --- a/fs/nfs/internal.h
> +++ b/fs/nfs/internal.h
> @@ -117,6 +117,7 @@ struct nfs_parsed_mount_data {
> 		char			*export_path;
> 		int			port;
> 		unsigned short		protocol;
> +		unsigned short		nconnect;
> 	} nfs_server;
> 
> 	struct security_mnt_opts lsm_opts;
> diff --git a/fs/nfs/super.c b/fs/nfs/super.c
> index 54e0f9f2dd94..7eb48934dc79 100644
> --- a/fs/nfs/super.c
> +++ b/fs/nfs/super.c
> @@ -76,6 +76,8 @@
> #define NFS_DEFAULT_VERSION 2
> #endif
> 
> +#define NFS_MAX_CONNECTIONS 16
> +
> enum {
> 	/* Mount options that take no arguments */
> 	Opt_soft, Opt_hard,
> @@ -107,6 +109,7 @@ enum {
> 	Opt_nfsvers,
> 	Opt_sec, Opt_proto, Opt_mountproto, Opt_mounthost,
> 	Opt_addr, Opt_mountaddr, Opt_clientaddr,
> +	Opt_nconnect,
> 	Opt_lookupcache,
> 	Opt_fscache_uniq,
> 	Opt_local_lock,
> @@ -179,6 +182,8 @@ static const match_table_t nfs_mount_option_tokens = {
> 	{ Opt_mounthost, "mounthost=%s" },
> 	{ Opt_mountaddr, "mountaddr=%s" },
> 
> +	{ Opt_nconnect, "nconnect=%s" },
> +
> 	{ Opt_lookupcache, "lookupcache=%s" },
> 	{ Opt_fscache_uniq, "fsc=%s" },
> 	{ Opt_local_lock, "local_lock=%s" },
> @@ -1544,6 +1549,11 @@ static int nfs_parse_mount_options(char *raw,
> 			if (mnt->mount_server.addrlen == 0)
> 				goto out_invalid_address;
> 			break;
> +		case Opt_nconnect:
> +			if (nfs_get_option_ul_bound(args, &option, 1, NFS_MAX_CONNECTIONS))
> +				goto out_invalid_value;
> +			mnt->nfs_server.nconnect = option;
> +			break;
> 		case Opt_lookupcache:
> 			string = match_strdup(args);
> 			if (string == NULL)
> -- 
> 2.9.3
> 

--
Chuck Lever




^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH 2/5] NFS: Add a mount option to specify number of TCP connections to use
  2017-05-04 13:45     ` [RFC PATCH 2/5] NFS: Add a mount option to specify number of TCP connections to use Chuck Lever
@ 2017-05-04 13:53       ` Chuck Lever
  2017-05-04 16:01       ` Chuck Lever
  1 sibling, 0 replies; 24+ messages in thread
From: Chuck Lever @ 2017-05-04 13:53 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Linux NFS Mailing List


> On May 4, 2017, at 9:45 AM, Chuck Lever <chuck.lever@oracle.com> wrote:
> 
> Hi Trond-
> 
> 
>> On Apr 28, 2017, at 1:25 PM, Trond Myklebust <trond.myklebust@primarydata.com> wrote:
>> 
>> Allow the user to specify that the client should use multiple connections
>> to the server. For the moment, this functionality will be limited to
>> TCP and to NFSv4.x (x>0).
> 
> Some initial reactions:
> 
> - 5/5 could be squashed into this patch (2/5).
> 
> - 4/5 adds support for using NFSv3 with a DS. Why can't you add NFSv3
> support for multiple connections in the non-pNFS case? If there is a
> good reason, it should be stated in 4/5's patch description or added
> as a comment somewhere in the source code.
> 
> - Testing with a Linux server shows that the basic NFS/RDMA pieces
> work, but any OPEN operation gets NFS4ERR_GRACE, forever, when I use
> nconnect > 1. I'm looking into it.
> 
> - Testing with a Solaris 12 server prototype that supports NFSv4.1
> works fine with nconnect=[23]. Not seeing much performance improvement
> there because the server is using QDR and a single SATA SSD.
> 
> Thus I don't see a strong need to keep the TCP-only limitation. However,
> if you do keep it, the logic that implements the second sentence in the
> patch description above is added in 3/5. Should this sentence be in that
> patch description instead? Or, instead:
> 
> s/For the moment/In a subsequent patch

Oops, I forgot to mention: mountstats data looks a little confused
when nconnect > 1. For example:

WRITE:
           3075342 ops (131%)
        avg bytes sent per op: 26829    avg bytes received per op: 113
        backlog wait: 162.375128        RTT: 1.481101   total execute time: 163.861735 (milliseconds)

Haven't looked closely at that 131%, but it could be that either the
kernel or the script itself assumes one connection per mount.
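
A purely hypothetical illustration of that guess (the numbers below are invented to reproduce the 131% symptom, not taken from the actual mountstats data): if per-operation counts are summed across every connection while the "total ops" denominator still reflects a single connection, the percentage overshoots 100%.

```c
#include <assert.h>

/* Hypothetical sketch of the suspected accounting skew; this is not
 * the real mountstats code, just the arithmetic that would produce
 * an ops percentage above 100%. */
static unsigned int op_percent(unsigned long op_count, unsigned long total_ops)
{
	return (unsigned int)(op_count * 100 / total_ops);
}
```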


>> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
>> ---
>> fs/nfs/internal.h |  1 +
>> fs/nfs/super.c    | 10 ++++++++++
>> 2 files changed, 11 insertions(+)
>> 
>> diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
>> index 31b26cf1b476..31757a742e9b 100644
>> --- a/fs/nfs/internal.h
>> +++ b/fs/nfs/internal.h
>> @@ -117,6 +117,7 @@ struct nfs_parsed_mount_data {
>> 		char			*export_path;
>> 		int			port;
>> 		unsigned short		protocol;
>> +		unsigned short		nconnect;
>> 	} nfs_server;
>> 
>> 	struct security_mnt_opts lsm_opts;
>> diff --git a/fs/nfs/super.c b/fs/nfs/super.c
>> index 54e0f9f2dd94..7eb48934dc79 100644
>> --- a/fs/nfs/super.c
>> +++ b/fs/nfs/super.c
>> @@ -76,6 +76,8 @@
>> #define NFS_DEFAULT_VERSION 2
>> #endif
>> 
>> +#define NFS_MAX_CONNECTIONS 16
>> +
>> enum {
>> 	/* Mount options that take no arguments */
>> 	Opt_soft, Opt_hard,
>> @@ -107,6 +109,7 @@ enum {
>> 	Opt_nfsvers,
>> 	Opt_sec, Opt_proto, Opt_mountproto, Opt_mounthost,
>> 	Opt_addr, Opt_mountaddr, Opt_clientaddr,
>> +	Opt_nconnect,
>> 	Opt_lookupcache,
>> 	Opt_fscache_uniq,
>> 	Opt_local_lock,
>> @@ -179,6 +182,8 @@ static const match_table_t nfs_mount_option_tokens = {
>> 	{ Opt_mounthost, "mounthost=%s" },
>> 	{ Opt_mountaddr, "mountaddr=%s" },
>> 
>> +	{ Opt_nconnect, "nconnect=%s" },
>> +
>> 	{ Opt_lookupcache, "lookupcache=%s" },
>> 	{ Opt_fscache_uniq, "fsc=%s" },
>> 	{ Opt_local_lock, "local_lock=%s" },
>> @@ -1544,6 +1549,11 @@ static int nfs_parse_mount_options(char *raw,
>> 			if (mnt->mount_server.addrlen == 0)
>> 				goto out_invalid_address;
>> 			break;
>> +		case Opt_nconnect:
>> +			if (nfs_get_option_ul_bound(args, &option, 1, NFS_MAX_CONNECTIONS))
>> +				goto out_invalid_value;
>> +			mnt->nfs_server.nconnect = option;
>> +			break;
>> 		case Opt_lookupcache:
>> 			string = match_strdup(args);
>> 			if (string == NULL)
>> -- 
>> 2.9.3
>> 
> 
> --
> Chuck Lever
> 
> 
> 

--
Chuck Lever




^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH 2/5] NFS: Add a mount option to specify number of TCP connections to use
  2017-05-04 13:45     ` [RFC PATCH 2/5] NFS: Add a mount option to specify number of TCP connections to use Chuck Lever
  2017-05-04 13:53       ` Chuck Lever
@ 2017-05-04 16:01       ` Chuck Lever
  2017-05-04 17:36         ` J. Bruce Fields
  1 sibling, 1 reply; 24+ messages in thread
From: Chuck Lever @ 2017-05-04 16:01 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Linux NFS Mailing List


> On May 4, 2017, at 9:45 AM, Chuck Lever <chuck.lever@oracle.com> wrote:
> 
> - Testing with a Linux server shows that the basic NFS/RDMA pieces
> work, but any OPEN operation gets NFS4ERR_GRACE, forever, when I use
> nconnect > 1. I'm looking into it.

Reproduced with NFSv4.1, TCP, and nconnect=2.

363         /*
364          * RFC5661 18.51.3
365          * Before RECLAIM_COMPLETE done, server should deny new lock
366          */
367         if (nfsd4_has_session(cstate) &&
368             !test_bit(NFSD4_CLIENT_RECLAIM_COMPLETE,
369                       &cstate->session->se_client->cl_flags) &&
370             open->op_claim_type != NFS4_OPEN_CLAIM_PREVIOUS)
371                 return nfserr_grace;

Server-side instrumentation confirms:

May  4 11:28:29 klimt kernel: nfsd4_open: has_session returns true
May  4 11:28:29 klimt kernel: nfsd4_open: RECLAIM_COMPLETE is false
May  4 11:28:29 klimt kernel: nfsd4_open: claim_type is 0

Network capture shows the RPCs are interleaved between the two
connections as the client establishes its lease, and that appears
to be confusing the server.

C1: NULL -> NFS4_OK
C1: EXCHANGE_ID -> NFS4_OK
C2: CREATE_SESSION -> NFS4_OK
C1: RECLAIM_COMPLETE -> NFS4ERR_CONN_NOT_BOUND_TO_SESSION
C1: PUTROOTFH | GETATTR -> NFS4ERR_SEQ_MISORDERED
C2: SEQUENCE -> NFS4_OK
C1: PUTROOTFH | GETATTR -> NFS4ERR_CONN_NOT_BOUND_TO_SESSION
C1: BIND_CONN_TO_SESSION -> NFS4_OK
C2: BIND_CONN_TO_SESSION -> NFS4_OK
C2: PUTROOTFH | GETATTR -> NFS4ERR_SEQ_MISORDERED

.... mix of GETATTRs and other simple requests ....

C1: OPEN -> NFS4ERR_GRACE
C2: OPEN -> NFS4ERR_GRACE

The RECLAIM_COMPLETE operation failed, and the client does not
retry it. That leaves its lease stuck in GRACE.


--
Chuck Lever




^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH 2/5] NFS: Add a mount option to specify number of TCP connections to use
  2017-05-04 16:01       ` Chuck Lever
@ 2017-05-04 17:36         ` J. Bruce Fields
  2017-05-04 17:38           ` Chuck Lever
  0 siblings, 1 reply; 24+ messages in thread
From: J. Bruce Fields @ 2017-05-04 17:36 UTC (permalink / raw)
  To: Chuck Lever; +Cc: Trond Myklebust, Linux NFS Mailing List

On Thu, May 04, 2017 at 12:01:29PM -0400, Chuck Lever wrote:
> 
> > On May 4, 2017, at 9:45 AM, Chuck Lever <chuck.lever@oracle.com> wrote:
> > 
> > - Testing with a Linux server shows that the basic NFS/RDMA pieces
> > work, but any OPEN operation gets NFS4ERR_GRACE, forever, when I use
> > nconnect > 1. I'm looking into it.
> 
> Reproduced with NFSv4.1, TCP, and nconnect=2.
> 
> 363         /*
> 364          * RFC5661 18.51.3
> 365          * Before RECLAIM_COMPLETE done, server should deny new lock
> 366          */
> 367         if (nfsd4_has_session(cstate) &&
> 368             !test_bit(NFSD4_CLIENT_RECLAIM_COMPLETE,
> 369                       &cstate->session->se_client->cl_flags) &&
> 370             open->op_claim_type != NFS4_OPEN_CLAIM_PREVIOUS)
> 371                 return nfserr_grace;
> 
> Server-side instrumentation confirms:
> 
> May  4 11:28:29 klimt kernel: nfsd4_open: has_session returns true
> May  4 11:28:29 klimt kernel: nfsd4_open: RECLAIM_COMPLETE is false
> May  4 11:28:29 klimt kernel: nfsd4_open: claim_type is 0
> 
> Network capture shows the RPCs are interleaved between the two
> connections as the client establishes its lease, and that appears
> to be confusing the server.
> 
> C1: NULL -> NFS4_OK
> C1: EXCHANGE_ID -> NFS4_OK
> C2: CREATE_SESSION -> NFS4_OK
> C1: RECLAIM_COMPLETE -> NFS4ERR_CONN_NOT_BOUND_TO_SESSION

What security flavors are involved?  I believe the correct behavior
depends on whether gss is in use or not.

--b.

> C1: PUTROOTFH | GETATTR -> NFS4ERR_SEQ_MISORDERED
> C2: SEQUENCE -> NFS4_OK
> C1: PUTROOTFH | GETATTR -> NFS4ERR_CONN_NOT_BOUND_TO_SESSION
> C1: BIND_CONN_TO_SESSION -> NFS4_OK
> C2: BIND_CONN_TO_SESSION -> NFS4_OK
> C2: PUTROOTFH | GETATTR -> NFS4ERR_SEQ_MISORDERED
> 
> .... mix of GETATTRs and other simple requests ....
> 
> C1: OPEN -> NFS4ERR_GRACE
> C2: OPEN -> NFS4ERR_GRACE
> 
> The RECLAIM_COMPLETE operation failed, and the client does not
> retry it. That leaves its lease stuck in GRACE.
> 
> 
> --
> Chuck Lever
> 
> 
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH 2/5] NFS: Add a mount option to specify number of TCP connections to use
  2017-05-04 17:36         ` J. Bruce Fields
@ 2017-05-04 17:38           ` Chuck Lever
  2017-05-04 17:45             ` J. Bruce Fields
  0 siblings, 1 reply; 24+ messages in thread
From: Chuck Lever @ 2017-05-04 17:38 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Trond Myklebust, Linux NFS Mailing List


> On May 4, 2017, at 1:36 PM, bfields@fieldses.org wrote:
> 
> On Thu, May 04, 2017 at 12:01:29PM -0400, Chuck Lever wrote:
>> 
>>> On May 4, 2017, at 9:45 AM, Chuck Lever <chuck.lever@oracle.com> wrote:
>>> 
>>> - Testing with a Linux server shows that the basic NFS/RDMA pieces
>>> work, but any OPEN operation gets NFS4ERR_GRACE, forever, when I use
>>> nconnect > 1. I'm looking into it.
>> 
>> Reproduced with NFSv4.1, TCP, and nconnect=2.
>> 
>> 363         /*
>> 364          * RFC5661 18.51.3
>> 365          * Before RECLAIM_COMPLETE done, server should deny new lock
>> 366          */
>> 367         if (nfsd4_has_session(cstate) &&
>> 368             !test_bit(NFSD4_CLIENT_RECLAIM_COMPLETE,
>> 369                       &cstate->session->se_client->cl_flags) &&
>> 370             open->op_claim_type != NFS4_OPEN_CLAIM_PREVIOUS)
>> 371                 return nfserr_grace;
>> 
>> Server-side instrumentation confirms:
>> 
>> May  4 11:28:29 klimt kernel: nfsd4_open: has_session returns true
>> May  4 11:28:29 klimt kernel: nfsd4_open: RECLAIM_COMPLETE is false
>> May  4 11:28:29 klimt kernel: nfsd4_open: claim_type is 0
>> 
>> Network capture shows the RPCs are interleaved between the two
>> connections as the client establishes its lease, and that appears
>> to be confusing the server.
>> 
>> C1: NULL -> NFS4_OK
>> C1: EXCHANGE_ID -> NFS4_OK
>> C2: CREATE_SESSION -> NFS4_OK
>> C1: RECLAIM_COMPLETE -> NFS4ERR_CONN_NOT_BOUND_TO_SESSION
> 
> What security flavors are involved?  I believe the correct behavior
> depends on whether gss is in use or not.

The mount options are "sec=sys" but both sides have a keytab.
So the lease management operations are done with krb5i.


> --b.
> 
>> C1: PUTROOTFH | GETATTR -> NFS4ERR_SEQ_MISORDERED
>> C2: SEQUENCE -> NFS4_OK
>> C1: PUTROOTFH | GETATTR -> NFS4ERR_CONN_NOT_BOUND_TO_SESSION
>> C1: BIND_CONN_TO_SESSION -> NFS4_OK
>> C2: BIND_CONN_TO_SESSION -> NFS4_OK
>> C2: PUTROOTFH | GETATTR -> NFS4ERR_SEQ_MISORDERED
>> 
>> .... mix of GETATTRs and other simple requests ....
>> 
>> C1: OPEN -> NFS4ERR_GRACE
>> C2: OPEN -> NFS4ERR_GRACE
>> 
>> The RECLAIM_COMPLETE operation failed, and the client does not
>> retry it. That leaves its lease stuck in GRACE.
>> 
>> 
>> --
>> Chuck Lever
>> 
>> 
>> 

--
Chuck Lever




^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH 2/5] NFS: Add a mount option to specify number of TCP connections to use
  2017-05-04 17:38           ` Chuck Lever
@ 2017-05-04 17:45             ` J. Bruce Fields
  2017-05-04 18:55               ` Chuck Lever
  2017-05-04 20:40               ` Trond Myklebust
  0 siblings, 2 replies; 24+ messages in thread
From: J. Bruce Fields @ 2017-05-04 17:45 UTC (permalink / raw)
  To: Chuck Lever; +Cc: Trond Myklebust, Linux NFS Mailing List

On Thu, May 04, 2017 at 01:38:35PM -0400, Chuck Lever wrote:
> 
> > On May 4, 2017, at 1:36 PM, bfields@fieldses.org wrote:
> > 
> > On Thu, May 04, 2017 at 12:01:29PM -0400, Chuck Lever wrote:
> >> 
> >>> On May 4, 2017, at 9:45 AM, Chuck Lever <chuck.lever@oracle.com> wrote:
> >>> 
> >>> - Testing with a Linux server shows that the basic NFS/RDMA pieces
> >>> work, but any OPEN operation gets NFS4ERR_GRACE, forever, when I use
> >>> nconnect > 1. I'm looking into it.
> >> 
> >> Reproduced with NFSv4.1, TCP, and nconnect=2.
> >> 
> >> 363         /*
> >> 364          * RFC5661 18.51.3
> >> 365          * Before RECLAIM_COMPLETE done, server should deny new lock
> >> 366          */
> >> 367         if (nfsd4_has_session(cstate) &&
> >> 368             !test_bit(NFSD4_CLIENT_RECLAIM_COMPLETE,
> >> 369                       &cstate->session->se_client->cl_flags) &&
> >> 370             open->op_claim_type != NFS4_OPEN_CLAIM_PREVIOUS)
> >> 371                 return nfserr_grace;
> >> 
> >> Server-side instrumentation confirms:
> >> 
> >> May  4 11:28:29 klimt kernel: nfsd4_open: has_session returns true
> >> May  4 11:28:29 klimt kernel: nfsd4_open: RECLAIM_COMPLETE is false
> >> May  4 11:28:29 klimt kernel: nfsd4_open: claim_type is 0
> >> 
> >> Network capture shows the RPCs are interleaved between the two
> >> connections as the client establishes its lease, and that appears
> >> to be confusing the server.
> >> 
> >> C1: NULL -> NFS4_OK
> >> C1: EXCHANGE_ID -> NFS4_OK
> >> C2: CREATE_SESSION -> NFS4_OK
> >> C1: RECLAIM_COMPLETE -> NFS4ERR_CONN_NOT_BOUND_TO_SESSION
> > 
> > What security flavors are involved?  I believe the correct behavior
> > depends on whether gss is in use or not.
> 
> The mount options are "sec=sys" but both sides have a keytab.
> So the lease management operations are done with krb5i.

OK.  I'm pretty sure the client needs to send BIND_CONN_TO_SESSION
before step C1.

My memory is that over auth_sys you're allowed to treat any SEQUENCE
over a new connection as implicitly binding that connection to the
referenced session, but over krb5 the server's required to return that
NOT_BOUND error if the server skips the BIND_CONN_TO_SESSION.
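
The rule described here could be modeled roughly like this (a simplification of RFC 5661's connection-binding behavior; the enum and function names are invented for illustration and this is not the nfsd implementation):

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative model of the session-binding rule: names are made up,
 * and real servers have more cases (SP4_MACH_CRED, SSV, etc.). */
enum bind_result { BOUND_OK, CONN_NOT_BOUND };

static enum bind_result check_sequence_binding(bool conn_already_bound,
					       bool gss_protected_lease)
{
	if (conn_already_bound)
		return BOUND_OK;
	if (gss_protected_lease)
		/* krb5: an explicit BIND_CONN_TO_SESSION is required
		 * before SEQUENCE on a new connection */
		return CONN_NOT_BOUND;
	/* auth_sys: SEQUENCE on a new connection binds it implicitly */
	return BOUND_OK;
}
```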

I think CREATE_SESSION is allowed as long as the principals agree, and
that's why the call at C2 succeeds.  Seems a little weird, though.

--b.

> 
> 
> > --b.
> > 
> >> C1: PUTROOTFH | GETATTR -> NFS4ERR_SEQ_MISORDERED
> >> C2: SEQUENCE -> NFS4_OK
> >> C1: PUTROOTFH | GETATTR -> NFS4ERR_CONN_NOT_BOUND_TO_SESSION
> >> C1: BIND_CONN_TO_SESSION -> NFS4_OK
> >> C2: BIND_CONN_TO_SESSION -> NFS4_OK
> >> C2: PUTROOTFH | GETATTR -> NFS4ERR_SEQ_MISORDERED
> >> 
> >> .... mix of GETATTRs and other simple requests ....
> >> 
> >> C1: OPEN -> NFS4ERR_GRACE
> >> C2: OPEN -> NFS4ERR_GRACE
> >> 
> >> The RECLAIM_COMPLETE operation failed, and the client does not
> >> retry it. That leaves its lease stuck in GRACE.
> >> 
> >> 
> >> --
> >> Chuck Lever
> >> 
> >> 
> >> 
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> >> the body of a message to majordomo@vger.kernel.org
> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> --
> Chuck Lever
> 
> 
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH 2/5] NFS: Add a mount option to specify number of TCP connections to use
  2017-05-04 17:45             ` J. Bruce Fields
@ 2017-05-04 18:55               ` Chuck Lever
  2017-05-04 19:58                 ` J. Bruce Fields
  2017-05-04 20:40               ` Trond Myklebust
  1 sibling, 1 reply; 24+ messages in thread
From: Chuck Lever @ 2017-05-04 18:55 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Trond Myklebust, Linux NFS Mailing List


> On May 4, 2017, at 1:45 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
> 
> On Thu, May 04, 2017 at 01:38:35PM -0400, Chuck Lever wrote:
>> 
>>> On May 4, 2017, at 1:36 PM, bfields@fieldses.org wrote:
>>> 
>>> On Thu, May 04, 2017 at 12:01:29PM -0400, Chuck Lever wrote:
>>>> 
>>>>> On May 4, 2017, at 9:45 AM, Chuck Lever <chuck.lever@oracle.com> wrote:
>>>>> 
>>>>> - Testing with a Linux server shows that the basic NFS/RDMA pieces
>>>>> work, but any OPEN operation gets NFS4ERR_GRACE, forever, when I use
>>>>> nconnect > 1. I'm looking into it.
>>>> 
>>>> Reproduced with NFSv4.1, TCP, and nconnect=2.
>>>> 
>>>> 363         /*
>>>> 364          * RFC5661 18.51.3
>>>> 365          * Before RECLAIM_COMPLETE done, server should deny new lock
>>>> 366          */
>>>> 367         if (nfsd4_has_session(cstate) &&
>>>> 368             !test_bit(NFSD4_CLIENT_RECLAIM_COMPLETE,
>>>> 369                       &cstate->session->se_client->cl_flags) &&
>>>> 370             open->op_claim_type != NFS4_OPEN_CLAIM_PREVIOUS)
>>>> 371                 return nfserr_grace;
>>>> 
>>>> Server-side instrumentation confirms:
>>>> 
>>>> May  4 11:28:29 klimt kernel: nfsd4_open: has_session returns true
>>>> May  4 11:28:29 klimt kernel: nfsd4_open: RECLAIM_COMPLETE is false
>>>> May  4 11:28:29 klimt kernel: nfsd4_open: claim_type is 0
>>>> 
>>>> Network capture shows the RPCs are interleaved between the two
>>>> connections as the client establishes its lease, and that appears
>>>> to be confusing the server.
>>>> 
>>>> C1: NULL -> NFS4_OK
>>>> C1: EXCHANGE_ID -> NFS4_OK
>>>> C2: CREATE_SESSION -> NFS4_OK
>>>> C1: RECLAIM_COMPLETE -> NFS4ERR_CONN_NOT_BOUND_TO_SESSION
>>> 
>>> What security flavors are involved?  I believe the correct behavior
>>> depends on whether gss is in use or not.
>> 
>> The mount options are "sec=sys" but both sides have a keytab.
>> So the lease management operations are done with krb5i.
> 
> OK.  I'm pretty sure the client needs to send BIND_CONN_TO_SESSION
> before step C1.
> 
> My memory is that over auth_sys you're allowed to treat any SEQUENCE
> over a new connection as implicitly binding that connection to the
> referenced session, but over krb5 the server's required to return that
> NOT_BOUND error if the server skips the BIND_CONN_TO_SESSION.

Ah, that would explain why nconnect=[234] is working against my
Solaris 12 server: no keytab on that server means lease management
is done using plain-old AUTH_SYS.

Multiple connections are now handled entirely by the RPC layer,
and are opened and used at rpc_clnt creation time. The NFS client
is not aware (except for allowing more than one connection to be
used) and relies on its own recovery mechanisms to deal with
exceptions that might arise. IOW it doesn't seem to know that an
extra BC2S is needed, nor does it know where in the RPC stream
to insert that operation.

Seems to me a good approach would be to handle server trunking
discovery and lease establishment using a single connection, and
then open more connections. A conservative approach might actually
hold off on opening additional connections until there are enough
RPC transactions being initiated in parallel to warrant it. Or, if
@nconnect > 1, use a single connection to perform lease management,
and open @nconnect additional connections that handle only per-
mount I/O activity.
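
That phased approach might look something like this (a sketch under assumed names; in the real SUNRPC code the extra transports would presumably go through rpc_clnt_add_xprt() and friends):

```c
#include <assert.h>

/* Sketch of the conservative bring-up described above: hold the
 * extra connections back until lease establishment (including
 * RECLAIM_COMPLETE) has finished. All names here are invented. */
struct clnt_sketch {
	unsigned int nconnect;	/* from the nconnect= mount option */
	unsigned int active;	/* transports currently open */
	int lease_ready;
};

static void clnt_init(struct clnt_sketch *c, unsigned int nconnect)
{
	c->nconnect = nconnect;
	c->active = 1;		/* one connection handles trunking discovery,
				 * EXCHANGE_ID, CREATE_SESSION, RECLAIM_COMPLETE */
	c->lease_ready = 0;
}

static void clnt_lease_established(struct clnt_sketch *c)
{
	c->lease_ready = 1;
	while (c->active < c->nconnect)
		c->active++;	/* stand-in for opening another transport */
}
```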


> I think CREATE_SESSION is allowed as long as the principals agree, and
> that's why the call at C2 succeeds.  Seems a little weird, though.

Well, there's no SEQUENCE operation in that COMPOUND. No session
or connection to use there; I think the principal and client ID
are the only way to recognize the target of the operation?


> --b.
> 
>> 
>> 
>>> --b.
>>> 
>>>> C1: PUTROOTFH | GETATTR -> NFS4ERR_SEQ_MISORDERED
>>>> C2: SEQUENCE -> NFS4_OK
>>>> C1: PUTROOTFH | GETATTR -> NFS4ERR_CONN_NOT_BOUND_TO_SESSION
>>>> C1: BIND_CONN_TO_SESSION -> NFS4_OK
>>>> C2: BIND_CONN_TO_SESSION -> NFS4_OK
>>>> C2: PUTROOTFH | GETATTR -> NFS4ERR_SEQ_MISORDERED
>>>> 
>>>> .... mix of GETATTRs and other simple requests ....
>>>> 
>>>> C1: OPEN -> NFS4ERR_GRACE
>>>> C2: OPEN -> NFS4ERR_GRACE
>>>> 
>>>> The RECLAIM_COMPLETE operation failed, and the client does not
>>>> retry it. That leaves its lease stuck in GRACE.
>>>> 
>>>> 
>>>> --
>>>> Chuck Lever
>>>> 
>>>> 
>>>> 
>> 
>> --
>> Chuck Lever
>> 
>> 
>> 

--
Chuck Lever




^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH 0/5] Fun with the multipathing code
  2017-04-28 17:25 [RFC PATCH 0/5] Fun with the multipathing code Trond Myklebust
  2017-04-28 17:25 ` [RFC PATCH 1/5] SUNRPC: Allow creation of RPC clients with multiple connections Trond Myklebust
  2017-04-28 17:45 ` [RFC PATCH 0/5] Fun with the multipathing code Chuck Lever
@ 2017-05-04 19:09 ` Anna Schumaker
  2019-01-09 19:39 ` Olga Kornievskaia
  3 siblings, 0 replies; 24+ messages in thread
From: Anna Schumaker @ 2017-05-04 19:09 UTC (permalink / raw)
  To: Trond Myklebust, linux-nfs

Hi Trond,

I'm testing these on two VMs with a single core each, so probably not the use case you had in mind for these patches.  I ran my script that runs connectathon tests on every NFS version, and I'm seeing it consistently takes about a minute longer with "nconnect=2" than it does without the option.

Thanks for working on this!
Anna

On 04/28/2017 01:25 PM, Trond Myklebust wrote:
> In the spirit of experimentation, I've put together a set of patches
> that implement setting up multiple TCP connections to the server.
> The connections all go to the same server IP address, so do not
> provide support for multiple IP addresses (which I believe is
> something Andy Adamson is working on).
> 
> The feature is only enabled for NFSv4.1 and NFSv4.2 for now; I don't
> feel comfortable subjecting NFSv3/v4 replay caches to this
> treatment yet. It relies on the mount option "nconnect" to specify
> the number of connections to set up. So you can do something like
>   'mount -t nfs -overs=4.1,nconnect=8 foo:/bar /mnt'
> to set up 8 TCP connections to server 'foo'.
> 
> Anyhow, feel free to test and give me feedback as to whether or not
> this helps performance on your system.
> 
> Trond Myklebust (5):
>   SUNRPC: Allow creation of RPC clients with multiple connections
>   NFS: Add a mount option to specify number of TCP connections to use
>   NFSv4: Allow multiple connections to NFSv4.x (x>0) servers
>   pNFS: Allow multiple connections to the DS
>   NFS: Display the "nconnect" mount option if it is set.
> 
>  fs/nfs/client.c             |  2 ++
>  fs/nfs/internal.h           |  2 ++
>  fs/nfs/nfs3client.c         |  3 +++
>  fs/nfs/nfs4client.c         | 13 +++++++++++--
>  fs/nfs/super.c              | 12 ++++++++++++
>  include/linux/nfs_fs_sb.h   |  1 +
>  include/linux/sunrpc/clnt.h |  1 +
>  net/sunrpc/clnt.c           | 17 ++++++++++++++++-
>  net/sunrpc/xprtmultipath.c  |  3 +--
>  9 files changed, 49 insertions(+), 5 deletions(-)
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH 2/5] NFS: Add a mount option to specify number of TCP connections to use
  2017-05-04 18:55               ` Chuck Lever
@ 2017-05-04 19:58                 ` J. Bruce Fields
  0 siblings, 0 replies; 24+ messages in thread
From: J. Bruce Fields @ 2017-05-04 19:58 UTC (permalink / raw)
  To: Chuck Lever; +Cc: Trond Myklebust, Linux NFS Mailing List

On Thu, May 04, 2017 at 02:55:06PM -0400, Chuck Lever wrote:
> 
> > On May 4, 2017, at 1:45 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
> > 
> > On Thu, May 04, 2017 at 01:38:35PM -0400, Chuck Lever wrote:
> >> 
> >>> On May 4, 2017, at 1:36 PM, bfields@fieldses.org wrote:
> >>> 
> >>> On Thu, May 04, 2017 at 12:01:29PM -0400, Chuck Lever wrote:
> >>>> 
> >>>>> On May 4, 2017, at 9:45 AM, Chuck Lever <chuck.lever@oracle.com> wrote:
> >>>>> 
> >>>>> - Testing with a Linux server shows that the basic NFS/RDMA pieces
> >>>>> work, but any OPEN operation gets NFS4ERR_GRACE, forever, when I use
> >>>>> nconnect > 1. I'm looking into it.
> >>>> 
> >>>> Reproduced with NFSv4.1, TCP, and nconnect=2.
> >>>> 
> >>>> 363         /*
> >>>> 364          * RFC5661 18.51.3
> >>>> 365          * Before RECLAIM_COMPLETE done, server should deny new lock
> >>>> 366          */
> >>>> 367         if (nfsd4_has_session(cstate) &&
> >>>> 368             !test_bit(NFSD4_CLIENT_RECLAIM_COMPLETE,
> >>>> 369                       &cstate->session->se_client->cl_flags) &&
> >>>> 370             open->op_claim_type != NFS4_OPEN_CLAIM_PREVIOUS)
> >>>> 371                 return nfserr_grace;
> >>>> 
> >>>> Server-side instrumentation confirms:
> >>>> 
> >>>> May  4 11:28:29 klimt kernel: nfsd4_open: has_session returns true
> >>>> May  4 11:28:29 klimt kernel: nfsd4_open: RECLAIM_COMPLETE is false
> >>>> May  4 11:28:29 klimt kernel: nfsd4_open: claim_type is 0
> >>>> 
> >>>> Network capture shows the RPCs are interleaved between the two
> >>>> connections as the client establishes its lease, and that appears
> >>>> to be confusing the server.
> >>>> 
> >>>> C1: NULL -> NFS4_OK
> >>>> C1: EXCHANGE_ID -> NFS4_OK
> >>>> C2: CREATE_SESSION -> NFS4_OK
> >>>> C1: RECLAIM_COMPLETE -> NFS4ERR_CONN_NOT_BOUND_TO_SESSION
> >>> 
> >>> What security flavors are involved?  I believe the correct behavior
> >>> depends on whether gss is in use or not.
> >> 
> >> The mount options are "sec=sys" but both sides have a keytab.
> >> So the lease management operations are done with krb5i.
> > 
> > OK.  I'm pretty sure the client needs to send BIND_CONN_TO_SESSION
> > before step C1.
> > 
> > My memory is that over auth_sys you're allowed to treat any SEQUENCE
> > over a new connection as implicitly binding that connection to the
> > referenced session, but over krb5 the server's required to return that
> > NOT_BOUND error if the server skips the BIND_CONN_TO_SESSION.
> 
> Ah, that would explain why nconnect=[234] is working against my
> Solaris 12 server: no keytab on that server means lease management
> is done using plain-old AUTH_SYS.
> 
> Multiple connections are now handled entirely by the RPC layer,
> and are opened and used at rpc_clnt creation time. The NFS client
> is not aware (except for allowing more than one connection to be
> used) and relies on its own recovery mechanisms to deal with
> exceptions that might arise. IOW it doesn't seem to know that an
> extra BC2S is needed, nor does it know where in the RPC stream
> to insert that operation.
> 
> Seems to me a good approach would be to handle server trunking
> discovery and lease establishment using a single connection, and
> then open more connections. A conservative approach might actually
> hold off on opening additional connections until there are enough
> RPC transactions being initiated in parallel to warrant it. Or, if
> @nconnect > 1, use a single connection to perform lease management,
> and open @nconnect additional connections that handle only per-
> mount I/O activity.
> 
> 
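The staged scheme Chuck describes could be sketched roughly as below. This is an illustrative model only, under the assumption of one fixed connection for lease management; the struct and function names are invented here and are not the actual sunrpc multipath code:

```c
/* Sketch: route lease-management RPCs over one fixed connection and
 * round-robin I/O over the remaining nconnect-1 connections.
 * Illustrative names, not the Linux client's real data structures. */
#include <assert.h>
#include <stdbool.h>

struct multipath_clnt {
	unsigned int nconnect;	/* total connections, >= 1 */
	unsigned int next_io;	/* round-robin cursor for I/O */
};

/* Pick a connection index for the next RPC. */
static unsigned int pick_connection(struct multipath_clnt *clnt,
				    bool lease_management)
{
	if (lease_management || clnt->nconnect == 1)
		return 0;	/* EXCHANGE_ID, CREATE_SESSION, etc. */
	/* I/O rotates over connections 1..nconnect-1. */
	unsigned int idx = 1 + clnt->next_io % (clnt->nconnect - 1);
	clnt->next_io++;
	return idx;
}
```

With nconnect=3, lease operations always land on connection 0, and I/O alternates between connections 1 and 2, so session establishment never races across transports.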
> > I think CREATE_SESSION is allowed as long as the principals agree, and
> > that's why the call at C2 succeeds.  Seems a little weird, though.
> 
> Well, there's no SEQUENCE operation in that COMPOUND. No session
> or connection to use there; I think the principal and client ID
> are the only way to recognize the target of the operation?

I'm just not clear why the explicit BIND_CONN_TO_SESSION is required in
the gss case.

Actually, it's not gss exactly, it's the state protection level:

	If, when the client ID was created, the client opted for
	SP4_NONE state protection, the client is not required to use
	BIND_CONN_TO_SESSION to associate the connection with the
	session, unless the client wishes to associate the connection
	with the backchannel.  When SP4_NONE protection is used, simply
	sending a COMPOUND request with a SEQUENCE operation is
	sufficient to associate the connection with the session
	specified in SEQUENCE.
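That rule can be reduced to a toy decision function, following the behavior described above (RFC 5661, sections 2.10.3.1 and 18.35): with SP4_NONE a SEQUENCE implicitly binds the connection, otherwise an unbound connection gets NFS4ERR_CONN_NOT_BOUND_TO_SESSION. This is a simplified model, not nfsd's actual code:

```c
/* Toy model of the session-binding rule discussed above.
 * Not nfsd's implementation; names are illustrative. */
#include <assert.h>
#include <stdbool.h>

enum sp4_how { SP4_NONE, SP4_MACH_CRED, SP4_SSV };

#define NFS4_OK					0
#define NFS4ERR_CONN_NOT_BOUND_TO_SESSION	10055

/* Status a server would return for a SEQUENCE arriving on a
 * connection, given the state protection negotiated at EXCHANGE_ID
 * and whether the connection is already associated with the session. */
static int sequence_binding_check(enum sp4_how protection, bool conn_bound,
				  bool *implicitly_bound)
{
	*implicitly_bound = false;
	if (conn_bound)
		return NFS4_OK;
	if (protection == SP4_NONE) {
		/* SEQUENCE alone associates the connection. */
		*implicitly_bound = true;
		return NFS4_OK;
	}
	/* Otherwise the client must send BIND_CONN_TO_SESSION first. */
	return NFS4ERR_CONN_NOT_BOUND_TO_SESSION;
}
```

That matches the trace earlier in the thread: lease management over krb5i negotiated state protection, so RECLAIM_COMPLETE on the second, unbound connection drew NFS4ERR_CONN_NOT_BOUND_TO_SESSION.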

Anyway.

--b.


* Re: [RFC PATCH 2/5] NFS: Add a mount option to specify number of TCP connections to use
  2017-05-04 17:45             ` J. Bruce Fields
  2017-05-04 18:55               ` Chuck Lever
@ 2017-05-04 20:40               ` Trond Myklebust
  2017-05-04 20:42                 ` bfields
  1 sibling, 1 reply; 24+ messages in thread
From: Trond Myklebust @ 2017-05-04 20:40 UTC (permalink / raw)
  To: bfields, chuck.lever; +Cc: linux-nfs

On Thu, 2017-05-04 at 13:45 -0400, J. Bruce Fields wrote:
> On Thu, May 04, 2017 at 01:38:35PM -0400, Chuck Lever wrote:
> > 
> > > On May 4, 2017, at 1:36 PM, bfields@fieldses.org wrote:
> > > 
> > > On Thu, May 04, 2017 at 12:01:29PM -0400, Chuck Lever wrote:
> > > > 
> > > > > On May 4, 2017, at 9:45 AM, Chuck Lever <chuck.lever@oracle.com> wrote:
> > > > > 
> > > > > - Testing with a Linux server shows that the basic NFS/RDMA pieces
> > > > > work, but any OPEN operation gets NFS4ERR_GRACE, forever, when I use
> > > > > nconnect > 1. I'm looking into it.
> > > > 
> > > > Reproduced with NFSv4.1, TCP, and nconnect=2.
> > > > 
> > > > 363         /*
> > > > 364          * RFC5661 18.51.3
> > > > 365          * Before RECLAIM_COMPLETE done, server should deny new lock
> > > > 366          */
> > > > 367         if (nfsd4_has_session(cstate) &&
> > > > 368             !test_bit(NFSD4_CLIENT_RECLAIM_COMPLETE,
> > > > 369                       &cstate->session->se_client->cl_flags) &&
> > > > 370             open->op_claim_type != NFS4_OPEN_CLAIM_PREVIOUS)
> > > > 371                 return nfserr_grace;
> > > > 
> > > > Server-side instrumentation confirms:
> > > > 
> > > > May  4 11:28:29 klimt kernel: nfsd4_open: has_session returns true
> > > > May  4 11:28:29 klimt kernel: nfsd4_open: RECLAIM_COMPLETE is false
> > > > May  4 11:28:29 klimt kernel: nfsd4_open: claim_type is 0
> > > > 
> > > > Network capture shows the RPCs are interleaved between the two
> > > > connections as the client establishes its lease, and that appears
> > > > to be confusing the server.
> > > > 
> > > > C1: NULL -> NFS4_OK
> > > > C1: EXCHANGE_ID -> NFS4_OK
> > > > C2: CREATE_SESSION -> NFS4_OK
> > > > C1: RECLAIM_COMPLETE -> NFS4ERR_CONN_NOT_BOUND_TO_SESSION
> > > 
> > > What security flavors are involved?  I believe the correct behavior
> > > depends on whether gss is in use or not.
> > 
> > The mount options are "sec=sys" but both sides have a keytab.
> > So the lease management operations are done with krb5i.
> 
> OK.  I'm pretty sure the client needs to send BIND_CONN_TO_SESSION
> before step C1.
> 
> My memory is that over auth_sys you're allowed to treat any SEQUENCE
> over a new connection as implicitly binding that connection to the
> referenced session, but over krb5 the server's required to return that
> NOT_BOUND error if the server skips the BIND_CONN_TO_SESSION.
> 
> I think CREATE_SESSION is allowed as long as the principals agree, and
> that's why the call at C2 succeeds.  Seems a little weird, though.
> 

See https://tools.ietf.org/html/rfc5661#section-2.10.3.1

So, we probably should send the BIND_CONN_TO_SESSION after creating the
session, but since that involves figuring out whether or not state
protection was successfully negotiated, and since we have to support
handling NFS4ERR_CONN_NOT_BOUND_TO_SESSION anyway, I'm all for just
waiting for the server to send the error.

> --b.
> 
> > 
> > > --b.
> > > 
> > > > C1: PUTROOTFH | GETATTR -> NFS4ERR_SEQ_MISORDERED
> > > > C2: SEQUENCE -> NFS4_OK
> > > > C1: PUTROOTFH | GETATTR -> NFS4ERR_CONN_NOT_BOUND_TO_SESSION
> > > > C1: BIND_CONN_TO_SESSION -> NFS4_OK
> > > > C2: BIND_CONN_TO_SESSION -> NFS4_OK
> > > > C2: PUTROOTFH | GETATTR -> NFS4ERR_SEQ_MISORDERED
> > > > 
> > > > .... mix of GETATTRs and other simple requests ....
> > > > 
> > > > C1: OPEN -> NFS4ERR_GRACE
> > > > C2: OPEN -> NFS4ERR_GRACE
> > > > 
> > > > The RECLAIM_COMPLETE operation failed, and the client does not
> > > > retry it. That leaves its lease stuck in GRACE.
> > > > 
> > > > --
> > > > Chuck Lever
> > 
> > --
> > Chuck Lever

-- 
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@primarydata.com



* Re: [RFC PATCH 2/5] NFS: Add a mount option to specify number of TCP connections to use
  2017-05-04 20:40               ` Trond Myklebust
@ 2017-05-04 20:42                 ` bfields
  0 siblings, 0 replies; 24+ messages in thread
From: bfields @ 2017-05-04 20:42 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: chuck.lever, linux-nfs

On Thu, May 04, 2017 at 08:40:07PM +0000, Trond Myklebust wrote:
> See https://tools.ietf.org/html/rfc5661#section-2.10.3.1
> 
> So, we probably should send the BIND_CONN_TO_SESSION after creating the
> session, but since that involves figuring out whether or not state
> protection was successfully negotiated, and since we have to support
> handling NFS4ERR_CONN_NOT_BOUND_TO_SESSION anyway, I'm all for just
> waiting for the server to send the error.

Makes sense to me.

--b.
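The recovery the thread settles on (skip pre-emptive binding, let the server return the error, then bind and retry) might look roughly like the loop below. This is a simplified illustration, not the Linux client's actual state machine; the helper names are invented:

```c
/* Illustrative client-side recovery: when a compound fails with
 * NFS4ERR_CONN_NOT_BOUND_TO_SESSION, bind the connection with
 * BIND_CONN_TO_SESSION and retry. Simplified model only. */
#include <assert.h>
#include <stdbool.h>

#define NFS4_OK					0
#define NFS4ERR_CONN_NOT_BOUND_TO_SESSION	10055

struct conn { bool bound; };

/* Stand-in for sending a SEQUENCE-bearing compound on @c. */
static int send_compound(struct conn *c)
{
	return c->bound ? NFS4_OK : NFS4ERR_CONN_NOT_BOUND_TO_SESSION;
}

static void bind_conn_to_session(struct conn *c)
{
	c->bound = true;
}

static int do_rpc(struct conn *c)
{
	int status = send_compound(c);

	if (status == NFS4ERR_CONN_NOT_BOUND_TO_SESSION) {
		bind_conn_to_session(c);
		status = send_compound(c);	/* retry once bound */
	}
	return status;
}
```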


* Re: [RFC PATCH 0/5] Fun with the multipathing code
  2017-04-28 17:25 [RFC PATCH 0/5] Fun with the multipathing code Trond Myklebust
                   ` (2 preceding siblings ...)
  2017-05-04 19:09 ` Anna Schumaker
@ 2019-01-09 19:39 ` Olga Kornievskaia
  2019-01-09 20:38   ` Trond Myklebust
  3 siblings, 1 reply; 24+ messages in thread
From: Olga Kornievskaia @ 2019-01-09 19:39 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-nfs

Hi Trond,

Do you have any plans for this patch set?

I applied the patches on top of a 4.20-rc7 kernel I had and tested them
(Linux to Linux) with iozone on real hardware (40G link with a Mellanox
CX-5 card).

Results seem to show read IO improving from about 1.9GB/s to 3.9GB/s.
Write IO speed seems to be the same (disk bound, I'm guessing). I also
tried mounting a tmpfs. Same thing.

Seems like a useful feature to include?

Some raw numbers I got. Each nconnect=X value is just a single data point.
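As a sanity check on the headline read numbers, the 16384 kB-record "read" column gives about 1.78 GB/s at nconnect=1 versus about 3.91 GB/s at nconnect=10 (values below are in kB/sec as iozone prints them; the conversion here assumes decimal units):

```c
/* Convert iozone's kB/sec readings to GB/s (decimal units assumed)
 * to check the ~2x read improvement claimed above. */
#include <assert.h>

static double kbps_to_gbps(double kb_per_sec)
{
	return kb_per_sec / (1000.0 * 1000.0);
}
```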

With nconnect=10
        Command line used: /home/kolga/iozone3_482/src/current/iozone -i0 -i1 -s52m -y2k -az -I
        Output is in kBytes/sec
              kB  reclen    write  rewrite    read    reread
           53248       2     7820    10956    20960    20927
           53248       4    14871    21305    38743    38242
           53248       8    27803    35001    75568    75830
           53248      16    47452    59596   132513   130921
           53248      32    70572    84940   234902   233423
           53248      64    94774   101237   355664   354372
           53248     128   114667   119413   523245   524855
           53248     256   132340   137530   682411   681260
           53248     512   143172   146157   784144   356064
           53248    1024   148874   154177  1013764   982943
           53248    2048   144311   161233  1282095  1592057
           53248    4096   164679   169837  1637788  2438329
           53248    8192   159221   142882   188536  1523659
           53248   16384   169236    96996  3914910  1875398

With nconnect=9
              kB  reclen    write  rewrite    read    reread
           53248       2     7833    10991    20893    20910
           53248       4    15254    21136    40030    37510
           53248       8    28077    37834    76688    67560
           53248      16    47850    60174   137175   135266
           53248      32    70653    85120   240219   235160
           53248      64    96742   103856   364931   363556
           53248     128   115002   119222   526446   517589
           53248     256   132349   137254   684606   693748
           53248     512   142849   147385   838735   876868
           53248    1024   149612   152187  1060375   968514
           53248    2048   150830   156006  1476364  1689987
           53248    4096   163228   168421  1000338  1645183
           53248    8192   165049   151047  3168655  3274393
           53248   16384   166007   175972   743835  3817903

With nconnect=8
              kB  reclen    write  rewrite    read    reread
           53248       2     7118    10321    20281    20353
           53248       4    13960    20445    39233    39160
           53248       8    24688    36543    74964    75111
           53248      16    44674    57346   131362   130294
           53248      32    67547    82716   231881   228998
           53248      64    94195   103270   345326   343389
           53248     128   116830   119816   521772   511537
           53248     256   133709   137917   682126   693098
           53248     512   143913   148801   878939   860046
           53248    1024   150329   154027  1041977  1028612
           53248    2048   157680   158844     7378  1486753
           53248    4096   159543   160027  2441901  2168589
           53248    8192   165155   160193  2515452  3142285
           53248   16384   169411   176009  2385325  3894130

With nconnect=7
              kB  reclen    write  rewrite    read    reread
           53248       2     7574    10593    20459    20381
           53248       4    15064    20928    39865    39760
           53248       8    27696    36864    74300    65721
           53248      16    46960    59010   128354   127600
           53248      32    68841    83578   230226   227369
           53248      64    93114   100612   342303   331331
           53248     128   112599   116108   498004   508645
           53248     256   130668   136554   653718   634570
           53248     512   142318   146749   805566   807056
           53248    1024   148693   152493   965095   974736
           53248    2048   157342   161170  1794490  1697579
           53248    4096   144672   161154  2371227  2089308
           53248    8192   148515   172814  3098132   766539
           53248   16384   152801   143075  3799398  3778023

With nconnect=6
              kB  reclen    write  rewrite    read    reread
           53248       2     7832    11103    21119    21254
           53248       4    15490    21607    40520    40215
           53248       8    25519    37333    78626    77118
           53248      16    47885    54596   139343   138482
           53248      32    71914    85094   239720   237024
           53248      64    93901   100491   383238   377849
           53248     128    95497   119289   545658   533312
           53248     256   131614   137665   726717   716209
           53248     512   143397   147452   896038   869623
           53248    1024   149938   153885  1057554  1062727
           53248    2048   157542   159369  1750302  1691100
           53248    4096   163450   162691  2524086  2622917
           53248    8192   162439   153065  3320433  3286189
           53248   16384   153553   166918  3873279  3855965

With nconnect=5
              kB  reclen    write  rewrite    read    reread
           53248       2     7592    10794    20382    20251
           53248       4    15068    21096    41136    41865
           53248       8    27606    37260    74947    74655
           53248      16    47387    59806   137103   135962
           53248      32    70402    83767   244301   241492
           53248      64    95702   103042   361709   356424
           53248     128   114189   118505   564857   556585
           53248     256   132799   137856   751432   726667
           53248     512   143233   146747   900493   921180
           53248    1024   150787   154337  1106200  1088739
           53248    2048   156873   161403  1133588  1709520
           53248    4096   163741   166672  2468622  2275947
           53248    8192   147689   165501  2969179  2943782
           53248   16384   157076   143898  3468473  3580892

With nconnect=4
              kB  reclen    write  rewrite    read    reread
           53248       2     7280    10499    20140    21610
           53248       4    15003    20658    39282    39084
           53248       8    27440    36211    72983    74006
           53248      16    46702    58114   130113   129372
           53248      32    67942    81592   237173   246333
           53248      64    92098    98403   351618   349844
           53248     128   117327   120451   492681   480222
           53248     256   134457   137616   676207   666874
           53248     512   144648   148179   853880   855267
           53248    1024   151171   156382  1108038  1075847
           53248    2048   157698   161736  1704862  1659547
           53248    4096   164955   163237     9991  2274603
           53248    8192   167987   173542  3189440  1304661
           53248   16384   160230   158367   616211  1008327

With nconnect=3
              kB  reclen    write  rewrite    read    reread
           53248       2     7954    11188    21786    21304
           53248       4    15574    21973    41739    40116
           53248       8    26917    38019    77460    77323
           53248      16    47879    60593   140885   139938
           53248      32    69304    83709   250196   247017
           53248      64    95273   102929   371638   362578
           53248     128   113436   118636   504672   495772
           53248     256   131659   136857   749558   739310
           53248     512   142581   146588   933209   907939
           53248    1024   149502   152321  1092066  1093344
           53248    2048   156992   162151  1821551  1772388
           53248    4096   164692   170124  2530693  2442783
           53248    8192   169409   175014  2795110  2795262
           53248   16384   171873   176216  3088432  3172946

With nconnect=2
              kB  reclen    write  rewrite    read    reread
           53248       2     7653    10723    20632    20970
           53248       4    15232    21710    43017    42909
           53248       8    27894    38009    80566    80249
           53248      16    47392    60132   140226   138809
           53248      32    72166    84713   240219   240935
           53248      64    95449   102520   392916   387097
           53248     128   113915   118447   592994   579702
           53248     256   132337   136397   808895   782690
           53248     512   142757   147276  1023450   980987
           53248    1024   149803   153748  1232539  1200873
           53248    2048   117144   142496  1726862  1846521
           53248    4096   129211   168913  2327366  2035403
           53248    8192   168842   173977  2079450   859542
           53248   16384   170514   133000  2450596   856588

With nconnect=1
              kB  reclen    write  rewrite    read    reread
           53248       2     7287    10482    20808    20586
           53248       4    14282    20916    41216    40532
           53248       8    26230    36606    76589    79005
           53248      16    45838    59445   142976   141382
           53248      32    70513    84601   250468   247247
           53248      64    95128   103600   373719   377915
           53248     128   116702   121174   571526   558482
           53248     256   133131   137286   720249   702101
           53248     512   140870   145269   907632   894129
           53248    1024   148632   152558  1025853  1071471
           53248    2048    69684    68052  1640169  1587587
           53248    4096    57389    65044  1932496  1923277
           53248    8192    65201    75412  1896445  1880839
           53248   16384    86395   109635  1784491  1777077


Mounting a tmpfs instead of the disk
nconnect=10
              kB  reclen    write  rewrite    read    reread
           53248       2    20766    21097    21248    21096
           53248       4    38718    39837    40282    40562
           53248       8    70787    73029    75134    75473
           53248      16   129871   135244   137464   137202
           53248      32   206931   225844   246440   243423
           53248      64   307101   324226   362781   363964
           53248     128   423743   437825   533503   539324
           53248     256   549566   600099   726419   756622
           53248     512   658211   723361   890941   902508
           53248    1024   771731   898627  1079691  1125845
           53248    2048   904072  1047097  1746060  1814433
           53248    4096  1197609  1278558  1780285  2390797
           53248    8192  1022231  1523377  1463727  1304735
           53248   16384  1321716  1716730  3913052  3861092

nconnect=9
              kB  reclen    write  rewrite    read    reread
           53248       2    18595    19418    19935    19555
           53248       4    38048    38871    39015    39058
           53248       8    70431    73903    73787    73437
           53248      16   115428   120146   108439   132652
           53248      32   189369   208458   238736   239319
           53248      64   310172   326099   351834   350228
           53248     128   419917   443973   540968   538233
           53248     256   542390   578625   724630   721654
           53248     512   636801   692928   876813   886978
           53248    1024   740769   807593  1023254  1038803
           53248    2048   900703   977706  1744465  1795702
           53248    4096   991434  1218405  2312809  1534298
           53248    8192   172671  1556220  3210650  1240208
           53248   16384  1135860  1732470  3855099  3912755

nconnect=8
              kB  reclen    write  rewrite    read    reread
           53248       2    20164    20622    20499    21020
           53248       4    38006    39090    40008    40093
           53248       8    70803    72965    75611    75827
           53248      16   125845   132516   135011   135602
           53248      32   216442   232697   239348   239241
           53248      64   288013   297895   356983   363912
           53248     128   418932   441833   520451   513015
           53248     256   560464   616810   726013   730965
           53248     512   674367   722693   903227   936461
           53248    1024   761283   840974  1089472  1128827
           53248    2048   943060   924299  1467459  1666917
           53248    4096   970724  1052788  2433414  1938400
           53248    8192  1342030  1089869   464917  3304996
           53248   16384  1458436  1095725  3794363  1635401

nconnect=7
              kB  reclen    write  rewrite    read    reread
           53248       2    20482    21154    21481    21409
           53248       4    39328    40445    41006    40581
           53248       8    75042    77753    80518    79727
           53248      16   131785   136573   139394   138978
           53248      32   150097   209044   249709   250655
           53248      64   316353   333310   380193   383393
           53248     128   427594   453668   573614   573235
           53248     256   568166   611842   751230   753997
           53248     512   655601   718936   909862   920353
           53248    1024   749337   824988  1073221  1092846
           53248    2048   959526   991769  1722507  1835308
           53248    4096  1114485  1273084   824029  2244745
           53248    8192  1096944  1590424  3208102  1612757
           53248   16384   186085  1777460  2446002  3071636

nconnect=6
              kB  reclen    write  rewrite    read    reread
           53248       2    19954    20159    20472    20692
           53248       4    38829    39657    40025    39943
           53248       8    70936    73492    74566    74764
           53248      16   119267   123319   136927   136591
           53248      32   193462   227254   239441   240293
           53248      64   280700   280861   348085   352502
           53248     128   410708   433280   268324   480572
           53248     256   549707   599025   705775   721743
           53248     512   694691   777286   834676   831794
           53248    1024   796161   899669   985672  1011762
           53248    2048   660219  1095097  1442643  1536969
           53248    4096   713024  1097287  2375110  2278199
           53248    8192   961825   814827  1414807  1073586
           53248   16384  1302666   188459   789169  3799328

nconnect=5
              kB  reclen    write  rewrite    read    reread
           53248       2    20083    20853    21387    21790
           53248       4    39346    40634    41595    41911
           53248       8    72275    75203    78950    79016
           53248      16   110484   128308   135731   131166
           53248      32   202718   216528   239493   240653
           53248      64   293191   298468   379034   382413
           53248     128   457944   496666   551294   555308
           53248     256   595181   641156   750500   751126
           53248     512   694337   787317   895434   898956
           53248    1024   761906   854799  1064769  1073980
           53248    2048   946967  1116994  1735369  1746934
           53248    4096   392953  1355423  2615086  2455756
           53248    8192  1356030  1578369  3033668  3172360
           53248   16384  1454587  1743974  3562513  3540975

nconnect=4
              kB  reclen    write  rewrite    read    reread
           53248       2    20228    20092    21908    22120
           53248       4    38694    39986    41089    41153
           53248       8    75699    78465    80083    80017
           53248      16   102728   130883   135680   141924
           53248      32   220118   231684   240910   249315
           53248      64   302994   321295   385046   386325
           53248     128   457099   488792   564420   563577
           53248     256   586191   676053   767127   776559
           53248     512   715344   782611   899003   906520
           53248    1024   771923   874051  1182348  1256440
           53248    2048   969607  1104706  1557321  1911278
           53248    4096  1179644   981022  1722069  2709534
           53248    8192  1216820  1556373  3159477  3254646
           53248   16384  1508198   605894  3517653  3571029

nconnect=3
              kB  reclen    write  rewrite    read    reread
           53248       2    21481    21763    21988    21311
           53248       4    39828    40888    41669    41768
           53248       8    65010    76085    80491    80466
           53248      16   123527   135609   143423   144154
           53248      32   225695   236990   250957   251665
           53248      64   320309   348847   396364   396967
           53248     128   426707   452220   565097   565103
           53248     256   558951   600620   763477   767196
           53248     512   668986   726410   972622   989905
           53248    1024   782668   839173  1183444  1149741
           53248    2048   974740  1075588  1853002  1885892
           53248    4096  1198605  1308529  1270347  1624458
           53248    8192   936760  1609546  2008581  2949932
           53248   16384   579957  1068755  1254678  1268465

nconnect=2
              kB  reclen    write  rewrite     read   reread
           53248       2    20386    21137    21406    21519
           53248       4    38273    39530    40406    41521
           53248       8    73789    73972    78914    79116
           53248      16   127961   133436   138270   137096
           53248      32   213333   231143   238689   239144
           53248      64   292544   301586   372603   374027
           53248     128   449001   480655   552909   532209
           53248     256   551713   611455   726627   738374
           53248     512   652788   745258   845863   848531
           53248    1024   822491   904270  1080454  1024272
           53248    2048   829847   948519  2001870  1985974
           53248    4096  1198116  1387247  2519900  2503433
           53248    8192  1345305  1475502  2918073  3259019
           53248   16384   634718   475630  3128884  2969906

nconnect=1
              kB  reclen    write  rewrite     read   reread
           53248       2    21288    21799    21638    21763
           53248       4    40599    42412    42758    42762
           53248       8    75734    78713    80072    81414
           53248      16   124331   133874   148128   148421
           53248      32   229738   242286   261479   262589
           53248      64   337174   357993   385598   391051
           53248     128   428862   462394   582345   576003
           53248     256   527788   530506   780829   790614
           53248     512   668147   732605  1071388  1058561
           53248    1024   823391   921211  1218651  1223558
           53248    2048  1016144  1111789  1600626  1585513
           53248    4096  1251567  1436417  1818215  1868426
           53248    8192  1479547  1716916  1804469  1789697
           53248   16384  1435145  1954500  1796230  1799570



On Sun, Apr 30, 2017 at 8:49 AM Trond Myklebust
<trond.myklebust@primarydata.com> wrote:
>
> In the spirit of experimentation, I've put together a set of patches
> that implement setting up multiple TCP connections to the server.
> The connections all go to the same server IP address, so do not
> provide support for multiple IP addresses (which I believe is
> something Andy Adamson is working on).
>
> The feature is only enabled for NFSv4.1 and NFSv4.2 for now; I don't
> feel comfortable subjecting NFSv3/v4 replay caches to this
> treatment yet. It relies on the mount option "nconnect" to specify
> the number of connections to set up. So you can do something like
>   'mount -t nfs -overs=4.1,nconnect=8 foo:/bar /mnt'
> to set up 8 TCP connections to server 'foo'.
>
> Anyhow, feel free to test and give me feedback as to whether or not
> this helps performance on your system.
>
> Trond Myklebust (5):
>   SUNRPC: Allow creation of RPC clients with multiple connections
>   NFS: Add a mount option to specify number of TCP connections to use
>   NFSv4: Allow multiple connections to NFSv4.x (x>0) servers
>   pNFS: Allow multiple connections to the DS
>   NFS: Display the "nconnect" mount option if it is set.
>
>  fs/nfs/client.c             |  2 ++
>  fs/nfs/internal.h           |  2 ++
>  fs/nfs/nfs3client.c         |  3 +++
>  fs/nfs/nfs4client.c         | 13 +++++++++++--
>  fs/nfs/super.c              | 12 ++++++++++++
>  include/linux/nfs_fs_sb.h   |  1 +
>  include/linux/sunrpc/clnt.h |  1 +
>  net/sunrpc/clnt.c           | 17 ++++++++++++++++-
>  net/sunrpc/xprtmultipath.c  |  3 +--
>  9 files changed, 49 insertions(+), 5 deletions(-)
>
> --
> 2.9.3
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH 0/5] Fun with the multipathing code
  2019-01-09 19:39 ` Olga Kornievskaia
@ 2019-01-09 20:38   ` Trond Myklebust
  2019-01-09 22:18     ` Olga Kornievskaia
  0 siblings, 1 reply; 24+ messages in thread
From: Trond Myklebust @ 2019-01-09 20:38 UTC (permalink / raw)
  To: aglo; +Cc: linux-nfs

Hi Olga

On Wed, 2019-01-09 at 14:39 -0500, Olga Kornievskaia wrote:
> Hi Trond,
> 
> Do you have any plans for this patch set?
> 
> I applied the patches on top of 4.20-rc7 kernel I had and tested it
> (linux to linux) with iozone on the hardware (40G link with Mellanox
> CX-5 card).
> 
> Results seem to show read IO improvement from 1.9GB/s to 3.9GB/s.
> Write IO speed seems to be the same (disk bound I'm guessing). I also
> tried mounting tmpfs. Same thing.
> 
> Seems like a useful feature to include?

Thanks for testing this.

Was this your own port of the original patches, or have you taken my
branch from 
http://git.linux-nfs.org/?p=trondmy/linux-nfs.git;a=shortlog;h=refs/heads/multipath_tcp
?

Either way I appreciate the data point. I haven't seen too many other
reports of performance improvements, and that's the main reason why
this patchset has languished.

3.9GB/s would be about 31Gbps, so that is not quite wire speed, but
certainly a big improvement on 1.9GB/s. I'm a little surprised, though,
that the write performance did not improve with the tmpfs. Was all this
using aio+dio on the client?

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH 0/5] Fun with the multipathing code
  2019-01-09 20:38   ` Trond Myklebust
@ 2019-01-09 22:18     ` Olga Kornievskaia
  0 siblings, 0 replies; 24+ messages in thread
From: Olga Kornievskaia @ 2019-01-09 22:18 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-nfs

On Wed, Jan 9, 2019 at 3:38 PM Trond Myklebust <trondmy@hammerspace.com> wrote:
>
> Hi Olga
>
> On Wed, 2019-01-09 at 14:39 -0500, Olga Kornievskaia wrote:
> > Hi Trond,
> >
> > Do you have any plans for this patch set?
> >
> > I applied the patches on top of 4.20-rc7 kernel I had and tested it
> > (linux to linux) with iozone on the hardware (40G link with Mellanox
> > CX-5 card).
> >
> > Results seem to show read IO improvement from 1.9GB/s to 3.9GB/s.
> > Write IO speed seems to be the same (disk bound I'm guessing). I also
> > tried mounting tmpfs. Same thing.
> >
> > Seems like a useful feature to include?
>
> Thanks for testing this.
>
> Was this your own port of the original patches, or have you taken my
> branch from
> http://git.linux-nfs.org/?p=trondmy/linux-nfs.git;a=shortlog;h=refs/heads/multipath_tcp
> ?

I didn't know one existed. I just took original patches from the
mailing list and applied to 4.20-rc7 (they applied without issues that
I recall).

> Either way I appreciate the data point. I haven't seen too many other
> reports of performance improvements, and that's the main reason why
> this patchset has languished.
>
> 3.9GB/s would be about 31Gbps, so that is not quite wire speed, but
> certainly a big improvement on 1.9GB/s.

Maybe it's the lab setup that's not tuned to achieve max performance.

> I'm a little surprised, though,
> that the write performance did not improve with the tmpfs. Was all this
> using aio+dio on the client?

It is whatever "iozone -i0 -i1 -s52m -y2k -az -I" translates to.

To clarify: by "didn't improve" I didn't mean that the write speed with
disk is the same as the write speed with tmpfs (disk write speed is
~168MB/s and tmpfs write speed is ~1.47GB/s). I meant that it seems
that with nconnect=1 it already achieves the "max" performance of the
disk/tmpfs.

>
> --
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trond.myklebust@hammerspace.com
>
>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [RFC PATCH 4/5] pNFS: Allow multiple connections to the DS
  2017-05-02 16:38           ` [PATCH 3/3] pNFS: Fix a typo in pnfs_generic_alloc_ds_commits Trond Myklebust
@ 2017-05-02 16:38             ` Trond Myklebust
  0 siblings, 0 replies; 24+ messages in thread
From: Trond Myklebust @ 2017-05-02 16:38 UTC (permalink / raw)
  To: linux-nfs

If the user specifies the -o nconnect=<number> mount option, and the
transport protocol is TCP, then set up <number> connections to the pNFS
data server as well. The connections will all go to the same IP address.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
---
 fs/nfs/nfs3client.c | 3 +++
 fs/nfs/nfs4client.c | 3 +++
 2 files changed, 6 insertions(+)

diff --git a/fs/nfs/nfs3client.c b/fs/nfs/nfs3client.c
index 7879f2a0fcfd..8c624c74ddbe 100644
--- a/fs/nfs/nfs3client.c
+++ b/fs/nfs/nfs3client.c
@@ -100,6 +100,9 @@ struct nfs_client *nfs3_set_ds_client(struct nfs_server *mds_srv,
 		return ERR_PTR(-EINVAL);
 	cl_init.hostname = buf;
 
+	if (mds_clp->cl_nconnect > 1 && ds_proto == XPRT_TRANSPORT_TCP)
+		cl_init.nconnect = mds_clp->cl_nconnect;
+
 	if (mds_srv->flags & NFS_MOUNT_NORESVPORT)
 		set_bit(NFS_CS_NORESVPORT, &cl_init.init_flags);
 
diff --git a/fs/nfs/nfs4client.c b/fs/nfs/nfs4client.c
index c9b10b7829f0..bfea1b232dd2 100644
--- a/fs/nfs/nfs4client.c
+++ b/fs/nfs/nfs4client.c
@@ -912,6 +912,9 @@ struct nfs_client *nfs4_set_ds_client(struct nfs_server *mds_srv,
 		return ERR_PTR(-EINVAL);
 	cl_init.hostname = buf;
 
+	if (mds_clp->cl_nconnect > 1 && ds_proto == XPRT_TRANSPORT_TCP)
+		cl_init.nconnect = mds_clp->cl_nconnect;
+
 	if (mds_srv->flags & NFS_MOUNT_NORESVPORT)
 		__set_bit(NFS_CS_NORESVPORT, &cl_init.init_flags);
 
-- 
2.9.3


^ permalink raw reply related	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2019-01-09 22:18 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-04-28 17:25 [RFC PATCH 0/5] Fun with the multipathing code Trond Myklebust
2017-04-28 17:25 ` [RFC PATCH 1/5] SUNRPC: Allow creation of RPC clients with multiple connections Trond Myklebust
2017-04-28 17:25   ` [RFC PATCH 2/5] NFS: Add a mount option to specify number of TCP connections to use Trond Myklebust
2017-04-28 17:25     ` [RFC PATCH 3/5] NFSv4: Allow multiple connections to NFSv4.x (x>0) servers Trond Myklebust
2017-04-28 17:25       ` [RFC PATCH 4/5] pNFS: Allow multiple connections to the DS Trond Myklebust
2017-04-28 17:25         ` [RFC PATCH 5/5] NFS: Display the "nconnect" mount option if it is set Trond Myklebust
2017-05-04 13:45     ` [RFC PATCH 2/5] NFS: Add a mount option to specify number of TCP connections to use Chuck Lever
2017-05-04 13:53       ` Chuck Lever
2017-05-04 16:01       ` Chuck Lever
2017-05-04 17:36         ` J. Bruce Fields
2017-05-04 17:38           ` Chuck Lever
2017-05-04 17:45             ` J. Bruce Fields
2017-05-04 18:55               ` Chuck Lever
2017-05-04 19:58                 ` J. Bruce Fields
2017-05-04 20:40               ` Trond Myklebust
2017-05-04 20:42                 ` bfields
2017-04-28 17:45 ` [RFC PATCH 0/5] Fun with the multipathing code Chuck Lever
2017-04-28 18:08   ` Trond Myklebust
2017-04-29 17:53     ` Chuck Lever
2017-05-04 19:09 ` Anna Schumaker
2019-01-09 19:39 ` Olga Kornievskaia
2019-01-09 20:38   ` Trond Myklebust
2019-01-09 22:18     ` Olga Kornievskaia
2017-05-02 16:38 [PATCH 0/3] Fix up a couple of issues around layout handling Trond Myklebust
2017-05-02 16:38 ` [RFC PATCH 1/5] SUNRPC: Allow creation of RPC clients with multiple connections Trond Myklebust
2017-05-02 16:38   ` [PATCH 1/3] pNFS: Don't clear the layout return info if there are segments to return Trond Myklebust
2017-05-02 16:38     ` [RFC PATCH 2/5] NFS: Add a mount option to specify number of TCP connections to use Trond Myklebust
2017-05-02 16:38       ` [PATCH 2/3] pNFS: Fix a deadlock when coalescing writes and returning the layout Trond Myklebust
2017-05-02 16:38         ` [RFC PATCH 3/5] NFSv4: Allow multiple connections to NFSv4.x (x>0) servers Trond Myklebust
2017-05-02 16:38           ` [PATCH 3/3] pNFS: Fix a typo in pnfs_generic_alloc_ds_commits Trond Myklebust
2017-05-02 16:38             ` [RFC PATCH 4/5] pNFS: Allow multiple connections to the DS Trond Myklebust
