Linux-NFS Archive on lore.kernel.org
* [PATCH v4 0/8] server-side support for "inter" SSC copy
@ 2019-07-08 19:23 Olga Kornievskaia
  2019-07-08 19:23 ` [PATCH v4 1/8] NFSD fill-in netloc4 structure Olga Kornievskaia
                   ` (8 more replies)
  0 siblings, 9 replies; 51+ messages in thread
From: Olga Kornievskaia @ 2019-07-08 19:23 UTC (permalink / raw)
  To: bfields; +Cc: linux-nfs

This patch series adds support for the NFSv4.2 copy offload feature,
allowing copies between two different NFS servers.

This functionality depends on the VFS ability to support generic
copy_file_range() where a copy is done between an NFS file and
a local file system. This is on top of Amir's VFS generic copy
offload series.

This feature is enabled by the kernel module parameter
inter_copy_offload_enable and is disabled by default. There is
also a kernel configuration option, NFSD_V4_2_INTER_SSC, that
adds a dependency on the NFS client-side functions called from the
server.

These patches work on top of existing async intra copy offload
patches. For the "inter" SSC, the implementation only supports
asynchronous inter copy.

On the source server, upon receiving a COPY_NOTIFY, the server generates
a unique stateid that is kept in a global list. Upon receiving a READ
with a stateid, the code checks the normal list of open stateids and
now additionally checks the copy state list before deciding whether to
fail with BAD_STATEID or to use the one that matches. The stored stateid
is only valid if its first use falls within a chosen lease period
(currently 90s). When the source server receives an OFFLOAD_CANCEL, it
removes the stateid from the global list. Otherwise, the copy stateid
is removed upon the removal of its "parent" stateid
(open/lock/delegation stateid).

On the destination server, upon receiving a COPY request, the server
establishes the necessary clientid/session with the source server.
It calls into the NFS client code to establish the necessary
open stateid, filehandle, and file description (without doing an NFS
open). Then the server calls into copy_file_range() to perform the copy,
where the source file will issue NFS READs and then do local file
system writes (this depends on the VFS ability to do a cross-device
copy_file_range()).
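
As a rough user-space illustration of the copy step described above (not
the nfsd code path), the sketch below tries the copy_file_range system
call and falls back to a plain read/write loop when the kernel or the
file pair does not support it. The helper names try_offload() and
copy_range_or_fallback() are hypothetical, and the choice of errno
values that trigger the fallback is an assumption made for this example.
glibc also provides a copy_file_range() wrapper under _GNU_SOURCE; the
raw syscall is used here only to keep the example independent of
feature-test macros.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* One copy_file_range attempt at the current offsets of both fds. */
static ssize_t try_offload(int fd_in, int fd_out, size_t len)
{
#ifdef SYS_copy_file_range
	return (ssize_t)syscall(SYS_copy_file_range, fd_in, NULL,
				fd_out, NULL, len, 0U);
#else
	(void)fd_in; (void)fd_out; (void)len;
	errno = ENOSYS;		/* no such syscall on this platform */
	return -1;
#endif
}

/*
 * Copy up to len bytes from fd_in to fd_out. Uses the offloaded copy
 * while it works; on EXDEV, ENOSYS, EINVAL or EOPNOTSUPP it switches to
 * a read/write loop. Returns the number of bytes copied, or -1 on error.
 */
static ssize_t copy_range_or_fallback(int fd_in, int fd_out, size_t len)
{
	char buf[65536];
	size_t done = 0;

	assert(fd_in >= 0 && fd_out >= 0);

	while (done < len) {
		ssize_t n = try_offload(fd_in, fd_out, len - done);

		if (n > 0) {
			done += (size_t)n;
			continue;
		}
		if (n == 0)
			return (ssize_t)done;	/* EOF on the source */
		if (errno != EXDEV && errno != ENOSYS &&
		    errno != EINVAL && errno != EOPNOTSUPP)
			return -1;
		break;				/* offload unavailable */
	}

	while (done < len) {
		size_t want = len - done < sizeof(buf) ? len - done : sizeof(buf);
		ssize_t r = read(fd_in, buf, want);
		ssize_t off = 0;

		if (r < 0)
			return -1;
		if (r == 0)
			break;			/* EOF on the source */
		while (off < r) {
			ssize_t w = write(fd_out, buf + off, (size_t)(r - off));

			if (w < 0)
				return -1;
			off += w;
		}
		done += (size_t)r;
	}
	return (ssize_t)done;
}
```

The sketch does not retry on EINTR and does not handle non-blocking
descriptors; a real caller would need both.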

v4:
--- allowing for synchronous inter server-to-server copy
--- added missing offload_cancel on the source server

Numbers showing the performance improvement for large file
transfers were presented earlier; here are times for copying the
Linux kernel tree (which is mostly small files):
-- regular cp 6m1s (intra)
-- copy offload cp 4m11s (intra)
   -- benefit of using copy offload with small copies using sync copy
-- regular cp 6m9s (inter)
-- copy offload cp 6m3s (inter)
   -- same performance as traditional, as most copies fall back to
      a traditional copy

Olga Kornievskaia (8):
  NFSD fill-in netloc4 structure
  NFSD add ca_source_server<> to COPY
  NFSD return nfs4_stid in nfs4_preprocess_stateid_op
  NFSD add COPY_NOTIFY operation
  NFSD check stateids against copy stateids
  NFSD generalize nfsd4_compound_state flag names
  NFSD: allow inter server COPY to have a STALE source server fh
  NFSD add nfs4 inter ssc to nfsd4_copy

 fs/nfsd/Kconfig     |  10 ++
 fs/nfsd/nfs4proc.c  | 434 +++++++++++++++++++++++++++++++++++++++++++++++-----
 fs/nfsd/nfs4state.c | 135 ++++++++++++++--
 fs/nfsd/nfs4xdr.c   | 172 ++++++++++++++++++++-
 fs/nfsd/nfsd.h      |  32 ++++
 fs/nfsd/nfsfh.h     |   5 +-
 fs/nfsd/nfssvc.c    |   6 +
 fs/nfsd/state.h     |  25 ++-
 fs/nfsd/xdr4.h      |  37 ++++-
 9 files changed, 790 insertions(+), 66 deletions(-)

-- 
1.8.3.1



* [PATCH v4 1/8] NFSD fill-in netloc4 structure
  2019-07-08 19:23 [PATCH v4 0/8] server-side support for "inter" SSC copy Olga Kornievskaia
@ 2019-07-08 19:23 ` Olga Kornievskaia
  2019-07-17 21:13   ` bfields
  2019-07-08 19:23 ` [PATCH v4 2/8] NFSD add ca_source_server<> to COPY Olga Kornievskaia
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 51+ messages in thread
From: Olga Kornievskaia @ 2019-07-08 19:23 UTC (permalink / raw)
  To: bfields; +Cc: linux-nfs

From: Olga Kornievskaia <kolga@netapp.com>

nfs4.h defines the nfs42_netaddr structure, which represents netloc4.

Populate needed fields from the sockaddr structure.

This will be used by flexfiles and 4.2 inter copy.
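
For readers unfamiliar with the address format being filled in, here is
a minimal user-space sketch of the same idea: turn a socket address into
a netid plus a "universal address" string, where the port is appended as
two decimal bytes (so port 2049 becomes ".8.1"). The function name
format_uaddr() is hypothetical, inet_ntop() stands in for the kernel's
rpc_ntop(), and, as in the patch, the transport is simply assumed to be
TCP.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Fill netid ("tcp" or "tcp6") and uaddr ("host.p1.p2") from sa.
 * Returns the uaddr length, or -1 on an unsupported address family,
 * a conversion failure, or a buffer that is too small.
 */
static int format_uaddr(const struct sockaddr *sa,
			char *netid, size_t netid_sz,
			char *uaddr, size_t uaddr_sz)
{
	char host[INET6_ADDRSTRLEN];
	unsigned int port;
	int n;

	assert(sa && netid && uaddr);

	switch (sa->sa_family) {
	case AF_INET: {
		const struct sockaddr_in *sin = (const struct sockaddr_in *)sa;

		if (!inet_ntop(AF_INET, &sin->sin_addr, host, sizeof(host)))
			return -1;
		port = ntohs(sin->sin_port);
		snprintf(netid, netid_sz, "tcp");
		break;
	}
	case AF_INET6: {
		const struct sockaddr_in6 *sin6 = (const struct sockaddr_in6 *)sa;

		if (!inet_ntop(AF_INET6, &sin6->sin6_addr, host, sizeof(host)))
			return -1;
		port = ntohs(sin6->sin6_port);
		snprintf(netid, netid_sz, "tcp6");
		break;
	}
	default:
		return -1;
	}

	/* High byte of the port first, then the low byte. */
	n = snprintf(uaddr, uaddr_sz, "%s.%u.%u", host, port >> 8, port & 0xff);
	if (n < 0 || (size_t)n >= uaddr_sz)
		return -1;	/* truncated: report failure, not a bad length */
	return n;
}
```

Unlike the patch, which only warns on truncation, this sketch returns an
error so the caller never sees a length that exceeds the buffer.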

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
---
 fs/nfsd/nfsd.h | 32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h
index 24187b5..8f4fc50 100644
--- a/fs/nfsd/nfsd.h
+++ b/fs/nfsd/nfsd.h
@@ -19,6 +19,7 @@
 #include <linux/sunrpc/svc.h>
 #include <linux/sunrpc/svc_xprt.h>
 #include <linux/sunrpc/msg_prot.h>
+#include <linux/sunrpc/addr.h>
 
 #include <uapi/linux/nfsd/debug.h>
 
@@ -375,6 +376,37 @@ static inline bool nfsd4_spo_must_allow(struct svc_rqst *rqstp)
 
 extern const u32 nfsd_suppattrs[3][3];
 
+static inline u32 nfsd4_set_netaddr(struct sockaddr *addr,
+				    struct nfs42_netaddr *netaddr)
+{
+	struct sockaddr_in *sin = (struct sockaddr_in *)addr;
+	struct sockaddr_in6 *sin6 = (struct sockaddr_in6 *)addr;
+	unsigned int port;
+	size_t ret_addr, ret_port;
+
+	switch (addr->sa_family) {
+	case AF_INET:
+		port = ntohs(sin->sin_port);
+		sprintf(netaddr->netid, "tcp");
+		netaddr->netid_len = 3;
+		break;
+	case AF_INET6:
+		port = ntohs(sin6->sin6_port);
+		sprintf(netaddr->netid, "tcp6");
+		netaddr->netid_len = 4;
+		break;
+	default:
+		return nfserr_inval;
+	}
+	ret_addr = rpc_ntop(addr, netaddr->addr, sizeof(netaddr->addr));
+	ret_port = snprintf(netaddr->addr + ret_addr,
+			    RPCBIND_MAXUADDRLEN + 1 - ret_addr,
+			    ".%u.%u", port >> 8, port & 0xff);
+	WARN_ON(ret_port >= RPCBIND_MAXUADDRLEN + 1 - ret_addr);
+	netaddr->addr_len = ret_addr + ret_port;
+	return 0;
+}
+
 static inline bool bmval_is_subset(const u32 *bm1, const u32 *bm2)
 {
 	return !((bm1[0] & ~bm2[0]) ||
-- 
1.8.3.1



* [PATCH v4 2/8] NFSD add ca_source_server<> to COPY
  2019-07-08 19:23 [PATCH v4 0/8] server-side support for "inter" SSC copy Olga Kornievskaia
  2019-07-08 19:23 ` [PATCH v4 1/8] NFSD fill-in netloc4 structure Olga Kornievskaia
@ 2019-07-08 19:23 ` Olga Kornievskaia
  2019-07-17 21:40   ` bfields
  2019-07-08 19:23 ` [PATCH v4 3/8] NFSD return nfs4_stid in nfs4_preprocess_stateid_op Olga Kornievskaia
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 51+ messages in thread
From: Olga Kornievskaia @ 2019-07-08 19:23 UTC (permalink / raw)
  To: bfields; +Cc: linux-nfs

From: Olga Kornievskaia <kolga@netapp.com>

Decode the ca_source_server list that's sent but only use the
first entry. The presence of a non-empty list indicates an "inter" copy.
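
The decode strategy described above can be modelled in user space as
follows. This is a sketch, not the nfsd decoder: struct cursor, struct
netloc, MAX_OPAQUE and all function names are hypothetical, and the
NL4_* values are assumed to match netloc_type4 in RFC 7862. It reads the
list count, treats zero as an intra-server copy, keeps the first entry,
and decodes the remaining entries only to step over them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { NL4_NAME = 1, NL4_URL = 2, NL4_NETADDR = 3 };	/* assumed, RFC 7862 */
#define MAX_OPAQUE 1024		/* hypothetical cap on any string field */

struct cursor {
	const uint8_t *p;	/* next undecoded byte */
	size_t left;		/* bytes remaining */
};

struct netloc {
	uint32_t type;
	char str[MAX_OPAQUE + 1];	/* name/URL, or netid for NL4_NETADDR */
	char uaddr[MAX_OPAQUE + 1];	/* universal address, NL4_NETADDR only */
};

/* Decode one big-endian 32-bit word. */
static bool get_u32(struct cursor *c, uint32_t *out)
{
	if (c->left < 4)
		return false;
	*out = ((uint32_t)c->p[0] << 24) | ((uint32_t)c->p[1] << 16) |
	       ((uint32_t)c->p[2] << 8) | c->p[3];
	c->p += 4;
	c->left -= 4;
	return true;
}

/* Decode a length-prefixed opaque, padded to a 4-byte boundary. */
static bool get_opaque(struct cursor *c, char *dst)
{
	uint32_t len;
	size_t padded;

	if (!get_u32(c, &len) || len > MAX_OPAQUE)
		return false;
	padded = ((size_t)len + 3) & ~(size_t)3;
	if (c->left < padded)
		return false;
	memcpy(dst, c->p, len);
	dst[len] = '\0';
	c->p += padded;
	c->left -= padded;
	return true;
}

static bool decode_netloc(struct cursor *c, struct netloc *nl)
{
	if (!get_u32(c, &nl->type))
		return false;
	switch (nl->type) {
	case NL4_NAME:
	case NL4_URL:
		return get_opaque(c, nl->str);
	case NL4_NETADDR:
		return get_opaque(c, nl->str) && get_opaque(c, nl->uaddr);
	default:
		return false;
	}
}

/* Keep the first source server; decode and discard the rest. */
static bool decode_source_servers(struct cursor *c, struct netloc *first,
				  bool *intra)
{
	struct netloc scratch;
	uint32_t count, i;

	assert(c && first && intra);
	if (!get_u32(c, &count))
		return false;
	*intra = (count == 0);
	if (*intra)
		return true;
	if (!decode_netloc(c, first))
		return false;
	for (i = 1; i < count; i++)
		if (!decode_netloc(c, &scratch))
			return false;
	return true;
}
```

The sketch uses a stack scratch entry where the patch allocates a dummy
nl4_server; either way the extra entries must be consumed so that the
cursor ends up at the next operation in the compound.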

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
---
 fs/nfsd/nfs4xdr.c | 75 +++++++++++++++++++++++++++++++++++++++++++++++++++++--
 fs/nfsd/xdr4.h    | 12 +++++----
 2 files changed, 80 insertions(+), 7 deletions(-)

diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 52c4f6d..15f53bb 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -40,6 +40,7 @@
 #include <linux/utsname.h>
 #include <linux/pagemap.h>
 #include <linux/sunrpc/svcauth_gss.h>
+#include <linux/sunrpc/addr.h>
 
 #include "idmap.h"
 #include "acl.h"
@@ -1744,11 +1745,58 @@ static __be32 nfsd4_decode_reclaim_complete(struct nfsd4_compoundargs *argp, str
 	DECODE_TAIL;
 }
 
+static __be32 nfsd4_decode_nl4_server(struct nfsd4_compoundargs *argp,
+				      struct nl4_server *ns)
+{
+	DECODE_HEAD;
+	struct nfs42_netaddr *naddr;
+
+	READ_BUF(4);
+	ns->nl4_type = be32_to_cpup(p++);
+
+	/* currently support for 1 inter-server source server */
+	switch (ns->nl4_type) {
+	case NL4_NAME:
+	case NL4_URL:
+		READ_BUF(4);
+		ns->u.nl4_str_sz = be32_to_cpup(p++);
+		if (ns->u.nl4_str_sz > NFS4_OPAQUE_LIMIT)
+			goto xdr_error;
+
+		READ_BUF(ns->u.nl4_str_sz);
+		COPYMEM(ns->u.nl4_str,
+			ns->u.nl4_str_sz);
+		break;
+	case NL4_NETADDR:
+		naddr = &ns->u.nl4_addr;
+
+		READ_BUF(4);
+		naddr->netid_len = be32_to_cpup(p++);
+		if (naddr->netid_len > RPCBIND_MAXNETIDLEN)
+			goto xdr_error;
+
+		READ_BUF(naddr->netid_len + 4); /* 4 for uaddr len */
+		COPYMEM(naddr->netid, naddr->netid_len);
+
+		naddr->addr_len = be32_to_cpup(p++);
+		if (naddr->addr_len > RPCBIND_MAXUADDRLEN)
+			goto xdr_error;
+
+		READ_BUF(naddr->addr_len);
+		COPYMEM(naddr->addr, naddr->addr_len);
+		break;
+	default:
+		goto xdr_error;
+	}
+	DECODE_TAIL;
+}
+
 static __be32
 nfsd4_decode_copy(struct nfsd4_compoundargs *argp, struct nfsd4_copy *copy)
 {
 	DECODE_HEAD;
-	unsigned int tmp;
+	struct nl4_server *ns_dummy;
+	int i, count;
 
 	status = nfsd4_decode_stateid(argp, &copy->cp_src_stateid);
 	if (status)
@@ -1763,8 +1811,31 @@ static __be32 nfsd4_decode_reclaim_complete(struct nfsd4_compoundargs *argp, str
 	p = xdr_decode_hyper(p, &copy->cp_count);
 	p++; /* ca_consecutive: we always do consecutive copies */
 	copy->cp_synchronous = be32_to_cpup(p++);
-	tmp = be32_to_cpup(p); /* Source server list not supported */
+	count = be32_to_cpup(p++);
 
+	copy->cp_intra = false;
+	if (count == 0) { /* intra-server copy */
+		copy->cp_intra = true;
+		goto intra;
+	}
+
+	/* decode all the supplied server addresses but use first */
+	status = nfsd4_decode_nl4_server(argp, &copy->cp_src);
+	if (status)
+		return status;
+
+	ns_dummy = kmalloc(sizeof(struct nl4_server), GFP_KERNEL);
+	if (ns_dummy == NULL)
+		return nfserrno(-ENOMEM);
+	for (i = 0; i < count - 1; i++) {
+		status = nfsd4_decode_nl4_server(argp, ns_dummy);
+		if (status) {
+			kfree(ns_dummy);
+			return status;
+		}
+	}
+	kfree(ns_dummy);
+intra:
 	DECODE_TAIL;
 }
 
diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h
index feeb6d4..513c9ff 100644
--- a/fs/nfsd/xdr4.h
+++ b/fs/nfsd/xdr4.h
@@ -516,11 +516,13 @@ struct nfsd42_write_res {
 
 struct nfsd4_copy {
 	/* request */
-	stateid_t	cp_src_stateid;
-	stateid_t	cp_dst_stateid;
-	u64		cp_src_pos;
-	u64		cp_dst_pos;
-	u64		cp_count;
+	stateid_t		cp_src_stateid;
+	stateid_t		cp_dst_stateid;
+	u64			cp_src_pos;
+	u64			cp_dst_pos;
+	u64			cp_count;
+	struct nl4_server	cp_src;
+	bool			cp_intra;
 
 	/* both */
 	bool		cp_synchronous;
-- 
1.8.3.1



* [PATCH v4 3/8] NFSD return nfs4_stid in nfs4_preprocess_stateid_op
  2019-07-08 19:23 [PATCH v4 0/8] server-side support for "inter" SSC copy Olga Kornievskaia
  2019-07-08 19:23 ` [PATCH v4 1/8] NFSD fill-in netloc4 structure Olga Kornievskaia
  2019-07-08 19:23 ` [PATCH v4 2/8] NFSD add ca_source_server<> to COPY Olga Kornievskaia
@ 2019-07-08 19:23 ` Olga Kornievskaia
  2019-07-08 19:23 ` [PATCH v4 4/8] NFSD add COPY_NOTIFY operation Olga Kornievskaia
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 51+ messages in thread
From: Olga Kornievskaia @ 2019-07-08 19:23 UTC (permalink / raw)
  To: bfields; +Cc: linux-nfs

Needed for copy to add nfs4_cp_state to the nfs4_stid.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
---
 fs/nfsd/nfs4proc.c  | 17 ++++++++++-------
 fs/nfsd/nfs4state.c |  8 ++++++--
 fs/nfsd/state.h     |  3 ++-
 3 files changed, 18 insertions(+), 10 deletions(-)

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 8beda99..cfd8767 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -782,7 +782,8 @@ static __be32 nfsd4_do_lookupp(struct svc_rqst *rqstp, struct svc_fh *fh)
 	/* check stateid */
 	status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate->current_fh,
 					&read->rd_stateid, RD_STATE,
-					&read->rd_filp, &read->rd_tmp_file);
+					&read->rd_filp, &read->rd_tmp_file,
+					NULL);
 	if (status) {
 		dprintk("NFSD: nfsd4_read: couldn't process stateid!\n");
 		goto out;
@@ -954,7 +955,7 @@ static __be32 nfsd4_do_lookupp(struct svc_rqst *rqstp, struct svc_fh *fh)
 	if (setattr->sa_iattr.ia_valid & ATTR_SIZE) {
 		status = nfs4_preprocess_stateid_op(rqstp, cstate,
 				&cstate->current_fh, &setattr->sa_stateid,
-				WR_STATE, NULL, NULL);
+				WR_STATE, NULL, NULL, NULL);
 		if (status) {
 			dprintk("NFSD: nfsd4_setattr: couldn't process stateid!\n");
 			return status;
@@ -1005,7 +1006,7 @@ static __be32 nfsd4_do_lookupp(struct svc_rqst *rqstp, struct svc_fh *fh)
 	trace_nfsd_write_start(rqstp, &cstate->current_fh,
 			       write->wr_offset, cnt);
 	status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate->current_fh,
-						stateid, WR_STATE, &filp, NULL);
+					stateid, WR_STATE, &filp, NULL, NULL);
 	if (status) {
 		dprintk("NFSD: nfsd4_write: couldn't process stateid!\n");
 		return status;
@@ -1040,14 +1041,16 @@ static __be32 nfsd4_do_lookupp(struct svc_rqst *rqstp, struct svc_fh *fh)
 		return nfserr_nofilehandle;
 
 	status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate->save_fh,
-					    src_stateid, RD_STATE, src, NULL);
+					    src_stateid, RD_STATE, src, NULL,
+					    NULL);
 	if (status) {
 		dprintk("NFSD: %s: couldn't process src stateid!\n", __func__);
 		goto out;
 	}
 
 	status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate->current_fh,
-					    dst_stateid, WR_STATE, dst, NULL);
+					    dst_stateid, WR_STATE, dst, NULL,
+					    NULL);
 	if (status) {
 		dprintk("NFSD: %s: couldn't process dst stateid!\n", __func__);
 		goto out_put_src;
@@ -1351,7 +1354,7 @@ struct nfsd4_copy *
 
 	status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate->current_fh,
 					    &fallocate->falloc_stateid,
-					    WR_STATE, &file, NULL);
+					    WR_STATE, &file, NULL, NULL);
 	if (status != nfs_ok) {
 		dprintk("NFSD: nfsd4_fallocate: couldn't process stateid!\n");
 		return status;
@@ -1410,7 +1413,7 @@ struct nfsd4_copy *
 
 	status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate->current_fh,
 					    &seek->seek_stateid,
-					    RD_STATE, &file, NULL);
+					    RD_STATE, &file, NULL, NULL);
 	if (status) {
 		dprintk("NFSD: nfsd4_seek: couldn't process stateid!\n");
 		return status;
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 618e660..05c0295 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -5187,7 +5187,8 @@ static __be32 nfsd4_validate_stateid(struct nfs4_client *cl, stateid_t *stateid)
 __be32
 nfs4_preprocess_stateid_op(struct svc_rqst *rqstp,
 		struct nfsd4_compound_state *cstate, struct svc_fh *fhp,
-		stateid_t *stateid, int flags, struct file **filpp, bool *tmp_file)
+		stateid_t *stateid, int flags, struct file **filpp,
+		bool *tmp_file, struct nfs4_stid **cstid)
 {
 	struct inode *ino = d_inode(fhp->fh_dentry);
 	struct net *net = SVC_NET(rqstp);
@@ -5238,8 +5239,11 @@ static __be32 nfsd4_validate_stateid(struct nfs4_client *cl, stateid_t *stateid)
 	if (!status && filpp)
 		status = nfs4_check_file(rqstp, fhp, s, filpp, tmp_file, flags);
 out:
-	if (s)
+	if (s) {
+		if (!status && cstid)
+			*cstid = s;
 		nfs4_put_stid(s);
+	}
 	return status;
 }
 
diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
index 0b74d37..5da9cc3 100644
--- a/fs/nfsd/state.h
+++ b/fs/nfsd/state.h
@@ -607,7 +607,8 @@ struct nfsd4_blocked_lock {
 
 extern __be32 nfs4_preprocess_stateid_op(struct svc_rqst *rqstp,
 		struct nfsd4_compound_state *cstate, struct svc_fh *fhp,
-		stateid_t *stateid, int flags, struct file **filp, bool *tmp_file);
+		stateid_t *stateid, int flags, struct file **filp,
+		bool *tmp_file, struct nfs4_stid **cstid);
 __be32 nfsd4_lookup_stateid(struct nfsd4_compound_state *cstate,
 		     stateid_t *stateid, unsigned char typemask,
 		     struct nfs4_stid **s, struct nfsd_net *nn);
-- 
1.8.3.1



* [PATCH v4 4/8] NFSD add COPY_NOTIFY operation
  2019-07-08 19:23 [PATCH v4 0/8] server-side support for "inter" SSC copy Olga Kornievskaia
                   ` (2 preceding siblings ...)
  2019-07-08 19:23 ` [PATCH v4 3/8] NFSD return nfs4_stid in nfs4_preprocess_stateid_op Olga Kornievskaia
@ 2019-07-08 19:23 ` Olga Kornievskaia
  2019-07-09 12:34   ` Anna Schumaker
                     ` (3 more replies)
  2019-07-08 19:23 ` [PATCH v4 5/8] NFSD check stateids against copy stateids Olga Kornievskaia
                   ` (4 subsequent siblings)
  8 siblings, 4 replies; 51+ messages in thread
From: Olga Kornievskaia @ 2019-07-08 19:23 UTC (permalink / raw)
  To: bfields; +Cc: linux-nfs

From: Olga Kornievskaia <kolga@netapp.com>

Introducing the COPY_NOTIFY operation.

Create a new unique stateid that will keep track of the copy
state and the upcoming READs that will use that stateid. Keep
it in the list associated with the parent stateid.

Return a single netaddr to advertise to the copy.
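
The bookkeeping described above can be pictured with a small
single-threaded user-space model. All names here (struct issuer,
struct parent_stid, cpntf_issue(), cpntf_release_all()) are hypothetical,
and a plain counter stands in for the idr and spinlock that the patch
uses. The point is only the shape of the data: each COPY_NOTIFY gets a
fresh id, the new state hangs off its parent stateid, and tearing down
the parent releases every state issued against it.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct cpntf_state {
	uint32_t id;			/* unique within one issuer */
	struct cpntf_state *next;	/* link in the parent's list */
};

struct parent_stid {
	struct cpntf_state *cp_list;	/* states issued against this stateid */
};

struct issuer {
	uint32_t next_id;		/* stand-in for the kernel's idr */
};

/* Issue a new copy-notify state and attach it to its parent. */
static struct cpntf_state *cpntf_issue(struct issuer *is,
				       struct parent_stid *parent)
{
	struct cpntf_state *cps;

	assert(is && parent);
	cps = calloc(1, sizeof(*cps));
	if (!cps)
		return NULL;
	cps->id = is->next_id++;
	cps->next = parent->cp_list;
	parent->cp_list = cps;
	return cps;
}

/* Free every state attached to the parent; returns how many were freed. */
static unsigned int cpntf_release_all(struct parent_stid *parent)
{
	unsigned int n = 0;

	assert(parent);
	while (parent->cp_list) {
		struct cpntf_state *cps = parent->cp_list;

		parent->cp_list = cps->next;
		free(cps);
		n++;
	}
	return n;
}
```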

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
---
 fs/nfsd/nfs4proc.c  | 71 +++++++++++++++++++++++++++++++++++----
 fs/nfsd/nfs4state.c | 64 +++++++++++++++++++++++++++++++----
 fs/nfsd/nfs4xdr.c   | 97 +++++++++++++++++++++++++++++++++++++++++++++++++++--
 fs/nfsd/state.h     | 18 ++++++++--
 fs/nfsd/xdr4.h      | 13 +++++++
 5 files changed, 247 insertions(+), 16 deletions(-)

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index cfd8767..c39fa72 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -37,6 +37,7 @@
 #include <linux/falloc.h>
 #include <linux/slab.h>
 #include <linux/kthread.h>
+#include <linux/sunrpc/addr.h>
 
 #include "idmap.h"
 #include "cache.h"
@@ -1033,7 +1034,8 @@ static __be32 nfsd4_do_lookupp(struct svc_rqst *rqstp, struct svc_fh *fh)
 static __be32
 nfsd4_verify_copy(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 		  stateid_t *src_stateid, struct file **src,
-		  stateid_t *dst_stateid, struct file **dst)
+		  stateid_t *dst_stateid, struct file **dst,
+		  struct nfs4_stid **stid)
 {
 	__be32 status;
 
@@ -1050,7 +1052,7 @@ static __be32 nfsd4_do_lookupp(struct svc_rqst *rqstp, struct svc_fh *fh)
 
 	status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate->current_fh,
 					    dst_stateid, WR_STATE, dst, NULL,
-					    NULL);
+					    stid);
 	if (status) {
 		dprintk("NFSD: %s: couldn't process dst stateid!\n", __func__);
 		goto out_put_src;
@@ -1081,7 +1083,7 @@ static __be32 nfsd4_do_lookupp(struct svc_rqst *rqstp, struct svc_fh *fh)
 	__be32 status;
 
 	status = nfsd4_verify_copy(rqstp, cstate, &clone->cl_src_stateid, &src,
-				   &clone->cl_dst_stateid, &dst);
+				   &clone->cl_dst_stateid, &dst, NULL);
 	if (status)
 		goto out;
 
@@ -1228,7 +1230,7 @@ static void dup_copy_fields(struct nfsd4_copy *src, struct nfsd4_copy *dst)
 
 static void cleanup_async_copy(struct nfsd4_copy *copy)
 {
-	nfs4_free_cp_state(copy);
+	nfs4_free_copy_state(copy);
 	fput(copy->file_dst);
 	fput(copy->file_src);
 	spin_lock(&copy->cp_clp->async_lock);
@@ -1268,7 +1270,7 @@ static int nfsd4_do_async_copy(void *data)
 
 	status = nfsd4_verify_copy(rqstp, cstate, &copy->cp_src_stateid,
 				   &copy->file_src, &copy->cp_dst_stateid,
-				   &copy->file_dst);
+				   &copy->file_dst, NULL);
 	if (status)
 		goto out;
 
@@ -1282,7 +1284,7 @@ static int nfsd4_do_async_copy(void *data)
 		async_copy = kzalloc(sizeof(struct nfsd4_copy), GFP_KERNEL);
 		if (!async_copy)
 			goto out;
-		if (!nfs4_init_cp_state(nn, copy)) {
+		if (!nfs4_init_copy_state(nn, copy)) {
 			kfree(async_copy);
 			goto out;
 		}
@@ -1346,6 +1348,42 @@ struct nfsd4_copy *
 }
 
 static __be32
+nfsd4_copy_notify(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
+		  union nfsd4_op_u *u)
+{
+	struct nfsd4_copy_notify *cn = &u->copy_notify;
+	__be32 status;
+	struct nfsd_net *nn = net_generic(SVC_NET(rqstp), nfsd_net_id);
+	struct nfs4_stid *stid;
+	struct nfs4_cpntf_state *cps;
+
+	status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate->current_fh,
+					&cn->cpn_src_stateid, RD_STATE, NULL,
+					NULL, &stid);
+	if (status)
+		return status;
+
+	cn->cpn_sec = nn->nfsd4_lease;
+	cn->cpn_nsec = 0;
+
+	status = nfserrno(-ENOMEM);
+	cps = nfs4_alloc_init_cpntf_state(nn, stid);
+	if (!cps)
+		return status;
+	memcpy(&cn->cpn_cnr_stateid, &cps->cp_stateid, sizeof(stateid_t));
+
+	/* For now, only return one server address in cpn_src, the
+	 * address used by the client to connect to this server.
+	 */
+	cn->cpn_src.nl4_type = NL4_NETADDR;
+	status = nfsd4_set_netaddr((struct sockaddr *)&rqstp->rq_daddr,
+				 &cn->cpn_src.u.nl4_addr);
+	WARN_ON_ONCE(status);
+
+	return status;
+}
+
+static __be32
 nfsd4_fallocate(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 		struct nfsd4_fallocate *fallocate, int flags)
 {
@@ -2298,6 +2336,21 @@ static inline u32 nfsd4_offload_status_rsize(struct svc_rqst *rqstp,
 		1 /* osr_complete<1> optional 0 for now */) * sizeof(__be32);
 }
 
+static inline u32 nfsd4_copy_notify_rsize(struct svc_rqst *rqstp,
+					struct nfsd4_op *op)
+{
+	return (op_encode_hdr_size +
+		3 /* cnr_lease_time */ +
+		1 /* We support one cnr_source_server */ +
+		1 /* cnr_stateid seq */ +
+		op_encode_stateid_maxsz /* cnr_stateid */ +
+		1 /* num cnr_source_server*/ +
+		1 /* nl4_type */ +
+		1 /* nl4 size */ +
+		XDR_QUADLEN(NFS4_OPAQUE_LIMIT) /*nl4_loc + nl4_loc_sz */)
+		* sizeof(__be32);
+}
+
 #ifdef CONFIG_NFSD_PNFS
 static inline u32 nfsd4_getdeviceinfo_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op)
 {
@@ -2722,6 +2775,12 @@ static inline u32 nfsd4_seek_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op)
 		.op_name = "OP_OFFLOAD_CANCEL",
 		.op_rsize_bop = nfsd4_only_status_rsize,
 	},
+	[OP_COPY_NOTIFY] = {
+		.op_func = nfsd4_copy_notify,
+		.op_flags = OP_MODIFIES_SOMETHING,
+		.op_name = "OP_COPY_NOTIFY",
+		.op_rsize_bop = nfsd4_copy_notify_rsize,
+	},
 };
 
 /**
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 05c0295..2555eb9 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -707,6 +707,7 @@ struct nfs4_stid *nfs4_alloc_stid(struct nfs4_client *cl, struct kmem_cache *sla
 	/* Will be incremented before return to client: */
 	refcount_set(&stid->sc_count, 1);
 	spin_lock_init(&stid->sc_lock);
+	INIT_LIST_HEAD(&stid->sc_cp_list);
 
 	/*
 	 * It shouldn't be a problem to reuse an opaque stateid value.
@@ -726,24 +727,53 @@ struct nfs4_stid *nfs4_alloc_stid(struct nfs4_client *cl, struct kmem_cache *sla
 /*
  * Create a unique stateid_t to represent each COPY.
  */
-int nfs4_init_cp_state(struct nfsd_net *nn, struct nfsd4_copy *copy)
+static int nfs4_init_cp_state(struct nfsd_net *nn, void *ptr, stateid_t *stid)
 {
 	int new_id;
 
 	idr_preload(GFP_KERNEL);
 	spin_lock(&nn->s2s_cp_lock);
-	new_id = idr_alloc_cyclic(&nn->s2s_cp_stateids, copy, 0, 0, GFP_NOWAIT);
+	new_id = idr_alloc_cyclic(&nn->s2s_cp_stateids, ptr, 0, 0, GFP_NOWAIT);
 	spin_unlock(&nn->s2s_cp_lock);
 	idr_preload_end();
 	if (new_id < 0)
 		return 0;
-	copy->cp_stateid.si_opaque.so_id = new_id;
-	copy->cp_stateid.si_opaque.so_clid.cl_boot = nn->boot_time;
-	copy->cp_stateid.si_opaque.so_clid.cl_id = nn->s2s_cp_cl_id;
+	stid->si_opaque.so_id = new_id;
+	stid->si_opaque.so_clid.cl_boot = nn->boot_time;
+	stid->si_opaque.so_clid.cl_id = nn->s2s_cp_cl_id;
 	return 1;
 }
 
-void nfs4_free_cp_state(struct nfsd4_copy *copy)
+int nfs4_init_copy_state(struct nfsd_net *nn, struct nfsd4_copy *copy)
+{
+	return nfs4_init_cp_state(nn, copy, &copy->cp_stateid);
+}
+
+struct nfs4_cpntf_state *nfs4_alloc_init_cpntf_state(struct nfsd_net *nn,
+						     struct nfs4_stid *p_stid)
+{
+	struct nfs4_cpntf_state *cps;
+
+	cps = kzalloc(sizeof(struct nfs4_cpntf_state), GFP_KERNEL);
+	if (!cps)
+		return NULL;
+	if (!nfs4_init_cp_state(nn, cps, &cps->cp_stateid))
+		goto out_free;
+	cps->cp_p_stid = p_stid;
+	cps->cp_active = false;
+	cps->cp_timeout = jiffies + (nn->nfsd4_lease * HZ);
+	INIT_LIST_HEAD(&cps->cp_list);
+	spin_lock(&nn->s2s_cp_lock);
+	list_add(&cps->cp_list, &p_stid->sc_cp_list);
+	spin_unlock(&nn->s2s_cp_lock);
+
+	return cps;
+out_free:
+	kfree(cps);
+	return NULL;
+}
+
+void nfs4_free_copy_state(struct nfsd4_copy *copy)
 {
 	struct nfsd_net *nn;
 
@@ -753,6 +783,27 @@ void nfs4_free_cp_state(struct nfsd4_copy *copy)
 	spin_unlock(&nn->s2s_cp_lock);
 }
 
+static void nfs4_free_cpntf_statelist(struct net *net, struct nfs4_stid *stid)
+{
+	struct nfs4_cpntf_state *cps;
+	struct nfsd_net *nn;
+
+	nn = net_generic(net, nfsd_net_id);
+
+	might_sleep();
+
+	spin_lock(&nn->s2s_cp_lock);
+	while (!list_empty(&stid->sc_cp_list)) {
+		cps = list_first_entry(&stid->sc_cp_list,
+				       struct nfs4_cpntf_state, cp_list);
+		list_del(&cps->cp_list);
+		idr_remove(&nn->s2s_cp_stateids,
+			   cps->cp_stateid.si_opaque.so_id);
+		kfree(cps);
+	}
+	spin_unlock(&nn->s2s_cp_lock);
+}
+
 static struct nfs4_ol_stateid * nfs4_alloc_open_stateid(struct nfs4_client *clp)
 {
 	struct nfs4_stid *stid;
@@ -901,6 +952,7 @@ static void block_delegations(struct knfsd_fh *fh)
 	}
 	idr_remove(&clp->cl_stateids, s->sc_stateid.si_opaque.so_id);
 	spin_unlock(&clp->cl_lock);
+	nfs4_free_cpntf_statelist(clp->net, s);
 	s->sc_free(s);
 	if (fp)
 		put_nfs4_file(fp);
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 15f53bb..ed37528 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -1847,6 +1847,22 @@ static __be32 nfsd4_decode_nl4_server(struct nfsd4_compoundargs *argp,
 }
 
 static __be32
+nfsd4_decode_copy_notify(struct nfsd4_compoundargs *argp,
+			 struct nfsd4_copy_notify *cn)
+{
+	int status;
+
+	status = nfsd4_decode_stateid(argp, &cn->cpn_src_stateid);
+	if (status)
+		return status;
+	status = nfsd4_decode_nl4_server(argp, &cn->cpn_dst);
+	if (status)
+		return status;
+
+	return status;
+}
+
+static __be32
 nfsd4_decode_seek(struct nfsd4_compoundargs *argp, struct nfsd4_seek *seek)
 {
 	DECODE_HEAD;
@@ -1947,7 +1963,7 @@ static __be32 nfsd4_decode_nl4_server(struct nfsd4_compoundargs *argp,
 	/* new operations for NFSv4.2 */
 	[OP_ALLOCATE]		= (nfsd4_dec)nfsd4_decode_fallocate,
 	[OP_COPY]		= (nfsd4_dec)nfsd4_decode_copy,
-	[OP_COPY_NOTIFY]	= (nfsd4_dec)nfsd4_decode_notsupp,
+	[OP_COPY_NOTIFY]	= (nfsd4_dec)nfsd4_decode_copy_notify,
 	[OP_DEALLOCATE]		= (nfsd4_dec)nfsd4_decode_fallocate,
 	[OP_IO_ADVISE]		= (nfsd4_dec)nfsd4_decode_notsupp,
 	[OP_LAYOUTERROR]	= (nfsd4_dec)nfsd4_decode_notsupp,
@@ -4336,6 +4352,45 @@ static __be32 nfsd4_encode_readv(struct nfsd4_compoundres *resp,
 }
 
 static __be32
+nfsd42_encode_nl4_server(struct nfsd4_compoundres *resp, struct nl4_server *ns)
+{
+	struct xdr_stream *xdr = &resp->xdr;
+	struct nfs42_netaddr *addr;
+	__be32 *p;
+
+	p = xdr_reserve_space(xdr, 4);
+	*p++ = cpu_to_be32(ns->nl4_type);
+
+	switch (ns->nl4_type) {
+	case NL4_NETADDR:
+		addr = &ns->u.nl4_addr;
+
+		/* netid_len, netid, uaddr_len, uaddr (port included
+		 * in RPCBIND_MAXUADDRLEN)
+		 */
+		p = xdr_reserve_space(xdr,
+			4 /* netid len */ +
+			(XDR_QUADLEN(addr->netid_len) * 4) +
+			4 /* uaddr len */ +
+			(XDR_QUADLEN(addr->addr_len) * 4));
+		if (!p)
+			return nfserr_resource;
+
+		*p++ = cpu_to_be32(addr->netid_len);
+		p = xdr_encode_opaque_fixed(p, addr->netid,
+					    addr->netid_len);
+		*p++ = cpu_to_be32(addr->addr_len);
+		p = xdr_encode_opaque_fixed(p, addr->addr,
+					addr->addr_len);
+		break;
+	default:
+		WARN_ON(ns->nl4_type != NL4_NETADDR);
+	}
+
+	return 0;
+}
+
+static __be32
 nfsd4_encode_copy(struct nfsd4_compoundres *resp, __be32 nfserr,
 		  struct nfsd4_copy *copy)
 {
@@ -4369,6 +4424,44 @@ static __be32 nfsd4_encode_readv(struct nfsd4_compoundres *resp,
 }
 
 static __be32
+nfsd4_encode_copy_notify(struct nfsd4_compoundres *resp, __be32 nfserr,
+			 struct nfsd4_copy_notify *cn)
+{
+	struct xdr_stream *xdr = &resp->xdr;
+	__be32 *p;
+
+	if (nfserr)
+		return nfserr;
+
+	/* 8 sec, 4 nsec */
+	p = xdr_reserve_space(xdr, 12);
+	if (!p)
+		return nfserr_resource;
+
+	/* cnr_lease_time */
+	p = xdr_encode_hyper(p, cn->cpn_sec);
+	*p++ = cpu_to_be32(cn->cpn_nsec);
+
+	/* cnr_stateid */
+	nfserr = nfsd4_encode_stateid(xdr, &cn->cpn_cnr_stateid);
+	if (nfserr)
+		return nfserr;
+
+	/* cnr_src.nl_nsvr */
+	p = xdr_reserve_space(xdr, 4);
+	if (!p)
+		return nfserr_resource;
+
+	*p++ = cpu_to_be32(1);
+
+	nfserr = nfsd42_encode_nl4_server(resp, &cn->cpn_src);
+	if (nfserr)
+		return nfserr;
+
+	return nfserr;
+}
+
+static __be32
 nfsd4_encode_seek(struct nfsd4_compoundres *resp, __be32 nfserr,
 		  struct nfsd4_seek *seek)
 {
@@ -4465,7 +4558,7 @@ static __be32 nfsd4_encode_readv(struct nfsd4_compoundres *resp,
 	/* NFSv4.2 operations */
 	[OP_ALLOCATE]		= (nfsd4_enc)nfsd4_encode_noop,
 	[OP_COPY]		= (nfsd4_enc)nfsd4_encode_copy,
-	[OP_COPY_NOTIFY]	= (nfsd4_enc)nfsd4_encode_noop,
+	[OP_COPY_NOTIFY]	= (nfsd4_enc)nfsd4_encode_copy_notify,
 	[OP_DEALLOCATE]		= (nfsd4_enc)nfsd4_encode_noop,
 	[OP_IO_ADVISE]		= (nfsd4_enc)nfsd4_encode_noop,
 	[OP_LAYOUTERROR]	= (nfsd4_enc)nfsd4_encode_noop,
diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
index 5da9cc3..106ed56 100644
--- a/fs/nfsd/state.h
+++ b/fs/nfsd/state.h
@@ -95,6 +95,7 @@ struct nfs4_stid {
 #define NFS4_REVOKED_DELEG_STID 16
 #define NFS4_CLOSED_DELEG_STID 32
 #define NFS4_LAYOUT_STID 64
+	struct list_head	sc_cp_list;
 	unsigned char		sc_type;
 	stateid_t		sc_stateid;
 	spinlock_t		sc_lock;
@@ -103,6 +104,17 @@ struct nfs4_stid {
 	void			(*sc_free)(struct nfs4_stid *);
 };
 
+/* Keep a list of stateids issued by the COPY_NOTIFY, associate it with the
+ * parent OPEN/LOCK/DELEG stateid.
+ */
+struct nfs4_cpntf_state {
+	stateid_t		cp_stateid;
+	struct list_head	cp_list;	/* per parent nfs4_stid */
+	struct nfs4_stid	*cp_p_stid;	/* pointer to parent */
+	bool			cp_active;	/* has the copy started */
+	unsigned long		cp_timeout;	/* copy timeout */
+};
+
 /*
  * Represents a delegation stateid. The nfs4_client holds references to these
  * and they are put when it is being destroyed or when the delegation is
@@ -614,8 +626,10 @@ __be32 nfsd4_lookup_stateid(struct nfsd4_compound_state *cstate,
 		     struct nfs4_stid **s, struct nfsd_net *nn);
 struct nfs4_stid *nfs4_alloc_stid(struct nfs4_client *cl, struct kmem_cache *slab,
 				  void (*sc_free)(struct nfs4_stid *));
-int nfs4_init_cp_state(struct nfsd_net *nn, struct nfsd4_copy *copy);
-void nfs4_free_cp_state(struct nfsd4_copy *copy);
+int nfs4_init_copy_state(struct nfsd_net *nn, struct nfsd4_copy *copy);
+void nfs4_free_copy_state(struct nfsd4_copy *copy);
+struct nfs4_cpntf_state *nfs4_alloc_init_cpntf_state(struct nfsd_net *nn,
+			struct nfs4_stid *p_stid);
 void nfs4_unhash_stid(struct nfs4_stid *s);
 void nfs4_put_stid(struct nfs4_stid *s);
 void nfs4_inc_and_copy_stateid(stateid_t *dst, struct nfs4_stid *stid);
diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h
index 513c9ff..bade8e5 100644
--- a/fs/nfsd/xdr4.h
+++ b/fs/nfsd/xdr4.h
@@ -568,6 +568,18 @@ struct nfsd4_offload_status {
 	u32		status;
 };
 
+struct nfsd4_copy_notify {
+	/* request */
+	stateid_t		cpn_src_stateid;
+	struct nl4_server	cpn_dst;
+
+	/* response */
+	stateid_t		cpn_cnr_stateid;
+	u64			cpn_sec;
+	u32			cpn_nsec;
+	struct nl4_server	cpn_src;
+};
+
 struct nfsd4_op {
 	int					opnum;
 	const struct nfsd4_operation *		opdesc;
@@ -627,6 +639,7 @@ struct nfsd4_op {
 		struct nfsd4_clone		clone;
 		struct nfsd4_copy		copy;
 		struct nfsd4_offload_status	offload_status;
+		struct nfsd4_copy_notify	copy_notify;
 		struct nfsd4_seek		seek;
 	} u;
 	struct nfs4_replay *			replay;
-- 
1.8.3.1



* [PATCH v4 5/8] NFSD check stateids against copy stateids
  2019-07-08 19:23 [PATCH v4 0/8] server-side support for "inter" SSC copy Olga Kornievskaia
                   ` (3 preceding siblings ...)
  2019-07-08 19:23 ` [PATCH v4 4/8] NFSD add COPY_NOTIFY operation Olga Kornievskaia
@ 2019-07-08 19:23 ` Olga Kornievskaia
  2019-07-19 22:01   ` bfields
  2019-07-08 19:23 ` [PATCH v4 6/8] NFSD generalize nfsd4_compound_state flag names Olga Kornievskaia
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 51+ messages in thread
From: Olga Kornievskaia @ 2019-07-08 19:23 UTC (permalink / raw)
  To: bfields; +Cc: linux-nfs

An incoming stateid (used by a READ) could be a saved copy stateid.
On first use, check that the copy has started within the allowable
lease time and, if so, mark the copy stateid active.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
---
 fs/nfsd/nfs4state.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 45 insertions(+)

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 2555eb9..b786625 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -5232,6 +5232,49 @@ static __be32 nfsd4_validate_stateid(struct nfs4_client *cl, stateid_t *stateid)
 
 	return 0;
 }
+/*
+ * A READ from an inter server to server COPY will have a
+ * copy stateid. Return the parent nfs4_stid.
+ */
+static __be32 _find_cpntf_state(struct nfsd_net *nn, stateid_t *st,
+		     struct nfs4_cpntf_state **cps)
+{
+	struct nfs4_cpntf_state *state = NULL;
+
+	if (st->si_opaque.so_clid.cl_id != nn->s2s_cp_cl_id)
+		return nfserr_bad_stateid;
+	spin_lock(&nn->s2s_cp_lock);
+	state = idr_find(&nn->s2s_cp_stateids, st->si_opaque.so_id);
+	if (state)
+		refcount_inc(&state->cp_p_stid->sc_count);
+	spin_unlock(&nn->s2s_cp_lock);
+	if (!state)
+		return nfserr_bad_stateid;
+	*cps = state;
+	return 0;
+}
+
+static __be32 find_cpntf_state(struct nfsd_net *nn, stateid_t *st,
+			       struct nfs4_stid **stid)
+{
+	__be32 status;
+	struct nfs4_cpntf_state *cps = NULL;
+
+	status = _find_cpntf_state(nn, st, &cps);
+	if (status)
+		return status;
+
+	/* Did the inter server to server copy start in time? */
+	if (cps->cp_active == false && !time_after(cps->cp_timeout, jiffies)) {
+		nfs4_put_stid(cps->cp_p_stid);
+		return nfserr_partner_no_auth;
+	} else
+		cps->cp_active = true;
+
+	*stid = cps->cp_p_stid;
+
+	return nfs_ok;
+}
 
 /*
  * Checks for stateid operations
@@ -5264,6 +5307,8 @@ static __be32 nfsd4_validate_stateid(struct nfs4_client *cl, stateid_t *stateid)
 	status = nfsd4_lookup_stateid(cstate, stateid,
 				NFS4_DELEG_STID|NFS4_OPEN_STID|NFS4_LOCK_STID,
 				&s, nn);
+	if (status == nfserr_bad_stateid)
+		status = find_cpntf_state(nn, stateid, &s);
 	if (status)
 		return status;
 	status = nfsd4_stid_check_stateid_generation(stateid, s,
-- 
1.8.3.1
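
The first-use check that patch 5/8 adds in find_cpntf_state() can be
sketched in user space as below. This is only an illustration under
stated assumptions, not the kernel code: struct cpntf_state,
cpntf_lookup(), cpntf_check() and the CPNTF_* status values are
hypothetical names, a linear array scan stands in for idr_find() under
s2s_cp_lock, and wall-clock seconds stand in for jiffies.

```c
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical user-space stand-in for struct nfs4_cpntf_state. */
struct cpntf_state {
	unsigned int id;	/* models si_opaque.so_id */
	bool active;		/* models cp_active */
	time_t deadline;	/* models cp_timeout, in seconds, not jiffies */
};

/* Hypothetical status codes standing in for the nfserr_* values. */
enum cpntf_status { CPNTF_OK, CPNTF_BAD_STATEID, CPNTF_EXPIRED };

/* Linear search; the patch uses idr_find() while holding s2s_cp_lock. */
static struct cpntf_state *cpntf_lookup(struct cpntf_state *tbl, size_t n,
					unsigned int id)
{
	for (size_t i = 0; i < n; i++)
		if (tbl[i].id == id)
			return &tbl[i];
	return NULL;
}

/*
 * The first use must happen before the deadline. Once the state has
 * been marked active, later READs are accepted regardless of the
 * deadline.
 */
static enum cpntf_status cpntf_check(struct cpntf_state *tbl, size_t n,
				     unsigned int id, time_t now)
{
	struct cpntf_state *cps = cpntf_lookup(tbl, n, id);

	if (!cps)
		return CPNTF_BAD_STATEID;
	if (!cps->active && now >= cps->deadline)
		return CPNTF_EXPIRED;
	cps->active = true;
	return CPNTF_OK;
}
```

In this model an unknown id maps to the BAD_STATEID case, and a state
that was never used before its deadline maps to the case where the
patch returns nfserr_partner_no_auth.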


* [PATCH v4 6/8] NFSD generalize nfsd4_compound_state flag names
  2019-07-08 19:23 [PATCH v4 0/8] server-side support for "inter" SSC copy Olga Kornievskaia
                   ` (4 preceding siblings ...)
  2019-07-08 19:23 ` [PATCH v4 5/8] NFSD check stateids against copy stateids Olga Kornievskaia
@ 2019-07-08 19:23 ` Olga Kornievskaia
  2019-07-08 19:23 ` [PATCH v4 7/8] NFSD: allow inter server COPY to have a STALE source server fh Olga Kornievskaia
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 51+ messages in thread
From: Olga Kornievskaia @ 2019-07-08 19:23 UTC (permalink / raw)
  To: bfields; +Cc: linux-nfs

From: Olga Kornievskaia <kolga@netapp.com>

Rename the flag macros so that the sid_flags field can also carry
flags that are not stateid-related.

Signed-off-by: Andy Adamson <andros@netapp.com>
---
 fs/nfsd/nfs4proc.c  | 8 ++++----
 fs/nfsd/nfs4state.c | 7 ++++---
 fs/nfsd/xdr4.h      | 6 +++---
 3 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index c39fa72..8c2273e 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -531,9 +531,9 @@ static __be32 nfsd4_open_omfg(struct svc_rqst *rqstp, struct nfsd4_compound_stat
 		return nfserr_restorefh;
 
 	fh_dup2(&cstate->current_fh, &cstate->save_fh);
-	if (HAS_STATE_ID(cstate, SAVED_STATE_ID_FLAG)) {
+	if (HAS_CSTATE_FLAG(cstate, SAVED_STATE_ID_FLAG)) {
 		memcpy(&cstate->current_stateid, &cstate->save_stateid, sizeof(stateid_t));
-		SET_STATE_ID(cstate, CURRENT_STATE_ID_FLAG);
+		SET_CSTATE_FLAG(cstate, CURRENT_STATE_ID_FLAG);
 	}
 	return nfs_ok;
 }
@@ -543,9 +543,9 @@ static __be32 nfsd4_open_omfg(struct svc_rqst *rqstp, struct nfsd4_compound_stat
 	     union nfsd4_op_u *u)
 {
 	fh_dup2(&cstate->save_fh, &cstate->current_fh);
-	if (HAS_STATE_ID(cstate, CURRENT_STATE_ID_FLAG)) {
+	if (HAS_CSTATE_FLAG(cstate, CURRENT_STATE_ID_FLAG)) {
 		memcpy(&cstate->save_stateid, &cstate->current_stateid, sizeof(stateid_t));
-		SET_STATE_ID(cstate, SAVED_STATE_ID_FLAG);
+		SET_CSTATE_FLAG(cstate, SAVED_STATE_ID_FLAG);
 	}
 	return nfs_ok;
 }
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index b786625..d7f4b96 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -7466,7 +7466,8 @@ static int nfs4_state_create_net(struct net *net)
 static void
 get_stateid(struct nfsd4_compound_state *cstate, stateid_t *stateid)
 {
-	if (HAS_STATE_ID(cstate, CURRENT_STATE_ID_FLAG) && CURRENT_STATEID(stateid))
+	if (HAS_CSTATE_FLAG(cstate, CURRENT_STATE_ID_FLAG) &&
+	    CURRENT_STATEID(stateid))
 		memcpy(stateid, &cstate->current_stateid, sizeof(stateid_t));
 }
 
@@ -7475,14 +7476,14 @@ static int nfs4_state_create_net(struct net *net)
 {
 	if (cstate->minorversion) {
 		memcpy(&cstate->current_stateid, stateid, sizeof(stateid_t));
-		SET_STATE_ID(cstate, CURRENT_STATE_ID_FLAG);
+		SET_CSTATE_FLAG(cstate, CURRENT_STATE_ID_FLAG);
 	}
 }
 
 void
 clear_current_stateid(struct nfsd4_compound_state *cstate)
 {
-	CLEAR_STATE_ID(cstate, CURRENT_STATE_ID_FLAG);
+	CLEAR_CSTATE_FLAG(cstate, CURRENT_STATE_ID_FLAG);
 }
 
 /*
diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h
index bade8e5..9d7318c 100644
--- a/fs/nfsd/xdr4.h
+++ b/fs/nfsd/xdr4.h
@@ -46,9 +46,9 @@
 #define CURRENT_STATE_ID_FLAG (1<<0)
 #define SAVED_STATE_ID_FLAG (1<<1)
 
-#define SET_STATE_ID(c, f) ((c)->sid_flags |= (f))
-#define HAS_STATE_ID(c, f) ((c)->sid_flags & (f))
-#define CLEAR_STATE_ID(c, f) ((c)->sid_flags &= ~(f))
+#define SET_CSTATE_FLAG(c, f) ((c)->sid_flags |= (f))
+#define HAS_CSTATE_FLAG(c, f) ((c)->sid_flags & (f))
+#define CLEAR_CSTATE_FLAG(c, f) ((c)->sid_flags &= ~(f))
 
 struct nfsd4_compound_state {
 	struct svc_fh		current_fh;
-- 
1.8.3.1
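
The renamed macros from patch 6/8 behave like ordinary bit-flag
helpers. A minimal user-space sketch follows; struct compound_state and
save_state() are reduced, hypothetical stand-ins for struct
nfsd4_compound_state and the SAVEFH handling, and EXAMPLE_OTHER_FLAG is
an invented non-stateid flag that does not appear in the series.

```c
#include <stdbool.h>

#define CURRENT_STATE_ID_FLAG (1 << 0)
#define SAVED_STATE_ID_FLAG   (1 << 1)
/* Hypothetical non-stateid flag, not part of the posted patch. */
#define EXAMPLE_OTHER_FLAG    (1 << 2)

#define SET_CSTATE_FLAG(c, f)   ((c)->sid_flags |= (f))
#define HAS_CSTATE_FLAG(c, f)   ((c)->sid_flags & (f))
#define CLEAR_CSTATE_FLAG(c, f) ((c)->sid_flags &= ~(f))

/* Reduced stand-in for struct nfsd4_compound_state. */
struct compound_state {
	unsigned int sid_flags;
};

/* SAVEFH-like step: a saved stateid exists only if a current one does. */
static void save_state(struct compound_state *cs)
{
	if (HAS_CSTATE_FLAG(cs, CURRENT_STATE_ID_FLAG))
		SET_CSTATE_FLAG(cs, SAVED_STATE_ID_FLAG);
}
```

Because every flag occupies its own bit, setting or clearing one flag
leaves the others untouched, which is what makes sharing the field
between stateid and non-stateid uses workable.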


* [PATCH v4 7/8] NFSD: allow inter server COPY to have a STALE source server fh
  2019-07-08 19:23 [PATCH v4 0/8] server-side support for "inter" SSC copy Olga Kornievskaia
                   ` (5 preceding siblings ...)
  2019-07-08 19:23 ` [PATCH v4 6/8] NFSD generalize nfsd4_compound_state flag names Olga Kornievskaia
@ 2019-07-08 19:23 ` Olga Kornievskaia
  2019-07-23 21:35   ` bfields
  2019-07-08 19:23 ` [PATCH v4 8/8] NFSD add nfs4 inter ssc to nfsd4_copy Olga Kornievskaia
  2019-07-09  3:53 ` [PATCH v4 0/8] server-side support for "inter" SSC copy bfields
  8 siblings, 1 reply; 51+ messages in thread
From: Olga Kornievskaia @ 2019-07-08 19:23 UTC (permalink / raw)
  To: bfields; +Cc: linux-nfs

In an inter server to server COPY, the COPY is sent to the
destination server, so the source server filehandle is a foreign
filehandle there and may fail verification as STALE.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
---
 fs/nfsd/Kconfig    | 10 ++++++++++
 fs/nfsd/nfs4proc.c | 53 +++++++++++++++++++++++++++++++++++++++++++++++++----
 fs/nfsd/nfsfh.h    |  5 ++++-
 fs/nfsd/xdr4.h     |  1 +
 4 files changed, 64 insertions(+), 5 deletions(-)

diff --git a/fs/nfsd/Kconfig b/fs/nfsd/Kconfig
index d25f6bb..bef3a58 100644
--- a/fs/nfsd/Kconfig
+++ b/fs/nfsd/Kconfig
@@ -132,6 +132,16 @@ config NFSD_FLEXFILELAYOUT
 
 	  If unsure, say N.
 
+config NFSD_V4_2_INTER_SSC
+	bool "NFSv4.2 inter server to server COPY"
+	depends on NFSD_V4 && NFS_V4_1 && NFS_V4_2
+	help
+	  This option enables support for NFSv4.2 inter server to
+	  server copy where the destination server calls the NFSv4.2
+	  client to read the data to copy from the source server.
+
+	  If unsure, say N.
+
 config NFSD_V4_SECURITY_LABEL
 	bool "Provide Security Label support for NFSv4 server"
 	depends on NFSD_V4 && SECURITY
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 8c2273e..1039528 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -504,12 +504,20 @@ static __be32 nfsd4_open_omfg(struct svc_rqst *rqstp, struct nfsd4_compound_stat
 	    union nfsd4_op_u *u)
 {
 	struct nfsd4_putfh *putfh = &u->putfh;
+	__be32 ret;
 
 	fh_put(&cstate->current_fh);
 	cstate->current_fh.fh_handle.fh_size = putfh->pf_fhlen;
 	memcpy(&cstate->current_fh.fh_handle.fh_base, putfh->pf_fhval,
 	       putfh->pf_fhlen);
-	return fh_verify(rqstp, &cstate->current_fh, 0, NFSD_MAY_BYPASS_GSS);
+	ret = fh_verify(rqstp, &cstate->current_fh, 0, NFSD_MAY_BYPASS_GSS);
+#ifdef CONFIG_NFSD_V4_2_INTER_SSC
+	if (ret == nfserr_stale && putfh->no_verify) {
+		SET_FH_FLAG(&cstate->current_fh, NFSD4_FH_FOREIGN);
+		ret = 0;
+	}
+#endif
+	return ret;
 }
 
 static __be32
@@ -1956,6 +1964,41 @@ static void svcxdr_init_encode(struct svc_rqst *rqstp,
 		- rqstp->rq_auth_slack;
 }
 
+#ifdef CONFIG_NFSD_V4_2_INTER_SSC
+static void
+check_if_stalefh_allowed(struct nfsd4_compoundargs *args)
+{
+	struct nfsd4_op	*op, *current_op, *saved_op;
+	struct nfsd4_copy *copy;
+	struct nfsd4_putfh *putfh;
+	int i;
+
+	/* traverse all operations and, if it's a COPY compound, mark the
+	 * source filehandle to skip verification
+	 */
+	for (i = 0; i < args->opcnt; i++) {
+		op = &args->ops[i];
+		if (op->opnum == OP_PUTFH)
+			current_op = op;
+		else if (op->opnum == OP_SAVEFH)
+			saved_op = current_op;
+		else if (op->opnum == OP_RESTOREFH)
+			current_op = saved_op;
+		else if (op->opnum == OP_COPY) {
+			copy = (struct nfsd4_copy *)&op->u;
+			putfh = (struct nfsd4_putfh *)&saved_op->u;
+			if (!copy->cp_intra)
+				putfh->no_verify = true;
+		}
+	}
+}
+#else
+static void
+check_if_stalefh_allowed(struct nfsd4_compoundargs *args)
+{
+}
+#endif
+
 /*
  * COMPOUND call.
  */
@@ -2004,6 +2047,7 @@ static void svcxdr_init_encode(struct svc_rqst *rqstp,
 		resp->opcnt = 1;
 		goto encode_op;
 	}
+	check_if_stalefh_allowed(args);
 
 	trace_nfsd_compound(rqstp, args->opcnt);
 	while (!status && resp->opcnt < args->opcnt) {
@@ -2019,13 +2063,14 @@ static void svcxdr_init_encode(struct svc_rqst *rqstp,
 				op->status = nfsd4_open_omfg(rqstp, cstate, op);
 			goto encode_op;
 		}
-
-		if (!current_fh->fh_dentry) {
+		if (!current_fh->fh_dentry &&
+				!HAS_FH_FLAG(current_fh, NFSD4_FH_FOREIGN)) {
 			if (!(op->opdesc->op_flags & ALLOWED_WITHOUT_FH)) {
 				op->status = nfserr_nofilehandle;
 				goto encode_op;
 			}
-		} else if (current_fh->fh_export->ex_fslocs.migrated &&
+		} else if (current_fh->fh_export &&
+			   current_fh->fh_export->ex_fslocs.migrated &&
 			  !(op->opdesc->op_flags & ALLOWED_ON_ABSENT_FS)) {
 			op->status = nfserr_moved;
 			goto encode_op;
diff --git a/fs/nfsd/nfsfh.h b/fs/nfsd/nfsfh.h
index 755e256..b9c7568 100644
--- a/fs/nfsd/nfsfh.h
+++ b/fs/nfsd/nfsfh.h
@@ -35,7 +35,7 @@ static inline ino_t u32_to_ino_t(__u32 uino)
 
 	bool			fh_locked;	/* inode locked by us */
 	bool			fh_want_write;	/* remount protection taken */
-
+	int			fh_flags;	/* FH flags */
 #ifdef CONFIG_NFSD_V3
 	bool			fh_post_saved;	/* post-op attrs saved */
 	bool			fh_pre_saved;	/* pre-op attrs saved */
@@ -56,6 +56,9 @@ static inline ino_t u32_to_ino_t(__u32 uino)
 #endif /* CONFIG_NFSD_V3 */
 
 } svc_fh;
+#define NFSD4_FH_FOREIGN (1<<0)
+#define SET_FH_FLAG(c, f) ((c)->fh_flags |= (f))
+#define HAS_FH_FLAG(c, f) ((c)->fh_flags & (f))
 
 enum nfsd_fsid {
 	FSID_DEV = 0,
diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h
index 9d7318c..fbd18d6 100644
--- a/fs/nfsd/xdr4.h
+++ b/fs/nfsd/xdr4.h
@@ -221,6 +221,7 @@ struct nfsd4_lookup {
 struct nfsd4_putfh {
 	u32		pf_fhlen;           /* request */
 	char		*pf_fhval;          /* request */
+	bool		no_verify;	    /* represents foreign fh */
 };
 
 struct nfsd4_open {
-- 
1.8.3.1
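
The operation walk in check_if_stalefh_allowed() from patch 7/8 can be
modelled in user space as below. This is a sketch under stated
assumptions: struct op, enum op_kind and mark_foreign_putfh() are
hypothetical, reduced stand-ins for the nfsd types. Unlike the posted
code, the sketch initializes both tracking pointers to NULL and skips a
COPY that has no preceding SAVEFH; that guard is this sketch's own
choice, not something the patch does.

```c
#include <stdbool.h>
#include <stddef.h>

enum op_kind { OP_PUTFH, OP_SAVEFH, OP_RESTOREFH, OP_COPY, OP_OTHER };

/* Hypothetical, reduced stand-in for struct nfsd4_op. */
struct op {
	enum op_kind kind;
	bool intra;	/* meaningful for OP_COPY only (models cp_intra) */
	bool no_verify;	/* meaningful for OP_PUTFH only */
};

/*
 * Track which PUTFH produced the "current" and "saved" filehandles and
 * mark the PUTFH behind the saved (source) filehandle of every
 * inter-server COPY. Returns the number of PUTFH ops newly marked.
 */
static int mark_foreign_putfh(struct op *ops, size_t n)
{
	struct op *current_op = NULL, *saved_op = NULL;
	int marked = 0;

	for (size_t i = 0; i < n; i++) {
		struct op *op = &ops[i];

		switch (op->kind) {
		case OP_PUTFH:
			current_op = op;
			break;
		case OP_SAVEFH:
			saved_op = current_op;
			break;
		case OP_RESTOREFH:
			current_op = saved_op;
			break;
		case OP_COPY:
			/* Guard against a COPY with no saved filehandle. */
			if (!op->intra && saved_op && !saved_op->no_verify) {
				saved_op->no_verify = true;
				marked++;
			}
			break;
		default:
			break;
		}
	}
	return marked;
}
```

For a compound of PUTFH(src), SAVEFH, PUTFH(dst), COPY(inter), only the
first PUTFH ends up marked, so only the source filehandle is allowed to
be stale.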


* [PATCH v4 8/8] NFSD add nfs4 inter ssc to nfsd4_copy
  2019-07-08 19:23 [PATCH v4 0/8] server-side support for "inter" SSC copy Olga Kornievskaia
                   ` (6 preceding siblings ...)
  2019-07-08 19:23 ` [PATCH v4 7/8] NFSD: allow inter server COPY to have a STALE source server fh Olga Kornievskaia
@ 2019-07-08 19:23 ` Olga Kornievskaia
  2019-07-09 12:43   ` Anna Schumaker
  2019-07-09  3:53 ` [PATCH v4 0/8] server-side support for "inter" SSC copy bfields
  8 siblings, 1 reply; 51+ messages in thread
From: Olga Kornievskaia @ 2019-07-08 19:23 UTC (permalink / raw)
  To: bfields; +Cc: linux-nfs

Given a universal address, mount the source server from the destination
server.  Use an internal mount. Call the NFS client nfs42_ssc_open to
obtain the NFS struct file suitable for nfsd_copy_range.

The ability to do "inter" server-to-server copy depends on the nfsd
kernel module parameter "inter_copy_offload_enable".

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
---
 fs/nfsd/nfs4proc.c  | 291 ++++++++++++++++++++++++++++++++++++++++++++++++----
 fs/nfsd/nfs4state.c |  17 ++-
 fs/nfsd/nfssvc.c    |   6 ++
 fs/nfsd/state.h     |   4 +
 fs/nfsd/xdr4.h      |   5 +
 5 files changed, 299 insertions(+), 24 deletions(-)

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 1039528..caf046f 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -1153,6 +1153,209 @@ void nfsd4_shutdown_copy(struct nfs4_client *clp)
 	while ((copy = nfsd4_get_copy(clp)) != NULL)
 		nfsd4_stop_copy(copy);
 }
+#ifdef CONFIG_NFSD_V4_2_INTER_SSC
+
+extern struct file *nfs42_ssc_open(struct vfsmount *ss_mnt,
+				   struct nfs_fh *src_fh,
+				   nfs4_stateid *stateid);
+extern void nfs42_ssc_close(struct file *filep);
+
+extern void nfs_sb_deactive(struct super_block *sb);
+
+#define NFSD42_INTERSSC_MOUNTOPS "vers=4.2,addr=%s,sec=sys"
+
+/**
+ * Support one copy source server for now.
+ */
+static __be32
+nfsd4_interssc_connect(struct nl4_server *nss, struct svc_rqst *rqstp,
+		       struct vfsmount **mount)
+{
+	struct file_system_type *type;
+	struct vfsmount *ss_mnt;
+	struct nfs42_netaddr *naddr;
+	struct sockaddr_storage tmp_addr;
+	size_t tmp_addrlen, match_netid_len = 3;
+	char *startsep = "", *endsep = "", *match_netid = "tcp";
+	char *ipaddr, *dev_name, *raw_data;
+	int len, raw_len, status = -EINVAL;
+
+	naddr = &nss->u.nl4_addr;
+	tmp_addrlen = rpc_uaddr2sockaddr(SVC_NET(rqstp), naddr->addr,
+					 naddr->addr_len,
+					 (struct sockaddr *)&tmp_addr,
+					 sizeof(tmp_addr));
+	if (tmp_addrlen == 0)
+		goto out_err;
+
+	if (tmp_addr.ss_family == AF_INET6) {
+		startsep = "[";
+		endsep = "]";
+		match_netid = "tcp6";
+		match_netid_len = 4;
+	}
+
+	if (naddr->netid_len != match_netid_len ||
+		strncmp(naddr->netid, match_netid, naddr->netid_len))
+		goto out_err;
+
+	/* Construct the raw data for the vfs_kern_mount call */
+	len = RPC_MAX_ADDRBUFLEN + 1;
+	ipaddr = kzalloc(len, GFP_KERNEL);
+	if (!ipaddr)
+		goto out_err;
+
+	rpc_ntop((struct sockaddr *)&tmp_addr, ipaddr, len);
+
+	/* 2 for ipv6 endsep and startsep. 3 for ":/" and trailing '\0' */
+
+	raw_len = strlen(NFSD42_INTERSSC_MOUNTOPS) + strlen(ipaddr);
+	raw_data = kzalloc(raw_len, GFP_KERNEL);
+	if (!raw_data)
+		goto out_free_ipaddr;
+
+	snprintf(raw_data, raw_len, NFSD42_INTERSSC_MOUNTOPS, ipaddr);
+
+	status = -ENODEV;
+	type = get_fs_type("nfs");
+	if (!type)
+		goto out_free_rawdata;
+
+	/* Set the server:<export> for the vfs_kern_mount call */
+	dev_name = kzalloc(len + 5, GFP_KERNEL);
+	if (!dev_name)
+		goto out_free_rawdata;
+	snprintf(dev_name, len + 5, "%s%s%s:/", startsep, ipaddr, endsep);
+
+	/* Use an 'internal' mount: SB_KERNMOUNT -> MNT_INTERNAL */
+	ss_mnt = vfs_kern_mount(type, SB_KERNMOUNT, dev_name, raw_data);
+	module_put(type->owner);
+	if (IS_ERR(ss_mnt))
+		goto out_free_devname;
+
+	status = 0;
+	*mount = ss_mnt;
+
+out_free_devname:
+	kfree(dev_name);
+out_free_rawdata:
+	kfree(raw_data);
+out_free_ipaddr:
+	kfree(ipaddr);
+out_err:
+	return status;
+}
+
+static void
+nfsd4_interssc_disconnect(struct vfsmount *ss_mnt)
+{
+	nfs_sb_deactive(ss_mnt->mnt_sb);
+	mntput(ss_mnt);
+}
+
+/**
+ * nfsd4_setup_inter_ssc
+ *
+ * Verify COPY destination stateid.
+ * Connect to the source server with NFSv4.1.
+ * Create the source struct file for nfsd_copy_range.
+ * Called with COPY cstate:
+ *    SAVED_FH: source filehandle
+ *    CURRENT_FH: destination filehandle
+ *
+ * Returns errno (not nfserrxxx)
+ */
+static __be32
+nfsd4_setup_inter_ssc(struct svc_rqst *rqstp,
+		      struct nfsd4_compound_state *cstate,
+		      struct nfsd4_copy *copy, struct vfsmount **mount)
+{
+	struct svc_fh *s_fh = NULL;
+	stateid_t *s_stid = &copy->cp_src_stateid;
+	__be32 status = -EINVAL;
+
+	/* Verify the destination stateid and set dst struct file*/
+	status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate->current_fh,
+					    &copy->cp_dst_stateid,
+					    WR_STATE, &copy->file_dst, NULL,
+					    NULL);
+	if (status)
+		goto out;
+
+	status = nfsd4_interssc_connect(&copy->cp_src, rqstp, mount);
+	if (status)
+		goto out;
+
+	s_fh = &cstate->save_fh;
+
+	copy->c_fh.size = s_fh->fh_handle.fh_size;
+	memcpy(copy->c_fh.data, &s_fh->fh_handle.fh_base, copy->c_fh.size);
+	copy->stateid.seqid = s_stid->si_generation;
+	memcpy(copy->stateid.other, (void *)&s_stid->si_opaque,
+	       sizeof(stateid_opaque_t));
+
+	status = 0;
+out:
+	return status;
+}
+
+static void
+nfsd4_cleanup_inter_ssc(struct vfsmount *ss_mnt, struct file *src,
+			struct file *dst)
+{
+	nfs42_ssc_close(src);
+	fput(src);
+	fput(dst);
+	mntput(ss_mnt);
+}
+
+#else /* CONFIG_NFSD_V4_2_INTER_SSC */
+
+static __be32
+nfsd4_setup_inter_ssc(struct svc_rqst *rqstp,
+		      struct nfsd4_compound_state *cstate,
+		      struct nfsd4_copy *copy,
+		      struct vfsmount **mount)
+{
+	*mount = NULL;
+	return -EINVAL;
+}
+
+static void
+nfsd4_cleanup_inter_ssc(struct vfsmount *ss_mnt, struct file *src,
+			struct file *dst)
+{
+}
+
+static void
+nfsd4_interssc_disconnect(struct vfsmount *ss_mnt)
+{
+}
+
+static struct file *nfs42_ssc_open(struct vfsmount *ss_mnt,
+				   struct nfs_fh *src_fh,
+				   nfs4_stateid *stateid)
+{
+	return NULL;
+}
+#endif /* CONFIG_NFSD_V4_2_INTER_SSC */
+
+static __be32
+nfsd4_setup_intra_ssc(struct svc_rqst *rqstp,
+		      struct nfsd4_compound_state *cstate,
+		      struct nfsd4_copy *copy)
+{
+	return nfsd4_verify_copy(rqstp, cstate, &copy->cp_src_stateid,
+				 &copy->file_src, &copy->cp_dst_stateid,
+				 &copy->file_dst, NULL);
+}
+
+static void
+nfsd4_cleanup_intra_ssc(struct file *src, struct file *dst)
+{
+	fput(src);
+	fput(dst);
+}
 
 static void nfsd4_cb_offload_release(struct nfsd4_callback *cb)
 {
@@ -1217,12 +1420,16 @@ static __be32 nfsd4_do_copy(struct nfsd4_copy *copy, bool sync)
 		status = nfs_ok;
 	}
 
-	fput(copy->file_src);
-	fput(copy->file_dst);
+	if (!copy->cp_intra) /* Inter server SSC */
+		nfsd4_cleanup_inter_ssc(copy->ss_mnt, copy->file_src,
+					copy->file_dst);
+	else
+		nfsd4_cleanup_intra_ssc(copy->file_src, copy->file_dst);
+
 	return status;
 }
 
-static void dup_copy_fields(struct nfsd4_copy *src, struct nfsd4_copy *dst)
+static int dup_copy_fields(struct nfsd4_copy *src, struct nfsd4_copy *dst)
 {
 	dst->cp_src_pos = src->cp_src_pos;
 	dst->cp_dst_pos = src->cp_dst_pos;
@@ -1232,8 +1439,17 @@ static void dup_copy_fields(struct nfsd4_copy *src, struct nfsd4_copy *dst)
 	memcpy(&dst->fh, &src->fh, sizeof(src->fh));
 	dst->cp_clp = src->cp_clp;
 	dst->file_dst = get_file(src->file_dst);
-	dst->file_src = get_file(src->file_src);
+	dst->cp_intra = src->cp_intra;
+	if (src->cp_intra) /* for inter, file_src doesn't exist yet */
+		dst->file_src = get_file(src->file_src);
 	memcpy(&dst->cp_stateid, &src->cp_stateid, sizeof(src->cp_stateid));
+	memcpy(&dst->cp_src, &src->cp_src, sizeof(struct nl4_server));
+	memcpy(&dst->stateid, &src->stateid, sizeof(src->stateid));
+	memcpy(&dst->c_fh, &src->c_fh, sizeof(src->c_fh));
+	dst->ss_mnt = src->ss_mnt;
+
+	return 0;
+
 }
 
 static void cleanup_async_copy(struct nfsd4_copy *copy)
@@ -1252,7 +1468,18 @@ static int nfsd4_do_async_copy(void *data)
 	struct nfsd4_copy *copy = (struct nfsd4_copy *)data;
 	struct nfsd4_copy *cb_copy;
 
+	if (!copy->cp_intra) { /* Inter server SSC */
+		copy->file_src = nfs42_ssc_open(copy->ss_mnt, &copy->c_fh,
+					      &copy->stateid);
+		if (IS_ERR(copy->file_src)) {
+			copy->nfserr = nfserr_offload_denied;
+			nfsd4_interssc_disconnect(copy->ss_mnt);
+			goto do_callback;
+		}
+	}
+
 	copy->nfserr = nfsd4_do_copy(copy, 0);
+do_callback:
 	cb_copy = kzalloc(sizeof(struct nfsd4_copy), GFP_KERNEL);
 	if (!cb_copy)
 		goto out;
@@ -1276,11 +1503,20 @@ static int nfsd4_do_async_copy(void *data)
 	__be32 status;
 	struct nfsd4_copy *async_copy = NULL;
 
-	status = nfsd4_verify_copy(rqstp, cstate, &copy->cp_src_stateid,
-				   &copy->file_src, &copy->cp_dst_stateid,
-				   &copy->file_dst, NULL);
-	if (status)
-		goto out;
+	if (!copy->cp_intra) { /* Inter server SSC */
+		if (!inter_copy_offload_enable || copy->cp_synchronous) {
+			status = nfserr_notsupp;
+			goto out;
+		}
+		status = nfsd4_setup_inter_ssc(rqstp, cstate, copy,
+					&copy->ss_mnt);
+		if (status)
+			return nfserr_offload_denied;
+	} else {
+		status = nfsd4_setup_intra_ssc(rqstp, cstate, copy);
+		if (status)
+			return status;
+	}
 
 	copy->cp_clp = cstate->clp;
 	memcpy(&copy->fh, &cstate->current_fh.fh_handle,
@@ -1291,15 +1527,15 @@ static int nfsd4_do_async_copy(void *data)
 		status = nfserrno(-ENOMEM);
 		async_copy = kzalloc(sizeof(struct nfsd4_copy), GFP_KERNEL);
 		if (!async_copy)
-			goto out;
-		if (!nfs4_init_copy_state(nn, copy)) {
-			kfree(async_copy);
-			goto out;
-		}
+			goto out_err;
+		if (!nfs4_init_copy_state(nn, copy))
+			goto out_err;
 		refcount_set(&async_copy->refcount, 1);
 		memcpy(&copy->cp_res.cb_stateid, &copy->cp_stateid,
 			sizeof(copy->cp_stateid));
-		dup_copy_fields(copy, async_copy);
+		status = dup_copy_fields(copy, async_copy);
+		if (status)
+			goto out_err;
 		async_copy->copy_task = kthread_create(nfsd4_do_async_copy,
 				async_copy, "%s", "copy thread");
 		if (IS_ERR(async_copy->copy_task))
@@ -1310,13 +1546,17 @@ static int nfsd4_do_async_copy(void *data)
 		spin_unlock(&async_copy->cp_clp->async_lock);
 		wake_up_process(async_copy->copy_task);
 		status = nfs_ok;
-	} else
+	} else {
 		status = nfsd4_do_copy(copy, 1);
+	}
 out:
 	return status;
 out_err:
 	cleanup_async_copy(async_copy);
-	goto out;
+	status = nfserrno(-ENOMEM);
+	if (!copy->cp_intra)
+		nfsd4_interssc_disconnect(copy->ss_mnt);
+	goto out;
 }
 
 struct nfsd4_copy *
@@ -1342,15 +1582,24 @@ struct nfsd4_copy *
 		     union nfsd4_op_u *u)
 {
 	struct nfsd4_offload_status *os = &u->offload_status;
-	__be32 status = 0;
+	__be32 status = nfserr_bad_stateid;
 	struct nfsd4_copy *copy;
 	struct nfs4_client *clp = cstate->clp;
+	struct nfsd_net *nn = net_generic(SVC_NET(rqstp), nfsd_net_id);
 
 	copy = find_async_copy(clp, &os->stateid);
-	if (copy)
+	if (!copy) {
+		struct nfs4_cpntf_state *cps = NULL;
+
+		status = find_internal_cpntf_state(nn, &os->stateid, &cps);
+		if (status)
+			return status;
+		if (cps) {
+			free_cpntf_state(nn, &os->stateid, cps);
+			return nfs_ok;
+		}
+	} else
 		nfsd4_stop_copy(copy);
-	else
-		status = nfserr_bad_stateid;
 
 	return status;
 }
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index d7f4b96..c1a0695 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -5232,12 +5232,23 @@ static __be32 nfsd4_validate_stateid(struct nfs4_client *cl, stateid_t *stateid)
 
 	return 0;
 }
+void free_cpntf_state(struct nfsd_net *nn, stateid_t *st,
+		      struct nfs4_cpntf_state *cps)
+{
+	spin_lock(&nn->s2s_cp_lock);
+	list_del(&cps->cp_list);
+	idr_remove(&nn->s2s_cp_stateids, cps->cp_stateid.si_opaque.so_id);
+	nfs4_put_stid(cps->cp_p_stid);
+	kfree(cps);
+	spin_unlock(&nn->s2s_cp_lock);
+}
+
 /*
  * A READ from an inter server to server COPY will have a
  * copy stateid. Return the parent nfs4_stid.
  */
-static __be32 _find_cpntf_state(struct nfsd_net *nn, stateid_t *st,
-		     struct nfs4_cpntf_state **cps)
+__be32 find_internal_cpntf_state(struct nfsd_net *nn, stateid_t *st,
+				 struct nfs4_cpntf_state **cps)
 {
 	struct nfs4_cpntf_state *state = NULL;
 
@@ -5260,7 +5271,7 @@ static __be32 find_cpntf_state(struct nfsd_net *nn, stateid_t *st,
 	__be32 status;
 	struct nfs4_cpntf_state *cps = NULL;
 
-	status = _find_cpntf_state(nn, st, &cps);
+	status = find_internal_cpntf_state(nn, st, &cps);
 	if (status)
 		return status;
 
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index 18d94ea..033bfcb 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -30,6 +30,12 @@
 
 #define NFSDDBG_FACILITY	NFSDDBG_SVC
 
+bool inter_copy_offload_enable;
+EXPORT_SYMBOL_GPL(inter_copy_offload_enable);
+module_param(inter_copy_offload_enable, bool, 0644);
+MODULE_PARM_DESC(inter_copy_offload_enable,
+		 "Enable inter server to server copy offload. Default: false");
+
 extern struct svc_program	nfsd_program;
 static int			nfsd(void *vrqstp);
 #if defined(CONFIG_NFSD_V2_ACL) || defined(CONFIG_NFSD_V3_ACL)
diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
index 106ed56..7026e2a 100644
--- a/fs/nfsd/state.h
+++ b/fs/nfsd/state.h
@@ -659,6 +659,10 @@ extern struct nfs4_client_reclaim *nfs4_client_to_reclaim(struct xdr_netobj name
 extern void nfs4_put_copy(struct nfsd4_copy *copy);
 extern struct nfsd4_copy *
 find_async_copy(struct nfs4_client *clp, stateid_t *staetid);
+extern __be32 find_internal_cpntf_state(struct nfsd_net *nn, stateid_t *st,
+				struct nfs4_cpntf_state **cps);
+extern void free_cpntf_state(struct nfsd_net *nn, stateid_t *st,
+		      struct nfs4_cpntf_state *cps);
 static inline void get_nfs4_file(struct nfs4_file *fi)
 {
 	refcount_inc(&fi->fi_ref);
diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h
index fbd18d6..bb2f8e5 100644
--- a/fs/nfsd/xdr4.h
+++ b/fs/nfsd/xdr4.h
@@ -547,7 +547,12 @@ struct nfsd4_copy {
 	struct task_struct	*copy_task;
 	refcount_t		refcount;
 	bool			stopped;
+
+	struct vfsmount		*ss_mnt;
+	struct nfs_fh		c_fh;
+	nfs4_stateid		stateid;
 };
+extern bool inter_copy_offload_enable;
 
 struct nfsd4_seek {
 	/* request */
-- 
1.8.3.1
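
The string handling in nfsd4_interssc_connect() from patch 8/8 comes
down to formatting a mount option string and a "server:/" device name,
with square brackets around an IPv6 address. A user-space sketch
follows. build_mount_strings() and its parameters are hypothetical, the
caller is assumed to have already rendered the address as text (the
patch uses rpc_ntop() for that), and the explicit truncation checks are
this sketch's own addition.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Same option template as NFSD42_INTERSSC_MOUNTOPS in the patch. */
#define INTERSSC_MOUNTOPS "vers=4.2,addr=%s,sec=sys"

/*
 * Build the option string and the "server:/" device name.
 * Returns 0 on success, -1 if either output buffer is too small.
 */
static int build_mount_strings(const char *ipaddr, bool is_ipv6,
			       char *opts, size_t opts_len,
			       char *dev, size_t dev_len)
{
	/* IPv6 literals are bracketed so ':' is not read as a separator. */
	const char *startsep = is_ipv6 ? "[" : "";
	const char *endsep = is_ipv6 ? "]" : "";
	int n;

	n = snprintf(opts, opts_len, INTERSSC_MOUNTOPS, ipaddr);
	if (n < 0 || (size_t)n >= opts_len)
		return -1;

	n = snprintf(dev, dev_len, "%s%s%s:/", startsep, ipaddr, endsep);
	if (n < 0 || (size_t)n >= dev_len)
		return -1;

	return 0;
}
```

Checking the snprintf() return value against the buffer size catches
truncation, which would otherwise silently produce a different server
name or option string.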


* Re: [PATCH v4 0/8] server-side support for "inter" SSC copy
  2019-07-08 19:23 [PATCH v4 0/8] server-side support for "inter" SSC copy Olga Kornievskaia
                   ` (7 preceding siblings ...)
  2019-07-08 19:23 ` [PATCH v4 8/8] NFSD add nfs4 inter ssc to nfsd4_copy Olga Kornievskaia
@ 2019-07-09  3:53 ` bfields
  2019-07-09 15:47   ` Olga Kornievskaia
  8 siblings, 1 reply; 51+ messages in thread
From: bfields @ 2019-07-09  3:53 UTC (permalink / raw)
  To: Olga Kornievskaia; +Cc: bfields, linux-nfs

Thanks for resending.  What's the status of Amir's series?  I guess I've
been using that as an excuse to put off reviewing these, but I really
should anyway....

--b.

On Mon, Jul 08, 2019 at 03:23:44PM -0400, Olga Kornievskaia wrote:
> This patch series adds support for NFSv4.2 copy offload feature
> allowing copy between two different NFS servers.
> 
> This functionality depends on the VFS ability to support generic
> copy_file_range() where a copy is done between an NFS file and
> a local file system. This is on top of Amir's VFS generic copy
> offload series.
> 
> This feature is enabled by the kernel module parameter --
> inter_copy_offload_enable -- and by default is disabled. There is
> also a kernel compile configuration of NFSD_V4_2_INTER_SSC that
> adds dependency on the NFS client side functions called from the
> server.
> 
> These patches work on top of existing async intra copy offload
> patches. For the "inter" SSC, the implementation only supports
> asynchronous inter copy.
> 
> On the source server, upon receiving a COPY_NOTIFY, it generate a
> unique stateid that's kept in the global list. Upon receiving a READ
> with a stateid, the code checks the normal list of open stateid and
> now additionally, it'll check the copy state list as well before
> deciding to either fail with BAD_STATEID or find one that matches.
> The stored stateid is only valid to be used for the first time
> with a choosen lease period (90s currently). When the source server
> received an OFFLOAD_CANCEL, it will remove the stateid from the
> global list. Otherwise, the copy stateid is removed upon the removal
> of its "parent" stateid (open/lock/delegation stateid).
> 
> On the destination server, upon receiving a COPY request, the server
> establishes the necessary clientid/session with the source server.
> It calls into the NFS client code to establish the necessary
> open stateid, filehandle, file description (without doing an NFS open).
> Then the server calls into the copy_file_range() to preform the copy
> where the source file will issue NFS READs and then do local file
> system writes (this depends on the VFS ability to do cross device
> copy_file_range().
> 
> v4:
> --- allowing for synchronous inter server-to-server copy
> --- added missing offload_cancel on the source server
> 
> Already presented numbers for performance improvement for large
> file transfer but here are times for copying linux kernel tree
> (which is mostly small files):
> -- regular cp 6m1s (intra)
> -- copy offload cp 4m11s (intra)
>    -- benefit of using copy offload with small copies using sync copy
> -- regular cp 6m9s (inter)
> -- copy offload cp 6m3s (inter)
>    -- same performance as traditional as for most it fallback to traditional
> copy offload
> 
> Olga Kornievskaia (8):
>   NFSD fill-in netloc4 structure
>   NFSD add ca_source_server<> to COPY
>   NFSD return nfs4_stid in nfs4_preprocess_stateid_op
>   NFSD add COPY_NOTIFY operation
>   NFSD check stateids against copy stateids
>   NFSD generalize nfsd4_compound_state flag names
>   NFSD: allow inter server COPY to have a STALE source server fh
>   NFSD add nfs4 inter ssc to nfsd4_copy
> 
>  fs/nfsd/Kconfig     |  10 ++
>  fs/nfsd/nfs4proc.c  | 434 +++++++++++++++++++++++++++++++++++++++++++++++-----
>  fs/nfsd/nfs4state.c | 135 ++++++++++++++--
>  fs/nfsd/nfs4xdr.c   | 172 ++++++++++++++++++++-
>  fs/nfsd/nfsd.h      |  32 ++++
>  fs/nfsd/nfsfh.h     |   5 +-
>  fs/nfsd/nfssvc.c    |   6 +
>  fs/nfsd/state.h     |  25 ++-
>  fs/nfsd/xdr4.h      |  37 ++++-
>  9 files changed, 790 insertions(+), 66 deletions(-)
> 
> -- 
> 1.8.3.1

* Re: [PATCH v4 4/8] NFSD add COPY_NOTIFY operation
  2019-07-08 19:23 ` [PATCH v4 4/8] NFSD add COPY_NOTIFY operation Olga Kornievskaia
@ 2019-07-09 12:34   ` Anna Schumaker
  2019-07-09 15:51     ` Olga Kornievskaia
  2019-07-17 22:12   ` bfields
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 51+ messages in thread
From: Anna Schumaker @ 2019-07-09 12:34 UTC (permalink / raw)
  To: Olga Kornievskaia, bfields; +Cc: linux-nfs

Hi Olga,

On Mon, 2019-07-08 at 15:23 -0400, Olga Kornievskaia wrote:
> From: Olga Kornievskaia <kolga@netapp.com>
> 
> Introducing the COPY_NOTIFY operation.
> 
> Create a new unique stateid that will keep track of the copy
> state and the upcoming READs that will use that stateid. Keep
> it in the list associated with parent stateid.
> 
> Return single netaddr to advertise to the copy.
> 
> Signed-off-by: Andy Adamson <andros@netapp.com>
> Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> ---
>  fs/nfsd/nfs4proc.c  | 71 +++++++++++++++++++++++++++++++++++----
>  fs/nfsd/nfs4state.c | 64 +++++++++++++++++++++++++++++++----
>  fs/nfsd/nfs4xdr.c   | 97
> +++++++++++++++++++++++++++++++++++++++++++++++++++--
>  fs/nfsd/state.h     | 18 ++++++++--
>  fs/nfsd/xdr4.h      | 13 +++++++
>  5 files changed, 247 insertions(+), 16 deletions(-)
> 
> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> index cfd8767..c39fa72 100644
> --- a/fs/nfsd/nfs4proc.c
> +++ b/fs/nfsd/nfs4proc.c
> @@ -37,6 +37,7 @@
>  #include <linux/falloc.h>
>  #include <linux/slab.h>
>  #include <linux/kthread.h>
> +#include <linux/sunrpc/addr.h>
>  
>  #include "idmap.h"
>  #include "cache.h"
> @@ -1033,7 +1034,8 @@ static __be32 nfsd4_do_lookupp(struct svc_rqst
> *rqstp, struct svc_fh *fh)
>  static __be32
>  nfsd4_verify_copy(struct svc_rqst *rqstp, struct
> nfsd4_compound_state *cstate,
>  		  stateid_t *src_stateid, struct file **src,
> -		  stateid_t *dst_stateid, struct file **dst)
> +		  stateid_t *dst_stateid, struct file **dst,
> +		  struct nfs4_stid **stid)
>  {
>  	__be32 status;
>  
> @@ -1050,7 +1052,7 @@ static __be32 nfsd4_do_lookupp(struct svc_rqst
> *rqstp, struct svc_fh *fh)
>  
>  	status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate-
> >current_fh,
>  					    dst_stateid, WR_STATE, dst,
> NULL,
> -					    NULL);
> +					    stid);
>  	if (status) {
>  		dprintk("NFSD: %s: couldn't process dst stateid!\n",
> __func__);
>  		goto out_put_src;
> @@ -1081,7 +1083,7 @@ static __be32 nfsd4_do_lookupp(struct svc_rqst
> *rqstp, struct svc_fh *fh)
>  	__be32 status;
>  
>  	status = nfsd4_verify_copy(rqstp, cstate, &clone-
> >cl_src_stateid, &src,
> -				   &clone->cl_dst_stateid, &dst);
> +				   &clone->cl_dst_stateid, &dst, NULL);
>  	if (status)
>  		goto out;
>  
> @@ -1228,7 +1230,7 @@ static void dup_copy_fields(struct nfsd4_copy
> *src, struct nfsd4_copy *dst)
>  
>  static void cleanup_async_copy(struct nfsd4_copy *copy)
>  {
> -	nfs4_free_cp_state(copy);
> +	nfs4_free_copy_state(copy);
>  	fput(copy->file_dst);
>  	fput(copy->file_src);
>  	spin_lock(&copy->cp_clp->async_lock);
> @@ -1268,7 +1270,7 @@ static int nfsd4_do_async_copy(void *data)
>  
>  	status = nfsd4_verify_copy(rqstp, cstate, &copy-
> >cp_src_stateid,
>  				   &copy->file_src, &copy-
> >cp_dst_stateid,
> -				   &copy->file_dst);
> +				   &copy->file_dst, NULL);
>  	if (status)
>  		goto out;
>  
> @@ -1282,7 +1284,7 @@ static int nfsd4_do_async_copy(void *data)
>  		async_copy = kzalloc(sizeof(struct nfsd4_copy),
> GFP_KERNEL);
>  		if (!async_copy)
>  			goto out;
> -		if (!nfs4_init_cp_state(nn, copy)) {
> +		if (!nfs4_init_copy_state(nn, copy)) {
>  			kfree(async_copy);
>  			goto out;
>  		}
> @@ -1346,6 +1348,42 @@ struct nfsd4_copy *
>  }
>  
>  static __be32
> +nfsd4_copy_notify(struct svc_rqst *rqstp, struct
> nfsd4_compound_state *cstate,
> +		  union nfsd4_op_u *u)
> +{
> +	struct nfsd4_copy_notify *cn = &u->copy_notify;
> +	__be32 status;
> +	struct nfsd_net *nn = net_generic(SVC_NET(rqstp), nfsd_net_id);
> +	struct nfs4_stid *stid;
> +	struct nfs4_cpntf_state *cps;
> +
> +	status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate-
> >current_fh,
> +					&cn->cpn_src_stateid, RD_STATE,
> NULL,
> +					NULL, &stid);
> +	if (status)
> +		return status;
> +
> +	cn->cpn_sec = nn->nfsd4_lease;
> +	cn->cpn_nsec = 0;
> +
> +	status = nfserrno(-ENOMEM);
> +	cps = nfs4_alloc_init_cpntf_state(nn, stid);
> +	if (!cps)
> +		return status;
> +	memcpy(&cn->cpn_cnr_stateid, &cps->cp_stateid,
> sizeof(stateid_t));
> +
> +	/* For now, only return one server address in cpn_src, the
> +	 * address used by the client to connect to this server.
> +	 */
> +	cn->cpn_src.nl4_type = NL4_NETADDR;
> +	status = nfsd4_set_netaddr((struct sockaddr *)&rqstp->rq_daddr,
> +				 &cn->cpn_src.u.nl4_addr);
> +	WARN_ON_ONCE(status);
> +
> +	return status;
> +}
> +
> +static __be32
>  nfsd4_fallocate(struct svc_rqst *rqstp, struct nfsd4_compound_state
> *cstate,
>  		struct nfsd4_fallocate *fallocate, int flags)
>  {
> @@ -2298,6 +2336,21 @@ static inline u32
> nfsd4_offload_status_rsize(struct svc_rqst *rqstp,
>  		1 /* osr_complete<1> optional 0 for now */) *
> sizeof(__be32);
>  }
>  
> +static inline u32 nfsd4_copy_notify_rsize(struct svc_rqst *rqstp,
> +					struct nfsd4_op *op)
> +{
> +	return (op_encode_hdr_size +
> +		3 /* cnr_lease_time */ +
> +		1 /* We support one cnr_source_server */ +
> +		1 /* cnr_stateid seq */ +
> +		op_encode_stateid_maxsz /* cnr_stateid */ +
> +		1 /* num cnr_source_server*/ +
> +		1 /* nl4_type */ +
> +		1 /* nl4 size */ +
> +		XDR_QUADLEN(NFS4_OPAQUE_LIMIT) /*nl4_loc + nl4_loc_sz
> */)
> +		* sizeof(__be32);
> +}
> +
>  #ifdef CONFIG_NFSD_PNFS
>  static inline u32 nfsd4_getdeviceinfo_rsize(struct svc_rqst *rqstp,
> struct nfsd4_op *op)
>  {
> @@ -2722,6 +2775,12 @@ static inline u32 nfsd4_seek_rsize(struct
> svc_rqst *rqstp, struct nfsd4_op *op)
>  		.op_name = "OP_OFFLOAD_CANCEL",
>  		.op_rsize_bop = nfsd4_only_status_rsize,
>  	},
> +	[OP_COPY_NOTIFY] = {
> +		.op_func = nfsd4_copy_notify,
> +		.op_flags = OP_MODIFIES_SOMETHING,
> +		.op_name = "OP_COPY_NOTIFY",
> +		.op_rsize_bop = nfsd4_copy_notify_rsize,
> +	},
>  };
>  
>  /**
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index 05c0295..2555eb9 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -707,6 +707,7 @@ struct nfs4_stid *nfs4_alloc_stid(struct
> nfs4_client *cl, struct kmem_cache *sla
>  	/* Will be incremented before return to client: */
>  	refcount_set(&stid->sc_count, 1);
>  	spin_lock_init(&stid->sc_lock);
> +	INIT_LIST_HEAD(&stid->sc_cp_list);
>  
>  	/*
>  	 * It shouldn't be a problem to reuse an opaque stateid value.
> @@ -726,24 +727,53 @@ struct nfs4_stid *nfs4_alloc_stid(struct
> nfs4_client *cl, struct kmem_cache *sla
>  /*
>   * Create a unique stateid_t to represent each COPY.
>   */
> -int nfs4_init_cp_state(struct nfsd_net *nn, struct nfsd4_copy *copy)
> +static int nfs4_init_cp_state(struct nfsd_net *nn, void *ptr,
> stateid_t *stid)
>  {
>  	int new_id;
>  
>  	idr_preload(GFP_KERNEL);
>  	spin_lock(&nn->s2s_cp_lock);
> -	new_id = idr_alloc_cyclic(&nn->s2s_cp_stateids, copy, 0, 0,
> GFP_NOWAIT);
> +	new_id = idr_alloc_cyclic(&nn->s2s_cp_stateids, ptr, 0, 0,
> GFP_NOWAIT);
>  	spin_unlock(&nn->s2s_cp_lock);
>  	idr_preload_end();
>  	if (new_id < 0)
>  		return 0;
> -	copy->cp_stateid.si_opaque.so_id = new_id;
> -	copy->cp_stateid.si_opaque.so_clid.cl_boot = nn->boot_time;
> -	copy->cp_stateid.si_opaque.so_clid.cl_id = nn->s2s_cp_cl_id;
> +	stid->si_opaque.so_id = new_id;
> +	stid->si_opaque.so_clid.cl_boot = nn->boot_time;
> +	stid->si_opaque.so_clid.cl_id = nn->s2s_cp_cl_id;
>  	return 1;
>  }
>  
> -void nfs4_free_cp_state(struct nfsd4_copy *copy)
> +int nfs4_init_copy_state(struct nfsd_net *nn, struct nfsd4_copy
> *copy)
> +{
> +	return nfs4_init_cp_state(nn, copy, &copy->cp_stateid);
> +}
> +
> +struct nfs4_cpntf_state *nfs4_alloc_init_cpntf_state(struct nfsd_net
> *nn,
> +						     struct nfs4_stid
> *p_stid)
> +{
> +	struct nfs4_cpntf_state *cps;
> +
> +	cps = kzalloc(sizeof(struct nfs4_cpntf_state), GFP_KERNEL);
> +	if (!cps)
> +		return NULL;
> +	if (!nfs4_init_cp_state(nn, cps, &cps->cp_stateid))
> +		goto out_free;
> +	cps->cp_p_stid = p_stid;
> +	cps->cp_active = false;
> +	cps->cp_timeout = jiffies + (nn->nfsd4_lease * HZ);
> +	INIT_LIST_HEAD(&cps->cp_list);
> +	spin_lock(&nn->s2s_cp_lock);
> +	list_add(&cps->cp_list, &p_stid->sc_cp_list);
> +	spin_unlock(&nn->s2s_cp_lock);
> +
> +	return cps;
> +out_free:
> +	kfree(cps);
> +	return NULL;
> +}
> +
> +void nfs4_free_copy_state(struct nfsd4_copy *copy)
>  {
>  	struct nfsd_net *nn;
>  
> @@ -753,6 +783,27 @@ void nfs4_free_cp_state(struct nfsd4_copy *copy)
>  	spin_unlock(&nn->s2s_cp_lock);
>  }
>  
> +static void nfs4_free_cpntf_statelist(struct net *net, struct
> nfs4_stid *stid)
> +{
> +	struct nfs4_cpntf_state *cps;
> +	struct nfsd_net *nn;
> +
> +	nn = net_generic(net, nfsd_net_id);
> +
> +	might_sleep();
> +
> +	spin_lock(&nn->s2s_cp_lock);
> +	while (!list_empty(&stid->sc_cp_list)) {
> +		cps = list_first_entry(&stid->sc_cp_list,
> +				       struct nfs4_cpntf_state,
> cp_list);
> +		list_del(&cps->cp_list);
> +		idr_remove(&nn->s2s_cp_stateids,
> +			   cps->cp_stateid.si_opaque.so_id);
> +		kfree(cps);
> +	}
> +	spin_unlock(&nn->s2s_cp_lock);
> +}
> +
>  static struct nfs4_ol_stateid * nfs4_alloc_open_stateid(struct
> nfs4_client *clp)
>  {
>  	struct nfs4_stid *stid;
> @@ -901,6 +952,7 @@ static void block_delegations(struct knfsd_fh
> *fh)
>  	}
>  	idr_remove(&clp->cl_stateids, s->sc_stateid.si_opaque.so_id);
>  	spin_unlock(&clp->cl_lock);
> +	nfs4_free_cpntf_statelist(clp->net, s);
>  	s->sc_free(s);
>  	if (fp)
>  		put_nfs4_file(fp);
> diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
> index 15f53bb..ed37528 100644
> --- a/fs/nfsd/nfs4xdr.c
> +++ b/fs/nfsd/nfs4xdr.c
> @@ -1847,6 +1847,22 @@ static __be32 nfsd4_decode_nl4_server(struct
> nfsd4_compoundargs *argp,
>  }
>  
>  static __be32
> +nfsd4_decode_copy_notify(struct nfsd4_compoundargs *argp,
> +			 struct nfsd4_copy_notify *cn)
> +{
> +	int status;
> +
> +	status = nfsd4_decode_stateid(argp, &cn->cpn_src_stateid);
> +	if (status)
> +		return status;
> +	status = nfsd4_decode_nl4_server(argp, &cn->cpn_dst);

Maybe this could be simplified to "return nfsd4_decode_nl4_server()" ?
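
A minimal sketch of the suggested simplification, using userspace stand-ins
rather than the real nfsd decoders. The names fake_args, decode_stateid() and
decode_nl4_server() are hypothetical and exist only for illustration. It shows
that returning the last decoder's status directly behaves the same as the
posted "check, then return status" tail:

```c
#include <stdint.h>

/* Hypothetical stand-in for __be32 status codes; 0 means success. */
typedef uint32_t status_t;

/* Hypothetical argument block: lets a caller choose what each decoder returns. */
struct fake_args {
	status_t stateid_status;
	status_t server_status;
};

/* Stand-ins for nfsd4_decode_stateid() / nfsd4_decode_nl4_server(). */
static status_t decode_stateid(struct fake_args *a) { return a->stateid_status; }
static status_t decode_nl4_server(struct fake_args *a) { return a->server_status; }

/* Shape of the function as posted: the final check is redundant. */
static status_t decode_copy_notify_posted(struct fake_args *a)
{
	status_t status;

	status = decode_stateid(a);
	if (status)
		return status;
	status = decode_nl4_server(a);
	if (status)
		return status;

	return status;
}

/* Simplified shape: the last call's status is the function's status. */
static status_t decode_copy_notify_simplified(struct fake_args *a)
{
	status_t status = decode_stateid(a);

	if (status)
		return status;
	return decode_nl4_server(a);
}
```

Both variants return the first non-zero status and zero otherwise, so the
change would be purely cosmetic.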

> +	if (status)
> +		return status;
> +
> +	return status;
> +}
> +
> +static __be32
>  nfsd4_decode_seek(struct nfsd4_compoundargs *argp, struct nfsd4_seek
> *seek)
>  {
>  	DECODE_HEAD;
> @@ -1947,7 +1963,7 @@ static __be32 nfsd4_decode_nl4_server(struct
> nfsd4_compoundargs *argp,
>  	/* new operations for NFSv4.2 */
>  	[OP_ALLOCATE]		= (nfsd4_dec)nfsd4_decode_fallocate,
>  	[OP_COPY]		= (nfsd4_dec)nfsd4_decode_copy,
> -	[OP_COPY_NOTIFY]	= (nfsd4_dec)nfsd4_decode_notsupp,
> +	[OP_COPY_NOTIFY]	= (nfsd4_dec)nfsd4_decode_copy_notify,
>  	[OP_DEALLOCATE]		= (nfsd4_dec)nfsd4_decode_fallocate,
>  	[OP_IO_ADVISE]		= (nfsd4_dec)nfsd4_decode_notsupp,
>  	[OP_LAYOUTERROR]	= (nfsd4_dec)nfsd4_decode_notsupp,
> @@ -4336,6 +4352,45 @@ static __be32 nfsd4_encode_readv(struct
> nfsd4_compoundres *resp,
>  }
>  
>  static __be32
> +nfsd42_encode_nl4_server(struct nfsd4_compoundres *resp, struct
> nl4_server *ns)
> +{
> +	struct xdr_stream *xdr = &resp->xdr;
> +	struct nfs42_netaddr *addr;
> +	__be32 *p;
> +
> +	p = xdr_reserve_space(xdr, 4);
> +	*p++ = cpu_to_be32(ns->nl4_type);
> +
> +	switch (ns->nl4_type) {
> +	case NL4_NETADDR:
> +		addr = &ns->u.nl4_addr;
> +
> +		/* netid_len, netid, uaddr_len, uaddr (port included
> +		 * in RPCBIND_MAXUADDRLEN)
> +		 */
> +		p = xdr_reserve_space(xdr,
> +			4 /* netid len */ +
> +			(XDR_QUADLEN(addr->netid_len) * 4) +
> +			4 /* uaddr len */ +
> +			(XDR_QUADLEN(addr->addr_len) * 4));
> +		if (!p)
> +			return nfserr_resource;
> +
> +		*p++ = cpu_to_be32(addr->netid_len);
> +		p = xdr_encode_opaque_fixed(p, addr->netid,
> +					    addr->netid_len);
> +		*p++ = cpu_to_be32(addr->addr_len);
> +		p = xdr_encode_opaque_fixed(p, addr->addr,
> +					addr->addr_len);
> +		break;
> +	default:
> +		WARN_ON(ns->nl4_type != NL4_NETADDR);
> +	}
> +
> +	return 0;
> +}
> +
> +static __be32
>  nfsd4_encode_copy(struct nfsd4_compoundres *resp, __be32 nfserr,
>  		  struct nfsd4_copy *copy)
>  {
> @@ -4369,6 +4424,44 @@ static __be32 nfsd4_encode_readv(struct
> nfsd4_compoundres *resp,
>  }
>  
>  static __be32
> +nfsd4_encode_copy_notify(struct nfsd4_compoundres *resp, __be32
> nfserr,
> +			 struct nfsd4_copy_notify *cn)
> +{
> +	struct xdr_stream *xdr = &resp->xdr;
> +	__be32 *p;
> +
> +	if (nfserr)
> +		return nfserr;
> +
> +	/* 8 sec, 4 nsec */
> +	p = xdr_reserve_space(xdr, 12);
> +	if (!p)
> +		return nfserr_resource;
> +
> +	/* cnr_lease_time */
> +	p = xdr_encode_hyper(p, cn->cpn_sec);
> +	*p++ = cpu_to_be32(cn->cpn_nsec);
> +
> +	/* cnr_stateid */
> +	nfserr = nfsd4_encode_stateid(xdr, &cn->cpn_cnr_stateid);
> +	if (nfserr)
> +		return nfserr;
> +
> +	/* cnr_src.nl_nsvr */
> +	p = xdr_reserve_space(xdr, 4);
> +	if (!p)
> +		return nfserr_resource;
> +
> +	*p++ = cpu_to_be32(1);
> +
> +	nfserr = nfsd42_encode_nl4_server(resp, &cn->cpn_src);

This could be simplified, too: "return nfsd42_encode_nl4_server()" 

Thanks,
Anna

> +	if (nfserr)
> +		return nfserr;
> +
> +	return nfserr;
> +}
> +
> +static __be32
>  nfsd4_encode_seek(struct nfsd4_compoundres *resp, __be32 nfserr,
>  		  struct nfsd4_seek *seek)
>  {
> @@ -4465,7 +4558,7 @@ static __be32 nfsd4_encode_readv(struct
> nfsd4_compoundres *resp,
>  	/* NFSv4.2 operations */
>  	[OP_ALLOCATE]		= (nfsd4_enc)nfsd4_encode_noop,
>  	[OP_COPY]		= (nfsd4_enc)nfsd4_encode_copy,
> -	[OP_COPY_NOTIFY]	= (nfsd4_enc)nfsd4_encode_noop,
> +	[OP_COPY_NOTIFY]	= (nfsd4_enc)nfsd4_encode_copy_notify,
>  	[OP_DEALLOCATE]		= (nfsd4_enc)nfsd4_encode_noop,
>  	[OP_IO_ADVISE]		= (nfsd4_enc)nfsd4_encode_noop,
>  	[OP_LAYOUTERROR]	= (nfsd4_enc)nfsd4_encode_noop,
> diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
> index 5da9cc3..106ed56 100644
> --- a/fs/nfsd/state.h
> +++ b/fs/nfsd/state.h
> @@ -95,6 +95,7 @@ struct nfs4_stid {
>  #define NFS4_REVOKED_DELEG_STID 16
>  #define NFS4_CLOSED_DELEG_STID 32
>  #define NFS4_LAYOUT_STID 64
> +	struct list_head	sc_cp_list;
>  	unsigned char		sc_type;
>  	stateid_t		sc_stateid;
>  	spinlock_t		sc_lock;
> @@ -103,6 +104,17 @@ struct nfs4_stid {
>  	void			(*sc_free)(struct nfs4_stid *);
>  };
>  
> +/* Keep a list of stateids issued by the COPY_NOTIFY, associate it
> with the
> + * parent OPEN/LOCK/DELEG stateid.
> + */
> +struct nfs4_cpntf_state {
> +	stateid_t		cp_stateid;
> +	struct list_head	cp_list;	/* per parent nfs4_stid */
> +	struct nfs4_stid	*cp_p_stid;	/* pointer to parent */
> +	bool			cp_active;	/* has the copy
> started */
> +	unsigned long		cp_timeout;	/* copy timeout */
> +};
> +
>  /*
>   * Represents a delegation stateid. The nfs4_client holds references
> to these
>   * and they are put when it is being destroyed or when the
> delegation is
> @@ -614,8 +626,10 @@ __be32 nfsd4_lookup_stateid(struct
> nfsd4_compound_state *cstate,
>  		     struct nfs4_stid **s, struct nfsd_net *nn);
>  struct nfs4_stid *nfs4_alloc_stid(struct nfs4_client *cl, struct
> kmem_cache *slab,
>  				  void (*sc_free)(struct nfs4_stid *));
> -int nfs4_init_cp_state(struct nfsd_net *nn, struct nfsd4_copy
> *copy);
> -void nfs4_free_cp_state(struct nfsd4_copy *copy);
> +int nfs4_init_copy_state(struct nfsd_net *nn, struct nfsd4_copy
> *copy);
> +void nfs4_free_copy_state(struct nfsd4_copy *copy);
> +struct nfs4_cpntf_state *nfs4_alloc_init_cpntf_state(struct nfsd_net
> *nn,
> +			struct nfs4_stid *p_stid);
>  void nfs4_unhash_stid(struct nfs4_stid *s);
>  void nfs4_put_stid(struct nfs4_stid *s);
>  void nfs4_inc_and_copy_stateid(stateid_t *dst, struct nfs4_stid
> *stid);
> diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h
> index 513c9ff..bade8e5 100644
> --- a/fs/nfsd/xdr4.h
> +++ b/fs/nfsd/xdr4.h
> @@ -568,6 +568,18 @@ struct nfsd4_offload_status {
>  	u32		status;
>  };
>  
> +struct nfsd4_copy_notify {
> +	/* request */
> +	stateid_t		cpn_src_stateid;
> +	struct nl4_server	cpn_dst;
> +
> +	/* response */
> +	stateid_t		cpn_cnr_stateid;
> +	u64			cpn_sec;
> +	u32			cpn_nsec;
> +	struct nl4_server	cpn_src;
> +};
> +
>  struct nfsd4_op {
>  	int					opnum;
>  	const struct nfsd4_operation *		opdesc;
> @@ -627,6 +639,7 @@ struct nfsd4_op {
>  		struct nfsd4_clone		clone;
>  		struct nfsd4_copy		copy;
>  		struct nfsd4_offload_status	offload_status;
> +		struct nfsd4_copy_notify	copy_notify;
>  		struct nfsd4_seek		seek;
>  	} u;
>  	struct nfs4_replay *			replay;



* Re: [PATCH v4 8/8] NFSD add nfs4 inter ssc to nfsd4_copy
  2019-07-08 19:23 ` [PATCH v4 8/8] NFSD add nfs4 inter ssc to nfsd4_copy Olga Kornievskaia
@ 2019-07-09 12:43   ` Anna Schumaker
  2019-07-09 15:53     ` Olga Kornievskaia
  0 siblings, 1 reply; 51+ messages in thread
From: Anna Schumaker @ 2019-07-09 12:43 UTC (permalink / raw)
  To: Olga Kornievskaia, bfields; +Cc: linux-nfs

Hi Olga,

On Mon, 2019-07-08 at 15:23 -0400, Olga Kornievskaia wrote:
> Given a universal address, mount the source server from the
> destination
> server.  Use an internal mount. Call the NFS client nfs42_ssc_open to
> obtain the NFS struct file suitable for nfsd_copy_range.
> 
> Ability to do "inter" server-to-server copy depends on the nfsd kernel
> parameter "inter_copy_offload_enable".
> 
> Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> ---
>  fs/nfsd/nfs4proc.c  | 291
> ++++++++++++++++++++++++++++++++++++++++++++++++----
>  fs/nfsd/nfs4state.c |  17 ++-
>  fs/nfsd/nfssvc.c    |   6 ++
>  fs/nfsd/state.h     |   4 +
>  fs/nfsd/xdr4.h      |   5 +
>  5 files changed, 299 insertions(+), 24 deletions(-)
> 
> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> index 1039528..caf046f 100644
> --- a/fs/nfsd/nfs4proc.c
> +++ b/fs/nfsd/nfs4proc.c
> @@ -1153,6 +1153,209 @@ void nfsd4_shutdown_copy(struct nfs4_client
> *clp)
>  	while ((copy = nfsd4_get_copy(clp)) != NULL)
>  		nfsd4_stop_copy(copy);
>  }
> +#ifdef CONFIG_NFSD_V4_2_INTER_SSC
> +
> +extern struct file *nfs42_ssc_open(struct vfsmount *ss_mnt,
> +				   struct nfs_fh *src_fh,
> +				   nfs4_stateid *stateid);
> +extern void nfs42_ssc_close(struct file *filep);
> +
> +extern void nfs_sb_deactive(struct super_block *sb);
> +
> +#define NFSD42_INTERSSC_MOUNTOPS "vers=4.2,addr=%s,sec=sys"
> +
> +/**
> + * Support one copy source server for now.
> + */
> +static __be32
> +nfsd4_interssc_connect(struct nl4_server *nss, struct svc_rqst
> *rqstp,
> +		       struct vfsmount **mount)
> +{
> +	struct file_system_type *type;
> +	struct vfsmount *ss_mnt;
> +	struct nfs42_netaddr *naddr;
> +	struct sockaddr_storage tmp_addr;
> +	size_t tmp_addrlen, match_netid_len = 3;
> +	char *startsep = "", *endsep = "", *match_netid = "tcp";
> +	char *ipaddr, *dev_name, *raw_data;
> +	int len, raw_len, status = -EINVAL;
> +
> +	naddr = &nss->u.nl4_addr;
> +	tmp_addrlen = rpc_uaddr2sockaddr(SVC_NET(rqstp), naddr->addr,
> +					 naddr->addr_len,
> +					 (struct sockaddr *)&tmp_addr,
> +					 sizeof(tmp_addr));
> +	if (tmp_addrlen == 0)
> +		goto out_err;
> +
> +	if (tmp_addr.ss_family == AF_INET6) {
> +		startsep = "[";
> +		endsep = "]";
> +		match_netid = "tcp6";
> +		match_netid_len = 4;
> +	}
> +
> +	if (naddr->netid_len != match_netid_len ||
> +		strncmp(naddr->netid, match_netid, naddr->netid_len))
> +		goto out_err;
> +
> +	/* Construct the raw data for the vfs_kern_mount call */
> +	len = RPC_MAX_ADDRBUFLEN + 1;
> +	ipaddr = kzalloc(len, GFP_KERNEL);
> +	if (!ipaddr)
> +		goto out_err;
> +
> +	rpc_ntop((struct sockaddr *)&tmp_addr, ipaddr, len);
> +
> +	/* 2 for ipv6 endsep and startsep. 3 for ":/" and trailing
> '/0'*/
> +
> +	raw_len = strlen(NFSD42_INTERSSC_MOUNTOPS) + strlen(ipaddr);
> +	raw_data = kzalloc(raw_len, GFP_KERNEL);
> +	if (!raw_data)
> +		goto out_free_ipaddr;
> +
> +	snprintf(raw_data, raw_len, NFSD42_INTERSSC_MOUNTOPS, ipaddr);
> +
> +	status = -ENODEV;
> +	type = get_fs_type("nfs");
> +	if (!type)
> +		goto out_free_rawdata;
> +
> +	/* Set the server:<export> for the vfs_kern_mount call */
> +	dev_name = kzalloc(len + 5, GFP_KERNEL);
> +	if (!dev_name)
> +		goto out_free_rawdata;
> +	snprintf(dev_name, len + 5, "%s%s%s:/", startsep, ipaddr,
> endsep);
> +
> +	/* Use an 'internal' mount: SB_KERNMOUNT -> MNT_INTERNAL */
> +	ss_mnt = vfs_kern_mount(type, SB_KERNMOUNT, dev_name,
> raw_data);
> +	module_put(type->owner);
> +	if (IS_ERR(ss_mnt))
> +		goto out_free_devname;
> +
> +	status = 0;
> +	*mount = ss_mnt;
> +
> +out_free_devname:
> +	kfree(dev_name);
> +out_free_rawdata:
> +	kfree(raw_data);
> +out_free_ipaddr:
> +	kfree(ipaddr);
> +out_err:
> +	return status;
> +}
> +
> +static void
> +nfsd4_interssc_disconnect(struct vfsmount *ss_mnt)
> +{
> +	nfs_sb_deactive(ss_mnt->mnt_sb);
> +	mntput(ss_mnt);
> +}
> +
> +/**
> + * nfsd4_setup_inter_ssc
> + *
> + * Verify COPY destination stateid.
> + * Connect to the source server with NFSv4.1.
> + * Create the source struct file for nfsd_copy_range.
> + * Called with COPY cstate:
> + *    SAVED_FH: source filehandle
> + *    CURRENT_FH: destination filehandle
> + *
> + * Returns errno (not nfserrxxx)
> + */
> +static __be32
> +nfsd4_setup_inter_ssc(struct svc_rqst *rqstp,
> +		      struct nfsd4_compound_state *cstate,
> +		      struct nfsd4_copy *copy, struct vfsmount **mount)
> +{
> +	struct svc_fh *s_fh = NULL;
> +	stateid_t *s_stid = &copy->cp_src_stateid;
> +	__be32 status = -EINVAL;
> +
> +	/* Verify the destination stateid and set dst struct file*/
> +	status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate-
> >current_fh,
> +					    &copy->cp_dst_stateid,
> +					    WR_STATE, &copy->file_dst,
> NULL,
> +					    NULL);
> +	if (status)
> +		goto out;
> +
> +	status = nfsd4_interssc_connect(&copy->cp_src, rqstp, mount);
> +	if (status)
> +		goto out;
> +
> +	s_fh = &cstate->save_fh;
> +
> +	copy->c_fh.size = s_fh->fh_handle.fh_size;
> +	memcpy(copy->c_fh.data, &s_fh->fh_handle.fh_base, copy-
> >c_fh.size);
> +	copy->stateid.seqid = s_stid->si_generation;
> +	memcpy(copy->stateid.other, (void *)&s_stid->si_opaque,
> +	       sizeof(stateid_opaque_t));
> +
> +	status = 0;
> +out:
> +	return status;
> +}
> +
> +static void
> +nfsd4_cleanup_inter_ssc(struct vfsmount *ss_mnt, struct file *src,
> +			struct file *dst)
> +{
> +	nfs42_ssc_close(src);
> +	fput(src);
> +	fput(dst);
> +	mntput(ss_mnt);
> +}
> +
> +#else /* CONFIG_NFSD_V4_2_INTER_SSC */
> +
> +static __be32
> +nfsd4_setup_inter_ssc(struct svc_rqst *rqstp,
> +		      struct nfsd4_compound_state *cstate,
> +		      struct nfsd4_copy *copy,
> +		      struct vfsmount **mount)
> +{
> +	*mount = NULL;
> +	return -EINVAL;
> +}
> +
> +static void
> +nfsd4_cleanup_inter_ssc(struct vfsmount *ss_mnt, struct file *src,
> +			struct file *dst)
> +{
> +}
> +
> +static void
> +nfsd4_interssc_disconnect(struct vfsmount *ss_mnt)
> +{
> +}
> +
> +static struct file *nfs42_ssc_open(struct vfsmount *ss_mnt,
> +				   struct nfs_fh *src_fh,
> +				   nfs4_stateid *stateid)
> +{
> +	return NULL;
> +}
> +#endif /* CONFIG_NFSD_V4_2_INTER_SSC */
> +
> +static __be32
> +nfsd4_setup_intra_ssc(struct svc_rqst *rqstp,
> +		      struct nfsd4_compound_state *cstate,
> +		      struct nfsd4_copy *copy)
> +{
> +	return nfsd4_verify_copy(rqstp, cstate, &copy->cp_src_stateid,
> +				 &copy->file_src, &copy-
> >cp_dst_stateid,
> +				 &copy->file_dst, NULL);
> +}
> +
> +static void
> +nfsd4_cleanup_intra_ssc(struct file *src, struct file *dst)
> +{
> +	fput(src);
> +	fput(dst);
> +}
>  
>  static void nfsd4_cb_offload_release(struct nfsd4_callback *cb)
>  {
> @@ -1217,12 +1420,16 @@ static __be32 nfsd4_do_copy(struct nfsd4_copy
> *copy, bool sync)
>  		status = nfs_ok;
>  	}
>  
> -	fput(copy->file_src);
> -	fput(copy->file_dst);
> +	if (!copy->cp_intra) /* Inter server SSC */
> +		nfsd4_cleanup_inter_ssc(copy->ss_mnt, copy->file_src,
> +					copy->file_dst);
> +	else
> +		nfsd4_cleanup_intra_ssc(copy->file_src, copy-
> >file_dst);
> +
>  	return status;
>  }
>  
> -static void dup_copy_fields(struct nfsd4_copy *src, struct
> nfsd4_copy *dst)
> +static int dup_copy_fields(struct nfsd4_copy *src, struct nfsd4_copy
> *dst)
>  {
>  	dst->cp_src_pos = src->cp_src_pos;
>  	dst->cp_dst_pos = src->cp_dst_pos;
> @@ -1232,8 +1439,17 @@ static void dup_copy_fields(struct nfsd4_copy
> *src, struct nfsd4_copy *dst)
>  	memcpy(&dst->fh, &src->fh, sizeof(src->fh));
>  	dst->cp_clp = src->cp_clp;
>  	dst->file_dst = get_file(src->file_dst);
> -	dst->file_src = get_file(src->file_src);
> +	dst->cp_intra = src->cp_intra;
> +	if (src->cp_intra) /* for inter, file_src doesn't exist yet */
> +		dst->file_src = get_file(src->file_src);
>  	memcpy(&dst->cp_stateid, &src->cp_stateid, sizeof(src-
> >cp_stateid));
> +	memcpy(&dst->cp_src, &src->cp_src, sizeof(struct nl4_server));
> +	memcpy(&dst->stateid, &src->stateid, sizeof(src->stateid));
> +	memcpy(&dst->c_fh, &src->c_fh, sizeof(src->c_fh));
> +	dst->ss_mnt = src->ss_mnt;
> +
> +	return 0;
> +
>  }
>  
>  static void cleanup_async_copy(struct nfsd4_copy *copy)
> @@ -1252,7 +1468,18 @@ static int nfsd4_do_async_copy(void *data)
>  	struct nfsd4_copy *copy = (struct nfsd4_copy *)data;
>  	struct nfsd4_copy *cb_copy;
>  
> +	if (!copy->cp_intra) { /* Inter server SSC */
> +		copy->file_src = nfs42_ssc_open(copy->ss_mnt, &copy-
> >c_fh,
> +					      &copy->stateid);
> +		if (IS_ERR(copy->file_src)) {
> +			copy->nfserr = nfserr_offload_denied;
> +			nfsd4_interssc_disconnect(copy->ss_mnt);
> +			goto do_callback;
> +		}
> +	}
> +
>  	copy->nfserr = nfsd4_do_copy(copy, 0);
> +do_callback:
>  	cb_copy = kzalloc(sizeof(struct nfsd4_copy), GFP_KERNEL);
>  	if (!cb_copy)
>  		goto out;
> @@ -1276,11 +1503,20 @@ static int nfsd4_do_async_copy(void *data)
>  	__be32 status;
>  	struct nfsd4_copy *async_copy = NULL;
>  
> -	status = nfsd4_verify_copy(rqstp, cstate, &copy-
> >cp_src_stateid,
> -				   &copy->file_src, &copy-
> >cp_dst_stateid,
> -				   &copy->file_dst, NULL);
> -	if (status)
> -		goto out;
> +	if (!copy->cp_intra) { /* Inter server SSC */
> +		if (!inter_copy_offload_enable || copy->cp_synchronous) 
> {
> +			status = nfserr_notsupp;
> +			goto out;
> +		}
> +		status = nfsd4_setup_inter_ssc(rqstp, cstate, copy,
> +					&copy->ss_mnt);
> +		if (status)
> +			return nfserr_offload_denied;
> +	} else {
> +		status = nfsd4_setup_intra_ssc(rqstp, cstate, copy);
> +		if (status)
> +			return status;
> +	}
>  
>  	copy->cp_clp = cstate->clp;
>  	memcpy(&copy->fh, &cstate->current_fh.fh_handle,
> @@ -1291,15 +1527,15 @@ static int nfsd4_do_async_copy(void *data)
>  		status = nfserrno(-ENOMEM);
>  		async_copy = kzalloc(sizeof(struct nfsd4_copy),
> GFP_KERNEL);
>  		if (!async_copy)
> -			goto out;
> -		if (!nfs4_init_copy_state(nn, copy)) {
> -			kfree(async_copy);
> -			goto out;
> -		}
> +			goto out_err;
> +		if (!nfs4_init_copy_state(nn, copy))
> +			goto out_err;
>  		refcount_set(&async_copy->refcount, 1);
>  		memcpy(&copy->cp_res.cb_stateid, &copy->cp_stateid,
>  			sizeof(copy->cp_stateid));
> -		dup_copy_fields(copy, async_copy);
> +		status = dup_copy_fields(copy, async_copy);
> +		if (status)
> +			goto out_err;
>  		async_copy->copy_task =
> kthread_create(nfsd4_do_async_copy,
>  				async_copy, "%s", "copy thread");
>  		if (IS_ERR(async_copy->copy_task))
> @@ -1310,13 +1546,17 @@ static int nfsd4_do_async_copy(void *data)
>  		spin_unlock(&async_copy->cp_clp->async_lock);
>  		wake_up_process(async_copy->copy_task);
>  		status = nfs_ok;
> -	} else
> +	} else {
>  		status = nfsd4_do_copy(copy, 1);
> +	}
>  out:
>  	return status;
>  out_err:
>  	cleanup_async_copy(async_copy);
> -	goto out;
> +	status = nfserrno(-ENOMEM);
> +	if (!copy->cp_intra)
> +		nfsd4_interssc_disconnect(copy->ss_mnt);
> +	goto out_err;

Won't this just loop going to the out_err label?
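
To illustrate the concern, here is a small userspace model of the error path.
It is not the nfsd code: model_copy, start_copy() and the fail_alloc switch
are hypothetical names made up for this sketch. The error label does its
cleanup and then jumps to the common exit, presumably what the patch
intended, instead of jumping back to itself:

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model of the fields that matter on the error path. */
struct model_copy {
	bool intra;    /* models copy->cp_intra */
	bool mounted;  /* models a live copy->ss_mnt */
};

static int start_copy(struct model_copy *copy, bool fail_alloc)
{
	int status = 0;
	/* fail_alloc simulates kzalloc() returning NULL */
	void *async = fail_alloc ? NULL : malloc(16);

	if (!async)
		goto out_err;
	free(async);  /* the real code would hand this to the copy thread */
out:
	return status;
out_err:
	status = -ENOMEM;
	if (!copy->intra)
		copy->mounted = false;  /* models nfsd4_interssc_disconnect() */
	goto out;  /* "goto out_err" here would spin forever */
}
```

With "goto out_err" as the last statement, control would re-enter the label
and repeat the cleanup indefinitely; "goto out" lets the function return the
error.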

Thanks,
Anna

>  }
>  
>  struct nfsd4_copy *
> @@ -1342,15 +1582,24 @@ struct nfsd4_copy *
>  		     union nfsd4_op_u *u)
>  {
>  	struct nfsd4_offload_status *os = &u->offload_status;
> -	__be32 status = 0;
> +	__be32 status = nfserr_bad_stateid;
>  	struct nfsd4_copy *copy;
>  	struct nfs4_client *clp = cstate->clp;
> +	struct nfsd_net *nn = net_generic(SVC_NET(rqstp), nfsd_net_id);
>  
>  	copy = find_async_copy(clp, &os->stateid);
> -	if (copy)
> +	if (!copy) {
> +		struct nfs4_cpntf_state *cps = NULL;
> +
> +		status = find_internal_cpntf_state(nn, &os->stateid,
> &cps);
> +		if (status)
> +			return status;
> +		if (cps) {
> +			free_cpntf_state(nn, &os->stateid, cps);
> +			return nfs_ok;
> +		}
> +	} else
>  		nfsd4_stop_copy(copy);
> -	else
> -		status = nfserr_bad_stateid;
>  
>  	return status;
>  }
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index d7f4b96..c1a0695 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -5232,12 +5232,23 @@ static __be32 nfsd4_validate_stateid(struct
> nfs4_client *cl, stateid_t *stateid)
>  
>  	return 0;
>  }
> +void free_cpntf_state(struct nfsd_net *nn, stateid_t *st,
> +		      struct nfs4_cpntf_state *cps)
> +{
> +	spin_lock(&nn->s2s_cp_lock);
> +	list_del(&cps->cp_list);
> +	idr_remove(&nn->s2s_cp_stateids, cps-
> >cp_stateid.si_opaque.so_id);
> +	nfs4_put_stid(cps->cp_p_stid);
> +	kfree(cps);
> +	spin_unlock(&nn->s2s_cp_lock);
> +}
> +
>  /*
>   * A READ from an inter server to server COPY will have a
>   * copy stateid. Return the parent nfs4_stid.
>   */
> -static __be32 _find_cpntf_state(struct nfsd_net *nn, stateid_t *st,
> -		     struct nfs4_cpntf_state **cps)
> +__be32 find_internal_cpntf_state(struct nfsd_net *nn, stateid_t *st,
> +				 struct nfs4_cpntf_state **cps)
>  {
>  	struct nfs4_cpntf_state *state = NULL;
>  
> @@ -5260,7 +5271,7 @@ static __be32 find_cpntf_state(struct nfsd_net
> *nn, stateid_t *st,
>  	__be32 status;
>  	struct nfs4_cpntf_state *cps = NULL;
>  
> -	status = _find_cpntf_state(nn, st, &cps);
> +	status = find_internal_cpntf_state(nn, st, &cps);
>  	if (status)
>  		return status;
>  
> diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
> index 18d94ea..033bfcb 100644
> --- a/fs/nfsd/nfssvc.c
> +++ b/fs/nfsd/nfssvc.c
> @@ -30,6 +30,12 @@
>  
>  #define NFSDDBG_FACILITY	NFSDDBG_SVC
>  
> +bool inter_copy_offload_enable;
> +EXPORT_SYMBOL_GPL(inter_copy_offload_enable);
> +module_param(inter_copy_offload_enable, bool, 0644);
> +MODULE_PARM_DESC(inter_copy_offload_enable,
> +		 "Enable inter server to server copy offload. Default:
> false");
> +
>  extern struct svc_program	nfsd_program;
>  static int			nfsd(void *vrqstp);
>  #if defined(CONFIG_NFSD_V2_ACL) || defined(CONFIG_NFSD_V3_ACL)
> diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
> index 106ed56..7026e2a 100644
> --- a/fs/nfsd/state.h
> +++ b/fs/nfsd/state.h
> @@ -659,6 +659,10 @@ extern struct nfs4_client_reclaim
> *nfs4_client_to_reclaim(struct xdr_netobj name
>  extern void nfs4_put_copy(struct nfsd4_copy *copy);
>  extern struct nfsd4_copy *
>  find_async_copy(struct nfs4_client *clp, stateid_t *staetid);
> +extern __be32 find_internal_cpntf_state(struct nfsd_net *nn,
> stateid_t *st,
> +				struct nfs4_cpntf_state **cps);
> +extern void free_cpntf_state(struct nfsd_net *nn, stateid_t *st,
> +		      struct nfs4_cpntf_state *cps);
>  static inline void get_nfs4_file(struct nfs4_file *fi)
>  {
>  	refcount_inc(&fi->fi_ref);
> diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h
> index fbd18d6..bb2f8e5 100644
> --- a/fs/nfsd/xdr4.h
> +++ b/fs/nfsd/xdr4.h
> @@ -547,7 +547,12 @@ struct nfsd4_copy {
>  	struct task_struct	*copy_task;
>  	refcount_t		refcount;
>  	bool			stopped;
> +
> +	struct vfsmount		*ss_mnt;
> +	struct nfs_fh		c_fh;
> +	nfs4_stateid		stateid;
>  };
> +extern bool inter_copy_offload_enable;
>  
>  struct nfsd4_seek {
>  	/* request */



* Re: [PATCH v4 0/8] server-side support for "inter" SSC copy
  2019-07-09  3:53 ` [PATCH v4 0/8] server-side support for "inter" SSC copy bfields
@ 2019-07-09 15:47   ` Olga Kornievskaia
  2019-07-17 18:05     ` Olga Kornievskaia
  0 siblings, 1 reply; 51+ messages in thread
From: Olga Kornievskaia @ 2019-07-09 15:47 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: J. Bruce Fields, linux-nfs

Amir's patches have been in the linux-next xfs tree and will go into
5.3. I would like for NFS patches to go into 5.4 (I'm assuming hoping
for 5.3 is unrealistic).

On Mon, Jul 8, 2019 at 11:53 PM J. Bruce Fields <bfields@fieldses.org> wrote:
>
> Thanks for resending.  What's the status of Amir's series?  I guess I've
> been using that as an excuse to put off reviewing these, but I really
> should anyway....
>
> --b.
>
> On Mon, Jul 08, 2019 at 03:23:44PM -0400, Olga Kornievskaia wrote:
> > This patch series adds support for NFSv4.2 copy offload feature
> > allowing copy between two different NFS servers.
> >
> > This functionality depends on the VFS ability to support generic
> > copy_file_range() where a copy is done between an NFS file and
> > a local file system. This is on top of Amir's VFS generic copy
> > offload series.
> >
> > This feature is enabled by the kernel module parameter --
> > inter_copy_offload_enable -- and by default is disabled. There is
> > also a kernel compile configuration of NFSD_V4_2_INTER_SSC that
> > adds dependency on the NFS client side functions called from the
> > server.
> >
> > These patches work on top of existing async intra copy offload
> > patches. For the "inter" SSC, the implementation only supports
> > asynchronous inter copy.
> >
> > On the source server, upon receiving a COPY_NOTIFY, it generates a
> > unique stateid that's kept in the global list. Upon receiving a READ
> > with a stateid, the code checks the normal list of open stateid and
> > now additionally, it'll check the copy state list as well before
> > deciding to either fail with BAD_STATEID or find one that matches.
> > The stored stateid is only valid to be used for the first time
> > with a choosen lease period (90s currently). When the source server
> > received an OFFLOAD_CANCEL, it will remove the stateid from the
> > global list. Otherwise, the copy stateid is removed upon the removal
> > of its "parent" stateid (open/lock/delegation stateid).
> >
> > On the destination server, upon receiving a COPY request, the server
> > establishes the necessary clientid/session with the source server.
> > It calls into the NFS client code to establish the necessary
> > open stateid, filehandle, file description (without doing an NFS open).
> > Then the server calls into copy_file_range() to perform the copy
> > where the source file will issue NFS READs and then do local file
> > system writes (this depends on the VFS ability to do cross device
> > copy_file_range()).
> >
> > v4:
> > --- allowing for synchronous inter server-to-server copy
> > --- added missing offload_cancel on the source server
> >
> > Already presented numbers for performance improvement for large
> > file transfer but here are times for copying linux kernel tree
> > (which is mostly small files):
> > -- regular cp 6m1s (intra)
> > -- copy offload cp 4m11s (intra)
> >    -- benefit of using copy offload with small copies using sync copy
> > -- regular cp 6m9s (inter)
> > -- copy offload cp 6m3s (inter)
> >    -- same performance as traditional, as for most files copy offload
> > falls back to a traditional copy
> >
> > Olga Kornievskaia (8):
> >   NFSD fill-in netloc4 structure
> >   NFSD add ca_source_server<> to COPY
> >   NFSD return nfs4_stid in nfs4_preprocess_stateid_op
> >   NFSD add COPY_NOTIFY operation
> >   NFSD check stateids against copy stateids
> >   NFSD generalize nfsd4_compound_state flag names
> >   NFSD: allow inter server COPY to have a STALE source server fh
> >   NFSD add nfs4 inter ssc to nfsd4_copy
> >
> >  fs/nfsd/Kconfig     |  10 ++
> >  fs/nfsd/nfs4proc.c  | 434 +++++++++++++++++++++++++++++++++++++++++++++++-----
> >  fs/nfsd/nfs4state.c | 135 ++++++++++++++--
> >  fs/nfsd/nfs4xdr.c   | 172 ++++++++++++++++++++-
> >  fs/nfsd/nfsd.h      |  32 ++++
> >  fs/nfsd/nfsfh.h     |   5 +-
> >  fs/nfsd/nfssvc.c    |   6 +
> >  fs/nfsd/state.h     |  25 ++-
> >  fs/nfsd/xdr4.h      |  37 ++++-
> >  9 files changed, 790 insertions(+), 66 deletions(-)
> >
> > --
> > 1.8.3.1
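
The source-server behaviour described in the cover letter (issue a copy
stateid on COPY_NOTIFY, accept it on READ only if its first use falls
within the lease period, drop it on OFFLOAD_CANCEL or when the parent
stateid goes away) can be modelled in user space roughly as below. This
is only an illustrative sketch, not the nfsd code: every name here
(cpntf_table, cpntf_add, cpntf_lookup, cpntf_cancel, cpntf_drop_parent)
is hypothetical, time is passed in as plain seconds, and a fixed array
stands in for the idr and per-stateid list used by the patches. Treating
later READs of an already started copy as valid is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CPNTF_MAX   8    /* hypothetical table size */
#define LEASE_SECS  90   /* lease period mentioned in the cover letter */

enum lookup_status { LOOKUP_OK, LOOKUP_BAD_STATEID };

struct cpntf_entry {
	bool     in_use;
	bool     active;     /* set once the first READ has used the stateid */
	uint32_t id;         /* stands in for the copy stateid */
	uint32_t parent_id;  /* open/lock/delegation stateid it hangs off */
	uint64_t expires;    /* deadline for the first use, in seconds */
};

struct cpntf_table {
	struct cpntf_entry e[CPNTF_MAX];
	uint32_t next_id;
};

/* COPY_NOTIFY: hand out a new copy stateid tied to a parent stateid.
 * Returns 0 when the table is full. */
static uint32_t cpntf_add(struct cpntf_table *t, uint32_t parent_id,
			  uint64_t now)
{
	assert(t != NULL);
	for (size_t i = 0; i < CPNTF_MAX; i++) {
		if (!t->e[i].in_use) {
			t->e[i] = (struct cpntf_entry){
				.in_use = true, .active = false,
				.id = ++t->next_id, .parent_id = parent_id,
				.expires = now + LEASE_SECS,
			};
			return t->e[i].id;
		}
	}
	return 0;
}

/* READ: consulted after the normal stateid lookup has missed.
 * Assumption: only the first use is checked against the deadline. */
static enum lookup_status cpntf_lookup(struct cpntf_table *t, uint32_t id,
				       uint64_t now)
{
	for (size_t i = 0; i < CPNTF_MAX; i++) {
		struct cpntf_entry *c = &t->e[i];

		if (!c->in_use || c->id != id)
			continue;
		if (!c->active && now > c->expires)
			return LOOKUP_BAD_STATEID;
		c->active = true;
		return LOOKUP_OK;
	}
	return LOOKUP_BAD_STATEID;
}

/* OFFLOAD_CANCEL: drop a single copy stateid. */
static void cpntf_cancel(struct cpntf_table *t, uint32_t id)
{
	for (size_t i = 0; i < CPNTF_MAX; i++)
		if (t->e[i].in_use && t->e[i].id == id)
			t->e[i].in_use = false;
}

/* Parent stateid removal: drop every copy stateid issued against it. */
static void cpntf_drop_parent(struct cpntf_table *t, uint32_t parent_id)
{
	for (size_t i = 0; i < CPNTF_MAX; i++)
		if (t->e[i].in_use && t->e[i].parent_id == parent_id)
			t->e[i].in_use = false;
}
```

With a zero-initialised table, a stateid added at t=1000 is accepted by
a lookup at t=1050 and rejected if its first lookup comes at t=1091.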

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v4 4/8] NFSD add COPY_NOTIFY operation
  2019-07-09 12:34   ` Anna Schumaker
@ 2019-07-09 15:51     ` Olga Kornievskaia
  0 siblings, 0 replies; 51+ messages in thread
From: Olga Kornievskaia @ 2019-07-09 15:51 UTC (permalink / raw)
  To: Anna Schumaker; +Cc: J. Bruce Fields, linux-nfs

On Tue, Jul 9, 2019 at 8:34 AM Anna Schumaker <schumaker.anna@gmail.com> wrote:
>
> Hi Olga,
>
> On Mon, 2019-07-08 at 15:23 -0400, Olga Kornievskaia wrote:
> > From: Olga Kornievskaia <kolga@netapp.com>
> >
> > Introducing the COPY_NOTIFY operation.
> >
> > Create a new unique stateid that will keep track of the copy
> > state and the upcoming READs that will use that stateid. Keep
> > it in the list associated with parent stateid.
> >
> > Return single netaddr to advertise to the copy.
> >
> > Signed-off-by: Andy Adamson <andros@netapp.com>
> > Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> > ---
> >  fs/nfsd/nfs4proc.c  | 71 +++++++++++++++++++++++++++++++++++----
> >  fs/nfsd/nfs4state.c | 64 +++++++++++++++++++++++++++++++----
> >  fs/nfsd/nfs4xdr.c   | 97
> > +++++++++++++++++++++++++++++++++++++++++++++++++++--
> >  fs/nfsd/state.h     | 18 ++++++++--
> >  fs/nfsd/xdr4.h      | 13 +++++++
> >  5 files changed, 247 insertions(+), 16 deletions(-)
> >
> > diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> > index cfd8767..c39fa72 100644
> > --- a/fs/nfsd/nfs4proc.c
> > +++ b/fs/nfsd/nfs4proc.c
> > @@ -37,6 +37,7 @@
> >  #include <linux/falloc.h>
> >  #include <linux/slab.h>
> >  #include <linux/kthread.h>
> > +#include <linux/sunrpc/addr.h>
> >
> >  #include "idmap.h"
> >  #include "cache.h"
> > @@ -1033,7 +1034,8 @@ static __be32 nfsd4_do_lookupp(struct svc_rqst
> > *rqstp, struct svc_fh *fh)
> >  static __be32
> >  nfsd4_verify_copy(struct svc_rqst *rqstp, struct
> > nfsd4_compound_state *cstate,
> >                 stateid_t *src_stateid, struct file **src,
> > -               stateid_t *dst_stateid, struct file **dst)
> > +               stateid_t *dst_stateid, struct file **dst,
> > +               struct nfs4_stid **stid)
> >  {
> >       __be32 status;
> >
> > @@ -1050,7 +1052,7 @@ static __be32 nfsd4_do_lookupp(struct svc_rqst
> > *rqstp, struct svc_fh *fh)
> >
> >       status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate-
> > >current_fh,
> >                                           dst_stateid, WR_STATE, dst,
> > NULL,
> > -                                         NULL);
> > +                                         stid);
> >       if (status) {
> >               dprintk("NFSD: %s: couldn't process dst stateid!\n",
> > __func__);
> >               goto out_put_src;
> > @@ -1081,7 +1083,7 @@ static __be32 nfsd4_do_lookupp(struct svc_rqst
> > *rqstp, struct svc_fh *fh)
> >       __be32 status;
> >
> >       status = nfsd4_verify_copy(rqstp, cstate, &clone-
> > >cl_src_stateid, &src,
> > -                                &clone->cl_dst_stateid, &dst);
> > +                                &clone->cl_dst_stateid, &dst, NULL);
> >       if (status)
> >               goto out;
> >
> > @@ -1228,7 +1230,7 @@ static void dup_copy_fields(struct nfsd4_copy
> > *src, struct nfsd4_copy *dst)
> >
> >  static void cleanup_async_copy(struct nfsd4_copy *copy)
> >  {
> > -     nfs4_free_cp_state(copy);
> > +     nfs4_free_copy_state(copy);
> >       fput(copy->file_dst);
> >       fput(copy->file_src);
> >       spin_lock(&copy->cp_clp->async_lock);
> > @@ -1268,7 +1270,7 @@ static int nfsd4_do_async_copy(void *data)
> >
> >       status = nfsd4_verify_copy(rqstp, cstate, &copy-
> > >cp_src_stateid,
> >                                  &copy->file_src, &copy-
> > >cp_dst_stateid,
> > -                                &copy->file_dst);
> > +                                &copy->file_dst, NULL);
> >       if (status)
> >               goto out;
> >
> > @@ -1282,7 +1284,7 @@ static int nfsd4_do_async_copy(void *data)
> >               async_copy = kzalloc(sizeof(struct nfsd4_copy),
> > GFP_KERNEL);
> >               if (!async_copy)
> >                       goto out;
> > -             if (!nfs4_init_cp_state(nn, copy)) {
> > +             if (!nfs4_init_copy_state(nn, copy)) {
> >                       kfree(async_copy);
> >                       goto out;
> >               }
> > @@ -1346,6 +1348,42 @@ struct nfsd4_copy *
> >  }
> >
> >  static __be32
> > +nfsd4_copy_notify(struct svc_rqst *rqstp, struct
> > nfsd4_compound_state *cstate,
> > +               union nfsd4_op_u *u)
> > +{
> > +     struct nfsd4_copy_notify *cn = &u->copy_notify;
> > +     __be32 status;
> > +     struct nfsd_net *nn = net_generic(SVC_NET(rqstp), nfsd_net_id);
> > +     struct nfs4_stid *stid;
> > +     struct nfs4_cpntf_state *cps;
> > +
> > +     status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate-
> > >current_fh,
> > +                                     &cn->cpn_src_stateid, RD_STATE,
> > NULL,
> > +                                     NULL, &stid);
> > +     if (status)
> > +             return status;
> > +
> > +     cn->cpn_sec = nn->nfsd4_lease;
> > +     cn->cpn_nsec = 0;
> > +
> > +     status = nfserrno(-ENOMEM);
> > +     cps = nfs4_alloc_init_cpntf_state(nn, stid);
> > +     if (!cps)
> > +             return status;
> > +     memcpy(&cn->cpn_cnr_stateid, &cps->cp_stateid,
> > sizeof(stateid_t));
> > +
> > +     /* For now, only return one server address in cpn_src, the
> > +      * address used by the client to connect to this server.
> > +      */
> > +     cn->cpn_src.nl4_type = NL4_NETADDR;
> > +     status = nfsd4_set_netaddr((struct sockaddr *)&rqstp->rq_daddr,
> > +                              &cn->cpn_src.u.nl4_addr);
> > +     WARN_ON_ONCE(status);
> > +
> > +     return status;
> > +}
> > +
> > +static __be32
> >  nfsd4_fallocate(struct svc_rqst *rqstp, struct nfsd4_compound_state
> > *cstate,
> >               struct nfsd4_fallocate *fallocate, int flags)
> >  {
> > @@ -2298,6 +2336,21 @@ static inline u32
> > nfsd4_offload_status_rsize(struct svc_rqst *rqstp,
> >               1 /* osr_complete<1> optional 0 for now */) *
> > sizeof(__be32);
> >  }
> >
> > +static inline u32 nfsd4_copy_notify_rsize(struct svc_rqst *rqstp,
> > +                                     struct nfsd4_op *op)
> > +{
> > +     return (op_encode_hdr_size +
> > +             3 /* cnr_lease_time */ +
> > +             1 /* We support one cnr_source_server */ +
> > +             1 /* cnr_stateid seq */ +
> > +             op_encode_stateid_maxsz /* cnr_stateid */ +
> > +             1 /* num cnr_source_server*/ +
> > +             1 /* nl4_type */ +
> > +             1 /* nl4 size */ +
> > +             XDR_QUADLEN(NFS4_OPAQUE_LIMIT) /*nl4_loc + nl4_loc_sz
> > */)
> > +             * sizeof(__be32);
> > +}
> > +
> >  #ifdef CONFIG_NFSD_PNFS
> >  static inline u32 nfsd4_getdeviceinfo_rsize(struct svc_rqst *rqstp,
> > struct nfsd4_op *op)
> >  {
> > @@ -2722,6 +2775,12 @@ static inline u32 nfsd4_seek_rsize(struct
> > svc_rqst *rqstp, struct nfsd4_op *op)
> >               .op_name = "OP_OFFLOAD_CANCEL",
> >               .op_rsize_bop = nfsd4_only_status_rsize,
> >       },
> > +     [OP_COPY_NOTIFY] = {
> > +             .op_func = nfsd4_copy_notify,
> > +             .op_flags = OP_MODIFIES_SOMETHING,
> > +             .op_name = "OP_COPY_NOTIFY",
> > +             .op_rsize_bop = nfsd4_copy_notify_rsize,
> > +     },
> >  };
> >
> >  /**
> > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> > index 05c0295..2555eb9 100644
> > --- a/fs/nfsd/nfs4state.c
> > +++ b/fs/nfsd/nfs4state.c
> > @@ -707,6 +707,7 @@ struct nfs4_stid *nfs4_alloc_stid(struct
> > nfs4_client *cl, struct kmem_cache *sla
> >       /* Will be incremented before return to client: */
> >       refcount_set(&stid->sc_count, 1);
> >       spin_lock_init(&stid->sc_lock);
> > +     INIT_LIST_HEAD(&stid->sc_cp_list);
> >
> >       /*
> >        * It shouldn't be a problem to reuse an opaque stateid value.
> > @@ -726,24 +727,53 @@ struct nfs4_stid *nfs4_alloc_stid(struct
> > nfs4_client *cl, struct kmem_cache *sla
> >  /*
> >   * Create a unique stateid_t to represent each COPY.
> >   */
> > -int nfs4_init_cp_state(struct nfsd_net *nn, struct nfsd4_copy *copy)
> > +static int nfs4_init_cp_state(struct nfsd_net *nn, void *ptr,
> > stateid_t *stid)
> >  {
> >       int new_id;
> >
> >       idr_preload(GFP_KERNEL);
> >       spin_lock(&nn->s2s_cp_lock);
> > -     new_id = idr_alloc_cyclic(&nn->s2s_cp_stateids, copy, 0, 0,
> > GFP_NOWAIT);
> > +     new_id = idr_alloc_cyclic(&nn->s2s_cp_stateids, ptr, 0, 0,
> > GFP_NOWAIT);
> >       spin_unlock(&nn->s2s_cp_lock);
> >       idr_preload_end();
> >       if (new_id < 0)
> >               return 0;
> > -     copy->cp_stateid.si_opaque.so_id = new_id;
> > -     copy->cp_stateid.si_opaque.so_clid.cl_boot = nn->boot_time;
> > -     copy->cp_stateid.si_opaque.so_clid.cl_id = nn->s2s_cp_cl_id;
> > +     stid->si_opaque.so_id = new_id;
> > +     stid->si_opaque.so_clid.cl_boot = nn->boot_time;
> > +     stid->si_opaque.so_clid.cl_id = nn->s2s_cp_cl_id;
> >       return 1;
> >  }
> >
> > -void nfs4_free_cp_state(struct nfsd4_copy *copy)
> > +int nfs4_init_copy_state(struct nfsd_net *nn, struct nfsd4_copy
> > *copy)
> > +{
> > +     return nfs4_init_cp_state(nn, copy, &copy->cp_stateid);
> > +}
> > +
> > +struct nfs4_cpntf_state *nfs4_alloc_init_cpntf_state(struct nfsd_net
> > *nn,
> > +                                                  struct nfs4_stid
> > *p_stid)
> > +{
> > +     struct nfs4_cpntf_state *cps;
> > +
> > +     cps = kzalloc(sizeof(struct nfs4_cpntf_state), GFP_KERNEL);
> > +     if (!cps)
> > +             return NULL;
> > +     if (!nfs4_init_cp_state(nn, cps, &cps->cp_stateid))
> > +             goto out_free;
> > +     cps->cp_p_stid = p_stid;
> > +     cps->cp_active = false;
> > +     cps->cp_timeout = jiffies + (nn->nfsd4_lease * HZ);
> > +     INIT_LIST_HEAD(&cps->cp_list);
> > +     spin_lock(&nn->s2s_cp_lock);
> > +     list_add(&cps->cp_list, &p_stid->sc_cp_list);
> > +     spin_unlock(&nn->s2s_cp_lock);
> > +
> > +     return cps;
> > +out_free:
> > +     kfree(cps);
> > +     return NULL;
> > +}
> > +
> > +void nfs4_free_copy_state(struct nfsd4_copy *copy)
> >  {
> >       struct nfsd_net *nn;
> >
> > @@ -753,6 +783,27 @@ void nfs4_free_cp_state(struct nfsd4_copy *copy)
> >       spin_unlock(&nn->s2s_cp_lock);
> >  }
> >
> > +static void nfs4_free_cpntf_statelist(struct net *net, struct
> > nfs4_stid *stid)
> > +{
> > +     struct nfs4_cpntf_state *cps;
> > +     struct nfsd_net *nn;
> > +
> > +     nn = net_generic(net, nfsd_net_id);
> > +
> > +     might_sleep();
> > +
> > +     spin_lock(&nn->s2s_cp_lock);
> > +     while (!list_empty(&stid->sc_cp_list)) {
> > +             cps = list_first_entry(&stid->sc_cp_list,
> > +                                    struct nfs4_cpntf_state,
> > cp_list);
> > +             list_del(&cps->cp_list);
> > +             idr_remove(&nn->s2s_cp_stateids,
> > +                        cps->cp_stateid.si_opaque.so_id);
> > +             kfree(cps);
> > +     }
> > +     spin_unlock(&nn->s2s_cp_lock);
> > +}
> > +
> >  static struct nfs4_ol_stateid * nfs4_alloc_open_stateid(struct
> > nfs4_client *clp)
> >  {
> >       struct nfs4_stid *stid;
> > @@ -901,6 +952,7 @@ static void block_delegations(struct knfsd_fh
> > *fh)
> >       }
> >       idr_remove(&clp->cl_stateids, s->sc_stateid.si_opaque.so_id);
> >       spin_unlock(&clp->cl_lock);
> > +     nfs4_free_cpntf_statelist(clp->net, s);
> >       s->sc_free(s);
> >       if (fp)
> >               put_nfs4_file(fp);
> > diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
> > index 15f53bb..ed37528 100644
> > --- a/fs/nfsd/nfs4xdr.c
> > +++ b/fs/nfsd/nfs4xdr.c
> > @@ -1847,6 +1847,22 @@ static __be32 nfsd4_decode_nl4_server(struct
> > nfsd4_compoundargs *argp,
> >  }
> >
> >  static __be32
> > +nfsd4_decode_copy_notify(struct nfsd4_compoundargs *argp,
> > +                      struct nfsd4_copy_notify *cn)
> > +{
> > +     int status;
> > +
> > +     status = nfsd4_decode_stateid(argp, &cn->cpn_src_stateid);
> > +     if (status)
> > +             return status;
> > +     status = nfsd4_decode_nl4_server(argp, &cn->cpn_dst);
>
> Maybe this could be simplified to "return nfsd4_decode_nl4_server()" ?
>
> > +     if (status)
> > +             return status;
> > +
> > +     return status;
> > +}
> > +
> > +static __be32
> >  nfsd4_decode_seek(struct nfsd4_compoundargs *argp, struct nfsd4_seek
> > *seek)
> >  {
> >       DECODE_HEAD;
> > @@ -1947,7 +1963,7 @@ static __be32 nfsd4_decode_nl4_server(struct
> > nfsd4_compoundargs *argp,
> >       /* new operations for NFSv4.2 */
> >       [OP_ALLOCATE]           = (nfsd4_dec)nfsd4_decode_fallocate,
> >       [OP_COPY]               = (nfsd4_dec)nfsd4_decode_copy,
> > -     [OP_COPY_NOTIFY]        = (nfsd4_dec)nfsd4_decode_notsupp,
> > +     [OP_COPY_NOTIFY]        = (nfsd4_dec)nfsd4_decode_copy_notify,
> >       [OP_DEALLOCATE]         = (nfsd4_dec)nfsd4_decode_fallocate,
> >       [OP_IO_ADVISE]          = (nfsd4_dec)nfsd4_decode_notsupp,
> >       [OP_LAYOUTERROR]        = (nfsd4_dec)nfsd4_decode_notsupp,
> > @@ -4336,6 +4352,45 @@ static __be32 nfsd4_encode_readv(struct
> > nfsd4_compoundres *resp,
> >  }
> >
> >  static __be32
> > +nfsd42_encode_nl4_server(struct nfsd4_compoundres *resp, struct
> > nl4_server *ns)
> > +{
> > +     struct xdr_stream *xdr = &resp->xdr;
> > +     struct nfs42_netaddr *addr;
> > +     __be32 *p;
> > +
> > +     p = xdr_reserve_space(xdr, 4);
> > +     *p++ = cpu_to_be32(ns->nl4_type);
> > +
> > +     switch (ns->nl4_type) {
> > +     case NL4_NETADDR:
> > +             addr = &ns->u.nl4_addr;
> > +
> > +             /* netid_len, netid, uaddr_len, uaddr (port included
> > +              * in RPCBIND_MAXUADDRLEN)
> > +              */
> > +             p = xdr_reserve_space(xdr,
> > +                     4 /* netid len */ +
> > +                     (XDR_QUADLEN(addr->netid_len) * 4) +
> > +                     4 /* uaddr len */ +
> > +                     (XDR_QUADLEN(addr->addr_len) * 4));
> > +             if (!p)
> > +                     return nfserr_resource;
> > +
> > +             *p++ = cpu_to_be32(addr->netid_len);
> > +             p = xdr_encode_opaque_fixed(p, addr->netid,
> > +                                         addr->netid_len);
> > +             *p++ = cpu_to_be32(addr->addr_len);
> > +             p = xdr_encode_opaque_fixed(p, addr->addr,
> > +                                     addr->addr_len);
> > +             break;
> > +     default:
> > +             WARN_ON(ns->nl4_type != NL4_NETADDR);
> > +     }
> > +
> > +     return 0;
> > +}
> > +
> > +static __be32
> >  nfsd4_encode_copy(struct nfsd4_compoundres *resp, __be32 nfserr,
> >                 struct nfsd4_copy *copy)
> >  {
> > @@ -4369,6 +4424,44 @@ static __be32 nfsd4_encode_readv(struct
> > nfsd4_compoundres *resp,
> >  }
> >
> >  static __be32
> > +nfsd4_encode_copy_notify(struct nfsd4_compoundres *resp, __be32
> > nfserr,
> > +                      struct nfsd4_copy_notify *cn)
> > +{
> > +     struct xdr_stream *xdr = &resp->xdr;
> > +     __be32 *p;
> > +
> > +     if (nfserr)
> > +             return nfserr;
> > +
> > +     /* 8 sec, 4 nsec */
> > +     p = xdr_reserve_space(xdr, 12);
> > +     if (!p)
> > +             return nfserr_resource;
> > +
> > +     /* cnr_lease_time */
> > +     p = xdr_encode_hyper(p, cn->cpn_sec);
> > +     *p++ = cpu_to_be32(cn->cpn_nsec);
> > +
> > +     /* cnr_stateid */
> > +     nfserr = nfsd4_encode_stateid(xdr, &cn->cpn_cnr_stateid);
> > +     if (nfserr)
> > +             return nfserr;
> > +
> > +     /* cnr_src.nl_nsvr */
> > +     p = xdr_reserve_space(xdr, 4);
> > +     if (!p)
> > +             return nfserr_resource;
> > +
> > +     *p++ = cpu_to_be32(1);
> > +
> > +     nfserr = nfsd42_encode_nl4_server(resp, &cn->cpn_src);
>
> This could be simplified, too: "return nfsd42_encode_nl4_server()"

Yep, will make the changes.
>
> Thanks,
> Anna
>
> > +     if (nfserr)
> > +             return nfserr;
> > +
> > +     return nfserr;
> > +}
> > +
> > +static __be32
> >  nfsd4_encode_seek(struct nfsd4_compoundres *resp, __be32 nfserr,
> >                 struct nfsd4_seek *seek)
> >  {
> > @@ -4465,7 +4558,7 @@ static __be32 nfsd4_encode_readv(struct
> > nfsd4_compoundres *resp,
> >       /* NFSv4.2 operations */
> >       [OP_ALLOCATE]           = (nfsd4_enc)nfsd4_encode_noop,
> >       [OP_COPY]               = (nfsd4_enc)nfsd4_encode_copy,
> > -     [OP_COPY_NOTIFY]        = (nfsd4_enc)nfsd4_encode_noop,
> > +     [OP_COPY_NOTIFY]        = (nfsd4_enc)nfsd4_encode_copy_notify,
> >       [OP_DEALLOCATE]         = (nfsd4_enc)nfsd4_encode_noop,
> >       [OP_IO_ADVISE]          = (nfsd4_enc)nfsd4_encode_noop,
> >       [OP_LAYOUTERROR]        = (nfsd4_enc)nfsd4_encode_noop,
> > diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
> > index 5da9cc3..106ed56 100644
> > --- a/fs/nfsd/state.h
> > +++ b/fs/nfsd/state.h
> > @@ -95,6 +95,7 @@ struct nfs4_stid {
> >  #define NFS4_REVOKED_DELEG_STID 16
> >  #define NFS4_CLOSED_DELEG_STID 32
> >  #define NFS4_LAYOUT_STID 64
> > +     struct list_head        sc_cp_list;
> >       unsigned char           sc_type;
> >       stateid_t               sc_stateid;
> >       spinlock_t              sc_lock;
> > @@ -103,6 +104,17 @@ struct nfs4_stid {
> >       void                    (*sc_free)(struct nfs4_stid *);
> >  };
> >
> > +/* Keep a list of stateids issued by the COPY_NOTIFY, associate it
> > with the
> > + * parent OPEN/LOCK/DELEG stateid.
> > + */
> > +struct nfs4_cpntf_state {
> > +     stateid_t               cp_stateid;
> > +     struct list_head        cp_list;        /* per parent nfs4_stid */
> > +     struct nfs4_stid        *cp_p_stid;     /* pointer to parent */
> > +     bool                    cp_active;      /* has the copy
> > started */
> > +     unsigned long           cp_timeout;     /* copy timeout */
> > +};
> > +
> >  /*
> >   * Represents a delegation stateid. The nfs4_client holds references
> > to these
> >   * and they are put when it is being destroyed or when the
> > delegation is
> > @@ -614,8 +626,10 @@ __be32 nfsd4_lookup_stateid(struct
> > nfsd4_compound_state *cstate,
> >                    struct nfs4_stid **s, struct nfsd_net *nn);
> >  struct nfs4_stid *nfs4_alloc_stid(struct nfs4_client *cl, struct
> > kmem_cache *slab,
> >                                 void (*sc_free)(struct nfs4_stid *));
> > -int nfs4_init_cp_state(struct nfsd_net *nn, struct nfsd4_copy
> > *copy);
> > -void nfs4_free_cp_state(struct nfsd4_copy *copy);
> > +int nfs4_init_copy_state(struct nfsd_net *nn, struct nfsd4_copy
> > *copy);
> > +void nfs4_free_copy_state(struct nfsd4_copy *copy);
> > +struct nfs4_cpntf_state *nfs4_alloc_init_cpntf_state(struct nfsd_net
> > *nn,
> > +                     struct nfs4_stid *p_stid);
> >  void nfs4_unhash_stid(struct nfs4_stid *s);
> >  void nfs4_put_stid(struct nfs4_stid *s);
> >  void nfs4_inc_and_copy_stateid(stateid_t *dst, struct nfs4_stid
> > *stid);
> > diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h
> > index 513c9ff..bade8e5 100644
> > --- a/fs/nfsd/xdr4.h
> > +++ b/fs/nfsd/xdr4.h
> > @@ -568,6 +568,18 @@ struct nfsd4_offload_status {
> >       u32             status;
> >  };
> >
> > +struct nfsd4_copy_notify {
> > +     /* request */
> > +     stateid_t               cpn_src_stateid;
> > +     struct nl4_server       cpn_dst;
> > +
> > +     /* response */
> > +     stateid_t               cpn_cnr_stateid;
> > +     u64                     cpn_sec;
> > +     u32                     cpn_nsec;
> > +     struct nl4_server       cpn_src;
> > +};
> > +
> >  struct nfsd4_op {
> >       int                                     opnum;
> >       const struct nfsd4_operation *          opdesc;
> > @@ -627,6 +639,7 @@ struct nfsd4_op {
> >               struct nfsd4_clone              clone;
> >               struct nfsd4_copy               copy;
> >               struct nfsd4_offload_status     offload_status;
> > +             struct nfsd4_copy_notify        copy_notify;
> >               struct nfsd4_seek               seek;
> >       } u;
> >       struct nfs4_replay *                    replay;
>
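
The space reserved in nfsd42_encode_nl4_server() above follows the usual
XDR rule for a variable-length opaque: a 4-byte length word followed by
the data padded to a multiple of 4 bytes. A user-space sketch of that
layout for the netid/uaddr pair is below. The helpers quadlen, put_be32,
put_opaque and encode_netaddr are hypothetical names made up for this
illustration; none of them are kernel functions, and the sketch writes
into a flat buffer rather than an xdr_stream.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Number of 4-byte XDR words needed for len bytes (the idea behind
 * XDR_QUADLEN). */
static size_t quadlen(size_t len)
{
	return (len + 3) / 4;
}

/* Store a 32-bit value in big-endian (network) byte order. */
static unsigned char *put_be32(unsigned char *p, uint32_t v)
{
	p[0] = (unsigned char)(v >> 24);
	p[1] = (unsigned char)(v >> 16);
	p[2] = (unsigned char)(v >> 8);
	p[3] = (unsigned char)v;
	return p + 4;
}

/* Length word, then the bytes, then zero padding to a 4-byte boundary. */
static unsigned char *put_opaque(unsigned char *p, const char *data,
				 size_t len)
{
	size_t padded = quadlen(len) * 4;

	p = put_be32(p, (uint32_t)len);
	memcpy(p, data, len);
	memset(p + len, 0, padded - len);
	return p + padded;
}

/* Encode netid and universal address back to back.
 * Returns the number of bytes written, or 0 if buf is too small. */
static size_t encode_netaddr(unsigned char *buf, size_t buflen,
			     const char *netid, const char *uaddr)
{
	size_t nlen = strlen(netid), ulen = strlen(uaddr);
	size_t need = 4 + quadlen(nlen) * 4 + 4 + quadlen(ulen) * 4;
	unsigned char *p = buf;

	assert(buf != NULL);
	if (need > buflen)
		return 0;
	p = put_opaque(p, netid, nlen);
	p = put_opaque(p, uaddr, ulen);
	return (size_t)(p - buf);
}
```

For netid "tcp" and universal address "10.0.0.1.8.1" (port 2049) this
writes 24 bytes: 4 + 4 for the padded netid and 4 + 12 for the address.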

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v4 8/8] NFSD add nfs4 inter ssc to nfsd4_copy
  2019-07-09 12:43   ` Anna Schumaker
@ 2019-07-09 15:53     ` Olga Kornievskaia
  0 siblings, 0 replies; 51+ messages in thread
From: Olga Kornievskaia @ 2019-07-09 15:53 UTC (permalink / raw)
  To: Anna Schumaker; +Cc: J. Bruce Fields, linux-nfs

On Tue, Jul 9, 2019 at 8:43 AM Anna Schumaker <schumaker.anna@gmail.com> wrote:
>
> Hi Olga,
>
> On Mon, 2019-07-08 at 15:23 -0400, Olga Kornievskaia wrote:
> > Given a universal address, mount the source server from the
> > destination
> > server.  Use an internal mount. Call the NFS client nfs42_ssc_open to
> > obtain the NFS struct file suitable for nfsd_copy_range.
> >
> > Ability to do "inter" server-to-server copy depends on an nfsd kernel
> > parameter "inter_copy_offload_enabled".
> >
> > Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> > ---
> >  fs/nfsd/nfs4proc.c  | 291
> > ++++++++++++++++++++++++++++++++++++++++++++++++----
> >  fs/nfsd/nfs4state.c |  17 ++-
> >  fs/nfsd/nfssvc.c    |   6 ++
> >  fs/nfsd/state.h     |   4 +
> >  fs/nfsd/xdr4.h      |   5 +
> >  5 files changed, 299 insertions(+), 24 deletions(-)
> >
> > diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> > index 1039528..caf046f 100644
> > --- a/fs/nfsd/nfs4proc.c
> > +++ b/fs/nfsd/nfs4proc.c
> > @@ -1153,6 +1153,209 @@ void nfsd4_shutdown_copy(struct nfs4_client
> > *clp)
> >       while ((copy = nfsd4_get_copy(clp)) != NULL)
> >               nfsd4_stop_copy(copy);
> >  }
> > +#ifdef CONFIG_NFSD_V4_2_INTER_SSC
> > +
> > +extern struct file *nfs42_ssc_open(struct vfsmount *ss_mnt,
> > +                                struct nfs_fh *src_fh,
> > +                                nfs4_stateid *stateid);
> > +extern void nfs42_ssc_close(struct file *filep);
> > +
> > +extern void nfs_sb_deactive(struct super_block *sb);
> > +
> > +#define NFSD42_INTERSSC_MOUNTOPS "vers=4.2,addr=%s,sec=sys"
> > +
> > +/**
> > + * Support one copy source server for now.
> > + */
> > +static __be32
> > +nfsd4_interssc_connect(struct nl4_server *nss, struct svc_rqst
> > *rqstp,
> > +                    struct vfsmount **mount)
> > +{
> > +     struct file_system_type *type;
> > +     struct vfsmount *ss_mnt;
> > +     struct nfs42_netaddr *naddr;
> > +     struct sockaddr_storage tmp_addr;
> > +     size_t tmp_addrlen, match_netid_len = 3;
> > +     char *startsep = "", *endsep = "", *match_netid = "tcp";
> > +     char *ipaddr, *dev_name, *raw_data;
> > +     int len, raw_len, status = -EINVAL;
> > +
> > +     naddr = &nss->u.nl4_addr;
> > +     tmp_addrlen = rpc_uaddr2sockaddr(SVC_NET(rqstp), naddr->addr,
> > +                                      naddr->addr_len,
> > +                                      (struct sockaddr *)&tmp_addr,
> > +                                      sizeof(tmp_addr));
> > +     if (tmp_addrlen == 0)
> > +             goto out_err;
> > +
> > +     if (tmp_addr.ss_family == AF_INET6) {
> > +             startsep = "[";
> > +             endsep = "]";
> > +             match_netid = "tcp6";
> > +             match_netid_len = 4;
> > +     }
> > +
> > +     if (naddr->netid_len != match_netid_len ||
> > +             strncmp(naddr->netid, match_netid, naddr->netid_len))
> > +             goto out_err;
> > +
> > +     /* Construct the raw data for the vfs_kern_mount call */
> > +     len = RPC_MAX_ADDRBUFLEN + 1;
> > +     ipaddr = kzalloc(len, GFP_KERNEL);
> > +     if (!ipaddr)
> > +             goto out_err;
> > +
> > +     rpc_ntop((struct sockaddr *)&tmp_addr, ipaddr, len);
> > +
> > +     /* 2 for ipv6 endsep and startsep. 3 for ":/" and trailing
> > '/0'*/
> > +
> > +     raw_len = strlen(NFSD42_INTERSSC_MOUNTOPS) + strlen(ipaddr);
> > +     raw_data = kzalloc(raw_len, GFP_KERNEL);
> > +     if (!raw_data)
> > +             goto out_free_ipaddr;
> > +
> > +     snprintf(raw_data, raw_len, NFSD42_INTERSSC_MOUNTOPS, ipaddr);
> > +
> > +     status = -ENODEV;
> > +     type = get_fs_type("nfs");
> > +     if (!type)
> > +             goto out_free_rawdata;
> > +
> > +     /* Set the server:<export> for the vfs_kern_mount call */
> > +     dev_name = kzalloc(len + 5, GFP_KERNEL);
> > +     if (!dev_name)
> > +             goto out_free_rawdata;
> > +     snprintf(dev_name, len + 5, "%s%s%s:/", startsep, ipaddr,
> > endsep);
> > +
> > +     /* Use an 'internal' mount: SB_KERNMOUNT -> MNT_INTERNAL */
> > +     ss_mnt = vfs_kern_mount(type, SB_KERNMOUNT, dev_name,
> > raw_data);
> > +     module_put(type->owner);
> > +     if (IS_ERR(ss_mnt))
> > +             goto out_free_devname;
> > +
> > +     status = 0;
> > +     *mount = ss_mnt;
> > +
> > +out_free_devname:
> > +     kfree(dev_name);
> > +out_free_rawdata:
> > +     kfree(raw_data);
> > +out_free_ipaddr:
> > +     kfree(ipaddr);
> > +out_err:
> > +     return status;
> > +}
> > +
> > +static void
> > +nfsd4_interssc_disconnect(struct vfsmount *ss_mnt)
> > +{
> > +     nfs_sb_deactive(ss_mnt->mnt_sb);
> > +     mntput(ss_mnt);
> > +}
> > +
> > +/**
> > + * nfsd4_setup_inter_ssc
> > + *
> > + * Verify COPY destination stateid.
> > + * Connect to the source server with NFSv4.1.
> > + * Create the source struct file for nfsd_copy_range.
> > + * Called with COPY cstate:
> > + *    SAVED_FH: source filehandle
> > + *    CURRENT_FH: destination filehandle
> > + *
> > + * Returns errno (not nfserrxxx)
> > + */
> > +static __be32
> > +nfsd4_setup_inter_ssc(struct svc_rqst *rqstp,
> > +                   struct nfsd4_compound_state *cstate,
> > +                   struct nfsd4_copy *copy, struct vfsmount **mount)
> > +{
> > +     struct svc_fh *s_fh = NULL;
> > +     stateid_t *s_stid = &copy->cp_src_stateid;
> > +     __be32 status = -EINVAL;
> > +
> > +     /* Verify the destination stateid and set dst struct file*/
> > +     status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate-
> > >current_fh,
> > +                                         &copy->cp_dst_stateid,
> > +                                         WR_STATE, &copy->file_dst,
> > NULL,
> > +                                         NULL);
> > +     if (status)
> > +             goto out;
> > +
> > +     status = nfsd4_interssc_connect(&copy->cp_src, rqstp, mount);
> > +     if (status)
> > +             goto out;
> > +
> > +     s_fh = &cstate->save_fh;
> > +
> > +     copy->c_fh.size = s_fh->fh_handle.fh_size;
> > +     memcpy(copy->c_fh.data, &s_fh->fh_handle.fh_base, copy-
> > >c_fh.size);
> > +     copy->stateid.seqid = s_stid->si_generation;
> > +     memcpy(copy->stateid.other, (void *)&s_stid->si_opaque,
> > +            sizeof(stateid_opaque_t));
> > +
> > +     status = 0;
> > +out:
> > +     return status;
> > +}
> > +
> > +static void
> > +nfsd4_cleanup_inter_ssc(struct vfsmount *ss_mnt, struct file *src,
> > +                     struct file *dst)
> > +{
> > +     nfs42_ssc_close(src);
> > +     fput(src);
> > +     fput(dst);
> > +     mntput(ss_mnt);
> > +}
> > +
> > +#else /* CONFIG_NFSD_V4_2_INTER_SSC */
> > +
> > +static __be32
> > +nfsd4_setup_inter_ssc(struct svc_rqst *rqstp,
> > +                   struct nfsd4_compound_state *cstate,
> > +                   struct nfsd4_copy *copy,
> > +                   struct vfsmount **mount)
> > +{
> > +     *mount = NULL;
> > +     return -EINVAL;
> > +}
> > +
> > +static void
> > +nfsd4_cleanup_inter_ssc(struct vfsmount *ss_mnt, struct file *src,
> > +                     struct file *dst)
> > +{
> > +}
> > +
> > +static void
> > +nfsd4_interssc_disconnect(struct vfsmount *ss_mnt)
> > +{
> > +}
> > +
> > +static struct file *nfs42_ssc_open(struct vfsmount *ss_mnt,
> > +                                struct nfs_fh *src_fh,
> > +                                nfs4_stateid *stateid)
> > +{
> > +     return NULL;
> > +}
> > +#endif /* CONFIG_NFSD_V4_2_INTER_SSC */
> > +
> > +static __be32
> > +nfsd4_setup_intra_ssc(struct svc_rqst *rqstp,
> > +                   struct nfsd4_compound_state *cstate,
> > +                   struct nfsd4_copy *copy)
> > +{
> > +     return nfsd4_verify_copy(rqstp, cstate, &copy->cp_src_stateid,
> > +                              &copy->file_src, &copy-
> > >cp_dst_stateid,
> > +                              &copy->file_dst, NULL);
> > +}
> > +
> > +static void
> > +nfsd4_cleanup_intra_ssc(struct file *src, struct file *dst)
> > +{
> > +     fput(src);
> > +     fput(dst);
> > +}
> >
> >  static void nfsd4_cb_offload_release(struct nfsd4_callback *cb)
> >  {
> > @@ -1217,12 +1420,16 @@ static __be32 nfsd4_do_copy(struct nfsd4_copy
> > *copy, bool sync)
> >               status = nfs_ok;
> >       }
> >
> > -     fput(copy->file_src);
> > -     fput(copy->file_dst);
> > +     if (!copy->cp_intra) /* Inter server SSC */
> > +             nfsd4_cleanup_inter_ssc(copy->ss_mnt, copy->file_src,
> > +                                     copy->file_dst);
> > +     else
> > +             nfsd4_cleanup_intra_ssc(copy->file_src, copy-
> > >file_dst);
> > +
> >       return status;
> >  }
> >
> > -static void dup_copy_fields(struct nfsd4_copy *src, struct
> > nfsd4_copy *dst)
> > +static int dup_copy_fields(struct nfsd4_copy *src, struct nfsd4_copy
> > *dst)
> >  {
> >       dst->cp_src_pos = src->cp_src_pos;
> >       dst->cp_dst_pos = src->cp_dst_pos;
> > @@ -1232,8 +1439,17 @@ static void dup_copy_fields(struct nfsd4_copy
> > *src, struct nfsd4_copy *dst)
> >       memcpy(&dst->fh, &src->fh, sizeof(src->fh));
> >       dst->cp_clp = src->cp_clp;
> >       dst->file_dst = get_file(src->file_dst);
> > -     dst->file_src = get_file(src->file_src);
> > +     dst->cp_intra = src->cp_intra;
> > +     if (src->cp_intra) /* for inter, file_src doesn't exist yet */
> > +             dst->file_src = get_file(src->file_src);
> >       memcpy(&dst->cp_stateid, &src->cp_stateid, sizeof(src->cp_stateid));
> > +     memcpy(&dst->cp_src, &src->cp_src, sizeof(struct nl4_server));
> > +     memcpy(&dst->stateid, &src->stateid, sizeof(src->stateid));
> > +     memcpy(&dst->c_fh, &src->c_fh, sizeof(src->c_fh));
> > +     dst->ss_mnt = src->ss_mnt;
> > +
> > +     return 0;
> > +
> >  }
> >
> >  static void cleanup_async_copy(struct nfsd4_copy *copy)
> > @@ -1252,7 +1468,18 @@ static int nfsd4_do_async_copy(void *data)
> >       struct nfsd4_copy *copy = (struct nfsd4_copy *)data;
> >       struct nfsd4_copy *cb_copy;
> >
> > +     if (!copy->cp_intra) { /* Inter server SSC */
> > +             copy->file_src = nfs42_ssc_open(copy->ss_mnt, &copy->c_fh,
> > +                                           &copy->stateid);
> > +             if (IS_ERR(copy->file_src)) {
> > +                     copy->nfserr = nfserr_offload_denied;
> > +                     nfsd4_interssc_disconnect(copy->ss_mnt);
> > +                     goto do_callback;
> > +             }
> > +     }
> > +
> >       copy->nfserr = nfsd4_do_copy(copy, 0);
> > +do_callback:
> >       cb_copy = kzalloc(sizeof(struct nfsd4_copy), GFP_KERNEL);
> >       if (!cb_copy)
> >               goto out;
> > @@ -1276,11 +1503,20 @@ static int nfsd4_do_async_copy(void *data)
> >       __be32 status;
> >       struct nfsd4_copy *async_copy = NULL;
> >
> > -     status = nfsd4_verify_copy(rqstp, cstate, &copy->cp_src_stateid,
> > -                                &copy->file_src, &copy->cp_dst_stateid,
> > -                                &copy->file_dst, NULL);
> > -     if (status)
> > -             goto out;
> > +     if (!copy->cp_intra) { /* Inter server SSC */
> > +             if (!inter_copy_offload_enable || copy->cp_synchronous) {
> > +                     status = nfserr_notsupp;
> > +                     goto out;
> > +             }
> > +             status = nfsd4_setup_inter_ssc(rqstp, cstate, copy,
> > +                                     &copy->ss_mnt);
> > +             if (status)
> > +                     return nfserr_offload_denied;
> > +     } else {
> > +             status = nfsd4_setup_intra_ssc(rqstp, cstate, copy);
> > +             if (status)
> > +                     return status;
> > +     }
> >
> >       copy->cp_clp = cstate->clp;
> >       memcpy(&copy->fh, &cstate->current_fh.fh_handle,
> > @@ -1291,15 +1527,15 @@ static int nfsd4_do_async_copy(void *data)
> >               status = nfserrno(-ENOMEM);
> >               async_copy = kzalloc(sizeof(struct nfsd4_copy), GFP_KERNEL);
> >               if (!async_copy)
> > -                     goto out;
> > -             if (!nfs4_init_copy_state(nn, copy)) {
> > -                     kfree(async_copy);
> > -                     goto out;
> > -             }
> > +                     goto out_err;
> > +             if (!nfs4_init_copy_state(nn, copy))
> > +                     goto out_err;
> >               refcount_set(&async_copy->refcount, 1);
> >               memcpy(&copy->cp_res.cb_stateid, &copy->cp_stateid,
> >                       sizeof(copy->cp_stateid));
> > -             dup_copy_fields(copy, async_copy);
> > +             status = dup_copy_fields(copy, async_copy);
> > +             if (status)
> > +                     goto out_err;
> >               async_copy->copy_task = kthread_create(nfsd4_do_async_copy,
> >                               async_copy, "%s", "copy thread");
> >               if (IS_ERR(async_copy->copy_task))
> > @@ -1310,13 +1546,17 @@ static int nfsd4_do_async_copy(void *data)
> >               spin_unlock(&async_copy->cp_clp->async_lock);
> >               wake_up_process(async_copy->copy_task);
> >               status = nfs_ok;
> > -     } else
> > +     } else {
> >               status = nfsd4_do_copy(copy, 1);
> > +     }
> >  out:
> >       return status;
> >  out_err:
> >       cleanup_async_copy(async_copy);
> > -     goto out;
> > +     status = nfserrno(-ENOMEM);
> > +     if (!copy->cp_intra)
> > +             nfsd4_interssc_disconnect(copy->ss_mnt);
> > +     goto out_err;
>
> Won't this just loop going to the out_err label?

Yep should have been "goto out". Will fix it.
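
For readers following along, the intended control flow can be sketched in
userspace as below. This is only an illustration with hypothetical names
(demo_copy, demo_submit), not the nfsd code: the error label does its cleanup
once and then jumps to the shared exit label rather than back to itself.

```c
/* Hypothetical stand-in for the async copy state; not the real struct. */
struct demo_copy {
	int cleaned_up;
};

/*
 * Returns 0 on success, -1 on failure. The error label performs its
 * cleanup once and then jumps to the common exit ("goto out"). Writing
 * "goto out_err" at that point would re-enter the label forever.
 */
static int demo_submit(struct demo_copy *copy, int fail)
{
	int status = 0;

	if (fail)
		goto out_err;
out:
	return status;
out_err:
	copy->cleaned_up = 1;	/* stands in for cleanup_async_copy() */
	status = -1;
	goto out;
}
```

Calling demo_submit() with fail set runs the cleanup exactly once and returns
the error; with fail clear, the cleanup is skipped entirely.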

>
> Thanks,
> Anna
>
> >  }
> >
> >  struct nfsd4_copy *
> > @@ -1342,15 +1582,24 @@ struct nfsd4_copy *
> >                    union nfsd4_op_u *u)
> >  {
> >       struct nfsd4_offload_status *os = &u->offload_status;
> > -     __be32 status = 0;
> > +     __be32 status = nfserr_bad_stateid;
> >       struct nfsd4_copy *copy;
> >       struct nfs4_client *clp = cstate->clp;
> > +     struct nfsd_net *nn = net_generic(SVC_NET(rqstp), nfsd_net_id);
> >
> >       copy = find_async_copy(clp, &os->stateid);
> > -     if (copy)
> > +     if (!copy) {
> > +             struct nfs4_cpntf_state *cps = NULL;
> > +
> > +             status = find_internal_cpntf_state(nn, &os->stateid, &cps);
> > +             if (status)
> > +                     return status;
> > +             if (cps) {
> > +                     free_cpntf_state(nn, &os->stateid, cps);
> > +                     return nfs_ok;
> > +             }
> > +     } else
> >               nfsd4_stop_copy(copy);
> > -     else
> > -             status = nfserr_bad_stateid;
> >
> >       return status;
> >  }
> > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> > index d7f4b96..c1a0695 100644
> > --- a/fs/nfsd/nfs4state.c
> > +++ b/fs/nfsd/nfs4state.c
> > @@ -5232,12 +5232,23 @@ static __be32 nfsd4_validate_stateid(struct nfs4_client *cl, stateid_t *stateid)
> >
> >       return 0;
> >  }
> > +void free_cpntf_state(struct nfsd_net *nn, stateid_t *st,
> > +                   struct nfs4_cpntf_state *cps)
> > +{
> > +     spin_lock(&nn->s2s_cp_lock);
> > +     list_del(&cps->cp_list);
> > +     idr_remove(&nn->s2s_cp_stateids, cps->cp_stateid.si_opaque.so_id);
> > +     nfs4_put_stid(cps->cp_p_stid);
> > +     kfree(cps);
> > +     spin_unlock(&nn->s2s_cp_lock);
> > +}
> > +
> >  /*
> >   * A READ from an inter server to server COPY will have a
> >   * copy stateid. Return the parent nfs4_stid.
> >   */
> > -static __be32 _find_cpntf_state(struct nfsd_net *nn, stateid_t *st,
> > -                  struct nfs4_cpntf_state **cps)
> > +__be32 find_internal_cpntf_state(struct nfsd_net *nn, stateid_t *st,
> > +                              struct nfs4_cpntf_state **cps)
> >  {
> >       struct nfs4_cpntf_state *state = NULL;
> >
> > @@ -5260,7 +5271,7 @@ static __be32 find_cpntf_state(struct nfsd_net *nn, stateid_t *st,
> >       __be32 status;
> >       struct nfs4_cpntf_state *cps = NULL;
> >
> > -     status = _find_cpntf_state(nn, st, &cps);
> > +     status = find_internal_cpntf_state(nn, st, &cps);
> >       if (status)
> >               return status;
> >
> > diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
> > index 18d94ea..033bfcb 100644
> > --- a/fs/nfsd/nfssvc.c
> > +++ b/fs/nfsd/nfssvc.c
> > @@ -30,6 +30,12 @@
> >
> >  #define NFSDDBG_FACILITY     NFSDDBG_SVC
> >
> > +bool inter_copy_offload_enable;
> > +EXPORT_SYMBOL_GPL(inter_copy_offload_enable);
> > +module_param(inter_copy_offload_enable, bool, 0644);
> > +MODULE_PARM_DESC(inter_copy_offload_enable,
> > +              "Enable inter server to server copy offload. Default: false");
> > +
> >  extern struct svc_program    nfsd_program;
> >  static int                   nfsd(void *vrqstp);
> >  #if defined(CONFIG_NFSD_V2_ACL) || defined(CONFIG_NFSD_V3_ACL)
> > diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
> > index 106ed56..7026e2a 100644
> > --- a/fs/nfsd/state.h
> > +++ b/fs/nfsd/state.h
> > @@ -659,6 +659,10 @@ extern struct nfs4_client_reclaim *nfs4_client_to_reclaim(struct xdr_netobj name
> >  extern void nfs4_put_copy(struct nfsd4_copy *copy);
> >  extern struct nfsd4_copy *
> >  find_async_copy(struct nfs4_client *clp, stateid_t *staetid);
> > +extern __be32 find_internal_cpntf_state(struct nfsd_net *nn, stateid_t *st,
> > +                             struct nfs4_cpntf_state **cps);
> > +extern void free_cpntf_state(struct nfsd_net *nn, stateid_t *st,
> > +                   struct nfs4_cpntf_state *cps);
> >  static inline void get_nfs4_file(struct nfs4_file *fi)
> >  {
> >       refcount_inc(&fi->fi_ref);
> > diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h
> > index fbd18d6..bb2f8e5 100644
> > --- a/fs/nfsd/xdr4.h
> > +++ b/fs/nfsd/xdr4.h
> > @@ -547,7 +547,12 @@ struct nfsd4_copy {
> >       struct task_struct      *copy_task;
> >       refcount_t              refcount;
> >       bool                    stopped;
> > +
> > +     struct vfsmount         *ss_mnt;
> > +     struct nfs_fh           c_fh;
> > +     nfs4_stateid            stateid;
> >  };
> > +extern bool inter_copy_offload_enable;
> >
> >  struct nfsd4_seek {
> >       /* request */
>


* Re: [PATCH v4 0/8] server-side support for "inter" SSC copy
  2019-07-09 15:47   ` Olga Kornievskaia
@ 2019-07-17 18:05     ` Olga Kornievskaia
  0 siblings, 0 replies; 51+ messages in thread
From: Olga Kornievskaia @ 2019-07-17 18:05 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: J. Bruce Fields, linux-nfs

Hi Bruce,

Do you have any comments beyond what Anna already provided? I have
been waiting for further review before posting those two small
changes.

On Tue, Jul 9, 2019 at 11:47 AM Olga Kornievskaia
<olga.kornievskaia@gmail.com> wrote:
>
> Amir's patches have been in the linux-next xfs tree and will go into
> 5.3. I would like for NFS patches to go into 5.4 (I'm assuming hoping
> for 5.3 is unrealistic).
>
> On Mon, Jul 8, 2019 at 11:53 PM J. Bruce Fields <bfields@fieldses.org> wrote:
> >
> > Thanks for resending.  What's the status of Amir's series?  I guess I've
> > been using that as an excuse to put off reviewing these, but I really
> > should anyway....
> >
> > --b.
> >
> > On Mon, Jul 08, 2019 at 03:23:44PM -0400, Olga Kornievskaia wrote:
> > > This patch series adds support for NFSv4.2 copy offload feature
> > > allowing copy between two different NFS servers.
> > >
> > > This functionality depends on the VFS ability to support generic
> > > copy_file_range() where a copy is done between an NFS file and
> > > a local file system. This is on top of Amir's VFS generic copy
> > > offload series.
> > >
> > > This feature is enabled by the kernel module parameter --
> > > inter_copy_offload_enable -- and by default is disabled. There is
> > > also a kernel compile configuration of NFSD_V4_2_INTER_SSC that
> > > adds dependency on the NFS client side functions called from the
> > > server.
> > >
> > > These patches work on top of existing async intra copy offload
> > > patches. For the "inter" SSC, the implementation only supports
> > > asynchronous inter copy.
> > >
> > > > On the source server, upon receiving a COPY_NOTIFY, it generates a
> > > unique stateid that's kept in the global list. Upon receiving a READ
> > > with a stateid, the code checks the normal list of open stateid and
> > > now additionally, it'll check the copy state list as well before
> > > deciding to either fail with BAD_STATEID or find one that matches.
> > > The stored stateid is only valid to be used for the first time
> > > > with a chosen lease period (90s currently). When the source server
> > > > receives an OFFLOAD_CANCEL, it will remove the stateid from the
> > > global list. Otherwise, the copy stateid is removed upon the removal
> > > of its "parent" stateid (open/lock/delegation stateid).
> > >
> > > On the destination server, upon receiving a COPY request, the server
> > > establishes the necessary clientid/session with the source server.
> > > It calls into the NFS client code to establish the necessary
> > > open stateid, filehandle, file description (without doing an NFS open).
> > > > Then the server calls into the copy_file_range() to perform the copy
> > > where the source file will issue NFS READs and then do local file
> > > system writes (this depends on the VFS ability to do cross device
> > > > copy_file_range()).
> > >
> > > v4:
> > > --- allowing for synchronous inter server-to-server copy
> > > --- added missing offload_cancel on the source server
> > >
> > > Already presented numbers for performance improvement for large
> > > file transfer but here are times for copying linux kernel tree
> > > (which is mostly small files):
> > > -- regular cp 6m1s (intra)
> > > -- copy offload cp 4m11s (intra)
> > >    -- benefit of using copy offload with small copies using sync copy
> > > -- regular cp 6m9s (inter)
> > > -- copy offload cp 6m3s (inter)
> > >    -- same performance as traditional as for most it fallback to traditional
> > > copy offload
> > >
> > > Olga Kornievskaia (8):
> > >   NFSD fill-in netloc4 structure
> > >   NFSD add ca_source_server<> to COPY
> > >   NFSD return nfs4_stid in nfs4_preprocess_stateid_op
> > >   NFSD add COPY_NOTIFY operation
> > >   NFSD check stateids against copy stateids
> > >   NFSD generalize nfsd4_compound_state flag names
> > >   NFSD: allow inter server COPY to have a STALE source server fh
> > >   NFSD add nfs4 inter ssc to nfsd4_copy
> > >
> > >  fs/nfsd/Kconfig     |  10 ++
> > >  fs/nfsd/nfs4proc.c  | 434 +++++++++++++++++++++++++++++++++++++++++++++++-----
> > >  fs/nfsd/nfs4state.c | 135 ++++++++++++++--
> > >  fs/nfsd/nfs4xdr.c   | 172 ++++++++++++++++++++-
> > >  fs/nfsd/nfsd.h      |  32 ++++
> > >  fs/nfsd/nfsfh.h     |   5 +-
> > >  fs/nfsd/nfssvc.c    |   6 +
> > >  fs/nfsd/state.h     |  25 ++-
> > >  fs/nfsd/xdr4.h      |  37 ++++-
> > >  9 files changed, 790 insertions(+), 66 deletions(-)
> > >
> > > --
> > > 1.8.3.1


* Re: [PATCH v4 1/8] NFSD fill-in netloc4 structure
  2019-07-08 19:23 ` [PATCH v4 1/8] NFSD fill-in netloc4 structure Olga Kornievskaia
@ 2019-07-17 21:13   ` bfields
  2019-07-22 19:59     ` Olga Kornievskaia
  0 siblings, 1 reply; 51+ messages in thread
From: bfields @ 2019-07-17 21:13 UTC (permalink / raw)
  To: Olga Kornievskaia; +Cc: bfields, linux-nfs

On Mon, Jul 08, 2019 at 03:23:45PM -0400, Olga Kornievskaia wrote:
> From: Olga Kornievskaia <kolga@netapp.com>
> 
> nfs.4 defines nfs42_netaddr structure that represents netloc4.
> 
> Populate needed fields from the sockaddr structure.
> 
> This will be used by flexfiles and 4.2 inter copy
> 
> Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> ---
>  fs/nfsd/nfsd.h | 32 ++++++++++++++++++++++++++++++++
>  1 file changed, 32 insertions(+)
> 
> diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h
> index 24187b5..8f4fc50 100644
> --- a/fs/nfsd/nfsd.h
> +++ b/fs/nfsd/nfsd.h
> @@ -19,6 +19,7 @@
>  #include <linux/sunrpc/svc.h>
>  #include <linux/sunrpc/svc_xprt.h>
>  #include <linux/sunrpc/msg_prot.h>
> +#include <linux/sunrpc/addr.h>
>  
>  #include <uapi/linux/nfsd/debug.h>
>  
> @@ -375,6 +376,37 @@ static inline bool nfsd4_spo_must_allow(struct svc_rqst *rqstp)
>  
>  extern const u32 nfsd_suppattrs[3][3];
>  
> +static inline u32 nfsd4_set_netaddr(struct sockaddr *addr,
> +				    struct nfs42_netaddr *netaddr)
> +{
> +	struct sockaddr_in *sin = (struct sockaddr_in *)addr;
> +	struct sockaddr_in6 *sin6 = (struct sockaddr_in6 *)addr;
> +	unsigned int port;
> +	size_t ret_addr, ret_port;
> +
> +	switch (addr->sa_family) {
> +	case AF_INET:
> +		port = ntohs(sin->sin_port);
> +		sprintf(netaddr->netid, "tcp");
> +		netaddr->netid_len = 3;
> +		break;
> +	case AF_INET6:
> +		port = ntohs(sin6->sin6_port);
> +		sprintf(netaddr->netid, "tcp6");
> +		netaddr->netid_len = 4;
> +		break;
> +	default:
> +		return nfserr_inval;
> +	}
> +	ret_addr = rpc_ntop(addr, netaddr->addr, sizeof(netaddr->addr));
> +	ret_port = snprintf(netaddr->addr + ret_addr,
> +			    RPCBIND_MAXUADDRLEN + 1 - ret_addr,
> +			    ".%u.%u", port >> 8, port & 0xff);
> +	WARN_ON(ret_port >= RPCBIND_MAXUADDRLEN + 1 - ret_addr);
> +	netaddr->addr_len = ret_addr + ret_port;
> +	return 0;
> +}

Kinda surprised we don't already do something like this elsewhere, but I
don't see anything exactly the same.  Might be possible to put this in
net/sunrpc/addr.c and share some code with rpc_sockaddr2uaddr?  I'll
leave it to you whether that looks worth it.
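
As a point of reference, the universal-address format that the quoted helper
builds (the presentation address followed by the port split into two decimal
octets) can be sketched in userspace roughly as follows. The function name
demo_ipv4_to_uaddr is hypothetical, and the sketch covers only IPv4; it is not
the sunrpc implementation.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>

/*
 * Userspace sketch: format an IPv4 socket address as an RPC universal
 * address ("h1.h2.h3.h4.p1.p2"). demo_ipv4_to_uaddr is a hypothetical
 * name. Returns the string length, or -1 on error or truncation.
 */
static int demo_ipv4_to_uaddr(const struct sockaddr_in *sin,
			      char *buf, size_t buflen)
{
	char host[INET_ADDRSTRLEN];
	unsigned int port = ntohs(sin->sin_port);
	int len;

	if (!inet_ntop(AF_INET, &sin->sin_addr, host, sizeof(host)))
		return -1;
	/* High and low bytes of the port become the last two fields. */
	len = snprintf(buf, buflen, "%s.%u.%u", host, port >> 8, port & 0xff);
	if (len < 0 || (size_t)len >= buflen)
		return -1;
	return len;
}
```

Unlike the quoted helper, this sketch reports truncation to the caller instead
of warning and carrying on, which is one possible choice if the code moves
into a shared location.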

--b.

> +
>  static inline bool bmval_is_subset(const u32 *bm1, const u32 *bm2)
>  {
>  	return !((bm1[0] & ~bm2[0]) ||
> -- 
> 1.8.3.1


* Re: [PATCH v4 2/8] NFSD add ca_source_server<> to COPY
  2019-07-08 19:23 ` [PATCH v4 2/8] NFSD add ca_source_server<> to COPY Olga Kornievskaia
@ 2019-07-17 21:40   ` bfields
  2019-07-22 20:00     ` Olga Kornievskaia
  0 siblings, 1 reply; 51+ messages in thread
From: bfields @ 2019-07-17 21:40 UTC (permalink / raw)
  To: Olga Kornievskaia; +Cc: bfields, linux-nfs

On Mon, Jul 08, 2019 at 03:23:46PM -0400, Olga Kornievskaia wrote:
> From: Olga Kornievskaia <kolga@netapp.com>
> 
> Decode the ca_source_server list that's sent but only use the
> first one. Presence of non-zero list indicates an "inter" copy.
> 
> Signed-off-by: Andy Adamson <andros@netapp.com>
> Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> ---
>  fs/nfsd/nfs4xdr.c | 75 +++++++++++++++++++++++++++++++++++++++++++++++++++++--
>  fs/nfsd/xdr4.h    | 12 +++++----
>  2 files changed, 80 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
> index 52c4f6d..15f53bb 100644
> --- a/fs/nfsd/nfs4xdr.c
> +++ b/fs/nfsd/nfs4xdr.c
> @@ -40,6 +40,7 @@
>  #include <linux/utsname.h>
>  #include <linux/pagemap.h>
>  #include <linux/sunrpc/svcauth_gss.h>
> +#include <linux/sunrpc/addr.h>
>  
>  #include "idmap.h"
>  #include "acl.h"
> @@ -1744,11 +1745,58 @@ static __be32 nfsd4_decode_reclaim_complete(struct nfsd4_compoundargs *argp, str
>  	DECODE_TAIL;
>  }
>  
> +static __be32 nfsd4_decode_nl4_server(struct nfsd4_compoundargs *argp,
> +				      struct nl4_server *ns)
> +{
> +	DECODE_HEAD;
> +	struct nfs42_netaddr *naddr;
> +
> +	READ_BUF(4);
> +	ns->nl4_type = be32_to_cpup(p++);
> +
> +	/* currently support for 1 inter-server source server */
> +	switch (ns->nl4_type) {
> +	case NL4_NAME:
> +	case NL4_URL:
> +		READ_BUF(4);
> +		ns->u.nl4_str_sz = be32_to_cpup(p++);
> +		if (ns->u.nl4_str_sz > NFS4_OPAQUE_LIMIT)
> +			goto xdr_error;
> +
> +		READ_BUF(ns->u.nl4_str_sz);
> +		COPYMEM(ns->u.nl4_str,
> +			ns->u.nl4_str_sz);
> +		break;

Do we actually have plans to use this case?  If not, it's probably not
worth saving these fields.

--b.

> +	case NL4_NETADDR:
> +		naddr = &ns->u.nl4_addr;
> +
> +		READ_BUF(4);
> +		naddr->netid_len = be32_to_cpup(p++);
> +		if (naddr->netid_len > RPCBIND_MAXNETIDLEN)
> +			goto xdr_error;
> +
> +		READ_BUF(naddr->netid_len + 4); /* 4 for uaddr len */
> +		COPYMEM(naddr->netid, naddr->netid_len);
> +
> +		naddr->addr_len = be32_to_cpup(p++);
> +		if (naddr->addr_len > RPCBIND_MAXUADDRLEN)
> +			goto xdr_error;
> +
> +		READ_BUF(naddr->addr_len);
> +		COPYMEM(naddr->addr, naddr->addr_len);
> +		break;
> +	default:
> +		goto xdr_error;
> +	}
> +	DECODE_TAIL;
> +}
> +
>  static __be32
>  nfsd4_decode_copy(struct nfsd4_compoundargs *argp, struct nfsd4_copy *copy)
>  {
>  	DECODE_HEAD;
> -	unsigned int tmp;
> +	struct nl4_server *ns_dummy;
> +	int i, count;
>  
>  	status = nfsd4_decode_stateid(argp, &copy->cp_src_stateid);
>  	if (status)
> @@ -1763,8 +1811,31 @@ static __be32 nfsd4_decode_reclaim_complete(struct nfsd4_compoundargs *argp, str
>  	p = xdr_decode_hyper(p, &copy->cp_count);
>  	p++; /* ca_consecutive: we always do consecutive copies */
>  	copy->cp_synchronous = be32_to_cpup(p++);
> -	tmp = be32_to_cpup(p); /* Source server list not supported */
> +	count = be32_to_cpup(p++);
>  
> +	copy->cp_intra = false;
> +	if (count == 0) { /* intra-server copy */
> +		copy->cp_intra = true;
> +		goto intra;
> +	}
> +
> +	/* decode all the supplied server addresses but use first */
> +	status = nfsd4_decode_nl4_server(argp, &copy->cp_src);
> +	if (status)
> +		return status;
> +
> +	ns_dummy = kmalloc(sizeof(struct nl4_server), GFP_KERNEL);
> +	if (ns_dummy == NULL)
> +		return nfserrno(-ENOMEM);
> +	for (i = 0; i < count - 1; i++) {
> +		status = nfsd4_decode_nl4_server(argp, ns_dummy);
> +		if (status) {
> +			kfree(ns_dummy);
> +			return status;
> +		}
> +	}
> +	kfree(ns_dummy);
> +intra:
>  	DECODE_TAIL;
>  }
>  
> diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h
> index feeb6d4..513c9ff 100644
> --- a/fs/nfsd/xdr4.h
> +++ b/fs/nfsd/xdr4.h
> @@ -516,11 +516,13 @@ struct nfsd42_write_res {
>  
>  struct nfsd4_copy {
>  	/* request */
> -	stateid_t	cp_src_stateid;
> -	stateid_t	cp_dst_stateid;
> -	u64		cp_src_pos;
> -	u64		cp_dst_pos;
> -	u64		cp_count;
> +	stateid_t		cp_src_stateid;
> +	stateid_t		cp_dst_stateid;
> +	u64			cp_src_pos;
> +	u64			cp_dst_pos;
> +	u64			cp_count;
> +	struct nl4_server	cp_src;
> +	bool			cp_intra;
>  
>  	/* both */
>  	bool		cp_synchronous;
> -- 
> 1.8.3.1


* Re: [PATCH v4 4/8] NFSD add COPY_NOTIFY operation
  2019-07-08 19:23 ` [PATCH v4 4/8] NFSD add COPY_NOTIFY operation Olga Kornievskaia
  2019-07-09 12:34   ` Anna Schumaker
@ 2019-07-17 22:12   ` bfields
  2019-07-17 22:15   ` bfields
  2019-07-17 23:07   ` bfields
  3 siblings, 0 replies; 51+ messages in thread
From: bfields @ 2019-07-17 22:12 UTC (permalink / raw)
  To: Olga Kornievskaia; +Cc: bfields, linux-nfs

On Mon, Jul 08, 2019 at 03:23:48PM -0400, Olga Kornievskaia wrote:
>  static __be32
> +nfsd42_encode_nl4_server(struct nfsd4_compoundres *resp, struct nl4_server *ns)
> +{
> +	struct xdr_stream *xdr = &resp->xdr;
> +	struct nfs42_netaddr *addr;
> +	__be32 *p;
> +
> +	p = xdr_reserve_space(xdr, 4);
> +	*p++ = cpu_to_be32(ns->nl4_type);
> +
> +	switch (ns->nl4_type) {
> +	case NL4_NETADDR:
> +		addr = &ns->u.nl4_addr;
> +
> +		/* netid_len, netid, uaddr_len, uaddr (port included
> +		 * in RPCBIND_MAXUADDRLEN)
> +		 */
> +		p = xdr_reserve_space(xdr,
> +			4 /* netid len */ +
> +			(XDR_QUADLEN(addr->netid_len) * 4) +
> +			4 /* uaddr len */ +
> +			(XDR_QUADLEN(addr->addr_len) * 4));
> +		if (!p)
> +			return nfserr_resource;
> +
> +		*p++ = cpu_to_be32(addr->netid_len);
> +		p = xdr_encode_opaque_fixed(p, addr->netid,
> +					    addr->netid_len);
> +		*p++ = cpu_to_be32(addr->addr_len);
> +		p = xdr_encode_opaque_fixed(p, addr->addr,
> +					addr->addr_len);
> +		break;
> +	default:
> +		WARN_ON(ns->nl4_type != NL4_NETADDR);

I default to WARN_ON_ONCE().
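
The difference being suggested is that a once-only warning fires a single time
per call site, so a path that is hit repeatedly cannot flood the log. A rough
userspace analogue, with hypothetical names and without the kernel macro's
return value, might look like this:

```c
#include <stdio.h>

/* Counts how many warnings were actually emitted (for illustration). */
static int demo_warn_count;

/* Hypothetical userspace analogue: report at a given call site only once. */
#define DEMO_WARN_ON_ONCE(cond)						\
	do {								\
		static int warned;					\
		if ((cond) && !warned) {				\
			warned = 1;					\
			demo_warn_count++;				\
			fprintf(stderr, "warning at %s:%d\n",		\
				__FILE__, __LINE__);			\
		}							\
	} while (0)

/* Stand-in for an encoder that only expects type 1. */
static void demo_encode(int type)
{
	DEMO_WARN_ON_ONCE(type != 1);
}
```

Calling demo_encode() several times with an unexpected type emits one warning
in total.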

--b.


* Re: [PATCH v4 4/8] NFSD add COPY_NOTIFY operation
  2019-07-08 19:23 ` [PATCH v4 4/8] NFSD add COPY_NOTIFY operation Olga Kornievskaia
  2019-07-09 12:34   ` Anna Schumaker
  2019-07-17 22:12   ` bfields
@ 2019-07-17 22:15   ` bfields
  2019-07-22 20:03     ` Olga Kornievskaia
  2019-07-17 23:07   ` bfields
  3 siblings, 1 reply; 51+ messages in thread
From: bfields @ 2019-07-17 22:15 UTC (permalink / raw)
  To: Olga Kornievskaia; +Cc: bfields, linux-nfs

On Mon, Jul 08, 2019 at 03:23:48PM -0400, Olga Kornievskaia wrote:
> From: Olga Kornievskaia <kolga@netapp.com>
> 
> Introducing the COPY_NOTIFY operation.
> 
> Create a new unique stateid that will keep track of the copy
> state and the upcoming READs that will use that stateid. Keep
> it in the list associated with parent stateid.
> 
> Return single netaddr to advertise to the copy.
> 
> Signed-off-by: Andy Adamson <andros@netapp.com>
> Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> ---
>  fs/nfsd/nfs4proc.c  | 71 +++++++++++++++++++++++++++++++++++----
>  fs/nfsd/nfs4state.c | 64 +++++++++++++++++++++++++++++++----
>  fs/nfsd/nfs4xdr.c   | 97 +++++++++++++++++++++++++++++++++++++++++++++++++++--
>  fs/nfsd/state.h     | 18 ++++++++--
>  fs/nfsd/xdr4.h      | 13 +++++++
>  5 files changed, 247 insertions(+), 16 deletions(-)
> 
> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> index cfd8767..c39fa72 100644
> --- a/fs/nfsd/nfs4proc.c
> +++ b/fs/nfsd/nfs4proc.c
> @@ -37,6 +37,7 @@
>  #include <linux/falloc.h>
>  #include <linux/slab.h>
>  #include <linux/kthread.h>
> +#include <linux/sunrpc/addr.h>
>  
>  #include "idmap.h"
>  #include "cache.h"
> @@ -1033,7 +1034,8 @@ static __be32 nfsd4_do_lookupp(struct svc_rqst *rqstp, struct svc_fh *fh)
>  static __be32
>  nfsd4_verify_copy(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>  		  stateid_t *src_stateid, struct file **src,
> -		  stateid_t *dst_stateid, struct file **dst)
> +		  stateid_t *dst_stateid, struct file **dst,
> +		  struct nfs4_stid **stid)
>  {
>  	__be32 status;
>  
> @@ -1050,7 +1052,7 @@ static __be32 nfsd4_do_lookupp(struct svc_rqst *rqstp, struct svc_fh *fh)
>  
>  	status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate->current_fh,
>  					    dst_stateid, WR_STATE, dst, NULL,
> -					    NULL);
> +					    stid);

Doesn't this belong with the previous patch?

--b.


* Re: [PATCH v4 4/8] NFSD add COPY_NOTIFY operation
  2019-07-08 19:23 ` [PATCH v4 4/8] NFSD add COPY_NOTIFY operation Olga Kornievskaia
                     ` (2 preceding siblings ...)
  2019-07-17 22:15   ` bfields
@ 2019-07-17 23:07   ` bfields
  2019-07-22 20:17     ` Olga Kornievskaia
  3 siblings, 1 reply; 51+ messages in thread
From: bfields @ 2019-07-17 23:07 UTC (permalink / raw)
  To: Olga Kornievskaia; +Cc: bfields, linux-nfs

On Mon, Jul 08, 2019 at 03:23:48PM -0400, Olga Kornievskaia wrote:
> @@ -726,24 +727,53 @@ struct nfs4_stid *nfs4_alloc_stid(struct nfs4_client *cl, struct kmem_cache *sla
>  /*
>   * Create a unique stateid_t to represent each COPY.
>   */
> -int nfs4_init_cp_state(struct nfsd_net *nn, struct nfsd4_copy *copy)
> +static int nfs4_init_cp_state(struct nfsd_net *nn, void *ptr, stateid_t *stid)
>  {
>  	int new_id;
>  
>  	idr_preload(GFP_KERNEL);
>  	spin_lock(&nn->s2s_cp_lock);
> -	new_id = idr_alloc_cyclic(&nn->s2s_cp_stateids, copy, 0, 0, GFP_NOWAIT);
> +	new_id = idr_alloc_cyclic(&nn->s2s_cp_stateids, ptr, 0, 0, GFP_NOWAIT);
>  	spin_unlock(&nn->s2s_cp_lock);
>  	idr_preload_end();
>  	if (new_id < 0)
>  		return 0;
> -	copy->cp_stateid.si_opaque.so_id = new_id;
> -	copy->cp_stateid.si_opaque.so_clid.cl_boot = nn->boot_time;
> -	copy->cp_stateid.si_opaque.so_clid.cl_id = nn->s2s_cp_cl_id;
> +	stid->si_opaque.so_id = new_id;
> +	stid->si_opaque.so_clid.cl_boot = nn->boot_time;
> +	stid->si_opaque.so_clid.cl_id = nn->s2s_cp_cl_id;
>  	return 1;
>  }
>  
> -void nfs4_free_cp_state(struct nfsd4_copy *copy)
> +int nfs4_init_copy_state(struct nfsd_net *nn, struct nfsd4_copy *copy)
> +{
> +	return nfs4_init_cp_state(nn, copy, &copy->cp_stateid);
> +}

This little bit of refactoring could go into a separate patch.  It's
easier for me to review lots of smaller patches.

But I don't understand why you're doing it.

Also, I'm a little suspicious of code that doesn't initialize an object
till after it's been added to a global structure.  The more typical
pattern is:


	initialize foo
	take locks, add foo to global structure, drop locks.

This prevents anyone doing a lookup from finding "foo" while it's still
in a partially initialized state.

--b.


* Re: [PATCH v4 5/8] NFSD check stateids against copy stateids
  2019-07-08 19:23 ` [PATCH v4 5/8] NFSD check stateids against copy stateids Olga Kornievskaia
@ 2019-07-19 22:01   ` bfields
  2019-07-22 20:24     ` Olga Kornievskaia
  0 siblings, 1 reply; 51+ messages in thread
From: bfields @ 2019-07-19 22:01 UTC (permalink / raw)
  To: Olga Kornievskaia; +Cc: bfields, linux-nfs

On Mon, Jul 08, 2019 at 03:23:49PM -0400, Olga Kornievskaia wrote:
> Incoming stateid (used by a READ) could be a saved copy stateid.
> On first use make it active and check that the copy has started
> within the allowable lease time.
> 
> Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> ---
>  fs/nfsd/nfs4state.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 45 insertions(+)
> 
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index 2555eb9..b786625 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -5232,6 +5232,49 @@ static __be32 nfsd4_validate_stateid(struct nfs4_client *cl, stateid_t *stateid)
>  
>  	return 0;
>  }
> +/*
> + * A READ from an inter server to server COPY will have a
> + * copy stateid. Return the parent nfs4_stid.
> + */
> +static __be32 _find_cpntf_state(struct nfsd_net *nn, stateid_t *st,
> +		     struct nfs4_cpntf_state **cps)
> +{
> +	struct nfs4_cpntf_state *state = NULL;
> +
> +	if (st->si_opaque.so_clid.cl_id != nn->s2s_cp_cl_id)
> +		return nfserr_bad_stateid;
> +	spin_lock(&nn->s2s_cp_lock);
> +	state = idr_find(&nn->s2s_cp_stateids, st->si_opaque.so_id);
> +	if (state)
> +		refcount_inc(&state->cp_p_stid->sc_count);
> +	spin_unlock(&nn->s2s_cp_lock);
> +	if (!state)
> +		return nfserr_bad_stateid;
> +	*cps = state;
> +	return 0;
> +}
> +
> +static __be32 find_cpntf_state(struct nfsd_net *nn, stateid_t *st,
> +			       struct nfs4_stid **stid)
> +{
> +	__be32 status;
> +	struct nfs4_cpntf_state *cps = NULL;
> +
> +	status = _find_cpntf_state(nn, st, &cps);
> +	if (status)
> +		return status;
> +
> +	/* Did the inter server to server copy start in time? */
> +	if (cps->cp_active == false && !time_after(cps->cp_timeout, jiffies)) {
> +		nfs4_put_stid(cps->cp_p_stid);
> +		return nfserr_partner_no_auth;

I wonder whether, instead of checking the time, we should be
destroying copy stateids as they expire, so the fact that you were
still able to look up the stateid suggests that it's good.  Or would
that result in returning the wrong error here?  Just curious.

> +	} else
> +		cps->cp_active = true;
> +
> +	*stid = cps->cp_p_stid;

What guarantees that cp_p_stid still points to a valid stateid?  (E.g.
if this is an open stateid that has since been closed.)
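
For context, the first-use expiry check under discussion can be modelled in
userspace as below. The struct and function names are hypothetical, and
demo_time_after() is written in the spirit of the kernel's time_after() (it
assumes the usual two's-complement conversion from unsigned long to long).
The sketch covers only the lease test and does not address the lifetime of
the parent stateid.

```c
#include <stdbool.h>

/*
 * Userspace sketch of a wraparound-safe "a is after b" test on an
 * unsigned tick counter, in the spirit of the kernel's time_after().
 */
static bool demo_time_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* Hypothetical copy-notify state: only the fields the check needs. */
struct demo_cpntf {
	bool active;
	unsigned long timeout;	/* tick at which the lease ends */
};

/*
 * Returns true if the stateid may be used at tick "now". An inactive
 * state whose deadline has been reached is rejected; otherwise the
 * first use marks it active, mirroring the check in the quoted patch.
 */
static bool demo_cpntf_usable(struct demo_cpntf *cps, unsigned long now)
{
	if (!cps->active && !demo_time_after(cps->timeout, now))
		return false;
	cps->active = true;
	return true;
}
```

Once a state has been marked active, later uses pass regardless of the
deadline, which matches the behaviour of the condition as quoted.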

--b.

> +
> +	return nfs_ok;
> +}
>  
>  /*
>   * Checks for stateid operations
> @@ -5264,6 +5307,8 @@ static __be32 nfsd4_validate_stateid(struct nfs4_client *cl, stateid_t *stateid)
>  	status = nfsd4_lookup_stateid(cstate, stateid,
>  				NFS4_DELEG_STID|NFS4_OPEN_STID|NFS4_LOCK_STID,
>  				&s, nn);
> +	if (status == nfserr_bad_stateid)
> +		status = find_cpntf_state(nn, stateid, &s);
>  	if (status)
>  		return status;
>  	status = nfsd4_stid_check_stateid_generation(stateid, s,
> -- 
> 1.8.3.1


* Re: [PATCH v4 1/8] NFSD fill-in netloc4 structure
  2019-07-17 21:13   ` bfields
@ 2019-07-22 19:59     ` Olga Kornievskaia
  2019-07-30 15:48       ` Olga Kornievskaia
  0 siblings, 1 reply; 51+ messages in thread
From: Olga Kornievskaia @ 2019-07-22 19:59 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: J. Bruce Fields, linux-nfs

On Wed, Jul 17, 2019 at 5:13 PM J. Bruce Fields <bfields@fieldses.org> wrote:
>
> On Mon, Jul 08, 2019 at 03:23:45PM -0400, Olga Kornievskaia wrote:
> > From: Olga Kornievskaia <kolga@netapp.com>
> >
> > nfs.4 defines nfs42_netaddr structure that represents netloc4.
> >
> > Populate needed fields from the sockaddr structure.
> >
> > This will be used by flexfiles and 4.2 inter copy
> >
> > Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> > ---
> >  fs/nfsd/nfsd.h | 32 ++++++++++++++++++++++++++++++++
> >  1 file changed, 32 insertions(+)
> >
> > diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h
> > index 24187b5..8f4fc50 100644
> > --- a/fs/nfsd/nfsd.h
> > +++ b/fs/nfsd/nfsd.h
> > @@ -19,6 +19,7 @@
> >  #include <linux/sunrpc/svc.h>
> >  #include <linux/sunrpc/svc_xprt.h>
> >  #include <linux/sunrpc/msg_prot.h>
> > +#include <linux/sunrpc/addr.h>
> >
> >  #include <uapi/linux/nfsd/debug.h>
> >
> > @@ -375,6 +376,37 @@ static inline bool nfsd4_spo_must_allow(struct svc_rqst *rqstp)
> >
> >  extern const u32 nfsd_suppattrs[3][3];
> >
> > +static inline u32 nfsd4_set_netaddr(struct sockaddr *addr,
> > +                                 struct nfs42_netaddr *netaddr)
> > +{
> > +     struct sockaddr_in *sin = (struct sockaddr_in *)addr;
> > +     struct sockaddr_in6 *sin6 = (struct sockaddr_in6 *)addr;
> > +     unsigned int port;
> > +     size_t ret_addr, ret_port;
> > +
> > +     switch (addr->sa_family) {
> > +     case AF_INET:
> > +             port = ntohs(sin->sin_port);
> > +             sprintf(netaddr->netid, "tcp");
> > +             netaddr->netid_len = 3;
> > +             break;
> > +     case AF_INET6:
> > +             port = ntohs(sin6->sin6_port);
> > +             sprintf(netaddr->netid, "tcp6");
> > +             netaddr->netid_len = 4;
> > +             break;
> > +     default:
> > +             return nfserr_inval;
> > +     }
> > +     ret_addr = rpc_ntop(addr, netaddr->addr, sizeof(netaddr->addr));
> > +     ret_port = snprintf(netaddr->addr + ret_addr,
> > +                         RPCBIND_MAXUADDRLEN + 1 - ret_addr,
> > +                         ".%u.%u", port >> 8, port & 0xff);
> > +     WARN_ON(ret_port >= RPCBIND_MAXUADDRLEN + 1 - ret_addr);
> > +     netaddr->addr_len = ret_addr + ret_port;
> > +     return 0;
> > +}
>
> Kinda surprised we don't already do something like this elsewhere, but I
> don't see anything exactly the same.  Might be possible to put this in
> net/sunrpc/addr.c and share some code with rpc_sockaddr2uaddr?  I'll
> leave it to you whether that looks worth it.

I'll investigate how to move this into sunrpc. The client also
populates the nfs42_netaddr structure, but slightly differently. I need
to go back and see what's shareable.
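
For reference while comparing the two implementations, here is a minimal userspace sketch of the format the quoted helper builds: the printable address followed by the port as two decimal octets, high byte first. The name format_uaddr() and its return convention are assumptions made for illustration; this is not the kernel helper and not an existing sunrpc function.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Userspace sketch only. format_uaddr() is a hypothetical helper, not a
 * kernel or sunrpc function. It writes "<address>.<port hi>.<port lo>"
 * into buf and returns the string length, or -1 for an unsupported
 * address family or a buffer that is too small.
 */
static int format_uaddr(const struct sockaddr *sa, char *buf, size_t buflen)
{
	char host[INET6_ADDRSTRLEN];
	unsigned int port;
	int n;

	assert(sa != NULL && buf != NULL);

	switch (sa->sa_family) {
	case AF_INET: {
		const struct sockaddr_in *sin = (const struct sockaddr_in *)sa;

		if (!inet_ntop(AF_INET, &sin->sin_addr, host, sizeof(host)))
			return -1;
		port = ntohs(sin->sin_port);
		break;
	}
	case AF_INET6: {
		const struct sockaddr_in6 *sin6 = (const struct sockaddr_in6 *)sa;

		if (!inet_ntop(AF_INET6, &sin6->sin6_addr, host, sizeof(host)))
			return -1;
		port = ntohs(sin6->sin6_port);
		break;
	}
	default:
		return -1;
	}

	/* The port is appended as two decimal octets, high byte first. */
	n = snprintf(buf, buflen, "%s.%u.%u", host, port >> 8, port & 0xff);
	if (n < 0 || (size_t)n >= buflen)
		return -1;
	return n;
}
```

For 192.0.2.1 and port 2049 this produces 192.0.2.1.8.1, because 2049 = 8 * 256 + 1. Unlike the quoted helper, the sketch reports truncation to the caller rather than only warning about it.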

>
> --b.
>
> > +
> >  static inline bool bmval_is_subset(const u32 *bm1, const u32 *bm2)
> >  {
> >       return !((bm1[0] & ~bm2[0]) ||
> > --
> > 1.8.3.1


* Re: [PATCH v4 2/8] NFSD add ca_source_server<> to COPY
  2019-07-17 21:40   ` bfields
@ 2019-07-22 20:00     ` Olga Kornievskaia
  0 siblings, 0 replies; 51+ messages in thread
From: Olga Kornievskaia @ 2019-07-22 20:00 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: J. Bruce Fields, linux-nfs

On Wed, Jul 17, 2019 at 5:40 PM J. Bruce Fields <bfields@fieldses.org> wrote:
>
> On Mon, Jul 08, 2019 at 03:23:46PM -0400, Olga Kornievskaia wrote:
> > From: Olga Kornievskaia <kolga@netapp.com>
> >
> > Decode the ca_source_server list that's sent but only use the
> > first one. Presence of non-zero list indicates an "inter" copy.
> >
> > Signed-off-by: Andy Adamson <andros@netapp.com>
> > Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> > ---
> >  fs/nfsd/nfs4xdr.c | 75 +++++++++++++++++++++++++++++++++++++++++++++++++++++--
> >  fs/nfsd/xdr4.h    | 12 +++++----
> >  2 files changed, 80 insertions(+), 7 deletions(-)
> >
> > diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
> > index 52c4f6d..15f53bb 100644
> > --- a/fs/nfsd/nfs4xdr.c
> > +++ b/fs/nfsd/nfs4xdr.c
> > @@ -40,6 +40,7 @@
> >  #include <linux/utsname.h>
> >  #include <linux/pagemap.h>
> >  #include <linux/sunrpc/svcauth_gss.h>
> > +#include <linux/sunrpc/addr.h>
> >
> >  #include "idmap.h"
> >  #include "acl.h"
> > @@ -1744,11 +1745,58 @@ static __be32 nfsd4_decode_reclaim_complete(struct nfsd4_compoundargs *argp, str
> >       DECODE_TAIL;
> >  }
> >
> > +static __be32 nfsd4_decode_nl4_server(struct nfsd4_compoundargs *argp,
> > +                                   struct nl4_server *ns)
> > +{
> > +     DECODE_HEAD;
> > +     struct nfs42_netaddr *naddr;
> > +
> > +     READ_BUF(4);
> > +     ns->nl4_type = be32_to_cpup(p++);
> > +
> > +     /* currently support for 1 inter-server source server */
> > +     switch (ns->nl4_type) {
> > +     case NL4_NAME:
> > +     case NL4_URL:
> > +             READ_BUF(4);
> > +             ns->u.nl4_str_sz = be32_to_cpup(p++);
> > +             if (ns->u.nl4_str_sz > NFS4_OPAQUE_LIMIT)
> > +                     goto xdr_error;
> > +
> > +             READ_BUF(ns->u.nl4_str_sz);
> > +             COPYMEM(ns->u.nl4_str,
> > +                     ns->u.nl4_str_sz);
> > +             break;
>
> Do we actually have plans to use this case?  If not, it's probably not
> worth saving these fields.

We don't use them. They were there for "completeness".  I can remove
them to simplify the code.

>
> --b.
>
> > +     case NL4_NETADDR:
> > +             naddr = &ns->u.nl4_addr;
> > +
> > +             READ_BUF(4);
> > +             naddr->netid_len = be32_to_cpup(p++);
> > +             if (naddr->netid_len > RPCBIND_MAXNETIDLEN)
> > +                     goto xdr_error;
> > +
> > +             READ_BUF(naddr->netid_len + 4); /* 4 for uaddr len */
> > +             COPYMEM(naddr->netid, naddr->netid_len);
> > +
> > +             naddr->addr_len = be32_to_cpup(p++);
> > +             if (naddr->addr_len > RPCBIND_MAXUADDRLEN)
> > +                     goto xdr_error;
> > +
> > +             READ_BUF(naddr->addr_len);
> > +             COPYMEM(naddr->addr, naddr->addr_len);
> > +             break;
> > +     default:
> > +             goto xdr_error;
> > +     }
> > +     DECODE_TAIL;
> > +}
> > +
> >  static __be32
> >  nfsd4_decode_copy(struct nfsd4_compoundargs *argp, struct nfsd4_copy *copy)
> >  {
> >       DECODE_HEAD;
> > -     unsigned int tmp;
> > +     struct nl4_server *ns_dummy;
> > +     int i, count;
> >
> >       status = nfsd4_decode_stateid(argp, &copy->cp_src_stateid);
> >       if (status)
> > @@ -1763,8 +1811,31 @@ static __be32 nfsd4_decode_reclaim_complete(struct nfsd4_compoundargs *argp, str
> >       p = xdr_decode_hyper(p, &copy->cp_count);
> >       p++; /* ca_consecutive: we always do consecutive copies */
> >       copy->cp_synchronous = be32_to_cpup(p++);
> > -     tmp = be32_to_cpup(p); /* Source server list not supported */
> > +     count = be32_to_cpup(p++);
> >
> > +     copy->cp_intra = false;
> > +     if (count == 0) { /* intra-server copy */
> > +             copy->cp_intra = true;
> > +             goto intra;
> > +     }
> > +
> > +     /* decode all the supplied server addresses but use first */
> > +     status = nfsd4_decode_nl4_server(argp, &copy->cp_src);
> > +     if (status)
> > +             return status;
> > +
> > +     ns_dummy = kmalloc(sizeof(struct nl4_server), GFP_KERNEL);
> > +     if (ns_dummy == NULL)
> > +             return nfserrno(-ENOMEM);
> > +     for (i = 0; i < count - 1; i++) {
> > +             status = nfsd4_decode_nl4_server(argp, ns_dummy);
> > +             if (status) {
> > +                     kfree(ns_dummy);
> > +                     return status;
> > +             }
> > +     }
> > +     kfree(ns_dummy);
> > +intra:
> >       DECODE_TAIL;
> >  }
> >
> > diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h
> > index feeb6d4..513c9ff 100644
> > --- a/fs/nfsd/xdr4.h
> > +++ b/fs/nfsd/xdr4.h
> > @@ -516,11 +516,13 @@ struct nfsd42_write_res {
> >
> >  struct nfsd4_copy {
> >       /* request */
> > -     stateid_t       cp_src_stateid;
> > -     stateid_t       cp_dst_stateid;
> > -     u64             cp_src_pos;
> > -     u64             cp_dst_pos;
> > -     u64             cp_count;
> > +     stateid_t               cp_src_stateid;
> > +     stateid_t               cp_dst_stateid;
> > +     u64                     cp_src_pos;
> > +     u64                     cp_dst_pos;
> > +     u64                     cp_count;
> > +     struct nl4_server       cp_src;
> > +     bool                    cp_intra;
> >
> >       /* both */
> >       bool            cp_synchronous;
> > --
> > 1.8.3.1


* Re: [PATCH v4 4/8] NFSD add COPY_NOTIFY operation
  2019-07-17 22:15   ` bfields
@ 2019-07-22 20:03     ` Olga Kornievskaia
  0 siblings, 0 replies; 51+ messages in thread
From: Olga Kornievskaia @ 2019-07-22 20:03 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: J. Bruce Fields, linux-nfs

On Wed, Jul 17, 2019 at 6:15 PM J. Bruce Fields <bfields@fieldses.org> wrote:
>
> On Mon, Jul 08, 2019 at 03:23:48PM -0400, Olga Kornievskaia wrote:
> > From: Olga Kornievskaia <kolga@netapp.com>
> >
> > Introducing the COPY_NOTIFY operation.
> >
> > Create a new unique stateid that will keep track of the copy
> > state and the upcoming READs that will use that stateid. Keep
> > it in the list associated with parent stateid.
> >
> > Return single netaddr to advertise to the copy.
> >
> > Signed-off-by: Andy Adamson <andros@netapp.com>
> > Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> > ---
> >  fs/nfsd/nfs4proc.c  | 71 +++++++++++++++++++++++++++++++++++----
> >  fs/nfsd/nfs4state.c | 64 +++++++++++++++++++++++++++++++----
> >  fs/nfsd/nfs4xdr.c   | 97 +++++++++++++++++++++++++++++++++++++++++++++++++++--
> >  fs/nfsd/state.h     | 18 ++++++++--
> >  fs/nfsd/xdr4.h      | 13 +++++++
> >  5 files changed, 247 insertions(+), 16 deletions(-)
> >
> > diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> > index cfd8767..c39fa72 100644
> > --- a/fs/nfsd/nfs4proc.c
> > +++ b/fs/nfsd/nfs4proc.c
> > @@ -37,6 +37,7 @@
> >  #include <linux/falloc.h>
> >  #include <linux/slab.h>
> >  #include <linux/kthread.h>
> > +#include <linux/sunrpc/addr.h>
> >
> >  #include "idmap.h"
> >  #include "cache.h"
> > @@ -1033,7 +1034,8 @@ static __be32 nfsd4_do_lookupp(struct svc_rqst *rqstp, struct svc_fh *fh)
> >  static __be32
> >  nfsd4_verify_copy(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> >                 stateid_t *src_stateid, struct file **src,
> > -               stateid_t *dst_stateid, struct file **dst)
> > +               stateid_t *dst_stateid, struct file **dst,
> > +               struct nfs4_stid **stid)
> >  {
> >       __be32 status;
> >
> > @@ -1050,7 +1052,7 @@ static __be32 nfsd4_do_lookupp(struct svc_rqst *rqstp, struct svc_fh *fh)
> >
> >       status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate->current_fh,
> >                                           dst_stateid, WR_STATE, dst, NULL,
> > -                                         NULL);
> > +                                         stid);
>
> Doesn't this belong with the previous patch?

I could move it. I think what Andy did was first change the code to add
the ability to add to the nfs4_stid, and this patch actually uses it.
Sounds like you'd prefer this to be in the previous patch?

>
> --b.


* Re: [PATCH v4 4/8] NFSD add COPY_NOTIFY operation
  2019-07-17 23:07   ` bfields
@ 2019-07-22 20:17     ` Olga Kornievskaia
  2019-07-23 20:45       ` J. Bruce Fields
  0 siblings, 1 reply; 51+ messages in thread
From: Olga Kornievskaia @ 2019-07-22 20:17 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: J. Bruce Fields, linux-nfs

On Wed, Jul 17, 2019 at 7:07 PM J. Bruce Fields <bfields@fieldses.org> wrote:
>
> On Mon, Jul 08, 2019 at 03:23:48PM -0400, Olga Kornievskaia wrote:
> > @@ -726,24 +727,53 @@ struct nfs4_stid *nfs4_alloc_stid(struct nfs4_client *cl, struct kmem_cache *sla
> >  /*
> >   * Create a unique stateid_t to represent each COPY.
> >   */
> > -int nfs4_init_cp_state(struct nfsd_net *nn, struct nfsd4_copy *copy)
> > +static int nfs4_init_cp_state(struct nfsd_net *nn, void *ptr, stateid_t *stid)
> >  {
> >       int new_id;
> >
> >       idr_preload(GFP_KERNEL);
> >       spin_lock(&nn->s2s_cp_lock);
> > -     new_id = idr_alloc_cyclic(&nn->s2s_cp_stateids, copy, 0, 0, GFP_NOWAIT);
> > +     new_id = idr_alloc_cyclic(&nn->s2s_cp_stateids, ptr, 0, 0, GFP_NOWAIT);
> >       spin_unlock(&nn->s2s_cp_lock);
> >       idr_preload_end();
> >       if (new_id < 0)
> >               return 0;
> > -     copy->cp_stateid.si_opaque.so_id = new_id;
> > -     copy->cp_stateid.si_opaque.so_clid.cl_boot = nn->boot_time;
> > -     copy->cp_stateid.si_opaque.so_clid.cl_id = nn->s2s_cp_cl_id;
> > +     stid->si_opaque.so_id = new_id;
> > +     stid->si_opaque.so_clid.cl_boot = nn->boot_time;
> > +     stid->si_opaque.so_clid.cl_id = nn->s2s_cp_cl_id;
> >       return 1;
> >  }
> >
> > -void nfs4_free_cp_state(struct nfsd4_copy *copy)
> > +int nfs4_init_copy_state(struct nfsd_net *nn, struct nfsd4_copy *copy)
> > +{
> > +     return nfs4_init_cp_state(nn, copy, &copy->cp_stateid);
> > +}
>
> This little bit of refactoring could go into a separate patch.  It's
> easier for me to review lots of smaller patches.
>
> But I don't understand why you're doing it.
>
> Also, I'm a little suspicious of code that doesn't initialize an object
> till after it's been added to a global structure.  The more typical
> pattern is:
>
>
>         initialize foo
>         take locks, add foo to global structure, drop locks.
>
> This prevents anyone doing a lookup from finding "foo" while it's still
> in a partially initialized state.

Let me try to explain the change. It is needed because COPY_NOTIFY and
COPY now both generate a unique stateid (COPY_NOTIFY needs a unique
stateid to be passed into the COPY, and COPY generates a unique stateid
to be referred to by callbacks). Previously only COPY generated the
stateid (so it was stored in the nfs4_copy structure), but now we have
COPY_NOTIFY, which doesn't create an nfs4_copy when processing the
operation but still needs a unique stateid (stored in the stateid
structure).

Let me see if I understand your suspicion, and ask for guidance on how
to resolve it, as perhaps I'm misusing the function. idr_alloc_cyclic()
associates the structure passed as its 2nd argument with the value it
returns. How do I initialize the structure with that value when the
value is only known after I call the function that adds the structure
to the list? Are you suggesting that I somehow get the value for new_id
without associating anything with it, then update the copy structure
with that value, and then call idr_alloc_cyclic() (or something else)
to associate new_id with the structure? I don't know how to do that.

>
> --b.


* Re: [PATCH v4 5/8] NFSD check stateids against copy stateids
  2019-07-19 22:01   ` bfields
@ 2019-07-22 20:24     ` Olga Kornievskaia
  2019-07-23 20:58       ` J. Bruce Fields
  0 siblings, 1 reply; 51+ messages in thread
From: Olga Kornievskaia @ 2019-07-22 20:24 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: J. Bruce Fields, linux-nfs

On Fri, Jul 19, 2019 at 6:01 PM J. Bruce Fields <bfields@fieldses.org> wrote:
>
> On Mon, Jul 08, 2019 at 03:23:49PM -0400, Olga Kornievskaia wrote:
> > Incoming stateid (used by a READ) could be a saved copy stateid.
> > On first use make it active and check that the copy has started
> > within the allowable lease time.
> >
> > Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> > ---
> >  fs/nfsd/nfs4state.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 45 insertions(+)
> >
> > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> > index 2555eb9..b786625 100644
> > --- a/fs/nfsd/nfs4state.c
> > +++ b/fs/nfsd/nfs4state.c
> > @@ -5232,6 +5232,49 @@ static __be32 nfsd4_validate_stateid(struct nfs4_client *cl, stateid_t *stateid)
> >
> >       return 0;
> >  }
> > +/*
> > + * A READ from an inter server to server COPY will have a
> > + * copy stateid. Return the parent nfs4_stid.
> > + */
> > +static __be32 _find_cpntf_state(struct nfsd_net *nn, stateid_t *st,
> > +                  struct nfs4_cpntf_state **cps)
> > +{
> > +     struct nfs4_cpntf_state *state = NULL;
> > +
> > +     if (st->si_opaque.so_clid.cl_id != nn->s2s_cp_cl_id)
> > +             return nfserr_bad_stateid;
> > +     spin_lock(&nn->s2s_cp_lock);
> > +     state = idr_find(&nn->s2s_cp_stateids, st->si_opaque.so_id);
> > +     if (state)
> > +             refcount_inc(&state->cp_p_stid->sc_count);
> > +     spin_unlock(&nn->s2s_cp_lock);
> > +     if (!state)
> > +             return nfserr_bad_stateid;
> > +     *cps = state;
> > +     return 0;
> > +}
> > +
> > +static __be32 find_cpntf_state(struct nfsd_net *nn, stateid_t *st,
> > +                            struct nfs4_stid **stid)
> > +{
> > +     __be32 status;
> > +     struct nfs4_cpntf_state *cps = NULL;
> > +
> > +     status = _find_cpntf_state(nn, st, &cps);
> > +     if (status)
> > +             return status;
> > +
> > +     /* Did the inter server to server copy start in time? */
> > +     if (cps->cp_active == false && !time_after(cps->cp_timeout, jiffies)) {
> > +             nfs4_put_stid(cps->cp_p_stid);
> > +             return nfserr_partner_no_auth;
>
> I wonder whether instead of checking the time we should instead be
> destroying copy stateid's as they expire, so the fact that you were
> still able to look up the stateid suggests that it's good.  Or would
> that result in returning the wrong error here?  Just curious.

In order to destroy copy stateids as they expire, we need some thread
monitoring the copies and removing the expired ones. That seems like
a lot more work than what's currently there. The spec says that the
use of the copy has to start within a certain timeout, and that's what
this is supposed to enforce. If the client took too long to start the
copy, it'll get an error. I don't think it matters which error code is
returned: BAD_STATEID and PARTNER_NO_AUTH both imply the stateid is bad.
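
To make the check being discussed concrete, here is a small userspace sketch of "reject the first use if it comes after the deadline, otherwise mark the stateid active". It mirrors the shape of the quoted condition (cp_active false and the timeout not after the current time). The names tick_after(), struct cpntf and cpntf_check() are hypothetical, and a 32-bit tick counter stands in for jiffies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Same idea as the kernel's time_after(a, b): true if a is later than b.
 * The signed difference keeps the comparison correct across counter
 * wraparound, provided the two values are less than 2^31 ticks apart.
 */
static bool tick_after(uint32_t a, uint32_t b)
{
	return (int32_t)(b - a) < 0;
}

struct cpntf {
	bool active;		/* set on the first successful use */
	uint32_t timeout;	/* tick at which an unused stateid expires */
};

/* Returns true if the stateid may be used at tick "now". */
static bool cpntf_check(struct cpntf *cps, uint32_t now)
{
	assert(cps != NULL);

	if (!cps->active && !tick_after(cps->timeout, now))
		return false;	/* the first use came too late */
	cps->active = true;
	return true;
}
```

Once a stateid has been marked active the deadline no longer applies, so only the first READ is subject to the timeout. Nothing in this sketch removes an expired entry; that is the separate question raised in the review.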

>
> > +     } else
> > +             cps->cp_active = true;
> > +
> > +     *stid = cps->cp_p_stid;
>
> What guarantees that cp_p_stid still points to a valid stateid?  (E.g.
> if this is an open stateid that has since been closed.)

A copy (or copy_notify) stateid takes a reference on the parent, thus
we are guaranteed that the pointer is still a valid stateid.

>
> --b.
>
> > +
> > +     return nfs_ok;
> > +}
> >
> >  /*
> >   * Checks for stateid operations
> > @@ -5264,6 +5307,8 @@ static __be32 nfsd4_validate_stateid(struct nfs4_client *cl, stateid_t *stateid)
> >       status = nfsd4_lookup_stateid(cstate, stateid,
> >                               NFS4_DELEG_STID|NFS4_OPEN_STID|NFS4_LOCK_STID,
> >                               &s, nn);
> > +     if (status == nfserr_bad_stateid)
> > +             status = find_cpntf_state(nn, stateid, &s);
> >       if (status)
> >               return status;
> >       status = nfsd4_stid_check_stateid_generation(stateid, s,
> > --
> > 1.8.3.1


* Re: [PATCH v4 4/8] NFSD add COPY_NOTIFY operation
  2019-07-22 20:17     ` Olga Kornievskaia
@ 2019-07-23 20:45       ` J. Bruce Fields
  2019-07-30 15:48         ` Olga Kornievskaia
  0 siblings, 1 reply; 51+ messages in thread
From: J. Bruce Fields @ 2019-07-23 20:45 UTC (permalink / raw)
  To: Olga Kornievskaia; +Cc: J. Bruce Fields, linux-nfs

On Mon, Jul 22, 2019 at 04:17:44PM -0400, Olga Kornievskaia wrote:
> On Wed, Jul 17, 2019 at 7:07 PM J. Bruce Fields <bfields@fieldses.org> wrote:
> >
> > On Mon, Jul 08, 2019 at 03:23:48PM -0400, Olga Kornievskaia wrote:
> > > @@ -726,24 +727,53 @@ struct nfs4_stid *nfs4_alloc_stid(struct nfs4_client *cl, struct kmem_cache *sla
> > >  /*
> > >   * Create a unique stateid_t to represent each COPY.
> > >   */
> > > -int nfs4_init_cp_state(struct nfsd_net *nn, struct nfsd4_copy *copy)
> > > +static int nfs4_init_cp_state(struct nfsd_net *nn, void *ptr, stateid_t *stid)
> > >  {
> > >       int new_id;
> > >
> > >       idr_preload(GFP_KERNEL);
> > >       spin_lock(&nn->s2s_cp_lock);
> > > -     new_id = idr_alloc_cyclic(&nn->s2s_cp_stateids, copy, 0, 0, GFP_NOWAIT);
> > > +     new_id = idr_alloc_cyclic(&nn->s2s_cp_stateids, ptr, 0, 0, GFP_NOWAIT);
> > >       spin_unlock(&nn->s2s_cp_lock);
> > >       idr_preload_end();
> > >       if (new_id < 0)
> > >               return 0;
> > > -     copy->cp_stateid.si_opaque.so_id = new_id;
> > > -     copy->cp_stateid.si_opaque.so_clid.cl_boot = nn->boot_time;
> > > -     copy->cp_stateid.si_opaque.so_clid.cl_id = nn->s2s_cp_cl_id;
> > > +     stid->si_opaque.so_id = new_id;
> > > +     stid->si_opaque.so_clid.cl_boot = nn->boot_time;
> > > +     stid->si_opaque.so_clid.cl_id = nn->s2s_cp_cl_id;
> > >       return 1;
> > >  }
> > >
> > > -void nfs4_free_cp_state(struct nfsd4_copy *copy)
> > > +int nfs4_init_copy_state(struct nfsd_net *nn, struct nfsd4_copy *copy)
> > > +{
> > > +     return nfs4_init_cp_state(nn, copy, &copy->cp_stateid);
> > > +}
> >
> > This little bit of refactoring could go into a separate patch.  It's
> > easier for me to review lots of smaller patches.
> >
> > But I don't understand why you're doing it.
> >
> > Also, I'm a little suspicious of code that doesn't initialize an object
> > till after it's been added to a global structure.  The more typical
> > pattern is:
> >
> >
> >         initialize foo
> >         take locks, add foo to global structure, drop locks.
> >
> > This prevents anyone doing a lookup from finding "foo" while it's still
> > in a partially initialized state.
> 
> Let me try to explain the change. It is needed because COPY_NOTIFY and
> COPY now both generate a unique stateid (COPY_NOTIFY needs a unique
> stateid to be passed into the COPY, and COPY generates a unique stateid
> to be referred to by callbacks). Previously only COPY generated the
> stateid (so it was stored in the nfs4_copy structure), but now we have
> COPY_NOTIFY, which doesn't create an nfs4_copy when processing the
> operation but still needs a unique stateid (stored in the stateid
> structure).

The usual way to handle a situation like this is to store in the idr a
pointer to the stateid (copy->cp_stateid or cps->cp_stateid).  When you
do a lookup you do something like:

	st = idr_find(...);
	copy = container_of(st, struct nfsd4_copy, cp_stateid);

to get a pointer to the larger structure.
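
As a self-contained illustration of that lookup pattern, here is a userspace sketch. The container_of() macro below is a simplified equivalent of the kernel macro (no type checking), and struct stateid and struct copy are reduced stand-ins for stateid_t and struct nfsd4_copy, not the real definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace equivalent of the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Reduced stand-ins for stateid_t and struct nfsd4_copy. */
struct stateid {
	unsigned int id;
};

struct copy {
	unsigned long long count;
	struct stateid cp_stateid;	/* this member's address goes in the table */
};

/* Recover the enclosing copy from a pointer to its embedded stateid. */
static struct copy *copy_from_stateid(struct stateid *st)
{
	assert(st != NULL);
	return container_of(st, struct copy, cp_stateid);
}
```

Because the table only ever holds pointers to the embedded member, the same table can index different enclosing structures, as long as the code doing the lookup knows which enclosing type it found.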

By the way, in find_internal_cpntf_state, a buggy or malicious client
could cause idr_find to look up a copy (not a copy_notify) stateid.  The
code needs some way to distinguish the two cases.  You could use a
different cl_id for the two cases.  That might also be handy for
debugging.  And/or you could do as we do in the case of open, lock, and
other stateid's and embed a common structure that also includes a "type"
field.  (See nfs4_stid->sc_type).
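
A sketch of the second option, an embedded common header carrying a type field that the lookup checks before handing the entry back. All names here (enum cp_type, struct cp_stid, lookup_typed()) are hypothetical, and a fixed array stands in for the idr.

```c
#include <assert.h>
#include <stddef.h>

enum cp_type { CP_COPY = 1, CP_COPY_NOTIFY = 2 };	/* hypothetical tags */

/* Common header embedded at the start of both kinds of state. */
struct cp_stid {
	enum cp_type type;
	unsigned int id;
};

struct copy_state {
	struct cp_stid stid;
	unsigned long long count;
};

struct cpntf_state {
	struct cp_stid stid;
	int active;
};

#define TABLE_SIZE 8	/* arbitrary size for the sketch */
static struct cp_stid *table[TABLE_SIZE];

/* Look up id and return the entry only if it has the expected type. */
static struct cp_stid *lookup_typed(unsigned int id, enum cp_type want)
{
	struct cp_stid *s;

	assert(want == CP_COPY || want == CP_COPY_NOTIFY);

	if (id >= TABLE_SIZE)
		return NULL;
	s = table[id];
	if (!s || s->type != want)
		return NULL;
	return s;
}
```

With this shape, a client that presents a copy stateid where a copy_notify stateid is expected gets a failed lookup instead of having the entry misinterpreted as the wrong structure.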

> Let me see if I understand your suspicion, and ask for guidance on how
> to resolve it, as perhaps I'm misusing the function. idr_alloc_cyclic()
> associates the structure passed as its 2nd argument with the value it
> returns. How do I initialize the structure with that value when the
> value is only known after I call the function that adds the structure
> to the list? Are you suggesting that I somehow get the value for new_id
> without associating anything with it, then update the copy structure
> with that value, and then call idr_alloc_cyclic() (or something else)
> to associate new_id with the structure? I don't know how to do that.

You could move the initialization under the s2s_cp_lock.  But there's
additional initialization that's done in the caller.

So, either this needs more locking, or maybe some flag value set to
indicate that the object is initialized and safe to use.  (In the case
of open/lock/etc.  stateid's I think that is sc_type.  I'm not
completely convinced we've got that correct, though.)
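
A userspace sketch of the first option, finishing initialization before the lock is dropped. It assumes a fixed array and a pthread mutex in place of the idr and s2s_cp_lock, and the names cp_state_publish() and cp_state_lookup() are made up for the example.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define TABLE_SIZE 8	/* arbitrary size for the sketch */

struct cp_state {
	int id;		/* slot index, known only once a slot is taken */
	int owner;	/* stands in for the fields the caller fills in */
};

static struct cp_state *table[TABLE_SIZE];
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Take a free slot and finish initializing the object before the lock
 * is dropped, so a concurrent lookup never sees a half-built entry.
 * Returns the new id, or -1 if the table is full.
 */
static int cp_state_publish(struct cp_state *st, int owner)
{
	int i, id = -1;

	assert(st != NULL);

	pthread_mutex_lock(&table_lock);
	for (i = 0; i < TABLE_SIZE; i++) {
		if (!table[i]) {
			st->id = i;
			st->owner = owner;
			table[i] = st;	/* publish last */
			id = i;
			break;
		}
	}
	pthread_mutex_unlock(&table_lock);
	return id;
}

/* Lookups take the same lock, so they only ever see complete entries. */
static struct cp_state *cp_state_lookup(int id)
{
	struct cp_state *st;

	if (id < 0 || id >= TABLE_SIZE)
		return NULL;
	pthread_mutex_lock(&table_lock);
	st = table[id];
	pthread_mutex_unlock(&table_lock);
	return st;
}
```

The id is still only known after the slot is chosen, but that is harmless here because nothing can observe the entry until the lock is released. This only works if every field a lookup might read is set inside the critical section, which is why initialization done later in the caller remains a problem.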

--b.


* Re: [PATCH v4 5/8] NFSD check stateids against copy stateids
  2019-07-22 20:24     ` Olga Kornievskaia
@ 2019-07-23 20:58       ` J. Bruce Fields
  2019-07-30 16:03         ` Olga Kornievskaia
  0 siblings, 1 reply; 51+ messages in thread
From: J. Bruce Fields @ 2019-07-23 20:58 UTC (permalink / raw)
  To: Olga Kornievskaia; +Cc: J. Bruce Fields, linux-nfs

On Mon, Jul 22, 2019 at 04:24:08PM -0400, Olga Kornievskaia wrote:
> On Fri, Jul 19, 2019 at 6:01 PM J. Bruce Fields <bfields@fieldses.org> wrote:
> >
> > On Mon, Jul 08, 2019 at 03:23:49PM -0400, Olga Kornievskaia wrote:
> > > Incoming stateid (used by a READ) could be a saved copy stateid.
> > > On first use make it active and check that the copy has started
> > > within the allowable lease time.
> > >
> > > Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> > > ---
> > >  fs/nfsd/nfs4state.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
> > >  1 file changed, 45 insertions(+)
> > >
> > > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> > > index 2555eb9..b786625 100644
> > > --- a/fs/nfsd/nfs4state.c
> > > +++ b/fs/nfsd/nfs4state.c
> > > @@ -5232,6 +5232,49 @@ static __be32 nfsd4_validate_stateid(struct nfs4_client *cl, stateid_t *stateid)
> > >
> > >       return 0;
> > >  }
> > > +/*
> > > + * A READ from an inter server to server COPY will have a
> > > + * copy stateid. Return the parent nfs4_stid.
> > > + */
> > > +static __be32 _find_cpntf_state(struct nfsd_net *nn, stateid_t *st,
> > > +                  struct nfs4_cpntf_state **cps)
> > > +{
> > > +     struct nfs4_cpntf_state *state = NULL;
> > > +
> > > +     if (st->si_opaque.so_clid.cl_id != nn->s2s_cp_cl_id)
> > > +             return nfserr_bad_stateid;
> > > +     spin_lock(&nn->s2s_cp_lock);
> > > +     state = idr_find(&nn->s2s_cp_stateids, st->si_opaque.so_id);
> > > +     if (state)
> > > +             refcount_inc(&state->cp_p_stid->sc_count);
> > > +     spin_unlock(&nn->s2s_cp_lock);
> > > +     if (!state)
> > > +             return nfserr_bad_stateid;
> > > +     *cps = state;
> > > +     return 0;
> > > +}
> > > +
> > > +static __be32 find_cpntf_state(struct nfsd_net *nn, stateid_t *st,
> > > +                            struct nfs4_stid **stid)
> > > +{
> > > +     __be32 status;
> > > +     struct nfs4_cpntf_state *cps = NULL;
> > > +
> > > +     status = _find_cpntf_state(nn, st, &cps);
> > > +     if (status)
> > > +             return status;
> > > +
> > > +     /* Did the inter server to server copy start in time? */
> > > +     if (cps->cp_active == false && !time_after(cps->cp_timeout, jiffies)) {
> > > +             nfs4_put_stid(cps->cp_p_stid);
> > > +             return nfserr_partner_no_auth;
> >
> > I wonder whether instead of checking the time we should instead be
> > destroying copy stateid's as they expire, so the fact that you were
> > still able to look up the stateid suggests that it's good.  Or would
> > that result in returning the wrong error here?  Just curious.
> 
> In order to destroy copy stateids as they expire, we need some thread
> monitoring the copies and removing the expired ones.

It would be just another thing to do in the laundromat thread.

So when do we free these things?  The only free_cpntf_state() caller I
can find is in nfsd4_offload_cancel, but I think the client only calls
those in case of interrupts or other unusual events.  What about a copy
that terminates normally?

> That seems like
> a lot more work than what's currently there. The spec says that the
> use of the copy has to start within a certain timeout, and that's what
> this is supposed to enforce. If the client took too long to start the
> copy, it'll get an error. I don't think it matters which error code is
> returned: BAD_STATEID and PARTNER_NO_AUTH both imply the stateid is bad.
> 
> >
> > > +     } else
> > > +             cps->cp_active = true;
> > > +
> > > +     *stid = cps->cp_p_stid;
> >
> > What guarantees that cp_p_stid still points to a valid stateid?  (E.g.
> > if this is an open stateid that has since been closed.)
> 
> A copy (or copy_notify) stateid takes a reference on the parent, thus
> we are guaranteed that the pointer is still a valid stateid.

I only see a reference count taken when one is looked up, in
find_internal_cpntf_state.  That's too late.
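
To spell out the ordering problem, here is a userspace sketch in which the dependent object pins its parent when it is created, not when it is later looked up. The names are hypothetical, the counter is a plain int where real code would need an atomic refcount, and a "released" flag replaces the actual free so the state can be inspected.

```c
#include <assert.h>
#include <stddef.h>

struct parent {
	int refs;	/* plain counter; real code needs an atomic refcount */
	int released;	/* set instead of freeing, so the sketch can be inspected */
};

static void parent_get(struct parent *p)
{
	p->refs++;
}

static void parent_put(struct parent *p)
{
	assert(p->refs > 0);
	if (--p->refs == 0)
		p->released = 1;
}

struct child {
	struct parent *p;
};

/* The child pins its parent for as long as the child exists. */
static void child_init(struct child *c, struct parent *p)
{
	parent_get(p);
	c->p = p;
}

static void child_destroy(struct child *c)
{
	parent_put(c->p);
	c->p = NULL;
}
```

If the reference were taken only at lookup time, the original holder could drop the last reference between child creation and lookup, and the child would be left with a dangling pointer.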

--b.

> 
> >
> > --b.
> >
> > > +
> > > +     return nfs_ok;
> > > +}
> > >
> > >  /*
> > >   * Checks for stateid operations
> > > @@ -5264,6 +5307,8 @@ static __be32 nfsd4_validate_stateid(struct nfs4_client *cl, stateid_t *stateid)
> > >       status = nfsd4_lookup_stateid(cstate, stateid,
> > >                               NFS4_DELEG_STID|NFS4_OPEN_STID|NFS4_LOCK_STID,
> > >                               &s, nn);
> > > +     if (status == nfserr_bad_stateid)
> > > +             status = find_cpntf_state(nn, stateid, &s);
> > >       if (status)
> > >               return status;
> > >       status = nfsd4_stid_check_stateid_generation(stateid, s,
> > > --
> > > 1.8.3.1


* Re: [PATCH v4 7/8] NFSD: allow inter server COPY to have a STALE source server fh
  2019-07-08 19:23 ` [PATCH v4 7/8] NFSD: allow inter server COPY to have a STALE source server fh Olga Kornievskaia
@ 2019-07-23 21:35   ` bfields
  2019-07-30 15:48     ` Olga Kornievskaia
  0 siblings, 1 reply; 51+ messages in thread
From: bfields @ 2019-07-23 21:35 UTC (permalink / raw)
  To: Olga Kornievskaia; +Cc: bfields, linux-nfs

On Mon, Jul 08, 2019 at 03:23:51PM -0400, Olga Kornievskaia wrote:
> The inter server to server COPY source server filehandle
> is a foreign filehandle as the COPY is sent to the destination
> server.
> 
> Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> ---
>  fs/nfsd/Kconfig    | 10 ++++++++++
>  fs/nfsd/nfs4proc.c | 53 +++++++++++++++++++++++++++++++++++++++++++++++++----
>  fs/nfsd/nfsfh.h    |  5 ++++-
>  fs/nfsd/xdr4.h     |  1 +
>  4 files changed, 64 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/nfsd/Kconfig b/fs/nfsd/Kconfig
> index d25f6bb..bef3a58 100644
> --- a/fs/nfsd/Kconfig
> +++ b/fs/nfsd/Kconfig
> @@ -132,6 +132,16 @@ config NFSD_FLEXFILELAYOUT
>  
>  	  If unsure, say N.
>  
> +config NFSD_V4_2_INTER_SSC
> +	bool "NFSv4.2 inter server to server COPY"
> +	depends on NFSD_V4 && NFS_V4_1 && NFS_V4_2
> +	help
> +	  This option enables support for NFSv4.2 inter server to
> +	  server copy where the destination server calls the NFSv4.2
> +	  client to read the data to copy from the source server.
> +
> +	  If unsure, say N.
> +
>  config NFSD_V4_SECURITY_LABEL
>  	bool "Provide Security Label support for NFSv4 server"
>  	depends on NFSD_V4 && SECURITY
> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> index 8c2273e..1039528 100644
> --- a/fs/nfsd/nfs4proc.c
> +++ b/fs/nfsd/nfs4proc.c
> @@ -504,12 +504,20 @@ static __be32 nfsd4_open_omfg(struct svc_rqst *rqstp, struct nfsd4_compound_stat
>  	    union nfsd4_op_u *u)
>  {
>  	struct nfsd4_putfh *putfh = &u->putfh;
> +	__be32 ret;
>  
>  	fh_put(&cstate->current_fh);
>  	cstate->current_fh.fh_handle.fh_size = putfh->pf_fhlen;
>  	memcpy(&cstate->current_fh.fh_handle.fh_base, putfh->pf_fhval,
>  	       putfh->pf_fhlen);
> -	return fh_verify(rqstp, &cstate->current_fh, 0, NFSD_MAY_BYPASS_GSS);
> +	ret = fh_verify(rqstp, &cstate->current_fh, 0, NFSD_MAY_BYPASS_GSS);
> +#ifdef CONFIG_NFSD_V4_2_INTER_SSC
> +	if (ret == nfserr_stale && putfh->no_verify) {
> +		SET_FH_FLAG(&cstate->current_fh, NFSD4_FH_FOREIGN);
> +		ret = 0;
> +	}
> +#endif
> +	return ret;
>  }
>  
>  static __be32
> @@ -1956,6 +1964,41 @@ static void svcxdr_init_encode(struct svc_rqst *rqstp,
>  		- rqstp->rq_auth_slack;
>  }
>  
> +#ifdef CONFIG_NFSD_V4_2_INTER_SSC
> +static void
> +check_if_stalefh_allowed(struct nfsd4_compoundargs *args)
> +{
> +	struct nfsd4_op	*op, *current_op, *saved_op;
> +	struct nfsd4_copy *copy;
> +	struct nfsd4_putfh *putfh;
> +	int i;
> +
> +	/* traverse all operation and if it's a COPY compound, mark the
> +	 * source filehandle to skip verification
> +	 */
> +	for (i = 0; i < args->opcnt; i++) {
> +		op = &args->ops[i];
> +		if (op->opnum == OP_PUTFH)
> +			current_op = op;
> +		else if (op->opnum == OP_SAVEFH)
> +			saved_op = current_op;
> +		else if (op->opnum == OP_RESTOREFH)
> +			current_op = saved_op;
> +		else if (op->opnum == OP_COPY) {
> +			copy = (struct nfsd4_copy *)&op->u;
> +			putfh = (struct nfsd4_putfh *)&saved_op->u;
> +			if (!copy->cp_intra)
> +				putfh->no_verify = true;

Won't this crash on a compound with a COPY but no preceding PUTFH and
SAVEFH?  Or is that checked elsewhere?  I was expecting a check in that
last clause like

	if (!saved_op)
		/* return some error */
		/* or just continue and let nfsd4_copy catch the error */

--b.

> +		}
> +	}
> +}
> +#else
> +static void
> +check_if_stalefh_allowed(struct nfsd4_compoundargs *args)
> +{
> +}
> +#endif
> +
>  /*
>   * COMPOUND call.
>   */
> @@ -2004,6 +2047,7 @@ static void svcxdr_init_encode(struct svc_rqst *rqstp,
>  		resp->opcnt = 1;
>  		goto encode_op;
>  	}
> +	check_if_stalefh_allowed(args);
>  
>  	trace_nfsd_compound(rqstp, args->opcnt);
>  	while (!status && resp->opcnt < args->opcnt) {
> @@ -2019,13 +2063,14 @@ static void svcxdr_init_encode(struct svc_rqst *rqstp,
>  				op->status = nfsd4_open_omfg(rqstp, cstate, op);
>  			goto encode_op;
>  		}
> -
> -		if (!current_fh->fh_dentry) {
> +		if (!current_fh->fh_dentry &&
> +				!HAS_FH_FLAG(current_fh, NFSD4_FH_FOREIGN)) {
>  			if (!(op->opdesc->op_flags & ALLOWED_WITHOUT_FH)) {
>  				op->status = nfserr_nofilehandle;
>  				goto encode_op;
>  			}
> -		} else if (current_fh->fh_export->ex_fslocs.migrated &&
> +		} else if (current_fh->fh_export &&
> +			   current_fh->fh_export->ex_fslocs.migrated &&
>  			  !(op->opdesc->op_flags & ALLOWED_ON_ABSENT_FS)) {
>  			op->status = nfserr_moved;
>  			goto encode_op;
> diff --git a/fs/nfsd/nfsfh.h b/fs/nfsd/nfsfh.h
> index 755e256..b9c7568 100644
> --- a/fs/nfsd/nfsfh.h
> +++ b/fs/nfsd/nfsfh.h
> @@ -35,7 +35,7 @@ static inline ino_t u32_to_ino_t(__u32 uino)
>  
>  	bool			fh_locked;	/* inode locked by us */
>  	bool			fh_want_write;	/* remount protection taken */
> -
> +	int			fh_flags;	/* FH flags */
>  #ifdef CONFIG_NFSD_V3
>  	bool			fh_post_saved;	/* post-op attrs saved */
>  	bool			fh_pre_saved;	/* pre-op attrs saved */
> @@ -56,6 +56,9 @@ static inline ino_t u32_to_ino_t(__u32 uino)
>  #endif /* CONFIG_NFSD_V3 */
>  
>  } svc_fh;
> +#define NFSD4_FH_FOREIGN (1<<0)
> +#define SET_FH_FLAG(c, f) ((c)->fh_flags |= (f))
> +#define HAS_FH_FLAG(c, f) ((c)->fh_flags & (f))
>  
>  enum nfsd_fsid {
>  	FSID_DEV = 0,
> diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h
> index 9d7318c..fbd18d6 100644
> --- a/fs/nfsd/xdr4.h
> +++ b/fs/nfsd/xdr4.h
> @@ -221,6 +221,7 @@ struct nfsd4_lookup {
>  struct nfsd4_putfh {
>  	u32		pf_fhlen;           /* request */
>  	char		*pf_fhval;          /* request */
> +	bool		no_verify;	    /* represents foreign fh */
>  };
>  
>  struct nfsd4_open {
> -- 
> 1.8.3.1


* Re: [PATCH v4 4/8] NFSD add COPY_NOTIFY operation
  2019-07-23 20:45       ` J. Bruce Fields
@ 2019-07-30 15:48         ` Olga Kornievskaia
  2019-07-30 15:55           ` J. Bruce Fields
  0 siblings, 1 reply; 51+ messages in thread
From: Olga Kornievskaia @ 2019-07-30 15:48 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: J. Bruce Fields, linux-nfs

On Tue, Jul 23, 2019 at 4:46 PM J. Bruce Fields <bfields@fieldses.org> wrote:
>
> On Mon, Jul 22, 2019 at 04:17:44PM -0400, Olga Kornievskaia wrote:
> > On Wed, Jul 17, 2019 at 7:07 PM J. Bruce Fields <bfields@fieldses.org> wrote:
> > >
> > > On Mon, Jul 08, 2019 at 03:23:48PM -0400, Olga Kornievskaia wrote:
> > > > @@ -726,24 +727,53 @@ struct nfs4_stid *nfs4_alloc_stid(struct nfs4_client *cl, struct kmem_cache *sla
> > > >  /*
> > > >   * Create a unique stateid_t to represent each COPY.
> > > >   */
> > > > -int nfs4_init_cp_state(struct nfsd_net *nn, struct nfsd4_copy *copy)
> > > > +static int nfs4_init_cp_state(struct nfsd_net *nn, void *ptr, stateid_t *stid)
> > > >  {
> > > >       int new_id;
> > > >
> > > >       idr_preload(GFP_KERNEL);
> > > >       spin_lock(&nn->s2s_cp_lock);
> > > > -     new_id = idr_alloc_cyclic(&nn->s2s_cp_stateids, copy, 0, 0, GFP_NOWAIT);
> > > > +     new_id = idr_alloc_cyclic(&nn->s2s_cp_stateids, ptr, 0, 0, GFP_NOWAIT);
> > > >       spin_unlock(&nn->s2s_cp_lock);
> > > >       idr_preload_end();
> > > >       if (new_id < 0)
> > > >               return 0;
> > > > -     copy->cp_stateid.si_opaque.so_id = new_id;
> > > > -     copy->cp_stateid.si_opaque.so_clid.cl_boot = nn->boot_time;
> > > > -     copy->cp_stateid.si_opaque.so_clid.cl_id = nn->s2s_cp_cl_id;
> > > > +     stid->si_opaque.so_id = new_id;
> > > > +     stid->si_opaque.so_clid.cl_boot = nn->boot_time;
> > > > +     stid->si_opaque.so_clid.cl_id = nn->s2s_cp_cl_id;
> > > >       return 1;
> > > >  }
> > > >
> > > > -void nfs4_free_cp_state(struct nfsd4_copy *copy)
> > > > +int nfs4_init_copy_state(struct nfsd_net *nn, struct nfsd4_copy *copy)
> > > > +{
> > > > +     return nfs4_init_cp_state(nn, copy, &copy->cp_stateid);
> > > > +}
> > >
> > > This little bit of refactoring could go into a separate patch.  It's
> > > easier for me to review lots of smaller patches.
> > >
> > > But I don't understand why you're doing it.
> > >
> > > Also, I'm a little suspicious of code that doesn't initialize an object
> > > till after it's been added to a global structure.  The more typical
> > > pattern is:
> > >
> > >
> > >         initialize foo
> > >         take locks, add foo global structure, drop locks.
> > >
> > > This prevents anyone doing a lookup from finding "foo" while it's still
> > > in a partially initialized state.
> >
> > Let me try to explain the change. This change is due to the fact that
> > now both COPY_NOTIFY and COPY both are generating unique stateid
> > (COPY_NOTIFY needs a unique stateid to passed into the COPY and COPY
> > is generating a unique stateid to be referred to by callbacks).
> > Previously we had just the COPY generating the stateid (so it was
> > stored in the nfs4_copy structure) but now we have the COPY_NOTIFY
> > which doesn't create nfs4_copy when it's processing the operation but
> > still needs a unique stateid (stored in the stateid structure).
>
> The usual way to handle a situation like this is to store in the idr a
> pointer to the stateid (copy->cp_stateid or cps->cp_stateid).  When you

Ok I'll store the stateid directly.

> do a lookup you do something like:
>
>         st = idr_find(...);
>         copy = container_of(st, struct nfsd4_copy, cp_stateid);
>
> to get a copy to the larger structure.
>
> By the way, in find_internal_cpntf_state, a buggy or malicious client
> could cause idr_find to look up a copy (not a copy_notify) stateid.  The
> code needs some way to distinguish the two cases.  You could use a
> different cl_id for the two cases.  That might also be handy for
> debugging.  And/or you could do as we do in the case of open, lock, and
> other stateid's and embed a common structure that also includes a "type"
> field.  (See nfs4_stid->sc_type).

I'll add 2 new sc_types and make sure that during the look up the
entry retrieved is of the appropriate type.
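
For illustration only, here is a minimal userspace sketch of that type-checked lookup. It assumes a small common header embedded in both structures; the names cp_stid, CP_STID_COPY, CP_STID_COPY_NOTIFY and cpntf_from_stid() are hypothetical and are not taken from the patch or from the existing sc_type values.

```c
#include <assert.h>	/* for the usage checks that accompany this sketch */
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical type tags; 0 doubles as "not initialized yet". */
enum cp_stid_type {
	CP_STID_NONE = 0,
	CP_STID_COPY,
	CP_STID_COPY_NOTIFY,
};

/* Hypothetical common header stored in the id table. */
struct cp_stid {
	enum cp_stid_type type;
	uint32_t id;
};

struct copy_state {
	int bytes_copied;
	struct cp_stid stid;
};

struct cpntf_state {
	int active;
	struct cp_stid stid;
};

/*
 * Convert a looked-up header to its copy_notify state, refusing
 * entries that are missing, uninitialized, or belong to a COPY.
 */
static struct cpntf_state *cpntf_from_stid(struct cp_stid *st)
{
	if (!st || st->type != CP_STID_COPY_NOTIFY)
		return NULL;
	return container_of(st, struct cpntf_state, stid);
}
```

With a check like this, a client that presents a COPY stateid where a COPY_NOTIFY stateid is expected gets a failed lookup instead of a pointer cast to the wrong structure.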


> > Let me see if I understand your suspicion and ask for guidance how to
> > resolve it as perhaps I'm misusing the function. idr_alloc_cyclic()
> > keeps track of the structure of the 2nd arguments with a value it
> > returns. How do I initiate the structure with the value of the
> > function without knowing the value which can only be returned when I
> > call the function to add it to the list? what you are suggesting is to
> > somehow get the value for the new_id but not associate anything then
> > update the copy structure with that value and then call
> > idr_alloc_cyclic() (or something else) to create that association of
> > the new_id and the structure? I don't know how to do that.
>
> You could move the initialization under the s2s_cp_lock.  But there's
> additional initialization that's done in the caller.

I still don't understand what you are looking for here and why. I'm
following what the normal stid allocation does. There is no extra code
there to check whether it is initialized or not. nfs4_alloc_stid() calls
idr_alloc_cyclic(), which creates an association between the new id and
the stid pointer to the (at that time uninitialized) nfs4_stid structure,
which is then filled in with the return value of idr_alloc_cyclic().
That's exactly what the new code is doing (well, except that I'll change
it to store the stateid_t).

> So, either this needs more locking, or maybe some flag value set to
> indicate that the object is initialized and safe to use.  (In the case
> of open/lock/etc.  stateid's I think that is sc_type.  I'm not
> completely convinced we've got that correct, though.)
>
> --b.


* Re: [PATCH v4 1/8] NFSD fill-in netloc4 structure
  2019-07-22 19:59     ` Olga Kornievskaia
@ 2019-07-30 15:48       ` Olga Kornievskaia
  2019-07-30 15:51         ` J. Bruce Fields
  0 siblings, 1 reply; 51+ messages in thread
From: Olga Kornievskaia @ 2019-07-30 15:48 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: J. Bruce Fields, linux-nfs

On Mon, Jul 22, 2019 at 3:59 PM Olga Kornievskaia
<olga.kornievskaia@gmail.com> wrote:
>
> On Wed, Jul 17, 2019 at 5:13 PM J. Bruce Fields <bfields@fieldses.org> wrote:
> >
> > On Mon, Jul 08, 2019 at 03:23:45PM -0400, Olga Kornievskaia wrote:
> > > From: Olga Kornievskaia <kolga@netapp.com>
> > >
> > > nfs.4 defines nfs42_netaddr structure that represents netloc4.
> > >
> > > Populate needed fields from the sockaddr structure.
> > >
> > > This will be used by flexfiles and 4.2 inter copy
> > >
> > > Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> > > ---
> > >  fs/nfsd/nfsd.h | 32 ++++++++++++++++++++++++++++++++
> > >  1 file changed, 32 insertions(+)
> > >
> > > diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h
> > > index 24187b5..8f4fc50 100644
> > > --- a/fs/nfsd/nfsd.h
> > > +++ b/fs/nfsd/nfsd.h
> > > @@ -19,6 +19,7 @@
> > >  #include <linux/sunrpc/svc.h>
> > >  #include <linux/sunrpc/svc_xprt.h>
> > >  #include <linux/sunrpc/msg_prot.h>
> > > +#include <linux/sunrpc/addr.h>
> > >
> > >  #include <uapi/linux/nfsd/debug.h>
> > >
> > > @@ -375,6 +376,37 @@ static inline bool nfsd4_spo_must_allow(struct svc_rqst *rqstp)
> > >
> > >  extern const u32 nfsd_suppattrs[3][3];
> > >
> > > +static inline u32 nfsd4_set_netaddr(struct sockaddr *addr,
> > > +                                 struct nfs42_netaddr *netaddr)
> > > +{
> > > +     struct sockaddr_in *sin = (struct sockaddr_in *)addr;
> > > +     struct sockaddr_in6 *sin6 = (struct sockaddr_in6 *)addr;
> > > +     unsigned int port;
> > > +     size_t ret_addr, ret_port;
> > > +
> > > +     switch (addr->sa_family) {
> > > +     case AF_INET:
> > > +             port = ntohs(sin->sin_port);
> > > +             sprintf(netaddr->netid, "tcp");
> > > +             netaddr->netid_len = 3;
> > > +             break;
> > > +     case AF_INET6:
> > > +             port = ntohs(sin6->sin6_port);
> > > +             sprintf(netaddr->netid, "tcp6");
> > > +             netaddr->netid_len = 4;
> > > +             break;
> > > +     default:
> > > +             return nfserr_inval;
> > > +     }
> > > +     ret_addr = rpc_ntop(addr, netaddr->addr, sizeof(netaddr->addr));
> > > +     ret_port = snprintf(netaddr->addr + ret_addr,
> > > +                         RPCBIND_MAXUADDRLEN + 1 - ret_addr,
> > > +                         ".%u.%u", port >> 8, port & 0xff);
> > > +     WARN_ON(ret_port >= RPCBIND_MAXUADDRLEN + 1 - ret_addr);
> > > +     netaddr->addr_len = ret_addr + ret_port;
> > > +     return 0;
> > > +}
> >
> > Kinda surprised we don't already do something like this elsewhere, but I
> > don't see anything exactly the same.  Might be possible to put this in
> > net/sunrpc/addr.c and share some code with rpc_sockaddr2uaddr?  I'll
> > leave it to you whether that looks worth it.
>
> I'll investigate how to move this into the sunrpc. Client also
> populates nfs42_netaddr structure but it's slightly different. I need
> to go back and see if what's shareable.

OK, I'd like to argue for the code to stay as is because:
1. We can't move the whole function into addr.c because it populates a
data structure (nfs42_netaddr) that rpc knows nothing about.
2. While nfs42_netaddr->addr is the output of rpc_sockaddr2uaddr(),
we still need the switch to populate the netid. Also, since
rpc_sockaddr2uaddr() returns allocated memory, the nfs42_netaddr data
structure would need to change to store pointers (and that's shared with
the client). Thus the client and server would need to add other code to
free the created netaddr.
3. This function as is can be used by the flexfile layout as well
(they also decided not to share code with rpc_sockaddr2uaddr() but use
the same content). That function also doesn't want the memory to be
allocated.

Maybe I'm wrong about all of it and it all needs to be rewritten to
take dynamic memory. But to use it as is, I don't want to call it and
then memcpy into existing static buffers and free what
rpc_sockaddr2uaddr() has allocated.
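
To make the static-buffer argument concrete, here is a rough userspace sketch of the same formatting done with no allocation at all. It is only an analogue of the helper in the patch: struct netaddr_buf, UADDR_MAX and fill_netaddr() are made-up names, and inet_ntop() stands in for rpc_ntop().

```c
#include <arpa/inet.h>
#include <assert.h>	/* for the usage checks that accompany this sketch */
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

#define UADDR_MAX 56	/* hypothetical size, room for IPv6 plus ".p1.p2" */

/* Hypothetical userspace analogue of nfs42_netaddr: fixed buffers only. */
struct netaddr_buf {
	char netid[8];
	size_t netid_len;
	char addr[UADDR_MAX + 1];
	size_t addr_len;
};

/* Fill in netid and universal address; returns 0 on success, -1 on error. */
static int fill_netaddr(const struct sockaddr *sa, struct netaddr_buf *out)
{
	const void *src;
	unsigned int port;
	size_t len;
	int n;

	switch (sa->sa_family) {
	case AF_INET: {
		const struct sockaddr_in *sin = (const struct sockaddr_in *)sa;

		src = &sin->sin_addr;
		port = ntohs(sin->sin_port);
		strcpy(out->netid, "tcp");
		break;
	}
	case AF_INET6: {
		const struct sockaddr_in6 *sin6 = (const struct sockaddr_in6 *)sa;

		src = &sin6->sin6_addr;
		port = ntohs(sin6->sin6_port);
		strcpy(out->netid, "tcp6");
		break;
	}
	default:
		return -1;
	}
	out->netid_len = strlen(out->netid);

	/* Presentation form of the address goes straight into the buffer. */
	if (!inet_ntop(sa->sa_family, src, out->addr, sizeof(out->addr)))
		return -1;
	len = strlen(out->addr);

	/* Append the port as two decimal octets, high byte first. */
	n = snprintf(out->addr + len, sizeof(out->addr) - len,
		     ".%u.%u", port >> 8, port & 0xff);
	if (n < 0 || (size_t)n >= sizeof(out->addr) - len)
		return -1;
	out->addr_len = len + (size_t)n;
	return 0;
}
```

The caller owns the buffers, so there is nothing to free afterwards, which is the property argued for above.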



>
> >
> > --b.
> >
> > > +
> > >  static inline bool bmval_is_subset(const u32 *bm1, const u32 *bm2)
> > >  {
> > >       return !((bm1[0] & ~bm2[0]) ||
> > > --
> > > 1.8.3.1


* Re: [PATCH v4 7/8] NFSD: allow inter server COPY to have a STALE source server fh
  2019-07-23 21:35   ` bfields
@ 2019-07-30 15:48     ` Olga Kornievskaia
  0 siblings, 0 replies; 51+ messages in thread
From: Olga Kornievskaia @ 2019-07-30 15:48 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: J. Bruce Fields, linux-nfs

On Tue, Jul 23, 2019 at 5:35 PM J. Bruce Fields <bfields@fieldses.org> wrote:
>
> On Mon, Jul 08, 2019 at 03:23:51PM -0400, Olga Kornievskaia wrote:
> > The inter server to server COPY source server filehandle
> > is a foreign filehandle as the COPY is sent to the destination
> > server.
> >
> > Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> > ---
> >  fs/nfsd/Kconfig    | 10 ++++++++++
> >  fs/nfsd/nfs4proc.c | 53 +++++++++++++++++++++++++++++++++++++++++++++++++----
> >  fs/nfsd/nfsfh.h    |  5 ++++-
> >  fs/nfsd/xdr4.h     |  1 +
> >  4 files changed, 64 insertions(+), 5 deletions(-)
> >
> > diff --git a/fs/nfsd/Kconfig b/fs/nfsd/Kconfig
> > index d25f6bb..bef3a58 100644
> > --- a/fs/nfsd/Kconfig
> > +++ b/fs/nfsd/Kconfig
> > @@ -132,6 +132,16 @@ config NFSD_FLEXFILELAYOUT
> >
> >         If unsure, say N.
> >
> > +config NFSD_V4_2_INTER_SSC
> > +     bool "NFSv4.2 inter server to server COPY"
> > +     depends on NFSD_V4 && NFS_V4_1 && NFS_V4_2
> > +     help
> > +       This option enables support for NFSv4.2 inter server to
> > +       server copy where the destination server calls the NFSv4.2
> > +       client to read the data to copy from the source server.
> > +
> > +       If unsure, say N.
> > +
> >  config NFSD_V4_SECURITY_LABEL
> >       bool "Provide Security Label support for NFSv4 server"
> >       depends on NFSD_V4 && SECURITY
> > diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> > index 8c2273e..1039528 100644
> > --- a/fs/nfsd/nfs4proc.c
> > +++ b/fs/nfsd/nfs4proc.c
> > @@ -504,12 +504,20 @@ static __be32 nfsd4_open_omfg(struct svc_rqst *rqstp, struct nfsd4_compound_stat
> >           union nfsd4_op_u *u)
> >  {
> >       struct nfsd4_putfh *putfh = &u->putfh;
> > +     __be32 ret;
> >
> >       fh_put(&cstate->current_fh);
> >       cstate->current_fh.fh_handle.fh_size = putfh->pf_fhlen;
> >       memcpy(&cstate->current_fh.fh_handle.fh_base, putfh->pf_fhval,
> >              putfh->pf_fhlen);
> > -     return fh_verify(rqstp, &cstate->current_fh, 0, NFSD_MAY_BYPASS_GSS);
> > +     ret = fh_verify(rqstp, &cstate->current_fh, 0, NFSD_MAY_BYPASS_GSS);
> > +#ifdef CONFIG_NFSD_V4_2_INTER_SSC
> > +     if (ret == nfserr_stale && putfh->no_verify) {
> > +             SET_FH_FLAG(&cstate->current_fh, NFSD4_FH_FOREIGN);
> > +             ret = 0;
> > +     }
> > +#endif
> > +     return ret;
> >  }
> >
> >  static __be32
> > @@ -1956,6 +1964,41 @@ static void svcxdr_init_encode(struct svc_rqst *rqstp,
> >               - rqstp->rq_auth_slack;
> >  }
> >
> > +#ifdef CONFIG_NFSD_V4_2_INTER_SSC
> > +static void
> > +check_if_stalefh_allowed(struct nfsd4_compoundargs *args)
> > +{
> > +     struct nfsd4_op *op, *current_op, *saved_op;
> > +     struct nfsd4_copy *copy;
> > +     struct nfsd4_putfh *putfh;
> > +     int i;
> > +
> > +     /* traverse all operation and if it's a COPY compound, mark the
> > +      * source filehandle to skip verification
> > +      */
> > +     for (i = 0; i < args->opcnt; i++) {
> > +             op = &args->ops[i];
> > +             if (op->opnum == OP_PUTFH)
> > +                     current_op = op;
> > +             else if (op->opnum == OP_SAVEFH)
> > +                     saved_op = current_op;
> > +             else if (op->opnum == OP_RESTOREFH)
> > +                     current_op = saved_op;
> > +             else if (op->opnum == OP_COPY) {
> > +                     copy = (struct nfsd4_copy *)&op->u;
> > +                     putfh = (struct nfsd4_putfh *)&saved_op->u;
> > +                     if (!copy->cp_intra)
> > +                             putfh->no_verify = true;
>
> Won't this crash on a compound with a COPY but no preceding PUTFH and
> SAVEFH?  Or is that checked elsewhere?  I was expecting a check in that
> last clause like

Yes, it will crash. I will add a check and return NFS4ERR_NOFILEHANDLE
if no filehandle was saved.
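
As a rough illustration of the shape such a check could take, here is a self-contained userspace model of the compound walk. The struct op layout, the enum values and mark_foreign_source() are simplified stand-ins and not the nfsd types; the sketch also initializes the tracking pointers to NULL, which the posted version does not do.

```c
#include <assert.h>	/* for the usage checks that accompany this sketch */
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the compound operation numbers. */
enum opnum { OP_PUTFH, OP_SAVEFH, OP_RESTOREFH, OP_COPY, OP_OTHER };

struct op {
	enum opnum opnum;
	bool intra;	/* meaningful for OP_COPY only */
	bool no_verify;	/* meaningful for OP_PUTFH only */
};

/*
 * Walk the compound and mark the PUTFH that supplies the saved
 * (source) filehandle of an inter-server COPY.  Returns 0 on success
 * and -1 if a COPY appears with no saved filehandle.
 */
static int mark_foreign_source(struct op *ops, size_t opcnt)
{
	struct op *current_op = NULL, *saved_op = NULL;
	size_t i;

	for (i = 0; i < opcnt; i++) {
		struct op *op = &ops[i];

		switch (op->opnum) {
		case OP_PUTFH:
			current_op = op;
			break;
		case OP_SAVEFH:
			saved_op = current_op;
			break;
		case OP_RESTOREFH:
			current_op = saved_op;
			break;
		case OP_COPY:
			if (!saved_op)
				return -1;	/* no PUTFH + SAVEFH before COPY */
			if (!op->intra)
				saved_op->no_verify = true;
			break;
		default:
			break;
		}
	}
	return 0;
}
```

Whether the error is returned here or left for nfsd4_copy to catch is a separate choice; the sketch only shows that the dereference has to be guarded.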


>
>         if (!saved_op)
>                 /* return some error */
>                 /* or just continue and let nfsd4_copy catch the error */
>
> --b.
>
> > +             }
> > +     }
> > +}
> > +#else
> > +static void
> > +check_if_stalefh_allowed(struct nfsd4_compoundargs *args)
> > +{
> > +}
> > +#endif
> > +
> >  /*
> >   * COMPOUND call.
> >   */
> > @@ -2004,6 +2047,7 @@ static void svcxdr_init_encode(struct svc_rqst *rqstp,
> >               resp->opcnt = 1;
> >               goto encode_op;
> >       }
> > +     check_if_stalefh_allowed(args);
> >
> >       trace_nfsd_compound(rqstp, args->opcnt);
> >       while (!status && resp->opcnt < args->opcnt) {
> > @@ -2019,13 +2063,14 @@ static void svcxdr_init_encode(struct svc_rqst *rqstp,
> >                               op->status = nfsd4_open_omfg(rqstp, cstate, op);
> >                       goto encode_op;
> >               }
> > -
> > -             if (!current_fh->fh_dentry) {
> > +             if (!current_fh->fh_dentry &&
> > +                             !HAS_FH_FLAG(current_fh, NFSD4_FH_FOREIGN)) {
> >                       if (!(op->opdesc->op_flags & ALLOWED_WITHOUT_FH)) {
> >                               op->status = nfserr_nofilehandle;
> >                               goto encode_op;
> >                       }
> > -             } else if (current_fh->fh_export->ex_fslocs.migrated &&
> > +             } else if (current_fh->fh_export &&
> > +                        current_fh->fh_export->ex_fslocs.migrated &&
> >                         !(op->opdesc->op_flags & ALLOWED_ON_ABSENT_FS)) {
> >                       op->status = nfserr_moved;
> >                       goto encode_op;
> > diff --git a/fs/nfsd/nfsfh.h b/fs/nfsd/nfsfh.h
> > index 755e256..b9c7568 100644
> > --- a/fs/nfsd/nfsfh.h
> > +++ b/fs/nfsd/nfsfh.h
> > @@ -35,7 +35,7 @@ static inline ino_t u32_to_ino_t(__u32 uino)
> >
> >       bool                    fh_locked;      /* inode locked by us */
> >       bool                    fh_want_write;  /* remount protection taken */
> > -
> > +     int                     fh_flags;       /* FH flags */
> >  #ifdef CONFIG_NFSD_V3
> >       bool                    fh_post_saved;  /* post-op attrs saved */
> >       bool                    fh_pre_saved;   /* pre-op attrs saved */
> > @@ -56,6 +56,9 @@ static inline ino_t u32_to_ino_t(__u32 uino)
> >  #endif /* CONFIG_NFSD_V3 */
> >
> >  } svc_fh;
> > +#define NFSD4_FH_FOREIGN (1<<0)
> > +#define SET_FH_FLAG(c, f) ((c)->fh_flags |= (f))
> > +#define HAS_FH_FLAG(c, f) ((c)->fh_flags & (f))
> >
> >  enum nfsd_fsid {
> >       FSID_DEV = 0,
> > diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h
> > index 9d7318c..fbd18d6 100644
> > --- a/fs/nfsd/xdr4.h
> > +++ b/fs/nfsd/xdr4.h
> > @@ -221,6 +221,7 @@ struct nfsd4_lookup {
> >  struct nfsd4_putfh {
> >       u32             pf_fhlen;           /* request */
> >       char            *pf_fhval;          /* request */
> > +     bool            no_verify;          /* represents foreign fh */
> >  };
> >
> >  struct nfsd4_open {
> > --
> > 1.8.3.1


* Re: [PATCH v4 1/8] NFSD fill-in netloc4 structure
  2019-07-30 15:48       ` Olga Kornievskaia
@ 2019-07-30 15:51         ` J. Bruce Fields
  0 siblings, 0 replies; 51+ messages in thread
From: J. Bruce Fields @ 2019-07-30 15:51 UTC (permalink / raw)
  To: Olga Kornievskaia; +Cc: J. Bruce Fields, linux-nfs

On Tue, Jul 30, 2019 at 11:48:40AM -0400, Olga Kornievskaia wrote:
> OK, I'd like to argue for the code to stay as is because:
> 1. We can't move the whole function into addr.c because it populates a
> data structure (nfs42_netaddr) that rpc knows nothing about.
> 2. While nfs42_netaddr->addr is the output of rpc_sockaddr2uaddr(),
> we still need the switch to populate the netid. Also, since
> rpc_sockaddr2uaddr() returns allocated memory, the nfs42_netaddr data
> structure would need to change to store pointers (and that's shared with
> the client). Thus the client and server would need to add other code to
> free the created netaddr.
> 3. This function as is can be used by the flexfile layout as well
> (they also decided not to share code with rpc_sockaddr2uaddr() but use
> the same content). That function also doesn't want the memory to be
> allocated.
>
> Maybe I'm wrong about all of it and it all needs to be rewritten to
> take dynamic memory. But to use it as is, I don't want to call it and
> then memcpy into existing static buffers and free what
> rpc_sockaddr2uaddr() has allocated.

OK, that's fine.

--b.


* Re: [PATCH v4 4/8] NFSD add COPY_NOTIFY operation
  2019-07-30 15:48         ` Olga Kornievskaia
@ 2019-07-30 15:55           ` J. Bruce Fields
  2019-07-30 16:13             ` Olga Kornievskaia
  0 siblings, 1 reply; 51+ messages in thread
From: J. Bruce Fields @ 2019-07-30 15:55 UTC (permalink / raw)
  To: Olga Kornievskaia; +Cc: J. Bruce Fields, linux-nfs

On Tue, Jul 30, 2019 at 11:48:27AM -0400, Olga Kornievskaia wrote:
> On Tue, Jul 23, 2019 at 4:46 PM J. Bruce Fields <bfields@fieldses.org> wrote:
> >
> > On Mon, Jul 22, 2019 at 04:17:44PM -0400, Olga Kornievskaia wrote:
> > > Let me see if I understand your suspicion and ask for guidance how to
> > > resolve it as perhaps I'm misusing the function. idr_alloc_cyclic()
> > > keeps track of the structure of the 2nd arguments with a value it
> > > returns. How do I initiate the structure with the value of the
> > > function without knowing the value which can only be returned when I
> > > call the function to add it to the list? what you are suggesting is to
> > > somehow get the value for the new_id but not associate anything then
> > > update the copy structure with that value and then call
> > > idr_alloc_cyclic() (or something else) to create that association of
> > > the new_id and the structure? I don't know how to do that.
> >
> > You could move the initialization under the s2s_cp_lock.  But there's
> > additional initialization that's done in the caller.
> 
> I still don't understand what you are looking for here and why. I'm
> following what the normal stid allocation does. There is no extra code
> there to check whether it is initialized or not. nfs4_alloc_stid() calls
> idr_alloc_cyclic(), which creates an association between the new id and
> the stid pointer to the (at that time uninitialized) nfs4_stid structure,
> which is then filled in with the return value of idr_alloc_cyclic().
> That's exactly what the new code is doing (well, except that I'll change
> it to store the stateid_t).

Yes, I'm a little worried about normal stid allocation too.  It's got
one extra safeguard because of the check for 0 sc_type in the lookup,
but I haven't yet convinced myself that's enough.

The race I'm worried about is: one task does the idr allocation and
drops locks.  Before it has the chance to finish initializing the
object, a second task looks it up in the idr and does something with it.
It sees the not-yet-initialized fields.
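
One way to picture an ordering that avoids this window is the toy userspace sketch below: reserve the id first, initialize the object, and only then make the pointer visible to lookups. Everything in it (struct table, table_reserve(), table_publish(), table_find()) is hypothetical, and locking is left out on purpose; the comments mark where s2s_cp_lock would be held. A possible kernel analogue (an assumption, not checked against this series) would be allocating the id with a NULL pointer and swapping in the real pointer later with idr_replace().

```c
#include <assert.h>
#include <stddef.h>

#define TABLE_SIZE 8

struct state {
	int id;
	int lease;
};

/* Toy id table: a slot can be reserved while still pointing at nothing. */
struct table {
	int used[TABLE_SIZE];		/* id has been handed out */
	struct state *slot[TABLE_SIZE];	/* NULL until published */
};

/* Step 1 (under the lock): reserve an id; lookups still see nothing. */
static int table_reserve(struct table *t)
{
	for (int i = 0; i < TABLE_SIZE; i++) {
		if (!t->used[i]) {
			t->used[i] = 1;
			t->slot[i] = NULL;
			return i;
		}
	}
	return -1;
}

/*
 * Step 3 (under the lock): publish the object.  Step 2, filling in
 * the object using the reserved id, happens in the caller beforehand.
 */
static void table_publish(struct table *t, int id, struct state *s)
{
	assert(id >= 0 && id < TABLE_SIZE && t->used[id]);
	t->slot[id] = s;
}

/* Lookup (under the lock): reserved-but-unpublished ids return NULL. */
static struct state *table_find(const struct table *t, int id)
{
	if (id < 0 || id >= TABLE_SIZE || !t->used[id])
		return NULL;
	return t->slot[id];
}
```

A second task that looks up the id between steps 1 and 3 gets NULL and never sees a partially filled structure.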

--b.


* Re: [PATCH v4 5/8] NFSD check stateids against copy stateids
  2019-07-23 20:58       ` J. Bruce Fields
@ 2019-07-30 16:03         ` Olga Kornievskaia
  2019-07-31 21:10           ` Olga Kornievskaia
  0 siblings, 1 reply; 51+ messages in thread
From: Olga Kornievskaia @ 2019-07-30 16:03 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: J. Bruce Fields, linux-nfs

On Tue, Jul 23, 2019 at 4:59 PM J. Bruce Fields <bfields@fieldses.org> wrote:
>
> On Mon, Jul 22, 2019 at 04:24:08PM -0400, Olga Kornievskaia wrote:
> > On Fri, Jul 19, 2019 at 6:01 PM J. Bruce Fields <bfields@fieldses.org> wrote:
> > >
> > > On Mon, Jul 08, 2019 at 03:23:49PM -0400, Olga Kornievskaia wrote:
> > > > Incoming stateid (used by a READ) could be a saved copy stateid.
> > > > On first use make it active and check that the copy has started
> > > > within the allowable lease time.
> > > >
> > > > Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> > > > ---
> > > >  fs/nfsd/nfs4state.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
> > > >  1 file changed, 45 insertions(+)
> > > >
> > > > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> > > > index 2555eb9..b786625 100644
> > > > --- a/fs/nfsd/nfs4state.c
> > > > +++ b/fs/nfsd/nfs4state.c
> > > > @@ -5232,6 +5232,49 @@ static __be32 nfsd4_validate_stateid(struct nfs4_client *cl, stateid_t *stateid)
> > > >
> > > >       return 0;
> > > >  }
> > > > +/*
> > > > + * A READ from an inter server to server COPY will have a
> > > > + * copy stateid. Return the parent nfs4_stid.
> > > > + */
> > > > +static __be32 _find_cpntf_state(struct nfsd_net *nn, stateid_t *st,
> > > > +                  struct nfs4_cpntf_state **cps)
> > > > +{
> > > > +     struct nfs4_cpntf_state *state = NULL;
> > > > +
> > > > +     if (st->si_opaque.so_clid.cl_id != nn->s2s_cp_cl_id)
> > > > +             return nfserr_bad_stateid;
> > > > +     spin_lock(&nn->s2s_cp_lock);
> > > > +     state = idr_find(&nn->s2s_cp_stateids, st->si_opaque.so_id);
> > > > +     if (state)
> > > > +             refcount_inc(&state->cp_p_stid->sc_count);
> > > > +     spin_unlock(&nn->s2s_cp_lock);
> > > > +     if (!state)
> > > > +             return nfserr_bad_stateid;
> > > > +     *cps = state;
> > > > +     return 0;
> > > > +}
> > > > +
> > > > +static __be32 find_cpntf_state(struct nfsd_net *nn, stateid_t *st,
> > > > +                            struct nfs4_stid **stid)
> > > > +{
> > > > +     __be32 status;
> > > > +     struct nfs4_cpntf_state *cps = NULL;
> > > > +
> > > > +     status = _find_cpntf_state(nn, st, &cps);
> > > > +     if (status)
> > > > +             return status;
> > > > +
> > > > +     /* Did the inter server to server copy start in time? */
> > > > +     if (cps->cp_active == false && !time_after(cps->cp_timeout, jiffies)) {
> > > > +             nfs4_put_stid(cps->cp_p_stid);
> > > > +             return nfserr_partner_no_auth;
> > >
> > > I wonder whether instead of checking the time we should instead be
> > > destroying copy stateid's as they expire, so the fact that you were
> > > still able to look up the stateid suggests that it's good.  Or would
> > > that result in returning the wrong error here?  Just curious.
> >
> > In order to destroy copy stateid as they expire we need some thread
> > monitoring the copies and then remove the expired one.
>
> It would be just another thing to do in the laundromat thread.

The current check still seems simpler. With the laundromat you'd need to
traverse the list and do more work. What's the advantage of the
laundromat over this? Given that the laundromat thread doesn't run all
the time, there might still be a gap between when it last ran and when a
stateid expires before the next run.
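
For reference, the check-at-lookup approach boils down to something like the following userspace sketch. The names struct cpntf, cpntf_first_use() and the USE_* codes are invented for the example, and a plain comparison replaces the kernel's time_after(), so counter wraparound is ignored here.

```c
#include <assert.h>	/* for the usage checks that accompany this sketch */
#include <stdbool.h>

/* Hypothetical, simplified copy-notify state. */
struct cpntf {
	bool active;		/* set once the first READ has used the stateid */
	unsigned long timeout;	/* absolute deadline for that first use */
};

enum { USE_OK = 0, USE_EXPIRED = -1 };

/*
 * Called when a READ presents the copy-notify stateid.  The first
 * use must arrive before the deadline; after that the state stays
 * active and the deadline no longer applies.
 */
static int cpntf_first_use(struct cpntf *c, unsigned long now)
{
	if (!c->active && now >= c->timeout)
		return USE_EXPIRED;
	c->active = true;
	return USE_OK;
}
```

Expiry is evaluated exactly when the stateid is presented, so nothing depends on how recently a background thread ran; the cost is that a stateid that is never used is not reclaimed by this check alone.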

>
> So when do we free these things?  The only free_cpntf_state() caller I
> can find is in nfsd4_offload_cancel,

There is a caller in nfs4_put_stid(). The copy notify state is freed
when the associated stateid goes away.

> but I think the client only calls
> those in case of interrupts or other unusual events.  What about a copy
> that terminates normally?

At this point, are you asking about a copy state or a copy_notify
state? When the copy is done, the destination server will free
the copy state. However, the source server doesn't keep track of when
the copy is done (I don't think we want to do that: storing how much
has been read and the state of the file seems unnecessary).

>
> > That seems like
> > a lot more work than what's currently there. The spec says that the
> > use of the copy has to start within a certain timeout and that's what
> > this is supposed to enforce. If the client took too long to start the
> > copy, it'll get an error. I don't think it matters what error code is
> > returned: BAD_STATEID and PARTNER_NO_AUTH both imply the stateid is bad.
> >
> > >
> > > > +     } else
> > > > +             cps->cp_active = true;
> > > > +
> > > > +     *stid = cps->cp_p_stid;
> > >
> > > What guarantees that cp_p_stid still points to a valid stateid?  (E.g.
> > > if this is an open stateid that has since been closed.)
> >
> > A copy (or copy_notify) stateid takes a reference on the parent, thus
> > we guaranteed that pointer is still a valid stateid.
>
> I only see a reference count taken when one is looked up, in
> find_internal_cpntf_state.  That's too late.

Hm, right, so this is tricky. With copy_notify, if I were to take a
reference on the parent when copy_notify is processed, there would be no
way to drop this reference because the source server never knows when
the copy is done.



>
> --b.
>
> >
> > >
> > > --b.
> > >
> > > > +
> > > > +     return nfs_ok;
> > > > +}
> > > >
> > > >  /*
> > > >   * Checks for stateid operations
> > > > @@ -5264,6 +5307,8 @@ static __be32 nfsd4_validate_stateid(struct nfs4_client *cl, stateid_t *stateid)
> > > >       status = nfsd4_lookup_stateid(cstate, stateid,
> > > >                               NFS4_DELEG_STID|NFS4_OPEN_STID|NFS4_LOCK_STID,
> > > >                               &s, nn);
> > > > +     if (status == nfserr_bad_stateid)
> > > > +             status = find_cpntf_state(nn, stateid, &s);
> > > >       if (status)
> > > >               return status;
> > > >       status = nfsd4_stid_check_stateid_generation(stateid, s,
> > > > --
> > > > 1.8.3.1


* Re: [PATCH v4 4/8] NFSD add COPY_NOTIFY operation
  2019-07-30 15:55           ` J. Bruce Fields
@ 2019-07-30 16:13             ` Olga Kornievskaia
  2019-07-30 17:10               ` Olga Kornievskaia
  0 siblings, 1 reply; 51+ messages in thread
From: Olga Kornievskaia @ 2019-07-30 16:13 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: J. Bruce Fields, linux-nfs

On Tue, Jul 30, 2019 at 11:55 AM J. Bruce Fields <bfields@fieldses.org> wrote:
>
> On Tue, Jul 30, 2019 at 11:48:27AM -0400, Olga Kornievskaia wrote:
> > On Tue, Jul 23, 2019 at 4:46 PM J. Bruce Fields <bfields@fieldses.org> wrote:
> > >
> > > On Mon, Jul 22, 2019 at 04:17:44PM -0400, Olga Kornievskaia wrote:
> > > > Let me see if I understand your suspicion and ask for guidance how to
> > > > resolve it as perhaps I'm misusing the function. idr_alloc_cyclic()
> > > > keeps track of the structure of the 2nd arguments with a value it
> > > > returns. How do I initiate the structure with the value of the
> > > > function without knowing the value which can only be returned when I
> > > > call the function to add it to the list? what you are suggesting is to
> > > > somehow get the value for the new_id but not associate anything then
> > > > update the copy structure with that value and then call
> > > > idr_alloc_cyclic() (or something else) to create that association of
> > > > the new_id and the structure? I don't know how to do that.
> > >
> > > You could move the initialization under the s2s_cp_lock.  But there's
> > > additional initialization that's done in the caller.
> >
> > I still don't understand what you are looking for here and why. I'm
> > following what the normal stid allocation does. There is no extra code
> > there to check whether it is initialized or not. nfs4_alloc_stid() calls
> > idr_alloc_cyclic(), which creates an association between the new id and
> > the stid pointer to the (at that time uninitialized) nfs4_stid structure,
> > which is then filled in with the return value of idr_alloc_cyclic().
> > That's exactly what the new code is doing (well, except that I'll change
> > it to store the stateid_t).
>
> Yes, I'm a little worried about normal stid allocation too.  It's got
> one extra safeguard because of the check for 0 sc_type in the lookup,
> I haven't yet convinced myself that's enough.
>
> The race I'm worried about is: one task does the idr allocation and
> drops locks.  Before it has the chance to finish initializing the
> object, a second task looks it up in the idr and does something with it.
> It sees the not-yet-initialized fields.

Can the spin_lock() that we call before the idr_alloc_cyclic() be held
through the initialization of the stid, then? I'm just not sure what
idr_preload_end() does with the spin_lock held, but otherwise I don't see
why we can't, and since idr_find() takes the same spin lock before the
call, it would solve the problem.

>
> --b.


* Re: [PATCH v4 4/8] NFSD add COPY_NOTIFY operation
  2019-07-30 16:13             ` Olga Kornievskaia
@ 2019-07-30 17:10               ` Olga Kornievskaia
  0 siblings, 0 replies; 51+ messages in thread
From: Olga Kornievskaia @ 2019-07-30 17:10 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: J. Bruce Fields, linux-nfs

On Tue, Jul 30, 2019 at 12:13 PM Olga Kornievskaia
<olga.kornievskaia@gmail.com> wrote:
>
> On Tue, Jul 30, 2019 at 11:55 AM J. Bruce Fields <bfields@fieldses.org> wrote:
> >
> > On Tue, Jul 30, 2019 at 11:48:27AM -0400, Olga Kornievskaia wrote:
> > > On Tue, Jul 23, 2019 at 4:46 PM J. Bruce Fields <bfields@fieldses.org> wrote:
> > > >
> > > > On Mon, Jul 22, 2019 at 04:17:44PM -0400, Olga Kornievskaia wrote:
> > > > > Let me see if I understand your suspicion and ask for guidance how to
> > > > > resolve it as perhaps I'm misusing the function. idr_alloc_cyclic()
> > > > > keeps track of the structure of the 2nd arguments with a value it
> > > > > returns. How do I initiate the structure with the value of the
> > > > > function without knowing the value which can only be returned when I
> > > > > call the function to add it to the list? what you are suggesting is to
> > > > > somehow get the value for the new_id but not associate anything then
> > > > > update the copy structure with that value and then call
> > > > > idr_alloc_cyclic() (or something else) to create that association of
> > > > > the new_id and the structure? I don't know how to do that.
> > > >
> > > > You could move the initialization under the s2s_cp_lock.  But there's
> > > > additional initialization that's done in the caller.
> > >
> > > I still don't understand what you are looking for here and why. I'm
> > > following what the normal stid allocation does.  There is no extra code
> > > there to see if it initiated or not. nfs4_alloc_stid() calls
> > > idr_alloc_cyclic() creates an association between the stid pointer and
> > > at the time uninitialized nfs4_stid structure which is then filled in
> > > with the return of the idr_alloc_cyclic(). That's exactly what the new
> > > code is doing (well accept that i'll change it to store the
> > > stateid_t).
> >
> > Yes, I'm a little worried about normal stid allocation too.  It's got
> > one extra safeguard because of the check for 0 sc_type in the lookup,
> > I haven't yet convinced myself that's enough.
> >
> > The race I'm worried about is: one task does the idr allocation and
> > drops locks.  Before it has the chance to finish initializing the
> > object, a second task looks it up in the idr and does something with it.
> > It sees the not-yet-initialized fields.
>
> Can the spin_lock() that we call before the idr_alloc_cyclic() be held
> through the initialization of the stid, then? I'm just not sure what
> idr_preload_end() does with the spin_lock held, but otherwise I don't see
> why we can't, and since idr_find() takes the same spin lock before the
> call, it would solve the problem.

Actually, moving the initialization of the other stid fields to before
the idr_alloc_cyclic() call instead would never expose the uninitialized
values, so:

stid->..cl_boot = nn->boot_time
stid->..cl_id = nn->..id
..
spin_lock()
newid = idr_alloc_cyclic(stid)
stid->..id = newid
spin_unlock()
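
For illustration, here is a minimal userspace sketch of that ordering. It
is not the nfsd code: a fixed array guarded by a pthread mutex stands in
for the idr and the spin lock, and struct cp_state, struct id_table,
cp_state_publish() and cp_state_find() are hypothetical names made up for
this example.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define SLOT_COUNT 64

/* Hypothetical stand-in for the copy-notify state discussed above. */
struct cp_state {
	uint32_t boot_time;	/* filled in before publication */
	uint32_t client_id;	/* filled in before publication */
	int id;			/* assigned while the lock is held */
};

/* Toy replacement for an idr: a fixed array guarded by one mutex. */
struct id_table {
	pthread_mutex_t lock;
	struct cp_state *slots[SLOT_COUNT];
	int next;		/* cyclic allocation cursor */
};

static void id_table_init(struct id_table *t)
{
	assert(t != NULL);
	pthread_mutex_init(&t->lock, NULL);
	for (int i = 0; i < SLOT_COUNT; i++)
		t->slots[i] = NULL;
	t->next = 0;
}

/*
 * Fill in every field that does not depend on the id first. Then take
 * the lock, pick a free id, store it in the object, make the object
 * visible, and only then drop the lock. A lookup that takes the same
 * lock can never observe a half-initialized object.
 * Returns the new id, or -1 if the table is full.
 */
static int cp_state_publish(struct id_table *t, struct cp_state *s,
			    uint32_t boot_time, uint32_t client_id)
{
	int id = -1;

	s->boot_time = boot_time;
	s->client_id = client_id;
	s->id = -1;

	pthread_mutex_lock(&t->lock);
	for (int i = 0; i < SLOT_COUNT; i++) {
		int candidate = (t->next + i) % SLOT_COUNT;

		if (t->slots[candidate] == NULL) {
			id = candidate;
			s->id = id;
			t->slots[id] = s;
			t->next = (id + 1) % SLOT_COUNT;
			break;
		}
	}
	pthread_mutex_unlock(&t->lock);
	return id;
}

/* Lookup takes the same lock as publication. */
static struct cp_state *cp_state_find(struct id_table *t, int id)
{
	struct cp_state *s;

	if (id < 0 || id >= SLOT_COUNT)
		return NULL;
	pthread_mutex_lock(&t->lock);
	s = t->slots[id];
	pthread_mutex_unlock(&t->lock);
	return s;
}
```

The point of the sketch is only the ordering: nothing is reachable
through the table until all of its fields, including the id, are set.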


>
> >
> > --b.


* Re: [PATCH v4 5/8] NFSD check stateids against copy stateids
  2019-07-30 16:03         ` Olga Kornievskaia
@ 2019-07-31 21:10           ` Olga Kornievskaia
  2019-07-31 21:51             ` J. Bruce Fields
  0 siblings, 1 reply; 51+ messages in thread
From: Olga Kornievskaia @ 2019-07-31 21:10 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: J. Bruce Fields, linux-nfs

On Tue, Jul 30, 2019 at 12:03 PM Olga Kornievskaia
<olga.kornievskaia@gmail.com> wrote:
>
> On Tue, Jul 23, 2019 at 4:59 PM J. Bruce Fields <bfields@fieldses.org> wrote:
> >
> > On Mon, Jul 22, 2019 at 04:24:08PM -0400, Olga Kornievskaia wrote:
> > > On Fri, Jul 19, 2019 at 6:01 PM J. Bruce Fields <bfields@fieldses.org> wrote:
> > > >
> > > > On Mon, Jul 08, 2019 at 03:23:49PM -0400, Olga Kornievskaia wrote:
> > > > > Incoming stateid (used by a READ) could be a saved copy stateid.
> > > > > On first use make it active and check that the copy has started
> > > > > within the allowable lease time.
> > > > >
> > > > > Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> > > > > ---
> > > > >  fs/nfsd/nfs4state.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
> > > > >  1 file changed, 45 insertions(+)
> > > > >
> > > > > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> > > > > index 2555eb9..b786625 100644
> > > > > --- a/fs/nfsd/nfs4state.c
> > > > > +++ b/fs/nfsd/nfs4state.c
> > > > > @@ -5232,6 +5232,49 @@ static __be32 nfsd4_validate_stateid(struct nfs4_client *cl, stateid_t *stateid)
> > > > >
> > > > >       return 0;
> > > > >  }
> > > > > +/*
> > > > > + * A READ from an inter server to server COPY will have a
> > > > > + * copy stateid. Return the parent nfs4_stid.
> > > > > + */
> > > > > +static __be32 _find_cpntf_state(struct nfsd_net *nn, stateid_t *st,
> > > > > +                  struct nfs4_cpntf_state **cps)
> > > > > +{
> > > > > +     struct nfs4_cpntf_state *state = NULL;
> > > > > +
> > > > > +     if (st->si_opaque.so_clid.cl_id != nn->s2s_cp_cl_id)
> > > > > +             return nfserr_bad_stateid;
> > > > > +     spin_lock(&nn->s2s_cp_lock);
> > > > > +     state = idr_find(&nn->s2s_cp_stateids, st->si_opaque.so_id);
> > > > > +     if (state)
> > > > > +             refcount_inc(&state->cp_p_stid->sc_count);
> > > > > +     spin_unlock(&nn->s2s_cp_lock);
> > > > > +     if (!state)
> > > > > +             return nfserr_bad_stateid;
> > > > > +     *cps = state;
> > > > > +     return 0;
> > > > > +}
> > > > > +
> > > > > +static __be32 find_cpntf_state(struct nfsd_net *nn, stateid_t *st,
> > > > > +                            struct nfs4_stid **stid)
> > > > > +{
> > > > > +     __be32 status;
> > > > > +     struct nfs4_cpntf_state *cps = NULL;
> > > > > +
> > > > > +     status = _find_cpntf_state(nn, st, &cps);
> > > > > +     if (status)
> > > > > +             return status;
> > > > > +
> > > > > +     /* Did the inter server to server copy start in time? */
> > > > > +     if (cps->cp_active == false && !time_after(cps->cp_timeout, jiffies)) {
> > > > > +             nfs4_put_stid(cps->cp_p_stid);
> > > > > +             return nfserr_partner_no_auth;
> > > >
> > > > I wonder whether instead of checking the time we should instead be
> > > > destroying copy stateid's as they expire, so the fact that you were
> > > > still able to look up the stateid suggests that it's good.  Or would
> > > > that result in returning the wrong error here?  Just curious.
> > >
> > > In order to destroy copy stateid as they expire we need some thread
> > > monitoring the copies and then remove the expired one.
> >
> > It would be just another thing to do in the laundromat thread.
>
> This still seems simpler. You'd need to traverse the list and do more
> work? What's the advantage of laundry vs this? Given that laundry
> thread doesn't run all the time, there might still be a gap with it
> was last run and stateid expiring before the next run.
>
> >
> > So when do we free these things?  The only free_cpntf_state() caller I
> > can find is in nfsd4_offload_cancel,
>
> There is a caller in the nfs4_put_stid. Copy notify state is freed
> when the associated stateid going away.
>
> > but I think the client only calls
> > those in case of interrupts or other unusual events.  What about a copy
> > that terminates normally?
>
> At this point, are you asking about a copy state or a copy_notify
> state? When the copy is done, then the destination server will free
> the copy state. However, source server doesn't keep track of when the
> source server is done with the copy (I don't think we want to do that
> to store how much is read and state of the file seems like
> unnecessary).
>
> >
> > > That seems like
> > > a lot more work than what's currently there. The spec says that the
> > > use of the copy has to start without a certain timeout and that's what
> > > this is suppose to enforce. If the client took too long start the
> > > copy, it'll get an error. I don't think it matters what error code is
> > > returned BAD_STATEID or PARTNER_NO_AUTH both imply the stateid is bad.
> > >
> > > >
> > > > > +     } else
> > > > > +             cps->cp_active = true;
> > > > > +
> > > > > +     *stid = cps->cp_p_stid;
> > > >
> > > > What guarantees that cp_p_stid still points to a valid stateid?  (E.g.
> > > > if this is an open stateid that has since been closed.)
> > >
> > > A copy (or copy_notify) stateid takes a reference on the parent, thus
> > > we guaranteed that pointer is still a valid stateid.
> >
> > I only see a reference count taken when one is looked up, in
> > find_internal_cpntf_state.  That's too late.
>
> Hm, right so this is tricky. With copy_notify, if I were to take a
> reference on the parent when copy_notify is processed, there is no way
> to free this reference because the source server never knows when the
> copy was done.

I'm having difficulty with this patch because there is no good way to
know when the copy_notify stateid can be freed. What I can propose is
to have the linux client send a FREE_STATEID with the copy_notify
stateid and use that as the trigger to free the state. In that case,
I'll keep a reference on the parent until the FREE_STATEID is
received.

This is not in the spec (though seems like a good idea to tell the
source server it's ok to clean up) so other implementations might not
choose this approach so we'll have problems with stateids sticking
around.
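
To make the proposal concrete, here is a rough userspace sketch of the
reference handling, under the assumption that the copy_notify state pins
its parent at COPY_NOTIFY time and unpins it on FREE_STATEID. All names
(struct parent_stid, struct cpntf_state, cpntf_create(), cpntf_free())
are hypothetical, and a plain int stands in for the kernel's refcount_t.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical parent (open/lock/delegation) stateid. */
struct parent_stid {
	int refcount;	/* plain int here; the kernel would use refcount_t */
	bool freed;	/* set when the last reference is dropped */
};

/* Hypothetical copy_notify state that points at its parent. */
struct cpntf_state {
	struct parent_stid *parent;
};

static void parent_get(struct parent_stid *p)
{
	assert(p != NULL && p->refcount > 0);
	p->refcount++;
}

static void parent_put(struct parent_stid *p)
{
	assert(p != NULL && p->refcount > 0);
	if (--p->refcount == 0)
		p->freed = true;
}

/* COPY_NOTIFY: pin the parent so the pointer stays valid. */
static void cpntf_create(struct cpntf_state *c, struct parent_stid *p)
{
	parent_get(p);
	c->parent = p;
}

/* FREE_STATEID on the copy_notify stateid: drop the pin. */
static void cpntf_free(struct cpntf_state *c)
{
	if (c->parent != NULL) {
		parent_put(c->parent);
		c->parent = NULL;
	}
}
```

With this shape the parent outlives a close for as long as the
copy_notify state exists, which is exactly why a client that never sends
FREE_STATEID would leave state behind.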

Thoughts?

>
>
>
> >
> > --b.
> >
> > >
> > > >
> > > > --b.
> > > >
> > > > > +
> > > > > +     return nfs_ok;
> > > > > +}
> > > > >
> > > > >  /*
> > > > >   * Checks for stateid operations
> > > > > @@ -5264,6 +5307,8 @@ static __be32 nfsd4_validate_stateid(struct nfs4_client *cl, stateid_t *stateid)
> > > > >       status = nfsd4_lookup_stateid(cstate, stateid,
> > > > >                               NFS4_DELEG_STID|NFS4_OPEN_STID|NFS4_LOCK_STID,
> > > > >                               &s, nn);
> > > > > +     if (status == nfserr_bad_stateid)
> > > > > +             status = find_cpntf_state(nn, stateid, &s);
> > > > >       if (status)
> > > > >               return status;
> > > > >       status = nfsd4_stid_check_stateid_generation(stateid, s,
> > > > > --
> > > > > 1.8.3.1


* Re: [PATCH v4 5/8] NFSD check stateids against copy stateids
  2019-07-31 21:10           ` Olga Kornievskaia
@ 2019-07-31 21:51             ` J. Bruce Fields
  2019-08-01 14:12               ` Olga Kornievskaia
  0 siblings, 1 reply; 51+ messages in thread
From: J. Bruce Fields @ 2019-07-31 21:51 UTC (permalink / raw)
  To: Olga Kornievskaia; +Cc: J. Bruce Fields, linux-nfs

On Wed, Jul 31, 2019 at 05:10:01PM -0400, Olga Kornievskaia wrote:
> I'm having difficulty with this patch because there is no good way to
> know when the copy_notify stateid can be freed. What I can propose is
> to have the linux client send a FREE_STATEID with the copy_notify
> stateid and use that as the trigger to free the state. In that case,
> I'll keep a reference on the parent until the FREE_STATEID is
> received.
> 
> This is not in the spec (though seems like a good idea to tell the
> source server it's ok to clean up) so other implementations might not
> choose this approach so we'll have problems with stateids sticking
> around.

https://tools.ietf.org/html/rfc7862#page-71

	"If the cnr_lease_time expires while the destination server is
	still reading the source file, the destination server is allowed
	to finish reading the file.  If the cnr_lease_time expires
	before the destination server uses READ or READ_PLUS to begin
	the transfer, the source server can use NFS4ERR_PARTNER_NO_AUTH
	to inform the destination server that the cnr_lease_time has
	expired."

The spec doesn't really define what "is allowed to finish reading the
file" means, but I think the source server should decide somehow whether
the target's done.  And "hasn't sent a read in cnr_lease_time" seems
like a pretty good conservative definition that would be easy to
enforce.  Worst case, if the network goes down for a couple minutes and
the target tries to pick up a copy where it left off, it'll get
PARTNER_NO_AUTH.  I assume that results in the same error being returned
to the client, at which point the client knows that the copy_notify
stateid may have expired and can do what it chooses to recover (like send a
new copy_notify).

The FREE_STATEID might also be a good idea, but I guess we can't count
on it.

Maybe the spec could use some errata to clarify that FREE_STATEID is
allowed on copy_notify stateids, that clients should send it when
they're done, and that servers are allowed to expire copy_notify
stateid's even after their first use.

--b.


* Re: [PATCH v4 5/8] NFSD check stateids against copy stateids
  2019-07-31 21:51             ` J. Bruce Fields
@ 2019-08-01 14:12               ` Olga Kornievskaia
  2019-08-01 15:12                 ` J. Bruce Fields
  0 siblings, 1 reply; 51+ messages in thread
From: Olga Kornievskaia @ 2019-08-01 14:12 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: J. Bruce Fields, linux-nfs

On Wed, Jul 31, 2019 at 5:51 PM J. Bruce Fields <bfields@redhat.com> wrote:
>
> On Wed, Jul 31, 2019 at 05:10:01PM -0400, Olga Kornievskaia wrote:
> > I'm having difficulty with this patch because there is no good way to
> > know when the copy_notify stateid can be freed. What I can propose is
> > to have the linux client send a FREE_STATEID with the copy_notify
> > stateid and use that as the trigger to free the state. In that case,
> > I'll keep a reference on the parent until the FREE_STATEID is
> > received.
> >
> > This is not in the spec (though seems like a good idea to tell the
> > source server it's ok to clean up) so other implementations might not
> > choose this approach so we'll have problems with stateids sticking
> > around.
>
> https://tools.ietf.org/html/rfc7862#page-71
>
>         "If the cnr_lease_time expires while the destination server is
>         still reading the source file, the destination server is allowed
>         to finish reading the file.  If the cnr_lease_time expires
>         before the destination server uses READ or READ_PLUS to begin
>         the transfer, the source server can use NFS4ERR_PARTNER_NO_AUTH
>         to inform the destination server that the cnr_lease_time has
>         expired."
>
> The spec doesn't really define what "is allowed to finish reading the
> file" means, but I think the source server should decide somehow whether
> the target's done.  And "hasn't sent a read in cnr_lease_time" seems
> like a pretty good conservative definition that would be easy to
> enforce.

"hasn't sent a read in cnr_lease_time" is already enforced.

The problem is that when the copy did start in time, it might take an
unknown amount of time to complete. If we limit all copies to be done
within a cnr_lease_time, or even some multiple of that, we'll get into
problems when files are large enough or the network is slow enough that
it will make this method unusable.

> Worst case, if the network goes down for a couple minutes and
> the target tries to pick up a copy where it left off, it'll get
> PARTNER_NO_AUTH.  I assume that results in the same error being returned
> to the client, at which point the client knows that the copy_notify
> stateid may have expired and can do what it chooses to recover (like send a
> new copy_notify).

Yes the client recovers but the cost of setting up the source server
to destination is huge so any retries would kill the performance.

>
> The FREE_STATEID might also be a good idea, but I guess we can't count
> on it.
>
> Maybe the spec could use some errata to clarify that FREE_STATEID is
> allowed on copy_notify stateids, that clients should send it when
> they're done, and that servers are allowed to expire copy_notify
> stateid's even after their first use.

FREE_STATEID is for a stateid, which a copy_notify (or copy) stateid is,
so I don't see anything that really needs any extra stating. I think
what's needed is specifying that for COPY_NOTIFY a client must do a
FREE_STATEID when it's done with a stateid.

>
> --b.


* Re: [PATCH v4 5/8] NFSD check stateids against copy stateids
  2019-08-01 14:12               ` Olga Kornievskaia
@ 2019-08-01 15:12                 ` J. Bruce Fields
  2019-08-01 15:41                   ` Olga Kornievskaia
  0 siblings, 1 reply; 51+ messages in thread
From: J. Bruce Fields @ 2019-08-01 15:12 UTC (permalink / raw)
  To: Olga Kornievskaia; +Cc: J. Bruce Fields, linux-nfs

On Thu, Aug 01, 2019 at 10:12:11AM -0400, Olga Kornievskaia wrote:
> On Wed, Jul 31, 2019 at 5:51 PM J. Bruce Fields <bfields@redhat.com> wrote:
> >
> > On Wed, Jul 31, 2019 at 05:10:01PM -0400, Olga Kornievskaia wrote:
> > > I'm having difficulty with this patch because there is no good way to
> > > know when the copy_notify stateid can be freed. What I can propose is
> > > to have the linux client send a FREE_STATEID with the copy_notify
> > > stateid and use that as the trigger to free the state. In that case,
> > > I'll keep a reference on the parent until the FREE_STATEID is
> > > received.
> > >
> > > This is not in the spec (though seems like a good idea to tell the
> > > source server it's ok to clean up) so other implementations might not
> > > choose this approach so we'll have problems with stateids sticking
> > > around.
> >
> > https://tools.ietf.org/html/rfc7862#page-71
> >
> >         "If the cnr_lease_time expires while the destination server is
> >         still reading the source file, the destination server is allowed
> >         to finish reading the file.  If the cnr_lease_time expires
> >         before the destination server uses READ or READ_PLUS to begin
> >         the transfer, the source server can use NFS4ERR_PARTNER_NO_AUTH
> >         to inform the destination server that the cnr_lease_time has
> >         expired."
> >
> > The spec doesn't really define what "is allowed to finish reading the
> > file" means, but I think the source server should decide somehow whether
> > the target's done.  And "hasn't sent a read in cnr_lease_time" seems
> > like a pretty good conservative definition that would be easy to
> > enforce.
> 
> "hasn't sent a read in cnr_lease_time" is already enforced.
> 
> The problem is that when the copy did start in time, it might take an
> unknown amount of time to complete. If we limit all copies to be done
> within a cnr_lease_time, or even some multiple of that, we'll get into
> problems when files are large enough or the network is slow enough that
> it will make this method unusable.

No, I'm just suggesting that if it's been more than cnr_lease_time since
the target server last sent a read using this stateid, then we could
free the stateid.
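
A rough sketch of that rule, for illustration only: each READ refreshes a
timestamp, and a periodic sweep (standing in for the laundromat) frees any
copy_notify entry that has gone more than one lease period without a READ.
The names struct cpntf_entry, cpntf_touch(), cpntf_expired() and
cpntf_sweep() are hypothetical, and times are plain seconds rather than
jiffies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record for one copy_notify stateid on the source server. */
struct cpntf_entry {
	bool in_use;
	long last_read;	/* time of COPY_NOTIFY, then of the latest READ */
};

/* Called whenever a READ arrives carrying this stateid. */
static void cpntf_touch(struct cpntf_entry *e, long now)
{
	assert(e != NULL);
	e->last_read = now;
}

/* True if more than one lease period has passed since the last READ. */
static bool cpntf_expired(const struct cpntf_entry *e, long now, long lease)
{
	return e->in_use && now - e->last_read > lease;
}

/*
 * Periodic sweep: release every entry that has been idle for longer
 * than the lease. Returns the number of entries released.
 */
static size_t cpntf_sweep(struct cpntf_entry *tab, size_t n,
			  long now, long lease)
{
	size_t released = 0;

	for (size_t i = 0; i < n; i++) {
		if (cpntf_expired(&tab[i], now, lease)) {
			tab[i].in_use = false;
			released++;
		}
	}
	return released;
}
```

A long-running copy keeps its entry alive simply by continuing to read;
only a stateid that has been idle for a full lease period is dropped.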

> > Worst case, if the network goes down for a couple minutes and
> > the target tries to pick up a copy where it left off, it'll get
> > PARTNER_NO_AUTH.  I assume that results in the same error being returned
> > to the client, at which point the client knows that the copy_notify
> > stateid may have expired and can do what it chooses to recover (like send a
> > new copy_notify).
> 
> Yes the client recovers but the cost of setting up the source server
> to destination is huge so any retries would kill the performance.

In the rare case when the server goes an entire cnr_lease_time between
reads, the performance hit of recovery won't be an issue.

> > The FREE_STATEID might also be a good idea, but I guess we can't count
> > on it.
> >
> > Maybe the spec could use some errata to clarify that FREE_STATEID is
> > allowed on copy_notify stateids, that clients should send it when
> > they're done, and that servers are allowed to expire copy_notify
> > stateid's even after their first use.
> 
> FREE_STATEID is for a stateid

The discussion of FREE_STATEID in 4.1 says "The FREE_STATEID operation
is used to free a stateid that no longer has any associated locks
(including opens, byte-range locks, delegations, and layouts)."  A
clarification that it can be used for any stateid would be nice.  (Is
that true?  Do we want it for COPY stateid's too?)

--b.

> which a copy_notify (or copy) stateid is so I don't see anything that
> really needs any extra stating.
>
> I think what's needed is specifying that for COPY_NOTIFY a client must
> do a FREE_STATEID when it's done with a stateid.


* Re: [PATCH v4 5/8] NFSD check stateids against copy stateids
  2019-08-01 15:12                 ` J. Bruce Fields
@ 2019-08-01 15:41                   ` Olga Kornievskaia
  2019-08-01 18:06                     ` Olga Kornievskaia
  0 siblings, 1 reply; 51+ messages in thread
From: Olga Kornievskaia @ 2019-08-01 15:41 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: J. Bruce Fields, linux-nfs

On Thu, Aug 1, 2019 at 11:13 AM J. Bruce Fields <bfields@fieldses.org> wrote:
>
> On Thu, Aug 01, 2019 at 10:12:11AM -0400, Olga Kornievskaia wrote:
> > On Wed, Jul 31, 2019 at 5:51 PM J. Bruce Fields <bfields@redhat.com> wrote:
> > >
> > > On Wed, Jul 31, 2019 at 05:10:01PM -0400, Olga Kornievskaia wrote:
> > > > I'm having difficulty with this patch because there is no good way to
> > > > know when the copy_notify stateid can be freed. What I can propose is
> > > > to have the linux client send a FREE_STATEID with the copy_notify
> > > > stateid and use that as the trigger to free the state. In that case,
> > > > I'll keep a reference on the parent until the FREE_STATEID is
> > > > received.
> > > >
> > > > This is not in the spec (though seems like a good idea to tell the
> > > > source server it's ok to clean up) so other implementations might not
> > > > choose this approach so we'll have problems with stateids sticking
> > > > around.
> > >
> > > https://tools.ietf.org/html/rfc7862#page-71
> > >
> > >         "If the cnr_lease_time expires while the destination server is
> > >         still reading the source file, the destination server is allowed
> > >         to finish reading the file.  If the cnr_lease_time expires
> > >         before the destination server uses READ or READ_PLUS to begin
> > >         the transfer, the source server can use NFS4ERR_PARTNER_NO_AUTH
> > >         to inform the destination server that the cnr_lease_time has
> > >         expired."
> > >
> > > The spec doesn't really define what "is allowed to finish reading the
> > > file" means, but I think the source server should decide somehow whether
> > > the target's done.  And "hasn't sent a read in cnr_lease_time" seems
> > > like a pretty good conservative definition that would be easy to
> > > enforce.
> >
> > "hasn't sent a read in cnr_lease_time" is already enforced.
> >
> > The problem is that when the copy did start in time, it might take an
> > unknown amount of time to complete. If we limit all copies to be done
> > within a cnr_lease_time, or even some multiple of that, we'll get into
> > problems when files are large enough or the network is slow enough that
> > it will make this method unusable.
>
> No, I'm just suggesting that if it's been more than cnr_lease_time since
> the target server last sent a read using this stateid, then we could
> free the stateid.

That's reasonable. Let me do that.

> > > Worst case, if the network goes down for a couple minutes and
> > > the target tries to pick up a copy where it left off, it'll get
> > > PARTNER_NO_AUTH.  I assume that results in the same error being returned
> > > to the client, at which point the client knows that the copy_notify
> > > stateid may have expired and can do what it chooses to recover (like send a
> > > new copy_notify).
> >
> > Yes the client recovers but the cost of setting up the source server
> > to destination is huge so any retries would kill the performance.
>
> In the rare case when the server goes an entire cnr_lease_time between
> reads, the performance hit of recovery won't be an issue.
>
> > > The FREE_STATEID might also be a good idea, but I guess we can't count
> > > on it.
> > >
> > > Maybe the spec could use some errata to clarify that FREE_STATEID is
> > > allowed on copy_notify stateids, that clients should send it when
> > > they're done, and that servers are allowed to expire copy_notify
> > > stateid's even after their first use.
> >
> > FREE_STATEID is for a stateid
>
> The discussion of FREE_STATEID in 4.1 says "The FREE_STATEID operation
> is used to free a stateid that no longer has any associated locks
> (including opens, byte-range locks, delegations, and layouts)."  A
> clarification that it can be used for any stateid would be nice.  (Is
> that true?  Do we want it for COPY stateid's too?)

We don't need it for the COPY stateids, as there is an OFFLOAD_CANCEL if
the client wants to stop; otherwise, the destination server has no
problem knowing when to free the copy stateid.

>
> --b.
>
> > which a copy_notify (or copy) stateid is so I don't see anything that
> > really needs any extra stating.
> >
> > I think what's needed is specifying that for COPY_NOTIFY a client must
> > do a FREE_STATEID when it's done with a stateid.


* Re: [PATCH v4 5/8] NFSD check stateids against copy stateids
  2019-08-01 15:41                   ` Olga Kornievskaia
@ 2019-08-01 18:06                     ` Olga Kornievskaia
  2019-08-01 18:11                       ` J. Bruce Fields
  0 siblings, 1 reply; 51+ messages in thread
From: Olga Kornievskaia @ 2019-08-01 18:06 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: J. Bruce Fields, linux-nfs

On Thu, Aug 1, 2019 at 11:41 AM Olga Kornievskaia
<olga.kornievskaia@gmail.com> wrote:
>
> On Thu, Aug 1, 2019 at 11:13 AM J. Bruce Fields <bfields@fieldses.org> wrote:
> >
> > On Thu, Aug 01, 2019 at 10:12:11AM -0400, Olga Kornievskaia wrote:
> > > On Wed, Jul 31, 2019 at 5:51 PM J. Bruce Fields <bfields@redhat.com> wrote:
> > > >
> > > > On Wed, Jul 31, 2019 at 05:10:01PM -0400, Olga Kornievskaia wrote:
> > > > > I'm having difficulty with this patch because there is no good way to
> > > > > know when the copy_notify stateid can be freed. What I can propose is
> > > > > to have the linux client send a FREE_STATEID with the copy_notify
> > > > > stateid and use that as the trigger to free the state. In that case,
> > > > > I'll keep a reference on the parent until the FREE_STATEID is
> > > > > received.
> > > > >
> > > > > This is not in the spec (though seems like a good idea to tell the
> > > > > source server it's ok to clean up) so other implementations might not
> > > > > choose this approach so we'll have problems with stateids sticking
> > > > > around.
> > > >
> > > > https://tools.ietf.org/html/rfc7862#page-71
> > > >
> > > >         "If the cnr_lease_time expires while the destination server is
> > > >         still reading the source file, the destination server is allowed
> > > >         to finish reading the file.  If the cnr_lease_time expires
> > > >         before the destination server uses READ or READ_PLUS to begin
> > > >         the transfer, the source server can use NFS4ERR_PARTNER_NO_AUTH
> > > >         to inform the destination server that the cnr_lease_time has
> > > >         expired."
> > > >
> > > > The spec doesn't really define what "is allowed to finish reading the
> > > > file" means, but I think the source server should decide somehow whether
> > > > the target's done.  And "hasn't sent a read in cnr_lease_time" seems
> > > > like a pretty good conservative definition that would be easy to
> > > > enforce.
> > >
> > > "hasn't sent a read in cnr_lease_time" is already enforced.
> > >
> > > The problem is that when the copy did start in time, it might take an
> > > unknown amount of time to complete. If we limit all copies to be done
> > > within a cnr_lease_time, or even some multiple of that, we'll get into
> > > problems when files are large enough or the network is slow enough that
> > > it will make this method unusable.
> >
> > No, I'm just suggesting that if it's been more than cnr_lease_time since
> > the target server last sent a read using this stateid, then we could
> > free the stateid.
>
> That's reasonable. Let me do that.

Now that I need a global list for the copy_notify stateids, do you
have a preference for keeping it off the nfs4_client structure or the
nfsd_net structure? I store async copies under the nfs4_client
structure, but the laundromat traverses things in the nfsd_net structure.

>
> > > > Worst case, if the network goes down for a couple minutes and
> > > > the target tries to pick up a copy where it left off, it'll get
> > > > PARTNER_NO_AUTH.  I assume that results in the same error being returned
> > > > to the client, at which point the client knows that the copy_notify
> > > > stateid may have expired and can do what it chooses to recover (like send a
> > > > new copy_notify).
> > >
> > > Yes the client recovers but the cost of setting up the source server
> > > to destination is huge so any retries would kill the performance.
> >
> > In the rare case when the server goes an entire cnr_lease_time between
> > reads, the performance hit of recovery won't be an issue.
> >
> > > > The FREE_STATEID might also be a good idea, but I guess we can't count
> > > > on it.
> > > >
> > > > Maybe the spec could use some errata to clarify that FREE_STATEID is
> > > > allowed on copy_notify stateids, that clients should send it when
> > > > they're done, and that servers are allowed to expire copy_notify
> > > > stateid's even after their first use.
> > >
> > > FREE_STATEID is for a stateid
> >
> > The discussion of FREE_STATEID in 4.1 says "The FREE_STATEID operation
> > is used to free a stateid that no longer has any associated locks
> > (including opens, byte-range locks, delegations, and layouts)."  A
> > clarification that it can be used for any stateid would be nice.  (Is
> > that true?  Do we want it for COPY stateid's too?)
>
> We don't need it for the COPY stateids, as there is an OFFLOAD_CANCEL if
> the client wants to stop; otherwise, the destination server has no
> problem knowing when to free the copy stateid.
>
> >
> > --b.
> >
> > > which a copy_notify (or copy) stateid is so I don't see anything that
> > > really needs any extra stating.
> > >
> > > I think what's needed is specifying that for COPY_NOTIFY a client must
> > > do a FREE_STATEID when it's done with a stateid.


* Re: [PATCH v4 5/8] NFSD check stateids against copy stateids
  2019-08-01 18:06                     ` Olga Kornievskaia
@ 2019-08-01 18:11                       ` J. Bruce Fields
  2019-08-01 18:24                         ` Olga Kornievskaia
  0 siblings, 1 reply; 51+ messages in thread
From: J. Bruce Fields @ 2019-08-01 18:11 UTC (permalink / raw)
  To: Olga Kornievskaia; +Cc: J. Bruce Fields, linux-nfs

On Thu, Aug 01, 2019 at 02:06:46PM -0400, Olga Kornievskaia wrote:
> On Thu, Aug 1, 2019 at 11:41 AM Olga Kornievskaia
> <olga.kornievskaia@gmail.com> wrote:
> >
> > On Thu, Aug 1, 2019 at 11:13 AM J. Bruce Fields <bfields@fieldses.org> wrote:
> > >
> > > On Thu, Aug 01, 2019 at 10:12:11AM -0400, Olga Kornievskaia wrote:
> > > > On Wed, Jul 31, 2019 at 5:51 PM J. Bruce Fields <bfields@redhat.com> wrote:
> > > > >
> > > > > On Wed, Jul 31, 2019 at 05:10:01PM -0400, Olga Kornievskaia wrote:
> > > > > > I'm having difficulty with this patch because there is no good way to
> > > > > > know when the copy_notify stateid can be freed. What I can propose is
> > > > > > to have the linux client send a FREE_STATEID with the copy_notify
> > > > > > stateid and use that as the trigger to free the state. In that case,
> > > > > > I'll keep a reference on the parent until the FREE_STATEID is
> > > > > > received.
> > > > > >
> > > > > > This is not in the spec (though seems like a good idea to tell the
> > > > > > source server it's ok to clean up) so other implementations might not
> > > > > > choose this approach so we'll have problems with stateids sticking
> > > > > > around.
> > > > >
> > > > > https://tools.ietf.org/html/rfc7862#page-71
> > > > >
> > > > >         "If the cnr_lease_time expires while the destination server is
> > > > >         still reading the source file, the destination server is allowed
> > > > >         to finish reading the file.  If the cnr_lease_time expires
> > > > >         before the destination server uses READ or READ_PLUS to begin
> > > > >         the transfer, the source server can use NFS4ERR_PARTNER_NO_AUTH
> > > > >         to inform the destination server that the cnr_lease_time has
> > > > >         expired."
> > > > >
> > > > > The spec doesn't really define what "is allowed to finish reading the
> > > > > file" means, but I think the source server should decide somehow whether
> > > > > the target's done.  And "hasn't sent a read in cnr_lease_time" seems
> > > > > like a pretty good conservative definition that would be easy to
> > > > > enforce.
> > > >
> > > > "hasn't send a read in cnr_lease_time" is already enforced.
> > > >
> > > > The problem is when the copy did start in normal time, it might take
> > > > unknown time to complete. If we limit copies to all be done with in a
> > > > cnr_lease_time or even some number of that, we'll get into problems
> > > > when files are large enough or network is slow enough that it will
> > > > make this method unusable.
> > >
> > > No, I'm just suggesting that if it's been more than cnr_lease_time since
> > > the target server last sent a read using this stateid, then we could
> > > free the stateid.
> >
> > That's reasonable. Let me do that.
> 
> Now that I need a global list for the copy_notify stateids, do you
> have a preference for either to keep it of the nfs4_client structure
> or the nfsd_net structure? I store async copies under the nfs4_client
> structure but the laundromat traverses things in nfsd_net structure.

If copy_notify stateids are associated with a client, then they must
already be reachable from the client somehow so they can be destroyed at
the time the client is, right?  I'm saying that without looking at the
code....

--b.


* Re: [PATCH v4 5/8] NFSD check stateids against copy stateids
  2019-08-01 18:11                       ` J. Bruce Fields
@ 2019-08-01 18:24                         ` Olga Kornievskaia
  2019-08-01 19:36                           ` J. Bruce Fields
  0 siblings, 1 reply; 51+ messages in thread
From: Olga Kornievskaia @ 2019-08-01 18:24 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: J. Bruce Fields, linux-nfs

On Thu, Aug 1, 2019 at 2:12 PM J. Bruce Fields <bfields@fieldses.org> wrote:
>
> On Thu, Aug 01, 2019 at 02:06:46PM -0400, Olga Kornievskaia wrote:
> > On Thu, Aug 1, 2019 at 11:41 AM Olga Kornievskaia
> > <olga.kornievskaia@gmail.com> wrote:
> > >
> > > On Thu, Aug 1, 2019 at 11:13 AM J. Bruce Fields <bfields@fieldses.org> wrote:
> > > >
> > > > On Thu, Aug 01, 2019 at 10:12:11AM -0400, Olga Kornievskaia wrote:
> > > > > On Wed, Jul 31, 2019 at 5:51 PM J. Bruce Fields <bfields@redhat.com> wrote:
> > > > > >
> > > > > > On Wed, Jul 31, 2019 at 05:10:01PM -0400, Olga Kornievskaia wrote:
> > > > > > > I'm having difficulty with this patch because there is no good way to
> > > > > > > know when the copy_notify stateid can be freed. What I can propose is
> > > > > > > to have the linux client send a FREE_STATEID with the copy_notify
> > > > > > > stateid and use that as the trigger to free the state. In that case,
> > > > > > > I'll keep a reference on the parent until the FREE_STATEID is
> > > > > > > received.
> > > > > > >
> > > > > > > This is not in the spec (though seems like a good idea to tell the
> > > > > > > source server it's ok to clean up) so other implementations might not
> > > > > > > choose this approach so we'll have problems with stateids sticking
> > > > > > > around.
> > > > > >
> > > > > > https://tools.ietf.org/html/rfc7862#page-71
> > > > > >
> > > > > >         "If the cnr_lease_time expires while the destination server is
> > > > > >         still reading the source file, the destination server is allowed
> > > > > >         to finish reading the file.  If the cnr_lease_time expires
> > > > > >         before the destination server uses READ or READ_PLUS to begin
> > > > > >         the transfer, the source server can use NFS4ERR_PARTNER_NO_AUTH
> > > > > >         to inform the destination server that the cnr_lease_time has
> > > > > >         expired."
> > > > > >
> > > > > > The spec doesn't really define what "is allowed to finish reading the
> > > > > > file" means, but I think the source server should decide somehow whether
> > > > > > the target's done.  And "hasn't sent a read in cnr_lease_time" seems
> > > > > > like a pretty good conservative definition that would be easy to
> > > > > > enforce.
> > > > >
> > > > > "hasn't send a read in cnr_lease_time" is already enforced.
> > > > >
> > > > > The problem is when the copy did start in normal time, it might take
> > > > > unknown time to complete. If we limit copies to all be done with in a
> > > > > cnr_lease_time or even some number of that, we'll get into problems
> > > > > when files are large enough or network is slow enough that it will
> > > > > make this method unusable.
> > > >
> > > > No, I'm just suggesting that if it's been more than cnr_lease_time since
> > > > the target server last sent a read using this stateid, then we could
> > > > free the stateid.
> > >
> > > That's reasonable. Let me do that.
> >
> > Now that I need a global list for the copy_notify stateids, do you
> > have a preference for either to keep it of the nfs4_client structure
> > or the nfsd_net structure? I store async copies under the nfs4_client
> > structure but the laundromat traverses things in nfsd_net structure.
>
> If copy_notify stateids are associated with a client, then they must
> already be reachable from the client somehow so they can be destroyed at
> the time the client is, right?  I'm saying that without looking at the
> code....

Yes, I agree. But since we are taking a reference on the parent stateid,
and the copy_notify state is destroyed when the parent stateid is
destroyed, we'll never get there (or shouldn't get there). Still, I
can add something to the client destruction path to make sure anything
left over is deleted.

I was just looking at close_lru and delegation_lru, but I guess those
are not lists of delegation or open stateids; rather, they are part of
a scheme of not deleting the stateid right away but moving it to
nfs4_ol_stateid and the list on the nfsd_net. Are you looking for
something similar for the copy_notify state, or can I just keep a global
list off the nfs4_client and add to and delete from that directly (not
move entries to a delete-later list)?

> --b.


* Re: [PATCH v4 5/8] NFSD check stateids against copy stateids
  2019-08-01 18:24                         ` Olga Kornievskaia
@ 2019-08-01 19:36                           ` J. Bruce Fields
  2019-08-07 16:02                             ` Olga Kornievskaia
  0 siblings, 1 reply; 51+ messages in thread
From: J. Bruce Fields @ 2019-08-01 19:36 UTC (permalink / raw)
  To: Olga Kornievskaia; +Cc: J. Bruce Fields, linux-nfs

On Thu, Aug 01, 2019 at 02:24:04PM -0400, Olga Kornievskaia wrote:
> i was just looking at close_lru and delegation_lru but I guess that's
> not a list of delegation or open stateids but rather some complex of
> not deleting the stateid right away but moving it to nfs4_ol_stateid
> and the list on the nfsd_net. Are you looking for something similar
> for the copy_notify state or can I just keep a global list of the
> nfs4_client and add and delete of that (not move to the delete later)?

A global list seems like it should work if the locking's OK.

--b.


* Re: [PATCH v4 5/8] NFSD check stateids against copy stateids
  2019-08-01 19:36                           ` J. Bruce Fields
@ 2019-08-07 16:02                             ` Olga Kornievskaia
  2019-08-07 16:08                               ` J. Bruce Fields
  0 siblings, 1 reply; 51+ messages in thread
From: Olga Kornievskaia @ 2019-08-07 16:02 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: J. Bruce Fields, linux-nfs

On Thu, Aug 1, 2019 at 3:36 PM J. Bruce Fields <bfields@redhat.com> wrote:
>
> On Thu, Aug 01, 2019 at 02:24:04PM -0400, Olga Kornievskaia wrote:
> > i was just looking at close_lru and delegation_lru but I guess that's
> > not a list of delegation or open stateids but rather some complex of
> > not deleting the stateid right away but moving it to nfs4_ol_stateid
> > and the list on the nfsd_net. Are you looking for something similar
> > for the copy_notify state or can I just keep a global list of the
> > nfs4_client and add and delete of that (not move to the delete later)?
>
> A global list seems like it should work if the locking's OK.

I'm having issues with taking a reference on the parent stateid and
still being able to clean it up. Let me try to explain.

Since I take a reference on the stateid, then during what would have
been the last put (due to, say, a close operation), the stateid isn't
released. Now that stateid is sticking around. I personally would have
liked what would have been a close and release of the stateid to also
release the copy notify state(s) (which was being done before, but
having a reference makes it hard). I want to count the number of copy
notify states, and if the put would leave only the copy notify
references (so that subtracting num_copies on top of the normal -1
makes the count 0), then decrement by num_copies as well; but if it's
not the last reference, then it shouldn't be decremented.

Now say no fancy logic happens on close, so we have these stateids left
over. What to do on unmount? It will error with err_client_busy since
there are non-zero copy notify states, and only after a lease period
will it release the resources (when the close of the file should have
removed any copy notify state)?

Question: would it be acceptable to do something like this on freeing
of the parent stateid?

@@ -896,8 +931,12 @@ static void block_delegations(struct knfsd_fh *fh)
        might_lock(&clp->cl_lock);

        if (!refcount_dec_and_lock(&s->sc_count, &clp->cl_lock)) {
-               wake_up_all(&close_wq);
-               return;
+               if (!refcount_sub_and_test_checked(s->sc_cp_list_size,
+                               &s->sc_count)) {
+                       refcount_add_checked(s->sc_cp_list_size, &s->sc_count);
+                       wake_up_all(&close_wq);
+                       return;
+               }
        }
        idr_remove(&clp->cl_stateids, s->sc_stateid.si_opaque.so_id);
        spin_unlock(&clp->cl_lock);

then free the copy notify stateids associated with stateid.
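
To make the counting idea above concrete, here is a minimal user-space
sketch. It is not the nfsd code: struct parent_stid, its fields and
parent_put() are hypothetical names, plain integers stand in for
refcount_t, and there is no locking, so it only illustrates the
arithmetic (the normal -1, then dropping the children's references
together if nothing else holds the parent).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a parent stateid; not the nfsd structure. */
struct parent_stid {
	unsigned int refs;        /* all references, including the children's */
	unsigned int nr_children; /* copy notify states, one reference each */
};

/*
 * Drop one ordinary reference. Returns true when the parent (and its
 * copy notify states) should now be freed by the caller.
 */
static bool parent_put(struct parent_stid *p)
{
	/* There must be a reference to drop beyond the children's. */
	assert(p->refs > p->nr_children);

	p->refs--;                      /* the normal -1 */
	if (p->refs > p->nr_children)
		return false;           /* somebody else still holds it */

	/* Only the children's references remain: drop them together. */
	p->refs -= p->nr_children;
	p->nr_children = 0;
	return true;
}
```

With refs = 4 and nr_children = 2, the first parent_put() returns false
and the second returns true with both counters at zero. A real version
would need the decrement and the check to be atomic with respect to
lookups, which this sketch does not attempt.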

The laundromat would still check the copy_notify stateids for
anything that hasn't been active for a while (but not closed).
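
As an illustration of that laundromat pass, a small user-space sketch
follows. The names (struct cpntf_state, cpntf_add(), reap_idle()) are
made up for the example and the list is a plain singly linked list with
no locking; the only point is "unlink and free whatever has been idle
for longer than the lease".

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical copy notify state; not the nfsd structure. */
struct cpntf_state {
	time_t last_used;             /* time of the last READ using it */
	struct cpntf_state *next;
};

/* Push a new state, stamped with `now`, onto the front of the list. */
static struct cpntf_state *cpntf_add(struct cpntf_state **head, time_t now)
{
	struct cpntf_state *s = malloc(sizeof(*s));

	assert(head != NULL);
	if (!s)
		return NULL;
	s->last_used = now;
	s->next = *head;
	*head = s;
	return s;
}

/* Free every state idle for more than `lease` seconds; return the count. */
static size_t reap_idle(struct cpntf_state **head, time_t now, time_t lease)
{
	struct cpntf_state **pp = head;
	size_t freed = 0;

	assert(head != NULL);
	while (*pp) {
		struct cpntf_state *cur = *pp;

		if (now - cur->last_used > lease) {
			*pp = cur->next;   /* unlink before freeing */
			free(cur);
			freed++;
		} else {
			pp = &cur->next;
		}
	}
	return freed;
}
```

With a 90 second lease, a state last used at t=0 is reaped at t=120
while one last used at t=100 survives until a later pass.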







* Re: [PATCH v4 5/8] NFSD check stateids against copy stateids
  2019-08-07 16:02                             ` Olga Kornievskaia
@ 2019-08-07 16:08                               ` J. Bruce Fields
  2019-08-07 16:42                                 ` Olga Kornievskaia
  0 siblings, 1 reply; 51+ messages in thread
From: J. Bruce Fields @ 2019-08-07 16:08 UTC (permalink / raw)
  To: Olga Kornievskaia; +Cc: J. Bruce Fields, linux-nfs

On Wed, Aug 07, 2019 at 12:02:40PM -0400, Olga Kornievskaia wrote:
> On Thu, Aug 1, 2019 at 3:36 PM J. Bruce Fields <bfields@redhat.com> wrote:
> >
> > On Thu, Aug 01, 2019 at 02:24:04PM -0400, Olga Kornievskaia wrote:
> > > i was just looking at close_lru and delegation_lru but I guess that's
> > > not a list of delegation or open stateids but rather some complex of
> > > not deleting the stateid right away but moving it to nfs4_ol_stateid
> > > and the list on the nfsd_net. Are you looking for something similar
> > > for the copy_notify state or can I just keep a global list of the
> > > nfs4_client and add and delete of that (not move to the delete later)?
> >
> > A global list seems like it should work if the locking's OK.
> 
> I'm having issues taking a reference on a parent stateid and being
> able to clean it. Let me try to explain.

With other stateid parent relationships I believe what we do is: instead
of the child taking a reference on the parent, we ensure that the child
is destroyed, and that nobody can be holding a pointer to it, before we
destroy the parent.
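
A rough user-space sketch of that ordering, under stated assumptions:
the structures, id_table and the function names below are hypothetical
and there is no locking. The child keeps a borrowed pointer to its
parent (no reference), and tearing down the parent first unpublishes
and frees every child, so a later lookup cannot reach a freed parent.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_IDS 16

struct parent;

struct child {
	int id;
	struct parent *parent;   /* borrowed pointer, no reference taken */
	struct child *next;
};

struct parent {
	struct child *children;
};

/* Stands in for whatever global structure lookups go through. */
static struct child *id_table[MAX_IDS];

static struct child *child_create(struct parent *p, int id)
{
	struct child *c;

	assert(id >= 0 && id < MAX_IDS && id_table[id] == NULL);
	c = malloc(sizeof(*c));
	if (!c)
		return NULL;
	c->id = id;
	c->parent = p;
	c->next = p->children;
	p->children = c;
	id_table[id] = c;        /* publish for lookup */
	return c;
}

/* Follow a child id to its parent, or NULL if the child is gone. */
static struct parent *lookup_parent(int id)
{
	if (id < 0 || id >= MAX_IDS || !id_table[id])
		return NULL;
	return id_table[id]->parent;
}

static void parent_destroy(struct parent *p)
{
	while (p->children) {
		struct child *c = p->children;

		p->children = c->next;
		id_table[c->id] = NULL;  /* unpublish before freeing */
		free(c);
	}
	free(p);                         /* only now is the parent freed */
}
```

In real code the unpublish step and the lookup would have to be
serialized by a lock, so that nobody is still holding a child pointer
when the parent goes away.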

--b.

> Since I take a reference on the stateid, then during what would have
> been the last put (due to say a close operation), stateid isn't
> released. Now that stateid is sticking around. I personally would have
> liked on what would have been a close and release of the stateid to
> release the copy notify state(s) (which was being done before but
> having a reference makes it hard? i want to count number of copy
> notify states and if then somehow if the num_copies-1 is going to make
> it 0, then decrement by num_copies (and the normal -1) but if it's not
> the last reference then it shouldn't be decremented.
> 
> Now say no fancy logic happens on close so we have these stateids left
> over . What to do on unmount? It will error with err_client_busy since
> there are non-zero copy notify states and only after a lease period it
> will release the resources (when the close of the file should have
> removed any copy notify state)?
> 
> Question: would it be acceptable to do something like this on freeing
> of the parent stateid?
> 
> @@ -896,8 +931,12 @@ static void block_delegations(struct knfsd_fh *fh)
>         might_lock(&clp->cl_lock);
> 
>         if (!refcount_dec_and_lock(&s->sc_count, &clp->cl_lock)) {
> -               wake_up_all(&close_wq);
> -               return;
> +               if (!refcount_sub_and_test_checked(s->sc_cp_list_size,
> +                               &s->sc_count)) {
> +                       refcount_add_checked(s->sc_cp_list_size, &s->sc_count);
> +                       wake_up_all(&close_wq);
> +                       return;
> +               }
>         }
>         idr_remove(&clp->cl_stateids, s->sc_stateid.si_opaque.so_id);
>         spin_unlock(&clp->cl_lock);
> 
> then free the copy notify stateids associated with stateid.
> 
> Laundromat would still be checking the copy_notify stateids for
> anything that's been not active for a while (but not closed).
> 
> 
> 
> 
> 
> >
> > --b.


* Re: [PATCH v4 5/8] NFSD check stateids against copy stateids
  2019-08-07 16:08                               ` J. Bruce Fields
@ 2019-08-07 16:42                                 ` Olga Kornievskaia
  2019-08-08 11:25                                   ` J. Bruce Fields
  0 siblings, 1 reply; 51+ messages in thread
From: Olga Kornievskaia @ 2019-08-07 16:42 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: J. Bruce Fields, linux-nfs

On Wed, Aug 7, 2019 at 12:09 PM J. Bruce Fields <bfields@fieldses.org> wrote:
>
> On Wed, Aug 07, 2019 at 12:02:40PM -0400, Olga Kornievskaia wrote:
> > On Thu, Aug 1, 2019 at 3:36 PM J. Bruce Fields <bfields@redhat.com> wrote:
> > >
> > > On Thu, Aug 01, 2019 at 02:24:04PM -0400, Olga Kornievskaia wrote:
> > > > i was just looking at close_lru and delegation_lru but I guess that's
> > > > not a list of delegation or open stateids but rather some complex of
> > > > not deleting the stateid right away but moving it to nfs4_ol_stateid
> > > > and the list on the nfsd_net. Are you looking for something similar
> > > > for the copy_notify state or can I just keep a global list of the
> > > > nfs4_client and add and delete of that (not move to the delete later)?
> > >
> > > A global list seems like it should work if the locking's OK.
> >
> > I'm having issues taking a reference on a parent stateid and being
> > able to clean it. Let me try to explain.
>
> With other stateid parent relationships I believe what we do is: instead
> of the child taking a reference on the parent, we ensure that the child
> is destroyed, and that nobody can be holding a pointer to it, before we
> destroy the parent.

I don't think we can get away without taking a reference on the
parent. When a READ comes in with the copy_notify stateid, it's used to
look up the parent state, because nfs4_preprocess_stateid_op(), which
checks the validity of the stateid for a given operation, needs to
check the validity of that parent stateid. Otherwise, we'd have to
special-case READ in nfs4_preprocess_stateid_op(), so that when the
function is called from READ and finds a copy_notify stateid it
forgoes the other checks. Do you want me to do that instead
of what I proposed below?
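
For reference, the lookup flow described above looks roughly like the
following user-space sketch. Everything here is hypothetical (the
types, ST_BAD_STATEID and check_read_stateid() are made-up names, and
arrays stand in for the real tables); it only shows the order of the
checks: the normal stateids first, then the copy notify list, then the
usual validity check applied to the parent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum status { ST_OK, ST_BAD_STATEID };   /* hypothetical result codes */

struct stid {
	unsigned int id;
	bool valid;               /* stands in for the usual validity checks */
};

struct cpntf {
	unsigned int id;
	struct stid *parent;      /* open/lock/delegation stateid behind it */
};

static enum status check_read_stateid(struct stid *stids, size_t ns,
				      struct cpntf *cps, size_t nc,
				      unsigned int id, struct stid **out)
{
	struct stid *s = NULL;
	size_t i;

	assert(out != NULL);

	/* 1. The normal list of stateids. */
	for (i = 0; i < ns && !s; i++)
		if (stids[i].id == id)
			s = &stids[i];

	/* 2. Otherwise the copy notify list, resolved to the parent. */
	for (i = 0; i < nc && !s; i++)
		if (cps[i].id == id)
			s = cps[i].parent;

	/* 3. Whatever was found must still pass the parent's checks. */
	if (!s || !s->valid)
		return ST_BAD_STATEID;

	*out = s;
	return ST_OK;
}
```

The sketch returns a bare pointer to the parent, which is exactly where
the lifetime question in this thread comes from: something has to
guarantee the parent is still there when step 3 runs.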

>
> --b.
>
> > Since I take a reference on the stateid, then during what would have
> > been the last put (due to say a close operation), stateid isn't
> > released. Now that stateid is sticking around. I personally would have
> > liked on what would have been a close and release of the stateid to
> > release the copy notify state(s) (which was being done before but
> > having a reference makes it hard? i want to count number of copy
> > notify states and if then somehow if the num_copies-1 is going to make
> > it 0, then decrement by num_copies (and the normal -1) but if it's not
> > the last reference then it shouldn't be decremented.
> >
> > Now say no fancy logic happens on close so we have these stateids left
> > over . What to do on unmount? It will error with err_client_busy since
> > there are non-zero copy notify states and only after a lease period it
> > will release the resources (when the close of the file should have
> > removed any copy notify state)?
> >
> > Question: would it be acceptable to do something like this on freeing
> > of the parent stateid?
> >
> > @@ -896,8 +931,12 @@ static void block_delegations(struct knfsd_fh *fh)
> >         might_lock(&clp->cl_lock);
> >
> >         if (!refcount_dec_and_lock(&s->sc_count, &clp->cl_lock)) {
> > -               wake_up_all(&close_wq);
> > -               return;
> > +               if (!refcount_sub_and_test_checked(s->sc_cp_list_size,
> > +                               &s->sc_count)) {
> > +                       refcount_add_checked(s->sc_cp_list_size, &s->sc_count);
> > +                       wake_up_all(&close_wq);
> > +                       return;
> > +               }
> >         }
> >         idr_remove(&clp->cl_stateids, s->sc_stateid.si_opaque.so_id);
> >         spin_unlock(&clp->cl_lock);
> >
> > then free the copy notify stateids associated with stateid.
> >
> > Laundromat would still be checking the copy_notify stateids for
> > anything that's been not active for a while (but not closed).
> >
> >
> >
> >
> >
> > >
> > > --b.


* Re: [PATCH v4 5/8] NFSD check stateids against copy stateids
  2019-08-07 16:42                                 ` Olga Kornievskaia
@ 2019-08-08 11:25                                   ` J. Bruce Fields
  0 siblings, 0 replies; 51+ messages in thread
From: J. Bruce Fields @ 2019-08-08 11:25 UTC (permalink / raw)
  To: Olga Kornievskaia; +Cc: J. Bruce Fields, linux-nfs

On Wed, Aug 07, 2019 at 12:42:08PM -0400, Olga Kornievskaia wrote:
> On Wed, Aug 7, 2019 at 12:09 PM J. Bruce Fields <bfields@fieldses.org> wrote:
> >
> > On Wed, Aug 07, 2019 at 12:02:40PM -0400, Olga Kornievskaia wrote:
> > > On Thu, Aug 1, 2019 at 3:36 PM J. Bruce Fields <bfields@redhat.com> wrote:
> > > >
> > > > On Thu, Aug 01, 2019 at 02:24:04PM -0400, Olga Kornievskaia wrote:
> > > > > i was just looking at close_lru and delegation_lru but I guess that's
> > > > > not a list of delegation or open stateids but rather some complex of
> > > > > not deleting the stateid right away but moving it to nfs4_ol_stateid
> > > > > and the list on the nfsd_net. Are you looking for something similar
> > > > > for the copy_notify state or can I just keep a global list of the
> > > > > nfs4_client and add and delete of that (not move to the delete later)?
> > > >
> > > > A global list seems like it should work if the locking's OK.
> > >
> > > I'm having issues taking a reference on a parent stateid and being
> > > able to clean it. Let me try to explain.
> >
> > With other stateid parent relationships I believe what we do is: instead
> > of the child taking a reference on the parent, we ensure that the child
> > is destroyed, and that nobody can be holding a pointer to it, before we
> > destroy the parent.
> 
> I don't think we can get away from not taking a reference on the
> parent. When a READ comes with the copy_notify stateid, it's used to
> lookup the parent state because the nfs4_preprocess_stateid_op() that
> checks the validity of the stateid for a given operation needs to
> check validity of that parent stateid). Otherwise, we'd have to
> special case the READ calling nfs4_preprocess_stateid_op() and special
> call that function to when called from READ and finding a copy_notify
> stateid will forego the other checks. Do you want me to that instead
> of what I proposed below?

Um, honestly I'm not sure I understand your code below yet.  I'll take
another look....

> > > Since I take a reference on the stateid, then during what would have
> > > been the last put (due to say a close operation), stateid isn't
> > > released. Now that stateid is sticking around. I personally would have
> > > liked on what would have been a close and release of the stateid to
> > > release the copy notify state(s)

That's OK with me as long as it works.  Did I complain about it?  The
only real requirement is that we've got *some* way to ensure that we
aren't going to find a copy_notify stateid and try to follow it to its
parent after the parent's been freed.

--b.

> > > (which was being done before but
> > > having a reference makes it hard? i want to count number of copy
> > > notify states and if then somehow if the num_copies-1 is going to make
> > > it 0, then decrement by num_copies (and the normal -1) but if it's not
> > > the last reference then it shouldn't be decremented.
> > >
> > > Now say no fancy logic happens on close so we have these stateids left
> > > over . What to do on unmount? It will error with err_client_busy since
> > > there are non-zero copy notify states and only after a lease period it
> > > will release the resources (when the close of the file should have
> > > removed any copy notify state)?
> > >
> > > Question: would it be acceptable to do something like this on freeing
> > > of the parent stateid?
> > >
> > > @@ -896,8 +931,12 @@ static void block_delegations(struct knfsd_fh *fh)
> > >         might_lock(&clp->cl_lock);
> > >
> > >         if (!refcount_dec_and_lock(&s->sc_count, &clp->cl_lock)) {
> > > -               wake_up_all(&close_wq);
> > > -               return;
> > > +               if (!refcount_sub_and_test_checked(s->sc_cp_list_size,
> > > +                               &s->sc_count)) {
> > > +                       refcount_add_checked(s->sc_cp_list_size, &s->sc_count);
> > > +                       wake_up_all(&close_wq);
> > > +                       return;
> > > +               }
> > >         }
> > >         idr_remove(&clp->cl_stateids, s->sc_stateid.si_opaque.so_id);
> > >         spin_unlock(&clp->cl_lock);
> > >
> > > then free the copy notify stateids associated with stateid.
> > >
> > > Laundromat would still be checking the copy_notify stateids for
> > > anything that's been not active for a while (but not closed).
> > >
> > >
> > >
> > >
> > >
> > > >
> > > > --b.



Thread overview: 51+ messages
2019-07-08 19:23 [PATCH v4 0/8] server-side support for "inter" SSC copy Olga Kornievskaia
2019-07-08 19:23 ` [PATCH v4 1/8] NFSD fill-in netloc4 structure Olga Kornievskaia
2019-07-17 21:13   ` bfields
2019-07-22 19:59     ` Olga Kornievskaia
2019-07-30 15:48       ` Olga Kornievskaia
2019-07-30 15:51         ` J. Bruce Fields
2019-07-08 19:23 ` [PATCH v4 2/8] NFSD add ca_source_server<> to COPY Olga Kornievskaia
2019-07-17 21:40   ` bfields
2019-07-22 20:00     ` Olga Kornievskaia
2019-07-08 19:23 ` [PATCH v4 3/8] NFSD return nfs4_stid in nfs4_preprocess_stateid_op Olga Kornievskaia
2019-07-08 19:23 ` [PATCH v4 4/8] NFSD add COPY_NOTIFY operation Olga Kornievskaia
2019-07-09 12:34   ` Anna Schumaker
2019-07-09 15:51     ` Olga Kornievskaia
2019-07-17 22:12   ` bfields
2019-07-17 22:15   ` bfields
2019-07-22 20:03     ` Olga Kornievskaia
2019-07-17 23:07   ` bfields
2019-07-22 20:17     ` Olga Kornievskaia
2019-07-23 20:45       ` J. Bruce Fields
2019-07-30 15:48         ` Olga Kornievskaia
2019-07-30 15:55           ` J. Bruce Fields
2019-07-30 16:13             ` Olga Kornievskaia
2019-07-30 17:10               ` Olga Kornievskaia
2019-07-08 19:23 ` [PATCH v4 5/8] NFSD check stateids against copy stateids Olga Kornievskaia
2019-07-19 22:01   ` bfields
2019-07-22 20:24     ` Olga Kornievskaia
2019-07-23 20:58       ` J. Bruce Fields
2019-07-30 16:03         ` Olga Kornievskaia
2019-07-31 21:10           ` Olga Kornievskaia
2019-07-31 21:51             ` J. Bruce Fields
2019-08-01 14:12               ` Olga Kornievskaia
2019-08-01 15:12                 ` J. Bruce Fields
2019-08-01 15:41                   ` Olga Kornievskaia
2019-08-01 18:06                     ` Olga Kornievskaia
2019-08-01 18:11                       ` J. Bruce Fields
2019-08-01 18:24                         ` Olga Kornievskaia
2019-08-01 19:36                           ` J. Bruce Fields
2019-08-07 16:02                             ` Olga Kornievskaia
2019-08-07 16:08                               ` J. Bruce Fields
2019-08-07 16:42                                 ` Olga Kornievskaia
2019-08-08 11:25                                   ` J. Bruce Fields
2019-07-08 19:23 ` [PATCH v4 6/8] NFSD generalize nfsd4_compound_state flag names Olga Kornievskaia
2019-07-08 19:23 ` [PATCH v4 7/8] NFSD: allow inter server COPY to have a STALE source server fh Olga Kornievskaia
2019-07-23 21:35   ` bfields
2019-07-30 15:48     ` Olga Kornievskaia
2019-07-08 19:23 ` [PATCH v4 8/8] NFSD add nfs4 inter ssc to nfsd4_copy Olga Kornievskaia
2019-07-09 12:43   ` Anna Schumaker
2019-07-09 15:53     ` Olga Kornievskaia
2019-07-09  3:53 ` [PATCH v4 0/8] server-side support for "inter" SSC copy bfields
2019-07-09 15:47   ` Olga Kornievskaia
2019-07-17 18:05     ` Olga Kornievskaia
