linux-fsdevel.vger.kernel.org archive mirror
* [PATCH v2 00/10] server-side support for "inter" SSC copy
@ 2018-11-30 20:03 Olga Kornievskaia
  2018-11-30 20:03 ` [PATCH v2 01/10] VFS generic copy_file_range() support Olga Kornievskaia
                   ` (9 more replies)
  0 siblings, 10 replies; 30+ messages in thread
From: Olga Kornievskaia @ 2018-11-30 20:03 UTC (permalink / raw)
  To: bfields; +Cc: linux-nfs, linux-fsdevel

This patch series adds support for the NFSv4.2 copy offload feature,
allowing a copy between two different NFS servers.

This functionality depends on the VFS's ability to support a generic
copy_file_range() where a copy is done between an NFS file and
a file on a local file system.

This feature is enabled by the kernel module parameter
inter_copy_offload_enable and is disabled by default. There is
also a kernel compile configuration option, NFSD_V4_2_INTER_SSC, that
adds a dependency on the NFS client side functions called from the
server.

These patches work on top of existing async intra copy offload
patches. For the "inter" SSC, the implementation only supports
asynchronous inter copy.

On the source server, upon receiving a COPY_NOTIFY, it generates a
unique stateid that's kept in the global list. Upon receiving a READ
with a stateid, the code checks the normal list of open stateids and
now additionally checks the copy state list before deciding whether
to fail with BAD_STATEID or use the one that matches.
The stored stateid must be used for the first time within a chosen
lease period (currently 90s). When the source server receives an
OFFLOAD_CANCEL, it removes the stateid from the global list.
Otherwise, the copy stateid is removed upon the removal of its
"parent" stateid (open/lock/delegation stateid).
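
The first-use rule above can be sketched in plain userspace C. This is
only an illustration under stated assumptions: the struct and function
names are hypothetical (not the kernel's), time is passed in explicitly,
and it assumes that only the first use is bound by the lease period.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical, simplified stand-in for a copy-notify stateid record. */
struct cpntf_state {
	bool   active;   /* set once the first READ has used the stateid */
	time_t expires;  /* deadline for that first use */
};

/* Record a freshly issued stateid; lease_secs would be 90 per the text. */
static void cpntf_issue(struct cpntf_state *cps, time_t now, time_t lease_secs)
{
	assert(cps);
	cps->active = false;
	cps->expires = now + lease_secs;
}

/*
 * Return true if a READ presenting this stateid may proceed.
 * Assumption: once activated, the stateid stays usable until it is
 * removed (OFFLOAD_CANCEL or removal of the parent stateid).
 */
static bool cpntf_may_read(struct cpntf_state *cps, time_t now)
{
	assert(cps);
	if (!cps->active && now > cps->expires)
		return false;
	cps->active = true;
	return true;
}
```

A stateid first used at second 50 of a 90 second lease keeps working
afterwards, while one first presented at second 91 is refused.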

On the destination server, upon receiving a COPY request, the server
establishes the necessary clientid/session with the source server.
It calls into the NFS client code to establish the necessary
open stateid, filehandle, and file description (without doing an NFS open).
Then the server calls into copy_file_range() to perform the copy,
where reads of the source file issue NFS READs followed by local file
system writes (this depends on the VFS's ability to do a cross-device
copy_file_range()).

v2:
-- is on top of 4.20-rc4 + client side inter patch series
-- VFS changes to enable generic copy_file_range(); NFS then
falls back on generic_copy_file_range() for the previous EXDEV/EOPNOTSUPP
errors
-- hopefully addressed Bruce's review comments (highlights are):
   --- copy_notify patch: addressed naming, sc_cp_list access is
now protected by s2s_cp_lock
   --- fill-in netloc4 patch: addressed the size and added WARN_ON
   --- add ca_source to COPY: decode only 1 address, don't allocate
memory (the rest go into a dummy)
   --- check stateid against stored: moved the refcount under lock
   --- allow stale filehandle: added a loop to go through the ops in
the compound, store/manage putfh if copy is present in the compound,
mark the source putfh as "no verify".

All the patches (client inter and this patch series) are available
from git://linux-nfs.org/projects/aglo/linux.git under the "linux-ssc"
branch.

Olga Kornievskaia (10):
  VFS generic copy_file_range() support
  NFS fallback to generic_copy_file_range
  NFSD fill-in netloc4 structure
  NFSD add ca_source_server<> to COPY
  NFSD return nfs4_stid in nfs4_preprocess_stateid_op
  NFSD add COPY_NOTIFY operation
  NFSD check stateids against copy stateids
  NFSD generalize nfsd4_compound_state flag names
  NFSD: allow inter server COPY to have a STALE source server fh
  NFSD add nfs4 inter ssc to nfsd4_copy

 fs/nfs/nfs4file.c    |   9 +-
 fs/nfsd/Kconfig      |  10 ++
 fs/nfsd/nfs4proc.c   | 406 ++++++++++++++++++++++++++++++++++++++++++++++-----
 fs/nfsd/nfs4state.c  | 124 ++++++++++++++--
 fs/nfsd/nfs4xdr.c    | 166 ++++++++++++++++++++-
 fs/nfsd/nfsd.h       |  32 ++++
 fs/nfsd/nfsfh.h      |   5 +-
 fs/nfsd/nfssvc.c     |   6 +
 fs/nfsd/state.h      |  21 ++-
 fs/nfsd/xdr4.h       |  37 ++++-
 fs/read_write.c      |  66 +++++++--
 include/linux/fs.h   |   7 +
 include/linux/nfs4.h |   1 +
 mm/filemap.c         |   6 +-
 14 files changed, 810 insertions(+), 86 deletions(-)

-- 
1.8.3.1


* [PATCH v2 01/10] VFS generic copy_file_range() support
  2018-11-30 20:03 [PATCH v2 00/10] server-side support for "inter" SSC copy Olga Kornievskaia
@ 2018-11-30 20:03 ` Olga Kornievskaia
  2018-12-01  8:11   ` Amir Goldstein
  2018-12-01 21:18   ` Matthew Wilcox
  2018-11-30 20:03 ` [PATCH v2 02/10] NFS fallback to generic_copy_file_range Olga Kornievskaia
                   ` (8 subsequent siblings)
  9 siblings, 2 replies; 30+ messages in thread
From: Olga Kornievskaia @ 2018-11-30 20:03 UTC (permalink / raw)
  To: bfields; +Cc: linux-nfs, linux-fsdevel

Relax the condition that input files must be from the same
file system.

Add checks that input parameters adhere to the expected semantics.

If no copy_file_range() support is found, then do generic
checks for the unsupported page cache ranges, LFS limits,
and clear setuid/setgid if not running as root before calling
do_splice_direct(). Update atime, ctime, and mtime afterwards.
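
As a rough userspace illustration of the parameter checks described
above (offsets must not wrap, and the source range must lie within EOF),
here is a minimal sketch. The function name and the use of fixed-width
integer types are assumptions for the example, and the comparison is
written so that it cannot itself overflow.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Validate a copy request against the source file size.
 * Returns 0 if the range is acceptable, -EINVAL otherwise.
 */
static int check_copy_range(int64_t pos_in, int64_t pos_out,
			    uint64_t len, int64_t size_in)
{
	assert(size_in >= 0);

	/* Assumption for this sketch: negative offsets are rejected here. */
	if (pos_in < 0 || pos_out < 0)
		return -EINVAL;
	/* Offsets must not wrap; compare without overflowing. */
	if (len > (uint64_t)(INT64_MAX - pos_in) ||
	    len > (uint64_t)(INT64_MAX - pos_out))
		return -EINVAL;
	/* The source range must lie entirely before EOF. */
	if (pos_in >= size_in || (uint64_t)(size_in - pos_in) < len)
		return -EINVAL;
	return 0;
}
```

With a 100-byte source, copying 90 bytes from offset 10 passes, while
copying 51 bytes from offset 50 is rejected.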

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
---
 fs/read_write.c    | 66 ++++++++++++++++++++++++++++++++++++++++++------------
 include/linux/fs.h |  7 ++++++
 mm/filemap.c       |  6 ++---
 3 files changed, 61 insertions(+), 18 deletions(-)

diff --git a/fs/read_write.c b/fs/read_write.c
index 7b9e59d..2d309b0 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1540,6 +1540,44 @@ static ssize_t do_sendfile(int out_fd, int in_fd, loff_t *ppos,
 }
 #endif
 
+ssize_t generic_copy_file_range(struct file *file_in, loff_t pos_in,
+				struct file *file_out, loff_t pos_out,
+				loff_t len, unsigned int flags)
+{
+	ssize_t ret;
+	loff_t size_in = i_size_read(file_inode(file_in)), count;
+
+	/* perform generic checks for unsupported page cache ranges, LFS
+	 * limits. If pos exceeds the limit, returns EFBIG
+	 */
+	count = min(len, size_in - pos_in);
+	ret = generic_access_check_limits(file_in, pos_in, &count);
+	if (ret)
+		goto done;
+	ret = generic_write_check_limits(file_out, pos_out, &count);
+	if (ret)
+		goto done;
+	/* If not running as root, clear setuid/setgid bits. This keeps
+	 * people from modifying setuid and setgid binaries.
+	 */
+	if (!IS_NOSEC(file_inode(file_out))) {
+		ret = file_remove_privs(file_out);
+		if (ret)
+			goto done;
+	}
+
+	ret = do_splice_direct(file_in, &pos_in, file_out, &pos_out,
+			count > MAX_RW_COUNT ? MAX_RW_COUNT : count, 0);
+
+	file_accessed(file_in);
+	if (!(file_out->f_mode & FMODE_NOCMTIME))
+		file_update_time(file_out);
+
+done:
+	return ret;
+}
+EXPORT_SYMBOL(generic_copy_file_range);
+
 /*
  * copy_file_range() differs from regular file read and write in that it
  * specifically allows return partial success.  When it does so is up to
@@ -1552,6 +1590,7 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
 	struct inode *inode_in = file_inode(file_in);
 	struct inode *inode_out = file_inode(file_out);
 	ssize_t ret;
+	loff_t size_in;
 
 	if (flags != 0)
 		return -EINVAL;
@@ -1577,6 +1616,15 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
 	if (len == 0)
 		return 0;
 
+	/* Ensure offsets don't wrap. */
+	if (pos_in + len < pos_in || pos_out + len < pos_out)
+		return -EINVAL;
+
+	size_in = i_size_read(inode_in);
+	/* Ensure that source range is within EOF. */
+	if (pos_in >= size_in || pos_in + len > size_in)
+		return -EINVAL;
+
 	file_start_write(file_out);
 
 	/*
@@ -1597,22 +1645,12 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
 		}
 	}
 
-	if (file_out->f_op->copy_file_range) {
+	if (file_out->f_op->copy_file_range)
 		ret = file_out->f_op->copy_file_range(file_in, pos_in, file_out,
 						      pos_out, len, flags);
-		if (ret != -EOPNOTSUPP)
-			goto done;
-	}
-
-	/* this could be relaxed once generic cross fs support is added */
-	if (inode_in->i_sb != inode_out->i_sb) {
-		ret = -EXDEV;
-		goto done;
-	}
-
-	ret = do_splice_direct(file_in, &pos_in, file_out, &pos_out,
-			len > MAX_RW_COUNT ? MAX_RW_COUNT : len, 0);
-
+	else
+		ret = generic_copy_file_range(file_in, pos_in, file_out,
+					      pos_out, len, flags);
 done:
 	if (ret > 0) {
 		fsnotify_access(file_in);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index c95c080..c88ad09 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1874,6 +1874,9 @@ extern ssize_t vfs_readv(struct file *, const struct iovec __user *,
 		unsigned long, loff_t *, rwf_t);
 extern ssize_t vfs_copy_file_range(struct file *, loff_t , struct file *,
 				   loff_t, size_t, unsigned int);
+extern ssize_t generic_copy_file_range(struct file *file_in, loff_t pos_in,
+				       struct file *file_out, loff_t pos_out,
+				       loff_t len, unsigned int flags);
 extern int generic_remap_file_range_prep(struct file *file_in, loff_t pos_in,
 					 struct file *file_out, loff_t pos_out,
 					 loff_t *count,
@@ -3016,6 +3019,10 @@ static inline void remove_inode_hash(struct inode *inode)
 extern int generic_file_mmap(struct file *, struct vm_area_struct *);
 extern int generic_file_readonly_mmap(struct file *, struct vm_area_struct *);
 extern ssize_t generic_write_checks(struct kiocb *, struct iov_iter *);
+extern int generic_access_check_limits(struct file *file, loff_t pos,
+				       loff_t *count);
+extern int generic_write_check_limits(struct file *file, loff_t pos,
+				      loff_t *count);
 extern int generic_remap_checks(struct file *file_in, loff_t pos_in,
 				struct file *file_out, loff_t pos_out,
 				loff_t *count, unsigned int remap_flags);
diff --git a/mm/filemap.c b/mm/filemap.c
index 81adec8..894f3ae 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2829,8 +2829,7 @@ struct page *read_cache_page_gfp(struct address_space *mapping,
  * LFS limits.  If pos is under the limit it becomes a short access.  If it
  * exceeds the limit we return -EFBIG.
  */
-static int generic_access_check_limits(struct file *file, loff_t pos,
-				       loff_t *count)
+int generic_access_check_limits(struct file *file, loff_t pos, loff_t *count)
 {
 	struct inode *inode = file->f_mapping->host;
 	loff_t max_size = inode->i_sb->s_maxbytes;
@@ -2844,8 +2843,7 @@ static int generic_access_check_limits(struct file *file, loff_t pos,
 	return 0;
 }
 
-static int generic_write_check_limits(struct file *file, loff_t pos,
-				      loff_t *count)
+int generic_write_check_limits(struct file *file, loff_t pos, loff_t *count)
 {
 	loff_t limit = rlimit(RLIMIT_FSIZE);
 
-- 
1.8.3.1


* [PATCH v2 02/10] NFS fallback to generic_copy_file_range
  2018-11-30 20:03 [PATCH v2 00/10] server-side support for "inter" SSC copy Olga Kornievskaia
  2018-11-30 20:03 ` [PATCH v2 01/10] VFS generic copy_file_range() support Olga Kornievskaia
@ 2018-11-30 20:03 ` Olga Kornievskaia
  2018-11-30 20:03 ` [PATCH v2 03/10] NFSD fill-in netloc4 structure Olga Kornievskaia
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 30+ messages in thread
From: Olga Kornievskaia @ 2018-11-30 20:03 UTC (permalink / raw)
  To: bfields; +Cc: linux-nfs, linux-fsdevel

If NFS is unable to handle the copy, fall back to the generic VFS
copy_file_range functionality.

Also remove the offset check, as the check was added to the VFS.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
---
 fs/nfs/nfs4file.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/fs/nfs/nfs4file.c b/fs/nfs/nfs4file.c
index 4fe9fc1..78e163a 100644
--- a/fs/nfs/nfs4file.c
+++ b/fs/nfs/nfs4file.c
@@ -139,17 +139,16 @@ static ssize_t nfs4_copy_file_range(struct file *file_in, loff_t pos_in,
 	nfs4_stateid *cnrs = NULL;
 	ssize_t ret;
 
-	if (pos_in >= i_size_read(file_inode(file_in)))
-		return -EINVAL;
-
 	if (file_in->f_op != &nfs4_file_operations)
-		return -EXDEV;
+		return generic_copy_file_range(file_in, pos_in, file_out,
+					pos_out, count, flags);
 
 	if (file_inode(file_in) == file_inode(file_out))
 		return -EINVAL;
 
 	if (!nfs_server_capable(file_inode(file_out), NFS_CAP_COPY))
-		return -EOPNOTSUPP;
+		return generic_copy_file_range(file_in, pos_in, file_out,
+					pos_out, count, flags);
 retry:
 	if (!nfs42_files_from_same_server(file_in, file_out)) {
 		cn_resp = kzalloc(sizeof(struct nfs42_copy_notify_res),
-- 
1.8.3.1


* [PATCH v2 03/10] NFSD fill-in netloc4 structure
  2018-11-30 20:03 [PATCH v2 00/10] server-side support for "inter" SSC copy Olga Kornievskaia
  2018-11-30 20:03 ` [PATCH v2 01/10] VFS generic copy_file_range() support Olga Kornievskaia
  2018-11-30 20:03 ` [PATCH v2 02/10] NFS fallback to generic_copy_file_range Olga Kornievskaia
@ 2018-11-30 20:03 ` Olga Kornievskaia
  2018-11-30 20:03 ` [PATCH v2 04/10] NFSD add ca_source_server<> to COPY Olga Kornievskaia
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 30+ messages in thread
From: Olga Kornievskaia @ 2018-11-30 20:03 UTC (permalink / raw)
  To: bfields; +Cc: linux-nfs, linux-fsdevel

nfs4.h defines the nfs42_netaddr structure that represents netloc4.

Populate the needed fields from the sockaddr structure.

This will be used by flexfiles and 4.2 inter copy.
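
For illustration, the address string built from the sockaddr is the
host in presentation form followed by the port as two decimal octets.
A minimal userspace sketch, assuming a hypothetical helper name and a
host string that has already been formatted:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Append the port to a presentation-format host address as two
 * decimal octets (high byte, then low byte).
 * Returns the string length, or -1 if buf is too small.
 */
static int format_uaddr(char *buf, size_t buflen,
			const char *host, unsigned int port)
{
	int n;

	assert(buf && host && port <= 0xffff);

	n = snprintf(buf, buflen, "%s.%u.%u", host, port >> 8, port & 0xff);
	if (n < 0 || (size_t)n >= buflen)
		return -1;
	return n;
}
```

For host 192.0.2.1 and port 2049 this produces "192.0.2.1.8.1", since
2049 is 8 * 256 + 1. Unlike the WARN_ON in the patch, the sketch reports
truncation to the caller.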

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
---
 fs/nfsd/nfsd.h | 32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h
index 0668999..a8fec63 100644
--- a/fs/nfsd/nfsd.h
+++ b/fs/nfsd/nfsd.h
@@ -18,6 +18,7 @@
 #include <linux/nfs4.h>
 #include <linux/sunrpc/svc.h>
 #include <linux/sunrpc/msg_prot.h>
+#include <linux/sunrpc/addr.h>
 
 #include <uapi/linux/nfsd/debug.h>
 
@@ -366,6 +367,37 @@ static inline bool nfsd4_spo_must_allow(struct svc_rqst *rqstp)
 
 extern const u32 nfsd_suppattrs[3][3];
 
+static inline u32 nfsd4_set_netaddr(struct sockaddr *addr,
+				    struct nfs42_netaddr *netaddr)
+{
+	struct sockaddr_in *sin = (struct sockaddr_in *)addr;
+	struct sockaddr_in6 *sin6 = (struct sockaddr_in6 *)addr;
+	unsigned int port;
+	size_t ret_addr, ret_port;
+
+	switch (addr->sa_family) {
+	case AF_INET:
+		port = ntohs(sin->sin_port);
+		sprintf(netaddr->netid, "tcp");
+		netaddr->netid_len = 3;
+		break;
+	case AF_INET6:
+		port = ntohs(sin6->sin6_port);
+		sprintf(netaddr->netid, "tcp6");
+		netaddr->netid_len = 4;
+		break;
+	default:
+		return nfserr_inval;
+	}
+	ret_addr = rpc_ntop(addr, netaddr->addr, sizeof(netaddr->addr));
+	ret_port = snprintf(netaddr->addr + ret_addr,
+			    RPCBIND_MAXUADDRLEN + 1 - ret_addr,
+			    ".%u.%u", port >> 8, port & 0xff);
+	WARN_ON(ret_port >= RPCBIND_MAXUADDRLEN + 1 - ret_addr);
+	netaddr->addr_len = ret_addr + ret_port;
+	return 0;
+}
+
 static inline bool bmval_is_subset(const u32 *bm1, const u32 *bm2)
 {
 	return !((bm1[0] & ~bm2[0]) ||
-- 
1.8.3.1


* [PATCH v2 04/10] NFSD add ca_source_server<> to COPY
  2018-11-30 20:03 [PATCH v2 00/10] server-side support for "inter" SSC copy Olga Kornievskaia
                   ` (2 preceding siblings ...)
  2018-11-30 20:03 ` [PATCH v2 03/10] NFSD fill-in netloc4 structure Olga Kornievskaia
@ 2018-11-30 20:03 ` Olga Kornievskaia
  2019-02-19 16:17   ` J. Bruce Fields
  2018-11-30 20:03 ` [PATCH v2 05/10] NFSD return nfs4_stid in nfs4_preprocess_stateid_op Olga Kornievskaia
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 30+ messages in thread
From: Olga Kornievskaia @ 2018-11-30 20:03 UTC (permalink / raw)
  To: bfields; +Cc: linux-nfs, linux-fsdevel

Decode the ca_source_server list that's sent, but only use the
first entry. The presence of a non-empty list indicates an "inter" copy.
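
The "decode everything, keep only the first" approach can be sketched
in userspace as follows. This is a simplification under stated
assumptions: each list entry is treated as a plain XDR opaque (a
big-endian length followed by data padded to 4 bytes) rather than the
real netloc4 union, and all names and the MAX_NAME cap are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_NAME 64	/* hypothetical cap on one entry's length */

/* Read one big-endian 32-bit word, advancing *p; -1 if out of data. */
static int get_u32(const uint8_t **p, const uint8_t *end, uint32_t *out)
{
	if (end - *p < 4)
		return -1;
	*out = ((uint32_t)(*p)[0] << 24) | ((uint32_t)(*p)[1] << 16) |
	       ((uint32_t)(*p)[2] << 8) | (uint32_t)(*p)[3];
	*p += 4;
	return 0;
}

/* Decode one length-prefixed, 4-byte-padded opaque into dst. */
static int get_opaque(const uint8_t **p, const uint8_t *end,
		      char *dst, uint32_t *len)
{
	uint32_t padded;

	if (get_u32(p, end, len) || *len > MAX_NAME)
		return -1;
	padded = (*len + 3) & ~3u;
	if ((size_t)(end - *p) < padded)
		return -1;
	memcpy(dst, *p, *len);
	*p += padded;
	return 0;
}

/*
 * Decode a counted list of source servers. The first entry is kept in
 * first[]; the rest are decoded into a scratch buffer and dropped.
 * Returns the entry count (0 means "intra" copy), or -1 on bad input.
 */
static int decode_sources(const uint8_t *buf, size_t buflen,
			  char first[MAX_NAME + 1])
{
	const uint8_t *p = buf, *end = buf + buflen;
	char scratch[MAX_NAME];
	uint32_t count, len, i;

	assert(buf && first);

	if (get_u32(&p, end, &count))
		return -1;
	for (i = 0; i < count; i++) {
		if (get_opaque(&p, end, i ? scratch : first, &len))
			return -1;
		if (i == 0)
			first[len] = '\0';
	}
	return (int)count;
}
```

Decoding the extra entries into a fixed scratch buffer avoids any
allocation while still consuming the whole list, so whatever follows
in the request stays aligned.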

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
---
 fs/nfsd/nfs4xdr.c | 69 +++++++++++++++++++++++++++++++++++++++++++++++++++++--
 fs/nfsd/xdr4.h    | 12 ++++++----
 2 files changed, 74 insertions(+), 7 deletions(-)

diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 3de42a7..879ddc6 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -40,6 +40,7 @@
 #include <linux/utsname.h>
 #include <linux/pagemap.h>
 #include <linux/sunrpc/svcauth_gss.h>
+#include <linux/sunrpc/addr.h>
 
 #include "idmap.h"
 #include "acl.h"
@@ -1743,11 +1744,58 @@ static __be32 nfsd4_decode_reclaim_complete(struct nfsd4_compoundargs *argp, str
 	DECODE_TAIL;
 }
 
+static __be32 nfsd4_decode_nl4_server(struct nfsd4_compoundargs *argp,
+				      struct nl4_server *ns)
+{
+	DECODE_HEAD;
+	struct nfs42_netaddr *naddr;
+
+	READ_BUF(4);
+	ns->nl4_type = be32_to_cpup(p++);
+
+	/* currently support for 1 inter-server source server */
+	switch (ns->nl4_type) {
+	case NL4_NAME:
+	case NL4_URL:
+		READ_BUF(4);
+		ns->u.nl4_str_sz = be32_to_cpup(p++);
+		if (ns->u.nl4_str_sz > NFS4_OPAQUE_LIMIT)
+			goto xdr_error;
+
+		READ_BUF(ns->u.nl4_str_sz);
+		COPYMEM(ns->u.nl4_str,
+			ns->u.nl4_str_sz);
+		break;
+	case NL4_NETADDR:
+		naddr = &ns->u.nl4_addr;
+
+		READ_BUF(4);
+		naddr->netid_len = be32_to_cpup(p++);
+		if (naddr->netid_len > RPCBIND_MAXNETIDLEN)
+			goto xdr_error;
+
+		READ_BUF(naddr->netid_len + 4); /* 4 for uaddr len */
+		COPYMEM(naddr->netid, naddr->netid_len);
+
+		naddr->addr_len = be32_to_cpup(p++);
+		if (naddr->addr_len > RPCBIND_MAXUADDRLEN)
+			goto xdr_error;
+
+		READ_BUF(naddr->addr_len);
+		COPYMEM(naddr->addr, naddr->addr_len);
+		break;
+	default:
+		goto xdr_error;
+	}
+	DECODE_TAIL;
+}
+
 static __be32
 nfsd4_decode_copy(struct nfsd4_compoundargs *argp, struct nfsd4_copy *copy)
 {
 	DECODE_HEAD;
-	unsigned int tmp;
+	struct nl4_server ns_dummy;
+	int i, count;
 
 	status = nfsd4_decode_stateid(argp, &copy->cp_src_stateid);
 	if (status)
@@ -1762,8 +1810,25 @@ static __be32 nfsd4_decode_reclaim_complete(struct nfsd4_compoundargs *argp, str
 	p = xdr_decode_hyper(p, &copy->cp_count);
 	p++; /* ca_consecutive: we always do consecutive copies */
 	copy->cp_synchronous = be32_to_cpup(p++);
-	tmp = be32_to_cpup(p); /* Source server list not supported */
+	count = be32_to_cpup(p++);
 
+	copy->cp_intra = false;
+	if (count == 0) { /* intra-server copy */
+		copy->cp_intra = true;
+		goto intra;
+	}
+
+	/* decode all the supplied server addresses but use first */
+	status = nfsd4_decode_nl4_server(argp, &copy->cp_src);
+	if (status)
+		return status;
+
+	for (i = 0; i < count - 1; i++) {
+		status = nfsd4_decode_nl4_server(argp, &ns_dummy);
+		if (status)
+			return status;
+	}
+intra:
 	DECODE_TAIL;
 }
 
diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h
index feeb6d4..513c9ff 100644
--- a/fs/nfsd/xdr4.h
+++ b/fs/nfsd/xdr4.h
@@ -516,11 +516,13 @@ struct nfsd42_write_res {
 
 struct nfsd4_copy {
 	/* request */
-	stateid_t	cp_src_stateid;
-	stateid_t	cp_dst_stateid;
-	u64		cp_src_pos;
-	u64		cp_dst_pos;
-	u64		cp_count;
+	stateid_t		cp_src_stateid;
+	stateid_t		cp_dst_stateid;
+	u64			cp_src_pos;
+	u64			cp_dst_pos;
+	u64			cp_count;
+	struct nl4_server	cp_src;
+	bool			cp_intra;
 
 	/* both */
 	bool		cp_synchronous;
-- 
1.8.3.1


* [PATCH v2 05/10] NFSD return nfs4_stid in nfs4_preprocess_stateid_op
  2018-11-30 20:03 [PATCH v2 00/10] server-side support for "inter" SSC copy Olga Kornievskaia
                   ` (3 preceding siblings ...)
  2018-11-30 20:03 ` [PATCH v2 04/10] NFSD add ca_source_server<> to COPY Olga Kornievskaia
@ 2018-11-30 20:03 ` Olga Kornievskaia
  2018-11-30 20:03 ` [PATCH v2 06/10] NFSD add COPY_NOTIFY operation Olga Kornievskaia
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 30+ messages in thread
From: Olga Kornievskaia @ 2018-11-30 20:03 UTC (permalink / raw)
  To: bfields; +Cc: linux-nfs, linux-fsdevel

Needed for copy to add nfs4_cp_state to the nfs4_stid.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
---
 fs/nfsd/nfs4proc.c  | 17 ++++++++++-------
 fs/nfsd/nfs4state.c |  8 ++++++--
 fs/nfsd/state.h     |  3 ++-
 3 files changed, 18 insertions(+), 10 deletions(-)

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index d505990..0152b34 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -781,7 +781,8 @@ static __be32 nfsd4_do_lookupp(struct svc_rqst *rqstp, struct svc_fh *fh)
 	/* check stateid */
 	status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate->current_fh,
 					&read->rd_stateid, RD_STATE,
-					&read->rd_filp, &read->rd_tmp_file);
+					&read->rd_filp, &read->rd_tmp_file,
+					NULL);
 	if (status) {
 		dprintk("NFSD: nfsd4_read: couldn't process stateid!\n");
 		goto out;
@@ -954,7 +955,7 @@ static __be32 nfsd4_do_lookupp(struct svc_rqst *rqstp, struct svc_fh *fh)
 	if (setattr->sa_iattr.ia_valid & ATTR_SIZE) {
 		status = nfs4_preprocess_stateid_op(rqstp, cstate,
 				&cstate->current_fh, &setattr->sa_stateid,
-				WR_STATE, NULL, NULL);
+				WR_STATE, NULL, NULL, NULL);
 		if (status) {
 			dprintk("NFSD: nfsd4_setattr: couldn't process stateid!\n");
 			return status;
@@ -1005,7 +1006,7 @@ static __be32 nfsd4_do_lookupp(struct svc_rqst *rqstp, struct svc_fh *fh)
 	trace_nfsd_write_start(rqstp, &cstate->current_fh,
 			       write->wr_offset, cnt);
 	status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate->current_fh,
-						stateid, WR_STATE, &filp, NULL);
+					stateid, WR_STATE, &filp, NULL, NULL);
 	if (status) {
 		dprintk("NFSD: nfsd4_write: couldn't process stateid!\n");
 		return status;
@@ -1042,14 +1043,16 @@ static __be32 nfsd4_do_lookupp(struct svc_rqst *rqstp, struct svc_fh *fh)
 		return nfserr_nofilehandle;
 
 	status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate->save_fh,
-					    src_stateid, RD_STATE, src, NULL);
+					    src_stateid, RD_STATE, src, NULL,
+					    NULL);
 	if (status) {
 		dprintk("NFSD: %s: couldn't process src stateid!\n", __func__);
 		goto out;
 	}
 
 	status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate->current_fh,
-					    dst_stateid, WR_STATE, dst, NULL);
+					    dst_stateid, WR_STATE, dst, NULL,
+					    NULL);
 	if (status) {
 		dprintk("NFSD: %s: couldn't process dst stateid!\n", __func__);
 		goto out_put_src;
@@ -1353,7 +1356,7 @@ struct nfsd4_copy *
 
 	status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate->current_fh,
 					    &fallocate->falloc_stateid,
-					    WR_STATE, &file, NULL);
+					    WR_STATE, &file, NULL, NULL);
 	if (status != nfs_ok) {
 		dprintk("NFSD: nfsd4_fallocate: couldn't process stateid!\n");
 		return status;
@@ -1412,7 +1415,7 @@ struct nfsd4_copy *
 
 	status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate->current_fh,
 					    &seek->seek_stateid,
-					    RD_STATE, &file, NULL);
+					    RD_STATE, &file, NULL, NULL);
 	if (status) {
 		dprintk("NFSD: nfsd4_seek: couldn't process stateid!\n");
 		return status;
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index f093fbe..be3e967 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -5158,7 +5158,8 @@ static __be32 nfsd4_validate_stateid(struct nfs4_client *cl, stateid_t *stateid)
 __be32
 nfs4_preprocess_stateid_op(struct svc_rqst *rqstp,
 		struct nfsd4_compound_state *cstate, struct svc_fh *fhp,
-		stateid_t *stateid, int flags, struct file **filpp, bool *tmp_file)
+		stateid_t *stateid, int flags, struct file **filpp,
+		bool *tmp_file, struct nfs4_stid **cstid)
 {
 	struct inode *ino = d_inode(fhp->fh_dentry);
 	struct net *net = SVC_NET(rqstp);
@@ -5209,8 +5210,11 @@ static __be32 nfsd4_validate_stateid(struct nfs4_client *cl, stateid_t *stateid)
 	if (!status && filpp)
 		status = nfs4_check_file(rqstp, fhp, s, filpp, tmp_file, flags);
 out:
-	if (s)
+	if (s) {
+		if (!status && cstid)
+			*cstid = s;
 		nfs4_put_stid(s);
+	}
 	return status;
 }
 
diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
index 6aacb32..304de3b 100644
--- a/fs/nfsd/state.h
+++ b/fs/nfsd/state.h
@@ -606,7 +606,8 @@ struct nfsd4_blocked_lock {
 
 extern __be32 nfs4_preprocess_stateid_op(struct svc_rqst *rqstp,
 		struct nfsd4_compound_state *cstate, struct svc_fh *fhp,
-		stateid_t *stateid, int flags, struct file **filp, bool *tmp_file);
+		stateid_t *stateid, int flags, struct file **filp,
+		bool *tmp_file, struct nfs4_stid **cstid);
 __be32 nfsd4_lookup_stateid(struct nfsd4_compound_state *cstate,
 		     stateid_t *stateid, unsigned char typemask,
 		     struct nfs4_stid **s, struct nfsd_net *nn);
-- 
1.8.3.1


* [PATCH v2 06/10] NFSD add COPY_NOTIFY operation
  2018-11-30 20:03 [PATCH v2 00/10] server-side support for "inter" SSC copy Olga Kornievskaia
                   ` (4 preceding siblings ...)
  2018-11-30 20:03 ` [PATCH v2 05/10] NFSD return nfs4_stid in nfs4_preprocess_stateid_op Olga Kornievskaia
@ 2018-11-30 20:03 ` Olga Kornievskaia
  2019-02-20  1:44   ` J. Bruce Fields
                     ` (3 more replies)
  2018-11-30 20:03 ` [PATCH v2 07/10] NFSD check stateids against copy stateids Olga Kornievskaia
                   ` (3 subsequent siblings)
  9 siblings, 4 replies; 30+ messages in thread
From: Olga Kornievskaia @ 2018-11-30 20:03 UTC (permalink / raw)
  To: bfields; +Cc: linux-nfs, linux-fsdevel

Introduce the COPY_NOTIFY operation.

Create a new unique stateid that will keep track of the copy
state and the upcoming READs that will use that stateid. Keep
it in the list associated with the parent stateid.

Return a single netaddr to advertise to the copy.
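
The relationship between a parent stateid and the copy-notify states
hung off it can be sketched as a simple owned list that is torn down
with the parent. The names below are hypothetical and the sketch is
single-threaded userspace C; it omits the locking (s2s_cp_lock) and the
idr bookkeeping that the patch uses.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified records; names are not the kernel's. */
struct child_state {
	unsigned int id;
	struct child_state *next;
};

struct parent_stid {
	struct child_state *children;	/* copy-notify states hung off it */
};

/* Attach a new child state to the parent; NULL on allocation failure. */
static struct child_state *child_add(struct parent_stid *parent,
				     unsigned int id)
{
	struct child_state *c;

	assert(parent);
	c = calloc(1, sizeof(*c));
	if (!c)
		return NULL;
	c->id = id;
	c->next = parent->children;
	parent->children = c;
	return c;
}

/* Free every child when the parent goes away; returns how many. */
static unsigned int children_free_all(struct parent_stid *parent)
{
	unsigned int n = 0;

	assert(parent);
	while (parent->children) {
		struct child_state *c = parent->children;

		parent->children = c->next;
		free(c);
		n++;
	}
	return n;
}
```

Tying the child lifetime to the parent means no copy-notify state can
outlive the open/lock/delegation stateid it was issued against.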

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
---
 fs/nfsd/nfs4proc.c  | 72 +++++++++++++++++++++++++++++++++++----
 fs/nfsd/nfs4state.c | 64 +++++++++++++++++++++++++++++++----
 fs/nfsd/nfs4xdr.c   | 97 +++++++++++++++++++++++++++++++++++++++++++++++++++--
 fs/nfsd/state.h     | 18 ++++++++--
 fs/nfsd/xdr4.h      | 13 +++++++
 5 files changed, 248 insertions(+), 16 deletions(-)

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 0152b34..51fca9e 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -37,6 +37,7 @@
 #include <linux/falloc.h>
 #include <linux/slab.h>
 #include <linux/kthread.h>
+#include <linux/sunrpc/addr.h>
 
 #include "idmap.h"
 #include "cache.h"
@@ -1035,7 +1036,8 @@ static __be32 nfsd4_do_lookupp(struct svc_rqst *rqstp, struct svc_fh *fh)
 static __be32
 nfsd4_verify_copy(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 		  stateid_t *src_stateid, struct file **src,
-		  stateid_t *dst_stateid, struct file **dst)
+		  stateid_t *dst_stateid, struct file **dst,
+		  struct nfs4_stid **stid)
 {
 	__be32 status;
 
@@ -1052,7 +1054,7 @@ static __be32 nfsd4_do_lookupp(struct svc_rqst *rqstp, struct svc_fh *fh)
 
 	status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate->current_fh,
 					    dst_stateid, WR_STATE, dst, NULL,
-					    NULL);
+					    stid);
 	if (status) {
 		dprintk("NFSD: %s: couldn't process dst stateid!\n", __func__);
 		goto out_put_src;
@@ -1083,7 +1085,7 @@ static __be32 nfsd4_do_lookupp(struct svc_rqst *rqstp, struct svc_fh *fh)
 	__be32 status;
 
 	status = nfsd4_verify_copy(rqstp, cstate, &clone->cl_src_stateid, &src,
-				   &clone->cl_dst_stateid, &dst);
+				   &clone->cl_dst_stateid, &dst, NULL);
 	if (status)
 		goto out;
 
@@ -1230,7 +1232,7 @@ static void dup_copy_fields(struct nfsd4_copy *src, struct nfsd4_copy *dst)
 
 static void cleanup_async_copy(struct nfsd4_copy *copy)
 {
-	nfs4_free_cp_state(copy);
+	nfs4_free_copy_state(copy);
 	fput(copy->file_dst);
 	fput(copy->file_src);
 	spin_lock(&copy->cp_clp->async_lock);
@@ -1270,7 +1272,7 @@ static int nfsd4_do_async_copy(void *data)
 
 	status = nfsd4_verify_copy(rqstp, cstate, &copy->cp_src_stateid,
 				   &copy->file_src, &copy->cp_dst_stateid,
-				   &copy->file_dst);
+				   &copy->file_dst, NULL);
 	if (status)
 		goto out;
 
@@ -1284,7 +1286,7 @@ static int nfsd4_do_async_copy(void *data)
 		async_copy = kzalloc(sizeof(struct nfsd4_copy), GFP_KERNEL);
 		if (!async_copy)
 			goto out;
-		if (!nfs4_init_cp_state(nn, copy)) {
+		if (!nfs4_init_copy_state(nn, copy)) {
 			kfree(async_copy);
 			goto out;
 		}
@@ -1348,6 +1350,43 @@ struct nfsd4_copy *
 }
 
 static __be32
+nfsd4_copy_notify(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
+		  union nfsd4_op_u *u)
+{
+	struct nfsd4_copy_notify *cn = &u->copy_notify;
+	__be32 status;
+	struct nfsd_net *nn = net_generic(SVC_NET(rqstp), nfsd_net_id);
+	struct nfs4_stid *stid;
+	struct nfs4_cpntf_state *cps;
+
+	status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate->current_fh,
+					&cn->cpn_src_stateid, RD_STATE, NULL,
+					NULL, &stid);
+	if (status)
+		return status;
+
+	cn->cpn_sec = nn->nfsd4_lease;
+	cn->cpn_nsec = 0;
+
+	status = nfserrno(-ENOMEM);
+	cps = nfs4_alloc_init_cpntf_state(nn, stid);
+	if (!cps)
+		return status;
+	memcpy(&cn->cpn_cnr_stateid, &cps->cp_stateid, sizeof(stateid_t));
+
+	/**
+	 * For now, only return one server address in cpn_src, the
+	 * address used by the client to connect to this server.
+	 */
+	cn->cpn_src.nl4_type = NL4_NETADDR;
+	status = nfsd4_set_netaddr((struct sockaddr *)&rqstp->rq_daddr,
+				 &cn->cpn_src.u.nl4_addr);
+	WARN_ON_ONCE(status);
+
+	return status;
+}
+
+static __be32
 nfsd4_fallocate(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 		struct nfsd4_fallocate *fallocate, int flags)
 {
@@ -2299,6 +2338,21 @@ static inline u32 nfsd4_offload_status_rsize(struct svc_rqst *rqstp,
 		1 /* osr_complete<1> optional 0 for now */) * sizeof(__be32);
 }
 
+static inline u32 nfsd4_copy_notify_rsize(struct svc_rqst *rqstp,
+					struct nfsd4_op *op)
+{
+	return (op_encode_hdr_size +
+		3 /* cnr_lease_time */ +
+		1 /* We support one cnr_source_server */ +
+		1 /* cnr_stateid seq */ +
+		op_encode_stateid_maxsz /* cnr_stateid */ +
+		1 /* num cnr_source_server*/ +
+		1 /* nl4_type */ +
+		1 /* nl4 size */ +
+		XDR_QUADLEN(NFS4_OPAQUE_LIMIT) /*nl4_loc + nl4_loc_sz */)
+		* sizeof(__be32);
+}
+
 #ifdef CONFIG_NFSD_PNFS
 static inline u32 nfsd4_getdeviceinfo_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op)
 {
@@ -2723,6 +2777,12 @@ static inline u32 nfsd4_seek_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op)
 		.op_name = "OP_OFFLOAD_CANCEL",
 		.op_rsize_bop = nfsd4_only_status_rsize,
 	},
+	[OP_COPY_NOTIFY] = {
+		.op_func = nfsd4_copy_notify,
+		.op_flags = OP_MODIFIES_SOMETHING,
+		.op_name = "OP_COPY_NOTIFY",
+		.op_rsize_bop = nfsd4_copy_notify_rsize,
+	},
 };
 
 /**
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index be3e967..eaa136f 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -697,6 +697,7 @@ struct nfs4_stid *nfs4_alloc_stid(struct nfs4_client *cl, struct kmem_cache *sla
 	/* Will be incremented before return to client: */
 	refcount_set(&stid->sc_count, 1);
 	spin_lock_init(&stid->sc_lock);
+	INIT_LIST_HEAD(&stid->sc_cp_list);
 
 	/*
 	 * It shouldn't be a problem to reuse an opaque stateid value.
@@ -716,24 +717,53 @@ struct nfs4_stid *nfs4_alloc_stid(struct nfs4_client *cl, struct kmem_cache *sla
 /*
  * Create a unique stateid_t to represent each COPY.
  */
-int nfs4_init_cp_state(struct nfsd_net *nn, struct nfsd4_copy *copy)
+static int nfs4_init_cp_state(struct nfsd_net *nn, void *ptr, stateid_t *stid)
 {
 	int new_id;
 
 	idr_preload(GFP_KERNEL);
 	spin_lock(&nn->s2s_cp_lock);
-	new_id = idr_alloc_cyclic(&nn->s2s_cp_stateids, copy, 0, 0, GFP_NOWAIT);
+	new_id = idr_alloc_cyclic(&nn->s2s_cp_stateids, ptr, 0, 0, GFP_NOWAIT);
 	spin_unlock(&nn->s2s_cp_lock);
 	idr_preload_end();
 	if (new_id < 0)
 		return 0;
-	copy->cp_stateid.si_opaque.so_id = new_id;
-	copy->cp_stateid.si_opaque.so_clid.cl_boot = nn->boot_time;
-	copy->cp_stateid.si_opaque.so_clid.cl_id = nn->s2s_cp_cl_id;
+	stid->si_opaque.so_id = new_id;
+	stid->si_opaque.so_clid.cl_boot = nn->boot_time;
+	stid->si_opaque.so_clid.cl_id = nn->s2s_cp_cl_id;
 	return 1;
 }
 
-void nfs4_free_cp_state(struct nfsd4_copy *copy)
+int nfs4_init_copy_state(struct nfsd_net *nn, struct nfsd4_copy *copy)
+{
+	return nfs4_init_cp_state(nn, copy, &copy->cp_stateid);
+}
+
+struct nfs4_cpntf_state *nfs4_alloc_init_cpntf_state(struct nfsd_net *nn,
+						     struct nfs4_stid *p_stid)
+{
+	struct nfs4_cpntf_state *cps;
+
+	cps = kzalloc(sizeof(struct nfs4_cpntf_state), GFP_KERNEL);
+	if (!cps)
+		return NULL;
+	if (!nfs4_init_cp_state(nn, cps, &cps->cp_stateid))
+		goto out_free;
+	cps->cp_p_stid = p_stid;
+	cps->cp_active = false;
+	cps->cp_timeout = jiffies + (nn->nfsd4_lease * HZ);
+	INIT_LIST_HEAD(&cps->cp_list);
+	spin_lock(&nn->s2s_cp_lock);
+	list_add(&cps->cp_list, &p_stid->sc_cp_list);
+	spin_unlock(&nn->s2s_cp_lock);
+
+	return cps;
+out_free:
+	kfree(cps);
+	return NULL;
+}
+
+void nfs4_free_copy_state(struct nfsd4_copy *copy)
 {
 	struct nfsd_net *nn;
 
@@ -743,6 +773,27 @@ void nfs4_free_cp_state(struct nfsd4_copy *copy)
 	spin_unlock(&nn->s2s_cp_lock);
 }
 
+static void nfs4_free_cpntf_statelist(struct net *net, struct nfs4_stid *stid)
+{
+	struct nfs4_cpntf_state *cps;
+	struct nfsd_net *nn;
+
+	nn = net_generic(net, nfsd_net_id);
+
+	might_sleep();
+
+	spin_lock(&nn->s2s_cp_lock);
+	while (!list_empty(&stid->sc_cp_list)) {
+		cps = list_first_entry(&stid->sc_cp_list,
+				       struct nfs4_cpntf_state, cp_list);
+		list_del(&cps->cp_list);
+		idr_remove(&nn->s2s_cp_stateids,
+			   cps->cp_stateid.si_opaque.so_id);
+		kfree(cps);
+	}
+	spin_unlock(&nn->s2s_cp_lock);
+}
+
 static struct nfs4_ol_stateid * nfs4_alloc_open_stateid(struct nfs4_client *clp)
 {
 	struct nfs4_stid *stid;
@@ -891,6 +942,7 @@ static void block_delegations(struct knfsd_fh *fh)
 	}
 	idr_remove(&clp->cl_stateids, s->sc_stateid.si_opaque.so_id);
 	spin_unlock(&clp->cl_lock);
+	nfs4_free_cpntf_statelist(clp->net, s);
 	s->sc_free(s);
 	if (fp)
 		put_nfs4_file(fp);
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 879ddc6..c9fb625 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -1840,6 +1840,22 @@ static __be32 nfsd4_decode_nl4_server(struct nfsd4_compoundargs *argp,
 }
 
 static __be32
+nfsd4_decode_copy_notify(struct nfsd4_compoundargs *argp,
+			 struct nfsd4_copy_notify *cn)
+{
+	int status;
+
+	status = nfsd4_decode_stateid(argp, &cn->cpn_src_stateid);
+	if (status)
+		return status;
+	status = nfsd4_decode_nl4_server(argp, &cn->cpn_dst);
+	if (status)
+		return status;
+
+	return status;
+}
+
+static __be32
 nfsd4_decode_seek(struct nfsd4_compoundargs *argp, struct nfsd4_seek *seek)
 {
 	DECODE_HEAD;
@@ -1940,7 +1956,7 @@ static __be32 nfsd4_decode_nl4_server(struct nfsd4_compoundargs *argp,
 	/* new operations for NFSv4.2 */
 	[OP_ALLOCATE]		= (nfsd4_dec)nfsd4_decode_fallocate,
 	[OP_COPY]		= (nfsd4_dec)nfsd4_decode_copy,
-	[OP_COPY_NOTIFY]	= (nfsd4_dec)nfsd4_decode_notsupp,
+	[OP_COPY_NOTIFY]	= (nfsd4_dec)nfsd4_decode_copy_notify,
 	[OP_DEALLOCATE]		= (nfsd4_dec)nfsd4_decode_fallocate,
 	[OP_IO_ADVISE]		= (nfsd4_dec)nfsd4_decode_notsupp,
 	[OP_LAYOUTERROR]	= (nfsd4_dec)nfsd4_decode_notsupp,
@@ -4325,6 +4341,45 @@ static __be32 nfsd4_encode_readv(struct nfsd4_compoundres *resp,
 }
 
 static __be32
+nfsd42_encode_nl4_server(struct nfsd4_compoundres *resp, struct nl4_server *ns)
+{
+	struct xdr_stream *xdr = &resp->xdr;
+	struct nfs42_netaddr *addr;
+	__be32 *p;
+
+	p = xdr_reserve_space(xdr, 4);
+	*p++ = cpu_to_be32(ns->nl4_type);
+
+	switch (ns->nl4_type) {
+	case NL4_NETADDR:
+		addr = &ns->u.nl4_addr;
+
+		/** netid_len, netid, uaddr_len, uaddr (port included
+		 * in RPCBIND_MAXUADDRLEN)
+		 */
+		p = xdr_reserve_space(xdr,
+			4 /* netid len */ +
+			(XDR_QUADLEN(addr->netid_len) * 4) +
+			4 /* uaddr len */ +
+			(XDR_QUADLEN(addr->addr_len) * 4));
+		if (!p)
+			return nfserr_resource;
+
+		*p++ = cpu_to_be32(addr->netid_len);
+		p = xdr_encode_opaque_fixed(p, addr->netid,
+					    addr->netid_len);
+		*p++ = cpu_to_be32(addr->addr_len);
+		p = xdr_encode_opaque_fixed(p, addr->addr,
+					addr->addr_len);
+		break;
+	default:
+		WARN_ON(ns->nl4_type != NL4_NETADDR);
+	}
+
+	return 0;
+}
+
+static __be32
 nfsd4_encode_copy(struct nfsd4_compoundres *resp, __be32 nfserr,
 		  struct nfsd4_copy *copy)
 {
@@ -4358,6 +4413,44 @@ static __be32 nfsd4_encode_readv(struct nfsd4_compoundres *resp,
 }
 
 static __be32
+nfsd4_encode_copy_notify(struct nfsd4_compoundres *resp, __be32 nfserr,
+			 struct nfsd4_copy_notify *cn)
+{
+	struct xdr_stream *xdr = &resp->xdr;
+	__be32 *p;
+
+	if (nfserr)
+		return nfserr;
+
+	/* 8 sec, 4 nsec */
+	p = xdr_reserve_space(xdr, 12);
+	if (!p)
+		return nfserr_resource;
+
+	/* cnr_lease_time */
+	p = xdr_encode_hyper(p, cn->cpn_sec);
+	*p++ = cpu_to_be32(cn->cpn_nsec);
+
+	/* cnr_stateid */
+	nfserr = nfsd4_encode_stateid(xdr, &cn->cpn_cnr_stateid);
+	if (nfserr)
+		return nfserr;
+
+	/* cnr_src.nl_nsvr */
+	p = xdr_reserve_space(xdr, 4);
+	if (!p)
+		return nfserr_resource;
+
+	*p++ = cpu_to_be32(1);
+
+	nfserr = nfsd42_encode_nl4_server(resp, &cn->cpn_src);
+	if (nfserr)
+		return nfserr;
+
+	return nfserr;
+}
+
+static __be32
 nfsd4_encode_seek(struct nfsd4_compoundres *resp, __be32 nfserr,
 		  struct nfsd4_seek *seek)
 {
@@ -4454,7 +4547,7 @@ static __be32 nfsd4_encode_readv(struct nfsd4_compoundres *resp,
 	/* NFSv4.2 operations */
 	[OP_ALLOCATE]		= (nfsd4_enc)nfsd4_encode_noop,
 	[OP_COPY]		= (nfsd4_enc)nfsd4_encode_copy,
-	[OP_COPY_NOTIFY]	= (nfsd4_enc)nfsd4_encode_noop,
+	[OP_COPY_NOTIFY]	= (nfsd4_enc)nfsd4_encode_copy_notify,
 	[OP_DEALLOCATE]		= (nfsd4_enc)nfsd4_encode_noop,
 	[OP_IO_ADVISE]		= (nfsd4_enc)nfsd4_encode_noop,
 	[OP_LAYOUTERROR]	= (nfsd4_enc)nfsd4_encode_noop,
diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
index 304de3b..31b12b1 100644
--- a/fs/nfsd/state.h
+++ b/fs/nfsd/state.h
@@ -94,6 +94,7 @@ struct nfs4_stid {
 #define NFS4_REVOKED_DELEG_STID 16
 #define NFS4_CLOSED_DELEG_STID 32
 #define NFS4_LAYOUT_STID 64
+	struct list_head	sc_cp_list;
 	unsigned char		sc_type;
 	stateid_t		sc_stateid;
 	spinlock_t		sc_lock;
@@ -102,6 +103,17 @@ struct nfs4_stid {
 	void			(*sc_free)(struct nfs4_stid *);
 };
 
+/* Keep a list of stateids issued by the COPY_NOTIFY, associate it with the
+ * parent OPEN/LOCK/DELEG stateid.
+ */
+struct nfs4_cpntf_state {
+	stateid_t		cp_stateid;
+	struct list_head	cp_list;	/* per parent nfs4_stid */
+	struct nfs4_stid	*cp_p_stid;	/* pointer to parent */
+	bool			cp_active;	/* has the copy started */
+	unsigned long		cp_timeout;	/* copy timeout */
+};
+
 /*
  * Represents a delegation stateid. The nfs4_client holds references to these
  * and they are put when it is being destroyed or when the delegation is
@@ -613,8 +625,10 @@ __be32 nfsd4_lookup_stateid(struct nfsd4_compound_state *cstate,
 		     struct nfs4_stid **s, struct nfsd_net *nn);
 struct nfs4_stid *nfs4_alloc_stid(struct nfs4_client *cl, struct kmem_cache *slab,
 				  void (*sc_free)(struct nfs4_stid *));
-int nfs4_init_cp_state(struct nfsd_net *nn, struct nfsd4_copy *copy);
-void nfs4_free_cp_state(struct nfsd4_copy *copy);
+int nfs4_init_copy_state(struct nfsd_net *nn, struct nfsd4_copy *copy);
+void nfs4_free_copy_state(struct nfsd4_copy *copy);
+struct nfs4_cpntf_state *nfs4_alloc_init_cpntf_state(struct nfsd_net *nn,
+			struct nfs4_stid *p_stid);
 void nfs4_unhash_stid(struct nfs4_stid *s);
 void nfs4_put_stid(struct nfs4_stid *s);
 void nfs4_inc_and_copy_stateid(stateid_t *dst, struct nfs4_stid *stid);
diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h
index 513c9ff..bade8e5 100644
--- a/fs/nfsd/xdr4.h
+++ b/fs/nfsd/xdr4.h
@@ -568,6 +568,18 @@ struct nfsd4_offload_status {
 	u32		status;
 };
 
+struct nfsd4_copy_notify {
+	/* request */
+	stateid_t		cpn_src_stateid;
+	struct nl4_server	cpn_dst;
+
+	/* response */
+	stateid_t		cpn_cnr_stateid;
+	u64			cpn_sec;
+	u32			cpn_nsec;
+	struct nl4_server	cpn_src;
+};
+
 struct nfsd4_op {
 	int					opnum;
 	const struct nfsd4_operation *		opdesc;
@@ -627,6 +639,7 @@ struct nfsd4_op {
 		struct nfsd4_clone		clone;
 		struct nfsd4_copy		copy;
 		struct nfsd4_offload_status	offload_status;
+		struct nfsd4_copy_notify	copy_notify;
 		struct nfsd4_seek		seek;
 	} u;
 	struct nfs4_replay *			replay;
-- 
1.8.3.1
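
For readers following the encode path above (this note is not part of the patch): a minimal userspace sketch of the wire layout that nfsd42_encode_nl4_server() reserves space for in the NL4_NETADDR case, i.e. netid length, netid, uaddr length, uaddr, with each opaque padded to a 4-byte boundary. The sk_* names and the caller-supplied flat buffer are assumptions made for illustration; the kernel code uses xdr_reserve_space() and xdr_encode_opaque_fixed() instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Number of 4-byte XDR words needed to hold l bytes (same idea as XDR_QUADLEN). */
#define SK_QUADLEN(l) (((l) + 3) >> 2)

/* Store a 32-bit value in big-endian order; returns the new offset. */
static size_t sk_put_be32(uint8_t *buf, size_t off, uint32_t v)
{
	buf[off + 0] = (uint8_t)(v >> 24);
	buf[off + 1] = (uint8_t)(v >> 16);
	buf[off + 2] = (uint8_t)(v >> 8);
	buf[off + 3] = (uint8_t)v;
	return off + 4;
}

/* Variable-length opaque: 4-byte length, the data, then zero padding. */
static size_t sk_put_opaque(uint8_t *buf, size_t off, const char *data,
			    uint32_t len)
{
	size_t padded = (size_t)SK_QUADLEN(len) * 4;

	off = sk_put_be32(buf, off, len);
	memcpy(buf + off, data, len);
	memset(buf + off + len, 0, padded - len);
	return off + padded;
}

/*
 * Hypothetical helper: encode netid and uaddr back to back.
 * Returns the number of bytes written, or 0 if buf is too small.
 */
static size_t sk_encode_netaddr(uint8_t *buf, size_t buf_sz,
				const char *netid, const char *uaddr)
{
	uint32_t nlen = (uint32_t)strlen(netid);
	uint32_t ulen = (uint32_t)strlen(uaddr);
	size_t need = 4 + (size_t)SK_QUADLEN(nlen) * 4 +
		      4 + (size_t)SK_QUADLEN(ulen) * 4;
	size_t off = 0;

	assert(buf != NULL);
	if (need > buf_sz)
		return 0;
	off = sk_put_opaque(buf, off, netid, nlen);
	off = sk_put_opaque(buf, off, uaddr, ulen);
	return off;
}
```

The size is computed before anything is written, which mirrors the reserve-then-encode order used in the patch.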


* [PATCH v2 07/10] NFSD check stateids against copy stateids
  2018-11-30 20:03 [PATCH v2 00/10] server-side support for "inter" SSC copy Olga Kornievskaia
                   ` (5 preceding siblings ...)
  2018-11-30 20:03 ` [PATCH v2 06/10] NFSD add COPY_NOTIFY operation Olga Kornievskaia
@ 2018-11-30 20:03 ` Olga Kornievskaia
  2018-11-30 20:03 ` [PATCH v2 08/10] NFSD generalize nfsd4_compound_state flag names Olga Kornievskaia
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 30+ messages in thread
From: Olga Kornievskaia @ 2018-11-30 20:03 UTC (permalink / raw)
  To: bfields; +Cc: linux-nfs, linux-fsdevel

An incoming stateid (used by a READ) could be a saved copy stateid.
On first use, mark it active and check that the copy started
within the allowable lease time.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
---
 fs/nfsd/nfs4state.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 45 insertions(+)

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index eaa136f..7b3586ab 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -5203,6 +5203,49 @@ static __be32 nfsd4_validate_stateid(struct nfs4_client *cl, stateid_t *stateid)
 
 	return 0;
 }
+/*
+ * A READ from an inter server to server COPY will have a
+ * copy stateid. Return the parent nfs4_stid.
+ */
+static __be32 _find_cpntf_state(struct nfsd_net *nn, stateid_t *st,
+		     struct nfs4_cpntf_state **cps)
+{
+	struct nfs4_cpntf_state *state = NULL;
+
+	if (st->si_opaque.so_clid.cl_id != nn->s2s_cp_cl_id)
+		return nfserr_bad_stateid;
+	spin_lock(&nn->s2s_cp_lock);
+	state = idr_find(&nn->s2s_cp_stateids, st->si_opaque.so_id);
+	if (state)
+		refcount_inc(&state->cp_p_stid->sc_count);
+	spin_unlock(&nn->s2s_cp_lock);
+	if (!state)
+		return nfserr_bad_stateid;
+	*cps = state;
+	return 0;
+}
+
+static __be32 find_cpntf_state(struct nfsd_net *nn, stateid_t *st,
+			       struct nfs4_stid **stid)
+{
+	__be32 status;
+	struct nfs4_cpntf_state *cps = NULL;
+
+	status = _find_cpntf_state(nn, st, &cps);
+	if (status)
+		return status;
+
+	/* Did the inter server to server copy start in time? */
+	if (cps->cp_active == false && !time_after(cps->cp_timeout, jiffies)) {
+		nfs4_put_stid(cps->cp_p_stid);
+		return nfserr_partner_no_auth;
+	} else
+		cps->cp_active = true;
+
+	*stid = cps->cp_p_stid;
+
+	return nfs_ok;
+}
 
 /*
  * Checks for stateid operations
@@ -5235,6 +5278,8 @@ static __be32 nfsd4_validate_stateid(struct nfs4_client *cl, stateid_t *stateid)
 	status = nfsd4_lookup_stateid(cstate, stateid,
 				NFS4_DELEG_STID|NFS4_OPEN_STID|NFS4_LOCK_STID,
 				&s, nn);
+	if (status == nfserr_bad_stateid)
+		status = find_cpntf_state(nn, stateid, &s);
 	if (status)
 		return status;
 	status = nfsd4_stid_check_stateid_generation(stateid, s,
-- 
1.8.3.1


* [PATCH v2 08/10] NFSD generalize nfsd4_compound_state flag names
  2018-11-30 20:03 [PATCH v2 00/10] server-side support for "inter" SSC copy Olga Kornievskaia
                   ` (6 preceding siblings ...)
  2018-11-30 20:03 ` [PATCH v2 07/10] NFSD check stateids against copy stateids Olga Kornievskaia
@ 2018-11-30 20:03 ` Olga Kornievskaia
  2018-11-30 20:03 ` [PATCH v2 09/10] NFSD: allow inter server COPY to have a STALE source server fh Olga Kornievskaia
  2018-11-30 20:03 ` [PATCH v2 10/10] NFSD add nfs4 inter ssc to nfsd4_copy Olga Kornievskaia
  9 siblings, 0 replies; 30+ messages in thread
From: Olga Kornievskaia @ 2018-11-30 20:03 UTC (permalink / raw)
  To: bfields; +Cc: linux-nfs, linux-fsdevel

From: Olga Kornievskaia <kolga@netapp.com>

Generalize the flag accessor names so that the sid_flags field can
also be used for non-stateid flags.

Signed-off-by: Andy Adamson <andros@netapp.com>
---
 fs/nfsd/nfs4proc.c  | 8 ++++----
 fs/nfsd/nfs4state.c | 7 ++++---
 fs/nfsd/xdr4.h      | 6 +++---
 3 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 51fca9e..70d03e9 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -530,9 +530,9 @@ static __be32 nfsd4_open_omfg(struct svc_rqst *rqstp, struct nfsd4_compound_stat
 		return nfserr_restorefh;
 
 	fh_dup2(&cstate->current_fh, &cstate->save_fh);
-	if (HAS_STATE_ID(cstate, SAVED_STATE_ID_FLAG)) {
+	if (HAS_CSTATE_FLAG(cstate, SAVED_STATE_ID_FLAG)) {
 		memcpy(&cstate->current_stateid, &cstate->save_stateid, sizeof(stateid_t));
-		SET_STATE_ID(cstate, CURRENT_STATE_ID_FLAG);
+		SET_CSTATE_FLAG(cstate, CURRENT_STATE_ID_FLAG);
 	}
 	return nfs_ok;
 }
@@ -542,9 +542,9 @@ static __be32 nfsd4_open_omfg(struct svc_rqst *rqstp, struct nfsd4_compound_stat
 	     union nfsd4_op_u *u)
 {
 	fh_dup2(&cstate->save_fh, &cstate->current_fh);
-	if (HAS_STATE_ID(cstate, CURRENT_STATE_ID_FLAG)) {
+	if (HAS_CSTATE_FLAG(cstate, CURRENT_STATE_ID_FLAG)) {
 		memcpy(&cstate->save_stateid, &cstate->current_stateid, sizeof(stateid_t));
-		SET_STATE_ID(cstate, SAVED_STATE_ID_FLAG);
+		SET_CSTATE_FLAG(cstate, SAVED_STATE_ID_FLAG);
 	}
 	return nfs_ok;
 }
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 7b3586ab..3f5fb0b 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -7423,7 +7423,8 @@ static int nfs4_state_create_net(struct net *net)
 static void
 get_stateid(struct nfsd4_compound_state *cstate, stateid_t *stateid)
 {
-	if (HAS_STATE_ID(cstate, CURRENT_STATE_ID_FLAG) && CURRENT_STATEID(stateid))
+	if (HAS_CSTATE_FLAG(cstate, CURRENT_STATE_ID_FLAG) &&
+	    CURRENT_STATEID(stateid))
 		memcpy(stateid, &cstate->current_stateid, sizeof(stateid_t));
 }
 
@@ -7432,14 +7433,14 @@ static int nfs4_state_create_net(struct net *net)
 {
 	if (cstate->minorversion) {
 		memcpy(&cstate->current_stateid, stateid, sizeof(stateid_t));
-		SET_STATE_ID(cstate, CURRENT_STATE_ID_FLAG);
+		SET_CSTATE_FLAG(cstate, CURRENT_STATE_ID_FLAG);
 	}
 }
 
 void
 clear_current_stateid(struct nfsd4_compound_state *cstate)
 {
-	CLEAR_STATE_ID(cstate, CURRENT_STATE_ID_FLAG);
+	CLEAR_CSTATE_FLAG(cstate, CURRENT_STATE_ID_FLAG);
 }
 
 /*
diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h
index bade8e5..9d7318c 100644
--- a/fs/nfsd/xdr4.h
+++ b/fs/nfsd/xdr4.h
@@ -46,9 +46,9 @@
 #define CURRENT_STATE_ID_FLAG (1<<0)
 #define SAVED_STATE_ID_FLAG (1<<1)
 
-#define SET_STATE_ID(c, f) ((c)->sid_flags |= (f))
-#define HAS_STATE_ID(c, f) ((c)->sid_flags & (f))
-#define CLEAR_STATE_ID(c, f) ((c)->sid_flags &= ~(f))
+#define SET_CSTATE_FLAG(c, f) ((c)->sid_flags |= (f))
+#define HAS_CSTATE_FLAG(c, f) ((c)->sid_flags & (f))
+#define CLEAR_CSTATE_FLAG(c, f) ((c)->sid_flags &= ~(f))
 
 struct nfsd4_compound_state {
 	struct svc_fh		current_fh;
-- 
1.8.3.1


* [PATCH v2 09/10] NFSD: allow inter server COPY to have a STALE source server fh
  2018-11-30 20:03 [PATCH v2 00/10] server-side support for "inter" SSC copy Olga Kornievskaia
                   ` (7 preceding siblings ...)
  2018-11-30 20:03 ` [PATCH v2 08/10] NFSD generalize nfsd4_compound_state flag names Olga Kornievskaia
@ 2018-11-30 20:03 ` Olga Kornievskaia
  2019-02-19 15:53   ` J. Bruce Fields
  2018-11-30 20:03 ` [PATCH v2 10/10] NFSD add nfs4 inter ssc to nfsd4_copy Olga Kornievskaia
  9 siblings, 1 reply; 30+ messages in thread
From: Olga Kornievskaia @ 2018-11-30 20:03 UTC (permalink / raw)
  To: bfields; +Cc: linux-nfs, linux-fsdevel

For an inter server-to-server COPY, the source server filehandle
is foreign to the destination server, which is where the COPY is
sent, so verifying it there can fail with a stale-filehandle error.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
---
 fs/nfsd/Kconfig    | 10 ++++++++++
 fs/nfsd/nfs4proc.c | 41 ++++++++++++++++++++++++++++++++++++-----
 fs/nfsd/nfsfh.h    |  5 ++++-
 fs/nfsd/xdr4.h     |  1 +
 4 files changed, 51 insertions(+), 6 deletions(-)

diff --git a/fs/nfsd/Kconfig b/fs/nfsd/Kconfig
index 20b1c17..37ff3d5 100644
--- a/fs/nfsd/Kconfig
+++ b/fs/nfsd/Kconfig
@@ -131,6 +131,16 @@ config NFSD_FLEXFILELAYOUT
 
 	  If unsure, say N.
 
+config NFSD_V4_2_INTER_SSC
+	bool "NFSv4.2 inter server to server COPY"
+	depends on NFSD_V4 && NFS_V4_1 && NFS_V4_2
+	help
+	  This option enables support for NFSv4.2 inter server to
+	  server copy where the destination server calls the NFSv4.2
+	  client to read the data to copy from the source server.
+
+	  If unsure, say N.
+
 config NFSD_V4_SECURITY_LABEL
 	bool "Provide Security Label support for NFSv4 server"
 	depends on NFSD_V4 && SECURITY
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 70d03e9..2e28254 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -503,12 +503,20 @@ static __be32 nfsd4_open_omfg(struct svc_rqst *rqstp, struct nfsd4_compound_stat
 	    union nfsd4_op_u *u)
 {
 	struct nfsd4_putfh *putfh = &u->putfh;
+	__be32 ret;
 
 	fh_put(&cstate->current_fh);
 	cstate->current_fh.fh_handle.fh_size = putfh->pf_fhlen;
 	memcpy(&cstate->current_fh.fh_handle.fh_base, putfh->pf_fhval,
 	       putfh->pf_fhlen);
-	return fh_verify(rqstp, &cstate->current_fh, 0, NFSD_MAY_BYPASS_GSS);
+	ret = fh_verify(rqstp, &cstate->current_fh, 0, NFSD_MAY_BYPASS_GSS);
+#ifdef CONFIG_NFSD_V4_2_INTER_SSC
+	if (ret == nfserr_stale && putfh->no_verify) {
+		SET_FH_FLAG(&cstate->current_fh, NFSD4_FH_FOREIGN);
+		ret = 0;
+	}
+#endif
+	return ret;
 }
 
 static __be32
@@ -1967,11 +1975,12 @@ static void svcxdr_init_encode(struct svc_rqst *rqstp,
 {
 	struct nfsd4_compoundargs *args = rqstp->rq_argp;
 	struct nfsd4_compoundres *resp = rqstp->rq_resp;
-	struct nfsd4_op	*op;
+	struct nfsd4_op	*op, *current_op, *saved_op;
 	struct nfsd4_compound_state *cstate = &resp->cstate;
 	struct svc_fh *current_fh = &cstate->current_fh;
 	struct svc_fh *save_fh = &cstate->save_fh;
 	__be32		status;
+	int i;
 
 	svcxdr_init_encode(rqstp, resp);
 	resp->tagp = resp->xdr.p;
@@ -2006,6 +2015,27 @@ static void svcxdr_init_encode(struct svc_rqst *rqstp,
 		resp->opcnt = 1;
 		goto encode_op;
 	}
+#ifdef CONFIG_NFSD_V4_2_INTER_SSC
+	/* traverse all operations and if it's a COPY compound, mark the
+	 * source filehandle to skip verification
+	 */
+	for (i = 0; i < args->opcnt; i++) {
+		op = &args->ops[i];
+		if (op->opnum == OP_PUTFH)
+			current_op = op;
+		else if (op->opnum == OP_SAVEFH)
+			saved_op = current_op;
+		else if (op->opnum == OP_RESTOREFH)
+			current_op = saved_op;
+		else if (op->opnum == OP_COPY) {
+			struct nfsd4_copy *copy = (struct nfsd4_copy *)&op->u;
+			struct nfsd4_putfh *putfh =
+				(struct nfsd4_putfh *)&saved_op->u;
+			if (!copy->cp_intra)
+				putfh->no_verify = true;
+		}
+	}
+#endif
 
 	trace_nfsd_compound(rqstp, args->opcnt);
 	while (!status && resp->opcnt < args->opcnt) {
@@ -2021,13 +2051,14 @@ static void svcxdr_init_encode(struct svc_rqst *rqstp,
 				op->status = nfsd4_open_omfg(rqstp, cstate, op);
 			goto encode_op;
 		}
-
-		if (!current_fh->fh_dentry) {
+		if (!current_fh->fh_dentry &&
+				!HAS_FH_FLAG(current_fh, NFSD4_FH_FOREIGN)) {
 			if (!(op->opdesc->op_flags & ALLOWED_WITHOUT_FH)) {
 				op->status = nfserr_nofilehandle;
 				goto encode_op;
 			}
-		} else if (current_fh->fh_export->ex_fslocs.migrated &&
+		} else if (current_fh->fh_export &&
+			   current_fh->fh_export->ex_fslocs.migrated &&
 			  !(op->opdesc->op_flags & ALLOWED_ON_ABSENT_FS)) {
 			op->status = nfserr_moved;
 			goto encode_op;
diff --git a/fs/nfsd/nfsfh.h b/fs/nfsd/nfsfh.h
index 755e256..b9c7568 100644
--- a/fs/nfsd/nfsfh.h
+++ b/fs/nfsd/nfsfh.h
@@ -35,7 +35,7 @@ static inline ino_t u32_to_ino_t(__u32 uino)
 
 	bool			fh_locked;	/* inode locked by us */
 	bool			fh_want_write;	/* remount protection taken */
-
+	int			fh_flags;	/* FH flags */
 #ifdef CONFIG_NFSD_V3
 	bool			fh_post_saved;	/* post-op attrs saved */
 	bool			fh_pre_saved;	/* pre-op attrs saved */
@@ -56,6 +56,9 @@ static inline ino_t u32_to_ino_t(__u32 uino)
 #endif /* CONFIG_NFSD_V3 */
 
 } svc_fh;
+#define NFSD4_FH_FOREIGN (1<<0)
+#define SET_FH_FLAG(c, f) ((c)->fh_flags |= (f))
+#define HAS_FH_FLAG(c, f) ((c)->fh_flags & (f))
 
 enum nfsd_fsid {
 	FSID_DEV = 0,
diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h
index 9d7318c..fbd18d6 100644
--- a/fs/nfsd/xdr4.h
+++ b/fs/nfsd/xdr4.h
@@ -221,6 +221,7 @@ struct nfsd4_lookup {
 struct nfsd4_putfh {
 	u32		pf_fhlen;           /* request */
 	char		*pf_fhval;          /* request */
+	bool		no_verify;	    /* represents foreign fh */
 };
 
 struct nfsd4_open {
-- 
1.8.3.1
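
For readers following the pre-scan added to the compound processing loop (this note is not part of the patch): a minimal userspace sketch of how tracking PUTFH, SAVEFH and RESTOREFH identifies which PUTFH supplied the saved (source) filehandle by the time a COPY is reached. The sk_* types are hypothetical simplifications of struct nfsd4_op. As an assumption of this sketch, both tracking pointers start out NULL and are checked before use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum sk_opnum { SK_PUTFH, SK_SAVEFH, SK_RESTOREFH, SK_COPY, SK_OTHER };

struct sk_op {
	enum sk_opnum opnum;
	bool copy_is_intra;	/* meaningful only for SK_COPY */
	bool no_verify;		/* meaningful only for SK_PUTFH */
};

/*
 * Walk the compound once. "current" is the PUTFH that set the current
 * filehandle, "saved" the PUTFH whose filehandle was saved by SAVEFH.
 * For an inter-server COPY the saved filehandle is the source, so the
 * PUTFH that produced it is marked to tolerate a stale filehandle.
 */
static void sk_mark_foreign_putfh(struct sk_op *ops, size_t n)
{
	struct sk_op *current = NULL, *saved = NULL;
	size_t i;

	assert(ops != NULL);
	for (i = 0; i < n; i++) {
		struct sk_op *op = &ops[i];

		switch (op->opnum) {
		case SK_PUTFH:
			current = op;
			break;
		case SK_SAVEFH:
			saved = current;
			break;
		case SK_RESTOREFH:
			current = saved;
			break;
		case SK_COPY:
			if (saved && !op->copy_is_intra)
				saved->no_verify = true;
			break;
		default:
			break;
		}
	}
}
```

With the usual PUTFH(src), SAVEFH, PUTFH(dst), COPY sequence, only the first PUTFH is marked; the destination filehandle is still verified normally.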


* [PATCH v2 10/10] NFSD add nfs4 inter ssc to nfsd4_copy
  2018-11-30 20:03 [PATCH v2 00/10] server-side support for "inter" SSC copy Olga Kornievskaia
                   ` (8 preceding siblings ...)
  2018-11-30 20:03 ` [PATCH v2 09/10] NFSD: allow inter server COPY to have a STALE source server fh Olga Kornievskaia
@ 2018-11-30 20:03 ` Olga Kornievskaia
  2019-02-19 15:54   ` J. Bruce Fields
  9 siblings, 1 reply; 30+ messages in thread
From: Olga Kornievskaia @ 2018-11-30 20:03 UTC (permalink / raw)
  To: bfields; +Cc: linux-nfs, linux-fsdevel

Given a universal address, mount the source server from the destination
server.  Use an internal mount. Call the NFS client nfs42_ssc_open to
obtain the NFS struct file suitable for nfsd_copy_range.

The ability to do an "inter" server-to-server copy depends on the nfsd
kernel module parameter "inter_copy_offload_enable".

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Andy Adamson <andros@netapp.com>
---
 fs/nfsd/nfs4proc.c   | 274 +++++++++++++++++++++++++++++++++++++++++++++++----
 fs/nfsd/nfssvc.c     |   6 ++
 fs/nfsd/xdr4.h       |   5 +
 include/linux/nfs4.h |   1 +
 4 files changed, 269 insertions(+), 17 deletions(-)

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 2e28254..238c4b7 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -1155,6 +1155,209 @@ void nfsd4_shutdown_copy(struct nfs4_client *clp)
 	while ((copy = nfsd4_get_copy(clp)) != NULL)
 		nfsd4_stop_copy(copy);
 }
+#ifdef CONFIG_NFSD_V4_2_INTER_SSC
+
+extern struct file *nfs42_ssc_open(struct vfsmount *ss_mnt,
+				   struct nfs_fh *src_fh,
+				   nfs4_stateid *stateid);
+extern void nfs42_ssc_close(struct file *filep);
+
+extern void nfs_sb_deactive(struct super_block *sb);
+
+#define NFSD42_INTERSSC_MOUNTOPS "minorversion=2,vers=4,addr=%s"
+
+/**
+ * Support one copy source server for now.
+ */
+static __be32
+nfsd4_interssc_connect(struct nl4_server *nss, struct svc_rqst *rqstp,
+		       struct vfsmount **mount)
+{
+	struct file_system_type *type;
+	struct vfsmount *ss_mnt;
+	struct nfs42_netaddr *naddr;
+	struct sockaddr_storage tmp_addr;
+	size_t tmp_addrlen, match_netid_len = 3;
+	char *startsep = "", *endsep = "", *match_netid = "tcp";
+	char *ipaddr, *dev_name, *raw_data;
+	int len, raw_len, status = -EINVAL;
+
+	naddr = &nss->u.nl4_addr;
+	tmp_addrlen = rpc_uaddr2sockaddr(SVC_NET(rqstp), naddr->addr,
+					 naddr->addr_len,
+					 (struct sockaddr *)&tmp_addr,
+					 sizeof(tmp_addr));
+	if (tmp_addrlen == 0)
+		goto out_err;
+
+	if (tmp_addr.ss_family == AF_INET6) {
+		startsep = "[";
+		endsep = "]";
+		match_netid = "tcp6";
+		match_netid_len = 4;
+	}
+
+	if (naddr->netid_len != match_netid_len ||
+		strncmp(naddr->netid, match_netid, naddr->netid_len))
+		goto out_err;
+
+	/* Construct the raw data for the vfs_kern_mount call */
+	len = RPC_MAX_ADDRBUFLEN + 1;
+	ipaddr = kzalloc(len, GFP_KERNEL);
+	if (!ipaddr)
+		goto out_err;
+
+	rpc_ntop((struct sockaddr *)&tmp_addr, ipaddr, len);
+
+	/* 2 for ipv6 endsep and startsep. 3 for ":/" and trailing '\0' */
+
+	raw_len = strlen(NFSD42_INTERSSC_MOUNTOPS) + strlen(ipaddr);
+	raw_data = kzalloc(raw_len, GFP_KERNEL);
+	if (!raw_data)
+		goto out_free_ipaddr;
+
+	snprintf(raw_data, raw_len, NFSD42_INTERSSC_MOUNTOPS, ipaddr);
+
+	status = -ENODEV;
+	type = get_fs_type("nfs");
+	if (!type)
+		goto out_free_rawdata;
+
+	/* Set the server:<export> for the vfs_kern_mount call */
+	dev_name = kzalloc(len + 5, GFP_KERNEL);
+	if (!dev_name)
+		goto out_free_rawdata;
+	snprintf(dev_name, len + 5, "%s%s%s:/", startsep, ipaddr, endsep);
+
+	/* Use an 'internal' mount: MS_KERNMOUNT -> MNT_INTERNAL */
+	ss_mnt = vfs_kern_mount(type, MS_KERNMOUNT, dev_name, raw_data);
+	module_put(type->owner);
+	if (IS_ERR(ss_mnt))
+		goto out_free_devname;
+
+	status = 0;
+	*mount = ss_mnt;
+
+out_free_devname:
+	kfree(dev_name);
+out_free_rawdata:
+	kfree(raw_data);
+out_free_ipaddr:
+	kfree(ipaddr);
+out_err:
+	return status;
+}
+
+static void
+nfsd4_interssc_disconnect(struct vfsmount *ss_mnt)
+{
+	nfs_sb_deactive(ss_mnt->mnt_sb);
+	mntput(ss_mnt);
+}
+
+/**
+ * nfsd4_setup_inter_ssc
+ *
+ * Verify COPY destination stateid.
+ * Connect to the source server with NFSv4.2.
+ * Create the source struct file for nfsd_copy_range.
+ * Called with COPY cstate:
+ *    SAVED_FH: source filehandle
+ *    CURRENT_FH: destination filehandle
+ *
+ * Returns errno (not nfserrxxx)
+ */
+static __be32
+nfsd4_setup_inter_ssc(struct svc_rqst *rqstp,
+		      struct nfsd4_compound_state *cstate,
+		      struct nfsd4_copy *copy, struct vfsmount **mount)
+{
+	struct svc_fh *s_fh = NULL;
+	stateid_t *s_stid = &copy->cp_src_stateid;
+	__be32 status = -EINVAL;
+
+	/* Verify the destination stateid and set dst struct file*/
+	status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate->current_fh,
+					    &copy->cp_dst_stateid,
+					    WR_STATE, &copy->file_dst, NULL,
+					    NULL);
+	if (status)
+		goto out;
+
+	status = nfsd4_interssc_connect(&copy->cp_src, rqstp, mount);
+	if (status)
+		goto out;
+
+	s_fh = &cstate->save_fh;
+
+	copy->c_fh.size = s_fh->fh_handle.fh_size;
+	memcpy(copy->c_fh.data, &s_fh->fh_handle.fh_base, copy->c_fh.size);
+	copy->stateid.seqid = s_stid->si_generation;
+	memcpy(copy->stateid.other, (void *)&s_stid->si_opaque,
+	       sizeof(stateid_opaque_t));
+
+	status = 0;
+out:
+	return status;
+}
+
+static void
+nfsd4_cleanup_inter_ssc(struct vfsmount *ss_mnt, struct file *src,
+			struct file *dst)
+{
+	nfs42_ssc_close(src);
+	fput(src);
+	fput(dst);
+	mntput(ss_mnt);
+}
+
+#else /* CONFIG_NFSD_V4_2_INTER_SSC */
+
+static __be32
+nfsd4_setup_inter_ssc(struct svc_rqst *rqstp,
+		      struct nfsd4_compound_state *cstate,
+		      struct nfsd4_copy *copy,
+		      struct vfsmount **mount)
+{
+	*mount = NULL;
+	return -EINVAL;
+}
+
+static void
+nfsd4_cleanup_inter_ssc(struct vfsmount *ss_mnt, struct file *src,
+			struct file *dst)
+{
+}
+
+static void
+nfsd4_interssc_disconnect(struct vfsmount *ss_mnt)
+{
+}
+
+static struct file *nfs42_ssc_open(struct vfsmount *ss_mnt,
+				   struct nfs_fh *src_fh,
+				   nfs4_stateid *stateid)
+{
+	return NULL;
+}
+#endif /* CONFIG_NFSD_V4_2_INTER_SSC */
+
+static __be32
+nfsd4_setup_intra_ssc(struct svc_rqst *rqstp,
+		      struct nfsd4_compound_state *cstate,
+		      struct nfsd4_copy *copy)
+{
+	return nfsd4_verify_copy(rqstp, cstate, &copy->cp_src_stateid,
+				 &copy->file_src, &copy->cp_dst_stateid,
+				 &copy->file_dst, NULL);
+}
+
+static void
+nfsd4_cleanup_intra_ssc(struct file *src, struct file *dst)
+{
+	fput(src);
+	fput(dst);
+}
 
 static void nfsd4_cb_offload_release(struct nfsd4_callback *cb)
 {
@@ -1219,12 +1422,16 @@ static __be32 nfsd4_do_copy(struct nfsd4_copy *copy, bool sync)
 		status = nfs_ok;
 	}
 
-	fput(copy->file_src);
-	fput(copy->file_dst);
+	if (!copy->cp_intra) /* Inter server SSC */
+		nfsd4_cleanup_inter_ssc(copy->ss_mnt, copy->file_src,
+					copy->file_dst);
+	else
+		nfsd4_cleanup_intra_ssc(copy->file_src, copy->file_dst);
+
 	return status;
 }
 
-static void dup_copy_fields(struct nfsd4_copy *src, struct nfsd4_copy *dst)
+static int dup_copy_fields(struct nfsd4_copy *src, struct nfsd4_copy *dst)
 {
 	dst->cp_src_pos = src->cp_src_pos;
 	dst->cp_dst_pos = src->cp_dst_pos;
@@ -1234,8 +1441,17 @@ static void dup_copy_fields(struct nfsd4_copy *src, struct nfsd4_copy *dst)
 	memcpy(&dst->fh, &src->fh, sizeof(src->fh));
 	dst->cp_clp = src->cp_clp;
 	dst->file_dst = get_file(src->file_dst);
-	dst->file_src = get_file(src->file_src);
+	dst->cp_intra = src->cp_intra;
+	if (src->cp_intra) /* for inter, file_src doesn't exist yet */
+		dst->file_src = get_file(src->file_src);
 	memcpy(&dst->cp_stateid, &src->cp_stateid, sizeof(src->cp_stateid));
+	memcpy(&dst->cp_src, &src->cp_src, sizeof(struct nl4_server));
+	memcpy(&dst->stateid, &src->stateid, sizeof(src->stateid));
+	memcpy(&dst->c_fh, &src->c_fh, sizeof(src->c_fh));
+	dst->ss_mnt = src->ss_mnt;
+
+	return 0;
+
 }
 
 static void cleanup_async_copy(struct nfsd4_copy *copy)
@@ -1254,7 +1470,18 @@ static int nfsd4_do_async_copy(void *data)
 	struct nfsd4_copy *copy = (struct nfsd4_copy *)data;
 	struct nfsd4_copy *cb_copy;
 
+	if (!copy->cp_intra) { /* Inter server SSC */
+		copy->file_src = nfs42_ssc_open(copy->ss_mnt, &copy->c_fh,
+					      &copy->stateid);
+		if (IS_ERR(copy->file_src)) {
+			copy->nfserr = nfserr_offload_denied;
+			nfsd4_interssc_disconnect(copy->ss_mnt);
+			goto do_callback;
+		}
+	}
+
 	copy->nfserr = nfsd4_do_copy(copy, 0);
+do_callback:
 	cb_copy = kzalloc(sizeof(struct nfsd4_copy), GFP_KERNEL);
 	if (!cb_copy)
 		goto out;
@@ -1278,11 +1505,20 @@ static int nfsd4_do_async_copy(void *data)
 	__be32 status;
 	struct nfsd4_copy *async_copy = NULL;
 
-	status = nfsd4_verify_copy(rqstp, cstate, &copy->cp_src_stateid,
-				   &copy->file_src, &copy->cp_dst_stateid,
-				   &copy->file_dst, NULL);
-	if (status)
-		goto out;
+	if (!copy->cp_intra) { /* Inter server SSC */
+		if (!inter_copy_offload_enable || copy->cp_synchronous) {
+			status = nfserr_notsupp;
+			goto out;
+		}
+		status = nfsd4_setup_inter_ssc(rqstp, cstate, copy,
+					&copy->ss_mnt);
+		if (status)
+			return nfserr_offload_denied;
+	} else {
+		status = nfsd4_setup_intra_ssc(rqstp, cstate, copy);
+		if (status)
+			return status;
+	}
 
 	copy->cp_clp = cstate->clp;
 	memcpy(&copy->fh, &cstate->current_fh.fh_handle,
@@ -1293,15 +1529,15 @@ static int nfsd4_do_async_copy(void *data)
 		status = nfserrno(-ENOMEM);
 		async_copy = kzalloc(sizeof(struct nfsd4_copy), GFP_KERNEL);
 		if (!async_copy)
-			goto out;
-		if (!nfs4_init_copy_state(nn, copy)) {
-			kfree(async_copy);
-			goto out;
-		}
+			goto out_err;
+		if (!nfs4_init_copy_state(nn, copy))
+			goto out_err;
 		refcount_set(&async_copy->refcount, 1);
 		memcpy(&copy->cp_res.cb_stateid, &copy->cp_stateid,
 			sizeof(copy->cp_stateid));
-		dup_copy_fields(copy, async_copy);
+		status = dup_copy_fields(copy, async_copy);
+		if (status)
+			goto out_err;
 		async_copy->copy_task = kthread_create(nfsd4_do_async_copy,
 				async_copy, "%s", "copy thread");
 		if (IS_ERR(async_copy->copy_task))
@@ -1312,13 +1548,17 @@ static int nfsd4_do_async_copy(void *data)
 		spin_unlock(&async_copy->cp_clp->async_lock);
 		wake_up_process(async_copy->copy_task);
 		status = nfs_ok;
-	} else
+	} else {
 		status = nfsd4_do_copy(copy, 1);
+	}
 out:
 	return status;
 out_err:
 	cleanup_async_copy(async_copy);
-	goto out;
+	status = nfserrno(-ENOMEM);
+	if (!copy->cp_intra)
+		nfsd4_interssc_disconnect(copy->ss_mnt);
+	goto out;
 }
 
 struct nfsd4_copy *
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index 89cb484..9d254e7 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -30,6 +30,12 @@
 
 #define NFSDDBG_FACILITY	NFSDDBG_SVC
 
+bool inter_copy_offload_enable;
+EXPORT_SYMBOL_GPL(inter_copy_offload_enable);
+module_param(inter_copy_offload_enable, bool, 0644);
+MODULE_PARM_DESC(inter_copy_offload_enable,
+		 "Enable inter server to server copy offload. Default: false");
+
 extern struct svc_program	nfsd_program;
 static int			nfsd(void *vrqstp);
 
diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h
index fbd18d6..bb2f8e5 100644
--- a/fs/nfsd/xdr4.h
+++ b/fs/nfsd/xdr4.h
@@ -547,7 +547,12 @@ struct nfsd4_copy {
 	struct task_struct	*copy_task;
 	refcount_t		refcount;
 	bool			stopped;
+
+	struct vfsmount		*ss_mnt;
+	struct nfs_fh		c_fh;
+	nfs4_stateid		stateid;
 };
+extern bool inter_copy_offload_enable;
 
 struct nfsd4_seek {
 	/* request */
diff --git a/include/linux/nfs4.h b/include/linux/nfs4.h
index 9e49a6c..fa4b411 100644
--- a/include/linux/nfs4.h
+++ b/include/linux/nfs4.h
@@ -17,6 +17,7 @@
 #include <linux/uidgid.h>
 #include <uapi/linux/nfs4.h>
 #include <linux/sunrpc/msg_prot.h>
+#include <linux/nfs.h>
 
 enum nfs4_acl_whotype {
 	NFS4_ACL_WHO_NAMED = 0,
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 01/10] VFS generic copy_file_range() support
  2018-11-30 20:03 ` [PATCH v2 01/10] VFS generic copy_file_range() support Olga Kornievskaia
@ 2018-12-01  8:11   ` Amir Goldstein
  2018-12-01 13:23     ` Olga Kornievskaia
  2018-12-01 22:00     ` Dave Chinner
  2018-12-01 21:18   ` Matthew Wilcox
  1 sibling, 2 replies; 30+ messages in thread
From: Amir Goldstein @ 2018-12-01  8:11 UTC (permalink / raw)
  To: Olga Kornievskaia
  Cc: bfields, Linux NFS Mailing List, linux-fsdevel, Dave Chinner,
	Matthew Wilcox, Jeff Layton, Steve French

On Fri, Nov 30, 2018 at 10:04 PM Olga Kornievskaia
<olga.kornievskaia@gmail.com> wrote:
>
> Relax the condition that input files must be from the same
> file systems.
>
> Add checks that input parameters adhere semantics.
>
> If no copy_file_range() support is found, then do generic
> checks for the unsupported page cache ranges, LFS, limits,
> and clear setuid/setgid if not running as root before calling
> do_splice_direct(). Update atime,ctime,mtime afterwards.
>
> Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> ---

This patch is either going to bring you down or make you stronger ;-)

This is not how it's done. Behavior change and refactoring mixed into
one patch is wrong for several reasons. And when you relax same sb
check you need to restrict it inside filesystems, like your previous patch
did.

You already had the v7 patch reviewed by 4 developers.
What made you go and change it (and post it as v2)?

Your intentions were good trying to fix the broken syscall, but
I hope you understood that Dave didn't mean that you *have* to
add the missing generic checks as part of your work. He just
pointed out how broken the current interface is in the context of
reviewing your patch.

In any case, I hear that Dave is neck deep in fixing copy_file_range()
so changes to this function should be coordinated with him. Or better
yet, wait until he posts his fixes and carry on from there.

If I were you, I would just go back to the reviewed v7 vfs patch.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 01/10] VFS generic copy_file_range() support
  2018-12-01  8:11   ` Amir Goldstein
@ 2018-12-01 13:23     ` Olga Kornievskaia
  2018-12-01 13:44       ` Olga Kornievskaia
  2018-12-01 22:00     ` Dave Chinner
  1 sibling, 1 reply; 30+ messages in thread
From: Olga Kornievskaia @ 2018-12-01 13:23 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: J. Bruce Fields, linux-nfs, linux-fsdevel, david, willy, jlayton,
	stfrench

On Sat, Dec 1, 2018 at 3:11 AM Amir Goldstein <amir73il@gmail.com> wrote:
>
> On Fri, Nov 30, 2018 at 10:04 PM Olga Kornievskaia
> <olga.kornievskaia@gmail.com> wrote:
> >
> > Relax the condition that input files must be from the same
> > file systems.
> >
> > Add checks that input parameters adhere semantics.
> >
> > If no copy_file_range() support is found, then do generic
> > checks for the unsupported page cache ranges, LFS, limits,
> > and clear setuid/setgid if not running as root before calling
> > do_splice_direct(). Update atime,ctime,mtime afterwards.
> >
> > Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> > ---
>
> This patch is either going to bring you down or make you stronger ;-)
>
> This is not how its done. Behavior change and refactoring mixed into
> one patch is wrong for several reasons. And when you relax same sb
> check you need to restrict it inside filesystems, like your previous patch
> did.
>
> You already had v7 patch reviewed-by 4 developers.
> What made you go and change it (and posted as v2)?
>
> Your intentions were good trying to fix the broken syscall, but
> I hope you understood that Dave didn't mean that you *have* to
> add the missing generic checks as part of your work. He just
> pointed out how broken the current interface is in the context of
> reviewing your patch.
>
> In any case, I hear that Dave is neck deep in fixing copy_file_range()
> so changes to this function should be collaborated with him. Or better
> yet, wait until he posts his fixes and carry on from there.
>
> If I were you, I would just go back to the reviewed v7 vfs patch.

This is NOT a replacement to the v7 vfs patch??? This is a new patch
on top of that one.

I assume that v7 patch has been OK-ed by everybody and is ready to go in???

As you recall, what was left was to provide the functionality to relax
the check for the superblocks to be the same before calling
do_splice_direct(). This patch attempts to do this. I was under the
impression that doing so required extra checks, which I added.


>
> Thanks,
> Amir.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 01/10] VFS generic copy_file_range() support
  2018-12-01 13:23     ` Olga Kornievskaia
@ 2018-12-01 13:44       ` Olga Kornievskaia
       [not found]         ` <CAOQ4uxgENLCDH7QwtBPxA60dKEXvLVknBMY_Lgoetq_uQ=7gwA@mail.gmail.com>
  0 siblings, 1 reply; 30+ messages in thread
From: Olga Kornievskaia @ 2018-12-01 13:44 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: J. Bruce Fields, linux-nfs, linux-fsdevel, david, willy, jlayton,
	stfrench

On Sat, Dec 1, 2018 at 8:23 AM Olga Kornievskaia
<olga.kornievskaia@gmail.com> wrote:
>
> On Sat, Dec 1, 2018 at 3:11 AM Amir Goldstein <amir73il@gmail.com> wrote:
> >
> > On Fri, Nov 30, 2018 at 10:04 PM Olga Kornievskaia
> > <olga.kornievskaia@gmail.com> wrote:
> > >
> > > Relax the condition that input files must be from the same
> > > file systems.
> > >
> > > Add checks that input parameters adhere semantics.
> > >
> > > If no copy_file_range() support is found, then do generic
> > > checks for the unsupported page cache ranges, LFS, limits,
> > > and clear setuid/setgid if not running as root before calling
> > > do_splice_direct(). Update atime,ctime,mtime afterwards.
> > >
> > > Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> > > ---
> >
> > This patch is either going to bring you down or make you stronger ;-)
> >
> > This is not how its done. Behavior change and refactoring mixed into
> > one patch is wrong for several reasons. And when you relax same sb
> > check you need to restrict it inside filesystems, like your previous patch
> > did.
> >
> > You already had v7 patch reviewed-by 4 developers.
> > What made you go and change it (and posted as v2)?
> >
> > Your intentions were good trying to fix the broken syscall, but
> > I hope you understood that Dave didn't mean that you *have* to
> > add the missing generic checks as part of your work. He just
> > pointed out how broken the current interface is in the context of
> > reviewing your patch.
> >
> > In any case, I hear that Dave is neck deep in fixing copy_file_range()
> > so changes to this function should be collaborated with him. Or better
> > yet, wait until he posts his fixes and carry on from there.
> >
> > If I were you, I would just go back to the reviewed v7 vfs patch.
>
> This is NOT a replacement to the v7 vfs patch??? This is a new patch
> on top of that one.
>
> I assume that v7 patch has been OK-ed by everybody and is ready to go in???
>
> As you recall, what was left is to provide the functionality to relax
> the check for the superblocks to be the same before calling the
> do_splice_direct(). This patch attempt do this. I was under the
> impression that to do so extra checks were needed to be added which I
> added.
>

To clarify, previously I had a VFS patch with the client-side series
to support "server to server" copy offload. It needed the
functionality to be able to call copy_file_range with different super
blocks.

This patch series is for the server side support for the "server to
server" copy offload. It requires the ability to call copy_file_range()
and do a copy between NFS and a local file system. Thus it needs
generic_copy_file_range.

>
> >
> > Thanks,
> > Amir.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 01/10] VFS generic copy_file_range() support
       [not found]           ` <CAN-5tyFGV=fUCbAG5mSvy=LXDpdp8VG9Sh1aGMkBHQAG1Rp1sQ@mail.gmail.com>
@ 2018-12-01 16:59             ` Amir Goldstein
  0 siblings, 0 replies; 30+ messages in thread
From: Amir Goldstein @ 2018-12-01 16:59 UTC (permalink / raw)
  To: Olga Kornievskaia
  Cc: bfields, Dave Chinner, Jeff Layton, Matthew Wilcox,
	linux-fsdevel, Linux NFS Mailing List

On Sat, Dec 1, 2018 at 5:57 PM Olga Kornievskaia
<olga.kornievskaia@gmail.com> wrote:
>
> On Sat, Dec 1, 2018 at 9:03 AM Amir Goldstein <amir73il@gmail.com> wrote:
> >
> >
> >
> > On Sat, Dec 1, 2018, 3:44 PM Olga Kornievskaia <olga.kornievskaia@gmail.com wrote:
> >>
> >> On Sat, Dec 1, 2018 at 8:23 AM Olga Kornievskaia
> >> <olga.kornievskaia@gmail.com> wrote:
> >> >
> >> > On Sat, Dec 1, 2018 at 3:11 AM Amir Goldstein <amir73il@gmail.com> wrote:
> >> > >
> >> > > On Fri, Nov 30, 2018 at 10:04 PM Olga Kornievskaia
> >> > > <olga.kornievskaia@gmail.com> wrote:
> >> > > >
> >> > > > Relax the condition that input files must be from the same
> >> > > > file systems.
> >> > > >
> >> > > > Add checks that input parameters adhere semantics.
> >> > > >
> >> > > > If no copy_file_range() support is found, then do generic
> >> > > > checks for the unsupported page cache ranges, LFS, limits,
> >> > > > and clear setuid/setgid if not running as root before calling
> >> > > > do_splice_direct(). Update atime,ctime,mtime afterwards.
> >> > > >
> >> > > > Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> >> > > > ---
> >> > >
> >> > > This patch is either going to bring you down or make you stronger ;-)
> >> > >
> >> > > This is not how its done. Behavior change and refactoring mixed into
> >> > > one patch is wrong for several reasons. And when you relax same sb
> >> > > check you need to restrict it inside filesystems, like your previous patch
> >> > > did.
> >> > >
> >> > > You already had v7 patch reviewed-by 4 developers.
> >> > > What made you go and change it (and posted as v2)?
> >> > >
> >> > > Your intentions were good trying to fix the broken syscall, but
> >> > > I hope you understood that Dave didn't mean that you *have* to
> >> > > add the missing generic checks as part of your work. He just
> >> > > pointed out how broken the current interface is in the context of
> >> > > reviewing your patch.
> >> > >
> >> > > In any case, I hear that Dave is neck deep in fixing copy_file_range()
> >> > > so changes to this function should be collaborated with him. Or better
> >> > > yet, wait until he posts his fixes and carry on from there.
> >> > >
> >> > > If I were you, I would just go back to the reviewed v7 vfs patch.
> >> >
> >> > This is NOT a replacement to the v7 vfs patch??? This is a new patch
> >> > on top of that one.
> >> >
> >> > I assume that v7 patch has been OK-ed by everybody and is ready to go in???
> >> >
> >> > As you recall, what was left is to provide the functionality to relax
> >> > the check for the superblocks to be the same before calling the
> >> > do_splice_direct(). This patch attempt do this. I was under the
> >> > impression that to do so extra checks were needed to be added which I
> >> > added.
> >> >
> >>
> >> To clarify, previously I had a VFS patch with the client-side series
> >> to support "server to server" copy offload. It needed the
> >> functionality to be able to call copy_file_range with different super
> >> blocks.
> >>
> >> This patch series is for the server side support for the "server to
> >> server" copy offload. It requires ability to call copy_file_range()
> >> and do a copy between NFS and a local file system. Thus it needs
> >> generic_copy_file_range.
> >
> >
> > Ah. Sorry for the confusion.
> > My comment on change of behavior and refactoring in same patch still hold.
> > My comment about coordinate your work with Dave Chinner still hold.
>
> Understood. I will email Dave directly and coordinate.
>
> > Raise that with a comment about adding test coverage to the new
> > generic cross fs copy API to xfstest.
>
> What kind of extra coverage are you envisioning? Something that
> requires two different file systems mounted and then does a fs copy?
>

Yes, if you add this functionality you should add test coverage for the
added functionality. It's not going to be trivial to add cross fs type tests
to xfstests, but adding cross fs (same type) should be relatively easy
(copy_file_range from test fs to scratch fs).

> > Am I mistaken that this change affects any cross fs copy file range
> > by userspace and not only by kernel nfsd?
>
> That's correct, any cross fs copy is what I'm going for here.
>

Forgive me for being thick. After briefly going over the patches, I still don't
understand if you *need* to add generic cross fs copy to implement
server side copy support in nfsd? Or if you are adding it as an added bonus
to the community along with your SSC patch set?

The first two patches of the series seem unrelated to the rest, but maybe
I'm just not getting the connection?

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 01/10] VFS generic copy_file_range() support
  2018-11-30 20:03 ` [PATCH v2 01/10] VFS generic copy_file_range() support Olga Kornievskaia
  2018-12-01  8:11   ` Amir Goldstein
@ 2018-12-01 21:18   ` Matthew Wilcox
  2018-12-01 22:36     ` Dave Chinner
  1 sibling, 1 reply; 30+ messages in thread
From: Matthew Wilcox @ 2018-12-01 21:18 UTC (permalink / raw)
  To: Olga Kornievskaia; +Cc: bfields, linux-nfs, linux-fsdevel

On Fri, Nov 30, 2018 at 03:03:39PM -0500, Olga Kornievskaia wrote:
> Relax the condition that input files must be from the same
> file systems.

> +	ret = do_splice_direct(file_in, &pos_in, file_out, &pos_out,
> +			count > MAX_RW_COUNT ? MAX_RW_COUNT : count, 0);

Wasn't there a concern about splicing between filesystems with different
block sizes mentioned the last time this came up?  I can't find a citation
for that now.

> -	/* this could be relaxed once generic cross fs support is added */
> -	if (inode_in->i_sb != inode_out->i_sb) {
> -		ret = -EXDEV;
> -		goto done;
> -	}

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 01/10] VFS generic copy_file_range() support
  2018-12-01  8:11   ` Amir Goldstein
  2018-12-01 13:23     ` Olga Kornievskaia
@ 2018-12-01 22:00     ` Dave Chinner
  2018-12-02  3:12       ` Olga Kornievskaia
  1 sibling, 1 reply; 30+ messages in thread
From: Dave Chinner @ 2018-12-01 22:00 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Olga Kornievskaia, bfields, Linux NFS Mailing List,
	linux-fsdevel, Matthew Wilcox, Jeff Layton, Steve French

On Sat, Dec 01, 2018 at 10:11:48AM +0200, Amir Goldstein wrote:
> On Fri, Nov 30, 2018 at 10:04 PM Olga Kornievskaia
> <olga.kornievskaia@gmail.com> wrote:
> >
> > Relax the condition that input files must be from the same
> > file systems.
> >
> > Add checks that input parameters adhere semantics.
> >
> > If no copy_file_range() support is found, then do generic
> > checks for the unsupported page cache ranges, LFS, limits,
> > and clear setuid/setgid if not running as root before calling
> > do_splice_direct(). Update atime,ctime,mtime afterwards.
> >
> > Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> > ---
> 
> This patch is either going to bring you down or make you stronger ;-)
> 
> This is not how its done. Behavior change and refactoring mixed into
> one patch is wrong for several reasons. And when you relax same sb
> check you need to restrict it inside filesystems, like your previous patch
> did.
.....
> In any case, I hear that Dave is neck deep in fixing copy_file_range()
> so changes to this function should be collaborated with him. Or better
> yet, wait until he posts his fixes and carry on from there.

Yeah, because I've heard nothing for a month and this is kinda
important, I have a series of 8-9 patches that make all the fixes we
need, push the cross-filesystem checks down into the filesystems,
and let filesystems handle the fallback to a splice based copy
themselves (because there are way more fallback cases than just
EOPNOTSUPP and EXDEV).

I also have a patch for the man page that documents all the missing
failure cases, and documents where things are filesystem specific or
not.

And I also have a fstests patch that exercises all the failure cases
so that all filesystems will end up behaving the same way for all
the same cases they should.

I'm still sorting out the fstests patch (it requires changes
to xfs_io's copy-range command) so I've got some confidence that the
code actually does what it says in the man page, but I should have
that sorted in a couple of days.

Cheers,

Dave.

-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 01/10] VFS generic copy_file_range() support
  2018-12-01 21:18   ` Matthew Wilcox
@ 2018-12-01 22:36     ` Dave Chinner
  0 siblings, 0 replies; 30+ messages in thread
From: Dave Chinner @ 2018-12-01 22:36 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: Olga Kornievskaia, bfields, linux-nfs, linux-fsdevel

On Sat, Dec 01, 2018 at 01:18:06PM -0800, Matthew Wilcox wrote:
> On Fri, Nov 30, 2018 at 03:03:39PM -0500, Olga Kornievskaia wrote:
> > Relax the condition that input files must be from the same
> > file systems.
> 
> > +	ret = do_splice_direct(file_in, &pos_in, file_out, &pos_out,
> > +			count > MAX_RW_COUNT ? MAX_RW_COUNT : count, 0);
> 
> Wasn't there a concern about splicing between filesystems with different
> block sizes mentioned the last time this came up?  I can't find a citation
> for that now.

The filesystems should be able to handle that themselves - they are
just passed an iter that has a range of data regions in pages that
they copy the required data into/out of. The data transfer mechanism
itself is completely independent of filesystem block sizes....

There's lots of other problems with do_splice_direct, but I don't
think this is one of them. I could be wrong - this code has pretty
much zero documentation on how it is supposed to work and what it is
supposed to do - so don't take my word for it...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 01/10] VFS generic copy_file_range() support
  2018-12-01 22:00     ` Dave Chinner
@ 2018-12-02  3:12       ` Olga Kornievskaia
  2018-12-02 15:19         ` Olga Kornievskaia
  2018-12-02 20:47         ` Dave Chinner
  0 siblings, 2 replies; 30+ messages in thread
From: Olga Kornievskaia @ 2018-12-02  3:12 UTC (permalink / raw)
  To: david
  Cc: Amir Goldstein, J. Bruce Fields, linux-nfs, linux-fsdevel, willy,
	jlayton, stfrench

On Sat, Dec 1, 2018 at 5:00 PM Dave Chinner <david@fromorbit.com> wrote:
>
> On Sat, Dec 01, 2018 at 10:11:48AM +0200, Amir Goldstein wrote:
> > On Fri, Nov 30, 2018 at 10:04 PM Olga Kornievskaia
> > <olga.kornievskaia@gmail.com> wrote:
> > >
> > > Relax the condition that input files must be from the same
> > > file systems.
> > >
> > > Add checks that input parameters adhere semantics.
> > >
> > > If no copy_file_range() support is found, then do generic
> > > checks for the unsupported page cache ranges, LFS, limits,
> > > and clear setuid/setgid if not running as root before calling
> > > do_splice_direct(). Update atime,ctime,mtime afterwards.
> > >
> > > Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> > > ---
> >
> > This patch is either going to bring you down or make you stronger ;-)
> >
> > This is not how its done. Behavior change and refactoring mixed into
> > one patch is wrong for several reasons. And when you relax same sb
> > check you need to restrict it inside filesystems, like your previous patch
> > did.
> .....
> > In any case, I hear that Dave is neck deep in fixing copy_file_range()
> > so changes to this function should be collaborated with him. Or better
> > yet, wait until he posts his fixes and carry on from there.
>
> Yeah, because I've heard nothing for a month and this is kinda
> important

Dave I think that's unfair. It is important. NFS is actually the file
system that needed VFS support for cross fs copy_file_range and I was
working on it. If you were in doubt, you could have emailed and asked
me.

I'm unsure now what this means. I have a patch series with a VFS
patch that went through extensive review (people spent time on it)
and an NFS patch series that depends on it that is ready for the
upstream push. Are you saying that the VFS patch is no longer welcome
and thus the NFS series is no longer viable either?

> , I have a series of 8-9 patches that make all the fixes we
> need, push the cross-filesystem checks down into the filesystems,
> and let filesystems handle the fallback to a splice based copy
> themselves (because there are way more fallback cases than just
> EOPNOPSUPP and EXDEV).

Are you saying it is each individual filesystem's responsibility to
fall back on splice? Isn't that a step backwards? Each individual
filesystem is going to implement the same code of calling
do_splice_direct() to provide functionality that could and should be in
the VFS?

>
> I also have a patch for the man page that document all the missing
> failure cases, and document where things are filesystem specific or
> not.
>
> And I also have a fstests patch that exercises all the failure cases
> so that all filesystems will end up behaving the same way for all
> the same cases they should.
>
> I'm still sorting out the fstests patch (it requires changes
> to xfs_io's copy-range command) so I've got some confidence that the
> code actually does what it says in the man page, but I should have
> that sorted in a couple of days.
>
> Cheers,
>
> Dave.
>
> --
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 01/10] VFS generic copy_file_range() support
  2018-12-02  3:12       ` Olga Kornievskaia
@ 2018-12-02 15:19         ` Olga Kornievskaia
  2018-12-02 20:47         ` Dave Chinner
  1 sibling, 0 replies; 30+ messages in thread
From: Olga Kornievskaia @ 2018-12-02 15:19 UTC (permalink / raw)
  To: david
  Cc: Amir Goldstein, J. Bruce Fields, linux-nfs, linux-fsdevel, willy,
	jlayton, stfrench

On Sat, Dec 1, 2018 at 10:12 PM Olga Kornievskaia
<olga.kornievskaia@gmail.com> wrote:
>
> On Sat, Dec 1, 2018 at 5:00 PM Dave Chinner <david@fromorbit.com> wrote:
> >
> > On Sat, Dec 01, 2018 at 10:11:48AM +0200, Amir Goldstein wrote:
> > > On Fri, Nov 30, 2018 at 10:04 PM Olga Kornievskaia
> > > <olga.kornievskaia@gmail.com> wrote:
> > > >
> > > > Relax the condition that input files must be from the same
> > > > file systems.
> > > >
> > > > Add checks that input parameters adhere semantics.
> > > >
> > > > If no copy_file_range() support is found, then do generic
> > > > checks for the unsupported page cache ranges, LFS, limits,
> > > > and clear setuid/setgid if not running as root before calling
> > > > do_splice_direct(). Update atime,ctime,mtime afterwards.
> > > >
> > > > Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> > > > ---
> > >
> > > This patch is either going to bring you down or make you stronger ;-)
> > >
> > > This is not how its done. Behavior change and refactoring mixed into
> > > one patch is wrong for several reasons. And when you relax same sb
> > > check you need to restrict it inside filesystems, like your previous patch
> > > did.
> > .....
> > > In any case, I hear that Dave is neck deep in fixing copy_file_range()
> > > so changes to this function should be collaborated with him. Or better
> > > yet, wait until he posts his fixes and carry on from there.
> >
> > Yeah, because I've heard nothing for a month and this is kinda
> > important
>
> Dave I think that's unfair. It is important. NFS is actually the file
> system that needed VFS support for cross fs copy_file_range and I was
> working on it. If you were in doubt, you could have emailed and asked
> me.

Just to be clear. What I think was unfair in that comment was the
wording "this is kinda important". I think a lot stems from lack of
clarity in the mailing list communications. I object to the fact
that it wasn't clear who was going to implement the functionality.
Since the work was needed by NFS I didn't want to assume that somebody
in VFS would just do it for us. At the time nobody in VFS stood up and
said they would do the work and thus I tried to do my best.

I'm grateful, and would have been in the first place, that somebody
did support generic cross-filesystem functionality. Thus I'm by no
means speaking against Dave's work.

> I'm unsure now what does this mean. I have a patch series with a VFS
> patch that went thru the extensive review (people spend time on it)
> and an NFS patch series that depends on it that is ready for the
> upstream push. Are you saying that the VFS patch is no longer welcomed
> and thus NFS series is no longer viable either?

I'm unclear about the fate of the patch set that has the (v7) VFS patch
that was reviewed and approved and is thought to be pushed for 4.21.
It is unclear if the new work is on top of that or not.

> > , I have a series of 8-9 patches that make all the fixes we
> > need, push the cross-filesystem checks down into the filesystems,
> > and let filesystems handle the fallback to a splice based copy
> > themselves (because there are way more fallback cases than just
> > EOPNOPSUPP and EXDEV).
>
> Are you saying it is each individual filesystem responsibility to
> fallback on splice? Isn't that a step backwards? Each individual
> filesystem is going to implement the same code of calling
> do_splice_direct() to do the functionally that could and should be in
> VFS?
>
> >
> > I also have a patch for the man page that document all the missing
> > failure cases, and document where things are filesystem specific or
> > not.
> >
> > And I also have a fstests patch that exercises all the failure cases
> > so that all filesystems will end up behaving the same way for all
> > the same cases they should.
> >
> > I'm still sorting out the fstests patch (it requires changes
> > to xfs_io's copy-range command) so I've got some confidence that the
> > code actually does what it says in the man page, but I should have
> > that sorted in a couple of days.
> >
> > Cheers,
> >
> > Dave.
> >
> > --
> > Dave Chinner
> > david@fromorbit.com

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 01/10] VFS generic copy_file_range() support
  2018-12-02  3:12       ` Olga Kornievskaia
  2018-12-02 15:19         ` Olga Kornievskaia
@ 2018-12-02 20:47         ` Dave Chinner
  1 sibling, 0 replies; 30+ messages in thread
From: Dave Chinner @ 2018-12-02 20:47 UTC (permalink / raw)
  To: Olga Kornievskaia
  Cc: Amir Goldstein, J. Bruce Fields, linux-nfs, linux-fsdevel, willy,
	jlayton, stfrench

On Sat, Dec 01, 2018 at 10:12:05PM -0500, Olga Kornievskaia wrote:
> On Sat, Dec 1, 2018 at 5:00 PM Dave Chinner <david@fromorbit.com> wrote:
> >
> > On Sat, Dec 01, 2018 at 10:11:48AM +0200, Amir Goldstein wrote:
> > > On Fri, Nov 30, 2018 at 10:04 PM Olga Kornievskaia
> > > <olga.kornievskaia@gmail.com> wrote:
> > > >
> > > > Relax the condition that input files must be from the same
> > > > file systems.
> > > >
> > > > Add checks that input parameters adhere semantics.
> > > >
> > > > If no copy_file_range() support is found, then do generic
> > > > checks for the unsupported page cache ranges, LFS, limits,
> > > > and clear setuid/setgid if not running as root before calling
> > > > do_splice_direct(). Update atime,ctime,mtime afterwards.
> > > >
> > > > Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> > > > ---
> > >
> > > This patch is either going to bring you down or make you stronger ;-)
> > >
> > > This is not how its done. Behavior change and refactoring mixed into
> > > one patch is wrong for several reasons. And when you relax same sb
> > > check you need to restrict it inside filesystems, like your previous patch
> > > did.
> > .....
> > > In any case, I hear that Dave is neck deep in fixing copy_file_range()
> > > so changes to this function should be collaborated with him. Or better
> > > yet, wait until he posts his fixes and carry on from there.
> >
> > Yeah, because I've heard nothing for a month and this is kinda
> > important
> 
> Dave I think that's unfair. It is important. NFS is actually the file
> system that needed VFS support for cross fs copy_file_range and I was
> working on it. If you were in doubt, you could have emailed and asked
> me.

Last I heard from you was "this isn't my problem and I don't have
time to deal with it". You were fairly unambiguous in saying you
weren't going to spend any time on it.

> I'm unsure now what does this mean. I have a patch series with a VFS
> patch that went thru the extensive review (people spend time on it)
> and an NFS patch series that depends on it that is ready for the
> upstream push. Are you saying that the VFS patch is no longer welcomed
> and thus NFS series is no longer viable either?

No, I'm saying that this is urgent work and needs to be separated
from the NFS patch series, of which there are now two and you've
split copy_file_range() changes across both patch sets.
copy_file_range() is broken for *everyone*, not just NFS.  i.e.
fixing these problems should not be tied to some other filesystem
feature patchset.

> > , I have a series of 8-9 patches that make all the fixes we
> > need, push the cross-filesystem checks down into the filesystems,
> > and let filesystems handle the fallback to a splice based copy
> > themselves (because there are way more fallback cases than just
> > EOPNOPSUPP and EXDEV).
> 
> Are you saying it is each individual filesystem responsibility to
> fallback on splice? Isn't that a step backwards? Each individual
> filesystem is going to implement the same code of calling
> do_splice_direct() to do the functionally that could and should be in
> VFS?

I've done this because one of the problems I've found is that
different filesystems *do not fall back consistently*. e.g. the NFS
client will return -EINVAL if src/dst are the same file, but -EINVAL
is not one of the errors that the vfs code falls back to a data copy
on.

This is despite the fact that the fallback path can copy to/from
the same file, we support same file copy through the
->remap_file_range offload, etc. IOWs, the behaviour of the syscall
when it comes to single file ranges is completely inconsistent
because fallbacks are implemented on a filesystem-by-filesystem
basis.

I called the fallback generic_copy_file_range(), and filesystems that
implement ->copy_file_range() are responsible for calling it
themselves if they want a fallback. That's because there may be
different error/constraint conditions at the filesystem level that
prevent offloading the copy, and we can't distinguish at the VFS
between "-EINVAL means fallback because it was a single file copy"
and "-EINVAL means fail, parameter out of range".

IOWs, if you implement ->copy_file_range() you take full
responsibility for implementing the copying function. This is
exactly what we do for all the other file methods, so this is just
making the implementation behaviour consistent with the rest of the
code.
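
A minimal userspace sketch of the dispatch rule described above,
assuming hypothetical names (toy_file, toy_fs_ops, toy_fs_copy,
toy_vfs_copy, toy_generic_copy). It models the idea only and is not
code from the patch series or from the kernel. The filesystem method
decides for itself when to call the generic copy, so the dispatcher
never has to guess what -EINVAL means:

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical userspace model; none of these names are kernel symbols. */
struct toy_file;

struct toy_fs_ops {
	/* NULL when the filesystem provides no copy method of its own. */
	ssize_t (*copy_file_range)(struct toy_file *in, size_t pos_in,
				   struct toy_file *out, size_t pos_out,
				   size_t len);
};

struct toy_file {
	const struct toy_fs_ops *ops;
	char data[64];
	size_t size;
};

static int toy_offload_count;	/* how many copies took the offload path */

/* Plain byte copy, standing in for the splice-based fallback. */
static ssize_t toy_generic_copy(struct toy_file *in, size_t pos_in,
				struct toy_file *out, size_t pos_out,
				size_t len)
{
	if (pos_in > in->size || pos_out > sizeof(out->data))
		return -EINVAL;
	if (len > in->size - pos_in)
		len = in->size - pos_in;		/* short copy at EOF */
	if (len > sizeof(out->data) - pos_out)
		len = sizeof(out->data) - pos_out;	/* toy capacity limit */
	memmove(out->data + pos_out, in->data + pos_in, len);
	if (pos_out + len > out->size)
		out->size = pos_out + len;
	return (ssize_t)len;
}

/* A filesystem method whose offload cannot handle same-file copies. */
static ssize_t toy_fs_copy(struct toy_file *in, size_t pos_in,
			   struct toy_file *out, size_t pos_out, size_t len)
{
	if (pos_in > in->size)
		return -EINVAL;		/* hard failure: parameter out of range */
	if (in == out)			/* offload impossible: fall back ourselves */
		return toy_generic_copy(in, pos_in, out, pos_out, len);
	toy_offload_count++;		/* pretend the copy was offloaded */
	return toy_generic_copy(in, pos_in, out, pos_out, len);
}

/* The dispatcher never interprets error codes to decide on a fallback. */
static ssize_t toy_vfs_copy(struct toy_file *in, size_t pos_in,
			    struct toy_file *out, size_t pos_out, size_t len)
{
	if (out->ops && out->ops->copy_file_range)
		return out->ops->copy_file_range(in, pos_in, out, pos_out, len);
	return toy_generic_copy(in, pos_in, out, pos_out, len);
}

static const struct toy_fs_ops toy_ops = { .copy_file_range = toy_fs_copy };

/* Small usage example; returns 0 when every case behaves as described. */
int toy_demo(void)
{
	struct toy_file a = { .ops = &toy_ops, .data = "hello world", .size = 11 };
	struct toy_file b = { .ops = &toy_ops };

	toy_offload_count = 0;
	assert(toy_vfs_copy(&a, 0, &b, 0, 5) == 5);	/* offloaded */
	assert(toy_offload_count == 1);
	assert(toy_vfs_copy(&a, 0, &a, 20, 5) == 5);	/* same file: fallback */
	assert(toy_offload_count == 1);
	assert(toy_vfs_copy(&a, 100, &b, 0, 5) == -EINVAL); /* genuine failure */
	return memcmp(b.data, "hello", 5) != 0;
}
```

In this model the same-file case and the out-of-range case would both
be reported as -EINVAL by a method without a fallback, which is why the
choice is left to the method rather than to the caller.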

FWIW, this also points out a problem with the copy_file_range()
definition - it does not say WTF should happen if the copy ranges
/overlap/ in the same file. clone is clear on that - support is
determined by the filesystem (i.e. "EINVAL [...] XFS and Btrfs do
not support overlapping reflink ranges in the same file."). For
copying, the fallback code can't copy the file data correctly if the
ranges overlap, so I've added checks to make this illegal and added
that overlapping ranges are not supported to the man page.....
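
As an illustration of the overlap check mentioned above, here is a
minimal userspace sketch in C. The helper name copy_ranges_overlap()
is hypothetical and is not taken from the patches; it only shows the
interval test that would let a same-file copy be rejected before any
data moves. It assumes pos + len does not overflow.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not from the patches under discussion.
 * Returns true when the half-open byte ranges [pos_in, pos_in + len)
 * and [pos_out, pos_out + len) share at least one byte.
 * Assumes pos + len does not overflow uint64_t.
 */
static bool copy_ranges_overlap(uint64_t pos_in, uint64_t pos_out,
				uint64_t len)
{
	if (len == 0)
		return false;	/* an empty range overlaps nothing */
	return pos_in < pos_out + len && pos_out < pos_in + len;
}
```

A caller would apply this only when source and destination are the
same file, and return an error such as -EINVAL when it reports true.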

These are the sort of API definition problems that I'm fixing
right now, and I'm writing tests to make sure that all filesystems
will behave the same way for given copy scenarios.

i.e. I'm not doing this so I can get a NFS feature patchset merged,
I'm doing this to make the copy_file_range API well defined and
robust and allow implementations to be verified against the
specification the man page lays out.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 09/10] NFSD: allow inter server COPY to have a STALE source server fh
  2018-11-30 20:03 ` [PATCH v2 09/10] NFSD: allow inter server COPY to have a STALE source server fh Olga Kornievskaia
@ 2019-02-19 15:53   ` J. Bruce Fields
  0 siblings, 0 replies; 30+ messages in thread
From: J. Bruce Fields @ 2019-02-19 15:53 UTC (permalink / raw)
  To: Olga Kornievskaia; +Cc: bfields, linux-nfs, linux-fsdevel

On Fri, Nov 30, 2018 at 03:03:47PM -0500, Olga Kornievskaia wrote:
> The inter server to server COPY source server filehandle
> is a foreign filehandle as the COPY is sent to the destination
> server.
> 
> Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> ---
>  fs/nfsd/Kconfig    | 10 ++++++++++
>  fs/nfsd/nfs4proc.c | 41 ++++++++++++++++++++++++++++++++++++-----
>  fs/nfsd/nfsfh.h    |  5 ++++-
>  fs/nfsd/xdr4.h     |  1 +
>  4 files changed, 51 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/nfsd/Kconfig b/fs/nfsd/Kconfig
> index 20b1c17..37ff3d5 100644
> --- a/fs/nfsd/Kconfig
> +++ b/fs/nfsd/Kconfig
> @@ -131,6 +131,16 @@ config NFSD_FLEXFILELAYOUT
>  
>  	  If unsure, say N.
>  
> +config NFSD_V4_2_INTER_SSC
> +	bool "NFSv4.2 inter server to server COPY"
> +	depends on NFSD_V4 && NFS_V4_1 && NFS_V4_2
> +	help
> +	  This option enables support for NFSv4.2 inter server to
> +	  server copy where the destination server calls the NFSv4.2
> +	  client to read the data to copy from the source server.
> +
> +	  If unsure, say N.
> +
>  config NFSD_V4_SECURITY_LABEL
>  	bool "Provide Security Label support for NFSv4 server"
>  	depends on NFSD_V4 && SECURITY
> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> index 70d03e9..2e28254 100644
> --- a/fs/nfsd/nfs4proc.c
> +++ b/fs/nfsd/nfs4proc.c
> @@ -503,12 +503,20 @@ static __be32 nfsd4_open_omfg(struct svc_rqst *rqstp, struct nfsd4_compound_stat
>  	    union nfsd4_op_u *u)
>  {
>  	struct nfsd4_putfh *putfh = &u->putfh;
> +	__be32 ret;
>  
>  	fh_put(&cstate->current_fh);
>  	cstate->current_fh.fh_handle.fh_size = putfh->pf_fhlen;
>  	memcpy(&cstate->current_fh.fh_handle.fh_base, putfh->pf_fhval,
>  	       putfh->pf_fhlen);
> -	return fh_verify(rqstp, &cstate->current_fh, 0, NFSD_MAY_BYPASS_GSS);
> +	ret = fh_verify(rqstp, &cstate->current_fh, 0, NFSD_MAY_BYPASS_GSS);
> +#ifdef CONFIG_NFSD_V4_2_INTER_SSC
> +	if (ret == nfserr_stale && putfh->no_verify) {
> +		SET_FH_FLAG(&cstate->current_fh, NFSD4_FH_FOREIGN);
> +		ret = 0;
> +	}
> +#endif
> +	return ret;
>  }
>  
>  static __be32
> @@ -1967,11 +1975,12 @@ static void svcxdr_init_encode(struct svc_rqst *rqstp,
>  {
>  	struct nfsd4_compoundargs *args = rqstp->rq_argp;
>  	struct nfsd4_compoundres *resp = rqstp->rq_resp;
> -	struct nfsd4_op	*op;
> +	struct nfsd4_op	*op, *current_op, *saved_op;
>  	struct nfsd4_compound_state *cstate = &resp->cstate;
>  	struct svc_fh *current_fh = &cstate->current_fh;
>  	struct svc_fh *save_fh = &cstate->save_fh;
>  	__be32		status;
> +	int i;
>  
>  	svcxdr_init_encode(rqstp, resp);
>  	resp->tagp = resp->xdr.p;
> @@ -2006,6 +2015,27 @@ static void svcxdr_init_encode(struct svc_rqst *rqstp,
>  		resp->opcnt = 1;
>  		goto encode_op;
>  	}
> +#ifdef CONFIG_NFSD_V4_2_INTER_SSC
> +	/* traverse all operations and, if it's a COPY compound, mark the
> +	 * source filehandle to skip verification
> +	 */
> +	for (i = 0; i < args->opcnt; i++) {
> +		op = &args->ops[i];
> +		if (op->opnum == OP_PUTFH)
> +			current_op = op;
> +		else if (op->opnum == OP_SAVEFH)
> +			saved_op = current_op;
> +		else if (op->opnum == OP_RESTOREFH)
> +			current_op = saved_op;
> +		else if (op->opnum == OP_COPY) {
> +			struct nfsd4_copy *copy = (struct nfsd4_copy *)&op[i].u;
> +			struct nfsd4_putfh *putfh =
> +				(struct nfsd4_putfh *)&saved_op->u;
> +			if (!copy->cp_intra)
> +				putfh->no_verify = true;
> +		}
> +	}
> +#endif

This looks good, but could you please move this loop to a function of
its own?  (And do the usual trick of making that function a no-op when
INTER_SSC isn't defined.)

--b.
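
The "usual trick" requested above can be sketched in userspace C as
follows. Everything here is a simplified stand-in: struct mock_op, the
MOCK_* opcodes and the function name mark_foreign_src_fh() are
hypothetical, and the config macro is defined by hand only so the
sketch builds on its own. The sketch also initialises the two tracking
pointers to NULL and checks saved_op before use, which the quoted loop
does not appear to do.

```c
#include <stdbool.h>
#include <stddef.h>

/* Defined here only so the sketch builds; Kconfig would normally set it. */
#define CONFIG_NFSD_V4_2_INTER_SSC 1

enum mock_opnum { MOCK_PUTFH, MOCK_SAVEFH, MOCK_RESTOREFH, MOCK_COPY };

/* Simplified stand-in for struct nfsd4_op; all fields are illustrative. */
struct mock_op {
	enum mock_opnum opnum;
	bool cp_intra;		/* meaningful for MOCK_COPY only */
	bool no_verify;		/* meaningful for MOCK_PUTFH only */
};

#ifdef CONFIG_NFSD_V4_2_INTER_SSC
/*
 * Walk the compound and, for an inter-server COPY, mark the PUTFH that
 * supplied the saved (source) filehandle so verification can be skipped.
 */
static void mark_foreign_src_fh(struct mock_op *ops, size_t opcnt)
{
	struct mock_op *current_op = NULL, *saved_op = NULL;
	size_t i;

	for (i = 0; i < opcnt; i++) {
		struct mock_op *op = &ops[i];

		switch (op->opnum) {
		case MOCK_PUTFH:
			current_op = op;
			break;
		case MOCK_SAVEFH:
			saved_op = current_op;
			break;
		case MOCK_RESTOREFH:
			current_op = saved_op;
			break;
		case MOCK_COPY:
			if (!op->cp_intra && saved_op)
				saved_op->no_verify = true;
			break;
		}
	}
}
#else
/* No-op stub so callers need no #ifdef of their own. */
static inline void mark_foreign_src_fh(struct mock_op *ops, size_t opcnt)
{
	(void)ops;
	(void)opcnt;
}
#endif
```

The caller then invokes mark_foreign_src_fh() unconditionally, and the
compiler drops the call when the option is off.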

>  
>  	trace_nfsd_compound(rqstp, args->opcnt);
>  	while (!status && resp->opcnt < args->opcnt) {
> @@ -2021,13 +2051,14 @@ static void svcxdr_init_encode(struct svc_rqst *rqstp,
>  				op->status = nfsd4_open_omfg(rqstp, cstate, op);
>  			goto encode_op;
>  		}
> -
> -		if (!current_fh->fh_dentry) {
> +		if (!current_fh->fh_dentry &&
> +				!HAS_FH_FLAG(current_fh, NFSD4_FH_FOREIGN)) {
>  			if (!(op->opdesc->op_flags & ALLOWED_WITHOUT_FH)) {
>  				op->status = nfserr_nofilehandle;
>  				goto encode_op;
>  			}
> -		} else if (current_fh->fh_export->ex_fslocs.migrated &&
> +		} else if (current_fh->fh_export &&
> +			   current_fh->fh_export->ex_fslocs.migrated &&
>  			  !(op->opdesc->op_flags & ALLOWED_ON_ABSENT_FS)) {
>  			op->status = nfserr_moved;
>  			goto encode_op;
> diff --git a/fs/nfsd/nfsfh.h b/fs/nfsd/nfsfh.h
> index 755e256..b9c7568 100644
> --- a/fs/nfsd/nfsfh.h
> +++ b/fs/nfsd/nfsfh.h
> @@ -35,7 +35,7 @@ static inline ino_t u32_to_ino_t(__u32 uino)
>  
>  	bool			fh_locked;	/* inode locked by us */
>  	bool			fh_want_write;	/* remount protection taken */
> -
> +	int			fh_flags;	/* FH flags */
>  #ifdef CONFIG_NFSD_V3
>  	bool			fh_post_saved;	/* post-op attrs saved */
>  	bool			fh_pre_saved;	/* pre-op attrs saved */
> @@ -56,6 +56,9 @@ static inline ino_t u32_to_ino_t(__u32 uino)
>  #endif /* CONFIG_NFSD_V3 */
>  
>  } svc_fh;
> +#define NFSD4_FH_FOREIGN (1<<0)
> +#define SET_FH_FLAG(c, f) ((c)->fh_flags |= (f))
> +#define HAS_FH_FLAG(c, f) ((c)->fh_flags & (f))
>  
>  enum nfsd_fsid {
>  	FSID_DEV = 0,
> diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h
> index 9d7318c..fbd18d6 100644
> --- a/fs/nfsd/xdr4.h
> +++ b/fs/nfsd/xdr4.h
> @@ -221,6 +221,7 @@ struct nfsd4_lookup {
>  struct nfsd4_putfh {
>  	u32		pf_fhlen;           /* request */
>  	char		*pf_fhval;          /* request */
> +	bool		no_verify;	    /* represents foreign fh */
>  };
>  
>  struct nfsd4_open {
> -- 
> 1.8.3.1

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 10/10] NFSD add nfs4 inter ssc to nfsd4_copy
  2018-11-30 20:03 ` [PATCH v2 10/10] NFSD add nfs4 inter ssc to nfsd4_copy Olga Kornievskaia
@ 2019-02-19 15:54   ` J. Bruce Fields
  0 siblings, 0 replies; 30+ messages in thread
From: J. Bruce Fields @ 2019-02-19 15:54 UTC (permalink / raw)
  To: Olga Kornievskaia; +Cc: bfields, linux-nfs, linux-fsdevel

On Fri, Nov 30, 2018 at 03:03:48PM -0500, Olga Kornievskaia wrote:
> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> index 2e28254..238c4b7 100644
> --- a/fs/nfsd/nfs4proc.c
> +++ b/fs/nfsd/nfs4proc.c
...
> +static __be32
> +nfsd4_setup_inter_ssc(struct svc_rqst *rqstp,
> +		      struct nfsd4_compound_state *cstate,
> +		      struct nfsd4_copy *copy,
> +		      struct vfs_mount **mount)

That should be struct vfsmount.  Don't forget to check the compile with
the new config option both on and off.

--b.

> +{
> +	*mount = NULL;
> +	return -EINVAL;
> +}

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 04/10] NFSD add ca_source_server<> to COPY
  2018-11-30 20:03 ` [PATCH v2 04/10] NFSD add ca_source_server<> to COPY Olga Kornievskaia
@ 2019-02-19 16:17   ` J. Bruce Fields
  0 siblings, 0 replies; 30+ messages in thread
From: J. Bruce Fields @ 2019-02-19 16:17 UTC (permalink / raw)
  To: Olga Kornievskaia; +Cc: bfields, linux-nfs, linux-fsdevel

On Fri, Nov 30, 2018 at 03:03:42PM -0500, Olga Kornievskaia wrote:
> Decode the ca_source_server list that's sent but only use the
> first one. Presence of non-zero list indicates an "inter" copy.
> 
> Signed-off-by: Andy Adamson <andros@netapp.com>
> Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> ---
>  fs/nfsd/nfs4xdr.c | 69 +++++++++++++++++++++++++++++++++++++++++++++++++++++--
>  fs/nfsd/xdr4.h    | 12 ++++++----
>  2 files changed, 74 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
> index 3de42a7..879ddc6 100644
> --- a/fs/nfsd/nfs4xdr.c
> +++ b/fs/nfsd/nfs4xdr.c
...
>  static __be32
>  nfsd4_decode_copy(struct nfsd4_compoundargs *argp, struct nfsd4_copy *copy)
>  {
>  	DECODE_HEAD;
> -	unsigned int tmp;
> +	struct nl4_server ns_dummy;

This struct is much too big to put on the stack.

--b.
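
One way to address a stack-size concern like the one above is to
allocate the scratch structure dynamically. The following userspace
sketch uses malloc()/free() where kernel code would use its own
allocator; struct mock_server, its size, and both function names are
hypothetical and chosen only for illustration.

```c
#include <errno.h>
#include <stdlib.h>

/* Illustrative stand-in for a large decode target; the size is made up. */
struct mock_server {
	char addr[2048];
};

/* Pretend to decode one list entry into *ns. */
static void mock_decode_server(struct mock_server *ns, unsigned int idx)
{
	ns->addr[0] = (char)('0' + idx % 10);
	ns->addr[1] = '\0';
}

/*
 * Decode and discard entries 1..count-1 using one heap-allocated scratch
 * buffer instead of a large local variable.
 * Returns 0 on success or -ENOMEM if the allocation fails.
 */
static int skip_extra_servers(unsigned int count)
{
	struct mock_server *scratch;
	unsigned int i;

	if (count <= 1)
		return 0;	/* nothing beyond the first entry */

	scratch = malloc(sizeof(*scratch));
	if (!scratch)
		return -ENOMEM;

	for (i = 1; i < count; i++)
		mock_decode_server(scratch, i);

	free(scratch);
	return 0;
}
```

The trade-off is one extra allocation and a new failure path in
exchange for a much smaller stack frame.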

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 06/10] NFSD add COPY_NOTIFY operation
  2018-11-30 20:03 ` [PATCH v2 06/10] NFSD add COPY_NOTIFY operation Olga Kornievskaia
@ 2019-02-20  1:44   ` J. Bruce Fields
  2019-02-20  2:07   ` J. Bruce Fields
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 30+ messages in thread
From: J. Bruce Fields @ 2019-02-20  1:44 UTC (permalink / raw)
  To: Olga Kornievskaia; +Cc: bfields, linux-nfs, linux-fsdevel

On Fri, Nov 30, 2018 at 03:03:44PM -0500, Olga Kornievskaia wrote:
> +	/**
> +	 * For now, only return one server address in cpn_src, the
> +	 * address used by the client to connect to this server.
> +	 */

Could you check your code for any places where you use /** as the
beginning of a comment?  The usual style is just /*, and /** is reserved
for specially-formatted comments meant to be extracted into
automatically-generated API documentation, so I think the above might
confuse kernel-doc.  (See Documentation/doc-guide/kernel-doc.rst.)

--b.
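
To make the two comment styles concrete, here is a small C sketch. The
function lease_expiry() is hypothetical and exists only to carry the
comments: the block opening with "/**" follows the kernel-doc layout
(name, one "@" line per parameter, a "Return:" section), while the
comment inside the body is an ordinary one.

```c
/**
 * lease_expiry() - Compute when a lease ends.
 * @now: Current time in seconds.
 * @lease: Lease length in seconds.
 *
 * Return: The time, in seconds, at which the lease expires.
 */
static unsigned long lease_expiry(unsigned long now, unsigned long lease)
{
	/*
	 * An ordinary comment: it opens with a single asterisk, so
	 * documentation tooling does not try to parse it.
	 */
	return now + lease;
}
```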

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 06/10] NFSD add COPY_NOTIFY operation
  2018-11-30 20:03 ` [PATCH v2 06/10] NFSD add COPY_NOTIFY operation Olga Kornievskaia
  2019-02-20  1:44   ` J. Bruce Fields
@ 2019-02-20  2:07   ` J. Bruce Fields
  2019-02-20 14:04     ` J. Bruce Fields
  2019-02-20  2:12   ` J. Bruce Fields
  2019-02-20  2:35   ` J. Bruce Fields
  3 siblings, 1 reply; 30+ messages in thread
From: J. Bruce Fields @ 2019-02-20  2:07 UTC (permalink / raw)
  To: Olga Kornievskaia; +Cc: bfields, linux-nfs, linux-fsdevel

On Fri, Nov 30, 2018 at 03:03:44PM -0500, Olga Kornievskaia wrote:
> diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
> index 879ddc6..c9fb625 100644
> --- a/fs/nfsd/nfs4xdr.c
> +++ b/fs/nfsd/nfs4xdr.c
...
> +	/* cnr_lease_time */
> +	p = xdr_encode_hyper(p, cn->cpn_sec);
> +	*p++ = cpu_to_be32(cn->cpn_nsec);

This is redundant; xdr_encode_hyper already wrote cn->cpn_sec.

--b.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 06/10] NFSD add COPY_NOTIFY operation
  2018-11-30 20:03 ` [PATCH v2 06/10] NFSD add COPY_NOTIFY operation Olga Kornievskaia
  2019-02-20  1:44   ` J. Bruce Fields
  2019-02-20  2:07   ` J. Bruce Fields
@ 2019-02-20  2:12   ` J. Bruce Fields
  2019-02-20  2:35   ` J. Bruce Fields
  3 siblings, 0 replies; 30+ messages in thread
From: J. Bruce Fields @ 2019-02-20  2:12 UTC (permalink / raw)
  To: Olga Kornievskaia; +Cc: bfields, linux-nfs, linux-fsdevel

On Fri, Nov 30, 2018 at 03:03:44PM -0500, Olga Kornievskaia wrote:
> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> index 0152b34..51fca9e 100644
> --- a/fs/nfsd/nfs4proc.c
> +++ b/fs/nfsd/nfs4proc.c
...
> @@ -1348,6 +1350,43 @@ struct nfsd4_copy *
>  }
>  
>  static __be32
> +nfsd4_copy_notify(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> +		  union nfsd4_op_u *u)
> +{
> +	struct nfsd4_copy_notify *cn = &u->copy_notify;
> +	__be32 status;
> +	struct nfsd_net *nn = net_generic(SVC_NET(rqstp), nfsd_net_id);
> +	struct nfs4_stid *stid;
> +	struct nfs4_cpntf_state *cps;
> +
> +	status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate->current_fh,
> +					&cn->cpn_src_stateid, RD_STATE, NULL,
> +					NULL, &stid);
> +	if (status)
> +		return status;
> +
> +	cn->cpn_sec = nn->nfsd4_lease;
> +	cn->cpn_nsec = 0;

I'm pretty sure this should be cp_timeout, not 0.

--b.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 06/10] NFSD add COPY_NOTIFY operation
  2018-11-30 20:03 ` [PATCH v2 06/10] NFSD add COPY_NOTIFY operation Olga Kornievskaia
                     ` (2 preceding siblings ...)
  2019-02-20  2:12   ` J. Bruce Fields
@ 2019-02-20  2:35   ` J. Bruce Fields
  2019-06-14 19:11     ` Olga Kornievskaia
  3 siblings, 1 reply; 30+ messages in thread
From: J. Bruce Fields @ 2019-02-20  2:35 UTC (permalink / raw)
  To: Olga Kornievskaia; +Cc: bfields, linux-nfs, linux-fsdevel

On Fri, Nov 30, 2018 at 03:03:44PM -0500, Olga Kornievskaia wrote:
> Introducing the COPY_NOTIFY operation.
> 
> Create a new unique stateid that will keep track of the copy
> state and the upcoming READs that will use that stateid. Keep
> it in the list associated with parent stateid.
> 
> Return single netaddr to advertise to the copy.
> 
> Signed-off-by: Andy Adamson <andros@netapp.com>
> Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> ---
>  fs/nfsd/nfs4proc.c  | 72 +++++++++++++++++++++++++++++++++++----
>  fs/nfsd/nfs4state.c | 64 +++++++++++++++++++++++++++++++----
>  fs/nfsd/nfs4xdr.c   | 97 +++++++++++++++++++++++++++++++++++++++++++++++++++--
>  fs/nfsd/state.h     | 18 ++++++++--
>  fs/nfsd/xdr4.h      | 13 +++++++
>  5 files changed, 248 insertions(+), 16 deletions(-)
> 
> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> index 0152b34..51fca9e 100644
> --- a/fs/nfsd/nfs4proc.c
> +++ b/fs/nfsd/nfs4proc.c
> @@ -37,6 +37,7 @@
>  #include <linux/falloc.h>
>  #include <linux/slab.h>
>  #include <linux/kthread.h>
> +#include <linux/sunrpc/addr.h>
>  
>  #include "idmap.h"
>  #include "cache.h"
> @@ -1035,7 +1036,8 @@ static __be32 nfsd4_do_lookupp(struct svc_rqst *rqstp, struct svc_fh *fh)
>  static __be32
>  nfsd4_verify_copy(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>  		  stateid_t *src_stateid, struct file **src,
> -		  stateid_t *dst_stateid, struct file **dst)
> +		  stateid_t *dst_stateid, struct file **dst,
> +		  struct nfs4_stid **stid)
>  {
>  	__be32 status;
>  
> @@ -1052,7 +1054,7 @@ static __be32 nfsd4_do_lookupp(struct svc_rqst *rqstp, struct svc_fh *fh)
>  
>  	status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate->current_fh,
>  					    dst_stateid, WR_STATE, dst, NULL,
> -					    NULL);
> +					    stid);
>  	if (status) {
>  		dprintk("NFSD: %s: couldn't process dst stateid!\n", __func__);
>  		goto out_put_src;
> @@ -1083,7 +1085,7 @@ static __be32 nfsd4_do_lookupp(struct svc_rqst *rqstp, struct svc_fh *fh)
>  	__be32 status;
>  
>  	status = nfsd4_verify_copy(rqstp, cstate, &clone->cl_src_stateid, &src,
> -				   &clone->cl_dst_stateid, &dst);
> +				   &clone->cl_dst_stateid, &dst, NULL);
>  	if (status)
>  		goto out;
>  
> @@ -1230,7 +1232,7 @@ static void dup_copy_fields(struct nfsd4_copy *src, struct nfsd4_copy *dst)
>  
>  static void cleanup_async_copy(struct nfsd4_copy *copy)
>  {
> -	nfs4_free_cp_state(copy);
> +	nfs4_free_copy_state(copy);
>  	fput(copy->file_dst);
>  	fput(copy->file_src);
>  	spin_lock(&copy->cp_clp->async_lock);
> @@ -1270,7 +1272,7 @@ static int nfsd4_do_async_copy(void *data)
>  
>  	status = nfsd4_verify_copy(rqstp, cstate, &copy->cp_src_stateid,
>  				   &copy->file_src, &copy->cp_dst_stateid,
> -				   &copy->file_dst);
> +				   &copy->file_dst, NULL);
>  	if (status)
>  		goto out;
>  
> @@ -1284,7 +1286,7 @@ static int nfsd4_do_async_copy(void *data)
>  		async_copy = kzalloc(sizeof(struct nfsd4_copy), GFP_KERNEL);
>  		if (!async_copy)
>  			goto out;
> -		if (!nfs4_init_cp_state(nn, copy)) {
> +		if (!nfs4_init_copy_state(nn, copy)) {
>  			kfree(async_copy);
>  			goto out;
>  		}
> @@ -1348,6 +1350,43 @@ struct nfsd4_copy *
>  }
>  
>  static __be32
> +nfsd4_copy_notify(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> +		  union nfsd4_op_u *u)
> +{
> +	struct nfsd4_copy_notify *cn = &u->copy_notify;
> +	__be32 status;
> +	struct nfsd_net *nn = net_generic(SVC_NET(rqstp), nfsd_net_id);
> +	struct nfs4_stid *stid;
> +	struct nfs4_cpntf_state *cps;
> +
> +	status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate->current_fh,
> +					&cn->cpn_src_stateid, RD_STATE, NULL,
> +					NULL, &stid);
> +	if (status)
> +		return status;
> +
> +	cn->cpn_sec = nn->nfsd4_lease;
> +	cn->cpn_nsec = 0;
> +
> +	status = nfserrno(-ENOMEM);
> +	cps = nfs4_alloc_init_cpntf_state(nn, stid);
> +	if (!cps)
> +		return status;
> +	memcpy(&cn->cpn_cnr_stateid, &cps->cp_stateid, sizeof(stateid_t));
> +
> +	/**
> +	 * For now, only return one server address in cpn_src, the
> +	 * address used by the client to connect to this server.
> +	 */
> +	cn->cpn_src.nl4_type = NL4_NETADDR;
> +	status = nfsd4_set_netaddr((struct sockaddr *)&rqstp->rq_daddr,
> +				 &cn->cpn_src.u.nl4_addr);
> +	WARN_ON_ONCE(status);
> +
> +	return status;
> +}
> +
> +static __be32
>  nfsd4_fallocate(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>  		struct nfsd4_fallocate *fallocate, int flags)
>  {
> @@ -2299,6 +2338,21 @@ static inline u32 nfsd4_offload_status_rsize(struct svc_rqst *rqstp,
>  		1 /* osr_complete<1> optional 0 for now */) * sizeof(__be32);
>  }
>  
> +static inline u32 nfsd4_copy_notify_rsize(struct svc_rqst *rqstp,
> +					struct nfsd4_op *op)
> +{
> +	return (op_encode_hdr_size +
> +		3 /* cnr_lease_time */ +
> +		1 /* We support one cnr_source_server */ +
> +		1 /* cnr_stateid seq */ +
> +		op_encode_stateid_maxsz /* cnr_stateid */ +
> +		1 /* num cnr_source_server*/ +
> +		1 /* nl4_type */ +
> +		1 /* nl4 size */ +
> +		XDR_QUADLEN(NFS4_OPAQUE_LIMIT) /*nl4_loc + nl4_loc_sz */)
> +		* sizeof(__be32);
> +}
> +
>  #ifdef CONFIG_NFSD_PNFS
>  static inline u32 nfsd4_getdeviceinfo_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op)
>  {
> @@ -2723,6 +2777,12 @@ static inline u32 nfsd4_seek_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op)
>  		.op_name = "OP_OFFLOAD_CANCEL",
>  		.op_rsize_bop = nfsd4_only_status_rsize,
>  	},
> +	[OP_COPY_NOTIFY] = {
> +		.op_func = nfsd4_copy_notify,
> +		.op_flags = OP_MODIFIES_SOMETHING,
> +		.op_name = "OP_COPY_NOTIFY",
> +		.op_rsize_bop = nfsd4_copy_notify_rsize,
> +	},
>  };
>  
>  /**
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index be3e967..eaa136f 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -697,6 +697,7 @@ struct nfs4_stid *nfs4_alloc_stid(struct nfs4_client *cl, struct kmem_cache *sla
>  	/* Will be incremented before return to client: */
>  	refcount_set(&stid->sc_count, 1);
>  	spin_lock_init(&stid->sc_lock);
> +	INIT_LIST_HEAD(&stid->sc_cp_list);
>  
>  	/*
>  	 * It shouldn't be a problem to reuse an opaque stateid value.
> @@ -716,24 +717,53 @@ struct nfs4_stid *nfs4_alloc_stid(struct nfs4_client *cl, struct kmem_cache *sla
>  /*
>   * Create a unique stateid_t to represent each COPY.
>   */
> -int nfs4_init_cp_state(struct nfsd_net *nn, struct nfsd4_copy *copy)
> +static int nfs4_init_cp_state(struct nfsd_net *nn, void *ptr, stateid_t *stid)
>  {
>  	int new_id;
>  
>  	idr_preload(GFP_KERNEL);
>  	spin_lock(&nn->s2s_cp_lock);
> -	new_id = idr_alloc_cyclic(&nn->s2s_cp_stateids, copy, 0, 0, GFP_NOWAIT);
> +	new_id = idr_alloc_cyclic(&nn->s2s_cp_stateids, ptr, 0, 0, GFP_NOWAIT);
>  	spin_unlock(&nn->s2s_cp_lock);
>  	idr_preload_end();
>  	if (new_id < 0)
>  		return 0;
> -	copy->cp_stateid.si_opaque.so_id = new_id;
> -	copy->cp_stateid.si_opaque.so_clid.cl_boot = nn->boot_time;
> -	copy->cp_stateid.si_opaque.so_clid.cl_id = nn->s2s_cp_cl_id;
> +	stid->si_opaque.so_id = new_id;
> +	stid->si_opaque.so_clid.cl_boot = nn->boot_time;
> +	stid->si_opaque.so_clid.cl_id = nn->s2s_cp_cl_id;
>  	return 1;
>  }
>  
> -void nfs4_free_cp_state(struct nfsd4_copy *copy)
> +int nfs4_init_copy_state(struct nfsd_net *nn, struct nfsd4_copy *copy)
> +{
> +	return nfs4_init_cp_state(nn, copy, &copy->cp_stateid);
> +}
> +
> +struct nfs4_cpntf_state *nfs4_alloc_init_cpntf_state(struct nfsd_net *nn,
> +						     struct nfs4_stid *p_stid)
> +{
> +	struct nfs4_cpntf_state *cps;
> +
> +	cps = kzalloc(sizeof(struct nfs4_cpntf_state), GFP_KERNEL);
> +	if (!cps)
> +		return NULL;
> +	if (!nfs4_init_cp_state(nn, cps, &cps->cp_stateid))
> +		goto out_free;
> +	cps->cp_p_stid = p_stid;
> +	cps->cp_active = false;
> +	cps->cp_timeout = jiffies + (nn->nfsd4_lease * HZ);
> +	INIT_LIST_HEAD(&cps->cp_list);
> +	spin_lock(&nn->s2s_cp_lock);
> +	list_add(&cps->cp_list, &p_stid->sc_cp_list);
> +	spin_unlock(&nn->s2s_cp_lock);
> +
> +	return cps;
> +out_free:
> +	kfree(cps);
> +	return NULL;
> +}
> +
> +void nfs4_free_copy_state(struct nfsd4_copy *copy)
>  {
>  	struct nfsd_net *nn;
>  
> @@ -743,6 +773,27 @@ void nfs4_free_cp_state(struct nfsd4_copy *copy)
>  	spin_unlock(&nn->s2s_cp_lock);
>  }
>  
> +static void nfs4_free_cpntf_statelist(struct net *net, struct nfs4_stid *stid)
> +{
> +	struct nfs4_cpntf_state *cps;
> +	struct nfsd_net *nn;
> +
> +	nn = net_generic(net, nfsd_net_id);
> +
> +	might_sleep();

What's that for?  Just remove it unless you've got some good reason.

--b.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 06/10] NFSD add COPY_NOTIFY operation
  2019-02-20  2:07   ` J. Bruce Fields
@ 2019-02-20 14:04     ` J. Bruce Fields
  0 siblings, 0 replies; 30+ messages in thread
From: J. Bruce Fields @ 2019-02-20 14:04 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Olga Kornievskaia, linux-nfs, linux-fsdevel

On Tue, Feb 19, 2019 at 09:07:32PM -0500, J. Bruce Fields wrote:
> On Fri, Nov 30, 2018 at 03:03:44PM -0500, Olga Kornievskaia wrote:
> > diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
> > index 879ddc6..c9fb625 100644
> > --- a/fs/nfsd/nfs4xdr.c
> > +++ b/fs/nfsd/nfs4xdr.c
> ...
> > +	/* cnr_lease_time */
> > +	p = xdr_encode_hyper(p, cn->cpn_sec);
> > +	*p++ = cpu_to_be32(cn->cpn_nsec);
> 
> This is redundant; xdr_encode_hyper already wrote cn->cpn_sec.

Um, I completely missed the sec/nsec distinction, never mind me.

--b.
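
The sec/nsec distinction can be made concrete with a small userspace
sketch. It assumes the lease time is laid out like an nfstime4 (a
64-bit seconds field followed by a 32-bit nanoseconds field), which is
consistent with the "3 /* cnr_lease_time */" words reserved in
nfsd4_copy_notify_rsize() in the quoted patch. The helper names
put_be32(), put_be64() and encode_lease_time() are hypothetical and
only mimic what the kernel XDR helpers do.

```c
#include <stddef.h>
#include <stdint.h>

/* Write a 32-bit value in big-endian order; returns the next position. */
static uint8_t *put_be32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v >> 24);
	p[1] = (uint8_t)(v >> 16);
	p[2] = (uint8_t)(v >> 8);
	p[3] = (uint8_t)v;
	return p + 4;
}

/* Write a 64-bit value as two big-endian 32-bit words. */
static uint8_t *put_be64(uint8_t *p, uint64_t v)
{
	p = put_be32(p, (uint32_t)(v >> 32));
	return put_be32(p, (uint32_t)v);
}

/*
 * Encode seconds (two XDR words) followed by nanoseconds (one XDR word).
 * Returns the number of bytes written, which is always 12.
 */
static size_t encode_lease_time(uint8_t *buf, int64_t sec, uint32_t nsec)
{
	uint8_t *p = put_be64(buf, (uint64_t)sec);

	p = put_be32(p, nsec);
	return (size_t)(p - buf);
}
```

The 64-bit write covers only the seconds, so the separate 32-bit write
for the nanoseconds is not redundant.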

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 06/10] NFSD add COPY_NOTIFY operation
  2019-02-20  2:35   ` J. Bruce Fields
@ 2019-06-14 19:11     ` Olga Kornievskaia
  0 siblings, 0 replies; 30+ messages in thread
From: Olga Kornievskaia @ 2019-06-14 19:11 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: J. Bruce Fields, linux-nfs, linux-fsdevel

On Tue, Feb 19, 2019 at 9:35 PM J. Bruce Fields <bfields@fieldses.org> wrote:
>
> On Fri, Nov 30, 2018 at 03:03:44PM -0500, Olga Kornievskaia wrote:
> > Introducing the COPY_NOTIFY operation.
> >
> > Create a new unique stateid that will keep track of the copy
> > state and the upcoming READs that will use that stateid. Keep
> > it in the list associated with parent stateid.
> >
> > Return single netaddr to advertise to the copy.
> >
> > Signed-off-by: Andy Adamson <andros@netapp.com>
> > Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> > ---
> >  fs/nfsd/nfs4proc.c  | 72 +++++++++++++++++++++++++++++++++++----
> >  fs/nfsd/nfs4state.c | 64 +++++++++++++++++++++++++++++++----
> >  fs/nfsd/nfs4xdr.c   | 97 +++++++++++++++++++++++++++++++++++++++++++++++++++--
> >  fs/nfsd/state.h     | 18 ++++++++--
> >  fs/nfsd/xdr4.h      | 13 +++++++
> >  5 files changed, 248 insertions(+), 16 deletions(-)
> >
> > diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> > index 0152b34..51fca9e 100644
> > --- a/fs/nfsd/nfs4proc.c
> > +++ b/fs/nfsd/nfs4proc.c
> > @@ -37,6 +37,7 @@
> >  #include <linux/falloc.h>
> >  #include <linux/slab.h>
> >  #include <linux/kthread.h>
> > +#include <linux/sunrpc/addr.h>
> >
> >  #include "idmap.h"
> >  #include "cache.h"
> > @@ -1035,7 +1036,8 @@ static __be32 nfsd4_do_lookupp(struct svc_rqst *rqstp, struct svc_fh *fh)
> >  static __be32
> >  nfsd4_verify_copy(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> >                 stateid_t *src_stateid, struct file **src,
> > -               stateid_t *dst_stateid, struct file **dst)
> > +               stateid_t *dst_stateid, struct file **dst,
> > +               struct nfs4_stid **stid)
> >  {
> >       __be32 status;
> >
> > @@ -1052,7 +1054,7 @@ static __be32 nfsd4_do_lookupp(struct svc_rqst *rqstp, struct svc_fh *fh)
> >
> >       status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate->current_fh,
> >                                           dst_stateid, WR_STATE, dst, NULL,
> > -                                         NULL);
> > +                                         stid);
> >       if (status) {
> >               dprintk("NFSD: %s: couldn't process dst stateid!\n", __func__);
> >               goto out_put_src;
> > @@ -1083,7 +1085,7 @@ static __be32 nfsd4_do_lookupp(struct svc_rqst *rqstp, struct svc_fh *fh)
> >       __be32 status;
> >
> >       status = nfsd4_verify_copy(rqstp, cstate, &clone->cl_src_stateid, &src,
> > -                                &clone->cl_dst_stateid, &dst);
> > +                                &clone->cl_dst_stateid, &dst, NULL);
> >       if (status)
> >               goto out;
> >
> > @@ -1230,7 +1232,7 @@ static void dup_copy_fields(struct nfsd4_copy *src, struct nfsd4_copy *dst)
> >
> >  static void cleanup_async_copy(struct nfsd4_copy *copy)
> >  {
> > -     nfs4_free_cp_state(copy);
> > +     nfs4_free_copy_state(copy);
> >       fput(copy->file_dst);
> >       fput(copy->file_src);
> >       spin_lock(&copy->cp_clp->async_lock);
> > @@ -1270,7 +1272,7 @@ static int nfsd4_do_async_copy(void *data)
> >
> >       status = nfsd4_verify_copy(rqstp, cstate, &copy->cp_src_stateid,
> >                                  &copy->file_src, &copy->cp_dst_stateid,
> > -                                &copy->file_dst);
> > +                                &copy->file_dst, NULL);
> >       if (status)
> >               goto out;
> >
> > @@ -1284,7 +1286,7 @@ static int nfsd4_do_async_copy(void *data)
> >               async_copy = kzalloc(sizeof(struct nfsd4_copy), GFP_KERNEL);
> >               if (!async_copy)
> >                       goto out;
> > -             if (!nfs4_init_cp_state(nn, copy)) {
> > +             if (!nfs4_init_copy_state(nn, copy)) {
> >                       kfree(async_copy);
> >                       goto out;
> >               }
> > @@ -1348,6 +1350,43 @@ struct nfsd4_copy *
> >  }
> >
> >  static __be32
> > +nfsd4_copy_notify(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> > +               union nfsd4_op_u *u)
> > +{
> > +     struct nfsd4_copy_notify *cn = &u->copy_notify;
> > +     __be32 status;
> > +     struct nfsd_net *nn = net_generic(SVC_NET(rqstp), nfsd_net_id);
> > +     struct nfs4_stid *stid;
> > +     struct nfs4_cpntf_state *cps;
> > +
> > +     status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate->current_fh,
> > +                                     &cn->cpn_src_stateid, RD_STATE, NULL,
> > +                                     NULL, &stid);
> > +     if (status)
> > +             return status;
> > +
> > +     cn->cpn_sec = nn->nfsd4_lease;
> > +     cn->cpn_nsec = 0;
> > +
> > +     status = nfserrno(-ENOMEM);
> > +     cps = nfs4_alloc_init_cpntf_state(nn, stid);
> > +     if (!cps)
> > +             return status;
> > +     memcpy(&cn->cpn_cnr_stateid, &cps->cp_stateid, sizeof(stateid_t));
> > +
> > +     /**
> > +      * For now, only return one server address in cpn_src, the
> > +      * address used by the client to connect to this server.
> > +      */
> > +     cn->cpn_src.nl4_type = NL4_NETADDR;
> > +     status = nfsd4_set_netaddr((struct sockaddr *)&rqstp->rq_daddr,
> > +                              &cn->cpn_src.u.nl4_addr);
> > +     WARN_ON_ONCE(status);
> > +
> > +     return status;
> > +}
> > +
> > +static __be32
> >  nfsd4_fallocate(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> >               struct nfsd4_fallocate *fallocate, int flags)
> >  {
> > @@ -2299,6 +2338,21 @@ static inline u32 nfsd4_offload_status_rsize(struct svc_rqst *rqstp,
> >               1 /* osr_complete<1> optional 0 for now */) * sizeof(__be32);
> >  }
> >
> > +static inline u32 nfsd4_copy_notify_rsize(struct svc_rqst *rqstp,
> > +                                     struct nfsd4_op *op)
> > +{
> > +     return (op_encode_hdr_size +
> > +             3 /* cnr_lease_time */ +
> > +             1 /* We support one cnr_source_server */ +
> > +             1 /* cnr_stateid seq */ +
> > +             op_encode_stateid_maxsz /* cnr_stateid */ +
> > +             1 /* num cnr_source_server*/ +
> > +             1 /* nl4_type */ +
> > +             1 /* nl4 size */ +
> > +             XDR_QUADLEN(NFS4_OPAQUE_LIMIT) /*nl4_loc + nl4_loc_sz */)
> > +             * sizeof(__be32);
> > +}
> > +
> >  #ifdef CONFIG_NFSD_PNFS
> >  static inline u32 nfsd4_getdeviceinfo_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op)
> >  {
> > @@ -2723,6 +2777,12 @@ static inline u32 nfsd4_seek_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op)
> >               .op_name = "OP_OFFLOAD_CANCEL",
> >               .op_rsize_bop = nfsd4_only_status_rsize,
> >       },
> > +     [OP_COPY_NOTIFY] = {
> > +             .op_func = nfsd4_copy_notify,
> > +             .op_flags = OP_MODIFIES_SOMETHING,
> > +             .op_name = "OP_COPY_NOTIFY",
> > +             .op_rsize_bop = nfsd4_copy_notify_rsize,
> > +     },
> >  };
> >
> >  /**
> > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> > index be3e967..eaa136f 100644
> > --- a/fs/nfsd/nfs4state.c
> > +++ b/fs/nfsd/nfs4state.c
> > @@ -697,6 +697,7 @@ struct nfs4_stid *nfs4_alloc_stid(struct nfs4_client *cl, struct kmem_cache *sla
> >       /* Will be incremented before return to client: */
> >       refcount_set(&stid->sc_count, 1);
> >       spin_lock_init(&stid->sc_lock);
> > +     INIT_LIST_HEAD(&stid->sc_cp_list);
> >
> >       /*
> >        * It shouldn't be a problem to reuse an opaque stateid value.
> > @@ -716,24 +717,53 @@ struct nfs4_stid *nfs4_alloc_stid(struct nfs4_client *cl, struct kmem_cache *sla
> >  /*
> >   * Create a unique stateid_t to represent each COPY.
> >   */
> > -int nfs4_init_cp_state(struct nfsd_net *nn, struct nfsd4_copy *copy)
> > +static int nfs4_init_cp_state(struct nfsd_net *nn, void *ptr, stateid_t *stid)
> >  {
> >       int new_id;
> >
> >       idr_preload(GFP_KERNEL);
> >       spin_lock(&nn->s2s_cp_lock);
> > -     new_id = idr_alloc_cyclic(&nn->s2s_cp_stateids, copy, 0, 0, GFP_NOWAIT);
> > +     new_id = idr_alloc_cyclic(&nn->s2s_cp_stateids, ptr, 0, 0, GFP_NOWAIT);
> >       spin_unlock(&nn->s2s_cp_lock);
> >       idr_preload_end();
> >       if (new_id < 0)
> >               return 0;
> > -     copy->cp_stateid.si_opaque.so_id = new_id;
> > -     copy->cp_stateid.si_opaque.so_clid.cl_boot = nn->boot_time;
> > -     copy->cp_stateid.si_opaque.so_clid.cl_id = nn->s2s_cp_cl_id;
> > +     stid->si_opaque.so_id = new_id;
> > +     stid->si_opaque.so_clid.cl_boot = nn->boot_time;
> > +     stid->si_opaque.so_clid.cl_id = nn->s2s_cp_cl_id;
> >       return 1;
> >  }
> >
> > -void nfs4_free_cp_state(struct nfsd4_copy *copy)
> > +int nfs4_init_copy_state(struct nfsd_net *nn, struct nfsd4_copy *copy)
> > +{
> > +     return nfs4_init_cp_state(nn, copy, &copy->cp_stateid);
> > +}
> > +
> > +struct nfs4_cpntf_state *nfs4_alloc_init_cpntf_state(struct nfsd_net *nn,
> > +                                                  struct nfs4_stid *p_stid)
> > +{
> > +     struct nfs4_cpntf_state *cps;
> > +
> > +     cps = kzalloc(sizeof(struct nfs4_cpntf_state), GFP_KERNEL);
> > +     if (!cps)
> > +             return NULL;
> > +     if (!nfs4_init_cp_state(nn, cps, &cps->cp_stateid))
> > +             goto out_free;
> > +     cps->cp_p_stid = p_stid;
> > +     cps->cp_active = false;
> > +     cps->cp_timeout = jiffies + (nn->nfsd4_lease * HZ);
> > +     INIT_LIST_HEAD(&cps->cp_list);
> > +     spin_lock(&nn->s2s_cp_lock);
> > +     list_add(&cps->cp_list, &p_stid->sc_cp_list);
> > +     spin_unlock(&nn->s2s_cp_lock);
> > +
> > +     return cps;
> > +out_free:
> > +     kfree(cps);
> > +     return NULL;
> > +}
> > +
> > +void nfs4_free_copy_state(struct nfsd4_copy *copy)
> >  {
> >       struct nfsd_net *nn;
> >
> > @@ -743,6 +773,27 @@ void nfs4_free_cp_state(struct nfsd4_copy *copy)
> >       spin_unlock(&nn->s2s_cp_lock);
> >  }
> >
> > +static void nfs4_free_cpntf_statelist(struct net *net, struct nfs4_stid *stid)
> > +{
> > +     struct nfs4_cpntf_state *cps;
> > +     struct nfsd_net *nn;
> > +
> > +     nn = net_generic(net, nfsd_net_id);
> > +
> > +     might_sleep();
>
> What's that for?  Just remove it unless you've got some good reason.

I think this function was modeled on free_ol_stateid_reaplist(), which
has a might_sleep() in it. I don't see any reason why
nfs4_free_cpntf_statelist() would sleep, so I'll remove it.
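
For readers unfamiliar with the annotation being discussed: might_sleep() is a debugging aid that marks a function as one that may block, so that calling it from atomic context (for example, with a spinlock held) gets flagged. Below is a minimal userspace sketch of that idea, not kernel code. Every name in it (sim_spin_lock, sim_might_sleep, may_block_free, nonblocking_teardown, and the two counters) is a hypothetical stand-in invented for illustration, and the real kernel check is more involved.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical userspace stand-ins; none of these are kernel APIs. */
static int atomic_depth;      /* > 0 while a simulated spinlock is held */
static int sleep_violations;  /* how many times the check has fired */

static void sim_spin_lock(void)
{
	atomic_depth++;
}

static void sim_spin_unlock(void)
{
	assert(atomic_depth > 0);  /* unlock must pair with a lock */
	atomic_depth--;
}

/* Analogue of might_sleep(): complain if reached in atomic context. */
static void sim_might_sleep(const char *func)
{
	if (atomic_depth > 0) {
		sleep_violations++;
		fprintf(stderr, "%s: may sleep, but called in atomic context\n",
			func);
	}
}

/* A function that can block announces that fact on entry. */
static void may_block_free(void)
{
	sim_might_sleep(__func__);
	/* ... work that could block would go here ... */
}

/* A teardown that only does non-blocking work needs no annotation. */
static void nonblocking_teardown(void)
{
	sim_spin_lock();
	/* ... unlink list entries under the lock ... */
	sim_spin_unlock();
}
```

In this sketch, calling may_block_free() between sim_spin_lock() and sim_spin_unlock() increments sleep_violations, while nonblocking_teardown() never does. That is the reasoning behind dropping the annotation from a function that has no blocking path: the check would document a property the function does not have.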

>
> --b.


end of thread, other threads:[~2019-06-14 19:12 UTC | newest]

Thread overview: 30+ messages
2018-11-30 20:03 [PATCH v2 00/10] server-side support for "inter" SSC copy Olga Kornievskaia
2018-11-30 20:03 ` [PATCH v2 01/10] VFS generic copy_file_range() support Olga Kornievskaia
2018-12-01  8:11   ` Amir Goldstein
2018-12-01 13:23     ` Olga Kornievskaia
2018-12-01 13:44       ` Olga Kornievskaia
     [not found]         ` <CAOQ4uxgENLCDH7QwtBPxA60dKEXvLVknBMY_Lgoetq_uQ=7gwA@mail.gmail.com>
     [not found]           ` <CAN-5tyFGV=fUCbAG5mSvy=LXDpdp8VG9Sh1aGMkBHQAG1Rp1sQ@mail.gmail.com>
2018-12-01 16:59             ` Amir Goldstein
2018-12-01 22:00     ` Dave Chinner
2018-12-02  3:12       ` Olga Kornievskaia
2018-12-02 15:19         ` Olga Kornievskaia
2018-12-02 20:47         ` Dave Chinner
2018-12-01 21:18   ` Matthew Wilcox
2018-12-01 22:36     ` Dave Chinner
2018-11-30 20:03 ` [PATCH v2 02/10] NFS fallback to generic_copy_file_range Olga Kornievskaia
2018-11-30 20:03 ` [PATCH v2 03/10] NFSD fill-in netloc4 structure Olga Kornievskaia
2018-11-30 20:03 ` [PATCH v2 04/10] NFSD add ca_source_server<> to COPY Olga Kornievskaia
2019-02-19 16:17   ` J. Bruce Fields
2018-11-30 20:03 ` [PATCH v2 05/10] NFSD return nfs4_stid in nfs4_preprocess_stateid_op Olga Kornievskaia
2018-11-30 20:03 ` [PATCH v2 06/10] NFSD add COPY_NOTIFY operation Olga Kornievskaia
2019-02-20  1:44   ` J. Bruce Fields
2019-02-20  2:07   ` J. Bruce Fields
2019-02-20 14:04     ` J. Bruce Fields
2019-02-20  2:12   ` J. Bruce Fields
2019-02-20  2:35   ` J. Bruce Fields
2019-06-14 19:11     ` Olga Kornievskaia
2018-11-30 20:03 ` [PATCH v2 07/10] NFSD check stateids against copy stateids Olga Kornievskaia
2018-11-30 20:03 ` [PATCH v2 08/10] NFSD generalize nfsd4_compound_state flag names Olga Kornievskaia
2018-11-30 20:03 ` [PATCH v2 09/10] NFSD: allow inter server COPY to have a STALE source server fh Olga Kornievskaia
2019-02-19 15:53   ` J. Bruce Fields
2018-11-30 20:03 ` [PATCH v2 10/10] NFSD add nfs4 inter ssc to nfsd4_copy Olga Kornievskaia
2019-02-19 15:54   ` J. Bruce Fields
