linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v4 00/11] client-side support for "inter" SSC copy
@ 2018-10-26 20:10 Olga Kornievskaia
  2018-10-26 20:10 ` [PATCH v4 01/11] VFS: move cross device copy_file_range() check into filesystems Olga Kornievskaia
                   ` (11 more replies)
  0 siblings, 12 replies; 43+ messages in thread
From: Olga Kornievskaia @ 2018-10-26 20:10 UTC (permalink / raw)
  To: trond.myklebust, anna.schumaker, viro, smfrench, miklos
  Cc: linux-nfs, linux-fsdevel, linux-cifs, linux-unionfs, linux-man

From: Olga Kornievskaia <kolga@netapp.com>

This patch series adds client-side support for doing NFSv4.2 "inter"
copy offload between different NFS servers.

In case of the "inter" SSC copy files reside on different servers and
thus under different superblocks and require that VFS removes the
restriction that src and dst files must be on the same superblock.

NFS's copy_file_range() determines if the copy is "intra" or "inter"
and for "inter" it sends the COPY_NOTIFY to the source server. Then,
it would send of an asynchronous COPY to the destination server. If
an application cancels an in-flight COPY, OFFLOAD_CANCEL is sent to
both of the servers.

This patch series also include necessary client-side additions that
are performed by the destination server. The server needs an NFS
open that represents a source file without doing an actual open.
Two function nfs42_ssc_open/nfs42_ssc_close() are introduced to
accomplish it that make use of the VFS's alloc_file_pseudo() to
represent an open.

Also this particular open is marked (NFS_SVC_SSC_COPY_STATE) so
that if the destination server ever to receive stateid errors on
this stateid, it knows not to initiate state recovery (in case
when source server reboots). The recovery must be done by the
client and a new copy must be initiated. Therefore, in this case
the recovery needs to fail with EIO.

v4:
-- in the VFS patches: 1. removed the accidental text in vfs.txt,
2. removed additions to vfs.txt, and 3. Anna provided help with 
wording the commit message and 4. changed the offset check to
include the boundry just like it was before.
-- removed dprintk from the nfs42_ssc_open
-- changed the cross filesystem check in NFS's copy_file_range

Olga Kornievskaia (11):
  VFS move cross device copy_file_range() check into filesystems
  VFS copy_file_range check validity of input source offset
  NFS NFSD defining nl4_servers structure needed by both
  NFS add COPY_NOTIFY operation
  NFS add ca_source_server<> to COPY
  NFS also send OFFLOAD_CANCEL to source server
  NFS inter ssc open
  NFS skip recovery of copy open on dest server
  NFS for "inter" copy treat ESTALE as ENOTSUPP
  NFS COPY handle ERR_OFFLOAD_DENIED
  NFS replace cross device check in copy_file_range

 Documentation/filesystems/porting |   7 ++
 fs/cifs/cifsfs.c                  |   3 +
 fs/nfs/nfs42.h                    |  15 ++-
 fs/nfs/nfs42proc.c                | 129 ++++++++++++++++++++++---
 fs/nfs/nfs42xdr.c                 | 193 +++++++++++++++++++++++++++++++++++++-
 fs/nfs/nfs4_fs.h                  |  10 ++
 fs/nfs/nfs4client.c               |   2 +-
 fs/nfs/nfs4file.c                 | 125 +++++++++++++++++++++++-
 fs/nfs/nfs4proc.c                 |   6 +-
 fs/nfs/nfs4state.c                |  14 +++
 fs/nfs/nfs4xdr.c                  |   1 +
 fs/overlayfs/file.c               |   3 +
 fs/read_write.c                   |  12 +--
 include/linux/nfs4.h              |  25 +++++
 include/linux/nfs_fs_sb.h         |   1 +
 include/linux/nfs_xdr.h           |  17 ++++
 16 files changed, 538 insertions(+), 25 deletions(-)

-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH v4 01/11] VFS: move cross device copy_file_range() check into filesystems
  2018-10-26 20:10 [PATCH v4 00/11] client-side support for "inter" SSC copy Olga Kornievskaia
@ 2018-10-26 20:10 ` Olga Kornievskaia
  2018-10-26 21:23   ` Matthew Wilcox
                     ` (3 more replies)
  2018-10-26 20:10 ` [PATCH 1/1] man-page: copy_file_range(2) allow for cross-device copies Olga Kornievskaia
                   ` (10 subsequent siblings)
  11 siblings, 4 replies; 43+ messages in thread
From: Olga Kornievskaia @ 2018-10-26 20:10 UTC (permalink / raw)
  To: trond.myklebust, anna.schumaker, viro, smfrench, miklos
  Cc: linux-nfs, linux-fsdevel, linux-cifs, linux-unionfs, linux-man

From: Olga Kornievskaia <kolga@netapp.com>

This patch makes it the responsibility of individual filesystems to
allow or deny cross device copies.  Both NFS and CIFS have operations
for cross-server copies, and later patches will implement this feature.

Note that as of this patch, the copy_file_range() function might be passed
superblocks from different filesystem types. -EXDEV should be returned
if cross device copies aren't supported, causing the VFS to fall back
on using do_splice_direct().

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
---
 Documentation/filesystems/porting | 7 +++++++
 fs/cifs/cifsfs.c                  | 3 +++
 fs/nfs/nfs4file.c                 | 3 +++
 fs/overlayfs/file.c               | 3 +++
 fs/read_write.c                   | 9 +++------
 5 files changed, 19 insertions(+), 6 deletions(-)

diff --git a/Documentation/filesystems/porting b/Documentation/filesystems/porting
index 7b7b845..897e1e7 100644
--- a/Documentation/filesystems/porting
+++ b/Documentation/filesystems/porting
@@ -622,3 +622,10 @@ in your dentry operations instead.
 	alloc_file_clone(file, flags, ops) does not affect any caller's references.
 	On success you get a new struct file sharing the mount/dentry with the
 	original, on failure - ERR_PTR().
+--
+[mandatory]
+	->copy_file_range() may now be passed files which belong to two
+	different superblocks of the same file system type or which belong
+	to two different filesystems types all together. As before, the
+	destination's copy_file_range() is the function which is called.
+	If it cannot copy ranges from the source, it should return -EXDEV.
diff --git a/fs/cifs/cifsfs.c b/fs/cifs/cifsfs.c
index 7065426..f2d7f4f 100644
--- a/fs/cifs/cifsfs.c
+++ b/fs/cifs/cifsfs.c
@@ -1114,6 +1114,9 @@ static ssize_t cifs_copy_file_range(struct file *src_file, loff_t off,
 	unsigned int xid = get_xid();
 	ssize_t rc;
 
+	if (src_file->f_inode->i_sb != dst_file->f_inode->i_sb)
+		return -EXDEV;
+
 	rc = cifs_file_copychunk_range(xid, src_file, off, dst_file, destoff,
 					len, flags);
 	free_xid(xid);
diff --git a/fs/nfs/nfs4file.c b/fs/nfs/nfs4file.c
index 4288a6e..09df688 100644
--- a/fs/nfs/nfs4file.c
+++ b/fs/nfs/nfs4file.c
@@ -135,6 +135,9 @@ static ssize_t nfs4_copy_file_range(struct file *file_in, loff_t pos_in,
 {
 	ssize_t ret;
 
+	if (file_in->f_inode->i_sb != file_out->f_inode->i_sb)
+		return -EXDEV;
+
 	if (file_inode(file_in) == file_inode(file_out))
 		return -EINVAL;
 retry:
diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
index aeaefd2..5282853 100644
--- a/fs/overlayfs/file.c
+++ b/fs/overlayfs/file.c
@@ -483,6 +483,9 @@ static ssize_t ovl_copy_file_range(struct file *file_in, loff_t pos_in,
 				   struct file *file_out, loff_t pos_out,
 				   size_t len, unsigned int flags)
 {
+	if (file_in->f_inode->i_sb != file_out->f_inode->i_sb)
+		return -EXDEV;
+
 	return ovl_copyfile(file_in, pos_in, file_out, pos_out, len, flags,
 			    OVL_COPY);
 }
diff --git a/fs/read_write.c b/fs/read_write.c
index 39b4a21..fb4ffca 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1575,10 +1575,6 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
 	    (file_out->f_flags & O_APPEND))
 		return -EBADF;
 
-	/* this could be relaxed once a method supports cross-fs copies */
-	if (inode_in->i_sb != inode_out->i_sb)
-		return -EXDEV;
-
 	if (len == 0)
 		return 0;
 
@@ -1588,7 +1584,8 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
 	 * Try cloning first, this is supported by more file systems, and
 	 * more efficient if both clone and copy are supported (e.g. NFS).
 	 */
-	if (file_in->f_op->clone_file_range) {
+	if (inode_in->i_sb == inode_out->i_sb &&
+			file_in->f_op->clone_file_range) {
 		ret = file_in->f_op->clone_file_range(file_in, pos_in,
 				file_out, pos_out, len);
 		if (ret == 0) {
@@ -1600,7 +1597,7 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
 	if (file_out->f_op->copy_file_range) {
 		ret = file_out->f_op->copy_file_range(file_in, pos_in, file_out,
 						      pos_out, len, flags);
-		if (ret != -EOPNOTSUPP)
+		if (ret != -EOPNOTSUPP && ret != -EXDEV)
 			goto done;
 	}
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH 1/1] man-page: copy_file_range(2) allow for cross-device copies
  2018-10-26 20:10 [PATCH v4 00/11] client-side support for "inter" SSC copy Olga Kornievskaia
  2018-10-26 20:10 ` [PATCH v4 01/11] VFS: move cross device copy_file_range() check into filesystems Olga Kornievskaia
@ 2018-10-26 20:10 ` Olga Kornievskaia
  2018-10-27  9:12   ` Dave Chinner
  2018-10-26 20:10 ` [PATCH v4 02/11] VFS: copy_file_range check validity of input source offset Olga Kornievskaia
                   ` (9 subsequent siblings)
  11 siblings, 1 reply; 43+ messages in thread
From: Olga Kornievskaia @ 2018-10-26 20:10 UTC (permalink / raw)
  To: trond.myklebust, anna.schumaker, viro, smfrench, miklos
  Cc: linux-nfs, linux-fsdevel, linux-cifs, linux-unionfs, linux-man

From: Olga Kornievskaia <kolga@netapp.com>

A proposed VFS change removes the check for the files to reside
under the same file system. Instead, a file system driver implementation
is allowed to perform a cross-device copy_file_range() and if
the file system fails to support it instead fallback to doing
a do_splice copy. Therefore, EXDEV error code only applies to
kernel version prior to such support.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
---
 man2/copy_file_range.2 | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/man2/copy_file_range.2 b/man2/copy_file_range.2
index 20374ab..88b40bb 100644
--- a/man2/copy_file_range.2
+++ b/man2/copy_file_range.2
@@ -39,7 +39,8 @@ The
 .BR copy_file_range ()
 system call performs an in-kernel copy between two file descriptors
 without the additional cost of transferring data from the kernel to user space
-and then back into the kernel.
+and then back into the kernel. Starting kernel version 4.21 passed in
+file descriptors are not required to be under the same mounted file system.
 It copies up to
 .I len
 bytes of data from file descriptor
@@ -131,7 +132,8 @@ There is not enough space on the target filesystem to complete the copy.
 .B EXDEV
 The files referred to by
 .IR file_in " and " file_out
-are not on the same mounted filesystem.
+are not on the same mounted filesystem when the kernel does not support
+cross device file copy.
 .SH VERSIONS
 The
 .BR copy_file_range ()
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH v4 02/11] VFS: copy_file_range check validity of input source offset
  2018-10-26 20:10 [PATCH v4 00/11] client-side support for "inter" SSC copy Olga Kornievskaia
  2018-10-26 20:10 ` [PATCH v4 01/11] VFS: move cross device copy_file_range() check into filesystems Olga Kornievskaia
  2018-10-26 20:10 ` [PATCH 1/1] man-page: copy_file_range(2) allow for cross-device copies Olga Kornievskaia
@ 2018-10-26 20:10 ` Olga Kornievskaia
  2018-10-26 21:26   ` Matthew Wilcox
  2018-10-27  9:27   ` Dave Chinner
  2018-10-26 20:10 ` [PATCH v4 03/11] NFS: NFSD defining nl4_servers structure needed by both Olga Kornievskaia
                   ` (8 subsequent siblings)
  11 siblings, 2 replies; 43+ messages in thread
From: Olga Kornievskaia @ 2018-10-26 20:10 UTC (permalink / raw)
  To: trond.myklebust, anna.schumaker, viro, smfrench, miklos
  Cc: linux-nfs, linux-fsdevel, linux-cifs, linux-unionfs, linux-man

From: Olga Kornievskaia <kolga@netapp.com>

Input source offset can't be beyond the end of the file.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
---
 fs/read_write.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/read_write.c b/fs/read_write.c
index fb4ffca..b3b304e 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1594,6 +1594,9 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
 		}
 	}
 
+	if (pos_in >= i_size_read(inode_in))
+		return -EINVAL;
+
 	if (file_out->f_op->copy_file_range) {
 		ret = file_out->f_op->copy_file_range(file_in, pos_in, file_out,
 						      pos_out, len, flags);
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH v4 03/11] NFS: NFSD defining nl4_servers structure needed by both
  2018-10-26 20:10 [PATCH v4 00/11] client-side support for "inter" SSC copy Olga Kornievskaia
                   ` (2 preceding siblings ...)
  2018-10-26 20:10 ` [PATCH v4 02/11] VFS: copy_file_range check validity of input source offset Olga Kornievskaia
@ 2018-10-26 20:10 ` Olga Kornievskaia
  2018-10-27 11:14   ` Jeff Layton
  2018-10-26 20:10 ` [PATCH v4 04/11] NFS: add COPY_NOTIFY operation Olga Kornievskaia
                   ` (7 subsequent siblings)
  11 siblings, 1 reply; 43+ messages in thread
From: Olga Kornievskaia @ 2018-10-26 20:10 UTC (permalink / raw)
  To: trond.myklebust, anna.schumaker, viro, smfrench, miklos
  Cc: linux-nfs, linux-fsdevel, linux-cifs, linux-unionfs, linux-man

From: Olga Kornievskaia <kolga@netapp.com>

These structures are needed by COPY_NOTIFY on the client and needed
by the nfsd as well

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
---
 include/linux/nfs4.h | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/include/linux/nfs4.h b/include/linux/nfs4.h
index 1b06f0b..4d76f87 100644
--- a/include/linux/nfs4.h
+++ b/include/linux/nfs4.h
@@ -16,6 +16,7 @@
 #include <linux/list.h>
 #include <linux/uidgid.h>
 #include <uapi/linux/nfs4.h>
+#include <linux/sunrpc/msg_prot.h>
 
 enum nfs4_acl_whotype {
 	NFS4_ACL_WHO_NAMED = 0,
@@ -672,4 +673,27 @@ struct nfs4_op_map {
 	} u;
 };
 
+struct nfs42_netaddr {
+	char		netid[RPCBIND_MAXNETIDLEN];
+	char		addr[RPCBIND_MAXUADDRLEN + 1];
+	u32		netid_len;
+	u32 addr_len;
+};
+
+enum netloc_type4 {
+	NL4_NAME		= 1,
+	NL4_URL			= 2,
+	NL4_NETADDR		= 3,
+};
+
+struct nl4_server {
+	enum netloc_type4	nl4_type;
+	union {
+		struct { /* NL4_NAME, NL4_URL */
+			int	nl4_str_sz;
+			char	nl4_str[NFS4_OPAQUE_LIMIT + 1];
+		};
+		struct nfs42_netaddr	nl4_addr; /* NL4_NETADDR */
+	} u;
+};
 #endif
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH v4 04/11] NFS: add COPY_NOTIFY operation
  2018-10-26 20:10 [PATCH v4 00/11] client-side support for "inter" SSC copy Olga Kornievskaia
                   ` (3 preceding siblings ...)
  2018-10-26 20:10 ` [PATCH v4 03/11] NFS: NFSD defining nl4_servers structure needed by both Olga Kornievskaia
@ 2018-10-26 20:10 ` Olga Kornievskaia
  2018-10-26 20:10 ` [PATCH v4 05/11] NFS: add ca_source_server<> to COPY Olga Kornievskaia
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 43+ messages in thread
From: Olga Kornievskaia @ 2018-10-26 20:10 UTC (permalink / raw)
  To: trond.myklebust, anna.schumaker, viro, smfrench, miklos
  Cc: linux-nfs, linux-fsdevel, linux-cifs, linux-unionfs, linux-man

From: Olga Kornievskaia <kolga@netapp.com>

Try using the delegation stateid, then the open stateid.

Only NL4_NETATTR, No support for NL4_NAME and NL4_URL.
Allow only one source server address to be returned for now.

To distinguish between same server copy offload ("intra") and
a copy between different server ("inter"), do a check of server
owner identity.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
---
 fs/nfs/nfs42.h            |  12 +++
 fs/nfs/nfs42proc.c        |  91 +++++++++++++++++++++++
 fs/nfs/nfs42xdr.c         | 181 ++++++++++++++++++++++++++++++++++++++++++++++
 fs/nfs/nfs4_fs.h          |   2 +
 fs/nfs/nfs4client.c       |   2 +-
 fs/nfs/nfs4file.c         |  14 ++++
 fs/nfs/nfs4proc.c         |   1 +
 fs/nfs/nfs4xdr.c          |   1 +
 include/linux/nfs4.h      |   1 +
 include/linux/nfs_fs_sb.h |   1 +
 include/linux/nfs_xdr.h   |  16 ++++
 11 files changed, 321 insertions(+), 1 deletion(-)

diff --git a/fs/nfs/nfs42.h b/fs/nfs/nfs42.h
index 19ec38f8..bbe49a3 100644
--- a/fs/nfs/nfs42.h
+++ b/fs/nfs/nfs42.h
@@ -13,6 +13,7 @@
 #define PNFS_LAYOUTSTATS_MAXDEV (4)
 
 /* nfs4.2proc.c */
+#ifdef CONFIG_NFS_V4_2
 int nfs42_proc_allocate(struct file *, loff_t, loff_t);
 ssize_t nfs42_proc_copy(struct file *, loff_t, struct file *, loff_t, size_t);
 int nfs42_proc_deallocate(struct file *, loff_t, loff_t);
@@ -20,5 +21,16 @@
 int nfs42_proc_layoutstats_generic(struct nfs_server *,
 				   struct nfs42_layoutstat_data *);
 int nfs42_proc_clone(struct file *, struct file *, loff_t, loff_t, loff_t);
+int nfs42_proc_copy_notify(struct file *, struct file *,
+			   struct nfs42_copy_notify_res *);
+static inline bool nfs42_files_from_same_server(struct file *in,
+						struct file *out)
+{
+	struct nfs_client *c_in = (NFS_SERVER(file_inode(in)))->nfs_client;
+	struct nfs_client *c_out = (NFS_SERVER(file_inode(out)))->nfs_client;
 
+	return nfs4_check_serverowner_major_id(c_in->cl_serverowner,
+				c_out->cl_serverowner);
+}
+#endif /* CONFIG_NFS_V4_2 */
 #endif /* __LINUX_FS_NFS_NFS4_2_H */
diff --git a/fs/nfs/nfs42proc.c b/fs/nfs/nfs42proc.c
index ac5b784..b1c57a4 100644
--- a/fs/nfs/nfs42proc.c
+++ b/fs/nfs/nfs42proc.c
@@ -3,6 +3,7 @@
  * Copyright (c) 2014 Anna Schumaker <Anna.Schumaker@Netapp.com>
  */
 #include <linux/fs.h>
+#include <linux/sunrpc/addr.h>
 #include <linux/sunrpc/sched.h>
 #include <linux/nfs.h>
 #include <linux/nfs3.h>
@@ -15,10 +16,30 @@
 #include "pnfs.h"
 #include "nfs4session.h"
 #include "internal.h"
+#include "delegation.h"
 
 #define NFSDBG_FACILITY NFSDBG_PROC
 static int nfs42_do_offload_cancel_async(struct file *dst, nfs4_stateid *std);
 
+static void nfs42_set_netaddr(struct file *filep, struct nfs42_netaddr *naddr)
+{
+	struct nfs_client *clp = (NFS_SERVER(file_inode(filep)))->nfs_client;
+	unsigned short port = 2049;
+
+	rcu_read_lock();
+	naddr->netid_len = scnprintf(naddr->netid,
+					sizeof(naddr->netid), "%s",
+					rpc_peeraddr2str(clp->cl_rpcclient,
+					RPC_DISPLAY_NETID));
+	naddr->addr_len = scnprintf(naddr->addr,
+					sizeof(naddr->addr),
+					"%s.%u.%u",
+					rpc_peeraddr2str(clp->cl_rpcclient,
+					RPC_DISPLAY_ADDR),
+					port >> 8, port & 255);
+	rcu_read_unlock();
+}
+
 static int _nfs42_proc_fallocate(struct rpc_message *msg, struct file *filep,
 		struct nfs_lock_context *lock, loff_t offset, loff_t len)
 {
@@ -461,6 +482,76 @@ static int nfs42_do_offload_cancel_async(struct file *dst,
 	return status;
 }
 
+int _nfs42_proc_copy_notify(struct file *src, struct file *dst,
+			    struct nfs42_copy_notify_args *args,
+			    struct nfs42_copy_notify_res *res)
+{
+	struct nfs_server *src_server = NFS_SERVER(file_inode(src));
+	struct rpc_message msg = {
+		.rpc_proc = &nfs4_procedures[NFSPROC4_CLNT_COPY_NOTIFY],
+		.rpc_argp = args,
+		.rpc_resp = res,
+	};
+	int status;
+	struct nfs_open_context *ctx;
+	struct nfs_lock_context *l_ctx;
+
+	ctx = get_nfs_open_context(nfs_file_open_context(src));
+	l_ctx = nfs_get_lock_context(ctx);
+	if (IS_ERR(l_ctx))
+		return PTR_ERR(l_ctx);
+
+	status = nfs4_set_rw_stateid(&args->cna_src_stateid, ctx, l_ctx,
+				     FMODE_READ);
+	nfs_put_lock_context(l_ctx);
+	if (status)
+		return status;
+
+	status = nfs4_call_sync(src_server->client, src_server, &msg,
+				&args->cna_seq_args, &res->cnr_seq_res, 0);
+	if (status == -ENOTSUPP)
+		src_server->caps &= ~NFS_CAP_COPY_NOTIFY;
+
+	put_nfs_open_context(nfs_file_open_context(src));
+	return status;
+}
+
+int nfs42_proc_copy_notify(struct file *src, struct file *dst,
+				struct nfs42_copy_notify_res *res)
+{
+	struct nfs_server *src_server = NFS_SERVER(file_inode(src));
+	struct nfs42_copy_notify_args *args;
+	struct nfs4_exception exception = {
+		.inode = file_inode(src),
+	};
+	int status;
+
+	if (!(src_server->caps & NFS_CAP_COPY_NOTIFY))
+		return -EOPNOTSUPP;
+
+	args = kzalloc(sizeof(struct nfs42_copy_notify_args), GFP_NOFS);
+	if (args == NULL)
+		return -ENOMEM;
+
+	args->cna_src_fh  = NFS_FH(file_inode(src)),
+	args->cna_dst.nl4_type = NL4_NETADDR;
+	nfs42_set_netaddr(dst, &args->cna_dst.u.nl4_addr);
+	exception.stateid = &args->cna_src_stateid;
+
+	do {
+		status = _nfs42_proc_copy_notify(src, dst, args, res);
+		if (status == -ENOTSUPP) {
+			status = -EOPNOTSUPP;
+			goto out;
+		}
+		status = nfs4_handle_exception(src_server, status, &exception);
+	} while (exception.retry);
+
+out:
+	kfree(args);
+	return status;
+}
+
 static loff_t _nfs42_proc_llseek(struct file *filep,
 		struct nfs_lock_context *lock, loff_t offset, int whence)
 {
diff --git a/fs/nfs/nfs42xdr.c b/fs/nfs/nfs42xdr.c
index 69f72ed..e6e7cbf 100644
--- a/fs/nfs/nfs42xdr.c
+++ b/fs/nfs/nfs42xdr.c
@@ -29,6 +29,16 @@
 #define encode_offload_cancel_maxsz	(op_encode_hdr_maxsz + \
 					 XDR_QUADLEN(NFS4_STATEID_SIZE))
 #define decode_offload_cancel_maxsz	(op_decode_hdr_maxsz)
+#define encode_copy_notify_maxsz	(op_encode_hdr_maxsz + \
+					 XDR_QUADLEN(NFS4_STATEID_SIZE) + \
+					 1 + /* nl4_type */ \
+					 1 + XDR_QUADLEN(NFS4_OPAQUE_LIMIT))
+#define decode_copy_notify_maxsz	(op_decode_hdr_maxsz + \
+					 3 + /* cnr_lease_time */\
+					 XDR_QUADLEN(NFS4_STATEID_SIZE) + \
+					 1 + /* Support 1 cnr_source_server */\
+					 1 + /* nl4_type */ \
+					 1 + XDR_QUADLEN(NFS4_OPAQUE_LIMIT))
 #define encode_deallocate_maxsz		(op_encode_hdr_maxsz + \
 					 encode_fallocate_maxsz)
 #define decode_deallocate_maxsz		(op_decode_hdr_maxsz)
@@ -84,6 +94,12 @@
 #define NFS4_dec_offload_cancel_sz	(compound_decode_hdr_maxsz + \
 					 decode_putfh_maxsz + \
 					 decode_offload_cancel_maxsz)
+#define NFS4_enc_copy_notify_sz		(compound_encode_hdr_maxsz + \
+					 encode_putfh_maxsz + \
+					 encode_copy_notify_maxsz)
+#define NFS4_dec_copy_notify_sz		(compound_decode_hdr_maxsz + \
+					 decode_putfh_maxsz + \
+					 decode_copy_notify_maxsz)
 #define NFS4_enc_deallocate_sz		(compound_encode_hdr_maxsz + \
 					 encode_putfh_maxsz + \
 					 encode_deallocate_maxsz + \
@@ -137,6 +153,25 @@ static void encode_allocate(struct xdr_stream *xdr,
 	encode_fallocate(xdr, args);
 }
 
+static void encode_nl4_server(struct xdr_stream *xdr, const struct nl4_server *ns)
+{
+	encode_uint32(xdr, ns->nl4_type);
+	switch (ns->nl4_type) {
+	case NL4_NAME:
+	case NL4_URL:
+		encode_string(xdr, ns->u.nl4_str_sz, ns->u.nl4_str);
+		break;
+	case NL4_NETADDR:
+		encode_string(xdr, ns->u.nl4_addr.netid_len,
+			      ns->u.nl4_addr.netid);
+		encode_string(xdr, ns->u.nl4_addr.addr_len,
+			      ns->u.nl4_addr.addr);
+		break;
+	default:
+		WARN_ON_ONCE(1);
+	}
+}
+
 static void encode_copy(struct xdr_stream *xdr,
 			const struct nfs42_copy_args *args,
 			struct compound_hdr *hdr)
@@ -162,6 +197,15 @@ static void encode_offload_cancel(struct xdr_stream *xdr,
 	encode_nfs4_stateid(xdr, &args->osa_stateid);
 }
 
+static void encode_copy_notify(struct xdr_stream *xdr,
+			       const struct nfs42_copy_notify_args *args,
+			       struct compound_hdr *hdr)
+{
+	encode_op_hdr(xdr, OP_COPY_NOTIFY, decode_copy_notify_maxsz, hdr);
+	encode_nfs4_stateid(xdr, &args->cna_src_stateid);
+	encode_nl4_server(xdr, &args->cna_dst);
+}
+
 static void encode_deallocate(struct xdr_stream *xdr,
 			      const struct nfs42_falloc_args *args,
 			      struct compound_hdr *hdr)
@@ -298,6 +342,25 @@ static void nfs4_xdr_enc_offload_cancel(struct rpc_rqst *req,
 }
 
 /*
+ * Encode COPY_NOTIFY request
+ */
+static void nfs4_xdr_enc_copy_notify(struct rpc_rqst *req,
+				     struct xdr_stream *xdr,
+				     const void *data)
+{
+	const struct nfs42_copy_notify_args *args = data;
+	struct compound_hdr hdr = {
+		.minorversion = nfs4_xdr_minorversion(&args->cna_seq_args),
+	};
+
+	encode_compound_hdr(xdr, req, &hdr);
+	encode_sequence(xdr, &args->cna_seq_args, &hdr);
+	encode_putfh(xdr, args->cna_src_fh, &hdr);
+	encode_copy_notify(xdr, args, &hdr);
+	encode_nops(&hdr);
+}
+
+/*
  * Encode DEALLOCATE request
  */
 static void nfs4_xdr_enc_deallocate(struct rpc_rqst *req,
@@ -416,6 +479,58 @@ static int decode_write_response(struct xdr_stream *xdr,
 	return -EIO;
 }
 
+static int decode_nl4_server(struct xdr_stream *xdr, struct nl4_server *ns)
+{
+	struct nfs42_netaddr *naddr;
+	uint32_t dummy;
+	char *dummy_str;
+	__be32 *p;
+	int status;
+
+	/* nl_type */
+	p = xdr_inline_decode(xdr, 4);
+	if (unlikely(!p))
+		return -EIO;
+	ns->nl4_type = be32_to_cpup(p);
+	switch (ns->nl4_type) {
+	case NL4_NAME:
+	case NL4_URL:
+		status = decode_opaque_inline(xdr, &dummy, &dummy_str);
+		if (unlikely(status))
+			return status;
+		if (unlikely(dummy > NFS4_OPAQUE_LIMIT))
+			return -EIO;
+		memcpy(&ns->u.nl4_str, dummy_str, dummy);
+		ns->u.nl4_str_sz = dummy;
+		break;
+	case NL4_NETADDR:
+		naddr = &ns->u.nl4_addr;
+
+		/* netid string */
+		status = decode_opaque_inline(xdr, &dummy, &dummy_str);
+		if (unlikely(status))
+			return status;
+		if (unlikely(dummy > RPCBIND_MAXNETIDLEN))
+			return -EIO;
+		naddr->netid_len = dummy;
+		memcpy(naddr->netid, dummy_str, naddr->netid_len);
+
+		/* uaddr string */
+		status = decode_opaque_inline(xdr, &dummy, &dummy_str);
+		if (unlikely(status))
+			return status;
+		if (unlikely(dummy > RPCBIND_MAXUADDRLEN))
+			return -EIO;
+		naddr->addr_len = dummy;
+		memcpy(naddr->addr, dummy_str, naddr->addr_len);
+		break;
+	default:
+		WARN_ON_ONCE(1);
+		return -EIO;
+	}
+	return 0;
+}
+
 static int decode_copy_requirements(struct xdr_stream *xdr,
 				    struct nfs42_copy_res *res) {
 	__be32 *p;
@@ -458,6 +573,46 @@ static int decode_offload_cancel(struct xdr_stream *xdr,
 	return decode_op_hdr(xdr, OP_OFFLOAD_CANCEL);
 }
 
+static int decode_copy_notify(struct xdr_stream *xdr,
+			      struct nfs42_copy_notify_res *res)
+{
+	__be32 *p;
+	int status, count;
+
+	status = decode_op_hdr(xdr, OP_COPY_NOTIFY);
+	if (status)
+		return status;
+	/* cnr_lease_time */
+	p = xdr_inline_decode(xdr, 12);
+	if (unlikely(!p))
+		goto out_overflow;
+	p = xdr_decode_hyper(p, &res->cnr_lease_time.seconds);
+	res->cnr_lease_time.nseconds = be32_to_cpup(p);
+
+	status = decode_opaque_fixed(xdr, &res->cnr_stateid, NFS4_STATEID_SIZE);
+	if (unlikely(status))
+		goto out_overflow;
+
+	/* number of source addresses */
+	p = xdr_inline_decode(xdr, 4);
+	if (unlikely(!p))
+		goto out_overflow;
+
+	count = be32_to_cpup(p);
+	if (count > 1)
+		pr_warn("NFS: %s: nsvr %d > Supported. Use first servers\n",
+			 __func__, count);
+
+	status = decode_nl4_server(xdr, &res->cnr_src);
+	if (unlikely(status))
+		goto out_overflow;
+	return 0;
+
+out_overflow:
+	print_overflow_msg(__func__, xdr);
+	return -EIO;
+}
+
 static int decode_deallocate(struct xdr_stream *xdr, struct nfs42_falloc_res *res)
 {
 	return decode_op_hdr(xdr, OP_DEALLOCATE);
@@ -585,6 +740,32 @@ static int nfs4_xdr_dec_offload_cancel(struct rpc_rqst *rqstp,
 }
 
 /*
+ * Decode COPY_NOTIFY response
+ */
+static int nfs4_xdr_dec_copy_notify(struct rpc_rqst *rqstp,
+				    struct xdr_stream *xdr,
+				    void *data)
+{
+	struct nfs42_copy_notify_res *res = data;
+	struct compound_hdr hdr;
+	int status;
+
+	status = decode_compound_hdr(xdr, &hdr);
+	if (status)
+		goto out;
+	status = decode_sequence(xdr, &res->cnr_seq_res, rqstp);
+	if (status)
+		goto out;
+	status = decode_putfh(xdr);
+	if (status)
+		goto out;
+	status = decode_copy_notify(xdr, res);
+
+out:
+	return status;
+}
+
+/*
  * Decode DEALLOCATE request
  */
 static int nfs4_xdr_dec_deallocate(struct rpc_rqst *rqstp,
diff --git a/fs/nfs/nfs4_fs.h b/fs/nfs/nfs4_fs.h
index 8d59c96..7d17b31 100644
--- a/fs/nfs/nfs4_fs.h
+++ b/fs/nfs/nfs4_fs.h
@@ -460,6 +460,8 @@ int nfs41_discover_server_trunking(struct nfs_client *clp,
 			struct nfs_client **, struct rpc_cred *);
 extern void nfs4_schedule_session_recovery(struct nfs4_session *, int);
 extern void nfs41_notify_server(struct nfs_client *);
+bool nfs4_check_serverowner_major_id(struct nfs41_server_owner *o1,
+			struct nfs41_server_owner *o2);
 #else
 static inline void nfs4_schedule_session_recovery(struct nfs4_session *session, int err)
 {
diff --git a/fs/nfs/nfs4client.c b/fs/nfs/nfs4client.c
index 8f53455..ac00eb8 100644
--- a/fs/nfs/nfs4client.c
+++ b/fs/nfs/nfs4client.c
@@ -625,7 +625,7 @@ int nfs40_walk_client_list(struct nfs_client *new,
 /*
  * Returns true if the server major ids match
  */
-static bool
+bool
 nfs4_check_serverowner_major_id(struct nfs41_server_owner *o1,
 				struct nfs41_server_owner *o2)
 {
diff --git a/fs/nfs/nfs4file.c b/fs/nfs/nfs4file.c
index 09df688..6d53750 100644
--- a/fs/nfs/nfs4file.c
+++ b/fs/nfs/nfs4file.c
@@ -133,6 +133,7 @@ static ssize_t nfs4_copy_file_range(struct file *file_in, loff_t pos_in,
 				    struct file *file_out, loff_t pos_out,
 				    size_t count, unsigned int flags)
 {
+	struct nfs42_copy_notify_res *cn_resp = NULL;
 	ssize_t ret;
 
 	if (file_in->f_inode->i_sb != file_out->f_inode->i_sb)
@@ -141,7 +142,20 @@ static ssize_t nfs4_copy_file_range(struct file *file_in, loff_t pos_in,
 	if (file_inode(file_in) == file_inode(file_out))
 		return -EINVAL;
 retry:
+	if (!nfs42_files_from_same_server(file_in, file_out)) {
+		cn_resp = kzalloc(sizeof(struct nfs42_copy_notify_res),
+				GFP_NOFS);
+		if (unlikely(cn_resp == NULL))
+			return -ENOMEM;
+
+		ret = nfs42_proc_copy_notify(file_in, file_out, cn_resp);
+		if (ret)
+			goto out;
+	}
+
 	ret = nfs42_proc_copy(file_in, pos_in, file_out, pos_out, count);
+out:
+	kfree(cn_resp);
 	if (ret == -EAGAIN)
 		goto retry;
 	return ret;
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index db84b4a..fec6e6b 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -9692,6 +9692,7 @@ static bool nfs4_match_stateid(const nfs4_stateid *s1,
 		| NFS_CAP_ALLOCATE
 		| NFS_CAP_COPY
 		| NFS_CAP_OFFLOAD_CANCEL
+		| NFS_CAP_COPY_NOTIFY
 		| NFS_CAP_DEALLOCATE
 		| NFS_CAP_SEEK
 		| NFS_CAP_LAYOUTSTATS
diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
index b7bde12..2163900 100644
--- a/fs/nfs/nfs4xdr.c
+++ b/fs/nfs/nfs4xdr.c
@@ -7790,6 +7790,7 @@ int nfs4_decode_dirent(struct xdr_stream *xdr, struct nfs_entry *entry,
 	PROC42(CLONE,		enc_clone,		dec_clone),
 	PROC42(COPY,		enc_copy,		dec_copy),
 	PROC42(OFFLOAD_CANCEL,	enc_offload_cancel,	dec_offload_cancel),
+	PROC42(COPY_NOTIFY,	enc_copy_notify,	dec_copy_notify),
 	PROC(LOOKUPP,		enc_lookupp,		dec_lookupp),
 };
 
diff --git a/include/linux/nfs4.h b/include/linux/nfs4.h
index 4d76f87..d80b25e 100644
--- a/include/linux/nfs4.h
+++ b/include/linux/nfs4.h
@@ -537,6 +537,7 @@ enum {
 	NFSPROC4_CLNT_CLONE,
 	NFSPROC4_CLNT_COPY,
 	NFSPROC4_CLNT_OFFLOAD_CANCEL,
+	NFSPROC4_CLNT_COPY_NOTIFY,
 
 	NFSPROC4_CLNT_LOOKUPP,
 };
diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
index 0fc0b91..e5d89ff 100644
--- a/include/linux/nfs_fs_sb.h
+++ b/include/linux/nfs_fs_sb.h
@@ -261,5 +261,6 @@ struct nfs_server {
 #define NFS_CAP_CLONE		(1U << 23)
 #define NFS_CAP_COPY		(1U << 24)
 #define NFS_CAP_OFFLOAD_CANCEL	(1U << 25)
+#define NFS_CAP_COPY_NOTIFY	(1U << 26)
 
 #endif
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index 0e01625..dfc59bc 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -1428,6 +1428,22 @@ struct nfs42_offload_status_res {
 	int				osr_status;
 };
 
+struct nfs42_copy_notify_args {
+	struct nfs4_sequence_args	cna_seq_args;
+
+	struct nfs_fh		*cna_src_fh;
+	nfs4_stateid		cna_src_stateid;
+	struct nl4_server	cna_dst;
+};
+
+struct nfs42_copy_notify_res {
+	struct nfs4_sequence_res	cnr_seq_res;
+
+	struct nfstime4		cnr_lease_time;
+	nfs4_stateid		cnr_stateid;
+	struct nl4_server	cnr_src;
+};
+
 struct nfs42_seek_args {
 	struct nfs4_sequence_args	seq_args;
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH v4 05/11] NFS: add ca_source_server<> to COPY
  2018-10-26 20:10 [PATCH v4 00/11] client-side support for "inter" SSC copy Olga Kornievskaia
                   ` (4 preceding siblings ...)
  2018-10-26 20:10 ` [PATCH v4 04/11] NFS: add COPY_NOTIFY operation Olga Kornievskaia
@ 2018-10-26 20:10 ` Olga Kornievskaia
  2018-10-26 20:10 ` [PATCH v4 06/11] NFS: also send OFFLOAD_CANCEL to source server Olga Kornievskaia
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 43+ messages in thread
From: Olga Kornievskaia @ 2018-10-26 20:10 UTC (permalink / raw)
  To: trond.myklebust, anna.schumaker, viro, smfrench, miklos
  Cc: linux-nfs, linux-fsdevel, linux-cifs, linux-unionfs, linux-man

From: Olga Kornievskaia <kolga@netapp.com>

Support only one source server address: the same address that
the client and source server use.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
---
 fs/nfs/nfs42.h          |  3 ++-
 fs/nfs/nfs42proc.c      | 26 +++++++++++++++++---------
 fs/nfs/nfs42xdr.c       | 12 ++++++++++--
 fs/nfs/nfs4file.c       |  7 ++++++-
 include/linux/nfs_xdr.h |  1 +
 5 files changed, 36 insertions(+), 13 deletions(-)

diff --git a/fs/nfs/nfs42.h b/fs/nfs/nfs42.h
index bbe49a3..28dcee5 100644
--- a/fs/nfs/nfs42.h
+++ b/fs/nfs/nfs42.h
@@ -15,7 +15,8 @@
 /* nfs4.2proc.c */
 #ifdef CONFIG_NFS_V4_2
 int nfs42_proc_allocate(struct file *, loff_t, loff_t);
-ssize_t nfs42_proc_copy(struct file *, loff_t, struct file *, loff_t, size_t);
+ssize_t nfs42_proc_copy(struct file *, loff_t, struct file *, loff_t, size_t,
+			struct nl4_server *, nfs4_stateid *);
 int nfs42_proc_deallocate(struct file *, loff_t, loff_t);
 loff_t nfs42_proc_llseek(struct file *, loff_t, int);
 int nfs42_proc_layoutstats_generic(struct nfs_server *,
diff --git a/fs/nfs/nfs42proc.c b/fs/nfs/nfs42proc.c
index b1c57a4..bb9e799 100644
--- a/fs/nfs/nfs42proc.c
+++ b/fs/nfs/nfs42proc.c
@@ -242,7 +242,9 @@ static ssize_t _nfs42_proc_copy(struct file *src,
 				struct file *dst,
 				struct nfs_lock_context *dst_lock,
 				struct nfs42_copy_args *args,
-				struct nfs42_copy_res *res)
+				struct nfs42_copy_res *res,
+				struct nl4_server *nss,
+				nfs4_stateid *cnr_stateid)
 {
 	struct rpc_message msg = {
 		.rpc_proc = &nfs4_procedures[NFSPROC4_CLNT_COPY],
@@ -256,11 +258,15 @@ static ssize_t _nfs42_proc_copy(struct file *src,
 	size_t count = args->count;
 	ssize_t status;
 
-	status = nfs4_set_rw_stateid(&args->src_stateid, src_lock->open_context,
-				     src_lock, FMODE_READ);
-	if (status)
-		return status;
-
+	if (nss) {
+		args->cp_src = nss;
+		nfs4_stateid_copy(&args->src_stateid, cnr_stateid);
+	} else {
+		status = nfs4_set_rw_stateid(&args->src_stateid,
+				src_lock->open_context, src_lock, FMODE_READ);
+		if (status)
+			return status;
+	}
 	status = nfs_filemap_write_and_wait_range(file_inode(src)->i_mapping,
 			pos_src, pos_src + (loff_t)count - 1);
 	if (status)
@@ -324,8 +330,9 @@ static ssize_t _nfs42_proc_copy(struct file *src,
 }
 
 ssize_t nfs42_proc_copy(struct file *src, loff_t pos_src,
-			struct file *dst, loff_t pos_dst,
-			size_t count)
+			struct file *dst, loff_t pos_dst, size_t count,
+			struct nl4_server *nss,
+			nfs4_stateid *cnr_stateid)
 {
 	struct nfs_server *server = NFS_SERVER(file_inode(dst));
 	struct nfs_lock_context *src_lock;
@@ -370,7 +377,8 @@ ssize_t nfs42_proc_copy(struct file *src, loff_t pos_src,
 		inode_lock(file_inode(dst));
 		err = _nfs42_proc_copy(src, src_lock,
 				dst, dst_lock,
-				&args, &res);
+				&args, &res,
+				nss, cnr_stateid);
 		inode_unlock(file_inode(dst));
 
 		if (err >= 0)
diff --git a/fs/nfs/nfs42xdr.c b/fs/nfs/nfs42xdr.c
index e6e7cbf..c96c3f8 100644
--- a/fs/nfs/nfs42xdr.c
+++ b/fs/nfs/nfs42xdr.c
@@ -21,7 +21,10 @@
 #define encode_copy_maxsz		(op_encode_hdr_maxsz +          \
 					 XDR_QUADLEN(NFS4_STATEID_SIZE) + \
 					 XDR_QUADLEN(NFS4_STATEID_SIZE) + \
-					 2 + 2 + 2 + 1 + 1 + 1)
+					 2 + 2 + 2 + 1 + 1 + 1 +\
+					 1 + /* One cnr_source_server */\
+					 1 + /* nl4_type */ \
+					 1 + XDR_QUADLEN(NFS4_OPAQUE_LIMIT))
 #define decode_copy_maxsz		(op_decode_hdr_maxsz + \
 					 NFS42_WRITE_RES_SIZE + \
 					 1 /* cr_consecutive */ + \
@@ -186,7 +189,12 @@ static void encode_copy(struct xdr_stream *xdr,
 
 	encode_uint32(xdr, 1); /* consecutive = true */
 	encode_uint32(xdr, args->sync);
-	encode_uint32(xdr, 0); /* src server list */
+	if (args->cp_src == NULL) { /* intra-ssc */
+		encode_uint32(xdr, 0); /* no src server list */
+		return;
+	}
+	encode_uint32(xdr, 1); /* supporting 1 server */
+	encode_nl4_server(xdr, args->cp_src);
 }
 
 static void encode_offload_cancel(struct xdr_stream *xdr,
diff --git a/fs/nfs/nfs4file.c b/fs/nfs/nfs4file.c
index 6d53750..032996b 100644
--- a/fs/nfs/nfs4file.c
+++ b/fs/nfs/nfs4file.c
@@ -134,6 +134,8 @@ static ssize_t nfs4_copy_file_range(struct file *file_in, loff_t pos_in,
 				    size_t count, unsigned int flags)
 {
 	struct nfs42_copy_notify_res *cn_resp = NULL;
+	struct nl4_server *nss = NULL;
+	nfs4_stateid *cnrs = NULL;
 	ssize_t ret;
 
 	if (file_in->f_inode->i_sb != file_out->f_inode->i_sb)
@@ -151,9 +153,12 @@ static ssize_t nfs4_copy_file_range(struct file *file_in, loff_t pos_in,
 		ret = nfs42_proc_copy_notify(file_in, file_out, cn_resp);
 		if (ret)
 			goto out;
+		nss = &cn_resp->cnr_src;
+		cnrs = &cn_resp->cnr_stateid;
 	}
 
-	ret = nfs42_proc_copy(file_in, pos_in, file_out, pos_out, count);
+	ret = nfs42_proc_copy(file_in, pos_in, file_out, pos_out, count, nss,
+				cnrs);
 out:
 	kfree(cn_resp);
 	if (ret == -EAGAIN)
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index dfc59bc..3a40b17 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -1400,6 +1400,7 @@ struct nfs42_copy_args {
 
 	u64				count;
 	bool				sync;
+	struct nl4_server		*cp_src;
 };
 
 struct nfs42_write_res {
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH v4 06/11] NFS: also send OFFLOAD_CANCEL to source server
  2018-10-26 20:10 [PATCH v4 00/11] client-side support for "inter" SSC copy Olga Kornievskaia
                   ` (5 preceding siblings ...)
  2018-10-26 20:10 ` [PATCH v4 05/11] NFS: add ca_source_server<> to COPY Olga Kornievskaia
@ 2018-10-26 20:10 ` Olga Kornievskaia
  2018-10-26 20:10 ` [PATCH v4 07/11] NFS: inter ssc open Olga Kornievskaia
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 43+ messages in thread
From: Olga Kornievskaia @ 2018-10-26 20:10 UTC (permalink / raw)
  To: trond.myklebust, anna.schumaker, viro, smfrench, miklos
  Cc: linux-nfs, linux-fsdevel, linux-cifs, linux-unionfs, linux-man

From: Olga Kornievskaia <kolga@netapp.com>

In case of copy is cancelled, also send OFFLOAD_CANCEL to the source
server.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
---
 fs/nfs/nfs42proc.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/nfs/nfs42proc.c b/fs/nfs/nfs42proc.c
index bb9e799..98fe00b 100644
--- a/fs/nfs/nfs42proc.c
+++ b/fs/nfs/nfs42proc.c
@@ -205,12 +205,14 @@ static int handle_async_copy(struct nfs42_copy_res *res,
 	memcpy(&res->write_res.verifier, &copy->verf, sizeof(copy->verf));
 	status = -copy->error;
 
+out_free:
 	kfree(copy);
 	return status;
 out_cancel:
 	nfs42_do_offload_cancel_async(dst, &copy->stateid);
-	kfree(copy);
-	return status;
+	if (!nfs42_files_from_same_server(src, dst))
+		nfs42_do_offload_cancel_async(src, src_stateid);
+	goto out_free;
 }
 
 static int process_copy_commit(struct file *dst, loff_t pos_dst,
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH v4 07/11] NFS: inter ssc open
  2018-10-26 20:10 [PATCH v4 00/11] client-side support for "inter" SSC copy Olga Kornievskaia
                   ` (6 preceding siblings ...)
  2018-10-26 20:10 ` [PATCH v4 06/11] NFS: also send OFFLOAD_CANCEL to source server Olga Kornievskaia
@ 2018-10-26 20:10 ` Olga Kornievskaia
  2018-10-26 20:10 ` [PATCH v4 08/11] NFS: skip recovery of copy open on dest server Olga Kornievskaia
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 43+ messages in thread
From: Olga Kornievskaia @ 2018-10-26 20:10 UTC (permalink / raw)
  To: trond.myklebust, anna.schumaker, viro, smfrench, miklos
  Cc: linux-nfs, linux-fsdevel, linux-cifs, linux-unionfs, linux-man

From: Olga Kornievskaia <kolga@netapp.com>

NFSv4.2 inter server to server copy requires the destination server to
READ the data from the source server using the provided stateid and
file handle.

Given an NFSv4 stateid and filehandle from the COPY operaion, provide the
destination server with an NFS client function to create a struct file
suitable for the destiniation server to READ the data to be copied.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Andy Adamson <andros@netapp.com>
---
 fs/nfs/nfs4_fs.h  |  7 +++++
 fs/nfs/nfs4file.c | 94 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/nfs/nfs4proc.c |  5 ++-
 3 files changed, 103 insertions(+), 3 deletions(-)

diff --git a/fs/nfs/nfs4_fs.h b/fs/nfs/nfs4_fs.h
index 7d17b31..9c566a4 100644
--- a/fs/nfs/nfs4_fs.h
+++ b/fs/nfs/nfs4_fs.h
@@ -307,6 +307,13 @@ extern int nfs4_set_rw_stateid(nfs4_stateid *stateid,
 		const struct nfs_open_context *ctx,
 		const struct nfs_lock_context *l_ctx,
 		fmode_t fmode);
+extern int nfs4_proc_getattr(struct nfs_server *server, struct nfs_fh *fhandle,
+			     struct nfs_fattr *fattr, struct nfs4_label *label,
+			     struct inode *inode);
+extern int update_open_stateid(struct nfs4_state *state,
+				const nfs4_stateid *open_stateid,
+				const nfs4_stateid *deleg_stateid,
+				fmode_t fmode);
 
 #if defined(CONFIG_NFS_V4_1)
 extern int nfs41_sequence_done(struct rpc_task *, struct nfs4_sequence_res *);
diff --git a/fs/nfs/nfs4file.c b/fs/nfs/nfs4file.c
index 032996b..19b1b3f 100644
--- a/fs/nfs/nfs4file.c
+++ b/fs/nfs/nfs4file.c
@@ -8,6 +8,7 @@
 #include <linux/file.h>
 #include <linux/falloc.h>
 #include <linux/nfs_fs.h>
+#include <linux/file.h>
 #include "delegation.h"
 #include "internal.h"
 #include "iostat.h"
@@ -264,6 +265,99 @@ static int nfs42_clone_file_range(struct file *src_file, loff_t src_off,
 out:
 	return ret;
 }
+
+static int read_name_gen = 1;
+#define SSC_READ_NAME_BODY "ssc_read_%d"
+
+struct file *
+nfs42_ssc_open(struct vfsmount *ss_mnt, struct nfs_fh *src_fh,
+		nfs4_stateid *stateid)
+{
+	struct nfs_fattr fattr;
+	struct file *filep, *res;
+	struct nfs_server *server;
+	struct inode *r_ino = NULL;
+	struct nfs_open_context *ctx;
+	struct nfs4_state_owner *sp;
+	char *read_name;
+	int len, status = 0;
+
+	server = NFS_SERVER(ss_mnt->mnt_root->d_inode);
+
+	nfs_fattr_init(&fattr);
+
+	status = nfs4_proc_getattr(server, src_fh, &fattr, NULL, NULL);
+	if (status < 0) {
+		res = ERR_PTR(status);
+		goto out;
+	}
+
+	res = ERR_PTR(-ENOMEM);
+	len = strlen(SSC_READ_NAME_BODY) + 16;
+	read_name = kzalloc(len, GFP_NOFS);
+	if (read_name == NULL)
+		goto out;
+	snprintf(read_name, len, SSC_READ_NAME_BODY, read_name_gen++);
+
+	r_ino = nfs_fhget(ss_mnt->mnt_root->d_inode->i_sb, src_fh, &fattr,
+			NULL);
+	if (IS_ERR(r_ino)) {
+		res = ERR_CAST(r_ino);
+		goto out;
+	}
+
+	filep = alloc_file_pseudo(r_ino, ss_mnt, read_name, FMODE_READ,
+				     r_ino->i_fop);
+	if (IS_ERR(filep)) {
+		res = ERR_CAST(filep);
+		goto out;
+	}
+	filep->f_mode |= FMODE_READ;
+
+	ctx = alloc_nfs_open_context(filep->f_path.dentry, filep->f_mode,
+					filep);
+	if (IS_ERR(ctx)) {
+		res = ERR_CAST(ctx);
+		goto out_filep;
+	}
+
+	res = ERR_PTR(-EINVAL);
+	sp = nfs4_get_state_owner(server, ctx->cred, GFP_KERNEL);
+	if (sp == NULL)
+		goto out_ctx;
+
+	ctx->state = nfs4_get_open_state(r_ino, sp);
+	if (ctx->state == NULL)
+		goto out_stateowner;
+
+	set_bit(NFS_OPEN_STATE, &ctx->state->flags);
+	memcpy(&ctx->state->open_stateid.other, &stateid->other,
+	       NFS4_STATEID_OTHER_SIZE);
+	update_open_stateid(ctx->state, stateid, NULL, filep->f_mode);
+
+	nfs_file_set_open_context(filep, ctx);
+	put_nfs_open_context(ctx);
+
+	file_ra_state_init(&filep->f_ra, filep->f_mapping->host->i_mapping);
+	res = filep;
+out:
+	return res;
+out_stateowner:
+	nfs4_put_state_owner(sp);
+out_ctx:
+	put_nfs_open_context(ctx);
+out_filep:
+	fput(filep);
+	goto out;
+}
+EXPORT_SYMBOL_GPL(nfs42_ssc_open);
+void nfs42_ssc_close(struct file *filep)
+{
+	struct nfs_open_context *ctx = nfs_file_open_context(filep);
+
+	ctx->state->flags = 0;
+}
+EXPORT_SYMBOL_GPL(nfs42_ssc_close);
 #endif /* CONFIG_NFS_V4_2 */
 
 const struct file_operations nfs4_file_operations = {
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index fec6e6b..e5178b2f 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -91,7 +91,6 @@
 static int _nfs4_recover_proc_open(struct nfs4_opendata *data);
 static int nfs4_do_fsinfo(struct nfs_server *, struct nfs_fh *, struct nfs_fsinfo *);
 static void nfs_fixup_referral_attributes(struct nfs_fattr *fattr);
-static int nfs4_proc_getattr(struct nfs_server *, struct nfs_fh *, struct nfs_fattr *, struct nfs4_label *label, struct inode *inode);
 static int _nfs4_proc_getattr(struct nfs_server *server, struct nfs_fh *fhandle, struct nfs_fattr *fattr, struct nfs4_label *label, struct inode *inode);
 static int nfs4_do_setattr(struct inode *inode, struct rpc_cred *cred,
 			    struct nfs_fattr *fattr, struct iattr *sattr,
@@ -1653,7 +1652,7 @@ static void nfs_state_clear_delegation(struct nfs4_state *state)
 	write_sequnlock(&state->seqlock);
 }
 
-static int update_open_stateid(struct nfs4_state *state,
+int update_open_stateid(struct nfs4_state *state,
 		const nfs4_stateid *open_stateid,
 		const nfs4_stateid *delegation,
 		fmode_t fmode)
@@ -3936,7 +3935,7 @@ static int _nfs4_proc_getattr(struct nfs_server *server, struct nfs_fh *fhandle,
 	return nfs4_call_sync(server->client, server, &msg, &args.seq_args, &res.seq_res, 0);
 }
 
-static int nfs4_proc_getattr(struct nfs_server *server, struct nfs_fh *fhandle,
+int nfs4_proc_getattr(struct nfs_server *server, struct nfs_fh *fhandle,
 				struct nfs_fattr *fattr, struct nfs4_label *label,
 				struct inode *inode)
 {
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH v4 08/11] NFS: skip recovery of copy open on dest server
  2018-10-26 20:10 [PATCH v4 00/11] client-side support for "inter" SSC copy Olga Kornievskaia
                   ` (7 preceding siblings ...)
  2018-10-26 20:10 ` [PATCH v4 07/11] NFS: inter ssc open Olga Kornievskaia
@ 2018-10-26 20:10 ` Olga Kornievskaia
  2018-10-26 20:10 ` [PATCH v4 09/11] NFS: for "inter" copy treat ESTALE as ENOTSUPP Olga Kornievskaia
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 43+ messages in thread
From: Olga Kornievskaia @ 2018-10-26 20:10 UTC (permalink / raw)
  To: trond.myklebust, anna.schumaker, viro, smfrench, miklos
  Cc: linux-nfs, linux-fsdevel, linux-cifs, linux-unionfs, linux-man

From: Olga Kornievskaia <kolga@netapp.com>

Mark the open created for the source file on the destination
server. Then if this open is going thru a recovery, then fail
the recovery as we don't need to be recoving a "fake" open.
We need to fail the ongoing READs and vfs_copy_file_range().

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
---
 fs/nfs/nfs4_fs.h   |  1 +
 fs/nfs/nfs4file.c  |  1 +
 fs/nfs/nfs4state.c | 14 ++++++++++++++
 3 files changed, 16 insertions(+)

diff --git a/fs/nfs/nfs4_fs.h b/fs/nfs/nfs4_fs.h
index 9c566a4..482754d 100644
--- a/fs/nfs/nfs4_fs.h
+++ b/fs/nfs/nfs4_fs.h
@@ -165,6 +165,7 @@ enum {
 	NFS_STATE_CHANGE_WAIT,		/* A state changing operation is outstanding */
 #ifdef CONFIG_NFS_V4_2
 	NFS_CLNT_DST_SSC_COPY_STATE,    /* dst server open state on client*/
+	NFS_SRV_SSC_COPY_STATE,		/* ssc state on the dst server */
 #endif /* CONFIG_NFS_V4_2 */
 };
 
diff --git a/fs/nfs/nfs4file.c b/fs/nfs/nfs4file.c
index 19b1b3f..2f31f30 100644
--- a/fs/nfs/nfs4file.c
+++ b/fs/nfs/nfs4file.c
@@ -330,6 +330,7 @@ struct file *
 	if (ctx->state == NULL)
 		goto out_stateowner;
 
+	set_bit(NFS_SRV_SSC_COPY_STATE, &ctx->state->flags);
 	set_bit(NFS_OPEN_STATE, &ctx->state->flags);
 	memcpy(&ctx->state->open_stateid.other, &stateid->other,
 	       NFS4_STATEID_OTHER_SIZE);
diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index 62ae0fd..b0b82c6 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -1606,6 +1606,9 @@ static int nfs4_reclaim_open_state(struct nfs4_state_owner *sp, const struct nfs
 {
 	struct nfs4_state *state;
 	int status = 0;
+#ifdef CONFIG_NFS_V4_2
+	bool found_ssc_copy_state = false;
+#endif /* CONFIG_NFS_V4_2 */
 
 	/* Note: we rely on the sp->so_states list being ordered 
 	 * so that we always reclaim open(O_RDWR) and/or open(O_WRITE)
@@ -1625,6 +1628,13 @@ static int nfs4_reclaim_open_state(struct nfs4_state_owner *sp, const struct nfs
 			continue;
 		if (state->state == 0)
 			continue;
+#ifdef CONFIG_NFS_V4_2
+		if (test_bit(NFS_SRV_SSC_COPY_STATE, &state->flags)) {
+			nfs4_state_mark_recovery_failed(state, -EIO);
+			found_ssc_copy_state = true;
+			continue;
+		}
+#endif /* CONFIG_NFS_V4_2 */
 		refcount_inc(&state->count);
 		spin_unlock(&sp->so_lock);
 		status = __nfs4_reclaim_open_state(sp, state, ops);
@@ -1671,6 +1681,10 @@ static int nfs4_reclaim_open_state(struct nfs4_state_owner *sp, const struct nfs
 	}
 	raw_write_seqcount_end(&sp->so_reclaim_seqcount);
 	spin_unlock(&sp->so_lock);
+#ifdef CONFIG_NFS_V4_2
+	if (found_ssc_copy_state)
+		return -EIO;
+#endif /* CONFIG_NFS_V4_2 */
 	return 0;
 out_err:
 	nfs4_put_open_state(state);
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH v4 09/11] NFS: for "inter" copy treat ESTALE as ENOTSUPP
  2018-10-26 20:10 [PATCH v4 00/11] client-side support for "inter" SSC copy Olga Kornievskaia
                   ` (8 preceding siblings ...)
  2018-10-26 20:10 ` [PATCH v4 08/11] NFS: skip recovery of copy open on dest server Olga Kornievskaia
@ 2018-10-26 20:10 ` Olga Kornievskaia
  2018-10-26 20:10 ` [PATCH v4 10/11] NFS: COPY handle ERR_OFFLOAD_DENIED Olga Kornievskaia
  2018-10-26 20:10 ` [PATCH v4 11/11] NFS: replace cross device check in copy_file_range Olga Kornievskaia
  11 siblings, 0 replies; 43+ messages in thread
From: Olga Kornievskaia @ 2018-10-26 20:10 UTC (permalink / raw)
  To: trond.myklebust, anna.schumaker, viro, smfrench, miklos
  Cc: linux-nfs, linux-fsdevel, linux-cifs, linux-unionfs, linux-man

From: Olga Kornievskaia <kolga@netapp.com>

If the client sends an "inter" copy to the destination server but
it only supports "intra" copy, it can return ESTALE (since it
doesn't know anything about the file handle from the other server
and does not recognize the special case of "inter" copy). Translate
this error as ENOTSUPP and also send OFFLOAD_CANCEL to the source
server so that it can clean up state.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
---
 fs/nfs/nfs42proc.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/fs/nfs/nfs42proc.c b/fs/nfs/nfs42proc.c
index 98fe00b..00809b3 100644
--- a/fs/nfs/nfs42proc.c
+++ b/fs/nfs/nfs42proc.c
@@ -395,6 +395,11 @@ ssize_t nfs42_proc_copy(struct file *src, loff_t pos_src,
 			args.sync = true;
 			dst_exception.retry = 1;
 			continue;
+		} else if (err == -ESTALE &&
+				!nfs42_files_from_same_server(src, dst)) {
+			nfs42_do_offload_cancel_async(src, &args.src_stateid);
+			err = -EOPNOTSUPP;
+			break;
 		}
 
 		err2 = nfs4_handle_exception(server, err, &src_exception);
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH v4 10/11] NFS: COPY handle ERR_OFFLOAD_DENIED
  2018-10-26 20:10 [PATCH v4 00/11] client-side support for "inter" SSC copy Olga Kornievskaia
                   ` (9 preceding siblings ...)
  2018-10-26 20:10 ` [PATCH v4 09/11] NFS: for "inter" copy treat ESTALE as ENOTSUPP Olga Kornievskaia
@ 2018-10-26 20:10 ` Olga Kornievskaia
  2018-10-26 20:10 ` [PATCH v4 11/11] NFS: replace cross device check in copy_file_range Olga Kornievskaia
  11 siblings, 0 replies; 43+ messages in thread
From: Olga Kornievskaia @ 2018-10-26 20:10 UTC (permalink / raw)
  To: trond.myklebust, anna.schumaker, viro, smfrench, miklos
  Cc: linux-nfs, linux-fsdevel, linux-cifs, linux-unionfs, linux-man

From: Olga Kornievskaia <kolga@netapp.com>

If server sends ERR_OFFLOAD_DENIED error, the client must fall
back on doing copy the normal way. Return ENOTSUPP to the vfs and
fallback to regular copy.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
---
 fs/nfs/nfs42proc.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/nfs/nfs42proc.c b/fs/nfs/nfs42proc.c
index 00809b3..c7c2ffa 100644
--- a/fs/nfs/nfs42proc.c
+++ b/fs/nfs/nfs42proc.c
@@ -395,7 +395,8 @@ ssize_t nfs42_proc_copy(struct file *src, loff_t pos_src,
 			args.sync = true;
 			dst_exception.retry = 1;
 			continue;
-		} else if (err == -ESTALE &&
+		} else if ((err == -ESTALE ||
+				err == -NFS4ERR_OFFLOAD_DENIED) &&
 				!nfs42_files_from_same_server(src, dst)) {
 			nfs42_do_offload_cancel_async(src, &args.src_stateid);
 			err = -EOPNOTSUPP;
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH v4 11/11] NFS: replace cross device check in copy_file_range
  2018-10-26 20:10 [PATCH v4 00/11] client-side support for "inter" SSC copy Olga Kornievskaia
                   ` (10 preceding siblings ...)
  2018-10-26 20:10 ` [PATCH v4 10/11] NFS: COPY handle ERR_OFFLOAD_DENIED Olga Kornievskaia
@ 2018-10-26 20:10 ` Olga Kornievskaia
  2018-10-26 21:22   ` Matthew Wilcox
  2018-10-27 11:08   ` Jeff Layton
  11 siblings, 2 replies; 43+ messages in thread
From: Olga Kornievskaia @ 2018-10-26 20:10 UTC (permalink / raw)
  To: trond.myklebust, anna.schumaker, viro, smfrench, miklos
  Cc: linux-nfs, linux-fsdevel, linux-cifs, linux-unionfs, linux-man

From: Olga Kornievskaia <kolga@netapp.com>

Add a check to disallow cross file systems copy offload, both
files are expected to be of NFS4.2+ type.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
---
 fs/nfs/nfs4file.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/fs/nfs/nfs4file.c b/fs/nfs/nfs4file.c
index 2f31f30..344e9dd 100644
--- a/fs/nfs/nfs4file.c
+++ b/fs/nfs/nfs4file.c
@@ -139,8 +139,14 @@ static ssize_t nfs4_copy_file_range(struct file *file_in, loff_t pos_in,
 	nfs4_stateid *cnrs = NULL;
 	ssize_t ret;
 
-	if (file_in->f_inode->i_sb != file_out->f_inode->i_sb)
+	if (file_in->f_op != &nfs4_file_operations)
 		return -EXDEV;
+	else {
+		struct nfs_client *c_in =
+			(NFS_SERVER(file_inode(file_in)))->nfs_client;
+		if (c_in->cl_minorversion < 2)
+			return -EXDEV;
+	}
 
 	if (file_inode(file_in) == file_inode(file_out))
 		return -EINVAL;
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* Re: [PATCH v4 11/11] NFS: replace cross device check in copy_file_range
  2018-10-26 20:10 ` [PATCH v4 11/11] NFS: replace cross device check in copy_file_range Olga Kornievskaia
@ 2018-10-26 21:22   ` Matthew Wilcox
  2018-10-27 11:08   ` Jeff Layton
  1 sibling, 0 replies; 43+ messages in thread
From: Matthew Wilcox @ 2018-10-26 21:22 UTC (permalink / raw)
  To: Olga Kornievskaia
  Cc: trond.myklebust, anna.schumaker, viro, smfrench, miklos,
	linux-nfs, linux-fsdevel, linux-cifs, linux-unionfs, linux-man

On Fri, Oct 26, 2018 at 04:10:57PM -0400, Olga Kornievskaia wrote:
> From: Olga Kornievskaia <kolga@netapp.com>
> 
> Add a check to disallow cross file systems copy offload, both
> files are expected to be of NFS4.2+ type.
> 
> Signed-off-by: Olga Kornievskaia <kolga@netapp.com>

Reviewed-by: Matthew Wilcox <willy@infradead.org>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v4 01/11] VFS: move cross device copy_file_range() check into filesystems
  2018-10-26 20:10 ` [PATCH v4 01/11] VFS: move cross device copy_file_range() check into filesystems Olga Kornievskaia
@ 2018-10-26 21:23   ` Matthew Wilcox
  2018-10-26 22:10   ` Steve French
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 43+ messages in thread
From: Matthew Wilcox @ 2018-10-26 21:23 UTC (permalink / raw)
  To: Olga Kornievskaia
  Cc: trond.myklebust, anna.schumaker, viro, smfrench, miklos,
	linux-nfs, linux-fsdevel, linux-cifs, linux-unionfs, linux-man

On Fri, Oct 26, 2018 at 04:10:46PM -0400, Olga Kornievskaia wrote:
> From: Olga Kornievskaia <kolga@netapp.com>
> 
> This patch makes it the responsibility of individual filesystems to
> allow or deny cross device copies.  Both NFS and CIFS have operations
> for cross-server copies, and later patches will implement this feature.
> 
> Note that as of this patch, the copy_file_range() function might be passed
> superblocks from different filesystem types. -EXDEV should be returned
> if cross device copies aren't supported, causing the VFS to fall back
> on using do_splice_direct().
> 
> Signed-off-by: Olga Kornievskaia <kolga@netapp.com>

Reviewed-by: Matthew Wilcox <willy@infradead.org>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v4 02/11] VFS: copy_file_range check validity of input source offset
  2018-10-26 20:10 ` [PATCH v4 02/11] VFS: copy_file_range check validity of input source offset Olga Kornievskaia
@ 2018-10-26 21:26   ` Matthew Wilcox
  2018-10-29 16:09     ` Olga Kornievskaia
  2018-10-27  9:27   ` Dave Chinner
  1 sibling, 1 reply; 43+ messages in thread
From: Matthew Wilcox @ 2018-10-26 21:26 UTC (permalink / raw)
  To: Olga Kornievskaia
  Cc: trond.myklebust, anna.schumaker, viro, smfrench, miklos,
	linux-nfs, linux-fsdevel, linux-cifs, linux-unionfs, linux-man

On Fri, Oct 26, 2018 at 04:10:48PM -0400, Olga Kornievskaia wrote:
> +++ b/fs/read_write.c
> @@ -1594,6 +1594,9 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
>  		}
>  	}
>  
> +	if (pos_in >= i_size_read(inode_in))
> +		return -EINVAL;
> +
>  	if (file_out->f_op->copy_file_range) {
>  		ret = file_out->f_op->copy_file_range(file_in, pos_in, file_out,
>  						      pos_out, len, flags);

Is this the right place to check this?  Surely we should be checking for
this before calling clone_file_range()?

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v4 01/11] VFS: move cross device copy_file_range() check into filesystems
  2018-10-26 20:10 ` [PATCH v4 01/11] VFS: move cross device copy_file_range() check into filesystems Olga Kornievskaia
  2018-10-26 21:23   ` Matthew Wilcox
@ 2018-10-26 22:10   ` Steve French
  2018-10-27  9:09   ` Dave Chinner
  2018-10-27 11:11   ` Jeff Layton
  3 siblings, 0 replies; 43+ messages in thread
From: Steve French @ 2018-10-26 22:10 UTC (permalink / raw)
  To: Olga Kornievskaia
  Cc: trond.myklebust, Anna Schumaker, Al Viro, Miklos Szeredi,
	linux-nfs, linux-fsdevel, CIFS, linux-unionfs, linux-man

Looks fine to me - will relax the semantics in followon patch for
cifs.ko (to allow for cases where we can support it) as soon as this
is in.

Reviewed-by: Steve French <stfrench@microsoft.com>
On Fri, Oct 26, 2018 at 3:11 PM Olga Kornievskaia
<olga.kornievskaia@gmail.com> wrote:
>
> From: Olga Kornievskaia <kolga@netapp.com>
>
> This patch makes it the responsibility of individual filesystems to
> allow or deny cross device copies.  Both NFS and CIFS have operations
> for cross-server copies, and later patches will implement this feature.
>
> Note that as of this patch, the copy_file_range() function might be passed
> superblocks from different filesystem types. -EXDEV should be returned
> if cross device copies aren't supported, causing the VFS to fall back
> on using do_splice_direct().
>
> Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> ---
>  Documentation/filesystems/porting | 7 +++++++
>  fs/cifs/cifsfs.c                  | 3 +++
>  fs/nfs/nfs4file.c                 | 3 +++
>  fs/overlayfs/file.c               | 3 +++
>  fs/read_write.c                   | 9 +++------
>  5 files changed, 19 insertions(+), 6 deletions(-)
>
> diff --git a/Documentation/filesystems/porting b/Documentation/filesystems/porting
> index 7b7b845..897e1e7 100644
> --- a/Documentation/filesystems/porting
> +++ b/Documentation/filesystems/porting
> @@ -622,3 +622,10 @@ in your dentry operations instead.
>         alloc_file_clone(file, flags, ops) does not affect any caller's references.
>         On success you get a new struct file sharing the mount/dentry with the
>         original, on failure - ERR_PTR().
> +--
> +[mandatory]
> +       ->copy_file_range() may now be passed files which belong to two
> +       different superblocks of the same file system type or which belong
> +       to two different filesystems types all together. As before, the
> +       destination's copy_file_range() is the function which is called.
> +       If it cannot copy ranges from the source, it should return -EXDEV.
> diff --git a/fs/cifs/cifsfs.c b/fs/cifs/cifsfs.c
> index 7065426..f2d7f4f 100644
> --- a/fs/cifs/cifsfs.c
> +++ b/fs/cifs/cifsfs.c
> @@ -1114,6 +1114,9 @@ static ssize_t cifs_copy_file_range(struct file *src_file, loff_t off,
>         unsigned int xid = get_xid();
>         ssize_t rc;
>
> +       if (src_file->f_inode->i_sb != dst_file->f_inode->i_sb)
> +               return -EXDEV;
> +
>         rc = cifs_file_copychunk_range(xid, src_file, off, dst_file, destoff,
>                                         len, flags);
>         free_xid(xid);
> diff --git a/fs/nfs/nfs4file.c b/fs/nfs/nfs4file.c
> index 4288a6e..09df688 100644
> --- a/fs/nfs/nfs4file.c
> +++ b/fs/nfs/nfs4file.c
> @@ -135,6 +135,9 @@ static ssize_t nfs4_copy_file_range(struct file *file_in, loff_t pos_in,
>  {
>         ssize_t ret;
>
> +       if (file_in->f_inode->i_sb != file_out->f_inode->i_sb)
> +               return -EXDEV;
> +
>         if (file_inode(file_in) == file_inode(file_out))
>                 return -EINVAL;
>  retry:
> diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
> index aeaefd2..5282853 100644
> --- a/fs/overlayfs/file.c
> +++ b/fs/overlayfs/file.c
> @@ -483,6 +483,9 @@ static ssize_t ovl_copy_file_range(struct file *file_in, loff_t pos_in,
>                                    struct file *file_out, loff_t pos_out,
>                                    size_t len, unsigned int flags)
>  {
> +       if (file_in->f_inode->i_sb != file_out->f_inode->i_sb)
> +               return -EXDEV;
> +
>         return ovl_copyfile(file_in, pos_in, file_out, pos_out, len, flags,
>                             OVL_COPY);
>  }
> diff --git a/fs/read_write.c b/fs/read_write.c
> index 39b4a21..fb4ffca 100644
> --- a/fs/read_write.c
> +++ b/fs/read_write.c
> @@ -1575,10 +1575,6 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
>             (file_out->f_flags & O_APPEND))
>                 return -EBADF;
>
> -       /* this could be relaxed once a method supports cross-fs copies */
> -       if (inode_in->i_sb != inode_out->i_sb)
> -               return -EXDEV;
> -
>         if (len == 0)
>                 return 0;
>
> @@ -1588,7 +1584,8 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
>          * Try cloning first, this is supported by more file systems, and
>          * more efficient if both clone and copy are supported (e.g. NFS).
>          */
> -       if (file_in->f_op->clone_file_range) {
> +       if (inode_in->i_sb == inode_out->i_sb &&
> +                       file_in->f_op->clone_file_range) {
>                 ret = file_in->f_op->clone_file_range(file_in, pos_in,
>                                 file_out, pos_out, len);
>                 if (ret == 0) {
> @@ -1600,7 +1597,7 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
>         if (file_out->f_op->copy_file_range) {
>                 ret = file_out->f_op->copy_file_range(file_in, pos_in, file_out,
>                                                       pos_out, len, flags);
> -               if (ret != -EOPNOTSUPP)
> +               if (ret != -EOPNOTSUPP && ret != -EXDEV)
>                         goto done;
>         }
>
> --
> 1.8.3.1
>


-- 
Thanks,

Steve

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v4 01/11] VFS: move cross device copy_file_range() check into filesystems
  2018-10-26 20:10 ` [PATCH v4 01/11] VFS: move cross device copy_file_range() check into filesystems Olga Kornievskaia
  2018-10-26 21:23   ` Matthew Wilcox
  2018-10-26 22:10   ` Steve French
@ 2018-10-27  9:09   ` Dave Chinner
  2018-10-29 14:31     ` Olga Kornievskaia
  2018-10-27 11:11   ` Jeff Layton
  3 siblings, 1 reply; 43+ messages in thread
From: Dave Chinner @ 2018-10-27  9:09 UTC (permalink / raw)
  To: Olga Kornievskaia
  Cc: trond.myklebust, anna.schumaker, viro, smfrench, miklos,
	linux-nfs, linux-fsdevel, linux-cifs, linux-unionfs, linux-man

On Fri, Oct 26, 2018 at 04:10:46PM -0400, Olga Kornievskaia wrote:
> From: Olga Kornievskaia <kolga@netapp.com>
> 
> This patch makes it the responsibility of individual filesystems to
> allow or deny cross device copies.  Both NFS and CIFS have operations
> for cross-server copies, and later patches will implement this feature.
> 
> Note that as of this patch, the copy_file_range() function might be passed
> superblocks from different filesystem types. -EXDEV should be returned
> if cross device copies aren't supported, causing the VFS to fall back
> on using do_splice_direct().
> 
> Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> ---
>  Documentation/filesystems/porting | 7 +++++++
>  fs/cifs/cifsfs.c                  | 3 +++
>  fs/nfs/nfs4file.c                 | 3 +++
>  fs/overlayfs/file.c               | 3 +++
>  fs/read_write.c                   | 9 +++------
>  5 files changed, 19 insertions(+), 6 deletions(-)
> 
> diff --git a/Documentation/filesystems/porting b/Documentation/filesystems/porting
> index 7b7b845..897e1e7 100644
> --- a/Documentation/filesystems/porting
> +++ b/Documentation/filesystems/porting
> @@ -622,3 +622,10 @@ in your dentry operations instead.
>  	alloc_file_clone(file, flags, ops) does not affect any caller's references.
>  	On success you get a new struct file sharing the mount/dentry with the
>  	original, on failure - ERR_PTR().
> +--
> +[mandatory]
> +	->copy_file_range() may now be passed files which belong to two
> +	different superblocks of the same file system type or which belong
> +	to two different filesystems types all together. As before, the
> +	destination's copy_file_range() is the function which is called.
> +	If it cannot copy ranges from the source, it should return -EXDEV.
> diff --git a/fs/cifs/cifsfs.c b/fs/cifs/cifsfs.c
> index 7065426..f2d7f4f 100644
> --- a/fs/cifs/cifsfs.c
> +++ b/fs/cifs/cifsfs.c
> @@ -1114,6 +1114,9 @@ static ssize_t cifs_copy_file_range(struct file *src_file, loff_t off,
>  	unsigned int xid = get_xid();
>  	ssize_t rc;
>  
> +	if (src_file->f_inode->i_sb != dst_file->f_inode->i_sb)

file_inode(file)->i_sb, please.

> +		return -EXDEV;
> +
>  	rc = cifs_file_copychunk_range(xid, src_file, off, dst_file, destoff,
>  					len, flags);
>  	free_xid(xid);
> diff --git a/fs/nfs/nfs4file.c b/fs/nfs/nfs4file.c
> index 4288a6e..09df688 100644
> --- a/fs/nfs/nfs4file.c
> +++ b/fs/nfs/nfs4file.c
> @@ -135,6 +135,9 @@ static ssize_t nfs4_copy_file_range(struct file *file_in, loff_t pos_in,
>  {
>  	ssize_t ret;
>  
> +	if (file_in->f_inode->i_sb != file_out->f_inode->i_sb)
> +		return -EXDEV;
> +
>  	if (file_inode(file_in) == file_inode(file_out))
>  		return -EINVAL;

Please look at the code around your modifications and make sure your
additions are consistent with it.

> @@ -1588,7 +1584,8 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
>  	 * Try cloning first, this is supported by more file systems, and
>  	 * more efficient if both clone and copy are supported (e.g. NFS).
>  	 */
> -	if (file_in->f_op->clone_file_range) {
> +	if (inode_in->i_sb == inode_out->i_sb &&
> +			file_in->f_op->clone_file_range) {

	if (inode_in->i_sb == inode_out->i_sb &&
	    file_in->f_op->clone_file_range) {

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 1/1] man-page: copy_file_range(2) allow for cross-device copies
  2018-10-26 20:10 ` [PATCH 1/1] man-page: copy_file_range(2) allow for cross-device copies Olga Kornievskaia
@ 2018-10-27  9:12   ` Dave Chinner
  2018-10-27 13:23     ` Matthew Wilcox
  0 siblings, 1 reply; 43+ messages in thread
From: Dave Chinner @ 2018-10-27  9:12 UTC (permalink / raw)
  To: Olga Kornievskaia
  Cc: trond.myklebust, anna.schumaker, viro, smfrench, miklos,
	linux-nfs, linux-fsdevel, linux-cifs, linux-unionfs, linux-man

On Fri, Oct 26, 2018 at 04:10:47PM -0400, Olga Kornievskaia wrote:
> From: Olga Kornievskaia <kolga@netapp.com>
> 
> A proposed VFS change removes the check for the files to reside
> under the same file system. Instead, a file system driver implementation
> is allowed to perform a cross-device copy_file_range() and if
> the file system fails to support it instead fallback to doing
> a do_splice copy. Therefore, EXDEV error code only applies to
> kernel version prior to such support.
> 
> Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> ---
>  man2/copy_file_range.2 | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/man2/copy_file_range.2 b/man2/copy_file_range.2
> index 20374ab..88b40bb 100644
> --- a/man2/copy_file_range.2
> +++ b/man2/copy_file_range.2
> @@ -39,7 +39,8 @@ The
>  .BR copy_file_range ()
>  system call performs an in-kernel copy between two file descriptors
>  without the additional cost of transferring data from the kernel to user space
> -and then back into the kernel.
> +and then back into the kernel. Starting kernel version 4.21 passed in
> +file descriptors are not required to be under the same mounted file system.
>  It copies up to
>  .I len
>  bytes of data from file descriptor
> @@ -131,7 +132,8 @@ There is not enough space on the target filesystem to complete the copy.
>  .B EXDEV
>  The files referred to by
>  .IR file_in " and " file_out
> -are not on the same mounted filesystem.
> +are not on the same mounted filesystem when the kernel does not support
> +cross device file copy.

Kernel can support cross device file copy, the filesystem may not.

EXDEV
	One of the files specified by file_in and file_out are on a
	filesystem that does not support cross device copies.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v4 02/11] VFS: copy_file_range check validity of input source offset
  2018-10-26 20:10 ` [PATCH v4 02/11] VFS: copy_file_range check validity of input source offset Olga Kornievskaia
  2018-10-26 21:26   ` Matthew Wilcox
@ 2018-10-27  9:27   ` Dave Chinner
  2018-10-29 14:41     ` Olga Kornievskaia
  1 sibling, 1 reply; 43+ messages in thread
From: Dave Chinner @ 2018-10-27  9:27 UTC (permalink / raw)
  To: Olga Kornievskaia
  Cc: trond.myklebust, anna.schumaker, viro, smfrench, miklos,
	linux-nfs, linux-fsdevel, linux-cifs, linux-unionfs, linux-man

On Fri, Oct 26, 2018 at 04:10:48PM -0400, Olga Kornievskaia wrote:
> From: Olga Kornievskaia <kolga@netapp.com>
> 
> Input source offset can't be beyond the end of the file.
> 
> Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> ---
>  fs/read_write.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/fs/read_write.c b/fs/read_write.c
> index fb4ffca..b3b304e 100644
> --- a/fs/read_write.c
> +++ b/fs/read_write.c
> @@ -1594,6 +1594,9 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
>  		}
>  	}
>  
> +	if (pos_in >= i_size_read(inode_in))
> +		return -EINVAL;
> +

vfs_copy_file_range seems ot be missing a wide range of checks.
rlimit, s_maxbytes, LFS file sizes, etc. This is a write, so all the
checks in generic_write_checks() apply, right? And the same security
issues like stripping setuid bits, etc? And we need to touch
atime on the source file, too?

We've just merged 5 or so patches in 4.19-rc8 and we're ready to
merge another ~30 patch series to fix all the stuff missing from the
clone/dedupe file range operations that make them safe and robust.
It seems like copy_file_range is all the checks it needs, too?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v4 11/11] NFS: replace cross device check in copy_file_range
  2018-10-26 20:10 ` [PATCH v4 11/11] NFS: replace cross device check in copy_file_range Olga Kornievskaia
  2018-10-26 21:22   ` Matthew Wilcox
@ 2018-10-27 11:08   ` Jeff Layton
  2018-10-27 13:26     ` Matthew Wilcox
  1 sibling, 1 reply; 43+ messages in thread
From: Jeff Layton @ 2018-10-27 11:08 UTC (permalink / raw)
  To: Olga Kornievskaia, trond.myklebust, anna.schumaker, viro,
	smfrench, miklos
  Cc: linux-nfs, linux-fsdevel, linux-cifs, linux-unionfs, linux-man

On Fri, 2018-10-26 at 16:10 -0400, Olga Kornievskaia wrote:
> From: Olga Kornievskaia <kolga@netapp.com>
> 
> Add a check to disallow cross file systems copy offload, both
> files are expected to be of NFS4.2+ type.
> 
> Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> ---
>  fs/nfs/nfs4file.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/nfs/nfs4file.c b/fs/nfs/nfs4file.c
> index 2f31f30..344e9dd 100644
> --- a/fs/nfs/nfs4file.c
> +++ b/fs/nfs/nfs4file.c
> @@ -139,8 +139,14 @@ static ssize_t nfs4_copy_file_range(struct file *file_in, loff_t pos_in,
>  	nfs4_stateid *cnrs = NULL;
>  	ssize_t ret;
>  
> -	if (file_in->f_inode->i_sb != file_out->f_inode->i_sb)
> +	if (file_in->f_op != &nfs4_file_operations)
>  		return -EXDEV;
> +	else {

nit: you don't really need the "else" here since the previous block
returns

> +		struct nfs_client *c_in =
> +			(NFS_SERVER(file_inode(file_in)))->nfs_client;
> +		if (c_in->cl_minorversion < 2)
> +			return -EXDEV;
> +	}
>  
>  	if (file_inode(file_in) == file_inode(file_out))
>  		return -EINVAL;

Looks fine though.

Reviewed-by: Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v4 01/11] VFS: move cross device copy_file_range() check into filesystems
  2018-10-26 20:10 ` [PATCH v4 01/11] VFS: move cross device copy_file_range() check into filesystems Olga Kornievskaia
                     ` (2 preceding siblings ...)
  2018-10-27  9:09   ` Dave Chinner
@ 2018-10-27 11:11   ` Jeff Layton
  3 siblings, 0 replies; 43+ messages in thread
From: Jeff Layton @ 2018-10-27 11:11 UTC (permalink / raw)
  To: Olga Kornievskaia, trond.myklebust, anna.schumaker, viro,
	smfrench, miklos
  Cc: linux-nfs, linux-fsdevel, linux-cifs, linux-unionfs, linux-man

On Fri, 2018-10-26 at 16:10 -0400, Olga Kornievskaia wrote:
> From: Olga Kornievskaia <kolga@netapp.com>
> 
> This patch makes it the responsibility of individual filesystems to
> allow or deny cross device copies.  Both NFS and CIFS have operations
> for cross-server copies, and later patches will implement this feature.
> 
> Note that as of this patch, the copy_file_range() function might be passed
> superblocks from different filesystem types. -EXDEV should be returned
> if cross device copies aren't supported, causing the VFS to fall back
> on using do_splice_direct().
> 
> Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> ---
>  Documentation/filesystems/porting | 7 +++++++
>  fs/cifs/cifsfs.c                  | 3 +++
>  fs/nfs/nfs4file.c                 | 3 +++
>  fs/overlayfs/file.c               | 3 +++
>  fs/read_write.c                   | 9 +++------
>  5 files changed, 19 insertions(+), 6 deletions(-)
> 
> diff --git a/Documentation/filesystems/porting b/Documentation/filesystems/porting
> index 7b7b845..897e1e7 100644
> --- a/Documentation/filesystems/porting
> +++ b/Documentation/filesystems/porting
> @@ -622,3 +622,10 @@ in your dentry operations instead.
>  	alloc_file_clone(file, flags, ops) does not affect any caller's references.
>  	On success you get a new struct file sharing the mount/dentry with the
>  	original, on failure - ERR_PTR().
> +--
> +[mandatory]
> +	->copy_file_range() may now be passed files which belong to two
> +	different superblocks of the same file system type or which belong
> +	to two different filesystems types all together. As before, the
> +	destination's copy_file_range() is the function which is called.
> +	If it cannot copy ranges from the source, it should return -EXDEV.
> diff --git a/fs/cifs/cifsfs.c b/fs/cifs/cifsfs.c
> index 7065426..f2d7f4f 100644
> --- a/fs/cifs/cifsfs.c
> +++ b/fs/cifs/cifsfs.c
> @@ -1114,6 +1114,9 @@ static ssize_t cifs_copy_file_range(struct file *src_file, loff_t off,
>  	unsigned int xid = get_xid();
>  	ssize_t rc;
>  
> +	if (src_file->f_inode->i_sb != dst_file->f_inode->i_sb)
> +		return -EXDEV;
> +
>  	rc = cifs_file_copychunk_range(xid, src_file, off, dst_file, destoff,
>  					len, flags);
>  	free_xid(xid);
> diff --git a/fs/nfs/nfs4file.c b/fs/nfs/nfs4file.c
> index 4288a6e..09df688 100644
> --- a/fs/nfs/nfs4file.c
> +++ b/fs/nfs/nfs4file.c
> @@ -135,6 +135,9 @@ static ssize_t nfs4_copy_file_range(struct file *file_in, loff_t pos_in,
>  {
>  	ssize_t ret;
>  
> +	if (file_in->f_inode->i_sb != file_out->f_inode->i_sb)
> +		return -EXDEV;
> +
>  	if (file_inode(file_in) == file_inode(file_out))
>  		return -EINVAL;
>  retry:
> diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
> index aeaefd2..5282853 100644
> --- a/fs/overlayfs/file.c
> +++ b/fs/overlayfs/file.c
> @@ -483,6 +483,9 @@ static ssize_t ovl_copy_file_range(struct file *file_in, loff_t pos_in,
>  				   struct file *file_out, loff_t pos_out,
>  				   size_t len, unsigned int flags)
>  {
> +	if (file_in->f_inode->i_sb != file_out->f_inode->i_sb)
> +		return -EXDEV;
> +
>  	return ovl_copyfile(file_in, pos_in, file_out, pos_out, len, flags,
>  			    OVL_COPY);
>  }
> diff --git a/fs/read_write.c b/fs/read_write.c
> index 39b4a21..fb4ffca 100644
> --- a/fs/read_write.c
> +++ b/fs/read_write.c
> @@ -1575,10 +1575,6 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
>  	    (file_out->f_flags & O_APPEND))
>  		return -EBADF;
>  
> -	/* this could be relaxed once a method supports cross-fs copies */
> -	if (inode_in->i_sb != inode_out->i_sb)
> -		return -EXDEV;
> -
>  	if (len == 0)
>  		return 0;
>  
> @@ -1588,7 +1584,8 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
>  	 * Try cloning first, this is supported by more file systems, and
>  	 * more efficient if both clone and copy are supported (e.g. NFS).
>  	 */
> -	if (file_in->f_op->clone_file_range) {
> +	if (inode_in->i_sb == inode_out->i_sb &&
> +			file_in->f_op->clone_file_range) {
>  		ret = file_in->f_op->clone_file_range(file_in, pos_in,
>  				file_out, pos_out, len);
>  		if (ret == 0) {
> @@ -1600,7 +1597,7 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
>  	if (file_out->f_op->copy_file_range) {
>  		ret = file_out->f_op->copy_file_range(file_in, pos_in, file_out,
>  						      pos_out, len, flags);
> -		if (ret != -EOPNOTSUPP)
> +		if (ret != -EOPNOTSUPP && ret != -EXDEV)
>  			goto done;
>  	}
>  

Yes, this is the right way to do this sort of change, IMO. Push the
checks down into the fs' so they can be relaxed on a case-by-case basis.

Modulo dchinner's comments, this looks fine.

Reviewed-by: Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v4 03/11] NFS: NFSD defining nl4_servers structure needed by both
  2018-10-26 20:10 ` [PATCH v4 03/11] NFS: NFSD defining nl4_servers structure needed by both Olga Kornievskaia
@ 2018-10-27 11:14   ` Jeff Layton
  2018-10-29 14:28     ` Olga Kornievskaia
  0 siblings, 1 reply; 43+ messages in thread
From: Jeff Layton @ 2018-10-27 11:14 UTC (permalink / raw)
  To: Olga Kornievskaia, trond.myklebust, anna.schumaker, viro,
	smfrench, miklos
  Cc: linux-nfs, linux-fsdevel, linux-cifs, linux-unionfs, linux-man

On Fri, 2018-10-26 at 16:10 -0400, Olga Kornievskaia wrote:
> From: Olga Kornievskaia <kolga@netapp.com>
> 
> These structures are needed by COPY_NOTIFY on the client and needed
> by the nfsd as well
> 
> Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> ---
>  include/linux/nfs4.h | 24 ++++++++++++++++++++++++
>  1 file changed, 24 insertions(+)
> 
> diff --git a/include/linux/nfs4.h b/include/linux/nfs4.h
> index 1b06f0b..4d76f87 100644
> --- a/include/linux/nfs4.h
> +++ b/include/linux/nfs4.h
> @@ -16,6 +16,7 @@
>  #include <linux/list.h>
>  #include <linux/uidgid.h>
>  #include <uapi/linux/nfs4.h>
> +#include <linux/sunrpc/msg_prot.h>
>  
>  enum nfs4_acl_whotype {
>  	NFS4_ACL_WHO_NAMED = 0,
> @@ -672,4 +673,27 @@ struct nfs4_op_map {
>  	} u;
>  };
>  
> +struct nfs42_netaddr {
> +	char		netid[RPCBIND_MAXNETIDLEN];
> +	char		addr[RPCBIND_MAXUADDRLEN + 1];
> +	u32		netid_len;
> +	u32 addr_len;

Could you fix the indentation above?

> +};
> +
> +enum netloc_type4 {
> +	NL4_NAME		= 1,
> +	NL4_URL			= 2,
> +	NL4_NETADDR		= 3,
> +};
> +
> +struct nl4_server {
> +	enum netloc_type4	nl4_type;
> +	union {
> +		struct { /* NL4_NAME, NL4_URL */
> +			int	nl4_str_sz;
> +			char	nl4_str[NFS4_OPAQUE_LIMIT + 1];
> +		};
> +		struct nfs42_netaddr	nl4_addr; /* NL4_NETADDR */
> +	} u;
> +};
>  #endif

Otherwise, looks fine.

Reviewed-by: Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 1/1] man-page: copy_file_range(2) allow for cross-device copies
  2018-10-27  9:12   ` Dave Chinner
@ 2018-10-27 13:23     ` Matthew Wilcox
  2018-10-28  1:33       ` Dave Chinner
  0 siblings, 1 reply; 43+ messages in thread
From: Matthew Wilcox @ 2018-10-27 13:23 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Olga Kornievskaia, trond.myklebust, anna.schumaker, viro,
	smfrench, miklos, linux-nfs, linux-fsdevel, linux-cifs,
	linux-unionfs, linux-man

On Sat, Oct 27, 2018 at 08:12:40PM +1100, Dave Chinner wrote:
> > @@ -131,7 +132,8 @@ There is not enough space on the target filesystem to complete the copy.
> >  .B EXDEV
> >  The files referred to by
> >  .IR file_in " and " file_out
> > -are not on the same mounted filesystem.
> > +are not on the same mounted filesystem when the kernel does not support
> > +cross device file copy.
> 
> Kernel can support cross device file copy, the filesystem may not.
> 
> EXDEV
> 	One of the files specified by file_in and file_out are on a
> 	filesystem that does not support cross device copies.

I mentioned this in my last review, and Olga pointed out that one of
the changes in this patch means the kernel will do the copy using
do_splice_direct if the filesystem doesn't support cross-device copying.
We should keep this error documented for those on old kernels, but the
kernel will never return -EXDEV any more.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v4 11/11] NFS: replace cross device check in copy_file_range
  2018-10-27 11:08   ` Jeff Layton
@ 2018-10-27 13:26     ` Matthew Wilcox
  2018-10-29 14:28       ` Olga Kornievskaia
  0 siblings, 1 reply; 43+ messages in thread
From: Matthew Wilcox @ 2018-10-27 13:26 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Olga Kornievskaia, trond.myklebust, anna.schumaker, viro,
	smfrench, miklos, linux-nfs, linux-fsdevel, linux-cifs,
	linux-unionfs, linux-man

On Sat, Oct 27, 2018 at 07:08:11AM -0400, Jeff Layton wrote:
> >  
> > -	if (file_in->f_inode->i_sb != file_out->f_inode->i_sb)
> > +	if (file_in->f_op != &nfs4_file_operations)
> >  		return -EXDEV;
> > +	else {
> 
> nit: you don't really need the "else" here since the previous block
> returns
> 
> > +		struct nfs_client *c_in =
> > +			(NFS_SERVER(file_inode(file_in)))->nfs_client;
> > +		if (c_in->cl_minorversion < 2)
> > +			return -EXDEV;
> > +	}

Yeah, but if you don't have the else, then you need to declare the c_in
at the beginning of the function instead of in the new block.  Mind you,
if you do that then:

	c_in = NFS_SERVER(file_inode(file_in))->nfs_client;

fits on one line, so it does look a bit neater.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 1/1] man-page: copy_file_range(2) allow for cross-device copies
  2018-10-27 13:23     ` Matthew Wilcox
@ 2018-10-28  1:33       ` Dave Chinner
  2018-10-28  2:39         ` Matthew Wilcox
  2018-10-29 14:25         ` Olga Kornievskaia
  0 siblings, 2 replies; 43+ messages in thread
From: Dave Chinner @ 2018-10-28  1:33 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Olga Kornievskaia, trond.myklebust, anna.schumaker, viro,
	smfrench, miklos, linux-nfs, linux-fsdevel, linux-cifs,
	linux-unionfs, linux-man

On Sat, Oct 27, 2018 at 06:23:39AM -0700, Matthew Wilcox wrote:
> On Sat, Oct 27, 2018 at 08:12:40PM +1100, Dave Chinner wrote:
> > > @@ -131,7 +132,8 @@ There is not enough space on the target filesystem to complete the copy.
> > >  .B EXDEV
> > >  The files referred to by
> > >  .IR file_in " and " file_out
> > > -are not on the same mounted filesystem.
> > > +are not on the same mounted filesystem when the kernel does not support
> > > +cross device file copy.
> > 
> > Kernel can support cross device file copy, the filesystem may not.
> > 
> > EXDEV
> > 	One of the files specified by file_in and file_out are on a
> > 	filesystem that does not support cross device copies.
> 
> I mentioned this in my last review, and Olga pointed out that one of
> the changes in this patch means the kernel will do the copy using
> do_splice_direct if the filesystem doesn't support cross-device copying.
> We should keep this error documented for those on old kernels, but the
> kernel will never return -EXDEV any more.

Uh, wot? Where's the patch named "VFS: enable copy_file_range() for
all filesystems"? That shoul dnot be a detail hidden inside some
other patch that multiple people completely miss during review.

If we are completely changing the kernel's behaviour, the patch
should be explicitly named to call out the change of behaviour, and
the commit message should clearly explain the change being made and
why.
 
/me goes looking.

Yup, it is only mentione din passing as a side-effect of an
implementation change. That's back to front. Describe the behaviour
change, what users will see and the reasons for making the change,
leave the code to describe exactly what the change is. Then you can
describe the actions needed to make the new functionality work. e.g.
The first patch shoul dbe described as:

VFS: generic cross-device copy_file_range() support for all filesystems

From: ....

In preparation for enabling cross-device offloading of
copy_file_range() functionality, first enable generic cross-device
support by allowing copy_file_range() to fall back to a page cache
based physical data copy. This means the copy_file_range() syscall
will never fail with EXDEV, and so in future userspace will not need
to detect or provide a fallback for failed cross-device copies
anymore.

This requires pushing the cross device support checks down into the
filesystem ->copy_file_range methods and falling back to the page
cache copy if they return EXDEV.

Further, clone_file_range() is only supported within a filesystem,
so we must also check that the files belong to the same superblock
before calling ->clone_file_range(). If they are on different
superblocks, skip the attempt to clone the file and instead try to
copy the file.

S-o-B: .....


Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 1/1] man-page: copy_file_range(2) allow for cross-device copies
  2018-10-28  1:33       ` Dave Chinner
@ 2018-10-28  2:39         ` Matthew Wilcox
  2018-10-29 14:25         ` Olga Kornievskaia
  1 sibling, 0 replies; 43+ messages in thread
From: Matthew Wilcox @ 2018-10-28  2:39 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Olga Kornievskaia, trond.myklebust, anna.schumaker, viro,
	smfrench, miklos, linux-nfs, linux-fsdevel, linux-cifs,
	linux-unionfs, linux-man

On Sun, Oct 28, 2018 at 12:33:07PM +1100, Dave Chinner wrote:
> On Sat, Oct 27, 2018 at 06:23:39AM -0700, Matthew Wilcox wrote:
> > I mentioned this in my last review, and Olga pointed out that one of
> > the changes in this patch means the kernel will do the copy using
> > do_splice_direct if the filesystem doesn't support cross-device copying.
> > We should keep this error documented for those on old kernels, but the
> > kernel will never return -EXDEV any more.
> 
> Uh, wot? Where's the patch named "VFS: enable copy_file_range() for
> all filesystems"? That shoul dnot be a detail hidden inside some
> other patch that multiple people completely miss during review.

Yes, I completely agree.

I'd actually suggest doing the patches the other way around; first push
the -EXDEV check into the filesystems, then make an EXDEV return cause a
call to do_splice_direct().  I think that makes for a more understandable
patch series (ie it's just splitting the last hunk from the existing
patch 1 into a separate patch with an appropriate changelog).

> If we are completely changing the kernel's behaviour, the patch
> should be explicitly named to call out the change of behaviour, and
> the commit message should clearly explain the change being made and
> why.
>  
> /me goes looking.
> 
> Yup, it is only mentione din passing as a side-effect of an
> implementation change. That's back to front. Describe the behaviour
> change, what users will see and the reasons for making the change,
> leave the code to describe exactly what the change is. Then you can
> describe the actions needed to make the new functionality work. e.g.
> The first patch shoul dbe described as:
> 
> VFS: generic cross-device copy_file_range() support for all filesystems
> 
> From: ....
> 
> In preparation for enabling cross-device offloading of
> copy_file_range() functionality, first enable generic cross-device
> support by allowing copy_file_range() to fall back to a page cache
> based physical data copy. This means the copy_file_range() syscall
> will never fail with EXDEV, and so in future userspace will not need
> to detect or provide a fallback for failed cross-device copies
> anymore.
> 
> This requires pushing the cross device support checks down into the
> filesystem ->copy_file_range methods and falling back to the page
> cache copy if they return EXDEV.
> 
> Further, clone_file_range() is only supported within a filesystem,
> so we must also check that the files belong to the same superblock
> before calling ->clone_file_range(). If they are on different
> superblocks, skip the attempt to clone the file and instead try to
> copy the file.
> 
> S-o-B: .....
> 
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 1/1] man-page: copy_file_range(2) allow for cross-device copies
  2018-10-28  1:33       ` Dave Chinner
  2018-10-28  2:39         ` Matthew Wilcox
@ 2018-10-29 14:25         ` Olga Kornievskaia
  2018-10-29 15:52           ` Olga Kornievskaia
  1 sibling, 1 reply; 43+ messages in thread
From: Olga Kornievskaia @ 2018-10-29 14:25 UTC (permalink / raw)
  To: david
  Cc: willy, trond.myklebust, Anna Schumaker, viro, Steve French,
	Miklos Szeredi, linux-nfs, linux-fsdevel, linux-cifs,
	linux-unionfs, linux-man

On Sat, Oct 27, 2018 at 9:33 PM Dave Chinner <david@fromorbit.com> wrote:
>
> On Sat, Oct 27, 2018 at 06:23:39AM -0700, Matthew Wilcox wrote:
> > On Sat, Oct 27, 2018 at 08:12:40PM +1100, Dave Chinner wrote:
> > > > @@ -131,7 +132,8 @@ There is not enough space on the target filesystem to complete the copy.
> > > >  .B EXDEV
> > > >  The files referred to by
> > > >  .IR file_in " and " file_out
> > > > -are not on the same mounted filesystem.
> > > > +are not on the same mounted filesystem when the kernel does not support
> > > > +cross device file copy.
> > >
> > > Kernel can support cross device file copy, the filesystem may not.
> > >
> > > EXDEV
> > >     One of the files specified by file_in and file_out are on a
> > >     filesystem that does not support cross device copies.
> >
> > I mentioned this in my last review, and Olga pointed out that one of
> > the changes in this patch means the kernel will do the copy using
> > do_splice_direct if the filesystem doesn't support cross-device copying.
> > We should keep this error documented for those on old kernels, but the
> > kernel will never return -EXDEV any more.
>
> Uh, wot? Where's the patch named "VFS: enable copy_file_range() for
> all filesystems"? That shoul dnot be a detail hidden inside some
> other patch that multiple people completely miss during review.

The fact that cross-device check was moved is what allowed for all
filesystems to use copy_file_range(). I think choosing the words of
"moving cross device check" was also appropriate title for the VFS
patch.

To other what you thought was the main change was a side-effect or at
least that's not where the discussion was centered. The discussion was
focused on the fact that there is cross device of same and different
file systems and such discussion deemed appropriate to be noted
clearly in the commit message. The proposed commit message below
doesn't capture it.

I'm fine with the commit message below as well. If that's acceptable
to others. I can change the commit to the wording below.

When I started reading the message I thought your comments where about
the "man-page" patch that it lacked the wording including in the VFS
patch. So to clarify, do you have any objections to the wording in the
man-page patch or was this just about the VFS patch?

> If we are completely changing the kernel's behaviour, the patch
> should be explicitly named to call out the change of behaviour, and
> the commit message should clearly explain the change being made and
> why.
>
> /me goes looking.
>
> Yup, it is only mentione din passing as a side-effect of an
> implementation change. That's back to front. Describe the behaviour
> change, what users will see and the reasons for making the change,
> leave the code to describe exactly what the change is. Then you can
> describe the actions needed to make the new functionality work. e.g.
> The first patch shoul dbe described as:
>
> VFS: generic cross-device copy_file_range() support for all filesystems

> From: ....
>
> In preparation for enabling cross-device offloading of
> copy_file_range() functionality, first enable generic cross-device
> support by allowing copy_file_range() to fall back to a page cache
> based physical data copy. This means the copy_file_range() syscall
> will never fail with EXDEV, and so in future userspace will not need
> to detect or provide a fallback for failed cross-device copies
> anymore.
>
> This requires pushing the cross device support checks down into the
> filesystem ->copy_file_range methods and falling back to the page
> cache copy if they return EXDEV.
>
> Further, clone_file_range() is only supported within a filesystem,
> so we must also check that the files belong to the same superblock
> before calling ->clone_file_range(). If they are on different
> superblocks, skip the attempt to clone the file and instead try to
> copy the file.
>
> S-o-B: .....

>
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v4 11/11] NFS: replace cross device check in copy_file_range
  2018-10-27 13:26     ` Matthew Wilcox
@ 2018-10-29 14:28       ` Olga Kornievskaia
  0 siblings, 0 replies; 43+ messages in thread
From: Olga Kornievskaia @ 2018-10-29 14:28 UTC (permalink / raw)
  To: willy
  Cc: jlayton, trond.myklebust, Anna Schumaker, viro, Steve French,
	Miklos Szeredi, linux-nfs, linux-fsdevel, linux-cifs,
	linux-unionfs, linux-man

On Sat, Oct 27, 2018 at 9:26 AM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Sat, Oct 27, 2018 at 07:08:11AM -0400, Jeff Layton wrote:
> > >
> > > -   if (file_in->f_inode->i_sb != file_out->f_inode->i_sb)
> > > +   if (file_in->f_op != &nfs4_file_operations)
> > >             return -EXDEV;
> > > +   else {
> >
> > nit: you don't really need the "else" here since the previous block
> > returns
> >
> > > +           struct nfs_client *c_in =
> > > +                   (NFS_SERVER(file_inode(file_in)))->nfs_client;
> > > +           if (c_in->cl_minorversion < 2)
> > > +                   return -EXDEV;
> > > +   }
>
> Yeah, but if you don't have the else, then you need to declare the c_in
> at the beginning of the function instead of in the new block.  Mind you,
> if you do that then:
>
>         c_in = NFS_SERVER(file_inode(file_in))->nfs_client;
>
> fits on one line, so it does look a bit neater.

My thoughts for the "else" was to be able to get the nfs_client but
yes I could declare for the whole function and assign after the first
"if". I'll change it.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v4 03/11] NFS: NFSD defining nl4_servers structure needed by both
  2018-10-27 11:14   ` Jeff Layton
@ 2018-10-29 14:28     ` Olga Kornievskaia
  0 siblings, 0 replies; 43+ messages in thread
From: Olga Kornievskaia @ 2018-10-29 14:28 UTC (permalink / raw)
  To: Jeff Layton
  Cc: trond.myklebust, Anna Schumaker, viro, Steve French,
	Miklos Szeredi, linux-nfs, linux-fsdevel, linux-cifs,
	linux-unionfs, linux-man

On Sat, Oct 27, 2018 at 7:14 AM Jeff Layton <jlayton@redhat.com> wrote:
>
> On Fri, 2018-10-26 at 16:10 -0400, Olga Kornievskaia wrote:
> > From: Olga Kornievskaia <kolga@netapp.com>
> >
> > These structures are needed by COPY_NOTIFY on the client and needed
> > by the nfsd as well
> >
> > Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> > ---
> >  include/linux/nfs4.h | 24 ++++++++++++++++++++++++
> >  1 file changed, 24 insertions(+)
> >
> > diff --git a/include/linux/nfs4.h b/include/linux/nfs4.h
> > index 1b06f0b..4d76f87 100644
> > --- a/include/linux/nfs4.h
> > +++ b/include/linux/nfs4.h
> > @@ -16,6 +16,7 @@
> >  #include <linux/list.h>
> >  #include <linux/uidgid.h>
> >  #include <uapi/linux/nfs4.h>
> > +#include <linux/sunrpc/msg_prot.h>
> >
> >  enum nfs4_acl_whotype {
> >       NFS4_ACL_WHO_NAMED = 0,
> > @@ -672,4 +673,27 @@ struct nfs4_op_map {
> >       } u;
> >  };
> >
> > +struct nfs42_netaddr {
> > +     char            netid[RPCBIND_MAXNETIDLEN];
> > +     char            addr[RPCBIND_MAXUADDRLEN + 1];
> > +     u32             netid_len;
> > +     u32 addr_len;
>
> Could you fix the indentation above?

Will do.

>
> > +};
> > +
> > +enum netloc_type4 {
> > +     NL4_NAME                = 1,
> > +     NL4_URL                 = 2,
> > +     NL4_NETADDR             = 3,
> > +};
> > +
> > +struct nl4_server {
> > +     enum netloc_type4       nl4_type;
> > +     union {
> > +             struct { /* NL4_NAME, NL4_URL */
> > +                     int     nl4_str_sz;
> > +                     char    nl4_str[NFS4_OPAQUE_LIMIT + 1];
> > +             };
> > +             struct nfs42_netaddr    nl4_addr; /* NL4_NETADDR */
> > +     } u;
> > +};
> >  #endif
>
> Otherwise, looks fine.
>
> Reviewed-by: Jeff Layton <jlayton@redhat.com>
>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v4 01/11] VFS: move cross device copy_file_range() check into filesystems
  2018-10-27  9:09   ` Dave Chinner
@ 2018-10-29 14:31     ` Olga Kornievskaia
  0 siblings, 0 replies; 43+ messages in thread
From: Olga Kornievskaia @ 2018-10-29 14:31 UTC (permalink / raw)
  To: david
  Cc: trond.myklebust, Anna Schumaker, viro, Steve French,
	Miklos Szeredi, linux-nfs, linux-fsdevel, linux-cifs,
	linux-unionfs, linux-man

On Sat, Oct 27, 2018 at 5:09 AM Dave Chinner <david@fromorbit.com> wrote:
>
> On Fri, Oct 26, 2018 at 04:10:46PM -0400, Olga Kornievskaia wrote:
> > From: Olga Kornievskaia <kolga@netapp.com>
> >
> > This patch makes it the responsibility of individual filesystems to
> > allow or deny cross device copies.  Both NFS and CIFS have operations
> > for cross-server copies, and later patches will implement this feature.
> >
> > Note that as of this patch, the copy_file_range() function might be passed
> > superblocks from different filesystem types. -EXDEV should be returned
> > if cross device copies aren't supported, causing the VFS to fall back
> > on using do_splice_direct().
> >
> > Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> > ---
> >  Documentation/filesystems/porting | 7 +++++++
> >  fs/cifs/cifsfs.c                  | 3 +++
> >  fs/nfs/nfs4file.c                 | 3 +++
> >  fs/overlayfs/file.c               | 3 +++
> >  fs/read_write.c                   | 9 +++------
> >  5 files changed, 19 insertions(+), 6 deletions(-)
> >
> > diff --git a/Documentation/filesystems/porting b/Documentation/filesystems/porting
> > index 7b7b845..897e1e7 100644
> > --- a/Documentation/filesystems/porting
> > +++ b/Documentation/filesystems/porting
> > @@ -622,3 +622,10 @@ in your dentry operations instead.
> >       alloc_file_clone(file, flags, ops) does not affect any caller's references.
> >       On success you get a new struct file sharing the mount/dentry with the
> >       original, on failure - ERR_PTR().
> > +--
> > +[mandatory]
> > +     ->copy_file_range() may now be passed files which belong to two
> > +     different superblocks of the same file system type or which belong
> > +     to two different filesystems types all together. As before, the
> > +     destination's copy_file_range() is the function which is called.
> > +     If it cannot copy ranges from the source, it should return -EXDEV.
> > diff --git a/fs/cifs/cifsfs.c b/fs/cifs/cifsfs.c
> > index 7065426..f2d7f4f 100644
> > --- a/fs/cifs/cifsfs.c
> > +++ b/fs/cifs/cifsfs.c
> > @@ -1114,6 +1114,9 @@ static ssize_t cifs_copy_file_range(struct file *src_file, loff_t off,
> >       unsigned int xid = get_xid();
> >       ssize_t rc;
> >
> > +     if (src_file->f_inode->i_sb != dst_file->f_inode->i_sb)
>
> file_inode(file)->i_sb, please.
>
> > +             return -EXDEV;
> > +
> >       rc = cifs_file_copychunk_range(xid, src_file, off, dst_file, destoff,
> >                                       len, flags);
> >       free_xid(xid);
> > diff --git a/fs/nfs/nfs4file.c b/fs/nfs/nfs4file.c
> > index 4288a6e..09df688 100644
> > --- a/fs/nfs/nfs4file.c
> > +++ b/fs/nfs/nfs4file.c
> > @@ -135,6 +135,9 @@ static ssize_t nfs4_copy_file_range(struct file *file_in, loff_t pos_in,
> >  {
> >       ssize_t ret;
> >
> > +     if (file_in->f_inode->i_sb != file_out->f_inode->i_sb)
> > +             return -EXDEV;
> > +
> >       if (file_inode(file_in) == file_inode(file_out))
> >               return -EINVAL;
>
> Please look at the code around your modifications and make sure your
> additions are consistent with it.

Will make the changes. Thanks.

>
> > @@ -1588,7 +1584,8 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
> >        * Try cloning first, this is supported by more file systems, and
> >        * more efficient if both clone and copy are supported (e.g. NFS).
> >        */
> > -     if (file_in->f_op->clone_file_range) {
> > +     if (inode_in->i_sb == inode_out->i_sb &&
> > +                     file_in->f_op->clone_file_range) {
>
>         if (inode_in->i_sb == inode_out->i_sb &&
>             file_in->f_op->clone_file_range) {
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v4 02/11] VFS: copy_file_range check validity of input source offset
  2018-10-27  9:27   ` Dave Chinner
@ 2018-10-29 14:41     ` Olga Kornievskaia
  2018-10-30  9:03       ` Dave Chinner
  0 siblings, 1 reply; 43+ messages in thread
From: Olga Kornievskaia @ 2018-10-29 14:41 UTC (permalink / raw)
  To: david
  Cc: trond.myklebust, Anna Schumaker, viro, Steve French,
	Miklos Szeredi, linux-nfs, linux-fsdevel, linux-cifs,
	linux-unionfs, linux-man

On Sat, Oct 27, 2018 at 5:27 AM Dave Chinner <david@fromorbit.com> wrote:
>
> On Fri, Oct 26, 2018 at 04:10:48PM -0400, Olga Kornievskaia wrote:
> > From: Olga Kornievskaia <kolga@netapp.com>
> >
> > Input source offset can't be beyond the end of the file.
> >
> > Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> > ---
> >  fs/read_write.c | 3 +++
> >  1 file changed, 3 insertions(+)
> >
> > diff --git a/fs/read_write.c b/fs/read_write.c
> > index fb4ffca..b3b304e 100644
> > --- a/fs/read_write.c
> > +++ b/fs/read_write.c
> > @@ -1594,6 +1594,9 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
> >               }
> >       }
> >
> > +     if (pos_in >= i_size_read(inode_in))
> > +             return -EINVAL;
> > +
>
> vfs_copy_file_range seems ot be missing a wide range of checks.
> rlimit, s_maxbytes, LFS file sizes, etc. This is a write, so all the
> checks in generic_write_checks() apply, right? And the same security
> issues like stripping setuid bits, etc? And we need to touch
> atime on the source file, too?

Yes sound like needed checks.

> We've just merged 5 or so patches in 4.19-rc8 and we're ready to
> merge another ~30 patch series to fix all the stuff missing from the
> clone/dedupe file range operations that make them safe and robust.
> It seems like copy_file_range is all the checks it needs, too?

Are you proposing to not do this check now in favor of the proper work
that will do all of those checks you listed above? I can not volunteer
to provide this comprehensive check. However if this is the path
community decides is the best then I can move this check into NFS for
now and remove it once VFS provides such check later.

>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 1/1] man-page: copy_file_range(2) allow for cross-device copies
  2018-10-29 14:25         ` Olga Kornievskaia
@ 2018-10-29 15:52           ` Olga Kornievskaia
  2018-10-29 17:49             ` Amir Goldstein
  0 siblings, 1 reply; 43+ messages in thread
From: Olga Kornievskaia @ 2018-10-29 15:52 UTC (permalink / raw)
  To: david
  Cc: willy, trond.myklebust, Anna Schumaker, viro, Steve French,
	Miklos Szeredi, linux-nfs, linux-fsdevel, linux-cifs,
	linux-unionfs, linux-man

On Mon, Oct 29, 2018 at 10:25 AM Olga Kornievskaia
<olga.kornievskaia@gmail.com> wrote:
>
> On Sat, Oct 27, 2018 at 9:33 PM Dave Chinner <david@fromorbit.com> wrote:
> >
> > On Sat, Oct 27, 2018 at 06:23:39AM -0700, Matthew Wilcox wrote:
> > > On Sat, Oct 27, 2018 at 08:12:40PM +1100, Dave Chinner wrote:
> > > > > @@ -131,7 +132,8 @@ There is not enough space on the target filesystem to complete the copy.
> > > > >  .B EXDEV
> > > > >  The files referred to by
> > > > >  .IR file_in " and " file_out
> > > > > -are not on the same mounted filesystem.
> > > > > +are not on the same mounted filesystem when the kernel does not support
> > > > > +cross device file copy.
> > > >
> > > > Kernel can support cross device file copy, the filesystem may not.
> > > >
> > > > EXDEV
> > > >     One of the files specified by file_in and file_out are on a
> > > >     filesystem that does not support cross device copies.
> > >
> > > I mentioned this in my last review, and Olga pointed out that one of
> > > the changes in this patch means the kernel will do the copy using
> > > do_splice_direct if the filesystem doesn't support cross-device copying.
> > > We should keep this error documented for those on old kernels, but the
> > > kernel will never return -EXDEV any more.
> >
> > Uh, wot? Where's the patch named "VFS: enable copy_file_range() for
> > all filesystems"? That shoul dnot be a detail hidden inside some
> > other patch that multiple people completely miss during review.
>
> The fact that cross-device check was moved is what allowed for all
> filesystems to use copy_file_range(). I think choosing the words of
> "moving cross device check" was also appropriate title for the VFS
> patch.

I wasn't completely correct. It was the check removal and then also
allowing for -EXDEV error to keep going with the  do_splice_direct().
I could have left EXDEV being an error but thought it would be
beneficial to add such ability. One reason to keep all the changes in
one patch was said to be so that porting back all of this
functionality is included.

So one option would be keep EXDEV as an error if that's what folks
want then this side-effect would not be a problem and the focus would
be on what it was in the first place : removal (or relocation) of the
cross device check into the filesystems.

> To other what you thought was the main change was a side-effect or at
> least that's not where the discussion was centered. The discussion was
> focused on the fact that there is cross device of same and different
> file systems and such discussion deemed appropriate to be noted
> clearly in the commit message. The proposed commit message below
> doesn't capture it.
>
> I'm fine with the commit message below as well. If that's acceptable
> to others. I can change the commit to the wording below.
>
> When I started reading the message I thought your comments where about
> the "man-page" patch that it lacked the wording including in the VFS
> patch. So to clarify, do you have any objections to the wording in the
> man-page patch or was this just about the VFS patch?
>
> > If we are completely changing the kernel's behaviour, the patch
> > should be explicitly named to call out the change of behaviour, and
> > the commit message should clearly explain the change being made and
> > why.
> >
> > /me goes looking.
> >
> > Yup, it is only mentione din passing as a side-effect of an
> > implementation change. That's back to front. Describe the behaviour
> > change, what users will see and the reasons for making the change,
> > leave the code to describe exactly what the change is. Then you can
> > describe the actions needed to make the new functionality work. e.g.
> > The first patch shoul dbe described as:
> >
> > VFS: generic cross-device copy_file_range() support for all filesystems
>
> > From: ....
> >
> > In preparation for enabling cross-device offloading of
> > copy_file_range() functionality, first enable generic cross-device
> > support by allowing copy_file_range() to fall back to a page cache
> > based physical data copy. This means the copy_file_range() syscall
> > will never fail with EXDEV, and so in future userspace will not need
> > to detect or provide a fallback for failed cross-device copies
> > anymore.
> >
> > This requires pushing the cross device support checks down into the
> > filesystem ->copy_file_range methods and falling back to the page
> > cache copy if they return EXDEV.
> >
> > Further, clone_file_range() is only supported within a filesystem,
> > so we must also check that the files belong to the same superblock
> > before calling ->clone_file_range(). If they are on different
> > superblocks, skip the attempt to clone the file and instead try to
> > copy the file.
> >
> > S-o-B: .....
>
> >
> >
> > Cheers,
> >
> > Dave.
> > --
> > Dave Chinner
> > david@fromorbit.com

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v4 02/11] VFS: copy_file_range check validity of input source offset
  2018-10-26 21:26   ` Matthew Wilcox
@ 2018-10-29 16:09     ` Olga Kornievskaia
  0 siblings, 0 replies; 43+ messages in thread
From: Olga Kornievskaia @ 2018-10-29 16:09 UTC (permalink / raw)
  To: willy
  Cc: trond.myklebust, Anna Schumaker, viro, Steve French,
	Miklos Szeredi, linux-nfs, linux-fsdevel, linux-cifs,
	linux-unionfs, linux-man

On Fri, Oct 26, 2018 at 5:26 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Fri, Oct 26, 2018 at 04:10:48PM -0400, Olga Kornievskaia wrote:
> > +++ b/fs/read_write.c
> > @@ -1594,6 +1594,9 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
> >               }
> >       }
> >
> > +     if (pos_in >= i_size_read(inode_in))
> > +             return -EINVAL;
> > +
> >       if (file_out->f_op->copy_file_range) {
> >               ret = file_out->f_op->copy_file_range(file_in, pos_in, file_out,
> >                                                     pos_out, len, flags);
>
> Is this the right place to check this?  Surely we should be checking for
> this before calling clone_file_range()?

Ops, indeed this is the wrong place. If I were to keep this check here
then I need to move it where it was originally located. However, right
now I'm included to not do the check in VFS at all and move it to NFS.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 1/1] man-page: copy_file_range(2) allow for cross-device copies
  2018-10-29 15:52           ` Olga Kornievskaia
@ 2018-10-29 17:49             ` Amir Goldstein
  0 siblings, 0 replies; 43+ messages in thread
From: Amir Goldstein @ 2018-10-29 17:49 UTC (permalink / raw)
  To: Olga Kornievskaia
  Cc: Dave Chinner, Matthew Wilcox, trond.myklebust, Anna Schumaker,
	Al Viro, Steve French, Miklos Szeredi, Linux NFS Mailing List,
	linux-fsdevel, linux-cifs, overlayfs, linux-man

On Mon, Oct 29, 2018 at 5:52 PM Olga Kornievskaia
<olga.kornievskaia@gmail.com> wrote:
>
> On Mon, Oct 29, 2018 at 10:25 AM Olga Kornievskaia
> <olga.kornievskaia@gmail.com> wrote:
> >
> > On Sat, Oct 27, 2018 at 9:33 PM Dave Chinner <david@fromorbit.com> wrote:
> > >
> > > On Sat, Oct 27, 2018 at 06:23:39AM -0700, Matthew Wilcox wrote:
> > > > On Sat, Oct 27, 2018 at 08:12:40PM +1100, Dave Chinner wrote:
> > > > > > @@ -131,7 +132,8 @@ There is not enough space on the target filesystem to complete the copy.
> > > > > >  .B EXDEV
> > > > > >  The files referred to by
> > > > > >  .IR file_in " and " file_out
> > > > > > -are not on the same mounted filesystem.
> > > > > > +are not on the same mounted filesystem when the kernel does not support
> > > > > > +cross device file copy.
> > > > >
> > > > > Kernel can support cross device file copy, the filesystem may not.
> > > > >
> > > > > EXDEV
> > > > >     One of the files specified by file_in and file_out are on a
> > > > >     filesystem that does not support cross device copies.
> > > >
> > > > I mentioned this in my last review, and Olga pointed out that one of
> > > > the changes in this patch means the kernel will do the copy using
> > > > do_splice_direct if the filesystem doesn't support cross-device copying.
> > > > We should keep this error documented for those on old kernels, but the
> > > > kernel will never return -EXDEV any more.
> > >
> > > Uh, wot? Where's the patch named "VFS: enable copy_file_range() for
> > > all filesystems"? That shoul dnot be a detail hidden inside some
> > > other patch that multiple people completely miss during review.
> >
> > The fact that cross-device check was moved is what allowed for all
> > filesystems to use copy_file_range(). I think choosing the words of
> > "moving cross device check" was also appropriate title for the VFS
> > patch.
>
> I wasn't completely correct. It was the check removal and then also
> allowing for -EXDEV error to keep going with the  do_splice_direct().
> I could have left EXDEV being an error but thought it would be
> beneficial to add such ability. One reason to keep all the changes in
> one patch was said to be so that porting back all of this
> functionality is included.
>

Olga,

There are many details in this patch series and repeated miscommunications
about the details. Accuracy is important - "the reason to keep all changes" -
that was a commetn of mine referring *only* to the move of same sb check
from VFS to different filesystems.

The *extra* change of do_splice_direct cross fs was a very nice low
hanging fruit, but completely unrelated to your original  motivation.
I completely agree with Dave that it should be in a separate patch.
I also side with Dave that it would make sense as the first patch of the series.

> So one option would be keep EXDEV as an error if that's what folks
> want then this side-effect would not be a problem and the focus would
> be on what it was in the first place : removal (or relocation) of the
> cross device check into the filesystems.
>

This is up to you. It's an extra change and not related to your NFS work
and you may submit it or not. I think it will be nice if you did submit it.
There was no objection to this change, only objection was to bundle it together
with a different unrelated patch.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v4 02/11] VFS: copy_file_range check validity of input source offset
  2018-10-29 14:41     ` Olga Kornievskaia
@ 2018-10-30  9:03       ` Dave Chinner
  2018-10-30 13:40         ` Olga Kornievskaia
  2018-10-30 21:10         ` Olga Kornievskaia
  0 siblings, 2 replies; 43+ messages in thread
From: Dave Chinner @ 2018-10-30  9:03 UTC (permalink / raw)
  To: Olga Kornievskaia
  Cc: trond.myklebust, Anna Schumaker, viro, Steve French,
	Miklos Szeredi, linux-nfs, linux-fsdevel, linux-cifs,
	linux-unionfs, linux-man

On Mon, Oct 29, 2018 at 10:41:22AM -0400, Olga Kornievskaia wrote:
> On Sat, Oct 27, 2018 at 5:27 AM Dave Chinner <david@fromorbit.com> wrote:
> >
> > On Fri, Oct 26, 2018 at 04:10:48PM -0400, Olga Kornievskaia wrote:
> > > From: Olga Kornievskaia <kolga@netapp.com>
> > >
> > > Input source offset can't be beyond the end of the file.
> > >
> > > Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> > > ---
> > >  fs/read_write.c | 3 +++
> > >  1 file changed, 3 insertions(+)
> > >
> > > diff --git a/fs/read_write.c b/fs/read_write.c
> > > index fb4ffca..b3b304e 100644
> > > --- a/fs/read_write.c
> > > +++ b/fs/read_write.c
> > > @@ -1594,6 +1594,9 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
> > >               }
> > >       }
> > >
> > > +     if (pos_in >= i_size_read(inode_in))
> > > +             return -EINVAL;
> > > +
> >
> > vfs_copy_file_range seems ot be missing a wide range of checks.
> > rlimit, s_maxbytes, LFS file sizes, etc. This is a write, so all the
> > checks in generic_write_checks() apply, right? And the same security
> > issues like stripping setuid bits, etc? And we need to touch
> > atime on the source file, too?
> 
> Yes sound like needed checks.
> 
> > We've just merged 5 or so patches in 4.19-rc8 and we're ready to
> > merge another ~30 patch series to fix all the stuff missing from the
> > clone/dedupe file range operations that make them safe and robust.
> > It seems like copy_file_range is all the checks it needs, too?
> 
> Are you proposing to not do this check now in favor of the proper work
> that will do all of those checks you listed above?

No, I'm saying that if you're adding one check, there's a whole heap
of checks that still need to be added, *especially* if this is going
to fall back to page cache copy between superblocks that may have
different limits and constraints.

There's security issues in this API. They need to be fixed before we
allow it to do more and potentially expose more problems due to it's
wider capability.

> I can not volunteer
> to provide this comprehensive check.

Why not?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v4 02/11] VFS: copy_file_range check validity of input source offset
  2018-10-30  9:03       ` Dave Chinner
@ 2018-10-30 13:40         ` Olga Kornievskaia
  2018-10-30 23:40           ` Dave Chinner
  2018-10-30 21:10         ` Olga Kornievskaia
  1 sibling, 1 reply; 43+ messages in thread
From: Olga Kornievskaia @ 2018-10-30 13:40 UTC (permalink / raw)
  To: david
  Cc: trond.myklebust, Anna Schumaker, viro, Steve French,
	Miklos Szeredi, linux-nfs, linux-fsdevel, linux-cifs,
	linux-unionfs, linux-man

On Tue, Oct 30, 2018 at 5:03 AM Dave Chinner <david@fromorbit.com> wrote:
>
> On Mon, Oct 29, 2018 at 10:41:22AM -0400, Olga Kornievskaia wrote:
> > On Sat, Oct 27, 2018 at 5:27 AM Dave Chinner <david@fromorbit.com> wrote:
> > >
> > > On Fri, Oct 26, 2018 at 04:10:48PM -0400, Olga Kornievskaia wrote:
> > > > From: Olga Kornievskaia <kolga@netapp.com>
> > > >
> > > > Input source offset can't be beyond the end of the file.
> > > >
> > > > Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> > > > ---
> > > >  fs/read_write.c | 3 +++
> > > >  1 file changed, 3 insertions(+)
> > > >
> > > > diff --git a/fs/read_write.c b/fs/read_write.c
> > > > index fb4ffca..b3b304e 100644
> > > > --- a/fs/read_write.c
> > > > +++ b/fs/read_write.c
> > > > @@ -1594,6 +1594,9 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
> > > >               }
> > > >       }
> > > >
> > > > +     if (pos_in >= i_size_read(inode_in))
> > > > +             return -EINVAL;
> > > > +
> > >
> > > vfs_copy_file_range seems ot be missing a wide range of checks.
> > > rlimit, s_maxbytes, LFS file sizes, etc. This is a write, so all the
> > > checks in generic_write_checks() apply, right? And the same security
> > > issues like stripping setuid bits, etc? And we need to touch
> > > atime on the source file, too?
> >
> > Yes sound like needed checks.
> >
> > > We've just merged 5 or so patches in 4.19-rc8 and we're ready to
> > > merge another ~30 patch series to fix all the stuff missing from the
> > > clone/dedupe file range operations that make them safe and robust.
> > > It seems like copy_file_range is all the checks it needs, too?
> >
> > Are you proposing to not do this check now in favor of the proper work
> > that will do all of those checks you listed above?
>
> No, I'm saying that if you're adding one check, there's a whole heap
> of checks that still need to be added, *especially* if this is going
> to fall back to page cache copy between superblocks that may have
> different limits and constraints.
>
> There's security issues in this API. They need to be fixed before we
> allow it to do more and potentially expose more problems due to it's
> wider capability.

Sounds like you are arguing that enabling generic copy_file_range()
via do_splice() isn't a good thing without those checks. I'm totally
OK with removing this functionality. As I mentioned earlier I added it
as I thought it was beneficial and I assumed that do_splice() was a
generic enough API to handle what was needed.

What this patch series was intended for was removing/relocating the
cross device check and it sounds like I should be keeping it just to
that.

> > I can not volunteer
> > to provide this comprehensive check.
>
> Why not?

I don't have the expertise in the VFS code.

>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v4 02/11] VFS: copy_file_range check validity of input source offset
  2018-10-30  9:03       ` Dave Chinner
  2018-10-30 13:40         ` Olga Kornievskaia
@ 2018-10-30 21:10         ` Olga Kornievskaia
  2018-10-30 21:12           ` Olga Kornievskaia
  2018-10-31  0:14           ` Dave Chinner
  1 sibling, 2 replies; 43+ messages in thread
From: Olga Kornievskaia @ 2018-10-30 21:10 UTC (permalink / raw)
  To: david
  Cc: trond.myklebust, Anna Schumaker, viro, Steve French,
	Miklos Szeredi, linux-nfs, linux-fsdevel, linux-cifs,
	linux-unionfs, linux-man

On Tue, Oct 30, 2018 at 5:03 AM Dave Chinner <david@fromorbit.com> wrote:
>
> On Mon, Oct 29, 2018 at 10:41:22AM -0400, Olga Kornievskaia wrote:
> > On Sat, Oct 27, 2018 at 5:27 AM Dave Chinner <david@fromorbit.com> wrote:
> > >
> > > On Fri, Oct 26, 2018 at 04:10:48PM -0400, Olga Kornievskaia wrote:
> > > > From: Olga Kornievskaia <kolga@netapp.com>
> > > >
> > > > Input source offset can't be beyond the end of the file.
> > > >
> > > > Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> > > > ---
> > > >  fs/read_write.c | 3 +++
> > > >  1 file changed, 3 insertions(+)
> > > >
> > > > diff --git a/fs/read_write.c b/fs/read_write.c
> > > > index fb4ffca..b3b304e 100644
> > > > --- a/fs/read_write.c
> > > > +++ b/fs/read_write.c
> > > > @@ -1594,6 +1594,9 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
> > > >               }
> > > >       }
> > > >
> > > > +     if (pos_in >= i_size_read(inode_in))
> > > > +             return -EINVAL;
> > > > +
> > >
> > > vfs_copy_file_range seems ot be missing a wide range of checks.
> > > rlimit, s_maxbytes, LFS file sizes, etc. This is a write, so all the
> > > checks in generic_write_checks() apply, right? And the same security
> > > issues like stripping setuid bits, etc? And we need to touch
> > > atime on the source file, too?
> >
> > Yes sound like needed checks.
> >
> > > We've just merged 5 or so patches in 4.19-rc8 and we're ready to
> > > merge another ~30 patch series to fix all the stuff missing from the
> > > clone/dedupe file range operations that make them safe and robust.
> > > It seems like copy_file_range is all the checks it needs, too?
> >
> > Are you proposing to not do this check now in favor of the proper work
> > that will do all of those checks you listed above?
>
> No, I'm saying that if you're adding one check, there's a whole heap
> of checks that still need to be added, *especially* if this is going
> to fall back to page cache copy between superblocks that may have
> different limits and constraints.
>
> There's security issues in this API. They need to be fixed before we
> allow it to do more and potentially expose more problems due to it's
> wider capability.

Before I totally give up on this feature, can you help me understand
your concerns with allowing the generic copy_file_range via
do_splice().

I have mentioned I'm not a VFS expert thus I come from just looking at
the available documentation and the code.

I don't see any restrictions on the files being passed in the
do_splice_direct(). There are no restrictions that they must be from
the same filesystem or file system type. But perhaps this not the
concern you had but more about checking validity of arguments?

I have looked at Dave Wong's, if I'm not mistaken these 2 are the
relevant patches:
[PATCH 02/28] vfs: check file ranges before cloning files
 -- a couple but not all checks apply to copy_file_range() .
specifically, the offsets don't wrap and offset isn't past eof (as my
patch suggests). Other checks have to do with aligned memory which I
don't believe is needed or other dedup requirement that don't apply.

[PATCH 04/28] vfs: strengthen checking of file range inputs to
generic_remap_checks
 -- these checks apply to the code once we fall back to the
do_splice(). it looks to me that perhaps exporting
generic_access_check_limits()/generic_write_check_limits so that they
can be used by the copy_file_range when it get pulled in.

Also, can you elaborate one what "security issues" are present in this
API? Is it "stripping setuid bits" and so something like calling
file_remove_priv() that should be done when the fallback to the
do_splice_direct() happens?

As for the atime, wouldn't the ->copy_file_range() be updating the
file attributes? I guess for the fallback case, the attributes need to
be updated.

If those are checks/issues needed to be addressed and would then get
the generic copy_file_range() in, I could give a go at a patch (or 2).
>
> > I can not volunteer
> > to provide this comprehensive check.
>
> Why not?
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v4 02/11] VFS: copy_file_range check validity of input source offset
  2018-10-30 21:10         ` Olga Kornievskaia
@ 2018-10-30 21:12           ` Olga Kornievskaia
  2018-10-31  0:14           ` Dave Chinner
  1 sibling, 0 replies; 43+ messages in thread
From: Olga Kornievskaia @ 2018-10-30 21:12 UTC (permalink / raw)
  To: david
  Cc: trond.myklebust, Anna Schumaker, viro, Steve French,
	Miklos Szeredi, linux-nfs, linux-fsdevel, linux-cifs,
	linux-unionfs, linux-man

On Tue, Oct 30, 2018 at 5:10 PM Olga Kornievskaia
<olga.kornievskaia@gmail.com> wrote:
>
> On Tue, Oct 30, 2018 at 5:03 AM Dave Chinner <david@fromorbit.com> wrote:
> >
> > On Mon, Oct 29, 2018 at 10:41:22AM -0400, Olga Kornievskaia wrote:
> > > On Sat, Oct 27, 2018 at 5:27 AM Dave Chinner <david@fromorbit.com> wrote:
> > > >
> > > > On Fri, Oct 26, 2018 at 04:10:48PM -0400, Olga Kornievskaia wrote:
> > > > > From: Olga Kornievskaia <kolga@netapp.com>
> > > > >
> > > > > Input source offset can't be beyond the end of the file.
> > > > >
> > > > > Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> > > > > ---
> > > > >  fs/read_write.c | 3 +++
> > > > >  1 file changed, 3 insertions(+)
> > > > >
> > > > > diff --git a/fs/read_write.c b/fs/read_write.c
> > > > > index fb4ffca..b3b304e 100644
> > > > > --- a/fs/read_write.c
> > > > > +++ b/fs/read_write.c
> > > > > @@ -1594,6 +1594,9 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
> > > > >               }
> > > > >       }
> > > > >
> > > > > +     if (pos_in >= i_size_read(inode_in))
> > > > > +             return -EINVAL;
> > > > > +
> > > >
> > > > vfs_copy_file_range seems ot be missing a wide range of checks.
> > > > rlimit, s_maxbytes, LFS file sizes, etc. This is a write, so all the
> > > > checks in generic_write_checks() apply, right? And the same security
> > > > issues like stripping setuid bits, etc? And we need to touch
> > > > atime on the source file, too?
> > >
> > > Yes sound like needed checks.
> > >
> > > > We've just merged 5 or so patches in 4.19-rc8 and we're ready to
> > > > merge another ~30 patch series to fix all the stuff missing from the
> > > > clone/dedupe file range operations that make them safe and robust.
> > > > It seems like copy_file_range is all the checks it needs, too?
> > >
> > > Are you proposing to not do this check now in favor of the proper work
> > > that will do all of those checks you listed above?
> >
> > No, I'm saying that if you're adding one check, there's a whole heap
> > of checks that still need to be added, *especially* if this is going
> > to fall back to page cache copy between superblocks that may have
> > different limits and constraints.
> >
> > There's security issues in this API. They need to be fixed before we
> > allow it to do more and potentially expose more problems due to it's
> > wider capability.
>
> Before I totally give up on this feature, can you help me understand
> your concerns with allowing the generic copy_file_range via
> do_splice().
>
> I have mentioned I'm not a VFS expert thus I come from just looking at
> the available documentation and the code.
>
> I don't see any restrictions on the files being passed in the
> do_splice_direct(). There are no restrictions that they must be from
> the same filesystem or file system type. But perhaps this not the
> concern you had but more about checking validity of arguments?
>
> I have looked at Dave Wong's, if I'm not mistaken these 2 are the

apologizes Darrick Wong's patches.

> relevant patches:
> [PATCH 02/28] vfs: check file ranges before cloning files
>  -- a couple but not all checks apply to copy_file_range() .
> specifically, the offsets don't wrap and offset isn't past eof (as my
> patch suggests). Other checks have to do with aligned memory which I
> don't believe is needed or other dedup requirement that don't apply.
>
> [PATCH 04/28] vfs: strengthen checking of file range inputs to
> generic_remap_checks
>  -- these checks apply to the code once we fall back to the
> do_splice(). it looks to me that perhaps exporting
> generic_access_check_limits()/generic_write_check_limits so that they
> can be used by the copy_file_range when it get pulled in.
>
> Also, can you elaborate one what "security issues" are present in this
> API? Is it "stripping setuid bits" and so something like calling
> file_remove_priv() that should be done when the fallback to the
> do_splice_direct() happens?
>
> As for the atime, wouldn't the ->copy_file_range() be updating the
> file attributes? I guess for the fallback case, the attributes need to
> be updated.
>
> If those are checks/issues needed to be addressed and would then get
> the generic copy_file_range() in, I could give a go at a patch (or 2).
> >
> > > I can not volunteer
> > > to provide this comprehensive check.
> >
> > Why not?
> >
> > Cheers,
> >
> > Dave.
> > --
> > Dave Chinner
> > david@fromorbit.com

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v4 02/11] VFS: copy_file_range check validity of input source offset
  2018-10-30 13:40         ` Olga Kornievskaia
@ 2018-10-30 23:40           ` Dave Chinner
  0 siblings, 0 replies; 43+ messages in thread
From: Dave Chinner @ 2018-10-30 23:40 UTC (permalink / raw)
  To: Olga Kornievskaia
  Cc: trond.myklebust, Anna Schumaker, viro, Steve French,
	Miklos Szeredi, linux-nfs, linux-fsdevel, linux-cifs,
	linux-unionfs, linux-man

On Tue, Oct 30, 2018 at 09:40:09AM -0400, Olga Kornievskaia wrote:
> On Tue, Oct 30, 2018 at 5:03 AM Dave Chinner <david@fromorbit.com> wrote:
> >
> > On Mon, Oct 29, 2018 at 10:41:22AM -0400, Olga Kornievskaia wrote:
> > > On Sat, Oct 27, 2018 at 5:27 AM Dave Chinner <david@fromorbit.com> wrote:
> > > >
> > > > On Fri, Oct 26, 2018 at 04:10:48PM -0400, Olga Kornievskaia wrote:
> > > > > From: Olga Kornievskaia <kolga@netapp.com>
> > > > >
> > > > > Input source offset can't be beyond the end of the file.
> > > > >
> > > > > Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> > > > > ---
> > > > >  fs/read_write.c | 3 +++
> > > > >  1 file changed, 3 insertions(+)
> > > > >
> > > > > diff --git a/fs/read_write.c b/fs/read_write.c
> > > > > index fb4ffca..b3b304e 100644
> > > > > --- a/fs/read_write.c
> > > > > +++ b/fs/read_write.c
> > > > > @@ -1594,6 +1594,9 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
> > > > >               }
> > > > >       }
> > > > >
> > > > > +     if (pos_in >= i_size_read(inode_in))
> > > > > +             return -EINVAL;
> > > > > +
> > > >
> > > > vfs_copy_file_range seems ot be missing a wide range of checks.
> > > > rlimit, s_maxbytes, LFS file sizes, etc. This is a write, so all the
> > > > checks in generic_write_checks() apply, right? And the same security
> > > > issues like stripping setuid bits, etc? And we need to touch
> > > > atime on the source file, too?
> > >
> > > Yes sound like needed checks.
> > >
> > > > We've just merged 5 or so patches in 4.19-rc8 and we're ready to
> > > > merge another ~30 patch series to fix all the stuff missing from the
> > > > clone/dedupe file range operations that make them safe and robust.
> > > > It seems like copy_file_range is all the checks it needs, too?
> > >
> > > Are you proposing to not do this check now in favor of the proper work
> > > that will do all of those checks you listed above?
> >
> > No, I'm saying that if you're adding one check, there's a whole heap
> > of checks that still need to be added, *especially* if this is going
> > to fall back to page cache copy between superblocks that may have
> > different limits and constraints.
> >
> > There's security issues in this API. They need to be fixed before we
> > allow it to do more and potentially expose more problems due to it's
> > wider capability.
> 
> Sounds like you are arguing that enabling generic copy_file_range()
> via do_splice() isn't a good thing without those checks.

It's already enabled for filesystems that don't provide
->remap_file_range or ->copy_file_range. Your changes don't
introduce new problems, that just made me aware that there are
serious problems here, too.

The problem we've got now is that filesystem that use
->remap_file_range() are going to have very different error
detection and handling when Darrick's patchset is merged. They are
going to reject setuid files, obey rlimit, LFS, superblock max
sizes, etc. while ->copy_file_range() and do_splice_direct() methods
are not.

That really needs to be fixed.

> I'm totally
> OK with removing this functionality. As I mentioned earlier I added it
> as I thought it was beneficial and I assumed that do_splice() was a
> generic enough API to handle what was needed.

That doesn't fix the problems that are already there. :/

> > > I can not volunteer
> > > to provide this comprehensive check.
> >
> > Why not?
> 
> I don't have the expertise in the VFS code.

Just like all the other people without VFS expertise who fix
problems introduced in code written by people without VFS expertise.
I'm certainly no VFS expert, either, I've just seen, found and
helped fix a bunch of serious problems in the VFS dedupe/reflink
code that were uncovered by data corruptions on XFS filesystems.

That's part of being a filesystem developer - the VFS is "common
property" of all the filesystems and everyone has to do their bit to
maintain it....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v4 02/11] VFS: copy_file_range check validity of input source offset
  2018-10-30 21:10         ` Olga Kornievskaia
  2018-10-30 21:12           ` Olga Kornievskaia
@ 2018-10-31  0:14           ` Dave Chinner
  2018-10-31 14:51             ` Olga Kornievskaia
  1 sibling, 1 reply; 43+ messages in thread
From: Dave Chinner @ 2018-10-31  0:14 UTC (permalink / raw)
  To: Olga Kornievskaia
  Cc: trond.myklebust, Anna Schumaker, viro, Steve French,
	Miklos Szeredi, linux-nfs, linux-fsdevel, linux-cifs,
	linux-unionfs, linux-man

On Tue, Oct 30, 2018 at 05:10:58PM -0400, Olga Kornievskaia wrote:
> On Tue, Oct 30, 2018 at 5:03 AM Dave Chinner <david@fromorbit.com> wrote:
> >
> > On Mon, Oct 29, 2018 at 10:41:22AM -0400, Olga Kornievskaia wrote:
> > > On Sat, Oct 27, 2018 at 5:27 AM Dave Chinner <david@fromorbit.com> wrote:
> > > >
> > > > On Fri, Oct 26, 2018 at 04:10:48PM -0400, Olga Kornievskaia wrote:
> > > > > From: Olga Kornievskaia <kolga@netapp.com>
> > > > >
> > > > > Input source offset can't be beyond the end of the file.
> > > > >
> > > > > Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> > > > > ---
> > > > >  fs/read_write.c | 3 +++
> > > > >  1 file changed, 3 insertions(+)
> > > > >
> > > > > diff --git a/fs/read_write.c b/fs/read_write.c
> > > > > index fb4ffca..b3b304e 100644
> > > > > --- a/fs/read_write.c
> > > > > +++ b/fs/read_write.c
> > > > > @@ -1594,6 +1594,9 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
> > > > >               }
> > > > >       }
> > > > >
> > > > > +     if (pos_in >= i_size_read(inode_in))
> > > > > +             return -EINVAL;
> > > > > +
> > > >
> > > > vfs_copy_file_range seems ot be missing a wide range of checks.
> > > > rlimit, s_maxbytes, LFS file sizes, etc. This is a write, so all the
> > > > checks in generic_write_checks() apply, right? And the same security
> > > > issues like stripping setuid bits, etc? And we need to touch
> > > > atime on the source file, too?
> > >
> > > Yes sound like needed checks.
> > >
> > > > We've just merged 5 or so patches in 4.19-rc8 and we're ready to
> > > > merge another ~30 patch series to fix all the stuff missing from the
> > > > clone/dedupe file range operations that make them safe and robust.
> > > > It seems like copy_file_range is all the checks it needs, too?
> > >
> > > Are you proposing to not do this check now in favor of the proper work
> > > that will do all of those checks you listed above?
> >
> > No, I'm saying that if you're adding one check, there's a whole heap
> > of checks that still need to be added, *especially* if this is going
> > to fall back to page cache copy between superblocks that may have
> > different limits and constraints.
> >
> > There's security issues in this API. They need to be fixed before we
> > allow it to do more and potentially expose more problems due to it's
> > wider capability.
> 
> Before I totally give up on this feature, can you help me understand
> your concerns with allowing the generic copy_file_range via
> do_splice().

it's not do_splice_direct() i'm concerned about. It's /writing data
without adequate checks/ that I'm concerned about.
->copy_file_range() also writes data, so it needs to undergo the
same safety checks as well.

> I have mentioned I'm not a VFS expert thus I come from just looking at
> the available documentation and the code.
> 
> I don't see any restrictions on the files being passed in the
> do_splice_direct(). There are no restrictions that they must be from
> the same filesystem or file system type. But perhaps this not the
> concern you had but more about checking validity of arguments?
> 
> I have looked at Dave Wong's, if I'm not mistaken these 2 are the
> relevant patches:
> [PATCH 02/28] vfs: check file ranges before cloning files
>  -- a couple but not all checks apply to copy_file_range() .

Yes, of course - clone/dedupe have different constraints, but the
core checks are still needed for copy_file_range(). 

For example, the man page says:

EINVAL
	Requested range extends beyond the end of the source
	file; or the flags argument is not 0.

Your patch above doesn't actually check that - it only checks if the
pos_in is beyond EOF. It needs to check if pos_in + len is beyond
EOF. After checking for wraps, of course.

> [PATCH 04/28] vfs: strengthen checking of file range inputs to
> generic_remap_checks
>  -- these checks apply to the code once we fall back to the
> do_splice().

man page says:

EFBIG
       An  attempt  was made to write a file that exceeds the
       implementation-defined maximum file size or the process's
       file size limit, or to write at a position past the maximum
       allowed offset.

These conditions apply to the destination file regards of the method
used to copy the data. That's what the generic methods now check for
clone/dedupe, and need to be used here, too.

e.g. In the case of NFS, if the user ion the NFS client is not
allowed to copy a file (e.g. say user RLIMIT restrictions) then that
should not be bypassed just because the NFS client can do a server
side copy. The restriction has been placed on the local user calling
copy_file_range(), not on the server side implementation.

> Also, can you elaborate one what "security issues" are present in this
> API? Is it "stripping setuid bits" and so something like calling
> file_remove_priv() that should be done when the fallback to the
> do_splice_direct() happens?

>From 4.19-rc8:

7debbf015f58 xfs: update ctime and remove suid before cloning files

Which then got moved into the generic remap_file_range code in
Darrick's "vfs: remap helper should update destination inode
metadata" patch:

https://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git/commit/?h=for-next&id=8dde90bca6fca3736ea20109654bcf6dcf2ecf1d

We can't assume that a server side copy is going to strip setuid
bits or even update target files c/mtimes. 

> As for the atime, wouldn't the ->copy_file_range() be updating the
> file attributes? I guess for the fallback case, the attributes need to
> be updated.

Generic VFS code should take care of atime updates. Hence if the
source file needs an atime update, it should be in the generic code
and done for all copy methods.

> If those are checks/issues needed to be addressed and would then get
> the generic copy_file_range() in, I could give a go at a patch (or 2).

yes, those are the ones I'm aware of. BUt there may be more, I
haven't really looked that closely...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v4 02/11] VFS: copy_file_range check validity of input source offset
  2018-10-31  0:14           ` Dave Chinner
@ 2018-10-31 14:51             ` Olga Kornievskaia
  2018-10-31 23:33               ` Dave Chinner
  0 siblings, 1 reply; 43+ messages in thread
From: Olga Kornievskaia @ 2018-10-31 14:51 UTC (permalink / raw)
  To: david
  Cc: trond.myklebust, Anna Schumaker, viro, Steve French,
	Miklos Szeredi, linux-nfs, linux-fsdevel, linux-cifs,
	linux-unionfs, linux-man

On Tue, Oct 30, 2018 at 8:15 PM Dave Chinner <david@fromorbit.com> wrote:
>
> On Tue, Oct 30, 2018 at 05:10:58PM -0400, Olga Kornievskaia wrote:
> > On Tue, Oct 30, 2018 at 5:03 AM Dave Chinner <david@fromorbit.com> wrote:
> > >
> > > On Mon, Oct 29, 2018 at 10:41:22AM -0400, Olga Kornievskaia wrote:
> > > > On Sat, Oct 27, 2018 at 5:27 AM Dave Chinner <david@fromorbit.com> wrote:
> > > > >
> > > > > On Fri, Oct 26, 2018 at 04:10:48PM -0400, Olga Kornievskaia wrote:
> > > > > > From: Olga Kornievskaia <kolga@netapp.com>
> > > > > >
> > > > > > Input source offset can't be beyond the end of the file.
> > > > > >
> > > > > > Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> > > > > > ---
> > > > > >  fs/read_write.c | 3 +++
> > > > > >  1 file changed, 3 insertions(+)
> > > > > >
> > > > > > diff --git a/fs/read_write.c b/fs/read_write.c
> > > > > > index fb4ffca..b3b304e 100644
> > > > > > --- a/fs/read_write.c
> > > > > > +++ b/fs/read_write.c
> > > > > > @@ -1594,6 +1594,9 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
> > > > > >               }
> > > > > >       }
> > > > > >
> > > > > > +     if (pos_in >= i_size_read(inode_in))
> > > > > > +             return -EINVAL;
> > > > > > +
> > > > >
> > > > > vfs_copy_file_range seems ot be missing a wide range of checks.
> > > > > rlimit, s_maxbytes, LFS file sizes, etc. This is a write, so all the
> > > > > checks in generic_write_checks() apply, right? And the same security
> > > > > issues like stripping setuid bits, etc? And we need to touch
> > > > > atime on the source file, too?
> > > >
> > > > Yes sound like needed checks.
> > > >
> > > > > We've just merged 5 or so patches in 4.19-rc8 and we're ready to
> > > > > merge another ~30 patch series to fix all the stuff missing from the
> > > > > clone/dedupe file range operations that make them safe and robust.
> > > > > It seems like copy_file_range is all the checks it needs, too?
> > > >
> > > > Are you proposing to not do this check now in favor of the proper work
> > > > that will do all of those checks you listed above?
> > >
> > > No, I'm saying that if you're adding one check, there's a whole heap
> > > of checks that still need to be added, *especially* if this is going
> > > to fall back to page cache copy between superblocks that may have
> > > different limits and constraints.
> > >
> > > There's security issues in this API. They need to be fixed before we
> > > allow it to do more and potentially expose more problems due to it's
> > > wider capability.
> >
> > Before I totally give up on this feature, can you help me understand
> > your concerns with allowing the generic copy_file_range via
> > do_splice().
>
> it's not do_splice_direct() i'm concerned about. It's /writing data
> without adequate checks/ that I'm concerned about.
> ->copy_file_range() also writes data, so it needs to undergo the
> same safety checks as well.

Thank you Dave for clarifying and elaborating on the points. As you
pointed out this concerns apply to the current code the same way as to
the patch series. Those concerns should be address however I feel like
they shouldn't be the responsibility of this particular patch series.
Therefore, I ask for the community to either make any final comments
for any changes that are needed to "version 7" patches and if no more
comments arise I would like to ask for this to be added to the queue
for the next kernel version.

Then the next patch series would be just VFS and would add appropriate
checks and then allow for the generic copy_file_range() via do_splice.

> > I have mentioned I'm not a VFS expert thus I come from just looking at
> > the available documentation and the code.
> >
> > I don't see any restrictions on the files being passed in the
> > do_splice_direct(). There are no restrictions that they must be from
> > the same filesystem or file system type. But perhaps this not the
> > concern you had but more about checking validity of arguments?
> >
> > I have looked at Dave Wong's, if I'm not mistaken these 2 are the
> > relevant patches:
> > [PATCH 02/28] vfs: check file ranges before cloning files
> >  -- a couple but not all checks apply to copy_file_range() .
>
> Yes, of course - clone/dedupe have different constraints, but the
> core checks are still needed for copy_file_range().
>
> For example, the man page says:
>
> EINVAL
>         Requested range extends beyond the end of the source
>         file; or the flags argument is not 0.
>
> Your patch above doesn't actually check that - it only checks if the
> pos_in is beyond EOF. It needs to check if pos_in + len is beyond
> EOF. After checking for wraps, of course.

There was a reason why I didn't include the "pos_in + len" check. It
sparked the conversation why should "pos_in + len" be an error, when a
"read" system call would just return a "short" read and EOF. So I
dropped the check for "pst_in + len" to be an error.

https://www.spinics.net/lists/linux-nfs/msg62627.html  (probably not
all the conversations but some of them)

> > [PATCH 04/28] vfs: strengthen checking of file range inputs to
> > generic_remap_checks
> >  -- these checks apply to the code once we fall back to the
> > do_splice().
>
> man page says:
>
> EFBIG
>        An  attempt  was made to write a file that exceeds the
>        implementation-defined maximum file size or the process's
>        file size limit, or to write at a position past the maximum
>        allowed offset.
>
> These conditions apply to the destination file regards of the method
> used to copy the data. That's what the generic methods now check for
> clone/dedupe, and need to be used here, too.

Agreed and once Darrek patches are in, copy_file_range() can use them too.

> e.g. In the case of NFS, if the user ion the NFS client is not
> allowed to copy a file (e.g. say user RLIMIT restrictions) then that
> should not be bypassed just because the NFS client can do a server
> side copy. The restriction has been placed on the local user calling
> copy_file_range(), not on the server side implementation.
>
> > Also, can you elaborate one what "security issues" are present in this
> > API? Is it "stripping setuid bits" and so something like calling
> > file_remove_priv() that should be done when the fallback to the
> > do_splice_direct() happens?
>
> From 4.19-rc8:
>
> 7debbf015f58 xfs: update ctime and remove suid before cloning files
>
> Which then got moved into the generic remap_file_range code in
> Darrick's "vfs: remap helper should update destination inode
> metadata" patch:
>
> https://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git/commit/?h=for-next&id=8dde90bca6fca3736ea20109654bcf6dcf2ecf1d
>
> We can't assume that a server side copy is going to strip setuid
> bits or even update target files c/mtimes.

I would like to discuss your concerns about updating attributes
(c/m/atimes), why shouldn't it be a ->copy_file_range()
responsibility. copy_file_rage is basically a read+write. As far as I
can tell, vfs_read and vfs_write (in VFS) don't deal with updating
attributes. I'm guessing it's assumed that underlying file systems are
going to take care of it (unless of course I misread the code).

> > As for the atime, wouldn't the ->copy_file_range() be updating the
> > file attributes? I guess for the fallback case, the attributes need to
> > be updated.
>
> Generic VFS code should take care of atime updates. Hence if the
> source file needs an atime update, it should be in the generic code
> and done for all copy methods.
>
> > If those are checks/issues needed to be addressed and would then get
> > the generic copy_file_range() in, I could give a go at a patch (or 2).
>
> yes, those are the ones I'm aware of. BUt there may be more, I
> haven't really looked that closely...
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v4 02/11] VFS: copy_file_range check validity of input source offset
  2018-10-31 14:51             ` Olga Kornievskaia
@ 2018-10-31 23:33               ` Dave Chinner
  0 siblings, 0 replies; 43+ messages in thread
From: Dave Chinner @ 2018-10-31 23:33 UTC (permalink / raw)
  To: Olga Kornievskaia
  Cc: trond.myklebust, Anna Schumaker, viro, Steve French,
	Miklos Szeredi, linux-nfs, linux-fsdevel, linux-cifs,
	linux-unionfs, linux-man

On Wed, Oct 31, 2018 at 10:51:48AM -0400, Olga Kornievskaia wrote:
> On Tue, Oct 30, 2018 at 8:15 PM Dave Chinner <david@fromorbit.com> wrote:
> >
> > On Tue, Oct 30, 2018 at 05:10:58PM -0400, Olga Kornievskaia wrote:
> > > On Tue, Oct 30, 2018 at 5:03 AM Dave Chinner <david@fromorbit.com> wrote:
> > > >
> > > > On Mon, Oct 29, 2018 at 10:41:22AM -0400, Olga Kornievskaia wrote:
> > > > > On Sat, Oct 27, 2018 at 5:27 AM Dave Chinner <david@fromorbit.com> wrote:
> > > > > >
> > > > > > On Fri, Oct 26, 2018 at 04:10:48PM -0400, Olga Kornievskaia wrote:
> > > > > > > From: Olga Kornievskaia <kolga@netapp.com>
> > > > > > >
> > > > > > > Input source offset can't be beyond the end of the file.
> > > > > > >
> > > > > > > Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> > > > > > > ---
> > > > > > >  fs/read_write.c | 3 +++
> > > > > > >  1 file changed, 3 insertions(+)
> > > > > > >
> > > > > > > diff --git a/fs/read_write.c b/fs/read_write.c
> > > > > > > index fb4ffca..b3b304e 100644
> > > > > > > --- a/fs/read_write.c
> > > > > > > +++ b/fs/read_write.c
> > > > > > > @@ -1594,6 +1594,9 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
> > > > > > >               }
> > > > > > >       }
> > > > > > >
> > > > > > > +     if (pos_in >= i_size_read(inode_in))
> > > > > > > +             return -EINVAL;
> > > > > > > +
> > > > > >
> > > > > > vfs_copy_file_range seems ot be missing a wide range of checks.
> > > > > > rlimit, s_maxbytes, LFS file sizes, etc. This is a write, so all the
> > > > > > checks in generic_write_checks() apply, right? And the same security
> > > > > > issues like stripping setuid bits, etc? And we need to touch
> > > > > > atime on the source file, too?
> > > > >
> > > > > Yes sound like needed checks.
> > > > >
> > > > > > We've just merged 5 or so patches in 4.19-rc8 and we're ready to
> > > > > > merge another ~30 patch series to fix all the stuff missing from the
> > > > > > clone/dedupe file range operations that make them safe and robust.
> > > > > > It seems like copy_file_range is all the checks it needs, too?
> > > > >
> > > > > Are you proposing to not do this check now in favor of the proper work
> > > > > that will do all of those checks you listed above?
> > > >
> > > > No, I'm saying that if you're adding one check, there's a whole heap
> > > > of checks that still need to be added, *especially* if this is going
> > > > to fall back to page cache copy between superblocks that may have
> > > > different limits and constraints.
> > > >
> > > > There's security issues in this API. They need to be fixed before we
> > > > allow it to do more and potentially expose more problems due to it's
> > > > wider capability.
> > >
> > > Before I totally give up on this feature, can you help me understand
> > > your concerns with allowing the generic copy_file_range via
> > > do_splice().
> >
> > it's not do_splice_direct() i'm concerned about. It's /writing data
> > without adequate checks/ that I'm concerned about.
> > ->copy_file_range() also writes data, so it needs to undergo the
> > same safety checks as well.
> 
> Thank you Dave for clarifying and elaborating on the points. As you
> pointed out this concerns apply to the current code the same way as to
> the patch series. Those concerns should be address however I feel like
> they shouldn't be the responsibility of this particular patch series.
> Therefore, I ask for the community to either make any final comments
> for any changes that are needed to "version 7" patches and if no more
> comments arise I would like to ask for this to be added to the queue
> for the next kernel version.
> 
> Then the next patch series would be just VFS and would add appropriate
> checks and then allow for the generic copy_file_range() via do_splice.

That's fine by me.

> 
> > > I have mentioned I'm not a VFS expert thus I come from just looking at
> > > the available documentation and the code.
> > >
> > > I don't see any restrictions on the files being passed in the
> > > do_splice_direct(). There are no restrictions that they must be from
> > > the same filesystem or file system type. But perhaps this not the
> > > concern you had but more about checking validity of arguments?
> > >
> > > I have looked at Dave Wong's, if I'm not mistaken these 2 are the
> > > relevant patches:
> > > [PATCH 02/28] vfs: check file ranges before cloning files
> > >  -- a couple but not all checks apply to copy_file_range() .
> >
> > Yes, of course - clone/dedupe have different constraints, but the
> > core checks are still needed for copy_file_range().
> >
> > For example, the man page says:
> >
> > EINVAL
> >         Requested range extends beyond the end of the source
> >         file; or the flags argument is not 0.
> >
> > Your patch above doesn't actually check that - it only checks if the
> > pos_in is beyond EOF. It needs to check if pos_in + len is beyond
> > EOF. After checking for wraps, of course.
> 
> There was a reason why I didn't include the "pos_in + len" check. It
> sparked the conversation why should "pos_in + len" be an error, when a
> "read" system call would just return a "short" read and EOF. So I
> dropped the check for "pst_in + len" to be an error.

So man page patches will be required, too. :)

Basically, we need to nail down the expected semantics, make sure
they are correctly documented and /enforced consistently/ across all
filesystems.


> > >  -- these checks apply to the code once we fall back to the
> > > do_splice().
> >
> > man page says:
> >
> > EFBIG
> >        An  attempt  was made to write a file that exceeds the
> >        implementation-defined maximum file size or the process's
> >        file size limit, or to write at a position past the maximum
> >        allowed offset.
> >
> > These conditions apply to the destination file regards of the method
> > used to copy the data. That's what the generic methods now check for
> > clone/dedupe, and need to be used here, too.
> 
> Agreed and once Darrek patches are in, copy_file_range() can use them too.

Should be in the next couple of days.

> > 7debbf015f58 xfs: update ctime and remove suid before cloning files
> >
> > Which then got moved into the generic remap_file_range code in
> > Darrick's "vfs: remap helper should update destination inode
> > metadata" patch:
> >
> > https://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git/commit/?h=for-next&id=8dde90bca6fca3736ea20109654bcf6dcf2ecf1d
> >
> > We can't assume that a server side copy is going to strip setuid
> > bits or even update target files c/mtimes.
> 
> I would like to discuss your concerns about updating attributes
> (c/m/atimes), why shouldn't it be a ->copy_file_range()
> responsibility. copy_file_rage is basically a read+write. As far as I
> can tell, vfs_read and vfs_write (in VFS) don't deal with updating
> attributes.

You're looking at the wrong level. The VFS layer is the first
multiplexing layer, allowing filesystems to select a method of
handling functionality. They then make use of "generic helpers"
to implement the required functionality, and they contain the
required updates.

ie.g. A list of generic helpers with atime update callers from my
cscope index:

f fs/pipe.c          pipe_read                   343 file_accessed(filp);
h fs/readdir.c       iterate_dir                  56 file_accessed(file);
i fs/splice.c        generic_file_splice_read    311 file_accessed(in);
j fs/splice.c        splice_direct_to_actor      992 file_accessed(in);
p mm/filemap.c       generic_file_buffered_read 2299 file_accessed(filp);
q mm/filemap.c       generic_file_read_iter     2339 file_accessed(file);
r mm/filemap.c       generic_file_mmap          2736 file_accessed(file);

These are effectively reference implementations of the file reading
infrastructure. Filesystems often have customised implementations
but they all must contain the same functioanlity and behaviour as
the reference implementation.

> I'm guessing it's assumed that underlying file systems are
> going to take care of it (unless of course I misread the code).

Only the ones that don't specifically call the generic helper to do
the work.

IOWs, what I'd like to see is a generic_copy_file_range() as the
reference implemenation using a page cache copy. This contains all
the required checks, timestamp updates, etc. If the filesystem does
not supply ->copy_file_range, then generic_copy_file_range() is
called, not do_splice_direct(). Indeed, a filesystem should be able
to do:

	.copy_file_range = xfs_copy_file_range,

xfs_copy_file_range(...)
{
	trace_xfs_copy_file_range(...)
	return generic_copy_file_range(....);
}

and have everything work correctly.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 43+ messages in thread

end of thread, other threads:[~2018-11-01  8:33 UTC | newest]

Thread overview: 43+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-10-26 20:10 [PATCH v4 00/11] client-side support for "inter" SSC copy Olga Kornievskaia
2018-10-26 20:10 ` [PATCH v4 01/11] VFS: move cross device copy_file_range() check into filesystems Olga Kornievskaia
2018-10-26 21:23   ` Matthew Wilcox
2018-10-26 22:10   ` Steve French
2018-10-27  9:09   ` Dave Chinner
2018-10-29 14:31     ` Olga Kornievskaia
2018-10-27 11:11   ` Jeff Layton
2018-10-26 20:10 ` [PATCH 1/1] man-page: copy_file_range(2) allow for cross-device copies Olga Kornievskaia
2018-10-27  9:12   ` Dave Chinner
2018-10-27 13:23     ` Matthew Wilcox
2018-10-28  1:33       ` Dave Chinner
2018-10-28  2:39         ` Matthew Wilcox
2018-10-29 14:25         ` Olga Kornievskaia
2018-10-29 15:52           ` Olga Kornievskaia
2018-10-29 17:49             ` Amir Goldstein
2018-10-26 20:10 ` [PATCH v4 02/11] VFS: copy_file_range check validity of input source offset Olga Kornievskaia
2018-10-26 21:26   ` Matthew Wilcox
2018-10-29 16:09     ` Olga Kornievskaia
2018-10-27  9:27   ` Dave Chinner
2018-10-29 14:41     ` Olga Kornievskaia
2018-10-30  9:03       ` Dave Chinner
2018-10-30 13:40         ` Olga Kornievskaia
2018-10-30 23:40           ` Dave Chinner
2018-10-30 21:10         ` Olga Kornievskaia
2018-10-30 21:12           ` Olga Kornievskaia
2018-10-31  0:14           ` Dave Chinner
2018-10-31 14:51             ` Olga Kornievskaia
2018-10-31 23:33               ` Dave Chinner
2018-10-26 20:10 ` [PATCH v4 03/11] NFS: NFSD defining nl4_servers structure needed by both Olga Kornievskaia
2018-10-27 11:14   ` Jeff Layton
2018-10-29 14:28     ` Olga Kornievskaia
2018-10-26 20:10 ` [PATCH v4 04/11] NFS: add COPY_NOTIFY operation Olga Kornievskaia
2018-10-26 20:10 ` [PATCH v4 05/11] NFS: add ca_source_server<> to COPY Olga Kornievskaia
2018-10-26 20:10 ` [PATCH v4 06/11] NFS: also send OFFLOAD_CANCEL to source server Olga Kornievskaia
2018-10-26 20:10 ` [PATCH v4 07/11] NFS: inter ssc open Olga Kornievskaia
2018-10-26 20:10 ` [PATCH v4 08/11] NFS: skip recovery of copy open on dest server Olga Kornievskaia
2018-10-26 20:10 ` [PATCH v4 09/11] NFS: for "inter" copy treat ESTALE as ENOTSUPP Olga Kornievskaia
2018-10-26 20:10 ` [PATCH v4 10/11] NFS: COPY handle ERR_OFFLOAD_DENIED Olga Kornievskaia
2018-10-26 20:10 ` [PATCH v4 11/11] NFS: replace cross device check in copy_file_range Olga Kornievskaia
2018-10-26 21:22   ` Matthew Wilcox
2018-10-27 11:08   ` Jeff Layton
2018-10-27 13:26     ` Matthew Wilcox
2018-10-29 14:28       ` Olga Kornievskaia

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).