All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC v1 00/19] NFS support for inter and async COPY
@ 2017-03-02 16:01 Olga Kornievskaia
  2017-03-02 16:01 ` [RFC v1 01/19] fs: Don't copy beyond the end of the file Olga Kornievskaia
                   ` (18 more replies)
  0 siblings, 19 replies; 56+ messages in thread
From: Olga Kornievskaia @ 2017-03-02 16:01 UTC (permalink / raw)
  To: Trond.Myklebust; +Cc: linux-nfs

This patch series provides support for NFSv4.2 COPY featuring support
for asynchronous copy and inter SSC copy.

In case, of the "inter" SSC copy files reside on different servers and
thus under different superblocks and require that VFS removes the
restriction that src and dst files must be on the same superblock.

NFS's copy_file_range() determines if the copy is "intra" or "inter"
and for "inter" it sends the COPY_NOTIFY to the source server. Then,
it would send of an asynchronous COPY to the server (destination in
case of "inter"). If server errs with ERR_OFFLOAD_NOREQS the copy will
be re-sent as a synchronous COPY. If application cancels an in-flight
COPY, OFFLOAD_CANCEL is sent to the source server.

If server replies to the COPY with the copy stateid, client will go
wait on the CB_OFFLOAD. To fend off the race between CB_OFFLOAD and
COPY reply, we check the list of pending callbacks before going to
wait. Client adds the copy to the global list of copy stateids for the
callback to look thru and signal the waiting copy.

If application cancels async COPY after reply is received, wait will be
interrupted and client will send OFFLOAD_CANCEL to the source and
destination servers (sending it as an async RPC in the context of the
nfsiod_workqueue).

When the client receives reply from the CB_OFFLOAD with some bytes and
committed how is UNSTABLE, then COMMIT is sent to the server. The results
arep propagated to the VFS and application. Assuming that application
will deal with a partial result and continue from the new offset if needed.

Handling reboot of the destination server when client is waiting on the
CB_OFFLOAD happens when SEQUENCE discovers that destination server rebooted.
The open state initially is marked to be NFS_CLNT_DST_SSC_COPY_STATE
during the COPY. Then during the recovery if state is marked as such,
then look thru the list of copies for the server and see if any are
associated with this recovering open, if so mark the copy rebooted and
wake up the waiting copy. Upon wake up the waiting copy, will restart the
copy from scratch.

If the source server is rebooted, the destination server will also know
about it and it will return the partial result via CB_OFFLOAD, then the
result will be propagated back to the application which will initiate
the new copy and new COPY_NOTIFY will be sent.

If CB_OFFLOAD returned an error and non negative value of partial copy
and error is not ENOSPC, then ignore the error and send the commit
and return partial result to the client to start the next copy.

On the destination server, it's now acting as a client and needs to do
a special "open" and "close". Since destination server doesn't do an open
on the wire, we "fake" create the needed data structures and that's done
in the new function nfs42_ssc_open(). To clean up this open but not trigger
the CLOSE on the wire, we have a new function nfs42_ssc_close() that
accomplishes that.

Anna Schumaker (1):
  fs: Don't copy beyond the end of the file

Andy Adamson (4):
  VFS permit cross device vfs_copy_file_range
  NFS inter ssc open
  NFS add COPY_NOTIFY operation
  NFS add ca_source_server<> to COPY

Olga Kornievskaia (14):
  VFS don't try clone if superblocks are different
  NFS CB_OFFLOAD xdr
  NFS OFFLOAD_STATUS xdr
  NFS OFFLOAD_STATUS op
  NFS OFFLOAD_CANCEL xdr
  NFS COPY xdr handle async reply
  NFS add support for asynchronous COPY
  NFS handle COPY reply CB_OFFLOAD call race
  NFS send OFFLOAD_CANCEL when COPY killed
  NFS make COPY synchronous xdr configurable
  NFS handle COPY ERR_OFFLOAD_NO_REQS
  NFS skip recovery of copy open on dest server
  NFS recover from destination server reboot for copies
  NFS if we got partial copy ignore errors

 fs/nfs/callback.h         |  13 ++
 fs/nfs/callback_proc.c    |  52 +++++++
 fs/nfs/callback_xdr.c     |  80 +++++++++-
 fs/nfs/client.c           |   1 +
 fs/nfs/internal.h         |  10 ++
 fs/nfs/nfs42.h            |   9 +-
 fs/nfs/nfs42proc.c        | 352 ++++++++++++++++++++++++++++++++++++++++--
 fs/nfs/nfs42xdr.c         | 382 ++++++++++++++++++++++++++++++++++++++++++++--
 fs/nfs/nfs4_fs.h          |  12 ++
 fs/nfs/nfs4client.c       |  15 ++
 fs/nfs/nfs4file.c         | 147 +++++++++++++++++-
 fs/nfs/nfs4proc.c         |   8 +-
 fs/nfs/nfs4state.c        |  31 +++-
 fs/nfs/nfs4xdr.c          |   3 +
 fs/read_write.c           |  12 +-
 include/linux/nfs4.h      |  36 +++++
 include/linux/nfs_fs.h    |  11 ++
 include/linux/nfs_fs_sb.h |   5 +
 include/linux/nfs_xdr.h   |  34 +++++
 19 files changed, 1171 insertions(+), 42 deletions(-)

-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 56+ messages in thread

* [RFC v1 01/19] fs: Don't copy beyond the end of the file
  2017-03-02 16:01 [RFC v1 00/19] NFS support for inter and async COPY Olga Kornievskaia
@ 2017-03-02 16:01 ` Olga Kornievskaia
  2017-03-02 16:22   ` Christoph Hellwig
  2017-03-02 16:01 ` [RFC v1 02/19] VFS permit cross device vfs_copy_file_range Olga Kornievskaia
                   ` (17 subsequent siblings)
  18 siblings, 1 reply; 56+ messages in thread
From: Olga Kornievskaia @ 2017-03-02 16:01 UTC (permalink / raw)
  To: Trond.Myklebust; +Cc: linux-nfs

From: Anna Schumaker <Anna.Schumaker@Netapp.com>

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
---
 fs/read_write.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/read_write.c b/fs/read_write.c
index 5816d4c..1d9e305 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1526,6 +1526,9 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
 	if (unlikely(ret))
 		return ret;
 
+	if (pos_in >= i_size_read(inode_in))
+		return -EINVAL;
+
 	if (!(file_in->f_mode & FMODE_READ) ||
 	    !(file_out->f_mode & FMODE_WRITE) ||
 	    (file_out->f_flags & O_APPEND))
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [RFC v1 02/19] VFS permit cross device vfs_copy_file_range
  2017-03-02 16:01 [RFC v1 00/19] NFS support for inter and async COPY Olga Kornievskaia
  2017-03-02 16:01 ` [RFC v1 01/19] fs: Don't copy beyond the end of the file Olga Kornievskaia
@ 2017-03-02 16:01 ` Olga Kornievskaia
  2017-03-02 16:01 ` [RFC v1 03/19] VFS don't try clone if superblocks are different Olga Kornievskaia
                   ` (16 subsequent siblings)
  18 siblings, 0 replies; 56+ messages in thread
From: Olga Kornievskaia @ 2017-03-02 16:01 UTC (permalink / raw)
  To: Trond.Myklebust; +Cc: linux-nfs

Allow nfs_copy_file_range to copy across devices.
NFSv4.2 inter server to server copy always copies across devices, and
NFSv4.2 intra server to server copy can copy across devices on the same
server.

If a file system's fileoperations copy_file_range operation prohibits
cross-device copies, fall back to do_splice_direct. This is needed for
nfsd_copy_file_range() which is called by the inter server to server
destination server acting as an NFS client, and reading the file from
the source server.

Signed-off-by: Andy Adamson <andros@netapp.com>
---
 fs/read_write.c | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/fs/read_write.c b/fs/read_write.c
index 1d9e305..75084cd 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1534,10 +1534,6 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
 	    (file_out->f_flags & O_APPEND))
 		return -EBADF;
 
-	/* this could be relaxed once a method supports cross-fs copies */
-	if (inode_in->i_sb != inode_out->i_sb)
-		return -EXDEV;
-
 	if (len == 0)
 		return 0;
 
@@ -1559,7 +1555,7 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
 	if (file_out->f_op->copy_file_range) {
 		ret = file_out->f_op->copy_file_range(file_in, pos_in, file_out,
 						      pos_out, len, flags);
-		if (ret != -EOPNOTSUPP)
+		if (ret != -EOPNOTSUPP && ret != -EXDEV)
 			goto done;
 	}
 
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [RFC v1 03/19] VFS don't try clone if superblocks are different
  2017-03-02 16:01 [RFC v1 00/19] NFS support for inter and async COPY Olga Kornievskaia
  2017-03-02 16:01 ` [RFC v1 01/19] fs: Don't copy beyond the end of the file Olga Kornievskaia
  2017-03-02 16:01 ` [RFC v1 02/19] VFS permit cross device vfs_copy_file_range Olga Kornievskaia
@ 2017-03-02 16:01 ` Olga Kornievskaia
  2017-03-02 16:01 ` [RFC v1 04/19] NFS inter ssc open Olga Kornievskaia
                   ` (15 subsequent siblings)
  18 siblings, 0 replies; 56+ messages in thread
From: Olga Kornievskaia @ 2017-03-02 16:01 UTC (permalink / raw)
  To: Trond.Myklebust; +Cc: linux-nfs

Only try clone_file_range if superblocks are the same.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
---
 fs/read_write.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/read_write.c b/fs/read_write.c
index 75084cd..00161bd 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1543,7 +1543,8 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
 	 * Try cloning first, this is supported by more file systems, and
 	 * more efficient if both clone and copy are supported (e.g. NFS).
 	 */
-	if (file_in->f_op->clone_file_range) {
+	if (inode_in->i_sb == inode_out->i_sb &&
+			file_in->f_op->clone_file_range) {
 		ret = file_in->f_op->clone_file_range(file_in, pos_in,
 				file_out, pos_out, len);
 		if (ret == 0) {
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [RFC v1 04/19] NFS inter ssc open
  2017-03-02 16:01 [RFC v1 00/19] NFS support for inter and async COPY Olga Kornievskaia
                   ` (2 preceding siblings ...)
  2017-03-02 16:01 ` [RFC v1 03/19] VFS don't try clone if superblocks are different Olga Kornievskaia
@ 2017-03-02 16:01 ` Olga Kornievskaia
  2017-03-02 16:01 ` [RFC v1 05/19] NFS add COPY_NOTIFY operation Olga Kornievskaia
                   ` (14 subsequent siblings)
  18 siblings, 0 replies; 56+ messages in thread
From: Olga Kornievskaia @ 2017-03-02 16:01 UTC (permalink / raw)
  To: Trond.Myklebust; +Cc: linux-nfs

NFSv4.2 inter server to server copy requires the destination server to
READ the data from the source server using the provided stateid and
file handle.

Given an NFSv4 stateid and filehandle from the COPY operaion, provide the
destination server with an NFS client function to create a struct file
suitable for the destiniation server to READ the data to be copied.

Signed-off-by: Andy Adamson <andros@netapp.com>
---
 fs/nfs/nfs4_fs.h  |   7 ++++
 fs/nfs/nfs4file.c | 106 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/nfs/nfs4proc.c |   5 ++-
 3 files changed, 115 insertions(+), 3 deletions(-)

diff --git a/fs/nfs/nfs4_fs.h b/fs/nfs/nfs4_fs.h
index af285cc..ced3431 100644
--- a/fs/nfs/nfs4_fs.h
+++ b/fs/nfs/nfs4_fs.h
@@ -271,6 +271,13 @@ extern int nfs4_set_rw_stateid(nfs4_stateid *stateid,
 		const struct nfs_open_context *ctx,
 		const struct nfs_lock_context *l_ctx,
 		fmode_t fmode);
+extern int nfs4_proc_getattr(struct nfs_server *server, struct nfs_fh *fhandle,
+				struct nfs_fattr *fattr,
+				struct nfs4_label *label);
+extern int update_open_stateid(struct nfs4_state *state,
+				const nfs4_stateid *open_stateid,
+				const nfs4_stateid *deleg_stateid,
+				fmode_t fmode);
 
 #if defined(CONFIG_NFS_V4_1)
 extern int nfs41_sequence_done(struct rpc_task *, struct nfs4_sequence_res *);
diff --git a/fs/nfs/nfs4file.c b/fs/nfs/nfs4file.c
index 0efba77..0730c62 100644
--- a/fs/nfs/nfs4file.c
+++ b/fs/nfs/nfs4file.c
@@ -7,6 +7,7 @@
 #include <linux/file.h>
 #include <linux/falloc.h>
 #include <linux/nfs_fs.h>
+#include <linux/file.h>
 #include <uapi/linux/btrfs.h>	/* BTRFS_IOC_CLONE/BTRFS_IOC_CLONE_RANGE */
 #include "delegation.h"
 #include "internal.h"
@@ -236,6 +237,111 @@ static int nfs42_clone_file_range(struct file *src_file, loff_t src_off,
 out:
 	return ret;
 }
+
+static int read_name_gen = 1;
+#define SSC_READ_NAME_BODY "ssc_read_%d"
+
+struct file *
+nfs42_ssc_open(struct vfsmount *ss_mnt, struct nfs_fh *src_fh,
+		nfs4_stateid *stateid)
+{
+	struct nfs_fattr fattr;
+	struct path path = {
+		.dentry = NULL,
+	};
+	struct file *filep, *res;
+	struct nfs_server *server;
+	struct inode *r_ino = NULL;
+	struct nfs_open_context *ctx;
+	struct nfs4_state_owner *sp;
+	char *read_name;
+	int len, status = 0;
+
+	server = NFS_SERVER(ss_mnt->mnt_root->d_inode);
+
+	nfs_fattr_init(&fattr);
+
+	status = nfs4_proc_getattr(server, src_fh, &fattr, NULL);
+	if (status < 0) {
+		res = ERR_PTR(status);
+		goto out;
+	}
+
+	res = ERR_PTR(-ENOMEM);
+	len = strlen(SSC_READ_NAME_BODY) + 16;
+	read_name = kzalloc(len, GFP_NOFS);
+	if (read_name == NULL)
+		goto out;
+	snprintf(read_name, len, SSC_READ_NAME_BODY, read_name_gen++);
+	dprintk("%s read_name %s\n", __func__, read_name);
+
+	/* Just put the file under the mount point */
+	path.dentry = d_alloc_name(ss_mnt->mnt_root, read_name);
+	kfree(read_name);
+	if (path.dentry == NULL)
+		goto out;
+	path.mnt = ss_mnt;
+	r_ino = nfs_fhget(ss_mnt->mnt_root->d_inode->i_sb, src_fh, &fattr,
+			NULL);
+	if (IS_ERR(r_ino)) {
+		res = ERR_CAST(r_ino);
+		goto out_path;
+	}
+
+	d_add(path.dentry, r_ino);
+
+	filep = alloc_file(&path, FMODE_READ, r_ino->i_fop);
+	if (IS_ERR(filep)) {
+		res = ERR_CAST(filep);
+		goto out_path;
+	}
+
+	ctx = alloc_nfs_open_context(filep->f_path.dentry, filep->f_mode,
+					filep);
+	if (IS_ERR(ctx)) {
+		res = ERR_CAST(ctx);
+		goto out_filep;
+	}
+
+	res = ERR_PTR(-EINVAL);
+	sp = nfs4_get_state_owner(server, ctx->cred, GFP_KERNEL);
+	if (sp == NULL)
+		goto out_ctx;
+
+	ctx->state = nfs4_get_open_state(r_ino, sp);
+	if (ctx->state == NULL)
+		goto out_stateowner;
+
+	update_open_stateid(ctx->state, stateid, NULL, filep->f_mode);
+
+	nfs_file_set_open_context(filep, ctx);
+	put_nfs_open_context(ctx);
+
+	file_ra_state_init(&filep->f_ra, filep->f_mapping->host->i_mapping);
+	res = filep;
+out:
+	dprintk("<-- %s error %ld filep %p r_ino %p\n",
+		__func__, IS_ERR(res) ? PTR_ERR(res) : 0, res, r_ino);
+
+	return res;
+out_stateowner:
+	nfs4_put_state_owner(sp);
+out_ctx:
+	put_nfs_open_context(ctx);
+out_filep:
+	fput(filep);
+out_path:
+	path_put(&path);
+	goto out;
+}
+EXPORT_SYMBOL_GPL(nfs42_ssc_open);
+void nfs42_ssc_close(struct file *filep)
+{
+	struct nfs_open_context *ctx = nfs_file_open_context(filep);
+
+	ctx->state->flags = 0;
+}
+EXPORT_SYMBOL_GPL(nfs42_ssc_close);
 #endif /* CONFIG_NFS_V4_2 */
 
 const struct file_operations nfs4_file_operations = {
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 1b18368..fbe2452 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -89,7 +89,6 @@
 static int _nfs4_recover_proc_open(struct nfs4_opendata *data);
 static int nfs4_do_fsinfo(struct nfs_server *, struct nfs_fh *, struct nfs_fsinfo *);
 static void nfs_fixup_referral_attributes(struct nfs_fattr *fattr);
-static int nfs4_proc_getattr(struct nfs_server *, struct nfs_fh *, struct nfs_fattr *, struct nfs4_label *label);
 static int _nfs4_proc_getattr(struct nfs_server *server, struct nfs_fh *fhandle, struct nfs_fattr *fattr, struct nfs4_label *label);
 static int nfs4_do_setattr(struct inode *inode, struct rpc_cred *cred,
 			    struct nfs_fattr *fattr, struct iattr *sattr,
@@ -1459,7 +1458,7 @@ static void __update_open_stateid(struct nfs4_state *state,
 	spin_unlock(&state->owner->so_lock);
 }
 
-static int update_open_stateid(struct nfs4_state *state,
+int update_open_stateid(struct nfs4_state *state,
 		const nfs4_stateid *open_stateid,
 		const nfs4_stateid *delegation,
 		fmode_t fmode)
@@ -3609,7 +3608,7 @@ static int _nfs4_proc_getattr(struct nfs_server *server, struct nfs_fh *fhandle,
 	return nfs4_call_sync(server->client, server, &msg, &args.seq_args, &res.seq_res, 0);
 }
 
-static int nfs4_proc_getattr(struct nfs_server *server, struct nfs_fh *fhandle,
+int nfs4_proc_getattr(struct nfs_server *server, struct nfs_fh *fhandle,
 				struct nfs_fattr *fattr, struct nfs4_label *label)
 {
 	struct nfs4_exception exception = { };
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [RFC v1 05/19] NFS add COPY_NOTIFY operation
  2017-03-02 16:01 [RFC v1 00/19] NFS support for inter and async COPY Olga Kornievskaia
                   ` (3 preceding siblings ...)
  2017-03-02 16:01 ` [RFC v1 04/19] NFS inter ssc open Olga Kornievskaia
@ 2017-03-02 16:01 ` Olga Kornievskaia
  2017-03-02 16:01 ` [RFC v1 06/19] NFS add ca_source_server<> to COPY Olga Kornievskaia
                   ` (13 subsequent siblings)
  18 siblings, 0 replies; 56+ messages in thread
From: Olga Kornievskaia @ 2017-03-02 16:01 UTC (permalink / raw)
  To: Trond.Myklebust; +Cc: linux-nfs

Try using the delegation stateid, then the open stateid.

Only NL4_NETATTR, No support for NL4_NAME and NL4_URL.
Allow only one source server address to be returned for now.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
---
 fs/nfs/internal.h         |  10 +++
 fs/nfs/nfs42.h            |   4 +
 fs/nfs/nfs42proc.c        | 101 +++++++++++++++++++++++++
 fs/nfs/nfs42xdr.c         | 186 ++++++++++++++++++++++++++++++++++++++++++++++
 fs/nfs/nfs4_fs.h          |   1 +
 fs/nfs/nfs4file.c         |  33 +++++++-
 fs/nfs/nfs4proc.c         |   1 +
 fs/nfs/nfs4state.c        |   2 +-
 fs/nfs/nfs4xdr.c          |   1 +
 include/linux/nfs4.h      |  34 +++++++++
 include/linux/nfs_fs_sb.h |   1 +
 include/linux/nfs_xdr.h   |  18 +++++
 12 files changed, 390 insertions(+), 2 deletions(-)

diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 09ca509..c006f73 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -7,6 +7,7 @@
 #include <linux/security.h>
 #include <linux/crc32.h>
 #include <linux/nfs_page.h>
+#include <linux/sunrpc/addr.h>
 
 #define NFS_MS_MASK (MS_RDONLY|MS_NOSUID|MS_NODEV|MS_NOEXEC|MS_SYNCHRONOUS)
 
@@ -763,3 +764,12 @@ static inline bool nfs_error_is_fatal(int err)
 		return false;
 	}
 }
+
+static inline bool nfs42_intra_ssc(struct file *in, struct file *out)
+{
+	struct nfs_client *c_in = (NFS_SERVER(file_inode(in)))->nfs_client;
+	struct nfs_client *c_out = (NFS_SERVER(file_inode(out)))->nfs_client;
+
+	return rpc_cmp_addr((struct sockaddr *)&c_in->cl_addr,
+				(struct sockaddr *)&c_out->cl_addr);
+}
diff --git a/fs/nfs/nfs42.h b/fs/nfs/nfs42.h
index b6cd153..2f780d2 100644
--- a/fs/nfs/nfs42.h
+++ b/fs/nfs/nfs42.h
@@ -12,6 +12,7 @@
 #define PNFS_LAYOUTSTATS_MAXDEV (4)
 
 /* nfs4.2proc.c */
+#if defined(CONFIG_NFS_V4_2)
 int nfs42_proc_allocate(struct file *, loff_t, loff_t);
 ssize_t nfs42_proc_copy(struct file *, loff_t, struct file *, loff_t, size_t);
 int nfs42_proc_deallocate(struct file *, loff_t, loff_t);
@@ -19,5 +20,8 @@
 int nfs42_proc_layoutstats_generic(struct nfs_server *,
 				   struct nfs42_layoutstat_data *);
 int nfs42_proc_clone(struct file *, struct file *, loff_t, loff_t, loff_t);
+int nfs42_proc_copy_notify(struct file *, struct file *,
+			   struct nfs42_copy_notify_res *);
+#endif /* CONFIG_NFS_V4_2 */
 
 #endif /* __LINUX_FS_NFS_NFS4_2_H */
diff --git a/fs/nfs/nfs42proc.c b/fs/nfs/nfs42proc.c
index 1e486c7..a28d828 100644
--- a/fs/nfs/nfs42proc.c
+++ b/fs/nfs/nfs42proc.c
@@ -2,6 +2,7 @@
  * Copyright (c) 2014 Anna Schumaker <Anna.Schumaker@Netapp.com>
  */
 #include <linux/fs.h>
+#include <linux/sunrpc/addr.h>
 #include <linux/sunrpc/sched.h>
 #include <linux/nfs.h>
 #include <linux/nfs3.h>
@@ -14,9 +15,48 @@
 #include "pnfs.h"
 #include "nfs4session.h"
 #include "internal.h"
+#include "delegation.h"
 
 #define NFSDBG_FACILITY NFSDBG_PROC
 
+static int nfs42_set_open_stateid(nfs4_stateid *dst, struct file *file)
+{
+	struct nfs_open_context *ctx;
+	struct rpc_cred *cred;
+	struct nfs_lock_context *l_ctx;
+	int ret;
+
+	ctx = get_nfs_open_context(nfs_file_open_context(file));
+	l_ctx = nfs_get_lock_context(ctx);
+	if (IS_ERR(l_ctx))
+		return PTR_ERR(l_ctx);
+	ret = nfs4_select_rw_stateid(ctx->state, FMODE_READ, l_ctx,
+				     dst, &cred);
+	nfs_put_lock_context(l_ctx);
+	dprintk("<-- %s returns %d\n", __func__, ret);
+	return ret;
+}
+
+static void nfs42_set_netaddr(struct file *filep,
+				      struct nfs42_netaddr *naddr)
+{
+	struct nfs_client *clp = (NFS_SERVER(file_inode(filep)))->nfs_client;
+	unsigned short port = 2049;
+
+	rcu_read_lock();
+	naddr->na_netid_len = scnprintf(naddr->na_netid,
+					sizeof(naddr->na_netid), "%s",
+					rpc_peeraddr2str(clp->cl_rpcclient,
+					RPC_DISPLAY_NETID));
+	naddr->na_uaddr_len = scnprintf(naddr->na_uaddr,
+					sizeof(naddr->na_uaddr),
+					"%s.%u.%u",
+					rpc_peeraddr2str(clp->cl_rpcclient,
+					RPC_DISPLAY_ADDR),
+					port >> 8, port & 255);
+	rcu_read_unlock();
+}
+
 static int _nfs42_proc_fallocate(struct rpc_message *msg, struct file *filep,
 		struct nfs_lock_context *lock, loff_t offset, loff_t len)
 {
@@ -254,6 +294,67 @@ ssize_t nfs42_proc_copy(struct file *src, loff_t pos_src,
 	return err;
 }
 
+int _nfs42_proc_copy_notify(struct file *src, struct file *dst,
+			    struct nfs42_copy_notify_args *args,
+			    struct nfs42_copy_notify_res *res)
+{
+	struct nfs_server *src_server = NFS_SERVER(file_inode(src));
+	struct rpc_message msg = {
+		.rpc_proc = &nfs4_procedures[NFSPROC4_CLNT_COPY_NOTIFY],
+		.rpc_argp = args,
+		.rpc_resp = res,
+	};
+	int status;
+
+	status = nfs42_set_open_stateid(&args->cna_src_stateid, src);
+	if (status)
+		goto out;
+	status = nfs4_call_sync(src_server->client, src_server, &msg,
+				&args->cna_seq_args, &res->cnr_seq_res, 0);
+	if (status == -ENOTSUPP)
+		src_server->caps &= ~NFS_CAP_COPY_NOTIFY;
+
+	put_nfs_open_context(nfs_file_open_context(src));
+out:
+	return status;
+}
+
+int nfs42_proc_copy_notify(struct file *src, struct file *dst,
+				struct nfs42_copy_notify_res *res)
+{
+	struct nfs_server *src_server = NFS_SERVER(file_inode(src));
+	struct nfs42_copy_notify_args *args;
+	struct nfs4_exception exception = {
+		.inode = file_inode(src),
+	};
+	int status;
+
+	if (!(src_server->caps & NFS_CAP_COPY_NOTIFY))
+		return -EOPNOTSUPP;
+
+	args = kzalloc(sizeof(struct nfs42_copy_notify_args), GFP_NOFS);
+	if (args == NULL)
+		return -ENOMEM;
+
+	args->cna_src_fh  = NFS_FH(file_inode(src)),
+	args->cna_dst.nl4_type = NL4_NETADDR;
+	nfs42_set_netaddr(dst, &args->cna_dst.u.nl4_addr);
+	exception.stateid = &args->cna_src_stateid;
+
+	do {
+		status = _nfs42_proc_copy_notify(src, dst, args, res);
+		if (status == -ENOTSUPP) {
+			status = -EOPNOTSUPP;
+			goto out;
+		}
+		status = nfs4_handle_exception(src_server, status, &exception);
+	} while (exception.retry);
+
+out:
+	kfree(args);
+	return status;
+}
+
 static loff_t _nfs42_proc_llseek(struct file *filep,
 		struct nfs_lock_context *lock, loff_t offset, int whence)
 {
diff --git a/fs/nfs/nfs42xdr.c b/fs/nfs/nfs42xdr.c
index 6c72964..aad2b08 100644
--- a/fs/nfs/nfs42xdr.c
+++ b/fs/nfs/nfs42xdr.c
@@ -25,6 +25,16 @@
 					 NFS42_WRITE_RES_SIZE + \
 					 1 /* cr_consecutive */ + \
 					 1 /* cr_synchronous */)
+#define encode_copy_notify_maxsz	(op_encode_hdr_maxsz + \
+					 XDR_QUADLEN(NFS4_STATEID_SIZE) + \
+					 1 + /* nl4_type */ \
+					 1 + XDR_QUADLEN(NFS4_OPAQUE_LIMIT))
+#define decode_copy_notify_maxsz	(op_decode_hdr_maxsz + \
+					 3 + /* cnr_lease_time */\
+					 XDR_QUADLEN(NFS4_STATEID_SIZE) + \
+					 1 + /* Support 1 cnr_source_server */\
+					 1 + /* nl4_type */ \
+					 1 + XDR_QUADLEN(NFS4_OPAQUE_LIMIT))
 #define encode_deallocate_maxsz		(op_encode_hdr_maxsz + \
 					 encode_fallocate_maxsz)
 #define decode_deallocate_maxsz		(op_decode_hdr_maxsz)
@@ -72,6 +82,12 @@
 					 decode_savefh_maxsz + \
 					 decode_putfh_maxsz + \
 					 decode_copy_maxsz)
+#define NFS4_enc_copy_notify_sz		(compound_encode_hdr_maxsz + \
+					 encode_putfh_maxsz + \
+					 encode_copy_notify_maxsz)
+#define NFS4_dec_copy_notify_sz		(compound_decode_hdr_maxsz + \
+					 decode_putfh_maxsz + \
+					 decode_copy_notify_maxsz)
 #define NFS4_enc_deallocate_sz		(compound_encode_hdr_maxsz + \
 					 encode_putfh_maxsz + \
 					 encode_deallocate_maxsz + \
@@ -125,6 +141,25 @@ static void encode_allocate(struct xdr_stream *xdr,
 	encode_fallocate(xdr, args);
 }
 
+static void encode_nl4_server(struct xdr_stream *xdr, struct nl4_server *ns)
+{
+	encode_uint32(xdr, ns->nl4_type);
+	switch (ns->nl4_type) {
+	case NL4_NAME:
+	case NL4_URL:
+		encode_string(xdr, ns->u.nl4_str_sz, ns->u.nl4_str);
+	break;
+	case NL4_NETADDR:
+		encode_string(xdr, ns->u.nl4_addr.na_netid_len,
+			      ns->u.nl4_addr.na_netid);
+		encode_string(xdr, ns->u.nl4_addr.na_uaddr_len,
+			      ns->u.nl4_addr.na_uaddr);
+	break;
+	default:
+		WARN_ON_ONCE(1);
+	}
+}
+
 static void encode_copy(struct xdr_stream *xdr,
 			struct nfs42_copy_args *args,
 			struct compound_hdr *hdr)
@@ -142,6 +177,15 @@ static void encode_copy(struct xdr_stream *xdr,
 	encode_uint32(xdr, 0); /* src server list */
 }
 
+static void encode_copy_notify(struct xdr_stream *xdr,
+			       struct nfs42_copy_notify_args *args,
+			       struct compound_hdr *hdr)
+{
+	encode_op_hdr(xdr, OP_COPY_NOTIFY, decode_copy_notify_maxsz, hdr);
+	encode_nfs4_stateid(xdr, &args->cna_src_stateid);
+	encode_nl4_server(xdr, &args->cna_dst);
+}
+
 static void encode_deallocate(struct xdr_stream *xdr,
 			      struct nfs42_falloc_args *args,
 			      struct compound_hdr *hdr)
@@ -243,6 +287,24 @@ static void nfs4_xdr_enc_copy(struct rpc_rqst *req,
 }
 
 /*
+ * Encode COPY_NOTIFY request
+ */
+static void nfs4_xdr_enc_copy_notify(struct rpc_rqst *req,
+				     struct xdr_stream *xdr,
+				     struct nfs42_copy_notify_args *args)
+{
+	struct compound_hdr hdr = {
+		.minorversion = nfs4_xdr_minorversion(&args->cna_seq_args),
+	};
+
+	encode_compound_hdr(xdr, req, &hdr);
+	encode_sequence(xdr, &args->cna_seq_args, &hdr);
+	encode_putfh(xdr, args->cna_src_fh, &hdr);
+	encode_copy_notify(xdr, args, &hdr);
+	encode_nops(&hdr);
+}
+
+/*
  * Encode DEALLOCATE request
  */
 static void nfs4_xdr_enc_deallocate(struct rpc_rqst *req,
@@ -355,6 +417,58 @@ static int decode_write_response(struct xdr_stream *xdr,
 	return -EIO;
 }
 
+static int decode_nl4_server(struct xdr_stream *xdr, struct nl4_server *ns)
+{
+	struct nfs42_netaddr *naddr;
+	uint32_t dummy;
+	char *dummy_str;
+	__be32 *p;
+	int status;
+
+	/* nl_type */
+	p = xdr_inline_decode(xdr, 4);
+	if (unlikely(!p))
+		return -EIO;
+	ns->nl4_type = be32_to_cpup(p);
+	switch (ns->nl4_type) {
+	case NL4_NAME:
+	case NL4_URL:
+		status = decode_opaque_inline(xdr, &dummy, &dummy_str);
+		if (unlikely(status))
+			return status;
+		if (unlikely(dummy > NFS4_OPAQUE_LIMIT))
+			return -EIO;
+		memcpy(&ns->u.nl4_str, dummy_str, dummy);
+		ns->u.nl4_str_sz = dummy;
+		break;
+	case NL4_NETADDR:
+		naddr = &ns->u.nl4_addr;
+
+		/* netid string */
+		status = decode_opaque_inline(xdr, &dummy, &dummy_str);
+		if (unlikely(status))
+			return status;
+		if (unlikely(dummy > RPCBIND_MAXNETIDLEN))
+			return -EIO;
+		naddr->na_netid_len = dummy;
+		memcpy(naddr->na_netid, dummy_str, naddr->na_netid_len);
+
+		/* uaddr string */
+		status = decode_opaque_inline(xdr, &dummy, &dummy_str);
+		if (unlikely(status))
+			return status;
+		if (unlikely(dummy > RPCBIND_MAXUADDRLEN))
+			return -EIO;
+		naddr->na_uaddr_len = dummy;
+		memcpy(naddr->na_uaddr, dummy_str, naddr->na_uaddr_len);
+		break;
+	default:
+		WARN_ON_ONCE(1);
+		return -EIO;
+	}
+	return 0;
+}
+
 static int decode_copy_requirements(struct xdr_stream *xdr,
 				    struct nfs42_copy_res *res) {
 	__be32 *p;
@@ -391,6 +505,53 @@ static int decode_copy(struct xdr_stream *xdr, struct nfs42_copy_res *res)
 	return decode_copy_requirements(xdr, res);
 }
 
+static int decode_copy_notify(struct xdr_stream *xdr,
+			      struct nfs42_copy_notify_res *res)
+{
+	__be32 *p;
+	struct nl4_server *ns;
+	int status, i;
+
+	status = decode_op_hdr(xdr, OP_COPY_NOTIFY);
+	if (status)
+		return status;
+	/* cnr_lease_time */
+	p = xdr_inline_decode(xdr, 12);
+	if (unlikely(!p))
+		goto out_overflow;
+	p = xdr_decode_hyper(p, &res->cnr_lease_time.seconds);
+	res->cnr_lease_time.nseconds = be32_to_cpup(p);
+
+	status = decode_opaque_fixed(xdr, &res->cnr_stateid, NFS4_STATEID_SIZE);
+	if (unlikely(status))
+		goto out_overflow;
+
+	/* number of source addresses */
+	p = xdr_inline_decode(xdr, 4);
+	if (unlikely(!p))
+		goto out_overflow;
+
+	res->cnr_src.nl_nsvr = be32_to_cpup(p);
+	if (res->cnr_src.nl_nsvr > NFS42_MAX_SSC_SRC) {
+		pr_warn("NFS: %s: nsvr %d > Supported. Use first %d servers\n",
+			 __func__, res->cnr_src.nl_nsvr, NFS42_MAX_SSC_SRC);
+		res->cnr_src.nl_nsvr = NFS42_MAX_SSC_SRC;
+	}
+
+	ns = res->cnr_src.nl_svr;
+	for (i = 0; i < res->cnr_src.nl_nsvr; i++) {
+		status = decode_nl4_server(xdr, ns);
+		if (unlikely(status))
+			goto out_overflow;
+		ns++;
+	}
+	return 0;
+
+out_overflow:
+	print_overflow_msg(__func__, xdr);
+	return -EIO;
+}
+
 static int decode_deallocate(struct xdr_stream *xdr, struct nfs42_falloc_res *res)
 {
 	return decode_op_hdr(xdr, OP_DEALLOCATE);
@@ -486,6 +647,31 @@ static int nfs4_xdr_dec_copy(struct rpc_rqst *rqstp,
 }
 
 /*
+ * Decode COPY_NOTIFY response
+ */
+static int nfs4_xdr_dec_copy_notify(struct rpc_rqst *rqstp,
+				    struct xdr_stream *xdr,
+				    struct nfs42_copy_notify_res *res)
+{
+	struct compound_hdr hdr;
+	int status;
+
+	status = decode_compound_hdr(xdr, &hdr);
+	if (status)
+		goto out;
+	status = decode_sequence(xdr, &res->cnr_seq_res, rqstp);
+	if (status)
+		goto out;
+	status = decode_putfh(xdr);
+	if (status)
+		goto out;
+	status = decode_copy_notify(xdr, res);
+
+out:
+	return status;
+}
+
+/*
  * Decode DEALLOCATE request
  */
 static int nfs4_xdr_dec_deallocate(struct rpc_rqst *rqstp,
diff --git a/fs/nfs/nfs4_fs.h b/fs/nfs/nfs4_fs.h
index ced3431..c23089e 100644
--- a/fs/nfs/nfs4_fs.h
+++ b/fs/nfs/nfs4_fs.h
@@ -453,6 +453,7 @@ extern void nfs41_handle_server_scope(struct nfs_client *,
 extern int nfs4_select_rw_stateid(struct nfs4_state *, fmode_t,
 		const struct nfs_lock_context *, nfs4_stateid *,
 		struct rpc_cred **);
+extern void nfs4_copy_open_stateid(nfs4_stateid *, struct nfs4_state *);
 
 extern struct nfs_seqid *nfs_alloc_seqid(struct nfs_seqid_counter *counter, gfp_t gfp_mask);
 extern int nfs_wait_on_sequence(struct nfs_seqid *seqid, struct rpc_task *task);
diff --git a/fs/nfs/nfs4file.c b/fs/nfs/nfs4file.c
index 0730c62..dc6cab3 100644
--- a/fs/nfs/nfs4file.c
+++ b/fs/nfs/nfs4file.c
@@ -134,10 +134,41 @@ static ssize_t nfs4_copy_file_range(struct file *file_in, loff_t pos_in,
 				    struct file *file_out, loff_t pos_out,
 				    size_t count, unsigned int flags)
 {
+	struct nfs42_copy_notify_res *cn_resp = NULL;
+	ssize_t ret;
+
 	if (file_inode(file_in) == file_inode(file_out))
 		return -EINVAL;
 
-	return nfs42_proc_copy(file_in, pos_in, file_out, pos_out, count);
+	if (nfs42_intra_ssc(file_in, file_out)) {  /* Intra-ssc */
+		if (file_in->f_op != file_out->f_op)
+			return -EXDEV;
+	} else {  /* Inter-ssc */
+		cn_resp = kzalloc(sizeof(struct nfs42_copy_notify_res),
+				  GFP_NOFS);
+		if (unlikely(cn_resp == NULL))
+			return -ENOMEM;
+
+		cn_resp->cnr_src.nl_svr = kzalloc(NFS42_MAX_SSC_SRC *
+						sizeof(struct nl4_server),
+						GFP_NOFS);
+		if (unlikely(cn_resp->cnr_src.nl_svr == NULL)) {
+			kfree(cn_resp);
+			return -ENOMEM;
+		}
+
+		ret = nfs42_proc_copy_notify(file_in, file_out, cn_resp);
+		if (ret)
+			goto out;
+	}
+	ret = nfs42_proc_copy(file_in, pos_in, file_out, pos_out, count);
+
+out:
+	if (cn_resp) {
+		kfree(cn_resp->cnr_src.nl_svr);
+		kfree(cn_resp);
+	}
+	return ret;
 }
 
 static loff_t nfs4_file_llseek(struct file *filep, loff_t offset, int whence)
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index fbe2452..4633b26 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -9244,6 +9244,7 @@ static bool nfs4_match_stateid(const nfs4_stateid *s1,
 		| NFS_CAP_ATOMIC_OPEN_V1
 		| NFS_CAP_ALLOCATE
 		| NFS_CAP_COPY
+		| NFS_CAP_COPY_NOTIFY
 		| NFS_CAP_DEALLOCATE
 		| NFS_CAP_SEEK
 		| NFS_CAP_LAYOUTSTATS
diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index 8156bad..5827d95 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -979,7 +979,7 @@ static int nfs4_copy_lock_stateid(nfs4_stateid *dst,
 	return ret;
 }
 
-static void nfs4_copy_open_stateid(nfs4_stateid *dst, struct nfs4_state *state)
+void nfs4_copy_open_stateid(nfs4_stateid *dst, struct nfs4_state *state)
 {
 	const nfs4_stateid *src;
 	int seq;
diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
index f0369e3..9def5c6 100644
--- a/fs/nfs/nfs4xdr.c
+++ b/fs/nfs/nfs4xdr.c
@@ -7573,6 +7573,7 @@ struct rpc_procinfo	nfs4_procedures[] = {
 	PROC(LAYOUTSTATS,	enc_layoutstats,	dec_layoutstats),
 	PROC(CLONE,		enc_clone,		dec_clone),
 	PROC(COPY,		enc_copy,		dec_copy),
+	PROC(COPY_NOTIFY,	enc_copy_notify,	dec_copy_notify),
 #endif /* CONFIG_NFS_V4_2 */
 };
 
diff --git a/include/linux/nfs4.h b/include/linux/nfs4.h
index 1b1ca04..e4be278 100644
--- a/include/linux/nfs4.h
+++ b/include/linux/nfs4.h
@@ -15,6 +15,7 @@
 #include <linux/list.h>
 #include <linux/uidgid.h>
 #include <uapi/linux/nfs4.h>
+#include <linux/sunrpc/msg_prot.h>
 
 enum nfs4_acl_whotype {
 	NFS4_ACL_WHO_NAMED = 0,
@@ -523,6 +524,7 @@ enum {
 	NFSPROC4_CLNT_LAYOUTSTATS,
 	NFSPROC4_CLNT_CLONE,
 	NFSPROC4_CLNT_COPY,
+	NFSPROC4_CLNT_COPY_NOTIFY,
 };
 
 /* nfs41 types */
@@ -657,4 +659,36 @@ struct nfs4_op_map {
 	} u;
 };
 
+struct nfs42_netaddr {
+	unsigned int	na_netid_len;
+	char		na_netid[RPCBIND_MAXNETIDLEN + 1];
+	unsigned int	na_uaddr_len;
+	char		na_uaddr[RPCBIND_MAXUADDRLEN + 1];
+};
+
+enum netloc_type4 {
+	NL4_NAME		= 1,
+	NL4_URL			= 2,
+	NL4_NETADDR		= 3,
+};
+
+struct nl4_server {
+	enum netloc_type4	nl4_type;
+	union {
+		struct { /* NL4_NAME, NL4_URL */
+			int	nl4_str_sz;
+			char	nl4_str[NFS4_OPAQUE_LIMIT + 1];
+		};
+		struct nfs42_netaddr	nl4_addr; /* NL4_NETADDR */
+	} u;
+};
+
+/*  support 1 nl4_server for now */
+#define NFS42_MAX_SSC_SRC       1
+
+struct nl4_servers {
+	int			nl_nsvr;
+	struct nl4_server	*nl_svr;
+};
+
 #endif
diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
index b34097c..a538b69 100644
--- a/include/linux/nfs_fs_sb.h
+++ b/include/linux/nfs_fs_sb.h
@@ -250,5 +250,6 @@ struct nfs_server {
 #define NFS_CAP_LAYOUTSTATS	(1U << 22)
 #define NFS_CAP_CLONE		(1U << 23)
 #define NFS_CAP_COPY		(1U << 24)
+#define NFS_CAP_COPY_NOTIFY	(1U << 25)
 
 #endif
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index 348f7c1..1af723f 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -1385,6 +1385,24 @@ struct nfs42_copy_res {
 	bool				synchronous;
 };
 
+struct nfs42_copy_notify_args {
+	struct nfs4_sequence_args	cna_seq_args;
+
+	struct nfs_fh		*cna_src_fh;
+	nfs4_stateid		cna_src_stateid;
+	/* cna_destiniation_server */
+	struct nl4_server	cna_dst;
+};
+
+struct nfs42_copy_notify_res {
+	struct nfs4_sequence_res	cnr_seq_res;
+
+	struct nfstime4		cnr_lease_time;
+	nfs4_stateid		cnr_stateid;
+	/* cnr_source_server, for now, always 1 */
+	struct nl4_servers	cnr_src;
+};
+
 struct nfs42_seek_args {
 	struct nfs4_sequence_args	seq_args;
 
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [RFC v1 06/19] NFS add ca_source_server<> to COPY
  2017-03-02 16:01 [RFC v1 00/19] NFS support for inter and async COPY Olga Kornievskaia
                   ` (4 preceding siblings ...)
  2017-03-02 16:01 ` [RFC v1 05/19] NFS add COPY_NOTIFY operation Olga Kornievskaia
@ 2017-03-02 16:01 ` Olga Kornievskaia
  2017-03-02 16:01 ` [RFC v1 07/19] NFS CB_OFFLOAD xdr Olga Kornievskaia
                   ` (12 subsequent siblings)
  18 siblings, 0 replies; 56+ messages in thread
From: Olga Kornievskaia @ 2017-03-02 16:01 UTC (permalink / raw)
  To: Trond.Myklebust; +Cc: linux-nfs

Support only one source server address: the same address that
the client and source server use.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
---
 fs/nfs/nfs42.h          |  3 ++-
 fs/nfs/nfs42proc.c      | 26 +++++++++++++++++---------
 fs/nfs/nfs42xdr.c       | 19 +++++++++++++++++--
 fs/nfs/nfs4file.c       |  7 ++++++-
 include/linux/nfs_xdr.h |  2 ++
 5 files changed, 44 insertions(+), 13 deletions(-)

diff --git a/fs/nfs/nfs42.h b/fs/nfs/nfs42.h
index 2f780d2..820fd63 100644
--- a/fs/nfs/nfs42.h
+++ b/fs/nfs/nfs42.h
@@ -14,7 +14,8 @@
 /* nfs4.2proc.c */
 #if defined(CONFIG_NFS_V4_2)
 int nfs42_proc_allocate(struct file *, loff_t, loff_t);
-ssize_t nfs42_proc_copy(struct file *, loff_t, struct file *, loff_t, size_t);
+ssize_t nfs42_proc_copy(struct file *, loff_t, struct file *, loff_t, size_t,
+			struct nl4_servers *, nfs4_stateid *);
 int nfs42_proc_deallocate(struct file *, loff_t, loff_t);
 loff_t nfs42_proc_llseek(struct file *, loff_t, int);
 int nfs42_proc_layoutstats_generic(struct nfs_server *,
diff --git a/fs/nfs/nfs42proc.c b/fs/nfs/nfs42proc.c
index a28d828..ade2938 100644
--- a/fs/nfs/nfs42proc.c
+++ b/fs/nfs/nfs42proc.c
@@ -174,7 +174,9 @@ static ssize_t _nfs42_proc_copy(struct file *src,
 				struct file *dst,
 				struct nfs_lock_context *dst_lock,
 				struct nfs42_copy_args *args,
-				struct nfs42_copy_res *res)
+				struct nfs42_copy_res *res,
+				struct nl4_servers *nss,
+				nfs4_stateid *cnr_stateid)
 {
 	struct rpc_message msg = {
 		.rpc_proc = &nfs4_procedures[NFSPROC4_CLNT_COPY],
@@ -188,11 +190,15 @@ static ssize_t _nfs42_proc_copy(struct file *src,
 	size_t count = args->count;
 	int status;
 
-	status = nfs4_set_rw_stateid(&args->src_stateid, src_lock->open_context,
-				     src_lock, FMODE_READ);
-	if (status)
-		return status;
-
+	if (nss) {
+		args->cp_src = nss;
+		nfs4_stateid_copy(&args->src_stateid, cnr_stateid);
+	} else {
+		status = nfs4_set_rw_stateid(&args->src_stateid,
+				src_lock->open_context, src_lock, FMODE_READ);
+		if (status)
+			return status;
+	}
 	status = nfs_filemap_write_and_wait_range(file_inode(src)->i_mapping,
 			pos_src, pos_src + (loff_t)count - 1);
 	if (status)
@@ -227,8 +233,9 @@ static ssize_t _nfs42_proc_copy(struct file *src,
 }
 
 ssize_t nfs42_proc_copy(struct file *src, loff_t pos_src,
-			struct file *dst, loff_t pos_dst,
-			size_t count)
+			struct file *dst, loff_t pos_dst, size_t count,
+			struct nl4_servers *nss,
+			nfs4_stateid *cnr_stateid)
 {
 	struct nfs_server *server = NFS_SERVER(file_inode(dst));
 	struct nfs_lock_context *src_lock;
@@ -272,7 +279,8 @@ ssize_t nfs42_proc_copy(struct file *src, loff_t pos_src,
 		inode_lock(file_inode(dst));
 		err = _nfs42_proc_copy(src, src_lock,
 				dst, dst_lock,
-				&args, &res);
+				&args, &res,
+				nss, cnr_stateid);
 		inode_unlock(file_inode(dst));
 
 		if (err >= 0)
diff --git a/fs/nfs/nfs42xdr.c b/fs/nfs/nfs42xdr.c
index aad2b08..fa0daaa 100644
--- a/fs/nfs/nfs42xdr.c
+++ b/fs/nfs/nfs42xdr.c
@@ -20,7 +20,10 @@
 #define encode_copy_maxsz		(op_encode_hdr_maxsz +          \
 					 XDR_QUADLEN(NFS4_STATEID_SIZE) + \
 					 XDR_QUADLEN(NFS4_STATEID_SIZE) + \
-					 2 + 2 + 2 + 1 + 1 + 1)
+					 2 + 2 + 2 + 1 + 1 + 1 +\
+					 1 + /* One cnr_source_server */\
+					 1 + /* nl4_type */ \
+					 1 + XDR_QUADLEN(NFS4_OPAQUE_LIMIT))
 #define decode_copy_maxsz		(op_decode_hdr_maxsz + \
 					 NFS42_WRITE_RES_SIZE + \
 					 1 /* cr_consecutive */ + \
@@ -164,6 +167,9 @@ static void encode_copy(struct xdr_stream *xdr,
 			struct nfs42_copy_args *args,
 			struct compound_hdr *hdr)
 {
+	struct nl4_server *ns;
+	int i;
+
 	encode_op_hdr(xdr, OP_COPY, decode_copy_maxsz, hdr);
 	encode_nfs4_stateid(xdr, &args->src_stateid);
 	encode_nfs4_stateid(xdr, &args->dst_stateid);
@@ -174,7 +180,16 @@ static void encode_copy(struct xdr_stream *xdr,
 
 	encode_uint32(xdr, 1); /* consecutive = true */
 	encode_uint32(xdr, 1); /* synchronous = true */
-	encode_uint32(xdr, 0); /* src server list */
+	if (args->cp_src == NULL) { /* intra-ssc */
+		encode_uint32(xdr, 0); /* no src server list */
+		return;
+	}
+	encode_uint32(xdr, args->cp_src->nl_nsvr);
+	ns = args->cp_src->nl_svr;
+	for (i = 0; i < args->cp_src->nl_nsvr; i++) {
+		encode_nl4_server(xdr, args->cp_src->nl_svr);
+		ns++;
+	}
 }
 
 static void encode_copy_notify(struct xdr_stream *xdr,
diff --git a/fs/nfs/nfs4file.c b/fs/nfs/nfs4file.c
index dc6cab3..43a2346 100644
--- a/fs/nfs/nfs4file.c
+++ b/fs/nfs/nfs4file.c
@@ -135,6 +135,8 @@ static ssize_t nfs4_copy_file_range(struct file *file_in, loff_t pos_in,
 				    size_t count, unsigned int flags)
 {
 	struct nfs42_copy_notify_res *cn_resp = NULL;
+	struct nl4_servers *nss = NULL;
+	nfs4_stateid *cnrs = NULL;
 	ssize_t ret;
 
 	if (file_inode(file_in) == file_inode(file_out))
@@ -160,8 +162,11 @@ static ssize_t nfs4_copy_file_range(struct file *file_in, loff_t pos_in,
 		ret = nfs42_proc_copy_notify(file_in, file_out, cn_resp);
 		if (ret)
 			goto out;
+		nss = &cn_resp->cnr_src;
+		cnrs = &cn_resp->cnr_stateid;
 	}
-	ret = nfs42_proc_copy(file_in, pos_in, file_out, pos_out, count);
+	ret = nfs42_proc_copy(file_in, pos_in, file_out, pos_out, count, nss,
+				cnrs);
 
 out:
 	if (cn_resp) {
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index 1af723f..9d4ea77 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -1371,6 +1371,8 @@ struct nfs42_copy_args {
 	u64				dst_pos;
 
 	u64				count;
+	/* Support one source server */
+	struct nl4_servers		*cp_src;
 };
 
 struct nfs42_write_res {
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [RFC v1 07/19] NFS CB_OFFLOAD xdr
  2017-03-02 16:01 [RFC v1 00/19] NFS support for inter and async COPY Olga Kornievskaia
                   ` (5 preceding siblings ...)
  2017-03-02 16:01 ` [RFC v1 06/19] NFS add ca_source_server<> to COPY Olga Kornievskaia
@ 2017-03-02 16:01 ` Olga Kornievskaia
  2017-03-02 16:01 ` [RFC v1 08/19] NFS OFFLOAD_STATUS xdr Olga Kornievskaia
                   ` (11 subsequent siblings)
  18 siblings, 0 replies; 56+ messages in thread
From: Olga Kornievskaia @ 2017-03-02 16:01 UTC (permalink / raw)
  To: Trond.Myklebust; +Cc: linux-nfs

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
---
 fs/nfs/callback.h      | 13 ++++++++
 fs/nfs/callback_proc.c |  7 +++++
 fs/nfs/callback_xdr.c  | 80 +++++++++++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 99 insertions(+), 1 deletion(-)

diff --git a/fs/nfs/callback.h b/fs/nfs/callback.h
index c701c30..e4ab65d 100644
--- a/fs/nfs/callback.h
+++ b/fs/nfs/callback.h
@@ -189,6 +189,19 @@ extern __be32 nfs4_callback_notify_lock(struct cb_notify_lock_args *args,
 					 void *dummy,
 					 struct cb_process_state *cps);
 #endif /* CONFIG_NFS_V4_1 */
+#ifdef CONFIG_NFS_V4_2
+struct cb_offloadargs {
+	struct nfs_fh		coa_fh;
+	nfs4_stateid		coa_stateid;
+	uint32_t		error;
+	uint64_t		wr_count;
+	struct nfs_writeverf	wr_writeverf;
+};
+
+extern __be32 nfs4_callback_offload(
+	struct cb_offloadargs *args,
+	void *dummy, struct cb_process_state *cps);
+#endif /* CONFIG_NFS_V4_2 */
 extern int check_gss_callback_principal(struct nfs_client *, struct svc_rqst *);
 extern __be32 nfs4_callback_getattr(struct cb_getattrargs *args,
 				    struct cb_getattrres *res,
diff --git a/fs/nfs/callback_proc.c b/fs/nfs/callback_proc.c
index f073a6d2c..b68b803 100644
--- a/fs/nfs/callback_proc.c
+++ b/fs/nfs/callback_proc.c
@@ -680,3 +680,10 @@ __be32 nfs4_callback_notify_lock(struct cb_notify_lock_args *args, void *dummy,
 	return htonl(NFS4_OK);
 }
 #endif /* CONFIG_NFS_V4_1 */
+#ifdef CONFIG_NFS_V4_2
+__be32 nfs4_callback_offload(struct cb_offloadargs *args, void *dummy,
+				struct cb_process_state *cps)
+{
+	return 0;
+}
+#endif /* CONFIG_NFS_V4_2 */
diff --git a/fs/nfs/callback_xdr.c b/fs/nfs/callback_xdr.c
index 2ade5cb..192824f 100644
--- a/fs/nfs/callback_xdr.c
+++ b/fs/nfs/callback_xdr.c
@@ -37,6 +37,9 @@
 #define CB_OP_RECALLSLOT_RES_MAXSZ	(CB_OP_HDR_RES_MAXSZ)
 #define CB_OP_NOTIFY_LOCK_RES_MAXSZ	(CB_OP_HDR_RES_MAXSZ)
 #endif /* CONFIG_NFS_V4_1 */
+#ifdef CONFIG_NFS_V4_2
+#define CB_OP_OFFLOAD_RES_MAXSZ		(CB_OP_HDR_RES_MAXSZ)
+#endif /* CONFIG_NFS_V4_2 */
 
 #define NFSDBG_FACILITY NFSDBG_CALLBACK
 
@@ -565,7 +568,72 @@ static __be32 decode_notify_lock_args(struct svc_rqst *rqstp, struct xdr_stream
 }
 
 #endif /* CONFIG_NFS_V4_1 */
+#ifdef CONFIG_NFS_V4_2
+static __be32 decode_write_response(struct xdr_stream *xdr,
+					struct cb_offloadargs *args)
+{
+	__be32 *p;
+	__be32 dummy;
+
+	/* skip the always zero field */
+	p = read_buf(xdr, 4);
+	if (unlikely(!p))
+		goto out;
+	dummy = ntohl(*p++);
+
+	/* decode count, stable_how, verifier */
+	p = xdr_inline_decode(xdr, 8 + 4);
+	if (unlikely(!p))
+		goto out;
+	p = xdr_decode_hyper(p, &args->wr_count);
+	args->wr_writeverf.committed = be32_to_cpup(p);
+	p = xdr_inline_decode(xdr, NFS4_VERIFIER_SIZE);
+	if (likely(p)) {
+		memcpy(&args->wr_writeverf.verifier.data[0], p,
+			NFS4_VERIFIER_SIZE);
+		return 0;
+	}
+out:
+	return htonl(NFS4ERR_RESOURCE);
+}
 
+static __be32 decode_offload_args(struct svc_rqst *rqstp,
+					struct xdr_stream *xdr,
+					struct cb_offloadargs *args)
+{
+	__be32 *p;
+	__be32 status;
+
+	/* decode fh */
+	status = decode_fh(xdr, &args->coa_fh);
+	if (unlikely(status != 0))
+		return status;
+
+	/* decode stateid */
+	status = decode_stateid(xdr, &args->coa_stateid);
+	if (unlikely(status != 0))
+		return status;
+
+	/* decode status */
+	p = read_buf(xdr, 4);
+	if (unlikely(!p))
+		goto out;
+	args->error = ntohl(*p++);
+	if (!args->error) {
+		status = decode_write_response(xdr, args);
+		if (unlikely(status != 0))
+			return status;
+	} else {
+		p = xdr_inline_decode(xdr, 8);
+		if (unlikely(!p))
+			goto out;
+		p = xdr_decode_hyper(p, &args->wr_count);
+	}
+	return 0;
+out:
+	return htonl(NFS4ERR_RESOURCE);
+}
+#endif /* CONFIG_NFS_V4_2 */
 static __be32 encode_string(struct xdr_stream *xdr, unsigned int len, const char *str)
 {
 	if (unlikely(xdr_stream_encode_opaque(xdr, str, len) < 0))
@@ -833,7 +901,10 @@ static void nfs4_cb_free_slot(struct cb_process_state *cps)
 	if (status != htonl(NFS4ERR_OP_ILLEGAL))
 		return status;
 
-	if (op_nr == OP_CB_OFFLOAD)
+	if (op_nr == OP_CB_OFFLOAD) {
+		*op = &callback_ops[op_nr];
+		return htonl(NFS_OK);
+	} else
 		return htonl(NFS4ERR_NOTSUPP);
 	return htonl(NFS4ERR_OP_ILLEGAL);
 }
@@ -1038,6 +1109,13 @@ static __be32 nfs4_callback_compound(struct svc_rqst *rqstp, void *argp, void *r
 		.res_maxsize = CB_OP_NOTIFY_LOCK_RES_MAXSZ,
 	},
 #endif /* CONFIG_NFS_V4_1 */
+#ifdef CONFIG_NFS_V4_2
+	[OP_CB_OFFLOAD] = {
+		.process_op = (callback_process_op_t)nfs4_callback_offload,
+		.decode_args = (callback_decode_arg_t)decode_offload_args,
+		.res_maxsize = CB_OP_OFFLOAD_RES_MAXSZ,
+	},
+#endif /* CONFIG_NFS_V4_2 */
 };
 
 /*
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [RFC v1 08/19] NFS OFFLOAD_STATUS xdr
  2017-03-02 16:01 [RFC v1 00/19] NFS support for inter and async COPY Olga Kornievskaia
                   ` (6 preceding siblings ...)
  2017-03-02 16:01 ` [RFC v1 07/19] NFS CB_OFFLOAD xdr Olga Kornievskaia
@ 2017-03-02 16:01 ` Olga Kornievskaia
  2017-03-02 16:01 ` [RFC v1 09/19] NFS OFFLOAD_STATUS op Olga Kornievskaia
                   ` (10 subsequent siblings)
  18 siblings, 0 replies; 56+ messages in thread
From: Olga Kornievskaia @ 2017-03-02 16:01 UTC (permalink / raw)
  To: Trond.Myklebust; +Cc: linux-nfs

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
---
 fs/nfs/nfs42xdr.c         | 87 +++++++++++++++++++++++++++++++++++++++++++++++
 fs/nfs/nfs4proc.c         |  1 +
 fs/nfs/nfs4xdr.c          |  1 +
 include/linux/nfs4.h      |  1 +
 include/linux/nfs_fs_sb.h |  1 +
 include/linux/nfs_xdr.h   | 12 +++++++
 6 files changed, 103 insertions(+)

diff --git a/fs/nfs/nfs42xdr.c b/fs/nfs/nfs42xdr.c
index fa0daaa..352af86 100644
--- a/fs/nfs/nfs42xdr.c
+++ b/fs/nfs/nfs42xdr.c
@@ -38,6 +38,11 @@
 					 1 + /* Support 1 cnr_source_server */\
 					 1 + /* nl4_type */ \
 					 1 + XDR_QUADLEN(NFS4_OPAQUE_LIMIT))
+#define encode_offload_status_maxsz	(op_encode_hdr_maxsz + \
+					 XDR_QUADLEN(NFS4_STATEID_SIZE))
+#define decode_offload_status_maxsz	(op_decode_hdr_maxsz + \
+					 2 /* osr_count */ + \
+					 1 + 1 /* osr_complete<1> */)
 #define encode_deallocate_maxsz		(op_encode_hdr_maxsz + \
 					 encode_fallocate_maxsz)
 #define decode_deallocate_maxsz		(op_decode_hdr_maxsz)
@@ -91,6 +96,12 @@
 #define NFS4_dec_copy_notify_sz		(compound_decode_hdr_maxsz + \
 					 decode_putfh_maxsz + \
 					 decode_copy_notify_maxsz)
+#define NFS4_enc_offload_status_sz	(compound_encode_hdr_maxsz + \
+					 encode_putfh_maxsz + \
+					 encode_offload_status_maxsz)
+#define NFS4_dec_offload_status_sz	(compound_decode_hdr_maxsz + \
+					 decode_putfh_maxsz + \
+					 decode_offload_status_maxsz)
 #define NFS4_enc_deallocate_sz		(compound_encode_hdr_maxsz + \
 					 encode_putfh_maxsz + \
 					 encode_deallocate_maxsz + \
@@ -201,6 +212,14 @@ static void encode_copy_notify(struct xdr_stream *xdr,
 	encode_nl4_server(xdr, &args->cna_dst);
 }
 
+static void encode_offload_status(struct xdr_stream *xdr,
+				  struct nfs42_offload_status_args *args,
+				  struct compound_hdr *hdr)
+{
+	encode_op_hdr(xdr, OP_OFFLOAD_STATUS, decode_offload_status_maxsz, hdr);
+	encode_nfs4_stateid(xdr, &args->osa_stateid);
+}
+
 static void encode_deallocate(struct xdr_stream *xdr,
 			      struct nfs42_falloc_args *args,
 			      struct compound_hdr *hdr)
@@ -320,6 +339,24 @@ static void nfs4_xdr_enc_copy_notify(struct rpc_rqst *req,
 }
 
 /*
+ * Encode OFFLOAD_STATUS request
+ */
+static void nfs4_xdr_enc_offload_status(struct rpc_rqst *req,
+					struct xdr_stream *xdr,
+					struct nfs42_offload_status_args *args)
+{
+	struct compound_hdr hdr = {
+		.minorversion = nfs4_xdr_minorversion(&args->osa_seq_args),
+	};
+
+	encode_compound_hdr(xdr, req, &hdr);
+	encode_sequence(xdr, &args->osa_seq_args, &hdr);
+	encode_putfh(xdr, args->osa_src_fh, &hdr);
+	encode_offload_status(xdr, args, &hdr);
+	encode_nops(&hdr);
+}
+
+/*
  * Encode DEALLOCATE request
  */
 static void nfs4_xdr_enc_deallocate(struct rpc_rqst *req,
@@ -567,6 +604,31 @@ static int decode_copy_notify(struct xdr_stream *xdr,
 	return -EIO;
 }
 
+static int decode_offload_status(struct xdr_stream *xdr,
+				 struct nfs42_offload_status_res *res)
+{
+	__be32 *p;
+	uint32_t count;
+	int status;
+
+	status = decode_op_hdr(xdr, OP_OFFLOAD_STATUS);
+	if (status)
+		return status;
+	p = xdr_inline_decode(xdr, 8 + 4);
+	if (unlikely(!p))
+		goto out_overflow;
+	p = xdr_decode_hyper(p, &res->osr_count);
+	count = be32_to_cpup(p++);
+	if (count) {
+		p = xdr_inline_decode(xdr, 4);
+		res->osr_status = be32_to_cpup(p);
+	}
+	return 0;
+out_overflow:
+	print_overflow_msg(__func__, xdr);
+	return -EIO;
+}
+
 static int decode_deallocate(struct xdr_stream *xdr, struct nfs42_falloc_res *res)
 {
 	return decode_op_hdr(xdr, OP_DEALLOCATE);
@@ -687,6 +749,31 @@ static int nfs4_xdr_dec_copy_notify(struct rpc_rqst *rqstp,
 }
 
 /*
+ * Decode OFFLOAD_STATUS response
+ */
+static int nfs4_xdr_dec_offload_status(struct rpc_rqst *rqstp,
+				       struct xdr_stream *xdr,
+				       struct nfs42_offload_status_res *res)
+{
+	struct compound_hdr hdr;
+	int status;
+
+	status = decode_compound_hdr(xdr, &hdr);
+	if (status)
+		goto out;
+	status = decode_sequence(xdr, &res->osr_seq_res, rqstp);
+	if (status)
+		goto out;
+	status = decode_putfh(xdr);
+	if (status)
+		goto out;
+	status = decode_offload_status(xdr, res);
+
+out:
+	return status;
+}
+
+/*
  * Decode DEALLOCATE request
  */
 static int nfs4_xdr_dec_deallocate(struct rpc_rqst *rqstp,
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 4633b26..9071652 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -9245,6 +9245,7 @@ static bool nfs4_match_stateid(const nfs4_stateid *s1,
 		| NFS_CAP_ALLOCATE
 		| NFS_CAP_COPY
 		| NFS_CAP_COPY_NOTIFY
+		| NFS_CAP_OFFLOAD_STATUS
 		| NFS_CAP_DEALLOCATE
 		| NFS_CAP_SEEK
 		| NFS_CAP_LAYOUTSTATS
diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
index 9def5c6..a01ff49 100644
--- a/fs/nfs/nfs4xdr.c
+++ b/fs/nfs/nfs4xdr.c
@@ -7574,6 +7574,7 @@ struct rpc_procinfo	nfs4_procedures[] = {
 	PROC(CLONE,		enc_clone,		dec_clone),
 	PROC(COPY,		enc_copy,		dec_copy),
 	PROC(COPY_NOTIFY,	enc_copy_notify,	dec_copy_notify),
+	PROC(OFFLOAD_STATUS,	enc_offload_status,	dec_offload_status),
 #endif /* CONFIG_NFS_V4_2 */
 };
 
diff --git a/include/linux/nfs4.h b/include/linux/nfs4.h
index e4be278..0ed1026 100644
--- a/include/linux/nfs4.h
+++ b/include/linux/nfs4.h
@@ -525,6 +525,7 @@ enum {
 	NFSPROC4_CLNT_CLONE,
 	NFSPROC4_CLNT_COPY,
 	NFSPROC4_CLNT_COPY_NOTIFY,
+	NFSPROC4_CLNT_OFFLOAD_STATUS,
 };
 
 /* nfs41 types */
diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
index a538b69..4bc9a13 100644
--- a/include/linux/nfs_fs_sb.h
+++ b/include/linux/nfs_fs_sb.h
@@ -251,5 +251,6 @@ struct nfs_server {
 #define NFS_CAP_CLONE		(1U << 23)
 #define NFS_CAP_COPY		(1U << 24)
 #define NFS_CAP_COPY_NOTIFY	(1U << 25)
+#define NFS_CAP_OFFLOAD_STATUS	(1U << 26)
 
 #endif
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index 9d4ea77..65b4323 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -1405,6 +1405,18 @@ struct nfs42_copy_notify_res {
 	struct nl4_servers	cnr_src;
 };
 
+struct nfs42_offload_status_args {
+	struct nfs4_sequence_args	osa_seq_args;
+	struct nfs_fh			*osa_src_fh;
+	nfs4_stateid			osa_stateid;
+};
+
+struct nfs42_offload_status_res {
+	struct nfs4_sequence_res	osr_seq_res;
+	uint64_t			osr_count;
+	int				osr_status;
+};
+
 struct nfs42_seek_args {
 	struct nfs4_sequence_args	seq_args;
 
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [RFC v1 09/19] NFS OFFLOAD_STATUS op
  2017-03-02 16:01 [RFC v1 00/19] NFS support for inter and async COPY Olga Kornievskaia
                   ` (7 preceding siblings ...)
  2017-03-02 16:01 ` [RFC v1 08/19] NFS OFFLOAD_STATUS xdr Olga Kornievskaia
@ 2017-03-02 16:01 ` Olga Kornievskaia
  2017-03-02 16:01 ` [RFC v1 10/19] NFS OFFLOAD_CANCEL xdr Olga Kornievskaia
                   ` (9 subsequent siblings)
  18 siblings, 0 replies; 56+ messages in thread
From: Olga Kornievskaia @ 2017-03-02 16:01 UTC (permalink / raw)
  To: Trond.Myklebust; +Cc: linux-nfs

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
---
 fs/nfs/nfs42.h     |  2 ++
 fs/nfs/nfs42proc.c | 42 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 44 insertions(+)

diff --git a/fs/nfs/nfs42.h b/fs/nfs/nfs42.h
index 820fd63..47ffbb0 100644
--- a/fs/nfs/nfs42.h
+++ b/fs/nfs/nfs42.h
@@ -23,6 +23,8 @@ int nfs42_proc_layoutstats_generic(struct nfs_server *,
 int nfs42_proc_clone(struct file *, struct file *, loff_t, loff_t, loff_t);
 int nfs42_proc_copy_notify(struct file *, struct file *,
 			   struct nfs42_copy_notify_res *);
+int nfs42_proc_offload_status(struct file *, nfs4_stateid *,
+			      struct nfs42_offload_status_res *);
 #endif /* CONFIG_NFS_V4_2 */
 
 #endif /* __LINUX_FS_NFS_NFS4_2_H */
diff --git a/fs/nfs/nfs42proc.c b/fs/nfs/nfs42proc.c
index ade2938..d6de5ff 100644
--- a/fs/nfs/nfs42proc.c
+++ b/fs/nfs/nfs42proc.c
@@ -363,6 +363,48 @@ int nfs42_proc_copy_notify(struct file *src, struct file *dst,
 	return status;
 }
 
+int _nfs42_proc_offload_status(struct file *dst, nfs4_stateid *stateid,
+				struct nfs42_offload_status_res *res)
+{
+	struct nfs42_offload_status_args args;
+	struct nfs_server *dst_server = NFS_SERVER(file_inode(dst));
+	struct rpc_message msg = {
+		.rpc_proc = &nfs4_procedures[NFSPROC4_CLNT_OFFLOAD_STATUS],
+		.rpc_resp = res,
+	};
+	int status;
+
+	args.osa_src_fh = NFS_FH(file_inode(dst));
+	memcpy(&args.osa_stateid, stateid, sizeof(args.osa_stateid));
+	msg.rpc_argp = &args;
+	status = nfs4_call_sync(dst_server->client, dst_server, &msg,
+				&args.osa_seq_args, &res->osr_seq_res, 0);
+	if (status == -ENOTSUPP)
+		dst_server->caps &= ~NFS_CAP_OFFLOAD_STATUS;
+
+	return status;
+}
+
+int nfs42_proc_offload_status(struct file *dst, nfs4_stateid *stateid,
+				struct nfs42_offload_status_res *res)
+{
+	struct nfs_server *dst_server = NFS_SERVER(file_inode(dst));
+	struct nfs4_exception exception = { };
+	int status;
+
+	if (!(dst_server->caps & NFS_CAP_OFFLOAD_STATUS))
+		return -EOPNOTSUPP;
+
+	do {
+		status = _nfs42_proc_offload_status(dst, stateid, res);
+		if (status == -ENOTSUPP)
+			return -EOPNOTSUPP;
+		status = nfs4_handle_exception(dst_server, status, &exception);
+	} while (exception.retry);
+
+	return status;
+}
+
 static loff_t _nfs42_proc_llseek(struct file *filep,
 		struct nfs_lock_context *lock, loff_t offset, int whence)
 {
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [RFC v1 10/19] NFS OFFLOAD_CANCEL xdr
  2017-03-02 16:01 [RFC v1 00/19] NFS support for inter and async COPY Olga Kornievskaia
                   ` (8 preceding siblings ...)
  2017-03-02 16:01 ` [RFC v1 09/19] NFS OFFLOAD_STATUS op Olga Kornievskaia
@ 2017-03-02 16:01 ` Olga Kornievskaia
  2017-03-02 16:01 ` [RFC v1 11/19] NFS COPY xdr handle async reply Olga Kornievskaia
                   ` (8 subsequent siblings)
  18 siblings, 0 replies; 56+ messages in thread
From: Olga Kornievskaia @ 2017-03-02 16:01 UTC (permalink / raw)
  To: Trond.Myklebust; +Cc: linux-nfs

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
---
 fs/nfs/nfs42xdr.c         | 66 +++++++++++++++++++++++++++++++++++++++++++++++
 fs/nfs/nfs4proc.c         |  1 +
 fs/nfs/nfs4xdr.c          |  1 +
 include/linux/nfs4.h      |  1 +
 include/linux/nfs_fs_sb.h |  1 +
 5 files changed, 70 insertions(+)

diff --git a/fs/nfs/nfs42xdr.c b/fs/nfs/nfs42xdr.c
index 352af86..d782606 100644
--- a/fs/nfs/nfs42xdr.c
+++ b/fs/nfs/nfs42xdr.c
@@ -43,6 +43,9 @@
 #define decode_offload_status_maxsz	(op_decode_hdr_maxsz + \
 					 2 /* osr_count */ + \
 					 1 + 1 /* osr_complete<1> */)
+#define encode_offload_cancel_maxsz	(op_encode_hdr_maxsz + \
+					 XDR_QUADLEN(NFS4_STATEID_SIZE))
+#define decode_offload_cancel_maxsz	(op_decode_hdr_maxsz)
 #define encode_deallocate_maxsz		(op_encode_hdr_maxsz + \
 					 encode_fallocate_maxsz)
 #define decode_deallocate_maxsz		(op_decode_hdr_maxsz)
@@ -102,6 +105,12 @@
 #define NFS4_dec_offload_status_sz	(compound_decode_hdr_maxsz + \
 					 decode_putfh_maxsz + \
 					 decode_offload_status_maxsz)
+#define NFS4_enc_offload_cancel_sz	(compound_encode_hdr_maxsz + \
+					 encode_putfh_maxsz + \
+					 encode_offload_cancel_maxsz)
+#define NFS4_dec_offload_cancel_sz	(compound_decode_hdr_maxsz + \
+					 decode_putfh_maxsz + \
+					 decode_offload_cancel_maxsz)
 #define NFS4_enc_deallocate_sz		(compound_encode_hdr_maxsz + \
 					 encode_putfh_maxsz + \
 					 encode_deallocate_maxsz + \
@@ -220,6 +229,14 @@ static void encode_offload_status(struct xdr_stream *xdr,
 	encode_nfs4_stateid(xdr, &args->osa_stateid);
 }
 
+static void encode_offload_cancel(struct xdr_stream *xdr,
+				  struct nfs42_offload_status_args *args,
+				  struct compound_hdr *hdr)
+{
+	encode_op_hdr(xdr, OP_OFFLOAD_CANCEL, decode_offload_cancel_maxsz, hdr);
+	encode_nfs4_stateid(xdr, &args->osa_stateid);
+}
+
 static void encode_deallocate(struct xdr_stream *xdr,
 			      struct nfs42_falloc_args *args,
 			      struct compound_hdr *hdr)
@@ -357,6 +374,24 @@ static void nfs4_xdr_enc_offload_status(struct rpc_rqst *req,
 }
 
 /*
+ * Encode OFFLOAD_CANEL request
+ */
+static void nfs4_xdr_enc_offload_cancel(struct rpc_rqst *req,
+					struct xdr_stream *xdr,
+					struct nfs42_offload_status_args *args)
+{
+	struct compound_hdr hdr = {
+		.minorversion = nfs4_xdr_minorversion(&args->osa_seq_args),
+	};
+
+	encode_compound_hdr(xdr, req, &hdr);
+	encode_sequence(xdr, &args->osa_seq_args, &hdr);
+	encode_putfh(xdr, args->osa_src_fh, &hdr);
+	encode_offload_cancel(xdr, args, &hdr);
+	encode_nops(&hdr);
+}
+
+/*
  * Encode DEALLOCATE request
  */
 static void nfs4_xdr_enc_deallocate(struct rpc_rqst *req,
@@ -629,6 +664,12 @@ static int decode_offload_status(struct xdr_stream *xdr,
 	return -EIO;
 }
 
+static int decode_offload_cancel(struct xdr_stream *xdr,
+				 struct nfs42_offload_status_res *res)
+{
+	return decode_op_hdr(xdr, OP_OFFLOAD_CANCEL);
+}
+
 static int decode_deallocate(struct xdr_stream *xdr, struct nfs42_falloc_res *res)
 {
 	return decode_op_hdr(xdr, OP_DEALLOCATE);
@@ -774,6 +815,31 @@ static int nfs4_xdr_dec_offload_status(struct rpc_rqst *rqstp,
 }
 
 /*
+ * Decode OFFLOAD_CANCEL response
+ */
+static int nfs4_xdr_dec_offload_cancel(struct rpc_rqst *rqstp,
+				       struct xdr_stream *xdr,
+				       struct nfs42_offload_status_res *res)
+{
+	struct compound_hdr hdr;
+	int status;
+
+	status = decode_compound_hdr(xdr, &hdr);
+	if (status)
+		goto out;
+	status = decode_sequence(xdr, &res->osr_seq_res, rqstp);
+	if (status)
+		goto out;
+	status = decode_putfh(xdr);
+	if (status)
+		goto out;
+	status = decode_offload_cancel(xdr, res);
+
+out:
+	return status;
+}
+
+/*
  * Decode DEALLOCATE request
  */
 static int nfs4_xdr_dec_deallocate(struct rpc_rqst *rqstp,
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 9071652..59be0f7 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -9246,6 +9246,7 @@ static bool nfs4_match_stateid(const nfs4_stateid *s1,
 		| NFS_CAP_COPY
 		| NFS_CAP_COPY_NOTIFY
 		| NFS_CAP_OFFLOAD_STATUS
+		| NFS_CAP_OFFLOAD_CANCEL
 		| NFS_CAP_DEALLOCATE
 		| NFS_CAP_SEEK
 		| NFS_CAP_LAYOUTSTATS
diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
index a01ff49..697edf3 100644
--- a/fs/nfs/nfs4xdr.c
+++ b/fs/nfs/nfs4xdr.c
@@ -7575,6 +7575,7 @@ struct rpc_procinfo	nfs4_procedures[] = {
 	PROC(COPY,		enc_copy,		dec_copy),
 	PROC(COPY_NOTIFY,	enc_copy_notify,	dec_copy_notify),
 	PROC(OFFLOAD_STATUS,	enc_offload_status,	dec_offload_status),
+	PROC(OFFLOAD_CANCEL,    enc_offload_cancel,     dec_offload_cancel),
 #endif /* CONFIG_NFS_V4_2 */
 };
 
diff --git a/include/linux/nfs4.h b/include/linux/nfs4.h
index 0ed1026..e43c106 100644
--- a/include/linux/nfs4.h
+++ b/include/linux/nfs4.h
@@ -526,6 +526,7 @@ enum {
 	NFSPROC4_CLNT_COPY,
 	NFSPROC4_CLNT_COPY_NOTIFY,
 	NFSPROC4_CLNT_OFFLOAD_STATUS,
+	NFSPROC4_CLNT_OFFLOAD_CANCEL,
 };
 
 /* nfs41 types */
diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
index 4bc9a13..488ff40 100644
--- a/include/linux/nfs_fs_sb.h
+++ b/include/linux/nfs_fs_sb.h
@@ -252,5 +252,6 @@ struct nfs_server {
 #define NFS_CAP_COPY		(1U << 24)
 #define NFS_CAP_COPY_NOTIFY	(1U << 25)
 #define NFS_CAP_OFFLOAD_STATUS	(1U << 26)
+#define NFS_CAP_OFFLOAD_CANCEL	(1U << 27)
 
 #endif
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [RFC v1 11/19] NFS COPY xdr handle async reply
  2017-03-02 16:01 [RFC v1 00/19] NFS support for inter and async COPY Olga Kornievskaia
                   ` (9 preceding siblings ...)
  2017-03-02 16:01 ` [RFC v1 10/19] NFS OFFLOAD_CANCEL xdr Olga Kornievskaia
@ 2017-03-02 16:01 ` Olga Kornievskaia
  2017-03-02 16:01 ` [RFC v1 12/19] NFS add support for asynchronous COPY Olga Kornievskaia
                   ` (7 subsequent siblings)
  18 siblings, 0 replies; 56+ messages in thread
From: Olga Kornievskaia @ 2017-03-02 16:01 UTC (permalink / raw)
  To: Trond.Myklebust; +Cc: linux-nfs

If server returns async reply, it must include a callback stateid,
wr_callback_id in the write_response4.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
---
 fs/nfs/nfs42xdr.c       | 22 ++++++++++++----------
 include/linux/nfs_xdr.h |  1 +
 2 files changed, 13 insertions(+), 10 deletions(-)

diff --git a/fs/nfs/nfs42xdr.c b/fs/nfs/nfs42xdr.c
index d782606..d5393ef 100644
--- a/fs/nfs/nfs42xdr.c
+++ b/fs/nfs/nfs42xdr.c
@@ -480,21 +480,23 @@ static int decode_write_response(struct xdr_stream *xdr,
 				 struct nfs42_write_res *res)
 {
 	__be32 *p;
+	int status, count;
 
-	p = xdr_inline_decode(xdr, 4 + 8 + 4);
+	p = xdr_inline_decode(xdr, 4);
 	if (unlikely(!p))
 		goto out_overflow;
-
-	/*
-	 * We never use asynchronous mode, so warn if a server returns
-	 * a stateid.
-	 */
-	if (unlikely(*p != 0)) {
-		pr_err_once("%s: server has set unrequested "
-				"asynchronous mode\n", __func__);
+	count = be32_to_cpup(p);
+	if (count > 1)
 		return -EREMOTEIO;
+	else if (count == 1) {
+		status = decode_opaque_fixed(xdr, &res->stateid,
+				NFS4_STATEID_SIZE);
+		if (unlikely(status))
+			goto out_overflow;
 	}
-	p++;
+	p = xdr_inline_decode(xdr, 8 + 4);
+	if (unlikely(!p))
+		goto out_overflow;
 	p = xdr_decode_hyper(p, &res->count);
 	res->verifier.committed = be32_to_cpup(p);
 	return decode_verifier(xdr, &res->verifier.verifier);
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index 65b4323..2ed335e 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -1376,6 +1376,7 @@ struct nfs42_copy_args {
 };
 
 struct nfs42_write_res {
+	nfs4_stateid		stateid;
 	u64			count;
 	struct nfs_writeverf	verifier;
 };
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [RFC v1 12/19] NFS add support for asynchronous COPY
  2017-03-02 16:01 [RFC v1 00/19] NFS support for inter and async COPY Olga Kornievskaia
                   ` (10 preceding siblings ...)
  2017-03-02 16:01 ` [RFC v1 11/19] NFS COPY xdr handle async reply Olga Kornievskaia
@ 2017-03-02 16:01 ` Olga Kornievskaia
  2017-03-02 16:01 ` [RFC v1 13/19] NFS handle COPY reply CB_OFFLOAD call race Olga Kornievskaia
                   ` (6 subsequent siblings)
  18 siblings, 0 replies; 56+ messages in thread
From: Olga Kornievskaia @ 2017-03-02 16:01 UTC (permalink / raw)
  To: Trond.Myklebust; +Cc: linux-nfs

Change xdr to always send COPY asynchronously.

Keep the list copies send in a list under a server structure.
Once copy is sent, it waits on a completion structure that will
be signalled by the callback thread that receives CB_OFFLOAD.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
---
 fs/nfs/callback_proc.c    | 34 ++++++++++++++++++++++++++++++++++
 fs/nfs/client.c           |  1 +
 fs/nfs/nfs42proc.c        | 46 ++++++++++++++++++++++++++++++++++++++++++----
 fs/nfs/nfs42xdr.c         |  2 +-
 include/linux/nfs_fs.h    |  9 +++++++++
 include/linux/nfs_fs_sb.h |  1 +
 6 files changed, 88 insertions(+), 5 deletions(-)

diff --git a/fs/nfs/callback_proc.c b/fs/nfs/callback_proc.c
index b68b803..c1b081c 100644
--- a/fs/nfs/callback_proc.c
+++ b/fs/nfs/callback_proc.c
@@ -681,9 +681,43 @@ __be32 nfs4_callback_notify_lock(struct cb_notify_lock_args *args, void *dummy,
 }
 #endif /* CONFIG_NFS_V4_1 */
 #ifdef CONFIG_NFS_V4_2
+static void nfs4_copy_cb_args(struct nfs4_copy_state *cp_state,
+				struct cb_offloadargs *args)
+{
+	cp_state->count = args->wr_count;
+	cp_state->error = args->error;
+	if (!args->error) {
+		cp_state->verf.committed = args->wr_writeverf.committed;
+		memcpy(&cp_state->verf.verifier.data[0],
+			&args->wr_writeverf.verifier.data[0],
+			NFS4_VERIFIER_SIZE);
+	}
+}
+
 __be32 nfs4_callback_offload(struct cb_offloadargs *args, void *dummy,
 				struct cb_process_state *cps)
 {
+	struct nfs_server *server;
+	struct nfs4_copy_state *copy;
+
+	rcu_read_lock();
+	list_for_each_entry_rcu(server, &cps->clp->cl_superblocks,
+				client_link) {
+		spin_lock(&server->nfs_client->cl_lock);
+		list_for_each_entry(copy, &server->ss_copies, copies) {
+			if (memcmp(args->coa_stateid.other,
+					copy->stateid.other,
+					sizeof(args->coa_stateid.other)))
+				continue;
+			nfs4_copy_cb_args(copy, args);
+			complete(&copy->completion);
+			spin_unlock(&server->nfs_client->cl_lock);
+			goto out;
+		}
+		spin_unlock(&server->nfs_client->cl_lock);
+	}
+out:
+	rcu_read_unlock();
 	return 0;
 }
 #endif /* CONFIG_NFS_V4_2 */
diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index 91a8d61..a72cd1d 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -885,6 +885,7 @@ struct nfs_server *nfs_alloc_server(void)
 	INIT_LIST_HEAD(&server->delegations);
 	INIT_LIST_HEAD(&server->layouts);
 	INIT_LIST_HEAD(&server->state_owners_lru);
+	INIT_LIST_HEAD(&server->ss_copies);
 
 	atomic_set(&server->active, 0);
 
diff --git a/fs/nfs/nfs42proc.c b/fs/nfs/nfs42proc.c
index d6de5ff..e45e554 100644
--- a/fs/nfs/nfs42proc.c
+++ b/fs/nfs/nfs42proc.c
@@ -169,6 +169,38 @@ int nfs42_proc_deallocate(struct file *filep, loff_t offset, loff_t len)
 	return err;
 }
 
+static int handle_async_copy(struct nfs42_copy_res *res,
+			     struct nfs_server *server,
+			     struct file *src,
+			     struct file *dst,
+			     nfs4_stateid *src_stateid,
+			     uint64_t *ret_count)
+{
+	struct nfs4_copy_state *copy;
+	int status;
+
+	copy = kzalloc(sizeof(struct nfs4_copy_state), GFP_NOFS);
+	if (!copy)
+		return -ENOMEM;
+	memcpy(&copy->stateid, &res->write_res.stateid, NFS4_STATEID_SIZE);
+	init_completion(&copy->completion);
+
+	spin_lock(&server->nfs_client->cl_lock);
+	list_add_tail(&copy->copies, &server->ss_copies);
+	spin_unlock(&server->nfs_client->cl_lock);
+
+	wait_for_completion_interruptible(&copy->completion);
+	spin_lock(&server->nfs_client->cl_lock);
+	list_del_init(&copy->copies);
+	spin_unlock(&server->nfs_client->cl_lock);
+	*ret_count = copy->count;
+	status = -copy->error;
+	if (copy->count && copy->verf.committed != NFS_FILE_SYNC)
+		status = nfs_commit_file(dst, &copy->verf.verifier);
+	kfree(copy);
+	return status;
+}
+
 static ssize_t _nfs42_proc_copy(struct file *src,
 				struct nfs_lock_context *src_lock,
 				struct file *dst,
@@ -189,6 +221,7 @@ static ssize_t _nfs42_proc_copy(struct file *src,
 	loff_t pos_dst = args->dst_pos;
 	size_t count = args->count;
 	int status;
+	uint64_t ret_count;
 
 	if (nss) {
 		args->cp_src = nss;
@@ -220,16 +253,21 @@ static ssize_t _nfs42_proc_copy(struct file *src,
 	if (status)
 		return status;
 
-	if (res->write_res.verifier.committed != NFS_FILE_SYNC) {
+	ret_count = res->write_res.count;
+	if (!res->synchronous) {
+		status = handle_async_copy(res, server, src, dst,
+				&args->src_stateid, &ret_count);
+		if (status)
+			return status;
+	} else if (res->write_res.verifier.committed != NFS_FILE_SYNC) {
 		status = nfs_commit_file(dst, &res->write_res.verifier.verifier);
 		if (status)
 			return status;
 	}
 
-	truncate_pagecache_range(dst_inode, pos_dst,
-				 pos_dst + res->write_res.count);
+	truncate_pagecache_range(dst_inode, pos_dst, pos_dst + ret_count);
 
-	return res->write_res.count;
+	return ret_count;
 }
 
 ssize_t nfs42_proc_copy(struct file *src, loff_t pos_src,
diff --git a/fs/nfs/nfs42xdr.c b/fs/nfs/nfs42xdr.c
index d5393ef..9e388e1 100644
--- a/fs/nfs/nfs42xdr.c
+++ b/fs/nfs/nfs42xdr.c
@@ -199,7 +199,7 @@ static void encode_copy(struct xdr_stream *xdr,
 	encode_uint64(xdr, args->count);
 
 	encode_uint32(xdr, 1); /* consecutive = true */
-	encode_uint32(xdr, 1); /* synchronous = true */
+	encode_uint32(xdr, 0); /* synchronous = false */
 	if (args->cp_src == NULL) { /* intra-ssc */
 		encode_uint32(xdr, 0); /* no src server list */
 		return;
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index f1da8c8..e174cba 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -182,6 +182,15 @@ struct nfs_inode {
 	struct inode		vfs_inode;
 };
 
+struct nfs4_copy_state {
+	struct list_head	copies;
+	nfs4_stateid		stateid;
+	struct completion	completion;
+	uint64_t		count;
+	struct nfs_writeverf	verf;
+	int			error;
+};
+
 /*
  * Cache validity bit flags
  */
diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
index 488ff40..d2c33a8 100644
--- a/include/linux/nfs_fs_sb.h
+++ b/include/linux/nfs_fs_sb.h
@@ -206,6 +206,7 @@ struct nfs_server {
 	struct list_head	state_owners_lru;
 	struct list_head	layouts;
 	struct list_head	delegations;
+	struct list_head	ss_copies;
 
 	unsigned long		mig_gen;
 	unsigned long		mig_status;
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [RFC v1 13/19] NFS handle COPY reply CB_OFFLOAD call race
  2017-03-02 16:01 [RFC v1 00/19] NFS support for inter and async COPY Olga Kornievskaia
                   ` (11 preceding siblings ...)
  2017-03-02 16:01 ` [RFC v1 12/19] NFS add support for asynchronous COPY Olga Kornievskaia
@ 2017-03-02 16:01 ` Olga Kornievskaia
  2017-03-02 16:01 ` [RFC v1 14/19] NFS send OFFLOAD_CANCEL when COPY killed Olga Kornievskaia
                   ` (5 subsequent siblings)
  18 siblings, 0 replies; 56+ messages in thread
From: Olga Kornievskaia @ 2017-03-02 16:01 UTC (permalink / raw)
  To: Trond.Myklebust; +Cc: linux-nfs

It's possible that server replies back with CB_OFFLOAD call and
COPY reply at the same time such that client will process
CB_OFFLOAD before reply to COPY. For that keep a list of pending
callback stateids received and them before waiting on completion
check the pending list.

Cleanup any pending copies on the client shutdown.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
---
 fs/nfs/callback_proc.c    | 17 ++++++++++++++---
 fs/nfs/nfs42proc.c        | 22 ++++++++++++++++++++--
 fs/nfs/nfs4client.c       | 15 +++++++++++++++
 include/linux/nfs_fs_sb.h |  1 +
 4 files changed, 50 insertions(+), 5 deletions(-)

diff --git a/fs/nfs/callback_proc.c b/fs/nfs/callback_proc.c
index c1b081c..01bcb4b 100644
--- a/fs/nfs/callback_proc.c
+++ b/fs/nfs/callback_proc.c
@@ -699,11 +699,12 @@ __be32 nfs4_callback_offload(struct cb_offloadargs *args, void *dummy,
 {
 	struct nfs_server *server;
 	struct nfs4_copy_state *copy;
+	bool found = false;
 
+	spin_lock(&cps->clp->cl_lock);
 	rcu_read_lock();
 	list_for_each_entry_rcu(server, &cps->clp->cl_superblocks,
 				client_link) {
-		spin_lock(&server->nfs_client->cl_lock);
 		list_for_each_entry(copy, &server->ss_copies, copies) {
 			if (memcmp(args->coa_stateid.other,
 					copy->stateid.other,
@@ -711,13 +712,23 @@ __be32 nfs4_callback_offload(struct cb_offloadargs *args, void *dummy,
 				continue;
 			nfs4_copy_cb_args(copy, args);
 			complete(&copy->completion);
-			spin_unlock(&server->nfs_client->cl_lock);
+			found = true;
 			goto out;
 		}
-		spin_unlock(&server->nfs_client->cl_lock);
 	}
 out:
 	rcu_read_unlock();
+	if (!found) {
+		copy = kzalloc(sizeof(struct nfs4_copy_state), GFP_NOFS);
+		if (!copy) {
+			spin_unlock(&cps->clp->cl_lock);
+			return -ENOMEM;
+		}
+		memcpy(&copy->stateid, &args->coa_stateid, NFS4_STATEID_SIZE);
+		nfs4_copy_cb_args(copy, args);
+		list_add_tail(&copy->copies, &cps->clp->pending_cb_stateids);
+	}
+	spin_unlock(&cps->clp->cl_lock);
 	return 0;
 }
 #endif /* CONFIG_NFS_V4_2 */
diff --git a/fs/nfs/nfs42proc.c b/fs/nfs/nfs42proc.c
index e45e554..cd9d955 100644
--- a/fs/nfs/nfs42proc.c
+++ b/fs/nfs/nfs42proc.c
@@ -178,14 +178,31 @@ static int handle_async_copy(struct nfs42_copy_res *res,
 {
 	struct nfs4_copy_state *copy;
 	int status;
+	bool found_pending = false;
+
+	spin_lock(&server->nfs_client->cl_lock);
+	list_for_each_entry(copy, &server->nfs_client->pending_cb_stateids,
+				copies) {
+		if (memcmp(&res->write_res.stateid, &copy->stateid,
+				NFS4_STATEID_SIZE))
+			continue;
+		found_pending = true;
+		list_del(&copy->copies);
+		break;
+	}
+	if (found_pending) {
+		spin_unlock(&server->nfs_client->cl_lock);
+		goto out;
+	}
 
 	copy = kzalloc(sizeof(struct nfs4_copy_state), GFP_NOFS);
-	if (!copy)
+	if (!copy) {
+		spin_unlock(&server->nfs_client->cl_lock);
 		return -ENOMEM;
+	}
 	memcpy(&copy->stateid, &res->write_res.stateid, NFS4_STATEID_SIZE);
 	init_completion(&copy->completion);
 
-	spin_lock(&server->nfs_client->cl_lock);
 	list_add_tail(&copy->copies, &server->ss_copies);
 	spin_unlock(&server->nfs_client->cl_lock);
 
@@ -193,6 +210,7 @@ static int handle_async_copy(struct nfs42_copy_res *res,
 	spin_lock(&server->nfs_client->cl_lock);
 	list_del_init(&copy->copies);
 	spin_unlock(&server->nfs_client->cl_lock);
+out:
 	*ret_count = copy->count;
 	status = -copy->error;
 	if (copy->count && copy->verf.committed != NFS_FILE_SYNC)
diff --git a/fs/nfs/nfs4client.c b/fs/nfs/nfs4client.c
index 5ae9d64..bebb97e1 100644
--- a/fs/nfs/nfs4client.c
+++ b/fs/nfs/nfs4client.c
@@ -156,9 +156,23 @@ struct rpc_clnt *
 	}
 }
 
+static void
+nfs4_cleanup_callback(struct nfs_client *clp)
+{
+	struct nfs4_copy_state *cp_state;
+
+	while (!list_empty(&clp->pending_cb_stateids)) {
+		cp_state = list_entry(clp->pending_cb_stateids.next,
+					struct nfs4_copy_state, copies);
+		list_del(&cp_state->copies);
+		kfree(cp_state);
+	}
+}
+
 void nfs41_shutdown_client(struct nfs_client *clp)
 {
 	if (nfs4_has_session(clp)) {
+		nfs4_cleanup_callback(clp);
 		nfs4_shutdown_ds_clients(clp);
 		nfs4_destroy_session(clp->cl_session);
 		nfs4_destroy_clientid(clp);
@@ -202,6 +216,7 @@ struct nfs_client *nfs4_alloc_client(const struct nfs_client_initdata *cl_init)
 #if IS_ENABLED(CONFIG_NFS_V4_1)
 	init_waitqueue_head(&clp->cl_lock_waitq);
 #endif
+	INIT_LIST_HEAD(&clp->pending_cb_stateids);
 	return clp;
 
 error:
diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
index d2c33a8..6a35498 100644
--- a/include/linux/nfs_fs_sb.h
+++ b/include/linux/nfs_fs_sb.h
@@ -118,6 +118,7 @@ struct nfs_client {
 #endif
 
 	struct net		*cl_net;
+	struct list_head	pending_cb_stateids;
 };
 
 /*
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [RFC v1 14/19] NFS send OFFLOAD_CANCEL when COPY killed
  2017-03-02 16:01 [RFC v1 00/19] NFS support for inter and async COPY Olga Kornievskaia
                   ` (12 preceding siblings ...)
  2017-03-02 16:01 ` [RFC v1 13/19] NFS handle COPY reply CB_OFFLOAD call race Olga Kornievskaia
@ 2017-03-02 16:01 ` Olga Kornievskaia
  2017-03-02 16:01 ` [RFC v1 15/19] NFS make COPY synchronous xdr configurable Olga Kornievskaia
                   ` (4 subsequent siblings)
  18 siblings, 0 replies; 56+ messages in thread
From: Olga Kornievskaia @ 2017-03-02 16:01 UTC (permalink / raw)
  To: Trond.Myklebust; +Cc: linux-nfs

When COPY is killed send OFFLOAD_CANCEL to both destination and
source servers.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
---
 fs/nfs/nfs42proc.c | 95 ++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 93 insertions(+), 2 deletions(-)

diff --git a/fs/nfs/nfs42proc.c b/fs/nfs/nfs42proc.c
index cd9d955..7a1b2ae 100644
--- a/fs/nfs/nfs42proc.c
+++ b/fs/nfs/nfs42proc.c
@@ -19,6 +19,8 @@
 
 #define NFSDBG_FACILITY NFSDBG_PROC
 
+static int nfs42_do_offload_cancel_async(struct file *dst,
+				nfs4_stateid *stateid);
 static int nfs42_set_open_stateid(nfs4_stateid *dst, struct file *file)
 {
 	struct nfs_open_context *ctx;
@@ -206,10 +208,16 @@ static int handle_async_copy(struct nfs42_copy_res *res,
 	list_add_tail(&copy->copies, &server->ss_copies);
 	spin_unlock(&server->nfs_client->cl_lock);
 
-	wait_for_completion_interruptible(&copy->completion);
+	status = wait_for_completion_interruptible(&copy->completion);
 	spin_lock(&server->nfs_client->cl_lock);
 	list_del_init(&copy->copies);
 	spin_unlock(&server->nfs_client->cl_lock);
+	if (status == -ERESTARTSYS) {
+		nfs42_do_offload_cancel_async(dst, &copy->stateid);
+		nfs42_do_offload_cancel_async(src, src_stateid);
+		kfree(copy);
+		return status;
+	}
 out:
 	*ret_count = copy->count;
 	status = -copy->error;
@@ -266,8 +274,13 @@ static ssize_t _nfs42_proc_copy(struct file *src,
 
 	status = nfs4_call_sync(server->client, server, &msg,
 				&args->seq_args, &res->seq_res, 0);
-	if (status == -ENOTSUPP)
+	switch (status) {
+	case -ENOTSUPP:
 		server->caps &= ~NFS_CAP_COPY;
+		break;
+	case -ERESTARTSYS:
+		nfs42_do_offload_cancel_async(src, &args->src_stateid);
+	}
 	if (status)
 		return status;
 
@@ -419,6 +432,84 @@ int nfs42_proc_copy_notify(struct file *src, struct file *dst,
 	return status;
 }
 
+struct nfs42_offloadcancel_data {
+	struct nfs_server *seq_server;
+	struct nfs42_offload_status_args args;
+	struct nfs42_offload_status_res res;
+};
+
+static void nfs42_offload_cancel_prepare(struct rpc_task *task, void *calldata)
+{
+	struct nfs42_offloadcancel_data *data = calldata;
+	struct nfs4_session *session = nfs4_get_session(data->seq_server);
+
+	nfs41_setup_sequence(session, &data->args.osa_seq_args,
+				&data->res.osr_seq_res, task);
+}
+
+static void nfs42_offload_cancel_done(struct rpc_task *task, void *calldata)
+{
+	struct nfs42_offloadcancel_data *data = calldata;
+
+	nfs41_sequence_done(task, &data->res.osr_seq_res);
+	kfree(data);
+}
+
+static void nfs42_free_offloadcancel_data(void *data)
+{
+	kfree(data);
+}
+
+static const struct rpc_call_ops nfs42_offload_cancel_ops = {
+	.rpc_call_prepare = nfs42_offload_cancel_prepare,
+	.rpc_call_done = nfs42_offload_cancel_done,
+	.rpc_release = nfs42_free_offloadcancel_data,
+};
+
+static int nfs42_do_offload_cancel_async(struct file *dst,
+				nfs4_stateid *stateid)
+{
+	struct nfs_server *dst_server = NFS_SERVER(file_inode(dst));
+	struct nfs42_offloadcancel_data *data = NULL;
+	struct nfs_open_context *ctx = nfs_file_open_context(dst);
+	struct rpc_task *task;
+	struct rpc_message msg = {
+		.rpc_proc = &nfs4_procedures[NFSPROC4_CLNT_OFFLOAD_CANCEL],
+		.rpc_cred = ctx->cred,
+	};
+	struct rpc_task_setup task_setup_data = {
+		.rpc_client = dst_server->client,
+		.rpc_message = &msg,
+		.callback_ops = &nfs42_offload_cancel_ops,
+		.workqueue = nfsiod_workqueue,
+		.flags = RPC_TASK_ASYNC,
+	};
+	int status;
+
+	if (!(dst_server->caps & NFS_CAP_OFFLOAD_CANCEL))
+		return -EOPNOTSUPP;
+
+	data = kzalloc(sizeof(struct nfs42_offloadcancel_data), GFP_NOFS);
+	if (data == NULL)
+		return -ENOMEM;
+	data->seq_server = dst_server;
+	data->args.osa_src_fh = NFS_FH(file_inode(dst));
+	memcpy(&data->args.osa_stateid, stateid,
+		sizeof(data->args.osa_stateid));
+	msg.rpc_argp = &data->args;
+	msg.rpc_resp = &data->res;
+	task_setup_data.callback_data = data;
+	nfs4_init_sequence(&data->args.osa_seq_args, &data->res.osr_seq_res, 1);
+	task = rpc_run_task(&task_setup_data);
+	if (IS_ERR(task))
+		return PTR_ERR(task);
+	status = rpc_wait_for_completion_task(task);
+	if (status == -ENOTSUPP)
+		dst_server->caps &= ~NFS_CAP_OFFLOAD_CANCEL;
+	rpc_put_task(task);
+	return status;
+}
+
 int _nfs42_proc_offload_status(struct file *dst, nfs4_stateid *stateid,
 				struct nfs42_offload_status_res *res)
 {
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [RFC v1 15/19] NFS make COPY synchronous xdr configurable
  2017-03-02 16:01 [RFC v1 00/19] NFS support for inter and async COPY Olga Kornievskaia
                   ` (13 preceding siblings ...)
  2017-03-02 16:01 ` [RFC v1 14/19] NFS send OFFLOAD_CANCEL when COPY killed Olga Kornievskaia
@ 2017-03-02 16:01 ` Olga Kornievskaia
  2017-03-02 16:01 ` [RFC v1 16/19] NFS handle COPY ERR_OFFLOAD_NO_REQS Olga Kornievskaia
                   ` (3 subsequent siblings)
  18 siblings, 0 replies; 56+ messages in thread
From: Olga Kornievskaia @ 2017-03-02 16:01 UTC (permalink / raw)
  To: Trond.Myklebust; +Cc: linux-nfs

Instead of hardcoding always sending async copy, make it
configurable. In case, we get ERR_OFFLOAD_NO_REQS and we need to
resend the operation as synchronous.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
---
 fs/nfs/nfs42proc.c      | 2 ++
 fs/nfs/nfs42xdr.c       | 2 +-
 include/linux/nfs_xdr.h | 1 +
 3 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/nfs/nfs42proc.c b/fs/nfs/nfs42proc.c
index 7a1b2ae..b29f2c9 100644
--- a/fs/nfs/nfs42proc.c
+++ b/fs/nfs/nfs42proc.c
@@ -248,6 +248,7 @@ static ssize_t _nfs42_proc_copy(struct file *src,
 	size_t count = args->count;
 	int status;
 	uint64_t ret_count;
+	bool sync = false;
 
 	if (nss) {
 		args->cp_src = nss;
@@ -272,6 +273,7 @@ static ssize_t _nfs42_proc_copy(struct file *src,
 	if (status)
 		return status;
 
+	args->sync = sync;
 	status = nfs4_call_sync(server->client, server, &msg,
 				&args->seq_args, &res->seq_res, 0);
 	switch (status) {
diff --git a/fs/nfs/nfs42xdr.c b/fs/nfs/nfs42xdr.c
index 9e388e1..d300b77 100644
--- a/fs/nfs/nfs42xdr.c
+++ b/fs/nfs/nfs42xdr.c
@@ -199,7 +199,7 @@ static void encode_copy(struct xdr_stream *xdr,
 	encode_uint64(xdr, args->count);
 
 	encode_uint32(xdr, 1); /* consecutive = true */
-	encode_uint32(xdr, 0); /* synchronous = false */
+	encode_uint32(xdr, args->sync);
 	if (args->cp_src == NULL) { /* intra-ssc */
 		encode_uint32(xdr, 0); /* no src server list */
 		return;
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index 2ed335e..c210e93 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -1373,6 +1373,7 @@ struct nfs42_copy_args {
 	u64				count;
 	/* Support one source server */
 	struct nl4_servers		*cp_src;
+	bool				sync;
 };
 
 struct nfs42_write_res {
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [RFC v1 16/19] NFS handle COPY ERR_OFFLOAD_NO_REQS
  2017-03-02 16:01 [RFC v1 00/19] NFS support for inter and async COPY Olga Kornievskaia
                   ` (14 preceding siblings ...)
  2017-03-02 16:01 ` [RFC v1 15/19] NFS make COPY synchronous xdr configurable Olga Kornievskaia
@ 2017-03-02 16:01 ` Olga Kornievskaia
  2017-03-02 16:01 ` [RFC v1 17/19] NFS skip recovery of copy open on dest server Olga Kornievskaia
                   ` (2 subsequent siblings)
  18 siblings, 0 replies; 56+ messages in thread
From: Olga Kornievskaia @ 2017-03-02 16:01 UTC (permalink / raw)
  To: Trond.Myklebust; +Cc: linux-nfs

If client sent async COPY and server replied with
ERR_OFFLOAD_NO_REQS, client should retry with a synchronous copy.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
---
 fs/nfs/nfs42proc.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/fs/nfs/nfs42proc.c b/fs/nfs/nfs42proc.c
index b29f2c9..c9d8d7a 100644
--- a/fs/nfs/nfs42proc.c
+++ b/fs/nfs/nfs42proc.c
@@ -273,10 +273,17 @@ static ssize_t _nfs42_proc_copy(struct file *src,
 	if (status)
 		return status;
 
+retry:
 	args->sync = sync;
 	status = nfs4_call_sync(server->client, server, &msg,
 				&args->seq_args, &res->seq_res, 0);
 	switch (status) {
+	case -NFS4ERR_OFFLOAD_NO_REQS:
+		if (!sync) {
+			sync = true;
+			goto retry;
+		}
+		break;
 	case -ENOTSUPP:
 		server->caps &= ~NFS_CAP_COPY;
 		break;
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [RFC v1 17/19] NFS skip recovery of copy open on dest server
  2017-03-02 16:01 [RFC v1 00/19] NFS support for inter and async COPY Olga Kornievskaia
                   ` (15 preceding siblings ...)
  2017-03-02 16:01 ` [RFC v1 16/19] NFS handle COPY ERR_OFFLOAD_NO_REQS Olga Kornievskaia
@ 2017-03-02 16:01 ` Olga Kornievskaia
  2017-03-02 16:01 ` [RFC v1 18/19] NFS recover from destination server reboot for copies Olga Kornievskaia
  2017-03-02 16:01 ` [RFC v1 19/19] NFS if we got partial copy ignore errors Olga Kornievskaia
  18 siblings, 0 replies; 56+ messages in thread
From: Olga Kornievskaia @ 2017-03-02 16:01 UTC (permalink / raw)
  To: Trond.Myklebust; +Cc: linux-nfs

Mark the open created for the source file on the destination
server. Then if this open is going thru a recovery, then fail
the recovery as we don't need to be recoving a "fake" open.
We need to fail the ongoing READs and vfs_copy_file_range().

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
---
 fs/nfs/nfs4_fs.h   |  3 +++
 fs/nfs/nfs4file.c  |  2 +-
 fs/nfs/nfs4state.c | 14 ++++++++++++++
 3 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/fs/nfs/nfs4_fs.h b/fs/nfs/nfs4_fs.h
index c23089e..07b93fde 100644
--- a/fs/nfs/nfs4_fs.h
+++ b/fs/nfs/nfs4_fs.h
@@ -161,6 +161,9 @@ enum {
 	NFS_STATE_POSIX_LOCKS,		/* Posix locks are supported */
 	NFS_STATE_RECOVERY_FAILED,	/* OPEN stateid state recovery failed */
 	NFS_STATE_MAY_NOTIFY_LOCK,	/* server may CB_NOTIFY_LOCK */
+#ifdef CONFIG_NFS_V4_2
+	NFS_SRV_SSC_COPY_STATE,		/* ssc state on the dst server */
+#endif /* CONFIG_NFS_V4_2 */
 };
 
 struct nfs4_state {
diff --git a/fs/nfs/nfs4file.c b/fs/nfs/nfs4file.c
index 43a2346..f6ccc6f 100644
--- a/fs/nfs/nfs4file.c
+++ b/fs/nfs/nfs4file.c
@@ -347,7 +347,7 @@ struct file *
 	ctx->state = nfs4_get_open_state(r_ino, sp);
 	if (ctx->state == NULL)
 		goto out_stateowner;
-
+	set_bit(NFS_SRV_SSC_COPY_STATE, &ctx->state->flags);
 	update_open_stateid(ctx->state, stateid, NULL, filep->f_mode);
 
 	nfs_file_set_open_context(filep, ctx);
diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index 5827d95..af5d8d7 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -1501,6 +1501,9 @@ static int nfs4_reclaim_open_state(struct nfs4_state_owner *sp, const struct nfs
 	struct nfs4_state *state;
 	struct nfs4_lock_state *lock;
 	int status = 0;
+#ifdef CONFIG_NFS_V4_2
+	bool found_ssc_copy_state = false;
+#endif /* CONFIG_NFS_V4_2 */
 
 	/* Note: we rely on the sp->so_states list being ordered 
 	 * so that we always reclaim open(O_RDWR) and/or open(O_WRITE)
@@ -1520,6 +1523,13 @@ static int nfs4_reclaim_open_state(struct nfs4_state_owner *sp, const struct nfs
 			continue;
 		if (state->state == 0)
 			continue;
+#ifdef CONFIG_NFS_V4_2
+		if (test_bit(NFS_SRV_SSC_COPY_STATE, &state->flags)) {
+			nfs4_state_mark_recovery_failed(state, -EIO);
+			found_ssc_copy_state = true;
+			continue;
+		}
+#endif /* CONFIG_NFS_V4_2 */
 		atomic_inc(&state->count);
 		spin_unlock(&sp->so_lock);
 		status = ops->recover_open(sp, state);
@@ -1582,6 +1592,10 @@ static int nfs4_reclaim_open_state(struct nfs4_state_owner *sp, const struct nfs
 	}
 	raw_write_seqcount_end(&sp->so_reclaim_seqcount);
 	spin_unlock(&sp->so_lock);
+#ifdef CONFIG_NFS_V4_2
+	if (found_ssc_copy_state)
+		return -EIO;
+#endif /* CONFIG_NFS_V4_2 */
 	return 0;
 out_err:
 	nfs4_put_open_state(state);
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [RFC v1 18/19] NFS recover from destination server reboot for copies
  2017-03-02 16:01 [RFC v1 00/19] NFS support for inter and async COPY Olga Kornievskaia
                   ` (16 preceding siblings ...)
  2017-03-02 16:01 ` [RFC v1 17/19] NFS skip recovery of copy open on dest server Olga Kornievskaia
@ 2017-03-02 16:01 ` Olga Kornievskaia
  2017-03-02 16:01 ` [RFC v1 19/19] NFS if we got partial copy ignore errors Olga Kornievskaia
  18 siblings, 0 replies; 56+ messages in thread
From: Olga Kornievskaia @ 2017-03-02 16:01 UTC (permalink / raw)
  To: Trond.Myklebust; +Cc: linux-nfs

Mark the destination state to indicate a server-side copy is
happening. On detecting a reboot and recovering open state check
if any state is engaged in a server-side copy, if so, find the
copy and mark it and then signal the waiting thread. Upon wakeup,
if copy was marked then propage EAGAIN to the nfsd_copy_file_range
and restart the copy from scratch (no partial state is being
queried at this point).

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
---
 fs/nfs/nfs42proc.c     | 22 +++++++++++++++++-----
 fs/nfs/nfs4_fs.h       |  1 +
 fs/nfs/nfs4file.c      |  3 +++
 fs/nfs/nfs4state.c     | 15 +++++++++++++++
 include/linux/nfs_fs.h |  2 ++
 5 files changed, 38 insertions(+), 5 deletions(-)

diff --git a/fs/nfs/nfs42proc.c b/fs/nfs/nfs42proc.c
index c9d8d7a..232447d 100644
--- a/fs/nfs/nfs42proc.c
+++ b/fs/nfs/nfs42proc.c
@@ -181,6 +181,7 @@ static int handle_async_copy(struct nfs42_copy_res *res,
 	struct nfs4_copy_state *copy;
 	int status;
 	bool found_pending = false;
+	struct nfs_open_context *ctx = nfs_file_open_context(dst);
 
 	spin_lock(&server->nfs_client->cl_lock);
 	list_for_each_entry(copy, &server->nfs_client->pending_cb_stateids,
@@ -204,6 +205,7 @@ static int handle_async_copy(struct nfs42_copy_res *res,
 	}
 	memcpy(&copy->stateid, &res->write_res.stateid, NFS4_STATEID_SIZE);
 	init_completion(&copy->completion);
+	copy->parent_state = ctx->state;
 
 	list_add_tail(&copy->copies, &server->ss_copies);
 	spin_unlock(&server->nfs_client->cl_lock);
@@ -213,10 +215,10 @@ static int handle_async_copy(struct nfs42_copy_res *res,
 	list_del_init(&copy->copies);
 	spin_unlock(&server->nfs_client->cl_lock);
 	if (status == -ERESTARTSYS) {
-		nfs42_do_offload_cancel_async(dst, &copy->stateid);
-		nfs42_do_offload_cancel_async(src, src_stateid);
-		kfree(copy);
-		return status;
+		goto out_cancel;
+	} else if (copy->flags) {
+		status = -EAGAIN;
+		goto out_cancel;
 	}
 out:
 	*ret_count = copy->count;
@@ -225,6 +227,11 @@ static int handle_async_copy(struct nfs42_copy_res *res,
 		status = nfs_commit_file(dst, &copy->verf.verifier);
 	kfree(copy);
 	return status;
+out_cancel:
+	nfs42_do_offload_cancel_async(dst, &copy->stateid);
+	nfs42_do_offload_cancel_async(src, src_stateid);
+	kfree(copy);
+	return status;
 }
 
 static ssize_t _nfs42_proc_copy(struct file *src,
@@ -273,6 +280,8 @@ static ssize_t _nfs42_proc_copy(struct file *src,
 	if (status)
 		return status;
 
+	set_bit(NFS_CLNT_DST_SSC_COPY_STATE,
+		&dst_lock->open_context->state->flags);
 retry:
 	args->sync = sync;
 	status = nfs4_call_sync(server->client, server, &msg,
@@ -363,9 +372,12 @@ ssize_t nfs42_proc_copy(struct file *src, loff_t pos_src,
 
 		if (err >= 0)
 			break;
-		if (err == -ENOTSUPP) {
+		switch (err) {
+		case -ENOTSUPP:
 			err = -EOPNOTSUPP;
 			break;
+		case -EAGAIN:
+			break;
 		}
 
 		err2 = nfs4_handle_exception(server, err, &src_exception);
diff --git a/fs/nfs/nfs4_fs.h b/fs/nfs/nfs4_fs.h
index 07b93fde..8037ec3 100644
--- a/fs/nfs/nfs4_fs.h
+++ b/fs/nfs/nfs4_fs.h
@@ -162,6 +162,7 @@ enum {
 	NFS_STATE_RECOVERY_FAILED,	/* OPEN stateid state recovery failed */
 	NFS_STATE_MAY_NOTIFY_LOCK,	/* server may CB_NOTIFY_LOCK */
 #ifdef CONFIG_NFS_V4_2
+	NFS_CLNT_DST_SSC_COPY_STATE,    /* dst server open state on client*/
 	NFS_SRV_SSC_COPY_STATE,		/* ssc state on the dst server */
 #endif /* CONFIG_NFS_V4_2 */
 };
diff --git a/fs/nfs/nfs4file.c b/fs/nfs/nfs4file.c
index f6ccc6f..92822c4 100644
--- a/fs/nfs/nfs4file.c
+++ b/fs/nfs/nfs4file.c
@@ -142,6 +142,7 @@ static ssize_t nfs4_copy_file_range(struct file *file_in, loff_t pos_in,
 	if (file_inode(file_in) == file_inode(file_out))
 		return -EINVAL;
 
+retry:
 	if (nfs42_intra_ssc(file_in, file_out)) {  /* Intra-ssc */
 		if (file_in->f_op != file_out->f_op)
 			return -EXDEV;
@@ -173,6 +174,8 @@ static ssize_t nfs4_copy_file_range(struct file *file_in, loff_t pos_in,
 		kfree(cn_resp->cnr_src.nl_svr);
 		kfree(cn_resp);
 	}
+	if (ret == -EAGAIN)
+		goto retry;
 	return ret;
 }
 
diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index af5d8d7..47ece10 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -1550,6 +1550,21 @@ static int nfs4_reclaim_open_state(struct nfs4_state_owner *sp, const struct nfs
 					&state->flags);
 				nfs4_put_open_state(state);
 				spin_lock(&sp->so_lock);
+#ifdef CONFIG_NFS_V4_2
+				if (test_bit(NFS_CLNT_DST_SSC_COPY_STATE, &state->flags)) {
+					struct nfs4_copy_state *copy;
+
+					spin_lock(&sp->so_server->nfs_client->cl_lock);
+					list_for_each_entry(copy, &sp->so_server->ss_copies, copies) {
+						if (memcmp(&state->stateid.other, &copy->parent_state->stateid.other, NFS4_STATEID_SIZE))
+							continue;
+						copy->flags = 1;
+						complete(&copy->completion);
+						break;
+					}
+					spin_unlock(&sp->so_server->nfs_client->cl_lock);
+				}
+#endif /* CONFIG_NFS_V4_2 */
 				goto restart;
 			}
 		}
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index e174cba..7d60a0a 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -189,6 +189,8 @@ struct nfs4_copy_state {
 	uint64_t		count;
 	struct nfs_writeverf	verf;
 	int			error;
+	int			flags;
+	struct nfs4_state	*parent_state;
 };
 
 /*
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [RFC v1 19/19] NFS if we got partial copy ignore errors
  2017-03-02 16:01 [RFC v1 00/19] NFS support for inter and async COPY Olga Kornievskaia
                   ` (17 preceding siblings ...)
  2017-03-02 16:01 ` [RFC v1 18/19] NFS recover from destination server reboot for copies Olga Kornievskaia
@ 2017-03-02 16:01 ` Olga Kornievskaia
  18 siblings, 0 replies; 56+ messages in thread
From: Olga Kornievskaia @ 2017-03-02 16:01 UTC (permalink / raw)
  To: Trond.Myklebust; +Cc: linux-nfs

Don't ignore ENOSPC. Otherwise, try next copy chunk

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
---
 fs/nfs/nfs42proc.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/nfs/nfs42proc.c b/fs/nfs/nfs42proc.c
index 232447d..7a8f770 100644
--- a/fs/nfs/nfs42proc.c
+++ b/fs/nfs/nfs42proc.c
@@ -222,7 +222,8 @@ static int handle_async_copy(struct nfs42_copy_res *res,
 	}
 out:
 	*ret_count = copy->count;
-	status = -copy->error;
+	if (copy->count < 0 || copy->error == ENOSPC)
+		status = -copy->error;
 	if (copy->count && copy->verf.committed != NFS_FILE_SYNC)
 		status = nfs_commit_file(dst, &copy->verf.verifier);
 	kfree(copy);
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* Re: [RFC v1 01/19] fs: Don't copy beyond the end of the file
  2017-03-02 16:01 ` [RFC v1 01/19] fs: Don't copy beyond the end of the file Olga Kornievskaia
@ 2017-03-02 16:22   ` Christoph Hellwig
  2017-03-02 16:34     ` Olga Kornievskaia
  2017-03-03 20:47     ` J. Bruce Fields
  0 siblings, 2 replies; 56+ messages in thread
From: Christoph Hellwig @ 2017-03-02 16:22 UTC (permalink / raw)
  To: Olga Kornievskaia; +Cc: Trond.Myklebust, linux-nfs

On Thu, Mar 02, 2017 at 11:01:05AM -0500, Olga Kornievskaia wrote:
> +	if (pos_in >= i_size_read(inode_in))
> +		return -EINVAL;

That's not how the syscall is supposed to work, we'd rather do a
short read^^^^^copy.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC v1 01/19] fs: Don't copy beyond the end of the file
  2017-03-02 16:22   ` Christoph Hellwig
@ 2017-03-02 16:34     ` Olga Kornievskaia
  2017-03-03 20:47     ` J. Bruce Fields
  1 sibling, 0 replies; 56+ messages in thread
From: Olga Kornievskaia @ 2017-03-02 16:34 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Trond.Myklebust, linux-nfs


> On Mar 2, 2017, at 11:22 AM, Christoph Hellwig <hch@infradead.org> =
wrote:
>=20
> On Thu, Mar 02, 2017 at 11:01:05AM -0500, Olga Kornievskaia wrote:
>> +	if (pos_in >=3D i_size_read(inode_in))
>> +		return -EINVAL;
>=20
> That's not how the syscall is supposed to work, we'd rather do a
> short read^^^^^copy.

This is when the offset from which to read is beyond the end of the =
file. I=E2=80=99m not sure what kind of short read are you expecting =
here.


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC v1 01/19] fs: Don't copy beyond the end of the file
  2017-03-02 16:22   ` Christoph Hellwig
  2017-03-02 16:34     ` Olga Kornievskaia
@ 2017-03-03 20:47     ` J. Bruce Fields
  2017-03-03 21:08       ` Olga Kornievskaia
  2017-03-07 23:40       ` [RFC v1 01/19] fs: " Christoph Hellwig
  1 sibling, 2 replies; 56+ messages in thread
From: J. Bruce Fields @ 2017-03-03 20:47 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Olga Kornievskaia, Trond.Myklebust, linux-nfs

On Thu, Mar 02, 2017 at 08:22:21AM -0800, Christoph Hellwig wrote:
> On Thu, Mar 02, 2017 at 11:01:05AM -0500, Olga Kornievskaia wrote:
> > +	if (pos_in >= i_size_read(inode_in))
> > +		return -EINVAL;
> 
> That's not how the syscall is supposed to work, we'd rather do a
> short read^^^^^copy.

That's what I think too, but then is COPY(2) wrong?:

	EINVAL Requested  range  extends beyond the end of the source
	  file; or the flags argument is not 0.

Also, copy_file_range can be implemented by ->clone_file_range, where
these kinds of checks make more sense, I think; e.g. from btrfs:

	ret = -EINVAL;
	if (off + len > src->i_size || off + len < off)
		goto out_unlock;

Well, so the caller just has to be prepared for either behavior, I
guess, but that may make it more complicated to use.

--b.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC v1 01/19] fs: Don't copy beyond the end of the file
  2017-03-03 20:47     ` J. Bruce Fields
@ 2017-03-03 21:08       ` Olga Kornievskaia
  2017-03-03 21:32         ` J. Bruce Fields
  2017-03-07 23:40       ` [RFC v1 01/19] fs: " Christoph Hellwig
  1 sibling, 1 reply; 56+ messages in thread
From: Olga Kornievskaia @ 2017-03-03 21:08 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Christoph Hellwig, Trond.Myklebust, linux-nfs


> On Mar 3, 2017, at 3:47 PM, J. Bruce Fields <bfields@fieldses.org> =
wrote:
>=20
> On Thu, Mar 02, 2017 at 08:22:21AM -0800, Christoph Hellwig wrote:
>> On Thu, Mar 02, 2017 at 11:01:05AM -0500, Olga Kornievskaia wrote:
>>> +	if (pos_in >=3D i_size_read(inode_in))
>>> +		return -EINVAL;
>>=20
>> That's not how the syscall is supposed to work, we'd rather do a
>> short read^^^^^copy.
>=20
> That's what I think too, but then is COPY(2) wrong?:
>=20
> 	EINVAL Requested  range  extends beyond the end of the source
> 	  file; or the flags argument is not 0.
>=20
> Also, copy_file_range can be implemented by ->clone_file_range, where
> these kinds of checks make more sense, I think; e.g. from btrfs:
>=20
> 	ret =3D -EINVAL;
> 	if (off + len > src->i_size || off + len < off)
> 		goto out_unlock;
>=20
> Well, so the caller just has to be prepared for either behavior, I
> guess, but that may make it more complicated to use.
>=20

I=E2=80=99m still rather very confused again by the comment and what it =
is proposing.

There are two checks to consider for the validity of the arguments

1. If the offset of the source file is beyond the end of the source =
file.
2. If the offset + len is beyond the end of the file.

I read that the man page is talking about #2.  This is actually what the =
NFSv4.2 spec required for the COPY too but we=E2=80=99ve been discussing =
that it should be a short read instead.

This patch address is to address case #1. As far as I can tell it =
applies to all file systems.

Are you suggesting that the checks for the validity of the arguments do =
not belong in VFS but instead should be in each of the underlying file =
systems?

Not all vfs_copy_file_range() are implemented via clone_file_range(). At =
least I hope that =E2=80=9Cinter=E2=80=9D NFSv4.2 COPY will also use =
vfs_file_copy_range() and it would not be calling clone().


> --b.


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC v1 01/19] fs: Don't copy beyond the end of the file
  2017-03-03 21:08       ` Olga Kornievskaia
@ 2017-03-03 21:32         ` J. Bruce Fields
       [not found]           ` <B3F80DA0-B4F8-4628-88C5-E5C047620F17@netapp.com>
  0 siblings, 1 reply; 56+ messages in thread
From: J. Bruce Fields @ 2017-03-03 21:32 UTC (permalink / raw)
  To: Olga Kornievskaia; +Cc: Christoph Hellwig, Trond.Myklebust, linux-nfs

On Fri, Mar 03, 2017 at 04:08:19PM -0500, Olga Kornievskaia wrote:
> 
> > On Mar 3, 2017, at 3:47 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
> > 
> > On Thu, Mar 02, 2017 at 08:22:21AM -0800, Christoph Hellwig wrote:
> >> On Thu, Mar 02, 2017 at 11:01:05AM -0500, Olga Kornievskaia wrote:
> >>> +	if (pos_in >= i_size_read(inode_in))
> >>> +		return -EINVAL;
> >> 
> >> That's not how the syscall is supposed to work, we'd rather do a
> >> short read^^^^^copy.
> > 
> > That's what I think too, but then is COPY(2) wrong?:
> > 
> > 	EINVAL Requested  range  extends beyond the end of the source
> > 	  file; or the flags argument is not 0.
> > 
> > Also, copy_file_range can be implemented by ->clone_file_range, where
> > these kinds of checks make more sense, I think; e.g. from btrfs:
> > 
> > 	ret = -EINVAL;
> > 	if (off + len > src->i_size || off + len < off)
> > 		goto out_unlock;
> > 
> > Well, so the caller just has to be prepared for either behavior, I
> > guess, but that may make it more complicated to use.
> > 
> 
> I’m still rather very confused again by the comment and what it is proposing.
> 
> There are two checks to consider for the validity of the arguments
> 
> 1. If the offset of the source file is beyond the end of the source file.
> 2. If the offset + len is beyond the end of the file.
> 
> I read that the man page is talking about #2.  This is actually what the NFSv4.2 spec required for the COPY too but we’ve been discussing that it should be a short read instead.
> 
> This patch address is to address case #1. As far as I can tell it applies to all file systems.
> 
> Are you suggesting that the checks for the validity of the arguments do not belong in VFS but instead should be in each of the underlying file systems?
> 
> Not all vfs_copy_file_range() are implemented via clone_file_range(). At least I hope that “inter” NFSv4.2 COPY will also use vfs_file_copy_range() and it would not be calling clone().

I think it'd be acceptable for copy_file_range() to just return 0 even
in your case 1.  I believe that's what ordinary read and pread does.

You probably can't perform it atomically with the copy, so it's possible
that the size will change right after you check it.

I don't see a benefit to the check.

--b.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC v1 01/19] fs: Don't copy beyond the end of the file
       [not found]           ` <B3F80DA0-B4F8-4628-88C5-E5C047620F17@netapp.com>
@ 2017-03-04  2:10             ` J. Bruce Fields
  2017-03-06 16:27               ` [RFC v1 01/19] " Olga Kornievskaia
  0 siblings, 1 reply; 56+ messages in thread
From: J. Bruce Fields @ 2017-03-04  2:10 UTC (permalink / raw)
  To: Olga Kornievskaia; +Cc: Christoph Hellwig, Trond.Myklebust, linux-nfs

On Fri, Mar 03, 2017 at 05:46:05PM -0500, Olga Kornievskaia wrote:
> 
> > On Mar 3, 2017, at 4:32 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
> > 
> > On Fri, Mar 03, 2017 at 04:08:19PM -0500, Olga Kornievskaia wrote:
> >> 
> >>> On Mar 3, 2017, at 3:47 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
> >>> 
> >>> On Thu, Mar 02, 2017 at 08:22:21AM -0800, Christoph Hellwig wrote:
> >>>> On Thu, Mar 02, 2017 at 11:01:05AM -0500, Olga Kornievskaia wrote:
> >>>>> +	if (pos_in >= i_size_read(inode_in))
> >>>>> +		return -EINVAL;
> >>>> 
> >>>> That's not how the syscall is supposed to work, we'd rather do a
> >>>> short read^^^^^copy.
> >>> 
> >>> That's what I think too, but then is COPY(2) wrong?:
> >>> 
> >>> 	EINVAL Requested  range  extends beyond the end of the source
> >>> 	  file; or the flags argument is not 0.
> >>> 
> >>> Also, copy_file_range can be implemented by ->clone_file_range, where
> >>> these kinds of checks make more sense, I think; e.g. from btrfs:
> >>> 
> >>> 	ret = -EINVAL;
> >>> 	if (off + len > src->i_size || off + len < off)
> >>> 		goto out_unlock;
> >>> 
> >>> Well, so the caller just has to be prepared for either behavior, I
> >>> guess, but that may make it more complicated to use.
> >>> 
> >> 
> >> I’m still rather very confused again by the comment and what it is proposing.
> >> 
> >> There are two checks to consider for the validity of the arguments
> >> 
> >> 1. If the offset of the source file is beyond the end of the source file.
> >> 2. If the offset + len is beyond the end of the file.
> >> 
> >> I read that the man page is talking about #2.  This is actually what the NFSv4.2 spec required for the COPY too but we’ve been discussing that it should be a short read instead.
> >> 
> >> This patch address is to address case #1. As far as I can tell it applies to all file systems.
> >> 
> >> Are you suggesting that the checks for the validity of the arguments do not belong in VFS but instead should be in each of the underlying file systems?
> >> 
> >> Not all vfs_copy_file_range() are implemented via clone_file_range(). At least I hope that “inter” NFSv4.2 COPY will also use vfs_file_copy_range() and it would not be calling clone().
> > 
> > I think it'd be acceptable for copy_file_range() to just return 0 even
> > in your case 1.  I believe that's what ordinary read and pread does.
> > 
> > You probably can't perform it atomically with the copy, so it's possible
> > that the size will change right after you check it.
> > 
> > I don't see a benefit to the check.
> 

> In read() you don’t specify the offset from which to read. It is read
> from the current file descriptor offset. I don’t find the comparison
> equal. 
> 
> It’s it more fair to compare it to lseek() which does return EINVAL if
> you specify position beyond the end of the file. 

Or pread(), which takes an offset.

But read() can read at an offset at the end of file, or even past that
(if the file was truncated), and I believe it just returns 0 in those
cases.

--b.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC v1 01/19] Don't copy beyond the end of the file
  2017-03-04  2:10             ` J. Bruce Fields
@ 2017-03-06 16:27               ` Olga Kornievskaia
  2017-03-06 19:09                 ` J. Bruce Fields
  2017-03-06 19:23                   ` J. Bruce Fields
  0 siblings, 2 replies; 56+ messages in thread
From: Olga Kornievskaia @ 2017-03-06 16:27 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Christoph Hellwig, Trond.Myklebust, linux-nfs


> On Mar 3, 2017, at 9:10 PM, J. Bruce Fields <bfields@fieldses.org> =
wrote:
>=20
> On Fri, Mar 03, 2017 at 05:46:05PM -0500, Olga Kornievskaia wrote:
>>=20
>>> On Mar 3, 2017, at 4:32 PM, J. Bruce Fields <bfields@fieldses.org> =
wrote:
>>>=20
>>> On Fri, Mar 03, 2017 at 04:08:19PM -0500, Olga Kornievskaia wrote:
>>>>=20
>>>>> On Mar 3, 2017, at 3:47 PM, J. Bruce Fields <bfields@fieldses.org> =
wrote:
>>>>>=20
>>>>> On Thu, Mar 02, 2017 at 08:22:21AM -0800, Christoph Hellwig wrote:
>>>>>> On Thu, Mar 02, 2017 at 11:01:05AM -0500, Olga Kornievskaia =
wrote:
>>>>>>> +	if (pos_in >=3D i_size_read(inode_in))
>>>>>>> +		return -EINVAL;
>>>>>>=20
>>>>>> That's not how the syscall is supposed to work, we'd rather do a
>>>>>> short read^^^^^copy.
>>>>>=20
>>>>> That's what I think too, but then is COPY(2) wrong?:
>>>>>=20
>>>>> 	EINVAL Requested  range  extends beyond the end of the source
>>>>> 	  file; or the flags argument is not 0.
>>>>>=20
>>>>> Also, copy_file_range can be implemented by ->clone_file_range, =
where
>>>>> these kinds of checks make more sense, I think; e.g. from btrfs:
>>>>>=20
>>>>> 	ret =3D -EINVAL;
>>>>> 	if (off + len > src->i_size || off + len < off)
>>>>> 		goto out_unlock;
>>>>>=20
>>>>> Well, so the caller just has to be prepared for either behavior, I
>>>>> guess, but that may make it more complicated to use.
>>>>>=20
>>>>=20
>>>> I=E2=80=99m still rather very confused again by the comment and =
what it is proposing.
>>>>=20
>>>> There are two checks to consider for the validity of the arguments
>>>>=20
>>>> 1. If the offset of the source file is beyond the end of the source =
file.
>>>> 2. If the offset + len is beyond the end of the file.
>>>>=20
>>>> I read that the man page is talking about #2.  This is actually =
what the NFSv4.2 spec required for the COPY too but we=E2=80=99ve been =
discussing that it should be a short read instead.
>>>>=20
>>>> This patch address is to address case #1. As far as I can tell it =
applies to all file systems.
>>>>=20
>>>> Are you suggesting that the checks for the validity of the =
arguments do not belong in VFS but instead should be in each of the =
underlying file systems?
>>>>=20
>>>> Not all vfs_copy_file_range() are implemented via =
clone_file_range(). At least I hope that =E2=80=9Cinter=E2=80=9D NFSv4.2 =
COPY will also use vfs_file_copy_range() and it would not be calling =
clone().
>>>=20
>>> I think it'd be acceptable for copy_file_range() to just return 0 =
even
>>> in your case 1.  I believe that's what ordinary read and pread does.
>>>=20
>>> You probably can't perform it atomically with the copy, so it's =
possible
>>> that the size will change right after you check it.
>>>=20
>>> I don't see a benefit to the check.
>>=20
>=20
>> In read() you don=E2=80=99t specify the offset from which to read. It =
is read
>> from the current file descriptor offset. I don=E2=80=99t find the =
comparison
>> equal.=20
>>=20
>> It=E2=80=99s it more fair to compare it to lseek() which does return =
EINVAL if
>> you specify position beyond the end of the file.=20
>=20
> Or pread(), which takes an offset.
>=20
> But read() can read at an offset at the end of file, or even past that
> (if the file was truncated), and I believe it just returns 0 in those
> cases.
>=20

I don=E2=80=99t see copy_file_range() specifying that 0 means end of the =
file. Are you arguing to add that meaning to the function call? I guess =
in that case we=E2=80=99d need to take extra care to never return 0bytes =
to the client as a =E2=80=9Cpartial=E2=80=9D copy (say due to server =
rebooting).

Unless changed, NFS4.2 mandates the two checks that I=E2=80=99ve =
specified. I can add the checks in the NFS implementation itself. =
However, we thought at least this check belonged in the VFS layer. I=E2=80=
=99m really not super attached to getting into VFS. Actually the 2nd =
check is something that copy_file_range() man pages say should return =
EINVAL but the VFS code doesn=E2=80=99t enforce it.=20






^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC v1 01/19] Don't copy beyond the end of the file
  2017-03-06 16:27               ` [RFC v1 01/19] " Olga Kornievskaia
@ 2017-03-06 19:09                 ` J. Bruce Fields
  2017-03-06 19:23                   ` J. Bruce Fields
  1 sibling, 0 replies; 56+ messages in thread
From: J. Bruce Fields @ 2017-03-06 19:09 UTC (permalink / raw)
  To: Olga Kornievskaia; +Cc: Christoph Hellwig, Trond.Myklebust, linux-nfs

On Mon, Mar 06, 2017 at 11:27:23AM -0500, Olga Kornievskaia wrote:
> 
> > On Mar 3, 2017, at 9:10 PM, J. Bruce Fields <bfields@fieldses.org>
> > wrote:
> > 
> > On Fri, Mar 03, 2017 at 05:46:05PM -0500, Olga Kornievskaia wrote:
> >> 
> >>> On Mar 3, 2017, at 4:32 PM, J. Bruce Fields <bfields@fieldses.org>
> >>> wrote:
> >>> 
> >>> On Fri, Mar 03, 2017 at 04:08:19PM -0500, Olga Kornievskaia wrote:
> >>>> 
> >>>>> On Mar 3, 2017, at 3:47 PM, J. Bruce Fields
> >>>>> <bfields@fieldses.org> wrote:
> >>>>> 
> >>>>> On Thu, Mar 02, 2017 at 08:22:21AM -0800, Christoph Hellwig
> >>>>> wrote:
> >>>>>> On Thu, Mar 02, 2017 at 11:01:05AM -0500, Olga Kornievskaia
> >>>>>> wrote:
> >>>>>>> +	if (pos_in >= i_size_read(inode_in)) +		return
> >>>>>>> -EINVAL;
> >>>>>> 
> >>>>>> That's not how the syscall is supposed to work, we'd rather do
> >>>>>> a short read^^^^^copy.
> >>>>> 
> >>>>> That's what I think too, but then is COPY(2) wrong?:
> >>>>> 
> >>>>> 	EINVAL Requested  range  extends beyond the end of the
> >>>>> 	source file; or the flags argument is not 0.
> >>>>> 
> >>>>> Also, copy_file_range can be implemented by ->clone_file_range,
> >>>>> where these kinds of checks make more sense, I think; e.g. from
> >>>>> btrfs:
> >>>>> 
> >>>>> 	ret = -EINVAL; if (off + len > src->i_size || off + len
> >>>>> 	< off) goto out_unlock;
> >>>>> 
> >>>>> Well, so the caller just has to be prepared for either behavior,
> >>>>> I guess, but that may make it more complicated to use.
> >>>>> 
> >>>> 
> >>>> I’m still rather very confused again by the comment and what it
> >>>> is proposing.
> >>>> 
> >>>> There are two checks to consider for the validity of the
> >>>> arguments
> >>>> 
> >>>> 1. If the offset of the source file is beyond the end of the
> >>>> source file.  2. If the offset + len is beyond the end of the
> >>>> file.
> >>>> 
> >>>> I read that the man page is talking about #2.  This is actually
> >>>> what the NFSv4.2 spec required for the COPY too but we’ve been
> >>>> discussing that it should be a short read instead.
> >>>> 
> >>>> This patch address is to address case #1. As far as I can tell it
> >>>> applies to all file systems.
> >>>> 
> >>>> Are you suggesting that the checks for the validity of the
> >>>> arguments do not belong in VFS but instead should be in each of
> >>>> the underlying file systems?
> >>>> 
> >>>> Not all vfs_copy_file_range() are implemented via
> >>>> clone_file_range(). At least I hope that “inter” NFSv4.2 COPY
> >>>> will also use vfs_file_copy_range() and it would not be calling
> >>>> clone().
> >>> 
> >>> I think it'd be acceptable for copy_file_range() to just return 0
> >>> even in your case 1.  I believe that's what ordinary read and
> >>> pread does.
> >>> 
> >>> You probably can't perform it atomically with the copy, so it's
> >>> possible that the size will change right after you check it.
> >>> 
> >>> I don't see a benefit to the check.
> >> 
> > 
> >> In read() you don’t specify the offset from which to read. It is
> >> read from the current file descriptor offset. I don’t find the
> >> comparison equal. 
> >> 
> >> It’s it more fair to compare it to lseek() which does return EINVAL
> >> if you specify position beyond the end of the file. 
> > 
> > Or pread(), which takes an offset.
> > 
> > But read() can read at an offset at the end of file, or even past
> > that (if the file was truncated), and I believe it just returns 0 in
> > those cases.
> > 
> 
> I don’t see copy_file_range() specifying that 0 means end of the file.
> Are you arguing to add that meaning to the function call?

Yes.

> I guess in
> that case we’d need to take extra care to never return 0bytes to the
> client as a “partial” copy (say due to server rebooting).

Right.

> Unless changed, NFS4.2 mandates the two checks that I’ve specified. I
> can add the checks in the NFS implementation itself. However, we
> thought at least this check belonged in the VFS layer. I’m really not
> super attached to getting into VFS. Actually the 2nd check is
> something that copy_file_range() man pages say should return EINVAL
> but the VFS code doesn’t enforce it. 

Please leave those checks out.  Let's try to fix the spec.

--b.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC v1 01/19] Don't copy beyond the end of the file
@ 2017-03-06 19:23                   ` J. Bruce Fields
  0 siblings, 0 replies; 56+ messages in thread
From: J. Bruce Fields @ 2017-03-06 19:23 UTC (permalink / raw)
  To: linux-man
  Cc: Christoph Hellwig, Trond.Myklebust, linux-nfs, linux-fsdevel,
	Olga Kornievskaia

On Mon, Mar 06, 2017 at 11:27:23AM -0500, Olga Kornievskaia wrote:
> I don’t see copy_file_range() specifying that 0 means end of the file.

Is there a reason that copy_file_range shouldn't be like read?  (From
read(2): "On  success,  the number of bytes read is returned (zero
indicates end of file)".

I haven't checked, but suspect that's already true of the
implementations we have.

Also, from copy_file_range():

	EINVAL Requested  range  extends beyond the end of the source
		file; or the flags argument is not 0.

Some filesystems do this, some don't; I think the man page should make
it clear that this behavior is not required.

--b.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC v1 01/19] Don't copy beyond the end of the file
@ 2017-03-06 19:23                   ` J. Bruce Fields
  0 siblings, 0 replies; 56+ messages in thread
From: J. Bruce Fields @ 2017-03-06 19:23 UTC (permalink / raw)
  To: linux-man-u79uwXL29TY76Z2rM5mHXA
  Cc: Christoph Hellwig, Trond.Myklebust-7I+n7zu2hftEKMMhf/gKZA,
	linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Olga Kornievskaia

On Mon, Mar 06, 2017 at 11:27:23AM -0500, Olga Kornievskaia wrote:
> I don’t see copy_file_range() specifying that 0 means end of the file.

Is there a reason that copy_file_range shouldn't be like read?  (From
read(2): "On  success,  the number of bytes read is returned (zero
indicates end of file)".

I haven't checked, but suspect that's already true of the
implementations we have.

Also, from copy_file_range():

	EINVAL Requested  range  extends beyond the end of the source
		file; or the flags argument is not 0.

Some filesystems do this, some don't; I think the man page should make
it clear that this behavior is not required.

--b.
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC v1 01/19] Don't copy beyond the end of the file
  2017-03-06 19:23                   ` J. Bruce Fields
  (?)
@ 2017-03-07 14:18                     ` Olga Kornievskaia
  -1 siblings, 0 replies; 56+ messages in thread
From: Olga Kornievskaia @ 2017-03-07 14:18 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: linux-man, Christoph Hellwig, Trond Myklebust, linux-nfs,
	linux-fsdevel, Olga Kornievskaia

On Mon, Mar 6, 2017 at 2:23 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
> On Mon, Mar 06, 2017 at 11:27:23AM -0500, Olga Kornievskaia wrote:
>> I don’t see copy_file_range() specifying that 0 means end of the file.
>
> Is there a reason that copy_file_range shouldn't be like read?  (From
> read(2): "On  success,  the number of bytes read is returned (zero
> indicates end of file)".

The only thing I can think of is the fact that copy_file_range() does
write too. And return 0 from the write is not expected/allowed (except
when input count was 0)?

> I haven't checked, but suspect that's already true of the
> implementations we have.
>
> Also, from copy_file_range():
>
>         EINVAL Requested  range  extends beyond the end of the source
>                 file; or the flags argument is not 0.
>
> Some filesystems do this, some don't; I think the man page should make
> it clear that this behavior is not required.
>
> --b.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC v1 01/19] Don't copy beyond the end of the file
@ 2017-03-07 14:18                     ` Olga Kornievskaia
  0 siblings, 0 replies; 56+ messages in thread
From: Olga Kornievskaia @ 2017-03-07 14:18 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: linux-man, Christoph Hellwig, Trond Myklebust, linux-nfs,
	linux-fsdevel, Olga Kornievskaia

On Mon, Mar 6, 2017 at 2:23 PM, J. Bruce Fields <bfields@fieldses.org> wrot=
e:
> On Mon, Mar 06, 2017 at 11:27:23AM -0500, Olga Kornievskaia wrote:
>> I don=E2=80=99t see copy_file_range() specifying that 0 means end of the=
 file.
>
> Is there a reason that copy_file_range shouldn't be like read?  (From
> read(2): "On  success,  the number of bytes read is returned (zero
> indicates end of file)".

The only thing I can think of is the fact that copy_file_range() does
write too. And return 0 from the write is not expected/allowed (except
when input count was 0)?

> I haven't checked, but suspect that's already true of the
> implementations we have.
>
> Also, from copy_file_range():
>
>         EINVAL Requested  range  extends beyond the end of the source
>                 file; or the flags argument is not 0.
>
> Some filesystems do this, some don't; I think the man page should make
> it clear that this behavior is not required.
>
> --b.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC v1 01/19] Don't copy beyond the end of the file
@ 2017-03-07 14:18                     ` Olga Kornievskaia
  0 siblings, 0 replies; 56+ messages in thread
From: Olga Kornievskaia @ 2017-03-07 14:18 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: linux-man-u79uwXL29TY76Z2rM5mHXA, Christoph Hellwig,
	Trond Myklebust, linux-nfs, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	Olga Kornievskaia

On Mon, Mar 6, 2017 at 2:23 PM, J. Bruce Fields <bfields-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org> wrote:
> On Mon, Mar 06, 2017 at 11:27:23AM -0500, Olga Kornievskaia wrote:
>> I don’t see copy_file_range() specifying that 0 means end of the file.
>
> Is there a reason that copy_file_range shouldn't be like read?  (From
> read(2): "On  success,  the number of bytes read is returned (zero
> indicates end of file)".

The only thing I can think of is the fact that copy_file_range() does
write too. And return 0 from the write is not expected/allowed (except
when input count was 0)?

> I haven't checked, but suspect that's already true of the
> implementations we have.
>
> Also, from copy_file_range():
>
>         EINVAL Requested  range  extends beyond the end of the source
>                 file; or the flags argument is not 0.
>
> Some filesystems do this, some don't; I think the man page should make
> it clear that this behavior is not required.
>
> --b.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-man" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC v1 01/19] fs: Don't copy beyond the end of the file
  2017-03-03 20:47     ` J. Bruce Fields
  2017-03-03 21:08       ` Olga Kornievskaia
@ 2017-03-07 23:40       ` Christoph Hellwig
  2017-03-08 17:05         ` J. Bruce Fields
  1 sibling, 1 reply; 56+ messages in thread
From: Christoph Hellwig @ 2017-03-07 23:40 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Christoph Hellwig, Olga Kornievskaia, Trond.Myklebust, linux-nfs

On Fri, Mar 03, 2017 at 03:47:47PM -0500, J. Bruce Fields wrote:
> That's what I think too, but then is COPY(2) wrong?:

Either the current kernel code or the man page is wrong..

> Also, copy_file_range can be implemented by ->clone_file_range, where
> these kinds of checks make more sense, I think; e.g. from btrfs:
> 
> 	ret = -EINVAL;
> 	if (off + len > src->i_size || off + len < off)
> 		goto out_unlock;
> 
> Well, so the caller just has to be prepared for either behavior, I
> guess, but that may make it more complicated to use.

We'll need to stick to one behavior.  So between the man page and
the clone implementation I guess this patch is fine after all.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC v1 01/19] fs: Don't copy beyond the end of the file
  2017-03-07 23:40       ` [RFC v1 01/19] fs: " Christoph Hellwig
@ 2017-03-08 17:05         ` J. Bruce Fields
  2017-03-08 17:25           ` Christoph Hellwig
  0 siblings, 1 reply; 56+ messages in thread
From: J. Bruce Fields @ 2017-03-08 17:05 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Olga Kornievskaia, Trond.Myklebust, linux-nfs

On Tue, Mar 07, 2017 at 03:40:51PM -0800, Christoph Hellwig wrote:
> On Fri, Mar 03, 2017 at 03:47:47PM -0500, J. Bruce Fields wrote:
> > That's what I think too, but then is COPY(2) wrong?:
> 
> Either the current kernel code or the man page is wrong..
> 
> > Also, copy_file_range can be implemented by ->clone_file_range, where
> > these kinds of checks make more sense, I think; e.g. from btrfs:
> > 
> > 	ret = -EINVAL;
> > 	if (off + len > src->i_size || off + len < off)
> > 		goto out_unlock;
> > 
> > Well, so the caller just has to be prepared for either behavior, I
> > guess, but that may make it more complicated to use.
> 
> We'll need to stick to one behavior.  So between the man page and
> the clone implementation I guess this patch is fine after all.

Ugh, please, let's not.

Since copy isn't atomic that check is never going to be reliable.

As long as we allow copy to have either copy-like or clone-like
implementations, callers have to be prepared to handle either behavior
when the range is past end of file, either a short copy or an EINVAL.

The difference is that if we add this check, then the "short copy"
behavior becomes something that only happens when the timing is just
right.

--b.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC v1 01/19] fs: Don't copy beyond the end of the file
  2017-03-08 17:05         ` J. Bruce Fields
@ 2017-03-08 17:25           ` Christoph Hellwig
  2017-03-08 17:32             ` Olga Kornievskaia
  0 siblings, 1 reply; 56+ messages in thread
From: Christoph Hellwig @ 2017-03-08 17:25 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Christoph Hellwig, Olga Kornievskaia, Trond.Myklebust, linux-nfs

On Wed, Mar 08, 2017 at 12:05:21PM -0500, J. Bruce Fields wrote:
> Since copy isn't atomic that check is never going to be reliable.

That's true for everything that COPY does.  By that logic we should
not implement it at all (a logic that I'd fully support)

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC v1 01/19] fs: Don't copy beyond the end of the file
  2017-03-08 17:25           ` Christoph Hellwig
@ 2017-03-08 17:32             ` Olga Kornievskaia
  2017-03-08 19:53               ` J. Bruce Fields
  0 siblings, 1 reply; 56+ messages in thread
From: Olga Kornievskaia @ 2017-03-08 17:32 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: J. Bruce Fields, Trond.Myklebust, linux-nfs


> On Mar 8, 2017, at 12:25 PM, Christoph Hellwig <hch@infradead.org> =
wrote:
>=20
> On Wed, Mar 08, 2017 at 12:05:21PM -0500, J. Bruce Fields wrote:
>> Since copy isn't atomic that check is never going to be reliable.
>=20
> That's true for everything that COPY does.  By that logic we should
> not implement it at all (a logic that I'd fully support)


If you were to only keep CLONE then you=E2=80=99d lose a huge =
performance gain you get from server-to-server COPY.=20=

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC v1 01/19] fs: Don't copy beyond the end of the file
  2017-03-08 17:32             ` Olga Kornievskaia
@ 2017-03-08 19:53               ` J. Bruce Fields
  2017-03-08 20:00                   ` Olga Kornievskaia
  0 siblings, 1 reply; 56+ messages in thread
From: J. Bruce Fields @ 2017-03-08 19:53 UTC (permalink / raw)
  To: Olga Kornievskaia
  Cc: Christoph Hellwig, Trond.Myklebust, linux-nfs, linux-fsdevel

On Wed, Mar 08, 2017 at 12:32:12PM -0500, Olga Kornievskaia wrote:
> 
> > On Mar 8, 2017, at 12:25 PM, Christoph Hellwig <hch@infradead.org>
> > wrote:
> > 
> > On Wed, Mar 08, 2017 at 12:05:21PM -0500, J. Bruce Fields wrote:
> >> Since copy isn't atomic that check is never going to be reliable.
> > 
> > That's true for everything that COPY does.  By that logic we should
> > not implement it at all (a logic that I'd fully support)
> 
> If you were to only keep CLONE then you’d lose a huge performance gain
> you get from server-to-server COPY. 

Yes.  Also, I think copy-like copy implementations have reasonable
semantics that are basically the same as read:

	- copy can return successfully with less copied than requested.
	- it's fine for the copied range to start and/or end past end of
	  file, it'll just return a short read.
	- A copy of more than 0 bytes returning 0 means you're at end of
	  file.

The particular problem here is that that doesn't fit how clone works at
all.

It feels like what happened is that copy_file_range() was made mainly
for the clone case, with the idea that copy might be reluctantly
accepted as a second-class implementation.

But the performance gain of copy offload is too big to just ignore, and
in fact it's what copy_file_range does on every filesystem but btrfs and
ocfs2 (and maybe cifs?), so I don't think we can just ignore it.

If we had separate copy_file_range and clone_file_range, I *think* it
could all be made sensible.  Am I missing something?

--b.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC v1 01/19] fs: Don't copy beyond the end of the file
  2017-03-08 19:53               ` J. Bruce Fields
@ 2017-03-08 20:00                   ` Olga Kornievskaia
  0 siblings, 0 replies; 56+ messages in thread
From: Olga Kornievskaia @ 2017-03-08 20:00 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Christoph Hellwig, Trond.Myklebust, linux-nfs, linux-fsdevel


> On Mar 8, 2017, at 2:53 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
> 
> On Wed, Mar 08, 2017 at 12:32:12PM -0500, Olga Kornievskaia wrote:
>> 
>>> On Mar 8, 2017, at 12:25 PM, Christoph Hellwig <hch@infradead.org>
>>> wrote:
>>> 
>>> On Wed, Mar 08, 2017 at 12:05:21PM -0500, J. Bruce Fields wrote:
>>>> Since copy isn't atomic that check is never going to be reliable.
>>> 
>>> That's true for everything that COPY does.  By that logic we should
>>> not implement it at all (a logic that I'd fully support)
>> 
>> If you were to only keep CLONE then you’d lose a huge performance gain
>> you get from server-to-server COPY. 
> 
> Yes.  Also, I think copy-like copy implementations have reasonable
> semantics that are basically the same as read:
> 
> 	- copy can return successfully with less copied than requested.
> 	- it's fine for the copied range to start and/or end past end of
> 	  file, it'll just return a short read.
> 	- A copy of more than 0 bytes returning 0 means you're at end of
> 	  file.
> 
> The particular problem here is that that doesn't fit how clone works at
> all.
> 
> It feels like what happened is that copy_file_range() was made mainly
> for the clone case, with the idea that copy might be reluctantly
> accepted as a second-class implementation.
> 
> But the performance gain of copy offload is too big to just ignore, and
> in fact it's what copy_file_range does on every filesystem but btrfs and
> ocfs2 (and maybe cifs?), so I don't think we can just ignore it.
> 
> If we had separate copy_file_range and clone_file_range, I *think* it
> could all be made sensible.  Am I missing something?
> 

How would the application (cp) know when to call the clone_file_range and when to call copy_file_range?
 

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC v1 01/19] fs: Don't copy beyond the end of the file
@ 2017-03-08 20:00                   ` Olga Kornievskaia
  0 siblings, 0 replies; 56+ messages in thread
From: Olga Kornievskaia @ 2017-03-08 20:00 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Christoph Hellwig, Trond.Myklebust, linux-nfs, linux-fsdevel


> On Mar 8, 2017, at 2:53 PM, J. Bruce Fields <bfields@fieldses.org> =
wrote:
>=20
> On Wed, Mar 08, 2017 at 12:32:12PM -0500, Olga Kornievskaia wrote:
>>=20
>>> On Mar 8, 2017, at 12:25 PM, Christoph Hellwig <hch@infradead.org>
>>> wrote:
>>>=20
>>> On Wed, Mar 08, 2017 at 12:05:21PM -0500, J. Bruce Fields wrote:
>>>> Since copy isn't atomic that check is never going to be reliable.
>>>=20
>>> That's true for everything that COPY does.  By that logic we should
>>> not implement it at all (a logic that I'd fully support)
>>=20
>> If you were to only keep CLONE then you=E2=80=99d lose a huge =
performance gain
>> you get from server-to-server COPY.=20
>=20
> Yes.  Also, I think copy-like copy implementations have reasonable
> semantics that are basically the same as read:
>=20
> 	- copy can return successfully with less copied than requested.
> 	- it's fine for the copied range to start and/or end past end of
> 	  file, it'll just return a short read.
> 	- A copy of more than 0 bytes returning 0 means you're at end of
> 	  file.
>=20
> The particular problem here is that that doesn't fit how clone works =
at
> all.
>=20
> It feels like what happened is that copy_file_range() was made mainly
> for the clone case, with the idea that copy might be reluctantly
> accepted as a second-class implementation.
>=20
> But the performance gain of copy offload is too big to just ignore, =
and
> in fact it's what copy_file_range does on every filesystem but btrfs =
and
> ocfs2 (and maybe cifs?), so I don't think we can just ignore it.
>=20
> If we had separate copy_file_range and clone_file_range, I *think* it
> could all be made sensible.  Am I missing something?
>=20

How would the application (cp) know when to call the clone_file_range =
and when to call copy_file_range?
=20


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC v1 01/19] fs: Don't copy beyond the end of the file
  2017-03-08 20:00                   ` Olga Kornievskaia
  (?)
@ 2017-03-08 20:18                   ` J. Bruce Fields
  -1 siblings, 0 replies; 56+ messages in thread
From: J. Bruce Fields @ 2017-03-08 20:18 UTC (permalink / raw)
  To: Olga Kornievskaia
  Cc: Christoph Hellwig, Trond.Myklebust, linux-nfs, linux-fsdevel

On Wed, Mar 08, 2017 at 03:00:52PM -0500, Olga Kornievskaia wrote:
> 
> > On Mar 8, 2017, at 2:53 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
> > 
> > On Wed, Mar 08, 2017 at 12:32:12PM -0500, Olga Kornievskaia wrote:
> >> 
> >>> On Mar 8, 2017, at 12:25 PM, Christoph Hellwig <hch@infradead.org>
> >>> wrote:
> >>> 
> >>> On Wed, Mar 08, 2017 at 12:05:21PM -0500, J. Bruce Fields wrote:
> >>>> Since copy isn't atomic that check is never going to be reliable.
> >>> 
> >>> That's true for everything that COPY does.  By that logic we should
> >>> not implement it at all (a logic that I'd fully support)
> >> 
> >> If you were to only keep CLONE then you’d lose a huge performance gain
> >> you get from server-to-server COPY. 
> > 
> > Yes.  Also, I think copy-like copy implementations have reasonable
> > semantics that are basically the same as read:
> > 
> > 	- copy can return successfully with less copied than requested.
> > 	- it's fine for the copied range to start and/or end past end of
> > 	  file, it'll just return a short read.
> > 	- A copy of more than 0 bytes returning 0 means you're at end of
> > 	  file.
> > 
> > The particular problem here is that that doesn't fit how clone works at
> > all.
> > 
> > It feels like what happened is that copy_file_range() was made mainly
> > for the clone case, with the idea that copy might be reluctantly
> > accepted as a second-class implementation.
> > 
> > But the performance gain of copy offload is too big to just ignore, and
> > in fact it's what copy_file_range does on every filesystem but btrfs and
> > ocfs2 (and maybe cifs?), so I don't think we can just ignore it.
> > 
> > If we had separate copy_file_range and clone_file_range, I *think* it
> > could all be made sensible.  Am I missing something?
> 
> How would the application (cp) know when to call the clone_file_range and when to call copy_file_range?

Try clone and then fall back on copy if that's not available?

Which is the same thing vfs_copy_file_range() is doing now, but it'd
seem less confusing if that logic was in the application.

--b.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC v1 01/19] fs: Don't copy beyond the end of the file
  2017-03-08 20:00                   ` Olga Kornievskaia
@ 2017-03-08 20:18                     ` Trond Myklebust
  -1 siblings, 0 replies; 56+ messages in thread
From: Trond Myklebust @ 2017-03-08 20:18 UTC (permalink / raw)
  To: bfields, kolga; +Cc: hch, Trond Myklebust, linux-nfs, linux-fsdevel

On Wed, 2017-03-08 at 15:00 -0500, Olga Kornievskaia wrote:
> > On Mar 8, 2017, at 2:53 PM, J. Bruce Fields <bfields@fieldses.org>
> > wrote:
> > 
> > On Wed, Mar 08, 2017 at 12:32:12PM -0500, Olga Kornievskaia wrote:
> > > 
> > > > On Mar 8, 2017, at 12:25 PM, Christoph Hellwig <hch@infradead.o
> > > > rg>
> > > > wrote:
> > > > 
> > > > On Wed, Mar 08, 2017 at 12:05:21PM -0500, J. Bruce Fields
> > > > wrote:
> > > > > Since copy isn't atomic that check is never going to be
> > > > > reliable.
> > > > 
> > > > That's true for everything that COPY does.  By that logic we
> > > > should
> > > > not implement it at all (a logic that I'd fully support)
> > > 
> > > If you were to only keep CLONE then you’d lose a huge performance
> > > gain
> > > you get from server-to-server COPY. 
> > 
> > Yes.  Also, I think copy-like copy implementations have reasonable
> > semantics that are basically the same as read:
> > 
> > 	- copy can return successfully with less copied than requested.
> > 	- it's fine for the copied range to start and/or end past end
> > of
> > 	  file, it'll just return a short read.
> > 	- A copy of more than 0 bytes returning 0 means you're at end
> > of
> > 	  file.
> > 
> > The particular problem here is that that doesn't fit how clone
> > works at
> > all.
> > 
> > It feels like what happened is that copy_file_range() was made
> > mainly
> > for the clone case, with the idea that copy might be reluctantly
> > accepted as a second-class implementation.

Historically? No... Christoph added clone as a valid implementation of
copy_file_range() almost a year after Zach and Anna defined the
semantics of vfs_copy_file_range(). git blame is your friend...

> > 
> > But the performance gain of copy offload is too big to just ignore,
> > and
> > in fact it's what copy_file_range does on every filesystem but
> > btrfs and
> > ocfs2 (and maybe cifs?), so I don't think we can just ignore it.
> > 
> > If we had separate copy_file_range and clone_file_range, I *think*
> > it
> > could all be made sensible.  Am I missing something?
> > 
> 
> How would the application (cp) know when to call the clone_file_range
> and when to call copy_file_range?

cp can probably call copy_file_range(), but any application that needs
atomic semantics (i.e. a binary operation success/fail) must call
clone_file_range().

-- 
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@primarydata.com

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC v1 01/19] fs: Don't copy beyond the end of the file
@ 2017-03-08 20:18                     ` Trond Myklebust
  0 siblings, 0 replies; 56+ messages in thread
From: Trond Myklebust @ 2017-03-08 20:18 UTC (permalink / raw)
  To: bfields, kolga; +Cc: hch, Trond Myklebust, linux-nfs, linux-fsdevel

T24gV2VkLCAyMDE3LTAzLTA4IGF0IDE1OjAwIC0wNTAwLCBPbGdhIEtvcm5pZXZza2FpYSB3cm90
ZToNCj4gPiBPbiBNYXIgOCwgMjAxNywgYXQgMjo1MyBQTSwgSi4gQnJ1Y2UgRmllbGRzIDxiZmll
bGRzQGZpZWxkc2VzLm9yZz4NCj4gPiB3cm90ZToNCj4gPiANCj4gPiBPbiBXZWQsIE1hciAwOCwg
MjAxNyBhdCAxMjozMjoxMlBNIC0wNTAwLCBPbGdhIEtvcm5pZXZza2FpYSB3cm90ZToNCj4gPiA+
IA0KPiA+ID4gPiBPbiBNYXIgOCwgMjAxNywgYXQgMTI6MjUgUE0sIENocmlzdG9waCBIZWxsd2ln
IDxoY2hAaW5mcmFkZWFkLm8NCj4gPiA+ID4gcmc+DQo+ID4gPiA+IHdyb3RlOg0KPiA+ID4gPiAN
Cj4gPiA+ID4gT24gV2VkLCBNYXIgMDgsIDIwMTcgYXQgMTI6MDU6MjFQTSAtMDUwMCwgSi4gQnJ1
Y2UgRmllbGRzDQo+ID4gPiA+IHdyb3RlOg0KPiA+ID4gPiA+IFNpbmNlIGNvcHkgaXNuJ3QgYXRv
bWljIHRoYXQgY2hlY2sgaXMgbmV2ZXIgZ29pbmcgdG8gYmUNCj4gPiA+ID4gPiByZWxpYWJsZS4N
Cj4gPiA+ID4gDQo+ID4gPiA+IFRoYXQncyB0cnVlIGZvciBldmVyeXRoaW5nIHRoYXQgQ09QWSBk
b2VzLsKgwqBCeSB0aGF0IGxvZ2ljIHdlDQo+ID4gPiA+IHNob3VsZA0KPiA+ID4gPiBub3QgaW1w
bGVtZW50IGl0IGF0IGFsbCAoYSBsb2dpYyB0aGF0IEknZCBmdWxseSBzdXBwb3J0KQ0KPiA+ID4g
DQo+ID4gPiBJZiB5b3Ugd2VyZSB0byBvbmx5IGtlZXAgQ0xPTkUgdGhlbiB5b3XigJlkIGxvc2Ug
YSBodWdlIHBlcmZvcm1hbmNlDQo+ID4gPiBnYWluDQo+ID4gPiB5b3UgZ2V0IGZyb20gc2VydmVy
LXRvLXNlcnZlciBDT1BZLsKgDQo+ID4gDQo+ID4gWWVzLsKgwqBBbHNvLCBJIHRoaW5rIGNvcHkt
bGlrZSBjb3B5IGltcGxlbWVudGF0aW9ucyBoYXZlIHJlYXNvbmFibGUNCj4gPiBzZW1hbnRpY3Mg
dGhhdCBhcmUgYmFzaWNhbGx5IHRoZSBzYW1lIGFzIHJlYWQ6DQo+ID4gDQo+ID4gCS0gY29weSBj
YW4gcmV0dXJuIHN1Y2Nlc3NmdWxseSB3aXRoIGxlc3MgY29waWVkIHRoYW4gcmVxdWVzdGVkLg0K
PiA+IAktIGl0J3MgZmluZSBmb3IgdGhlIGNvcGllZCByYW5nZSB0byBzdGFydCBhbmQvb3IgZW5k
IHBhc3QgZW5kDQo+ID4gb2YNCj4gPiAJwqDCoGZpbGUsIGl0J2xsIGp1c3QgcmV0dXJuIGEgc2hv
cnQgcmVhZC4NCj4gPiAJLSBBIGNvcHkgb2YgbW9yZSB0aGFuIDAgYnl0ZXMgcmV0dXJuaW5nIDAg
bWVhbnMgeW91J3JlIGF0IGVuZA0KPiA+IG9mDQo+ID4gCcKgwqBmaWxlLg0KPiA+IA0KPiA+IFRo
ZSBwYXJ0aWN1bGFyIHByb2JsZW0gaGVyZSBpcyB0aGF0IHRoYXQgZG9lc24ndCBmaXQgaG93IGNs
b25lDQo+ID4gd29ya3MgYXQNCj4gPiBhbGwuDQo+ID4gDQo+ID4gSXQgZmVlbHMgbGlrZSB3aGF0
IGhhcHBlbmVkIGlzIHRoYXQgY29weV9maWxlX3JhbmdlKCkgd2FzIG1hZGUNCj4gPiBtYWlubHkN
Cj4gPiBmb3IgdGhlIGNsb25lIGNhc2UsIHdpdGggdGhlIGlkZWEgdGhhdCBjb3B5IG1pZ2h0IGJl
IHJlbHVjdGFudGx5DQo+ID4gYWNjZXB0ZWQgYXMgYSBzZWNvbmQtY2xhc3MgaW1wbGVtZW50YXRp
b24uDQoNCkhpc3RvcmljYWxseT8gTm8uLi4gQ2hyaXN0b3BoIGFkZGVkIGNsb25lIGFzIGEgdmFs
aWQgaW1wbGVtZW50YXRpb24gb2YNCmNvcHlfZmlsZV9yYW5nZSgpIGFsbW9zdCBhIHllYXIgYWZ0
ZXIgWmFjaCBhbmQgQW5uYSBkZWZpbmVkIHRoZQ0Kc2VtYW50aWNzIG9mIHZmc19jb3B5X2ZpbGVf
cmFuZ2UoKS4gZ2l0IGJsYW1lIGlzIHlvdXIgZnJpZW5kLi4uDQoNCj4gPiANCj4gPiBCdXQgdGhl
IHBlcmZvcm1hbmNlIGdhaW4gb2YgY29weSBvZmZsb2FkIGlzIHRvbyBiaWcgdG8ganVzdCBpZ25v
cmUsDQo+ID4gYW5kDQo+ID4gaW4gZmFjdCBpdCdzIHdoYXQgY29weV9maWxlX3JhbmdlIGRvZXMg
b24gZXZlcnkgZmlsZXN5c3RlbSBidXQNCj4gPiBidHJmcyBhbmQNCj4gPiBvY2ZzMiAoYW5kIG1h
eWJlIGNpZnM/KSwgc28gSSBkb24ndCB0aGluayB3ZSBjYW4ganVzdCBpZ25vcmUgaXQuDQo+ID4g
DQo+ID4gSWYgd2UgaGFkIHNlcGFyYXRlIGNvcHlfZmlsZV9yYW5nZSBhbmQgY2xvbmVfZmlsZV9y
YW5nZSwgSSAqdGhpbmsqDQo+ID4gaXQNCj4gPiBjb3VsZCBhbGwgYmUgbWFkZSBzZW5zaWJsZS7C
oMKgQW0gSSBtaXNzaW5nIHNvbWV0aGluZz8NCj4gPiANCj4gDQo+IEhvdyB3b3VsZCB0aGUgYXBw
bGljYXRpb24gKGNwKSBrbm93IHdoZW4gdG8gY2FsbCB0aGUgY2xvbmVfZmlsZV9yYW5nZQ0KPiBh
bmQgd2hlbiB0byBjYWxsIGNvcHlfZmlsZV9yYW5nZT8NCg0KY3AgY2FuIHByb2JhYmx5IGNhbGwg
Y29weV9maWxlX3JhbmdlKCksIGJ1dCBhbnkgYXBwbGljYXRpb24gdGhhdCBuZWVkcw0KYXRvbWlj
IHNlbWFudGljcyAoaS5lLiBhIGJpbmFyeSBvcGVyYXRpb24gc3VjY2Vzcy9mYWlsKSBtdXN0IGNh
bGwNCmNsb25lX2ZpbGVfcmFuZ2UoKS4NCg0KLS0gDQpUcm9uZCBNeWtsZWJ1c3QNCkxpbnV4IE5G
UyBjbGllbnQgbWFpbnRhaW5lciwgUHJpbWFyeURhdGENCnRyb25kLm15a2xlYnVzdEBwcmltYXJ5
ZGF0YS5jb20NCg==


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC v1 01/19] fs: Don't copy beyond the end of the file
  2017-03-08 20:18                     ` Trond Myklebust
  (?)
@ 2017-03-08 20:32                     ` bfields
  2017-03-08 20:49                         ` Trond Myklebust
  -1 siblings, 1 reply; 56+ messages in thread
From: bfields @ 2017-03-08 20:32 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: kolga, hch, linux-nfs, linux-fsdevel

On Wed, Mar 08, 2017 at 08:18:31PM +0000, Trond Myklebust wrote:
> On Wed, 2017-03-08 at 15:00 -0500, Olga Kornievskaia wrote:
> > > On Mar 8, 2017, at 2:53 PM, J. Bruce Fields <bfields@fieldses.org>
> > > wrote:
> > > 
> > > On Wed, Mar 08, 2017 at 12:32:12PM -0500, Olga Kornievskaia wrote:
> > > > 
> > > > > On Mar 8, 2017, at 12:25 PM, Christoph Hellwig <hch@infradead.o
> > > > > rg>
> > > > > wrote:
> > > > > 
> > > > > On Wed, Mar 08, 2017 at 12:05:21PM -0500, J. Bruce Fields
> > > > > wrote:
> > > > > > Since copy isn't atomic that check is never going to be
> > > > > > reliable.
> > > > > 
> > > > > That's true for everything that COPY does.  By that logic we
> > > > > should
> > > > > not implement it at all (a logic that I'd fully support)
> > > > 
> > > > If you were to only keep CLONE then you’d lose a huge performance
> > > > gain
> > > > you get from server-to-server COPY. 
> > > 
> > > Yes.  Also, I think copy-like copy implementations have reasonable
> > > semantics that are basically the same as read:
> > > 
> > > 	- copy can return successfully with less copied than requested.
> > > 	- it's fine for the copied range to start and/or end past end
> > > of
> > > 	  file, it'll just return a short read.
> > > 	- A copy of more than 0 bytes returning 0 means you're at end
> > > of
> > > 	  file.
> > > 
> > > The particular problem here is that that doesn't fit how clone
> > > works at
> > > all.
> > > 
> > > It feels like what happened is that copy_file_range() was made
> > > mainly
> > > for the clone case, with the idea that copy might be reluctantly
> > > accepted as a second-class implementation.
> 
> Historically? No... Christoph added clone as a valid implementation of
> copy_file_range() almost a year after Zach and Anna defined the
> semantics of vfs_copy_file_range(). git blame is your friend...

Yeah, I know.  It still feels to me like the interface was originally
designed with clone in mind, but that's my vague impression from the man
pages and half-remembered conversations.

Though the lack of a "just copy the whole file regardless of size" case
is weird for clone.  All you can do is stat the file and then hope it
doesn't change before you issue the copy_file_range.  But I'd think it'd
be easy for an atomic clone implementation to handle, say, getting a
snapshot of a log file while it's getting continuously appended to.

> > > But the performance gain of copy offload is too big to just ignore,
> > > and
> > > in fact it's what copy_file_range does on every filesystem but
> > > btrfs and
> > > ocfs2 (and maybe cifs?), so I don't think we can just ignore it.
> > > 
> > > If we had separate copy_file_range and clone_file_range, I *think*
> > > it
> > > could all be made sensible.  Am I missing something?
> > > 
> > 
> > How would the application (cp) know when to call the clone_file_range
> > and when to call copy_file_range?
> 
> cp can probably call copy_file_range(), but any application that needs
> atomic semantics (i.e. a binary operation success/fail) must call
> clone_file_range().

I don't believe there's a clone_file_range().  I see the vfs interface,
but no system call.

And implementing a simple cp is harder than it should be when you don't
know whether it's implemented as copy or clone.  You have to stat for
the file size first, retry if you got it wrong, and also retry if you
get a short read.  The example in the clone_file_range() man page is
incomplete.

--b.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC v1 01/19] fs: Don't copy beyond the end of the file
  2017-03-08 20:32                     ` bfields
@ 2017-03-08 20:49                         ` Trond Myklebust
  0 siblings, 0 replies; 56+ messages in thread
From: Trond Myklebust @ 2017-03-08 20:49 UTC (permalink / raw)
  To: bfields; +Cc: hch, linux-nfs, kolga, linux-fsdevel

On Wed, 2017-03-08 at 15:32 -0500, bfields@fieldses.org wrote:
> On Wed, Mar 08, 2017 at 08:18:31PM +0000, Trond Myklebust wrote:
> > On Wed, 2017-03-08 at 15:00 -0500, Olga Kornievskaia wrote:
> > > > On Mar 8, 2017, at 2:53 PM, J. Bruce Fields <bfields@fieldses.o
> > > > rg>
> > > > wrote:
> > > > 
> > > > On Wed, Mar 08, 2017 at 12:32:12PM -0500, Olga Kornievskaia
> > > > wrote:
> > > > > 
> > > > > > On Mar 8, 2017, at 12:25 PM, Christoph Hellwig <hch@infrade
> > > > > > ad.o
> > > > > > rg>
> > > > > > wrote:
> > > > > > 
> > > > > > On Wed, Mar 08, 2017 at 12:05:21PM -0500, J. Bruce Fields
> > > > > > wrote:
> > > > > > > Since copy isn't atomic that check is never going to be
> > > > > > > reliable.
> > > > > > 
> > > > > > That's true for everything that COPY does.  By that logic
> > > > > > we
> > > > > > should
> > > > > > not implement it at all (a logic that I'd fully support)
> > > > > 
> > > > > If you were to only keep CLONE then you’d lose a huge
> > > > > performance
> > > > > gain
> > > > > you get from server-to-server COPY. 
> > > > 
> > > > Yes.  Also, I think copy-like copy implementations have
> > > > reasonable
> > > > semantics that are basically the same as read:
> > > > 
> > > > 	- copy can return successfully with less copied than
> > > > requested.
> > > > 	- it's fine for the copied range to start and/or end
> > > > past end
> > > > of
> > > > 	  file, it'll just return a short read.
> > > > 	- A copy of more than 0 bytes returning 0 means you're
> > > > at end
> > > > of
> > > > 	  file.
> > > > 
> > > > The particular problem here is that that doesn't fit how clone
> > > > works at
> > > > all.
> > > > 
> > > > It feels like what happened is that copy_file_range() was made
> > > > mainly
> > > > for the clone case, with the idea that copy might be
> > > > reluctantly
> > > > accepted as a second-class implementation.
> > 
> > Historically? No... Christoph added clone as a valid implementation
> > of
> > copy_file_range() almost a year after Zach and Anna defined the
> > semantics of vfs_copy_file_range(). git blame is your friend...
> 
> Yeah, I know.  It still feels to me like the interface was originally
> designed with clone in mind, but that's my vague impression from the
> man
> pages and half-remembered conversations.
> 
> Though the lack of a "just copy the whole file regardless of size"
> case
> is weird for clone.  All you can do is stat the file and then hope it
> doesn't change before you issue the copy_file_range.  But I'd think
> it'd
> be easy for an atomic clone implementation to handle, say, getting a
> snapshot of a log file while it's getting continuously appended to.

It really isn't that interesting in the continuously appended case
(what difference does it make if you only get data from just a few
moments ago), but I can see it being an issue in the case of random
writes where the file size is being extended.

The thing is that in both those cases, the copy_file_range() semantics
are worse, since they don't even guarantee a time-consistent copy.

> > > > But the performance gain of copy offload is too big to just
> > > > ignore,
> > > > and
> > > > in fact it's what copy_file_range does on every filesystem but
> > > > btrfs and
> > > > ocfs2 (and maybe cifs?), so I don't think we can just ignore
> > > > it.
> > > > 
> > > > If we had separate copy_file_range and clone_file_range, I
> > > > *think*
> > > > it
> > > > could all be made sensible.  Am I missing something?
> > > > 
> > > 
> > > How would the application (cp) know when to call the
> > > clone_file_range
> > > and when to call copy_file_range?
> > 
> > cp can probably call copy_file_range(), but any application that
> > needs
> > atomic semantics (i.e. a binary operation success/fail) must call
> > clone_file_range().
> 
> I don't believe there's a clone_file_range().  I see the vfs
> interface,
> but no system call.

There is a standard FICLONERANGE ioctl() that can be used on all
filesystems that support the vfs interface.

> And implementing a simple cp is harder than it should be when you
> don't
> know whether it's implemented as copy or clone.  You have to stat for
> the file size first, retry if you got it wrong, and also retry if you
> get a short read.  The example in the clone_file_range() man page is
> incomplete.

As I said, you shouldn't be using copy_file_range() either in the case
where the file is being modified.

-- 
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@primarydata.com

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC v1 01/19] fs: Don't copy beyond the end of the file
@ 2017-03-08 20:49                         ` Trond Myklebust
  0 siblings, 0 replies; 56+ messages in thread
From: Trond Myklebust @ 2017-03-08 20:49 UTC (permalink / raw)
  To: bfields; +Cc: hch, linux-nfs, kolga, linux-fsdevel

T24gV2VkLCAyMDE3LTAzLTA4IGF0IDE1OjMyIC0wNTAwLCBiZmllbGRzQGZpZWxkc2VzLm9yZyB3
cm90ZToNCj4gT24gV2VkLCBNYXIgMDgsIDIwMTcgYXQgMDg6MTg6MzFQTSArMDAwMCwgVHJvbmQg
TXlrbGVidXN0IHdyb3RlOg0KPiA+IE9uIFdlZCwgMjAxNy0wMy0wOCBhdCAxNTowMCAtMDUwMCwg
T2xnYSBLb3JuaWV2c2thaWEgd3JvdGU6DQo+ID4gPiA+IE9uIE1hciA4LCAyMDE3LCBhdCAyOjUz
IFBNLCBKLiBCcnVjZSBGaWVsZHMgPGJmaWVsZHNAZmllbGRzZXMubw0KPiA+ID4gPiByZz4NCj4g
PiA+ID4gd3JvdGU6DQo+ID4gPiA+IA0KPiA+ID4gPiBPbiBXZWQsIE1hciAwOCwgMjAxNyBhdCAx
MjozMjoxMlBNIC0wNTAwLCBPbGdhIEtvcm5pZXZza2FpYQ0KPiA+ID4gPiB3cm90ZToNCj4gPiA+
ID4gPiANCj4gPiA+ID4gPiA+IE9uIE1hciA4LCAyMDE3LCBhdCAxMjoyNSBQTSwgQ2hyaXN0b3Bo
IEhlbGx3aWcgPGhjaEBpbmZyYWRlDQo+ID4gPiA+ID4gPiBhZC5vDQo+ID4gPiA+ID4gPiByZz4N
Cj4gPiA+ID4gPiA+IHdyb3RlOg0KPiA+ID4gPiA+ID4gDQo+ID4gPiA+ID4gPiBPbiBXZWQsIE1h
ciAwOCwgMjAxNyBhdCAxMjowNToyMVBNIC0wNTAwLCBKLiBCcnVjZSBGaWVsZHMNCj4gPiA+ID4g
PiA+IHdyb3RlOg0KPiA+ID4gPiA+ID4gPiBTaW5jZSBjb3B5IGlzbid0IGF0b21pYyB0aGF0IGNo
ZWNrIGlzIG5ldmVyIGdvaW5nIHRvIGJlDQo+ID4gPiA+ID4gPiA+IHJlbGlhYmxlLg0KPiA+ID4g
PiA+ID4gDQo+ID4gPiA+ID4gPiBUaGF0J3MgdHJ1ZSBmb3IgZXZlcnl0aGluZyB0aGF0IENPUFkg
ZG9lcy7CoMKgQnkgdGhhdCBsb2dpYw0KPiA+ID4gPiA+ID4gd2UNCj4gPiA+ID4gPiA+IHNob3Vs
ZA0KPiA+ID4gPiA+ID4gbm90IGltcGxlbWVudCBpdCBhdCBhbGwgKGEgbG9naWMgdGhhdCBJJ2Qg
ZnVsbHkgc3VwcG9ydCkNCj4gPiA+ID4gPiANCj4gPiA+ID4gPiBJZiB5b3Ugd2VyZSB0byBvbmx5
IGtlZXAgQ0xPTkUgdGhlbiB5b3XigJlkIGxvc2UgYSBodWdlDQo+ID4gPiA+ID4gcGVyZm9ybWFu
Y2UNCj4gPiA+ID4gPiBnYWluDQo+ID4gPiA+ID4geW91IGdldCBmcm9tIHNlcnZlci10by1zZXJ2
ZXIgQ09QWS7CoA0KPiA+ID4gPiANCj4gPiA+ID4gWWVzLsKgwqBBbHNvLCBJIHRoaW5rIGNvcHkt
bGlrZSBjb3B5IGltcGxlbWVudGF0aW9ucyBoYXZlDQo+ID4gPiA+IHJlYXNvbmFibGUNCj4gPiA+
ID4gc2VtYW50aWNzIHRoYXQgYXJlIGJhc2ljYWxseSB0aGUgc2FtZSBhcyByZWFkOg0KPiA+ID4g
PiANCj4gPiA+ID4gCS0gY29weSBjYW4gcmV0dXJuIHN1Y2Nlc3NmdWxseSB3aXRoIGxlc3MgY29w
aWVkIHRoYW4NCj4gPiA+ID4gcmVxdWVzdGVkLg0KPiA+ID4gPiAJLSBpdCdzIGZpbmUgZm9yIHRo
ZSBjb3BpZWQgcmFuZ2UgdG8gc3RhcnQgYW5kL29yIGVuZA0KPiA+ID4gPiBwYXN0IGVuZA0KPiA+
ID4gPiBvZg0KPiA+ID4gPiAJwqDCoGZpbGUsIGl0J2xsIGp1c3QgcmV0dXJuIGEgc2hvcnQgcmVh
ZC4NCj4gPiA+ID4gCS0gQSBjb3B5IG9mIG1vcmUgdGhhbiAwIGJ5dGVzIHJldHVybmluZyAwIG1l
YW5zIHlvdSdyZQ0KPiA+ID4gPiBhdCBlbmQNCj4gPiA+ID4gb2YNCj4gPiA+ID4gCcKgwqBmaWxl
Lg0KPiA+ID4gPiANCj4gPiA+ID4gVGhlIHBhcnRpY3VsYXIgcHJvYmxlbSBoZXJlIGlzIHRoYXQg
dGhhdCBkb2Vzbid0IGZpdCBob3cgY2xvbmUNCj4gPiA+ID4gd29ya3MgYXQNCj4gPiA+ID4gYWxs
Lg0KPiA+ID4gPiANCj4gPiA+ID4gSXQgZmVlbHMgbGlrZSB3aGF0IGhhcHBlbmVkIGlzIHRoYXQg
Y29weV9maWxlX3JhbmdlKCkgd2FzIG1hZGUNCj4gPiA+ID4gbWFpbmx5DQo+ID4gPiA+IGZvciB0
aGUgY2xvbmUgY2FzZSwgd2l0aCB0aGUgaWRlYSB0aGF0IGNvcHkgbWlnaHQgYmUNCj4gPiA+ID4g
cmVsdWN0YW50bHkNCj4gPiA+ID4gYWNjZXB0ZWQgYXMgYSBzZWNvbmQtY2xhc3MgaW1wbGVtZW50
YXRpb24uDQo+ID4gDQo+ID4gSGlzdG9yaWNhbGx5PyBOby4uLiBDaHJpc3RvcGggYWRkZWQgY2xv
bmUgYXMgYSB2YWxpZCBpbXBsZW1lbnRhdGlvbg0KPiA+IG9mDQo+ID4gY29weV9maWxlX3Jhbmdl
KCkgYWxtb3N0IGEgeWVhciBhZnRlciBaYWNoIGFuZCBBbm5hIGRlZmluZWQgdGhlDQo+ID4gc2Vt
YW50aWNzIG9mIHZmc19jb3B5X2ZpbGVfcmFuZ2UoKS4gZ2l0IGJsYW1lIGlzIHlvdXIgZnJpZW5k
Li4uDQo+IA0KPiBZZWFoLCBJIGtub3cuwqDCoEl0IHN0aWxsIGZlZWxzIHRvIG1lIGxpa2UgdGhl
IGludGVyZmFjZSB3YXMgb3JpZ2luYWxseQ0KPiBkZXNpZ25lZCB3aXRoIGNsb25lIGluIG1pbmQs
IGJ1dCB0aGF0J3MgbXkgdmFndWUgaW1wcmVzc2lvbiBmcm9tIHRoZQ0KPiBtYW4NCj4gcGFnZXMg
YW5kIGhhbGYtcmVtZW1iZXJlZCBjb252ZXJzYXRpb25zLg0KPiANCj4gVGhvdWdoIHRoZSBsYWNr
IG9mIGEgImp1c3QgY29weSB0aGUgd2hvbGUgZmlsZSByZWdhcmRsZXNzIG9mIHNpemUiDQo+IGNh
c2UNCj4gaXMgd2VpcmQgZm9yIGNsb25lLsKgwqBBbGwgeW91IGNhbiBkbyBpcyBzdGF0IHRoZSBm
aWxlIGFuZCB0aGVuIGhvcGUgaXQNCj4gZG9lc24ndCBjaGFuZ2UgYmVmb3JlIHlvdSBpc3N1ZSB0
aGUgY29weV9maWxlX3JhbmdlLsKgwqBCdXQgSSdkIHRoaW5rDQo+IGl0J2QNCj4gYmUgZWFzeSBm
b3IgYW4gYXRvbWljIGNsb25lIGltcGxlbWVudGF0aW9uIHRvIGhhbmRsZSwgc2F5LCBnZXR0aW5n
IGENCj4gc25hcHNob3Qgb2YgYSBsb2cgZmlsZSB3aGlsZSBpdCdzIGdldHRpbmcgY29udGludW91
c2x5IGFwcGVuZGVkIHRvLg0KDQpJdCByZWFsbHkgaXNuJ3QgdGhhdCBpbnRlcmVzdGluZyBpbiB0
aGUgY29udGludW91c2x5IGFwcGVuZGVkIGNhc2UNCih3aGF0IGRpZmZlcmVuY2UgZG9lcyBpdCBt
YWtlIGlmIHlvdSBvbmx5IGdldCBkYXRhIGZyb20ganVzdCBhIGZldw0KbW9tZW50cyBhZ28pLCBi
dXQgSSBjYW4gc2VlIGl0IGJlaW5nIGFuIGlzc3VlIGluIHRoZSBjYXNlIG9mIHJhbmRvbQ0Kd3Jp
dGVzIHdoZXJlIHRoZSBmaWxlIHNpemUgaXMgYmVpbmcgZXh0ZW5kZWQuDQoNClRoZSB0aGluZyBp
cyB0aGF0IGluIGJvdGggdGhvc2UgY2FzZXMsIHRoZSBjb3B5X2ZpbGVfcmFuZ2UoKSBzZW1hbnRp
Y3MNCmFyZSB3b3JzZSwgc2luY2UgdGhleSBkb24ndCBldmVuIGd1YXJhbnRlZSBhIHRpbWUtY29u
c2lzdGVudCBjb3B5Lg0KDQo+ID4gPiA+IEJ1dCB0aGUgcGVyZm9ybWFuY2UgZ2FpbiBvZiBjb3B5
IG9mZmxvYWQgaXMgdG9vIGJpZyB0byBqdXN0DQo+ID4gPiA+IGlnbm9yZSwNCj4gPiA+ID4gYW5k
DQo+ID4gPiA+IGluIGZhY3QgaXQncyB3aGF0IGNvcHlfZmlsZV9yYW5nZSBkb2VzIG9uIGV2ZXJ5
IGZpbGVzeXN0ZW0gYnV0DQo+ID4gPiA+IGJ0cmZzIGFuZA0KPiA+ID4gPiBvY2ZzMiAoYW5kIG1h
eWJlIGNpZnM/KSwgc28gSSBkb24ndCB0aGluayB3ZSBjYW4ganVzdCBpZ25vcmUNCj4gPiA+ID4g
aXQuDQo+ID4gPiA+IA0KPiA+ID4gPiBJZiB3ZSBoYWQgc2VwYXJhdGUgY29weV9maWxlX3Jhbmdl
IGFuZCBjbG9uZV9maWxlX3JhbmdlLCBJDQo+ID4gPiA+ICp0aGluayoNCj4gPiA+ID4gaXQNCj4g
PiA+ID4gY291bGQgYWxsIGJlIG1hZGUgc2Vuc2libGUuwqDCoEFtIEkgbWlzc2luZyBzb21ldGhp
bmc/DQo+ID4gPiA+IA0KPiA+ID4gDQo+ID4gPiBIb3cgd291bGQgdGhlIGFwcGxpY2F0aW9uIChj
cCkga25vdyB3aGVuIHRvIGNhbGwgdGhlDQo+ID4gPiBjbG9uZV9maWxlX3JhbmdlDQo+ID4gPiBh
bmQgd2hlbiB0byBjYWxsIGNvcHlfZmlsZV9yYW5nZT8NCj4gPiANCj4gPiBjcCBjYW4gcHJvYmFi
bHkgY2FsbCBjb3B5X2ZpbGVfcmFuZ2UoKSwgYnV0IGFueSBhcHBsaWNhdGlvbiB0aGF0DQo+ID4g
bmVlZHMNCj4gPiBhdG9taWMgc2VtYW50aWNzIChpLmUuIGEgYmluYXJ5IG9wZXJhdGlvbiBzdWNj
ZXNzL2ZhaWwpIG11c3QgY2FsbA0KPiA+IGNsb25lX2ZpbGVfcmFuZ2UoKS4NCj4gDQo+IEkgZG9u
J3QgYmVsaWV2ZSB0aGVyZSdzIGEgY2xvbmVfZmlsZV9yYW5nZSgpLsKgwqBJIHNlZSB0aGUgdmZz
DQo+IGludGVyZmFjZSwNCj4gYnV0IG5vIHN5c3RlbSBjYWxsLg0KDQpUaGVyZSBpcyBhIHN0YW5k
YXJkIEZJQ0xPTkVSQU5HRSBpb2N0bCgpIHRoYXQgY2FuIGJlIHVzZWQgb24gYWxsDQpmaWxlc3lz
dGVtcyB0aGF0IHN1cHBvcnQgdGhlIHZmcyBpbnRlcmZhY2UuDQoNCj4gQW5kIGltcGxlbWVudGlu
ZyBhIHNpbXBsZSBjcCBpcyBoYXJkZXIgdGhhbiBpdCBzaG91bGQgYmUgd2hlbiB5b3UNCj4gZG9u
J3QNCj4ga25vdyB3aGV0aGVyIGl0J3MgaW1wbGVtZW50ZWQgYXMgY29weSBvciBjbG9uZS7CoMKg
WW91IGhhdmUgdG8gc3RhdCBmb3INCj4gdGhlIGZpbGUgc2l6ZSBmaXJzdCwgcmV0cnkgaWYgeW91
IGdvdCBpdCB3cm9uZywgYW5kIGFsc28gcmV0cnkgaWYgeW91DQo+IGdldCBhIHNob3J0IHJlYWQu
wqDCoFRoZSBleGFtcGxlIGluIHRoZSBjbG9uZV9maWxlX3JhbmdlKCkgbWFuIHBhZ2UgaXMNCj4g
aW5jb21wbGV0ZS4NCg0KQXMgSSBzYWlkLCB5b3Ugc2hvdWxkbid0IGJlIHVzaW5nIGNvcHlfZmls
ZV9yYW5nZSgpIGVpdGhlciBpbiB0aGUgY2FzZQ0Kd2hlcmUgdGhlIGZpbGUgaXMgYmVpbmcgbW9k
aWZpZWQuDQoNCi0tIA0KVHJvbmQgTXlrbGVidXN0DQpMaW51eCBORlMgY2xpZW50IG1haW50YWlu
ZXIsIFByaW1hcnlEYXRhDQp0cm9uZC5teWtsZWJ1c3RAcHJpbWFyeWRhdGEuY29tDQo=


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC v1 01/19] fs: Don't copy beyond the end of the file
  2017-03-08 20:49                         ` Trond Myklebust
  (?)
@ 2017-03-09 15:29                         ` bfields
  2017-03-09 15:35                           ` hch
  -1 siblings, 1 reply; 56+ messages in thread
From: bfields @ 2017-03-09 15:29 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: hch, linux-nfs, kolga, linux-fsdevel

On Wed, Mar 08, 2017 at 08:49:56PM +0000, Trond Myklebust wrote:
> On Wed, 2017-03-08 at 15:32 -0500, bfields@fieldses.org wrote:
> > On Wed, Mar 08, 2017 at 08:18:31PM +0000, Trond Myklebust wrote:
> > > On Wed, 2017-03-08 at 15:00 -0500, Olga Kornievskaia wrote:
> > > > > On Mar 8, 2017, at 2:53 PM, J. Bruce Fields <bfields@fieldses.o
> > > > > rg>
> > > > > wrote:
> > > > > 
> > > > > On Wed, Mar 08, 2017 at 12:32:12PM -0500, Olga Kornievskaia
> > > > > wrote:
> > > > > > 
> > > > > > > On Mar 8, 2017, at 12:25 PM, Christoph Hellwig <hch@infrade
> > > > > > > ad.o
> > > > > > > rg>
> > > > > > > wrote:
> > > > > > > 
> > > > > > > On Wed, Mar 08, 2017 at 12:05:21PM -0500, J. Bruce Fields
> > > > > > > wrote:
> > > > > > > > Since copy isn't atomic that check is never going to be
> > > > > > > > reliable.
> > > > > > > 
> > > > > > > That's true for everything that COPY does.  By that logic
> > > > > > > we
> > > > > > > should
> > > > > > > not implement it at all (a logic that I'd fully support)
> > > > > > 
> > > > > > If you were to only keep CLONE then you’d lose a huge
> > > > > > performance
> > > > > > gain
> > > > > > you get from server-to-server COPY. 
> > > > > 
> > > > > Yes.  Also, I think copy-like copy implementations have
> > > > > reasonable
> > > > > semantics that are basically the same as read:
> > > > > 
> > > > > 	- copy can return successfully with less copied than
> > > > > requested.
> > > > > 	- it's fine for the copied range to start and/or end
> > > > > past end
> > > > > of
> > > > > 	  file, it'll just return a short read.
> > > > > 	- A copy of more than 0 bytes returning 0 means you're
> > > > > at end
> > > > > of
> > > > > 	  file.
> > > > > 
> > > > > The particular problem here is that that doesn't fit how clone
> > > > > works at
> > > > > all.
> > > > > 
> > > > > It feels like what happened is that copy_file_range() was made
> > > > > mainly
> > > > > for the clone case, with the idea that copy might be
> > > > > reluctantly
> > > > > accepted as a second-class implementation.
> > > 
> > > Historically? No... Christoph added clone as a valid implementation
> > > of
> > > copy_file_range() almost a year after Zach and Anna defined the
> > > semantics of vfs_copy_file_range(). git blame is your friend...
> > 
> > Yeah, I know.  It still feels to me like the interface was originally
> > designed with clone in mind, but that's my vague impression from the
> > man
> > pages and half-remembered conversations.
> > 
> > Though the lack of a "just copy the whole file regardless of size"
> > case
> > is weird for clone.  All you can do is stat the file and then hope it
> > doesn't change before you issue the copy_file_range.  But I'd think
> > it'd
> > be easy for an atomic clone implementation to handle, say, getting a
> > snapshot of a log file while it's getting continuously appended to.
> 
> It really isn't that interesting in the continuously appended case
> (what difference does it make if you only get data from just a few
> moments ago), but I can see it being an issue in the case of random
> writes where the file size is being extended.

Bah, yes, apologies for the bad example.

> The thing is that in both those cases, the copy_file_range() semantics
> are worse, since they don't even guarantee a time-consistent copy.
>
> > > > > But the performance gain of copy offload is too big to just
> > > > > ignore,
> > > > > and
> > > > > in fact it's what copy_file_range does on every filesystem but
> > > > > btrfs and
> > > > > ocfs2 (and maybe cifs?), so I don't think we can just ignore
> > > > > it.
> > > > > 
> > > > > If we had separate copy_file_range and clone_file_range, I
> > > > > *think*
> > > > > it
> > > > > could all be made sensible.  Am I missing something?
> > > > > 
> > > > 
> > > > How would the application (cp) know when to call the
> > > > clone_file_range
> > > > and when to call copy_file_range?
> > > 
> > > cp can probably call copy_file_range(), but any application that
> > > needs
> > > atomic semantics (i.e. a binary operation success/fail) must call
> > > clone_file_range().
> > 
> > I don't believe there's a clone_file_range().  I see the vfs
> > interface,
> > but no system call.
> 
> There is a standard FICLONERANGE ioctl() that can be used on all
> filesystems that support the vfs interface.

Oh, thanks, I forgot about that.

So I don't understand why it needed to be added to copy_file_range().
The copy and clone semantics are different enough that I think callers
want to know which they're getting.

> > And implementing a simple cp is harder than it should be when you
> > don't
> > know whether it's implemented as copy or clone.  You have to stat for
> > the file size first, retry if you got it wrong, and also retry if you
> > get a short read.  The example in the clone_file_range() man page is
> > incomplete.
> 
> As I said, you shouldn't be using copy_file_range() either in the case
> where the file is being modified.

Don't we want it to have more or less the same behavior as a read-write
loop?  People are probably running backup programs that depend on just
simple copies, and maybe the results are good enough for their purposes,
or maybe they're actually corrupting parts of their backups and don't
know, but we can't suddenly start aborting their backups with errors and
tell users it's for their own good.  So copy_file_range() callers will
need to handle EINVAL on changing files somehow.

--b.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC v1 01/19] fs: Don't copy beyond the end of the file
  2017-03-09 15:29                         ` bfields
@ 2017-03-09 15:35                           ` hch
  2017-03-09 16:16                             ` bfields
  0 siblings, 1 reply; 56+ messages in thread
From: hch @ 2017-03-09 15:35 UTC (permalink / raw)
  To: bfields; +Cc: Trond Myklebust, hch, linux-nfs, kolga, linux-fsdevel

[please trim your replies instead of this giant unreadable garbage,
 thanks!]

On Thu, Mar 09, 2017 at 10:29:48AM -0500, bfields@fieldses.org wrote:
> So I don't understand why it needed to be added to copy_file_range().
> The copy and clone semantics are different enough that I think callers
> want to know which they're getting.

Because if a file systems implements clone is literally is always better
than doing a copy loop, so using it is an absolute non-brainer.

> Don't we want it to have more or less the same behavior as a read-write
> loop?  People are probably running backup programs that depend on just
> simple copies, and maybe the results are good enough for their purposes,
> or maybe they're actually corrupting parts of their backups and don't
> know, but we can't suddenly start aborting their backups with errors and
> tell users it's for their own good.  So copy_file_range() callers will
> need to handle EINVAL on changing files somehow.

They do, and the system call has been in the tree for almost a year and
a half, so we can't simply change it.  Fortunately we do have a flags
argument that can be used to implement your preferred semantics if you
care deeply enough about them.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC v1 01/19] fs: Don't copy beyond the end of the file
  2017-03-09 15:35                           ` hch
@ 2017-03-09 16:16                             ` bfields
  2017-03-09 16:17                               ` hch
  2017-03-09 17:35                                 ` Olga Kornievskaia
  0 siblings, 2 replies; 56+ messages in thread
From: bfields @ 2017-03-09 16:16 UTC (permalink / raw)
  To: hch; +Cc: Trond Myklebust, linux-nfs, kolga, linux-fsdevel

On Thu, Mar 09, 2017 at 07:35:59AM -0800, hch@infradead.org wrote:
> On Thu, Mar 09, 2017 at 10:29:48AM -0500, bfields@fieldses.org wrote:
> > So I don't understand why it needed to be added to copy_file_range().
> > The copy and clone semantics are different enough that I think callers
> > want to know which they're getting.
> 
> Because if a file systems implements clone is literally is always better
> than doing a copy loop, so using it is an absolute non-brainer.

I guess I'm just hung up on the EINVAL vs. short copy behavior.  It
seems more annoying and error-prone to be prepared for both, as opposed
to trying clone and then explicitly falling back to copy if that doesn't
work.  Maybe it's not that big a deal.

> They do, and the system call has been in the tree for almost a year and
> a half, so we can't simply change it.  Fortunately we do have a flags
> argument that can be used to implement your preferred semantics if you
> care deeply enough about them.

Yeah.

There are also some other offset, length combinations that currently
return -EINVAL, I wonder if any of those could be repurposed e.g. for a
"keep copying to end of file" call.

--b.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC v1 01/19] fs: Don't copy beyond the end of the file
  2017-03-09 16:16                             ` bfields
@ 2017-03-09 16:17                               ` hch
  2017-03-09 17:28                                   ` Olga Kornievskaia
  2017-03-09 17:35                                 ` Olga Kornievskaia
  1 sibling, 1 reply; 56+ messages in thread
From: hch @ 2017-03-09 16:17 UTC (permalink / raw)
  To: bfields; +Cc: hch, Trond Myklebust, linux-nfs, kolga, linux-fsdevel

On Thu, Mar 09, 2017 at 11:16:01AM -0500, bfields@fieldses.org wrote:
> I guess I'm just hung up on the EINVAL vs. short copy behavior.  It
> seems more annoying and error-prone to be prepared for both, as opposed
> to trying clone and then explicitly falling back to copy if that doesn't
> work.  Maybe it's not that big a deal.

We can do short copies^H^H^H^H^Hclones for clone just as easily,
at least for local filesystems (NFS would require some tweaks due to the
protocol).

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC v1 01/19] fs: Don't copy beyond the end of the file
  2017-03-09 16:17                               ` hch
@ 2017-03-09 17:28                                   ` Olga Kornievskaia
  0 siblings, 0 replies; 56+ messages in thread
From: Olga Kornievskaia @ 2017-03-09 17:28 UTC (permalink / raw)
  To: hch; +Cc: bfields, Trond Myklebust, linux-nfs, linux-fsdevel


> On Mar 9, 2017, at 11:17 AM, hch@infradead.org wrote:
> 
> On Thu, Mar 09, 2017 at 11:16:01AM -0500, bfields@fieldses.org wrote:
>> I guess I'm just hung up on the EINVAL vs. short copy behavior.  It
>> seems more annoying and error-prone to be prepared for both, as opposed
>> to trying clone and then explicitly falling back to copy if that doesn't
>> work.  Maybe it's not that big a deal.
> 
> We can do short copies^H^H^H^H^Hclones for clone just as easily,
> at least for local filesystems (NFS would require some tweaks due to the
> protocol).

I’m confused by the wording of “we can do … easily” . Is “can” = in the future? Currently, testing copy_file_range() on a btfs with argument of offset+len beyond the end of the file fail with EINVAL. Is NFS tweaking = revert the “MUST” in the spec for the check? 

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC v1 01/19] fs: Don't copy beyond the end of the file
@ 2017-03-09 17:28                                   ` Olga Kornievskaia
  0 siblings, 0 replies; 56+ messages in thread
From: Olga Kornievskaia @ 2017-03-09 17:28 UTC (permalink / raw)
  To: hch; +Cc: bfields, Trond Myklebust, linux-nfs, linux-fsdevel


> On Mar 9, 2017, at 11:17 AM, hch@infradead.org wrote:
>=20
> On Thu, Mar 09, 2017 at 11:16:01AM -0500, bfields@fieldses.org wrote:
>> I guess I'm just hung up on the EINVAL vs. short copy behavior.  It
>> seems more annoying and error-prone to be prepared for both, as =
opposed
>> to trying clone and then explicitly falling back to copy if that =
doesn't
>> work.  Maybe it's not that big a deal.
>=20
> We can do short copies^H^H^H^H^Hclones for clone just as easily,
> at least for local filesystems (NFS would require some tweaks due to =
the
> protocol).

I=E2=80=99m confused by the wording of =E2=80=9Cwe can do =E2=80=A6 =
easily=E2=80=9D . Is =E2=80=9Ccan=E2=80=9D =3D in the future? Currently, =
testing copy_file_range() on a btfs with argument of offset+len beyond =
the end of the file fail with EINVAL. Is NFS tweaking =3D revert the =
=E2=80=9CMUST=E2=80=9D in the spec for the check?=20




^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC v1 01/19] fs: Don't copy beyond the end of the file
  2017-03-09 16:16                             ` bfields
@ 2017-03-09 17:35                                 ` Olga Kornievskaia
  2017-03-09 17:35                                 ` Olga Kornievskaia
  1 sibling, 0 replies; 56+ messages in thread
From: Olga Kornievskaia @ 2017-03-09 17:35 UTC (permalink / raw)
  To: bfields; +Cc: hch, Trond Myklebust, linux-nfs, linux-fsdevel


> On Mar 9, 2017, at 11:16 AM, bfields@fieldses.org wrote:
> 
> On Thu, Mar 09, 2017 at 07:35:59AM -0800, hch@infradead.org wrote:
>> On Thu, Mar 09, 2017 at 10:29:48AM -0500, bfields@fieldses.org wrote:
>>> So I don't understand why it needed to be added to copy_file_range().
>>> The copy and clone semantics are different enough that I think callers
>>> want to know which they're getting.
>> 
>> Because if a file systems implements clone is literally is always better
>> than doing a copy loop, so using it is an absolute non-brainer.
> 
> I guess I'm just hung up on the EINVAL vs. short copy behavior.  It
> seems more annoying and error-prone to be prepared for both, as opposed
> to trying clone and then explicitly falling back to copy if that doesn't
> work.  Maybe it's not that big a deal.

I’m confused at which layer are we calling clone() and if we get EINVAL we call copy()? In VFS? Or are we talking about the case where there are two different system calls clone_file_range() and copy_file_range() that are provided to the “cp” and it calls one and then falls back onto another?

> 
>> They do, and the system call has been in the tree for almost a year and
>> a half, so we can't simply change it.  Fortunately we do have a flags
>> argument that can be used to implement your preferred semantics if you
>> care deeply enough about them.
> 
> Yeah.
> 
> There are also some other offset, length combinations that currently
> return -EINVAL, I wonder if any of those could be repurposed e.g. for a
> "keep copying to end of file" call.
> 
> --b.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC v1 01/19] fs: Don't copy beyond the end of the file
@ 2017-03-09 17:35                                 ` Olga Kornievskaia
  0 siblings, 0 replies; 56+ messages in thread
From: Olga Kornievskaia @ 2017-03-09 17:35 UTC (permalink / raw)
  To: bfields; +Cc: hch, Trond Myklebust, linux-nfs, linux-fsdevel


> On Mar 9, 2017, at 11:16 AM, bfields@fieldses.org wrote:
>=20
> On Thu, Mar 09, 2017 at 07:35:59AM -0800, hch@infradead.org wrote:
>> On Thu, Mar 09, 2017 at 10:29:48AM -0500, bfields@fieldses.org wrote:
>>> So I don't understand why it needed to be added to =
copy_file_range().
>>> The copy and clone semantics are different enough that I think =
callers
>>> want to know which they're getting.
>>=20
>> Because if a file systems implements clone is literally is always =
better
>> than doing a copy loop, so using it is an absolute non-brainer.
>=20
> I guess I'm just hung up on the EINVAL vs. short copy behavior.  It
> seems more annoying and error-prone to be prepared for both, as =
opposed
> to trying clone and then explicitly falling back to copy if that =
doesn't
> work.  Maybe it's not that big a deal.

I=E2=80=99m confused at which layer are we calling clone() and if we get =
EINVAL we call copy()? In VFS? Or are we talking about the case where =
there are two different system calls clone_file_range() and =
copy_file_range() that are provided to the =E2=80=9Ccp=E2=80=9D and it =
calls one and then falls back onto another?

>=20
>> They do, and the system call has been in the tree for almost a year =
and
>> a half, so we can't simply change it.  Fortunately we do have a flags
>> argument that can be used to implement your preferred semantics if =
you
>> care deeply enough about them.
>=20
> Yeah.
>=20
> There are also some other offset, length combinations that currently
> return -EINVAL, I wonder if any of those could be repurposed e.g. for =
a
> "keep copying to end of file" call.
>=20
> --b.


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC v1 01/19] fs: Don't copy beyond the end of the file
  2017-03-09 17:28                                   ` Olga Kornievskaia
  (?)
@ 2017-03-09 18:40                                   ` bfields
  -1 siblings, 0 replies; 56+ messages in thread
From: bfields @ 2017-03-09 18:40 UTC (permalink / raw)
  To: Olga Kornievskaia; +Cc: hch, Trond Myklebust, linux-nfs, linux-fsdevel

On Thu, Mar 09, 2017 at 12:28:03PM -0500, Olga Kornievskaia wrote:
> 
> > On Mar 9, 2017, at 11:17 AM, hch@infradead.org wrote:
> > 
> > On Thu, Mar 09, 2017 at 11:16:01AM -0500, bfields@fieldses.org
> > wrote:
> >> I guess I'm just hung up on the EINVAL vs. short copy behavior.  It
> >> seems more annoying and error-prone to be prepared for both, as
> >> opposed to trying clone and then explicitly falling back to copy if
> >> that doesn't work.  Maybe it's not that big a deal.
> > 
> > We can do short copies^H^H^H^H^Hclones for clone just as easily, at
> > least for local filesystems (NFS would require some tweaks due to
> > the protocol).
> 
> I’m confused by the wording of “we can do … easily” . Is “can” = in
> the future? Currently, testing copy_file_range() on a btfs with
> argument of offset+len beyond the end of the file fail with EINVAL. Is
> NFS tweaking = revert the “MUST” in the spec for the check? 

Checking https://tools.ietf.org/html/draft-ietf-nfsv4-minorversion2-41
... looks like the CLONE result doesn't even have a length.

Most users are probably cloning the whole file, so I guess it only
matters in the case the file's changing (so that the size returned from
stat might not be correct by the time you do the clone).

Allowing short copies would be one way to handle that--I think it'd work
then to just always request a clone to the maximum length.

Alternatively if the linux interface just had a way to say "clone to end
of file", that'd solve most of the problem and be a nice fit for the
existing NFS protocol (which special-cases length 0 to mean "to end of
file").

Maybe we could special-case the maximum size_t to mean that.  (What is
that, 2^63-1?.)

--b.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC v1 01/19] fs: Don't copy beyond the end of the file
  2017-03-09 17:28                                   ` Olga Kornievskaia
  (?)
  (?)
@ 2017-03-09 21:55                                   ` hch
  -1 siblings, 0 replies; 56+ messages in thread
From: hch @ 2017-03-09 21:55 UTC (permalink / raw)
  To: Olga Kornievskaia; +Cc: hch, bfields, Trond Myklebust, linux-nfs, linux-fsdevel

On Thu, Mar 09, 2017 at 12:28:03PM -0500, Olga Kornievskaia wrote:
> I’m confused by the wording of “we can do … easily” . Is “can” = in the future? Currently, testing copy_file_range() on a btfs with argument of offset+len beyond the end of the file fail with EINVAL. Is NFS tweaking = revert the “MUST” in the spec for the check? 

The current clone and copy documents (which are our specs) require
this behavior, so that is expected.  But if we decided we want a variant
of copy / clone that doesn't do this it would be very easy to implement
in the local file systems.  Only for NFS we'd get a failure back and
would have to retry the clone based on the updated file size.

^ permalink raw reply	[flat|nested] 56+ messages in thread

end of thread, other threads:[~2017-03-09 22:52 UTC | newest]

Thread overview: 56+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-03-02 16:01 [RFC v1 00/19] NFS support for inter and async COPY Olga Kornievskaia
2017-03-02 16:01 ` [RFC v1 01/19] fs: Don't copy beyond the end of the file Olga Kornievskaia
2017-03-02 16:22   ` Christoph Hellwig
2017-03-02 16:34     ` Olga Kornievskaia
2017-03-03 20:47     ` J. Bruce Fields
2017-03-03 21:08       ` Olga Kornievskaia
2017-03-03 21:32         ` J. Bruce Fields
     [not found]           ` <B3F80DA0-B4F8-4628-88C5-E5C047620F17@netapp.com>
2017-03-04  2:10             ` J. Bruce Fields
2017-03-06 16:27               ` [RFC v1 01/19] " Olga Kornievskaia
2017-03-06 19:09                 ` J. Bruce Fields
2017-03-06 19:23                 ` J. Bruce Fields
2017-03-06 19:23                   ` J. Bruce Fields
2017-03-07 14:18                   ` Olga Kornievskaia
2017-03-07 14:18                     ` Olga Kornievskaia
2017-03-07 14:18                     ` Olga Kornievskaia
2017-03-07 23:40       ` [RFC v1 01/19] fs: " Christoph Hellwig
2017-03-08 17:05         ` J. Bruce Fields
2017-03-08 17:25           ` Christoph Hellwig
2017-03-08 17:32             ` Olga Kornievskaia
2017-03-08 19:53               ` J. Bruce Fields
2017-03-08 20:00                 ` Olga Kornievskaia
2017-03-08 20:00                   ` Olga Kornievskaia
2017-03-08 20:18                   ` J. Bruce Fields
2017-03-08 20:18                   ` Trond Myklebust
2017-03-08 20:18                     ` Trond Myklebust
2017-03-08 20:32                     ` bfields
2017-03-08 20:49                       ` Trond Myklebust
2017-03-08 20:49                         ` Trond Myklebust
2017-03-09 15:29                         ` bfields
2017-03-09 15:35                           ` hch
2017-03-09 16:16                             ` bfields
2017-03-09 16:17                               ` hch
2017-03-09 17:28                                 ` Olga Kornievskaia
2017-03-09 17:28                                   ` Olga Kornievskaia
2017-03-09 18:40                                   ` bfields
2017-03-09 21:55                                   ` hch
2017-03-09 17:35                               ` Olga Kornievskaia
2017-03-09 17:35                                 ` Olga Kornievskaia
2017-03-02 16:01 ` [RFC v1 02/19] VFS permit cross device vfs_copy_file_range Olga Kornievskaia
2017-03-02 16:01 ` [RFC v1 03/19] VFS don't try clone if superblocks are different Olga Kornievskaia
2017-03-02 16:01 ` [RFC v1 04/19] NFS inter ssc open Olga Kornievskaia
2017-03-02 16:01 ` [RFC v1 05/19] NFS add COPY_NOTIFY operation Olga Kornievskaia
2017-03-02 16:01 ` [RFC v1 06/19] NFS add ca_source_server<> to COPY Olga Kornievskaia
2017-03-02 16:01 ` [RFC v1 07/19] NFS CB_OFFLOAD xdr Olga Kornievskaia
2017-03-02 16:01 ` [RFC v1 08/19] NFS OFFLOAD_STATUS xdr Olga Kornievskaia
2017-03-02 16:01 ` [RFC v1 09/19] NFS OFFLOAD_STATUS op Olga Kornievskaia
2017-03-02 16:01 ` [RFC v1 10/19] NFS OFFLOAD_CANCEL xdr Olga Kornievskaia
2017-03-02 16:01 ` [RFC v1 11/19] NFS COPY xdr handle async reply Olga Kornievskaia
2017-03-02 16:01 ` [RFC v1 12/19] NFS add support for asynchronous COPY Olga Kornievskaia
2017-03-02 16:01 ` [RFC v1 13/19] NFS handle COPY reply CB_OFFLOAD call race Olga Kornievskaia
2017-03-02 16:01 ` [RFC v1 14/19] NFS send OFFLOAD_CANCEL when COPY killed Olga Kornievskaia
2017-03-02 16:01 ` [RFC v1 15/19] NFS make COPY synchronous xdr configurable Olga Kornievskaia
2017-03-02 16:01 ` [RFC v1 16/19] NFS handle COPY ERR_OFFLOAD_NO_REQS Olga Kornievskaia
2017-03-02 16:01 ` [RFC v1 17/19] NFS skip recovery of copy open on dest server Olga Kornievskaia
2017-03-02 16:01 ` [RFC v1 18/19] NFS recover from destination server reboot for copies Olga Kornievskaia
2017-03-02 16:01 ` [RFC v1 19/19] NFS if we got partial copy ignore errors Olga Kornievskaia

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.