* [PATCH 0/7] NFSv4.2: Add support for the COPY operation
@ 2015-08-07 20:38 Anna Schumaker
  2015-08-07 20:38 ` [PATCH 1/7] vfs: add copy_file_range syscall and vfs helper Anna Schumaker
                   ` (8 more replies)
  0 siblings, 9 replies; 12+ messages in thread
From: Anna Schumaker @ 2015-08-07 20:38 UTC
  To: Trond.Myklebust, linux-nfs, bfields; +Cc: Anna.Schumaker

These patches add client and server support for the NFS v4.2 COPY operation.
Unlike the similar CLONE operation, COPY can support both acceleration through
reflink and a full copy of data from one file into another.  These patches
make use of Zach Brown's copy_file_range() syscall, and the first three
patches in this series are simply a reposting of the patches that add the
syscall.

Patch 4 expands vfs_copy_file_range() to fall back on the splice interface for
copies where the filesystem does not support copy accelerations.  This behavior
is useful for NFSD, since we'll still want to copy the file even if we can't
do a reflink.  Additionally, this opens up the possibility of in-kernel copies
for all filesystems without needing to do frequent switches between kernel and
user space.  The only potential drawback I've noticed is that splice will write
out data in PAGE_SIZE chunks, even if wsize > PAGE_SIZE.  This leads to a few
more writes over the wire, but I have not noticed a significant timing
difference.  Still, I wonder if there is a better way to optimize this for NFS.

The remaining patches implement the COPY operation for both the client and the
server.  The program I used for testing is included as an RFC as the last patch
in the series.  I gathered performance information by comparing the runtime and
RPC count of this program against /usr/bin/cp for various file sizes.
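
For readers who want to try the syscall from userspace, here is a minimal
sketch of a caller.  It is not the nfscopy program from the last patch;
copy_whole_file() is a hypothetical helper, and it assumes the installed
kernel headers define __NR_copy_file_range.  It passes NULL offsets so the
kernel uses and advances each file's own position, and it never asks for more
than the source holds, since the VFS helper in patch 1 rejects a range that
extends past i_size.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper for illustration only.
 * Copies all of src_path into dst_path with copy_file_range(2).
 * Returns 0 on success, -1 with errno set on failure.
 */
int copy_whole_file(const char *src_path, const char *dst_path)
{
	struct stat st;
	off_t left;
	int in, out, saved_errno, ret = -1;

	in = open(src_path, O_RDONLY);
	if (in < 0)
		return -1;

	out = open(dst_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (out < 0)
		goto close_in;

	if (fstat(in, &st) < 0)
		goto close_out;

	/* The call may report partial success, so loop until done. */
	for (left = st.st_size; left > 0; ) {
		long n = syscall(__NR_copy_file_range, in, NULL, out, NULL,
				 (size_t)left, 0u);
		if (n < 0)
			goto close_out;
		if (n == 0) {
			/* No progress: fail rather than spin forever. */
			errno = EIO;
			goto close_out;
		}
		assert(n <= left);
		left -= n;
	}
	ret = 0;

close_out:
	saved_errno = errno;
	close(out);
	errno = saved_errno;
close_in:
	saved_errno = errno;
	close(in);
	errno = saved_errno;
	return ret;
}
```

A caller would invoke it as copy_whole_file("/nfs/test-512", "/nfs/test-copy")
and check for a return value of 0.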

/usr/bin/cp:
                      size:    512MB   1024MB   1536MB   2048MB
------------- ------------- -------- -------- -------- --------
nfs v4 client        total:     8203    16396    24588    32780
------------- ------------- -------- -------- -------- --------
nfs v4 client         read:     4096     8192    12288    16384
nfs v4 client        write:     4096     8192    12288    16384
nfs v4 client       commit:        1        1        1        1
nfs v4 client         open:        1        1        1        1
nfs v4 client    open_noat:        2        2        2        2
nfs v4 client        close:        1        1        1        1
nfs v4 client      setattr:        2        2        2        2
nfs v4 client       access:        2        3        3        3
nfs v4 client      getattr:        2        2        2        2

/usr/bin/cp /nfs/test-512  /nfs/test-copy  0.00s user 0.32s system 14% cpu 2.209 total
/usr/bin/cp /nfs/test-1024 /nfs/test-copy  0.00s user 0.66s system 18% cpu 3.651 total
/usr/bin/cp /nfs/test-1536 /nfs/test-copy  0.02s user 0.97s system 18% cpu 5.477 total
/usr/bin/cp /nfs/test-2048 /nfs/test-copy  0.00s user 1.38s system 15% cpu 9.085 total


Copy system call:
                      size:    512MB   1024MB   1536MB   2048MB
------------- ------------- -------- -------- -------- --------
nfs v4 client        total:        6        6        6        6
------------- ------------- -------- -------- -------- --------
nfs v4 client         open:        2        2        2        2
nfs v4 client        close:        2        2        2        2
nfs v4 client       access:        1        1        1        1
nfs v4 client         copy:        1        1        1        1


./nfscopy /nfs/test-512  /nfs/test-copy  0.00s user 0.00s system 0% cpu 1.148 total
./nfscopy /nfs/test-1024 /nfs/test-copy  0.00s user 0.00s system 0% cpu 2.293 total
./nfscopy /nfs/test-1536 /nfs/test-copy  0.00s user 0.00s system 0% cpu 3.037 total
./nfscopy /nfs/test-2048 /nfs/test-copy  0.00s user 0.00s system 0% cpu 4.045 total
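
The numbers above can be reduced to two figures: how much data each READ or
WRITE RPC carried in the cp case, and how much faster the copy syscall ran.
The sketch below does that arithmetic.  The function names are hypothetical,
and it assumes that "MB" in the tables means MiB.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes moved per RPC when size_mib MiB are transferred in rpcs calls. */
uint64_t bytes_per_rpc(uint64_t size_mib, uint64_t rpcs)
{
	assert(rpcs > 0);
	return size_mib * 1024 * 1024 / rpcs;
}

/* Ratio of wall-clock times: how many times faster the second run was. */
double speedup(double cp_seconds, double copy_seconds)
{
	assert(copy_seconds > 0.0);
	return cp_seconds / copy_seconds;
}
```

With the figures from the tables, bytes_per_rpc(512, 4096) gives 131072, so
each READ and WRITE moved 128 KiB.  The speedup() ratios for the four file
sizes come out at roughly 1.9x, 1.6x, 1.8x and 2.2x.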


Questions, comments, and other testing ideas would be greatly appreciated!

Thanks,
Anna


Anna Schumaker (4):
  VFS: Fall back on splice if no copy function defined
  nfsd: Pass filehandle to nfs4_preprocess_stateid_op()
  NFSD: Implement the COPY call
  NFS: Add COPY nfs operation

Zach Brown (3):
  vfs: add copy_file_range syscall and vfs helper
  x86: add sys_copy_file_range to syscall tables
  btrfs: add .copy_file_range file operation

 arch/x86/entry/syscalls/syscall_32.tbl |   1 +
 arch/x86/entry/syscalls/syscall_64.tbl |   1 +
 fs/btrfs/ctree.h                       |   3 +
 fs/btrfs/file.c                        |   1 +
 fs/btrfs/ioctl.c                       |  91 ++++++++++++----------
 fs/nfs/nfs42.h                         |   1 +
 fs/nfs/nfs42proc.c                     |  40 ++++++++++
 fs/nfs/nfs42xdr.c                      | 136 +++++++++++++++++++++++++++++++++
 fs/nfs/nfs4file.c                      |   8 ++
 fs/nfs/nfs4proc.c                      |   1 +
 fs/nfs/nfs4xdr.c                       |   1 +
 fs/nfsd/nfs4proc.c                     |  79 +++++++++++++++++--
 fs/nfsd/nfs4state.c                    |   5 +-
 fs/nfsd/nfs4xdr.c                      |  62 ++++++++++++++-
 fs/nfsd/state.h                        |   4 +-
 fs/nfsd/vfs.c                          |  13 ++++
 fs/nfsd/vfs.h                          |   1 +
 fs/nfsd/xdr4.h                         |  23 ++++++
 fs/read_write.c                        | 133 ++++++++++++++++++++++++++++++++
 include/linux/fs.h                     |   3 +
 include/linux/nfs4.h                   |   1 +
 include/linux/nfs_fs_sb.h              |   1 +
 include/linux/nfs_xdr.h                |  27 +++++++
 include/uapi/asm-generic/unistd.h      |   4 +-
 kernel/sys_ni.c                        |   1 +
 25 files changed, 587 insertions(+), 54 deletions(-)

-- 
2.5.0



* [PATCH 1/7] vfs: add copy_file_range syscall and vfs helper
  2015-08-07 20:38 [PATCH 0/7] NFSv4.2: Add support for the COPY operation Anna Schumaker
@ 2015-08-07 20:38 ` Anna Schumaker
  2015-08-07 20:38 ` [PATCH 2/7] x86: add sys_copy_file_range to syscall tables Anna Schumaker
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 12+ messages in thread
From: Anna Schumaker @ 2015-08-07 20:38 UTC
  To: Trond.Myklebust, linux-nfs, bfields; +Cc: Anna.Schumaker

From: Zach Brown <zab@redhat.com>

Add a copy_file_range() system call for offloading copies between
regular files.

This gives an interface to underlying layers of the storage stack which
can copy without reading and writing all the data.  There are a few
candidates that should support copy offloading in the nearer term:

- btrfs shares extent references with its clone ioctl
- NFS has patches to add a COPY command which copies on the server
- SCSI has a family of XCOPY commands which copy in the device

This system call avoids the complexity of also accelerating the creation
of the destination file by operating on an existing destination file
descriptor, not a path.

Currently the high level vfs entry point limits copy offloading to files
on the same mount and super (and not in the same file).  This can be
relaxed if we get implementations which can copy between file systems
safely.
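
The restrictions described above can be modelled in userspace to see which
error each one produces.  This is only a sketch of the argument checks in
vfs_copy_file_range() as posted in this patch: check_copy_args() and its
boolean parameters are hypothetical stand-ins for the inode and mount
comparisons, and the file mode and rw_verify_area() checks are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of the range checks.  Returns 0 if the copy may
 * proceed (or there is nothing to copy), or a negative errno.
 */
int check_copy_args(uint64_t pos_in, uint64_t pos_out, uint64_t len,
		    uint64_t size_in, unsigned int flags,
		    bool same_mount_and_sb, bool same_inode)
{
	/* No flags are defined yet. */
	if (flags)
		return -EINVAL;

	if (len == 0)
		return 0;

	/* Offsets must not wrap and the input must lie inside the file. */
	if (pos_in + len < pos_in || pos_out + len < pos_out ||
	    pos_in + len > size_in)
		return -EINVAL;

	/* Cross-mount and cross-superblock copies are refused for now. */
	if (!same_mount_and_sb)
		return -EXDEV;

	/* Ranges within a single file are forbidden. */
	if (same_inode)
		return -EINVAL;

	return 0;
}
```

For example, check_copy_args(1, 0, 4096, 4096, 0, true, false) returns -EINVAL
because the source range runs one byte past the end of a 4096-byte file.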

Signed-off-by: Zach Brown <zab@redhat.com>
---
 fs/read_write.c                   | 129 ++++++++++++++++++++++++++++++++++++++
 include/linux/fs.h                |   3 +
 include/uapi/asm-generic/unistd.h |   4 +-
 kernel/sys_ni.c                   |   1 +
 4 files changed, 136 insertions(+), 1 deletion(-)

diff --git a/fs/read_write.c b/fs/read_write.c
index 819ef3f..3804547 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -16,6 +16,7 @@
 #include <linux/pagemap.h>
 #include <linux/splice.h>
 #include <linux/compat.h>
+#include <linux/mount.h>
 #include "internal.h"
 
 #include <asm/uaccess.h>
@@ -1327,3 +1328,131 @@ COMPAT_SYSCALL_DEFINE4(sendfile64, int, out_fd, int, in_fd,
 	return do_sendfile(out_fd, in_fd, NULL, count, 0);
 }
 #endif
+
+/*
+ * copy_file_range() differs from regular file read and write in that it
+ * specifically allows returning partial success.  When it does so is up to
+ * the copy_file_range method.
+ */
+ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
+			    struct file *file_out, loff_t pos_out,
+			    size_t len, int flags)
+{
+	struct inode *inode_in;
+	struct inode *inode_out;
+	ssize_t ret;
+
+	if (flags)
+		return -EINVAL;
+
+	if (len == 0)
+		return 0;
+
+	/* copy_file_range allows full ssize_t len, ignoring MAX_RW_COUNT  */
+	ret = rw_verify_area(READ, file_in, &pos_in, len);
+	if (ret >= 0)
+		ret = rw_verify_area(WRITE, file_out, &pos_out, len);
+	if (ret < 0)
+		return ret;
+
+	if (!(file_in->f_mode & FMODE_READ) ||
+	    !(file_out->f_mode & FMODE_WRITE) ||
+	    (file_out->f_flags & O_APPEND) ||
+	    !file_in->f_op || !file_in->f_op->copy_file_range)
+		return -EINVAL;
+
+	inode_in = file_inode(file_in);
+	inode_out = file_inode(file_out);
+
+	/* make sure offsets don't wrap and the input is inside i_size */
+	if (pos_in + len < pos_in || pos_out + len < pos_out ||
+	    pos_in + len > i_size_read(inode_in))
+		return -EINVAL;
+
+	/* this could be relaxed once a method supports cross-fs copies */
+	if (inode_in->i_sb != inode_out->i_sb ||
+	    file_in->f_path.mnt != file_out->f_path.mnt)
+		return -EXDEV;
+
+	/* forbid ranges in the same file */
+	if (inode_in == inode_out)
+		return -EINVAL;
+
+	ret = mnt_want_write_file(file_out);
+	if (ret)
+		return ret;
+
+	ret = file_in->f_op->copy_file_range(file_in, pos_in, file_out, pos_out,
+					     len, flags);
+	if (ret > 0) {
+		fsnotify_access(file_in);
+		add_rchar(current, ret);
+		fsnotify_modify(file_out);
+		add_wchar(current, ret);
+	}
+	inc_syscr(current);
+	inc_syscw(current);
+
+	mnt_drop_write_file(file_out);
+
+	return ret;
+}
+EXPORT_SYMBOL(vfs_copy_file_range);
+
+SYSCALL_DEFINE6(copy_file_range, int, fd_in, loff_t __user *, off_in,
+		int, fd_out, loff_t __user *, off_out,
+		size_t, len, unsigned int, flags)
+{
+	loff_t pos_in;
+	loff_t pos_out;
+	struct fd f_in;
+	struct fd f_out;
+	ssize_t ret;
+
+	f_in = fdget(fd_in);
+	f_out = fdget(fd_out);
+	if (!f_in.file || !f_out.file) {
+		ret = -EBADF;
+		goto out;
+	}
+
+	ret = -EFAULT;
+	if (off_in) {
+		if (copy_from_user(&pos_in, off_in, sizeof(loff_t)))
+			goto out;
+	} else {
+		pos_in = f_in.file->f_pos;
+	}
+
+	if (off_out) {
+		if (copy_from_user(&pos_out, off_out, sizeof(loff_t)))
+			goto out;
+	} else {
+		pos_out = f_out.file->f_pos;
+	}
+
+	ret = vfs_copy_file_range(f_in.file, pos_in, f_out.file, pos_out, len,
+				  flags);
+	if (ret > 0) {
+		pos_in += ret;
+		pos_out += ret;
+
+		if (off_in) {
+			if (copy_to_user(off_in, &pos_in, sizeof(loff_t)))
+				ret = -EFAULT;
+		} else {
+			f_in.file->f_pos = pos_in;
+		}
+
+		if (off_out) {
+			if (copy_to_user(off_out, &pos_out, sizeof(loff_t)))
+				ret = -EFAULT;
+		} else {
+			f_out.file->f_pos = pos_out;
+		}
+	}
+out:
+	fdput(f_in);
+	fdput(f_out);
+	return ret;
+}
diff --git a/include/linux/fs.h b/include/linux/fs.h
index cc008c3..c97aed8 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1631,6 +1631,7 @@ struct file_operations {
 #ifndef CONFIG_MMU
 	unsigned (*mmap_capabilities)(struct file *);
 #endif
+	ssize_t (*copy_file_range)(struct file *, loff_t, struct file *, loff_t, size_t, int);
 };
 
 struct inode_operations {
@@ -1684,6 +1685,8 @@ extern ssize_t vfs_readv(struct file *, const struct iovec __user *,
 		unsigned long, loff_t *);
 extern ssize_t vfs_writev(struct file *, const struct iovec __user *,
 		unsigned long, loff_t *);
+extern ssize_t vfs_copy_file_range(struct file *, loff_t , struct file *,
+				   loff_t, size_t, int);
 
 struct super_operations {
    	struct inode *(*alloc_inode)(struct super_block *sb);
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index e016bd9..2b60f0c 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -709,9 +709,11 @@ __SYSCALL(__NR_memfd_create, sys_memfd_create)
 __SYSCALL(__NR_bpf, sys_bpf)
 #define __NR_execveat 281
 __SC_COMP(__NR_execveat, sys_execveat, compat_sys_execveat)
+#define __NR_copy_file_range 282
+__SYSCALL(__NR_copy_file_range, sys_copy_file_range)
 
 #undef __NR_syscalls
-#define __NR_syscalls 282
+#define __NR_syscalls 283
 
 /*
  * All syscalls below here should go away really,
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index 7995ef5..4e01cd9 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -173,6 +173,7 @@ cond_syscall(sys_setfsuid);
 cond_syscall(sys_setfsgid);
 cond_syscall(sys_capget);
 cond_syscall(sys_capset);
+cond_syscall(sys_copy_file_range);
 
 /* arch-specific weak syscall entries */
 cond_syscall(sys_pciconfig_read);
-- 
2.5.0



* [PATCH 2/7] x86: add sys_copy_file_range to syscall tables
  2015-08-07 20:38 [PATCH 0/7] NFSv4.2: Add support for the COPY operation Anna Schumaker
  2015-08-07 20:38 ` [PATCH 1/7] vfs: add copy_file_range syscall and vfs helper Anna Schumaker
@ 2015-08-07 20:38 ` Anna Schumaker
  2015-08-07 20:38 ` [PATCH 3/7] btrfs: add .copy_file_range file operation Anna Schumaker
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 12+ messages in thread
From: Anna Schumaker @ 2015-08-07 20:38 UTC
  To: Trond.Myklebust, linux-nfs, bfields; +Cc: Anna.Schumaker

From: Zach Brown <zab@redhat.com>

Add sys_copy_file_range to the x86 syscall tables.

Signed-off-by: Zach Brown <zab@redhat.com>
---
 arch/x86/entry/syscalls/syscall_32.tbl | 1 +
 arch/x86/entry/syscalls/syscall_64.tbl | 1 +
 2 files changed, 2 insertions(+)

diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index ef8187f..2f5e1e0 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -365,3 +365,4 @@
 356	i386	memfd_create		sys_memfd_create
 357	i386	bpf			sys_bpf
 358	i386	execveat		sys_execveat			stub32_execveat
+359	i386	copy_file_range		sys_copy_file_range
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index 9ef32d5..b2101de 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -329,6 +329,7 @@
 320	common	kexec_file_load		sys_kexec_file_load
 321	common	bpf			sys_bpf
 322	64	execveat		stub_execveat
+323	common	copy_file_range		sys_copy_file_range
 
 #
 # x32-specific system call numbers start at 512 to avoid cache impact
-- 
2.5.0



* [PATCH 3/7] btrfs: add .copy_file_range file operation
  2015-08-07 20:38 [PATCH 0/7] NFSv4.2: Add support for the COPY operation Anna Schumaker
  2015-08-07 20:38 ` [PATCH 1/7] vfs: add copy_file_range syscall and vfs helper Anna Schumaker
  2015-08-07 20:38 ` [PATCH 2/7] x86: add sys_copy_file_range to syscall tables Anna Schumaker
@ 2015-08-07 20:38 ` Anna Schumaker
  2015-08-07 20:38 ` [PATCH 4/7] VFS: Fall back on splice if no copy function defined Anna Schumaker
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 12+ messages in thread
From: Anna Schumaker @ 2015-08-07 20:38 UTC
  To: Trond.Myklebust, linux-nfs, bfields; +Cc: Anna.Schumaker

From: Zach Brown <zab@redhat.com>

This rearranges the existing COPY_RANGE ioctl implementation so that the
.copy_file_range file operation can call the core loop that copies file
data extent items.

The extent copying loop is lifted up into its own function.  It retains
the core btrfs error checks that should be shared.

Signed-off-by: Zach Brown <zab@redhat.com>
---
 fs/btrfs/ctree.h |  3 ++
 fs/btrfs/file.c  |  1 +
 fs/btrfs/ioctl.c | 91 ++++++++++++++++++++++++++++++++------------------------
 3 files changed, 56 insertions(+), 39 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index aac314e..e09d4e2 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -4000,6 +4000,9 @@ int btrfs_dirty_pages(struct btrfs_root *root, struct inode *inode,
 		      loff_t pos, size_t write_bytes,
 		      struct extent_state **cached);
 int btrfs_fdatawrite_range(struct inode *inode, loff_t start, loff_t end);
+ssize_t btrfs_copy_file_range(struct file *file_in, loff_t pos_in,
+			      struct file *file_out, loff_t pos_out,
+			      size_t len, int flags);
 
 /* tree-defrag.c */
 int btrfs_defrag_leaves(struct btrfs_trans_handle *trans,
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index b823fac..b05449c 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -2816,6 +2816,7 @@ const struct file_operations btrfs_file_operations = {
 #ifdef CONFIG_COMPAT
 	.compat_ioctl	= btrfs_ioctl,
 #endif
+	.copy_file_range = btrfs_copy_file_range,
 };
 
 void btrfs_auto_defrag_exit(void)
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 0770c91..62ae286 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -3719,17 +3719,16 @@ out:
 	return ret;
 }
 
-static noinline long btrfs_ioctl_clone(struct file *file, unsigned long srcfd,
-				       u64 off, u64 olen, u64 destoff)
+static noinline int btrfs_clone_files(struct file *file, struct file *file_src,
+					u64 off, u64 olen, u64 destoff)
 {
 	struct inode *inode = file_inode(file);
+	struct inode *src = file_inode(file_src);
 	struct btrfs_root *root = BTRFS_I(inode)->root;
-	struct fd src_file;
-	struct inode *src;
 	int ret;
 	u64 len = olen;
 	u64 bs = root->fs_info->sb->s_blocksize;
-	int same_inode = 0;
+	int same_inode = src == inode;
 
 	/*
 	 * TODO:
@@ -3742,49 +3741,20 @@ static noinline long btrfs_ioctl_clone(struct file *file, unsigned long srcfd,
 	 *   be either compressed or non-compressed.
 	 */
 
-	/* the destination must be opened for writing */
-	if (!(file->f_mode & FMODE_WRITE) || (file->f_flags & O_APPEND))
-		return -EINVAL;
-
 	if (btrfs_root_readonly(root))
 		return -EROFS;
 
-	ret = mnt_want_write_file(file);
-	if (ret)
-		return ret;
-
-	src_file = fdget(srcfd);
-	if (!src_file.file) {
-		ret = -EBADF;
-		goto out_drop_write;
-	}
-
-	ret = -EXDEV;
-	if (src_file.file->f_path.mnt != file->f_path.mnt)
-		goto out_fput;
-
-	src = file_inode(src_file.file);
-
-	ret = -EINVAL;
-	if (src == inode)
-		same_inode = 1;
-
-	/* the src must be open for reading */
-	if (!(src_file.file->f_mode & FMODE_READ))
-		goto out_fput;
+	if (file_src->f_path.mnt != file->f_path.mnt ||
+	    src->i_sb != inode->i_sb)
+		return -EXDEV;
 
 	/* don't make the dst file partly checksummed */
 	if ((BTRFS_I(src)->flags & BTRFS_INODE_NODATASUM) !=
 	    (BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM))
-		goto out_fput;
+		return -EINVAL;
 
-	ret = -EISDIR;
 	if (S_ISDIR(src->i_mode) || S_ISDIR(inode->i_mode))
-		goto out_fput;
-
-	ret = -EXDEV;
-	if (src->i_sb != inode->i_sb)
-		goto out_fput;
+		return -EISDIR;
 
 	if (!same_inode) {
 		if (inode < src) {
@@ -3877,6 +3847,49 @@ out_unlock:
 	} else {
 		mutex_unlock(&src->i_mutex);
 	}
+	return ret;
+}
+
+ssize_t btrfs_copy_file_range(struct file *file_in, loff_t pos_in,
+			      struct file *file_out, loff_t pos_out,
+			      size_t len, int flags)
+{
+	ssize_t ret;
+
+	ret = btrfs_clone_files(file_out, file_in, pos_in, len, pos_out);
+	if (ret == 0)
+		ret = len;
+	return ret;
+}
+
+static noinline long btrfs_ioctl_clone(struct file *file, unsigned long srcfd,
+				       u64 off, u64 olen, u64 destoff)
+{
+	struct fd src_file;
+	int ret;
+
+	/* the destination must be opened for writing */
+	if (!(file->f_mode & FMODE_WRITE) || (file->f_flags & O_APPEND))
+		return -EINVAL;
+
+	ret = mnt_want_write_file(file);
+	if (ret)
+		return ret;
+
+	src_file = fdget(srcfd);
+	if (!src_file.file) {
+		ret = -EBADF;
+		goto out_drop_write;
+	}
+
+	/* the src must be open for reading */
+	if (!(src_file.file->f_mode & FMODE_READ)) {
+		ret = -EINVAL;
+		goto out_fput;
+	}
+
+	ret = btrfs_clone_files(file, src_file.file, off, olen, destoff);
+
 out_fput:
 	fdput(src_file);
 out_drop_write:
-- 
2.5.0



* [PATCH 4/7] VFS: Fall back on splice if no copy function defined
  2015-08-07 20:38 [PATCH 0/7] NFSv4.2: Add support for the COPY operation Anna Schumaker
                   ` (2 preceding siblings ...)
  2015-08-07 20:38 ` [PATCH 3/7] btrfs: add .copy_file_range file operation Anna Schumaker
@ 2015-08-07 20:38 ` Anna Schumaker
  2015-08-13 13:03   ` Kinglong Mee
  2015-08-07 20:38 ` [PATCH 5/7] nfsd: Pass filehandle to nfs4_preprocess_stateid_op() Anna Schumaker
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 12+ messages in thread
From: Anna Schumaker @ 2015-08-07 20:38 UTC
  To: Trond.Myklebust, linux-nfs, bfields; +Cc: Anna.Schumaker

The NFS server will need a fallback for filesystems that don't have any
kind of copy acceleration yet.  Let's handle this by having
vfs_copy_file_range() fall back to splice, enabling an in-kernel fallback
for all filesystems.
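
The control flow of the fallback can be sketched outside the kernel.  In the
model below, the names copy_fn, generic_copy(), accel_unsupported() and
accel_broken() are all hypothetical; generic_copy() stands in for the splice
path, and ENOTSUPP is defined locally because the kernel-internal value is
not exported to userspace.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

#ifndef ENOTSUPP
#define ENOTSUPP 524	/* kernel-internal errno, defined here for the model */
#endif

/* Optional accelerated copy method, like f_op->copy_file_range. */
typedef ssize_t (*copy_fn)(const char *src, char *dst, size_t len);

/* Stand-in for the splice path: always able to copy the bytes. */
ssize_t generic_copy(const char *src, char *dst, size_t len)
{
	memcpy(dst, src, len);
	return (ssize_t)len;
}

/* An accelerated method that declines this particular copy. */
ssize_t accel_unsupported(const char *src, char *dst, size_t len)
{
	(void)src; (void)dst; (void)len;
	return -ENOTSUPP;
}

/* An accelerated method that fails with a real error. */
ssize_t accel_broken(const char *src, char *dst, size_t len)
{
	(void)src; (void)dst; (void)len;
	return -EIO;
}

ssize_t copy_with_fallback(copy_fn accel, const char *src, char *dst,
			   size_t len)
{
	ssize_t ret = -ENOTSUPP;

	/* Try the filesystem's method first, if it has one. */
	if (accel)
		ret = accel(src, dst, len);
	/* Fall back only on "not supported"; other errors are returned. */
	if (ret == -ENOTSUPP)
		ret = generic_copy(src, dst, len);
	return ret;
}
```

Note that only -ENOTSUPP triggers the fallback.  An accelerated method that
returns some other error, such as -EIO, has that error passed straight back
to the caller.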

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
---
 fs/read_write.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/fs/read_write.c b/fs/read_write.c
index 3804547..e564a6b 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1358,7 +1358,7 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
 	if (!(file_in->f_mode & FMODE_READ) ||
 	    !(file_out->f_mode & FMODE_WRITE) ||
 	    (file_out->f_flags & O_APPEND) ||
-	    !file_in->f_op || !file_in->f_op->copy_file_range)
+	    !file_in->f_op)
 		return -EINVAL;
 
 	inode_in = file_inode(file_in);
@@ -1382,8 +1382,12 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
 	if (ret)
 		return ret;
 
-	ret = file_in->f_op->copy_file_range(file_in, pos_in, file_out, pos_out,
-					     len, flags);
+	ret = -ENOTSUPP;
+	if (file_in->f_op->copy_file_range)
+		ret = file_in->f_op->copy_file_range(file_in, pos_in, file_out,
+						     pos_out, len, flags);
+	if (ret == -ENOTSUPP)
+		ret = do_splice_direct(file_in, &pos_in, file_out, &pos_out, len, flags);
 	if (ret > 0) {
 		fsnotify_access(file_in);
 		add_rchar(current, ret);
-- 
2.5.0



* [PATCH 5/7] nfsd: Pass filehandle to nfs4_preprocess_stateid_op()
  2015-08-07 20:38 [PATCH 0/7] NFSv4.2: Add support for the COPY operation Anna Schumaker
                   ` (3 preceding siblings ...)
  2015-08-07 20:38 ` [PATCH 4/7] VFS: Fall back on splice if no copy function defined Anna Schumaker
@ 2015-08-07 20:38 ` Anna Schumaker
  2015-08-07 20:38 ` [PATCH 6/7] NFSD: Implement the COPY call Anna Schumaker
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 12+ messages in thread
From: Anna Schumaker @ 2015-08-07 20:38 UTC
  To: Trond.Myklebust, linux-nfs, bfields; +Cc: Anna.Schumaker

This will be needed so COPY can look up the saved_fh in addition to the
current_fh.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
---
 fs/nfsd/nfs4proc.c  | 16 +++++++++-------
 fs/nfsd/nfs4state.c |  5 ++---
 fs/nfsd/state.h     |  4 ++--
 3 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 90cfda7..d34c967 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -776,8 +776,9 @@ nfsd4_read(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 		clear_bit(RQ_SPLICE_OK, &rqstp->rq_flags);
 
 	/* check stateid */
-	status = nfs4_preprocess_stateid_op(rqstp, cstate, &read->rd_stateid,
-			RD_STATE, &read->rd_filp, &read->rd_tmp_file);
+	status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate->current_fh,
+					&read->rd_stateid, RD_STATE,
+					&read->rd_filp, &read->rd_tmp_file);
 	if (status) {
 		dprintk("NFSD: nfsd4_read: couldn't process stateid!\n");
 		goto out;
@@ -923,7 +924,8 @@ nfsd4_setattr(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 
 	if (setattr->sa_iattr.ia_valid & ATTR_SIZE) {
 		status = nfs4_preprocess_stateid_op(rqstp, cstate,
-			&setattr->sa_stateid, WR_STATE, NULL, NULL);
+				&cstate->current_fh, &setattr->sa_stateid,
+				WR_STATE, NULL, NULL);
 		if (status) {
 			dprintk("NFSD: nfsd4_setattr: couldn't process stateid!\n");
 			return status;
@@ -987,8 +989,8 @@ nfsd4_write(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 	if (write->wr_offset >= OFFSET_MAX)
 		return nfserr_inval;
 
-	status = nfs4_preprocess_stateid_op(rqstp, cstate, stateid, WR_STATE,
-			&filp, NULL);
+	status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate->current_fh,
+						stateid, WR_STATE, &filp, NULL);
 	if (status) {
 		dprintk("NFSD: nfsd4_write: couldn't process stateid!\n");
 		return status;
@@ -1018,7 +1020,7 @@ nfsd4_fallocate(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 	__be32 status = nfserr_notsupp;
 	struct file *file;
 
-	status = nfs4_preprocess_stateid_op(rqstp, cstate,
+	status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate->current_fh,
 					    &fallocate->falloc_stateid,
 					    WR_STATE, &file, NULL);
 	if (status != nfs_ok) {
@@ -1057,7 +1059,7 @@ nfsd4_seek(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 	__be32 status;
 	struct file *file;
 
-	status = nfs4_preprocess_stateid_op(rqstp, cstate,
+	status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate->current_fh,
 					    &seek->seek_stateid,
 					    RD_STATE, &file, NULL);
 	if (status) {
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 61dfb33..7b0059d 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -4645,10 +4645,9 @@ nfs4_check_file(struct svc_rqst *rqstp, struct svc_fh *fhp, struct nfs4_stid *s,
  */
 __be32
 nfs4_preprocess_stateid_op(struct svc_rqst *rqstp,
-		struct nfsd4_compound_state *cstate, stateid_t *stateid,
-		int flags, struct file **filpp, bool *tmp_file)
+		struct nfsd4_compound_state *cstate, struct svc_fh *fhp,
+		stateid_t *stateid, int flags, struct file **filpp, bool *tmp_file)
 {
-	struct svc_fh *fhp = &cstate->current_fh;
 	struct inode *ino = d_inode(fhp->fh_dentry);
 	struct net *net = SVC_NET(rqstp);
 	struct nfsd_net *nn = net_generic(net, nfsd_net_id);
diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
index 4874ce5..d3e81ce 100644
--- a/fs/nfsd/state.h
+++ b/fs/nfsd/state.h
@@ -584,8 +584,8 @@ struct nfsd4_compound_state;
 struct nfsd_net;
 
 extern __be32 nfs4_preprocess_stateid_op(struct svc_rqst *rqstp,
-		struct nfsd4_compound_state *cstate, stateid_t *stateid,
-		int flags, struct file **filp, bool *tmp_file);
+		struct nfsd4_compound_state *cstate, struct svc_fh *fhp,
+		stateid_t *stateid, int flags, struct file **filp, bool *tmp_file);
 __be32 nfsd4_lookup_stateid(struct nfsd4_compound_state *cstate,
 		     stateid_t *stateid, unsigned char typemask,
 		     struct nfs4_stid **s, struct nfsd_net *nn);
-- 
2.5.0



* [PATCH 6/7] NFSD: Implement the COPY call
  2015-08-07 20:38 [PATCH 0/7] NFSv4.2: Add support for the COPY operation Anna Schumaker
                   ` (4 preceding siblings ...)
  2015-08-07 20:38 ` [PATCH 5/7] nfsd: Pass filehandle to nfs4_preprocess_stateid_op() Anna Schumaker
@ 2015-08-07 20:38 ` Anna Schumaker
  2015-08-07 20:38 ` [PATCH 7/7] NFS: Add COPY nfs operation Anna Schumaker
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 12+ messages in thread
From: Anna Schumaker @ 2015-08-07 20:38 UTC
  To: Trond.Myklebust, linux-nfs, bfields; +Cc: Anna.Schumaker

From: Anna Schumaker <Anna.Schumaker@netapp.com>

I only implemented the sync version of this call, since it's the
easiest.  I can simply call vfs_copy_file_range() and have the vfs do the
right thing for the filesystem being exported.
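
Because the copy is synchronous, the reply carries no callback stateid and
reports the data as stable.  The sketch below models the wire layout that the
encoder in this patch produces: a zero stateid count, the byte count, the
stable_how value, the write verifier, and the two boolean results.  It is a
userspace illustration only.  The struct, the DEMO_ constants and the put_be
helpers are hypothetical names, with the constants mirroring
NFS4_VERIFIER_SIZE and NFS_FILE_SYNC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_VERIFIER_SIZE 8	/* mirrors NFS4_VERIFIER_SIZE */
#define DEMO_FILE_SYNC     2	/* mirrors NFS_FILE_SYNC */

struct demo_write_res {
	uint64_t bytes_written;
	uint32_t stable_how;
	uint8_t  verifier[DEMO_VERIFIER_SIZE];
};

/* Store a 32-bit value in big-endian (XDR) order. */
static uint8_t *put_be32(uint8_t *p, uint32_t v)
{
	p[0] = v >> 24; p[1] = v >> 16; p[2] = v >> 8; p[3] = v;
	return p + 4;
}

/* Store a 64-bit "hyper" as two big-endian 32-bit words. */
static uint8_t *put_be64(uint8_t *p, uint64_t v)
{
	p = put_be32(p, (uint32_t)(v >> 32));
	return put_be32(p, (uint32_t)v);
}

/* Returns the number of bytes encoded, or 0 if buf is too small. */
size_t encode_copy_reply(uint8_t *buf, size_t buflen,
			 const struct demo_write_res *res,
			 uint32_t consecutive, uint32_t synchronous)
{
	const size_t need = 4 + 8 + 4 + DEMO_VERIFIER_SIZE + 4 + 4;
	uint8_t *p = buf;

	if (buflen < need)
		return 0;

	p = put_be32(p, 0);			/* no callback stateid */
	p = put_be64(p, res->bytes_written);
	p = put_be32(p, res->stable_how);
	memcpy(p, res->verifier, DEMO_VERIFIER_SIZE);
	p += DEMO_VERIFIER_SIZE;
	p = put_be32(p, consecutive);
	p = put_be32(p, synchronous);

	assert((size_t)(p - buf) == need);
	return need;
}
```

The full reply body is 32 bytes.  Unlike this model, which checks the buffer
size once up front, the kernel encoder reserves space in two steps.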

Signed-off-by: Anna Schumaker <bjschuma@netapp.com>
---
 fs/nfsd/nfs4proc.c | 63 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/nfsd/nfs4xdr.c  | 62 +++++++++++++++++++++++++++++++++++++++++++++++++++--
 fs/nfsd/vfs.c      | 13 +++++++++++
 fs/nfsd/vfs.h      |  1 +
 fs/nfsd/xdr4.h     | 23 ++++++++++++++++++++
 5 files changed, 160 insertions(+), 2 deletions(-)

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index d34c967..fbfb509 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -1014,6 +1014,63 @@ nfsd4_write(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 }
 
 static __be32
+nfsd4_verify_copy(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
+		  struct nfsd4_copy *copy, struct file **src, struct file **dst)
+{
+	__be32 status;
+
+	status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate->save_fh,
+						&copy->cp_src_stateid, RD_STATE,
+						src, NULL);
+	if (status) {
+		dprintk("NFSD: nfsd4_copy: couldn't process src stateid!\n");
+		return status;
+	}
+
+	status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate->current_fh,
+						&copy->cp_dst_stateid, WR_STATE,
+						dst, NULL);
+	if (status) {
+		dprintk("NFSD: nfsd4_copy: couldn't process dst stateid!\n");
+		fput(*src);
+	}
+
+	return status;
+}
+
+static __be32
+nfsd4_copy(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
+		struct nfsd4_copy *copy)
+{
+	ssize_t bytes;
+	__be32 status;
+	struct file *src = NULL, *dst = NULL;
+
+	status = nfsd4_verify_copy(rqstp, cstate, copy, &src, &dst);
+	if (status)
+		return status;
+
+	bytes = nfsd_copy_range(src, copy->cp_src_pos,
+				 dst, copy->cp_dst_pos,
+				 copy->cp_count);
+
+	if (bytes < 0)
+		status = nfserrno(bytes);
+	else {
+		copy->cp_res.wr_bytes_written = bytes;
+		copy->cp_res.wr_stable_how = NFS_FILE_SYNC;
+		copy->cp_consecutive = 1;
+		copy->cp_synchronous = 1;
+		gen_boot_verifier(&copy->cp_res.wr_verifier, SVC_NET(rqstp));
+		status = nfs_ok;
+	}
+
+	fput(src);
+	fput(dst);
+	return status;
+}
+
+static __be32
 nfsd4_fallocate(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 		struct nfsd4_fallocate *fallocate, int flags)
 {
@@ -2283,6 +2340,12 @@ static struct nfsd4_operation nfsd4_ops[] = {
 		.op_name = "OP_DEALLOCATE",
 		.op_rsize_bop = (nfsd4op_rsize)nfsd4_only_status_rsize,
 	},
+	[OP_COPY] = {
+		.op_func = (nfsd4op_func)nfsd4_copy,
+		.op_flags = OP_MODIFIES_SOMETHING | OP_CACHEME,
+		.op_name = "OP_COPY",
+		.op_rsize_bop = (nfsd4op_rsize)nfsd4_write_rsize,
+	},
 	[OP_SEEK] = {
 		.op_func = (nfsd4op_func)nfsd4_seek,
 		.op_name = "OP_SEEK",
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 5463385..3a78c7f 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -1675,6 +1675,30 @@ nfsd4_decode_fallocate(struct nfsd4_compoundargs *argp,
 }
 
 static __be32
+nfsd4_decode_copy(struct nfsd4_compoundargs *argp, struct nfsd4_copy *copy)
+{
+	DECODE_HEAD;
+	unsigned int tmp;
+
+	status = nfsd4_decode_stateid(argp, &copy->cp_src_stateid);
+	if (status)
+		return status;
+	status = nfsd4_decode_stateid(argp, &copy->cp_dst_stateid);
+	if (status)
+		return status;
+
+	READ_BUF(8 + 8 + 8 + 4 + 4 + 4);
+	p = xdr_decode_hyper(p, &copy->cp_src_pos);
+	p = xdr_decode_hyper(p, &copy->cp_dst_pos);
+	p = xdr_decode_hyper(p, &copy->cp_count);
+	copy->cp_consecutive = be32_to_cpup(p++);
+	copy->cp_synchronous = be32_to_cpup(p++);
+	tmp = be32_to_cpup(p); /* Source server list not supported */
+
+	DECODE_TAIL;
+}
+
+static __be32
 nfsd4_decode_seek(struct nfsd4_compoundargs *argp, struct nfsd4_seek *seek)
 {
 	DECODE_HEAD;
@@ -1774,7 +1798,7 @@ static nfsd4_dec nfsd4_dec_ops[] = {
 
 	/* new operations for NFSv4.2 */
 	[OP_ALLOCATE]		= (nfsd4_dec)nfsd4_decode_fallocate,
-	[OP_COPY]		= (nfsd4_dec)nfsd4_decode_notsupp,
+	[OP_COPY]		= (nfsd4_dec)nfsd4_decode_copy,
 	[OP_COPY_NOTIFY]	= (nfsd4_dec)nfsd4_decode_notsupp,
 	[OP_DEALLOCATE]		= (nfsd4_dec)nfsd4_decode_fallocate,
 	[OP_IO_ADVISE]		= (nfsd4_dec)nfsd4_decode_notsupp,
@@ -4140,6 +4164,40 @@ nfsd4_encode_layoutreturn(struct nfsd4_compoundres *resp, __be32 nfserr,
 #endif /* CONFIG_NFSD_PNFS */
 
 static __be32
+nfsd42_encode_write_res(struct nfsd4_compoundres *resp, struct nfsd42_write_res *write)
+{
+	__be32 *p;
+
+	p = xdr_reserve_space(&resp->xdr, 4 + 8 + 4 + NFS4_VERIFIER_SIZE);
+	if (!p)
+		return nfserr_resource;
+
+	*p++ = cpu_to_be32(0);
+	p = xdr_encode_hyper(p, write->wr_bytes_written);
+	*p++ = cpu_to_be32(write->wr_stable_how);
+	p = xdr_encode_opaque_fixed(p, write->wr_verifier.data, NFS4_VERIFIER_SIZE);
+	return nfs_ok;
+}
+
+static __be32
+nfsd4_encode_copy(struct nfsd4_compoundres *resp, __be32 nfserr,
+		  struct nfsd4_copy *copy)
+{
+	__be32 *p, err;
+
+	if (!nfserr) {
+		err = nfsd42_encode_write_res(resp, &copy->cp_res);
+		if (err)
+			return err;
+
+		p = xdr_reserve_space(&resp->xdr, 4 + 4);
+		*p++ = cpu_to_be32(copy->cp_consecutive);
+		*p++ = cpu_to_be32(copy->cp_synchronous);
+	}
+	return nfserr;
+}
+
+static __be32
 nfsd4_encode_seek(struct nfsd4_compoundres *resp, __be32 nfserr,
 		  struct nfsd4_seek *seek)
 {
@@ -4238,7 +4296,7 @@ static nfsd4_enc nfsd4_enc_ops[] = {
 
 	/* NFSv4.2 operations */
 	[OP_ALLOCATE]		= (nfsd4_enc)nfsd4_encode_noop,
-	[OP_COPY]		= (nfsd4_enc)nfsd4_encode_noop,
+	[OP_COPY]		= (nfsd4_enc)nfsd4_encode_copy,
 	[OP_COPY_NOTIFY]	= (nfsd4_enc)nfsd4_encode_noop,
 	[OP_DEALLOCATE]		= (nfsd4_enc)nfsd4_encode_noop,
 	[OP_IO_ADVISE]		= (nfsd4_enc)nfsd4_encode_noop,
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index b5e077a..4065f38 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -36,6 +36,7 @@
 #endif /* CONFIG_NFSD_V3 */
 
 #ifdef CONFIG_NFSD_V4
+#include "../internal.h"
 #include "acl.h"
 #include "idmap.h"
 #endif /* CONFIG_NFSD_V4 */
@@ -498,6 +499,18 @@ __be32 nfsd4_set_nfs4_label(struct svc_rqst *rqstp, struct svc_fh *fhp,
 }
 #endif
 
+ssize_t nfsd_copy_range(struct file *src, u64 src_pos,
+		       struct file *dst, u64 dst_pos,
+		       u64 count)
+{
+	ssize_t bytes;
+
+	bytes = vfs_copy_file_range(src, src_pos, dst, dst_pos, count, 0);
+	if (bytes > 0)
+		vfs_fsync_range(dst, dst_pos, dst_pos + bytes, 0);
+	return bytes;
+}
+
 __be32 nfsd4_vfs_fallocate(struct svc_rqst *rqstp, struct svc_fh *fhp,
 			   struct file *file, loff_t offset, loff_t len,
 			   int flags)
diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h
index 5be875e..c529f9e 100644
--- a/fs/nfsd/vfs.h
+++ b/fs/nfsd/vfs.h
@@ -91,6 +91,7 @@ __be32		nfsd_symlink(struct svc_rqst *, struct svc_fh *,
 				struct svc_fh *res);
 __be32		nfsd_link(struct svc_rqst *, struct svc_fh *,
 				char *, int, struct svc_fh *);
+ssize_t		nfsd_copy_range(struct file *, u64, struct file *, u64, u64);
 __be32		nfsd_rename(struct svc_rqst *,
 				struct svc_fh *, char *, int,
 				struct svc_fh *, char *, int);
diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h
index 9f99100..9e83f95 100644
--- a/fs/nfsd/xdr4.h
+++ b/fs/nfsd/xdr4.h
@@ -491,6 +491,28 @@ struct nfsd4_fallocate {
 	u64		falloc_length;
 };
 
+struct nfsd42_write_res {
+	u64			wr_bytes_written;
+	u32			wr_stable_how;
+	nfs4_verifier		wr_verifier;
+};
+
+struct nfsd4_copy {
+	/* request */
+	stateid_t	cp_src_stateid;
+	stateid_t	cp_dst_stateid;
+	u64		cp_src_pos;
+	u64		cp_dst_pos;
+	u64		cp_count;
+
+	/* both */
+	bool		cp_consecutive;
+	bool		cp_synchronous;
+
+	/* response */
+	struct nfsd42_write_res	cp_res;
+};
+
 struct nfsd4_seek {
 	/* request */
 	stateid_t	seek_stateid;
@@ -555,6 +577,7 @@ struct nfsd4_op {
 		/* NFSv4.2 */
 		struct nfsd4_fallocate		allocate;
 		struct nfsd4_fallocate		deallocate;
+		struct nfsd4_copy		copy;
 		struct nfsd4_seek		seek;
 	} u;
 	struct nfs4_replay *			replay;
-- 
2.5.0
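
For readers following the XDR layout: after the two stateids, nfsd4_decode_copy() above reads a fixed 36 bytes (three hypers, then three 32-bit words). The following is only a user-space sketch of that fixed part, not the kernel code. The names copy_fixed_args, get_be32, get_be64 and decode_copy_fixed_args are made up for illustration, and the stateids are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space mirror of the fixed-size COPY arguments. */
struct copy_fixed_args {
	uint64_t src_pos;
	uint64_t dst_pos;
	uint64_t count;
	uint32_t consecutive;
	uint32_t synchronous;
	uint32_t nr_src_servers;	/* the patch reads this word and ignores it */
};

/* Same arithmetic as READ_BUF(8 + 8 + 8 + 4 + 4 + 4) in the patch. */
#define COPY_FIXED_ARGS_LEN (8 + 8 + 8 + 4 + 4 + 4)

/* Read one big-endian 32-bit XDR word and advance the cursor. */
static uint32_t get_be32(const unsigned char **p)
{
	uint32_t v = ((uint32_t)(*p)[0] << 24) | ((uint32_t)(*p)[1] << 16) |
		     ((uint32_t)(*p)[2] << 8) | (uint32_t)(*p)[3];

	*p += 4;
	return v;
}

/* An XDR hyper is two consecutive big-endian words, high word first. */
static uint64_t get_be64(const unsigned char **p)
{
	uint64_t hi = get_be32(p);
	uint64_t lo = get_be32(p);

	return (hi << 32) | lo;
}

/* Returns 0 on success, -1 if the buffer is too short. */
static int decode_copy_fixed_args(const unsigned char *buf, size_t len,
				  struct copy_fixed_args *out)
{
	const unsigned char *p = buf;

	if (len < COPY_FIXED_ARGS_LEN)
		return -1;

	out->src_pos = get_be64(&p);
	out->dst_pos = get_be64(&p);
	out->count = get_be64(&p);
	out->consecutive = get_be32(&p);
	out->synchronous = get_be32(&p);
	out->nr_src_servers = get_be32(&p);

	/* The cursor must have consumed exactly the fixed-size region. */
	assert(p == buf + COPY_FIXED_ARGS_LEN);
	return 0;
}
```

The length check up front plays the role that READ_BUF() plays in the patch: nothing is read until the whole fixed-size region is known to be present.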



* [PATCH 7/7] NFS: Add COPY nfs operation
  2015-08-07 20:38 [PATCH 0/7] NFSv4.2: Add support for the COPY operation Anna Schumaker
                   ` (5 preceding siblings ...)
  2015-08-07 20:38 ` [PATCH 6/7] NFSD: Implement the COPY call Anna Schumaker
@ 2015-08-07 20:38 ` Anna Schumaker
  2015-08-07 20:38 ` [RFC] vfs_copy_range() test program Anna Schumaker
  2015-08-10 21:07 ` [PATCH 0/7] NFSv4.2: Add support for the COPY operation J. Bruce Fields
  8 siblings, 0 replies; 12+ messages in thread
From: Anna Schumaker @ 2015-08-07 20:38 UTC (permalink / raw)
  To: Trond.Myklebust, linux-nfs, bfields; +Cc: Anna.Schumaker

From: Anna Schumaker <Anna.Schumaker@netapp.com>

This adds the copy_file_range file_ops function pointer used by the
sys_copy_file_range() system call.  This patch only implements sync copies,
so if an async copy happens we decode the stateid and ignore it.

Signed-off-by: Anna Schumaker <bjschuma@netapp.com>
---
 fs/nfs/nfs42.h            |   1 +
 fs/nfs/nfs42proc.c        |  40 ++++++++++++++
 fs/nfs/nfs42xdr.c         | 136 ++++++++++++++++++++++++++++++++++++++++++++++
 fs/nfs/nfs4file.c         |   8 +++
 fs/nfs/nfs4proc.c         |   1 +
 fs/nfs/nfs4xdr.c          |   1 +
 include/linux/nfs4.h      |   1 +
 include/linux/nfs_fs_sb.h |   1 +
 include/linux/nfs_xdr.h   |  27 +++++++++
 9 files changed, 216 insertions(+)

diff --git a/fs/nfs/nfs42.h b/fs/nfs/nfs42.h
index ff66ae7..b54b916 100644
--- a/fs/nfs/nfs42.h
+++ b/fs/nfs/nfs42.h
@@ -13,6 +13,7 @@
 
 /* nfs4.2proc.c */
 int nfs42_proc_allocate(struct file *, loff_t, loff_t);
+ssize_t nfs42_proc_copy(struct file *, loff_t, struct file *, loff_t, size_t);
 int nfs42_proc_deallocate(struct file *, loff_t, loff_t);
 loff_t nfs42_proc_llseek(struct file *, loff_t, int);
 int nfs42_proc_layoutstats_generic(struct nfs_server *,
diff --git a/fs/nfs/nfs42proc.c b/fs/nfs/nfs42proc.c
index d731bbf..fa665f9 100644
--- a/fs/nfs/nfs42proc.c
+++ b/fs/nfs/nfs42proc.c
@@ -135,6 +135,46 @@ int nfs42_proc_deallocate(struct file *filep, loff_t offset, loff_t len)
 	return err;
 }
 
+ssize_t nfs42_proc_copy(struct file *src, loff_t pos_src,
+			struct file *dst, loff_t pos_dst,
+			size_t count)
+{
+	struct nfs42_copy_args args = {
+		.src_fh		= NFS_FH(file_inode(src)),
+		.src_pos	= pos_src,
+		.dst_fh		= NFS_FH(file_inode(dst)),
+		.dst_pos	= pos_dst,
+		.count		= count,
+	};
+	struct nfs42_copy_res res;
+	struct rpc_message msg = {
+		.rpc_proc = &nfs4_procedures[NFSPROC4_CLNT_COPY],
+		.rpc_argp = &args,
+		.rpc_resp = &res,
+	};
+	struct nfs_server *server = NFS_SERVER(file_inode(dst));
+	int status;
+
+	if (!(server->caps & NFS_CAP_COPY))
+		return -ENOTSUPP;
+
+	status = nfs42_set_rw_stateid(&args.src_stateid, src, FMODE_READ);
+	if (status)
+		return status;
+
+	status = nfs42_set_rw_stateid(&args.dst_stateid, dst, FMODE_WRITE);
+	if (status)
+		return status;
+
+	status = nfs4_call_sync(server->client, server, &msg,
+				&args.seq_args, &res.seq_res, 0);
+	if (status == -ENOTSUPP)
+		server->caps &= ~NFS_CAP_COPY;
+	if (status)
+		return status;
+	return res.write_res.count;
+}
+
 static loff_t _nfs42_proc_llseek(struct file *filep, loff_t offset, int whence)
 {
 	struct inode *inode = file_inode(filep);
diff --git a/fs/nfs/nfs42xdr.c b/fs/nfs/nfs42xdr.c
index a6bd27d..489bbf3 100644
--- a/fs/nfs/nfs42xdr.c
+++ b/fs/nfs/nfs42xdr.c
@@ -9,9 +9,22 @@
 #define encode_fallocate_maxsz		(encode_stateid_maxsz + \
 					 2 /* offset */ + \
 					 2 /* length */)
+#define NFS42_WRITE_RES_SIZE		(1 /* wr_callback_id size */ +\
+					 XDR_QUADLEN(NFS4_STATEID_SIZE) + \
+					 2 /* wr_count */ + \
+					 1 /* wr_committed */ + \
+					 XDR_QUADLEN(NFS4_VERIFIER_SIZE))
 #define encode_allocate_maxsz		(op_encode_hdr_maxsz + \
 					 encode_fallocate_maxsz)
 #define decode_allocate_maxsz		(op_decode_hdr_maxsz)
+#define encode_copy_maxsz		(op_encode_hdr_maxsz +          \
+					 XDR_QUADLEN(NFS4_STATEID_SIZE) + \
+					 XDR_QUADLEN(NFS4_STATEID_SIZE) + \
+					 2 + 2 + 2 + 1 + 1 + 1)
+#define decode_copy_maxsz		(op_decode_hdr_maxsz + \
+					 NFS42_WRITE_RES_SIZE + \
+					 1 /* cr_consecutive */ + \
+					 1 /* cr_synchronous */)
 #define encode_deallocate_maxsz		(op_encode_hdr_maxsz + \
 					 encode_fallocate_maxsz)
 #define decode_deallocate_maxsz		(op_decode_hdr_maxsz)
@@ -43,6 +56,16 @@
 					 decode_putfh_maxsz + \
 					 decode_allocate_maxsz + \
 					 decode_getattr_maxsz)
+#define NFS4_enc_copy_sz		(compound_encode_hdr_maxsz + \
+					 encode_putfh_maxsz + \
+					 encode_savefh_maxsz + \
+					 encode_putfh_maxsz + \
+					 encode_copy_maxsz)
+#define NFS4_dec_copy_sz		(compound_decode_hdr_maxsz + \
+					 decode_putfh_maxsz + \
+					 decode_savefh_maxsz + \
+					 decode_putfh_maxsz + \
+					 decode_copy_maxsz)
 #define NFS4_enc_deallocate_sz		(compound_encode_hdr_maxsz + \
 					 encode_putfh_maxsz + \
 					 encode_deallocate_maxsz + \
@@ -83,6 +106,23 @@ static void encode_allocate(struct xdr_stream *xdr,
 	encode_fallocate(xdr, args);
 }
 
+static void encode_copy(struct xdr_stream *xdr,
+			struct nfs42_copy_args *args,
+			struct compound_hdr *hdr)
+{
+	encode_op_hdr(xdr, OP_COPY, decode_copy_maxsz, hdr);
+	encode_nfs4_stateid(xdr, &args->src_stateid);
+	encode_nfs4_stateid(xdr, &args->dst_stateid);
+
+	encode_uint64(xdr, args->src_pos);
+	encode_uint64(xdr, args->dst_pos);
+	encode_uint64(xdr, args->count);
+
+	encode_uint32(xdr, 1); /* consecutive = true */
+	encode_uint32(xdr, 1); /* synchronous = true */
+	encode_uint32(xdr, 0); /* src server list */
+}
+
 static void encode_deallocate(struct xdr_stream *xdr,
 			      struct nfs42_falloc_args *args,
 			      struct compound_hdr *hdr)
@@ -148,6 +188,26 @@ static void nfs4_xdr_enc_allocate(struct rpc_rqst *req,
 }
 
 /*
+ * Encode COPY request
+ */
+static void nfs4_xdr_enc_copy(struct rpc_rqst *req,
+			      struct xdr_stream *xdr,
+			      struct nfs42_copy_args *args)
+{
+	struct compound_hdr hdr = {
+		.minorversion = nfs4_xdr_minorversion(&args->seq_args),
+	};
+
+	encode_compound_hdr(xdr, req, &hdr);
+	encode_sequence(xdr, &args->seq_args, &hdr);
+	encode_putfh(xdr, args->src_fh, &hdr);
+	encode_savefh(xdr, &hdr);
+	encode_putfh(xdr, args->dst_fh, &hdr);
+	encode_copy(xdr, args, &hdr);
+	encode_nops(&hdr);
+}
+
+/*
  * Encode DEALLOCATE request
  */
 static void nfs4_xdr_enc_deallocate(struct rpc_rqst *req,
@@ -211,6 +271,52 @@ static int decode_allocate(struct xdr_stream *xdr, struct nfs42_falloc_res *res)
 	return decode_op_hdr(xdr, OP_ALLOCATE);
 }
 
+static int decode_write_response(struct xdr_stream *xdr,
+				 struct nfs42_write_res *res)
+{
+	__be32 *p;
+	int stateids;
+
+	p = xdr_inline_decode(xdr, 4 + 8 + 4);
+	if (unlikely(!p))
+		goto out_overflow;
+
+	stateids = be32_to_cpup(p++);
+	p = xdr_decode_hyper(p, &res->count);
+	res->committed = be32_to_cpup(p);
+	return decode_verifier(xdr, &res->verifier);
+
+out_overflow:
+	print_overflow_msg(__func__, xdr);
+	return -EIO;
+}
+
+static int decode_copy(struct xdr_stream *xdr, struct nfs42_copy_res *res)
+{
+	__be32 *p;
+	int status;
+
+	status = decode_op_hdr(xdr, OP_COPY);
+	if (status)
+		return status;
+
+	status = decode_write_response(xdr, &res->write_res);
+	if (status)
+		return status;
+
+	p = xdr_inline_decode(xdr, 4 + 4);
+	if (unlikely(!p))
+		goto out_overflow;
+
+	res->consecutive = be32_to_cpup(p++);
+	res->synchronous = be32_to_cpup(p++);
+	return 0;
+
+out_overflow:
+	print_overflow_msg(__func__, xdr);
+	return -EIO;
+}
+
 static int decode_deallocate(struct xdr_stream *xdr, struct nfs42_falloc_res *res)
 {
 	return decode_op_hdr(xdr, OP_DEALLOCATE);
@@ -272,6 +378,36 @@ out:
 }
 
 /*
+ * Decode COPY response
+ */
+static int nfs4_xdr_dec_copy(struct rpc_rqst *rqstp,
+			     struct xdr_stream *xdr,
+			     struct nfs42_copy_res *res)
+{
+	struct compound_hdr hdr;
+	int status;
+
+	status = decode_compound_hdr(xdr, &hdr);
+	if (status)
+		goto out;
+	status = decode_sequence(xdr, &res->seq_res, rqstp);
+	if (status)
+		goto out;
+	status = decode_putfh(xdr);
+	if (status)
+		goto out;
+	status = decode_savefh(xdr);
+	if (status)
+		goto out;
+	status = decode_putfh(xdr);
+	if (status)
+		goto out;
+	status = decode_copy(xdr, res);
+out:
+	return status;
+}
+
+/*
  * Decode DEALLOCATE request
  */
 static int nfs4_xdr_dec_deallocate(struct rpc_rqst *rqstp,
diff --git a/fs/nfs/nfs4file.c b/fs/nfs/nfs4file.c
index dcd39d4..cc3353a 100644
--- a/fs/nfs/nfs4file.c
+++ b/fs/nfs/nfs4file.c
@@ -132,6 +132,13 @@ nfs4_file_fsync(struct file *file, loff_t start, loff_t end, int datasync)
 }
 
 #ifdef CONFIG_NFS_V4_2
+static ssize_t nfs4_copy_file_range(struct file *file_in, loff_t pos_in,
+				    struct file *file_out, loff_t pos_out,
+				    size_t count, int flags)
+{
+	return nfs42_proc_copy(file_in, pos_in, file_out, pos_out, count);
+}
+
 static loff_t nfs4_file_llseek(struct file *filep, loff_t offset, int whence)
 {
 	loff_t ret;
@@ -186,6 +193,7 @@ const struct file_operations nfs4_file_operations = {
 	.splice_read	= nfs_file_splice_read,
 	.splice_write	= iter_file_splice_write,
 #ifdef CONFIG_NFS_V4_2
+	.copy_file_range = nfs4_copy_file_range,
 	.fallocate	= nfs42_fallocate,
 #endif /* CONFIG_NFS_V4_2 */
 	.check_flags	= nfs_check_flags,
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 3acb1eb..f0c59eb 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -8648,6 +8648,7 @@ static const struct nfs4_minor_version_ops nfs_v4_2_minor_ops = {
 		| NFS_CAP_STATEID_NFSV41
 		| NFS_CAP_ATOMIC_OPEN_V1
 		| NFS_CAP_ALLOCATE
+		| NFS_CAP_COPY
 		| NFS_CAP_DEALLOCATE
 		| NFS_CAP_SEEK
 		| NFS_CAP_LAYOUTSTATS,
diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
index 558cd65d..8296628 100644
--- a/fs/nfs/nfs4xdr.c
+++ b/fs/nfs/nfs4xdr.c
@@ -7432,6 +7432,7 @@ struct rpc_procinfo	nfs4_procedures[] = {
 	PROC(ALLOCATE,		enc_allocate,		dec_allocate),
 	PROC(DEALLOCATE,	enc_deallocate,		dec_deallocate),
 	PROC(LAYOUTSTATS,	enc_layoutstats,	dec_layoutstats),
+	PROC(COPY,		enc_copy,		dec_copy),
 #endif /* CONFIG_NFS_V4_2 */
 };
 
diff --git a/include/linux/nfs4.h b/include/linux/nfs4.h
index b8e72aa..c975a99 100644
--- a/include/linux/nfs4.h
+++ b/include/linux/nfs4.h
@@ -501,6 +501,7 @@ enum {
 	NFSPROC4_CLNT_ALLOCATE,
 	NFSPROC4_CLNT_DEALLOCATE,
 	NFSPROC4_CLNT_LAYOUTSTATS,
+	NFSPROC4_CLNT_COPY,
 };
 
 /* nfs41 types */
diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
index 20bc8e5..8d37f59 100644
--- a/include/linux/nfs_fs_sb.h
+++ b/include/linux/nfs_fs_sb.h
@@ -238,5 +238,6 @@ struct nfs_server {
 #define NFS_CAP_ALLOCATE	(1U << 20)
 #define NFS_CAP_DEALLOCATE	(1U << 21)
 #define NFS_CAP_LAYOUTSTATS	(1U << 22)
+#define NFS_CAP_COPY		(1U << 23)
 
 #endif
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index 7bbe505..e5f6227 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -1321,6 +1321,33 @@ struct nfs42_falloc_res {
 	const struct nfs_server		*falloc_server;
 };
 
+struct nfs42_copy_args {
+	struct nfs4_sequence_args	seq_args;
+
+	struct nfs_fh			*src_fh;
+	nfs4_stateid			src_stateid;
+	u64				src_pos;
+
+	struct nfs_fh			*dst_fh;
+	nfs4_stateid			dst_stateid;
+	u64				dst_pos;
+
+	u64				count;
+};
+
+struct nfs42_write_res {
+	u64		count;
+	u32		committed;
+	nfs4_verifier	verifier;
+};
+
+struct nfs42_copy_res {
+	struct nfs4_sequence_res	seq_res;
+	struct nfs42_write_res		write_res;
+	bool				consecutive;
+	bool				synchronous;
+};
+
 struct nfs42_seek_args {
 	struct nfs4_sequence_args	seq_args;
 
-- 
2.5.0



* [RFC] vfs_copy_range() test program
  2015-08-07 20:38 [PATCH 0/7] NFSv4.2: Add support for the COPY operation Anna Schumaker
                   ` (6 preceding siblings ...)
  2015-08-07 20:38 ` [PATCH 7/7] NFS: Add COPY nfs operation Anna Schumaker
@ 2015-08-07 20:38 ` Anna Schumaker
  2015-08-10 21:07 ` [PATCH 0/7] NFSv4.2: Add support for the COPY operation J. Bruce Fields
  8 siblings, 0 replies; 12+ messages in thread
From: Anna Schumaker @ 2015-08-07 20:38 UTC (permalink / raw)
  To: Trond.Myklebust, linux-nfs, bfields; +Cc: Anna.Schumaker

This is a simple C program that I used for calling the copy system call.
Usage:  ./nfscopy /nfs/original.txt /nfs/copy.txt
---
 nfscopy.c | 64 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 64 insertions(+)
 create mode 100644 nfscopy.c

diff --git a/nfscopy.c b/nfscopy.c
new file mode 100644
index 0000000..535673e
--- /dev/null
+++ b/nfscopy.c
@@ -0,0 +1,64 @@
+#include <errno.h>
+#include <fcntl.h>
+#include <limits.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/stat.h>
+#include <unistd.h>
+
+#define SYS_COPY_RANGE 323
+
+int do_copy(int f_in, int f_out, loff_t f_size)
+{
+	loff_t offset = 0;
+	ssize_t ret;
+
+	while (offset < f_size) {
+		size_t size = f_size - offset;
+		if ((size + offset) != f_size)
+			size = INT_MAX - 1;
+
+		ret = syscall(SYS_COPY_RANGE, f_in, &offset, f_out, &offset, size, 0);
+		if (ret < 0) {
+			printf("Copy error: %s\n", strerror(errno));
+			return ret;
+		}
+	}
+	return 0;
+}
+
+int main(int argc, char **argv)
+{
+	int f_in, f_out, ret;
+	struct stat fstat;
+
+	if (argc != 3) {
+		printf("Usage: %s f_in f_out\n", argv[0]);
+		exit(1);
+	}
+
+	f_in = open(argv[1], O_RDONLY);
+	if (f_in < 0) {
+		printf("%s: %s\n", argv[1], strerror(errno));
+		exit(1);
+	}
+
+	if (stat(argv[1], &fstat) < 0) {
+		printf("%s: %s\n", argv[1], strerror(errno));
+		exit(1);
+	}
+
+	f_out = open(argv[2], O_WRONLY | O_CREAT | O_SYNC, S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH);
+	if (f_out < 0) {
+		printf("%s: %s\n", argv[2], strerror(errno));
+		exit(1);
+	}
+
+	ret = do_copy(f_in, f_out, fstat.st_size);
+
+	fsync(f_out);
+	close(f_in);
+	close(f_out);
+	return ret;
+}
-- 
2.5.0
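
A possible variant of the do_copy() loop above, offered only as a sketch. It caps each request, advances by whatever the copy step reports, and stops on a zero return instead of retrying at the same offset. So that it runs on kernels without the new syscall, the copy step is passed in as a callback. The copy_step_fn type and the copy_step_rw, copy_all and copy_path helpers are hypothetical names, and copy_step_rw is a plain pread()/pwrite() stand-in rather than the syscall itself.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L	/* pread()/pwrite() under strict -std= modes */
#endif
#include <assert.h>
#include <fcntl.h>
#include <limits.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical callback with the same shape as the copy syscall, minus flags. */
typedef ssize_t (*copy_step_fn)(int fd_in, off_t *off_in,
				int fd_out, off_t *off_out, size_t len);

/* Stand-in step built on pread()/pwrite(); copies at most 64 KiB per call. */
static ssize_t copy_step_rw(int fd_in, off_t *off_in,
			    int fd_out, off_t *off_out, size_t len)
{
	char buf[65536];
	ssize_t nr, nw, done = 0;

	if (len > sizeof(buf))
		len = sizeof(buf);
	nr = pread(fd_in, buf, len, *off_in);
	if (nr <= 0)
		return nr;
	while (done < nr) {
		nw = pwrite(fd_out, buf + done, (size_t)(nr - done),
			    *off_out + done);
		if (nw <= 0)
			return -1;
		done += nw;
	}
	*off_in += done;
	*off_out += done;
	return done;
}

/* Copy up to `size` bytes from offset 0; returns bytes copied or -1. */
static off_t copy_all(int fd_in, int fd_out, off_t size, copy_step_fn step)
{
	off_t off_in = 0, off_out = 0;

	assert(step != NULL);
	while (off_in < size) {
		off_t left = size - off_in;
		size_t len = left > INT_MAX - 1 ? (size_t)(INT_MAX - 1)
						: (size_t)left;
		ssize_t ret = step(fd_in, &off_in, fd_out, &off_out, len);

		if (ret < 0)
			return -1;
		if (ret == 0)	/* source ended early: stop instead of spinning */
			break;
	}
	return off_in;
}

/* Open both paths, size the source with fstat(), and copy it. */
static off_t copy_path(const char *src, const char *dst, copy_step_fn step)
{
	struct stat st;
	off_t copied = -1;
	int fd_in, fd_out;

	fd_in = open(src, O_RDONLY);
	if (fd_in < 0)
		return -1;
	fd_out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, S_IRUSR | S_IWUSR);
	if (fd_out >= 0) {
		if (fstat(fd_in, &st) == 0)
			copied = copy_all(fd_in, fd_out, st.st_size, step);
		close(fd_out);
	}
	close(fd_in);
	return copied;
}
```

On a kernel carrying this series, a thin wrapper around the raw syscall could be passed in place of copy_step_rw. That wrapper is left out here because the syscall number is specific to the patched kernel.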



* Re: [PATCH 0/7] NFSv4.2: Add support for the COPY operation
  2015-08-07 20:38 [PATCH 0/7] NFSv4.2: Add support for the COPY operation Anna Schumaker
                   ` (7 preceding siblings ...)
  2015-08-07 20:38 ` [RFC] vfs_copy_range() test program Anna Schumaker
@ 2015-08-10 21:07 ` J. Bruce Fields
  2015-08-11 17:52   ` Anna Schumaker
  8 siblings, 1 reply; 12+ messages in thread
From: J. Bruce Fields @ 2015-08-10 21:07 UTC (permalink / raw)
  To: Anna Schumaker; +Cc: Trond.Myklebust, linux-nfs

On Fri, Aug 07, 2015 at 04:38:16PM -0400, Anna Schumaker wrote:
> These patches add client and server support for the NFS v4.2 COPY operation.
> Unlike the similar CLONE operation, COPY can support both acceleration through
> reflink and a full copy of data from one file into another.  These patches
> make use of Zach Brown's vfs_copy_file_range() syscall, and the first three
> patches in this series are simply a reposting of the patches that add the
> syscall.
> 
> Patch 4 expands vfs_copy_file_range() to fall back on the splice interface for
> copies where the filesystem does not support copy accelerations.  This behavior
> is useful for NFSD, since we'll still want to copy the file even if we can't
> do a reflink.  Additionally, this opens up the possibility of in-kernel copies
> for all filesystems without needing to do frequent switches between kernel and
> user space.  The only potential drawback I've noticed is that splice will write

Also on the server side it means the copy can potentially take
arbitrarily long, right?  (And tie up a protocol slot and server thread
the whole time?)

> out data in PAGE_SIZE chunks, even if wsize > PAGE_SIZE.  This leads to a few
> more writes over the wire, but I have not noticed a significant timing
> difference.  Still, I wonder if there is a better way to optimize this for NFS.

Ideally, write-behind and readahead should paper over this?

> 
> The remaining patches implement the COPY operation for both the client and the
> server.  The program I used for testing is included as an RFC as the last patch
> in the series.  I gathered performance information by comparing the runtime and
> RPC count of this program against /usr/bin/cp for various file sizes.
> 
> /usr/bin/cp:
>                       size:    512MB   1024MB   1536MB   2048MB
> ------------- ------------- -------- -------- -------- --------
> nfs v4 client        total:     8203    16396    24588    32780
> ------------- ------------- -------- -------- -------- --------
> nfs v4 client         read:     4096     8192    12288    16384
> nfs v4 client        write:     4096     8192    12288    16384
> nfs v4 client       commit:        1        1        1        1
> nfs v4 client         open:        1        1        1        1
> nfs v4 client    open_noat:        2        2        2        2
> nfs v4 client        close:        1        1        1        1
> nfs v4 client      setattr:        2        2        2        2
> nfs v4 client       access:        2        3        3        3
> nfs v4 client      getattr:        2        2        2        2
> 
> /usr/bin/cp /nfs/test-512  /nfs/test-copy  0.00s user 0.32s system 14% cpu 2.209 total
> /usr/bin/cp /nfs/test-1024 /nfs/test-copy  0.00s user 0.66s system 18% cpu 3.651 total
> /usr/bin/cp /nfs/test-1536 /nfs/test-copy  0.02s user 0.97s system 18% cpu 5.477 total
> /usr/bin/cp /nfs/test-2048 /nfs/test-copy  0.00s user 1.38s system 15% cpu 9.085 total
> 
> 
> Copy system call:
>                       size:    512MB   1024MB   1536MB   2048MB
> ------------- ------------- -------- -------- -------- --------
> nfs v4 client        total:        6        6        6        6
> ------------- ------------- -------- -------- -------- --------
> nfs v4 client         open:        2        2        2        2
> nfs v4 client        close:        2        2        2        2
> nfs v4 client       access:        1        1        1        1
> nfs v4 client         copy:        1        1        1        1
> 
> 
> ./nfscopy /nfs/test-512  /nfs/test-copy  0.00s user 0.00s system 0% cpu 1.148 total
> ./nfscopy /nfs/test-1024 /nfs/test-copy  0.00s user 0.00s system 0% cpu 2.293 total
> ./nfscopy /nfs/test-1536 /nfs/test-copy  0.00s user 0.00s system 0% cpu 3.037 total
> ./nfscopy /nfs/test-2048 /nfs/test-copy  0.00s user 0.00s system 0% cpu 4.045 total
> 
> 
> Questions, comments, and other testing ideas would be greatly appreciated!
> 
> Thanks,
> Anna
> 
> 
> Anna Schumaker (4):
>   VFS: Fall back on splice if no copy function defined
>   nfsd: Pass filehandle to nfs4_preprocess_stateid_op()
>   NFSD: Implement the COPY call
>   NFS: Add COPY nfs operation
> 
> Zach Brown (3):
>   vfs: add copy_file_range syscall and vfs helper
>   x86: add sys_copy_file_range to syscall tables
>   btrfs: add .copy_file_range file operation
> 
>  arch/x86/entry/syscalls/syscall_32.tbl |   1 +
>  arch/x86/entry/syscalls/syscall_64.tbl |   1 +
>  fs/btrfs/ctree.h                       |   3 +
>  fs/btrfs/file.c                        |   1 +
>  fs/btrfs/ioctl.c                       |  91 ++++++++++++----------
>  fs/nfs/nfs42.h                         |   1 +
>  fs/nfs/nfs42proc.c                     |  40 ++++++++++
>  fs/nfs/nfs42xdr.c                      | 136 +++++++++++++++++++++++++++++++++
>  fs/nfs/nfs4file.c                      |   8 ++
>  fs/nfs/nfs4proc.c                      |   1 +
>  fs/nfs/nfs4xdr.c                       |   1 +
>  fs/nfsd/nfs4proc.c                     |  79 +++++++++++++++++--
>  fs/nfsd/nfs4state.c                    |   5 +-
>  fs/nfsd/nfs4xdr.c                      |  62 ++++++++++++++-
>  fs/nfsd/state.h                        |   4 +-
>  fs/nfsd/vfs.c                          |  13 ++++
>  fs/nfsd/vfs.h                          |   1 +
>  fs/nfsd/xdr4.h                         |  23 ++++++
>  fs/read_write.c                        | 133 ++++++++++++++++++++++++++++++++
>  include/linux/fs.h                     |   3 +
>  include/linux/nfs4.h                   |   1 +
>  include/linux/nfs_fs_sb.h              |   1 +
>  include/linux/nfs_xdr.h                |  27 +++++++
>  include/uapi/asm-generic/unistd.h      |   4 +-
>  kernel/sys_ni.c                        |   1 +
>  25 files changed, 587 insertions(+), 54 deletions(-)
> 
> -- 
> 2.5.0


* Re: [PATCH 0/7] NFSv4.2: Add support for the COPY operation
  2015-08-10 21:07 ` [PATCH 0/7] NFSv4.2: Add support for the COPY operation J. Bruce Fields
@ 2015-08-11 17:52   ` Anna Schumaker
  0 siblings, 0 replies; 12+ messages in thread
From: Anna Schumaker @ 2015-08-11 17:52 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Trond.Myklebust, linux-nfs

On 08/10/2015 05:07 PM, J. Bruce Fields wrote:
> On Fri, Aug 07, 2015 at 04:38:16PM -0400, Anna Schumaker wrote:
>> These patches add client and server support for the NFS v4.2 COPY operation.
>> Unlike the similar CLONE operation, COPY can support both acceleration through
>> reflink and a full copy of data from one file into another.  These patches
>> make use of Zach Brown's vfs_copy_file_range() syscall, and the first three
>> patches in this series are simply a reposting of the patches that add the
>> syscall.
>>
>> Patch 4 expands vfs_copy_file_range() to fall back on the splice interface for
>> copies where the filesystem does not support copy accelerations.  This behavior
>> is useful for NFSD, since we'll still want to copy the file even if we can't
>> do a reflink.  Additionally, this opens up the possibility of in-kernel copies
>> for all filesystems without needing to do frequent switches between kernel and
>> user space.  The only potential drawback I've noticed is that splice will write
> 
> Also on the server side it means the copy can potentially take
> arbitrarily long, right?  (And tie up a protocol slot and server thread
> the whole time?)

Potentially, but I could put in a cap like we had talked about in the past.  In practice, the VFS limits copies to slightly more than 2G because of the call to rw_verify_area().

> 
>> out data in PAGE_SIZE chunks, even if wsize > PAGE_SIZE.  This leads to a few
>> more writes over the wire, but I have not noticed a significant timing
>> difference.  Still, I wonder if there is a better way to optimize this for NFS.
> 
> Ideally, write-behind and readahead should paper over this?

I think I might have misinterpreted the RPC counts, and it is writing out chunks larger than PAGE_SIZE.  I noticed that there were twice as many writes as reads, but when I looked at wireshark I saw that each write was wsize/2.

Thanks,
Anna




* Re: [PATCH 4/7] VFS: Fall back on splice if no copy function defined
  2015-08-07 20:38 ` [PATCH 4/7] VFS: Fall back on splice if no copy function defined Anna Schumaker
@ 2015-08-13 13:03   ` Kinglong Mee
  0 siblings, 0 replies; 12+ messages in thread
From: Kinglong Mee @ 2015-08-13 13:03 UTC (permalink / raw)
  To: Anna Schumaker, Trond.Myklebust, linux-nfs, J. Bruce Fields, kinglongmee

On 8/8/2015 04:38, Anna Schumaker wrote:
> The NFS server will need a fallback for filesystems that don't have any
> kind of copy acceleration yet.  Let's handle this by having
> vfs_copy_range() fall back to splice, enabling an in-kernel fallback for
> all filesystems.

I'd prefer to do this job in nfsd_copy_range().

What if a user only wants to call the underlying filesystem's
copy_file_range(), and expects an error when it is not supported?  With
this patch, the syscall switches to a different code path and calls
do_splice_direct() instead.
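
To make the suggestion concrete, here is a rough user-space model of keeping the core helper strict and putting the fallback in the NFSD-side wrapper. This is only a sketch of the idea, not a proposed patch. The model_* names are invented, model_generic_copy() is a memcpy() stand-in for the splice path, and MODEL_ENOTSUPP mirrors the kernel-internal ENOTSUPP value by hand.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MODEL_ENOTSUPP 524	/* kernel-internal errno, mirrored here by hand */

struct model_file {
	char *data;
	size_t size;
	/* NULL when the "filesystem" has no accelerated copy method. */
	long (*copy_file_range)(struct model_file *in, struct model_file *out,
				size_t len);
};

/* Strict core helper: only ever uses the filesystem's own method. */
static long model_vfs_copy(struct model_file *in, struct model_file *out,
			   size_t len)
{
	assert(in != NULL && out != NULL);
	if (!in->copy_file_range)
		return -MODEL_ENOTSUPP;
	return in->copy_file_range(in, out, len);
}

/* Generic byte copy, standing in for the splice-based path. */
static long model_generic_copy(struct model_file *in, struct model_file *out,
			       size_t len)
{
	if (len > in->size)
		len = in->size;
	if (len > out->size)
		len = out->size;
	memcpy(out->data, in->data, len);
	return (long)len;
}

/* Server-side wrapper: the fallback policy lives here, not in the core. */
static long model_nfsd_copy(struct model_file *in, struct model_file *out,
			    size_t len)
{
	long ret = model_vfs_copy(in, out, len);

	if (ret == -MODEL_ENOTSUPP)
		ret = model_generic_copy(in, out, len);
	return ret;
}
```

With this split, a direct caller of the core helper still sees the error when no accelerated copy exists, while the server always ends up with a copy.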

thanks,
Kinglong Mee

> 
> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
> ---
>  fs/read_write.c | 10 +++++++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/read_write.c b/fs/read_write.c
> index 3804547..e564a6b 100644
> --- a/fs/read_write.c
> +++ b/fs/read_write.c
> @@ -1358,7 +1358,7 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
>  	if (!(file_in->f_mode & FMODE_READ) ||
>  	    !(file_out->f_mode & FMODE_WRITE) ||
>  	    (file_out->f_flags & O_APPEND) ||
> -	    !file_in->f_op || !file_in->f_op->copy_file_range)
> +	    !file_in->f_op)
>  		return -EINVAL;
>  
>  	inode_in = file_inode(file_in);
> @@ -1382,8 +1382,12 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
>  	if (ret)
>  		return ret;
>  
> -	ret = file_in->f_op->copy_file_range(file_in, pos_in, file_out, pos_out,
> -					     len, flags);
> +	ret = -ENOTSUPP;
> +	if (file_in->f_op->copy_file_range)
> +		ret = file_in->f_op->copy_file_range(file_in, pos_in, file_out,
> +						     pos_out, len, flags);
> +	if (ret == -ENOTSUPP)
> +		ret = do_splice_direct(file_in, &pos_in, file_out, &pos_out, len, flags);
>  	if (ret > 0) {
>  		fsnotify_access(file_in);
>  		add_rchar(current, ret);
> 
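
To make the two positions in this exchange concrete, here is a minimal
user-space sketch of the design question, not kernel code. Every name in it
(struct copy_ops, copy_strict(), copy_with_fallback(), generic_copy()) is
hypothetical and only models the idea; -EOPNOTSUPP stands in for the
kernel-internal -ENOTSUPP used in the patch, and memcpy() stands in for
do_splice_direct().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical stand-in for a filesystem's file_operations table. */
struct copy_ops {
	/* NULL when the filesystem has no accelerated copy. */
	ssize_t (*copy_range)(const char *src, char *dst, size_t len);
};

/* Plain byte copy; plays the role of the do_splice_direct() fallback. */
static ssize_t generic_copy(const char *src, char *dst, size_t len)
{
	memcpy(dst, src, len);
	return (ssize_t)len;
}

/*
 * Strict entry point: report an error when no accelerated copy exists.
 * This models the behaviour the reply asks to keep for the syscall.
 */
static ssize_t copy_strict(const struct copy_ops *ops,
			   const char *src, char *dst, size_t len)
{
	if (!ops || !ops->copy_range)
		return -EOPNOTSUPP;
	return ops->copy_range(src, dst, len);
}

/*
 * Caller-level fallback: try the strict path first, then copy the data
 * generically. This models doing the fallback in the caller (for example
 * an nfsd helper) instead of inside the shared entry point.
 */
static ssize_t copy_with_fallback(const struct copy_ops *ops,
				  const char *src, char *dst, size_t len)
{
	ssize_t ret;

	assert(src && dst);
	ret = copy_strict(ops, src, dst, len);
	if (ret == -EOPNOTSUPP)
		ret = generic_copy(src, dst, len);
	return ret;
}
```

With an ops table whose copy_range pointer is NULL, copy_strict() returns
-EOPNOTSUPP while copy_with_fallback() still copies the bytes. The patch
quoted above instead places the fallback inside the shared entry point, so
every caller gets it, including callers that would rather see the error.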


Thread overview: 12+ messages
2015-08-07 20:38 [PATCH 0/7] NFSv4.2: Add support for the COPY operation Anna Schumaker
2015-08-07 20:38 ` [PATCH 1/7] vfs: add copy_file_range syscall and vfs helper Anna Schumaker
2015-08-07 20:38 ` [PATCH 2/7] x86: add sys_copy_file_range to syscall tables Anna Schumaker
2015-08-07 20:38 ` [PATCH 3/7] btrfs: add .copy_file_range file operation Anna Schumaker
2015-08-07 20:38 ` [PATCH 4/7] VFS: Fall back on splice if no copy function defined Anna Schumaker
2015-08-13 13:03   ` Kinglong Mee
2015-08-07 20:38 ` [PATCH 5/7] nfsd: Pass filehandle to nfs4_preprocess_stateid_op() Anna Schumaker
2015-08-07 20:38 ` [PATCH 6/7] NFSD: Implement the COPY call Anna Schumaker
2015-08-07 20:38 ` [PATCH 7/7] NFS: Add COPY nfs operation Anna Schumaker
2015-08-07 20:38 ` [RFC] vfs_copy_range() test program Anna Schumaker
2015-08-10 21:07 ` [PATCH 0/7] NFSv4.2: Add support for the COPY operation J. Bruce Fields
2015-08-11 17:52   ` Anna Schumaker
