linux-fsdevel.vger.kernel.org archive mirror
* [PATCH v4 0/9] VFS: In-kernel copy system call
@ 2015-09-29 18:05 Anna Schumaker
  2015-09-29 18:05 ` [PATCH v4 1/9] vfs: add copy_file_range syscall and vfs helper Anna Schumaker
                   ` (4 more replies)
  0 siblings, 5 replies; 13+ messages in thread
From: Anna Schumaker @ 2015-09-29 18:05 UTC
  To: linux-nfs, linux-btrfs, linux-fsdevel, linux-api, zab, viro, clm,
	darrick.wong, mtk.manpages, andros, hch

Copy system calls came up during Plumbers a while ago, mostly because several
filesystems (including NFS and XFS) are currently working on copy acceleration
implementations.  We haven't heard from Zach Brown in a while, so I volunteered
to push his patches upstream so individual filesystems don't need to keep
writing their own ioctls.

The question has come up about how vfs_copy_file_range() responds to signals,
and I don't have a good answer.  The pagecache copy option uses splice,
which (as far as I can tell) doesn't get interrupted.  Please let me know if
I'm missing something or completely misunderstanding the question!

This is the fourth version, and only contains minor changes from v3.

Changes in v4:
- Rename COPY_FR_DEDUPE to COPY_FR_DEDUP
- Update man page after mailing list comments about COPY_FR_DEDUP
- Add Reviewed-by tags from Darrick

Questions?  Comments?  Thoughts?

Anna


Anna Schumaker (6):
  vfs: Copy should check len after file open mode
  vfs: Copy shouldn't forbid ranges inside the same file
  vfs: Copy should use file_out rather than file_in
  vfs: Remove copy_file_range mountpoint checks
  vfs: Add vfs_copy_file_range() support for pagecache copies
  btrfs: btrfs_copy_file_range() only supports reflinks

Zach Brown (3):
  vfs: add copy_file_range syscall and vfs helper
  x86: add sys_copy_file_range to syscall tables
  btrfs: add .copy_file_range file operation

 arch/x86/entry/syscalls/syscall_32.tbl |   1 +
 arch/x86/entry/syscalls/syscall_64.tbl |   1 +
 fs/btrfs/ctree.h                       |   3 +
 fs/btrfs/file.c                        |   1 +
 fs/btrfs/ioctl.c                       |  95 +++++++++++++---------
 fs/read_write.c                        | 141 +++++++++++++++++++++++++++++++++
 include/linux/copy.h                   |   6 ++
 include/linux/fs.h                     |   3 +
 include/uapi/asm-generic/unistd.h      |   2 +
 include/uapi/linux/Kbuild              |   1 +
 include/uapi/linux/copy.h              |   8 ++
 kernel/sys_ni.c                        |   1 +
 12 files changed, 224 insertions(+), 39 deletions(-)
 create mode 100644 include/linux/copy.h
 create mode 100644 include/uapi/linux/copy.h

-- 
2.5.3


* [PATCH v4 1/9] vfs: add copy_file_range syscall and vfs helper
  2015-09-29 18:05 [PATCH v4 0/9] VFS: In-kernel copy system call Anna Schumaker
@ 2015-09-29 18:05 ` Anna Schumaker
  2015-09-29 18:05 ` [PATCH v4 4/9] vfs: Copy should check len after file open mode Anna Schumaker
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 13+ messages in thread
From: Anna Schumaker @ 2015-09-29 18:05 UTC
  To: linux-nfs, linux-btrfs, linux-fsdevel, linux-api, zab, viro, clm,
	darrick.wong, mtk.manpages, andros, hch

From: Zach Brown <zab@redhat.com>

Add a copy_file_range() system call for offloading copies between
regular files.

This gives an interface to underlying layers of the storage stack which
can copy without reading and writing all the data.  There are a few
candidates that should support copy offloading in the nearer term:

- btrfs shares extent references with its clone ioctl
- NFS has patches to add a COPY command which copies on the server
- SCSI has a family of XCOPY commands which copy in the device

This system call avoids the complexity of also accelerating the creation
of the destination file by operating on an existing destination file
descriptor, not a path.

Currently the high level vfs entry point limits copy offloading to files
on the same mount and super (and not in the same file).  This can be
relaxed if we get implementations which can copy between file systems
safely.
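
For illustration only (not part of the patch): a minimal userspace sketch
of how a caller might drive the new syscall.  It assumes the installed
kernel headers expose the syscall number as SYS_copy_file_range;
copy_range() and fallback_copy() are hypothetical helper names, and the
read()/write() fallback is an addition of this sketch for kernels or
filesystems where the call fails.  Because the call is allowed to return
partial success, the caller loops until len bytes are copied or the source
runs out of data.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* for syscall() */
#endif
#include <errno.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: copy one chunk with plain read()/write(). */
static ssize_t fallback_copy(int fd_in, int fd_out, size_t len)
{
	char buf[4096];
	ssize_t n = read(fd_in, buf, len < sizeof(buf) ? len : sizeof(buf));
	ssize_t off = 0;

	while (off < n) {
		ssize_t w = write(fd_out, buf + off, (size_t)(n - off));

		if (w < 0)
			return -1;
		off += w;
	}
	return n;	/* bytes copied, 0 at EOF, -1 if read() failed */
}

/*
 * Hypothetical helper: copy up to len bytes from fd_in to fd_out, both
 * starting at their current file positions.  Returns bytes copied or -1.
 */
static ssize_t copy_range(int fd_in, int fd_out, size_t len)
{
	size_t done = 0;

	while (done < len) {
		ssize_t n = -1;

#ifdef SYS_copy_file_range
		/* NULL offsets: the kernel uses and advances each f_pos. */
		n = syscall(SYS_copy_file_range, fd_in, NULL, fd_out, NULL,
			    len - done, 0U);
#else
		errno = ENOSYS;
#endif
		/* Assumption: treat these errors as "not supported here". */
		if (n < 0 && (errno == ENOSYS || errno == EXDEV ||
			      errno == EINVAL || errno == EOPNOTSUPP))
			n = fallback_copy(fd_in, fd_out, len - done);
		if (n < 0)
			return -1;
		if (n == 0)	/* source ran out of data */
			break;
		done += (size_t)n;	/* partial success: keep going */
	}
	return (ssize_t)done;
}
```

Passing NULL for both offset pointers asks the kernel to read and update
each descriptor's f_pos; a caller that passes explicit offsets instead
gets them written back on success and the file positions left untouched.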

Signed-off-by: Zach Brown <zab@redhat.com>
[Anna Schumaker: Change -EINVAL to -EBADF during file verification]
[Anna Schumaker: Change flags parameter from int to unsigned int]
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
---
 fs/read_write.c                   | 129 ++++++++++++++++++++++++++++++++++++++
 include/linux/fs.h                |   3 +
 include/uapi/asm-generic/unistd.h |   2 +
 kernel/sys_ni.c                   |   1 +
 4 files changed, 135 insertions(+)

diff --git a/fs/read_write.c b/fs/read_write.c
index 819ef3f..dd10750 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -16,6 +16,7 @@
 #include <linux/pagemap.h>
 #include <linux/splice.h>
 #include <linux/compat.h>
+#include <linux/mount.h>
 #include "internal.h"
 
 #include <asm/uaccess.h>
@@ -1327,3 +1328,131 @@ COMPAT_SYSCALL_DEFINE4(sendfile64, int, out_fd, int, in_fd,
 	return do_sendfile(out_fd, in_fd, NULL, count, 0);
 }
 #endif
+
+/*
+ * copy_file_range() differs from regular file read and write in that it
+ * specifically allows return partial success.  When it does so is up to
+ * the copy_file_range method.
+ */
+ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
+			    struct file *file_out, loff_t pos_out,
+			    size_t len, unsigned int flags)
+{
+	struct inode *inode_in;
+	struct inode *inode_out;
+	ssize_t ret;
+
+	if (flags)
+		return -EINVAL;
+
+	if (len == 0)
+		return 0;
+
+	/* copy_file_range allows full ssize_t len, ignoring MAX_RW_COUNT  */
+	ret = rw_verify_area(READ, file_in, &pos_in, len);
+	if (ret >= 0)
+		ret = rw_verify_area(WRITE, file_out, &pos_out, len);
+	if (ret < 0)
+		return ret;
+
+	if (!(file_in->f_mode & FMODE_READ) ||
+	    !(file_out->f_mode & FMODE_WRITE) ||
+	    (file_out->f_flags & O_APPEND) ||
+	    !file_in->f_op || !file_in->f_op->copy_file_range)
+		return -EBADF;
+
+	inode_in = file_inode(file_in);
+	inode_out = file_inode(file_out);
+
+	/* make sure offsets don't wrap and the input is inside i_size */
+	if (pos_in + len < pos_in || pos_out + len < pos_out ||
+	    pos_in + len > i_size_read(inode_in))
+		return -EINVAL;
+
+	/* this could be relaxed once a method supports cross-fs copies */
+	if (inode_in->i_sb != inode_out->i_sb ||
+	    file_in->f_path.mnt != file_out->f_path.mnt)
+		return -EXDEV;
+
+	/* forbid ranges in the same file */
+	if (inode_in == inode_out)
+		return -EINVAL;
+
+	ret = mnt_want_write_file(file_out);
+	if (ret)
+		return ret;
+
+	ret = file_in->f_op->copy_file_range(file_in, pos_in, file_out, pos_out,
+					     len, flags);
+	if (ret > 0) {
+		fsnotify_access(file_in);
+		add_rchar(current, ret);
+		fsnotify_modify(file_out);
+		add_wchar(current, ret);
+	}
+	inc_syscr(current);
+	inc_syscw(current);
+
+	mnt_drop_write_file(file_out);
+
+	return ret;
+}
+EXPORT_SYMBOL(vfs_copy_file_range);
+
+SYSCALL_DEFINE6(copy_file_range, int, fd_in, loff_t __user *, off_in,
+		int, fd_out, loff_t __user *, off_out,
+		size_t, len, unsigned int, flags)
+{
+	loff_t pos_in;
+	loff_t pos_out;
+	struct fd f_in;
+	struct fd f_out;
+	ssize_t ret;
+
+	f_in = fdget(fd_in);
+	f_out = fdget(fd_out);
+	if (!f_in.file || !f_out.file) {
+		ret = -EBADF;
+		goto out;
+	}
+
+	ret = -EFAULT;
+	if (off_in) {
+		if (copy_from_user(&pos_in, off_in, sizeof(loff_t)))
+			goto out;
+	} else {
+		pos_in = f_in.file->f_pos;
+	}
+
+	if (off_out) {
+		if (copy_from_user(&pos_out, off_out, sizeof(loff_t)))
+			goto out;
+	} else {
+		pos_out = f_out.file->f_pos;
+	}
+
+	ret = vfs_copy_file_range(f_in.file, pos_in, f_out.file, pos_out, len,
+				  flags);
+	if (ret > 0) {
+		pos_in += ret;
+		pos_out += ret;
+
+		if (off_in) {
+			if (copy_to_user(off_in, &pos_in, sizeof(loff_t)))
+				ret = -EFAULT;
+		} else {
+			f_in.file->f_pos = pos_in;
+		}
+
+		if (off_out) {
+			if (copy_to_user(off_out, &pos_out, sizeof(loff_t)))
+				ret = -EFAULT;
+		} else {
+			f_out.file->f_pos = pos_out;
+		}
+	}
+out:
+	fdput(f_in);
+	fdput(f_out);
+	return ret;
+}
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 72d8a84..6220307 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1642,6 +1642,7 @@ struct file_operations {
 #ifndef CONFIG_MMU
 	unsigned (*mmap_capabilities)(struct file *);
 #endif
+	ssize_t (*copy_file_range)(struct file *, loff_t, struct file *, loff_t, size_t, unsigned int);
 };
 
 struct inode_operations {
@@ -1695,6 +1696,8 @@ extern ssize_t vfs_readv(struct file *, const struct iovec __user *,
 		unsigned long, loff_t *);
 extern ssize_t vfs_writev(struct file *, const struct iovec __user *,
 		unsigned long, loff_t *);
+extern ssize_t vfs_copy_file_range(struct file *, loff_t , struct file *,
+				   loff_t, size_t, unsigned int);
 
 struct super_operations {
    	struct inode *(*alloc_inode)(struct super_block *sb);
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index ee12400..078bd21 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -713,6 +713,8 @@ __SC_COMP(__NR_execveat, sys_execveat, compat_sys_execveat)
 __SYSCALL(__NR_userfaultfd, sys_userfaultfd)
 #define __NR_membarrier 283
 __SYSCALL(__NR_membarrier, sys_membarrier)
+#define __NR_copy_file_range 283
+__SYSCALL(__NR_copy_file_range, sys_copy_file_range)
 
 #undef __NR_syscalls
 #define __NR_syscalls 284
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index a02decf..83c5c82 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -174,6 +174,7 @@ cond_syscall(sys_setfsuid);
 cond_syscall(sys_setfsgid);
 cond_syscall(sys_capget);
 cond_syscall(sys_capset);
+cond_syscall(sys_copy_file_range);
 
 /* arch-specific weak syscall entries */
 cond_syscall(sys_pciconfig_read);
-- 
2.5.3



* [PATCH v4 2/9] x86: add sys_copy_file_range to syscall tables
       [not found] ` <1443549913-8091-1-git-send-email-Anna.Schumaker-ZwjVKphTwtPQT0dZR+AlfA@public.gmane.org>
@ 2015-09-29 18:05   ` Anna Schumaker
  2015-09-29 18:05   ` [PATCH v4 3/9] btrfs: add .copy_file_range file operation Anna Schumaker
                     ` (4 subsequent siblings)
  5 siblings, 0 replies; 13+ messages in thread
From: Anna Schumaker @ 2015-09-29 18:05 UTC
  To: linux-nfs, linux-btrfs, linux-fsdevel, linux-api, zab, viro, clm,
	darrick.wong, mtk.manpages, andros, hch

From: Zach Brown <zab@redhat.com>

Add sys_copy_file_range to the x86 syscall tables.

Signed-off-by: Zach Brown <zab@redhat.com>
[Anna Schumaker: Update syscall number in syscall_32.tbl]
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
---
 arch/x86/entry/syscalls/syscall_32.tbl | 1 +
 arch/x86/entry/syscalls/syscall_64.tbl | 1 +
 2 files changed, 2 insertions(+)

diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index 7663c45..0531270 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -382,3 +382,4 @@
 373	i386	shutdown		sys_shutdown
 374	i386	userfaultfd		sys_userfaultfd
 375	i386	membarrier		sys_membarrier
+376	i386	copy_file_range		sys_copy_file_range
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index 278842f..03a9396 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -331,6 +331,7 @@
 322	64	execveat		stub_execveat
 323	common	userfaultfd		sys_userfaultfd
 324	common	membarrier		sys_membarrier
+325	common	copy_file_range		sys_copy_file_range
 
 #
 # x32-specific system call numbers start at 512 to avoid cache impact
-- 
2.5.3


* [PATCH v4 3/9] btrfs: add .copy_file_range file operation
       [not found] ` <1443549913-8091-1-git-send-email-Anna.Schumaker-ZwjVKphTwtPQT0dZR+AlfA@public.gmane.org>
  2015-09-29 18:05   ` [PATCH v4 2/9] x86: add sys_copy_file_range to syscall tables Anna Schumaker
@ 2015-09-29 18:05   ` Anna Schumaker
       [not found]     ` <1443549913-8091-4-git-send-email-Anna.Schumaker-ZwjVKphTwtPQT0dZR+AlfA@public.gmane.org>
  2015-09-29 18:05   ` [PATCH v4 5/9] vfs: Copy shouldn't forbid ranges inside the same file Anna Schumaker
                     ` (3 subsequent siblings)
  5 siblings, 1 reply; 13+ messages in thread
From: Anna Schumaker @ 2015-09-29 18:05 UTC
  To: linux-nfs, linux-btrfs, linux-fsdevel, linux-api, zab, viro, clm,
	darrick.wong, mtk.manpages, andros, hch

From: Zach Brown <zab@redhat.com>

This rearranges the existing COPY_RANGE ioctl implementation so that the
.copy_file_range file operation can call the core loop that copies file
data extent items.

The extent copying loop is lifted up into its own function.  It retains
the core btrfs error checks that should be shared.

Signed-off-by: Zach Brown <zab@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Reviewed-by: Josef Bacik <jbacik-b10kYP2dOMg@public.gmane.org>
Reviewed-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/ctree.h |  3 ++
 fs/btrfs/file.c  |  1 +
 fs/btrfs/ioctl.c | 91 ++++++++++++++++++++++++++++++++------------------------
 3 files changed, 56 insertions(+), 39 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 938efe3..5d06a4f 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3996,6 +3996,9 @@ int btrfs_dirty_pages(struct btrfs_root *root, struct inode *inode,
 		      loff_t pos, size_t write_bytes,
 		      struct extent_state **cached);
 int btrfs_fdatawrite_range(struct inode *inode, loff_t start, loff_t end);
+ssize_t btrfs_copy_file_range(struct file *file_in, loff_t pos_in,
+			      struct file *file_out, loff_t pos_out,
+			      size_t len, int flags);
 
 /* tree-defrag.c */
 int btrfs_defrag_leaves(struct btrfs_trans_handle *trans,
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index b823fac..b05449c 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -2816,6 +2816,7 @@ const struct file_operations btrfs_file_operations = {
 #ifdef CONFIG_COMPAT
 	.compat_ioctl	= btrfs_ioctl,
 #endif
+	.copy_file_range = btrfs_copy_file_range,
 };
 
 void btrfs_auto_defrag_exit(void)
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 0adf542..4311554 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -3727,17 +3727,16 @@ out:
 	return ret;
 }
 
-static noinline long btrfs_ioctl_clone(struct file *file, unsigned long srcfd,
-				       u64 off, u64 olen, u64 destoff)
+static noinline int btrfs_clone_files(struct file *file, struct file *file_src,
+					u64 off, u64 olen, u64 destoff)
 {
 	struct inode *inode = file_inode(file);
+	struct inode *src = file_inode(file_src);
 	struct btrfs_root *root = BTRFS_I(inode)->root;
-	struct fd src_file;
-	struct inode *src;
 	int ret;
 	u64 len = olen;
 	u64 bs = root->fs_info->sb->s_blocksize;
-	int same_inode = 0;
+	int same_inode = src == inode;
 
 	/*
 	 * TODO:
@@ -3750,49 +3749,20 @@ static noinline long btrfs_ioctl_clone(struct file *file, unsigned long srcfd,
 	 *   be either compressed or non-compressed.
 	 */
 
-	/* the destination must be opened for writing */
-	if (!(file->f_mode & FMODE_WRITE) || (file->f_flags & O_APPEND))
-		return -EINVAL;
-
 	if (btrfs_root_readonly(root))
 		return -EROFS;
 
-	ret = mnt_want_write_file(file);
-	if (ret)
-		return ret;
-
-	src_file = fdget(srcfd);
-	if (!src_file.file) {
-		ret = -EBADF;
-		goto out_drop_write;
-	}
-
-	ret = -EXDEV;
-	if (src_file.file->f_path.mnt != file->f_path.mnt)
-		goto out_fput;
-
-	src = file_inode(src_file.file);
-
-	ret = -EINVAL;
-	if (src == inode)
-		same_inode = 1;
-
-	/* the src must be open for reading */
-	if (!(src_file.file->f_mode & FMODE_READ))
-		goto out_fput;
+	if (file_src->f_path.mnt != file->f_path.mnt ||
+	    src->i_sb != inode->i_sb)
+		return -EXDEV;
 
 	/* don't make the dst file partly checksummed */
 	if ((BTRFS_I(src)->flags & BTRFS_INODE_NODATASUM) !=
 	    (BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM))
-		goto out_fput;
+		return -EINVAL;
 
-	ret = -EISDIR;
 	if (S_ISDIR(src->i_mode) || S_ISDIR(inode->i_mode))
-		goto out_fput;
-
-	ret = -EXDEV;
-	if (src->i_sb != inode->i_sb)
-		goto out_fput;
+		return -EISDIR;
 
 	if (!same_inode) {
 		btrfs_double_inode_lock(src, inode);
@@ -3869,6 +3839,49 @@ out_unlock:
 		btrfs_double_inode_unlock(src, inode);
 	else
 		mutex_unlock(&src->i_mutex);
+	return ret;
+}
+
+ssize_t btrfs_copy_file_range(struct file *file_in, loff_t pos_in,
+			      struct file *file_out, loff_t pos_out,
+			      size_t len, int flags)
+{
+	ssize_t ret;
+
+	ret = btrfs_clone_files(file_out, file_in, pos_in, len, pos_out);
+	if (ret == 0)
+		ret = len;
+	return ret;
+}
+
+static noinline long btrfs_ioctl_clone(struct file *file, unsigned long srcfd,
+				       u64 off, u64 olen, u64 destoff)
+{
+	struct fd src_file;
+	int ret;
+
+	/* the destination must be opened for writing */
+	if (!(file->f_mode & FMODE_WRITE) || (file->f_flags & O_APPEND))
+		return -EINVAL;
+
+	ret = mnt_want_write_file(file);
+	if (ret)
+		return ret;
+
+	src_file = fdget(srcfd);
+	if (!src_file.file) {
+		ret = -EBADF;
+		goto out_drop_write;
+	}
+
+	/* the src must be open for reading */
+	if (!(src_file.file->f_mode & FMODE_READ)) {
+		ret = -EINVAL;
+		goto out_fput;
+	}
+
+	ret = btrfs_clone_files(file, src_file.file, off, olen, destoff);
+
 out_fput:
 	fdput(src_file);
 out_drop_write:
-- 
2.5.3


* [PATCH v4 4/9] vfs: Copy should check len after file open mode
  2015-09-29 18:05 [PATCH v4 0/9] VFS: In-kernel copy system call Anna Schumaker
  2015-09-29 18:05 ` [PATCH v4 1/9] vfs: add copy_file_range syscall and vfs helper Anna Schumaker
@ 2015-09-29 18:05 ` Anna Schumaker
       [not found] ` <1443549913-8091-1-git-send-email-Anna.Schumaker-ZwjVKphTwtPQT0dZR+AlfA@public.gmane.org>
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 13+ messages in thread
From: Anna Schumaker @ 2015-09-29 18:05 UTC
  To: linux-nfs, linux-btrfs, linux-fsdevel, linux-api, zab, viro, clm,
	darrick.wong, mtk.manpages, andros, hch

I don't think it makes sense to report that a copy succeeded if the
files aren't open properly.
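
For illustration only (not part of the patch): a small userspace model of
the ordering this change aims for.  struct chk_file and model_check() are
hypothetical stand-ins, and the model leaves out every check other than
the open-mode test and the zero-length test.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the struct file bits the check looks at. */
struct chk_file {
	bool readable;	/* models FMODE_READ  */
	bool writable;	/* models FMODE_WRITE */
	bool append;	/* models O_APPEND    */
};

/*
 * Returns a negative errno for an invalid request, 0 for a valid
 * zero-length request (nothing to do), or 1 when a copy should proceed.
 */
static long model_check(const struct chk_file *in,
			const struct chk_file *out, size_t len)
{
	/* Open-mode validation comes first ... */
	if (!in->readable || !out->writable || out->append)
		return -EBADF;

	/* ... so a zero-length copy only "succeeds" on usable files. */
	if (len == 0)
		return 0;

	return 1;
}
```

With the original ordering, a zero-length request against a source that is
not open for reading would have returned 0; in this model it returns
-EBADF.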

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Reviewed-by: David Sterba <dsterba@suse.com>
---
 fs/read_write.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/read_write.c b/fs/read_write.c
index dd10750..f3d6c48 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1345,9 +1345,6 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
 	if (flags)
 		return -EINVAL;
 
-	if (len == 0)
-		return 0;
-
 	/* copy_file_range allows full ssize_t len, ignoring MAX_RW_COUNT  */
 	ret = rw_verify_area(READ, file_in, &pos_in, len);
 	if (ret >= 0)
@@ -1378,6 +1375,9 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
 	if (inode_in == inode_out)
 		return -EINVAL;
 
+	if (len == 0)
+		return 0;
+
 	ret = mnt_want_write_file(file_out);
 	if (ret)
 		return ret;
-- 
2.5.3



* [PATCH v4 5/9] vfs: Copy shouldn't forbid ranges inside the same file
       [not found] ` <1443549913-8091-1-git-send-email-Anna.Schumaker-ZwjVKphTwtPQT0dZR+AlfA@public.gmane.org>
  2015-09-29 18:05   ` [PATCH v4 2/9] x86: add sys_copy_file_range to syscall tables Anna Schumaker
  2015-09-29 18:05   ` [PATCH v4 3/9] btrfs: add .copy_file_range file operation Anna Schumaker
@ 2015-09-29 18:05   ` Anna Schumaker
  2015-09-29 18:05   ` [PATCH v4 6/9] vfs: Copy should use file_out rather than file_in Anna Schumaker
                     ` (2 subsequent siblings)
  5 siblings, 0 replies; 13+ messages in thread
From: Anna Schumaker @ 2015-09-29 18:05 UTC
  To: linux-nfs, linux-btrfs, linux-fsdevel, linux-api, zab, viro, clm,
	darrick.wong, mtk.manpages, andros, hch

Copying between ranges inside the same file is perfectly valid for BTRFS
and XFS, so let's leave this check up to the filesystems.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Reviewed-by: Darrick J. Wong <darrick.wong-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
---
 fs/read_write.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/fs/read_write.c b/fs/read_write.c
index f3d6c48..8e7cb33 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1371,10 +1371,6 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
 	    file_in->f_path.mnt != file_out->f_path.mnt)
 		return -EXDEV;
 
-	/* forbid ranges in the same file */
-	if (inode_in == inode_out)
-		return -EINVAL;
-
 	if (len == 0)
 		return 0;
 
-- 
2.5.3

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* [PATCH v4 6/9] vfs: Copy should use file_out rather than file_in
       [not found] ` <1443549913-8091-1-git-send-email-Anna.Schumaker-ZwjVKphTwtPQT0dZR+AlfA@public.gmane.org>
                     ` (2 preceding siblings ...)
  2015-09-29 18:05   ` [PATCH v4 5/9] vfs: Copy shouldn't forbid ranges inside the same file Anna Schumaker
@ 2015-09-29 18:05   ` Anna Schumaker
  2015-09-29 18:05   ` [PATCH v4 7/9] vfs: Remove copy_file_range mountpoint checks Anna Schumaker
  2015-09-29 18:05   ` [PATCH v4 8/9] vfs: Add vfs_copy_file_range() support for pagecache copies Anna Schumaker
  5 siblings, 0 replies; 13+ messages in thread
From: Anna Schumaker @ 2015-09-29 18:05 UTC
  To: linux-nfs, linux-btrfs, linux-fsdevel, linux-api, zab, viro, clm,
	darrick.wong, mtk.manpages, andros, hch

The way to think about this is that the destination filesystem reads the
data from the source file and processes it accordingly.  This is
especially important to avoid an infinite loop when doing a "server to
server" copy on NFS.
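
For illustration only (not part of the patch): a userspace model of
dispatching through the destination's method table.  struct disp_ops,
struct disp_file, the fs_a/fs_b handlers and model_dispatch() are all
hypothetical names; the point is only that the handler attached to the
output file is the one that runs.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

struct disp_file;

/* Hypothetical stand-in for the one file_operations member used here. */
struct disp_ops {
	ssize_t (*copy_file_range)(struct disp_file *in,
				   struct disp_file *out, size_t len);
};

struct disp_file {
	const struct disp_ops *f_op;
};

/* Records which model filesystem handled the last copy. */
static const char *last_handler = "none";

static ssize_t fs_a_copy(struct disp_file *in, struct disp_file *out,
			 size_t len)
{
	(void)in;
	(void)out;
	last_handler = "fs_a";
	return (ssize_t)len;
}

static ssize_t fs_b_copy(struct disp_file *in, struct disp_file *out,
			 size_t len)
{
	(void)in;
	(void)out;
	last_handler = "fs_b";
	return (ssize_t)len;
}

static const struct disp_ops fs_a_ops = { .copy_file_range = fs_a_copy };
static const struct disp_ops fs_b_ops = { .copy_file_range = fs_b_copy };
static const struct disp_ops no_copy_ops = { .copy_file_range = NULL };

/* The destination's method is checked and called, never the source's. */
static ssize_t model_dispatch(struct disp_file *in, struct disp_file *out,
			      size_t len)
{
	if (!out->f_op || !out->f_op->copy_file_range)
		return -EBADF;
	return out->f_op->copy_file_range(in, out, len);
}
```

In this model a copy from a file on fs_a into a file on fs_b is handled by
fs_b_copy(), which is then free to read from the source however it sees
fit.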

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
---
 fs/read_write.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/read_write.c b/fs/read_write.c
index 8e7cb33..6f74f1f 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1355,7 +1355,7 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
 	if (!(file_in->f_mode & FMODE_READ) ||
 	    !(file_out->f_mode & FMODE_WRITE) ||
 	    (file_out->f_flags & O_APPEND) ||
-	    !file_in->f_op || !file_in->f_op->copy_file_range)
+	    !file_out->f_op || !file_out->f_op->copy_file_range)
 		return -EBADF;
 
 	inode_in = file_inode(file_in);
@@ -1378,8 +1378,8 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
 	if (ret)
 		return ret;
 
-	ret = file_in->f_op->copy_file_range(file_in, pos_in, file_out, pos_out,
-					     len, flags);
+	ret = file_out->f_op->copy_file_range(file_in, pos_in, file_out, pos_out,
+					      len, flags);
 	if (ret > 0) {
 		fsnotify_access(file_in);
 		add_rchar(current, ret);
-- 
2.5.3


* [PATCH v4 7/9] vfs: Remove copy_file_range mountpoint checks
       [not found] ` <1443549913-8091-1-git-send-email-Anna.Schumaker-ZwjVKphTwtPQT0dZR+AlfA@public.gmane.org>
                     ` (3 preceding siblings ...)
  2015-09-29 18:05   ` [PATCH v4 6/9] vfs: Copy should use file_out rather than file_in Anna Schumaker
@ 2015-09-29 18:05   ` Anna Schumaker
  2015-09-29 18:05   ` [PATCH v4 8/9] vfs: Add vfs_copy_file_range() support for pagecache copies Anna Schumaker
  5 siblings, 0 replies; 13+ messages in thread
From: Anna Schumaker @ 2015-09-29 18:05 UTC
  To: linux-nfs, linux-btrfs, linux-fsdevel, linux-api, zab, viro, clm,
	darrick.wong, mtk.manpages, andros, hch

I still want to do an in-kernel copy even if the files are on different
mountpoints, and NFS has a "server to server" copy that expects two
files on different mountpoints.  Let's have individual filesystems
implement this check instead.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Reviewed-by: David Sterba <dsterba@suse.com>
---
 fs/read_write.c | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/fs/read_write.c b/fs/read_write.c
index 6f74f1f..ee9fa37 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1366,11 +1366,6 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
 	    pos_in + len > i_size_read(inode_in))
 		return -EINVAL;
 
-	/* this could be relaxed once a method supports cross-fs copies */
-	if (inode_in->i_sb != inode_out->i_sb ||
-	    file_in->f_path.mnt != file_out->f_path.mnt)
-		return -EXDEV;
-
 	if (len == 0)
 		return 0;
 
-- 
2.5.3


* [PATCH v4 8/9] vfs: Add vfs_copy_file_range() support for pagecache copies
       [not found] ` <1443549913-8091-1-git-send-email-Anna.Schumaker-ZwjVKphTwtPQT0dZR+AlfA@public.gmane.org>
                     ` (4 preceding siblings ...)
  2015-09-29 18:05   ` [PATCH v4 7/9] vfs: Remove copy_file_range mountpoint checks Anna Schumaker
@ 2015-09-29 18:05   ` Anna Schumaker
  5 siblings, 0 replies; 13+ messages in thread
From: Anna Schumaker @ 2015-09-29 18:05 UTC
  To: linux-nfs, linux-btrfs, linux-fsdevel, linux-api, zab, viro, clm,
	darrick.wong, mtk.manpages, andros, hch

This allows us to have an in-kernel copy mechanism that avoids frequent
switches between kernel and user space.  This is especially useful so
NFSD can support server-side copies.

I make pagecache copies configurable by adding three new (exclusive)
flags:
- COPY_FR_REFLINK tells vfs_copy_file_range() to only create a reflink.
- COPY_FR_COPY does a full data copy, but may be filesystem accelerated.
- COPY_FR_DEDUP creates a reflink, but only if the contents of both
  ranges are identical.

The default (flags=0) means to first attempt a reflink, but use the pagecache
if that fails.

I moved the rw_verify_area() calls into the fallback code since some
filesystems can handle reflinking a large range.
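
For illustration only (not part of the patch): a userspace model of the
flag handling and fallback described above.  The COPY_FR_* values are
copied from the new uapi header in this patch; model_resolve_flags() and
model_copy_with_fallback() are hypothetical names, and the filesystem and
pagecache results are passed in as plain numbers rather than computed.

```c
#include <errno.h>

#define COPY_FR_COPY	(1 << 0)
#define COPY_FR_REFLINK	(1 << 1)
#define COPY_FR_DEDUP	(1 << 2)

/* Returns the effective flags, or -EINVAL if two flags are combined. */
static int model_resolve_flags(unsigned int flags)
{
	if ((flags & COPY_FR_COPY) && (flags & ~COPY_FR_COPY))
		return -EINVAL;
	if ((flags & COPY_FR_REFLINK) && (flags & ~COPY_FR_REFLINK))
		return -EINVAL;
	if ((flags & COPY_FR_DEDUP) && (flags & ~COPY_FR_DEDUP))
		return -EINVAL;

	/* Default: try the filesystem first, then the pagecache. */
	if (flags == 0)
		flags = COPY_FR_COPY | COPY_FR_REFLINK;
	return (int)flags;
}

/*
 * fs_ret models what the filesystem's copy_file_range method returned
 * (pass -EOPNOTSUPP when there is no usable method); pagecache_ret models
 * what the splice-based fallback would return.
 */
static long model_copy_with_fallback(unsigned int flags, long fs_ret,
				     long pagecache_ret)
{
	int eff = model_resolve_flags(flags);
	long ret = fs_ret;

	if (eff < 0)
		return eff;
	/* Only COPY_FR_COPY (or the default) may fall back. */
	if (ret < 0 && (eff & COPY_FR_COPY))
		ret = pagecache_ret;
	return ret;
}
```

Under this model, flags=0 on a filesystem without a copy method ends up in
the pagecache path, while COPY_FR_REFLINK alone reports the filesystem's
error unchanged.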

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Reviewed-by: Darrick J. Wong <darrick.wong-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
---
v4:
- Reword commit message
- Rename COPY_FR_DEDUPE -> COPY_FR_DEDUP
---
 fs/read_write.c           | 61 +++++++++++++++++++++++++++++++----------------
 include/linux/copy.h      |  6 +++++
 include/uapi/linux/Kbuild |  1 +
 include/uapi/linux/copy.h |  8 +++++++
 4 files changed, 56 insertions(+), 20 deletions(-)
 create mode 100644 include/linux/copy.h
 create mode 100644 include/uapi/linux/copy.h

diff --git a/fs/read_write.c b/fs/read_write.c
index ee9fa37..4fb9b8e 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -7,6 +7,7 @@
 #include <linux/slab.h> 
 #include <linux/stat.h>
 #include <linux/fcntl.h>
+#include <linux/copy.h>
 #include <linux/file.h>
 #include <linux/uio.h>
 #include <linux/fsnotify.h>
@@ -1329,6 +1330,29 @@ COMPAT_SYSCALL_DEFINE4(sendfile64, int, out_fd, int, in_fd,
 }
 #endif
 
+static ssize_t vfs_copy_file_pagecache(struct file *file_in, loff_t pos_in,
+				       struct file *file_out, loff_t pos_out,
+				       size_t len)
+{
+	ssize_t ret;
+
+	ret = rw_verify_area(READ, file_in, &pos_in, len);
+	if (ret >= 0) {
+		len = ret;
+		ret = rw_verify_area(WRITE, file_out, &pos_out, len);
+		if (ret >= 0)
+			len = ret;
+	}
+	if (ret < 0)
+		return ret;
+
+	file_start_write(file_out);
+	ret = do_splice_direct(file_in, &pos_in, file_out, &pos_out, len, 0);
+	file_end_write(file_out);
+
+	return ret;
+}
+
 /*
  * copy_file_range() differs from regular file read and write in that it
  * specifically allows return partial success.  When it does so is up to
@@ -1338,34 +1362,26 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
 			    struct file *file_out, loff_t pos_out,
 			    size_t len, unsigned int flags)
 {
-	struct inode *inode_in;
-	struct inode *inode_out;
 	ssize_t ret;
 
-	if (flags)
+	/* Flags should only be used exclusively. */
+	if ((flags & COPY_FR_COPY) && (flags & ~COPY_FR_COPY))
+		return -EINVAL;
+	if ((flags & COPY_FR_REFLINK) && (flags & ~COPY_FR_REFLINK))
+		return -EINVAL;
+	if ((flags & COPY_FR_DEDUP) && (flags & ~COPY_FR_DEDUP))
 		return -EINVAL;
 
-	/* copy_file_range allows full ssize_t len, ignoring MAX_RW_COUNT  */
-	ret = rw_verify_area(READ, file_in, &pos_in, len);
-	if (ret >= 0)
-		ret = rw_verify_area(WRITE, file_out, &pos_out, len);
-	if (ret < 0)
-		return ret;
+	/* Default behavior is to try both. */
+	if (flags == 0)
+		flags = COPY_FR_COPY | COPY_FR_REFLINK;
 
 	if (!(file_in->f_mode & FMODE_READ) ||
 	    !(file_out->f_mode & FMODE_WRITE) ||
 	    (file_out->f_flags & O_APPEND) ||
-	    !file_out->f_op || !file_out->f_op->copy_file_range)
+	    !file_out->f_op)
 		return -EBADF;
 
-	inode_in = file_inode(file_in);
-	inode_out = file_inode(file_out);
-
-	/* make sure offsets don't wrap and the input is inside i_size */
-	if (pos_in + len < pos_in || pos_out + len < pos_out ||
-	    pos_in + len > i_size_read(inode_in))
-		return -EINVAL;
-
 	if (len == 0)
 		return 0;
 
@@ -1373,8 +1389,13 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
 	if (ret)
 		return ret;
 
-	ret = file_out->f_op->copy_file_range(file_in, pos_in, file_out, pos_out,
-					      len, flags);
+	ret = -EOPNOTSUPP;
+	if (file_out->f_op->copy_file_range && (file_in->f_op == file_out->f_op))
+		ret = file_out->f_op->copy_file_range(file_in, pos_in, file_out,
+						      pos_out, len, flags);
+	if ((ret < 0) && (flags & COPY_FR_COPY))
+		ret = vfs_copy_file_pagecache(file_in, pos_in, file_out,
+					      pos_out, len);
 	if (ret > 0) {
 		fsnotify_access(file_in);
 		add_rchar(current, ret);
diff --git a/include/linux/copy.h b/include/linux/copy.h
new file mode 100644
index 0000000..fd54543
--- /dev/null
+++ b/include/linux/copy.h
@@ -0,0 +1,6 @@
+#ifndef _LINUX_COPY_H
+#define _LINUX_COPY_H
+
+#include <uapi/linux/copy.h>
+
+#endif /* _LINUX_COPY_H */
diff --git a/include/uapi/linux/Kbuild b/include/uapi/linux/Kbuild
index f7b2db4..faafd67 100644
--- a/include/uapi/linux/Kbuild
+++ b/include/uapi/linux/Kbuild
@@ -90,6 +90,7 @@ header-y += coda_psdev.h
 header-y += coff.h
 header-y += connector.h
 header-y += const.h
+header-y += copy.h
 header-y += cramfs_fs.h
 header-y += cuda.h
 header-y += cyclades.h
diff --git a/include/uapi/linux/copy.h b/include/uapi/linux/copy.h
new file mode 100644
index 0000000..b807dcd
--- /dev/null
+++ b/include/uapi/linux/copy.h
@@ -0,0 +1,8 @@
+#ifndef _UAPI_LINUX_COPY_H
+#define _UAPI_LINUX_COPY_H
+
+#define COPY_FR_COPY		(1 << 0)  /* Do a full data copy.       */
+#define COPY_FR_REFLINK		(1 << 1)  /* Only make a reflink.       */
+#define COPY_FR_DEDUP		(1 << 2)  /* Deduplicate file data.     */
+
+#endif /* _UAPI_LINUX_COPY_H */
-- 
2.5.3

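For readers following the flag handling in vfs_copy_file_range() above, here
is a minimal userspace sketch of the same checks.  The helper name
copy_fr_check_flags() is hypothetical and exists only for illustration; the
three checks and the default are copied from the hunk.  As written, an
explicit COPY_FR_COPY | COPY_FR_REFLINK from the caller is rejected even
though flags == 0 expands to that same pair internally, and a value with only
undefined bits set passes all three checks.

```c
#include <errno.h>

#define COPY_FR_COPY    (1 << 0)
#define COPY_FR_REFLINK (1 << 1)
#define COPY_FR_DEDUP   (1 << 2)

/*
 * Hypothetical helper, not part of the patch.  Mirrors the checks at the
 * top of vfs_copy_file_range(): returns 0 and stores the effective flags
 * in *out, or -EINVAL when a defined flag is combined with any other bit.
 */
static int copy_fr_check_flags(unsigned int flags, unsigned int *out)
{
	if ((flags & COPY_FR_COPY) && (flags & ~COPY_FR_COPY))
		return -EINVAL;
	if ((flags & COPY_FR_REFLINK) && (flags & ~COPY_FR_REFLINK))
		return -EINVAL;
	if ((flags & COPY_FR_DEDUP) && (flags & ~COPY_FR_DEDUP))
		return -EINVAL;

	/* Default behavior is to try both a reflink and a data copy. */
	if (flags == 0)
		flags = COPY_FR_COPY | COPY_FR_REFLINK;

	*out = flags;
	return 0;
}
```

Whether undefined bits should be rejected up front is a question for the
patch author; the sketch only reproduces what the hunk does.
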
^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v4 9/9] btrfs: btrfs_copy_file_range() only supports reflinks
  2015-09-29 18:05 [PATCH v4 0/9] VFS: In-kernel copy system call Anna Schumaker
                   ` (2 preceding siblings ...)
       [not found] ` <1443549913-8091-1-git-send-email-Anna.Schumaker-ZwjVKphTwtPQT0dZR+AlfA@public.gmane.org>
@ 2015-09-29 18:05 ` Anna Schumaker
  2015-09-29 18:05 ` [PATCH v4 10/9] copy_file_range.2: New page documenting copy_file_range() Anna Schumaker
  4 siblings, 0 replies; 13+ messages in thread
From: Anna Schumaker @ 2015-09-29 18:05 UTC (permalink / raw)
  To: linux-nfs, linux-btrfs, linux-fsdevel, linux-api, zab, viro, clm,
	darrick.wong, mtk.manpages, andros, hch

Reject copies that don't have the COPY_FR_REFLINK flag set.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Reviewed-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/ioctl.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 4311554..2e14b91 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -44,6 +44,7 @@
 #include <linux/uuid.h>
 #include <linux/btrfs.h>
 #include <linux/uaccess.h>
+#include <linux/copy.h>
 #include "ctree.h"
 #include "disk-io.h"
 #include "transaction.h"
@@ -3848,6 +3849,9 @@ ssize_t btrfs_copy_file_range(struct file *file_in, loff_t pos_in,
 {
 	ssize_t ret;
 
+	if (!(flags & COPY_FR_REFLINK))
+		return -EOPNOTSUPP;
+
 	ret = btrfs_clone_files(file_out, file_in, pos_in, len, pos_out);
 	if (ret == 0)
 		ret = len;
-- 
2.5.3

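To make the interaction between this patch and the pagecache fallback from
8/9 easier to follow, here is a small userspace model of the dispatch.  All
names below (copy_op_t, reflink_only_op, pagecache_copy, dispatch_copy) are
made up for the sketch, the "copies" just report len, and the f_op equality
check from the real code is left out.  It is meant only to show which path
runs for which flags.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

#define COPY_FR_COPY    (1 << 0)
#define COPY_FR_REFLINK (1 << 1)

/* Hypothetical stand-in for the ->copy_file_range file operation. */
typedef ssize_t (*copy_op_t)(size_t len, unsigned int flags);

/* Models a reflink-only filesystem: refuse unless COPY_FR_REFLINK is set. */
static ssize_t reflink_only_op(size_t len, unsigned int flags)
{
	if (!(flags & COPY_FR_REFLINK))
		return -EOPNOTSUPP;
	return (ssize_t)len;
}

/* Models the generic pagecache copy; always "succeeds" in this sketch. */
static ssize_t pagecache_copy(size_t len)
{
	return (ssize_t)len;
}

/*
 * Try the filesystem operation first.  If it is missing or fails, fall
 * back to the pagecache copy, but only when COPY_FR_COPY is set.
 */
static ssize_t dispatch_copy(copy_op_t op, size_t len, unsigned int flags,
			     int *used_fallback)
{
	ssize_t ret = -EOPNOTSUPP;

	*used_fallback = 0;
	if (op)
		ret = op(len, flags);
	if (ret < 0 && (flags & COPY_FR_COPY)) {
		*used_fallback = 1;
		ret = pagecache_copy(len);
	}
	return ret;
}
```

In this model a COPY_FR_COPY request against the reflink-only operation ends
up in the fallback, while a COPY_FR_REFLINK request with no operation
available returns -EOPNOTSUPP.
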

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v4 10/9] copy_file_range.2: New page documenting copy_file_range()
  2015-09-29 18:05 [PATCH v4 0/9] VFS: In-kernel copy system call Anna Schumaker
                   ` (3 preceding siblings ...)
  2015-09-29 18:05 ` [PATCH v4 9/9] btrfs: btrfs_copy_file_range() only supports reflinks Anna Schumaker
@ 2015-09-29 18:05 ` Anna Schumaker
  4 siblings, 0 replies; 13+ messages in thread
From: Anna Schumaker @ 2015-09-29 18:05 UTC (permalink / raw)
  To: linux-nfs, linux-btrfs, linux-fsdevel, linux-api, zab, viro, clm,
	darrick.wong, mtk.manpages, andros, hch

copy_file_range() is a new system call for copying ranges of data
completely in the kernel.  This gives filesystems an opportunity to
implement some kind of "copy acceleration", such as reflinks or
server-side-copy (in the case of NFS).

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
---
v4:
- Updates for COPY_FR_DEDUP
---
 man2/copy_file_range.2 | 224 +++++++++++++++++++++++++++++++++++++++++++++++++
 man2/splice.2          |   1 +
 2 files changed, 225 insertions(+)
 create mode 100644 man2/copy_file_range.2

diff --git a/man2/copy_file_range.2 b/man2/copy_file_range.2
new file mode 100644
index 0000000..23e3875
--- /dev/null
+++ b/man2/copy_file_range.2
@@ -0,0 +1,224 @@
+.\"This manpage is Copyright (C) 2015 Anna Schumaker <Anna.Schumaker@Netapp.com>
+.\"
+.\" %%%LICENSE_START(VERBATIM)
+.\" Permission is granted to make and distribute verbatim copies of this
+.\" manual provided the copyright notice and this permission notice are
+.\" preserved on all copies.
+.\"
+.\" Permission is granted to copy and distribute modified versions of
+.\" this manual under the conditions for verbatim copying, provided that
+.\" the entire resulting derived work is distributed under the terms of
+.\" a permission notice identical to this one.
+.\"
+.\" Since the Linux kernel and libraries are constantly changing, this
+.\" manual page may be incorrect or out-of-date.  The author(s) assume
+.\" no responsibility for errors or omissions, or for damages resulting
+.\" from the use of the information contained herein.  The author(s) may
+.\" not have taken the same level of care in the production of this
+.\" manual, which is licensed free of charge, as they might when working
+.\" professionally.
+.\"
+.\" Formatted or processed versions of this manual, if unaccompanied by
+.\" the source, must acknowledge the copyright and authors of this work.
+.\" %%%LICENSE_END
+.\"
+.TH COPY_FILE_RANGE 2 2015-09-29 "Linux" "Linux Programmer's Manual"
+.SH NAME
+copy_file_range \- Copy a range of data from one file to another
+.SH SYNOPSIS
+.nf
+.B #include <linux/copy.h>
+.B #include <sys/syscall.h>
+.B #include <unistd.h>
+
+.BI "ssize_t copy_file_range(int " fd_in ", loff_t *" off_in ", int " fd_out ",
+.BI "                        loff_t *" off_out ", size_t " len \
+", unsigned int " flags );
+.fi
+.SH DESCRIPTION
+The
+.BR copy_file_range ()
+system call performs an in-kernel copy between two file descriptors
+without the additional cost of transferring data from the kernel to userspace
+and then back into the kernel.
+It copies up to
+.I len
+bytes of data from file descriptor
+.I fd_in
+to file descriptor
+.IR fd_out ,
+overwriting any data that exists within the requested range of the target file.
+
+The following semantics apply for
+.IR off_in ,
+and similar statements apply to
+.IR off_out :
+.IP * 3
+If
+.I off_in
+is NULL, then bytes are read from
+.I fd_in
+starting from the current file offset, and the offset is
+adjusted by the number of bytes copied.
+.IP *
+If
+.I off_in
+is not NULL, then
+.I off_in
+must point to a buffer that specifies the starting
+offset where bytes from
+.I fd_in
+will be read.  The current file offset of
+.I fd_in
+is not changed, but
+.I off_in
+is adjusted appropriately.
+.PP
+
+The
+.I flags
+argument can have one of the following flags set:
+.TP 1.9i
+.B COPY_FR_COPY
+Copy all the file data in the requested range.
+Some filesystems might be able to accelerate this copy
+to avoid unnecessary data transfers.
+.TP
+.B COPY_FR_REFLINK
+Create a lightweight "reflink", where data is not copied until
+one of the files is modified.
+.TP
+.B COPY_FR_DEDUP
+Create a reflink, but only if the contents of
+both files' byte ranges are identical.
+If ranges do not match,
+.B EILSEQ
+will be returned.
+.PP
+The default behavior
+.RI ( flags
+== 0) is to try creating a reflink,
+and if reflinking fails
+.BR copy_file_range ()
+will fall back to performing a full data copy.
+.SH RETURN VALUE
+Upon successful completion,
+.BR copy_file_range ()
+will return the number of bytes copied between files.
+This could be less than the length originally requested.
+
+On error,
+.BR copy_file_range ()
+returns \-1 and
+.I errno
+is set to indicate the error.
+.SH ERRORS
+.TP
+.B EBADF
+One or more file descriptors are not valid; or
+.I fd_in
+is not open for reading; or
+.I fd_out
+is not open for writing.
+.TP
+.B EILSEQ
+The contents of both files' byte ranges did not match.
+.TP
+.B EINVAL
+Requested range extends beyond the end of the source file; or the
+.I flags
+argument is set to an invalid value.
+.TP
+.B EIO
+A low level I/O error occurred while copying.
+.TP
+.B ENOMEM
+Out of memory.
+.TP
+.B ENOSPC
+There is not enough space on the target filesystem to complete the copy.
+.TP
+.B EOPNOTSUPP
+.B COPY_FR_REFLINK
+or
+.B COPY_FR_DEDUP
+was specified in
+.IR flags ,
+but the target filesystem does not support the given operation.
+.TP
+.B EXDEV
+Target filesystem doesn't support cross-filesystem copies.
+.SH VERSIONS
+The
+.BR copy_file_range ()
+system call first appeared in Linux 4.4.
+.SH CONFORMING TO
+The
+.BR copy_file_range ()
+system call is a nonstandard Linux extension.
+.SH EXAMPLE
+.nf
+#define _GNU_SOURCE
+#include <fcntl.h>
+#include <linux/copy.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <sys/stat.h>
+#include <sys/syscall.h>
+#include <unistd.h>
+
+loff_t copy_file_range(int fd_in, loff_t *off_in, int fd_out,
+                       loff_t *off_out, size_t len, unsigned int flags)
+{
+    return syscall(__NR_copy_file_range, fd_in, off_in, fd_out,
+                   off_out, len, flags);
+}
+
+int main(int argc, char **argv)
+{
+    int fd_in, fd_out;
+    struct stat stat;
+    loff_t len, ret;
+    char buf[2];
+
+    if (argc != 3) {
+        fprintf(stderr, "Usage: %s <source> <destination>\\n", argv[0]);
+        exit(EXIT_FAILURE);
+    }
+
+    fd_in = open(argv[1], O_RDONLY);
+    if (fd_in == \-1) {
+        perror("open (argv[1])");
+        exit(EXIT_FAILURE);
+    }
+
+    if (fstat(fd_in, &stat) == \-1) {
+        perror("fstat");
+        exit(EXIT_FAILURE);
+    }
+    len = stat.st_size;
+
+    fd_out = open(argv[2], O_CREAT|O_WRONLY|O_TRUNC, 0644);
+    if (fd_out == \-1) {
+        perror("open (argv[2])");
+        exit(EXIT_FAILURE);
+    }
+
+    do {
+        ret = copy_file_range(fd_in, NULL, fd_out, NULL,
+                              len, COPY_FR_COPY);
+        if (ret == \-1) {
+            perror("copy_file_range");
+            exit(EXIT_FAILURE);
+        }
+
+        len \-= ret;
+    } while (len > 0 && ret > 0);
+
+    close(fd_in);
+    close(fd_out);
+    exit(EXIT_SUCCESS);
+}
+.fi
+.SH SEE ALSO
+.BR splice (2)
diff --git a/man2/splice.2 b/man2/splice.2
index b9b4f42..5c162e0 100644
--- a/man2/splice.2
+++ b/man2/splice.2
@@ -238,6 +238,7 @@ only pointers are copied, not the pages of the buffer.
 See
 .BR tee (2).
 .SH SEE ALSO
+.BR copy_file_range (2),
 .BR sendfile (2),
 .BR tee (2),
 .BR vmsplice (2)
-- 
2.5.3

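The off_in / off_out rules in the DESCRIPTION section can be illustrated with
a plain userspace emulation built on read(2)/write(2) and pread(2)/pwrite(2).
This is only a sketch of the documented offset behavior, not how the kernel
implements the call: emulate_copy_range() is a hypothetical name, it uses
off_t rather than loff_t, takes no flags, and its error handling is
deliberately simplified.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical emulation of the offset semantics described in the man page:
 * a NULL offset means "use and advance the file offset"; a non-NULL offset
 * is read, advanced by the bytes copied, and the file offset is untouched.
 * Returns the number of bytes copied (possibly fewer than len), or -1.
 */
static ssize_t emulate_copy_range(int fd_in, off_t *off_in,
				  int fd_out, off_t *off_out, size_t len)
{
	char buf[4096];
	size_t total = 0;

	while (total < len) {
		size_t want = len - total;
		ssize_t nr, done = 0;

		if (want > sizeof(buf))
			want = sizeof(buf);

		nr = off_in ? pread(fd_in, buf, want, *off_in)
			    : read(fd_in, buf, want);
		if (nr < 0)
			return total ? (ssize_t)total : -1;
		if (nr == 0)
			break;		/* end of the source file */

		while (done < nr) {
			ssize_t nw = off_out
				? pwrite(fd_out, buf + done, nr - done, *off_out)
				: write(fd_out, buf + done, nr - done);
			if (nw < 0)
				return total ? (ssize_t)total : -1;
			done += nw;
			if (off_out)
				*off_out += nw;
		}

		if (off_in)
			*off_in += nr;
		total += nr;
	}
	return (ssize_t)total;
}
```

A short return at end of file is reported as a byte count, which is why
callers are expected to loop and to stop once the call returns 0.
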

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* RE: [PATCH v4 3/9] btrfs: add .copy_file_range file operation
       [not found]     ` <1443549913-8091-4-git-send-email-Anna.Schumaker-ZwjVKphTwtPQT0dZR+AlfA@public.gmane.org>
@ 2015-09-30  3:20       ` Zhao Lei
  2015-09-30 12:55         ` Anna Schumaker
  0 siblings, 1 reply; 13+ messages in thread
From: Zhao Lei @ 2015-09-30  3:20 UTC (permalink / raw)
  To: 'Anna Schumaker',
	linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	linux-btrfs-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA, zab-ugsP4Wv/S6ZeoWH0uzbU5w,
	viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn, clm-b10kYP2dOMg,
	darrick.wong-QHcLZuEGTsvQT0dZR+AlfA,
	mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w,
	andros-HgOvQuBEEgTQT0dZR+AlfA, hch-wEGCiKHe2LqWVfeAwA7xHQ

Hi, Anna Schumaker

> -----Original Message-----
> From: linux-btrfs-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> [mailto:linux-btrfs-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org] On Behalf Of Anna Schumaker
> Sent: Wednesday, September 30, 2015 2:05 AM
> To: linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; linux-btrfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org;
> linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; linux-api-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; zab-ugsP4Wv/S6ZeoWH0uzbU5w@public.gmane.org;
> viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn@public.gmane.org; clm-b10kYP2dOMg@public.gmane.org; darrick.wong-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org;
> mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org; andros-HgOvQuBEEgTQT0dZR+AlfA@public.gmane.org; hch-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org
> Subject: [PATCH v4 3/9] btrfs: add .copy_file_range file operation
> 
> From: Zach Brown <zab-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> 
> This rearranges the existing COPY_RANGE ioctl implementation so that
> the .copy_file_range file operation can call the core loop that copies file data
> extent items.
> 
> The extent copying loop is lifted up into its own function.  It retains the core
> btrfs error checks that should be shared.
> 
> Signed-off-by: Zach Brown <zab-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> Signed-off-by: Anna Schumaker <Anna.Schumaker-ZwjVKphTwtPQT0dZR+AlfA@public.gmane.org>
> Reviewed-by: Josef Bacik <jbacik-b10kYP2dOMg@public.gmane.org>
> Reviewed-by: David Sterba <dsterba-IBi9RG/b67k@public.gmane.org>
> ---
>  fs/btrfs/ctree.h |  3 ++
>  fs/btrfs/file.c  |  1 +
>  fs/btrfs/ioctl.c | 91 ++++++++++++++++++++++++++++++++------------------------
>  3 files changed, 56 insertions(+), 39 deletions(-)
> 
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index 938efe3..5d06a4f 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -3996,6 +3996,9 @@ int btrfs_dirty_pages(struct btrfs_root *root, struct inode *inode,
>  		      loff_t pos, size_t write_bytes,
>  		      struct extent_state **cached);
>  int btrfs_fdatawrite_range(struct inode *inode, loff_t start, loff_t end);
> +ssize_t btrfs_copy_file_range(struct file *file_in, loff_t pos_in,
> +			      struct file *file_out, loff_t pos_out,
> +			      size_t len, int flags);
> 

This differs from the declaration in struct file_operations:
ssize_t (*copy_file_range)(struct file *, loff_t, struct file *, loff_t, size_t, unsigned int);

(the type of flags changed from unsigned int to int)

>  /* tree-defrag.c */
>  int btrfs_defrag_leaves(struct btrfs_trans_handle *trans,
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index b823fac..b05449c 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -2816,6 +2816,7 @@ const struct file_operations btrfs_file_operations = {
>  #ifdef CONFIG_COMPAT
>  	.compat_ioctl	= btrfs_ioctl,
>  #endif
> +	.copy_file_range = btrfs_copy_file_range,

This causes a compiler warning at this line:
fs/btrfs/file.c:2819: warning: initialization from incompatible pointer type

A small problem, but better to fix.

Thanks
Zhaolei

>  };
> 
>  void btrfs_auto_defrag_exit(void)
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index 0adf542..4311554 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -3727,17 +3727,16 @@ out:
>  	return ret;
>  }
> 
> -static noinline long btrfs_ioctl_clone(struct file *file, unsigned long srcfd,
> -				       u64 off, u64 olen, u64 destoff)
> +static noinline int btrfs_clone_files(struct file *file, struct file *file_src,
> +					u64 off, u64 olen, u64 destoff)
>  {
>  	struct inode *inode = file_inode(file);
> +	struct inode *src = file_inode(file_src);
>  	struct btrfs_root *root = BTRFS_I(inode)->root;
> -	struct fd src_file;
> -	struct inode *src;
>  	int ret;
>  	u64 len = olen;
>  	u64 bs = root->fs_info->sb->s_blocksize;
> -	int same_inode = 0;
> +	int same_inode = src == inode;
> 
>  	/*
>  	 * TODO:
> @@ -3750,49 +3749,20 @@ static noinline long btrfs_ioctl_clone(struct file *file, unsigned long srcfd,
>  	 *   be either compressed or non-compressed.
>  	 */
> 
> -	/* the destination must be opened for writing */
> -	if (!(file->f_mode & FMODE_WRITE) || (file->f_flags & O_APPEND))
> -		return -EINVAL;
> -
>  	if (btrfs_root_readonly(root))
>  		return -EROFS;
> 
> -	ret = mnt_want_write_file(file);
> -	if (ret)
> -		return ret;
> -
> -	src_file = fdget(srcfd);
> -	if (!src_file.file) {
> -		ret = -EBADF;
> -		goto out_drop_write;
> -	}
> -
> -	ret = -EXDEV;
> -	if (src_file.file->f_path.mnt != file->f_path.mnt)
> -		goto out_fput;
> -
> -	src = file_inode(src_file.file);
> -
> -	ret = -EINVAL;
> -	if (src == inode)
> -		same_inode = 1;
> -
> -	/* the src must be open for reading */
> -	if (!(src_file.file->f_mode & FMODE_READ))
> -		goto out_fput;
> +	if (file_src->f_path.mnt != file->f_path.mnt ||
> +	    src->i_sb != inode->i_sb)
> +		return -EXDEV;
> 
>  	/* don't make the dst file partly checksummed */
>  	if ((BTRFS_I(src)->flags & BTRFS_INODE_NODATASUM) !=
>  	    (BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM))
> -		goto out_fput;
> +		return -EINVAL;
> 
> -	ret = -EISDIR;
>  	if (S_ISDIR(src->i_mode) || S_ISDIR(inode->i_mode))
> -		goto out_fput;
> -
> -	ret = -EXDEV;
> -	if (src->i_sb != inode->i_sb)
> -		goto out_fput;
> +		return -EISDIR;
> 
>  	if (!same_inode) {
>  		btrfs_double_inode_lock(src, inode);
> @@ -3869,6 +3839,49 @@ out_unlock:
>  		btrfs_double_inode_unlock(src, inode);
>  	else
>  		mutex_unlock(&src->i_mutex);
> +	return ret;
> +}
> +
> +ssize_t btrfs_copy_file_range(struct file *file_in, loff_t pos_in,
> +			      struct file *file_out, loff_t pos_out,
> +			      size_t len, int flags)
> +{
> +	ssize_t ret;
> +
> +	ret = btrfs_clone_files(file_out, file_in, pos_in, len, pos_out);
> +	if (ret == 0)
> +		ret = len;
> +	return ret;
> +}
> +
> +static noinline long btrfs_ioctl_clone(struct file *file, unsigned long srcfd,
> +				       u64 off, u64 olen, u64 destoff) {
> +	struct fd src_file;
> +	int ret;
> +
> +	/* the destination must be opened for writing */
> +	if (!(file->f_mode & FMODE_WRITE) || (file->f_flags & O_APPEND))
> +		return -EINVAL;
> +
> +	ret = mnt_want_write_file(file);
> +	if (ret)
> +		return ret;
> +
> +	src_file = fdget(srcfd);
> +	if (!src_file.file) {
> +		ret = -EBADF;
> +		goto out_drop_write;
> +	}
> +
> +	/* the src must be open for reading */
> +	if (!(src_file.file->f_mode & FMODE_READ)) {
> +		ret = -EINVAL;
> +		goto out_fput;
> +	}
> +
> +	ret = btrfs_clone_files(file, src_file.file, off, olen, destoff);
> +
>  out_fput:
>  	fdput(src_file);
>  out_drop_write:
> --
> 2.5.3
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body
> of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at
> http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

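The mismatch reported above comes down to a function whose last parameter is
int being assigned to a function pointer whose last parameter is unsigned int.
A minimal standalone sketch of the matching form follows.  The demo_* names
and types are stand-ins invented for illustration and are not the kernel
definitions; long long is used in place of loff_t.

```c
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical stand-in for struct file. */
struct demo_file {
	int id;
};

/* Hypothetical stand-in for the relevant member of struct file_operations. */
struct demo_file_operations {
	ssize_t (*copy_file_range)(struct demo_file *, long long,
				   struct demo_file *, long long,
				   size_t, unsigned int);
};

/*
 * The implementation takes "unsigned int flags", exactly as the pointer
 * type declares it.  Declaring it as "int flags" instead is what produces
 * the "initialization from incompatible pointer type" diagnostic.
 */
static ssize_t demo_copy_file_range(struct demo_file *file_in, long long pos_in,
				    struct demo_file *file_out, long long pos_out,
				    size_t len, unsigned int flags)
{
	(void)file_in; (void)pos_in;
	(void)file_out; (void)pos_out;
	(void)flags;
	return (ssize_t)len;	/* pretend the whole range was copied */
}

static const struct demo_file_operations demo_fops = {
	.copy_file_range = demo_copy_file_range,
};
```
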
^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v4 3/9] btrfs: add .copy_file_range file operation
  2015-09-30  3:20       ` Zhao Lei
@ 2015-09-30 12:55         ` Anna Schumaker
  0 siblings, 0 replies; 13+ messages in thread
From: Anna Schumaker @ 2015-09-30 12:55 UTC (permalink / raw)
  To: Zhao Lei, 'Anna Schumaker',
	linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	linux-btrfs-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA, zab-ugsP4Wv/S6ZeoWH0uzbU5w,
	viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn, clm-b10kYP2dOMg,
	darrick.wong-QHcLZuEGTsvQT0dZR+AlfA,
	mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w,
	andros-HgOvQuBEEgTQT0dZR+AlfA, hch-wEGCiKHe2LqWVfeAwA7xHQ

On 09/29/2015 11:20 PM, Zhao Lei wrote:
> Hi, Anna Schumaker
> 
>> -----Original Message-----
>> From: linux-btrfs-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>> [mailto:linux-btrfs-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org] On Behalf Of Anna Schumaker
>> Sent: Wednesday, September 30, 2015 2:05 AM
>> To: linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; linux-btrfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org;
>> linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; linux-api-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; zab-ugsP4Wv/S6ZeoWH0uzbU5w@public.gmane.org;
>> viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn@public.gmane.org; clm-b10kYP2dOMg@public.gmane.org; darrick.wong-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org;
>> mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org; andros-HgOvQuBEEgTQT0dZR+AlfA@public.gmane.org; hch-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org
>> Subject: [PATCH v4 3/9] btrfs: add .copy_file_range file operation
>>
>> From: Zach Brown <zab-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
>>
>> This rearranges the existing COPY_RANGE ioctl implementation so that
>> the .copy_file_range file operation can call the core loop that copies file data
>> extent items.
>>
>> The extent copying loop is lifted up into its own function.  It retains the core
>> btrfs error checks that should be shared.
>>
>> Signed-off-by: Zach Brown <zab-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
>> Signed-off-by: Anna Schumaker <Anna.Schumaker-ZwjVKphTwtPQT0dZR+AlfA@public.gmane.org>
>> Reviewed-by: Josef Bacik <jbacik-b10kYP2dOMg@public.gmane.org>
>> Reviewed-by: David Sterba <dsterba-IBi9RG/b67k@public.gmane.org>
>> ---
>>  fs/btrfs/ctree.h |  3 ++
>>  fs/btrfs/file.c  |  1 +
>>  fs/btrfs/ioctl.c | 91 ++++++++++++++++++++++++++++++++------------------------
>>  3 files changed, 56 insertions(+), 39 deletions(-)
>>
>> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
>> index 938efe3..5d06a4f 100644
>> --- a/fs/btrfs/ctree.h
>> +++ b/fs/btrfs/ctree.h
>> @@ -3996,6 +3996,9 @@ int btrfs_dirty_pages(struct btrfs_root *root, struct inode *inode,
>>  		      loff_t pos, size_t write_bytes,
>>  		      struct extent_state **cached);
>>  int btrfs_fdatawrite_range(struct inode *inode, loff_t start, loff_t end);
>> +ssize_t btrfs_copy_file_range(struct file *file_in, loff_t pos_in,
>> +			      struct file *file_out, loff_t pos_out,
>> +			      size_t len, int flags);
>>
> 
> This differs from the declaration in struct file_operations:
> ssize_t (*copy_file_range)(struct file *, loff_t, struct file *, loff_t, size_t, unsigned int);
> 
> (the type of flags changed from unsigned int to int)
> 
>>  /* tree-defrag.c */
>>  int btrfs_defrag_leaves(struct btrfs_trans_handle *trans,
>> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
>> index b823fac..b05449c 100644
>> --- a/fs/btrfs/file.c
>> +++ b/fs/btrfs/file.c
>> @@ -2816,6 +2816,7 @@ const struct file_operations btrfs_file_operations = {
>>  #ifdef CONFIG_COMPAT
>>  	.compat_ioctl	= btrfs_ioctl,
>>  #endif
>> +	.copy_file_range = btrfs_copy_file_range,
> 
> This causes a compiler warning at this line:
> fs/btrfs/file.c:2819: warning: initialization from incompatible pointer type
> 
> A small problem, but better to fix.

Thanks!  I must have missed it when I switched flags to unsigned int.

Anna

> 
> Thanks
> Zhaolei
> 
>>  };
>>
>>  void btrfs_auto_defrag_exit(void)
>> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
>> index 0adf542..4311554 100644
>> --- a/fs/btrfs/ioctl.c
>> +++ b/fs/btrfs/ioctl.c
>> @@ -3727,17 +3727,16 @@ out:
>>  	return ret;
>>  }
>>
>> -static noinline long btrfs_ioctl_clone(struct file *file, unsigned long srcfd,
>> -				       u64 off, u64 olen, u64 destoff)
>> +static noinline int btrfs_clone_files(struct file *file, struct file *file_src,
>> +					u64 off, u64 olen, u64 destoff)
>>  {
>>  	struct inode *inode = file_inode(file);
>> +	struct inode *src = file_inode(file_src);
>>  	struct btrfs_root *root = BTRFS_I(inode)->root;
>> -	struct fd src_file;
>> -	struct inode *src;
>>  	int ret;
>>  	u64 len = olen;
>>  	u64 bs = root->fs_info->sb->s_blocksize;
>> -	int same_inode = 0;
>> +	int same_inode = src == inode;
>>
>>  	/*
>>  	 * TODO:
>> @@ -3750,49 +3749,20 @@ static noinline long btrfs_ioctl_clone(struct file *file, unsigned long srcfd,
>>  	 *   be either compressed or non-compressed.
>>  	 */
>>
>> -	/* the destination must be opened for writing */
>> -	if (!(file->f_mode & FMODE_WRITE) || (file->f_flags & O_APPEND))
>> -		return -EINVAL;
>> -
>>  	if (btrfs_root_readonly(root))
>>  		return -EROFS;
>>
>> -	ret = mnt_want_write_file(file);
>> -	if (ret)
>> -		return ret;
>> -
>> -	src_file = fdget(srcfd);
>> -	if (!src_file.file) {
>> -		ret = -EBADF;
>> -		goto out_drop_write;
>> -	}
>> -
>> -	ret = -EXDEV;
>> -	if (src_file.file->f_path.mnt != file->f_path.mnt)
>> -		goto out_fput;
>> -
>> -	src = file_inode(src_file.file);
>> -
>> -	ret = -EINVAL;
>> -	if (src == inode)
>> -		same_inode = 1;
>> -
>> -	/* the src must be open for reading */
>> -	if (!(src_file.file->f_mode & FMODE_READ))
>> -		goto out_fput;
>> +	if (file_src->f_path.mnt != file->f_path.mnt ||
>> +	    src->i_sb != inode->i_sb)
>> +		return -EXDEV;
>>
>>  	/* don't make the dst file partly checksummed */
>>  	if ((BTRFS_I(src)->flags & BTRFS_INODE_NODATASUM) !=
>>  	    (BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM))
>> -		goto out_fput;
>> +		return -EINVAL;
>>
>> -	ret = -EISDIR;
>>  	if (S_ISDIR(src->i_mode) || S_ISDIR(inode->i_mode))
>> -		goto out_fput;
>> -
>> -	ret = -EXDEV;
>> -	if (src->i_sb != inode->i_sb)
>> -		goto out_fput;
>> +		return -EISDIR;
>>
>>  	if (!same_inode) {
>>  		btrfs_double_inode_lock(src, inode);
>> @@ -3869,6 +3839,49 @@ out_unlock:
>>  		btrfs_double_inode_unlock(src, inode);
>>  	else
>>  		mutex_unlock(&src->i_mutex);
>> +	return ret;
>> +}
>> +
>> +ssize_t btrfs_copy_file_range(struct file *file_in, loff_t pos_in,
>> +			      struct file *file_out, loff_t pos_out,
>> +			      size_t len, int flags)
>> +{
>> +	ssize_t ret;
>> +
>> +	ret = btrfs_clone_files(file_out, file_in, pos_in, len, pos_out);
>> +	if (ret == 0)
>> +		ret = len;
>> +	return ret;
>> +}
>> +
>> +static noinline long btrfs_ioctl_clone(struct file *file, unsigned long srcfd,
>> +				       u64 off, u64 olen, u64 destoff) {
>> +	struct fd src_file;
>> +	int ret;
>> +
>> +	/* the destination must be opened for writing */
>> +	if (!(file->f_mode & FMODE_WRITE) || (file->f_flags & O_APPEND))
>> +		return -EINVAL;
>> +
>> +	ret = mnt_want_write_file(file);
>> +	if (ret)
>> +		return ret;
>> +
>> +	src_file = fdget(srcfd);
>> +	if (!src_file.file) {
>> +		ret = -EBADF;
>> +		goto out_drop_write;
>> +	}
>> +
>> +	/* the src must be open for reading */
>> +	if (!(src_file.file->f_mode & FMODE_READ)) {
>> +		ret = -EINVAL;
>> +		goto out_fput;
>> +	}
>> +
>> +	ret = btrfs_clone_files(file, src_file.file, off, olen, destoff);
>> +
>>  out_fput:
>>  	fdput(src_file);
>>  out_drop_write:
>> --
>> 2.5.3

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2015-09-30 12:55 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-09-29 18:05 [PATCH v4 0/9] VFS: In-kernel copy system call Anna Schumaker
2015-09-29 18:05 ` [PATCH v4 1/9] vfs: add copy_file_range syscall and vfs helper Anna Schumaker
2015-09-29 18:05 ` [PATCH v4 4/9] vfs: Copy should check len after file open mode Anna Schumaker
     [not found] ` <1443549913-8091-1-git-send-email-Anna.Schumaker-ZwjVKphTwtPQT0dZR+AlfA@public.gmane.org>
2015-09-29 18:05   ` [PATCH v4 2/9] x86: add sys_copy_file_range to syscall tables Anna Schumaker
2015-09-29 18:05   ` [PATCH v4 3/9] btrfs: add .copy_file_range file operation Anna Schumaker
     [not found]     ` <1443549913-8091-4-git-send-email-Anna.Schumaker-ZwjVKphTwtPQT0dZR+AlfA@public.gmane.org>
2015-09-30  3:20       ` Zhao Lei
2015-09-30 12:55         ` Anna Schumaker
2015-09-29 18:05   ` [PATCH v4 5/9] vfs: Copy shouldn't forbid ranges inside the same file Anna Schumaker
2015-09-29 18:05   ` [PATCH v4 6/9] vfs: Copy should use file_out rather than file_in Anna Schumaker
2015-09-29 18:05   ` [PATCH v4 7/9] vfs: Remove copy_file_range mountpoint checks Anna Schumaker
2015-09-29 18:05   ` [PATCH v4 8/9] vfs: Add vfs_copy_file_range() support for pagecache copies Anna Schumaker
2015-09-29 18:05 ` [PATCH v4 9/9] btrfs: btrfs_copy_file_range() only supports reflinks Anna Schumaker
2015-09-29 18:05 ` [PATCH v4 10/9] copy_file_range.2: New page documenting copy_file_range() Anna Schumaker
