* move btrfs clone ioctls to common code V2
@ 2015-12-03 11:59 Christoph Hellwig
From: Christoph Hellwig @ 2015-12-03 11:59 UTC
  To: viro
  Cc: tao.peng, jeff.layton, bfields, linux-fsdevel, linux-btrfs,
	linux-nfs, linux-cifs

This patch set moves the existing btrfs clone ioctls, which other file
systems have started to implement, to common code, and allows the NFS
server to export this functionality to remote systems.
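A minimal user-space sketch of what moving the ioctl to common code means for callers: one FICLONE call, issued on the destination descriptor with the source descriptor as its argument (as do_vfs_ioctl() dispatches it later in this series), works on any file system that implements ->clone_file_range. It assumes Linux UAPI headers that define FICLONE. The #ifndef fallback value assumes FICLONE reuses the BTRFS_IOC_CLONE number. clone_or_copy() and copy_whole_file() are hypothetical helpers and are not part of the patches.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/fs.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef FICLONE
/* Assumption: same number as BTRFS_IOC_CLONE. */
#define FICLONE _IOW(0x94, 9, int)
#endif

/* Plain byte copy replacing dst's contents with src's (dst must not be O_APPEND). */
static int copy_whole_file(int src_fd, int dst_fd)
{
	char buf[4096];
	off_t off = 0;
	ssize_t n;

	if (ftruncate(dst_fd, 0) < 0)
		return -1;
	while ((n = pread(src_fd, buf, sizeof(buf), off)) > 0) {
		ssize_t done = 0;

		while (done < n) {
			ssize_t w = pwrite(dst_fd, buf + done, n - done, off + done);
			if (w < 0)
				return -1;
			done += w;
		}
		off += n;
	}
	return n < 0 ? -1 : 0;
}

/* Returns 1 if the file was cloned, 0 if it was copied instead, -1 on error. */
static int clone_or_copy(int src_fd, int dst_fd)
{
	/* The ioctl is issued on the destination; the argument is the source fd. */
	if (ioctl(dst_fd, FICLONE, src_fd) == 0)
		return 1;
	/* Any refusal (no ->clone_file_range, different mounts, ...): copy bytes. */
	return copy_whole_file(src_fd, dst_fd) == 0 ? 0 : -1;
}
```

On btrfs the call should take the clone path. On a file system without clone support the ioctl fails and the sketch falls back to an ordinary, non-atomic copy.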

This work is originally based on my NFS CLONE prototype, which reused
code from Anna Schumaker's NFS COPY prototype, and incorporates various
updates to this code from Peng Tao.

The patches are also available as a git branch and on gitweb:

	git://git.infradead.org/users/hch/pnfs.git clone-for-viro
	http://git.infradead.org/users/hch/pnfs.git/shortlog/refs/heads/clone-for-viro


Changes since V1:
 - change the locks_mandatory_area calling convention again
 - support clones on CIFS properly
 - rebase on top of the NFS clone updates in 4.4-rc3



* [PATCH 1/4] locks: new locks_mandatory_area calling convention
From: Christoph Hellwig @ 2015-12-03 11:59 UTC
  To: viro
  Cc: tao.peng, jeff.layton, bfields, linux-fsdevel, linux-btrfs,
	linux-nfs, linux-cifs

Pass a loff_t end for the last byte instead of the 32-bit count
parameter, to allow full file clones even on 32-bit architectures.
While we're at it, also drop the pointless inode argument and simplify
the read/write selection.
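The arithmetic behind the new convention can be modelled in user space. This shows why an inclusive loff_t end matters: an 8 GiB region has a representable end offset, but its length truncates to 0 in a 32-bit count. rw_range() and truncate_range() are hypothetical names that mirror the range computations in rw_verify_area() and locks_verify_truncate() from this patch. int64_t stands in for loff_t. This is a sketch, not kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Inclusive byte range, as taken by locks_mandatory_area(filp, start, end, type). */
struct lock_range {
	int64_t start;	/* first byte to check */
	int64_t end;	/* last byte to check, inclusive */
};

/* rw_verify_area(): count bytes at pos become [pos, pos + count - 1]. */
static struct lock_range rw_range(int64_t pos, uint64_t count)
{
	struct lock_range r = { pos, pos + (int64_t)count - 1 };
	return r;
}

/*
 * locks_verify_truncate(): check the bytes between the old size (i_size)
 * and the new size, in whichever direction the file changes.  Equal sizes
 * give end == start - 1, i.e. an empty range.
 */
static struct lock_range truncate_range(int64_t i_size, int64_t new_size)
{
	struct lock_range r;

	if (new_size < i_size) {
		r.start = new_size;
		r.end = i_size - 1;
	} else {
		r.start = i_size;
		r.end = new_size - 1;
	}
	return r;
}
```

For example, growing a file from 0 to 8 GiB yields the range [0, 2^33 - 1]. The old (offset, count) form would have needed a count of 2^33, which does not fit in a 32-bit size_t.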

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: J. Bruce Fields <bfields@fieldses.org>
---
 fs/locks.c         | 22 +++++++++-------------
 fs/read_write.c    |  5 ++---
 include/linux/fs.h | 28 +++++++++++++---------------
 3 files changed, 24 insertions(+), 31 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index 0d2b326..ab2ea2e 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -1227,21 +1227,17 @@ int locks_mandatory_locked(struct file *file)
 
 /**
  * locks_mandatory_area - Check for a conflicting lock
- * @read_write: %FLOCK_VERIFY_WRITE for exclusive access, %FLOCK_VERIFY_READ
- *		for shared
- * @inode:      the file to check
  * @filp:       how the file was opened (if it was)
- * @offset:     start of area to check
- * @count:      length of area to check
+ * @start:	first byte in the file to check
+ * @end:	last byte in the file to check
+ * @type:	%F_WRLCK for a write lock, else %F_RDLCK
  *
  * Searches the inode's list of locks to find any POSIX locks which conflict.
- * This function is called from rw_verify_area() and
- * locks_verify_truncate().
  */
-int locks_mandatory_area(int read_write, struct inode *inode,
-			 struct file *filp, loff_t offset,
-			 size_t count)
+int locks_mandatory_area(struct file *filp, loff_t start, loff_t end,
+		unsigned char type)
 {
+	struct inode *inode = file_inode(filp);
 	struct file_lock fl;
 	int error;
 	bool sleep = false;
@@ -1252,9 +1248,9 @@ int locks_mandatory_area(int read_write, struct inode *inode,
 	fl.fl_flags = FL_POSIX | FL_ACCESS;
 	if (filp && !(filp->f_flags & O_NONBLOCK))
 		sleep = true;
-	fl.fl_type = (read_write == FLOCK_VERIFY_WRITE) ? F_WRLCK : F_RDLCK;
-	fl.fl_start = offset;
-	fl.fl_end = offset + count - 1;
+	fl.fl_type = type;
+	fl.fl_start = start;
+	fl.fl_end = end;
 
 	for (;;) {
 		if (filp) {
diff --git a/fs/read_write.c b/fs/read_write.c
index c81ef39..6c1aa73 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -396,9 +396,8 @@ int rw_verify_area(int read_write, struct file *file, const loff_t *ppos, size_t
 	}
 
 	if (unlikely(inode->i_flctx && mandatory_lock(inode))) {
-		retval = locks_mandatory_area(
-			read_write == READ ? FLOCK_VERIFY_READ : FLOCK_VERIFY_WRITE,
-			inode, file, pos, count);
+		retval = locks_mandatory_area(file, pos, pos + count - 1,
+				read_write == READ ? F_RDLCK : F_WRLCK);
 		if (retval < 0)
 			return retval;
 	}
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 870a76e..af559ac 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2030,12 +2030,9 @@ extern struct kobject *fs_kobj;
 
 #define MAX_RW_COUNT (INT_MAX & PAGE_CACHE_MASK)
 
-#define FLOCK_VERIFY_READ  1
-#define FLOCK_VERIFY_WRITE 2
-
 #ifdef CONFIG_FILE_LOCKING
 extern int locks_mandatory_locked(struct file *);
-extern int locks_mandatory_area(int, struct inode *, struct file *, loff_t, size_t);
+extern int locks_mandatory_area(struct file *, loff_t, loff_t, unsigned char);
 
 /*
  * Candidates for mandatory locking have the setgid bit set
@@ -2068,14 +2065,16 @@ static inline int locks_verify_truncate(struct inode *inode,
 				    struct file *filp,
 				    loff_t size)
 {
-	if (inode->i_flctx && mandatory_lock(inode))
-		return locks_mandatory_area(
-			FLOCK_VERIFY_WRITE, inode, filp,
-			size < inode->i_size ? size : inode->i_size,
-			(size < inode->i_size ? inode->i_size - size
-			 : size - inode->i_size)
-		);
-	return 0;
+	if (!inode->i_flctx || !mandatory_lock(inode))
+		return 0;
+
+	if (size < inode->i_size) {
+		return locks_mandatory_area(filp, size, inode->i_size - 1,
+				F_WRLCK);
+	} else {
+		return locks_mandatory_area(filp, inode->i_size, size - 1,
+				F_WRLCK);
+	}
 }
 
 static inline int break_lease(struct inode *inode, unsigned int mode)
@@ -2144,9 +2143,8 @@ static inline int locks_mandatory_locked(struct file *file)
 	return 0;
 }
 
-static inline int locks_mandatory_area(int rw, struct inode *inode,
-				       struct file *filp, loff_t offset,
-				       size_t count)
+static inline int locks_mandatory_area(struct file *filp, loff_t start,
+		loff_t end, unsigned char type)
 {
 	return 0;
 }
-- 
1.9.1



* [PATCH 2/4] vfs: pull btrfs clone API to vfs layer
From: Christoph Hellwig @ 2015-12-03 11:59 UTC
  To: viro
  Cc: tao.peng, jeff.layton, bfields, linux-fsdevel, linux-btrfs,
	linux-nfs, linux-cifs

The btrfs clone ioctls are now adopted by other file systems, with NFS
and CIFS already having support for them, and XFS being under active
development.  To avoid growth of various slightly incompatible
implementations, add one to the VFS.  Note that clones are different from
file copies in several ways:

 - they are atomic vs other writers
 - they support whole file clones
 - they support 64-bit length clones
 - they do not allow partial success (aka short writes)
 - clones are expected to be a fast metadata operation
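The range rules behind these properties (whole-file clones via len == 0, 64-bit lengths, and no partial success) can be sketched as plain arithmetic. The functions below are a hypothetical user-space model of the offset checks in clone_verify_area() and vfs_clone_file_range(), together with the len == 0 handling in cifs_clone_file_range(). They omit the file-type, open-mode, mount, lock and security checks. MODEL_OFFSET_MAX is an assumed stand-in for the kernel's OFFSET_MAX.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Stand-in for the kernel's OFFSET_MAX (largest loff_t). */
#define MODEL_OFFSET_MAX INT64_MAX

/*
 * Overflow checks from clone_verify_area(): pos must be non-negative and
 * pos + len must not wrap past the largest signed offset.  Converting an
 * out-of-range value to int64_t is implementation-defined in ISO C; GCC
 * and Clang wrap it, matching the kernel's (loff_t) cast.
 */
static int clone_check_pos(int64_t pos, uint64_t len)
{
	if (pos < 0)
		return -EINVAL;
	if ((int64_t)((uint64_t)pos + len) < 0)
		return -EINVAL;
	return 0;
}

/* Last byte handed to the mandatory-lock check; len == 0 means "to EOF". */
static int64_t clone_lock_end(int64_t pos, uint64_t len)
{
	return len ? pos + (int64_t)len - 1 : MODEL_OFFSET_MAX;
}

/* Offset checks of vfs_clone_file_range() for a source of src_size bytes. */
static int clone_check(int64_t pos_in, int64_t pos_out, uint64_t len,
		       int64_t src_size)
{
	int ret = clone_check_pos(pos_in, len);

	if (ret)
		return ret;
	ret = clone_check_pos(pos_out, len);
	if (ret)
		return ret;
	/* No partial success: the whole source range must exist. */
	if ((uint64_t)pos_in + len > (uint64_t)src_size)
		return -EINVAL;
	return 0;
}

/* CIFS-style resolution of len == 0: everything from off to source EOF. */
static uint64_t clone_effective_len(int64_t off, uint64_t len, int64_t src_size)
{
	return len ? len : (uint64_t)(src_size - off);
}
```

A request that runs past the end of the source is rejected outright rather than being shortened. A length above 4 GiB is fine as long as the offsets stay within the signed 64-bit range.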

Because of that it would be rather cumbersome to try to piggyback them on
top of the recent copy_file_range infrastructure.  The converse isn't
true: the copy_file_range system call could try a clone as a first
attempt to copy, something that further patches will enable.
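The same "try a clone before copying" idea can be sketched from user space. This example issues FICLONERANGE and, if the file system refuses (no clone support, unaligned range, different mounts), copies the bytes with pread/pwrite instead. The struct file_clone_range field names come from ioctl_file_clone_range() in this patch. The struct layout and ioctl number under #ifndef are assumptions, and clone_range_or_copy() and copy_range() are hypothetical helpers. Unlike a real clone, the fallback copy is neither atomic nor a metadata-only operation.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/fs.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <unistd.h>

#ifndef FICLONERANGE
/* Assumed layout and number, matching the fields used by ioctl_file_clone_range(). */
struct file_clone_range {
	int64_t src_fd;
	uint64_t src_offset;
	uint64_t src_length;
	uint64_t dest_offset;
};
#define FICLONERANGE _IOW(0x94, 13, struct file_clone_range)
#endif

/* Copy len bytes (0 = up to source EOF) from src_off to dst_off. */
static int copy_range(int src_fd, off_t src_off, int dst_fd, off_t dst_off,
		      uint64_t len)
{
	char buf[4096];

	if (len == 0) {
		struct stat st;

		if (fstat(src_fd, &st) < 0)
			return -1;
		if (st.st_size <= src_off)
			return 0;
		len = (uint64_t)(st.st_size - src_off);
	}
	while (len > 0) {
		size_t want = len < sizeof(buf) ? (size_t)len : sizeof(buf);
		ssize_t n = pread(src_fd, buf, want, src_off);

		if (n < 0)
			return -1;
		if (n == 0)
			break;	/* source ended early */
		for (ssize_t done = 0; done < n; ) {
			ssize_t w = pwrite(dst_fd, buf + done, (size_t)(n - done),
					   dst_off + done);
			if (w < 0)
				return -1;
			done += w;
		}
		src_off += n;
		dst_off += n;
		len -= (uint64_t)n;
	}
	return 0;
}

/* Returns 1 if the range was cloned, 0 if it was copied, -1 on error. */
static int clone_range_or_copy(int src_fd, off_t src_off, int dst_fd,
			       off_t dst_off, uint64_t len)
{
	struct file_clone_range args = {
		.src_fd = src_fd,
		.src_offset = (uint64_t)src_off,
		.src_length = len,
		.dest_offset = (uint64_t)dst_off,
	};

	if (ioctl(dst_fd, FICLONERANGE, &args) == 0)
		return 1;
	return copy_range(src_fd, src_off, dst_fd, dst_off, len) == 0 ? 0 : -1;
}
```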

Based on earlier work from Peng Tao.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/ctree.h        |   3 +-
 fs/btrfs/file.c         |   1 +
 fs/btrfs/ioctl.c        |  49 ++-----------------
 fs/cifs/cifsfs.c        |  63 ++++++++++++++++++++++++
 fs/cifs/cifsfs.h        |   1 -
 fs/cifs/ioctl.c         | 126 +++++++++++++++++++++++-------------------------
 fs/ioctl.c              |  29 +++++++++++
 fs/nfs/nfs4file.c       |  87 ++++-----------------------------
 fs/read_write.c         |  72 +++++++++++++++++++++++++++
 include/linux/fs.h      |   7 ++-
 include/uapi/linux/fs.h |   9 ++++
 11 files changed, 254 insertions(+), 193 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index ede7277..dd4733f 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -4025,7 +4025,6 @@ void btrfs_get_block_group_info(struct list_head *groups_list,
 void update_ioctl_balance_args(struct btrfs_fs_info *fs_info, int lock,
 			       struct btrfs_ioctl_balance_args *bargs);
 
-
 /* file.c */
 int btrfs_auto_defrag_init(void);
 void btrfs_auto_defrag_exit(void);
@@ -4058,6 +4057,8 @@ int btrfs_fdatawrite_range(struct inode *inode, loff_t start, loff_t end);
 ssize_t btrfs_copy_file_range(struct file *file_in, loff_t pos_in,
 			      struct file *file_out, loff_t pos_out,
 			      size_t len, unsigned int flags);
+int btrfs_clone_file_range(struct file *file_in, loff_t pos_in,
+			   struct file *file_out, loff_t pos_out, u64 len);
 
 /* tree-defrag.c */
 int btrfs_defrag_leaves(struct btrfs_trans_handle *trans,
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index e67fe6a..232e300 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -2925,6 +2925,7 @@ const struct file_operations btrfs_file_operations = {
 	.compat_ioctl	= btrfs_ioctl,
 #endif
 	.copy_file_range = btrfs_copy_file_range,
+	.clone_file_range = btrfs_clone_file_range,
 };
 
 void btrfs_auto_defrag_exit(void)
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 0f92735..85b1cae 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -3906,49 +3906,10 @@ ssize_t btrfs_copy_file_range(struct file *file_in, loff_t pos_in,
 	return ret;
 }
 
-static noinline long btrfs_ioctl_clone(struct file *file, unsigned long srcfd,
-				       u64 off, u64 olen, u64 destoff)
+int btrfs_clone_file_range(struct file *src_file, loff_t off,
+		struct file *dst_file, loff_t destoff, u64 len)
 {
-	struct fd src_file;
-	int ret;
-
-	/* the destination must be opened for writing */
-	if (!(file->f_mode & FMODE_WRITE) || (file->f_flags & O_APPEND))
-		return -EINVAL;
-
-	ret = mnt_want_write_file(file);
-	if (ret)
-		return ret;
-
-	src_file = fdget(srcfd);
-	if (!src_file.file) {
-		ret = -EBADF;
-		goto out_drop_write;
-	}
-
-	/* the src must be open for reading */
-	if (!(src_file.file->f_mode & FMODE_READ)) {
-		ret = -EINVAL;
-		goto out_fput;
-	}
-
-	ret = btrfs_clone_files(file, src_file.file, off, olen, destoff);
-
-out_fput:
-	fdput(src_file);
-out_drop_write:
-	mnt_drop_write_file(file);
-	return ret;
-}
-
-static long btrfs_ioctl_clone_range(struct file *file, void __user *argp)
-{
-	struct btrfs_ioctl_clone_range_args args;
-
-	if (copy_from_user(&args, argp, sizeof(args)))
-		return -EFAULT;
-	return btrfs_ioctl_clone(file, args.src_fd, args.src_offset,
-				 args.src_length, args.dest_offset);
+	return btrfs_clone_files(dst_file, src_file, off, len, destoff);
 }
 
 /*
@@ -5498,10 +5459,6 @@ long btrfs_ioctl(struct file *file, unsigned int
 		return btrfs_ioctl_dev_info(root, argp);
 	case BTRFS_IOC_BALANCE:
 		return btrfs_ioctl_balance(file, NULL);
-	case BTRFS_IOC_CLONE:
-		return btrfs_ioctl_clone(file, arg, 0, 0, 0);
-	case BTRFS_IOC_CLONE_RANGE:
-		return btrfs_ioctl_clone_range(file, argp);
 	case BTRFS_IOC_TRANS_START:
 		return btrfs_ioctl_trans_start(file);
 	case BTRFS_IOC_TRANS_END:
diff --git a/fs/cifs/cifsfs.c b/fs/cifs/cifsfs.c
index cbc0f4b..e9b978f 100644
--- a/fs/cifs/cifsfs.c
+++ b/fs/cifs/cifsfs.c
@@ -914,6 +914,61 @@ const struct inode_operations cifs_symlink_inode_ops = {
 #endif
 };
 
+static int cifs_clone_file_range(struct file *src_file, loff_t off,
+		struct file *dst_file, loff_t destoff, u64 len)
+{
+	struct inode *src_inode = file_inode(src_file);
+	struct inode *target_inode = file_inode(dst_file);
+	struct cifsFileInfo *smb_file_src = src_file->private_data;
+	struct cifsFileInfo *smb_file_target = dst_file->private_data;
+	struct cifs_tcon *src_tcon = tlink_tcon(smb_file_src->tlink);
+	struct cifs_tcon *target_tcon = tlink_tcon(smb_file_target->tlink);
+	unsigned int xid;
+	int rc;
+
+	cifs_dbg(FYI, "clone range\n");
+
+	xid = get_xid();
+
+	if (!src_file->private_data || !dst_file->private_data) {
+		rc = -EBADF;
+		cifs_dbg(VFS, "missing cifsFileInfo on copy range src file\n");
+		goto out;
+	}
+
+	/*
+	 * Note: cifs case is easier than btrfs since server responsible for
+	 * checks for proper open modes and file type and if it wants
+	 * server could even support copy of range where source = target
+	 */
+	lock_two_nondirectories(target_inode, src_inode);
+
+	if (len == 0)
+		len = src_inode->i_size - off;
+
+	cifs_dbg(FYI, "about to flush pages\n");
+	/* should we flush first and last page first */
+	truncate_inode_pages_range(&target_inode->i_data, destoff,
+				   PAGE_CACHE_ALIGN(destoff + len)-1);
+
+	if (target_tcon->ses->server->ops->duplicate_extents)
+		rc = target_tcon->ses->server->ops->duplicate_extents(xid,
+			smb_file_src, smb_file_target, off, len, destoff);
+	else
+		rc = -EOPNOTSUPP;
+
+	/* force revalidate of size and timestamps of target file now
+	   that target is updated on the server */
+	CIFS_I(target_inode)->time = 0;
+out_unlock:
+	/* although unlocking in the reverse order from locking is not
+	   strictly necessary here it is a little cleaner to be consistent */
+	unlock_two_nondirectories(src_inode, target_inode);
+out:
+	free_xid(xid);
+	return rc;
+}
+
 const struct file_operations cifs_file_ops = {
 	.read_iter = cifs_loose_read_iter,
 	.write_iter = cifs_file_write_iter,
@@ -926,6 +981,7 @@ const struct file_operations cifs_file_ops = {
 	.splice_read = generic_file_splice_read,
 	.llseek = cifs_llseek,
 	.unlocked_ioctl	= cifs_ioctl,
+	.clone_file_range = cifs_clone_file_range,
 	.setlease = cifs_setlease,
 	.fallocate = cifs_fallocate,
 };
@@ -942,6 +998,8 @@ const struct file_operations cifs_file_strict_ops = {
 	.splice_read = generic_file_splice_read,
 	.llseek = cifs_llseek,
 	.unlocked_ioctl	= cifs_ioctl,
+	.clone_file_range = cifs_clone_file_range,
+	.clone_file_range = cifs_clone_file_range,
 	.setlease = cifs_setlease,
 	.fallocate = cifs_fallocate,
 };
@@ -958,6 +1016,7 @@ const struct file_operations cifs_file_direct_ops = {
 	.mmap = cifs_file_mmap,
 	.splice_read = generic_file_splice_read,
 	.unlocked_ioctl  = cifs_ioctl,
+	.clone_file_range = cifs_clone_file_range,
 	.llseek = cifs_llseek,
 	.setlease = cifs_setlease,
 	.fallocate = cifs_fallocate,
@@ -974,6 +1033,7 @@ const struct file_operations cifs_file_nobrl_ops = {
 	.splice_read = generic_file_splice_read,
 	.llseek = cifs_llseek,
 	.unlocked_ioctl	= cifs_ioctl,
+	.clone_file_range = cifs_clone_file_range,
 	.setlease = cifs_setlease,
 	.fallocate = cifs_fallocate,
 };
@@ -989,6 +1049,7 @@ const struct file_operations cifs_file_strict_nobrl_ops = {
 	.splice_read = generic_file_splice_read,
 	.llseek = cifs_llseek,
 	.unlocked_ioctl	= cifs_ioctl,
+	.clone_file_range = cifs_clone_file_range,
 	.setlease = cifs_setlease,
 	.fallocate = cifs_fallocate,
 };
@@ -1004,6 +1065,7 @@ const struct file_operations cifs_file_direct_nobrl_ops = {
 	.mmap = cifs_file_mmap,
 	.splice_read = generic_file_splice_read,
 	.unlocked_ioctl  = cifs_ioctl,
+	.clone_file_range = cifs_clone_file_range,
 	.llseek = cifs_llseek,
 	.setlease = cifs_setlease,
 	.fallocate = cifs_fallocate,
@@ -1014,6 +1076,7 @@ const struct file_operations cifs_dir_ops = {
 	.release = cifs_closedir,
 	.read    = generic_read_dir,
 	.unlocked_ioctl  = cifs_ioctl,
+	.clone_file_range = cifs_clone_file_range,
 	.llseek = generic_file_llseek,
 };
 
diff --git a/fs/cifs/cifsfs.h b/fs/cifs/cifsfs.h
index c3cc160..c399513 100644
--- a/fs/cifs/cifsfs.h
+++ b/fs/cifs/cifsfs.h
@@ -131,7 +131,6 @@ extern int	cifs_setxattr(struct dentry *, const char *, const void *,
 extern ssize_t	cifs_getxattr(struct dentry *, const char *, void *, size_t);
 extern ssize_t	cifs_listxattr(struct dentry *, char *, size_t);
 extern long cifs_ioctl(struct file *filep, unsigned int cmd, unsigned long arg);
-
 #ifdef CONFIG_CIFS_NFSD_EXPORT
 extern const struct export_operations cifs_export_ops;
 #endif /* CONFIG_CIFS_NFSD_EXPORT */
diff --git a/fs/cifs/ioctl.c b/fs/cifs/ioctl.c
index 35cf990..7a3b84e 100644
--- a/fs/cifs/ioctl.c
+++ b/fs/cifs/ioctl.c
@@ -34,73 +34,36 @@
 #include "cifs_ioctl.h"
 #include <linux/btrfs.h>
 
-static long cifs_ioctl_clone(unsigned int xid, struct file *dst_file,
-			unsigned long srcfd, u64 off, u64 len, u64 destoff,
-			bool dup_extents)
+static int cifs_file_clone_range(unsigned int xid, struct file *src_file,
+			  struct file *dst_file)
 {
-	int rc;
-	struct cifsFileInfo *smb_file_target = dst_file->private_data;
+	struct inode *src_inode = file_inode(src_file);
 	struct inode *target_inode = file_inode(dst_file);
-	struct cifs_tcon *target_tcon;
-	struct fd src_file;
 	struct cifsFileInfo *smb_file_src;
-	struct inode *src_inode;
+	struct cifsFileInfo *smb_file_target;
 	struct cifs_tcon *src_tcon;
+	struct cifs_tcon *target_tcon;
+	int rc;
 
 	cifs_dbg(FYI, "ioctl clone range\n");
-	/* the destination must be opened for writing */
-	if (!(dst_file->f_mode & FMODE_WRITE)) {
-		cifs_dbg(FYI, "file target not open for write\n");
-		return -EINVAL;
-	}
 
-	/* check if target volume is readonly and take reference */
-	rc = mnt_want_write_file(dst_file);
-	if (rc) {
-		cifs_dbg(FYI, "mnt_want_write failed with rc %d\n", rc);
-		return rc;
-	}
-
-	src_file = fdget(srcfd);
-	if (!src_file.file) {
-		rc = -EBADF;
-		goto out_drop_write;
-	}
-
-	if (src_file.file->f_op->unlocked_ioctl != cifs_ioctl) {
-		rc = -EBADF;
-		cifs_dbg(VFS, "src file seems to be from a different filesystem type\n");
-		goto out_fput;
-	}
-
-	if ((!src_file.file->private_data) || (!dst_file->private_data)) {
+	if (!src_file->private_data || !dst_file->private_data) {
 		rc = -EBADF;
 		cifs_dbg(VFS, "missing cifsFileInfo on copy range src file\n");
-		goto out_fput;
+		goto out;
 	}
 
 	rc = -EXDEV;
 	smb_file_target = dst_file->private_data;
-	smb_file_src = src_file.file->private_data;
+	smb_file_src = src_file->private_data;
 	src_tcon = tlink_tcon(smb_file_src->tlink);
 	target_tcon = tlink_tcon(smb_file_target->tlink);
 
-	/* check source and target on same server (or volume if dup_extents) */
-	if (dup_extents && (src_tcon != target_tcon)) {
-		cifs_dbg(VFS, "source and target of copy not on same share\n");
-		goto out_fput;
-	}
-
-	if (!dup_extents && (src_tcon->ses != target_tcon->ses)) {
+	if (src_tcon->ses != target_tcon->ses) {
 		cifs_dbg(VFS, "source and target of copy not on same server\n");
-		goto out_fput;
+		goto out;
 	}
 
-	src_inode = file_inode(src_file.file);
-	rc = -EINVAL;
-	if (S_ISDIR(src_inode->i_mode))
-		goto out_fput;
-
 	/*
 	 * Note: cifs case is easier than btrfs since server responsible for
 	 * checks for proper open modes and file type and if it wants
@@ -108,34 +71,66 @@ static long cifs_ioctl_clone(unsigned int xid, struct file *dst_file,
 	 */
 	lock_two_nondirectories(target_inode, src_inode);
 
-	/* determine range to clone */
-	rc = -EINVAL;
-	if (off + len > src_inode->i_size || off + len < off)
-		goto out_unlock;
-	if (len == 0)
-		len = src_inode->i_size - off;
-
 	cifs_dbg(FYI, "about to flush pages\n");
 	/* should we flush first and last page first */
-	truncate_inode_pages_range(&target_inode->i_data, destoff,
-				   PAGE_CACHE_ALIGN(destoff + len)-1);
+	truncate_inode_pages(&target_inode->i_data, 0);
 
-	if (dup_extents && target_tcon->ses->server->ops->duplicate_extents)
-		rc = target_tcon->ses->server->ops->duplicate_extents(xid,
-			smb_file_src, smb_file_target, off, len, destoff);
-	else if (!dup_extents && target_tcon->ses->server->ops->clone_range)
+	if (target_tcon->ses->server->ops->clone_range)
 		rc = target_tcon->ses->server->ops->clone_range(xid,
-			smb_file_src, smb_file_target, off, len, destoff);
+			smb_file_src, smb_file_target, 0, src_inode->i_size, 0);
 	else
 		rc = -EOPNOTSUPP;
 
 	/* force revalidate of size and timestamps of target file now
 	   that target is updated on the server */
 	CIFS_I(target_inode)->time = 0;
-out_unlock:
 	/* although unlocking in the reverse order from locking is not
 	   strictly necessary here it is a little cleaner to be consistent */
 	unlock_two_nondirectories(src_inode, target_inode);
+out:
+	return rc;
+}
+
+static long cifs_ioctl_clone(unsigned int xid, struct file *dst_file,
+			unsigned long srcfd)
+{
+	int rc;
+	struct fd src_file;
+	struct inode *src_inode;
+
+	cifs_dbg(FYI, "ioctl clone range\n");
+	/* the destination must be opened for writing */
+	if (!(dst_file->f_mode & FMODE_WRITE)) {
+		cifs_dbg(FYI, "file target not open for write\n");
+		return -EINVAL;
+	}
+
+	/* check if target volume is readonly and take reference */
+	rc = mnt_want_write_file(dst_file);
+	if (rc) {
+		cifs_dbg(FYI, "mnt_want_write failed with rc %d\n", rc);
+		return rc;
+	}
+
+	src_file = fdget(srcfd);
+	if (!src_file.file) {
+		rc = -EBADF;
+		goto out_drop_write;
+	}
+
+	if (src_file.file->f_op->unlocked_ioctl != cifs_ioctl) {
+		rc = -EBADF;
+		cifs_dbg(VFS, "src file seems to be from a different filesystem type\n");
+		goto out_fput;
+	}
+
+	src_inode = file_inode(src_file.file);
+	rc = -EINVAL;
+	if (S_ISDIR(src_inode->i_mode))
+		goto out_fput;
+
+	rc = cifs_file_clone_range(xid, src_file.file, dst_file);
+
 out_fput:
 	fdput(src_file);
 out_drop_write:
@@ -256,10 +251,7 @@ long cifs_ioctl(struct file *filep, unsigned int command, unsigned long arg)
 			}
 			break;
 		case CIFS_IOC_COPYCHUNK_FILE:
-			rc = cifs_ioctl_clone(xid, filep, arg, 0, 0, 0, false);
-			break;
-		case BTRFS_IOC_CLONE:
-			rc = cifs_ioctl_clone(xid, filep, arg, 0, 0, 0, true);
+			rc = cifs_ioctl_clone(xid, filep, arg);
 			break;
 		case CIFS_IOC_SET_INTEGRITY:
 			if (pSMBFile == NULL)
diff --git a/fs/ioctl.c b/fs/ioctl.c
index 5d01d26..84c6e79 100644
--- a/fs/ioctl.c
+++ b/fs/ioctl.c
@@ -215,6 +215,29 @@ static int ioctl_fiemap(struct file *filp, unsigned long arg)
 	return error;
 }
 
+static long ioctl_file_clone(struct file *dst_file, unsigned long srcfd,
+			     u64 off, u64 olen, u64 destoff)
+{
+	struct fd src_file = fdget(srcfd);
+	int ret;
+
+	if (!src_file.file)
+		return -EBADF;
+	ret = vfs_clone_file_range(src_file.file, off, dst_file, destoff, olen);
+	fdput(src_file);
+	return ret;
+}
+
+static long ioctl_file_clone_range(struct file *file, void __user *argp)
+{
+	struct file_clone_range args;
+
+	if (copy_from_user(&args, argp, sizeof(args)))
+		return -EFAULT;
+	return ioctl_file_clone(file, args.src_fd, args.src_offset,
+				args.src_length, args.dest_offset);
+}
+
 #ifdef CONFIG_BLOCK
 
 static inline sector_t logical_to_blk(struct inode *inode, loff_t offset)
@@ -600,6 +623,12 @@ int do_vfs_ioctl(struct file *filp, unsigned int fd, unsigned int cmd,
 	case FIGETBSZ:
 		return put_user(inode->i_sb->s_blocksize, argp);
 
+	case FICLONE:
+		return ioctl_file_clone(filp, arg, 0, 0, 0);
+
+	case FICLONERANGE:
+		return ioctl_file_clone_range(filp, argp);
+
 	default:
 		if (S_ISREG(inode->i_mode))
 			error = file_ioctl(filp, cmd, arg);
diff --git a/fs/nfs/nfs4file.c b/fs/nfs/nfs4file.c
index db9b5fe..26f9a23 100644
--- a/fs/nfs/nfs4file.c
+++ b/fs/nfs/nfs4file.c
@@ -195,65 +195,27 @@ static long nfs42_fallocate(struct file *filep, int mode, loff_t offset, loff_t
 	return nfs42_proc_allocate(filep, offset, len);
 }
 
-static noinline long
-nfs42_ioctl_clone(struct file *dst_file, unsigned long srcfd,
-		  u64 src_off, u64 dst_off, u64 count)
+static int nfs42_clone_file_range(struct file *src_file, loff_t src_off,
+		struct file *dst_file, loff_t dst_off, u64 count)
 {
 	struct inode *dst_inode = file_inode(dst_file);
 	struct nfs_server *server = NFS_SERVER(dst_inode);
-	struct fd src_file;
-	struct inode *src_inode;
+	struct inode *src_inode = file_inode(src_file);
 	unsigned int bs = server->clone_blksize;
 	bool same_inode = false;
 	int ret;
 
-	/* dst file must be opened for writing */
-	if (!(dst_file->f_mode & FMODE_WRITE))
-		return -EINVAL;
-
-	ret = mnt_want_write_file(dst_file);
-	if (ret)
-		return ret;
-
-	src_file = fdget(srcfd);
-	if (!src_file.file) {
-		ret = -EBADF;
-		goto out_drop_write;
-	}
-
-	src_inode = file_inode(src_file.file);
-
-	if (src_inode == dst_inode)
-		same_inode = true;
-
-	/* src file must be opened for reading */
-	if (!(src_file.file->f_mode & FMODE_READ))
-		goto out_fput;
-
-	/* src and dst must be regular files */
-	ret = -EISDIR;
-	if (!S_ISREG(src_inode->i_mode) || !S_ISREG(dst_inode->i_mode))
-		goto out_fput;
-
-	ret = -EXDEV;
-	if (src_file.file->f_path.mnt != dst_file->f_path.mnt ||
-	    src_inode->i_sb != dst_inode->i_sb)
-		goto out_fput;
-
 	/* check alignment w.r.t. clone_blksize */
 	ret = -EINVAL;
 	if (bs) {
 		if (!IS_ALIGNED(src_off, bs) || !IS_ALIGNED(dst_off, bs))
-			goto out_fput;
+			goto out;
 		if (!IS_ALIGNED(count, bs) && i_size_read(src_inode) != (src_off + count))
-			goto out_fput;
+			goto out;
 	}
 
-	/* verify if ranges are overlapped within the same file */
-	if (same_inode) {
-		if (dst_off + count > src_off && dst_off < src_off + count)
-			goto out_fput;
-	}
+	if (src_inode == dst_inode)
+		same_inode = true;
 
 	/* XXX: do we lock at all? what if server needs CB_RECALL_LAYOUT? */
 	if (same_inode) {
@@ -275,7 +237,7 @@ nfs42_ioctl_clone(struct file *dst_file, unsigned long srcfd,
 	if (ret)
 		goto out_unlock;
 
-	ret = nfs42_proc_clone(src_file.file, dst_file, src_off, dst_off, count);
+	ret = nfs42_proc_clone(src_file, dst_file, src_off, dst_off, count);
 
 	/* truncate inode page cache of the dst range so that future reads can fetch
 	 * new data from server */
@@ -292,37 +254,9 @@ out_unlock:
 		mutex_unlock(&dst_inode->i_mutex);
 		mutex_unlock(&src_inode->i_mutex);
 	}
-out_fput:
-	fdput(src_file);
-out_drop_write:
-	mnt_drop_write_file(dst_file);
+out:
 	return ret;
 }
-
-static long nfs42_ioctl_clone_range(struct file *dst_file, void __user *argp)
-{
-	struct btrfs_ioctl_clone_range_args args;
-
-	if (copy_from_user(&args, argp, sizeof(args)))
-		return -EFAULT;
-
-	return nfs42_ioctl_clone(dst_file, args.src_fd, args.src_offset,
-				 args.dest_offset, args.src_length);
-}
-
-long nfs4_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
-{
-	void __user *argp = (void __user *)arg;
-
-	switch (cmd) {
-	case BTRFS_IOC_CLONE:
-		return nfs42_ioctl_clone(file, arg, 0, 0, 0);
-	case BTRFS_IOC_CLONE_RANGE:
-		return nfs42_ioctl_clone_range(file, argp);
-	}
-
-	return -ENOTTY;
-}
 #endif /* CONFIG_NFS_V4_2 */
 
 const struct file_operations nfs4_file_operations = {
@@ -342,8 +276,7 @@ const struct file_operations nfs4_file_operations = {
 #ifdef CONFIG_NFS_V4_2
 	.llseek		= nfs4_file_llseek,
 	.fallocate	= nfs42_fallocate,
-	.unlocked_ioctl = nfs4_ioctl,
-	.compat_ioctl	= nfs4_ioctl,
+	.clone_file_range = nfs42_clone_file_range,
 #else
 	.llseek		= nfs_file_llseek,
 #endif
diff --git a/fs/read_write.c b/fs/read_write.c
index 6c1aa73..9e3dd8f 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1451,3 +1451,75 @@ out1:
 out2:
 	return ret;
 }
+
+static int clone_verify_area(struct file *file, loff_t pos, u64 len, bool write)
+{
+	struct inode *inode = file_inode(file);
+
+	if (unlikely(pos < 0))
+		return -EINVAL;
+
+	if (unlikely((loff_t) (pos + len) < 0))
+		return -EINVAL;
+
+	if (unlikely(inode->i_flctx && mandatory_lock(inode))) {
+		loff_t end = len ? pos + len - 1 : OFFSET_MAX;
+		int retval;
+
+		retval = locks_mandatory_area(file, pos, end,
+				write ? F_WRLCK : F_RDLCK);
+		if (retval < 0)
+			return retval;
+	}
+
+	return security_file_permission(file, write ? MAY_WRITE : MAY_READ);
+}
+
+int vfs_clone_file_range(struct file *file_in, loff_t pos_in,
+		struct file *file_out, loff_t pos_out, u64 len)
+{
+	struct inode *inode_in = file_inode(file_in);
+	struct inode *inode_out = file_inode(file_out);
+	int ret;
+
+	if (inode_in->i_sb != inode_out->i_sb ||
+	    file_in->f_path.mnt != file_out->f_path.mnt)
+		return -EXDEV;
+
+	if (S_ISDIR(inode_in->i_mode) || S_ISDIR(inode_out->i_mode))
+		return -EISDIR;
+	if (!S_ISREG(inode_in->i_mode) || !S_ISREG(inode_out->i_mode))
+		return -EOPNOTSUPP;
+
+	if (!(file_in->f_mode & FMODE_READ) ||
+	    !(file_out->f_mode & FMODE_WRITE) ||
+	    (file_out->f_flags & O_APPEND) ||
+	    !file_in->f_op->clone_file_range)
+		return -EBADF;
+
+	ret = clone_verify_area(file_in, pos_in, len, false);
+	if (ret)
+		return ret;
+
+	ret = clone_verify_area(file_out, pos_out, len, true);
+	if (ret)
+		return ret;
+
+	if (pos_in + len > i_size_read(inode_in))
+		return -EINVAL;
+
+	ret = mnt_want_write_file(file_out);
+	if (ret)
+		return ret;
+
+	ret = file_in->f_op->clone_file_range(file_in, pos_in,
+			file_out, pos_out, len);
+	if (!ret) {
+		fsnotify_access(file_in);
+		fsnotify_modify(file_out);
+	}
+
+	mnt_drop_write_file(file_out);
+	return ret;
+}
+EXPORT_SYMBOL(vfs_clone_file_range);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index af559ac..59bf96d 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1629,7 +1629,10 @@ struct file_operations {
 #ifndef CONFIG_MMU
 	unsigned (*mmap_capabilities)(struct file *);
 #endif
-	ssize_t (*copy_file_range)(struct file *, loff_t, struct file *, loff_t, size_t, unsigned int);
+	ssize_t (*copy_file_range)(struct file *, loff_t, struct file *,
+			loff_t, size_t, unsigned int);
+	int (*clone_file_range)(struct file *, loff_t, struct file *, loff_t,
+			u64);
 };
 
 struct inode_operations {
@@ -1683,6 +1686,8 @@ extern ssize_t vfs_writev(struct file *, const struct iovec __user *,
 		unsigned long, loff_t *);
 extern ssize_t vfs_copy_file_range(struct file *, loff_t , struct file *,
 				   loff_t, size_t, unsigned int);
+extern int vfs_clone_file_range(struct file *file_in, loff_t pos_in,
+		struct file *file_out, loff_t pos_out, u64 len);
 
 struct super_operations {
    	struct inode *(*alloc_inode)(struct super_block *sb);
diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
index f15d980..cd5db7f 100644
--- a/include/uapi/linux/fs.h
+++ b/include/uapi/linux/fs.h
@@ -39,6 +39,13 @@
 #define RENAME_EXCHANGE		(1 << 1)	/* Exchange source and dest */
 #define RENAME_WHITEOUT		(1 << 2)	/* Whiteout source */
 
+struct file_clone_range {
+	__s64 src_fd;
+	__u64 src_offset;
+	__u64 src_length;
+	__u64 dest_offset;
+};
+
 struct fstrim_range {
 	__u64 start;
 	__u64 len;
@@ -159,6 +166,8 @@ struct inodes_stat_t {
 #define FIFREEZE	_IOWR('X', 119, int)	/* Freeze */
 #define FITHAW		_IOWR('X', 120, int)	/* Thaw */
 #define FITRIM		_IOWR('X', 121, struct fstrim_range)	/* Trim */
+#define FICLONE		_IOW(0x94, 9, int)
+#define FICLONERANGE	_IOW(0x94, 13, struct file_clone_range)
 
 #define	FS_IOC_GETFLAGS			_IOR('f', 1, long)
 #define	FS_IOC_SETFLAGS			_IOW('f', 2, long)
-- 
1.9.1
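
The range validation that vfs_clone_file_range() performs in fs/read_write.c before calling into the file system can be restated as plain userspace C. This is a hypothetical sketch: clone_verify_range(), clone_check() and OFFSET_MAX_V are illustrative names, not kernel APIs, and the mandatory-lock and security_file_permission() steps of clone_verify_area() are deliberately left out.

```c
#include <errno.h>
#include <stdint.h>

/* Illustrative stand-in for the kernel's OFFSET_MAX (largest loff_t). */
#define OFFSET_MAX_V INT64_MAX

/*
 * Model of the offset checks in clone_verify_area(): reject a negative
 * start and any range whose end wraps past the largest signed 64-bit
 * offset.  On success *end is the last byte that would be lock-checked;
 * len == 0 (whole-file clone) means "through OFFSET_MAX".
 */
static int clone_verify_range(int64_t pos, uint64_t len, int64_t *end)
{
	if (pos < 0)
		return -EINVAL;
	if ((int64_t)((uint64_t)pos + len) < 0)
		return -EINVAL;
	*end = len ? (int64_t)((uint64_t)pos + len - 1) : OFFSET_MAX_V;
	return 0;
}

/*
 * Model of the sequence in vfs_clone_file_range(): verify both sides,
 * then require the source range to end within the source's i_size.
 * As in the kernel, the size comparison is done unsigned.
 */
static int clone_check(int64_t pos_in, int64_t pos_out, uint64_t len,
		       int64_t i_size_in)
{
	int64_t end;
	int ret;

	ret = clone_verify_range(pos_in, len, &end);
	if (ret)
		return ret;
	ret = clone_verify_range(pos_out, len, &end);
	if (ret)
		return ret;
	if ((uint64_t)pos_in + len > (uint64_t)i_size_in)
		return -EINVAL;
	return 0;
}
```

Note that a len of 0, which FICLONE passes down, is treated as "through OFFSET_MAX" rather than as an empty range, so in that case the i_size check only constrains pos_in.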



* [PATCH 2/4] vfs: pull btrfs clone API to vfs layer
@ 2015-12-03 11:59   ` Christoph Hellwig
  0 siblings, 0 replies; 27+ messages in thread
From: Christoph Hellwig @ 2015-12-03 11:59 UTC (permalink / raw)
  To: viro
  Cc: tao.peng, jeff.layton, bfields, linux-fsdevel, linux-btrfs,
	linux-nfs, linux-cifs

The btrfs clone ioctls are now being adopted by other file systems: NFS
and CIFS already support them, and XFS support is under active
development.  To avoid a growing number of slightly incompatible
implementations, add a single one to the VFS.  Note that clones differ
from file copies in several ways:

 - they are atomic vs other writers
 - they support whole file clones
 - they support 64-bit length clones
 - they do not allow partial success (aka short writes)
 - clones are expected to be a fast metadata operation

Because of that it would be rather cumbersome to try to piggyback them on
top of the recent copy_file_range infrastructure.  The converse isn't
true: the copy_file_range system call could try a clone as a first
attempt to copy, something that further patches will enable.

Based on earlier work from Peng Tao.
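
From userspace, the two ioctls added here can be driven as in the minimal sketch below. It assumes Linux kernel headers and defines FICLONE, FICLONERANGE and struct file_clone_range locally only when <linux/fs.h> has not already provided them; clone_whole_file() and clone_range() are hypothetical helper names. The ioctl is issued on the destination descriptor and the source descriptor travels in the argument. The request numbers reuse the btrfs ioctl magic (0x94) and command numbers 9 and 13 with an identically laid-out argument struct, so binaries built against BTRFS_IOC_CLONE and BTRFS_IOC_CLONE_RANGE issue the same requests.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/ioctl.h>

#ifndef FICLONE
/* Mirrors the uapi additions to include/uapi/linux/fs.h in this patch. */
struct file_clone_range {
	int64_t  src_fd;
	uint64_t src_offset;
	uint64_t src_length;	/* 0: clone through the end of the source */
	uint64_t dest_offset;
};
#define FICLONE		_IOW(0x94, 9, int)
#define FICLONERANGE	_IOW(0x94, 13, struct file_clone_range)
#endif

/* Clone all of src_fd into dst_fd; returns 0, or -1 with errno set. */
static int clone_whole_file(int dst_fd, int src_fd)
{
	return ioctl(dst_fd, FICLONE, src_fd);
}

/* Clone len bytes at src_off in src_fd to dst_off in dst_fd. */
static int clone_range(int dst_fd, int src_fd, uint64_t src_off,
		       uint64_t len, uint64_t dst_off)
{
	struct file_clone_range args = {
		.src_fd = src_fd,
		.src_offset = src_off,
		.src_length = len,
		.dest_offset = dst_off,
	};

	return ioctl(dst_fd, FICLONERANGE, &args);
}
```

In this version of the patch, vfs_clone_file_range() reports a file system without ->clone_file_range as EBADF (the same error as for wrong open modes), cross-mount clones as EXDEV, and directories or other non-regular files as EISDIR or EOPNOTSUPP, so a caller that falls back to an ordinary copy has to be prepared for each of these.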

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/ctree.h        |   3 +-
 fs/btrfs/file.c         |   1 +
 fs/btrfs/ioctl.c        |  49 ++-----------------
 fs/cifs/cifsfs.c        |  63 ++++++++++++++++++++++++
 fs/cifs/cifsfs.h        |   1 -
 fs/cifs/ioctl.c         | 126 +++++++++++++++++++++++-------------------------
 fs/ioctl.c              |  29 +++++++++++
 fs/nfs/nfs4file.c       |  87 ++++-----------------------------
 fs/read_write.c         |  72 +++++++++++++++++++++++++++
 include/linux/fs.h      |   7 ++-
 include/uapi/linux/fs.h |   9 ++++
 11 files changed, 254 insertions(+), 193 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index ede7277..dd4733f 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -4025,7 +4025,6 @@ void btrfs_get_block_group_info(struct list_head *groups_list,
 void update_ioctl_balance_args(struct btrfs_fs_info *fs_info, int lock,
 			       struct btrfs_ioctl_balance_args *bargs);
 
-
 /* file.c */
 int btrfs_auto_defrag_init(void);
 void btrfs_auto_defrag_exit(void);
@@ -4058,6 +4057,8 @@ int btrfs_fdatawrite_range(struct inode *inode, loff_t start, loff_t end);
 ssize_t btrfs_copy_file_range(struct file *file_in, loff_t pos_in,
 			      struct file *file_out, loff_t pos_out,
 			      size_t len, unsigned int flags);
+int btrfs_clone_file_range(struct file *file_in, loff_t pos_in,
+			   struct file *file_out, loff_t pos_out, u64 len);
 
 /* tree-defrag.c */
 int btrfs_defrag_leaves(struct btrfs_trans_handle *trans,
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index e67fe6a..232e300 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -2925,6 +2925,7 @@ const struct file_operations btrfs_file_operations = {
 	.compat_ioctl	= btrfs_ioctl,
 #endif
 	.copy_file_range = btrfs_copy_file_range,
+	.clone_file_range = btrfs_clone_file_range,
 };
 
 void btrfs_auto_defrag_exit(void)
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 0f92735..85b1cae 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -3906,49 +3906,10 @@ ssize_t btrfs_copy_file_range(struct file *file_in, loff_t pos_in,
 	return ret;
 }
 
-static noinline long btrfs_ioctl_clone(struct file *file, unsigned long srcfd,
-				       u64 off, u64 olen, u64 destoff)
+int btrfs_clone_file_range(struct file *src_file, loff_t off,
+		struct file *dst_file, loff_t destoff, u64 len)
 {
-	struct fd src_file;
-	int ret;
-
-	/* the destination must be opened for writing */
-	if (!(file->f_mode & FMODE_WRITE) || (file->f_flags & O_APPEND))
-		return -EINVAL;
-
-	ret = mnt_want_write_file(file);
-	if (ret)
-		return ret;
-
-	src_file = fdget(srcfd);
-	if (!src_file.file) {
-		ret = -EBADF;
-		goto out_drop_write;
-	}
-
-	/* the src must be open for reading */
-	if (!(src_file.file->f_mode & FMODE_READ)) {
-		ret = -EINVAL;
-		goto out_fput;
-	}
-
-	ret = btrfs_clone_files(file, src_file.file, off, olen, destoff);
-
-out_fput:
-	fdput(src_file);
-out_drop_write:
-	mnt_drop_write_file(file);
-	return ret;
-}
-
-static long btrfs_ioctl_clone_range(struct file *file, void __user *argp)
-{
-	struct btrfs_ioctl_clone_range_args args;
-
-	if (copy_from_user(&args, argp, sizeof(args)))
-		return -EFAULT;
-	return btrfs_ioctl_clone(file, args.src_fd, args.src_offset,
-				 args.src_length, args.dest_offset);
+	return btrfs_clone_files(dst_file, src_file, off, len, destoff);
 }
 
 /*
@@ -5498,10 +5459,6 @@ long btrfs_ioctl(struct file *file, unsigned int
 		return btrfs_ioctl_dev_info(root, argp);
 	case BTRFS_IOC_BALANCE:
 		return btrfs_ioctl_balance(file, NULL);
-	case BTRFS_IOC_CLONE:
-		return btrfs_ioctl_clone(file, arg, 0, 0, 0);
-	case BTRFS_IOC_CLONE_RANGE:
-		return btrfs_ioctl_clone_range(file, argp);
 	case BTRFS_IOC_TRANS_START:
 		return btrfs_ioctl_trans_start(file);
 	case BTRFS_IOC_TRANS_END:
diff --git a/fs/cifs/cifsfs.c b/fs/cifs/cifsfs.c
index cbc0f4b..e9b978f 100644
--- a/fs/cifs/cifsfs.c
+++ b/fs/cifs/cifsfs.c
@@ -914,6 +914,61 @@ const struct inode_operations cifs_symlink_inode_ops = {
 #endif
 };
 
+static int cifs_clone_file_range(struct file *src_file, loff_t off,
+		struct file *dst_file, loff_t destoff, u64 len)
+{
+	struct inode *src_inode = file_inode(src_file);
+	struct inode *target_inode = file_inode(dst_file);
+	struct cifsFileInfo *smb_file_src = src_file->private_data;
+	struct cifsFileInfo *smb_file_target = dst_file->private_data;
+	struct cifs_tcon *src_tcon = tlink_tcon(smb_file_src->tlink);
+	struct cifs_tcon *target_tcon = tlink_tcon(smb_file_target->tlink);
+	unsigned int xid;
+	int rc;
+
+	cifs_dbg(FYI, "clone range\n");
+
+	xid = get_xid();
+
+	if (!src_file->private_data || !dst_file->private_data) {
+		rc = -EBADF;
+		cifs_dbg(VFS, "missing cifsFileInfo on copy range src file\n");
+		goto out;
+	}
+
+	/*
+	 * Note: cifs case is easier than btrfs since server responsible for
+	 * checks for proper open modes and file type and if it wants
+	 * server could even support copy of range where source = target
+	 */
+	lock_two_nondirectories(target_inode, src_inode);
+
+	if (len == 0)
+		len = src_inode->i_size - off;
+
+	cifs_dbg(FYI, "about to flush pages\n");
+	/* should we flush first and last page first */
+	truncate_inode_pages_range(&target_inode->i_data, destoff,
+				   PAGE_CACHE_ALIGN(destoff + len)-1);
+
+	if (target_tcon->ses->server->ops->duplicate_extents)
+		rc = target_tcon->ses->server->ops->duplicate_extents(xid,
+			smb_file_src, smb_file_target, off, len, destoff);
+	else
+		rc = -EOPNOTSUPP;
+
+	/* force revalidate of size and timestamps of target file now
+	   that target is updated on the server */
+	CIFS_I(target_inode)->time = 0;
+out_unlock:
+	/* although unlocking in the reverse order from locking is not
+	   strictly necessary here it is a little cleaner to be consistent */
+	unlock_two_nondirectories(src_inode, target_inode);
+out:
+	free_xid(xid);
+	return rc;
+}
+
 const struct file_operations cifs_file_ops = {
 	.read_iter = cifs_loose_read_iter,
 	.write_iter = cifs_file_write_iter,
@@ -926,6 +981,7 @@ const struct file_operations cifs_file_ops = {
 	.splice_read = generic_file_splice_read,
 	.llseek = cifs_llseek,
 	.unlocked_ioctl	= cifs_ioctl,
+	.clone_file_range = cifs_clone_file_range,
 	.setlease = cifs_setlease,
 	.fallocate = cifs_fallocate,
 };
@@ -942,6 +998,8 @@ const struct file_operations cifs_file_strict_ops = {
 	.splice_read = generic_file_splice_read,
 	.llseek = cifs_llseek,
 	.unlocked_ioctl	= cifs_ioctl,
+	.clone_file_range = cifs_clone_file_range,
+	.clone_file_range = cifs_clone_file_range,
 	.setlease = cifs_setlease,
 	.fallocate = cifs_fallocate,
 };
@@ -958,6 +1016,7 @@ const struct file_operations cifs_file_direct_ops = {
 	.mmap = cifs_file_mmap,
 	.splice_read = generic_file_splice_read,
 	.unlocked_ioctl  = cifs_ioctl,
+	.clone_file_range = cifs_clone_file_range,
 	.llseek = cifs_llseek,
 	.setlease = cifs_setlease,
 	.fallocate = cifs_fallocate,
@@ -974,6 +1033,7 @@ const struct file_operations cifs_file_nobrl_ops = {
 	.splice_read = generic_file_splice_read,
 	.llseek = cifs_llseek,
 	.unlocked_ioctl	= cifs_ioctl,
+	.clone_file_range = cifs_clone_file_range,
 	.setlease = cifs_setlease,
 	.fallocate = cifs_fallocate,
 };
@@ -989,6 +1049,7 @@ const struct file_operations cifs_file_strict_nobrl_ops = {
 	.splice_read = generic_file_splice_read,
 	.llseek = cifs_llseek,
 	.unlocked_ioctl	= cifs_ioctl,
+	.clone_file_range = cifs_clone_file_range,
 	.setlease = cifs_setlease,
 	.fallocate = cifs_fallocate,
 };
@@ -1004,6 +1065,7 @@ const struct file_operations cifs_file_direct_nobrl_ops = {
 	.mmap = cifs_file_mmap,
 	.splice_read = generic_file_splice_read,
 	.unlocked_ioctl  = cifs_ioctl,
+	.clone_file_range = cifs_clone_file_range,
 	.llseek = cifs_llseek,
 	.setlease = cifs_setlease,
 	.fallocate = cifs_fallocate,
@@ -1014,6 +1076,7 @@ const struct file_operations cifs_dir_ops = {
 	.release = cifs_closedir,
 	.read    = generic_read_dir,
 	.unlocked_ioctl  = cifs_ioctl,
+	.clone_file_range = cifs_clone_file_range,
 	.llseek = generic_file_llseek,
 };
 
diff --git a/fs/cifs/cifsfs.h b/fs/cifs/cifsfs.h
index c3cc160..c399513 100644
--- a/fs/cifs/cifsfs.h
+++ b/fs/cifs/cifsfs.h
@@ -131,7 +131,6 @@ extern int	cifs_setxattr(struct dentry *, const char *, const void *,
 extern ssize_t	cifs_getxattr(struct dentry *, const char *, void *, size_t);
 extern ssize_t	cifs_listxattr(struct dentry *, char *, size_t);
 extern long cifs_ioctl(struct file *filep, unsigned int cmd, unsigned long arg);
-
 #ifdef CONFIG_CIFS_NFSD_EXPORT
 extern const struct export_operations cifs_export_ops;
 #endif /* CONFIG_CIFS_NFSD_EXPORT */
diff --git a/fs/cifs/ioctl.c b/fs/cifs/ioctl.c
index 35cf990..7a3b84e 100644
--- a/fs/cifs/ioctl.c
+++ b/fs/cifs/ioctl.c
@@ -34,73 +34,36 @@
 #include "cifs_ioctl.h"
 #include <linux/btrfs.h>
 
-static long cifs_ioctl_clone(unsigned int xid, struct file *dst_file,
-			unsigned long srcfd, u64 off, u64 len, u64 destoff,
-			bool dup_extents)
+static int cifs_file_clone_range(unsigned int xid, struct file *src_file,
+			  struct file *dst_file)
 {
-	int rc;
-	struct cifsFileInfo *smb_file_target = dst_file->private_data;
+	struct inode *src_inode = file_inode(src_file);
 	struct inode *target_inode = file_inode(dst_file);
-	struct cifs_tcon *target_tcon;
-	struct fd src_file;
 	struct cifsFileInfo *smb_file_src;
-	struct inode *src_inode;
+	struct cifsFileInfo *smb_file_target;
 	struct cifs_tcon *src_tcon;
+	struct cifs_tcon *target_tcon;
+	int rc;
 
 	cifs_dbg(FYI, "ioctl clone range\n");
-	/* the destination must be opened for writing */
-	if (!(dst_file->f_mode & FMODE_WRITE)) {
-		cifs_dbg(FYI, "file target not open for write\n");
-		return -EINVAL;
-	}
 
-	/* check if target volume is readonly and take reference */
-	rc = mnt_want_write_file(dst_file);
-	if (rc) {
-		cifs_dbg(FYI, "mnt_want_write failed with rc %d\n", rc);
-		return rc;
-	}
-
-	src_file = fdget(srcfd);
-	if (!src_file.file) {
-		rc = -EBADF;
-		goto out_drop_write;
-	}
-
-	if (src_file.file->f_op->unlocked_ioctl != cifs_ioctl) {
-		rc = -EBADF;
-		cifs_dbg(VFS, "src file seems to be from a different filesystem type\n");
-		goto out_fput;
-	}
-
-	if ((!src_file.file->private_data) || (!dst_file->private_data)) {
+	if (!src_file->private_data || !dst_file->private_data) {
 		rc = -EBADF;
 		cifs_dbg(VFS, "missing cifsFileInfo on copy range src file\n");
-		goto out_fput;
+		goto out;
 	}
 
 	rc = -EXDEV;
 	smb_file_target = dst_file->private_data;
-	smb_file_src = src_file.file->private_data;
+	smb_file_src = src_file->private_data;
 	src_tcon = tlink_tcon(smb_file_src->tlink);
 	target_tcon = tlink_tcon(smb_file_target->tlink);
 
-	/* check source and target on same server (or volume if dup_extents) */
-	if (dup_extents && (src_tcon != target_tcon)) {
-		cifs_dbg(VFS, "source and target of copy not on same share\n");
-		goto out_fput;
-	}
-
-	if (!dup_extents && (src_tcon->ses != target_tcon->ses)) {
+	if (src_tcon->ses != target_tcon->ses) {
 		cifs_dbg(VFS, "source and target of copy not on same server\n");
-		goto out_fput;
+		goto out;
 	}
 
-	src_inode = file_inode(src_file.file);
-	rc = -EINVAL;
-	if (S_ISDIR(src_inode->i_mode))
-		goto out_fput;
-
 	/*
 	 * Note: cifs case is easier than btrfs since server responsible for
 	 * checks for proper open modes and file type and if it wants
@@ -108,34 +71,66 @@ static long cifs_ioctl_clone(unsigned int xid, struct file *dst_file,
 	 */
 	lock_two_nondirectories(target_inode, src_inode);
 
-	/* determine range to clone */
-	rc = -EINVAL;
-	if (off + len > src_inode->i_size || off + len < off)
-		goto out_unlock;
-	if (len == 0)
-		len = src_inode->i_size - off;
-
 	cifs_dbg(FYI, "about to flush pages\n");
 	/* should we flush first and last page first */
-	truncate_inode_pages_range(&target_inode->i_data, destoff,
-				   PAGE_CACHE_ALIGN(destoff + len)-1);
+	truncate_inode_pages(&target_inode->i_data, 0);
 
-	if (dup_extents && target_tcon->ses->server->ops->duplicate_extents)
-		rc = target_tcon->ses->server->ops->duplicate_extents(xid,
-			smb_file_src, smb_file_target, off, len, destoff);
-	else if (!dup_extents && target_tcon->ses->server->ops->clone_range)
+	if (target_tcon->ses->server->ops->clone_range)
 		rc = target_tcon->ses->server->ops->clone_range(xid,
-			smb_file_src, smb_file_target, off, len, destoff);
+			smb_file_src, smb_file_target, 0, src_inode->i_size, 0);
 	else
 		rc = -EOPNOTSUPP;
 
 	/* force revalidate of size and timestamps of target file now
 	   that target is updated on the server */
 	CIFS_I(target_inode)->time = 0;
-out_unlock:
 	/* although unlocking in the reverse order from locking is not
 	   strictly necessary here it is a little cleaner to be consistent */
 	unlock_two_nondirectories(src_inode, target_inode);
+out:
+	return rc;
+}
+
+static long cifs_ioctl_clone(unsigned int xid, struct file *dst_file,
+			unsigned long srcfd)
+{
+	int rc;
+	struct fd src_file;
+	struct inode *src_inode;
+
+	cifs_dbg(FYI, "ioctl clone range\n");
+	/* the destination must be opened for writing */
+	if (!(dst_file->f_mode & FMODE_WRITE)) {
+		cifs_dbg(FYI, "file target not open for write\n");
+		return -EINVAL;
+	}
+
+	/* check if target volume is readonly and take reference */
+	rc = mnt_want_write_file(dst_file);
+	if (rc) {
+		cifs_dbg(FYI, "mnt_want_write failed with rc %d\n", rc);
+		return rc;
+	}
+
+	src_file = fdget(srcfd);
+	if (!src_file.file) {
+		rc = -EBADF;
+		goto out_drop_write;
+	}
+
+	if (src_file.file->f_op->unlocked_ioctl != cifs_ioctl) {
+		rc = -EBADF;
+		cifs_dbg(VFS, "src file seems to be from a different filesystem type\n");
+		goto out_fput;
+	}
+
+	src_inode = file_inode(src_file.file);
+	rc = -EINVAL;
+	if (S_ISDIR(src_inode->i_mode))
+		goto out_fput;
+
+	rc = cifs_file_clone_range(xid, src_file.file, dst_file);
+
 out_fput:
 	fdput(src_file);
 out_drop_write:
@@ -256,10 +251,7 @@ long cifs_ioctl(struct file *filep, unsigned int command, unsigned long arg)
 			}
 			break;
 		case CIFS_IOC_COPYCHUNK_FILE:
-			rc = cifs_ioctl_clone(xid, filep, arg, 0, 0, 0, false);
-			break;
-		case BTRFS_IOC_CLONE:
-			rc = cifs_ioctl_clone(xid, filep, arg, 0, 0, 0, true);
+			rc = cifs_ioctl_clone(xid, filep, arg);
 			break;
 		case CIFS_IOC_SET_INTEGRITY:
 			if (pSMBFile == NULL)
diff --git a/fs/ioctl.c b/fs/ioctl.c
index 5d01d26..84c6e79 100644
--- a/fs/ioctl.c
+++ b/fs/ioctl.c
@@ -215,6 +215,29 @@ static int ioctl_fiemap(struct file *filp, unsigned long arg)
 	return error;
 }
 
+static long ioctl_file_clone(struct file *dst_file, unsigned long srcfd,
+			     u64 off, u64 olen, u64 destoff)
+{
+	struct fd src_file = fdget(srcfd);
+	int ret;
+
+	if (!src_file.file)
+		return -EBADF;
+	ret = vfs_clone_file_range(src_file.file, off, dst_file, destoff, olen);
+	fdput(src_file);
+	return ret;
+}
+
+static long ioctl_file_clone_range(struct file *file, void __user *argp)
+{
+	struct file_clone_range args;
+
+	if (copy_from_user(&args, argp, sizeof(args)))
+		return -EFAULT;
+	return ioctl_file_clone(file, args.src_fd, args.src_offset,
+				args.src_length, args.dest_offset);
+}
+
 #ifdef CONFIG_BLOCK
 
 static inline sector_t logical_to_blk(struct inode *inode, loff_t offset)
@@ -600,6 +623,12 @@ int do_vfs_ioctl(struct file *filp, unsigned int fd, unsigned int cmd,
 	case FIGETBSZ:
 		return put_user(inode->i_sb->s_blocksize, argp);
 
+	case FICLONE:
+		return ioctl_file_clone(filp, arg, 0, 0, 0);
+
+	case FICLONERANGE:
+		return ioctl_file_clone_range(filp, argp);
+
 	default:
 		if (S_ISREG(inode->i_mode))
 			error = file_ioctl(filp, cmd, arg);
diff --git a/fs/nfs/nfs4file.c b/fs/nfs/nfs4file.c
index db9b5fe..26f9a23 100644
--- a/fs/nfs/nfs4file.c
+++ b/fs/nfs/nfs4file.c
@@ -195,65 +195,27 @@ static long nfs42_fallocate(struct file *filep, int mode, loff_t offset, loff_t
 	return nfs42_proc_allocate(filep, offset, len);
 }
 
-static noinline long
-nfs42_ioctl_clone(struct file *dst_file, unsigned long srcfd,
-		  u64 src_off, u64 dst_off, u64 count)
+static int nfs42_clone_file_range(struct file *src_file, loff_t src_off,
+		struct file *dst_file, loff_t dst_off, u64 count)
 {
 	struct inode *dst_inode = file_inode(dst_file);
 	struct nfs_server *server = NFS_SERVER(dst_inode);
-	struct fd src_file;
-	struct inode *src_inode;
+	struct inode *src_inode = file_inode(src_file);
 	unsigned int bs = server->clone_blksize;
 	bool same_inode = false;
 	int ret;
 
-	/* dst file must be opened for writing */
-	if (!(dst_file->f_mode & FMODE_WRITE))
-		return -EINVAL;
-
-	ret = mnt_want_write_file(dst_file);
-	if (ret)
-		return ret;
-
-	src_file = fdget(srcfd);
-	if (!src_file.file) {
-		ret = -EBADF;
-		goto out_drop_write;
-	}
-
-	src_inode = file_inode(src_file.file);
-
-	if (src_inode == dst_inode)
-		same_inode = true;
-
-	/* src file must be opened for reading */
-	if (!(src_file.file->f_mode & FMODE_READ))
-		goto out_fput;
-
-	/* src and dst must be regular files */
-	ret = -EISDIR;
-	if (!S_ISREG(src_inode->i_mode) || !S_ISREG(dst_inode->i_mode))
-		goto out_fput;
-
-	ret = -EXDEV;
-	if (src_file.file->f_path.mnt != dst_file->f_path.mnt ||
-	    src_inode->i_sb != dst_inode->i_sb)
-		goto out_fput;
-
 	/* check alignment w.r.t. clone_blksize */
 	ret = -EINVAL;
 	if (bs) {
 		if (!IS_ALIGNED(src_off, bs) || !IS_ALIGNED(dst_off, bs))
-			goto out_fput;
+			goto out;
 		if (!IS_ALIGNED(count, bs) && i_size_read(src_inode) != (src_off + count))
-			goto out_fput;
+			goto out;
 	}
 
-	/* verify if ranges are overlapped within the same file */
-	if (same_inode) {
-		if (dst_off + count > src_off && dst_off < src_off + count)
-			goto out_fput;
-	}
+	if (src_inode == dst_inode)
+		same_inode = true;
 
 	/* XXX: do we lock at all? what if server needs CB_RECALL_LAYOUT? */
 	if (same_inode) {
@@ -275,7 +237,7 @@ nfs42_ioctl_clone(struct file *dst_file, unsigned long srcfd,
 	if (ret)
 		goto out_unlock;
 
-	ret = nfs42_proc_clone(src_file.file, dst_file, src_off, dst_off, count);
+	ret = nfs42_proc_clone(src_file, dst_file, src_off, dst_off, count);
 
 	/* truncate inode page cache of the dst range so that future reads can fetch
 	 * new data from server */
@@ -292,37 +254,9 @@ out_unlock:
 		mutex_unlock(&dst_inode->i_mutex);
 		mutex_unlock(&src_inode->i_mutex);
 	}
-out_fput:
-	fdput(src_file);
-out_drop_write:
-	mnt_drop_write_file(dst_file);
+out:
 	return ret;
 }
-
-static long nfs42_ioctl_clone_range(struct file *dst_file, void __user *argp)
-{
-	struct btrfs_ioctl_clone_range_args args;
-
-	if (copy_from_user(&args, argp, sizeof(args)))
-		return -EFAULT;
-
-	return nfs42_ioctl_clone(dst_file, args.src_fd, args.src_offset,
-				 args.dest_offset, args.src_length);
-}
-
-long nfs4_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
-{
-	void __user *argp = (void __user *)arg;
-
-	switch (cmd) {
-	case BTRFS_IOC_CLONE:
-		return nfs42_ioctl_clone(file, arg, 0, 0, 0);
-	case BTRFS_IOC_CLONE_RANGE:
-		return nfs42_ioctl_clone_range(file, argp);
-	}
-
-	return -ENOTTY;
-}
 #endif /* CONFIG_NFS_V4_2 */
 
 const struct file_operations nfs4_file_operations = {
@@ -342,8 +276,7 @@ const struct file_operations nfs4_file_operations = {
 #ifdef CONFIG_NFS_V4_2
 	.llseek		= nfs4_file_llseek,
 	.fallocate	= nfs42_fallocate,
-	.unlocked_ioctl = nfs4_ioctl,
-	.compat_ioctl	= nfs4_ioctl,
+	.clone_file_range = nfs42_clone_file_range,
 #else
 	.llseek		= nfs_file_llseek,
 #endif
diff --git a/fs/read_write.c b/fs/read_write.c
index 6c1aa73..9e3dd8f 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1451,3 +1451,75 @@ out1:
 out2:
 	return ret;
 }
+
+static int clone_verify_area(struct file *file, loff_t pos, u64 len, bool write)
+{
+	struct inode *inode = file_inode(file);
+
+	if (unlikely(pos < 0))
+		return -EINVAL;
+
+	if (unlikely((loff_t) (pos + len) < 0))
+		return -EINVAL;
+
+	if (unlikely(inode->i_flctx && mandatory_lock(inode))) {
+		loff_t end = len ? pos + len - 1 : OFFSET_MAX;
+		int retval;
+
+		retval = locks_mandatory_area(file, pos, end,
+				write ? F_WRLCK : F_RDLCK);
+		if (retval < 0)
+			return retval;
+	}
+
+	return security_file_permission(file, write ? MAY_WRITE : MAY_READ);
+}
+
+int vfs_clone_file_range(struct file *file_in, loff_t pos_in,
+		struct file *file_out, loff_t pos_out, u64 len)
+{
+	struct inode *inode_in = file_inode(file_in);
+	struct inode *inode_out = file_inode(file_out);
+	int ret;
+
+	if (inode_in->i_sb != inode_out->i_sb ||
+	    file_in->f_path.mnt != file_out->f_path.mnt)
+		return -EXDEV;
+
+	if (S_ISDIR(inode_in->i_mode) || S_ISDIR(inode_out->i_mode))
+		return -EISDIR;
+	if (!S_ISREG(inode_in->i_mode) || !S_ISREG(inode_out->i_mode))
+		return -EOPNOTSUPP;
+
+	if (!(file_in->f_mode & FMODE_READ) ||
+	    !(file_out->f_mode & FMODE_WRITE) ||
+	    (file_out->f_flags & O_APPEND) ||
+	    !file_in->f_op->clone_file_range)
+		return -EBADF;
+
+	ret = clone_verify_area(file_in, pos_in, len, false);
+	if (ret)
+		return ret;
+
+	ret = clone_verify_area(file_out, pos_out, len, true);
+	if (ret)
+		return ret;
+
+	if (pos_in + len > i_size_read(inode_in))
+		return -EINVAL;
+
+	ret = mnt_want_write_file(file_out);
+	if (ret)
+		return ret;
+
+	ret = file_in->f_op->clone_file_range(file_in, pos_in,
+			file_out, pos_out, len);
+	if (!ret) {
+		fsnotify_access(file_in);
+		fsnotify_modify(file_out);
+	}
+
+	mnt_drop_write_file(file_out);
+	return ret;
+}
+EXPORT_SYMBOL(vfs_clone_file_range);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index af559ac..59bf96d 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1629,7 +1629,10 @@ struct file_operations {
 #ifndef CONFIG_MMU
 	unsigned (*mmap_capabilities)(struct file *);
 #endif
-	ssize_t (*copy_file_range)(struct file *, loff_t, struct file *, loff_t, size_t, unsigned int);
+	ssize_t (*copy_file_range)(struct file *, loff_t, struct file *,
+			loff_t, size_t, unsigned int);
+	int (*clone_file_range)(struct file *, loff_t, struct file *, loff_t,
+			u64);
 };
 
 struct inode_operations {
@@ -1683,6 +1686,8 @@ extern ssize_t vfs_writev(struct file *, const struct iovec __user *,
 		unsigned long, loff_t *);
 extern ssize_t vfs_copy_file_range(struct file *, loff_t , struct file *,
 				   loff_t, size_t, unsigned int);
+extern int vfs_clone_file_range(struct file *file_in, loff_t pos_in,
+		struct file *file_out, loff_t pos_out, u64 len);
 
 struct super_operations {
    	struct inode *(*alloc_inode)(struct super_block *sb);
diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
index f15d980..cd5db7f 100644
--- a/include/uapi/linux/fs.h
+++ b/include/uapi/linux/fs.h
@@ -39,6 +39,13 @@
 #define RENAME_EXCHANGE		(1 << 1)	/* Exchange source and dest */
 #define RENAME_WHITEOUT		(1 << 2)	/* Whiteout source */
 
+struct file_clone_range {
+	__s64 src_fd;
+	__u64 src_offset;
+	__u64 src_length;
+	__u64 dest_offset;
+};
+
 struct fstrim_range {
 	__u64 start;
 	__u64 len;
@@ -159,6 +166,8 @@ struct inodes_stat_t {
 #define FIFREEZE	_IOWR('X', 119, int)	/* Freeze */
 #define FITHAW		_IOWR('X', 120, int)	/* Thaw */
 #define FITRIM		_IOWR('X', 121, struct fstrim_range)	/* Trim */
+#define FICLONE		_IOW(0x94, 9, int)
+#define FICLONERANGE	_IOW(0x94, 13, struct file_clone_range)
 
 #define	FS_IOC_GETFLAGS			_IOR('f', 1, long)
 #define	FS_IOC_SETFLAGS			_IOW('f', 2, long)
-- 
1.9.1
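
The clone_blksize rule that nfs42_clone_file_range() keeps, and the same-file overlap test that the removed lines applied, are small enough to restate as pure functions. This is a hypothetical userspace sketch: nfs_clone_aligned(), ranges_overlap() and IS_ALIGNED_V are illustrative names, and IS_ALIGNED_V, like the kernel's IS_ALIGNED(), is only correct for power-of-two block sizes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local equivalent of IS_ALIGNED(); a must be a power of two. */
#define IS_ALIGNED_V(x, a) (((x) & ((uint64_t)(a) - 1)) == 0)

/*
 * Model of the clone_blksize check: with a non-zero server block size,
 * both offsets must be block aligned, and the length must be too unless
 * the range ends exactly at the source file's EOF.
 */
static bool nfs_clone_aligned(uint64_t src_off, uint64_t dst_off,
			      uint64_t count, uint64_t bs, uint64_t src_size)
{
	if (!bs)
		return true;	/* server advertises no alignment */
	if (!IS_ALIGNED_V(src_off, bs) || !IS_ALIGNED_V(dst_off, bs))
		return false;
	if (!IS_ALIGNED_V(count, bs) && src_size != src_off + count)
		return false;
	return true;
}

/* The overlap test the removed code applied when src and dst are one inode. */
static bool ranges_overlap(uint64_t src_off, uint64_t dst_off, uint64_t count)
{
	return dst_off + count > src_off && dst_off < src_off + count;
}
```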


* [PATCH 3/4] nfsd: Pass filehandle to nfs4_preprocess_stateid_op()
  2015-12-03 11:59 move btrfs clone ioctls to common code V2 Christoph Hellwig
  2015-12-03 11:59   ` Christoph Hellwig
  2015-12-03 11:59   ` Christoph Hellwig
@ 2015-12-03 11:59 ` Christoph Hellwig
  2015-12-03 11:59 ` [PATCH 4/4] nfsd: implement the NFSv4.2 CLONE operation Christoph Hellwig
  3 siblings, 0 replies; 27+ messages in thread
From: Christoph Hellwig @ 2015-12-03 11:59 UTC (permalink / raw)
  To: viro
  Cc: tao.peng, jeff.layton, bfields, linux-fsdevel, linux-btrfs,
	linux-nfs, linux-cifs, Anna Schumaker, Anna Schumaker

From: Anna Schumaker <Anna.Schumaker@netapp.com>

This will be needed so COPY can look up the saved_fh in addition to the
current_fh.
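
Background on why the filehandle becomes a parameter: in an NFSv4 COMPOUND, SAVEFH copies the current filehandle into the saved slot, so a two-file operation such as COPY (or CLONE) takes its source from the saved filehandle and its destination from the current one. The toy model below is a hypothetical illustration only; struct compound, op_putfh(), op_savefh() and resolve_two_files() are stand-ins, not nfsd types or functions.

```c
#include <stddef.h>

/* Toy COMPOUND state: only the two filehandle slots, as opaque pointers. */
struct compound {
	const char *current_fh;
	const char *save_fh;
};

/* PUTFH: make fh the current filehandle. */
static void op_putfh(struct compound *c, const char *fh)
{
	c->current_fh = fh;
}

/* SAVEFH: copy the current filehandle into the saved slot. */
static void op_savefh(struct compound *c)
{
	c->save_fh = c->current_fh;
}

/*
 * A two-file op resolves its source from save_fh and its destination
 * from current_fh, which is why the stateid helper must be told which
 * filehandle to check.  Returns -1 where a real server would fail the op.
 */
static int resolve_two_files(const struct compound *c,
			     const char **src, const char **dst)
{
	if (c->save_fh == NULL || c->current_fh == NULL)
		return -1;
	*src = c->save_fh;
	*dst = c->current_fh;
	return 0;
}
```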

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: J. Bruce Fields <bfields@fieldses.org>
---
 fs/nfsd/nfs4proc.c  | 16 +++++++++-------
 fs/nfsd/nfs4state.c |  5 ++---
 fs/nfsd/state.h     |  4 ++--
 3 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index a9f096c..3ba10a3 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -774,8 +774,9 @@ nfsd4_read(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 		clear_bit(RQ_SPLICE_OK, &rqstp->rq_flags);
 
 	/* check stateid */
-	status = nfs4_preprocess_stateid_op(rqstp, cstate, &read->rd_stateid,
-			RD_STATE, &read->rd_filp, &read->rd_tmp_file);
+	status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate->current_fh,
+					&read->rd_stateid, RD_STATE,
+					&read->rd_filp, &read->rd_tmp_file);
 	if (status) {
 		dprintk("NFSD: nfsd4_read: couldn't process stateid!\n");
 		goto out;
@@ -921,7 +922,8 @@ nfsd4_setattr(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 
 	if (setattr->sa_iattr.ia_valid & ATTR_SIZE) {
 		status = nfs4_preprocess_stateid_op(rqstp, cstate,
-			&setattr->sa_stateid, WR_STATE, NULL, NULL);
+				&cstate->current_fh, &setattr->sa_stateid,
+				WR_STATE, NULL, NULL);
 		if (status) {
 			dprintk("NFSD: nfsd4_setattr: couldn't process stateid!\n");
 			return status;
@@ -985,8 +987,8 @@ nfsd4_write(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 	if (write->wr_offset >= OFFSET_MAX)
 		return nfserr_inval;
 
-	status = nfs4_preprocess_stateid_op(rqstp, cstate, stateid, WR_STATE,
-			&filp, NULL);
+	status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate->current_fh,
+						stateid, WR_STATE, &filp, NULL);
 	if (status) {
 		dprintk("NFSD: nfsd4_write: couldn't process stateid!\n");
 		return status;
@@ -1016,7 +1018,7 @@ nfsd4_fallocate(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 	__be32 status = nfserr_notsupp;
 	struct file *file;
 
-	status = nfs4_preprocess_stateid_op(rqstp, cstate,
+	status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate->current_fh,
 					    &fallocate->falloc_stateid,
 					    WR_STATE, &file, NULL);
 	if (status != nfs_ok) {
@@ -1055,7 +1057,7 @@ nfsd4_seek(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 	__be32 status;
 	struct file *file;
 
-	status = nfs4_preprocess_stateid_op(rqstp, cstate,
+	status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate->current_fh,
 					    &seek->seek_stateid,
 					    RD_STATE, &file, NULL);
 	if (status) {
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 6b800b5..df5dba6 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -4797,10 +4797,9 @@ nfs4_check_file(struct svc_rqst *rqstp, struct svc_fh *fhp, struct nfs4_stid *s,
  */
 __be32
 nfs4_preprocess_stateid_op(struct svc_rqst *rqstp,
-		struct nfsd4_compound_state *cstate, stateid_t *stateid,
-		int flags, struct file **filpp, bool *tmp_file)
+		struct nfsd4_compound_state *cstate, struct svc_fh *fhp,
+		stateid_t *stateid, int flags, struct file **filpp, bool *tmp_file)
 {
-	struct svc_fh *fhp = &cstate->current_fh;
 	struct inode *ino = d_inode(fhp->fh_dentry);
 	struct net *net = SVC_NET(rqstp);
 	struct nfsd_net *nn = net_generic(net, nfsd_net_id);
diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
index 77fdf4d..99432b7 100644
--- a/fs/nfsd/state.h
+++ b/fs/nfsd/state.h
@@ -578,8 +578,8 @@ struct nfsd4_compound_state;
 struct nfsd_net;
 
 extern __be32 nfs4_preprocess_stateid_op(struct svc_rqst *rqstp,
-		struct nfsd4_compound_state *cstate, stateid_t *stateid,
-		int flags, struct file **filp, bool *tmp_file);
+		struct nfsd4_compound_state *cstate, struct svc_fh *fhp,
+		stateid_t *stateid, int flags, struct file **filp, bool *tmp_file);
 __be32 nfsd4_lookup_stateid(struct nfsd4_compound_state *cstate,
 		     stateid_t *stateid, unsigned char typemask,
 		     struct nfs4_stid **s, struct nfsd_net *nn);
-- 
1.9.1
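Passing the filehandle in explicitly lets a caller validate a stateid against a filehandle other than current_fh. The CLONE operation in PATCH 4/4 relies on this. In NFSv4.2 the source is the saved filehandle (set by SAVEFH) and the destination is the current filehandle. The userspace sketch below models that COMPOUND sequence (PUTFH src, SAVEFH, PUTFH dst, CLONE). struct fake_fh, struct fake_cstate, putfh(), savefh() and clone_handles() are hypothetical stand-ins, not nfsd code.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a file handle: just an identifier. */
struct fake_fh {
	int id;
	bool valid;
};

/* Hypothetical stand-in for the per-COMPOUND state kept by the server. */
struct fake_cstate {
	struct fake_fh current_fh;	/* set by PUTFH */
	struct fake_fh save_fh;		/* set by SAVEFH */
};

/* PUTFH: make @id the current filehandle. */
static void putfh(struct fake_cstate *cs, int id)
{
	cs->current_fh.id = id;
	cs->current_fh.valid = true;
}

/* SAVEFH: copy the current filehandle into the saved filehandle. */
static void savefh(struct fake_cstate *cs)
{
	cs->save_fh = cs->current_fh;
}

/*
 * CLONE: the source is the saved filehandle and the destination is the
 * current one, so the stateid check must be told which filehandle to use
 * instead of always assuming current_fh.
 * Returns 0 on success, -1 if either filehandle has not been set.
 */
static int clone_handles(const struct fake_cstate *cs, int *src, int *dst)
{
	if (!cs->save_fh.valid || !cs->current_fh.valid)
		return -1;
	*src = cs->save_fh.id;
	*dst = cs->current_fh.id;
	return 0;
}
```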


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 4/4] nfsd: implement the NFSv4.2 CLONE operation
  2015-12-03 11:59 move btrfs clone ioctls to common code V2 Christoph Hellwig
                   ` (2 preceding siblings ...)
  2015-12-03 11:59 ` [PATCH 3/4] nfsd: Pass filehandle to nfs4_preprocess_stateid_op() Christoph Hellwig
@ 2015-12-03 11:59 ` Christoph Hellwig
  3 siblings, 0 replies; 27+ messages in thread
From: Christoph Hellwig @ 2015-12-03 11:59 UTC (permalink / raw)
  To: viro
  Cc: tao.peng, jeff.layton, bfields, linux-fsdevel, linux-btrfs,
	linux-nfs, linux-cifs

This is basically a remote version of the btrfs CLONE operation,
so the implementation is fairly trivial.  Made even more trivial
by stealing the XDR code and general framework from Anna Schumaker's
COPY prototype.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: J. Bruce Fields <bfields@fieldses.org>
---
 fs/nfsd/nfs4proc.c   | 47 +++++++++++++++++++++++++++++++++++++++++++++++
 fs/nfsd/nfs4xdr.c    | 21 +++++++++++++++++++++
 fs/nfsd/vfs.c        |  8 ++++++++
 fs/nfsd/vfs.h        |  2 ++
 fs/nfsd/xdr4.h       | 10 ++++++++++
 include/linux/nfs4.h |  4 ++--
 6 files changed, 90 insertions(+), 2 deletions(-)

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 3ba10a3..819ad81 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -1012,6 +1012,47 @@ nfsd4_write(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 }
 
 static __be32
+nfsd4_clone(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
+		struct nfsd4_clone *clone)
+{
+	struct file *src, *dst;
+	__be32 status;
+
+	status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate->save_fh,
+					    &clone->cl_src_stateid, RD_STATE,
+					    &src, NULL);
+	if (status) {
+		dprintk("NFSD: %s: couldn't process src stateid!\n", __func__);
+		goto out;
+	}
+
+	status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate->current_fh,
+					    &clone->cl_dst_stateid, WR_STATE,
+					    &dst, NULL);
+	if (status) {
+		dprintk("NFSD: %s: couldn't process dst stateid!\n", __func__);
+		goto out_put_src;
+	}
+
+	/* fix up for NFS-specific error code */
+	if (!S_ISREG(file_inode(src)->i_mode) ||
+	    !S_ISREG(file_inode(dst)->i_mode)) {
+		status = nfserr_wrong_type;
+		goto out_put_dst;
+	}
+
+	status = nfsd4_clone_file_range(src, clone->cl_src_pos,
+			dst, clone->cl_dst_pos, clone->cl_count);
+
+out_put_dst:
+	fput(dst);
+out_put_src:
+	fput(src);
+out:
+	return status;
+}
+
+static __be32
 nfsd4_fallocate(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 		struct nfsd4_fallocate *fallocate, int flags)
 {
@@ -2281,6 +2322,12 @@ static struct nfsd4_operation nfsd4_ops[] = {
 		.op_name = "OP_DEALLOCATE",
 		.op_rsize_bop = (nfsd4op_rsize)nfsd4_only_status_rsize,
 	},
+	[OP_CLONE] = {
+		.op_func = (nfsd4op_func)nfsd4_clone,
+		.op_flags = OP_MODIFIES_SOMETHING | OP_CACHEME,
+		.op_name = "OP_CLONE",
+		.op_rsize_bop = (nfsd4op_rsize)nfsd4_only_status_rsize,
+	},
 	[OP_SEEK] = {
 		.op_func = (nfsd4op_func)nfsd4_seek,
 		.op_name = "OP_SEEK",
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 51c9e9c..924416f 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -1675,6 +1675,25 @@ nfsd4_decode_fallocate(struct nfsd4_compoundargs *argp,
 }
 
 static __be32
+nfsd4_decode_clone(struct nfsd4_compoundargs *argp, struct nfsd4_clone *clone)
+{
+	DECODE_HEAD;
+
+	status = nfsd4_decode_stateid(argp, &clone->cl_src_stateid);
+	if (status)
+		return status;
+	status = nfsd4_decode_stateid(argp, &clone->cl_dst_stateid);
+	if (status)
+		return status;
+
+	READ_BUF(8 + 8 + 8);
+	p = xdr_decode_hyper(p, &clone->cl_src_pos);
+	p = xdr_decode_hyper(p, &clone->cl_dst_pos);
+	p = xdr_decode_hyper(p, &clone->cl_count);
+	DECODE_TAIL;
+}
+
+static __be32
 nfsd4_decode_seek(struct nfsd4_compoundargs *argp, struct nfsd4_seek *seek)
 {
 	DECODE_HEAD;
@@ -1785,6 +1804,7 @@ static nfsd4_dec nfsd4_dec_ops[] = {
 	[OP_READ_PLUS]		= (nfsd4_dec)nfsd4_decode_notsupp,
 	[OP_SEEK]		= (nfsd4_dec)nfsd4_decode_seek,
 	[OP_WRITE_SAME]		= (nfsd4_dec)nfsd4_decode_notsupp,
+	[OP_CLONE]		= (nfsd4_dec)nfsd4_decode_clone,
 };
 
 static inline bool
@@ -4292,6 +4312,7 @@ static nfsd4_enc nfsd4_enc_ops[] = {
 	[OP_READ_PLUS]		= (nfsd4_enc)nfsd4_encode_noop,
 	[OP_SEEK]		= (nfsd4_enc)nfsd4_encode_seek,
 	[OP_WRITE_SAME]		= (nfsd4_enc)nfsd4_encode_noop,
+	[OP_CLONE]		= (nfsd4_enc)nfsd4_encode_noop,
 };
 
 /*
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 994d66f..5411bf0 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -36,6 +36,7 @@
 #endif /* CONFIG_NFSD_V3 */
 
 #ifdef CONFIG_NFSD_V4
+#include "../internal.h"
 #include "acl.h"
 #include "idmap.h"
 #endif /* CONFIG_NFSD_V4 */
@@ -498,6 +499,13 @@ __be32 nfsd4_set_nfs4_label(struct svc_rqst *rqstp, struct svc_fh *fhp,
 }
 #endif
 
+__be32 nfsd4_clone_file_range(struct file *src, u64 src_pos, struct file *dst,
+		u64 dst_pos, u64 count)
+{
+	return nfserrno(vfs_clone_file_range(src, src_pos, dst, dst_pos,
+			count));
+}
+
 __be32 nfsd4_vfs_fallocate(struct svc_rqst *rqstp, struct svc_fh *fhp,
 			   struct file *file, loff_t offset, loff_t len,
 			   int flags)
diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h
index fcfc48c..c11ba31 100644
--- a/fs/nfsd/vfs.h
+++ b/fs/nfsd/vfs.h
@@ -56,6 +56,8 @@ __be32          nfsd4_set_nfs4_label(struct svc_rqst *, struct svc_fh *,
 		    struct xdr_netobj *);
 __be32		nfsd4_vfs_fallocate(struct svc_rqst *, struct svc_fh *,
 				    struct file *, loff_t, loff_t, int);
+__be32		nfsd4_clone_file_range(struct file *, u64, struct file *,
+			u64, u64);
 #endif /* CONFIG_NFSD_V4 */
 __be32		nfsd_create(struct svc_rqst *, struct svc_fh *,
 				char *name, int len, struct iattr *attrs,
diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h
index ce7362c..d955481 100644
--- a/fs/nfsd/xdr4.h
+++ b/fs/nfsd/xdr4.h
@@ -491,6 +491,15 @@ struct nfsd4_fallocate {
 	u64		falloc_length;
 };
 
+struct nfsd4_clone {
+	/* request */
+	stateid_t	cl_src_stateid;
+	stateid_t	cl_dst_stateid;
+	u64		cl_src_pos;
+	u64		cl_dst_pos;
+	u64		cl_count;
+};
+
 struct nfsd4_seek {
 	/* request */
 	stateid_t	seek_stateid;
@@ -555,6 +564,7 @@ struct nfsd4_op {
 		/* NFSv4.2 */
 		struct nfsd4_fallocate		allocate;
 		struct nfsd4_fallocate		deallocate;
+		struct nfsd4_clone		clone;
 		struct nfsd4_seek		seek;
 	} u;
 	struct nfs4_replay *			replay;
diff --git a/include/linux/nfs4.h b/include/linux/nfs4.h
index e7e7853..43aeabd 100644
--- a/include/linux/nfs4.h
+++ b/include/linux/nfs4.h
@@ -139,10 +139,10 @@ enum nfs_opnum4 {
 Needs to be updated if more operations are defined in future.*/
 
 #define FIRST_NFS4_OP	OP_ACCESS
-#define LAST_NFS4_OP 	OP_WRITE_SAME
 #define LAST_NFS40_OP	OP_RELEASE_LOCKOWNER
 #define LAST_NFS41_OP	OP_RECLAIM_COMPLETE
-#define LAST_NFS42_OP	OP_WRITE_SAME
+#define LAST_NFS42_OP	OP_CLONE
+#define LAST_NFS4_OP	LAST_NFS42_OP
 
 enum nfsstat4 {
 	NFS4_OK = 0,
-- 
1.9.1
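nfsd4_decode_clone() reads the two stateids and then reserves 8 + 8 + 8 bytes for three XDR hypers (cl_src_pos, cl_dst_pos, cl_count). XDR encodes a hyper as a big-endian 64-bit value. A minimal standalone sketch of decoding that tail follows. struct clone_args, decode_hyper() and decode_clone_tail() are hypothetical names, not the kernel's helpers.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the three 64-bit CLONE arguments. */
struct clone_args {
	uint64_t src_pos;
	uint64_t dst_pos;
	uint64_t count;
};

/* Decode one big-endian 64-bit XDR hyper and return the advanced pointer. */
static const uint8_t *decode_hyper(const uint8_t *p, uint64_t *val)
{
	uint64_t v = 0;

	for (int i = 0; i < 8; i++)
		v = (v << 8) | p[i];
	*val = v;
	return p + 8;
}

/*
 * Decode the three hypers from @buf of @len bytes.  Like READ_BUF(8 + 8 + 8),
 * refuse to read when fewer than 24 bytes remain.  Returns 0 or -1.
 */
static int decode_clone_tail(const uint8_t *buf, size_t len,
			     struct clone_args *args)
{
	const uint8_t *p = buf;

	if (len < 8 + 8 + 8)
		return -1;
	p = decode_hyper(p, &args->src_pos);
	p = decode_hyper(p, &args->dst_pos);
	decode_hyper(p, &args->count);
	return 0;
}
```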


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/4] vfs: pull btrfs clone API to vfs layer
@ 2015-12-07  0:53     ` Darrick J. Wong
  0 siblings, 0 replies; 27+ messages in thread
From: Darrick J. Wong @ 2015-12-07  0:53 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: viro, tao.peng, jeff.layton, bfields, linux-fsdevel, linux-btrfs,
	linux-nfs, linux-cifs

On Thu, Dec 03, 2015 at 12:59:50PM +0100, Christoph Hellwig wrote:
> The btrfs clone ioctls are now adopted by other file systems, with NFS
> and CIFS already having support for them, and XFS being under active
> development.  To avoid growth of various slightly incompatible
> implementations, add one to the VFS.  Note that clones are different from
> file copies in several ways:
> 
>  - they are atomic vs other writers
>  - they support whole file clones
>  - they support 64-bit length clones
>  - they do not allow partial success (aka short writes)
>  - clones are expected to be a fast metadata operation
> 
> Because of that it would be rather cumbersome to try to piggyback them on
> top of the recent copy_file_range infrastructure.  The converse isn't
> true and the copy_file_range system call could try clone file range as
> a first attempt to copy, something that further patches will enable.
> 
> Based on earlier work from Peng Tao.

<snip>

> diff --git a/fs/read_write.c b/fs/read_write.c
> index 6c1aa73..9e3dd8f 100644
> --- a/fs/read_write.c
> +++ b/fs/read_write.c
> @@ -1451,3 +1451,75 @@ out1:
>  out2:
>  	return ret;
>  }
> +
> +static int clone_verify_area(struct file *file, loff_t pos, u64 len, bool write)
> +{
> +	struct inode *inode = file_inode(file);
> +
> +	if (unlikely(pos < 0))
> +		return -EINVAL;
> +
> +	 if (unlikely((loff_t) (pos + len) < 0))
> +		return -EINVAL;
> +
> +	if (unlikely(inode->i_flctx && mandatory_lock(inode))) {
> +		loff_t end = len ? pos + len - 1 : OFFSET_MAX;
> +		int retval;
> +
> +		retval = locks_mandatory_area(file, pos, end,
> +				write ? F_WRLCK : F_RDLCK);
> +		if (retval < 0)
> +			return retval;
> +	}
> +
> +	return security_file_permission(file, write ? MAY_WRITE : MAY_READ);
> +}
> +
> +int vfs_clone_file_range(struct file *file_in, loff_t pos_in,
> +		struct file *file_out, loff_t pos_out, u64 len)
> +{
> +	struct inode *inode_in = file_inode(file_in);
> +	struct inode *inode_out = file_inode(file_out);
> +	int ret;
> +
> +	if (inode_in->i_sb != inode_out->i_sb ||
> +	    file_in->f_path.mnt != file_out->f_path.mnt)
> +		return -EXDEV;
> +
> +	if (S_ISDIR(inode_in->i_mode) || S_ISDIR(inode_out->i_mode))
> +		return -EISDIR;
> +	if (!S_ISREG(inode_in->i_mode) || !S_ISREG(inode_out->i_mode))
> +		return -EOPNOTSUPP;

I thought we were moving to -EINVAL for wrong file types?

Though, perhaps "I've also prepared a btrfs patch for this and clone" from the
earlier thread about generic/157 wasn't referring to /this/ patch. :)

In any case, I'm ok with EINVAL, and I haven't heard any objections to
changing -EOPNOTSUPP -> -EINVAL when trying to reflink/dedupe/whatever
non-file non-dir fds.

<shrug> Anyone object?

--D

> +
> +	if (!(file_in->f_mode & FMODE_READ) ||
> +	    !(file_out->f_mode & FMODE_WRITE) ||
> +	    (file_out->f_flags & O_APPEND) ||
> +	    !file_in->f_op->clone_file_range)
> +		return -EBADF;
> +
> +	ret = clone_verify_area(file_in, pos_in, len, false);
> +	if (ret)
> +		return ret;
> +
> +	ret = clone_verify_area(file_out, pos_out, len, true);
> +	if (ret)
> +		return ret;
> +
> +	if (pos_in + len > i_size_read(inode_in))
> +		return -EINVAL;
> +
> +	ret = mnt_want_write_file(file_out);
> +	if (ret)
> +		return ret;
> +
> +	ret = file_in->f_op->clone_file_range(file_in, pos_in,
> +			file_out, pos_out, len);
> +	if (!ret) {
> +		fsnotify_access(file_in);
> +		fsnotify_modify(file_out);
> +	}
> +
> +	mnt_drop_write_file(file_out);
> +	return ret;
> +}
> +EXPORT_SYMBOL(vfs_clone_file_range);
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index af559ac..59bf96d 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -1629,7 +1629,10 @@ struct file_operations {
>  #ifndef CONFIG_MMU
>  	unsigned (*mmap_capabilities)(struct file *);
>  #endif
> -	ssize_t (*copy_file_range)(struct file *, loff_t, struct file *, loff_t, size_t, unsigned int);
> +	ssize_t (*copy_file_range)(struct file *, loff_t, struct file *,
> +			loff_t, size_t, unsigned int);
> +	int (*clone_file_range)(struct file *, loff_t, struct file *, loff_t,
> +			u64);
>  };
>  
>  struct inode_operations {
> @@ -1683,6 +1686,8 @@ extern ssize_t vfs_writev(struct file *, const struct iovec __user *,
>  		unsigned long, loff_t *);
>  extern ssize_t vfs_copy_file_range(struct file *, loff_t , struct file *,
>  				   loff_t, size_t, unsigned int);
> +extern int vfs_clone_file_range(struct file *file_in, loff_t pos_in,
> +		struct file *file_out, loff_t pos_out, u64 len);
>  
>  struct super_operations {
>     	struct inode *(*alloc_inode)(struct super_block *sb);
> diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
> index f15d980..cd5db7f 100644
> --- a/include/uapi/linux/fs.h
> +++ b/include/uapi/linux/fs.h
> @@ -39,6 +39,13 @@
>  #define RENAME_EXCHANGE		(1 << 1)	/* Exchange source and dest */
>  #define RENAME_WHITEOUT		(1 << 2)	/* Whiteout source */
>  
> +struct file_clone_range {
> +	__s64 src_fd;
> +	__u64 src_offset;
> +	__u64 src_length;
> +	__u64 dest_offset;
> +};
> +
>  struct fstrim_range {
>  	__u64 start;
>  	__u64 len;
> @@ -159,6 +166,8 @@ struct inodes_stat_t {
>  #define FIFREEZE	_IOWR('X', 119, int)	/* Freeze */
>  #define FITHAW		_IOWR('X', 120, int)	/* Thaw */
>  #define FITRIM		_IOWR('X', 121, struct fstrim_range)	/* Trim */
> +#define FICLONE		_IOW(0x94, 9, int)
> +#define FICLONERANGE	_IOW(0x94, 13, struct file_clone_range)
>  
>  #define	FS_IOC_GETFLAGS			_IOR('f', 1, long)
>  #define	FS_IOC_SETFLAGS			_IOW('f', 2, long)
> -- 
> 1.9.1
> 
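The uapi part of this patch exposes the clone operation to userspace as FICLONERANGE. The ioctl is issued on the destination descriptor and names the source in struct file_clone_range. The sketch below shows a minimal call. It assumes installed kernel headers (Linux 4.5 or later) that export FICLONERANGE and struct file_clone_range from <linux/fs.h>. clone_range() is a hypothetical helper name.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/fs.h>	/* FICLONERANGE, struct file_clone_range (assumed >= 4.5 headers) */

/*
 * Clone @len bytes at @src_off in @src_fd to @dst_off in @dst_fd.
 * A @len of 0 asks for everything from @src_off to the end of the source.
 * Returns 0 on success or a negative errno value.
 */
static int clone_range(int src_fd, uint64_t src_off, uint64_t len,
		       int dst_fd, uint64_t dst_off)
{
	struct file_clone_range r = {
		.src_fd = src_fd,
		.src_offset = src_off,
		.src_length = len,
		.dest_offset = dst_off,
	};

	/* The ioctl is issued on the destination and names the source fd. */
	if (ioctl(dst_fd, FICLONERANGE, &r) < 0)
		return -errno;
	return 0;
}
```

With the checks as posted, descriptors on different mounts fail with -EXDEV. A source file whose file_operations lack ->clone_file_range fails with -EBADF.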

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/4] vfs: pull btrfs clone API to vfs layer
@ 2015-12-07 15:13       ` Christoph Hellwig
  0 siblings, 0 replies; 27+ messages in thread
From: Christoph Hellwig @ 2015-12-07 15:13 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, viro, tao.peng, jeff.layton, bfields,
	linux-fsdevel, linux-btrfs, linux-nfs, linux-cifs

On Sun, Dec 06, 2015 at 04:53:31PM -0800, Darrick J. Wong wrote:
> > +	if (S_ISDIR(inode_in->i_mode) || S_ISDIR(inode_out->i_mode))
> > +		return -EISDIR;
> > +	if (!S_ISREG(inode_in->i_mode) || !S_ISREG(inode_out->i_mode))
> > +		return -EOPNOTSUPP;
> 
> I thought we were moving to -EINVAL for wrong file types?
> 
> Though, perhaps "I've also prepared a btrfs patch for this and clone" from the
> earlier thread about generic/157 wasn't referring to /this/ patch. :)
> 
> In any case, I'm ok with EINVAL, and I haven't heard any objections to
> changing -EOPNOTSUPP -> -EINVAL when trying to reflink/dedupe/whatever
> non-file non-dir fds.

I'm fine with EINVAL - not sure why I ended up with EOPNOTSUPP,
probably because 157 is already failing, as in general the errors for
something in the VFS vs. a specific ioctl handler are just too different.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/4] vfs: pull btrfs clone API to vfs layer
  2015-12-07 15:13       ` Christoph Hellwig
  (?)
@ 2015-12-07 21:09       ` Darrick J. Wong
  -1 siblings, 0 replies; 27+ messages in thread
From: Darrick J. Wong @ 2015-12-07 21:09 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: viro, tao.peng, jeff.layton, bfields, linux-fsdevel, linux-btrfs,
	linux-nfs, linux-cifs

On Mon, Dec 07, 2015 at 04:13:19PM +0100, Christoph Hellwig wrote:
> On Sun, Dec 06, 2015 at 04:53:31PM -0800, Darrick J. Wong wrote:
> > > +	if (S_ISDIR(inode_in->i_mode) || S_ISDIR(inode_out->i_mode))
> > > +		return -EISDIR;
> > > +	if (!S_ISREG(inode_in->i_mode) || !S_ISREG(inode_out->i_mode))
> > > +		return -EOPNOTSUPP;
> > 
> > I thought we were moving to -EINVAL for wrong file types?
> > 
> > Though, perhaps "I've also prepared a btrfs patch for this and clone" from the
> > earlier thread about generic/157 wasn't referring to /this/ patch. :)
> > 
> > In any case, I'm ok with EINVAL, and I haven't heard any objections to
> > changing -EOPNOTSUPP -> -EINVAL when trying to reflink/dedupe/whatever
> > non-file non-dir fds.
> 
> I'm fine with EINVAL - not sure why I ended up with EOPNOTSUPP,
> probably because 157 is already failing, as in general the errors for
> something in the VFS vs. a specific ioctl handler are just too different.

Ok, I'm going to ensure that generic/157 through generic/160 all look for EINVAL when
the file type is wrong, and resend the xfstests patches.  I'll also
patch them up to accept the error codes that btrfs spit out before the
ioctl hoist.

--D

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/4] vfs: pull btrfs clone API to vfs layer
@ 2015-12-08  1:54         ` Darrick J. Wong
  0 siblings, 0 replies; 27+ messages in thread
From: Darrick J. Wong @ 2015-12-08  1:54 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: viro, tao.peng, jeff.layton, bfields, linux-fsdevel, linux-btrfs,
	linux-nfs, linux-cifs

On Mon, Dec 07, 2015 at 04:13:19PM +0100, Christoph Hellwig wrote:
> On Sun, Dec 06, 2015 at 04:53:31PM -0800, Darrick J. Wong wrote:
> > > +	if (S_ISDIR(inode_in->i_mode) || S_ISDIR(inode_out->i_mode))
> > > +		return -EISDIR;
> > > +	if (!S_ISREG(inode_in->i_mode) || !S_ISREG(inode_out->i_mode))
> > > +		return -EOPNOTSUPP;
> > 
> > I thought we were moving to -EINVAL for wrong file types?
> > 
> > Though, perhaps "I've also prepared a btrfs patch for this and clone" from the
> > earlier thread about generic/157 wasn't referring to /this/ patch. :)
> > 
> > In any case, I'm ok with EINVAL, and I haven't heard any objections to
> > changing -EOPNOTSUPP -> -EINVAL when trying to reflink/dedupe/whatever
> > non-file non-dir fds.
> 
> I'm fine with EINVAL - not sure why I ended up with EOPNOTSUPP,
> probably because 157 is already failing, as in general the errors for
> something in the VFS vs. a specific ioctl handler are just too different.

Ok, will have respun fixes for 157/158 soon.

--D

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/4] locks: new locks_mandatory_area calling convention
  2015-12-03 11:59   ` Christoph Hellwig
  (?)
@ 2015-12-08  4:05   ` Al Viro
  2015-12-08 14:54       ` Christoph Hellwig
  -1 siblings, 1 reply; 27+ messages in thread
From: Al Viro @ 2015-12-08  4:05 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: tao.peng, jeff.layton, bfields, linux-fsdevel, linux-btrfs,
	linux-nfs, linux-cifs

On Thu, Dec 03, 2015 at 12:59:49PM +0100, Christoph Hellwig wrote:
> Pass a loff_t end for the last byte instead of the 32-bit count
> parameter to allow full file clones even on 32-bit architectures.
> While we're at it also drop the pointless inode argument and simplify
> the read/write selection.

locks_mandatory_area() contains this:
        if (filp && !(filp->f_flags & O_NONBLOCK))
                sleep = true;
which is a strong hint that filp might be NULL.  And indeed it might -
        error = locks_verify_truncate(inode, NULL, length);
in vfs_truncate() and
        host_err = locks_verify_truncate(inode, NULL, iap->ia_size);
in nfsd_get_write_access().  Both are broken by that commit.

Where the hell would truncate(2) get struct file, anyway?  IOW, the inode
argument is _not_ pointless; re-added.
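The toy model below illustrates the point above. It uses hypothetical types (struct toy_inode, struct toy_file), not the real locks code. The caller has to supply the inode itself because the file pointer may be NULL on the truncate path. Only a real file opened without O_NONBLOCK may block. The model also shows this patch's convention of passing the inclusive last byte (pos + count - 1) instead of a count. The start/end sanity check is an assumption added for the sketch.

```c
#include <fcntl.h>	/* O_NONBLOCK */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified stand-ins for the kernel structures. */
struct toy_inode {
	unsigned long ino;
};

struct toy_file {
	struct toy_inode *inode;	/* normally reachable from an open file */
	int f_flags;
};

/* Inclusive last byte of the range [pos, pos + count), for count > 0. */
static int64_t toy_range_end(int64_t pos, uint64_t count)
{
	return (int64_t)(pos + count - 1);
}

/*
 * The inode is passed explicitly because @filp may be NULL (truncate(2)
 * has no open file), so it cannot be derived from @filp.  Only a real
 * file opened without O_NONBLOCK may sleep on a conflicting lock.
 * Returns the inode whose locks would be checked, or NULL on a bad range.
 */
static struct toy_inode *toy_lock_check(struct toy_inode *inode,
					const struct toy_file *filp,
					int64_t start, int64_t end,
					bool *may_sleep)
{
	if (start < 0 || end < start)
		return NULL;
	*may_sleep = filp && !(filp->f_flags & O_NONBLOCK);
	return inode;
}
```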

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/4] locks: new locks_mandatory_area calling convention
@ 2015-12-08 14:54       ` Christoph Hellwig
  0 siblings, 0 replies; 27+ messages in thread
From: Christoph Hellwig @ 2015-12-08 14:54 UTC (permalink / raw)
  To: Al Viro
  Cc: Christoph Hellwig, tao.peng, jeff.layton, bfields, linux-fsdevel,
	linux-btrfs, linux-nfs, linux-cifs

On Tue, Dec 08, 2015 at 04:05:04AM +0000, Al Viro wrote:
> Where the hell would truncate(2) get struct file, anyway?  IOW, the inode
> argument is _not_ pointless; re-added.

Oh, right.  Interestingly, it seems like xfstests has no coverage of this
code path at all.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/4] locks: new locks_mandatory_area calling convention
@ 2015-12-08 16:16         ` Al Viro
  0 siblings, 0 replies; 27+ messages in thread
From: Al Viro @ 2015-12-08 16:16 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: tao.peng, jeff.layton, bfields, linux-fsdevel, linux-btrfs,
	linux-nfs, linux-cifs

On Tue, Dec 08, 2015 at 03:54:53PM +0100, Christoph Hellwig wrote:
> On Tue, Dec 08, 2015 at 04:05:04AM +0000, Al Viro wrote:
> > Where the hell would truncate(2) get struct file, anyway?  IOW, the inode
> > argument is _not_ pointless; re-added.
> 
> Oh, right.  Interestingly, it seems like xfstests has no coverage of this
> code path at all.

LTP does (ftruncate04)...

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/4] vfs: pull btrfs clone API to vfs layer
@ 2015-12-09 20:40     ` Darrick J. Wong
  0 siblings, 0 replies; 27+ messages in thread
From: Darrick J. Wong @ 2015-12-09 20:40 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: viro, tao.peng, jeff.layton, bfields, linux-fsdevel, linux-btrfs,
	linux-nfs, linux-cifs

On Thu, Dec 03, 2015 at 12:59:50PM +0100, Christoph Hellwig wrote:
> The btrfs clone ioctls are now adopted by other file systems, with NFS
> and CIFS already having support for them, and XFS being under active
> development.  To avoid growth of various slightly incompatible
> implementations, add one to the VFS.  Note that clones are different from
> file copies in several ways:
> 
>  - they are atomic vs other writers
>  - they support whole file clones
>  - they support 64-bit length clones
>  - they do not allow partial success (aka short writes)
>  - clones are expected to be a fast metadata operation
> 
> Because of that it would be rather cumbersome to try to piggyback them on
> top of the recent copy_file_range infrastructure.  The converse isn't
> true: the copy_file_range system call could try a clone as a first
> attempt to copy, something that further patches will enable.
> 
> Based on earlier work from Peng Tao.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/btrfs/ctree.h        |   3 +-
>  fs/btrfs/file.c         |   1 +
>  fs/btrfs/ioctl.c        |  49 ++-----------------
>  fs/cifs/cifsfs.c        |  63 ++++++++++++++++++++++++
>  fs/cifs/cifsfs.h        |   1 -
>  fs/cifs/ioctl.c         | 126 +++++++++++++++++++++++-------------------------
>  fs/ioctl.c              |  29 +++++++++++

I tried this patch series on ppc64 (with a 32-bit powerpc userland), and I
think the compat ioctl path also needs a fix-up so that FICLONE and
FICLONERANGE reach the VFS:

diff --git a/fs/compat_ioctl.c b/fs/compat_ioctl.c
index dcf2653..70d4b10 100644
--- a/fs/compat_ioctl.c
+++ b/fs/compat_ioctl.c
@@ -1580,6 +1580,10 @@ COMPAT_SYSCALL_DEFINE3(ioctl, unsigned int, fd, unsigned int, cmd,
                goto out_fput;
 #endif
 
+       case FICLONE:
+       case FICLONERANGE:
+               goto do_ioctl;
+
        case FIBMAP:
        case FIGETBSZ:
        case FIONREAD:

--D
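A note on why a plain pass-through like the hunk above should be enough: FICLONE carries the source descriptor in the ioctl argument itself, and struct file_clone_range contains only 64-bit members, so it has the same size and member offsets for 32-bit and 64-bit userland (even on i386, where 64-bit members are only 4-byte aligned, no padding arises). Because _IOW() also encodes the size of the argument type into the command number, FICLONERANGE has the same value on both ABIs. The sketch below checks that layout at compile time; clone_range_mirror is a hypothetical local copy of the uapi struct, not a kernel type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical mirror of the uapi struct file_clone_range from this
 * patch, renamed so it cannot clash with <linux/fs.h>.
 */
struct clone_range_mirror {
	int64_t  src_fd;
	uint64_t src_offset;
	uint64_t src_length;
	uint64_t dest_offset;
};

/*
 * Four 8-byte members, no padding: 32 bytes with offsets 0, 8, 16, 24.
 * A 32-bit process therefore passes exactly the bytes a 64-bit one would.
 */
static_assert(sizeof(struct clone_range_mirror) == 32, "unexpected size");
static_assert(offsetof(struct clone_range_mirror, src_offset) == 8, "src_offset");
static_assert(offsetof(struct clone_range_mirror, src_length) == 16, "src_length");
static_assert(offsetof(struct clone_range_mirror, dest_offset) == 24, "dest_offset");
```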

>  fs/nfs/nfs4file.c       |  87 ++++-----------------------------
>  fs/read_write.c         |  72 +++++++++++++++++++++++++++
>  include/linux/fs.h      |   7 ++-
>  include/uapi/linux/fs.h |   9 ++++
>  11 files changed, 254 insertions(+), 193 deletions(-)
> 
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index ede7277..dd4733f 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -4025,7 +4025,6 @@ void btrfs_get_block_group_info(struct list_head *groups_list,
>  void update_ioctl_balance_args(struct btrfs_fs_info *fs_info, int lock,
>  			       struct btrfs_ioctl_balance_args *bargs);
>  
> -
>  /* file.c */
>  int btrfs_auto_defrag_init(void);
>  void btrfs_auto_defrag_exit(void);
> @@ -4058,6 +4057,8 @@ int btrfs_fdatawrite_range(struct inode *inode, loff_t start, loff_t end);
>  ssize_t btrfs_copy_file_range(struct file *file_in, loff_t pos_in,
>  			      struct file *file_out, loff_t pos_out,
>  			      size_t len, unsigned int flags);
> +int btrfs_clone_file_range(struct file *file_in, loff_t pos_in,
> +			   struct file *file_out, loff_t pos_out, u64 len);
>  
>  /* tree-defrag.c */
>  int btrfs_defrag_leaves(struct btrfs_trans_handle *trans,
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index e67fe6a..232e300 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -2925,6 +2925,7 @@ const struct file_operations btrfs_file_operations = {
>  	.compat_ioctl	= btrfs_ioctl,
>  #endif
>  	.copy_file_range = btrfs_copy_file_range,
> +	.clone_file_range = btrfs_clone_file_range,
>  };
>  
>  void btrfs_auto_defrag_exit(void)
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index 0f92735..85b1cae 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -3906,49 +3906,10 @@ ssize_t btrfs_copy_file_range(struct file *file_in, loff_t pos_in,
>  	return ret;
>  }
>  
> -static noinline long btrfs_ioctl_clone(struct file *file, unsigned long srcfd,
> -				       u64 off, u64 olen, u64 destoff)
> +int btrfs_clone_file_range(struct file *src_file, loff_t off,
> +		struct file *dst_file, loff_t destoff, u64 len)
>  {
> -	struct fd src_file;
> -	int ret;
> -
> -	/* the destination must be opened for writing */
> -	if (!(file->f_mode & FMODE_WRITE) || (file->f_flags & O_APPEND))
> -		return -EINVAL;
> -
> -	ret = mnt_want_write_file(file);
> -	if (ret)
> -		return ret;
> -
> -	src_file = fdget(srcfd);
> -	if (!src_file.file) {
> -		ret = -EBADF;
> -		goto out_drop_write;
> -	}
> -
> -	/* the src must be open for reading */
> -	if (!(src_file.file->f_mode & FMODE_READ)) {
> -		ret = -EINVAL;
> -		goto out_fput;
> -	}
> -
> -	ret = btrfs_clone_files(file, src_file.file, off, olen, destoff);
> -
> -out_fput:
> -	fdput(src_file);
> -out_drop_write:
> -	mnt_drop_write_file(file);
> -	return ret;
> -}
> -
> -static long btrfs_ioctl_clone_range(struct file *file, void __user *argp)
> -{
> -	struct btrfs_ioctl_clone_range_args args;
> -
> -	if (copy_from_user(&args, argp, sizeof(args)))
> -		return -EFAULT;
> -	return btrfs_ioctl_clone(file, args.src_fd, args.src_offset,
> -				 args.src_length, args.dest_offset);
> +	return btrfs_clone_files(dst_file, src_file, off, len, destoff);
>  }
>  
>  /*
> @@ -5498,10 +5459,6 @@ long btrfs_ioctl(struct file *file, unsigned int
>  		return btrfs_ioctl_dev_info(root, argp);
>  	case BTRFS_IOC_BALANCE:
>  		return btrfs_ioctl_balance(file, NULL);
> -	case BTRFS_IOC_CLONE:
> -		return btrfs_ioctl_clone(file, arg, 0, 0, 0);
> -	case BTRFS_IOC_CLONE_RANGE:
> -		return btrfs_ioctl_clone_range(file, argp);
>  	case BTRFS_IOC_TRANS_START:
>  		return btrfs_ioctl_trans_start(file);
>  	case BTRFS_IOC_TRANS_END:
> diff --git a/fs/cifs/cifsfs.c b/fs/cifs/cifsfs.c
> index cbc0f4b..e9b978f 100644
> --- a/fs/cifs/cifsfs.c
> +++ b/fs/cifs/cifsfs.c
> @@ -914,6 +914,61 @@ const struct inode_operations cifs_symlink_inode_ops = {
>  #endif
>  };
>  
> +static int cifs_clone_file_range(struct file *src_file, loff_t off,
> +		struct file *dst_file, loff_t destoff, u64 len)
> +{
> +	struct inode *src_inode = file_inode(src_file);
> +	struct inode *target_inode = file_inode(dst_file);
> +	struct cifsFileInfo *smb_file_src = src_file->private_data;
> +	struct cifsFileInfo *smb_file_target = dst_file->private_data;
> +	struct cifs_tcon *src_tcon = tlink_tcon(smb_file_src->tlink);
> +	struct cifs_tcon *target_tcon = tlink_tcon(smb_file_target->tlink);
> +	unsigned int xid;
> +	int rc;
> +
> +	cifs_dbg(FYI, "clone range\n");
> +
> +	xid = get_xid();
> +
> +	if (!src_file->private_data || !dst_file->private_data) {
> +		rc = -EBADF;
> +		cifs_dbg(VFS, "missing cifsFileInfo on copy range src file\n");
> +		goto out;
> +	}
> +
> +	/*
> +	 * Note: cifs case is easier than btrfs since server responsible for
> +	 * checks for proper open modes and file type and if it wants
> +	 * server could even support copy of range where source = target
> +	 */
> +	lock_two_nondirectories(target_inode, src_inode);
> +
> +	if (len == 0)
> +		len = src_inode->i_size - off;
> +
> +	cifs_dbg(FYI, "about to flush pages\n");
> +	/* should we flush first and last page first */
> +	truncate_inode_pages_range(&target_inode->i_data, destoff,
> +				   PAGE_CACHE_ALIGN(destoff + len)-1);
> +
> +	if (target_tcon->ses->server->ops->duplicate_extents)
> +		rc = target_tcon->ses->server->ops->duplicate_extents(xid,
> +			smb_file_src, smb_file_target, off, len, destoff);
> +	else
> +		rc = -EOPNOTSUPP;
> +
> +	/* force revalidate of size and timestamps of target file now
> +	   that target is updated on the server */
> +	CIFS_I(target_inode)->time = 0;
> +out_unlock:
> +	/* although unlocking in the reverse order from locking is not
> +	   strictly necessary here it is a little cleaner to be consistent */
> +	unlock_two_nondirectories(src_inode, target_inode);
> +out:
> +	free_xid(xid);
> +	return rc;
> +}
> +
>  const struct file_operations cifs_file_ops = {
>  	.read_iter = cifs_loose_read_iter,
>  	.write_iter = cifs_file_write_iter,
> @@ -926,6 +981,7 @@ const struct file_operations cifs_file_ops = {
>  	.splice_read = generic_file_splice_read,
>  	.llseek = cifs_llseek,
>  	.unlocked_ioctl	= cifs_ioctl,
> +	.clone_file_range = cifs_clone_file_range,
>  	.setlease = cifs_setlease,
>  	.fallocate = cifs_fallocate,
>  };
> @@ -942,6 +998,8 @@ const struct file_operations cifs_file_strict_ops = {
>  	.splice_read = generic_file_splice_read,
>  	.llseek = cifs_llseek,
>  	.unlocked_ioctl	= cifs_ioctl,
> +	.clone_file_range = cifs_clone_file_range,
> +	.clone_file_range = cifs_clone_file_range,
>  	.setlease = cifs_setlease,
>  	.fallocate = cifs_fallocate,
>  };
> @@ -958,6 +1016,7 @@ const struct file_operations cifs_file_direct_ops = {
>  	.mmap = cifs_file_mmap,
>  	.splice_read = generic_file_splice_read,
>  	.unlocked_ioctl  = cifs_ioctl,
> +	.clone_file_range = cifs_clone_file_range,
>  	.llseek = cifs_llseek,
>  	.setlease = cifs_setlease,
>  	.fallocate = cifs_fallocate,
> @@ -974,6 +1033,7 @@ const struct file_operations cifs_file_nobrl_ops = {
>  	.splice_read = generic_file_splice_read,
>  	.llseek = cifs_llseek,
>  	.unlocked_ioctl	= cifs_ioctl,
> +	.clone_file_range = cifs_clone_file_range,
>  	.setlease = cifs_setlease,
>  	.fallocate = cifs_fallocate,
>  };
> @@ -989,6 +1049,7 @@ const struct file_operations cifs_file_strict_nobrl_ops = {
>  	.splice_read = generic_file_splice_read,
>  	.llseek = cifs_llseek,
>  	.unlocked_ioctl	= cifs_ioctl,
> +	.clone_file_range = cifs_clone_file_range,
>  	.setlease = cifs_setlease,
>  	.fallocate = cifs_fallocate,
>  };
> @@ -1004,6 +1065,7 @@ const struct file_operations cifs_file_direct_nobrl_ops = {
>  	.mmap = cifs_file_mmap,
>  	.splice_read = generic_file_splice_read,
>  	.unlocked_ioctl  = cifs_ioctl,
> +	.clone_file_range = cifs_clone_file_range,
>  	.llseek = cifs_llseek,
>  	.setlease = cifs_setlease,
>  	.fallocate = cifs_fallocate,
> @@ -1014,6 +1076,7 @@ const struct file_operations cifs_dir_ops = {
>  	.release = cifs_closedir,
>  	.read    = generic_read_dir,
>  	.unlocked_ioctl  = cifs_ioctl,
> +	.clone_file_range = cifs_clone_file_range,
>  	.llseek = generic_file_llseek,
>  };
>  
> diff --git a/fs/cifs/cifsfs.h b/fs/cifs/cifsfs.h
> index c3cc160..c399513 100644
> --- a/fs/cifs/cifsfs.h
> +++ b/fs/cifs/cifsfs.h
> @@ -131,7 +131,6 @@ extern int	cifs_setxattr(struct dentry *, const char *, const void *,
>  extern ssize_t	cifs_getxattr(struct dentry *, const char *, void *, size_t);
>  extern ssize_t	cifs_listxattr(struct dentry *, char *, size_t);
>  extern long cifs_ioctl(struct file *filep, unsigned int cmd, unsigned long arg);
> -
>  #ifdef CONFIG_CIFS_NFSD_EXPORT
>  extern const struct export_operations cifs_export_ops;
>  #endif /* CONFIG_CIFS_NFSD_EXPORT */
> diff --git a/fs/cifs/ioctl.c b/fs/cifs/ioctl.c
> index 35cf990..7a3b84e 100644
> --- a/fs/cifs/ioctl.c
> +++ b/fs/cifs/ioctl.c
> @@ -34,73 +34,36 @@
>  #include "cifs_ioctl.h"
>  #include <linux/btrfs.h>
>  
> -static long cifs_ioctl_clone(unsigned int xid, struct file *dst_file,
> -			unsigned long srcfd, u64 off, u64 len, u64 destoff,
> -			bool dup_extents)
> +static int cifs_file_clone_range(unsigned int xid, struct file *src_file,
> +			  struct file *dst_file)
>  {
> -	int rc;
> -	struct cifsFileInfo *smb_file_target = dst_file->private_data;
> +	struct inode *src_inode = file_inode(src_file);
>  	struct inode *target_inode = file_inode(dst_file);
> -	struct cifs_tcon *target_tcon;
> -	struct fd src_file;
>  	struct cifsFileInfo *smb_file_src;
> -	struct inode *src_inode;
> +	struct cifsFileInfo *smb_file_target;
>  	struct cifs_tcon *src_tcon;
> +	struct cifs_tcon *target_tcon;
> +	int rc;
>  
>  	cifs_dbg(FYI, "ioctl clone range\n");
> -	/* the destination must be opened for writing */
> -	if (!(dst_file->f_mode & FMODE_WRITE)) {
> -		cifs_dbg(FYI, "file target not open for write\n");
> -		return -EINVAL;
> -	}
>  
> -	/* check if target volume is readonly and take reference */
> -	rc = mnt_want_write_file(dst_file);
> -	if (rc) {
> -		cifs_dbg(FYI, "mnt_want_write failed with rc %d\n", rc);
> -		return rc;
> -	}
> -
> -	src_file = fdget(srcfd);
> -	if (!src_file.file) {
> -		rc = -EBADF;
> -		goto out_drop_write;
> -	}
> -
> -	if (src_file.file->f_op->unlocked_ioctl != cifs_ioctl) {
> -		rc = -EBADF;
> -		cifs_dbg(VFS, "src file seems to be from a different filesystem type\n");
> -		goto out_fput;
> -	}
> -
> -	if ((!src_file.file->private_data) || (!dst_file->private_data)) {
> +	if (!src_file->private_data || !dst_file->private_data) {
>  		rc = -EBADF;
>  		cifs_dbg(VFS, "missing cifsFileInfo on copy range src file\n");
> -		goto out_fput;
> +		goto out;
>  	}
>  
>  	rc = -EXDEV;
>  	smb_file_target = dst_file->private_data;
> -	smb_file_src = src_file.file->private_data;
> +	smb_file_src = src_file->private_data;
>  	src_tcon = tlink_tcon(smb_file_src->tlink);
>  	target_tcon = tlink_tcon(smb_file_target->tlink);
>  
> -	/* check source and target on same server (or volume if dup_extents) */
> -	if (dup_extents && (src_tcon != target_tcon)) {
> -		cifs_dbg(VFS, "source and target of copy not on same share\n");
> -		goto out_fput;
> -	}
> -
> -	if (!dup_extents && (src_tcon->ses != target_tcon->ses)) {
> +	if (src_tcon->ses != target_tcon->ses) {
>  		cifs_dbg(VFS, "source and target of copy not on same server\n");
> -		goto out_fput;
> +		goto out;
>  	}
>  
> -	src_inode = file_inode(src_file.file);
> -	rc = -EINVAL;
> -	if (S_ISDIR(src_inode->i_mode))
> -		goto out_fput;
> -
>  	/*
>  	 * Note: cifs case is easier than btrfs since server responsible for
>  	 * checks for proper open modes and file type and if it wants
> @@ -108,34 +71,66 @@ static long cifs_ioctl_clone(unsigned int xid, struct file *dst_file,
>  	 */
>  	lock_two_nondirectories(target_inode, src_inode);
>  
> -	/* determine range to clone */
> -	rc = -EINVAL;
> -	if (off + len > src_inode->i_size || off + len < off)
> -		goto out_unlock;
> -	if (len == 0)
> -		len = src_inode->i_size - off;
> -
>  	cifs_dbg(FYI, "about to flush pages\n");
>  	/* should we flush first and last page first */
> -	truncate_inode_pages_range(&target_inode->i_data, destoff,
> -				   PAGE_CACHE_ALIGN(destoff + len)-1);
> +	truncate_inode_pages(&target_inode->i_data, 0);
>  
> -	if (dup_extents && target_tcon->ses->server->ops->duplicate_extents)
> -		rc = target_tcon->ses->server->ops->duplicate_extents(xid,
> -			smb_file_src, smb_file_target, off, len, destoff);
> -	else if (!dup_extents && target_tcon->ses->server->ops->clone_range)
> +	if (target_tcon->ses->server->ops->clone_range)
>  		rc = target_tcon->ses->server->ops->clone_range(xid,
> -			smb_file_src, smb_file_target, off, len, destoff);
> +			smb_file_src, smb_file_target, 0, src_inode->i_size, 0);
>  	else
>  		rc = -EOPNOTSUPP;
>  
>  	/* force revalidate of size and timestamps of target file now
>  	   that target is updated on the server */
>  	CIFS_I(target_inode)->time = 0;
> -out_unlock:
>  	/* although unlocking in the reverse order from locking is not
>  	   strictly necessary here it is a little cleaner to be consistent */
>  	unlock_two_nondirectories(src_inode, target_inode);
> +out:
> +	return rc;
> +}
> +
> +static long cifs_ioctl_clone(unsigned int xid, struct file *dst_file,
> +			unsigned long srcfd)
> +{
> +	int rc;
> +	struct fd src_file;
> +	struct inode *src_inode;
> +
> +	cifs_dbg(FYI, "ioctl clone range\n");
> +	/* the destination must be opened for writing */
> +	if (!(dst_file->f_mode & FMODE_WRITE)) {
> +		cifs_dbg(FYI, "file target not open for write\n");
> +		return -EINVAL;
> +	}
> +
> +	/* check if target volume is readonly and take reference */
> +	rc = mnt_want_write_file(dst_file);
> +	if (rc) {
> +		cifs_dbg(FYI, "mnt_want_write failed with rc %d\n", rc);
> +		return rc;
> +	}
> +
> +	src_file = fdget(srcfd);
> +	if (!src_file.file) {
> +		rc = -EBADF;
> +		goto out_drop_write;
> +	}
> +
> +	if (src_file.file->f_op->unlocked_ioctl != cifs_ioctl) {
> +		rc = -EBADF;
> +		cifs_dbg(VFS, "src file seems to be from a different filesystem type\n");
> +		goto out_fput;
> +	}
> +
> +	src_inode = file_inode(src_file.file);
> +	rc = -EINVAL;
> +	if (S_ISDIR(src_inode->i_mode))
> +		goto out_fput;
> +
> +	rc = cifs_file_clone_range(xid, src_file.file, dst_file);
> +
>  out_fput:
>  	fdput(src_file);
>  out_drop_write:
> @@ -256,10 +251,7 @@ long cifs_ioctl(struct file *filep, unsigned int command, unsigned long arg)
>  			}
>  			break;
>  		case CIFS_IOC_COPYCHUNK_FILE:
> -			rc = cifs_ioctl_clone(xid, filep, arg, 0, 0, 0, false);
> -			break;
> -		case BTRFS_IOC_CLONE:
> -			rc = cifs_ioctl_clone(xid, filep, arg, 0, 0, 0, true);
> +			rc = cifs_ioctl_clone(xid, filep, arg);
>  			break;
>  		case CIFS_IOC_SET_INTEGRITY:
>  			if (pSMBFile == NULL)
> diff --git a/fs/ioctl.c b/fs/ioctl.c
> index 5d01d26..84c6e79 100644
> --- a/fs/ioctl.c
> +++ b/fs/ioctl.c
> @@ -215,6 +215,29 @@ static int ioctl_fiemap(struct file *filp, unsigned long arg)
>  	return error;
>  }
>  
> +static long ioctl_file_clone(struct file *dst_file, unsigned long srcfd,
> +			     u64 off, u64 olen, u64 destoff)
> +{
> +	struct fd src_file = fdget(srcfd);
> +	int ret;
> +
> +	if (!src_file.file)
> +		return -EBADF;
> +	ret = vfs_clone_file_range(src_file.file, off, dst_file, destoff, olen);
> +	fdput(src_file);
> +	return ret;
> +}
> +
> +static long ioctl_file_clone_range(struct file *file, void __user *argp)
> +{
> +	struct file_clone_range args;
> +
> +	if (copy_from_user(&args, argp, sizeof(args)))
> +		return -EFAULT;
> +	return ioctl_file_clone(file, args.src_fd, args.src_offset,
> +				args.src_length, args.dest_offset);
> +}
> +
>  #ifdef CONFIG_BLOCK
>  
>  static inline sector_t logical_to_blk(struct inode *inode, loff_t offset)
> @@ -600,6 +623,12 @@ int do_vfs_ioctl(struct file *filp, unsigned int fd, unsigned int cmd,
>  	case FIGETBSZ:
>  		return put_user(inode->i_sb->s_blocksize, argp);
>  
> +	case FICLONE:
> +		return ioctl_file_clone(filp, arg, 0, 0, 0);
> +
> +	case FICLONERANGE:
> +		return ioctl_file_clone_range(filp, argp);
> +
>  	default:
>  		if (S_ISREG(inode->i_mode))
>  			error = file_ioctl(filp, cmd, arg);
> diff --git a/fs/nfs/nfs4file.c b/fs/nfs/nfs4file.c
> index db9b5fe..26f9a23 100644
> --- a/fs/nfs/nfs4file.c
> +++ b/fs/nfs/nfs4file.c
> @@ -195,65 +195,27 @@ static long nfs42_fallocate(struct file *filep, int mode, loff_t offset, loff_t
>  	return nfs42_proc_allocate(filep, offset, len);
>  }
>  
> -static noinline long
> -nfs42_ioctl_clone(struct file *dst_file, unsigned long srcfd,
> -		  u64 src_off, u64 dst_off, u64 count)
> +static int nfs42_clone_file_range(struct file *src_file, loff_t src_off,
> +		struct file *dst_file, loff_t dst_off, u64 count)
>  {
>  	struct inode *dst_inode = file_inode(dst_file);
>  	struct nfs_server *server = NFS_SERVER(dst_inode);
> -	struct fd src_file;
> -	struct inode *src_inode;
> +	struct inode *src_inode = file_inode(src_file);
>  	unsigned int bs = server->clone_blksize;
>  	bool same_inode = false;
>  	int ret;
>  
> -	/* dst file must be opened for writing */
> -	if (!(dst_file->f_mode & FMODE_WRITE))
> -		return -EINVAL;
> -
> -	ret = mnt_want_write_file(dst_file);
> -	if (ret)
> -		return ret;
> -
> -	src_file = fdget(srcfd);
> -	if (!src_file.file) {
> -		ret = -EBADF;
> -		goto out_drop_write;
> -	}
> -
> -	src_inode = file_inode(src_file.file);
> -
> -	if (src_inode == dst_inode)
> -		same_inode = true;
> -
> -	/* src file must be opened for reading */
> -	if (!(src_file.file->f_mode & FMODE_READ))
> -		goto out_fput;
> -
> -	/* src and dst must be regular files */
> -	ret = -EISDIR;
> -	if (!S_ISREG(src_inode->i_mode) || !S_ISREG(dst_inode->i_mode))
> -		goto out_fput;
> -
> -	ret = -EXDEV;
> -	if (src_file.file->f_path.mnt != dst_file->f_path.mnt ||
> -	    src_inode->i_sb != dst_inode->i_sb)
> -		goto out_fput;
> -
>  	/* check alignment w.r.t. clone_blksize */
>  	ret = -EINVAL;
>  	if (bs) {
>  		if (!IS_ALIGNED(src_off, bs) || !IS_ALIGNED(dst_off, bs))
> -			goto out_fput;
> +			goto out;
>  		if (!IS_ALIGNED(count, bs) && i_size_read(src_inode) != (src_off + count))
> -			goto out_fput;
> +			goto out;
>  	}
>  
> -	/* verify if ranges are overlapped within the same file */
> -	if (same_inode) {
> -		if (dst_off + count > src_off && dst_off < src_off + count)
> -			goto out_fput;
> -	}
> +	if (src_inode == dst_inode)
> +		same_inode = true;
>  
>  	/* XXX: do we lock at all? what if server needs CB_RECALL_LAYOUT? */
>  	if (same_inode) {
> @@ -275,7 +237,7 @@ nfs42_ioctl_clone(struct file *dst_file, unsigned long srcfd,
>  	if (ret)
>  		goto out_unlock;
>  
> -	ret = nfs42_proc_clone(src_file.file, dst_file, src_off, dst_off, count);
> +	ret = nfs42_proc_clone(src_file, dst_file, src_off, dst_off, count);
>  
>  	/* truncate inode page cache of the dst range so that future reads can fetch
>  	 * new data from server */
> @@ -292,37 +254,9 @@ out_unlock:
>  		mutex_unlock(&dst_inode->i_mutex);
>  		mutex_unlock(&src_inode->i_mutex);
>  	}
> -out_fput:
> -	fdput(src_file);
> -out_drop_write:
> -	mnt_drop_write_file(dst_file);
> +out:
>  	return ret;
>  }
> -
> -static long nfs42_ioctl_clone_range(struct file *dst_file, void __user *argp)
> -{
> -	struct btrfs_ioctl_clone_range_args args;
> -
> -	if (copy_from_user(&args, argp, sizeof(args)))
> -		return -EFAULT;
> -
> -	return nfs42_ioctl_clone(dst_file, args.src_fd, args.src_offset,
> -				 args.dest_offset, args.src_length);
> -}
> -
> -long nfs4_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
> -{
> -	void __user *argp = (void __user *)arg;
> -
> -	switch (cmd) {
> -	case BTRFS_IOC_CLONE:
> -		return nfs42_ioctl_clone(file, arg, 0, 0, 0);
> -	case BTRFS_IOC_CLONE_RANGE:
> -		return nfs42_ioctl_clone_range(file, argp);
> -	}
> -
> -	return -ENOTTY;
> -}
>  #endif /* CONFIG_NFS_V4_2 */
>  
>  const struct file_operations nfs4_file_operations = {
> @@ -342,8 +276,7 @@ const struct file_operations nfs4_file_operations = {
>  #ifdef CONFIG_NFS_V4_2
>  	.llseek		= nfs4_file_llseek,
>  	.fallocate	= nfs42_fallocate,
> -	.unlocked_ioctl = nfs4_ioctl,
> -	.compat_ioctl	= nfs4_ioctl,
> +	.clone_file_range = nfs42_clone_file_range,
>  #else
>  	.llseek		= nfs_file_llseek,
>  #endif
> diff --git a/fs/read_write.c b/fs/read_write.c
> index 6c1aa73..9e3dd8f 100644
> --- a/fs/read_write.c
> +++ b/fs/read_write.c
> @@ -1451,3 +1451,75 @@ out1:
>  out2:
>  	return ret;
>  }
> +
> +static int clone_verify_area(struct file *file, loff_t pos, u64 len, bool write)
> +{
> +	struct inode *inode = file_inode(file);
> +
> +	if (unlikely(pos < 0))
> +		return -EINVAL;
> +
> +	 if (unlikely((loff_t) (pos + len) < 0))
> +		return -EINVAL;
> +
> +	if (unlikely(inode->i_flctx && mandatory_lock(inode))) {
> +		loff_t end = len ? pos + len - 1 : OFFSET_MAX;
> +		int retval;
> +
> +		retval = locks_mandatory_area(file, pos, end,
> +				write ? F_WRLCK : F_RDLCK);
> +		if (retval < 0)
> +			return retval;
> +	}
> +
> +	return security_file_permission(file, write ? MAY_WRITE : MAY_READ);
> +}
> +
> +int vfs_clone_file_range(struct file *file_in, loff_t pos_in,
> +		struct file *file_out, loff_t pos_out, u64 len)
> +{
> +	struct inode *inode_in = file_inode(file_in);
> +	struct inode *inode_out = file_inode(file_out);
> +	int ret;
> +
> +	if (inode_in->i_sb != inode_out->i_sb ||
> +	    file_in->f_path.mnt != file_out->f_path.mnt)
> +		return -EXDEV;
> +
> +	if (S_ISDIR(inode_in->i_mode) || S_ISDIR(inode_out->i_mode))
> +		return -EISDIR;
> +	if (!S_ISREG(inode_in->i_mode) || !S_ISREG(inode_out->i_mode))
> +		return -EOPNOTSUPP;
> +
> +	if (!(file_in->f_mode & FMODE_READ) ||
> +	    !(file_out->f_mode & FMODE_WRITE) ||
> +	    (file_out->f_flags & O_APPEND) ||
> +	    !file_in->f_op->clone_file_range)
> +		return -EBADF;
> +
> +	ret = clone_verify_area(file_in, pos_in, len, false);
> +	if (ret)
> +		return ret;
> +
> +	ret = clone_verify_area(file_out, pos_out, len, true);
> +	if (ret)
> +		return ret;
> +
> +	if (pos_in + len > i_size_read(inode_in))
> +		return -EINVAL;
> +
> +	ret = mnt_want_write_file(file_out);
> +	if (ret)
> +		return ret;
> +
> +	ret = file_in->f_op->clone_file_range(file_in, pos_in,
> +			file_out, pos_out, len);
> +	if (!ret) {
> +		fsnotify_access(file_in);
> +		fsnotify_modify(file_out);
> +	}
> +
> +	mnt_drop_write_file(file_out);
> +	return ret;
> +}
> +EXPORT_SYMBOL(vfs_clone_file_range);
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index af559ac..59bf96d 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -1629,7 +1629,10 @@ struct file_operations {
>  #ifndef CONFIG_MMU
>  	unsigned (*mmap_capabilities)(struct file *);
>  #endif
> -	ssize_t (*copy_file_range)(struct file *, loff_t, struct file *, loff_t, size_t, unsigned int);
> +	ssize_t (*copy_file_range)(struct file *, loff_t, struct file *,
> +			loff_t, size_t, unsigned int);
> +	int (*clone_file_range)(struct file *, loff_t, struct file *, loff_t,
> +			u64);
>  };
>  
>  struct inode_operations {
> @@ -1683,6 +1686,8 @@ extern ssize_t vfs_writev(struct file *, const struct iovec __user *,
>  		unsigned long, loff_t *);
>  extern ssize_t vfs_copy_file_range(struct file *, loff_t , struct file *,
>  				   loff_t, size_t, unsigned int);
> +extern int vfs_clone_file_range(struct file *file_in, loff_t pos_in,
> +		struct file *file_out, loff_t pos_out, u64 len);
>  
>  struct super_operations {
>     	struct inode *(*alloc_inode)(struct super_block *sb);
> diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
> index f15d980..cd5db7f 100644
> --- a/include/uapi/linux/fs.h
> +++ b/include/uapi/linux/fs.h
> @@ -39,6 +39,13 @@
>  #define RENAME_EXCHANGE		(1 << 1)	/* Exchange source and dest */
>  #define RENAME_WHITEOUT		(1 << 2)	/* Whiteout source */
>  
> +struct file_clone_range {
> +	__s64 src_fd;
> +	__u64 src_offset;
> +	__u64 src_length;
> +	__u64 dest_offset;
> +};
> +
>  struct fstrim_range {
>  	__u64 start;
>  	__u64 len;
> @@ -159,6 +166,8 @@ struct inodes_stat_t {
>  #define FIFREEZE	_IOWR('X', 119, int)	/* Freeze */
>  #define FITHAW		_IOWR('X', 120, int)	/* Thaw */
>  #define FITRIM		_IOWR('X', 121, struct fstrim_range)	/* Trim */
> +#define FICLONE		_IOW(0x94, 9, int)
> +#define FICLONERANGE	_IOW(0x94, 13, struct file_clone_range)
>  
>  #define	FS_IOC_GETFLAGS			_IOR('f', 1, long)
>  #define	FS_IOC_SETFLAGS			_IOW('f', 2, long)
> -- 
> 1.9.1
> 
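For reference, a minimal user-space sketch of the two ioctls as this patch defines them in include/uapi/linux/fs.h. The helper names clone_range() and clone_whole() are hypothetical, and the fallback definitions only matter with headers that predate this series. Both ioctls are issued on the destination descriptor, and a zero src_length means "clone to the end of the source", which is what FICLONE requests for the whole file.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/fs.h>

#ifndef FICLONERANGE
/* Headers older than this series: mirror the definitions it adds. */
struct file_clone_range {
	int64_t  src_fd;
	uint64_t src_offset;
	uint64_t src_length;
	uint64_t dest_offset;
};
#define FICLONE		_IOW(0x94, 9, int)
#define FICLONERANGE	_IOW(0x94, 13, struct file_clone_range)
#endif

/*
 * Hypothetical helper: share len bytes of src_fd starting at src_off into
 * dst_fd at dst_off; len == 0 means "up to the end of the source file".
 * Returns 0 on success or a negative errno value. There is no partial
 * success to handle.
 */
static int clone_range(int src_fd, uint64_t src_off, uint64_t len,
		       int dst_fd, uint64_t dst_off)
{
	struct file_clone_range args = {
		.src_fd = src_fd,
		.src_offset = src_off,
		.src_length = len,
		.dest_offset = dst_off,
	};

	/* Issued on the destination; the source fd travels inside args. */
	if (ioctl(dst_fd, FICLONERANGE, &args) < 0)
		return -errno;
	return 0;
}

/* Hypothetical helper: whole-file clone; FICLONE takes the source fd directly. */
static int clone_whole(int src_fd, int dst_fd)
{
	if (ioctl(dst_fd, FICLONE, src_fd) < 0)
		return -errno;
	return 0;
}
```

Per the commit message, a clone either succeeds completely or fails, so unlike copy_file_range(2) there is no short-count loop. Note that in this version vfs_clone_file_range() returns -EBADF, not -EOPNOTSUPP, when the source file system has no ->clone_file_range method, so a caller that falls back to an ordinary copy may want to treat -EBADF on otherwise valid descriptors as "unsupported" too.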

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/4] vfs: pull btrfs clone API to vfs layer
@ 2015-12-09 20:40     ` Darrick J. Wong
  0 siblings, 0 replies; 27+ messages in thread
From: Darrick J. Wong @ 2015-12-09 20:40 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: viro, tao.peng, jeff.layton, bfields, linux-fsdevel, linux-btrfs,
	linux-nfs, linux-cifs

On Thu, Dec 03, 2015 at 12:59:50PM +0100, Christoph Hellwig wrote:
> The btrfs clone ioctls are now adopted by other file systems, with NFS
> and CIFS already having support for them, and XFS being under active
> development.  To avoid growth of various slightly incompatible
> implementations, add one to the VFS.  Note that clones are different from
> file copies in several ways:
> 
>  - they are atomic vs other writers
>  - they support whole file clones
>  - they support 64-bit length clones
>  - they do not allow partial success (aka short writes)
>  - clones are expected to be a fast metadata operation
> 
> Because of that it would be rather cumbersome to try to piggyback them on
> top of the recent copy_file_range infrastructure.  The converse isn't
> true: the copy_file_range system call could try a clone as a first
> attempt to copy, something that further patches will enable.
> 
> Based on earlier work from Peng Tao.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/btrfs/ctree.h        |   3 +-
>  fs/btrfs/file.c         |   1 +
>  fs/btrfs/ioctl.c        |  49 ++-----------------
>  fs/cifs/cifsfs.c        |  63 ++++++++++++++++++++++++
>  fs/cifs/cifsfs.h        |   1 -
>  fs/cifs/ioctl.c         | 126 +++++++++++++++++++++++-------------------------
>  fs/ioctl.c              |  29 +++++++++++

I tried this patch series on ppc64 (with a 32-bit powerpc userland), and I
think the compat ioctl path also needs a fix-up so that FICLONE and
FICLONERANGE reach the VFS:

diff --git a/fs/compat_ioctl.c b/fs/compat_ioctl.c
index dcf2653..70d4b10 100644
--- a/fs/compat_ioctl.c
+++ b/fs/compat_ioctl.c
@@ -1580,6 +1580,10 @@ COMPAT_SYSCALL_DEFINE3(ioctl, unsigned int, fd, unsigned int, cmd,
                goto out_fput;
 #endif
 
+       case FICLONE:
+       case FICLONERANGE:
+               goto do_ioctl;
+
        case FIBMAP:
        case FIGETBSZ:
        case FIONREAD:

--D

>  fs/nfs/nfs4file.c       |  87 ++++-----------------------------
>  fs/read_write.c         |  72 +++++++++++++++++++++++++++
>  include/linux/fs.h      |   7 ++-
>  include/uapi/linux/fs.h |   9 ++++
>  11 files changed, 254 insertions(+), 193 deletions(-)
> 
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index ede7277..dd4733f 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -4025,7 +4025,6 @@ void btrfs_get_block_group_info(struct list_head *groups_list,
>  void update_ioctl_balance_args(struct btrfs_fs_info *fs_info, int lock,
>  			       struct btrfs_ioctl_balance_args *bargs);
>  
> -
>  /* file.c */
>  int btrfs_auto_defrag_init(void);
>  void btrfs_auto_defrag_exit(void);
> @@ -4058,6 +4057,8 @@ int btrfs_fdatawrite_range(struct inode *inode, loff_t start, loff_t end);
>  ssize_t btrfs_copy_file_range(struct file *file_in, loff_t pos_in,
>  			      struct file *file_out, loff_t pos_out,
>  			      size_t len, unsigned int flags);
> +int btrfs_clone_file_range(struct file *file_in, loff_t pos_in,
> +			   struct file *file_out, loff_t pos_out, u64 len);
>  
>  /* tree-defrag.c */
>  int btrfs_defrag_leaves(struct btrfs_trans_handle *trans,
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index e67fe6a..232e300 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -2925,6 +2925,7 @@ const struct file_operations btrfs_file_operations = {
>  	.compat_ioctl	= btrfs_ioctl,
>  #endif
>  	.copy_file_range = btrfs_copy_file_range,
> +	.clone_file_range = btrfs_clone_file_range,
>  };
>  
>  void btrfs_auto_defrag_exit(void)
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index 0f92735..85b1cae 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -3906,49 +3906,10 @@ ssize_t btrfs_copy_file_range(struct file *file_in, loff_t pos_in,
>  	return ret;
>  }
>  
> -static noinline long btrfs_ioctl_clone(struct file *file, unsigned long srcfd,
> -				       u64 off, u64 olen, u64 destoff)
> +int btrfs_clone_file_range(struct file *src_file, loff_t off,
> +		struct file *dst_file, loff_t destoff, u64 len)
>  {
> -	struct fd src_file;
> -	int ret;
> -
> -	/* the destination must be opened for writing */
> -	if (!(file->f_mode & FMODE_WRITE) || (file->f_flags & O_APPEND))
> -		return -EINVAL;
> -
> -	ret = mnt_want_write_file(file);
> -	if (ret)
> -		return ret;
> -
> -	src_file = fdget(srcfd);
> -	if (!src_file.file) {
> -		ret = -EBADF;
> -		goto out_drop_write;
> -	}
> -
> -	/* the src must be open for reading */
> -	if (!(src_file.file->f_mode & FMODE_READ)) {
> -		ret = -EINVAL;
> -		goto out_fput;
> -	}
> -
> -	ret = btrfs_clone_files(file, src_file.file, off, olen, destoff);
> -
> -out_fput:
> -	fdput(src_file);
> -out_drop_write:
> -	mnt_drop_write_file(file);
> -	return ret;
> -}
> -
> -static long btrfs_ioctl_clone_range(struct file *file, void __user *argp)
> -{
> -	struct btrfs_ioctl_clone_range_args args;
> -
> -	if (copy_from_user(&args, argp, sizeof(args)))
> -		return -EFAULT;
> -	return btrfs_ioctl_clone(file, args.src_fd, args.src_offset,
> -				 args.src_length, args.dest_offset);
> +	return btrfs_clone_files(dst_file, src_file, off, len, destoff);
>  }
>  
>  /*
> @@ -5498,10 +5459,6 @@ long btrfs_ioctl(struct file *file, unsigned int
>  		return btrfs_ioctl_dev_info(root, argp);
>  	case BTRFS_IOC_BALANCE:
>  		return btrfs_ioctl_balance(file, NULL);
> -	case BTRFS_IOC_CLONE:
> -		return btrfs_ioctl_clone(file, arg, 0, 0, 0);
> -	case BTRFS_IOC_CLONE_RANGE:
> -		return btrfs_ioctl_clone_range(file, argp);
>  	case BTRFS_IOC_TRANS_START:
>  		return btrfs_ioctl_trans_start(file);
>  	case BTRFS_IOC_TRANS_END:
> diff --git a/fs/cifs/cifsfs.c b/fs/cifs/cifsfs.c
> index cbc0f4b..e9b978f 100644
> --- a/fs/cifs/cifsfs.c
> +++ b/fs/cifs/cifsfs.c
> @@ -914,6 +914,61 @@ const struct inode_operations cifs_symlink_inode_ops = {
>  #endif
>  };
>  
> +static int cifs_clone_file_range(struct file *src_file, loff_t off,
> +		struct file *dst_file, loff_t destoff, u64 len)
> +{
> +	struct inode *src_inode = file_inode(src_file);
> +	struct inode *target_inode = file_inode(dst_file);
> +	struct cifsFileInfo *smb_file_src = src_file->private_data;
> +	struct cifsFileInfo *smb_file_target = dst_file->private_data;
> +	struct cifs_tcon *src_tcon = tlink_tcon(smb_file_src->tlink);
> +	struct cifs_tcon *target_tcon = tlink_tcon(smb_file_target->tlink);
> +	unsigned int xid;
> +	int rc;
> +
> +	cifs_dbg(FYI, "clone range\n");
> +
> +	xid = get_xid();
> +
> +	if (!src_file->private_data || !dst_file->private_data) {
> +		rc = -EBADF;
> +		cifs_dbg(VFS, "missing cifsFileInfo on copy range src file\n");
> +		goto out;
> +	}
> +
> +	/*
> +	 * Note: cifs case is easier than btrfs since server responsible for
> +	 * checks for proper open modes and file type and if it wants
> +	 * server could even support copy of range where source = target
> +	 */
> +	lock_two_nondirectories(target_inode, src_inode);
> +
> +	if (len == 0)
> +		len = src_inode->i_size - off;
> +
> +	cifs_dbg(FYI, "about to flush pages\n");
> +	/* should we flush first and last page first */
> +	truncate_inode_pages_range(&target_inode->i_data, destoff,
> +				   PAGE_CACHE_ALIGN(destoff + len)-1);
> +
> +	if (target_tcon->ses->server->ops->duplicate_extents)
> +		rc = target_tcon->ses->server->ops->duplicate_extents(xid,
> +			smb_file_src, smb_file_target, off, len, destoff);
> +	else
> +		rc = -EOPNOTSUPP;
> +
> +	/* force revalidate of size and timestamps of target file now
> +	   that target is updated on the server */
> +	CIFS_I(target_inode)->time = 0;
> +out_unlock:
> +	/* although unlocking in the reverse order from locking is not
> +	   strictly necessary here it is a little cleaner to be consistent */
> +	unlock_two_nondirectories(src_inode, target_inode);
> +out:
> +	free_xid(xid);
> +	return rc;
> +}
> +
>  const struct file_operations cifs_file_ops = {
>  	.read_iter = cifs_loose_read_iter,
>  	.write_iter = cifs_file_write_iter,
> @@ -926,6 +981,7 @@ const struct file_operations cifs_file_ops = {
>  	.splice_read = generic_file_splice_read,
>  	.llseek = cifs_llseek,
>  	.unlocked_ioctl	= cifs_ioctl,
> +	.clone_file_range = cifs_clone_file_range,
>  	.setlease = cifs_setlease,
>  	.fallocate = cifs_fallocate,
>  };
> @@ -942,6 +998,8 @@ const struct file_operations cifs_file_strict_ops = {
>  	.splice_read = generic_file_splice_read,
>  	.llseek = cifs_llseek,
>  	.unlocked_ioctl	= cifs_ioctl,
> +	.clone_file_range = cifs_clone_file_range,
> +	.clone_file_range = cifs_clone_file_range,
>  	.setlease = cifs_setlease,
>  	.fallocate = cifs_fallocate,
>  };
> @@ -958,6 +1016,7 @@ const struct file_operations cifs_file_direct_ops = {
>  	.mmap = cifs_file_mmap,
>  	.splice_read = generic_file_splice_read,
>  	.unlocked_ioctl  = cifs_ioctl,
> +	.clone_file_range = cifs_clone_file_range,
>  	.llseek = cifs_llseek,
>  	.setlease = cifs_setlease,
>  	.fallocate = cifs_fallocate,
> @@ -974,6 +1033,7 @@ const struct file_operations cifs_file_nobrl_ops = {
>  	.splice_read = generic_file_splice_read,
>  	.llseek = cifs_llseek,
>  	.unlocked_ioctl	= cifs_ioctl,
> +	.clone_file_range = cifs_clone_file_range,
>  	.setlease = cifs_setlease,
>  	.fallocate = cifs_fallocate,
>  };
> @@ -989,6 +1049,7 @@ const struct file_operations cifs_file_strict_nobrl_ops = {
>  	.splice_read = generic_file_splice_read,
>  	.llseek = cifs_llseek,
>  	.unlocked_ioctl	= cifs_ioctl,
> +	.clone_file_range = cifs_clone_file_range,
>  	.setlease = cifs_setlease,
>  	.fallocate = cifs_fallocate,
>  };
> @@ -1004,6 +1065,7 @@ const struct file_operations cifs_file_direct_nobrl_ops = {
>  	.mmap = cifs_file_mmap,
>  	.splice_read = generic_file_splice_read,
>  	.unlocked_ioctl  = cifs_ioctl,
> +	.clone_file_range = cifs_clone_file_range,
>  	.llseek = cifs_llseek,
>  	.setlease = cifs_setlease,
>  	.fallocate = cifs_fallocate,
> @@ -1014,6 +1076,7 @@ const struct file_operations cifs_dir_ops = {
>  	.release = cifs_closedir,
>  	.read    = generic_read_dir,
>  	.unlocked_ioctl  = cifs_ioctl,
> +	.clone_file_range = cifs_clone_file_range,
>  	.llseek = generic_file_llseek,
>  };
>  
> diff --git a/fs/cifs/cifsfs.h b/fs/cifs/cifsfs.h
> index c3cc160..c399513 100644
> --- a/fs/cifs/cifsfs.h
> +++ b/fs/cifs/cifsfs.h
> @@ -131,7 +131,6 @@ extern int	cifs_setxattr(struct dentry *, const char *, const void *,
>  extern ssize_t	cifs_getxattr(struct dentry *, const char *, void *, size_t);
>  extern ssize_t	cifs_listxattr(struct dentry *, char *, size_t);
>  extern long cifs_ioctl(struct file *filep, unsigned int cmd, unsigned long arg);
> -
>  #ifdef CONFIG_CIFS_NFSD_EXPORT
>  extern const struct export_operations cifs_export_ops;
>  #endif /* CONFIG_CIFS_NFSD_EXPORT */
> diff --git a/fs/cifs/ioctl.c b/fs/cifs/ioctl.c
> index 35cf990..7a3b84e 100644
> --- a/fs/cifs/ioctl.c
> +++ b/fs/cifs/ioctl.c
> @@ -34,73 +34,36 @@
>  #include "cifs_ioctl.h"
>  #include <linux/btrfs.h>
>  
> -static long cifs_ioctl_clone(unsigned int xid, struct file *dst_file,
> -			unsigned long srcfd, u64 off, u64 len, u64 destoff,
> -			bool dup_extents)
> +static int cifs_file_clone_range(unsigned int xid, struct file *src_file,
> +			  struct file *dst_file)
>  {
> -	int rc;
> -	struct cifsFileInfo *smb_file_target = dst_file->private_data;
> +	struct inode *src_inode = file_inode(src_file);
>  	struct inode *target_inode = file_inode(dst_file);
> -	struct cifs_tcon *target_tcon;
> -	struct fd src_file;
>  	struct cifsFileInfo *smb_file_src;
> -	struct inode *src_inode;
> +	struct cifsFileInfo *smb_file_target;
>  	struct cifs_tcon *src_tcon;
> +	struct cifs_tcon *target_tcon;
> +	int rc;
>  
>  	cifs_dbg(FYI, "ioctl clone range\n");
> -	/* the destination must be opened for writing */
> -	if (!(dst_file->f_mode & FMODE_WRITE)) {
> -		cifs_dbg(FYI, "file target not open for write\n");
> -		return -EINVAL;
> -	}
>  
> -	/* check if target volume is readonly and take reference */
> -	rc = mnt_want_write_file(dst_file);
> -	if (rc) {
> -		cifs_dbg(FYI, "mnt_want_write failed with rc %d\n", rc);
> -		return rc;
> -	}
> -
> -	src_file = fdget(srcfd);
> -	if (!src_file.file) {
> -		rc = -EBADF;
> -		goto out_drop_write;
> -	}
> -
> -	if (src_file.file->f_op->unlocked_ioctl != cifs_ioctl) {
> -		rc = -EBADF;
> -		cifs_dbg(VFS, "src file seems to be from a different filesystem type\n");
> -		goto out_fput;
> -	}
> -
> -	if ((!src_file.file->private_data) || (!dst_file->private_data)) {
> +	if (!src_file->private_data || !dst_file->private_data) {
>  		rc = -EBADF;
>  		cifs_dbg(VFS, "missing cifsFileInfo on copy range src file\n");
> -		goto out_fput;
> +		goto out;
>  	}
>  
>  	rc = -EXDEV;
>  	smb_file_target = dst_file->private_data;
> -	smb_file_src = src_file.file->private_data;
> +	smb_file_src = src_file->private_data;
>  	src_tcon = tlink_tcon(smb_file_src->tlink);
>  	target_tcon = tlink_tcon(smb_file_target->tlink);
>  
> -	/* check source and target on same server (or volume if dup_extents) */
> -	if (dup_extents && (src_tcon != target_tcon)) {
> -		cifs_dbg(VFS, "source and target of copy not on same share\n");
> -		goto out_fput;
> -	}
> -
> -	if (!dup_extents && (src_tcon->ses != target_tcon->ses)) {
> +	if (src_tcon->ses != target_tcon->ses) {
>  		cifs_dbg(VFS, "source and target of copy not on same server\n");
> -		goto out_fput;
> +		goto out;
>  	}
>  
> -	src_inode = file_inode(src_file.file);
> -	rc = -EINVAL;
> -	if (S_ISDIR(src_inode->i_mode))
> -		goto out_fput;
> -
>  	/*
>  	 * Note: cifs case is easier than btrfs since server responsible for
>  	 * checks for proper open modes and file type and if it wants
> @@ -108,34 +71,66 @@ static long cifs_ioctl_clone(unsigned int xid, struct file *dst_file,
>  	 */
>  	lock_two_nondirectories(target_inode, src_inode);
>  
> -	/* determine range to clone */
> -	rc = -EINVAL;
> -	if (off + len > src_inode->i_size || off + len < off)
> -		goto out_unlock;
> -	if (len == 0)
> -		len = src_inode->i_size - off;
> -
>  	cifs_dbg(FYI, "about to flush pages\n");
>  	/* should we flush first and last page first */
> -	truncate_inode_pages_range(&target_inode->i_data, destoff,
> -				   PAGE_CACHE_ALIGN(destoff + len)-1);
> +	truncate_inode_pages(&target_inode->i_data, 0);
>  
> -	if (dup_extents && target_tcon->ses->server->ops->duplicate_extents)
> -		rc = target_tcon->ses->server->ops->duplicate_extents(xid,
> -			smb_file_src, smb_file_target, off, len, destoff);
> -	else if (!dup_extents && target_tcon->ses->server->ops->clone_range)
> +	if (target_tcon->ses->server->ops->clone_range)
>  		rc = target_tcon->ses->server->ops->clone_range(xid,
> -			smb_file_src, smb_file_target, off, len, destoff);
> +			smb_file_src, smb_file_target, 0, src_inode->i_size, 0);
>  	else
>  		rc = -EOPNOTSUPP;
>  
>  	/* force revalidate of size and timestamps of target file now
>  	   that target is updated on the server */
>  	CIFS_I(target_inode)->time = 0;
> -out_unlock:
>  	/* although unlocking in the reverse order from locking is not
>  	   strictly necessary here it is a little cleaner to be consistent */
>  	unlock_two_nondirectories(src_inode, target_inode);
> +out:
> +	return rc;
> +}
> +
> +static long cifs_ioctl_clone(unsigned int xid, struct file *dst_file,
> +			unsigned long srcfd)
> +{
> +	int rc;
> +	struct fd src_file;
> +	struct inode *src_inode;
> +
> +	cifs_dbg(FYI, "ioctl clone range\n");
> +	/* the destination must be opened for writing */
> +	if (!(dst_file->f_mode & FMODE_WRITE)) {
> +		cifs_dbg(FYI, "file target not open for write\n");
> +		return -EINVAL;
> +	}
> +
> +	/* check if target volume is readonly and take reference */
> +	rc = mnt_want_write_file(dst_file);
> +	if (rc) {
> +		cifs_dbg(FYI, "mnt_want_write failed with rc %d\n", rc);
> +		return rc;
> +	}
> +
> +	src_file = fdget(srcfd);
> +	if (!src_file.file) {
> +		rc = -EBADF;
> +		goto out_drop_write;
> +	}
> +
> +	if (src_file.file->f_op->unlocked_ioctl != cifs_ioctl) {
> +		rc = -EBADF;
> +		cifs_dbg(VFS, "src file seems to be from a different filesystem type\n");
> +		goto out_fput;
> +	}
> +
> +	src_inode = file_inode(src_file.file);
> +	rc = -EINVAL;
> +	if (S_ISDIR(src_inode->i_mode))
> +		goto out_fput;
> +
> +	rc = cifs_file_clone_range(xid, src_file.file, dst_file);
> +
>  out_fput:
>  	fdput(src_file);
>  out_drop_write:
> @@ -256,10 +251,7 @@ long cifs_ioctl(struct file *filep, unsigned int command, unsigned long arg)
>  			}
>  			break;
>  		case CIFS_IOC_COPYCHUNK_FILE:
> -			rc = cifs_ioctl_clone(xid, filep, arg, 0, 0, 0, false);
> -			break;
> -		case BTRFS_IOC_CLONE:
> -			rc = cifs_ioctl_clone(xid, filep, arg, 0, 0, 0, true);
> +			rc = cifs_ioctl_clone(xid, filep, arg);
>  			break;
>  		case CIFS_IOC_SET_INTEGRITY:
>  			if (pSMBFile == NULL)
> diff --git a/fs/ioctl.c b/fs/ioctl.c
> index 5d01d26..84c6e79 100644
> --- a/fs/ioctl.c
> +++ b/fs/ioctl.c
> @@ -215,6 +215,29 @@ static int ioctl_fiemap(struct file *filp, unsigned long arg)
>  	return error;
>  }
>  
> +static long ioctl_file_clone(struct file *dst_file, unsigned long srcfd,
> +			     u64 off, u64 olen, u64 destoff)
> +{
> +	struct fd src_file = fdget(srcfd);
> +	int ret;
> +
> +	if (!src_file.file)
> +		return -EBADF;
> +	ret = vfs_clone_file_range(src_file.file, off, dst_file, destoff, olen);
> +	fdput(src_file);
> +	return ret;
> +}
> +
> +static long ioctl_file_clone_range(struct file *file, void __user *argp)
> +{
> +	struct file_clone_range args;
> +
> +	if (copy_from_user(&args, argp, sizeof(args)))
> +		return -EFAULT;
> +	return ioctl_file_clone(file, args.src_fd, args.src_offset,
> +				args.src_length, args.dest_offset);
> +}
> +
>  #ifdef CONFIG_BLOCK
>  
>  static inline sector_t logical_to_blk(struct inode *inode, loff_t offset)
> @@ -600,6 +623,12 @@ int do_vfs_ioctl(struct file *filp, unsigned int fd, unsigned int cmd,
>  	case FIGETBSZ:
>  		return put_user(inode->i_sb->s_blocksize, argp);
>  
> +	case FICLONE:
> +		return ioctl_file_clone(filp, arg, 0, 0, 0);
> +
> +	case FICLONERANGE:
> +		return ioctl_file_clone_range(filp, argp);
> +
>  	default:
>  		if (S_ISREG(inode->i_mode))
>  			error = file_ioctl(filp, cmd, arg);
> diff --git a/fs/nfs/nfs4file.c b/fs/nfs/nfs4file.c
> index db9b5fe..26f9a23 100644
> --- a/fs/nfs/nfs4file.c
> +++ b/fs/nfs/nfs4file.c
> @@ -195,65 +195,27 @@ static long nfs42_fallocate(struct file *filep, int mode, loff_t offset, loff_t
>  	return nfs42_proc_allocate(filep, offset, len);
>  }
>  
> -static noinline long
> -nfs42_ioctl_clone(struct file *dst_file, unsigned long srcfd,
> -		  u64 src_off, u64 dst_off, u64 count)
> +static int nfs42_clone_file_range(struct file *src_file, loff_t src_off,
> +		struct file *dst_file, loff_t dst_off, u64 count)
>  {
>  	struct inode *dst_inode = file_inode(dst_file);
>  	struct nfs_server *server = NFS_SERVER(dst_inode);
> -	struct fd src_file;
> -	struct inode *src_inode;
> +	struct inode *src_inode = file_inode(src_file);
>  	unsigned int bs = server->clone_blksize;
>  	bool same_inode = false;
>  	int ret;
>  
> -	/* dst file must be opened for writing */
> -	if (!(dst_file->f_mode & FMODE_WRITE))
> -		return -EINVAL;
> -
> -	ret = mnt_want_write_file(dst_file);
> -	if (ret)
> -		return ret;
> -
> -	src_file = fdget(srcfd);
> -	if (!src_file.file) {
> -		ret = -EBADF;
> -		goto out_drop_write;
> -	}
> -
> -	src_inode = file_inode(src_file.file);
> -
> -	if (src_inode == dst_inode)
> -		same_inode = true;
> -
> -	/* src file must be opened for reading */
> -	if (!(src_file.file->f_mode & FMODE_READ))
> -		goto out_fput;
> -
> -	/* src and dst must be regular files */
> -	ret = -EISDIR;
> -	if (!S_ISREG(src_inode->i_mode) || !S_ISREG(dst_inode->i_mode))
> -		goto out_fput;
> -
> -	ret = -EXDEV;
> -	if (src_file.file->f_path.mnt != dst_file->f_path.mnt ||
> -	    src_inode->i_sb != dst_inode->i_sb)
> -		goto out_fput;
> -
>  	/* check alignment w.r.t. clone_blksize */
>  	ret = -EINVAL;
>  	if (bs) {
>  		if (!IS_ALIGNED(src_off, bs) || !IS_ALIGNED(dst_off, bs))
> -			goto out_fput;
> +			goto out;
>  		if (!IS_ALIGNED(count, bs) && i_size_read(src_inode) != (src_off + count))
> -			goto out_fput;
> +			goto out;
>  	}
>  
> -	/* verify if ranges are overlapped within the same file */
> -	if (same_inode) {
> -		if (dst_off + count > src_off && dst_off < src_off + count)
> -			goto out_fput;
> -	}
> +	if (src_inode == dst_inode)
> +		same_inode = true;
>  
>  	/* XXX: do we lock at all? what if server needs CB_RECALL_LAYOUT? */
>  	if (same_inode) {
> @@ -275,7 +237,7 @@ nfs42_ioctl_clone(struct file *dst_file, unsigned long srcfd,
>  	if (ret)
>  		goto out_unlock;
>  
> -	ret = nfs42_proc_clone(src_file.file, dst_file, src_off, dst_off, count);
> +	ret = nfs42_proc_clone(src_file, dst_file, src_off, dst_off, count);
>  
>  	/* truncate inode page cache of the dst range so that future reads can fetch
>  	 * new data from server */
> @@ -292,37 +254,9 @@ out_unlock:
>  		mutex_unlock(&dst_inode->i_mutex);
>  		mutex_unlock(&src_inode->i_mutex);
>  	}
> -out_fput:
> -	fdput(src_file);
> -out_drop_write:
> -	mnt_drop_write_file(dst_file);
> +out:
>  	return ret;
>  }
> -
> -static long nfs42_ioctl_clone_range(struct file *dst_file, void __user *argp)
> -{
> -	struct btrfs_ioctl_clone_range_args args;
> -
> -	if (copy_from_user(&args, argp, sizeof(args)))
> -		return -EFAULT;
> -
> -	return nfs42_ioctl_clone(dst_file, args.src_fd, args.src_offset,
> -				 args.dest_offset, args.src_length);
> -}
> -
> -long nfs4_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
> -{
> -	void __user *argp = (void __user *)arg;
> -
> -	switch (cmd) {
> -	case BTRFS_IOC_CLONE:
> -		return nfs42_ioctl_clone(file, arg, 0, 0, 0);
> -	case BTRFS_IOC_CLONE_RANGE:
> -		return nfs42_ioctl_clone_range(file, argp);
> -	}
> -
> -	return -ENOTTY;
> -}
>  #endif /* CONFIG_NFS_V4_2 */
>  
>  const struct file_operations nfs4_file_operations = {
> @@ -342,8 +276,7 @@ const struct file_operations nfs4_file_operations = {
>  #ifdef CONFIG_NFS_V4_2
>  	.llseek		= nfs4_file_llseek,
>  	.fallocate	= nfs42_fallocate,
> -	.unlocked_ioctl = nfs4_ioctl,
> -	.compat_ioctl	= nfs4_ioctl,
> +	.clone_file_range = nfs42_clone_file_range,
>  #else
>  	.llseek		= nfs_file_llseek,
>  #endif
> diff --git a/fs/read_write.c b/fs/read_write.c
> index 6c1aa73..9e3dd8f 100644
> --- a/fs/read_write.c
> +++ b/fs/read_write.c
> @@ -1451,3 +1451,75 @@ out1:
>  out2:
>  	return ret;
>  }
> +
> +static int clone_verify_area(struct file *file, loff_t pos, u64 len, bool write)
> +{
> +	struct inode *inode = file_inode(file);
> +
> +	if (unlikely(pos < 0))
> +		return -EINVAL;
> +
> +	 if (unlikely((loff_t) (pos + len) < 0))
> +		return -EINVAL;
> +
> +	if (unlikely(inode->i_flctx && mandatory_lock(inode))) {
> +		loff_t end = len ? pos + len - 1 : OFFSET_MAX;
> +		int retval;
> +
> +		retval = locks_mandatory_area(file, pos, end,
> +				write ? F_WRLCK : F_RDLCK);
> +		if (retval < 0)
> +			return retval;
> +	}
> +
> +	return security_file_permission(file, write ? MAY_WRITE : MAY_READ);
> +}
> +
> +int vfs_clone_file_range(struct file *file_in, loff_t pos_in,
> +		struct file *file_out, loff_t pos_out, u64 len)
> +{
> +	struct inode *inode_in = file_inode(file_in);
> +	struct inode *inode_out = file_inode(file_out);
> +	int ret;
> +
> +	if (inode_in->i_sb != inode_out->i_sb ||
> +	    file_in->f_path.mnt != file_out->f_path.mnt)
> +		return -EXDEV;
> +
> +	if (S_ISDIR(inode_in->i_mode) || S_ISDIR(inode_out->i_mode))
> +		return -EISDIR;
> +	if (!S_ISREG(inode_in->i_mode) || !S_ISREG(inode_out->i_mode))
> +		return -EOPNOTSUPP;
> +
> +	if (!(file_in->f_mode & FMODE_READ) ||
> +	    !(file_out->f_mode & FMODE_WRITE) ||
> +	    (file_out->f_flags & O_APPEND) ||
> +	    !file_in->f_op->clone_file_range)
> +		return -EBADF;
> +
> +	ret = clone_verify_area(file_in, pos_in, len, false);
> +	if (ret)
> +		return ret;
> +
> +	ret = clone_verify_area(file_out, pos_out, len, true);
> +	if (ret)
> +		return ret;
> +
> +	if (pos_in + len > i_size_read(inode_in))
> +		return -EINVAL;
> +
> +	ret = mnt_want_write_file(file_out);
> +	if (ret)
> +		return ret;
> +
> +	ret = file_in->f_op->clone_file_range(file_in, pos_in,
> +			file_out, pos_out, len);
> +	if (!ret) {
> +		fsnotify_access(file_in);
> +		fsnotify_modify(file_out);
> +	}
> +
> +	mnt_drop_write_file(file_out);
> +	return ret;
> +}
> +EXPORT_SYMBOL(vfs_clone_file_range);
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index af559ac..59bf96d 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -1629,7 +1629,10 @@ struct file_operations {
>  #ifndef CONFIG_MMU
>  	unsigned (*mmap_capabilities)(struct file *);
>  #endif
> -	ssize_t (*copy_file_range)(struct file *, loff_t, struct file *, loff_t, size_t, unsigned int);
> +	ssize_t (*copy_file_range)(struct file *, loff_t, struct file *,
> +			loff_t, size_t, unsigned int);
> +	int (*clone_file_range)(struct file *, loff_t, struct file *, loff_t,
> +			u64);
>  };
>  
>  struct inode_operations {
> @@ -1683,6 +1686,8 @@ extern ssize_t vfs_writev(struct file *, const struct iovec __user *,
>  		unsigned long, loff_t *);
>  extern ssize_t vfs_copy_file_range(struct file *, loff_t , struct file *,
>  				   loff_t, size_t, unsigned int);
> +extern int vfs_clone_file_range(struct file *file_in, loff_t pos_in,
> +		struct file *file_out, loff_t pos_out, u64 len);
>  
>  struct super_operations {
>     	struct inode *(*alloc_inode)(struct super_block *sb);
> diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
> index f15d980..cd5db7f 100644
> --- a/include/uapi/linux/fs.h
> +++ b/include/uapi/linux/fs.h
> @@ -39,6 +39,13 @@
>  #define RENAME_EXCHANGE		(1 << 1)	/* Exchange source and dest */
>  #define RENAME_WHITEOUT		(1 << 2)	/* Whiteout source */
>  
> +struct file_clone_range {
> +	__s64 src_fd;
> +	__u64 src_offset;
> +	__u64 src_length;
> +	__u64 dest_offset;
> +};
> +
>  struct fstrim_range {
>  	__u64 start;
>  	__u64 len;
> @@ -159,6 +166,8 @@ struct inodes_stat_t {
>  #define FIFREEZE	_IOWR('X', 119, int)	/* Freeze */
>  #define FITHAW		_IOWR('X', 120, int)	/* Thaw */
>  #define FITRIM		_IOWR('X', 121, struct fstrim_range)	/* Trim */
> +#define FICLONE		_IOW(0x94, 9, int)
> +#define FICLONERANGE	_IOW(0x94, 13, struct file_clone_range)
>  
>  #define	FS_IOC_GETFLAGS			_IOR('f', 1, long)
>  #define	FS_IOC_SETFLAGS			_IOW('f', 2, long)
> -- 
> 1.9.1
> 

^ permalink raw reply related	[flat|nested] 27+ messages in thread
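
The range checks that clone_verify_area() and vfs_clone_file_range() apply in the patch above can be modelled in a few lines of plain C. This is a hypothetical userspace sketch, not kernel code. clone_check_range(), clone_check_src_size() and SKETCH_OFFSET_MAX are names invented here, with the last standing in for the kernel's OFFSET_MAX. The overflow test is written as an explicit bound instead of the patch's (loff_t)(pos + len) < 0 cast, so it also rejects lengths large enough to wrap the unsigned sum back into the non-negative range.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's OFFSET_MAX (the largest loff_t). */
#define SKETCH_OFFSET_MAX INT64_MAX

/*
 * Model of clone_verify_area()'s range checks for one side of a clone.
 * pos is a signed file offset (loff_t in the kernel), len an unsigned
 * length (u64); len == 0 means "up to the end of the file".
 * On success returns 0 and stores in *end the last byte that the
 * mandatory-lock check would cover.
 */
static int clone_check_range(int64_t pos, uint64_t len, int64_t *end)
{
	if (pos < 0)
		return -EINVAL;
	/* Explicit bound: pos + len must still fit in a signed offset. */
	if (len > (uint64_t)(SKETCH_OFFSET_MAX - pos))
		return -EINVAL;
	*end = len ? pos + (int64_t)len - 1 : SKETCH_OFFSET_MAX;
	return 0;
}

/*
 * Model of the extra source-side check in vfs_clone_file_range():
 * the requested range must not extend past the source's i_size.
 */
static int clone_check_src_size(int64_t pos_in, uint64_t len, int64_t i_size)
{
	/* Precondition: clone_check_range() already accepted pos_in/len. */
	assert(pos_in >= 0 && len <= (uint64_t)(SKETCH_OFFSET_MAX - pos_in));
	if ((uint64_t)pos_in + len > (uint64_t)i_size)
		return -EINVAL;
	return 0;
}
```

A len of 0 requests a clone up to the end of the source file, which is why the lock range is extended to SKETCH_OFFSET_MAX and why the i_size check lets it through as long as pos_in does not lie beyond EOF.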

* Re: [PATCH 2/4] vfs: pull btrfs clone API to vfs layer
@ 2015-12-14 16:34       ` Christoph Hellwig
  0 siblings, 0 replies; 27+ messages in thread
From: Christoph Hellwig @ 2015-12-14 16:34 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, viro, tao.peng, jeff.layton, bfields,
	linux-fsdevel, linux-btrfs, linux-nfs, linux-cifs

On Wed, Dec 09, 2015 at 12:40:33PM -0800, Darrick J. Wong wrote:
> I tried this patch series on ppc64 (w/ 32-bit powerpc userland) and I think
> it needs to fix up the compat ioctl to make the vfs call...

Might need a proper signoff for Al, unless he wants to directly fold it..

^ permalink raw reply	[flat|nested] 27+ messages in thread
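
Darrick's report concerns 32-bit userland calling into a 64-bit kernel. FICLONE carries the source fd directly in the ioctl argument, and struct file_clone_range is made up only of 64-bit fields, so its layout should be identical for both kinds of caller; since _IOW() encodes the struct size into the command number, the FICLONERANGE value should match as well. That is presumably why the compat fix can simply route both commands to do_ioctl. The sketch below checks the layout at compile time on a local copy of the struct; struct sketch_file_clone_range and sketch_make_range() are hypothetical names, and <stdint.h> types replace the kernel's __s64/__u64.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local copy of the uapi struct file_clone_range added by the patch. */
struct sketch_file_clone_range {
	int64_t  src_fd;
	uint64_t src_offset;
	uint64_t src_length;
	uint64_t dest_offset;
};

/*
 * Every member is 8 bytes wide, so these offsets hold whether the ABI
 * aligns 64-bit integers to 8 bytes (x86-64, ppc64) or only to 4 bytes
 * (i386): there is no padding for a compat handler to translate.
 */
_Static_assert(offsetof(struct sketch_file_clone_range, src_fd) == 0, "src_fd");
_Static_assert(offsetof(struct sketch_file_clone_range, src_offset) == 8, "src_offset");
_Static_assert(offsetof(struct sketch_file_clone_range, src_length) == 16, "src_length");
_Static_assert(offsetof(struct sketch_file_clone_range, dest_offset) == 24, "dest_offset");
_Static_assert(sizeof(struct sketch_file_clone_range) == 32, "no tail padding");

/* Hypothetical helper: fill in a request the way a FICLONERANGE caller would. */
static struct sketch_file_clone_range
sketch_make_range(int src_fd, uint64_t src_off, uint64_t len, uint64_t dst_off)
{
	struct sketch_file_clone_range r;

	assert(src_fd >= 0);	/* a negative fd can never be valid */
	r.src_fd = src_fd;
	r.src_offset = src_off;
	r.src_length = len;	/* 0 means "up to the end of the source" */
	r.dest_offset = dst_off;
	return r;
}
```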

* [PATCH 5/4] vfs: return EINVAL for unsupported file types in clone
@ 2015-12-14 16:34       ` Christoph Hellwig
  0 siblings, 0 replies; 27+ messages in thread
From: Christoph Hellwig @ 2015-12-14 16:34 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, viro, tao.peng, jeff.layton, bfields,
	linux-fsdevel, linux-btrfs, linux-nfs, linux-cifs

Signed-off-by: Christoph Hellwig <hch@lst.de>

diff --git a/fs/read_write.c b/fs/read_write.c
index 1f0d3f1..6268ebc 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1528,7 +1528,7 @@ int vfs_clone_file_range(struct file *file_in, loff_t pos_in,
 	if (S_ISDIR(inode_in->i_mode) || S_ISDIR(inode_out->i_mode))
 		return -EISDIR;
 	if (!S_ISREG(inode_in->i_mode) || !S_ISREG(inode_out->i_mode))
-		return -EOPNOTSUPP;
+		return -EINVAL;
 
 	if (!(file_in->f_mode & FMODE_READ) ||
 	    !(file_out->f_mode & FMODE_WRITE) ||

^ permalink raw reply related	[flat|nested] 27+ messages in thread
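
With this change applied, the early checks in vfs_clone_file_range() produce the errors below, in this order. The model is a hypothetical userspace sketch: struct sketch_file, enum sketch_ftype and sketch_clone_precheck() are invented names, and integer ids stand in for the superblock and vfsmount pointers that the real code compares.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified view of the fields the VFS checks look at. */
enum sketch_ftype { SKETCH_REG, SKETCH_DIR, SKETCH_OTHER };

struct sketch_file {
	int sb_id;              /* stands in for inode->i_sb */
	int mnt_id;             /* stands in for f_path.mnt */
	enum sketch_ftype type;
	bool readable;          /* FMODE_READ */
	bool writable;          /* FMODE_WRITE */
	bool append;            /* O_APPEND */
	bool has_clone_op;      /* f_op->clone_file_range != NULL */
};

/* Mirrors the order of the early checks with PATCH 5/4 applied. */
static int sketch_clone_precheck(const struct sketch_file *in,
				 const struct sketch_file *out)
{
	assert(in && out);
	if (in->sb_id != out->sb_id || in->mnt_id != out->mnt_id)
		return -EXDEV;
	if (in->type == SKETCH_DIR || out->type == SKETCH_DIR)
		return -EISDIR;
	if (in->type != SKETCH_REG || out->type != SKETCH_REG)
		return -EINVAL;	/* was -EOPNOTSUPP before this patch */
	if (!in->readable || !out->writable || out->append ||
	    !in->has_clone_op)
		return -EBADF;
	return 0;
}
```

A directory still reports EISDIR because that test runs before the generic non-regular-file check; only other types such as FIFOs or device nodes see the new EINVAL.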

* Re: [PATCH 2/4] vfs: pull btrfs clone API to vfs layer
@ 2015-12-14 17:08       ` Darrick J. Wong
  0 siblings, 0 replies; 27+ messages in thread
From: Darrick J. Wong @ 2015-12-14 17:08 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: viro, tao.peng, jeff.layton, bfields, linux-fsdevel, linux-btrfs,
	linux-nfs, linux-cifs

On Wed, Dec 09, 2015 at 12:40:33PM -0800, Darrick J. Wong wrote:
> On Thu, Dec 03, 2015 at 12:59:50PM +0100, Christoph Hellwig wrote:
> > The btrfs clone ioctls are now adopted by other file systems, with NFS
> > and CIFS already having support for them, and XFS being under active
> > development.  To avoid growth of various slightly incompatible
> > implementations, add one to the VFS.  Note that clones are different from
> > file copies in several ways:
> > 
> >  - they are atomic vs other writers
> >  - they support whole file clones
> >  - they support 64-bit length clones
> >  - they do not allow partial success (aka short writes)
> >  - clones are expected to be a fast metadata operation
> > 
> > Because of that it would be rather cumbersome to try to piggyback them on
> > top of the recent copy_file_range infrastructure.  The converse isn't
> > true and the copy_file_range system call could try clone file range as
> > a first attempt to copy, something that further patches will enable.
> > 
> > Based on earlier work from Peng Tao.
> > 
> > Signed-off-by: Christoph Hellwig <hch@lst.de>
> > ---
> >  fs/btrfs/ctree.h        |   3 +-
> >  fs/btrfs/file.c         |   1 +
> >  fs/btrfs/ioctl.c        |  49 ++-----------------
> >  fs/cifs/cifsfs.c        |  63 ++++++++++++++++++++++++
> >  fs/cifs/cifsfs.h        |   1 -
> >  fs/cifs/ioctl.c         | 126 +++++++++++++++++++++++-------------------------
> >  fs/ioctl.c              |  29 +++++++++++
> 
> I tried this patch series on ppc64 (w/ 32-bit powerpc userland) and I think
> it needs to fix up the compat ioctl to make the vfs call...

Bah, forgot to add:
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

(Feel free to fold this three-line chunk into the original patch...)

--D

> diff --git a/fs/compat_ioctl.c b/fs/compat_ioctl.c
> index dcf2653..70d4b10 100644
> --- a/fs/compat_ioctl.c
> +++ b/fs/compat_ioctl.c
> @@ -1580,6 +1580,10 @@ COMPAT_SYSCALL_DEFINE3(ioctl, unsigned int, fd, unsigned int, cmd,
>                 goto out_fput;
>  #endif
>  
> +       case FICLONE:
> +       case FICLONERANGE:
> +               goto do_ioctl;
> +
>         case FIBMAP:
>         case FIGETBSZ:
>         case FIONREAD:
> 
> --D
> 
> >  fs/nfs/nfs4file.c       |  87 ++++-----------------------------
> >  fs/read_write.c         |  72 +++++++++++++++++++++++++++
> >  include/linux/fs.h      |   7 ++-
> >  include/uapi/linux/fs.h |   9 ++++
> >  11 files changed, 254 insertions(+), 193 deletions(-)
> > 
> > diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> > index ede7277..dd4733f 100644
> > --- a/fs/btrfs/ctree.h
> > +++ b/fs/btrfs/ctree.h
> > @@ -4025,7 +4025,6 @@ void btrfs_get_block_group_info(struct list_head *groups_list,
> >  void update_ioctl_balance_args(struct btrfs_fs_info *fs_info, int lock,
> >  			       struct btrfs_ioctl_balance_args *bargs);
> >  
> > -
> >  /* file.c */
> >  int btrfs_auto_defrag_init(void);
> >  void btrfs_auto_defrag_exit(void);
> > @@ -4058,6 +4057,8 @@ int btrfs_fdatawrite_range(struct inode *inode, loff_t start, loff_t end);
> >  ssize_t btrfs_copy_file_range(struct file *file_in, loff_t pos_in,
> >  			      struct file *file_out, loff_t pos_out,
> >  			      size_t len, unsigned int flags);
> > +int btrfs_clone_file_range(struct file *file_in, loff_t pos_in,
> > +			   struct file *file_out, loff_t pos_out, u64 len);
> >  
> >  /* tree-defrag.c */
> >  int btrfs_defrag_leaves(struct btrfs_trans_handle *trans,
> > diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> > index e67fe6a..232e300 100644
> > --- a/fs/btrfs/file.c
> > +++ b/fs/btrfs/file.c
> > @@ -2925,6 +2925,7 @@ const struct file_operations btrfs_file_operations = {
> >  	.compat_ioctl	= btrfs_ioctl,
> >  #endif
> >  	.copy_file_range = btrfs_copy_file_range,
> > +	.clone_file_range = btrfs_clone_file_range,
> >  };
> >  
> >  void btrfs_auto_defrag_exit(void)
> > diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> > index 0f92735..85b1cae 100644
> > --- a/fs/btrfs/ioctl.c
> > +++ b/fs/btrfs/ioctl.c
> > @@ -3906,49 +3906,10 @@ ssize_t btrfs_copy_file_range(struct file *file_in, loff_t pos_in,
> >  	return ret;
> >  }
> >  
> > -static noinline long btrfs_ioctl_clone(struct file *file, unsigned long srcfd,
> > -				       u64 off, u64 olen, u64 destoff)
> > +int btrfs_clone_file_range(struct file *src_file, loff_t off,
> > +		struct file *dst_file, loff_t destoff, u64 len)
> >  {
> > -	struct fd src_file;
> > -	int ret;
> > -
> > -	/* the destination must be opened for writing */
> > -	if (!(file->f_mode & FMODE_WRITE) || (file->f_flags & O_APPEND))
> > -		return -EINVAL;
> > -
> > -	ret = mnt_want_write_file(file);
> > -	if (ret)
> > -		return ret;
> > -
> > -	src_file = fdget(srcfd);
> > -	if (!src_file.file) {
> > -		ret = -EBADF;
> > -		goto out_drop_write;
> > -	}
> > -
> > -	/* the src must be open for reading */
> > -	if (!(src_file.file->f_mode & FMODE_READ)) {
> > -		ret = -EINVAL;
> > -		goto out_fput;
> > -	}
> > -
> > -	ret = btrfs_clone_files(file, src_file.file, off, olen, destoff);
> > -
> > -out_fput:
> > -	fdput(src_file);
> > -out_drop_write:
> > -	mnt_drop_write_file(file);
> > -	return ret;
> > -}
> > -
> > -static long btrfs_ioctl_clone_range(struct file *file, void __user *argp)
> > -{
> > -	struct btrfs_ioctl_clone_range_args args;
> > -
> > -	if (copy_from_user(&args, argp, sizeof(args)))
> > -		return -EFAULT;
> > -	return btrfs_ioctl_clone(file, args.src_fd, args.src_offset,
> > -				 args.src_length, args.dest_offset);
> > +	return btrfs_clone_files(dst_file, src_file, off, len, destoff);
> >  }
> >  
> >  /*
> > @@ -5498,10 +5459,6 @@ long btrfs_ioctl(struct file *file, unsigned int
> >  		return btrfs_ioctl_dev_info(root, argp);
> >  	case BTRFS_IOC_BALANCE:
> >  		return btrfs_ioctl_balance(file, NULL);
> > -	case BTRFS_IOC_CLONE:
> > -		return btrfs_ioctl_clone(file, arg, 0, 0, 0);
> > -	case BTRFS_IOC_CLONE_RANGE:
> > -		return btrfs_ioctl_clone_range(file, argp);
> >  	case BTRFS_IOC_TRANS_START:
> >  		return btrfs_ioctl_trans_start(file);
> >  	case BTRFS_IOC_TRANS_END:
> > diff --git a/fs/cifs/cifsfs.c b/fs/cifs/cifsfs.c
> > index cbc0f4b..e9b978f 100644
> > --- a/fs/cifs/cifsfs.c
> > +++ b/fs/cifs/cifsfs.c
> > @@ -914,6 +914,61 @@ const struct inode_operations cifs_symlink_inode_ops = {
> >  #endif
> >  };
> >  
> > +static int cifs_clone_file_range(struct file *src_file, loff_t off,
> > +		struct file *dst_file, loff_t destoff, u64 len)
> > +{
> > +	struct inode *src_inode = file_inode(src_file);
> > +	struct inode *target_inode = file_inode(dst_file);
> > +	struct cifsFileInfo *smb_file_src = src_file->private_data;
> > +	struct cifsFileInfo *smb_file_target = dst_file->private_data;
> > +	struct cifs_tcon *src_tcon = tlink_tcon(smb_file_src->tlink);
> > +	struct cifs_tcon *target_tcon = tlink_tcon(smb_file_target->tlink);
> > +	unsigned int xid;
> > +	int rc;
> > +
> > +	cifs_dbg(FYI, "clone range\n");
> > +
> > +	xid = get_xid();
> > +
> > +	if (!src_file->private_data || !dst_file->private_data) {
> > +		rc = -EBADF;
> > +		cifs_dbg(VFS, "missing cifsFileInfo on copy range src file\n");
> > +		goto out;
> > +	}
> > +
> > +	/*
> > +	 * Note: cifs case is easier than btrfs since server responsible for
> > +	 * checks for proper open modes and file type and if it wants
> > +	 * server could even support copy of range where source = target
> > +	 */
> > +	lock_two_nondirectories(target_inode, src_inode);
> > +
> > +	if (len == 0)
> > +		len = src_inode->i_size - off;
> > +
> > +	cifs_dbg(FYI, "about to flush pages\n");
> > +	/* should we flush first and last page first */
> > +	truncate_inode_pages_range(&target_inode->i_data, destoff,
> > +				   PAGE_CACHE_ALIGN(destoff + len)-1);
> > +
> > +	if (target_tcon->ses->server->ops->duplicate_extents)
> > +		rc = target_tcon->ses->server->ops->duplicate_extents(xid,
> > +			smb_file_src, smb_file_target, off, len, destoff);
> > +	else
> > +		rc = -EOPNOTSUPP;
> > +
> > +	/* force revalidate of size and timestamps of target file now
> > +	   that target is updated on the server */
> > +	CIFS_I(target_inode)->time = 0;
> > +out_unlock:
> > +	/* although unlocking in the reverse order from locking is not
> > +	   strictly necessary here it is a little cleaner to be consistent */
> > +	unlock_two_nondirectories(src_inode, target_inode);
> > +out:
> > +	free_xid(xid);
> > +	return rc;
> > +}
> > +
> >  const struct file_operations cifs_file_ops = {
> >  	.read_iter = cifs_loose_read_iter,
> >  	.write_iter = cifs_file_write_iter,
> > @@ -926,6 +981,7 @@ const struct file_operations cifs_file_ops = {
> >  	.splice_read = generic_file_splice_read,
> >  	.llseek = cifs_llseek,
> >  	.unlocked_ioctl	= cifs_ioctl,
> > +	.clone_file_range = cifs_clone_file_range,
> >  	.setlease = cifs_setlease,
> >  	.fallocate = cifs_fallocate,
> >  };
> > @@ -942,6 +998,8 @@ const struct file_operations cifs_file_strict_ops = {
> >  	.splice_read = generic_file_splice_read,
> >  	.llseek = cifs_llseek,
> >  	.unlocked_ioctl	= cifs_ioctl,
> > +	.clone_file_range = cifs_clone_file_range,
> > +	.clone_file_range = cifs_clone_file_range,
> >  	.setlease = cifs_setlease,
> >  	.fallocate = cifs_fallocate,
> >  };
> > @@ -958,6 +1016,7 @@ const struct file_operations cifs_file_direct_ops = {
> >  	.mmap = cifs_file_mmap,
> >  	.splice_read = generic_file_splice_read,
> >  	.unlocked_ioctl  = cifs_ioctl,
> > +	.clone_file_range = cifs_clone_file_range,
> >  	.llseek = cifs_llseek,
> >  	.setlease = cifs_setlease,
> >  	.fallocate = cifs_fallocate,
> > @@ -974,6 +1033,7 @@ const struct file_operations cifs_file_nobrl_ops = {
> >  	.splice_read = generic_file_splice_read,
> >  	.llseek = cifs_llseek,
> >  	.unlocked_ioctl	= cifs_ioctl,
> > +	.clone_file_range = cifs_clone_file_range,
> >  	.setlease = cifs_setlease,
> >  	.fallocate = cifs_fallocate,
> >  };
> > @@ -989,6 +1049,7 @@ const struct file_operations cifs_file_strict_nobrl_ops = {
> >  	.splice_read = generic_file_splice_read,
> >  	.llseek = cifs_llseek,
> >  	.unlocked_ioctl	= cifs_ioctl,
> > +	.clone_file_range = cifs_clone_file_range,
> >  	.setlease = cifs_setlease,
> >  	.fallocate = cifs_fallocate,
> >  };
> > @@ -1004,6 +1065,7 @@ const struct file_operations cifs_file_direct_nobrl_ops = {
> >  	.mmap = cifs_file_mmap,
> >  	.splice_read = generic_file_splice_read,
> >  	.unlocked_ioctl  = cifs_ioctl,
> > +	.clone_file_range = cifs_clone_file_range,
> >  	.llseek = cifs_llseek,
> >  	.setlease = cifs_setlease,
> >  	.fallocate = cifs_fallocate,
> > @@ -1014,6 +1076,7 @@ const struct file_operations cifs_dir_ops = {
> >  	.release = cifs_closedir,
> >  	.read    = generic_read_dir,
> >  	.unlocked_ioctl  = cifs_ioctl,
> > +	.clone_file_range = cifs_clone_file_range,
> >  	.llseek = generic_file_llseek,
> >  };
> >  
> > diff --git a/fs/cifs/cifsfs.h b/fs/cifs/cifsfs.h
> > index c3cc160..c399513 100644
> > --- a/fs/cifs/cifsfs.h
> > +++ b/fs/cifs/cifsfs.h
> > @@ -131,7 +131,6 @@ extern int	cifs_setxattr(struct dentry *, const char *, const void *,
> >  extern ssize_t	cifs_getxattr(struct dentry *, const char *, void *, size_t);
> >  extern ssize_t	cifs_listxattr(struct dentry *, char *, size_t);
> >  extern long cifs_ioctl(struct file *filep, unsigned int cmd, unsigned long arg);
> > -
> >  #ifdef CONFIG_CIFS_NFSD_EXPORT
> >  extern const struct export_operations cifs_export_ops;
> >  #endif /* CONFIG_CIFS_NFSD_EXPORT */
> > diff --git a/fs/cifs/ioctl.c b/fs/cifs/ioctl.c
> > index 35cf990..7a3b84e 100644
> > --- a/fs/cifs/ioctl.c
> > +++ b/fs/cifs/ioctl.c
> > @@ -34,73 +34,36 @@
> >  #include "cifs_ioctl.h"
> >  #include <linux/btrfs.h>
> >  
> > -static long cifs_ioctl_clone(unsigned int xid, struct file *dst_file,
> > -			unsigned long srcfd, u64 off, u64 len, u64 destoff,
> > -			bool dup_extents)
> > +static int cifs_file_clone_range(unsigned int xid, struct file *src_file,
> > +			  struct file *dst_file)
> >  {
> > -	int rc;
> > -	struct cifsFileInfo *smb_file_target = dst_file->private_data;
> > +	struct inode *src_inode = file_inode(src_file);
> >  	struct inode *target_inode = file_inode(dst_file);
> > -	struct cifs_tcon *target_tcon;
> > -	struct fd src_file;
> >  	struct cifsFileInfo *smb_file_src;
> > -	struct inode *src_inode;
> > +	struct cifsFileInfo *smb_file_target;
> >  	struct cifs_tcon *src_tcon;
> > +	struct cifs_tcon *target_tcon;
> > +	int rc;
> >  
> >  	cifs_dbg(FYI, "ioctl clone range\n");
> > -	/* the destination must be opened for writing */
> > -	if (!(dst_file->f_mode & FMODE_WRITE)) {
> > -		cifs_dbg(FYI, "file target not open for write\n");
> > -		return -EINVAL;
> > -	}
> >  
> > -	/* check if target volume is readonly and take reference */
> > -	rc = mnt_want_write_file(dst_file);
> > -	if (rc) {
> > -		cifs_dbg(FYI, "mnt_want_write failed with rc %d\n", rc);
> > -		return rc;
> > -	}
> > -
> > -	src_file = fdget(srcfd);
> > -	if (!src_file.file) {
> > -		rc = -EBADF;
> > -		goto out_drop_write;
> > -	}
> > -
> > -	if (src_file.file->f_op->unlocked_ioctl != cifs_ioctl) {
> > -		rc = -EBADF;
> > -		cifs_dbg(VFS, "src file seems to be from a different filesystem type\n");
> > -		goto out_fput;
> > -	}
> > -
> > -	if ((!src_file.file->private_data) || (!dst_file->private_data)) {
> > +	if (!src_file->private_data || !dst_file->private_data) {
> >  		rc = -EBADF;
> >  		cifs_dbg(VFS, "missing cifsFileInfo on copy range src file\n");
> > -		goto out_fput;
> > +		goto out;
> >  	}
> >  
> >  	rc = -EXDEV;
> >  	smb_file_target = dst_file->private_data;
> > -	smb_file_src = src_file.file->private_data;
> > +	smb_file_src = src_file->private_data;
> >  	src_tcon = tlink_tcon(smb_file_src->tlink);
> >  	target_tcon = tlink_tcon(smb_file_target->tlink);
> >  
> > -	/* check source and target on same server (or volume if dup_extents) */
> > -	if (dup_extents && (src_tcon != target_tcon)) {
> > -		cifs_dbg(VFS, "source and target of copy not on same share\n");
> > -		goto out_fput;
> > -	}
> > -
> > -	if (!dup_extents && (src_tcon->ses != target_tcon->ses)) {
> > +	if (src_tcon->ses != target_tcon->ses) {
> >  		cifs_dbg(VFS, "source and target of copy not on same server\n");
> > -		goto out_fput;
> > +		goto out;
> >  	}
> >  
> > -	src_inode = file_inode(src_file.file);
> > -	rc = -EINVAL;
> > -	if (S_ISDIR(src_inode->i_mode))
> > -		goto out_fput;
> > -
> >  	/*
> >  	 * Note: cifs case is easier than btrfs since server responsible for
> >  	 * checks for proper open modes and file type and if it wants
> > @@ -108,34 +71,66 @@ static long cifs_ioctl_clone(unsigned int xid, struct file *dst_file,
> >  	 */
> >  	lock_two_nondirectories(target_inode, src_inode);
> >  
> > -	/* determine range to clone */
> > -	rc = -EINVAL;
> > -	if (off + len > src_inode->i_size || off + len < off)
> > -		goto out_unlock;
> > -	if (len == 0)
> > -		len = src_inode->i_size - off;
> > -
> >  	cifs_dbg(FYI, "about to flush pages\n");
> >  	/* should we flush first and last page first */
> > -	truncate_inode_pages_range(&target_inode->i_data, destoff,
> > -				   PAGE_CACHE_ALIGN(destoff + len)-1);
> > +	truncate_inode_pages(&target_inode->i_data, 0);
> >  
> > -	if (dup_extents && target_tcon->ses->server->ops->duplicate_extents)
> > -		rc = target_tcon->ses->server->ops->duplicate_extents(xid,
> > -			smb_file_src, smb_file_target, off, len, destoff);
> > -	else if (!dup_extents && target_tcon->ses->server->ops->clone_range)
> > +	if (target_tcon->ses->server->ops->clone_range)
> >  		rc = target_tcon->ses->server->ops->clone_range(xid,
> > -			smb_file_src, smb_file_target, off, len, destoff);
> > +			smb_file_src, smb_file_target, 0, src_inode->i_size, 0);
> >  	else
> >  		rc = -EOPNOTSUPP;
> >  
> >  	/* force revalidate of size and timestamps of target file now
> >  	   that target is updated on the server */
> >  	CIFS_I(target_inode)->time = 0;
> > -out_unlock:
> >  	/* although unlocking in the reverse order from locking is not
> >  	   strictly necessary here it is a little cleaner to be consistent */
> >  	unlock_two_nondirectories(src_inode, target_inode);
> > +out:
> > +	return rc;
> > +}
> > +
> > +static long cifs_ioctl_clone(unsigned int xid, struct file *dst_file,
> > +			unsigned long srcfd)
> > +{
> > +	int rc;
> > +	struct fd src_file;
> > +	struct inode *src_inode;
> > +
> > +	cifs_dbg(FYI, "ioctl clone range\n");
> > +	/* the destination must be opened for writing */
> > +	if (!(dst_file->f_mode & FMODE_WRITE)) {
> > +		cifs_dbg(FYI, "file target not open for write\n");
> > +		return -EINVAL;
> > +	}
> > +
> > +	/* check if target volume is readonly and take reference */
> > +	rc = mnt_want_write_file(dst_file);
> > +	if (rc) {
> > +		cifs_dbg(FYI, "mnt_want_write failed with rc %d\n", rc);
> > +		return rc;
> > +	}
> > +
> > +	src_file = fdget(srcfd);
> > +	if (!src_file.file) {
> > +		rc = -EBADF;
> > +		goto out_drop_write;
> > +	}
> > +
> > +	if (src_file.file->f_op->unlocked_ioctl != cifs_ioctl) {
> > +		rc = -EBADF;
> > +		cifs_dbg(VFS, "src file seems to be from a different filesystem type\n");
> > +		goto out_fput;
> > +	}
> > +
> > +	src_inode = file_inode(src_file.file);
> > +	rc = -EINVAL;
> > +	if (S_ISDIR(src_inode->i_mode))
> > +		goto out_fput;
> > +
> > +	rc = cifs_file_clone_range(xid, src_file.file, dst_file);
> > +
> >  out_fput:
> >  	fdput(src_file);
> >  out_drop_write:
> > @@ -256,10 +251,7 @@ long cifs_ioctl(struct file *filep, unsigned int command, unsigned long arg)
> >  			}
> >  			break;
> >  		case CIFS_IOC_COPYCHUNK_FILE:
> > -			rc = cifs_ioctl_clone(xid, filep, arg, 0, 0, 0, false);
> > -			break;
> > -		case BTRFS_IOC_CLONE:
> > -			rc = cifs_ioctl_clone(xid, filep, arg, 0, 0, 0, true);
> > +			rc = cifs_ioctl_clone(xid, filep, arg);
> >  			break;
> >  		case CIFS_IOC_SET_INTEGRITY:
> >  			if (pSMBFile == NULL)
> > diff --git a/fs/ioctl.c b/fs/ioctl.c
> > index 5d01d26..84c6e79 100644
> > --- a/fs/ioctl.c
> > +++ b/fs/ioctl.c
> > @@ -215,6 +215,29 @@ static int ioctl_fiemap(struct file *filp, unsigned long arg)
> >  	return error;
> >  }
> >  
> > +static long ioctl_file_clone(struct file *dst_file, unsigned long srcfd,
> > +			     u64 off, u64 olen, u64 destoff)
> > +{
> > +	struct fd src_file = fdget(srcfd);
> > +	int ret;
> > +
> > +	if (!src_file.file)
> > +		return -EBADF;
> > +	ret = vfs_clone_file_range(src_file.file, off, dst_file, destoff, olen);
> > +	fdput(src_file);
> > +	return ret;
> > +}
> > +
> > +static long ioctl_file_clone_range(struct file *file, void __user *argp)
> > +{
> > +	struct file_clone_range args;
> > +
> > +	if (copy_from_user(&args, argp, sizeof(args)))
> > +		return -EFAULT;
> > +	return ioctl_file_clone(file, args.src_fd, args.src_offset,
> > +				args.src_length, args.dest_offset);
> > +}
> > +
> >  #ifdef CONFIG_BLOCK
> >  
> >  static inline sector_t logical_to_blk(struct inode *inode, loff_t offset)
> > @@ -600,6 +623,12 @@ int do_vfs_ioctl(struct file *filp, unsigned int fd, unsigned int cmd,
> >  	case FIGETBSZ:
> >  		return put_user(inode->i_sb->s_blocksize, argp);
> >  
> > +	case FICLONE:
> > +		return ioctl_file_clone(filp, arg, 0, 0, 0);
> > +
> > +	case FICLONERANGE:
> > +		return ioctl_file_clone_range(filp, argp);
> > +
> >  	default:
> >  		if (S_ISREG(inode->i_mode))
> >  			error = file_ioctl(filp, cmd, arg);
> > diff --git a/fs/nfs/nfs4file.c b/fs/nfs/nfs4file.c
> > index db9b5fe..26f9a23 100644
> > --- a/fs/nfs/nfs4file.c
> > +++ b/fs/nfs/nfs4file.c
> > @@ -195,65 +195,27 @@ static long nfs42_fallocate(struct file *filep, int mode, loff_t offset, loff_t
> >  	return nfs42_proc_allocate(filep, offset, len);
> >  }
> >  
> > -static noinline long
> > -nfs42_ioctl_clone(struct file *dst_file, unsigned long srcfd,
> > -		  u64 src_off, u64 dst_off, u64 count)
> > +static int nfs42_clone_file_range(struct file *src_file, loff_t src_off,
> > +		struct file *dst_file, loff_t dst_off, u64 count)
> >  {
> >  	struct inode *dst_inode = file_inode(dst_file);
> >  	struct nfs_server *server = NFS_SERVER(dst_inode);
> > -	struct fd src_file;
> > -	struct inode *src_inode;
> > +	struct inode *src_inode = file_inode(src_file);
> >  	unsigned int bs = server->clone_blksize;
> >  	bool same_inode = false;
> >  	int ret;
> >  
> > -	/* dst file must be opened for writing */
> > -	if (!(dst_file->f_mode & FMODE_WRITE))
> > -		return -EINVAL;
> > -
> > -	ret = mnt_want_write_file(dst_file);
> > -	if (ret)
> > -		return ret;
> > -
> > -	src_file = fdget(srcfd);
> > -	if (!src_file.file) {
> > -		ret = -EBADF;
> > -		goto out_drop_write;
> > -	}
> > -
> > -	src_inode = file_inode(src_file.file);
> > -
> > -	if (src_inode == dst_inode)
> > -		same_inode = true;
> > -
> > -	/* src file must be opened for reading */
> > -	if (!(src_file.file->f_mode & FMODE_READ))
> > -		goto out_fput;
> > -
> > -	/* src and dst must be regular files */
> > -	ret = -EISDIR;
> > -	if (!S_ISREG(src_inode->i_mode) || !S_ISREG(dst_inode->i_mode))
> > -		goto out_fput;
> > -
> > -	ret = -EXDEV;
> > -	if (src_file.file->f_path.mnt != dst_file->f_path.mnt ||
> > -	    src_inode->i_sb != dst_inode->i_sb)
> > -		goto out_fput;
> > -
> >  	/* check alignment w.r.t. clone_blksize */
> >  	ret = -EINVAL;
> >  	if (bs) {
> >  		if (!IS_ALIGNED(src_off, bs) || !IS_ALIGNED(dst_off, bs))
> > -			goto out_fput;
> > +			goto out;
> >  		if (!IS_ALIGNED(count, bs) && i_size_read(src_inode) != (src_off + count))
> > -			goto out_fput;
> > +			goto out;
> >  	}
> >  
> > -	/* verify if ranges are overlapped within the same file */
> > -	if (same_inode) {
> > -		if (dst_off + count > src_off && dst_off < src_off + count)
> > -			goto out_fput;
> > -	}
> > +	if (src_inode == dst_inode)
> > +		same_inode = true;
> >  
> >  	/* XXX: do we lock at all? what if server needs CB_RECALL_LAYOUT? */
> >  	if (same_inode) {
> > @@ -275,7 +237,7 @@ nfs42_ioctl_clone(struct file *dst_file, unsigned long srcfd,
> >  	if (ret)
> >  		goto out_unlock;
> >  
> > -	ret = nfs42_proc_clone(src_file.file, dst_file, src_off, dst_off, count);
> > +	ret = nfs42_proc_clone(src_file, dst_file, src_off, dst_off, count);
> >  
> >  	/* truncate inode page cache of the dst range so that future reads can fetch
> >  	 * new data from server */
> > @@ -292,37 +254,9 @@ out_unlock:
> >  		mutex_unlock(&dst_inode->i_mutex);
> >  		mutex_unlock(&src_inode->i_mutex);
> >  	}
> > -out_fput:
> > -	fdput(src_file);
> > -out_drop_write:
> > -	mnt_drop_write_file(dst_file);
> > +out:
> >  	return ret;
> >  }
> > -
> > -static long nfs42_ioctl_clone_range(struct file *dst_file, void __user *argp)
> > -{
> > -	struct btrfs_ioctl_clone_range_args args;
> > -
> > -	if (copy_from_user(&args, argp, sizeof(args)))
> > -		return -EFAULT;
> > -
> > -	return nfs42_ioctl_clone(dst_file, args.src_fd, args.src_offset,
> > -				 args.dest_offset, args.src_length);
> > -}
> > -
> > -long nfs4_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
> > -{
> > -	void __user *argp = (void __user *)arg;
> > -
> > -	switch (cmd) {
> > -	case BTRFS_IOC_CLONE:
> > -		return nfs42_ioctl_clone(file, arg, 0, 0, 0);
> > -	case BTRFS_IOC_CLONE_RANGE:
> > -		return nfs42_ioctl_clone_range(file, argp);
> > -	}
> > -
> > -	return -ENOTTY;
> > -}
> >  #endif /* CONFIG_NFS_V4_2 */
> >  
> >  const struct file_operations nfs4_file_operations = {
> > @@ -342,8 +276,7 @@ const struct file_operations nfs4_file_operations = {
> >  #ifdef CONFIG_NFS_V4_2
> >  	.llseek		= nfs4_file_llseek,
> >  	.fallocate	= nfs42_fallocate,
> > -	.unlocked_ioctl = nfs4_ioctl,
> > -	.compat_ioctl	= nfs4_ioctl,
> > +	.clone_file_range = nfs42_clone_file_range,
> >  #else
> >  	.llseek		= nfs_file_llseek,
> >  #endif
> > diff --git a/fs/read_write.c b/fs/read_write.c
> > index 6c1aa73..9e3dd8f 100644
> > --- a/fs/read_write.c
> > +++ b/fs/read_write.c
> > @@ -1451,3 +1451,75 @@ out1:
> >  out2:
> >  	return ret;
> >  }
> > +
> > +static int clone_verify_area(struct file *file, loff_t pos, u64 len, bool write)
> > +{
> > +	struct inode *inode = file_inode(file);
> > +
> > +	if (unlikely(pos < 0))
> > +		return -EINVAL;
> > +
> > +	if (unlikely((loff_t) (pos + len) < 0))
> > +		return -EINVAL;
> > +
> > +	if (unlikely(inode->i_flctx && mandatory_lock(inode))) {
> > +		loff_t end = len ? pos + len - 1 : OFFSET_MAX;
> > +		int retval;
> > +
> > +		retval = locks_mandatory_area(file, pos, end,
> > +				write ? F_WRLCK : F_RDLCK);
> > +		if (retval < 0)
> > +			return retval;
> > +	}
> > +
> > +	return security_file_permission(file, write ? MAY_WRITE : MAY_READ);
> > +}
> > +
> > +int vfs_clone_file_range(struct file *file_in, loff_t pos_in,
> > +		struct file *file_out, loff_t pos_out, u64 len)
> > +{
> > +	struct inode *inode_in = file_inode(file_in);
> > +	struct inode *inode_out = file_inode(file_out);
> > +	int ret;
> > +
> > +	if (inode_in->i_sb != inode_out->i_sb ||
> > +	    file_in->f_path.mnt != file_out->f_path.mnt)
> > +		return -EXDEV;
> > +
> > +	if (S_ISDIR(inode_in->i_mode) || S_ISDIR(inode_out->i_mode))
> > +		return -EISDIR;
> > +	if (!S_ISREG(inode_in->i_mode) || !S_ISREG(inode_out->i_mode))
> > +		return -EOPNOTSUPP;
> > +
> > +	if (!(file_in->f_mode & FMODE_READ) ||
> > +	    !(file_out->f_mode & FMODE_WRITE) ||
> > +	    (file_out->f_flags & O_APPEND) ||
> > +	    !file_in->f_op->clone_file_range)
> > +		return -EBADF;
> > +
> > +	ret = clone_verify_area(file_in, pos_in, len, false);
> > +	if (ret)
> > +		return ret;
> > +
> > +	ret = clone_verify_area(file_out, pos_out, len, true);
> > +	if (ret)
> > +		return ret;
> > +
> > +	if (pos_in + len > i_size_read(inode_in))
> > +		return -EINVAL;
> > +
> > +	ret = mnt_want_write_file(file_out);
> > +	if (ret)
> > +		return ret;
> > +
> > +	ret = file_in->f_op->clone_file_range(file_in, pos_in,
> > +			file_out, pos_out, len);
> > +	if (!ret) {
> > +		fsnotify_access(file_in);
> > +		fsnotify_modify(file_out);
> > +	}
> > +
> > +	mnt_drop_write_file(file_out);
> > +	return ret;
> > +}
> > +EXPORT_SYMBOL(vfs_clone_file_range);
> > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > index af559ac..59bf96d 100644
> > --- a/include/linux/fs.h
> > +++ b/include/linux/fs.h
> > @@ -1629,7 +1629,10 @@ struct file_operations {
> >  #ifndef CONFIG_MMU
> >  	unsigned (*mmap_capabilities)(struct file *);
> >  #endif
> > -	ssize_t (*copy_file_range)(struct file *, loff_t, struct file *, loff_t, size_t, unsigned int);
> > +	ssize_t (*copy_file_range)(struct file *, loff_t, struct file *,
> > +			loff_t, size_t, unsigned int);
> > +	int (*clone_file_range)(struct file *, loff_t, struct file *, loff_t,
> > +			u64);
> >  };
> >  
> >  struct inode_operations {
> > @@ -1683,6 +1686,8 @@ extern ssize_t vfs_writev(struct file *, const struct iovec __user *,
> >  		unsigned long, loff_t *);
> >  extern ssize_t vfs_copy_file_range(struct file *, loff_t , struct file *,
> >  				   loff_t, size_t, unsigned int);
> > +extern int vfs_clone_file_range(struct file *file_in, loff_t pos_in,
> > +		struct file *file_out, loff_t pos_out, u64 len);
> >  
> >  struct super_operations {
> >     	struct inode *(*alloc_inode)(struct super_block *sb);
> > diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
> > index f15d980..cd5db7f 100644
> > --- a/include/uapi/linux/fs.h
> > +++ b/include/uapi/linux/fs.h
> > @@ -39,6 +39,13 @@
> >  #define RENAME_EXCHANGE		(1 << 1)	/* Exchange source and dest */
> >  #define RENAME_WHITEOUT		(1 << 2)	/* Whiteout source */
> >  
> > +struct file_clone_range {
> > +	__s64 src_fd;
> > +	__u64 src_offset;
> > +	__u64 src_length;
> > +	__u64 dest_offset;
> > +};
> > +
> >  struct fstrim_range {
> >  	__u64 start;
> >  	__u64 len;
> > @@ -159,6 +166,8 @@ struct inodes_stat_t {
> >  #define FIFREEZE	_IOWR('X', 119, int)	/* Freeze */
> >  #define FITHAW		_IOWR('X', 120, int)	/* Thaw */
> >  #define FITRIM		_IOWR('X', 121, struct fstrim_range)	/* Trim */
> > +#define FICLONE		_IOW(0x94, 9, int)
> > +#define FICLONERANGE	_IOW(0x94, 13, struct file_clone_range)
> >  
> >  #define	FS_IOC_GETFLAGS			_IOR('f', 1, long)
> >  #define	FS_IOC_SETFLAGS			_IOW('f', 2, long)
> > -- 
> > 1.9.1
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html

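The range checks that clone_verify_area() and vfs_clone_file_range() apply before calling ->clone_file_range can be modelled in userspace to see how the edge cases fall out. This is only a sketch under stated assumptions. The model_* helpers and MODEL_OFFSET_MAX (a stand-in for the kernel's OFFSET_MAX) are hypothetical. The mandatory-lock, LSM permission and fsnotify steps are left out.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's OFFSET_MAX, the largest loff_t. */
#define MODEL_OFFSET_MAX INT64_MAX

/*
 * Hypothetical model of the position checks clone_verify_area() applies to
 * one side of the clone.  pos + len is computed in unsigned 64-bit
 * arithmetic, as in the kernel, and must still fit in a signed loff_t.
 * The conversion back to int64_t relies on two's-complement wrapping
 * (true for gcc and clang).
 */
static int model_verify_area(int64_t pos, uint64_t len)
{
	if (pos < 0)
		return -EINVAL;
	if ((int64_t)((uint64_t)pos + len) < 0)
		return -EINVAL;
	return 0;
}

/* Last byte handed to locks_mandatory_area(); len == 0 means "to EOF". */
static int64_t model_lock_end(int64_t pos, uint64_t len)
{
	return len ? (int64_t)((uint64_t)pos + len - 1) : MODEL_OFFSET_MAX;
}

/*
 * Hypothetical model of the range checks in vfs_clone_file_range().
 * Source and destination must both pass model_verify_area(). The source
 * range may not then extend past src_size. This uses the same unsigned
 * comparison as pos_in + len > i_size_read(inode_in).
 */
static int model_clone_checks(int64_t pos_in, int64_t pos_out, uint64_t len,
			      int64_t src_size)
{
	int ret;

	ret = model_verify_area(pos_in, len);
	if (ret)
		return ret;
	ret = model_verify_area(pos_out, len);
	if (ret)
		return ret;
	if ((uint64_t)pos_in + len > (uint64_t)src_size)
		return -EINVAL;
	return 0;
}
```

With these checks, a zero length (whole-file clone) fails only if the source offset is negative or beyond EOF. A length that would push either offset past the largest loff_t is rejected with -EINVAL before the file system is ever called.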

* Re: [PATCH 2/4] vfs: pull btrfs clone API to vfs layer
@ 2015-12-14 17:08       ` Darrick J. Wong
  0 siblings, 0 replies; 27+ messages in thread
From: Darrick J. Wong @ 2015-12-14 17:08 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: viro, tao.peng, jeff.layton, bfields, linux-fsdevel, linux-btrfs,
	linux-nfs, linux-cifs

On Wed, Dec 09, 2015 at 12:40:33PM -0800, Darrick J. Wong wrote:
> On Thu, Dec 03, 2015 at 12:59:50PM +0100, Christoph Hellwig wrote:
> > The btrfs clone ioctls are now adopted by other file systems, with NFS
> > and CIFS already having support for them, and XFS being under active
> > development.  To avoid growth of various slightly incompatible
> > implementations, add one to the VFS.  Note that clones are different from
> > file copies in several ways:
> > 
> >  - they are atomic vs other writers
> >  - they support whole file clones
> >  - they support 64-bit length clones
> >  - they do not allow partial success (aka short writes)
> >  - clones are expected to be a fast metadata operation
> > 
> > Because of that it would be rather cumbersome to try to piggyback them on
> > top of the recent copy_file_range infrastructure.  The converse isn't
> > true and the copy_file_range system call could try clone file range as
> > a first attempt to copy, something that further patches will enable.
> > 
> > Based on earlier work from Peng Tao.
> > 
> > Signed-off-by: Christoph Hellwig <hch@lst.de>
> > ---
> >  fs/btrfs/ctree.h        |   3 +-
> >  fs/btrfs/file.c         |   1 +
> >  fs/btrfs/ioctl.c        |  49 ++-----------------
> >  fs/cifs/cifsfs.c        |  63 ++++++++++++++++++++++++
> >  fs/cifs/cifsfs.h        |   1 -
> >  fs/cifs/ioctl.c         | 126 +++++++++++++++++++++++-------------------------
> >  fs/ioctl.c              |  29 +++++++++++
> 
> I tried this patch series on ppc64 (w/ 32-bit powerpc userland) and I think
> it needs to fix up the compat ioctl to make the vfs call...

Bah, forgot to add:
Signed-off-by: Darrick J. Wong <darrick.wong-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>

(Feel free to fold this three-line chunk into the original patch...)

--D
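To exercise the compat path from a 32-bit binary, a minimal userspace caller of FICLONERANGE might look like the following. It is a sketch. clone_range() is a hypothetical helper. The fallback definitions are copied from the include/uapi/linux/fs.h hunk of this patch, for systems whose headers predate it. Because struct file_clone_range holds only 64-bit fields, it should have the same 32-byte layout for 32-bit and 64-bit userland. That is presumably why the compat hunk below can pass both commands straight through to do_ioctl.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/fs.h>

#ifndef FICLONERANGE
/* Fallback for older headers, copied from the uapi hunk of this patch. */
#include <linux/types.h>
struct file_clone_range {
	__s64 src_fd;
	__u64 src_offset;
	__u64 src_length;
	__u64 dest_offset;
};
#define FICLONE		_IOW(0x94, 9, int)
#define FICLONERANGE	_IOW(0x94, 13, struct file_clone_range)
#endif

/*
 * Hypothetical helper: clone len bytes at src_off in src_fd to dst_off in
 * dst_fd.  len == 0 asks for the whole source file, which is what FICLONE
 * does via ioctl_file_clone(filp, arg, 0, 0, 0).  Returns 0 on success or a
 * negative errno, e.g. -EXDEV when the files are on different mounts.  Which
 * errno a file system without clone support reports depends on the kernel.
 */
static int clone_range(int src_fd, uint64_t src_off, uint64_t len,
		       int dst_fd, uint64_t dst_off)
{
	struct file_clone_range args = {
		.src_fd = src_fd,
		.src_offset = src_off,
		.src_length = len,
		.dest_offset = dst_off,
	};

	/* The ioctl is issued on the destination; the source fd is in args. */
	if (ioctl(dst_fd, FICLONERANGE, &args) < 0)
		return -errno;
	return 0;
}
```

Calling clone_range(src, 0, 0, dst, 0) takes the same ioctl_file_clone() path as FICLONE on dst with src as the argument.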

> diff --git a/fs/compat_ioctl.c b/fs/compat_ioctl.c
> index dcf2653..70d4b10 100644
> --- a/fs/compat_ioctl.c
> +++ b/fs/compat_ioctl.c
> @@ -1580,6 +1580,10 @@ COMPAT_SYSCALL_DEFINE3(ioctl, unsigned int, fd, unsigned int, cmd,
>                 goto out_fput;
>  #endif
>  
> +       case FICLONE:
> +       case FICLONERANGE:
> +               goto do_ioctl;
> +
>         case FIBMAP:
>         case FIGETBSZ:
>         case FIONREAD:
> 
> --D
> 
> >  fs/nfs/nfs4file.c       |  87 ++++-----------------------------
> >  fs/read_write.c         |  72 +++++++++++++++++++++++++++
> >  include/linux/fs.h      |   7 ++-
> >  include/uapi/linux/fs.h |   9 ++++
> >  11 files changed, 254 insertions(+), 193 deletions(-)
> > 
> > diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> > index ede7277..dd4733f 100644
> > --- a/fs/btrfs/ctree.h
> > +++ b/fs/btrfs/ctree.h
> > @@ -4025,7 +4025,6 @@ void btrfs_get_block_group_info(struct list_head *groups_list,
> >  void update_ioctl_balance_args(struct btrfs_fs_info *fs_info, int lock,
> >  			       struct btrfs_ioctl_balance_args *bargs);
> >  
> > -
> >  /* file.c */
> >  int btrfs_auto_defrag_init(void);
> >  void btrfs_auto_defrag_exit(void);
> > @@ -4058,6 +4057,8 @@ int btrfs_fdatawrite_range(struct inode *inode, loff_t start, loff_t end);
> >  ssize_t btrfs_copy_file_range(struct file *file_in, loff_t pos_in,
> >  			      struct file *file_out, loff_t pos_out,
> >  			      size_t len, unsigned int flags);
> > +int btrfs_clone_file_range(struct file *file_in, loff_t pos_in,
> > +			   struct file *file_out, loff_t pos_out, u64 len);
> >  
> >  /* tree-defrag.c */
> >  int btrfs_defrag_leaves(struct btrfs_trans_handle *trans,
> > diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> > index e67fe6a..232e300 100644
> > --- a/fs/btrfs/file.c
> > +++ b/fs/btrfs/file.c
> > @@ -2925,6 +2925,7 @@ const struct file_operations btrfs_file_operations = {
> >  	.compat_ioctl	= btrfs_ioctl,
> >  #endif
> >  	.copy_file_range = btrfs_copy_file_range,
> > +	.clone_file_range = btrfs_clone_file_range,
> >  };
> >  
> >  void btrfs_auto_defrag_exit(void)
> > diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> > index 0f92735..85b1cae 100644
> > --- a/fs/btrfs/ioctl.c
> > +++ b/fs/btrfs/ioctl.c
> > @@ -3906,49 +3906,10 @@ ssize_t btrfs_copy_file_range(struct file *file_in, loff_t pos_in,
> >  	return ret;
> >  }
> >  
> > -static noinline long btrfs_ioctl_clone(struct file *file, unsigned long srcfd,
> > -				       u64 off, u64 olen, u64 destoff)
> > +int btrfs_clone_file_range(struct file *src_file, loff_t off,
> > +		struct file *dst_file, loff_t destoff, u64 len)
> >  {
> > -	struct fd src_file;
> > -	int ret;
> > -
> > -	/* the destination must be opened for writing */
> > -	if (!(file->f_mode & FMODE_WRITE) || (file->f_flags & O_APPEND))
> > -		return -EINVAL;
> > -
> > -	ret = mnt_want_write_file(file);
> > -	if (ret)
> > -		return ret;
> > -
> > -	src_file = fdget(srcfd);
> > -	if (!src_file.file) {
> > -		ret = -EBADF;
> > -		goto out_drop_write;
> > -	}
> > -
> > -	/* the src must be open for reading */
> > -	if (!(src_file.file->f_mode & FMODE_READ)) {
> > -		ret = -EINVAL;
> > -		goto out_fput;
> > -	}
> > -
> > -	ret = btrfs_clone_files(file, src_file.file, off, olen, destoff);
> > -
> > -out_fput:
> > -	fdput(src_file);
> > -out_drop_write:
> > -	mnt_drop_write_file(file);
> > -	return ret;
> > -}
> > -
> > -static long btrfs_ioctl_clone_range(struct file *file, void __user *argp)
> > -{
> > -	struct btrfs_ioctl_clone_range_args args;
> > -
> > -	if (copy_from_user(&args, argp, sizeof(args)))
> > -		return -EFAULT;
> > -	return btrfs_ioctl_clone(file, args.src_fd, args.src_offset,
> > -				 args.src_length, args.dest_offset);
> > +	return btrfs_clone_files(dst_file, src_file, off, len, destoff);
> >  }
> >  
> >  /*
> > @@ -5498,10 +5459,6 @@ long btrfs_ioctl(struct file *file, unsigned int
> >  		return btrfs_ioctl_dev_info(root, argp);
> >  	case BTRFS_IOC_BALANCE:
> >  		return btrfs_ioctl_balance(file, NULL);
> > -	case BTRFS_IOC_CLONE:
> > -		return btrfs_ioctl_clone(file, arg, 0, 0, 0);
> > -	case BTRFS_IOC_CLONE_RANGE:
> > -		return btrfs_ioctl_clone_range(file, argp);
> >  	case BTRFS_IOC_TRANS_START:
> >  		return btrfs_ioctl_trans_start(file);
> >  	case BTRFS_IOC_TRANS_END:
> > diff --git a/fs/cifs/cifsfs.c b/fs/cifs/cifsfs.c
> > index cbc0f4b..e9b978f 100644
> > --- a/fs/cifs/cifsfs.c
> > +++ b/fs/cifs/cifsfs.c
> > @@ -914,6 +914,61 @@ const struct inode_operations cifs_symlink_inode_ops = {
> >  #endif
> >  };
> >  
> > +static int cifs_clone_file_range(struct file *src_file, loff_t off,
> > +		struct file *dst_file, loff_t destoff, u64 len)
> > +{
> > +	struct inode *src_inode = file_inode(src_file);
> > +	struct inode *target_inode = file_inode(dst_file);
> > +	struct cifsFileInfo *smb_file_src = src_file->private_data;
> > +	struct cifsFileInfo *smb_file_target = dst_file->private_data;
> > +	struct cifs_tcon *src_tcon = tlink_tcon(smb_file_src->tlink);
> > +	struct cifs_tcon *target_tcon = tlink_tcon(smb_file_target->tlink);
> > +	unsigned int xid;
> > +	int rc;
> > +
> > +	cifs_dbg(FYI, "clone range\n");
> > +
> > +	xid = get_xid();
> > +
> > +	if (!src_file->private_data || !dst_file->private_data) {
> > +		rc = -EBADF;
> > +		cifs_dbg(VFS, "missing cifsFileInfo on copy range src file\n");
> > +		goto out;
> > +	}
> > +
> > +	/*
> > +	 * Note: cifs case is easier than btrfs since server responsible for
> > +	 * checks for proper open modes and file type and if it wants
> > +	 * server could even support copy of range where source = target
> > +	 */
> > +	lock_two_nondirectories(target_inode, src_inode);
> > +
> > +	if (len == 0)
> > +		len = src_inode->i_size - off;
> > +
> > +	cifs_dbg(FYI, "about to flush pages\n");
> > +	/* should we flush first and last page first */
> > +	truncate_inode_pages_range(&target_inode->i_data, destoff,
> > +				   PAGE_CACHE_ALIGN(destoff + len)-1);
> > +
> > +	if (target_tcon->ses->server->ops->duplicate_extents)
> > +		rc = target_tcon->ses->server->ops->duplicate_extents(xid,
> > +			smb_file_src, smb_file_target, off, len, destoff);
> > +	else
> > +		rc = -EOPNOTSUPP;
> > +
> > +	/* force revalidate of size and timestamps of target file now
> > +	   that target is updated on the server */
> > +	CIFS_I(target_inode)->time = 0;
> > +out_unlock:
> > +	/* although unlocking in the reverse order from locking is not
> > +	   strictly necessary here it is a little cleaner to be consistent */
> > +	unlock_two_nondirectories(src_inode, target_inode);
> > +out:
> > +	free_xid(xid);
> > +	return rc;
> > +}
> > +
> >  const struct file_operations cifs_file_ops = {
> >  	.read_iter = cifs_loose_read_iter,
> >  	.write_iter = cifs_file_write_iter,
> > @@ -926,6 +981,7 @@ const struct file_operations cifs_file_ops = {
> >  	.splice_read = generic_file_splice_read,
> >  	.llseek = cifs_llseek,
> >  	.unlocked_ioctl	= cifs_ioctl,
> > +	.clone_file_range = cifs_clone_file_range,
> >  	.setlease = cifs_setlease,
> >  	.fallocate = cifs_fallocate,
> >  };
> > @@ -942,6 +998,7 @@ const struct file_operations cifs_file_strict_ops = {
> >  	.splice_read = generic_file_splice_read,
> >  	.llseek = cifs_llseek,
> >  	.unlocked_ioctl	= cifs_ioctl,
> > +	.clone_file_range = cifs_clone_file_range,
> >  	.setlease = cifs_setlease,
> >  	.fallocate = cifs_fallocate,
> >  };
> > @@ -958,6 +1016,7 @@ const struct file_operations cifs_file_direct_ops = {
> >  	.mmap = cifs_file_mmap,
> >  	.splice_read = generic_file_splice_read,
> >  	.unlocked_ioctl  = cifs_ioctl,
> > +	.clone_file_range = cifs_clone_file_range,
> >  	.llseek = cifs_llseek,
> >  	.setlease = cifs_setlease,
> >  	.fallocate = cifs_fallocate,
> > @@ -974,6 +1033,7 @@ const struct file_operations cifs_file_nobrl_ops = {
> >  	.splice_read = generic_file_splice_read,
> >  	.llseek = cifs_llseek,
> >  	.unlocked_ioctl	= cifs_ioctl,
> > +	.clone_file_range = cifs_clone_file_range,
> >  	.setlease = cifs_setlease,
> >  	.fallocate = cifs_fallocate,
> >  };
> > @@ -989,6 +1049,7 @@ const struct file_operations cifs_file_strict_nobrl_ops = {
> >  	.splice_read = generic_file_splice_read,
> >  	.llseek = cifs_llseek,
> >  	.unlocked_ioctl	= cifs_ioctl,
> > +	.clone_file_range = cifs_clone_file_range,
> >  	.setlease = cifs_setlease,
> >  	.fallocate = cifs_fallocate,
> >  };
> > @@ -1004,6 +1065,7 @@ const struct file_operations cifs_file_direct_nobrl_ops = {
> >  	.mmap = cifs_file_mmap,
> >  	.splice_read = generic_file_splice_read,
> >  	.unlocked_ioctl  = cifs_ioctl,
> > +	.clone_file_range = cifs_clone_file_range,
> >  	.llseek = cifs_llseek,
> >  	.setlease = cifs_setlease,
> >  	.fallocate = cifs_fallocate,
> > @@ -1014,6 +1076,7 @@ const struct file_operations cifs_dir_ops = {
> >  	.release = cifs_closedir,
> >  	.read    = generic_read_dir,
> >  	.unlocked_ioctl  = cifs_ioctl,
> > +	.clone_file_range = cifs_clone_file_range,
> >  	.llseek = generic_file_llseek,
> >  };
> >  
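Two steps in cifs_clone_file_range() above are easy to misread. First, a length of 0 is treated as "clone to the end of the source". Second, the destination page cache is dropped from destoff through the end of the page that holds the last cloned byte.

The block below is a minimal user-space model of that arithmetic. It assumes a 4096-byte page, whereas the kernel uses PAGE_CACHE_SIZE. `sketch_cifs_inval_range` and `struct sketch_inval_range` are hypothetical names used only for illustration. It also assumes off <= src_size, which the VFS length check (pos_in + len > i_size, with len == 0) already enforces.

```c
#include <stdint.h>

/* Assumed page size for this sketch only; the kernel uses PAGE_CACHE_SIZE. */
#define SKETCH_PAGE_SIZE 4096ULL

/* Hypothetical result type: effective length plus the inclusive byte range
 * that would be passed to truncate_inode_pages_range(). */
struct sketch_inval_range {
	uint64_t len;   /* effective clone length */
	uint64_t start; /* first destination byte dropped from the page cache */
	uint64_t end;   /* last destination byte dropped (inclusive) */
};

static struct sketch_inval_range sketch_cifs_inval_range(uint64_t src_size,
							 uint64_t off,
							 uint64_t destoff,
							 uint64_t len)
{
	struct sketch_inval_range r;

	/* len == 0: clone from off up to the current end of the source. */
	if (len == 0)
		len = src_size - off;

	r.len = len;
	r.start = destoff;
	/* PAGE_CACHE_ALIGN(destoff + len) - 1: round up to a page boundary,
	 * then step back to the last byte of the preceding page. */
	r.end = ((destoff + len + SKETCH_PAGE_SIZE - 1) &
		 ~(SKETCH_PAGE_SIZE - 1)) - 1;
	return r;
}
```

For example, cloning 5000 bytes to destoff 100 drops bytes 100 through 8191 from the destination's page cache. That covers both pages the clone touches.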
> > diff --git a/fs/cifs/cifsfs.h b/fs/cifs/cifsfs.h
> > index c3cc160..c399513 100644
> > --- a/fs/cifs/cifsfs.h
> > +++ b/fs/cifs/cifsfs.h
> > @@ -131,7 +131,6 @@ extern int	cifs_setxattr(struct dentry *, const char *, const void *,
> >  extern ssize_t	cifs_getxattr(struct dentry *, const char *, void *, size_t);
> >  extern ssize_t	cifs_listxattr(struct dentry *, char *, size_t);
> >  extern long cifs_ioctl(struct file *filep, unsigned int cmd, unsigned long arg);
> > -
> >  #ifdef CONFIG_CIFS_NFSD_EXPORT
> >  extern const struct export_operations cifs_export_ops;
> >  #endif /* CONFIG_CIFS_NFSD_EXPORT */
> > diff --git a/fs/cifs/ioctl.c b/fs/cifs/ioctl.c
> > index 35cf990..7a3b84e 100644
> > --- a/fs/cifs/ioctl.c
> > +++ b/fs/cifs/ioctl.c
> > @@ -34,73 +34,36 @@
> >  #include "cifs_ioctl.h"
> >  #include <linux/btrfs.h>
> >  
> > -static long cifs_ioctl_clone(unsigned int xid, struct file *dst_file,
> > -			unsigned long srcfd, u64 off, u64 len, u64 destoff,
> > -			bool dup_extents)
> > +static int cifs_file_clone_range(unsigned int xid, struct file *src_file,
> > +			  struct file *dst_file)
> >  {
> > -	int rc;
> > -	struct cifsFileInfo *smb_file_target = dst_file->private_data;
> > +	struct inode *src_inode = file_inode(src_file);
> >  	struct inode *target_inode = file_inode(dst_file);
> > -	struct cifs_tcon *target_tcon;
> > -	struct fd src_file;
> >  	struct cifsFileInfo *smb_file_src;
> > -	struct inode *src_inode;
> > +	struct cifsFileInfo *smb_file_target;
> >  	struct cifs_tcon *src_tcon;
> > +	struct cifs_tcon *target_tcon;
> > +	int rc;
> >  
> >  	cifs_dbg(FYI, "ioctl clone range\n");
> > -	/* the destination must be opened for writing */
> > -	if (!(dst_file->f_mode & FMODE_WRITE)) {
> > -		cifs_dbg(FYI, "file target not open for write\n");
> > -		return -EINVAL;
> > -	}
> >  
> > -	/* check if target volume is readonly and take reference */
> > -	rc = mnt_want_write_file(dst_file);
> > -	if (rc) {
> > -		cifs_dbg(FYI, "mnt_want_write failed with rc %d\n", rc);
> > -		return rc;
> > -	}
> > -
> > -	src_file = fdget(srcfd);
> > -	if (!src_file.file) {
> > -		rc = -EBADF;
> > -		goto out_drop_write;
> > -	}
> > -
> > -	if (src_file.file->f_op->unlocked_ioctl != cifs_ioctl) {
> > -		rc = -EBADF;
> > -		cifs_dbg(VFS, "src file seems to be from a different filesystem type\n");
> > -		goto out_fput;
> > -	}
> > -
> > -	if ((!src_file.file->private_data) || (!dst_file->private_data)) {
> > +	if (!src_file->private_data || !dst_file->private_data) {
> >  		rc = -EBADF;
> >  		cifs_dbg(VFS, "missing cifsFileInfo on copy range src file\n");
> > -		goto out_fput;
> > +		goto out;
> >  	}
> >  
> >  	rc = -EXDEV;
> >  	smb_file_target = dst_file->private_data;
> > -	smb_file_src = src_file.file->private_data;
> > +	smb_file_src = src_file->private_data;
> >  	src_tcon = tlink_tcon(smb_file_src->tlink);
> >  	target_tcon = tlink_tcon(smb_file_target->tlink);
> >  
> > -	/* check source and target on same server (or volume if dup_extents) */
> > -	if (dup_extents && (src_tcon != target_tcon)) {
> > -		cifs_dbg(VFS, "source and target of copy not on same share\n");
> > -		goto out_fput;
> > -	}
> > -
> > -	if (!dup_extents && (src_tcon->ses != target_tcon->ses)) {
> > +	if (src_tcon->ses != target_tcon->ses) {
> >  		cifs_dbg(VFS, "source and target of copy not on same server\n");
> > -		goto out_fput;
> > +		goto out;
> >  	}
> >  
> > -	src_inode = file_inode(src_file.file);
> > -	rc = -EINVAL;
> > -	if (S_ISDIR(src_inode->i_mode))
> > -		goto out_fput;
> > -
> >  	/*
> >  	 * Note: cifs case is easier than btrfs since server responsible for
> >  	 * checks for proper open modes and file type and if it wants
> > @@ -108,34 +71,66 @@ static long cifs_ioctl_clone(unsigned int xid, struct file *dst_file,
> >  	 */
> >  	lock_two_nondirectories(target_inode, src_inode);
> >  
> > -	/* determine range to clone */
> > -	rc = -EINVAL;
> > -	if (off + len > src_inode->i_size || off + len < off)
> > -		goto out_unlock;
> > -	if (len == 0)
> > -		len = src_inode->i_size - off;
> > -
> >  	cifs_dbg(FYI, "about to flush pages\n");
> >  	/* should we flush first and last page first */
> > -	truncate_inode_pages_range(&target_inode->i_data, destoff,
> > -				   PAGE_CACHE_ALIGN(destoff + len)-1);
> > +	truncate_inode_pages(&target_inode->i_data, 0);
> >  
> > -	if (dup_extents && target_tcon->ses->server->ops->duplicate_extents)
> > -		rc = target_tcon->ses->server->ops->duplicate_extents(xid,
> > -			smb_file_src, smb_file_target, off, len, destoff);
> > -	else if (!dup_extents && target_tcon->ses->server->ops->clone_range)
> > +	if (target_tcon->ses->server->ops->clone_range)
> >  		rc = target_tcon->ses->server->ops->clone_range(xid,
> > -			smb_file_src, smb_file_target, off, len, destoff);
> > +			smb_file_src, smb_file_target, 0, src_inode->i_size, 0);
> >  	else
> >  		rc = -EOPNOTSUPP;
> >  
> >  	/* force revalidate of size and timestamps of target file now
> >  	   that target is updated on the server */
> >  	CIFS_I(target_inode)->time = 0;
> > -out_unlock:
> >  	/* although unlocking in the reverse order from locking is not
> >  	   strictly necessary here it is a little cleaner to be consistent */
> >  	unlock_two_nondirectories(src_inode, target_inode);
> > +out:
> > +	return rc;
> > +}
> > +
> > +static long cifs_ioctl_clone(unsigned int xid, struct file *dst_file,
> > +			unsigned long srcfd)
> > +{
> > +	int rc;
> > +	struct fd src_file;
> > +	struct inode *src_inode;
> > +
> > +	cifs_dbg(FYI, "ioctl clone range\n");
> > +	/* the destination must be opened for writing */
> > +	if (!(dst_file->f_mode & FMODE_WRITE)) {
> > +		cifs_dbg(FYI, "file target not open for write\n");
> > +		return -EINVAL;
> > +	}
> > +
> > +	/* check if target volume is readonly and take reference */
> > +	rc = mnt_want_write_file(dst_file);
> > +	if (rc) {
> > +		cifs_dbg(FYI, "mnt_want_write failed with rc %d\n", rc);
> > +		return rc;
> > +	}
> > +
> > +	src_file = fdget(srcfd);
> > +	if (!src_file.file) {
> > +		rc = -EBADF;
> > +		goto out_drop_write;
> > +	}
> > +
> > +	if (src_file.file->f_op->unlocked_ioctl != cifs_ioctl) {
> > +		rc = -EBADF;
> > +		cifs_dbg(VFS, "src file seems to be from a different filesystem type\n");
> > +		goto out_fput;
> > +	}
> > +
> > +	src_inode = file_inode(src_file.file);
> > +	rc = -EINVAL;
> > +	if (S_ISDIR(src_inode->i_mode))
> > +		goto out_fput;
> > +
> > +	rc = cifs_file_clone_range(xid, src_file.file, dst_file);
> > +
> >  out_fput:
> >  	fdput(src_file);
> >  out_drop_write:
> > @@ -256,10 +251,7 @@ long cifs_ioctl(struct file *filep, unsigned int command, unsigned long arg)
> >  			}
> >  			break;
> >  		case CIFS_IOC_COPYCHUNK_FILE:
> > -			rc = cifs_ioctl_clone(xid, filep, arg, 0, 0, 0, false);
> > -			break;
> > -		case BTRFS_IOC_CLONE:
> > -			rc = cifs_ioctl_clone(xid, filep, arg, 0, 0, 0, true);
> > +			rc = cifs_ioctl_clone(xid, filep, arg);
> >  			break;
> >  		case CIFS_IOC_SET_INTEGRITY:
> >  			if (pSMBFile == NULL)
> > diff --git a/fs/ioctl.c b/fs/ioctl.c
> > index 5d01d26..84c6e79 100644
> > --- a/fs/ioctl.c
> > +++ b/fs/ioctl.c
> > @@ -215,6 +215,29 @@ static int ioctl_fiemap(struct file *filp, unsigned long arg)
> >  	return error;
> >  }
> >  
> > +static long ioctl_file_clone(struct file *dst_file, unsigned long srcfd,
> > +			     u64 off, u64 olen, u64 destoff)
> > +{
> > +	struct fd src_file = fdget(srcfd);
> > +	int ret;
> > +
> > +	if (!src_file.file)
> > +		return -EBADF;
> > +	ret = vfs_clone_file_range(src_file.file, off, dst_file, destoff, olen);
> > +	fdput(src_file);
> > +	return ret;
> > +}
> > +
> > +static long ioctl_file_clone_range(struct file *file, void __user *argp)
> > +{
> > +	struct file_clone_range args;
> > +
> > +	if (copy_from_user(&args, argp, sizeof(args)))
> > +		return -EFAULT;
> > +	return ioctl_file_clone(file, args.src_fd, args.src_offset,
> > +				args.src_length, args.dest_offset);
> > +}
> > +
> >  #ifdef CONFIG_BLOCK
> >  
> >  static inline sector_t logical_to_blk(struct inode *inode, loff_t offset)
> > @@ -600,6 +623,12 @@ int do_vfs_ioctl(struct file *filp, unsigned int fd, unsigned int cmd,
> >  	case FIGETBSZ:
> >  		return put_user(inode->i_sb->s_blocksize, argp);
> >  
> > +	case FICLONE:
> > +		return ioctl_file_clone(filp, arg, 0, 0, 0);
> > +
> > +	case FICLONERANGE:
> > +		return ioctl_file_clone_range(filp, argp);
> > +
> >  	default:
> >  		if (S_ISREG(inode->i_mode))
> >  			error = file_ioctl(filp, cmd, arg);
> > diff --git a/fs/nfs/nfs4file.c b/fs/nfs/nfs4file.c
> > index db9b5fe..26f9a23 100644
> > --- a/fs/nfs/nfs4file.c
> > +++ b/fs/nfs/nfs4file.c
> > @@ -195,65 +195,27 @@ static long nfs42_fallocate(struct file *filep, int mode, loff_t offset, loff_t
> >  	return nfs42_proc_allocate(filep, offset, len);
> >  }
> >  
> > -static noinline long
> > -nfs42_ioctl_clone(struct file *dst_file, unsigned long srcfd,
> > -		  u64 src_off, u64 dst_off, u64 count)
> > +static int nfs42_clone_file_range(struct file *src_file, loff_t src_off,
> > +		struct file *dst_file, loff_t dst_off, u64 count)
> >  {
> >  	struct inode *dst_inode = file_inode(dst_file);
> >  	struct nfs_server *server = NFS_SERVER(dst_inode);
> > -	struct fd src_file;
> > -	struct inode *src_inode;
> > +	struct inode *src_inode = file_inode(src_file);
> >  	unsigned int bs = server->clone_blksize;
> >  	bool same_inode = false;
> >  	int ret;
> >  
> > -	/* dst file must be opened for writing */
> > -	if (!(dst_file->f_mode & FMODE_WRITE))
> > -		return -EINVAL;
> > -
> > -	ret = mnt_want_write_file(dst_file);
> > -	if (ret)
> > -		return ret;
> > -
> > -	src_file = fdget(srcfd);
> > -	if (!src_file.file) {
> > -		ret = -EBADF;
> > -		goto out_drop_write;
> > -	}
> > -
> > -	src_inode = file_inode(src_file.file);
> > -
> > -	if (src_inode == dst_inode)
> > -		same_inode = true;
> > -
> > -	/* src file must be opened for reading */
> > -	if (!(src_file.file->f_mode & FMODE_READ))
> > -		goto out_fput;
> > -
> > -	/* src and dst must be regular files */
> > -	ret = -EISDIR;
> > -	if (!S_ISREG(src_inode->i_mode) || !S_ISREG(dst_inode->i_mode))
> > -		goto out_fput;
> > -
> > -	ret = -EXDEV;
> > -	if (src_file.file->f_path.mnt != dst_file->f_path.mnt ||
> > -	    src_inode->i_sb != dst_inode->i_sb)
> > -		goto out_fput;
> > -
> >  	/* check alignment w.r.t. clone_blksize */
> >  	ret = -EINVAL;
> >  	if (bs) {
> >  		if (!IS_ALIGNED(src_off, bs) || !IS_ALIGNED(dst_off, bs))
> > -			goto out_fput;
> > +			goto out;
> >  		if (!IS_ALIGNED(count, bs) && i_size_read(src_inode) != (src_off + count))
> > -			goto out_fput;
> > +			goto out;
> >  	}
> >  
> > -	/* verify if ranges are overlapped within the same file */
> > -	if (same_inode) {
> > -		if (dst_off + count > src_off && dst_off < src_off + count)
> > -			goto out_fput;
> > -	}
> > +	if (src_inode == dst_inode)
> > +		same_inode = true;
> >  
> >  	/* XXX: do we lock at all? what if server needs CB_RECALL_LAYOUT? */
> >  	if (same_inode) {
> > @@ -275,7 +237,7 @@ nfs42_ioctl_clone(struct file *dst_file, unsigned long srcfd,
> >  	if (ret)
> >  		goto out_unlock;
> >  
> > -	ret = nfs42_proc_clone(src_file.file, dst_file, src_off, dst_off, count);
> > +	ret = nfs42_proc_clone(src_file, dst_file, src_off, dst_off, count);
> >  
> >  	/* truncate inode page cache of the dst range so that future reads can fetch
> >  	 * new data from server */
> > @@ -292,37 +254,9 @@ out_unlock:
> >  		mutex_unlock(&dst_inode->i_mutex);
> >  		mutex_unlock(&src_inode->i_mutex);
> >  	}
> > -out_fput:
> > -	fdput(src_file);
> > -out_drop_write:
> > -	mnt_drop_write_file(dst_file);
> > +out:
> >  	return ret;
> >  }
> > -
> > -static long nfs42_ioctl_clone_range(struct file *dst_file, void __user *argp)
> > -{
> > -	struct btrfs_ioctl_clone_range_args args;
> > -
> > -	if (copy_from_user(&args, argp, sizeof(args)))
> > -		return -EFAULT;
> > -
> > -	return nfs42_ioctl_clone(dst_file, args.src_fd, args.src_offset,
> > -				 args.dest_offset, args.src_length);
> > -}
> > -
> > -long nfs4_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
> > -{
> > -	void __user *argp = (void __user *)arg;
> > -
> > -	switch (cmd) {
> > -	case BTRFS_IOC_CLONE:
> > -		return nfs42_ioctl_clone(file, arg, 0, 0, 0);
> > -	case BTRFS_IOC_CLONE_RANGE:
> > -		return nfs42_ioctl_clone_range(file, argp);
> > -	}
> > -
> > -	return -ENOTTY;
> > -}
> >  #endif /* CONFIG_NFS_V4_2 */
> >  
> >  const struct file_operations nfs4_file_operations = {
> > @@ -342,8 +276,7 @@ const struct file_operations nfs4_file_operations = {
> >  #ifdef CONFIG_NFS_V4_2
> >  	.llseek		= nfs4_file_llseek,
> >  	.fallocate	= nfs42_fallocate,
> > -	.unlocked_ioctl = nfs4_ioctl,
> > -	.compat_ioctl	= nfs4_ioctl,
> > +	.clone_file_range = nfs42_clone_file_range,
> >  #else
> >  	.llseek		= nfs_file_llseek,
> >  #endif
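nfs42_clone_file_range() keeps one alignment rule based on the server's clone_blksize. Both offsets must be multiples of the block size, and the length may only be unaligned when the range ends exactly at the source file's EOF. A block size of 0 means the server advertised no requirement.

The predicate below is a small sketch of that rule under the hypothetical name `sketch_nfs_clone_aligned`. It uses `%` instead of the kernel's mask-based IS_ALIGNED(); the two agree when the block size is a power of two.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the clone_blksize check; bs == 0 disables it. */
static bool sketch_nfs_clone_aligned(uint64_t src_off, uint64_t dst_off,
				     uint64_t count, uint64_t bs,
				     uint64_t src_size)
{
	if (bs == 0)
		return true;
	/* Both the source and destination offsets must be block aligned. */
	if (src_off % bs != 0 || dst_off % bs != 0)
		return false;
	/* An unaligned length is only allowed if the range ends at EOF. */
	if (count % bs != 0 && src_size != src_off + count)
		return false;
	return true;
}
```

In this model, a failed check corresponds to the -EINVAL path in the patch.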
> > diff --git a/fs/read_write.c b/fs/read_write.c
> > index 6c1aa73..9e3dd8f 100644
> > --- a/fs/read_write.c
> > +++ b/fs/read_write.c
> > @@ -1451,3 +1451,75 @@ out1:
> >  out2:
> >  	return ret;
> >  }
> > +
> > +static int clone_verify_area(struct file *file, loff_t pos, u64 len, bool write)
> > +{
> > +	struct inode *inode = file_inode(file);
> > +
> > +	if (unlikely(pos < 0))
> > +		return -EINVAL;
> > +
> > +	if (unlikely((loff_t) (pos + len) < 0))
> > +		return -EINVAL;
> > +
> > +	if (unlikely(inode->i_flctx && mandatory_lock(inode))) {
> > +		loff_t end = len ? pos + len - 1 : OFFSET_MAX;
> > +		int retval;
> > +
> > +		retval = locks_mandatory_area(file, pos, end,
> > +				write ? F_WRLCK : F_RDLCK);
> > +		if (retval < 0)
> > +			return retval;
> > +	}
> > +
> > +	return security_file_permission(file, write ? MAY_WRITE : MAY_READ);
> > +}
> > +
> > +int vfs_clone_file_range(struct file *file_in, loff_t pos_in,
> > +		struct file *file_out, loff_t pos_out, u64 len)
> > +{
> > +	struct inode *inode_in = file_inode(file_in);
> > +	struct inode *inode_out = file_inode(file_out);
> > +	int ret;
> > +
> > +	if (inode_in->i_sb != inode_out->i_sb ||
> > +	    file_in->f_path.mnt != file_out->f_path.mnt)
> > +		return -EXDEV;
> > +
> > +	if (S_ISDIR(inode_in->i_mode) || S_ISDIR(inode_out->i_mode))
> > +		return -EISDIR;
> > +	if (!S_ISREG(inode_in->i_mode) || !S_ISREG(inode_out->i_mode))
> > +		return -EOPNOTSUPP;
> > +
> > +	if (!(file_in->f_mode & FMODE_READ) ||
> > +	    !(file_out->f_mode & FMODE_WRITE) ||
> > +	    (file_out->f_flags & O_APPEND) ||
> > +	    !file_in->f_op->clone_file_range)
> > +		return -EBADF;
> > +
> > +	ret = clone_verify_area(file_in, pos_in, len, false);
> > +	if (ret)
> > +		return ret;
> > +
> > +	ret = clone_verify_area(file_out, pos_out, len, true);
> > +	if (ret)
> > +		return ret;
> > +
> > +	if (pos_in + len > i_size_read(inode_in))
> > +		return -EINVAL;
> > +
> > +	ret = mnt_want_write_file(file_out);
> > +	if (ret)
> > +		return ret;
> > +
> > +	ret = file_in->f_op->clone_file_range(file_in, pos_in,
> > +			file_out, pos_out, len);
> > +	if (!ret) {
> > +		fsnotify_access(file_in);
> > +		fsnotify_modify(file_out);
> > +	}
> > +
> > +	mnt_drop_write_file(file_out);
> > +	return ret;
> > +}
> > +EXPORT_SYMBOL(vfs_clone_file_range);
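The block below models, in user space, the offset checks that clone_verify_area() and vfs_clone_file_range() perform. Those checks require a non-negative position, no overflow past the largest loff_t, and a source range that lies within i_size. The block also models the mandatory-lock end computation, where len == 0 maps to OFFSET_MAX. It is a sketch with hypothetical names and leaves out locking, LSM hooks and fsnotify.

One deliberate difference: it uses a stricter overflow test than `(loff_t)(pos + len) < 0`. The cast only catches sums that land in [2^63, 2^64). A sum that wraps past 2^64 instead becomes a small non-negative value; for example, pos = 1 with len = UINT64_MAX gives 0.

```c
#include <errno.h>
#include <stdint.h>

/* Stand-in for the kernel's OFFSET_MAX, i.e. the largest loff_t. */
#define SKETCH_OFFSET_MAX INT64_MAX

/* Position/length sanity check, modelled on clone_verify_area(). */
static int sketch_range_valid(int64_t pos, uint64_t len)
{
	if (pos < 0)
		return -EINVAL;
	/* Reject any pos + len beyond the largest loff_t, without wrapping. */
	if (len > (uint64_t)INT64_MAX - (uint64_t)pos)
		return -EINVAL;
	return 0;
}

/* Last byte covered by the mandatory-lock check; len == 0 means "to EOF". */
static int64_t sketch_lock_end(int64_t pos, uint64_t len)
{
	return len ? pos + (int64_t)len - 1 : SKETCH_OFFSET_MAX;
}

/* Offset checks from vfs_clone_file_range(), in the same order. */
static int sketch_clone_offsets(int64_t pos_in, int64_t pos_out,
				uint64_t len, int64_t src_size)
{
	int ret;

	ret = sketch_range_valid(pos_in, len);
	if (ret)
		return ret;
	ret = sketch_range_valid(pos_out, len);
	if (ret)
		return ret;
	/* The source range must lie within the source file. */
	if ((uint64_t)pos_in + len > (uint64_t)src_size)
		return -EINVAL;
	return 0;
}
```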
> > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > index af559ac..59bf96d 100644
> > --- a/include/linux/fs.h
> > +++ b/include/linux/fs.h
> > @@ -1629,7 +1629,10 @@ struct file_operations {
> >  #ifndef CONFIG_MMU
> >  	unsigned (*mmap_capabilities)(struct file *);
> >  #endif
> > -	ssize_t (*copy_file_range)(struct file *, loff_t, struct file *, loff_t, size_t, unsigned int);
> > +	ssize_t (*copy_file_range)(struct file *, loff_t, struct file *,
> > +			loff_t, size_t, unsigned int);
> > +	int (*clone_file_range)(struct file *, loff_t, struct file *, loff_t,
> > +			u64);
> >  };
> >  
> >  struct inode_operations {
> > @@ -1683,6 +1686,8 @@ extern ssize_t vfs_writev(struct file *, const struct iovec __user *,
> >  		unsigned long, loff_t *);
> >  extern ssize_t vfs_copy_file_range(struct file *, loff_t , struct file *,
> >  				   loff_t, size_t, unsigned int);
> > +extern int vfs_clone_file_range(struct file *file_in, loff_t pos_in,
> > +		struct file *file_out, loff_t pos_out, u64 len);
> >  
> >  struct super_operations {
> >     	struct inode *(*alloc_inode)(struct super_block *sb);
> > diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
> > index f15d980..cd5db7f 100644
> > --- a/include/uapi/linux/fs.h
> > +++ b/include/uapi/linux/fs.h
> > @@ -39,6 +39,13 @@
> >  #define RENAME_EXCHANGE		(1 << 1)	/* Exchange source and dest */
> >  #define RENAME_WHITEOUT		(1 << 2)	/* Whiteout source */
> >  
> > +struct file_clone_range {
> > +	__s64 src_fd;
> > +	__u64 src_offset;
> > +	__u64 src_length;
> > +	__u64 dest_offset;
> > +};
> > +
> >  struct fstrim_range {
> >  	__u64 start;
> >  	__u64 len;
> > @@ -159,6 +166,8 @@ struct inodes_stat_t {
> >  #define FIFREEZE	_IOWR('X', 119, int)	/* Freeze */
> >  #define FITHAW		_IOWR('X', 120, int)	/* Thaw */
> >  #define FITRIM		_IOWR('X', 121, struct fstrim_range)	/* Trim */
> > +#define FICLONE		_IOW(0x94, 9, int)
> > +#define FICLONERANGE	_IOW(0x94, 13, struct file_clone_range)
> >  
> >  #define	FS_IOC_GETFLAGS			_IOR('f', 1, long)
> >  #define	FS_IOC_SETFLAGS			_IOW('f', 2, long)
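From user space, both new ioctls are issued on the destination descriptor. FICLONE takes the source fd as its argument, and FICLONERANGE takes a struct file_clone_range.

A minimal sketch follows. The wrapper names are hypothetical. The fallback definitions copy the values this patch adds to include/uapi/linux/fs.h, for builds against older headers. The comment that a zero src_length means "up to the source's EOF" reflects how the filesystems in this series treat len == 0; it is not a check the VFS performs.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/fs.h>

/* Older headers predate these definitions; mirror the values from the patch. */
#ifndef FICLONE
#define FICLONE		_IOW(0x94, 9, int)
#endif
#ifndef FICLONERANGE
struct file_clone_range {
	__s64 src_fd;
	__u64 src_offset;
	__u64 src_length;
	__u64 dest_offset;
};
#define FICLONERANGE	_IOW(0x94, 13, struct file_clone_range)
#endif

/* Share all of src_fd's data with dst_fd; returns 0 or -errno. */
static int sketch_ficlone(int dst_fd, int src_fd)
{
	if (ioctl(dst_fd, FICLONE, src_fd) < 0)
		return -errno;
	return 0;
}

/* Share [src_off, src_off + len) of src_fd at dst_off in dst_fd.
 * len == 0 is treated by the filesystems here as "to the source's EOF". */
static int sketch_ficlonerange(int dst_fd, int src_fd, uint64_t src_off,
			       uint64_t len, uint64_t dst_off)
{
	struct file_clone_range args = {
		.src_fd = src_fd,
		.src_offset = src_off,
		.src_length = len,
		.dest_offset = dst_off,
	};

	if (ioctl(dst_fd, FICLONERANGE, &args) < 0)
		return -errno;
	return 0;
}
```

Callers that only want a fast path would typically fall back to an ordinary read/write copy when either ioctl fails.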
> > -- 
> > 1.9.1
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> > the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html

end of thread, last message 2015-12-14 17:08 UTC

Thread overview: 27+ messages
2015-12-03 11:59 move btrfs clone ioctls to common code V2 Christoph Hellwig
2015-12-03 11:59 ` [PATCH 1/4] locks: new locks_mandatory_area calling convention Christoph Hellwig
2015-12-03 11:59   ` Christoph Hellwig
2015-12-08  4:05   ` Al Viro
2015-12-08 14:54     ` Christoph Hellwig
2015-12-08 14:54       ` Christoph Hellwig
2015-12-08 16:16       ` Al Viro
2015-12-08 16:16         ` Al Viro
2015-12-03 11:59 ` [PATCH 2/4] vfs: pull btrfs clone API to vfs layer Christoph Hellwig
2015-12-03 11:59   ` Christoph Hellwig
2015-12-07  0:53   ` Darrick J. Wong
2015-12-07  0:53     ` Darrick J. Wong
2015-12-07 15:13     ` Christoph Hellwig
2015-12-07 15:13       ` Christoph Hellwig
2015-12-07 21:09       ` Darrick J. Wong
2015-12-08  1:54       ` Darrick J. Wong
2015-12-08  1:54         ` Darrick J. Wong
2015-12-14 16:34     ` [PATCH 5/4] vfs: return EINVAL for unsupported file types in clone Christoph Hellwig
2015-12-14 16:34       ` Christoph Hellwig
2015-12-09 20:40   ` [PATCH 2/4] vfs: pull btrfs clone API to vfs layer Darrick J. Wong
2015-12-09 20:40     ` Darrick J. Wong
2015-12-14 16:34     ` Christoph Hellwig
2015-12-14 16:34       ` Christoph Hellwig
2015-12-14 17:08     ` Darrick J. Wong
2015-12-14 17:08       ` Darrick J. Wong
2015-12-03 11:59 ` [PATCH 3/4] nfsd: Pass filehandle to nfs4_preprocess_stateid_op() Christoph Hellwig
2015-12-03 11:59 ` [PATCH 4/4] nfsd: implement the NFSv4.2 CLONE operation Christoph Hellwig
