linux-fsdevel.vger.kernel.org archive mirror
* [PATCH v4 0/4] ovl: efficient copy up by reflink
@ 2016-09-23  8:38 Amir Goldstein
From: Amir Goldstein @ 2016-09-23  8:38 UTC (permalink / raw)
  To: Miklos Szeredi, Dave Chinner, Christoph Hellwig, linux-unionfs
  Cc: Al Viro, linux-xfs, Darrick J . Wong, linux-fsdevel

This is the 4th revision of implementing overlayfs
copy up by reflink.

Btrfs has file reflink support and XFS is about to gain it.
Using reflink to implement copy up of regular file data,
when possible, is very useful.

For example, on my laptop, xfstest overlay/001 (copy up of 4G
sparse files) takes less than 1 second with copy up by reflink
vs. 25 seconds with regular copy up.

This series includes 3 patches:
- patches 1,2 are re-factoring patches to allow using the
  vfs_clone_file_range() helper from file system code.
- patch 3 utilizes the helper for overlay copy up

These changes passed the unionmount-testsuite (over tmpfs).
They also passed the overlay/??? xfstests over the following underlying fs:
1. ext4 (copy up)
2. xfs + reflink patches + mkfs.xfs (copy up)
3. xfs + reflink patches + mkfs.xfs -m reflink=1 (reflink up)

Dave Chinner suggested the following implementation for copy up:
1. try to clone_file_range() the entire length
2. fall back to copy_file_range() in small chunks
3. fall back to do_splice_direct() in small chunks

This is a good general implementation. It covers future filesystems
that can do either clone_file_range() or copy_file_range(), as well as
filesystems where clone_file_range() may fail but copy_file_range()
succeeds.
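
To make the three steps concrete, here is a minimal userspace sketch of
the same fallback chain. It is not the kernel code from this series. It
assumes FICLONERANGE as the userspace counterpart of clone_file_range(),
and a plain read()/write() loop standing in for do_splice_direct().
copy_with_fallback(), copy_rw(), CHUNK_SIZE and the copy_method enum are
hypothetical names made up for this illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <linux/fs.h>	/* FICLONERANGE, struct file_clone_range */
#include <sys/ioctl.h>
#include <sys/types.h>
#include <unistd.h>

#define CHUNK_SIZE (1 << 20)	/* hypothetical chunk size for step 2 */

enum copy_method { COPY_FAILED = -1, COPY_NONE, COPY_CLONE, COPY_RANGE, COPY_RW };

/* Step 3: read()/write() loop, a userspace stand-in for do_splice_direct(). */
static int copy_rw(int src, int dst, off_t len)
{
	static char buf[64 * 1024];

	while (len > 0) {
		size_t want = len < (off_t)sizeof(buf) ? (size_t)len : sizeof(buf);
		ssize_t n = read(src, buf, want);

		if (n < 0)
			return -1;
		if (n == 0)
			break;		/* source is shorter than len */
		for (ssize_t done = 0; done < n; ) {
			ssize_t w = write(dst, buf + done, (size_t)(n - done));

			if (w < 0)
				return -1;
			done += w;
		}
		len -= n;
	}
	return 0;
}

/*
 * Copy the first len bytes of src to dst. Both descriptors are assumed to
 * be positioned at offset 0, and len is assumed to be the size of src.
 */
enum copy_method copy_with_fallback(int src, int dst, off_t len)
{
	struct file_clone_range fcr = {
		.src_fd = src, .src_offset = 0,
		.src_length = (unsigned long long)len, .dest_offset = 0,
	};

	assert(len >= 0);
	if (len == 0)
		return COPY_NONE;	/* src_length == 0 would mean "clone to EOF" */

	/* Step 1: try to clone the entire length in a single call. */
	if (ioctl(dst, FICLONERANGE, &fcr) == 0)
		return COPY_CLONE;

	/* Step 2: copy_file_range() in chunks; NULL offsets advance both fds. */
	while (len > 0) {
		size_t want = len < CHUNK_SIZE ? (size_t)len : CHUNK_SIZE;
		ssize_t n = copy_file_range(src, NULL, dst, NULL, want, 0);

		if (n < 0)
			break;			/* unsupported or failed: try step 3 */
		if (n == 0)
			return COPY_RANGE;	/* hit end of source early */
		len -= n;
	}
	if (len == 0)
		return COPY_RANGE;

	/* Step 3: continue from the current file offsets. */
	return copy_rw(src, dst, len) == 0 ? COPY_RW : COPY_FAILED;
}
```

Because steps 2 and 3 both work from the current file offsets, a
copy_file_range() failure part way through does not restart the copy;
the read()/write() loop picks up where the last successful chunk ended.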

However, the only in-tree file systems that currently support
clone/copy_file_range are btrfs, xfs (soon), cifs and nfs:
- btrfs and xfs use the same implementation for clone and copy range,
  so copy_file_range() in small chunks is only useful in extreme
  ENOSPC cases.
- cifs supports only clone_file_range(), so the copy_file_range() step
  is moot.
- nfs does have different implementations for clone_file_range() and
  copy_file_range(), but neither nfs nor cifs is supported as a lower
  layer for overlayfs.

For this posting, I decided to leave out the copy_file_range() patches
until a reliable use case to test them presents itself.

Miklos,

Please pick these patches into your tree. They give a clear and immediate
benefit to copy up performance on filesystems with reflink support.

Thanks,
Amir.

V4:
- Fix recursive mnt_want_write()
- Leave out the vfs_copy_file_range() patches

V3:
- Address style comments from Dave Chinner

V2:
- Re-factor vfs helpers so they can be called from copy up
- Single call to vfs_clone_file_range() and fallback to
  vfs_copy_file_range() loop

V1:
- Replace iterative call to copy_file_range() with
  a single call to clone_file_range()

V0:
- Call clone_file_range() and fallback to do_splice_direct()


Amir Goldstein (3):
  vfs: allow vfs_clone_file_range() across mount points
  vfs: call vfs_clone_file_range() under mnt_want_write()
  ovl: use vfs_clone_file_range() for copy up if possible

 fs/ioctl.c             | 11 +++++++++++
 fs/overlayfs/copy_up.c |  8 ++++++++
 fs/read_write.c        | 13 ++++++-------
 3 files changed, 25 insertions(+), 7 deletions(-)

-- 
2.7.4



* [PATCH v4 1/3] vfs: allow vfs_clone_file_range() across mount points
From: Amir Goldstein @ 2016-09-23  8:38 UTC (permalink / raw)
  To: Miklos Szeredi, Dave Chinner, Christoph Hellwig, linux-unionfs
  Cc: Al Viro, linux-xfs, Darrick J . Wong, linux-fsdevel

FICLONE/FICLONERANGE ioctls return -EXDEV if src and dest
files are not on the same mount point.
Practically, clone only requires that src and dest files
are on the same file system.

Move the check for same mount point to ioctl handler and keep
only the check for same super block in the vfs helper.

A following patch is going to use the vfs_clone_file_range()
helper in overlayfs to copy up between lower and upper
mount points on the same file system.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/ioctl.c      | 2 ++
 fs/read_write.c | 8 ++++++--
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/fs/ioctl.c b/fs/ioctl.c
index c415668..34d2994 100644
--- a/fs/ioctl.c
+++ b/fs/ioctl.c
@@ -223,6 +223,8 @@ static long ioctl_file_clone(struct file *dst_file, unsigned long srcfd,
 
 	if (!src_file.file)
 		return -EBADF;
+	if (src_file.file->f_path.mnt != dst_file->f_path.mnt)
+		return -EXDEV;
 	ret = vfs_clone_file_range(src_file.file, off, dst_file, destoff, olen);
 	fdput(src_file);
 	return ret;
diff --git a/fs/read_write.c b/fs/read_write.c
index 66215a7..9dc6e52 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1628,8 +1628,12 @@ int vfs_clone_file_range(struct file *file_in, loff_t pos_in,
 	struct inode *inode_out = file_inode(file_out);
 	int ret;
 
-	if (inode_in->i_sb != inode_out->i_sb ||
-	    file_in->f_path.mnt != file_out->f_path.mnt)
+	/*
+	 * FICLONE/FICLONERANGE ioctls enforce that src and dest files
+	 * are on the same mount point. Practically, they only need
+	 * to be on the same file system.
+	 */
+	if (inode_in->i_sb != inode_out->i_sb)
 		return -EXDEV;
 
 	if (S_ISDIR(inode_in->i_mode) || S_ISDIR(inode_out->i_mode))
-- 
2.7.4



* [PATCH v4 2/3] vfs: call vfs_clone_file_range() under mnt_want_write()
From: Amir Goldstein @ 2016-09-23  8:38 UTC (permalink / raw)
  To: Miklos Szeredi, Dave Chinner, Christoph Hellwig, linux-unionfs
  Cc: Al Viro, linux-xfs, Darrick J . Wong, linux-fsdevel

Move mnt_want_write() out of the vfs helper and up into
the ioctl handler.
Taking mnt_want_write() in the caller rather than in the vfs helper
is standard practice in most namei.c syscalls.
This change will allow overlayfs code to use the helper for
copy up.
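
The calling convention can be sketched with a toy model. Nothing below
is kernel API: struct toy_mount and the toy_*() functions are
hypothetical stand-ins, and the model assumes (as the V4 changelog entry
about recursive mnt_want_write() suggests) that copy up already runs
with write access held. The point is only that a helper which no longer
takes write access itself can be shared by both kinds of caller without
nesting.

```c
#include <errno.h>

/* Toy stand-in for a mount; none of this is kernel API. */
struct toy_mount {
	int read_only;
	int writers;		/* current number of write-access holders */
	int max_writers;	/* high-water mark, exposes nested acquisition */
};

static int toy_want_write(struct toy_mount *m)
{
	if (m->read_only)
		return -EROFS;
	if (++m->writers > m->max_writers)
		m->max_writers = m->writers;
	return 0;
}

static void toy_drop_write(struct toy_mount *m)
{
	m->writers--;
}

/* Helper after the change: relies on the caller holding write access. */
static int toy_clone_helper(struct toy_mount *m)
{
	return m->writers > 0 ? 0 : -EINVAL;	/* toy sanity check only */
}

/* ioctl-style caller: takes and drops write access around the helper. */
int toy_ioctl_clone(struct toy_mount *m)
{
	int ret = toy_want_write(m);

	if (ret)
		return ret;
	ret = toy_clone_helper(m);
	toy_drop_write(m);
	return ret;
}

/* copy-up-style caller: write access is held once for a larger operation. */
int toy_copy_up(struct toy_mount *m)
{
	int ret = toy_want_write(m);

	if (ret)
		return ret;
	/* ... other steps of the larger operation would go here ... */
	ret = toy_clone_helper(m);
	toy_drop_write(m);
	return ret;
}
```

In this model max_writers never exceeds 1 for either caller. If the
helper took write access itself, the copy-up-style caller would take it
twice.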

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/ioctl.c      | 9 +++++++++
 fs/read_write.c | 5 -----
 2 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/fs/ioctl.c b/fs/ioctl.c
index 34d2994..9299832 100644
--- a/fs/ioctl.c
+++ b/fs/ioctl.c
@@ -15,6 +15,7 @@
 #include <linux/writeback.h>
 #include <linux/buffer_head.h>
 #include <linux/falloc.h>
+#include <linux/mount.h>
 #include "internal.h"
 
 #include <asm/ioctls.h>
@@ -225,7 +226,15 @@ static long ioctl_file_clone(struct file *dst_file, unsigned long srcfd,
 		return -EBADF;
 	if (src_file.file->f_path.mnt != dst_file->f_path.mnt)
 		return -EXDEV;
+
+	ret = mnt_want_write_file(dst_file);
+	if (ret)
+		goto out_fput;
+
 	ret = vfs_clone_file_range(src_file.file, off, dst_file, destoff, olen);
+
+	mnt_drop_write_file(dst_file);
+out_fput:
 	fdput(src_file);
 	return ret;
 }
diff --git a/fs/read_write.c b/fs/read_write.c
index 9dc6e52..90bc18b 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1660,10 +1660,6 @@ int vfs_clone_file_range(struct file *file_in, loff_t pos_in,
 	if (pos_in + len > i_size_read(inode_in))
 		return -EINVAL;
 
-	ret = mnt_want_write_file(file_out);
-	if (ret)
-		return ret;
-
 	ret = file_in->f_op->clone_file_range(file_in, pos_in,
 			file_out, pos_out, len);
 	if (!ret) {
@@ -1671,7 +1667,6 @@ int vfs_clone_file_range(struct file *file_in, loff_t pos_in,
 		fsnotify_modify(file_out);
 	}
 
-	mnt_drop_write_file(file_out);
 	return ret;
 }
 EXPORT_SYMBOL(vfs_clone_file_range);
-- 
2.7.4



* [PATCH v4 3/3] ovl: use vfs_clone_file_range() for copy up if possible
From: Amir Goldstein @ 2016-09-23  8:38 UTC (permalink / raw)
  To: Miklos Szeredi, Dave Chinner, Christoph Hellwig, linux-unionfs
  Cc: Al Viro, linux-xfs, Darrick J . Wong, linux-fsdevel

When copying up within the same fs, try to use vfs_clone_file_range().
This is very efficient when lower and upper are on the same fs
with file reflink support. If vfs_clone_file_range() fails for any
reason, copy up falls back to the regular data copy code.

Tested correct behavior when lower and upper are on:
1. same ext4 (copy)
2. same xfs + reflink patches + mkfs.xfs (copy)
3. same xfs + reflink patches + mkfs.xfs -m reflink=1 (reflink)
4. different xfs + reflink patches + mkfs.xfs -m reflink=1 (copy)

For comparison, on my laptop, xfstest overlay/001 (copy up of large
sparse files) takes less than 1 second in the xfs reflink setup vs.
25 seconds on the rest of the setups.

Acked-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/copy_up.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
index 43fdc27..93f0a34 100644
--- a/fs/overlayfs/copy_up.c
+++ b/fs/overlayfs/copy_up.c
@@ -136,6 +136,13 @@ static int ovl_copy_up_data(struct path *old, struct path *new, loff_t len)
 		goto out_fput;
 	}
 
+	/* Try to use clone_file_range to clone up within the same fs */
+	error = vfs_clone_file_range(old_file, 0, new_file, 0, len);
+	if (!error)
+		goto out;
+	/* Couldn't clone, so now we try to copy the data */
+	error = 0;
+
 	/* FIXME: copy up sparse files efficiently */
 	while (len) {
 		size_t this_len = OVL_COPY_UP_CHUNK_SIZE;
@@ -161,6 +168,7 @@ static int ovl_copy_up_data(struct path *old, struct path *new, loff_t len)
 		len -= bytes;
 	}
 
+out:
 	fput(new_file);
 out_fput:
 	fput(old_file);
-- 
2.7.4



* Re: [PATCH v4 0/4] ovl: efficient copy up by reflink
From: Amir Goldstein @ 2016-09-26 18:17 UTC (permalink / raw)
  To: Dave Chinner, Christoph Hellwig
  Cc: Al Viro, linux-xfs, Darrick J . Wong, linux-fsdevel,
	linux-unionfs, Miklos Szeredi

On Fri, Sep 23, 2016 at 11:38 AM, Amir Goldstein <amir73il@gmail.com> wrote:
> This is the 4th revision of implementing overlayfs
> copy up by reflink.
>
> Btrfs has file reflink support and XFS is about to gain
> file reflink support soon. It is very useful to use reflink
> to implement copy up of regular file data when possible.
>
> For example, on my laptop, xfstest overlay/001 (copy up of 4G
> sparse files) takes less than 1 second with copy up by reflink
> vs. 25 seconds with regular copy up.
>
> This series includes 3 patches:
> - patches 1,2 are re-factoring patches to allow using the
>   vfs_clone_file_range() helper from file system code.

Christoph, Dave,

Can you please ack the 2 vfs_clone_file_range() changes?

Thanks,
Amir.

