All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 0/4] ovl: efficient copy up by reflink
@ 2016-09-14 12:43 Amir Goldstein
  2016-09-14 12:43 ` [PATCH v3 1/4] vfs: allow vfs_clone_file_range() across mount points Amir Goldstein
                   ` (4 more replies)
  0 siblings, 5 replies; 28+ messages in thread
From: Amir Goldstein @ 2016-09-14 12:43 UTC (permalink / raw)
  To: Miklos Szeredi, Dave Chinner, linux-unionfs
  Cc: Christoph Hellwig, linux-xfs, Darrick J . Wong, linux-fsdevel

This is the 3rd revision of implementing overlayfs
copy up by reflink.

Btrfs has file reflink support and XFS is about to gain
file reflink support soon. It is very useful to use reflink
to implement copy up of regular file data when possible.

For example, on my laptop, xfstest overlay/001 (copy up of 4G
sparse files) takes less than 1 second with copy up by reflink
vs. 25 seconds with regular copy up.

This series includes two pairs of patches:
- patches 1,2 utilize the clone_file_range() API
- patches 3,4 utilize the copy_file_range() API

The two pairs of patches are independent of each other.
They were each tested separately and both tested together.
All combinations passed the unionmount-testsuite (over tmpfs)
All combinations passed the overlay/??? xfstests over the
following underlying fs:
1. ext4 (copy up)
2. xfs + reflink patches + mkfs.xfs (copy up)
3. xfs + reflink patches + mkfs.xfs -m reflink=1 (reflink up)

Dave Chinner suggested the following implementation for copy up,
which I implemented in this series:
1. try to clone_file_range() entire length
2. fallback to trying copy_file_range() in small chunks
3. fallback to do_splice_direct() in small chunks

This is a good general implementation to cover the future use cases of
file systems that can do either clone_file_range() or copy_file_range().
However, currently, the only in-tree file systems that support
clone/copy_file_range are btrfs, xfs (soon), cifs and nfs.
btrfs and xfs use the same implementation for clone and copy range,
so the copy_file_range() step is never needed.
cifs supports only clone_file_range() so copy_file_range() step is moot.
nfs does have a different implementation for clone_file_range() and
copy_file_range(), but nfs is not supported as upper layer for overlayfs
at the moment.

Please pick patches 1,2 for clear and immediate benefit to copy up
performance on filesystems with reflink support.

Please consider picking patches 3,4 additionally for future generations
and for code consolidation into vfs helpers.

Cheers,
Amir.

V3:
- Address style comments from Dave Chinner

V2:
- Re-factor vfs helpers so they can be called from copy up
- Single call to vfs_clone_file_range() and fallback to
  vfs_copy_file_range() loop

V1:
- Replace iteravite call to copy_file_range() with
  a single call to clone_file_range()

V0:
- Call clone_file_range() and fallback to do_splice_direct()

Amir Goldstein (4):
  vfs: allow vfs_clone_file_range() across mount points
  ovl: use vfs_clone_file_range() for copy up if possible
  vfs: allow vfs_copy_file_range() across file systems
  ovl: use vfs_copy_file_range() to copy up file data

 fs/ioctl.c             |  2 ++
 fs/overlayfs/copy_up.c | 22 ++++++++++++++++------
 fs/read_write.c        | 25 ++++++++++++++++++-------
 3 files changed, 36 insertions(+), 13 deletions(-)

-- 
2.7.4


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v3 1/4] vfs: allow vfs_clone_file_range() across mount points
  2016-09-14 12:43 [PATCH v3 0/4] ovl: efficient copy up by reflink Amir Goldstein
@ 2016-09-14 12:43 ` Amir Goldstein
  2016-09-14 12:43 ` [PATCH v3 2/4] ovl: use vfs_clone_file_range() for copy up if possible Amir Goldstein
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 28+ messages in thread
From: Amir Goldstein @ 2016-09-14 12:43 UTC (permalink / raw)
  To: Miklos Szeredi, Dave Chinner, linux-unionfs
  Cc: Christoph Hellwig, linux-xfs, Darrick J . Wong, linux-fsdevel

FICLONE/FICLONERANGE ioctls return -EXDEV if src and dest
files are not on the same mount point.
Practically, clone only requires that src and dest files
are on the same file system.

Move the check for same mount point to ioctl code and keep
only the check for same super block in the vfs helper.

A following patch is going to use the vfs_clone_file_range()
helper in overlayfs to copy up between lower and upper
mount points on the same file system.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/ioctl.c      | 2 ++
 fs/read_write.c | 8 ++++++--
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/fs/ioctl.c b/fs/ioctl.c
index c415668..34d2994 100644
--- a/fs/ioctl.c
+++ b/fs/ioctl.c
@@ -223,6 +223,8 @@ static long ioctl_file_clone(struct file *dst_file, unsigned long srcfd,
 
 	if (!src_file.file)
 		return -EBADF;
+	if (src_file.file->f_path.mnt != dst_file->f_path.mnt)
+		return -EXDEV;
 	ret = vfs_clone_file_range(src_file.file, off, dst_file, destoff, olen);
 	fdput(src_file);
 	return ret;
diff --git a/fs/read_write.c b/fs/read_write.c
index 66215a7..9dc6e52 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1628,8 +1628,12 @@ int vfs_clone_file_range(struct file *file_in, loff_t pos_in,
 	struct inode *inode_out = file_inode(file_out);
 	int ret;
 
-	if (inode_in->i_sb != inode_out->i_sb ||
-	    file_in->f_path.mnt != file_out->f_path.mnt)
+	/*
+	 * FICLONE/FICLONERANGE ioctls enforce that src and dest files
+	 * are on the same mount point. Practically, they only need
+	 * to be on the same file system.
+	 */
+	if (inode_in->i_sb != inode_out->i_sb)
 		return -EXDEV;
 
 	if (S_ISDIR(inode_in->i_mode) || S_ISDIR(inode_out->i_mode))
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v3 2/4] ovl: use vfs_clone_file_range() for copy up if possible
  2016-09-14 12:43 [PATCH v3 0/4] ovl: efficient copy up by reflink Amir Goldstein
  2016-09-14 12:43 ` [PATCH v3 1/4] vfs: allow vfs_clone_file_range() across mount points Amir Goldstein
@ 2016-09-14 12:43 ` Amir Goldstein
  2016-09-21 15:09   ` Miklos Szeredi
  2016-09-14 12:43 ` [PATCH v3 3/4] vfs: allow vfs_copy_file_range() across file systems Amir Goldstein
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 28+ messages in thread
From: Amir Goldstein @ 2016-09-14 12:43 UTC (permalink / raw)
  To: Miklos Szeredi, Dave Chinner, linux-unionfs
  Cc: Christoph Hellwig, linux-xfs, Darrick J . Wong, linux-fsdevel

When copying up within the same fs, try to use vfs_clone_file_range().
This is very efficient when lower and upper are on the same fs
with file reflink support. If vfs_clone_file_range() fails because
lower and upper are not on the same fs or if fs has no reflink support,
copy up falls back to the regular data copy code.

Tested correct behavior when lower and upper are on:
1. same ext4 (copy)
2. same xfs + reflink patches + mkfs.xfs (copy)
3. same xfs + reflink patches + mkfs.xfs -m reflink=1 (reflink)
4. different xfs + reflink patches + mkfs.xfs -m reflink=1 (copy)

For comparison, on my laptop, xfstest overlay/001 (copy up of large
sparse files) takes less than 1 second in the xfs reflink setup vs.
25 seconds on the rest of the setups.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/copy_up.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
index 43fdc27..ba039f8 100644
--- a/fs/overlayfs/copy_up.c
+++ b/fs/overlayfs/copy_up.c
@@ -136,6 +136,16 @@ static int ovl_copy_up_data(struct path *old, struct path *new, loff_t len)
 		goto out_fput;
 	}
 
+	/* Try to use clone_file_range to clone up within the same fs */
+	error = vfs_clone_file_range(old_file, 0, new_file, 0, len);
+	if (!error)
+		goto out;
+	/* If we can clone but clone failed - abort */
+	if (error != -EXDEV && error != -EOPNOTSUPP)
+		goto out;
+	/* Can't clone, so now we try to copy the data */
+	error = 0;
+
 	/* FIXME: copy up sparse files efficiently */
 	while (len) {
 		size_t this_len = OVL_COPY_UP_CHUNK_SIZE;
@@ -160,7 +170,7 @@ static int ovl_copy_up_data(struct path *old, struct path *new, loff_t len)
 
 		len -= bytes;
 	}
-
+out:
 	fput(new_file);
 out_fput:
 	fput(old_file);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v3 3/4] vfs: allow vfs_copy_file_range() across file systems
  2016-09-14 12:43 [PATCH v3 0/4] ovl: efficient copy up by reflink Amir Goldstein
  2016-09-14 12:43 ` [PATCH v3 1/4] vfs: allow vfs_clone_file_range() across mount points Amir Goldstein
  2016-09-14 12:43 ` [PATCH v3 2/4] ovl: use vfs_clone_file_range() for copy up if possible Amir Goldstein
@ 2016-09-14 12:43 ` Amir Goldstein
  2016-09-23  7:57   ` Amir Goldstein
  2016-09-14 12:43 ` [PATCH v3 4/4] ovl: use vfs_copy_file_range() to copy up file data Amir Goldstein
  2016-09-19 18:55 ` [PATCH v3 0/4] ovl: efficient copy up by reflink Amir Goldstein
  4 siblings, 1 reply; 28+ messages in thread
From: Amir Goldstein @ 2016-09-14 12:43 UTC (permalink / raw)
  To: Miklos Szeredi, Dave Chinner, linux-unionfs
  Cc: Christoph Hellwig, linux-xfs, Darrick J . Wong, linux-fsdevel

copy_file_range syscall returns -EXDEV if src and dest
file are not on the same file system.
The vfs_copy_file_range() helper, however, knows how to copy
across file systems with do_splice_direct().

Move the enforcement of same file system from the vfs helper
to the syscall code.

A following patch is going to use the vfs_copy_file_range()
helper in overlayfs to copy up between lower and upper
not on the same file system.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/read_write.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/fs/read_write.c b/fs/read_write.c
index 9dc6e52..6975fe8 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1502,10 +1502,6 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
 	    (file_out->f_flags & O_APPEND))
 		return -EBADF;
 
-	/* this could be relaxed once a method supports cross-fs copies */
-	if (inode_in->i_sb != inode_out->i_sb)
-		return -EXDEV;
-
 	if (len == 0)
 		return 0;
 
@@ -1514,7 +1510,9 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
 		return ret;
 
 	ret = -EOPNOTSUPP;
-	if (file_out->f_op->copy_file_range)
+	/* copy_file_range() method does not support cross-fs copies */
+	if (inode_in->i_sb == inode_out->i_sb &&
+	    file_out->f_op->copy_file_range)
 		ret = file_out->f_op->copy_file_range(file_in, pos_in, file_out,
 						      pos_out, len, flags);
 	if (ret == -EOPNOTSUPP)
@@ -1569,6 +1567,14 @@ SYSCALL_DEFINE6(copy_file_range, int, fd_in, loff_t __user *, off_in,
 		pos_out = f_out.file->f_pos;
 	}
 
+	/*
+	 * vfs_copy_file_range() can do cross-fs copy, but we want to
+	 * fulfill the guaranty to userland that copy_file_range syscall
+	 * does not allow cross-fs copy
+	 */
+	if (file_inode(f_in.file)->i_sb != file_inode(f_out.file)->i_sb)
+		return -EXDEV;
+
 	ret = vfs_copy_file_range(f_in.file, pos_in, f_out.file, pos_out, len,
 				  flags);
 	if (ret > 0) {
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v3 4/4] ovl: use vfs_copy_file_range() to copy up file data
  2016-09-14 12:43 [PATCH v3 0/4] ovl: efficient copy up by reflink Amir Goldstein
                   ` (2 preceding siblings ...)
  2016-09-14 12:43 ` [PATCH v3 3/4] vfs: allow vfs_copy_file_range() across file systems Amir Goldstein
@ 2016-09-14 12:43 ` Amir Goldstein
  2016-09-22  8:49   ` Amir Goldstein
  2016-09-19 18:55 ` [PATCH v3 0/4] ovl: efficient copy up by reflink Amir Goldstein
  4 siblings, 1 reply; 28+ messages in thread
From: Amir Goldstein @ 2016-09-14 12:43 UTC (permalink / raw)
  To: Miklos Szeredi, Dave Chinner, linux-unionfs
  Cc: Christoph Hellwig, linux-xfs, Darrick J . Wong, linux-fsdevel

Use vfs_copy_file_range() helper instead of calling do_splice_direct()
when copying up file data.
When copying up within the same fs, which supports copy_file_range(),
fs implementation can be more efficient then do_splice_direct().
vfs_copy_file_range() helper falls back to do_splice_direct() if
it cannot use the file system's copy_file_range() implementation.

A previous change added a vfs_clone_file_range() call before
the data copy loop, so this change is only effective for filesystems
that support copy_file_range() and *do not* support clone_file_range().
At the moment, there are no such filesystems in the kernel that
can be used as overlayfs upper, so I tested this change by disabling
the vfs_clone_file_range() call.

Tested correct behavior when lower and upper are on:
1. same ext4 (copy)
2. same xfs + reflink patches + mkfs.xfs (copy)
3. same xfs + reflink patches + mkfs.xfs -m reflink=1 (reflink)
4. different xfs + reflink patches + mkfs.xfs -m reflink=1 (copy)

For comparison, on my laptop, xfstest overlay/001 (copy up of large
sparse files) takes less than 1 second in the xfs reflink setup vs.
25 seconds on the rest of the setups.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/copy_up.c | 10 +++++-----
 fs/read_write.c        |  1 +
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
index ba039f8..a6d6bac 100644
--- a/fs/overlayfs/copy_up.c
+++ b/fs/overlayfs/copy_up.c
@@ -146,7 +146,6 @@ static int ovl_copy_up_data(struct path *old, struct path *new, loff_t len)
 	/* Can't clone, so now we try to copy the data */
 	error = 0;
 
-	/* FIXME: copy up sparse files efficiently */
 	while (len) {
 		size_t this_len = OVL_COPY_UP_CHUNK_SIZE;
 		long bytes;
@@ -159,15 +158,16 @@ static int ovl_copy_up_data(struct path *old, struct path *new, loff_t len)
 			break;
 		}
 
-		bytes = do_splice_direct(old_file, &old_pos,
-					 new_file, &new_pos,
-					 this_len, SPLICE_F_MOVE);
+		bytes = vfs_copy_file_range(old_file, old_pos,
+					    new_file, new_pos,
+					    this_len, 0);
 		if (bytes <= 0) {
 			error = bytes;
 			break;
 		}
-		WARN_ON(old_pos != new_pos);
 
+		old_pos += bytes;
+		new_pos += bytes;
 		len -= bytes;
 	}
 out:
diff --git a/fs/read_write.c b/fs/read_write.c
index 6975fe8..dfc083a 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1515,6 +1515,7 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
 	    file_out->f_op->copy_file_range)
 		ret = file_out->f_op->copy_file_range(file_in, pos_in, file_out,
 						      pos_out, len, flags);
+	/* FIXME: copy sparse file range efficiently */
 	if (ret == -EOPNOTSUPP)
 		ret = do_splice_direct(file_in, &pos_in, file_out, &pos_out,
 				len > MAX_RW_COUNT ? MAX_RW_COUNT : len, 0);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 0/4] ovl: efficient copy up by reflink
  2016-09-14 12:43 [PATCH v3 0/4] ovl: efficient copy up by reflink Amir Goldstein
                   ` (3 preceding siblings ...)
  2016-09-14 12:43 ` [PATCH v3 4/4] ovl: use vfs_copy_file_range() to copy up file data Amir Goldstein
@ 2016-09-19 18:55 ` Amir Goldstein
  4 siblings, 0 replies; 28+ messages in thread
From: Amir Goldstein @ 2016-09-19 18:55 UTC (permalink / raw)
  To: Miklos Szeredi, Dave Chinner, linux-unionfs, Christoph Hellwig
  Cc: linux-xfs, Darrick J . Wong, linux-fsdevel

On Wed, Sep 14, 2016 at 3:43 PM, Amir Goldstein <amir73il@gmail.com> wrote:
> This is the 3rd revision of implementing overlayfs
> copy up by reflink.
>

Ping.

Dave, Christoph,

Get I get your Acks please?

Miklos,

Can you please look at least at patches 1,2

Thanks!
Amir.

> Btrfs has file reflink support and XFS is about to gain
> file reflink support soon. It is very useful to use reflink
> to implement copy up of regular file data when possible.
>
> For example, on my laptop, xfstest overlay/001 (copy up of 4G
> sparse files) takes less than 1 second with copy up by reflink
> vs. 25 seconds with regular copy up.
>
> This series includes two pairs of patches:
> - patches 1,2 utilize the clone_file_range() API
> - patches 3,4 utilize the copy_file_range() API
>
> The two pairs of patches are independent of each other.
> They were each tested separately and both tested together.
> All combinations passed the unionmount-testsuite (over tmpfs)
> All combinations passed the overlay/??? xfstests over the
> following underlying fs:
> 1. ext4 (copy up)
> 2. xfs + reflink patches + mkfs.xfs (copy up)
> 3. xfs + reflink patches + mkfs.xfs -m reflink=1 (reflink up)
>
> Dave Chinner suggested the following implementation for copy up,
> which I implemented in this series:
> 1. try to clone_file_range() entire length
> 2. fallback to trying copy_file_range() in small chunks
> 3. fallback to do_splice_direct() in small chunks
>
> This is a good general implementation to cover the future use cases of
> file systems that can do either clone_file_range() or copy_file_range().
> However, currently, the only in-tree file systems that support
> clone/copy_file_range are btrfs, xfs (soon), cifs and nfs.
> btrfs and xfs use the same implementation for clone and copy range,
> so the copy_file_range() step is never needed.
> cifs supports only clone_file_range() so copy_file_range() step is moot.
> nfs does have a different implementation for clone_file_range() and
> copy_file_range(), but nfs is not supported as upper layer for overlayfs
> at the moment.
>
> Please pick patches 1,2 for clear and immediate benefit to copy up
> performance on filesystems with reflink support.
>
> Please consider picking patches 3,4 additionally for future generations
> and for code consolidation into vfs helpers.
>
> Cheers,
> Amir.
>
> V3:
> - Address style comments from Dave Chinner
>
> V2:
> - Re-factor vfs helpers so they can be called from copy up
> - Single call to vfs_clone_file_range() and fallback to
>   vfs_copy_file_range() loop
>
> V1:
> - Replace iteravite call to copy_file_range() with
>   a single call to clone_file_range()
>
> V0:
> - Call clone_file_range() and fallback to do_splice_direct()
>
> Amir Goldstein (4):
>   vfs: allow vfs_clone_file_range() across mount points
>   ovl: use vfs_clone_file_range() for copy up if possible
>   vfs: allow vfs_copy_file_range() across file systems
>   ovl: use vfs_copy_file_range() to copy up file data
>
>  fs/ioctl.c             |  2 ++
>  fs/overlayfs/copy_up.c | 22 ++++++++++++++++------
>  fs/read_write.c        | 25 ++++++++++++++++++-------
>  3 files changed, 36 insertions(+), 13 deletions(-)
>
> --
> 2.7.4
>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 2/4] ovl: use vfs_clone_file_range() for copy up if possible
  2016-09-14 12:43 ` [PATCH v3 2/4] ovl: use vfs_clone_file_range() for copy up if possible Amir Goldstein
@ 2016-09-21 15:09   ` Miklos Szeredi
  2016-09-21 17:01     ` Amir Goldstein
  0 siblings, 1 reply; 28+ messages in thread
From: Miklos Szeredi @ 2016-09-21 15:09 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Dave Chinner, linux-unionfs, Christoph Hellwig, linux-xfs,
	Darrick J . Wong, linux-fsdevel

On Wed, Sep 14, 2016 at 2:43 PM, Amir Goldstein <amir73il@gmail.com> wrote:
> When copying up within the same fs, try to use vfs_clone_file_range().
> This is very efficient when lower and upper are on the same fs
> with file reflink support. If vfs_clone_file_range() fails because
> lower and upper are not on the same fs or if fs has no reflink support,
> copy up falls back to the regular data copy code.
>
> Tested correct behavior when lower and upper are on:
> 1. same ext4 (copy)
> 2. same xfs + reflink patches + mkfs.xfs (copy)
> 3. same xfs + reflink patches + mkfs.xfs -m reflink=1 (reflink)
> 4. different xfs + reflink patches + mkfs.xfs -m reflink=1 (copy)
>
> For comparison, on my laptop, xfstest overlay/001 (copy up of large
> sparse files) takes less than 1 second in the xfs reflink setup vs.
> 25 seconds on the rest of the setups.
>
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> ---
>  fs/overlayfs/copy_up.c | 12 +++++++++++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
> index 43fdc27..ba039f8 100644
> --- a/fs/overlayfs/copy_up.c
> +++ b/fs/overlayfs/copy_up.c
> @@ -136,6 +136,16 @@ static int ovl_copy_up_data(struct path *old, struct path *new, loff_t len)
>                 goto out_fput;
>         }
>
> +       /* Try to use clone_file_range to clone up within the same fs */
> +       error = vfs_clone_file_range(old_file, 0, new_file, 0, len);
> +       if (!error)
> +               goto out;
> +       /* If we can clone but clone failed - abort */
> +       if (error != -EXDEV && error != -EOPNOTSUPP)
> +               goto out;

Would be safer to fall back on any error.

Otherwise ACK.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 2/4] ovl: use vfs_clone_file_range() for copy up if possible
  2016-09-21 15:09   ` Miklos Szeredi
@ 2016-09-21 17:01     ` Amir Goldstein
  2016-09-21 18:29       ` Miklos Szeredi
  2016-09-21 21:48       ` Dave Chinner
  0 siblings, 2 replies; 28+ messages in thread
From: Amir Goldstein @ 2016-09-21 17:01 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Dave Chinner, linux-unionfs, Christoph Hellwig, linux-xfs,
	Darrick J . Wong, linux-fsdevel

On Wed, Sep 21, 2016 at 6:09 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
> On Wed, Sep 14, 2016 at 2:43 PM, Amir Goldstein <amir73il@gmail.com> wrote:
>> When copying up within the same fs, try to use vfs_clone_file_range().
>> This is very efficient when lower and upper are on the same fs
>> with file reflink support. If vfs_clone_file_range() fails because
>> lower and upper are not on the same fs or if fs has no reflink support,
>> copy up falls back to the regular data copy code.
>>
>> Tested correct behavior when lower and upper are on:
>> 1. same ext4 (copy)
>> 2. same xfs + reflink patches + mkfs.xfs (copy)
>> 3. same xfs + reflink patches + mkfs.xfs -m reflink=1 (reflink)
>> 4. different xfs + reflink patches + mkfs.xfs -m reflink=1 (copy)
>>
>> For comparison, on my laptop, xfstest overlay/001 (copy up of large
>> sparse files) takes less than 1 second in the xfs reflink setup vs.
>> 25 seconds on the rest of the setups.
>>
>> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
>> ---
>>  fs/overlayfs/copy_up.c | 12 +++++++++++-
>>  1 file changed, 11 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
>> index 43fdc27..ba039f8 100644
>> --- a/fs/overlayfs/copy_up.c
>> +++ b/fs/overlayfs/copy_up.c
>> @@ -136,6 +136,16 @@ static int ovl_copy_up_data(struct path *old, struct path *new, loff_t len)
>>                 goto out_fput;
>>         }
>>
>> +       /* Try to use clone_file_range to clone up within the same fs */
>> +       error = vfs_clone_file_range(old_file, 0, new_file, 0, len);
>> +       if (!error)
>> +               goto out;
>> +       /* If we can clone but clone failed - abort */
>> +       if (error != -EXDEV && error != -EOPNOTSUPP)
>> +               goto out;
>
> Would be safer to fall back on any error.
>

Agreed. Dave, since you suggested the 'softer' fall back, do you have
any objections?

> Otherwise ACK.
>

Will you be taking this to your tree?

Please note that this patch depends on patch v3 1/4,
because vfs_clone_file_range() in current mainline
fails to clone from lower to upper due to upper and lower being
private mount clones
and therefore not the same f_path.mnt.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 2/4] ovl: use vfs_clone_file_range() for copy up if possible
  2016-09-21 17:01     ` Amir Goldstein
@ 2016-09-21 18:29       ` Miklos Szeredi
  2016-09-29  9:00         ` Amir Goldstein
  2016-09-21 21:48       ` Dave Chinner
  1 sibling, 1 reply; 28+ messages in thread
From: Miklos Szeredi @ 2016-09-21 18:29 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Dave Chinner, linux-unionfs, Christoph Hellwig, linux-xfs,
	Darrick J . Wong, linux-fsdevel

On Wed, Sep 21, 2016 at 7:01 PM, Amir Goldstein <amir73il@gmail.com> wrote:
> On Wed, Sep 21, 2016 at 6:09 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>> On Wed, Sep 14, 2016 at 2:43 PM, Amir Goldstein <amir73il@gmail.com> wrote:
>>> When copying up within the same fs, try to use vfs_clone_file_range().
>>> This is very efficient when lower and upper are on the same fs
>>> with file reflink support. If vfs_clone_file_range() fails because
>>> lower and upper are not on the same fs or if fs has no reflink support,
>>> copy up falls back to the regular data copy code.
>>>
>>> Tested correct behavior when lower and upper are on:
>>> 1. same ext4 (copy)
>>> 2. same xfs + reflink patches + mkfs.xfs (copy)
>>> 3. same xfs + reflink patches + mkfs.xfs -m reflink=1 (reflink)
>>> 4. different xfs + reflink patches + mkfs.xfs -m reflink=1 (copy)
>>>
>>> For comparison, on my laptop, xfstest overlay/001 (copy up of large
>>> sparse files) takes less than 1 second in the xfs reflink setup vs.
>>> 25 seconds on the rest of the setups.
>>>
>>> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
>>> ---
>>>  fs/overlayfs/copy_up.c | 12 +++++++++++-
>>>  1 file changed, 11 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
>>> index 43fdc27..ba039f8 100644
>>> --- a/fs/overlayfs/copy_up.c
>>> +++ b/fs/overlayfs/copy_up.c
>>> @@ -136,6 +136,16 @@ static int ovl_copy_up_data(struct path *old, struct path *new, loff_t len)
>>>                 goto out_fput;
>>>         }
>>>
>>> +       /* Try to use clone_file_range to clone up within the same fs */
>>> +       error = vfs_clone_file_range(old_file, 0, new_file, 0, len);
>>> +       if (!error)
>>> +               goto out;
>>> +       /* If we can clone but clone failed - abort */
>>> +       if (error != -EXDEV && error != -EOPNOTSUPP)
>>> +               goto out;
>>
>> Would be safer to fall back on any error.
>>
>
> Agreed. Dave, since you suggested the 'softer' fall back, do you have
> any objections?
>
>> Otherwise ACK.
>>
>
> Will you be taking this to your tree?

Sure I can take it.

>
> Please note that this patch depends on patch v3 1/4,
> because vfs_clone_file_range() in current mainline
> fails to clone from lower to upper due to upper and lower being
> private mount clones
> and therefore not the same f_path.mnt.

Right.  I didn't do a thorough audit of ->clone_file_range()
implementations, but 1/4 is probably OK.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 2/4] ovl: use vfs_clone_file_range() for copy up if possible
  2016-09-21 17:01     ` Amir Goldstein
  2016-09-21 18:29       ` Miklos Szeredi
@ 2016-09-21 21:48       ` Dave Chinner
  2016-09-21 21:57         ` Al Viro
  1 sibling, 1 reply; 28+ messages in thread
From: Dave Chinner @ 2016-09-21 21:48 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Miklos Szeredi, linux-unionfs, Christoph Hellwig, linux-xfs,
	Darrick J . Wong, linux-fsdevel

On Wed, Sep 21, 2016 at 08:01:22PM +0300, Amir Goldstein wrote:
> On Wed, Sep 21, 2016 at 6:09 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
> > On Wed, Sep 14, 2016 at 2:43 PM, Amir Goldstein <amir73il@gmail.com> wrote:
> >> When copying up within the same fs, try to use vfs_clone_file_range().
> >> This is very efficient when lower and upper are on the same fs
> >> with file reflink support. If vfs_clone_file_range() fails because
> >> lower and upper are not on the same fs or if fs has no reflink support,
> >> copy up falls back to the regular data copy code.
> >>
> >> Tested correct behavior when lower and upper are on:
> >> 1. same ext4 (copy)
> >> 2. same xfs + reflink patches + mkfs.xfs (copy)
> >> 3. same xfs + reflink patches + mkfs.xfs -m reflink=1 (reflink)
> >> 4. different xfs + reflink patches + mkfs.xfs -m reflink=1 (copy)
> >>
> >> For comparison, on my laptop, xfstest overlay/001 (copy up of large
> >> sparse files) takes less than 1 second in the xfs reflink setup vs.
> >> 25 seconds on the rest of the setups.
> >>
> >> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> >> ---
> >>  fs/overlayfs/copy_up.c | 12 +++++++++++-
> >>  1 file changed, 11 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
> >> index 43fdc27..ba039f8 100644
> >> --- a/fs/overlayfs/copy_up.c
> >> +++ b/fs/overlayfs/copy_up.c
> >> @@ -136,6 +136,16 @@ static int ovl_copy_up_data(struct path *old, struct path *new, loff_t len)
> >>                 goto out_fput;
> >>         }
> >>
> >> +       /* Try to use clone_file_range to clone up within the same fs */
> >> +       error = vfs_clone_file_range(old_file, 0, new_file, 0, len);
> >> +       if (!error)
> >> +               goto out;
> >> +       /* If we can clone but clone failed - abort */
> >> +       if (error != -EXDEV && error != -EOPNOTSUPP)
> >> +               goto out;
> >
> > Would be safer to fall back on any error.
> >
> 
> Agreed. Dave, since you suggested the 'softer' fall back, do you have
> any objections?

If you get any error other than -EXDEV or -EOPNOTSUPP from a clone
operation, there's somethign seriously wrong with the metadata of
the inode or th eunderlying storage. You're not going to be able to
copy the data if a clone fails for exactly the same reasons.

What's worse is that you might get a partial copy before failure, or
there might not be a failure during copy at all because the fs uses
delayed allocation and the corruption problem during allocation that
prevented clone from working is not triggered until writeback time.
i.e. you may not have anyone to report the fact that the copyup
failed by the time the failure is known.

Your choice, really - I'd much prefer that known hard errors are
propagated immediately than leave them to chance and have a user
then wonder where the $ANTI-DEITY their data has gone later on.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 2/4] ovl: use vfs_clone_file_range() for copy up if possible
  2016-09-21 21:48       ` Dave Chinner
@ 2016-09-21 21:57         ` Al Viro
  2016-09-21 22:33           ` Dave Chinner
  0 siblings, 1 reply; 28+ messages in thread
From: Al Viro @ 2016-09-21 21:57 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Amir Goldstein, Miklos Szeredi, linux-unionfs, Christoph Hellwig,
	linux-xfs, Darrick J . Wong, linux-fsdevel

On Thu, Sep 22, 2016 at 07:48:15AM +1000, Dave Chinner wrote:

> If you get any error other than -EXDEV or -EOPNOTSUPP from a clone
> operation, there's somethign seriously wrong with the metadata of
> the inode or th eunderlying storage.

Such as -ENOSPC?

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 2/4] ovl: use vfs_clone_file_range() for copy up if possible
  2016-09-21 21:57         ` Al Viro
@ 2016-09-21 22:33           ` Dave Chinner
  2016-09-22  2:25             ` Darrick J. Wong
  0 siblings, 1 reply; 28+ messages in thread
From: Dave Chinner @ 2016-09-21 22:33 UTC (permalink / raw)
  To: Al Viro
  Cc: Amir Goldstein, Miklos Szeredi, linux-unionfs, Christoph Hellwig,
	linux-xfs, Darrick J . Wong, linux-fsdevel

On Wed, Sep 21, 2016 at 10:57:24PM +0100, Al Viro wrote:
> On Thu, Sep 22, 2016 at 07:48:15AM +1000, Dave Chinner wrote:
> 
> > If you get any error other than -EXDEV or -EOPNOTSUPP from a clone
> > operation, there's somethign seriously wrong with the metadata of
> > the inode or the underlying storage.
> 
> Such as -ENOSPC?

Yup, that's a fatal error, too.  i.e. if a clone returns ENOSPC
because there isn't space for the extra metadata, then the fallback
data copy is almost certainly going to fail with ENOSPC when trying
to reserve/allocate space for both the extra data copy and the extra
metadata....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 2/4] ovl: use vfs_clone_file_range() for copy up if possible
  2016-09-21 22:33           ` Dave Chinner
@ 2016-09-22  2:25             ` Darrick J. Wong
  2016-09-22  2:52               ` Amir Goldstein
  0 siblings, 1 reply; 28+ messages in thread
From: Darrick J. Wong @ 2016-09-22  2:25 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Al Viro, Amir Goldstein, Miklos Szeredi, linux-unionfs,
	Christoph Hellwig, linux-xfs, linux-fsdevel

On Thu, Sep 22, 2016 at 08:33:31AM +1000, Dave Chinner wrote:
> On Wed, Sep 21, 2016 at 10:57:24PM +0100, Al Viro wrote:
> > On Thu, Sep 22, 2016 at 07:48:15AM +1000, Dave Chinner wrote:
> > 
> > > If you get any error other than -EXDEV or -EOPNOTSUPP from a clone
> > > operation, there's somethign seriously wrong with the metadata of
> > > the inode or the underlying storage.
> > 
> > Such as -ENOSPC?
> 
> Yup, that's a fatal error, too.  i.e. if a clone returns ENOSPC
> because there isn't space for the extra metadata, then the fallback
> data copy is almost certainly going to fail with ENOSPC when trying
> to reserve/allocate space for both the extra data copy and the extra
> metadata....

XFS will return ENOSPC during reflink if one of the relevant AGs is
running low on space for the refcount/rmap trees; however there might be
enough blocks in another AG to make a regular old copy.

--D

> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 2/4] ovl: use vfs_clone_file_range() for copy up if possible
  2016-09-22  2:25             ` Darrick J. Wong
@ 2016-09-22  2:52               ` Amir Goldstein
  0 siblings, 0 replies; 28+ messages in thread
From: Amir Goldstein @ 2016-09-22  2:52 UTC (permalink / raw)
  To: Darrick J. Wong, Miklos Szeredi, Dave Chinner, Al Viro
  Cc: linux-unionfs, Christoph Hellwig, linux-xfs, linux-fsdevel

On Thu, Sep 22, 2016 at 5:25 AM, Darrick J. Wong
<darrick.wong@oracle.com> wrote:
> On Thu, Sep 22, 2016 at 08:33:31AM +1000, Dave Chinner wrote:
>> On Wed, Sep 21, 2016 at 10:57:24PM +0100, Al Viro wrote:
>> > On Thu, Sep 22, 2016 at 07:48:15AM +1000, Dave Chinner wrote:
>> >
>> > > If you get any error other than -EXDEV or -EOPNOTSUPP from a clone
>> > > operation, there's somethign seriously wrong with the metadata of
>> > > the inode or the underlying storage.
>> >
>> > Such as -ENOSPC?
>>
>> Yup, that's a fatal error, too.  i.e. if a clone returns ENOSPC
>> because there isn't space for the extra metadata, then the fallback
>> data copy is almost certainly going to fail with ENOSPC when trying
>> to reserve/allocate space for both the extra data copy and the extra
>> metadata....
>
> XFS will return ENOSPC during reflink if one of the relevant AGs is
> running low on space for the refcount/rmap trees; however there might be
> enough blocks in another AG to make a regular old copy.
>

That settles it then. I rather be prudent and retry copy anyway.
Other (maybe future) filesystems may have their own non-fatal reasons
to fail clone
and they have the right to do so.

I will resend the ACKed clone_range patch pair for Miklos to pick up

For now, I am not at ease about resending the copy_range patch pair
without a proper way for users to get independent test coverage for it.

Unless Miklos or Al feel confident enough about taking those patches
on my testing testimony and their review?

Darrick, do you have an easy way to reproduce the extreme case of
clone failure due to certain AG ENOSPC? any xfstest for it?
I recon that the fallback to copy_range could be useful in that case
(i.e. very large file with one block in the unfortunate AG)

Cheers,
Amir.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 4/4] ovl: use vfs_copy_file_range() to copy up file data
  2016-09-14 12:43 ` [PATCH v3 4/4] ovl: use vfs_copy_file_range() to copy up file data Amir Goldstein
@ 2016-09-22  8:49   ` Amir Goldstein
  2016-09-22 14:49     ` Miklos Szeredi
  0 siblings, 1 reply; 28+ messages in thread
From: Amir Goldstein @ 2016-09-22  8:49 UTC (permalink / raw)
  To: Miklos Szeredi, Dave Chinner, linux-unionfs, Al Viro
  Cc: Christoph Hellwig, linux-xfs, Darrick J . Wong, linux-fsdevel

On Wed, Sep 14, 2016 at 3:43 PM, Amir Goldstein <amir73il@gmail.com> wrote:
> Use vfs_copy_file_range() helper instead of calling do_splice_direct()
> when copying up file data.
> When copying up within the same fs, which supports copy_file_range(),
> fs implementation can be more efficient then do_splice_direct().
> vfs_copy_file_range() helper falls back to do_splice_direct() if
> it cannot use the file system's copy_file_range() implementation.
>
> A previous change added a vfs_clone_file_range() call before
> the data copy loop, so this change is only effective for filesystems
> that support copy_file_range() and *do not* support clone_file_range().
> At the moment, there are no such filesystems in the kernel that
> can be used as overlayfs upper, so I tested this change by disabling
> the vfs_clone_file_range() call.
>
> Tested correct behavior when lower and upper are on:
> 1. same ext4 (copy)
> 2. same xfs + reflink patches + mkfs.xfs (copy)
> 3. same xfs + reflink patches + mkfs.xfs -m reflink=1 (reflink)
> 4. different xfs + reflink patches + mkfs.xfs -m reflink=1 (copy)
>
> For comparison, on my laptop, xfstest overlay/001 (copy up of large
> sparse files) takes less than 1 second in the xfs reflink setup vs.
> 25 seconds on the rest of the setups.
>
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> ---
>  fs/overlayfs/copy_up.c | 10 +++++-----
>  fs/read_write.c        |  1 +
>  2 files changed, 6 insertions(+), 5 deletions(-)
>
> diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
> index ba039f8..a6d6bac 100644
> --- a/fs/overlayfs/copy_up.c
> +++ b/fs/overlayfs/copy_up.c
> @@ -146,7 +146,6 @@ static int ovl_copy_up_data(struct path *old, struct path *new, loff_t len)
>         /* Can't clone, so now we try to copy the data */
>         error = 0;
>
> -       /* FIXME: copy up sparse files efficiently */
>         while (len) {
>                 size_t this_len = OVL_COPY_UP_CHUNK_SIZE;
>                 long bytes;
> @@ -159,15 +158,16 @@ static int ovl_copy_up_data(struct path *old, struct path *new, loff_t len)
>                         break;
>                 }
>
> -               bytes = do_splice_direct(old_file, &old_pos,
> -                                        new_file, &new_pos,
> -                                        this_len, SPLICE_F_MOVE);
> +               bytes = vfs_copy_file_range(old_file, old_pos,
> +                                           new_file, new_pos,
> +                                           this_len, 0);
>                 if (bytes <= 0) {
>                         error = bytes;
>                         break;
>                 }
> -               WARN_ON(old_pos != new_pos);
>
> +               old_pos += bytes;
> +               new_pos += bytes;
>                 len -= bytes;
>         }
>  out:
> diff --git a/fs/read_write.c b/fs/read_write.c
> index 6975fe8..dfc083a 100644
> --- a/fs/read_write.c
> +++ b/fs/read_write.c
> @@ -1515,6 +1515,7 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
>             file_out->f_op->copy_file_range)
>                 ret = file_out->f_op->copy_file_range(file_in, pos_in, file_out,
>                                                       pos_out, len, flags);
> +       /* FIXME: copy sparse file range efficiently */
>         if (ret == -EOPNOTSUPP)
>                 ret = do_splice_direct(file_in, &pos_in, file_out, &pos_out,
>                                 len > MAX_RW_COUNT ? MAX_RW_COUNT : len, 0);
> --
> 2.7.4
>

Guys, I just hit a lockdep warning on nesting an mnt_want_write() with
this patch
because overlay already holds the write lock when getting to ovl_copy_up_data()
I don't know how I missed it before.

Is it really a problem to nest this lock?
Should I factor our another set of _locked helpers from the
vfs_{copy,clone}_file_range helpers?

===========

[  675.814754] =============================================
[  675.814755] [ INFO: possible recursive locking detected ]
[  675.814757] 4.8.0-rc5+ #5 Tainted: G        W  O
[  675.814758] ---------------------------------------------
[  675.814759] xfs_io/15241 is trying to acquire lock:
[  675.814760]  (sb_writers#7){.+.+.+}, at: [<ffffffffa2277144>]
__sb_start_write+0xb4/0xf0
[  675.814767]
               but task is already holding lock:
[  675.814768]  (sb_writers#7){.+.+.+}, at: [<ffffffffa2277144>]
__sb_start_write+0xb4/0xf0
[  675.814772]
               other info that might help us debug this:
[  675.814773]  Possible unsafe locking scenario:

[  675.814774]        CPU0
[  675.814775]        ----
[  675.814775]   lock(sb_writers#7);
[  675.814777]   lock(sb_writers#7);
[  675.814779]
                *** DEADLOCK ***

[  675.814780]  May be due to missing lock nesting notation

[  675.814782] 4 locks held by xfs_io/15241:
[  675.814783]  #0:  (sb_writers#7){.+.+.+}, at: [<ffffffffa2277144>]
__sb_start_write+0xb4/0xf0
[  675.814787]  #1:  (&type->s_vfs_rename_key){+.+.+.}, at:
[<ffffffffa227f992>] lock_rename+0x32/0x100
[  675.814791]  #2:  (&type->i_mutex_dir_key#2/1){+.+.+.}, at:
[<ffffffffa227fa40>] lock_rename+0xe0/0x100
[  675.814796]  #3:  (&type->i_mutex_dir_key#2/5){+.+.+.}, at:
[<ffffffffa227fa55>] lock_rename+0xf5/0x100
[  675.814800]
               stack backtrace:
[  675.814802] CPU: 2 PID: 15241 Comm: xfs_io Tainted: G        W  O
 4.8.0-rc5+ #5
[  675.814804] Hardware name: Dell Inc. Latitude E7450/0D8H72, BIOS
A13 05/17/2016
[  675.814805]  0000000000000086 00000000029bfc05 ffff9cd454ab78b0
ffffffffa2474d83
[  675.814808]  ffffffffa3a7b6f0 ffff9cd4685f0000 ffff9cd454ab7980
ffffffffa20ed54d
[  675.814810]  0000000000000296 0000000400000246 ffff9cd400000000
ffffffffa3392101
[  675.814813] Call Trace:
[  675.814816]  [<ffffffffa2474d83>] dump_stack+0x85/0xc2
[  675.814820]  [<ffffffffa20ed54d>] __lock_acquire+0x148d/0x14f0
[  675.814822]  [<ffffffffa22ec170>] ? dquot_file_open+0x40/0x50
[  675.814825]  [<ffffffffa20eda70>] lock_acquire+0x100/0x1f0
[  675.814826]  [<ffffffffa2277144>] ? __sb_start_write+0xb4/0xf0
[  675.814829]  [<ffffffffa20e6b1f>] percpu_down_read+0x4f/0xa0
[  675.814830]  [<ffffffffa2277144>] ? __sb_start_write+0xb4/0xf0
[  675.814832]  [<ffffffffa2277144>] __sb_start_write+0xb4/0xf0
[  675.814834]  [<ffffffffa23cdd2d>] ? security_file_permission+0x3d/0xc0
[  675.814837]  [<ffffffffa229ba88>] mnt_want_write_file+0x28/0x60
[  675.814839]  [<ffffffffa2274dd2>] vfs_copy_file_range+0xc2/0x270
[  675.814844]  [<ffffffffc0c797f6>] ovl_copy_up_one+0x4d6/0x710 [overlay]
[  675.814847]  [<ffffffffc0c79b16>] ovl_copy_up+0xe6/0x118 [overlay]
[  675.814850]  [<ffffffffc0c75d6d>] ovl_open_maybe_copy_up+0x8d/0xd0 [overlay]
[  675.814852]  [<ffffffffc0c73303>] ovl_d_real+0xd3/0x130 [overlay]
[  675.814854]  [<ffffffffa227209f>] vfs_open+0x6f/0x80
[  675.814856]  [<ffffffffa2284698>] path_openat+0x2a8/0xb80
[  675.814858]  [<ffffffffa22864e1>] do_filp_open+0x91/0x100
[  675.814861]  [<ffffffffa292abe7>] ? _raw_spin_unlock+0x27/0x40
[  675.814862]  [<ffffffffa2297f39>] ? __alloc_fd+0xf9/0x210
[  675.814864]  [<ffffffffa2272494>] do_sys_open+0x124/0x210
[  675.814867]  [<ffffffffa227259e>] SyS_open+0x1e/0x20
[  675.814868]  [<ffffffffa292b540>] entry_SYSCALL_64_fastpath+0x23/0xc1

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 4/4] ovl: use vfs_copy_file_range() to copy up file data
  2016-09-22  8:49   ` Amir Goldstein
@ 2016-09-22 14:49     ` Miklos Szeredi
  2016-09-22 15:44       ` Amir Goldstein
  0 siblings, 1 reply; 28+ messages in thread
From: Miklos Szeredi @ 2016-09-22 14:49 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Dave Chinner, linux-unionfs, Al Viro, Christoph Hellwig,
	linux-xfs, Darrick J . Wong, linux-fsdevel

On Thu, Sep 22, 2016 at 10:49 AM, Amir Goldstein <amir73il@gmail.com> wrote:
> Guys, I just hit a lockdep warning on nesting an mnt_want_write() with
> this patch
> because overlay already holds the write lock when getting to ovl_copy_up_data()
> I don't know how I missed it before.
>
> Is it really a problem to nest this lock?
> Should I factor our another set of _locked helpers from the
> vfs_{copy,clone}_file_range helpers?
>

vfs_{copy,clone}_file_range should call sb_start_write() instead of
mnt_want_write_file(), the checks against FMODE_WRITE + S_ISREG ensure
that the __mnt_want_write() part is already held on the file.

That still leaves the lockdep warning.  We could do
__vfs_{copy,clone}_file_range() variants without the
sb_start_write()/sb_end_write() and add the non-underscore variants as
static inline to fs.h that do call the sb_start/end_write.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 4/4] ovl: use vfs_copy_file_range() to copy up file data
  2016-09-22 14:49     ` Miklos Szeredi
@ 2016-09-22 15:44       ` Amir Goldstein
  2016-09-22 17:21         ` Amir Goldstein
  0 siblings, 1 reply; 28+ messages in thread
From: Amir Goldstein @ 2016-09-22 15:44 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Dave Chinner, linux-unionfs, Al Viro, Christoph Hellwig,
	linux-xfs, Darrick J . Wong, linux-fsdevel

On Thu, Sep 22, 2016 at 5:49 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
> On Thu, Sep 22, 2016 at 10:49 AM, Amir Goldstein <amir73il@gmail.com> wrote:
>> Guys, I just hit a lockdep warning on nesting an mnt_want_write() with
>> this patch
>> because overlay already holds the write lock when getting to ovl_copy_up_data()
>> I don't know how I missed it before.
>>
>> Is it really a problem to nest this lock?
>> Should I factor our another set of _locked helpers from the
>> vfs_{copy,clone}_file_range helpers?
>>
>
> vfs_{copy,clone}_file_range should call sb_start_write() instead of
> mnt_want_write_file(), the checks against FMODE_WRITE + S_ISREG ensure
> that the __mnt_want_write() part is already held on the file.

Wait a minute, I think you got the solution but mixed it up a bit.
IIUC, vfs_{clone,copy} should call __mnt_want_write_file()
instead of mnt_want_write_file(), which will skip sb_start_write()
because file is already open for write and no lockdep warning
and no problem.

Am I missing something?

>
> That still leaves the lockdep warning.  We could do
> __vfs_{copy,clone}_file_range() variants without the
> sb_start_write()/sb_end_write() and add the non-underscore variants as
> static inline to fs.h that do call the sb_start/end_write.
>
> Thanks,
> Miklos

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 4/4] ovl: use vfs_copy_file_range() to copy up file data
  2016-09-22 15:44       ` Amir Goldstein
@ 2016-09-22 17:21         ` Amir Goldstein
  0 siblings, 0 replies; 28+ messages in thread
From: Amir Goldstein @ 2016-09-22 17:21 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Dave Chinner, linux-unionfs, Al Viro, Christoph Hellwig,
	linux-xfs, Darrick J . Wong, linux-fsdevel

On Thu, Sep 22, 2016 at 6:44 PM, Amir Goldstein <amir73il@gmail.com> wrote:
> On Thu, Sep 22, 2016 at 5:49 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>> On Thu, Sep 22, 2016 at 10:49 AM, Amir Goldstein <amir73il@gmail.com> wrote:
>>> Guys, I just hit a lockdep warning on nesting an mnt_want_write() with
>>> this patch
>>> because overlay already holds the write lock when getting to ovl_copy_up_data()
>>> I don't know how I missed it before.
>>>
>>> Is it really a problem to nest this lock?
>>> Should I factor our another set of _locked helpers from the
>>> vfs_{copy,clone}_file_range helpers?
>>>
>>
>> vfs_{copy,clone}_file_range should call sb_start_write() instead of
>> mnt_want_write_file(), the checks against FMODE_WRITE + S_ISREG ensure
>> that the __mnt_want_write() part is already held on the file.
>
> Wait a minute, I think you got the solution but mixed it up a bit.
> IIUC, vfs_{clone,copy} should call __mnt_want_write_file()
> instead of mnt_want_write_file(), which will skip sb_start_write()
> because file is already open for write and no lockdep warning
> and no problem.
>
> Am I missing something?
>

Ah.. I got it. Anyway a short survey of fs/namei.c
shows that the custom is to have mnt_want_write() in either the syscall
or the do_XXX helper and not in the vfs_XXX helper,
so I will conform to this standard.

>>
>> That still leaves the lockdep warning.  We could do
>> __vfs_{copy,clone}_file_range() variants without the
>> sb_start_write()/sb_end_write() and add the non-underscore variants as
>> static inline to fs.h that do call the sb_start/end_write.
>>
>> Thanks,
>> Miklos

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 3/4] vfs: allow vfs_copy_file_range() across file systems
  2016-09-14 12:43 ` [PATCH v3 3/4] vfs: allow vfs_copy_file_range() across file systems Amir Goldstein
@ 2016-09-23  7:57   ` Amir Goldstein
  2016-09-23 15:19     ` Darrick J. Wong
  2016-09-23 16:13     ` Darrick J. Wong
  0 siblings, 2 replies; 28+ messages in thread
From: Amir Goldstein @ 2016-09-23  7:57 UTC (permalink / raw)
  To: Miklos Szeredi, Dave Chinner, linux-unionfs
  Cc: Christoph Hellwig, linux-xfs, Darrick J . Wong, linux-fsdevel

On Wed, Sep 14, 2016 at 3:43 PM, Amir Goldstein <amir73il@gmail.com> wrote:
> copy_file_range syscall returns -EXDEV if src and dest
> file are not on the same file system.
> The vfs_copy_file_range() helper, however, knows how to copy
> across file systems with do_splice_direct().
>
> Move the enforcement of same file system from the vfs helper
> to the syscall code.
>
> A following patch is going to use the vfs_copy_file_range()
> helper in overlayfs to copy up between lower and upper
> not on the same file system.
>
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> ---
>  fs/read_write.c | 16 +++++++++++-----
>  1 file changed, 11 insertions(+), 5 deletions(-)
>
> diff --git a/fs/read_write.c b/fs/read_write.c
> index 9dc6e52..6975fe8 100644
> --- a/fs/read_write.c
> +++ b/fs/read_write.c
> @@ -1502,10 +1502,6 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
>             (file_out->f_flags & O_APPEND))
>                 return -EBADF;
>
> -       /* this could be relaxed once a method supports cross-fs copies */
> -       if (inode_in->i_sb != inode_out->i_sb)
> -               return -EXDEV;
> -
>         if (len == 0)
>                 return 0;
>
> @@ -1514,7 +1510,9 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
>                 return ret;
>
>         ret = -EOPNOTSUPP;
> -       if (file_out->f_op->copy_file_range)
> +       /* copy_file_range() method does not support cross-fs copies */
> +       if (inode_in->i_sb == inode_out->i_sb &&
> +           file_out->f_op->copy_file_range)
>                 ret = file_out->f_op->copy_file_range(file_in, pos_in, file_out,
>                                                       pos_out, len, flags);
>         if (ret == -EOPNOTSUPP)
> @@ -1569,6 +1567,14 @@ SYSCALL_DEFINE6(copy_file_range, int, fd_in, loff_t __user *, off_in,
>                 pos_out = f_out.file->f_pos;
>         }
>
> +       /*
> +        * vfs_copy_file_range() can do cross-fs copy, but we want to
> +        * fulfill the guaranty to userland that copy_file_range syscall
> +        * does not allow cross-fs copy
> +        */
> +       if (file_inode(f_in.file)->i_sb != file_inode(f_out.file)->i_sb)
> +               return -EXDEV;

Oops, that was supposed to be goto out;
Anyway, I am holding back on the vfs_copy_file_range() patches sub set
until I have a reliable test on xfs to fall back from clone to copy range

> +
>         ret = vfs_copy_file_range(f_in.file, pos_in, f_out.file, pos_out, len,
>                                   flags);
>         if (ret > 0) {
> --
> 2.7.4
>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 3/4] vfs: allow vfs_copy_file_range() across file systems
  2016-09-23  7:57   ` Amir Goldstein
@ 2016-09-23 15:19     ` Darrick J. Wong
  2016-09-23 16:13     ` Darrick J. Wong
  1 sibling, 0 replies; 28+ messages in thread
From: Darrick J. Wong @ 2016-09-23 15:19 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Miklos Szeredi, Dave Chinner, linux-unionfs, Christoph Hellwig,
	linux-xfs, linux-fsdevel

On Fri, Sep 23, 2016 at 10:57:56AM +0300, Amir Goldstein wrote:
> On Wed, Sep 14, 2016 at 3:43 PM, Amir Goldstein <amir73il@gmail.com> wrote:
> > copy_file_range syscall returns -EXDEV if src and dest
> > file are not on the same file system.
> > The vfs_copy_file_range() helper, however, knows how to copy
> > across file systems with do_splice_direct().
> >
> > Move the enforcement of same file system from the vfs helper
> > to the syscall code.
> >
> > A following patch is going to use the vfs_copy_file_range()
> > helper in overlayfs to copy up between lower and upper
> > not on the same file system.
> >
> > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> > ---
> >  fs/read_write.c | 16 +++++++++++-----
> >  1 file changed, 11 insertions(+), 5 deletions(-)
> >
> > diff --git a/fs/read_write.c b/fs/read_write.c
> > index 9dc6e52..6975fe8 100644
> > --- a/fs/read_write.c
> > +++ b/fs/read_write.c
> > @@ -1502,10 +1502,6 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
> >             (file_out->f_flags & O_APPEND))
> >                 return -EBADF;
> >
> > -       /* this could be relaxed once a method supports cross-fs copies */
> > -       if (inode_in->i_sb != inode_out->i_sb)
> > -               return -EXDEV;
> > -
> >         if (len == 0)
> >                 return 0;
> >
> > @@ -1514,7 +1510,9 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
> >                 return ret;
> >
> >         ret = -EOPNOTSUPP;
> > -       if (file_out->f_op->copy_file_range)
> > +       /* copy_file_range() method does not support cross-fs copies */
> > +       if (inode_in->i_sb == inode_out->i_sb &&
> > +           file_out->f_op->copy_file_range)
> >                 ret = file_out->f_op->copy_file_range(file_in, pos_in, file_out,
> >                                                       pos_out, len, flags);
> >         if (ret == -EOPNOTSUPP)
> > @@ -1569,6 +1567,14 @@ SYSCALL_DEFINE6(copy_file_range, int, fd_in, loff_t __user *, off_in,
> >                 pos_out = f_out.file->f_pos;
> >         }
> >
> > +       /*
> > +        * vfs_copy_file_range() can do cross-fs copy, but we want to
> > +        * fulfill the guaranty to userland that copy_file_range syscall
> > +        * does not allow cross-fs copy
> > +        */
> > +       if (file_inode(f_in.file)->i_sb != file_inode(f_out.file)->i_sb)
> > +               return -EXDEV;
> 
> Oops, that was supposed to be goto out;
> Anyway, I am holding back on the vfs_copy_file_range() patches sub set
> until I have a reliable test on xfs to fall back from clone to copy range

I could just create another error injection point that'll make XFS pretend
that it's out of space.

--D

> 
> > +
> >         ret = vfs_copy_file_range(f_in.file, pos_in, f_out.file, pos_out, len,
> >                                   flags);
> >         if (ret > 0) {
> > --
> > 2.7.4
> >
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 3/4] vfs: allow vfs_copy_file_range() across file systems
  2016-09-23  7:57   ` Amir Goldstein
  2016-09-23 15:19     ` Darrick J. Wong
@ 2016-09-23 16:13     ` Darrick J. Wong
  2016-09-23 18:52       ` Amir Goldstein
  1 sibling, 1 reply; 28+ messages in thread
From: Darrick J. Wong @ 2016-09-23 16:13 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Miklos Szeredi, Dave Chinner, linux-unionfs, Christoph Hellwig,
	linux-xfs, linux-fsdevel

[-- Attachment #1: Type: text/plain, Size: 3141 bytes --]

On Fri, Sep 23, 2016 at 10:57:56AM +0300, Amir Goldstein wrote:
> On Wed, Sep 14, 2016 at 3:43 PM, Amir Goldstein <amir73il@gmail.com> wrote:
> > copy_file_range syscall returns -EXDEV if src and dest
> > file are not on the same file system.
> > The vfs_copy_file_range() helper, however, knows how to copy
> > across file systems with do_splice_direct().
> >
> > Move the enforcement of same file system from the vfs helper
> > to the syscall code.
> >
> > A following patch is going to use the vfs_copy_file_range()
> > helper in overlayfs to copy up between lower and upper
> > not on the same file system.
> >
> > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> > ---
> >  fs/read_write.c | 16 +++++++++++-----
> >  1 file changed, 11 insertions(+), 5 deletions(-)
> >
> > diff --git a/fs/read_write.c b/fs/read_write.c
> > index 9dc6e52..6975fe8 100644
> > --- a/fs/read_write.c
> > +++ b/fs/read_write.c
> > @@ -1502,10 +1502,6 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
> >             (file_out->f_flags & O_APPEND))
> >                 return -EBADF;
> >
> > -       /* this could be relaxed once a method supports cross-fs copies */
> > -       if (inode_in->i_sb != inode_out->i_sb)
> > -               return -EXDEV;
> > -
> >         if (len == 0)
> >                 return 0;
> >
> > @@ -1514,7 +1510,9 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
> >                 return ret;
> >
> >         ret = -EOPNOTSUPP;
> > -       if (file_out->f_op->copy_file_range)
> > +       /* copy_file_range() method does not support cross-fs copies */
> > +       if (inode_in->i_sb == inode_out->i_sb &&
> > +           file_out->f_op->copy_file_range)
> >                 ret = file_out->f_op->copy_file_range(file_in, pos_in, file_out,
> >                                                       pos_out, len, flags);
> >         if (ret == -EOPNOTSUPP)
> > @@ -1569,6 +1567,14 @@ SYSCALL_DEFINE6(copy_file_range, int, fd_in, loff_t __user *, off_in,
> >                 pos_out = f_out.file->f_pos;
> >         }
> >
> > +       /*
> > +        * vfs_copy_file_range() can do cross-fs copy, but we want to
> > +        * fulfill the guaranty to userland that copy_file_range syscall
> > +        * does not allow cross-fs copy
> > +        */
> > +       if (file_inode(f_in.file)->i_sb != file_inode(f_out.file)->i_sb)
> > +               return -EXDEV;
> 
> Oops, that was supposed to be goto out;
> Anyway, I am holding back on the vfs_copy_file_range() patches sub set
> until I have a reliable test on xfs to fall back from clone to copy range

Ok, attached are two rough patches -- one to add the error injection point
into the kernel, and a second one to add it to the xfs_io 'inject' command.
Note that you'll have to format the XFS filesystem with rmapbt=1 since we
can't otherwise avoid per-AG ENOSPC if rmap is enabled.

The relevant xfstests commands are:

_require_xfs_io_error_injection "ag_resv_critical"
_scratch_inject_error "ag_resv_critical"

See the xfs/325 test for a rough framework.  I'll work on cleaning up the
patches and trying to get them into 4.9.

--D

[-- Attachment #2: kernel.patch --]
[-- Type: text/x-diff, Size: 1833 bytes --]

diff --git a/fs/xfs/libxfs/xfs_ag_resv.c b/fs/xfs/libxfs/xfs_ag_resv.c
index adf770f..e5ebc37 100644
--- a/fs/xfs/libxfs/xfs_ag_resv.c
+++ b/fs/xfs/libxfs/xfs_ag_resv.c
@@ -109,7 +109,9 @@ xfs_ag_resv_critical(
 	trace_xfs_ag_resv_critical(pag, type, avail);
 
 	/* Critically low if less than 10% or max btree height remains. */
-	return avail < orig / 10 || avail < XFS_BTREE_MAXLEVELS;
+	return XFS_TEST_ERROR(avail < orig / 10 || avail < XFS_BTREE_MAXLEVELS,
+			pag->pag_mount, XFS_ERRTAG_AG_RESV_CRITICAL,
+			XFS_RANDOM_AG_RESV_CRITICAL);
 }
 
 /*
diff --git a/fs/xfs/xfs_error.h b/fs/xfs/xfs_error.h
index 8d8e1b07..e539194 100644
--- a/fs/xfs/xfs_error.h
+++ b/fs/xfs/xfs_error.h
@@ -95,7 +95,8 @@ extern void xfs_verifier_error(struct xfs_buf *bp);
 #define XFS_ERRTAG_REFCOUNT_CONTINUE_UPDATE		24
 #define XFS_ERRTAG_REFCOUNT_FINISH_ONE			25
 #define XFS_ERRTAG_BMAP_FINISH_ONE			26
-#define XFS_ERRTAG_MAX					27
+#define XFS_ERRTAG_AG_RESV_CRITICAL			27
+#define XFS_ERRTAG_MAX					28
 
 /*
  * Random factors for above tags, 1 means always, 2 means 1/2 time, etc.
@@ -127,6 +128,7 @@ extern void xfs_verifier_error(struct xfs_buf *bp);
 #define XFS_RANDOM_REFCOUNT_CONTINUE_UPDATE		1
 #define XFS_RANDOM_REFCOUNT_FINISH_ONE			1
 #define XFS_RANDOM_BMAP_FINISH_ONE			1
+#define XFS_RANDOM_AG_RESV_CRITICAL			1
 
 #ifdef DEBUG
 extern int xfs_error_test_active;
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index 7881142..ead31f8 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -1124,7 +1124,8 @@ xfs_reflink_ag_has_free_space(
 		return 0;
 
 	pag = xfs_perag_get(mp, agno);
-	if (xfs_ag_resv_critical(pag, XFS_AG_RESV_AGFL))
+	if (xfs_ag_resv_critical(pag, XFS_AG_RESV_AGFL) ||
+	    xfs_ag_resv_critical(pag, XFS_AG_RESV_METADATA))
 		error = -ENOSPC;
 	xfs_perag_put(pag);
 	return error;

[-- Attachment #3: xfsprogs.patch --]
[-- Type: text/x-diff, Size: 570 bytes --]

diff --git a/io/inject.c b/io/inject.c
index 56642b8..5d5e4ae 100644
--- a/io/inject.c
+++ b/io/inject.c
@@ -84,7 +84,9 @@ error_tag(char *name)
 		{ XFS_ERRTAG_REFCOUNT_FINISH_ONE,	"refcount_finish_one" },
 #define XFS_ERRTAG_BMAP_FINISH_ONE			26
 		{ XFS_ERRTAG_BMAP_FINISH_ONE,		"bmap_finish_one" },
-#define XFS_ERRTAG_MAX                                  27
+#define XFS_ERRTAG_AG_RESV_CRITICAL			27
+		{ XFS_ERRTAG_AG_RESV_CRITICAL,		"ag_resv_critical" },
+#define XFS_ERRTAG_MAX                                  28
 		{ XFS_ERRTAG_MAX,			NULL }
 	};
 	int	count;

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 3/4] vfs: allow vfs_copy_file_range() across file systems
  2016-09-23 16:13     ` Darrick J. Wong
@ 2016-09-23 18:52       ` Amir Goldstein
  2016-09-24 15:06         ` Darrick J. Wong
  2016-09-26 16:33         ` Darrick J. Wong
  0 siblings, 2 replies; 28+ messages in thread
From: Amir Goldstein @ 2016-09-23 18:52 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Miklos Szeredi, Dave Chinner, linux-unionfs, Christoph Hellwig,
	linux-xfs, linux-fsdevel

On Fri, Sep 23, 2016 at 7:13 PM, Darrick J. Wong
<darrick.wong@oracle.com> wrote:
> On Fri, Sep 23, 2016 at 10:57:56AM +0300, Amir Goldstein wrote:
>> On Wed, Sep 14, 2016 at 3:43 PM, Amir Goldstein <amir73il@gmail.com> wrote:
>> > copy_file_range syscall returns -EXDEV if src and dest
>> > file are not on the same file system.
>> > The vfs_copy_file_range() helper, however, knows how to copy
>> > across file systems with do_splice_direct().
>> >
>> > Move the enforcement of same file system from the vfs helper
>> > to the syscall code.
>> >
>> > A following patch is going to use the vfs_copy_file_range()
>> > helper in overlayfs to copy up between lower and upper
>> > not on the same file system.
>> >
>> > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
>> > ---
>> >  fs/read_write.c | 16 +++++++++++-----
>> >  1 file changed, 11 insertions(+), 5 deletions(-)
>> >
>> > diff --git a/fs/read_write.c b/fs/read_write.c
>> > index 9dc6e52..6975fe8 100644
>> > --- a/fs/read_write.c
>> > +++ b/fs/read_write.c
>> > @@ -1502,10 +1502,6 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
>> >             (file_out->f_flags & O_APPEND))
>> >                 return -EBADF;
>> >
>> > -       /* this could be relaxed once a method supports cross-fs copies */
>> > -       if (inode_in->i_sb != inode_out->i_sb)
>> > -               return -EXDEV;
>> > -
>> >         if (len == 0)
>> >                 return 0;
>> >
>> > @@ -1514,7 +1510,9 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
>> >                 return ret;
>> >
>> >         ret = -EOPNOTSUPP;
>> > -       if (file_out->f_op->copy_file_range)
>> > +       /* copy_file_range() method does not support cross-fs copies */
>> > +       if (inode_in->i_sb == inode_out->i_sb &&
>> > +           file_out->f_op->copy_file_range)
>> >                 ret = file_out->f_op->copy_file_range(file_in, pos_in, file_out,
>> >                                                       pos_out, len, flags);
>> >         if (ret == -EOPNOTSUPP)
>> > @@ -1569,6 +1567,14 @@ SYSCALL_DEFINE6(copy_file_range, int, fd_in, loff_t __user *, off_in,
>> >                 pos_out = f_out.file->f_pos;
>> >         }
>> >
>> > +       /*
>> > +        * vfs_copy_file_range() can do cross-fs copy, but we want to
>> > +        * fulfill the guaranty to userland that copy_file_range syscall
>> > +        * does not allow cross-fs copy
>> > +        */
>> > +       if (file_inode(f_in.file)->i_sb != file_inode(f_out.file)->i_sb)
>> > +               return -EXDEV;
>>
>> Oops, that was supposed to be goto out;
>> Anyway, I am holding back on the vfs_copy_file_range() patches sub set
>> until I have a reliable test on xfs to fall back from clone to copy range
>
> Ok, attached are two rough patches -- one to add the error injection point
> into the kernel, and a second one to add it to the xfs_io 'inject' command.
> Note that you'll have to format the XFS filesystem with rmapbt=1 since we
> can't otherwise avoid per-AG ENOSPC if rmap is enabled.
>
> The relevant xfstests commands are:
>
> _require_xfs_io_error_injection "ag_resv_critical"
> _scratch_inject_error "ag_resv_critical"
>
> See the xfs/325 test for a rough framework.  I'll work on cleaning up the
> patches and trying to get them into 4.9.
>

Thanks, Darrick, but I'm not sure that's enough. does the framework allow
to inject an error for a specific AG? otherwise, the code will not
fall back from
failing full reflink to partial copy partial reflink.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 3/4] vfs: allow vfs_copy_file_range() across file systems
  2016-09-23 18:52       ` Amir Goldstein
@ 2016-09-24 15:06         ` Darrick J. Wong
  2016-09-26 16:33         ` Darrick J. Wong
  1 sibling, 0 replies; 28+ messages in thread
From: Darrick J. Wong @ 2016-09-24 15:06 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Miklos Szeredi, Dave Chinner, linux-unionfs, Christoph Hellwig,
	linux-xfs, linux-fsdevel

On Fri, Sep 23, 2016 at 09:52:42PM +0300, Amir Goldstein wrote:
> On Fri, Sep 23, 2016 at 7:13 PM, Darrick J. Wong
> <darrick.wong@oracle.com> wrote:
> > On Fri, Sep 23, 2016 at 10:57:56AM +0300, Amir Goldstein wrote:
> >> On Wed, Sep 14, 2016 at 3:43 PM, Amir Goldstein <amir73il@gmail.com> wrote:
> >> > copy_file_range syscall returns -EXDEV if src and dest
> >> > file are not on the same file system.
> >> > The vfs_copy_file_range() helper, however, knows how to copy
> >> > across file systems with do_splice_direct().
> >> >
> >> > Move the enforcement of same file system from the vfs helper
> >> > to the syscall code.
> >> >
> >> > A following patch is going to use the vfs_copy_file_range()
> >> > helper in overlayfs to copy up between lower and upper
> >> > not on the same file system.
> >> >
> >> > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> >> > ---
> >> >  fs/read_write.c | 16 +++++++++++-----
> >> >  1 file changed, 11 insertions(+), 5 deletions(-)
> >> >
> >> > diff --git a/fs/read_write.c b/fs/read_write.c
> >> > index 9dc6e52..6975fe8 100644
> >> > --- a/fs/read_write.c
> >> > +++ b/fs/read_write.c
> >> > @@ -1502,10 +1502,6 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
> >> >             (file_out->f_flags & O_APPEND))
> >> >                 return -EBADF;
> >> >
> >> > -       /* this could be relaxed once a method supports cross-fs copies */
> >> > -       if (inode_in->i_sb != inode_out->i_sb)
> >> > -               return -EXDEV;
> >> > -
> >> >         if (len == 0)
> >> >                 return 0;
> >> >
> >> > @@ -1514,7 +1510,9 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
> >> >                 return ret;
> >> >
> >> >         ret = -EOPNOTSUPP;
> >> > -       if (file_out->f_op->copy_file_range)
> >> > +       /* copy_file_range() method does not support cross-fs copies */
> >> > +       if (inode_in->i_sb == inode_out->i_sb &&
> >> > +           file_out->f_op->copy_file_range)
> >> >                 ret = file_out->f_op->copy_file_range(file_in, pos_in, file_out,
> >> >                                                       pos_out, len, flags);
> >> >         if (ret == -EOPNOTSUPP)
> >> > @@ -1569,6 +1567,14 @@ SYSCALL_DEFINE6(copy_file_range, int, fd_in, loff_t __user *, off_in,
> >> >                 pos_out = f_out.file->f_pos;
> >> >         }
> >> >
> >> > +       /*
> >> > +        * vfs_copy_file_range() can do cross-fs copy, but we want to
> >> > +        * fulfill the guaranty to userland that copy_file_range syscall
> >> > +        * does not allow cross-fs copy
> >> > +        */
> >> > +       if (file_inode(f_in.file)->i_sb != file_inode(f_out.file)->i_sb)
> >> > +               return -EXDEV;
> >>
> >> Oops, that was supposed to be goto out;
> >> Anyway, I am holding back on the vfs_copy_file_range() patches sub set
> >> until I have a reliable test on xfs to fall back from clone to copy range
> >
> > Ok, attached are two rough patches -- one to add the error injection point
> > into the kernel, and a second one to add it to the xfs_io 'inject' command.
> > Note that you'll have to format the XFS filesystem with rmapbt=1 since we
> > can't otherwise avoid per-AG ENOSPC if rmap is enabled.
> >
> > The relevant xfstests commands are:
> >
> > _require_xfs_io_error_injection "ag_resv_critical"
> > _scratch_inject_error "ag_resv_critical"
> >
> > See the xfs/325 test for a rough framework.  I'll work on cleaning up the
> > patches and trying to get them into 4.9.
> >
> 
> Thanks, Darrick, but I'm not sure that's enough. does the framework allow
> to inject an error for a specific AG? otherwise, the code will not
> fall back from
> failing full reflink to partial copy partial reflink.

You could change XFS_RANDOM_AG_RESV_CRITICAL to a non-1 value to make it
fail randomly (instead of all the time) and reflink a sparse file with a
large number of extents.  This...

#define XFS_RANDOM_AG_RESV_CRITICAL 2

...would make it fail half the time.

--D

> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 3/4] vfs: allow vfs_copy_file_range() across file systems
  2016-09-23 18:52       ` Amir Goldstein
  2016-09-24 15:06         ` Darrick J. Wong
@ 2016-09-26 16:33         ` Darrick J. Wong
  2016-09-26 18:12           ` Amir Goldstein
  1 sibling, 1 reply; 28+ messages in thread
From: Darrick J. Wong @ 2016-09-26 16:33 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Miklos Szeredi, Dave Chinner, linux-unionfs, Christoph Hellwig,
	linux-xfs, linux-fsdevel

On Fri, Sep 23, 2016 at 09:52:42PM +0300, Amir Goldstein wrote:
> On Fri, Sep 23, 2016 at 7:13 PM, Darrick J. Wong
> <darrick.wong@oracle.com> wrote:
> > On Fri, Sep 23, 2016 at 10:57:56AM +0300, Amir Goldstein wrote:
> >> On Wed, Sep 14, 2016 at 3:43 PM, Amir Goldstein <amir73il@gmail.com> wrote:
> >> > copy_file_range syscall returns -EXDEV if src and dest
> >> > file are not on the same file system.
> >> > The vfs_copy_file_range() helper, however, knows how to copy
> >> > across file systems with do_splice_direct().
> >> >
> >> > Move the enforcement of same file system from the vfs helper
> >> > to the syscall code.
> >> >
> >> > A following patch is going to use the vfs_copy_file_range()
> >> > helper in overlayfs to copy up between lower and upper
> >> > not on the same file system.
> >> >
> >> > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> >> > ---
> >> >  fs/read_write.c | 16 +++++++++++-----
> >> >  1 file changed, 11 insertions(+), 5 deletions(-)
> >> >
> >> > diff --git a/fs/read_write.c b/fs/read_write.c
> >> > index 9dc6e52..6975fe8 100644
> >> > --- a/fs/read_write.c
> >> > +++ b/fs/read_write.c
> >> > @@ -1502,10 +1502,6 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
> >> >             (file_out->f_flags & O_APPEND))
> >> >                 return -EBADF;
> >> >
> >> > -       /* this could be relaxed once a method supports cross-fs copies */
> >> > -       if (inode_in->i_sb != inode_out->i_sb)
> >> > -               return -EXDEV;
> >> > -
> >> >         if (len == 0)
> >> >                 return 0;
> >> >
> >> > @@ -1514,7 +1510,9 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
> >> >                 return ret;
> >> >
> >> >         ret = -EOPNOTSUPP;
> >> > -       if (file_out->f_op->copy_file_range)
> >> > +       /* copy_file_range() method does not support cross-fs copies */
> >> > +       if (inode_in->i_sb == inode_out->i_sb &&
> >> > +           file_out->f_op->copy_file_range)
> >> >                 ret = file_out->f_op->copy_file_range(file_in, pos_in, file_out,
> >> >                                                       pos_out, len, flags);
> >> >         if (ret == -EOPNOTSUPP)
> >> > @@ -1569,6 +1567,14 @@ SYSCALL_DEFINE6(copy_file_range, int, fd_in, loff_t __user *, off_in,
> >> >                 pos_out = f_out.file->f_pos;
> >> >         }
> >> >
> >> > +       /*
> >> > +        * vfs_copy_file_range() can do cross-fs copy, but we want to
> >> > +        * fulfill the guaranty to userland that copy_file_range syscall
> >> > +        * does not allow cross-fs copy
> >> > +        */
> >> > +       if (file_inode(f_in.file)->i_sb != file_inode(f_out.file)->i_sb)
> >> > +               return -EXDEV;
> >>
> >> Oops, that was supposed to be goto out;
> >> Anyway, I am holding back on the vfs_copy_file_range() patches sub set
> >> until I have a reliable test on xfs to fall back from clone to copy range
> >
> > Ok, attached are two rough patches -- one to add the error injection point
> > into the kernel, and a second one to add it to the xfs_io 'inject' command.
> > Note that you'll have to format the XFS filesystem with rmapbt=1 since we
> > can't otherwise avoid per-AG ENOSPC if rmap is enabled.
> >
> > The relevant xfstests commands are:
> >
> > _require_xfs_io_error_injection "ag_resv_critical"
> > _scratch_inject_error "ag_resv_critical"
> >
> > See the xfs/325 test for a rough framework.  I'll work on cleaning up the
> > patches and trying to get them into 4.9.
> >
> 
> Thanks, Darrick, but I'm not sure that's enough. does the framework allow
> to inject an error for a specific AG? otherwise, the code will not
> fall back from
> failing full reflink to partial copy partial reflink.

The error injector (as far as I know) cannot inject errors only for a
specific AG.  However, since your goal is to have the reflink partially
succeed, I could set up the injector to fail only a fraction of the
time.  The difficulty here is that what we /probably/ need is to have it
ENOSPC after N extents, where 0 < N <= nr_file_extents.  Tricky since
the probabilistic nature means that it could inject during the first
XFS_TEST_ERROR call.

Or change the function such that the error injector function only gets
called if the file offset > 0.  Then you can prepare a specially crafted
file such that you'll always get at least a partial reflink before
ENOSPC.  Note that the reflink functions won't return where an error
happened, so you'll end up recopying the entire range regardless.

--D

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 3/4] vfs: allow vfs_copy_file_range() across file systems
  2016-09-26 16:33         ` Darrick J. Wong
@ 2016-09-26 18:12           ` Amir Goldstein
  2016-09-26 18:16             ` Darrick J. Wong
  0 siblings, 1 reply; 28+ messages in thread
From: Amir Goldstein @ 2016-09-26 18:12 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Miklos Szeredi, Dave Chinner, linux-unionfs, Christoph Hellwig,
	linux-xfs, linux-fsdevel

On Mon, Sep 26, 2016 at 7:33 PM, Darrick J. Wong
<darrick.wong@oracle.com> wrote:
> On Fri, Sep 23, 2016 at 09:52:42PM +0300, Amir Goldstein wrote:
>> On Fri, Sep 23, 2016 at 7:13 PM, Darrick J. Wong
>> <darrick.wong@oracle.com> wrote:
>> > On Fri, Sep 23, 2016 at 10:57:56AM +0300, Amir Goldstein wrote:
>> >> On Wed, Sep 14, 2016 at 3:43 PM, Amir Goldstein <amir73il@gmail.com> wrote:
>> >> > copy_file_range syscall returns -EXDEV if src and dest
>> >> > file are not on the same file system.
>> >> > The vfs_copy_file_range() helper, however, knows how to copy
>> >> > across file systems with do_splice_direct().
>> >> >
>> >> > Move the enforcement of same file system from the vfs helper
>> >> > to the syscall code.
>> >> >
>> >> > A following patch is going to use the vfs_copy_file_range()
>> >> > helper in overlayfs to copy up between lower and upper
>> >> > not on the same file system.
>> >> >
>> >> > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
>> >> > ---
>> >> >  fs/read_write.c | 16 +++++++++++-----
>> >> >  1 file changed, 11 insertions(+), 5 deletions(-)
>> >> >
>> >> > diff --git a/fs/read_write.c b/fs/read_write.c
>> >> > index 9dc6e52..6975fe8 100644
>> >> > --- a/fs/read_write.c
>> >> > +++ b/fs/read_write.c
>> >> > @@ -1502,10 +1502,6 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
>> >> >             (file_out->f_flags & O_APPEND))
>> >> >                 return -EBADF;
>> >> >
>> >> > -       /* this could be relaxed once a method supports cross-fs copies */
>> >> > -       if (inode_in->i_sb != inode_out->i_sb)
>> >> > -               return -EXDEV;
>> >> > -
>> >> >         if (len == 0)
>> >> >                 return 0;
>> >> >
>> >> > @@ -1514,7 +1510,9 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
>> >> >                 return ret;
>> >> >
>> >> >         ret = -EOPNOTSUPP;
>> >> > -       if (file_out->f_op->copy_file_range)
>> >> > +       /* copy_file_range() method does not support cross-fs copies */
>> >> > +       if (inode_in->i_sb == inode_out->i_sb &&
>> >> > +           file_out->f_op->copy_file_range)
>> >> >                 ret = file_out->f_op->copy_file_range(file_in, pos_in, file_out,
>> >> >                                                       pos_out, len, flags);
>> >> >         if (ret == -EOPNOTSUPP)
>> >> > @@ -1569,6 +1567,14 @@ SYSCALL_DEFINE6(copy_file_range, int, fd_in, loff_t __user *, off_in,
>> >> >                 pos_out = f_out.file->f_pos;
>> >> >         }
>> >> >
>> >> > +       /*
>> >> > +        * vfs_copy_file_range() can do cross-fs copy, but we want to
>> >> > +        * fulfill the guaranty to userland that copy_file_range syscall
>> >> > +        * does not allow cross-fs copy
>> >> > +        */
>> >> > +       if (file_inode(f_in.file)->i_sb != file_inode(f_out.file)->i_sb)
>> >> > +               return -EXDEV;
>> >>
>> >> Oops, that was supposed to be goto out;
>> >> Anyway, I am holding back on the vfs_copy_file_range() patches sub set
>> >> until I have a reliable test on xfs to fall back from clone to copy range
>> >
>> > Ok, attached are two rough patches -- one to add the error injection point
>> > into the kernel, and a second one to add it to the xfs_io 'inject' command.
>> > Note that you'll have to format the XFS filesystem with rmapbt=1 since we
>> > can't otherwise avoid per-AG ENOSPC if rmap is enabled.
>> >
>> > The relevant xfstests commands are:
>> >
>> > _require_xfs_io_error_injection "ag_resv_critical"
>> > _scratch_inject_error "ag_resv_critical"
>> >
>> > See the xfs/325 test for a rough framework.  I'll work on cleaning up the
>> > patches and trying to get them into 4.9.
>> >
>>
>> Thanks, Darrick, but I'm not sure that's enough. does the framework allow
>> to inject an error for a specific AG? otherwise, the code will not
>> fall back from
>> failing full reflink to partial copy partial reflink.
>
> The error injector (as far as I know) cannot inject errors only for a
> specific AG.  However, since your goal is to have the reflink partially
> succeed, I could set up the injector to fail only a fraction of the
> time.  The difficulty here is that what we /probably/ need is to have it
> ENOSPC after N extents, where 0 < N <= nr_file_extents.  Tricky since
> the probabilistic nature means that it could inject during the first
> XFS_TEST_ERROR call.
>
> Or change the function such that the error injector function only gets
> called if the file offset > 0.  Then you can prepare a specially crafted
> file such that you'll always get at least a partial reflink before
> ENOSPC.  Note that the reflink functions won't return where an error
> happened, so you'll end up recopying the entire range regardless.
>

All right. How about the realtime AG. Does it support reflink?
Can a file have extents both in realtime AG and other AGs?

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 3/4] vfs: allow vfs_copy_file_range() across file systems
  2016-09-26 18:12           ` Amir Goldstein
@ 2016-09-26 18:16             ` Darrick J. Wong
  0 siblings, 0 replies; 28+ messages in thread
From: Darrick J. Wong @ 2016-09-26 18:16 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Miklos Szeredi, Dave Chinner, linux-unionfs, Christoph Hellwig,
	linux-xfs, linux-fsdevel

On Mon, Sep 26, 2016 at 09:12:13PM +0300, Amir Goldstein wrote:
> On Mon, Sep 26, 2016 at 7:33 PM, Darrick J. Wong
> <darrick.wong@oracle.com> wrote:
> > On Fri, Sep 23, 2016 at 09:52:42PM +0300, Amir Goldstein wrote:
> >> On Fri, Sep 23, 2016 at 7:13 PM, Darrick J. Wong
> >> <darrick.wong@oracle.com> wrote:
> >> > On Fri, Sep 23, 2016 at 10:57:56AM +0300, Amir Goldstein wrote:
> >> >> On Wed, Sep 14, 2016 at 3:43 PM, Amir Goldstein <amir73il@gmail.com> wrote:
> >> >> > copy_file_range syscall returns -EXDEV if src and dest
> >> >> > file are not on the same file system.
> >> >> > The vfs_copy_file_range() helper, however, knows how to copy
> >> >> > across file systems with do_splice_direct().
> >> >> >
> >> >> > Move the enforcement of same file system from the vfs helper
> >> >> > to the syscall code.
> >> >> >
> >> >> > A following patch is going to use the vfs_copy_file_range()
> >> >> > helper in overlayfs to copy up between lower and upper
> >> >> > not on the same file system.
> >> >> >
> >> >> > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> >> >> > ---
> >> >> >  fs/read_write.c | 16 +++++++++++-----
> >> >> >  1 file changed, 11 insertions(+), 5 deletions(-)
> >> >> >
> >> >> > diff --git a/fs/read_write.c b/fs/read_write.c
> >> >> > index 9dc6e52..6975fe8 100644
> >> >> > --- a/fs/read_write.c
> >> >> > +++ b/fs/read_write.c
> >> >> > @@ -1502,10 +1502,6 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
> >> >> >             (file_out->f_flags & O_APPEND))
> >> >> >                 return -EBADF;
> >> >> >
> >> >> > -       /* this could be relaxed once a method supports cross-fs copies */
> >> >> > -       if (inode_in->i_sb != inode_out->i_sb)
> >> >> > -               return -EXDEV;
> >> >> > -
> >> >> >         if (len == 0)
> >> >> >                 return 0;
> >> >> >
> >> >> > @@ -1514,7 +1510,9 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
> >> >> >                 return ret;
> >> >> >
> >> >> >         ret = -EOPNOTSUPP;
> >> >> > -       if (file_out->f_op->copy_file_range)
> >> >> > +       /* copy_file_range() method does not support cross-fs copies */
> >> >> > +       if (inode_in->i_sb == inode_out->i_sb &&
> >> >> > +           file_out->f_op->copy_file_range)
> >> >> >                 ret = file_out->f_op->copy_file_range(file_in, pos_in, file_out,
> >> >> >                                                       pos_out, len, flags);
> >> >> >         if (ret == -EOPNOTSUPP)
> >> >> > @@ -1569,6 +1567,14 @@ SYSCALL_DEFINE6(copy_file_range, int, fd_in, loff_t __user *, off_in,
> >> >> >                 pos_out = f_out.file->f_pos;
> >> >> >         }
> >> >> >
> >> >> > +       /*
> >> >> > +        * vfs_copy_file_range() can do cross-fs copy, but we want to
> >> >> > +        * fulfill the guaranty to userland that copy_file_range syscall
> >> >> > +        * does not allow cross-fs copy
> >> >> > +        */
> >> >> > +       if (file_inode(f_in.file)->i_sb != file_inode(f_out.file)->i_sb)
> >> >> > +               return -EXDEV;
> >> >>
> >> >> Oops, that was supposed to be goto out;
> >> >> Anyway, I am holding back on the vfs_copy_file_range() patches sub set
> >> >> until I have a reliable test on xfs to fall back from clone to copy range
> >> >
> >> > Ok, attached are two rough patches -- one to add the error injection point
> >> > into the kernel, and a second one to add it to the xfs_io 'inject' command.
> >> > Note that you'll have to format the XFS filesystem with rmapbt=1 since we
> >> > can't otherwise avoid per-AG ENOSPC if rmap is enabled.
> >> >
> >> > The relevant xfstests commands are:
> >> >
> >> > _require_xfs_io_error_injection "ag_resv_critical"
> >> > _scratch_inject_error "ag_resv_critical"
> >> >
> >> > See the xfs/325 test for a rough framework.  I'll work on cleaning up the
> >> > patches and trying to get them into 4.9.
> >> >
> >>
> >> Thanks, Darrick, but I'm not sure that's enough. does the framework allow
> >> to inject an error for a specific AG? otherwise, the code will not
> >> fall back from
> >> failing full reflink to partial copy partial reflink.
> >
> > The error injector (as far as I know) cannot inject errors only for a
> > specific AG.  However, since your goal is to have the reflink partially
> > succeed, I could set up the injector to fail only a fraction of the
> > time.  The difficulty here is that what we /probably/ need is to have it
> > ENOSPC after N extents, where 0 < N <= nr_file_extents.  Tricky since
> > the probabilistic nature means that it could inject during the first
> > XFS_TEST_ERROR call.
> >
> > Or change the function such that the error injector function only gets
> > called if the file offset > 0.  Then you can prepare a specially crafted
> > file such that you'll always get at least a partial reflink before
> > ENOSPC.  Note that the reflink functions won't return where an error
> > happened, so you'll end up recopying the entire range regardless.
> >
> 
> All right. How about the realtime AG. Does it support reflink?

Not at this time, and probably never.

> Can a file have extents both in realtime AG and other AGs?

Yes -- data extents on the rt device and extended attr extents on the
regular data device.  The only thing you can reflink are file data
extents on the data device.  Everything else (attr blocks, directories,
metadata) cannot be shared, just like in pre-reflink XFS.

--D

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 2/4] ovl: use vfs_clone_file_range() for copy up if possible
  2016-09-21 18:29       ` Miklos Szeredi
@ 2016-09-29  9:00         ` Amir Goldstein
  2016-09-30 11:14           ` Miklos Szeredi
  0 siblings, 1 reply; 28+ messages in thread
From: Amir Goldstein @ 2016-09-29  9:00 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Dave Chinner, linux-unionfs, Christoph Hellwig, linux-xfs,
	Darrick J . Wong, linux-fsdevel

On Wed, Sep 21, 2016 at 9:29 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
> On Wed, Sep 21, 2016 at 7:01 PM, Amir Goldstein <amir73il@gmail.com> wrote:
>> On Wed, Sep 21, 2016 at 6:09 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>>> On Wed, Sep 14, 2016 at 2:43 PM, Amir Goldstein <amir73il@gmail.com> wrote:
>>>> When copying up within the same fs, try to use vfs_clone_file_range().
>>>> This is very efficient when lower and upper are on the same fs
>>>> with file reflink support. If vfs_clone_file_range() fails because
>>>> lower and upper are not on the same fs or if fs has no reflink support,
>>>> copy up falls back to the regular data copy code.
>>>>
>>>> Tested correct behavior when lower and upper are on:
>>>> 1. same ext4 (copy)
>>>> 2. same xfs + reflink patches + mkfs.xfs (copy)
>>>> 3. same xfs + reflink patches + mkfs.xfs -m reflink=1 (reflink)
>>>> 4. different xfs + reflink patches + mkfs.xfs -m reflink=1 (copy)
>>>>
>>>> For comparison, on my laptop, xfstest overlay/001 (copy up of large
>>>> sparse files) takes less than 1 second in the xfs reflink setup vs.
>>>> 25 seconds on the rest of the setups.
>>>>
>>>> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
>>>> ---
>>>>  fs/overlayfs/copy_up.c | 12 +++++++++++-
>>>>  1 file changed, 11 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
>>>> index 43fdc27..ba039f8 100644
>>>> --- a/fs/overlayfs/copy_up.c
>>>> +++ b/fs/overlayfs/copy_up.c
>>>> @@ -136,6 +136,16 @@ static int ovl_copy_up_data(struct path *old, struct path *new, loff_t len)
>>>>                 goto out_fput;
>>>>         }
>>>>
>>>> +       /* Try to use clone_file_range to clone up within the same fs */
>>>> +       error = vfs_clone_file_range(old_file, 0, new_file, 0, len);
>>>> +       if (!error)
>>>> +               goto out;
>>>> +       /* If we can clone but clone failed - abort */
>>>> +       if (error != -EXDEV && error != -EOPNOTSUPP)
>>>> +               goto out;
>>>
>>> Would be safer to fall back on any error.
>>>
>>
>> Agreed. Dave, since you suggested the 'softer' fall back, do you have
>> any objections?
>>
>>> Otherwise ACK.
>>>
>>
>> Will you be taking this to your tree?
>
> Sure I can take it.
>

Miklos,

Will you be able to queue v4 patches for 4.9?

Thanks,
Amir.

>>
>> Please note that this patch depends on patch v3 1/4,
>> because vfs_clone_file_range() in current mainline
>> fails to clone from lower to upper due to upper and lower being
>> private mount clones
>> and therefore not the same f_path.mnt.
>
> Right.  I didn't do a thorough audit of ->clone_file_range()
> implementations, but 1/4 is probably OK.
>
> Thanks,
> Miklos

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 2/4] ovl: use vfs_clone_file_range() for copy up if possible
  2016-09-29  9:00         ` Amir Goldstein
@ 2016-09-30 11:14           ` Miklos Szeredi
  0 siblings, 0 replies; 28+ messages in thread
From: Miklos Szeredi @ 2016-09-30 11:14 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Dave Chinner, linux-unionfs, Christoph Hellwig, linux-xfs,
	Darrick J . Wong, linux-fsdevel

On Thu, Sep 29, 2016 at 11:00 AM, Amir Goldstein <amir73il@gmail.com> wrote:

> Will you be able to queue v4 patches for 4.9?

Queued to

 git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs.git #overlayfs-next

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2016-09-30 11:14 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-09-14 12:43 [PATCH v3 0/4] ovl: efficient copy up by reflink Amir Goldstein
2016-09-14 12:43 ` [PATCH v3 1/4] vfs: allow vfs_clone_file_range() across mount points Amir Goldstein
2016-09-14 12:43 ` [PATCH v3 2/4] ovl: use vfs_clone_file_range() for copy up if possible Amir Goldstein
2016-09-21 15:09   ` Miklos Szeredi
2016-09-21 17:01     ` Amir Goldstein
2016-09-21 18:29       ` Miklos Szeredi
2016-09-29  9:00         ` Amir Goldstein
2016-09-30 11:14           ` Miklos Szeredi
2016-09-21 21:48       ` Dave Chinner
2016-09-21 21:57         ` Al Viro
2016-09-21 22:33           ` Dave Chinner
2016-09-22  2:25             ` Darrick J. Wong
2016-09-22  2:52               ` Amir Goldstein
2016-09-14 12:43 ` [PATCH v3 3/4] vfs: allow vfs_copy_file_range() across file systems Amir Goldstein
2016-09-23  7:57   ` Amir Goldstein
2016-09-23 15:19     ` Darrick J. Wong
2016-09-23 16:13     ` Darrick J. Wong
2016-09-23 18:52       ` Amir Goldstein
2016-09-24 15:06         ` Darrick J. Wong
2016-09-26 16:33         ` Darrick J. Wong
2016-09-26 18:12           ` Amir Goldstein
2016-09-26 18:16             ` Darrick J. Wong
2016-09-14 12:43 ` [PATCH v3 4/4] ovl: use vfs_copy_file_range() to copy up file data Amir Goldstein
2016-09-22  8:49   ` Amir Goldstein
2016-09-22 14:49     ` Miklos Szeredi
2016-09-22 15:44       ` Amir Goldstein
2016-09-22 17:21         ` Amir Goldstein
2016-09-19 18:55 ` [PATCH v3 0/4] ovl: efficient copy up by reflink Amir Goldstein

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.