All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] ovl: use copy_file_range for copy up if possible
@ 2016-09-08 15:29 Amir Goldstein
  2016-09-08 20:25 ` Dave Chinner
  2016-09-12 15:06 ` [PATCH v2 0/4] ovl: efficient copy up by reflink Amir Goldstein
  0 siblings, 2 replies; 24+ messages in thread
From: Amir Goldstein @ 2016-09-08 15:29 UTC (permalink / raw)
  To: Miklos Szeredi, Darrick J . Wong
  Cc: linux-unionfs, linux-xfs, linux-fsdevel, Amir Goldstein

When copying up within the same fs, try to use f_op->copy_file_range().
This becomes very efficient when lower and upper are on the same fs
with file reflink support.

Tested correct behavior when lower and upper are on:
1. same ext4 (copy)
2. same xfs + reflink patches + mkfs.xfs (copy)
3. same xfs + reflink patches + mkfs.xfs -m reflink=1 (clone)
4. different xfs + reflink patches + mkfs.xfs -m reflink=1 (copy)

Verified that all the overlay xfstests pass in the 'same xfs+reflink'
setup.

For comparison, on my laptop, xfstest overlay/001 (copy up of large
sparse files) takes less than 1 second in the xfs reflink setup vs.
25 seconds on the rest of the setups.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/copy_up.c | 28 +++++++++++++++++++++++++---
 1 file changed, 25 insertions(+), 3 deletions(-)

diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
index 43fdc27..400567b 100644
--- a/fs/overlayfs/copy_up.c
+++ b/fs/overlayfs/copy_up.c
@@ -121,6 +121,7 @@ static int ovl_copy_up_data(struct path *old, struct path *new, loff_t len)
 	struct file *new_file;
 	loff_t old_pos = 0;
 	loff_t new_pos = 0;
+	int try_copy_file = 0;
 	int error = 0;
 
 	if (len == 0)
@@ -136,6 +137,13 @@ static int ovl_copy_up_data(struct path *old, struct path *new, loff_t len)
 		goto out_fput;
 	}
 
+	/*
+	 * When copying up within the same fs, try to use fs's copy_file_range
+	 */
+	if (file_inode(old_file)->i_sb == file_inode(new_file)->i_sb) {
+		try_copy_file = (new_file->f_op->copy_file_range != NULL);
+	}
+
 	/* FIXME: copy up sparse files efficiently */
 	while (len) {
 		size_t this_len = OVL_COPY_UP_CHUNK_SIZE;
@@ -149,9 +157,23 @@ static int ovl_copy_up_data(struct path *old, struct path *new, loff_t len)
 			break;
 		}
 
-		bytes = do_splice_direct(old_file, &old_pos,
-					 new_file, &new_pos,
-					 this_len, SPLICE_F_MOVE);
+		if (try_copy_file) {
+			bytes = new_file->f_op->copy_file_range(
+					old_file, old_pos,
+					new_file, new_pos,
+					len, 0);
+			if (bytes == -EOPNOTSUPP) {
+				try_copy_file = 0;
+				continue;
+			} else if (bytes > 0) {
+				old_pos += bytes;
+				new_pos += bytes;
+			}
+		} else {
+			bytes = do_splice_direct(old_file, &old_pos,
+					new_file, &new_pos,
+					this_len, SPLICE_F_MOVE);
+		}
 		if (bytes <= 0) {
 			error = bytes;
 			break;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH] ovl: use copy_file_range for copy up if possible
  2016-09-08 15:29 [PATCH] ovl: use copy_file_range for copy up if possible Amir Goldstein
@ 2016-09-08 20:25 ` Dave Chinner
  2016-09-09  7:31   ` Amir Goldstein
  2016-09-12 15:06 ` [PATCH v2 0/4] ovl: efficient copy up by reflink Amir Goldstein
  1 sibling, 1 reply; 24+ messages in thread
From: Dave Chinner @ 2016-09-08 20:25 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Miklos Szeredi, Darrick J . Wong, linux-unionfs, linux-xfs,
	linux-fsdevel

On Thu, Sep 08, 2016 at 06:29:54PM +0300, Amir Goldstein wrote:
> When copying up within the same fs, try to use f_op->copy_file_range().
> This becomes very efficient when lower and upper are on the same fs
> with file reflink support.
> 
> Tested correct behavior when lower and upper are on:
> 1. same ext4 (copy)
> 2. same xfs + reflink patches + mkfs.xfs (copy)
> 3. same xfs + reflink patches + mkfs.xfs -m reflink=1 (clone)
> 4. different xfs + reflink patches + mkfs.xfs -m reflink=1 (copy)
> 
> Verified that all the overlay xfstests pass in the 'same xfs+reflink'
> setup.
> 
> For comparison, on my laptop, xfstest overlay/001 (copy up of large
> sparse files) takes less than 1 second in the xfs reflink setup vs.
> 25 seconds on the rest of the setups.
> 
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> ---
>  fs/overlayfs/copy_up.c | 28 +++++++++++++++++++++++++---
>  1 file changed, 25 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
> index 43fdc27..400567b 100644
> --- a/fs/overlayfs/copy_up.c
> +++ b/fs/overlayfs/copy_up.c
> @@ -121,6 +121,7 @@ static int ovl_copy_up_data(struct path *old, struct path *new, loff_t len)
>  	struct file *new_file;
>  	loff_t old_pos = 0;
>  	loff_t new_pos = 0;
> +	int try_copy_file = 0;
>  	int error = 0;
>  
>  	if (len == 0)
> @@ -136,6 +137,13 @@ static int ovl_copy_up_data(struct path *old, struct path *new, loff_t len)
>  		goto out_fput;
>  	}
>  
> +	/*
> +	 * When copying up within the same fs, try to use fs's copy_file_range
> +	 */
> +	if (file_inode(old_file)->i_sb == file_inode(new_file)->i_sb) {
> +		try_copy_file = (new_file->f_op->copy_file_range != NULL);
> +	}

You don't need this. .copy_file_range() should return -EXDEV when
you try to use it to copy files across different mount points or
superblocks.

i.e. you should probably be calling vfs_copy_file_range() here to do
the copy up, and if that fails (for whatever reason) then fall back
to the existing data copying code.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] ovl: use copy_file_range for copy up if possible
  2016-09-08 20:25 ` Dave Chinner
@ 2016-09-09  7:31   ` Amir Goldstein
  2016-09-09  7:54     ` Dave Chinner
  0 siblings, 1 reply; 24+ messages in thread
From: Amir Goldstein @ 2016-09-09  7:31 UTC (permalink / raw)
  To: Dave Chinner, Miklos Szeredi
  Cc: Darrick J . Wong, linux-unionfs, linux-xfs, linux-fsdevel

On Thu, Sep 8, 2016 at 11:25 PM, Dave Chinner <david@fromorbit.com> wrote:
> On Thu, Sep 08, 2016 at 06:29:54PM +0300, Amir Goldstein wrote:
>> When copying up within the same fs, try to use f_op->copy_file_range().
>> This becomes very efficient when lower and upper are on the same fs
>> with file reflink support.
>>
>> Tested correct behavior when lower and upper are on:
>> 1. same ext4 (copy)
>> 2. same xfs + reflink patches + mkfs.xfs (copy)
>> 3. same xfs + reflink patches + mkfs.xfs -m reflink=1 (clone)
>> 4. different xfs + reflink patches + mkfs.xfs -m reflink=1 (copy)
>>
>> Verified that all the overlay xfstests pass in the 'same xfs+reflink'
>> setup.
>>
>> For comparison, on my laptop, xfstest overlay/001 (copy up of large
>> sparse files) takes less than 1 second in the xfs reflink setup vs.
>> 25 seconds on the rest of the setups.
>>
>> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
>> ---
>>  fs/overlayfs/copy_up.c | 28 +++++++++++++++++++++++++---
>>  1 file changed, 25 insertions(+), 3 deletions(-)
>>
>> diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
>> index 43fdc27..400567b 100644
>> --- a/fs/overlayfs/copy_up.c
>> +++ b/fs/overlayfs/copy_up.c
>> @@ -121,6 +121,7 @@ static int ovl_copy_up_data(struct path *old, struct path *new, loff_t len)
>>       struct file *new_file;
>>       loff_t old_pos = 0;
>>       loff_t new_pos = 0;
>> +     int try_copy_file = 0;
>>       int error = 0;
>>
>>       if (len == 0)
>> @@ -136,6 +137,13 @@ static int ovl_copy_up_data(struct path *old, struct path *new, loff_t len)
>>               goto out_fput;
>>       }
>>
>> +     /*
>> +      * When copying up within the same fs, try to use fs's copy_file_range
>> +      */
>> +     if (file_inode(old_file)->i_sb == file_inode(new_file)->i_sb) {
>> +             try_copy_file = (new_file->f_op->copy_file_range != NULL);
>> +     }
>
> You don't need this. .copy_file_range() should return -EXDEV when
> you try to use it to copy files across different mount points or
> superblocks.
>

Right.

> i.e. you should probably be calling vfs_copy_file_range() here to do
> the copy up, and if that fails (for whatever reason) then fall back
> to the existing data copying code.
>

Yes, I considered that. With this V0 patch, copy_file_range() is
called inside the copy data 'killable loop'
but, unlike the slower splice, it tries to copy the entire remaining
len on every cycle and will most likely get all or nothing without
causing any major stalls.
So my options for V1 are:
1. use the existing loop only fallback to splice on any
copy_file_range() failure.
2. add another (non killable?) loop before the splice killable loop to
try and copy up as much data with copy_file_range()
3. implement ovl_copy_up_file_range() and do the fallback near the
call site of ovl_copy_up_data()

Miklos, so you have any preference?

Cheers,
Amir.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] ovl: use copy_file_range for copy up if possible
  2016-09-09  7:31   ` Amir Goldstein
@ 2016-09-09  7:54     ` Dave Chinner
  2016-09-09  8:27       ` Amir Goldstein
  0 siblings, 1 reply; 24+ messages in thread
From: Dave Chinner @ 2016-09-09  7:54 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Miklos Szeredi, Darrick J . Wong, linux-unionfs, linux-xfs,
	linux-fsdevel

On Fri, Sep 09, 2016 at 10:31:02AM +0300, Amir Goldstein wrote:
> On Thu, Sep 8, 2016 at 11:25 PM, Dave Chinner <david@fromorbit.com> wrote:
> > On Thu, Sep 08, 2016 at 06:29:54PM +0300, Amir Goldstein wrote:
> >> When copying up within the same fs, try to use f_op->copy_file_range().
> >> This becomes very efficient when lower and upper are on the same fs
> >> with file reflink support.
> >>
> >> Tested correct behavior when lower and upper are on:
> >> 1. same ext4 (copy)
> >> 2. same xfs + reflink patches + mkfs.xfs (copy)
> >> 3. same xfs + reflink patches + mkfs.xfs -m reflink=1 (clone)
> >> 4. different xfs + reflink patches + mkfs.xfs -m reflink=1 (copy)
> >>
> >> Verified that all the overlay xfstests pass in the 'same xfs+reflink'
> >> setup.
> >>
> >> For comparison, on my laptop, xfstest overlay/001 (copy up of large
> >> sparse files) takes less than 1 second in the xfs reflink setup vs.
> >> 25 seconds on the rest of the setups.
> >>
> >> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> >> ---
> >>  fs/overlayfs/copy_up.c | 28 +++++++++++++++++++++++++---
> >>  1 file changed, 25 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
> >> index 43fdc27..400567b 100644
> >> --- a/fs/overlayfs/copy_up.c
> >> +++ b/fs/overlayfs/copy_up.c
> >> @@ -121,6 +121,7 @@ static int ovl_copy_up_data(struct path *old, struct path *new, loff_t len)
> >>       struct file *new_file;
> >>       loff_t old_pos = 0;
> >>       loff_t new_pos = 0;
> >> +     int try_copy_file = 0;
> >>       int error = 0;
> >>
> >>       if (len == 0)
> >> @@ -136,6 +137,13 @@ static int ovl_copy_up_data(struct path *old, struct path *new, loff_t len)
> >>               goto out_fput;
> >>       }
> >>
> >> +     /*
> >> +      * When copying up within the same fs, try to use fs's copy_file_range
> >> +      */
> >> +     if (file_inode(old_file)->i_sb == file_inode(new_file)->i_sb) {
> >> +             try_copy_file = (new_file->f_op->copy_file_range != NULL);
> >> +     }
> >
> > You don't need this. .copy_file_range() should return -EXDEV when
> > you try to use it to copy files across different mount points or
> > superblocks.
> >
> 
> Right.
> 
> > i.e. you should probably be calling vfs_copy_file_range() here to do
> > the copy up, and if that fails (for whatever reason) then fall back
> > to the existing data copying code.
> >
> 
> Yes, I considered that. With this V0 patch, copy_file_range() is
> called inside the copy data 'killable loop'
> but, unlike the slower splice, it tries to copy the entire remaining
> len on every cycle and will most likely get all or nothing without
> causing any major stalls.
> So my options for V1 are:
> 1. use the existing loop only fallback to splice on any
> copy_file_range() failure.
> 2. add another (non killable?) loop before the splice killable loop to
> try and copy up as much data with copy_file_range()
> 3. implement ovl_copy_up_file_range() and do the fallback near the
> call site of ovl_copy_up_data()

vfs_copy_file_range() already has a fallback to call
do_splice_direct() itself if ->copy_file_range() is not supported.
i.e. it will behave identically to the existing code if
copy_file_range is not supported by the underlying fs.

If copy_file_range() fails, then it's for a reason that will cause
do_splice_direct() to fail as well.

vfs_copy_file_range() should really be a direct replacement for any
code that calls do_splice_direct(). If it's not, we should make it
so (e.g call do_splice direct for cross-fs copies automatically
rather than returning EXDEV) and then replace all the calls in the
kernel to do_splice_direct() with vfs_copy_file_range()....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] ovl: use copy_file_range for copy up if possible
  2016-09-09  7:54     ` Dave Chinner
@ 2016-09-09  8:27       ` Amir Goldstein
  2016-09-09 23:52         ` Dave Chinner
  0 siblings, 1 reply; 24+ messages in thread
From: Amir Goldstein @ 2016-09-09  8:27 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Miklos Szeredi, Darrick J . Wong, linux-unionfs, linux-xfs,
	linux-fsdevel

On Fri, Sep 9, 2016 at 10:54 AM, Dave Chinner <david@fromorbit.com> wrote:
> On Fri, Sep 09, 2016 at 10:31:02AM +0300, Amir Goldstein wrote:
>> On Thu, Sep 8, 2016 at 11:25 PM, Dave Chinner <david@fromorbit.com> wrote:
>> > On Thu, Sep 08, 2016 at 06:29:54PM +0300, Amir Goldstein wrote:
>> >> When copying up within the same fs, try to use f_op->copy_file_range().
>> >> This becomes very efficient when lower and upper are on the same fs
>> >> with file reflink support.
>> >>
>> >> Tested correct behavior when lower and upper are on:
>> >> 1. same ext4 (copy)
>> >> 2. same xfs + reflink patches + mkfs.xfs (copy)
>> >> 3. same xfs + reflink patches + mkfs.xfs -m reflink=1 (clone)
>> >> 4. different xfs + reflink patches + mkfs.xfs -m reflink=1 (copy)
>> >>
>> >> Verified that all the overlay xfstests pass in the 'same xfs+reflink'
>> >> setup.
>> >>
>> >> For comparison, on my laptop, xfstest overlay/001 (copy up of large
>> >> sparse files) takes less than 1 second in the xfs reflink setup vs.
>> >> 25 seconds on the rest of the setups.
>> >>
>> >> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
>> >> ---
>> >>  fs/overlayfs/copy_up.c | 28 +++++++++++++++++++++++++---
>> >>  1 file changed, 25 insertions(+), 3 deletions(-)
>> >>
>> >> diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
>> >> index 43fdc27..400567b 100644
>> >> --- a/fs/overlayfs/copy_up.c
>> >> +++ b/fs/overlayfs/copy_up.c
>> >> @@ -121,6 +121,7 @@ static int ovl_copy_up_data(struct path *old, struct path *new, loff_t len)
>> >>       struct file *new_file;
>> >>       loff_t old_pos = 0;
>> >>       loff_t new_pos = 0;
>> >> +     int try_copy_file = 0;
>> >>       int error = 0;
>> >>
>> >>       if (len == 0)
>> >> @@ -136,6 +137,13 @@ static int ovl_copy_up_data(struct path *old, struct path *new, loff_t len)
>> >>               goto out_fput;
>> >>       }
>> >>
>> >> +     /*
>> >> +      * When copying up within the same fs, try to use fs's copy_file_range
>> >> +      */
>> >> +     if (file_inode(old_file)->i_sb == file_inode(new_file)->i_sb) {
>> >> +             try_copy_file = (new_file->f_op->copy_file_range != NULL);
>> >> +     }
>> >
>> > You don't need this. .copy_file_range() should return -EXDEV when
>> > you try to use it to copy files across different mount points or
>> > superblocks.
>> >
>>
>> Right.
>>
>> > i.e. you should probably be calling vfs_copy_file_range() here to do
>> > the copy up, and if that fails (for whatever reason) then fall back
>> > to the existing data copying code.
>> >
>>
>> Yes, I considered that. With this V0 patch, copy_file_range() is
>> called inside the copy data 'killable loop'
>> but, unlike the slower splice, it tries to copy the entire remaining
>> len on every cycle and will most likely get all or nothing without
>> causing any major stalls.
>> So my options for V1 are:
>> 1. use the existing loop only fallback to splice on any
>> copy_file_range() failure.
>> 2. add another (non killable?) loop before the splice killable loop to
>> try and copy up as much data with copy_file_range()
>> 3. implement ovl_copy_up_file_range() and do the fallback near the
>> call site of ovl_copy_up_data()
>
> vfs_copy_file_range() already has a fallback to call
> do_splice_direct() itself if ->copy_file_range() is not supported.
> i.e. it will behave identically to the existing code if
> copy_file_range is not supported by the underlying fs.
>

I though so initially, but existing code is not identical to the
vfs_copy_file_range() implementation because ovl_copy_up_data()
splices in small chunks allowing the user to kill the copying process.
This makes sense because the poor process only called open(),
so the app writer may not have been expecting a stall of copying
a large file...

> If copy_file_range() fails, then it's for a reason that will cause
> do_splice_direct() to fail as well.
>
> vfs_copy_file_range() should really be a direct replacement for any
> code that calls do_splice_direct(). If it's not, we should make it
> so (e.g call do_splice direct for cross-fs copies automatically
> rather than returning EXDEV)

But man page states that EXDEV will be returned if
     "The files referred to by file_in and file_out are not on the
      same mounted filesystem"

I guess that when API is updated to allow for non zero flags,
then vfs_copy_file_range() should do_splice() instead or returning
EXDEV, only if (flags == COPY_FR_COPY).

> and then replace all the calls in the
> kernel to do_splice_direct() with vfs_copy_file_range()....

So in this case, I could not have replaced do_splice_direct() with
vfs_copy_file_range(), because I would either break the killable loop
behavior, or call copy_file_range() in small chunks which is not
desirable - is it?

Cheers,
Amir.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] ovl: use copy_file_range for copy up if possible
  2016-09-09  8:27       ` Amir Goldstein
@ 2016-09-09 23:52         ` Dave Chinner
  2016-09-10  7:40           ` Christoph Hellwig
  2016-09-10 18:54           ` Amir Goldstein
  0 siblings, 2 replies; 24+ messages in thread
From: Dave Chinner @ 2016-09-09 23:52 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Miklos Szeredi, Darrick J . Wong, linux-unionfs, linux-xfs,
	linux-fsdevel

On Fri, Sep 09, 2016 at 11:27:34AM +0300, Amir Goldstein wrote:
> On Fri, Sep 9, 2016 at 10:54 AM, Dave Chinner <david@fromorbit.com> wrote:
> > On Fri, Sep 09, 2016 at 10:31:02AM +0300, Amir Goldstein wrote:
> >> On Thu, Sep 8, 2016 at 11:25 PM, Dave Chinner <david@fromorbit.com> wrote:
> >> > On Thu, Sep 08, 2016 at 06:29:54PM +0300, Amir Goldstein wrote:
> >> Yes, I considered that. With this V0 patch, copy_file_range() is
> >> called inside the copy data 'killable loop'
> >> but, unlike the slower splice, it tries to copy the entire remaining
> >> len on every cycle and will most likely get all or nothing without
> >> causing any major stalls.
> >> So my options for V1 are:
> >> 1. use the existing loop only fallback to splice on any
> >> copy_file_range() failure.
> >> 2. add another (non killable?) loop before the splice killable loop to
> >> try and copy up as much data with copy_file_range()
> >> 3. implement ovl_copy_up_file_range() and do the fallback near the
> >> call site of ovl_copy_up_data()
> >
> > vfs_copy_file_range() already has a fallback to call
> > do_splice_direct() itself if ->copy_file_range() is not supported.
> > i.e. it will behave identically to the existing code if
> > copy_file_range is not supported by the underlying fs.
> >
> 
> I though so initially, but existing code is not identical to the
> vfs_copy_file_range() implementation because ovl_copy_up_data()
> splices in small chunks allowing the user to kill the copying process.
> This makes sense because the poor process only called open(),
> so the app writer may not have been expecting a stall of copying
> a large file...

So call vfs_copy_file_range() iteratively, just like is being done
right now for do_splice_direct() to limit latency on kill.

> > If copy_file_range() fails, then it's for a reason that will cause
> > do_splice_direct() to fail as well.
> >
> > vfs_copy_file_range() should really be a direct replacement for any
> > code that calls do_splice_direct(). If it's not, we should make it
> > so (e.g call do_splice direct for cross-fs copies automatically
> > rather than returning EXDEV)
> 
> But man page states that EXDEV will be returned if
>      "The files referred to by file_in and file_out are not on the
>       same mounted filesystem"

That's the /syscall/ man page, not how we must implement the
internal helper. Did you even /look/ at vfs_copy_file_range()?
hint:

        /* this could be relaxed once a method supports cross-fs copies */
        if (inode_in->i_sb != inode_out->i_sb)
                return -EXDEV;


> 
> I guess that when API is updated to allow for non zero flags,
> then vfs_copy_file_range() should do_splice() instead or returning
> EXDEV, only if (flags == COPY_FR_COPY).

Not necessary - just hoist the EXDEV check to the syscall layer.
Then, as I've already said, make vfs_copy_file_range "call do_splice
direct for cross-fs copies automatically".

i.e. vfs_copy_file_range() should just copy the data in the most
efficient way possible for the given src/dst inode pair.  In future,
if we add capability for offload of cross-fs copies, we can add the
infrastructure to do that within vfs_copy_file_range() and not have
to change a single caller to take advantage of it....

> > and then replace all the calls in the
> > kernel to do_splice_direct() with vfs_copy_file_range()....
> 
> So in this case, I could not have replaced do_splice_direct() with
> vfs_copy_file_range(), because I would either break the killable loop
> behavior, or call copy_file_range() in small chunks which is not
> desirable - is it?

Of course you can call vfs_copy_file_range() in small chunks. It's
just not going to be as efficient as a single large copy offload.
Worst case, it ends up being identical to what ovl is doing now.

But the question here is this: why are you even trying to /copy/ the
data?  That's not guaranteed to do a fast, atomic,
zero-data-movement operation. i.e. what we really want here first is
an attempt to /clone/ the data:

	1. try a fast, atomic, metadata clone operation like reflink
	2. try a fast, optimised data copy
	3. if all else fails, use do_splice_direct() to copy data.

i.e first try vfs_clone_file_range() because:

http://oss.sgi.com/archives/xfs/2015-12/msg00356.html

	[...] Note that clones are different from
	file copies in several ways:

	 - they are atomic vs other writers
	 - they support whole file clones
	 - they support 64-bit legth clones
	 - they do not allow partial success (aka short writes)
	 - clones are expected to be a fast metadata operation

i.e. if you want to use reflink type methods to optimise copy-up
latency, then you need to be /cloning/ the file, not copying it.
You can test whether this is supported at mount time, so you do a
simply flag test at copyup to determine if a clone should be
attempted or not.

If cloning fails or is not supported, then try vfs_copy_file_range()
to do an optimised iterative partial range file copy.  Finally, try
a slow, iterative partial range file copies using
do_splice_direct(). This part can be wholly handled by
vfs_copy_file_range() - this 'not supported' fallback doesn't need
to be implemented every time someone wants to copy data between two
files...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] ovl: use copy_file_range for copy up if possible
  2016-09-09 23:52         ` Dave Chinner
@ 2016-09-10  7:40           ` Christoph Hellwig
  2016-09-10 18:15             ` Amir Goldstein
  2016-09-10 18:54           ` Amir Goldstein
  1 sibling, 1 reply; 24+ messages in thread
From: Christoph Hellwig @ 2016-09-10  7:40 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Amir Goldstein, Miklos Szeredi, Darrick J . Wong, linux-unionfs,
	linux-xfs, linux-fsdevel

On Sat, Sep 10, 2016 at 09:52:21AM +1000, Dave Chinner wrote:
> > vfs_copy_file_range() implementation because ovl_copy_up_data()
> > splices in small chunks allowing the user to kill the copying process.
> > This makes sense because the poor process only called open(),
> > so the app writer may not have been expecting a stall of copying
> > a large file...
> 
> So call vfs_copy_file_range() iteratively, just like is being done
> right now for do_splice_direct() to limit latency on kill.

I wish vfs_copy_file_range would do useful chinking itself.

But either way it might be a good idea to call vfs_clone_file_range
first, because that gives your a very efficient copy without the need
to copy anything if supported.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] ovl: use copy_file_range for copy up if possible
  2016-09-10  7:40           ` Christoph Hellwig
@ 2016-09-10 18:15             ` Amir Goldstein
  0 siblings, 0 replies; 24+ messages in thread
From: Amir Goldstein @ 2016-09-10 18:15 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Dave Chinner, Miklos Szeredi, Darrick J . Wong, linux-unionfs,
	linux-xfs, linux-fsdevel

On Sat, Sep 10, 2016 at 10:40 AM, Christoph Hellwig <hch@infradead.org> wrote:
> On Sat, Sep 10, 2016 at 09:52:21AM +1000, Dave Chinner wrote:
>> > vfs_copy_file_range() implementation because ovl_copy_up_data()
>> > splices in small chunks allowing the user to kill the copying process.
>> > This makes sense because the poor process only called open(),
>> > so the app writer may not have been expecting a stall of copying
>> > a large file...
>>
>> So call vfs_copy_file_range() iteratively, just like is being done
>> right now for do_splice_direct() to limit latency on kill.
>
> I wish vfs_copy_file_range would do useful chinking itself.
>

I guess that is more changes then I set out to do here, but
if there is consensus about this idea I don't mind drafting the patch.

> But either way it might be a good idea to call vfs_clone_file_range
> first, because that gives your a very efficient copy without the need
> to copy anything if supported.

Definitely. I'll do that.
Thanks.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] ovl: use copy_file_range for copy up if possible
  2016-09-09 23:52         ` Dave Chinner
  2016-09-10  7:40           ` Christoph Hellwig
@ 2016-09-10 18:54           ` Amir Goldstein
  2016-09-11 22:11             ` Dave Chinner
  1 sibling, 1 reply; 24+ messages in thread
From: Amir Goldstein @ 2016-09-10 18:54 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Miklos Szeredi, Darrick J . Wong, linux-unionfs, linux-xfs,
	linux-fsdevel, Christoph Hellwig

On Sat, Sep 10, 2016 at 2:52 AM, Dave Chinner <david@fromorbit.com> wrote:
> On Fri, Sep 09, 2016 at 11:27:34AM +0300, Amir Goldstein wrote:
>> On Fri, Sep 9, 2016 at 10:54 AM, Dave Chinner <david@fromorbit.com> wrote:
>> > On Fri, Sep 09, 2016 at 10:31:02AM +0300, Amir Goldstein wrote:
>> >> On Thu, Sep 8, 2016 at 11:25 PM, Dave Chinner <david@fromorbit.com> wrote:
>> >> > On Thu, Sep 08, 2016 at 06:29:54PM +0300, Amir Goldstein wrote:
>> >> Yes, I considered that. With this V0 patch, copy_file_range() is
>> >> called inside the copy data 'killable loop'
>> >> but, unlike the slower splice, it tries to copy the entire remaining
>> >> len on every cycle and will most likely get all or nothing without
>> >> causing any major stalls.
>> >> So my options for V1 are:
>> >> 1. use the existing loop only fallback to splice on any
>> >> copy_file_range() failure.
>> >> 2. add another (non killable?) loop before the splice killable loop to
>> >> try and copy up as much data with copy_file_range()
>> >> 3. implement ovl_copy_up_file_range() and do the fallback near the
>> >> call site of ovl_copy_up_data()
>> >
>> > vfs_copy_file_range() already has a fallback to call
>> > do_splice_direct() itself if ->copy_file_range() is not supported.
>> > i.e. it will behave identically to the existing code if
>> > copy_file_range is not supported by the underlying fs.
>> >
>>
>> I though so initially, but existing code is not identical to the
>> vfs_copy_file_range() implementation because ovl_copy_up_data()
>> splices in small chunks allowing the user to kill the copying process.
>> This makes sense because the poor process only called open(),
>> so the app writer may not have been expecting a stall of copying
>> a large file...
>
> So call vfs_copy_file_range() iteratively, just like is being done
> right now for do_splice_direct() to limit latency on kill.
>
>> > If copy_file_range() fails, then it's for a reason that will cause
>> > do_splice_direct() to fail as well.
>> >
>> > vfs_copy_file_range() should really be a direct replacement for any
>> > code that calls do_splice_direct(). If it's not, we should make it
>> > so (e.g call do_splice direct for cross-fs copies automatically
>> > rather than returning EXDEV)
>>
>> But man page states that EXDEV will be returned if
>>      "The files referred to by file_in and file_out are not on the
>>       same mounted filesystem"
>
> That's the /syscall/ man page, not how we must implement the
> internal helper. Did you even /look/ at vfs_copy_file_range()?
> hint:
>
>         /* this could be relaxed once a method supports cross-fs copies */
>         if (inode_in->i_sb != inode_out->i_sb)
>                 return -EXDEV;
>
>
>>
>> I guess that when API is updated to allow for non zero flags,
>> then vfs_copy_file_range() should do_splice() instead or returning
>> EXDEV, only if (flags == COPY_FR_COPY).
>
> Not necessary - just hoist the EXDEV check to the syscall layer.
> Then, as I've already said, make vfs_copy_file_range "call do_splice
> direct for cross-fs copies automatically".
>
> i.e. vfs_copy_file_range() should just copy the data in the most
> efficient way possible for the given src/dst inode pair.  In future,
> if we add capability for offload of cross-fs copies, we can add the
> infrastructure to do that within vfs_copy_file_range() and not have
> to change a single caller to take advantage of it....
>
>> > and then replace all the calls in the
>> > kernel to do_splice_direct() with vfs_copy_file_range()....
>>
>> So in this case, I could not have replaced do_splice_direct() with
>> vfs_copy_file_range(), because I would either break the killable loop
>> behavior, or call copy_file_range() in small chunks which is not
>> desirable - is it?
>
> Of course you can call vfs_copy_file_range() in small chunks. It's
> just not going to be as efficient as a single large copy offload.
> Worst case, it ends up being identical to what ovl is doing now.
>
> But the question here is this: why are you even trying to /copy/ the
> data?  That's not guaranteed to do a fast, atomic,
> zero-data-movement operation. i.e. what we really want here first is
> an attempt to /clone/ the data:
>
>         1. try a fast, atomic, metadata clone operation like reflink
>         2. try a fast, optimised data copy
>         3. if all else fails, use do_splice_direct() to copy data.
>
> i.e first try vfs_clone_file_range() because:
>
> http://oss.sgi.com/archives/xfs/2015-12/msg00356.html
>
>         [...] Note that clones are different from
>         file copies in several ways:
>
>          - they are atomic vs other writers
>          - they support whole file clones
>          - they support 64-bit legth clones
>          - they do not allow partial success (aka short writes)
>          - clones are expected to be a fast metadata operation
>

I admit I missed this Note. perhaps it would be good to keep it in comment next
to the copy/clone_range helpers.

> i.e. if you want to use reflink type methods to optimise copy-up
> latency, then you need to be /cloning/ the file, not copying it.

That's a good advise and I will definitely follow it.
I shall call clone_file_range once above the splice loop.

> You can test whether this is supported at mount time, so you do a
> simply flag test at copyup to determine if a clone should be
> attempted or not.
>

I am not sure that would not be over optimization.
I am already checking for the obvious reason for clone to fail in copyup -
different i_sb.
After all, if I just call clone_file_range() once and it fails, then we are back
to copying and that is going to make the cost of calling clone insignificant.

> If cloning fails or is not supported, then try vfs_copy_file_range()
> to do an optimised iterative partial range file copy.  Finally, try
> a slow, iterative partial range file copies using
> do_splice_direct(). This part can be wholly handled by
> vfs_copy_file_range() - this 'not supported' fallback doesn't need
> to be implemented every time someone wants to copy data between two
> files...

You do realize that vfs_copy_file_range() itself does the 'not
supported' fallback
and if I do call it iteratively, then there will be a lot more 'not
supported' attempts
then there are in my current patch.
But regardless, as I wrote to Christoph, changing the
vfs_copy_file_range() helper
and changing users of do_splice to use it like you suggested sounds
like it may be the right thing to do, but without consensus, I am a bit hesitant
to make those changes. I am definitely willing to draft the patch and test it
if I get more ACKs on the idea.

Beyond the question of whether or not to change vfs_copy_file_range(),
there is the pragmatic question of which workload is going to benefit from
this in the copyup case.

My patch sets out to improve a very clear and immediate problem for
overlayfs over xfs setup (lower and upper on the same fs).
It is not a hypothetical case, it is a very common case for docker.
Further more, when docker users realize they have this improvement,
it will provide a very good incentive for users (and container distros) to
test and deploy overlayfs over xfs with reflink support.
So it would be a great service to the docker community if xfs reflink support
would be out with v4.9 (hint hint).

The copy_file_range() before do_splice() step, OTOH, may be useful
for overlayfs over nfs (lower and upper on the same fs) and I don't know
that is an interesting use case. anyone?

Thanks for the good review comments
Will work on V1 tomorrow.

Cheers,
Amir.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] ovl: use copy_file_range for copy up if possible
  2016-09-10 18:54           ` Amir Goldstein
@ 2016-09-11 22:11             ` Dave Chinner
  2016-09-12  6:52               ` Amir Goldstein
  0 siblings, 1 reply; 24+ messages in thread
From: Dave Chinner @ 2016-09-11 22:11 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Miklos Szeredi, Darrick J . Wong, linux-unionfs, linux-xfs,
	linux-fsdevel, Christoph Hellwig

On Sat, Sep 10, 2016 at 09:54:59PM +0300, Amir Goldstein wrote:
> On Sat, Sep 10, 2016 at 2:52 AM, Dave Chinner <david@fromorbit.com> wrote:
> 
> > You can test whether this is supported at mount time, so you do a
> > simply flag test at copyup to determine if a clone should be
> > attempted or not.
> >
> 
> I am not sure that would not be over optimization.
> I am already checking for the obvious reason for clone to fail in copyup -
> different i_sb.

Again, please don't do that.  Call vfs_clone_file_range() as it
checks a whole lot more stuff that can cause a clone to fail. And it
makes sure that the write references to the mnt are taken so that
things like freeze and remount-ro behave correctly while a clone is
in progress.

> After all, if I just call clone_file_range() once and it fails, then we are back
> to copying and that is going to make the cost of calling clone insignificant.

Apart from the fact that the ->clone_file_range() calls assume that
all the validity checks have already been done by the caller, which
you are not doing.

> > If cloning fails or is not supported, then try vfs_copy_file_range()
> > to do an optimised iterative partial range file copy.  Finally, try
> > a slow, iterative partial range file copies using
> > do_splice_direct(). This part can be wholly handled by
> > vfs_copy_file_range() - this 'not supported' fallback doesn't need
> > to be implemented every time someone wants to copy data between two
> > files...
> 
> You do realize that vfs_copy_file_range() itself does the 'not
> supported' fallback
> and if I do call it iteratively, then there will be a lot more 'not
> supported' attempts
> then there are in my current patch.

No shit, Sherlock. But you're concentrating on the wrong thing -
the overhead of checking if .clone_file_range/.copy_file_range is
implemented and can be executed is effectively zero compared to
copying any amount of data.

IOWs, Amir, you're trying to *optimise the wrong thing*. It's the
data copy that is costly and needs to be optimised, not the
iteration or the checks done to determine what type of clone/copy
can be executed. Shortcuts around generic helpers like you are
proposing are more costly in the long run because code like this is
much more likely to contain/mask bugs that only appear months or
years later when something else is changed. Case in point: the mnt
write references that need to be taken before calling
clone/copy_file_range()....

Please, just use vfs_clone_file_range() and vfs_copy_file_range()
and only fall back to a slower method if the error returned is
-EOPNOTSUPP. For any other error, the copy should fail, not be
ignored.

> But regardless, as I wrote to Christoph, changing the
> vfs_copy_file_range() helper
> and changing users of do_splice to use it like you suggested sounds
> like it may be the right thing to do, but without consensus, I am a bit hesitant
> to make those changes. I am definitely willing to draft the patch and test it
> if I get more ACKs on the idea.

Send a patch - that's the only way you'll get anyone to comment
on it.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] ovl: use copy_file_range for copy up if possible
  2016-09-11 22:11             ` Dave Chinner
@ 2016-09-12  6:52               ` Amir Goldstein
  2016-09-12 15:37                 ` Amir Goldstein
  0 siblings, 1 reply; 24+ messages in thread
From: Amir Goldstein @ 2016-09-12  6:52 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Miklos Szeredi, Darrick J . Wong, linux-unionfs, linux-xfs,
	linux-fsdevel, Christoph Hellwig

On Mon, Sep 12, 2016 at 1:11 AM, Dave Chinner <david@fromorbit.com> wrote:
> On Sat, Sep 10, 2016 at 09:54:59PM +0300, Amir Goldstein wrote:
>> On Sat, Sep 10, 2016 at 2:52 AM, Dave Chinner <david@fromorbit.com> wrote:
>>
>> > You can test whether this is supported at mount time, so you do a
>> > simply flag test at copyup to determine if a clone should be
>> > attempted or not.
>> >
>>
>> I am not sure that would not be over optimization.
>> I am already checking for the obvious reason for clone to fail in copyup -
>> different i_sb.
>
> Again, please don't do that.  Call vfs_clone_file_range() as it
> checks a whole lot more stuff that can cause a clone to fail. And it
> makes sure that the write references to the mnt are taken so that
> things like freeze and remount-ro behave correctly while a clone is
> in progress.
>

OK

>> After all, if I just call clone_file_range() once and it fails, then we are back
>> to copying and that is going to make the cost of calling clone insignificant.
>
> Apart from the fact that the ->clone_file_range() calls assume that
> all the validity checks have already been done by the caller, which
> you are not doing.
>
>> > If cloning fails or is not supported, then try vfs_copy_file_range()
>> > to do an optimised iterative partial range file copy.  Finally, try
>> > a slow, iterative partial range file copies using
>> > do_splice_direct(). This part can be wholly handled by
>> > vfs_copy_file_range() - this 'not supported' fallback doesn't need
>> > to be implemented every time someone wants to copy data between two
>> > files...
>>
>> You do realize that vfs_copy_file_range() itself does the 'not
>> supported' fallback
>> and if I do call it iteratively, then there will be a lot more 'not
>> supported' attempts
>> then there are in my current patch.
>
> No shit, Sherlock. But you're concentrating on the wrong thing -
> the overhead of checking if .clone_file_range/.copy_file_range is
> implemented and can be executed is effectively zero compared to
> copying any amount of data.
>
> IOWs, Amir, you're trying to *optimise the wrong thing*. It's the
> data copy that is costly and needs to be optimised, not the
> iteration or the checks done to determine what type of clone/copy
> can be executed. Shortcuts around generic helpers like you are
> proposing are more costly in the long run because code like this is
> much more likely to contain/mask bugs that only appear months or
> years later when something else is changed. Case in point: the mnt
> write references that need to be taken before calling
> clone/copy_file_range()....
>
> Please, just use vfs_clone_file_range() and vfs_copy_file_range()
> and only fall back to a slower method if the error returned is
> -EOPNOTSUPP. For any other error, the copy should fail, not be
> ignored.
>

Obviously, you meant to check for -EOPNOTSUPP or -EXDEV

>> But regardless, as I wrote to Christoph, changing the
>> vfs_copy_file_range() helper
>> and changing users of do_splice to use it like you suggested sounds
>> like it may be the right thing to do, but without consensus, I am a bit hesitant
>> to make those changes. I am definitely willing to draft the patch and test it
>> if I get more ACKs on the idea.
>
> Send a patch - that's the only way you'll get anyone to comment
> on it.
>

Will do.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v2 0/4] ovl: efficient copy up by reflink
  2016-09-08 15:29 [PATCH] ovl: use copy_file_range for copy up if possible Amir Goldstein
  2016-09-08 20:25 ` Dave Chinner
@ 2016-09-12 15:06 ` Amir Goldstein
  2016-09-12 15:06   ` [PATCH v2 1/4] vfs: allow vfs_clone_file_range() across mount points Amir Goldstein
                     ` (3 more replies)
  1 sibling, 4 replies; 24+ messages in thread
From: Amir Goldstein @ 2016-09-12 15:06 UTC (permalink / raw)
  To: Miklos Szeredi, Dave Chinner, linux-unionfs
  Cc: Christoph Hellwig, linux-xfs, Darrick J . Wong, linux-fsdevel

Btrfs has file reflink support and XFS is about to gain
file reflink support soon. It is very useful to use reflink
to implement copy up of regular file data when possible.

For example, on my laptop, xfstest overlay/001 (copy up of 4G
sparse files) takes less than 1 second with copy up by reflink
vs. 25 seconds with regular copy up.

This series includes two pairs of patches:
- patches 1,2 utilize the clone_file_range() API
- patches 3,4 utilize the copy_file_range() API

The two pairs of patches are independent of each other.
They were each tested separately and both tested together.
All combinations passed the unionmount-testsuite (over tmpfs)
All combinations passed the overlay/??? xfstests over the
following underlying fs:
1. ext4 (copy up)
2. xfs + reflink patches + mkfs.xfs (copy up)
3. xfs + reflink patches + mkfs.xfs -m reflink=1 (reflink up)

Dave Chinner suggested the following implementation for copy up,
which I implemented in this series:
1. try to clone_file_range() entire length
2. fallback to trying copy_file_range() in small chunks
3. fallback to do_splice_direct() in small chunks

This is a good general implementation to cover the future use cases of
file systems that can do either clone_file_range() or copy_file_range().
However, currently, the only in-tree file systems that support
clone/copy_file_range are btrfs, xfs (soon), cifs and nfs.
btrfs and xfs use the same implementation for clone and copy range,
so the copy_file_range() step is never needed.
cifs supports only clone_file_range() so copy_file_range() step is moot.
nfs does have a different implementation for clone_file_range() and
copy_file_range(), but nfs is not supported as upper layer for overlayfs
at the moment.

Please pick patches 1,2 for clear and immediate benefit to copy up
performance on filesystems with reflink support.

Please consider picking patches 3,4 additionally for future generations
and for code consolidation into vfs helpers.

Cheers,
Amir.

V2:
- Re-factor vfs helpers so they can be called from copy up
- Single call to vfs_clone_file_range() and fallback to
  vfs_copy_file_range() loop

V1:
- Replace iteravite call to copy_file_range() with
  a single call to clone_file_range()

V0:
- Call clone_file_range() and fallback to do_splice_direct()

Amir Goldstein (4):
  vfs: allow vfs_clone_file_range() across mount points
  ovl: use vfs_clone_file_range() for copy up if possible
  vfs: allow vfs_copy_file_range() across file systems
  ovl: use vfs_copy_file_range() to copy up file data

 fs/ioctl.c             |  2 ++
 fs/overlayfs/copy_up.c | 19 +++++++++++++++----
 fs/read_write.c        | 23 ++++++++++++++++-------
 3 files changed, 33 insertions(+), 11 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v2 1/4] vfs: allow vfs_clone_file_range() across mount points
  2016-09-12 15:06 ` [PATCH v2 0/4] ovl: efficient copy up by reflink Amir Goldstein
@ 2016-09-12 15:06   ` Amir Goldstein
  2016-09-12 15:06   ` [PATCH v2 2/4] ovl: use vfs_clone_file_range() for copy up if possible Amir Goldstein
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 24+ messages in thread
From: Amir Goldstein @ 2016-09-12 15:06 UTC (permalink / raw)
  To: Miklos Szeredi, Dave Chinner, linux-unionfs
  Cc: Christoph Hellwig, linux-xfs, Darrick J . Wong, linux-fsdevel

FICLONE/FICLONERANGE ioctls return -EXDEV if src and dest
files are not on the same mount point.
Practically, clone only requires that src and dest files
are on the same file system.

Move the check for same mount point to ioctl code and keep
only the check for same super block in the vfs helper.

A following patch is going to use the vfs_clone_file_range()
helper in overlayfs to copy up between lower and upper
mount points on the same file system.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/ioctl.c      | 2 ++
 fs/read_write.c | 8 ++++++--
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/fs/ioctl.c b/fs/ioctl.c
index c415668..34d2994 100644
--- a/fs/ioctl.c
+++ b/fs/ioctl.c
@@ -223,6 +223,8 @@ static long ioctl_file_clone(struct file *dst_file, unsigned long srcfd,
 
 	if (!src_file.file)
 		return -EBADF;
+	if (src_file.file->f_path.mnt != dst_file->f_path.mnt)
+		return -EXDEV;
 	ret = vfs_clone_file_range(src_file.file, off, dst_file, destoff, olen);
 	fdput(src_file);
 	return ret;
diff --git a/fs/read_write.c b/fs/read_write.c
index 66215a7..9dc6e52 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1628,8 +1628,12 @@ int vfs_clone_file_range(struct file *file_in, loff_t pos_in,
 	struct inode *inode_out = file_inode(file_out);
 	int ret;
 
-	if (inode_in->i_sb != inode_out->i_sb ||
-	    file_in->f_path.mnt != file_out->f_path.mnt)
+	/*
+	 * FICLONE/FICLONERANGE ioctls enforce that src and dest files
+	 * are on the same mount point. Practically, they only need
+	 * to be on the same file system.
+	 */
+	if (inode_in->i_sb != inode_out->i_sb)
 		return -EXDEV;
 
 	if (S_ISDIR(inode_in->i_mode) || S_ISDIR(inode_out->i_mode))
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v2 2/4] ovl: use vfs_clone_file_range() for copy up if possible
  2016-09-12 15:06 ` [PATCH v2 0/4] ovl: efficient copy up by reflink Amir Goldstein
  2016-09-12 15:06   ` [PATCH v2 1/4] vfs: allow vfs_clone_file_range() across mount points Amir Goldstein
@ 2016-09-12 15:06   ` Amir Goldstein
  2016-09-13  0:02     ` Dave Chinner
  2016-09-12 15:06   ` [PATCH v2 3/4] vfs: allow vfs_copy_file_range() across file systems Amir Goldstein
  2016-09-12 15:06   ` [PATCH v2 4/4] ovl: use vfs_copy_file_range() to copy up file data Amir Goldstein
  3 siblings, 1 reply; 24+ messages in thread
From: Amir Goldstein @ 2016-09-12 15:06 UTC (permalink / raw)
  To: Miklos Szeredi, Dave Chinner, linux-unionfs
  Cc: Christoph Hellwig, linux-xfs, Darrick J . Wong, linux-fsdevel

When copying up within the same fs, try to use vfs_clone_file_range().
This is very efficient when lower and upper are on the same fs
with file reflink support. If vfs_clone_file_range() fails because
lower and upper are not on the same fs or if fs has no reflink support,
copy up falls back to the regular data copy code.

Tested correct behavior when lower and upper are on:
1. same ext4 (copy)
2. same xfs + reflink patches + mkfs.xfs (copy)
3. same xfs + reflink patches + mkfs.xfs -m reflink=1 (clone)
4. different xfs + reflink patches + mkfs.xfs -m reflink=1 (copy)

For comparison, on my laptop, xfstest overlay/001 (copy up of large
sparse files) takes less than 1 second in the xfs reflink setup vs.
25 seconds on the rest of the setups.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/copy_up.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
index 43fdc27..e432d7e 100644
--- a/fs/overlayfs/copy_up.c
+++ b/fs/overlayfs/copy_up.c
@@ -136,6 +136,16 @@ static int ovl_copy_up_data(struct path *old, struct path *new, loff_t len)
 		goto out_fput;
 	}
 
+	/*
+	 * Try to use clone_file_range to clone up within the same fs.
+	 * On failure to clone entire file, fallback to copy loop.
+	 */
+	error = vfs_clone_file_range(old_file, 0, new_file, 0, len);
+	if (error == -EXDEV || error == -EOPNOTSUPP)
+		error = 0;
+	else
+		len = 0;
+
 	/* FIXME: copy up sparse files efficiently */
 	while (len) {
 		size_t this_len = OVL_COPY_UP_CHUNK_SIZE;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v2 3/4] vfs: allow vfs_copy_file_range() across file systems
  2016-09-12 15:06 ` [PATCH v2 0/4] ovl: efficient copy up by reflink Amir Goldstein
  2016-09-12 15:06   ` [PATCH v2 1/4] vfs: allow vfs_clone_file_range() across mount points Amir Goldstein
  2016-09-12 15:06   ` [PATCH v2 2/4] ovl: use vfs_clone_file_range() for copy up if possible Amir Goldstein
@ 2016-09-12 15:06   ` Amir Goldstein
  2016-09-13  0:08     ` Dave Chinner
  2016-09-12 15:06   ` [PATCH v2 4/4] ovl: use vfs_copy_file_range() to copy up file data Amir Goldstein
  3 siblings, 1 reply; 24+ messages in thread
From: Amir Goldstein @ 2016-09-12 15:06 UTC (permalink / raw)
  To: Miklos Szeredi, Dave Chinner, linux-unionfs
  Cc: Christoph Hellwig, linux-xfs, Darrick J . Wong, linux-fsdevel

copy_file_range syscall returns -EXDEV if src and dest
file are not on the same file system.
The vfs_copy_file_range() helper, however, knows how to copy
across file systems with do_splice_direct().

Move the enforcement of same file system from the vfs helper
to the syscall code.

A following patch is going to use the vfs_copy_file_range()
helper in overlayfs to copy up between lower and upper
not on the same file system.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/read_write.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/fs/read_write.c b/fs/read_write.c
index 9dc6e52..c4675c6 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1502,10 +1502,6 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
 	    (file_out->f_flags & O_APPEND))
 		return -EBADF;
 
-	/* this could be relaxed once a method supports cross-fs copies */
-	if (inode_in->i_sb != inode_out->i_sb)
-		return -EXDEV;
-
 	if (len == 0)
 		return 0;
 
@@ -1514,7 +1510,8 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
 		return ret;
 
 	ret = -EOPNOTSUPP;
-	if (file_out->f_op->copy_file_range)
+	if (inode_in->i_sb == inode_out->i_sb &&
+			file_out->f_op->copy_file_range)
 		ret = file_out->f_op->copy_file_range(file_in, pos_in, file_out,
 						      pos_out, len, flags);
 	if (ret == -EOPNOTSUPP)
@@ -1569,6 +1566,14 @@ SYSCALL_DEFINE6(copy_file_range, int, fd_in, loff_t __user *, off_in,
 		pos_out = f_out.file->f_pos;
 	}
 
+	/*
+	 * FIXME: should copy_file_range syscall enforce that src and
+	 * dest files are on the same mount point or only on the same
+	 * file system? none of the above?
+	 */
+	if (file_inode(f_in.file)->i_sb != file_inode(f_out.file)->i_sb)
+		return -EXDEV;
+
 	ret = vfs_copy_file_range(f_in.file, pos_in, f_out.file, pos_out, len,
 				  flags);
 	if (ret > 0) {
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v2 4/4] ovl: use vfs_copy_file_range() to copy up file data
  2016-09-12 15:06 ` [PATCH v2 0/4] ovl: efficient copy up by reflink Amir Goldstein
                     ` (2 preceding siblings ...)
  2016-09-12 15:06   ` [PATCH v2 3/4] vfs: allow vfs_copy_file_range() across file systems Amir Goldstein
@ 2016-09-12 15:06   ` Amir Goldstein
  2016-09-13  0:11     ` Dave Chinner
  3 siblings, 1 reply; 24+ messages in thread
From: Amir Goldstein @ 2016-09-12 15:06 UTC (permalink / raw)
  To: Miklos Szeredi, Dave Chinner, linux-unionfs
  Cc: Christoph Hellwig, linux-xfs, Darrick J . Wong, linux-fsdevel

Use vfs_copy_file_range() helper instead of calling do_splice_direct()
when copying up file data.
When copying up within the same fs, which supports copy_file_range(),
fs implementation can be more efficient then do_splice_direct().
vfs_copy_file_range() helper falls back to do_splice_direct() if
it cannot use the file system's copy_file_range() implementation.

Tested correct behavior when lower and upper are on:
1. same ext4 (copy)
2. same xfs + reflink patches + mkfs.xfs (copy)
3. same xfs + reflink patches + mkfs.xfs -m reflink=1 (clone)
4. different xfs + reflink patches + mkfs.xfs -m reflink=1 (copy)

For comparison, on my laptop, xfstest overlay/001 (copy up of large
sparse files) takes less than 1 second in the xfs reflink setup vs.
25 seconds on the rest of the setups.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/copy_up.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
index e432d7e..32ea54f 100644
--- a/fs/overlayfs/copy_up.c
+++ b/fs/overlayfs/copy_up.c
@@ -159,15 +159,16 @@ static int ovl_copy_up_data(struct path *old, struct path *new, loff_t len)
 			break;
 		}
 
-		bytes = do_splice_direct(old_file, &old_pos,
-					 new_file, &new_pos,
-					 this_len, SPLICE_F_MOVE);
+		bytes = vfs_copy_file_range(old_file, old_pos,
+					    new_file, new_pos,
+					    this_len, 0);
 		if (bytes <= 0) {
 			error = bytes;
 			break;
 		}
-		WARN_ON(old_pos != new_pos);
 
+		old_pos += bytes;
+		new_pos += bytes;
 		len -= bytes;
 	}
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH] ovl: use copy_file_range for copy up if possible
  2016-09-12  6:52               ` Amir Goldstein
@ 2016-09-12 15:37                 ` Amir Goldstein
  0 siblings, 0 replies; 24+ messages in thread
From: Amir Goldstein @ 2016-09-12 15:37 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Miklos Szeredi, Darrick J . Wong, linux-unionfs, linux-xfs,
	linux-fsdevel, Christoph Hellwig

On Mon, Sep 12, 2016 at 9:52 AM, Amir Goldstein <amir73il@gmail.com> wrote:
> On Mon, Sep 12, 2016 at 1:11 AM, Dave Chinner <david@fromorbit.com> wrote:
>> On Sat, Sep 10, 2016 at 09:54:59PM +0300, Amir Goldstein wrote:
>>> On Sat, Sep 10, 2016 at 2:52 AM, Dave Chinner <david@fromorbit.com> wrote:
>>>
>>> > You can test whether this is supported at mount time, so you do a
>>> > simply flag test at copyup to determine if a clone should be
>>> > attempted or not.
>>> >
>>>
>>> I am not sure that would not be over optimization.
>>> I am already checking for the obvious reason for clone to fail in copyup -
>>> different i_sb.
>>
>> Again, please don't do that.  Call vfs_clone_file_range() as it
>> checks a whole lot more stuff that can cause a clone to fail. And it
>> makes sure that the write references to the mnt are taken so that
>> things like freeze and remount-ro behave correctly while a clone is
>> in progress.
>>
>
> OK
>

Dave,

I just sent out v2 patch series that follows your suggestions (I hope).
Please note that inside vfs_copy_file_range() I *did* add a pre-condition
of different i_sb *before* calling into ->copy_file_range().
The reason is that not all fs check for different i_sb inside the implementation
(i.e. nfs) and since I removed same i_sb constrain from vfs_copy_file_range()
I wanted to make sure that different i_sb case always ends up with
do_splice_direct()
and never propagates into the fs implementation where consequences are unknown.

Please reply on new patch set if you disagree.
Thanks,

Amir.

>>> After all, if I just call clone_file_range() once and it fails, then we are back
>>> to copying and that is going to make the cost of calling clone insignificant.
>>
>> Apart from the fact that the ->clone_file_range() calls assume that
>> all the validity checks have already been done by the caller, which
>> you are not doing.
>>
>>> > If cloning fails or is not supported, then try vfs_copy_file_range()
>>> > to do an optimised iterative partial range file copy.  Finally, try
>>> > a slow, iterative partial range file copies using
>>> > do_splice_direct(). This part can be wholly handled by
>>> > vfs_copy_file_range() - this 'not supported' fallback doesn't need
>>> > to be implemented every time someone wants to copy data between two
>>> > files...
>>>
>>> You do realize that vfs_copy_file_range() itself does the 'not
>>> supported' fallback
>>> and if I do call it iteratively, then there will be a lot more 'not
>>> supported' attempts
>>> then there are in my current patch.
>>
>> No shit, Sherlock. But you're concentrating on the wrong thing -
>> the overhead of checking if .clone_file_range/.copy_file_range is
>> implemented and can be executed is effectively zero compared to
>> copying any amount of data.
>>
>> IOWs, Amir, you're trying to *optimise the wrong thing*. It's the
>> data copy that is costly and needs to be optimised, not the
>> iteration or the checks done to determine what type of clone/copy
>> can be executed. Shortcuts around generic helpers like you are
>> proposing are more costly in the long run because code like this is
>> much more likely to contain/mask bugs that only appear months or
>> years later when something else is changed. Case in point: the mnt
>> write references that need to be taken before calling
>> clone/copy_file_range()....
>>
>> Please, just use vfs_clone_file_range() and vfs_copy_file_range()
>> and only fall back to a slower method if the error returned is
>> -EOPNOTSUPP. For any other error, the copy should fail, not be
>> ignored.
>>
>
> Obviously, you meant to check for -EOPNOTSUPP or -EXDEV
>
>>> But regardless, as I wrote to Christoph, changing the
>>> vfs_copy_file_range() helper
>>> and changing users of do_splice to use it like you suggested sounds
>>> like it may be the right thing to do, but without consensus, I am a bit hesitant
>>> to make those changes. I am definitely willing to draft the patch and test it
>>> if I get more ACKs on the idea.
>>
>> Send a patch - that's the only way you'll get anyone to comment
>> on it.
>>
>
> Will do.
>
> Thanks,
> Amir.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 2/4] ovl: use vfs_clone_file_range() for copy up if possible
  2016-09-12 15:06   ` [PATCH v2 2/4] ovl: use vfs_clone_file_range() for copy up if possible Amir Goldstein
@ 2016-09-13  0:02     ` Dave Chinner
  2016-09-13  6:47       ` Amir Goldstein
  0 siblings, 1 reply; 24+ messages in thread
From: Dave Chinner @ 2016-09-13  0:02 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Miklos Szeredi, linux-unionfs, Christoph Hellwig, linux-xfs,
	Darrick J . Wong, linux-fsdevel

On Mon, Sep 12, 2016 at 06:06:41PM +0300, Amir Goldstein wrote:
> When copying up within the same fs, try to use vfs_clone_file_range().
> This is very efficient when lower and upper are on the same fs
> with file reflink support. If vfs_clone_file_range() fails because
> lower and upper are not on the same fs or if fs has no reflink support,
> copy up falls back to the regular data copy code.
> 
> Tested correct behavior when lower and upper are on:
> 1. same ext4 (copy)
> 2. same xfs + reflink patches + mkfs.xfs (copy)
> 3. same xfs + reflink patches + mkfs.xfs -m reflink=1 (clone)
> 4. different xfs + reflink patches + mkfs.xfs -m reflink=1 (copy)
> 
> For comparison, on my laptop, xfstest overlay/001 (copy up of large
> sparse files) takes less than 1 second in the xfs reflink setup vs.
> 25 seconds on the rest of the setups.
> 
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> ---
>  fs/overlayfs/copy_up.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
> index 43fdc27..e432d7e 100644
> --- a/fs/overlayfs/copy_up.c
> +++ b/fs/overlayfs/copy_up.c
> @@ -136,6 +136,16 @@ static int ovl_copy_up_data(struct path *old, struct path *new, loff_t len)
>  		goto out_fput;
>  	}
>  
> +	/*
> +	 * Try to use clone_file_range to clone up within the same fs.
> +	 * On failure to clone entire file, fallback to copy loop.
> +	 */
> +	error = vfs_clone_file_range(old_file, 0, new_file, 0, len);
> +	if (error == -EXDEV || error == -EOPNOTSUPP)
> +		error = 0;
> +	else
> +		len = 0;
> +
>  	/* FIXME: copy up sparse files efficiently */
>  	while (len) {
>  		size_t this_len = OVL_COPY_UP_CHUNK_SIZE;

That's nasty. Just use a goto to skip the code that doesn't need to
be run if vfs_clone_file_range() succeeds. This is much better, IMO:

	if (!error)
		goto out_done;
	if (error != -EXDEV && error != -EOPNOTSUPP)
		goto out_error;

	/* Can't clone, so now we try to copy the data */

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 3/4] vfs: allow vfs_copy_file_range() across file systems
  2016-09-12 15:06   ` [PATCH v2 3/4] vfs: allow vfs_copy_file_range() across file systems Amir Goldstein
@ 2016-09-13  0:08     ` Dave Chinner
  2016-09-13  7:01       ` Amir Goldstein
  0 siblings, 1 reply; 24+ messages in thread
From: Dave Chinner @ 2016-09-13  0:08 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Miklos Szeredi, linux-unionfs, Christoph Hellwig, linux-xfs,
	Darrick J . Wong, linux-fsdevel

On Mon, Sep 12, 2016 at 06:06:42PM +0300, Amir Goldstein wrote:
> copy_file_range syscall returns -EXDEV if src and dest
> file are not on the same file system.
> The vfs_copy_file_range() helper, however, knows how to copy
> across file systems with do_splice_direct().
> 
> Move the enforcement of same file system from the vfs helper
> to the syscall code.
> 
> A following patch is going to use the vfs_copy_file_range()
> helper in overlayfs to copy up between lower and upper
> not on the same file system.
> 
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> ---
>  fs/read_write.c | 15 ++++++++++-----
>  1 file changed, 10 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/read_write.c b/fs/read_write.c
> index 9dc6e52..c4675c6 100644
> --- a/fs/read_write.c
> +++ b/fs/read_write.c
> @@ -1502,10 +1502,6 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
>  	    (file_out->f_flags & O_APPEND))
>  		return -EBADF;
>  
> -	/* this could be relaxed once a method supports cross-fs copies */
> -	if (inode_in->i_sb != inode_out->i_sb)
> -		return -EXDEV;
> -
>  	if (len == 0)
>  		return 0;
>  
> @@ -1514,7 +1510,8 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
>  		return ret;
>  
>  	ret = -EOPNOTSUPP;
> -	if (file_out->f_op->copy_file_range)
> +	if (inode_in->i_sb == inode_out->i_sb &&
> +			file_out->f_op->copy_file_range)
>  		ret = file_out->f_op->copy_file_range(file_in, pos_in, file_out,
>  						      pos_out, len, flags);

Indenting is wrong, and you dropped an important comment. i.e
copy_file_range() still doesn't support cross fs copies.....

>  	if (ret == -EOPNOTSUPP)
> @@ -1569,6 +1566,14 @@ SYSCALL_DEFINE6(copy_file_range, int, fd_in, loff_t __user *, off_in,
>  		pos_out = f_out.file->f_pos;
>  	}
>  
> +	/*
> +	 * FIXME: should copy_file_range syscall enforce that src and
> +	 * dest files are on the same mount point or only on the same
> +	 * file system? none of the above?
> +	 */
> +	if (file_inode(f_in.file)->i_sb != file_inode(f_out.file)->i_sb)
> +		return -EXDEV;

For the purposes of this patch, it should simply do what it already
does. If there's a API semantic change that needs to be made, then
get that sorted out now rather than adding a "fixme" comment that
will simply be ignored....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 4/4] ovl: use vfs_copy_file_range() to copy up file data
  2016-09-12 15:06   ` [PATCH v2 4/4] ovl: use vfs_copy_file_range() to copy up file data Amir Goldstein
@ 2016-09-13  0:11     ` Dave Chinner
  2016-09-13  7:26       ` Amir Goldstein
  0 siblings, 1 reply; 24+ messages in thread
From: Dave Chinner @ 2016-09-13  0:11 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Miklos Szeredi, linux-unionfs, Christoph Hellwig, linux-xfs,
	Darrick J . Wong, linux-fsdevel

On Mon, Sep 12, 2016 at 06:06:43PM +0300, Amir Goldstein wrote:
> Use vfs_copy_file_range() helper instead of calling do_splice_direct()
> when copying up file data.
> When copying up within the same fs, which supports copy_file_range(),
> fs implementation can be more efficient then do_splice_direct().
> vfs_copy_file_range() helper falls back to do_splice_direct() if
> it cannot use the file system's copy_file_range() implementation.
> 
> Tested correct behavior when lower and upper are on:
> 1. same ext4 (copy)
> 2. same xfs + reflink patches + mkfs.xfs (copy)
> 3. same xfs + reflink patches + mkfs.xfs -m reflink=1 (clone)
> 4. different xfs + reflink patches + mkfs.xfs -m reflink=1 (copy)
> 
> For comparison, on my laptop, xfstest overlay/001 (copy up of large
> sparse files) takes less than 1 second in the xfs reflink setup vs.
> 25 seconds on the rest of the setups.

This is now stale, right? the reflink is done from the
vfs_clone_file_range() call added in an earlier patch, not this
change....

> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> ---
>  fs/overlayfs/copy_up.c | 9 +++++----
>  1 file changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
> index e432d7e..32ea54f 100644
> --- a/fs/overlayfs/copy_up.c
> +++ b/fs/overlayfs/copy_up.c
> @@ -159,15 +159,16 @@ static int ovl_copy_up_data(struct path *old, struct path *new, loff_t len)
>  			break;
>  		}
>  
> -		bytes = do_splice_direct(old_file, &old_pos,
> -					 new_file, &new_pos,
> -					 this_len, SPLICE_F_MOVE);
> +		bytes = vfs_copy_file_range(old_file, old_pos,
> +					    new_file, new_pos,
> +					    this_len, 0);
>  		if (bytes <= 0) {
>  			error = bytes;
>  			break;
>  		}
> -		WARN_ON(old_pos != new_pos);
>  
> +		old_pos += bytes;
> +		new_pos += bytes;
>  		len -= bytes;
>  	}

Patch does not remove the

	/* FIXME: copy up sparse files efficiently */

comment. Efficient copying of sparse files is something
vfs_copy_file_range() should do internally....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 2/4] ovl: use vfs_clone_file_range() for copy up if possible
  2016-09-13  0:02     ` Dave Chinner
@ 2016-09-13  6:47       ` Amir Goldstein
  0 siblings, 0 replies; 24+ messages in thread
From: Amir Goldstein @ 2016-09-13  6:47 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Miklos Szeredi, linux-unionfs, Christoph Hellwig, linux-xfs,
	Darrick J . Wong, linux-fsdevel

On Tue, Sep 13, 2016 at 3:02 AM, Dave Chinner <david@fromorbit.com> wrote:
> On Mon, Sep 12, 2016 at 06:06:41PM +0300, Amir Goldstein wrote:
>> When copying up within the same fs, try to use vfs_clone_file_range().
>> This is very efficient when lower and upper are on the same fs
>> with file reflink support. If vfs_clone_file_range() fails because
>> lower and upper are not on the same fs or if fs has no reflink support,
>> copy up falls back to the regular data copy code.
>>
>> Tested correct behavior when lower and upper are on:
>> 1. same ext4 (copy)
>> 2. same xfs + reflink patches + mkfs.xfs (copy)
>> 3. same xfs + reflink patches + mkfs.xfs -m reflink=1 (clone)
>> 4. different xfs + reflink patches + mkfs.xfs -m reflink=1 (copy)
>>
>> For comparison, on my laptop, xfstest overlay/001 (copy up of large
>> sparse files) takes less than 1 second in the xfs reflink setup vs.
>> 25 seconds on the rest of the setups.
>>
>> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
>> ---
>>  fs/overlayfs/copy_up.c | 10 ++++++++++
>>  1 file changed, 10 insertions(+)
>>
>> diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
>> index 43fdc27..e432d7e 100644
>> --- a/fs/overlayfs/copy_up.c
>> +++ b/fs/overlayfs/copy_up.c
>> @@ -136,6 +136,16 @@ static int ovl_copy_up_data(struct path *old, struct path *new, loff_t len)
>>               goto out_fput;
>>       }
>>
>> +     /*
>> +      * Try to use clone_file_range to clone up within the same fs.
>> +      * On failure to clone entire file, fallback to copy loop.
>> +      */
>> +     error = vfs_clone_file_range(old_file, 0, new_file, 0, len);
>> +     if (error == -EXDEV || error == -EOPNOTSUPP)
>> +             error = 0;
>> +     else
>> +             len = 0;
>> +
>>       /* FIXME: copy up sparse files efficiently */
>>       while (len) {
>>               size_t this_len = OVL_COPY_UP_CHUNK_SIZE;
>
> That's nasty. Just use a goto to skip the code that doesn't need to
> be run if vfs_clone_file_range() succeeds. This is much better, IMO:

OK, but I prefer to write up explicit comments for all cases.

>
>         if (!error)
>                 goto out_done;
>         if (error != -EXDEV && error != -EOPNOTSUPP)
>                 goto out_error;
>
>         /* Can't clone, so now we try to copy the data */
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 3/4] vfs: allow vfs_copy_file_range() across file systems
  2016-09-13  0:08     ` Dave Chinner
@ 2016-09-13  7:01       ` Amir Goldstein
  0 siblings, 0 replies; 24+ messages in thread
From: Amir Goldstein @ 2016-09-13  7:01 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Miklos Szeredi, linux-unionfs, Christoph Hellwig, linux-xfs,
	Darrick J . Wong, linux-fsdevel

On Tue, Sep 13, 2016 at 3:08 AM, Dave Chinner <david@fromorbit.com> wrote:
> On Mon, Sep 12, 2016 at 06:06:42PM +0300, Amir Goldstein wrote:
>> copy_file_range syscall returns -EXDEV if src and dest
>> file are not on the same file system.
>> The vfs_copy_file_range() helper, however, knows how to copy
>> across file systems with do_splice_direct().
>>
>> Move the enforcement of same file system from the vfs helper
>> to the syscall code.
>>
>> A following patch is going to use the vfs_copy_file_range()
>> helper in overlayfs to copy up between lower and upper
>> not on the same file system.
>>
>> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
>> ---
>>  fs/read_write.c | 15 ++++++++++-----
>>  1 file changed, 10 insertions(+), 5 deletions(-)
>>
>> diff --git a/fs/read_write.c b/fs/read_write.c
>> index 9dc6e52..c4675c6 100644
>> --- a/fs/read_write.c
>> +++ b/fs/read_write.c
>> @@ -1502,10 +1502,6 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
>>           (file_out->f_flags & O_APPEND))
>>               return -EBADF;
>>
>> -     /* this could be relaxed once a method supports cross-fs copies */
>> -     if (inode_in->i_sb != inode_out->i_sb)
>> -             return -EXDEV;
>> -
>>       if (len == 0)
>>               return 0;
>>
>> @@ -1514,7 +1510,8 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
>>               return ret;
>>
>>       ret = -EOPNOTSUPP;
>> -     if (file_out->f_op->copy_file_range)
>> +     if (inode_in->i_sb == inode_out->i_sb &&
>> +                     file_out->f_op->copy_file_range)
>>               ret = file_out->f_op->copy_file_range(file_in, pos_in, file_out,
>>                                                     pos_out, len, flags);
>
> Indenting is wrong, and you dropped an important comment. i.e
> copy_file_range() still doesn't support cross fs copies.....

OK. moved the comment down this this same i_sb test

>
>>       if (ret == -EOPNOTSUPP)
>> @@ -1569,6 +1566,14 @@ SYSCALL_DEFINE6(copy_file_range, int, fd_in, loff_t __user *, off_in,
>>               pos_out = f_out.file->f_pos;
>>       }
>>
>> +     /*
>> +      * FIXME: should copy_file_range syscall enforce that src and
>> +      * dest files are on the same mount point or only on the same
>> +      * file system? none of the above?
>> +      */
>> +     if (file_inode(f_in.file)->i_sb != file_inode(f_out.file)->i_sb)
>> +             return -EXDEV;
>
> For the purposes of this patch, it should simply do what it already
> does. If there's a API semantic change that needs to be made, then
> get that sorted out now rather than adding a "fixme" comment that
> will simply be ignored....

How about:

        /*
         * vfs_copy_file_range() can do cross-fs copy, but we want to
         * fulfill the guaranty to userland that copy_file_range syscall
         * does not allow cross-fs copy
         */

Cheers,
Amir.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 4/4] ovl: use vfs_copy_file_range() to copy up file data
  2016-09-13  0:11     ` Dave Chinner
@ 2016-09-13  7:26       ` Amir Goldstein
  2016-09-14 12:43         ` Amir Goldstein
  0 siblings, 1 reply; 24+ messages in thread
From: Amir Goldstein @ 2016-09-13  7:26 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Miklos Szeredi, linux-unionfs, Christoph Hellwig, linux-xfs,
	Darrick J . Wong, linux-fsdevel

On Tue, Sep 13, 2016 at 3:11 AM, Dave Chinner <david@fromorbit.com> wrote:
> On Mon, Sep 12, 2016 at 06:06:43PM +0300, Amir Goldstein wrote:
>> Use vfs_copy_file_range() helper instead of calling do_splice_direct()
>> when copying up file data.
>> When copying up within the same fs, which supports copy_file_range(),
>> fs implementation can be more efficient then do_splice_direct().
>> vfs_copy_file_range() helper falls back to do_splice_direct() if
>> it cannot use the file system's copy_file_range() implementation.
>>
>> Tested correct behavior when lower and upper are on:
>> 1. same ext4 (copy)
>> 2. same xfs + reflink patches + mkfs.xfs (copy)
>> 3. same xfs + reflink patches + mkfs.xfs -m reflink=1 (clone)
>> 4. different xfs + reflink patches + mkfs.xfs -m reflink=1 (copy)
>>
>> For comparison, on my laptop, xfstest overlay/001 (copy up of large
>> sparse files) takes less than 1 second in the xfs reflink setup vs.
>> 25 seconds on the rest of the setups.
>
> This is now stale, right? the reflink is done from the
> vfs_clone_file_range() call added in an earlier patch, not this
> change....

Well.. that depends on the definition of "now"
as patch 4/4 you are correct, but as I wrote in the cover letter
I tested patches 3,4 independently of patches 1,2
otherwise, patches 3,4 would be adding moot code
or rather a simple re-factoring.

So now it boils down to a process question:
should patches 3,4 be merged, even though no one can properly test the
'fallback to chunk copy range' case
with any of the in-tree filesystems (without modifying them first)?

I could be presenting patch 4 as a simple re-factoring
(use vfs copy range helper instead of using do_splice_direct)
which makes for a cleaner code.
But the fact that it changes behavior in a subtle way
(hopefully for the best) without anyone noticing this change far into
the future is somewhat troubling.

Your thoughts?

>
>> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
>> ---
>>  fs/overlayfs/copy_up.c | 9 +++++----
>>  1 file changed, 5 insertions(+), 4 deletions(-)
>>
>> diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
>> index e432d7e..32ea54f 100644
>> --- a/fs/overlayfs/copy_up.c
>> +++ b/fs/overlayfs/copy_up.c
>> @@ -159,15 +159,16 @@ static int ovl_copy_up_data(struct path *old, struct path *new, loff_t len)
>>                       break;
>>               }
>>
>> -             bytes = do_splice_direct(old_file, &old_pos,
>> -                                      new_file, &new_pos,
>> -                                      this_len, SPLICE_F_MOVE);
>> +             bytes = vfs_copy_file_range(old_file, old_pos,
>> +                                         new_file, new_pos,
>> +                                         this_len, 0);
>>               if (bytes <= 0) {
>>                       error = bytes;
>>                       break;
>>               }
>> -             WARN_ON(old_pos != new_pos);
>>
>> +             old_pos += bytes;
>> +             new_pos += bytes;
>>               len -= bytes;
>>       }
>
> Patch does not remove the
>
>         /* FIXME: copy up sparse files efficiently */
>
> comment. Efficient copying of sparse files is something
> vfs_copy_file_range() should do internally....

Moved the comment into vfs_copy_file_range()

Cheers,
Amir.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 4/4] ovl: use vfs_copy_file_range() to copy up file data
  2016-09-13  7:26       ` Amir Goldstein
@ 2016-09-14 12:43         ` Amir Goldstein
  0 siblings, 0 replies; 24+ messages in thread
From: Amir Goldstein @ 2016-09-14 12:43 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Miklos Szeredi, linux-unionfs, Christoph Hellwig, linux-xfs,
	Darrick J . Wong, linux-fsdevel

On Tue, Sep 13, 2016 at 10:26 AM, Amir Goldstein <amir73il@gmail.com> wrote:
> On Tue, Sep 13, 2016 at 3:11 AM, Dave Chinner <david@fromorbit.com> wrote:
>> On Mon, Sep 12, 2016 at 06:06:43PM +0300, Amir Goldstein wrote:
>>> Use vfs_copy_file_range() helper instead of calling do_splice_direct()
>>> when copying up file data.
>>> When copying up within the same fs, which supports copy_file_range(),
>>> fs implementation can be more efficient then do_splice_direct().
>>> vfs_copy_file_range() helper falls back to do_splice_direct() if
>>> it cannot use the file system's copy_file_range() implementation.
>>>
>>> Tested correct behavior when lower and upper are on:
>>> 1. same ext4 (copy)
>>> 2. same xfs + reflink patches + mkfs.xfs (copy)
>>> 3. same xfs + reflink patches + mkfs.xfs -m reflink=1 (clone)
>>> 4. different xfs + reflink patches + mkfs.xfs -m reflink=1 (copy)
>>>
>>> For comparison, on my laptop, xfstest overlay/001 (copy up of large
>>> sparse files) takes less than 1 second in the xfs reflink setup vs.
>>> 25 seconds on the rest of the setups.
>>
>> This is now stale, right? the reflink is done from the
>> vfs_clone_file_range() call added in an earlier patch, not this
>> change....
>
> Well.. that depends on the definition of "now"
> as patch 4/4 you are correct, but as I wrote in the cover letter
> I tested patches 3,4 independently of patches 1,2
> otherwise, patches 3,4 would be adding moot code
> or rather a simple re-factoring.
>
> So now it boils down to a process question:
> should patches 3,4 be merged, even though no one can properly test the
> 'fallback to chunk copy range' case
> with any of the in-tree filesystems (without modifying them first)?
>
> I could be presenting patch 4 as a simple re-factoring
> (use vfs copy range helper instead of using do_splice_direct)
> which makes for a cleaner code.
> But the fact that it changes behavior in a subtle way
> (hopefully for the best) without anyone noticing this change far into
> the future is somewhat troubling.
>
> Your thoughts?
>

Anyway, I added a comment in commit message about how I tested this change
(by removing the vfs_clone_file_range() call). resending...

>>
>>> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
>>> ---
>>>  fs/overlayfs/copy_up.c | 9 +++++----
>>>  1 file changed, 5 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
>>> index e432d7e..32ea54f 100644
>>> --- a/fs/overlayfs/copy_up.c
>>> +++ b/fs/overlayfs/copy_up.c
>>> @@ -159,15 +159,16 @@ static int ovl_copy_up_data(struct path *old, struct path *new, loff_t len)
>>>                       break;
>>>               }
>>>
>>> -             bytes = do_splice_direct(old_file, &old_pos,
>>> -                                      new_file, &new_pos,
>>> -                                      this_len, SPLICE_F_MOVE);
>>> +             bytes = vfs_copy_file_range(old_file, old_pos,
>>> +                                         new_file, new_pos,
>>> +                                         this_len, 0);
>>>               if (bytes <= 0) {
>>>                       error = bytes;
>>>                       break;
>>>               }
>>> -             WARN_ON(old_pos != new_pos);
>>>
>>> +             old_pos += bytes;
>>> +             new_pos += bytes;
>>>               len -= bytes;
>>>       }
>>
>> Patch does not remove the
>>
>>         /* FIXME: copy up sparse files efficiently */
>>
>> comment. Efficient copying of sparse files is something
>> vfs_copy_file_range() should do internally....
>
> Moved the comment into vfs_copy_file_range()
>
> Cheers,
> Amir.

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2016-09-14 12:43 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-09-08 15:29 [PATCH] ovl: use copy_file_range for copy up if possible Amir Goldstein
2016-09-08 20:25 ` Dave Chinner
2016-09-09  7:31   ` Amir Goldstein
2016-09-09  7:54     ` Dave Chinner
2016-09-09  8:27       ` Amir Goldstein
2016-09-09 23:52         ` Dave Chinner
2016-09-10  7:40           ` Christoph Hellwig
2016-09-10 18:15             ` Amir Goldstein
2016-09-10 18:54           ` Amir Goldstein
2016-09-11 22:11             ` Dave Chinner
2016-09-12  6:52               ` Amir Goldstein
2016-09-12 15:37                 ` Amir Goldstein
2016-09-12 15:06 ` [PATCH v2 0/4] ovl: efficient copy up by reflink Amir Goldstein
2016-09-12 15:06   ` [PATCH v2 1/4] vfs: allow vfs_clone_file_range() across mount points Amir Goldstein
2016-09-12 15:06   ` [PATCH v2 2/4] ovl: use vfs_clone_file_range() for copy up if possible Amir Goldstein
2016-09-13  0:02     ` Dave Chinner
2016-09-13  6:47       ` Amir Goldstein
2016-09-12 15:06   ` [PATCH v2 3/4] vfs: allow vfs_copy_file_range() across file systems Amir Goldstein
2016-09-13  0:08     ` Dave Chinner
2016-09-13  7:01       ` Amir Goldstein
2016-09-12 15:06   ` [PATCH v2 4/4] ovl: use vfs_copy_file_range() to copy up file data Amir Goldstein
2016-09-13  0:11     ` Dave Chinner
2016-09-13  7:26       ` Amir Goldstein
2016-09-14 12:43         ` Amir Goldstein

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.