* copy_file_range and user space tools to do copy fastest
@ 2018-04-27 18:25 Steve French
  2018-04-27 19:45 ` Andreas Dilger
From: Steve French @ 2018-04-27 18:25 UTC
  To: linux-fsdevel; +Cc: samba-technical, CIFS, LKML

Are there any user space tools (other than our test tools, xfs_io,
etc.) that support copy_file_range?  It looks like at least cp, rsync
and dd don't.  That syscall, which has now been around for a couple of
years and which we were reminded about at the LSF/MM summit a few days
ago, is presumably the 'best' way to copy a file fast, since it tries
all the mechanisms (reflink etc.) in order.
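A minimal sketch of calling the syscall for a whole file, assuming Linux 4.5 or later (where copy_file_range first appeared) and glibc 2.27 or later (which provides the wrapper). The helper name copy_whole_file() is hypothetical. Which mechanism the kernel actually uses (clone, server-side copy, or an in-kernel copy) depends on the kernel version and the file systems involved, and this simple loop makes no attempt to preserve holes.

```c
#define _GNU_SOURCE             /* copy_file_range() needs glibc >= 2.27 */
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: copy from in_fd's current offset to EOF into out_fd
 * at its current offset.  NULL offset pointers make the kernel use and
 * advance each descriptor's file offset.  Holes are not preserved.
 * Returns 0 on success, -1 with errno set on failure.
 */
int copy_whole_file(int in_fd, int out_fd)
{
    assert(in_fd >= 0 && out_fd >= 0);
    for (;;) {
        /* Ask for a large chunk; a short positive return is normal. */
        ssize_t n = copy_file_range(in_fd, NULL, out_fd, NULL,
                                    (size_t)1 << 30, 0);
        if (n == 0)
            return 0;           /* reached EOF on the source */
        if (n < 0 && errno != EINTR)
            return -1;          /* e.g. ENOSYS, EXDEV, EOPNOTSUPP */
    }
}
```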

Since the copy_file_range syscall can be 100x or more faster than the
alternative for network file systems, I was surprised when I noticed
that cp and rsync didn't support it.  It doesn't look like rsync even
supports reflink (although presumably if you call copy_file_range you
don't have to worry about that), and its reads/writes are 8K.  See
copy_file() in rsync/util.c.
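For comparison, a sketch of the conventional user-space approach: a read()/write() loop with an 8 KiB buffer. It only illustrates the pattern described above and is not rsync's actual copy_file(); copy_read_write() and COPY_BUF_SIZE are hypothetical names. On a network file system every chunk travels from the server to the client and back again.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define COPY_BUF_SIZE 8192      /* hypothetical: the 8K size mentioned above */

/*
 * Plain user-space copy: each chunk is read into a buffer and written back
 * out, so on a network file system the data goes server -> client ->
 * server.  Returns 0 on success, -1 with errno set on failure.
 */
int copy_read_write(int in_fd, int out_fd)
{
    char buf[COPY_BUF_SIZE];

    assert(in_fd >= 0 && out_fd >= 0);
    for (;;) {
        ssize_t n = read(in_fd, buf, sizeof buf);
        if (n == 0)
            return 0;                   /* EOF */
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        /* write() can also be short, so drain the whole buffer. */
        for (ssize_t done = 0; done < n; ) {
            ssize_t w = write(out_fd, buf + done, (size_t)(n - done));
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            done += w;
        }
    }
}
```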

In the cp command it looks like it can call the FICLONE IOCTL (see
clone_file() in coreutils/src/copy.c) but doesn't call the expected
"copy_file_range" syscall.

The dd command doesn't call either one; see dd_copy() in coreutils/src/dd.c.
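A sketch of the clone-then-copy order discussed above, assuming kernel headers (4.5 or later) that define FICLONE in <linux/fs.h>. The ioctl is issued on the destination with the source descriptor as its argument, which is how coreutils' clone_file() calls it; the fallback to copy_file_range() is the step the message says cp lacks. clone_or_copy() is a hypothetical name.

```c
#define _GNU_SOURCE             /* copy_file_range() needs glibc >= 2.27 */
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <unistd.h>
#include <linux/fs.h>           /* FICLONE, in kernel headers >= 4.5 */

/*
 * Hypothetical helper: try a whole-file reflink first; if the file system
 * refuses, fall back to copy_file_range(), which may itself clone, offload
 * the copy to the server, or copy inside the kernel.
 * Returns 0 on success, -1 with errno set on failure.
 */
int clone_or_copy(int in_fd, int out_fd)
{
    assert(in_fd >= 0 && out_fd >= 0);
#ifdef FICLONE
    /* Argument order: ioctl on the destination, source fd as the arg. */
    if (ioctl(out_fd, FICLONE, in_fd) == 0)
        return 0;               /* extents now shared; no data was copied */
#endif
    /* Reflink unavailable (e.g. EOPNOTSUPP, EXDEV): copy from offsets. */
    for (;;) {
        ssize_t n = copy_file_range(in_fd, NULL, out_fd, NULL,
                                    (size_t)1 << 30, 0);
        if (n == 0)
            return 0;           /* EOF on the source */
        if (n < 0 && errno != EINTR)
            return -1;
    }
}
```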

Since calling copy_file_range can in some cases be 100x or more faster
than doing reads and writes back and forth to copy a file (especially
with a network, clustered or cloud backend), which tools are the best
to recommend?

Would rsync or cp be likely to take patches to call the standard
"copy_file_range" syscall
(http://man7.org/linux/man-pages/man2/copy_file_range.2.html)?
Presumably not if it has been two+ years ... but I would be interested
in which copy tools to recommend instead.

These are not uncommon cases (all Windows, Mac and Samba servers, etc.,
and even some NFS servers), but copies on local file systems can
benefit too (as copy_file_range tries various mechanisms).
-- 
Thanks,

Steve

* Re: copy_file_range and user space tools to do copy fastest
  2018-04-27 18:25 copy_file_range and user space tools to do copy fastest Steve French
@ 2018-04-27 19:45 ` Andreas Dilger
  2018-04-27 23:41   ` Eric Biggers
From: Andreas Dilger @ 2018-04-27 19:45 UTC
  To: Steve French; +Cc: linux-fsdevel, samba-technical, CIFS, LKML

On Apr 27, 2018, at 12:25 PM, Steve French <smfrench@gmail.com> wrote:
> Would rsync or cp be likely to take patches to call the standard
> "copy_file_range" syscall
> (http://man7.org/linux/man-pages/man2/copy_file_range.2.html)?
> Presumably not if it has been two+ years ... but I would be interested
> in which copy tools to recommend instead.

I would start by submitting a patch to coreutils, if you can figure
out that code well enough to do so (I find it quite opaque).  Since the
syscall has been in the kernel for a while already, it should be
acceptable to the upstream coreutils maintainers to use this interface,
doubly so if you include some benchmarks showing CIFS/NFS clients
avoiding network overhead during the copy.

Cheers, Andreas

* Re: copy_file_range and user space tools to do copy fastest
  2018-04-27 19:45 ` Andreas Dilger
@ 2018-04-27 23:41   ` Eric Biggers
  2018-04-28  5:18     ` Andreas Dilger
From: Eric Biggers @ 2018-04-27 23:41 UTC
  To: Andreas Dilger; +Cc: Steve French, linux-fsdevel, samba-technical, CIFS, LKML

On Fri, Apr 27, 2018 at 01:45:40PM -0600, Andreas Dilger wrote:
> I would start by submitting a patch to coreutils, if you can figure
> out that code well enough to do so (I find it quite opaque).  Since the
> syscall has been in the kernel for a while already, it should be
> acceptable to the upstream coreutils maintainers to use this interface,
> doubly so if you include some benchmarks showing CIFS/NFS clients
> avoiding network overhead during the copy.
> 

For cp (coreutils), apparently there was a concern that copy_file_range()
expands holes (i.e. a sparse source ends up as a fully allocated copy);
see the thread at
https://lists.gnu.org/archive/html/bug-coreutils/2016-09/msg00020.html.
Though I'd think it could just be used on non-holes only.  And I don't think
the size_t type of 'len' is a problem either, since it's the copy length, not
the file size.  You just call it multiple times if the file is larger.
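Eric's point about 'len' can be sketched as follows: the range is described with off_t offsets, while each call's size_t length is capped, so a large range simply takes several calls. copy_range() and the 1 GiB cap are assumptions for illustration. Passing explicit offsets by pointer lets the kernel advance them while leaving both descriptors' own file offsets unchanged.

```c
#define _GNU_SOURCE             /* copy_file_range() needs glibc >= 2.27 */
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: copy bytes [start, end) of in_fd to the same
 * offsets in out_fd, splitting the range into size_t-sized calls.  The
 * kernel advances off_in/off_out; the fds' file offsets are untouched.
 * Returns 0 on success, -1 with errno set on failure.
 */
int copy_range(int in_fd, int out_fd, off_t start, off_t end)
{
    const off_t max_chunk = (off_t)1 << 30;     /* arbitrary 1 GiB cap */
    off_t off_in = start, off_out = start;

    assert(start >= 0 && start <= end);
    while (off_in < end) {
        off_t left = end - off_in;
        size_t len = (size_t)(left < max_chunk ? left : max_chunk);
        ssize_t n = copy_file_range(in_fd, &off_in, out_fd, &off_out,
                                    len, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)
            break;              /* source ended before 'end' */
    }
    return 0;
}
```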

Eric

* Re: copy_file_range and user space tools to do copy fastest
  2018-04-27 23:41   ` Eric Biggers
@ 2018-04-28  5:18     ` Andreas Dilger
  2018-04-28  5:26       ` Steve French
From: Andreas Dilger @ 2018-04-28  5:18 UTC
  To: Eric Biggers; +Cc: Steve French, linux-fsdevel, samba-technical, CIFS, LKML

On Apr 27, 2018, at 5:41 PM, Eric Biggers <ebiggers3@gmail.com> wrote:
> For cp (coreutils), apparently there was a concern that copy_file_range()
> expands holes (i.e. a sparse source ends up as a fully allocated copy);
> see the thread at
> https://lists.gnu.org/archive/html/bug-coreutils/2016-09/msg00020.html.
> Though I'd think it could just be used on non-holes only.  And I don't think
> the size_t type of 'len' is a problem either, since it's the copy length, not
> the file size.  You just call it multiple times if the file is larger.

I think cp is already using SEEK_HOLE/SEEK_DATA and/or FIEMAP to determine
the mapped and sparse segments of the file, so it should be practical to
use copy_file_range() in conjunction with these to copy only the allocated
parts of the file.
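A sketch of the approach Andreas describes, using lseek() with SEEK_DATA/SEEK_HOLE (not FIEMAP) to find the allocated segments and copying only those with copy_file_range(). copy_sparse() is a hypothetical name. On a file system without hole tracking, SEEK_DATA/SEEK_HOLE report the whole file as one data segment, so the function degrades to a full copy; the final ftruncate() keeps a trailing hole.

```c
#define _GNU_SOURCE             /* SEEK_DATA/SEEK_HOLE, copy_file_range() */
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical sparse-aware copy: find each data segment of in_fd with
 * SEEK_DATA/SEEK_HOLE, copy only that segment, and leave the gaps as holes
 * in out_fd.  Note that lseek() moves in_fd's file offset.
 * Returns 0 on success, -1 with errno set on failure.
 */
int copy_sparse(int in_fd, int out_fd)
{
    const off_t max_chunk = (off_t)1 << 30;     /* arbitrary 1 GiB cap */
    struct stat st;
    off_t data = 0;

    assert(in_fd >= 0 && out_fd >= 0);
    if (fstat(in_fd, &st) < 0)
        return -1;

    while (data < st.st_size) {
        data = lseek(in_fd, data, SEEK_DATA);
        if (data < 0) {
            if (errno == ENXIO)
                break;          /* only a trailing hole is left */
            return -1;
        }
        off_t hole = lseek(in_fd, data, SEEK_HOLE);
        if (hole < 0)
            return -1;

        /* Copy [data, hole) to the same offsets in the destination. */
        off_t off_in = data, off_out = data;
        while (off_in < hole) {
            off_t left = hole - off_in;
            size_t len = (size_t)(left < max_chunk ? left : max_chunk);
            ssize_t n = copy_file_range(in_fd, &off_in, out_fd, &off_out,
                                        len, 0);
            if (n < 0 && errno == EINTR)
                continue;
            if (n < 0)
                return -1;
            if (n == 0)
                break;          /* source shrank underneath us */
        }
        data = hole;
    }
    /* Set the final size so that a trailing hole is preserved too. */
    return ftruncate(out_fd, st.st_size);
}
```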

Cheers, Andreas

* Re: copy_file_range and user space tools to do copy fastest
  2018-04-28  5:18     ` Andreas Dilger
@ 2018-04-28  5:26       ` Steve French
  2018-04-28 13:59         ` Goldwyn Rodrigues
From: Steve French @ 2018-04-28  5:26 UTC
  To: Andreas Dilger; +Cc: Eric Biggers, linux-fsdevel, samba-technical, CIFS, LKML

On Sat, Apr 28, 2018 at 12:18 AM, Andreas Dilger <adilger@dilger.ca> wrote:
> I think cp is already using SEEK_HOLE/SEEK_DATA and/or FIEMAP to determine
> the mapped and sparse segments of the file, so it should be practical to
> use copy_file_range() in conjunction with these to copy only the allocated
> parts of the file.

For the case where clone/reflink or copy_file_range is supported, is
there any reason not to send the request to copy the whole file?
Presumably long timeouts/errors might be a concern, but that could
happen with ranges too.  In any case, if sent a whole-file copy
request, the server file system can figure out the holes and copy
more efficiently.

In the case where it is copying local to remote or remote to local,
figuring out whether the file is sparse and optimizing makes a lot of
sense, but I didn't think cp did that (at least not in the sections
of code I was looking at).



-- 
Thanks,

Steve

* Re: copy_file_range and user space tools to do copy fastest
  2018-04-28  5:26       ` Steve French
@ 2018-04-28 13:59         ` Goldwyn Rodrigues
From: Goldwyn Rodrigues @ 2018-04-28 13:59 UTC
  To: Steve French, Andreas Dilger
  Cc: Eric Biggers, linux-fsdevel, samba-technical, CIFS, LKML



On 04/28/2018 12:26 AM, Steve French wrote:
> For the case where clone/reflink or copy_file_range is supported, is
> there any reason not to send the request to copy the whole file?
> Presumably long timeouts/errors might be a concern, but that could
> happen with ranges too.  In any case, if sent a whole-file copy
> request, the server file system can figure out the holes and copy
> more efficiently.
>
> In the case where it is copying local to remote or remote to local,
> figuring out whether the file is sparse and optimizing makes a lot of
> sense, but I didn't think cp did that (at least not in the sections
> of code I was looking at).

cp does check for sparse files and tries to recreate them depending on
the --sparse=WHEN option; check the make_holes variable in copy.c.
However, we could still use copy_file_range() when make_holes is false,
and close on success.  You would have to be careful, though, to check
whether the return value is positive but less than len (a short copy)
and act accordingly.
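A sketch of the short-copy handling Goldwyn mentions, for the case where holes are not being recreated: a positive return smaller than the requested length just means "ask for the rest". The fallback to read()/write(), and the exact set of errno values treated as "unsupported", are assumptions of this sketch rather than what coreutils does; copy_with_fallback() and cfr_unsupported() are hypothetical. Because NULL offsets are used, any bytes the kernel already copied have advanced both file offsets, so the fallback resumes in the right place.

```c
#define _GNU_SOURCE             /* copy_file_range() needs glibc >= 2.27 */
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumed set of errors meaning "copy_file_range() can't do this copy". */
static int cfr_unsupported(int err)
{
    return err == ENOSYS || err == EXDEV || err == EOPNOTSUPP ||
           err == EINVAL || err == EPERM;
}

/*
 * Hypothetical helper: copy in_fd to out_fd from their current offsets,
 * preferring copy_file_range() and falling back to read()/write().
 * Returns 0 on success, -1 with errno set on failure.
 */
int copy_with_fallback(int in_fd, int out_fd)
{
    assert(in_fd >= 0 && out_fd >= 0);
    for (;;) {
        ssize_t n = copy_file_range(in_fd, NULL, out_fd, NULL,
                                    (size_t)1 << 30, 0);
        if (n > 0)
            continue;           /* maybe a short copy: ask for the rest */
        if (n == 0)
            return 0;           /* EOF: the whole file was copied */
        if (errno == EINTR)
            continue;
        if (!cfr_unsupported(errno))
            return -1;
        break;                  /* fall back to a user-space copy */
    }

    char buf[65536];
    for (;;) {
        ssize_t r = read(in_fd, buf, sizeof buf);
        if (r == 0)
            return 0;
        if (r < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        for (ssize_t done = 0; done < r; ) {    /* handle short writes */
            ssize_t w = write(out_fd, buf + done, (size_t)(r - done));
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            done += w;
        }
    }
}
```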

-- 
Goldwyn

Thread overview: 6+ messages
2018-04-27 18:25 copy_file_range and user space tools to do copy fastest Steve French
2018-04-27 19:45 ` Andreas Dilger
2018-04-27 23:41   ` Eric Biggers
2018-04-28  5:18     ` Andreas Dilger
2018-04-28  5:26       ` Steve French
2018-04-28 13:59         ` Goldwyn Rodrigues
