From: Anna Schumaker <Anna.Schumaker@netapp.com>
To: Andreas Dilger <adilger@dilger.ca>,
	"Darrick J. Wong" <darrick.wong@oracle.com>
Cc: <linux-nfs@vger.kernel.org>, <linux-btrfs@vger.kernel.org>,
	<linux-fsdevel@vger.kernel.org>, <linux-api@vger.kernel.org>,
	<zab@zabbo.net>, <viro@zeniv.linux.org.uk>, <clm@fb.com>,
	<mtk.manpages@gmail.com>, <andros@netapp.com>,
	<hch@infradead.org>
Subject: Re: [PATCH v1 9/8] copy_file_range.2: New page documenting copy_file_range()
Date: Tue, 8 Sep 2015 11:05:18 -0400
Message-ID: <55EEF92E.2090201@Netapp.com>
In-Reply-To: <95674806-645C-410C-8A4B-A46F03AFFE20@dilger.ca>

On 09/04/2015 06:31 PM, Andreas Dilger wrote:
> On Sep 4, 2015, at 3:38 PM, Darrick J. Wong <darrick.wong@oracle.com> wrote:
>>
>> On Fri, Sep 04, 2015 at 04:17:03PM -0400, Anna Schumaker wrote:
>>> copy_file_range() is a new system call for copying ranges of data
>>> completely in the kernel.  This gives filesystems an opportunity to
>>> implement some kind of "copy acceleration", such as reflinks or
>>> server-side-copy (in the case of NFS).
>>>
>>> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
>>> ---
>>> man2/copy_file_range.2 | 168 +++++++++++++++++++++++++++++++++++++++++++++++++
>>> 1 file changed, 168 insertions(+)
>>> create mode 100644 man2/copy_file_range.2
>>>
>>> diff --git a/man2/copy_file_range.2 b/man2/copy_file_range.2
>>> new file mode 100644
>>> index 0000000..4a4cb73
>>> --- /dev/null
>>> +++ b/man2/copy_file_range.2
>>> @@ -0,0 +1,168 @@
>>> +.\"This manpage is Copyright (C) 2015 Anna Schumaker <Anna.Schumaker@Netapp.com>
>>> +.TH COPY 2 2015-8-31 "Linux" "Linux Programmer's Manual"
>>> +.SH NAME
>>> +copy_file_range \- Copy a range of data from one file to another
>>> +.SH SYNOPSIS
>>> +.nf
>>> +.B #include <linux/copy.h>
>>> +.B #include <sys/syscall.h>
>>> +.B #include <unistd.h>
>>> +
>>> +.BI "ssize_t syscall(__NR_copy_file_range, int " fd_in ", loff_t * " off_in ",
>>> +.BI "                int " fd_out ", loff_t * " off_out ", size_t " len ",
>>> +.BI "                unsigned int " flags );
>>> +.fi
>>> +.SH DESCRIPTION
>>> +The
>>> +.BR copy_file_range ()
>>> +system call performs an in-kernel copy between two file descriptors
>>> +without all that tedious mucking about in userspace.
>>
>> ;)
>>
>>> +It copies up to
>>> +.I len
>>> +bytes of data from file descriptor
>>> +.I fd_in
>>> +to file descriptor
>>> +.I fd_out
>>> +at
>>> +.IR off_out .
>>> +The file descriptors must not refer to the same file.
>>
>> Why?  btrfs (and XFS) reflink can handle the case of a file sharing blocks
>> with itself.
>>
>>> +
>>> +The following semantics apply for
>>> +.IR fd_in ,
>>> +and similar statements apply to
>>> +.IR off_out :
>>> +.IP * 3
>>> +If
>>> +.I off_in
>>> +is NULL, then bytes are read from
>>> +.I fd_in
>>> +starting from the current file offset and the current
>>> +file offset is adjusted appropriately.
>>> +.IP *
>>> +If
>>> +.I off_in
>>> +is not NULL, then
>>> +.I off_in
>>> +must point to a buffer that specifies the starting
>>> +offset where bytes from
>>> +.I fd_in
>>> +will be read.  The current file offset of
>>> +.I fd_in
>>> +is not changed, but
>>> +.I off_in
>>> +is adjusted appropriately.
>>> +.PP
>>> +The default behavior of
>>> +.BR copy_file_range ()
>>> +is filesystem specific, and might result in creating a
>>> +copy-on-write reflink.
>>> +In the event that a given filesystem does not implement
>>> +any form of copy acceleration, the kernel will perform
>>> +a deep copy of the requested range by reading bytes from
>>
>> I wonder if it's wise to allow deep copies -- what happens if
>> len == 1T? Will this syscall just block for a really long time?
> 
> It should be interruptible, and return the length of the number of
> bytes copied so far, just like read() and write().  That allows
> the caller to continue where it left off, or abort and delete the
> target file, or whatever it wants to do.

We already return the number of bytes copied so far, so I'll look into making it interruptible!

Thanks,
Anna

> 
> Cheers, Andreas
> 
>>> +.I fd_in
>>> +and writing them to
>>> +.IR fd_out .
>>
>> "...if COPY_REFLINK is not set in flags."
>>
>>> +
>>> +Currently, Linux only supports the following flag:
>>> +.TP 1.9i
>>> +.B COPY_REFLINK
>>> +Only perform the copy if the filesystem can do it as a reflink.
>>> +Do not fall back on performing a deep copy.
>>> +.SH RETURN VALUE
>>> +Upon successful completion,
>>> +.BR copy_file_range ()
>>> +will return the number of bytes copied between files.
>>> +This could be less than the length originally requested.
>>> +
>>> +On error,
>>> +.BR copy_file_range ()
>>> +returns \-1 and
>>> +.I errno
>>> +is set to indicate the error.
>>> +.SH ERRORS
>>> +.TP
>>> +.B EBADF
>>> +One or more file descriptors are not valid,
>>> +or do not have proper read-write mode.
>>
>> "or fd_out is not opened for writing"?
>>
>>> +.TP
>>> +.B EINVAL
>>> +Requested range extends beyond the end of the file;
>>> +.I flags
>>> +argument is set to an invalid value.
>>> +.TP
>>> +.B EOPNOTSUPP
>>> +.B COPY_REFLINK
>>> +was specified in
>>> +.IR flags ,
>>> +but the target filesystem does not support reflinks.
>>> +.TP
>>> +.B EXDEV
>>> +Target filesystem doesn't support cross-filesystem copies.
>>> +.SH VERSIONS
>>
>> Perhaps this ought to list a few more errors (EIO, ENOSPC, ENOSYS, EPERM...)
>> that can be returned?  (I was looking at the fallocate manpage.)
>>
>> --D
>>
>>> +The
>>> +.BR copy_file_range ()
>>> +system call first appeared in Linux 4.3.
>>> +.SH CONFORMING TO
>>> +The
>>> +.BR copy_file_range ()
>>> +system call is a nonstandard Linux extension.
>>> +.SH EXAMPLE
>>> +.nf
>>> +
>>> +#define _GNU_SOURCE
>>> +#include <fcntl.h>
>>> +#include <linux/copy.h>
>>> +#include <stdio.h>
>>> +#include <stdlib.h>
>>> +#include <sys/stat.h>
>>> +#include <sys/syscall.h>
>>> +#include <unistd.h>
>>> +
>>> +
>>> +int main(int argc, char **argv)
>>> +{
>>> +    int fd_in, fd_out;
>>> +    struct stat stat;
>>> +    loff_t len, ret;
>>> +
>>> +    if (argc != 3) {
>>> +        fprintf(stderr, "Usage: %s <pathname> <pathname>\n", argv[0]);
>>> +        exit(EXIT_FAILURE);
>>> +    }
>>> +
>>> +    fd_in = open(argv[1], O_RDONLY);
>>> +    if (fd_in == -1) {
>>> +        perror("open (argv[1])");
>>> +        exit(EXIT_FAILURE);
>>> +    }
>>> +
>>> +    if (fstat(fd_in, &stat) == -1) {
>>> +        perror("fstat");
>>> +        exit(EXIT_FAILURE);
>>> +    }
>>> +    len = stat.st_size;
>>> +
>>> +    fd_out = open(argv[2], O_WRONLY | O_CREAT, 0644);
>>> +    if (fd_out == -1) {
>>> +        perror("open (argv[2])");
>>> +        exit(EXIT_FAILURE);
>>> +    }
>>> +
>>> +    do {
>>> +        ret = syscall(__NR_copy_file_range, fd_in, NULL,
>>> +                      fd_out, NULL, len, 0);
>>> +        if (ret == -1) {
>>> +            perror("copy_file_range");
>>> +            exit(EXIT_FAILURE);
>>> +        }
>>> +
>>> +        len -= ret;
>>> +    } while (len > 0);
>>> +
>>> +    close(fd_in);
>>> +    close(fd_out);
>>> +    exit(EXIT_SUCCESS);
>>> +}
>>> +.fi
>>> +.SH SEE ALSO
>>> +.BR splice (2)
>>> -- 
>>> 2.5.1
>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 
> Cheers, Andreas
> 
> 
> 
> 
> 
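
Editorial sketch of the caller-side pattern discussed above, assuming the call ends up behaving the way Andreas describes: a short count when it stops early, and -1 with EINTR when a signal arrives before anything was copied. Only the raw syscall invocation follows the patch; copy_step_fn, syscall_step, fake_step and copy_all are hypothetical names introduced for illustration, and fake_step merely simulates short copies and one interruption so the loop can be exercised without kernel support.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* One copy attempt: bytes copied, 0 at end of input, or -1 with errno set. */
typedef ssize_t (*copy_step_fn)(int fd_in, int fd_out, size_t len);

#ifdef __NR_copy_file_range
/* Step backed by the raw syscall, with the argument order from the patch.
 * NULL offsets mean "use and advance each descriptor's file offset". */
ssize_t syscall_step(int fd_in, int fd_out, size_t len)
{
    return syscall(__NR_copy_file_range, fd_in, NULL, fd_out, NULL, len, 0);
}
#endif

/* Hypothetical stand-in: copies at most 4 bytes per call and reports
 * EINTR on its second invocation. It never touches the descriptors. */
int fake_calls;

ssize_t fake_step(int fd_in, int fd_out, size_t len)
{
    (void)fd_in;
    (void)fd_out;
    if (++fake_calls == 2) {
        errno = EINTR;
        return -1;
    }
    return len < 4 ? (ssize_t)len : 4;
}

/* Copy up to len bytes, continuing after short copies and EINTR.
 * Returns the total copied (possibly less than len if the input ends
 * early), or -1 with errno set on any other error. */
ssize_t copy_all(copy_step_fn step, int fd_in, int fd_out, size_t len)
{
    size_t done = 0;

    assert(step != NULL);
    while (done < len) {
        ssize_t n = step(fd_in, fd_out, len - done);

        if (n == -1) {
            if (errno == EINTR)
                continue;   /* interrupted before copying: retry */
            return -1;      /* real error: caller may abort or clean up */
        }
        if (n == 0)
            break;          /* end of input: avoid spinning forever */
        done += (size_t)n;
    }
    return (ssize_t)done;
}
```

A caller would pass syscall_step where the kernel headers define __NR_copy_file_range, for example copy_all(syscall_step, fd_in, fd_out, len). Stopping on a zero return is a deliberate choice in this sketch: it keeps the loop from spinning if the source file shrinks after its size was read with fstat().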


