All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com>
To: Anna Schumaker <Anna.Schumaker@netapp.com>,
	linux-nfs@vger.kernel.org, linux-btrfs@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org,
	zab@zabbo.net, viro@zeniv.linux.org.uk, clm@fb.com,
	darrick.wong@oracle.com, andros@netapp.com, hch@infradead.org
Cc: mtk.manpages@gmail.com
Subject: Re: [PATCH v2 10/9] copy_file_range.2: New page documenting copy_file_range()
Date: Sun, 13 Sep 2015 09:50:18 +0200	[thread overview]
Message-ID: <55F52ABA.9070908@gmail.com> (raw)
In-Reply-To: <1442003423-6884-11-git-send-email-Anna.Schumaker@Netapp.com>

Hi Anna,

On 09/11/2015 10:30 PM, Anna Schumaker wrote:
> copy_file_range() is a new system call for copying ranges of data
> completely in the kernel.  This gives filesystems an opportunity to
> implement some kind of "copy acceleration", such as reflinks or
> server-side-copy (in the case of NFS).
> 
> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

Thanks for writing such a nice first draft man page! I have a few
comments below. Would you be willing to revise and resubmit?

> ---
>  man2/copy_file_range.2 | 188 +++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 188 insertions(+)
>  create mode 100644 man2/copy_file_range.2
> 
> diff --git a/man2/copy_file_range.2 b/man2/copy_file_range.2
> new file mode 100644
> index 0000000..84912b5
> --- /dev/null
> +++ b/man2/copy_file_range.2
> @@ -0,0 +1,188 @@
> +.\"This manpage is Copyright (C) 2015 Anna Schumaker <Anna.Schumaker@Netapp.com>

We need a license for this page. Please see
https://www.kernel.org/doc/man-pages/licenses.html

> +.TH COPY 2 2015-8-31 "Linux" "Linux Programmer's Manual"

Make the month 2 digits (leading 0).

> +.SH NAME
> +copy_file_range \- Copy a range of data from one file to another
> +.SH SYNOPSIS
> +.nf
> +.B #include <linux/copy.h>
> +.B #include <sys/syscall.h>
> +.B #include <unistd.h>
> +
> +.BI "ssize_t syscall(__NR_copy_file_range, int " fd_in ", loff_t * " off_in ",
> +.BI "                int " fd_out ", loff_t * " off_out ", size_t " len ",
> +.BI "                unsigned int " flags );

Remove spaces following "*" in the lines above. (man-pages convention for code)

I know that the copy_file_range() (obviously) doesn't yet have a wrapper
in glibc, but in the man pages we document all system calls as though there
is a wrapper. See, for example, gettid(2), for an axample of how this is done
(a note in the SYNOPSIS and then some further details in NOTES).

> +.fi
> +.SH DESCRIPTION
> +The
> +.BR copy_file_range ()
> +system call performs an in-kernel copy between two file descriptors
> +without all that tedious mucking about in userspace.

I'd write that last piece as:

"without the cost of (a loop) transferring data from the kernel to a 
user-space buffer and then back to the kernel again.

> +It copies up to
> +.I len
> +bytes of data from file descriptor
> +.I fd_in
> +to file descriptor
> +.IR fd_out ,
> +overwriting any data that exists within the requested range.

s/.$/ of the target file./

> +
> +The following semantics apply for
> +.IR off_in ,
> +and similar statements apply to
> +.IR off_out :
> +.IP * 3
> +If
> +.I off_in
> +is NULL, then bytes are read from
> +.I fd_in
> +starting from the current file offset and the current
> +file offset is adjusted appropriately.
> +.IP *
> +If
> +.I off_in
> +is not NULL, then
> +.I off_in
> +must point to a buffer that specifies the starting
> +offset where bytes from
> +.I fd_in
> +will be read.  The current file offset of
> +.I fd_in
> +is not changed, but
> +.I off_in
> +is adjusted appropriately.
> +.PP
> +
> +The
> +.I flags
> +argument is a bit mask composed by OR-ing together zero
> +or more of the following flags:
> +.TP 1.9i
> +.B COPY_FR_COPY
> +Copy all the file data in the requested range.
> +Some filesystems, like NFS, might be able to accelerate this copy
> +to avoid unnecessary data transfers.
> +.TP
> +.B COPY_FR_REFLINK
> +Create a lightweight "reflink", where data is not copied until
> +one of the files is modified.
> +.PP
> +The default behavior
> +.RI ( flags
> +== 0) is to try creating a reflink,
> +and if reflinking fails
> +.BR copy_file_range ()
> +will fall back on performing a full data copy.

s/back on/back to/

> +This is equivalent to setting
> +.I flags
> +equal to
> +.RB ( COPY_FR_COPY | COPY_FR_REFLINK ).

So, from an API deign perspective, the interoperation of these two
flags appears odd. Bit flags are commonly (not always) orthogonal.
I think here it could be pointed out a little more explicitly that
these two flags are not orthogonal. In particular, perhaps after the
last sentence, you could add another sentence:

[[
(This contrasts with specifying
.I flags
as just
.BR COPY_FR_REFLINK ,
which causes the call to create a reflink,
and fail if that is not possible,
rather than falling back to a full data copy.)
]]

Furthermore, I even wonder if explicitly specifying flags as
COPY_FR_COPY | COPY_FR_REFLINK should just generate an EINVAL
error. 0 already gives us the behavior described above,
and allowing the combination COPY_FR_COPY | COPY_FR_REFLINK
perhaps just contributes to misleading the user that these
flags are orthogonal, when in reality they are not. What do
you think?

What are the semantics with respect to signals, especially if data 
copying a very large file range? This info should be included in the 
man page, probably under NOTES.

> +.SH RETURN VALUE
> +Upon successful completion,
> +.BR copy_file_range ()
> +will return the number of bytes copied between files.
> +This could be less than the length originally requested.
> +
> +On error,
> +.BR copy_file_range ()
> +returns \-1 and
> +.I errno
> +is set to indicate the error.
> +.SH ERRORS
> +.TP
> +.B EBADF
> +One or more file descriptors are not valid,
> +or do not have proper read-write mode;

I think that last line can go. I mean, isn't this point (again)
covered in the next few lines?

> +.I fd_in
> +is not open for reading; or
> +.I fd_out
> +is not open for writing.
> +.TP
> +.B EINVAL
> +Requested range extends beyond the end of the file; or the

s/file/source file/  (right?)

> +.I flags
> +argument is set to an invalid value.
> +.TP
> +.B EIO
> +A low level I/O error occurred while copying.
> +.TP
> +.B ENOMEM
> +Out of memory.
> +.TP
> +.B ENOSPC
> +There is not enough space to complete the copy.

Space where? On the filesystem?
=> s/space/space on the target filesystem/

> +.TP
> +.B EOPNOTSUPP
> +.B COPY_REFLINK
> +was specified in
> +.IR flags ,
> +but the target filesystem does not support reflinks.
> +.TP
> +.B EXDEV
> +Target filesystem doesn't support cross-filesystem copies.

I'm curious. What are some examples of filesystems that produce this
error?

> +.SH VERSIONS
> +The
> +.BR copy_file_range ()
> +system call first appeared in Linux 4.4.
> +.SH CONFORMING TO
> +The
> +.BR copy_file_range ()
> +system call is a nonstandard Linux extension.
> +.SH EXAMPLE
> +.nf
> +
> +#define _GNU_SOURCE
> +#include <fcntl.h>
> +#include <linux/copy.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <sys/stat.h>
> +#include <sys/syscall.h>
> +#include <unistd.h>
> +
> +
> +int main(int argc, char **argv)
> +{
> +    int fd_in, fd_out;
> +    struct stat stat;
> +    loff_t len, ret;
> +
> +    if (argc != 3) {
> +        fprintf(stderr, "Usage: %s <source> <destination>\\n", argv[0]);
> +        exit(EXIT_FAILURE);
> +    }
> +
> +    fd_in = open(argv[1], O_RDONLY);
> +    if (fd_in == -1) {

Please replace all "-" in code by "\-". (This is a groff
detail.)

> +        perror("open (argv[1])");
> +        exit(EXIT_FAILURE);
> +    }
> +
> +    if (fstat(fd_in, &stat) == -1) {
> +        perror("fstat");
> +        exit(EXIT_FAILURE);
> +    }
> +    len = stat.st_size;
> +
> +    fd_out = creat(argv[2], 0644);

These days, I think we should discourage the use of creat(). Maybe 
better to use open() instead here?

> +    if (fd_out == -1) {
> +        perror("creat (argv[2])");
> +        exit(EXIT_FAILURE);
> +    }
> +
> +    do {
> +        ret = syscall(__NR_copy_file_range, fd_in, NULL,
> +                      fd_out, NULL, len, 0);

I'd rather see this as:

int
copy_file_range(int fd_in, loff_t *off_in,
                int fd_out, loff_t *off_out, size_t len,
                unsigned int flags)
{
    return(syscall(__NR_copy_file_range, fd_in, fd_out, off_out, len, flags);
}

...

    copy_file_range(fd_in, fd_out, off_out, len, flags);

 
> +        if (ret == -1) {
> +            perror("copy_file_range");
> +            exit(EXIT_FAILURE);
> +        }
> +
> +        len -= ret;
> +    } while (len > 0);
> +
> +    close(fd_in);
> +    close(fd_out);
> +    exit(EXIT_SUCCESS);
> +}
> +.fi
> +.SH SEE ALSO
> +.BR splice (2)
> 

In the next iteration of this patch, could you include a change to
splice(2) so that copy_file_range(2) is added under SEE ALSO in
that page. Also, are there any other pages that we should like
to/from? (sendfile(2)?)

Thanks,

Michael


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

  reply	other threads:[~2015-09-13  7:50 UTC|newest]

Thread overview: 54+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-09-11 20:30 [PATCH v2 0/9] VFS: In-kernel copy system call Anna Schumaker
2015-09-11 20:30 ` Anna Schumaker
2015-09-11 20:30 ` [PATCH v2 1/9] vfs: add copy_file_range syscall and vfs helper Anna Schumaker
2015-09-11 20:30   ` Anna Schumaker
2015-09-22 11:44   ` David Sterba
2015-09-22 11:44     ` David Sterba
2015-09-22 18:27     ` Anna Schumaker
2015-09-22 18:27       ` Anna Schumaker
2015-09-22 18:27       ` Anna Schumaker
2015-09-11 20:30 ` [PATCH v2 2/9] x86: add sys_copy_file_range to syscall tables Anna Schumaker
2015-09-11 20:30   ` Anna Schumaker
2015-09-11 20:30 ` [PATCH v2 3/9] btrfs: add .copy_file_range file operation Anna Schumaker
2015-09-11 20:30   ` Anna Schumaker
2015-09-11 20:30 ` [PATCH v2 4/9] vfs: Copy should check len after file open mode Anna Schumaker
2015-09-11 20:30   ` Anna Schumaker
2015-09-22 11:47   ` David Sterba
2015-09-22 11:47     ` David Sterba
2015-09-11 20:30 ` [PATCH v2 5/9] vfs: Copy shouldn't forbid ranges inside the same file Anna Schumaker
2015-09-11 20:30   ` Anna Schumaker
2015-09-11 20:30   ` Anna Schumaker
2015-09-22 11:48   ` David Sterba
2015-09-11 20:30 ` [PATCH v2 6/9] vfs: Copy should use file_out rather than file_in Anna Schumaker
2015-09-11 20:30   ` Anna Schumaker
2015-09-11 20:30 ` [PATCH v2 7/9] vfs: Remove copy_file_range mountpoint checks Anna Schumaker
2015-09-11 20:30   ` Anna Schumaker
2015-09-22 11:52   ` David Sterba
2015-09-11 20:30 ` [PATCH v2 8/9] vfs: copy_file_range() can do a pagecache copy with splice Anna Schumaker
2015-09-11 20:30   ` Anna Schumaker
2015-09-11 20:30   ` Anna Schumaker
2015-09-15  3:32   ` Darrick J. Wong
2015-09-15 15:58     ` Anna Schumaker
2015-09-15 15:58       ` Anna Schumaker
2015-09-15 15:58       ` Anna Schumaker
2015-09-15 16:38       ` Darrick J. Wong
2015-09-15 16:38         ` Darrick J. Wong
2015-09-15 17:01         ` Austin S Hemmelgarn
2015-09-15 17:01           ` Austin S Hemmelgarn
2015-09-11 20:30 ` [PATCH v2 9/9] btrfs: btrfs_copy_file_range() only supports reflinks Anna Schumaker
2015-09-11 20:30   ` Anna Schumaker
2015-09-22 11:56   ` David Sterba
2015-09-11 20:30 ` [PATCH v2 10/9] copy_file_range.2: New page documenting copy_file_range() Anna Schumaker
2015-09-11 20:30   ` Anna Schumaker
2015-09-13  7:50   ` Michael Kerrisk (man-pages) [this message]
2015-09-14 18:32     ` Darrick J. Wong
2015-09-14 18:32       ` Darrick J. Wong
2015-09-22 20:10       ` Anna Schumaker
2015-09-22 20:10         ` Anna Schumaker
2015-09-22 20:10         ` Anna Schumaker
2015-09-22 20:30         ` Pádraig Brady
2015-09-22 20:30           ` Pádraig Brady
2015-09-28 17:23         ` Darrick J. Wong
2015-09-14 19:02     ` Austin S Hemmelgarn
2015-09-22 20:30     ` Anna Schumaker
2015-09-22 20:30       ` Anna Schumaker

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=55F52ABA.9070908@gmail.com \
    --to=mtk.manpages@gmail.com \
    --cc=Anna.Schumaker@netapp.com \
    --cc=andros@netapp.com \
    --cc=clm@fb.com \
    --cc=darrick.wong@oracle.com \
    --cc=hch@infradead.org \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-nfs@vger.kernel.org \
    --cc=viro@zeniv.linux.org.uk \
    --cc=zab@zabbo.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.