From: Jeff Layton <jlayton@poochiereds.net>
To: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: Zach Brown <zab@redhat.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Linux FS-devel Mailing List <linux-fsdevel@vger.kernel.org>,
	linux-btrfs@vger.kernel.org,
	Linux NFS Mailing List <linux-nfs@vger.kernel.org>,
	linux-scsi@vger.kernel.org
Subject: Re: [PATCH RFC 1/3] vfs: add copy_file_range syscall and vfs helper
Date: Sat, 11 Apr 2015 09:04:02 -0400
Message-ID: <20150411090402.67d22d02@tlielax.poochiereds.net>
In-Reply-To: <CAHQdGtSaaPHbvtBM3yVGtAFpEAP8ZquYvErknq_rCoCCK+W2bA@mail.gmail.com>

On Fri, 10 Apr 2015 20:24:06 -0400
Trond Myklebust <trond.myklebust@primarydata.com> wrote:

> On Fri, Apr 10, 2015 at 8:02 PM, Zach Brown <zab@redhat.com> wrote:
> > On Fri, Apr 10, 2015 at 06:36:41PM -0400, Trond Myklebust wrote:
> >> On Fri, Apr 10, 2015 at 6:00 PM, Zach Brown <zab@redhat.com> wrote:
> >
> >> > +
> >> > +/*
> >> > + * copy_file_range() differs from regular file read and write in that it
> >> > + * specifically allows return partial success.  When it does so is up to
> >> > + * the copy_file_range method.
> >> > + */
> >> > +ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
> >> > +                           struct file *file_out, loff_t pos_out,
> >> > +                           size_t len, int flags)
> >>
> >> I'm going to repeat a gripe with this interface. I really don't think
> >> we should treat copy_file_range() as taking a size_t length, since
> >> that is not sufficient to do a full file copy on 32-bit systems w/ LFS
> >> support.
> >
> > *nod*.  The length type is limited by the syscall return type and the
> > arbitrary desire to mimic read/write.
> >
> > I sympathize with wanting to copy giant files with operations that don't
> > scale with file size because files can be enormous but sparse.
> 
> The other argument against using a size_t is that there is no memory
> buffer involved here. size_t is, after all, a type describing
> in-memory objects, not files.
> 
> >> Could we perhaps instead of a length, define a 'pos_in_start' and a
> >> 'pos_in_end' offset (with the latter being -1 for a full-file copy)
> >> and then return an 'loff_t' value stating where the copy ended?
> >
> > Well, the resulting offset will be set if the caller provided it.  So
> > they could already be getting the copied length from that.  But they
> > might not specify the offsets.  Maybe they're just using the results to
> > total up a completion indicator.
> >
> > Maybe we could make the length a pointer like the offsets that's set to
> > the copied length on return.
> 
> That works, but why do we care so much about the difference between a
> length and an offset as a return value?
> 

I think it just comes down to potential confusion for users. What's
more useful, the number of bytes actually copied, or the offset into the
file where the copy ended?

I tend to think an offset is more useful for someone trying to
copy a file in chunks, particularly if the file is sparse. That gives
them a clear place to continue the copy.

So, I think I agree with Trond that phrasing this interface in terms of
file offsets seems like it might be more useful. That also neatly
sidesteps the size_t limitations on 32-bit platforms.
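
To make the offset-based phrasing concrete, here is a minimal userspace
sketch. It is not the proposed syscall and not anyone's patch: the name
copy_until() and its arguments are made up for illustration, and it
simply emulates the idea with pread()/pwrite(). The range is given as
two off_t offsets and the return value is the offset where the copy
stopped, so a caller copying in chunks can pass that value straight
back in as the next start. Because only off_t is used for file
positions, a 32-bit build with -D_FILE_OFFSET_BITS=64 is not limited by
the width of size_t.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not a kernel or libc interface): copy the byte
 * range [pos, end) from fd_in to the same offsets in fd_out.
 *
 * Returns the offset at which the copy stopped. That is 'end' on full
 * success, or something smaller on EOF or partial failure. Returns -1
 * only if an error occurred before any progress was made.
 */
static off_t copy_until(int fd_in, int fd_out, off_t pos, off_t end)
{
    char buf[65536];
    const off_t start = pos;
    int failed = 0;

    while (pos < end && !failed) {
        /* Never ask for more than what is left in the range. */
        size_t want = sizeof(buf);
        if (end - pos < (off_t)want)
            want = (size_t)(end - pos);

        ssize_t n = pread(fd_in, buf, want, pos);
        if (n < 0 && errno == EINTR)
            continue;
        if (n < 0)
            failed = 1;
        if (n <= 0)
            break;              /* read error or end of source file */

        /* Write out everything we just read, handling short writes. */
        ssize_t done = 0;
        while (done < n) {
            ssize_t w = pwrite(fd_out, buf + done,
                               (size_t)(n - done), pos + done);
            if (w < 0 && errno == EINTR)
                continue;
            if (w <= 0) {
                failed = 1;
                break;
            }
            done += w;
        }
        pos += done;            /* count only bytes actually written */
    }

    if (failed && pos == start)
        return -1;
    return pos;
}
```

A caller would loop with something like
"pos = copy_until(in, out, pos, end)" until pos reaches end or stops
advancing, which is the "clear place to continue" mentioned above.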

> To be fair, the NFS copy offload also allows the copy to proceed out
> of order, in which case the range of copied data could be
> non-contiguous in the case of a failure. However neither the length
> nor the offset case will give you the full story in that case. Any
> return value can at best be considered to define an offset range whose
> contents need to be checked for success/failure.
> 

Yuck! How the heck do you clean up the mess if that happens? I guess
you're just stuck redoing the copy with normal READ/WRITE?

Maybe we need to have the interface return a hard error in that
case and not try to give back any sort of offset?
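
As a sketch of what "hard error, then redo with normal READ/WRITE"
could look like from the caller's side: the offload_fn type below is a
hypothetical stand-in for whatever offload mechanism is in use (it is
not an existing API), and copy_plain() and copy_with_fallback() are
names invented for this example. The assumption is that an offload
failure tells the caller nothing reliable about which bytes landed, so
the whole range is copied again.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical offload hook: returns 0 only if the whole range was
 * copied, and -1 (a hard error, no offset information) otherwise.
 */
typedef int (*offload_fn)(int fd_in, int fd_out, off_t pos, off_t len);

/* Plain read/write copy of len bytes at pos; 0 on success, -1 on error. */
static int copy_plain(int fd_in, int fd_out, off_t pos, off_t len)
{
    char buf[65536];

    while (len > 0) {
        size_t want = len < (off_t)sizeof(buf) ? (size_t)len : sizeof(buf);
        ssize_t n = pread(fd_in, buf, want, pos);

        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0)
            return -1;          /* read error or unexpected EOF */

        for (ssize_t done = 0; done < n; ) {
            ssize_t w = pwrite(fd_out, buf + done,
                               (size_t)(n - done), pos + done);
            if (w < 0 && errno == EINTR)
                continue;
            if (w <= 0)
                return -1;
            done += w;
        }
        pos += n;
        len -= n;
    }
    return 0;
}

/* Try the offload first; on a hard error, redo the entire range. */
static int copy_with_fallback(offload_fn offload, int fd_in, int fd_out,
                              off_t pos, off_t len)
{
    if (offload != NULL && offload(fd_in, fd_out, pos, len) == 0)
        return 0;

    /*
     * The offload may have written parts of the range out of order,
     * so nothing is assumed about the destination: start over.
     */
    return copy_plain(fd_in, fd_out, pos, len);
}
```

The cost of this approach is that any work the offload did complete is
thrown away, which is the price of not returning an offset in the
out-of-order case.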

> > This all seems pretty gross.  Does anyone else have a vote?
> >
> > (And I'll argue strongly against creating magical offset values that
> > change behaviour.  If we want to ignore arguments and get the length
> > from the source file we'd add a flag to do so.)
> 
> The '-1' was not intended to be a special/magical value: as far as I'm
> concerned any end offset that covers the full range of supported file
> lengths would be OK.
> 

Agreed. A "whole file" flag might also be useful, but I'd leave
that for after the initial implementation is merged, just in the
interest of having _something_ that works in the near term.

> >> Note that both btrfs and NFSv4.2 allow for 64-bit lengths, so this
> >> interface would be closer to what is already in use anyway.
> >
> > Yeah, btrfs doesn't allow partial progress.  It returns 0 on success.
> > We could also do that but people have expressed an interest in returning
> > partial progress.
> 
> Returning an end offset would satisfy the partial progress requirement
> (with the caveat mentioned above).
> 

-- 
Jeff Layton <jlayton@poochiereds.net>
