From: Dave Chinner <david@fromorbit.com>
To: Zach Brown <zab@redhat.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>,
	Trond Myklebust <Trond.Myklebust@netapp.com>,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-btrfs@vger.kernel.org, linux-nfs@vger.kernel.org
Subject: Re: [RFC v0 0/4] sys_copy_range() rough draft
Date: Wed, 15 May 2013 07:42:51 +1000
Message-ID: <20130514214251.GK29466@dastard>
In-Reply-To: <1368566126-17610-1-git-send-email-zab@redhat.com>

On Tue, May 14, 2013 at 02:15:22PM -0700, Zach Brown wrote:
> We've been talking about implementing some form of bulk data copy
> offloading for a while now.  BTRFS and OCFS2 implement forms of copy
> offloading with ioctls, NFS 4.2 will include a byte-granular COPY
> operation, and the SCSI XCOPY command is being implemented now that
> Windows can issue it.
> 
> In the past we've discussed promoting the ocfs2 reflink ioctl into a
> system call that would create a new file and implicitly copy the
> source data into the new file:
> https://lkml.org/lkml/2009/9/14/481
> 
> These draft patches take the simpler approach of only copying data
> between existing files.  The patches 1) make a system call out of the
> btrfs CLONE_RANGE ioctl, 2) implement the btrfs .copy_range method with
> the ioctl's guts, 3) implement the nfs .copy_range by sending a COPY
> op, and 4) serve the COPY op in nfsd by calling the .copy_range method
> again.
> 
> The nfs patch is an untested hack.  I'm happy to beat it into shape
> but I'll need some guidance.
> 
> I'd like strong review feedback on the interfaces, here are some
> possible topics:
> 
> a) Hopefully being able to specify a portion of the data to copy will
> avoid *huge* syscall latencies and the motivation for new async
> semantics.
> 
> b) The BTRFS ioctl and nfs COPY let you specify a count of 0 to copy
> from the start offset to the end of the file.  Does anyone have a
> strong feeling about this?  I'm leaning towards not bothering with it
> in the syscall interface.
> 
> c) I chose to return partial progress in the ssize_t return code.  This
> limits the length of the range and the size_t count argument can be too
> large and return errors, much like other io syscalls.  This seemed
> less awful than some extra argument with a pointer to a status value.
> 
> d) I'm dreading mentioning a vector of ranges to copy in one syscall
> because I don't want to think about overlapping ranges and file systems
> that use range locks -- xfs for now, but more if Jan gets his way.

XFS doesn't use range locks (yet).

> I'd rather that we get some experience with this simpler syscall before
> taking on that headache.
> 
> I'm sure I'm forgetting some other details.
> 
> I'm going to keep hacking away at this.  My next step is to get ext4
> supporting .copy_range, probably with a quick hack to copy the
> contents of bios.  Hopefully that'll give enough time to also integrate
> review feedback.

Wouldn't the easiest "support all filesystems" hack just be to add
a destination offset parameter to do_splice_direct() and call that
when the filesystem doesn't supply a ->copy_range method? i.e. use
the mechanisms we already have for copying from one file to another
via the page cache as efficiently as possible?
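
For illustration only, here is a rough userspace sketch of that dispatch:
if a per-filesystem copy method is supplied, call it; otherwise fall back
to a generic copy that takes a destination offset.  The names
copy_range_fn, copy_range_generic and copy_range_dispatch are made up for
this sketch, and the pread()/pwrite() loop merely stands in for
do_splice_direct() with a destination offset.  It is not kernel code and
not what the patches do.  It returns partial progress in the ssize_t
return value, as the draft syscall does.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700
#endif
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for a per-filesystem ->copy_range method. */
typedef ssize_t (*copy_range_fn)(int fd_in, off_t pos_in,
                                 int fd_out, off_t pos_out, size_t count);

/*
 * Generic fallback (assumption: a pread/pwrite bounce buffer standing in
 * for do_splice_direct() with a destination offset).  Returns the number
 * of bytes copied, which may be short, or -1 if nothing was copied.
 */
static ssize_t copy_range_generic(int fd_in, off_t pos_in,
                                  int fd_out, off_t pos_out, size_t count)
{
    const size_t bufsz = 64 * 1024;
    char *buf = malloc(bufsz);
    size_t done = 0;

    if (!buf)
        return -1;

    while (done < count) {
        size_t want = count - done;
        size_t off = 0;
        ssize_t nr;

        if (want > bufsz)
            want = bufsz;

        nr = pread(fd_in, buf, want, pos_in + (off_t)done);
        if (nr < 0 && errno == EINTR)
            continue;
        if (nr <= 0)            /* error or EOF: report progress so far */
            break;

        while (off < (size_t)nr) {
            ssize_t nw = pwrite(fd_out, buf + off, (size_t)nr - off,
                                pos_out + (off_t)(done + off));
            if (nw < 0 && errno == EINTR)
                continue;
            if (nw <= 0)
                break;
            off += (size_t)nw;
        }
        done += off;
        if (off < (size_t)nr)   /* short write: stop with partial progress */
            break;
    }

    free(buf);
    /* Nothing copied although bytes were requested: an error or EOF. */
    if (done == 0 && count != 0)
        return errno ? -1 : 0;
    return (ssize_t)done;
}

/* Use the filesystem's method when it has one, else the generic path. */
static ssize_t copy_range_dispatch(copy_range_fn fs_method,
                                   int fd_in, off_t pos_in,
                                   int fd_out, off_t pos_out, size_t count)
{
    assert(fd_in >= 0 && fd_out >= 0);
    errno = 0;
    if (fs_method)
        return fs_method(fd_in, pos_in, fd_out, pos_out, count);
    return copy_range_generic(fd_in, pos_in, fd_out, pos_out, count);
}
```

Passing NULL as fs_method models a filesystem without a ->copy_range
method, so the generic path is taken.  A short return means the caller
should retry from the new offsets, as with other io syscalls.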

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
