Linux-NFS Archive on lore.kernel.org
 help / color / Atom feed
From: Dave Chinner <david@fromorbit.com>
To: Amir Goldstein <amir73il@gmail.com>
Cc: linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	linux-xfs <linux-xfs@vger.kernel.org>,
	Olga Kornievskaia <olga.kornievskaia@gmail.com>,
	Linux NFS Mailing List <linux-nfs@vger.kernel.org>,
	overlayfs <linux-unionfs@vger.kernel.org>,
	ceph-devel@vger.kernel.org, linux-cifs@vger.kernel.org,
	Miklos Szeredi <miklos@szeredi.hu>
Subject: Re: [PATCH 03/11] vfs: no fallback for ->copy_file_range
Date: Fri, 7 Dec 2018 08:30:58 +1100
Message-ID: <20181206213058.GY6311@dastard> (raw)
In-Reply-To: <CAOQ4uxgqaGoVs9t9NqWn-uVyeeZzZOUBSYX6=JFn2uCf3rKmUQ@mail.gmail.com>

On Thu, Dec 06, 2018 at 06:16:46AM +0200, Amir Goldstein wrote:
> On Tue, Dec 4, 2018 at 1:02 AM Dave Chinner <david@fromorbit.com> wrote:
> >
> > On Mon, Dec 03, 2018 at 12:22:21PM +0200, Amir Goldstein wrote:
> > > On Mon, Dec 3, 2018 at 10:34 AM Dave Chinner <david@fromorbit.com> wrote:
> > > >
> > > > From: Dave Chinner <dchinner@redhat.com>
> > > >
> > > > Now that we have generic_copy_file_range(), remove it as a fallback
> > > > case when offloads fail. This puts the responsibility for executing
> > > > fallbacks on the filesystems that implement ->copy_file_range and
> > > > allows us to add operational validity checks to
> > > > generic_copy_file_range().
> > > >
> > > > Rework vfs_copy_file_range() to call a new do_copy_file_range()
> > > > helper to exceute the copying callout, and move calls to
> > > > generic_file_copy_range() into filesystem methods where they
> > > > currently return failures.
> > > >
> > > > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > >
> > > You may add
> > > Reviewed-by: Amir Goldstein <amir73il@gmail.com>
> > >
> > > After fixing the overlayfs issue below.
> > > ...
> > >
> > > > diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
> > > > index 84dd957efa24..68736e5d6a56 100644
> > > > --- a/fs/overlayfs/file.c
> > > > +++ b/fs/overlayfs/file.c
> > > > @@ -486,8 +486,15 @@ static ssize_t ovl_copy_file_range(struct file *file_in, loff_t pos_in,
> > > >                                    struct file *file_out, loff_t pos_out,
> > > >                                    size_t len, unsigned int flags)
> > > >  {
> > > > -       return ovl_copyfile(file_in, pos_in, file_out, pos_out, len, flags,
> > > > +       ssize_t ret;
> > > > +
> > > > +       ret =  ovl_copyfile(file_in, pos_in, file_out, pos_out, len, flags,
> > > >                             OVL_COPY);
> > > > +
> > > > +       if (ret == -EOPNOTSUPP)
> > > > +               ret = generic_copy_file_range(file_in, pos_in, file_out,
> > > > +                                       pos_out, len, flags);
> > > > +       return ret;
> > > >  }
> > > >
> > >
> > > This is unneeded, because ovl_copyfile(OVL_COPY) is implemented
> > > by calling vfs_copy_file_range() (on the underlying files) and it is
> > > not possible
> > > to get EOPNOTSUPP from vfs_copy_file_range().
> >
> > Except that it is possible. e.g. If the underlying filesystem tries
> > a copy offload, gets a "not supported" failure from the remote
> > server and then doesn't implement a fallback.
> >
> 
> I'm in the opinion that ovl_copy_file_range() and do_copy_file_range()
> are a like. If you choose not to fallback in the latter to
> generic_copy_file_range() for misbehaving filesystem and WARN_ON
> this case, there is no reason for overlayfs to cover up for the
> misbehaving underlying filesystem.
> 
> If you want to cover up for misbehaving filesystem, please do it
> in do_copy_file_range() and drop the WARN_ON_ONCE().
> Come to think about it, I understand your reasoning for pushing
> generic_copy_file_range() down to filesystems so they can fallback to
> it in several error conditions.
> I do not follow the reasoning of NOT falling back to
> generic_copy_file_range() in vfs if EOPNOTSUPP is returned from
> filesystem. IOW, if we want to cover up for misbehaving filesystem,
> this would have been a more robust code:

Since when have we defined a filesystem returning -EOPNOTSUPP as a
"misbehaving filesystem"? Userspace has to handle errors in
copy_file_range() with it's own fallback copy code (i.e. it cannot
rely on the kernel actually supporting copy_file_range at all).
Hence it's perfectly fine for a filesystem implementation to encode
"offload or fail entirely" semantics if they want.

Yes, I've been shouted at by developers quite recently who
*demanded* that copy_file_range (and other offloads like
fallocate(ZERO_RANGE)) *fail* if they cannot "offload" the operation
to make it "fast". The application developers want to use different
algorithms if the kernel offload isn't any faster than userspace
doing the dumb thing and phsyically pushing bytes around itself.

I've pushed back on this as much as I can, but it doesn't change the
fact that for many situations doing do_splice_direct() is exactly
the wrong thing to do (e.g. because copy_file_range() on a TB+ scale
file couldn't be offloaded by the filesystem because the server said
EOPNOTSUPP)

IOWs, for some filesystems or situations where it makes sense to
have fail-fast semantics and leave the decision of what to do next
in the hands of the userspace application that has the context
necessary to determine what the best action to take is.  And to do
that, we need to give control of the fallback to the filesystems.

Flexibility is what is needed here, not a dumb, hard coded "the VFS
always know what's right for you" policy that triggers when nobody
really wants it to.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

  reply index

Thread overview: 83+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-12-03  8:34 [PATCH 0/11] fs: fixes for major copy_file_range() issues Dave Chinner
2018-12-03  8:34 ` [PATCH 01/11] vfs: copy_file_range source range over EOF should fail Dave Chinner
2018-12-03 12:46   ` Amir Goldstein
2018-12-04 15:13     ` Christoph Hellwig
2018-12-04 21:29       ` Dave Chinner
2018-12-04 21:47         ` Olga Kornievskaia
2018-12-04 22:31           ` Dave Chinner
2018-12-05 16:51             ` bfields
2019-05-20  9:10             ` Amir Goldstein
2019-05-20 13:12               ` Olga Kornievskaia
2019-05-20 13:36                 ` Amir Goldstein
2019-05-20 13:58                   ` Olga Kornievskaia
2019-05-20 14:02                     ` Amir Goldstein
2018-12-05 14:12         ` Christoph Hellwig
2018-12-05 21:08           ` Dave Chinner
2018-12-05 21:30             ` Christoph Hellwig
2018-12-03  8:34 ` [PATCH 02/11] vfs: introduce generic_copy_file_range() Dave Chinner
2018-12-03 10:03   ` Amir Goldstein
2018-12-03 23:00     ` Dave Chinner
2018-12-04 15:14   ` Christoph Hellwig
2018-12-03  8:34 ` [PATCH 03/11] vfs: no fallback for ->copy_file_range Dave Chinner
2018-12-03 10:22   ` Amir Goldstein
2018-12-03 23:02     ` Dave Chinner
2018-12-06  4:16       ` Amir Goldstein
2018-12-06 21:30         ` Dave Chinner [this message]
2018-12-07  5:38           ` Amir Goldstein
2018-12-03 18:23   ` Anna Schumaker
2018-12-04 15:16   ` Christoph Hellwig
2018-12-03  8:34 ` [PATCH 04/11] vfs: add missing checks to copy_file_range Dave Chinner
2018-12-03 12:42   ` Amir Goldstein
2018-12-03 19:04   ` Darrick J. Wong
2018-12-03 21:33   ` Olga Kornievskaia
2018-12-03 23:04     ` Dave Chinner
2018-12-04 15:18   ` Christoph Hellwig
2018-12-12 11:31   ` Luis Henriques
2018-12-12 16:42     ` Darrick J. Wong
2018-12-12 18:55     ` Olga Kornievskaia
2018-12-12 19:42       ` Matthew Wilcox
2018-12-12 20:22         ` Olga Kornievskaia
2018-12-13 10:29           ` Luis Henriques
2018-12-03  8:34 ` [PATCH 05/11] vfs: use inode_permission in copy_file_range() Dave Chinner
2018-12-03 12:47   ` Amir Goldstein
2018-12-03 18:18   ` Darrick J. Wong
2018-12-03 23:55     ` Dave Chinner
2018-12-05 17:28       ` bfields
2018-12-03 18:53   ` Eric Biggers
2018-12-04 15:19   ` Christoph Hellwig
2018-12-03  8:34 ` [PATCH 06/11] vfs: copy_file_range needs to strip setuid bits Dave Chinner
2018-12-03 12:51   ` Amir Goldstein
2018-12-04 15:21   ` Christoph Hellwig
2018-12-03  8:34 ` [PATCH 07/11] vfs: copy_file_range should update file timestamps Dave Chinner
2018-12-03 10:47   ` Amir Goldstein
2018-12-03 17:33     ` Olga Kornievskaia
2018-12-03 18:22       ` Darrick J. Wong
2018-12-03 23:19     ` Dave Chinner
2018-12-04 15:24   ` Christoph Hellwig
2018-12-03  8:34 ` [PATCH 08/11] vfs: push EXDEV check down into ->remap_file_range Dave Chinner
2018-12-03 11:04   ` Amir Goldstein
2018-12-03 19:11     ` Darrick J. Wong
2018-12-03 23:37       ` Dave Chinner
2018-12-03 23:58         ` Darrick J. Wong
2018-12-04  9:17           ` Amir Goldstein
2018-12-03 23:34     ` Dave Chinner
2018-12-03 18:24   ` Darrick J. Wong
2018-12-04  8:18   ` Olga Kornievskaia
2018-12-03  8:34 ` [PATCH 09/11] vfs: push copy_file_ranges -EXDEV checks down Dave Chinner
2018-12-03 12:36   ` Amir Goldstein
2018-12-03 17:58   ` Olga Kornievskaia
2018-12-03 18:53   ` Anna Schumaker
2018-12-03 19:27     ` Olga Kornievskaia
2018-12-03 23:40     ` Dave Chinner
2018-12-04 15:43   ` Christoph Hellwig
2018-12-04 22:18     ` Dave Chinner
2018-12-04 23:33       ` Olga Kornievskaia
2018-12-05 14:09       ` Christoph Hellwig
2018-12-05 17:01         ` Olga Kornievskaia
2018-12-03  8:34 ` [PATCH 10/11] vfs: allow generic_copy_file_range to copy across devices Dave Chinner
2018-12-03 12:54   ` Amir Goldstein
2018-12-03  8:34 ` [PATCH 11/11] ovl: allow cross-device copy_file_range calls Dave Chinner
2018-12-03 12:55   ` Amir Goldstein
2018-12-03  8:39 ` [PATCH 12/11] man-pages: copy_file_range updates Dave Chinner
2018-12-03 13:05   ` Amir Goldstein
2019-05-21  5:52   ` Amir Goldstein

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20181206213058.GY6311@dastard \
    --to=david@fromorbit.com \
    --cc=amir73il@gmail.com \
    --cc=ceph-devel@vger.kernel.org \
    --cc=linux-cifs@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-nfs@vger.kernel.org \
    --cc=linux-unionfs@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=miklos@szeredi.hu \
    --cc=olga.kornievskaia@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-NFS Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-nfs/0 linux-nfs/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-nfs linux-nfs/ https://lore.kernel.org/linux-nfs \
		linux-nfs@vger.kernel.org linux-nfs@archiver.kernel.org
	public-inbox-index linux-nfs


Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-nfs


AGPL code for this site: git clone https://public-inbox.org/ public-inbox