Linux-NFS Archive on lore.kernel.org
From: Amir Goldstein <amir73il@gmail.com>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	linux-xfs <linux-xfs@vger.kernel.org>,
	Olga Kornievskaia <olga.kornievskaia@gmail.com>,
	Linux NFS Mailing List <linux-nfs@vger.kernel.org>,
	overlayfs <linux-unionfs@vger.kernel.org>,
	ceph-devel@vger.kernel.org, linux-cifs@vger.kernel.org,
	Miklos Szeredi <miklos@szeredi.hu>
Subject: Re: [PATCH 03/11] vfs: no fallback for ->copy_file_range
Date: Fri, 7 Dec 2018 07:38:36 +0200
Message-ID: <CAOQ4uxjBadFK1MLj0uQvKkQYL=NmK1zqx29yASMH9bwMsTrmFw@mail.gmail.com>
In-Reply-To: <20181206213058.GY6311@dastard>

On Thu, Dec 6, 2018 at 11:31 PM Dave Chinner <david@fromorbit.com> wrote:
>
> On Thu, Dec 06, 2018 at 06:16:46AM +0200, Amir Goldstein wrote:
> > On Tue, Dec 4, 2018 at 1:02 AM Dave Chinner <david@fromorbit.com> wrote:
> > >
> > > On Mon, Dec 03, 2018 at 12:22:21PM +0200, Amir Goldstein wrote:
> > > > On Mon, Dec 3, 2018 at 10:34 AM Dave Chinner <david@fromorbit.com> wrote:
> > > > >
> > > > > From: Dave Chinner <dchinner@redhat.com>
> > > > >
> > > > > Now that we have generic_copy_file_range(), remove it as a fallback
> > > > > case when offloads fail. This puts the responsibility for executing
> > > > > fallbacks on the filesystems that implement ->copy_file_range and
> > > > > allows us to add operational validity checks to
> > > > > generic_copy_file_range().
> > > > >
> > > > > > Rework vfs_copy_file_range() to call a new do_copy_file_range()
> > > > > > helper to execute the copying callout, and move calls to
> > > > > > generic_copy_file_range() into filesystem methods where they
> > > > > > currently return failures.
> > > > >
> > > > > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > > >
> > > > You may add
> > > > Reviewed-by: Amir Goldstein <amir73il@gmail.com>
> > > >
> > > > After fixing the overlayfs issue below.
> > > > ...
> > > >
> > > > > diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
> > > > > index 84dd957efa24..68736e5d6a56 100644
> > > > > --- a/fs/overlayfs/file.c
> > > > > +++ b/fs/overlayfs/file.c
> > > > > @@ -486,8 +486,15 @@ static ssize_t ovl_copy_file_range(struct file *file_in, loff_t pos_in,
> > > > >                                    struct file *file_out, loff_t pos_out,
> > > > >                                    size_t len, unsigned int flags)
> > > > >  {
> > > > > -       return ovl_copyfile(file_in, pos_in, file_out, pos_out, len, flags,
> > > > > +       ssize_t ret;
> > > > > +
> > > > > +       ret =  ovl_copyfile(file_in, pos_in, file_out, pos_out, len, flags,
> > > > >                             OVL_COPY);
> > > > > +
> > > > > +       if (ret == -EOPNOTSUPP)
> > > > > +               ret = generic_copy_file_range(file_in, pos_in, file_out,
> > > > > +                                       pos_out, len, flags);
> > > > > +       return ret;
> > > > >  }
> > > > >
> > > >
> > > > > This is unneeded, because ovl_copyfile(OVL_COPY) is implemented
> > > > > by calling vfs_copy_file_range() (on the underlying files) and it is
> > > > > not possible to get EOPNOTSUPP from vfs_copy_file_range().
> > >
> > > Except that it is possible. e.g. If the underlying filesystem tries
> > > a copy offload, gets a "not supported" failure from the remote
> > > server and then doesn't implement a fallback.
> > >
> >
> > I'm of the opinion that ovl_copy_file_range() and do_copy_file_range()
> > are alike. If you choose not to fall back to
> > generic_copy_file_range() in the latter for a misbehaving filesystem,
> > and WARN_ON this case instead, there is no reason for overlayfs to
> > cover up for the misbehaving underlying filesystem.
> >
> > If you want to cover up for a misbehaving filesystem, please do it
> > in do_copy_file_range() and drop the WARN_ON_ONCE().
> > Come to think of it, I understand your reasoning for pushing
> > generic_copy_file_range() down to filesystems so they can fall back to
> > it in several error conditions.
> > I do not follow the reasoning for NOT falling back to
> > generic_copy_file_range() in the vfs if EOPNOTSUPP is returned from
> > the filesystem. IOW, if we want to cover up for a misbehaving filesystem,
> > this would have been more robust code:
>
> Since when have we defined a filesystem returning -EOPNOTSUPP as a
> "misbehaving filesystem"?

Since you wrote:

WARN_ON_ONCE(ret == -EOPNOTSUPP);

If a filesystem is allowed to return EOPNOTSUPP from ->copy_file_range(),
then what is this warning about?
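
To make the contradiction concrete, here is a rough userspace model of
the behaviour I am describing. It is not the kernel code: the names
copy_method_t, warn_once(), model_do_copy(), fs_offload_only() and
fs_copies() are all made up for illustration, and warn_once() only
imitates what WARN_ON_ONCE() does. The dispatcher never falls back, yet
it still complains the first time a method returns -EOPNOTSUPP:

```c
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical stand-in for a filesystem's ->copy_file_range method. */
typedef ssize_t (*copy_method_t)(size_t len);

/* Number of warnings emitted so far (only here so it can be inspected). */
static int warn_count;

/* Userspace imitation of WARN_ON_ONCE(): report only the first hit. */
static void warn_once(int cond, const char *what)
{
	static int warned;

	if (cond && !warned) {
		warned = 1;
		warn_count++;
		fprintf(stderr, "WARNING: %s\n", what);
	}
}

/* Model of the dispatcher: call the method, do not fall back, but warn. */
static ssize_t model_do_copy(copy_method_t method, size_t len)
{
	ssize_t ret = method(len);

	warn_once(ret == -EOPNOTSUPP, "method returned -EOPNOTSUPP");
	return ret;
}

/* A filesystem with "offload or fail entirely" semantics. */
static ssize_t fs_offload_only(size_t len)
{
	(void)len;
	return -EOPNOTSUPP;
}

/* A filesystem that always manages to copy the requested length. */
static ssize_t fs_copies(size_t len)
{
	return (ssize_t)len;
}
```

In this model, model_do_copy(fs_offload_only, len) hands -EOPNOTSUPP
straight back to the caller and warns on the first such call only. So
the error is both passed through as if it were legitimate and flagged
as if it were a bug.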

> Userspace has to handle errors in
> copy_file_range() with its own fallback copy code (i.e. it cannot
> rely on the kernel actually supporting copy_file_range at all).
> Hence it's perfectly fine for a filesystem implementation to encode
> "offload or fail entirely" semantics if they want.
>
> Yes, I've been shouted at by developers quite recently who
> *demanded* that copy_file_range (and other offloads like
> fallocate(ZERO_RANGE)) *fail* if they cannot "offload" the operation
> to make it "fast". The application developers want to use different
> algorithms if the kernel offload isn't any faster than userspace
> doing the dumb thing and physically pushing bytes around itself.
>
> I've pushed back on this as much as I can, but it doesn't change the
> fact that for many situations doing do_splice_direct() is exactly
> the wrong thing to do (e.g. because copy_file_range() on a TB+ scale
> file couldn't be offloaded by the filesystem because the server said
> EOPNOTSUPP)
>
> IOWs, there are some filesystems or situations where it makes sense to
> have fail-fast semantics and leave the decision of what to do next
> in the hands of the userspace application that has the context
> necessary to determine what the best action to take is.  And to do
> that, we need to give control of the fallback to the filesystems.
>
> Flexibility is what is needed here, not a dumb, hard coded "the VFS
> always knows what's right for you" policy that triggers when nobody
> really wants it to.
>

You misunderstood me.
Please remove the fallback to generic_copy_file_range() in
ovl_copy_file_range(), as I requested in my initial review, for the
exact same reasons that you list above.

The overlayfs implementation of ovl_copy_file_range() just hands the
call over to the underlying vfs_copy_file_range().
If the latter is expected to return EOPNOTSUPP, then so is the
overlayfs implementation.
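
For reference, the userspace side of that contract (your point that
userspace has to handle errors in copy_file_range() with its own
fallback copy code) could look roughly like the sketch below. This is
only an illustration under my own assumptions: copy_by_hand() and
copy_with_fallback() are hypothetical helper names, the glibc
copy_file_range() wrapper (glibc 2.27 or later) is assumed to be
available, and the set of errno values treated as "do it by hand" is
my choice, not something this series defines.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* for the copy_file_range() declaration */
#endif
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Plain read()/write() loop: the "dumb" copy userspace can always do. */
static ssize_t copy_by_hand(int fd_in, int fd_out, size_t len)
{
	char buf[65536];
	size_t done = 0;

	while (done < len) {
		size_t want = len - done < sizeof(buf) ? len - done : sizeof(buf);
		ssize_t n = read(fd_in, buf, want);
		ssize_t off = 0;

		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		if (n == 0)
			break;			/* EOF on the source */
		while (off < n) {
			ssize_t w = write(fd_out, buf + off, (size_t)(n - off));

			if (w < 0) {
				if (errno == EINTR)
					continue;
				return -1;
			}
			off += w;
		}
		done += (size_t)n;
	}
	return (ssize_t)done;
}

/*
 * Try the kernel copy first. If it reports that it cannot do the copy
 * (EOPNOTSUPP, ENOSYS or EXDEV), copy the remainder by hand. Returns
 * the number of bytes copied, or -1 with errno set.
 */
static ssize_t copy_with_fallback(int fd_in, int fd_out, size_t len)
{
	size_t done = 0;

	while (done < len) {
		ssize_t n = copy_file_range(fd_in, NULL, fd_out, NULL,
					    len - done, 0);

		if (n > 0) {
			done += (size_t)n;
			continue;
		}
		if (n == 0)
			break;			/* EOF on the source */
		if (errno == EINTR)
			continue;
		if (errno == EOPNOTSUPP || errno == ENOSYS || errno == EXDEV) {
			ssize_t r = copy_by_hand(fd_in, fd_out, len - done);

			return r < 0 ? -1 : (ssize_t)(done + (size_t)r);
		}
		return -1;
	}
	return (ssize_t)done;
}
```

An application that prefers a different algorithm when the offload is
not available would replace the copy_by_hand() call with whatever it
wants to do instead. That decision is only possible if the error
reaches userspace rather than being papered over on the way up.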

Thanks,
Amir.

Thread overview: 83+ messages
2018-12-03  8:34 [PATCH 0/11] fs: fixes for major copy_file_range() issues Dave Chinner
2018-12-03  8:34 ` [PATCH 01/11] vfs: copy_file_range source range over EOF should fail Dave Chinner
2018-12-03 12:46   ` Amir Goldstein
2018-12-04 15:13     ` Christoph Hellwig
2018-12-04 21:29       ` Dave Chinner
2018-12-04 21:47         ` Olga Kornievskaia
2018-12-04 22:31           ` Dave Chinner
2018-12-05 16:51             ` bfields
2019-05-20  9:10             ` Amir Goldstein
2019-05-20 13:12               ` Olga Kornievskaia
2019-05-20 13:36                 ` Amir Goldstein
2019-05-20 13:58                   ` Olga Kornievskaia
2019-05-20 14:02                     ` Amir Goldstein
2018-12-05 14:12         ` Christoph Hellwig
2018-12-05 21:08           ` Dave Chinner
2018-12-05 21:30             ` Christoph Hellwig
2018-12-03  8:34 ` [PATCH 02/11] vfs: introduce generic_copy_file_range() Dave Chinner
2018-12-03 10:03   ` Amir Goldstein
2018-12-03 23:00     ` Dave Chinner
2018-12-04 15:14   ` Christoph Hellwig
2018-12-03  8:34 ` [PATCH 03/11] vfs: no fallback for ->copy_file_range Dave Chinner
2018-12-03 10:22   ` Amir Goldstein
2018-12-03 23:02     ` Dave Chinner
2018-12-06  4:16       ` Amir Goldstein
2018-12-06 21:30         ` Dave Chinner
2018-12-07  5:38           ` Amir Goldstein [this message]
2018-12-03 18:23   ` Anna Schumaker
2018-12-04 15:16   ` Christoph Hellwig
2018-12-03  8:34 ` [PATCH 04/11] vfs: add missing checks to copy_file_range Dave Chinner
2018-12-03 12:42   ` Amir Goldstein
2018-12-03 19:04   ` Darrick J. Wong
2018-12-03 21:33   ` Olga Kornievskaia
2018-12-03 23:04     ` Dave Chinner
2018-12-04 15:18   ` Christoph Hellwig
2018-12-12 11:31   ` Luis Henriques
2018-12-12 16:42     ` Darrick J. Wong
2018-12-12 18:55     ` Olga Kornievskaia
2018-12-12 19:42       ` Matthew Wilcox
2018-12-12 20:22         ` Olga Kornievskaia
2018-12-13 10:29           ` Luis Henriques
2018-12-03  8:34 ` [PATCH 05/11] vfs: use inode_permission in copy_file_range() Dave Chinner
2018-12-03 12:47   ` Amir Goldstein
2018-12-03 18:18   ` Darrick J. Wong
2018-12-03 23:55     ` Dave Chinner
2018-12-05 17:28       ` bfields
2018-12-03 18:53   ` Eric Biggers
2018-12-04 15:19   ` Christoph Hellwig
2018-12-03  8:34 ` [PATCH 06/11] vfs: copy_file_range needs to strip setuid bits Dave Chinner
2018-12-03 12:51   ` Amir Goldstein
2018-12-04 15:21   ` Christoph Hellwig
2018-12-03  8:34 ` [PATCH 07/11] vfs: copy_file_range should update file timestamps Dave Chinner
2018-12-03 10:47   ` Amir Goldstein
2018-12-03 17:33     ` Olga Kornievskaia
2018-12-03 18:22       ` Darrick J. Wong
2018-12-03 23:19     ` Dave Chinner
2018-12-04 15:24   ` Christoph Hellwig
2018-12-03  8:34 ` [PATCH 08/11] vfs: push EXDEV check down into ->remap_file_range Dave Chinner
2018-12-03 11:04   ` Amir Goldstein
2018-12-03 19:11     ` Darrick J. Wong
2018-12-03 23:37       ` Dave Chinner
2018-12-03 23:58         ` Darrick J. Wong
2018-12-04  9:17           ` Amir Goldstein
2018-12-03 23:34     ` Dave Chinner
2018-12-03 18:24   ` Darrick J. Wong
2018-12-04  8:18   ` Olga Kornievskaia
2018-12-03  8:34 ` [PATCH 09/11] vfs: push copy_file_ranges -EXDEV checks down Dave Chinner
2018-12-03 12:36   ` Amir Goldstein
2018-12-03 17:58   ` Olga Kornievskaia
2018-12-03 18:53   ` Anna Schumaker
2018-12-03 19:27     ` Olga Kornievskaia
2018-12-03 23:40     ` Dave Chinner
2018-12-04 15:43   ` Christoph Hellwig
2018-12-04 22:18     ` Dave Chinner
2018-12-04 23:33       ` Olga Kornievskaia
2018-12-05 14:09       ` Christoph Hellwig
2018-12-05 17:01         ` Olga Kornievskaia
2018-12-03  8:34 ` [PATCH 10/11] vfs: allow generic_copy_file_range to copy across devices Dave Chinner
2018-12-03 12:54   ` Amir Goldstein
2018-12-03  8:34 ` [PATCH 11/11] ovl: allow cross-device copy_file_range calls Dave Chinner
2018-12-03 12:55   ` Amir Goldstein
2018-12-03  8:39 ` [PATCH 12/11] man-pages: copy_file_range updates Dave Chinner
2018-12-03 13:05   ` Amir Goldstein
2019-05-21  5:52   ` Amir Goldstein
