linux-cifs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Amir Goldstein <amir73il@gmail.com>
To: Steve French <smfrench@gmail.com>
Cc: Anna Schumaker <anna.schumaker@netapp.com>,
	Luis Henriques <lhenriques@suse.de>,
	Trond Myklebust <trondmy@hammerspace.com>,
	"samba-technical@lists.samba.org"
	<samba-technical@lists.samba.org>,
	"drinkcat@chromium.org" <drinkcat@chromium.org>,
	"iant@google.com" <iant@google.com>,
	"linux-cifs@vger.kernel.org" <linux-cifs@vger.kernel.org>,
	"darrick.wong@oracle.com" <darrick.wong@oracle.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"jlayton@kernel.org" <jlayton@kernel.org>,
	"llozano@chromium.org" <llozano@chromium.org>,
	"linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>,
	"miklos@szeredi.hu" <miklos@szeredi.hu>,
	"viro@zeniv.linux.org.uk" <viro@zeniv.linux.org.uk>,
	"dchinner@redhat.com" <dchinner@redhat.com>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	"gregkh@linuxfoundation.org" <gregkh@linuxfoundation.org>,
	"sfrench@samba.org" <sfrench@samba.org>,
	"ceph-devel@vger.kernel.org" <ceph-devel@vger.kernel.org>
Subject: Re: [PATCH v2] vfs: prevent copy_file_range to copy across devices
Date: Wed, 17 Feb 2021 10:08:49 +0200	[thread overview]
Message-ID: <CAOQ4uxii=7KUKv1w32VbjkwS+Z1a0ge0gezNzpn_BiY6MFWkpA@mail.gmail.com> (raw)
In-Reply-To: <CAH2r5msmtuk0f8FuZxdRBQ2d_VKXexctcP0OmnLXoDEBien-nQ@mail.gmail.com>

On Tue, Feb 16, 2021 at 11:15 PM Steve French <smfrench@gmail.com> wrote:
>
> On Tue, Feb 16, 2021 at 1:40 PM Amir Goldstein <amir73il@gmail.com> wrote:
> >
> > On Tue, Feb 16, 2021 at 9:31 PM Steve French <smfrench@gmail.com> wrote:
> > >
> > > On Tue, Feb 16, 2021 at 1:29 PM Anna Schumaker
> > > <anna.schumaker@netapp.com> wrote:
> > > >
> > > > On Tue, Feb 16, 2021 at 2:22 PM Amir Goldstein <amir73il@gmail.com> wrote:
> > > > >
> > > > > On Tue, Feb 16, 2021 at 8:54 PM Luis Henriques <lhenriques@suse.de> wrote:
> > > > > >
> > > > > > Amir Goldstein <amir73il@gmail.com> writes:
> > > > > >
> > > > > > > On Tue, Feb 16, 2021 at 6:41 PM Luis Henriques <lhenriques@suse.de> wrote:
> > > > > > >>
> > > > > > >> Amir Goldstein <amir73il@gmail.com> writes:
> > > > > > >>
> > > > > > >> >> Ugh.  And I guess overlayfs may have a similar problem.
> > > > > > >> >
> > > > > > >> > Not exactly.
> > > > > > >> > Generally speaking, overlayfs should call vfs_copy_file_range()
> > > > > > >> > with the flags it got from layer above, so if called from nfsd it
> > > > > > >> > will allow cross fs copy and when called from syscall it won't.
> > > > > > >> >
> > > > > > >> > There are some corner cases where overlayfs could benefit from
> > > > > > >> > COPY_FILE_SPLICE (e.g. copy from lower file to upper file), but
> > > > > > >> > let's leave those for now. Just leave overlayfs code as is.
> > > > > > >>
> > > > > > >> Got it, thanks for clarifying.
> > > > > > >>
> > > > > > >> >> > This is easy to solve with a flag COPY_FILE_SPLICE (or something) that
> > > > > > >> >> > is internal to kernel users.
> > > > > > >> >> >
> > > > > > >> >> > FWIW, you may want to look at the loop in ovl_copy_up_data()
> > > > > > >> >> > for improvements to nfsd_copy_file_range().
> > > > > > >> >> >
> > > > > > >> >> > We can move the check out to copy_file_range syscall:
> > > > > > >> >> >
> > > > > > >> >> >         if (flags != 0)
> > > > > > >> >> >                 return -EINVAL;
> > > > > > >> >> >
> > > > > > >> >> > Leave the fallback from all filesystems and check for the
> > > > > > >> >> > COPY_FILE_SPLICE flag inside generic_copy_file_range().
> > > > > > >> >>
> > > > > > >> >> Ok, the diff bellow is just to make sure I understood your suggestion.
> > > > > > >> >>
> > > > > > >> >> The patch will also need to:
> > > > > > >> >>
> > > > > > >> >>  - change nfs and overlayfs calls to vfs_copy_file_range() so that they
> > > > > > >> >>    use the new flag.
> > > > > > >> >>
> > > > > > >> >>  - check flags in generic_copy_file_checks() to make sure only valid flags
> > > > > > >> >>    are used (COPY_FILE_SPLICE at the moment).
> > > > > > >> >>
> > > > > > >> >> Also, where should this flag be defined?  include/uapi/linux/fs.h?
> > > > > > >> >
> > > > > > >> > Grep for REMAP_FILE_
> > > > > > >> > Same header file, same Documentation rst file.
> > > > > > >> >
> > > > > > >> >>
> > > > > > >> >> Cheers,
> > > > > > >> >> --
> > > > > > >> >> Luis
> > > > > > >> >>
> > > > > > >> >> diff --git a/fs/read_write.c b/fs/read_write.c
> > > > > > >> >> index 75f764b43418..341d315d2a96 100644
> > > > > > >> >> --- a/fs/read_write.c
> > > > > > >> >> +++ b/fs/read_write.c
> > > > > > >> >> @@ -1383,6 +1383,13 @@ ssize_t generic_copy_file_range(struct file *file_in, loff_t pos_in,
> > > > > > >> >>                                 struct file *file_out, loff_t pos_out,
> > > > > > >> >>                                 size_t len, unsigned int flags)
> > > > > > >> >>  {
> > > > > > >> >> +       if (!(flags & COPY_FILE_SPLICE)) {
> > > > > > >> >> +               if (!file_out->f_op->copy_file_range)
> > > > > > >> >> +                       return -EOPNOTSUPP;
> > > > > > >> >> +               else if (file_out->f_op->copy_file_range !=
> > > > > > >> >> +                        file_in->f_op->copy_file_range)
> > > > > > >> >> +                       return -EXDEV;
> > > > > > >> >> +       }
> > > > > > >> >
> > > > > > >> > That looks strange, because you are duplicating the logic in
> > > > > > >> > do_copy_file_range(). Maybe better:
> > > > > > >> >
> > > > > > >> > if (WARN_ON_ONCE(flags & ~COPY_FILE_SPLICE))
> > > > > > >> >         return -EINVAL;
> > > > > > >> > if (flags & COPY_FILE_SPLICE)
> > > > > > >> >        return do_splice_direct(file_in, &pos_in, file_out, &pos_out,
> > > > > > >> >                                  len > MAX_RW_COUNT ? MAX_RW_COUNT : len, 0);
> > > > > > >>
> > > > > > >> My initial reasoning for duplicating the logic in do_copy_file_range() was
> > > > > > >> to allow the generic_copy_file_range() callers to be left unmodified and
> > > > > > >> allow the filesystems to default to this implementation.
> > > > > > >>
> > > > > > >> With this change, I guess that the calls to generic_copy_file_range() from
> > > > > > >> the different filesystems can be dropped, as in my initial patch, as they
> > > > > > >> will always get -EINVAL.  The other option would be to set the
> > > > > > >> COPY_FILE_SPLICE flag in those calls, but that would get us back to the
> > > > > > >> problem we're trying to solve.
> > > > > > >
> > > > > > > I don't understand the problem.
> > > > > > >
> > > > > > > What exactly is wrong with the code I suggested?
> > > > > > > Why should any filesystem be changed?
> > > > > > >
> > > > > > > Maybe I am missing something.
> > > > > >
> > > > > > Ok, I have to do a full brain reboot and start all over.
> > > > > >
> > > > > > Before that, I picked the code you suggested and tested it.  I've mounted
> > > > > > a cephfs filesystem and used xfs_io to execute a 'copy_range' command
> > > > > > using /sys/kernel/debug/sched_features as source.  The result was a
> > > > > > 0-sized file in cephfs.  And the reason is thevfs_copy_file_range()
> > > > > > early exit in:
> > > > > >
> > > > > >         if (len == 0)
> > > > > >                 return 0;
> > > > > >
> > > > > > 'len' is set in generic_copy_file_checks().
> > > > >
> > > > > Good point.. I guess we will need to do all the checks earlier in
> > > > > generic_copy_file_checks() including the logic of:
> > > > >
> > > > >         if (file_in->f_op->remap_file_range &&
> > > > >             file_inode(file_in)->i_sb == file_inode(file_out)->i_sb)
> > > > >
> > > > >
> > > > > >
> > > > > > This means that we're not solving the original problem anymore (probably
> > > > > > since v1 of this patch, haven't checked).
> > > > > >
> > > > > > Also, re-reading Trond's emails, I read: "... also disallowing the copy
> > > > > > from, say, an XFS formatted partition to an ext4 partition".  Isn't that
> > > > > > *exactly* what we're trying to do here?  I.e. _prevent_ these copies from
> > > > > > happening so that tracefs files can't be CFR'ed?
> > > > > >
> > > > >
> > > > > We want to address the report which means calls coming from
> > > > > copy_file_range() syscall.
> > > > >
> > > > > Trond's use case is vfs_copy_file_range() coming from nfsd.
> > > > > When he writes about copy from XFS to ext4, he means an
> > > > > NFS client is issuing server side copy (on same or different NFS mounts)
> > > > > and the NFS server is executing nfsd_copy_file_range() on a source
> > > > > file that happens to be on XFS and destination happens to be on ext4.
> > > >
> > > > NFS also supports a server-to-server copy where the destination server
> > > > mounts the source server and reads the data to be copied. Please don't
> > > > break that either :)
> > >
> >
> > As long as the copy is via nfsd_copy_file_range() and not from the syscall
> > it should not regress.
> >
> > > This is a case we will eventually need to support for cifs (SMB3) as well.
> > >
> >
> > samba already does server side copy very well without needing any support
> > from the kernel.
> >
> > nfsd also doesn't *need* to use vfs_copy_file_range() it can use kernel APIs
> > like the loop in ovl_copy_up_data(). But it does, so we should not regress it.
> >
> > samba/nfsd can try to use copy_file_range() and it will work if the
> > source/target
> > fs support it. Otherwise, the server can perfectly well do the copy via other
> > available interfaces, just like userspace copy tools.
>
> I was thinking about cifsd ("ksmbd") the kernel server from
> Namjae/Sergey etc. which is making excellent progress.
>

You are missing my point.
Never mind which server. The server does not *need* to rely on
vfs_copy_file_range() to copy files from XFS to ext4.
The server is very capable of implementing the fallback generic copy
in case source/target fs do not support native {copy,remap}_file_range().

w.r.t semantics of copy_file_range() syscall vs. the fallback to userespace
'cp' tool (check source file size before copy or not), please note that the
semantics of CIFS_IOC_COPYCHUNK_FILE are that of the former:

        rc = cifs_file_copychunk_range(xid, src_file.file, 0, dst_file, 0,
                                        src_inode->i_size, 0);

It will copy zero bytes if advertised source file size if zero.

NFS server side copy semantics are currently de-facto the same
because both the client and the server will have to pass through this
line in vfs_copy_file_range():

        if (len == 0)
                return 0;

IMO, and this opinion was voiced by several other filesystem developers,
the shortend copy semantics are the correct semantics for copy_file_range()
syscall as well as for vfs_copy_file_range() for internal kernel users.

I guess what this means is that if the 'cp' tool ever tries an opportunistic
copy_file_range() syscall (e.g. --cfr=auto), it may result in zero size copy.

Thanks,
Amir.

  reply	other threads:[~2021-02-17  8:09 UTC|newest]

Thread overview: 93+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <CAOQ4uxiFGjdvX2-zh5o46pn7RZhvbGHH0wpzLPuPOom91FwWeQ@mail.gmail.com>
2021-02-15 15:43 ` [PATCH v2] vfs: prevent copy_file_range to copy across devices Luis Henriques
2021-02-15 16:02   ` Trond Myklebust
2021-02-16  0:25     ` Steve French
2021-02-15 16:34   ` Amir Goldstein
2021-02-15 16:53     ` Trond Myklebust
2021-02-15 17:24       ` Amir Goldstein
2021-02-15 18:57         ` Trond Myklebust
2021-02-15 19:43           ` Amir Goldstein
2021-02-16 11:17             ` Luis Henriques
2021-02-16 11:28               ` gregkh
2021-02-16 12:01                 ` Luis Henriques
2021-02-16 12:08                   ` Greg KH
2021-02-16 13:51               ` Amir Goldstein
2021-02-16 16:42                 ` Luis Henriques
2021-02-16 17:44                   ` Amir Goldstein
2021-02-16 18:55                     ` Luis Henriques
2021-02-16 19:20                       ` Amir Goldstein
2021-02-16 19:27                         ` Anna Schumaker
2021-02-16 19:31                           ` Steve French
2021-02-16 19:40                             ` Amir Goldstein
2021-02-16 21:15                               ` Steve French
2021-02-17  8:08                                 ` Amir Goldstein [this message]
2021-02-17 17:26                                   ` [PATCH v3] vfs: fix copy_file_range regression in cross-fs copies Luis Henriques
2021-02-17 20:47                                     ` Amir Goldstein
2021-02-18  0:56                                     ` Nicolas Boichat
2021-02-18  5:32                                     ` Olga Kornievskaia
2021-02-18  6:47                                       ` Amir Goldstein
2021-02-18 16:28                                         ` Olga Kornievskaia
2021-02-18  7:43                                     ` Christoph Hellwig
2021-02-18  0:50                                   ` [PATCH v2] vfs: prevent copy_file_range to copy across devices Andreas Dilger
2021-02-18  7:34                                     ` gregkh
2021-02-16 18:54                 ` Andreas Dilger
2021-02-17  4:45   ` Nicolas Boichat
2021-02-18  7:42   ` Christoph Hellwig
2021-02-18  9:10     ` Amir Goldstein
2021-02-18 10:29       ` Luis Henriques
2021-02-18 12:15         ` Luis Henriques
2021-02-18 12:49           ` Amir Goldstein
2021-02-18 14:36             ` [PATCH v4] vfs: fix copy_file_range regression in cross-fs copies Luis Henriques
2021-02-18 14:58               ` Amir Goldstein
2021-02-18 15:17                 ` [PATCH v5] " Luis Henriques
2021-02-18 15:53                   ` Amir Goldstein
2021-02-18 16:35                     ` Luis Henriques
2021-02-18 17:18                       ` [PATCH v6] " Luis Henriques
2021-02-19 21:18                         ` Olga Kornievskaia
2021-02-19 21:52                           ` Amir Goldstein
2021-02-21 19:58                           ` [PATCH v7] " Luis Henriques
2021-02-22  3:00                             ` Nicolas Boichat
2021-02-22 10:24                             ` [PATCH v8] " Luis Henriques
2021-02-22 10:46                               ` Amir Goldstein
2021-02-22 16:25                               ` dai.ngo
2021-02-23 10:32                                 ` Luis Henriques
2021-02-23 15:28                                   ` Amir Goldstein
2021-02-23 15:29                                   ` dai.ngo
2021-02-23 16:02                                     ` dai.ngo
2021-02-23 16:47                                       ` Amir Goldstein
2021-02-23 16:57                                         ` dai.ngo
     [not found]                                           ` <e3eed18b-fc7e-e687-608b-7f662017329c@oracle.com>
2021-02-23 17:33                                             ` Amir Goldstein
2021-02-24  0:13                                               ` dai.ngo
2021-02-23 17:56                                           ` Luis Henriques
2021-02-23 17:13                                       ` Olga Kornievskaia
2021-02-24  1:00                               ` Olga Kornievskaia
2021-02-24 10:23                                 ` Luis Henriques
2021-02-24 10:44                                   ` Nicolas Boichat
2021-04-09  5:23                                     ` Nicolas Boichat
2021-04-09 13:39                                       ` Luis Henriques
2021-04-09 13:50                                         ` Amir Goldstein
2021-04-23  4:40                                           ` Nicolas Boichat
2021-05-03  8:54                                             ` Luis Henriques
2021-02-24 14:23                               ` [PATCH] copy_file_range.2: Kernel v5.12 updates Luis Henriques
2021-02-24 16:10                                 ` Amir Goldstein
2021-02-25 10:21                                   ` Luis Henriques
2021-02-26 10:13                                     ` Alejandro Colomar (man-pages)
2021-02-26 10:34                                       ` Amir Goldstein
2021-02-26 11:15                                         ` Alejandro Colomar (man-pages)
2021-02-26 13:59                                           ` Jeff Layton
2021-02-26 21:26                                             ` Alejandro Colomar (man-pages)
2021-02-26 22:18                                   ` Alejandro Colomar (man-pages)
2021-02-27  5:41                                     ` Amir Goldstein
2021-02-27 12:20                                       ` Alejandro Colomar (man-pages)
2021-02-27 13:49                                         ` [RFC v2] copy_file_range.2: Update cross-filesystem support for 5.12 Alejandro Colomar
2021-02-27 16:00                                           ` Amir Goldstein
2021-02-27 23:08                                       ` [PATCH] copy_file_range.2: Kernel v5.12 updates Steve French
2021-02-28  7:35                                         ` Amir Goldstein
2021-02-28 22:25                                           ` Steve French
2021-03-01  6:18                                             ` Amir Goldstein
2021-03-01 14:41                                 ` [RFC v3] copy_file_range.2: Update cross-filesystem support for 5.12 Alejandro Colomar
2021-03-01 14:58                                   ` Amir Goldstein
2021-03-04  9:38                                 ` [RFC v4] " Alejandro Colomar
2021-03-04 17:13                                   ` Darrick J. Wong
2021-03-04 18:24                                     ` Alejandro Colomar (man-pages)
2021-03-04 23:50                                       ` Darrick J. Wong
2021-02-18 20:41       ` [PATCH v2] vfs: prevent copy_file_range to copy across devices Steve French

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAOQ4uxii=7KUKv1w32VbjkwS+Z1a0ge0gezNzpn_BiY6MFWkpA@mail.gmail.com' \
    --to=amir73il@gmail.com \
    --cc=anna.schumaker@netapp.com \
    --cc=ceph-devel@vger.kernel.org \
    --cc=darrick.wong@oracle.com \
    --cc=dchinner@redhat.com \
    --cc=drinkcat@chromium.org \
    --cc=gregkh@linuxfoundation.org \
    --cc=iant@google.com \
    --cc=jlayton@kernel.org \
    --cc=lhenriques@suse.de \
    --cc=linux-cifs@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-nfs@vger.kernel.org \
    --cc=llozano@chromium.org \
    --cc=miklos@szeredi.hu \
    --cc=samba-technical@lists.samba.org \
    --cc=sfrench@samba.org \
    --cc=smfrench@gmail.com \
    --cc=trondmy@hammerspace.com \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).