From: "Luís Henriques" <lhenriques@suse.de>
To: Amir Goldstein <amir73il@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>,
Olga Kornievskaia <olga.kornievskaia@gmail.com>,
Dave Chinner <david@fromorbit.com>,
"Darrick J . Wong" <darrick.wong@oracle.com>,
He Zhe <zhe.he@windriver.com>,
Paul Gortmaker <paul.gortmaker@windriver.com>,
linux-fsdevel@vger.kernel.org, linux-nfs@vger.kernel.org,
Nicolas Boichat <drinkcat@chromium.org>,
kernel test robot <oliver.sang@intel.com>
Subject: Re: [PATCH v13] vfs: fix copy_file_range regression in cross-fs copies
Date: Fri, 20 May 2022 11:33:00 +0100 [thread overview]
Message-ID: <87ilq0334z.fsf@brahms.olymp> (raw)
In-Reply-To: <20220520082111.2066400-1-amir73il@gmail.com> (Amir Goldstein's message of "Fri, 20 May 2022 11:21:11 +0300")
Amir Goldstein <amir73il@gmail.com> writes:
> From: Luis Henriques <lhenriques@suse.de>
>
> A regression has been reported by Nicolas Boichat, found while using the
> copy_file_range syscall to copy a tracefs file. Before commit
> 5dae222a5ff0 ("vfs: allow copy_file_range to copy across devices") the
> kernel would return -EXDEV to userspace when trying to copy a file across
> different filesystems. After this commit, the syscall doesn't fail anymore
> and instead returns zero (zero bytes copied), as this file's content is
> generated on-the-fly and thus reports a size of zero.
>
> Another regression has been reported by He Zhe and Paul Gortmaker -
> WARN_ON_ONCE(ret == -EOPNOTSUPP) can be triggered from userspace when
> copying from a sysfs file whose read operation may return -EOPNOTSUPP:
>
> xfs_io -f -c "copy_range /sys/class/net/lo/phys_switch_id" /tmp/foo
>
> Since we do not have test coverage for copy_file_range() between any
> two types of filesystems, the best way to avoid with this sort of issues
> in the future is for the kernel to be more picky about filesystems that
> are allowed to do copy_file_range().
>
> This patch restores some cross-filesystem copy restrictions that existed
> prior to commit 5dae222a5ff0 ("vfs: allow copy_file_range to copy across
> devices").
>
> Filesystems that implement copy_file_range() or remap_file_range() may
> still fall-back to the generic_copy_file_range() implementation when the
> filesystem cannot handle the copy, so kernel can maintain a consistent
> story about which filesystems support copy_file_range().
> That will allow userspace tools to make consistent desicions w.r.t using
> copy_file_range().
>
> nfsd is also modified to fall-back into generic_copy_file_range() in case
> vfs_copy_file_range() fails with -EOPNOTSUPP or -EXDEV.
>
> Fixes: 5dae222a5ff0 ("vfs: allow copy_file_range to copy across devices")
> Link: https://lore.kernel.org/linux-fsdevel/20210212044405.4120619-1-drinkcat@chromium.org/
> Link: https://lore.kernel.org/linux-fsdevel/CANMq1KDZuxir2LM5jOTm0xx+BnvW=ZmpsG47CyHFJwnw7zSX6Q@mail.gmail.com/
> Link: https://lore.kernel.org/linux-fsdevel/20210126135012.1.If45b7cdc3ff707bc1efa17f5366057d60603c45f@changeid/
> Link: https://lore.kernel.org/linux-fsdevel/20210630161320.29006-1-lhenriques@suse.de/
> Reported-by: Nicolas Boichat <drinkcat@chromium.org>
> Reported-by: kernel test robot <oliver.sang@intel.com>
> Signed-off-by: Luis Henriques <lhenriques@suse.de>
> Fixes: 64bf5ff58dff ("vfs: no fallback for ->copy_file_range")
> Link: https://lore.kernel.org/linux-fsdevel/20f17f64-88cb-4e80-07c1-85cb96c83619@windriver.com/
> Reported-by: He Zhe <zhe.he@windriver.com>
> Reported-by: Paul Gortmaker <paul.gortmaker@windriver.com>
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> ---
>
> Al,
>
> Please take this patch to fix several issues that started to pop up
> as userspace tools started using copy_file_range(), which could only
> get more popular in the future.
>
> I've decides to pick this up from Luis' v12 last year [1] following
> some recent bug report. Although Luis' patch was reviewed by me back
> then and tested by Olga on nfs, I tested it now and found that it
> regressed fstests on xfs,btrfs, so decided on a slightly differnt logic.
>
> The rationale behind my change is that given src fs and dst fs, the
> outcomes of EOPNOTSUPP and EXDEV are consistent, while keeping the
> fs that may return success from this syscall to a minimum.
>
> I've tested this with the -g copy_range tests from fstests on
> ext4,xfs,btrfs,nfs,overlayfs (over xfs).
>
> For ext4, all tests are skipping as expected.
> For xfs,btrfs (support clone) the cross-fs copy test (genric/565)
> is skipped.
> For nfs,overlayfs the tests pass including the cross-fs copy test.
Thanks a lot for taking care of this, Amir. I've also tested it with ceph
and didn't saw anything breaking (well... there was a bug in a
ceph-specific test, but that's unrelated). So, FWIW:
Reviewed-by: Luís Henriques <lhenriques@suse.de>
Tested-by: Luís Henriques <lhenriques@suse.de>
Cheers
--
Luís
>
> Thanks,
> Amir.
>
> [1] https://lore.kernel.org/linux-fsdevel/CAOQ4uxh6PegaOtMXQ9WmU=05bhQfYTeweGjFWR7T+XVAbuR09A@mail.gmail.com/
>
> fs/nfsd/vfs.c | 8 ++++-
> fs/read_write.c | 79 +++++++++++++++++++++++++++----------------------
> 2 files changed, 50 insertions(+), 37 deletions(-)
>
> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> index c22ad0532e8e..74f7fef48504 100644
> --- a/fs/nfsd/vfs.c
> +++ b/fs/nfsd/vfs.c
> @@ -577,6 +577,7 @@ __be32 nfsd4_clone_file_range(struct svc_rqst *rqstp,
> ssize_t nfsd_copy_file_range(struct file *src, u64 src_pos, struct file *dst,
> u64 dst_pos, u64 count)
> {
> + ssize_t ret;
>
> /*
> * Limit copy to 4MB to prevent indefinitely blocking an nfsd
> @@ -587,7 +588,12 @@ ssize_t nfsd_copy_file_range(struct file *src, u64 src_pos, struct file *dst,
> * limit like this and pipeline multiple COPY requests.
> */
> count = min_t(u64, count, 1 << 22);
> - return vfs_copy_file_range(src, src_pos, dst, dst_pos, count, 0);
> + ret = vfs_copy_file_range(src, src_pos, dst, dst_pos, count, 0);
> +
> + if (ret == -EOPNOTSUPP || ret == -EXDEV)
> + ret = generic_copy_file_range(src, src_pos, dst, dst_pos,
> + count, 0);
> + return ret;
> }
>
> __be32 nfsd4_vfs_fallocate(struct svc_rqst *rqstp, struct svc_fh *fhp,
> diff --git a/fs/read_write.c b/fs/read_write.c
> index e643aec2b0ef..a8f12e91762f 100644
> --- a/fs/read_write.c
> +++ b/fs/read_write.c
> @@ -1381,28 +1381,6 @@ ssize_t generic_copy_file_range(struct file *file_in, loff_t pos_in,
> }
> EXPORT_SYMBOL(generic_copy_file_range);
>
> -static ssize_t do_copy_file_range(struct file *file_in, loff_t pos_in,
> - struct file *file_out, loff_t pos_out,
> - size_t len, unsigned int flags)
> -{
> - /*
> - * Although we now allow filesystems to handle cross sb copy, passing
> - * a file of the wrong filesystem type to filesystem driver can result
> - * in an attempt to dereference the wrong type of ->private_data, so
> - * avoid doing that until we really have a good reason. NFS defines
> - * several different file_system_type structures, but they all end up
> - * using the same ->copy_file_range() function pointer.
> - */
> - if (file_out->f_op->copy_file_range &&
> - file_out->f_op->copy_file_range == file_in->f_op->copy_file_range)
> - return file_out->f_op->copy_file_range(file_in, pos_in,
> - file_out, pos_out,
> - len, flags);
> -
> - return generic_copy_file_range(file_in, pos_in, file_out, pos_out, len,
> - flags);
> -}
> -
> /*
> * Performs necessary checks before doing a file copy
> *
> @@ -1424,6 +1402,25 @@ static int generic_copy_file_checks(struct file *file_in, loff_t pos_in,
> if (ret)
> return ret;
>
> + /*
> + * Although we now allow filesystems to handle cross sb copy, passing
> + * a file of the wrong filesystem type to filesystem driver can result
> + * in an attempt to dereference the wrong type of ->private_data, so
> + * avoid doing that until we really have a good reason. NFS defines
> + * several different file_system_type structures, but they all end up
> + * using the same ->copy_file_range() function pointer.
> + */
> + if (file_out->f_op->copy_file_range) {
> + if (file_in->f_op->copy_file_range !=
> + file_out->f_op->copy_file_range)
> + return -EXDEV;
> + } else if (file_in->f_op->remap_file_range) {
> + if (file_inode(file_in)->i_sb != file_inode(file_out)->i_sb)
> + return -EXDEV;
> + } else {
> + return -EOPNOTSUPP;
> + }
> +
> /* Don't touch certain kinds of inodes */
> if (IS_IMMUTABLE(inode_out))
> return -EPERM;
> @@ -1489,26 +1486,36 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
> file_start_write(file_out);
>
> /*
> - * Try cloning first, this is supported by more file systems, and
> - * more efficient if both clone and copy are supported (e.g. NFS).
> + * Cloning is supported by more file systems, so we implement copy on
> + * same sb using clone, but for filesystems were both clone and copy
> + * are supported (e.g. NFS), we only call the copy method.
> */
> - if (file_in->f_op->remap_file_range &&
> - file_inode(file_in)->i_sb == file_inode(file_out)->i_sb) {
> - loff_t cloned;
> -
> - cloned = file_in->f_op->remap_file_range(file_in, pos_in,
> + if (file_out->f_op->copy_file_range) {
> + ret = file_out->f_op->copy_file_range(file_in, pos_in,
> + file_out, pos_out,
> + len, flags);
> + } else if (file_in->f_op->remap_file_range &&
> + file_inode(file_in)->i_sb == file_inode(file_out)->i_sb) {
> + ret = file_in->f_op->remap_file_range(file_in, pos_in,
> file_out, pos_out,
> min_t(loff_t, MAX_RW_COUNT, len),
> REMAP_FILE_CAN_SHORTEN);
> - if (cloned > 0) {
> - ret = cloned;
> - goto done;
> - }
> }
>
> - ret = do_copy_file_range(file_in, pos_in, file_out, pos_out, len,
> - flags);
> - WARN_ON_ONCE(ret == -EOPNOTSUPP);
> + if (ret > 0)
> + goto done;
> +
> + /*
> + * We can get here if filesystem supports clone but rejected the clone
> + * request (e.g. because it was not block aligned).
> + * In that case, fall back to kernel copy so we are able to maintain a
> + * consistent story about which filesystems support copy_file_range()
> + * and which filesystems do not, that will allow userspace tools to
> + * make consistent desicions w.r.t using copy_file_range().
> + */
> + ret = generic_copy_file_range(file_in, pos_in, file_out, pos_out, len,
> + flags);
> +
> done:
> if (ret > 0) {
> fsnotify_access(file_in);
> --
>
> 2.25.1
>
prev parent reply other threads:[~2022-05-20 10:32 UTC|newest]
Thread overview: 2+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-05-20 8:21 [PATCH v13] vfs: fix copy_file_range regression in cross-fs copies Amir Goldstein
2022-05-20 10:33 ` Luís Henriques [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87ilq0334z.fsf@brahms.olymp \
--to=lhenriques@suse.de \
--cc=amir73il@gmail.com \
--cc=darrick.wong@oracle.com \
--cc=david@fromorbit.com \
--cc=drinkcat@chromium.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-nfs@vger.kernel.org \
--cc=olga.kornievskaia@gmail.com \
--cc=oliver.sang@intel.com \
--cc=paul.gortmaker@windriver.com \
--cc=viro@zeniv.linux.org.uk \
--cc=zhe.he@windriver.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).