linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Luís Henriques" <lhenriques@suse.de>
To: Amir Goldstein <amir73il@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>,
	Olga Kornievskaia <olga.kornievskaia@gmail.com>,
	Dave Chinner <david@fromorbit.com>,
	"Darrick J . Wong" <darrick.wong@oracle.com>,
	He Zhe <zhe.he@windriver.com>,
	Paul Gortmaker <paul.gortmaker@windriver.com>,
	linux-fsdevel@vger.kernel.org, linux-nfs@vger.kernel.org,
	Nicolas Boichat <drinkcat@chromium.org>,
	kernel test robot <oliver.sang@intel.com>
Subject: Re: [PATCH v13] vfs: fix copy_file_range regression in cross-fs copies
Date: Fri, 20 May 2022 11:33:00 +0100	[thread overview]
Message-ID: <87ilq0334z.fsf@brahms.olymp> (raw)
In-Reply-To: <20220520082111.2066400-1-amir73il@gmail.com> (Amir Goldstein's message of "Fri, 20 May 2022 11:21:11 +0300")

Amir Goldstein <amir73il@gmail.com> writes:

> From: Luis Henriques <lhenriques@suse.de>
>
> A regression has been reported by Nicolas Boichat, found while using the
> copy_file_range syscall to copy a tracefs file.  Before commit
> 5dae222a5ff0 ("vfs: allow copy_file_range to copy across devices") the
> kernel would return -EXDEV to userspace when trying to copy a file across
> different filesystems.  After this commit, the syscall doesn't fail anymore
> and instead returns zero (zero bytes copied), as this file's content is
> generated on-the-fly and thus reports a size of zero.
>
> Another regression has been reported by He Zhe and Paul Gortmaker -
> WARN_ON_ONCE(ret == -EOPNOTSUPP) can be triggered from userspace when
> copying from a sysfs file whose read operation may return -EOPNOTSUPP:
>
>   xfs_io -f -c "copy_range /sys/class/net/lo/phys_switch_id" /tmp/foo
>
> Since we do not have test coverage for copy_file_range() between any
> two types of filesystems, the best way to avoid with this sort of issues
> in the future is for the kernel to be more picky about filesystems that
> are allowed to do copy_file_range().
>
> This patch restores some cross-filesystem copy restrictions that existed
> prior to commit 5dae222a5ff0 ("vfs: allow copy_file_range to copy across
> devices").
>
> Filesystems that implement copy_file_range() or remap_file_range() may
> still fall-back to the generic_copy_file_range() implementation when the
> filesystem cannot handle the copy, so kernel can maintain a consistent
> story about which filesystems support copy_file_range().
> That will allow userspace tools to make consistent desicions w.r.t using
> copy_file_range().
>
> nfsd is also modified to fall-back into generic_copy_file_range() in case
> vfs_copy_file_range() fails with -EOPNOTSUPP or -EXDEV.
>
> Fixes: 5dae222a5ff0 ("vfs: allow copy_file_range to copy across devices")
> Link: https://lore.kernel.org/linux-fsdevel/20210212044405.4120619-1-drinkcat@chromium.org/
> Link: https://lore.kernel.org/linux-fsdevel/CANMq1KDZuxir2LM5jOTm0xx+BnvW=ZmpsG47CyHFJwnw7zSX6Q@mail.gmail.com/
> Link: https://lore.kernel.org/linux-fsdevel/20210126135012.1.If45b7cdc3ff707bc1efa17f5366057d60603c45f@changeid/
> Link: https://lore.kernel.org/linux-fsdevel/20210630161320.29006-1-lhenriques@suse.de/
> Reported-by: Nicolas Boichat <drinkcat@chromium.org>
> Reported-by: kernel test robot <oliver.sang@intel.com>
> Signed-off-by: Luis Henriques <lhenriques@suse.de>
> Fixes: 64bf5ff58dff ("vfs: no fallback for ->copy_file_range")
> Link: https://lore.kernel.org/linux-fsdevel/20f17f64-88cb-4e80-07c1-85cb96c83619@windriver.com/
> Reported-by: He Zhe <zhe.he@windriver.com>
> Reported-by: Paul Gortmaker <paul.gortmaker@windriver.com>
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> ---
>
> Al,
>
> Please take this patch to fix several issues that started to pop up
> as userspace tools started using copy_file_range(), which could only
> get more popular in the future.
>
> I've decides to pick this up from Luis' v12 last year [1] following
> some recent bug report. Although Luis' patch was reviewed by me back
> then and tested by Olga on nfs, I tested it now and found that it
> regressed fstests on xfs,btrfs, so decided on a slightly differnt logic.
>
> The rationale behind my change is that given src fs and dst fs, the
> outcomes of EOPNOTSUPP and EXDEV are consistent, while keeping the
> fs that may return success from this syscall to a minimum.
>
> I've tested this with the -g copy_range tests from fstests on
> ext4,xfs,btrfs,nfs,overlayfs (over xfs).
>
> For ext4, all tests are skipping as expected.
> For xfs,btrfs (support clone) the cross-fs copy test (genric/565)
> is skipped.
> For nfs,overlayfs the tests pass including the cross-fs copy test.

Thanks a lot for taking care of this, Amir.  I've also tested it with ceph
and didn't saw anything breaking (well... there was a bug in a
ceph-specific test, but that's unrelated).  So, FWIW:

Reviewed-by: Luís Henriques <lhenriques@suse.de>
Tested-by: Luís Henriques <lhenriques@suse.de>

Cheers
-- 
Luís

>
> Thanks,
> Amir.
>
> [1] https://lore.kernel.org/linux-fsdevel/CAOQ4uxh6PegaOtMXQ9WmU=05bhQfYTeweGjFWR7T+XVAbuR09A@mail.gmail.com/
>
>  fs/nfsd/vfs.c   |  8 ++++-
>  fs/read_write.c | 79 +++++++++++++++++++++++++++----------------------
>  2 files changed, 50 insertions(+), 37 deletions(-)
>
> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> index c22ad0532e8e..74f7fef48504 100644
> --- a/fs/nfsd/vfs.c
> +++ b/fs/nfsd/vfs.c
> @@ -577,6 +577,7 @@ __be32 nfsd4_clone_file_range(struct svc_rqst *rqstp,
>  ssize_t nfsd_copy_file_range(struct file *src, u64 src_pos, struct file *dst,
>  			     u64 dst_pos, u64 count)
>  {
> +	ssize_t ret;
>  
>  	/*
>  	 * Limit copy to 4MB to prevent indefinitely blocking an nfsd
> @@ -587,7 +588,12 @@ ssize_t nfsd_copy_file_range(struct file *src, u64 src_pos, struct file *dst,
>  	 * limit like this and pipeline multiple COPY requests.
>  	 */
>  	count = min_t(u64, count, 1 << 22);
> -	return vfs_copy_file_range(src, src_pos, dst, dst_pos, count, 0);
> +	ret = vfs_copy_file_range(src, src_pos, dst, dst_pos, count, 0);
> +
> +	if (ret == -EOPNOTSUPP || ret == -EXDEV)
> +		ret = generic_copy_file_range(src, src_pos, dst, dst_pos,
> +					      count, 0);
> +	return ret;
>  }
>  
>  __be32 nfsd4_vfs_fallocate(struct svc_rqst *rqstp, struct svc_fh *fhp,
> diff --git a/fs/read_write.c b/fs/read_write.c
> index e643aec2b0ef..a8f12e91762f 100644
> --- a/fs/read_write.c
> +++ b/fs/read_write.c
> @@ -1381,28 +1381,6 @@ ssize_t generic_copy_file_range(struct file *file_in, loff_t pos_in,
>  }
>  EXPORT_SYMBOL(generic_copy_file_range);
>  
> -static ssize_t do_copy_file_range(struct file *file_in, loff_t pos_in,
> -				  struct file *file_out, loff_t pos_out,
> -				  size_t len, unsigned int flags)
> -{
> -	/*
> -	 * Although we now allow filesystems to handle cross sb copy, passing
> -	 * a file of the wrong filesystem type to filesystem driver can result
> -	 * in an attempt to dereference the wrong type of ->private_data, so
> -	 * avoid doing that until we really have a good reason.  NFS defines
> -	 * several different file_system_type structures, but they all end up
> -	 * using the same ->copy_file_range() function pointer.
> -	 */
> -	if (file_out->f_op->copy_file_range &&
> -	    file_out->f_op->copy_file_range == file_in->f_op->copy_file_range)
> -		return file_out->f_op->copy_file_range(file_in, pos_in,
> -						       file_out, pos_out,
> -						       len, flags);
> -
> -	return generic_copy_file_range(file_in, pos_in, file_out, pos_out, len,
> -				       flags);
> -}
> -
>  /*
>   * Performs necessary checks before doing a file copy
>   *
> @@ -1424,6 +1402,25 @@ static int generic_copy_file_checks(struct file *file_in, loff_t pos_in,
>  	if (ret)
>  		return ret;
>  
> +	/*
> +	 * Although we now allow filesystems to handle cross sb copy, passing
> +	 * a file of the wrong filesystem type to filesystem driver can result
> +	 * in an attempt to dereference the wrong type of ->private_data, so
> +	 * avoid doing that until we really have a good reason.  NFS defines
> +	 * several different file_system_type structures, but they all end up
> +	 * using the same ->copy_file_range() function pointer.
> +	 */
> +	if (file_out->f_op->copy_file_range) {
> +		if (file_in->f_op->copy_file_range !=
> +		    file_out->f_op->copy_file_range)
> +			return -EXDEV;
> +	} else if (file_in->f_op->remap_file_range) {
> +		if (file_inode(file_in)->i_sb != file_inode(file_out)->i_sb)
> +			return -EXDEV;
> +	} else {
> +                return -EOPNOTSUPP;
> +	}
> +
>  	/* Don't touch certain kinds of inodes */
>  	if (IS_IMMUTABLE(inode_out))
>  		return -EPERM;
> @@ -1489,26 +1486,36 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
>  	file_start_write(file_out);
>  
>  	/*
> -	 * Try cloning first, this is supported by more file systems, and
> -	 * more efficient if both clone and copy are supported (e.g. NFS).
> +	 * Cloning is supported by more file systems, so we implement copy on
> +	 * same sb using clone, but for filesystems were both clone and copy
> +	 * are supported (e.g. NFS), we only call the copy method.
>  	 */
> -	if (file_in->f_op->remap_file_range &&
> -	    file_inode(file_in)->i_sb == file_inode(file_out)->i_sb) {
> -		loff_t cloned;
> -
> -		cloned = file_in->f_op->remap_file_range(file_in, pos_in,
> +	if (file_out->f_op->copy_file_range) {
> +		ret = file_out->f_op->copy_file_range(file_in, pos_in,
> +						      file_out, pos_out,
> +						      len, flags);
> +	} else if (file_in->f_op->remap_file_range &&
> +		   file_inode(file_in)->i_sb == file_inode(file_out)->i_sb) {
> +		ret = file_in->f_op->remap_file_range(file_in, pos_in,
>  				file_out, pos_out,
>  				min_t(loff_t, MAX_RW_COUNT, len),
>  				REMAP_FILE_CAN_SHORTEN);
> -		if (cloned > 0) {
> -			ret = cloned;
> -			goto done;
> -		}
>  	}
>  
> -	ret = do_copy_file_range(file_in, pos_in, file_out, pos_out, len,
> -				flags);
> -	WARN_ON_ONCE(ret == -EOPNOTSUPP);
> +	if (ret > 0)
> +		goto done;
> +
> +	/*
> +	 * We can get here if filesystem supports clone but rejected the clone
> +	 * request (e.g. because it was not block aligned).
> +	 * In that case, fall back to kernel copy so we are able to maintain a
> +	 * consistent story about which filesystems support copy_file_range()
> +	 * and which filesystems do not, that will allow userspace tools to
> +	 * make consistent desicions w.r.t using copy_file_range().
> +	 */
> +	ret = generic_copy_file_range(file_in, pos_in, file_out, pos_out, len,
> +				      flags);
> +
>  done:
>  	if (ret > 0) {
>  		fsnotify_access(file_in);
> -- 
>
> 2.25.1
>


      reply	other threads:[~2022-05-20 10:32 UTC|newest]

Thread overview: 2+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-05-20  8:21 [PATCH v13] vfs: fix copy_file_range regression in cross-fs copies Amir Goldstein
2022-05-20 10:33 ` Luís Henriques [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87ilq0334z.fsf@brahms.olymp \
    --to=lhenriques@suse.de \
    --cc=amir73il@gmail.com \
    --cc=darrick.wong@oracle.com \
    --cc=david@fromorbit.com \
    --cc=drinkcat@chromium.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-nfs@vger.kernel.org \
    --cc=olga.kornievskaia@gmail.com \
    --cc=oliver.sang@intel.com \
    --cc=paul.gortmaker@windriver.com \
    --cc=viro@zeniv.linux.org.uk \
    --cc=zhe.he@windriver.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).