* [PATCH v2] vfs: fix copy_file_range() averts filesystem freeze protection
From: Amir Goldstein @ 2022-11-17 20:52 UTC
To: Linus Torvalds
Cc: Al Viro, Namjae Jeon, Luis Henriques, Olga Kornievskaia,
Jan Kara, linux-fsdevel, linux-cifs, linux-nfs, Luis Henriques

Commit 868f9f2f8e00 ("vfs: fix copy_file_range() regression in cross-fs
copies") removed the fallback to generic_copy_file_range() for cross-fs
cases inside vfs_copy_file_range().

To preserve the behavior of nfsd and ksmbd server-side copy, the fallback
to generic_copy_file_range() was added to the nfsd and ksmbd code, but
that direct call skips sb_start_write() (filesystem freeze protection),
the fsnotify hooks, and more.

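To make the freeze-protection gap concrete, here is a small userspace model in C. Every name in it (struct model_sb, model_sb_start_write() and the rest) is hypothetical: a plain counter stands in for the kernel's sb_start_write()/sb_end_write() machinery, the real sb_start_write() sleeps on a frozen filesystem instead of failing, and -EDEADLK merely stands in for the lockdep warning that the lockdep_assert() added by this patch would emit. fsnotify and the other skipped steps are not modeled.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a superblock's freeze state. */
struct model_sb {
	bool frozen;	/* filesystem is frozen (e.g. by FIFREEZE) */
	int writers;	/* writers currently inside freeze protection */
};

/* Stand-in for sb_start_write(); the real one sleeps while frozen. */
static int model_sb_start_write(struct model_sb *sb)
{
	if (sb->frozen)
		return -EBUSY;
	sb->writers++;
	return 0;
}

/* Stand-in for sb_end_write(). */
static void model_sb_end_write(struct model_sb *sb)
{
	sb->writers--;
}

/*
 * Stand-in for generic_copy_file_range() with the lockdep_assert() this
 * patch adds: the caller must already hold freeze protection.
 */
static long model_generic_copy(struct model_sb *sb, size_t len)
{
	if (sb->writers == 0)
		return -EDEADLK;	/* stands in for the lockdep warning */
	return (long)len;		/* pretend every byte was spliced */
}

/* Stand-in for the vfs_copy_file_range() wrapper around the splice path. */
static long model_vfs_copy(struct model_sb *sb, size_t len)
{
	long ret = model_sb_start_write(sb);

	if (ret)
		return ret;
	ret = model_generic_copy(sb, len);
	model_sb_end_write(sb);
	return ret;
}
```

Calling model_generic_copy() directly, the model's analogue of the old nfsd/ksmbd fallback, trips the check; going through model_vfs_copy() passes it, leaves no writer count behind, and on a frozen model filesystem refuses instead of writing.
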
Ideally, nfsd and ksmbd would pass a flag to vfs_copy_file_range() that
would take care of the fallback, but that code would be subtle, and we
have gotten the vfs_copy_file_range() logic wrong too many times already.

Instead, add a flag that explicitly requests vfs_copy_file_range() to
perform only generic_copy_file_range(), and let nfsd and ksmbd use this
flag only in the fallback path.

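The caller side of this approach can be sketched in userspace C. fake_vfs_copy() is a hypothetical stand-in for vfs_copy_file_range() that pretends the file pair has no native copy or clone support; only the retry shape of server_side_copy() mirrors the nfsd and ksmbd hunks in this patch, and COPY_FILE_SPLICE uses the value the patch defines in include/linux/fs.h.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Same value the patch gives the in-kernel flag. */
#define COPY_FILE_SPLICE (1 << 0)

/*
 * Hypothetical stand-in for vfs_copy_file_range(): the file pair has no
 * native copy/clone support, so only an explicit splice request succeeds.
 */
static long fake_vfs_copy(size_t len, unsigned int flags)
{
	if (flags & ~COPY_FILE_SPLICE)
		return -EINVAL;		/* unknown flag bits */
	if (!(flags & COPY_FILE_SPLICE))
		return -EXDEV;		/* e.g. cross-filesystem copy refused */
	return (long)len;		/* splice path copied everything */
}

/*
 * Caller pattern: try the normal path first and retry with
 * COPY_FILE_SPLICE only on -EOPNOTSUPP or -EXDEV, so the retry still
 * goes through the VFS wrapper instead of the bare splice helper.
 */
static long server_side_copy(size_t len)
{
	long ret = fake_vfs_copy(len, 0);

	if (ret == -EOPNOTSUPP || ret == -EXDEV)
		ret = fake_vfs_copy(len, COPY_FILE_SPLICE);
	return ret;
}
```
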
This choice keeps logic changes in the non-nfsd/ksmbd code paths to a
minimum, reducing the risk of further regressions.

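A rough model of the resulting decision order may help check the "minimal logic change" claim. The struct and enum below are hypothetical and collapse each f_op pointer into a boolean; they follow the cross-sb rule in generic_copy_file_checks() and the branch order in vfs_copy_file_range() as shown in this patch. One simplification: in the kernel source of that era a remap that returns no bytes falls through to the splice path, which the model does not show.

```c
#include <assert.h>
#include <stdbool.h>

#define COPY_FILE_SPLICE (1 << 0)

/* Which backend a call would reach (illustrative names only). */
enum copy_path { PATH_EINVAL, PATH_EXDEV, PATH_FS_COPY, PATH_REMAP, PATH_SPLICE };

/* Hypothetical summary of the two files being copied. */
struct copy_req {
	bool out_has_copy_op;	/* file_out->f_op->copy_file_range set */
	bool same_copy_op;	/* in and out share ->copy_file_range */
	bool in_has_remap_op;	/* file_in->f_op->remap_file_range set */
	bool same_sb;		/* both inodes on the same superblock */
};

static enum copy_path model_dispatch(const struct copy_req *r,
				     unsigned int flags)
{
	bool splice = flags & COPY_FILE_SPLICE;

	if (flags & ~COPY_FILE_SPLICE)
		return PATH_EINVAL;

	/* Cross-sb rule from generic_copy_file_checks(). */
	if (splice) {
		/* cross sb splice is allowed */
	} else if (r->out_has_copy_op) {
		if (!r->same_copy_op)
			return PATH_EXDEV;
	} else if (!r->same_sb) {
		return PATH_EXDEV;
	}

	/* Branch order in vfs_copy_file_range(); splice skips both. */
	if (!splice && r->out_has_copy_op)
		return PATH_FS_COPY;
	if (!splice && r->in_has_remap_op && r->same_sb)
		return PATH_REMAP;
	return PATH_SPLICE;
}
```

With flags == 0 the model takes exactly the pre-patch branches; COPY_FILE_SPLICE only bypasses the cross-sb rejection and the fs copy and remap branches.
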
Fixes: 868f9f2f8e00 ("vfs: fix copy_file_range() regression in cross-fs copies")
Tested-by: Namjae Jeon <linkinjeon@kernel.org>
Tested-by: Luis Henriques <lhenriques@suse.de>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---

Hi Linus,

I've tried Al, but he seems to be AFK, so since you ended up applying
the regressing commit, I might as well send you the fix as well.

I intentionally chose a fix "for dummies" because I'd like to end this
copy_file_range() regression streak.

I ran the copy_range fstests group on ext4/xfs/overlay to verify there
are no regressions on local filesystems, and on nfsv3/nfsv4 to test
server-side copy.

I also patched copy_file_range() locally to test the "dumb" fallback
code on local filesystems.

Namjae tested ksmbd.

Please apply.

Thanks,
Amir.

Changes since v1:
- Added Tested-by's

 fs/ksmbd/vfs.c     |  6 +++---
 fs/nfsd/vfs.c      |  4 ++--
 fs/read_write.c    | 19 +++++++++++++++----
 include/linux/fs.h |  8 ++++++++
 4 files changed, 28 insertions(+), 9 deletions(-)

diff --git a/fs/ksmbd/vfs.c b/fs/ksmbd/vfs.c
index 8de970d6146f..94b8ed4ef870 100644
--- a/fs/ksmbd/vfs.c
+++ b/fs/ksmbd/vfs.c
@@ -1794,9 +1794,9 @@ int ksmbd_vfs_copy_file_ranges(struct ksmbd_work *work,
 		ret = vfs_copy_file_range(src_fp->filp, src_off,
 					  dst_fp->filp, dst_off, len, 0);
 		if (ret == -EOPNOTSUPP || ret == -EXDEV)
-			ret = generic_copy_file_range(src_fp->filp, src_off,
-						      dst_fp->filp, dst_off,
-						      len, 0);
+			ret = vfs_copy_file_range(src_fp->filp, src_off,
+						  dst_fp->filp, dst_off, len,
+						  COPY_FILE_SPLICE);
 		if (ret < 0)
 			return ret;
 
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index f650afedd67f..5cf11cde51f8 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -596,8 +596,8 @@ ssize_t nfsd_copy_file_range(struct file *src, u64 src_pos, struct file *dst,
 	ret = vfs_copy_file_range(src, src_pos, dst, dst_pos, count, 0);
 
 	if (ret == -EOPNOTSUPP || ret == -EXDEV)
-		ret = generic_copy_file_range(src, src_pos, dst, dst_pos,
-					      count, 0);
+		ret = vfs_copy_file_range(src, src_pos, dst, dst_pos, count,
+					  COPY_FILE_SPLICE);
 	return ret;
 }
 
diff --git a/fs/read_write.c b/fs/read_write.c
index 328ce8cf9a85..24b9668d6377 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1388,6 +1388,8 @@ ssize_t generic_copy_file_range(struct file *file_in, loff_t pos_in,
 				struct file *file_out, loff_t pos_out,
 				size_t len, unsigned int flags)
 {
+	lockdep_assert(sb_write_started(file_inode(file_out)->i_sb));
+
 	return do_splice_direct(file_in, &pos_in, file_out, &pos_out,
 				len > MAX_RW_COUNT ? MAX_RW_COUNT : len, 0);
 }
@@ -1424,7 +1426,9 @@ static int generic_copy_file_checks(struct file *file_in, loff_t pos_in,
 	 * and several different sets of file_operations, but they all end up
 	 * using the same ->copy_file_range() function pointer.
 	 */
-	if (file_out->f_op->copy_file_range) {
+	if (flags & COPY_FILE_SPLICE) {
+		/* cross sb splice is allowed */
+	} else if (file_out->f_op->copy_file_range) {
 		if (file_in->f_op->copy_file_range !=
 		    file_out->f_op->copy_file_range)
 			return -EXDEV;
@@ -1474,8 +1478,9 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
 			    size_t len, unsigned int flags)
 {
 	ssize_t ret;
+	bool splice = flags & COPY_FILE_SPLICE;
 
-	if (flags != 0)
+	if (flags & ~COPY_FILE_SPLICE)
 		return -EINVAL;
 
 	ret = generic_copy_file_checks(file_in, pos_in, file_out, pos_out, &len,
@@ -1501,14 +1506,14 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
 	 * same sb using clone, but for filesystems where both clone and copy
 	 * are supported (e.g. nfs,cifs), we only call the copy method.
 	 */
-	if (file_out->f_op->copy_file_range) {
+	if (!splice && file_out->f_op->copy_file_range) {
 		ret = file_out->f_op->copy_file_range(file_in, pos_in,
 						      file_out, pos_out,
 						      len, flags);
 		goto done;
 	}
 
-	if (file_in->f_op->remap_file_range &&
+	if (!splice && file_in->f_op->remap_file_range &&
 	    file_inode(file_in)->i_sb == file_inode(file_out)->i_sb) {
 		ret = file_in->f_op->remap_file_range(file_in, pos_in,
 				file_out, pos_out,
@@ -1528,6 +1533,8 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
 	 * consistent story about which filesystems support copy_file_range()
 	 * and which filesystems do not, that will allow userspace tools to
 	 * make consistent desicions w.r.t using copy_file_range().
+	 *
+	 * We also get here if caller (e.g. nfsd) requested COPY_FILE_SPLICE.
 	 */
 	ret = generic_copy_file_range(file_in, pos_in, file_out, pos_out, len,
 				      flags);
@@ -1582,6 +1589,10 @@ SYSCALL_DEFINE6(copy_file_range, int, fd_in, loff_t __user *, off_in,
 		pos_out = f_out.file->f_pos;
 	}
 
+	ret = -EINVAL;
+	if (flags != 0)
+		goto out;
+
 	ret = vfs_copy_file_range(f_in.file, pos_in, f_out.file, pos_out, len,
 				  flags);
 	if (ret > 0) {
diff --git a/include/linux/fs.h b/include/linux/fs.h
index e654435f1651..59ae95ddb679 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2089,6 +2089,14 @@ struct dir_context {
  */
 #define REMAP_FILE_ADVISORY (REMAP_FILE_CAN_SHORTEN)
 
+/*
+ * These flags control the behavior of vfs_copy_file_range().
+ * They are not available to the user via syscall.
+ *
+ * COPY_FILE_SPLICE: call splice direct instead of fs clone/copy ops
+ */
+#define COPY_FILE_SPLICE (1 << 0)
+
 struct iov_iter;
 struct io_uring_cmd;
 
--
2.25.1
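From userspace nothing changes: the syscall hunk rejects any nonzero flags before calling vfs_copy_file_range(), so COPY_FILE_SPLICE stays in-kernel only. A minimal sketch using the glibc copy_file_range(2) wrapper (glibc 2.27 or later assumed); copy_prefix() is a hypothetical helper name, and flags = 1 is simply the bit COPY_FILE_SPLICE occupies internally.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Copy up to len bytes from offset 0 of fd_in to offset 0 of fd_out.
 * Userspace must pass flags == 0; the syscall rejects anything else
 * with EINVAL. Returns the number of bytes copied, or -errno.
 */
static long copy_prefix(int fd_in, int fd_out, size_t len, unsigned int flags)
{
	loff_t off_in = 0, off_out = 0;
	size_t done = 0;

	while (done < len) {
		ssize_t n = copy_file_range(fd_in, &off_in, fd_out, &off_out,
					    len - done, flags);
		if (n < 0)
			return -errno;
		if (n == 0)
			break;	/* source hit EOF */
		done += (size_t)n;
	}
	return (long)done;
}
```

With flags = 0 the kernel is free to use a filesystem copy, a clone, or the splice fallback; with flags = 1 the call fails with EINVAL before any of those paths is considered.
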
* Re: [PATCH v2] vfs: fix copy_file_range() averts filesystem freeze protection
From: Amir Goldstein @ 2022-11-24 9:54 UTC
To: Linus Torvalds
Cc: Al Viro, Namjae Jeon, Luis Henriques, Olga Kornievskaia,
Jan Kara, linux-fsdevel, linux-cifs, linux-nfs, Luis Henriques

On Thu, Nov 17, 2022 at 10:53 PM Amir Goldstein <amir73il@gmail.com> wrote:
> Please apply.
>
Ping.
Happy Thanksgiving!
Amir.
* Re: [PATCH v2] vfs: fix copy_file_range() averts filesystem freeze protection
From: Al Viro @ 2022-11-25 5:36 UTC
To: Amir Goldstein
Cc: Linus Torvalds, Namjae Jeon, Luis Henriques, Olga Kornievskaia,
Jan Kara, linux-fsdevel, linux-cifs, linux-nfs, Luis Henriques

On Thu, Nov 17, 2022 at 10:52:49PM +0200, Amir Goldstein wrote:
> Fixes: 868f9f2f8e00 ("vfs: fix copy_file_range() regression in cross-fs copies")
> Tested-by: Namjae Jeon <linkinjeon@kernel.org>
> Tested-by: Luis Henriques <lhenriques@suse.de>
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Applied...
* Re: [PATCH v2] vfs: fix copy_file_range() averts filesystem freeze protection
From: Linus Torvalds @ 2022-11-27 20:49 UTC
To: Amir Goldstein
Cc: Al Viro, Namjae Jeon, Luis Henriques, Olga Kornievskaia,
Jan Kara, linux-fsdevel, linux-cifs, linux-nfs, Luis Henriques

Ok, this is finally in my tree now. Thanks,
Linus