* [GIT PULL] Btrfs updates for 5.18 @ 2022-03-21 21:33 David Sterba 2022-03-22 2:35 ` Linus Torvalds ` (2 more replies) 0 siblings, 3 replies; 9+ messages in thread From: David Sterba @ 2022-03-21 21:33 UTC (permalink / raw) To: torvalds; +Cc: David Sterba, linux-btrfs, linux-kernel Hi, this update contains feature updates, performance improvements, preparatory and core work and some related VFS updates. Please pull, thanks. Features: - encoded read/write ioctls, allows user space to read or write raw data directly to extents (now compressed, encrypted in the future), will be used by send/receive v2 where it saves processing time - zoned mode now works with metadata DUP (the mkfs.btrfs default) - error message header updates: - print error state: transaction abort, other error, log tree errors - print transient filesystem state: remount, device replace, ignored checksum verifications - tree-checker: verify the transaction id of the to-be-written dirty extent buffer Performance improvements: - fsync speedups - directory logging speedups (up to -90% run time) - avoid logging all directory changes during renames (up to -60% run time) - avoid inode logging during rename and link when possible (up to -60% run time) - prepare extents to be logged before locking a log tree path (throughput +7%) - stop copying old file extents when doing a full fsync () - improved logging of old extents after truncate Core, fixes: - improved stale device identification by dev_t and not just path (for devices that are behind other layers like device mapper) - continued extent tree v2 preparatory work - disable features that won't work yet - add wrappers and abstractions for new tree roots - improved error handling - add super block write annotations around background block group reclaim - fix device scanning messages potentially accessing stale pointer - cleanups and refactoring VFS: - allow reflinks/deduplication from two different mounts of the same filesystem - export and add helpers for read/write range verification, for the encoded ioctls ---------------------------------------------------------------- The following changes since commit 09688c0166e76ce2fb85e86b9d99be8b0084cdf9: Linux 5.17-rc8 (2022-03-13 13:23:37 -0700) are available in the Git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git for-5.18-tag for you to fetch changes up to d3e29967079c522ce1c5cab0e9fab2c280b977eb: btrfs: zoned: put block group after final usage (2022-03-14 13:13:54 +0100) ---------------------------------------------------------------- Anand Jain (6): btrfs: simplify fs_devices member access in btrfs_init_dev_replace_tgtdev btrfs: harden identification of a stale device btrfs: match stale devices by dev_t btrfs: add device major-minor info in the struct btrfs_device btrfs: use dev_t to match device in device_matched btrfs: cleanup temporary variables when finding rotational device status David Sterba (1): btrfs: replace BUILD_BUG_ON by static_assert Dongliang Mu (1): btrfs: don't access possibly stale fs_info data in device_list_add Dāvis Mosāns (1): btrfs: add lzo workspace buffer length constants Filipe Manana (28): btrfs: remove write and wait of struct walk_control btrfs: don't log unnecessary boundary keys when logging directory btrfs: put initial index value of a directory in a constant btrfs: stop copying old dir items when logging a directory btrfs: stop trying to log subdirectories created in past transactions btrfs: add helper to delete a dir entry from a log tree btrfs: pass the dentry to btrfs_log_new_name() instead of the inode btrfs: avoid logging all directory changes during renames btrfs: stop doing unnecessary log updates during a rename btrfs: avoid inode logging during rename and link when possible btrfs: use single variable to track return value at btrfs_log_inode() btrfs: remove unnecessary leaf free space checks when pushing items btrfs: avoid unnecessary COW of leaves when deleting items from a leaf btrfs: avoid unnecessary computation when deleting items from a leaf btrfs: remove constraint on number of visited leaves when replacing extents btrfs: remove useless path release in the fast fsync path btrfs: prepare extents to be logged before locking a log tree path btrfs: stop checking for NULL return from btrfs_get_extent() btrfs: fix lost error return value when reading a data page btrfs: remove no longer used counter when reading data page btrfs: assert we have a write lock when removing and replacing extent maps btrfs: stop copying old file extents when doing a full fsync btrfs: hold on to less memory when logging checksums during full fsync btrfs: voluntarily relinquish cpu when doing a full fsync btrfs: reset last_reflink_trans after fsyncing inode btrfs: fix unexpected error path when reflinking an inline extent btrfs: deal with unexpected extent type during reflinking btrfs: add and use helper for unlinking inode during log replay Jiapeng Chong (2): btrfs: zoned: remove redundant initialization of to_add btrfs: scrub: remove redundant initialization of increment Johannes Thumshirn (5): btrfs: zoned: make zone activation multi stripe capable btrfs: zoned: make zone finishing multi stripe capable btrfs: zoned: prepare for allowing DUP on zoned btrfs: zoned: allow DUP on meta-data block groups btrfs: stop checking for NULL return from btrfs_get_extent_fiemap() Josef Bacik (27): btrfs: add definition for EXTENT_TREE_V2 btrfs: disable balance for extent tree v2 for now btrfs: disable device manipulation ioctl's EXTENT_TREE_V2 btrfs: disable qgroups in extent tree v2 btrfs: disable scrub for extent-tree-v2 btrfs: disable snapshot creation/deletion for extent tree v2 btrfs: disable space cache related mount options for extent tree v2 btrfs: tree-checker: don't fail on empty extent roots for extent tree v2 btrfs: abstract out loading the tree root btrfs: add code to support the block group root btrfs: add support for multiple global roots btrfs: make search_csum_tree return 0 if we get -EFBIG btrfs: handle csum lookup errors properly on reads btrfs: check correct bio in finish_compressed_bio_read btrfs: remove the bio argument from finish_compressed_bio_read btrfs: track compressed bio errors as blk_status_t btrfs: do not double complete bio on errors during compressed reads btrfs: do not try to repair bio that has no mirror set btrfs: do not clean up repair bio if submit fails btrfs: pass btrfs_fs_info for deleting snapshots and cleaner btrfs: pass btrfs_fs_info to btrfs_recover_relocation btrfs: remove the cross file system checks from remap fs: allow cross-vfsmount reflink/dedupe btrfs: remove BUG_ON(ret) in alloc_reserved_tree_block btrfs: add a alloc_reserved_extent helper btrfs: remove last_ref from the extent freeing code btrfs: factor out do_free_extent_accounting helper Minghao Chi (1): btrfs: send: remove redundant ret variable in fs_path_copy Naohiro Aota (1): btrfs: zoned: mark relocation as writing Niels Dossche (2): btrfs: extend locking to all space_info members accesses btrfs: add lockdep_assert_held to need_preemptive_reclaim Nikolay Borisov (3): btrfs: move missing device handling in a dedicate function btrfs: move QUOTA_ENABLED check to rescan_should_stop from btrfs_qgroup_rescan_worker btrfs: zoned: put block group after final usage Omar Sandoval (10): fs: export rw_verify_area() fs: export variant of generic_write_checks without iov_iter btrfs: don't advance offset for compressed bios in btrfs_csum_one_bio() btrfs: add ram_bytes and offset to btrfs_ordered_extent btrfs: support different disk extent size for delalloc btrfs: clean up cow_file_range_inline() btrfs: optionally extend i_size in cow_file_range_inline() btrfs: add definitions and documentation for encoded I/O ioctls btrfs: add BTRFS_IOC_ENCODED_READ ioctl btrfs: add BTRFS_IOC_ENCODED_WRITE Pankaj Raghav (1): btrfs: zoned: remove redundant assignment in btrfs_check_zoned_mode Qu Wenruo (4): btrfs: populate extent_map::generation when reading from disk btrfs: unify the error handling pattern for read_tree_block() btrfs: unify the error handling of btrfs_read_buffer() btrfs: verify the tranisd of the to-be-written dirty extent buffer Sahil Kang (2): btrfs: reuse existing pointers from btrfs_ioctl btrfs: reuse existing inode from btrfs_ioctl Sidong Yang (2): btrfs: qgroup: remove duplicated check in adding qgroup relations btrfs: qgroup: remove outdated TODO comments Sweet Tea Dorminy (1): btrfs: add filesystems state details to error messages fs/btrfs/backref.c | 7 +- fs/btrfs/block-group.c | 36 +- fs/btrfs/block-group.h | 1 + fs/btrfs/btrfs_inode.h | 42 +- fs/btrfs/compression.c | 63 +- fs/btrfs/compression.h | 10 +- fs/btrfs/ctree.c | 108 ++-- fs/btrfs/ctree.h | 83 ++- fs/btrfs/delalloc-space.c | 18 +- fs/btrfs/dev-replace.c | 18 +- fs/btrfs/disk-io.c | 219 +++++-- fs/btrfs/disk-io.h | 2 + fs/btrfs/extent-tree.c | 148 +++-- fs/btrfs/extent_io.c | 45 +- fs/btrfs/extent_map.c | 4 + fs/btrfs/file-item.c | 76 +-- fs/btrfs/file.c | 79 ++- fs/btrfs/free-space-tree.c | 2 + fs/btrfs/inode.c | 1183 +++++++++++++++++++++++++++++-------- fs/btrfs/ioctl.c | 309 ++++++++-- fs/btrfs/lzo.c | 11 +- fs/btrfs/ordered-data.c | 132 ++--- fs/btrfs/ordered-data.h | 25 +- fs/btrfs/print-tree.c | 5 +- fs/btrfs/qgroup.c | 72 ++- fs/btrfs/reflink.c | 43 +- fs/btrfs/relocation.c | 11 +- fs/btrfs/scrub.c | 2 +- fs/btrfs/send.c | 11 +- fs/btrfs/send.h | 2 +- fs/btrfs/space-info.c | 5 +- fs/btrfs/super.c | 96 ++- fs/btrfs/sysfs.c | 15 +- fs/btrfs/tests/extent-map-tests.c | 2 + fs/btrfs/transaction.c | 19 +- fs/btrfs/transaction.h | 2 +- fs/btrfs/tree-checker.c | 35 +- fs/btrfs/tree-log.c | 982 ++++++++++++++++++------------ fs/btrfs/tree-log.h | 7 +- fs/btrfs/volumes.c | 147 ++--- fs/btrfs/volumes.h | 7 +- fs/btrfs/zoned.c | 167 ++++-- fs/internal.h | 5 - fs/ioctl.c | 4 - fs/read_write.c | 34 +- fs/remap_range.c | 7 +- include/linux/fs.h | 2 + include/trace/events/btrfs.h | 1 + include/uapi/linux/btrfs.h | 133 +++++ include/uapi/linux/btrfs_tree.h | 3 + 50 files changed, 3109 insertions(+), 1331 deletions(-) ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [GIT PULL] Btrfs updates for 5.18 2022-03-21 21:33 [GIT PULL] Btrfs updates for 5.18 David Sterba @ 2022-03-22 2:35 ` Linus Torvalds 2022-03-22 18:23 ` Linus Torvalds 2022-03-22 18:32 ` pr-tracker-bot 2 siblings, 0 replies; 9+ messages in thread From: Linus Torvalds @ 2022-03-22 2:35 UTC (permalink / raw) To: David Sterba; +Cc: linux-btrfs, lkml On Mon, Mar 21, 2022 at 2:37 PM David Sterba <dsterba@suse.com> wrote: > > this update contains feature updates, performance improvements, > preparatory and core work and some related VFS updates. Please pull, > thanks. Hmm. This was marked as spam for me. Not sure why - the email looks fine, and has proper DKIM and SPF records. Linus ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [GIT PULL] Btrfs updates for 5.18 2022-03-21 21:33 [GIT PULL] Btrfs updates for 5.18 David Sterba 2022-03-22 2:35 ` Linus Torvalds @ 2022-03-22 18:23 ` Linus Torvalds 2022-03-22 20:55 ` Josef Bacik 2022-03-22 22:09 ` Amir Goldstein 2022-03-22 18:32 ` pr-tracker-bot 2 siblings, 2 replies; 9+ messages in thread From: Linus Torvalds @ 2022-03-22 18:23 UTC (permalink / raw) To: David Sterba, linux-fsdevel; +Cc: linux-btrfs, lkml On Mon, Mar 21, 2022 at 2:37 PM David Sterba <dsterba@suse.com> wrote: > > - allow reflinks/deduplication from two different mounts of the same > filesystem So I've pulled this, and it looks ok, but I'm not getting the warm and fuzzies. In particular, I'm not seeing any commentary about different filesystems for this. There are several filesystems that use that ->remap_file_range() operation, so these relaxed rules don't just affect btrfs. Yes, yes, checking for i_sb matching does seem sensible, but I'd *really* have liked some sign that people checked with other filesystem maintainers and this is ok for all of them, and they didn't make assumptions about "always same mount" rather than "always same filesystem". This affects at least cifs, nfs, overlayfs and ocfs2. Adding fsdevel, and pointing to that - if (src_file->f_path.mnt != dst_file->f_path.mnt) + if (file_inode(src_file)->i_sb != file_inode(dst_file)->i_sb) change in commit 9f5710bbfd30 ("fs: allow cross-vfsmount reflink/dedupe") And yes, there was already a comment about "Practically, they only need to be on the same file system" from before that matches the new behavior, but hey, comments have been known to be wrong in the past too. And yes, I'm also aware that do_clone_file_range() already had that exact same i_sb check and it's not new, but since ioctl_file_clone() cheched for the mount path, I don't think you could actually reach it without being on the same mount. And while discussing these sanity checks: wouldn't it make sense to check that *both* the source file and the destination file support that remap_file_range() op, and it's the same op? Yes, yes, it probably always is in practice, but I could imagine some type confusion thing. So wouldn't it be nice to also have something like if (dst_file->f_op != src_file->f_op) goto out_drop_write; in there? I'm thinking "how about dedupe from a directory to a regular file" kind of craziness... Linus ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [GIT PULL] Btrfs updates for 5.18 2022-03-22 18:23 ` Linus Torvalds @ 2022-03-22 20:55 ` Josef Bacik 2022-03-22 21:15 ` Linus Torvalds 2022-03-22 21:19 ` Darrick J. Wong 2022-03-22 22:09 ` Amir Goldstein 1 sibling, 2 replies; 9+ messages in thread From: Josef Bacik @ 2022-03-22 20:55 UTC (permalink / raw) To: Linus Torvalds; +Cc: David Sterba, linux-fsdevel, linux-btrfs, lkml On Tue, Mar 22, 2022 at 11:23:21AM -0700, Linus Torvalds wrote: > On Mon, Mar 21, 2022 at 2:37 PM David Sterba <dsterba@suse.com> wrote: > > > > - allow reflinks/deduplication from two different mounts of the same > > filesystem > > So I've pulled this, and it looks ok, but I'm not getting the warm and fuzzies. > > In particular, I'm not seeing any commentary about different > filesystems for this. > > There are several filesystems that use that ->remap_file_range() > operation, so these relaxed rules don't just affect btrfs. > > Yes, yes, checking for i_sb matching does seem sensible, but I'd > *really* have liked some sign that people checked with other > filesystem maintainers and this is ok for all of them, and they didn't > make assumptions about "always same mount" rather than "always same > filesystem". > > This affects at least cifs, nfs, overlayfs and ocfs2. I had a talk with Darrick Wong about this on IRC, and his Reviewed-by is on the patch. This did surprise nfsd when xfstests started failing, but talking with Bruce he didn't complain once he understood what was going on. Believe me I have 0 interest in getting the other maintainers upset with me by sneaking something by them, I made sure to run it by people first, tho I probably should have checked with people directly other than Darrick. > > Adding fsdevel, and pointing to that > > - if (src_file->f_path.mnt != dst_file->f_path.mnt) > + if (file_inode(src_file)->i_sb != file_inode(dst_file)->i_sb) > > change in commit 9f5710bbfd30 ("fs: allow cross-vfsmount reflink/dedupe") > > And yes, there was already a comment about "Practically, they only > need to be on the same file system" from before that matches the new > behavior, but hey, comments have been known to be wrong in the past > too. > > And yes, I'm also aware that do_clone_file_range() already had that > exact same i_sb check and it's not new, but since ioctl_file_clone() > cheched for the mount path, I don't think you could actually reach it > without being on the same mount. > > And while discussing these sanity checks: wouldn't it make sense to > check that *both* the source file and the destination file support > that remap_file_range() op, and it's the same op? > > Yes, yes, it probably always is in practice, but I could imagine some > type confusion thing. So wouldn't it be nice to also have something > like > > if (dst_file->f_op != src_file->f_op) > goto out_drop_write; > > in there? I'm thinking "how about dedupe from a directory to a regular > file" kind of craziness... > This more fine-grained checking is handled by generic_remap_file_range_prep() to make sure we don't try to dedup a directory or pipe or some other nonsense. Thanks, Josef ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [GIT PULL] Btrfs updates for 5.18 2022-03-22 20:55 ` Josef Bacik @ 2022-03-22 21:15 ` Linus Torvalds 2022-03-22 21:19 ` Darrick J. Wong 1 sibling, 0 replies; 9+ messages in thread From: Linus Torvalds @ 2022-03-22 21:15 UTC (permalink / raw) To: Josef Bacik; +Cc: David Sterba, linux-fsdevel, linux-btrfs, lkml On Tue, Mar 22, 2022 at 1:55 PM Josef Bacik <josef@toxicpanda.com> wrote: > > This more fine-grained checking is handled by generic_remap_file_range_prep() to > make sure we don't try to dedup a directory or pipe or some other nonsense. Yeah, that does seem to take care of the obvious cases, and requires that both files be regular files at least. I'm still not a huge fan of how we use the 'f_op->remap_file_range' of the source file, without really checking that the destination file is ok with remap_file_range. They end up _superficially_ very similar, yes, but I can point to filesystems that use different f_op's for different files. And some of those depend on - wait for it - how the filesystem was mounted. See for example cifs: if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_NO_BRL) file->f_op = &cifs_file_direct_nobrl_ops; else file->f_op = &cifs_file_direct_ops; so I'm just thinking "what about doing remap_file_range between two regular files that act differently - either due to mount options or other details". In that cifs example, read_iter and write_iter are different. Yes, copy/remap_file_range uses the same function pointer, but it still worries me about copying from a mount to another if there might be different semantics for IO between them. I think in this cifs case, the superblock ends up being the same, so the mnt_cifs_flags end up being the same, and the above is not actually a real issue. But conceptually I could imagine cases where that wasn't the case - or even cases like /proc that have fundamentally different file operations for different files) Linus ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [GIT PULL] Btrfs updates for 5.18 2022-03-22 20:55 ` Josef Bacik 2022-03-22 21:15 ` Linus Torvalds @ 2022-03-22 21:19 ` Darrick J. Wong 2022-03-24 10:10 ` Amir Goldstein 1 sibling, 1 reply; 9+ messages in thread From: Darrick J. Wong @ 2022-03-22 21:19 UTC (permalink / raw) To: Josef Bacik Cc: Linus Torvalds, David Sterba, linux-fsdevel, linux-btrfs, lkml On Tue, Mar 22, 2022 at 04:55:17PM -0400, Josef Bacik wrote: > On Tue, Mar 22, 2022 at 11:23:21AM -0700, Linus Torvalds wrote: > > On Mon, Mar 21, 2022 at 2:37 PM David Sterba <dsterba@suse.com> wrote: > > > > > > - allow reflinks/deduplication from two different mounts of the same > > > filesystem > > > > So I've pulled this, and it looks ok, but I'm not getting the warm and fuzzies. > > > > In particular, I'm not seeing any commentary about different > > filesystems for this. > > > > There are several filesystems that use that ->remap_file_range() > > operation, so these relaxed rules don't just affect btrfs. > > > > Yes, yes, checking for i_sb matching does seem sensible, but I'd > > *really* have liked some sign that people checked with other > > filesystem maintainers and this is ok for all of them, and they didn't > > make assumptions about "always same mount" rather than "always same > > filesystem". > > > > > This affects at least cifs, nfs, overlayfs and ocfs2. > > I had a talk with Darrick Wong about this on IRC, and his Reviewed-by is on the > patch. This did surprise nfsd when xfstests started failing, but talking with > Bruce he didn't complain once he understood what was going on. FWIW, I remember talking about this with Bruce and (probably Anna too) during a hallway BOF at the last LSFMMBPFBBQ that I went to, which was 2018(?) At the time, I think we resolved that nfs42_remap_file_range was capable of detecting and dealing with unsupported requests, so a direct comparison of the ->remap_file_range or ->f_op wasn't necessary for them. > Believe me I > have 0 interest in getting the other maintainers upset with me by sneaking > something by them, I made sure to run it by people first, tho I probably should > have checked with people directly other than Darrick. I /am/ a little curious what Steve French has to say w.r.t CIFS. AFAICT overlayfs passes the request down to the appropriate fs under-layer, so its correctness mostly depends on the under-layer's implementation. But I'll let Amir or someone chime in on that. ;) As for ocfs2, back when I added support for ->remap_file_range to ocfs2, cross-mount reflink and dedupe worked fine, or at least as well as anything works on ocfs2. (XFS has always supported cross-mount remappings.) > > > > Adding fsdevel, and pointing to that > > > > - if (src_file->f_path.mnt != dst_file->f_path.mnt) > > + if (file_inode(src_file)->i_sb != file_inode(dst_file)->i_sb) > > > > change in commit 9f5710bbfd30 ("fs: allow cross-vfsmount reflink/dedupe") > > > > And yes, there was already a comment about "Practically, they only > > need to be on the same file system" from before that matches the new > > behavior, but hey, comments have been known to be wrong in the past > > too. > > > > And yes, I'm also aware that do_clone_file_range() already had that > > exact same i_sb check and it's not new, but since ioctl_file_clone() > > cheched for the mount path, I don't think you could actually reach it > > without being on the same mount. > > > > And while discussing these sanity checks: wouldn't it make sense to > > check that *both* the source file and the destination file support > > that remap_file_range() op, and it's the same op? > > > > Yes, yes, it probably always is in practice, but I could imagine some > > type confusion thing. So wouldn't it be nice to also have something > > like > > > > if (dst_file->f_op != src_file->f_op) > > goto out_drop_write; > > > > in there? I'm thinking "how about dedupe from a directory to a regular > > file" kind of craziness... > > > > This more fine-grained checking is handled by generic_remap_file_range_prep() to > make sure we don't try to dedup a directory or pipe or some other nonsense. Yes. The VFS only allows remapping between regular files. --D > Thanks, > > Josef ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [GIT PULL] Btrfs updates for 5.18 2022-03-22 21:19 ` Darrick J. Wong @ 2022-03-24 10:10 ` Amir Goldstein 0 siblings, 0 replies; 9+ messages in thread From: Amir Goldstein @ 2022-03-24 10:10 UTC (permalink / raw) To: Darrick J. Wong Cc: Josef Bacik, Linus Torvalds, David Sterba, linux-fsdevel, linux-btrfs, lkml On Thu, Mar 24, 2022 at 1:35 AM Darrick J. Wong <djwong@kernel.org> wrote: > > On Tue, Mar 22, 2022 at 04:55:17PM -0400, Josef Bacik wrote: > > On Tue, Mar 22, 2022 at 11:23:21AM -0700, Linus Torvalds wrote: > > > On Mon, Mar 21, 2022 at 2:37 PM David Sterba <dsterba@suse.com> wrote: > > > > > > > > - allow reflinks/deduplication from two different mounts of the same > > > > filesystem > > > > > > So I've pulled this, and it looks ok, but I'm not getting the warm and fuzzies. > > > > > > In particular, I'm not seeing any commentary about different > > > filesystems for this. > > > > > > There are several filesystems that use that ->remap_file_range() > > > operation, so these relaxed rules don't just affect btrfs. > > > > > > Yes, yes, checking for i_sb matching does seem sensible, but I'd > > > *really* have liked some sign that people checked with other > > > filesystem maintainers and this is ok for all of them, and they didn't > > > make assumptions about "always same mount" rather than "always same > > > filesystem". > > > > > > > > This affects at least cifs, nfs, overlayfs and ocfs2. > > > > I had a talk with Darrick Wong about this on IRC, and his Reviewed-by is on the > > patch. This did surprise nfsd when xfstests started failing, but talking with > > Bruce he didn't complain once he understood what was going on. > > FWIW, I remember talking about this with Bruce and (probably Anna too) > during a hallway BOF at the last LSFMMBPFBBQ that I went to, which was > 2018(?) At the time, I think we resolved that nfs42_remap_file_range > was capable of detecting and dealing with unsupported requests, so a > direct comparison of the ->remap_file_range or ->f_op wasn't necessary > for them. > > > Believe me I > > have 0 interest in getting the other maintainers upset with me by sneaking > > something by them, I made sure to run it by people first, tho I probably should > > have checked with people directly other than Darrick. > > I /am/ a little curious what Steve French has to say w.r.t CIFS. > > AFAICT overlayfs passes the request down to the appropriate fs > under-layer, so its correctness mostly depends on the under-layer's > implementation. But I'll let Amir or someone chime in on that. ;) > The thing is, overlayfs ALREADY does cross-mnt clone_file_range() on underlying layers. So if there was a bug with allowing cross mnt clones on xfs it would have been in the wild for a long time already. OTOH, overlayfs doesn't support nfs/cifs/ocfs2 as upper fs. If you mount an overlay with lower and upper layer on the same xfs/btrfs sb the original mnt of lower path and upper patch is irrelevant. Overlayfs uses different private mnt per layer anyway, so if the source file is on lower layer then even if originally the overlay mount was done with different upper/lower mounts of the same sb, clone via overlayfs would work. Allowing cross overlayfs mount clone makes very little difference. Thanks, Amir. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [GIT PULL] Btrfs updates for 5.18 2022-03-22 18:23 ` Linus Torvalds 2022-03-22 20:55 ` Josef Bacik @ 2022-03-22 22:09 ` Amir Goldstein 1 sibling, 0 replies; 9+ messages in thread From: Amir Goldstein @ 2022-03-22 22:09 UTC (permalink / raw) To: Linus Torvalds Cc: David Sterba, linux-fsdevel, linux-btrfs, lkml, Olga Kornievskaia, Steve French On Tue, Mar 22, 2022 at 10:11 PM Linus Torvalds <torvalds@linux-foundation.org> wrote: > > On Mon, Mar 21, 2022 at 2:37 PM David Sterba <dsterba@suse.com> wrote: > > > > - allow reflinks/deduplication from two different mounts of the same > > filesystem > > So I've pulled this, and it looks ok, but I'm not getting the warm and fuzzies. > > In particular, I'm not seeing any commentary about different > filesystems for this. > > There are several filesystems that use that ->remap_file_range() > operation, so these relaxed rules don't just affect btrfs. > > Yes, yes, checking for i_sb matching does seem sensible, but I'd > *really* have liked some sign that people checked with other > filesystem maintainers and this is ok for all of them, and they didn't > make assumptions about "always same mount" rather than "always same > filesystem". > > This affects at least cifs, nfs, overlayfs and ocfs2. overlayfs shouldn't have a problem with that change. IIUC, cifs would also gain from this, because clone is implemented on server side and even two different sb's could technically do server side duplicate_extents. Same goes for nfs v4.2. There was a lot of discussion on these aspects when cross server (i.e. cross sb) copy was implemented not so long ago. Relaxing cross-mnt clone is nothing compared to that. > > Adding fsdevel, and pointing to that > > - if (src_file->f_path.mnt != dst_file->f_path.mnt) > + if (file_inode(src_file)->i_sb != file_inode(dst_file)->i_sb) > > change in commit 9f5710bbfd30 ("fs: allow cross-vfsmount reflink/dedupe") > > And yes, there was already a comment about "Practically, they only > need to be on the same file system" from before that matches the new > behavior, but hey, comments have been known to be wrong in the past > too. As the one who left this comment I can say it is based only on common sense, similar to the rationale in this recent commit. If it is any help, overlayfs has been doing cross mnt clones since 913b86e92e1f vfs: allow vfs_clone_file_range() across mount points I left the comment because I did not need to take responsibility for changing user behavior at the time, but I do not see any immediate harm from the user behavior changes now. > > And yes, I'm also aware that do_clone_file_range() already had that > exact same i_sb check and it's not new, but since ioctl_file_clone() > cheched for the mount path, I don't think you could actually reach it > without being on the same mount. > > And while discussing these sanity checks: wouldn't it make sense to > check that *both* the source file and the destination file support > that remap_file_range() op, and it's the same op? > > Yes, yes, it probably always is in practice, but I could imagine some > type confusion thing. So wouldn't it be nice to also have something > like > > if (dst_file->f_op != src_file->f_op) > goto out_drop_write; > > in there? I'm thinking "how about dedupe from a directory to a regular > file" kind of craziness... Both S_ISDIR and !S_ISREG cases are already checked for both clone and dedupe on both files (twice in fact), so at least that is not a concern. There may be other reasons to worry about, but I can't think of any. Thanks, Amir. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [GIT PULL] Btrfs updates for 5.18 2022-03-21 21:33 [GIT PULL] Btrfs updates for 5.18 David Sterba 2022-03-22 2:35 ` Linus Torvalds 2022-03-22 18:23 ` Linus Torvalds @ 2022-03-22 18:32 ` pr-tracker-bot 2 siblings, 0 replies; 9+ messages in thread From: pr-tracker-bot @ 2022-03-22 18:32 UTC (permalink / raw) To: David Sterba; +Cc: torvalds, David Sterba, linux-btrfs, linux-kernel The pull request you sent on Mon, 21 Mar 2022 22:33:04 +0100: > git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git for-5.18-tag has been merged into torvalds/linux.git: https://git.kernel.org/torvalds/c/5191290407668028179f2544a11ae9b57f0bcf07 Thank you! -- Deet-doot-dot, I am a bot. https://korg.docs.kernel.org/prtracker.html ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2022-03-24 10:11 UTC | newest] Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2022-03-21 21:33 [GIT PULL] Btrfs updates for 5.18 David Sterba 2022-03-22 2:35 ` Linus Torvalds 2022-03-22 18:23 ` Linus Torvalds 2022-03-22 20:55 ` Josef Bacik 2022-03-22 21:15 ` Linus Torvalds 2022-03-22 21:19 ` Darrick J. Wong 2022-03-24 10:10 ` Amir Goldstein 2022-03-22 22:09 ` Amir Goldstein 2022-03-22 18:32 ` pr-tracker-bot
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).