linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [GIT PULL] Btrfs updates for 5.18
@ 2022-03-21 21:33 David Sterba
  2022-03-22  2:35 ` Linus Torvalds
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: David Sterba @ 2022-03-21 21:33 UTC (permalink / raw)
  To: torvalds; +Cc: David Sterba, linux-btrfs, linux-kernel

Hi,

this update contains feature updates, performance improvements,
preparatory and core work and some related VFS updates.  Please pull,
thanks.

Features:

- encoded read/write ioctls, allows user space to read or write raw data
  directly to extents (now compressed, encrypted in the future), will be
  used by send/receive v2 where it saves processing time

- zoned mode now works with metadata DUP (the mkfs.btrfs default)

- error message header updates:
  - print error state: transaction abort, other error, log tree errors
  - print transient filesystem state: remount, device replace, ignored
    checksum verifications

- tree-checker: verify the transaction id of the to-be-written dirty
  extent buffer

Performance improvements:

- fsync speedups
  - directory logging speedups (up to -90% run time)
  - avoid logging all directory changes during renames (up to -60% run
    time)
  - avoid inode logging during rename and link when possible (up to -60%
    run time)
  - prepare extents to be logged before locking a log tree path
    (throughput +7%)
  - stop copying old file extents when doing a full fsync ()
  - improved logging of old extents after truncate

Core, fixes:

- improved stale device identification by dev_t and not just path (for
  devices that are behind other layers like device mapper)

- continued extent tree v2 preparatory work
  - disable features that won't work yet
  - add wrappers and abstractions for new tree roots

- improved error handling

- add super block write annotations around background block group
  reclaim

- fix device scanning messages potentially accessing stale pointer

- cleanups and refactoring

VFS:

- allow reflinks/deduplication from two different mounts of the same
  filesystem

- export and add helpers for read/write range verification, for the
  encoded ioctls

----------------------------------------------------------------
The following changes since commit 09688c0166e76ce2fb85e86b9d99be8b0084cdf9:

  Linux 5.17-rc8 (2022-03-13 13:23:37 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git for-5.18-tag

for you to fetch changes up to d3e29967079c522ce1c5cab0e9fab2c280b977eb:

  btrfs: zoned: put block group after final usage (2022-03-14 13:13:54 +0100)

----------------------------------------------------------------
Anand Jain (6):
      btrfs: simplify fs_devices member access in btrfs_init_dev_replace_tgtdev
      btrfs: harden identification of a stale device
      btrfs: match stale devices by dev_t
      btrfs: add device major-minor info in the struct btrfs_device
      btrfs: use dev_t to match device in device_matched
      btrfs: cleanup temporary variables when finding rotational device status

David Sterba (1):
      btrfs: replace BUILD_BUG_ON by static_assert

Dongliang Mu (1):
      btrfs: don't access possibly stale fs_info data in device_list_add

Dāvis Mosāns (1):
      btrfs: add lzo workspace buffer length constants

Filipe Manana (28):
      btrfs: remove write and wait of struct walk_control
      btrfs: don't log unnecessary boundary keys when logging directory
      btrfs: put initial index value of a directory in a constant
      btrfs: stop copying old dir items when logging a directory
      btrfs: stop trying to log subdirectories created in past transactions
      btrfs: add helper to delete a dir entry from a log tree
      btrfs: pass the dentry to btrfs_log_new_name() instead of the inode
      btrfs: avoid logging all directory changes during renames
      btrfs: stop doing unnecessary log updates during a rename
      btrfs: avoid inode logging during rename and link when possible
      btrfs: use single variable to track return value at btrfs_log_inode()
      btrfs: remove unnecessary leaf free space checks when pushing items
      btrfs: avoid unnecessary COW of leaves when deleting items from a leaf
      btrfs: avoid unnecessary computation when deleting items from a leaf
      btrfs: remove constraint on number of visited leaves when replacing extents
      btrfs: remove useless path release in the fast fsync path
      btrfs: prepare extents to be logged before locking a log tree path
      btrfs: stop checking for NULL return from btrfs_get_extent()
      btrfs: fix lost error return value when reading a data page
      btrfs: remove no longer used counter when reading data page
      btrfs: assert we have a write lock when removing and replacing extent maps
      btrfs: stop copying old file extents when doing a full fsync
      btrfs: hold on to less memory when logging checksums during full fsync
      btrfs: voluntarily relinquish cpu when doing a full fsync
      btrfs: reset last_reflink_trans after fsyncing inode
      btrfs: fix unexpected error path when reflinking an inline extent
      btrfs: deal with unexpected extent type during reflinking
      btrfs: add and use helper for unlinking inode during log replay

Jiapeng Chong (2):
      btrfs: zoned: remove redundant initialization of to_add
      btrfs: scrub: remove redundant initialization of increment

Johannes Thumshirn (5):
      btrfs: zoned: make zone activation multi stripe capable
      btrfs: zoned: make zone finishing multi stripe capable
      btrfs: zoned: prepare for allowing DUP on zoned
      btrfs: zoned: allow DUP on meta-data block groups
      btrfs: stop checking for NULL return from btrfs_get_extent_fiemap()

Josef Bacik (27):
      btrfs: add definition for EXTENT_TREE_V2
      btrfs: disable balance for extent tree v2 for now
      btrfs: disable device manipulation ioctl's EXTENT_TREE_V2
      btrfs: disable qgroups in extent tree v2
      btrfs: disable scrub for extent-tree-v2
      btrfs: disable snapshot creation/deletion for extent tree v2
      btrfs: disable space cache related mount options for extent tree v2
      btrfs: tree-checker: don't fail on empty extent roots for extent tree v2
      btrfs: abstract out loading the tree root
      btrfs: add code to support the block group root
      btrfs: add support for multiple global roots
      btrfs: make search_csum_tree return 0 if we get -EFBIG
      btrfs: handle csum lookup errors properly on reads
      btrfs: check correct bio in finish_compressed_bio_read
      btrfs: remove the bio argument from finish_compressed_bio_read
      btrfs: track compressed bio errors as blk_status_t
      btrfs: do not double complete bio on errors during compressed reads
      btrfs: do not try to repair bio that has no mirror set
      btrfs: do not clean up repair bio if submit fails
      btrfs: pass btrfs_fs_info for deleting snapshots and cleaner
      btrfs: pass btrfs_fs_info to btrfs_recover_relocation
      btrfs: remove the cross file system checks from remap
      fs: allow cross-vfsmount reflink/dedupe
      btrfs: remove BUG_ON(ret) in alloc_reserved_tree_block
      btrfs: add a alloc_reserved_extent helper
      btrfs: remove last_ref from the extent freeing code
      btrfs: factor out do_free_extent_accounting helper

Minghao Chi (1):
      btrfs: send: remove redundant ret variable in fs_path_copy

Naohiro Aota (1):
      btrfs: zoned: mark relocation as writing

Niels Dossche (2):
      btrfs: extend locking to all space_info members accesses
      btrfs: add lockdep_assert_held to need_preemptive_reclaim

Nikolay Borisov (3):
      btrfs: move missing device handling in a dedicate function
      btrfs: move QUOTA_ENABLED check to rescan_should_stop from btrfs_qgroup_rescan_worker
      btrfs: zoned: put block group after final usage

Omar Sandoval (10):
      fs: export rw_verify_area()
      fs: export variant of generic_write_checks without iov_iter
      btrfs: don't advance offset for compressed bios in btrfs_csum_one_bio()
      btrfs: add ram_bytes and offset to btrfs_ordered_extent
      btrfs: support different disk extent size for delalloc
      btrfs: clean up cow_file_range_inline()
      btrfs: optionally extend i_size in cow_file_range_inline()
      btrfs: add definitions and documentation for encoded I/O ioctls
      btrfs: add BTRFS_IOC_ENCODED_READ ioctl
      btrfs: add BTRFS_IOC_ENCODED_WRITE

Pankaj Raghav (1):
      btrfs: zoned: remove redundant assignment in btrfs_check_zoned_mode

Qu Wenruo (4):
      btrfs: populate extent_map::generation when reading from disk
      btrfs: unify the error handling pattern for read_tree_block()
      btrfs: unify the error handling of btrfs_read_buffer()
      btrfs: verify the tranisd of the to-be-written dirty extent buffer

Sahil Kang (2):
      btrfs: reuse existing pointers from btrfs_ioctl
      btrfs: reuse existing inode from btrfs_ioctl

Sidong Yang (2):
      btrfs: qgroup: remove duplicated check in adding qgroup relations
      btrfs: qgroup: remove outdated TODO comments

Sweet Tea Dorminy (1):
      btrfs: add filesystems state details to error messages

 fs/btrfs/backref.c                |    7 +-
 fs/btrfs/block-group.c            |   36 +-
 fs/btrfs/block-group.h            |    1 +
 fs/btrfs/btrfs_inode.h            |   42 +-
 fs/btrfs/compression.c            |   63 +-
 fs/btrfs/compression.h            |   10 +-
 fs/btrfs/ctree.c                  |  108 ++--
 fs/btrfs/ctree.h                  |   83 ++-
 fs/btrfs/delalloc-space.c         |   18 +-
 fs/btrfs/dev-replace.c            |   18 +-
 fs/btrfs/disk-io.c                |  219 +++++--
 fs/btrfs/disk-io.h                |    2 +
 fs/btrfs/extent-tree.c            |  148 +++--
 fs/btrfs/extent_io.c              |   45 +-
 fs/btrfs/extent_map.c             |    4 +
 fs/btrfs/file-item.c              |   76 +--
 fs/btrfs/file.c                   |   79 ++-
 fs/btrfs/free-space-tree.c        |    2 +
 fs/btrfs/inode.c                  | 1183 +++++++++++++++++++++++++++++--------
 fs/btrfs/ioctl.c                  |  309 ++++++++--
 fs/btrfs/lzo.c                    |   11 +-
 fs/btrfs/ordered-data.c           |  132 ++---
 fs/btrfs/ordered-data.h           |   25 +-
 fs/btrfs/print-tree.c             |    5 +-
 fs/btrfs/qgroup.c                 |   72 ++-
 fs/btrfs/reflink.c                |   43 +-
 fs/btrfs/relocation.c             |   11 +-
 fs/btrfs/scrub.c                  |    2 +-
 fs/btrfs/send.c                   |   11 +-
 fs/btrfs/send.h                   |    2 +-
 fs/btrfs/space-info.c             |    5 +-
 fs/btrfs/super.c                  |   96 ++-
 fs/btrfs/sysfs.c                  |   15 +-
 fs/btrfs/tests/extent-map-tests.c |    2 +
 fs/btrfs/transaction.c            |   19 +-
 fs/btrfs/transaction.h            |    2 +-
 fs/btrfs/tree-checker.c           |   35 +-
 fs/btrfs/tree-log.c               |  982 ++++++++++++++++++------------
 fs/btrfs/tree-log.h               |    7 +-
 fs/btrfs/volumes.c                |  147 ++---
 fs/btrfs/volumes.h                |    7 +-
 fs/btrfs/zoned.c                  |  167 ++++--
 fs/internal.h                     |    5 -
 fs/ioctl.c                        |    4 -
 fs/read_write.c                   |   34 +-
 fs/remap_range.c                  |    7 +-
 include/linux/fs.h                |    2 +
 include/trace/events/btrfs.h      |    1 +
 include/uapi/linux/btrfs.h        |  133 +++++
 include/uapi/linux/btrfs_tree.h   |    3 +
 50 files changed, 3109 insertions(+), 1331 deletions(-)

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [GIT PULL] Btrfs updates for 5.18
  2022-03-21 21:33 [GIT PULL] Btrfs updates for 5.18 David Sterba
@ 2022-03-22  2:35 ` Linus Torvalds
  2022-03-22 18:23 ` Linus Torvalds
  2022-03-22 18:32 ` pr-tracker-bot
  2 siblings, 0 replies; 9+ messages in thread
From: Linus Torvalds @ 2022-03-22  2:35 UTC (permalink / raw)
  To: David Sterba; +Cc: linux-btrfs, lkml

On Mon, Mar 21, 2022 at 2:37 PM David Sterba <dsterba@suse.com> wrote:
>
> this update contains feature updates, performance improvements,
> preparatory and core work and some related VFS updates.  Please pull,
> thanks.

Hmm. This was marked as spam for me.

Not sure why - the email looks fine, and has proper DKIM and SPF records.

              Linus

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [GIT PULL] Btrfs updates for 5.18
  2022-03-21 21:33 [GIT PULL] Btrfs updates for 5.18 David Sterba
  2022-03-22  2:35 ` Linus Torvalds
@ 2022-03-22 18:23 ` Linus Torvalds
  2022-03-22 20:55   ` Josef Bacik
  2022-03-22 22:09   ` Amir Goldstein
  2022-03-22 18:32 ` pr-tracker-bot
  2 siblings, 2 replies; 9+ messages in thread
From: Linus Torvalds @ 2022-03-22 18:23 UTC (permalink / raw)
  To: David Sterba, linux-fsdevel; +Cc: linux-btrfs, lkml

On Mon, Mar 21, 2022 at 2:37 PM David Sterba <dsterba@suse.com> wrote:
>
> - allow reflinks/deduplication from two different mounts of the same
>   filesystem

So I've pulled this, and it looks ok, but I'm not getting the warm and fuzzies.

In particular, I'm not seeing any commentary about different
filesystems for this.

There are several filesystems that use that ->remap_file_range()
operation, so these relaxed rules don't just affect btrfs.

Yes, yes, checking for i_sb matching does seem sensible, but I'd
*really* have liked some sign that people checked with other
filesystem maintainers and this is ok for all of them, and they didn't
make assumptions about "always same mount" rather than "always same
filesystem".

This affects at least cifs, nfs, overlayfs and ocfs2.

Adding fsdevel, and pointing to that

-       if (src_file->f_path.mnt != dst_file->f_path.mnt)
+       if (file_inode(src_file)->i_sb != file_inode(dst_file)->i_sb)

change in commit 9f5710bbfd30 ("fs: allow cross-vfsmount reflink/dedupe")

And yes, there was already a comment about "Practically, they only
need to be on the same file system" from before that matches the new
behavior, but hey, comments have been known to be wrong in the past
too.

And yes, I'm also aware that do_clone_file_range() already had that
exact same i_sb check and it's not new, but since ioctl_file_clone()
cheched for the mount path, I don't think you could actually reach it
without being on the same mount.

And while discussing these sanity checks: wouldn't it make sense to
check that *both* the source file and the destination file support
that remap_file_range() op, and it's the same op?

Yes, yes, it probably always is in practice, but I could imagine some
type confusion thing. So wouldn't it be nice to also have something
like

    if (dst_file->f_op != src_file->f_op)
          goto out_drop_write;

in there? I'm thinking "how about dedupe from a directory to a regular
file" kind of craziness...

                 Linus

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [GIT PULL] Btrfs updates for 5.18
  2022-03-21 21:33 [GIT PULL] Btrfs updates for 5.18 David Sterba
  2022-03-22  2:35 ` Linus Torvalds
  2022-03-22 18:23 ` Linus Torvalds
@ 2022-03-22 18:32 ` pr-tracker-bot
  2 siblings, 0 replies; 9+ messages in thread
From: pr-tracker-bot @ 2022-03-22 18:32 UTC (permalink / raw)
  To: David Sterba; +Cc: torvalds, David Sterba, linux-btrfs, linux-kernel

The pull request you sent on Mon, 21 Mar 2022 22:33:04 +0100:

> git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git for-5.18-tag

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/5191290407668028179f2544a11ae9b57f0bcf07

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [GIT PULL] Btrfs updates for 5.18
  2022-03-22 18:23 ` Linus Torvalds
@ 2022-03-22 20:55   ` Josef Bacik
  2022-03-22 21:15     ` Linus Torvalds
  2022-03-22 21:19     ` Darrick J. Wong
  2022-03-22 22:09   ` Amir Goldstein
  1 sibling, 2 replies; 9+ messages in thread
From: Josef Bacik @ 2022-03-22 20:55 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: David Sterba, linux-fsdevel, linux-btrfs, lkml

On Tue, Mar 22, 2022 at 11:23:21AM -0700, Linus Torvalds wrote:
> On Mon, Mar 21, 2022 at 2:37 PM David Sterba <dsterba@suse.com> wrote:
> >
> > - allow reflinks/deduplication from two different mounts of the same
> >   filesystem
> 
> So I've pulled this, and it looks ok, but I'm not getting the warm and fuzzies.
> 
> In particular, I'm not seeing any commentary about different
> filesystems for this.
> 
> There are several filesystems that use that ->remap_file_range()
> operation, so these relaxed rules don't just affect btrfs.
> 
> Yes, yes, checking for i_sb matching does seem sensible, but I'd
> *really* have liked some sign that people checked with other
> filesystem maintainers and this is ok for all of them, and they didn't
> make assumptions about "always same mount" rather than "always same
> filesystem".
> 

> This affects at least cifs, nfs, overlayfs and ocfs2.

I had a talk with Darrick Wong about this on IRC, and his Reviewed-by is on the
patch.  This did surprise nfsd when xfstests started failing, but talking with
Bruce he didn't complain once he understood what was going on.  Believe me I
have 0 interest in getting the other maintainers upset with me by sneaking
something by them, I made sure to run it by people first, tho I probably should
have checked with people directly other than Darrick.

> 
> Adding fsdevel, and pointing to that
> 
> -       if (src_file->f_path.mnt != dst_file->f_path.mnt)
> +       if (file_inode(src_file)->i_sb != file_inode(dst_file)->i_sb)
> 
> change in commit 9f5710bbfd30 ("fs: allow cross-vfsmount reflink/dedupe")
> 
> And yes, there was already a comment about "Practically, they only
> need to be on the same file system" from before that matches the new
> behavior, but hey, comments have been known to be wrong in the past
> too.
> 
> And yes, I'm also aware that do_clone_file_range() already had that
> exact same i_sb check and it's not new, but since ioctl_file_clone()
> cheched for the mount path, I don't think you could actually reach it
> without being on the same mount.
> 
> And while discussing these sanity checks: wouldn't it make sense to
> check that *both* the source file and the destination file support
> that remap_file_range() op, and it's the same op?
> 
> Yes, yes, it probably always is in practice, but I could imagine some
> type confusion thing. So wouldn't it be nice to also have something
> like
> 
>     if (dst_file->f_op != src_file->f_op)
>           goto out_drop_write;
> 
> in there? I'm thinking "how about dedupe from a directory to a regular
> file" kind of craziness...
>

This more fine-grained checking is handled by generic_remap_file_range_prep() to
make sure we don't try to dedup a directory or pipe or some other nonsense.
Thanks,

Josef 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [GIT PULL] Btrfs updates for 5.18
  2022-03-22 20:55   ` Josef Bacik
@ 2022-03-22 21:15     ` Linus Torvalds
  2022-03-22 21:19     ` Darrick J. Wong
  1 sibling, 0 replies; 9+ messages in thread
From: Linus Torvalds @ 2022-03-22 21:15 UTC (permalink / raw)
  To: Josef Bacik; +Cc: David Sterba, linux-fsdevel, linux-btrfs, lkml

On Tue, Mar 22, 2022 at 1:55 PM Josef Bacik <josef@toxicpanda.com> wrote:
>
> This more fine-grained checking is handled by generic_remap_file_range_prep() to
> make sure we don't try to dedup a directory or pipe or some other nonsense.

Yeah, that does seem to take care of the obvious cases, and requires
that both files be regular files at least.

I'm still not a huge fan of how we use the 'f_op->remap_file_range' of
the source file, without really checking that the destination file is
ok with remap_file_range.

They end up _superficially_ very similar, yes, but I can point to
filesystems that use different f_op's for different files.

And some of those depend on - wait for it - how the filesystem was mounted.

See for example cifs:

                if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_NO_BRL)
                        file->f_op = &cifs_file_direct_nobrl_ops;
                else
                        file->f_op = &cifs_file_direct_ops;

so I'm just thinking "what about doing remap_file_range between two
regular files that act differently - either due to mount options or
other details".

In that cifs example, read_iter and write_iter are different. Yes,
copy/remap_file_range uses the same function pointer, but it still
worries me about copying from a mount to another if there might be
different semantics for IO between them.

I think in this cifs case, the superblock ends up being the same, so
the mnt_cifs_flags end up being the same, and the above is not
actually a real issue. But conceptually I could imagine cases where
that wasn't the case - or even cases like /proc that have
fundamentally different file operations for different files)

                 Linus

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [GIT PULL] Btrfs updates for 5.18
  2022-03-22 20:55   ` Josef Bacik
  2022-03-22 21:15     ` Linus Torvalds
@ 2022-03-22 21:19     ` Darrick J. Wong
  2022-03-24 10:10       ` Amir Goldstein
  1 sibling, 1 reply; 9+ messages in thread
From: Darrick J. Wong @ 2022-03-22 21:19 UTC (permalink / raw)
  To: Josef Bacik
  Cc: Linus Torvalds, David Sterba, linux-fsdevel, linux-btrfs, lkml

On Tue, Mar 22, 2022 at 04:55:17PM -0400, Josef Bacik wrote:
> On Tue, Mar 22, 2022 at 11:23:21AM -0700, Linus Torvalds wrote:
> > On Mon, Mar 21, 2022 at 2:37 PM David Sterba <dsterba@suse.com> wrote:
> > >
> > > - allow reflinks/deduplication from two different mounts of the same
> > >   filesystem
> > 
> > So I've pulled this, and it looks ok, but I'm not getting the warm and fuzzies.
> > 
> > In particular, I'm not seeing any commentary about different
> > filesystems for this.
> > 
> > There are several filesystems that use that ->remap_file_range()
> > operation, so these relaxed rules don't just affect btrfs.
> > 
> > Yes, yes, checking for i_sb matching does seem sensible, but I'd
> > *really* have liked some sign that people checked with other
> > filesystem maintainers and this is ok for all of them, and they didn't
> > make assumptions about "always same mount" rather than "always same
> > filesystem".
> > 
> 
> > This affects at least cifs, nfs, overlayfs and ocfs2.
> 
> I had a talk with Darrick Wong about this on IRC, and his Reviewed-by is on the
> patch.  This did surprise nfsd when xfstests started failing, but talking with
> Bruce he didn't complain once he understood what was going on.

FWIW, I remember talking about this with Bruce and (probably Anna too)
during a hallway BOF at the last LSFMMBPFBBQ that I went to, which was
2018(?)  At the time, I think we resolved that nfs42_remap_file_range
was capable of detecting and dealing with unsupported requests, so a
direct comparison of the ->remap_file_range or ->f_op wasn't necessary
for them.

> Believe me I
> have 0 interest in getting the other maintainers upset with me by sneaking
> something by them, I made sure to run it by people first, tho I probably should
> have checked with people directly other than Darrick.

I /am/ a little curious what Steve French has to say w.r.t CIFS.

AFAICT overlayfs passes the request down to the appropriate fs
under-layer, so its correctness mostly depends on the under-layer's
implementation.  But I'll let Amir or someone chime in on that. ;)

As for ocfs2, back when I added support for ->remap_file_range to ocfs2,
cross-mount reflink and dedupe worked fine, or at least as well as
anything works on ocfs2.

(XFS has always supported cross-mount remappings.)

> > 
> > Adding fsdevel, and pointing to that
> > 
> > -       if (src_file->f_path.mnt != dst_file->f_path.mnt)
> > +       if (file_inode(src_file)->i_sb != file_inode(dst_file)->i_sb)
> > 
> > change in commit 9f5710bbfd30 ("fs: allow cross-vfsmount reflink/dedupe")
> > 
> > And yes, there was already a comment about "Practically, they only
> > need to be on the same file system" from before that matches the new
> > behavior, but hey, comments have been known to be wrong in the past
> > too.
> > 
> > And yes, I'm also aware that do_clone_file_range() already had that
> > exact same i_sb check and it's not new, but since ioctl_file_clone()
> > cheched for the mount path, I don't think you could actually reach it
> > without being on the same mount.
> > 
> > And while discussing these sanity checks: wouldn't it make sense to
> > check that *both* the source file and the destination file support
> > that remap_file_range() op, and it's the same op?
> > 
> > Yes, yes, it probably always is in practice, but I could imagine some
> > type confusion thing. So wouldn't it be nice to also have something
> > like
> > 
> >     if (dst_file->f_op != src_file->f_op)
> >           goto out_drop_write;
> > 
> > in there? I'm thinking "how about dedupe from a directory to a regular
> > file" kind of craziness...
> >
> 
> This more fine-grained checking is handled by generic_remap_file_range_prep() to
> make sure we don't try to dedup a directory or pipe or some other nonsense.

Yes.  The VFS only allows remapping between regular files.

--D

> Thanks,
> 
> Josef 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [GIT PULL] Btrfs updates for 5.18
  2022-03-22 18:23 ` Linus Torvalds
  2022-03-22 20:55   ` Josef Bacik
@ 2022-03-22 22:09   ` Amir Goldstein
  1 sibling, 0 replies; 9+ messages in thread
From: Amir Goldstein @ 2022-03-22 22:09 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: David Sterba, linux-fsdevel, linux-btrfs, lkml,
	Olga Kornievskaia, Steve French

On Tue, Mar 22, 2022 at 10:11 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Mon, Mar 21, 2022 at 2:37 PM David Sterba <dsterba@suse.com> wrote:
> >
> > - allow reflinks/deduplication from two different mounts of the same
> >   filesystem
>
> So I've pulled this, and it looks ok, but I'm not getting the warm and fuzzies.
>
> In particular, I'm not seeing any commentary about different
> filesystems for this.
>
> There are several filesystems that use that ->remap_file_range()
> operation, so these relaxed rules don't just affect btrfs.
>
> Yes, yes, checking for i_sb matching does seem sensible, but I'd
> *really* have liked some sign that people checked with other
> filesystem maintainers and this is ok for all of them, and they didn't
> make assumptions about "always same mount" rather than "always same
> filesystem".
>
> This affects at least cifs, nfs, overlayfs and ocfs2.

overlayfs shouldn't have a problem with that change.

IIUC, cifs would also gain from this, because clone is implemented
on server side and even two different sb's could technically do
server side  duplicate_extents.
Same goes for nfs v4.2.

There was a lot of discussion on these aspects when cross server
(i.e. cross sb) copy was implemented not so long ago.
Relaxing cross-mnt clone is nothing compared to that.


>
> Adding fsdevel, and pointing to that
>
> -       if (src_file->f_path.mnt != dst_file->f_path.mnt)
> +       if (file_inode(src_file)->i_sb != file_inode(dst_file)->i_sb)
>
> change in commit 9f5710bbfd30 ("fs: allow cross-vfsmount reflink/dedupe")
>
> And yes, there was already a comment about "Practically, they only
> need to be on the same file system" from before that matches the new
> behavior, but hey, comments have been known to be wrong in the past
> too.

As the one who left this comment I can say it is based only on common
sense, similar to the rationale in this recent commit.
If it is any help, overlayfs has been doing cross mnt clones since
  913b86e92e1f vfs: allow vfs_clone_file_range() across mount points
I left the comment because I did not need to take responsibility for changing
user behavior at the time, but I do not see any immediate harm from the user
behavior changes now.

>
> And yes, I'm also aware that do_clone_file_range() already had that
> exact same i_sb check and it's not new, but since ioctl_file_clone()
> cheched for the mount path, I don't think you could actually reach it
> without being on the same mount.
>
> And while discussing these sanity checks: wouldn't it make sense to
> check that *both* the source file and the destination file support
> that remap_file_range() op, and it's the same op?
>
> Yes, yes, it probably always is in practice, but I could imagine some
> type confusion thing. So wouldn't it be nice to also have something
> like
>
>     if (dst_file->f_op != src_file->f_op)
>           goto out_drop_write;
>
> in there? I'm thinking "how about dedupe from a directory to a regular
> file" kind of craziness...

Both S_ISDIR and !S_ISREG cases are already checked for both clone
and dedupe on both files (twice in fact), so at least that is not a concern.

There may be other reasons to worry about, but I can't think of any.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [GIT PULL] Btrfs updates for 5.18
  2022-03-22 21:19     ` Darrick J. Wong
@ 2022-03-24 10:10       ` Amir Goldstein
  0 siblings, 0 replies; 9+ messages in thread
From: Amir Goldstein @ 2022-03-24 10:10 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Josef Bacik, Linus Torvalds, David Sterba, linux-fsdevel,
	linux-btrfs, lkml

On Thu, Mar 24, 2022 at 1:35 AM Darrick J. Wong <djwong@kernel.org> wrote:
>
> On Tue, Mar 22, 2022 at 04:55:17PM -0400, Josef Bacik wrote:
> > On Tue, Mar 22, 2022 at 11:23:21AM -0700, Linus Torvalds wrote:
> > > On Mon, Mar 21, 2022 at 2:37 PM David Sterba <dsterba@suse.com> wrote:
> > > >
> > > > - allow reflinks/deduplication from two different mounts of the same
> > > >   filesystem
> > >
> > > So I've pulled this, and it looks ok, but I'm not getting the warm and fuzzies.
> > >
> > > In particular, I'm not seeing any commentary about different
> > > filesystems for this.
> > >
> > > There are several filesystems that use that ->remap_file_range()
> > > operation, so these relaxed rules don't just affect btrfs.
> > >
> > > Yes, yes, checking for i_sb matching does seem sensible, but I'd
> > > *really* have liked some sign that people checked with other
> > > filesystem maintainers and this is ok for all of them, and they didn't
> > > make assumptions about "always same mount" rather than "always same
> > > filesystem".
> > >
> >
> > > This affects at least cifs, nfs, overlayfs and ocfs2.
> >
> > I had a talk with Darrick Wong about this on IRC, and his Reviewed-by is on the
> > patch.  This did surprise nfsd when xfstests started failing, but talking with
> > Bruce he didn't complain once he understood what was going on.
>
> FWIW, I remember talking about this with Bruce and (probably Anna too)
> during a hallway BOF at the last LSFMMBPFBBQ that I went to, which was
> 2018(?)  At the time, I think we resolved that nfs42_remap_file_range
> was capable of detecting and dealing with unsupported requests, so a
> direct comparison of the ->remap_file_range or ->f_op wasn't necessary
> for them.
>
> > Believe me I
> > have 0 interest in getting the other maintainers upset with me by sneaking
> > something by them, I made sure to run it by people first, tho I probably should
> > have checked with people directly other than Darrick.
>
> I /am/ a little curious what Steve French has to say w.r.t CIFS.
>
> AFAICT overlayfs passes the request down to the appropriate fs
> under-layer, so its correctness mostly depends on the under-layer's
> implementation.  But I'll let Amir or someone chime in on that. ;)
>

The thing is, overlayfs ALREADY does cross-mnt clone_file_range()
on underlying layers. So if there was a bug with allowing cross mnt
clones on xfs it would have been in the wild for a long time already.

OTOH, overlayfs doesn't support nfs/cifs/ocfs2 as upper fs.

If you mount an overlay with lower and upper layer on the same
xfs/btrfs sb the original mnt of lower path and upper patch is irrelevant.

Overlayfs uses different private mnt per layer anyway, so if the source
file is on lower layer then even if originally the overlay mount was done
with different upper/lower mounts of the same sb, clone via overlayfs
would work.

Allowing cross overlayfs mount clone makes very little difference.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2022-03-24 10:11 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-03-21 21:33 [GIT PULL] Btrfs updates for 5.18 David Sterba
2022-03-22  2:35 ` Linus Torvalds
2022-03-22 18:23 ` Linus Torvalds
2022-03-22 20:55   ` Josef Bacik
2022-03-22 21:15     ` Linus Torvalds
2022-03-22 21:19     ` Darrick J. Wong
2022-03-24 10:10       ` Amir Goldstein
2022-03-22 22:09   ` Amir Goldstein
2022-03-22 18:32 ` pr-tracker-bot

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).