* [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly
@ 2021-07-27 22:37 NeilBrown
  2021-07-27 22:37 ` [PATCH 07/11] exportfs: Allow filehandle lookup to cross internal mount points NeilBrown
                   ` (14 more replies)
  0 siblings, 15 replies; 129+ messages in thread
From: NeilBrown @ 2021-07-27 22:37 UTC
  To: Christoph Hellwig, Josef Bacik, J. Bruce Fields, Chuck Lever,
	Chris Mason, David Sterba, Alexander Viro
  Cc: linux-fsdevel, linux-nfs, linux-btrfs

There are long-standing problems with btrfs subvols, particularly in
relation to whether and how they are exposed in the mount table.

 - /proc/self/mountinfo reports the major:minor device number for each
    filesystem and when a btrfs subvol is explicitly mounted, the number
    reported is wrong - it does not match what stat() reports for the
    mountpoint.

 - when subvols are not explicitly mounted, they don't appear in
   mountinfo at all.

Consequences include that a tool which uses stat() to find the dev of the
filesystem, then searches mountinfo for that filesystem, will not find
it.
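The lookup described above can be sketched roughly as follows.  This is
only an illustration, not code from the series: the helper names
(parse_mountinfo, find_mounts_for_dev, mounts_for_path) are made up here,
and the parser assumes the usual mountinfo layout where the third field
is major:minor and the fifth is the mount point.

```python
import os


def parse_mountinfo(text):
    """Return a list of (major, minor, mount_point) tuples from mountinfo text."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        major, minor = fields[2].split(":")
        entries.append((int(major), int(minor), fields[4]))
    return entries


def find_mounts_for_dev(entries, dev):
    """Return the mount points whose major:minor matches an st_dev value."""
    want = (os.major(dev), os.minor(dev))
    return [mp for major, minor, mp in entries if (major, minor) == want]


def mounts_for_path(path, mountinfo_path="/proc/self/mountinfo"):
    """stat() the path, then search mountinfo for the same device number."""
    with open(mountinfo_path) as f:
        entries = parse_mountinfo(f.read())
    return find_mounts_for_dev(entries, os.stat(path).st_dev)
```

If mounts_for_path() returns an empty list for a path inside a btrfs
subvol, that is the failure described above: the device number stat()
reports does not appear in any mountinfo entry.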

Some tools (e.g. findmnt) appear to have been enhanced to cope with this
strangeness, but it would be best to make btrfs behave more normally.

  - nfsd cannot currently see the transition to subvol, so reports the
    main volume and all subvols to the client as being in the same
    filesystem.  As inode numbers are not unique across all subvols,
    this can confuse clients.  In particular, 'find' is likely to report a
    loop.
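To illustrate why duplicate inode numbers can look like a loop, here is a
small model of the kind of check a tree walker can make: a directory
whose (dev, ino) pair matches one of its ancestors is treated as a loop.
The tree layout, the function name find_loops and the numbers used are
all made up for this sketch; it does not reproduce the actual logic in
'find'.

```python
def find_loops(tree, path="/", ancestors=()):
    """Walk a model directory tree and report apparent loops.

    Each node is a tuple (dev, ino, children), where children maps a
    name to a child node.  A directory whose (dev, ino) matches any
    ancestor is reported as a loop and is not descended into.
    """
    dev, ino, children = tree
    key = (dev, ino)
    if key in ancestors:
        return [path]
    loops = []
    for name, child in children.items():
        loops += find_loops(child, path + name + "/", ancestors + (key,))
    return loops


# Main volume and subvol root share a dev and an inode number, as a
# client might see it when everything is reported as one filesystem.
same_fs = (1, 256, {"snap": (1, 256, {})})

# The same layout, but the subvol is reported with its own dev number.
separate_fs = (1, 256, {"snap": (2, 256, {})})
```

With same_fs the walker reports "/snap/" as a loop even though no loop
exists; with separate_fs it reports nothing.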

subvols can be made to appear in mountinfo using automounts.  However
nfsd does not cope well with automounts.  It assumes all filesystems to
be exported are already mounted.  So adding automounts to btrfs would
break nfsd.

We can enhance nfsd to understand that some automounts can be managed.
"Internal mounts", where a filesystem provides an automount point and
mounts its own directories, can be handled differently by nfsd.

This series addresses all these issues.  After a few enhancements to the
VFS to provide needed support, the patches enhance exportfs and nfsd to cope
with the concept of internal mounts, and then enhance btrfs to provide
them.

The NFSv3 support is incomplete.  I'm not sure we can make it work
"perfectly".  A normal nfsv3 mount seems to work well enough, but if
mounted with '-o noac', it loses track of the mounted-on inode number
and complains about inode numbers changing.

My basic test for these is to mount a btrfs filesystem which contains
subvols, nfs-export it and mount it with nfsv3 and nfsv4, then run
'find' in each of the filesystems and check the contents of
/proc/self/mountinfo.

The first patch simply fixes the dev number in mountinfo and could
possibly be tagged for -stable.

NeilBrown

---

NeilBrown (11):
      VFS: show correct dev num in mountinfo
      VFS: allow d_automount to create in-place bind-mount.
      VFS: pass lookup_flags into follow_down()
      VFS: export lookup_mnt()
      VFS: new function: mount_is_internal()
      nfsd: include a vfsmount in struct svc_fh
      exportfs: Allow filehandle lookup to cross internal mount points.
      nfsd: change get_parent_attributes() to nfsd_get_mounted_on()
      nfsd: Allow filehandle lookup to cross internal mount points.
      btrfs: introduce mapping function from location to inum
      btrfs: use automount to bind-mount all subvol roots.


 fs/btrfs/btrfs_inode.h   |  12 +++
 fs/btrfs/inode.c         | 111 ++++++++++++++++++++++++++-
 fs/btrfs/super.c         |   1 +
 fs/exportfs/expfs.c      | 100 ++++++++++++++++++++----
 fs/fhandle.c             |   2 +-
 fs/internal.h            |   1 -
 fs/namei.c               |   6 +-
 fs/namespace.c           |  32 +++++++-
 fs/nfsd/export.c         |   4 +-
 fs/nfsd/nfs3xdr.c        |  40 +++++++---
 fs/nfsd/nfs4proc.c       |   9 ++-
 fs/nfsd/nfs4xdr.c        | 106 ++++++++++++-------------
 fs/nfsd/nfsfh.c          |  44 +++++++----
 fs/nfsd/nfsfh.h          |   3 +-
 fs/nfsd/nfsproc.c        |   5 +-
 fs/nfsd/vfs.c            | 162 +++++++++++++++++++++++----------------
 fs/nfsd/vfs.h            |  12 +--
 fs/nfsd/xdr4.h           |   2 +-
 fs/overlayfs/namei.c     |   5 +-
 fs/xfs/xfs_ioctl.c       |  12 ++-
 include/linux/exportfs.h |   4 +-
 include/linux/mount.h    |   4 +
 include/linux/namei.h    |   2 +-
 23 files changed, 490 insertions(+), 189 deletions(-)

--
Signature


* Re: A Third perspective on BTRFS nfsd subvol dev/inode number issues.
@ 2021-08-02  9:11 Forza
  2021-08-02 21:50 ` NeilBrown
  0 siblings, 1 reply; 129+ messages in thread
From: Forza @ 2021-08-02  9:11 UTC
  To: Amir Goldstein, NeilBrown
  Cc: Al Viro, Miklos Szeredi, Christoph Hellwig, Josef Bacik,
	J. Bruce Fields, Chuck Lever, Chris Mason, David Sterba,
	linux-fsdevel, Linux NFS list, Btrfs BTRFS



---- From: Amir Goldstein <amir73il@gmail.com> -- Sent: 2021-08-02 - 09:54 ----

> On Mon, Aug 2, 2021 at 8:41 AM NeilBrown <neilb@suse.de> wrote:
>>
>> On Mon, 02 Aug 2021, Al Viro wrote:
>> > On Mon, Aug 02, 2021 at 02:18:29PM +1000, NeilBrown wrote:
>> >
>> > > It think we need to bite-the-bullet and decide that 64bits is not
>> > > enough, and in fact no number of bits will ever be enough.  overlayfs
>> > > makes this clear.
>> >
>> > Sure - let's go for broke and use XML.  Oh, wait - it's 8 months too
>> > early...
>> >
>> > > So I think we need to strongly encourage user-space to start using
>> > > name_to_handle_at() whenever there is a need to test if two things are
>> > > the same.
>> >
>> > ... and forgetting the inconvenient facts, such as that two different
>> > fhandles may correspond to the same object.
>>
>> Can they?  They certainly can if the "connectable" flag is passed.
>> name_to_handle_at() cannot set that flag.
>> nfsd can, so using name_to_handle_at() on an NFS filesystem isn't quite
>> perfect.  However it is the best that can be done over NFS.
>>
>> Or is there some other situation where two different filehandles can be
>> reported for the same inode?
>>
>> Do you have a better suggestion?
>>
> 
> Neil,
> 
> I think the plan of "changing the world" is not very realistic.
> Sure, *some* tools can be changed, but all of them?
> 
> I went back to read your initial cover letter to understand the
> problem and what I mostly found there was that the view of
> /proc/x/mountinfo was hiding information that is important for
> some tools to understand what is going on with btrfs subvols.
> 
> Well I am not a UNIX history expert, but I suppose that
> /proc/PID/mountinfo was created because /proc/mounts and
> /proc/PID/mounts no longer provided tool with all the information
> about Linux mounts.
> 
> Maybe it's time for a new interface to query the more advanced
> sb/mount topology? fsinfo() maybe? With mount2 compatible API for
> traversing mounts that is not limited to reporting all entries inside
> a single page. I suppose we could go for some hierarchical view
> under /proc/PID/mounttree. I don't know - new API is hard.
> 
> In any case, instead of changing st_dev and st_ino or changing the
> world to work with file handles, why not add inode generation (and
> maybe subvol id) to statx().
> filesystem that care enough will provide this information and tools that
> care enough will use it.
> 
> Thanks,
> Amir.

I think it would be better and easier if nfs provided clients with virtual inodes and kept an internal mapping to actual filesystem inodes. Samba does this with the mount.cifs -o noserverino option, and as far as I know it works pretty well. 

This could be provided either as an export option, e.g. /mnt/foo *(noserverino), or, as in the Samba case, as a mount option.

This way existing tools will continue to work and we don't have to reinvent various Linux subsystems. Because it's an option, users that don't use btrfs or other filesystems with snapshots can simply skip it.
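
As a rough sketch of the mapping idea, the table below hands out a fresh virtual inode number for each (subvol id, inode) pair and remembers the reverse mapping. The class and method names are hypothetical, and this is not how cifs or nfsd is implemented. It is also only an in-memory model: a real server would presumably need the mapping to stay stable across restarts.

```python
import itertools


class VirtualInodeMap:
    """Hypothetical table mapping (subvol_id, ino) pairs to virtual inode numbers."""

    def __init__(self, first=1):
        self._next = itertools.count(first)  # source of unused virtual numbers
        self._by_real = {}      # (subvol_id, ino) -> virtual number
        self._by_virtual = {}   # virtual number -> (subvol_id, ino)

    def to_virtual(self, subvol_id, ino):
        """Return the virtual number for a real inode, allocating one if needed."""
        key = (subvol_id, ino)
        if key not in self._by_real:
            virtual = next(self._next)
            self._by_real[key] = virtual
            self._by_virtual[virtual] = key
        return self._by_real[key]

    def to_real(self, virtual):
        """Return the (subvol_id, ino) pair behind a virtual number."""
        return self._by_virtual[virtual]
```

Two subvols that both use the same real inode number get different virtual numbers, so a client that only compares inode numbers no longer sees a collision.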

Thanks, 
Forza 



end of thread, other threads:[~2021-08-25  2:07 UTC | newest]

Thread overview: 129+ messages
2021-07-27 22:37 [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly NeilBrown
2021-07-27 22:37 ` [PATCH 07/11] exportfs: Allow filehandle lookup to cross internal mount points NeilBrown
2021-07-28 10:13   ` Amir Goldstein
2021-07-29  0:28     ` NeilBrown
2021-07-29  5:27       ` Amir Goldstein
2021-08-06  7:52         ` Miklos Szeredi
2021-08-06  8:08           ` Amir Goldstein
2021-08-06  8:18             ` Miklos Szeredi
2021-07-28 19:17   ` J. Bruce Fields
2021-07-28 22:25     ` NeilBrown
2021-07-27 22:37 ` [PATCH 04/11] VFS: export lookup_mnt() NeilBrown
2021-07-30  0:31   ` Al Viro
2021-07-30  5:33     ` NeilBrown
2021-07-27 22:37 ` [PATCH 01/11] VFS: show correct dev num in mountinfo NeilBrown
2021-07-30  0:25   ` Al Viro
2021-07-30  5:28     ` NeilBrown
2021-07-30  5:54       ` Miklos Szeredi
2021-07-30  6:13         ` NeilBrown
2021-07-30  7:18           ` Miklos Szeredi
2021-07-30  7:33             ` NeilBrown
2021-07-30  7:59               ` Miklos Szeredi
2021-08-02  4:18                 ` A Third perspective on BTRFS nfsd subvol dev/inode number issues NeilBrown
2021-08-02  5:25                   ` Al Viro
2021-08-02  5:40                     ` NeilBrown
2021-08-02  7:54                       ` Amir Goldstein
2021-08-02 13:53                         ` Josef Bacik
2021-08-03 22:29                           ` Qu Wenruo
2021-08-02 14:47                         ` Frank Filz
2021-08-02 21:24                         ` NeilBrown
2021-08-02  7:15                   ` Martin Steigerwald
2021-08-02 21:40                     ` NeilBrown
2021-08-02 12:39                   ` J. Bruce Fields
2021-08-02 20:32                     ` Patrick Goetz
2021-08-02 20:41                       ` J. Bruce Fields
2021-08-02 21:10                     ` NeilBrown
2021-08-02 21:50                       ` J. Bruce Fields
2021-08-02 21:59                         ` NeilBrown
2021-08-02 22:14                           ` J. Bruce Fields
2021-08-02 22:36                             ` NeilBrown
2021-08-03  0:15                               ` J. Bruce Fields
2021-07-27 22:37 ` [PATCH 03/11] VFS: pass lookup_flags into follow_down() NeilBrown
2021-07-27 22:37 ` [PATCH 11/11] btrfs: use automount to bind-mount all subvol roots NeilBrown
2021-07-28  8:37   ` kernel test robot
2021-07-28  8:37     ` kernel test robot
2021-07-28  8:37   ` [RFC PATCH] btrfs: btrfs_mountpoint_expiry_timeout can be static kernel test robot
2021-07-28  8:37     ` kernel test robot
2021-07-28 13:12   ` [PATCH 11/11] btrfs: use automount to bind-mount all subvol roots Christian Brauner
2021-07-29  0:43     ` NeilBrown
2021-07-29 14:38       ` Christian Brauner
2021-07-31  6:25   ` [btrfs] 5874902268: xfstests.btrfs.202.fail kernel test robot
2021-07-31  6:25     ` kernel test robot
2021-07-27 22:37 ` [PATCH 06/11] nfsd: include a vfsmount in struct svc_fh NeilBrown
2021-07-27 22:37 ` [PATCH 10/11] btrfs: introduce mapping function from location to inum NeilBrown
2021-07-27 22:37 ` [PATCH 02/11] VFS: allow d_automount to create in-place bind-mount NeilBrown
2021-07-27 22:37 ` [PATCH 09/11] nfsd: Allow filehandle lookup to cross internal mount points NeilBrown
2021-07-28 19:15   ` J. Bruce Fields
2021-07-28 22:29     ` NeilBrown
2021-07-30  0:42   ` Al Viro
2021-07-30  5:43     ` NeilBrown
2021-07-27 22:37 ` [PATCH 08/11] nfsd: change get_parent_attributes() to nfsd_get_mounted_on() NeilBrown
2021-07-27 22:37 ` [PATCH 05/11] VFS: new function: mount_is_internal() NeilBrown
2021-07-28  2:16   ` Al Viro
2021-07-28  3:32     ` NeilBrown
2021-07-30  0:34       ` Al Viro
2021-07-28  2:19 ` [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly Al Viro
2021-07-28  4:58 ` Wang Yugui
2021-07-28  6:04   ` Wang Yugui
2021-07-28  7:01     ` NeilBrown
2021-07-28 12:26       ` Neal Gompa
2021-07-28 19:14         ` J. Bruce Fields
2021-07-29  1:29           ` Zygo Blaxell
2021-07-29  1:43             ` NeilBrown
2021-07-29 23:20               ` Zygo Blaxell
2021-07-28 22:50         ` NeilBrown
2021-07-29  2:37           ` Zygo Blaxell
2021-07-29  3:36             ` NeilBrown
2021-07-29 23:20               ` Zygo Blaxell
2021-07-30  2:36                 ` NeilBrown
2021-07-30  5:25                   ` Qu Wenruo
2021-07-30  5:31                     ` Qu Wenruo
2021-07-30  5:53                       ` Amir Goldstein
2021-07-30  6:00                       ` NeilBrown
2021-07-30  6:09                         ` Qu Wenruo
2021-07-30  5:58                     ` NeilBrown
2021-07-30  6:23                       ` Qu Wenruo
2021-07-30  6:53                         ` NeilBrown
2021-07-30  7:09                           ` Qu Wenruo
2021-07-30 18:15                             ` Zygo Blaxell
2021-07-30 15:17                         ` J. Bruce Fields
2021-07-30 15:48                           ` Josef Bacik
2021-07-30 16:25                             ` Forza
2021-07-30 17:43                             ` Zygo Blaxell
2021-07-30  5:28                   ` Amir Goldstein
2021-07-28 13:43       ` g.btrfs
2021-07-29  1:39         ` NeilBrown
2021-07-29  9:28           ` Graham Cobb
2021-07-28  7:06   ` NeilBrown
2021-07-28  9:36     ` Wang Yugui
2021-07-28 19:35 ` J. Bruce Fields
2021-07-28 21:30   ` Josef Bacik
2021-07-30  0:13     ` Al Viro
2021-07-30  6:08       ` NeilBrown
2021-08-13  1:45 ` [PATCH] VFS/BTRFS/NFSD: provide more unique inode number for btrfs export NeilBrown
2021-08-13 14:55   ` Josef Bacik
2021-08-15  7:39   ` Goffredo Baroncelli
2021-08-15 19:35     ` Roman Mamedov
2021-08-15 21:03       ` Goffredo Baroncelli
2021-08-15 21:53         ` NeilBrown
2021-08-17 19:34           ` Goffredo Baroncelli
2021-08-17 21:39             ` NeilBrown
2021-08-18 17:24               ` Goffredo Baroncelli
2021-08-15 22:17       ` NeilBrown
2021-08-19  8:01         ` Amir Goldstein
2021-08-20  3:21           ` NeilBrown
2021-08-20  6:23             ` Amir Goldstein
2021-08-23  4:05         ` [PATCH v2] BTRFS/NFSD: " NeilBrown
2021-08-23  8:17           ` kernel test robot
2021-08-23  8:17             ` kernel test robot
2021-08-18 14:54   ` [PATCH] VFS/BTRFS/NFSD: " Wang Yugui
2021-08-18 21:46     ` NeilBrown
2021-08-19  2:19       ` Zygo Blaxell
2021-08-20  2:54         ` NeilBrown
2021-08-22 19:29           ` Zygo Blaxell
2021-08-23  5:51             ` NeilBrown
2021-08-23 23:22             ` NeilBrown
2021-08-25  2:06               ` Zygo Blaxell
2021-08-23  0:57         ` Wang Yugui
2021-08-02  9:11 A Third perspective on BTRFS nfsd subvol dev/inode number issues Forza
2021-08-02 21:50 ` NeilBrown
