linux-btrfs.vger.kernel.org archive mirror
From: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
To: Josef Bacik <josef@toxicpanda.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>,
	Qu Wenruo <quwenruo.btrfs@gmx.com>, NeilBrown <neilb@suse.de>,
	Neal Gompa <ngompa13@gmail.com>,
	Wang Yugui <wangyugui@e16-tech.com>,
	Christoph Hellwig <hch@infradead.org>,
	Chuck Lever <chuck.lever@oracle.com>, Chris Mason <clm@fb.com>,
	David Sterba <dsterba@suse.com>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	linux-nfs@vger.kernel.org,
	Btrfs BTRFS <linux-btrfs@vger.kernel.org>
Subject: Re: [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly
Date: Fri, 30 Jul 2021 13:43:13 -0400
Message-ID: <20210730174313.GM10170@hungrycats.org>
In-Reply-To: <ae85654d-950f-04a2-8fca-145412b31e57@toxicpanda.com>

On Fri, Jul 30, 2021 at 11:48:15AM -0400, Josef Bacik wrote:
> On 7/30/21 11:17 AM, J. Bruce Fields wrote:
> > On Fri, Jul 30, 2021 at 02:23:44PM +0800, Qu Wenruo wrote:
> > > OK, forgot it's an opt-in feature, then it's less of an impact.
> > > 
> > > But it can still sometimes be problematic.
> > > 
> > > E.g. if the user wants to put some git code into one subvolume, while
> > > exporting another subvolume through NFS.
> > > 
> > > Then the user has to opt in, causing the git subvolume to lose the
> > > ability to determine the subvolume boundary, right?
> > 
> > Totally naive question: would it be possible to treat different subvolumes
> > differently, and give the user some choice at subvolume creation time
> > how this new boundary should behave?
> > 
> > It seems like there are some conflicting priorities that can only be
> > resolved by someone who knows the intended use case.
> > 
> 
> This is the crux of the problem.  We have no real interfaces or anything to
> deal with this sort of paradigm.  We do the st_dev thing because that's the
> most common way that tools like find or rsync determine they've
> wandered into a "different" volume.  This exists specifically because of
> use cases like Zygo's, where he's taking thousands of snapshots and manually
> excluding them from find/rsync is just not reasonable.
> 
> We have no good way to give the user information about what's going on, we
> just have these old shitty interfaces.  I asked our guys about filling up
> /proc/self/mountinfo with our subvolumes and they had a heart attack because
> we have around 2-4k subvolumes on machines, and with monitoring stuff in
> place we regularly read /proc/self/mountinfo to determine what's mounted and
> such.
> 
> And then there's NFS which needs to know that it's walked into a new inode space.

NFS somehow works surprisingly well without knowing that.  I didn't know
there was a problem with NFS, despite exporting thousands of btrfs subvols
from a single export point for 7 years.  Maybe I have some non-default
setting in /etc/exports which works around the problems, or maybe I got
lucky, and all my use cases are weirdly specific and evade all the bugs
by accident?

> This is all super shitty, and mostly exists because we don't have a good way
> to expose to the user wtf is going on.
> 
> Personally I would be ok with simply disallowing NFS to wander into
> subvolumes from an exported fs.  If you want to export subvolumes then
> export them individually, otherwise if you walk into a subvolume from NFS
> you simply get an empty directory.

As a present exporter of thousands of btrfs subvols over NFS from single
export points, I'm not a fan of this idea.

> This doesn't solve the mountinfo problem where a user may want to figure out
> which subvol they're in, but this is where I think we could address the
> issue with better interfaces.  Or perhaps Neil's idea to have a common major
> number with a different minor number for every subvol.

It's not hard to figure out what subvol you're in.  There's an ioctl
which tells the subvol ID, and another that tells the name.  The problem
is that it's btrfs-specific, and no existing software knows how and when
to use it (and also it's privileged, but that's easy to fix compared to
the other issues).
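
A minimal Python sketch of the subvol ID lookup, for illustration only.
The message does not name the ioctl; this assumes it is
BTRFS_IOC_INO_LOOKUP called with treeid 0, and it assumes the struct
layout from linux/btrfs.h and the _IOWR() bit layout used on x86 and
arm. The helper name subvol_id is hypothetical.

```python
import fcntl
import os
import struct

# Assumed layout of struct btrfs_ioctl_ino_lookup_args (linux/btrfs.h):
#   __u64 treeid; __u64 objectid; char name[4080];   -> 4096 bytes
ARGS_SIZE = 4096
BTRFS_IOCTL_MAGIC = 0x94
BTRFS_FIRST_FREE_OBJECTID = 256


def _iowr(magic, nr, size):
    # _IOWR() encoding as used on x86/arm: dir(2) | size(14) | type(8) | nr(8).
    # Other architectures (e.g. powerpc, mips) use a different layout.
    return (3 << 30) | (size << 16) | (magic << 8) | nr


BTRFS_IOC_INO_LOOKUP = _iowr(BTRFS_IOCTL_MAGIC, 18, ARGS_SIZE)


def subvol_id(path):
    """Return the btrfs subvolume (tree) ID containing path, or None on failure."""
    buf = bytearray(ARGS_SIZE)
    # treeid=0 asks the kernel to fill in the tree of the file we opened.
    struct.pack_into("=QQ", buf, 0, 0, BTRFS_FIRST_FREE_OBJECTID)
    try:
        fd = os.open(path, os.O_RDONLY)
    except OSError:
        return None
    try:
        fcntl.ioctl(fd, BTRFS_IOC_INO_LOOKUP, buf)
    except OSError:
        return None  # not btrfs, or the call was not permitted
    finally:
        os.close(fd)
    return struct.unpack_from("=QQ", buf, 0)[0]
```

Calling subvol_id() on a directory returns an integer on btrfs and None
elsewhere. The point stands either way: a generic tool like find or
rsync has no reason to know this call exists.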

> Either way this isn't as simple as shoehorning it into automount and being
> done with it, we need to take a step back and think about how this should
> actually look, taking into account we've got 12 years of having Btrfs
> deployed with existing use cases that expect a certain behavior.  Thanks,

I think if we got into a time machine, went back 12 years, changed
the btrfs behavior, and then returned to the present, in the alternate
history, we would all be here today talking about how mountinfo doesn't
scale up to what btrfs throws at it, and whether btrfs can opt out of it
somehow.

Maybe we could have a system call for mount point discovery?  Right now,
the kernel throws a trail of breadcrumbs into /proc/self/mountinfo,
and users use userspace libraries to translate that text blob into
actionable information.  We could solve problems with scalability and
visibility in mountinfo if we only had to provide the information in
the context of a single inode (i.e. the inode's parent or child mount
points accessible to the caller).

So you'd have a call for "get paths for all the mount points below inode
X" and another for "get paths for all mount points above inode X", and
calls that tell you details about mount points (like what they're mounted
on, which filesystem they are part of, what the mount flags are, etc).
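
To make the two queries concrete, here is a rough userspace emulation in
Python. The proposed system calls do not exist; this sketch answers the
same questions by parsing mountinfo text, which is exactly the approach
the proposal would replace. The names Mount, parse_mountinfo,
mounts_below and mounts_above are hypothetical, and the field layout is
the one documented in proc(5).

```python
import re
from typing import NamedTuple


class Mount(NamedTuple):
    mount_id: int
    parent_id: int
    dev: str          # "major:minor"
    root: str         # root of the mount within the filesystem
    mount_point: str
    options: str
    fstype: str
    source: str


def _unescape(field):
    # mountinfo writes space, tab, newline and backslash as \ooo octal escapes.
    return re.sub(r"\\([0-7]{3})", lambda m: chr(int(m.group(1), 8)), field)


def parse_mountinfo(text):
    """Parse the contents of /proc/self/mountinfo into Mount records."""
    mounts = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # Optional fields end at a lone "-"; fstype and source follow it.
        left, _, right = line.partition(" - ")
        f = left.split(" ")
        r = right.split(" ")
        mounts.append(Mount(int(f[0]), int(f[1]), f[2], _unescape(f[3]),
                            _unescape(f[4]), f[5], r[0], _unescape(r[1])))
    return mounts


def mounts_below(mounts, path):
    """Mount points strictly underneath path."""
    base = path.rstrip("/")
    prefix = base + "/"
    return [m for m in mounts
            if m.mount_point.startswith(prefix)
            and m.mount_point.rstrip("/") != base]


def mounts_above(mounts, path):
    """Mount points at path or at any of its ancestors, outermost first."""
    norm = path.rstrip("/") or "/"
    hits = [m for m in mounts
            if m.mount_point == "/"
            or norm == m.mount_point
            or norm.startswith(m.mount_point + "/")]
    return sorted(hits, key=lambda m: len(m.mount_point))
```

Typical use would be parse_mountinfo(open("/proc/self/mountinfo").read())
followed by one of the two queries. Note that every query still reads
and parses the whole table, so the cost grows with the total number of
mounts rather than with the number relevant to the one inode being asked
about.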

> Josef
