From: Amir Goldstein <amir73il@gmail.com>
To: Josef Bacik <josef@toxicpanda.com>
Cc: Christoph Hellwig <hch@infradead.org>, Jan Kara <jack@suse.cz>,
	Christian Brauner <brauner@kernel.org>, Chris Mason <clm@fb.com>,
	David Sterba <dsterba@suse.com>,
	linux-btrfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	Miklos Szeredi <miklos@szeredi.hu>
Subject: Re: [PATCH 0/3] fanotify support for btrfs sub-volumes
Date: Sat, 28 Oct 2023 08:57:58 +0300
Message-ID: <CAOQ4uxjaA=pm_zoE1rG2zZRUeUnKw=rU5AQ2A7uSrRXWeDgVww@mail.gmail.com>
In-Reply-To: <20231027131726.GA2915471@perftesting>

On Fri, Oct 27, 2023 at 4:17 PM Josef Bacik <josef@toxicpanda.com> wrote:
>
> On Thu, Oct 26, 2023 at 10:46:01PM -0700, Christoph Hellwig wrote:
> > I think you're missing the point.  A bunch of statx fields might be
> > useful, but they are not solving the problem.  What you need is
> > a separate vfsmount per subvolume so that userspace sees when it
> > is crossing into it.  We probably can't force this onto existing
> > users, so it needs a mount, or even better on-disk option but without
> > that we're not getting anywhere.
> >
>
> We have this same discussion every time, and every time you stop responding
> after I point out the problems with it.
>
> A per-subvolume vfsmount means that /proc/mounts /proc/$PID/mountinfo becomes
> insanely dumb.  I've got millions of machines in this fleet with thousands of
> subvolumes.  One of our workloads fires up several containers per task and runs
> multiple tasks per machine, so on the order of 10-20k subvolumes.
>

I think it is probably just as common to see similar workloads using
overlayfs for containers, especially considering that the more you scale
the number of containers, the more you need the inode page cache sharing
between them.

Overlayfs has an sb/vfsmount per instance, so any users having problems
with a huge number of mounts would have already complained about it, and
maybe they have, because...

> So now I've got thousands of entries in /proc/mounts, and literally every system
> related tool parses /proc/mounts every 4 nanoseconds, now I'm significantly
> contributing to global warming from the massive amount of CPU usage that is
> burned parsing this stupid file.
>

...after Miklos sorts out the new list/statmount() syscalls and mount tree
change notifications, maybe vfsmount per btrfs subvol could be reconsidered? ;)
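
To illustrate the per-entry work that tools do today, here is a minimal
sketch of parsing one /proc/self/mountinfo line, following the field layout
documented in proc(5). The names MountEntry, parse_mountinfo_line and
read_mounts are made up for this example, and the sample line in the usage
note is hypothetical. Every caller repeats this work for every mount, which
is why thousands of entries add up.

```python
from typing import List, NamedTuple


class MountEntry(NamedTuple):
    """Subset of the fields in one mountinfo line (illustrative only)."""
    mount_id: int
    parent_id: int
    dev: str          # "major:minor" as reported for the mount
    root: str         # root of the mount within the filesystem
    mount_point: str
    fs_type: str
    source: str


def parse_mountinfo_line(line: str) -> MountEntry:
    fields = line.split()
    # Zero or more optional fields start at index 6 and are terminated by a
    # single "-" separator; start the search there so that a root or mount
    # point literally named "-" is not mistaken for the separator.
    sep = fields.index("-", 6)
    return MountEntry(
        mount_id=int(fields[0]),
        parent_id=int(fields[1]),
        dev=fields[2],
        root=fields[3],
        mount_point=fields[4],
        fs_type=fields[sep + 1],
        source=fields[sep + 2],
    )


def read_mounts(path: str = "/proc/self/mountinfo") -> List[MountEntry]:
    # The whole file is read and parsed on every call; there is no way to
    # ask for a single mount through this interface.
    with open(path) as f:
        return [parse_mountinfo_line(l) for l in f if l.strip()]
```

For example, parse_mountinfo_line("36 35 0:42 /containers/a /srv/a rw,noatime
shared:1 - btrfs /dev/sda1 rw,subvolid=256") yields an entry whose fs_type is
"btrfs" and whose mount_point is "/srv/a".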

> Additionally, now you're ending up with potentially sensitive information being
> leaked through /proc/mounts that you didn't expect to be leaked before.  I've
> got users complaining to me because "/home/john/twilight_fanfic" showed up in
> their /proc/mounts.
>

This makes me wonder.
I understand why a distinct st_dev is needed for btrfs snapshots,
where the same st_ino can have different revisions.
I am not sure I understand why a distinct st_dev is needed for subvols
that are created for containerisation reasons.
Don't files in subvols have unique st_ino anyway?
Is the st_dev mitigation for subvols a must, or just an implementation
convenience?
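
To make the question concrete, here is a minimal sketch of how userspace
tools commonly use these two fields: the (st_dev, st_ino) pair as a file
identity key, and a change in st_dev as the signal that a boundary was
crossed (the check behind options such as find -xdev). The helper names
file_identity and crosses_device are invented for this example; nothing
here is taken from the patch set.

```python
import os
from typing import Tuple


def file_identity(path: str) -> Tuple[int, int]:
    """Return the (st_dev, st_ino) pair that tools use as a file identity."""
    st = os.lstat(path)
    return (st.st_dev, st.st_ino)


def crosses_device(path_a: str, path_b: str) -> bool:
    """True if the two paths report different st_dev values.

    With btrfs today, each subvolume reports its own st_dev, so this also
    returns True for two paths in different subvolumes of one filesystem.
    """
    return os.lstat(path_a).st_dev != os.lstat(path_b).st_dev
```

If st_ino alone were unique across subvols, keying on it together with a
single per-filesystem st_dev would still identify files correctly; with
snapshots the same st_ino appears in several subvolumes, so the pair is only
unique while st_dev differs per subvolume.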

> And then there's the expiry thing.  Now they're just directories, reclaim works
> like it works for anything else.  With auto mounts they have to expire at some
> point, which makes them so much more heavier weight than we want to sign up for.
> Who knows what sort of scalability issues we'll run into.
>

I agree that this aspect of auto mount is unfortunate, but I think it would
benefit other fs that support auto mount to improve reclaiming of auto mounts.

In the end, I think that we all understand that the legacy btrfs behavior
is not going away without an opt-in, but I think it would be a good outcome
if users could choose the tradeoff between efficiency of single mount vs.
working well with features like nfs export and fanotify subvol watch.

Having an incentive to migrate to the "multi-sb" btrfs mode would create
pressure from end users on distros, and from there on project leaders,
to fix the issues that you mentioned related to a huge number of mounts
and auto mount reclaim.

Thanks,
Amir.
