From: Josef Bacik <josef@toxicpanda.com>
To: NeilBrown <neilb@suse.de>, Chris Mason <clm@fb.com>,
David Sterba <dsterba@suse.com>
Cc: linux-fsdevel@vger.kernel.org,
Linux NFS list <linux-nfs@vger.kernel.org>,
Btrfs BTRFS <linux-btrfs@vger.kernel.org>
Subject: Re: [PATCH/RFC 0/4] Attempt to make progress with btrfs dev number strangeness.
Date: Tue, 10 Aug 2021 16:51:11 -0400
Message-ID: <e6496956-0df3-6232-eecb-5209b28ca790@toxicpanda.com>
In-Reply-To: <162848123483.25823.15844774651164477866.stgit@noble.brown>

On 8/8/21 11:55 PM, NeilBrown wrote:
> I continue to search for a way forward for btrfs so that its behaviour
> with respect to device numbers and subvols is somewhat coherent.
>
> This series implements some of the ideas in my "A Third perspective"[1],
> though with changes in various details.
>
> I introduce two new mount options, which default to
> no-change-in-behaviour.
>
> -o inumbits= causes inode numbers to be more unique across a whole btrfs
> filesystem, and in many cases completely unique. Mounting
> with "-o inumbits=56" will resolve the NFS issues that
> started me tilting at this particular windmill.
>
> -o numdevs= can reduce the number of distinct devices reported by
> stat(), either to 2 or to 1.
> Both ease problems for sites that exhaust their supply of
> device numbers.
> '2' allows "du -x" to continue to work, but is otherwise
> rather strange.
> '1' breaks the use of "du -x" and similar to examine a
> single subvol which might have subvol descendants, but
> provides generally sane behaviour
> "-o numdevs=1" also forces inumbits to have a useful value.
>
> I introduce a "tree id" which can be discovered using statx(). Two
> files with the same dev and ino might still be different if the tree-ids
> are different. Connected files with the same tree-id may be usefully
> considered to be related.
>
> I also change various /proc files (only when numdevs=1 is used) to
> provide extra information so they are useful with btrfs despite subvols.
> /proc/maps /proc/smaps /proc/locks /proc/X/fdinfo/Y are affected.
> The inode number becomes "XX:YY" where XX is the subvol number (tree id)
> and YY is the inode number.
>
> An alternate might be to report a number which might use up to 128 bits.
> Which is less likely to seriously break code?
>
> Note that code which ignores badly formatted lines is safe, because it
> will never currently find a match for a btrfs file in these files
> anyway. The device number they report is never returned in st_dev for
> stat() on any file.
>
> The audit subsystem and one or two other places report dev/ino and so
> need enhancing, but I haven't tried to address those.
>
> Various trace points also report dev/ino. I haven't tried thinking
> about those either.

I think this is a step in the right direction, but I want to figure out a way to
accomplish this without magical mount points that users must be aware of.

I think the stat() st_dev ship has sailed, we're stuck with that. However,
Christoph does have a valid point where it breaks the various info spit out by
/proc. You've done a good job with the treeid here, but it still makes it
impossible for somebody to map the st_dev back to the correct mount.
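
To make the mapping problem concrete: a tool that wants the mount for a file
typically takes st_dev from stat() and looks for the same major:minor in
/proc/self/mountinfo. The Python sketch below is illustrative only and is not
part of the series; the helper names are made up. As noted in the quoted cover
letter, the device number btrfs shows in /proc is never returned in st_dev, so
for a file on btrfs this lookup would be expected to come back empty.

```python
import os


def parse_mountinfo(text):
    """Map "major:minor" strings to mount points from mountinfo-format text."""
    mounts = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue  # skip malformed lines
        # Field 3 (index 2) is major:minor, field 5 (index 4) is the mount point.
        mounts.setdefault(fields[2], []).append(fields[4])
    return mounts


def mounts_for_dev(st_dev, mountinfo_text):
    """Return the mount points whose device number equals st_dev."""
    key = f"{os.major(st_dev)}:{os.minor(st_dev)}"
    return parse_mountinfo(mountinfo_text).get(key, [])


def mounts_for_path(path):
    """Look up the mounts matching stat(path).st_dev on the running system."""
    with open("/proc/self/mountinfo") as f:
        return mounts_for_dev(os.stat(path).st_dev, f.read())
```

An empty list from mounts_for_path() means no mount entry carries the st_dev
that stat() reported for that path.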

I think we aren't going to solve that problem, at least not with stat(). I
think with statx() spitting out treeid we have given userspace a way to
differentiate subvolumes, and so we should fix statx() to spit out the super
block device, that way new userspace things can do their appropriate lookup if
they so choose.
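
A rough sketch of how new userspace might use a tree id alongside dev and ino
when deciding whether two paths are the same file. STATX_TREE_ID is only
proposed in this series, so the sketch takes the tree id as plain data rather
than calling statx(); FileId and same_file are hypothetical names.

```python
from typing import NamedTuple, Optional


class FileId(NamedTuple):
    dev: int                       # device number as reported to userspace
    ino: int                       # inode number
    tree_id: Optional[int] = None  # hypothetical: None if the kernel gave none


def same_file(a: FileId, b: FileId) -> bool:
    """Compare identities, using the tree id only when both sides have one."""
    if (a.dev, a.ino) != (b.dev, b.ino):
        return False
    if a.tree_id is None or b.tree_id is None:
        # Old-style comparison: dev and ino are all we know.
        return True
    return a.tree_id == b.tree_id
```

Tools that never look at the tree id keep their current behaviour, while tools
that do can tell apart two files that share dev and ino.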

This leaves the problem of nfsd. Can you just integrate this new treeid into
nfsd, and use that to either change the ino within nfsd itself, or do something
similar to what your first patchset did and generate a fsid based on the treeid?
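
For illustration only, here are two ways nfsd-side code could, in principle,
use a tree id: fold it into the 64-bit inode number, or derive a per-subvolume
fsid from it. Neither is taken from the patches or from nfsd; the bit-reversal
fold, the use of uuid5, and all names are assumptions chosen to keep the
example short.

```python
import uuid

MASK64 = (1 << 64) - 1


def reverse_bits64(value: int) -> int:
    """Reverse the bit order of a 64-bit value."""
    return int(format(value & MASK64, "064b")[::-1], 2)


def fold_ino(ino: int, tree_id: int) -> int:
    """Hypothetical: XOR the bit-reversed tree id into the inode number.

    Small tree ids land in the high bits and small inode numbers stay in
    the low bits, so results stay distinct as long as the bit ranges in
    use do not overlap.
    """
    return (ino ^ reverse_bits64(tree_id)) & MASK64


def subvol_fsid(fs_uuid: uuid.UUID, tree_id: int) -> uuid.UUID:
    """Hypothetical: derive a stable per-subvolume fsid from the tree id."""
    return uuid.uuid5(fs_uuid, str(tree_id))
```

The fold keeps a single fsid and changes what the client sees as the inode
number; the fsid variant leaves inode numbers alone and makes each subvolume
look like a separate filesystem to the client.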

Mount options are messy, and are just going to lead to distros turning them on
without understanding what's going on and then we have to support them forever.
I want to get this fixed in a way that we all hate the least with as little
opportunity for confused users to make bad decisions. Thanks,

Josef

Thread overview: 11+ messages
2021-08-09 3:55 [PATCH/RFC 0/4] Attempt to make progress with btrfs dev number strangeness NeilBrown
2021-08-09 3:55 ` [PATCH 4/4] Add "tree" number to "inode" number in various /proc files NeilBrown
2021-08-09 3:55 ` [PATCH 3/4] VFS/btrfs: add STATX_TREE_ID NeilBrown
2021-08-09 3:55 ` [PATCH 1/4] btrfs: include subvol identifier in inode number if -o inumbits= NeilBrown
2021-08-09 3:55 ` [PATCH 2/4] btrfs: add numdevs= mount option NeilBrown
2021-08-09 7:50 ` kernel test robot
2021-08-10 20:51 ` Josef Bacik [this message]
2021-08-11 22:13 ` [PATCH/RFC 0/4] Attempt to make progress with btrfs dev number strangeness NeilBrown
2021-08-12 13:54 ` Josef Bacik
2021-08-12 14:06 ` Hugo Mills
2021-08-12 22:35 ` NeilBrown