From: Josef Bacik <josef@toxicpanda.com>
To: NeilBrown <neilb@suse.de>, Chris Mason <clm@fb.com>,
	David Sterba <dsterba@suse.com>
Cc: linux-fsdevel@vger.kernel.org,
	Linux NFS list <linux-nfs@vger.kernel.org>,
	Btrfs BTRFS <linux-btrfs@vger.kernel.org>
Subject: Re: [PATCH/RFC 0/4] Attempt to make progress with btrfs dev number strangeness.
Date: Tue, 10 Aug 2021 16:51:11 -0400
Message-ID: <e6496956-0df3-6232-eecb-5209b28ca790@toxicpanda.com>
In-Reply-To: <162848123483.25823.15844774651164477866.stgit@noble.brown>

On 8/8/21 11:55 PM, NeilBrown wrote:
> I continue to search for a way forward for btrfs so that its behaviour
> with respect to device numbers and subvols is somewhat coherent.
> 
> This series implements some of the ideas in my "A Third perspective"[1],
> though with changes in various details.
> 
> I introduce two new mount options, which default to
> no-change-in-behaviour.
> 
>   -o inumbits=  causes inode numbers to be more unique across a whole btrfs
>                 filesystem, and in many cases completely unique.  Mounting
>                 with "-o inumbits=56" will resolve the NFS issues that
>                 started me tilting at this particular windmill.
> 
>   -o numdevs=  can reduce the number of distinct devices reported by
>                stat(), either to 2 or to 1.
>                Both ease problems for sites that exhaust their supply of
>                device numbers.
>                '2' allows "du -x" to continue to work, but is otherwise
>                rather strange.
>                '1' breaks the use of "du -x" and similar to examine a
>                single subvol which might have subvol descendants, but
>                provides generally sane behaviour.
>                "-o numdevs=1" also forces inumbits to have a useful value.
> 
> I introduce a "tree id" which can be discovered using statx().  Two
> files with the same dev and ino might still be different if the tree-ids
> are different.  Connected files with the same tree-id may be usefully
> considered to be related.
> 
> I also change various /proc files (only when numdevs=1 is used) to
> provide extra information so they are useful with btrfs despite subvols.
> /proc/maps /proc/smaps /proc/locks /proc/X/fdinfo/Y are affected.
> The inode number becomes "XX:YY" where XX is the subvol number (tree id)
> and YY is the inode number.
> 
> An alternate might be to report a number which might use up to 128 bits.
> Which is less likely to seriously break code?
> 
> Note that code which ignores badly formatted lines is safe, because it
> will never currently find a match for a btrfs file in these files
> anyway.  The device number they report is never returned in st_dev for
> stat() on any file.
> 
> The audit subsystem and one or two other places report dev/ino and so
> need to be enhanced, but I haven't tried to address those.
> 
> Various trace points also report dev/ino.  I haven't tried thinking
> about those either.
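
For readers following the /proc change quoted above, here is a minimal
sketch of a parser that tolerates both the existing plain inode field and
the proposed "XX:YY" form. The function name parse_inode_field is
hypothetical, and decimal encoding of both parts is an assumption made for
this illustration; the patch may format the field differently.

```python
from typing import Optional, Tuple


def parse_inode_field(field: str) -> Tuple[Optional[int], int]:
    """Split an inode field into (tree_id, inode).

    Accepts the existing plain form ("1234") and the proposed "XX:YY"
    form ("256:1234"). Decimal encoding is an assumption of this sketch.
    """
    tree, sep, ino = field.rpartition(":")
    if not sep:
        # No colon: an ordinary inode number with no tree id attached.
        return None, int(field)
    return int(tree), int(ino)
```

A tool written this way keeps working on filesystems that never report a
tree id, and treats the tree id as extra information when it is present.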

I think this is a step in the right direction, but I want to figure out a way to
accomplish this without magical mount options that users must be aware of.

I think the stat() st_dev ship has sailed, we're stuck with that.  However
Christoph does have a valid point that it breaks the various info spit out by
/proc.  You've done a good job with the treeid here, but it still makes it
impossible for somebody to map the st_dev back to the correct mount.
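
To make the st_dev mapping problem concrete, here is a rough sketch of the
usual userspace lookup: take st_dev from stat() and search the major:minor
column of /proc/self/mountinfo for it. The helper names and the sample
device numbers are made up for illustration. The second lookup stands in
for a file inside a btrfs subvolume, whose st_dev (as I understand the
current behaviour) does not match the device number listed for the mount.

```python
import os
from typing import List, Tuple

# Made-up sample in /proc/self/mountinfo format; device numbers are invented.
SAMPLE_MOUNTINFO = """\
36 35 98:0 / /mnt rw,noatime - ext4 /dev/sdb1 rw
40 35 0:45 / /data rw,relatime - btrfs /dev/sdc1 rw,subvolid=5,subvol=/
"""


def parse_mountinfo_line(line: str) -> Tuple[int, str]:
    """Return (device number, mount point) from one mountinfo line."""
    fields = line.split()
    major, minor = fields[2].split(":")  # third field is major:minor
    return os.makedev(int(major), int(minor)), fields[4]


def mounts_for_dev(st_dev: int, mountinfo: str) -> List[str]:
    """List mount points whose device number equals st_dev."""
    entries = (parse_mountinfo_line(l) for l in mountinfo.splitlines() if l.strip())
    return [mount_point for dev, mount_point in entries if dev == st_dev]


# A device number that appears in mountinfo is found.
print(mounts_for_dev(os.makedev(0, 45), SAMPLE_MOUNTINFO))
# A per-subvolume device number (hypothetical 0:52) matches nothing.
print(mounts_for_dev(os.makedev(0, 52), SAMPLE_MOUNTINFO))
```

The sketch ignores the octal escapes mountinfo uses for special characters
in mount points; it is only meant to show where the lookup comes up empty.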

I think we aren't going to solve that problem, at least not with stat().  I
think with statx() spitting out treeid we have given userspace a way to
differentiate subvolumes, and so we should fix statx() to spit out the
superblock device, that way new userspace things can do their appropriate
lookup if they so choose.
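
As a sketch of what "new userspace things" might do with that, assume
statx() reported the superblock device together with a tree id. A file
could then be identified by the triple below. FileIdentity and its field
names are hypothetical and are not an existing API; the numbers in the
usage lines are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileIdentity:
    """Identity of a file under the assumed statx() reporting."""
    dev: int      # superblock device (assumed to be reported by statx())
    tree_id: int  # subvolume ("tree") id (assumed statx() field)
    ino: int      # inode number within that tree


# Same dev and ino, different tree id: two distinct files.
a = FileIdentity(dev=45, tree_id=256, ino=257)
b = FileIdentity(dev=45, tree_id=257, ino=257)
print(a == b)
```

Because the dataclass is frozen, instances are hashable and can go straight
into a set or dict, which is how tools that detect hard links or avoid
visiting a file twice usually track what they have seen.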

This leaves the problem of nfsd.  Can you just integrate this new treeid into 
nfsd, and use that to either change the ino within nfsd itself, or do something 
similar to what your first patchset did and generate a fsid based on the treeid?
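
One way the fsid idea could look, purely as an illustration: derive a
per-subvolume fsid by folding the tree id into the filesystem UUID. The
function subvol_fsid and the XOR scheme are assumptions for this sketch,
not what nfsd or any posted patch actually does.

```python
import uuid


def subvol_fsid(fs_uuid: uuid.UUID, tree_id: int) -> uuid.UUID:
    """Fold a 64-bit tree id into the low 64 bits of the filesystem UUID.

    XOR with a fixed UUID is reversible, so distinct tree ids on the same
    filesystem always give distinct results.
    """
    if not 0 <= tree_id < 2**64:
        raise ValueError("tree_id must fit in 64 bits")
    return uuid.UUID(int=fs_uuid.int ^ tree_id)
```

The result is stable for a given (UUID, tree id) pair, so a client would see
the same fsid for the same subvolume across server restarts. It does not
guard against collisions between different filesystems.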

Mount options are messy, and are just going to lead to distros turning them on
without understanding what's going on and then we have to support them forever.
I want to get this fixed in a way that we all hate the least with as little
opportunity for confused users to make bad decisions.  Thanks,

Josef

