Re: [PATCH] nfs: derive f_fsid from server's fsid

From: Amir Goldstein <amir73il@gmail.com>
To: Benjamin Coddington <bcodding@redhat.com>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>,
	Anna Schumaker <anna@kernel.org>,
	 Jeff Layton <jlayton@kernel.org>,
	Chuck Lever <chuck.lever@oracle.com>,
	 Christian Brauner <brauner@kernel.org>, Jan Kara <jack@suse.cz>,
	linux-fsdevel@vger.kernel.org,  linux-nfs@vger.kernel.org
Subject: Re: [PATCH] nfs: derive f_fsid from server's fsid
Date: Tue, 24 Oct 2023 20:12:19 +0300	[thread overview]
Message-ID: <CAOQ4uxho0ryGuq7G+LaoTvqHRR_kg2fCNL2sGMLvNujODA8YPQ@mail.gmail.com> (raw)
In-Reply-To: <2382DA9B-D66B-41D9-8413-1C5319C01165@redhat.com>

On Tue, Oct 24, 2023 at 6:32 PM Benjamin Coddington <bcodding@redhat.com> wrote:
>
> On 24 Oct 2023, at 10:58, Amir Goldstein wrote:
>
> > On Tue, Oct 24, 2023 at 5:01 PM Benjamin Coddington <bcodding@redhat.com> wrote:
> >>
> >> On 24 Oct 2023, at 7:01, Amir Goldstein wrote:
> >>
> >>> Fold the server's 128bit fsid to report f_fsid in statfs(2).
> >>> This is similar to how uuid is folded for f_fsid of ext2/ext4/zonefs.
> >>>
> >>> This allows nfs client to be monitored by fanotify filesystem watch
> >>> for local client access if nfs supports re-export.
> >>>
> >>> For example, with inotify-tools 4.23.8.0, the following command can be
> >>> used to watch local client access over entire nfs filesystem:
> >>>
> >>>   fsnotifywatch --filesystem /mnt/nfs
> >>>
> >>> Note that fanotify filesystem watch does not report remote changes on
> >>> server.  It provides the same notifications as inotify, but it watches
> >>> over the entire filesystem and reports file handle of objects and fsid
> >>> with events.
> >>
> >> I think this will run into trouble where an NFSv4 will report both
> >> fsid.major and fsid.minor as zero for the special root filesystem.   We can
> >> expect an NFSv4 client to have one of these per server.
> >>
> >> Could use s_dev from nfs_server for a unique major/minor for each mount on
> >> the client, but these values won't be stable against a particular server
> >> export.
> >>
> >
> > That's a good point.
> > Not sure I understand the relation between mount/server/export.
> >
> > If the client mounts the special NFSv4 root filesystem at /mnt/nfs,
> > are the rest of the server exports going to be accessible via the same
> > mount/sb or via new auto mounts of different nfs sb?
>
> If we cross into a new filesystem on the server, then the client will also
> cross and leave the "root" and have a new sb with non-zero fsid.
>
> > In any case, f_fsid does not have to be uniform across all inodes
> > of the same sb. This is the case with btrfs, where the the btrfs sb
> > has inodes from the root volume and from sub-volumes.
> > inodes from btrfs sub-volumes have a different f_fsid than inodes
> > in the root btrfs volume.
>
> This isn't what I'm worried about.  I'm worried about the case where an nfs
> client will have multiple mounts with fsid's of 0:0, and those are
> distinctly different mounts of the "root" of NFSv4 on different servers.
>

fanotify_mark() fails with -ENODEV when trying to watch an sb with
zero f_fsid. This is the current state with nfs, so there is no concern
for a problem - it just means that fanotify will not be able to watch
those mounts. It's not great.

> > We try to detect this case in fanotify, which currently does not
> > support watching btrfs sub-volume for that reason.
> > I have a WIP branch [1] for handling non-uniform f_fsid in
> > fanotify by introducing the s_op->get_fsid(inode) method.
> >
> > Anyway, IIUC, my proposed f_fsid change is going to be fine for
> > NFSv2/3 and best effort for NFSv4:
> > - For NFSv2/3 mount, f_fsid is a good identifier?
>
> Yes, it should represent the same filesystem on the server.  You could still
> get duplicates between servers. What's returned in the protocol's u64 fsid
> goes into major with minor always zero.
>
> I'm sure there was discussion about what implementations should use long
> ago, but that predates me.
>

Yeh, duplicates aren't great when watching two different sb with same
f_fsid, it is not possible to know which sb the events came from.
However, the process setting the sb watches can know that in advance.

> > - For NFSv4 mount of a specific export, f_fsid is a good identifier?
>
> Yes, but if the specific export is on the same server's filesystem as the
> "root", you'll still get zero.  There are various ways to set fsid on
> exports for linux servers, but the fsid will be the same for all exports of
> the same filesystem on the server.
>

OK. good to know. I thought zero fsid was only for the root itself.

> > - For the NFSv4 root export mount, f_fsid remains zero as it is now
>
> Yes.
>
> > Am I understanding this correctly?
>
> I think so.
>
> > Do you see a reason not to make this change?
> > Do you see a reason to limit this change for NFSv2/3?
>
> I'm not familiar with fanotify enough to know if having multiple fsid 0
> mounts of different filesystems on different servers will do the right
> thing.  I wanted to point out that very real possibility for v4.
>

The fact that fsid 0 would be very common for many nfs mounts
makes this patch much less attractive.

Because we only get events for local client changes, we do not
have to tie the fsid with the server's fsid, we could just use a local
volatile fsid, as we do in other non-blockdev fs (tmpfs, kernfs).

I am not decisive about the best way to tackle this and since
Jan was not sure about the value of local-only notifications
for network filesystems, I am going to put this one on hold.

Thanks for the useful feedback!
Amir.