linux-nfs.vger.kernel.org archive mirror
From: Richard Weinberger <richard@nod.at>
To: bfields <bfields@fieldses.org>
Cc: linux-nfs <linux-nfs@vger.kernel.org>,
	david <david@sigma-star.at>,
	luis turcitu <luis.turcitu@appsbroker.com>,
	david young <david.young@appsbroker.com>,
	david oberhollenzer <david.oberhollenzer@sigma-star.at>,
	trond myklebust <trond.myklebust@hammerspace.com>,
	anna schumaker <anna.schumaker@netapp.com>,
	chris chilvers <chris.chilvers@appsbroker.com>
Subject: Re: [RFC PATCH 2/6] exports: Implement new export option reexport=
Date: Wed, 9 Mar 2022 10:43:27 +0100 (CET)
Message-ID: <401495945.127799.1646819007180.JavaMail.zimbra@nod.at>
In-Reply-To: <20220308221007.GC22644@fieldses.org>

Bruce,

----- Original Message -----
> From: "bfields" <bfields@fieldses.org>
>> 1. auto-fsidnum
>>    In this mode mountd/exportd will create a new numerical fsid
>>    for a NFS volume and subvolume. The numbers are stored in a database
>>    such that the server will always use the same fsid.
>>    The entry in the exports file is allowed to skip fsid= entirely, but
>>    stating a UUID is allowed, if needed.
>> 
>>    This mode has the obvious downside that load balancing is not
>>    possible since multiple re-exporting NFS servers would generate
>>    different ids.
> 
> This is the one I think it makes sense to concentrate on first.  Ideally
> it should Just Work without requiring any configuration.

Agreed.
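
To make the auto-fsidnum idea concrete, here is a minimal sketch of a persistent path-to-fsid mapping. It is only an illustration: the table name `fsidnums`, its columns, and the helper names are assumptions and not the schema used by the patch series.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open (or create) a hypothetical fsid database."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS fsidnums ("
        " num INTEGER PRIMARY KEY AUTOINCREMENT,"
        " path TEXT UNIQUE NOT NULL)"
    )
    return db


def fsidnum_for(db, path):
    """Return the stored fsid number for path, allocating one on first use."""
    with db:  # commits on success, rolls back on error
        # A path that is already known keeps its existing row and number.
        db.execute("INSERT OR IGNORE INTO fsidnums (path) VALUES (?)", (path,))
        row = db.execute(
            "SELECT num FROM fsidnums WHERE path = ?", (path,)
        ).fetchone()
    return row[0]
```

Because the number is looked up before anything new is handed out, the same volume gets the same fsid across restarts as long as the database file survives.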
 
> And then eventually my hope is that we could replace sqlite by a
> distributed database to get filehandles that are consistent across
> multiple servers.

Sure. I see at least two options here:

a. Allow multiple SQL backends in nfs-utils. SQLite by default, but also remote MariaDB
or Postgres...

b. Place the SQLite database on a shared file system that supports file locks.
That way we can use SQLite as-is. We just need to handle the SQLITE_BUSY and SQLITE_LOCKED cases in the code.
Luckily, writes happen seldom, so this shouldn't be a big deal.
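
A rough sketch of what handling lock contention in option b could look like, assuming a simple retry with backoff. The helper name `with_retry` and its parameters are hypothetical. Python's sqlite3 module reports lock contention as `sqlite3.OperationalError` with a message containing "locked", which is what the sketch keys on; C code would check the return code instead.

```python
import sqlite3
import time


def with_retry(operation, attempts=5, delay=0.1):
    """Call operation(), retrying with backoff while the database is locked."""
    for attempt in range(attempts):
        try:
            return operation()
        except sqlite3.OperationalError as exc:
            # Re-raise unrelated errors immediately, and give up after
            # the final attempt.
            if "locked" not in str(exc) or attempt == attempts - 1:
                raise
            time.sleep(delay * (2 ** attempt))
```

Since writes are rare, a handful of retries should be enough in practice; readers are not blocked by other readers.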

>> 
>> 2. predefined-fsidnum
>>    This mode works just like auto-fsidnum but does not generate ids
>>    for you. It helps in the load balancing case. A system administrator
>>    has to manually maintain the database and install it on all re-exporting
>>    NFS servers. If you have a massive amount of subvolumes this mode
>>    will help because you don't have to bloat the exports list.
> 
> OK, I can see that being sort of useful but it'd be nice if we could
> start with something more automatic.
> 
>> 3. remote-devfsid
>>    If this mode is selected mountd/exportd will derive a UUID from the
>>    re-exported NFS volume's fsid (RFC 7530, section 5.8.1.9).
> 
> How does the server take a filehandle with a UUID in it and map that
> UUID back to the original fsid?

knfsd does not need the original fsid. All it sees is the UUID.
If it needs to know which export belongs to a UUID it asks mountd.
In mountd the regular UUID lookup is used then.

>>    No further local state is needed on the re-exporting server.
>>    The export list entry still needs a fsid= setting because while
>>    parsing the exports file the NFS mounts might not be there yet.
> 
> I don't understand that bit.

I tried to explain that with this mode we don't need to store UUIDs or
fsids on disk.
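
As an illustration of why no local state is needed, the UUID can be a pure function of the remote fsid. The sketch below assumes a name-based (version 5) UUID over the major/minor pair; the namespace value and the function name are made up for the example and are not what the patches do.

```python
import uuid

# Hypothetical namespace: any fixed UUID works, as long as every
# re-exporting server uses the same one.
REEXPORT_NS = uuid.uuid5(uuid.NAMESPACE_DNS, "reexport.example")


def uuid_from_remote_fsid(major, minor):
    """Derive a stable UUID from the remote fsid's major/minor pair."""
    return uuid.uuid5(REEXPORT_NS, f"{major:x}:{minor:x}")
```

The same fsid always yields the same UUID, so nothing has to be written to disk. The flip side is that the UUID is only as stable as the fsid it is derived from.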

>>    This mode is dangerous, use only if you're absolutely sure that the
>>    NFS server you're re-exporting has a stable fsid. Chances are good
>>    that it can change.
> 
> The fsid should be stable.

Didn't you explain to me last time that it is not?
By fsid I mean:
https://datatracker.ietf.org/doc/html/rfc7530#section-5.8.1.9
https://datatracker.ietf.org/doc/html/rfc7530#section-2.2.5

So after a reboot the very same filesystem could be on different
disks, and the major/minor tuple would be different (if the server uses
disk ids as-is).
 
> The case I'm worried about is the case where we're reexporting exports
> from multiple servers.  Then there's nothing preventing the two servers
> from accidentally picking the same fsid to represent different exports.

That's a good point. Since /proc/fs/nfsfs/volumes shows all that information
we can add sanity checks to mountd.
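
Such a sanity check could look roughly like the sketch below. It assumes the volumes file is whitespace-separated with a header row starting with "NV", the server in the second column and the fsid in the fifth; that layout and the function name are assumptions for illustration. It takes the file contents as a string so it can be tried without a live NFS mount.

```python
from collections import defaultdict


def find_fsid_collisions(volumes_text):
    """Return {fsid: sorted servers} for fsids seen on more than one server."""
    servers_by_fsid = defaultdict(set)
    for line in volumes_text.splitlines():
        fields = line.split()
        # Skip the header row and anything too short to parse.
        if len(fields) < 5 or fields[0] == "NV":
            continue
        server, fsid = fields[1], fields[4]
        servers_by_fsid[fsid].add(server)
    return {
        fsid: sorted(servers)
        for fsid, servers in servers_by_fsid.items()
        if len(servers) > 1
    }
```

mountd could refuse to export, or at least warn, when the returned mapping is not empty.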

Thanks,
//richard

