* Improving NFS re-export
@ 2021-12-09 21:05 Richard Weinberger
  2021-12-09 21:41 ` J. Bruce Fields
From: Richard Weinberger @ 2021-12-09 21:05 UTC
  To: linux-nfs
  Cc: luis.turcitu, chris.chilvers, david.young, david, bfields,
	david oberhollenzer

Hello NFS list,

I'd like to improve the NFS re-export feature, especially wrt. crossmounts.
Currently an NFS client will face EIO when crossing a mount point on the re-exporting server.
This was discussed here[0]. While the assumption in that discussion was that check_export()
in fs/nfsd/export.c emits EIO, I did further experiments and realized that EIO actually
comes from the NFS client side of the re-exporting server.

nfs_encode_fh() in fs/nfs/export.c checks for IS_AUTOMOUNT(inode); if this is the case,
it refuses to create a new file handle.
So while accessing /files/disk2 directly on the re-exporting server triggers an automount,
when accessing it via nfsd the export function of the client side gives up.

AFAIU the suggested proxy-only-mode[1] will not address this problem, right?

One workaround is manually adding an export for each volume on the re-exporting server.
This kinda works but is tedious and error-prone.

I have a crazy idea for how to automate this:
Since nfs_encode_fh() in the NFS client side of the re-exporting server can detect
crossing mounts, we could install a new export on the server side as soon as the
IS_AUTOMOUNT(inode) case arises. We could even use the same fsid.
What do you think?

Another obstacle is file handle wrapping.
When re-exporting, the NFS client side adds inode and file information to each file handle,
the server side also adds information. In my test setup this enlarges a 16-byte file handle
to 40 bytes.
The proxy-only-mode won't help us here either.
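
To make the double wrapping concrete, here is a minimal user-space sketch in
Python. It is not kernel code, and the two header layouts (CLIENT_HDR,
SERVER_HDR) are assumptions invented for illustration; they are not the real
nfs or nfsd file handle formats, so the resulting size is not meant to
reproduce the 16 to 40 bytes seen in the test setup.

```python
import struct

# Hypothetical layouts, chosen only for illustration; they are NOT the
# kernel's real file handle formats.
CLIENT_HDR = struct.Struct("<QIH")   # file id, file type, backend handle length
SERVER_HDR = struct.Struct("<II")    # handle format tag, fsid


def client_wrap(backend_fh: bytes, fileid: int, ftype: int) -> bytes:
    """NFS client side of the proxy: prepend inode and file type info."""
    return CLIENT_HDR.pack(fileid, ftype, len(backend_fh)) + backend_fh


def server_wrap(inner: bytes, fsid: int) -> bytes:
    """nfsd side of the proxy: prepend export (fsid) info."""
    return SERVER_HDR.pack(1, fsid) + inner


def unwrap(wrapped: bytes) -> bytes:
    """Peel both layers off again and return the backend's handle."""
    inner = wrapped[SERVER_HDR.size:]
    _fileid, _ftype, length = CLIENT_HDR.unpack_from(inner)
    start = CLIENT_HDR.size
    return inner[start:start + length]


backend_fh = bytes(range(16))        # a 16-byte handle from the backend server
wrapped = server_wrap(client_wrap(backend_fh, fileid=42, ftype=0o040000), fsid=7)
print(len(backend_fh), "->", len(wrapped))   # growth depends on the layout
```

Every layer only prepends fields, so the handle seen by the final client
grows with each hop, while the backend's handle stays recoverable inside it.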

Did you consider using the opaque file handle from the server as lookup key in a
(persisted) data structure?
That way at least the client side of the re-exporting server no longer has to enlarge
the file handle with inode and file type information.
If the re-exporting server re-exports just one server (proxy-only-mode) we could also
skip adding the fsid to the handle.
What do you think?

I'm looking forward to hearing your comments.

Thanks,
//richard

[0] https://marc.info/?l=linux-nfs&m=161670807413876&w=2
[1] https://linux-nfs.org/wiki/index.php/NFS_proxy-only_mode


* Re: Improving NFS re-export
  2021-12-09 21:05 Improving NFS re-export Richard Weinberger
@ 2021-12-09 21:41 ` J. Bruce Fields
  2021-12-09 22:03   ` Richard Weinberger
From: J. Bruce Fields @ 2021-12-09 21:41 UTC
  To: Richard Weinberger
  Cc: linux-nfs, luis.turcitu, chris.chilvers, david.young, david,
	david oberhollenzer

On Thu, Dec 09, 2021 at 10:05:48PM +0100, Richard Weinberger wrote:
> Hello NFS list,
> 
> I'd like to improve the NFS re-export feature, especially wrt. crossmounts.
> Currently an NFS client will face EIO when crossing a mount point on the re-exporting server.
> This was discussed here[0]. While the assumption in that discussion was that check_export()
> in fs/nfsd/export.c emits EIO, I did further experiments and realized that EIO actually
> comes from the NFS client side of the re-exporting server.
> 
> nfs_encode_fh() in fs/nfs/export.c checks for IS_AUTOMOUNT(inode); if this is the case,
> it refuses to create a new file handle.
> So while accessing /files/disk2 directly on the re-exporting server triggers an automount,
> when accessing it via nfsd the export function of the client side gives up.
> 
> AFAIU the suggested proxy-only-mode[1] will not address this problem, right?

That's how I was thinking of addressing the problem, actually.  I
haven't figured out how to make that proxy-only mode work, though.

> One workaround is manually adding an export for each volume on the re-exporting server.
> This kinda works but is tedious and error-prone.
> 
> I have a crazy idea for how to automate this:
> Since nfs_encode_fh() in the NFS client side of the re-exporting server can detect
> crossing mounts, we could install a new export on the server side as soon as the
> IS_AUTOMOUNT(inode) case arises. We could even use the same fsid.
> What do you think?

Something like that might work.

I'm not sure what you mean by the same fsid.  I think you'd need to make
up a new fsid each time you encounter a new filesystem.  And you'd also
want to persist it on disk if you want this to keep working across
reboots of the proxy.

I think you could patch rpc.mountd to do that.
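
A minimal user-space sketch of that bookkeeping, in Python: hand out a new
fsid the first time a backend filesystem is seen and persist the mapping so it
survives a reboot of the proxy. The FsidTable class, the JSON file format and
the string keys are all assumptions made up for illustration; this is not how
rpc.mountd stores anything today.

```python
import json
import os


class FsidTable:
    """Hypothetical persistent map: backend filesystem key -> local fsid."""

    def __init__(self, path: str):
        self.path = path
        self.table = {}
        if os.path.exists(path):
            with open(path) as f:
                self.table = json.load(f)

    def fsid_for(self, backend_fs_key: str) -> int:
        """Return the fsid for this filesystem, allocating one if it is new."""
        if backend_fs_key not in self.table:
            self.table[backend_fs_key] = max(self.table.values(), default=0) + 1
            self._save()
        return self.table[backend_fs_key]

    def _save(self) -> None:
        # Write to a temporary file and rename it, so a crash in the middle
        # of a save never leaves a half-written table behind.
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(self.table, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.path)
```

The point of the sketch is only that allocation and persistence have to happen
together: an fsid that is handed to a client before it reaches disk can change
after a reboot and invalidate the client's file handles.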

> Another obstacle is file handle wrapping.
> When re-exporting, the NFS client side adds inode and file information to each file handle,
> the server side also adds information. In my test setup this enlarges a 16-byte file handle
> to 40 bytes.
> The proxy-only-mode won't help us here either.

Part of my motivation for a proxy-only mode was to remove that wrapping.

Since you're dedicating the host to reexporting one single backend
server, in theory you don't need any of the information in the wrapper.
When you (the proxy) get a filehandle from a client, you know which
server that filehandle originally came from, so you can go ask that
server for whatever you need to know about the filehandle (like an
fsid).

> Did you consider using the opaque file handle from the server as
> lookup key in a (persisted) data structure?

A little, but I don't think it works.

If you do this, you do need to require that you only export one server.
Otherwise there may be collisions (two different servers could return
filehandles that happen to have the same value).

The database would store every filehandle the client has ever seen.
That could be a lot.  It may also include filehandles for since-deleted
files.  The only way to prune such entries would be to try using them
and see if the server gives you STALE errors.
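
To illustrate the pruning problem, a minimal in-memory sketch in Python. The
HandleDB class, the StaleHandle exception and the probe callback are all
hypothetical names; a real version would have to be persistent, and probe
would mean an actual round trip to the backend server for every stored handle.

```python
class StaleHandle(Exception):
    """Stands in for the backend answering with a STALE error."""


class HandleDB:
    """Hypothetical table: opaque backend file handle <-> short local id."""

    def __init__(self):
        self._id_by_fh = {}
        self._fh_by_id = {}
        self._next_id = 1

    def short_id(self, backend_fh: bytes) -> int:
        """Return the local id for a backend handle, adding it if unseen."""
        sid = self._id_by_fh.get(backend_fh)
        if sid is None:
            sid = self._next_id
            self._next_id += 1
            self._id_by_fh[backend_fh] = sid
            self._fh_by_id[sid] = backend_fh
        return sid

    def backend_fh(self, sid: int) -> bytes:
        """Map a local id from a client back to the backend handle."""
        return self._fh_by_id[sid]

    def prune(self, probe) -> int:
        """Drop entries the backend reports as stale; return how many."""
        removed = 0
        for fh, sid in list(self._id_by_fh.items()):
            try:
                probe(fh)            # one backend request per stored handle
            except StaleHandle:
                del self._id_by_fh[fh]
                del self._fh_by_id[sid]
                removed += 1
        return removed

    def __len__(self) -> int:
        return len(self._id_by_fh)
```

Note that prune() has to walk the whole table and ask the backend about every
entry, which is exactly the cost described above.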

--b.

> That way at least the client side of the re-exporting server no longer has to enlarge
> the file handle with inode and file type information.
> If the re-exporting server re-exports just one server (proxy-only-mode) we could also
> skip adding the fsid to the handle.
> What do you think?
> 
> I'm looking forward to hearing your comments.
> 
> Thanks,
> //richard
> 
> [0] https://marc.info/?l=linux-nfs&m=161670807413876&w=2
> [1] https://linux-nfs.org/wiki/index.php/NFS_proxy-only_mode


* Re: Improving NFS re-export
  2021-12-09 21:41 ` J. Bruce Fields
@ 2021-12-09 22:03   ` Richard Weinberger
  2021-12-21 14:30     ` Daire Byrne
From: Richard Weinberger @ 2021-12-09 22:03 UTC
  To: bfields
  Cc: linux-nfs, luis turcitu, chris chilvers, david young, david,
	david oberhollenzer

----- Original Message -----
> On Thu, Dec 09, 2021 at 10:05:48PM +0100, Richard Weinberger wrote:
>> nfs_encode_fh() in fs/nfs/export.c checks for IS_AUTOMOUNT(inode); if this is the case,
>> it refuses to create a new file handle.
>> So while accessing /files/disk2 directly on the re-exporting server triggers an automount,
>> when accessing it via nfsd the export function of the client side gives up.
>> 
>> AFAIU the suggested proxy-only-mode[1] will not address this problem, right?
> 
> That's how I was thinking of addressing the problem, actually.  I
> haven't figured out how to make that proxy-only mode work, though.
> 
>> One workaround is manually adding an export for each volume on the re-exporting server.
>> This kinda works but is tedious and error-prone.
>> 
>> I have a crazy idea for how to automate this:
>> Since nfs_encode_fh() in the NFS client side of the re-exporting server can detect
>> crossing mounts, we could install a new export on the server side as soon as the
>> IS_AUTOMOUNT(inode) case arises. We could even use the same fsid.
>> What do you think?
> 
> Something like that might work.
> 
> I'm not sure what you mean by the same fsid.  I think you'd need to make
> up a new fsid each time you encounter a new filesystem.  And you'd also
> want to persist it on disk if you want this to keep working across
> reboots of the proxy.

By same fsid I meant reusing the fsid from the backend server.
 
> I think you could patch rpc.mountd to do that.

Okay, I need to dig into this.

>> Another obstacle is file handle wrapping.
>> When re-exporting, the NFS client side adds inode and file information to each file handle,
>> the server side also adds information. In my test setup this enlarges a 16-byte file handle
>> to 40 bytes.
>> The proxy-only-mode won't help us here either.
> 
> Part of my motivation for a proxy-only mode was to remove that wrapping.
> 
> Since you're dedicating the host to reexporting one single backend
> server, in theory you don't need any of the information in the wrapper.
> When you (the proxy) get a filehandle from a client, you know which
> server that filehandle originally came from, so you can go ask that
> server for whatever you need to know about the filehandle (like an
> fsid).

I see. That way we could get rid of file handle wrapping but lose the
NFS client inode cache on the re-exporting server, I think.
 
>> Did you consider using the opaque file handle from the server as
>> lookup key in a (persisted) data structure?
> 
> A little, but I don't think it works.
> 
> If you do this, you do need to require that you only export one server.
> Otherwise there may be collisions (two different servers could return
> filehandles that happen to have the same value).
> 
> The database would store every filehandle the client has ever seen.
> That could be a lot.  It may also include filehandles for since-deleted
> files.  The only way to prune such entries would be to try using them
> and see if the server gives you STALE errors.

True. I didn't think about the pruning case.

Thanks a lot for the prompt reply and your valuable input.
//richard


* Re: Improving NFS re-export
  2021-12-09 22:03   ` Richard Weinberger
@ 2021-12-21 14:30     ` Daire Byrne
  2021-12-21 17:21       ` bfields
  2021-12-21 21:39       ` Richard Weinberger
From: Daire Byrne @ 2021-12-21 14:30 UTC
  To: Richard Weinberger
  Cc: bfields, linux-nfs, luis turcitu, chris chilvers, david young,
	david, david oberhollenzer

On Thu, 9 Dec 2021 at 22:03, Richard Weinberger <richard@nod.at> wrote:
>
> I see. That way we could get rid of file handle wrapping but lose the
> NFS client inode cache on the re-exporting server, I think.

As avid users of re-exporting over the WAN, we like to be able to
selectively cache as many of the metadata lookups as possible
(actimeo=3600, vfs_cache_pressure=1).

I'm not sure whether losing the re-export server's client inode cache would
affect that ability?

And on the subject of the "proxy" server and a server per export: if,
like us, you have 30 servers or mountpoints to re-export but only
actively use 5-10 of those at any one time, it is more
resource-efficient (CPU, RAM, fscache storage) to use a single
re-export server for more than one mountpoint re-export. But in the
proxy case, maybe the same thing could be achieved with a
containerised knfsd with all the proxy servers running on the same
server?

I'm not sure if you could have shared storage and have multiple
fs-cache/cachefilesd in containers though.

Either way, I'm interested to see what you come up with. Always happy
to test new variations on re-exporting.

Daire


* Re: Improving NFS re-export
  2021-12-21 14:30     ` Daire Byrne
@ 2021-12-21 17:21       ` bfields
  2021-12-21 21:39       ` Richard Weinberger
From: bfields @ 2021-12-21 17:21 UTC
  To: Daire Byrne
  Cc: Richard Weinberger, linux-nfs, luis turcitu, chris chilvers,
	david young, david, david oberhollenzer

On Tue, Dec 21, 2021 at 02:30:45PM +0000, Daire Byrne wrote:
> On Thu, 9 Dec 2021 at 22:03, Richard Weinberger <richard@nod.at> wrote:
> >
> > I see. That way we could get rid of file handle wrapping but lose the
> > NFS client inode cache on the re-exporting server, I think.
> 
> As avid users of re-exporting over the WAN, we like to be able to
> selectively cache as many of the metadata lookups as possible
> (actimeo=3600, vfs_cache_pressure=1).
> 
> I'm not sure whether losing the re-export server's client inode cache would
> affect that ability?

A proxy without an inode cache wouldn't be good.

So the inode cache would have to be indexed just on (a hash of) the raw
filehandle.
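
Roughly what that indexing could look like, as a user-space Python sketch
rather than kernel code. The InodeCache class, the choice of SHA-256 truncated
to 64 bits, and the attribute dictionaries are assumptions for illustration
only. Each bucket keeps the full raw filehandle so that two handles with the
same hash are never confused.

```python
import hashlib


class InodeCache:
    """Hypothetical attribute cache keyed on a hash of the raw filehandle."""

    def __init__(self):
        self._buckets = {}   # 64-bit hash -> list of (raw_fh, attrs)

    @staticmethod
    def _hash(raw_fh: bytes) -> int:
        # Any well-mixing hash would do; SHA-256 is just convenient here.
        return int.from_bytes(hashlib.sha256(raw_fh).digest()[:8], "big")

    def put(self, raw_fh: bytes, attrs: dict) -> None:
        """Insert or update the cached attributes for a filehandle."""
        bucket = self._buckets.setdefault(self._hash(raw_fh), [])
        for i, (fh, _) in enumerate(bucket):
            if fh == raw_fh:
                bucket[i] = (raw_fh, attrs)
                return
        bucket.append((raw_fh, attrs))

    def get(self, raw_fh: bytes):
        """Return cached attributes, or None on a miss."""
        for fh, attrs in self._buckets.get(self._hash(raw_fh), []):
            if fh == raw_fh:     # compare full handles to survive collisions
                return attrs
        return None
```

Nothing besides the backend's own filehandle is needed as a key here, which is
what would let the wrapper fields go away.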

> And on the subject of the "proxy" server and a server per export: if,
> like us, you have 30 servers or mountpoints to re-export but only
> actively use 5-10 of those at any one time, it is more
> resource-efficient (CPU, RAM, fscache storage) to use a single
> re-export server for more than one mountpoint re-export.

That's useful to know, thanks.

> But in the proxy case, maybe the same thing could be achieved with a
> containerised knfsd with all the proxy servers running on the same
> server?

Yes, that's what I was thinking.

> I'm not sure if you could have shared storage and have multiple
> fs-cache/cachefilesd in containers though.

Seems like there should be a few ways to do that.

> Either way, I'm interested to see what you come up with. Always happy
> to test new variations on re-exporting.

I haven't managed to come up with a plan for making a proxy-only mode
work, though, so I'm not feeling too optimistic about that particular
idea.

--b.


* Re: Improving NFS re-export
  2021-12-21 14:30     ` Daire Byrne
  2021-12-21 17:21       ` bfields
@ 2021-12-21 21:39       ` Richard Weinberger
From: Richard Weinberger @ 2021-12-21 21:39 UTC
  To: Daire Byrne
  Cc: bfields, linux-nfs, luis turcitu, chris chilvers, david young,
	david, david oberhollenzer

Daire,

----- Original Message -----
> From: "Daire Byrne" <daire@dneg.com>
> Either way, I'm interested to see what you come up with. Always happy
> to test new variations on re-exporting.

David and I will share patches soon. We're quite happy with the kernel side,
but our rpc.mountd changes are still hacky.
We have a proof-of-concept fix for cross mounts and some crazy ideas for how to
reduce the fhandle overhead when re-exporting.

Thanks,
//richard
