* Re: virtiofs uuid and file handles
       [not found]           ` <20200922210445.GG57620@redhat.com>
@ 2020-09-23  2:49             ` Amir Goldstein
  2020-09-23  7:44               ` Miklos Szeredi
  0 siblings, 1 reply; 22+ messages in thread
From: Amir Goldstein @ 2020-09-23  2:49 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: Miklos Szeredi, overlayfs, linux-fsdevel

On Wed, Sep 23, 2020 at 12:04 AM Vivek Goyal <vgoyal@redhat.com> wrote:
>
...
> > Note that if all overlayfs layers are on the same fs and that fs has
> > null uuid, then the "disk copy" use case should just work, but I never
> > tested that.
> >
> > So far, there has been no filesystem with null uuid that could be used
> > as upper+lower layers (even xfs with option nouuid has non null s_uuid).
> >
> > Recently, virtiofs was added as a filesystem that could be used for
> > upper+lower layers and virtiofs (which is fuse) has null uuid.
>
> I guess I never paid attention to uuid part of virtiofs. Probably we
> are not using index or any of the advanced features of overlayfs yet,
> that's why.
>

I don't expect you to have a problem enabling index because
of the null uuid when all layers are on the same virtiofs.
That setup is allowed.
We only check for a null uuid on lower layers that
are NOT on the same fs as the upper layer.

What you can expect to have a problem with is that FUSE support
for file handles is "problematic".

I found out the hard way that FUSE can decode NFS file handles
to a completely different object than the one encoded, if the encoded
inode was evicted from cache and its node id has been reused.

Another problem is that the FUSE protocol does not have complete
support for decoding file handles. FUSE implements file handle
decode as a LOOKUP(ino, ".") request to the server, but if the server
is proxying a local filesystem, there is not enough information to
construct an open_by_handle_at() request.

I wrote a fuse filesystem whose file handles are persistent and
reliable [1], but it is a specialized server that uses knowledge
of the local filesystem's file handle format, and it requires that the
local filesystem have a special feature to interpret a file handle
with generation 0 as ANY generation (ext4 does that).
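
To illustrate the generation-0 trick described above, here is a minimal
sketch (not the actual cachegwfs code). It assumes the common
FILEID_INO32_GEN handle layout of two 32-bit words, inode number followed
by generation (packed little-endian here purely for illustration), and
shows a server storing a handle with the generation zeroed so a filesystem
like ext4 can accept it for any generation:

```python
import struct

def encode_zero_gen_handle(ino: int) -> bytes:
    """Pack a FILEID_INO32_GEN-style handle with generation forced to 0.

    A filesystem that treats a 0 generation as "any generation" (as ext4
    optionally does) will then decode this handle even after the inode's
    real generation has changed.
    """
    return struct.pack("<II", ino, 0)

def decode_handle(handle: bytes):
    """Unpack the {ino, gen} pair from a FILEID_INO32_GEN-style handle."""
    ino, gen = struct.unpack("<II", handle)
    return ino, gen

# Example: a stored handle for inode 1234, valid for any generation.
h = encode_zero_gen_handle(1234)
print(decode_handle(h))  # (1234, 0)
```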

I think that the proper way to implement reliable persistent file
handles in fuse/virtiofs would be to add ENCODE/DECODE to
the FUSE protocol and allow the server to handle this.

Thanks,
Amir.

[1] https://github.com/amir73il/libfuse/commits/cachegwfs

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: virtiofs uuid and file handles
  2020-09-23  2:49             ` virtiofs uuid and file handles Amir Goldstein
@ 2020-09-23  7:44               ` Miklos Szeredi
  2020-09-23  9:56                 ` Amir Goldstein
  2022-09-11 10:14                 ` Persistent FUSE file handles (Was: virtiofs uuid and file handles) Amir Goldstein
  0 siblings, 2 replies; 22+ messages in thread
From: Miklos Szeredi @ 2020-09-23  7:44 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Vivek Goyal, overlayfs, linux-fsdevel, Max Reitz

On Wed, Sep 23, 2020 at 4:49 AM Amir Goldstein <amir73il@gmail.com> wrote:

> I think that the proper way to implement reliable persistent file
> handles in fuse/virtiofs would be to add ENCODE/DECODE to
> the FUSE protocol and allow the server to handle this.

Max Reitz (Cc-d) is currently looking into this.

One proposal was to add  LOOKUP_HANDLE operation that is similar to
LOOKUP except it takes a {variable length handle, name} as input and
returns a variable length handle *and* a u64 node_id that can be used
normally for all other operations.

The advantage of such a scheme for virtio-fs (and possibly other fuse
based fs) would be that userspace need not keep a refcounted object
around until the kernel sends a FORGET, but can prune its node ID
based cache at any time.   If that happens and a request from the
client (kernel) comes in with a stale node ID, the server will return
-ESTALE and the client can ask for a new node ID with a special
lookup_handle(fh, NULL).

Disadvantages being:

 - cost of generating a file handle on all lookups
 - cost of storing file handle in kernel icache

I don't think either of those are problematic in the virtiofs case.
The cost of having to keep fds open while the client has them in its
cache is much higher.
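
As a rough illustration of the flow described above, here is a toy
simulation. The names (Server, Client, lookup_handle) are hypothetical
stand-ins for the proposed protocol, not an existing FUSE API: the server
prunes its node-ID cache at will, and on -ESTALE the client refreshes the
node ID with a handle-only lookup.

```python
import errno

class Server:
    """Toy server: maps node IDs to file handles, free to prune at will."""
    def __init__(self):
        self.next_nodeid = 1
        self.by_nodeid = {}   # nodeid -> opaque file handle

    def lookup_handle(self, fh, name=None):
        """lookup_handle(fh, None): (re)establish a node ID for a handle."""
        nodeid = self.next_nodeid
        self.next_nodeid += 1
        self.by_nodeid[nodeid] = fh
        return nodeid

    def getattr(self, nodeid):
        if nodeid not in self.by_nodeid:
            return -errno.ESTALE   # pruned: client must refresh
        return {"fh": self.by_nodeid[nodeid]}

    def prune(self):
        self.by_nodeid.clear()     # no FORGET needed in this scheme

class Client:
    """Toy client: retries with lookup_handle(fh, None) on ESTALE."""
    def __init__(self, server):
        self.server = server

    def getattr(self, nodeid, fh):
        reply = self.server.getattr(nodeid)
        if reply == -errno.ESTALE:
            nodeid = self.server.lookup_handle(fh, None)
            reply = self.server.getattr(nodeid)
        return nodeid, reply

srv = Server()
fh = b"\x01\x02"                  # opaque handle from a previous lookup
nodeid = srv.lookup_handle(fh)
srv.prune()                       # server drops its node-ID cache
nodeid, reply = Client(srv).getattr(nodeid, fh)
print(reply["fh"] == fh)          # True: identity recovered via the handle
```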

Thanks,
Miklos


* Re: virtiofs uuid and file handles
  2020-09-23  7:44               ` Miklos Szeredi
@ 2020-09-23  9:56                 ` Amir Goldstein
  2020-09-23 11:12                   ` Miklos Szeredi
  2022-09-11 10:14                 ` Persistent FUSE file handles (Was: virtiofs uuid and file handles) Amir Goldstein
  1 sibling, 1 reply; 22+ messages in thread
From: Amir Goldstein @ 2020-09-23  9:56 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Vivek Goyal, overlayfs, linux-fsdevel, Max Reitz

On Wed, Sep 23, 2020 at 10:44 AM Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> On Wed, Sep 23, 2020 at 4:49 AM Amir Goldstein <amir73il@gmail.com> wrote:
>
> > I think that the proper way to implement reliable persistent file
> > handles in fuse/virtiofs would be to add ENCODE/DECODE to
> > the FUSE protocol and allow the server to handle this.
>
> Max Reitz (Cc-d) is currently looking into this.
>
> One proposal was to add  LOOKUP_HANDLE operation that is similar to
> LOOKUP except it takes a {variable length handle, name} as input and
> returns a variable length handle *and* a u64 node_id that can be used
> normally for all other operations.
>
> The advantage of such a scheme for virtio-fs (and possibly other fuse
> based fs) would be that userspace need not keep a refcounted object
> around until the kernel sends a FORGET, but can prune its node ID
> based cache at any time.   If that happens and a request from the
> client (kernel) comes in with a stale node ID, the server will return
> -ESTALE and the client can ask for a new node ID with a special
> lookup_handle(fh, NULL).
>
> Disadvantages being:
>
>  - cost of generating a file handle on all lookups

I never ran into a local fs implementation where this was expensive.

>  - cost of storing file handle in kernel icache
>
> I don't think either of those are problematic in the virtiofs case.
> The cost of having to keep fds open while the client has them in its
> cache is much higher.
>

Sounds good.
I suppose flock() does need to keep the open fd on server.

Are there any other states for an open fd that must be preserved?

Thanks,
Amir.


* Re: virtiofs uuid and file handles
  2020-09-23  9:56                 ` Amir Goldstein
@ 2020-09-23 11:12                   ` Miklos Szeredi
  2021-05-29 16:05                     ` Amir Goldstein
  0 siblings, 1 reply; 22+ messages in thread
From: Miklos Szeredi @ 2020-09-23 11:12 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Vivek Goyal, overlayfs, linux-fsdevel, Max Reitz

On Wed, Sep 23, 2020 at 11:57 AM Amir Goldstein <amir73il@gmail.com> wrote:
>
> On Wed, Sep 23, 2020 at 10:44 AM Miklos Szeredi <miklos@szeredi.hu> wrote:
> >
> > On Wed, Sep 23, 2020 at 4:49 AM Amir Goldstein <amir73il@gmail.com> wrote:
> >
> > > I think that the proper way to implement reliable persistent file
> > > handles in fuse/virtiofs would be to add ENCODE/DECODE to
> > > the FUSE protocol and allow the server to handle this.
> >
> > Max Reitz (Cc-d) is currently looking into this.
> >
> > One proposal was to add  LOOKUP_HANDLE operation that is similar to
> > LOOKUP except it takes a {variable length handle, name} as input and
> > returns a variable length handle *and* a u64 node_id that can be used
> > normally for all other operations.
> >
> > The advantage of such a scheme for virtio-fs (and possibly other fuse
> > based fs) would be that userspace need not keep a refcounted object
> > around until the kernel sends a FORGET, but can prune its node ID
> > based cache at any time.   If that happens and a request from the
> > client (kernel) comes in with a stale node ID, the server will return
> > -ESTALE and the client can ask for a new node ID with a special
> > lookup_handle(fh, NULL).
> >
> > Disadvantages being:
> >
> >  - cost of generating a file handle on all lookups
>
> I never ran into a local fs implementation where this was expensive.
>
> >  - cost of storing file handle in kernel icache
> >
> > I don't think either of those are problematic in the virtiofs case.
> > The cost of having to keep fds open while the client has them in its
> > cache is much higher.
> >
>
> Sounds good.
> I suppose flock() does need to keep the open fd on server.

Open files are a separate issue and do need an active object in the server.

The issue this solves  is synchronizing "released" and "evicted"
states of objects between  server and client.  I.e. when a file is
closed (and no more open files exist referencing the same object) the
dentry refcount goes to zero but it remains in the cache.   In this
state the server could really evict its own cached object, but can't
because the client can gain an active reference at any time  via
cached path lookup.

One other solution would be for the server to send a notification
(NOTIFY_EVICT) that would try to clean out the object from the server
cache and respond with a FORGET if successful.   But I sort of like
the file handle one better, since it solves multiple problems.
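
A toy sketch of the hypothetical NOTIFY_EVICT handshake (none of this
exists in the FUSE protocol; the names and shapes here are assumptions):
the client drops the object and answers with a FORGET only when its
dentry refcount is zero, so the server evicts its object only on success.

```python
class ClientCache:
    """Toy client-side cache: nodeid -> dentry refcount."""
    def __init__(self):
        self.refcount = {}

    def handle_notify_evict(self, nodeid):
        """Drop a cached, unreferenced object; reply FORGET on success."""
        if self.refcount.get(nodeid, 0) == 0:
            self.refcount.pop(nodeid, None)
            return "FORGET"       # server may now evict its own object
        return None               # still referenced; server must keep it

client = ClientCache()
client.refcount[7] = 0            # file closed, dentry cached, refcount 0
client.refcount[8] = 2            # still actively referenced

print(client.handle_notify_evict(7))  # FORGET
print(client.handle_notify_evict(8))  # None
```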

Thanks,
Miklos

>
> Are there any other states for an open fd that must be preserved?


>
> Thanks,
> Amir.


* Re: Copying overlayfs directories with index=on
       [not found]             ` <CAOQ4uxjp6NpF_Q0QqUTzE5=YiKz9w6JbUVyROG+rNFcHPAThFg@mail.gmail.com>
@ 2020-09-23 12:53               ` Pavel Tikhomirov
  0 siblings, 0 replies; 22+ messages in thread
From: Pavel Tikhomirov @ 2020-09-23 12:53 UTC (permalink / raw)
  To: Amir Goldstein, Vivek Goyal
  Cc: Pavel Tikhomirov, Miklos Szeredi, linux-unionfs

Hi, I've sent a patch which tries to achieve what Amir had
suggested. Please take a look:

[PATCH] ovl: introduce new "index=nouuid" option for inodes index feature

On 9/23/20 5:10 AM, Amir Goldstein wrote:
> On Wed, Sep 23, 2020 at 12:25 AM Vivek Goyal <vgoyal@redhat.com> wrote:
>>
>> On Tue, Sep 22, 2020 at 02:15:55PM +0300, Amir Goldstein wrote:
>>
>> [..]
>>>
>>> No objection, but if I were you I wouldn't bother re-writing new ovl_fh.
>>> If you know you don't care about matching uuid in the first place,
>>> it is better to add a mount option to overlayfs 'index=nouuid' to relax the
>>> uuid comparison check for ovl_fh.
>>
>> So is it possible that somebody uses "nouuid" and then a different file
>> gets the same file handle (as stored in upper)? I think that's one issue
>> you were worried about while addressing the squashfs fix. IIRC, Miklos had said
>> within the same filesystem it will not happen, and across filesystems the
>> sb->uuid check will ensure this does not happen. IOW, "nouuid" will
>> open the possibility of an upper file handle matching a different file?
>>
> 
> Well, to be accurate, I did write that when cloning a base lower fs (like with
> dm-thinp) the problem reported with re-created lower squashfs still exists but
> that it is a corner case [1].
> 
> But what I suggested is that index=nouuid will only be allowed for all layers
> on the same fs, where this is not a problem.
> 
> Thanks,
> Amir.
> 
> [1] https://lore.kernel.org/linux-unionfs/CAOQ4uxiq7hkaew4LoFZkf4R73iH_pU7OHOriycLCnnywtA0O0w@mail.gmail.com/
> 

-- 
Best regards, Tikhomirov Pavel
Software Developer, Virtuozzo.


* Re: virtiofs uuid and file handles
  2020-09-23 11:12                   ` Miklos Szeredi
@ 2021-05-29 16:05                     ` Amir Goldstein
  2021-05-31 14:11                       ` Miklos Szeredi
  0 siblings, 1 reply; 22+ messages in thread
From: Amir Goldstein @ 2021-05-29 16:05 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Vivek Goyal, overlayfs, linux-fsdevel, Max Reitz

On Wed, Sep 23, 2020 at 2:12 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> On Wed, Sep 23, 2020 at 11:57 AM Amir Goldstein <amir73il@gmail.com> wrote:
> >
> > On Wed, Sep 23, 2020 at 10:44 AM Miklos Szeredi <miklos@szeredi.hu> wrote:
> > >
> > > On Wed, Sep 23, 2020 at 4:49 AM Amir Goldstein <amir73il@gmail.com> wrote:
> > >
> > > > I think that the proper way to implement reliable persistent file
> > > > handles in fuse/virtiofs would be to add ENCODE/DECODE to
> > > > the FUSE protocol and allow the server to handle this.
> > >
> > > Max Reitz (Cc-d) is currently looking into this.
> > >
> > > One proposal was to add  LOOKUP_HANDLE operation that is similar to
> > > LOOKUP except it takes a {variable length handle, name} as input and
> > > returns a variable length handle *and* a u64 node_id that can be used
> > > normally for all other operations.
> > >

Miklos, Max,

Any updates on LOOKUP_HANDLE work?

> > > The advantage of such a scheme for virtio-fs (and possibly other fuse
> > > based fs) would be that userspace need not keep a refcounted object
> > > around until the kernel sends a FORGET, but can prune its node ID
> > > based cache at any time.   If that happens and a request from the
> > > client (kernel) comes in with a stale node ID, the server will return
> > > -ESTALE and the client can ask for a new node ID with a special
> > > lookup_handle(fh, NULL).
> > >
> > > Disadvantages being:
> > >
> > >  - cost of generating a file handle on all lookups
> >
> > I never ran into a local fs implementation where this was expensive.
> >
> > >  - cost of storing file handle in kernel icache
> > >
> > > I don't think either of those are problematic in the virtiofs case.
> > > The cost of having to keep fds open while the client has them in its
> > > cache is much higher.
> > >
> >
> > Sounds good.
> > I suppose flock() does need to keep the open fd on server.
>
> Open files are a separate issue and do need an active object in the server.
>
> The issue this solves  is synchronizing "released" and "evicted"
> states of objects between  server and client.  I.e. when a file is
> closed (and no more open files exist referencing the same object) the
> dentry refcount goes to zero but it remains in the cache.   In this
> state the server could really evict its own cached object, but can't
> because the client can gain an active reference at any time  via
> cached path lookup.
>
> One other solution would be for the server to send a notification
> (NOTIFY_EVICT) that would try to clean out the object from the server
> cache and respond with a FORGET if successful.   But I sort of like
> the file handle one better, since it solves multiple problems.
>

Even with LOOKUP_HANDLE, I am struggling to understand how we
intend to invalidate all fuse dentries referring to ino X in case the server
replies with reused ino X with a different generation than the one stored
in fuse inode cache.

This is an issue that I encountered when running the passthrough_hp test,
on my filesystem. In tst_readdir_big() for example, underlying files are being
unlinked and new files created reusing the old inode numbers.

This creates a situation where server gets a lookup request
for file B that uses the reused inode number X, while old file A is
still in fuse dentry cache using the older generation of real inode
number X which is still in fuse inode cache.

Now the server knows that the real inode has been reused, because
the server caches the old generation value, but it cannot reply to
the lookup request before the old fuse inode has been invalidated.
IIUC, fuse_lowlevel_notify_inval_inode() is not enough(?).
We would also need to change fuse_dentry_revalidate() to
detect the case of reused/invalidated inode.

The straightforward way I can think of is to store inode generation
in fuse_dentry. It won't even grow the size of the struct.
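
A minimal sketch of that idea, using toy classes rather than the real
fuse_dentry/fuse_inode structures: revalidate compares the generation
recorded in the dentry at lookup time against the generation the server
now reports for the same ino, and marks the stale inode bad on mismatch.

```python
class Inode:
    def __init__(self, ino, generation):
        self.ino = ino
        self.generation = generation
        self.bad = False

class Dentry:
    """Toy dentry that remembers the generation seen at lookup time."""
    def __init__(self, name, inode):
        self.name = name
        self.inode = inode
        self.generation = inode.generation   # the proposed extra field

def revalidate(dentry, server_ino, server_gen):
    """Return True if the cached dentry is still valid.

    If the server now reports the same ino with a different generation,
    the inode number was reused: mark the old inode bad and invalidate.
    """
    if dentry.inode.ino == server_ino and dentry.generation != server_gen:
        dentry.inode.bad = True
        return False
    return dentry.generation == server_gen

old = Dentry("A", Inode(ino=100, generation=1))
# File A was unlinked; the server reused ino 100 with generation 2 for B.
print(revalidate(old, server_ino=100, server_gen=2))  # False
print(old.inode.bad)                                  # True
```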

Am I over complicating this?

Thanks,
Amir.


* Re: virtiofs uuid and file handles
  2021-05-29 16:05                     ` Amir Goldstein
@ 2021-05-31 14:11                       ` Miklos Szeredi
  2021-05-31 18:12                         ` Amir Goldstein
  0 siblings, 1 reply; 22+ messages in thread
From: Miklos Szeredi @ 2021-05-31 14:11 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Vivek Goyal, overlayfs, linux-fsdevel, Max Reitz

On Sat, 29 May 2021 at 18:05, Amir Goldstein <amir73il@gmail.com> wrote:
>
> On Wed, Sep 23, 2020 at 2:12 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
> >
> > On Wed, Sep 23, 2020 at 11:57 AM Amir Goldstein <amir73il@gmail.com> wrote:
> > >
> > > On Wed, Sep 23, 2020 at 10:44 AM Miklos Szeredi <miklos@szeredi.hu> wrote:
> > > >
> > > > On Wed, Sep 23, 2020 at 4:49 AM Amir Goldstein <amir73il@gmail.com> wrote:
> > > >
> > > > > I think that the proper way to implement reliable persistent file
> > > > > handles in fuse/virtiofs would be to add ENCODE/DECODE to
> > > > > the FUSE protocol and allow the server to handle this.
> > > >
> > > > Max Reitz (Cc-d) is currently looking into this.
> > > >
> > > > One proposal was to add  LOOKUP_HANDLE operation that is similar to
> > > > LOOKUP except it takes a {variable length handle, name} as input and
> > > > returns a variable length handle *and* a u64 node_id that can be used
> > > > normally for all other operations.
> > > >
>
> Miklos, Max,
>
> Any updates on LOOKUP_HANDLE work?
>
> > > > The advantage of such a scheme for virtio-fs (and possibly other fuse
> > > > based fs) would be that userspace need not keep a refcounted object
> > > > around until the kernel sends a FORGET, but can prune its node ID
> > > > based cache at any time.   If that happens and a request from the
> > > > client (kernel) comes in with a stale node ID, the server will return
> > > > -ESTALE and the client can ask for a new node ID with a special
> > > > lookup_handle(fh, NULL).
> > > >
> > > > Disadvantages being:
> > > >
> > > >  - cost of generating a file handle on all lookups
> > >
> > > I never ran into a local fs implementation where this was expensive.
> > >
> > > >  - cost of storing file handle in kernel icache
> > > >
> > > > I don't think either of those are problematic in the virtiofs case.
> > > > The cost of having to keep fds open while the client has them in its
> > > > cache is much higher.
> > > >
> > >
> > > Sounds good.
> > > I suppose flock() does need to keep the open fd on server.
> >
> > Open files are a separate issue and do need an active object in the server.
> >
> > The issue this solves  is synchronizing "released" and "evicted"
> > states of objects between  server and client.  I.e. when a file is
> > closed (and no more open files exist referencing the same object) the
> > dentry refcount goes to zero but it remains in the cache.   In this
> > state the server could really evict its own cached object, but can't
> > because the client can gain an active reference at any time  via
> > cached path lookup.
> >
> > One other solution would be for the server to send a notification
> > (NOTIFY_EVICT) that would try to clean out the object from the server
> > cache and respond with a FORGET if successful.   But I sort of like
> > the file handle one better, since it solves multiple problems.
> >
>
> Even with LOOKUP_HANDLE, I am struggling to understand how we
> intend to invalidate all fuse dentries referring to ino X in case the server
> replies with reused ino X with a different generation than the one stored
> in fuse inode cache.
>
> This is an issue that I encountered when running the passthrough_hp test,
> on my filesystem. In tst_readdir_big() for example, underlying files are being
> unlinked and new files created reusing the old inode numbers.
>
> This creates a situation where server gets a lookup request
> for file B that uses the reused inode number X, while old file A is
> still in fuse dentry cache using the older generation of real inode
> number X which is still in fuse inode cache.
>
> Now the server knows that the real inode has been reused, because
> the server caches the old generation value, but it cannot reply to
> the lookup request before the old fuse inode has been invalidated.
> IIUC, fuse_lowlevel_notify_inval_inode() is not enough(?).
> We would also need to change fuse_dentry_revalidate() to
> detect the case of reused/invalidated inode.
>
> The straightforward way I can think of is to store inode generation
> in fuse_dentry. It won't even grow the size of the struct.
>
> Am I over complicating this?

In this scheme the generation number is already embedded in the file
handle.  If LOOKUP_HANDLE returns a nodeid that can be found in the
icache, but which doesn't match the new file handle, then the old
inode will be marked bad and a new one allocated.
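
A toy sketch of that behavior (hypothetical names, not the actual kernel
icache code): a LOOKUP_HANDLE reply carries {nodeid, fh}, and if a cached
inode for that nodeid no longer matches the returned handle, it is marked
bad and a fresh inode is allocated in its place.

```python
class FuseInode:
    def __init__(self, nodeid, fh):
        self.nodeid = nodeid
        self.fh = fh              # handle embeds the generation
        self.bad = False

icache = {}   # nodeid -> FuseInode

def iget(nodeid, fh):
    """Find or allocate an inode for a LOOKUP_HANDLE reply.

    A cached inode whose stored handle differs from the reply's handle
    belongs to an old generation: mark it bad and allocate a new inode.
    """
    inode = icache.get(nodeid)
    if inode is not None and inode.fh != fh:
        inode.bad = True          # old object: handle (generation) changed
        inode = None
    if inode is None:
        inode = FuseInode(nodeid, fh)
        icache[nodeid] = inode
    return inode

a = iget(5, b"ino100-gen1")
b = iget(5, b"ino100-gen2")       # same nodeid, new handle after reuse
print(a.bad, a is b)              # True False
```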

Does that answer your worries?  Or am I missing something?

Thanks,
Miklos


* Re: virtiofs uuid and file handles
  2021-05-31 14:11                       ` Miklos Szeredi
@ 2021-05-31 18:12                         ` Amir Goldstein
  2021-06-01 14:49                           ` Vivek Goyal
  0 siblings, 1 reply; 22+ messages in thread
From: Amir Goldstein @ 2021-05-31 18:12 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Vivek Goyal, overlayfs, linux-fsdevel, Max Reitz

On Mon, May 31, 2021 at 5:11 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> On Sat, 29 May 2021 at 18:05, Amir Goldstein <amir73il@gmail.com> wrote:
> >
> > On Wed, Sep 23, 2020 at 2:12 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
> > >
> > > On Wed, Sep 23, 2020 at 11:57 AM Amir Goldstein <amir73il@gmail.com> wrote:
> > > >
> > > > On Wed, Sep 23, 2020 at 10:44 AM Miklos Szeredi <miklos@szeredi.hu> wrote:
> > > > >
> > > > > On Wed, Sep 23, 2020 at 4:49 AM Amir Goldstein <amir73il@gmail.com> wrote:
> > > > >
> > > > > > I think that the proper way to implement reliable persistent file
> > > > > > handles in fuse/virtiofs would be to add ENCODE/DECODE to
> > > > > > the FUSE protocol and allow the server to handle this.
> > > > >
> > > > > Max Reitz (Cc-d) is currently looking into this.
> > > > >
> > > > > One proposal was to add  LOOKUP_HANDLE operation that is similar to
> > > > > LOOKUP except it takes a {variable length handle, name} as input and
> > > > > returns a variable length handle *and* a u64 node_id that can be used
> > > > > normally for all other operations.
> > > > >
> >
> > Miklos, Max,
> >
> > Any updates on LOOKUP_HANDLE work?
> >
> > > > > The advantage of such a scheme for virtio-fs (and possibly other fuse
> > > > > based fs) would be that userspace need not keep a refcounted object
> > > > > around until the kernel sends a FORGET, but can prune its node ID
> > > > > based cache at any time.   If that happens and a request from the
> > > > > client (kernel) comes in with a stale node ID, the server will return
> > > > > -ESTALE and the client can ask for a new node ID with a special
> > > > > lookup_handle(fh, NULL).
> > > > >
> > > > > Disadvantages being:
> > > > >
> > > > >  - cost of generating a file handle on all lookups
> > > >
> > > > I never ran into a local fs implementation where this was expensive.
> > > >
> > > > >  - cost of storing file handle in kernel icache
> > > > >
> > > > > I don't think either of those are problematic in the virtiofs case.
> > > > > The cost of having to keep fds open while the client has them in its
> > > > > cache is much higher.
> > > > >
> > > >
> > > > Sounds good.
> > > > I suppose flock() does need to keep the open fd on server.
> > >
> > > Open files are a separate issue and do need an active object in the server.
> > >
> > > The issue this solves  is synchronizing "released" and "evicted"
> > > states of objects between  server and client.  I.e. when a file is
> > > closed (and no more open files exist referencing the same object) the
> > > dentry refcount goes to zero but it remains in the cache.   In this
> > > state the server could really evict its own cached object, but can't
> > > because the client can gain an active reference at any time  via
> > > cached path lookup.
> > >
> > > One other solution would be for the server to send a notification
> > > (NOTIFY_EVICT) that would try to clean out the object from the server
> > > cache and respond with a FORGET if successful.   But I sort of like
> > > the file handle one better, since it solves multiple problems.
> > >
> >
> > Even with LOOKUP_HANDLE, I am struggling to understand how we
> > intend to invalidate all fuse dentries referring to ino X in case the server
> > replies with reused ino X with a different generation than the one stored
> > in fuse inode cache.
> >
> > This is an issue that I encountered when running the passthrough_hp test,
> > on my filesystem. In tst_readdir_big() for example, underlying files are being
> > unlinked and new files created reusing the old inode numbers.
> >
> > This creates a situation where server gets a lookup request
> > for file B that uses the reused inode number X, while old file A is
> > still in fuse dentry cache using the older generation of real inode
> > number X which is still in fuse inode cache.
> >
> > Now the server knows that the real inode has been reused, because
> > the server caches the old generation value, but it cannot reply to
> > the lookup request before the old fuse inode has been invalidated.
> > IIUC, fuse_lowlevel_notify_inval_inode() is not enough(?).
> > We would also need to change fuse_dentry_revalidate() to
> > detect the case of reused/invalidated inode.
> >
> > The straightforward way I can think of is to store inode generation
> > in fuse_dentry. It won't even grow the size of the struct.
> >
> > Am I over complicating this?
>
> In this scheme the generation number is already embedded in the file
> handle.  If LOOKUP_HANDLE returns a nodeid that can be found in the
> icache, but which doesn't match the new file handle, then the old
> inode will be marked bad and a new one allocated.
>
> Does that answer your worries?  Or am I missing something?

It affirms my understanding of the future implementation, but
does not help my implementation without protocol changes.
I thought I could get away without LOOKUP_HANDLE for an
underlying fs that is able to resolve by ino, but it seems that I still
have an unhandled corner case, so I will need to add some kernel patch.
Unless there is already a way for the server to signal, in a synchronous
manner, that the inode should be marked bad (I did not find any) before
replying to LOOKUP with a new generation of the same ino.

Any idea about the timeline for LOOKUP_HANDLE?
I may be able to pick this up myself if no one is actively
working on it and there are no plans for anyone else to make it happen.

Thanks,
Amir.


* Re: virtiofs uuid and file handles
  2021-05-31 18:12                         ` Amir Goldstein
@ 2021-06-01 14:49                           ` Vivek Goyal
  2021-06-01 15:42                             ` Amir Goldstein
  0 siblings, 1 reply; 22+ messages in thread
From: Vivek Goyal @ 2021-06-01 14:49 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Miklos Szeredi, overlayfs, linux-fsdevel, Max Reitz

On Mon, May 31, 2021 at 09:12:59PM +0300, Amir Goldstein wrote:
> On Mon, May 31, 2021 at 5:11 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
> >
> > On Sat, 29 May 2021 at 18:05, Amir Goldstein <amir73il@gmail.com> wrote:
> > >
> > > On Wed, Sep 23, 2020 at 2:12 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
> > > >
> > > > On Wed, Sep 23, 2020 at 11:57 AM Amir Goldstein <amir73il@gmail.com> wrote:
> > > > >
> > > > > On Wed, Sep 23, 2020 at 10:44 AM Miklos Szeredi <miklos@szeredi.hu> wrote:
> > > > > >
> > > > > > On Wed, Sep 23, 2020 at 4:49 AM Amir Goldstein <amir73il@gmail.com> wrote:
> > > > > >
> > > > > > > I think that the proper way to implement reliable persistent file
> > > > > > > handles in fuse/virtiofs would be to add ENCODE/DECODE to
> > > > > > > the FUSE protocol and allow the server to handle this.
> > > > > >
> > > > > > Max Reitz (Cc-d) is currently looking into this.
> > > > > >
> > > > > > One proposal was to add  LOOKUP_HANDLE operation that is similar to
> > > > > > LOOKUP except it takes a {variable length handle, name} as input and
> > > > > > returns a variable length handle *and* a u64 node_id that can be used
> > > > > > normally for all other operations.
> > > > > >
> > >
> > > Miklos, Max,
> > >
> > > Any updates on LOOKUP_HANDLE work?
> > >
> > > > > > The advantage of such a scheme for virtio-fs (and possibly other fuse
> > > > > > based fs) would be that userspace need not keep a refcounted object
> > > > > > around until the kernel sends a FORGET, but can prune its node ID
> > > > > > based cache at any time.   If that happens and a request from the
> > > > > > client (kernel) comes in with a stale node ID, the server will return
> > > > > > -ESTALE and the client can ask for a new node ID with a special
> > > > > > lookup_handle(fh, NULL).
> > > > > >
> > > > > > Disadvantages being:
> > > > > >
> > > > > >  - cost of generating a file handle on all lookups
> > > > >
> > > > > I never ran into a local fs implementation where this was expensive.
> > > > >
> > > > > >  - cost of storing file handle in kernel icache
> > > > > >
> > > > > > I don't think either of those are problematic in the virtiofs case.
> > > > > > The cost of having to keep fds open while the client has them in its
> > > > > > cache is much higher.
> > > > > >
> > > > >
> > > > > Sounds good.
> > > > > I suppose flock() does need to keep the open fd on server.
> > > >
> > > > Open files are a separate issue and do need an active object in the server.
> > > >
> > > > The issue this solves  is synchronizing "released" and "evicted"
> > > > states of objects between  server and client.  I.e. when a file is
> > > > closed (and no more open files exist referencing the same object) the
> > > > dentry refcount goes to zero but it remains in the cache.   In this
> > > > state the server could really evict its own cached object, but can't
> > > > because the client can gain an active reference at any time  via
> > > > cached path lookup.
> > > >
> > > > One other solution would be for the server to send a notification
> > > > (NOTIFY_EVICT) that would try to clean out the object from the server
> > > > cache and respond with a FORGET if successful.   But I sort of like
> > > > the file handle one better, since it solves multiple problems.
> > > >
> > >
> > > Even with LOOKUP_HANDLE, I am struggling to understand how we
> > > intend to invalidate all fuse dentries referring to ino X in case the server
> > > replies with reused ino X with a different generation than the one stored
> > > in fuse inode cache.
> > >
> > > This is an issue that I encountered when running the passthrough_hp test,
> > > on my filesystem. In tst_readdir_big() for example, underlying files are being
> > > unlinked and new files created reusing the old inode numbers.
> > >
> > > This creates a situation where server gets a lookup request
> > > for file B that uses the reused inode number X, while old file A is
> > > still in fuse dentry cache using the older generation of real inode
> > > number X which is still in fuse inode cache.
> > >
> > > Now the server knows that the real inode has been reused, because
> > > the server caches the old generation value, but it cannot reply to
> > > the lookup request before the old fuse inode has been invalidated.
> > > IIUC, fuse_lowlevel_notify_inval_inode() is not enough(?).
> > > We would also need to change fuse_dentry_revalidate() to
> > > detect the case of reused/invalidated inode.
> > >
> > > The straightforward way I can think of is to store inode generation
> > > in fuse_dentry. It won't even grow the size of the struct.
> > >
> > > Am I over complicating this?
> >
> > In this scheme the generation number is already embedded in the file
> > handle.  If LOOKUP_HANDLE returns a nodeid that can be found in the
> > icache, but which doesn't match the new file handle, then the old
> > inode will be marked bad and a new one allocated.
> >
> > Does that answer your worries?  Or am I missing something?
> 
> It affirms my understanding of the future implementation, but
> does not help my implementation without protocol changes.
> I thought I could get away without LOOKUP_HANDLE for
> underlying fs that is able to resolve by ino, but seems that I still have an
> unhandled corner case, so will need to add some kernel patch.
> Unless there is already a way to signal from server to make the
> inode bad in a synchronous manner (I did not find any) before
> replying to LOOKUP with a new generation of the same ino.
> 
> Any idea about the timeline for LOOKUP_HANDLE?
> I may be able to pick this up myself if there is no one actively
> working on it or plans for anyone to make this happen.

AFAIK, right now Max is not actively looking into LOOKUP_HANDLE.

To solve the issue of the virtiofs server having too many fds open,
he is now planning to store the corresponding file handle in the
server and use it to open an fd later.

But this does not help with persistent file handle issue for fuse
client.

BTW, one concern with file handles coming from the guest kernel was
how to trust those handles. The guest can craft arbitrary handles and
use the file server to open files on the same filesystem (even files
not shared with the guest).

I am assuming the same concern applies to non-virtiofs use cases. A
regular fuse client may send a file handle while the file server is
running with CAP_DAC_READ_SEARCH. How will the server make sure the
client cannot access files that are present on the same filesystem
but not exported through the shared directory?

Thanks
Vivek


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: virtiofs uuid and file handles
  2021-06-01 14:49                           ` Vivek Goyal
@ 2021-06-01 15:42                             ` Amir Goldstein
  2021-06-01 16:08                               ` Max Reitz
  0 siblings, 1 reply; 22+ messages in thread
From: Amir Goldstein @ 2021-06-01 15:42 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: Miklos Szeredi, overlayfs, linux-fsdevel, Max Reitz

On Tue, Jun 1, 2021 at 5:49 PM Vivek Goyal <vgoyal@redhat.com> wrote:
>
> On Mon, May 31, 2021 at 09:12:59PM +0300, Amir Goldstein wrote:
> > On Mon, May 31, 2021 at 5:11 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
> > >
> > > On Sat, 29 May 2021 at 18:05, Amir Goldstein <amir73il@gmail.com> wrote:
> > > >
> > > > On Wed, Sep 23, 2020 at 2:12 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
> > > > >
> > > > > On Wed, Sep 23, 2020 at 11:57 AM Amir Goldstein <amir73il@gmail.com> wrote:
> > > > > >
> > > > > > On Wed, Sep 23, 2020 at 10:44 AM Miklos Szeredi <miklos@szeredi.hu> wrote:
> > > > > > >
> > > > > > > On Wed, Sep 23, 2020 at 4:49 AM Amir Goldstein <amir73il@gmail.com> wrote:
> > > > > > >
> > > > > > > > I think that the proper way to implement reliable persistent file
> > > > > > > > handles in fuse/virtiofs would be to add ENCODE/DECODE to
> > > > > > > > FUSE protocol and allow the server to handle this.
> > > > > > >
> > > > > > > Max Reitz (Cc-d) is currently looking into this.
> > > > > > >
> > > > > > > One proposal was to add  LOOKUP_HANDLE operation that is similar to
> > > > > > > LOOKUP except it takes a {variable length handle, name} as input and
> > > > > > > returns a variable length handle *and* a u64 node_id that can be used
> > > > > > > normally for all other operations.
> > > > > > >
> > > >
> > > > Miklos, Max,
> > > >
> > > > Any updates on LOOKUP_HANDLE work?
> > > >
> > > > > > > The advantage of such a scheme for virtio-fs (and possibly other fuse
> > > > > > > based fs) would be that userspace need not keep a refcounted object
> > > > > > > around until the kernel sends a FORGET, but can prune its node ID
> > > > > > > based cache at any time.   If that happens and a request from the
> > > > > > > client (kernel) comes in with a stale node ID, the server will return
> > > > > > > -ESTALE and the client can ask for a new node ID with a special
> > > > > > > lookup_handle(fh, NULL).
> > > > > > >
> > > > > > > Disadvantages being:
> > > > > > >
> > > > > > >  - cost of generating a file handle on all lookups
> > > > > >
> > > > > > I never ran into a local fs implementation where this was expensive.
> > > > > >
> > > > > > >  - cost of storing file handle in kernel icache
> > > > > > >
> > > > > > > I don't think either of those are problematic in the virtiofs case.
> > > > > > > The cost of having to keep fds open while the client has them in its
> > > > > > > cache is much higher.
> > > > > > >
> > > > > >
> > > > > > Sounds good.
> > > > > > I suppose flock() does need to keep the open fd on server.
> > > > >
> > > > > Open files are a separate issue and do need an active object in the server.
> > > > >
> > > > > The issue this solves  is synchronizing "released" and "evicted"
> > > > > states of objects between  server and client.  I.e. when a file is
> > > > > closed (and no more open files exist referencing the same object) the
> > > > > dentry refcount goes to zero but it remains in the cache.   In this
> > > > > state the server could really evict its own cached object, but can't
> > > > > because the client can gain an active reference at any time  via
> > > > > cached path lookup.
> > > > >
> > > > > One other solution would be for the server to send a notification
> > > > > (NOTIFY_EVICT) that would try to clean out the object from the server
> > > > > cache and respond with a FORGET if successful.   But I sort of like
> > > > > the file handle one better, since it solves multiple problems.
> > > > >
> > > >
> > > > Even with LOOKUP_HANDLE, I am struggling to understand how we
> > > > intend to invalidate all fuse dentries referring to ino X in case the server
> > > > replies with reused ino X with a different generation than the one stored
> > > > in fuse inode cache.
> > > >
> > > > This is an issue that I encountered when running the passthrough_hp test,
> > > > on my filesystem. In tst_readdir_big() for example, underlying files are being
> > > > unlinked and new files created reusing the old inode numbers.
> > > >
> > > > This creates a situation where server gets a lookup request
> > > > for file B that uses the reused inode number X, while old file A is
> > > > still in fuse dentry cache using the older generation of real inode
> > > > number X which is still in fuse inode cache.
> > > >
> > > > Now the server knows that the real inode has been reused, because
> > > > the server caches the old generation value, but it cannot reply to
> > > > the lookup request before the old fuse inode has been invalidated.
> > > > IIUC, fuse_lowlevel_notify_inval_inode() is not enough(?).
> > > > We would also need to change fuse_dentry_revalidate() to
> > > > detect the case of reused/invalidated inode.
> > > >
> > > > The straightforward way I can think of is to store inode generation
> > > > in fuse_dentry. It won't even grow the size of the struct.
> > > >
> > > > Am I over complicating this?
> > >
> > > In this scheme the generation number is already embedded in the file
> > > handle.  If LOOKUP_HANDLE returns a nodeid that can be found in the
> > > icache, but which doesn't match the new file handle, then the old
> > > inode will be marked bad and a new one allocated.
> > >
> > > Does that answer your worries?  Or am I missing something?
> >
> > It affirms my understanding of the future implementation, but
> > does not help my implementation without protocol changes.
> > I thought I could get away without LOOKUP_HANDLE for
> > underlying fs that is able to resolve by ino, but seems that I still have an
> > unhandled corner case, so will need to add some kernel patch.
> > Unless there is already a way to signal from server to make the
> > inode bad in a synchronous manner (I did not find any) before
> > replying to LOOKUP with a new generation of the same ino.
> >
> > Any idea about the timeline for LOOKUP_HANDLE?
> > I may be able to pick this up myself if there is no one actively
> > working on it or plans for anyone to make this happen.
>
> AFAIK, right now Max is not actively looking into LOOKUP_HANDLE.
>
> To solve the issue of the virtiofs server having too many fds open,
> he is now planning to store the corresponding file handle in the
> server and use it to open an fd later.
>
> But this does not help with persistent file handle issue for fuse
> client.
>

I see. Yes that should work, but he'd still need to cope with reused
inode numbers in case you allow unlinks from the host (do you?),
because LOOKUP can find a host fs inode that does not match
the file handle of a previously found inode of the same ino.

Quoting Miklos' response above:
> > > If LOOKUP_HANDLE returns a nodeid that can be found in the
> > > icache, but which doesn't match the new file handle, then the old
> > > inode will be marked bad and a new one allocated.

This statement, with minor adjustments, is also true for LOOKUP:

"If LOOKUP returns a nodeid that can be found in the icache, but
 whose i_generation doesn't match the generation returned in outarg,
 then the old inode should be marked bad and a new one allocated."
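
That adjusted rule can be sketched as follows, with hypothetical stand-ins
for the kernel's fuse_inode bookkeeping (these are not the real kernel
structures or function names):

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the icache entry of a FUSE inode. */
struct cached_inode {
    uint64_t nodeid;
    uint64_t generation;
    bool     bad;        /* set when the inode is marked bad */
};

/* On a LOOKUP reply carrying (nodeid, generation): if the cached inode
 * with that nodeid has a different generation, the inode number was
 * reused -- mark the old inode bad (so dentry revalidation discards it)
 * and allocate a new one. */
struct cached_inode *handle_lookup_reply(struct cached_inode *cached,
                                         uint64_t nodeid,
                                         uint64_t generation)
{
    struct cached_inode *fresh;

    if (cached && cached->nodeid == nodeid) {
        if (cached->generation == generation)
            return cached;      /* same object: reuse the cached inode */
        cached->bad = true;     /* reused ino with a stale generation */
    }
    fresh = calloc(1, sizeof(*fresh));
    fresh->nodeid = nodeid;
    fresh->generation = generation;
    return fresh;
}
```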

> BTW, one concern with file handles coming from the guest kernel was
> how to trust those handles. The guest can craft arbitrary handles and
> use the file server to open files on the same filesystem (even files
> not shared with the guest).
>
> I am assuming the same concern applies to non-virtiofs use cases. A
> regular fuse client may send a file handle while the file server is
> running with CAP_DAC_READ_SEARCH. How will the server make sure the
> client cannot access files that are present on the same filesystem
> but not exported through the shared directory?
>

That is a concern.
It's the same concern for NFS clients that can guess file handles.

The way to address this concern with NFS is the export option
subtree_check, but that uses non-unique file handles for an inode,
which include a parent handle, so it's probably not a good fit for
LOOKUP_HANDLE.

If there are no plans to go forward with LOOKUP_HANDLE, I may
respin my proposal to add ENCODE/DECODE to the FUSE protocol
in place of the lookup(".") command.

Thanks,
Amir.


* Re: virtiofs uuid and file handles
  2021-06-01 15:42                             ` Amir Goldstein
@ 2021-06-01 16:08                               ` Max Reitz
  2021-06-01 18:23                                 ` Amir Goldstein
  0 siblings, 1 reply; 22+ messages in thread
From: Max Reitz @ 2021-06-01 16:08 UTC (permalink / raw)
  To: Amir Goldstein, Vivek Goyal; +Cc: Miklos Szeredi, overlayfs, linux-fsdevel

On 01.06.21 17:42, Amir Goldstein wrote:
> On Tue, Jun 1, 2021 at 5:49 PM Vivek Goyal <vgoyal@redhat.com> wrote:
>> On Mon, May 31, 2021 at 09:12:59PM +0300, Amir Goldstein wrote:
>>> On Mon, May 31, 2021 at 5:11 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
>>>> On Sat, 29 May 2021 at 18:05, Amir Goldstein <amir73il@gmail.com> wrote:
>>>>> On Wed, Sep 23, 2020 at 2:12 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
>>>>>> On Wed, Sep 23, 2020 at 11:57 AM Amir Goldstein <amir73il@gmail.com> wrote:
>>>>>>> On Wed, Sep 23, 2020 at 10:44 AM Miklos Szeredi <miklos@szeredi.hu> wrote:
>>>>>>>> On Wed, Sep 23, 2020 at 4:49 AM Amir Goldstein <amir73il@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> I think that the proper way to implement reliable persistent file
>>>>>>>>> handles in fuse/virtiofs would be to add ENCODE/DECODE to
>>>>>>>>> FUSE protocol and allow the server to handle this.
>>>>>>>> Max Reitz (Cc-d) is currently looking into this.
>>>>>>>>
>>>>>>>> One proposal was to add  LOOKUP_HANDLE operation that is similar to
>>>>>>>> LOOKUP except it takes a {variable length handle, name} as input and
>>>>>>>> returns a variable length handle *and* a u64 node_id that can be used
>>>>>>>> normally for all other operations.
>>>>>>>>
>>>>> Miklos, Max,
>>>>>
>>>>> Any updates on LOOKUP_HANDLE work?

Unfortunately not :(

>>>>>>>> The advantage of such a scheme for virtio-fs (and possibly other fuse
>>>>>>>> based fs) would be that userspace need not keep a refcounted object
>>>>>>>> around until the kernel sends a FORGET, but can prune its node ID
>>>>>>>> based cache at any time.   If that happens and a request from the
>>>>>>>> client (kernel) comes in with a stale node ID, the server will return
>>>>>>>> -ESTALE and the client can ask for a new node ID with a special
>>>>>>>> lookup_handle(fh, NULL).
>>>>>>>>
>>>>>>>> Disadvantages being:
>>>>>>>>
>>>>>>>>   - cost of generating a file handle on all lookups
>>>>>>> I never ran into a local fs implementation where this was expensive.
>>>>>>>
>>>>>>>>   - cost of storing file handle in kernel icache
>>>>>>>>
>>>>>>>> I don't think either of those are problematic in the virtiofs case.
>>>>>>>> The cost of having to keep fds open while the client has them in its
>>>>>>>> cache is much higher.
>>>>>>>>
>>>>>>> Sounds good.
>>>>>>> I suppose flock() does need to keep the open fd on server.
>>>>>> Open files are a separate issue and do need an active object in the server.
>>>>>>
>>>>>> The issue this solves  is synchronizing "released" and "evicted"
>>>>>> states of objects between  server and client.  I.e. when a file is
>>>>>> closed (and no more open files exist referencing the same object) the
>>>>>> dentry refcount goes to zero but it remains in the cache.   In this
>>>>>> state the server could really evict its own cached object, but can't
>>>>>> because the client can gain an active reference at any time  via
>>>>>> cached path lookup.
>>>>>>
>>>>>> One other solution would be for the server to send a notification
>>>>>> (NOTIFY_EVICT) that would try to clean out the object from the server
>>>>>> cache and respond with a FORGET if successful.   But I sort of like
>>>>>> the file handle one better, since it solves multiple problems.
>>>>>>
>>>>> Even with LOOKUP_HANDLE, I am struggling to understand how we
>>>>> intend to invalidate all fuse dentries referring to ino X in case the server
>>>>> replies with reused ino X with a different generation than the one stored
>>>>> in fuse inode cache.
>>>>>
>>>>> This is an issue that I encountered when running the passthrough_hp test,
>>>>> on my filesystem. In tst_readdir_big() for example, underlying files are being
>>>>> unlinked and new files created reusing the old inode numbers.
>>>>>
>>>>> This creates a situation where server gets a lookup request
>>>>> for file B that uses the reused inode number X, while old file A is
>>>>> still in fuse dentry cache using the older generation of real inode
>>>>> number X which is still in fuse inode cache.
>>>>>
>>>>> Now the server knows that the real inode has been reused, because
>>>>> the server caches the old generation value, but it cannot reply to
>>>>> the lookup request before the old fuse inode has been invalidated.
>>>>> IIUC, fuse_lowlevel_notify_inval_inode() is not enough(?).
>>>>> We would also need to change fuse_dentry_revalidate() to
>>>>> detect the case of reused/invalidated inode.
>>>>>
>>>>> The straightforward way I can think of is to store inode generation
>>>>> in fuse_dentry. It won't even grow the size of the struct.
>>>>>
>>>>> Am I over complicating this?
>>>> In this scheme the generation number is already embedded in the file
>>>> handle.  If LOOKUP_HANDLE returns a nodeid that can be found in the
>>>> icache, but which doesn't match the new file handle, then the old
>>>> inode will be marked bad and a new one allocated.
>>>>
>>>> Does that answer your worries?  Or am I missing something?
>>> It affirms my understanding of the future implementation, but
>>> does not help my implementation without protocol changes.
>>> I thought I could get away without LOOKUP_HANDLE for
>>> underlying fs that is able to resolve by ino, but seems that I still have an
>>> unhandled corner case, so will need to add some kernel patch.
>>> Unless there is already a way to signal from server to make the
>>> inode bad in a synchronous manner (I did not find any) before
>>> replying to LOOKUP with a new generation of the same ino.
>>>
>>> Any idea about the timeline for LOOKUP_HANDLE?
>>> I may be able to pick this up myself if there is no one actively
>>> working on it or plans for anyone to make this happen.
>> AFAIK, right now Max is not actively looking into LOOKUP_HANDLE.
>>
>> To solve the issue of the virtiofs server having too many fds open,
>> he is now planning to store the corresponding file handle in the
>> server and use it to open an fd later.

Yes, that’s right. Initially, I had hoped these things could tie into
each other, but it turns out they’re largely separate issues, so for now
I’m only working on replacing our O_PATH fds by file handles.

>> But this does not help with persistent file handle issue for fuse
>> client.
>>
> I see. Yes that should work, but he'd still need to cope with reused
> inode numbers in case you allow unlinks from the host (do you?),
> because LOOKUP can find a host fs inode that does not match
> the file handle of a previously found inode of the same ino.

That’s indeed an issue.  My current approach is to use the file handle 
(if available) as the key for lookups, so that the generation ID is 
included.

Right now, we use st_ino+st_dev+mnt_id as the key.  st_dev is just a 
fallback for the mount ID, basically, so what we’d really need is inode 
ID + generation ID + mount ID, and that’s basically the file handle + 
mount ID.  So different generation IDs will lead to lookup 
finding/creating a different inode object (lo_inode in C virtiofsd, 
InodeData in virtiofsd-rs), and thus returning different fuse_ino IDs to 
the guest.

(See also: 
https://gitlab.com/mreitz/virtiofsd-rs/-/blob/handles-for-inodes-v4/src/passthrough/mod.rs#L594)
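
The shape of such a lookup key can be sketched like this (field and
function names are illustrative, not virtiofsd's actual ones):

```c
#include <string.h>

/* Sketch of the key described above: file handle + mount ID, i.e.
 * inode number + generation + mount, instead of st_ino+st_dev+mnt_id.
 * The handle bytes would come from name_to_handle_at(). */
struct inode_key {
    int           mount_id;
    unsigned int  handle_bytes;
    unsigned char handle[128];   /* <= MAX_HANDLE_SZ */
};

/* Two keys match only if the mount, the handle length, and every
 * handle byte (which embeds both ino and generation) are identical,
 * so a reused inode number with a new generation maps to a distinct
 * inode object and thus a different fuse_ino ID. */
int inode_key_equal(const struct inode_key *a, const struct inode_key *b)
{
    return a->mount_id == b->mount_id &&
           a->handle_bytes == b->handle_bytes &&
           memcmp(a->handle, b->handle, a->handle_bytes) == 0;
}
```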

> Quoting Miklos' response above:
>>>> If LOOKUP_HANDLE returns a nodeid that can be found in the
>>>> icache, but which doesn't match the new file handle, then the old
>>>> inode will be marked bad and a new one allocated.
> This statement, with minor adjustments is also true for LOOKUP:
>
> "If LOOKUP returns a nodeid that can be found in the icache, but
>   whose i_generation doesn't match the generation returned in outarg,
>   then the old inode should be marked bad and a new one allocated."
>
>> BTW, one concern with file handles coming from the guest kernel was
>> how to trust those handles. The guest can craft arbitrary handles and
>> use the file server to open files on the same filesystem (even files
>> not shared with the guest).
>>
>> I am assuming the same concern applies to non-virtiofs use cases. A
>> regular fuse client may send a file handle while the file server is
>> running with CAP_DAC_READ_SEARCH. How will the server make sure the
>> client cannot access files that are present on the same filesystem
>> but not exported through the shared directory?
>>
> That is a concern.
> It's the same concern for NFS clients that can guess file handles.
>
> The ways to address this concern with NFS is the export option
> subtree_check, but that uses non unique file handles to an inode
> which include a parent handle, so that's probably not a good fit for
> LOOKUP_HANDLE.

There was a mail thread on the topic of securing file handles in March:

https://listman.redhat.com/archives/virtio-fs/2021-March/msg00022.html

The problem I see with the subtree_check option is that file handles are 
invalidated when files are moved to a different parent.

So far the consensus was to append a MAC to file handles, generated with 
some secret key, so that only file handles that have been generated 
before can later be opened. Ideally, that MAC would be managed by the 
kernel, so that we could allow virtiofsd to use such MAC-ed file handles 
even when it doesn’t have CAP_DAC_READ_SEARCH.
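
That scheme might look roughly like the following. The mixing function
below is a toy stand-in for a real keyed MAC such as HMAC-SHA256 and is
NOT cryptographically secure; it only illustrates the seal/verify flow:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy keyed hash standing in for a real HMAC (illustration only). */
static uint64_t toy_mac(const uint8_t *key, size_t klen,
                        const uint8_t *msg, size_t mlen)
{
    uint64_t h = 0x9e3779b97f4a7c15ull;
    for (size_t i = 0; i < klen; i++) h = (h ^ key[i]) * 0x100000001b3ull;
    for (size_t i = 0; i < mlen; i++) h = (h ^ msg[i]) * 0x100000001b3ull;
    return h;
}

/* The server appends a MAC when it hands a handle out...
 * Returns the sealed length; out must hold fhlen + 8 bytes. */
size_t seal_handle(const uint8_t *key, size_t klen,
                   const uint8_t *fh, size_t fhlen, uint8_t *out)
{
    uint64_t m = toy_mac(key, klen, fh, fhlen);
    memcpy(out, fh, fhlen);
    memcpy(out + fhlen, &m, sizeof(m));
    return fhlen + sizeof(m);
}

/* ...and refuses to decode any handle it did not generate itself. */
int verify_handle(const uint8_t *key, size_t klen,
                  const uint8_t *sealed, size_t len)
{
    if (len < sizeof(uint64_t))
        return 0;
    size_t fhlen = len - sizeof(uint64_t);
    uint64_t m = toy_mac(key, klen, sealed, fhlen);
    return memcmp(sealed + fhlen, &m, sizeof(m)) == 0;
}
```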

Max



* Re: virtiofs uuid and file handles
  2021-06-01 16:08                               ` Max Reitz
@ 2021-06-01 18:23                                 ` Amir Goldstein
  0 siblings, 0 replies; 22+ messages in thread
From: Amir Goldstein @ 2021-06-01 18:23 UTC (permalink / raw)
  To: Max Reitz; +Cc: Vivek Goyal, Miklos Szeredi, overlayfs, linux-fsdevel

> >> But this does not help with persistent file handle issue for fuse
> >> client.
> >>
> > I see. Yes that should work, but he'd still need to cope with reused
> > inode numbers in case you allow unlinks from the host (do you?),
> > because LOOKUP can find a host fs inode that does not match
> > the file handle of a previously found inode of the same ino.
>
> That’s indeed an issue.  My current approach is to use the file handle
> (if available) as the key for lookups, so that the generation ID is
> included.
>
> Right now, we use st_ino+st_dev+mnt_id as the key.  st_dev is just a
> fallback for the mount ID, basically, so what we’d really need is inode
> ID + generation ID + mount ID, and that’s basically the file handle +
> mount ID.  So different generation IDs will lead to lookup
> finding/creating a different inode object (lo_inode in C virtiofsd,
> InodeData in virtiofsd-rs), and thus returning different fuse_ino IDs to
> the guest.
>
> (See also:
> https://gitlab.com/mreitz/virtiofsd-rs/-/blob/handles-for-inodes-v4/src/passthrough/mod.rs#L594)
>

I see, because you do not require persistent inode numbers.
That makes sense if you do not need to export file handles to NFS
and if you are not evicting inode objects from the server inodes map.

Please keep me posted if there are any updates on LOOKUP_HANDLE.

Thanks,
Amir.


* Re: Persistent FUSE file handles (Was: virtiofs uuid and file handles)
  2020-09-23  7:44               ` Miklos Szeredi
  2020-09-23  9:56                 ` Amir Goldstein
@ 2022-09-11 10:14                 ` Amir Goldstein
  2022-09-11 15:16                   ` Bernd Schubert
  2022-09-12 13:16                   ` Vivek Goyal
  1 sibling, 2 replies; 22+ messages in thread
From: Amir Goldstein @ 2022-09-11 10:14 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Vivek Goyal, linux-fsdevel, Hanna Reitz

On Wed, Sep 23, 2020 at 10:44 AM Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> One proposal was to add  LOOKUP_HANDLE operation that is similar to
> LOOKUP except it takes a {variable length handle, name} as input and
> returns a variable length handle *and* a u64 node_id that can be used
> normally for all other operations.
>
> The advantage of such a scheme for virtio-fs (and possibly other fuse
> based fs) would be that userspace need not keep a refcounted object
> around until the kernel sends a FORGET, but can prune its node ID
> based cache at any time.   If that happens and a request from the
> client (kernel) comes in with a stale node ID, the server will return
> -ESTALE and the client can ask for a new node ID with a special
> lookup_handle(fh, NULL).
>
> Disadvantages being:
>
>  - cost of generating a file handle on all lookups
>  - cost of storing file handle in kernel icache
>
> I don't think either of those are problematic in the virtiofs case.
> The cost of having to keep fds open while the client has them in its
> cache is much higher.
>

I was thinking of taking a stab at LOOKUP_HANDLE for a generic
implementation of persistent file handles for FUSE.

The purpose is "proper" NFS export support for FUSE.
"proper" being survives server restart.

I realize there is an ongoing effort to use file handles in the virtiofsd
instead of open fds and that LOOKUP_HANDLE could assist in that
effort, but that is an added benefit.

I have a C++ implementation [1] which sort of supports persistent
file handles, but not in a generic manner.

If anyone knows of any WIP on LOOKUP_HANDLE please let me know.

Thanks,
Amir.

[1] https://github.com/amir73il/libfuse/commits/fuse_passthrough


* Re: Persistent FUSE file handles (Was: virtiofs uuid and file handles)
  2022-09-11 10:14                 ` Persistent FUSE file handles (Was: virtiofs uuid and file handles) Amir Goldstein
@ 2022-09-11 15:16                   ` Bernd Schubert
  2022-09-11 15:29                     ` Amir Goldstein
  2022-09-12 13:16                   ` Vivek Goyal
  1 sibling, 1 reply; 22+ messages in thread
From: Bernd Schubert @ 2022-09-11 15:16 UTC (permalink / raw)
  To: Amir Goldstein, Miklos Szeredi; +Cc: Vivek Goyal, linux-fsdevel, Hanna Reitz



On 9/11/22 12:14, Amir Goldstein wrote:
> On Wed, Sep 23, 2020 at 10:44 AM Miklos Szeredi <miklos@szeredi.hu> wrote:
>>
>> One proposal was to add  LOOKUP_HANDLE operation that is similar to
>> LOOKUP except it takes a {variable length handle, name} as input and
>> returns a variable length handle *and* a u64 node_id that can be used
>> normally for all other operations.
>>
>> The advantage of such a scheme for virtio-fs (and possibly other fuse
>> based fs) would be that userspace need not keep a refcounted object
>> around until the kernel sends a FORGET, but can prune its node ID
>> based cache at any time.   If that happens and a request from the
>> client (kernel) comes in with a stale node ID, the server will return
>> -ESTALE and the client can ask for a new node ID with a special
>> lookup_handle(fh, NULL).
>>
>> Disadvantages being:
>>
>>   - cost of generating a file handle on all lookups
>>   - cost of storing file handle in kernel icache
>>
>> I don't think either of those are problematic in the virtiofs case.
>> The cost of having to keep fds open while the client has them in its
>> cache is much higher.
>>
> 
> I was thinking of taking a stab at LOOKUP_HANDLE for a generic
> implementation of persistent file handles for FUSE.
> 
> The purpose is "proper" NFS export support for FUSE.
> "proper" being survives server restart.

Wouldn't fuse just need to get struct export_operations additions to 
encode and decode handles?

> 
> I realize there is an ongoing effort to use file handles in the virtiofsd
> instead of open fds and that LOOKUP_HANDLE could assist in that
> effort, but that is an added benefit.
> 
> I have a C++ implementation [1] which sort of supports persistent
> file handles, but not in a generic manner.

How does this interact with exportfs?

> 
> If anyone knows of any WIP on LOOKUP_HANDLE please let me know.
> 
> Thanks,
> Amir.
> 
> [1] https://github.com/amir73il/libfuse/commits/fuse_passthrough


Thanks,
Bernd


* Re: Persistent FUSE file handles (Was: virtiofs uuid and file handles)
  2022-09-11 15:16                   ` Bernd Schubert
@ 2022-09-11 15:29                     ` Amir Goldstein
  2022-09-11 15:55                       ` Bernd Schubert
  0 siblings, 1 reply; 22+ messages in thread
From: Amir Goldstein @ 2022-09-11 15:29 UTC (permalink / raw)
  To: Bernd Schubert; +Cc: Miklos Szeredi, Vivek Goyal, linux-fsdevel, Hanna Reitz

On Sun, Sep 11, 2022 at 6:16 PM Bernd Schubert
<bernd.schubert@fastmail.fm> wrote:
>
>
>
> On 9/11/22 12:14, Amir Goldstein wrote:
> > On Wed, Sep 23, 2020 at 10:44 AM Miklos Szeredi <miklos@szeredi.hu> wrote:
> >>
> >> One proposal was to add  LOOKUP_HANDLE operation that is similar to
> >> LOOKUP except it takes a {variable length handle, name} as input and
> >> returns a variable length handle *and* a u64 node_id that can be used
> >> normally for all other operations.
> >>
> >> The advantage of such a scheme for virtio-fs (and possibly other fuse
> >> based fs) would be that userspace need not keep a refcounted object
> >> around until the kernel sends a FORGET, but can prune its node ID
> >> based cache at any time.   If that happens and a request from the
> >> client (kernel) comes in with a stale node ID, the server will return
> >> -ESTALE and the client can ask for a new node ID with a special
> >> lookup_handle(fh, NULL).
> >>
> >> Disadvantages being:
> >>
> >>   - cost of generating a file handle on all lookups
> >>   - cost of storing file handle in kernel icache
> >>
> >> I don't think either of those are problematic in the virtiofs case.
> >> The cost of having to keep fds open while the client has them in its
> >> cache is much higher.
> >>
> >
> > I was thinking of taking a stab at LOOKUP_HANDLE for a generic
> > implementation of persistent file handles for FUSE.
> >
> > The purpose is "proper" NFS export support for FUSE.
> > "proper" being survives server restart.
>
> Wouldn't fuse just need to get struct export_operations additions to
> encode and decode handles?
>

FUSE already implements those, but not in a "proper" way, because
there is nothing guaranteeing the persistence of the FUSE file handles.

As a result, when exporting some FUSE fs to NFS and the server is
restarted, NFS client may read a file A and get the content of file B,
because after server restart, FUSE file B got the node id that file A
had before restart.

This is not a hypothetical use case, I have seen this happen.

> >
> > I realize there is an ongoing effort to use file handles in the virtiofsd
> > instead of open fds and that LOOKUP_HANDLE could assist in that
> > effort, but that is an added benefit.
> >
> > I have a C++ implementation [1] which sort of supports persistent
> > file handles, but not in a generic manner.
>
> How does this interact with exportfs?
>

It makes use of internal fs knowledge to encode/decode ext4/xfs
file handles into the 64bit FUSE node id.

This sort of works, but it is not generic.
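
For ext4/xfs, whose common file handles are a 32-bit inode number plus a
32-bit generation, that encoding can be sketched as follows (an
illustrative sketch, not the actual code in the C++ implementation):

```c
#include <stdint.h>

/* Pack a 32-bit ino and a 32-bit generation into the u64 FUSE node id.
 * This only works because these filesystems' handles fit in 64 bits,
 * which is exactly why the trick is not generic. */
static inline uint64_t nodeid_encode(uint32_t ino, uint32_t gen)
{
    return ((uint64_t)gen << 32) | ino;
}

static inline uint32_t nodeid_ino(uint64_t nodeid)
{
    return (uint32_t)nodeid;
}

static inline uint32_t nodeid_gen(uint64_t nodeid)
{
    return (uint32_t)(nodeid >> 32);
}
```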

Thanks,
Amir.


* Re: Persistent FUSE file handles (Was: virtiofs uuid and file handles)
  2022-09-11 15:29                     ` Amir Goldstein
@ 2022-09-11 15:55                       ` Bernd Schubert
  0 siblings, 0 replies; 22+ messages in thread
From: Bernd Schubert @ 2022-09-11 15:55 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Miklos Szeredi, Vivek Goyal, linux-fsdevel, Hanna Reitz



On 9/11/22 17:29, Amir Goldstein wrote:
> On Sun, Sep 11, 2022 at 6:16 PM Bernd Schubert
> <bernd.schubert@fastmail.fm> wrote:
>>
>>
>>
>> On 9/11/22 12:14, Amir Goldstein wrote:
>>> On Wed, Sep 23, 2020 at 10:44 AM Miklos Szeredi <miklos@szeredi.hu> wrote:
>>>>
>>>> One proposal was to add  LOOKUP_HANDLE operation that is similar to
>>>> LOOKUP except it takes a {variable length handle, name} as input and
>>>> returns a variable length handle *and* a u64 node_id that can be used
>>>> normally for all other operations.
>>>>
>>>> The advantage of such a scheme for virtio-fs (and possibly other fuse
>>>> based fs) would be that userspace need not keep a refcounted object
>>>> around until the kernel sends a FORGET, but can prune its node ID
>>>> based cache at any time.   If that happens and a request from the
>>>> client (kernel) comes in with a stale node ID, the server will return
>>>> -ESTALE and the client can ask for a new node ID with a special
>>>> lookup_handle(fh, NULL).
>>>>
>>>> Disadvantages being:
>>>>
>>>>    - cost of generating a file handle on all lookups
>>>>    - cost of storing file handle in kernel icache
>>>>
>>>> I don't think either of those are problematic in the virtiofs case.
>>>> The cost of having to keep fds open while the client has them in its
>>>> cache is much higher.
>>>>
>>>
>>> I was thinking of taking a stab at LOOKUP_HANDLE for a generic
>>> implementation of persistent file handles for FUSE.
>>>
>>> The purpose is "proper" NFS export support for FUSE.
>>> "proper" being survives server restart.
>>
>> Wouldn't fuse just need to get struct export_operations additions to
>> encode and decode handles?
>>
> 
> FUSE already implements those, but not in a "proper" way, because
> there is nothing guaranteeing the persistence of the FUSE file handles.
> 
> As a result, when exporting some FUSE fs to NFS and the server is
> restarted, NFS client may read a file A and get the content of file B,
> because after server restart, FUSE file B got the node id that file A
> had before restart.

Sorry, right, it is just not passed through to user space. For the file 
systems I'm working on, that doesn't work at all.

> 
> This is not a hypothetical use case, I have seen this happen.
> 
>>>
>>> I realize there is an ongoing effort to use file handles in the virtiofsd
>>> instead of open fds and that LOOKUP_HANDLE could assist in that
>>> effort, but that is an added benefit.
>>>
>>> I have a C++ implementation [1] which sort of supports persistent
>>> file handles, but not in a generic manner.
>>
>> How does this interact with exportfs?
>>
> 
> It makes use of internal fs knowledge to encode/decode ext4/xfs
> file handles into the 64bit FUSE node id.
> 
> This sort of works, but it is not generic.

Hmm, I guess I need to look at it.


Thanks,
Bernd

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Persistent FUSE file handles (Was: virtiofs uuid and file handles)
  2022-09-11 10:14                 ` Persistent FUSE file handles (Was: virtiofs uuid and file handles) Amir Goldstein
  2022-09-11 15:16                   ` Bernd Schubert
@ 2022-09-12 13:16                   ` Vivek Goyal
  2022-09-12 13:38                     ` Amir Goldstein
  1 sibling, 1 reply; 22+ messages in thread
From: Vivek Goyal @ 2022-09-12 13:16 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Miklos Szeredi, linux-fsdevel, Hanna Reitz

On Sun, Sep 11, 2022 at 01:14:49PM +0300, Amir Goldstein wrote:
> On Wed, Sep 23, 2020 at 10:44 AM Miklos Szeredi <miklos@szeredi.hu> wrote:
> >
> > One proposal was to add  LOOKUP_HANDLE operation that is similar to
> > LOOKUP except it takes a {variable length handle, name} as input and
> > returns a variable length handle *and* a u64 node_id that can be used
> > normally for all other operations.
> >
> > The advantage of such a scheme for virtio-fs (and possibly other fuse
> > based fs) would be that userspace need not keep a refcounted object
> > around until the kernel sends a FORGET, but can prune its node ID
> > based cache at any time.   If that happens and a request from the
> > client (kernel) comes in with a stale node ID, the server will return
> > -ESTALE and the client can ask for a new node ID with a special
> > lookup_handle(fh, NULL).
> >
> > Disadvantages being:
> >
> >  - cost of generating a file handle on all lookups
> >  - cost of storing file handle in kernel icache
> >
> > I don't think either of those are problematic in the virtiofs case.
> > The cost of having to keep fds open while the client has them in its
> > cache is much higher.
> >
> 
> I was thinking of taking a stab at LOOKUP_HANDLE for a generic
> implementation of persistent file handles for FUSE.

Hi Amir,

I was going through the proposal above for LOOKUP_HANDLE and I was
wondering how nodeid reuse is handled. IOW, if the server decides to drop
a nodeid from its cache and reuse it for some other file, how will we
differentiate between the two? Some sort of generation id encoded in the
nodeid?

> 
> The purpose is "proper" NFS export support for FUSE.
> "proper" being survives server restart.

Sounds like a good enhancement. 

> 
> I realize there is an ongoing effort to use file handles in the virtiofsd
> instead of open fds and that LOOKUP_HANDLE could assist in that
> effort, but that is an added benefit.

Using file handles (instead of keeping O_PATH fds open) is now available
in virtiofsd (option --inode-file-handles). Thanks to Hanna for this
work.

https://gitlab.com/virtio-fs/virtiofsd/-/blob/main/README.md

With LOOKUP_HANDLE implemented, we can afford to implement a mixed
approach where we keep O_PATH fds open, but if there are too many
fds open, close some of them. So this is sort of a hybrid
approach between fds and file handles.

> 
> I have a C++ implementation [1] which sort of supports persistent
> file handles, but not in a generic manner.
> 
> If anyone knows of any WIP on LOOKUP_HANDLE please let me know.

I am not aware of any. Will be good if you make it happen upstream :-)

Thanks
Vivek

> 
> Thanks,
> Amir.
> 
> [1] https://github.com/amir73il/libfuse/commits/fuse_passthrough
> 



* Re: Persistent FUSE file handles (Was: virtiofs uuid and file handles)
  2022-09-12 13:16                   ` Vivek Goyal
@ 2022-09-12 13:38                     ` Amir Goldstein
  2022-09-12 14:35                       ` Vivek Goyal
  0 siblings, 1 reply; 22+ messages in thread
From: Amir Goldstein @ 2022-09-12 13:38 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: Miklos Szeredi, linux-fsdevel, Hanna Reitz

On Mon, Sep 12, 2022 at 4:16 PM Vivek Goyal <vgoyal@redhat.com> wrote:
>
> On Sun, Sep 11, 2022 at 01:14:49PM +0300, Amir Goldstein wrote:
> > On Wed, Sep 23, 2020 at 10:44 AM Miklos Szeredi <miklos@szeredi.hu> wrote:
> > >
> > > One proposal was to add  LOOKUP_HANDLE operation that is similar to
> > > LOOKUP except it takes a {variable length handle, name} as input and
> > > returns a variable length handle *and* a u64 node_id that can be used
> > > normally for all other operations.
> > >
> > > The advantage of such a scheme for virtio-fs (and possibly other fuse
> > > based fs) would be that userspace need not keep a refcounted object
> > > around until the kernel sends a FORGET, but can prune its node ID
> > > based cache at any time.   If that happens and a request from the
> > > client (kernel) comes in with a stale node ID, the server will return
> > > -ESTALE and the client can ask for a new node ID with a special
> > > lookup_handle(fh, NULL).
> > >
> > > Disadvantages being:
> > >
> > >  - cost of generating a file handle on all lookups
> > >  - cost of storing file handle in kernel icache
> > >
> > > I don't think either of those are problematic in the virtiofs case.
> > > The cost of having to keep fds open while the client has them in its
> > > cache is much higher.
> > >
> >
> > I was thinking of taking a stab at LOOKUP_HANDLE for a generic
> > implementation of persistent file handles for FUSE.
>
> Hi Amir,
>
> I was going through the proposal above for LOOKUP_HANDLE and I was
> wondering how nodeid reuse is handled.

LOOKUP_HANDLE extends the 64bit node id to a variable size id.
A server that declares support for LOOKUP_HANDLE must never
reuse a handle.

That's the basic idea. Just as a filesystem that declares to support
exportfs must never reuse a file handle.

> IOW, if server decides to drop
> nodeid from its cache and reuse it for some other file, how will we
> differentiate between two. Some sort of generation id encoded in
> nodeid?
>

That's usually the way that file handles are implemented in
local fs. The inode number is the internal lookup index and the
generation part is advanced on reuse.

But for passthrough fs like virtiofsd, the LOOKUP_HANDLE will
just use the native fs file handles, so virtiofsd can evict the inodes
entry from its cache completely, not only close the open fds.

That is what my libfuse_passthrough POC does.

Thanks,
Amir.


* Re: Persistent FUSE file handles (Was: virtiofs uuid and file handles)
  2022-09-12 13:38                     ` Amir Goldstein
@ 2022-09-12 14:35                       ` Vivek Goyal
  2022-09-12 15:07                         ` Amir Goldstein
  0 siblings, 1 reply; 22+ messages in thread
From: Vivek Goyal @ 2022-09-12 14:35 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Miklos Szeredi, linux-fsdevel, Hanna Reitz

On Mon, Sep 12, 2022 at 04:38:48PM +0300, Amir Goldstein wrote:
> On Mon, Sep 12, 2022 at 4:16 PM Vivek Goyal <vgoyal@redhat.com> wrote:
> >
> > On Sun, Sep 11, 2022 at 01:14:49PM +0300, Amir Goldstein wrote:
> > > On Wed, Sep 23, 2020 at 10:44 AM Miklos Szeredi <miklos@szeredi.hu> wrote:
> > > >
> > > > One proposal was to add  LOOKUP_HANDLE operation that is similar to
> > > > LOOKUP except it takes a {variable length handle, name} as input and
> > > > returns a variable length handle *and* a u64 node_id that can be used
> > > > normally for all other operations.
> > > >
> > > > The advantage of such a scheme for virtio-fs (and possibly other fuse
> > > > based fs) would be that userspace need not keep a refcounted object
> > > > around until the kernel sends a FORGET, but can prune its node ID
> > > > based cache at any time.   If that happens and a request from the
> > > > client (kernel) comes in with a stale node ID, the server will return
> > > > -ESTALE and the client can ask for a new node ID with a special
> > > > lookup_handle(fh, NULL).
> > > >
> > > > Disadvantages being:
> > > >
> > > >  - cost of generating a file handle on all lookups
> > > >  - cost of storing file handle in kernel icache
> > > >
> > > > I don't think either of those are problematic in the virtiofs case.
> > > > The cost of having to keep fds open while the client has them in its
> > > > cache is much higher.
> > > >
> > >
> > > I was thinking of taking a stab at LOOKUP_HANDLE for a generic
> > > implementation of persistent file handles for FUSE.
> >
> > Hi Amir,
> >
> > I was going through the proposal above for LOOKUP_HANDLE and I was
> > wondering how nodeid reuse is handled.
> 
> LOOKUP_HANDLE extends the 64bit node id to be variable size id.

Ok. So this variable size id is basically the file handle returned by the
host?

So this looks a little different from what Miklos had suggested. IIUC,
he wanted LOOKUP_HANDLE to return both a file handle as well as a *node id*.

*********************************
One proposal was to add  LOOKUP_HANDLE operation that is similar to
LOOKUP except it takes a {variable length handle, name} as input and
returns a variable length handle *and* a u64 node_id that can be used
normally for all other operations.
***************************************

> A server that declares support for LOOKUP_HANDLE must never
> reuse a handle.
> 
> That's the basic idea. Just as a filesystem that declares to support
> exportfs must never reuse a file handle.

> 
> > IOW, if server decides to drop
> > nodeid from its cache and reuse it for some other file, how will we
> > differentiate between two. Some sort of generation id encoded in
> > nodeid?
> >
> 
> That's usually the way that file handles are implemented in
> local fs. The inode number is the internal lookup index and the
> generation part is advanced on reuse.
> 
> But for passthrough fs like virtiofsd, the LOOKUP_HANDLE will
> just use the native fs file handles, so virtiofsd can evict the inodes
> entry from its cache completely, not only close the open fds.

Ok, got it. Will be interesting to see how the kernel fuse changes look
to accommodate this variable sized nodeid.

> 
> That is what my libfuse_passthrough POC does.

Where have you hosted corresponding kernel changes?

Thanks
Vivek



* Re: Persistent FUSE file handles (Was: virtiofs uuid and file handles)
  2022-09-12 14:35                       ` Vivek Goyal
@ 2022-09-12 15:07                         ` Amir Goldstein
  2022-09-12 19:56                           ` Vivek Goyal
  0 siblings, 1 reply; 22+ messages in thread
From: Amir Goldstein @ 2022-09-12 15:07 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: Miklos Szeredi, linux-fsdevel, Hanna Reitz

On Mon, Sep 12, 2022 at 5:35 PM Vivek Goyal <vgoyal@redhat.com> wrote:
>
> On Mon, Sep 12, 2022 at 04:38:48PM +0300, Amir Goldstein wrote:
> > On Mon, Sep 12, 2022 at 4:16 PM Vivek Goyal <vgoyal@redhat.com> wrote:
> > >
> > > On Sun, Sep 11, 2022 at 01:14:49PM +0300, Amir Goldstein wrote:
> > > > On Wed, Sep 23, 2020 at 10:44 AM Miklos Szeredi <miklos@szeredi.hu> wrote:
> > > > >
> > > > > One proposal was to add  LOOKUP_HANDLE operation that is similar to
> > > > > LOOKUP except it takes a {variable length handle, name} as input and
> > > > > returns a variable length handle *and* a u64 node_id that can be used
> > > > > normally for all other operations.
> > > > >
> > > > > The advantage of such a scheme for virtio-fs (and possibly other fuse
> > > > > based fs) would be that userspace need not keep a refcounted object
> > > > > around until the kernel sends a FORGET, but can prune its node ID
> > > > > based cache at any time.   If that happens and a request from the
> > > > > client (kernel) comes in with a stale node ID, the server will return
> > > > > -ESTALE and the client can ask for a new node ID with a special
> > > > > lookup_handle(fh, NULL).
> > > > >
> > > > > Disadvantages being:
> > > > >
> > > > >  - cost of generating a file handle on all lookups
> > > > >  - cost of storing file handle in kernel icache
> > > > >
> > > > > I don't think either of those are problematic in the virtiofs case.
> > > > > The cost of having to keep fds open while the client has them in its
> > > > > cache is much higher.
> > > > >
> > > >
> > > > I was thinking of taking a stab at LOOKUP_HANDLE for a generic
> > > > implementation of persistent file handles for FUSE.
> > >
> > > Hi Amir,
> > >
> > > I was going through the proposal above for LOOKUP_HANDLE and I was
> > > wondering how nodeid reuse is handled.
> >
> > LOOKUP_HANDLE extends the 64bit node id to be variable size id.
>
> Ok. So this variable size id is basically file handle returned by
> host?
>
> So this looks little different from what Miklos had suggested. IIUC,
> he wanted LOOKUP_HANDLE to return both file handle as well as *node id*.
>
> *********************************
> One proposal was to add  LOOKUP_HANDLE operation that is similar to
> LOOKUP except it takes a {variable length handle, name} as input and
> returns a variable length handle *and* a u64 node_id that can be used
> normally for all other operations.
> ***************************************
>

Ha! Thanks for reminding me about that.
It's been a while since I looked at what actually needs to be done.
That means that evicting server inodes from cache may not be as
easy as I had imagined.

> > A server that declares support for LOOKUP_HANDLE must never
> > reuse a handle.
> >
> > That's the basic idea. Just as a filesystem that declares to support
> > exportfs must never reuse a file handle.
>
> >
> > > IOW, if server decides to drop
> > > nodeid from its cache and reuse it for some other file, how will we
> > > differentiate between two. Some sort of generation id encoded in
> > > nodeid?
> > >
> >
> > That's usually the way that file handles are implemented in
> > local fs. The inode number is the internal lookup index and the
> > generation part is advanced on reuse.
> >
> > But for passthrough fs like virtiofsd, the LOOKUP_HANDLE will
> > just use the native fs file handles, so virtiofsd can evict the inodes
> > entry from its cache completely, not only close the open fds.
>
> Ok, got it. Will be interesting to see how kernel fuse changes look
> to accommodate this variable sized nodeid.
>

It may make sense to have a FUSE protocol dialect where nodeid
is variable size for all commands, but it probably won't be part of
the initial LOOKUP_HANDLE work.

> >
> > That is what my libfuse_passthrough POC does.
>
> Where have you hosted corresponding kernel changes?
>

There are no kernel changes.

For xfs and ext4 I know how to implement open_by_ino()
and I know how to parse the opaque fs file handle to extract
ino+generation from it and return them in FUSE_LOOKUP
response.

So I could manage to implement persistent NFS file handles
over the existing FUSE protocol with 64bit node id.

Thanks,
Amir.


* Re: Persistent FUSE file handles (Was: virtiofs uuid and file handles)
  2022-09-12 15:07                         ` Amir Goldstein
@ 2022-09-12 19:56                           ` Vivek Goyal
  2022-09-13  2:07                             ` Amir Goldstein
  0 siblings, 1 reply; 22+ messages in thread
From: Vivek Goyal @ 2022-09-12 19:56 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Miklos Szeredi, linux-fsdevel, Hanna Reitz

On Mon, Sep 12, 2022 at 06:07:42PM +0300, Amir Goldstein wrote:
> On Mon, Sep 12, 2022 at 5:35 PM Vivek Goyal <vgoyal@redhat.com> wrote:
> >
> > On Mon, Sep 12, 2022 at 04:38:48PM +0300, Amir Goldstein wrote:
> > > On Mon, Sep 12, 2022 at 4:16 PM Vivek Goyal <vgoyal@redhat.com> wrote:
> > > >
> > > > On Sun, Sep 11, 2022 at 01:14:49PM +0300, Amir Goldstein wrote:
> > > > > On Wed, Sep 23, 2020 at 10:44 AM Miklos Szeredi <miklos@szeredi.hu> wrote:
> > > > > >
> > > > > > One proposal was to add  LOOKUP_HANDLE operation that is similar to
> > > > > > LOOKUP except it takes a {variable length handle, name} as input and
> > > > > > returns a variable length handle *and* a u64 node_id that can be used
> > > > > > normally for all other operations.
> > > > > >
> > > > > > The advantage of such a scheme for virtio-fs (and possibly other fuse
> > > > > > based fs) would be that userspace need not keep a refcounted object
> > > > > > around until the kernel sends a FORGET, but can prune its node ID
> > > > > > based cache at any time.   If that happens and a request from the
> > > > > > client (kernel) comes in with a stale node ID, the server will return
> > > > > > -ESTALE and the client can ask for a new node ID with a special
> > > > > > lookup_handle(fh, NULL).
> > > > > >
> > > > > > Disadvantages being:
> > > > > >
> > > > > >  - cost of generating a file handle on all lookups
> > > > > >  - cost of storing file handle in kernel icache
> > > > > >
> > > > > > I don't think either of those are problematic in the virtiofs case.
> > > > > > The cost of having to keep fds open while the client has them in its
> > > > > > cache is much higher.
> > > > > >
> > > > >
> > > > > I was thinking of taking a stab at LOOKUP_HANDLE for a generic
> > > > > implementation of persistent file handles for FUSE.
> > > >
> > > > Hi Amir,
> > > >
> > > > I was going through the proposal above for LOOKUP_HANDLE and I was
> > > > wondering how nodeid reuse is handled.
> > >
> > > LOOKUP_HANDLE extends the 64bit node id to be variable size id.
> >
> > Ok. So this variable size id is basically file handle returned by
> > host?
> >
> > So this looks little different from what Miklos had suggested. IIUC,
> > he wanted LOOKUP_HANDLE to return both file handle as well as *node id*.
> >
> > *********************************
> > One proposal was to add  LOOKUP_HANDLE operation that is similar to
> > LOOKUP except it takes a {variable length handle, name} as input and
> > returns a variable length handle *and* a u64 node_id that can be used
> > normally for all other operations.
> > ***************************************
> >
> 
> Ha! Thanks for reminding me about that.
> It's been a while since I looked at what actually needs to be done.
> That means that evicting server inodes from cache may not be as
> easy as I had imagined.
> 
> > > A server that declares support for LOOKUP_HANDLE must never
> > > reuse a handle.
> > >
> > > That's the basic idea. Just as a filesystem that declares to support
> > > exportfs must never reuse a file handle.
> >
> > >
> > > > IOW, if server decides to drop
> > > > nodeid from its cache and reuse it for some other file, how will we
> > > > differentiate between two. Some sort of generation id encoded in
> > > > nodeid?
> > > >
> > >
> > > That's usually the way that file handles are implemented in
> > > local fs. The inode number is the internal lookup index and the
> > > generation part is advanced on reuse.
> > >
> > > But for passthrough fs like virtiofsd, the LOOKUP_HANDLE will
> > > just use the native fs file handles, so virtiofsd can evict the inodes
> > > entry from its cache completely, not only close the open fds.
> >
> > Ok, got it. Will be interesting to see how kernel fuse changes look
> > to accommodate this variable sized nodeid.
> >
> 
> It may make sense to have a FUSE protocol dialect where nodeid
> is variable size for all commands, but it probably won't be part of
> the initial LOOKUP_HANDLE work.
> 
> > >
> > > That is what my libfuse_passthrough POC does.
> >
> > Where have you hosted corresponding kernel changes?
> >
> 
> There are no kernel changes.
> 
> For xfs and ext4 I know how to implement open_by_ino()
> and I know how to parse the opaque fs file handle to extract
> ino+generation from it and return them in FUSE_LOOKUP
> response.

Aha, interesting. So this is filesystem specific. It works on xfs/ext4 but
not necessarily on other filesystems like nfs. (Because they have their
own way of encoding things in the file handle.)

> 
> So I could manage to implement persistent NFS file handles
> over the existing FUSE protocol with 64bit node id.

And that explains why you did not have to make kernel changes. But
this does not allow the server to close the fd associated with the nodeid?
Or is there a way for the server to generate a file handle and then call
open_by_handle_at()?

Thanks
Vivek



* Re: Persistent FUSE file handles (Was: virtiofs uuid and file handles)
  2022-09-12 19:56                           ` Vivek Goyal
@ 2022-09-13  2:07                             ` Amir Goldstein
  0 siblings, 0 replies; 22+ messages in thread
From: Amir Goldstein @ 2022-09-13  2:07 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: Miklos Szeredi, linux-fsdevel, Hanna Reitz

On Mon, Sep 12, 2022 at 10:56 PM Vivek Goyal <vgoyal@redhat.com> wrote:
>
> On Mon, Sep 12, 2022 at 06:07:42PM +0300, Amir Goldstein wrote:
> > On Mon, Sep 12, 2022 at 5:35 PM Vivek Goyal <vgoyal@redhat.com> wrote:
> > >
> > > On Mon, Sep 12, 2022 at 04:38:48PM +0300, Amir Goldstein wrote:
> > > > On Mon, Sep 12, 2022 at 4:16 PM Vivek Goyal <vgoyal@redhat.com> wrote:
> > > > >
> > > > > On Sun, Sep 11, 2022 at 01:14:49PM +0300, Amir Goldstein wrote:
> > > > > > On Wed, Sep 23, 2020 at 10:44 AM Miklos Szeredi <miklos@szeredi.hu> wrote:
> > > > > > >
> > > > > > > One proposal was to add  LOOKUP_HANDLE operation that is similar to
> > > > > > > LOOKUP except it takes a {variable length handle, name} as input and
> > > > > > > returns a variable length handle *and* a u64 node_id that can be used
> > > > > > > normally for all other operations.
> > > > > > >
> > > > > > > The advantage of such a scheme for virtio-fs (and possibly other fuse
> > > > > > > based fs) would be that userspace need not keep a refcounted object
> > > > > > > around until the kernel sends a FORGET, but can prune its node ID
> > > > > > > based cache at any time.   If that happens and a request from the
> > > > > > > client (kernel) comes in with a stale node ID, the server will return
> > > > > > > -ESTALE and the client can ask for a new node ID with a special
> > > > > > > lookup_handle(fh, NULL).
> > > > > > >
> > > > > > > Disadvantages being:
> > > > > > >
> > > > > > >  - cost of generating a file handle on all lookups
> > > > > > >  - cost of storing file handle in kernel icache
> > > > > > >
> > > > > > > I don't think either of those are problematic in the virtiofs case.
> > > > > > > The cost of having to keep fds open while the client has them in its
> > > > > > > cache is much higher.
> > > > > > >
> > > > > >
> > > > > > I was thinking of taking a stab at LOOKUP_HANDLE for a generic
> > > > > > implementation of persistent file handles for FUSE.
> > > > >
> > > > > Hi Amir,
> > > > >
> > > > > I was going through the proposal above for LOOKUP_HANDLE and I was
> > > > > wondering how nodeid reuse is handled.
> > > >
> > > > LOOKUP_HANDLE extends the 64bit node id to be variable size id.
> > >
> > > Ok. So this variable size id is basically file handle returned by
> > > host?
> > >
> > > So this looks little different from what Miklos had suggested. IIUC,
> > > he wanted LOOKUP_HANDLE to return both file handle as well as *node id*.
> > >
> > > *********************************
> > > One proposal was to add  LOOKUP_HANDLE operation that is similar to
> > > LOOKUP except it takes a {variable length handle, name} as input and
> > > returns a variable length handle *and* a u64 node_id that can be used
> > > normally for all other operations.
> > > ***************************************
> > >
> >
> > Ha! Thanks for reminding me about that.
> > It's been a while since I looked at what actually needs to be done.
> > That means that evicting server inodes from cache may not be as
> > easy as I had imagined.
> >
> > > > A server that declares support for LOOKUP_HANDLE must never
> > > > reuse a handle.
> > > >
> > > > That's the basic idea. Just as a filesystem that declares to support
> > > > exportfs must never reuse a file handle.
> > >
> > > >
> > > > > IOW, if server decides to drop
> > > > > nodeid from its cache and reuse it for some other file, how will we
> > > > > differentiate between two. Some sort of generation id encoded in
> > > > > nodeid?
> > > > >
> > > >
> > > > That's usually the way that file handles are implemented in
> > > > local fs. The inode number is the internal lookup index and the
> > > > generation part is advanced on reuse.
> > > >
> > > > But for passthrough fs like virtiofsd, the LOOKUP_HANDLE will
> > > > just use the native fs file handles, so virtiofsd can evict the inodes
> > > > entry from its cache completely, not only close the open fds.
> > >
> > > Ok, got it. Will be interesting to see how kernel fuse changes look
> > > to accommodate this variable sized nodeid.
> > >
> >
> > It may make sense to have a FUSE protocol dialect where nodeid
> > is variable size for all commands, but it probably won't be part of
> > the initial LOOKUP_HANDLE work.
> >
> > > >
> > > > That is what my libfuse_passthrough POC does.
> > >
> > > Where have you hosted corresponding kernel changes?
> > >
> >
> > There are no kernel changes.
> >
> > For xfs and ext4 I know how to implement open_by_ino()
> > and I know how to parse the opaque fs file handle to extract
> > ino+generation from it and return them in FUSE_LOOKUP
> > response.
>
> Aha, interesting. So is this filesystem specific. Works on xfs/ext4 but
> not necessarily on other filesystems like nfs. (Because they have their
> own way of encoding things in file handle).

Correct.
If it is not xfs/ext4, the passthrough implementation uses open fds.
In that case, an NFS client may get ESTALE errors after a
server restart and, worse, get the content of the wrong object
if the server has reused a nodeid+generation after restart.

>
> >
> > So I could manage to implement persistent NFS file handles
> > over the existing FUSE protocol with 64bit node id.
>
> And that explains why you did not have to make kernel changes. But
> this does not allow the server to close the fd associated with the nodeid?
> Or is there a way for the server to generate a file handle and then call
> open_by_handle_at()?
>

Yes, that's what the server does in case of ext4/xfs:
https://github.com/amir73il/libfuse/blob/fuse_passthrough/passthrough/fuse_passthrough.cpp#L912

Thanks,
Amir.


end of thread, other threads:[~2022-09-13  2:07 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <a8828676-210a-99e8-30d7-6076f334ed71@virtuozzo.com>
     [not found] ` <CAOQ4uxgZ08ePA5WFOYFoLZaq_-Kjr-haNzBN5Aj3MfF=f9pjdg@mail.gmail.com>
     [not found]   ` <1bb71cbf-0a10-34c7-409d-914058e102f6@virtuozzo.com>
     [not found]     ` <CAOQ4uxieqnKENV_kJYwfcnPjNdVuqH3BnKVx_zLz=N_PdAguNg@mail.gmail.com>
     [not found]       ` <dc696835-bbb5-ed4e-8708-bc828d415a2b@virtuozzo.com>
     [not found]         ` <CAOQ4uxg0XVEEzc+HyyC63WWZuA2AsRjJmbZBuNimtj=t+quVyg@mail.gmail.com>
     [not found]           ` <20200922210445.GG57620@redhat.com>
2020-09-23  2:49             ` virtiofs uuid and file handles Amir Goldstein
2020-09-23  7:44               ` Miklos Szeredi
2020-09-23  9:56                 ` Amir Goldstein
2020-09-23 11:12                   ` Miklos Szeredi
2021-05-29 16:05                     ` Amir Goldstein
2021-05-31 14:11                       ` Miklos Szeredi
2021-05-31 18:12                         ` Amir Goldstein
2021-06-01 14:49                           ` Vivek Goyal
2021-06-01 15:42                             ` Amir Goldstein
2021-06-01 16:08                               ` Max Reitz
2021-06-01 18:23                                 ` Amir Goldstein
2022-09-11 10:14                 ` Persistent FUSE file handles (Was: virtiofs uuid and file handles) Amir Goldstein
2022-09-11 15:16                   ` Bernd Schubert
2022-09-11 15:29                     ` Amir Goldstein
2022-09-11 15:55                       ` Bernd Schubert
2022-09-12 13:16                   ` Vivek Goyal
2022-09-12 13:38                     ` Amir Goldstein
2022-09-12 14:35                       ` Vivek Goyal
2022-09-12 15:07                         ` Amir Goldstein
2022-09-12 19:56                           ` Vivek Goyal
2022-09-13  2:07                             ` Amir Goldstein
     [not found]           ` <20200922212534.GH57620@redhat.com>
     [not found]             ` <CAOQ4uxjp6NpF_Q0QqUTzE5=YiKz9w6JbUVyROG+rNFcHPAThFg@mail.gmail.com>
2020-09-23 12:53               ` Copying overlayfs directories with index=on Pavel Tikhomirov
