* Re: virtiofs uuid and file handles [not found] ` <20200922210445.GG57620@redhat.com> @ 2020-09-23 2:49 ` Amir Goldstein 2020-09-23 7:44 ` Miklos Szeredi 0 siblings, 1 reply; 22+ messages in thread From: Amir Goldstein @ 2020-09-23 2:49 UTC (permalink / raw) To: Vivek Goyal; +Cc: Miklos Szeredi, overlayfs, linux-fsdevel On Wed, Sep 23, 2020 at 12:04 AM Vivek Goyal <vgoyal@redhat.com> wrote: > ... > > Note that if all overlayfs layers are on the same fs and that fs has > > null uuid, then the "disk copy" use case should just work, but I never > > tested that. > > > > So far, there has been no filesystem with null uuid that could be used > > as upper+lower layers (even xfs with option nouuid has non null s_uuid). > > > > Recently, virtiofs was added as a filesystem that could be used for > > upper+lower layers and virtiofs (which is fuse) has null uuid. > > I guess I never paid attention to uuid part of virtiofs. Probably we > are not using index or any of the advanced features of overlayfs yet, > that's why. > I don't expect you should have a problem enabling index because of null uuid when all layers are on the same virtiofs. That setup is allowed. We only ever start checking for null uuid on lower layers that are NOT on the same fs as upper layer. What you are expected to have a problem with is that FUSE support for file handles is "problematic". I found out the hard way that FUSE can decode NFS file handles to completely different object than the encoded object if the encoded inode was evicted from cache and its node id has been reused. Another problem is that FUSE protocol does not have complete support for decoding file handles. FUSE implements decode file handle by LOOKUP(ino, ".") to server, but if server is proxying a local filesystem, there is not enough information to construct an open_by_handle_at() request. I wrote a fuse filesystem whose file handles are persistent and reliable [1], but it is a specialized server that uses knowledge of the local filesystem file handle format and it requires that the local filesystem has a special feature to interpret a file handle with 0 generation as ANY generation (ext4 does that). I think that the proper was to implement reliable persistent file handles in fuse/virtiofs would be to add ENCODE/DECODE to FUSE protocol and allow the server to handle this. Thanks, Amir, [1] https://github.com/amir73il/libfuse/commits/cachegwfs ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: virtiofs uuid and file handles 2020-09-23 2:49 ` virtiofs uuid and file handles Amir Goldstein @ 2020-09-23 7:44 ` Miklos Szeredi 2020-09-23 9:56 ` Amir Goldstein 2022-09-11 10:14 ` Persistent FUSE file handles (Was: virtiofs uuid and file handles) Amir Goldstein 0 siblings, 2 replies; 22+ messages in thread From: Miklos Szeredi @ 2020-09-23 7:44 UTC (permalink / raw) To: Amir Goldstein; +Cc: Vivek Goyal, overlayfs, linux-fsdevel, Max Reitz On Wed, Sep 23, 2020 at 4:49 AM Amir Goldstein <amir73il@gmail.com> wrote: > I think that the proper was to implement reliable persistent file > handles in fuse/virtiofs would be to add ENCODE/DECODE to > FUSE protocol and allow the server to handle this. Max Reitz (Cc-d) is currently looking into this. One proposal was to add LOOKUP_HANDLE operation that is similar to LOOKUP except it takes a {variable length handle, name} as input and returns a variable length handle *and* a u64 node_id that can be used normally for all other operations. The advantage of such a scheme for virtio-fs (and possibly other fuse based fs) would be that userspace need not keep a refcounted object around until the kernel sends a FORGET, but can prune its node ID based cache at any time. If that happens and a request from the client (kernel) comes in with a stale node ID, the server will return -ESTALE and the client can ask for a new node ID with a special lookup_handle(fh, NULL). Disadvantages being: - cost of generating a file handle on all lookups - cost of storing file handle in kernel icache I don't think either of those are problematic in the virtiofs case. The cost of having to keep fds open while the client has them in its cache is much higher. Thanks, Miklos ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: virtiofs uuid and file handles 2020-09-23 7:44 ` Miklos Szeredi @ 2020-09-23 9:56 ` Amir Goldstein 2020-09-23 11:12 ` Miklos Szeredi 2022-09-11 10:14 ` Persistent FUSE file handles (Was: virtiofs uuid and file handles) Amir Goldstein 1 sibling, 1 reply; 22+ messages in thread From: Amir Goldstein @ 2020-09-23 9:56 UTC (permalink / raw) To: Miklos Szeredi; +Cc: Vivek Goyal, overlayfs, linux-fsdevel, Max Reitz On Wed, Sep 23, 2020 at 10:44 AM Miklos Szeredi <miklos@szeredi.hu> wrote: > > On Wed, Sep 23, 2020 at 4:49 AM Amir Goldstein <amir73il@gmail.com> wrote: > > > I think that the proper was to implement reliable persistent file > > handles in fuse/virtiofs would be to add ENCODE/DECODE to > > FUSE protocol and allow the server to handle this. > > Max Reitz (Cc-d) is currently looking into this. > > One proposal was to add LOOKUP_HANDLE operation that is similar to > LOOKUP except it takes a {variable length handle, name} as input and > returns a variable length handle *and* a u64 node_id that can be used > normally for all other operations. > > The advantage of such a scheme for virtio-fs (and possibly other fuse > based fs) would be that userspace need not keep a refcounted object > around until the kernel sends a FORGET, but can prune its node ID > based cache at any time. If that happens and a request from the > client (kernel) comes in with a stale node ID, the server will return > -ESTALE and the client can ask for a new node ID with a special > lookup_handle(fh, NULL). > > Disadvantages being: > > - cost of generating a file handle on all lookups I never ran into a local fs implementation where this was expensive. > - cost of storing file handle in kernel icache > > I don't think either of those are problematic in the virtiofs case. > The cost of having to keep fds open while the client has them in its > cache is much higher. > Sounds good. I suppose flock() does need to keep the open fd on server. Are there any other states for an open fd that must be preserved? Thanks, Amir. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: virtiofs uuid and file handles 2020-09-23 9:56 ` Amir Goldstein @ 2020-09-23 11:12 ` Miklos Szeredi 2021-05-29 16:05 ` Amir Goldstein 0 siblings, 1 reply; 22+ messages in thread From: Miklos Szeredi @ 2020-09-23 11:12 UTC (permalink / raw) To: Amir Goldstein; +Cc: Vivek Goyal, overlayfs, linux-fsdevel, Max Reitz On Wed, Sep 23, 2020 at 11:57 AM Amir Goldstein <amir73il@gmail.com> wrote: > > On Wed, Sep 23, 2020 at 10:44 AM Miklos Szeredi <miklos@szeredi.hu> wrote: > > > > On Wed, Sep 23, 2020 at 4:49 AM Amir Goldstein <amir73il@gmail.com> wrote: > > > > > I think that the proper was to implement reliable persistent file > > > handles in fuse/virtiofs would be to add ENCODE/DECODE to > > > FUSE protocol and allow the server to handle this. > > > > Max Reitz (Cc-d) is currently looking into this. > > > > One proposal was to add LOOKUP_HANDLE operation that is similar to > > LOOKUP except it takes a {variable length handle, name} as input and > > returns a variable length handle *and* a u64 node_id that can be used > > normally for all other operations. > > > > The advantage of such a scheme for virtio-fs (and possibly other fuse > > based fs) would be that userspace need not keep a refcounted object > > around until the kernel sends a FORGET, but can prune its node ID > > based cache at any time. If that happens and a request from the > > client (kernel) comes in with a stale node ID, the server will return > > -ESTALE and the client can ask for a new node ID with a special > > lookup_handle(fh, NULL). > > > > Disadvantages being: > > > > - cost of generating a file handle on all lookups > > I never ran into a local fs implementation where this was expensive. > > > - cost of storing file handle in kernel icache > > > > I don't think either of those are problematic in the virtiofs case. > > The cost of having to keep fds open while the client has them in its > > cache is much higher. > > > > Sounds good. > I suppose flock() does need to keep the open fd on server. Open files are a separate issue and do need an active object in the server. The issue this solves is synchronizing "released" and "evicted" states of objects between server and client. I.e. when a file is closed (and no more open files exist referencing the same object) the dentry refcount goes to zero but it remains in the cache. In this state the server could really evict it's own cached object, but can't because the client can gain an active reference at any time via cached path lookup. One other solution would be for the server to send a notification (NOTIFY_EVICT) that would try to clean out the object from the server cache and respond with a FORGET if successful. But I sort of like the file handle one better, since it solves multiple problems. Thanks, Miklos > > Are there any other states for an open fd that must be preserved? > > Thanks, > Amir. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: virtiofs uuid and file handles 2020-09-23 11:12 ` Miklos Szeredi @ 2021-05-29 16:05 ` Amir Goldstein 2021-05-31 14:11 ` Miklos Szeredi 0 siblings, 1 reply; 22+ messages in thread From: Amir Goldstein @ 2021-05-29 16:05 UTC (permalink / raw) To: Miklos Szeredi; +Cc: Vivek Goyal, overlayfs, linux-fsdevel, Max Reitz On Wed, Sep 23, 2020 at 2:12 PM Miklos Szeredi <miklos@szeredi.hu> wrote: > > On Wed, Sep 23, 2020 at 11:57 AM Amir Goldstein <amir73il@gmail.com> wrote: > > > > On Wed, Sep 23, 2020 at 10:44 AM Miklos Szeredi <miklos@szeredi.hu> wrote: > > > > > > On Wed, Sep 23, 2020 at 4:49 AM Amir Goldstein <amir73il@gmail.com> wrote: > > > > > > > I think that the proper was to implement reliable persistent file > > > > handles in fuse/virtiofs would be to add ENCODE/DECODE to > > > > FUSE protocol and allow the server to handle this. > > > > > > Max Reitz (Cc-d) is currently looking into this. > > > > > > One proposal was to add LOOKUP_HANDLE operation that is similar to > > > LOOKUP except it takes a {variable length handle, name} as input and > > > returns a variable length handle *and* a u64 node_id that can be used > > > normally for all other operations. > > > Miklos, Max, Any updates on LOOKUP_HANDLE work? > > > The advantage of such a scheme for virtio-fs (and possibly other fuse > > > based fs) would be that userspace need not keep a refcounted object > > > around until the kernel sends a FORGET, but can prune its node ID > > > based cache at any time. If that happens and a request from the > > > client (kernel) comes in with a stale node ID, the server will return > > > -ESTALE and the client can ask for a new node ID with a special > > > lookup_handle(fh, NULL). > > > > > > Disadvantages being: > > > > > > - cost of generating a file handle on all lookups > > > > I never ran into a local fs implementation where this was expensive. > > > > > - cost of storing file handle in kernel icache > > > > > > I don't think either of those are problematic in the virtiofs case. > > > The cost of having to keep fds open while the client has them in its > > > cache is much higher. > > > > > > > Sounds good. > > I suppose flock() does need to keep the open fd on server. > > Open files are a separate issue and do need an active object in the server. > > The issue this solves is synchronizing "released" and "evicted" > states of objects between server and client. I.e. when a file is > closed (and no more open files exist referencing the same object) the > dentry refcount goes to zero but it remains in the cache. In this > state the server could really evict it's own cached object, but can't > because the client can gain an active reference at any time via > cached path lookup. > > One other solution would be for the server to send a notification > (NOTIFY_EVICT) that would try to clean out the object from the server > cache and respond with a FORGET if successful. But I sort of like > the file handle one better, since it solves multiple problems. > Even with LOOKUP_HANDLE, I am struggling to understand how we intend to invalidate all fuse dentries referring to ino X in case the server replies with reused ino X with a different generation that the one stored in fuse inode cache. This is an issue that I encountered when running the passthrough_hp test, on my filesystem. In tst_readdir_big() for example, underlying files are being unlinked and new files created reusing the old inode numbers. This creates a situation where server gets a lookup request for file B that uses the reused inode number X, while old file A is still in fuse dentry cache using the older generation of real inode number X which is still in fuse inode cache. Now the server knows that the real inode has been rused, because the server caches the old generation value, but it cannot reply to the lookup request before the old fuse inode has been invalidated. IIUC, fuse_lowlevel_notify_inval_inode() is not enough(?). We would also need to change fuse_dentry_revalidate() to detect the case of reused/invalidated inode. The straightforward way I can think of is to store inode generation in fuse_dentry. It won't even grow the size of the struct. Am I over complicating this? Thanks, Amir. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: virtiofs uuid and file handles 2021-05-29 16:05 ` Amir Goldstein @ 2021-05-31 14:11 ` Miklos Szeredi 2021-05-31 18:12 ` Amir Goldstein 0 siblings, 1 reply; 22+ messages in thread From: Miklos Szeredi @ 2021-05-31 14:11 UTC (permalink / raw) To: Amir Goldstein; +Cc: Vivek Goyal, overlayfs, linux-fsdevel, Max Reitz On Sat, 29 May 2021 at 18:05, Amir Goldstein <amir73il@gmail.com> wrote: > > On Wed, Sep 23, 2020 at 2:12 PM Miklos Szeredi <miklos@szeredi.hu> wrote: > > > > On Wed, Sep 23, 2020 at 11:57 AM Amir Goldstein <amir73il@gmail.com> wrote: > > > > > > On Wed, Sep 23, 2020 at 10:44 AM Miklos Szeredi <miklos@szeredi.hu> wrote: > > > > > > > > On Wed, Sep 23, 2020 at 4:49 AM Amir Goldstein <amir73il@gmail.com> wrote: > > > > > > > > > I think that the proper was to implement reliable persistent file > > > > > handles in fuse/virtiofs would be to add ENCODE/DECODE to > > > > > FUSE protocol and allow the server to handle this. > > > > > > > > Max Reitz (Cc-d) is currently looking into this. > > > > > > > > One proposal was to add LOOKUP_HANDLE operation that is similar to > > > > LOOKUP except it takes a {variable length handle, name} as input and > > > > returns a variable length handle *and* a u64 node_id that can be used > > > > normally for all other operations. > > > > > > Miklos, Max, > > Any updates on LOOKUP_HANDLE work? > > > > > The advantage of such a scheme for virtio-fs (and possibly other fuse > > > > based fs) would be that userspace need not keep a refcounted object > > > > around until the kernel sends a FORGET, but can prune its node ID > > > > based cache at any time. If that happens and a request from the > > > > client (kernel) comes in with a stale node ID, the server will return > > > > -ESTALE and the client can ask for a new node ID with a special > > > > lookup_handle(fh, NULL). > > > > > > > > Disadvantages being: > > > > > > > > - cost of generating a file handle on all lookups > > > > > > I never ran into a local fs implementation where this was expensive. > > > > > > > - cost of storing file handle in kernel icache > > > > > > > > I don't think either of those are problematic in the virtiofs case. > > > > The cost of having to keep fds open while the client has them in its > > > > cache is much higher. > > > > > > > > > > Sounds good. > > > I suppose flock() does need to keep the open fd on server. > > > > Open files are a separate issue and do need an active object in the server. > > > > The issue this solves is synchronizing "released" and "evicted" > > states of objects between server and client. I.e. when a file is > > closed (and no more open files exist referencing the same object) the > > dentry refcount goes to zero but it remains in the cache. In this > > state the server could really evict it's own cached object, but can't > > because the client can gain an active reference at any time via > > cached path lookup. > > > > One other solution would be for the server to send a notification > > (NOTIFY_EVICT) that would try to clean out the object from the server > > cache and respond with a FORGET if successful. But I sort of like > > the file handle one better, since it solves multiple problems. > > > > Even with LOOKUP_HANDLE, I am struggling to understand how we > intend to invalidate all fuse dentries referring to ino X in case the server > replies with reused ino X with a different generation that the one stored > in fuse inode cache. > > This is an issue that I encountered when running the passthrough_hp test, > on my filesystem. In tst_readdir_big() for example, underlying files are being > unlinked and new files created reusing the old inode numbers. > > This creates a situation where server gets a lookup request > for file B that uses the reused inode number X, while old file A is > still in fuse dentry cache using the older generation of real inode > number X which is still in fuse inode cache. > > Now the server knows that the real inode has been rused, because > the server caches the old generation value, but it cannot reply to > the lookup request before the old fuse inode has been invalidated. > IIUC, fuse_lowlevel_notify_inval_inode() is not enough(?). > We would also need to change fuse_dentry_revalidate() to > detect the case of reused/invalidated inode. > > The straightforward way I can think of is to store inode generation > in fuse_dentry. It won't even grow the size of the struct. > > Am I over complicating this? In this scheme the generation number is already embedded in the file handle. If LOOKUP_HANDLE returns a nodeid that can be found in the icache, but which doesn't match the new file handle, then the old inode will be marked bad and a new one allocated. Does that answer your worries? Or am I missing something? Thanks, Miklos ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: virtiofs uuid and file handles 2021-05-31 14:11 ` Miklos Szeredi @ 2021-05-31 18:12 ` Amir Goldstein 2021-06-01 14:49 ` Vivek Goyal 0 siblings, 1 reply; 22+ messages in thread From: Amir Goldstein @ 2021-05-31 18:12 UTC (permalink / raw) To: Miklos Szeredi; +Cc: Vivek Goyal, overlayfs, linux-fsdevel, Max Reitz On Mon, May 31, 2021 at 5:11 PM Miklos Szeredi <miklos@szeredi.hu> wrote: > > On Sat, 29 May 2021 at 18:05, Amir Goldstein <amir73il@gmail.com> wrote: > > > > On Wed, Sep 23, 2020 at 2:12 PM Miklos Szeredi <miklos@szeredi.hu> wrote: > > > > > > On Wed, Sep 23, 2020 at 11:57 AM Amir Goldstein <amir73il@gmail.com> wrote: > > > > > > > > On Wed, Sep 23, 2020 at 10:44 AM Miklos Szeredi <miklos@szeredi.hu> wrote: > > > > > > > > > > On Wed, Sep 23, 2020 at 4:49 AM Amir Goldstein <amir73il@gmail.com> wrote: > > > > > > > > > > > I think that the proper was to implement reliable persistent file > > > > > > handles in fuse/virtiofs would be to add ENCODE/DECODE to > > > > > > FUSE protocol and allow the server to handle this. > > > > > > > > > > Max Reitz (Cc-d) is currently looking into this. > > > > > > > > > > One proposal was to add LOOKUP_HANDLE operation that is similar to > > > > > LOOKUP except it takes a {variable length handle, name} as input and > > > > > returns a variable length handle *and* a u64 node_id that can be used > > > > > normally for all other operations. > > > > > > > > > Miklos, Max, > > > > Any updates on LOOKUP_HANDLE work? > > > > > > > The advantage of such a scheme for virtio-fs (and possibly other fuse > > > > > based fs) would be that userspace need not keep a refcounted object > > > > > around until the kernel sends a FORGET, but can prune its node ID > > > > > based cache at any time. If that happens and a request from the > > > > > client (kernel) comes in with a stale node ID, the server will return > > > > > -ESTALE and the client can ask for a new node ID with a special > > > > > lookup_handle(fh, NULL). > > > > > > > > > > Disadvantages being: > > > > > > > > > > - cost of generating a file handle on all lookups > > > > > > > > I never ran into a local fs implementation where this was expensive. > > > > > > > > > - cost of storing file handle in kernel icache > > > > > > > > > > I don't think either of those are problematic in the virtiofs case. > > > > > The cost of having to keep fds open while the client has them in its > > > > > cache is much higher. > > > > > > > > > > > > > Sounds good. > > > > I suppose flock() does need to keep the open fd on server. > > > > > > Open files are a separate issue and do need an active object in the server. > > > > > > The issue this solves is synchronizing "released" and "evicted" > > > states of objects between server and client. I.e. when a file is > > > closed (and no more open files exist referencing the same object) the > > > dentry refcount goes to zero but it remains in the cache. In this > > > state the server could really evict it's own cached object, but can't > > > because the client can gain an active reference at any time via > > > cached path lookup. > > > > > > One other solution would be for the server to send a notification > > > (NOTIFY_EVICT) that would try to clean out the object from the server > > > cache and respond with a FORGET if successful. But I sort of like > > > the file handle one better, since it solves multiple problems. > > > > > > > Even with LOOKUP_HANDLE, I am struggling to understand how we > > intend to invalidate all fuse dentries referring to ino X in case the server > > replies with reused ino X with a different generation that the one stored > > in fuse inode cache. > > > > This is an issue that I encountered when running the passthrough_hp test, > > on my filesystem. In tst_readdir_big() for example, underlying files are being > > unlinked and new files created reusing the old inode numbers. > > > > This creates a situation where server gets a lookup request > > for file B that uses the reused inode number X, while old file A is > > still in fuse dentry cache using the older generation of real inode > > number X which is still in fuse inode cache. > > > > Now the server knows that the real inode has been rused, because > > the server caches the old generation value, but it cannot reply to > > the lookup request before the old fuse inode has been invalidated. > > IIUC, fuse_lowlevel_notify_inval_inode() is not enough(?). > > We would also need to change fuse_dentry_revalidate() to > > detect the case of reused/invalidated inode. > > > > The straightforward way I can think of is to store inode generation > > in fuse_dentry. It won't even grow the size of the struct. > > > > Am I over complicating this? > > In this scheme the generation number is already embedded in the file > handle. If LOOKUP_HANDLE returns a nodeid that can be found in the > icache, but which doesn't match the new file handle, then the old > inode will be marked bad and a new one allocated. > > Does that answer your worries? Or am I missing something? It affirms my understanding of the future implementation, but does not help my implementation without protocol changes. I thought I could get away without LOOKUP_HANDLE for underlying fs that is able to resolve by ino, but seems that I still have an unhandled corner case, so will need to add some kernel patch. Unless there is already a way to signal from server to make the inode bad in a synchronous manner (I did not find any) before replying to LOOKUP with a new generation of the same ino. Any idea about the timeline for LOOKUP_HANDLE? I may be able to pick this up myself if there is no one actively working on it or plans for anyone to make this happen. Thanks, Amir. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: virtiofs uuid and file handles 2021-05-31 18:12 ` Amir Goldstein @ 2021-06-01 14:49 ` Vivek Goyal 2021-06-01 15:42 ` Amir Goldstein 0 siblings, 1 reply; 22+ messages in thread From: Vivek Goyal @ 2021-06-01 14:49 UTC (permalink / raw) To: Amir Goldstein; +Cc: Miklos Szeredi, overlayfs, linux-fsdevel, Max Reitz On Mon, May 31, 2021 at 09:12:59PM +0300, Amir Goldstein wrote: > On Mon, May 31, 2021 at 5:11 PM Miklos Szeredi <miklos@szeredi.hu> wrote: > > > > On Sat, 29 May 2021 at 18:05, Amir Goldstein <amir73il@gmail.com> wrote: > > > > > > On Wed, Sep 23, 2020 at 2:12 PM Miklos Szeredi <miklos@szeredi.hu> wrote: > > > > > > > > On Wed, Sep 23, 2020 at 11:57 AM Amir Goldstein <amir73il@gmail.com> wrote: > > > > > > > > > > On Wed, Sep 23, 2020 at 10:44 AM Miklos Szeredi <miklos@szeredi.hu> wrote: > > > > > > > > > > > > On Wed, Sep 23, 2020 at 4:49 AM Amir Goldstein <amir73il@gmail.com> wrote: > > > > > > > > > > > > > I think that the proper was to implement reliable persistent file > > > > > > > handles in fuse/virtiofs would be to add ENCODE/DECODE to > > > > > > > FUSE protocol and allow the server to handle this. > > > > > > > > > > > > Max Reitz (Cc-d) is currently looking into this. > > > > > > > > > > > > One proposal was to add LOOKUP_HANDLE operation that is similar to > > > > > > LOOKUP except it takes a {variable length handle, name} as input and > > > > > > returns a variable length handle *and* a u64 node_id that can be used > > > > > > normally for all other operations. > > > > > > > > > > > > Miklos, Max, > > > > > > Any updates on LOOKUP_HANDLE work? > > > > > > > > > The advantage of such a scheme for virtio-fs (and possibly other fuse > > > > > > based fs) would be that userspace need not keep a refcounted object > > > > > > around until the kernel sends a FORGET, but can prune its node ID > > > > > > based cache at any time. If that happens and a request from the > > > > > > client (kernel) comes in with a stale node ID, the server will return > > > > > > -ESTALE and the client can ask for a new node ID with a special > > > > > > lookup_handle(fh, NULL). > > > > > > > > > > > > Disadvantages being: > > > > > > > > > > > > - cost of generating a file handle on all lookups > > > > > > > > > > I never ran into a local fs implementation where this was expensive. > > > > > > > > > > > - cost of storing file handle in kernel icache > > > > > > > > > > > > I don't think either of those are problematic in the virtiofs case. > > > > > > The cost of having to keep fds open while the client has them in its > > > > > > cache is much higher. > > > > > > > > > > > > > > > > Sounds good. > > > > > I suppose flock() does need to keep the open fd on server. > > > > > > > > Open files are a separate issue and do need an active object in the server. > > > > > > > > The issue this solves is synchronizing "released" and "evicted" > > > > states of objects between server and client. I.e. when a file is > > > > closed (and no more open files exist referencing the same object) the > > > > dentry refcount goes to zero but it remains in the cache. In this > > > > state the server could really evict it's own cached object, but can't > > > > because the client can gain an active reference at any time via > > > > cached path lookup. > > > > > > > > One other solution would be for the server to send a notification > > > > (NOTIFY_EVICT) that would try to clean out the object from the server > > > > cache and respond with a FORGET if successful. But I sort of like > > > > the file handle one better, since it solves multiple problems. > > > > > > > > > > Even with LOOKUP_HANDLE, I am struggling to understand how we > > > intend to invalidate all fuse dentries referring to ino X in case the server > > > replies with reused ino X with a different generation that the one stored > > > in fuse inode cache. > > > > > > This is an issue that I encountered when running the passthrough_hp test, > > > on my filesystem. In tst_readdir_big() for example, underlying files are being > > > unlinked and new files created reusing the old inode numbers. > > > > > > This creates a situation where server gets a lookup request > > > for file B that uses the reused inode number X, while old file A is > > > still in fuse dentry cache using the older generation of real inode > > > number X which is still in fuse inode cache. > > > > > > Now the server knows that the real inode has been rused, because > > > the server caches the old generation value, but it cannot reply to > > > the lookup request before the old fuse inode has been invalidated. > > > IIUC, fuse_lowlevel_notify_inval_inode() is not enough(?). > > > We would also need to change fuse_dentry_revalidate() to > > > detect the case of reused/invalidated inode. > > > > > > The straightforward way I can think of is to store inode generation > > > in fuse_dentry. It won't even grow the size of the struct. > > > > > > Am I over complicating this? > > > > In this scheme the generation number is already embedded in the file > > handle. If LOOKUP_HANDLE returns a nodeid that can be found in the > > icache, but which doesn't match the new file handle, then the old > > inode will be marked bad and a new one allocated. > > > > Does that answer your worries? Or am I missing something? > > It affirms my understanding of the future implementation, but > does not help my implementation without protocol changes. > I thought I could get away without LOOKUP_HANDLE for > underlying fs that is able to resolve by ino, but seems that I still have an > unhandled corner case, so will need to add some kernel patch. > Unless there is already a way to signal from server to make the > inode bad in a synchronous manner (I did not find any) before > replying to LOOKUP with a new generation of the same ino. > > Any idea about the timeline for LOOKUP_HANDLE? > I may be able to pick this up myself if there is no one actively > working on it or plans for anyone to make this happen. AFAIK, right now max is not actively looking into LOOKUP_HANDLE. To solve the issue of virtiofs server having too many fds open, he is now planning to store corresonding file handle in server and use that to open fd later. But this does not help with persistent file handle issue for fuse client. BTW, one concern with file handles coming from guest kernel was that how to trust those handles. Guest can create anything and use file server to open the files on same filesystem (but not shared with guest). I am assuming same concern should be there with non-virtiofs use cases. Regular fuse client must be sending a file handle and file server is running with CAP_DAC_READ_SEARCH. How will it make sure that client is not able to access files not exported through shared directory but are present on same filesystem. Thanks Vivek ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: virtiofs uuid and file handles 2021-06-01 14:49 ` Vivek Goyal @ 2021-06-01 15:42 ` Amir Goldstein 2021-06-01 16:08 ` Max Reitz 0 siblings, 1 reply; 22+ messages in thread From: Amir Goldstein @ 2021-06-01 15:42 UTC (permalink / raw) To: Vivek Goyal; +Cc: Miklos Szeredi, overlayfs, linux-fsdevel, Max Reitz On Tue, Jun 1, 2021 at 5:49 PM Vivek Goyal <vgoyal@redhat.com> wrote: > > On Mon, May 31, 2021 at 09:12:59PM +0300, Amir Goldstein wrote: > > On Mon, May 31, 2021 at 5:11 PM Miklos Szeredi <miklos@szeredi.hu> wrote: > > > > > > On Sat, 29 May 2021 at 18:05, Amir Goldstein <amir73il@gmail.com> wrote: > > > > > > > > On Wed, Sep 23, 2020 at 2:12 PM Miklos Szeredi <miklos@szeredi.hu> wrote: > > > > > > > > > > On Wed, Sep 23, 2020 at 11:57 AM Amir Goldstein <amir73il@gmail.com> wrote: > > > > > > > > > > > > On Wed, Sep 23, 2020 at 10:44 AM Miklos Szeredi <miklos@szeredi.hu> wrote: > > > > > > > > > > > > > > On Wed, Sep 23, 2020 at 4:49 AM Amir Goldstein <amir73il@gmail.com> wrote: > > > > > > > > > > > > > > > I think that the proper was to implement reliable persistent file > > > > > > > > handles in fuse/virtiofs would be to add ENCODE/DECODE to > > > > > > > > FUSE protocol and allow the server to handle this. > > > > > > > > > > > > > > Max Reitz (Cc-d) is currently looking into this. > > > > > > > > > > > > > > One proposal was to add LOOKUP_HANDLE operation that is similar to > > > > > > > LOOKUP except it takes a {variable length handle, name} as input and > > > > > > > returns a variable length handle *and* a u64 node_id that can be used > > > > > > > normally for all other operations. > > > > > > > > > > > > > > > Miklos, Max, > > > > > > > > Any updates on LOOKUP_HANDLE work? > > > > > > > > > > > The advantage of such a scheme for virtio-fs (and possibly other fuse > > > > > > > based fs) would be that userspace need not keep a refcounted object > > > > > > > around until the kernel sends a FORGET, but can prune its node ID > > > > > > > based cache at any time. If that happens and a request from the > > > > > > > client (kernel) comes in with a stale node ID, the server will return > > > > > > > -ESTALE and the client can ask for a new node ID with a special > > > > > > > lookup_handle(fh, NULL). > > > > > > > > > > > > > > Disadvantages being: > > > > > > > > > > > > > > - cost of generating a file handle on all lookups > > > > > > > > > > > > I never ran into a local fs implementation where this was expensive. > > > > > > > > > > > > > - cost of storing file handle in kernel icache > > > > > > > > > > > > > > I don't think either of those are problematic in the virtiofs case. > > > > > > > The cost of having to keep fds open while the client has them in its > > > > > > > cache is much higher. > > > > > > > > > > > > > > > > > > > Sounds good. > > > > > > I suppose flock() does need to keep the open fd on server. > > > > > > > > > > Open files are a separate issue and do need an active object in the server. > > > > > > > > > > The issue this solves is synchronizing "released" and "evicted" > > > > > states of objects between server and client. I.e. when a file is > > > > > closed (and no more open files exist referencing the same object) the > > > > > dentry refcount goes to zero but it remains in the cache. In this > > > > > state the server could really evict it's own cached object, but can't > > > > > because the client can gain an active reference at any time via > > > > > cached path lookup. > > > > > > > > > > One other solution would be for the server to send a notification > > > > > (NOTIFY_EVICT) that would try to clean out the object from the server > > > > > cache and respond with a FORGET if successful. But I sort of like > > > > > the file handle one better, since it solves multiple problems. > > > > > > > > > > > > > Even with LOOKUP_HANDLE, I am struggling to understand how we > > > > intend to invalidate all fuse dentries referring to ino X in case the server > > > > replies with reused ino X with a different generation that the one stored > > > > in fuse inode cache. > > > > > > > > This is an issue that I encountered when running the passthrough_hp test, > > > > on my filesystem. In tst_readdir_big() for example, underlying files are being > > > > unlinked and new files created reusing the old inode numbers. > > > > > > > > This creates a situation where server gets a lookup request > > > > for file B that uses the reused inode number X, while old file A is > > > > still in fuse dentry cache using the older generation of real inode > > > > number X which is still in fuse inode cache. > > > > > > > > Now the server knows that the real inode has been rused, because > > > > the server caches the old generation value, but it cannot reply to > > > > the lookup request before the old fuse inode has been invalidated. > > > > IIUC, fuse_lowlevel_notify_inval_inode() is not enough(?). > > > > We would also need to change fuse_dentry_revalidate() to > > > > detect the case of reused/invalidated inode. > > > > > > > > The straightforward way I can think of is to store inode generation > > > > in fuse_dentry. It won't even grow the size of the struct. > > > > > > > > Am I over complicating this? > > > > > > In this scheme the generation number is already embedded in the file > > > handle. If LOOKUP_HANDLE returns a nodeid that can be found in the > > > icache, but which doesn't match the new file handle, then the old > > > inode will be marked bad and a new one allocated. > > > > > > Does that answer your worries? Or am I missing something? > > > > It affirms my understanding of the future implementation, but > > does not help my implementation without protocol changes. > > I thought I could get away without LOOKUP_HANDLE for > > underlying fs that is able to resolve by ino, but seems that I still have an > > unhandled corner case, so will need to add some kernel patch. > > Unless there is already a way to signal from server to make the > > inode bad in a synchronous manner (I did not find any) before > > replying to LOOKUP with a new generation of the same ino. > > > > Any idea about the timeline for LOOKUP_HANDLE? > > I may be able to pick this up myself if there is no one actively > > working on it or plans for anyone to make this happen. > > AFAIK, right now max is not actively looking into LOOKUP_HANDLE. > > To solve the issue of virtiofs server having too many fds open, he > is now planning to store corresonding file handle in server and use > that to open fd later. > > But this does not help with persistent file handle issue for fuse > client. > I see. Yes that should work, but he'd still need to cope with reused inode numbers in case you allow unlinks from the host (do you?), because LOOKUP can find a host fs inode that does not match the file handle of a previously found inode of the same ino. Quoting Miklos' response above: > > > If LOOKUP_HANDLE returns a nodeid that can be found in the > > > icache, but which doesn't match the new file handle, then the old > > > inode will be marked bad and a new one allocated. This statement, with minor adjustments is also true for LOOKUP: "If LOOKUP returns a nodeid that can be found in the icache, but whose i_generation doesn't match the generation returned in outarg, then the old inode should be marked bad and a new one allocated." > BTW, one concern with file handles coming from guest kernel was that > how to trust those handles. Guest can create anything and use > file server to open the files on same filesystem (but not shared > with guest). > > I am assuming same concern should be there with non-virtiofs use > cases. Regular fuse client must be sending a file handle and > file server is running with CAP_DAC_READ_SEARCH. How will it make > sure that client is not able to access files not exported through > shared directory but are present on same filesystem. > That is a concern. It's the same concern for NFS clients that can guess file handles. The ways to address this concern with NFS is the export option subtree_check, but that uses non unique file handles to an inode which include a parent handle, so that's probably not a good fit for LOOKUP_HANDLE. If there are no plans to go forward with LOOKUP_HANDLE, I may respin my protosal to add ENCODE/DECODE to FUSE protocol in place of the lookup(".") command. Thanks, Amir. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: virtiofs uuid and file handles 2021-06-01 15:42 ` Amir Goldstein @ 2021-06-01 16:08 ` Max Reitz 2021-06-01 18:23 ` Amir Goldstein 0 siblings, 1 reply; 22+ messages in thread From: Max Reitz @ 2021-06-01 16:08 UTC (permalink / raw) To: Amir Goldstein, Vivek Goyal; +Cc: Miklos Szeredi, overlayfs, linux-fsdevel On 01.06.21 17:42, Amir Goldstein wrote: > On Tue, Jun 1, 2021 at 5:49 PM Vivek Goyal <vgoyal@redhat.com> wrote: >> On Mon, May 31, 2021 at 09:12:59PM +0300, Amir Goldstein wrote: >>> On Mon, May 31, 2021 at 5:11 PM Miklos Szeredi <miklos@szeredi.hu> wrote: >>>> On Sat, 29 May 2021 at 18:05, Amir Goldstein <amir73il@gmail.com> wrote: >>>>> On Wed, Sep 23, 2020 at 2:12 PM Miklos Szeredi <miklos@szeredi.hu> wrote: >>>>>> On Wed, Sep 23, 2020 at 11:57 AM Amir Goldstein <amir73il@gmail.com> wrote: >>>>>>> On Wed, Sep 23, 2020 at 10:44 AM Miklos Szeredi <miklos@szeredi.hu> wrote: >>>>>>>> On Wed, Sep 23, 2020 at 4:49 AM Amir Goldstein <amir73il@gmail.com> wrote: >>>>>>>> >>>>>>>>> I think that the proper was to implement reliable persistent file >>>>>>>>> handles in fuse/virtiofs would be to add ENCODE/DECODE to >>>>>>>>> FUSE protocol and allow the server to handle this. >>>>>>>> Max Reitz (Cc-d) is currently looking into this. >>>>>>>> >>>>>>>> One proposal was to add LOOKUP_HANDLE operation that is similar to >>>>>>>> LOOKUP except it takes a {variable length handle, name} as input and >>>>>>>> returns a variable length handle *and* a u64 node_id that can be used >>>>>>>> normally for all other operations. >>>>>>>> >>>>> Miklos, Max, >>>>> >>>>> Any updates on LOOKUP_HANDLE work? Unfortunately not :( >>>>>>>> The advantage of such a scheme for virtio-fs (and possibly other fuse >>>>>>>> based fs) would be that userspace need not keep a refcounted object >>>>>>>> around until the kernel sends a FORGET, but can prune its node ID >>>>>>>> based cache at any time. If that happens and a request from the >>>>>>>> client (kernel) comes in with a stale node ID, the server will return >>>>>>>> -ESTALE and the client can ask for a new node ID with a special >>>>>>>> lookup_handle(fh, NULL). >>>>>>>> >>>>>>>> Disadvantages being: >>>>>>>> >>>>>>>> - cost of generating a file handle on all lookups >>>>>>> I never ran into a local fs implementation where this was expensive. >>>>>>> >>>>>>>> - cost of storing file handle in kernel icache >>>>>>>> >>>>>>>> I don't think either of those are problematic in the virtiofs case. >>>>>>>> The cost of having to keep fds open while the client has them in its >>>>>>>> cache is much higher. >>>>>>>> >>>>>>> Sounds good. >>>>>>> I suppose flock() does need to keep the open fd on server. >>>>>> Open files are a separate issue and do need an active object in the server. >>>>>> >>>>>> The issue this solves is synchronizing "released" and "evicted" >>>>>> states of objects between server and client. I.e. when a file is >>>>>> closed (and no more open files exist referencing the same object) the >>>>>> dentry refcount goes to zero but it remains in the cache. In this >>>>>> state the server could really evict it's own cached object, but can't >>>>>> because the client can gain an active reference at any time via >>>>>> cached path lookup. >>>>>> >>>>>> One other solution would be for the server to send a notification >>>>>> (NOTIFY_EVICT) that would try to clean out the object from the server >>>>>> cache and respond with a FORGET if successful. But I sort of like >>>>>> the file handle one better, since it solves multiple problems. >>>>>> >>>>> Even with LOOKUP_HANDLE, I am struggling to understand how we >>>>> intend to invalidate all fuse dentries referring to ino X in case the server >>>>> replies with reused ino X with a different generation that the one stored >>>>> in fuse inode cache. >>>>> >>>>> This is an issue that I encountered when running the passthrough_hp test, >>>>> on my filesystem. In tst_readdir_big() for example, underlying files are being >>>>> unlinked and new files created reusing the old inode numbers. >>>>> >>>>> This creates a situation where server gets a lookup request >>>>> for file B that uses the reused inode number X, while old file A is >>>>> still in fuse dentry cache using the older generation of real inode >>>>> number X which is still in fuse inode cache. >>>>> >>>>> Now the server knows that the real inode has been rused, because >>>>> the server caches the old generation value, but it cannot reply to >>>>> the lookup request before the old fuse inode has been invalidated. >>>>> IIUC, fuse_lowlevel_notify_inval_inode() is not enough(?). >>>>> We would also need to change fuse_dentry_revalidate() to >>>>> detect the case of reused/invalidated inode. >>>>> >>>>> The straightforward way I can think of is to store inode generation >>>>> in fuse_dentry. It won't even grow the size of the struct. >>>>> >>>>> Am I over complicating this? >>>> In this scheme the generation number is already embedded in the file >>>> handle. If LOOKUP_HANDLE returns a nodeid that can be found in the >>>> icache, but which doesn't match the new file handle, then the old >>>> inode will be marked bad and a new one allocated. >>>> >>>> Does that answer your worries? Or am I missing something? >>> It affirms my understanding of the future implementation, but >>> does not help my implementation without protocol changes. >>> I thought I could get away without LOOKUP_HANDLE for >>> underlying fs that is able to resolve by ino, but seems that I still have an >>> unhandled corner case, so will need to add some kernel patch. >>> Unless there is already a way to signal from server to make the >>> inode bad in a synchronous manner (I did not find any) before >>> replying to LOOKUP with a new generation of the same ino. >>> >>> Any idea about the timeline for LOOKUP_HANDLE? >>> I may be able to pick this up myself if there is no one actively >>> working on it or plans for anyone to make this happen. >> AFAIK, right now max is not actively looking into LOOKUP_HANDLE. >> >> To solve the issue of virtiofs server having too many fds open, he >> is now planning to store corresonding file handle in server and use >> that to open fd later. Yes, that’s right. Initially, I had hoped these things could tie into each other, but it turns out they’re largely separate issue, so for now I’m only working on replacing our O_PATH fds by file handles. >> But this does not help with persistent file handle issue for fuse >> client. >> > I see. Yes that should work, but he'd still need to cope with reused > inode numbers in case you allow unlinks from the host (do you?), > because LOOKUP can find a host fs inode that does not match > the file handle of a previously found inode of the same ino. That’s indeed an issue. My current approach is to use the file handle (if available) as the key for lookups, so that the generation ID is included. Right now, we use st_ino+st_dev+mnt_id as the key. st_dev is just a fallback for the mount ID, basically, so what we’d really need is inode ID + generation ID + mount ID, and that’s basically the file handle + mount ID. So different generation IDs will lead to lookup finding/creating a different inode object (lo_inode in C virtiofsd, InodeData in virtiofsd-rs), and thus returning different fuse_ino IDs to the guest. (See also: https://gitlab.com/mreitz/virtiofsd-rs/-/blob/handles-for-inodes-v4/src/passthrough/mod.rs#L594) > Quoting Miklos' response above: >>>> If LOOKUP_HANDLE returns a nodeid that can be found in the >>>> icache, but which doesn't match the new file handle, then the old >>>> inode will be marked bad and a new one allocated. > This statement, with minor adjustments is also true for LOOKUP: > > "If LOOKUP returns a nodeid that can be found in the icache, but > whose i_generation doesn't match the generation returned in outarg, > then the old inode should be marked bad and a new one allocated." > >> BTW, one concern with file handles coming from guest kernel was that >> how to trust those handles. Guest can create anything and use >> file server to open the files on same filesystem (but not shared >> with guest). >> >> I am assuming same concern should be there with non-virtiofs use >> cases. Regular fuse client must be sending a file handle and >> file server is running with CAP_DAC_READ_SEARCH. How will it make >> sure that client is not able to access files not exported through >> shared directory but are present on same filesystem. >> > That is a concern. > It's the same concern for NFS clients that can guess file handles. > > The ways to address this concern with NFS is the export option > subtree_check, but that uses non unique file handles to an inode > which include a parent handle, so that's probably not a good fit for > LOOKUP_HANDLE. There was a mail thread on the topic of securing file handles in March: https://listman.redhat.com/archives/virtio-fs/2021-March/msg00022.html The problem I see with the subtree_check option is that file handles are invalidated when files are moved to a different parent. So far the consensus was to append a MAC to file handles, generated with some secret key, so that only file handles that have been generated before can later be opened. Ideally, that MAC would be managed by the kernel, so that we could allow virtiofsd to use such MAC-ed file handles even when it doesn’t have CAP_DAC_READ_SEARCH. Max ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: virtiofs uuid and file handles 2021-06-01 16:08 ` Max Reitz @ 2021-06-01 18:23 ` Amir Goldstein 0 siblings, 0 replies; 22+ messages in thread From: Amir Goldstein @ 2021-06-01 18:23 UTC (permalink / raw) To: Max Reitz; +Cc: Vivek Goyal, Miklos Szeredi, overlayfs, linux-fsdevel > >> But this does not help with persistent file handle issue for fuse > >> client. > >> > > I see. Yes that should work, but he'd still need to cope with reused > > inode numbers in case you allow unlinks from the host (do you?), > > because LOOKUP can find a host fs inode that does not match > > the file handle of a previously found inode of the same ino. > > That’s indeed an issue. My current approach is to use the file handle > (if available) as the key for lookups, so that the generation ID is > included. > > Right now, we use st_ino+st_dev+mnt_id as the key. st_dev is just a > fallback for the mount ID, basically, so what we’d really need is inode > ID + generation ID + mount ID, and that’s basically the file handle + > mount ID. So different generation IDs will lead to lookup > finding/creating a different inode object (lo_inode in C virtiofsd, > InodeData in virtiofsd-rs), and thus returning different fuse_ino IDs to > the guest. > > (See also: > https://gitlab.com/mreitz/virtiofsd-rs/-/blob/handles-for-inodes-v4/src/passthrough/mod.rs#L594) > I see, because you do not require persistent inode numbers. That makes sense if you do not need to export file handles to NFS and if you are not evicting inode objects from the server inodes map. Please keep me posted if there are any updates on LOOKUP_HANDLE. Thanks, Amir. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Persistent FUSE file handles (Was: virtiofs uuid and file handles) 2020-09-23 7:44 ` Miklos Szeredi 2020-09-23 9:56 ` Amir Goldstein @ 2022-09-11 10:14 ` Amir Goldstein 2022-09-11 15:16 ` Bernd Schubert 2022-09-12 13:16 ` Vivek Goyal 1 sibling, 2 replies; 22+ messages in thread From: Amir Goldstein @ 2022-09-11 10:14 UTC (permalink / raw) To: Miklos Szeredi; +Cc: Vivek Goyal, linux-fsdevel, Hanna Reitz On Wed, Sep 23, 2020 at 10:44 AM Miklos Szeredi <miklos@szeredi.hu> wrote: > > One proposal was to add LOOKUP_HANDLE operation that is similar to > LOOKUP except it takes a {variable length handle, name} as input and > returns a variable length handle *and* a u64 node_id that can be used > normally for all other operations. > > The advantage of such a scheme for virtio-fs (and possibly other fuse > based fs) would be that userspace need not keep a refcounted object > around until the kernel sends a FORGET, but can prune its node ID > based cache at any time. If that happens and a request from the > client (kernel) comes in with a stale node ID, the server will return > -ESTALE and the client can ask for a new node ID with a special > lookup_handle(fh, NULL). > > Disadvantages being: > > - cost of generating a file handle on all lookups > - cost of storing file handle in kernel icache > > I don't think either of those are problematic in the virtiofs case. > The cost of having to keep fds open while the client has them in its > cache is much higher. > I was thinking of taking a stab at LOOKUP_HANDLE for a generic implementation of persistent file handles for FUSE. The purpose is "proper" NFS export support for FUSE. "proper" being survives server restart. I realize there is an ongoing effort to use file handles in the virtiofsd instead of open fds and that LOOKUP_HANDLE could assist in that effort, but that is an added benefit. I have a C++ implementation [1] which sort of supports persistent file handles, but not in a generic manner. If anyone knows of any WIP on LOOKUP_HANDLE please let me know. Thanks, Amir. [1] https://github.com/amir73il/libfuse/commits/fuse_passthrough ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Persistent FUSE file handles (Was: virtiofs uuid and file handles) 2022-09-11 10:14 ` Persistent FUSE file handles (Was: virtiofs uuid and file handles) Amir Goldstein @ 2022-09-11 15:16 ` Bernd Schubert 2022-09-11 15:29 ` Amir Goldstein 2022-09-12 13:16 ` Vivek Goyal 1 sibling, 1 reply; 22+ messages in thread From: Bernd Schubert @ 2022-09-11 15:16 UTC (permalink / raw) To: Amir Goldstein, Miklos Szeredi; +Cc: Vivek Goyal, linux-fsdevel, Hanna Reitz On 9/11/22 12:14, Amir Goldstein wrote: > On Wed, Sep 23, 2020 at 10:44 AM Miklos Szeredi <miklos@szeredi.hu> wrote: >> >> One proposal was to add LOOKUP_HANDLE operation that is similar to >> LOOKUP except it takes a {variable length handle, name} as input and >> returns a variable length handle *and* a u64 node_id that can be used >> normally for all other operations. >> >> The advantage of such a scheme for virtio-fs (and possibly other fuse >> based fs) would be that userspace need not keep a refcounted object >> around until the kernel sends a FORGET, but can prune its node ID >> based cache at any time. If that happens and a request from the >> client (kernel) comes in with a stale node ID, the server will return >> -ESTALE and the client can ask for a new node ID with a special >> lookup_handle(fh, NULL). >> >> Disadvantages being: >> >> - cost of generating a file handle on all lookups >> - cost of storing file handle in kernel icache >> >> I don't think either of those are problematic in the virtiofs case. >> The cost of having to keep fds open while the client has them in its >> cache is much higher. >> > > I was thinking of taking a stab at LOOKUP_HANDLE for a generic > implementation of persistent file handles for FUSE. > > The purpose is "proper" NFS export support for FUSE. > "proper" being survives server restart. Wouldn't fuse just need to get struct export_operations additions to encode and decode handles? > > I realize there is an ongoing effort to use file handles in the virtiofsd > instead of open fds and that LOOKUP_HANDLE could assist in that > effort, but that is an added benefit. > > I have a C++ implementation [1] which sort of supports persistent > file handles, but not in a generic manner. How does this interact with exportfs? > > If anyone knows of any WIP on LOOKUP_HANDLE please let me know. > > Thanks, > Amir. > > [1] https://github.com/amir73il/libfuse/commits/fuse_passthrough Thanks, Bernd ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Persistent FUSE file handles (Was: virtiofs uuid and file handles) 2022-09-11 15:16 ` Bernd Schubert @ 2022-09-11 15:29 ` Amir Goldstein 2022-09-11 15:55 ` Bernd Schubert 0 siblings, 1 reply; 22+ messages in thread From: Amir Goldstein @ 2022-09-11 15:29 UTC (permalink / raw) To: Bernd Schubert; +Cc: Miklos Szeredi, Vivek Goyal, linux-fsdevel, Hanna Reitz On Sun, Sep 11, 2022 at 6:16 PM Bernd Schubert <bernd.schubert@fastmail.fm> wrote: > > > > On 9/11/22 12:14, Amir Goldstein wrote: > > On Wed, Sep 23, 2020 at 10:44 AM Miklos Szeredi <miklos@szeredi.hu> wrote: > >> > >> One proposal was to add LOOKUP_HANDLE operation that is similar to > >> LOOKUP except it takes a {variable length handle, name} as input and > >> returns a variable length handle *and* a u64 node_id that can be used > >> normally for all other operations. > >> > >> The advantage of such a scheme for virtio-fs (and possibly other fuse > >> based fs) would be that userspace need not keep a refcounted object > >> around until the kernel sends a FORGET, but can prune its node ID > >> based cache at any time. If that happens and a request from the > >> client (kernel) comes in with a stale node ID, the server will return > >> -ESTALE and the client can ask for a new node ID with a special > >> lookup_handle(fh, NULL). > >> > >> Disadvantages being: > >> > >> - cost of generating a file handle on all lookups > >> - cost of storing file handle in kernel icache > >> > >> I don't think either of those are problematic in the virtiofs case. > >> The cost of having to keep fds open while the client has them in its > >> cache is much higher. > >> > > > > I was thinking of taking a stab at LOOKUP_HANDLE for a generic > > implementation of persistent file handles for FUSE. > > > > The purpose is "proper" NFS export support for FUSE. > > "proper" being survives server restart. > > Wouldn't fuse just need to get struct export_operations additions to > encode and decode handles? > FUSE already implements those, but not in a "proper" way, because there is nothing guaranteeing the persistence of the FUSE file handles. As a result, when exporting some FUSE fs to NFS and the server is restarted, NFS client may read a file A and get the content of file B, because after server restart, FUSE file B got the node id that file A had before restart. This is not a hypothetical use case, I have seen this happen. > > > > I realize there is an ongoing effort to use file handles in the virtiofsd > > instead of open fds and that LOOKUP_HANDLE could assist in that > > effort, but that is an added benefit. > > > > I have a C++ implementation [1] which sort of supports persistent > > file handles, but not in a generic manner. > > How does this interact with exportfs? > It makes use of internal fs knowledge to encode/decode ext4/xfs file handles into the 64bit FUSE node id. This sort of works, but it is not generic. Thanks, Amir. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Persistent FUSE file handles (Was: virtiofs uuid and file handles) 2022-09-11 15:29 ` Amir Goldstein @ 2022-09-11 15:55 ` Bernd Schubert 0 siblings, 0 replies; 22+ messages in thread From: Bernd Schubert @ 2022-09-11 15:55 UTC (permalink / raw) To: Amir Goldstein; +Cc: Miklos Szeredi, Vivek Goyal, linux-fsdevel, Hanna Reitz On 9/11/22 17:29, Amir Goldstein wrote: > On Sun, Sep 11, 2022 at 6:16 PM Bernd Schubert > <bernd.schubert@fastmail.fm> wrote: >> >> >> >> On 9/11/22 12:14, Amir Goldstein wrote: >>> On Wed, Sep 23, 2020 at 10:44 AM Miklos Szeredi <miklos@szeredi.hu> wrote: >>>> >>>> One proposal was to add LOOKUP_HANDLE operation that is similar to >>>> LOOKUP except it takes a {variable length handle, name} as input and >>>> returns a variable length handle *and* a u64 node_id that can be used >>>> normally for all other operations. >>>> >>>> The advantage of such a scheme for virtio-fs (and possibly other fuse >>>> based fs) would be that userspace need not keep a refcounted object >>>> around until the kernel sends a FORGET, but can prune its node ID >>>> based cache at any time. If that happens and a request from the >>>> client (kernel) comes in with a stale node ID, the server will return >>>> -ESTALE and the client can ask for a new node ID with a special >>>> lookup_handle(fh, NULL). >>>> >>>> Disadvantages being: >>>> >>>> - cost of generating a file handle on all lookups >>>> - cost of storing file handle in kernel icache >>>> >>>> I don't think either of those are problematic in the virtiofs case. >>>> The cost of having to keep fds open while the client has them in its >>>> cache is much higher. >>>> >>> >>> I was thinking of taking a stab at LOOKUP_HANDLE for a generic >>> implementation of persistent file handles for FUSE. >>> >>> The purpose is "proper" NFS export support for FUSE. >>> "proper" being survives server restart. >> >> Wouldn't fuse just need to get struct export_operations additions to >> encode and decode handles? >> > > FUSE already implements those, but not in a "proper" way, because > there is nothing guaranteeing the persistence of the FUSE file handles. > > As a result, when exporting some FUSE fs to NFS and the server is > restarted, NFS client may read a file A and get the content of file B, > because after server restart, FUSE file B got the node id that file A > had before restart. Sorry, right, it is just not passed through to user space. For the file systems I'm working on that doesn't work at all. > > This is not a hypothetical use case, I have seen this happen. > >>> >>> I realize there is an ongoing effort to use file handles in the virtiofsd >>> instead of open fds and that LOOKUP_HANDLE could assist in that >>> effort, but that is an added benefit. >>> >>> I have a C++ implementation [1] which sort of supports persistent >>> file handles, but not in a generic manner. >> >> How does this interact with exportfs? >> > > It makes use of internal fs knowledge to encode/decode ext4/xfs > file handles into the 64bit FUSE node id. > > This sort of works, but it is not generic. Hmm, I guess I need to look at it. Thanks, Bernd ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Persistent FUSE file handles (Was: virtiofs uuid and file handles) 2022-09-11 10:14 ` Persistent FUSE file handles (Was: virtiofs uuid and file handles) Amir Goldstein 2022-09-11 15:16 ` Bernd Schubert @ 2022-09-12 13:16 ` Vivek Goyal 2022-09-12 13:38 ` Amir Goldstein 1 sibling, 1 reply; 22+ messages in thread From: Vivek Goyal @ 2022-09-12 13:16 UTC (permalink / raw) To: Amir Goldstein; +Cc: Miklos Szeredi, linux-fsdevel, Hanna Reitz On Sun, Sep 11, 2022 at 01:14:49PM +0300, Amir Goldstein wrote: > On Wed, Sep 23, 2020 at 10:44 AM Miklos Szeredi <miklos@szeredi.hu> wrote: > > > > One proposal was to add LOOKUP_HANDLE operation that is similar to > > LOOKUP except it takes a {variable length handle, name} as input and > > returns a variable length handle *and* a u64 node_id that can be used > > normally for all other operations. > > > > The advantage of such a scheme for virtio-fs (and possibly other fuse > > based fs) would be that userspace need not keep a refcounted object > > around until the kernel sends a FORGET, but can prune its node ID > > based cache at any time. If that happens and a request from the > > client (kernel) comes in with a stale node ID, the server will return > > -ESTALE and the client can ask for a new node ID with a special > > lookup_handle(fh, NULL). > > > > Disadvantages being: > > > > - cost of generating a file handle on all lookups > > - cost of storing file handle in kernel icache > > > > I don't think either of those are problematic in the virtiofs case. > > The cost of having to keep fds open while the client has them in its > > cache is much higher. > > > > I was thinking of taking a stab at LOOKUP_HANDLE for a generic > implementation of persistent file handles for FUSE. Hi Amir, I was going throug the proposal above for LOOKUP_HANDLE and I was wondering how nodeid reuse is handled. IOW, if server decides to drop nodeid from its cache and reuse it for some other file, how will we differentiate between two. Some sort of generation id encoded in nodeid? > > The purpose is "proper" NFS export support for FUSE. > "proper" being survives server restart. Sounds like a good enhancement. > > I realize there is an ongoing effort to use file handles in the virtiofsd > instead of open fds and that LOOKUP_HANDLE could assist in that > effort, but that is an added benefit. Using file handles (instead of keeping O_PATH fds open), is now available in virtriofsd. (option --inode-file-handles). Thanks to Hanna for this work. https://gitlab.com/virtio-fs/virtiofsd/-/blob/main/README.md With LOOKUP_HANDLE implemented, we can afford to implement a mixed approach where we can keep O_PATH fd open but if there are too many fds open, then close some of these fds. So this is sort of hybrid approach between fd and file handles. > > I have a C++ implementation [1] which sort of supports persistent > file handles, but not in a generic manner. > > If anyone knows of any WIP on LOOKUP_HANDLE please let me know. I am not aware of any. Will be good if you make it happen upstream :-) Thanks Vivek > > Thanks, > Amir. > > [1] https://github.com/amir73il/libfuse/commits/fuse_passthrough > ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Persistent FUSE file handles (Was: virtiofs uuid and file handles) 2022-09-12 13:16 ` Vivek Goyal @ 2022-09-12 13:38 ` Amir Goldstein 2022-09-12 14:35 ` Vivek Goyal 0 siblings, 1 reply; 22+ messages in thread From: Amir Goldstein @ 2022-09-12 13:38 UTC (permalink / raw) To: Vivek Goyal; +Cc: Miklos Szeredi, linux-fsdevel, Hanna Reitz On Mon, Sep 12, 2022 at 4:16 PM Vivek Goyal <vgoyal@redhat.com> wrote: > > On Sun, Sep 11, 2022 at 01:14:49PM +0300, Amir Goldstein wrote: > > On Wed, Sep 23, 2020 at 10:44 AM Miklos Szeredi <miklos@szeredi.hu> wrote: > > > > > > One proposal was to add LOOKUP_HANDLE operation that is similar to > > > LOOKUP except it takes a {variable length handle, name} as input and > > > returns a variable length handle *and* a u64 node_id that can be used > > > normally for all other operations. > > > > > > The advantage of such a scheme for virtio-fs (and possibly other fuse > > > based fs) would be that userspace need not keep a refcounted object > > > around until the kernel sends a FORGET, but can prune its node ID > > > based cache at any time. If that happens and a request from the > > > client (kernel) comes in with a stale node ID, the server will return > > > -ESTALE and the client can ask for a new node ID with a special > > > lookup_handle(fh, NULL). > > > > > > Disadvantages being: > > > > > > - cost of generating a file handle on all lookups > > > - cost of storing file handle in kernel icache > > > > > > I don't think either of those are problematic in the virtiofs case. > > > The cost of having to keep fds open while the client has them in its > > > cache is much higher. > > > > > > > I was thinking of taking a stab at LOOKUP_HANDLE for a generic > > implementation of persistent file handles for FUSE. > > Hi Amir, > > I was going throug the proposal above for LOOKUP_HANDLE and I was > wondering how nodeid reuse is handled. LOOKUP_HANDLE extends the 64bit node id to be variable size id. A server that declares support for LOOKUP_HANDLE must never reuse a handle. That's the basic idea. Just as a filesystem that declares to support exportfs must never reuse a file handle. > IOW, if server decides to drop > nodeid from its cache and reuse it for some other file, how will we > differentiate between two. Some sort of generation id encoded in > nodeid? > That's usually the way that file handles are implemented in local fs. The inode number is the internal lookup index and the generation part is advanced on reuse. But for passthrough fs like virtiofsd, the LOOKUP_HANDLE will just use the native fs file handles, so virtiofsd can evict the inodes entry from its cache completely, not only close the open fds. That is what my libfuse_passthough POC does. Thanks, Amir. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Persistent FUSE file handles (Was: virtiofs uuid and file handles) 2022-09-12 13:38 ` Amir Goldstein @ 2022-09-12 14:35 ` Vivek Goyal 2022-09-12 15:07 ` Amir Goldstein 0 siblings, 1 reply; 22+ messages in thread From: Vivek Goyal @ 2022-09-12 14:35 UTC (permalink / raw) To: Amir Goldstein; +Cc: Miklos Szeredi, linux-fsdevel, Hanna Reitz On Mon, Sep 12, 2022 at 04:38:48PM +0300, Amir Goldstein wrote: > On Mon, Sep 12, 2022 at 4:16 PM Vivek Goyal <vgoyal@redhat.com> wrote: > > > > On Sun, Sep 11, 2022 at 01:14:49PM +0300, Amir Goldstein wrote: > > > On Wed, Sep 23, 2020 at 10:44 AM Miklos Szeredi <miklos@szeredi.hu> wrote: > > > > > > > > One proposal was to add LOOKUP_HANDLE operation that is similar to > > > > LOOKUP except it takes a {variable length handle, name} as input and > > > > returns a variable length handle *and* a u64 node_id that can be used > > > > normally for all other operations. > > > > > > > > The advantage of such a scheme for virtio-fs (and possibly other fuse > > > > based fs) would be that userspace need not keep a refcounted object > > > > around until the kernel sends a FORGET, but can prune its node ID > > > > based cache at any time. If that happens and a request from the > > > > client (kernel) comes in with a stale node ID, the server will return > > > > -ESTALE and the client can ask for a new node ID with a special > > > > lookup_handle(fh, NULL). > > > > > > > > Disadvantages being: > > > > > > > > - cost of generating a file handle on all lookups > > > > - cost of storing file handle in kernel icache > > > > > > > > I don't think either of those are problematic in the virtiofs case. > > > > The cost of having to keep fds open while the client has them in its > > > > cache is much higher. > > > > > > > > > > I was thinking of taking a stab at LOOKUP_HANDLE for a generic > > > implementation of persistent file handles for FUSE. > > > > Hi Amir, > > > > I was going throug the proposal above for LOOKUP_HANDLE and I was > > wondering how nodeid reuse is handled. > > LOOKUP_HANDLE extends the 64bit node id to be variable size id. Ok. So this variable size id is basically file handle returned by host? So this looks little different from what Miklos had suggested. IIUC, he wanted LOOKUP_HANDLE to return both file handle as well as *node id*. ********************************* One proposal was to add LOOKUP_HANDLE operation that is similar to LOOKUP except it takes a {variable length handle, name} as input and returns a variable length handle *and* a u64 node_id that can be used normally for all other operations. *************************************** > A server that declares support for LOOKUP_HANDLE must never > reuse a handle. > > That's the basic idea. Just as a filesystem that declares to support > exportfs must never reuse a file handle. > > > IOW, if server decides to drop > > nodeid from its cache and reuse it for some other file, how will we > > differentiate between two. Some sort of generation id encoded in > > nodeid? > > > > That's usually the way that file handles are implemented in > local fs. The inode number is the internal lookup index and the > generation part is advanced on reuse. > > But for passthrough fs like virtiofsd, the LOOKUP_HANDLE will > just use the native fs file handles, so virtiofsd can evict the inodes > entry from its cache completely, not only close the open fds. Ok, got it. Will be interesting to see how kernel fuse changes look to accomodate this variable sized nodeid. > > That is what my libfuse_passthough POC does. Where have you hosted corresponding kernel changes? Thanks Vivek ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Persistent FUSE file handles (Was: virtiofs uuid and file handles) 2022-09-12 14:35 ` Vivek Goyal @ 2022-09-12 15:07 ` Amir Goldstein 2022-09-12 19:56 ` Vivek Goyal 0 siblings, 1 reply; 22+ messages in thread From: Amir Goldstein @ 2022-09-12 15:07 UTC (permalink / raw) To: Vivek Goyal; +Cc: Miklos Szeredi, linux-fsdevel, Hanna Reitz On Mon, Sep 12, 2022 at 5:35 PM Vivek Goyal <vgoyal@redhat.com> wrote: > > On Mon, Sep 12, 2022 at 04:38:48PM +0300, Amir Goldstein wrote: > > On Mon, Sep 12, 2022 at 4:16 PM Vivek Goyal <vgoyal@redhat.com> wrote: > > > > > > On Sun, Sep 11, 2022 at 01:14:49PM +0300, Amir Goldstein wrote: > > > > On Wed, Sep 23, 2020 at 10:44 AM Miklos Szeredi <miklos@szeredi.hu> wrote: > > > > > > > > > > One proposal was to add LOOKUP_HANDLE operation that is similar to > > > > > LOOKUP except it takes a {variable length handle, name} as input and > > > > > returns a variable length handle *and* a u64 node_id that can be used > > > > > normally for all other operations. > > > > > > > > > > The advantage of such a scheme for virtio-fs (and possibly other fuse > > > > > based fs) would be that userspace need not keep a refcounted object > > > > > around until the kernel sends a FORGET, but can prune its node ID > > > > > based cache at any time. If that happens and a request from the > > > > > client (kernel) comes in with a stale node ID, the server will return > > > > > -ESTALE and the client can ask for a new node ID with a special > > > > > lookup_handle(fh, NULL). > > > > > > > > > > Disadvantages being: > > > > > > > > > > - cost of generating a file handle on all lookups > > > > > - cost of storing file handle in kernel icache > > > > > > > > > > I don't think either of those are problematic in the virtiofs case. > > > > > The cost of having to keep fds open while the client has them in its > > > > > cache is much higher. > > > > > > > > > > > > > I was thinking of taking a stab at LOOKUP_HANDLE for a generic > > > > implementation of persistent file handles for FUSE. > > > > > > Hi Amir, > > > > > > I was going throug the proposal above for LOOKUP_HANDLE and I was > > > wondering how nodeid reuse is handled. > > > > LOOKUP_HANDLE extends the 64bit node id to be variable size id. > > Ok. So this variable size id is basically file handle returned by > host? > > So this looks little different from what Miklos had suggested. IIUC, > he wanted LOOKUP_HANDLE to return both file handle as well as *node id*. > > ********************************* > One proposal was to add LOOKUP_HANDLE operation that is similar to > LOOKUP except it takes a {variable length handle, name} as input and > returns a variable length handle *and* a u64 node_id that can be used > normally for all other operations. > *************************************** > Ha! Thanks for reminding me about that. It's been a while since I looked at what actually needs to be done. That means that evicting server inodes from cache may not be as easy as I had imagined. > > A server that declares support for LOOKUP_HANDLE must never > > reuse a handle. > > > > That's the basic idea. Just as a filesystem that declares to support > > exportfs must never reuse a file handle. > > > > > > IOW, if server decides to drop > > > nodeid from its cache and reuse it for some other file, how will we > > > differentiate between two. Some sort of generation id encoded in > > > nodeid? > > > > > > > That's usually the way that file handles are implemented in > > local fs. The inode number is the internal lookup index and the > > generation part is advanced on reuse. > > > > But for passthrough fs like virtiofsd, the LOOKUP_HANDLE will > > just use the native fs file handles, so virtiofsd can evict the inodes > > entry from its cache completely, not only close the open fds. > > Ok, got it. Will be interesting to see how kernel fuse changes look > to accomodate this variable sized nodeid. > It may make sense to have a FUSE protocol dialect where nodeid is variable size for all commands, but it probably won't be part of the initial LOOKUP_HANDLE work. > > > > That is what my libfuse_passthough POC does. > > Where have you hosted corresponding kernel changes? > There are no kernel changes. For xfs and ext4 I know how to implement open_by_ino() and I know how to parse the opaque fs file handle to extract ino+generation from it and return them in FUSE_LOOKUP response. So I could manage to implement persistent NFS file handles over the existing FUSE protocol with 64bit node id. Thanks, Amir. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Persistent FUSE file handles (Was: virtiofs uuid and file handles) 2022-09-12 15:07 ` Amir Goldstein @ 2022-09-12 19:56 ` Vivek Goyal 2022-09-13 2:07 ` Amir Goldstein 0 siblings, 1 reply; 22+ messages in thread From: Vivek Goyal @ 2022-09-12 19:56 UTC (permalink / raw) To: Amir Goldstein; +Cc: Miklos Szeredi, linux-fsdevel, Hanna Reitz On Mon, Sep 12, 2022 at 06:07:42PM +0300, Amir Goldstein wrote: > On Mon, Sep 12, 2022 at 5:35 PM Vivek Goyal <vgoyal@redhat.com> wrote: > > > > On Mon, Sep 12, 2022 at 04:38:48PM +0300, Amir Goldstein wrote: > > > On Mon, Sep 12, 2022 at 4:16 PM Vivek Goyal <vgoyal@redhat.com> wrote: > > > > > > > > On Sun, Sep 11, 2022 at 01:14:49PM +0300, Amir Goldstein wrote: > > > > > On Wed, Sep 23, 2020 at 10:44 AM Miklos Szeredi <miklos@szeredi.hu> wrote: > > > > > > > > > > > > One proposal was to add LOOKUP_HANDLE operation that is similar to > > > > > > LOOKUP except it takes a {variable length handle, name} as input and > > > > > > returns a variable length handle *and* a u64 node_id that can be used > > > > > > normally for all other operations. > > > > > > > > > > > > The advantage of such a scheme for virtio-fs (and possibly other fuse > > > > > > based fs) would be that userspace need not keep a refcounted object > > > > > > around until the kernel sends a FORGET, but can prune its node ID > > > > > > based cache at any time. If that happens and a request from the > > > > > > client (kernel) comes in with a stale node ID, the server will return > > > > > > -ESTALE and the client can ask for a new node ID with a special > > > > > > lookup_handle(fh, NULL). > > > > > > > > > > > > Disadvantages being: > > > > > > > > > > > > - cost of generating a file handle on all lookups > > > > > > - cost of storing file handle in kernel icache > > > > > > > > > > > > I don't think either of those are problematic in the virtiofs case. > > > > > > The cost of having to keep fds open while the client has them in its > > > > > > cache is much higher. > > > > > > > > > > > > > > > > I was thinking of taking a stab at LOOKUP_HANDLE for a generic > > > > > implementation of persistent file handles for FUSE. > > > > > > > > Hi Amir, > > > > > > > > I was going throug the proposal above for LOOKUP_HANDLE and I was > > > > wondering how nodeid reuse is handled. > > > > > > LOOKUP_HANDLE extends the 64bit node id to be variable size id. > > > > Ok. So this variable size id is basically file handle returned by > > host? > > > > So this looks little different from what Miklos had suggested. IIUC, > > he wanted LOOKUP_HANDLE to return both file handle as well as *node id*. > > > > ********************************* > > One proposal was to add LOOKUP_HANDLE operation that is similar to > > LOOKUP except it takes a {variable length handle, name} as input and > > returns a variable length handle *and* a u64 node_id that can be used > > normally for all other operations. > > *************************************** > > > > Ha! Thanks for reminding me about that. > It's been a while since I looked at what actually needs to be done. > That means that evicting server inodes from cache may not be as > easy as I had imagined. > > > > A server that declares support for LOOKUP_HANDLE must never > > > reuse a handle. > > > > > > That's the basic idea. Just as a filesystem that declares to support > > > exportfs must never reuse a file handle. > > > > > > > > > IOW, if server decides to drop > > > > nodeid from its cache and reuse it for some other file, how will we > > > > differentiate between two. Some sort of generation id encoded in > > > > nodeid? > > > > > > > > > > That's usually the way that file handles are implemented in > > > local fs. The inode number is the internal lookup index and the > > > generation part is advanced on reuse. > > > > > > But for passthrough fs like virtiofsd, the LOOKUP_HANDLE will > > > just use the native fs file handles, so virtiofsd can evict the inodes > > > entry from its cache completely, not only close the open fds. > > > > Ok, got it. Will be interesting to see how kernel fuse changes look > > to accomodate this variable sized nodeid. > > > > It may make sense to have a FUSE protocol dialect where nodeid > is variable size for all commands, but it probably won't be part of > the initial LOOKUP_HANDLE work. > > > > > > > That is what my libfuse_passthough POC does. > > > > Where have you hosted corresponding kernel changes? > > > > There are no kernel changes. > > For xfs and ext4 I know how to implement open_by_ino() > and I know how to parse the opaque fs file handle to extract > ino+generation from it and return them in FUSE_LOOKUP > response. Aha, interesting. So is this filesystem specific. Works on xfs/ext4 but not necessarily on other filesystems like nfs. (Because they have their own way of encoding things in file handle). > > So I could manage to implement persistent NFS file handles > over the existing FUSE protocol with 64bit node id. And that explains that why you did not have to make kernel changes. But this does not allow sever to close the fd associate with nodeid? Or there is a way for server to generate file handle and then call open_by_handle_at(). Thanks Vivek ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Persistent FUSE file handles (Was: virtiofs uuid and file handles) 2022-09-12 19:56 ` Vivek Goyal @ 2022-09-13 2:07 ` Amir Goldstein 0 siblings, 0 replies; 22+ messages in thread From: Amir Goldstein @ 2022-09-13 2:07 UTC (permalink / raw) To: Vivek Goyal; +Cc: Miklos Szeredi, linux-fsdevel, Hanna Reitz On Mon, Sep 12, 2022 at 10:56 PM Vivek Goyal <vgoyal@redhat.com> wrote: > > On Mon, Sep 12, 2022 at 06:07:42PM +0300, Amir Goldstein wrote: > > On Mon, Sep 12, 2022 at 5:35 PM Vivek Goyal <vgoyal@redhat.com> wrote: > > > > > > On Mon, Sep 12, 2022 at 04:38:48PM +0300, Amir Goldstein wrote: > > > > On Mon, Sep 12, 2022 at 4:16 PM Vivek Goyal <vgoyal@redhat.com> wrote: > > > > > > > > > > On Sun, Sep 11, 2022 at 01:14:49PM +0300, Amir Goldstein wrote: > > > > > > On Wed, Sep 23, 2020 at 10:44 AM Miklos Szeredi <miklos@szeredi.hu> wrote: > > > > > > > > > > > > > > One proposal was to add LOOKUP_HANDLE operation that is similar to > > > > > > > LOOKUP except it takes a {variable length handle, name} as input and > > > > > > > returns a variable length handle *and* a u64 node_id that can be used > > > > > > > normally for all other operations. > > > > > > > > > > > > > > The advantage of such a scheme for virtio-fs (and possibly other fuse > > > > > > > based fs) would be that userspace need not keep a refcounted object > > > > > > > around until the kernel sends a FORGET, but can prune its node ID > > > > > > > based cache at any time. If that happens and a request from the > > > > > > > client (kernel) comes in with a stale node ID, the server will return > > > > > > > -ESTALE and the client can ask for a new node ID with a special > > > > > > > lookup_handle(fh, NULL). > > > > > > > > > > > > > > Disadvantages being: > > > > > > > > > > > > > > - cost of generating a file handle on all lookups > > > > > > > - cost of storing file handle in kernel icache > > > > > > > > > > > > > > I don't think either of those are problematic in the virtiofs case. > > > > > > > The cost of having to keep fds open while the client has them in its > > > > > > > cache is much higher. > > > > > > > > > > > > > > > > > > > I was thinking of taking a stab at LOOKUP_HANDLE for a generic > > > > > > implementation of persistent file handles for FUSE. > > > > > > > > > > Hi Amir, > > > > > > > > > > I was going throug the proposal above for LOOKUP_HANDLE and I was > > > > > wondering how nodeid reuse is handled. > > > > > > > > LOOKUP_HANDLE extends the 64bit node id to be variable size id. > > > > > > Ok. So this variable size id is basically file handle returned by > > > host? > > > > > > So this looks little different from what Miklos had suggested. IIUC, > > > he wanted LOOKUP_HANDLE to return both file handle as well as *node id*. > > > > > > ********************************* > > > One proposal was to add LOOKUP_HANDLE operation that is similar to > > > LOOKUP except it takes a {variable length handle, name} as input and > > > returns a variable length handle *and* a u64 node_id that can be used > > > normally for all other operations. > > > *************************************** > > > > > > > Ha! Thanks for reminding me about that. > > It's been a while since I looked at what actually needs to be done. > > That means that evicting server inodes from cache may not be as > > easy as I had imagined. > > > > > > A server that declares support for LOOKUP_HANDLE must never > > > > reuse a handle. > > > > > > > > That's the basic idea. Just as a filesystem that declares to support > > > > exportfs must never reuse a file handle. > > > > > > > > > > > > IOW, if server decides to drop > > > > > nodeid from its cache and reuse it for some other file, how will we > > > > > differentiate between two. Some sort of generation id encoded in > > > > > nodeid? > > > > > > > > > > > > > That's usually the way that file handles are implemented in > > > > local fs. The inode number is the internal lookup index and the > > > > generation part is advanced on reuse. > > > > > > > > But for passthrough fs like virtiofsd, the LOOKUP_HANDLE will > > > > just use the native fs file handles, so virtiofsd can evict the inodes > > > > entry from its cache completely, not only close the open fds. > > > > > > Ok, got it. Will be interesting to see how kernel fuse changes look > > > to accomodate this variable sized nodeid. > > > > > > > It may make sense to have a FUSE protocol dialect where nodeid > > is variable size for all commands, but it probably won't be part of > > the initial LOOKUP_HANDLE work. > > > > > > > > > > That is what my libfuse_passthough POC does. > > > > > > Where have you hosted corresponding kernel changes? > > > > > > > There are no kernel changes. > > > > For xfs and ext4 I know how to implement open_by_ino() > > and I know how to parse the opaque fs file handle to extract > > ino+generation from it and return them in FUSE_LOOKUP > > response. > > Aha, interesting. So is this filesystem specific. Works on xfs/ext4 but > not necessarily on other filesystems like nfs. (Because they have their > own way of encoding things in file handle). Correct. If it is not xfs/ext4, the passthrough implementation uses open fds. In the non xfs/ext4 case, an NFS client may get ESTALE errors after server restart and worse - get the content of the wrong object if server has reused nodeid+genration after restart. > > > > > So I could manage to implement persistent NFS file handles > > over the existing FUSE protocol with 64bit node id. > > And that explains that why you did not have to make kernel changes. But > this does not allow sever to close the fd associate with nodeid? Or there > is a way for server to generate file handle and then call > open_by_handle_at(). > Yes, that's what the server does in case of ext4/xfs: https://github.com/amir73il/libfuse/blob/fuse_passthrough/passthrough/fuse_passthrough.cpp#L912 Thanks, Amir. ^ permalink raw reply [flat|nested] 22+ messages in thread
[parent not found: <20200922212534.GH57620@redhat.com>]
[parent not found: <CAOQ4uxjp6NpF_Q0QqUTzE5=YiKz9w6JbUVyROG+rNFcHPAThFg@mail.gmail.com>]
* Re: Copying overlayfs directories with index=on [not found] ` <CAOQ4uxjp6NpF_Q0QqUTzE5=YiKz9w6JbUVyROG+rNFcHPAThFg@mail.gmail.com> @ 2020-09-23 12:53 ` Pavel Tikhomirov 0 siblings, 0 replies; 22+ messages in thread From: Pavel Tikhomirov @ 2020-09-23 12:53 UTC (permalink / raw) To: Amir Goldstein, Vivek Goyal Cc: Pavel Tikhomirov, Miklos Szeredi, linux-unionfs Hi, I've sent a patch which is trying to acheive what Amir had suggested. Please take a look: [PATCH] ovl: introduce new "index=nouuid" option for inodes index feature On 9/23/20 5:10 AM, Amir Goldstein wrote: > On Wed, Sep 23, 2020 at 12:25 AM Vivek Goyal <vgoyal@redhat.com> wrote: >> >> On Tue, Sep 22, 2020 at 02:15:55PM +0300, Amir Goldstein wrote: >> >> [..] >>> >>> No objection, but if I were you I wouldn't bother re-writing new ovl_fh. >>> If you know you don't care about matching uuid in the first place, >>> it is better to add a mount option to overlayfs 'index=nouuid' to relax the >>> uuid comparison check for ovl_fh. >> >> So is it possible that somebody uses "nouuid" and then a different file >> got same file handle (as stored in upper). I think that's one issue >> you were worried about while addressing squashfs fix. IIRC, Miklos had said >> with-in same filesystem it will not happen and across filesystems >> sb->uuid check will ensure this does not happen. IOW, "nouuid" will >> open the possibility of upper file handle matching a different file? >> > > Well, to be accurate, I did write that when cloning a base lower fs (like with > dm-thinp) the problem reported with re-created lower squashfs still exists but > that it is a corner case [1]. > > But what I suggested is that index=nouuid will only be allowed for all layers > on the same fs, where this is not a problem. > > Thanks, > Amir. > > [1] https://lore.kernel.org/linux-unionfs/CAOQ4uxiq7hkaew4LoFZkf4R73iH_pU7OHOriycLCnnywtA0O0w@mail.gmail.com/ > -- Best regards, Tikhomirov Pavel Software Developer, Virtuozzo. ^ permalink raw reply [flat|nested] 22+ messages in thread
end of thread, other threads:[~2022-09-13 2:07 UTC | newest] Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- [not found] <a8828676-210a-99e8-30d7-6076f334ed71@virtuozzo.com> [not found] ` <CAOQ4uxgZ08ePA5WFOYFoLZaq_-Kjr-haNzBN5Aj3MfF=f9pjdg@mail.gmail.com> [not found] ` <1bb71cbf-0a10-34c7-409d-914058e102f6@virtuozzo.com> [not found] ` <CAOQ4uxieqnKENV_kJYwfcnPjNdVuqH3BnKVx_zLz=N_PdAguNg@mail.gmail.com> [not found] ` <dc696835-bbb5-ed4e-8708-bc828d415a2b@virtuozzo.com> [not found] ` <CAOQ4uxg0XVEEzc+HyyC63WWZuA2AsRjJmbZBuNimtj=t+quVyg@mail.gmail.com> [not found] ` <20200922210445.GG57620@redhat.com> 2020-09-23 2:49 ` virtiofs uuid and file handles Amir Goldstein 2020-09-23 7:44 ` Miklos Szeredi 2020-09-23 9:56 ` Amir Goldstein 2020-09-23 11:12 ` Miklos Szeredi 2021-05-29 16:05 ` Amir Goldstein 2021-05-31 14:11 ` Miklos Szeredi 2021-05-31 18:12 ` Amir Goldstein 2021-06-01 14:49 ` Vivek Goyal 2021-06-01 15:42 ` Amir Goldstein 2021-06-01 16:08 ` Max Reitz 2021-06-01 18:23 ` Amir Goldstein 2022-09-11 10:14 ` Persistent FUSE file handles (Was: virtiofs uuid and file handles) Amir Goldstein 2022-09-11 15:16 ` Bernd Schubert 2022-09-11 15:29 ` Amir Goldstein 2022-09-11 15:55 ` Bernd Schubert 2022-09-12 13:16 ` Vivek Goyal 2022-09-12 13:38 ` Amir Goldstein 2022-09-12 14:35 ` Vivek Goyal 2022-09-12 15:07 ` Amir Goldstein 2022-09-12 19:56 ` Vivek Goyal 2022-09-13 2:07 ` Amir Goldstein [not found] ` <20200922212534.GH57620@redhat.com> [not found] ` <CAOQ4uxjp6NpF_Q0QqUTzE5=YiKz9w6JbUVyROG+rNFcHPAThFg@mail.gmail.com> 2020-09-23 12:53 ` Copying overlayfs directories with index=on Pavel Tikhomirov
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.