* Re: [fuse-devel] Proxmox + NFS w/ exported FUSE = EIO
       [not found] <d997c02b-d5ef-41f8-92b6-8c6775899388@spawn.link>
@ 2024-02-06  6:53 ` Amir Goldstein
  2024-02-07  0:08   ` Antonio SJ Musumeci
       [not found]   ` <b9cec6b7-0973-4d61-9bef-120e3c4654d7@spawn.link>
  0 siblings, 2 replies; 23+ messages in thread
From: Amir Goldstein @ 2024-02-06  6:53 UTC (permalink / raw)
  To: Antonio SJ Musumeci
  Cc: fuse-devel, Miklos Szeredi, linux-fsdevel, Bernd Schubert

On Tue, Feb 6, 2024 at 4:52 AM Antonio SJ Musumeci <trapexit@spawn.link> wrote:
>
> Hi,
>
> Does anyone have users exporting a FUSE filesystem over NFS? Particularly
> from Proxmox (a recent release, kernel 6.5.11)? I've received a number of
> reports recently from individuals who have such a setup and, after some
> time (it's not easily reproducible; it usually seems to happen after
> software like Plex or Jellyfin scans media, or after a backup process
> runs), it starts returning EIO errors: not just from NFS but also when
> accessing the FUSE mount directly. One person noted that they had moved
> from Ubuntu 18.04 (kernel 4.15.0) to Proxmox and had no problems on
> Ubuntu with otherwise the same settings.
>
> I've not yet been able to reproduce this issue myself but wanted to see
> if anyone else has run into it. As far as I can tell from what users
> have reported, the FUSE server is still running but isn't receiving most
> requests. I do see evidence of statfs calls coming through, but nothing
> else, though the straces I've received were typically captured after the
> issue started.
>
> In an effort to rule out the FUSE server... is there anything the server
> could do to cause the kernel to return EIO and not forward anything but
> statfs? It doesn't seem to matter whether direct_io is enabled or
> attr/entry caching is used.
>

This could be the outcome of commit 15db16837a35 ("fuse: fix illegal
access to inode with reused nodeid") in kernel v5.14.

It is not an unintended regression: this behavior replaces what would
have been a potentially severe security violation with an EIO error.

As the commit says:
"...With current code, this situation will not be detected and an old fuse
    dentry that used to point to an older generation real inode, can be used to
    access a completely new inode, which should be accessed only via the new
    dentry."

I made this fix after seeing users get the content of a different file
than the one they had opened over NFS!

libfuse commit 10ecd4f ("test/test_syscalls.c: check unlinked testfiles
at the end of the test") reproduces this problem in a test.
The test does not involve NFS export, but NFS export has a higher
likelihood of exposing this issue.

I wonder whether the FUSE filesystems that report the errors have the
FUSE_EXPORT_SUPPORT capability?
Not that this capability guarantees anything with respect to this issue.

IMO, the root of all evil with NFS+FUSE is that, with
FUSE_EXPORT_SUPPORT, LOOKUP is by ino without a generation. Worse,
FUSE does not even require the FUSE_EXPORT_SUPPORT capability in order
to export to NFS. That is legacy FUSE behavior, and I am sure that
many people export FUSE filesystems, as your report proves.

There is now a proposal for opting out of NFS export:
https://lore.kernel.org/linux-fsdevel/20240126072120.71867-1-jefflexu@linux.alibaba.com/
so there will be a way for a FUSE filesystem to prevent misuse.

Some practical suggestions for users running existing FUSE filesystems:

- Never export a FUSE filesystem with a fixed fsid.
- Every time you want to export a FUSE filesystem, generate
  a one-shot fsid/uuid to use in exportfs.
- Restarting/re-exporting the FUSE filesystem will then result in
  ESTALE errors on the NFS client, but not security violations and not
  EIO errors.
- This is not a full guarantee; unlinked inodes could still result
  in EIO errors, as the libfuse test demonstrates.
- The situation with NFSv4 is slightly better than with NFSv3, because
  with NFSv3 an open file on the client does not keep the FUSE file
  open, which increases the chance of the FUSE inode being evicted
  while the NFS file is open.
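The one-shot fsid suggestion above can be sketched in a couple of shell lines; the export path and client name below are placeholders, not taken from the thread, and the exportfs command is only echoed since running it requires root and a configured NFS server:

```shell
# Generate a fresh fsid for each (re-)export so that file handles from a
# previous FUSE server instance come back as ESTALE rather than being
# resolved against reused nodeids.
fsid=$(uuidgen 2>/dev/null || cat /proc/sys/kernel/random/uuid)

# Shown with echo instead of being executed.
echo "exportfs -o rw,fsid=${fsid} nfsclient:/mnt/fuse"
```

Because the fsid changes on every export, clients holding handles from a previous instance get ESTALE, which they can recover from by re-looking-up the path, instead of silently resolving against a reused nodeid.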

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [fuse-devel] Proxmox + NFS w/ exported FUSE = EIO
  2024-02-06  6:53 ` [fuse-devel] Proxmox + NFS w/ exported FUSE = EIO Amir Goldstein
@ 2024-02-07  0:08   ` Antonio SJ Musumeci
  2024-02-07  7:24     ` Amir Goldstein
       [not found]   ` <b9cec6b7-0973-4d61-9bef-120e3c4654d7@spawn.link>
  1 sibling, 1 reply; 23+ messages in thread
From: Antonio SJ Musumeci @ 2024-02-07  0:08 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: fuse-devel, Miklos Szeredi, linux-fsdevel, Bernd Schubert

On 2/6/24 00:53, Amir Goldstein wrote:

Thank you, Amir, for such a detailed response. I'll look into this
further, but I have a few questions. To answer yours: yes, the server is
setting EXPORT_SUPPORT.

1. The expected behavior, if the above situation occurred, is that the
whole mount would return EIO? All requests going forward? What about
FUSE_STATFS? From what I saw, that was still coming through.

2. Regarding the tests: I downloaded the latest libfuse, compiled it, and
ran test_syscalls against the FUSE server. I get no failures when
running `./test_syscalls /mnt/fusemount :/mnt/ext4mount -u` or
`./test_syscalls /mnt/fusemount -u`, where ext4mount is the underlying
filesystem and fusemount is the FUSE server's. No error is reported. An
strace shows fstat returning ESTALE at the end, but the tests all pass,
and the mount continues to work after running the test. This is on
kernel 6.5.0. Is that expected? From your description it sounds like I
should be seeing EIOs somewhere.

3. Thank you for the "practical suggestions". I will compare them to
what my users are doing... but are there specific guidelines somewhere
for building a FUSE server so that NFS export can be supported reliably?
Details on this topic have been limited over the years, and I and my
users have seen odd behaviors whose cause was unclear, like this
situation, or NFS somehow triggering a request for '..' of the root
nodeid (1). Some questions that come to mind: is the generation strictly
necessary (practically) for things to work, so long as the nodeid is
unique during a session (a 64-bit nodeid space can last a long time)? Is
there a possibility of conflict if multiple FUSE servers used the same
nodeid/generation pairs at the same time? To what degree does the inode
value matter? Should old nodeid/generation pairs be kept around forever,
as the libfuse noforget option suggests for NFS? Perhaps some of this is
obvious, but given the changes to FUSE over time and the differences
between kernel and userspace filesystem experience, it would be nice to
have some of these more niche/complicated situations better fleshed out
in the official docs.

Thanks again.




* Re: [fuse-devel] Proxmox + NFS w/ exported FUSE = EIO
  2024-02-07  0:08   ` Antonio SJ Musumeci
@ 2024-02-07  7:24     ` Amir Goldstein
  0 siblings, 0 replies; 23+ messages in thread
From: Amir Goldstein @ 2024-02-07  7:24 UTC (permalink / raw)
  To: Antonio SJ Musumeci
  Cc: fuse-devel, Miklos Szeredi, linux-fsdevel, Bernd Schubert

On Wed, Feb 7, 2024 at 2:08 AM Antonio SJ Musumeci <trapexit@spawn.link> wrote:
>
> Thank you Amir for such a detailed response. I'll look into this further
> but a few questions. To answer your question: yes, the server is setting
> EXPORT_SUPPORT.
>
> 1. The expected behavior, if the above situation occurred, is that the
> whole of the mount would return EIO? All requests going forward? What
> about FUSE_STATFS? From what I saw that was coming through.
>

It happens only for a specific bad/stale inode, one that you hold an
open fd for and try to access after another FUSE inode object has
already reused its inode number.

> 2. Regarding the tests. I downloaded the latest libfuse, compiled, and
> ran test_syscalls against the FUSE server. I get no failures when
> running `./test_syscalls /mnt/fusemount :/mnt/ext4mount -u` or
> `./test_syscalls /mnt/fusemount -u` where ext4mount is the underlying
> filesystem and fusemount is the FUSE server's. No error is reported. A
> strace shows the fstat returning ESTALE at the end but the tests all
> pass. The mount continues to work after running the test. This is on
> kernel 6.5.0. Is that expected? It sounds from your description that I
> should be seeing EIOs somewhere.
>

It is expected.
The test says:

                        // With O_PATH fd, the server does not have to keep
                        // the inode alive so FUSE inode may be stale or bad
                        if (errno == ESTALE || errno == EIO ||
                            errno == ENOENT || errno == EBADF)
                                return 0;

So it is a matter of chance which error you get.
But those EIO errors are relatively rare, so if your users are seeing
them across the whole fs, it is probably due to something else.

> 3. Thank you for the "practical suggestions". I will compare them to
> what my users are doing... but are there specific guidelines somewhere
> for building a FUSE server to ensure NFS export can be supported? This

I have implemented a library/filesystem template for writing a FUSE
passthrough fs that supports persistent NFS file handles (i.e. they
survive a server restart):

https://github.com/amir73il/libfuse/blob/fuse_passthrough/passthrough/fuse_passthrough.cpp

This implementation assumes passthrough to ext4/xfs.
A generic implementation would require a FUSE protocol change.

See: https://lore.kernel.org/linux-fsdevel/CAOQ4uxiJ3qxb_XNWdmQPZ3omT3fjEhoMfG=3CSKucvoJbj6JSg@mail.gmail.com/

> topic has had limited details available over the years and I/users have
> had odd behaviors at times that were unclear of the cause. Like this
> situation or when NFS somehow triggered a request for '..' of the root
> nodeid (1). Some questions that come to mind: is the generation strictly
> necessary (practically) for things to work so long as nodeid is unique
> during a session (64bit nodeid space can last a long time)? Is there

The nodeid space is restarted on server restart, and new nodeids are
assigned to the same objects.

Using server inode numbers as nodeids is saner, but as the test
demonstrates, it is not always enough.

> possibility of conflict if multiple fuse servers used the same
> nodeid//gen pairs at the same time?

You cannot export two different filesystems with the same fsid/uuid at
the same time; NFS won't let you do that.

>  To what degree does the inode value
> matter? Should old node/gen pairs be kept around forever as noforget
> libfuse option suggests for NFS?

It does not matter.
As long as the FUSE protocol does lookup by ino without a generation,
there is little the server can do. It can only return the most recent
generation for that ino.

> Perhaps some of this is obvious but
> given changes to FUSE over time and the differences between kernel and
> userspace fs experiences it would be nice to have some of these more
> niche/complicated situations better flushed out in the official docs.
>

It would be nice if someone picked up that glove, but nothing about
this is trivial...

My plan was to contribute the fuse_passthrough lib to the libfuse
project, but my focus has shifted, and some work is still needed to
package the library.

If someone is interested in taking up this work and helping to maintain
the library, I am willing to help them.

Thanks,
Amir.


* Re: [fuse-devel] Proxmox + NFS w/ exported FUSE = EIO
       [not found]     ` <CAOQ4uxgZR4OtCkdrpcDGCK-MqZEHcrx+RY4G94saqaXVkL4cKA@mail.gmail.com>
@ 2024-02-18  0:48       ` Antonio SJ Musumeci
  2024-02-19 11:36         ` Bernd Schubert
  0 siblings, 1 reply; 23+ messages in thread
From: Antonio SJ Musumeci @ 2024-02-18  0:48 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Linux FS Devel, fuse-devel

On 2/7/24 01:04, Amir Goldstein wrote:
> On Wed, Feb 7, 2024 at 5:05 AM Antonio SJ Musumeci <trapexit@spawn.link> wrote:
>>
>> On 2/6/24 00:53, Amir Goldstein wrote:
> only for a specific inode object to which you have an open fd for.
> Certainly not at the sb/mount level.
> 
> Thanks,
> Amir.

Thanks again Amir.

I've narrowed down the situation, but I'm still struggling to pinpoint
the specifics, and I'm unfortunately currently unable to replicate it
using any of the passthrough examples. Perhaps it's some feature I'm
enabling (or not). My next step is to look at exactly what differences
there are in the INIT reply.

I'm seeing a FUSE_LOOKUP request coming in for ".." of nodeid 1.

I have my FUSE fs set up about as simply as I can: single-threaded, attr
and entry/negative-entry caching off, direct_io on, EXPORT_SUPPORT
enabled. The mountpoint is exported via NFS. I mount the NFS export on
the same host, and on another host as well.

On the local machine I read a large file in a loop using dd
(if=/mnt/nfs/file, of=/dev/null). After each read finishes, I echo 3 >
/proc/sys/vm/drop_caches. That alone will run forever. But if on the
second machine I start issuing `ls -lh /mnt/nfs` repeatedly, after a
moment it triggers the issue.
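A rough sketch of that reproduction as shell helpers; the paths and the /mnt/nfs mount come from the report, and the loops are wrapped in functions rather than invoked, since running them requires the FUSE export and two NFS clients:

```shell
# Helper for the first machine: repeatedly read a large file through the
# NFS mount of the FUSE fs, dropping caches between passes (needs root).
reader_loop() {
  while :; do
    dd if=/mnt/nfs/file of=/dev/null bs=1M 2>/dev/null
    echo 3 > /proc/sys/vm/drop_caches
  done
}

# Helper for the second machine, run in parallel: repeatedly list the
# directory until statx on subdirectories starts failing with EIO.
lister_loop() {
  while :; do ls -lh /mnt/nfs >/dev/null; done
}

echo "repro helpers defined"
```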

`ls` will successfully statx /mnt/nfs, and the following openat and
getdents also return successfully. As it iterates over the output of
getdents, statx calls on directories fail with EIO while those on files
succeed as normal. In my FUSE server, for each EIO failure I see a
lookup for ".." on nodeid 1. Afterwards, all lookups on /mnt/nfs fail;
the only request that seems to work is statfs.

This was happening some time ago without my being able to reproduce it,
so I added a check for that case which returns -ENOENT. However, looking
over the libfuse high-level API, it looks like fuse_lib_lookup doesn't
handle this situation. Perhaps a segfault waiting to happen?

If I remove EXPORT_SUPPORT, I no longer trigger the issue (which I
guess makes sense).

Any ideas on how/why ".." for the root node is coming in? Is that
valid? Does it only happen when using NFS? I know there is talk of
adding the ability to refuse export, but what is the consequence of
disabling EXPORT_SUPPORT? Is there a performance or capability
difference? If it is a valid request, what should I be returning?

Thanks in advance.



* Re: [fuse-devel] Proxmox + NFS w/ exported FUSE = EIO
  2024-02-18  0:48       ` Antonio SJ Musumeci
@ 2024-02-19 11:36         ` Bernd Schubert
  2024-02-19 19:05           ` Antonio SJ Musumeci
  0 siblings, 1 reply; 23+ messages in thread
From: Bernd Schubert @ 2024-02-19 11:36 UTC (permalink / raw)
  To: Antonio SJ Musumeci, Amir Goldstein; +Cc: Linux FS Devel, fuse-devel



On 2/18/24 01:48, Antonio SJ Musumeci wrote:
> Any ideas on how/why ".." for root node is coming in? Is that valid? It 
> only happens when using NFS? I know there is talk of adding the ability 
> of refusing export but what is the consequence of disabling 
> EXPORT_SUPPORT? Is there a performance or capability difference? If it 
> is a valid request what should I be returning?

If you don't set EXPORT_SUPPORT, the kernel-side functions just return
-ESTALE, which is then presumably handled by the NFS client. I don't
think it can handle that in all situations, though. With EXPORT_SUPPORT,
an uncached inode is looked up with the name "." and the nodeid set in
the lookup call; similarly for the parent, but with the name "..".

A simple case where this would already fail without NFS, but via the
same API:

name_to_handle_at()
umount fuse
mount fuse
open_by_handle_at()


I will see if I can come up with a simple patch that just passes these
through to the FUSE server:


static const struct export_operations fuse_export_operations = {
	.fh_to_dentry	= fuse_fh_to_dentry,
	.fh_to_parent	= fuse_fh_to_parent,
	.encode_fh	= fuse_encode_fh,
	.get_parent	= fuse_get_parent,
};




Cheers,
Bernd


* Re: [fuse-devel] Proxmox + NFS w/ exported FUSE = EIO
  2024-02-19 11:36         ` Bernd Schubert
@ 2024-02-19 19:05           ` Antonio SJ Musumeci
  2024-02-19 19:17             ` Bernd Schubert
  2024-02-19 19:38             ` Miklos Szeredi
  0 siblings, 2 replies; 23+ messages in thread
From: Antonio SJ Musumeci @ 2024-02-19 19:05 UTC (permalink / raw)
  To: Bernd Schubert; +Cc: Amir Goldstein, linux-fsdevel, fuse-devel

On Monday, February 19th, 2024 at 5:36 AM, Bernd Schubert <bernd.schubert@fastmail.fm> wrote:

> If you don't set EXPORT_SUPPORT, it just returns -ESTALE in the kernel
> side functions - which is then probably handled by the NFS client. I
> don't think it can handle that in all situations, though. With
> EXPORT_SUPPORT an uncached inode is attempted to be opened with the name
> "." and the node-id set in the lookup call. Similar for parent, but
> ".." is used.
>
> A simple case where this would already fail without NFS, but via the
> same API:
>
> name_to_handle_at()
> umount fuse
> mount fuse
> open_by_handle_at
>
>
> I will see if I can come up with a simple patch that just passes these
> through to fuse-server
>
>
> static const struct export_operations fuse_export_operations = {
> 	.fh_to_dentry	= fuse_fh_to_dentry,
> 	.fh_to_parent	= fuse_fh_to_parent,
> 	.encode_fh	= fuse_encode_fh,
> 	.get_parent	= fuse_get_parent,
> };
>
> Cheers,
> Bernd

Thank you, but I'm not sure I'm able to piece together the answers to my questions from that.

Perhaps my ignorance of the kernel side is showing, but how can the root node have a parent? And if it can, does that mean the HLAPI has a possible bug in lookup?

I handle "." and ".." just fine for non-root nodes. But this is `lookup(nodeid=1, name="..");`.

Given the relative directory structure:

* /dir1/
* /dir2/
* /dir3/
* /file1
* /file2

This is what I see from the kernel:

lookup(nodeid=3, name=.);
lookup(nodeid=3, name=..);
lookup(nodeid=1, name=dir2);
lookup(nodeid=1, name=..);
forget(nodeid=3);
forget(nodeid=1);

lookup(nodeid=4, name=.);
lookup(nodeid=4, name=..);
lookup(nodeid=1, name=dir3);
lookup(nodeid=1, name=..);
forget(nodeid=4);

lookup(nodeid=5, name=.);
lookup(nodeid=5, name=..);
lookup(nodeid=1, name=dir1);
lookup(nodeid=1, name=..);
forget(nodeid=5);
forget(nodeid=1);


It isn't clear to me what the proper response to lookup(nodeid=1, name="..") is. Make something up? From userspace, if you stat "/.." you get the details of "/". But if I respond to that lookup request with the details of the root node, it errors out.


* Re: [fuse-devel] Proxmox + NFS w/ exported FUSE = EIO
  2024-02-19 19:05           ` Antonio SJ Musumeci
@ 2024-02-19 19:17             ` Bernd Schubert
  2024-02-19 19:38             ` Miklos Szeredi
  1 sibling, 0 replies; 23+ messages in thread
From: Bernd Schubert @ 2024-02-19 19:17 UTC (permalink / raw)
  To: Antonio SJ Musumeci; +Cc: Amir Goldstein, linux-fsdevel, fuse-devel



On 2/19/24 20:05, Antonio SJ Musumeci wrote:
> On Monday, February 19th, 2024 at 5:36 AM, Bernd Schubert <bernd.schubert@fastmail.fm> wrote:
> 
>>
>>
>>
>>
>> On 2/18/24 01:48, Antonio SJ Musumeci wrote:
>>
>>> On 2/7/24 01:04, Amir Goldstein wrote:
>>>
>>>> On Wed, Feb 7, 2024 at 5:05 AM Antonio SJ Musumeci trapexit@spawn.link wrote:
>>>>
>>>>> On 2/6/24 00:53, Amir Goldstein wrote:
>>>>> only for a specific inode object to which you have an open fd for.
>>>>> Certainly not at the sb/mount level.
>>>>
>>>> Thanks,
>>>> Amir.
>>>
>>> Thanks again Amir.
>>>
>>> I've narrowed down the situation but I'm still struggling to pinpoint
>>> the specifics. And I'm unfortunately currently unable to replicate using
>>> any of the passthrough examples. Perhaps some feature I'm enabling (or
>>> not). My next steps are looking at exactly what differences there are in
>>> the INIT reply.
>>>
>>> I'm seeing a FUSE_LOOKUP request coming in for ".." of nodeid 1.
>>>
>>> I have my FUSE fs setup about as simply as I can. Single threaded. attr
>>> and entry/neg-entry caching off. direct-io on. EXPORT_SUPPORT is
>>> enabled. The mountpoint is exported via NFS. On the same host I mount
>>> NFS. I mount it on another host as well.
>>>
>>> On the local machine I loop reading a large file using dd
>>> (if=/mnt/nfs/file, of=/dev/null). After it finished I echo 3 >
>>> drop_caches. That alone will go forever. If on the second machine I
>>> start issuing `ls -lh /mnt/nfs` repeatedly after a moment it will
>>> trigger the issue.
>>>
>>> `ls` will successfully statx /mnt/nfs and the following openat and
>>> getdents also return successfully. As it iterates over the output of
>>> getdents statx's for directories fail with EIO and files succeed as
>>> normal. In my FUSE server for each EIO failure I'm seeing a lookup for
>>> ".." on nodeid 1. Afterwards all lookups fail on /mnt/nfs. The only
>>> request that seems to work is statfs.
>>>
>>> This was happening some time ago without me being able to reproduce it
>>> so I put a check to see if that was happening and return -ENOENT.
>>> However, looking over libfuse HLAPI it looks like fuse_lib_lookup
>>> doesn't handle this situation. Perhaps a segv waiting to happen?
>>>
>>> If I remove EXPORT_SUPPORT I'm no longer triggering the issue (which I
>>> guess makes sense.)
>>>
>>> Any ideas on how/why ".." for root node is coming in? Is that valid? It
>>> only happens when using NFS? I know there is talk of adding the ability
>>> of refusing export but what is the consequence of disabling
>>> EXPORT_SUPPORT? Is there a performance or capability difference? If it
>>> is a valid request what should I be returning?
>>
>>
>> If you don't set EXPORT_SUPPORT, it just returns -ESTALE in the kernel
>> side functions - which is then probably handled by the NFS client. I
>> don't think it can handle that in all situations, though. With
>> EXPORT_SUPPORT an uncached inode is attempted to be opened with the name
>> "." and the node-id set in the lookup call. Similar for parent, but
>> ".." is used.
>>
>> A simple case where this would already fail without NFS, but with the
>> same API:
>>
>> name_to_handle_at()
>> umount fuse
>> mount fuse
>> open_by_handle_at
>>
>>
>> I will see if I can come up with a simple patch that just passes these
>> through to fuse-server
>>
>>
>> static const struct export_operations fuse_export_operations = {
>> .fh_to_dentry = fuse_fh_to_dentry,
>> .fh_to_parent = fuse_fh_to_parent,
>> .encode_fh = fuse_encode_fh,
>> .get_parent = fuse_get_parent,
>> };
>>
>>
>>
>>
>> Cheers,
>> Bernd
> 
> Thank you but I'm not sure I'm able to piece together the answers to my questions from that.
> 
> Perhaps my ignorance of the kernel side is showing but how can the root node have a parent? If it can have a parent then does that mean that the HLAPI has a possible bug in lookup?
> 
> I handle "." and ".." just fine for non-root nodes. But this is `lookup(nodeid=1,name="..");`.
> 
> Given the relative directory structure:
> 
> * /dir1/
> * /dir2/
> * /dir3/
> * /file1
> * /file2
> 
> This is what I see from the kernel:
> 
> lookup(nodeid=3, name=.);
> lookup(nodeid=3, name=..);
> lookup(nodeid=1, name=dir2);
> lookup(nodeid=1, name=..);
> forget(nodeid=3);
> forget(nodeid=1);
> 
> lookup(nodeid=4, name=.);
> lookup(nodeid=4, name=..);
> lookup(nodeid=1, name=dir3);
> lookup(nodeid=1, name=..);
> forget(nodeid=4);
> 
> lookup(nodeid=5, name=.);
> lookup(nodeid=5, name=..);
> lookup(nodeid=1, name=dir1);
> lookup(nodeid=1, name=..);
> forget(nodeid=5);
> forget(nodeid=1);
> 
> 
> It isn't clear to me what the proper response is for lookup(nodeid=1, name=..). Make something up? From userspace if you stat "/.." you get details for "/". If I respond to that lookup request with the details of root node it errors out.

I might be wrong, but from my understanding of the code, "." here means
"I don't have the name, please look up the entry by ID"; the entry can
be any valid directory entry. And ".." means "I don't have the name,
please look up the parent by ID"; the parent has to be a directory. So
for ".." and ID=FUSE_ROOT_ID it should return your file system's root dir.




^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [fuse-devel] Proxmox + NFS w/ exported FUSE = EIO
  2024-02-19 19:05           ` Antonio SJ Musumeci
  2024-02-19 19:17             ` Bernd Schubert
@ 2024-02-19 19:38             ` Miklos Szeredi
  2024-02-19 19:54               ` Antonio SJ Musumeci
  2024-02-19 19:55               ` Bernd Schubert
  1 sibling, 2 replies; 23+ messages in thread
From: Miklos Szeredi @ 2024-02-19 19:38 UTC (permalink / raw)
  To: Antonio SJ Musumeci
  Cc: Bernd Schubert, Amir Goldstein, linux-fsdevel, fuse-devel

On Mon, 19 Feb 2024 at 20:05, Antonio SJ Musumeci <trapexit@spawn.link> wrote:

> This is what I see from the kernel:
>
> lookup(nodeid=3, name=.);
> lookup(nodeid=3, name=..);
> lookup(nodeid=1, name=dir2);
> lookup(nodeid=1, name=..);
> forget(nodeid=3);
> forget(nodeid=1);

This is really weird.  It's a kernel bug, no arguments, because kernel
should never send a forget against the root inode.   But that
lookup(nodeid=1, name=..); already looks bogus.

Will try to untangle this by code review, but instructions to
reproduce could be a big help.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [fuse-devel] Proxmox + NFS w/ exported FUSE = EIO
  2024-02-19 19:38             ` Miklos Szeredi
@ 2024-02-19 19:54               ` Antonio SJ Musumeci
  2024-02-20  8:35                 ` Miklos Szeredi
  2024-02-19 19:55               ` Bernd Schubert
  1 sibling, 1 reply; 23+ messages in thread
From: Antonio SJ Musumeci @ 2024-02-19 19:54 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Bernd Schubert, Amir Goldstein, linux-fsdevel, fuse-devel

On 2/19/24 13:38, Miklos Szeredi wrote:
> On Mon, 19 Feb 2024 at 20:05, Antonio SJ Musumeci <trapexit@spawn.link> wrote:
> 
>> This is what I see from the kernel:
>>
>> lookup(nodeid=3, name=.);
>> lookup(nodeid=3, name=..);
>> lookup(nodeid=1, name=dir2);
>> lookup(nodeid=1, name=..);
>> forget(nodeid=3);
>> forget(nodeid=1);
> 
> This is really weird.  It's a kernel bug, no arguments, because kernel
> should never send a forget against the root inode.   But that
> lookup(nodeid=1, name=..); already looks bogus.
> 
> Will try to untangle this by code review, but instructions to
> reproduce could be a big help.
> 
> Thanks,
> Miklos

Thank you.

I'll try to build something bespoke against current libfuse.

That forget(nodeid=1) is coming in after my code detects the 
lookup(nodeid=1, name="..") and returns -ENOENT, which in a way makes 
sense. Though it seems to happen no matter what I return. Returning the 
same as lookup(nodeid=1, name=".") results in the same behavior. I've 
not tried creating a fake node with a new nodeid/generation, though.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [fuse-devel] Proxmox + NFS w/ exported FUSE = EIO
  2024-02-19 19:38             ` Miklos Szeredi
  2024-02-19 19:54               ` Antonio SJ Musumeci
@ 2024-02-19 19:55               ` Bernd Schubert
  2024-02-19 19:58                 ` Miklos Szeredi
  1 sibling, 1 reply; 23+ messages in thread
From: Bernd Schubert @ 2024-02-19 19:55 UTC (permalink / raw)
  To: Miklos Szeredi, Antonio SJ Musumeci
  Cc: Amir Goldstein, linux-fsdevel, fuse-devel



On 2/19/24 20:38, Miklos Szeredi wrote:
> On Mon, 19 Feb 2024 at 20:05, Antonio SJ Musumeci <trapexit@spawn.link> wrote:
> 
>> This is what I see from the kernel:
>>
>> lookup(nodeid=3, name=.);
>> lookup(nodeid=3, name=..);
>> lookup(nodeid=1, name=dir2);
>> lookup(nodeid=1, name=..);
>> forget(nodeid=3);
>> forget(nodeid=1);
> 
> This is really weird.  It's a kernel bug, no arguments, because kernel
> should never send a forget against the root inode.   But that
> lookup(nodeid=1, name=..); already looks bogus.

Why exactly bogus?

reconnect_path()
		if (IS_ROOT(dentry))
			parent = reconnect_one(mnt, dentry, nbuf);

reconnect_one()
		if (mnt->mnt_sb->s_export_op->get_parent)
			parent = mnt->mnt_sb->s_export_op->get_parent(dentry);


fuse_get_parent()
	if (!fc->export_support)
		return ERR_PTR(-ESTALE);

	err = fuse_lookup_name(child_inode->i_sb, get_node_id(child_inode),
			       &dotdot_name, &outarg, &inode);



Thanks,
Bernd

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [fuse-devel] Proxmox + NFS w/ exported FUSE = EIO
  2024-02-19 19:55               ` Bernd Schubert
@ 2024-02-19 19:58                 ` Miklos Szeredi
  2024-02-19 21:14                   ` Bernd Schubert
  0 siblings, 1 reply; 23+ messages in thread
From: Miklos Szeredi @ 2024-02-19 19:58 UTC (permalink / raw)
  To: Bernd Schubert
  Cc: Antonio SJ Musumeci, Amir Goldstein, linux-fsdevel, fuse-devel

On Mon, 19 Feb 2024 at 20:55, Bernd Schubert <bernd.schubert@fastmail.fm> wrote:
>
>
>
> On 2/19/24 20:38, Miklos Szeredi wrote:
> > On Mon, 19 Feb 2024 at 20:05, Antonio SJ Musumeci <trapexit@spawn.link> wrote:
> >
> >> This is what I see from the kernel:
> >>
> >> lookup(nodeid=3, name=.);
> >> lookup(nodeid=3, name=..);
> >> lookup(nodeid=1, name=dir2);
> >> lookup(nodeid=1, name=..);
> >> forget(nodeid=3);
> >> forget(nodeid=1);
> >
> > This is really weird.  It's a kernel bug, no arguments, because kernel
> > should never send a forget against the root inode.   But that
> > lookup(nodeid=1, name=..); already looks bogus.
>
> Why exactly bogus?
>
> reconnect_path()
>                 if (IS_ROOT(dentry))
>                         parent = reconnect_one(mnt, dentry, nbuf);

It's only getting this far if (dentry->d_flags & DCACHE_DISCONNECTED),
but that doesn't make sense on the root dentry.  It does happen,
though, I'm just not seeing yet how.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [fuse-devel] Proxmox + NFS w/ exported FUSE = EIO
  2024-02-19 19:58                 ` Miklos Szeredi
@ 2024-02-19 21:14                   ` Bernd Schubert
  2024-02-19 23:22                     ` Bernd Schubert
  0 siblings, 1 reply; 23+ messages in thread
From: Bernd Schubert @ 2024-02-19 21:14 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Antonio SJ Musumeci, Amir Goldstein, linux-fsdevel, fuse-devel



On 2/19/24 20:58, Miklos Szeredi wrote:
> On Mon, 19 Feb 2024 at 20:55, Bernd Schubert <bernd.schubert@fastmail.fm> wrote:
>>
>>
>>
>> On 2/19/24 20:38, Miklos Szeredi wrote:
>>> On Mon, 19 Feb 2024 at 20:05, Antonio SJ Musumeci <trapexit@spawn.link> wrote:
>>>
>>>> This is what I see from the kernel:
>>>>
>>>> lookup(nodeid=3, name=.);
>>>> lookup(nodeid=3, name=..);
>>>> lookup(nodeid=1, name=dir2);
>>>> lookup(nodeid=1, name=..);
>>>> forget(nodeid=3);
>>>> forget(nodeid=1);
>>>
>>> This is really weird.  It's a kernel bug, no arguments, because kernel
>>> should never send a forget against the root inode.   But that
>>> lookup(nodeid=1, name=..); already looks bogus.
>>
>> Why exactly bogus?
>>
>> reconnect_path()
>>                 if (IS_ROOT(dentry))
>>                         parent = reconnect_one(mnt, dentry, nbuf);
> 
> It's only getting this far if (dentry->d_flags & DCACHE_DISCONNECTED),
> but that doesn't make sense on the root dentry.  It does happen,
> though, I'm just not seeing yet how.

I see the BUG_ON(), but on the other hand the "if IS_ROOT(dentry)"
condition triggers.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [fuse-devel] Proxmox + NFS w/ exported FUSE = EIO
  2024-02-19 21:14                   ` Bernd Schubert
@ 2024-02-19 23:22                     ` Bernd Schubert
  0 siblings, 0 replies; 23+ messages in thread
From: Bernd Schubert @ 2024-02-19 23:22 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Antonio SJ Musumeci, Amir Goldstein, linux-fsdevel, fuse-devel



On 2/19/24 22:14, Bernd Schubert wrote:
> 
> 
> On 2/19/24 20:58, Miklos Szeredi wrote:
>> On Mon, 19 Feb 2024 at 20:55, Bernd Schubert <bernd.schubert@fastmail.fm> wrote:
>>>
>>>
>>>
>>> On 2/19/24 20:38, Miklos Szeredi wrote:
>>>> On Mon, 19 Feb 2024 at 20:05, Antonio SJ Musumeci <trapexit@spawn.link> wrote:
>>>>
>>>>> This is what I see from the kernel:
>>>>>
>>>>> lookup(nodeid=3, name=.);
>>>>> lookup(nodeid=3, name=..);
>>>>> lookup(nodeid=1, name=dir2);
>>>>> lookup(nodeid=1, name=..);
>>>>> forget(nodeid=3);
>>>>> forget(nodeid=1);
>>>>
>>>> This is really weird.  It's a kernel bug, no arguments, because kernel
>>>> should never send a forget against the root inode.   But that
>>>> lookup(nodeid=1, name=..); already looks bogus.
>>>
>>> Why exactly bogus?
>>>
>>> reconnect_path()
>>>                 if (IS_ROOT(dentry))
>>>                         parent = reconnect_one(mnt, dentry, nbuf);
>>
>> It's only getting this far if (dentry->d_flags & DCACHE_DISCONNECTED),
>> but that doesn't make sense on the root dentry.  It does happen,
>> though, I'm just not seeing yet how.
> 
> I see the BUG_ON(), but on the other the "if IS_ROOT(dentry)" condition
> triggers.

Oh I see, IS_ROOT() triggers if parent is not known yet.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [fuse-devel] Proxmox + NFS w/ exported FUSE = EIO
  2024-02-19 19:54               ` Antonio SJ Musumeci
@ 2024-02-20  8:35                 ` Miklos Szeredi
  2024-02-20  8:47                   ` Miklos Szeredi
  0 siblings, 1 reply; 23+ messages in thread
From: Miklos Szeredi @ 2024-02-20  8:35 UTC (permalink / raw)
  To: Antonio SJ Musumeci
  Cc: Bernd Schubert, Amir Goldstein, linux-fsdevel, fuse-devel

[-- Attachment #1: Type: text/plain, Size: 442 bytes --]

On Mon, 19 Feb 2024 at 20:54, Antonio SJ Musumeci <trapexit@spawn.link> wrote:
>
> On 2/19/24 13:38, Miklos Szeredi wrote:
> > On Mon, 19 Feb 2024 at 20:05, Antonio SJ Musumeci <trapexit@spawn.link> wrote:
> >
> >> This is what I see from the kernel:
> >>
> >> lookup(nodeid=3, name=.);
> >> lookup(nodeid=3, name=..);
> >> lookup(nodeid=1, name=dir2);
> >> lookup(nodeid=1, name=..);


Can you please try the attached patch?

Thanks,
Miklos

[-- Attachment #2: fuse-fix-bad-root.patch --]
[-- Type: text/x-patch, Size: 429 bytes --]

---
 fs/fuse/dir.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -1210,7 +1210,7 @@ static int fuse_do_statx(struct inode *i
 	if (((sx->mask & STATX_SIZE) && !fuse_valid_size(sx->size)) ||
 	    ((sx->mask & STATX_TYPE) && (!fuse_valid_type(sx->mode) ||
 					 inode_wrong_type(inode, sx->mode)))) {
-		make_bad_inode(inode);
+		fuse_make_bad(inode);
 		return -EIO;
 	}
 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [fuse-devel] Proxmox + NFS w/ exported FUSE = EIO
  2024-02-20  8:35                 ` Miklos Szeredi
@ 2024-02-20  8:47                   ` Miklos Szeredi
  2024-02-22  1:25                     ` Antonio SJ Musumeci
  0 siblings, 1 reply; 23+ messages in thread
From: Miklos Szeredi @ 2024-02-20  8:47 UTC (permalink / raw)
  To: Antonio SJ Musumeci
  Cc: Bernd Schubert, Amir Goldstein, linux-fsdevel, fuse-devel

[-- Attachment #1: Type: text/plain, Size: 616 bytes --]

On Tue, 20 Feb 2024 at 09:35, Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> On Mon, 19 Feb 2024 at 20:54, Antonio SJ Musumeci <trapexit@spawn.link> wrote:
> >
> > On 2/19/24 13:38, Miklos Szeredi wrote:
> > > On Mon, 19 Feb 2024 at 20:05, Antonio SJ Musumeci <trapexit@spawn.link> wrote:
> > >
> > >> This is what I see from the kernel:
> > >>
> > >> lookup(nodeid=3, name=.);
> > >> lookup(nodeid=3, name=..);
> > >> lookup(nodeid=1, name=dir2);
> > >> lookup(nodeid=1, name=..);
>
>
> Can you please try the attached patch?

Sorry, missing one hunk from the previous patch.  Here's an updated one.

Thanks,
Miklos

[-- Attachment #2: fuse-fix-bad-root-v2.patch --]
[-- Type: text/x-patch, Size: 800 bytes --]

---
 fs/fuse/dir.c    |    2 +-
 fs/fuse/fuse_i.h |    3 ++-
 2 files changed, 3 insertions(+), 2 deletions(-)

--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -1210,7 +1210,7 @@ static int fuse_do_statx(struct inode *i
 	if (((sx->mask & STATX_SIZE) && !fuse_valid_size(sx->size)) ||
 	    ((sx->mask & STATX_TYPE) && (!fuse_valid_type(sx->mode) ||
 					 inode_wrong_type(inode, sx->mode)))) {
-		make_bad_inode(inode);
+		fuse_make_bad(inode);
 		return -EIO;
 	}
 
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -939,7 +939,8 @@ static inline bool fuse_stale_inode(cons
 
 static inline void fuse_make_bad(struct inode *inode)
 {
-	remove_inode_hash(inode);
+	if (get_fuse_inode(inode)->nodeid != FUSE_ROOT_ID)
+		remove_inode_hash(inode);
 	set_bit(FUSE_I_BAD, &get_fuse_inode(inode)->state);
 }
 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [fuse-devel] Proxmox + NFS w/ exported FUSE = EIO
  2024-02-20  8:47                   ` Miklos Szeredi
@ 2024-02-22  1:25                     ` Antonio SJ Musumeci
  2024-02-22 11:09                       ` Miklos Szeredi
  0 siblings, 1 reply; 23+ messages in thread
From: Antonio SJ Musumeci @ 2024-02-22  1:25 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Bernd Schubert, Amir Goldstein, linux-fsdevel, fuse-devel

On 2/20/24 02:47, Miklos Szeredi wrote:
> On Tue, 20 Feb 2024 at 09:35, Miklos Szeredi <miklos@szeredi.hu> wrote:
>> On Mon, 19 Feb 2024 at 20:54, Antonio SJ Musumeci <trapexit@spawn.link> wrote:
>>> On 2/19/24 13:38, Miklos Szeredi wrote:
>>>> On Mon, 19 Feb 2024 at 20:05, Antonio SJ Musumeci <trapexit@spawn.link> wrote:
>>>>
>>>>> This is what I see from the kernel:
>>>>>
>>>>> lookup(nodeid=3, name=.);
>>>>> lookup(nodeid=3, name=..);
>>>>> lookup(nodeid=1, name=dir2);
>>>>> lookup(nodeid=1, name=..);
>>
>> Can you please try the attached patch?
> Sorry, missing one hunk from the previous patch.  Here's an updated one.
>
> Thanks,
> Miklos

I'll try it when I get some cycles in the next week or so, but... I'm not 
sure I see how this would address it. Isn't this still marking the 
inode bad? So while it won't forget it, perhaps it will still error out. 
How does this keep ".." of root from being looked up?

I don't know the code well, but I'd have thought the reason for the 
forget was that the lookup of the parent fails.



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [fuse-devel] Proxmox + NFS w/ exported FUSE = EIO
  2024-02-22  1:25                     ` Antonio SJ Musumeci
@ 2024-02-22 11:09                       ` Miklos Szeredi
  2024-02-25  0:18                         ` Antonio SJ Musumeci
  0 siblings, 1 reply; 23+ messages in thread
From: Miklos Szeredi @ 2024-02-22 11:09 UTC (permalink / raw)
  To: Antonio SJ Musumeci
  Cc: Bernd Schubert, Amir Goldstein, linux-fsdevel, fuse-devel

On Thu, 22 Feb 2024 at 02:26, Antonio SJ Musumeci <trapexit@spawn.link> wrote:

> I'll try it when I get some cycles in the next week or so but... I'm not
> sure I see how this would address it.  Is this not still marking the
> inode bad. So while it won't forget it perhaps it will still error out.
> How does this keep ".." of root being looked up?
>
> I don't know the code well but I'd have thought the reason for the
> forget was because the lookup of the parent fails.

It shouldn't be looking up the parent of root.   Root should always be
there, and the only way I see root disappearing is by marking it bad.

If the patch makes a difference, then you need to find out why the
root is marked bad, since the filesystem will still fail in that case.
But at least the kernel won't do stupid things.

I think the patch is correct and is needed regardless of the outcome
of your test.  But there might be other kernel bugs involved, so
definitely need to see what happens.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [fuse-devel] Proxmox + NFS w/ exported FUSE = EIO
  2024-02-22 11:09                       ` Miklos Szeredi
@ 2024-02-25  0:18                         ` Antonio SJ Musumeci
  2024-02-25 20:58                           ` Antonio SJ Musumeci
  0 siblings, 1 reply; 23+ messages in thread
From: Antonio SJ Musumeci @ 2024-02-25  0:18 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Bernd Schubert, Amir Goldstein, linux-fsdevel, fuse-devel

On 2/22/24 05:09, Miklos Szeredi wrote:
> On Thu, 22 Feb 2024 at 02:26, Antonio SJ Musumeci <trapexit@spawn.link> wrote:
> 
>> I'll try it when I get some cycles in the next week or so but... I'm not
>> sure I see how this would address it.  Is this not still marking the
>> inode bad. So while it won't forget it perhaps it will still error out.
>> How does this keep ".." of root being looked up?
>>
>> I don't know the code well but I'd have thought the reason for the
>> forget was because the lookup of the parent fails.
> 
> It shouldn't be looking up the parent of root.   Root should always be
> there, and the only way I see root disappearing is by marking it bad.
> 
> If the patch makes a difference, then you need to find out why the
> root is marked bad, since the filesystem will still fail in that case.
> But at least the kernel won't do stupid things.
> 
> I think the patch is correct and is needed regardless of the outcome
> of your test.  But there might be other kernel bugs involved, so
> definitely need to see what happens.
> 
> Thanks,
> Miklos

With the patch it no longer issues forget(nodeid=1), nor does it request 
the parent of nodeid=1.

However, I'm seeing different issues.

I instrumented FUSE to print when it tags an inode bad.

After it gets into the bad state I'm seeing nfsd hammering the mount 
even when I've umounted the nfs share and killed the FUSE server. nfsd 
is pegging a CPU core and the kernel log is filled with 
fuse_stale_inode(nodeid=1) fuse_make_bad(nodeid=1) calls. Have to reboot.

What's triggering the flagging of the inode as bad seems to be the 
fuse_stale_inode() check in fuse_iget(): inode->i_generation is 0 while 
the generation value in the reply is as I set it originally.

From the FUSE server I see:

lookup(nodeid=3,name=".")
lookup(nodeid=3,name="..") which returns ino=1 gen=expected_val
getattr(nodeid=2) inodeid=2 is the file I'm reading in a loop
forget(nodeid=2)

after which point it's no longer functional.





^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [fuse-devel] Proxmox + NFS w/ exported FUSE = EIO
  2024-02-25  0:18                         ` Antonio SJ Musumeci
@ 2024-02-25 20:58                           ` Antonio SJ Musumeci
  2024-02-28 13:06                             ` Miklos Szeredi
  0 siblings, 1 reply; 23+ messages in thread
From: Antonio SJ Musumeci @ 2024-02-25 20:58 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Bernd Schubert, Amir Goldstein, linux-fsdevel, fuse-devel

On 2/24/24 18:18, Antonio SJ Musumeci wrote:
> On 2/22/24 05:09, Miklos Szeredi wrote:
>> On Thu, 22 Feb 2024 at 02:26, Antonio SJ Musumeci <trapexit@spawn.link> wrote:
>>
>>> I'll try it when I get some cycles in the next week or so but... I'm not
>>> sure I see how this would address it.  Is this not still marking the
>>> inode bad. So while it won't forget it perhaps it will still error out.
>>> How does this keep ".." of root being looked up?
>>>
>>> I don't know the code well but I'd have thought the reason for the
>>> forget was because the lookup of the parent fails.
>>
>> It shouldn't be looking up the parent of root.   Root should always be
>> there, and the only way I see root disappearing is by marking it bad.
>>
>> If the patch makes a difference, then you need to find out why the
>> root is marked bad, since the filesystem will still fail in that case.
>> But at least the kernel won't do stupid things.
>>
>> I think the patch is correct and is needed regardless of the outcome
>> of your test.  But there might be other kernel bugs involved, so
>> definitely need to see what happens.
>>
>> Thanks,
>> Miklos
> 
> With the patch it doesn't issue forget(nodeid=1) anymore. Nor requesting
> parent of nodeid=1.
> 
> However, I'm seeing different issues.
> 
> I instrumented FUSE to print when it tags an inode bad.
> 
> After it gets into the bad state I'm seeing nfsd hammering the mount
> even when I've umounted the nfs share and killed the FUSE server. nfsd
> is pegging a CPU core and the kernel log is filled with
> fuse_stale_inode(nodeid=1) fuse_make_bad(nodeid=1) calls. Have to reboot.
> 
> What's triggering the flagging the inode as bad seems to be in
> fuse_iget() at fuse_stale_inode() check. inode->i_generation is 0 while
> the generation value is as I set it originally.
> 
>   From the FUSE server I see:
> 
> lookup(nodeid=3,name=".")
> lookup(nodeid=3,name="..") which returns ino=1 gen=expected_val
> getattr(nodeid=2) inodeid=2 is the file I'm reading in a loop
> forget(nodeid=2)
> 
> after which point it's no longer functional.
> 
> 

I've resolved the issue and I believe I know why I couldn't reproduce it 
with the current libfuse examples. The fact that the root node has a 
generation of 0 is implicit in the examples, so when the lookup for ".." 
of a child of the root came in, they would return 0. However, in my 
server I start the generation value of everything at a different 
non-zero value per instance of the server, since at one point I read 
that ensuring different nodeid + generation pairs across filesystems was 
better/needed for NFS support. I'm guessing the increase in reports I've 
had was happenstance of people upgrading to kernels past 5.14.

In retrospect it makes sense that the nodeid and generation are assumed 
to be 1 and 0 respectively, and don't change, but given the symptoms I 
had it wasn't clicking until I saw the stale check.

I'm not sure if there are any changes to the kernel code that would make 
sense. A log entry indicating that root was tagged as bad, and why, 
would have helped, but I'm not sure it needs more than a note in some 
docs, which I'll likely add to libfuse.

Thanks for everyone's help. Sorry for the goose chase.




^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [fuse-devel] Proxmox + NFS w/ exported FUSE = EIO
  2024-02-25 20:58                           ` Antonio SJ Musumeci
@ 2024-02-28 13:06                             ` Miklos Szeredi
  2024-02-28 13:16                               ` Bernd Schubert
  0 siblings, 1 reply; 23+ messages in thread
From: Miklos Szeredi @ 2024-02-28 13:06 UTC (permalink / raw)
  To: Antonio SJ Musumeci
  Cc: Bernd Schubert, Amir Goldstein, linux-fsdevel, fuse-devel

[-- Attachment #1: Type: text/plain, Size: 1764 bytes --]

On Sun, 25 Feb 2024 at 21:58, Antonio SJ Musumeci <trapexit@spawn.link> wrote:

> I've resolved the issue and I believe I know why I couldn't reproduce
> with current libfuse examples. The fact root node has a generation of 0
> is implicit in the examples and as a result when the request came in the
> lookup on ".." of a child node to root it would return 0. However, in my
> server I start the generation value of everything at different non-zero
> value per instance of the server as at one point I read that ensuring
> different nodeid + gen pairs for different filesystems was better/needed
> for NFS support. I'm guessing the increase in reports I've had was
> happenstance of people upgrading to kernels past 5.14.
>
> In retrospect it makes sense that the nodeid and gen are assumed to be 1
> and 0 respectively, and don't change, but due to the symptoms I had it
> wasn't clicking till I saw the stale check.
>
> Not sure if there is any changes to the kernel code that would make
> sense. A log entry indicating root was tagged as bad and why would have
> helped but not sure it needs more than a note in some docs. Which I'll
> likely add to libfuse.
>
> Thanks for everyone's help. Sorry for the goose chase.

Looking deeper this turned out to be a regression, introduced in v5.14
by commit 15db16837a35 ("fuse: fix illegal access to inode with reused
nodeid").  Prior to this commit the generation number would be ignored
and things would work fine.

The attached patch reverts this behavior for the root inode (which
wasn't intended, since the generation number is not supplied by the
server), but with an added warn_on_once() so this doesn't remain
hidden in the future.

Can you please test with both the fixed and unfixed server?

Thanks,
Miklos

[-- Attachment #2: fuse-fix-lookup-root-with-nonzero-generation.patch --]
[-- Type: text/x-patch, Size: 1365 bytes --]

From: Miklos Szeredi <mszeredi@redhat.com>
Subject: fuse: fix root lookup with nonzero generation

The root inode has a fixed nodeid and generation (1, 0).

Prior to the commit 15db16837a35 ("fuse: fix illegal access to inode with
reused nodeid") generation number on lookup was ignored.  After this commit
lookup with the wrong generation number resulted in the inode being
unhashed.  This is correct for non-root inodes, but replacing the root
inode is wrong and results in weird behavior.

Fix by reverting to the old behavior of ignoring the generation for the
root inode, but issuing a warning in dmesg.

Reported-by: Antonio SJ Musumeci <trapexit@spawn.link>
Fixes: 15db16837a35 ("fuse: fix illegal access to inode with reused nodeid")
Cc: <stable@vger.kernel.org> # v5.14
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
---
 fs/fuse/dir.c    |    4 ++++
 fs/fuse/fuse_i.h |    2 ++
 2 files changed, 6 insertions(+)

--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -391,6 +391,10 @@ int fuse_lookup_name(struct super_block
 	err = -EIO;
 	if (fuse_invalid_attr(&outarg->attr))
 		goto out_put_forget;
+	if (outarg->nodeid == FUSE_ROOT_ID && outarg->generation != 0) {
+		pr_warn_once("root generation should be zero\n");
+		outarg->generation = 0;
+	}
 
 	*inode = fuse_iget(sb, outarg->nodeid, outarg->generation,
 			   &outarg->attr, ATTR_TIMEOUT(outarg),

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [fuse-devel] Proxmox + NFS w/ exported FUSE = EIO
  2024-02-28 13:06                             ` Miklos Szeredi
@ 2024-02-28 13:16                               ` Bernd Schubert
  2024-02-28 14:14                                 ` Miklos Szeredi
  0 siblings, 1 reply; 23+ messages in thread
From: Bernd Schubert @ 2024-02-28 13:16 UTC (permalink / raw)
  To: Miklos Szeredi, Antonio SJ Musumeci
  Cc: Amir Goldstein, linux-fsdevel, fuse-devel



On 2/28/24 14:06, Miklos Szeredi wrote:
> On Sun, 25 Feb 2024 at 21:58, Antonio SJ Musumeci <trapexit@spawn.link> wrote:
> 
>> I've resolved the issue and I believe I know why I couldn't reproduce
>> with current libfuse examples. The fact root node has a generation of 0
>> is implicit in the examples and as a result when the request came in the
>> lookup on ".." of a child node to root it would return 0. However, in my
>> server I start the generation value of everything at different non-zero
>> value per instance of the server as at one point I read that ensuring
>> different nodeid + gen pairs for different filesystems was better/needed
>> for NFS support. I'm guessing the increase in reports I've had was
>> happenstance of people upgrading to kernels past 5.14.
>>
>> In retrospect it makes sense that the nodeid and gen are assumed to be 1
>> and 0 respectively, and don't change, but due to the symptoms I had it
>> wasn't clicking till I saw the stale check.
>>
>> Not sure if there is any changes to the kernel code that would make
>> sense. A log entry indicating root was tagged as bad and why would have
>> helped but not sure it needs more than a note in some docs. Which I'll
>> likely add to libfuse.
>>
>> Thanks for everyone's help. Sorry for the goose chase.
> 
> Looking deeper this turned out to be a regression, introduced in v5.14
> by commit 15db16837a35 ("fuse: fix illegal access to inode with reused
> nodeid").  Prior to this commit the generation number would be ignored
> and things would work fine.
> 
> The attached patch reverts this behavior for the root inode (where the
> check wasn't intended to apply, since the root's generation number is
> not supplied by the server), but with an added warn_on_once() so this
> doesn't remain hidden in the future.

Could you still apply your previous patch? I think that definitely makes
sense as well.

I think what we also need is a notification message from kernel to
server when it does something wrong - instead of going to the kernel
logs, such messages should go to server-side logs.


Thanks,
Bernd

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [fuse-devel] Proxmox + NFS w/ exported FUSE = EIO
  2024-02-28 13:16                               ` Bernd Schubert
@ 2024-02-28 14:14                                 ` Miklos Szeredi
  2024-02-28 16:03                                   ` Miklos Szeredi
  0 siblings, 1 reply; 23+ messages in thread
From: Miklos Szeredi @ 2024-02-28 14:14 UTC (permalink / raw)
  To: Bernd Schubert
  Cc: Antonio SJ Musumeci, Amir Goldstein, linux-fsdevel, fuse-devel

On Wed, 28 Feb 2024 at 14:16, Bernd Schubert <bernd.schubert@fastmail.fm> wrote:

> Could you still apply your previous patch? I think that definitely makes
> sense as well.

Half of it is trivial (s/make_bad_inode/fuse_make_bad/).

The other half is probably broken in that form (see 775c5033a0d1
("fuse: fix live lock in fuse_iget()")).

Things get complicated, though, because the root of a submount can have
nodeid != FUSE_ROOT_ID, yet it should follow the same rules as the
original root. Needs more thought...

> I think what we also need is notification message from kernel to server
> that it does something wrong - instead of going to kernel logs, such
> messages should go to server side logs.

That's generally not possible.  We can return -EIO to the application,
but not the server.  In this case I think it's better to just fall
back to old behavior of ignoring the generation.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [fuse-devel] Proxmox + NFS w/ exported FUSE = EIO
  2024-02-28 14:14                                 ` Miklos Szeredi
@ 2024-02-28 16:03                                   ` Miklos Szeredi
  0 siblings, 0 replies; 23+ messages in thread
From: Miklos Szeredi @ 2024-02-28 16:03 UTC (permalink / raw)
  To: Bernd Schubert
  Cc: Antonio SJ Musumeci, Amir Goldstein, linux-fsdevel, fuse-devel

On Wed, 28 Feb 2024 at 15:14, Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> On Wed, 28 Feb 2024 at 14:16, Bernd Schubert <bernd.schubert@fastmail.fm> wrote:
>
> > Could you still apply your previous patch? I think that definitely makes
> > sense as well.
>
> Half of it is trivial (s/make_bad_inode/fuse_make_bad/).
>
> The other half is probably broken in that form (see 775c5033a0d1
> ("fuse: fix live lock in fuse_iget()")).
>
> Things get complicated, though, because the root of submounts can have
> nodeid != FUSE_ROOT_ID, yet they should have the same rules as the
> original root.   Needs more thought...

Sent out a series that I think correctly handles all cases.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2024-02-28 16:03 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <d997c02b-d5ef-41f8-92b6-8c6775899388@spawn.link>
2024-02-06  6:53 ` [fuse-devel] Proxmox + NFS w/ exported FUSE = EIO Amir Goldstein
2024-02-07  0:08   ` Antonio SJ Musumeci
2024-02-07  7:24     ` Amir Goldstein
     [not found]   ` <b9cec6b7-0973-4d61-9bef-120e3c4654d7@spawn.link>
     [not found]     ` <CAOQ4uxgZR4OtCkdrpcDGCK-MqZEHcrx+RY4G94saqaXVkL4cKA@mail.gmail.com>
2024-02-18  0:48       ` Antonio SJ Musumeci
2024-02-19 11:36         ` Bernd Schubert
2024-02-19 19:05           ` Antonio SJ Musumeci
2024-02-19 19:17             ` Bernd Schubert
2024-02-19 19:38             ` Miklos Szeredi
2024-02-19 19:54               ` Antonio SJ Musumeci
2024-02-20  8:35                 ` Miklos Szeredi
2024-02-20  8:47                   ` Miklos Szeredi
2024-02-22  1:25                     ` Antonio SJ Musumeci
2024-02-22 11:09                       ` Miklos Szeredi
2024-02-25  0:18                         ` Antonio SJ Musumeci
2024-02-25 20:58                           ` Antonio SJ Musumeci
2024-02-28 13:06                             ` Miklos Szeredi
2024-02-28 13:16                               ` Bernd Schubert
2024-02-28 14:14                                 ` Miklos Szeredi
2024-02-28 16:03                                   ` Miklos Szeredi
2024-02-19 19:55               ` Bernd Schubert
2024-02-19 19:58                 ` Miklos Szeredi
2024-02-19 21:14                   ` Bernd Schubert
2024-02-19 23:22                     ` Bernd Schubert
