* 6f283634 / 1976b2b3 breaks NFS (QNAP/Linux kNFSD)
@ 2022-02-20 22:26 Kurt Garloff
  2022-02-20 23:17 ` Kurt Garloff
  0 siblings, 1 reply; 16+ messages in thread
From: Kurt Garloff @ 2022-02-20 22:26 UTC (permalink / raw)
  To: Olga Kornievskaia; +Cc: linux-nfs, Anna Schumaker, Trond Myklebust

Hi Olga,

your upstream commit 1976b2b3, applied to 5.15.24 as 6f283634, breaks NFS
for me.

This is while mounting many NFS filesystems from two NFS servers, one
Qnap (nfs v4.1) and one linux 5.15.16 knfsd (nfs v4.2).

The NFS mounts just would not succeed. This appears to happen to all
Qnap mounts and one of the mounts from the linux knfsd.

I did some bisecting in 5.15.24: reverting 6f283634 and the subsequent
NFS/SunRPC patches from you, Xiyu, and Anna recovered from this failure.
To be precise: I reverted 4403233b 4b22aa42 5ca123c9 c5ae18fa be67be6a
6f283634 2df6a47a 0c5d3bfb 3cb5b317 58967a23 bbf647ec and 38ae9387 in
5.15.24. I then started re-enabling them, and 2df6a47a is the last patch
that still results in a working NFS for me.

Looking at the culprit patch, I could not immediately see what's wrong
-- so I'll leave it to you. I guess the server does not return
fs_locations in the way it's expected and thus the NFS mount hangs.

I seem not to be the only one, see
https://bbs.archlinux.org/viewtopic.php?pid=2022938
https://bugs.archlinux.org/task/73860

HTH,

-- 
Kurt Garloff <kurt@garloff.de>


* Re: 6f283634 / 1976b2b3 breaks NFS (QNAP/Linux kNFSD)
  2022-02-20 22:26 6f283634 / 1976b2b3 breaks NFS (QNAP/Linux kNFSD) Kurt Garloff
@ 2022-02-20 23:17 ` Kurt Garloff
  2022-02-21  1:19   ` Kornievskaia, Olga
  0 siblings, 1 reply; 16+ messages in thread
From: Kurt Garloff @ 2022-02-20 23:17 UTC (permalink / raw)
  To: Olga Kornievskaia; +Cc: linux-nfs, Anna Schumaker, Trond Myklebust

Hi Olga,

two updates:

On 20.02.22 23:26, Kurt Garloff wrote:
> Hi Olga,
>
> your upstream commit 1976b2b3, applied to 5.15.24 as 6f283634 breaks NFS
> for me.
>
> This is while mounting many NFS filesystems from two NFS servers, one
> Qnap (nfs v4.1) and one linux 5.15.16 knfsd (nfs v4.2).
I have to correct myself. All volumes broken by 5.15.24 come from Qnap.
> The NFS mounts just would not succeed. This appears to happen to all
> Qnap mounts and one of the mounts from the linux knfsd.
This mount also came from Qnap -- in my mind I had migrated it already,
but not in reality :-O
> I did some bisecting in 5.15.24 ... reverting 6f283634 and subsequent
> NFS/sunRPC patches from you and Xiyu, Anna did the trick to recover from
> this failure.
> To be precise: I reverted 4403233b 4b22aa42 5ca123c9 c5ae18fa be67be6a
> 6f283634 2df6a47a 0c5d3bfb 3cb5b317 58967a23 bbf647ec and 38ae9387 in
> 5.15.24. I started reenabling and 2df6a47a is the last patch that
> still results in a working NFS for me.

Also, taking plain 5.15.24 and just reverting 6f283634 creates a
kernel that works well with Qnap NFS shares.

Best,

-- 
Kurt Garloff <kurt@garloff.de>



* Re: 6f283634 / 1976b2b3 breaks NFS (QNAP/Linux kNFSD)
  2022-02-20 23:17 ` Kurt Garloff
@ 2022-02-21  1:19   ` Kornievskaia, Olga
  2022-02-21  9:31     ` Kurt Garloff
  0 siblings, 1 reply; 16+ messages in thread
From: Kornievskaia, Olga @ 2022-02-21  1:19 UTC (permalink / raw)
  To: Kurt Garloff; +Cc: linux-nfs, Schumaker, Anna, Trond Myklebust



On 2/20/22, 6:17 PM, "Kurt Garloff" <kurt@garloff.de> wrote:

    Hi Olga,

    two updates:

    On 20.02.22 23:26, Kurt Garloff wrote:
    > Hi Olga,
    >
    > your upstream commit 1976b2b3, applied to 5.15.24 as 6f283634 breaks NFS
    > for me.
    >
    > This is while mounting many NFS filesystems from two NFS servers, one
    > Qnap (nfs v4.1) and one linux 5.15.16 knfsd (nfs v4.2).
    I have to correct myself. All volumes broken by 5.15.24 come from Qnap.
    > The NFS mounts just would not succeed. This appears to happen to all
    > Qnap mounts and one of the mounts from the linux knfsd.
    This mount also came from Qnap -- in my mind I had migrated it already,
    but not in reality :-O
    > I did some bisecting in 5.15.24 ... reverting 6f283634 and subsequent
    > NFS/sunRPC patches from you and Xiyu, Anna did the trick to recover from
    > this failure.
    > To be precise: I reverted 4403233b 4b22aa42 5ca123c9 c5ae18fa be67be6a
    > 6f283634 2df6a47a 0c5d3bfb 3cb5b317 58967a23 bbf647ec and 38ae9387 in
    > 5.15.24. I started reenabling and 2df6a47a is the last patch that
    > still results in a working NFS for me.

    Also, taking plain 5.15.24 and just reverting 6f283634 creates a
    kernel that works well with Qnap NFS shares.

Is it possible for you to provide a network trace?

    Best,

    --
    Kurt Garloff <kurt@garloff.de>




* Re: 6f283634 / 1976b2b3 breaks NFS (QNAP/Linux kNFSD)
  2022-02-21  1:19   ` Kornievskaia, Olga
@ 2022-02-21  9:31     ` Kurt Garloff
  2022-02-21 10:48       ` Kurt Garloff
  0 siblings, 1 reply; 16+ messages in thread
From: Kurt Garloff @ 2022-02-21  9:31 UTC (permalink / raw)
  To: Kornievskaia, Olga; +Cc: linux-nfs, Schumaker, Anna, Trond Myklebust

Hi Olga,

On 21.02.22 02:19, Kornievskaia, Olga wrote:
> [...]
> Is it possible for you to provide a network trace?

Yes.

Is tcpdump what you'd like to see? wireshark's dumpcap?
Any NFS specific tracing tools I should be using?

One trace with a working kernel and one with the broken one?

Best,

-- 
Kurt Garloff <kurt@garloff.de>
Cologne, Germany



* Re: 6f283634 / 1976b2b3 breaks NFS (QNAP/Linux kNFSD)
  2022-02-21  9:31     ` Kurt Garloff
@ 2022-02-21 10:48       ` Kurt Garloff
  2022-02-23  8:05         ` Kurt Garloff
  2022-02-23 18:06         ` Chuck Lever III
  0 siblings, 2 replies; 16+ messages in thread
From: Kurt Garloff @ 2022-02-21 10:48 UTC (permalink / raw)
  To: Kornievskaia, Olga; +Cc: linux-nfs, Schumaker, Anna, Trond Myklebust

Hi,

On 21.02.22 10:31, Kurt Garloff wrote:
> Hi Olga,
>
> On 21.02.22 02:19, Kornievskaia, Olga wrote:
>> [...]
>> Is it possible for you to provide a network trace?
>
> Yes.
>
> Is tcpdump what you'd like to see? wireshark's dumpcap?
> Any NFS specific tracing tools I should be using?
>
> One trace with a working kernel and one with the broken one?

Comparing the good and the bad trace ...

mount -t nfs 192.168.155.74:/Public /mnt/Public
against Qnap 4.3.4.xxx NFS v4.1 server.

Both do:

Establish conn
NFS NULL (ack)
NFS EXCHANGE_ID (4.2 -> NFS4ERR_MINOR_VERS_MISMATCH)
Teardown and reestablish
NFS NULL (ack)
NFS EXCHANGE_ID (4.1 -> ack)
NFS EXCHANGE_ID (4.1 -> ack)
NFS CREATE_SESSION (ack)
NFS RECLAIM_COMPLETE (CB_NULL, ack)
NFS SECINFO_NO_NAME (ack)
NFS PUTROOTFH|GETATTR (ack)
NFS GETATTR FH:0x62d40c52 (ack), 8 times
NFS ACCESS FH:0x62d40c52 (denied md xt dl, allowed rd lu)
NFS LOOKUP DH:0x62d40c52/Public (ack)
NFS LOOKUP DH:0x62d40c52/Public (ack)
NFS GETATTR FH:0x8ee88cee (ack), 3 times


Now the differences start:

The fixed NFS client repeatedly gets ack back, the broken NFS client gets

NFS GETATTR FH:0x8ee88cee (NFS4ERR_DELAY), repeating forever (exp. backoff)
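
As a rough illustration of why the mount appears to hang rather than fail,
here is a minimal C sketch of a retry schedule with capped exponential
backoff. It is not the kernel's implementation: the function names, the
initial delay, and the cap are assumptions made up for this example. Only
the behaviour (retry after every NFS4ERR_DELAY, waiting longer each time)
is taken from the trace above.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative values only; the real client's timings are not taken from this thread. */
#define SKETCH_INITIAL_DELAY_MS 100u
#define SKETCH_MAX_DELAY_MS     15000u

/* Double the previous delay, clamped to the assumed cap. */
static unsigned int next_delay_ms(unsigned int prev_ms)
{
	assert(prev_ms > 0);
	if (prev_ms >= SKETCH_MAX_DELAY_MS / 2)
		return SKETCH_MAX_DELAY_MS;
	return prev_ms * 2;
}

/*
 * Model a server that answers every GETATTR with "try again later".
 * Returns the total time spent waiting after max_attempts retries.
 * There is deliberately no success path: however large max_attempts
 * gets, the request never completes, so the mount never finishes.
 */
static unsigned long total_wait_ms(size_t max_attempts)
{
	unsigned long total = 0;
	unsigned int delay = SKETCH_INITIAL_DELAY_MS;

	for (size_t i = 0; i < max_attempts; i++) {
		total += delay;
		delay = next_delay_ms(delay);
	}
	return total;
}
```

With these assumed values, three retries wait 100 + 200 + 400 = 700 ms in
total, and later retries settle at the cap.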


If someone else wants to look at the pcapng data, let me know.

HTH,

-- 
Kurt Garloff <kurt@garloff.de>
Cologne, Germany



* Re: 6f283634 / 1976b2b3 breaks NFS (QNAP/Linux kNFSD)
  2022-02-21 10:48       ` Kurt Garloff
@ 2022-02-23  8:05         ` Kurt Garloff
  2022-02-23 14:22           ` Olga Kornievskaia
  2022-02-23 17:56           ` Olga Kornievskaia
  2022-02-23 18:06         ` Chuck Lever III
  1 sibling, 2 replies; 16+ messages in thread
From: Kurt Garloff @ 2022-02-23  8:05 UTC (permalink / raw)
  To: Kornievskaia, Olga; +Cc: linux-nfs, Schumaker, Anna, Trond Myklebust

Hi Olga,

any updates? Were you able to investigate the traces?

Breaking NFS mounts from Qnap (knfsd with a 3.4.6 kernel here,
though Qnap might have patched it) is not something that
should happen with a -stable kernel update, even if the problem
is on the Qnap side, which would not be completely
surprising.

So I think we should revert this patch at least for -stable,
unless we understand what's going on and have a better fix
than a plain revert.

Best,
-- 

Kurt Garloff <kurt@garloff.de>



* Re: 6f283634 / 1976b2b3 breaks NFS (QNAP/Linux kNFSD)
  2022-02-23  8:05         ` Kurt Garloff
@ 2022-02-23 14:22           ` Olga Kornievskaia
  2022-02-23 17:31             ` Kurt Garloff
  2022-02-23 17:56           ` Olga Kornievskaia
  1 sibling, 1 reply; 16+ messages in thread
From: Olga Kornievskaia @ 2022-02-23 14:22 UTC (permalink / raw)
  To: Kurt Garloff
  Cc: Kornievskaia, Olga, linux-nfs, Schumaker, Anna, Trond Myklebust

On Wed, Feb 23, 2022 at 8:20 AM Kurt Garloff <kurt@garloff.de> wrote:
>
> Hi Olga,
>
> any updates? Were you able to investigate the traces?
>
> Breaking NFS mounts from Qnap (knfsd with 3.4.6 kernel here,
> though Qnap might have patched it) is not something that
> should happen with a -stable kernel update, even if the problem
> would be on the Qnap side, which would not be completely
> surprising.
>
> So I think we should revert this patch at least for -stable,
> unless we understand what's going on and have a better fix
> than a plain revert.

Hi Kurt,

I apologize for the late response. I have looked at the network trace.
The problem stems from the broken server that claims to support
fs_locations but then decides to never reply to the query.

I can implement a mount option to say fs_locquery=off to handle mounts
against the broken servers?

However I would like to ask if the better path forward isn't to update
to the knfsd where the problem is fixed?

>
> Best,
> --
>
> Kurt Garloff <kurt@garloff.de>
>


* Re: 6f283634 / 1976b2b3 breaks NFS (QNAP/Linux kNFSD)
  2022-02-23 14:22           ` Olga Kornievskaia
@ 2022-02-23 17:31             ` Kurt Garloff
  2022-02-23 17:49               ` Olga Kornievskaia
  0 siblings, 1 reply; 16+ messages in thread
From: Kurt Garloff @ 2022-02-23 17:31 UTC (permalink / raw)
  To: Olga Kornievskaia
  Cc: Kornievskaia, Olga, linux-nfs, Schumaker, Anna, Trond Myklebust

Hi Olga,

thanks for coming back!

On 23.02.22 15:22, Olga Kornievskaia wrote:
> Hi Kurt,
> I apologize for the late response. I have looked at the network trace.
> The problem stems from the broken server that claims to support
> fs_locations but then decides to never reply to the query.
>
> I can implement a mount option to say fs_locquery=off to handle mounts
> against the broken servers?
>
> However I would like to ask if the better path forward isn't to update
> to the knfsd where the problem is fixed?

Well, I have run self-compiled kernels on Qnap appliances before (to
work around Qnap's ext4 breakage when doing the case-independent
name lookup), but it was a painful and cumbersome process and I don't
want to repeat it. Appliances are not meant to be used with custom
kernels.
Even if I did, this would not help the many other users ... Unless we
convince Qnap to provide patches for old appliances, we'll experience
breakage.

On my end, I have applied the attached patch, restricting the use
of FS_LOCATIONS to servers that advertise NFS v4.2 or later.

In the patch, you'll also see clearing the bit before it gets set.
This was spotted by seth, see
https://bbs.archlinux.org/viewtopic.php?pid=2023983#p2023983
In latest upstream kernels you'd also need to clear
NFS_CAP_CASE_PRESERVING | NFS_CAP_CASE_INSENSITIVE
so I wonder whether we should not just nullify the caps
bit field prior to testing and selectively setting flags.

With this patch, I can mount NFS volumes from Qnap knfsd
again without any special workarounds (such as nfsvers=3 or the
to-be-implemented setting that you suggest). I have no idea
whether or not we leave a lot of features behind by restricting
FS_LOCATIONS on the client side to servers >= NFS v4.2.
But this is certainly better than breaking in a -stable kernel update,
even if the server might be to blame.
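
The two changes described above (clear the capability bit first, then set
it only when the server both advertises fs_locations and speaks minor
version 2 or later) can be sketched in isolation. This is a standalone
illustration, not kernel code: the SKETCH_* constants and the function
sketch_update_caps() are hypothetical names with made-up bit values, and
the actual change is the attached patch.

```c
#include <stdint.h>

/* Hypothetical bit values for illustration; not the kernel's definitions. */
#define SKETCH_CAP_SECURITY_LABEL (1u << 0)
#define SKETCH_CAP_FS_LOCATIONS   (1u << 1)
#define SKETCH_ATTR_FS_LOCATIONS  (1u << 24)

/*
 * Recompute the FS_LOCATIONS capability from a fresh probe result.
 * caps:         capability bits carried over from an earlier probe
 * attr_word0:   first word of the attribute bitmask the server returned
 * minorversion: negotiated NFSv4 minor version (0, 1 or 2)
 */
uint32_t sketch_update_caps(uint32_t caps, uint32_t attr_word0,
			    unsigned int minorversion)
{
	/* Clear first, so a stale bit from an earlier probe cannot survive. */
	caps &= ~SKETCH_CAP_FS_LOCATIONS;

	/* Set only if the server advertises the attribute AND is v4.2+. */
	if ((attr_word0 & SKETCH_ATTR_FS_LOCATIONS) && minorversion >= 2)
		caps |= SKETCH_CAP_FS_LOCATIONS;

	return caps;
}
```

In this sketch, a v4.1 server that advertises the attribute ends up without
the capability, and unrelated bits such as SKETCH_CAP_SECURITY_LABEL are
left untouched.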

Best,

-- 
Kurt Garloff <kurt@garloff.de>
Cologne, Germany

[-- Attachment #2: nfs-restrict-fs-loc-to-nfs42.diff --]
[-- Type: text/x-patch, Size: 1326 bytes --]

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 389fa72d4ca9..fc29daf00a72 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -3880,8 +3880,8 @@ static int _nfs4_server_capabilities(struct nfs_server *server, struct nfs_fh *f
 			res.attr_bitmask[2] &= FATTR4_WORD2_NFS42_MASK;
 		}
 		memcpy(server->attr_bitmask, res.attr_bitmask, sizeof(server->attr_bitmask));
-		server->caps &= ~(NFS_CAP_ACLS | NFS_CAP_HARDLINKS |
-				  NFS_CAP_SYMLINKS| NFS_CAP_SECURITY_LABEL);
+		server->caps &= ~(NFS_CAP_ACLS | NFS_CAP_HARDLINKS | NFS_CAP_SYMLINKS
+				| NFS_CAP_SECURITY_LABEL | NFS_CAP_FS_LOCATIONS);
 		server->fattr_valid = NFS_ATTR_FATTR_V4;
 		if (res.attr_bitmask[0] & FATTR4_WORD0_ACL &&
 				res.acl_bitmask & ACL4_SUPPORT_ALLOW_ACL)
@@ -3894,7 +3894,8 @@ static int _nfs4_server_capabilities(struct nfs_server *server, struct nfs_fh *f
 		if (res.attr_bitmask[2] & FATTR4_WORD2_SECURITY_LABEL)
 			server->caps |= NFS_CAP_SECURITY_LABEL;
 #endif
-		if (res.attr_bitmask[0] & FATTR4_WORD0_FS_LOCATIONS)
+		/* Restrict FS_LOCATIONS to NFS v4.2+ to work around Qnap knfsd-3.4.6 bug */
+		if (res.attr_bitmask[0] & FATTR4_WORD0_FS_LOCATIONS && minorversion >= 2)
 			server->caps |= NFS_CAP_FS_LOCATIONS;
 		if (!(res.attr_bitmask[0] & FATTR4_WORD0_FILEID))
 			server->fattr_valid &= ~NFS_ATTR_FATTR_FILEID;


* Re: 6f283634 / 1976b2b3 breaks NFS (QNAP/Linux kNFSD)
  2022-02-23 17:31             ` Kurt Garloff
@ 2022-02-23 17:49               ` Olga Kornievskaia
  2022-02-23 22:35                 ` Kurt Garloff
  0 siblings, 1 reply; 16+ messages in thread
From: Olga Kornievskaia @ 2022-02-23 17:49 UTC (permalink / raw)
  To: Kurt Garloff
  Cc: Kornievskaia, Olga, linux-nfs, Schumaker, Anna, Trond Myklebust

On Wed, Feb 23, 2022 at 12:31 PM Kurt Garloff <kurt@garloff.de> wrote:
>
> Hi Olga,
>
> thanks for coming back!
>
> On 23.02.22 15:22, Olga Kornievskaia wrote:
> > Hi Kurt,
> > I apologize for the late response. I have looked at the network trace.
> > The problem stems from the broken server that claims to support
> > fs_locations but then decides to never reply to the query.
> >
> > I can implement a mount option to say fs_locquery=off to handle mounts
> > against the broken servers?
> >

I have posted a patch where you can mount with "notrunkdiscovery" and
that should fix the problem with the Qnap server?

> [...]


* Re: 6f283634 / 1976b2b3 breaks NFS (QNAP/Linux kNFSD)
  2022-02-23  8:05         ` Kurt Garloff
  2022-02-23 14:22           ` Olga Kornievskaia
@ 2022-02-23 17:56           ` Olga Kornievskaia
  2022-02-23 22:24             ` Kurt Garloff
  1 sibling, 1 reply; 16+ messages in thread
From: Olga Kornievskaia @ 2022-02-23 17:56 UTC (permalink / raw)
  To: Kurt Garloff
  Cc: Kornievskaia, Olga, linux-nfs, Schumaker, Anna, Trond Myklebust

On Wed, Feb 23, 2022 at 8:20 AM Kurt Garloff <kurt@garloff.de> wrote:
>
> Hi Olga,
>
> any updates? Were you able to investigate the traces?
>
> Breaking NFS mounts from Qnap (knfsd with 3.4.6 kernel here,
> though Qnap might have patched it) is not something that
> should happen with a -stable kernel update, even if the problem
> would be on the Qnap side, which would not be completely
> surprising.
>
> So I think we should revert this patch at least for -stable,
> unless we understand what's going on and have a better fix
> than a plain revert.

I haven't commented on your request for a revert in the stable
version. I'm not sure what the philosophy is there. I don't see why we
can't ask for this feature to only be available from the kernel
version it has been accepted into and not before. If you think the
kernel version that you want to use will always be before this feature
was accepted, then asking folks responsible for "stable" kernels seems
like a good idea. At the time of inclusion to stable, I wasn't aware
of the broken legacy server implementations out there.

>
> Best,
> --
>
> Kurt Garloff <kurt@garloff.de>
>


* Re: 6f283634 / 1976b2b3 breaks NFS (QNAP/Linux kNFSD)
  2022-02-21 10:48       ` Kurt Garloff
  2022-02-23  8:05         ` Kurt Garloff
@ 2022-02-23 18:06         ` Chuck Lever III
  2022-02-23 22:00           ` Trond Myklebust
  1 sibling, 1 reply; 16+ messages in thread
From: Chuck Lever III @ 2022-02-23 18:06 UTC (permalink / raw)
  To: Kurt Garloff, Kornievskaia, Olga
  Cc: Linux NFS Mailing List, Anna Schumaker, Trond Myklebust



> On Feb 21, 2022, at 5:48 AM, Kurt Garloff <kurt@garloff.de> wrote:
> 
> Hi,
> 
> On 21.02.22 10:31, Kurt Garloff wrote:
>> Hi Olga,
>> 
>> On 21.02.22 02:19, Kornievskaia, Olga wrote:
>>> [...]
>>> Is it possible for you to provide a network trace?
>> 
>> Yes.
>> 
>> Is tcpdump what you'd like to see? wireshark's dumpcap?
>> Any NFS specific tracing tools I should be using?
>> 
>> One trace with a working kernel and one with the broken one?
> 
> Comparing the good and the bad trace ...
> 
> mount -t nfs 192.168.155.74:/Public /mnt/Public
> against Qnap 4.3.4.xxx NFS v4.1 server.
> 
> Both do:
> 
> Establish conn
> NFS NULL (ack)
> NFS EXCHANGE_ID (4.2 -> NFS4ERR_MINOR_VERS_MISMATCH)
> Teardown and reestablish
> NFS NULL (ack)
> NFS EXCHANGE_ID (4.1 -> ack)
> NFS EXCHANGE_ID (4.1 -> ack)
> NFS CREATE_SESSION (ack)
> NFS RECLAIM_COMPLETE (CB_NULL, ack)
> NFS SECINFO_NO_NAME (ack)
> NFS PUTROOTFH|GETATTR (ack)
> NFS GETATTR FH:0x62d40c52 (ack), 8 times
> NFS ACCESS FH:0x62d40c52 (denied md xt dl, allowed rd lu)
> NFS LOOKUP DH:0x62d40c52/Public (ack)
> NFS LOOKUP DH:0x62d40c52/Public (ack)
> NFS GETATTR FH:0x8ee88cee (ack), 3 times
> 
> 
> Now the differences start:
> 
> The fixed NFS client repeatedly gets ack back, the broken NFS client gets
> 
> NFS GETATTR FH:0x8ee88cee (NFS4ERR_DELAY), repeating forever (exp. backoff)

Any idea why the server is not able to respond properly to
the GETATTR request? That seems like the root of the problem.

--
Chuck Lever





* Re: 6f283634 / 1976b2b3 breaks NFS (QNAP/Linux kNFSD)
  2022-02-23 18:06         ` Chuck Lever III
@ 2022-02-23 22:00           ` Trond Myklebust
  2022-02-23 22:22             ` Chuck Lever III
  0 siblings, 1 reply; 16+ messages in thread
From: Trond Myklebust @ 2022-02-23 22:00 UTC (permalink / raw)
  To: kurt, Olga.Kornievskaia, chuck.lever; +Cc: linux-nfs, Anna.Schumaker

On Wed, 2022-02-23 at 18:06 +0000, Chuck Lever III wrote:
> [...]
> Any idea why the server is not able to respond properly to
> the GETATTR request? That seems like the root of the problem.
> 

The GETATTR is a request for fs_locations in order to probe for
alternative IP addresses.

IIRC, some earlier implementations of knfsd had this response when the
mountd daemon wasn't configured to expect a referral upcall for that
location.

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com




* Re: 6f283634 / 1976b2b3 breaks NFS (QNAP/Linux kNFSD)
  2022-02-23 22:00           ` Trond Myklebust
@ 2022-02-23 22:22             ` Chuck Lever III
  0 siblings, 0 replies; 16+ messages in thread
From: Chuck Lever III @ 2022-02-23 22:22 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: kurt, Olga.Kornievskaia, Linux NFS Mailing List, Anna Schumaker



> On Feb 23, 2022, at 5:00 PM, Trond Myklebust <trondmy@hammerspace.com> wrote:
> 
> [...]
> 
> The GETATTR is a request for fs_locations in order to probe for
> alternative IP addresses.
> 
> IIRC, some earlier implementations of knfsd had this response when the
> mountd daemon wasn't configured to expect a referral upcall for that
> location.

knfsd, or mountd? Is there known to be a server-side fix available?


--
Chuck Lever





* Re: 6f283634 / 1976b2b3 breaks NFS (QNAP/Linux kNFSD)
  2022-02-23 17:56           ` Olga Kornievskaia
@ 2022-02-23 22:24             ` Kurt Garloff
  2022-02-24  8:17               ` Greg Kroah-Hartman
  0 siblings, 1 reply; 16+ messages in thread
From: Kurt Garloff @ 2022-02-23 22:24 UTC (permalink / raw)
  To: Olga Kornievskaia
  Cc: Kornievskaia, Olga, linux-nfs, Schumaker, Anna, Trond Myklebust,
	Greg Kroah-Hartman

Hi Olga,

On 23/02/2022 18:56, Olga Kornievskaia wrote:
> On Wed, Feb 23, 2022 at 8:20 AM Kurt Garloff <kurt@garloff.de> wrote:
>> [...]
> I haven't commented on your ask of requesting a revert in the stable
> version. I'm not sure what the philosophy is there. I don't see why we
> can't ask for this feature to only be available from the kernel
> version it has been accepted into and not before. If you think the
> kernel version that you want to use will always be before this feature
> was accepted, then asking folks responsible for "stable" kernels seems
> like a good idea. At the time of inclusion to stable, I wasn't aware
> of the broken legacy server implementations out there.

I guess Greg would need to comment on the detailed policies
for stable kernels.
One of the goals for sure is to avoid regressions. If that causes
bugs not to be fixable or features not to be available, then that's
a price that might need to be accepted. A regression is just many many
times worse than an unfixed issue, twice so for something that claims
to be stable.

So, if we are relatively sure that no NFSv4.2 server has the
kernel-3.4.6-knfsd Qnap (NFSv4.1) misbehavior, my change that masks the
new features for NFS<v4.2 might be what makes this patch acceptable
for stable. Otherwise, we should either revert it or make it
opt-in. The latter is not really a good idea if we then differ
from the main branch where we might go for an opt-out solution.
So maybe it's opt-out for main branch and for stable with an
additional guard against NFS<v4.2 at least for -stable.

Just my 0.02€.

-- 
Kurt Garloff <kurt@garloff.de>
Cologne, Germany




* Re: 6f283634 / 1976b2b3 breaks NFS (QNAP/Linux kNFSD)
  2022-02-23 17:49               ` Olga Kornievskaia
@ 2022-02-23 22:35                 ` Kurt Garloff
  0 siblings, 0 replies; 16+ messages in thread
From: Kurt Garloff @ 2022-02-23 22:35 UTC (permalink / raw)
  To: Olga Kornievskaia
  Cc: Kornievskaia, Olga, linux-nfs, Schumaker, Anna, Trond Myklebust

Hi Olga,

On 23/02/2022 18:49, Olga Kornievskaia wrote:
> I have posted a patch where you can mount with "notrunkdiscovery" and
> that should fix the problem with the Qnap server?

I have not seen it, unfortunately.

Care to copy me?

You have seen my patch that limits the
FS_LOCATIONS capability to NFS >= v4.2 and
I found this to be effective in making things
work again. Assuming that you check for
the mount parameter instead of the NFS version
to disable this feature, I would assume the
option to be effective. I'm happy to test
as soon as I get hold of the patch.

Thanks,

-- 
Kurt Garloff <kurt@garloff.de>
Cologne, Germany



* Re: 6f283634 / 1976b2b3 breaks NFS (QNAP/Linux kNFSD)
  2022-02-23 22:24             ` Kurt Garloff
@ 2022-02-24  8:17               ` Greg Kroah-Hartman
  0 siblings, 0 replies; 16+ messages in thread
From: Greg Kroah-Hartman @ 2022-02-24  8:17 UTC (permalink / raw)
  To: Kurt Garloff
  Cc: Olga Kornievskaia, Kornievskaia, Olga, linux-nfs, Schumaker,
	Anna, Trond Myklebust

On Wed, Feb 23, 2022 at 11:24:41PM +0100, Kurt Garloff wrote:
> Hi Olga,
> 
> On 23/02/2022 18:56, Olga Kornievskaia wrote:
> > [...]
> 
> I guess Greg would need to comment on the detailed policies
> for stable kernels.
> One of the goals for sure is to avoid regressions. If that causes
> bugs not to be fixable or features not to be available, then that's
> a price that might need to be accepted. A regression is just many many
> times worse than an unfixed issue, twice so for something that claims
> to be stable.

The policy for the stable kernel releases is the same as for Linus's
releases, "no user visible regressions are allowed".

There is no difference here, if something changes in one of Linus's
releases that breaks a working system, then it needs to be fixed.  The
stable kernels are not unique here at all.  Any user must be able to
always upgrade to a new kernel version without having to worry about
anything breaking.

So if there is a kernel change in Linus's tree that breaks existing
systems, it needs to be reverted or fixed to not do this.

thanks,

greg k-h
