linux-nfs.vger.kernel.org archive mirror
* nfs4err_delay
@ 2021-09-22 20:56 Kazi Anwar
  2021-09-24 16:39 ` nfs4err_delay J. Bruce Fields
  0 siblings, 1 reply; 8+ messages in thread
From: Kazi Anwar @ 2021-09-22 20:56 UTC (permalink / raw)
  To: linux-nfs

Hi,
We are running NFS v4.1 on CentOS 7.6.
We are seeing an NFS issue where, when files/dirs are deleted from a
client, they are occasionally stuck in the unlinkat system call (according
to strace, it is stuck for 100.5 secs every time). Can anyone explain
this behavior?
Running tcpdump shows an NFS4ERR_DELAY status sent from the server to
the stuck client.

--
Kazi Anwar


* Re: nfs4err_delay
  2021-09-22 20:56 nfs4err_delay Kazi Anwar
@ 2021-09-24 16:39 ` J. Bruce Fields
  2021-09-24 17:54   ` nfs4err_delay Kazi Anwar
  0 siblings, 1 reply; 8+ messages in thread
From: J. Bruce Fields @ 2021-09-24 16:39 UTC (permalink / raw)
  To: Kazi Anwar; +Cc: linux-nfs

On Wed, Sep 22, 2021 at 03:56:23PM -0500, Kazi Anwar wrote:
> We are running NFS v4.1 on CentOS 7.6.
> We are seeing an NFS issue where, when files/dirs are deleted from a
> client, they are occasionally stuck in the unlinkat system call (according
> to strace, it is stuck for 100.5 secs every time). Can anyone explain
> this behavior?
> Running tcpdump shows an NFS4ERR_DELAY status sent from the server to
> the stuck client.

Client and server are both centos 7.6?

Is the NFS4ERR_DELAY a response to a REMOVE?

Does /proc/locks show a delegation held on the file the client's trying
to remove?

--b.
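
One way to answer the /proc/locks question is to filter that file for
lease-type entries on the inode of the file being removed. The following is
a minimal sketch and not something taken from this thread. It assumes the
usual column layout of /proc/locks (id, type, state, access, pid,
major:minor:inode, start, end) and that delegations are listed as DELEG (or
as LEASE on kernels that do not distinguish the two). The sample text and
the helper names are hypothetical.

```python
from typing import List, NamedTuple


class LockEntry(NamedTuple):
    kind: str    # POSIX, FLOCK, LEASE, DELEG, ...
    state: str   # ADVISORY, ACTIVE, BREAKING, ...
    access: str  # READ, WRITE, ...
    pid: int
    device: str  # "major:minor" exactly as printed in the file
    inode: int


def parse_proc_locks(text: str) -> List[LockEntry]:
    """Parse the text of /proc/locks (assumed layout, see lead-in)."""
    entries = []
    for line in text.splitlines():
        # Drop the leading "N:" id and the "->" marker used for blocked waiters.
        fields = [f for f in line.split()[1:] if f != "->"]
        if len(fields) < 5:
            continue
        kind, state, access, pid, dev_ino = fields[:5]
        parts = dev_ino.split(":")
        if len(parts) != 3:
            continue  # unexpected device/inode field, skip the line
        major, minor, inode = parts
        entries.append(
            LockEntry(kind, state, access, int(pid), f"{major}:{minor}", int(inode))
        )
    return entries


def delegations_on(text: str, inode: int) -> List[LockEntry]:
    """Return lease or delegation entries held on the given inode number."""
    return [
        e for e in parse_proc_locks(text)
        if e.kind in ("LEASE", "DELEG") and e.inode == inode
    ]


# Hypothetical sample in the assumed format, for illustration only.
SAMPLE = """\
1: POSIX  ADVISORY  WRITE 2301 fd:00:1312 0 EOF
2: DELEG  ACTIVE    READ  1847 fd:02:524301 0 EOF
3: LEASE  BREAKING  READ  1847 fd:02:524377 0 EOF
"""
```

On the server, the real input would be open("/proc/locks").read(), and the
inode number of the file in question can be obtained with
os.stat(path).st_ino.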


* Re: nfs4err_delay
  2021-09-24 16:39 ` nfs4err_delay J. Bruce Fields
@ 2021-09-24 17:54   ` Kazi Anwar
  2021-09-24 19:03     ` nfs4err_delay J. Bruce Fields
  0 siblings, 1 reply; 8+ messages in thread
From: Kazi Anwar @ 2021-09-24 17:54 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: linux-nfs

Yes, both clients and server are centos 7.6. And the NFS4ERR_DELAY is
a response to a REMOVE.
I will need to check on the locks the next time it happens. Can you
share what you are thinking?

thanks,
Kazi



* Re: nfs4err_delay
  2021-09-24 17:54   ` nfs4err_delay Kazi Anwar
@ 2021-09-24 19:03     ` J. Bruce Fields
  0 siblings, 0 replies; 8+ messages in thread
From: J. Bruce Fields @ 2021-09-24 19:03 UTC (permalink / raw)
  To: Kazi Anwar; +Cc: linux-nfs

On Fri, Sep 24, 2021 at 12:54:30PM -0500, Kazi Anwar wrote:
> Yes, both clients and server are centos 7.6. And the NFS4ERR_DELAY is
> a response to a REMOVE.
> I will need to check on the locks the next time it happens. Can you
> share what you are thinking?

Offhand, the only reason I can think of for a server to return DELAY is that
there's a delegation on the file being removed, and the delegation
recall and return isn't working for some reason.

If that's the case, it should succeed after about 90 seconds.  Also, you
can work around the problem by turning off delegations and leases with
"echo 0 >/proc/sys/fs/leases-enable".

--b.
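
The "about 90 seconds" estimate and the 100.5 seconds seen in strace can be
reconciled with a little arithmetic. The sketch below rests on assumptions
that are not stated in this thread: that the server gives up on the recall
after 90 seconds, and that the client retries after NFS4ERR_DELAY with a
delay that starts at 0.1 s and doubles up to a 15 s cap (believed to match
the Linux client's NFS4_POLL_RETRY_MIN and NFS4_POLL_RETRY_MAX, but treat
both values as assumptions). The function name is hypothetical.

```python
def time_until_success_ms(recall_timeout_ms: int,
                          first_delay_ms: int = 100,
                          max_delay_ms: int = 15_000) -> int:
    """Return when the first retry at or after recall_timeout_ms happens.

    Model: the request at t=0 gets NFS4ERR_DELAY, the client sleeps and
    retries, and every retry made before the server's recall timeout is
    answered with NFS4ERR_DELAY again. Integer milliseconds are used so
    the sum is exact.
    """
    elapsed = 0
    delay = first_delay_ms
    while elapsed < recall_timeout_ms:
        elapsed += min(delay, max_delay_ms)  # sleep, then retry
        delay *= 2                           # exponential backoff
    return elapsed


if __name__ == "__main__":
    # Assumed 90 s recall timeout on the server.
    print(time_until_success_ms(90_000) / 1000)  # prints 100.5
```

Under these assumptions the retries fall at 0.1, 0.3, 0.7, 1.5, 3.1, 6.3,
12.7, 25.5, 40.5, 55.5, 70.5, 85.5 and 100.5 seconds, so the first retry
after the 90 second mark is the one at 100.5 seconds.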



* Re: NFS4ERR_DELAY
  2012-08-21 17:47         ` NFS4ERR_DELAY J. Bruce Fields
@ 2012-08-22  8:33           ` Sven Geggus
  0 siblings, 0 replies; 8+ messages in thread
From: Sven Geggus @ 2012-08-22  8:33 UTC (permalink / raw)
  To: linux-nfs

J. Bruce Fields <bfields@fieldses.org> wrote:

> Yep.  There's a recent regression which could cause this; could you try:
> 
>        https://lkml.org/lkml/2012/8/16/531

Patch applied, problem solved.

Thanks

Sven

-- 
To regain control of your account, please click on the connection
roar. (from an eBay phishing mail)

/me is giggls@ircnet, http://sven.gegg.us/ on the Web


* Re: NFS4ERR_DELAY
  2012-08-21 13:07       ` NFS4ERR_DELAY Jeff Layton
@ 2012-08-21 17:47         ` J. Bruce Fields
  2012-08-22  8:33           ` NFS4ERR_DELAY Sven Geggus
  0 siblings, 1 reply; 8+ messages in thread
From: J. Bruce Fields @ 2012-08-21 17:47 UTC (permalink / raw)
  To: Jeff Layton; +Cc: Sven Geggus, linux-nfs

On Tue, Aug 21, 2012 at 09:07:06AM -0400, Jeff Layton wrote:
> On Tue, 21 Aug 2012 12:52:16 +0000 (UTC)
> Sven Geggus <lists@fuchsschwanzdomain.de> wrote:
> > The NFS-server is providing the home for the user on both machines.
> > 
> > The ssh is now getting delayed for up to 1 minute because the NFS server
> > does not allow for the .Xauthority file to be deleted immediately.
> > 
> > It is probably worth mentioning that I'm currently experimenting with
> > btrfs on the server. Is there a chance that this bug will disappear when I
> > change the underlying filesystem of the server to ext4?
> > 
> > Sven
> > 
> 
> You asked for hints on how to debug it, and I gave one. The server will
> often return NFS4ERR_DELAY when it's waiting for a delegation recall to
> complete. I'd make sure that that's all working as expected.

Yep.  There's a recent regression which could cause this; could you try:

	https://lkml.org/lkml/2012/8/16/531

?

--b.


* Re: NFS4ERR_DELAY
  2012-08-21 12:52     ` NFS4ERR_DELAY Sven Geggus
@ 2012-08-21 13:07       ` Jeff Layton
  2012-08-21 17:47         ` NFS4ERR_DELAY J. Bruce Fields
  0 siblings, 1 reply; 8+ messages in thread
From: Jeff Layton @ 2012-08-21 13:07 UTC (permalink / raw)
  To: Sven Geggus; +Cc: linux-nfs

On Tue, 21 Aug 2012 12:52:16 +0000 (UTC)
Sven Geggus <lists@fuchsschwanzdomain.de> wrote:

> Jeff Layton <jlayton@redhat.com> wrote:
> 
> > It's often the case that this indicates a problem communicating over
> > the callback channel. For instance, the server is trying to recall a
> > delegation but the client isn't responding, so the server has to wait
> > until the recall attempt times out before proceeding.
> 
> Hm, I'm not sure if I understand this correctly.
> 
> I am talking about exactly 3 machines (and one single user for now) here:
> clientA, clientB and the NFS-server.
> 
> "user" is logged in on clientA and now opens a shell to ssh to clientB.
> 

Right, so you probably opened ~/.Xauthority on clientA and got a
delegation. Then you ssh'ed to clientB and opened the file there. At
that point, the server has to recall the delegation. Usually that's
pretty quick, but if the server can't talk to clientA on the callback
port then it has to wait and eventually time out before it can allow
the open on clientB to proceed.

> The NFS-server is providing the home for the user on both machines.
> 
> The ssh is now getting delayed for up to 1 minute because the NFS server
> does not allow for the .Xauthority file to be deleted immediately.
> 
> It is probably worth mentioning that I'm currently experimenting with
> btrfs on the server. Is there a chance that this bug will disappear when I
> change the underlying filesystem of the server to ext4?
> 
> Sven
> 

You asked for hints on how to debug it, and I gave one. The server will
often return NFS4ERR_DELAY when it's waiting for a delegation recall to
complete. I'd make sure that that's all working as expected.

-- 
Jeff Layton <jlayton@redhat.com>
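
For NFSv4.0, where the server opens its own connection back to the client
for callbacks, a quick reachability check run from the server can be
sketched as below. This is an illustration under assumptions: the function
name is hypothetical, and the host and port must be supplied by the reader
(the real port is whatever the client advertised or was configured with).
A successful TCP connect only shows that nothing is dropping the traffic,
not that the recall itself works. With NFSv4.1 and later, callbacks travel
over a connection the client has already set up, so this check does not
apply there.

```python
import socket


def callback_port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unreachable, or name resolution failed.
        return False
```

A call such as callback_port_reachable("clientA", 4048) would be run on the
NFS server. Both arguments here are placeholders, not values from this
thread.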


* Re: NFS4ERR_DELAY
  2012-08-21 10:41   ` Jeff Layton
@ 2012-08-21 12:52     ` Sven Geggus
  2012-08-21 13:07       ` NFS4ERR_DELAY Jeff Layton
  0 siblings, 1 reply; 8+ messages in thread
From: Sven Geggus @ 2012-08-21 12:52 UTC (permalink / raw)
  To: linux-nfs

Jeff Layton <jlayton@redhat.com> wrote:

> It's often the case that this indicates a problem communicating over
> the callback channel. For instance, the server is trying to recall a
> delegation but the client isn't responding, so the server has to wait
> until the recall attempt times out before proceeding.

Hm, I'm not sure if I understand this correctly.

I am talking about exactly 3 machines (and one single user for now) here:
clientA, clientB and the NFS-server.

"user" is logged in on clientA and now opens a shell to ssh to clientB.

The NFS-server is providing the home for the user on both machines.

The ssh is now getting delayed for up to 1 minute because the NFS server
does not allow for the .Xauthority file to be deleted immediately.

It is probably worth mentioning that I'm currently experimenting with
btrfs on the server. Is there a chance that this bug will disappear when I
change the underlying filesystem of the server to ext4?

Sven

-- 
Despite the growing spread of Linux, the bear, and thanks to Knut the
polar bear in particular, enjoys considerably greater popularity
than the penguin. (Found at http://telepolis.de/)
/me is giggls@ircnet, http://sven.gegg.us/ on the Web


Thread overview: 8+ messages
2021-09-22 20:56 nfs4err_delay Kazi Anwar
2021-09-24 16:39 ` nfs4err_delay J. Bruce Fields
2021-09-24 17:54   ` nfs4err_delay Kazi Anwar
2021-09-24 19:03     ` nfs4err_delay J. Bruce Fields
  -- strict thread matches above, loose matches on Subject: below --
2012-08-21  8:25 NFS4: ssh + unlink(~/.Xauthority) delays Sven Geggus
2012-08-21  9:19 ` NFS4ERR_DELAY (was: NFS4: ssh + unlink(~/.Xauthority) delays) Sven Geggus
2012-08-21 10:41   ` Jeff Layton
2012-08-21 12:52     ` NFS4ERR_DELAY Sven Geggus
2012-08-21 13:07       ` NFS4ERR_DELAY Jeff Layton
2012-08-21 17:47         ` NFS4ERR_DELAY J. Bruce Fields
2012-08-22  8:33           ` NFS4ERR_DELAY Sven Geggus
