* Lockd error message is unclear.
@ 2021-04-27 19:03 Rogier Wolff
  2021-04-27 19:34 ` J. Bruce Fields
  0 siblings, 1 reply; 3+ messages in thread
From: Rogier Wolff @ 2021-04-27 19:03 UTC (permalink / raw)
  To: bfields, chuck.lever, linux-nfs


Hi, 

Two things..... 

I got: 

   lockd: cannot monitor <client> 

in the logfile and the client was terribly slow/not working at all.

Everything pointed to a lockd problem...

In the end, it turned out that my rpc.statd had stopped working.  I had
to go and download the sources to figure this out.  First, I would
suggest improving the error message to give others who run into this
more hints as to where to look.

The error message on line 169 of lockd.c could read:

	lockd: Error in the rpc to rpc.statd to monitor %s\n

Would it be an idea to print the res.status error code? 
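
A minimal userspace sketch of what such a message could look like. This is
not the kernel code: the helper name format_monitor_error and the exact
wording are assumptions made for illustration, and the status argument
stands in for whatever error code the failed call to rpc.statd produced.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical userspace helper (not kernel code): builds a log line that
 * names rpc.statd and includes the numeric status of the failed call.
 * The wording and the function name are assumptions for illustration.
 */
static int format_monitor_error(char *buf, size_t len,
                                const char *host, int status)
{
    assert(buf != NULL && host != NULL && len > 0);
    /* snprintf never writes more than len bytes, including the NUL. */
    return snprintf(buf, len,
                    "lockd: RPC to rpc.statd failed (status %d), "
                    "cannot monitor %s\n", status, host);
}
```

Calling format_monitor_error(buf, sizeof(buf), "client1", 1) fills buf with
a single line that mentions rpc.statd, the status value, and the host.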


That said... 

When this situation is going on, the client grinds to a halt, and
lockd seems "stuck" in D state. I tried killing or stracing it, to try
to clear the error, before I found out it is a kernel daemon...

When this failure happens, I get the impression that lockd keeps on
trying to be "of service", retrying operations that are bound to
fail. So maybe the error should be cached, and then immediately
handled instead of making the client grind to a halt. (It is the (one
second?) timeout in nsm_mon_unmon, plus the big backlog of requests that
result in the same call and timeout, that frustrates the client.)
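
The caching idea could be sketched roughly as follows. This is a userspace
illustration only, not how lockd works: struct fail_cache,
monitor_with_cache, the ttl field and the demo stub are all hypothetical
names, and time is passed in as a plain number of seconds so the behaviour
is easy to follow.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical record of the last failed upcall; not a kernel structure. */
struct fail_cache {
    long failed_at;  /* time of the last failure, in seconds */
    int  error;      /* cached negative errno, 0 if nothing is cached */
    long ttl;        /* how long to keep failing fast, in seconds */
};

/* Stand-in for the call to rpc.statd; returns 0 or a negative errno. */
typedef int (*upcall_fn)(const char *host);

/* Demo stub that always fails and counts how often it is really called. */
static int demo_calls;
static int demo_failing_upcall(const char *host)
{
    (void)host;
    demo_calls++;
    return -ECONNREFUSED;
}

/*
 * Return the cached error while it is still fresh, so that a backlog of
 * requests does not each wait for the same timeout. Once the ttl has
 * passed, try the upcall again and refresh or clear the cache.
 */
static int monitor_with_cache(struct fail_cache *c, upcall_fn upcall,
                              const char *host, long now)
{
    int err;

    assert(c != NULL && upcall != NULL);
    if (c->error != 0 && now - c->failed_at < c->ttl)
        return c->error;  /* fail fast, no upcall */

    err = upcall(host);
    if (err != 0) {
        c->error = err;
        c->failed_at = now;
    } else {
        c->error = 0;
    }
    return err;
}
```

With a ttl of 5 seconds, a failure at time 100 is returned straight from the
cache at time 102, and the upcall is only attempted again at time 105 or
later. The trade-off is that a restarted rpc.statd is not noticed until the
ttl expires.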

	Roger. 


-- 
** R.E.Wolff@BitWizard.nl ** https://www.BitWizard.nl/ ** +31-15-2049110 **
**    Delftechpark 11 2628 XJ  Delft, The Netherlands.  KVK: 27239233    **
f equals m times a. When your f is steady, and your m is going down
your a is going up.  -- Chris Hadfield about flying up the space shuttle.


* Re: Lockd error message is unclear.
  2021-04-27 19:03 Lockd error message is unclear Rogier Wolff
@ 2021-04-27 19:34 ` J. Bruce Fields
  2021-04-27 21:10   ` Rogier Wolff
  0 siblings, 1 reply; 3+ messages in thread
From: J. Bruce Fields @ 2021-04-27 19:34 UTC (permalink / raw)
  To: Rogier Wolff; +Cc: chuck.lever, linux-nfs

On Tue, Apr 27, 2021 at 09:03:11PM +0200, Rogier Wolff wrote:
> 
> Hi, 
> 
> Two things..... 
> 
> I got: 
> 
>    lockd: cannot monitor <client> 
> 
> in the logfile and the client was terribly slow/not working at all.
> 
> everything pointed to a lockd problem... 
> 
> In the end... it turns out that my rpc.statd stopped working.  I had
> to go and download the sources to figure this out... I would firstly
> suggest to improve the error message to give others running into this
> more hints as to where to look.
> 
> The error message on line 169 of lockd.c could read:
> 
> 	lockd: Error in the rpc to rpc.statd to monitor %s\n
> 
> Would it be an idea to print the res.status error code? 

I'm not sure about the wording, but including the error code sounds like
a good idea.  (Would that have made a difference in your case?)

> That said... 
> 
> When this situation is going on, the client grinds to a halt, and
> lockd seems "stuck" in D state. I tried killing or stracing it, to try
> to clear the error, before I found out it is a kernel daemon...
> 
> When this failure happens, I get the impression that lockd keeps on
> trying to be "of service", retrying operations that are bound to
> fail. So maybe the error should be cached, and then immediately
> handled instead of making the client grind to a halt. (it is the (one
> second?) timeout in nsm_mon_unmon and the big backlog of requests that
> result in the same call and timeout that frustrate the client... )

The -ECONNREFUSED case?

I'm not sure why it retries there.  Maybe just to allow stopping and
starting rpc.statd (e.g. for upgrades) without failing operations?

--b.


* Re: Lockd error message is unclear.
  2021-04-27 19:34 ` J. Bruce Fields
@ 2021-04-27 21:10   ` Rogier Wolff
  0 siblings, 0 replies; 3+ messages in thread
From: Rogier Wolff @ 2021-04-27 21:10 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: chuck.lever, linux-nfs

On Tue, Apr 27, 2021 at 03:34:52PM -0400, J. Bruce Fields wrote:
> On Tue, Apr 27, 2021 at 09:03:11PM +0200, Rogier Wolff wrote:
> > 
> > Hi, 
> > 
> > Two things..... 
> > 
> > I got: 
> > 
> >    lockd: cannot monitor <client> 
> > 
> > in the logfile and the client was terribly slow/not working at all.
> > 
> > everything pointed to a lockd problem... 
> > 
> > In the end... it turns out that my rpc.statd stopped working.  I had
> > to go and download the sources to figure this out... I would firstly
> > suggest to improve the error message to give others running into this
> > more hints as to where to look.
> > 
> > The error message on line 169 of lockd.c could read:
> > 
> > 	lockd: Error in the rpc to rpc.statd to monitor %s\n
> > 
> > Would it be an idea to print the res.status error code? 
> 
> I'm not sure about the wording, but including the error code sounds like
> a good idea.  (Would that have made a difference in your case?)

Not sure. Of course I was just "looking for a solution". So once I
figured out that rpc.statd was missing I went looking for how that
came about. 

But as it was, the apparent culprit was "lockd is misbehaving". With a
better error message you can shift the blame away from your part of
the system. :-)

> > second?) timeout in nsm_mon_unmon and the big backlog of requests that
> > result in the same call and timeout that frustrate the client... )
> 
> The -ECONNREFUSED case?
> 
> I'm not sure why it retries there.  Maybe just to allow stopping and
> starting rpc.statd (e.g. for upgrades) without failing operations?

Not sure IF it was retrying. Maybe not. But starting "google-chrome"
with 40 open tabs didn't progress to any tabs loading within the half
hour that I spent looking for why this was happening (unable to google
for a solution). In the meantime it was constantly spewing the
error message, rate limited to 10 per minute.

	Roger.

