* statd: server localhost not responding, timed out; lockd: cannot monitor <client>
From: Dan Wallis @ 2010-08-01  7:47 UTC
  To: linux-nfs

Hello list :)

This is my first post here, so if it's in the wrong place, or if I've
not read the appropriate documentation, please feel free to send me a
link to the relevant bits. I've tried searching the archives, and the
internet at large, but haven't yet come across anything that's helped.

Our NFS-shared file-system is locking up.

At the time, there are a lot of processes in "disk sleep" state, and
the load averages on our machines sky-rocket. The machines are
responsive on SSH, but the majority of our websites
(apache+mod_php) just hang, as does our email system (exim+dovecot).
Any websites which don't require write access to the file-system
continue to operate.

The load averages continue to rise until some kind of time-out is
reached, which takes at least 10-15 minutes. I've seen load averages over
800, yet the machines are still responsive for actions which don't
require writing to the shared file-system.

I've been investigating a variety of options, which have all turned
out to be red herrings: nagios, proftpd, bind, cron tasks.

I'm seeing these messages in the file server's system log:

Jul 30 09:37:17 fs0 kernel: [1810036.560046] statd: server localhost not responding, timed out
Jul 30 09:37:17 fs0 kernel: [1810036.560053] nsm_mon_unmon: rpc failed, status=-5
Jul 30 09:37:17 fs0 kernel: [1810036.560064] lockd: cannot monitor node2
Jul 30 09:38:22 fs0 kernel: [1810101.384027] statd: server localhost not responding, timed out
Jul 30 09:38:22 fs0 kernel: [1810101.384033] nsm_mon_unmon: rpc failed, status=-5
Jul 30 09:38:22 fs0 kernel: [1810101.384044] lockd: cannot monitor node0

Software involved:
VMware, Debian lenny (64-bit), an ancient Red Hat (32-bit, version 7 I
believe), Debian etch (32-bit)
NFS, apache2+mod_php, exim, dovecot, bind, amanda, proftpd, nagios,
cacti, drbd, heartbeat, keepalived, LVS, cron, ssmtp, NIS, svn,
puppet, memcache, mysql, postgres
Joomla!, Magento, Typo3, Midgard, Symfony, custom php apps

So far, I've spent a lot of time trying to reproduce the problem, but
haven't been able to do so.


If you've any ideas, please let me know. :)



Cheers
Dan


* Re: statd: server localhost not responding, timed out; lockd: cannot monitor <client>
From: Bian Naimeng @ 2010-08-02  9:12 UTC
  To: Dan Wallis; +Cc: linux-nfs

> Hello list :)
> 
> This is my first post here, so if it's in the wrong place, or if I've
> not read the appropriate documentation, please feel free to send me a
> link to the relevant bits. I've tried searching the archives, and the
> internet at large, but haven't yet come across anything that's helped.

   ... snip ...

> I'm seeing these messages in the file server's system log:
> 
> Jul 30 09:37:17 fs0 kernel: [1810036.560046] statd: server localhost not responding, timed out
> Jul 30 09:37:17 fs0 kernel: [1810036.560053] nsm_mon_unmon: rpc failed, status=-5
> Jul 30 09:37:17 fs0 kernel: [1810036.560064] lockd: cannot monitor node2
> Jul 30 09:38:22 fs0 kernel: [1810101.384027] statd: server localhost not responding, timed out
> Jul 30 09:38:22 fs0 kernel: [1810101.384033] nsm_mon_unmon: rpc failed, status=-5
> Jul 30 09:38:22 fs0 kernel: [1810101.384044] lockd: cannot monitor node0
> 

  It looks like the nfslock service (statd) is not running. Please try
  running the following commands:

  # service rpcbind start
  # service nfslock restart
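
  To confirm afterwards that statd is actually reachable, one rough check
  is to look for it in the portmapper's registration list. statd registers
  as RPC program 100024 ("status"). The helper below is only a sketch:
  the function name statd_registered is made up for illustration, and it
  assumes the usual column layout of `rpcinfo -p` output (program number
  in the first column).

```shell
# Hypothetical helper: reads `rpcinfo -p` output on stdin and succeeds
# (exit status 0) only if RPC program 100024 (statd, "status") is listed.
statd_registered() {
    awk '$1 == 100024 { found = 1 } END { exit !found }'
}

# Intended use on the file server (not run here):
#   rpcinfo -p localhost | statd_registered \
#       && echo "statd is registered" \
#       || echo "statd is NOT registered"
```

  If statd is not listed, lockd has nothing to talk to on localhost, which
  would match the "statd: server localhost not responding" messages. Note
  that the service names above are the Red Hat style ones; on Debian
  systems of that era they probably differ (likely the portmap and
  nfs-common init scripts), but that is an assumption about your setup.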

-- 
Regards
Bian Naimeng

