* NFSv3/NFSv4 problem.
@ 2010-03-01 15:01 Anton Starikov
  2010-03-02 17:16 ` J. Bruce Fields
  0 siblings, 1 reply; 5+ messages in thread
From: Anton Starikov @ 2010-03-01 15:01 UTC (permalink / raw)
  To: linux-nfs

Hi,

My config is a diskless NFSv3 nfsroot (plus some extra NFSv3 mounts) and an NFSv4 /home/* automount.
CentOS 5.4, kernel 2.6.18-164.11.1.el5.

Periodically my nodes hang, and nothing appears in the logs (remote syslog + netconsole).
The node is still partly alive: it answers ping, and some daemons (for example pbs_mom) report that it is alive, etc.
But anything that requires filesystem access is frozen.

Another symptom: it looks like portmap doesn't answer. At least, if I try "rpcinfo -p node_name", it ends with
"rpcinfo: can't contact portmapper: rpcinfo: RPC: Timed out"
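
One way to narrow down the portmap symptom is to separate "the TCP port is unreachable" from "the port accepts connections but the daemon never answers". The following is only a minimal sketch, not part of the original report: probe_tcp is a hypothetical helper name, and it assumes the portmapper listens on the well-known TCP port 111. It makes a plain TCP connect and sends no RPC request, so it cannot replace rpcinfo.

```python
import socket

PORTMAP_PORT = 111  # well-known portmapper/rpcbind port (assumed default)


def probe_tcp(host, port=PORTMAP_PORT, timeout=5.0):
    """Classify a plain TCP connect attempt to host:port.

    Returns "connected", "refused", "timeout", or "error: <reason>".
    No RPC request is sent; only the TCP handshake is attempted.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "connected"
    except socket.timeout:
        return "timeout"
    except ConnectionRefusedError:
        return "refused"
    except OSError as exc:
        return "error: %s" % exc


# Example (node_name is a placeholder for the hung node):
#   print(probe_tcp("node_name"))
```

If this reports "connected" while rpcinfo still times out, that would suggest the kernel is still accepting connections but the portmap process itself is stuck, which would be consistent with a process blocked on filesystem access. That is an interpretation, not something confirmed in this thread.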

In principle, this could have something to do with locking.
At least, I had to mount all my NFSv3 mounts with nolock to reduce the frequency of the problem (the nfsroot was nolock already, obviously, but there are a couple of extra v3 mounts, like /opt with extra software and a RW directory for torque).
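
To confirm which NFS mounts on a node really ended up with nolock, the mount table can be checked directly. The following is a rough sketch under stated assumptions: nfs_mounts is a hypothetical helper, the input is text in /proc/mounts format (device, mount point, filesystem type, options), and the SAMPLE entries, including the server name and export paths, are invented for illustration and are not taken from the real nodes.

```python
def nfs_mounts(mounts_text):
    """Return (mount_point, fstype, uses_nolock) for each NFS entry.

    mounts_text is expected in /proc/mounts format:
    device mount_point fstype options dump pass
    """
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Skip malformed lines and anything that is not NFSv3/NFSv4.
        if len(fields) < 4 or fields[2] not in ("nfs", "nfs4"):
            continue
        options = fields[3].split(",")
        result.append((fields[1], fields[2], "nolock" in options))
    return result


# Invented sample data; on a real node read /proc/mounts instead.
SAMPLE = """\
server:/export/root / nfs rw,vers=3,nolock,proto=tcp 0 0
server:/export/opt /opt nfs rw,vers=3,proto=tcp 0 0
server:/export/home/anton /home/anton nfs4 rw,proto=tcp 0 0
tmpfs /dev/shm tmpfs rw 0 0
"""

for mount_point, fstype, nolock in nfs_mounts(SAMPLE):
    print(mount_point, fstype, "nolock" if nolock else "lock")
```

On a live system the same function could be fed open("/proc/mounts").read(). Whether the running kernel lists nolock among the displayed options is an assumption worth verifying on one known mount first.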

What could be the problem here?

What kind of information do I have to collect from the system to figure out what the real problem is?

Anton. 


* Re: NFSv3/NFSv4 problem.
  2010-03-01 15:01 NFSv3/NFSv4 problem Anton Starikov
@ 2010-03-02 17:16 ` J. Bruce Fields
       [not found]   ` <0A63A1BA-F749-4CFF-B77D-98AEFC531035@gmail.com>
  0 siblings, 1 reply; 5+ messages in thread
From: J. Bruce Fields @ 2010-03-02 17:16 UTC (permalink / raw)
  To: Anton Starikov; +Cc: linux-nfs

On Mon, Mar 01, 2010 at 04:01:42PM +0100, Anton Starikov wrote:
> Hi, 
> 
> 
> My config is a diskless NFSv3 nfsroot (plus some extra NFSv3 mounts) and an NFSv4 /home/* automount.
> CentOS 5.4, kernel 2.6.18-164.11.1.el5.

That's the client?  What's the server?

That's a pretty old kernel; I'd file a bug with CentOS.

> Periodically my nodes hang, and nothing appears in the logs (remote syslog + netconsole).
> The node is still partly alive: it answers ping, and some daemons (for example pbs_mom) report that it is alive, etc.
> But anything that requires filesystem access is frozen.
> 
> Another symptom: it looks like portmap doesn't answer. At least, if I try "rpcinfo -p node_name", it ends with
> "rpcinfo: can't contact portmapper: rpcinfo: RPC: Timed out"
> 
> In principle, this could have something to do with locking.
> At least, I had to mount all my NFSv3 mounts with nolock to reduce the frequency of the problem (the nfsroot was nolock already, obviously, but there are a couple of extra v3 mounts, like /opt with extra software and a RW directory for torque).
> 
> What could be the problem here?
> 
> What kind of information do I have to collect from the system to figure out what the real problem is?

Is there any server-side logging?

Can you see any interesting network traffic after the hang?

--b.


* Re: NFSv3/NFSv4 problem.
       [not found]   ` <0A63A1BA-F749-4CFF-B77D-98AEFC531035@gmail.com>
@ 2010-03-02 17:52     ` J. Bruce Fields
       [not found]       ` <F4A02563-59B0-4A9B-B773-4FCF5FD2A8D4@gmail.com>
  0 siblings, 1 reply; 5+ messages in thread
From: J. Bruce Fields @ 2010-03-02 17:52 UTC (permalink / raw)
  To: Anton Starikov; +Cc: linux-nfs

On Tue, Mar 02, 2010 at 06:20:43PM +0100, Anton Starikov wrote:
> 
> On Mar 2, 2010, at 6:16 PM, J. Bruce Fields wrote:
> 
> > On Mon, Mar 01, 2010 at 04:01:42PM +0100, Anton Starikov wrote:
> >> Hi, 
> >> 
> >> 
> >> My config is a diskless NFSv3 nfsroot (plus some extra NFSv3 mounts) and an NFSv4 /home/* automount.
> >> CentOS 5.4, kernel 2.6.18-164.11.1.el5.
> > 
> > That's the client?  What's the server?
> 
> The server is OpenSolaris.
> 
> > That's a pretty old kernel; I'd file a bug with CentOS.
> 
> Unfortunately, with newer kernels this setup is even more problematic. :)

Any details?

As a rule, this list is probably going to be a better place to handle
bugs with the latest upstream kernels, and your distributor is more
likely to be useful for their kernels.

--b.

> 
> >> 
> >> What kind of information do I have to collect from the system to figure out what the real problem is?
> > 
> > Is there any server-side logging?
> > Can you see any interesting network traffic after the hang?
> 
> It's always unfortunate, but for the last couple of days I can't get a hang :) Nothing has changed in the setup, though, so it will happen again anyway.
> 
> 
> Anton.
> 


* Re: NFSv3/NFSv4 problem.
       [not found]       ` <F4A02563-59B0-4A9B-B773-4FCF5FD2A8D4@gmail.com>
@ 2010-03-03 15:45         ` J. Bruce Fields
  2010-03-04 13:36           ` Anton Starikov
  0 siblings, 1 reply; 5+ messages in thread
From: J. Bruce Fields @ 2010-03-03 15:45 UTC (permalink / raw)
  To: Anton Starikov; +Cc: linux-nfs

On Tue, Mar 02, 2010 at 07:05:08PM +0100, Anton Starikov wrote:
> 
> On Mar 2, 2010, at 6:52 PM, J. Bruce Fields wrote:
> 
> >> 
> >>> That's a pretty old kernel; I'd file a bug with CentOS.
> >> 
> >> Unfortunately, with newer kernels this setup is even more problematic. :)
> > 
> > Any details?
> > 
> > As a rule, this list is probably going to be a better place to handle
> > bugs with the latest upstream kernels, and your distributor is more
> > likely to be useful for their kernels.
> 

> I submitted that to this list about a year ago. It seems that one of
> the biggest issues is that with an NFS3 root and an NFS4 /home, idmapd
> gets deadlocked. To resolve NFS4 credentials it needs to access NFS3,
> which is blocked waiting for the NFS4 operation to finish. I tried to
> move a lot of stuff to tmpfs, but it didn't resolve the situation as
> long as root is still NFS3.

Do you have a pointer to the previous discussion?

--b.

> 
> My general observation is that there is a trend: the newer the kernel,
> the faster you get a deadlock with this setup :)
> 
> Anton.
> 
> 
> 
> 
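
The circular wait described in the quoted text (an NFS4 operation needs idmapd, idmapd needs the NFS3 root, and NFS3 access is blocked until the NFS4 operation finishes) can be written down as a wait-for graph and checked for a cycle. This is only an illustrative sketch of that description, not a diagnosis: the node labels and the find_cycle helper are hypothetical, and the edges come from Anton's account rather than from any trace of the kernel.

```python
def find_cycle(waits_for):
    """Return one cycle in a wait-for graph as a list of nodes, or None.

    waits_for maps each node to the nodes it is blocked on.
    Uses a depth-first search that tracks the current path.
    """
    path, done = [], set()

    def visit(node):
        if node in done:
            return None
        if node in path:
            # Reached a node already on the current path: a cycle.
            return path[path.index(node):] + [node]
        path.append(node)
        for blocker in waits_for.get(node, ()):
            cycle = visit(blocker)
            if cycle:
                return cycle
        path.pop()
        done.add(node)
        return None

    for start in list(waits_for):
        cycle = visit(start)
        if cycle:
            return cycle
    return None


# Edges as described in the quoted message (labels are illustrative).
WAITS_FOR = {
    "NFSv4 operation on /home": ["idmapd"],
    "idmapd": ["NFSv3 root access"],
    "NFSv3 root access": ["NFSv4 operation on /home"],
}

print(" -> ".join(find_cycle(WAITS_FOR)))
```

If the description is accurate, moving only some files to tmpfs would not break the cycle as long as anything idmapd touches still lives on the NFS3 root, which would match the observation in the quoted message.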


* Re: NFSv3/NFSv4 problem.
  2010-03-03 15:45         ` J. Bruce Fields
@ 2010-03-04 13:36           ` Anton Starikov
  0 siblings, 0 replies; 5+ messages in thread
From: Anton Starikov @ 2010-03-04 13:36 UTC (permalink / raw)
  Cc: linux-nfs

What can I do to debug the problem?

This issue is killing me!

BTW, I also created a CentOS bug report.

Anton.

