* nfsroot clients hang while mounting second NFS server
@ 2003-07-11  1:23 Chris Adams
  2003-07-11  9:34 ` Trond Myklebust
  0 siblings, 1 reply; 4+ messages in thread
From: Chris Adams @ 2003-07-11  1:23 UTC (permalink / raw)
  To: nfs

We have a small cluster using a single head node and a number of 
diskless compute nodes, all running 2.4.20-openmosix. We have an old 
NAS box which contains the NFS root filesystems for the compute nodes 
and our data directories. Now I'm trying to add another NAS box into 
the mix and have found it an easy way to kill our nodes.

Basic network connectivity is fine and rpcinfo / showmount on the nodes 
works as expected. No matter what options I use (hard/soft, tcp/udp, 
etc.) as soon as I try to mount the new server the nodes will hang. At 
that point they are still pingable but nothing higher level will 
respond and all open connections will fail.

Here's the total traffic I see from the new NAS server's perspective - 
10.0.73.29 is the node, 192.168.1.20 is the new file server:

  18   7.186921   10.0.73.29 -> 192.168.1.20 TCP 722 > sunrpc [SYN] 
Seq=2685755181 Ack=0 Win=5840 Len=0 MSS=1460 TSV=133216 TSER=0 WS=0
  19   7.187115   10.0.73.29 -> 192.168.1.20 TCP 722 > sunrpc [ACK] 
Seq=2685755182 Ack=2199399551 Win=5840 Len=0 TSV=133216 TSER=131447901
  20   7.187190   10.0.73.29 -> 192.168.1.20 Portmap V2 DUMP Call XID 
0x524a6869
  21   7.187581   10.0.73.29 -> 192.168.1.20 TCP 722 > sunrpc [ACK] 
Seq=2685755226 Ack=2199399951 Win=6432 Len=0 TSV=133216 TSER=131447901
  22   7.187703   10.0.73.29 -> 192.168.1.20 TCP 722 > sunrpc [ACK] 
Seq=2685755226 Ack=2199400751 Win=8000 Len=0 TSV=133216 TSER=131447901
  23   7.187875   10.0.73.29 -> 192.168.1.20 TCP 722 > sunrpc [ACK] 
Seq=2685755226 Ack=2199400875 Win=8000 Len=0 TSV=133216 TSER=131447901
  24   7.187900   10.0.73.29 -> 192.168.1.20 TCP 722 > sunrpc [FIN, ACK] 
Seq=2685755226 Ack=2199400875 Win=8000 Len=0 TSV=133216 TSER=131447901
  25   7.187999   10.0.73.29 -> 192.168.1.20 MOUNT V3 MNT Call XID 
0xe10349f
  26   7.188107   10.0.73.29 -> 192.168.1.20 TCP 722 > sunrpc [ACK] 
Seq=2685755227 Ack=2199400876 Win=8000 Len=0 TSV=133216 TSER=131447901
  27   7.199825   10.0.73.29 -> 192.168.1.20 Portmap V2 GETPORT Call XID 
0x1832c2fd
  30   7.200203 192.168.1.20 -> 10.0.73.29   Portmap V2 GETPORT Reply 
XID 0x1832c2fd

At that point the machine is dead and the server won't see another 
packet from it. We're having some problems with the terminal server so 
I can't tell if it's dumping a panic to the serial console.

Is there a known problem with the in-kernel NFS support which might 
cause problems with multiple NFS servers? We aren't having problems on 
the other machines with the 2.4.20 kernel which makes me suspect the 
nfsroot support is involved somehow but I haven't found anything 
pertinent in my searches.

Thanks,
Chris



-------------------------------------------------------
This SF.Net email sponsored by: Parasoft
Error proof Web apps, automate testing & more.
Download & eval WebKing and get a free book.
www.parasoft.com/bulletproofapps1
_______________________________________________
NFS maillist  -  NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs


* Re: nfsroot clients hang while mounting second NFS server
  2003-07-11  1:23 nfsroot clients hang while mounting second NFS server Chris Adams
@ 2003-07-11  9:34 ` Trond Myklebust
  2003-07-11 19:07   ` Chris Adams
  0 siblings, 1 reply; 4+ messages in thread
From: Trond Myklebust @ 2003-07-11  9:34 UTC (permalink / raw)
  To: Chris Adams; +Cc: nfs


Looks like it hangs just after a GETPORT call. Any info on what that
GETPORT call is for (i.e. what the arguments are)? Ethereal should be
able to decode that information for you...

Cheers,
  Trond



* Re: nfsroot clients hang while mounting second NFS server
  2003-07-11  9:34 ` Trond Myklebust
@ 2003-07-11 19:07   ` Chris Adams
  2003-07-11 19:27     ` Trond Myklebust
  0 siblings, 1 reply; 4+ messages in thread
From: Chris Adams @ 2003-07-11 19:07 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: nfs

[-- Attachment #1: Type: text/plain, Size: 559 bytes --]

On Friday, July 11, 2003, at 02:34  AM, Trond Myklebust wrote:
> Looks like it hangs just after a GETPORT call. Any info on what that
> GETPORT call is for (i.e. what the arguments are)? Ethereal should be
> able to decode that information for you...

I've attached the full output - here's the request info & response:

Request:
Portmap
     Program Version: 2
     V2 Procedure: GETPORT (3)
     Program: NFS (100003)
     Version: 3
     Proto: UDP (17)
     Port: 0

Reply:
Portmap
     Program Version: 2
     V2 Procedure: GETPORT (3)
     Port: 2049



[-- Attachment #2: portmap-traffic.txt --]
[-- Type: text/plain, Size: 7502 bytes --]

Frame 20 (110 bytes on wire, 110 bytes captured)
    Arrival Time: Jul  8, 2003 18:24:11.858187000
    Time delta from previous packet: 7.187190000 seconds
    Time relative to first packet: 7.187190000 seconds
    Frame Number: 20
    Packet Length: 110 bytes
    Capture Length: 110 bytes
Ethernet II, Src: 00:09:b6:11:cd:fc, Dst: 00:04:76:3b:63:a6
    Destination: 00:04:76:3b:63:a6 (00:04:76:3b:63:a6)
    Source: 00:09:b6:11:cd:fc (00:09:b6:11:cd:fc)
    Type: IP (0x0800)
Internet Protocol, Src Addr: 10.0.73.29 (10.0.73.29), Dst Addr: 192.168.1.20 (198.202.70.20)
    Version: 4
    Header length: 20 bytes
    Differentiated Services Field: 0x00 (DSCP 0x00: Default; ECN: 0x00)
        0000 00.. = Differentiated Services Codepoint: Default (0x00)
        .... ..0. = ECN-Capable Transport (ECT): 0
        .... ...0 = ECN-CE: 0
    Total Length: 96
    Identification: 0x574f
    Flags: 0x04
        .1.. = Don't fragment: Set
        ..0. = More fragments: Not set
    Fragment offset: 0
    Time to live: 63
    Protocol: TCP (0x06)
    Header checksum: 0x844d (correct)
    Source: 10.0.73.29 (10.0.73.29)
    Destination: 192.168.1.20 (198.202.70.20)
Transmission Control Protocol, Src Port: 722 (722), Dst Port: sunrpc (111), Seq: 2685755182, Ack: 2199399551, Len: 44
    Source port: 722 (722)
    Destination port: sunrpc (111)
    Sequence number: 2685755182
    Next sequence number: 2685755226
    Acknowledgement number: 2199399551
    Header length: 32 bytes
    Flags: 0x0018 (PSH, ACK)
        0... .... = Congestion Window Reduced (CWR): Not set
        .0.. .... = ECN-Echo: Not set
        ..0. .... = Urgent: Not set
        ...1 .... = Acknowledgment: Set
        .... 1... = Push: Set
        .... .0.. = Reset: Not set
        .... ..0. = Syn: Not set
        .... ...0 = Fin: Not set
    Window size: 5840
    Checksum: 0xbf86 (correct)
    Options: (12 bytes)
        NOP
        NOP
        Time stamp: tsval 133216, tsecr 131447901
Remote Procedure Call
    Fragment header: Last fragment, 40 bytes
        1... .... .... .... .... .... .... .... = Last Fragment: Yes
        .000 0000 0000 0000 0000 0000 0010 1000 = Fragment Length: 40
    XID: 0x524a6869 (1380608105)
    Message Type: Call (0)
    RPC Version: 2
    Program: Portmap (100000)
    Program Version: 2
    Procedure: DUMP (4)
    Credentials
        Flavor: AUTH_NULL (0)
        Length: 0
    Verifier
        Flavor: AUTH_NULL (0)
        Length: 0
Portmap
    Program Version: 2
    V2 Procedure: DUMP (4)

0000  00 04 76 3b 63 a6 00 09 b6 11 cd fc 08 00 45 00   ..v;c.........E.
0010  00 60 57 4f 40 00 3f 06 84 4d 0a 00 49 1d c6 ca   .`WO@.?..M..I...
0020  46 14 02 d2 00 6f a0 15 5f 2e 83 18 2c 7f 80 18   F....o.._...,...
0030  16 d0 bf 86 00 00 01 01 08 0a 00 02 08 60 07 d5   .............`..
0040  bc 5d 80 00 00 28 52 4a 68 69 00 00 00 00 00 00   .]...(RJhi......
0050  00 02 00 01 86 a0 00 00 00 02 00 00 00 04 00 00   ................
0060  00 00 00 00 00 00 00 00 00 00 00 00 00 00         ..............

Frame 27 (98 bytes on wire, 98 bytes captured)
    Arrival Time: Jul  8, 2003 18:24:11.870822000
    Time delta from previous packet: 0.012635000 seconds
    Time relative to first packet: 7.199825000 seconds
    Frame Number: 27
    Packet Length: 98 bytes
    Capture Length: 98 bytes
Ethernet II, Src: 00:09:b6:11:cd:fc, Dst: 00:04:76:3b:63:a6
    Destination: 00:04:76:3b:63:a6 (00:04:76:3b:63:a6)
    Source: 00:09:b6:11:cd:fc (00:09:b6:11:cd:fc)
    Type: IP (0x0800)
Internet Protocol, Src Addr: 10.0.73.29 (10.0.73.29), Dst Addr: 192.168.1.20 (198.202.70.20)
    Version: 4
    Header length: 20 bytes
    Differentiated Services Field: 0x00 (DSCP 0x00: Default; ECN: 0x00)
        0000 00.. = Differentiated Services Codepoint: Default (0x00)
        .... ..0. = ECN-Capable Transport (ECT): 0
        .... ...0 = ECN-CE: 0
    Total Length: 84
    Identification: 0x0000
    Flags: 0x04
        .1.. = Don't fragment: Set
        ..0. = More fragments: Not set
    Fragment offset: 0
    Time to live: 63
    Protocol: UDP (0x11)
    Header checksum: 0xdb9d (correct)
    Source: 10.0.73.29 (10.0.73.29)
    Destination: 192.168.1.20 (198.202.70.20)
User Datagram Protocol, Src Port: 725 (725), Dst Port: sunrpc (111)
    Source port: 725 (725)
    Destination port: sunrpc (111)
    Length: 64
    Checksum: 0xb39d (correct)
Remote Procedure Call
    XID: 0x1832c2fd (405979901)
    Message Type: Call (0)
    RPC Version: 2
    Program: Portmap (100000)
    Program Version: 2
    Procedure: GETPORT (3)
    The reply to this request is in frame 30
    Credentials
        Flavor: AUTH_NULL (0)
        Length: 0
    Verifier
        Flavor: AUTH_NULL (0)
        Length: 0
Portmap
    Program Version: 2
    V2 Procedure: GETPORT (3)
    Program: NFS (100003)
    Version: 3
    Proto: UDP (17)
    Port: 0

0000  00 04 76 3b 63 a6 00 09 b6 11 cd fc 08 00 45 00   ..v;c.........E.
0010  00 54 00 00 40 00 3f 11 db 9d 0a 00 49 1d c6 ca   .T..@.?.....I...
0020  46 14 02 d5 00 6f 00 40 b3 9d 18 32 c2 fd 00 00   F....o.@...2....
0030  00 00 00 00 00 02 00 01 86 a0 00 00 00 02 00 00   ................
0040  00 03 00 00 00 00 00 00 00 00 00 00 00 00 00 00   ................
0050  00 00 00 01 86 a3 00 00 00 03 00 00 00 11 00 00   ................
0060  00 00                                             ..

Frame 30 (70 bytes on wire, 70 bytes captured)
    Arrival Time: Jul  8, 2003 18:24:11.871200000
    Time delta from previous packet: 0.000378000 seconds
    Time relative to first packet: 7.200203000 seconds
    Frame Number: 30
    Packet Length: 70 bytes
    Capture Length: 70 bytes
Ethernet II, Src: 00:04:76:3b:63:a6, Dst: 00:e0:81:20:3d:a4
    Destination: 00:e0:81:20:3d:a4 (00:e0:81:20:3d:a4)
    Source: 00:04:76:3b:63:a6 (00:04:76:3b:63:a6)
    Type: IP (0x0800)
Internet Protocol, Src Addr: 192.168.1.20 (198.202.70.20), Dst Addr: 10.0.73.29 (10.0.73.29)
    Version: 4
    Header length: 20 bytes
    Differentiated Services Field: 0x00 (DSCP 0x00: Default; ECN: 0x00)
        0000 00.. = Differentiated Services Codepoint: Default (0x00)
        .... ..0. = ECN-Capable Transport (ECT): 0
        .... ...0 = ECN-CE: 0
    Total Length: 56
    Identification: 0x4658
    Flags: 0x00
        .0.. = Don't fragment: Not set
        ..0. = More fragments: Not set
    Fragment offset: 0
    Time to live: 64
    Protocol: UDP (0x11)
    Header checksum: 0xd461 (correct)
    Source: 192.168.1.20 (198.202.70.20)
    Destination: 10.0.73.29 (10.0.73.29)
User Datagram Protocol, Src Port: sunrpc (111), Dst Port: 725 (725)
    Source port: sunrpc (111)
    Destination port: 725 (725)
    Length: 36
    Checksum: 0xb934 (correct)
Remote Procedure Call
    XID: 0x1832c2fd (405979901)
    Message Type: Reply (1)
    Program: Portmap (100000)
    Program Version: 2
    Procedure: GETPORT (3)
    Reply State: accepted (0)
    This is a reply to a request in frame 27
    Time from request: 0.000378000 seconds
    Verifier
        Flavor: AUTH_NULL (0)
        Length: 0
    Accept State: RPC executed successfully (0)
Portmap
    Program Version: 2
    V2 Procedure: GETPORT (3)
    Port: 2049

0000  00 e0 81 20 3d a4 00 04 76 3b 63 a6 08 00 45 00   ... =...v;c...E.
0010  00 38 46 58 00 00 40 11 d4 61 c6 ca 46 14 0a 00   .8FX..@..a..F...
0020  49 1d 00 6f 02 d5 00 24 b9 34 18 32 c2 fd 00 00   I..o...$.4.2....
0030  00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00   ................
0040  00 00 00 00 08 01                                 ......
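
The GETPORT exchange in frames 27 and 30 can be re-checked straight from the hex dumps. The sketch below is only an illustration: it assumes the RPC message starts at offset 0x2a (after the 14-byte Ethernet, 20-byte IP and 8-byte UDP headers) and that both credential and verifier are AUTH_NULL with zero-length bodies, as the decode above shows. The function and variable names are made up for this example.

```python
import struct

# RPC payload of frame 27 (GETPORT call), bytes 0x2a onward in the hex dump.
CALL_PAYLOAD = bytes.fromhex(
    "1832c2fd" "00000000" "00000002" "000186a0" "00000002" "00000003"
    "00000000" "00000000" "00000000" "00000000"
    "000186a3" "00000003" "00000011" "00000000"
)

# RPC payload of frame 30 (GETPORT reply), bytes 0x2a onward in the hex dump.
REPLY_PAYLOAD = bytes.fromhex(
    "1832c2fd" "00000001" "00000000" "00000000" "00000000" "00000000"
    "00000801"
)


def decode_getport_call(payload):
    """Decode a portmap v2 GETPORT call that uses empty AUTH_NULL auth."""
    (xid, msg_type, rpc_vers, prog, vers, proc,
     cred_flavor, cred_len, verf_flavor, verf_len,
     q_prog, q_vers, q_proto, q_port) = struct.unpack(">14I", payload)
    if msg_type != 0:
        raise ValueError("not an RPC call")
    if cred_len != 0 or verf_len != 0:
        raise ValueError("this sketch only handles zero-length auth bodies")
    return {
        "xid": xid,
        "rpc": (prog, vers, proc),      # portmap program, version, procedure
        "program": q_prog,              # program being looked up
        "version": q_vers,
        "proto": q_proto,
        "port": q_port,
    }


def decode_getport_reply(payload):
    """Decode an accepted, successful GETPORT reply with an empty verifier."""
    (xid, msg_type, reply_stat, verf_flavor, verf_len,
     accept_stat, port) = struct.unpack(">7I", payload)
    if msg_type != 1 or reply_stat != 0 or accept_stat != 0:
        raise ValueError("not a successful RPC reply")
    if verf_len != 0:
        raise ValueError("this sketch only handles a zero-length verifier")
    return {"xid": xid, "port": port}


if __name__ == "__main__":
    print(decode_getport_call(CALL_PAYLOAD))
    print(decode_getport_reply(REPLY_PAYLOAD))
```

Decoded this way, the call asks portmap (100000, version 2, procedure 3) about program 100003, version 3, protocol 17, and the reply with the same XID carries port 2049, which matches the summary in the message body.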


* Re: nfsroot clients hang while mounting second NFS server
  2003-07-11 19:07   ` Chris Adams
@ 2003-07-11 19:27     ` Trond Myklebust
  0 siblings, 0 replies; 4+ messages in thread
From: Trond Myklebust @ 2003-07-11 19:27 UTC (permalink / raw)
  To: Chris Adams; +Cc: nfs


The program is asking for NFSv3, but the reply gives a port for NFSv2.
Is your server supposed to support NFSv3 and, if not, have you tried
specifying the 'v2' mount option?
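
One way to compare what the server's portmapper answers for NFS version 2 and version 3 is to send the same GETPORT call with only the version field changed. The following is a minimal sketch, not a tested diagnostic: the helper names `build_getport_call` and `query_port` are hypothetical, the call uses empty AUTH_NULL credentials like the captured frame, and the reply parser assumes a zero-length verifier.

```python
import socket
import struct

PMAP_PROG, PMAP_VERS, PMAPPROC_GETPORT = 100000, 2, 3
PMAP_PORT = 111
NFS_PROG = 100003
IPPROTO_UDP = 17


def build_getport_call(xid, prog, vers, proto):
    """Build a portmap v2 GETPORT call with empty AUTH_NULL cred/verifier."""
    header = struct.pack(">6I", xid, 0, 2, PMAP_PROG, PMAP_VERS,
                         PMAPPROC_GETPORT)
    auth = struct.pack(">4I", 0, 0, 0, 0)        # cred and verifier, both empty
    args = struct.pack(">4I", prog, vers, proto, 0)
    return header + auth + args


def query_port(host, prog, vers, proto=IPPROTO_UDP, xid=1, timeout=2.0):
    """Send one GETPORT call over UDP and return the port from the reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_getport_call(xid, prog, vers, proto),
                    (host, PMAP_PORT))
        data, _ = sock.recvfrom(1024)
    if len(data) < 28:
        raise ValueError("reply too short")
    (r_xid, msg_type, reply_stat, _verf_flavor, verf_len,
     accept_stat, port) = struct.unpack(">7I", data[:28])
    if r_xid != xid or msg_type != 1 or reply_stat != 0 or accept_stat != 0:
        raise ValueError("unexpected or unsuccessful RPC reply")
    if verf_len != 0:
        raise ValueError("this sketch only handles a zero-length verifier")
    return port
```

For example, `query_port("192.168.1.20", NFS_PROG, 2)` and `query_port("192.168.1.20", NFS_PROG, 3)` could be run from a machine that is not NFS-rooted. A returned port of 0 means the portmapper has no registration for that program, version and protocol. A nonzero port is weaker evidence, since some portmapper implementations may answer with the port of another registered version; the `rpcinfo -p` listing shows the registered versions directly.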

cheers,
  Trond


