* NFS: nfs4_reclaim_open_state: Lock reclaim failed! log spew
@ 2016-02-24 21:43 Jason L Tibbitts III
  2016-02-25 19:58 ` J. Bruce Fields
  0 siblings, 1 reply; 27+ messages in thread

From: Jason L Tibbitts III @ 2016-02-24 21:43 UTC (permalink / raw)
To: linux-nfs

My NFS infrastructure has servers running current RHEL 7.2 (mostly kernel
3.10.0-327.4.5.el7 with a one-line patch needed to fix a soft lockup in
nfs4_laundromat) and clients running current Fedora 23
(4.3.5-300.fc23.x86_64).  Everything is mounted NFSv4.1 with sec=krb5p.

Occasionally a client will get into a state where it just hammers the
server with network traffic, sometimes at full line rate, with:

  NFS: nfs4_reclaim_open_state: Lock reclaim failed!

spewed to the log about 500 times a second.  The load goes up quite a
bit (to 5-7 or so).  The machine isn't doing anything and there isn't
even a user logged in.  However, there are always a few user processes
hanging around, usually kwin_x11 for whatever reason.  (My guess is
because of a lock on ~/.Xauthority.)

When I kill those user processes, this is logged once:

  NFS: nfs4_reclaim_open_state: unhandled error -10068

-10068 is NFS4ERR_RETRY_UNCACHED_REP.

Unfortunately I did not grab any of that traffic (I just wanted it to
stop).  This happens to me periodically, so I'll be sure to grab it when
it hits again.

One theory is that this is related to a user's kerberos ticket
expiring.  I see some hits when I search for the line that's spewed, but
they're either not recent or weren't reproducible.  I don't find any
hits for that specific unhandled error.

 - J<
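For reference, the kernel logs NFSv4 protocol errors as negative values to keep them distinct from local errno values; the number itself comes from the on-the-wire status codes assigned in RFC 5661. A minimal illustrative decoder (the table is a small subset, not the kernel's actual code):

```python
# Map a kernel-logged NFSv4 status (negative) back to its protocol name.
# Error numbers are from RFC 5661; this is a small illustrative subset.
NFS4_ERRORS = {
    10052: "NFS4ERR_BADSESSION",
    10053: "NFS4ERR_BADSLOT",
    10063: "NFS4ERR_SEQ_MISORDERED",
    10068: "NFS4ERR_RETRY_UNCACHED_REP",
}

def decode_nfs4_error(kernel_value: int) -> str:
    """The kernel negates NFSv4 protocol errors, e.g. -10068."""
    return NFS4_ERRORS.get(-kernel_value, f"unknown ({kernel_value})")

print(decode_nfs4_error(-10068))  # NFS4ERR_RETRY_UNCACHED_REP
```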
* Re: NFS: nfs4_reclaim_open_state: Lock reclaim failed! log spew

From: J. Bruce Fields @ 2016-02-25 19:58 UTC (permalink / raw)
To: Jason L Tibbitts III; +Cc: linux-nfs

On Wed, Feb 24, 2016 at 03:43:45PM -0600, Jason L Tibbitts III wrote:
> My NFS infrastructure has servers running current RHEL 7.2 (mostly kernel
> 3.10.0-327.4.5.el7 with a one-line patch needed to fix a soft lockup in
> nfs4_laundromat) and clients running current Fedora 23
> (4.3.5-300.fc23.x86_64).  Everything is mounted NFSv4.1 with sec=krb5p.
>
> Occasionally a client will get into a state where it just hammers the
> server with network traffic, sometimes at full line rate, with:
>
>   NFS: nfs4_reclaim_open_state: Lock reclaim failed!
>
> spewed to the log about 500 times a second.  The load goes up quite a
> bit (to 5-7 or so).  The machine isn't doing anything and there isn't
> even a user logged in.  However, there are always a few user processes
> hanging around, usually kwin_x11 for whatever reason.  (My guess is
> because of a lock on ~/.Xauthority.)
>
> When I kill those user processes, this is logged once:
>
>   NFS: nfs4_reclaim_open_state: unhandled error -10068
>
> -10068 is NFS4ERR_RETRY_UNCACHED_REP.

The only place the server sets that error is in
fs/nfsd/nfs4state.c:nfsd4_enc_sequence_replay.

If the server's correct, then the client attempted to resend a request
that the server was not required to cache.  In that case
NFS4ERR_RETRY_UNCACHED_REP is a valid error, and the client should give
up (or retry with a new slot/seqid?).

In any case, something's wrong with the 4.1 reply caching logic on the
client or the server....

> Unfortunately I did not grab any of that traffic (I just wanted it to
> stop).  This happens to me periodically so I'll be sure to do that when
> it hits again.
OK, that'd be helpful.  Unfortunately, what would probably be *most*
helpful would be the traffic that led up to this--by the time the client
and server get into this loop, the interesting problem may have already
happened--but just seeing the loop may be useful too.

--b.

> One theory is that this is related to a user's kerberos ticket
> expiring.  I see some hits when I search for the line that's spewed, but
> they're either not recent or weren't reproducible.  I don't find any
> hits for that specific unhandled error.
>
>  - J<
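The failure mode Bruce describes follows from the NFSv4.1 session slot mechanics (RFC 5661 §2.10.6): the server caches a reply only when the client set the SEQUENCE op's "cache this" flag, and a retransmission (same slot, same seqid) of a request whose reply was not cached must be answered with NFS4ERR_RETRY_UNCACHED_REP. A minimal sketch of that decision with illustrative names, not the kernel's actual code (the real server-side logic lives around nfsd4_sequence/nfsd4_enc_sequence_replay):

```python
# Sketch of the NFSv4.1 session-slot replay check (RFC 5661 §2.10.6).
# Names and structure are illustrative, not taken from the kernel.
from dataclasses import dataclass
from typing import Optional

NFS4_OK = 0
NFS4ERR_SEQ_MISORDERED = 10063
NFS4ERR_RETRY_UNCACHED_REP = 10068

@dataclass
class Slot:
    seqid: int = 0
    cached_reply: Optional[bytes] = None  # set only if the client asked "cache this"

def sequence(slot: Slot, req_seqid: int, cachethis: bool, reply: bytes):
    if req_seqid == slot.seqid:
        # Retransmission of the last request seen on this slot.
        if slot.cached_reply is not None:
            return NFS4_OK, slot.cached_reply    # replay from the cache
        return NFS4ERR_RETRY_UNCACHED_REP, None  # client retried an uncached request
    if req_seqid != slot.seqid + 1:
        return NFS4ERR_SEQ_MISORDERED, None      # not the next expected seqid
    # New request: execute it, and cache the reply only if asked to.
    slot.seqid = req_seqid
    slot.cached_reply = reply if cachethis else None
    return NFS4_OK, reply
```

On receiving NFS4ERR_RETRY_UNCACHED_REP a correct client should either give up on the request or reissue it as a new request with the next seqid; retrying the same slot/seqid forever, as in the loop reported here, just reproduces the same answer.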
* Re: NFS: nfs4_reclaim_open_state: Lock reclaim failed! log spew

From: Jason L Tibbitts III @ 2016-02-29 23:06 UTC (permalink / raw)
To: J. Bruce Fields; +Cc: linux-nfs

>>>>> "JBF" == J Bruce Fields <bfields@fieldses.org> writes:

>> Unfortunately I did not grab any of that traffic (I just wanted it to
>> stop).  This happens to me periodically so I'll be sure to do that
>> when it hits again.

JBF> OK, that'd be helpful.

I waited a bit and it's happened again from a few clients.  I captured
some traffic from one of them and it's just an endless stream of
Call/Reply:

  8 0.002842000 172.21.86.135 -> 172.21.86.78  NFS 406 V4 Call
  9 0.003493000 172.21.86.78  -> 172.21.86.135 NFS 518 V4 Reply (Call In 8)
 10 0.003536000 172.21.86.135 -> 172.21.86.78  NFS 406 V4 Call
 11 0.004168000 172.21.86.78  -> 172.21.86.135 NFS 518 V4 Reply (Call In 10)
 12 0.004252000 172.21.86.135 -> 172.21.86.78  NFS 406 V4 Call
 13 0.004854000 172.21.86.78  -> 172.21.86.135 NFS 518 V4 Reply (Call In 12)
 14 0.004931000 172.21.86.135 -> 172.21.86.78  NFS 406 V4 Call
 15 0.005613000 172.21.86.78  -> 172.21.86.135 NFS 518 V4 Reply (Call In 14)

Here's a call:

Frame 8: 406 bytes on wire (3248 bits), 406 bytes captured (3248 bits) on interface 0
    Interface id: 0 (eno1)
    Encapsulation type: Ethernet (1)
    Arrival Time: Feb 29, 2016 16:48:17.376573000 CST
    [Time shift for this packet: 0.000000000 seconds]
    Epoch Time: 1456786097.376573000 seconds
    [Time delta from previous captured frame: 0.000049000 seconds]
    [Time delta from previous displayed frame: 0.000000000 seconds]
    [Time since reference or first frame: 0.002842000 seconds]
    Frame Number: 8
    Frame Length: 406 bytes (3248 bits)
    Capture Length: 406 bytes (3248 bits)
    [Frame is marked: False]
    [Frame is ignored: False]
    [Protocols in frame: eth:ethertype:ip:tcp:rpc:gss-api:spnego-krb5:spnego-krb5]
Ethernet II, Src: AsustekC_6d:f1:7e (f0:79:59:6d:f1:7e), Dst: SuperMic_19:86:94 (00:25:90:19:86:94)
    Destination: SuperMic_19:86:94 (00:25:90:19:86:94)
        Address: SuperMic_19:86:94 (00:25:90:19:86:94)
        .... ..0. .... .... .... .... = LG bit: Globally unique address (factory default)
        .... ...0 .... .... .... .... = IG bit: Individual address (unicast)
    Source: AsustekC_6d:f1:7e (f0:79:59:6d:f1:7e)
        Address: AsustekC_6d:f1:7e (f0:79:59:6d:f1:7e)
        .... ..0. .... .... .... .... = LG bit: Globally unique address (factory default)
        .... ...0 .... .... .... .... = IG bit: Individual address (unicast)
    Type: IP (0x0800)
Internet Protocol Version 4, Src: 172.21.86.135 (172.21.86.135), Dst: 172.21.86.78 (172.21.86.78)
    Version: 4
    Header Length: 20 bytes
    Differentiated Services Field: 0x00 (DSCP 0x00: Default; ECN: 0x00: Not-ECT (Not ECN-Capable Transport))
        0000 00.. = Differentiated Services Codepoint: Default (0x00)
        .... ..00 = Explicit Congestion Notification: Not-ECT (Not ECN-Capable Transport) (0x00)
    Total Length: 392
    Identification: 0x79da (31194)
    Flags: 0x02 (Don't Fragment)
        0... .... = Reserved bit: Not set
        .1.. .... = Don't fragment: Set
        ..0. .... = More fragments: Not set
    Fragment offset: 0
    Time to live: 64
    Protocol: TCP (6)
    Header checksum: 0xba95 [validation disabled]
        [Good: False]
        [Bad: False]
    Source: 172.21.86.135 (172.21.86.135)
    Destination: 172.21.86.78 (172.21.86.78)
    [Source GeoIP: Unknown]
    [Destination GeoIP: Unknown]
Transmission Control Protocol, Src Port: 863 (863), Dst Port: 2049 (2049), Seq: 1361, Ack: 2261, Len: 340
    Source Port: 863 (863)
    Destination Port: 2049 (2049)
    [Stream index: 0]
    [TCP Segment Len: 340]
    Sequence number: 1361 (relative sequence number)
    [Next sequence number: 1701 (relative sequence number)]
    Acknowledgment number: 2261 (relative ack number)
    Header Length: 32 bytes
    .... 0000 0001 1000 = Flags: 0x018 (PSH, ACK)
        000. .... .... = Reserved: Not set
        ...0 .... .... = Nonce: Not set
        .... 0... .... = Congestion Window Reduced (CWR): Not set
        .... .0.. .... = ECN-Echo: Not set
        .... ..0. .... = Urgent: Not set
        .... ...1 .... = Acknowledgment: Set
        .... .... 1... = Push: Set
        .... .... .0.. = Reset: Not set
        .... .... ..0. = Syn: Not set
        .... .... ...0 = Fin: Not set
    Window size value: 24576
    [Calculated window size: 24576]
    [Window size scaling factor: -1 (unknown)]
    Checksum: 0x067b [validation disabled]
        [Good Checksum: False]
        [Bad Checksum: False]
    Urgent pointer: 0
    Options: (12 bytes), No-Operation (NOP), No-Operation (NOP), Timestamps
        No-Operation (NOP)
            Type: 1
                0... .... = Copy on fragmentation: No
                .00. .... = Class: Control (0)
                ...0 0001 = Number: No-Operation (NOP) (1)
        No-Operation (NOP)
            Type: 1
                0... .... = Copy on fragmentation: No
                .00. .... = Class: Control (0)
                ...0 0001 = Number: No-Operation (NOP) (1)
        Timestamps: TSval 1220154220, TSecr 882989725
            Kind: Time Stamp Option (8)
            Length: 10
            Timestamp value: 1220154220
            Timestamp echo reply: 882989725
    [SEQ/ACK analysis]
        [This is an ACK to the segment in frame: 7]
        [The RTT to ACK the segment was: 0.000049000 seconds]
        [Bytes in flight: 340]
Remote Procedure Call, Type:Call XID:0xffb2b86a
    Fragment header: Last fragment, 336 bytes
        1... .... .... .... .... .... .... .... = Last Fragment: Yes
        .000 0000 0000 0000 0000 0001 0101 0000 = Fragment Length: 336
    XID: 0xffb2b86a (4289902698)
    Message Type: Call (0)
    RPC Version: 2
    Program: NFS (100003)
    Program Version: 4
    Procedure: COMPOUND (1)
    Credentials
        Flavor: RPCSEC_GSS (6)
        Length: 28
        GSS Version: 1
        GSS Procedure: RPCSEC_GSS_DATA (0)
        GSS Sequence Number: 59958333
        GSS Service: rpcsec_gss_svc_privacy (3)
        GSS Context
            GSS Context Length: 8
            GSS Context: 6d35000000000000
    Verifier
        Flavor: RPCSEC_GSS (6)
        GSS Token: 0000001c040404ffffffffff000000001bb5b424f2b2f47f...
            GSS Token Length: 28
            GSS-API Generic Security Service Application Program Interface
                krb5_blob: 040404ffffffffff000000001bb5b424f2b2f47fb068d17e...
                    krb5_tok_id: KRB_TOKEN_CFX_GetMic (0x0404)
                    krb5_cfx_flags: 0x04
                        .... .1.. = AcceptorSubkey: Set
                        .... ..0. = Sealed: Not set
                        .... ...0 = SendByAcceptor: Not set
                    krb5_filler: ffffffffff
                    krb5_cfx_seq: 464892964
                    krb5_sgn_cksum: f2b2f47fb068d17e40fca641
    GSS-Wrap
        Length: 236
        GSS Data: 050406ff00000000000000001bb5b425d745e53e92ca2305...
            krb5_blob: 050406ff00000000000000001bb5b425d745e53e92ca2305...
                krb5_tok_id: KRB_TOKEN_CFX_WRAP (0x0405)
                krb5_cfx_flags: 0x06
                    .... .1.. = AcceptorSubkey: Set
                    .... ..1. = Sealed: Set
                    .... ...0 = SendByAcceptor: Not set
                krb5_filler: ff
                krb5_cfx_ec: 0
                krb5_cfx_rrc: 0
                krb5_cfx_seq: 464892965
                krb5_sgn_cksum: d745e53e92ca2305e3b1f3bdbb6c710bd731fcc684b8dba8...
Network File System
    [Program Version: 4]
    [V4 Procedure: COMPOUND (1)]

And here's a reply:

Frame 9: 518 bytes on wire (4144 bits), 518 bytes captured (4144 bits) on interface 0
    Interface id: 0 (eno1)
    Encapsulation type: Ethernet (1)
    Arrival Time: Feb 29, 2016 16:48:17.377224000 CST
    [Time shift for this packet: 0.000000000 seconds]
    Epoch Time: 1456786097.377224000 seconds
    [Time delta from previous captured frame: 0.000651000 seconds]
    [Time delta from previous displayed frame: 0.000000000 seconds]
    [Time since reference or first frame: 0.003493000 seconds]
    Frame Number: 9
    Frame Length: 518 bytes (4144 bits)
    Capture Length: 518 bytes (4144 bits)
    [Frame is marked: False]
    [Frame is ignored: False]
    [Protocols in frame: eth:ethertype:ip:tcp:rpc:gss-api:spnego-krb5:spnego-krb5]
Ethernet II, Src: SuperMic_19:86:94 (00:25:90:19:86:94), Dst: AsustekC_6d:f1:7e (f0:79:59:6d:f1:7e)
    Destination: AsustekC_6d:f1:7e (f0:79:59:6d:f1:7e)
        Address: AsustekC_6d:f1:7e (f0:79:59:6d:f1:7e)
        .... ..0. .... .... .... .... = LG bit: Globally unique address (factory default)
        .... ...0 .... .... .... .... = IG bit: Individual address (unicast)
    Source: SuperMic_19:86:94 (00:25:90:19:86:94)
        Address: SuperMic_19:86:94 (00:25:90:19:86:94)
        .... ..0. .... .... .... .... = LG bit: Globally unique address (factory default)
        .... ...0 .... .... .... .... = IG bit: Individual address (unicast)
    Type: IP (0x0800)
Internet Protocol Version 4, Src: 172.21.86.78 (172.21.86.78), Dst: 172.21.86.135 (172.21.86.135)
    Version: 4
    Header Length: 20 bytes
    Differentiated Services Field: 0x00 (DSCP 0x00: Default; ECN: 0x00: Not-ECT (Not ECN-Capable Transport))
        0000 00.. = Differentiated Services Codepoint: Default (0x00)
        .... ..00 = Explicit Congestion Notification: Not-ECT (Not ECN-Capable Transport) (0x00)
    Total Length: 504
    Identification: 0x2bf6 (11254)
    Flags: 0x02 (Don't Fragment)
        0... .... = Reserved bit: Not set
        .1.. .... = Don't fragment: Set
        ..0. .... = More fragments: Not set
    Fragment offset: 0
    Time to live: 64
    Protocol: TCP (6)
    Header checksum: 0x080a [validation disabled]
        [Good: False]
        [Bad: False]
    Source: 172.21.86.78 (172.21.86.78)
    Destination: 172.21.86.135 (172.21.86.135)
    [Source GeoIP: Unknown]
    [Destination GeoIP: Unknown]
Transmission Control Protocol, Src Port: 2049 (2049), Dst Port: 863 (863), Seq: 2261, Ack: 1701, Len: 452
    Source Port: 2049 (2049)
    Destination Port: 863 (863)
    [Stream index: 0]
    [TCP Segment Len: 452]
    Sequence number: 2261 (relative sequence number)
    [Next sequence number: 2713 (relative sequence number)]
    Acknowledgment number: 1701 (relative ack number)
    Header Length: 32 bytes
    .... 0000 0001 1000 = Flags: 0x018 (PSH, ACK)
        000. .... .... = Reserved: Not set
        ...0 .... .... = Nonce: Not set
        .... 0... .... = Congestion Window Reduced (CWR): Not set
        .... .0.. .... = ECN-Echo: Not set
        .... ..0. .... = Urgent: Not set
        .... ...1 .... = Acknowledgment: Set
        .... .... 1... = Push: Set
        .... .... .0.. = Reset: Not set
        .... .... ..0. = Syn: Not set
        .... .... ...0 = Fin: Not set
    Window size value: 32885
    [Calculated window size: 32885]
    [Window size scaling factor: -1 (unknown)]
    Checksum: 0x8ac4 [validation disabled]
        [Good Checksum: False]
        [Bad Checksum: False]
    Urgent pointer: 0
    Options: (12 bytes), No-Operation (NOP), No-Operation (NOP), Timestamps
        No-Operation (NOP)
            Type: 1
                0... .... = Copy on fragmentation: No
                .00. .... = Class: Control (0)
                ...0 0001 = Number: No-Operation (NOP) (1)
        No-Operation (NOP)
            Type: 1
                0... .... = Copy on fragmentation: No
                .00. .... = Class: Control (0)
                ...0 0001 = Number: No-Operation (NOP) (1)
        Timestamps: TSval 882989725, TSecr 1220154220
            Kind: Time Stamp Option (8)
            Length: 10
            Timestamp value: 882989725
            Timestamp echo reply: 1220154220
    [SEQ/ACK analysis]
        [This is an ACK to the segment in frame: 8]
        [The RTT to ACK the segment was: 0.000651000 seconds]
        [Bytes in flight: 452]
Remote Procedure Call, Type:Reply XID:0xffb2b86a
    Fragment header: Last fragment, 448 bytes
        1... .... .... .... .... .... .... .... = Last Fragment: Yes
        .000 0000 0000 0000 0000 0001 1100 0000 = Fragment Length: 448
    XID: 0xffb2b86a (4289902698)
    Message Type: Reply (1)
    [Program: NFS (100003)]
    [Program Version: 4]
    [Procedure: COMPOUND (1)]
    Reply State: accepted (0)
    [This is a reply to a request in frame 8]
    [Time from request: 0.000651000 seconds]
    Verifier
        Flavor: RPCSEC_GSS (6)
        GSS Token: 0000001c040405ffffffffff000000003480887866d9d129...
            GSS Token Length: 28
            GSS-API Generic Security Service Application Program Interface
                krb5_blob: 040405ffffffffff000000003480887866d9d129415f2ea7...
                    krb5_tok_id: KRB_TOKEN_CFX_GetMic (0x0404)
                    krb5_cfx_flags: 0x05
                        .... .1.. = AcceptorSubkey: Set
                        .... ..0. = Sealed: Not set
                        .... ...1 = SendByAcceptor: Set
                    krb5_filler: ffffffffff
                    krb5_cfx_seq: 880838776
                    krb5_sgn_cksum: 66d9d129415f2ea76c509189
    Accept State: RPC executed successfully (0)
    GSS-Wrap
        Length: 392
        GSS Data: 050407ff000000000000000034808879d55c1d7af7af82f5...
            krb5_blob: 050407ff000000000000000034808879d55c1d7af7af82f5...
                krb5_tok_id: KRB_TOKEN_CFX_WRAP (0x0405)
                krb5_cfx_flags: 0x07
                    .... .1.. = AcceptorSubkey: Set
                    .... ..1. = Sealed: Set
                    .... ...1 = SendByAcceptor: Set
                krb5_filler: ff
                krb5_cfx_ec: 0
                krb5_cfx_rrc: 0
                krb5_cfx_seq: 880838777
                krb5_sgn_cksum: d55c1d7af7af82f5384db9579420a0e8eb2bf20013a66022...
Network File System
    [Program Version: 4]
    [V4 Procedure: COMPOUND (1)]

The calls and replies all appear to be identical.  Sorry for the length
of those, but I wouldn't want to trim anything that might be important.

JBF> Unfortunately what would probably be *most* helpful would be the
JBF> traffic that led up to this--by the time the client and server get
JBF> into this loop the interesting problem may have already
JBF> happened--but just seeing the loop may be useful too.

Yeah, there's basically no chance of capturing that, and I have no way
to make it happen.  I can't just snarf all of the NFS traffic, and once
this starts it generates so many packets....

Note that this doesn't appear to be caused by something like a kerberos
ticket expiring; the user last logged in well under the maximum ticket
renewal time, and SSSD has been dutifully renewing the tickets.

 - J<
* Re: NFS: nfs4_reclaim_open_state: Lock reclaim failed! log spew

From: J. Bruce Fields @ 2016-03-01 0:48 UTC (permalink / raw)
To: Jason L Tibbitts III; +Cc: linux-nfs

On Mon, Feb 29, 2016 at 05:06:35PM -0600, Jason L Tibbitts III wrote:
> >>>>> "JBF" == J Bruce Fields <bfields@fieldses.org> writes:
>
> >> Unfortunately I did not grab any of that traffic (I just wanted it to
> >> stop).  This happens to me periodically so I'll be sure to do that
> >> when it hits again.
>
> JBF> OK, that'd be helpful.
>
> I waited a bit and it's happened again from a few clients.  I captured
> some traffic from one of them and it's just an endless stream of
> Call/Reply:
>
>  8 0.002842000 172.21.86.135 -> 172.21.86.78  NFS 406 V4 Call
>  9 0.003493000 172.21.86.78  -> 172.21.86.135 NFS 518 V4 Reply (Call In 8)
> 10 0.003536000 172.21.86.135 -> 172.21.86.78  NFS 406 V4 Call
> 11 0.004168000 172.21.86.78  -> 172.21.86.135 NFS 518 V4 Reply (Call In 10)
> 12 0.004252000 172.21.86.135 -> 172.21.86.78  NFS 406 V4 Call
> 13 0.004854000 172.21.86.78  -> 172.21.86.135 NFS 518 V4 Reply (Call In 12)
> 14 0.004931000 172.21.86.135 -> 172.21.86.78  NFS 406 V4 Call
> 15 0.005613000 172.21.86.78  -> 172.21.86.135 NFS 518 V4 Reply (Call In 14)
>
> Here's a call:
>
> GSS Service: rpcsec_gss_svc_privacy (3)

Argh, it's all encrypted, so all we have to go on is the size of the
request and reply:

> GSS-Wrap
>     Length: 236
...
> And here's a reply:
...
> GSS-Wrap
>     Length: 392

Anyway, just knowing the error may be enough for us to work out the
problem--I just need to dig a little more.

> The calls and replies all appear to be identical.  Sorry for the length
> of those but I wouldn't want to trim anything that might be important.
>
> JBF> Unfortunately what would probably be *most* helpful would be the
> JBF> traffic that led up to this--by the time the client and server get
> JBF> into this loop the interesting problem may have already
> JBF> happened--but just seeing the loop may be useful too.
>
> Yeah, there's basically no chance of capturing that, and I have no way
> to make it happen.  I can't just snarf all of the NFS traffic, and once
> this starts it generates so many packets....

The best you could do is capture all traffic, throw away all but the
last few seconds (see the ring-buffer stuff in tshark), and write a
script that kills the capture as soon as it notices you've hit this
condition.  Possibly still impractical, and not worth the trouble at
this point anyway.

> Note that this doesn't appear to be caused by something like a kerberos
> ticket expiring; the user last logged in well under the maximum ticket
> renewal time, and SSSD has been dutifully renewing them.

OK, thanks.

--b.
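Bruce's ring-buffer idea can be sketched as a small watchdog: run a rotating capture (dumpcap's `-b files:`/`-b filesize:` options) and kill it the moment the tell-tale message appears in the kernel log, freezing the last few ring files. Everything here (interface name, paths, file counts, even the exact trigger string) is illustrative, not from the thread:

```python
# Watchdog sketch: stop a ring-buffer capture when the reclaim spew starts.
# All parameters (interface, paths, sizes) are illustrative assumptions.
import re
import subprocess

TRIGGER = re.compile(r"nfs4_reclaim_open_state: Lock reclaim failed!")

def spew_started(log_line: str) -> bool:
    """True if a kernel log line shows the reclaim-failure spew."""
    return TRIGGER.search(log_line) is not None

def watch():  # call watch() to actually run the capture + watchdog
    # Ring buffer: keep 10 files of ~16 MB each; only the newest survive.
    capture = subprocess.Popen(
        ["dumpcap", "-i", "eno1", "-f", "port 2049",
         "-b", "files:10", "-b", "filesize:16384",
         "-w", "/var/tmp/nfs-ring.pcapng"])
    log = subprocess.Popen(["journalctl", "-k", "-f", "-o", "cat"],
                           stdout=subprocess.PIPE, text=True)
    for line in log.stdout:
        if spew_started(line):
            capture.terminate()  # freeze the ring: last files hold the lead-up
            log.terminate()
            break
```

As Bruce notes, deploying something like this to a couple of hundred clients is probably more trouble than it is worth; the sketch only shows the shape of the idea.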
* Re: NFS: nfs4_reclaim_open_state: Lock reclaim failed! log spew

From: Jason L Tibbitts III @ 2016-03-01 0:53 UTC (permalink / raw)
To: J. Bruce Fields; +Cc: linux-nfs

>>>>> "JBF" == J Bruce Fields <bfields@fieldses.org> writes:

JBF> Argh, it's all encrypted, so all we have to go on is the size of
JBF> the request and reply:

Yeah, I'm not sure how to get around that.  I know tshark can take a
keytab, but the user's key is in the kernel keyring and I'm not at all
sure how to dig it out (assuming I even can).  But if there's something
I can do, I can try.

I could switch to krb5i or plain krb5 for a while if that would be
useful, except that the clients would prefer krb5p if it's exported, and
if I stop exporting it then existing mounts break....  I'd have to
schedule downtime and kick everyone off, which I could do if it would
help.

JBF> The best you could do is capture all traffic and throw away all but
JBF> the last few seconds (see the ring buffer stuff in tshark) and
JBF> write a script that kills the capture as soon as it notices you've
JBF> hit this condition.

If I knew how to detect the condition, though, I have a feeling that
would be enough information to track down the bug anyway.

Also, I'd have to do this for a couple of hundred clients.  Ugh.

 - J<
* Re: NFS: nfs4_reclaim_open_state: Lock reclaim failed! log spew

From: J. Bruce Fields @ 2016-03-01 1:01 UTC (permalink / raw)
To: Jason L Tibbitts III; +Cc: linux-nfs

On Mon, Feb 29, 2016 at 06:53:15PM -0600, Jason L Tibbitts III wrote:
> >>>>> "JBF" == J Bruce Fields <bfields@fieldses.org> writes:
>
> JBF> Argh, it's all encrypted, so all we have to go on is the size of
> JBF> the request and reply:
>
> Yeah, I'm not sure how to get around that.  I know tshark can take a
> keytab, but the user's key is in the kernel keyring and I'm not at all
> sure how to dig it out (assuming I even can).  But if there's something
> I can do, I can try.
>
> I could switch to krb5i or plain krb5 for a while if that would be
> useful, except that the clients would prefer krb5p if it's exported, and
> if I stop exporting it then existing mounts break....  I'd have to
> schedule downtime and kick everyone off, which I could do if it would
> help.

I believe the order of flavors in the sec= option is the order they're
given to the client, so reversing it might cause new clients to make a
different choice--but I'm not actually sure of the logic there.

> JBF> The best you could do is capture all traffic and throw away all but
> JBF> the last few seconds (see the ring buffer stuff in tshark) and
> JBF> write a script that kills the capture as soon as it notices you've
> JBF> hit this condition.
>
> If I knew how to detect the condition, though, I have a feeling that
> would be enough information to track down the bug anyway.
>
> Also, I'd have to do this for a couple of hundred clients.  Ugh.

Yeah, don't worry about it, I was just thinking aloud.

--b.
* Re: NFS: nfs4_reclaim_open_state: Lock reclaim failed! log spew

From: Jason L Tibbitts III @ 2016-03-01 1:03 UTC (permalink / raw)
To: J. Bruce Fields; +Cc: linux-nfs

>>>>> "JBF" == J Bruce Fields <bfields@fieldses.org> writes:

JBF> I believe the order of flavors in the sec= option is the order
JBF> they're given to the client, so reversing it might cause new
JBF> clients to make a different choice--but I'm not actually sure of
JBF> the logic there.

I've found it not to actually be stable; different Linux versions will
do different things.  But then, almost all of the desktops are now
running a single OS release, so unless it's simply random, that should
be good enough.  I'll add krb5 to the exports and try to catch this
again.

 - J<
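For reference, the flavor list being discussed is the colon-separated sec= option in /etc/exports. A hypothetical entry offering all three Kerberos flavors, listed in what is intended to be preference order (the path and host pattern are made up for illustration):

```
# /etc/exports -- illustrative entry, not taken from the thread
/export/home  *.example.com(rw,sec=krb5p:krb5i:krb5)
```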
* Re: NFS: nfs4_reclaim_open_state: Lock reclaim failed! log spew

From: Jason L Tibbitts III @ 2016-11-16 20:55 UTC (permalink / raw)
To: linux-nfs

I'm replying to a rather old message, but the issue has just now popped
back up again.

To recap: a client stops being able to access _any_ mount on a
particular server, and "NFS: nfs4_reclaim_open_state: Lock reclaim
failed!" appears several hundred times per second in the kernel log.
The load goes up by one for every process attempting to access any mount
from that particular server.  Mounts to other servers are fine, and
other clients can mount things from that one server without problems.

When I kill every process keeping that particular mount active and then
umount it, I see:

  NFS: nfs4_reclaim_open_state: unhandled error -10068

The server in this case is running CentOS 7: 3.10.0-327.36.3.el7.x86_64
The client is one of the last I have that's still running Fedora 23:
4.7.10-100.fc23.x86_64

The big difference now, besides the updated kernel versions, is the fact
that no kerberos is involved.  It's a sec=sys mount.  There are
kerberized exports on the same server, but the client wasn't accessing
any of them.  There were other kerberized mounts on the client, but they
were to different servers and they remained accessible.

Here are ten frames from a packet capture.  I think it's all repeating
at that point, so maybe it will be enough, but I can provide a more
complete capture log on request.
 - J<

Frame 1: 390 bytes on wire (3120 bits), 390 bytes captured (3120 bits) on interface 0
Ethernet II, Src: SuperMic_19:8e:6e (00:25:90:19:8e:6e), Dst: AsrockIn_84:34:e7 (d0:50:99:84:34:e7)
Internet Protocol Version 4, Src: 172.21.86.74, Dst: 172.21.86.141
Transmission Control Protocol, Src Port: 2049, Dst Port: 997, Seq: 1, Ack: 1, Len: 324
Remote Procedure Call

Frame 2: 322 bytes on wire (2576 bits), 322 bytes captured (2576 bits) on interface 0
Ethernet II, Src: AsrockIn_84:34:e7 (d0:50:99:84:34:e7), Dst: SuperMic_19:8e:6e (00:25:90:19:8e:6e)
Internet Protocol Version 4, Src: 172.21.86.141, Dst: 172.21.86.74
Transmission Control Protocol, Src Port: 997, Dst Port: 2049, Seq: 1, Ack: 325, Len: 256
Remote Procedure Call, Type:Call XID:0x29009a8c
Network File System, Ops(5): SEQUENCE, PUTFH, OPEN, ACCESS, GETATTR
    [Program Version: 4]
    [V4 Procedure: COMPOUND (1)]
    Tag: <EMPTY>
        length: 0
        contents: <EMPTY>
    minorversion: 1
    Operations (count: 5): SEQUENCE, PUTFH, OPEN, ACCESS, GETATTR
        Opcode: SEQUENCE (53)
            sessionid: f0f40f58bd020000e002000000000000
            seqid: 0x002b3189
            slot id: 0
            high slot id: 0
            cache this?: Yes
        Opcode: PUTFH (22)
            filehandle
                length: 28
                [hash (CRC-32): 0xdcaec4f7]
                filehandle: 010006016aeda54305f66ede000000000000000096484800...
        Opcode: OPEN (18)
            seqid: 0x00000000
            share_access: OPEN4_SHARE_ACCESS_BOTH (3)
            share_deny: OPEN4_SHARE_DENY_NONE (0)
            clientid: 0xf0f40f58bd020000
            owner: <DATA>
                length: 24
                contents: <DATA>
            Open Type: OPEN4_NOCREATE (0)
            Claim Type: CLAIM_FH (4)
        Opcode: ACCESS (3), [Check: RD MD XT XE]
            Check access: 0x2d
                .... ...1 = 0x01 READ: allowed?
                .... .1.. = 0x04 MODIFY: allowed?
                .... 1... = 0x08 EXTEND: allowed?
                ..1. .... = 0x20 EXECUTE: allowed?
        Opcode: GETATTR (9)
            Attr mask[0]: 0x0010011a (Type, Change, Size, FSID, FileId)
                reqd_attr: Type (1)
                reqd_attr: Change (3)
                reqd_attr: Size (4)
                reqd_attr: FSID (8)
                reco_attr: FileId (20)
            Attr mask[1]: 0x00b0a23a (Mode, NumLinks, Owner, Owner_Group, RawDev, Space_Used, Time_Access, Time_Metadata, Time_Modify, Mounted_on_FileId)
                reco_attr: Mode (33)
                reco_attr: NumLinks (35)
                reco_attr: Owner (36)
                reco_attr: Owner_Group (37)
                reco_attr: RawDev (41)
                reco_attr: Space_Used (45)
                reco_attr: Time_Access (47)
                reco_attr: Time_Metadata (52)
                reco_attr: Time_Modify (53)
                reco_attr: Mounted_on_FileId (55)
    [Main Opcode: OPEN (18)]

Frame 3: 390 bytes on wire (3120 bits), 390 bytes captured (3120 bits) on interface 0
Ethernet II, Src: SuperMic_19:8e:6e (00:25:90:19:8e:6e), Dst: AsrockIn_84:34:e7 (d0:50:99:84:34:e7)
Internet Protocol Version 4, Src: 172.21.86.74, Dst: 172.21.86.141
Transmission Control Protocol, Src Port: 2049, Dst Port: 997, Seq: 325, Ack: 257, Len: 324
Remote Procedure Call, Type:Reply XID:0x29009a8c
Network File System, Ops(5): SEQUENCE PUTFH OPEN ACCESS GETATTR
    [Program Version: 4]
    [V4 Procedure: COMPOUND (1)]
    Status: NFS4_OK (0)
    Tag: <EMPTY>
        length: 0
        contents: <EMPTY>
    Operations (count: 5)
        Opcode: SEQUENCE (53)
            Status: NFS4_OK (0)
            sessionid: f0f40f58bd020000e002000000000000
            seqid: 0x002b3189
            slot id: 0
            high slot id: 30
            target high slot id: 30
            status flags: 0x00000040, SEQ4_STATUS_RECALLABLE_STATE_REVOKED
                .... .... .... .... .... .... .... ...0 = SEQ4_STATUS_CB_PATH_DOWN: Not set
                .... .... .... .... .... .... .... ..0. = SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRING: Not set
                .... .... .... .... .... .... .... .0.. = SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRED: Not set
                .... .... .... .... .... .... .... 0... = SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED: Not set
                .... .... .... .... .... .... ...0 .... = SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED: Not set
                .... .... .... .... .... .... ..0. .... = SEQ4_STATUS_ADMIN_STATE_REVOKED: Not set
                .... .... .... .... .... .... .1.. .... = SEQ4_STATUS_RECALLABLE_STATE_REVOKED: Set
                .... .... .... .... .... .... 0... .... = SEQ4_STATUS_LEASE_MOVED: Not set
                .... .... .... .... .... ...0 .... .... = SEQ4_STATUS_RESTART_RECLAIM_NEEDED: Not set
                .... .... .... .... .... ..0. .... .... = SEQ4_STATUS_CB_PATH_DOWN_SESSION: Not set
                .... .... .... .... .... .0.. .... .... = SEQ4_STATUS_BACKCHANNEL_FAULT: Not set
                .... .... .... .... .... 0... .... .... = SEQ4_STATUS_DEVID_CHANGED: Not set
                .... .... .... .... ...0 .... .... .... = SEQ4_STATUS_DEVID_DELETED: Not set
        Opcode: PUTFH (22)
            Status: NFS4_OK (0)
        Opcode: OPEN (18)
            Status: NFS4_OK (0)
            stateid
                [StateID Hash: 0xb207]
                seqid: 0x00158249
                Data: f0f40f58bd02000092000000
                [Data hash (CRC-32): 0xf3e67f06]
            change_info
                Atomic: No
                changeid (before): 0
                changeid (after): 0
            result flags: 0x00000004, locktype posix
                .... .... .... .... .... .... .... ..0. = confirm: False
                .... .... .... .... .... .... .... .1.. = locktype posix: True
                .... .... .... .... .... .... .... 0... = preserve unlinked: False
                .... .... .... .... .... .... ..0. .... = may notify lock: False
            Delegation Type: OPEN_DELEGATE_NONE (0)
        Opcode: ACCESS (3), [Access Denied: XE], [Allowed: RD MD XT]
            Status: NFS4_OK (0)
            Supported types (of requested): 0x2d
                .... ...1 = 0x01 READ: supported
                .... .1.. = 0x04 MODIFY: supported
                .... 1... = 0x08 EXTEND: supported
                ..1. .... = 0x20 EXECUTE: supported
            Access rights (of requested): 0x0d
                .... ...1 = 0x01 READ: allowed
                .... .1.. = 0x04 MODIFY: allowed
                .... 1... = 0x08 EXTEND: allowed
                ..0. .... = 0x20 EXECUTE: *Access Denied*
        Opcode: GETATTR (9)
            Status: NFS4_OK (0)
            Attr mask[0]: 0x0010011a (Type, Change, Size, FSID, FileId)
                reqd_attr: Type (1)
                    ftype4: NF4REG (1)
                reqd_attr: Change (3)
                    changeid: 6353654913397708009
                reqd_attr: Size (4)
                    size: 32768
                reqd_attr: FSID (8)
                    fattr4_fsid
                        fsid4.major: 7704996244782214878
                        fsid4.minor: 0
                reco_attr: FileId (20)
                    fileid: 4737174
            Attr mask[1]: 0x00b0a23a (Mode, NumLinks, Owner, Owner_Group, RawDev, Space_Used, Time_Access, Time_Metadata, Time_Modify, Mounted_on_FileId)
                reco_attr: Mode (33)
                    mode: 0644, Name: Unknown, Read permission for owner, Write permission for owner, Read permission for group, Read permission for others
                        .... .... .... .... 000. .... .... .... = Name: Unknown (0)
                        .... .... .... .... .... 0... .... .... = Set user id on exec: No
                        .... .... .... .... .... .0.. .... .... = Set group id on exec: No
                        .... .... .... .... .... ..0. .... .... = Save swapped text even after use: No
                        .... .... .... .... .... ...1 .... .... = Read permission for owner: Yes
                        .... .... .... .... .... .... 1... .... = Write permission for owner: Yes
                        .... .... .... .... .... .... .0.. .... = Execute permission for owner: No
                        .... .... .... .... .... .... ..1. .... = Read permission for group: Yes
                        .... .... .... .... .... .... ...0 .... = Write permission for group: No
                        .... .... .... .... .... .... .... 0... = Execute permission for group: No
                        .... .... .... .... .... .... .... .1.. = Read permission for others: Yes
                        .... .... .... .... .... .... .... ..0. = Write permission for others: No
                        .... .... .... .... .... .... .... ...0 = Execute permission for others: No
                reco_attr: NumLinks (35)
                    numlinks: 1
                reco_attr: Owner (36)
                    fattr4_owner: 7143
                        length: 4
                        contents: 7143
                reco_attr: Owner_Group (37)
                    fattr4_owner_group: 1112
                        length: 4
                        contents: 1112
                reco_attr: RawDev (41)
                    specdata1: 0
                    specdata2: 0
                reco_attr: Space_Used (45)
                    space_used: 32768
                reco_attr: Time_Access (47)
                    seconds: 1479318517
                    nseconds: 593359867
                reco_attr: Time_Metadata (52)
                    seconds: 1479325563
                    nseconds: 175920361
                reco_attr: Time_Modify (53)
                    seconds: 1479325563
                    nseconds: 175920361
                reco_attr: Mounted_on_FileId (55)
                    fileid: 0x0000000000484896
    [Main Opcode: OPEN (18)]

Frame 4: 322 bytes on wire (2576 bits), 322 bytes captured (2576 bits) on interface 0
Ethernet II, Src: AsrockIn_84:34:e7 (d0:50:99:84:34:e7), Dst: SuperMic_19:8e:6e (00:25:90:19:8e:6e)
Internet Protocol Version 4, Src: 172.21.86.141, Dst: 172.21.86.74
Transmission Control Protocol, Src Port: 997, Dst Port: 2049, Seq: 257, Ack: 649, Len: 256
Remote Procedure Call, Type:Call XID:0x2a009a8c
Network File System, Ops(5): SEQUENCE, PUTFH, OPEN, ACCESS, GETATTR
    [Program Version: 4]
    [V4 Procedure: COMPOUND (1)]
    Tag: <EMPTY>
        length: 0
        contents: <EMPTY>
    minorversion: 1
    Operations (count: 5): SEQUENCE, PUTFH, OPEN, ACCESS, GETATTR
        Opcode: SEQUENCE (53)
            sessionid: f0f40f58bd020000e002000000000000
            seqid: 0x002b318a
            slot id: 0
            high slot id: 0
            cache this?: Yes
        Opcode: PUTFH (22)
            filehandle
                length: 28
                [hash (CRC-32): 0x26ab9d8d]
                filehandle: 010006016aeda54305f66ede00000000000000009a424800...
        Opcode: OPEN (18)
            seqid: 0x00000000
            share_access: OPEN4_SHARE_ACCESS_BOTH (3)
            share_deny: OPEN4_SHARE_DENY_NONE (0)
            clientid: 0xf0f40f58bd020000
            owner: <DATA>
                length: 24
                contents: <DATA>
            Open Type: OPEN4_NOCREATE (0)
            Claim Type: CLAIM_FH (4)
        Opcode: ACCESS (3), [Check: RD MD XT XE]
            Check access: 0x2d
                .... ...1 = 0x01 READ: allowed?
                .... .1.. = 0x04 MODIFY: allowed?
                .... 1... = 0x08 EXTEND: allowed?
                ..1. .... = 0x20 EXECUTE: allowed?
        Opcode: GETATTR (9)
            Attr mask[0]: 0x0010011a (Type, Change, Size, FSID, FileId)
                reqd_attr: Type (1)
                reqd_attr: Change (3)
                reqd_attr: Size (4)
                reqd_attr: FSID (8)
                reco_attr: FileId (20)
            Attr mask[1]: 0x00b0a23a (Mode, NumLinks, Owner, Owner_Group, RawDev, Space_Used, Time_Access, Time_Metadata, Time_Modify, Mounted_on_FileId)
                reco_attr: Mode (33)
                reco_attr: NumLinks (35)
                reco_attr: Owner (36)
                reco_attr: Owner_Group (37)
                reco_attr: RawDev (41)
                reco_attr: Space_Used (45)
                reco_attr: Time_Access (47)
                reco_attr: Time_Metadata (52)
                reco_attr: Time_Modify (53)
                reco_attr: Mounted_on_FileId (55)
    [Main Opcode: OPEN (18)]

Frame 5: 322 bytes on wire (2576 bits), 322 bytes captured (2576 bits) on interface 0
Ethernet II, Src: AsrockIn_84:34:e7 (d0:50:99:84:34:e7), Dst: SuperMic_19:8e:6e (00:25:90:19:8e:6e)
Internet Protocol Version 4, Src: 172.21.86.141, Dst: 172.21.86.74
Transmission Control Protocol, Src Port: 997, Dst Port: 2049, Seq: 769, Ack: 1297, Len: 256
Remote Procedure Call, Type:Call XID:0x2c009a8c
Network File System, Ops(5): SEQUENCE, PUTFH, OPEN, ACCESS, GETATTR
    [Program Version: 4]
    [V4 Procedure: COMPOUND (1)]
    Tag: <EMPTY>
        length: 0
        contents: <EMPTY>
    minorversion: 1
    Operations (count: 5): SEQUENCE, PUTFH, OPEN, ACCESS, GETATTR
        Opcode: SEQUENCE (53)
            sessionid: f0f40f58bd020000e002000000000000
            seqid: 0x002b318c
            slot id: 0
            high slot id: 0
            cache this?: Yes
        Opcode: PUTFH (22)
            filehandle
                length: 28
                [hash (CRC-32): 0x26ab9d8d]
                filehandle: 010006016aeda54305f66ede00000000000000009a424800...
        Opcode: OPEN (18)
            seqid: 0x00000000
            share_access: OPEN4_SHARE_ACCESS_BOTH (3)
            share_deny: OPEN4_SHARE_DENY_NONE (0)
            clientid: 0xf0f40f58bd020000
            owner: <DATA>
                length: 24
                contents: <DATA>
            Open Type: OPEN4_NOCREATE (0)
            Claim Type: CLAIM_FH (4)
        Opcode: ACCESS (3), [Check: RD MD XT XE]
            Check access: 0x2d
                .... ...1 = 0x01 READ: allowed?
                .... .1.. = 0x04 MODIFY: allowed?
                .... 1... = 0x08 EXTEND: allowed?
                ..1. .... = 0x20 EXECUTE: allowed?
Opcode: GETATTR (9) Attr mask[0]: 0x0010011a (Type, Change, Size, FSID, FileId) reqd_attr: Type (1) reqd_attr: Change (3) reqd_attr: Size (4) reqd_attr: FSID (8) reco_attr: FileId (20) Attr mask[1]: 0x00b0a23a (Mode, NumLinks, Owner, Owner_Group, RawDev, Space_Used, Time_Access, Time_Metadata, Time_Modify, Mounted_on_FileId) reco_attr: Mode (33) reco_attr: NumLinks (35) reco_attr: Owner (36) reco_attr: Owner_Group (37) reco_attr: RawDev (41) reco_attr: Space_Used (45) reco_attr: Time_Access (47) reco_attr: Time_Metadata (52) reco_attr: Time_Modify (53) reco_attr: Mounted_on_FileId (55) [Main Opcode: OPEN (18)] Frame 6: 390 bytes on wire (3120 bits), 390 bytes captured (3120 bits) on interface 0 Ethernet II, Src: SuperMic_19:8e:6e (00:25:90:19:8e:6e), Dst: AsrockIn_84:34:e7 (d0:50:99:84:34:e7) Internet Protocol Version 4, Src: 172.21.86.74, Dst: 172.21.86.141 Transmission Control Protocol, Src Port: 2049, Dst Port: 997, Seq: 1297, Ack: 1025, Len: 324 Remote Procedure Call, Type:Reply XID:0x2c009a8c Network File System, Ops(5): SEQUENCE PUTFH OPEN ACCESS GETATTR [Program Version: 4] [V4 Procedure: COMPOUND (1)] Status: NFS4_OK (0) Tag: <EMPTY> length: 0 contents: <EMPTY> Operations (count: 5) Opcode: SEQUENCE (53) Status: NFS4_OK (0) sessionid: f0f40f58bd020000e002000000000000 seqid: 0x002b318c slot id: 0 high slot id: 30 target high slot id: 30 status flags: 0x00000040, SEQ4_STATUS_RECALLABLE_STATE_REVOKED .... .... .... .... .... .... .... ...0 = SEQ4_STATUS_CB_PATH_DOWN: Not set .... .... .... .... .... .... .... ..0. = SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRING: Not set .... .... .... .... .... .... .... .0.. = SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRED: Not set .... .... .... .... .... .... .... 0... = SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED: Not set .... .... .... .... .... .... ...0 .... = SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED: Not set .... .... .... .... .... .... ..0. .... = SEQ4_STATUS_ADMIN_STATE_REVOKED: Not set .... .... .... .... .... .... .1.. .... 
= SEQ4_STATUS_RECALLABLE_STATE_REVOKED: Set .... .... .... .... .... .... 0... .... = SEQ4_STATUS_LEASE_MOVED: Not set .... .... .... .... .... ...0 .... .... = SEQ4_STATUS_RESTART_RECLAIM_NEEDED: Not set .... .... .... .... .... ..0. .... .... = SEQ4_STATUS_CB_PATH_DOWN_SESSION: Not set .... .... .... .... .... .0.. .... .... = SEQ4_STATUS_BACKCHANNEL_FAULT: Not set .... .... .... .... .... 0... .... .... = SEQ4_STATUS_DEVID_CHANGED: Not set .... .... .... .... ...0 .... .... .... = SEQ4_STATUS_DEVID_DELETED: Not set Opcode: PUTFH (22) Status: NFS4_OK (0) Opcode: OPEN (18) Status: NFS4_OK (0) stateid [StateID Hash: 0xb9c3] seqid: 0x0015824c Data: f0f40f58bd0200009e000000 [Data hash (CRC-32): 0xb930c0be] change_info Atomic: No changeid (before): 0 changeid (after): 0 result flags: 0x00000004, locktype posix .... .... .... .... .... .... .... ..0. = confirm: False .... .... .... .... .... .... .... .1.. = locktype posix: True .... .... .... .... .... .... .... 0... = preserve unlinked: False .... .... .... .... .... .... ..0. .... = may notify lock: False Delegation Type: OPEN_DELEGATE_NONE (0) Opcode: ACCESS (3), [Access Denied: XE], [Allowed: RD MD XT] Status: NFS4_OK (0) Supported types (of requested): 0x2d .... ...1 = 0x01 READ: supported .... .1.. = 0x04 MODIFY: supported .... 1... = 0x08 EXTEND: supported ..1. .... = 0x20 EXECUTE: supported Access rights (of requested): 0x0d .... ...1 = 0x01 READ: allowed .... .1.. = 0x04 MODIFY: allowed .... 1... = 0x08 EXTEND: allowed ..0. .... 
= 0x20 EXECUTE: *Access Denied* Opcode: GETATTR (9) Status: NFS4_OK (0) Attr mask[0]: 0x0010011a (Type, Change, Size, FSID, FileId) reqd_attr: Type (1) ftype4: NF4REG (1) reqd_attr: Change (3) changeid: 6310935438380194724 reqd_attr: Size (4) size: 24576 reqd_attr: FSID (8) fattr4_fsid fsid4.major: 7704996244782214878 fsid4.minor: 0 reco_attr: FileId (20) fileid: 4735642 Attr mask[1]: 0x00b0a23a (Mode, NumLinks, Owner, Owner_Group, RawDev, Space_Used, Time_Access, Time_Metadata, Time_Modify, Mounted_on_FileId) reco_attr: Mode (33) mode: 0644, Name: Unknown, Read permission for owner, Write permission for owner, Read permission for group, Read permission for others .... .... .... .... 000. .... .... .... = Name: Unknown (0) .... .... .... .... .... 0... .... .... = Set user id on exec: No .... .... .... .... .... .0.. .... .... = Set group id on exec: No .... .... .... .... .... ..0. .... .... = Save swapped text even after use: No .... .... .... .... .... ...1 .... .... = Read permission for owner: Yes .... .... .... .... .... .... 1... .... = Write permission for owner: Yes .... .... .... .... .... .... .0.. .... = Execute permission for owner: No .... .... .... .... .... .... ..1. .... = Read permission for group: Yes .... .... .... .... .... .... ...0 .... = Write permission for group: No .... .... .... .... .... .... .... 0... = Execute permission for group: No .... .... .... .... .... .... .... .1.. = Read permission for others: Yes .... .... .... .... .... .... .... ..0. = Write permission for others: No .... .... .... .... .... .... .... 
...0 = Execute permission for others: No reco_attr: NumLinks (35) numlinks: 1 reco_attr: Owner (36) fattr4_owner: 7143 length: 4 contents: 7143 reco_attr: Owner_Group (37) fattr4_owner_group: 1112 length: 4 contents: 1112 reco_attr: RawDev (41) specdata1: 0 specdata2: 0 reco_attr: Space_Used (45) space_used: 24576 reco_attr: Time_Access (47) seconds: 1479325563 nseconds: 305919836 reco_attr: Time_Metadata (52) seconds: 1469379160 nseconds: 756243364 reco_attr: Time_Modify (53) seconds: 1469379160 nseconds: 756243364 reco_attr: Mounted_on_FileId (55) fileid: 0x000000000048429a [Main Opcode: OPEN (18)] Frame 7: 322 bytes on wire (2576 bits), 322 bytes captured (2576 bits) on interface 0 Ethernet II, Src: AsrockIn_84:34:e7 (d0:50:99:84:34:e7), Dst: SuperMic_19:8e:6e (00:25:90:19:8e:6e) Internet Protocol Version 4, Src: 172.21.86.141, Dst: 172.21.86.74 Transmission Control Protocol, Src Port: 997, Dst Port: 2049, Seq: 1025, Ack: 1621, Len: 256 Remote Procedure Call, Type:Call XID:0x2d009a8c Network File System, Ops(5): SEQUENCE, PUTFH, OPEN, ACCESS, GETATTR [Program Version: 4] [V4 Procedure: COMPOUND (1)] Tag: <EMPTY> length: 0 contents: <EMPTY> minorversion: 1 Operations (count: 5): SEQUENCE, PUTFH, OPEN, ACCESS, GETATTR Opcode: SEQUENCE (53) sessionid: f0f40f58bd020000e002000000000000 seqid: 0x002b318d slot id: 0 high slot id: 0 cache this?: Yes Opcode: PUTFH (22) filehandle length: 28 [hash (CRC-32): 0xdcaec4f7] filehandle: 010006016aeda54305f66ede000000000000000096484800... Opcode: OPEN (18) seqid: 0x00000000 share_access: OPEN4_SHARE_ACCESS_BOTH (3) share_deny: OPEN4_SHARE_DENY_NONE (0) clientid: 0xf0f40f58bd020000 owner: <DATA> length: 24 contents: <DATA> Open Type: OPEN4_NOCREATE (0) Claim Type: CLAIM_FH (4) Opcode: ACCESS (3), [Check: RD MD XT XE] Check access: 0x2d .... ...1 = 0x01 READ: allowed? .... .1.. = 0x04 MODIFY: allowed? .... 1... = 0x08 EXTEND: allowed? ..1. .... = 0x20 EXECUTE: allowed? 
Opcode: GETATTR (9) Attr mask[0]: 0x0010011a (Type, Change, Size, FSID, FileId) reqd_attr: Type (1) reqd_attr: Change (3) reqd_attr: Size (4) reqd_attr: FSID (8) reco_attr: FileId (20) Attr mask[1]: 0x00b0a23a (Mode, NumLinks, Owner, Owner_Group, RawDev, Space_Used, Time_Access, Time_Metadata, Time_Modify, Mounted_on_FileId) reco_attr: Mode (33) reco_attr: NumLinks (35) reco_attr: Owner (36) reco_attr: Owner_Group (37) reco_attr: RawDev (41) reco_attr: Space_Used (45) reco_attr: Time_Access (47) reco_attr: Time_Metadata (52) reco_attr: Time_Modify (53) reco_attr: Mounted_on_FileId (55) [Main Opcode: OPEN (18)] Frame 8: 390 bytes on wire (3120 bits), 390 bytes captured (3120 bits) on interface 0 Ethernet II, Src: SuperMic_19:8e:6e (00:25:90:19:8e:6e), Dst: AsrockIn_84:34:e7 (d0:50:99:84:34:e7) Internet Protocol Version 4, Src: 172.21.86.74, Dst: 172.21.86.141 Transmission Control Protocol, Src Port: 2049, Dst Port: 997, Seq: 1621, Ack: 1281, Len: 324 Remote Procedure Call, Type:Reply XID:0x2d009a8c Network File System, Ops(5): SEQUENCE PUTFH OPEN ACCESS GETATTR [Program Version: 4] [V4 Procedure: COMPOUND (1)] Status: NFS4_OK (0) Tag: <EMPTY> length: 0 contents: <EMPTY> Operations (count: 5) Opcode: SEQUENCE (53) Status: NFS4_OK (0) sessionid: f0f40f58bd020000e002000000000000 seqid: 0x002b318d slot id: 0 high slot id: 30 target high slot id: 30 status flags: 0x00000040, SEQ4_STATUS_RECALLABLE_STATE_REVOKED .... .... .... .... .... .... .... ...0 = SEQ4_STATUS_CB_PATH_DOWN: Not set .... .... .... .... .... .... .... ..0. = SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRING: Not set .... .... .... .... .... .... .... .0.. = SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRED: Not set .... .... .... .... .... .... .... 0... = SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED: Not set .... .... .... .... .... .... ...0 .... = SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED: Not set .... .... .... .... .... .... ..0. .... = SEQ4_STATUS_ADMIN_STATE_REVOKED: Not set .... .... .... .... .... .... .1.. .... 
= SEQ4_STATUS_RECALLABLE_STATE_REVOKED: Set .... .... .... .... .... .... 0... .... = SEQ4_STATUS_LEASE_MOVED: Not set .... .... .... .... .... ...0 .... .... = SEQ4_STATUS_RESTART_RECLAIM_NEEDED: Not set .... .... .... .... .... ..0. .... .... = SEQ4_STATUS_CB_PATH_DOWN_SESSION: Not set .... .... .... .... .... .0.. .... .... = SEQ4_STATUS_BACKCHANNEL_FAULT: Not set .... .... .... .... .... 0... .... .... = SEQ4_STATUS_DEVID_CHANGED: Not set .... .... .... .... ...0 .... .... .... = SEQ4_STATUS_DEVID_DELETED: Not set Opcode: PUTFH (22) Status: NFS4_OK (0) Opcode: OPEN (18) Status: NFS4_OK (0) stateid [StateID Hash: 0x1925] seqid: 0x0015824b Data: f0f40f58bd02000092000000 [Data hash (CRC-32): 0xf3e67f06] change_info Atomic: No changeid (before): 0 changeid (after): 0 result flags: 0x00000004, locktype posix .... .... .... .... .... .... .... ..0. = confirm: False .... .... .... .... .... .... .... .1.. = locktype posix: True .... .... .... .... .... .... .... 0... = preserve unlinked: False .... .... .... .... .... .... ..0. .... = may notify lock: False Delegation Type: OPEN_DELEGATE_NONE (0) Opcode: ACCESS (3), [Access Denied: XE], [Allowed: RD MD XT] Status: NFS4_OK (0) Supported types (of requested): 0x2d .... ...1 = 0x01 READ: supported .... .1.. = 0x04 MODIFY: supported .... 1... = 0x08 EXTEND: supported ..1. .... = 0x20 EXECUTE: supported Access rights (of requested): 0x0d .... ...1 = 0x01 READ: allowed .... .1.. = 0x04 MODIFY: allowed .... 1... = 0x08 EXTEND: allowed ..0. .... 
= 0x20 EXECUTE: *Access Denied* Opcode: GETATTR (9) Status: NFS4_OK (0) Attr mask[0]: 0x0010011a (Type, Change, Size, FSID, FileId) reqd_attr: Type (1) ftype4: NF4REG (1) reqd_attr: Change (3) changeid: 6353654913397708009 reqd_attr: Size (4) size: 32768 reqd_attr: FSID (8) fattr4_fsid fsid4.major: 7704996244782214878 fsid4.minor: 0 reco_attr: FileId (20) fileid: 4737174 Attr mask[1]: 0x00b0a23a (Mode, NumLinks, Owner, Owner_Group, RawDev, Space_Used, Time_Access, Time_Metadata, Time_Modify, Mounted_on_FileId) reco_attr: Mode (33) mode: 0644, Name: Unknown, Read permission for owner, Write permission for owner, Read permission for group, Read permission for others .... .... .... .... 000. .... .... .... = Name: Unknown (0) .... .... .... .... .... 0... .... .... = Set user id on exec: No .... .... .... .... .... .0.. .... .... = Set group id on exec: No .... .... .... .... .... ..0. .... .... = Save swapped text even after use: No .... .... .... .... .... ...1 .... .... = Read permission for owner: Yes .... .... .... .... .... .... 1... .... = Write permission for owner: Yes .... .... .... .... .... .... .0.. .... = Execute permission for owner: No .... .... .... .... .... .... ..1. .... = Read permission for group: Yes .... .... .... .... .... .... ...0 .... = Write permission for group: No .... .... .... .... .... .... .... 0... = Execute permission for group: No .... .... .... .... .... .... .... .1.. = Read permission for others: Yes .... .... .... .... .... .... .... ..0. = Write permission for others: No .... .... .... .... .... .... .... 
...0 = Execute permission for others: No reco_attr: NumLinks (35) numlinks: 1 reco_attr: Owner (36) fattr4_owner: 7143 length: 4 contents: 7143 reco_attr: Owner_Group (37) fattr4_owner_group: 1112 length: 4 contents: 1112 reco_attr: RawDev (41) specdata1: 0 specdata2: 0 reco_attr: Space_Used (45) space_used: 32768 reco_attr: Time_Access (47) seconds: 1479318517 nseconds: 593359867 reco_attr: Time_Metadata (52) seconds: 1479325563 nseconds: 175920361 reco_attr: Time_Modify (53) seconds: 1479325563 nseconds: 175920361 reco_attr: Mounted_on_FileId (55) fileid: 0x0000000000484896 [Main Opcode: OPEN (18)] Frame 9: 322 bytes on wire (2576 bits), 322 bytes captured (2576 bits) on interface 0 Ethernet II, Src: AsrockIn_84:34:e7 (d0:50:99:84:34:e7), Dst: SuperMic_19:8e:6e (00:25:90:19:8e:6e) Internet Protocol Version 4, Src: 172.21.86.141, Dst: 172.21.86.74 Transmission Control Protocol, Src Port: 997, Dst Port: 2049, Seq: 1281, Ack: 1945, Len: 256 Remote Procedure Call, Type:Call XID:0x2e009a8c Network File System, Ops(5): SEQUENCE, PUTFH, OPEN, ACCESS, GETATTR [Program Version: 4] [V4 Procedure: COMPOUND (1)] Tag: <EMPTY> length: 0 contents: <EMPTY> minorversion: 1 Operations (count: 5): SEQUENCE, PUTFH, OPEN, ACCESS, GETATTR Opcode: SEQUENCE (53) sessionid: f0f40f58bd020000e002000000000000 seqid: 0x002b318e slot id: 0 high slot id: 0 cache this?: Yes Opcode: PUTFH (22) filehandle length: 28 [hash (CRC-32): 0x26ab9d8d] filehandle: 010006016aeda54305f66ede00000000000000009a424800... Opcode: OPEN (18) seqid: 0x00000000 share_access: OPEN4_SHARE_ACCESS_BOTH (3) share_deny: OPEN4_SHARE_DENY_NONE (0) clientid: 0xf0f40f58bd020000 owner: <DATA> length: 24 contents: <DATA> Open Type: OPEN4_NOCREATE (0) Claim Type: CLAIM_FH (4) Opcode: ACCESS (3), [Check: RD MD XT XE] Check access: 0x2d .... ...1 = 0x01 READ: allowed? .... .1.. = 0x04 MODIFY: allowed? .... 1... = 0x08 EXTEND: allowed? ..1. .... = 0x20 EXECUTE: allowed? 
Opcode: GETATTR (9) Attr mask[0]: 0x0010011a (Type, Change, Size, FSID, FileId) reqd_attr: Type (1) reqd_attr: Change (3) reqd_attr: Size (4) reqd_attr: FSID (8) reco_attr: FileId (20) Attr mask[1]: 0x00b0a23a (Mode, NumLinks, Owner, Owner_Group, RawDev, Space_Used, Time_Access, Time_Metadata, Time_Modify, Mounted_on_FileId) reco_attr: Mode (33) reco_attr: NumLinks (35) reco_attr: Owner (36) reco_attr: Owner_Group (37) reco_attr: RawDev (41) reco_attr: Space_Used (45) reco_attr: Time_Access (47) reco_attr: Time_Metadata (52) reco_attr: Time_Modify (53) reco_attr: Mounted_on_FileId (55) [Main Opcode: OPEN (18)] Frame 10: 390 bytes on wire (3120 bits), 390 bytes captured (3120 bits) on interface 0 Ethernet II, Src: SuperMic_19:8e:6e (00:25:90:19:8e:6e), Dst: AsrockIn_84:34:e7 (d0:50:99:84:34:e7) Internet Protocol Version 4, Src: 172.21.86.74, Dst: 172.21.86.141 Transmission Control Protocol, Src Port: 2049, Dst Port: 997, Seq: 1945, Ack: 1537, Len: 324 Remote Procedure Call, Type:Reply XID:0x2e009a8c Network File System, Ops(5): SEQUENCE PUTFH OPEN ACCESS GETATTR [Program Version: 4] [V4 Procedure: COMPOUND (1)] Status: NFS4_OK (0) Tag: <EMPTY> length: 0 contents: <EMPTY> Operations (count: 5) Opcode: SEQUENCE (53) Status: NFS4_OK (0) sessionid: f0f40f58bd020000e002000000000000 seqid: 0x002b318e slot id: 0 high slot id: 30 target high slot id: 30 status flags: 0x00000040, SEQ4_STATUS_RECALLABLE_STATE_REVOKED .... .... .... .... .... .... .... ...0 = SEQ4_STATUS_CB_PATH_DOWN: Not set .... .... .... .... .... .... .... ..0. = SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRING: Not set .... .... .... .... .... .... .... .0.. = SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRED: Not set .... .... .... .... .... .... .... 0... = SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED: Not set .... .... .... .... .... .... ...0 .... = SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED: Not set .... .... .... .... .... .... ..0. .... = SEQ4_STATUS_ADMIN_STATE_REVOKED: Not set .... .... .... .... .... .... .1.. .... 
= SEQ4_STATUS_RECALLABLE_STATE_REVOKED: Set .... .... .... .... .... .... 0... .... = SEQ4_STATUS_LEASE_MOVED: Not set .... .... .... .... .... ...0 .... .... = SEQ4_STATUS_RESTART_RECLAIM_NEEDED: Not set .... .... .... .... .... ..0. .... .... = SEQ4_STATUS_CB_PATH_DOWN_SESSION: Not set .... .... .... .... .... .0.. .... .... = SEQ4_STATUS_BACKCHANNEL_FAULT: Not set .... .... .... .... .... 0... .... .... = SEQ4_STATUS_DEVID_CHANGED: Not set .... .... .... .... ...0 .... .... .... = SEQ4_STATUS_DEVID_DELETED: Not set Opcode: PUTFH (22) Status: NFS4_OK (0) Opcode: OPEN (18) Status: NFS4_OK (0) stateid [StateID Hash: 0xec52] seqid: 0x0015824d Data: f0f40f58bd0200009e000000 [Data hash (CRC-32): 0xb930c0be] change_info Atomic: No changeid (before): 0 changeid (after): 0 result flags: 0x00000004, locktype posix .... .... .... .... .... .... .... ..0. = confirm: False .... .... .... .... .... .... .... .1.. = locktype posix: True .... .... .... .... .... .... .... 0... = preserve unlinked: False .... .... .... .... .... .... ..0. .... = may notify lock: False Delegation Type: OPEN_DELEGATE_NONE (0) Opcode: ACCESS (3), [Access Denied: XE], [Allowed: RD MD XT] Status: NFS4_OK (0) Supported types (of requested): 0x2d .... ...1 = 0x01 READ: supported .... .1.. = 0x04 MODIFY: supported .... 1... = 0x08 EXTEND: supported ..1. .... = 0x20 EXECUTE: supported Access rights (of requested): 0x0d .... ...1 = 0x01 READ: allowed .... .1.. = 0x04 MODIFY: allowed .... 1... = 0x08 EXTEND: allowed ..0. .... 
= 0x20 EXECUTE: *Access Denied* Opcode: GETATTR (9) Status: NFS4_OK (0) Attr mask[0]: 0x0010011a (Type, Change, Size, FSID, FileId) reqd_attr: Type (1) ftype4: NF4REG (1) reqd_attr: Change (3) changeid: 6310935438380194724 reqd_attr: Size (4) size: 24576 reqd_attr: FSID (8) fattr4_fsid fsid4.major: 7704996244782214878 fsid4.minor: 0 reco_attr: FileId (20) fileid: 4735642 Attr mask[1]: 0x00b0a23a (Mode, NumLinks, Owner, Owner_Group, RawDev, Space_Used, Time_Access, Time_Metadata, Time_Modify, Mounted_on_FileId) reco_attr: Mode (33) mode: 0644, Name: Unknown, Read permission for owner, Write permission for owner, Read permission for group, Read permission for others .... .... .... .... 000. .... .... .... = Name: Unknown (0) .... .... .... .... .... 0... .... .... = Set user id on exec: No .... .... .... .... .... .0.. .... .... = Set group id on exec: No .... .... .... .... .... ..0. .... .... = Save swapped text even after use: No .... .... .... .... .... ...1 .... .... = Read permission for owner: Yes .... .... .... .... .... .... 1... .... = Write permission for owner: Yes .... .... .... .... .... .... .0.. .... = Execute permission for owner: No .... .... .... .... .... .... ..1. .... = Read permission for group: Yes .... .... .... .... .... .... ...0 .... = Write permission for group: No .... .... .... .... .... .... .... 0... = Execute permission for group: No .... .... .... .... .... .... .... .1.. = Read permission for others: Yes .... .... .... .... .... .... .... ..0. = Write permission for others: No .... .... .... .... .... .... .... 
...0 = Execute permission for others: No reco_attr: NumLinks (35) numlinks: 1 reco_attr: Owner (36) fattr4_owner: 7143 length: 4 contents: 7143 reco_attr: Owner_Group (37) fattr4_owner_group: 1112 length: 4 contents: 1112 reco_attr: RawDev (41) specdata1: 0 specdata2: 0 reco_attr: Space_Used (45) space_used: 24576 reco_attr: Time_Access (47) seconds: 1479325563 nseconds: 305919836 reco_attr: Time_Metadata (52) seconds: 1469379160 nseconds: 756243364 reco_attr: Time_Modify (53) seconds: 1469379160 nseconds: 756243364 reco_attr: Mounted_on_FileId (55) fileid: 0x000000000048429a [Main Opcode: OPEN (18)] ^ permalink raw reply [flat|nested] 27+ messages in thread
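The error at the heart of this thread, -10068 (NFS4ERR_RETRY_UNCACHED_REP), is tied to the NFSv4.1 session slot table visible in the SEQUENCE ops above ("slot id", "seqid", "cache this?: Yes"). As a rough sketch of the server-side rule described in RFC 5661 (a simplified model with invented names, not the actual fs/nfsd code), a retransmitted (slot, seqid) pair can only be answered if the original reply was cached:

```python
# Simplified model of the NFSv4.1 per-slot sequence check (RFC 5661,
# section 2.10.6.1).  Names are invented for illustration; the real
# logic lives in fs/nfsd/nfs4state.c (check_slot_seqid / nfsd4_sequence).

NFS4_OK = 0
NFS4ERR_SEQ_MISORDERED = 10063
NFS4ERR_RETRY_UNCACHED_REP = 10068

class Slot:
    def __init__(self):
        self.seqid = 0            # highest seqid seen on this slot
        self.cached_reply = None  # reply saved only if "cache this?" was set

def check_slot(slot, seqid, cache_this):
    """Return (status, reply) for a SEQUENCE op landing on one session slot."""
    if seqid == slot.seqid + 1:          # new request: accept and bump seqid
        slot.seqid = seqid
        reply = "reply-for-seq-%d" % seqid
        slot.cached_reply = reply if cache_this else None
        return NFS4_OK, reply
    if seqid == slot.seqid:              # retransmission of the last request
        if slot.cached_reply is not None:
            return NFS4_OK, slot.cached_reply  # replay from the reply cache
        # Client replayed a request whose reply was never cached: this is
        # the status the client logs as "unhandled error -10068".
        return NFS4ERR_RETRY_UNCACHED_REP, None
    return NFS4ERR_SEQ_MISORDERED, None  # anything else is misordered

slot = Slot()
print(check_slot(slot, 1, cache_this=False))  # -> (0, 'reply-for-seq-1')
print(check_slot(slot, 1, cache_this=False))  # -> (10068, None)
```

In this model the client only hits -10068 by retrying a (slot, seqid) pair whose reply the server was never asked to cache, which matches Bruce's reading that either the client's retry logic or the server's reply caching has gone wrong.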
* Re: NFS: nfs4_reclaim_open_state: Lock reclaim failed! log spew
  2016-11-16 20:55 ` Jason L Tibbitts III
@ 2016-11-17 16:31 ` J. Bruce Fields
  2016-11-17 17:08   ` Jason L Tibbitts III
  2016-11-17 17:45   ` Trond Myklebust
  0 siblings, 2 replies; 27+ messages in thread
From: J. Bruce Fields @ 2016-11-17 16:31 UTC (permalink / raw)
To: Jason L Tibbitts III; +Cc: linux-nfs

On Wed, Nov 16, 2016 at 02:55:05PM -0600, Jason L Tibbitts III wrote:
> I'm replying to a rather old message, but the issue has just now popped
> back up again.
>
> To recap, a client stops being able to access _any_ mount on a
> particular server, and "NFS: nfs4_reclaim_open_state: Lock reclaim
> failed!" appears several hundred times per second in the kernel log.
> The load goes up by one for every process attempting to access any mount
> from that particular server.  Mounts to other servers are fine, and
> other clients can mount things from that one server without problems.
>
> When I kill every process keeping that particular mount active and then
> umount it, I see:
>
> NFS: nfs4_reclaim_open_state: unhandled error -10068

NFS4ERR_RETRY_UNCACHED_REP.  So, you're using NFSv4.1 or 4.2, and the
server thinks that the client has reused a (slot, sequence number) pair,
but the server doesn't have a cached response to return.  Hard to know
how that happened, and it's not shown in the capture below.  Sounds like
a bug, though.

> The server in this case is running CentOS 7: 3.10.0-327.36.3.el7.x86_64
> The client is one of the last I have that's still running Fedora 23:
> 4.7.10-100.fc23.x86_64
>
> The big difference now, besides the updated kernel versions, is the
> fact that no kerberos is involved.  It's a sec=sys mount.  There are
> kerberized exports on the same server, but the client wasn't accessing
> any of them.  There were other kerberized mounts on the client, but they
> were to different servers and they remained accessible.
Recent clients will use sec=krb5 for certain state-related operations
even if you mount with sec=sys, so it's still possible it could be
involved here.

> Here's ten frames from a packet capture.  I think it's all repeating at
> that point so maybe it will be enough, but I can provide a more complete
> capture log on request.

The SEQ4_STATUS_RECALLABLE_STATE_REVOKED flag set in the OPEN replies is
also a sign something's gone wrong.  Apparently the server thinks the
client has failed to return a delegation.

--b.

> > - J< > > Frame 1: 390 bytes on wire (3120 bits), 390 bytes captured (3120 bits) on interface 0 > Ethernet II, Src: SuperMic_19:8e:6e (00:25:90:19:8e:6e), Dst: AsrockIn_84:34:e7 (d0:50:99:84:34:e7) > Internet Protocol Version 4, Src: 172.21.86.74, Dst: 172.21.86.141 > Transmission Control Protocol, Src Port: 2049, Dst Port: 997, Seq: 1, Ack: 1, Len: 324 > Remote Procedure Call > > Frame 2: 322 bytes on wire (2576 bits), 322 bytes captured (2576 bits) on interface 0 > Ethernet II, Src: AsrockIn_84:34:e7 (d0:50:99:84:34:e7), Dst: SuperMic_19:8e:6e (00:25:90:19:8e:6e) > Internet Protocol Version 4, Src: 172.21.86.141, Dst: 172.21.86.74 > Transmission Control Protocol, Src Port: 997, Dst Port: 2049, Seq: 1, Ack: 325, Len: 256 > Remote Procedure Call, Type:Call XID:0x29009a8c > Network File System, Ops(5): SEQUENCE, PUTFH, OPEN, ACCESS, GETATTR > [Program Version: 4] > [V4 Procedure: COMPOUND (1)] > Tag: <EMPTY> > length: 0 > contents: <EMPTY> > minorversion: 1 > Operations (count: 5): SEQUENCE, PUTFH, OPEN, ACCESS, GETATTR > Opcode: SEQUENCE (53) > sessionid: f0f40f58bd020000e002000000000000 > seqid: 0x002b3189 > slot id: 0 > high slot id: 0 > cache this?: Yes > Opcode: PUTFH (22) > filehandle > length: 28 > [hash (CRC-32): 0xdcaec4f7] > filehandle: 010006016aeda54305f66ede000000000000000096484800... 
> Opcode: OPEN (18) > seqid: 0x00000000 > share_access: OPEN4_SHARE_ACCESS_BOTH (3) > share_deny: OPEN4_SHARE_DENY_NONE (0) > clientid: 0xf0f40f58bd020000 > owner: <DATA> > length: 24 > contents: <DATA> > Open Type: OPEN4_NOCREATE (0) > Claim Type: CLAIM_FH (4) > Opcode: ACCESS (3), [Check: RD MD XT XE] > Check access: 0x2d > .... ...1 = 0x01 READ: allowed? > .... .1.. = 0x04 MODIFY: allowed? > .... 1... = 0x08 EXTEND: allowed? > ..1. .... = 0x20 EXECUTE: allowed? > Opcode: GETATTR (9) > Attr mask[0]: 0x0010011a (Type, Change, Size, FSID, FileId) > reqd_attr: Type (1) > reqd_attr: Change (3) > reqd_attr: Size (4) > reqd_attr: FSID (8) > reco_attr: FileId (20) > Attr mask[1]: 0x00b0a23a (Mode, NumLinks, Owner, Owner_Group, RawDev, Space_Used, Time_Access, Time_Metadata, Time_Modify, Mounted_on_FileId) > reco_attr: Mode (33) > reco_attr: NumLinks (35) > reco_attr: Owner (36) > reco_attr: Owner_Group (37) > reco_attr: RawDev (41) > reco_attr: Space_Used (45) > reco_attr: Time_Access (47) > reco_attr: Time_Metadata (52) > reco_attr: Time_Modify (53) > reco_attr: Mounted_on_FileId (55) > [Main Opcode: OPEN (18)] > > Frame 3: 390 bytes on wire (3120 bits), 390 bytes captured (3120 bits) on interface 0 > Ethernet II, Src: SuperMic_19:8e:6e (00:25:90:19:8e:6e), Dst: AsrockIn_84:34:e7 (d0:50:99:84:34:e7) > Internet Protocol Version 4, Src: 172.21.86.74, Dst: 172.21.86.141 > Transmission Control Protocol, Src Port: 2049, Dst Port: 997, Seq: 325, Ack: 257, Len: 324 > Remote Procedure Call, Type:Reply XID:0x29009a8c > Network File System, Ops(5): SEQUENCE PUTFH OPEN ACCESS GETATTR > [Program Version: 4] > [V4 Procedure: COMPOUND (1)] > Status: NFS4_OK (0) > Tag: <EMPTY> > length: 0 > contents: <EMPTY> > Operations (count: 5) > Opcode: SEQUENCE (53) > Status: NFS4_OK (0) > sessionid: f0f40f58bd020000e002000000000000 > seqid: 0x002b3189 > slot id: 0 > high slot id: 30 > target high slot id: 30 > status flags: 0x00000040, SEQ4_STATUS_RECALLABLE_STATE_REVOKED > .... .... .... .... 
.... .... .... ...0 = SEQ4_STATUS_CB_PATH_DOWN: Not set > .... .... .... .... .... .... .... ..0. = SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRING: Not set > .... .... .... .... .... .... .... .0.. = SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRED: Not set > .... .... .... .... .... .... .... 0... = SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED: Not set > .... .... .... .... .... .... ...0 .... = SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED: Not set > .... .... .... .... .... .... ..0. .... = SEQ4_STATUS_ADMIN_STATE_REVOKED: Not set > .... .... .... .... .... .... .1.. .... = SEQ4_STATUS_RECALLABLE_STATE_REVOKED: Set > .... .... .... .... .... .... 0... .... = SEQ4_STATUS_LEASE_MOVED: Not set > .... .... .... .... .... ...0 .... .... = SEQ4_STATUS_RESTART_RECLAIM_NEEDED: Not set > .... .... .... .... .... ..0. .... .... = SEQ4_STATUS_CB_PATH_DOWN_SESSION: Not set > .... .... .... .... .... .0.. .... .... = SEQ4_STATUS_BACKCHANNEL_FAULT: Not set > .... .... .... .... .... 0... .... .... = SEQ4_STATUS_DEVID_CHANGED: Not set > .... .... .... .... ...0 .... .... .... = SEQ4_STATUS_DEVID_DELETED: Not set > Opcode: PUTFH (22) > Status: NFS4_OK (0) > Opcode: OPEN (18) > Status: NFS4_OK (0) > stateid > [StateID Hash: 0xb207] > seqid: 0x00158249 > Data: f0f40f58bd02000092000000 > [Data hash (CRC-32): 0xf3e67f06] > change_info > Atomic: No > changeid (before): 0 > changeid (after): 0 > result flags: 0x00000004, locktype posix > .... .... .... .... .... .... .... ..0. = confirm: False > .... .... .... .... .... .... .... .1.. = locktype posix: True > .... .... .... .... .... .... .... 0... = preserve unlinked: False > .... .... .... .... .... .... ..0. .... = may notify lock: False > Delegation Type: OPEN_DELEGATE_NONE (0) > Opcode: ACCESS (3), [Access Denied: XE], [Allowed: RD MD XT] > Status: NFS4_OK (0) > Supported types (of requested): 0x2d > .... ...1 = 0x01 READ: supported > .... .1.. = 0x04 MODIFY: supported > .... 1... = 0x08 EXTEND: supported > ..1. .... 
= 0x20 EXECUTE: supported > Access rights (of requested): 0x0d > .... ...1 = 0x01 READ: allowed > .... .1.. = 0x04 MODIFY: allowed > .... 1... = 0x08 EXTEND: allowed > ..0. .... = 0x20 EXECUTE: *Access Denied* > Opcode: GETATTR (9) > Status: NFS4_OK (0) > Attr mask[0]: 0x0010011a (Type, Change, Size, FSID, FileId) > reqd_attr: Type (1) > ftype4: NF4REG (1) > reqd_attr: Change (3) > changeid: 6353654913397708009 > reqd_attr: Size (4) > size: 32768 > reqd_attr: FSID (8) > fattr4_fsid > fsid4.major: 7704996244782214878 > fsid4.minor: 0 > reco_attr: FileId (20) > fileid: 4737174 > Attr mask[1]: 0x00b0a23a (Mode, NumLinks, Owner, Owner_Group, RawDev, Space_Used, Time_Access, Time_Metadata, Time_Modify, Mounted_on_FileId) > reco_attr: Mode (33) > mode: 0644, Name: Unknown, Read permission for owner, Write permission for owner, Read permission for group, Read permission for others > .... .... .... .... 000. .... .... .... = Name: Unknown (0) > .... .... .... .... .... 0... .... .... = Set user id on exec: No > .... .... .... .... .... .0.. .... .... = Set group id on exec: No > .... .... .... .... .... ..0. .... .... = Save swapped text even after use: No > .... .... .... .... .... ...1 .... .... = Read permission for owner: Yes > .... .... .... .... .... .... 1... .... = Write permission for owner: Yes > .... .... .... .... .... .... .0.. .... = Execute permission for owner: No > .... .... .... .... .... .... ..1. .... = Read permission for group: Yes > .... .... .... .... .... .... ...0 .... = Write permission for group: No > .... .... .... .... .... .... .... 0... = Execute permission for group: No > .... .... .... .... .... .... .... .1.. = Read permission for others: Yes > .... .... .... .... .... .... .... ..0. = Write permission for others: No > .... .... .... .... .... .... .... 
...0 = Execute permission for others: No > reco_attr: NumLinks (35) > numlinks: 1 > reco_attr: Owner (36) > fattr4_owner: 7143 > length: 4 > contents: 7143 > reco_attr: Owner_Group (37) > fattr4_owner_group: 1112 > length: 4 > contents: 1112 > reco_attr: RawDev (41) > specdata1: 0 > specdata2: 0 > reco_attr: Space_Used (45) > space_used: 32768 > reco_attr: Time_Access (47) > seconds: 1479318517 > nseconds: 593359867 > reco_attr: Time_Metadata (52) > seconds: 1479325563 > nseconds: 175920361 > reco_attr: Time_Modify (53) > seconds: 1479325563 > nseconds: 175920361 > reco_attr: Mounted_on_FileId (55) > fileid: 0x0000000000484896 > [Main Opcode: OPEN (18)] > > Frame 4: 322 bytes on wire (2576 bits), 322 bytes captured (2576 bits) on interface 0 > Ethernet II, Src: AsrockIn_84:34:e7 (d0:50:99:84:34:e7), Dst: SuperMic_19:8e:6e (00:25:90:19:8e:6e) > Internet Protocol Version 4, Src: 172.21.86.141, Dst: 172.21.86.74 > Transmission Control Protocol, Src Port: 997, Dst Port: 2049, Seq: 257, Ack: 649, Len: 256 > Remote Procedure Call, Type:Call XID:0x2a009a8c > Network File System, Ops(5): SEQUENCE, PUTFH, OPEN, ACCESS, GETATTR > [Program Version: 4] > [V4 Procedure: COMPOUND (1)] > Tag: <EMPTY> > length: 0 > contents: <EMPTY> > minorversion: 1 > Operations (count: 5): SEQUENCE, PUTFH, OPEN, ACCESS, GETATTR > Opcode: SEQUENCE (53) > sessionid: f0f40f58bd020000e002000000000000 > seqid: 0x002b318a > slot id: 0 > high slot id: 0 > cache this?: Yes > Opcode: PUTFH (22) > filehandle > length: 28 > [hash (CRC-32): 0x26ab9d8d] > filehandle: 010006016aeda54305f66ede00000000000000009a424800... > Opcode: OPEN (18) > seqid: 0x00000000 > share_access: OPEN4_SHARE_ACCESS_BOTH (3) > share_deny: OPEN4_SHARE_DENY_NONE (0) > clientid: 0xf0f40f58bd020000 > owner: <DATA> > length: 24 > contents: <DATA> > Open Type: OPEN4_NOCREATE (0) > Claim Type: CLAIM_FH (4) > Opcode: ACCESS (3), [Check: RD MD XT XE] > Check access: 0x2d > .... ...1 = 0x01 READ: allowed? > .... .1.. = 0x04 MODIFY: allowed? 
> .... 1... = 0x08 EXTEND: allowed? > ..1. .... = 0x20 EXECUTE: allowed? > Opcode: GETATTR (9) > Attr mask[0]: 0x0010011a (Type, Change, Size, FSID, FileId) > reqd_attr: Type (1) > reqd_attr: Change (3) > reqd_attr: Size (4) > reqd_attr: FSID (8) > reco_attr: FileId (20) > Attr mask[1]: 0x00b0a23a (Mode, NumLinks, Owner, Owner_Group, RawDev, Space_Used, Time_Access, Time_Metadata, Time_Modify, Mounted_on_FileId) > reco_attr: Mode (33) > reco_attr: NumLinks (35) > reco_attr: Owner (36) > reco_attr: Owner_Group (37) > reco_attr: RawDev (41) > reco_attr: Space_Used (45) > reco_attr: Time_Access (47) > reco_attr: Time_Metadata (52) > reco_attr: Time_Modify (53) > reco_attr: Mounted_on_FileId (55) > [Main Opcode: OPEN (18)] > > Frame 5: 322 bytes on wire (2576 bits), 322 bytes captured (2576 bits) on interface 0 > Ethernet II, Src: AsrockIn_84:34:e7 (d0:50:99:84:34:e7), Dst: SuperMic_19:8e:6e (00:25:90:19:8e:6e) > Internet Protocol Version 4, Src: 172.21.86.141, Dst: 172.21.86.74 > Transmission Control Protocol, Src Port: 997, Dst Port: 2049, Seq: 769, Ack: 1297, Len: 256 > Remote Procedure Call, Type:Call XID:0x2c009a8c > Network File System, Ops(5): SEQUENCE, PUTFH, OPEN, ACCESS, GETATTR > [Program Version: 4] > [V4 Procedure: COMPOUND (1)] > Tag: <EMPTY> > length: 0 > contents: <EMPTY> > minorversion: 1 > Operations (count: 5): SEQUENCE, PUTFH, OPEN, ACCESS, GETATTR > Opcode: SEQUENCE (53) > sessionid: f0f40f58bd020000e002000000000000 > seqid: 0x002b318c > slot id: 0 > high slot id: 0 > cache this?: Yes > Opcode: PUTFH (22) > filehandle > length: 28 > [hash (CRC-32): 0x26ab9d8d] > filehandle: 010006016aeda54305f66ede00000000000000009a424800... 
> Opcode: OPEN (18) > seqid: 0x00000000 > share_access: OPEN4_SHARE_ACCESS_BOTH (3) > share_deny: OPEN4_SHARE_DENY_NONE (0) > clientid: 0xf0f40f58bd020000 > owner: <DATA> > length: 24 > contents: <DATA> > Open Type: OPEN4_NOCREATE (0) > Claim Type: CLAIM_FH (4) > Opcode: ACCESS (3), [Check: RD MD XT XE] > Check access: 0x2d > .... ...1 = 0x01 READ: allowed? > .... .1.. = 0x04 MODIFY: allowed? > .... 1... = 0x08 EXTEND: allowed? > ..1. .... = 0x20 EXECUTE: allowed? > Opcode: GETATTR (9) > Attr mask[0]: 0x0010011a (Type, Change, Size, FSID, FileId) > reqd_attr: Type (1) > reqd_attr: Change (3) > reqd_attr: Size (4) > reqd_attr: FSID (8) > reco_attr: FileId (20) > Attr mask[1]: 0x00b0a23a (Mode, NumLinks, Owner, Owner_Group, RawDev, Space_Used, Time_Access, Time_Metadata, Time_Modify, Mounted_on_FileId) > reco_attr: Mode (33) > reco_attr: NumLinks (35) > reco_attr: Owner (36) > reco_attr: Owner_Group (37) > reco_attr: RawDev (41) > reco_attr: Space_Used (45) > reco_attr: Time_Access (47) > reco_attr: Time_Metadata (52) > reco_attr: Time_Modify (53) > reco_attr: Mounted_on_FileId (55) > [Main Opcode: OPEN (18)] > > Frame 6: 390 bytes on wire (3120 bits), 390 bytes captured (3120 bits) on interface 0 > Ethernet II, Src: SuperMic_19:8e:6e (00:25:90:19:8e:6e), Dst: AsrockIn_84:34:e7 (d0:50:99:84:34:e7) > Internet Protocol Version 4, Src: 172.21.86.74, Dst: 172.21.86.141 > Transmission Control Protocol, Src Port: 2049, Dst Port: 997, Seq: 1297, Ack: 1025, Len: 324 > Remote Procedure Call, Type:Reply XID:0x2c009a8c > Network File System, Ops(5): SEQUENCE PUTFH OPEN ACCESS GETATTR > [Program Version: 4] > [V4 Procedure: COMPOUND (1)] > Status: NFS4_OK (0) > Tag: <EMPTY> > length: 0 > contents: <EMPTY> > Operations (count: 5) > Opcode: SEQUENCE (53) > Status: NFS4_OK (0) > sessionid: f0f40f58bd020000e002000000000000 > seqid: 0x002b318c > slot id: 0 > high slot id: 30 > target high slot id: 30 > status flags: 0x00000040, SEQ4_STATUS_RECALLABLE_STATE_REVOKED > .... .... .... 
.... .... .... .... ...0 = SEQ4_STATUS_CB_PATH_DOWN: Not set > .... .... .... .... .... .... .... ..0. = SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRING: Not set > .... .... .... .... .... .... .... .0.. = SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRED: Not set > .... .... .... .... .... .... .... 0... = SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED: Not set > .... .... .... .... .... .... ...0 .... = SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED: Not set > .... .... .... .... .... .... ..0. .... = SEQ4_STATUS_ADMIN_STATE_REVOKED: Not set > .... .... .... .... .... .... .1.. .... = SEQ4_STATUS_RECALLABLE_STATE_REVOKED: Set > .... .... .... .... .... .... 0... .... = SEQ4_STATUS_LEASE_MOVED: Not set > .... .... .... .... .... ...0 .... .... = SEQ4_STATUS_RESTART_RECLAIM_NEEDED: Not set > .... .... .... .... .... ..0. .... .... = SEQ4_STATUS_CB_PATH_DOWN_SESSION: Not set > .... .... .... .... .... .0.. .... .... = SEQ4_STATUS_BACKCHANNEL_FAULT: Not set > .... .... .... .... .... 0... .... .... = SEQ4_STATUS_DEVID_CHANGED: Not set > .... .... .... .... ...0 .... .... .... = SEQ4_STATUS_DEVID_DELETED: Not set > Opcode: PUTFH (22) > Status: NFS4_OK (0) > Opcode: OPEN (18) > Status: NFS4_OK (0) > stateid > [StateID Hash: 0xb9c3] > seqid: 0x0015824c > Data: f0f40f58bd0200009e000000 > [Data hash (CRC-32): 0xb930c0be] > change_info > Atomic: No > changeid (before): 0 > changeid (after): 0 > result flags: 0x00000004, locktype posix > .... .... .... .... .... .... .... ..0. = confirm: False > .... .... .... .... .... .... .... .1.. = locktype posix: True > .... .... .... .... .... .... .... 0... = preserve unlinked: False > .... .... .... .... .... .... ..0. .... = may notify lock: False > Delegation Type: OPEN_DELEGATE_NONE (0) > Opcode: ACCESS (3), [Access Denied: XE], [Allowed: RD MD XT] > Status: NFS4_OK (0) > Supported types (of requested): 0x2d > .... ...1 = 0x01 READ: supported > .... .1.. = 0x04 MODIFY: supported > .... 1... = 0x08 EXTEND: supported > ..1. .... 
= 0x20 EXECUTE: supported > Access rights (of requested): 0x0d > .... ...1 = 0x01 READ: allowed > .... .1.. = 0x04 MODIFY: allowed > .... 1... = 0x08 EXTEND: allowed > ..0. .... = 0x20 EXECUTE: *Access Denied* > Opcode: GETATTR (9) > Status: NFS4_OK (0) > Attr mask[0]: 0x0010011a (Type, Change, Size, FSID, FileId) > reqd_attr: Type (1) > ftype4: NF4REG (1) > reqd_attr: Change (3) > changeid: 6310935438380194724 > reqd_attr: Size (4) > size: 24576 > reqd_attr: FSID (8) > fattr4_fsid > fsid4.major: 7704996244782214878 > fsid4.minor: 0 > reco_attr: FileId (20) > fileid: 4735642 > Attr mask[1]: 0x00b0a23a (Mode, NumLinks, Owner, Owner_Group, RawDev, Space_Used, Time_Access, Time_Metadata, Time_Modify, Mounted_on_FileId) > reco_attr: Mode (33) > mode: 0644, Name: Unknown, Read permission for owner, Write permission for owner, Read permission for group, Read permission for others > .... .... .... .... 000. .... .... .... = Name: Unknown (0) > .... .... .... .... .... 0... .... .... = Set user id on exec: No > .... .... .... .... .... .0.. .... .... = Set group id on exec: No > .... .... .... .... .... ..0. .... .... = Save swapped text even after use: No > .... .... .... .... .... ...1 .... .... = Read permission for owner: Yes > .... .... .... .... .... .... 1... .... = Write permission for owner: Yes > .... .... .... .... .... .... .0.. .... = Execute permission for owner: No > .... .... .... .... .... .... ..1. .... = Read permission for group: Yes > .... .... .... .... .... .... ...0 .... = Write permission for group: No > .... .... .... .... .... .... .... 0... = Execute permission for group: No > .... .... .... .... .... .... .... .1.. = Read permission for others: Yes > .... .... .... .... .... .... .... ..0. = Write permission for others: No > .... .... .... .... .... .... .... 
...0 = Execute permission for others: No > reco_attr: NumLinks (35) > numlinks: 1 > reco_attr: Owner (36) > fattr4_owner: 7143 > length: 4 > contents: 7143 > reco_attr: Owner_Group (37) > fattr4_owner_group: 1112 > length: 4 > contents: 1112 > reco_attr: RawDev (41) > specdata1: 0 > specdata2: 0 > reco_attr: Space_Used (45) > space_used: 24576 > reco_attr: Time_Access (47) > seconds: 1479325563 > nseconds: 305919836 > reco_attr: Time_Metadata (52) > seconds: 1469379160 > nseconds: 756243364 > reco_attr: Time_Modify (53) > seconds: 1469379160 > nseconds: 756243364 > reco_attr: Mounted_on_FileId (55) > fileid: 0x000000000048429a > [Main Opcode: OPEN (18)] > > Frame 7: 322 bytes on wire (2576 bits), 322 bytes captured (2576 bits) on interface 0 > Ethernet II, Src: AsrockIn_84:34:e7 (d0:50:99:84:34:e7), Dst: SuperMic_19:8e:6e (00:25:90:19:8e:6e) > Internet Protocol Version 4, Src: 172.21.86.141, Dst: 172.21.86.74 > Transmission Control Protocol, Src Port: 997, Dst Port: 2049, Seq: 1025, Ack: 1621, Len: 256 > Remote Procedure Call, Type:Call XID:0x2d009a8c > Network File System, Ops(5): SEQUENCE, PUTFH, OPEN, ACCESS, GETATTR > [Program Version: 4] > [V4 Procedure: COMPOUND (1)] > Tag: <EMPTY> > length: 0 > contents: <EMPTY> > minorversion: 1 > Operations (count: 5): SEQUENCE, PUTFH, OPEN, ACCESS, GETATTR > Opcode: SEQUENCE (53) > sessionid: f0f40f58bd020000e002000000000000 > seqid: 0x002b318d > slot id: 0 > high slot id: 0 > cache this?: Yes > Opcode: PUTFH (22) > filehandle > length: 28 > [hash (CRC-32): 0xdcaec4f7] > filehandle: 010006016aeda54305f66ede000000000000000096484800... > Opcode: OPEN (18) > seqid: 0x00000000 > share_access: OPEN4_SHARE_ACCESS_BOTH (3) > share_deny: OPEN4_SHARE_DENY_NONE (0) > clientid: 0xf0f40f58bd020000 > owner: <DATA> > length: 24 > contents: <DATA> > Open Type: OPEN4_NOCREATE (0) > Claim Type: CLAIM_FH (4) > Opcode: ACCESS (3), [Check: RD MD XT XE] > Check access: 0x2d > .... ...1 = 0x01 READ: allowed? > .... .1.. = 0x04 MODIFY: allowed? 
> .... 1... = 0x08 EXTEND: allowed? > ..1. .... = 0x20 EXECUTE: allowed? > Opcode: GETATTR (9) > Attr mask[0]: 0x0010011a (Type, Change, Size, FSID, FileId) > reqd_attr: Type (1) > reqd_attr: Change (3) > reqd_attr: Size (4) > reqd_attr: FSID (8) > reco_attr: FileId (20) > Attr mask[1]: 0x00b0a23a (Mode, NumLinks, Owner, Owner_Group, RawDev, Space_Used, Time_Access, Time_Metadata, Time_Modify, Mounted_on_FileId) > reco_attr: Mode (33) > reco_attr: NumLinks (35) > reco_attr: Owner (36) > reco_attr: Owner_Group (37) > reco_attr: RawDev (41) > reco_attr: Space_Used (45) > reco_attr: Time_Access (47) > reco_attr: Time_Metadata (52) > reco_attr: Time_Modify (53) > reco_attr: Mounted_on_FileId (55) > [Main Opcode: OPEN (18)] > > Frame 8: 390 bytes on wire (3120 bits), 390 bytes captured (3120 bits) on interface 0 > Ethernet II, Src: SuperMic_19:8e:6e (00:25:90:19:8e:6e), Dst: AsrockIn_84:34:e7 (d0:50:99:84:34:e7) > Internet Protocol Version 4, Src: 172.21.86.74, Dst: 172.21.86.141 > Transmission Control Protocol, Src Port: 2049, Dst Port: 997, Seq: 1621, Ack: 1281, Len: 324 > Remote Procedure Call, Type:Reply XID:0x2d009a8c > Network File System, Ops(5): SEQUENCE PUTFH OPEN ACCESS GETATTR > [Program Version: 4] > [V4 Procedure: COMPOUND (1)] > Status: NFS4_OK (0) > Tag: <EMPTY> > length: 0 > contents: <EMPTY> > Operations (count: 5) > Opcode: SEQUENCE (53) > Status: NFS4_OK (0) > sessionid: f0f40f58bd020000e002000000000000 > seqid: 0x002b318d > slot id: 0 > high slot id: 30 > target high slot id: 30 > status flags: 0x00000040, SEQ4_STATUS_RECALLABLE_STATE_REVOKED > .... .... .... .... .... .... .... ...0 = SEQ4_STATUS_CB_PATH_DOWN: Not set > .... .... .... .... .... .... .... ..0. = SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRING: Not set > .... .... .... .... .... .... .... .0.. = SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRED: Not set > .... .... .... .... .... .... .... 0... = SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED: Not set > .... .... .... .... .... .... ...0 .... 
= SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED: Not set > .... .... .... .... .... .... ..0. .... = SEQ4_STATUS_ADMIN_STATE_REVOKED: Not set > .... .... .... .... .... .... .1.. .... = SEQ4_STATUS_RECALLABLE_STATE_REVOKED: Set > .... .... .... .... .... .... 0... .... = SEQ4_STATUS_LEASE_MOVED: Not set > .... .... .... .... .... ...0 .... .... = SEQ4_STATUS_RESTART_RECLAIM_NEEDED: Not set > .... .... .... .... .... ..0. .... .... = SEQ4_STATUS_CB_PATH_DOWN_SESSION: Not set > .... .... .... .... .... .0.. .... .... = SEQ4_STATUS_BACKCHANNEL_FAULT: Not set > .... .... .... .... .... 0... .... .... = SEQ4_STATUS_DEVID_CHANGED: Not set > .... .... .... .... ...0 .... .... .... = SEQ4_STATUS_DEVID_DELETED: Not set > Opcode: PUTFH (22) > Status: NFS4_OK (0) > Opcode: OPEN (18) > Status: NFS4_OK (0) > stateid > [StateID Hash: 0x1925] > seqid: 0x0015824b > Data: f0f40f58bd02000092000000 > [Data hash (CRC-32): 0xf3e67f06] > change_info > Atomic: No > changeid (before): 0 > changeid (after): 0 > result flags: 0x00000004, locktype posix > .... .... .... .... .... .... .... ..0. = confirm: False > .... .... .... .... .... .... .... .1.. = locktype posix: True > .... .... .... .... .... .... .... 0... = preserve unlinked: False > .... .... .... .... .... .... ..0. .... = may notify lock: False > Delegation Type: OPEN_DELEGATE_NONE (0) > Opcode: ACCESS (3), [Access Denied: XE], [Allowed: RD MD XT] > Status: NFS4_OK (0) > Supported types (of requested): 0x2d > .... ...1 = 0x01 READ: supported > .... .1.. = 0x04 MODIFY: supported > .... 1... = 0x08 EXTEND: supported > ..1. .... = 0x20 EXECUTE: supported > Access rights (of requested): 0x0d > .... ...1 = 0x01 READ: allowed > .... .1.. = 0x04 MODIFY: allowed > .... 1... = 0x08 EXTEND: allowed > ..0. .... 
= 0x20 EXECUTE: *Access Denied* > Opcode: GETATTR (9) > Status: NFS4_OK (0) > Attr mask[0]: 0x0010011a (Type, Change, Size, FSID, FileId) > reqd_attr: Type (1) > ftype4: NF4REG (1) > reqd_attr: Change (3) > changeid: 6353654913397708009 > reqd_attr: Size (4) > size: 32768 > reqd_attr: FSID (8) > fattr4_fsid > fsid4.major: 7704996244782214878 > fsid4.minor: 0 > reco_attr: FileId (20) > fileid: 4737174 > Attr mask[1]: 0x00b0a23a (Mode, NumLinks, Owner, Owner_Group, RawDev, Space_Used, Time_Access, Time_Metadata, Time_Modify, Mounted_on_FileId) > reco_attr: Mode (33) > mode: 0644, Name: Unknown, Read permission for owner, Write permission for owner, Read permission for group, Read permission for others > .... .... .... .... 000. .... .... .... = Name: Unknown (0) > .... .... .... .... .... 0... .... .... = Set user id on exec: No > .... .... .... .... .... .0.. .... .... = Set group id on exec: No > .... .... .... .... .... ..0. .... .... = Save swapped text even after use: No > .... .... .... .... .... ...1 .... .... = Read permission for owner: Yes > .... .... .... .... .... .... 1... .... = Write permission for owner: Yes > .... .... .... .... .... .... .0.. .... = Execute permission for owner: No > .... .... .... .... .... .... ..1. .... = Read permission for group: Yes > .... .... .... .... .... .... ...0 .... = Write permission for group: No > .... .... .... .... .... .... .... 0... = Execute permission for group: No > .... .... .... .... .... .... .... .1.. = Read permission for others: Yes > .... .... .... .... .... .... .... ..0. = Write permission for others: No > .... .... .... .... .... .... .... 
...0 = Execute permission for others: No > reco_attr: NumLinks (35) > numlinks: 1 > reco_attr: Owner (36) > fattr4_owner: 7143 > length: 4 > contents: 7143 > reco_attr: Owner_Group (37) > fattr4_owner_group: 1112 > length: 4 > contents: 1112 > reco_attr: RawDev (41) > specdata1: 0 > specdata2: 0 > reco_attr: Space_Used (45) > space_used: 32768 > reco_attr: Time_Access (47) > seconds: 1479318517 > nseconds: 593359867 > reco_attr: Time_Metadata (52) > seconds: 1479325563 > nseconds: 175920361 > reco_attr: Time_Modify (53) > seconds: 1479325563 > nseconds: 175920361 > reco_attr: Mounted_on_FileId (55) > fileid: 0x0000000000484896 > [Main Opcode: OPEN (18)] > > Frame 9: 322 bytes on wire (2576 bits), 322 bytes captured (2576 bits) on interface 0 > Ethernet II, Src: AsrockIn_84:34:e7 (d0:50:99:84:34:e7), Dst: SuperMic_19:8e:6e (00:25:90:19:8e:6e) > Internet Protocol Version 4, Src: 172.21.86.141, Dst: 172.21.86.74 > Transmission Control Protocol, Src Port: 997, Dst Port: 2049, Seq: 1281, Ack: 1945, Len: 256 > Remote Procedure Call, Type:Call XID:0x2e009a8c > Network File System, Ops(5): SEQUENCE, PUTFH, OPEN, ACCESS, GETATTR > [Program Version: 4] > [V4 Procedure: COMPOUND (1)] > Tag: <EMPTY> > length: 0 > contents: <EMPTY> > minorversion: 1 > Operations (count: 5): SEQUENCE, PUTFH, OPEN, ACCESS, GETATTR > Opcode: SEQUENCE (53) > sessionid: f0f40f58bd020000e002000000000000 > seqid: 0x002b318e > slot id: 0 > high slot id: 0 > cache this?: Yes > Opcode: PUTFH (22) > filehandle > length: 28 > [hash (CRC-32): 0x26ab9d8d] > filehandle: 010006016aeda54305f66ede00000000000000009a424800... > Opcode: OPEN (18) > seqid: 0x00000000 > share_access: OPEN4_SHARE_ACCESS_BOTH (3) > share_deny: OPEN4_SHARE_DENY_NONE (0) > clientid: 0xf0f40f58bd020000 > owner: <DATA> > length: 24 > contents: <DATA> > Open Type: OPEN4_NOCREATE (0) > Claim Type: CLAIM_FH (4) > Opcode: ACCESS (3), [Check: RD MD XT XE] > Check access: 0x2d > .... ...1 = 0x01 READ: allowed? > .... .1.. = 0x04 MODIFY: allowed? 
> .... 1... = 0x08 EXTEND: allowed? > ..1. .... = 0x20 EXECUTE: allowed? > Opcode: GETATTR (9) > Attr mask[0]: 0x0010011a (Type, Change, Size, FSID, FileId) > reqd_attr: Type (1) > reqd_attr: Change (3) > reqd_attr: Size (4) > reqd_attr: FSID (8) > reco_attr: FileId (20) > Attr mask[1]: 0x00b0a23a (Mode, NumLinks, Owner, Owner_Group, RawDev, Space_Used, Time_Access, Time_Metadata, Time_Modify, Mounted_on_FileId) > reco_attr: Mode (33) > reco_attr: NumLinks (35) > reco_attr: Owner (36) > reco_attr: Owner_Group (37) > reco_attr: RawDev (41) > reco_attr: Space_Used (45) > reco_attr: Time_Access (47) > reco_attr: Time_Metadata (52) > reco_attr: Time_Modify (53) > reco_attr: Mounted_on_FileId (55) > [Main Opcode: OPEN (18)] > > Frame 10: 390 bytes on wire (3120 bits), 390 bytes captured (3120 bits) on interface 0 > Ethernet II, Src: SuperMic_19:8e:6e (00:25:90:19:8e:6e), Dst: AsrockIn_84:34:e7 (d0:50:99:84:34:e7) > Internet Protocol Version 4, Src: 172.21.86.74, Dst: 172.21.86.141 > Transmission Control Protocol, Src Port: 2049, Dst Port: 997, Seq: 1945, Ack: 1537, Len: 324 > Remote Procedure Call, Type:Reply XID:0x2e009a8c > Network File System, Ops(5): SEQUENCE PUTFH OPEN ACCESS GETATTR > [Program Version: 4] > [V4 Procedure: COMPOUND (1)] > Status: NFS4_OK (0) > Tag: <EMPTY> > length: 0 > contents: <EMPTY> > Operations (count: 5) > Opcode: SEQUENCE (53) > Status: NFS4_OK (0) > sessionid: f0f40f58bd020000e002000000000000 > seqid: 0x002b318e > slot id: 0 > high slot id: 30 > target high slot id: 30 > status flags: 0x00000040, SEQ4_STATUS_RECALLABLE_STATE_REVOKED > .... .... .... .... .... .... .... ...0 = SEQ4_STATUS_CB_PATH_DOWN: Not set > .... .... .... .... .... .... .... ..0. = SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRING: Not set > .... .... .... .... .... .... .... .0.. = SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRED: Not set > .... .... .... .... .... .... .... 0... = SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED: Not set > .... .... .... .... .... .... ...0 .... 
= SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED: Not set > .... .... .... .... .... .... ..0. .... = SEQ4_STATUS_ADMIN_STATE_REVOKED: Not set > .... .... .... .... .... .... .1.. .... = SEQ4_STATUS_RECALLABLE_STATE_REVOKED: Set > .... .... .... .... .... .... 0... .... = SEQ4_STATUS_LEASE_MOVED: Not set > .... .... .... .... .... ...0 .... .... = SEQ4_STATUS_RESTART_RECLAIM_NEEDED: Not set > .... .... .... .... .... ..0. .... .... = SEQ4_STATUS_CB_PATH_DOWN_SESSION: Not set > .... .... .... .... .... .0.. .... .... = SEQ4_STATUS_BACKCHANNEL_FAULT: Not set > .... .... .... .... .... 0... .... .... = SEQ4_STATUS_DEVID_CHANGED: Not set > .... .... .... .... ...0 .... .... .... = SEQ4_STATUS_DEVID_DELETED: Not set > Opcode: PUTFH (22) > Status: NFS4_OK (0) > Opcode: OPEN (18) > Status: NFS4_OK (0) > stateid > [StateID Hash: 0xec52] > seqid: 0x0015824d > Data: f0f40f58bd0200009e000000 > [Data hash (CRC-32): 0xb930c0be] > change_info > Atomic: No > changeid (before): 0 > changeid (after): 0 > result flags: 0x00000004, locktype posix > .... .... .... .... .... .... .... ..0. = confirm: False > .... .... .... .... .... .... .... .1.. = locktype posix: True > .... .... .... .... .... .... .... 0... = preserve unlinked: False > .... .... .... .... .... .... ..0. .... = may notify lock: False > Delegation Type: OPEN_DELEGATE_NONE (0) > Opcode: ACCESS (3), [Access Denied: XE], [Allowed: RD MD XT] > Status: NFS4_OK (0) > Supported types (of requested): 0x2d > .... ...1 = 0x01 READ: supported > .... .1.. = 0x04 MODIFY: supported > .... 1... = 0x08 EXTEND: supported > ..1. .... = 0x20 EXECUTE: supported > Access rights (of requested): 0x0d > .... ...1 = 0x01 READ: allowed > .... .1.. = 0x04 MODIFY: allowed > .... 1... = 0x08 EXTEND: allowed > ..0. .... 
= 0x20 EXECUTE: *Access Denied* > Opcode: GETATTR (9) > Status: NFS4_OK (0) > Attr mask[0]: 0x0010011a (Type, Change, Size, FSID, FileId) > reqd_attr: Type (1) > ftype4: NF4REG (1) > reqd_attr: Change (3) > changeid: 6310935438380194724 > reqd_attr: Size (4) > size: 24576 > reqd_attr: FSID (8) > fattr4_fsid > fsid4.major: 7704996244782214878 > fsid4.minor: 0 > reco_attr: FileId (20) > fileid: 4735642 > Attr mask[1]: 0x00b0a23a (Mode, NumLinks, Owner, Owner_Group, RawDev, Space_Used, Time_Access, Time_Metadata, Time_Modify, Mounted_on_FileId) > reco_attr: Mode (33) > mode: 0644, Name: Unknown, Read permission for owner, Write permission for owner, Read permission for group, Read permission for others > .... .... .... .... 000. .... .... .... = Name: Unknown (0) > .... .... .... .... .... 0... .... .... = Set user id on exec: No > .... .... .... .... .... .0.. .... .... = Set group id on exec: No > .... .... .... .... .... ..0. .... .... = Save swapped text even after use: No > .... .... .... .... .... ...1 .... .... = Read permission for owner: Yes > .... .... .... .... .... .... 1... .... = Write permission for owner: Yes > .... .... .... .... .... .... .0.. .... = Execute permission for owner: No > .... .... .... .... .... .... ..1. .... = Read permission for group: Yes > .... .... .... .... .... .... ...0 .... = Write permission for group: No > .... .... .... .... .... .... .... 0... = Execute permission for group: No > .... .... .... .... .... .... .... .1.. = Read permission for others: Yes > .... .... .... .... .... .... .... ..0. = Write permission for others: No > .... .... .... .... .... .... .... 
...0 = Execute permission for others: No > reco_attr: NumLinks (35) > numlinks: 1 > reco_attr: Owner (36) > fattr4_owner: 7143 > length: 4 > contents: 7143 > reco_attr: Owner_Group (37) > fattr4_owner_group: 1112 > length: 4 > contents: 1112 > reco_attr: RawDev (41) > specdata1: 0 > specdata2: 0 > reco_attr: Space_Used (45) > space_used: 24576 > reco_attr: Time_Access (47) > seconds: 1479325563 > nseconds: 305919836 > reco_attr: Time_Metadata (52) > seconds: 1469379160 > nseconds: 756243364 > reco_attr: Time_Modify (53) > seconds: 1469379160 > nseconds: 756243364 > reco_attr: Mounted_on_FileId (55) > fileid: 0x000000000048429a > [Main Opcode: OPEN (18)] > > -- > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 27+ messages in thread
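For reference when reading the SEQUENCE replies in the capture above, the status-flags word (0x00000040 here) can be decoded mechanically. A minimal sketch — the helper function is hypothetical, but the bit assignments follow RFC 5661 and match the dissector's bit breakdown above:

```python
# Hypothetical helper (not from this thread): decode an NFSv4.1 SEQUENCE
# status-flags word into the SEQ4_STATUS_* names defined in RFC 5661.
SEQ4_STATUS_FLAGS = {
    0x00000001: "SEQ4_STATUS_CB_PATH_DOWN",
    0x00000002: "SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRING",
    0x00000004: "SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRED",
    0x00000008: "SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED",
    0x00000010: "SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED",
    0x00000020: "SEQ4_STATUS_ADMIN_STATE_REVOKED",
    0x00000040: "SEQ4_STATUS_RECALLABLE_STATE_REVOKED",
    0x00000080: "SEQ4_STATUS_LEASE_MOVED",
    0x00000100: "SEQ4_STATUS_RESTART_RECLAIM_NEEDED",
    0x00000200: "SEQ4_STATUS_CB_PATH_DOWN_SESSION",
    0x00000400: "SEQ4_STATUS_BACKCHANNEL_FAULT",
    0x00000800: "SEQ4_STATUS_DEVID_CHANGED",
    0x00001000: "SEQ4_STATUS_DEVID_DELETED",
}

def decode_seq4_status(word):
    """Return the names of the bits set in a SEQUENCE status-flags word."""
    return [name for bit, name in sorted(SEQ4_STATUS_FLAGS.items()) if word & bit]

# The value seen in every reply in this capture:
print(decode_seq4_status(0x00000040))  # -> ['SEQ4_STATUS_RECALLABLE_STATE_REVOKED']
```

That single set bit is the detail Bruce calls out later in the thread: the server is telling the client on every SEQUENCE reply that recallable state (a delegation) has been revoked.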
* Re: NFS: nfs4_reclaim_open_state: Lock reclaim failed! log spew 2016-11-17 16:31 ` J. Bruce Fields @ 2016-11-17 17:08 ` Jason L Tibbitts III 2016-11-17 20:22 ` Andrew W Elble 2016-11-17 17:45 ` Trond Myklebust 1 sibling, 1 reply; 27+ messages in thread From: Jason L Tibbitts III @ 2016-11-17 17:08 UTC (permalink / raw) To: J. Bruce Fields; +Cc: linux-nfs >>>>> "JBF" == J Bruce Fields <bfields@fieldses.org> writes: JBF> So, you're using NFSv4.1 or 4.2, and the server thinks that the JBF> client has reused a (slot, sequence number) pair, but the server JBF> doesn't have a cached response to return. Thanks for the reply. Sadly I don't understand all of it, but... JBF> Hard to know how that happened, and it's not shown in the below. JBF> Sounds like a bug, though. Yeah, I only found the problem after it was already happening, so obviously the beginning of the process is missing. And sadly it's not something I can easily repeat, so short of running some continuous packet capture (which would be hard since once this starts the traffic volume is huge) there's no easy way to see it. Is there any state on either the client or server that I could inspect which might give any hints? I can add that to my notes in case this problem happens again. 
There are some automount complaints about the user accessing directories that aren't in the map sources, and the usual random gssproxy noise, which was fixed in Fedora 24. Currently the system is stable; it hasn't been rebooted since the problem occurred. Everything cleared up once I was able to unmount the problematic filesystem. - J< ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: NFS: nfs4_reclaim_open_state: Lock reclaim failed! log spew 2016-11-17 17:08 ` Jason L Tibbitts III @ 2016-11-17 20:22 ` Andrew W Elble 0 siblings, 0 replies; 27+ messages in thread From: Andrew W Elble @ 2016-11-17 20:22 UTC (permalink / raw) To: Jason L Tibbitts III; +Cc: J. Bruce Fields, linux-nfs I've found this extremely useful on clients in tracking down 'lost' delegations. echo "error != 0" | tee /sys/kernel/debug/tracing/events/nfs4/nfs4_delegreturn_exit/filter ...and then look in here: cat /sys/kernel/debug/tracing/trace (YMMV, not sure if this is going to work on your distro, debugfs etc) There's still work to be done with nfsd4_delegreturn() and revoked delegations server side (as well as killing fh_verify() per Bruce's earlier suggestions) We've recently seen the server recall a delegation, revoke it, and then have the client try to return it much later (because of an unknown slowness issue) -- after the file had been deleted at the server. Jason L Tibbitts III <tibbs@math.uh.edu> writes: >>>>>> "JBF" == J Bruce Fields <bfields@fieldses.org> writes: > > JBF> So, you're using NFSv4.1 or 4.2, and the server thinks that the > JBF> client has reused a (slot, sequence number) pair, but the server > JBF> doesn't have a cached response to return. > > Thanks for the reply. Sadly I don't understand all of it, but... > > JBF> Hard to know how that happened, and it's not shown in the below. > JBF> Sounds like a bug, though. > > Yeah, I only found the problem after it was already happening, so > obviously the beginning of the process is missing. And sadly it's not > something I can easily repeat, so short of running some continuous > packet capture (which would be hard since once this starts the traffic > volume is huge) there's no easy way to see it. > > Is there any state on either the client or server that I could inspect > which might give any hints? I can add that to my notes in case this > problem happens again. 
> > JBF> Recent clients will use sec=krb5 for certain state-related > JBF> operations even if you mount with sec=sys, so it's still possible > JBF> it could be involved here. > > On the server, the involved filesystem isn't exported with any sec= > options, in case it matters. > > JBF> The SEQ4_STATUS_RECALLABLE_STATE_REVOKED flag set in the OPEN > JBF> replies is also a sign something's gone wrong. Apparently the > JBF> server thinks the client has failed to return a delegation. > > I can't imagine how that might have happened. There is nothing else > NFS-related in the client's log besides the spew and that final line. > There are some automount complaints about the user accessing directories > that aren't in the map sources, and the usual random gssproxy noise > which was fixed in Fedora 24. > > Currently the system is stable; it hasn't been rebooted since the > problem occurred. Everything cleared up once I was able to unmounted > the problematic filesystem. > > - J< > -- > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- Andrew W. Elble aweits@discipline.rit.edu Infrastructure Engineer, Communications Technical Lead Rochester Institute of Technology PGP: BFAD 8461 4CCF DC95 DA2C B0EB 965B 082E 863E C912 ^ permalink raw reply [flat|nested] 27+ messages in thread
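Andrew's ftrace recipe above can be wrapped in a small helper. A minimal sketch, assuming the standard tracefs layout under /sys/kernel/debug/tracing that his commands use — the function names and the tracefs parameter are mine, not from the thread, and arming the real tracepoint needs root and a mounted debugfs/tracefs:

```python
from pathlib import Path

def filter_delegreturn_errors(tracefs="/sys/kernel/debug/tracing"):
    """Arm the nfs4_delegreturn_exit tracepoint so that only failing
    delegation returns (error != 0) land in the trace buffer.

    Equivalent to Andrew's:
      echo "error != 0" > .../events/nfs4/nfs4_delegreturn_exit/filter
    plus enabling the event.
    """
    event = Path(tracefs) / "events/nfs4/nfs4_delegreturn_exit"
    (event / "filter").write_text("error != 0\n")
    (event / "enable").write_text("1\n")

def read_trace(tracefs="/sys/kernel/debug/tracing"):
    """Dump the trace buffer, i.e. `cat .../trace`."""
    return (Path(tracefs) / "trace").read_text()
```

The tracefs parameter exists mainly so the logic can be exercised against a scratch directory; on a real client you would leave it at the default and run as root.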
* Re: NFS: nfs4_reclaim_open_state: Lock reclaim failed! log spew 2016-11-17 16:31 ` J. Bruce Fields 2016-11-17 17:08 ` Jason L Tibbitts III @ 2016-11-17 17:45 ` Trond Myklebust 2016-11-17 19:32 ` bfields 1 sibling, 1 reply; 27+ messages in thread From: Trond Myklebust @ 2016-11-17 17:45 UTC (permalink / raw) To: bfields, tibbs; +Cc: linux-nfs On Thu, 2016-11-17 at 11:31 -0500, J. Bruce Fields wrote: > On Wed, Nov 16, 2016 at 02:55:05PM -0600, Jason L Tibbitts III wrote: > > I'm replying to a rather old message, but the issue has just now > > popped > > back up again. > > To recap, a client stops being able to access _any_ mount on a > > particular server, and "NFS: nfs4_reclaim_open_state: Lock reclaim > > failed!" appears several hundred times per second in the kernel > > log. > > The load goes up by one for every process attempting to access any > > mount > > from that particular server. Mounts to other servers are fine, and > > other clients can mount things from that one server without > > problems. > > When I kill every process keeping that particular mount active and > > then > > umount it, I see: > > NFS: nfs4_reclaim_open_state: unhandled error -10068 > NFS4ERR_RETRY_UNCACHED_REP. > So, you're using NFSv4.1 or 4.2, and the server thinks that the > client > has reused a (slot, sequence number) pair, but the server doesn't > have a > cached response to return. > Hard to know how that happened, and it's not shown in the below. > Sounds like a bug, though. ...or a Ctrl-C.... ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: NFS: nfs4_reclaim_open_state: Lock reclaim failed! log spew 2016-11-17 17:45 ` Trond Myklebust @ 2016-11-17 19:32 ` bfields 2016-11-17 19:58 ` Olga Kornievskaia 0 siblings, 1 reply; 27+ messages in thread From: bfields @ 2016-11-17 19:32 UTC (permalink / raw) To: Trond Myklebust; +Cc: tibbs, linux-nfs On Thu, Nov 17, 2016 at 05:45:52PM +0000, Trond Myklebust wrote: > On Thu, 2016-11-17 at 11:31 -0500, J. Bruce Fields wrote: > > On Wed, Nov 16, 2016 at 02:55:05PM -0600, Jason L Tibbitts III wrote: > > > > > > I'm replying to a rather old message, but the issue has just now > > > popped > > > back up again. > > > > > > To recap, a client stops being able to access _any_ mount on a > > > particular server, and "NFS: nfs4_reclaim_open_state: Lock reclaim > > > failed!" appears several hundred times per second in the kernel > > > log. > > > The load goes up by one for ever process attempting to access any > > > mount > > > from that particular server. Mounts to other servers are fine, and > > > other clients can mount things from that one server without > > > problems. > > > > > > When I kill every process keeping that particular mount active and > > > then > > > umount it, I see: > > > > > > NFS: nfs4_reclaim_open_state: unhandled error -10068 > > > > NFS4ERR_RETRY_UNCACHED_REP. > > > > So, you're using NFSv4.1 or 4.2, and the server thinks that the > > client > > has reused a (slot, sequence number) pair, but the server doesn't > > have a > > cached response to return. > > > > Hard to know how that happened, and it's not shown in the below. > > Sounds like a bug, though. > > ...or a Ctrl-C.... How does that happen? --b. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: NFS: nfs4_reclaim_open_state: Lock reclaim failed! log spew 2016-11-17 19:32 ` bfields @ 2016-11-17 19:58 ` Olga Kornievskaia 2016-11-17 20:17 ` bfields 0 siblings, 1 reply; 27+ messages in thread From: Olga Kornievskaia @ 2016-11-17 19:58 UTC (permalink / raw) To: bfields; +Cc: Trond Myklebust, tibbs, linux-nfs On Thu, Nov 17, 2016 at 2:32 PM, bfields@fieldses.org <bfields@fieldses.org> wrote: > On Thu, Nov 17, 2016 at 05:45:52PM +0000, Trond Myklebust wrote: >> On Thu, 2016-11-17 at 11:31 -0500, J. Bruce Fields wrote: >> > On Wed, Nov 16, 2016 at 02:55:05PM -0600, Jason L Tibbitts III wrote: >> > > >> > > I'm replying to a rather old message, but the issue has just now >> > > popped >> > > back up again. >> > > >> > > To recap, a client stops being able to access _any_ mount on a >> > > particular server, and "NFS: nfs4_reclaim_open_state: Lock reclaim >> > > failed!" appears several hundred times per second in the kernel >> > > log. >> > > The load goes up by one for ever process attempting to access any >> > > mount >> > > from that particular server. Mounts to other servers are fine, and >> > > other clients can mount things from that one server without >> > > problems. >> > > >> > > When I kill every process keeping that particular mount active and >> > > then >> > > umount it, I see: >> > > >> > > NFS: nfs4_reclaim_open_state: unhandled error -10068 >> > >> > NFS4ERR_RETRY_UNCACHED_REP. >> > >> > So, you're using NFSv4.1 or 4.2, and the server thinks that the >> > client >> > has reused a (slot, sequence number) pair, but the server doesn't >> > have a >> > cached response to return. >> > >> > Hard to know how that happened, and it's not shown in the below. >> > Sounds like a bug, though. >> >> ...or a Ctrl-C.... > > How does that happen? > If I may chime in... Bruce, when an application sends a Ctrl-C and the client's session slot has sent out an RPC but didn't process the reply, the client doesn't know if the server processed that sequence id or not. 
In that case, the client doesn't increment the sequence number. Normally the client would handle getting such an error by retrying again (and resetting the slots) but I think during recovery operations the client handles errors differently (by just erroring). I believe the reasoning is that we don't want to be stuck trying to recover from the recovery from the recovery etc... Jason, The UNCACHED_REP error is really not interesting as it's a consequence of you having a client that already failed with an error of "unable to reclaim the locks". I'm surprised that the application doesn't error at this point with EIO. But that aside, I think I've seen this kind of behavior due to the client's callback channel going down (and not replying to the CB_RECALLs, and then the server revoking state). > --b. ^ permalink raw reply [flat|nested] 27+ messages in thread
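The slot bookkeeping described above can be illustrated with a toy model: an interrupted RPC leaves the slot's sequence id un-bumped, so the next request on that slot looks like a retry to the server, and if the original reply wasn't cached the server returns NFS4ERR_RETRY_UNCACHED_REP. This is a rough sketch only; all names and structures are invented for illustration and are not the kernel's actual code:

```python
# Toy model of one NFSv4.1 session slot plus the server's per-slot
# reply cache.  Purely illustrative; not the kernel's data structures.

NFS4ERR_RETRY_UNCACHED_REP = -10068  # the error from the log above

class Slot:
    """Client-side view of a session slot."""
    def __init__(self):
        self.seqid = 1

class Server:
    """Remembers, per slot, the last seqid seen and (maybe) its reply."""
    def __init__(self):
        self.cache = {}  # slot id -> (seqid, cached reply or None)

    def sequence(self, slot_id, seqid, op, cache_this=True):
        entry = self.cache.get(slot_id)
        if entry and entry[0] == seqid:
            # Same (slot, seqid) again: the server treats it as a retry.
            cached_reply = entry[1]
            if cached_reply is None:
                return NFS4ERR_RETRY_UNCACHED_REP
            return cached_reply
        reply = "reply(%s)" % op
        self.cache[slot_id] = (seqid, reply if cache_this else None)
        return reply

slot, server = Slot(), Server()

# An RPC goes out asking the server not to cache the reply, but the
# waiting process is killed (Ctrl-C) before the reply is processed.
# The client cannot know whether the server consumed the seqid, so it
# leaves slot.seqid unchanged.
server.sequence(0, slot.seqid, "CLOSE", cache_this=False)

# The next request reuses (slot 0, seqid 1); the server sees a "retry"
# for which it cached nothing.
result = server.sequence(0, slot.seqid, "LOCK")
print(result)  # -10068
```

In the real protocol the server also distinguishes seqid+1 (a new request) from other values (NFS4ERR_SEQ_MISORDERED); the toy only models the exact-reuse case at issue here.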
* Re: NFS: nfs4_reclaim_open_state: Lock reclaim failed! log spew 2016-11-17 19:58 ` Olga Kornievskaia @ 2016-11-17 20:17 ` bfields 2016-11-17 20:29 ` Olga Kornievskaia 0 siblings, 1 reply; 27+ messages in thread From: bfields @ 2016-11-17 20:17 UTC (permalink / raw) To: Olga Kornievskaia; +Cc: Trond Myklebust, tibbs, linux-nfs On Thu, Nov 17, 2016 at 02:58:12PM -0500, Olga Kornievskaia wrote: > On Thu, Nov 17, 2016 at 2:32 PM, bfields@fieldses.org > <bfields@fieldses.org> wrote: > > On Thu, Nov 17, 2016 at 05:45:52PM +0000, Trond Myklebust wrote: > >> On Thu, 2016-11-17 at 11:31 -0500, J. Bruce Fields wrote: > >> > On Wed, Nov 16, 2016 at 02:55:05PM -0600, Jason L Tibbitts III wrote: > >> > > > >> > > I'm replying to a rather old message, but the issue has just now > >> > > popped > >> > > back up again. > >> > > > >> > > To recap, a client stops being able to access _any_ mount on a > >> > > particular server, and "NFS: nfs4_reclaim_open_state: Lock reclaim > >> > > failed!" appears several hundred times per second in the kernel > >> > > log. > >> > > The load goes up by one for ever process attempting to access any > >> > > mount > >> > > from that particular server. Mounts to other servers are fine, and > >> > > other clients can mount things from that one server without > >> > > problems. > >> > > > >> > > When I kill every process keeping that particular mount active and > >> > > then > >> > > umount it, I see: > >> > > > >> > > NFS: nfs4_reclaim_open_state: unhandled error -10068 > >> > > >> > NFS4ERR_RETRY_UNCACHED_REP. > >> > > >> > So, you're using NFSv4.1 or 4.2, and the server thinks that the > >> > client > >> > has reused a (slot, sequence number) pair, but the server doesn't > >> > have a > >> > cached response to return. > >> > > >> > Hard to know how that happened, and it's not shown in the below. > >> > Sounds like a bug, though. > >> > >> ...or a Ctrl-C.... > > > > How does that happen? > > > > If I may chime in... 
> > Bruce, when an application sends a Ctrl-C and clients's session slot > has sent out an RPC but didn't process the reply, the client doesn't > know if the server processed that sequence id or not. In that case, > the client doesn't increment the sequence number. Normally the client > would handle getting such an error by retrying again (and resetting > the slots) but I think during recovery operation the client handles > errors differently (by just erroring). I believe the reasoning that we > don't want to be stuck trying to recover from the recovery from the > recovery etc... So in that case the client can end up sending a different rpc reusing the old slot and sequence number? --b. > > Jason, > > The UNCACHED_REP error is really not interesting as it's a consequence > of you having a client that already failed with an error of "unable to > reclaim the locks". I'm surprised that the application doesn't error > at this point with EIO. But that aside, I think I've seen this kind of > behavior due to client't callback channel going down (and not replying > to the CB_RECALLs and then server revoking state). > > > > > --b. > > -- > > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in > > the body of a message to majordomo@vger.kernel.org > > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: NFS: nfs4_reclaim_open_state: Lock reclaim failed! log spew 2016-11-17 20:17 ` bfields @ 2016-11-17 20:29 ` Olga Kornievskaia 2016-11-17 20:46 ` bfields 0 siblings, 1 reply; 27+ messages in thread From: Olga Kornievskaia @ 2016-11-17 20:29 UTC (permalink / raw) To: bfields; +Cc: Trond Myklebust, tibbs, linux-nfs On Thu, Nov 17, 2016 at 3:17 PM, bfields@fieldses.org <bfields@fieldses.org> wrote: > On Thu, Nov 17, 2016 at 02:58:12PM -0500, Olga Kornievskaia wrote: >> On Thu, Nov 17, 2016 at 2:32 PM, bfields@fieldses.org >> <bfields@fieldses.org> wrote: >> > On Thu, Nov 17, 2016 at 05:45:52PM +0000, Trond Myklebust wrote: >> >> On Thu, 2016-11-17 at 11:31 -0500, J. Bruce Fields wrote: >> >> > On Wed, Nov 16, 2016 at 02:55:05PM -0600, Jason L Tibbitts III wrote: >> >> > > >> >> > > I'm replying to a rather old message, but the issue has just now >> >> > > popped >> >> > > back up again. >> >> > > >> >> > > To recap, a client stops being able to access _any_ mount on a >> >> > > particular server, and "NFS: nfs4_reclaim_open_state: Lock reclaim >> >> > > failed!" appears several hundred times per second in the kernel >> >> > > log. >> >> > > The load goes up by one for ever process attempting to access any >> >> > > mount >> >> > > from that particular server. Mounts to other servers are fine, and >> >> > > other clients can mount things from that one server without >> >> > > problems. >> >> > > >> >> > > When I kill every process keeping that particular mount active and >> >> > > then >> >> > > umount it, I see: >> >> > > >> >> > > NFS: nfs4_reclaim_open_state: unhandled error -10068 >> >> > >> >> > NFS4ERR_RETRY_UNCACHED_REP. >> >> > >> >> > So, you're using NFSv4.1 or 4.2, and the server thinks that the >> >> > client >> >> > has reused a (slot, sequence number) pair, but the server doesn't >> >> > have a >> >> > cached response to return. >> >> > >> >> > Hard to know how that happened, and it's not shown in the below. >> >> > Sounds like a bug, though. 
>> >> >> >> ...or a Ctrl-C.... >> > >> > How does that happen? >> > >> >> If I may chime in... >> >> Bruce, when an application sends a Ctrl-C and clients's session slot >> has sent out an RPC but didn't process the reply, the client doesn't >> know if the server processed that sequence id or not. In that case, >> the client doesn't increment the sequence number. Normally the client >> would handle getting such an error by retrying again (and resetting >> the slots) but I think during recovery operation the client handles >> errors differently (by just erroring). I believe the reasoning that we >> don't want to be stuck trying to recover from the recovery from the >> recovery etc... > > So in that case the client can end up sending a different rpc reusing > the old slot and sequence number? Correct. >> >> Jason, >> >> The UNCACHED_REP error is really not interesting as it's a consequence >> of you having a client that already failed with an error of "unable to >> reclaim the locks". I'm surprised that the application doesn't error >> at this point with EIO. But that aside, I think I've seen this kind of >> behavior due to client't callback channel going down (and not replying >> to the CB_RECALLs and then server revoking state). >> >> >> >> > --b. >> > -- >> > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in >> > the body of a message to majordomo@vger.kernel.org >> > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: NFS: nfs4_reclaim_open_state: Lock reclaim failed! log spew 2016-11-17 20:29 ` Olga Kornievskaia @ 2016-11-17 20:46 ` bfields 2016-11-17 21:05 ` Olga Kornievskaia 0 siblings, 1 reply; 27+ messages in thread From: bfields @ 2016-11-17 20:46 UTC (permalink / raw) To: Olga Kornievskaia; +Cc: Trond Myklebust, tibbs, linux-nfs On Thu, Nov 17, 2016 at 03:29:11PM -0500, Olga Kornievskaia wrote: > On Thu, Nov 17, 2016 at 3:17 PM, bfields@fieldses.org > <bfields@fieldses.org> wrote: > > On Thu, Nov 17, 2016 at 02:58:12PM -0500, Olga Kornievskaia wrote: > >> On Thu, Nov 17, 2016 at 2:32 PM, bfields@fieldses.org > >> <bfields@fieldses.org> wrote: > >> > On Thu, Nov 17, 2016 at 05:45:52PM +0000, Trond Myklebust wrote: > >> >> On Thu, 2016-11-17 at 11:31 -0500, J. Bruce Fields wrote: > >> >> > On Wed, Nov 16, 2016 at 02:55:05PM -0600, Jason L Tibbitts III wrote: > >> >> > > > >> >> > > I'm replying to a rather old message, but the issue has just now > >> >> > > popped > >> >> > > back up again. > >> >> > > > >> >> > > To recap, a client stops being able to access _any_ mount on a > >> >> > > particular server, and "NFS: nfs4_reclaim_open_state: Lock reclaim > >> >> > > failed!" appears several hundred times per second in the kernel > >> >> > > log. > >> >> > > The load goes up by one for ever process attempting to access any > >> >> > > mount > >> >> > > from that particular server. Mounts to other servers are fine, and > >> >> > > other clients can mount things from that one server without > >> >> > > problems. > >> >> > > > >> >> > > When I kill every process keeping that particular mount active and > >> >> > > then > >> >> > > umount it, I see: > >> >> > > > >> >> > > NFS: nfs4_reclaim_open_state: unhandled error -10068 > >> >> > > >> >> > NFS4ERR_RETRY_UNCACHED_REP. 
> >> >> > > >> >> > So, you're using NFSv4.1 or 4.2, and the server thinks that the > >> >> > client > >> >> > has reused a (slot, sequence number) pair, but the server doesn't > >> >> > have a > >> >> > cached response to return. > >> >> > > >> >> > Hard to know how that happened, and it's not shown in the below. > >> >> > Sounds like a bug, though. > >> >> > >> >> ...or a Ctrl-C.... > >> > > >> > How does that happen? > >> > > >> > >> If I may chime in... > >> > >> Bruce, when an application sends a Ctrl-C and clients's session slot > >> has sent out an RPC but didn't process the reply, the client doesn't > >> know if the server processed that sequence id or not. In that case, > >> the client doesn't increment the sequence number. Normally the client > >> would handle getting such an error by retrying again (and resetting > >> the slots) but I think during recovery operation the client handles > >> errors differently (by just erroring). I believe the reasoning that we > >> don't want to be stuck trying to recover from the recovery from the > >> recovery etc... > > > > So in that case the client can end up sending a different rpc reusing > > the old slot and sequence number? > > Correct. So that could get UNCACHED_REP as the response. But if you're very unlucky, couldn't this also happen?: 1) the compound previously sent on that slot was processed by the server and cached 2) the compound you're sending now happens to have the same set of operations with the result that the client doesn't detect that the reply was actually to some other rpc, and instead it returns bad data to the application? --b. > > >> > >> Jason, > >> > >> The UNCACHED_REP error is really not interesting as it's a consequence > >> of you having a client that already failed with an error of "unable to > >> reclaim the locks". I'm surprised that the application doesn't error > >> at this point with EIO. 
But that aside, I think I've seen this kind of > >> behavior due to client't callback channel going down (and not replying > >> to the CB_RECALLs and then server revoking state). > >> > >> > >> > >> > --b. > >> > -- > >> > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in > >> > the body of a message to majordomo@vger.kernel.org > >> > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: NFS: nfs4_reclaim_open_state: Lock reclaim failed! log spew 2016-11-17 20:46 ` bfields @ 2016-11-17 21:05 ` Olga Kornievskaia 2016-11-17 21:26 ` bfields 0 siblings, 1 reply; 27+ messages in thread From: Olga Kornievskaia @ 2016-11-17 21:05 UTC (permalink / raw) To: bfields; +Cc: Trond Myklebust, tibbs, linux-nfs On Thu, Nov 17, 2016 at 3:46 PM, bfields@fieldses.org <bfields@fieldses.org> wrote: > On Thu, Nov 17, 2016 at 03:29:11PM -0500, Olga Kornievskaia wrote: >> On Thu, Nov 17, 2016 at 3:17 PM, bfields@fieldses.org >> <bfields@fieldses.org> wrote: >> > On Thu, Nov 17, 2016 at 02:58:12PM -0500, Olga Kornievskaia wrote: >> >> On Thu, Nov 17, 2016 at 2:32 PM, bfields@fieldses.org >> >> <bfields@fieldses.org> wrote: >> >> > On Thu, Nov 17, 2016 at 05:45:52PM +0000, Trond Myklebust wrote: >> >> >> On Thu, 2016-11-17 at 11:31 -0500, J. Bruce Fields wrote: >> >> >> > On Wed, Nov 16, 2016 at 02:55:05PM -0600, Jason L Tibbitts III wrote: >> >> >> > > >> >> >> > > I'm replying to a rather old message, but the issue has just now >> >> >> > > popped >> >> >> > > back up again. >> >> >> > > >> >> >> > > To recap, a client stops being able to access _any_ mount on a >> >> >> > > particular server, and "NFS: nfs4_reclaim_open_state: Lock reclaim >> >> >> > > failed!" appears several hundred times per second in the kernel >> >> >> > > log. >> >> >> > > The load goes up by one for ever process attempting to access any >> >> >> > > mount >> >> >> > > from that particular server. Mounts to other servers are fine, and >> >> >> > > other clients can mount things from that one server without >> >> >> > > problems. >> >> >> > > >> >> >> > > When I kill every process keeping that particular mount active and >> >> >> > > then >> >> >> > > umount it, I see: >> >> >> > > >> >> >> > > NFS: nfs4_reclaim_open_state: unhandled error -10068 >> >> >> > >> >> >> > NFS4ERR_RETRY_UNCACHED_REP. 
>> >> >> > >> >> >> > So, you're using NFSv4.1 or 4.2, and the server thinks that the >> >> >> > client >> >> >> > has reused a (slot, sequence number) pair, but the server doesn't >> >> >> > have a >> >> >> > cached response to return. >> >> >> > >> >> >> > Hard to know how that happened, and it's not shown in the below. >> >> >> > Sounds like a bug, though. >> >> >> >> >> >> ...or a Ctrl-C.... >> >> > >> >> > How does that happen? >> >> > >> >> >> >> If I may chime in... >> >> >> >> Bruce, when an application sends a Ctrl-C and clients's session slot >> >> has sent out an RPC but didn't process the reply, the client doesn't >> >> know if the server processed that sequence id or not. In that case, >> >> the client doesn't increment the sequence number. Normally the client >> >> would handle getting such an error by retrying again (and resetting >> >> the slots) but I think during recovery operation the client handles >> >> errors differently (by just erroring). I believe the reasoning that we >> >> don't want to be stuck trying to recover from the recovery from the >> >> recovery etc... >> > >> > So in that case the client can end up sending a different rpc reusing >> > the old slot and sequence number? >> >> Correct. > > So that could get UNCACHED_REP as the response. But if you're very > unlucky, couldn't this also happen?: > > 1) the compound previously sent on that slot was processed by > the server and cached > 2) the compound you're sending now happens to have the same set > of operations > > with the result that the client doesn't detect that the reply was > actually to some other rpc, and instead it returns bad data to the > application? If you are sending exactly the same operations and arguments, then why would a reply from the cache lead to bad data? > > --b.
> >> >> >> >> >> Jason, >> >> >> >> The UNCACHED_REP error is really not interesting as it's a consequence >> >> of you having a client that already failed with an error of "unable to >> >> reclaim the locks". I'm surprised that the application doesn't error >> >> at this point with EIO. But that aside, I think I've seen this kind of >> >> behavior due to client't callback channel going down (and not replying >> >> to the CB_RECALLs and then server revoking state). >> >> >> >> >> >> >> >> > --b. >> >> > -- >> >> > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in >> >> > the body of a message to majordomo@vger.kernel.org >> >> > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: NFS: nfs4_reclaim_open_state: Lock reclaim failed! log spew 2016-11-17 21:05 ` Olga Kornievskaia @ 2016-11-17 21:26 ` bfields 2016-11-17 21:45 ` Trond Myklebust 0 siblings, 1 reply; 27+ messages in thread From: bfields @ 2016-11-17 21:26 UTC (permalink / raw) To: Olga Kornievskaia; +Cc: Trond Myklebust, tibbs, linux-nfs On Thu, Nov 17, 2016 at 04:05:32PM -0500, Olga Kornievskaia wrote: > On Thu, Nov 17, 2016 at 3:46 PM, bfields@fieldses.org > <bfields@fieldses.org> wrote: > > On Thu, Nov 17, 2016 at 03:29:11PM -0500, Olga Kornievskaia wrote: > >> On Thu, Nov 17, 2016 at 3:17 PM, bfields@fieldses.org > >> <bfields@fieldses.org> wrote: > >> > On Thu, Nov 17, 2016 at 02:58:12PM -0500, Olga Kornievskaia wrote: > >> >> On Thu, Nov 17, 2016 at 2:32 PM, bfields@fieldses.org > >> >> <bfields@fieldses.org> wrote: > >> >> > On Thu, Nov 17, 2016 at 05:45:52PM +0000, Trond Myklebust wrote: > >> >> >> On Thu, 2016-11-17 at 11:31 -0500, J. Bruce Fields wrote: > >> >> >> > On Wed, Nov 16, 2016 at 02:55:05PM -0600, Jason L Tibbitts III wrote: > >> >> >> > > > >> >> >> > > I'm replying to a rather old message, but the issue has just now > >> >> >> > > popped > >> >> >> > > back up again. > >> >> >> > > > >> >> >> > > To recap, a client stops being able to access _any_ mount on a > >> >> >> > > particular server, and "NFS: nfs4_reclaim_open_state: Lock reclaim > >> >> >> > > failed!" appears several hundred times per second in the kernel > >> >> >> > > log. > >> >> >> > > The load goes up by one for ever process attempting to access any > >> >> >> > > mount > >> >> >> > > from that particular server. Mounts to other servers are fine, and > >> >> >> > > other clients can mount things from that one server without > >> >> >> > > problems. 
> >> >> >> > > > >> >> >> > > When I kill every process keeping that particular mount active and > >> >> >> > > then > >> >> >> > > umount it, I see: > >> >> >> > > > >> >> >> > > NFS: nfs4_reclaim_open_state: unhandled error -10068 > >> >> >> > > >> >> >> > NFS4ERR_RETRY_UNCACHED_REP. > >> >> >> > > >> >> >> > So, you're using NFSv4.1 or 4.2, and the server thinks that the > >> >> >> > client > >> >> >> > has reused a (slot, sequence number) pair, but the server doesn't > >> >> >> > have a > >> >> >> > cached response to return. > >> >> >> > > >> >> >> > Hard to know how that happened, and it's not shown in the below. > >> >> >> > Sounds like a bug, though. > >> >> >> > >> >> >> ...or a Ctrl-C.... > >> >> > > >> >> > How does that happen? > >> >> > > >> >> > >> >> If I may chime in... > >> >> > >> >> Bruce, when an application sends a Ctrl-C and clients's session slot > >> >> has sent out an RPC but didn't process the reply, the client doesn't > >> >> know if the server processed that sequence id or not. In that case, > >> >> the client doesn't increment the sequence number. Normally the client > >> >> would handle getting such an error by retrying again (and resetting > >> >> the slots) but I think during recovery operation the client handles > >> >> errors differently (by just erroring). I believe the reasoning that we > >> >> don't want to be stuck trying to recover from the recovery from the > >> >> recovery etc... > >> > > >> > So in that case the client can end up sending a different rpc reusing > >> > the old slot and sequence number? > >> > >> Correct. > > > > So that could get UNCACHED_REP as the response. 
But if you're very > > unlucky, couldn't this also happen?: > > > > 1) the compound previously sent on that slot was processed by > > the server and cached > > 2) the compound you're sending now happens to have the same set > > of operations > > > > with the result that the client doesn't detect that the reply was > > actually to some other rpc, and instead it returns bad data to the > > application? > > If you are sending exactly the same operations and arguments, then why > is a reply from the cache would lead to bad data? That would probably be fine, I was wondering what would happen if you sent the same operation but different arguments. So the original cancelled operation is something like PUTFH(fh1)+OPEN("foo")+GETFH, and the new one is PUTFH(fh2)+OPEN("bar")+GETFH. In theory couldn't the second one succeed and leave the client thinking it had opened (fh2, bar) when the filehandle it got back was really for (fh1, foo)? --b. ^ permalink raw reply [flat|nested] 27+ messages in thread
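The hazard Bruce describes above (same operations, different arguments, on a reused slot and seqid) can be sketched with a toy reply cache that also remembers which request produced each cached reply; a server built that way detects the mismatch instead of replaying the wrong answer. This is a hypothetical model, not nfsd's code; RFC 5661 defines NFS4ERR_SEQ_FALSE_RETRY for exactly this situation:

```python
# Toy server reply cache that notices a "false retry": a reused
# (slot, seqid) whose COMPOUND differs from the one it cached.
# Hypothetical sketch only; real servers are not required to hash
# requests, and this is not nfsd's implementation.
import hashlib

def digest(compound):
    return hashlib.sha256(repr(compound).encode()).hexdigest()

class ReplyCache:
    def __init__(self):
        self.entries = {}  # (slot, seqid) -> (request digest, reply)

    def handle(self, slot, seqid, compound):
        key = (slot, seqid)
        if key in self.entries:
            cached_digest, cached_reply = self.entries[key]
            if cached_digest != digest(compound):
                # Same slot+seqid but a different request: reject it
                # rather than replay the wrong cached answer.
                return "NFS4ERR_SEQ_FALSE_RETRY"
            return cached_reply  # genuine retransmission: replay
        reply = "fh-for-" + compound[3]  # toy: name passed to OPEN
        self.entries[key] = (digest(compound), reply)
        return reply

cache = ReplyCache()
# Original, later-cancelled call: PUTFH(fh1)+OPEN("foo")+GETFH
first = cache.handle(0, 7, ("PUTFH", "fh1", "OPEN", "foo", "GETFH"))
# New RPC reusing the same slot and seqid with different arguments:
second = cache.handle(0, 7, ("PUTFH", "fh2", "OPEN", "bar", "GETFH"))
print(first, second)  # fh-for-foo NFS4ERR_SEQ_FALSE_RETRY
```

A server that cached only the reply, with no record of the request, could not tell the two apart and would replay fh1/"foo"'s result for the fh2/"bar" open, which is the bad-data case raised here.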
* Re: NFS: nfs4_reclaim_open_state: Lock reclaim failed! log spew 2016-11-17 21:26 ` bfields @ 2016-11-17 21:45 ` Trond Myklebust 2016-11-17 21:53 ` Olga Kornievskaia 0 siblings, 1 reply; 27+ messages in thread From: Trond Myklebust @ 2016-11-17 21:45 UTC (permalink / raw) To: bfields, aglo; +Cc: tibbs, linux-nfs On Thu, 2016-11-17 at 16:26 -0500, bfields@fieldses.org wrote: > That would probably be fine, I was wondering what would happen if you sent the same operation but different arguments. > So the original cancelled operation is something like PUTFH(fh1)+OPEN("foo")+GETFH, and the new one is PUTFH(fh2)+OPEN("bar")+GETFH. In theory couldn't the second one succeed and leave the client thinking it had opened (fh2, bar) when the filehandle it got back was really for (fh1, foo)? The client would receive a filehandle for fh1/"foo", so it would apply any state it thought it had received to that file. However, normally, I'd expect to see a NFS4ERR_FALSE_RETRY in this case. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: NFS: nfs4_reclaim_open_state: Lock reclaim failed! log spew 2016-11-17 21:45 ` Trond Myklebust @ 2016-11-17 21:53 ` Olga Kornievskaia 2016-11-17 22:15 ` Trond Myklebust 0 siblings, 1 reply; 27+ messages in thread From: Olga Kornievskaia @ 2016-11-17 21:53 UTC (permalink / raw) To: Trond Myklebust; +Cc: bfields, tibbs, linux-nfs On Thu, Nov 17, 2016 at 4:45 PM, Trond Myklebust <trondmy@primarydata.com> wrote: > On Thu, 2016-11-17 at 16:26 -0500, bfields@fieldses.org wrote: >> On Thu, Nov 17, 2016 at 04:05:32PM -0500, Olga Kornievskaia wrote: >> > >> > On Thu, Nov 17, 2016 at 3:46 PM, bfields@fieldses.org >> > <bfields@fieldses.org> wrote: >> > > >> > > On Thu, Nov 17, 2016 at 03:29:11PM -0500, Olga Kornievskaia >> > > wrote: >> > > > >> > > > On Thu, Nov 17, 2016 at 3:17 PM, bfields@fieldses.org >> > > > <bfields@fieldses.org> wrote: >> > > > > >> > > > > On Thu, Nov 17, 2016 at 02:58:12PM -0500, Olga Kornievskaia >> > > > > wrote: >> > > > > > >> > > > > > On Thu, Nov 17, 2016 at 2:32 PM, bfields@fieldses.org >> > > > > > <bfields@fieldses.org> wrote: >> > > > > > > >> > > > > > > On Thu, Nov 17, 2016 at 05:45:52PM +0000, Trond Myklebust >> > > > > > > wrote: >> > > > > > > > >> > > > > > > > On Thu, 2016-11-17 at 11:31 -0500, J. Bruce Fields >> > > > > > > > wrote: >> > > > > > > > > >> > > > > > > > > On Wed, Nov 16, 2016 at 02:55:05PM -0600, Jason L >> > > > > > > > > Tibbitts III wrote: >> > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > > I'm replying to a rather old message, but the issue >> > > > > > > > > > has just now >> > > > > > > > > > popped >> > > > > > > > > > back up again. >> > > > > > > > > > >> > > > > > > > > > To recap, a client stops being able to access _any_ >> > > > > > > > > > mount on a >> > > > > > > > > > particular server, and "NFS: >> > > > > > > > > > nfs4_reclaim_open_state: Lock reclaim >> > > > > > > > > > failed!" appears several hundred times per second >> > > > > > > > > > in the kernel >> > > > > > > > > > log. 
>> > > > > > > > > > The load goes up by one for ever process attempting >> > > > > > > > > > to access any >> > > > > > > > > > mount >> > > > > > > > > > from that particular server. Mounts to other >> > > > > > > > > > servers are fine, and >> > > > > > > > > > other clients can mount things from that one server >> > > > > > > > > > without >> > > > > > > > > > problems. >> > > > > > > > > > >> > > > > > > > > > When I kill every process keeping that particular >> > > > > > > > > > mount active and >> > > > > > > > > > then >> > > > > > > > > > umount it, I see: >> > > > > > > > > > >> > > > > > > > > > NFS: nfs4_reclaim_open_state: unhandled error >> > > > > > > > > > -10068 >> > > > > > > > > >> > > > > > > > > NFS4ERR_RETRY_UNCACHED_REP. >> > > > > > > > > >> > > > > > > > > So, you're using NFSv4.1 or 4.2, and the server >> > > > > > > > > thinks that the >> > > > > > > > > client >> > > > > > > > > has reused a (slot, sequence number) pair, but the >> > > > > > > > > server doesn't >> > > > > > > > > have a >> > > > > > > > > cached response to return. >> > > > > > > > > >> > > > > > > > > Hard to know how that happened, and it's not shown in >> > > > > > > > > the below. >> > > > > > > > > Sounds like a bug, though. >> > > > > > > > >> > > > > > > > ...or a Ctrl-C.... >> > > > > > > >> > > > > > > How does that happen? >> > > > > > > >> > > > > > >> > > > > > If I may chime in... >> > > > > > >> > > > > > Bruce, when an application sends a Ctrl-C and clients's >> > > > > > session slot >> > > > > > has sent out an RPC but didn't process the reply, the >> > > > > > client doesn't >> > > > > > know if the server processed that sequence id or not. In >> > > > > > that case, >> > > > > > the client doesn't increment the sequence number. 
Normally >> > > > > > the client >> > > > > > would handle getting such an error by retrying again (and >> > > > > > resetting >> > > > > > the slots) but I think during recovery operation the client >> > > > > > handles >> > > > > > errors differently (by just erroring). I believe the >> > > > > > reasoning that we >> > > > > > don't want to be stuck trying to recover from the recovery >> > > > > > from the >> > > > > > recovery etc... >> > > > > >> > > > > So in that case the client can end up sending a different rpc >> > > > > reusing >> > > > > the old slot and sequence number? >> > > > >> > > > Correct. >> > > >> > > So that could get UNCACHED_REP as the response. But if you're >> > > very >> > > unlucky, couldn't this also happen?: >> > > >> > > 1) the compound previously sent on that slot was >> > > processed by >> > > the server and cached >> > > 2) the compound you're sending now happens to have the >> > > same set >> > > of operations >> > > >> > > with the result that the client doesn't detect that the reply was >> > > actually to some other rpc, and instead it returns bad data to >> > > the >> > > application? >> > >> > If you are sending exactly the same operations and arguments, then >> > why >> > is a reply from the cache would lead to bad data? >> >> That would probably be fine, I was wondering what would happen if you >> sent the same operation but different arguments. > >> So the original cancelled operation is something like >> PUTFH(fh1)+OPEN("foo")+GETFH, and the new one is >> PUTFH(fh2)+OPEN("bar")+GETFH. In theory couldn't the second one >> succeed >> and leave the client thinking it had opened (fh2, bar) when the >> filehandle it got back was really for (fh1, foo)? >> > > The client would receive a filehandle for fh1/"foo", so it would apply > any state it thought it had received to that file. However, normally, > I'd expect to see a NFS4ERR_FALSE_RETRY in this case. 
I see Bruce's point: if the server looks up the cache based only on the seqid and slot# and doesn't keep something like a hash of the request content (which I could see being expensive), then the client in this case would get into trouble, thinking it opened "bar" when it really opened "foo". The spec says:

Section 18.46.3
   If the client reuses a slot ID and sequence ID for a completely
   different request, the server MAY treat the request as if it is a
   retry of what it has already executed.  The server MAY however detect
   the client's illegal reuse and return NFS4ERR_SEQ_FALSE_RETRY.

What is "a completely different request"? From the client's point of view, sending different args would constitute a different request. But in any case it's a "MAY", so the client can't depend on this being implemented. ^ permalink raw reply	[flat|nested] 27+ messages in thread
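The slot-sequencing rules being debated above can be sketched as follows. This is a minimal illustration with hypothetical names, not nfsd's actual implementation: it shows how a server might classify a request arriving on a (slot, seqid) pair, including the optional "MAY" detection of illegal reuse from section 18.46.3.

```python
class Slot:
    """One entry in a hypothetical NFSv4.1 session slot table."""
    def __init__(self, seqid=0, cached_reply=None, request_digest=None):
        self.seqid = seqid                    # highest seqid executed on this slot
        self.cached_reply = cached_reply      # cached reply, if caching was requested
        self.request_digest = request_digest  # optional digest of the cached request

def sequence_check(slot, req_seqid, req_digest):
    """Classify an incoming SEQUENCE op against the slot's state."""
    if req_seqid == slot.seqid + 1:
        return "NEW"  # normal case: execute the compound and bump slot.seqid
    if req_seqid == slot.seqid:
        # Client claims this is a retry of the last request on this slot.
        if slot.cached_reply is None:
            # The -10068 from the logs above: retry of an uncached reply.
            return "NFS4ERR_RETRY_UNCACHED_REP"
        if slot.request_digest is not None and req_digest != slot.request_digest:
            # Optional ("MAY"): the server detects illegal reuse. A server
            # that keys the cache on (slot, seqid) alone skips this check
            # and replays the wrong answer -- Bruce's (fh1, "foo") scenario.
            return "NFS4ERR_SEQ_FALSE_RETRY"
        return "REPLAY_FROM_CACHE"
    return "NFS4ERR_SEQ_MISORDERED"
```

The interesting branch is the digest comparison: nothing in the protocol obliges a server to keep it, which is exactly why the client can't rely on FALSE_RETRY being returned.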
* Re: NFS: nfs4_reclaim_open_state: Lock reclaim failed! log spew 2016-11-17 21:53 ` Olga Kornievskaia @ 2016-11-17 22:15 ` Trond Myklebust 2016-11-17 22:27 ` Olga Kornievskaia 0 siblings, 1 reply; 27+ messages in thread From: Trond Myklebust @ 2016-11-17 22:15 UTC (permalink / raw) To: aglo; +Cc: bfields, tibbs, linux-nfs
On Thu, 2016-11-17 at 16:53 -0500, Olga Kornievskaia wrote:
> On Thu, Nov 17, 2016 at 4:45 PM, Trond Myklebust
> <trondmy@primarydata.com> wrote:
> >
> > [...]
> >
> > The client would receive a filehandle for fh1/"foo", so it would apply
> > any state it thought it had received to that file. However, normally,
> > I'd expect to see a NFS4ERR_FALSE_RETRY in this case.
>
> I see Bruce's point that if the server only looks up the cache based
> on the seqid and slot# and doesn't have like a hash of the content
> which I could see is expensive, then the client in this case would get
> into trouble of thinking it opened "bar" but really it's "foo". Spec
> says:
>
> Section 18.46.3
>    If the client reuses a slot ID and sequence ID for a completely
>    different request, the server MAY treat the request as if it is a
>    retry of what it has already executed.  The server MAY however detect
>    the client's illegal reuse and return NFS4ERR_SEQ_FALSE_RETRY.
>
> What is "a completely different request". From the client's point of
> view sending different args would constitute a different request. But
> in any case it's a "MAY" so client can't depend on this being
> implemented.

What's the alternative? Assume the client pre-emptively bumps the seqid
instead of retrying, then the user presses Ctrl-C again. Repeat a few
more times. How do I now resync the seqids between the client and
server other than by trashing the session?

^ permalink raw reply	[flat|nested] 27+ messages in thread
* Re: NFS: nfs4_reclaim_open_state: Lock reclaim failed! log spew 2016-11-17 22:15 ` Trond Myklebust @ 2016-11-17 22:27 ` Olga Kornievskaia 2016-11-17 22:43 ` Trond Myklebust 0 siblings, 1 reply; 27+ messages in thread From: Olga Kornievskaia @ 2016-11-17 22:27 UTC (permalink / raw) To: Trond Myklebust; +Cc: bfields, tibbs, linux-nfs On Thu, Nov 17, 2016 at 5:15 PM, Trond Myklebust <trondmy@primarydata.com> wrote: > On Thu, 2016-11-17 at 16:53 -0500, Olga Kornievskaia wrote: >> On Thu, Nov 17, 2016 at 4:45 PM, Trond Myklebust >> <trondmy@primarydata.com> wrote: >> > >> > On Thu, 2016-11-17 at 16:26 -0500, bfields@fieldses.org wrote: >> > > >> > > On Thu, Nov 17, 2016 at 04:05:32PM -0500, Olga Kornievskaia >> > > wrote: >> > > > >> > > > >> > > > On Thu, Nov 17, 2016 at 3:46 PM, bfields@fieldses.org >> > > > <bfields@fieldses.org> wrote: >> > > > > >> > > > > >> > > > > On Thu, Nov 17, 2016 at 03:29:11PM -0500, Olga Kornievskaia >> > > > > wrote: >> > > > > > >> > > > > > >> > > > > > On Thu, Nov 17, 2016 at 3:17 PM, bfields@fieldses.org >> > > > > > <bfields@fieldses.org> wrote: >> > > > > > > >> > > > > > > >> > > > > > > On Thu, Nov 17, 2016 at 02:58:12PM -0500, Olga >> > > > > > > Kornievskaia >> > > > > > > wrote: >> > > > > > > > >> > > > > > > > >> > > > > > > > On Thu, Nov 17, 2016 at 2:32 PM, bfields@fieldses.org >> > > > > > > > <bfields@fieldses.org> wrote: >> > > > > > > > > >> > > > > > > > > >> > > > > > > > > On Thu, Nov 17, 2016 at 05:45:52PM +0000, Trond >> > > > > > > > > Myklebust >> > > > > > > > > wrote: >> > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > > On Thu, 2016-11-17 at 11:31 -0500, J. 
Bruce Fields >> > > > > > > > > > wrote: >> > > > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > > On Wed, Nov 16, 2016 at 02:55:05PM -0600, Jason L >> > > > > > > > > > > Tibbitts III wrote: >> > > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > > I'm replying to a rather old message, but the >> > > > > > > > > > > > issue >> > > > > > > > > > > > has just now >> > > > > > > > > > > > popped >> > > > > > > > > > > > back up again. >> > > > > > > > > > > > >> > > > > > > > > > > > To recap, a client stops being able to access >> > > > > > > > > > > > _any_ >> > > > > > > > > > > > mount on a >> > > > > > > > > > > > particular server, and "NFS: >> > > > > > > > > > > > nfs4_reclaim_open_state: Lock reclaim >> > > > > > > > > > > > failed!" appears several hundred times per >> > > > > > > > > > > > second >> > > > > > > > > > > > in the kernel >> > > > > > > > > > > > log. >> > > > > > > > > > > > The load goes up by one for ever process >> > > > > > > > > > > > attempting >> > > > > > > > > > > > to access any >> > > > > > > > > > > > mount >> > > > > > > > > > > > from that particular server. Mounts to other >> > > > > > > > > > > > servers are fine, and >> > > > > > > > > > > > other clients can mount things from that one >> > > > > > > > > > > > server >> > > > > > > > > > > > without >> > > > > > > > > > > > problems. >> > > > > > > > > > > > >> > > > > > > > > > > > When I kill every process keeping that >> > > > > > > > > > > > particular >> > > > > > > > > > > > mount active and >> > > > > > > > > > > > then >> > > > > > > > > > > > umount it, I see: >> > > > > > > > > > > > >> > > > > > > > > > > > NFS: nfs4_reclaim_open_state: unhandled error >> > > > > > > > > > > > -10068 >> > > > > > > > > > > >> > > > > > > > > > > NFS4ERR_RETRY_UNCACHED_REP. 
>> > > > > > > > > > > >> > > > > > > > > > > So, you're using NFSv4.1 or 4.2, and the server >> > > > > > > > > > > thinks that the >> > > > > > > > > > > client >> > > > > > > > > > > has reused a (slot, sequence number) pair, but >> > > > > > > > > > > the >> > > > > > > > > > > server doesn't >> > > > > > > > > > > have a >> > > > > > > > > > > cached response to return. >> > > > > > > > > > > >> > > > > > > > > > > Hard to know how that happened, and it's not >> > > > > > > > > > > shown in >> > > > > > > > > > > the below. >> > > > > > > > > > > Sounds like a bug, though. >> > > > > > > > > > >> > > > > > > > > > ...or a Ctrl-C.... >> > > > > > > > > >> > > > > > > > > How does that happen? >> > > > > > > > > >> > > > > > > > >> > > > > > > > If I may chime in... >> > > > > > > > >> > > > > > > > Bruce, when an application sends a Ctrl-C and clients's >> > > > > > > > session slot >> > > > > > > > has sent out an RPC but didn't process the reply, the >> > > > > > > > client doesn't >> > > > > > > > know if the server processed that sequence id or not. >> > > > > > > > In >> > > > > > > > that case, >> > > > > > > > the client doesn't increment the sequence number. >> > > > > > > > Normally >> > > > > > > > the client >> > > > > > > > would handle getting such an error by retrying again >> > > > > > > > (and >> > > > > > > > resetting >> > > > > > > > the slots) but I think during recovery operation the >> > > > > > > > client >> > > > > > > > handles >> > > > > > > > errors differently (by just erroring). I believe the >> > > > > > > > reasoning that we >> > > > > > > > don't want to be stuck trying to recover from the >> > > > > > > > recovery >> > > > > > > > from the >> > > > > > > > recovery etc... >> > > > > > > >> > > > > > > So in that case the client can end up sending a different >> > > > > > > rpc >> > > > > > > reusing >> > > > > > > the old slot and sequence number? >> > > > > > >> > > > > > Correct. 
>> > > > > >> > > > > So that could get UNCACHED_REP as the response. But if >> > > > > you're >> > > > > very >> > > > > unlucky, couldn't this also happen?: >> > > > > >> > > > > 1) the compound previously sent on that slot was >> > > > > processed by >> > > > > the server and cached >> > > > > 2) the compound you're sending now happens to have >> > > > > the >> > > > > same set >> > > > > of operations >> > > > > >> > > > > with the result that the client doesn't detect that the reply >> > > > > was >> > > > > actually to some other rpc, and instead it returns bad data >> > > > > to >> > > > > the >> > > > > application? >> > > > >> > > > If you are sending exactly the same operations and arguments, >> > > > then >> > > > why >> > > > is a reply from the cache would lead to bad data? >> > > >> > > That would probably be fine, I was wondering what would happen if >> > > you >> > > sent the same operation but different arguments. >> > >> > > >> > > So the original cancelled operation is something like >> > > PUTFH(fh1)+OPEN("foo")+GETFH, and the new one is >> > > PUTFH(fh2)+OPEN("bar")+GETFH. In theory couldn't the second one >> > > succeed >> > > and leave the client thinking it had opened (fh2, bar) when the >> > > filehandle it got back was really for (fh1, foo)? >> > > >> > >> > The client would receive a filehandle for fh1/"foo", so it would >> > apply >> > any state it thought it had received to that file. However, >> > normally, >> > I'd expect to see a NFS4ERR_FALSE_RETRY in this case. >> >> I see Bruce's point that if the server only looks up the cache based >> on the seqid and slot# and doesn't have like a hash of the content >> which I could see is expensive, then the client in this case would >> get >> into trouble of thinking it opened "bar" but really it's "foo". 
Spec
>> says:
>>
>> Section 18.46.3
>>    If the client reuses a slot ID and sequence ID for a completely
>>    different request, the server MAY treat the request as if it is a
>>    retry of what it has already executed.  The server MAY however detect
>>    the client's illegal reuse and return NFS4ERR_SEQ_FALSE_RETRY.
>>
>> What is "a completely different request". From the client's point of
>> view sending different args would constitute a different request. But
>> in any case it's a "MAY" so client can't depend on this being
>> implemented.
>>
>
> What's the alternative? Assume the client pre-emptively bumps the seqid
> instead of retrying, then the user presses Ctrl-C again. Repeat a few
> more times. How do I now resync the seqids between the client and
> server other than by trashing the session?

I don't see any alternative other than to reset in that case. But I think
that's better than the possibility of accidentally opening the wrong file?

^ permalink raw reply	[flat|nested] 27+ messages in thread
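The trade-off Trond raises can be made concrete with a toy model (not kernel code): if the client pre-emptively bumps the slot's seqid after every interrupted RPC, then each interrupt whose request the server never actually executed widens the client/server seqid gap on that slot, and since the client cannot observe which case occurred, only destroying and recreating the session resyncs them.

```python
def seqid_gap_after_interrupts(server_executed):
    """Model the 'bump' policy. server_executed has one bool per interrupted
    call: True if the server consumed the seqid before the client gave up
    (the client has no way to know which)."""
    client_seqid = 0
    server_seqid = 0
    for ran in server_executed:
        client_seqid += 1      # client pre-emptively bumps on every interrupt
        if ran:
            server_seqid += 1  # server also advanced the slot
    # > 0 means the slot is desynced until the session is trashed
    return client_seqid - server_seqid
```

With every interrupted request actually executed the gap is zero, but each one the server dropped adds one to the gap, which is why the reuse-and-risk-UNCACHED_REP behaviour was chosen over pre-emptive bumping.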
* Re: NFS: nfs4_reclaim_open_state: Lock reclaim failed! log spew 2016-11-17 22:27 ` Olga Kornievskaia @ 2016-11-17 22:43 ` Trond Myklebust 2016-11-18 20:52 ` bfields 0 siblings, 1 reply; 27+ messages in thread From: Trond Myklebust @ 2016-11-17 22:43 UTC (permalink / raw) To: aglo; +Cc: bfields, tibbs, linux-nfs
On Thu, 2016-11-17 at 17:27 -0500, Olga Kornievskaia wrote:
> On Thu, Nov 17, 2016 at 5:15 PM, Trond Myklebust
> <trondmy@primarydata.com> wrote:
> >
> > On Thu, 2016-11-17 at 16:53 -0500, Olga Kornievskaia wrote:
> > >
> > > [...]
> > >
> > > I see Bruce's po
aW50IHRoYXQgaWYgdGhlIHNlcnZlciBvbmx5IGxvb2tzIHVwIHRoZSBjYWNoZQ0KPiA+ID4gYmFz ZWQNCj4gPiA+IG9uIHRoZSBzZXFpZCBhbmQgc2xvdCMgYW5kIGRvZXNuJ3QgaGF2ZSBsaWtlIGEg aGFzaCBvZiB0aGUNCj4gPiA+IGNvbnRlbnQNCj4gPiA+IHdoaWNoIEkgY291bGQgc2VlIGlzIGV4 cGVuc2l2ZSwgdGhlbiB0aGUgY2xpZW50IGluIHRoaXMgY2FzZQ0KPiA+ID4gd291bGQNCj4gPiA+ IGdldA0KPiA+ID4gaW50byB0cm91YmxlIG9mIHRoaW5raW5nIGl0IG9wZW5lZCAiYmFyIiBidXQg cmVhbGx5IGl0J3MgImZvbyIuDQo+ID4gPiBTcGVjDQo+ID4gPiBzYXlzOg0KPiA+ID4gDQo+ID4g PiBTZWN0aW9uIDE4LjQ2LjMNCj4gPiA+IElmIHRoZSBjbGllbnQgcmV1c2VzIGEgc2xvdCBJRCBh bmQgc2VxdWVuY2UgSUQgZm9yIGEgY29tcGxldGVseQ0KPiA+ID4gwqDCoMKgZGlmZmVyZW50IHJl cXVlc3QsIHRoZSBzZXJ2ZXIgTUFZIHRyZWF0IHRoZSByZXF1ZXN0IGFzIGlmIGl0DQo+ID4gPiBp cyBhDQo+ID4gPiDCoMKgwqByZXRyeSBvZiB3aGF0IGl0IGhhcyBhbHJlYWR5IGV4ZWN1dGVkLsKg wqBUaGUgc2VydmVyIE1BWSBob3dldmVyDQo+ID4gPiBkZXRlY3QNCj4gPiA+IMKgwqDCoHRoZSBj bGllbnQncyBpbGxlZ2FsIHJldXNlIGFuZCByZXR1cm4gTkZTNEVSUl9TRVFfRkFMU0VfUkVUUlku DQo+ID4gPiANCj4gPiA+IFdoYXQgaXMgImEgY29tcGxldGVseSBkaWZmZXJlbnQgcmVxdWVzdCIu IEZyb20gdGhlIGNsaWVudCdzIHBvaW50DQo+ID4gPiBvZg0KPiA+ID4gdmlldyBzZW5kaW5nIGRp ZmZlcmVudCBhcmdzIHdvdWxkIGNvbnN0aXR1dGUgYSBkaWZmZXJlbnQgcmVxdWVzdC4NCj4gPiA+ IEJ1dA0KPiA+ID4gaW4gYW55IGNhc2UgaXQncyBhICJNQVkiIHNvIGNsaWVudCBjYW4ndCBkZXBl bmQgb24gdGhpcyBiZWluZw0KPiA+ID4gaW1wbGVtZW50ZWQuDQo+ID4gPiANCj4gPiANCj4gPiBX aGF0J3MgdGhlIGFsdGVybmF0aXZlPyBBc3N1bWUgdGhlIGNsaWVudCBwcmUtZW1wdGl2ZWx5IGJ1 bXBzIHRoZQ0KPiA+IHNlcWlkDQo+ID4gaW5zdGVhZCBvZiByZXRyeWluZywgdGhlbiB0aGUgdXNl ciBwcmVzc2VzIEN0cmwtQyBhZ2Fpbi4gUmVwZWF0IGENCj4gPiBmZXcNCj4gPiBtb3JlIHRpbWVz LiBIb3cgZG8gSSBub3cgcmVzeW5jIHRoZSBzZXFpZHMgYmV0d2VlbiB0aGUgY2xpZW50IGFuZA0K PiA+IHNlcnZlciBvdGhlciB0aGFuIGJ5IHRyYXNoaW5nIHRoZSBzZXNzaW9uPw0KPiANCj4gSSBk b24ndCBzZWUgYW55IGFsdGVybmF0aXZlcyB0aGFuIHRvIHJlc2V0IGluIHRoYXQgY2FzZS4gQnV0 IEkgdGhpbmsNCj4gaXQncyBiZXR0ZXIgdGhlbiB0aGUgcG9zc2liaWxpdHkgb2YgYWNjaWRlbnRh bGx5IG9wZW5pbmcgYSB3cm9uZw0KPiBmaWxlPw0KDQpUaGV5IHNvdW5kIGVxdWFsbHkgYmFkIHRv 
IG1lIHdoaWNoIGlzIHdoeSBJJ20gbm90IHVuZGVyc3RhbmRpbmcgaG93IGENCnNlcnZlciB3b3Vs ZCBmYWlsIHRvIGltcGxlbWVudCBzb21lIG1pbmltYWwgZm9ybSBvZiBmYWxzZSByZXRyeQ0KY2hl Y2tpbmcuDQpUaGUgTGludXggTkZTdjMgRFJDIHdpbGwsIGZvciBpbnN0YW5jZSwgY2hlY2tzdW0g YXQgbGVhc3Qgc29tZSBwYXJ0IG9mDQp0aGUgUlBDIGFyZ3VtZW50cyBmb3IgX2FsbF8gUlBDIGNh bGxzLiBNb3N0IE5GU3Y0LnggY2xpZW50cyB3aWxsIG9ubHkNCmFzayB0aGF0IHlvdSBjaGVja3N1 bSB0aGUgbm9uLWlkZW1wb3RlbnQgUlBDIGNhbGxzLCB3aGljaCBzaWduaWZpY2FudGx5DQpjdXRz IGRvd24gb24gdGhlIGNhbGN1bGF0aW9uIG92ZXJoZWFkLg0K ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: NFS: nfs4_reclaim_open_state: Lock reclaim failed! log spew
  2016-11-17 22:43 ` Trond Myklebust
@ 2016-11-18 20:52   ` bfields
  2016-11-18 22:44     ` Trond Myklebust
  0 siblings, 1 reply; 27+ messages in thread
From: bfields @ 2016-11-18 20:52 UTC (permalink / raw)
To: Trond Myklebust; +Cc: aglo, tibbs, linux-nfs

On Thu, Nov 17, 2016 at 10:43:58PM +0000, Trond Myklebust wrote:
> On Thu, 2016-11-17 at 17:27 -0500, Olga Kornievskaia wrote:
> > On Thu, Nov 17, 2016 at 5:15 PM, Trond Myklebust
> > <trondmy@primarydata.com> wrote:
> > > What's the alternative? Assume the client pre-emptively bumps the
> > > seqid instead of retrying, then the user presses Ctrl-C again.
> > > Repeat a few more times. How do I now resync the seqids between
> > > the client and server other than by trashing the session?
> >
> > I don't see any alternatives other than to reset in that case. But I
> > think it's better than the possibility of accidentally opening a
> > wrong file?

Remind me why you can't continue resending after the Ctrl-C?  (I
thought this was already done for some lock and other cases?)

> They sound equally bad to me, which is why I'm not understanding how a
> server would fail to implement some minimal form of false retry
> checking.
> The Linux NFSv3 DRC will, for instance, checksum at least some part of
> the RPC arguments for _all_ RPC calls. Most NFSv4.x clients will only
> ask that you checksum the non-idempotent RPC calls, which
> significantly cuts down on the calculation overhead.

I'll look at adding checksumming, it shouldn't be hard.

--b.

^ permalink raw reply	[flat|nested] 27+ messages in thread
* Re: NFS: nfs4_reclaim_open_state: Lock reclaim failed! log spew
  2016-11-18 20:52 ` bfields
@ 2016-11-18 22:44   ` Trond Myklebust
  2016-11-21 18:37     ` Fields Bruce James
  0 siblings, 1 reply; 27+ messages in thread
From: Trond Myklebust @ 2016-11-18 22:44 UTC (permalink / raw)
To: Fields Bruce James; +Cc: Kornievskaia Olga, tibbs, List Linux NFS Mailing

> On Nov 18, 2016, at 15:52, bfields@fieldses.org wrote:
>
> On Thu, Nov 17, 2016 at 10:43:58PM +0000, Trond Myklebust wrote:
>> On Thu, 2016-11-17 at 17:27 -0500, Olga Kornievskaia wrote:
>>> On Thu, Nov 17, 2016 at 5:15 PM, Trond Myklebust
>>> <trondmy@primarydata.com> wrote:
>>>> What's the alternative? Assume the client pre-emptively bumps the
>>>> seqid instead of retrying, then the user presses Ctrl-C again.
>>>> Repeat a few more times. How do I now resync the seqids between the
>>>> client and server other than by trashing the session?
>>>
>>> I don't see any alternatives other than to reset in that case. But I
>>> think it's better than the possibility of accidentally opening a
>>> wrong file?
>
> Remind me why you can't continue resending after the Ctrl-C?  (I
> thought this was already done for some lock and other cases?)

We'd have to do it for all RPC calls, which means they would all have to
be converted to not use the stack. The resulting behaviour would also be
confusing, as operations would complete outside of locking rules etc.
So, for instance, you would see successfully looked-up files
mysteriously disappearing as the asynchronous unlink() operation that
was interrupted completes, or directories mysteriously getting renamed.

>> They sound equally bad to me, which is why I'm not understanding how
>> a server would fail to implement some minimal form of false retry
>> checking.
>> The Linux NFSv3 DRC will, for instance, checksum at least some part
>> of the RPC arguments for _all_ RPC calls. Most NFSv4.x clients will
>> only ask that you checksum the non-idempotent RPC calls, which
>> significantly cuts down on the calculation overhead.
>
> I'll look at adding checksumming, it shouldn't be hard.

Thanks.

^ permalink raw reply	[flat|nested] 27+ messages in thread
* Re: NFS: nfs4_reclaim_open_state: Lock reclaim failed! log spew
  2016-11-18 22:44 ` Trond Myklebust
@ 2016-11-21 18:37   ` Fields Bruce James
  0 siblings, 0 replies; 27+ messages in thread
From: Fields Bruce James @ 2016-11-21 18:37 UTC (permalink / raw)
To: Trond Myklebust; +Cc: Kornievskaia Olga, tibbs, List Linux NFS Mailing

On Fri, Nov 18, 2016 at 10:44:52PM +0000, Trond Myklebust wrote:
> > On Nov 18, 2016, at 15:52, bfields@fieldses.org wrote:
> >
> > On Thu, Nov 17, 2016 at 10:43:58PM +0000, Trond Myklebust wrote:
> >> On Thu, 2016-11-17 at 17:27 -0500, Olga Kornievskaia wrote:
> >>> On Thu, Nov 17, 2016 at 5:15 PM, Trond Myklebust
> >>> <trondmy@primarydata.com> wrote:
> >>>> What's the alternative? Assume the client pre-emptively bumps the
> >>>> seqid instead of retrying, then the user presses Ctrl-C again.
> >>>> Repeat a few more times. How do I now resync the seqids between
> >>>> the client and server other than by trashing the session?
> >>>
> >>> I don't see any alternatives other than to reset in that case. But
> >>> I think it's better than the possibility of accidentally opening a
> >>> wrong file?
> >
> > Remind me why you can't continue resending after the Ctrl-C?  (I
> > thought this was already done for some lock and other cases?)
>
> We'd have to do it for all RPC calls, which means they would all have
> to be converted to not use the stack. The resulting behaviour would
> also be confusing, as operations would complete outside of locking
> rules etc. So, for instance, you would see successfully looked-up
> files mysteriously disappearing as the asynchronous unlink() operation
> that was interrupted completes, or directories mysteriously getting
> renamed.

We get that anyway, since the server may already be processing the rpc.
I'll admit it sounds complicated.

> >> They sound equally bad to me, which is why I'm not understanding
> >> how a server would fail to implement some minimal form of false
> >> retry checking.
> >> The Linux NFSv3 DRC will, for instance, checksum at least some part
> >> of the RPC arguments for _all_ RPC calls. Most NFSv4.x clients will
> >> only ask that you checksum the non-idempotent RPC calls, which
> >> significantly cuts down on the calculation overhead.
> >
> > I'll look at adding checksumming, it shouldn't be hard.
>
> Thanks.

I doubt it's a complete fix, though.  Two calls with the same arguments
aren't necessarily the same call either, and it's the sequence id that's
supposed to distinguish the "same call" and "different call with same
arguments" cases.

--b.

^ permalink raw reply	[flat|nested] 27+ messages in thread
end of thread, other threads:[~2016-11-21 18:37 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-02-24 21:43 NFS: nfs4_reclaim_open_state: Lock reclaim failed! log spew Jason L Tibbitts III
2016-02-25 19:58 ` J. Bruce Fields
2016-02-29 23:06 ` Jason L Tibbitts III
2016-03-01  0:48 ` J. Bruce Fields
2016-03-01  0:53 ` Jason L Tibbitts III
2016-03-01  1:01 ` J. Bruce Fields
2016-03-01  1:03 ` Jason L Tibbitts III
2016-11-16 20:55 ` Jason L Tibbitts III
2016-11-17 16:31 ` J. Bruce Fields
2016-11-17 17:08 ` Jason L Tibbitts III
2016-11-17 20:22 ` Andrew W Elble
2016-11-17 17:45 ` Trond Myklebust
2016-11-17 19:32 ` bfields
2016-11-17 19:58 ` Olga Kornievskaia
2016-11-17 20:17 ` bfields
2016-11-17 20:29 ` Olga Kornievskaia
2016-11-17 20:46 ` bfields
2016-11-17 21:05 ` Olga Kornievskaia
2016-11-17 21:26 ` bfields
2016-11-17 21:45 ` Trond Myklebust
2016-11-17 21:53 ` Olga Kornievskaia
2016-11-17 22:15 ` Trond Myklebust
2016-11-17 22:27 ` Olga Kornievskaia
2016-11-17 22:43 ` Trond Myklebust
2016-11-18 20:52 ` bfields
2016-11-18 22:44 ` Trond Myklebust
2016-11-21 18:37 ` Fields Bruce James