All of lore.kernel.org
 help / color / mirror / Atom feed
* NFS: nfs4_reclaim_open_state: Lock reclaim failed! log spew
@ 2016-02-24 21:43 Jason L Tibbitts III
  2016-02-25 19:58 ` J. Bruce Fields
  0 siblings, 1 reply; 27+ messages in thread
From: Jason L Tibbitts III @ 2016-02-24 21:43 UTC (permalink / raw)
  To: linux-nfs

My NFS infrastructure has servers running current RHEL7.2 (mostly kernel
3.10.0-327.4.5.el7 with a one-line patch needed to fix a soft lockup in
nfs4_laundromat) and clients running current Fedora 23
(4.3.5-300.fc23.x86_64).  Everything is mounted NFS4.1 with sec=krb5p.

Occasionally a client will get into a state where it just hammers the
server with network traffic, sometimes at full line rate, with:

NFS: nfs4_reclaim_open_state: Lock reclaim failed!

spewed to the log about 500 times a second.  The load goes up quite a
bit (to 5-7 or so).  The machine isn't doing anything and there isn't
even a user logged in.  However, there are always a few user processes
hanging around, usually kwin_x11 for whatever reason.  (My guess is
because of a lock on ~/.Xauthority.)

When I kill those user processes, this is logged once:

NFS: nfs4_reclaim_open_state: unhandled error -10068

-10068 is NFS4ERR_RETRY_UNCACHED_REP.

Unfortunately I did not grab any of that traffic (I just wanted it to
stop).  This happens to me periodically so I'll be sure to do that when
it hits again.

One theory is that this is related to a user's kerberos ticket
expiring.  I see some hits when I search for the line that's spewed, but
they're either not recent or or weren't reproducible.  I don't find any
hits for that specific unhandled error.

 - J<

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: NFS: nfs4_reclaim_open_state: Lock reclaim failed! log spew
  2016-02-24 21:43 NFS: nfs4_reclaim_open_state: Lock reclaim failed! log spew Jason L Tibbitts III
@ 2016-02-25 19:58 ` J. Bruce Fields
  2016-02-29 23:06   ` Jason L Tibbitts III
  0 siblings, 1 reply; 27+ messages in thread
From: J. Bruce Fields @ 2016-02-25 19:58 UTC (permalink / raw)
  To: Jason L Tibbitts III; +Cc: linux-nfs

On Wed, Feb 24, 2016 at 03:43:45PM -0600, Jason L Tibbitts III wrote:
> My NFS infrastructure has servers running current RHEL7.2 (mostly kernel
> 3.10.0-327.4.5.el7 with a one-line patch needed to fix a soft lockup in
> nfs4_laundromat) and clients running current Fedora 23
> (4.3.5-300.fc23.x86_64).  Everything is mounted NFS4.1 with sec=krb5p.
> 
> Occasionally a client will get into a state where it just hammers the
> server with network traffic, sometimes at full line rate, with:
> 
> NFS: nfs4_reclaim_open_state: Lock reclaim failed!
> 
> spewed to the log about 500 times a second.  The load goes up quite a
> bit (to 5-7 or so).  The machine isn't doing anything and there isn't
> even a user logged in.  However, there are always a few user processes
> hanging around, usually kwin_x11 for whatever reason.  (My guess is
> because of a lock on ~/.Xauthority.)
> 
> When I kill those user processes, this is logged once:
> 
> NFS: nfs4_reclaim_open_state: unhandled error -10068
> 
> -10068 is NFS4ERR_RETRY_UNCACHED_REP.

The only place the server sets that error is in
fs/nfsd/nfs4state.c:nfsd4_enc_sequence_replay.

If the server's correct, then the client attempted to resend a request
that the server was not required to cache.  In which case
NFS4ERR_RETRY_UNCACHED_REP is a valid error, and the client should give
up (or retry with a new slot/seqid?).

In any case, something's wrong with the 4.1 reply caching logic on
client or server.....

> Unfortunately I did not grab any of that traffic (I just wanted it to
> stop).  This happens to me periodically so I'll be sure to do that when
> it hits again.

OK, that'd be helpful.  Unfortunately what would probably be *most*
helpful would be the traffic that lead up to this--by the time the
client and server get into this loop the interesting problem may have
already happened--but just seeing the loop may be useful too.

--b.

> One theory is that this is related to a user's kerberos ticket
> expiring.  I see some hits when I search for the line that's spewed, but
> they're either not recent or or weren't reproducible.  I don't find any
> hits for that specific unhandled error.
> 
>  - J<
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: NFS: nfs4_reclaim_open_state: Lock reclaim failed! log spew
  2016-02-25 19:58 ` J. Bruce Fields
@ 2016-02-29 23:06   ` Jason L Tibbitts III
  2016-03-01  0:48     ` J. Bruce Fields
  0 siblings, 1 reply; 27+ messages in thread
From: Jason L Tibbitts III @ 2016-02-29 23:06 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: linux-nfs

>>>>> "JBF" == J Bruce Fields <bfields@fieldses.org> writes:

>> Unfortunately I did not grab any of that traffic (I just wanted it to
>> stop).  This happens to me periodically so I'll be sure to do that
>> when it hits again.

JBF> OK, that'd be helpful.

I waited a bit and it's happened again from a few clients.  I captured
some traffic from one of them and it's just an endless stream of
Call/Reply:

  8 0.002842000 172.21.86.135 -> 172.21.86.78 NFS 406 V4 Call
  9 0.003493000 172.21.86.78 -> 172.21.86.135 NFS 518 V4 Reply (Call In  8)
 10 0.003536000 172.21.86.135 -> 172.21.86.78 NFS 406 V4 Call
 11 0.004168000 172.21.86.78 -> 172.21.86.135 NFS 518 V4 Reply (Call In 10)
 12 0.004252000 172.21.86.135 -> 172.21.86.78 NFS 406 V4 Call
 13 0.004854000 172.21.86.78 -> 172.21.86.135 NFS 518 V4 Reply (Call In 12)
 14 0.004931000 172.21.86.135 -> 172.21.86.78 NFS 406 V4 Call
 15 0.005613000 172.21.86.78 -> 172.21.86.135 NFS 518 V4 Reply (Call In 14)


Here's a call:

Frame 8: 406 bytes on wire (3248 bits), 406 bytes captured (3248 bits) on interface 0
    Interface id: 0 (eno1)
    Encapsulation type: Ethernet (1)
    Arrival Time: Feb 29, 2016 16:48:17.376573000 CST
    [Time shift for this packet: 0.000000000 seconds]
    Epoch Time: 1456786097.376573000 seconds
    [Time delta from previous captured frame: 0.000049000 seconds]
    [Time delta from previous displayed frame: 0.000000000 seconds]
    [Time since reference or first frame: 0.002842000 seconds]
    Frame Number: 8
    Frame Length: 406 bytes (3248 bits)
    Capture Length: 406 bytes (3248 bits)
    [Frame is marked: False]
    [Frame is ignored: False]
    [Protocols in frame: eth:ethertype:ip:tcp:rpc:gss-api:spnego-krb5:spnego-krb5]
Ethernet II, Src: AsustekC_6d:f1:7e (f0:79:59:6d:f1:7e), Dst: SuperMic_19:86:94 (00:25:90:19:86:94)
    Destination: SuperMic_19:86:94 (00:25:90:19:86:94)
        Address: SuperMic_19:86:94 (00:25:90:19:86:94)
        .... ..0. .... .... .... .... = LG bit: Globally unique address (factory default)
        .... ...0 .... .... .... .... = IG bit: Individual address (unicast)
    Source: AsustekC_6d:f1:7e (f0:79:59:6d:f1:7e)
        Address: AsustekC_6d:f1:7e (f0:79:59:6d:f1:7e)
        .... ..0. .... .... .... .... = LG bit: Globally unique address (factory default)
        .... ...0 .... .... .... .... = IG bit: Individual address (unicast)
    Type: IP (0x0800)
Internet Protocol Version 4, Src: 172.21.86.135 (172.21.86.135), Dst: 172.21.86.78 (172.21.86.78)
    Version: 4
    Header Length: 20 bytes
    Differentiated Services Field: 0x00 (DSCP 0x00: Default; ECN: 0x00: Not-ECT (Not ECN-Capable Transport))
        0000 00.. = Differentiated Services Codepoint: Default (0x00)
        .... ..00 = Explicit Congestion Notification: Not-ECT (Not ECN-Capable Transport) (0x00)
    Total Length: 392
    Identification: 0x79da (31194)
    Flags: 0x02 (Don't Fragment)
        0... .... = Reserved bit: Not set
        .1.. .... = Don't fragment: Set
        ..0. .... = More fragments: Not set
    Fragment offset: 0
    Time to live: 64
    Protocol: TCP (6)
    Header checksum: 0xba95 [validation disabled]
        [Good: False]
        [Bad: False]
    Source: 172.21.86.135 (172.21.86.135)
    Destination: 172.21.86.78 (172.21.86.78)
    [Source GeoIP: Unknown]
    [Destination GeoIP: Unknown]
Transmission Control Protocol, Src Port: 863 (863), Dst Port: 2049 (2049), Seq: 1361, Ack: 2261, Len: 340
    Source Port: 863 (863)
    Destination Port: 2049 (2049)
    [Stream index: 0]
    [TCP Segment Len: 340]
    Sequence number: 1361    (relative sequence number)
    [Next sequence number: 1701    (relative sequence number)]
    Acknowledgment number: 2261    (relative ack number)
    Header Length: 32 bytes
    .... 0000 0001 1000 = Flags: 0x018 (PSH, ACK)
        000. .... .... = Reserved: Not set
        ...0 .... .... = Nonce: Not set
        .... 0... .... = Congestion Window Reduced (CWR): Not set
        .... .0.. .... = ECN-Echo: Not set
        .... ..0. .... = Urgent: Not set
        .... ...1 .... = Acknowledgment: Set
        .... .... 1... = Push: Set
        .... .... .0.. = Reset: Not set
        .... .... ..0. = Syn: Not set
        .... .... ...0 = Fin: Not set
    Window size value: 24576
    [Calculated window size: 24576]
    [Window size scaling factor: -1 (unknown)]
    Checksum: 0x067b [validation disabled]
        [Good Checksum: False]
        [Bad Checksum: False]
    Urgent pointer: 0
    Options: (12 bytes), No-Operation (NOP), No-Operation (NOP), Timestamps
        No-Operation (NOP)
            Type: 1
                0... .... = Copy on fragmentation: No
                .00. .... = Class: Control (0)
                ...0 0001 = Number: No-Operation (NOP) (1)
        No-Operation (NOP)
            Type: 1
                0... .... = Copy on fragmentation: No
                .00. .... = Class: Control (0)
                ...0 0001 = Number: No-Operation (NOP) (1)
        Timestamps: TSval 1220154220, TSecr 882989725
            Kind: Time Stamp Option (8)
            Length: 10
            Timestamp value: 1220154220
            Timestamp echo reply: 882989725
    [SEQ/ACK analysis]
        [This is an ACK to the segment in frame: 7]
        [The RTT to ACK the segment was: 0.000049000 seconds]
        [Bytes in flight: 340]
Remote Procedure Call, Type:Call XID:0xffb2b86a
    Fragment header: Last fragment, 336 bytes
        1... .... .... .... .... .... .... .... = Last Fragment: Yes
        .000 0000 0000 0000 0000 0001 0101 0000 = Fragment Length: 336
    XID: 0xffb2b86a (4289902698)
    Message Type: Call (0)
    RPC Version: 2
    Program: NFS (100003)
    Program Version: 4
    Procedure: COMPOUND (1)
    Credentials
        Flavor: RPCSEC_GSS (6)
        Length: 28
        GSS Version: 1
        GSS Procedure: RPCSEC_GSS_DATA (0)
        GSS Sequence Number: 59958333
        GSS Service: rpcsec_gss_svc_privacy (3)
        GSS Context
            GSS Context Length: 8
            GSS Context: 6d35000000000000
    Verifier
        Flavor: RPCSEC_GSS (6)
        GSS Token: 0000001c040404ffffffffff000000001bb5b424f2b2f47f...
            GSS Token Length: 28
            GSS-API Generic Security Service Application Program Interface
                krb5_blob: 040404ffffffffff000000001bb5b424f2b2f47fb068d17e...
                    krb5_tok_id: KRB_TOKEN_CFX_GetMic (0x0404)
                    krb5_cfx_flags: 0x04
                        .... .1.. = AcceptorSubkey: Set
                        .... ..0. = Sealed: Not set
                        .... ...0 = SendByAcceptor: Not set
                    krb5_filler: ffffffffff
                    krb5_cfx_seq: 464892964
                    krb5_sgn_cksum: f2b2f47fb068d17e40fca641
GSS-Wrap
    Length: 236
    GSS Data: 050406ff00000000000000001bb5b425d745e53e92ca2305...
    krb5_blob: 050406ff00000000000000001bb5b425d745e53e92ca2305...
        krb5_tok_id: KRB_TOKEN_CFX_WRAP (0x0405)
        krb5_cfx_flags: 0x06
            .... .1.. = AcceptorSubkey: Set
            .... ..1. = Sealed: Set
            .... ...0 = SendByAcceptor: Not set
        krb5_filler: ff
        krb5_cfx_ec: 0
        krb5_cfx_rrc: 0
        krb5_cfx_seq: 464892965
        krb5_sgn_cksum: d745e53e92ca2305e3b1f3bdbb6c710bd731fcc684b8dba8...
Network File System
    [Program Version: 4]
    [V4 Procedure: COMPOUND (1)]


And here's a reply:

Frame 9: 518 bytes on wire (4144 bits), 518 bytes captured (4144 bits) on interface 0
    Interface id: 0 (eno1)
    Encapsulation type: Ethernet (1)
    Arrival Time: Feb 29, 2016 16:48:17.377224000 CST
    [Time shift for this packet: 0.000000000 seconds]
    Epoch Time: 1456786097.377224000 seconds
    [Time delta from previous captured frame: 0.000651000 seconds]
    [Time delta from previous displayed frame: 0.000000000 seconds]
    [Time since reference or first frame: 0.003493000 seconds]
    Frame Number: 9
    Frame Length: 518 bytes (4144 bits)
    Capture Length: 518 bytes (4144 bits)
    [Frame is marked: False]
    [Frame is ignored: False]
    [Protocols in frame: eth:ethertype:ip:tcp:rpc:gss-api:spnego-krb5:spnego-krb5]
Ethernet II, Src: SuperMic_19:86:94 (00:25:90:19:86:94), Dst: AsustekC_6d:f1:7e (f0:79:59:6d:f1:7e)
    Destination: AsustekC_6d:f1:7e (f0:79:59:6d:f1:7e)
        Address: AsustekC_6d:f1:7e (f0:79:59:6d:f1:7e)
        .... ..0. .... .... .... .... = LG bit: Globally unique address (factory default)
        .... ...0 .... .... .... .... = IG bit: Individual address (unicast)
    Source: SuperMic_19:86:94 (00:25:90:19:86:94)
        Address: SuperMic_19:86:94 (00:25:90:19:86:94)
        .... ..0. .... .... .... .... = LG bit: Globally unique address (factory default)
        .... ...0 .... .... .... .... = IG bit: Individual address (unicast)
    Type: IP (0x0800)
Internet Protocol Version 4, Src: 172.21.86.78 (172.21.86.78), Dst: 172.21.86.135 (172.21.86.135)
    Version: 4
    Header Length: 20 bytes
    Differentiated Services Field: 0x00 (DSCP 0x00: Default; ECN: 0x00: Not-ECT (Not ECN-Capable Transport))
        0000 00.. = Differentiated Services Codepoint: Default (0x00)
        .... ..00 = Explicit Congestion Notification: Not-ECT (Not ECN-Capable Transport) (0x00)
    Total Length: 504
    Identification: 0x2bf6 (11254)
    Flags: 0x02 (Don't Fragment)
        0... .... = Reserved bit: Not set
        .1.. .... = Don't fragment: Set
        ..0. .... = More fragments: Not set
    Fragment offset: 0
    Time to live: 64
    Protocol: TCP (6)
    Header checksum: 0x080a [validation disabled]
        [Good: False]
        [Bad: False]
    Source: 172.21.86.78 (172.21.86.78)
    Destination: 172.21.86.135 (172.21.86.135)
    [Source GeoIP: Unknown]
    [Destination GeoIP: Unknown]
Transmission Control Protocol, Src Port: 2049 (2049), Dst Port: 863 (863), Seq: 2261, Ack: 1701, Len: 452
    Source Port: 2049 (2049)
    Destination Port: 863 (863)
    [Stream index: 0]
    [TCP Segment Len: 452]
    Sequence number: 2261    (relative sequence number)
    [Next sequence number: 2713    (relative sequence number)]
    Acknowledgment number: 1701    (relative ack number)
    Header Length: 32 bytes
    .... 0000 0001 1000 = Flags: 0x018 (PSH, ACK)
        000. .... .... = Reserved: Not set
        ...0 .... .... = Nonce: Not set
        .... 0... .... = Congestion Window Reduced (CWR): Not set
        .... .0.. .... = ECN-Echo: Not set
        .... ..0. .... = Urgent: Not set
        .... ...1 .... = Acknowledgment: Set
        .... .... 1... = Push: Set
        .... .... .0.. = Reset: Not set
        .... .... ..0. = Syn: Not set
        .... .... ...0 = Fin: Not set
    Window size value: 32885
    [Calculated window size: 32885]
    [Window size scaling factor: -1 (unknown)]
    Checksum: 0x8ac4 [validation disabled]
        [Good Checksum: False]
        [Bad Checksum: False]
    Urgent pointer: 0
    Options: (12 bytes), No-Operation (NOP), No-Operation (NOP), Timestamps
        No-Operation (NOP)
            Type: 1
                0... .... = Copy on fragmentation: No
                .00. .... = Class: Control (0)
                ...0 0001 = Number: No-Operation (NOP) (1)
        No-Operation (NOP)
            Type: 1
                0... .... = Copy on fragmentation: No
                .00. .... = Class: Control (0)
                ...0 0001 = Number: No-Operation (NOP) (1)
        Timestamps: TSval 882989725, TSecr 1220154220
            Kind: Time Stamp Option (8)
            Length: 10
            Timestamp value: 882989725
            Timestamp echo reply: 1220154220
    [SEQ/ACK analysis]
        [This is an ACK to the segment in frame: 8]
        [The RTT to ACK the segment was: 0.000651000 seconds]
        [Bytes in flight: 452]
Remote Procedure Call, Type:Reply XID:0xffb2b86a
    Fragment header: Last fragment, 448 bytes
        1... .... .... .... .... .... .... .... = Last Fragment: Yes
        .000 0000 0000 0000 0000 0001 1100 0000 = Fragment Length: 448
    XID: 0xffb2b86a (4289902698)
    Message Type: Reply (1)
    [Program: NFS (100003)]
    [Program Version: 4]
    [Procedure: COMPOUND (1)]
    Reply State: accepted (0)
    [This is a reply to a request in frame 8]
    [Time from request: 0.000651000 seconds]
    Verifier
        Flavor: RPCSEC_GSS (6)
        GSS Token: 0000001c040405ffffffffff000000003480887866d9d129...
            GSS Token Length: 28
            GSS-API Generic Security Service Application Program Interface
                krb5_blob: 040405ffffffffff000000003480887866d9d129415f2ea7...
                    krb5_tok_id: KRB_TOKEN_CFX_GetMic (0x0404)
                    krb5_cfx_flags: 0x05
                        .... .1.. = AcceptorSubkey: Set
                        .... ..0. = Sealed: Not set
                        .... ...1 = SendByAcceptor: Set
                    krb5_filler: ffffffffff
                    krb5_cfx_seq: 880838776
                    krb5_sgn_cksum: 66d9d129415f2ea76c509189
    Accept State: RPC executed successfully (0)
GSS-Wrap
    Length: 392
    GSS Data: 050407ff000000000000000034808879d55c1d7af7af82f5...
    krb5_blob: 050407ff000000000000000034808879d55c1d7af7af82f5...
        krb5_tok_id: KRB_TOKEN_CFX_WRAP (0x0405)
        krb5_cfx_flags: 0x07
            .... .1.. = AcceptorSubkey: Set
            .... ..1. = Sealed: Set
            .... ...1 = SendByAcceptor: Set
        krb5_filler: ff
        krb5_cfx_ec: 0
        krb5_cfx_rrc: 0
        krb5_cfx_seq: 880838777
        krb5_sgn_cksum: d55c1d7af7af82f5384db9579420a0e8eb2bf20013a66022...
Network File System
    [Program Version: 4]
    [V4 Procedure: COMPOUND (1)]

The calls and replies all appear to be identical.  Sorry for the length
of those but I wouldn't want to trim anything that might be important.

JBF> Unfortunately what would probably be *most* helpful would be the
JBF> traffic that lead up to this--by the time the client and server get
JBF> into this loop the interesting problem may have already
JBF> happened--but just seeing the loop may be useful too.

Yeah, there's basically no chance of capturing that, and I have no way
to make it happen.  I can't just snarf all of the NFS traffic, and once
this starts it generates so many packets....

Note that this doesn't appear to be caused by something like a kerberos
ticket; the user last logged in well under the maximum ticket renewal
time, and SSSD has been dutifully renewing them.

 - J<

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: NFS: nfs4_reclaim_open_state: Lock reclaim failed! log spew
  2016-02-29 23:06   ` Jason L Tibbitts III
@ 2016-03-01  0:48     ` J. Bruce Fields
  2016-03-01  0:53       ` Jason L Tibbitts III
  0 siblings, 1 reply; 27+ messages in thread
From: J. Bruce Fields @ 2016-03-01  0:48 UTC (permalink / raw)
  To: Jason L Tibbitts III; +Cc: linux-nfs

On Mon, Feb 29, 2016 at 05:06:35PM -0600, Jason L Tibbitts III wrote:
> >>>>> "JBF" == J Bruce Fields <bfields@fieldses.org> writes:
> 
> >> Unfortunately I did not grab any of that traffic (I just wanted it to
> >> stop).  This happens to me periodically so I'll be sure to do that
> >> when it hits again.
> 
> JBF> OK, that'd be helpful.
> 
> I waited a bit and it's happened again from a few clients.  I captured
> some traffic from one of them and it's just an endless stream of
> Call/Reply:
> 
>   8 0.002842000 172.21.86.135 -> 172.21.86.78 NFS 406 V4 Call
>   9 0.003493000 172.21.86.78 -> 172.21.86.135 NFS 518 V4 Reply (Call In  8)
>  10 0.003536000 172.21.86.135 -> 172.21.86.78 NFS 406 V4 Call
>  11 0.004168000 172.21.86.78 -> 172.21.86.135 NFS 518 V4 Reply (Call In 10)
>  12 0.004252000 172.21.86.135 -> 172.21.86.78 NFS 406 V4 Call
>  13 0.004854000 172.21.86.78 -> 172.21.86.135 NFS 518 V4 Reply (Call In 12)
>  14 0.004931000 172.21.86.135 -> 172.21.86.78 NFS 406 V4 Call
>  15 0.005613000 172.21.86.78 -> 172.21.86.135 NFS 518 V4 Reply (Call In 14)
> 
> 
> Here's a call:
> 
>         GSS Service: rpcsec_gss_svc_privacy (3)

Argh, it's all encrypted, so we all we have to go on is the size of the
request and reply:

> GSS-Wrap
>     Length: 236
...
> And here's a reply:
...
> GSS-Wrap
>     Length: 392

Anyway, just knowing the error may be enough for us to work out the
problem--I just need to dig a little more.

> The calls and replies all appear to be identical.  Sorry for the length
> of those but I wouldn't want to trim anything that might be important.
> 
> JBF> Unfortunately what would probably be *most* helpful would be the
> JBF> traffic that lead up to this--by the time the client and server get
> JBF> into this loop the interesting problem may have already
> JBF> happened--but just seeing the loop may be useful too.
> 
> Yeah, there's basically no chance of capturing that, and I have no way
> to make it happen.  I can't just snarf all of the NFS traffic, and once
> this starts it generates so many packets....

The best you could do is capture all traffic and throw away all but the
last few seconds (see the ring buffer stuff in tshark) and write a
script that kills the capture as soon as it notices you've hit this
condition.  Anyway, possibly still impractical and not worth the trouble
at this point anyway.

> Note that this doesn't appear to be caused by something like a kerberos
> ticket; the user last logged in well under the maximum ticket renewal
> time, and SSSD has been dutifully renewing them.

OK, thanks.

--b.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: NFS: nfs4_reclaim_open_state: Lock reclaim failed! log spew
  2016-03-01  0:48     ` J. Bruce Fields
@ 2016-03-01  0:53       ` Jason L Tibbitts III
  2016-03-01  1:01         ` J. Bruce Fields
  0 siblings, 1 reply; 27+ messages in thread
From: Jason L Tibbitts III @ 2016-03-01  0:53 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: linux-nfs

>>>>> "JBF" == J Bruce Fields <bfields@fieldses.org> writes:

JBF> Argh, it's all encrypted, so we all we have to go on is the size of
JBF> the request and reply:

Yeah, I'm not sure how to get around that.  I know tshark can take a
keytab, but the user's key is in the kernel keyring and I'm not at all
sure how to dig it out (assuming I even can).  But if there's something
I can do, I can try.

I could switch to krb5i or plain krb for a while if that would be
useful, except that the clients would prefer krb5p if it's exported, and
if I stop exporting it then existing mounts break....  I'd have to
schedule downtime and kick everyone off, which I could do if it would
help.

JBF> The best you could do is capture all traffic and throw away all but
JBF> the last few seconds (see the ring buffer stuff in tshark) and
JBF> write a script that kills the capture as soon as it notices you've
JBF> hit this condition.

If I knew how to detect the condition, though, I have a feeling that
would be enough information to track down the bug anyway.

Also, I'd have to do this for a couple of hundred clients.  Ugh.

 - J<

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: NFS: nfs4_reclaim_open_state: Lock reclaim failed! log spew
  2016-03-01  0:53       ` Jason L Tibbitts III
@ 2016-03-01  1:01         ` J. Bruce Fields
  2016-03-01  1:03           ` Jason L Tibbitts III
  0 siblings, 1 reply; 27+ messages in thread
From: J. Bruce Fields @ 2016-03-01  1:01 UTC (permalink / raw)
  To: Jason L Tibbitts III; +Cc: linux-nfs

On Mon, Feb 29, 2016 at 06:53:15PM -0600, Jason L Tibbitts III wrote:
> >>>>> "JBF" == J Bruce Fields <bfields@fieldses.org> writes:
> 
> JBF> Argh, it's all encrypted, so we all we have to go on is the size of
> JBF> the request and reply:
> 
> Yeah, I'm not sure how to get around that.  I know tshark can take a
> keytab, but the user's key is in the kernel keyring and I'm not at all
> sure how to dig it out (assuming I even can).  But if there's something
> I can do, I can try.
> 
> I could switch to krb5i or plain krb for a while if that would be
> useful, except that the clients would prefer krb5p if it's exported, and
> if I stop exporting it then existing mounts break....  I'd have to
> schedule downtime and kick everyone off, which I could do if it would
> help.

I believe the order of flavors in the sec= option is the order they're
given to the client, so reversing it might cause new clients to make a
different choice--but I'm not actually sure of the logic there.

> 
> JBF> The best you could do is capture all traffic and throw away all but
> JBF> the last few seconds (see the ring buffer stuff in tshark) and
> JBF> write a script that kills the capture as soon as it notices you've
> JBF> hit this condition.
> 
> If I knew how to detect the condition, though, I have a feeling that
> would be enough information to track down the bug anyway.
> 
> Also, I'd have to do this for a couple of hundred clients.  Ugh.

Yeah, don't worry about it, I was just thinking aloud.

--b.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: NFS: nfs4_reclaim_open_state: Lock reclaim failed! log spew
  2016-03-01  1:01         ` J. Bruce Fields
@ 2016-03-01  1:03           ` Jason L Tibbitts III
  2016-11-16 20:55             ` Jason L Tibbitts III
  0 siblings, 1 reply; 27+ messages in thread
From: Jason L Tibbitts III @ 2016-03-01  1:03 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: linux-nfs

>>>>> "JBF" == J Bruce Fields <bfields@fieldses.org> writes:

JBF> I believe the order of flavors in the sec= option is the order
JBF> they're given to the client, so reversing it might cause new
JBF> clients to make a different choice--but I'm not actually sure of
JBF> the logic there.

I've found it not to actually be stable; different Linux versions will
do different things.  But then, almost all of the desktops are now
running a single OS release so unless it's simply random, that should be
good enough.  I'll add krb5 to the exports and try to catch this again.

 - J<

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: NFS: nfs4_reclaim_open_state: Lock reclaim failed! log spew
  2016-03-01  1:03           ` Jason L Tibbitts III
@ 2016-11-16 20:55             ` Jason L Tibbitts III
  2016-11-17 16:31               ` J. Bruce Fields
  0 siblings, 1 reply; 27+ messages in thread
From: Jason L Tibbitts III @ 2016-11-16 20:55 UTC (permalink / raw)
  To: linux-nfs

I'm replying to a rather old message, but the issue has just now popped
back up again.

To recap, a client stops being able to access _any_ mount on a
particular server, and "NFS: nfs4_reclaim_open_state: Lock reclaim
failed!" appears several hundred times per second in the kernel log.
The load goes up by one for ever process attempting to access any mount
from that particular server.  Mounts to other servers are fine, and
other clients can mount things from that one server without problems.

When I kill every process keeping that particular mount active and then
umount it, I see:

NFS: nfs4_reclaim_open_state: unhandled error -10068

The server in this case is running Centos 7: 3.10.0-327.36.3.el7.x86_64
The client is one of the last I have that's still running Fedora 23:
4.7.10-100.fc23.x86_64

The big difference now, besides the updated kernel versions, is that
fact that no kerberos is involved.  It's a sec=sys mount.  There are
kerberized exports on the same server, but the client wasn't accessing
any of them.  There were other kerberized mounts on the client, but they
were to different servers and they remained accessible.

Here's ten frames from a packet capture.  I think it's all repeating at
that point so maybe it will be enough, but I can provide a more complete
capture log on request.

 - J<

Frame 1: 390 bytes on wire (3120 bits), 390 bytes captured (3120 bits) on interface 0
Ethernet II, Src: SuperMic_19:8e:6e (00:25:90:19:8e:6e), Dst: AsrockIn_84:34:e7 (d0:50:99:84:34:e7)
Internet Protocol Version 4, Src: 172.21.86.74, Dst: 172.21.86.141
Transmission Control Protocol, Src Port: 2049, Dst Port: 997, Seq: 1, Ack: 1, Len: 324
Remote Procedure Call

Frame 2: 322 bytes on wire (2576 bits), 322 bytes captured (2576 bits) on interface 0
Ethernet II, Src: AsrockIn_84:34:e7 (d0:50:99:84:34:e7), Dst: SuperMic_19:8e:6e (00:25:90:19:8e:6e)
Internet Protocol Version 4, Src: 172.21.86.141, Dst: 172.21.86.74
Transmission Control Protocol, Src Port: 997, Dst Port: 2049, Seq: 1, Ack: 325, Len: 256
Remote Procedure Call, Type:Call XID:0x29009a8c
Network File System, Ops(5): SEQUENCE, PUTFH, OPEN, ACCESS, GETATTR
    [Program Version: 4]
    [V4 Procedure: COMPOUND (1)]
    Tag: <EMPTY>
        length: 0
        contents: <EMPTY>
    minorversion: 1
    Operations (count: 5): SEQUENCE, PUTFH, OPEN, ACCESS, GETATTR
        Opcode: SEQUENCE (53)
            sessionid: f0f40f58bd020000e002000000000000
            seqid: 0x002b3189
            slot id: 0
            high slot id: 0
            cache this?: Yes
        Opcode: PUTFH (22)
            filehandle
                length: 28
                [hash (CRC-32): 0xdcaec4f7]
                filehandle: 010006016aeda54305f66ede000000000000000096484800...
        Opcode: OPEN (18)
            seqid: 0x00000000
            share_access: OPEN4_SHARE_ACCESS_BOTH (3)
            share_deny: OPEN4_SHARE_DENY_NONE (0)
            clientid: 0xf0f40f58bd020000
            owner: <DATA>
                length: 24
                contents: <DATA>
            Open Type: OPEN4_NOCREATE (0)
            Claim Type: CLAIM_FH (4)
        Opcode: ACCESS (3), [Check: RD MD XT XE]
            Check access: 0x2d
                .... ...1 = 0x01 READ: allowed?
                .... .1.. = 0x04 MODIFY: allowed?
                .... 1... = 0x08 EXTEND: allowed?
                ..1. .... = 0x20 EXECUTE: allowed?
        Opcode: GETATTR (9)
            Attr mask[0]: 0x0010011a (Type, Change, Size, FSID, FileId)
                reqd_attr: Type (1)
                reqd_attr: Change (3)
                reqd_attr: Size (4)
                reqd_attr: FSID (8)
                reco_attr: FileId (20)
            Attr mask[1]: 0x00b0a23a (Mode, NumLinks, Owner, Owner_Group, RawDev, Space_Used, Time_Access, Time_Metadata, Time_Modify, Mounted_on_FileId)
                reco_attr: Mode (33)
                reco_attr: NumLinks (35)
                reco_attr: Owner (36)
                reco_attr: Owner_Group (37)
                reco_attr: RawDev (41)
                reco_attr: Space_Used (45)
                reco_attr: Time_Access (47)
                reco_attr: Time_Metadata (52)
                reco_attr: Time_Modify (53)
                reco_attr: Mounted_on_FileId (55)
    [Main Opcode: OPEN (18)]

Frame 3: 390 bytes on wire (3120 bits), 390 bytes captured (3120 bits) on interface 0
Ethernet II, Src: SuperMic_19:8e:6e (00:25:90:19:8e:6e), Dst: AsrockIn_84:34:e7 (d0:50:99:84:34:e7)
Internet Protocol Version 4, Src: 172.21.86.74, Dst: 172.21.86.141
Transmission Control Protocol, Src Port: 2049, Dst Port: 997, Seq: 325, Ack: 257, Len: 324
Remote Procedure Call, Type:Reply XID:0x29009a8c
Network File System, Ops(5): SEQUENCE PUTFH OPEN ACCESS GETATTR
    [Program Version: 4]
    [V4 Procedure: COMPOUND (1)]
    Status: NFS4_OK (0)
    Tag: <EMPTY>
        length: 0
        contents: <EMPTY>
    Operations (count: 5)
        Opcode: SEQUENCE (53)
            Status: NFS4_OK (0)
            sessionid: f0f40f58bd020000e002000000000000
            seqid: 0x002b3189
            slot id: 0
            high slot id: 30
            target high slot id: 30
            status flags: 0x00000040, SEQ4_STATUS_RECALLABLE_STATE_REVOKED
                .... .... .... .... .... .... .... ...0 = SEQ4_STATUS_CB_PATH_DOWN: Not set
                .... .... .... .... .... .... .... ..0. = SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRING: Not set
                .... .... .... .... .... .... .... .0.. = SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRED: Not set
                .... .... .... .... .... .... .... 0... = SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED: Not set
                .... .... .... .... .... .... ...0 .... = SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED: Not set
                .... .... .... .... .... .... ..0. .... = SEQ4_STATUS_ADMIN_STATE_REVOKED: Not set
                .... .... .... .... .... .... .1.. .... = SEQ4_STATUS_RECALLABLE_STATE_REVOKED: Set
                .... .... .... .... .... .... 0... .... = SEQ4_STATUS_LEASE_MOVED: Not set
                .... .... .... .... .... ...0 .... .... = SEQ4_STATUS_RESTART_RECLAIM_NEEDED: Not set
                .... .... .... .... .... ..0. .... .... = SEQ4_STATUS_CB_PATH_DOWN_SESSION: Not set
                .... .... .... .... .... .0.. .... .... = SEQ4_STATUS_BACKCHANNEL_FAULT: Not set
                .... .... .... .... .... 0... .... .... = SEQ4_STATUS_DEVID_CHANGED: Not set
                .... .... .... .... ...0 .... .... .... = SEQ4_STATUS_DEVID_DELETED: Not set
        Opcode: PUTFH (22)
            Status: NFS4_OK (0)
        Opcode: OPEN (18)
            Status: NFS4_OK (0)
            stateid
                [StateID Hash: 0xb207]
                seqid: 0x00158249
                Data: f0f40f58bd02000092000000
                [Data hash (CRC-32): 0xf3e67f06]
            change_info
                Atomic: No
                changeid (before): 0
                changeid (after): 0
            result flags: 0x00000004, locktype posix
                .... .... .... .... .... .... .... ..0. = confirm: False
                .... .... .... .... .... .... .... .1.. = locktype posix: True
                .... .... .... .... .... .... .... 0... = preserve unlinked: False
                .... .... .... .... .... .... ..0. .... = may notify lock: False
            Delegation Type: OPEN_DELEGATE_NONE (0)
        Opcode: ACCESS (3), [Access Denied: XE], [Allowed: RD MD XT]
            Status: NFS4_OK (0)
            Supported types (of requested): 0x2d
                .... ...1 = 0x01 READ: supported
                .... .1.. = 0x04 MODIFY: supported
                .... 1... = 0x08 EXTEND: supported
                ..1. .... = 0x20 EXECUTE: supported
            Access rights (of requested): 0x0d
                .... ...1 = 0x01 READ: allowed
                .... .1.. = 0x04 MODIFY: allowed
                .... 1... = 0x08 EXTEND: allowed
                ..0. .... = 0x20 EXECUTE: *Access Denied*
        Opcode: GETATTR (9)
            Status: NFS4_OK (0)
            Attr mask[0]: 0x0010011a (Type, Change, Size, FSID, FileId)
                reqd_attr: Type (1)
                    ftype4: NF4REG (1)
                reqd_attr: Change (3)
                    changeid: 6353654913397708009
                reqd_attr: Size (4)
                    size: 32768
                reqd_attr: FSID (8)
                    fattr4_fsid
                        fsid4.major: 7704996244782214878
                        fsid4.minor: 0
                reco_attr: FileId (20)
                    fileid: 4737174
            Attr mask[1]: 0x00b0a23a (Mode, NumLinks, Owner, Owner_Group, RawDev, Space_Used, Time_Access, Time_Metadata, Time_Modify, Mounted_on_FileId)
                reco_attr: Mode (33)
                    mode: 0644, Name: Unknown, Read permission for owner, Write permission for owner, Read permission for group, Read permission for others
                        .... .... .... .... 000. .... .... .... = Name: Unknown (0)
                        .... .... .... .... .... 0... .... .... = Set user id on exec: No
                        .... .... .... .... .... .0.. .... .... = Set group id on exec: No
                        .... .... .... .... .... ..0. .... .... = Save swapped text even after use: No
                        .... .... .... .... .... ...1 .... .... = Read permission for owner: Yes
                        .... .... .... .... .... .... 1... .... = Write permission for owner: Yes
                        .... .... .... .... .... .... .0.. .... = Execute permission for owner: No
                        .... .... .... .... .... .... ..1. .... = Read permission for group: Yes
                        .... .... .... .... .... .... ...0 .... = Write permission for group: No
                        .... .... .... .... .... .... .... 0... = Execute permission for group: No
                        .... .... .... .... .... .... .... .1.. = Read permission for others: Yes
                        .... .... .... .... .... .... .... ..0. = Write permission for others: No
                        .... .... .... .... .... .... .... ...0 = Execute permission for others: No
                reco_attr: NumLinks (35)
                    numlinks: 1
                reco_attr: Owner (36)
                    fattr4_owner: 7143
                        length: 4
                        contents: 7143
                reco_attr: Owner_Group (37)
                    fattr4_owner_group: 1112
                        length: 4
                        contents: 1112
                reco_attr: RawDev (41)
                    specdata1: 0
                    specdata2: 0
                reco_attr: Space_Used (45)
                    space_used: 32768
                reco_attr: Time_Access (47)
                    seconds: 1479318517
                    nseconds: 593359867
                reco_attr: Time_Metadata (52)
                    seconds: 1479325563
                    nseconds: 175920361
                reco_attr: Time_Modify (53)
                    seconds: 1479325563
                    nseconds: 175920361
                reco_attr: Mounted_on_FileId (55)
                    fileid: 0x0000000000484896
    [Main Opcode: OPEN (18)]

Frame 4: 322 bytes on wire (2576 bits), 322 bytes captured (2576 bits) on interface 0
Ethernet II, Src: AsrockIn_84:34:e7 (d0:50:99:84:34:e7), Dst: SuperMic_19:8e:6e (00:25:90:19:8e:6e)
Internet Protocol Version 4, Src: 172.21.86.141, Dst: 172.21.86.74
Transmission Control Protocol, Src Port: 997, Dst Port: 2049, Seq: 257, Ack: 649, Len: 256
Remote Procedure Call, Type:Call XID:0x2a009a8c
Network File System, Ops(5): SEQUENCE, PUTFH, OPEN, ACCESS, GETATTR
    [Program Version: 4]
    [V4 Procedure: COMPOUND (1)]
    Tag: <EMPTY>
        length: 0
        contents: <EMPTY>
    minorversion: 1
    Operations (count: 5): SEQUENCE, PUTFH, OPEN, ACCESS, GETATTR
        Opcode: SEQUENCE (53)
            sessionid: f0f40f58bd020000e002000000000000
            seqid: 0x002b318a
            slot id: 0
            high slot id: 0
            cache this?: Yes
        Opcode: PUTFH (22)
            filehandle
                length: 28
                [hash (CRC-32): 0x26ab9d8d]
                filehandle: 010006016aeda54305f66ede00000000000000009a424800...
        Opcode: OPEN (18)
            seqid: 0x00000000
            share_access: OPEN4_SHARE_ACCESS_BOTH (3)
            share_deny: OPEN4_SHARE_DENY_NONE (0)
            clientid: 0xf0f40f58bd020000
            owner: <DATA>
                length: 24
                contents: <DATA>
            Open Type: OPEN4_NOCREATE (0)
            Claim Type: CLAIM_FH (4)
        Opcode: ACCESS (3), [Check: RD MD XT XE]
            Check access: 0x2d
                .... ...1 = 0x01 READ: allowed?
                .... .1.. = 0x04 MODIFY: allowed?
                .... 1... = 0x08 EXTEND: allowed?
                ..1. .... = 0x20 EXECUTE: allowed?
        Opcode: GETATTR (9)
            Attr mask[0]: 0x0010011a (Type, Change, Size, FSID, FileId)
                reqd_attr: Type (1)
                reqd_attr: Change (3)
                reqd_attr: Size (4)
                reqd_attr: FSID (8)
                reco_attr: FileId (20)
            Attr mask[1]: 0x00b0a23a (Mode, NumLinks, Owner, Owner_Group, RawDev, Space_Used, Time_Access, Time_Metadata, Time_Modify, Mounted_on_FileId)
                reco_attr: Mode (33)
                reco_attr: NumLinks (35)
                reco_attr: Owner (36)
                reco_attr: Owner_Group (37)
                reco_attr: RawDev (41)
                reco_attr: Space_Used (45)
                reco_attr: Time_Access (47)
                reco_attr: Time_Metadata (52)
                reco_attr: Time_Modify (53)
                reco_attr: Mounted_on_FileId (55)
    [Main Opcode: OPEN (18)]

Frame 5: 322 bytes on wire (2576 bits), 322 bytes captured (2576 bits) on interface 0
Ethernet II, Src: AsrockIn_84:34:e7 (d0:50:99:84:34:e7), Dst: SuperMic_19:8e:6e (00:25:90:19:8e:6e)
Internet Protocol Version 4, Src: 172.21.86.141, Dst: 172.21.86.74
Transmission Control Protocol, Src Port: 997, Dst Port: 2049, Seq: 769, Ack: 1297, Len: 256
Remote Procedure Call, Type:Call XID:0x2c009a8c
Network File System, Ops(5): SEQUENCE, PUTFH, OPEN, ACCESS, GETATTR
    [Program Version: 4]
    [V4 Procedure: COMPOUND (1)]
    Tag: <EMPTY>
        length: 0
        contents: <EMPTY>
    minorversion: 1
    Operations (count: 5): SEQUENCE, PUTFH, OPEN, ACCESS, GETATTR
        Opcode: SEQUENCE (53)
            sessionid: f0f40f58bd020000e002000000000000
            seqid: 0x002b318c
            slot id: 0
            high slot id: 0
            cache this?: Yes
        Opcode: PUTFH (22)
            filehandle
                length: 28
                [hash (CRC-32): 0x26ab9d8d]
                filehandle: 010006016aeda54305f66ede00000000000000009a424800...
        Opcode: OPEN (18)
            seqid: 0x00000000
            share_access: OPEN4_SHARE_ACCESS_BOTH (3)
            share_deny: OPEN4_SHARE_DENY_NONE (0)
            clientid: 0xf0f40f58bd020000
            owner: <DATA>
                length: 24
                contents: <DATA>
            Open Type: OPEN4_NOCREATE (0)
            Claim Type: CLAIM_FH (4)
        Opcode: ACCESS (3), [Check: RD MD XT XE]
            Check access: 0x2d
                .... ...1 = 0x01 READ: allowed?
                .... .1.. = 0x04 MODIFY: allowed?
                .... 1... = 0x08 EXTEND: allowed?
                ..1. .... = 0x20 EXECUTE: allowed?
        Opcode: GETATTR (9)
            Attr mask[0]: 0x0010011a (Type, Change, Size, FSID, FileId)
                reqd_attr: Type (1)
                reqd_attr: Change (3)
                reqd_attr: Size (4)
                reqd_attr: FSID (8)
                reco_attr: FileId (20)
            Attr mask[1]: 0x00b0a23a (Mode, NumLinks, Owner, Owner_Group, RawDev, Space_Used, Time_Access, Time_Metadata, Time_Modify, Mounted_on_FileId)
                reco_attr: Mode (33)
                reco_attr: NumLinks (35)
                reco_attr: Owner (36)
                reco_attr: Owner_Group (37)
                reco_attr: RawDev (41)
                reco_attr: Space_Used (45)
                reco_attr: Time_Access (47)
                reco_attr: Time_Metadata (52)
                reco_attr: Time_Modify (53)
                reco_attr: Mounted_on_FileId (55)
    [Main Opcode: OPEN (18)]

Frame 6: 390 bytes on wire (3120 bits), 390 bytes captured (3120 bits) on interface 0
Ethernet II, Src: SuperMic_19:8e:6e (00:25:90:19:8e:6e), Dst: AsrockIn_84:34:e7 (d0:50:99:84:34:e7)
Internet Protocol Version 4, Src: 172.21.86.74, Dst: 172.21.86.141
Transmission Control Protocol, Src Port: 2049, Dst Port: 997, Seq: 1297, Ack: 1025, Len: 324
Remote Procedure Call, Type:Reply XID:0x2c009a8c
Network File System, Ops(5): SEQUENCE PUTFH OPEN ACCESS GETATTR
    [Program Version: 4]
    [V4 Procedure: COMPOUND (1)]
    Status: NFS4_OK (0)
    Tag: <EMPTY>
        length: 0
        contents: <EMPTY>
    Operations (count: 5)
        Opcode: SEQUENCE (53)
            Status: NFS4_OK (0)
            sessionid: f0f40f58bd020000e002000000000000
            seqid: 0x002b318c
            slot id: 0
            high slot id: 30
            target high slot id: 30
            status flags: 0x00000040, SEQ4_STATUS_RECALLABLE_STATE_REVOKED
                .... .... .... .... .... .... .... ...0 = SEQ4_STATUS_CB_PATH_DOWN: Not set
                .... .... .... .... .... .... .... ..0. = SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRING: Not set
                .... .... .... .... .... .... .... .0.. = SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRED: Not set
                .... .... .... .... .... .... .... 0... = SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED: Not set
                .... .... .... .... .... .... ...0 .... = SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED: Not set
                .... .... .... .... .... .... ..0. .... = SEQ4_STATUS_ADMIN_STATE_REVOKED: Not set
                .... .... .... .... .... .... .1.. .... = SEQ4_STATUS_RECALLABLE_STATE_REVOKED: Set
                .... .... .... .... .... .... 0... .... = SEQ4_STATUS_LEASE_MOVED: Not set
                .... .... .... .... .... ...0 .... .... = SEQ4_STATUS_RESTART_RECLAIM_NEEDED: Not set
                .... .... .... .... .... ..0. .... .... = SEQ4_STATUS_CB_PATH_DOWN_SESSION: Not set
                .... .... .... .... .... .0.. .... .... = SEQ4_STATUS_BACKCHANNEL_FAULT: Not set
                .... .... .... .... .... 0... .... .... = SEQ4_STATUS_DEVID_CHANGED: Not set
                .... .... .... .... ...0 .... .... .... = SEQ4_STATUS_DEVID_DELETED: Not set
        Opcode: PUTFH (22)
            Status: NFS4_OK (0)
        Opcode: OPEN (18)
            Status: NFS4_OK (0)
            stateid
                [StateID Hash: 0xb9c3]
                seqid: 0x0015824c
                Data: f0f40f58bd0200009e000000
                [Data hash (CRC-32): 0xb930c0be]
            change_info
                Atomic: No
                changeid (before): 0
                changeid (after): 0
            result flags: 0x00000004, locktype posix
                .... .... .... .... .... .... .... ..0. = confirm: False
                .... .... .... .... .... .... .... .1.. = locktype posix: True
                .... .... .... .... .... .... .... 0... = preserve unlinked: False
                .... .... .... .... .... .... ..0. .... = may notify lock: False
            Delegation Type: OPEN_DELEGATE_NONE (0)
        Opcode: ACCESS (3), [Access Denied: XE], [Allowed: RD MD XT]
            Status: NFS4_OK (0)
            Supported types (of requested): 0x2d
                .... ...1 = 0x01 READ: supported
                .... .1.. = 0x04 MODIFY: supported
                .... 1... = 0x08 EXTEND: supported
                ..1. .... = 0x20 EXECUTE: supported
            Access rights (of requested): 0x0d
                .... ...1 = 0x01 READ: allowed
                .... .1.. = 0x04 MODIFY: allowed
                .... 1... = 0x08 EXTEND: allowed
                ..0. .... = 0x20 EXECUTE: *Access Denied*
        Opcode: GETATTR (9)
            Status: NFS4_OK (0)
            Attr mask[0]: 0x0010011a (Type, Change, Size, FSID, FileId)
                reqd_attr: Type (1)
                    ftype4: NF4REG (1)
                reqd_attr: Change (3)
                    changeid: 6310935438380194724
                reqd_attr: Size (4)
                    size: 24576
                reqd_attr: FSID (8)
                    fattr4_fsid
                        fsid4.major: 7704996244782214878
                        fsid4.minor: 0
                reco_attr: FileId (20)
                    fileid: 4735642
            Attr mask[1]: 0x00b0a23a (Mode, NumLinks, Owner, Owner_Group, RawDev, Space_Used, Time_Access, Time_Metadata, Time_Modify, Mounted_on_FileId)
                reco_attr: Mode (33)
                    mode: 0644, Name: Unknown, Read permission for owner, Write permission for owner, Read permission for group, Read permission for others
                        .... .... .... .... 000. .... .... .... = Name: Unknown (0)
                        .... .... .... .... .... 0... .... .... = Set user id on exec: No
                        .... .... .... .... .... .0.. .... .... = Set group id on exec: No
                        .... .... .... .... .... ..0. .... .... = Save swapped text even after use: No
                        .... .... .... .... .... ...1 .... .... = Read permission for owner: Yes
                        .... .... .... .... .... .... 1... .... = Write permission for owner: Yes
                        .... .... .... .... .... .... .0.. .... = Execute permission for owner: No
                        .... .... .... .... .... .... ..1. .... = Read permission for group: Yes
                        .... .... .... .... .... .... ...0 .... = Write permission for group: No
                        .... .... .... .... .... .... .... 0... = Execute permission for group: No
                        .... .... .... .... .... .... .... .1.. = Read permission for others: Yes
                        .... .... .... .... .... .... .... ..0. = Write permission for others: No
                        .... .... .... .... .... .... .... ...0 = Execute permission for others: No
                reco_attr: NumLinks (35)
                    numlinks: 1
                reco_attr: Owner (36)
                    fattr4_owner: 7143
                        length: 4
                        contents: 7143
                reco_attr: Owner_Group (37)
                    fattr4_owner_group: 1112
                        length: 4
                        contents: 1112
                reco_attr: RawDev (41)
                    specdata1: 0
                    specdata2: 0
                reco_attr: Space_Used (45)
                    space_used: 24576
                reco_attr: Time_Access (47)
                    seconds: 1479325563
                    nseconds: 305919836
                reco_attr: Time_Metadata (52)
                    seconds: 1469379160
                    nseconds: 756243364
                reco_attr: Time_Modify (53)
                    seconds: 1469379160
                    nseconds: 756243364
                reco_attr: Mounted_on_FileId (55)
                    fileid: 0x000000000048429a
    [Main Opcode: OPEN (18)]

Frame 7: 322 bytes on wire (2576 bits), 322 bytes captured (2576 bits) on interface 0
Ethernet II, Src: AsrockIn_84:34:e7 (d0:50:99:84:34:e7), Dst: SuperMic_19:8e:6e (00:25:90:19:8e:6e)
Internet Protocol Version 4, Src: 172.21.86.141, Dst: 172.21.86.74
Transmission Control Protocol, Src Port: 997, Dst Port: 2049, Seq: 1025, Ack: 1621, Len: 256
Remote Procedure Call, Type:Call XID:0x2d009a8c
Network File System, Ops(5): SEQUENCE, PUTFH, OPEN, ACCESS, GETATTR
    [Program Version: 4]
    [V4 Procedure: COMPOUND (1)]
    Tag: <EMPTY>
        length: 0
        contents: <EMPTY>
    minorversion: 1
    Operations (count: 5): SEQUENCE, PUTFH, OPEN, ACCESS, GETATTR
        Opcode: SEQUENCE (53)
            sessionid: f0f40f58bd020000e002000000000000
            seqid: 0x002b318d
            slot id: 0
            high slot id: 0
            cache this?: Yes
        Opcode: PUTFH (22)
            filehandle
                length: 28
                [hash (CRC-32): 0xdcaec4f7]
                filehandle: 010006016aeda54305f66ede000000000000000096484800...
        Opcode: OPEN (18)
            seqid: 0x00000000
            share_access: OPEN4_SHARE_ACCESS_BOTH (3)
            share_deny: OPEN4_SHARE_DENY_NONE (0)
            clientid: 0xf0f40f58bd020000
            owner: <DATA>
                length: 24
                contents: <DATA>
            Open Type: OPEN4_NOCREATE (0)
            Claim Type: CLAIM_FH (4)
        Opcode: ACCESS (3), [Check: RD MD XT XE]
            Check access: 0x2d
                .... ...1 = 0x01 READ: allowed?
                .... .1.. = 0x04 MODIFY: allowed?
                .... 1... = 0x08 EXTEND: allowed?
                ..1. .... = 0x20 EXECUTE: allowed?
        Opcode: GETATTR (9)
            Attr mask[0]: 0x0010011a (Type, Change, Size, FSID, FileId)
                reqd_attr: Type (1)
                reqd_attr: Change (3)
                reqd_attr: Size (4)
                reqd_attr: FSID (8)
                reco_attr: FileId (20)
            Attr mask[1]: 0x00b0a23a (Mode, NumLinks, Owner, Owner_Group, RawDev, Space_Used, Time_Access, Time_Metadata, Time_Modify, Mounted_on_FileId)
                reco_attr: Mode (33)
                reco_attr: NumLinks (35)
                reco_attr: Owner (36)
                reco_attr: Owner_Group (37)
                reco_attr: RawDev (41)
                reco_attr: Space_Used (45)
                reco_attr: Time_Access (47)
                reco_attr: Time_Metadata (52)
                reco_attr: Time_Modify (53)
                reco_attr: Mounted_on_FileId (55)
    [Main Opcode: OPEN (18)]

Frame 8: 390 bytes on wire (3120 bits), 390 bytes captured (3120 bits) on interface 0
Ethernet II, Src: SuperMic_19:8e:6e (00:25:90:19:8e:6e), Dst: AsrockIn_84:34:e7 (d0:50:99:84:34:e7)
Internet Protocol Version 4, Src: 172.21.86.74, Dst: 172.21.86.141
Transmission Control Protocol, Src Port: 2049, Dst Port: 997, Seq: 1621, Ack: 1281, Len: 324
Remote Procedure Call, Type:Reply XID:0x2d009a8c
Network File System, Ops(5): SEQUENCE PUTFH OPEN ACCESS GETATTR
    [Program Version: 4]
    [V4 Procedure: COMPOUND (1)]
    Status: NFS4_OK (0)
    Tag: <EMPTY>
        length: 0
        contents: <EMPTY>
    Operations (count: 5)
        Opcode: SEQUENCE (53)
            Status: NFS4_OK (0)
            sessionid: f0f40f58bd020000e002000000000000
            seqid: 0x002b318d
            slot id: 0
            high slot id: 30
            target high slot id: 30
            status flags: 0x00000040, SEQ4_STATUS_RECALLABLE_STATE_REVOKED
                .... .... .... .... .... .... .... ...0 = SEQ4_STATUS_CB_PATH_DOWN: Not set
                .... .... .... .... .... .... .... ..0. = SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRING: Not set
                .... .... .... .... .... .... .... .0.. = SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRED: Not set
                .... .... .... .... .... .... .... 0... = SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED: Not set
                .... .... .... .... .... .... ...0 .... = SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED: Not set
                .... .... .... .... .... .... ..0. .... = SEQ4_STATUS_ADMIN_STATE_REVOKED: Not set
                .... .... .... .... .... .... .1.. .... = SEQ4_STATUS_RECALLABLE_STATE_REVOKED: Set
                .... .... .... .... .... .... 0... .... = SEQ4_STATUS_LEASE_MOVED: Not set
                .... .... .... .... .... ...0 .... .... = SEQ4_STATUS_RESTART_RECLAIM_NEEDED: Not set
                .... .... .... .... .... ..0. .... .... = SEQ4_STATUS_CB_PATH_DOWN_SESSION: Not set
                .... .... .... .... .... .0.. .... .... = SEQ4_STATUS_BACKCHANNEL_FAULT: Not set
                .... .... .... .... .... 0... .... .... = SEQ4_STATUS_DEVID_CHANGED: Not set
                .... .... .... .... ...0 .... .... .... = SEQ4_STATUS_DEVID_DELETED: Not set
        Opcode: PUTFH (22)
            Status: NFS4_OK (0)
        Opcode: OPEN (18)
            Status: NFS4_OK (0)
            stateid
                [StateID Hash: 0x1925]
                seqid: 0x0015824b
                Data: f0f40f58bd02000092000000
                [Data hash (CRC-32): 0xf3e67f06]
            change_info
                Atomic: No
                changeid (before): 0
                changeid (after): 0
            result flags: 0x00000004, locktype posix
                .... .... .... .... .... .... .... ..0. = confirm: False
                .... .... .... .... .... .... .... .1.. = locktype posix: True
                .... .... .... .... .... .... .... 0... = preserve unlinked: False
                .... .... .... .... .... .... ..0. .... = may notify lock: False
            Delegation Type: OPEN_DELEGATE_NONE (0)
        Opcode: ACCESS (3), [Access Denied: XE], [Allowed: RD MD XT]
            Status: NFS4_OK (0)
            Supported types (of requested): 0x2d
                .... ...1 = 0x01 READ: supported
                .... .1.. = 0x04 MODIFY: supported
                .... 1... = 0x08 EXTEND: supported
                ..1. .... = 0x20 EXECUTE: supported
            Access rights (of requested): 0x0d
                .... ...1 = 0x01 READ: allowed
                .... .1.. = 0x04 MODIFY: allowed
                .... 1... = 0x08 EXTEND: allowed
                ..0. .... = 0x20 EXECUTE: *Access Denied*
        Opcode: GETATTR (9)
            Status: NFS4_OK (0)
            Attr mask[0]: 0x0010011a (Type, Change, Size, FSID, FileId)
                reqd_attr: Type (1)
                    ftype4: NF4REG (1)
                reqd_attr: Change (3)
                    changeid: 6353654913397708009
                reqd_attr: Size (4)
                    size: 32768
                reqd_attr: FSID (8)
                    fattr4_fsid
                        fsid4.major: 7704996244782214878
                        fsid4.minor: 0
                reco_attr: FileId (20)
                    fileid: 4737174
            Attr mask[1]: 0x00b0a23a (Mode, NumLinks, Owner, Owner_Group, RawDev, Space_Used, Time_Access, Time_Metadata, Time_Modify, Mounted_on_FileId)
                reco_attr: Mode (33)
                    mode: 0644, Name: Unknown, Read permission for owner, Write permission for owner, Read permission for group, Read permission for others
                        .... .... .... .... 000. .... .... .... = Name: Unknown (0)
                        .... .... .... .... .... 0... .... .... = Set user id on exec: No
                        .... .... .... .... .... .0.. .... .... = Set group id on exec: No
                        .... .... .... .... .... ..0. .... .... = Save swapped text even after use: No
                        .... .... .... .... .... ...1 .... .... = Read permission for owner: Yes
                        .... .... .... .... .... .... 1... .... = Write permission for owner: Yes
                        .... .... .... .... .... .... .0.. .... = Execute permission for owner: No
                        .... .... .... .... .... .... ..1. .... = Read permission for group: Yes
                        .... .... .... .... .... .... ...0 .... = Write permission for group: No
                        .... .... .... .... .... .... .... 0... = Execute permission for group: No
                        .... .... .... .... .... .... .... .1.. = Read permission for others: Yes
                        .... .... .... .... .... .... .... ..0. = Write permission for others: No
                        .... .... .... .... .... .... .... ...0 = Execute permission for others: No
                reco_attr: NumLinks (35)
                    numlinks: 1
                reco_attr: Owner (36)
                    fattr4_owner: 7143
                        length: 4
                        contents: 7143
                reco_attr: Owner_Group (37)
                    fattr4_owner_group: 1112
                        length: 4
                        contents: 1112
                reco_attr: RawDev (41)
                    specdata1: 0
                    specdata2: 0
                reco_attr: Space_Used (45)
                    space_used: 32768
                reco_attr: Time_Access (47)
                    seconds: 1479318517
                    nseconds: 593359867
                reco_attr: Time_Metadata (52)
                    seconds: 1479325563
                    nseconds: 175920361
                reco_attr: Time_Modify (53)
                    seconds: 1479325563
                    nseconds: 175920361
                reco_attr: Mounted_on_FileId (55)
                    fileid: 0x0000000000484896
    [Main Opcode: OPEN (18)]

Frame 9: 322 bytes on wire (2576 bits), 322 bytes captured (2576 bits) on interface 0
Ethernet II, Src: AsrockIn_84:34:e7 (d0:50:99:84:34:e7), Dst: SuperMic_19:8e:6e (00:25:90:19:8e:6e)
Internet Protocol Version 4, Src: 172.21.86.141, Dst: 172.21.86.74
Transmission Control Protocol, Src Port: 997, Dst Port: 2049, Seq: 1281, Ack: 1945, Len: 256
Remote Procedure Call, Type:Call XID:0x2e009a8c
Network File System, Ops(5): SEQUENCE, PUTFH, OPEN, ACCESS, GETATTR
    [Program Version: 4]
    [V4 Procedure: COMPOUND (1)]
    Tag: <EMPTY>
        length: 0
        contents: <EMPTY>
    minorversion: 1
    Operations (count: 5): SEQUENCE, PUTFH, OPEN, ACCESS, GETATTR
        Opcode: SEQUENCE (53)
            sessionid: f0f40f58bd020000e002000000000000
            seqid: 0x002b318e
            slot id: 0
            high slot id: 0
            cache this?: Yes
        Opcode: PUTFH (22)
            filehandle
                length: 28
                [hash (CRC-32): 0x26ab9d8d]
                filehandle: 010006016aeda54305f66ede00000000000000009a424800...
        Opcode: OPEN (18)
            seqid: 0x00000000
            share_access: OPEN4_SHARE_ACCESS_BOTH (3)
            share_deny: OPEN4_SHARE_DENY_NONE (0)
            clientid: 0xf0f40f58bd020000
            owner: <DATA>
                length: 24
                contents: <DATA>
            Open Type: OPEN4_NOCREATE (0)
            Claim Type: CLAIM_FH (4)
        Opcode: ACCESS (3), [Check: RD MD XT XE]
            Check access: 0x2d
                .... ...1 = 0x01 READ: allowed?
                .... .1.. = 0x04 MODIFY: allowed?
                .... 1... = 0x08 EXTEND: allowed?
                ..1. .... = 0x20 EXECUTE: allowed?
        Opcode: GETATTR (9)
            Attr mask[0]: 0x0010011a (Type, Change, Size, FSID, FileId)
                reqd_attr: Type (1)
                reqd_attr: Change (3)
                reqd_attr: Size (4)
                reqd_attr: FSID (8)
                reco_attr: FileId (20)
            Attr mask[1]: 0x00b0a23a (Mode, NumLinks, Owner, Owner_Group, RawDev, Space_Used, Time_Access, Time_Metadata, Time_Modify, Mounted_on_FileId)
                reco_attr: Mode (33)
                reco_attr: NumLinks (35)
                reco_attr: Owner (36)
                reco_attr: Owner_Group (37)
                reco_attr: RawDev (41)
                reco_attr: Space_Used (45)
                reco_attr: Time_Access (47)
                reco_attr: Time_Metadata (52)
                reco_attr: Time_Modify (53)
                reco_attr: Mounted_on_FileId (55)
    [Main Opcode: OPEN (18)]

Frame 10: 390 bytes on wire (3120 bits), 390 bytes captured (3120 bits) on interface 0
Ethernet II, Src: SuperMic_19:8e:6e (00:25:90:19:8e:6e), Dst: AsrockIn_84:34:e7 (d0:50:99:84:34:e7)
Internet Protocol Version 4, Src: 172.21.86.74, Dst: 172.21.86.141
Transmission Control Protocol, Src Port: 2049, Dst Port: 997, Seq: 1945, Ack: 1537, Len: 324
Remote Procedure Call, Type:Reply XID:0x2e009a8c
Network File System, Ops(5): SEQUENCE PUTFH OPEN ACCESS GETATTR
    [Program Version: 4]
    [V4 Procedure: COMPOUND (1)]
    Status: NFS4_OK (0)
    Tag: <EMPTY>
        length: 0
        contents: <EMPTY>
    Operations (count: 5)
        Opcode: SEQUENCE (53)
            Status: NFS4_OK (0)
            sessionid: f0f40f58bd020000e002000000000000
            seqid: 0x002b318e
            slot id: 0
            high slot id: 30
            target high slot id: 30
            status flags: 0x00000040, SEQ4_STATUS_RECALLABLE_STATE_REVOKED
                .... .... .... .... .... .... .... ...0 = SEQ4_STATUS_CB_PATH_DOWN: Not set
                .... .... .... .... .... .... .... ..0. = SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRING: Not set
                .... .... .... .... .... .... .... .0.. = SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRED: Not set
                .... .... .... .... .... .... .... 0... = SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED: Not set
                .... .... .... .... .... .... ...0 .... = SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED: Not set
                .... .... .... .... .... .... ..0. .... = SEQ4_STATUS_ADMIN_STATE_REVOKED: Not set
                .... .... .... .... .... .... .1.. .... = SEQ4_STATUS_RECALLABLE_STATE_REVOKED: Set
                .... .... .... .... .... .... 0... .... = SEQ4_STATUS_LEASE_MOVED: Not set
                .... .... .... .... .... ...0 .... .... = SEQ4_STATUS_RESTART_RECLAIM_NEEDED: Not set
                .... .... .... .... .... ..0. .... .... = SEQ4_STATUS_CB_PATH_DOWN_SESSION: Not set
                .... .... .... .... .... .0.. .... .... = SEQ4_STATUS_BACKCHANNEL_FAULT: Not set
                .... .... .... .... .... 0... .... .... = SEQ4_STATUS_DEVID_CHANGED: Not set
                .... .... .... .... ...0 .... .... .... = SEQ4_STATUS_DEVID_DELETED: Not set
        Opcode: PUTFH (22)
            Status: NFS4_OK (0)
        Opcode: OPEN (18)
            Status: NFS4_OK (0)
            stateid
                [StateID Hash: 0xec52]
                seqid: 0x0015824d
                Data: f0f40f58bd0200009e000000
                [Data hash (CRC-32): 0xb930c0be]
            change_info
                Atomic: No
                changeid (before): 0
                changeid (after): 0
            result flags: 0x00000004, locktype posix
                .... .... .... .... .... .... .... ..0. = confirm: False
                .... .... .... .... .... .... .... .1.. = locktype posix: True
                .... .... .... .... .... .... .... 0... = preserve unlinked: False
                .... .... .... .... .... .... ..0. .... = may notify lock: False
            Delegation Type: OPEN_DELEGATE_NONE (0)
        Opcode: ACCESS (3), [Access Denied: XE], [Allowed: RD MD XT]
            Status: NFS4_OK (0)
            Supported types (of requested): 0x2d
                .... ...1 = 0x01 READ: supported
                .... .1.. = 0x04 MODIFY: supported
                .... 1... = 0x08 EXTEND: supported
                ..1. .... = 0x20 EXECUTE: supported
            Access rights (of requested): 0x0d
                .... ...1 = 0x01 READ: allowed
                .... .1.. = 0x04 MODIFY: allowed
                .... 1... = 0x08 EXTEND: allowed
                ..0. .... = 0x20 EXECUTE: *Access Denied*
        Opcode: GETATTR (9)
            Status: NFS4_OK (0)
            Attr mask[0]: 0x0010011a (Type, Change, Size, FSID, FileId)
                reqd_attr: Type (1)
                    ftype4: NF4REG (1)
                reqd_attr: Change (3)
                    changeid: 6310935438380194724
                reqd_attr: Size (4)
                    size: 24576
                reqd_attr: FSID (8)
                    fattr4_fsid
                        fsid4.major: 7704996244782214878
                        fsid4.minor: 0
                reco_attr: FileId (20)
                    fileid: 4735642
            Attr mask[1]: 0x00b0a23a (Mode, NumLinks, Owner, Owner_Group, RawDev, Space_Used, Time_Access, Time_Metadata, Time_Modify, Mounted_on_FileId)
                reco_attr: Mode (33)
                    mode: 0644, Name: Unknown, Read permission for owner, Write permission for owner, Read permission for group, Read permission for others
                        .... .... .... .... 000. .... .... .... = Name: Unknown (0)
                        .... .... .... .... .... 0... .... .... = Set user id on exec: No
                        .... .... .... .... .... .0.. .... .... = Set group id on exec: No
                        .... .... .... .... .... ..0. .... .... = Save swapped text even after use: No
                        .... .... .... .... .... ...1 .... .... = Read permission for owner: Yes
                        .... .... .... .... .... .... 1... .... = Write permission for owner: Yes
                        .... .... .... .... .... .... .0.. .... = Execute permission for owner: No
                        .... .... .... .... .... .... ..1. .... = Read permission for group: Yes
                        .... .... .... .... .... .... ...0 .... = Write permission for group: No
                        .... .... .... .... .... .... .... 0... = Execute permission for group: No
                        .... .... .... .... .... .... .... .1.. = Read permission for others: Yes
                        .... .... .... .... .... .... .... ..0. = Write permission for others: No
                        .... .... .... .... .... .... .... ...0 = Execute permission for others: No
                reco_attr: NumLinks (35)
                    numlinks: 1
                reco_attr: Owner (36)
                    fattr4_owner: 7143
                        length: 4
                        contents: 7143
                reco_attr: Owner_Group (37)
                    fattr4_owner_group: 1112
                        length: 4
                        contents: 1112
                reco_attr: RawDev (41)
                    specdata1: 0
                    specdata2: 0
                reco_attr: Space_Used (45)
                    space_used: 24576
                reco_attr: Time_Access (47)
                    seconds: 1479325563
                    nseconds: 305919836
                reco_attr: Time_Metadata (52)
                    seconds: 1469379160
                    nseconds: 756243364
                reco_attr: Time_Modify (53)
                    seconds: 1469379160
                    nseconds: 756243364
                reco_attr: Mounted_on_FileId (55)
                    fileid: 0x000000000048429a
    [Main Opcode: OPEN (18)]


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: NFS: nfs4_reclaim_open_state: Lock reclaim failed! log spew
  2016-11-16 20:55             ` Jason L Tibbitts III
@ 2016-11-17 16:31               ` J. Bruce Fields
  2016-11-17 17:08                 ` Jason L Tibbitts III
  2016-11-17 17:45                 ` Trond Myklebust
  0 siblings, 2 replies; 27+ messages in thread
From: J. Bruce Fields @ 2016-11-17 16:31 UTC (permalink / raw)
  To: Jason L Tibbitts III; +Cc: linux-nfs

On Wed, Nov 16, 2016 at 02:55:05PM -0600, Jason L Tibbitts III wrote:
> I'm replying to a rather old message, but the issue has just now popped
> back up again.
> 
> To recap, a client stops being able to access _any_ mount on a
> particular server, and "NFS: nfs4_reclaim_open_state: Lock reclaim
> failed!" appears several hundred times per second in the kernel log.
> The load goes up by one for ever process attempting to access any mount
> from that particular server.  Mounts to other servers are fine, and
> other clients can mount things from that one server without problems.
> 
> When I kill every process keeping that particular mount active and then
> umount it, I see:
> 
> NFS: nfs4_reclaim_open_state: unhandled error -10068

NFS4ERR_RETRY_UNCACHED_REP.

So, you're using NFSv4.1 or 4.2, and the server thinks that the client
has reused a (slot, sequence number) pair, but the server doesn't have a
cached response to return.

Hard to know how that happened, and it's not shown in the below.
Sounds like a bug, though.

> The server in this case is running Centos 7: 3.10.0-327.36.3.el7.x86_64
> The client is one of the last I have that's still running Fedora 23:
> 4.7.10-100.fc23.x86_64
> 
> The big difference now, besides the updated kernel versions, is that
> fact that no kerberos is involved.  It's a sec=sys mount.  There are
> kerberized exports on the same server, but the client wasn't accessing
> any of them.  There were other kerberized mounts on the client, but they
> were to different servers and they remained accessible.

Recent clients will use sec=krb5 for certain state-related operations
even if you mount with sec=sys, so it's still possible it could be
involved here.

> Here's ten frames from a packet capture.  I think it's all repeating at
> that point so maybe it will be enough, but I can provide a more complete
> capture log on request.

The SEQ4_STATUS_RECALLABLE_STATE_REVOKED flag set in the OPEN replies is
also a sign something's gone wrong.  Apparently the server thinks the
client has failed to return a delegation.

--b.

> 
>  - J<
> 
> Frame 1: 390 bytes on wire (3120 bits), 390 bytes captured (3120 bits) on interface 0
> Ethernet II, Src: SuperMic_19:8e:6e (00:25:90:19:8e:6e), Dst: AsrockIn_84:34:e7 (d0:50:99:84:34:e7)
> Internet Protocol Version 4, Src: 172.21.86.74, Dst: 172.21.86.141
> Transmission Control Protocol, Src Port: 2049, Dst Port: 997, Seq: 1, Ack: 1, Len: 324
> Remote Procedure Call
> 
> Frame 2: 322 bytes on wire (2576 bits), 322 bytes captured (2576 bits) on interface 0
> Ethernet II, Src: AsrockIn_84:34:e7 (d0:50:99:84:34:e7), Dst: SuperMic_19:8e:6e (00:25:90:19:8e:6e)
> Internet Protocol Version 4, Src: 172.21.86.141, Dst: 172.21.86.74
> Transmission Control Protocol, Src Port: 997, Dst Port: 2049, Seq: 1, Ack: 325, Len: 256
> Remote Procedure Call, Type:Call XID:0x29009a8c
> Network File System, Ops(5): SEQUENCE, PUTFH, OPEN, ACCESS, GETATTR
>     [Program Version: 4]
>     [V4 Procedure: COMPOUND (1)]
>     Tag: <EMPTY>
>         length: 0
>         contents: <EMPTY>
>     minorversion: 1
>     Operations (count: 5): SEQUENCE, PUTFH, OPEN, ACCESS, GETATTR
>         Opcode: SEQUENCE (53)
>             sessionid: f0f40f58bd020000e002000000000000
>             seqid: 0x002b3189
>             slot id: 0
>             high slot id: 0
>             cache this?: Yes
>         Opcode: PUTFH (22)
>             filehandle
>                 length: 28
>                 [hash (CRC-32): 0xdcaec4f7]
>                 filehandle: 010006016aeda54305f66ede000000000000000096484800...
>         Opcode: OPEN (18)
>             seqid: 0x00000000
>             share_access: OPEN4_SHARE_ACCESS_BOTH (3)
>             share_deny: OPEN4_SHARE_DENY_NONE (0)
>             clientid: 0xf0f40f58bd020000
>             owner: <DATA>
>                 length: 24
>                 contents: <DATA>
>             Open Type: OPEN4_NOCREATE (0)
>             Claim Type: CLAIM_FH (4)
>         Opcode: ACCESS (3), [Check: RD MD XT XE]
>             Check access: 0x2d
>                 .... ...1 = 0x01 READ: allowed?
>                 .... .1.. = 0x04 MODIFY: allowed?
>                 .... 1... = 0x08 EXTEND: allowed?
>                 ..1. .... = 0x20 EXECUTE: allowed?
>         Opcode: GETATTR (9)
>             Attr mask[0]: 0x0010011a (Type, Change, Size, FSID, FileId)
>                 reqd_attr: Type (1)
>                 reqd_attr: Change (3)
>                 reqd_attr: Size (4)
>                 reqd_attr: FSID (8)
>                 reco_attr: FileId (20)
>             Attr mask[1]: 0x00b0a23a (Mode, NumLinks, Owner, Owner_Group, RawDev, Space_Used, Time_Access, Time_Metadata, Time_Modify, Mounted_on_FileId)
>                 reco_attr: Mode (33)
>                 reco_attr: NumLinks (35)
>                 reco_attr: Owner (36)
>                 reco_attr: Owner_Group (37)
>                 reco_attr: RawDev (41)
>                 reco_attr: Space_Used (45)
>                 reco_attr: Time_Access (47)
>                 reco_attr: Time_Metadata (52)
>                 reco_attr: Time_Modify (53)
>                 reco_attr: Mounted_on_FileId (55)
>     [Main Opcode: OPEN (18)]
> 
> Frame 3: 390 bytes on wire (3120 bits), 390 bytes captured (3120 bits) on interface 0
> Ethernet II, Src: SuperMic_19:8e:6e (00:25:90:19:8e:6e), Dst: AsrockIn_84:34:e7 (d0:50:99:84:34:e7)
> Internet Protocol Version 4, Src: 172.21.86.74, Dst: 172.21.86.141
> Transmission Control Protocol, Src Port: 2049, Dst Port: 997, Seq: 325, Ack: 257, Len: 324
> Remote Procedure Call, Type:Reply XID:0x29009a8c
> Network File System, Ops(5): SEQUENCE PUTFH OPEN ACCESS GETATTR
>     [Program Version: 4]
>     [V4 Procedure: COMPOUND (1)]
>     Status: NFS4_OK (0)
>     Tag: <EMPTY>
>         length: 0
>         contents: <EMPTY>
>     Operations (count: 5)
>         Opcode: SEQUENCE (53)
>             Status: NFS4_OK (0)
>             sessionid: f0f40f58bd020000e002000000000000
>             seqid: 0x002b3189
>             slot id: 0
>             high slot id: 30
>             target high slot id: 30
>             status flags: 0x00000040, SEQ4_STATUS_RECALLABLE_STATE_REVOKED
>                 .... .... .... .... .... .... .... ...0 = SEQ4_STATUS_CB_PATH_DOWN: Not set
>                 .... .... .... .... .... .... .... ..0. = SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRING: Not set
>                 .... .... .... .... .... .... .... .0.. = SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRED: Not set
>                 .... .... .... .... .... .... .... 0... = SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED: Not set
>                 .... .... .... .... .... .... ...0 .... = SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED: Not set
>                 .... .... .... .... .... .... ..0. .... = SEQ4_STATUS_ADMIN_STATE_REVOKED: Not set
>                 .... .... .... .... .... .... .1.. .... = SEQ4_STATUS_RECALLABLE_STATE_REVOKED: Set
>                 .... .... .... .... .... .... 0... .... = SEQ4_STATUS_LEASE_MOVED: Not set
>                 .... .... .... .... .... ...0 .... .... = SEQ4_STATUS_RESTART_RECLAIM_NEEDED: Not set
>                 .... .... .... .... .... ..0. .... .... = SEQ4_STATUS_CB_PATH_DOWN_SESSION: Not set
>                 .... .... .... .... .... .0.. .... .... = SEQ4_STATUS_BACKCHANNEL_FAULT: Not set
>                 .... .... .... .... .... 0... .... .... = SEQ4_STATUS_DEVID_CHANGED: Not set
>                 .... .... .... .... ...0 .... .... .... = SEQ4_STATUS_DEVID_DELETED: Not set
>         Opcode: PUTFH (22)
>             Status: NFS4_OK (0)
>         Opcode: OPEN (18)
>             Status: NFS4_OK (0)
>             stateid
>                 [StateID Hash: 0xb207]
>                 seqid: 0x00158249
>                 Data: f0f40f58bd02000092000000
>                 [Data hash (CRC-32): 0xf3e67f06]
>             change_info
>                 Atomic: No
>                 changeid (before): 0
>                 changeid (after): 0
>             result flags: 0x00000004, locktype posix
>                 .... .... .... .... .... .... .... ..0. = confirm: False
>                 .... .... .... .... .... .... .... .1.. = locktype posix: True
>                 .... .... .... .... .... .... .... 0... = preserve unlinked: False
>                 .... .... .... .... .... .... ..0. .... = may notify lock: False
>             Delegation Type: OPEN_DELEGATE_NONE (0)
>         Opcode: ACCESS (3), [Access Denied: XE], [Allowed: RD MD XT]
>             Status: NFS4_OK (0)
>             Supported types (of requested): 0x2d
>                 .... ...1 = 0x01 READ: supported
>                 .... .1.. = 0x04 MODIFY: supported
>                 .... 1... = 0x08 EXTEND: supported
>                 ..1. .... = 0x20 EXECUTE: supported
>             Access rights (of requested): 0x0d
>                 .... ...1 = 0x01 READ: allowed
>                 .... .1.. = 0x04 MODIFY: allowed
>                 .... 1... = 0x08 EXTEND: allowed
>                 ..0. .... = 0x20 EXECUTE: *Access Denied*
>         Opcode: GETATTR (9)
>             Status: NFS4_OK (0)
>             Attr mask[0]: 0x0010011a (Type, Change, Size, FSID, FileId)
>                 reqd_attr: Type (1)
>                     ftype4: NF4REG (1)
>                 reqd_attr: Change (3)
>                     changeid: 6353654913397708009
>                 reqd_attr: Size (4)
>                     size: 32768
>                 reqd_attr: FSID (8)
>                     fattr4_fsid
>                         fsid4.major: 7704996244782214878
>                         fsid4.minor: 0
>                 reco_attr: FileId (20)
>                     fileid: 4737174
>             Attr mask[1]: 0x00b0a23a (Mode, NumLinks, Owner, Owner_Group, RawDev, Space_Used, Time_Access, Time_Metadata, Time_Modify, Mounted_on_FileId)
>                 reco_attr: Mode (33)
>                     mode: 0644, Name: Unknown, Read permission for owner, Write permission for owner, Read permission for group, Read permission for others
>                         .... .... .... .... 000. .... .... .... = Name: Unknown (0)
>                         .... .... .... .... .... 0... .... .... = Set user id on exec: No
>                         .... .... .... .... .... .0.. .... .... = Set group id on exec: No
>                         .... .... .... .... .... ..0. .... .... = Save swapped text even after use: No
>                         .... .... .... .... .... ...1 .... .... = Read permission for owner: Yes
>                         .... .... .... .... .... .... 1... .... = Write permission for owner: Yes
>                         .... .... .... .... .... .... .0.. .... = Execute permission for owner: No
>                         .... .... .... .... .... .... ..1. .... = Read permission for group: Yes
>                         .... .... .... .... .... .... ...0 .... = Write permission for group: No
>                         .... .... .... .... .... .... .... 0... = Execute permission for group: No
>                         .... .... .... .... .... .... .... .1.. = Read permission for others: Yes
>                         .... .... .... .... .... .... .... ..0. = Write permission for others: No
>                         .... .... .... .... .... .... .... ...0 = Execute permission for others: No
>                 reco_attr: NumLinks (35)
>                     numlinks: 1
>                 reco_attr: Owner (36)
>                     fattr4_owner: 7143
>                         length: 4
>                         contents: 7143
>                 reco_attr: Owner_Group (37)
>                     fattr4_owner_group: 1112
>                         length: 4
>                         contents: 1112
>                 reco_attr: RawDev (41)
>                     specdata1: 0
>                     specdata2: 0
>                 reco_attr: Space_Used (45)
>                     space_used: 32768
>                 reco_attr: Time_Access (47)
>                     seconds: 1479318517
>                     nseconds: 593359867
>                 reco_attr: Time_Metadata (52)
>                     seconds: 1479325563
>                     nseconds: 175920361
>                 reco_attr: Time_Modify (53)
>                     seconds: 1479325563
>                     nseconds: 175920361
>                 reco_attr: Mounted_on_FileId (55)
>                     fileid: 0x0000000000484896
>     [Main Opcode: OPEN (18)]
> 
> Frame 4: 322 bytes on wire (2576 bits), 322 bytes captured (2576 bits) on interface 0
> Ethernet II, Src: AsrockIn_84:34:e7 (d0:50:99:84:34:e7), Dst: SuperMic_19:8e:6e (00:25:90:19:8e:6e)
> Internet Protocol Version 4, Src: 172.21.86.141, Dst: 172.21.86.74
> Transmission Control Protocol, Src Port: 997, Dst Port: 2049, Seq: 257, Ack: 649, Len: 256
> Remote Procedure Call, Type:Call XID:0x2a009a8c
> Network File System, Ops(5): SEQUENCE, PUTFH, OPEN, ACCESS, GETATTR
>     [Program Version: 4]
>     [V4 Procedure: COMPOUND (1)]
>     Tag: <EMPTY>
>         length: 0
>         contents: <EMPTY>
>     minorversion: 1
>     Operations (count: 5): SEQUENCE, PUTFH, OPEN, ACCESS, GETATTR
>         Opcode: SEQUENCE (53)
>             sessionid: f0f40f58bd020000e002000000000000
>             seqid: 0x002b318a
>             slot id: 0
>             high slot id: 0
>             cache this?: Yes
>         Opcode: PUTFH (22)
>             filehandle
>                 length: 28
>                 [hash (CRC-32): 0x26ab9d8d]
>                 filehandle: 010006016aeda54305f66ede00000000000000009a424800...
>         Opcode: OPEN (18)
>             seqid: 0x00000000
>             share_access: OPEN4_SHARE_ACCESS_BOTH (3)
>             share_deny: OPEN4_SHARE_DENY_NONE (0)
>             clientid: 0xf0f40f58bd020000
>             owner: <DATA>
>                 length: 24
>                 contents: <DATA>
>             Open Type: OPEN4_NOCREATE (0)
>             Claim Type: CLAIM_FH (4)
>         Opcode: ACCESS (3), [Check: RD MD XT XE]
>             Check access: 0x2d
>                 .... ...1 = 0x01 READ: allowed?
>                 .... .1.. = 0x04 MODIFY: allowed?
>                 .... 1... = 0x08 EXTEND: allowed?
>                 ..1. .... = 0x20 EXECUTE: allowed?
>         Opcode: GETATTR (9)
>             Attr mask[0]: 0x0010011a (Type, Change, Size, FSID, FileId)
>                 reqd_attr: Type (1)
>                 reqd_attr: Change (3)
>                 reqd_attr: Size (4)
>                 reqd_attr: FSID (8)
>                 reco_attr: FileId (20)
>             Attr mask[1]: 0x00b0a23a (Mode, NumLinks, Owner, Owner_Group, RawDev, Space_Used, Time_Access, Time_Metadata, Time_Modify, Mounted_on_FileId)
>                 reco_attr: Mode (33)
>                 reco_attr: NumLinks (35)
>                 reco_attr: Owner (36)
>                 reco_attr: Owner_Group (37)
>                 reco_attr: RawDev (41)
>                 reco_attr: Space_Used (45)
>                 reco_attr: Time_Access (47)
>                 reco_attr: Time_Metadata (52)
>                 reco_attr: Time_Modify (53)
>                 reco_attr: Mounted_on_FileId (55)
>     [Main Opcode: OPEN (18)]
> 
> Frame 5: 322 bytes on wire (2576 bits), 322 bytes captured (2576 bits) on interface 0
> Ethernet II, Src: AsrockIn_84:34:e7 (d0:50:99:84:34:e7), Dst: SuperMic_19:8e:6e (00:25:90:19:8e:6e)
> Internet Protocol Version 4, Src: 172.21.86.141, Dst: 172.21.86.74
> Transmission Control Protocol, Src Port: 997, Dst Port: 2049, Seq: 769, Ack: 1297, Len: 256
> Remote Procedure Call, Type:Call XID:0x2c009a8c
> Network File System, Ops(5): SEQUENCE, PUTFH, OPEN, ACCESS, GETATTR
>     [Program Version: 4]
>     [V4 Procedure: COMPOUND (1)]
>     Tag: <EMPTY>
>         length: 0
>         contents: <EMPTY>
>     minorversion: 1
>     Operations (count: 5): SEQUENCE, PUTFH, OPEN, ACCESS, GETATTR
>         Opcode: SEQUENCE (53)
>             sessionid: f0f40f58bd020000e002000000000000
>             seqid: 0x002b318c
>             slot id: 0
>             high slot id: 0
>             cache this?: Yes
>         Opcode: PUTFH (22)
>             filehandle
>                 length: 28
>                 [hash (CRC-32): 0x26ab9d8d]
>                 filehandle: 010006016aeda54305f66ede00000000000000009a424800...
>         Opcode: OPEN (18)
>             seqid: 0x00000000
>             share_access: OPEN4_SHARE_ACCESS_BOTH (3)
>             share_deny: OPEN4_SHARE_DENY_NONE (0)
>             clientid: 0xf0f40f58bd020000
>             owner: <DATA>
>                 length: 24
>                 contents: <DATA>
>             Open Type: OPEN4_NOCREATE (0)
>             Claim Type: CLAIM_FH (4)
>         Opcode: ACCESS (3), [Check: RD MD XT XE]
>             Check access: 0x2d
>                 .... ...1 = 0x01 READ: allowed?
>                 .... .1.. = 0x04 MODIFY: allowed?
>                 .... 1... = 0x08 EXTEND: allowed?
>                 ..1. .... = 0x20 EXECUTE: allowed?
>         Opcode: GETATTR (9)
>             Attr mask[0]: 0x0010011a (Type, Change, Size, FSID, FileId)
>                 reqd_attr: Type (1)
>                 reqd_attr: Change (3)
>                 reqd_attr: Size (4)
>                 reqd_attr: FSID (8)
>                 reco_attr: FileId (20)
>             Attr mask[1]: 0x00b0a23a (Mode, NumLinks, Owner, Owner_Group, RawDev, Space_Used, Time_Access, Time_Metadata, Time_Modify, Mounted_on_FileId)
>                 reco_attr: Mode (33)
>                 reco_attr: NumLinks (35)
>                 reco_attr: Owner (36)
>                 reco_attr: Owner_Group (37)
>                 reco_attr: RawDev (41)
>                 reco_attr: Space_Used (45)
>                 reco_attr: Time_Access (47)
>                 reco_attr: Time_Metadata (52)
>                 reco_attr: Time_Modify (53)
>                 reco_attr: Mounted_on_FileId (55)
>     [Main Opcode: OPEN (18)]
> 
> Frame 6: 390 bytes on wire (3120 bits), 390 bytes captured (3120 bits) on interface 0
> Ethernet II, Src: SuperMic_19:8e:6e (00:25:90:19:8e:6e), Dst: AsrockIn_84:34:e7 (d0:50:99:84:34:e7)
> Internet Protocol Version 4, Src: 172.21.86.74, Dst: 172.21.86.141
> Transmission Control Protocol, Src Port: 2049, Dst Port: 997, Seq: 1297, Ack: 1025, Len: 324
> Remote Procedure Call, Type:Reply XID:0x2c009a8c
> Network File System, Ops(5): SEQUENCE PUTFH OPEN ACCESS GETATTR
>     [Program Version: 4]
>     [V4 Procedure: COMPOUND (1)]
>     Status: NFS4_OK (0)
>     Tag: <EMPTY>
>         length: 0
>         contents: <EMPTY>
>     Operations (count: 5)
>         Opcode: SEQUENCE (53)
>             Status: NFS4_OK (0)
>             sessionid: f0f40f58bd020000e002000000000000
>             seqid: 0x002b318c
>             slot id: 0
>             high slot id: 30
>             target high slot id: 30
>             status flags: 0x00000040, SEQ4_STATUS_RECALLABLE_STATE_REVOKED
>                 .... .... .... .... .... .... .... ...0 = SEQ4_STATUS_CB_PATH_DOWN: Not set
>                 .... .... .... .... .... .... .... ..0. = SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRING: Not set
>                 .... .... .... .... .... .... .... .0.. = SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRED: Not set
>                 .... .... .... .... .... .... .... 0... = SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED: Not set
>                 .... .... .... .... .... .... ...0 .... = SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED: Not set
>                 .... .... .... .... .... .... ..0. .... = SEQ4_STATUS_ADMIN_STATE_REVOKED: Not set
>                 .... .... .... .... .... .... .1.. .... = SEQ4_STATUS_RECALLABLE_STATE_REVOKED: Set
>                 .... .... .... .... .... .... 0... .... = SEQ4_STATUS_LEASE_MOVED: Not set
>                 .... .... .... .... .... ...0 .... .... = SEQ4_STATUS_RESTART_RECLAIM_NEEDED: Not set
>                 .... .... .... .... .... ..0. .... .... = SEQ4_STATUS_CB_PATH_DOWN_SESSION: Not set
>                 .... .... .... .... .... .0.. .... .... = SEQ4_STATUS_BACKCHANNEL_FAULT: Not set
>                 .... .... .... .... .... 0... .... .... = SEQ4_STATUS_DEVID_CHANGED: Not set
>                 .... .... .... .... ...0 .... .... .... = SEQ4_STATUS_DEVID_DELETED: Not set
>         Opcode: PUTFH (22)
>             Status: NFS4_OK (0)
>         Opcode: OPEN (18)
>             Status: NFS4_OK (0)
>             stateid
>                 [StateID Hash: 0xb9c3]
>                 seqid: 0x0015824c
>                 Data: f0f40f58bd0200009e000000
>                 [Data hash (CRC-32): 0xb930c0be]
>             change_info
>                 Atomic: No
>                 changeid (before): 0
>                 changeid (after): 0
>             result flags: 0x00000004, locktype posix
>                 .... .... .... .... .... .... .... ..0. = confirm: False
>                 .... .... .... .... .... .... .... .1.. = locktype posix: True
>                 .... .... .... .... .... .... .... 0... = preserve unlinked: False
>                 .... .... .... .... .... .... ..0. .... = may notify lock: False
>             Delegation Type: OPEN_DELEGATE_NONE (0)
>         Opcode: ACCESS (3), [Access Denied: XE], [Allowed: RD MD XT]
>             Status: NFS4_OK (0)
>             Supported types (of requested): 0x2d
>                 .... ...1 = 0x01 READ: supported
>                 .... .1.. = 0x04 MODIFY: supported
>                 .... 1... = 0x08 EXTEND: supported
>                 ..1. .... = 0x20 EXECUTE: supported
>             Access rights (of requested): 0x0d
>                 .... ...1 = 0x01 READ: allowed
>                 .... .1.. = 0x04 MODIFY: allowed
>                 .... 1... = 0x08 EXTEND: allowed
>                 ..0. .... = 0x20 EXECUTE: *Access Denied*
>         Opcode: GETATTR (9)
>             Status: NFS4_OK (0)
>             Attr mask[0]: 0x0010011a (Type, Change, Size, FSID, FileId)
>                 reqd_attr: Type (1)
>                     ftype4: NF4REG (1)
>                 reqd_attr: Change (3)
>                     changeid: 6310935438380194724
>                 reqd_attr: Size (4)
>                     size: 24576
>                 reqd_attr: FSID (8)
>                     fattr4_fsid
>                         fsid4.major: 7704996244782214878
>                         fsid4.minor: 0
>                 reco_attr: FileId (20)
>                     fileid: 4735642
>             Attr mask[1]: 0x00b0a23a (Mode, NumLinks, Owner, Owner_Group, RawDev, Space_Used, Time_Access, Time_Metadata, Time_Modify, Mounted_on_FileId)
>                 reco_attr: Mode (33)
>                     mode: 0644, Name: Unknown, Read permission for owner, Write permission for owner, Read permission for group, Read permission for others
>                         .... .... .... .... 000. .... .... .... = Name: Unknown (0)
>                         .... .... .... .... .... 0... .... .... = Set user id on exec: No
>                         .... .... .... .... .... .0.. .... .... = Set group id on exec: No
>                         .... .... .... .... .... ..0. .... .... = Save swapped text even after use: No
>                         .... .... .... .... .... ...1 .... .... = Read permission for owner: Yes
>                         .... .... .... .... .... .... 1... .... = Write permission for owner: Yes
>                         .... .... .... .... .... .... .0.. .... = Execute permission for owner: No
>                         .... .... .... .... .... .... ..1. .... = Read permission for group: Yes
>                         .... .... .... .... .... .... ...0 .... = Write permission for group: No
>                         .... .... .... .... .... .... .... 0... = Execute permission for group: No
>                         .... .... .... .... .... .... .... .1.. = Read permission for others: Yes
>                         .... .... .... .... .... .... .... ..0. = Write permission for others: No
>                         .... .... .... .... .... .... .... ...0 = Execute permission for others: No
>                 reco_attr: NumLinks (35)
>                     numlinks: 1
>                 reco_attr: Owner (36)
>                     fattr4_owner: 7143
>                         length: 4
>                         contents: 7143
>                 reco_attr: Owner_Group (37)
>                     fattr4_owner_group: 1112
>                         length: 4
>                         contents: 1112
>                 reco_attr: RawDev (41)
>                     specdata1: 0
>                     specdata2: 0
>                 reco_attr: Space_Used (45)
>                     space_used: 24576
>                 reco_attr: Time_Access (47)
>                     seconds: 1479325563
>                     nseconds: 305919836
>                 reco_attr: Time_Metadata (52)
>                     seconds: 1469379160
>                     nseconds: 756243364
>                 reco_attr: Time_Modify (53)
>                     seconds: 1469379160
>                     nseconds: 756243364
>                 reco_attr: Mounted_on_FileId (55)
>                     fileid: 0x000000000048429a
>     [Main Opcode: OPEN (18)]
> 
> Frame 7: 322 bytes on wire (2576 bits), 322 bytes captured (2576 bits) on interface 0
> Ethernet II, Src: AsrockIn_84:34:e7 (d0:50:99:84:34:e7), Dst: SuperMic_19:8e:6e (00:25:90:19:8e:6e)
> Internet Protocol Version 4, Src: 172.21.86.141, Dst: 172.21.86.74
> Transmission Control Protocol, Src Port: 997, Dst Port: 2049, Seq: 1025, Ack: 1621, Len: 256
> Remote Procedure Call, Type:Call XID:0x2d009a8c
> Network File System, Ops(5): SEQUENCE, PUTFH, OPEN, ACCESS, GETATTR
>     [Program Version: 4]
>     [V4 Procedure: COMPOUND (1)]
>     Tag: <EMPTY>
>         length: 0
>         contents: <EMPTY>
>     minorversion: 1
>     Operations (count: 5): SEQUENCE, PUTFH, OPEN, ACCESS, GETATTR
>         Opcode: SEQUENCE (53)
>             sessionid: f0f40f58bd020000e002000000000000
>             seqid: 0x002b318d
>             slot id: 0
>             high slot id: 0
>             cache this?: Yes
>         Opcode: PUTFH (22)
>             filehandle
>                 length: 28
>                 [hash (CRC-32): 0xdcaec4f7]
>                 filehandle: 010006016aeda54305f66ede000000000000000096484800...
>         Opcode: OPEN (18)
>             seqid: 0x00000000
>             share_access: OPEN4_SHARE_ACCESS_BOTH (3)
>             share_deny: OPEN4_SHARE_DENY_NONE (0)
>             clientid: 0xf0f40f58bd020000
>             owner: <DATA>
>                 length: 24
>                 contents: <DATA>
>             Open Type: OPEN4_NOCREATE (0)
>             Claim Type: CLAIM_FH (4)
>         Opcode: ACCESS (3), [Check: RD MD XT XE]
>             Check access: 0x2d
>                 .... ...1 = 0x01 READ: allowed?
>                 .... .1.. = 0x04 MODIFY: allowed?
>                 .... 1... = 0x08 EXTEND: allowed?
>                 ..1. .... = 0x20 EXECUTE: allowed?
>         Opcode: GETATTR (9)
>             Attr mask[0]: 0x0010011a (Type, Change, Size, FSID, FileId)
>                 reqd_attr: Type (1)
>                 reqd_attr: Change (3)
>                 reqd_attr: Size (4)
>                 reqd_attr: FSID (8)
>                 reco_attr: FileId (20)
>             Attr mask[1]: 0x00b0a23a (Mode, NumLinks, Owner, Owner_Group, RawDev, Space_Used, Time_Access, Time_Metadata, Time_Modify, Mounted_on_FileId)
>                 reco_attr: Mode (33)
>                 reco_attr: NumLinks (35)
>                 reco_attr: Owner (36)
>                 reco_attr: Owner_Group (37)
>                 reco_attr: RawDev (41)
>                 reco_attr: Space_Used (45)
>                 reco_attr: Time_Access (47)
>                 reco_attr: Time_Metadata (52)
>                 reco_attr: Time_Modify (53)
>                 reco_attr: Mounted_on_FileId (55)
>     [Main Opcode: OPEN (18)]
> 
> Frame 8: 390 bytes on wire (3120 bits), 390 bytes captured (3120 bits) on interface 0
> Ethernet II, Src: SuperMic_19:8e:6e (00:25:90:19:8e:6e), Dst: AsrockIn_84:34:e7 (d0:50:99:84:34:e7)
> Internet Protocol Version 4, Src: 172.21.86.74, Dst: 172.21.86.141
> Transmission Control Protocol, Src Port: 2049, Dst Port: 997, Seq: 1621, Ack: 1281, Len: 324
> Remote Procedure Call, Type:Reply XID:0x2d009a8c
> Network File System, Ops(5): SEQUENCE PUTFH OPEN ACCESS GETATTR
>     [Program Version: 4]
>     [V4 Procedure: COMPOUND (1)]
>     Status: NFS4_OK (0)
>     Tag: <EMPTY>
>         length: 0
>         contents: <EMPTY>
>     Operations (count: 5)
>         Opcode: SEQUENCE (53)
>             Status: NFS4_OK (0)
>             sessionid: f0f40f58bd020000e002000000000000
>             seqid: 0x002b318d
>             slot id: 0
>             high slot id: 30
>             target high slot id: 30
>             status flags: 0x00000040, SEQ4_STATUS_RECALLABLE_STATE_REVOKED
>                 .... .... .... .... .... .... .... ...0 = SEQ4_STATUS_CB_PATH_DOWN: Not set
>                 .... .... .... .... .... .... .... ..0. = SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRING: Not set
>                 .... .... .... .... .... .... .... .0.. = SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRED: Not set
>                 .... .... .... .... .... .... .... 0... = SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED: Not set
>                 .... .... .... .... .... .... ...0 .... = SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED: Not set
>                 .... .... .... .... .... .... ..0. .... = SEQ4_STATUS_ADMIN_STATE_REVOKED: Not set
>                 .... .... .... .... .... .... .1.. .... = SEQ4_STATUS_RECALLABLE_STATE_REVOKED: Set
>                 .... .... .... .... .... .... 0... .... = SEQ4_STATUS_LEASE_MOVED: Not set
>                 .... .... .... .... .... ...0 .... .... = SEQ4_STATUS_RESTART_RECLAIM_NEEDED: Not set
>                 .... .... .... .... .... ..0. .... .... = SEQ4_STATUS_CB_PATH_DOWN_SESSION: Not set
>                 .... .... .... .... .... .0.. .... .... = SEQ4_STATUS_BACKCHANNEL_FAULT: Not set
>                 .... .... .... .... .... 0... .... .... = SEQ4_STATUS_DEVID_CHANGED: Not set
>                 .... .... .... .... ...0 .... .... .... = SEQ4_STATUS_DEVID_DELETED: Not set
>         Opcode: PUTFH (22)
>             Status: NFS4_OK (0)
>         Opcode: OPEN (18)
>             Status: NFS4_OK (0)
>             stateid
>                 [StateID Hash: 0x1925]
>                 seqid: 0x0015824b
>                 Data: f0f40f58bd02000092000000
>                 [Data hash (CRC-32): 0xf3e67f06]
>             change_info
>                 Atomic: No
>                 changeid (before): 0
>                 changeid (after): 0
>             result flags: 0x00000004, locktype posix
>                 .... .... .... .... .... .... .... ..0. = confirm: False
>                 .... .... .... .... .... .... .... .1.. = locktype posix: True
>                 .... .... .... .... .... .... .... 0... = preserve unlinked: False
>                 .... .... .... .... .... .... ..0. .... = may notify lock: False
>             Delegation Type: OPEN_DELEGATE_NONE (0)
>         Opcode: ACCESS (3), [Access Denied: XE], [Allowed: RD MD XT]
>             Status: NFS4_OK (0)
>             Supported types (of requested): 0x2d
>                 .... ...1 = 0x01 READ: supported
>                 .... .1.. = 0x04 MODIFY: supported
>                 .... 1... = 0x08 EXTEND: supported
>                 ..1. .... = 0x20 EXECUTE: supported
>             Access rights (of requested): 0x0d
>                 .... ...1 = 0x01 READ: allowed
>                 .... .1.. = 0x04 MODIFY: allowed
>                 .... 1... = 0x08 EXTEND: allowed
>                 ..0. .... = 0x20 EXECUTE: *Access Denied*
>         Opcode: GETATTR (9)
>             Status: NFS4_OK (0)
>             Attr mask[0]: 0x0010011a (Type, Change, Size, FSID, FileId)
>                 reqd_attr: Type (1)
>                     ftype4: NF4REG (1)
>                 reqd_attr: Change (3)
>                     changeid: 6353654913397708009
>                 reqd_attr: Size (4)
>                     size: 32768
>                 reqd_attr: FSID (8)
>                     fattr4_fsid
>                         fsid4.major: 7704996244782214878
>                         fsid4.minor: 0
>                 reco_attr: FileId (20)
>                     fileid: 4737174
>             Attr mask[1]: 0x00b0a23a (Mode, NumLinks, Owner, Owner_Group, RawDev, Space_Used, Time_Access, Time_Metadata, Time_Modify, Mounted_on_FileId)
>                 reco_attr: Mode (33)
>                     mode: 0644, Name: Unknown, Read permission for owner, Write permission for owner, Read permission for group, Read permission for others
>                         .... .... .... .... 000. .... .... .... = Name: Unknown (0)
>                         .... .... .... .... .... 0... .... .... = Set user id on exec: No
>                         .... .... .... .... .... .0.. .... .... = Set group id on exec: No
>                         .... .... .... .... .... ..0. .... .... = Save swapped text even after use: No
>                         .... .... .... .... .... ...1 .... .... = Read permission for owner: Yes
>                         .... .... .... .... .... .... 1... .... = Write permission for owner: Yes
>                         .... .... .... .... .... .... .0.. .... = Execute permission for owner: No
>                         .... .... .... .... .... .... ..1. .... = Read permission for group: Yes
>                         .... .... .... .... .... .... ...0 .... = Write permission for group: No
>                         .... .... .... .... .... .... .... 0... = Execute permission for group: No
>                         .... .... .... .... .... .... .... .1.. = Read permission for others: Yes
>                         .... .... .... .... .... .... .... ..0. = Write permission for others: No
>                         .... .... .... .... .... .... .... ...0 = Execute permission for others: No
>                 reco_attr: NumLinks (35)
>                     numlinks: 1
>                 reco_attr: Owner (36)
>                     fattr4_owner: 7143
>                         length: 4
>                         contents: 7143
>                 reco_attr: Owner_Group (37)
>                     fattr4_owner_group: 1112
>                         length: 4
>                         contents: 1112
>                 reco_attr: RawDev (41)
>                     specdata1: 0
>                     specdata2: 0
>                 reco_attr: Space_Used (45)
>                     space_used: 32768
>                 reco_attr: Time_Access (47)
>                     seconds: 1479318517
>                     nseconds: 593359867
>                 reco_attr: Time_Metadata (52)
>                     seconds: 1479325563
>                     nseconds: 175920361
>                 reco_attr: Time_Modify (53)
>                     seconds: 1479325563
>                     nseconds: 175920361
>                 reco_attr: Mounted_on_FileId (55)
>                     fileid: 0x0000000000484896
>     [Main Opcode: OPEN (18)]
> 
> Frame 9: 322 bytes on wire (2576 bits), 322 bytes captured (2576 bits) on interface 0
> Ethernet II, Src: AsrockIn_84:34:e7 (d0:50:99:84:34:e7), Dst: SuperMic_19:8e:6e (00:25:90:19:8e:6e)
> Internet Protocol Version 4, Src: 172.21.86.141, Dst: 172.21.86.74
> Transmission Control Protocol, Src Port: 997, Dst Port: 2049, Seq: 1281, Ack: 1945, Len: 256
> Remote Procedure Call, Type:Call XID:0x2e009a8c
> Network File System, Ops(5): SEQUENCE, PUTFH, OPEN, ACCESS, GETATTR
>     [Program Version: 4]
>     [V4 Procedure: COMPOUND (1)]
>     Tag: <EMPTY>
>         length: 0
>         contents: <EMPTY>
>     minorversion: 1
>     Operations (count: 5): SEQUENCE, PUTFH, OPEN, ACCESS, GETATTR
>         Opcode: SEQUENCE (53)
>             sessionid: f0f40f58bd020000e002000000000000
>             seqid: 0x002b318e
>             slot id: 0
>             high slot id: 0
>             cache this?: Yes
>         Opcode: PUTFH (22)
>             filehandle
>                 length: 28
>                 [hash (CRC-32): 0x26ab9d8d]
>                 filehandle: 010006016aeda54305f66ede00000000000000009a424800...
>         Opcode: OPEN (18)
>             seqid: 0x00000000
>             share_access: OPEN4_SHARE_ACCESS_BOTH (3)
>             share_deny: OPEN4_SHARE_DENY_NONE (0)
>             clientid: 0xf0f40f58bd020000
>             owner: <DATA>
>                 length: 24
>                 contents: <DATA>
>             Open Type: OPEN4_NOCREATE (0)
>             Claim Type: CLAIM_FH (4)
>         Opcode: ACCESS (3), [Check: RD MD XT XE]
>             Check access: 0x2d
>                 .... ...1 = 0x01 READ: allowed?
>                 .... .1.. = 0x04 MODIFY: allowed?
>                 .... 1... = 0x08 EXTEND: allowed?
>                 ..1. .... = 0x20 EXECUTE: allowed?
>         Opcode: GETATTR (9)
>             Attr mask[0]: 0x0010011a (Type, Change, Size, FSID, FileId)
>                 reqd_attr: Type (1)
>                 reqd_attr: Change (3)
>                 reqd_attr: Size (4)
>                 reqd_attr: FSID (8)
>                 reco_attr: FileId (20)
>             Attr mask[1]: 0x00b0a23a (Mode, NumLinks, Owner, Owner_Group, RawDev, Space_Used, Time_Access, Time_Metadata, Time_Modify, Mounted_on_FileId)
>                 reco_attr: Mode (33)
>                 reco_attr: NumLinks (35)
>                 reco_attr: Owner (36)
>                 reco_attr: Owner_Group (37)
>                 reco_attr: RawDev (41)
>                 reco_attr: Space_Used (45)
>                 reco_attr: Time_Access (47)
>                 reco_attr: Time_Metadata (52)
>                 reco_attr: Time_Modify (53)
>                 reco_attr: Mounted_on_FileId (55)
>     [Main Opcode: OPEN (18)]
> 
> Frame 10: 390 bytes on wire (3120 bits), 390 bytes captured (3120 bits) on interface 0
> Ethernet II, Src: SuperMic_19:8e:6e (00:25:90:19:8e:6e), Dst: AsrockIn_84:34:e7 (d0:50:99:84:34:e7)
> Internet Protocol Version 4, Src: 172.21.86.74, Dst: 172.21.86.141
> Transmission Control Protocol, Src Port: 2049, Dst Port: 997, Seq: 1945, Ack: 1537, Len: 324
> Remote Procedure Call, Type:Reply XID:0x2e009a8c
> Network File System, Ops(5): SEQUENCE PUTFH OPEN ACCESS GETATTR
>     [Program Version: 4]
>     [V4 Procedure: COMPOUND (1)]
>     Status: NFS4_OK (0)
>     Tag: <EMPTY>
>         length: 0
>         contents: <EMPTY>
>     Operations (count: 5)
>         Opcode: SEQUENCE (53)
>             Status: NFS4_OK (0)
>             sessionid: f0f40f58bd020000e002000000000000
>             seqid: 0x002b318e
>             slot id: 0
>             high slot id: 30
>             target high slot id: 30
>             status flags: 0x00000040, SEQ4_STATUS_RECALLABLE_STATE_REVOKED
>                 .... .... .... .... .... .... .... ...0 = SEQ4_STATUS_CB_PATH_DOWN: Not set
>                 .... .... .... .... .... .... .... ..0. = SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRING: Not set
>                 .... .... .... .... .... .... .... .0.. = SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRED: Not set
>                 .... .... .... .... .... .... .... 0... = SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED: Not set
>                 .... .... .... .... .... .... ...0 .... = SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED: Not set
>                 .... .... .... .... .... .... ..0. .... = SEQ4_STATUS_ADMIN_STATE_REVOKED: Not set
>                 .... .... .... .... .... .... .1.. .... = SEQ4_STATUS_RECALLABLE_STATE_REVOKED: Set
>                 .... .... .... .... .... .... 0... .... = SEQ4_STATUS_LEASE_MOVED: Not set
>                 .... .... .... .... .... ...0 .... .... = SEQ4_STATUS_RESTART_RECLAIM_NEEDED: Not set
>                 .... .... .... .... .... ..0. .... .... = SEQ4_STATUS_CB_PATH_DOWN_SESSION: Not set
>                 .... .... .... .... .... .0.. .... .... = SEQ4_STATUS_BACKCHANNEL_FAULT: Not set
>                 .... .... .... .... .... 0... .... .... = SEQ4_STATUS_DEVID_CHANGED: Not set
>                 .... .... .... .... ...0 .... .... .... = SEQ4_STATUS_DEVID_DELETED: Not set
>         Opcode: PUTFH (22)
>             Status: NFS4_OK (0)
>         Opcode: OPEN (18)
>             Status: NFS4_OK (0)
>             stateid
>                 [StateID Hash: 0xec52]
>                 seqid: 0x0015824d
>                 Data: f0f40f58bd0200009e000000
>                 [Data hash (CRC-32): 0xb930c0be]
>             change_info
>                 Atomic: No
>                 changeid (before): 0
>                 changeid (after): 0
>             result flags: 0x00000004, locktype posix
>                 .... .... .... .... .... .... .... ..0. = confirm: False
>                 .... .... .... .... .... .... .... .1.. = locktype posix: True
>                 .... .... .... .... .... .... .... 0... = preserve unlinked: False
>                 .... .... .... .... .... .... ..0. .... = may notify lock: False
>             Delegation Type: OPEN_DELEGATE_NONE (0)
>         Opcode: ACCESS (3), [Access Denied: XE], [Allowed: RD MD XT]
>             Status: NFS4_OK (0)
>             Supported types (of requested): 0x2d
>                 .... ...1 = 0x01 READ: supported
>                 .... .1.. = 0x04 MODIFY: supported
>                 .... 1... = 0x08 EXTEND: supported
>                 ..1. .... = 0x20 EXECUTE: supported
>             Access rights (of requested): 0x0d
>                 .... ...1 = 0x01 READ: allowed
>                 .... .1.. = 0x04 MODIFY: allowed
>                 .... 1... = 0x08 EXTEND: allowed
>                 ..0. .... = 0x20 EXECUTE: *Access Denied*
>         Opcode: GETATTR (9)
>             Status: NFS4_OK (0)
>             Attr mask[0]: 0x0010011a (Type, Change, Size, FSID, FileId)
>                 reqd_attr: Type (1)
>                     ftype4: NF4REG (1)
>                 reqd_attr: Change (3)
>                     changeid: 6310935438380194724
>                 reqd_attr: Size (4)
>                     size: 24576
>                 reqd_attr: FSID (8)
>                     fattr4_fsid
>                         fsid4.major: 7704996244782214878
>                         fsid4.minor: 0
>                 reco_attr: FileId (20)
>                     fileid: 4735642
>             Attr mask[1]: 0x00b0a23a (Mode, NumLinks, Owner, Owner_Group, RawDev, Space_Used, Time_Access, Time_Metadata, Time_Modify, Mounted_on_FileId)
>                 reco_attr: Mode (33)
>                     mode: 0644, Name: Unknown, Read permission for owner, Write permission for owner, Read permission for group, Read permission for others
>                         .... .... .... .... 000. .... .... .... = Name: Unknown (0)
>                         .... .... .... .... .... 0... .... .... = Set user id on exec: No
>                         .... .... .... .... .... .0.. .... .... = Set group id on exec: No
>                         .... .... .... .... .... ..0. .... .... = Save swapped text even after use: No
>                         .... .... .... .... .... ...1 .... .... = Read permission for owner: Yes
>                         .... .... .... .... .... .... 1... .... = Write permission for owner: Yes
>                         .... .... .... .... .... .... .0.. .... = Execute permission for owner: No
>                         .... .... .... .... .... .... ..1. .... = Read permission for group: Yes
>                         .... .... .... .... .... .... ...0 .... = Write permission for group: No
>                         .... .... .... .... .... .... .... 0... = Execute permission for group: No
>                         .... .... .... .... .... .... .... .1.. = Read permission for others: Yes
>                         .... .... .... .... .... .... .... ..0. = Write permission for others: No
>                         .... .... .... .... .... .... .... ...0 = Execute permission for others: No
>                 reco_attr: NumLinks (35)
>                     numlinks: 1
>                 reco_attr: Owner (36)
>                     fattr4_owner: 7143
>                         length: 4
>                         contents: 7143
>                 reco_attr: Owner_Group (37)
>                     fattr4_owner_group: 1112
>                         length: 4
>                         contents: 1112
>                 reco_attr: RawDev (41)
>                     specdata1: 0
>                     specdata2: 0
>                 reco_attr: Space_Used (45)
>                     space_used: 24576
>                 reco_attr: Time_Access (47)
>                     seconds: 1479325563
>                     nseconds: 305919836
>                 reco_attr: Time_Metadata (52)
>                     seconds: 1469379160
>                     nseconds: 756243364
>                 reco_attr: Time_Modify (53)
>                     seconds: 1469379160
>                     nseconds: 756243364
>                 reco_attr: Mounted_on_FileId (55)
>                     fileid: 0x000000000048429a
>     [Main Opcode: OPEN (18)]
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: NFS: nfs4_reclaim_open_state: Lock reclaim failed! log spew
  2016-11-17 16:31               ` J. Bruce Fields
@ 2016-11-17 17:08                 ` Jason L Tibbitts III
  2016-11-17 20:22                   ` Andrew W Elble
  2016-11-17 17:45                 ` Trond Myklebust
  1 sibling, 1 reply; 27+ messages in thread
From: Jason L Tibbitts III @ 2016-11-17 17:08 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: linux-nfs

>>>>> "JBF" == J Bruce Fields <bfields@fieldses.org> writes:

JBF> So, you're using NFSv4.1 or 4.2, and the server thinks that the
JBF> client has reused a (slot, sequence number) pair, but the server
JBF> doesn't have a cached response to return.

Thanks for the reply.  Sadly I don't understand all of it, but...

JBF> Hard to know how that happened, and it's not shown in the below.
JBF> Sounds like a bug, though.

Yeah, I only found the problem after it was already happening, so
obviously the beginning of the process is missing.  And sadly it's not
something I can easily repeat, so short of running some continuous
package capture (which would be hard since once this starts the traffic
volume is huge) there's no easy way to see it.

Is there any state on either the client or server that I could inspect
which might give any hints?  I can add that to my notes in case this
problem happens again.

JBF> Recent clients will use sec=krb5 for certain state-related
JBF> operations even if you mount with sec=sys, so it's still possible
JBF> it could be involved here.

On the server, the involved filesystem isn't exported with any sec=
options, in case it matters.

JBF> The SEQ4_STATUS_RECALLABLE_STATE_REVOKED flag set in the OPEN
JBF> replies is also a sign something's gone wrong.  Apparently the
JBF> server thinks the client has failed to return a delegation.

I can't imagine how that might have happened.  There is nothing else
NFS-related in the client's log besides the spew and that final line.
There are some automount complaints about the user accessing directories
that aren't in the map sources, and the usual random gssproxy noise
which was fixed in Fedora 24.

Currently the system is stable; it hasn't been rebooted since the
problem occurred.  Everything cleared up once I was able to unmounted
the problematic filesystem.

 - J<

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: NFS: nfs4_reclaim_open_state: Lock reclaim failed! log spew
  2016-11-17 16:31               ` J. Bruce Fields
  2016-11-17 17:08                 ` Jason L Tibbitts III
@ 2016-11-17 17:45                 ` Trond Myklebust
  2016-11-17 19:32                   ` bfields
  1 sibling, 1 reply; 27+ messages in thread
From: Trond Myklebust @ 2016-11-17 17:45 UTC (permalink / raw)
  To: bfields, tibbs; +Cc: linux-nfs

T24gVGh1LCAyMDE2LTExLTE3IGF0IDExOjMxIC0wNTAwLCBKLiBCcnVjZSBGaWVsZHMgd3JvdGU6
DQo+IE9uIFdlZCwgTm92IDE2LCAyMDE2IGF0IDAyOjU1OjA1UE0gLTA2MDAsIEphc29uIEwgVGli
Yml0dHMgSUlJIHdyb3RlOg0KPiA+IA0KPiA+IEknbSByZXBseWluZyB0byBhIHJhdGhlciBvbGQg
bWVzc2FnZSwgYnV0IHRoZSBpc3N1ZSBoYXMganVzdCBub3cNCj4gPiBwb3BwZWQNCj4gPiBiYWNr
IHVwIGFnYWluLg0KPiA+IA0KPiA+IFRvIHJlY2FwLCBhIGNsaWVudCBzdG9wcyBiZWluZyBhYmxl
IHRvIGFjY2VzcyBfYW55XyBtb3VudCBvbiBhDQo+ID4gcGFydGljdWxhciBzZXJ2ZXIsIGFuZCAi
TkZTOiBuZnM0X3JlY2xhaW1fb3Blbl9zdGF0ZTogTG9jayByZWNsYWltDQo+ID4gZmFpbGVkISIg
YXBwZWFycyBzZXZlcmFsIGh1bmRyZWQgdGltZXMgcGVyIHNlY29uZCBpbiB0aGUga2VybmVsDQo+
ID4gbG9nLg0KPiA+IFRoZSBsb2FkIGdvZXMgdXAgYnkgb25lIGZvciBldmVyIHByb2Nlc3MgYXR0
ZW1wdGluZyB0byBhY2Nlc3MgYW55DQo+ID4gbW91bnQNCj4gPiBmcm9tIHRoYXQgcGFydGljdWxh
ciBzZXJ2ZXIuwqDCoE1vdW50cyB0byBvdGhlciBzZXJ2ZXJzIGFyZSBmaW5lLCBhbmQNCj4gPiBv
dGhlciBjbGllbnRzIGNhbiBtb3VudCB0aGluZ3MgZnJvbSB0aGF0IG9uZSBzZXJ2ZXIgd2l0aG91
dA0KPiA+IHByb2JsZW1zLg0KPiA+IA0KPiA+IFdoZW4gSSBraWxsIGV2ZXJ5IHByb2Nlc3Mga2Vl
cGluZyB0aGF0IHBhcnRpY3VsYXIgbW91bnQgYWN0aXZlIGFuZA0KPiA+IHRoZW4NCj4gPiB1bW91
bnQgaXQsIEkgc2VlOg0KPiA+IA0KPiA+IE5GUzogbmZzNF9yZWNsYWltX29wZW5fc3RhdGU6IHVu
aGFuZGxlZCBlcnJvciAtMTAwNjgNCj4gDQo+IE5GUzRFUlJfUkVUUllfVU5DQUNIRURfUkVQLg0K
PiANCj4gU28sIHlvdSdyZSB1c2luZyBORlN2NC4xIG9yIDQuMiwgYW5kIHRoZSBzZXJ2ZXIgdGhp
bmtzIHRoYXQgdGhlDQo+IGNsaWVudA0KPiBoYXMgcmV1c2VkIGEgKHNsb3QsIHNlcXVlbmNlIG51
bWJlcikgcGFpciwgYnV0IHRoZSBzZXJ2ZXIgZG9lc24ndA0KPiBoYXZlIGENCj4gY2FjaGVkIHJl
c3BvbnNlIHRvIHJldHVybi4NCj4gDQo+IEhhcmQgdG8ga25vdyBob3cgdGhhdCBoYXBwZW5lZCwg
YW5kIGl0J3Mgbm90IHNob3duIGluIHRoZSBiZWxvdy4NCj4gU291bmRzIGxpa2UgYSBidWcsIHRo
b3VnaC4NCg0KLi4ub3IgYSBDdHJsLUMuLi4uDQoNCg==


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: NFS: nfs4_reclaim_open_state: Lock reclaim failed! log spew
  2016-11-17 17:45                 ` Trond Myklebust
@ 2016-11-17 19:32                   ` bfields
  2016-11-17 19:58                     ` Olga Kornievskaia
  0 siblings, 1 reply; 27+ messages in thread
From: bfields @ 2016-11-17 19:32 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: tibbs, linux-nfs

On Thu, Nov 17, 2016 at 05:45:52PM +0000, Trond Myklebust wrote:
> On Thu, 2016-11-17 at 11:31 -0500, J. Bruce Fields wrote:
> > On Wed, Nov 16, 2016 at 02:55:05PM -0600, Jason L Tibbitts III wrote:
> > > 
> > > I'm replying to a rather old message, but the issue has just now
> > > popped
> > > back up again.
> > > 
> > > To recap, a client stops being able to access _any_ mount on a
> > > particular server, and "NFS: nfs4_reclaim_open_state: Lock reclaim
> > > failed!" appears several hundred times per second in the kernel
> > > log.
> > > The load goes up by one for ever process attempting to access any
> > > mount
> > > from that particular server.  Mounts to other servers are fine, and
> > > other clients can mount things from that one server without
> > > problems.
> > > 
> > > When I kill every process keeping that particular mount active and
> > > then
> > > umount it, I see:
> > > 
> > > NFS: nfs4_reclaim_open_state: unhandled error -10068
> > 
> > NFS4ERR_RETRY_UNCACHED_REP.
> > 
> > So, you're using NFSv4.1 or 4.2, and the server thinks that the
> > client
> > has reused a (slot, sequence number) pair, but the server doesn't
> > have a
> > cached response to return.
> > 
> > Hard to know how that happened, and it's not shown in the below.
> > Sounds like a bug, though.
> 
> ...or a Ctrl-C....

How does that happen?

--b.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: NFS: nfs4_reclaim_open_state: Lock reclaim failed! log spew
  2016-11-17 19:32                   ` bfields
@ 2016-11-17 19:58                     ` Olga Kornievskaia
  2016-11-17 20:17                       ` bfields
  0 siblings, 1 reply; 27+ messages in thread
From: Olga Kornievskaia @ 2016-11-17 19:58 UTC (permalink / raw)
  To: bfields; +Cc: Trond Myklebust, tibbs, linux-nfs

On Thu, Nov 17, 2016 at 2:32 PM, bfields@fieldses.org
<bfields@fieldses.org> wrote:
> On Thu, Nov 17, 2016 at 05:45:52PM +0000, Trond Myklebust wrote:
>> On Thu, 2016-11-17 at 11:31 -0500, J. Bruce Fields wrote:
>> > On Wed, Nov 16, 2016 at 02:55:05PM -0600, Jason L Tibbitts III wrote:
>> > >
>> > > I'm replying to a rather old message, but the issue has just now
>> > > popped
>> > > back up again.
>> > >
>> > > To recap, a client stops being able to access _any_ mount on a
>> > > particular server, and "NFS: nfs4_reclaim_open_state: Lock reclaim
>> > > failed!" appears several hundred times per second in the kernel
>> > > log.
>> > > The load goes up by one for ever process attempting to access any
>> > > mount
>> > > from that particular server.  Mounts to other servers are fine, and
>> > > other clients can mount things from that one server without
>> > > problems.
>> > >
>> > > When I kill every process keeping that particular mount active and
>> > > then
>> > > umount it, I see:
>> > >
>> > > NFS: nfs4_reclaim_open_state: unhandled error -10068
>> >
>> > NFS4ERR_RETRY_UNCACHED_REP.
>> >
>> > So, you're using NFSv4.1 or 4.2, and the server thinks that the
>> > client
>> > has reused a (slot, sequence number) pair, but the server doesn't
>> > have a
>> > cached response to return.
>> >
>> > Hard to know how that happened, and it's not shown in the below.
>> > Sounds like a bug, though.
>>
>> ...or a Ctrl-C....
>
> How does that happen?
>

If I may chime in...

Bruce, when an application sends a Ctrl-C and clients's session slot
has sent out an RPC but didn't process the reply, the client doesn't
know if the server processed that sequence id or not. In that case,
the client doesn't increment the sequence number. Normally the client
would handle getting such an error by retrying again (and resetting
the slots) but I think during recovery operation the client handles
errors differently (by just erroring). I believe the reasoning that we
don't want to be stuck trying to recover from the recovery from the
recovery etc...

Jason,

The UNCACHED_REP error is really not interesting as it's a consequence
of you having a client that already failed with an error of "unable to
reclaim the locks". I'm surprised that the application doesn't error
at this point with EIO. But that aside, I think I've seen this kind of
behavior due to client't callback channel going down (and not replying
to the CB_RECALLs and then server revoking state).



> --b.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: NFS: nfs4_reclaim_open_state: Lock reclaim failed! log spew
  2016-11-17 19:58                     ` Olga Kornievskaia
@ 2016-11-17 20:17                       ` bfields
  2016-11-17 20:29                         ` Olga Kornievskaia
  0 siblings, 1 reply; 27+ messages in thread
From: bfields @ 2016-11-17 20:17 UTC (permalink / raw)
  To: Olga Kornievskaia; +Cc: Trond Myklebust, tibbs, linux-nfs

On Thu, Nov 17, 2016 at 02:58:12PM -0500, Olga Kornievskaia wrote:
> On Thu, Nov 17, 2016 at 2:32 PM, bfields@fieldses.org
> <bfields@fieldses.org> wrote:
> > On Thu, Nov 17, 2016 at 05:45:52PM +0000, Trond Myklebust wrote:
> >> On Thu, 2016-11-17 at 11:31 -0500, J. Bruce Fields wrote:
> >> > On Wed, Nov 16, 2016 at 02:55:05PM -0600, Jason L Tibbitts III wrote:
> >> > >
> >> > > I'm replying to a rather old message, but the issue has just now
> >> > > popped
> >> > > back up again.
> >> > >
> >> > > To recap, a client stops being able to access _any_ mount on a
> >> > > particular server, and "NFS: nfs4_reclaim_open_state: Lock reclaim
> >> > > failed!" appears several hundred times per second in the kernel
> >> > > log.
> >> > > The load goes up by one for ever process attempting to access any
> >> > > mount
> >> > > from that particular server.  Mounts to other servers are fine, and
> >> > > other clients can mount things from that one server without
> >> > > problems.
> >> > >
> >> > > When I kill every process keeping that particular mount active and
> >> > > then
> >> > > umount it, I see:
> >> > >
> >> > > NFS: nfs4_reclaim_open_state: unhandled error -10068
> >> >
> >> > NFS4ERR_RETRY_UNCACHED_REP.
> >> >
> >> > So, you're using NFSv4.1 or 4.2, and the server thinks that the
> >> > client
> >> > has reused a (slot, sequence number) pair, but the server doesn't
> >> > have a
> >> > cached response to return.
> >> >
> >> > Hard to know how that happened, and it's not shown in the below.
> >> > Sounds like a bug, though.
> >>
> >> ...or a Ctrl-C....
> >
> > How does that happen?
> >
> 
> If I may chime in...
> 
> Bruce, when an application sends a Ctrl-C and clients's session slot
> has sent out an RPC but didn't process the reply, the client doesn't
> know if the server processed that sequence id or not. In that case,
> the client doesn't increment the sequence number. Normally the client
> would handle getting such an error by retrying again (and resetting
> the slots) but I think during recovery operation the client handles
> errors differently (by just erroring). I believe the reasoning that we
> don't want to be stuck trying to recover from the recovery from the
> recovery etc...

So in that case the client can end up sending a different rpc reusing
the old slot and sequence number?

--b.

> 
> Jason,
> 
> The UNCACHED_REP error is really not interesting as it's a consequence
> of you having a client that already failed with an error of "unable to
> reclaim the locks". I'm surprised that the application doesn't error
> at this point with EIO. But that aside, I think I've seen this kind of
> behavior due to client't callback channel going down (and not replying
> to the CB_RECALLs and then server revoking state).
> 
> 
> 
> > --b.
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: NFS: nfs4_reclaim_open_state: Lock reclaim failed! log spew
  2016-11-17 17:08                 ` Jason L Tibbitts III
@ 2016-11-17 20:22                   ` Andrew W Elble
  0 siblings, 0 replies; 27+ messages in thread
From: Andrew W Elble @ 2016-11-17 20:22 UTC (permalink / raw)
  To: Jason L Tibbitts III; +Cc: J. Bruce Fields, linux-nfs


I've found this extremely useful on clients in tracking down 'lost' delegations.

echo "error != 0" | tee /sys/kernel/debug/tracing/events/nfs4/nfs4_delegreturn_exit/filter

...and then look in here:

cat /sys/kernel/debug/tracing/trace

(YMMV, not sure if this is going to work on your distro, debugfs etc)

There's still work to be done with nfsd4_delegreturn()
and revoked delegations serverside (as well as killing fh_verify() per
Bruce's earlier suggestions)

We've recently seen the server recall a delegation, revoke it, and then have the
client try to return it much later (because of an unknown slowness
issue) -- after the file had been deleted at the server.

Jason L Tibbitts III <tibbs@math.uh.edu> writes:

>>>>>> "JBF" == J Bruce Fields <bfields@fieldses.org> writes:
>
> JBF> So, you're using NFSv4.1 or 4.2, and the server thinks that the
> JBF> client has reused a (slot, sequence number) pair, but the server
> JBF> doesn't have a cached response to return.
>
> Thanks for the reply.  Sadly I don't understand all of it, but...
>
> JBF> Hard to know how that happened, and it's not shown in the below.
> JBF> Sounds like a bug, though.
>
> Yeah, I only found the problem after it was already happening, so
> obviously the beginning of the process is missing.  And sadly it's not
> something I can easily repeat, so short of running some continuous
> package capture (which would be hard since once this starts the traffic
> volume is huge) there's no easy way to see it.
>
> Is there any state on either the client or server that I could inspect
> which might give any hints?  I can add that to my notes in case this
> problem happens again.
>
> JBF> Recent clients will use sec=krb5 for certain state-related
> JBF> operations even if you mount with sec=sys, so it's still possible
> JBF> it could be involved here.
>
> On the server, the involved filesystem isn't exported with any sec=
> options, in case it matters.
>
> JBF> The SEQ4_STATUS_RECALLABLE_STATE_REVOKED flag set in the OPEN
> JBF> replies is also a sign something's gone wrong.  Apparently the
> JBF> server thinks the client has failed to return a delegation.
>
> I can't imagine how that might have happened.  There is nothing else
> NFS-related in the client's log besides the spew and that final line.
> There are some automount complaints about the user accessing directories
> that aren't in the map sources, and the usual random gssproxy noise
> which was fixed in Fedora 24.
>
> Currently the system is stable; it hasn't been rebooted since the
> problem occurred.  Everything cleared up once I was able to unmounted
> the problematic filesystem.
>
>  - J<
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

-- 
Andrew W. Elble
aweits@discipline.rit.edu
Infrastructure Engineer, Communications Technical Lead
Rochester Institute of Technology
PGP: BFAD 8461 4CCF DC95 DA2C B0EB 965B 082E 863E C912

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: NFS: nfs4_reclaim_open_state: Lock reclaim failed! log spew
  2016-11-17 20:17                       ` bfields
@ 2016-11-17 20:29                         ` Olga Kornievskaia
  2016-11-17 20:46                           ` bfields
  0 siblings, 1 reply; 27+ messages in thread
From: Olga Kornievskaia @ 2016-11-17 20:29 UTC (permalink / raw)
  To: bfields; +Cc: Trond Myklebust, tibbs, linux-nfs

On Thu, Nov 17, 2016 at 3:17 PM, bfields@fieldses.org
<bfields@fieldses.org> wrote:
> On Thu, Nov 17, 2016 at 02:58:12PM -0500, Olga Kornievskaia wrote:
>> On Thu, Nov 17, 2016 at 2:32 PM, bfields@fieldses.org
>> <bfields@fieldses.org> wrote:
>> > On Thu, Nov 17, 2016 at 05:45:52PM +0000, Trond Myklebust wrote:
>> >> On Thu, 2016-11-17 at 11:31 -0500, J. Bruce Fields wrote:
>> >> > On Wed, Nov 16, 2016 at 02:55:05PM -0600, Jason L Tibbitts III wrote:
>> >> > >
>> >> > > I'm replying to a rather old message, but the issue has just now
>> >> > > popped
>> >> > > back up again.
>> >> > >
>> >> > > To recap, a client stops being able to access _any_ mount on a
>> >> > > particular server, and "NFS: nfs4_reclaim_open_state: Lock reclaim
>> >> > > failed!" appears several hundred times per second in the kernel
>> >> > > log.
>> >> > > The load goes up by one for ever process attempting to access any
>> >> > > mount
>> >> > > from that particular server.  Mounts to other servers are fine, and
>> >> > > other clients can mount things from that one server without
>> >> > > problems.
>> >> > >
>> >> > > When I kill every process keeping that particular mount active and
>> >> > > then
>> >> > > umount it, I see:
>> >> > >
>> >> > > NFS: nfs4_reclaim_open_state: unhandled error -10068
>> >> >
>> >> > NFS4ERR_RETRY_UNCACHED_REP.
>> >> >
>> >> > So, you're using NFSv4.1 or 4.2, and the server thinks that the
>> >> > client
>> >> > has reused a (slot, sequence number) pair, but the server doesn't
>> >> > have a
>> >> > cached response to return.
>> >> >
>> >> > Hard to know how that happened, and it's not shown in the below.
>> >> > Sounds like a bug, though.
>> >>
>> >> ...or a Ctrl-C....
>> >
>> > How does that happen?
>> >
>>
>> If I may chime in...
>>
>> Bruce, when an application sends a Ctrl-C and clients's session slot
>> has sent out an RPC but didn't process the reply, the client doesn't
>> know if the server processed that sequence id or not. In that case,
>> the client doesn't increment the sequence number. Normally the client
>> would handle getting such an error by retrying again (and resetting
>> the slots) but I think during recovery operation the client handles
>> errors differently (by just erroring). I believe the reasoning that we
>> don't want to be stuck trying to recover from the recovery from the
>> recovery etc...
>
> So in that case the client can end up sending a different rpc reusing
> the old slot and sequence number?

Correct.

>>
>> Jason,
>>
>> The UNCACHED_REP error is really not interesting as it's a consequence
>> of you having a client that already failed with an error of "unable to
>> reclaim the locks". I'm surprised that the application doesn't error
>> at this point with EIO. But that aside, I think I've seen this kind of
>> behavior due to client't callback channel going down (and not replying
>> to the CB_RECALLs and then server revoking state).
>>
>>
>>
>> > --b.
>> > --
>> > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>> > the body of a message to majordomo@vger.kernel.org
>> > More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: NFS: nfs4_reclaim_open_state: Lock reclaim failed! log spew
  2016-11-17 20:29                         ` Olga Kornievskaia
@ 2016-11-17 20:46                           ` bfields
  2016-11-17 21:05                             ` Olga Kornievskaia
  0 siblings, 1 reply; 27+ messages in thread
From: bfields @ 2016-11-17 20:46 UTC (permalink / raw)
  To: Olga Kornievskaia; +Cc: Trond Myklebust, tibbs, linux-nfs

On Thu, Nov 17, 2016 at 03:29:11PM -0500, Olga Kornievskaia wrote:
> On Thu, Nov 17, 2016 at 3:17 PM, bfields@fieldses.org
> <bfields@fieldses.org> wrote:
> > On Thu, Nov 17, 2016 at 02:58:12PM -0500, Olga Kornievskaia wrote:
> >> On Thu, Nov 17, 2016 at 2:32 PM, bfields@fieldses.org
> >> <bfields@fieldses.org> wrote:
> >> > On Thu, Nov 17, 2016 at 05:45:52PM +0000, Trond Myklebust wrote:
> >> >> On Thu, 2016-11-17 at 11:31 -0500, J. Bruce Fields wrote:
> >> >> > On Wed, Nov 16, 2016 at 02:55:05PM -0600, Jason L Tibbitts III wrote:
> >> >> > >
> >> >> > > I'm replying to a rather old message, but the issue has just now
> >> >> > > popped
> >> >> > > back up again.
> >> >> > >
> >> >> > > To recap, a client stops being able to access _any_ mount on a
> >> >> > > particular server, and "NFS: nfs4_reclaim_open_state: Lock reclaim
> >> >> > > failed!" appears several hundred times per second in the kernel
> >> >> > > log.
> >> >> > > The load goes up by one for ever process attempting to access any
> >> >> > > mount
> >> >> > > from that particular server.  Mounts to other servers are fine, and
> >> >> > > other clients can mount things from that one server without
> >> >> > > problems.
> >> >> > >
> >> >> > > When I kill every process keeping that particular mount active and
> >> >> > > then
> >> >> > > umount it, I see:
> >> >> > >
> >> >> > > NFS: nfs4_reclaim_open_state: unhandled error -10068
> >> >> >
> >> >> > NFS4ERR_RETRY_UNCACHED_REP.
> >> >> >
> >> >> > So, you're using NFSv4.1 or 4.2, and the server thinks that the
> >> >> > client
> >> >> > has reused a (slot, sequence number) pair, but the server doesn't
> >> >> > have a
> >> >> > cached response to return.
> >> >> >
> >> >> > Hard to know how that happened, and it's not shown in the below.
> >> >> > Sounds like a bug, though.
> >> >>
> >> >> ...or a Ctrl-C....
> >> >
> >> > How does that happen?
> >> >
> >>
> >> If I may chime in...
> >>
> >> Bruce, when an application sends a Ctrl-C and clients's session slot
> >> has sent out an RPC but didn't process the reply, the client doesn't
> >> know if the server processed that sequence id or not. In that case,
> >> the client doesn't increment the sequence number. Normally the client
> >> would handle getting such an error by retrying again (and resetting
> >> the slots) but I think during recovery operation the client handles
> >> errors differently (by just erroring). I believe the reasoning that we
> >> don't want to be stuck trying to recover from the recovery from the
> >> recovery etc...
> >
> > So in that case the client can end up sending a different rpc reusing
> > the old slot and sequence number?
> 
> Correct.

So that could get UNCACHED_REP as the response.  But if you're very
unlucky, couldn't this also happen?:

	1) the compound previously sent on that slot was processed by
	the server and cached
	2) the compound you're sending now happens to have the same set
	of operations

with the result that the client doesn't detect that the reply was
actually to some other rpc, and instead it returns bad data to the
application?

--b.

> 
> >>
> >> Jason,
> >>
> >> The UNCACHED_REP error is really not interesting as it's a consequence
> >> of you having a client that already failed with an error of "unable to
> >> reclaim the locks". I'm surprised that the application doesn't error
> >> at this point with EIO. But that aside, I think I've seen this kind of
> >> behavior due to client't callback channel going down (and not replying
> >> to the CB_RECALLs and then server revoking state).
> >>
> >>
> >>
> >> > --b.
> >> > --
> >> > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> >> > the body of a message to majordomo@vger.kernel.org
> >> > More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: NFS: nfs4_reclaim_open_state: Lock reclaim failed! log spew
  2016-11-17 20:46                           ` bfields
@ 2016-11-17 21:05                             ` Olga Kornievskaia
  2016-11-17 21:26                               ` bfields
  0 siblings, 1 reply; 27+ messages in thread
From: Olga Kornievskaia @ 2016-11-17 21:05 UTC (permalink / raw)
  To: bfields; +Cc: Trond Myklebust, tibbs, linux-nfs

On Thu, Nov 17, 2016 at 3:46 PM, bfields@fieldses.org
<bfields@fieldses.org> wrote:
> On Thu, Nov 17, 2016 at 03:29:11PM -0500, Olga Kornievskaia wrote:
>> On Thu, Nov 17, 2016 at 3:17 PM, bfields@fieldses.org
>> <bfields@fieldses.org> wrote:
>> > On Thu, Nov 17, 2016 at 02:58:12PM -0500, Olga Kornievskaia wrote:
>> >> On Thu, Nov 17, 2016 at 2:32 PM, bfields@fieldses.org
>> >> <bfields@fieldses.org> wrote:
>> >> > On Thu, Nov 17, 2016 at 05:45:52PM +0000, Trond Myklebust wrote:
>> >> >> On Thu, 2016-11-17 at 11:31 -0500, J. Bruce Fields wrote:
>> >> >> > On Wed, Nov 16, 2016 at 02:55:05PM -0600, Jason L Tibbitts III wrote:
>> >> >> > >
>> >> >> > > I'm replying to a rather old message, but the issue has just now
>> >> >> > > popped
>> >> >> > > back up again.
>> >> >> > >
>> >> >> > > To recap, a client stops being able to access _any_ mount on a
>> >> >> > > particular server, and "NFS: nfs4_reclaim_open_state: Lock reclaim
>> >> >> > > failed!" appears several hundred times per second in the kernel
>> >> >> > > log.
>> >> >> > > The load goes up by one for ever process attempting to access any
>> >> >> > > mount
>> >> >> > > from that particular server.  Mounts to other servers are fine, and
>> >> >> > > other clients can mount things from that one server without
>> >> >> > > problems.
>> >> >> > >
>> >> >> > > When I kill every process keeping that particular mount active and
>> >> >> > > then
>> >> >> > > umount it, I see:
>> >> >> > >
>> >> >> > > NFS: nfs4_reclaim_open_state: unhandled error -10068
>> >> >> >
>> >> >> > NFS4ERR_RETRY_UNCACHED_REP.
>> >> >> >
>> >> >> > So, you're using NFSv4.1 or 4.2, and the server thinks that the
>> >> >> > client
>> >> >> > has reused a (slot, sequence number) pair, but the server doesn't
>> >> >> > have a
>> >> >> > cached response to return.
>> >> >> >
>> >> >> > Hard to know how that happened, and it's not shown in the below.
>> >> >> > Sounds like a bug, though.
>> >> >>
>> >> >> ...or a Ctrl-C....
>> >> >
>> >> > How does that happen?
>> >> >
>> >>
>> >> If I may chime in...
>> >>
>> >> Bruce, when an application sends a Ctrl-C and clients's session slot
>> >> has sent out an RPC but didn't process the reply, the client doesn't
>> >> know if the server processed that sequence id or not. In that case,
>> >> the client doesn't increment the sequence number. Normally the client
>> >> would handle getting such an error by retrying again (and resetting
>> >> the slots) but I think during recovery operation the client handles
>> >> errors differently (by just erroring). I believe the reasoning that we
>> >> don't want to be stuck trying to recover from the recovery from the
>> >> recovery etc...
>> >
>> > So in that case the client can end up sending a different rpc reusing
>> > the old slot and sequence number?
>>
>> Correct.
>
> So that could get UNCACHED_REP as the response.  But if you're very
> unlucky, couldn't this also happen?:
>
>         1) the compound previously sent on that slot was processed by
>         the server and cached
>         2) the compound you're sending now happens to have the same set
>         of operations
>
> with the result that the client doesn't detect that the reply was
> actually to some other rpc, and instead it returns bad data to the
> application?

If you are sending exactly the same operations and arguments, then why
is a reply from the cache would lead to bad data?

>
> --b.
>
>>
>> >>
>> >> Jason,
>> >>
>> >> The UNCACHED_REP error is really not interesting as it's a consequence
>> >> of you having a client that already failed with an error of "unable to
>> >> reclaim the locks". I'm surprised that the application doesn't error
>> >> at this point with EIO. But that aside, I think I've seen this kind of
>> >> behavior due to client't callback channel going down (and not replying
>> >> to the CB_RECALLs and then server revoking state).
>> >>
>> >>
>> >>
>> >> > --b.
>> >> > --
>> >> > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>> >> > the body of a message to majordomo@vger.kernel.org
>> >> > More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: NFS: nfs4_reclaim_open_state: Lock reclaim failed! log spew
  2016-11-17 21:05                             ` Olga Kornievskaia
@ 2016-11-17 21:26                               ` bfields
  2016-11-17 21:45                                 ` Trond Myklebust
  0 siblings, 1 reply; 27+ messages in thread
From: bfields @ 2016-11-17 21:26 UTC (permalink / raw)
  To: Olga Kornievskaia; +Cc: Trond Myklebust, tibbs, linux-nfs

On Thu, Nov 17, 2016 at 04:05:32PM -0500, Olga Kornievskaia wrote:
> On Thu, Nov 17, 2016 at 3:46 PM, bfields@fieldses.org
> <bfields@fieldses.org> wrote:
> > On Thu, Nov 17, 2016 at 03:29:11PM -0500, Olga Kornievskaia wrote:
> >> On Thu, Nov 17, 2016 at 3:17 PM, bfields@fieldses.org
> >> <bfields@fieldses.org> wrote:
> >> > On Thu, Nov 17, 2016 at 02:58:12PM -0500, Olga Kornievskaia wrote:
> >> >> On Thu, Nov 17, 2016 at 2:32 PM, bfields@fieldses.org
> >> >> <bfields@fieldses.org> wrote:
> >> >> > On Thu, Nov 17, 2016 at 05:45:52PM +0000, Trond Myklebust wrote:
> >> >> >> On Thu, 2016-11-17 at 11:31 -0500, J. Bruce Fields wrote:
> >> >> >> > On Wed, Nov 16, 2016 at 02:55:05PM -0600, Jason L Tibbitts III wrote:
> >> >> >> > >
> >> >> >> > > I'm replying to a rather old message, but the issue has just now
> >> >> >> > > popped
> >> >> >> > > back up again.
> >> >> >> > >
> >> >> >> > > To recap, a client stops being able to access _any_ mount on a
> >> >> >> > > particular server, and "NFS: nfs4_reclaim_open_state: Lock reclaim
> >> >> >> > > failed!" appears several hundred times per second in the kernel
> >> >> >> > > log.
> >> >> >> > > The load goes up by one for ever process attempting to access any
> >> >> >> > > mount
> >> >> >> > > from that particular server.  Mounts to other servers are fine, and
> >> >> >> > > other clients can mount things from that one server without
> >> >> >> > > problems.
> >> >> >> > >
> >> >> >> > > When I kill every process keeping that particular mount active and
> >> >> >> > > then
> >> >> >> > > umount it, I see:
> >> >> >> > >
> >> >> >> > > NFS: nfs4_reclaim_open_state: unhandled error -10068
> >> >> >> >
> >> >> >> > NFS4ERR_RETRY_UNCACHED_REP.
> >> >> >> >
> >> >> >> > So, you're using NFSv4.1 or 4.2, and the server thinks that the
> >> >> >> > client
> >> >> >> > has reused a (slot, sequence number) pair, but the server doesn't
> >> >> >> > have a
> >> >> >> > cached response to return.
> >> >> >> >
> >> >> >> > Hard to know how that happened, and it's not shown in the below.
> >> >> >> > Sounds like a bug, though.
> >> >> >>
> >> >> >> ...or a Ctrl-C....
> >> >> >
> >> >> > How does that happen?
> >> >> >
> >> >>
> >> >> If I may chime in...
> >> >>
> >> >> Bruce, when an application sends a Ctrl-C and clients's session slot
> >> >> has sent out an RPC but didn't process the reply, the client doesn't
> >> >> know if the server processed that sequence id or not. In that case,
> >> >> the client doesn't increment the sequence number. Normally the client
> >> >> would handle getting such an error by retrying again (and resetting
> >> >> the slots) but I think during recovery operation the client handles
> >> >> errors differently (by just erroring). I believe the reasoning that we
> >> >> don't want to be stuck trying to recover from the recovery from the
> >> >> recovery etc...
> >> >
> >> > So in that case the client can end up sending a different rpc reusing
> >> > the old slot and sequence number?
> >>
> >> Correct.
> >
> > So that could get UNCACHED_REP as the response.  But if you're very
> > unlucky, couldn't this also happen?:
> >
> >         1) the compound previously sent on that slot was processed by
> >         the server and cached
> >         2) the compound you're sending now happens to have the same set
> >         of operations
> >
> > with the result that the client doesn't detect that the reply was
> > actually to some other rpc, and instead it returns bad data to the
> > application?
> 
> If you are sending exactly the same operations and arguments, then why
> is a reply from the cache would lead to bad data?

That would probably be fine, I was wondering what would happen if you
sent the same operation but different arguments.

So the original cancelled operation is something like
PUTFH(fh1)+OPEN("foo")+GETFH, and the new one is
PUTFH(fh2)+OPEN("bar")+GETFH.  In theory couldn't the second one succeed
and leave the client thinking it had opened (fh2, bar) when the
filehandle it got back was really for (fh1, foo)?

--b.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: NFS: nfs4_reclaim_open_state: Lock reclaim failed! log spew
  2016-11-17 21:26                               ` bfields
@ 2016-11-17 21:45                                 ` Trond Myklebust
  2016-11-17 21:53                                   ` Olga Kornievskaia
  0 siblings, 1 reply; 27+ messages in thread
From: Trond Myklebust @ 2016-11-17 21:45 UTC (permalink / raw)
  To: bfields, aglo; +Cc: tibbs, linux-nfs

T24gVGh1LCAyMDE2LTExLTE3IGF0IDE2OjI2IC0wNTAwLCBiZmllbGRzQGZpZWxkc2VzLm9yZyB3
cm90ZToNCj4gT24gVGh1LCBOb3YgMTcsIDIwMTYgYXQgMDQ6MDU6MzJQTSAtMDUwMCwgT2xnYSBL
b3JuaWV2c2thaWEgd3JvdGU6DQo+ID4gDQo+ID4gT24gVGh1LCBOb3YgMTcsIDIwMTYgYXQgMzo0
NiBQTSwgYmZpZWxkc0BmaWVsZHNlcy5vcmcNCj4gPiA8YmZpZWxkc0BmaWVsZHNlcy5vcmc+IHdy
b3RlOg0KPiA+ID4gDQo+ID4gPiBPbiBUaHUsIE5vdiAxNywgMjAxNiBhdCAwMzoyOToxMVBNIC0w
NTAwLCBPbGdhIEtvcm5pZXZza2FpYQ0KPiA+ID4gd3JvdGU6DQo+ID4gPiA+IA0KPiA+ID4gPiBP
biBUaHUsIE5vdiAxNywgMjAxNiBhdCAzOjE3IFBNLCBiZmllbGRzQGZpZWxkc2VzLm9yZw0KPiA+
ID4gPiA8YmZpZWxkc0BmaWVsZHNlcy5vcmc+IHdyb3RlOg0KPiA+ID4gPiA+IA0KPiA+ID4gPiA+
IE9uIFRodSwgTm92IDE3LCAyMDE2IGF0IDAyOjU4OjEyUE0gLTA1MDAsIE9sZ2EgS29ybmlldnNr
YWlhDQo+ID4gPiA+ID4gd3JvdGU6DQo+ID4gPiA+ID4gPiANCj4gPiA+ID4gPiA+IE9uIFRodSwg
Tm92IDE3LCAyMDE2IGF0IDI6MzIgUE0sIGJmaWVsZHNAZmllbGRzZXMub3JnDQo+ID4gPiA+ID4g
PiA8YmZpZWxkc0BmaWVsZHNlcy5vcmc+IHdyb3RlOg0KPiA+ID4gPiA+ID4gPiANCj4gPiA+ID4g
PiA+ID4gT24gVGh1LCBOb3YgMTcsIDIwMTYgYXQgMDU6NDU6NTJQTSArMDAwMCwgVHJvbmQgTXlr
bGVidXN0DQo+ID4gPiA+ID4gPiA+IHdyb3RlOg0KPiA+ID4gPiA+ID4gPiA+IA0KPiA+ID4gPiA+
ID4gPiA+IE9uIFRodSwgMjAxNi0xMS0xNyBhdCAxMTozMSAtMDUwMCwgSi4gQnJ1Y2UgRmllbGRz
DQo+ID4gPiA+ID4gPiA+ID4gd3JvdGU6DQo+ID4gPiA+ID4gPiA+ID4gPiANCj4gPiA+ID4gPiA+
ID4gPiA+IE9uIFdlZCwgTm92IDE2LCAyMDE2IGF0IDAyOjU1OjA1UE0gLTA2MDAsIEphc29uIEwN
Cj4gPiA+ID4gPiA+ID4gPiA+IFRpYmJpdHRzIElJSSB3cm90ZToNCj4gPiA+ID4gPiA+ID4gPiA+
ID4gDQo+ID4gPiA+ID4gPiA+ID4gPiA+IA0KPiA+ID4gPiA+ID4gPiA+ID4gPiBJJ20gcmVwbHlp
bmcgdG8gYSByYXRoZXIgb2xkIG1lc3NhZ2UsIGJ1dCB0aGUgaXNzdWUNCj4gPiA+ID4gPiA+ID4g
PiA+ID4gaGFzIGp1c3Qgbm93DQo+ID4gPiA+ID4gPiA+ID4gPiA+IHBvcHBlZA0KPiA+ID4gPiA+
ID4gPiA+ID4gPiBiYWNrIHVwIGFnYWluLg0KPiA+ID4gPiA+ID4gPiA+ID4gPiANCj4gPiA+ID4g
PiA+ID4gPiA+ID4gVG8gcmVjYXAsIGEgY2xpZW50IHN0b3BzIGJlaW5nIGFibGUgdG8gYWNjZXNz
IF9hbnlfDQo+ID4gPiA+ID4gPiA+ID4gPiA+IG1vdW50IG9uIGENCj4gPiA+ID4gPiA+ID4gPiA+
ID4gcGFydGljdWxhciBzZXJ2ZXIsIGFuZCAiTkZTOg0KPiA+ID4gPiA+ID4gPiA+ID4gPiBuZnM0
X3JlY2xhaW1fb3Blbl9zdGF0ZTogTG9jayByZWNsYWltDQo+ID4gPiA+ID4gPiA+ID4gPiA+IGZh
aWxlZCEiIGFwcGVhcnMgc2V2ZXJhbCBodW5kcmVkIHRpbWVzIHBlciBzZWNvbmQNCj4gPiA+ID4g
PiA+ID4gPiA+ID4gaW4gdGhlIGtlcm5lbA0KPiA+ID4gPiA+ID4gPiA+ID4gPiBsb2cuDQo+ID4g
PiA+ID4gPiA+ID4gPiA+IFRoZSBsb2FkIGdvZXMgdXAgYnkgb25lIGZvciBldmVyIHByb2Nlc3Mg
YXR0ZW1wdGluZw0KPiA+ID4gPiA+ID4gPiA+ID4gPiB0byBhY2Nlc3MgYW55DQo+ID4gPiA+ID4g
PiA+ID4gPiA+IG1vdW50DQo+ID4gPiA+ID4gPiA+ID4gPiA+IGZyb20gdGhhdCBwYXJ0aWN1bGFy
IHNlcnZlci7CoMKgTW91bnRzIHRvIG90aGVyDQo+ID4gPiA+ID4gPiA+ID4gPiA+IHNlcnZlcnMg
YXJlIGZpbmUsIGFuZA0KPiA+ID4gPiA+ID4gPiA+ID4gPiBvdGhlciBjbGllbnRzIGNhbiBtb3Vu
dCB0aGluZ3MgZnJvbSB0aGF0IG9uZSBzZXJ2ZXINCj4gPiA+ID4gPiA+ID4gPiA+ID4gd2l0aG91
dA0KPiA+ID4gPiA+ID4gPiA+ID4gPiBwcm9ibGVtcy4NCj4gPiA+ID4gPiA+ID4gPiA+ID4gDQo+
ID4gPiA+ID4gPiA+ID4gPiA+IFdoZW4gSSBraWxsIGV2ZXJ5IHByb2Nlc3Mga2VlcGluZyB0aGF0
IHBhcnRpY3VsYXINCj4gPiA+ID4gPiA+ID4gPiA+ID4gbW91bnQgYWN0aXZlIGFuZA0KPiA+ID4g
PiA+ID4gPiA+ID4gPiB0aGVuDQo+ID4gPiA+ID4gPiA+ID4gPiA+IHVtb3VudCBpdCwgSSBzZWU6
DQo+ID4gPiA+ID4gPiA+ID4gPiA+IA0KPiA+ID4gPiA+ID4gPiA+ID4gPiBORlM6IG5mczRfcmVj
bGFpbV9vcGVuX3N0YXRlOiB1bmhhbmRsZWQgZXJyb3INCj4gPiA+ID4gPiA+ID4gPiA+ID4gLTEw
MDY4DQo+ID4gPiA+ID4gPiA+ID4gPiANCj4gPiA+ID4gPiA+ID4gPiA+IE5GUzRFUlJfUkVUUllf
VU5DQUNIRURfUkVQLg0KPiA+ID4gPiA+ID4gPiA+ID4gDQo+ID4gPiA+ID4gPiA+ID4gPiBTbywg
eW91J3JlIHVzaW5nIE5GU3Y0LjEgb3IgNC4yLCBhbmQgdGhlIHNlcnZlcg0KPiA+ID4gPiA+ID4g
PiA+ID4gdGhpbmtzIHRoYXQgdGhlDQo+ID4gPiA+ID4gPiA+ID4gPiBjbGllbnQNCj4gPiA+ID4g
PiA+ID4gPiA+IGhhcyByZXVzZWQgYSAoc2xvdCwgc2VxdWVuY2UgbnVtYmVyKSBwYWlyLCBidXQg
dGhlDQo+ID4gPiA+ID4gPiA+ID4gPiBzZXJ2ZXIgZG9lc24ndA0KPiA+ID4gPiA+ID4gPiA+ID4g
aGF2ZSBhDQo+ID4gPiA+ID4gPiA+ID4gPiBjYWNoZWQgcmVzcG9uc2UgdG8gcmV0dXJuLg0KPiA+
ID4gPiA+ID4gPiA+ID4gDQo+ID4gPiA+ID4gPiA+ID4gPiBIYXJkIHRvIGtub3cgaG93IHRoYXQg
aGFwcGVuZWQsIGFuZCBpdCdzIG5vdCBzaG93biBpbg0KPiA+ID4gPiA+ID4gPiA+ID4gdGhlIGJl
bG93Lg0KPiA+ID4gPiA+ID4gPiA+ID4gU291bmRzIGxpa2UgYSBidWcsIHRob3VnaC4NCj4gPiA+
ID4gPiA+ID4gPiANCj4gPiA+ID4gPiA+ID4gPiAuLi5vciBhIEN0cmwtQy4uLi4NCj4gPiA+ID4g
PiA+ID4gDQo+ID4gPiA+ID4gPiA+IEhvdyBkb2VzIHRoYXQgaGFwcGVuPw0KPiA+ID4gPiA+ID4g
PiANCj4gPiA+ID4gPiA+IA0KPiA+ID4gPiA+ID4gSWYgSSBtYXkgY2hpbWUgaW4uLi4NCj4gPiA+
ID4gPiA+IA0KPiA+ID4gPiA+ID4gQnJ1Y2UsIHdoZW4gYW4gYXBwbGljYXRpb24gc2VuZHMgYSBD
dHJsLUMgYW5kIGNsaWVudHMncw0KPiA+ID4gPiA+ID4gc2Vzc2lvbiBzbG90DQo+ID4gPiA+ID4g
PiBoYXMgc2VudCBvdXQgYW4gUlBDIGJ1dCBkaWRuJ3QgcHJvY2VzcyB0aGUgcmVwbHksIHRoZQ0K
PiA+ID4gPiA+ID4gY2xpZW50IGRvZXNuJ3QNCj4gPiA+ID4gPiA+IGtub3cgaWYgdGhlIHNlcnZl
ciBwcm9jZXNzZWQgdGhhdCBzZXF1ZW5jZSBpZCBvciBub3QuIEluDQo+ID4gPiA+ID4gPiB0aGF0
IGNhc2UsDQo+ID4gPiA+ID4gPiB0aGUgY2xpZW50IGRvZXNuJ3QgaW5jcmVtZW50IHRoZSBzZXF1
ZW5jZSBudW1iZXIuIE5vcm1hbGx5DQo+ID4gPiA+ID4gPiB0aGUgY2xpZW50DQo+ID4gPiA+ID4g
PiB3b3VsZCBoYW5kbGUgZ2V0dGluZyBzdWNoIGFuIGVycm9yIGJ5IHJldHJ5aW5nIGFnYWluIChh
bmQNCj4gPiA+ID4gPiA+IHJlc2V0dGluZw0KPiA+ID4gPiA+ID4gdGhlIHNsb3RzKSBidXQgSSB0
aGluayBkdXJpbmcgcmVjb3Zlcnkgb3BlcmF0aW9uIHRoZSBjbGllbnQNCj4gPiA+ID4gPiA+IGhh
bmRsZXMNCj4gPiA+ID4gPiA+IGVycm9ycyBkaWZmZXJlbnRseSAoYnkganVzdCBlcnJvcmluZyku
IEkgYmVsaWV2ZSB0aGUNCj4gPiA+ID4gPiA+IHJlYXNvbmluZyB0aGF0IHdlDQo+ID4gPiA+ID4g
PiBkb24ndCB3YW50IHRvIGJlIHN0dWNrIHRyeWluZyB0byByZWNvdmVyIGZyb20gdGhlIHJlY292
ZXJ5DQo+ID4gPiA+ID4gPiBmcm9tIHRoZQ0KPiA+ID4gPiA+ID4gcmVjb3ZlcnkgZXRjLi4uDQo+
ID4gPiA+ID4gDQo+ID4gPiA+ID4gU28gaW4gdGhhdCBjYXNlIHRoZSBjbGllbnQgY2FuIGVuZCB1
cCBzZW5kaW5nIGEgZGlmZmVyZW50IHJwYw0KPiA+ID4gPiA+IHJldXNpbmcNCj4gPiA+ID4gPiB0
aGUgb2xkIHNsb3QgYW5kIHNlcXVlbmNlIG51bWJlcj8NCj4gPiA+ID4gDQo+ID4gPiA+IENvcnJl
Y3QuDQo+ID4gPiANCj4gPiA+IFNvIHRoYXQgY291bGQgZ2V0IFVOQ0FDSEVEX1JFUCBhcyB0aGUg
cmVzcG9uc2UuwqDCoEJ1dCBpZiB5b3UncmUNCj4gPiA+IHZlcnkNCj4gPiA+IHVubHVja3ksIGNv
dWxkbid0IHRoaXMgYWxzbyBoYXBwZW4/Og0KPiA+ID4gDQo+ID4gPiDCoMKgwqDCoMKgwqDCoMKg
MSkgdGhlIGNvbXBvdW5kIHByZXZpb3VzbHkgc2VudCBvbiB0aGF0IHNsb3Qgd2FzDQo+ID4gPiBw
cm9jZXNzZWQgYnkNCj4gPiA+IMKgwqDCoMKgwqDCoMKgwqB0aGUgc2VydmVyIGFuZCBjYWNoZWQN
Cj4gPiA+IMKgwqDCoMKgwqDCoMKgwqAyKSB0aGUgY29tcG91bmQgeW91J3JlIHNlbmRpbmcgbm93
IGhhcHBlbnMgdG8gaGF2ZSB0aGUNCj4gPiA+IHNhbWUgc2V0DQo+ID4gPiDCoMKgwqDCoMKgwqDC
oMKgb2Ygb3BlcmF0aW9ucw0KPiA+ID4gDQo+ID4gPiB3aXRoIHRoZSByZXN1bHQgdGhhdCB0aGUg
Y2xpZW50IGRvZXNuJ3QgZGV0ZWN0IHRoYXQgdGhlIHJlcGx5IHdhcw0KPiA+ID4gYWN0dWFsbHkg
dG8gc29tZSBvdGhlciBycGMsIGFuZCBpbnN0ZWFkIGl0IHJldHVybnMgYmFkIGRhdGEgdG8NCj4g
PiA+IHRoZQ0KPiA+ID4gYXBwbGljYXRpb24/DQo+ID4gDQo+ID4gSWYgeW91IGFyZSBzZW5kaW5n
IGV4YWN0bHkgdGhlIHNhbWUgb3BlcmF0aW9ucyBhbmQgYXJndW1lbnRzLCB0aGVuDQo+ID4gd2h5
DQo+ID4gaXMgYSByZXBseSBmcm9tIHRoZSBjYWNoZSB3b3VsZCBsZWFkIHRvIGJhZCBkYXRhPw0K
PiANCj4gVGhhdCB3b3VsZCBwcm9iYWJseSBiZSBmaW5lLCBJIHdhcyB3b25kZXJpbmcgd2hhdCB3
b3VsZCBoYXBwZW4gaWYgeW91DQo+IHNlbnQgdGhlIHNhbWUgb3BlcmF0aW9uIGJ1dCBkaWZmZXJl
bnQgYXJndW1lbnRzLg0KDQo+IFNvIHRoZSBvcmlnaW5hbCBjYW5jZWxsZWQgb3BlcmF0aW9uIGlz
IHNvbWV0aGluZyBsaWtlDQo+IFBVVEZIKGZoMSkrT1BFTigiZm9vIikrR0VURkgsIGFuZCB0aGUg
bmV3IG9uZSBpcw0KPiBQVVRGSChmaDIpK09QRU4oImJhciIpK0dFVEZILsKgwqBJbiB0aGVvcnkg
Y291bGRuJ3QgdGhlIHNlY29uZCBvbmUNCj4gc3VjY2VlZA0KPiBhbmQgbGVhdmUgdGhlIGNsaWVu
dCB0aGlua2luZyBpdCBoYWQgb3BlbmVkIChmaDIsIGJhcikgd2hlbiB0aGUNCj4gZmlsZWhhbmRs
ZSBpdCBnb3QgYmFjayB3YXMgcmVhbGx5IGZvciAoZmgxLCBmb28pPw0KPsKgDQoNClRoZSBjbGll
bnQgd291bGQgcmVjZWl2ZSBhIGZpbGVoYW5kbGUgZm9yIGZoMS8iZm9vIiwgc28gaXQgd291bGQg
YXBwbHkNCmFueSBzdGF0ZSBpdCB0aG91Z2h0IGl0IGhhZCByZWNlaXZlZCB0byB0aGF0IGZpbGUu
IEhvd2V2ZXIsIG5vcm1hbGx5LA0KSSdkIGV4cGVjdCB0byBzZWUgYSBORlM0RVJSX0ZBTFNFX1JF
VFJZIGluIHRoaXMgY2FzZS4=


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: NFS: nfs4_reclaim_open_state: Lock reclaim failed! log spew
  2016-11-17 21:45                                 ` Trond Myklebust
@ 2016-11-17 21:53                                   ` Olga Kornievskaia
  2016-11-17 22:15                                     ` Trond Myklebust
  0 siblings, 1 reply; 27+ messages in thread
From: Olga Kornievskaia @ 2016-11-17 21:53 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: bfields, tibbs, linux-nfs

On Thu, Nov 17, 2016 at 4:45 PM, Trond Myklebust
<trondmy@primarydata.com> wrote:
> On Thu, 2016-11-17 at 16:26 -0500, bfields@fieldses.org wrote:
>> On Thu, Nov 17, 2016 at 04:05:32PM -0500, Olga Kornievskaia wrote:
>> >
>> > On Thu, Nov 17, 2016 at 3:46 PM, bfields@fieldses.org
>> > <bfields@fieldses.org> wrote:
>> > >
>> > > On Thu, Nov 17, 2016 at 03:29:11PM -0500, Olga Kornievskaia
>> > > wrote:
>> > > >
>> > > > On Thu, Nov 17, 2016 at 3:17 PM, bfields@fieldses.org
>> > > > <bfields@fieldses.org> wrote:
>> > > > >
>> > > > > On Thu, Nov 17, 2016 at 02:58:12PM -0500, Olga Kornievskaia
>> > > > > wrote:
>> > > > > >
>> > > > > > On Thu, Nov 17, 2016 at 2:32 PM, bfields@fieldses.org
>> > > > > > <bfields@fieldses.org> wrote:
>> > > > > > >
>> > > > > > > On Thu, Nov 17, 2016 at 05:45:52PM +0000, Trond Myklebust
>> > > > > > > wrote:
>> > > > > > > >
>> > > > > > > > On Thu, 2016-11-17 at 11:31 -0500, J. Bruce Fields
>> > > > > > > > wrote:
>> > > > > > > > >
>> > > > > > > > > On Wed, Nov 16, 2016 at 02:55:05PM -0600, Jason L
>> > > > > > > > > Tibbitts III wrote:
>> > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > I'm replying to a rather old message, but the issue
>> > > > > > > > > > has just now
>> > > > > > > > > > popped
>> > > > > > > > > > back up again.
>> > > > > > > > > >
>> > > > > > > > > > To recap, a client stops being able to access _any_
>> > > > > > > > > > mount on a
>> > > > > > > > > > particular server, and "NFS:
>> > > > > > > > > > nfs4_reclaim_open_state: Lock reclaim
>> > > > > > > > > > failed!" appears several hundred times per second
>> > > > > > > > > > in the kernel
>> > > > > > > > > > log.
>> > > > > > > > > > The load goes up by one for ever process attempting
>> > > > > > > > > > to access any
>> > > > > > > > > > mount
>> > > > > > > > > > from that particular server.  Mounts to other
>> > > > > > > > > > servers are fine, and
>> > > > > > > > > > other clients can mount things from that one server
>> > > > > > > > > > without
>> > > > > > > > > > problems.
>> > > > > > > > > >
>> > > > > > > > > > When I kill every process keeping that particular
>> > > > > > > > > > mount active and
>> > > > > > > > > > then
>> > > > > > > > > > umount it, I see:
>> > > > > > > > > >
>> > > > > > > > > > NFS: nfs4_reclaim_open_state: unhandled error
>> > > > > > > > > > -10068
>> > > > > > > > >
>> > > > > > > > > NFS4ERR_RETRY_UNCACHED_REP.
>> > > > > > > > >
>> > > > > > > > > So, you're using NFSv4.1 or 4.2, and the server
>> > > > > > > > > thinks that the
>> > > > > > > > > client
>> > > > > > > > > has reused a (slot, sequence number) pair, but the
>> > > > > > > > > server doesn't
>> > > > > > > > > have a
>> > > > > > > > > cached response to return.
>> > > > > > > > >
>> > > > > > > > > Hard to know how that happened, and it's not shown in
>> > > > > > > > > the below.
>> > > > > > > > > Sounds like a bug, though.
>> > > > > > > >
>> > > > > > > > ...or a Ctrl-C....
>> > > > > > >
>> > > > > > > How does that happen?
>> > > > > > >
>> > > > > >
>> > > > > > If I may chime in...
>> > > > > >
>> > > > > > Bruce, when an application sends a Ctrl-C and clients's
>> > > > > > session slot
>> > > > > > has sent out an RPC but didn't process the reply, the
>> > > > > > client doesn't
>> > > > > > know if the server processed that sequence id or not. In
>> > > > > > that case,
>> > > > > > the client doesn't increment the sequence number. Normally
>> > > > > > the client
>> > > > > > would handle getting such an error by retrying again (and
>> > > > > > resetting
>> > > > > > the slots) but I think during recovery operation the client
>> > > > > > handles
>> > > > > > errors differently (by just erroring). I believe the
>> > > > > > reasoning that we
>> > > > > > don't want to be stuck trying to recover from the recovery
>> > > > > > from the
>> > > > > > recovery etc...
>> > > > >
>> > > > > So in that case the client can end up sending a different rpc
>> > > > > reusing
>> > > > > the old slot and sequence number?
>> > > >
>> > > > Correct.
>> > >
>> > > So that could get UNCACHED_REP as the response.  But if you're
>> > > very
>> > > unlucky, couldn't this also happen?:
>> > >
>> > >         1) the compound previously sent on that slot was
>> > > processed by
>> > >         the server and cached
>> > >         2) the compound you're sending now happens to have the
>> > > same set
>> > >         of operations
>> > >
>> > > with the result that the client doesn't detect that the reply was
>> > > actually to some other rpc, and instead it returns bad data to
>> > > the
>> > > application?
>> >
>> > If you are sending exactly the same operations and arguments, then
>> > why
>> > is a reply from the cache would lead to bad data?
>>
>> That would probably be fine, I was wondering what would happen if you
>> sent the same operation but different arguments.
>
>> So the original cancelled operation is something like
>> PUTFH(fh1)+OPEN("foo")+GETFH, and the new one is
>> PUTFH(fh2)+OPEN("bar")+GETFH.  In theory couldn't the second one
>> succeed
>> and leave the client thinking it had opened (fh2, bar) when the
>> filehandle it got back was really for (fh1, foo)?
>>
>
> The client would receive a filehandle for fh1/"foo", so it would apply
> any state it thought it had received to that file. However, normally,
> I'd expect to see a NFS4ERR_FALSE_RETRY in this case.

I see Bruce's point that if the server only looks up the cache based
on the seqid and slot# and doesn't have like a hash of the content
which I could see is expensive, then the client in this case would get
into trouble of thinking it opened "bar" but really it's "foo". Spec
says:

Section 18.46.3
If the client reuses a slot ID and sequence ID for a completely
   different request, the server MAY treat the request as if it is a
   retry of what it has already executed.  The server MAY however detect
   the client's illegal reuse and return NFS4ERR_SEQ_FALSE_RETRY.

What is "a completely different request". From the client's point of
view sending different args would constitute a different request. But
in any case it's a "MAY" so client can't depend on this being
implemented.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: NFS: nfs4_reclaim_open_state: Lock reclaim failed! log spew
  2016-11-17 21:53                                   ` Olga Kornievskaia
@ 2016-11-17 22:15                                     ` Trond Myklebust
  2016-11-17 22:27                                       ` Olga Kornievskaia
  0 siblings, 1 reply; 27+ messages in thread
From: Trond Myklebust @ 2016-11-17 22:15 UTC (permalink / raw)
  To: aglo; +Cc: bfields, tibbs, linux-nfs

T24gVGh1LCAyMDE2LTExLTE3IGF0IDE2OjUzIC0wNTAwLCBPbGdhIEtvcm5pZXZza2FpYSB3cm90
ZToNCj4gT24gVGh1LCBOb3YgMTcsIDIwMTYgYXQgNDo0NSBQTSwgVHJvbmQgTXlrbGVidXN0DQo+
IDx0cm9uZG15QHByaW1hcnlkYXRhLmNvbT4gd3JvdGU6DQo+ID4gDQo+ID4gT24gVGh1LCAyMDE2
LTExLTE3IGF0IDE2OjI2IC0wNTAwLCBiZmllbGRzQGZpZWxkc2VzLm9yZyB3cm90ZToNCj4gPiA+
IA0KPiA+ID4gT24gVGh1LCBOb3YgMTcsIDIwMTYgYXQgMDQ6MDU6MzJQTSAtMDUwMCwgT2xnYSBL
b3JuaWV2c2thaWENCj4gPiA+IHdyb3RlOg0KPiA+ID4gPiANCj4gPiA+ID4gDQo+ID4gPiA+IE9u
IFRodSwgTm92IDE3LCAyMDE2IGF0IDM6NDYgUE0sIGJmaWVsZHNAZmllbGRzZXMub3JnDQo+ID4g
PiA+IDxiZmllbGRzQGZpZWxkc2VzLm9yZz4gd3JvdGU6DQo+ID4gPiA+ID4gDQo+ID4gPiA+ID4g
DQo+ID4gPiA+ID4gT24gVGh1LCBOb3YgMTcsIDIwMTYgYXQgMDM6Mjk6MTFQTSAtMDUwMCwgT2xn
YSBLb3JuaWV2c2thaWENCj4gPiA+ID4gPiB3cm90ZToNCj4gPiA+ID4gPiA+IA0KPiA+ID4gPiA+
ID4gDQo+ID4gPiA+ID4gPiBPbiBUaHUsIE5vdiAxNywgMjAxNiBhdCAzOjE3IFBNLCBiZmllbGRz
QGZpZWxkc2VzLm9yZw0KPiA+ID4gPiA+ID4gPGJmaWVsZHNAZmllbGRzZXMub3JnPiB3cm90ZToN
Cj4gPiA+ID4gPiA+ID4gDQo+ID4gPiA+ID4gPiA+IA0KPiA+ID4gPiA+ID4gPiBPbiBUaHUsIE5v
diAxNywgMjAxNiBhdCAwMjo1ODoxMlBNIC0wNTAwLCBPbGdhDQo+ID4gPiA+ID4gPiA+IEtvcm5p
ZXZza2FpYQ0KPiA+ID4gPiA+ID4gPiB3cm90ZToNCj4gPiA+ID4gPiA+ID4gPiANCj4gPiA+ID4g
PiA+ID4gPiANCj4gPiA+ID4gPiA+ID4gPiBPbiBUaHUsIE5vdiAxNywgMjAxNiBhdCAyOjMyIFBN
LCBiZmllbGRzQGZpZWxkc2VzLm9yZw0KPiA+ID4gPiA+ID4gPiA+IDxiZmllbGRzQGZpZWxkc2Vz
Lm9yZz4gd3JvdGU6DQo+ID4gPiA+ID4gPiA+ID4gPiANCj4gPiA+ID4gPiA+ID4gPiA+IA0KPiA+
ID4gPiA+ID4gPiA+ID4gT24gVGh1LCBOb3YgMTcsIDIwMTYgYXQgMDU6NDU6NTJQTSArMDAwMCwg
VHJvbmQNCj4gPiA+ID4gPiA+ID4gPiA+IE15a2xlYnVzdA0KPiA+ID4gPiA+ID4gPiA+ID4gd3Jv
dGU6DQo+ID4gPiA+ID4gPiA+ID4gPiA+IA0KPiA+ID4gPiA+ID4gPiA+ID4gPiANCj4gPiA+ID4g
PiA+ID4gPiA+ID4gT24gVGh1LCAyMDE2LTExLTE3IGF0IDExOjMxIC0wNTAwLCBKLiBCcnVjZSBG
aWVsZHMNCj4gPiA+ID4gPiA+ID4gPiA+ID4gd3JvdGU6DQo+ID4gPiA+ID4gPiA+ID4gPiA+ID4g
DQo+ID4gPiA+ID4gPiA+ID4gPiA+ID4gDQo+ID4gPiA+ID4gPiA+ID4gPiA+ID4gT24gV2VkLCBO
b3YgMTYsIDIwMTYgYXQgMDI6NTU6MDVQTSAtMDYwMCwgSmFzb24gTA0KPiA+ID4gPiA+ID4gPiA+
ID4gPiA+IFRpYmJpdHRzIElJSSB3cm90ZToNCj4gPiA+ID4gPiA+ID4gPiA+ID4gPiA+IA0KPiA+
ID4gPiA+ID4gPiA+ID4gPiA+ID4gDQo+ID4gPiA+ID4gPiA+ID4gPiA+ID4gPiANCj4gPiA+ID4g
PiA+ID4gPiA+ID4gPiA+IEknbSByZXBseWluZyB0byBhIHJhdGhlciBvbGQgbWVzc2FnZSwgYnV0
IHRoZQ0KPiA+ID4gPiA+ID4gPiA+ID4gPiA+ID4gaXNzdWUNCj4gPiA+ID4gPiA+ID4gPiA+ID4g
PiA+IGhhcyBqdXN0IG5vdw0KPiA+ID4gPiA+ID4gPiA+ID4gPiA+ID4gcG9wcGVkDQo+ID4gPiA+
ID4gPiA+ID4gPiA+ID4gPiBiYWNrIHVwIGFnYWluLg0KPiA+ID4gPiA+ID4gPiA+ID4gPiA+ID4g
DQo+ID4gPiA+ID4gPiA+ID4gPiA+ID4gPiBUbyByZWNhcCwgYSBjbGllbnQgc3RvcHMgYmVpbmcg
YWJsZSB0byBhY2Nlc3MNCj4gPiA+ID4gPiA+ID4gPiA+ID4gPiA+IF9hbnlfDQo+ID4gPiA+ID4g
PiA+ID4gPiA+ID4gPiBtb3VudCBvbiBhDQo+ID4gPiA+ID4gPiA+ID4gPiA+ID4gPiBwYXJ0aWN1
bGFyIHNlcnZlciwgYW5kICJORlM6DQo+ID4gPiA+ID4gPiA+ID4gPiA+ID4gPiBuZnM0X3JlY2xh
aW1fb3Blbl9zdGF0ZTogTG9jayByZWNsYWltDQo+ID4gPiA+ID4gPiA+ID4gPiA+ID4gPiBmYWls
ZWQhIiBhcHBlYXJzIHNldmVyYWwgaHVuZHJlZCB0aW1lcyBwZXINCj4gPiA+ID4gPiA+ID4gPiA+
ID4gPiA+IHNlY29uZA0KPiA+ID4gPiA+ID4gPiA+ID4gPiA+ID4gaW4gdGhlIGtlcm5lbA0KPiA+
ID4gPiA+ID4gPiA+ID4gPiA+ID4gbG9nLg0KPiA+ID4gPiA+ID4gPiA+ID4gPiA+ID4gVGhlIGxv
YWQgZ29lcyB1cCBieSBvbmUgZm9yIGV2ZXIgcHJvY2Vzcw0KPiA+ID4gPiA+ID4gPiA+ID4gPiA+
ID4gYXR0ZW1wdGluZw0KPiA+ID4gPiA+ID4gPiA+ID4gPiA+ID4gdG8gYWNjZXNzIGFueQ0KPiA+
ID4gPiA+ID4gPiA+ID4gPiA+ID4gbW91bnQNCj4gPiA+ID4gPiA+ID4gPiA+ID4gPiA+IGZyb20g
dGhhdCBwYXJ0aWN1bGFyIHNlcnZlci7CoMKgTW91bnRzIHRvIG90aGVyDQo+ID4gPiA+ID4gPiA+
ID4gPiA+ID4gPiBzZXJ2ZXJzIGFyZSBmaW5lLCBhbmQNCj4gPiA+ID4gPiA+ID4gPiA+ID4gPiA+
IG90aGVyIGNsaWVudHMgY2FuIG1vdW50IHRoaW5ncyBmcm9tIHRoYXQgb25lDQo+ID4gPiA+ID4g
PiA+ID4gPiA+ID4gPiBzZXJ2ZXINCj4gPiA+ID4gPiA+ID4gPiA+ID4gPiA+IHdpdGhvdXQNCj4g
PiA+ID4gPiA+ID4gPiA+ID4gPiA+IHByb2JsZW1zLg0KPiA+ID4gPiA+ID4gPiA+ID4gPiA+ID4g
DQo+ID4gPiA+ID4gPiA+ID4gPiA+ID4gPiBXaGVuIEkga2lsbCBldmVyeSBwcm9jZXNzIGtlZXBp
bmcgdGhhdA0KPiA+ID4gPiA+ID4gPiA+ID4gPiA+ID4gcGFydGljdWxhcg0KPiA+ID4gPiA+ID4g
PiA+ID4gPiA+ID4gbW91bnQgYWN0aXZlIGFuZA0KPiA+ID4gPiA+ID4gPiA+ID4gPiA+ID4gdGhl
bg0KPiA+ID4gPiA+ID4gPiA+ID4gPiA+ID4gdW1vdW50IGl0LCBJIHNlZToNCj4gPiA+ID4gPiA+
ID4gPiA+ID4gPiA+IA0KPiA+ID4gPiA+ID4gPiA+ID4gPiA+ID4gTkZTOiBuZnM0X3JlY2xhaW1f
b3Blbl9zdGF0ZTogdW5oYW5kbGVkIGVycm9yDQo+ID4gPiA+ID4gPiA+ID4gPiA+ID4gPiAtMTAw
NjgNCj4gPiA+ID4gPiA+ID4gPiA+ID4gPiANCj4gPiA+ID4gPiA+ID4gPiA+ID4gPiBORlM0RVJS
X1JFVFJZX1VOQ0FDSEVEX1JFUC4NCj4gPiA+ID4gPiA+ID4gPiA+ID4gPiANCj4gPiA+ID4gPiA+
ID4gPiA+ID4gPiBTbywgeW91J3JlIHVzaW5nIE5GU3Y0LjEgb3IgNC4yLCBhbmQgdGhlIHNlcnZl
cg0KPiA+ID4gPiA+ID4gPiA+ID4gPiA+IHRoaW5rcyB0aGF0IHRoZQ0KPiA+ID4gPiA+ID4gPiA+
ID4gPiA+IGNsaWVudA0KPiA+ID4gPiA+ID4gPiA+ID4gPiA+IGhhcyByZXVzZWQgYSAoc2xvdCwg
c2VxdWVuY2UgbnVtYmVyKSBwYWlyLCBidXQNCj4gPiA+ID4gPiA+ID4gPiA+ID4gPiB0aGUNCj4g
PiA+ID4gPiA+ID4gPiA+ID4gPiBzZXJ2ZXIgZG9lc24ndA0KPiA+ID4gPiA+ID4gPiA+ID4gPiA+
IGhhdmUgYQ0KPiA+ID4gPiA+ID4gPiA+ID4gPiA+IGNhY2hlZCByZXNwb25zZSB0byByZXR1cm4u
DQo+ID4gPiA+ID4gPiA+ID4gPiA+ID4gDQo+ID4gPiA+ID4gPiA+ID4gPiA+ID4gSGFyZCB0byBr
bm93IGhvdyB0aGF0IGhhcHBlbmVkLCBhbmQgaXQncyBub3QNCj4gPiA+ID4gPiA+ID4gPiA+ID4g
PiBzaG93biBpbg0KPiA+ID4gPiA+ID4gPiA+ID4gPiA+IHRoZSBiZWxvdy4NCj4gPiA+ID4gPiA+
ID4gPiA+ID4gPiBTb3VuZHMgbGlrZSBhIGJ1ZywgdGhvdWdoLg0KPiA+ID4gPiA+ID4gPiA+ID4g
PiANCj4gPiA+ID4gPiA+ID4gPiA+ID4gLi4ub3IgYSBDdHJsLUMuLi4uDQo+ID4gPiA+ID4gPiA+
ID4gPiANCj4gPiA+ID4gPiA+ID4gPiA+IEhvdyBkb2VzIHRoYXQgaGFwcGVuPw0KPiA+ID4gPiA+
ID4gPiA+ID4gDQo+ID4gPiA+ID4gPiA+ID4gDQo+ID4gPiA+ID4gPiA+ID4gSWYgSSBtYXkgY2hp
bWUgaW4uLi4NCj4gPiA+ID4gPiA+ID4gPiANCj4gPiA+ID4gPiA+ID4gPiBCcnVjZSwgd2hlbiBh
biBhcHBsaWNhdGlvbiBzZW5kcyBhIEN0cmwtQyBhbmQgY2xpZW50cydzDQo+ID4gPiA+ID4gPiA+
ID4gc2Vzc2lvbiBzbG90DQo+ID4gPiA+ID4gPiA+ID4gaGFzIHNlbnQgb3V0IGFuIFJQQyBidXQg
ZGlkbid0IHByb2Nlc3MgdGhlIHJlcGx5LCB0aGUNCj4gPiA+ID4gPiA+ID4gPiBjbGllbnQgZG9l
c24ndA0KPiA+ID4gPiA+ID4gPiA+IGtub3cgaWYgdGhlIHNlcnZlciBwcm9jZXNzZWQgdGhhdCBz
ZXF1ZW5jZSBpZCBvciBub3QuDQo+ID4gPiA+ID4gPiA+ID4gSW4NCj4gPiA+ID4gPiA+ID4gPiB0
aGF0IGNhc2UsDQo+ID4gPiA+ID4gPiA+ID4gdGhlIGNsaWVudCBkb2Vzbid0IGluY3JlbWVudCB0
aGUgc2VxdWVuY2UgbnVtYmVyLg0KPiA+ID4gPiA+ID4gPiA+IE5vcm1hbGx5DQo+ID4gPiA+ID4g
PiA+ID4gdGhlIGNsaWVudA0KPiA+ID4gPiA+ID4gPiA+IHdvdWxkIGhhbmRsZSBnZXR0aW5nIHN1
Y2ggYW4gZXJyb3IgYnkgcmV0cnlpbmcgYWdhaW4NCj4gPiA+ID4gPiA+ID4gPiAoYW5kDQo+ID4g
PiA+ID4gPiA+ID4gcmVzZXR0aW5nDQo+ID4gPiA+ID4gPiA+ID4gdGhlIHNsb3RzKSBidXQgSSB0
aGluayBkdXJpbmcgcmVjb3Zlcnkgb3BlcmF0aW9uIHRoZQ0KPiA+ID4gPiA+ID4gPiA+IGNsaWVu
dA0KPiA+ID4gPiA+ID4gPiA+IGhhbmRsZXMNCj4gPiA+ID4gPiA+ID4gPiBlcnJvcnMgZGlmZmVy
ZW50bHkgKGJ5IGp1c3QgZXJyb3JpbmcpLiBJIGJlbGlldmUgdGhlDQo+ID4gPiA+ID4gPiA+ID4g
cmVhc29uaW5nIHRoYXQgd2UNCj4gPiA+ID4gPiA+ID4gPiBkb24ndCB3YW50IHRvIGJlIHN0dWNr
IHRyeWluZyB0byByZWNvdmVyIGZyb20gdGhlDQo+ID4gPiA+ID4gPiA+ID4gcmVjb3ZlcnkNCj4g
PiA+ID4gPiA+ID4gPiBmcm9tIHRoZQ0KPiA+ID4gPiA+ID4gPiA+IHJlY292ZXJ5IGV0Yy4uLg0K
PiA+ID4gPiA+ID4gPiANCj4gPiA+ID4gPiA+ID4gU28gaW4gdGhhdCBjYXNlIHRoZSBjbGllbnQg
Y2FuIGVuZCB1cCBzZW5kaW5nIGEgZGlmZmVyZW50DQo+ID4gPiA+ID4gPiA+IHJwYw0KPiA+ID4g
PiA+ID4gPiByZXVzaW5nDQo+ID4gPiA+ID4gPiA+IHRoZSBvbGQgc2xvdCBhbmQgc2VxdWVuY2Ug
bnVtYmVyPw0KPiA+ID4gPiA+ID4gDQo+ID4gPiA+ID4gPiBDb3JyZWN0Lg0KPiA+ID4gPiA+IA0K
PiA+ID4gPiA+IFNvIHRoYXQgY291bGQgZ2V0IFVOQ0FDSEVEX1JFUCBhcyB0aGUgcmVzcG9uc2Uu
wqDCoEJ1dCBpZg0KPiA+ID4gPiA+IHlvdSdyZQ0KPiA+ID4gPiA+IHZlcnkNCj4gPiA+ID4gPiB1
bmx1Y2t5LCBjb3VsZG4ndCB0aGlzIGFsc28gaGFwcGVuPzoNCj4gPiA+ID4gPiANCj4gPiA+ID4g
PiDCoMKgwqDCoMKgwqDCoMKgMSkgdGhlIGNvbXBvdW5kIHByZXZpb3VzbHkgc2VudCBvbiB0aGF0
IHNsb3Qgd2FzDQo+ID4gPiA+ID4gcHJvY2Vzc2VkIGJ5DQo+ID4gPiA+ID4gwqDCoMKgwqDCoMKg
wqDCoHRoZSBzZXJ2ZXIgYW5kIGNhY2hlZA0KPiA+ID4gPiA+IMKgwqDCoMKgwqDCoMKgwqAyKSB0
aGUgY29tcG91bmQgeW91J3JlIHNlbmRpbmcgbm93IGhhcHBlbnMgdG8gaGF2ZQ0KPiA+ID4gPiA+
IHRoZQ0KPiA+ID4gPiA+IHNhbWUgc2V0DQo+ID4gPiA+ID4gwqDCoMKgwqDCoMKgwqDCoG9mIG9w
ZXJhdGlvbnMNCj4gPiA+ID4gPiANCj4gPiA+ID4gPiB3aXRoIHRoZSByZXN1bHQgdGhhdCB0aGUg
Y2xpZW50IGRvZXNuJ3QgZGV0ZWN0IHRoYXQgdGhlIHJlcGx5DQo+ID4gPiA+ID4gd2FzDQo+ID4g
PiA+ID4gYWN0dWFsbHkgdG8gc29tZSBvdGhlciBycGMsIGFuZCBpbnN0ZWFkIGl0IHJldHVybnMg
YmFkIGRhdGENCj4gPiA+ID4gPiB0bw0KPiA+ID4gPiA+IHRoZQ0KPiA+ID4gPiA+IGFwcGxpY2F0
aW9uPw0KPiA+ID4gPiANCj4gPiA+ID4gSWYgeW91IGFyZSBzZW5kaW5nIGV4YWN0bHkgdGhlIHNh
bWUgb3BlcmF0aW9ucyBhbmQgYXJndW1lbnRzLA0KPiA+ID4gPiB0aGVuDQo+ID4gPiA+IHdoeQ0K
PiA+ID4gPiBpcyBhIHJlcGx5IGZyb20gdGhlIGNhY2hlIHdvdWxkIGxlYWQgdG8gYmFkIGRhdGE/
DQo+ID4gPiANCj4gPiA+IFRoYXQgd291bGQgcHJvYmFibHkgYmUgZmluZSwgSSB3YXMgd29uZGVy
aW5nIHdoYXQgd291bGQgaGFwcGVuIGlmDQo+ID4gPiB5b3UNCj4gPiA+IHNlbnQgdGhlIHNhbWUg
b3BlcmF0aW9uIGJ1dCBkaWZmZXJlbnQgYXJndW1lbnRzLg0KPiA+IA0KPiA+ID4gDQo+ID4gPiBT
byB0aGUgb3JpZ2luYWwgY2FuY2VsbGVkIG9wZXJhdGlvbiBpcyBzb21ldGhpbmcgbGlrZQ0KPiA+
ID4gUFVURkgoZmgxKStPUEVOKCJmb28iKStHRVRGSCwgYW5kIHRoZSBuZXcgb25lIGlzDQo+ID4g
PiBQVVRGSChmaDIpK09QRU4oImJhciIpK0dFVEZILsKgwqBJbiB0aGVvcnkgY291bGRuJ3QgdGhl
IHNlY29uZCBvbmUNCj4gPiA+IHN1Y2NlZWQNCj4gPiA+IGFuZCBsZWF2ZSB0aGUgY2xpZW50IHRo
aW5raW5nIGl0IGhhZCBvcGVuZWQgKGZoMiwgYmFyKSB3aGVuIHRoZQ0KPiA+ID4gZmlsZWhhbmRs
ZSBpdCBnb3QgYmFjayB3YXMgcmVhbGx5IGZvciAoZmgxLCBmb28pPw0KPiA+ID4gDQo+ID4gDQo+
ID4gVGhlIGNsaWVudCB3b3VsZCByZWNlaXZlIGEgZmlsZWhhbmRsZSBmb3IgZmgxLyJmb28iLCBz
byBpdCB3b3VsZA0KPiA+IGFwcGx5DQo+ID4gYW55IHN0YXRlIGl0IHRob3VnaHQgaXQgaGFkIHJl
Y2VpdmVkIHRvIHRoYXQgZmlsZS4gSG93ZXZlciwNCj4gPiBub3JtYWxseSwNCj4gPiBJJ2QgZXhw
ZWN0IHRvIHNlZSBhIE5GUzRFUlJfRkFMU0VfUkVUUlkgaW4gdGhpcyBjYXNlLg0KPiANCj4gSSBz
ZWUgQnJ1Y2UncyBwb2ludCB0aGF0IGlmIHRoZSBzZXJ2ZXIgb25seSBsb29rcyB1cCB0aGUgY2Fj
aGUgYmFzZWQNCj4gb24gdGhlIHNlcWlkIGFuZCBzbG90IyBhbmQgZG9lc24ndCBoYXZlIGxpa2Ug
YSBoYXNoIG9mIHRoZSBjb250ZW50DQo+IHdoaWNoIEkgY291bGQgc2VlIGlzIGV4cGVuc2l2ZSwg
dGhlbiB0aGUgY2xpZW50IGluIHRoaXMgY2FzZSB3b3VsZA0KPiBnZXQNCj4gaW50byB0cm91Ymxl
IG9mIHRoaW5raW5nIGl0IG9wZW5lZCAiYmFyIiBidXQgcmVhbGx5IGl0J3MgImZvbyIuIFNwZWMN
Cj4gc2F5czoNCj4gDQo+IFNlY3Rpb24gMTguNDYuMw0KPiBJZiB0aGUgY2xpZW50IHJldXNlcyBh
IHNsb3QgSUQgYW5kIHNlcXVlbmNlIElEIGZvciBhIGNvbXBsZXRlbHkNCj4gwqDCoMKgZGlmZmVy
ZW50IHJlcXVlc3QsIHRoZSBzZXJ2ZXIgTUFZIHRyZWF0IHRoZSByZXF1ZXN0IGFzIGlmIGl0IGlz
IGENCj4gwqDCoMKgcmV0cnkgb2Ygd2hhdCBpdCBoYXMgYWxyZWFkeSBleGVjdXRlZC7CoMKgVGhl
IHNlcnZlciBNQVkgaG93ZXZlcg0KPiBkZXRlY3QNCj4gwqDCoMKgdGhlIGNsaWVudCdzIGlsbGVn
YWwgcmV1c2UgYW5kIHJldHVybiBORlM0RVJSX1NFUV9GQUxTRV9SRVRSWS4NCj4gDQo+IFdoYXQg
aXMgImEgY29tcGxldGVseSBkaWZmZXJlbnQgcmVxdWVzdCIuIEZyb20gdGhlIGNsaWVudCdzIHBv
aW50IG9mDQo+IHZpZXcgc2VuZGluZyBkaWZmZXJlbnQgYXJncyB3b3VsZCBjb25zdGl0dXRlIGEg
ZGlmZmVyZW50IHJlcXVlc3QuIEJ1dA0KPiBpbiBhbnkgY2FzZSBpdCdzIGEgIk1BWSIgc28gY2xp
ZW50IGNhbid0IGRlcGVuZCBvbiB0aGlzIGJlaW5nDQo+IGltcGxlbWVudGVkLg0KPiANCg0KV2hh
dCdzIHRoZSBhbHRlcm5hdGl2ZT8gQXNzdW1lIHRoZSBjbGllbnQgcHJlLWVtcHRpdmVseSBidW1w
cyB0aGUgc2VxaWQNCmluc3RlYWQgb2YgcmV0cnlpbmcsIHRoZW4gdGhlIHVzZXIgcHJlc3NlcyBD
dHJsLUMgYWdhaW4uIFJlcGVhdCBhIGZldw0KbW9yZSB0aW1lcy4gSG93IGRvIEkgbm93IHJlc3lu
YyB0aGUgc2VxaWRzIGJldHdlZW4gdGhlIGNsaWVudCBhbmQNCnNlcnZlciBvdGhlciB0aGFuIGJ5
IHRyYXNoaW5nIHRoZSBzZXNzaW9uPw==


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: NFS: nfs4_reclaim_open_state: Lock reclaim failed! log spew
  2016-11-17 22:15                                     ` Trond Myklebust
@ 2016-11-17 22:27                                       ` Olga Kornievskaia
  2016-11-17 22:43                                         ` Trond Myklebust
  0 siblings, 1 reply; 27+ messages in thread
From: Olga Kornievskaia @ 2016-11-17 22:27 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: bfields, tibbs, linux-nfs

On Thu, Nov 17, 2016 at 5:15 PM, Trond Myklebust
<trondmy@primarydata.com> wrote:
> On Thu, 2016-11-17 at 16:53 -0500, Olga Kornievskaia wrote:
>> On Thu, Nov 17, 2016 at 4:45 PM, Trond Myklebust
>> <trondmy@primarydata.com> wrote:
>> >
>> > On Thu, 2016-11-17 at 16:26 -0500, bfields@fieldses.org wrote:
>> > >
>> > > On Thu, Nov 17, 2016 at 04:05:32PM -0500, Olga Kornievskaia
>> > > wrote:
>> > > >
>> > > >
>> > > > On Thu, Nov 17, 2016 at 3:46 PM, bfields@fieldses.org
>> > > > <bfields@fieldses.org> wrote:
>> > > > >
>> > > > >
>> > > > > On Thu, Nov 17, 2016 at 03:29:11PM -0500, Olga Kornievskaia
>> > > > > wrote:
>> > > > > >
>> > > > > >
>> > > > > > On Thu, Nov 17, 2016 at 3:17 PM, bfields@fieldses.org
>> > > > > > <bfields@fieldses.org> wrote:
>> > > > > > >
>> > > > > > >
>> > > > > > > On Thu, Nov 17, 2016 at 02:58:12PM -0500, Olga
>> > > > > > > Kornievskaia
>> > > > > > > wrote:
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > On Thu, Nov 17, 2016 at 2:32 PM, bfields@fieldses.org
>> > > > > > > > <bfields@fieldses.org> wrote:
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > > On Thu, Nov 17, 2016 at 05:45:52PM +0000, Trond
>> > > > > > > > > Myklebust
>> > > > > > > > > wrote:
>> > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > On Thu, 2016-11-17 at 11:31 -0500, J. Bruce Fields
>> > > > > > > > > > wrote:
>> > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > > On Wed, Nov 16, 2016 at 02:55:05PM -0600, Jason L
>> > > > > > > > > > > Tibbitts III wrote:
>> > > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > > > I'm replying to a rather old message, but the
>> > > > > > > > > > > > issue
>> > > > > > > > > > > > has just now
>> > > > > > > > > > > > popped
>> > > > > > > > > > > > back up again.
>> > > > > > > > > > > >
>> > > > > > > > > > > > To recap, a client stops being able to access
>> > > > > > > > > > > > _any_
>> > > > > > > > > > > > mount on a
>> > > > > > > > > > > > particular server, and "NFS:
>> > > > > > > > > > > > nfs4_reclaim_open_state: Lock reclaim
>> > > > > > > > > > > > failed!" appears several hundred times per
>> > > > > > > > > > > > second
>> > > > > > > > > > > > in the kernel
>> > > > > > > > > > > > log.
>> > > > > > > > > > > > The load goes up by one for ever process
>> > > > > > > > > > > > attempting
>> > > > > > > > > > > > to access any
>> > > > > > > > > > > > mount
>> > > > > > > > > > > > from that particular server.  Mounts to other
>> > > > > > > > > > > > servers are fine, and
>> > > > > > > > > > > > other clients can mount things from that one
>> > > > > > > > > > > > server
>> > > > > > > > > > > > without
>> > > > > > > > > > > > problems.
>> > > > > > > > > > > >
>> > > > > > > > > > > > When I kill every process keeping that
>> > > > > > > > > > > > particular
>> > > > > > > > > > > > mount active and
>> > > > > > > > > > > > then
>> > > > > > > > > > > > umount it, I see:
>> > > > > > > > > > > >
>> > > > > > > > > > > > NFS: nfs4_reclaim_open_state: unhandled error
>> > > > > > > > > > > > -10068
>> > > > > > > > > > >
>> > > > > > > > > > > NFS4ERR_RETRY_UNCACHED_REP.
>> > > > > > > > > > >
>> > > > > > > > > > > So, you're using NFSv4.1 or 4.2, and the server
>> > > > > > > > > > > thinks that the
>> > > > > > > > > > > client
>> > > > > > > > > > > has reused a (slot, sequence number) pair, but
>> > > > > > > > > > > the
>> > > > > > > > > > > server doesn't
>> > > > > > > > > > > have a
>> > > > > > > > > > > cached response to return.
>> > > > > > > > > > >
>> > > > > > > > > > > Hard to know how that happened, and it's not
>> > > > > > > > > > > shown in
>> > > > > > > > > > > the below.
>> > > > > > > > > > > Sounds like a bug, though.
>> > > > > > > > > >
>> > > > > > > > > > ...or a Ctrl-C....
>> > > > > > > > >
>> > > > > > > > > How does that happen?
>> > > > > > > > >
>> > > > > > > >
>> > > > > > > > If I may chime in...
>> > > > > > > >
>> > > > > > > > Bruce, when an application sends a Ctrl-C and clients's
>> > > > > > > > session slot
>> > > > > > > > has sent out an RPC but didn't process the reply, the
>> > > > > > > > client doesn't
>> > > > > > > > know if the server processed that sequence id or not.
>> > > > > > > > In
>> > > > > > > > that case,
>> > > > > > > > the client doesn't increment the sequence number.
>> > > > > > > > Normally
>> > > > > > > > the client
>> > > > > > > > would handle getting such an error by retrying again
>> > > > > > > > (and
>> > > > > > > > resetting
>> > > > > > > > the slots) but I think during recovery operation the
>> > > > > > > > client
>> > > > > > > > handles
>> > > > > > > > errors differently (by just erroring). I believe the
>> > > > > > > > reasoning that we
>> > > > > > > > don't want to be stuck trying to recover from the
>> > > > > > > > recovery
>> > > > > > > > from the
>> > > > > > > > recovery etc...
>> > > > > > >
>> > > > > > > So in that case the client can end up sending a different
>> > > > > > > rpc
>> > > > > > > reusing
>> > > > > > > the old slot and sequence number?
>> > > > > >
>> > > > > > Correct.
>> > > > >
>> > > > > So that could get UNCACHED_REP as the response.  But if
>> > > > > you're
>> > > > > very
>> > > > > unlucky, couldn't this also happen?:
>> > > > >
>> > > > >         1) the compound previously sent on that slot was
>> > > > > processed by
>> > > > >         the server and cached
>> > > > >         2) the compound you're sending now happens to have
>> > > > > the
>> > > > > same set
>> > > > >         of operations
>> > > > >
>> > > > > with the result that the client doesn't detect that the reply
>> > > > > was
>> > > > > actually to some other rpc, and instead it returns bad data
>> > > > > to
>> > > > > the
>> > > > > application?
>> > > >
>> > > > If you are sending exactly the same operations and arguments,
>> > > > then
>> > > > why
>> > > > is a reply from the cache would lead to bad data?
>> > >
>> > > That would probably be fine, I was wondering what would happen if
>> > > you
>> > > sent the same operation but different arguments.
>> >
>> > >
>> > > So the original cancelled operation is something like
>> > > PUTFH(fh1)+OPEN("foo")+GETFH, and the new one is
>> > > PUTFH(fh2)+OPEN("bar")+GETFH.  In theory couldn't the second one
>> > > succeed
>> > > and leave the client thinking it had opened (fh2, bar) when the
>> > > filehandle it got back was really for (fh1, foo)?
>> > >
>> >
>> > The client would receive a filehandle for fh1/"foo", so it would
>> > apply
>> > any state it thought it had received to that file. However,
>> > normally,
>> > I'd expect to see a NFS4ERR_FALSE_RETRY in this case.
>>
>> I see Bruce's point that if the server only looks up the cache based
>> on the seqid and slot# and doesn't have like a hash of the content
>> which I could see is expensive, then the client in this case would
>> get
>> into trouble of thinking it opened "bar" but really it's "foo". Spec
>> says:
>>
>> Section 18.46.3
>> If the client reuses a slot ID and sequence ID for a completely
>>    different request, the server MAY treat the request as if it is a
>>    retry of what it has already executed.  The server MAY however
>> detect
>>    the client's illegal reuse and return NFS4ERR_SEQ_FALSE_RETRY.
>>
>> What is "a completely different request". From the client's point of
>> view sending different args would constitute a different request. But
>> in any case it's a "MAY" so client can't depend on this being
>> implemented.
>>
>
> What's the alternative? Assume the client pre-emptively bumps the seqid
> instead of retrying, then the user presses Ctrl-C again. Repeat a few
> more times. How do I now resync the seqids between the client and
> server other than by trashing the session?

I don't see any alternatives than to reset in that case. But I think
it's better then the possibility of accidentally opening a wrong file?

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: NFS: nfs4_reclaim_open_state: Lock reclaim failed! log spew
  2016-11-17 22:27                                       ` Olga Kornievskaia
@ 2016-11-17 22:43                                         ` Trond Myklebust
  2016-11-18 20:52                                           ` bfields
  0 siblings, 1 reply; 27+ messages in thread
From: Trond Myklebust @ 2016-11-17 22:43 UTC (permalink / raw)
  To: aglo; +Cc: bfields, tibbs, linux-nfs

T24gVGh1LCAyMDE2LTExLTE3IGF0IDE3OjI3IC0wNTAwLCBPbGdhIEtvcm5pZXZza2FpYSB3cm90
ZToNCj4gT24gVGh1LCBOb3YgMTcsIDIwMTYgYXQgNToxNSBQTSwgVHJvbmQgTXlrbGVidXN0DQo+
IDx0cm9uZG15QHByaW1hcnlkYXRhLmNvbT4gd3JvdGU6DQo+ID4gDQo+ID4gT24gVGh1LCAyMDE2
LTExLTE3IGF0IDE2OjUzIC0wNTAwLCBPbGdhIEtvcm5pZXZza2FpYSB3cm90ZToNCj4gPiA+IA0K
PiA+ID4gT24gVGh1LCBOb3YgMTcsIDIwMTYgYXQgNDo0NSBQTSwgVHJvbmQgTXlrbGVidXN0DQo+
ID4gPiA8dHJvbmRteUBwcmltYXJ5ZGF0YS5jb20+IHdyb3RlOg0KPiA+ID4gPiANCj4gPiA+ID4g
DQo+ID4gPiA+IE9uIFRodSwgMjAxNi0xMS0xNyBhdCAxNjoyNiAtMDUwMCwgYmZpZWxkc0BmaWVs
ZHNlcy5vcmcgd3JvdGU6DQo+ID4gPiA+ID4gDQo+ID4gPiA+ID4gDQo+ID4gPiA+ID4gT24gVGh1
LCBOb3YgMTcsIDIwMTYgYXQgMDQ6MDU6MzJQTSAtMDUwMCwgT2xnYSBLb3JuaWV2c2thaWENCj4g
PiA+ID4gPiB3cm90ZToNCj4gPiA+ID4gPiA+IA0KPiA+ID4gPiA+ID4gDQo+ID4gPiA+ID4gPiAN
Cj4gPiA+ID4gPiA+IE9uIFRodSwgTm92IDE3LCAyMDE2IGF0IDM6NDYgUE0sIGJmaWVsZHNAZmll
bGRzZXMub3JnDQo+ID4gPiA+ID4gPiA8YmZpZWxkc0BmaWVsZHNlcy5vcmc+IHdyb3RlOg0KPiA+
ID4gPiA+ID4gPiANCj4gPiA+ID4gPiA+ID4gDQo+ID4gPiA+ID4gPiA+IA0KPiA+ID4gPiA+ID4g
PiBPbiBUaHUsIE5vdiAxNywgMjAxNiBhdCAwMzoyOToxMVBNIC0wNTAwLCBPbGdhDQo+ID4gPiA+
ID4gPiA+IEtvcm5pZXZza2FpYQ0KPiA+ID4gPiA+ID4gPiB3cm90ZToNCj4gPiA+ID4gPiA+ID4g
PiANCj4gPiA+ID4gPiA+ID4gPiANCj4gPiA+ID4gPiA+ID4gPiANCj4gPiA+ID4gPiA+ID4gPiBP
biBUaHUsIE5vdiAxNywgMjAxNiBhdCAzOjE3IFBNLCBiZmllbGRzQGZpZWxkc2VzLm9yZw0KPiA+
ID4gPiA+ID4gPiA+IDxiZmllbGRzQGZpZWxkc2VzLm9yZz4gd3JvdGU6DQo+ID4gPiA+ID4gPiA+
ID4gPiANCj4gPiA+ID4gPiA+ID4gPiA+IA0KPiA+ID4gPiA+ID4gPiA+ID4gDQo+ID4gPiA+ID4g
PiA+ID4gPiBPbiBUaHUsIE5vdiAxNywgMjAxNiBhdCAwMjo1ODoxMlBNIC0wNTAwLCBPbGdhDQo+
ID4gPiA+ID4gPiA+ID4gPiBLb3JuaWV2c2thaWENCj4gPiA+ID4gPiA+ID4gPiA+IHdyb3RlOg0K
PiA+ID4gPiA+ID4gPiA+ID4gPiANCj4gPiA+ID4gPiA+ID4gPiA+ID4gDQo+ID4gPiA+ID4gPiA+
ID4gPiA+IA0KPiA+ID4gPiA+ID4gPiA+ID4gPiBPbiBUaHUsIE5vdiAxNywgMjAxNiBhdCAyOjMy
IFBNLCBiZmllbGRzQGZpZWxkc2VzLm8NCj4gPiA+ID4gPiA+ID4gPiA+ID4gcmcNCj4gPiA+ID4g
PiA+ID4gPiA+ID4gPGJmaWVsZHNAZmllbGRzZXMub3JnPiB3cm90ZToNCj4gPiA+ID4gPiA+ID4g
PiA+ID4gPiANCj4gPiA+ID4gPiA+ID4gPiA+ID4gPiANCj4gPiA+ID4gPiA+ID4gPiA+ID4gPiAN
Cj4gPiA+ID4gPiA+ID4gPiA+ID4gPiBPbiBUaHUsIE5vdiAxNywgMjAxNiBhdCAwNTo0NTo1MlBN
ICswMDAwLCBUcm9uZA0KPiA+ID4gPiA+ID4gPiA+ID4gPiA+IE15a2xlYnVzdA0KPiA+ID4gPiA+
ID4gPiA+ID4gPiA+IHdyb3RlOg0KPiA+ID4gPiA+ID4gPiA+ID4gPiA+ID4gDQo+ID4gPiA+ID4g
PiA+ID4gPiA+ID4gPiANCj4gPiA+ID4gPiA+ID4gPiA+ID4gPiA+IA0KPiA+ID4gPiA+ID4gPiA+
ID4gPiA+ID4gT24gVGh1LCAyMDE2LTExLTE3IGF0IDExOjMxIC0wNTAwLCBKLiBCcnVjZQ0KPiA+
ID4gPiA+ID4gPiA+ID4gPiA+ID4gRmllbGRzDQo+ID4gPiA+ID4gPiA+ID4gPiA+ID4gPiB3cm90
ZToNCj4gPiA+ID4gPiA+ID4gPiA+ID4gPiA+ID4gDQo+ID4gPiA+ID4gPiA+ID4gPiA+ID4gPiA+
IA0KPiA+ID4gPiA+ID4gPiA+ID4gPiA+ID4gPiANCj4gPiA+ID4gPiA+ID4gPiA+ID4gPiA+ID4g
T24gV2VkLCBOb3YgMTYsIDIwMTYgYXQgMDI6NTU6MDVQTSAtMDYwMCwNCj4gPiA+ID4gPiA+ID4g
PiA+ID4gPiA+ID4gSmFzb24gTA0KPiA+ID4gPiA+ID4gPiA+ID4gPiA+ID4gPiBUaWJiaXR0cyBJ
SUkgd3JvdGU6DQo+ID4gPiA+ID4gPiA+ID4gPiA+ID4gPiA+ID4gDQo+ID4gPiA+ID4gPiA+ID4g
PiA+ID4gPiA+ID4gDQo+ID4gPiA+ID4gPiA+ID4gPiA+ID4gPiA+ID4gDQo+ID4gPiA+ID4gPiA+
ID4gPiA+ID4gPiA+ID4gDQo+ID4gPiA+ID4gPiA+ID4gPiA+ID4gPiA+ID4gSSdtIHJlcGx5aW5n
IHRvIGEgcmF0aGVyIG9sZCBtZXNzYWdlLCBidXQNCj4gPiA+ID4gPiA+ID4gPiA+ID4gPiA+ID4g
PiB0aGUNCj4gPiA+ID4gPiA+ID4gPiA+ID4gPiA+ID4gPiBpc3N1ZQ0KPiA+ID4gPiA+ID4gPiA+
ID4gPiA+ID4gPiA+IGhhcyBqdXN0IG5vdw0KPiA+ID4gPiA+ID4gPiA+ID4gPiA+ID4gPiA+IHBv
cHBlZA0KPiA+ID4gPiA+ID4gPiA+ID4gPiA+ID4gPiA+IGJhY2sgdXAgYWdhaW4uDQo+ID4gPiA+
ID4gPiA+ID4gPiA+ID4gPiA+ID4gDQo+ID4gPiA+ID4gPiA+ID4gPiA+ID4gPiA+ID4gVG8gcmVj
YXAsIGEgY2xpZW50IHN0b3BzIGJlaW5nIGFibGUgdG8NCj4gPiA+ID4gPiA+ID4gPiA+ID4gPiA+
ID4gPiBhY2Nlc3MNCj4gPiA+ID4gPiA+ID4gPiA+ID4gPiA+ID4gPiBfYW55Xw0KPiA+ID4gPiA+
ID4gPiA+ID4gPiA+ID4gPiA+IG1vdW50IG9uIGENCj4gPiA+ID4gPiA+ID4gPiA+ID4gPiA+ID4g
PiBwYXJ0aWN1bGFyIHNlcnZlciwgYW5kICJORlM6DQo+ID4gPiA+ID4gPiA+ID4gPiA+ID4gPiA+
ID4gbmZzNF9yZWNsYWltX29wZW5fc3RhdGU6IExvY2sgcmVjbGFpbQ0KPiA+ID4gPiA+ID4gPiA+
ID4gPiA+ID4gPiA+IGZhaWxlZCEiIGFwcGVhcnMgc2V2ZXJhbCBodW5kcmVkIHRpbWVzIHBlcg0K
PiA+ID4gPiA+ID4gPiA+ID4gPiA+ID4gPiA+IHNlY29uZA0KPiA+ID4gPiA+ID4gPiA+ID4gPiA+
ID4gPiA+IGluIHRoZSBrZXJuZWwNCj4gPiA+ID4gPiA+ID4gPiA+ID4gPiA+ID4gPiBsb2cuDQo+
ID4gPiA+ID4gPiA+ID4gPiA+ID4gPiA+ID4gVGhlIGxvYWQgZ29lcyB1cCBieSBvbmUgZm9yIGV2
ZXIgcHJvY2Vzcw0KPiA+ID4gPiA+ID4gPiA+ID4gPiA+ID4gPiA+IGF0dGVtcHRpbmcNCj4gPiA+
ID4gPiA+ID4gPiA+ID4gPiA+ID4gPiB0byBhY2Nlc3MgYW55DQo+ID4gPiA+ID4gPiA+ID4gPiA+
ID4gPiA+ID4gbW91bnQNCj4gPiA+ID4gPiA+ID4gPiA+ID4gPiA+ID4gPiBmcm9tIHRoYXQgcGFy
dGljdWxhciBzZXJ2ZXIuwqDCoE1vdW50cyB0bw0KPiA+ID4gPiA+ID4gPiA+ID4gPiA+ID4gPiA+
IG90aGVyDQo+ID4gPiA+ID4gPiA+ID4gPiA+ID4gPiA+ID4gc2VydmVycyBhcmUgZmluZSwgYW5k
DQo+ID4gPiA+ID4gPiA+ID4gPiA+ID4gPiA+ID4gb3RoZXIgY2xpZW50cyBjYW4gbW91bnQgdGhp
bmdzIGZyb20gdGhhdA0KPiA+ID4gPiA+ID4gPiA+ID4gPiA+ID4gPiA+IG9uZQ0KPiA+ID4gPiA+
ID4gPiA+ID4gPiA+ID4gPiA+IHNlcnZlcg0KPiA+ID4gPiA+ID4gPiA+ID4gPiA+ID4gPiA+IHdp
dGhvdXQNCj4gPiA+ID4gPiA+ID4gPiA+ID4gPiA+ID4gPiBwcm9ibGVtcy4NCj4gPiA+ID4gPiA+
ID4gPiA+ID4gPiA+ID4gPiANCj4gPiA+ID4gPiA+ID4gPiA+ID4gPiA+ID4gPiBXaGVuIEkga2ls
bCBldmVyeSBwcm9jZXNzIGtlZXBpbmcgdGhhdA0KPiA+ID4gPiA+ID4gPiA+ID4gPiA+ID4gPiA+
IHBhcnRpY3VsYXINCj4gPiA+ID4gPiA+ID4gPiA+ID4gPiA+ID4gPiBtb3VudCBhY3RpdmUgYW5k
DQo+ID4gPiA+ID4gPiA+ID4gPiA+ID4gPiA+ID4gdGhlbg0KPiA+ID4gPiA+ID4gPiA+ID4gPiA+
ID4gPiA+IHVtb3VudCBpdCwgSSBzZWU6DQo+ID4gPiA+ID4gPiA+ID4gPiA+ID4gPiA+ID4gDQo+
ID4gPiA+ID4gPiA+ID4gPiA+ID4gPiA+ID4gTkZTOiBuZnM0X3JlY2xhaW1fb3Blbl9zdGF0ZTog
dW5oYW5kbGVkDQo+ID4gPiA+ID4gPiA+ID4gPiA+ID4gPiA+ID4gZXJyb3INCj4gPiA+ID4gPiA+
ID4gPiA+ID4gPiA+ID4gPiAtMTAwNjgNCj4gPiA+ID4gPiA+ID4gPiA+ID4gPiA+ID4gDQo+ID4g
PiA+ID4gPiA+ID4gPiA+ID4gPiA+IE5GUzRFUlJfUkVUUllfVU5DQUNIRURfUkVQLg0KPiA+ID4g
PiA+ID4gPiA+ID4gPiA+ID4gPiANCj4gPiA+ID4gPiA+ID4gPiA+ID4gPiA+ID4gU28sIHlvdSdy
ZSB1c2luZyBORlN2NC4xIG9yIDQuMiwgYW5kIHRoZQ0KPiA+ID4gPiA+ID4gPiA+ID4gPiA+ID4g
PiBzZXJ2ZXINCj4gPiA+ID4gPiA+ID4gPiA+ID4gPiA+ID4gdGhpbmtzIHRoYXQgdGhlDQo+ID4g
PiA+ID4gPiA+ID4gPiA+ID4gPiA+IGNsaWVudA0KPiA+ID4gPiA+ID4gPiA+ID4gPiA+ID4gPiBo
YXMgcmV1c2VkIGEgKHNsb3QsIHNlcXVlbmNlIG51bWJlcikgcGFpciwNCj4gPiA+ID4gPiA+ID4g
PiA+ID4gPiA+ID4gYnV0DQo+ID4gPiA+ID4gPiA+ID4gPiA+ID4gPiA+IHRoZQ0KPiA+ID4gPiA+
ID4gPiA+ID4gPiA+ID4gPiBzZXJ2ZXIgZG9lc24ndA0KPiA+ID4gPiA+ID4gPiA+ID4gPiA+ID4g
PiBoYXZlIGENCj4gPiA+ID4gPiA+ID4gPiA+ID4gPiA+ID4gY2FjaGVkIHJlc3BvbnNlIHRvIHJl
dHVybi4NCj4gPiA+ID4gPiA+ID4gPiA+ID4gPiA+ID4gDQo+ID4gPiA+ID4gPiA+ID4gPiA+ID4g
PiA+IEhhcmQgdG8ga25vdyBob3cgdGhhdCBoYXBwZW5lZCwgYW5kIGl0J3Mgbm90DQo+ID4gPiA+
ID4gPiA+ID4gPiA+ID4gPiA+IHNob3duIGluDQo+ID4gPiA+ID4gPiA+ID4gPiA+ID4gPiA+IHRo
ZSBiZWxvdy4NCj4gPiA+ID4gPiA+ID4gPiA+ID4gPiA+ID4gU291bmRzIGxpa2UgYSBidWcsIHRo
b3VnaC4NCj4gPiA+ID4gPiA+ID4gPiA+ID4gPiA+IA0KPiA+ID4gPiA+ID4gPiA+ID4gPiA+ID4g
Li4ub3IgYSBDdHJsLUMuLi4uDQo+ID4gPiA+ID4gPiA+ID4gPiA+ID4gDQo+ID4gPiA+ID4gPiA+
ID4gPiA+ID4gSG93IGRvZXMgdGhhdCBoYXBwZW4/DQo+ID4gPiA+ID4gPiA+ID4gPiA+ID4gDQo+
ID4gPiA+ID4gPiA+ID4gPiA+IA0KPiA+ID4gPiA+ID4gPiA+ID4gPiBJZiBJIG1heSBjaGltZSBp
bi4uLg0KPiA+ID4gPiA+ID4gPiA+ID4gPiANCj4gPiA+ID4gPiA+ID4gPiA+ID4gQnJ1Y2UsIHdo
ZW4gYW4gYXBwbGljYXRpb24gc2VuZHMgYSBDdHJsLUMgYW5kDQo+ID4gPiA+ID4gPiA+ID4gPiA+
IGNsaWVudHMncw0KPiA+ID4gPiA+ID4gPiA+ID4gPiBzZXNzaW9uIHNsb3QNCj4gPiA+ID4gPiA+
ID4gPiA+ID4gaGFzIHNlbnQgb3V0IGFuIFJQQyBidXQgZGlkbid0IHByb2Nlc3MgdGhlIHJlcGx5
LA0KPiA+ID4gPiA+ID4gPiA+ID4gPiB0aGUNCj4gPiA+ID4gPiA+ID4gPiA+ID4gY2xpZW50IGRv
ZXNuJ3QNCj4gPiA+ID4gPiA+ID4gPiA+ID4ga25vdyBpZiB0aGUgc2VydmVyIHByb2Nlc3NlZCB0
aGF0IHNlcXVlbmNlIGlkIG9yDQo+ID4gPiA+ID4gPiA+ID4gPiA+IG5vdC4NCj4gPiA+ID4gPiA+
ID4gPiA+ID4gSW4NCj4gPiA+ID4gPiA+ID4gPiA+ID4gdGhhdCBjYXNlLA0KPiA+ID4gPiA+ID4g
PiA+ID4gPiB0aGUgY2xpZW50IGRvZXNuJ3QgaW5jcmVtZW50IHRoZSBzZXF1ZW5jZSBudW1iZXIu
DQo+ID4gPiA+ID4gPiA+ID4gPiA+IE5vcm1hbGx5DQo+ID4gPiA+ID4gPiA+ID4gPiA+IHRoZSBj
bGllbnQNCj4gPiA+ID4gPiA+ID4gPiA+ID4gd291bGQgaGFuZGxlIGdldHRpbmcgc3VjaCBhbiBl
cnJvciBieSByZXRyeWluZw0KPiA+ID4gPiA+ID4gPiA+ID4gPiBhZ2Fpbg0KPiA+ID4gPiA+ID4g
PiA+ID4gPiAoYW5kDQo+ID4gPiA+ID4gPiA+ID4gPiA+IHJlc2V0dGluZw0KPiA+ID4gPiA+ID4g
PiA+ID4gPiB0aGUgc2xvdHMpIGJ1dCBJIHRoaW5rIGR1cmluZyByZWNvdmVyeSBvcGVyYXRpb24N
Cj4gPiA+ID4gPiA+ID4gPiA+ID4gdGhlDQo+ID4gPiA+ID4gPiA+ID4gPiA+IGNsaWVudA0KPiA+
ID4gPiA+ID4gPiA+ID4gPiBoYW5kbGVzDQo+ID4gPiA+ID4gPiA+ID4gPiA+IGVycm9ycyBkaWZm
ZXJlbnRseSAoYnkganVzdCBlcnJvcmluZykuIEkgYmVsaWV2ZQ0KPiA+ID4gPiA+ID4gPiA+ID4g
PiB0aGUNCj4gPiA+ID4gPiA+ID4gPiA+ID4gcmVhc29uaW5nIHRoYXQgd2UNCj4gPiA+ID4gPiA+
ID4gPiA+ID4gZG9uJ3Qgd2FudCB0byBiZSBzdHVjayB0cnlpbmcgdG8gcmVjb3ZlciBmcm9tIHRo
ZQ0KPiA+ID4gPiA+ID4gPiA+ID4gPiByZWNvdmVyeQ0KPiA+ID4gPiA+ID4gPiA+ID4gPiBmcm9t
IHRoZQ0KPiA+ID4gPiA+ID4gPiA+ID4gPiByZWNvdmVyeSBldGMuLi4NCj4gPiA+ID4gPiA+ID4g
PiA+IA0KPiA+ID4gPiA+ID4gPiA+ID4gU28gaW4gdGhhdCBjYXNlIHRoZSBjbGllbnQgY2FuIGVu
ZCB1cCBzZW5kaW5nIGENCj4gPiA+ID4gPiA+ID4gPiA+IGRpZmZlcmVudA0KPiA+ID4gPiA+ID4g
PiA+ID4gcnBjDQo+ID4gPiA+ID4gPiA+ID4gPiByZXVzaW5nDQo+ID4gPiA+ID4gPiA+ID4gPiB0
aGUgb2xkIHNsb3QgYW5kIHNlcXVlbmNlIG51bWJlcj8NCj4gPiA+ID4gPiA+ID4gPiANCj4gPiA+
ID4gPiA+ID4gPiBDb3JyZWN0Lg0KPiA+ID4gPiA+ID4gPiANCj4gPiA+ID4gPiA+ID4gU28gdGhh
dCBjb3VsZCBnZXQgVU5DQUNIRURfUkVQIGFzIHRoZSByZXNwb25zZS7CoMKgQnV0IGlmDQo+ID4g
PiA+ID4gPiA+IHlvdSdyZQ0KPiA+ID4gPiA+ID4gPiB2ZXJ5DQo+ID4gPiA+ID4gPiA+IHVubHVj
a3ksIGNvdWxkbid0IHRoaXMgYWxzbyBoYXBwZW4/Og0KPiA+ID4gPiA+ID4gPiANCj4gPiA+ID4g
PiA+ID4gwqDCoMKgwqDCoMKgwqDCoDEpIHRoZSBjb21wb3VuZCBwcmV2aW91c2x5IHNlbnQgb24g
dGhhdCBzbG90IHdhcw0KPiA+ID4gPiA+ID4gPiBwcm9jZXNzZWQgYnkNCj4gPiA+ID4gPiA+ID4g
wqDCoMKgwqDCoMKgwqDCoHRoZSBzZXJ2ZXIgYW5kIGNhY2hlZA0KPiA+ID4gPiA+ID4gPiDCoMKg
wqDCoMKgwqDCoMKgMikgdGhlIGNvbXBvdW5kIHlvdSdyZSBzZW5kaW5nIG5vdyBoYXBwZW5zIHRv
DQo+ID4gPiA+ID4gPiA+IGhhdmUNCj4gPiA+ID4gPiA+ID4gdGhlDQo+ID4gPiA+ID4gPiA+IHNh
bWUgc2V0DQo+ID4gPiA+ID4gPiA+IMKgwqDCoMKgwqDCoMKgwqBvZiBvcGVyYXRpb25zDQo+ID4g
PiA+ID4gPiA+IA0KPiA+ID4gPiA+ID4gPiB3aXRoIHRoZSByZXN1bHQgdGhhdCB0aGUgY2xpZW50
IGRvZXNuJ3QgZGV0ZWN0IHRoYXQgdGhlDQo+ID4gPiA+ID4gPiA+IHJlcGx5DQo+ID4gPiA+ID4g
PiA+IHdhcw0KPiA+ID4gPiA+ID4gPiBhY3R1YWxseSB0byBzb21lIG90aGVyIHJwYywgYW5kIGlu
c3RlYWQgaXQgcmV0dXJucyBiYWQNCj4gPiA+ID4gPiA+ID4gZGF0YQ0KPiA+ID4gPiA+ID4gPiB0
bw0KPiA+ID4gPiA+ID4gPiB0aGUNCj4gPiA+ID4gPiA+ID4gYXBwbGljYXRpb24/DQo+ID4gPiA+
ID4gPiANCj4gPiA+ID4gPiA+IElmIHlvdSBhcmUgc2VuZGluZyBleGFjdGx5IHRoZSBzYW1lIG9w
ZXJhdGlvbnMgYW5kDQo+ID4gPiA+ID4gPiBhcmd1bWVudHMsDQo+ID4gPiA+ID4gPiB0aGVuDQo+
ID4gPiA+ID4gPiB3aHkNCj4gPiA+ID4gPiA+IGlzIGEgcmVwbHkgZnJvbSB0aGUgY2FjaGUgd291
bGQgbGVhZCB0byBiYWQgZGF0YT8NCj4gPiA+ID4gPiANCj4gPiA+ID4gPiBUaGF0IHdvdWxkIHBy
b2JhYmx5IGJlIGZpbmUsIEkgd2FzIHdvbmRlcmluZyB3aGF0IHdvdWxkDQo+ID4gPiA+ID4gaGFw
cGVuIGlmDQo+ID4gPiA+ID4geW91DQo+ID4gPiA+ID4gc2VudCB0aGUgc2FtZSBvcGVyYXRpb24g
YnV0IGRpZmZlcmVudCBhcmd1bWVudHMuDQo+ID4gPiA+IA0KPiA+ID4gPiA+IA0KPiA+ID4gPiA+
IA0KPiA+ID4gPiA+IFNvIHRoZSBvcmlnaW5hbCBjYW5jZWxsZWQgb3BlcmF0aW9uIGlzIHNvbWV0
aGluZyBsaWtlDQo+ID4gPiA+ID4gUFVURkgoZmgxKStPUEVOKCJmb28iKStHRVRGSCwgYW5kIHRo
ZSBuZXcgb25lIGlzDQo+ID4gPiA+ID4gUFVURkgoZmgyKStPUEVOKCJiYXIiKStHRVRGSC7CoMKg
SW4gdGhlb3J5IGNvdWxkbid0IHRoZSBzZWNvbmQNCj4gPiA+ID4gPiBvbmUNCj4gPiA+ID4gPiBz
dWNjZWVkDQo+ID4gPiA+ID4gYW5kIGxlYXZlIHRoZSBjbGllbnQgdGhpbmtpbmcgaXQgaGFkIG9w
ZW5lZCAoZmgyLCBiYXIpIHdoZW4NCj4gPiA+ID4gPiB0aGUNCj4gPiA+ID4gPiBmaWxlaGFuZGxl
IGl0IGdvdCBiYWNrIHdhcyByZWFsbHkgZm9yIChmaDEsIGZvbyk/DQo+ID4gPiA+ID4gDQo+ID4g
PiA+IA0KPiA+ID4gPiBUaGUgY2xpZW50IHdvdWxkIHJlY2VpdmUgYSBmaWxlaGFuZGxlIGZvciBm
aDEvImZvbyIsIHNvIGl0DQo+ID4gPiA+IHdvdWxkDQo+ID4gPiA+IGFwcGx5DQo+ID4gPiA+IGFu
eSBzdGF0ZSBpdCB0aG91Z2h0IGl0IGhhZCByZWNlaXZlZCB0byB0aGF0IGZpbGUuIEhvd2V2ZXIs
DQo+ID4gPiA+IG5vcm1hbGx5LA0KPiA+ID4gPiBJJ2QgZXhwZWN0IHRvIHNlZSBhIE5GUzRFUlJf
RkFMU0VfUkVUUlkgaW4gdGhpcyBjYXNlLg0KPiA+ID4gDQo+ID4gPiBJIHNlZSBCcnVjZSdzIHBv
aW50IHRoYXQgaWYgdGhlIHNlcnZlciBvbmx5IGxvb2tzIHVwIHRoZSBjYWNoZQ0KPiA+ID4gYmFz
ZWQNCj4gPiA+IG9uIHRoZSBzZXFpZCBhbmQgc2xvdCMgYW5kIGRvZXNuJ3QgaGF2ZSBsaWtlIGEg
aGFzaCBvZiB0aGUNCj4gPiA+IGNvbnRlbnQNCj4gPiA+IHdoaWNoIEkgY291bGQgc2VlIGlzIGV4
cGVuc2l2ZSwgdGhlbiB0aGUgY2xpZW50IGluIHRoaXMgY2FzZQ0KPiA+ID4gd291bGQNCj4gPiA+
IGdldA0KPiA+ID4gaW50byB0cm91YmxlIG9mIHRoaW5raW5nIGl0IG9wZW5lZCAiYmFyIiBidXQg
cmVhbGx5IGl0J3MgImZvbyIuDQo+ID4gPiBTcGVjDQo+ID4gPiBzYXlzOg0KPiA+ID4gDQo+ID4g
PiBTZWN0aW9uIDE4LjQ2LjMNCj4gPiA+IElmIHRoZSBjbGllbnQgcmV1c2VzIGEgc2xvdCBJRCBh
bmQgc2VxdWVuY2UgSUQgZm9yIGEgY29tcGxldGVseQ0KPiA+ID4gwqDCoMKgZGlmZmVyZW50IHJl
cXVlc3QsIHRoZSBzZXJ2ZXIgTUFZIHRyZWF0IHRoZSByZXF1ZXN0IGFzIGlmIGl0DQo+ID4gPiBp
cyBhDQo+ID4gPiDCoMKgwqByZXRyeSBvZiB3aGF0IGl0IGhhcyBhbHJlYWR5IGV4ZWN1dGVkLsKg
wqBUaGUgc2VydmVyIE1BWSBob3dldmVyDQo+ID4gPiBkZXRlY3QNCj4gPiA+IMKgwqDCoHRoZSBj
bGllbnQncyBpbGxlZ2FsIHJldXNlIGFuZCByZXR1cm4gTkZTNEVSUl9TRVFfRkFMU0VfUkVUUlku
DQo+ID4gPiANCj4gPiA+IFdoYXQgaXMgImEgY29tcGxldGVseSBkaWZmZXJlbnQgcmVxdWVzdCIu
IEZyb20gdGhlIGNsaWVudCdzIHBvaW50DQo+ID4gPiBvZg0KPiA+ID4gdmlldyBzZW5kaW5nIGRp
ZmZlcmVudCBhcmdzIHdvdWxkIGNvbnN0aXR1dGUgYSBkaWZmZXJlbnQgcmVxdWVzdC4NCj4gPiA+
IEJ1dA0KPiA+ID4gaW4gYW55IGNhc2UgaXQncyBhICJNQVkiIHNvIGNsaWVudCBjYW4ndCBkZXBl
bmQgb24gdGhpcyBiZWluZw0KPiA+ID4gaW1wbGVtZW50ZWQuDQo+ID4gPiANCj4gPiANCj4gPiBX
aGF0J3MgdGhlIGFsdGVybmF0aXZlPyBBc3N1bWUgdGhlIGNsaWVudCBwcmUtZW1wdGl2ZWx5IGJ1
bXBzIHRoZQ0KPiA+IHNlcWlkDQo+ID4gaW5zdGVhZCBvZiByZXRyeWluZywgdGhlbiB0aGUgdXNl
ciBwcmVzc2VzIEN0cmwtQyBhZ2Fpbi4gUmVwZWF0IGENCj4gPiBmZXcNCj4gPiBtb3JlIHRpbWVz
LiBIb3cgZG8gSSBub3cgcmVzeW5jIHRoZSBzZXFpZHMgYmV0d2VlbiB0aGUgY2xpZW50IGFuZA0K
PiA+IHNlcnZlciBvdGhlciB0aGFuIGJ5IHRyYXNoaW5nIHRoZSBzZXNzaW9uPw0KPiANCj4gSSBk
b24ndCBzZWUgYW55IGFsdGVybmF0aXZlcyB0aGFuIHRvIHJlc2V0IGluIHRoYXQgY2FzZS4gQnV0
IEkgdGhpbmsNCj4gaXQncyBiZXR0ZXIgdGhlbiB0aGUgcG9zc2liaWxpdHkgb2YgYWNjaWRlbnRh
bGx5IG9wZW5pbmcgYSB3cm9uZw0KPiBmaWxlPw0KDQpUaGV5IHNvdW5kIGVxdWFsbHkgYmFkIHRv
IG1lIHdoaWNoIGlzIHdoeSBJJ20gbm90IHVuZGVyc3RhbmRpbmcgaG93IGENCnNlcnZlciB3b3Vs
ZCBmYWlsIHRvIGltcGxlbWVudCBzb21lIG1pbmltYWwgZm9ybSBvZiBmYWxzZSByZXRyeQ0KY2hl
Y2tpbmcuDQpUaGUgTGludXggTkZTdjMgRFJDIHdpbGwsIGZvciBpbnN0YW5jZSwgY2hlY2tzdW0g
YXQgbGVhc3Qgc29tZSBwYXJ0IG9mDQp0aGUgUlBDIGFyZ3VtZW50cyBmb3IgX2FsbF8gUlBDIGNh
bGxzLiBNb3N0IE5GU3Y0LnggY2xpZW50cyB3aWxsIG9ubHkNCmFzayB0aGF0IHlvdSBjaGVja3N1
bSB0aGUgbm9uLWlkZW1wb3RlbnQgUlBDIGNhbGxzLCB3aGljaCBzaWduaWZpY2FudGx5DQpjdXRz
IGRvd24gb24gdGhlIGNhbGN1bGF0aW9uIG92ZXJoZWFkLg0K


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: NFS: nfs4_reclaim_open_state: Lock reclaim failed! log spew
  2016-11-17 22:43                                         ` Trond Myklebust
@ 2016-11-18 20:52                                           ` bfields
  2016-11-18 22:44                                             ` Trond Myklebust
  0 siblings, 1 reply; 27+ messages in thread
From: bfields @ 2016-11-18 20:52 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: aglo, tibbs, linux-nfs

On Thu, Nov 17, 2016 at 10:43:58PM +0000, Trond Myklebust wrote:
> On Thu, 2016-11-17 at 17:27 -0500, Olga Kornievskaia wrote:
> > On Thu, Nov 17, 2016 at 5:15 PM, Trond Myklebust
> > <trondmy@primarydata.com> wrote:
> > > What's the alternative? Assume the client pre-emptively bumps the
> > > seqid
> > > instead of retrying, then the user presses Ctrl-C again. Repeat a
> > > few
> > > more times. How do I now resync the seqids between the client and
> > > server other than by trashing the session?
> > 
> > I don't see any alternatives than to reset in that case. But I think
> > it's better then the possibility of accidentally opening a wrong
> > file?

Remind me why you can't continue resending after the Ctrl-C?  (I thought
this was already done for some lock and other cases?)

> They sound equally bad to me which is why I'm not understanding how a
> server would fail to implement some minimal form of false retry
> checking.
> The Linux NFSv3 DRC will, for instance, checksum at least some part of
> the RPC arguments for _all_ RPC calls. Most NFSv4.x clients will only
> ask that you checksum the non-idempotent RPC calls, which significantly
> cuts down on the calculation overhead.

I'll look at adding checksumming, it shouldn't be hard.

--b.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: NFS: nfs4_reclaim_open_state: Lock reclaim failed! log spew
  2016-11-18 20:52                                           ` bfields
@ 2016-11-18 22:44                                             ` Trond Myklebust
  2016-11-21 18:37                                               ` Fields Bruce James
  0 siblings, 1 reply; 27+ messages in thread
From: Trond Myklebust @ 2016-11-18 22:44 UTC (permalink / raw)
  To: Fields Bruce James; +Cc: Kornievskaia Olga, tibbs, List Linux NFS Mailing

DQo+IE9uIE5vdiAxOCwgMjAxNiwgYXQgMTU6NTIsIGJmaWVsZHNAZmllbGRzZXMub3JnIHdyb3Rl
Og0KPiANCj4gT24gVGh1LCBOb3YgMTcsIDIwMTYgYXQgMTA6NDM6NThQTSArMDAwMCwgVHJvbmQg
TXlrbGVidXN0IHdyb3RlOg0KPj4gT24gVGh1LCAyMDE2LTExLTE3IGF0IDE3OjI3IC0wNTAwLCBP
bGdhIEtvcm5pZXZza2FpYSB3cm90ZToNCj4+PiBPbiBUaHUsIE5vdiAxNywgMjAxNiBhdCA1OjE1
IFBNLCBUcm9uZCBNeWtsZWJ1c3QNCj4+PiA8dHJvbmRteUBwcmltYXJ5ZGF0YS5jb20+IHdyb3Rl
Og0KPj4+PiBXaGF0J3MgdGhlIGFsdGVybmF0aXZlPyBBc3N1bWUgdGhlIGNsaWVudCBwcmUtZW1w
dGl2ZWx5IGJ1bXBzIHRoZQ0KPj4+PiBzZXFpZA0KPj4+PiBpbnN0ZWFkIG9mIHJldHJ5aW5nLCB0
aGVuIHRoZSB1c2VyIHByZXNzZXMgQ3RybC1DIGFnYWluLiBSZXBlYXQgYQ0KPj4+PiBmZXcNCj4+
Pj4gbW9yZSB0aW1lcy4gSG93IGRvIEkgbm93IHJlc3luYyB0aGUgc2VxaWRzIGJldHdlZW4gdGhl
IGNsaWVudCBhbmQNCj4+Pj4gc2VydmVyIG90aGVyIHRoYW4gYnkgdHJhc2hpbmcgdGhlIHNlc3Np
b24/DQo+Pj4gDQo+Pj4gSSBkb24ndCBzZWUgYW55IGFsdGVybmF0aXZlcyB0aGFuIHRvIHJlc2V0
IGluIHRoYXQgY2FzZS4gQnV0IEkgdGhpbmsNCj4+PiBpdCdzIGJldHRlciB0aGVuIHRoZSBwb3Nz
aWJpbGl0eSBvZiBhY2NpZGVudGFsbHkgb3BlbmluZyBhIHdyb25nDQo+Pj4gZmlsZT8NCj4gDQo+
IFJlbWluZCBtZSB3aHkgeW91IGNhbid0IGNvbnRpbnVlIHJlc2VuZGluZyBhZnRlciB0aGUgQ3Ry
bC1DPyAgKEkgdGhvdWdodA0KPiB0aGlzIHdhcyBhbHJlYWR5IGRvbmUgZm9yIHNvbWUgbG9jayBh
bmQgb3RoZXIgY2FzZXM/KQ0KDQpXZeKAmWQgaGF2ZSB0byBkbyBpdCBmb3IgYWxsIFJQQyBjYWxs
cywgd2hpY2ggbWVhbnMgdGhleSB3b3VsZCBhbGwgaGF2ZSB0byBiZSBjb252ZXJ0ZWQgdG8gbm90
IHVzZSB0aGUgc3RhY2suIFRoZSByZXN1bHRpbmcgYmVoYXZpb3VyIHdvdWxkIGFsc28gYmUgY29u
ZnVzaW5nLCBhcyBvcGVyYXRpb25zIHdvdWxkIGNvbXBsZXRlIG91dHNpZGUgb2YgbG9ja2luZyBy
dWxlcyBldGMuIFNvLCBmb3IgaW5zdGFuY2UsIHlvdSB3b3VsZCBiZSBzZWVpbmcgc3VjY2Vzc2Z1
bGx5IGxvb2tlZCB1cCBmaWxlcyBteXN0ZXJpb3VzbHkgZGlzYXBwZWFyaW5nIGFzIHRoZSBhc3lu
Y2hyb25vdXMgdW5saW5rKCkgb3BlcmF0aW9uIHRoYXQgd2FzIGludGVycnVwdGVkIGNvbXBsZXRl
cywgb3IgZGlyZWN0b3JpZXMgbXlzdGVyaW91c2x5IGdldHRpbmcgcmVuYW1lZC4NCg0KPiANCj4+
IFRoZXkgc291bmQgZXF1YWxseSBiYWQgdG8gbWUgd2hpY2ggaXMgd2h5IEknbSBub3QgdW5kZXJz
dGFuZGluZyBob3cgYQ0KPj4gc2VydmVyIHdvdWxkIGZhaWwgdG8gaW1wbGVtZW50IHNvbWUgbWlu
aW1hbCBmb3JtIG9mIGZhbHNlIHJldHJ5DQo+PiBjaGVja2luZy4NCj4+IFRoZSBMaW51eCBORlN2
MyBEUkMgd2lsbCwgZm9yIGluc3RhbmNlLCBjaGVja3N1bSBhdCBsZWFzdCBzb21lIHBhcnQgb2YN
Cj4+IHRoZSBSUEMgYXJndW1lbnRzIGZvciBfYWxsXyBSUEMgY2FsbHMuIE1vc3QgTkZTdjQueCBj
bGllbnRzIHdpbGwgb25seQ0KPj4gYXNrIHRoYXQgeW91IGNoZWNrc3VtIHRoZSBub24taWRlbXBv
dGVudCBSUEMgY2FsbHMsIHdoaWNoIHNpZ25pZmljYW50bHkNCj4+IGN1dHMgZG93biBvbiB0aGUg
Y2FsY3VsYXRpb24gb3ZlcmhlYWQuDQo+IA0KPiBJJ2xsIGxvb2sgYXQgYWRkaW5nIGNoZWNrc3Vt
bWluZywgaXQgc2hvdWxkbid0IGJlIGhhcmQuDQoNClRoYW5rcy4NCg0K


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: NFS: nfs4_reclaim_open_state: Lock reclaim failed! log spew
  2016-11-18 22:44                                             ` Trond Myklebust
@ 2016-11-21 18:37                                               ` Fields Bruce James
  0 siblings, 0 replies; 27+ messages in thread
From: Fields Bruce James @ 2016-11-21 18:37 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Kornievskaia Olga, tibbs, List Linux NFS Mailing

On Fri, Nov 18, 2016 at 10:44:52PM +0000, Trond Myklebust wrote:
> 
> > On Nov 18, 2016, at 15:52, bfields@fieldses.org wrote:
> > 
> > On Thu, Nov 17, 2016 at 10:43:58PM +0000, Trond Myklebust wrote:
> >> On Thu, 2016-11-17 at 17:27 -0500, Olga Kornievskaia wrote:
> >>> On Thu, Nov 17, 2016 at 5:15 PM, Trond Myklebust
> >>> <trondmy@primarydata.com> wrote:
> >>>> What's the alternative? Assume the client pre-emptively bumps the
> >>>> seqid
> >>>> instead of retrying, then the user presses Ctrl-C again. Repeat a
> >>>> few
> >>>> more times. How do I now resync the seqids between the client and
> >>>> server other than by trashing the session?
> >>> 
> >>> I don't see any alternatives than to reset in that case. But I think
> >>> it's better then the possibility of accidentally opening a wrong
> >>> file?
> > 
> > Remind me why you can't continue resending after the Ctrl-C?  (I thought
> > this was already done for some lock and other cases?)
> 
> We’d have to do it for all RPC calls, which means they would all have to be converted to not use the stack. The resulting behaviour would also be confusing, as operations would complete outside of locking rules etc. So, for instance, you would be seeing successfully looked up files mysteriously disappearing as the asynchronous unlink() operation that was interrupted completes, or directories mysteriously getting renamed.

We get that anyway, since the server may already be processing the rpc.

I'll admit it sounds complicated.

> >> They sound equally bad to me which is why I'm not understanding how a
> >> server would fail to implement some minimal form of false retry
> >> checking.
> >> The Linux NFSv3 DRC will, for instance, checksum at least some part of
> >> the RPC arguments for _all_ RPC calls. Most NFSv4.x clients will only
> >> ask that you checksum the non-idempotent RPC calls, which significantly
> >> cuts down on the calculation overhead.
> > 
> > I'll look at adding checksumming, it shouldn't be hard.
> 
> Thanks.

I doubt it's a complete fix, though.  Two calls with the same arguments
aren't necessarily the same call either, and it's the sequence id that's
supposed to distinguish the "same call" and "different call with same
argument" cases.

--b.

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2016-11-21 18:37 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-02-24 21:43 NFS: nfs4_reclaim_open_state: Lock reclaim failed! log spew Jason L Tibbitts III
2016-02-25 19:58 ` J. Bruce Fields
2016-02-29 23:06   ` Jason L Tibbitts III
2016-03-01  0:48     ` J. Bruce Fields
2016-03-01  0:53       ` Jason L Tibbitts III
2016-03-01  1:01         ` J. Bruce Fields
2016-03-01  1:03           ` Jason L Tibbitts III
2016-11-16 20:55             ` Jason L Tibbitts III
2016-11-17 16:31               ` J. Bruce Fields
2016-11-17 17:08                 ` Jason L Tibbitts III
2016-11-17 20:22                   ` Andrew W Elble
2016-11-17 17:45                 ` Trond Myklebust
2016-11-17 19:32                   ` bfields
2016-11-17 19:58                     ` Olga Kornievskaia
2016-11-17 20:17                       ` bfields
2016-11-17 20:29                         ` Olga Kornievskaia
2016-11-17 20:46                           ` bfields
2016-11-17 21:05                             ` Olga Kornievskaia
2016-11-17 21:26                               ` bfields
2016-11-17 21:45                                 ` Trond Myklebust
2016-11-17 21:53                                   ` Olga Kornievskaia
2016-11-17 22:15                                     ` Trond Myklebust
2016-11-17 22:27                                       ` Olga Kornievskaia
2016-11-17 22:43                                         ` Trond Myklebust
2016-11-18 20:52                                           ` bfields
2016-11-18 22:44                                             ` Trond Myklebust
2016-11-21 18:37                                               ` Fields Bruce James

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.