nfsd issue with a kerberized callback

* nfsd issue with a kerberized callback
@ 2018-04-16 19:48 Olga Kornievskaia
  2018-04-16 21:29 ` J. Bruce Fields
  0 siblings, 1 reply; 6+ messages in thread
From: Olga Kornievskaia @ 2018-04-16 19:48 UTC (permalink / raw)
  To: Bruce Fields; +Cc: linux-nfs, ng-linux-team

[-- Attachment #1: Type: text/plain, Size: 1919 bytes --]

Hi Bruce,

I have a failure from the Bakeathon that I’m investigating (this was going against the redhat-75 server; I’m not sure who was running it, but I believe it was a RHEL 7.5 server). I have a network trace, and I was wondering if you could help me understand what the server is doing.

The trace is attached. The parts I’d like to understand concern the kerberized backchannel for NFSv4.0.

The setup is a client that mounts with both v3 and v4, opens a file with one version, and appends to it with the other. Here the file is opened with 4.0 and gets a delegation; the client then tries to write with v3, and the server recalls the delegation.

The server issues a CB_NULL gss_init to try to establish a GSS context, but it does so twice, in frame 259 and frame 261. It’s weird that it does it twice, but OK.

Now, in frame 283, it sends CB_COMPOUND CB_RECALL.
And in frame 285 it sends CB_NULL with gss_data, with the CB_NULL as the payload. I think this is to establish the callback.

In frame 287, client responds with RPC accept state of 6000 (which I believe is "drop reply").
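For reference, here is a small Python sketch of the standard accept_stat values as I remember them from RFC 5531. The helper name `describe` is just an illustration, not part of any library. The point is only that 6000 falls outside the standard range, so whatever it means is implementation-specific rather than something the RPC spec defines.

```python
from enum import IntEnum


class AcceptStat(IntEnum):
    """Standard ONC RPC accept_stat values (RFC 5531)."""
    SUCCESS = 0
    PROG_UNAVAIL = 1
    PROG_MISMATCH = 2
    PROC_UNAVAIL = 3
    GARBAGE_ARGS = 4
    SYSTEM_ERR = 5


def describe(value):
    """Return the standard name for an accept_stat value, or 'non-standard'.

    Hypothetical helper for illustration only.
    """
    try:
        return AcceptStat(value).name
    except ValueError:
        return "non-standard"


standard = describe(0)
from_trace = describe(6000)  # the value seen in frame 287
```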

I believe what’s happening is that the client got a CB_RECALL before it received the CB_NULL that establishes the callback channel, so it’s just ignoring the CB_RECALL.

What happens later is that the server re-transmits the CB_COMPOUND, but the client replies out of its cache. What’s interesting is that by this time the CB_NULL that came after the CB_COMPOUND should have established the callback, so if the re-transmission had instead been a new CB_RECALL, I would think it would have succeeded. The server tries twice and then finally sets CB_PATH_DOWN on the RENEW that the client sends.
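To make the sequence I’m hypothesizing concrete, here is a toy Python model. This is only a sketch of my reading of the trace, not the actual client code: the class name, the XID values, and the "dropped"/"ok" strings are all made up for illustration. It assumes the client drops a CB_RECALL that arrives before the context is established, and answers any retransmission (same XID) from its reply cache.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCallbackClient:
    """Toy model of the client's callback service (hypothetical, not kernel code)."""
    context_ready: bool = False
    reply_cache: dict = field(default_factory=dict)  # xid -> cached reply

    def handle(self, xid, proc):
        # A retransmission reuses the XID, so it is answered from the cache
        # even if the context has been established in the meantime.
        if xid in self.reply_cache:
            return self.reply_cache[xid]
        if proc == "CB_NULL":
            # Assumption: this CB_NULL is what establishes the callback channel.
            self.context_ready = True
            reply = "ok"
        elif self.context_ready:
            reply = "ok"
        else:
            # Assumption: a callback arriving too early is dropped.
            reply = "dropped"
        self.reply_cache[xid] = reply
        return reply


client = ToyCallbackClient()
trace = [
    (1, "CB_RECALL"),  # frame 283: arrives before the channel is set up
    (2, "CB_NULL"),    # frame 285: establishes the channel
    (1, "CB_RECALL"),  # retransmission with the same XID
    (3, "CB_RECALL"),  # hypothetical fresh CB_RECALL with a new XID
]
results = [client.handle(xid, proc) for xid, proc in trace]
```

In this model the retransmission gets the cached "dropped" answer, while a fresh CB_RECALL with a new XID would go through, which is the behavior I’m asking about.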

Questions:
1. Do you see how CB_RECALL can travel before the callback is established?
2. Should the server do something else besides re-transmitting the CB_RECALL because it got this “drop reply” error code back?

[-- Attachment #2: nfstest_interop_20180329095958_2.cap --]
[-- Type: application/octet-stream, Size: 69418 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread