linux-nfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* nfsv4 client sending getattr RPC with filehandle length 0 to wrong server node
@ 2019-11-07  7:12 Ashish Sangwan
  0 siblings, 0 replies; only message in thread
From: Ashish Sangwan @ 2019-11-07  7:12 UTC (permalink / raw)
  To: linux-nfs

client kernel version: Linux version 2.6.32-504.el6.x86_64 and CentOS
release 6.6 (Final)

The following command failed: df /mnt/share_mount/nfsexport_2 -BG with
bad handle error
The pseudo root of the v4 tree is mounted at
NewFs-bd7c39.child4.afs.minerva.com:/ on /mnt/share_mount type nfs
(rw,vers=4,addr=10.53.86.99,clientaddr=10.46.187.210)

while nfsexport_2 is submounted:
/mnt/share_mount/nfsexport_2 from
NewFs-bd7c39.child4.afs.minerva.com:/nfsexport_2/
 Flags: rw,relatime,vers=4,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=10.46.187.210,minorversion=0,local_lock=none,addr=10.53.86.94

Looking at the network traces, there's a getattr request which wrongly
went to addr 10.53.86.99 with filehandle length 0, while all the
previous calls to nfsexport_2 were correctly sent to 10.53.86.94 after
the referral had been established.

Frame 11457: 182 bytes on wire (1456 bits), 182 bytes captured (1456 bits)
Ethernet II, Src: Nutanix_dd:f9:f1 (50:6b:8d:dd:f9:f1), Dst:
BigSwitc_08:5d:59 (5c:16:c7:08:5d:59)
Internet Protocol Version 4, Src: 10.46.187.210, Dst: 10.53.86.99
Transmission Control Protocol, Src Port: 852, Dst Port: 2049, Seq:
4169, Ack: 2209, Len: 116
Remote Procedure Call, Type:Call XID:0x547d9fd7
Network File System, Ops(2): PUTFH, GETATTR
    [Program Version: 4]
    [V4 Procedure: COMPOUND (1)]
    Tag: <EMPTY>
    minorversion: 0
    Operations (count: 2): PUTFH, GETATTR
        Opcode: PUTFH (22)
            FileHandle
                length: 0
        Opcode: GETATTR (9)
            Attr mask[0]: 0x00e00000 (Files_Avail, Files_Free, Files_Total)
            Attr mask[1]: 0x00001c00 (Space_Avail, Space_Free, Space_Total)
    [Main Opcode: GETATTR (9)]


Looking at the nfs client side debug logs, it changed the xprt *just*
for doing the STATFS call.
In the last successful call (18768) we can see the xprt as
ffff88007ce98800 which is connected to 94 while the call which hit
error has xprt ffff88007a74c800 which is connected to 99

kernel: NFS: nfs_update_inode(0:16/562949953421312 fh_crc=0xeab37a6a
ct=2 info=0x27e7f)
kernel: NFS: permission(0:16/562949953421312), mask=0x24, res=0
kernel: NFS: open dir(/)
kernel: RPC:       looking up Generic cred
kernel: NFS: revalidating (0:16/562949953421312)
kernel: RPC:       new task initialized, procpid 4056
kernel: RPC:       allocated task ffff88007b6aee80
kernel: RPC: 18768 __rpc_execute flags=0x80
kernel: RPC: 18768 call_start nfs4 proc GETATTR (sync)
kernel: RPC: 18768 call_reserve (status 0)
kernel: RPC: 18768 reserved req ffff88007a512a00 xid 87076d37
kernel: RPC:       wake_up_first(ffff88007ce98990 "xprt_sending")
kernel: RPC: 18768 call_reserveresult (status 0)
kernel: RPC: 18768 call_refresh (status 0)
kernel: RPC: 18768 looking up UNIX cred
kernel: RPC:       looking up UNIX cred
kernel: RPC: 18768 refreshing UNIX cred ffff880079e543c0
kernel: RPC: 18768 call_refreshresult (status 0)
kernel: RPC: 18768 call_allocate (status 0)
kernel: RPC: 18768 allocated buffer of size 1060 at ffff88007cb68800
kernel: RPC: 18768 call_bind (status 0)
kernel: RPC: 18768 call_connect xprt ffff88007ce98800 is connected
kernel: RPC: 18768 call_transmit (status 0)
kernel: RPC: 18768 xprt_prepare_transmit
kernel: RPC: 18768 rpc_xdr_encode (status 0)
kernel: RPC: 18768 marshaling UNIX cred ffff880079e543c0
kernel: RPC: 18768 using AUTH_UNIX cred ffff880079e543c0 to wrap rpc data
kernel: encode_compound: tag=
kernel: RPC: 18768 xprt_transmit(140)
kernel: RPC:       xs_tcp_send_request(140) = 140
kernel: RPC: 18768 xmit complete
kernel: RPC: 18768 sleep_on(queue "xprt_pending" time 4298144477)
kernel: RPC: 18768 added to queue ffff88007ce98a58 "xprt_pending"
kernel: RPC: 18768 setting alarm for 60000 ms
kernel: RPC:       wake_up_first(ffff88007ce98990 "xprt_sending")
kernel: RPC: 18768 sync task going to sleep
kernel: RPC:       xs_tcp_data_ready...
kernel: RPC:       xs_tcp_data_recv started
kernel: RPC:       reading TCP record fragment of length 188
kernel: RPC:       reading XID (4 bytes)
kernel: RPC:       reading reply for XID 87076d37
kernel: RPC:       reading CALL/REPLY flag (4 bytes)
kernel: RPC:       read reply XID 87076d37
kernel: RPC:       XID 87076d37 read 180 bytes
kernel: RPC:       xprt = ffff88007ce98800, tcp_copied = 188,
tcp_offset = 188, tcp_reclen = 188
kernel: RPC: 18768 xid 87076d37 complete (188 bytes received)
kernel: RPC: 18768 __rpc_wake_up_task (now 4298144478)
kernel: RPC: 18768 disabling timer
kernel: RPC: 18768 removed from queue ffff88007ce98a58 "xprt_pending"
kernel: RPC:       __rpc_wake_up_task done
kernel: RPC:       xs_tcp_data_recv done
kernel: RPC: 18768 sync task resuming
kernel: RPC: 18768 call_status (status 188)
kernel: RPC: 18768 call_decode (status 188)
kernel: RPC: 18768 validating UNIX cred ffff880079e543c0
kernel: RPC: 18768 using AUTH_UNIX cred ffff880079e543c0 to unwrap rpc data
kernel: decode_attr_type: type=040000
kernel: decode_attr_change: change attribute=1572863231723295000
kernel: decode_attr_size: file size=5
kernel: decode_attr_fsid: fsid=(0x564673e3cfd41261/0xe48e57b33609a6a7)
kernel: decode_attr_fileid: fileid=562949953421312
kernel: decode_attr_fs_locations: fs_locations done, error = 0
kernel: decode_attr_mode: file mode=01777
kernel: decode_attr_nlink: nlink=2
kernel: decode_attr_owner: uid=0
kernel: decode_attr_group: gid=0
kernel: decode_attr_rdev: rdev=(0x0:0x0)
kernel: decode_attr_space_used: space used=512
kernel: decode_attr_time_access: atime=1572862994
kernel: decode_attr_time_metadata: ctime=1572863231
kernel: decode_attr_time_modify: mtime=1572863231
kernel: decode_attr_mounted_on_fileid: fileid=0
kernel: decode_getfattr_attrs: xdr returned 0
kernel: decode_getfattr_generic: xdr returned 0
kernel: RPC: 18768 call_decode result 0
kernel: RPC: 18768 return 0, status 0
kernel: RPC: 18768 release task
kernel: RPC:       freeing buffer of size 1060 at ffff88007cb68800
kernel: RPC: 18768 release request ffff88007a512a00
kernel: RPC:       wake_up_first(ffff88007ce98b20 "xprt_backlog")
kernel: RPC:       rpc_release_client(ffff880079cfce00)
kernel: RPC: 18768 freeing task
kernel: NFS: nfs_update_inode(0:16/562949953421312 fh_crc=0xeab37a6a
ct=2 info=0x27e7f)
kernel: NFS: (0:16/562949953421312) revalidation complete
kernel: RPC:       looking up Generic cred
kernel: NFS: permission(0:14/0), mask=0x1, res=0
kernel: NFS: nfs_lookup_revalidate(/nfsexport_2) is valid
kernel: RPC:       new task initialized, procpid 4056
kernel: RPC:       allocated task ffff88007b6aee80
kernel: RPC: 18769 __rpc_execute flags=0x80
kernel: RPC: 18769 call_start nfs4 proc STATFS (sync)
kernel: RPC: 18769 call_reserve (status 0)
kernel: RPC: 18769 reserved req ffff88007ae15c00 xid 547d9fd7
kernel: RPC:       wake_up_first(ffff88007a74c990 "xprt_sending")
kernel: RPC: 18769 call_reserveresult (status 0)
kernel: RPC: 18769 call_refresh (status 0)
kernel: RPC: 18769 looking up UNIX cred
kernel: RPC:       looking up UNIX cred
kernel: RPC: 18769 refreshing UNIX cred ffff880079e543c0
kernel: RPC: 18769 call_refreshresult (status 0)
kernel: RPC: 18769 call_allocate (status 0)
kernel: RPC: 18769 allocated buffer of size 1060 at ffff88007cb68800
kernel: RPC: 18769 call_bind (status 0)
kernel: RPC: 18769 call_connect xprt ffff88007a74c800 is connected
kernel: RPC: 18769 call_transmit (status 0)
kernel: RPC: 18769 xprt_prepare_transmit
kernel: RPC: 18769 rpc_xdr_encode (status 0)
kernel: RPC: 18769 marshaling UNIX cred ffff880079e543c0
kernel: RPC: 18769 using AUTH_UNIX cred ffff880079e543c0 to wrap rpc data
kernel: encode_compound: tag=
kernel: RPC: 18769 xprt_transmit(116)
kernel: RPC:       xs_tcp_send_request(116) = 116
kernel: RPC: 18769 xmit complete
kernel: RPC: 18769 sleep_on(queue "xprt_pending" time 4298144478)
kernel: RPC: 18769 added to queue ffff88007a74ca58 "xprt_pending"
kernel: RPC: 18769 setting alarm for 60000 ms
kernel: RPC:       wake_up_first(ffff88007a74c990 "xprt_sending")
kernel: RPC: 18769 sync task going to sleep
kernel: RPC:       xs_tcp_data_ready...
kernel: RPC:       xs_tcp_data_recv started
kernel: RPC:       reading TCP record fragment of length 44
kernel: RPC:       reading XID (4 bytes)
kernel: RPC:       reading reply for XID 547d9fd7
kernel: RPC:       reading CALL/REPLY flag (4 bytes)
kernel: RPC:       read reply XID 547d9fd7
kernel: RPC:       XID 547d9fd7 read 36 bytes
kernel: RPC:       xprt = ffff88007a74c800, tcp_copied = 44,
tcp_offset = 44, tcp_reclen = 44
kernel: RPC: 18769 xid 547d9fd7 complete (44 bytes received)
kernel: RPC: 18769 __rpc_wake_up_task (now 4298144478)
kernel: RPC: 18769 disabling timer
kernel: RPC: 18769 removed from queue ffff88007a74ca58 "xprt_pending"
kernel: RPC:       __rpc_wake_up_task done
kernel: RPC:       xs_tcp_data_recv done
kernel: RPC: 18769 sync task resuming
kernel: RPC: 18769 call_status (status 44)
kernel: RPC: 18769 call_decode (status 44)
kernel: RPC: 18769 validating UNIX cred ffff880079e543c0
kernel: RPC: 18769 using AUTH_UNIX cred ffff880079e543c0 to unwrap rpc data
kernel: RPC: 18769 call_decode result -521
kernel: RPC: 18769 return 0, status -521
kernel: RPC: 18769 release task
kernel: RPC:       freeing buffer of size 1060 at ffff88007cb68800
kernel: RPC: 18769 release request ffff88007ae15c00
kernel: RPC:       wake_up_first(ffff88007a74cb20 "xprt_backlog")
kernel: RPC:       rpc_release_client(ffff88007c994a00)
kernel: RPC: 18769 freeing task
kernel: nfs_statfs: statfs error = 521

Any idea what could have went wrong here? Is this client issue which
might have been fixed in later versions?
BTW, this issue is not very frequent, we hit this randomly after few days.
Also, if any other logs are required, please let me know.

Thanks,
Ashish

^ permalink raw reply	[flat|nested] only message in thread

only message in thread, other threads:[~2019-11-07  7:12 UTC | newest]

Thread overview: (only message) (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-11-07  7:12 nfsv4 client sending getattr RPC with filehandle length 0 to wrong server node Ashish Sangwan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).