Linux-NFS Archive on lore.kernel.org
From: Leon Kyneur <leonk@dug.com>
To: linux-nfs@vger.kernel.org
Subject: troubleshooting LOCK FH and NFS4ERR_BAD_SEQID
Date: Thu, 12 Sep 2019 16:27:15 +0800
Message-ID: <CAACwWuN6siyM9t+rCmzxYPCf777bvD_J1xQKwNb7ZzBdzvy42Q@mail.gmail.com>

Hi

I'm experiencing an issue on NFS 4.0 and 4.1 where we cannot take fcntl
locks on any file on the share. The problem goes away after a
umount && mount of the share (mount -o remount does not resolve the issue).

Client:
EL 7.4 3.10.0-693.5.2.el7.x86_64 nfs-utils-1.3.0-0.48.el7_4.x86_64

Server:
EL 7.4 3.10.0-693.5.2.el7.x86_64  nfs-utils-1.3.0-0.48.el7_4.x86_64

I can't figure this out, but the client reports the bad-sequence-id
error in duplicate in the logs:
Sep 12 02:16:59 client kernel: NFS: v4 server returned a bad
sequence-id error on an unconfirmed sequence ffff881c52286220!
Sep 12 02:16:59 client kernel: NFS: v4 server returned a bad
sequence-id error on an unconfirmed sequence ffff881c52286220!
Sep 12 02:17:39 client kernel: NFS: v4 server returned a bad
sequence-id error on an unconfirmed sequence ffff8810889cb020!
Sep 12 02:17:39 client kernel: NFS: v4 server returned a bad
sequence-id error on an unconfirmed sequence ffff8810889cb020!
Sep 12 02:17:44 client kernel: NFS: v4 server returned a bad
sequence-id error on an unconfirmed sequence ffff881b414b2620!
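
To see how often each sequence address repeats, the messages above can be tallied with a short script. This is only a sketch: it assumes Python 3 and that the client's kernel log is available as text. The function name count_bad_seqid and the regular expression are illustrative, with the pattern derived from the quoted message text rather than from any kernel interface.

```python
import re
from collections import Counter

# Pattern derived from the log lines quoted in this message; \s+ tolerates
# the line wrap between "bad" and "sequence-id" introduced by the mail client.
PATTERN = re.compile(
    r"bad\s+sequence-id error on an unconfirmed sequence ([0-9a-f]+)!"
)


def count_bad_seqid(log_text):
    """Return a Counter mapping each sequence address to its message count."""
    return Counter(PATTERN.findall(log_text))
```

Feeding it the log text (for example the contents of /var/log/messages, if that is where kernel messages land on the client) gives one count per address, which makes it easier to tell repeated reports for one sequence from distinct failures.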

A Wireshark capture shows only one BAD_SEQID reply from the server:
$ tshark -r client_broken.pcap -z proto,colinfo,rpc.xid,rpc.xid -z
proto,colinfo,nfs.seqid,nfs.seqid -R 'rpc.xid == 0x9990c61d'
tshark: -R without -2 is deprecated. For single-pass filtering use -Y.
141         93 172.27.30.129 -> 172.27.255.28 NFS 352 V4 Call LOCK FH:
0x80589398 Offset: 0 Length: <End of File>  nfs.seqid == 0x0000004e
nfs.seqid == 0x00000002  rpc.xid == 0x9990c61d
142         93 172.27.255.28 -> 172.27.30.129 NFS 124 V4 Reply (Call
In 141) LOCK Status: NFS4ERR_BAD_SEQID  rpc.xid == 0x9990c61d

The system call I have identified as triggering it is:
fcntl(3, F_SETLK, {type=F_RDLCK, whence=SEEK_SET, start=1073741824,
len=1}) = -1 EIO (Input/output error)
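
The same call can be issued from a small script to check whether a given mount is currently affected. This is a minimal sketch, assuming Python 3 on the client. The helper name try_read_lock is hypothetical, and the offset and length defaults are simply copied from the trace above.

```python
import fcntl
import os


def try_read_lock(path, start=1073741824, length=1):
    """Try a non-blocking read lock on one byte range.

    Returns None on success, or the errno value if the lock call failed.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        # lockf() with LOCK_SH | LOCK_NB issues fcntl(F_SETLK) with F_RDLCK.
        fcntl.lockf(fd, fcntl.LOCK_SH | fcntl.LOCK_NB, length, start, os.SEEK_SET)
        # Release the range again so the probe leaves no lock behind.
        fcntl.lockf(fd, fcntl.LOCK_UN, length, start, os.SEEK_SET)
        return None
    except OSError as exc:
        return exc.errno
    finally:
        os.close(fd)
```

Calling try_read_lock() on any file under the mount point would be expected to return errno.EIO on a client in the broken state and None on a healthy one.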

The server filesystem is ZFS, though NFS sharing is turned off via the
ZFS options and it's exported using /etc/exports / nfsd.

The BAD_SEQID error seems to be fairly random: we have over 2000
machines connected to the share, and it occurs frequently but
randomly across our clients.

It's worth mentioning that the majority of the clients mount with 4.0.
We did try 4.1 everywhere but hit this issue:
https://access.redhat.com/solutions/3146191

mount options are:
server:/data on /d/data type nfs4
(rw,noatime,nodiratime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=172.27.10.45,local_lock=none,addr=172.27.255.28,_netdev)
or:
server:/data on /d/data type nfs4
(rw,noatime,nodiratime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=172.27.30.129,local_lock=none,addr=172.27.255.28,_netdev)

I'm at a bit of a loss as to where to look next. I've tried to
reproduce it with threaded locking / unlocking but cannot seem to
create a test case that triggers it.
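
For reference, a test of this kind might look like the sketch below. It assumes Python 3 on a Linux client and uses separate processes rather than threads, since POSIX byte-range locks belong to the process. The function names worker and stress and all parameter defaults are hypothetical, and nothing in this message establishes that such a loop reproduces the BAD_SEQID error.

```python
import fcntl
import os


def worker(path, iterations, start, length):
    """Lock and unlock one byte range repeatedly; return the failure count."""
    failures = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        for _ in range(iterations):
            try:
                fcntl.lockf(fd, fcntl.LOCK_SH | fcntl.LOCK_NB,
                            length, start, os.SEEK_SET)
                fcntl.lockf(fd, fcntl.LOCK_UN, length, start, os.SEEK_SET)
            except OSError:
                failures += 1
    finally:
        os.close(fd)
    return failures


def stress(path, processes=4, iterations=100, start=1073741824, length=1):
    """Fork `processes` children; return how many saw at least one failure."""
    pids = []
    for _ in range(processes):
        pid = os.fork()
        if pid == 0:
            # Child: exit status 1 means a lock call failed or the
            # worker itself raised (for example, the open failed).
            failed = 1
            try:
                failed = 1 if worker(path, iterations, start, length) else 0
            finally:
                os._exit(failed)
        pids.append(pid)
    bad = 0
    for pid in pids:
        _, status = os.waitpid(pid, 0)
        if not (os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0):
            bad += 1
    return bad
```

Running stress() against a file on the share from several clients at once, and varying the process count, is one way to explore whether concurrency within a single client matters.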

Thanks

Leon

Thread overview: 9+ messages
2019-09-12  8:27 Leon Kyneur [this message]
2019-09-17 11:28 ` Benjamin Coddington
2019-09-18  2:20   ` Leon Kyneur
2019-09-18 11:04     ` Benjamin Coddington
2019-09-18 14:32       ` Olga Kornievskaia
2019-09-19  4:22         ` Leon Kyneur
2019-09-19 13:36           ` Olga Kornievskaia
2019-09-24  4:17             ` Leon Kyneur
2019-09-24 14:45               ` Olga Kornievskaia
