Linux-NFS Archive on lore.kernel.org
 help / color / Atom feed
From: Leon Kyneur <leonk@dug.com>
To: Benjamin Coddington <bcodding@redhat.com>
Cc: linux-nfs@vger.kernel.org
Subject: Re: troubleshooting LOCK FH and NFS4ERR_BAD_SEQID
Date: Wed, 18 Sep 2019 10:20:53 +0800
Message-ID: <CAACwWuMbB=zTaXW-fQmUYHLvx=YgE=68M96=hq201pqn2wKxBw@mail.gmail.com> (raw)
In-Reply-To: <8217416C-F3E5-4BEE-BD01-2BE19952425E@redhat.com>

On Tue, Sep 17, 2019 at 7:28 PM Benjamin Coddington <bcodding@redhat.com> wrote:
>
> On 12 Sep 2019, at 4:27, Leon Kyneur wrote:
>
> > Hi
> >
> > I'm experiencing an issue on NFS 4.0 + 4.1 where we cannot call fcntl
> > locks on any file on the share. The problem goes away if the share is
> > umount && mount (mount -o remount does not resolve the issue)
> >
> > Client:
> > EL 7.4 3.10.0-693.5.2.el7.x86_64 nfs-utils-1.3.0-0.48.el7_4.x86_64
> >
> > Server:
> > EL 7.4 3.10.0-693.5.2.el7.x86_64  nfs-utils-1.3.0-0.48.el7_4.x86_64
> >
> > I can't figure this out but the client reports bad-sequence-id in
> > dupicate in the logs:
> > Sep 12 02:16:59 client kernel: NFS: v4 server returned a bad
> > sequence-id error on an unconfirmed sequence ffff881c52286220!
> > Sep 12 02:16:59 client kernel: NFS: v4 server returned a bad
> > sequence-id error on an unconfirmed sequence ffff881c52286220!
> > Sep 12 02:17:39 client kernel: NFS: v4 server returned a bad
> > sequence-id error on an unconfirmed sequence ffff8810889cb020!
> > Sep 12 02:17:39 client kernel: NFS: v4 server returned a bad
> > sequence-id error on an unconfirmed sequence ffff8810889cb020!
> > Sep 12 02:17:44 client kernel: NFS: v4 server returned a bad
> > sequence-id error on an unconfirmed sequence ffff881b414b2620!
> >
> > wireshark capture shows only 1 BAD_SEQID reply from the server:
> > $ tshark -r client_broken.pcap -z proto,colinfo,rpc.xid,rpc.xid -z
> > proto,colinfo,nfs.seqid,nfs.seqid -R 'rpc.xid == 0x9990c61d'
> > tshark: -R without -2 is deprecated. For single-pass filtering use -Y.
> > 141         93 172.27.30.129 -> 172.27.255.28 NFS 352 V4 Call LOCK FH:
> > 0x80589398 Offset: 0 Length: <End of File>  nfs.seqid == 0x0000004e
> > nfs.seqid == 0x00000002  rpc.xid == 0x9990c61d
> > 142         93 172.27.255.28 -> 172.27.30.129 NFS 124 V4 Reply (Call
> > In 141) LOCK Status: NFS4ERR_BAD_SEQID  rpc.xid == 0x9990c61d
> >
> > system call I have identified as triggering it is:
> > fcntl(3, F_SETLK, {type=F_RDLCK, whence=SEEK_SET, start=1073741824,
> > len=1}) = -1 EIO (Input/output error)
>
> Can you simplify the trigger into something repeatable?  Can you determine
> if the client or the server has lost track of the sequence?
>

I have tried, I wrote some code to perform the fcntl RDKLCK the same
way and ran it accross
thousands of machines without any success. I am quite sure this is a
symptom of something
not the cause.

Is there a better way of tracking sequences other than monitoring the
network traffic?

> > The server filesystem is ZFS though NFS sharing is turned off via ZFS
> > options and it's exported using /etc/exports / nfsd...
> >
> > The BAD_SEQID error seems to be fairly random, we have over 2000
> > machines connected to the share and it's experienced frequently but
> > randomly accross our clients.
> >
> > It's worth mentioning that the majority of the clients are mounting
> > 4.0 we did try 4.1 everywhere but hit this
> > https://access.redhat.com/solutions/3146191
>
> This was fixed in kernel-3.10.0-735.el7, FWIW..
>
> Ben

Thanks good to know, am planning an update soon but have been stuck on
3.10.0-693 for other reasons.

Leon

  reply index

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-09-12  8:27 Leon Kyneur
2019-09-17 11:28 ` Benjamin Coddington
2019-09-18  2:20   ` Leon Kyneur [this message]
2019-09-18 11:04     ` Benjamin Coddington
2019-09-18 14:32       ` Olga Kornievskaia
2019-09-19  4:22         ` Leon Kyneur
2019-09-19 13:36           ` Olga Kornievskaia
2019-09-24  4:17             ` Leon Kyneur
2019-09-24 14:45               ` Olga Kornievskaia

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAACwWuMbB=zTaXW-fQmUYHLvx=YgE=68M96=hq201pqn2wKxBw@mail.gmail.com' \
    --to=leonk@dug.com \
    --cc=bcodding@redhat.com \
    --cc=linux-nfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-NFS Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-nfs/0 linux-nfs/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-nfs linux-nfs/ https://lore.kernel.org/linux-nfs \
		linux-nfs@vger.kernel.org
	public-inbox-index linux-nfs

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-nfs


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git