Linux-NFS Archive on lore.kernel.org
From: Olga Kornievskaia <aglo@umich.edu>
To: Benjamin Coddington <bcodding@redhat.com>
Cc: Leon Kyneur <leonk@dug.com>, linux-nfs <linux-nfs@vger.kernel.org>
Subject: Re: troubleshooting LOCK FH and NFS4ERR_BAD_SEQID
Date: Wed, 18 Sep 2019 10:32:13 -0400
Message-ID: <CAN-5tyERg5kwcD2iugwPVCLDSog0ufKoRRVbC-7pQW-hqLWncQ@mail.gmail.com>
In-Reply-To: <66D00B9D-16DC-4979-8400-457398DC4801@redhat.com>

Hi folks,

The BAD_SEQID error could be caused by a known bug in EL 7.4:
https://access.redhat.com/solutions/3354251. It has been fixed in
kernel-3.10.0-693.23.1.el7. Can you update and see whether that
helps? The bug was that the client sent a double CLOSE, which threw
off the seqid accounting.

On Wed, Sep 18, 2019 at 9:07 AM Benjamin Coddington <bcodding@redhat.com> wrote:
>
> On 17 Sep 2019, at 22:20, Leon Kyneur wrote:
>
> > On Tue, Sep 17, 2019 at 7:28 PM Benjamin Coddington
> > <bcodding@redhat.com> wrote:
> >>
> >> On 12 Sep 2019, at 4:27, Leon Kyneur wrote:
> >>
> >>> Hi
> >>>
> >>> I'm experiencing an issue on NFS 4.0 + 4.1 where we cannot call
> >>> fcntl
> >>> locks on any file on the share. The problem goes away if the share
> >>> is
> >>> umount && mount (mount -o remount does not resolve the issue)
> >>>
> >>> Client:
> >>> EL 7.4 3.10.0-693.5.2.el7.x86_64 nfs-utils-1.3.0-0.48.el7_4.x86_64
> >>>
> >>> Server:
> >>> EL 7.4 3.10.0-693.5.2.el7.x86_64  nfs-utils-1.3.0-0.48.el7_4.x86_64
> >>>
> >>> I can't figure this out, but the client reports each
> >>> bad-sequence-id error in duplicate in the logs:
> >>> Sep 12 02:16:59 client kernel: NFS: v4 server returned a bad
> >>> sequence-id error on an unconfirmed sequence ffff881c52286220!
> >>> Sep 12 02:16:59 client kernel: NFS: v4 server returned a bad
> >>> sequence-id error on an unconfirmed sequence ffff881c52286220!
> >>> Sep 12 02:17:39 client kernel: NFS: v4 server returned a bad
> >>> sequence-id error on an unconfirmed sequence ffff8810889cb020!
> >>> Sep 12 02:17:39 client kernel: NFS: v4 server returned a bad
> >>> sequence-id error on an unconfirmed sequence ffff8810889cb020!
> >>> Sep 12 02:17:44 client kernel: NFS: v4 server returned a bad
> >>> sequence-id error on an unconfirmed sequence ffff881b414b2620!
> >>>
> >>> wireshark capture shows only 1 BAD_SEQID reply from the server:
> >>> $ tshark -r client_broken.pcap -z proto,colinfo,rpc.xid,rpc.xid -z
> >>> proto,colinfo,nfs.seqid,nfs.seqid -R 'rpc.xid == 0x9990c61d'
> >>> tshark: -R without -2 is deprecated. For single-pass filtering use
> >>> -Y.
> >>> 141         93 172.27.30.129 -> 172.27.255.28 NFS 352 V4 Call LOCK
> >>> FH:
> >>> 0x80589398 Offset: 0 Length: <End of File>  nfs.seqid == 0x0000004e
> >>> nfs.seqid == 0x00000002  rpc.xid == 0x9990c61d
> >>> 142         93 172.27.255.28 -> 172.27.30.129 NFS 124 V4 Reply (Call
> >>> In 141) LOCK Status: NFS4ERR_BAD_SEQID  rpc.xid == 0x9990c61d
> >>>
> >>> system call I have identified as triggering it is:
> >>> fcntl(3, F_SETLK, {type=F_RDLCK, whence=SEEK_SET, start=1073741824,
> >>> len=1}) = -1 EIO (Input/output error)
> >>
> >> Can you simplify the trigger into something repeatable?  Can you
> >> determine
> >> if the client or the server has lost track of the sequence?
> >>
> >
> > I have tried. I wrote some code to perform the fcntl F_RDLCK the same
> > way and ran it across
> > thousands of machines without any success. I am quite sure this is a
> > symptom of something,
> > not the cause.
> >
> > Is there a better way of tracking sequences other than monitoring the
> > network traffic?
>
> I think that's the best way, right now.  We do have tracepoints for
> nfs4 open and close that show the sequence numbers on the client, but
> I'm
> not sure about how to get that from the server side.  I don't think we
> have
> seqid for locks in tracepoints.. I could be missing something.  Not only
> that, but you might not get tracepoint output showing the sequence
> numbers
> if you're in an error-handling path.
>
> If you have a wire capture of the event, you should be able to go
> backwards
> from the error and figure out what the sequence number on the state
> should
> be for the operation that received BAD_SEQID by finding the last
> sequence-mutating (OPEN,CLOSE,LOCK) operation for that stateid that did
> not
> return an error.
>
> Ben
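
The backward walk Ben describes above can be sketched in Python. This is only an illustration under stated assumptions: the NfsOp record layout, the state_key field, and the last_good_op helper are hypothetical names, not part of tshark or any NFS tool, and the records are assumed to have been extracted from the capture by hand or by a separate script. The sample data in the usage note is invented and does not come from the capture in this thread.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Operations Ben lists as sequence-mutating.
SEQ_MUTATING_OPS = {"OPEN", "CLOSE", "LOCK"}


@dataclass
class NfsOp:
    """One decoded call/reply pair from a capture (hypothetical layout)."""
    frame: int      # frame number of the call
    op: str         # operation name, e.g. "LOCK"
    state_key: str  # whatever identifies the state in your decode (assumption)
    seqid: int      # seqid sent in the call
    status: str     # reply status, e.g. "NFS4_OK"


def last_good_op(ops: Iterable[NfsOp], failing: NfsOp) -> Optional[NfsOp]:
    """Walk backwards from the failing op to the last sequence-mutating
    op on the same state that did not return an error.

    Returns None if the capture holds no such op before the failure.
    """
    earlier = [o for o in ops if o.frame < failing.frame]
    for o in sorted(earlier, key=lambda rec: rec.frame, reverse=True):
        if (o.state_key == failing.state_key
                and o.op in SEQ_MUTATING_OPS
                and o.status == "NFS4_OK"):
            return o
    return None


if __name__ == "__main__":
    # Invented sample records, for illustration only.
    sample = [
        NfsOp(10, "OPEN", "A", 5, "NFS4_OK"),
        NfsOp(20, "LOCK", "A", 6, "NFS4ERR_DENIED"),
        NfsOp(40, "LOCK", "A", 7, "NFS4ERR_BAD_SEQID"),
    ]
    print(last_good_op(sample, sample[-1]))
```

Comparing the seqid of the returned record with the seqid in the failing call gives a starting point for deciding whether the client or the server lost track. How the seqid advances after particular error replies is left to the reader to check against the protocol specification; the sketch deliberately does not encode that rule.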

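For anyone who wants to exercise the same lock request that Leon identified earlier in the thread (a read lock on one byte at offset 1073741824), a minimal Python sketch follows. It is an assumption-laden stand-in, not the test code Leon ran: the try_read_lock name and its defaults are hypothetical, and it relies on fcntl.lockf() with LOCK_SH | LOCK_NB, which CPython implements as a non-blocking fcntl read lock. Leon reports that a test of this kind did not reproduce the failure, so a clean result proves little.

```python
import fcntl
import os


def try_read_lock(path: str, start: int = 1073741824, length: int = 1) -> bool:
    """Attempt a non-blocking shared (read) lock on one byte range.

    Returns True if the lock was granted (and then released),
    False if the lock request failed with OSError, e.g. EIO.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        # LOCK_SH requests a read lock; LOCK_NB makes the call non-blocking.
        fcntl.lockf(fd, fcntl.LOCK_SH | fcntl.LOCK_NB, length, start, os.SEEK_SET)
        # Release the range again so repeated runs start clean.
        fcntl.lockf(fd, fcntl.LOCK_UN, length, start, os.SEEK_SET)
        return True
    except OSError as exc:
        print(f"lock failed on {path}: {exc}")
        return False
    finally:
        os.close(fd)
```

Pointing try_read_lock() at a file on the affected NFS mount and at a file on a freshly mounted share would show whether the failure follows the mount, as the original report suggests.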

Thread overview: 9+ messages
2019-09-12  8:27 Leon Kyneur
2019-09-17 11:28 ` Benjamin Coddington
2019-09-18  2:20   ` Leon Kyneur
2019-09-18 11:04     ` Benjamin Coddington
2019-09-18 14:32       ` Olga Kornievskaia [this message]
2019-09-19  4:22         ` Leon Kyneur
2019-09-19 13:36           ` Olga Kornievskaia
2019-09-24  4:17             ` Leon Kyneur
2019-09-24 14:45               ` Olga Kornievskaia
