From: Olga Kornievskaia <aglo@umich.edu>
To: Benjamin Coddington <bcodding@redhat.com>
Cc: Leon Kyneur <leonk@dug.com>, linux-nfs <linux-nfs@vger.kernel.org>
Subject: Re: troubleshooting LOCK FH and NFS4ERR_BAD_SEQID
Date: Wed, 18 Sep 2019 10:32:13 -0400
Message-ID: <CAN-5tyERg5kwcD2iugwPVCLDSog0ufKoRRVbC-7pQW-hqLWncQ@mail.gmail.com>
In-Reply-To: <66D00B9D-16DC-4979-8400-457398DC4801@redhat.com>
Hi folks,
The bad_seqid error could be caused by a bug in 7.4:
https://access.redhat.com/solutions/3354251. It has been fixed in
kernel-3.10.0-693.23.1.el7. Can you try updating to see if that
helps? The bug was that the client was sending a double CLOSE, which
threw off the seqid use.
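
To check quickly whether a client predates that build, something like the Python sketch below may help. It is only a rough comparison of the leading numeric fields of a `uname -r` string, not rpm's real version ordering, and the helper name predates_fix is made up for illustration. A kernel older than the fixed build does not prove this bug is the cause; it only says the fix is not present.

```python
import re

# Build named in this message as containing the fix.
FIXED_BUILD = "3.10.0-693.23.1.el7"


def numeric_prefix(release):
    """Return the leading numeric fields of a kernel release as a tuple.

    "3.10.0-693.5.2.el7.x86_64" -> (3, 10, 0, 693, 5, 2)
    Assumption: comparing these tuples is close enough for EL7 kernels;
    this is NOT rpm's version comparison algorithm.
    """
    fields = []
    for part in re.split(r"[.-]", release):
        if not part.isdigit():
            break
        fields.append(int(part))
    return tuple(fields)


def predates_fix(release, fixed=FIXED_BUILD):
    """True if `release` sorts before the build that carries the fix."""
    return numeric_prefix(release) < numeric_prefix(fixed)


# Typical use on the client: predates_fix(os.uname().release)
```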
On Wed, Sep 18, 2019 at 9:07 AM Benjamin Coddington <bcodding@redhat.com> wrote:
>
> On 17 Sep 2019, at 22:20, Leon Kyneur wrote:
>
> > On Tue, Sep 17, 2019 at 7:28 PM Benjamin Coddington
> > <bcodding@redhat.com> wrote:
> >>
> >> On 12 Sep 2019, at 4:27, Leon Kyneur wrote:
> >>
> >>> Hi
> >>>
> >>> I'm experiencing an issue on NFS 4.0 + 4.1 where we cannot call fcntl
> >>> locks on any file on the share. The problem goes away if the share is
> >>> umount && mount (mount -o remount does not resolve the issue)
> >>>
> >>> Client:
> >>> EL 7.4 3.10.0-693.5.2.el7.x86_64 nfs-utils-1.3.0-0.48.el7_4.x86_64
> >>>
> >>> Server:
> >>> EL 7.4 3.10.0-693.5.2.el7.x86_64 nfs-utils-1.3.0-0.48.el7_4.x86_64
> >>>
> >>> I can't figure this out but the client reports bad-sequence-id in
> >>> duplicate in the logs:
> >>> Sep 12 02:16:59 client kernel: NFS: v4 server returned a bad
> >>> sequence-id error on an unconfirmed sequence ffff881c52286220!
> >>> Sep 12 02:16:59 client kernel: NFS: v4 server returned a bad
> >>> sequence-id error on an unconfirmed sequence ffff881c52286220!
> >>> Sep 12 02:17:39 client kernel: NFS: v4 server returned a bad
> >>> sequence-id error on an unconfirmed sequence ffff8810889cb020!
> >>> Sep 12 02:17:39 client kernel: NFS: v4 server returned a bad
> >>> sequence-id error on an unconfirmed sequence ffff8810889cb020!
> >>> Sep 12 02:17:44 client kernel: NFS: v4 server returned a bad
> >>> sequence-id error on an unconfirmed sequence ffff881b414b2620!
> >>>
> >>> wireshark capture shows only 1 BAD_SEQID reply from the server:
> >>> $ tshark -r client_broken.pcap -z proto,colinfo,rpc.xid,rpc.xid -z proto,colinfo,nfs.seqid,nfs.seqid -R 'rpc.xid == 0x9990c61d'
> >>> tshark: -R without -2 is deprecated. For single-pass filtering use -Y.
> >>> 141 93 172.27.30.129 -> 172.27.255.28 NFS 352 V4 Call LOCK FH: 0x80589398 Offset: 0 Length: <End of File> nfs.seqid == 0x0000004e nfs.seqid == 0x00000002 rpc.xid == 0x9990c61d
> >>> 142 93 172.27.255.28 -> 172.27.30.129 NFS 124 V4 Reply (Call In 141) LOCK Status: NFS4ERR_BAD_SEQID rpc.xid == 0x9990c61d
> >>>
> >>> system call I have identified as triggering it is:
> >>> fcntl(3, F_SETLK, {type=F_RDLCK, whence=SEEK_SET, start=1073741824,
> >>> len=1}) = -1 EIO (Input/output error)
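
For anyone who wants to retry that exact call against a file on the share, here is a minimal Python sketch. It assumes fcntl.lockf() with LOCK_SH | LOCK_NB ends up as a non-blocking read-lock request on the given byte range, which is how the Python documentation describes it; the function name try_read_lock is made up for illustration. It returns None on success or the errno (EIO in the report above) on failure.

```python
import fcntl
import os


def try_read_lock(path, start=1073741824, length=1):
    """Try a non-blocking read lock on one byte range, then release it.

    Defaults mirror the reported call:
    {type=F_RDLCK, whence=SEEK_SET, start=1073741824, len=1}
    Returns None if the lock was granted, otherwise the errno.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        # Shared (read) lock, non-blocking, on [start, start + length).
        fcntl.lockf(fd, fcntl.LOCK_SH | fcntl.LOCK_NB,
                    length, start, os.SEEK_SET)
        # Drop the lock again so repeated runs start clean.
        fcntl.lockf(fd, fcntl.LOCK_UN, length, start, os.SEEK_SET)
        return None
    except OSError as exc:
        return exc.errno
    finally:
        os.close(fd)
```

Note that running this alone did not reproduce the problem for the reporter, so treat it as a probe for the broken state rather than a trigger.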
> >>
> >> Can you simplify the trigger into something repeatable? Can you determine
> >> if the client or the server has lost track of the sequence?
> >>
> >
> > I have tried. I wrote some code to perform the fcntl RDLCK the same way
> > and ran it across thousands of machines without any success. I am quite
> > sure this is a symptom of something, not the cause.
> >
> > Is there a better way of tracking sequences other than monitoring the
> > network traffic?
>
> I think that's the best way right now. We do have tracepoints for nfs4
> open and close that show the sequence numbers on the client, but I'm not
> sure how to get that from the server side. I don't think we have seqid
> for locks in tracepoints, though I could be missing something. Not only
> that, but you might not get tracepoint output showing the sequence numbers
> if you're in an error-handling path.
>
> If you have a wire capture of the event, you should be able to go backwards
> from the error and figure out what the sequence number on the state should
> be for the operation that received BAD_SEQID by finding the last
> sequence-mutating (OPEN, CLOSE, LOCK) operation for that stateid that did
> not return an error.
>
> Ben
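
The walk-back Ben describes can be sketched in Python roughly as follows. This assumes the capture has already been reduced to one record per call/reply pair; the NfsOp type, its field names, and the state_key grouping are hypothetical and not something tshark emits in this shape. The function only finds the last successful OPEN, CLOSE, or LOCK on the same state before the failing frame. What the next seqid should be relative to that record is left to the reader to check against the protocol rules.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Operations named in the thread as sequence-mutating.
SEQID_MUTATING_OPS = {"OPEN", "CLOSE", "LOCK"}


@dataclass
class NfsOp:
    """One call/reply pair pulled out of a capture (hypothetical layout)."""
    frame: int       # frame number of the call
    op: str          # e.g. "OPEN", "CLOSE", "LOCK"
    state_key: str   # whatever identifies the state, e.g. stateid "other" bytes
    seqid: int       # seqid sent in the call
    status: str      # reply status, e.g. "NFS4_OK"


def last_good_op(ops: Iterable[NfsOp], failed_frame: int) -> Optional[NfsOp]:
    """Return the last successful seqid-mutating op before `failed_frame`
    that shares its state_key, or None if the capture has none."""
    ordered = sorted(ops, key=lambda o: o.frame)
    failed = next(o for o in ordered if o.frame == failed_frame)
    last = None
    for o in ordered:
        if o.frame >= failed_frame:
            break
        if (o.state_key == failed.state_key
                and o.op in SEQID_MUTATING_OPS
                and o.status == "NFS4_OK"):
            last = o
    return last
```

Comparing the seqid of the returned record with the seqid in the failing LOCK should show whether the client skipped or repeated a value, which in turn hints at whether the client or the server lost track.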