All of lore.kernel.org
 help / color / mirror / Atom feed
* Linux NFSv4.1 client session seqid sometimes advances by 2
@ 2021-04-12 20:57 Rick Macklem
  2021-04-13 13:31 ` Olga Kornievskaia
  0 siblings, 1 reply; 7+ messages in thread
From: Rick Macklem @ 2021-04-12 20:57 UTC (permalink / raw)
  To: Linux-NFS

Hi,

During testing of a Fedora Core 30 (5.2.10 kernel) against a FreeBSD
server (4.1 mount), I have been simulating a network partitioning
for a few minutes (until the TCP connection goes to SYN_SENT on
the Linux client).

Sometimes, after the network partition heals, the FreeBSD server
replies NFS4ERR_SEQ_MISORDERED.
Looking at the packet trace, the seqid for the slot has advanced by
2 instead of 1. An RPC request for old-seqid + 1 never seems to get
sent.
--> Since sending an RPC with "seqid + 2" but never sending one
       that is "seqid + 1" for a slot seems harmless, I have added an optional
       hack (can be turned off), to allow this case instead of replying
       NFS4ERR_SEQ_MISORDERED for it. The code will still reply
       NFS4ERR_SEQ_MISORDERED if an RPC for the slot with
       "old seqid + 1" in it.
       --> Yes, doing this hack is a violation of RFC5661, but I've
             done it anyhow.

If you are interested in a packet capture with this in it:
fetch https://people.freebsd.org/~rmacklem/linuxtofreenfs.pcap
- then look at packet #1945 and #2072
  --> You'll see that slot #1 seqid goes from 4 to 6. There is no
         slot#1 seqid 5 RPC sent on the wire.
         (This packet capture was taken on the Linux client using
          tcpdump.)
--> Btw, the "RST battle" you'll see in the above trace between
       #2005 and #2068 that goes on until the FreeBSD
       krpc/NFS times out the connection after 6min. seems to be a recent
       FreeBSD TCP bug.
       I have reproduced this seqid advances by 2 on an older system
       that does not "RST battle" and allows the reconnect right away,
       once the network partition is healed, so it does seem to be
       relevant to this bug.

Someday, I will get around to upgrading to a more recent Linux
kernel and will test to see if I can still reproduce this bug.
On 5.2.10, it is intermittent and does not occur every time I
do the network partitioning test.

Mostly just fyi, rick

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Linux NFSv4.1 client session seqid sometimes advances by 2
  2021-04-12 20:57 Linux NFSv4.1 client session seqid sometimes advances by 2 Rick Macklem
@ 2021-04-13 13:31 ` Olga Kornievskaia
  2021-04-13 17:17   ` J. Bruce Fields
  0 siblings, 1 reply; 7+ messages in thread
From: Olga Kornievskaia @ 2021-04-13 13:31 UTC (permalink / raw)
  To: Rick Macklem; +Cc: Linux-NFS

On Tue, Apr 13, 2021 at 3:08 AM Rick Macklem <rmacklem@uoguelph.ca> wrote:
>
> Hi,
>
> During testing of a Fedora Core 30 (5.2.10 kernel) against a FreeBSD
> server (4.1 mount), I have been simulating a network partitioning
> for a few minutes (until the TCP connection goes to SYN_SENT on
> the Linux client).
>
> Sometimes, after the network partition heals, the FreeBSD server
> replies NFS4ERR_SEQ_MISORDERED.
> Looking at the packet trace, the seqid for the slot has advanced by
> 2 instead of 1. An RPC request for old-seqid + 1 never seems to get
> sent.
> --> Since sending an RPC with "seqid + 2" but never sending one
>        that is "seqid + 1" for a slot seems harmless, I have added an optional
>        hack (can be turned off), to allow this case instead of replying
>        NFS4ERR_SEQ_MISORDERED for it. The code will still reply
>        NFS4ERR_SEQ_MISORDERED if an RPC for the slot with
>        "old seqid + 1" in it.
>        --> Yes, doing this hack is a violation of RFC5661, but I've
>              done it anyhow.
>
> If you are interested in a packet capture with this in it:
> fetch https://people.freebsd.org/~rmacklem/linuxtofreenfs.pcap
> - then look at packet #1945 and #2072
>   --> You'll see that slot #1 seqid goes from 4 to 6. There is no
>          slot#1 seqid 5 RPC sent on the wire.
>          (This packet capture was taken on the Linux client using
>           tcpdump.)
> --> Btw, the "RST battle" you'll see in the above trace between
>        #2005 and #2068 that goes on until the FreeBSD
>        krpc/NFS times out the connection after 6min. seems to be a recent
>        FreeBSD TCP bug.
>        I have reproduced this seqid advances by 2 on an older system
>        that does not "RST battle" and allows the reconnect right away,
>        once the network partition is healed, so it does seem to be
>        relevant to this bug.
>
> Someday, I will get around to upgrading to a more recent Linux
> kernel and will test to see if I can still reproduce this bug.
> On 5.2.10, it is intermittent and does not occur every time I
> do the network partitioning test.
>
> Mostly just fyi, rick

Hi Rick,

I think this is happening because slotid=1 had something queued up
using seqid=5 and that was interrupted because the connection was
RSTed. For the interrupted slot, the client would send solo SEQUENCE
with +1 seqid.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Linux NFSv4.1 client session seqid sometimes advances by 2
  2021-04-13 13:31 ` Olga Kornievskaia
@ 2021-04-13 17:17   ` J. Bruce Fields
  2021-04-13 18:59     ` Olga Kornievskaia
  0 siblings, 1 reply; 7+ messages in thread
From: J. Bruce Fields @ 2021-04-13 17:17 UTC (permalink / raw)
  To: Olga Kornievskaia; +Cc: Rick Macklem, Linux-NFS

On Tue, Apr 13, 2021 at 09:31:37AM -0400, Olga Kornievskaia wrote:
> On Tue, Apr 13, 2021 at 3:08 AM Rick Macklem <rmacklem@uoguelph.ca> wrote:
> >
> > Hi,
> >
> > During testing of a Fedora Core 30 (5.2.10 kernel) against a FreeBSD
> > server (4.1 mount), I have been simulating a network partitioning
> > for a few minutes (until the TCP connection goes to SYN_SENT on
> > the Linux client).
> >
> > Sometimes, after the network partition heals, the FreeBSD server
> > replies NFS4ERR_SEQ_MISORDERED.
> > Looking at the packet trace, the seqid for the slot has advanced by
> > 2 instead of 1. An RPC request for old-seqid + 1 never seems to get
> > sent.
> > --> Since sending an RPC with "seqid + 2" but never sending one
> >        that is "seqid + 1" for a slot seems harmless, I have added an optional
> >        hack (can be turned off), to allow this case instead of replying
> >        NFS4ERR_SEQ_MISORDERED for it. The code will still reply
> >        NFS4ERR_SEQ_MISORDERED if an RPC for the slot with
> >        "old seqid + 1" in it.
> >        --> Yes, doing this hack is a violation of RFC5661, but I've
> >              done it anyhow.
> >
> > If you are interested in a packet capture with this in it:
> > fetch https://people.freebsd.org/~rmacklem/linuxtofreenfs.pcap
> > - then look at packet #1945 and #2072
> >   --> You'll see that slot #1 seqid goes from 4 to 6. There is no
> >          slot#1 seqid 5 RPC sent on the wire.
> >          (This packet capture was taken on the Linux client using
> >           tcpdump.)
> > --> Btw, the "RST battle" you'll see in the above trace between
> >        #2005 and #2068 that goes on until the FreeBSD
> >        krpc/NFS times out the connection after 6min. seems to be a recent
> >        FreeBSD TCP bug.
> >        I have reproduced this seqid advances by 2 on an older system
> >        that does not "RST battle" and allows the reconnect right away,
> >        once the network partition is healed, so it does seem to be
> >        relevant to this bug.
> >
> > Someday, I will get around to upgrading to a more recent Linux
> > kernel and will test to see if I can still reproduce this bug.
> > On 5.2.10, it is intermittent and does not occur every time I
> > do the network partitioning test.
> >
> > Mostly just fyi, rick
> 
> Hi Rick,
> 
> I think this is happening because slotid=1 had something queued up
> using seqid=5 and that was interrupted because the connection was
> RSTed. For the interrupted slot, the client would send solo SEQUENCE
> with +1 seqid.

Doesn't the client send the solo SEQUENCE with seqid 5 in that case?

--b.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Linux NFSv4.1 client session seqid sometimes advances by 2
  2021-04-13 17:17   ` J. Bruce Fields
@ 2021-04-13 18:59     ` Olga Kornievskaia
  2021-04-13 19:29       ` J. Bruce Fields
  0 siblings, 1 reply; 7+ messages in thread
From: Olga Kornievskaia @ 2021-04-13 18:59 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Rick Macklem, Linux-NFS

On Tue, Apr 13, 2021 at 1:17 PM J. Bruce Fields <bfields@fieldses.org> wrote:
>
> On Tue, Apr 13, 2021 at 09:31:37AM -0400, Olga Kornievskaia wrote:
> > On Tue, Apr 13, 2021 at 3:08 AM Rick Macklem <rmacklem@uoguelph.ca> wrote:
> > >
> > > Hi,
> > >
> > > During testing of a Fedora Core 30 (5.2.10 kernel) against a FreeBSD
> > > server (4.1 mount), I have been simulating a network partitioning
> > > for a few minutes (until the TCP connection goes to SYN_SENT on
> > > the Linux client).
> > >
> > > Sometimes, after the network partition heals, the FreeBSD server
> > > replies NFS4ERR_SEQ_MISORDERED.
> > > Looking at the packet trace, the seqid for the slot has advanced by
> > > 2 instead of 1. An RPC request for old-seqid + 1 never seems to get
> > > sent.
> > > --> Since sending an RPC with "seqid + 2" but never sending one
> > >        that is "seqid + 1" for a slot seems harmless, I have added an optional
> > >        hack (can be turned off), to allow this case instead of replying
> > >        NFS4ERR_SEQ_MISORDERED for it. The code will still reply
> > >        NFS4ERR_SEQ_MISORDERED if an RPC for the slot with
> > >        "old seqid + 1" in it.
> > >        --> Yes, doing this hack is a violation of RFC5661, but I've
> > >              done it anyhow.
> > >
> > > If you are interested in a packet capture with this in it:
> > > fetch https://people.freebsd.org/~rmacklem/linuxtofreenfs.pcap
> > > - then look at packet #1945 and #2072
> > >   --> You'll see that slot #1 seqid goes from 4 to 6. There is no
> > >          slot#1 seqid 5 RPC sent on the wire.
> > >          (This packet capture was taken on the Linux client using
> > >           tcpdump.)
> > > --> Btw, the "RST battle" you'll see in the above trace between
> > >        #2005 and #2068 that goes on until the FreeBSD
> > >        krpc/NFS times out the connection after 6min. seems to be a recent
> > >        FreeBSD TCP bug.
> > >        I have reproduced this seqid advances by 2 on an older system
> > >        that does not "RST battle" and allows the reconnect right away,
> > >        once the network partition is healed, so it does seem to be
> > >        relevant to this bug.
> > >
> > > Someday, I will get around to upgrading to a more recent Linux
> > > kernel and will test to see if I can still reproduce this bug.
> > > On 5.2.10, it is intermittent and does not occur every time I
> > > do the network partitioning test.
> > >
> > > Mostly just fyi, rick
> >
> > Hi Rick,
> >
> > I think this is happening because slotid=1 had something queued up
> > using seqid=5 and that was interrupted because the connection was
> > RSTed. For the interrupted slot, the client would send solo SEQUENCE
> > with +1 seqid.
>
> Doesn't the client send the solo SEQUENCE with seqid 5 in that case?

No it sends with seq+1 because NFS layer client doesn't know if seqid
actually was actually transmitted before the connection got caught
(and/or received by the server).

>
> --b.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Linux NFSv4.1 client session seqid sometimes advances by 2
  2021-04-13 18:59     ` Olga Kornievskaia
@ 2021-04-13 19:29       ` J. Bruce Fields
  2021-04-13 20:52         ` Olga Kornievskaia
  0 siblings, 1 reply; 7+ messages in thread
From: J. Bruce Fields @ 2021-04-13 19:29 UTC (permalink / raw)
  To: Olga Kornievskaia; +Cc: Rick Macklem, Linux-NFS

On Tue, Apr 13, 2021 at 02:59:27PM -0400, Olga Kornievskaia wrote:
> On Tue, Apr 13, 2021 at 1:17 PM J. Bruce Fields <bfields@fieldses.org> wrote:
> >
> > On Tue, Apr 13, 2021 at 09:31:37AM -0400, Olga Kornievskaia wrote:
> > > On Tue, Apr 13, 2021 at 3:08 AM Rick Macklem <rmacklem@uoguelph.ca> wrote:
> > > >
> > > > Hi,
> > > >
> > > > During testing of a Fedora Core 30 (5.2.10 kernel) against a FreeBSD
> > > > server (4.1 mount), I have been simulating a network partitioning
> > > > for a few minutes (until the TCP connection goes to SYN_SENT on
> > > > the Linux client).
> > > >
> > > > Sometimes, after the network partition heals, the FreeBSD server
> > > > replies NFS4ERR_SEQ_MISORDERED.
> > > > Looking at the packet trace, the seqid for the slot has advanced by
> > > > 2 instead of 1. An RPC request for old-seqid + 1 never seems to get
> > > > sent.
> > > > --> Since sending an RPC with "seqid + 2" but never sending one
> > > >        that is "seqid + 1" for a slot seems harmless, I have added an optional
> > > >        hack (can be turned off), to allow this case instead of replying
> > > >        NFS4ERR_SEQ_MISORDERED for it. The code will still reply
> > > >        NFS4ERR_SEQ_MISORDERED if an RPC for the slot with
> > > >        "old seqid + 1" in it.
> > > >        --> Yes, doing this hack is a violation of RFC5661, but I've
> > > >              done it anyhow.
> > > >
> > > > If you are interested in a packet capture with this in it:
> > > > fetch https://people.freebsd.org/~rmacklem/linuxtofreenfs.pcap
> > > > - then look at packet #1945 and #2072
> > > >   --> You'll see that slot #1 seqid goes from 4 to 6. There is no
> > > >          slot#1 seqid 5 RPC sent on the wire.
> > > >          (This packet capture was taken on the Linux client using
> > > >           tcpdump.)
> > > > --> Btw, the "RST battle" you'll see in the above trace between
> > > >        #2005 and #2068 that goes on until the FreeBSD
> > > >        krpc/NFS times out the connection after 6min. seems to be a recent
> > > >        FreeBSD TCP bug.
> > > >        I have reproduced this seqid advances by 2 on an older system
> > > >        that does not "RST battle" and allows the reconnect right away,
> > > >        once the network partition is healed, so it does seem to be
> > > >        relevant to this bug.
> > > >
> > > > Someday, I will get around to upgrading to a more recent Linux
> > > > kernel and will test to see if I can still reproduce this bug.
> > > > On 5.2.10, it is intermittent and does not occur every time I
> > > > do the network partitioning test.
> > > >
> > > > Mostly just fyi, rick
> > >
> > > Hi Rick,
> > >
> > > I think this is happening because slotid=1 had something queued up
> > > using seqid=5 and that was interrupted because the connection was
> > > RSTed. For the interrupted slot, the client would send solo SEQUENCE
> > > with +1 seqid.
> >
> > Doesn't the client send the solo SEQUENCE with seqid 5 in that case?
> 
> No it sends with seq+1 because NFS layer client doesn't know if seqid
> actually was actually transmitted before the connection got caught
> (and/or received by the server).

But then the MISORDERED tells the client it wasn't received, so the
client follows up with a call with seqid 5--is that what happens?

Sorry, I seem to recall we went through this all a couple years ago, but
now I've forgotten how it works.

--b.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Linux NFSv4.1 client session seqid sometimes advances by 2
  2021-04-13 19:29       ` J. Bruce Fields
@ 2021-04-13 20:52         ` Olga Kornievskaia
  2021-04-13 23:08           ` Rick Macklem
  0 siblings, 1 reply; 7+ messages in thread
From: Olga Kornievskaia @ 2021-04-13 20:52 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Rick Macklem, Linux-NFS

On Tue, Apr 13, 2021 at 3:29 PM J. Bruce Fields <bfields@fieldses.org> wrote:
>
> On Tue, Apr 13, 2021 at 02:59:27PM -0400, Olga Kornievskaia wrote:
> > On Tue, Apr 13, 2021 at 1:17 PM J. Bruce Fields <bfields@fieldses.org> wrote:
> > >
> > > On Tue, Apr 13, 2021 at 09:31:37AM -0400, Olga Kornievskaia wrote:
> > > > On Tue, Apr 13, 2021 at 3:08 AM Rick Macklem <rmacklem@uoguelph.ca> wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > During testing of a Fedora Core 30 (5.2.10 kernel) against a FreeBSD
> > > > > server (4.1 mount), I have been simulating a network partitioning
> > > > > for a few minutes (until the TCP connection goes to SYN_SENT on
> > > > > the Linux client).
> > > > >
> > > > > Sometimes, after the network partition heals, the FreeBSD server
> > > > > replies NFS4ERR_SEQ_MISORDERED.
> > > > > Looking at the packet trace, the seqid for the slot has advanced by
> > > > > 2 instead of 1. An RPC request for old-seqid + 1 never seems to get
> > > > > sent.
> > > > > --> Since sending an RPC with "seqid + 2" but never sending one
> > > > >        that is "seqid + 1" for a slot seems harmless, I have added an optional
> > > > >        hack (can be turned off), to allow this case instead of replying
> > > > >        NFS4ERR_SEQ_MISORDERED for it. The code will still reply
> > > > >        NFS4ERR_SEQ_MISORDERED if an RPC for the slot with
> > > > >        "old seqid + 1" in it.
> > > > >        --> Yes, doing this hack is a violation of RFC5661, but I've
> > > > >              done it anyhow.
> > > > >
> > > > > If you are interested in a packet capture with this in it:
> > > > > fetch https://people.freebsd.org/~rmacklem/linuxtofreenfs.pcap
> > > > > - then look at packet #1945 and #2072
> > > > >   --> You'll see that slot #1 seqid goes from 4 to 6. There is no
> > > > >          slot#1 seqid 5 RPC sent on the wire.
> > > > >          (This packet capture was taken on the Linux client using
> > > > >           tcpdump.)
> > > > > --> Btw, the "RST battle" you'll see in the above trace between
> > > > >        #2005 and #2068 that goes on until the FreeBSD
> > > > >        krpc/NFS times out the connection after 6min. seems to be a recent
> > > > >        FreeBSD TCP bug.
> > > > >        I have reproduced this seqid advances by 2 on an older system
> > > > >        that does not "RST battle" and allows the reconnect right away,
> > > > >        once the network partition is healed, so it does seem to be
> > > > >        relevant to this bug.
> > > > >
> > > > > Someday, I will get around to upgrading to a more recent Linux
> > > > > kernel and will test to see if I can still reproduce this bug.
> > > > > On 5.2.10, it is intermittent and does not occur every time I
> > > > > do the network partitioning test.
> > > > >
> > > > > Mostly just fyi, rick
> > > >
> > > > Hi Rick,
> > > >
> > > > I think this is happening because slotid=1 had something queued up
> > > > using seqid=5 and that was interrupted because the connection was
> > > > RSTed. For the interrupted slot, the client would send solo SEQUENCE
> > > > with +1 seqid.
> > >
> > > Doesn't the client send the solo SEQUENCE with seqid 5 in that case?
> >
> > No it sends with seq+1 because NFS layer client doesn't know if seqid
> > actually was actually transmitted before the connection got caught
> > (and/or received by the server).
>
> But then the MISORDERED tells the client it wasn't received, so the
> client follows up with a call with seqid 5--is that what happens?

Correct. If there were no error then the server did indeed consume
seqid. And if an error was returned then the client knows to
decrement.

> Sorry, I seem to recall we went through this all a couple years ago, but
> now I've forgotten how it works.
>
> --b.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Linux NFSv4.1 client session seqid sometimes advances by 2
  2021-04-13 20:52         ` Olga Kornievskaia
@ 2021-04-13 23:08           ` Rick Macklem
  0 siblings, 0 replies; 7+ messages in thread
From: Rick Macklem @ 2021-04-13 23:08 UTC (permalink / raw)
  To: Olga Kornievskaia, J. Bruce Fields; +Cc: Linux-NFS

Olga Kornievskaia wrote:
>On Tue, Apr 13, 2021 at 3:29 PM J. Bruce Fields <bfields@fieldses.org> wrote:
>>
>> On Tue, Apr 13, 2021 at 02:59:27PM -0400, Olga Kornievskaia wrote:
>> > On Tue, Apr 13, 2021 at 1:17 PM J. Bruce Fields <bfields@fieldses.org> wrote:
>> > >
>> > > On Tue, Apr 13, 2021 at 09:31:37AM -0400, Olga Kornievskaia wrote:
>> > > > On Tue, Apr 13, 2021 at 3:08 AM Rick Macklem <rmacklem@uoguelph.ca> wrote:
>> > > > >
>> > > > > Hi,
>> > > > >
>> > > > > During testing of a Fedora Core 30 (5.2.10 kernel) against a FreeBSD
>> > > > > server (4.1 mount), I have been simulating a network partitioning
>> > > > > for a few minutes (until the TCP connection goes to SYN_SENT on
>> > > > > the Linux client).
>> > > > >
>> > > > > Sometimes, after the network partition heals, the FreeBSD server
>> > > > > replies NFS4ERR_SEQ_MISORDERED.
>> > > > > Looking at the packet trace, the seqid for the slot has advanced by
>> > > > > 2 instead of 1. An RPC request for old-seqid + 1 never seems to get
>> > > > > sent.
>> > > > > --> Since sending an RPC with "seqid + 2" but never sending one
>> > > > >        that is "seqid + 1" for a slot seems harmless, I have added an optional
>> > > > >        hack (can be turned off), to allow this case instead of replying
>> > > > >        NFS4ERR_SEQ_MISORDERED for it. The code will still reply
>> > > > >        NFS4ERR_SEQ_MISORDERED if an RPC for the slot with
>> > > > >        "old seqid + 1" in it.
>> > > > >        --> Yes, doing this hack is a violation of RFC5661, but I've
>> > > > >              done it anyhow.
>> > > > >
>> > > > > If you are interested in a packet capture with this in it:
>> > > > > fetch https://people.freebsd.org/~rmacklem/linuxtofreenfs.pcap
>> > > > > - then look at packet #1945 and #2072
>> > > > >   --> You'll see that slot #1 seqid goes from 4 to 6. There is no
>> > > > >          slot#1 seqid 5 RPC sent on the wire.
>> > > > >          (This packet capture was taken on the Linux client using
>> > > > >           tcpdump.)
>> > > > > --> Btw, the "RST battle" you'll see in the above trace between
>> > > > >        #2005 and #2068 that goes on until the FreeBSD
>> > > > >        krpc/NFS times out the connection after 6min. seems to be a recent
>> > > > >        FreeBSD TCP bug.
>> > > > >        I have reproduced this seqid advances by 2 on an older system
>> > > > >        that does not "RST battle" and allows the reconnect right away,
>> > > > >        once the network partition is healed, so it does seem to be
>> > > > >        relevant to this bug.
>> > > > >
>> > > > > Someday, I will get around to upgrading to a more recent Linux
>> > > > > kernel and will test to see if I can still reproduce this bug.
>> > > > > On 5.2.10, it is intermittent and does not occur every time I
>> > > > > do the network partitioning test.
>> > > > >
>> > > > > Mostly just fyi, rick
>> > > >
>> > > > Hi Rick,
>> > > >
>> > > > I think this is happening because slotid=1 had something queued up
>> > > > using seqid=5 and that was interrupted because the connection was
>> > > > RSTed. For the interrupted slot, the client would send solo SEQUENCE
>> > > > with +1 seqid.
>> > >
>> > > Doesn't the client send the solo SEQUENCE with seqid 5 in that case?
>> >
>> > No it sends with seq+1 because NFS layer client doesn't know if seqid
>> > actually was actually transmitted before the connection got caught
>> > (and/or received by the server).
>>
>> But then the MISORDERED tells the client it wasn't received, so the
>> client follows up with a call with seqid 5--is that what happens?
>
>Correct. If there were no error then the server did indeed consume
>seqid. And if an error was returned then the client knows to
>decrement.
Ok. Yes. I took a closer look at the packet capture and that is what happened.
On slot#1...(server expecting seqid == 5):
- Sequence with seqid 6
--> reply NFS4ERR_SEQ_MISORDERED
- next use of slot#1, Sequence with seqid 5
--> reply NFS_OK
- next use of slot#1, Sequence with seqid 6
--> reply NFS_OK
- then slot#1 gets used normally by other RPCs

So it's an intentional "probe to re-synchronize the
seqid for the slot"?
Clever.

I'll back my "hack" out, since I now see it is not necessary
(and does violate the RFC'-).

So, is this how the Linux client deals with "soft,intr" NFSv4.1
mounts?
--> I've never solved that for FreeBSD and just note in the BUGS
       section of the man page that "soft,intr" mount options break
       an NFSv4 mount.
I might *borrow* this idea. I'll attribute it to whoever thought of
the idea, if you let me know who that is?

Thanks for the clarification, rick

> Sorry, I seem to recall we went through this all a couple years ago, but
> now I've forgotten how it works.
>
> --b.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2021-04-13 23:08 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-12 20:57 Linux NFSv4.1 client session seqid sometimes advances by 2 Rick Macklem
2021-04-13 13:31 ` Olga Kornievskaia
2021-04-13 17:17   ` J. Bruce Fields
2021-04-13 18:59     ` Olga Kornievskaia
2021-04-13 19:29       ` J. Bruce Fields
2021-04-13 20:52         ` Olga Kornievskaia
2021-04-13 23:08           ` Rick Macklem

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.