linux-nfs.vger.kernel.org archive mirror
From: Trond Myklebust <trondmy@hammerspace.com>
To: "linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>,
	"aglo@umich.edu" <aglo@umich.edu>
Subject: Re: interrupted rpcs problem
Date: Fri, 10 Jan 2020 21:03:36 +0000
Message-ID: <3b89b01911b5149533e45478fdcec941a4f915ba.camel@hammerspace.com>
In-Reply-To: <CAN-5tyFY3XpteXw-fnpj0PQa3M81QGb6VnoxMaJukOZgJZ8ZOg@mail.gmail.com>

On Fri, 2020-01-10 at 14:29 -0500, Olga Kornievskaia wrote:
> Hi folks,
> 
> We are having an issue with interrupted RPCs again. Here's what I
> see when xfstests is interrupted with ctrl-c.
> 
> frame 332 SETATTR call slot=0 seqid=0x000013ca (I'm assuming this is
> interrupted and released)
> frame 333 CLOSE call slot=0 seqid=0x000013cb (the only way the slot
> could be free before the reply is if it was interrupted, right?
> Otherwise we should never have the slot used by more than one
> outstanding RPC)
> frame 334 reply to 333 with SEQ_MISORDERED (I'm assuming the server
> received frame 333 before 332)
> frame 336 CLOSE call slot=0 seqid=0x000013ca (??? why did we
> decrement it? I mean, I know why it's in the current code :-/ )
> frame 337 reply to 336 SEQUENCE with ERR_DELAY
> frame 339 reply to 332 SETATTR which nobody is waiting for
> frame 543 CLOSE call slot=0 seqid=0x000013ca (retry after waiting for
> ERR_DELAY)
> frame 544 reply to 543 with SETATTR (out of the cache).
> 
> What this leads to is: file is never closed on the server. Can't
> remove it. Unmount fails with CLID_BUSY.
> 
> I believe that's the result of commit
> 3453d5708b33efe76f40eca1c0ed60923094b971 ("NFSv4.1: Avoid false
> retries when RPC calls are interrupted"). We used to have code that
> bumped the sequence up when the slot was interrupted, but not since
> that commit.
> 
> The commit message says: "The obvious fix is to bump the sequence
> number pre-emptively if an RPC call is interrupted, but in order to
> deal with the corner cases where the interrupted call is not actually
> received and processed by the server, we need to interpret the error
> NFS4ERR_SEQ_MISORDERED as a sign that we need to either wait or
> locate a correct sequence number that lies between the value we sent,
> and the last value that was acked by a SEQUENCE call on that slot."
> 
> If we can no longer just bump the sequence up, I don't think the
> correct action is to automatically bump it down (as in the example
> here).
> The commit doesn't describe the corner case where it was necessary to
> bump the sequence up. I wonder if we can bring back the knowledge
> that the slot was interrupted and make a decision based on that as
> well as on whatever the other corner case is.
> 
> I guess what I'm getting at is: can somebody (Trond) describe the
> corner case for which that patch was created? I can see if I can fix
> the "common" case, which is now broken, without breaking the corner
> case....
> 

There is no pure client side solution for this problem.

The change was made because, if the RPC call is interrupted multiple
times, the client has to somehow figure out what the correct sequence
number for the slot is. If it starts low and then goes high, and the
server is not caching the arguments for the RPC call that is in the
session cache, then we will _always_ hit this bug, because we will
always hit the replay of the last entry.

At least if we start high and iterate downwards, we reduce the problem
to a race with the processing of the interrupted request, as happens
in this case.
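
To make the ordering argument concrete, here is a minimal sketch of a
server-side slot that does not compare request arguments on replay.
Everything in it is an assumption made for illustration: ToySlot,
probe(), the STARTED status and the small seqids 10..14 are
hypothetical, and none of this is the Linux client or any real server
code. It only models three rules: seqid == last replays the cached
reply (or returns NFS4ERR_DELAY while the original is still running),
seqid == last + 1 starts a new request, and anything else is
NFS4ERR_SEQ_MISORDERED.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple

STARTED = "STARTED"                      # toy status: request accepted, executing
OK = "NFS4_OK"
MISORDERED = "NFS4ERR_SEQ_MISORDERED"
DELAY = "NFS4ERR_DELAY"

Reply = Tuple[str, Optional[str]]        # (status, operation the reply belongs to)


@dataclass
class ToySlot:
    """Hypothetical server slot that does not check arguments on replay."""
    last_seqid: int                      # highest seqid accepted on this slot
    cached_op: Optional[str] = None      # operation whose reply is cached
    running: Optional[str] = None        # operation still executing, if any

    def call(self, seqid: int, op: str) -> Reply:
        if seqid == self.last_seqid:
            if self.running is not None:
                return (DELAY, None)     # original request has not finished
            return (OK, self.cached_op)  # replay: cached reply, whatever op is
        # Toy simplification: a new request is only accepted on an idle slot.
        if seqid == self.last_seqid + 1 and self.running is None:
            self.last_seqid, self.running = seqid, op
            return (STARTED, None)
        return (MISORDERED, None)

    def finish(self) -> Reply:
        """Complete the running request and cache its reply."""
        self.cached_op, self.running = self.running, None
        return (OK, self.cached_op)


def probe(slot: ToySlot, candidates: Iterable[int], op: str) -> Tuple[int, Reply]:
    """Try seqids in order until the server stops answering SEQ_MISORDERED."""
    for seqid in candidates:
        reply = slot.call(seqid, op)
        if reply[0] != MISORDERED:
            return seqid, reply
    raise RuntimeError("no candidate seqid was accepted")


def low_to_high() -> Tuple[int, Reply]:
    # Last acked seqid is 10; calls 11, 12 and 13 were interrupted. The
    # server processed 11 and 12 (a WRITE); 13 never arrived.
    slot = ToySlot(last_seqid=12, cached_op="WRITE")
    return probe(slot, range(11, 15), "CLOSE")


def high_to_low() -> Tuple[int, Reply]:
    # Same server state, but the client probes downwards from 14.
    slot = ToySlot(last_seqid=12, cached_op="WRITE")
    return probe(slot, range(14, 10, -1), "CLOSE")


def race_from_trace() -> List[Reply]:
    # Server has acked 0x13c9; the interrupted SETATTR (0x13ca) is in flight.
    slot = ToySlot(last_seqid=0x13C9)
    return [
        slot.call(0x13CB, "CLOSE"),      # frames 333/334: arrives before SETATTR
        slot.call(0x13CA, "SETATTR"),    # frame 332 reaches the server
        slot.call(0x13CA, "CLOSE"),      # frames 336/337: SETATTR still running
        slot.finish(),                   # frame 339: reply nobody is waiting for
        slot.call(0x13CA, "CLOSE"),      # frames 543/544: SETATTR from the cache
    ]
```

In this model, low_to_high() stops at seqid 12 and gets the cached
WRITE reply for a CLOSE, which is the false replay. high_to_low()
reaches 13 first, so the CLOSE is actually executed. race_from_trace()
reproduces the sequence in the capture above: going downwards only
fails because the CLOSE overtakes the interrupted SETATTR.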

However, as I said, the real solution here has to involve the server.

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com


