linux-nfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Trond Myklebust <trondmy@hammerspace.com>
To: "smayhew@redhat.com" <smayhew@redhat.com>
Cc: "bfields@fieldses.org" <bfields@fieldses.org>,
	"linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>,
	"jlayton@kernel.org" <jlayton@kernel.org>
Subject: Re: [PATCH] nfsd: CB_RECALL can race with FREE_STATEID
Date: Thu, 2 May 2019 11:49:02 +0000	[thread overview]
Message-ID: <aeb95b24e600d8400dc860d88cbd4f79a33766a7.camel@hammerspace.com> (raw)
In-Reply-To: <20190502113500.GH15226@coeurl.usersys.redhat.com>

On Thu, 2019-05-02 at 07:35 -0400, Scott Mayhew wrote:
> On Tue, 30 Apr 2019, Trond Myklebust wrote:
> 
> > On Tue, 2019-04-30 at 14:58 -0400, Scott Mayhew wrote:
> > > On Thu, 18 Apr 2019, J. Bruce Fields wrote:
> > > 
> > > > On Thu, Apr 18, 2019 at 04:50:24PM -0400, Scott Mayhew wrote:
> > > > > On Thu, 18 Apr 2019, J. Bruce Fields wrote:
> > > > > 
> > > > > > On Thu, Apr 18, 2019 at 09:24:00AM -0400, Scott Mayhew
> > > > > > wrote:
> > > > > > > While trying to track down some issues involving large
> > > > > > > numbers of
> > > > > > > delegations being recalled/revoked, I caught the server
> > > > > > > setting
> > > > > > > SEQ4_STATUS_CB_PATH_DOWN while the client was actively
> > > > > > > responding to
> > > > > > > CB_RECALLs.  It turns out that the client had already
> > > > > > > done a
> > > > > > > TEST_STATEID and FREE_STATEID for a delegation being
> > > > > > > recalled
> > > > > > > by the
> > > > > > > time it received the CB_RECALL.
> > > > > > 
> > > > > > That's interesting, thanks!
> > > > > > 
> > > > > > This exception seems awfully narrow, though.
> > > > > > 
> > > > > > If we get back any NFS-level error at all, then I think the
> > > > > > callback
> > > > > > channel is working (am I wrong?)
> > > > > 
> > > > > Correct, if the client replies with either NFS4ERR_DELAY or
> > > > > NFS4ERR_BAD_STATEID, the server will retry 1 time (see
> > > > > dl_retries).
> > > > > After that, we fall thru and nfsd4_cb_recall_done() returns
> > > > > -1
> > > > > which
> > > > > causes the SEQ4_STATUS_CB_PATH_DOWN flag to be set.
> > > > > 
> > > > > > and telling the client to set up a new
> > > > > > one is probably not going to help.  The best we can do is
> > > > > > probably just
> > > > > > give up
> > > > > 
> > > > > That's what the patch is essentially doing.  Or are you
> > > > > saying
> > > > > don't
> > > > > even bother with the checks but still return 1 so we don't
> > > > > set
> > > > > the
> > > > > SEQ4_STATUS_CB_PATH_DOWN flag?
> > > > 
> > > > Right, I don't see any point returning -1 (which ends up
> > > > setting
> > > > CB_PATH_DOWN) in any case where we get an nfs-level error.  If
> > > > the
> > > > client got so far as returning an error, then the callback path
> > > > is
> > > > working.
> > > > 
> > > > I'm not sure exactly what errors *should* result in
> > > > CB_PATH_DOWN,
> > > > though.  ETIMEDOUT, ENOTCONN, EIO?
> > > 
> > > I'm not sure either.  Looking at
> > > call_status/call_timeout/rpc_check_timeout, it looks to me like
> > > ENOTCONN
> > > will be translated to ETIMEDOUT because nfsd4_run_cb_work sets
> > > the 
> > > RPC_TASK_SOFTCONN flag in the call to rpc_call_async.
> > > 
> > > It looks like call_status can return EHOSTDOWN, ENETDOWN,
> > > EHOSTUNREACH,
> > > ENETUNREACH, and EPERM... should those be handled as well?
> > 
> > Those errors should never be passed back to applications.
> 
> I'm confused.  If call_status passes any of those errors to rpc_exit,
> then I'll see them in rpc_call_done/nfsd4_cb_done, won't I?
> 

If you ever see them in rpc_call_done/nfsd4_cb_done, then it will be
due to an RPC bug which will need to be fixed.

call_status() should be handling those errors by retrying, and if it
runs out of retry attempts, then it should be setting either an EIO
error or an ETIMEDOUT error, depending on which rpc_task flags have
been specified.

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com



  reply	other threads:[~2019-05-02 11:49 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-04-18 13:24 [PATCH] nfsd: CB_RECALL can race with FREE_STATEID Scott Mayhew
2019-04-18 15:13 ` J. Bruce Fields
2019-04-18 20:50   ` Scott Mayhew
2019-04-18 21:37     ` J. Bruce Fields
2019-04-30 18:58       ` Scott Mayhew
2019-04-30 19:03         ` Trond Myklebust
2019-05-02 11:35           ` Scott Mayhew
2019-05-02 11:49             ` Trond Myklebust [this message]
2019-04-18 22:03     ` Trond Myklebust
2019-04-18 23:42       ` bfields
2019-04-30 18:46       ` Scott Mayhew

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aeb95b24e600d8400dc860d88cbd4f79a33766a7.camel@hammerspace.com \
    --to=trondmy@hammerspace.com \
    --cc=bfields@fieldses.org \
    --cc=jlayton@kernel.org \
    --cc=linux-nfs@vger.kernel.org \
    --cc=smayhew@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).