From: Chuck Lever III <chuck.lever@oracle.com>
To: Trond Myklebust <trondmy@hammerspace.com>
Cc: Linux NFS Mailing List <linux-nfs@vger.kernel.org>
Subject: SOFT + NO_RETRANS_TIMEOUT semantics
Date: Mon, 12 Jul 2021 17:07:07 +0000
Message-ID: <981B8D74-2193-498C-8C4F-190E263FD8F6@oracle.com>
Hi Trond-
I'm seeing some interesting client hangs that arise from a well-
timed server crash or network partition.
The easiest to see is gss_destroy() on a Kerberized NFSv4 mount.
NFSv4 asserts the RPC_TASK_NO_RETRANS_TIMEOUT flag (hereafter I'll
refer to it as NORTO) when creating a new rpc_clnt. The initial
rpc_ping() for that rpc_clnt is done before the logic that sets
cl_noretranstimeo, thus that ping works as expected (SOFT |
SOFTCONN) and can time out properly if the server isn't
responsive.
However, once that ping succeeds, cl_noretranstimeo is asserted,
and all subsequent RPC requests on that rpc_clnt are sent with
NORTO semantics.
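To make the ordering concrete, here is a minimal userspace sketch of how a task might pick up its client's defaults. It is not the kernel code: the model_* names and the flag values are made up for illustration, and only the cl_softrtry and cl_noretranstimeo field names come from the real rpc_clnt.

```c
#include <stdbool.h>

/* Illustrative flag values only; these are NOT the kernel's values. */
#define TASK_SOFT      0x1u
#define TASK_SOFTCONN  0x2u
#define TASK_NORTO     0x4u  /* stands in for RPC_TASK_NO_RETRANS_TIMEOUT */

/* Hypothetical stand-in for the per-client defaults in struct rpc_clnt. */
struct model_clnt {
	bool softrtry;        /* models cl_softrtry */
	bool noretranstimeo;  /* models cl_noretranstimeo */
};

/*
 * Models a new task inheriting the client's defaults on top of the
 * flags its caller passed in. Assumption: the defaults are only ever
 * OR-ed in, so a caller has no way to opt out of NORTO.
 */
static unsigned int model_task_flags(const struct model_clnt *clnt,
				     unsigned int caller_flags)
{
	unsigned int flags = caller_flags;

	if (clnt->softrtry)
		flags |= TASK_SOFT;
	if (clnt->noretranstimeo)
		flags |= TASK_NORTO;
	return flags;
}
```

In this model, a ping issued while noretranstimeo is still false keeps exactly SOFT | SOFTCONN, while the same call made after the field is set also carries NORTO.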
When it comes time to destroy the GSS context for that rpc_clnt,
the NULL procedure with the GSS decorations is sent with SOFT |
SOFTCONN | NORTO. If the server isn't responding at that point,
the client continues to retransmit the GSS context destruction
request forever, and the xprt and possibly the nfs_client are
pinned.
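The hang can be sketched as a decision made each time a major timeout expires. This is a simplified userspace model of the behavior described above, not the actual client code; the model_* names, the verdict enum, and the flag values are assumptions for illustration.

```c
#include <stdbool.h>

/* Illustrative flag values only; these are NOT the kernel's values. */
#define TASK_SOFT      0x1u
#define TASK_SOFTCONN  0x2u
#define TASK_NORTO     0x4u  /* stands in for RPC_TASK_NO_RETRANS_TIMEOUT */

enum model_verdict {
	MODEL_RETRY,          /* keep retransmitting */
	MODEL_FAIL_TIMEDOUT   /* complete the task with a timeout error */
};

/*
 * Models what happens to a task when its major timeout expires.
 * "connected" means the transport still looks connected from the
 * client's point of view, which can remain true for a long time
 * after a server crash or network partition.
 */
static enum model_verdict model_major_timeout(unsigned int flags,
					       bool connected)
{
	/* SOFTCONN tasks give up once the connection is gone. */
	if ((flags & TASK_SOFTCONN) && !connected)
		return MODEL_FAIL_TIMEDOUT;

	if (flags & TASK_SOFT) {
		/* NORTO suppresses the soft timeout while connected. */
		if ((flags & TASK_NORTO) && connected)
			return MODEL_RETRY;
		return MODEL_FAIL_TIMEDOUT;
	}

	/* Hard tasks retry indefinitely. */
	return MODEL_RETRY;
}
```

Under these assumptions, SOFT | SOFTCONN | NORTO with a transport that never reports a disconnect always yields MODEL_RETRY, which is the "retransmit forever" case.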
The problem also arises for lease management operations such as
singleton SEQUENCE or RENEW requests. These are also done with
SOFT because, as I recall, they need to time out properly. But with
NORTO + SOFT, they will be retried until a connection loss that
might never come.
I've thought of some ways to modify the cl_noretranstimeo logic
such that it can be disabled for particular RPC tasks, though
none is really striking me as exceptionally clever:
 - Add a field to struct rpc_procinfo that contains a mask of
   RPC_TASK flags to clear for each procedure.
 - Add logic to rpc_task_set_client() that clears NORTO in
   some special cases.
 - Reverse the meaning of NORTO (e.g., make it
   RPC_TASK_RETRANS_TIMEOUT) so that it can be set by a caller
   for particular RPC tasks if the rpc_clnt-default behavior
   is NORTO.
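As a rough illustration of the first and third ideas, here is a userspace sketch. None of this exists in the kernel: the clear_flags field, the TASK_RETRANS_TIMEOUT bit, the model_* helpers, and the flag values are all hypothetical.

```c
#include <stdbool.h>

/* Illustrative flag values only; these are NOT the kernel's values. */
#define TASK_SOFT             0x1u
#define TASK_SOFTCONN         0x2u
#define TASK_NORTO            0x4u  /* stands in for RPC_TASK_NO_RETRANS_TIMEOUT */
#define TASK_RETRANS_TIMEOUT  0x8u  /* hypothetical inverted flag (idea 3) */

/* Idea 1: a per-procedure mask of task flags to clear (hypothetical field). */
struct model_procinfo {
	const char *name;
	unsigned int clear_flags;
};

/* Drop whatever flags the procedure asked to have cleared. */
static unsigned int model_apply_proc_mask(unsigned int flags,
					  const struct model_procinfo *proc)
{
	return flags & ~proc->clear_flags;
}

/*
 * Idea 3: NORTO is the client default, and a caller sets the inverted
 * flag on a particular task to get ordinary retransmit timeouts back.
 */
static bool model_norto_in_effect(bool clnt_norto_default,
				  unsigned int task_flags)
{
	return clnt_norto_default && !(task_flags & TASK_RETRANS_TIMEOUT);
}
```

With the mask approach, a procedure such as the GSS context destruction NULL call would list TASK_NORTO in clear_flags; with the inverted flag, the caller of that one task would pass TASK_RETRANS_TIMEOUT instead.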
Any thoughts?
--
Chuck Lever