From: Chuck Lever III <chuck.lever@oracle.com>
To: Trond Myklebust <trondmy@hammerspace.com>
Cc: Linux NFS Mailing List <linux-nfs@vger.kernel.org>
Subject: SOFT + NO_RETRANS_TIMEOUT semantics
Date: Mon, 12 Jul 2021 17:07:07 +0000
Message-ID: <981B8D74-2193-498C-8C4F-190E263FD8F6@oracle.com>

Hi Trond-

I'm seeing some interesting client hangs that arise from a
well-timed server crash or network partition.

The easiest to see is gss_destroy() on a Kerberized NFSv4 mount.

NFSv4 asserts the RPC_TASK_NO_RETRANS_TIMEOUT flag (hereafter I'll
refer to it as NORTO) when creating a new rpc_clnt. The initial
rpc_ping() for that rpc_clnt is done before the logic that sets
cl_noretranstimeo, thus that ping works as expected (SOFT |
SOFTCONN) and can time out properly if the server isn't
responsive.

However, once that ping succeeds, cl_noretranstimeo is asserted,
and all subsequent RPC requests on that rpc_clnt are sent with
NORTO semantics.

When it comes time to destroy the GSS context for that rpc_clnt,
the NULL procedure with the GSS decorations is sent with SOFT |
SOFTCONN | NORTO. If the server isn't responding at that point,
the client continues to retransmit the GSS context destruction
request forever, and the xprt and possibly the nfs_client are
pinned.

The problem also arises for lease management operations such as
singleton SEQUENCE or RENEW requests. These are also done with
SOFT because, as I recall, they need to time out properly. But
with NORTO + SOFT, they will be retried until a connection loss
that might never come.
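
The interaction described above can be modeled in a few lines of
user-space C. This is only a sketch of the behavior as stated in this
message, not the SUNRPC code: the flag bits, the enum, and
on_major_timeout() are hypothetical names invented for illustration,
and "connected" stands in for whatever the transport reports.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits for illustration; not the kernel's RPC_TASK_* values. */
#define TASK_SOFT      (1u << 0)
#define TASK_SOFTCONN  (1u << 1)
#define TASK_NORTO     (1u << 2)

enum timeout_action { KEEP_RETRANSMITTING, FAIL_WITH_TIMEOUT };

/*
 * Model of what happens when a request's major timeout expires.
 * `connected` says whether the transport still looks connected.
 * TASK_SOFTCONN is deliberately ignored by this simplified model.
 */
static enum timeout_action on_major_timeout(unsigned int flags, bool connected)
{
	if (!(flags & TASK_SOFT))
		return KEEP_RETRANSMITTING;	/* hard task: never gives up */
	if ((flags & TASK_NORTO) && connected)
		return KEEP_RETRANSMITTING;	/* NORTO: only a lost connection ends it */
	return FAIL_WITH_TIMEOUT;		/* SOFT without NORTO, or connection lost */
}

/* Small self-check showing the three cases discussed in the text. */
static void check_timeout_model(void)
{
	unsigned int ping = TASK_SOFT | TASK_SOFTCONN;
	unsigned int destroy = TASK_SOFT | TASK_SOFTCONN | TASK_NORTO;

	assert(on_major_timeout(ping, true) == FAIL_WITH_TIMEOUT);
	assert(on_major_timeout(destroy, true) == KEEP_RETRANSMITTING);
	assert(on_major_timeout(destroy, false) == FAIL_WITH_TIMEOUT);
}
```

In this model the second case is the hang: if the transport never
reports a lost connection, the task keeps retransmitting.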

I've thought of some ways to modify the cl_noretranstimeo logic
such that it can be disabled for particular RPC tasks, though
none is really striking me as exceptionally clever:

 - Add a field to struct rpc_procinfo that contains a mask of
   RPC_TASK flags to clear for each procedure.
 - Add logic to rpc_task_set_client() that clears NORTO in
   some special cases.
 - Reverse the meaning of NORTO (e.g., make it
   RPC_TASK_RETRANS_TIMEOUT) so that it can be set by a caller
   for particular RPC tasks if the rpc_clnt-default behavior
   is NORTO.
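
As a rough illustration of the first option, the per-procedure mask
could be applied when a task's flags are computed. The sketch below is
a user-space model under stated assumptions: struct proc_info, struct
client, clear_flags, and effective_task_flags() are hypothetical
stand-ins, not the existing rpc_procinfo or rpc_clnt definitions, and
the flag values are made up.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bits for illustration; not the kernel's RPC_TASK_* values. */
#define TASK_SOFT      (1u << 0)
#define TASK_SOFTCONN  (1u << 1)
#define TASK_NORTO     (1u << 2)

/* Stand-in for a per-procedure descriptor carrying a "flags to clear" mask. */
struct proc_info {
	const char *name;
	unsigned int clear_flags;
};

/* Stand-in for a client whose defaults are applied to every task. */
struct client {
	unsigned int default_flags;
};

/* Combine caller and client-default flags, then drop what the procedure opts out of. */
static unsigned int effective_task_flags(const struct client *clnt,
					 const struct proc_info *proc,
					 unsigned int caller_flags)
{
	assert(clnt != NULL && proc != NULL);
	return (caller_flags | clnt->default_flags) & ~proc->clear_flags;
}

/* Self-check: a context-destroy procedure opts out of NORTO, an ordinary one does not. */
static void check_clear_mask(void)
{
	struct client clnt = { .default_flags = TASK_NORTO };
	struct proc_info destroy = { .name = "destroy", .clear_flags = TASK_NORTO };
	struct proc_info ordinary = { .name = "ordinary", .clear_flags = 0 };

	assert(effective_task_flags(&clnt, &destroy, TASK_SOFT | TASK_SOFTCONN)
	       == (TASK_SOFT | TASK_SOFTCONN));
	assert(effective_task_flags(&clnt, &ordinary, TASK_SOFT)
	       == (TASK_SOFT | TASK_NORTO));
}
```

The mask keeps the rpc_clnt-wide default intact while letting
individual procedures fall back to plain SOFT behavior.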

Any thoughts?

--
Chuck Lever
