All of lore.kernel.org
 help / color / mirror / Atom feed
From: Chuck Lever <chuck.lever@oracle.com>
To: Dai Ngo <dai.ngo@oracle.com>
Cc: jlayton@kernel.org, linux-nfs@vger.kernel.org
Subject: Re: [PATCH 1/1] NFSD: cancel CB_RECALL_ANY call when nfs4_client is about to be destroyed
Date: Fri, 29 Mar 2024 10:55:54 -0400	[thread overview]
Message-ID: <ZgbWevtNp8Vust4A@tissot.1015granger.net> (raw)
In-Reply-To: <c97be8b9-c0ba-4f2d-9340-78368008ba4b@oracle.com>

On Thu, Mar 28, 2024 at 05:31:02PM -0700, Dai Ngo wrote:
> 
> On 3/28/24 11:14 AM, Dai Ngo wrote:
> > 
> > On 3/28/24 7:08 AM, Chuck Lever wrote:
> > > On Wed, Mar 27, 2024 at 06:09:28PM -0700, Dai Ngo wrote:
> > > > On 3/26/24 11:27 AM, Chuck Lever wrote:
> > > > > On Tue, Mar 26, 2024 at 11:13:29AM -0700, Dai Ngo wrote:
> > > > > > Currently when a nfs4_client is destroyed we wait for
> > > > > > the cb_recall_any
> > > > > > callback to complete before proceed. This adds
> > > > > > unnecessary delay to the
> > > > > > __destroy_client call if there is problem communicating
> > > > > > with the client.
> > > > > By "unnecessary delay" do you mean only the seven-second RPC
> > > > > retransmit timeout, or is there something else?
> > > > when the client network interface is down, the RPC task takes ~9s to
> > > > send the callback, waits for the reply and gets ETIMEDOUT. This process
> > > > repeats in a loop with the same RPC task before being stopped by
> > > > rpc_shutdown_client after client lease expires.
> > > I'll have to review this code again, but rpc_shutdown_client
> > > should cause these RPCs to terminate immediately and safely. Can't
> > > we use that?
> > 
> > rpc_shutdown_client works, it terminated the RPC call to stop the loop.
> > 
> > > 
> > > 
> > > > It takes a total of about 1m20s before the CB_RECALL is terminated.
> > > > For CB_RECALL_ANY and CB_OFFLOAD, this process gets in to a infinite
> > > > loop since there is no delegation conflict and the client is allowed
> > > > to stay in courtesy state.
> > > > 
> > > > The loop happens because in nfsd4_cb_sequence_done if cb_seq_status
> > > > is 1 (an RPC Reply was never received) it calls nfsd4_mark_cb_fault
> > > > to set the NFSD4_CB_FAULT bit. It then sets cb_need_restart to true.
> > > > When nfsd4_cb_release is called, it checks cb_need_restart bit and
> > > > re-queues the work again.
> > > Something in the sequence_done path should check if the server is
> > > tearing down this callback connection. If it doesn't, that is a bug
> > > IMO.
> 
> TCP terminated the connection after retrying for 16 minutes and
> notified the RPC layer which deleted the nfsd4_conn.

The server should have closed this connection already. Is it stuck
waiting for the client to respond to a FIN or something?


> But when nfsd4_run_cb_work ran again, it got into the infinite
> loop caused by:
>      /*
>       * XXX: Ideally, we could wait for the client to
>       *      reconnect, but I haven't figured out how
>       *      to do that yet.
>       */
>       nfsd4_queue_cb_delayed(cb, 25);
> 
> which was introduced by c1ccfcf1a9bf. Note that I'm using 6.9-rc1.

The whole paragraph is:

1503         clnt = clp->cl_cb_client;
1504         if (!clnt) {
1505                 if (test_bit(NFSD4_CLIENT_CB_KILL, &clp->cl_flags))
1506                         nfsd41_destroy_cb(cb);
1507                 else {
1508                         /*
1509                          * XXX: Ideally, we could wait for the client to
1510                          *      reconnect, but I haven't figured out how
1511                          *      to do that yet.
1512                          */
1513                         nfsd4_queue_cb_delayed(cb, 25);
1514                 }
1515                 return;
1516         }

When there's no rpc_clnt and CB_KILL is set, the callback
operation should be destroyed immediately. CB_KILL is set by
nfsd4_shutdown_callback. It's only caller is
__destroy_client().

Why isn't NFSD4_CLIENT_CB_KILL set?


-- 
Chuck Lever

  reply	other threads:[~2024-03-29 14:56 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-03-26 18:13 [PATCH 1/1] NFSD: cancel CB_RECALL_ANY call when nfs4_client is about to be destroyed Dai Ngo
2024-03-26 18:27 ` Chuck Lever
2024-03-28  1:09   ` Dai Ngo
2024-03-28 14:08     ` Chuck Lever
2024-03-28 18:14       ` Dai Ngo
2024-03-29  0:31         ` Dai Ngo
2024-03-29 14:55           ` Chuck Lever [this message]
2024-03-29 17:57             ` Dai Ngo
2024-03-29 23:42               ` Chuck Lever
2024-03-30 17:46                 ` Dai Ngo
2024-03-30 18:28                   ` Chuck Lever
2024-03-30 23:30                     ` Dai Ngo
2024-04-01 12:49                       ` Jeff Layton
2024-04-01 13:34                         ` Chuck Lever
2024-04-01 16:00                           ` Dai Ngo
2024-04-01 16:46                             ` Dai Ngo
2024-04-01 17:49                               ` Chuck Lever
2024-04-01 19:55                                 ` Dai Ngo
2024-04-01 20:17                                   ` Dai Ngo
2024-04-02 13:58                                   ` Chuck Lever
2024-04-02 14:29                                     ` Dai Ngo
2024-04-01 16:11                           ` Jeff Layton

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZgbWevtNp8Vust4A@tissot.1015granger.net \
    --to=chuck.lever@oracle.com \
    --cc=dai.ngo@oracle.com \
    --cc=jlayton@kernel.org \
    --cc=linux-nfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.