From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id C8461C433EF for ; Tue, 29 Mar 2022 19:49:20 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S240689AbiC2TvB (ORCPT ); Tue, 29 Mar 2022 15:51:01 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59920 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S236568AbiC2TvA (ORCPT ); Tue, 29 Mar 2022 15:51:00 -0400 Received: from fieldses.org (fieldses.org [IPv6:2600:3c00:e000:2f7::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4879433A2D; Tue, 29 Mar 2022 12:49:16 -0700 (PDT) Received: by fieldses.org (Postfix, from userid 2815) id 82F1E731E; Tue, 29 Mar 2022 15:49:15 -0400 (EDT) DKIM-Filter: OpenDKIM Filter v2.11.0 fieldses.org 82F1E731E DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fieldses.org; s=default; t=1648583355; bh=51C9WR5DLPBcCo+DhgDi0MPwMirzRR7iKfhW1vz+yRo=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=IFqvo6Dtjda8abpVxBMQlZycLK5eGxZxGTwmniWS0gzSuoNsmkya5j4ly3H32yumh orFSNFiSb9Ih+V5Z6nvYFC/g6/03zPwBTvVTFtjcmYnV5F3hZ9Wa7ztV4xPfCDw1wy isGo+hFraAUuGqYSZBBUQ42AlP9flMJIKSwnvKJ4= Date: Tue, 29 Mar 2022 15:49:15 -0400 From: Bruce Fields To: Chuck Lever III Cc: Dai Ngo , Jeff Layton , Al Viro , Linux NFS Mailing List , "linux-fsdevel@vger.kernel.org" Subject: Re: [PATCH RFC v18 02/11] NFSD: Add courtesy client state, macro and spinlock to support courteous server Message-ID: <20220329194915.GD32217@fieldses.org> References: <1648182891-32599-1-git-send-email-dai.ngo@oracle.com> <1648182891-32599-3-git-send-email-dai.ngo@oracle.com> <20220329154750.GE29634@fieldses.org> <612ef738-20f6-55f0-1677-cc035ba2fd0d@oracle.com> <20220329163011.GG29634@fieldses.org> <5cddab8d-dd92-6863-78fd-a4608a722927@oracle.com> <20220329183916.GC32217@fieldses.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org On Tue, Mar 29, 2022 at 07:32:57PM +0000, Chuck Lever III wrote: > > > > On Mar 29, 2022, at 2:39 PM, J. Bruce Fields wrote: > > > > On Tue, Mar 29, 2022 at 11:19:51AM -0700, dai.ngo@oracle.com wrote: > >> > >> On 3/29/22 9:30 AM, J. Bruce Fields wrote: > >>> On Tue, Mar 29, 2022 at 09:20:02AM -0700, dai.ngo@oracle.com wrote: > >>>> On 3/29/22 8:47 AM, J. Bruce Fields wrote: > >>>>> On Thu, Mar 24, 2022 at 09:34:42PM -0700, Dai Ngo wrote: > >>>>>> Update nfs4_client to add: > >>>>>> . cl_cs_client_state: courtesy client state > >>>>>> . cl_cs_lock: spinlock to synchronize access to cl_cs_client_state > >>>>>> . cl_cs_list: list used by laundromat to process courtesy clients > >>>>>> > >>>>>> Modify alloc_client to initialize these fields. > >>>>>> > >>>>>> Signed-off-by: Dai Ngo > >>>>>> --- > >>>>>> fs/nfsd/nfs4state.c | 2 ++ > >>>>>> fs/nfsd/nfsd.h | 1 + > >>>>>> fs/nfsd/state.h | 33 +++++++++++++++++++++++++++++++++ > >>>>>> 3 files changed, 36 insertions(+) > >>>>>> > >>>>>> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c > >>>>>> index 234e852fcdfa..a65d59510681 100644 > >>>>>> --- a/fs/nfsd/nfs4state.c > >>>>>> +++ b/fs/nfsd/nfs4state.c > >>>>>> @@ -2009,12 +2009,14 @@ static struct nfs4_client *alloc_client(struct xdr_netobj name) > >>>>>> INIT_LIST_HEAD(&clp->cl_delegations); > >>>>>> INIT_LIST_HEAD(&clp->cl_lru); > >>>>>> INIT_LIST_HEAD(&clp->cl_revoked); > >>>>>> + INIT_LIST_HEAD(&clp->cl_cs_list); > >>>>>> #ifdef CONFIG_NFSD_PNFS > >>>>>> INIT_LIST_HEAD(&clp->cl_lo_states); > >>>>>> #endif > >>>>>> INIT_LIST_HEAD(&clp->async_copies); > >>>>>> spin_lock_init(&clp->async_lock); > >>>>>> spin_lock_init(&clp->cl_lock); > >>>>>> + spin_lock_init(&clp->cl_cs_lock); > >>>>>> rpc_init_wait_queue(&clp->cl_cb_waitq, "Backchannel slot table"); > >>>>>> return clp; > >>>>>> err_no_hashtbl: > >>>>>> diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h > >>>>>> index 4fc1fd639527..23996c6ca75e 100644 > >>>>>> --- a/fs/nfsd/nfsd.h > >>>>>> +++ b/fs/nfsd/nfsd.h > >>>>>> @@ -336,6 +336,7 @@ void nfsd_lockd_shutdown(void); > >>>>>> #define COMPOUND_ERR_SLACK_SPACE 16 /* OP_SETATTR */ > >>>>>> #define NFSD_LAUNDROMAT_MINTIMEOUT 1 /* seconds */ > >>>>>> +#define NFSD_COURTESY_CLIENT_TIMEOUT (24 * 60 * 60) /* seconds */ > >>>>>> /* > >>>>>> * The following attributes are currently not supported by the NFSv4 server: > >>>>>> diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h > >>>>>> index 95457cfd37fc..40e390abc842 100644 > >>>>>> --- a/fs/nfsd/state.h > >>>>>> +++ b/fs/nfsd/state.h > >>>>>> @@ -283,6 +283,35 @@ struct nfsd4_sessionid { > >>>>>> #define HEXDIR_LEN 33 /* hex version of 16 byte md5 of cl_name plus '\0' */ > >>>>>> /* > >>>>>> + * CLIENT_ CLIENT_ CLIENT_ > >>>>>> + * COURTESY EXPIRED RECONNECTED Meaning Where set > >>>>>> + * ----------------------------------------------------------------------------- > >>>>>> + * | false | false | false | Confirmed, active | Default | > >>>>>> + * |---------------------------------------------------------------------------| > >>>>>> + * | true | false | false | Courtesy state. | nfs4_get_client_reaplist | > >>>>>> + * | | | | Lease/lock/share | | > >>>>>> + * | | | | reservation conflict | | > >>>>>> + * | | | | can cause Courtesy | | > >>>>>> + * | | | | client to be expired | | > >>>>>> + * |---------------------------------------------------------------------------| > >>>>>> + * | false | true | false | Courtesy client to be| nfs4_laundromat | > >>>>>> + * | | | | expired by Laundromat| nfsd4_lm_lock_expired | > >>>>>> + * | | | | due to conflict | nfsd4_discard_courtesy_clnt | > >>>>>> + * | | | | | nfsd4_expire_courtesy_clnt | > >>>>>> + * |---------------------------------------------------------------------------| > >>>>>> + * | false | false | true | Courtesy client | nfsd4_courtesy_clnt_expired| > >>>>>> + * | | | | reconnected, | | > >>>>>> + * | | | | becoming active | | > >>>>>> + * ----------------------------------------------------------------------------- > >>> By the way, where is a client returned to the normal (0) state? That > >>> has to happen at some point. > >> > >> For 4.1 courtesy client reconnects is detected in nfsd4_sequence, > >> nfsd4_bind_conn_to_session. > > > > Those are the places where NFSD54_CLIENT_RECONNECTED is set, which isn't > > the question I asked. > > "reconnected" simply means the client has gotten back in touch. Again, my question was: when is cl_cs_client_state set back to 0? As far as I can tell, the answer is never. That means, even long after the client has reconnected, it's left in a weird state where it can be suddenly expired for all sorts of reasons. > The server then has to decide whether to allow the client to > become active again or it needs to purge it. That decision > is different for each operation and minor version. Look for > "if (cl_cs_client_state == NFSD4_CLIENT_RECONNECTED)" for how > those choices are made. > > > >>> Why are RECONNECTED clients discarded in so many cases? (E.g. whenever > >>> a bind_conn_to_session fails). > >> > >> find_in_sessionid_hashtbl: we discard the courtesy client when it > >> reconnects and there is error from nfsd4_get_session_locked. This > >> should be a rare condition so rather than reverting the client > >> state back to courtesy, it is simpler just to discard it. > > > > That may be a rare situation, but I don't believe the behavior of > > discarding the client in this case is correct. > > Can you explain this? It's a courtesy client... the server can > decide it's expired at that point, can't it? IOW what breaks? I'm not worried about courtesy clients, I'm worried about clients that were courtesy clients but have since succesfully renewed their state. Expiring them for a failed bind_conn_to_session isn't right. --b. > > > >> nfsd4_create_session/find_confirmed_client: I think the only time > >> the courtesy client sends CREATE_SESSION, before sending the SEQUENCE > >> to reconnect after missing its leases, is when it wants to do clientid > >> trunking. This should be a rare condition so instead of dealing > >> with it we just do not allow it and discard the client for now. > > > > We can't wave away incorrect behavior with "but it's rare". Users with > > heavy and/or unusual workloads hit rare conditions. Clients may change > > their behavior over time. (E.g., trunking may become more common.) > > > -- > Chuck Lever > >