From: dai.ngo@oracle.com
To: Bruce Fields <bfields@fieldses.org>
Cc: Chuck Lever III <chuck.lever@oracle.com>,
	Jeff Layton <jlayton@redhat.com>,
	Al Viro <viro@zeniv.linux.org.uk>,
	Linux NFS Mailing List <linux-nfs@vger.kernel.org>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>
Subject: Re: [PATCH RFC v19 06/11] NFSD: Update find_clp_in_name_tree() to handle courtesy client
Date: Sun, 17 Apr 2022 18:18:04 -0700	[thread overview]
Message-ID: <34e69305-f11c-2a42-4903-14b1acc5ab41@oracle.com> (raw)
In-Reply-To: <20220417190727.GA18120@fieldses.org>


On 4/17/22 12:07 PM, Bruce Fields wrote:
> On Wed, Apr 13, 2022 at 08:55:50AM -0400, Bruce Fields wrote:
>> On Fri, Apr 01, 2022 at 12:11:34PM -0700, dai.ngo@oracle.com wrote:
>>> On 4/1/22 8:57 AM, Chuck Lever III wrote:
>>>>> (And to be honest I'd still prefer the original approach where we expire
>>>>> clients from the posix locking code and then retry.  It handles an
>>>>> additional case (the one where reboot happens after a long network
>>>>> partition), and I don't think it requires adding these new client
>>>>> states....)
>>>> The locking of the earlier approach was unworkable.
>>>>
>>>> But, I'm happy to consider that again if you can come up with a way
>>>> of handling it properly and simply.
>>> I will wait for feedback from Bruce before sending v20 with the
>>> above change.
>> OK, I'd like to tweak the design in that direction.
>>
>> I'd like to handle the case where the network goes down for a while, and
>> the server gets power-cycled before the network comes back up.  I think
>> that could easily happen.  There's no reason clients couldn't reclaim
>> all their state in that case.  We should let them.
>>
>> To handle that case, we have to delay removing the client's stable
>> storage record until there's a lock conflict.  That means code that
>> checks for conflicts must be able to sleep.
>>
>> In each case (opens, locks, delegations), conflicts are first detected
>> while holding a spinlock.  So we need to unlock before waiting, and then
>> retry if necessary.
>>
>> We decided instead to remove the stable-storage record when first
>> converting a client to a courtesy client--then we can handle a conflict
>> by just setting a flag on the client that indicates it should no longer
>> be used, no need to drop any locks.
>>
>> That leaves the client in a state where it's still on a bunch of global
>> data structures, but has to be treated as if it no longer exists.  That
>> turns out to require more special handling than expected.  You've shown
>> admirable persistence in handling those cases, but I'm still not
>> completely convinced this is correct.
>>
>> We could avoid that complication, and also solve the
>> server-reboot-during-network-partition problem, if we went back to the
>> first plan and allowed ourselves to sleep at the time we detect a
>> conflict.  I don't think it's that complicated.
>>
>> We end up using a lot of the same logic regardless, so don't throw away
>> the existing patches.
>>
>> My basic plan is:
>>
>> Keep the client state, but with only three values: ACTIVE, COURTESY, and
>> EXPIRABLE.
>>
>> ACTIVE is the initial state, which we return to whenever we renew.  The
>> laundromat sets COURTESY whenever a client isn't renewed for a lease
>> period.  When we run into a conflict with a lock held by a client, we
>> call
>>
>>    static bool try_to_expire_client(struct nfs4_client *clp)
>>    {
>> 	return COURTESY == cmpxchg(&clp->cl_state, COURTESY, EXPIRABLE);
>>    }
>>
>> If it returns true, that tells us the client was a courtesy client.  We
>> then call queue_work(laundry_wq, &nn->laundromat_work) to tell the
>> laundromat to actually expire the client.  Then if needed we can drop
>> locks, wait for the laundromat to do the work with
>> flush_workqueue(laundry_wq), and retry.
>>
>> All the EXPIRABLE state does is tell the laundromat to expire this
>> client.  It does *not* prevent the client from being renewed and
>> acquiring new locks--if that happens before the laundromat gets to the
>> client, that's fine, we let it return to ACTIVE state and if someone
>> retries the conflicting lock they'll just get a denial.
>>
>> Here's a suggested rough patch ordering.  If you want to go above and
>> beyond, I also suggest some tests that should pass after each step:
>>
>>
>> PATCH 1
>> -------
>>
>> Implement courtesy behavior *only* for clients that have
>> delegations, but no actual opens or locks:
>>
>> Define new cl_state field with values ACTIVE, COURTESY, and EXPIRABLE.
>> Set to ACTIVE on renewal.  Modify the laundromat so that instead of
>> expiring any client that's too old, it first checks if a client has
>> state consisting only of unconflicted delegations, and, if so, it sets
>> COURTESY.
>>
>> Define try_to_expire_client as above.  In nfsd_break_deleg_cb, call
>> try_to_expire_client and queue_work.  (But also continue scheduling the
>> recall as we do in the current code; there's no harm in that.)
>>
>> Modify the laundromat to try to expire old clients with EXPIRABLE set.
>>
>> TESTS:
>> 	- Establish a client, open a file, get a delegation, close the
>> 	  file, wait 2 lease periods, verify that you can still use the
>> 	  delegation.
>> 	- Establish a client, open a file, get a delegation, close the
>> 	  file, wait 2 lease periods, establish a second client, request
>> 	  a conflicting open, verify that the open succeeds and that the
>> 	  first client is no longer able to use its delegation.
>>
>>
>> PATCH 2
>> -------
>>
>> Extend courtesy client behavior to clients that have opens or
>> delegations, but no locks:
>>
>> Modify the laundromat to set COURTESY on old clients with state
>> consisting only of opens or unconflicted delegations.
>>
>> Add in nfs4_resolve_deny_conflicts_locked and friends as in your patch
>> "Update nfs4_get_vfs_file()...", but in the case of a conflict, call
>> try_to_expire_client and queue_work(), then modify e.g.
>> nfs4_get_vfs_file to flush_workqueue() and then retry after unlocking
>> fi_lock.
>>
>> TESTS:
>> 	- establish a client, open a file, wait 2 lease periods, verify
>> 	  that you can still use the open stateid.
>> 	- establish a client, open a file, wait 2 lease periods,
>> 	  establish a second client, request an open with a share mode
>> 	  conflicting with the first open, verify that the open succeeds
>> 	  and that the first client is no longer able to use its open.
>>
>> PATCH 3
>> -------
>>
>> Minor tweak to prevent the laundromat from being freed out from
>> under a thread processing a conflicting lock:
>>
>> Create and destroy the laundromat workqueue in init_nfsd/exit_nfsd
>> instead of where it's done currently.
>>
>> (That makes the laundromat's lifetime longer than strictly necessary.
>> We could do better with a little more work; I think this is OK for now.)
>>
>> TESTS:
>> 	- just rerun any regression tests; this patch shouldn't change
>> 	  behavior.
>>
>> PATCH 4
>> -------
>>
>> Extend courtesy client behavior to any client with state, including
>> locks:
>>
>> Modify the laundromat to set COURTESY on any old client with state.
>>
>> Add two new lock manager callbacks:
>>
>> 	void * (*lm_lock_expirable)(struct file_lock *);
>> 	bool (*lm_expire_lock)(void *);
>>
>> If lm_lock_expirable() is called and returns non-NULL, posix_lock_inode
>> should drop flc_lock, call lm_expire_lock() with the value returned from
>> lm_lock_expirable, and then restart the loop over flc_posix from the
>> beginning.
>>
>> For now, nfsd's lm_lock_expirable will basically just be
>>
>> 	if (try_to_expire_client()) {
>> 		queue_work()
>> 		return get_net();
> Correction: I forgot that the laundromat is global, not per-net.  So, we
> can skip the put_net/get_net.  Also, lm_lock_expirable can just return
> bool instead of void *, and lm_expire_lock needs no arguments.

okay.

-Dai

>
> --b.
>
>> 	}
>> 	return NULL;
>>
>> and lm_expire_lock will:
>>
>> 	flush_workqueue()
>> 	put_net()
>>
>> One more subtlety: the moment we drop the flc_lock, it's possible
>> another task could race in and free it.  Worse, the nfsd module could be
>> removed entirely--so nfsd's lm_expire_lock code could disappear out from
>> under us.  To prevent this, I think we need to add a struct module
>> *owner field to struct lock_manager_operations, and use it like:
>>
>> 	owner = fl->fl_lmops->owner;
>> 	__module_get(owner);
>> 	expire_lock = fl->fl_lmops->lm_expire_lock;
>> 	spin_unlock(&ctx->flc_lock);
>> 	expire_lock(...);
>> 	module_put(owner);
>>
>> Maybe there's some simpler way, but I don't see it.
>>
>> TESTS:
>> 	- retest courtesy client behavior using file locks this time.
>>
>> --
>>
>> That's the basic idea.  I think it should work--though I may have
>> overlooked something.
>>
>> This has us flush the laundromat workqueue while holding mutexes in a
>> couple cases.  We could avoid that with a little more work, I think.
>> But those mutexes should only be associated with the client requesting a
>> new open/lock, and such a client shouldn't be touched by the laundromat,
>> so I think we're OK.
>>
>> It'd also be helpful to update the info file with courtesy client
>> information, as you do in your current patches.
>>
>> Does this make sense?
>>
>> --b.


Thread overview: 29+ messages
2022-03-31 16:01 [PATCH RFC v19 0/11] NFSD: Initial implementation of NFSv4 Courteous Server Dai Ngo
2022-03-31 16:01 ` [PATCH RFC v19 01/11] fs/lock: add helper locks_owner_has_blockers to check for blockers Dai Ngo
2022-03-31 16:17   ` Chuck Lever III
2022-03-31 16:29     ` dai.ngo
2022-03-31 16:02 ` [PATCH RFC v19 02/11] NFSD: Add courtesy client state, macro and spinlock to support courteous server Dai Ngo
2022-03-31 16:02 ` [PATCH RFC v19 03/11] NFSD: Add lm_lock_expired call out Dai Ngo
2022-03-31 16:02 ` [PATCH RFC v19 04/11] NFSD: Update nfsd_breaker_owns_lease() to handle courtesy clients Dai Ngo
2022-03-31 16:02 ` [PATCH RFC v19 05/11] NFSD: Update nfs4_get_vfs_file() to handle courtesy client Dai Ngo
2022-03-31 16:02 ` [PATCH RFC v19 06/11] NFSD: Update find_clp_in_name_tree() " Dai Ngo
2022-04-01 15:21   ` J. Bruce Fields
2022-04-01 15:57     ` Chuck Lever III
2022-04-01 19:11       ` dai.ngo
2022-04-13 12:55         ` Bruce Fields
2022-04-13 18:28           ` dai.ngo
2022-04-13 18:42             ` Bruce Fields
2022-04-15 14:47           ` dai.ngo
2022-04-15 14:56             ` dai.ngo
2022-04-15 15:19               ` Bruce Fields
2022-04-15 15:36                 ` dai.ngo
2022-04-15 19:53                 ` Bruce Fields
2022-04-17 19:07           ` Bruce Fields
2022-04-18  1:18             ` dai.ngo [this message]
2022-03-31 16:02 ` [PATCH RFC v19 07/11] NFSD: Update find_in_sessionid_hashtbl() " Dai Ngo
2022-03-31 16:02 ` [PATCH RFC v19 08/11] NFSD: Update find_client_in_id_table() " Dai Ngo
2022-03-31 16:02 ` [PATCH RFC v19 09/11] NFSD: Refactor nfsd4_laundromat() Dai Ngo
2022-03-31 16:02 ` [PATCH RFC v19 10/11] NFSD: Update laundromat to handle courtesy clients Dai Ngo
2022-03-31 16:02 ` [PATCH RFC v19 11/11] NFSD: Show state of courtesy clients in client info Dai Ngo
2022-04-02 10:35 ` [PATCH RFC v19 0/11] NFSD: Initial implementation of NFSv4 Courteous Server Jeff Layton
2022-04-02 15:10   ` Chuck Lever III
