linux-nfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: dai.ngo@oracle.com
To: Calum Mackay <calum.mackay@oracle.com>,
	"J. Bruce Fields" <bfields@fieldses.org>
Cc: chuck.lever@oracle.com, linux-nfs@vger.kernel.org
Subject: Re: [PATCH RFC 1/1] nfsd: Initial implementation of NFSv4 Courteous Server
Date: Wed, 16 Jun 2021 12:27:40 -0700	[thread overview]
Message-ID: <5db5327e-67ce-557d-e734-4cd81d1ad8e8@oracle.com> (raw)
In-Reply-To: <93b4ea72-786f-dc7f-d000-e40426ffff02@oracle.com>


On 6/16/21 12:19 PM, Calum Mackay wrote:
> On 16/06/2021 8:17 pm, dai.ngo@oracle.com wrote:
>>
>> On 6/16/21 9:02 AM, J. Bruce Fields wrote:
>>> On Thu, Jun 03, 2021 at 02:14:38PM -0400, Dai Ngo wrote:
>>>> Currently an NFSv4 client must maintain its lease by using the at 
>>>> least
>>>> one of the state tokens or if nothing else, by issuing a RENEW 
>>>> (4.0), or
>>>> a singleton SEQUENCE (4.1) at least once during each lease period. 
>>>> If the
>>>> client fails to renew the lease, for any reason, the Linux server 
>>>> expunges
>>>> the state tokens immediately upon detection of the "failure to 
>>>> renew the
>>>> lease" condition and begins returning NFS4ERR_EXPIRED if the client 
>>>> should
>>>> reconnect and attempt to use the (now) expired state.
>>>>
>>>> The default lease period for the Linux server is 90 seconds.  The 
>>>> typical
>>>> client cuts that in half and will issue a lease renewing operation 
>>>> every
>>>> 45 seconds. The 90 second lease period is very short considering the
>>>> potential for moderately long term network partitions.  A network 
>>>> partition
>>>> refers to any loss of network connectivity between the NFS client 
>>>> and the
>>>> NFS server, regardless of its root cause.  This includes NIC 
>>>> failures, NIC
>>>> driver bugs, network misconfigurations & administrative errors, 
>>>> routers &
>>>> switches crashing and/or having software updates applied, even down to
>>>> cables being physically pulled.  In most cases, these network 
>>>> failures are
>>>> transient, although the duration is unknown.
>>>>
>>>> A server which does not immediately expunge the state on lease 
>>>> expiration
>>>> is known as a Courteous Server.  A Courteous Server continues to 
>>>> recognize
>>>> previously generated state tokens as valid until conflict arises 
>>>> between
>>>> the expired state and the requests from another client, or the 
>>>> server reboots.
>>>>
>>>> The initial implementation of the Courteous Server will do the 
>>>> following:
>>>>
>>>> . when the laundromat thread detects an expired client and if that 
>>>> client
>>>> still has established states on the Linux server then marks the 
>>>> client as a
>>>> COURTESY_CLIENT and skips destroying the client and all its states,
>>>> otherwise destroy the client as usual.
>>>>
>>>> . detects conflict of OPEN request with a COURTESY_CLIENT, destroys 
>>>> the
>>>> expired client and all its states, skips the delegation recall then 
>>>> allows
>>>> the conflicting request to succeed.
>>>>
>>>> . detects conflict of LOCK/LOCKT request with a COURTESY_CLIENT, 
>>>> destroys
>>>> the expired client and all its states then allows the conflicting 
>>>> request
>>>> to succeed.
>>>>
>>>> To be done:
>>>>
>>>> . fix a problem with 4.1 where the Linux server keeps returning
>>>> SEQ4_STATUS_CB_PATH_DOWN in the successful SEQUENCE reply, after 
>>>> the expired
>>>> client re-connects, causing the client to keep sending BCTS 
>>>> requests to server.
>>> Hm, any progress working out what's causing that?
>>
>> I will fix this in v2 patch.
>>
>>>
>>>> . handle OPEN conflict with share reservations.
>>> Sounds doable.
>>
>> Yes. But I don't know a way to force the Linux client to specify the
>> deny mode on OPEN.
>
> We could do this with pynfs, perhaps, for testing?

Ah you're right. I think we can, might need your help with this.

Thanks,
-Dai

>
> cheers,
> c.
>
>
>
>  I may have to test this with SMB client: expired
>> NFS client should not prevent SMB client from open the file with
>> deny mode.
>>
>>>
>>>> . instead of destroy the client anf all its state on conflict, only 
>>>> destroy
>>>> the state that is conflicted with the current request.
>>> The other todos I think have to be done before we merge, but this one I
>>> think can wait.
>>>
>>>> . destroy the COURTESY_CLIENT either after a fixed period of time 
>>>> to release
>>>> resources or as reacting to memory pressure.
>>> I think we need something here, but it can be pretty simple.
>>
>> ok.
>>
>>>
>>>> Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
>>>> ---
>>>>   fs/nfsd/nfs4state.c        | 94 
>>>> +++++++++++++++++++++++++++++++++++++++++++---
>>>>   fs/nfsd/state.h            |  1 +
>>>>   include/linux/sunrpc/svc.h |  1 +
>>>>   3 files changed, 91 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
>>>> index b517a8794400..969995872752 100644
>>>> --- a/fs/nfsd/nfs4state.c
>>>> +++ b/fs/nfsd/nfs4state.c
>>>> @@ -171,6 +171,7 @@ renew_client_locked(struct nfs4_client *clp)
>>>>       list_move_tail(&clp->cl_lru, &nn->client_lru);
>>>>       clp->cl_time = ktime_get_boottime_seconds();
>>>> +    clear_bit(NFSD4_CLIENT_COURTESY, &clp->cl_flags);
>>>>   }
>>>>   static void put_client_renew_locked(struct nfs4_client *clp)
>>>> @@ -4610,6 +4611,38 @@ static void nfsd_break_one_deleg(struct 
>>>> nfs4_delegation *dp)
>>>>       nfsd4_run_cb(&dp->dl_recall);
>>>>   }
>>>> +/*
>>>> + * Set rq_conflict_client if the conflict client have expired
>>>> + * and return true, otherwise return false.
>>>> + */
>>>> +static bool
>>>> +nfsd_set_conflict_client(struct nfs4_delegation *dp)
>>>> +{
>>>> +    struct svc_rqst *rqst;
>>>> +    struct nfs4_client *clp;
>>>> +    struct nfsd_net *nn;
>>>> +    bool ret;
>>>> +
>>>> +    if (!i_am_nfsd())
>>>> +        return false;
>>>> +    rqst = kthread_data(current);
>>>> +    if (rqst->rq_prog != NFS_PROGRAM || rqst->rq_vers < 4)
>>>> +        return false;
>>> This says we do nothing if the lock request is not coming from nfsd and
>>> v4.  That doesn't sound right.
>>>
>>> For example, if the conflicting lock is coming from a local process 
>>> (not
>>> from an NFS client at all), we still need to revoke the courtesy state
>>> so that process can make progress.
>>
>> The reason it checks for NFS request is because it uses 
>> rq_conflict_client
>> to pass the expired client back to the upper layer which then does the
>> expire_client. If this is not from a NFS request then kthread_data() 
>> might
>> not be the svc_rqst. In that case the code will try to recall the 
>> delegation
>> from the expired client and the recall will eventually timed out.
>>
>> I will rework this code.
>>
>> Thanks,
>> -Dai
>>
>>>> +    rqst->rq_conflict_client = NULL;
>>>> +    clp = dp->dl_recall.cb_clp;
>>>> +    nn = net_generic(clp->net, nfsd_net_id);
>>>> +    spin_lock(&nn->client_lock);
>>>> +
>>>> +    if (test_bit(NFSD4_CLIENT_COURTESY, &clp->cl_flags) &&
>>>> +                !mark_client_expired_locked(clp)) {
>>>> +        rqst->rq_conflict_client = clp;
>>>> +        ret = true;
>>>> +    } else
>>>> +        ret = false;
>>>> +    spin_unlock(&nn->client_lock);
>>>> +    return ret;
>>>> +}
>>>> +
>>>>   /* Called from break_lease() with i_lock held. */
>>>>   static bool
>>>>   nfsd_break_deleg_cb(struct file_lock *fl)
>>>> @@ -4618,6 +4651,8 @@ nfsd_break_deleg_cb(struct file_lock *fl)
>>>>       struct nfs4_delegation *dp = (struct nfs4_delegation 
>>>> *)fl->fl_owner;
>>>>       struct nfs4_file *fp = dp->dl_stid.sc_file;
>>>> +    if (nfsd_set_conflict_client(dp))
>>>> +        return false;
>>>> trace_nfsd_deleg_break(&dp->dl_stid.sc_stateid);
>>>>       /*
>>>> @@ -5237,6 +5272,22 @@ static void 
>>>> nfsd4_deleg_xgrade_none_ext(struct nfsd4_open *open,
>>>>        */
>>>>   }
>>>> +static bool
>>>> +nfs4_destroy_courtesy_client(struct nfs4_client *clp)
>>>> +{
>>>> +    struct nfsd_net *nn = net_generic(clp->net, nfsd_net_id);
>>>> +
>>>> +    spin_lock(&nn->client_lock);
>>>> +    if (!test_bit(NFSD4_CLIENT_COURTESY, &clp->cl_flags) ||
>>>> +            mark_client_expired_locked(clp)) {
>>>> +        spin_unlock(&nn->client_lock);
>>>> +        return false;
>>>> +    }
>>>> +    spin_unlock(&nn->client_lock);
>>>> +    expire_client(clp);
>>>> +    return true;
>>>> +}
>>>> +
>>>>   __be32
>>>>   nfsd4_process_open2(struct svc_rqst *rqstp, struct svc_fh 
>>>> *current_fh, struct nfsd4_open *open)
>>>>   {
>>>> @@ -5286,7 +5337,13 @@ nfsd4_process_open2(struct svc_rqst *rqstp, 
>>>> struct svc_fh *current_fh, struct nf
>>>>               goto out;
>>>>           }
>>>>       } else {
>>>> +        rqstp->rq_conflict_client = NULL;
>>>>           status = nfs4_get_vfs_file(rqstp, fp, current_fh, stp, 
>>>> open);
>>>> +        if (status == nfserr_jukebox && rqstp->rq_conflict_client) {
>>>> +            if 
>>>> (nfs4_destroy_courtesy_client(rqstp->rq_conflict_client))
>>>> +                status = nfs4_get_vfs_file(rqstp, fp, current_fh, 
>>>> stp, open);
>>>> +        }
>>>> +
>>>>           if (status) {
>>>>               stp->st_stid.sc_type = NFS4_CLOSED_STID;
>>>>               release_open_stateid(stp);
>>>> @@ -5472,6 +5529,8 @@ nfs4_laundromat(struct nfsd_net *nn)
>>>>       };
>>>>       struct nfs4_cpntf_state *cps;
>>>>       copy_stateid_t *cps_t;
>>>> +    struct nfs4_stid *stid;
>>>> +    int id = 0;
>>>>       int i;
>>>>       if (clients_still_reclaiming(nn)) {
>>>> @@ -5495,6 +5554,15 @@ nfs4_laundromat(struct nfsd_net *nn)
>>>>           clp = list_entry(pos, struct nfs4_client, cl_lru);
>>>>           if (!state_expired(&lt, clp->cl_time))
>>>>               break;
>>>> +
>>>> +        spin_lock(&clp->cl_lock);
>>>> +        stid = idr_get_next(&clp->cl_stateids, &id);
>>>> +        spin_unlock(&clp->cl_lock);
>>>> +        if (stid) {
>>>> +            /* client still has states */
>>>> +            set_bit(NFSD4_CLIENT_COURTESY, &clp->cl_flags);
>>>> +            continue;
>>>> +        }
>>>>           if (mark_client_expired_locked(clp)) {
>>>> trace_nfsd_clid_expired(&clp->cl_clientid);
>>>>               continue;
>>>> @@ -6440,7 +6508,7 @@ static const struct lock_manager_operations 
>>>> nfsd_posix_mng_ops  = {
>>>>       .lm_put_owner = nfsd4_fl_put_owner,
>>>>   };
>>>> -static inline void
>>>> +static inline int
>>>>   nfs4_set_lock_denied(struct file_lock *fl, struct 
>>>> nfsd4_lock_denied *deny)
>>>>   {
>>>>       struct nfs4_lockowner *lo;
>>>> @@ -6453,6 +6521,8 @@ nfs4_set_lock_denied(struct file_lock *fl, 
>>>> struct nfsd4_lock_denied *deny)
>>>>               /* We just don't care that much */
>>>>               goto nevermind;
>>>>           deny->ld_clientid = lo->lo_owner.so_client->cl_clientid;
>>>> +        if (nfs4_destroy_courtesy_client(lo->lo_owner.so_client))
>>>> +            return -EAGAIN;
>>>>       } else {
>>>>   nevermind:
>>>>           deny->ld_owner.len = 0;
>>>> @@ -6467,6 +6537,7 @@ nfs4_set_lock_denied(struct file_lock *fl, 
>>>> struct nfsd4_lock_denied *deny)
>>>>       deny->ld_type = NFS4_READ_LT;
>>>>       if (fl->fl_type != F_RDLCK)
>>>>           deny->ld_type = NFS4_WRITE_LT;
>>>> +    return 0;
>>>>   }
>>>>   static struct nfs4_lockowner *
>>>> @@ -6734,6 +6805,7 @@ nfsd4_lock(struct svc_rqst *rqstp, struct 
>>>> nfsd4_compound_state *cstate,
>>>>       unsigned int fl_flags = FL_POSIX;
>>>>       struct net *net = SVC_NET(rqstp);
>>>>       struct nfsd_net *nn = net_generic(net, nfsd_net_id);
>>>> +    bool retried = false;
>>>>       dprintk("NFSD: nfsd4_lock: start=%Ld length=%Ld\n",
>>>>           (long long) lock->lk_offset,
>>>> @@ -6860,7 +6932,7 @@ nfsd4_lock(struct svc_rqst *rqstp, struct 
>>>> nfsd4_compound_state *cstate,
>>>>           list_add_tail(&nbl->nbl_lru, &nn->blocked_locks_lru);
>>>>           spin_unlock(&nn->blocked_locks_lock);
>>>>       }
>>>> -
>>>> +again:
>>>>       err = vfs_lock_file(nf->nf_file, F_SETLK, file_lock, conflock);
>>>>       switch (err) {
>>>>       case 0: /* success! */
>>>> @@ -6875,7 +6947,12 @@ nfsd4_lock(struct svc_rqst *rqstp, struct 
>>>> nfsd4_compound_state *cstate,
>>>>       case -EAGAIN:        /* conflock holds conflicting lock */
>>>>           status = nfserr_denied;
>>>>           dprintk("NFSD: nfsd4_lock: conflicting lock found!\n");
>>>> -        nfs4_set_lock_denied(conflock, &lock->lk_denied);
>>>> +
>>>> +        /* try again if conflict with courtesy client  */
>>>> +        if (nfs4_set_lock_denied(conflock, &lock->lk_denied) == 
>>>> -EAGAIN && !retried) {
>>>> +            retried = true;
>>>> +            goto again;
>>>> +        }
>>>>           break;
>>>>       case -EDEADLK:
>>>>           status = nfserr_deadlock;
>>>> @@ -6962,6 +7039,8 @@ nfsd4_lockt(struct svc_rqst *rqstp, struct 
>>>> nfsd4_compound_state *cstate,
>>>>       struct nfs4_lockowner *lo = NULL;
>>>>       __be32 status;
>>>>       struct nfsd_net *nn = net_generic(SVC_NET(rqstp), nfsd_net_id);
>>>> +    bool retried = false;
>>>> +    int ret;
>>>>       if (locks_in_grace(SVC_NET(rqstp)))
>>>>           return nfserr_grace;
>>>> @@ -7010,14 +7089,19 @@ nfsd4_lockt(struct svc_rqst *rqstp, struct 
>>>> nfsd4_compound_state *cstate,
>>>>       file_lock->fl_end = last_byte_offset(lockt->lt_offset, 
>>>> lockt->lt_length);
>>>>       nfs4_transform_lock_offset(file_lock);
>>>> -
>>>> +again:
>>>>       status = nfsd_test_lock(rqstp, &cstate->current_fh, file_lock);
>>>>       if (status)
>>>>           goto out;
>>>>       if (file_lock->fl_type != F_UNLCK) {
>>>>           status = nfserr_denied;
>>>> -        nfs4_set_lock_denied(file_lock, &lockt->lt_denied);
>>>> +        ret = nfs4_set_lock_denied(file_lock, &lockt->lt_denied);
>>>> +        if (ret == -EAGAIN && !retried) {
>>>> +            retried = true;
>>>> +            fh_clear_wcc(&cstate->current_fh);
>>>> +            goto again;
>>>> +        }
>>>>       }
>>>>   out:
>>>>       if (lo)
>>>> diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
>>>> index e73bdbb1634a..bdc3605e3722 100644
>>>> --- a/fs/nfsd/state.h
>>>> +++ b/fs/nfsd/state.h
>>>> @@ -345,6 +345,7 @@ struct nfs4_client {
>>>>   #define NFSD4_CLIENT_UPCALL_LOCK    (5)    /* upcall 
>>>> serialization */
>>>>   #define NFSD4_CLIENT_CB_FLAG_MASK    (1 << NFSD4_CLIENT_CB_UPDATE 
>>>> | \
>>>>                        1 << NFSD4_CLIENT_CB_KILL)
>>>> +#define NFSD4_CLIENT_COURTESY        (6)    /* be nice to expired 
>>>> client */
>>>>       unsigned long        cl_flags;
>>>>       const struct cred    *cl_cb_cred;
>>>>       struct rpc_clnt        *cl_cb_client;
>>>> diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
>>>> index e91d51ea028b..2f0382f9d0ff 100644
>>>> --- a/include/linux/sunrpc/svc.h
>>>> +++ b/include/linux/sunrpc/svc.h
>>>> @@ -304,6 +304,7 @@ struct svc_rqst {
>>>>                            * net namespace
>>>>                            */
>>>>       void **            rq_lease_breaker; /* The v4 client 
>>>> breaking a lease */
>>>> +    void            *rq_conflict_client;
>>>>   };
>>>>   #define SVC_NET(rqst) (rqst->rq_xprt ? rqst->rq_xprt->xpt_net : 
>>>> rqst->rq_bc_net)
>>>> -- 
>>>> 2.9.5
>

  reply	other threads:[~2021-06-16 19:27 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-03 18:14 [PATCH RFC 1/1] nfsd: Initial implementation of NFSv4 Courteous Server Dai Ngo
2021-06-11  8:42 ` dai.ngo
2021-06-16 16:02 ` J. Bruce Fields
2021-06-16 16:32   ` Chuck Lever III
2021-06-16 19:25     ` dai.ngo
2021-06-16 19:29       ` Chuck Lever III
2021-06-16 20:30         ` Bruce Fields
2021-06-16 19:17   ` dai.ngo
2021-06-16 19:19     ` Calum Mackay
2021-06-16 19:27       ` dai.ngo [this message]
2021-06-24 14:02 ` J. Bruce Fields
2021-06-24 19:50   ` dai.ngo
2021-06-24 20:36     ` J. Bruce Fields
2021-06-28 20:23 ` J. Bruce Fields
2021-06-28 23:39   ` dai.ngo
2021-06-29  4:40     ` dai.ngo
2021-06-30  1:35       ` J. Bruce Fields
2021-06-30  8:41         ` dai.ngo
2021-06-30 14:52           ` J. Bruce Fields
2021-06-30 17:51     ` dai.ngo
2021-06-30 18:05       ` J. Bruce Fields
2021-06-30 18:49         ` dai.ngo
2021-06-30 18:55           ` J. Bruce Fields
2021-06-30 19:13             ` dai.ngo
2021-06-30 19:24               ` J. Bruce Fields
2021-06-30 23:48                 ` dai.ngo
2021-07-01  1:16                   ` J. Bruce Fields
2021-06-30 15:13   ` J. Bruce Fields

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5db5327e-67ce-557d-e734-4cd81d1ad8e8@oracle.com \
    --to=dai.ngo@oracle.com \
    --cc=bfields@fieldses.org \
    --cc=calum.mackay@oracle.com \
    --cc=chuck.lever@oracle.com \
    --cc=linux-nfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).