* [PATCH RFC v5 0/2] nfsd: Initial implementation of NFSv4 Courteous Server
@ 2021-09-29  0:56 Dai Ngo
  2021-09-29  0:56 ` [PATCH RFC v5 1/2] fs/lock: add new callback, lm_expire_lock, to lock_manager_operations Dai Ngo
                   ` (2 more replies)
  0 siblings, 3 replies; 53+ messages in thread
From: Dai Ngo @ 2021-09-29  0:56 UTC (permalink / raw)
To: bfields; +Cc: chuck.lever, linux-nfs, linux-fsdevel

Hi Bruce,

This series of patches implements the NFSv4 Courteous Server.

A server which does not immediately expunge state on lease expiration
is known as a Courteous Server. A Courteous Server continues to
recognize previously generated state tokens as valid until a conflict
arises between the expired state and a request from another client, or
the server reboots.

The v2 patch includes the following:

. add a new callback, lm_expire_lock, to lock_manager_operations to
  allow the lock manager to take appropriate action on a conflicting lock.

. handle conflicts of NFSv4 locks with NFSv3/NLM and local locks.

. expire a courtesy client after 24 hours if the client has not
  reconnected.

. do not allow an expired client to become a courtesy client if there
  are waiters for the client's locks.

. modify client_info_show to show the courtesy client and the seconds
  since the last renew.

. fix a problem with the NFSv4.1 server where it keeps returning
  SEQ4_STATUS_CB_PATH_DOWN in the successful SEQUENCE reply after the
  courtesy client re-connects, causing the client to keep sending BCTS
  requests to the server.

The v3 patch includes the following:

. modified posix_test_lock to check and resolve conflicting locks, to
  handle NLM TEST and NFSv4 LOCKT requests.

. separate out the fix for the back channel stuck in
  SEQ4_STATUS_CB_PATH_DOWN.

The v4 patch includes:

. rework nfsd_check_courtesy to avoid a deadlock between fl_lock and
  client_lock by asking the laundromat thread to destroy the courtesy
  client.

. handle NFSv4 share reservation conflicts with the courtesy client.
  This includes conflicts between access mode and deny mode and vice
  versa.

. drop the patch for the back channel stuck in SEQ4_STATUS_CB_PATH_DOWN.

The v5 patch includes:

. fix recursive locking of file_rwsem from posix_lock_file.

. retest with LOCKDEP enabled.

NOTE: I will submit pynfs tests for the courteous server, including
tests for share reservation conflicts, in a separate patch.

^ permalink raw reply	[flat|nested] 53+ messages in thread
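The laundromat policy described in the cover letter (make an expired client with remaining state and no lock waiters a courtesy client; destroy a courtesy client after 24 hours away) can be sketched as a small userspace model. All names here (`struct client`, `laundromat_check`, the verdict enum) are illustrative, not nfsd's own:

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

#define COURTESY_CLIENT_EXPIRY	(24 * 60 * 60)	/* seconds, as in the patch */

enum verdict { KEEP, MAKE_COURTESY, DESTROY };

struct client {
	time_t last_renew;
	time_t courtesy_expiry;
	bool is_courtesy;
	bool has_state;		/* client still holds opens/locks/delegations */
	bool has_lock_waiters;	/* someone is blocked on this client's locks */
};

/* One laundromat pass over a single client: decide its fate. */
static enum verdict laundromat_check(struct client *c, time_t now, time_t lease)
{
	if (c->is_courtesy)	/* already courteous: only the 24h timer matters */
		return now >= c->courtesy_expiry ? DESTROY : KEEP;
	if (now - c->last_renew < lease)
		return KEEP;	/* lease still valid */
	if (c->has_state && !c->has_lock_waiters) {
		/* promote to courtesy client instead of expunging state */
		c->courtesy_expiry = now + COURTESY_CLIENT_EXPIRY;
		c->is_courtesy = true;
		return MAKE_COURTESY;
	}
	return DESTROY;		/* expire the client as before */
}
```

This is only the decision logic; the real laundromat also walks `client_lru` under `client_lock` and defers the actual `expire_client()` calls to a reap list.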
* [PATCH RFC v5 1/2] fs/lock: add new callback, lm_expire_lock, to lock_manager_operations
  2021-09-29  0:56 [PATCH RFC v5 0/2] nfsd: Initial implementation of NFSv4 Courteous Server Dai Ngo
@ 2021-09-29  0:56 ` Dai Ngo
  2021-09-29  0:56 ` [PATCH RFC v5 2/2] nfsd: Initial implementation of NFSv4 Courteous Server Dai Ngo
  2021-10-01 20:53 ` [PATCH RFC v5 0/2] " J. Bruce Fields
  2 siblings, 0 replies; 53+ messages in thread
From: Dai Ngo @ 2021-09-29  0:56 UTC (permalink / raw)
To: bfields; +Cc: chuck.lever, linux-nfs, linux-fsdevel

Add a new callback, lm_expire_lock, to lock_manager_operations to allow
the lock manager to take appropriate action to resolve a lock conflict
if possible. The callback takes two arguments, the file_lock of the
blocker and a testonly flag:

testonly = 1: check and return true if the lock conflict can be
              resolved, else return false.
testonly = 0: resolve the conflict if possible; return true if the
              conflict was resolved, else return false.

A lock manager, such as the NFSv4 courteous server, uses this callback
to resolve the conflict by destroying the lock owner, or the NFSv4
courtesy client (a client that has expired but is allowed to maintain
its state) that owns the lock.
Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
---
 fs/locks.c         | 28 +++++++++++++++++++++++++---
 include/linux/fs.h |  1 +
 2 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index 3d6fb4ae847b..0fef0a6322c7 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -954,6 +954,7 @@ posix_test_lock(struct file *filp, struct file_lock *fl)
 	struct file_lock *cfl;
 	struct file_lock_context *ctx;
 	struct inode *inode = locks_inode(filp);
+	bool ret;

 	ctx = smp_load_acquire(&inode->i_flctx);
 	if (!ctx || list_empty_careful(&ctx->flc_posix)) {
@@ -962,11 +963,20 @@ posix_test_lock(struct file *filp, struct file_lock *fl)
 	}

 	spin_lock(&ctx->flc_lock);
+retry:
 	list_for_each_entry(cfl, &ctx->flc_posix, fl_list) {
-		if (posix_locks_conflict(fl, cfl)) {
-			locks_copy_conflock(fl, cfl);
-			goto out;
+		if (!posix_locks_conflict(fl, cfl))
+			continue;
+		if (cfl->fl_lmops && cfl->fl_lmops->lm_expire_lock &&
+				cfl->fl_lmops->lm_expire_lock(cfl, 1)) {
+			spin_unlock(&ctx->flc_lock);
+			ret = cfl->fl_lmops->lm_expire_lock(cfl, 0);
+			spin_lock(&ctx->flc_lock);
+			if (ret)
+				goto retry;
 		}
+		locks_copy_conflock(fl, cfl);
+		goto out;
 	}
 	fl->fl_type = F_UNLCK;
 out:
@@ -1140,6 +1150,7 @@ static int posix_lock_inode(struct inode *inode, struct file_lock *request,
 	int error;
 	bool added = false;
 	LIST_HEAD(dispose);
+	bool ret;

 	ctx = locks_get_lock_context(inode, request->fl_type);
 	if (!ctx)
@@ -1166,9 +1177,20 @@ static int posix_lock_inode(struct inode *inode, struct file_lock *request,
 	 * blocker's list of waiters and the global blocked_hash.
 	 */
 	if (request->fl_type != F_UNLCK) {
+retry:
 		list_for_each_entry(fl, &ctx->flc_posix, fl_list) {
 			if (!posix_locks_conflict(request, fl))
 				continue;
+			if (fl->fl_lmops && fl->fl_lmops->lm_expire_lock &&
+					fl->fl_lmops->lm_expire_lock(fl, 1)) {
+				spin_unlock(&ctx->flc_lock);
+				percpu_up_read(&file_rwsem);
+				ret = fl->fl_lmops->lm_expire_lock(fl, 0);
+				percpu_down_read(&file_rwsem);
+				spin_lock(&ctx->flc_lock);
+				if (ret)
+					goto retry;
+			}
 			if (conflock)
 				locks_copy_conflock(conflock, fl);
 			error = -EAGAIN;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index e7a633353fd2..1a76b6451398 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1071,6 +1071,7 @@ struct lock_manager_operations {
 	int (*lm_change)(struct file_lock *, int, struct list_head *);
 	void (*lm_setup)(struct file_lock *, void **);
 	bool (*lm_breaker_owns_lease)(struct file_lock *);
+	bool (*lm_expire_lock)(struct file_lock *fl, bool testonly);
 };

 struct lock_manager {
--
2.9.5
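The two-phase testonly/resolve protocol this patch adds to `posix_test_lock` and `posix_lock_inode` can be modeled in plain userspace C. The `toy_*` names below are invented for illustration; they mimic only the retry loop around `lm_expire_lock`, not the kernel's real locking or `struct file_lock`:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_lock {
	int owner_id;
	bool owner_is_courtesy;	/* owner expired, but its state is kept */
	bool valid;		/* lock still present (i.e. still conflicting) */
	/* mirrors the new lm_expire_lock(file_lock, testonly) callback */
	bool (*lm_expire_lock)(struct toy_lock *lock, bool testonly);
};

static bool toy_expire_lock(struct toy_lock *lock, bool testonly)
{
	if (testonly)			/* phase 1: can the conflict go away? */
		return lock->owner_is_courtesy;
	if (!lock->owner_is_courtesy)	/* phase 2: actually destroy state */
		return false;
	lock->valid = false;		/* "expire_client" drops the lock */
	return true;
}

/* Returns true if a conflicting lock remains after trying to expire it. */
static bool toy_test_lock(struct toy_lock *blocker)
{
retry:
	if (!blocker->valid)
		return false;		/* conflict resolved */
	if (blocker->lm_expire_lock &&
	    blocker->lm_expire_lock(blocker, true)) {
		/* the kernel drops flc_lock (and file_rwsem) around this call */
		if (blocker->lm_expire_lock(blocker, false))
			goto retry;
	}
	return true;			/* conflict stands */
}
```

The two-call shape matters: the cheap testonly probe runs under the spinlock, while the destructive resolve call runs with the locks dropped, which is exactly why the kernel code needs the `retry` label afterward.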
* [PATCH RFC v5 2/2] nfsd: Initial implementation of NFSv4 Courteous Server
  2021-09-29  0:56 [PATCH RFC v5 0/2] nfsd: Initial implementation of NFSv4 Courteous Server Dai Ngo
  2021-09-29  0:56 ` [PATCH RFC v5 1/2] fs/lock: add new callback, lm_expire_lock, to lock_manager_operations Dai Ngo
@ 2021-09-29  0:56 ` Dai Ngo
  2021-10-01 20:53 ` [PATCH RFC v5 0/2] " J. Bruce Fields
  2 siblings, 0 replies; 53+ messages in thread
From: Dai Ngo @ 2021-09-29  0:56 UTC (permalink / raw)
To: bfields; +Cc: chuck.lever, linux-nfs, linux-fsdevel

Currently an NFSv4 client must maintain its lease by using at least one
of the state tokens or, if nothing else, by issuing a RENEW (4.0) or a
singleton SEQUENCE (4.1) at least once during each lease period. If the
client fails to renew the lease, for any reason, the Linux server
expunges the state tokens immediately upon detection of the "failure to
renew the lease" condition and begins returning NFS4ERR_EXPIRED if the
client should reconnect and attempt to use the (now) expired state.

The default lease period for the Linux server is 90 seconds. The
typical client cuts that in half and will issue a lease-renewing
operation every 45 seconds. The 90-second lease period is very short
considering the potential for moderately long-term network partitions.

A network partition refers to any loss of network connectivity between
the NFS client and the NFS server, regardless of its root cause. This
includes NIC failures, NIC driver bugs, network misconfigurations and
administrative errors, routers and switches crashing and/or having
software updates applied, even down to cables being physically pulled.
In most cases, these network failures are transient, although the
duration is unknown.

A server which does not immediately expunge the state on lease
expiration is known as a Courteous Server.
A Courteous Server continues to recognize previously generated state
tokens as valid until a conflict arises between the expired state and a
request from another client, or the server reboots.

The initial implementation of the Courteous Server does the following:

. when the laundromat thread detects an expired client, and that client
  still has established state on the Linux server and there are no
  waiters for the client's locks, mark the client as a COURTESY_CLIENT
  and skip destroying the client and all its state; otherwise destroy
  the client as usual.

. detect a conflict of an OPEN request with a COURTESY_CLIENT, destroy
  the expired client and all its state, skip the delegation recall,
  then allow the conflicting request to succeed.

. detect conflicts of LOCK/LOCKT, NLM LOCK and TEST, and local lock
  requests with a COURTESY_CLIENT, destroy the expired client and all
  its state, then allow the conflicting request to succeed.

Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
---
 fs/nfsd/nfs4state.c | 269 +++++++++++++++++++++++++++++++++++++++++++++++++++-
 fs/nfsd/state.h     |   3 +
 2 files changed, 269 insertions(+), 3 deletions(-)

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 3f4027a5de88..f6d835d6e99b 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -125,6 +125,11 @@ static void free_session(struct nfsd4_session *);
 static const struct nfsd4_callback_ops nfsd4_cb_recall_ops;
 static const struct nfsd4_callback_ops nfsd4_cb_notify_lock_ops;

+static struct workqueue_struct *laundry_wq;
+static void laundromat_main(struct work_struct *);
+
+static int courtesy_client_expiry = (24 * 60 * 60);	/* in secs */
+
 static bool is_session_dead(struct nfsd4_session *ses)
 {
 	return ses->se_flags & NFS4_SESSION_DEAD;
@@ -172,6 +177,7 @@ renew_client_locked(struct nfs4_client *clp)

 	list_move_tail(&clp->cl_lru, &nn->client_lru);
 	clp->cl_time = ktime_get_boottime_seconds();
+	clear_bit(NFSD4_COURTESY_CLIENT, &clp->cl_flags);
 }

 static void
 put_client_renew_locked(struct nfs4_client *clp)
@@ -2389,6 +2395,10 @@ static int client_info_show(struct seq_file *m, void *v)
 		seq_puts(m, "status: confirmed\n");
 	else
 		seq_puts(m, "status: unconfirmed\n");
+	seq_printf(m, "courtesy client: %s\n",
+		test_bit(NFSD4_COURTESY_CLIENT, &clp->cl_flags) ? "yes" : "no");
+	seq_printf(m, "seconds from last renew: %lld\n",
+		ktime_get_boottime_seconds() - clp->cl_time);
 	seq_printf(m, "name: ");
 	seq_quote_mem(m, clp->cl_name.data, clp->cl_name.len);
 	seq_printf(m, "\nminor version: %d\n", clp->cl_minorversion);
@@ -4662,6 +4672,33 @@ static void nfsd_break_one_deleg(struct nfs4_delegation *dp)
 	nfsd4_run_cb(&dp->dl_recall);
 }

+/*
+ * This function is called when a file is opened and there is a
+ * delegation conflict with another client. If the other client
+ * is a courtesy client then kick start the laundromat to destroy
+ * it.
+ */
+static bool
+nfsd_check_courtesy_client(struct nfs4_delegation *dp)
+{
+	struct svc_rqst *rqst;
+	struct nfs4_client *clp = dp->dl_recall.cb_clp;
+	struct nfsd_net *nn = net_generic(clp->net, nfsd_net_id);
+
+	if (!i_am_nfsd())
+		goto out;
+	rqst = kthread_data(current);
+	if (rqst->rq_prog != NFS_PROGRAM || rqst->rq_vers < 4)
+		return false;
+out:
+	if (test_bit(NFSD4_COURTESY_CLIENT, &clp->cl_flags)) {
+		set_bit(NFSD4_DESTROY_COURTESY_CLIENT, &clp->cl_flags);
+		mod_delayed_work(laundry_wq, &nn->laundromat_work, 0);
+		return true;
+	}
+	return false;
+}
+
 /* Called from break_lease() with i_lock held. */
 static bool
 nfsd_break_deleg_cb(struct file_lock *fl)
@@ -4670,6 +4707,8 @@ nfsd_break_deleg_cb(struct file_lock *fl)
 	struct nfs4_delegation *dp = (struct nfs4_delegation *)fl->fl_owner;
 	struct nfs4_file *fp = dp->dl_stid.sc_file;

+	if (nfsd_check_courtesy_client(dp))
+		return false;
 	trace_nfsd_cb_recall(&dp->dl_stid);

 	/*
@@ -4912,6 +4951,104 @@ nfsd4_truncate(struct svc_rqst *rqstp, struct svc_fh *fh,
 	return nfsd_setattr(rqstp, fh, &iattr, 0, (time64_t)0);
 }

+static bool
+__nfs4_check_deny_bmap(struct nfs4_file *fp, struct nfs4_ol_stateid *stp,
+			u32 access, bool share_access)
+{
+	if (share_access) {
+		if (!stp->st_deny_bmap)
+			return false;
+
+		if ((stp->st_deny_bmap & (1 << NFS4_SHARE_DENY_BOTH)) ||
+			(access & NFS4_SHARE_ACCESS_READ &&
+				stp->st_deny_bmap & (1 << NFS4_SHARE_DENY_READ)) ||
+			(access & NFS4_SHARE_ACCESS_WRITE &&
+				stp->st_deny_bmap & (1 << NFS4_SHARE_DENY_WRITE))) {
+			return true;
+		}
+		return false;
+	}
+	if ((access & NFS4_SHARE_DENY_BOTH) ||
+		(access & NFS4_SHARE_DENY_READ &&
+			stp->st_access_bmap & (1 << NFS4_SHARE_ACCESS_READ)) ||
+		(access & NFS4_SHARE_DENY_WRITE &&
+			stp->st_access_bmap & (1 << NFS4_SHARE_ACCESS_WRITE))) {
+		return true;
+	}
+	return false;
+}
+
+/*
+ * access: access mode if share_access is true else is deny mode
+ */
+static bool
+nfs4_check_deny_bmap(struct nfs4_client *clp, struct nfs4_file *fp,
+			u32 access, bool share_access)
+{
+	int i;
+	struct nfs4_openowner *oo;
+	struct nfs4_stateowner *so, *tmp;
+	struct nfs4_ol_stateid *stp, *stmp;
+
+	spin_lock(&clp->cl_lock);
+	for (i = 0; i < OWNER_HASH_SIZE; i++) {
+		list_for_each_entry_safe(so, tmp, &clp->cl_ownerstr_hashtbl[i],
+					so_strhash) {
+			if (!so->so_is_open_owner)
+				continue;
+			oo = openowner(so);
+			list_for_each_entry_safe(stp, stmp,
+					&oo->oo_owner.so_stateids, st_perstateowner) {
+				if (__nfs4_check_deny_bmap(fp, stp, access,
+							share_access)) {
+					spin_unlock(&clp->cl_lock);
+					return true;
+				}
+			}
+		}
+	}
+	spin_unlock(&clp->cl_lock);
+	return false;
+}
+
+static int
+nfs4_destroy_clnts_with_sresv_conflict(struct svc_rqst *rqstp,
+		struct nfs4_client *clp, struct nfs4_file *fp,
+		u32 access, bool share_access)
+{
+	int cnt = 0;
+	struct nfs4_client *cl;
+	struct list_head *pos, *next, reaplist;
+	struct nfsd_net *nn = net_generic(SVC_NET(rqstp), nfsd_net_id);
+
+	INIT_LIST_HEAD(&reaplist);
+	spin_lock(&nn->client_lock);
+	list_for_each_safe(pos, next, &nn->client_lru) {
+		cl = list_entry(pos, struct nfs4_client, cl_lru);
+		if (cl == clp ||
+			(!test_bit(NFSD4_COURTESY_CLIENT, &cl->cl_flags) &&
+			!test_bit(NFSD4_DESTROY_COURTESY_CLIENT, &cl->cl_flags)))
+			continue;
+		/*
+		 * check CS client for deny_bmap that
+		 * matches nfs4_file and access mode
+		 */
+		if (nfs4_check_deny_bmap(cl, fp, access, share_access)) {
+			if (mark_client_expired_locked(cl))
+				continue;
+			list_add(&cl->cl_lru, &reaplist);
+		}
+	}
+	spin_unlock(&nn->client_lock);
+	list_for_each_safe(pos, next, &reaplist) {
+		cl = list_entry(pos, struct nfs4_client, cl_lru);
+		list_del_init(&cl->cl_lru);
+		expire_client(cl);
+		cnt++;
+	}
+	return cnt;
+}
+
 static __be32 nfs4_get_vfs_file(struct svc_rqst *rqstp, struct nfs4_file *fp,
 		struct svc_fh *cur_fh, struct nfs4_ol_stateid *stp,
 		struct nfsd4_open *open)
@@ -4921,6 +5058,7 @@ static __be32 nfs4_get_vfs_file(struct svc_rqst *rqstp, struct nfs4_file *fp,
 	int oflag = nfs4_access_to_omode(open->op_share_access);
 	int access = nfs4_access_to_access(open->op_share_access);
 	unsigned char old_access_bmap, old_deny_bmap;
+	int cnt = 0;

 	spin_lock(&fp->fi_lock);

@@ -4928,16 +5066,46 @@ static __be32 nfs4_get_vfs_file(struct svc_rqst *rqstp, struct nfs4_file *fp,
 	 * Are we trying to set a deny mode that would conflict with
 	 * current access?
 	 */
+chk_deny:
 	status = nfs4_file_check_deny(fp, open->op_share_deny);
 	if (status != nfs_ok) {
 		spin_unlock(&fp->fi_lock);
+		if (status != nfserr_share_denied)
+			goto out;
+		/*
+		 * search and destroy all courtesy clients that own
+		 * openstate of nfs4_file, fp, that has st_access_bmap
+		 * that matches open->op_share_deny
+		 */
+		cnt = nfs4_destroy_clnts_with_sresv_conflict(rqstp,
+			stp->st_stateowner->so_client, fp,
+			open->op_share_deny, false);
+		if (cnt) {
+			spin_lock(&fp->fi_lock);
+			goto chk_deny;
+		}
 		goto out;
 	}

 	/* set access to the file */
+get_access:
 	status = nfs4_file_get_access(fp, open->op_share_access);
 	if (status != nfs_ok) {
 		spin_unlock(&fp->fi_lock);
+		if (status != nfserr_share_denied)
+			goto out;
+		/*
+		 * search and destroy all courtesy clients that own
+		 * openstate of nfs4_file, fp, that has st_deny_bmap
+		 * that matches open->op_share_access.
+		 */
+		cnt = nfs4_destroy_clnts_with_sresv_conflict(rqstp,
+			stp->st_stateowner->so_client, fp,
+			open->op_share_access, true);
+		if (cnt) {
+			spin_lock(&fp->fi_lock);
+			goto get_access;
+		}
 		goto out;
 	}

@@ -5289,6 +5457,22 @@ static void nfsd4_deleg_xgrade_none_ext(struct nfsd4_open *open,
 	 */
 }

+static bool
+nfs4_destroy_courtesy_client(struct nfs4_client *clp)
+{
+	struct nfsd_net *nn = net_generic(clp->net, nfsd_net_id);
+
+	spin_lock(&nn->client_lock);
+	if (!test_bit(NFSD4_COURTESY_CLIENT, &clp->cl_flags) ||
+			mark_client_expired_locked(clp)) {
+		spin_unlock(&nn->client_lock);
+		return false;
+	}
+	spin_unlock(&nn->client_lock);
+	expire_client(clp);
+	return true;
+}
+
 __be32
 nfsd4_process_open2(struct svc_rqst *rqstp, struct svc_fh *current_fh, struct nfsd4_open *open)
 {
@@ -5572,6 +5756,47 @@ static void nfsd4_ssc_expire_umount(struct nfsd_net *nn)
 }
 #endif

+static
+bool nfs4_anylock_conflict(struct nfs4_client *clp)
+{
+	int i;
+	struct nfs4_stateowner *so, *tmp;
+	struct nfs4_lockowner *lo;
+	struct nfs4_ol_stateid *stp;
+	struct nfs4_file *nf;
+	struct inode *ino;
+	struct file_lock_context *ctx;
+	struct file_lock *fl;
+
+	for (i = 0; i < OWNER_HASH_SIZE; i++) {
+		/* scan each lock owner */
+		list_for_each_entry_safe(so, tmp, &clp->cl_ownerstr_hashtbl[i],
+				so_strhash) {
+			if (so->so_is_open_owner)
+				continue;
+
+			/* scan lock states of this lock owner */
+			lo = lockowner(so);
+			list_for_each_entry(stp, &lo->lo_owner.so_stateids,
+					st_perstateowner) {
+				nf = stp->st_stid.sc_file;
+				ino = nf->fi_inode;
+				ctx = ino->i_flctx;
+				if (!ctx)
+					continue;
+				/* check each lock belongs to this lock state */
+				list_for_each_entry(fl, &ctx->flc_posix, fl_list) {
+					if (fl->fl_owner != lo)
+						continue;
+					if (!list_empty(&fl->fl_blocked_requests))
+						return true;
+				}
+			}
+		}
+	}
+	return false;
+}
+
 static time64_t
 nfs4_laundromat(struct nfsd_net *nn)
 {
@@ -5587,7 +5812,9 @@ nfs4_laundromat(struct nfsd_net *nn)
 	};
 	struct nfs4_cpntf_state *cps;
 	copy_stateid_t *cps_t;
+	struct nfs4_stid *stid;
 	int i;
+	int id = 0;

 	if (clients_still_reclaiming(nn)) {
 		lt.new_timeo = 0;
@@ -5608,8 +5835,33 @@ nfs4_laundromat(struct nfsd_net *nn)
 	spin_lock(&nn->client_lock);
 	list_for_each_safe(pos, next, &nn->client_lru) {
 		clp = list_entry(pos, struct nfs4_client, cl_lru);
+		if (test_bit(NFSD4_DESTROY_COURTESY_CLIENT, &clp->cl_flags)) {
+			clear_bit(NFSD4_COURTESY_CLIENT, &clp->cl_flags);
+			goto exp_client;
+		}
+		if (test_bit(NFSD4_COURTESY_CLIENT, &clp->cl_flags)) {
+			if (ktime_get_boottime_seconds() >= clp->courtesy_client_expiry)
+				goto exp_client;
+			/*
+			 * after umount, v4.0 client is still
+			 * around waiting to be expired
+			 */
+			if (clp->cl_minorversion)
+				continue;
+		}
 		if (!state_expired(&lt, clp->cl_time))
 			break;
+		spin_lock(&clp->cl_lock);
+		stid = idr_get_next(&clp->cl_stateids, &id);
+		spin_unlock(&clp->cl_lock);
+		if (stid && !nfs4_anylock_conflict(clp)) {
+			/* client still has states */
+			clp->courtesy_client_expiry =
+				ktime_get_boottime_seconds() + courtesy_client_expiry;
+			set_bit(NFSD4_COURTESY_CLIENT, &clp->cl_flags);
+			continue;
+		}
+exp_client:
 		if (mark_client_expired_locked(clp))
 			continue;
 		list_add(&clp->cl_lru, &reaplist);
@@ -5689,9 +5941,6 @@ nfs4_laundromat(struct nfsd_net *nn)
 	return max_t(time64_t, lt.new_timeo, NFSD_LAUNDROMAT_MINTIMEOUT);
 }

-static struct workqueue_struct *laundry_wq;
-static void laundromat_main(struct work_struct *);
-
 static void
 laundromat_main(struct work_struct *laundry)
 {
@@ -6496,6 +6745,19 @@ nfs4_transform_lock_offset(struct file_lock *lock)
 		lock->fl_end = OFFSET_MAX;
 }

+/* return true if lock was expired else return false */
+static bool
+nfsd4_fl_expire_lock(struct file_lock *fl, bool testonly)
+{
+	struct nfs4_lockowner *lo = (struct nfs4_lockowner *)fl->fl_owner;
+	struct nfs4_client *clp = lo->lo_owner.so_client;
+
+	if (testonly)
+		return test_bit(NFSD4_COURTESY_CLIENT, &clp->cl_flags) ?
+			true : false;
+	return nfs4_destroy_courtesy_client(clp);
+}
+
 static fl_owner_t
 nfsd4_fl_get_owner(fl_owner_t owner)
 {
@@ -6543,6 +6805,7 @@ static const struct lock_manager_operations nfsd_posix_mng_ops = {
 	.lm_notify = nfsd4_lm_notify,
 	.lm_get_owner = nfsd4_fl_get_owner,
 	.lm_put_owner = nfsd4_fl_put_owner,
+	.lm_expire_lock = nfsd4_fl_expire_lock,
 };

 static inline void
diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
index e73bdbb1634a..93e30b101578 100644
--- a/fs/nfsd/state.h
+++ b/fs/nfsd/state.h
@@ -345,6 +345,8 @@ struct nfs4_client {
 #define NFSD4_CLIENT_UPCALL_LOCK	(5)	/* upcall serialization */
 #define NFSD4_CLIENT_CB_FLAG_MASK	(1 << NFSD4_CLIENT_CB_UPDATE | \
					 1 << NFSD4_CLIENT_CB_KILL)
+#define NFSD4_COURTESY_CLIENT		(6)	/* be nice to expired client */
+#define NFSD4_DESTROY_COURTESY_CLIENT	(7)
 	unsigned long	cl_flags;
 	const struct cred	*cl_cb_cred;
 	struct rpc_clnt		*cl_cb_client;
@@ -385,6 +387,7 @@ struct nfs4_client {
 	struct list_head	async_copies;	/* list of async copies */
 	spinlock_t		async_lock;	/* lock for async copies */
 	atomic_t		cl_cb_inflight;	/* Outstanding callbacks */
+	int			courtesy_client_expiry;
 };

 /* struct nfs4_client_reset
--
2.9.5
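The share-reservation check the patch adds (`__nfs4_check_deny_bmap`) reduces to bitmap tests. A standalone userspace sketch of the access-vs-deny direction, using the share constants from RFC 7530 and the same `1 << mode` bitmap encoding as `st_deny_bmap` (the function name `access_conflicts_deny` is illustrative, not nfsd's):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Share constants per RFC 7530 */
#define NFS4_SHARE_ACCESS_READ	0x0001
#define NFS4_SHARE_ACCESS_WRITE	0x0002
#define NFS4_SHARE_DENY_READ	0x0001
#define NFS4_SHARE_DENY_WRITE	0x0002
#define NFS4_SHARE_DENY_BOTH	0x0003

/*
 * True if a requested OPEN access mode collides with an existing
 * deny bitmap. deny_bmap holds bits of the form (1 << deny_mode),
 * mirroring st_deny_bmap in nfsd's open stateid.
 */
static bool access_conflicts_deny(uint32_t access, uint32_t deny_bmap)
{
	if (!deny_bmap)
		return false;
	if (deny_bmap & (1 << NFS4_SHARE_DENY_BOTH))
		return true;		/* everything is denied */
	if ((access & NFS4_SHARE_ACCESS_READ) &&
	    (deny_bmap & (1 << NFS4_SHARE_DENY_READ)))
		return true;
	if ((access & NFS4_SHARE_ACCESS_WRITE) &&
	    (deny_bmap & (1 << NFS4_SHARE_DENY_WRITE)))
		return true;
	return false;
}
```

The patch runs the mirror-image check as well (requested deny mode against existing access bitmaps), which is why `__nfs4_check_deny_bmap` takes the `share_access` direction flag.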
* Re: [PATCH RFC v5 0/2] nfsd: Initial implementation of NFSv4 Courteous Server
  2021-09-29  0:56 [PATCH RFC v5 0/2] nfsd: Initial implementation of NFSv4 Courteous Server Dai Ngo
  2021-09-29  0:56 ` [PATCH RFC v5 1/2] fs/lock: add new callback, lm_expire_lock, to lock_manager_operations Dai Ngo
  2021-09-29  0:56 ` [PATCH RFC v5 2/2] nfsd: Initial implementation of NFSv4 Courteous Server Dai Ngo
@ 2021-10-01 20:53 ` J. Bruce Fields
  2021-10-01 21:41   ` dai.ngo
  2 siblings, 1 reply; 53+ messages in thread
From: J. Bruce Fields @ 2021-10-01 20:53 UTC (permalink / raw)
To: Dai Ngo; +Cc: chuck.lever, linux-nfs, linux-fsdevel

On Tue, Sep 28, 2021 at 08:56:39PM -0400, Dai Ngo wrote:
>
> Hi Bruce,
>
> This series of patches implement the NFSv4 Courteous Server.

Apologies, I keep meaning to get back to this and haven't yet.

I do notice I'm seeing a timeout on pynfs 4.0 test OPEN18.

--b.

[ remainder of quoted cover letter trimmed ]
* Re: [PATCH RFC v5 0/2] nfsd: Initial implementation of NFSv4 Courteous Server
  2021-10-01 20:53 ` [PATCH RFC v5 0/2] " J. Bruce Fields
@ 2021-10-01 21:41   ` dai.ngo
  2021-10-01 23:03     ` J. Bruce Fields
  2021-11-16 23:06     ` dai.ngo
  0 siblings, 2 replies; 53+ messages in thread
From: dai.ngo @ 2021-10-01 21:41 UTC (permalink / raw)
To: J. Bruce Fields; +Cc: chuck.lever, linux-nfs, linux-fsdevel

On 10/1/21 1:53 PM, J. Bruce Fields wrote:
> On Tue, Sep 28, 2021 at 08:56:39PM -0400, Dai Ngo wrote:
>> Hi Bruce,
>>
>> This series of patches implement the NFSv4 Courteous Server.
> Apologies, I keep meaning to get back to this and haven't yet.
>
> I do notice I'm seeing a timeout on pynfs 4.0 test OPEN18.

It's weird, this test passes on my system:

[root@nfsvmf25 nfs4.0]# ./testserver.py $server --rundeps -v OPEN18
INIT     st_setclientid.testValid    : RUNNING
INIT     st_setclientid.testValid    : PASS
MKFILE   st_open.testOpen            : RUNNING
MKFILE   st_open.testOpen            : PASS
OPEN18   st_open.testShareConflict1  : RUNNING
OPEN18   st_open.testShareConflict1  : PASS
**************************************************
INIT     st_setclientid.testValid    : PASS
OPEN18   st_open.testShareConflict1  : PASS
MKFILE   st_open.testOpen            : PASS
**************************************************
Command line asked for 3 of 673 tests
Of those: 0 Skipped, 0 Failed, 0 Warned, 3 Passed
[root@nfsvmf25 nfs4.0]#

Do you have a network trace?

-Dai

> --b.
>
[ remainder of quoted cover letter trimmed ]
* Re: [PATCH RFC v5 0/2] nfsd: Initial implementation of NFSv4 Courteous Server
  2021-10-01 21:41   ` dai.ngo
@ 2021-10-01 23:03     ` J. Bruce Fields
  0 siblings, 0 replies; 53+ messages in thread
From: J. Bruce Fields @ 2021-10-01 23:03 UTC (permalink / raw)
To: dai.ngo; +Cc: chuck.lever, linux-nfs, linux-fsdevel

On Fri, Oct 01, 2021 at 02:41:55PM -0700, dai.ngo@oracle.com wrote:
> On 10/1/21 1:53 PM, J. Bruce Fields wrote:
> >On Tue, Sep 28, 2021 at 08:56:39PM -0400, Dai Ngo wrote:
> >>Hi Bruce,
> >>
> >>This series of patches implement the NFSv4 Courteous Server.
> >Apologies, I keep meaning to get back to this and haven't yet.
> >
> >I do notice I'm seeing a timeout on pynfs 4.0 test OPEN18.
>
> It's weird, this test passes on my system:
>
[ quoted pynfs output trimmed ]
>
> Do you have a network trace?

Yeah, weirdly, I think it's failing only when I run it with all the
other pynfs tests, not when I run it alone. I'll check again and see
if I can get a trace, probably next week.

--b.
* Re: [PATCH RFC v5 0/2] nfsd: Initial implementation of NFSv4 Courteous Server
  2021-10-01 21:41   ` dai.ngo
  2021-10-01 23:03     ` J. Bruce Fields
@ 2021-11-16 23:06     ` dai.ngo
  2021-11-17 14:14       ` J. Bruce Fields
  1 sibling, 1 reply; 53+ messages in thread
From: dai.ngo @ 2021-11-16 23:06 UTC (permalink / raw)
To: J. Bruce Fields; +Cc: chuck.lever, linux-nfs, linux-fsdevel

Hi Bruce,

Just a reminder that this patch is still waiting for your review.

Thanks,
-Dai

On 10/1/21 2:41 PM, dai.ngo@oracle.com wrote:
[ quoted earlier exchange trimmed ]
* Re: [PATCH RFC v5 0/2] nfsd: Initial implementation of NFSv4 Courteous Server 2021-11-16 23:06 ` dai.ngo @ 2021-11-17 14:14 ` J. Bruce Fields 2021-11-17 17:59 ` dai.ngo 0 siblings, 1 reply; 53+ messages in thread From: J. Bruce Fields @ 2021-11-17 14:14 UTC (permalink / raw) To: dai.ngo; +Cc: chuck.lever, linux-nfs, linux-fsdevel On Tue, Nov 16, 2021 at 03:06:32PM -0800, dai.ngo@oracle.com wrote: > Just a reminder that this patch is still waiting for your review. Yeah, I was procrastinating and hoping you'd figure out the pynfs failure for me.... I'll see if I can get some time today. --b. > > Thanks, > -Dai > > On 10/1/21 2:41 PM, dai.ngo@oracle.com wrote: > > > >On 10/1/21 1:53 PM, J. Bruce Fields wrote: > >>On Tue, Sep 28, 2021 at 08:56:39PM -0400, Dai Ngo wrote: > >>>Hi Bruce, > >>> > >>>This series of patches implement the NFSv4 Courteous Server. > >>Apologies, I keep meaning to get back to this and haven't yet. > >> > >>I do notice I'm seeing a timeout on pynfs 4.0 test OPEN18. > > > >It's weird, this test passes on my system: > > > > > >[root@nfsvmf25 nfs4.0]# ./testserver.py $server --rundeps -v OPEN18 > >INIT st_setclientid.testValid : RUNNING > >INIT st_setclientid.testValid : PASS > >MKFILE st_open.testOpen : RUNNING > >MKFILE st_open.testOpen : PASS > >OPEN18 st_open.testShareConflict1 : RUNNING > >OPEN18 st_open.testShareConflict1 : PASS > >************************************************** > >INIT st_setclientid.testValid : PASS > >OPEN18 st_open.testShareConflict1 : PASS > >MKFILE st_open.testOpen : PASS > >************************************************** > >Command line asked for 3 of 673 tests > >Of those: 0 Skipped, 0 Failed, 0 Warned, 3 Passed > >[root@nfsvmf25 nfs4.0]# > > > >Do you have a network trace? > > > >-Dai > > > >> > >>--b. > >> > >>>A server which does not immediately expunge the state on lease > >>>expiration > >>>is known as a Courteous Server. 
A Courteous Server continues > >>>to recognize > >>>previously generated state tokens as valid until conflict > >>>arises between > >>>the expired state and the requests from another client, or the server > >>>reboots. > >>> > >>>The v2 patch includes the following: > >>> > >>>. add new callback, lm_expire_lock, to lock_manager_operations to > >>> allow the lock manager to take appropriate action with > >>>conflict lock. > >>> > >>>. handle conflicts of NFSv4 locks with NFSv3/NLM and local locks. > >>> > >>>. expire courtesy client after 24hr if client has not reconnected. > >>> > >>>. do not allow expired client to become courtesy client if there are > >>> waiters for client's locks. > >>> > >>>. modify client_info_show to show courtesy client and seconds from > >>> last renew. > >>> > >>>. fix a problem with NFSv4.1 server where the it keeps returning > >>> SEQ4_STATUS_CB_PATH_DOWN in the successful SEQUENCE reply, after > >>> the courtesy client re-connects, causing the client to keep sending > >>> BCTS requests to server. > >>> > >>>The v3 patch includes the following: > >>> > >>>. modified posix_test_lock to check and resolve conflict locks > >>> to handle NLM TEST and NFSv4 LOCKT requests. > >>> > >>>. separate out fix for back channel stuck in SEQ4_STATUS_CB_PATH_DOWN. > >>> > >>>The v4 patch includes: > >>> > >>>. rework nfsd_check_courtesy to avoid dead lock of fl_lock and > >>>client_lock > >>> by asking the laudromat thread to destroy the courtesy client. > >>> > >>>. handle NFSv4 share reservation conflicts with courtesy client. This > >>> includes conflicts between access mode and deny mode and vice versa. > >>> > >>>. drop the patch for back channel stuck in SEQ4_STATUS_CB_PATH_DOWN. > >>> > >>>The v5 patch includes: > >>> > >>>. fix recursive locking of file_rwsem from posix_lock_file. > >>> > >>>. retest with LOCKDEP enabled. 
> >>> > >>>NOTE: I will submit pynfs tests for courteous server including tests > >>>for share reservation conflicts in a separate patch. > >>> ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH RFC v5 0/2] nfsd: Initial implementation of NFSv4 Courteous Server 2021-11-17 14:14 ` J. Bruce Fields @ 2021-11-17 17:59 ` dai.ngo 2021-11-17 21:46 ` dai.ngo 0 siblings, 1 reply; 53+ messages in thread From: dai.ngo @ 2021-11-17 17:59 UTC (permalink / raw) To: J. Bruce Fields; +Cc: chuck.lever, linux-nfs, linux-fsdevel On 11/17/21 6:14 AM, J. Bruce Fields wrote: > On Tue, Nov 16, 2021 at 03:06:32PM -0800, dai.ngo@oracle.com wrote: >> Just a reminder that this patch is still waiting for your review. > Yeah, I was procrastinating and hoping yo'ud figure out the pynfs > failure for me.... Last time, I ran the 4.0 OPEN18 test by itself and it passed. I will run all OPEN tests together with 5.15-rc7 to see if the problem you've seen is still there. -Dai > I'll see if I can get some time today.--b. > >> Thanks, >> -Dai >> >> On 10/1/21 2:41 PM, dai.ngo@oracle.com wrote: >>> On 10/1/21 1:53 PM, J. Bruce Fields wrote: >>>> On Tue, Sep 28, 2021 at 08:56:39PM -0400, Dai Ngo wrote: >>>>> Hi Bruce, >>>>> >>>>> This series of patches implement the NFSv4 Courteous Server. >>>> Apologies, I keep meaning to get back to this and haven't yet. >>>> >>>> I do notice I'm seeing a timeout on pynfs 4.0 test OPEN18. >>> It's weird, this test passes on my system: >>> >>> >>> [root@nfsvmf25 nfs4.0]# ./testserver.py $server --rundeps -v OPEN18 >>> INIT st_setclientid.testValid : RUNNING >>> INIT st_setclientid.testValid : PASS >>> MKFILE st_open.testOpen : RUNNING >>> MKFILE st_open.testOpen : PASS >>> OPEN18 st_open.testShareConflict1 : RUNNING >>> OPEN18 st_open.testShareConflict1 : PASS >>> ************************************************** >>> INIT st_setclientid.testValid : PASS >>> OPEN18 st_open.testShareConflict1 : PASS >>> MKFILE st_open.testOpen : PASS >>> ************************************************** >>> Command line asked for 3 of 673 tests >>> Of those: 0 Skipped, 0 Failed, 0 Warned, 3 Passed >>> [root@nfsvmf25 nfs4.0]# >>> >>> Do you have a network trace? 
>>> >>> -Dai >>> >>>> --b. >>>> >>>>> A server which does not immediately expunge the state on lease >>>>> expiration >>>>> is known as a Courteous Server. A Courteous Server continues >>>>> to recognize >>>>> previously generated state tokens as valid until conflict >>>>> arises between >>>>> the expired state and the requests from another client, or the server >>>>> reboots. >>>>> >>>>> The v2 patch includes the following: >>>>> >>>>> . add new callback, lm_expire_lock, to lock_manager_operations to >>>>> allow the lock manager to take appropriate action with >>>>> conflict lock. >>>>> >>>>> . handle conflicts of NFSv4 locks with NFSv3/NLM and local locks. >>>>> >>>>> . expire courtesy client after 24hr if client has not reconnected. >>>>> >>>>> . do not allow expired client to become courtesy client if there are >>>>> waiters for client's locks. >>>>> >>>>> . modify client_info_show to show courtesy client and seconds from >>>>> last renew. >>>>> >>>>> . fix a problem with NFSv4.1 server where the it keeps returning >>>>> SEQ4_STATUS_CB_PATH_DOWN in the successful SEQUENCE reply, after >>>>> the courtesy client re-connects, causing the client to keep sending >>>>> BCTS requests to server. >>>>> >>>>> The v3 patch includes the following: >>>>> >>>>> . modified posix_test_lock to check and resolve conflict locks >>>>> to handle NLM TEST and NFSv4 LOCKT requests. >>>>> >>>>> . separate out fix for back channel stuck in SEQ4_STATUS_CB_PATH_DOWN. >>>>> >>>>> The v4 patch includes: >>>>> >>>>> . rework nfsd_check_courtesy to avoid dead lock of fl_lock and >>>>> client_lock >>>>> by asking the laudromat thread to destroy the courtesy client. >>>>> >>>>> . handle NFSv4 share reservation conflicts with courtesy client. This >>>>> includes conflicts between access mode and deny mode and vice versa. >>>>> >>>>> . drop the patch for back channel stuck in SEQ4_STATUS_CB_PATH_DOWN. >>>>> >>>>> The v5 patch includes: >>>>> >>>>> . 
fix recursive locking of file_rwsem from posix_lock_file. >>>>> >>>>> . retest with LOCKDEP enabled. >>>>> >>>>> NOTE: I will submit pynfs tests for courteous server including tests >>>>> for share reservation conflicts in a separate patch. >>>>> ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH RFC v5 0/2] nfsd: Initial implementation of NFSv4 Courteous Server 2021-11-17 17:59 ` dai.ngo @ 2021-11-17 21:46 ` dai.ngo 2021-11-18 0:34 ` J. Bruce Fields 0 siblings, 1 reply; 53+ messages in thread From: dai.ngo @ 2021-11-17 21:46 UTC (permalink / raw) To: J. Bruce Fields; +Cc: chuck.lever, linux-nfs, linux-fsdevel On 11/17/21 9:59 AM, dai.ngo@oracle.com wrote: > > On 11/17/21 6:14 AM, J. Bruce Fields wrote: >> On Tue, Nov 16, 2021 at 03:06:32PM -0800, dai.ngo@oracle.com wrote: >>> Just a reminder that this patch is still waiting for your review. >> Yeah, I was procrastinating and hoping yo'ud figure out the pynfs >> failure for me.... > > Last time I ran 4.0 OPEN18 test by itself and it passed. I will run > all OPEN tests together with 5.15-rc7 to see if the problem you've > seen still there. I ran all tests in NFSv4.1 and NFSv4.0 with the courteous and non-courteous 5.15-rc7 server. NFSv4.1 results are the same for both the courteous and non-courteous server: > Of those: 0 Skipped, 0 Failed, 0 Warned, 169 Passed Results of NFSv4.0 with the non-courteous server: >Of those: 8 Skipped, 1 Failed, 0 Warned, 577 Passed test failed: LOCK24 Results of NFSv4.0 with the courteous server: >Of those: 8 Skipped, 3 Failed, 0 Warned, 575 Passed tests failed: LOCK24, OPEN18, OPEN30 OPEN18 and OPEN30 pass if each is run by itself. I will look into this problem. -Dai > > -Dai > >> I'll see if I can get some time today.--b. >> >>> Thanks, >>> -Dai >>> >>> On 10/1/21 2:41 PM, dai.ngo@oracle.com wrote: >>>> On 10/1/21 1:53 PM, J. Bruce Fields wrote: >>>>> On Tue, Sep 28, 2021 at 08:56:39PM -0400, Dai Ngo wrote: >>>>>> Hi Bruce, >>>>>> >>>>>> This series of patches implement the NFSv4 Courteous Server. >>>>> Apologies, I keep meaning to get back to this and haven't yet. >>>>> >>>>> I do notice I'm seeing a timeout on pynfs 4.0 test OPEN18. 
>>>> It's weird, this test passes on my system: >>>> >>>> >>>> [root@nfsvmf25 nfs4.0]# ./testserver.py $server --rundeps -v OPEN18 >>>> INIT st_setclientid.testValid : RUNNING >>>> INIT st_setclientid.testValid : PASS >>>> MKFILE st_open.testOpen : RUNNING >>>> MKFILE st_open.testOpen : PASS >>>> OPEN18 st_open.testShareConflict1 : RUNNING >>>> OPEN18 st_open.testShareConflict1 : PASS >>>> ************************************************** >>>> INIT st_setclientid.testValid : PASS >>>> OPEN18 st_open.testShareConflict1 : PASS >>>> MKFILE st_open.testOpen : PASS >>>> ************************************************** >>>> Command line asked for 3 of 673 tests >>>> Of those: 0 Skipped, 0 Failed, 0 Warned, 3 Passed >>>> [root@nfsvmf25 nfs4.0]# >>>> >>>> Do you have a network trace? >>>> >>>> -Dai >>>> >>>>> --b. >>>>> >>>>>> A server which does not immediately expunge the state on lease >>>>>> expiration >>>>>> is known as a Courteous Server. A Courteous Server continues >>>>>> to recognize >>>>>> previously generated state tokens as valid until conflict >>>>>> arises between >>>>>> the expired state and the requests from another client, or the >>>>>> server >>>>>> reboots. >>>>>> >>>>>> The v2 patch includes the following: >>>>>> >>>>>> . add new callback, lm_expire_lock, to lock_manager_operations to >>>>>> allow the lock manager to take appropriate action with >>>>>> conflict lock. >>>>>> >>>>>> . handle conflicts of NFSv4 locks with NFSv3/NLM and local locks. >>>>>> >>>>>> . expire courtesy client after 24hr if client has not reconnected. >>>>>> >>>>>> . do not allow expired client to become courtesy client if there are >>>>>> waiters for client's locks. >>>>>> >>>>>> . modify client_info_show to show courtesy client and seconds from >>>>>> last renew. >>>>>> >>>>>> . 
fix a problem with NFSv4.1 server where the it keeps returning >>>>>> SEQ4_STATUS_CB_PATH_DOWN in the successful SEQUENCE reply, after >>>>>> the courtesy client re-connects, causing the client to keep >>>>>> sending >>>>>> BCTS requests to server. >>>>>> >>>>>> The v3 patch includes the following: >>>>>> >>>>>> . modified posix_test_lock to check and resolve conflict locks >>>>>> to handle NLM TEST and NFSv4 LOCKT requests. >>>>>> >>>>>> . separate out fix for back channel stuck in >>>>>> SEQ4_STATUS_CB_PATH_DOWN. >>>>>> >>>>>> The v4 patch includes: >>>>>> >>>>>> . rework nfsd_check_courtesy to avoid dead lock of fl_lock and >>>>>> client_lock >>>>>> by asking the laudromat thread to destroy the courtesy client. >>>>>> >>>>>> . handle NFSv4 share reservation conflicts with courtesy client. >>>>>> This >>>>>> includes conflicts between access mode and deny mode and vice >>>>>> versa. >>>>>> >>>>>> . drop the patch for back channel stuck in SEQ4_STATUS_CB_PATH_DOWN. >>>>>> >>>>>> The v5 patch includes: >>>>>> >>>>>> . fix recursive locking of file_rwsem from posix_lock_file. >>>>>> >>>>>> . retest with LOCKDEP enabled. >>>>>> >>>>>> NOTE: I will submit pynfs tests for courteous server including tests >>>>>> for share reservation conflicts in a separate patch. >>>>>> ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH RFC v5 0/2] nfsd: Initial implementation of NFSv4 Courteous Server 2021-11-17 21:46 ` dai.ngo @ 2021-11-18 0:34 ` J. Bruce Fields 2021-11-22 3:04 ` dai.ngo 0 siblings, 1 reply; 53+ messages in thread From: J. Bruce Fields @ 2021-11-18 0:34 UTC (permalink / raw) To: dai.ngo; +Cc: chuck.lever, linux-nfs, linux-fsdevel On Wed, Nov 17, 2021 at 01:46:02PM -0800, dai.ngo@oracle.com wrote: > > On 11/17/21 9:59 AM, dai.ngo@oracle.com wrote: > > > >On 11/17/21 6:14 AM, J. Bruce Fields wrote: > >>On Tue, Nov 16, 2021 at 03:06:32PM -0800, dai.ngo@oracle.com wrote: > >>>Just a reminder that this patch is still waiting for your review. > >>Yeah, I was procrastinating and hoping yo'ud figure out the pynfs > >>failure for me.... > > > >Last time I ran 4.0 OPEN18 test by itself and it passed. I will run > >all OPEN tests together with 5.15-rc7 to see if the problem you've > >seen still there. > > I ran all tests in nfsv4.1 and nfsv4.0 with courteous and non-courteous > 5.15-rc7 server. > > Nfs4.1 results are the same for both courteous and non-courteous server: > >Of those: 0 Skipped, 0 Failed, 0 Warned, 169 Passed > > Results of nfs4.0 with non-courteous server: > >Of those: 8 Skipped, 1 Failed, 0 Warned, 577 Passed > test failed: LOCK24 > > Results of nfs4.0 with courteous server: > >Of those: 8 Skipped, 3 Failed, 0 Warned, 575 Passed > tests failed: LOCK24, OPEN18, OPEN30 > > OPEN18 and OPEN30 test pass if each is run by itself. Could well be a bug in the tests, I don't know. > I will look into this problem. Thanks! --b. ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH RFC v5 0/2] nfsd: Initial implementation of NFSv4 Courteous Server 2021-11-18 0:34 ` J. Bruce Fields @ 2021-11-22 3:04 ` dai.ngo 2021-11-29 17:13 ` dai.ngo 0 siblings, 1 reply; 53+ messages in thread From: dai.ngo @ 2021-11-22 3:04 UTC (permalink / raw) To: J. Bruce Fields; +Cc: chuck.lever, linux-nfs, linux-fsdevel On 11/17/21 4:34 PM, J. Bruce Fields wrote: > On Wed, Nov 17, 2021 at 01:46:02PM -0800, dai.ngo@oracle.com wrote: >> On 11/17/21 9:59 AM, dai.ngo@oracle.com wrote: >>> On 11/17/21 6:14 AM, J. Bruce Fields wrote: >>>> On Tue, Nov 16, 2021 at 03:06:32PM -0800, dai.ngo@oracle.com wrote: >>>>> Just a reminder that this patch is still waiting for your review. >>>> Yeah, I was procrastinating and hoping yo'ud figure out the pynfs >>>> failure for me.... >>> Last time I ran 4.0 OPEN18 test by itself and it passed. I will run >>> all OPEN tests together with 5.15-rc7 to see if the problem you've >>> seen still there. >> I ran all tests in nfsv4.1 and nfsv4.0 with courteous and non-courteous >> 5.15-rc7 server. >> >> Nfs4.1 results are the same for both courteous and non-courteous server: >>> Of those: 0 Skipped, 0 Failed, 0 Warned, 169 Passed >> Results of nfs4.0 with non-courteous server: >>> Of those: 8 Skipped, 1 Failed, 0 Warned, 577 Passed >> test failed: LOCK24 >> >> Results of nfs4.0 with courteous server: >>> Of those: 8 Skipped, 3 Failed, 0 Warned, 575 Passed >> tests failed: LOCK24, OPEN18, OPEN30 >> >> OPEN18 and OPEN30 test pass if each is run by itself. > Could well be a bug in the tests, I don't know. The reason OPEN18 failed is that the test timed out waiting for the reply to an OPEN call. The RPC connection used for the test was configured with a 15-second timeout. Note that OPEN18 only fails when the tests are run with the 'all' option; it passes when run by itself. 
With the courteous server, by the time OPEN18 runs there are about 1026 courtesy 4.0 clients on the server, and all of these clients have opened the same file X with WRITE access. These clients were created by the previous tests. Since 4.0 does not have sessions, after each test completed the client states were not cleaned up immediately on the server and were allowed to become courtesy clients. When OPEN18 runs (about 20 minutes after the first test started), it sends an OPEN of file X with OPEN4_SHARE_DENY_WRITE, which causes the server to check for conflicts with courtesy clients. The loop that checks the 1026 courtesy clients for share/access conflicts took less than 1 second, but it took about 55 seconds, on my VM, for the server to expire all 1026 courtesy clients. I modified pynfs to configure the 4.0 RPC connection with a 60-second timeout, and OPEN18 now consistently passes. The 4.0 test results are now the same for the courteous and non-courteous server: 8 Skipped, 1 Failed, 0 Warned, 577 Passed Note that the 4.1 tests do not suffer this timeout problem because the 4.1 clients and sessions are destroyed after each test completes. -Dai >> I will look into this problem. > Thanks! > > --b. ^ permalink raw reply [flat|nested] 53+ messages in thread
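The timing argument in this message can be sketched numerically. The per-client cost below is simply the reported 55 seconds divided by the 1026 clients, so it is an illustrative back-of-the-envelope figure, not a measurement of any individual expiry:

```python
# Rough model of why OPEN18 times out under the courteous server.
# All figures come from the message above.

COURTESY_CLIENTS = 1026          # 4.0 clients left behind by earlier tests
TOTAL_EXPIRY_SECS = 55.0         # observed time to expire all of them
PER_CLIENT_SECS = TOTAL_EXPIRY_SECS / COURTESY_CLIENTS  # ~0.054 s each

def open_reply_delay(n_clients: int) -> float:
    """Seconds a conflicting OPEN waits while the server expires every
    courtesy client that holds the file open for WRITE."""
    return n_clients * PER_CLIENT_SECS

# The default 15 s pynfs RPC timeout is exceeded, so OPEN18 fails;
# the raised 60 s timeout comfortably covers the observed delay.
assert open_reply_delay(COURTESY_CLIENTS) > 15
assert open_reply_delay(COURTESY_CLIENTS) <= 60
```

This also shows why the failure only appears with the 'all' option: run alone, OPEN18 sees only a handful of courtesy clients and the delay stays well under the timeout.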
* Re: [PATCH RFC v5 0/2] nfsd: Initial implementation of NFSv4 Courteous Server 2021-11-22 3:04 ` dai.ngo @ 2021-11-29 17:13 ` dai.ngo 2021-11-29 17:30 ` J. Bruce Fields 0 siblings, 1 reply; 53+ messages in thread From: dai.ngo @ 2021-11-29 17:13 UTC (permalink / raw) To: J. Bruce Fields; +Cc: chuck.lever, linux-nfs, linux-fsdevel Hi Bruce, On 11/21/21 7:04 PM, dai.ngo@oracle.com wrote: > > On 11/17/21 4:34 PM, J. Bruce Fields wrote: >> On Wed, Nov 17, 2021 at 01:46:02PM -0800, dai.ngo@oracle.com wrote: >>> On 11/17/21 9:59 AM, dai.ngo@oracle.com wrote: >>>> On 11/17/21 6:14 AM, J. Bruce Fields wrote: >>>>> On Tue, Nov 16, 2021 at 03:06:32PM -0800, dai.ngo@oracle.com wrote: >>>>>> Just a reminder that this patch is still waiting for your review. >>>>> Yeah, I was procrastinating and hoping yo'ud figure out the pynfs >>>>> failure for me.... >>>> Last time I ran 4.0 OPEN18 test by itself and it passed. I will run >>>> all OPEN tests together with 5.15-rc7 to see if the problem you've >>>> seen still there. >>> I ran all tests in nfsv4.1 and nfsv4.0 with courteous and non-courteous >>> 5.15-rc7 server. >>> >>> Nfs4.1 results are the same for both courteous and non-courteous >>> server: >>>> Of those: 0 Skipped, 0 Failed, 0 Warned, 169 Passed >>> Results of nfs4.0 with non-courteous server: >>>> Of those: 8 Skipped, 1 Failed, 0 Warned, 577 Passed >>> test failed: LOCK24 >>> >>> Results of nfs4.0 with courteous server: >>>> Of those: 8 Skipped, 3 Failed, 0 Warned, 575 Passed >>> tests failed: LOCK24, OPEN18, OPEN30 >>> >>> OPEN18 and OPEN30 test pass if each is run by itself. >> Could well be a bug in the tests, I don't know. > > The reason OPEN18 failed was because the test timed out waiting for > the reply of an OPEN call. The RPC connection used for the test was > configured with 15 secs timeout. Note that OPEN18 only fails when > the tests were run with 'all' option, this test passes if it's run > by itself. 
> > With courteous server, by the time OPEN18 runs, there are about 1026 > courtesy 4.0 clients on the server and all of these clients have opened > the same file X with WRITE access. These clients were created by the > previous tests. After each test completed, since 4.0 does not have > session, the client states are not cleaned up immediately on the > server and are allowed to become courtesy clients. > > When OPEN18 runs (about 20 minutes after the 1st test started), it > sends OPEN of file X with OPEN4_SHARE_DENY_WRITE which causes the > server to check for conflicts with courtesy clients. The loop that > checks 1026 courtesy clients for share/access conflict took less > than 1 sec. But it took about 55 secs, on my VM, for the server > to expire all 1026 courtesy clients. > > I modified pynfs to configure the 4.0 RPC connection with 60 seconds > timeout and OPEN18 now consistently passed. The 4.0 test results are > now the same for courteous and non-courteous server: > > 8 Skipped, 1 Failed, 0 Warned, 577 Passed > > Note that 4.1 tests do not suffer this timeout problem because the > 4.1 clients and sessions are destroyed after each test completes. Do you want me to send the patch to increase the timeout for pynfs? or is there any other things you think we should do? Thanks, -Dai ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH RFC v5 0/2] nfsd: Initial implementation of NFSv4 Courteous Server 2021-11-29 17:13 ` dai.ngo @ 2021-11-29 17:30 ` J. Bruce Fields 2021-11-29 18:32 ` dai.ngo 0 siblings, 1 reply; 53+ messages in thread From: J. Bruce Fields @ 2021-11-29 17:30 UTC (permalink / raw) To: dai.ngo; +Cc: chuck.lever, linux-nfs, linux-fsdevel On Mon, Nov 29, 2021 at 09:13:16AM -0800, dai.ngo@oracle.com wrote: > Hi Bruce, > > On 11/21/21 7:04 PM, dai.ngo@oracle.com wrote: > > > >On 11/17/21 4:34 PM, J. Bruce Fields wrote: > >>On Wed, Nov 17, 2021 at 01:46:02PM -0800, dai.ngo@oracle.com wrote: > >>>On 11/17/21 9:59 AM, dai.ngo@oracle.com wrote: > >>>>On 11/17/21 6:14 AM, J. Bruce Fields wrote: > >>>>>On Tue, Nov 16, 2021 at 03:06:32PM -0800, dai.ngo@oracle.com wrote: > >>>>>>Just a reminder that this patch is still waiting for your review. > >>>>>Yeah, I was procrastinating and hoping yo'ud figure out the pynfs > >>>>>failure for me.... > >>>>Last time I ran 4.0 OPEN18 test by itself and it passed. I will run > >>>>all OPEN tests together with 5.15-rc7 to see if the problem you've > >>>>seen still there. > >>>I ran all tests in nfsv4.1 and nfsv4.0 with courteous and non-courteous > >>>5.15-rc7 server. > >>> > >>>Nfs4.1 results are the same for both courteous and > >>>non-courteous server: > >>>>Of those: 0 Skipped, 0 Failed, 0 Warned, 169 Passed > >>>Results of nfs4.0 with non-courteous server: > >>>>Of those: 8 Skipped, 1 Failed, 0 Warned, 577 Passed > >>>test failed: LOCK24 > >>> > >>>Results of nfs4.0 with courteous server: > >>>>Of those: 8 Skipped, 3 Failed, 0 Warned, 575 Passed > >>>tests failed: LOCK24, OPEN18, OPEN30 > >>> > >>>OPEN18 and OPEN30 test pass if each is run by itself. > >>Could well be a bug in the tests, I don't know. > > > >The reason OPEN18 failed was because the test timed out waiting for > >the reply of an OPEN call. The RPC connection used for the test was > >configured with 15 secs timeout. 
Note that OPEN18 only fails when > >the tests were run with 'all' option, this test passes if it's run > >by itself. > > > >With courteous server, by the time OPEN18 runs, there are about 1026 > >courtesy 4.0 clients on the server and all of these clients have opened > >the same file X with WRITE access. These clients were created by the > >previous tests. After each test completed, since 4.0 does not have > >session, the client states are not cleaned up immediately on the > >server and are allowed to become courtesy clients. > > > >When OPEN18 runs (about 20 minutes after the 1st test started), it > >sends OPEN of file X with OPEN4_SHARE_DENY_WRITE which causes the > >server to check for conflicts with courtesy clients. The loop that > >checks 1026 courtesy clients for share/access conflict took less > >than 1 sec. But it took about 55 secs, on my VM, for the server > >to expire all 1026 courtesy clients. > > > >I modified pynfs to configure the 4.0 RPC connection with 60 seconds > >timeout and OPEN18 now consistently passed. The 4.0 test results are > >now the same for courteous and non-courteous server: > > > >8 Skipped, 1 Failed, 0 Warned, 577 Passed > > > >Note that 4.1 tests do not suffer this timeout problem because the > >4.1 clients and sessions are destroyed after each test completes. > > Do you want me to send the patch to increase the timeout for pynfs? > or is there any other things you think we should do? I don't know. 55 seconds to clean up 1026 clients is about 50ms per client, which is pretty slow. I wonder why. I guess it's probably updating the stable storage information. Is /var/lib/nfs/ on your server backed by a hard drive or an SSD or something else? I wonder if that's an argument for limiting the number of courtesy clients. --b. ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH RFC v5 0/2] nfsd: Initial implementation of NFSv4 Courteous Server 2021-11-29 17:30 ` J. Bruce Fields @ 2021-11-29 18:32 ` dai.ngo 2021-11-29 19:03 ` Chuck Lever III 0 siblings, 1 reply; 53+ messages in thread From: dai.ngo @ 2021-11-29 18:32 UTC (permalink / raw) To: J. Bruce Fields; +Cc: chuck.lever, linux-nfs, linux-fsdevel On 11/29/21 9:30 AM, J. Bruce Fields wrote: > On Mon, Nov 29, 2021 at 09:13:16AM -0800, dai.ngo@oracle.com wrote: >> Hi Bruce, >> >> On 11/21/21 7:04 PM, dai.ngo@oracle.com wrote: >>> On 11/17/21 4:34 PM, J. Bruce Fields wrote: >>>> On Wed, Nov 17, 2021 at 01:46:02PM -0800, dai.ngo@oracle.com wrote: >>>>> On 11/17/21 9:59 AM, dai.ngo@oracle.com wrote: >>>>>> On 11/17/21 6:14 AM, J. Bruce Fields wrote: >>>>>>> On Tue, Nov 16, 2021 at 03:06:32PM -0800, dai.ngo@oracle.com wrote: >>>>>>>> Just a reminder that this patch is still waiting for your review. >>>>>>> Yeah, I was procrastinating and hoping yo'ud figure out the pynfs >>>>>>> failure for me.... >>>>>> Last time I ran 4.0 OPEN18 test by itself and it passed. I will run >>>>>> all OPEN tests together with 5.15-rc7 to see if the problem you've >>>>>> seen still there. >>>>> I ran all tests in nfsv4.1 and nfsv4.0 with courteous and non-courteous >>>>> 5.15-rc7 server. >>>>> >>>>> Nfs4.1 results are the same for both courteous and >>>>> non-courteous server: >>>>>> Of those: 0 Skipped, 0 Failed, 0 Warned, 169 Passed >>>>> Results of nfs4.0 with non-courteous server: >>>>>> Of those: 8 Skipped, 1 Failed, 0 Warned, 577 Passed >>>>> test failed: LOCK24 >>>>> >>>>> Results of nfs4.0 with courteous server: >>>>>> Of those: 8 Skipped, 3 Failed, 0 Warned, 575 Passed >>>>> tests failed: LOCK24, OPEN18, OPEN30 >>>>> >>>>> OPEN18 and OPEN30 test pass if each is run by itself. >>>> Could well be a bug in the tests, I don't know. >>> The reason OPEN18 failed was because the test timed out waiting for >>> the reply of an OPEN call. 
The RPC connection used for the test was >>> configured with 15 secs timeout. Note that OPEN18 only fails when >>> the tests were run with 'all' option, this test passes if it's run >>> by itself. >>> >>> With courteous server, by the time OPEN18 runs, there are about 1026 >>> courtesy 4.0 clients on the server and all of these clients have opened >>> the same file X with WRITE access. These clients were created by the >>> previous tests. After each test completed, since 4.0 does not have >>> session, the client states are not cleaned up immediately on the >>> server and are allowed to become courtesy clients. >>> >>> When OPEN18 runs (about 20 minutes after the 1st test started), it >>> sends OPEN of file X with OPEN4_SHARE_DENY_WRITE which causes the >>> server to check for conflicts with courtesy clients. The loop that >>> checks 1026 courtesy clients for share/access conflict took less >>> than 1 sec. But it took about 55 secs, on my VM, for the server >>> to expire all 1026 courtesy clients. >>> >>> I modified pynfs to configure the 4.0 RPC connection with 60 seconds >>> timeout and OPEN18 now consistently passed. The 4.0 test results are >>> now the same for courteous and non-courteous server: >>> >>> 8 Skipped, 1 Failed, 0 Warned, 577 Passed >>> >>> Note that 4.1 tests do not suffer this timeout problem because the >>> 4.1 clients and sessions are destroyed after each test completes. >> Do you want me to send the patch to increase the timeout for pynfs? >> or is there any other things you think we should do? > I don't know. > > 55 seconds to clean up 1026 clients is about 50ms per client, which is > pretty slow. I wonder why. I guess it's probably updating the stable > storage information. Is /var/lib/nfs/ on your server backed by a hard > drive or an SSD or something else? My server is a virtualbox VM that has 1 CPU, 4GB RAM and 64GB of hard disk. I think a production system that supports this many clients should have faster CPUs, faster storage. 
> > I wonder if that's an argument for limiting the number of courtesy > clients. I think we might want to treat 4.0 clients a bit differently from 4.1 clients. With 4.0, every client will become a courtesy client after the client is done with the export and unmounts it. Since there is no destroy session/client with 4.0, the courteous server allows the client to stick around and become a courtesy client. So after a while, even with normal usage, there will be lots of 4.0 courtesy clients hanging around, and these clients won't be destroyed until 24 hours later, or until they cause conflicts with other clients. We can reduce the courtesy_client_expiry time for 4.0 clients from 24 hours to 15-20 minutes, enough for most network partitions to heal, or limit the number of 4.0 courtesy clients. Or don't support 4.0 clients at all, which is my preference, since I think in general users should skip 4.0 and use 4.1 instead. -Dai ^ permalink raw reply [flat|nested] 53+ messages in thread
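The two mitigations proposed in this message, a shorter courtesy window for 4.0 clients and a cap on how many 4.0 courtesy clients are retained, could be sketched as a laundromat policy along these lines. The names, thresholds, and structure are hypothetical and are not nfsd code:

```python
# Hypothetical courtesy-client expiry policy combining the two ideas
# discussed above. Illustrative only; not the kernel laundromat.

COURTESY_EXPIRY_SECS = 24 * 60 * 60       # 24 hours, as in the patch series
COURTESY_EXPIRY_V40_SECS = 20 * 60        # 15-20 minutes for 4.0 clients
MAX_V40_COURTESY_CLIENTS = 256            # hypothetical cap

def laundromat_should_destroy(minor_version: int, idle_secs: float,
                              n_v40_courtesy: int) -> bool:
    """Decide whether the laundromat destroys a courtesy client.
    4.0 clients get a short window and a population cap; 4.1+ clients
    keep the full 24-hour courtesy period."""
    if minor_version == 0:
        return (idle_secs > COURTESY_EXPIRY_V40_SECS
                or n_v40_courtesy > MAX_V40_COURTESY_CLIENTS)
    return idle_secs > COURTESY_EXPIRY_SECS
```

Under such a policy the pynfs 'all' run would never accumulate a thousand 4.0 courtesy clients in the first place, independent of any RPC timeout change.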
* Re: [PATCH RFC v5 0/2] nfsd: Initial implementation of NFSv4 Courteous Server 2021-11-29 18:32 ` dai.ngo @ 2021-11-29 19:03 ` Chuck Lever III 2021-11-29 19:13 ` Bruce Fields 2021-11-29 19:36 ` dai.ngo 0 siblings, 2 replies; 53+ messages in thread From: Chuck Lever III @ 2021-11-29 19:03 UTC (permalink / raw) To: Dai Ngo; +Cc: Bruce Fields, Linux NFS Mailing List, linux-fsdevel Hello Dai! > On Nov 29, 2021, at 1:32 PM, Dai Ngo <dai.ngo@oracle.com> wrote: > > > On 11/29/21 9:30 AM, J. Bruce Fields wrote: >> On Mon, Nov 29, 2021 at 09:13:16AM -0800, dai.ngo@oracle.com wrote: >>> Hi Bruce, >>> >>> On 11/21/21 7:04 PM, dai.ngo@oracle.com wrote: >>>> On 11/17/21 4:34 PM, J. Bruce Fields wrote: >>>>> On Wed, Nov 17, 2021 at 01:46:02PM -0800, dai.ngo@oracle.com wrote: >>>>>> On 11/17/21 9:59 AM, dai.ngo@oracle.com wrote: >>>>>>> On 11/17/21 6:14 AM, J. Bruce Fields wrote: >>>>>>>> On Tue, Nov 16, 2021 at 03:06:32PM -0800, dai.ngo@oracle.com wrote: >>>>>>>>> Just a reminder that this patch is still waiting for your review. >>>>>>>> Yeah, I was procrastinating and hoping yo'ud figure out the pynfs >>>>>>>> failure for me.... >>>>>>> Last time I ran 4.0 OPEN18 test by itself and it passed. I will run >>>>>>> all OPEN tests together with 5.15-rc7 to see if the problem you've >>>>>>> seen still there. >>>>>> I ran all tests in nfsv4.1 and nfsv4.0 with courteous and non-courteous >>>>>> 5.15-rc7 server. >>>>>> >>>>>> Nfs4.1 results are the same for both courteous and >>>>>> non-courteous server: >>>>>>> Of those: 0 Skipped, 0 Failed, 0 Warned, 169 Passed >>>>>> Results of nfs4.0 with non-courteous server: >>>>>>> Of those: 8 Skipped, 1 Failed, 0 Warned, 577 Passed >>>>>> test failed: LOCK24 >>>>>> >>>>>> Results of nfs4.0 with courteous server: >>>>>>> Of those: 8 Skipped, 3 Failed, 0 Warned, 575 Passed >>>>>> tests failed: LOCK24, OPEN18, OPEN30 >>>>>> >>>>>> OPEN18 and OPEN30 test pass if each is run by itself. >>>>> Could well be a bug in the tests, I don't know. 
>>>> The reason OPEN18 failed was because the test timed out waiting for >>>> the reply of an OPEN call. The RPC connection used for the test was >>>> configured with 15 secs timeout. Note that OPEN18 only fails when >>>> the tests were run with 'all' option, this test passes if it's run >>>> by itself. >>>> >>>> With courteous server, by the time OPEN18 runs, there are about 1026 >>>> courtesy 4.0 clients on the server and all of these clients have opened >>>> the same file X with WRITE access. These clients were created by the >>>> previous tests. After each test completed, since 4.0 does not have >>>> session, the client states are not cleaned up immediately on the >>>> server and are allowed to become courtesy clients. >>>> >>>> When OPEN18 runs (about 20 minutes after the 1st test started), it >>>> sends OPEN of file X with OPEN4_SHARE_DENY_WRITE which causes the >>>> server to check for conflicts with courtesy clients. The loop that >>>> checks 1026 courtesy clients for share/access conflict took less >>>> than 1 sec. But it took about 55 secs, on my VM, for the server >>>> to expire all 1026 courtesy clients. >>>> >>>> I modified pynfs to configure the 4.0 RPC connection with 60 seconds >>>> timeout and OPEN18 now consistently passed. The 4.0 test results are >>>> now the same for courteous and non-courteous server: >>>> >>>> 8 Skipped, 1 Failed, 0 Warned, 577 Passed >>>> >>>> Note that 4.1 tests do not suffer this timeout problem because the >>>> 4.1 clients and sessions are destroyed after each test completes. >>> Do you want me to send the patch to increase the timeout for pynfs? >>> or is there any other things you think we should do? >> I don't know. >> >> 55 seconds to clean up 1026 clients is about 50ms per client, which is >> pretty slow. I wonder why. I guess it's probably updating the stable >> storage information. Is /var/lib/nfs/ on your server backed by a hard >> drive or an SSD or something else? 
> > My server is a VirtualBox VM that has 1 CPU, 4GB RAM and 64GB of hard > disk. I think a production system that supports this many clients should > have faster CPUs, faster storage. > >> >> I wonder if that's an argument for limiting the number of courtesy >> clients. > > I think we might want to treat 4.0 clients a bit differently from 4.1 > clients. With 4.0, every client will become a courtesy client after > the client is done with the export and unmounts it. It should be safe for a server to purge a client's lease immediately if there is no open or lock state associated with it. When an NFSv4.0 client unmounts, all files should be closed at that point, so the server can wait for the lease to expire and purge it normally. Or am I missing something? > Since there is > no destroy session/client with 4.0, the courteous server allows the > client to stick around and become a courtesy client. So after a while, > even with normal usage, there will be lots of 4.0 courtesy clients > hanging around and these clients won't be destroyed until 24hrs > later, or until they cause conflicts with other clients. > > We can reduce the courtesy_client_expiry time for 4.0 clients from > 24hrs to 15/20 mins, enough time for most network partitions to heal, > or limit the number of 4.0 courtesy clients. Or don't support 4.0 > clients at all, which is my preference since I think in general users > should skip 4.0 and use 4.1 instead. > > -Dai -- Chuck Lever ^ permalink raw reply [flat|nested] 53+ messages in thread
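The lease-handling policy under discussion can be sketched roughly as follows. This is a hypothetical model in Python, not the actual nfsd laundromat code; the field names `last_renew` and `has_state` are illustrative. The idea: a client whose lease has lapsed is purged immediately if it holds no open or lock state, becomes a courtesy client if it does, and is finally expired once the 24-hour courtesy window elapses.

```python
LEASE_TIME = 90                # seconds; a typical nfsd lease default
COURTESY_WINDOW = 24 * 3600    # 24 hours, per the patch series

def laundromat_disposition(client, now):
    """Decide what to do with a client at laundromat time (sketch).

    `client` is a dict with `last_renew` (seconds since epoch) and
    `has_state` (True if any opens/locks are held) -- both illustrative.
    """
    age = now - client["last_renew"]
    if age < LEASE_TIME:
        return "active"        # lease still valid, nothing to do
    if not client["has_state"]:
        return "expire"        # no opens or locks: safe to purge now
    if age > COURTESY_WINDOW:
        return "expire"        # courtesy period elapsed
    return "courtesy"          # keep state until a conflict or 24h
```

Under this model, an NFSv4.0 client that closed everything before unmounting would be purged on the first laundromat pass after its lease lapses, which is the point made above; only clients that still hold state linger as courtesy clients.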
* Re: [PATCH RFC v5 0/2] nfsd: Initial implementation of NFSv4 Courteous Server 2021-11-29 19:03 ` Chuck Lever III @ 2021-11-29 19:13 ` Bruce Fields 2021-11-29 19:39 ` dai.ngo 2021-11-29 19:36 ` dai.ngo 1 sibling, 1 reply; 53+ messages in thread From: Bruce Fields @ 2021-11-29 19:13 UTC (permalink / raw) To: Chuck Lever III; +Cc: Dai Ngo, Linux NFS Mailing List, linux-fsdevel On Mon, Nov 29, 2021 at 07:03:12PM +0000, Chuck Lever III wrote: > Hello Dai! > > > > On Nov 29, 2021, at 1:32 PM, Dai Ngo <dai.ngo@oracle.com> wrote: > > > > > > On 11/29/21 9:30 AM, J. Bruce Fields wrote: > >> On Mon, Nov 29, 2021 at 09:13:16AM -0800, dai.ngo@oracle.com wrote: > >>> Hi Bruce, > >>> > >>> On 11/21/21 7:04 PM, dai.ngo@oracle.com wrote: > >>>> On 11/17/21 4:34 PM, J. Bruce Fields wrote: > >>>>> On Wed, Nov 17, 2021 at 01:46:02PM -0800, dai.ngo@oracle.com wrote: > >>>>>> On 11/17/21 9:59 AM, dai.ngo@oracle.com wrote: > >>>>>>> On 11/17/21 6:14 AM, J. Bruce Fields wrote: > >>>>>>>> On Tue, Nov 16, 2021 at 03:06:32PM -0800, dai.ngo@oracle.com wrote: > >>>>>>>>> Just a reminder that this patch is still waiting for your review. > >>>>>>>> Yeah, I was procrastinating and hoping yo'ud figure out the pynfs > >>>>>>>> failure for me.... > >>>>>>> Last time I ran 4.0 OPEN18 test by itself and it passed. I will run > >>>>>>> all OPEN tests together with 5.15-rc7 to see if the problem you've > >>>>>>> seen still there. > >>>>>> I ran all tests in nfsv4.1 and nfsv4.0 with courteous and non-courteous > >>>>>> 5.15-rc7 server. 
> >>>>>> > >>>>>> Nfs4.1 results are the same for both courteous and > >>>>>> non-courteous server: > >>>>>>> Of those: 0 Skipped, 0 Failed, 0 Warned, 169 Passed > >>>>>> Results of nfs4.0 with non-courteous server: > >>>>>>> Of those: 8 Skipped, 1 Failed, 0 Warned, 577 Passed > >>>>>> test failed: LOCK24 > >>>>>> > >>>>>> Results of nfs4.0 with courteous server: > >>>>>>> Of those: 8 Skipped, 3 Failed, 0 Warned, 575 Passed > >>>>>> tests failed: LOCK24, OPEN18, OPEN30 > >>>>>> > >>>>>> OPEN18 and OPEN30 test pass if each is run by itself. > >>>>> Could well be a bug in the tests, I don't know. > >>>> The reason OPEN18 failed was because the test timed out waiting for > >>>> the reply of an OPEN call. The RPC connection used for the test was > >>>> configured with 15 secs timeout. Note that OPEN18 only fails when > >>>> the tests were run with 'all' option, this test passes if it's run > >>>> by itself. > >>>> > >>>> With courteous server, by the time OPEN18 runs, there are about 1026 > >>>> courtesy 4.0 clients on the server and all of these clients have opened > >>>> the same file X with WRITE access. These clients were created by the > >>>> previous tests. After each test completed, since 4.0 does not have > >>>> session, the client states are not cleaned up immediately on the > >>>> server and are allowed to become courtesy clients. > >>>> > >>>> When OPEN18 runs (about 20 minutes after the 1st test started), it > >>>> sends OPEN of file X with OPEN4_SHARE_DENY_WRITE which causes the > >>>> server to check for conflicts with courtesy clients. The loop that > >>>> checks 1026 courtesy clients for share/access conflict took less > >>>> than 1 sec. But it took about 55 secs, on my VM, for the server > >>>> to expire all 1026 courtesy clients. > >>>> > >>>> I modified pynfs to configure the 4.0 RPC connection with 60 seconds > >>>> timeout and OPEN18 now consistently passed. 
The 4.0 test results are > >>>> now the same for courteous and non-courteous server: > >>>> > >>>> 8 Skipped, 1 Failed, 0 Warned, 577 Passed > >>>> > >>>> Note that 4.1 tests do not suffer this timeout problem because the > >>>> 4.1 clients and sessions are destroyed after each test completes. > >>> Do you want me to send the patch to increase the timeout for pynfs? > >>> or is there any other things you think we should do? > >> I don't know. > >> > >> 55 seconds to clean up 1026 clients is about 50ms per client, which is > >> pretty slow. I wonder why. I guess it's probably updating the stable > >> storage information. Is /var/lib/nfs/ on your server backed by a hard > >> drive or an SSD or something else? > > > > My server is a virtualbox VM that has 1 CPU, 4GB RAM and 64GB of hard > > disk. I think a production system that supports this many clients should > > have faster CPUs, faster storage. > > > >> > >> I wonder if that's an argument for limiting the number of courtesy > >> clients. > > > > I think we might want to treat 4.0 clients a bit different from 4.1 > > clients. With 4.0, every client will become a courtesy client after > > the client is done with the export and unmounts it. > > It should be safe for a server to purge a client's lease immediately > if there is no open or lock state associated with it. > > When an NFSv4.0 client unmounts, all files should be closed at that > point, so the server can wait for the lease to expire and purge it > normally. Or am I missing something? Makes sense to me! > > Since there is > > no destroy session/client with 4.0, the courteous server allows the > > client to be around and becomes a courtesy client. So after awhile, > > even with normal usage, there will be lots 4.0 courtesy clients > > hanging around and these clients won't be destroyed until 24hrs > > later, or until they cause conflicts with other clients. 
> > > > We can reduce the courtesy_client_expiry time for 4.0 clients from > > 24hrs to 15/20 mins, enough time for most network partitions to heal, > > or limit the number of 4.0 courtesy clients. Or don't support 4.0 > > clients at all, which is my preference since I think in general users > > should skip 4.0 and use 4.1 instead. I'm also totally fine with leaving out 4.0, at least to start. --b. ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH RFC v5 0/2] nfsd: Initial implementation of NFSv4 Courteous Server 2021-11-29 19:13 ` Bruce Fields @ 2021-11-29 19:39 ` dai.ngo 0 siblings, 0 replies; 53+ messages in thread From: dai.ngo @ 2021-11-29 19:39 UTC (permalink / raw) To: Bruce Fields, Chuck Lever III; +Cc: Linux NFS Mailing List, linux-fsdevel On 11/29/21 11:13 AM, Bruce Fields wrote: > On Mon, Nov 29, 2021 at 07:03:12PM +0000, Chuck Lever III wrote: >> Hello Dai! >> >> >>> On Nov 29, 2021, at 1:32 PM, Dai Ngo <dai.ngo@oracle.com> wrote: >>> >>> >>> On 11/29/21 9:30 AM, J. Bruce Fields wrote: >>>> On Mon, Nov 29, 2021 at 09:13:16AM -0800, dai.ngo@oracle.com wrote: >>>>> Hi Bruce, >>>>> >>>>> On 11/21/21 7:04 PM, dai.ngo@oracle.com wrote: >>>>>> On 11/17/21 4:34 PM, J. Bruce Fields wrote: >>>>>>> On Wed, Nov 17, 2021 at 01:46:02PM -0800, dai.ngo@oracle.com wrote: >>>>>>>> On 11/17/21 9:59 AM, dai.ngo@oracle.com wrote: >>>>>>>>> On 11/17/21 6:14 AM, J. Bruce Fields wrote: >>>>>>>>>> On Tue, Nov 16, 2021 at 03:06:32PM -0800, dai.ngo@oracle.com wrote: >>>>>>>>>>> Just a reminder that this patch is still waiting for your review. >>>>>>>>>> Yeah, I was procrastinating and hoping yo'ud figure out the pynfs >>>>>>>>>> failure for me.... >>>>>>>>> Last time I ran 4.0 OPEN18 test by itself and it passed. I will run >>>>>>>>> all OPEN tests together with 5.15-rc7 to see if the problem you've >>>>>>>>> seen still there. >>>>>>>> I ran all tests in nfsv4.1 and nfsv4.0 with courteous and non-courteous >>>>>>>> 5.15-rc7 server. 
>>>>>>>> >>>>>>>> Nfs4.1 results are the same for both courteous and >>>>>>>> non-courteous server: >>>>>>>>> Of those: 0 Skipped, 0 Failed, 0 Warned, 169 Passed >>>>>>>> Results of nfs4.0 with non-courteous server: >>>>>>>>> Of those: 8 Skipped, 1 Failed, 0 Warned, 577 Passed >>>>>>>> test failed: LOCK24 >>>>>>>> >>>>>>>> Results of nfs4.0 with courteous server: >>>>>>>>> Of those: 8 Skipped, 3 Failed, 0 Warned, 575 Passed >>>>>>>> tests failed: LOCK24, OPEN18, OPEN30 >>>>>>>> >>>>>>>> OPEN18 and OPEN30 test pass if each is run by itself. >>>>>>> Could well be a bug in the tests, I don't know. >>>>>> The reason OPEN18 failed was because the test timed out waiting for >>>>>> the reply of an OPEN call. The RPC connection used for the test was >>>>>> configured with 15 secs timeout. Note that OPEN18 only fails when >>>>>> the tests were run with 'all' option, this test passes if it's run >>>>>> by itself. >>>>>> >>>>>> With courteous server, by the time OPEN18 runs, there are about 1026 >>>>>> courtesy 4.0 clients on the server and all of these clients have opened >>>>>> the same file X with WRITE access. These clients were created by the >>>>>> previous tests. After each test completed, since 4.0 does not have >>>>>> session, the client states are not cleaned up immediately on the >>>>>> server and are allowed to become courtesy clients. >>>>>> >>>>>> When OPEN18 runs (about 20 minutes after the 1st test started), it >>>>>> sends OPEN of file X with OPEN4_SHARE_DENY_WRITE which causes the >>>>>> server to check for conflicts with courtesy clients. The loop that >>>>>> checks 1026 courtesy clients for share/access conflict took less >>>>>> than 1 sec. But it took about 55 secs, on my VM, for the server >>>>>> to expire all 1026 courtesy clients. >>>>>> >>>>>> I modified pynfs to configure the 4.0 RPC connection with 60 seconds >>>>>> timeout and OPEN18 now consistently passed. 
The 4.0 test results are >>>>>> now the same for courteous and non-courteous server: >>>>>> >>>>>> 8 Skipped, 1 Failed, 0 Warned, 577 Passed >>>>>> >>>>>> Note that 4.1 tests do not suffer this timeout problem because the >>>>>> 4.1 clients and sessions are destroyed after each test completes. >>>>> Do you want me to send the patch to increase the timeout for pynfs? >>>>> or is there any other things you think we should do? >>>> I don't know. >>>> >>>> 55 seconds to clean up 1026 clients is about 50ms per client, which is >>>> pretty slow. I wonder why. I guess it's probably updating the stable >>>> storage information. Is /var/lib/nfs/ on your server backed by a hard >>>> drive or an SSD or something else? >>> My server is a virtualbox VM that has 1 CPU, 4GB RAM and 64GB of hard >>> disk. I think a production system that supports this many clients should >>> have faster CPUs, faster storage. >>> >>>> I wonder if that's an argument for limiting the number of courtesy >>>> clients. >>> I think we might want to treat 4.0 clients a bit different from 4.1 >>> clients. With 4.0, every client will become a courtesy client after >>> the client is done with the export and unmounts it. >> It should be safe for a server to purge a client's lease immediately >> if there is no open or lock state associated with it. >> >> When an NFSv4.0 client unmounts, all files should be closed at that >> point, so the server can wait for the lease to expire and purge it >> normally. Or am I missing something? > Makes sense to me! > >>> Since there is >>> no destroy session/client with 4.0, the courteous server allows the >>> client to be around and becomes a courtesy client. So after awhile, >>> even with normal usage, there will be lots 4.0 courtesy clients >>> hanging around and these clients won't be destroyed until 24hrs >>> later, or until they cause conflicts with other clients. 
>>> >>> We can reduce the courtesy_client_expiry time for 4.0 clients from >>> 24hrs to 15/20 mins, enough time for most network partitions to heal, >>> or limit the number of 4.0 courtesy clients. Or don't support 4.0 >>> clients at all, which is my preference since I think in general users >>> should skip 4.0 and use 4.1 instead. > I'm also totally fine with leaving out 4.0, at least to start. OK Bruce, I will submit a v6 patch for this. Thanks, -Dai > > --b. ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH RFC v5 0/2] nfsd: Initial implementation of NFSv4 Courteous Server 2021-11-29 19:03 ` Chuck Lever III 2021-11-29 19:13 ` Bruce Fields @ 2021-11-29 19:36 ` dai.ngo 2021-11-29 21:01 ` dai.ngo 2021-11-29 21:10 ` Chuck Lever III 1 sibling, 2 replies; 53+ messages in thread From: dai.ngo @ 2021-11-29 19:36 UTC (permalink / raw) To: Chuck Lever III; +Cc: Bruce Fields, Linux NFS Mailing List, linux-fsdevel On 11/29/21 11:03 AM, Chuck Lever III wrote: > Hello Dai! > > >> On Nov 29, 2021, at 1:32 PM, Dai Ngo <dai.ngo@oracle.com> wrote: >> >> >> On 11/29/21 9:30 AM, J. Bruce Fields wrote: >>> On Mon, Nov 29, 2021 at 09:13:16AM -0800, dai.ngo@oracle.com wrote: >>>> Hi Bruce, >>>> >>>> On 11/21/21 7:04 PM, dai.ngo@oracle.com wrote: >>>>> On 11/17/21 4:34 PM, J. Bruce Fields wrote: >>>>>> On Wed, Nov 17, 2021 at 01:46:02PM -0800, dai.ngo@oracle.com wrote: >>>>>>> On 11/17/21 9:59 AM, dai.ngo@oracle.com wrote: >>>>>>>> On 11/17/21 6:14 AM, J. Bruce Fields wrote: >>>>>>>>> On Tue, Nov 16, 2021 at 03:06:32PM -0800, dai.ngo@oracle.com wrote: >>>>>>>>>> Just a reminder that this patch is still waiting for your review. >>>>>>>>> Yeah, I was procrastinating and hoping yo'ud figure out the pynfs >>>>>>>>> failure for me.... >>>>>>>> Last time I ran 4.0 OPEN18 test by itself and it passed. I will run >>>>>>>> all OPEN tests together with 5.15-rc7 to see if the problem you've >>>>>>>> seen still there. >>>>>>> I ran all tests in nfsv4.1 and nfsv4.0 with courteous and non-courteous >>>>>>> 5.15-rc7 server. 
>>>>>>> >>>>>>> Nfs4.1 results are the same for both courteous and >>>>>>> non-courteous server: >>>>>>>> Of those: 0 Skipped, 0 Failed, 0 Warned, 169 Passed >>>>>>> Results of nfs4.0 with non-courteous server: >>>>>>>> Of those: 8 Skipped, 1 Failed, 0 Warned, 577 Passed >>>>>>> test failed: LOCK24 >>>>>>> >>>>>>> Results of nfs4.0 with courteous server: >>>>>>>> Of those: 8 Skipped, 3 Failed, 0 Warned, 575 Passed >>>>>>> tests failed: LOCK24, OPEN18, OPEN30 >>>>>>> >>>>>>> OPEN18 and OPEN30 test pass if each is run by itself. >>>>>> Could well be a bug in the tests, I don't know. >>>>> The reason OPEN18 failed was because the test timed out waiting for >>>>> the reply of an OPEN call. The RPC connection used for the test was >>>>> configured with 15 secs timeout. Note that OPEN18 only fails when >>>>> the tests were run with 'all' option, this test passes if it's run >>>>> by itself. >>>>> >>>>> With courteous server, by the time OPEN18 runs, there are about 1026 >>>>> courtesy 4.0 clients on the server and all of these clients have opened >>>>> the same file X with WRITE access. These clients were created by the >>>>> previous tests. After each test completed, since 4.0 does not have >>>>> session, the client states are not cleaned up immediately on the >>>>> server and are allowed to become courtesy clients. >>>>> >>>>> When OPEN18 runs (about 20 minutes after the 1st test started), it >>>>> sends OPEN of file X with OPEN4_SHARE_DENY_WRITE which causes the >>>>> server to check for conflicts with courtesy clients. The loop that >>>>> checks 1026 courtesy clients for share/access conflict took less >>>>> than 1 sec. But it took about 55 secs, on my VM, for the server >>>>> to expire all 1026 courtesy clients. >>>>> >>>>> I modified pynfs to configure the 4.0 RPC connection with 60 seconds >>>>> timeout and OPEN18 now consistently passed. 
The 4.0 test results are >>>>> now the same for courteous and non-courteous server: >>>>> >>>>> 8 Skipped, 1 Failed, 0 Warned, 577 Passed >>>>> >>>>> Note that 4.1 tests do not suffer this timeout problem because the >>>>> 4.1 clients and sessions are destroyed after each test completes. >>>> Do you want me to send the patch to increase the timeout for pynfs? >>>> or is there any other things you think we should do? >>> I don't know. >>> >>> 55 seconds to clean up 1026 clients is about 50ms per client, which is >>> pretty slow. I wonder why. I guess it's probably updating the stable >>> storage information. Is /var/lib/nfs/ on your server backed by a hard >>> drive or an SSD or something else? >> My server is a virtualbox VM that has 1 CPU, 4GB RAM and 64GB of hard >> disk. I think a production system that supports this many clients should >> have faster CPUs, faster storage. >> >>> I wonder if that's an argument for limiting the number of courtesy >>> clients. >> I think we might want to treat 4.0 clients a bit different from 4.1 >> clients. With 4.0, every client will become a courtesy client after >> the client is done with the export and unmounts it. > It should be safe for a server to purge a client's lease immediately > if there is no open or lock state associated with it. In this case, each client has opened files so there are open states associated with them. > > When an NFSv4.0 client unmounts, all files should be closed at that > point, I'm not sure pynfs does proper clean up after each subtest, I will check. There must be state associated with the client in order for it to become courtesy client. > so the server can wait for the lease to expire and purge it > normally. Or am I missing something? When 4.0 client lease expires and there are still states associated with the client then the server allows this client to become courtesy client. 
-Dai > > >> Since there is >> no destroy session/client with 4.0, the courteous server allows the >> client to be around and becomes a courtesy client. So after awhile, >> even with normal usage, there will be lots 4.0 courtesy clients >> hanging around and these clients won't be destroyed until 24hrs >> later, or until they cause conflicts with other clients. >> >> We can reduce the courtesy_client_expiry time for 4.0 clients from >> 24hrs to 15/20 mins, enough for most network partition to heal?, >> or limit the number of 4.0 courtesy clients. Or don't support 4.0 >> clients at all which is my preference since I think in general users >> should skip 4.0 and use 4.1 instead. >> >> -Dai > -- > Chuck Lever > > > ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH RFC v5 0/2] nfsd: Initial implementation of NFSv4 Courteous Server 2021-11-29 19:36 ` dai.ngo @ 2021-11-29 21:01 ` dai.ngo 2021-11-29 21:10 ` Chuck Lever III 1 sibling, 0 replies; 53+ messages in thread From: dai.ngo @ 2021-11-29 21:01 UTC (permalink / raw) To: Chuck Lever III; +Cc: Bruce Fields, Linux NFS Mailing List, linux-fsdevel [-- Attachment #1: Type: text/plain, Size: 6240 bytes --] On 11/29/21 11:36 AM, dai.ngo@oracle.com wrote: > > On 11/29/21 11:03 AM, Chuck Lever III wrote: >> Hello Dai! >> >> >>> On Nov 29, 2021, at 1:32 PM, Dai Ngo <dai.ngo@oracle.com> wrote: >>> >>> >>> On 11/29/21 9:30 AM, J. Bruce Fields wrote: >>>> On Mon, Nov 29, 2021 at 09:13:16AM -0800, dai.ngo@oracle.com wrote: >>>>> Hi Bruce, >>>>> >>>>> On 11/21/21 7:04 PM, dai.ngo@oracle.com wrote: >>>>>> On 11/17/21 4:34 PM, J. Bruce Fields wrote: >>>>>>> On Wed, Nov 17, 2021 at 01:46:02PM -0800, dai.ngo@oracle.com wrote: >>>>>>>> On 11/17/21 9:59 AM, dai.ngo@oracle.com wrote: >>>>>>>>> On 11/17/21 6:14 AM, J. Bruce Fields wrote: >>>>>>>>>> On Tue, Nov 16, 2021 at 03:06:32PM -0800, dai.ngo@oracle.com >>>>>>>>>> wrote: >>>>>>>>>>> Just a reminder that this patch is still waiting for your >>>>>>>>>>> review. >>>>>>>>>> Yeah, I was procrastinating and hoping yo'ud figure out the >>>>>>>>>> pynfs >>>>>>>>>> failure for me.... >>>>>>>>> Last time I ran 4.0 OPEN18 test by itself and it passed. I >>>>>>>>> will run >>>>>>>>> all OPEN tests together with 5.15-rc7 to see if the problem >>>>>>>>> you've >>>>>>>>> seen still there. >>>>>>>> I ran all tests in nfsv4.1 and nfsv4.0 with courteous and >>>>>>>> non-courteous >>>>>>>> 5.15-rc7 server. 
>>>>>>>> >>>>>>>> Nfs4.1 results are the same for both courteous and >>>>>>>> non-courteous server: >>>>>>>>> Of those: 0 Skipped, 0 Failed, 0 Warned, 169 Passed >>>>>>>> Results of nfs4.0 with non-courteous server: >>>>>>>>> Of those: 8 Skipped, 1 Failed, 0 Warned, 577 Passed >>>>>>>> test failed: LOCK24 >>>>>>>> >>>>>>>> Results of nfs4.0 with courteous server: >>>>>>>>> Of those: 8 Skipped, 3 Failed, 0 Warned, 575 Passed >>>>>>>> tests failed: LOCK24, OPEN18, OPEN30 >>>>>>>> >>>>>>>> OPEN18 and OPEN30 test pass if each is run by itself. >>>>>>> Could well be a bug in the tests, I don't know. >>>>>> The reason OPEN18 failed was because the test timed out waiting for >>>>>> the reply of an OPEN call. The RPC connection used for the test was >>>>>> configured with 15 secs timeout. Note that OPEN18 only fails when >>>>>> the tests were run with 'all' option, this test passes if it's run >>>>>> by itself. >>>>>> >>>>>> With courteous server, by the time OPEN18 runs, there are about 1026 >>>>>> courtesy 4.0 clients on the server and all of these clients have >>>>>> opened >>>>>> the same file X with WRITE access. These clients were created by the >>>>>> previous tests. After each test completed, since 4.0 does not have >>>>>> session, the client states are not cleaned up immediately on the >>>>>> server and are allowed to become courtesy clients. >>>>>> >>>>>> When OPEN18 runs (about 20 minutes after the 1st test started), it >>>>>> sends OPEN of file X with OPEN4_SHARE_DENY_WRITE which causes the >>>>>> server to check for conflicts with courtesy clients. The loop that >>>>>> checks 1026 courtesy clients for share/access conflict took less >>>>>> than 1 sec. But it took about 55 secs, on my VM, for the server >>>>>> to expire all 1026 courtesy clients. >>>>>> >>>>>> I modified pynfs to configure the 4.0 RPC connection with 60 seconds >>>>>> timeout and OPEN18 now consistently passed. 
The 4.0 test results are >>>>>> now the same for courteous and non-courteous server: >>>>>> >>>>>> 8 Skipped, 1 Failed, 0 Warned, 577 Passed >>>>>> >>>>>> Note that 4.1 tests do not suffer this timeout problem because the >>>>>> 4.1 clients and sessions are destroyed after each test completes. >>>>> Do you want me to send the patch to increase the timeout for pynfs? >>>>> or is there any other things you think we should do? >>>> I don't know. >>>> >>>> 55 seconds to clean up 1026 clients is about 50ms per client, which is >>>> pretty slow. I wonder why. I guess it's probably updating the stable >>>> storage information. Is /var/lib/nfs/ on your server backed by a hard >>>> drive or an SSD or something else? >>> My server is a virtualbox VM that has 1 CPU, 4GB RAM and 64GB of hard >>> disk. I think a production system that supports this many clients >>> should >>> have faster CPUs, faster storage. >>> >>>> I wonder if that's an argument for limiting the number of courtesy >>>> clients. >>> I think we might want to treat 4.0 clients a bit different from 4.1 >>> clients. With 4.0, every client will become a courtesy client after >>> the client is done with the export and unmounts it. >> It should be safe for a server to purge a client's lease immediately >> if there is no open or lock state associated with it. > > In this case, each client has opened files so there are open states > associated with them. > >> >> When an NFSv4.0 client unmounts, all files should be closed at that >> point, > > I'm not sure pynfs does proper clean up after each subtest, I will > check. There must be state associated with the client in order for > it to become courtesy client. The pynfs 4.0 test uses LOOKUP and OPEN with OPEN4_CREATE to create the test file, and PUTFH and REMOVE to remove the test file when done. I don't see where the open state associated with the removed file is freed by nfsd_remove. I guess for 4.0, the open state remains valid on the server until the client lease expires.
I attached the pcap of OPEN18 test for reference. -Dai > >> so the server can wait for the lease to expire and purge it >> normally. Or am I missing something? > > When 4.0 client lease expires and there are still states associated > with the client then the server allows this client to become courtesy > client. > > -Dai > >> >> >>> Since there is >>> no destroy session/client with 4.0, the courteous server allows the >>> client to be around and becomes a courtesy client. So after awhile, >>> even with normal usage, there will be lots 4.0 courtesy clients >>> hanging around and these clients won't be destroyed until 24hrs >>> later, or until they cause conflicts with other clients. >>> >>> We can reduce the courtesy_client_expiry time for 4.0 clients from >>> 24hrs to 15/20 mins, enough for most network partition to heal?, >>> or limit the number of 4.0 courtesy clients. Or don't support 4.0 >>> clients at all which is my preference since I think in general users >>> should skip 4.0 and use 4.1 instead. >>> >>> -Dai >> -- >> Chuck Lever >> >> >> [-- Attachment #2: pynfs_open18_40.pcap --] [-- Type: application/octet-stream, Size: 16244 bytes --] ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH RFC v5 0/2] nfsd: Initial implementation of NFSv4 Courteous Server 2021-11-29 19:36 ` dai.ngo 2021-11-29 21:01 ` dai.ngo @ 2021-11-29 21:10 ` Chuck Lever III 2021-11-30 0:11 ` dai.ngo 1 sibling, 1 reply; 53+ messages in thread From: Chuck Lever III @ 2021-11-29 21:10 UTC (permalink / raw) To: Dai Ngo; +Cc: Bruce Fields, Linux NFS Mailing List, linux-fsdevel > On Nov 29, 2021, at 2:36 PM, Dai Ngo <dai.ngo@oracle.com> wrote: > > > On 11/29/21 11:03 AM, Chuck Lever III wrote: >> Hello Dai! >> >> >>> On Nov 29, 2021, at 1:32 PM, Dai Ngo <dai.ngo@oracle.com> wrote: >>> >>> >>> On 11/29/21 9:30 AM, J. Bruce Fields wrote: >>>> On Mon, Nov 29, 2021 at 09:13:16AM -0800, dai.ngo@oracle.com wrote: >>>>> Hi Bruce, >>>>> >>>>> On 11/21/21 7:04 PM, dai.ngo@oracle.com wrote: >>>>>> On 11/17/21 4:34 PM, J. Bruce Fields wrote: >>>>>>> On Wed, Nov 17, 2021 at 01:46:02PM -0800, dai.ngo@oracle.com wrote: >>>>>>>> On 11/17/21 9:59 AM, dai.ngo@oracle.com wrote: >>>>>>>>> On 11/17/21 6:14 AM, J. Bruce Fields wrote: >>>>>>>>>> On Tue, Nov 16, 2021 at 03:06:32PM -0800, dai.ngo@oracle.com wrote: >>>>>>>>>>> Just a reminder that this patch is still waiting for your review. >>>>>>>>>> Yeah, I was procrastinating and hoping yo'ud figure out the pynfs >>>>>>>>>> failure for me.... >>>>>>>>> Last time I ran 4.0 OPEN18 test by itself and it passed. I will run >>>>>>>>> all OPEN tests together with 5.15-rc7 to see if the problem you've >>>>>>>>> seen still there. >>>>>>>> I ran all tests in nfsv4.1 and nfsv4.0 with courteous and non-courteous >>>>>>>> 5.15-rc7 server. 
>>>>>>>> >>>>>>>> Nfs4.1 results are the same for both courteous and >>>>>>>> non-courteous server: >>>>>>>>> Of those: 0 Skipped, 0 Failed, 0 Warned, 169 Passed >>>>>>>> Results of nfs4.0 with non-courteous server: >>>>>>>>> Of those: 8 Skipped, 1 Failed, 0 Warned, 577 Passed >>>>>>>> test failed: LOCK24 >>>>>>>> >>>>>>>> Results of nfs4.0 with courteous server: >>>>>>>>> Of those: 8 Skipped, 3 Failed, 0 Warned, 575 Passed >>>>>>>> tests failed: LOCK24, OPEN18, OPEN30 >>>>>>>> >>>>>>>> OPEN18 and OPEN30 test pass if each is run by itself. >>>>>>> Could well be a bug in the tests, I don't know. >>>>>> The reason OPEN18 failed was because the test timed out waiting for >>>>>> the reply of an OPEN call. The RPC connection used for the test was >>>>>> configured with 15 secs timeout. Note that OPEN18 only fails when >>>>>> the tests were run with 'all' option, this test passes if it's run >>>>>> by itself. >>>>>> >>>>>> With courteous server, by the time OPEN18 runs, there are about 1026 >>>>>> courtesy 4.0 clients on the server and all of these clients have opened >>>>>> the same file X with WRITE access. These clients were created by the >>>>>> previous tests. After each test completed, since 4.0 does not have >>>>>> session, the client states are not cleaned up immediately on the >>>>>> server and are allowed to become courtesy clients. >>>>>> >>>>>> When OPEN18 runs (about 20 minutes after the 1st test started), it >>>>>> sends OPEN of file X with OPEN4_SHARE_DENY_WRITE which causes the >>>>>> server to check for conflicts with courtesy clients. The loop that >>>>>> checks 1026 courtesy clients for share/access conflict took less >>>>>> than 1 sec. But it took about 55 secs, on my VM, for the server >>>>>> to expire all 1026 courtesy clients. >>>>>> >>>>>> I modified pynfs to configure the 4.0 RPC connection with 60 seconds >>>>>> timeout and OPEN18 now consistently passed. 
The 4.0 test results are >>>>>> now the same for courteous and non-courteous server: >>>>>> >>>>>> 8 Skipped, 1 Failed, 0 Warned, 577 Passed >>>>>> >>>>>> Note that 4.1 tests do not suffer this timeout problem because the >>>>>> 4.1 clients and sessions are destroyed after each test completes. >>>>> Do you want me to send the patch to increase the timeout for pynfs? >>>>> or is there any other things you think we should do? >>>> I don't know. >>>> >>>> 55 seconds to clean up 1026 clients is about 50ms per client, which is >>>> pretty slow. I wonder why. I guess it's probably updating the stable >>>> storage information. Is /var/lib/nfs/ on your server backed by a hard >>>> drive or an SSD or something else? >>> My server is a virtualbox VM that has 1 CPU, 4GB RAM and 64GB of hard >>> disk. I think a production system that supports this many clients should >>> have faster CPUs, faster storage. >>> >>>> I wonder if that's an argument for limiting the number of courtesy >>>> clients. >>> I think we might want to treat 4.0 clients a bit different from 4.1 >>> clients. With 4.0, every client will become a courtesy client after >>> the client is done with the export and unmounts it. >> It should be safe for a server to purge a client's lease immediately >> if there is no open or lock state associated with it. > > In this case, each client has opened files so there are open states > associated with them. > >> >> When an NFSv4.0 client unmounts, all files should be closed at that >> point, > > I'm not sure pynfs does proper clean up after each subtest, I will > check. There must be state associated with the client in order for > it to become courtesy client. Makes sense. Then a synthetic client like pynfs can DoS a courteous server. >> so the server can wait for the lease to expire and purge it >> normally. Or am I missing something? 
> > When 4.0 client lease expires and there are still states associated > with the client then the server allows this client to become courtesy > client. I think the same thing happens if an NFSv4.1 client neglects to send DESTROY_SESSION / DESTROY_CLIENTID. Either such a client is broken or malicious, but the server faces the same issue of protecting itself from a DoS attack. IMO you should consider limiting the number of courteous clients the server can hold onto. Let's say that number is 1000. When the server wants to turn a 1001st client into a courteous client, it can simply expire and purge the oldest courteous client on its list. Otherwise, over time, the 24-hour expiry will reduce the set of courteous clients back to zero. What do you think? >>> Since there is >>> no destroy session/client with 4.0, the courteous server allows the >>> client to be around and becomes a courtesy client. So after awhile, >>> even with normal usage, there will be lots 4.0 courtesy clients >>> hanging around and these clients won't be destroyed until 24hrs >>> later, or until they cause conflicts with other clients. >>> >>> We can reduce the courtesy_client_expiry time for 4.0 clients from >>> 24hrs to 15/20 mins, enough for most network partition to heal?, >>> or limit the number of 4.0 courtesy clients. Or don't support 4.0 >>> clients at all which is my preference since I think in general users >>> should skip 4.0 and use 4.1 instead. >>> >>> -Dai >> -- >> Chuck Lever >> >> >> -- Chuck Lever ^ permalink raw reply [flat|nested] 53+ messages in thread
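The capping scheme proposed above can be sketched as a bounded, age-ordered collection. This is a hypothetical illustration in Python, not nfsd code: admitting a client beyond the limit first expires the oldest courtesy client, so the list can never grow past the cap.

```python
from collections import OrderedDict

class CourtesyList:
    """Bounded collection of courtesy clients with oldest-first eviction (sketch)."""

    def __init__(self, limit):
        self.limit = limit
        self.clients = OrderedDict()   # insertion order doubles as age order

    def admit(self, client_id, expire_cb):
        """Make client_id a courtesy client, evicting the oldest if full."""
        if len(self.clients) >= self.limit:
            oldest_id, _ = self.clients.popitem(last=False)
            expire_cb(oldest_id)       # purge the oldest courtesy client
        self.clients[client_id] = True
```

With a limit of 1000, a 1001st admission expires the oldest entry, bounding the worst-case cleanup work that a burst of synthetic clients (like the pynfs run) can pile up.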
* Re: [PATCH RFC v5 0/2] nfsd: Initial implementation of NFSv4 Courteous Server 2021-11-29 21:10 ` Chuck Lever III @ 2021-11-30 0:11 ` dai.ngo 2021-11-30 1:42 ` Chuck Lever III 0 siblings, 1 reply; 53+ messages in thread From: dai.ngo @ 2021-11-30 0:11 UTC (permalink / raw) To: Chuck Lever III; +Cc: Bruce Fields, Linux NFS Mailing List, linux-fsdevel On 11/29/21 1:10 PM, Chuck Lever III wrote: > >> On Nov 29, 2021, at 2:36 PM, Dai Ngo <dai.ngo@oracle.com> wrote: >> >> >> On 11/29/21 11:03 AM, Chuck Lever III wrote: >>> Hello Dai! >>> >>> >>>> On Nov 29, 2021, at 1:32 PM, Dai Ngo <dai.ngo@oracle.com> wrote: >>>> >>>> >>>> On 11/29/21 9:30 AM, J. Bruce Fields wrote: >>>>> On Mon, Nov 29, 2021 at 09:13:16AM -0800, dai.ngo@oracle.com wrote: >>>>>> Hi Bruce, >>>>>> >>>>>> On 11/21/21 7:04 PM, dai.ngo@oracle.com wrote: >>>>>>> On 11/17/21 4:34 PM, J. Bruce Fields wrote: >>>>>>>> On Wed, Nov 17, 2021 at 01:46:02PM -0800, dai.ngo@oracle.com wrote: >>>>>>>>> On 11/17/21 9:59 AM, dai.ngo@oracle.com wrote: >>>>>>>>>> On 11/17/21 6:14 AM, J. Bruce Fields wrote: >>>>>>>>>>> On Tue, Nov 16, 2021 at 03:06:32PM -0800, dai.ngo@oracle.com wrote: >>>>>>>>>>>> Just a reminder that this patch is still waiting for your review. >>>>>>>>>>> Yeah, I was procrastinating and hoping yo'ud figure out the pynfs >>>>>>>>>>> failure for me.... >>>>>>>>>> Last time I ran 4.0 OPEN18 test by itself and it passed. I will run >>>>>>>>>> all OPEN tests together with 5.15-rc7 to see if the problem you've >>>>>>>>>> seen still there. >>>>>>>>> I ran all tests in nfsv4.1 and nfsv4.0 with courteous and non-courteous >>>>>>>>> 5.15-rc7 server. 
>>>>>>>>> >>>>>>>>> Nfs4.1 results are the same for both courteous and >>>>>>>>> non-courteous server: >>>>>>>>>> Of those: 0 Skipped, 0 Failed, 0 Warned, 169 Passed >>>>>>>>> Results of nfs4.0 with non-courteous server: >>>>>>>>>> Of those: 8 Skipped, 1 Failed, 0 Warned, 577 Passed >>>>>>>>> test failed: LOCK24 >>>>>>>>> >>>>>>>>> Results of nfs4.0 with courteous server: >>>>>>>>>> Of those: 8 Skipped, 3 Failed, 0 Warned, 575 Passed >>>>>>>>> tests failed: LOCK24, OPEN18, OPEN30 >>>>>>>>> >>>>>>>>> OPEN18 and OPEN30 test pass if each is run by itself. >>>>>>>> Could well be a bug in the tests, I don't know. >>>>>>> The reason OPEN18 failed was because the test timed out waiting for >>>>>>> the reply of an OPEN call. The RPC connection used for the test was >>>>>>> configured with 15 secs timeout. Note that OPEN18 only fails when >>>>>>> the tests were run with 'all' option, this test passes if it's run >>>>>>> by itself. >>>>>>> >>>>>>> With courteous server, by the time OPEN18 runs, there are about 1026 >>>>>>> courtesy 4.0 clients on the server and all of these clients have opened >>>>>>> the same file X with WRITE access. These clients were created by the >>>>>>> previous tests. After each test completed, since 4.0 does not have >>>>>>> session, the client states are not cleaned up immediately on the >>>>>>> server and are allowed to become courtesy clients. >>>>>>> >>>>>>> When OPEN18 runs (about 20 minutes after the 1st test started), it >>>>>>> sends OPEN of file X with OPEN4_SHARE_DENY_WRITE which causes the >>>>>>> server to check for conflicts with courtesy clients. The loop that >>>>>>> checks 1026 courtesy clients for share/access conflict took less >>>>>>> than 1 sec. But it took about 55 secs, on my VM, for the server >>>>>>> to expire all 1026 courtesy clients. >>>>>>> >>>>>>> I modified pynfs to configure the 4.0 RPC connection with 60 seconds >>>>>>> timeout and OPEN18 now consistently passed. 
The 4.0 test results are >>>>>>> now the same for courteous and non-courteous server: >>>>>>> >>>>>>> 8 Skipped, 1 Failed, 0 Warned, 577 Passed >>>>>>> >>>>>>> Note that 4.1 tests do not suffer this timeout problem because the >>>>>>> 4.1 clients and sessions are destroyed after each test completes. >>>>>> Do you want me to send the patch to increase the timeout for pynfs? >>>>>> or is there any other things you think we should do? >>>>> I don't know. >>>>> >>>>> 55 seconds to clean up 1026 clients is about 50ms per client, which is >>>>> pretty slow. I wonder why. I guess it's probably updating the stable >>>>> storage information. Is /var/lib/nfs/ on your server backed by a hard >>>>> drive or an SSD or something else? >>>> My server is a virtualbox VM that has 1 CPU, 4GB RAM and 64GB of hard >>>> disk. I think a production system that supports this many clients should >>>> have faster CPUs, faster storage. >>>> >>>>> I wonder if that's an argument for limiting the number of courtesy >>>>> clients. >>>> I think we might want to treat 4.0 clients a bit different from 4.1 >>>> clients. With 4.0, every client will become a courtesy client after >>>> the client is done with the export and unmounts it. >>> It should be safe for a server to purge a client's lease immediately >>> if there is no open or lock state associated with it. >> In this case, each client has opened files so there are open states >> associated with them. >> >>> When an NFSv4.0 client unmounts, all files should be closed at that >>> point, >> I'm not sure pynfs does proper clean up after each subtest, I will >> check. There must be state associated with the client in order for >> it to become courtesy client. > Makes sense. Then a synthetic client like pynfs can DoS a courteous > server. > > >>> so the server can wait for the lease to expire and purge it >>> normally. Or am I missing something? 
>> When 4.0 client lease expires and there are still states associated >> with the client then the server allows this client to become courtesy >> client. > I think the same thing happens if an NFSv4.1 client neglects to send > DESTROY_SESSION / DESTROY_CLIENTID. Either such a client is broken > or malicious, but the server faces the same issue of protecting > itself from a DoS attack. > > IMO you should consider limiting the number of courteous clients > the server can hold onto. Let's say that number is 1000. When the > server wants to turn a 1001st client into a courteous client, it > can simply expire and purge the oldest courteous client on its > list. Otherwise, over time, the 24-hour expiry will reduce the > set of courteous clients back to zero. > > What do you think? Limiting the number of courteous clients to handle the cases of broken/malicious 4.1 clients seems reasonable as a last resort. I think if a malicious 4.1 client could mount the server's export, open a file (to create state), and repeat the same with a different client id, then some basic security was already broken: unauthorized clients were allowed to mount the server's exports. I think if we have to enforce a limit, then it's only for handling seriously buggy 4.1 clients, which should not be the norm. The issue with this is how to pick an optimal number that is suitable for the running server, which can be a very slow or a very fast server. Note that even if we impose a limit, that does not completely solve the problem with the pynfs 4.0 test, since its RPC timeout is configured with 15 secs, which is just enough to expire 277 clients at 53ms each, unless we limit it to ~270 clients, which I think is too low. This is what I plan to do: 1. do not support 4.0 courteous clients, for sure. 2. limit the number of courteous clients to 1000 (?), if you still think we need it. Pls let me know what you think. 
Thanks, -Dai > > >>>> Since there is >>>> no destroy session/client with 4.0, the courteous server allows the >>>> client to be around and becomes a courtesy client. So after awhile, >>>> even with normal usage, there will be lots 4.0 courtesy clients >>>> hanging around and these clients won't be destroyed until 24hrs >>>> later, or until they cause conflicts with other clients. >>>> >>>> We can reduce the courtesy_client_expiry time for 4.0 clients from >>>> 24hrs to 15/20 mins, enough for most network partition to heal?, >>>> or limit the number of 4.0 courtesy clients. Or don't support 4.0 >>>> clients at all which is my preference since I think in general users >>>> should skip 4.0 and use 4.1 instead. >>>> >>>> -Dai >>> -- >>> Chuck Lever >>> >>> >>> > -- > Chuck Lever > > > ^ permalink raw reply [flat|nested] 53+ messages in thread
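[Editorial note: the timing argument above can be checked with quick arithmetic; the ~53 ms per client and 277-client figures quoted come out slightly differently depending on rounding.]

```python
# Figures from the discussion: expiring 1026 courtesy 4.0 clients
# took about 55 seconds on the test VM.
PER_CLIENT = 55.0 / 1026            # ~54 ms to expire one client
RPC_TIMEOUT = 15.0                  # pynfs 4.0 RPC timeout, in seconds

# How many courtesy clients can the server expire before the OPEN
# that triggered the purge times out on the client side?
max_clients = int(RPC_TIMEOUT / PER_CLIENT)
print(f"{PER_CLIENT * 1000:.0f} ms/client, ~{max_clients} clients in 15 s")
```

So a cap large enough to be useful (say 1000) is still far past what a 15-second RPC timeout tolerates during a synchronous purge, which is why the pynfs timeout had to be raised independently of any limit.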
* Re: [PATCH RFC v5 0/2] nfsd: Initial implementation of NFSv4 Courteous Server 2021-11-30 0:11 ` dai.ngo @ 2021-11-30 1:42 ` Chuck Lever III 2021-11-30 4:08 ` Trond Myklebust 2021-11-30 7:13 ` dai.ngo 0 siblings, 2 replies; 53+ messages in thread From: Chuck Lever III @ 2021-11-30 1:42 UTC (permalink / raw) To: Dai Ngo; +Cc: Bruce Fields, Linux NFS Mailing List, linux-fsdevel > On Nov 29, 2021, at 7:11 PM, Dai Ngo <dai.ngo@oracle.com> wrote: > > >> On 11/29/21 1:10 PM, Chuck Lever III wrote: >> >>>> On Nov 29, 2021, at 2:36 PM, Dai Ngo <dai.ngo@oracle.com> wrote: >>> >>> >>> On 11/29/21 11:03 AM, Chuck Lever III wrote: >>>> Hello Dai! >>>> >>>> >>>>> On Nov 29, 2021, at 1:32 PM, Dai Ngo <dai.ngo@oracle.com> wrote: >>>>> >>>>> >>>>> On 11/29/21 9:30 AM, J. Bruce Fields wrote: >>>>>> On Mon, Nov 29, 2021 at 09:13:16AM -0800, dai.ngo@oracle.com wrote: >>>>>>> Hi Bruce, >>>>>>> >>>>>>> On 11/21/21 7:04 PM, dai.ngo@oracle.com wrote: >>>>>>>> On 11/17/21 4:34 PM, J. Bruce Fields wrote: >>>>>>>>> On Wed, Nov 17, 2021 at 01:46:02PM -0800, dai.ngo@oracle.com wrote: >>>>>>>>>> On 11/17/21 9:59 AM, dai.ngo@oracle.com wrote: >>>>>>>>>>> On 11/17/21 6:14 AM, J. Bruce Fields wrote: >>>>>>>>>>>> On Tue, Nov 16, 2021 at 03:06:32PM -0800, dai.ngo@oracle.com wrote: >>>>>>>>>>>>> Just a reminder that this patch is still waiting for your review. >>>>>>>>>>>> Yeah, I was procrastinating and hoping yo'ud figure out the pynfs >>>>>>>>>>>> failure for me.... >>>>>>>>>>> Last time I ran 4.0 OPEN18 test by itself and it passed. I will run >>>>>>>>>>> all OPEN tests together with 5.15-rc7 to see if the problem you've >>>>>>>>>>> seen still there. >>>>>>>>>> I ran all tests in nfsv4.1 and nfsv4.0 with courteous and non-courteous >>>>>>>>>> 5.15-rc7 server. 
>>>>>>>>>> >>>>>>>>>> Nfs4.1 results are the same for both courteous and >>>>>>>>>> non-courteous server: >>>>>>>>>>> Of those: 0 Skipped, 0 Failed, 0 Warned, 169 Passed >>>>>>>>>> Results of nfs4.0 with non-courteous server: >>>>>>>>>>> Of those: 8 Skipped, 1 Failed, 0 Warned, 577 Passed >>>>>>>>>> test failed: LOCK24 >>>>>>>>>> >>>>>>>>>> Results of nfs4.0 with courteous server: >>>>>>>>>>> Of those: 8 Skipped, 3 Failed, 0 Warned, 575 Passed >>>>>>>>>> tests failed: LOCK24, OPEN18, OPEN30 >>>>>>>>>> >>>>>>>>>> OPEN18 and OPEN30 test pass if each is run by itself. >>>>>>>>> Could well be a bug in the tests, I don't know. >>>>>>>> The reason OPEN18 failed was because the test timed out waiting for >>>>>>>> the reply of an OPEN call. The RPC connection used for the test was >>>>>>>> configured with 15 secs timeout. Note that OPEN18 only fails when >>>>>>>> the tests were run with 'all' option, this test passes if it's run >>>>>>>> by itself. >>>>>>>> >>>>>>>> With courteous server, by the time OPEN18 runs, there are about 1026 >>>>>>>> courtesy 4.0 clients on the server and all of these clients have opened >>>>>>>> the same file X with WRITE access. These clients were created by the >>>>>>>> previous tests. After each test completed, since 4.0 does not have >>>>>>>> session, the client states are not cleaned up immediately on the >>>>>>>> server and are allowed to become courtesy clients. >>>>>>>> >>>>>>>> When OPEN18 runs (about 20 minutes after the 1st test started), it >>>>>>>> sends OPEN of file X with OPEN4_SHARE_DENY_WRITE which causes the >>>>>>>> server to check for conflicts with courtesy clients. The loop that >>>>>>>> checks 1026 courtesy clients for share/access conflict took less >>>>>>>> than 1 sec. But it took about 55 secs, on my VM, for the server >>>>>>>> to expire all 1026 courtesy clients. >>>>>>>> >>>>>>>> I modified pynfs to configure the 4.0 RPC connection with 60 seconds >>>>>>>> timeout and OPEN18 now consistently passed. 
The 4.0 test results are >>>>>>>> now the same for courteous and non-courteous server: >>>>>>>> >>>>>>>> 8 Skipped, 1 Failed, 0 Warned, 577 Passed >>>>>>>> >>>>>>>> Note that 4.1 tests do not suffer this timeout problem because the >>>>>>>> 4.1 clients and sessions are destroyed after each test completes. >>>>>>> Do you want me to send the patch to increase the timeout for pynfs? >>>>>>> or is there any other things you think we should do? >>>>>> I don't know. >>>>>> >>>>>> 55 seconds to clean up 1026 clients is about 50ms per client, which is >>>>>> pretty slow. I wonder why. I guess it's probably updating the stable >>>>>> storage information. Is /var/lib/nfs/ on your server backed by a hard >>>>>> drive or an SSD or something else? >>>>> My server is a virtualbox VM that has 1 CPU, 4GB RAM and 64GB of hard >>>>> disk. I think a production system that supports this many clients should >>>>> have faster CPUs, faster storage. >>>>> >>>>>> I wonder if that's an argument for limiting the number of courtesy >>>>>> clients. >>>>> I think we might want to treat 4.0 clients a bit different from 4.1 >>>>> clients. With 4.0, every client will become a courtesy client after >>>>> the client is done with the export and unmounts it. >>>> It should be safe for a server to purge a client's lease immediately >>>> if there is no open or lock state associated with it. >>> In this case, each client has opened files so there are open states >>> associated with them. >>> >>>> When an NFSv4.0 client unmounts, all files should be closed at that >>>> point, >>> I'm not sure pynfs does proper clean up after each subtest, I will >>> check. There must be state associated with the client in order for >>> it to become courtesy client. >> Makes sense. Then a synthetic client like pynfs can DoS a courteous >> server. >> >> >>>> so the server can wait for the lease to expire and purge it >>>> normally. Or am I missing something? 
>>> When 4.0 client lease expires and there are still states associated >>> with the client then the server allows this client to become courtesy >>> client. >> I think the same thing happens if an NFSv4.1 client neglects to send >> DESTROY_SESSION / DESTROY_CLIENTID. Either such a client is broken >> or malicious, but the server faces the same issue of protecting >> itself from a DoS attack. >> >> IMO you should consider limiting the number of courteous clients >> the server can hold onto. Let's say that number is 1000. When the >> server wants to turn a 1001st client into a courteous client, it >> can simply expire and purge the oldest courteous client on its >> list. Otherwise, over time, the 24-hour expiry will reduce the >> set of courteous clients back to zero. >> >> What do you think? > > Limiting the number of courteous clients to handle the cases of > broken/malicious 4.1 clients seems reasonable as the last resort. > > I think if a malicious 4.1 clients could mount the server's export, > opens a file (to create state) and repeats the same with a different > client id then it seems like some basic security was already broken; > allowing unauthorized clients to mount server's exports. You can do this today with AUTH_SYS. I consider it a genuine attack surface. > I think if we have to enforce a limit, then it's only for handling > of seriously buggy 4.1 clients which should not be the norm. The > issue with this is how to pick an optimal number that is suitable > for the running server which can be a very slow or a very fast server. > > Note that even if we impose an limit, that does not completely solve > the problem with pynfs 4.0 test since its RPC timeout is configured > with 15 secs which just enough to expire 277 clients based on 53ms > for each client, unless we limit it ~270 clients which I think it's > too low. > > This is what I plan to do: > > 1. do not support 4.0 courteous clients, for sure. Not supporting 4.0 isn’t an option, IMHO. 
It is a fully supported protocol at this time, and the same exposure exists for 4.1, it’s just a little harder to exploit. If you submit the courteous server patch without support for 4.0, I think it needs to include a plan for how 4.0 will be added later. > 2. limit the number of courteous clients to 1000 (?), if you still > think we need it. I think this limit is necessary. It can be set based on the server’s physical memory size if a dynamic limit is desired. > Pls let me know what you think. > > Thanks, > -Dai > >> >> >>>>> Since there is >>>>> no destroy session/client with 4.0, the courteous server allows the >>>>> client to be around and becomes a courtesy client. So after awhile, >>>>> even with normal usage, there will be lots 4.0 courtesy clients >>>>> hanging around and these clients won't be destroyed until 24hrs >>>>> later, or until they cause conflicts with other clients. >>>>> >>>>> We can reduce the courtesy_client_expiry time for 4.0 clients from >>>>> 24hrs to 15/20 mins, enough for most network partition to heal?, >>>>> or limit the number of 4.0 courtesy clients. Or don't support 4.0 >>>>> clients at all which is my preference since I think in general users >>>>> should skip 4.0 and use 4.1 instead. >>>>> >>>>> -Dai >>>> -- >>>> Chuck Lever >>>> >>>> >>>> >> -- >> Chuck Lever >> >> >> ^ permalink raw reply [flat|nested] 53+ messages in thread
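[Editorial note: a memory-scaled cap like the one suggested above could look like the sketch below. The divisor (one courtesy-client slot per 8 MB of RAM) and the clamp bounds are made-up values for illustration; whatever heuristic nfsd adopts may differ.]

```python
def courtesy_client_limit(total_ram_bytes: int,
                          floor: int = 64,
                          ceiling: int = 64 * 1024) -> int:
    """Scale the courtesy-client cap with physical memory:
    one slot per 8 MB of RAM, clamped to [floor, ceiling] so that
    tiny and huge machines both get a sane bound."""
    limit = total_ram_bytes // (8 * 1024 * 1024)
    return max(floor, min(ceiling, limit))
```

Under these illustrative constants, the 4 GB test VM from earlier in the thread would get a cap of 512 courtesy clients.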
* Re: [PATCH RFC v5 0/2] nfsd: Initial implementation of NFSv4 Courteous Server 2021-11-30 1:42 ` Chuck Lever III @ 2021-11-30 4:08 ` Trond Myklebust 2021-11-30 4:47 ` Chuck Lever III 2021-11-30 7:13 ` dai.ngo 1 sibling, 1 reply; 53+ messages in thread From: Trond Myklebust @ 2021-11-30 4:08 UTC (permalink / raw) To: dai.ngo, chuck.lever; +Cc: bfields, linux-nfs, linux-fsdevel On Tue, 2021-11-30 at 01:42 +0000, Chuck Lever III wrote: > > > On Nov 29, 2021, at 7:11 PM, Dai Ngo <dai.ngo@oracle.com> wrote: > > > > > > > On 11/29/21 1:10 PM, Chuck Lever III wrote: > > > > > > > > On Nov 29, 2021, at 2:36 PM, Dai Ngo <dai.ngo@oracle.com> > > > > > wrote: > > > > > > > > > > > > On 11/29/21 11:03 AM, Chuck Lever III wrote: > > > > > Hello Dai! > > > > > > > > > > > > > > > > On Nov 29, 2021, at 1:32 PM, Dai Ngo <dai.ngo@oracle.com> > > > > > > wrote: > > > > > > > > > > > > > > > > > > On 11/29/21 9:30 AM, J. Bruce Fields wrote: > > > > > > > On Mon, Nov 29, 2021 at 09:13:16AM -0800, > > > > > > > dai.ngo@oracle.com wrote: > > > > > > > > Hi Bruce, > > > > > > > > > > > > > > > > On 11/21/21 7:04 PM, dai.ngo@oracle.com wrote: > > > > > > > > > On 11/17/21 4:34 PM, J. Bruce Fields wrote: > > > > > > > > > > On Wed, Nov 17, 2021 at 01:46:02PM -0800, > > > > > > > > > > dai.ngo@oracle.com wrote: > > > > > > > > > > > On 11/17/21 9:59 AM, dai.ngo@oracle.com wrote: > > > > > > > > > > > > On 11/17/21 6:14 AM, J. Bruce Fields wrote: > > > > > > > > > > > > > On Tue, Nov 16, 2021 at 03:06:32PM -0800, > > > > > > > > > > > > > dai.ngo@oracle.com wrote: > > > > > > > > > > > > > > Just a reminder that this patch is still > > > > > > > > > > > > > > waiting for your review. > > > > > > > > > > > > > Yeah, I was procrastinating and hoping yo'ud > > > > > > > > > > > > > figure out the pynfs > > > > > > > > > > > > > failure for me.... > > > > > > > > > > > > Last time I ran 4.0 OPEN18 test by itself and > > > > > > > > > > > > it passed. 
I will run > > > > > > > > > > > > all OPEN tests together with 5.15-rc7 to see if > > > > > > > > > > > > the problem you've > > > > > > > > > > > > seen still there. > > > > > > > > > > > I ran all tests in nfsv4.1 and nfsv4.0 with > > > > > > > > > > > courteous and non-courteous > > > > > > > > > > > 5.15-rc7 server. > > > > > > > > > > > > > > > > > > > > > > Nfs4.1 results are the same for both courteous > > > > > > > > > > > and > > > > > > > > > > > non-courteous server: > > > > > > > > > > > > Of those: 0 Skipped, 0 Failed, 0 Warned, 169 > > > > > > > > > > > > Passed > > > > > > > > > > > Results of nfs4.0 with non-courteous server: > > > > > > > > > > > > Of those: 8 Skipped, 1 Failed, 0 Warned, 577 > > > > > > > > > > > > Passed > > > > > > > > > > > test failed: LOCK24 > > > > > > > > > > > > > > > > > > > > > > Results of nfs4.0 with courteous server: > > > > > > > > > > > > Of those: 8 Skipped, 3 Failed, 0 Warned, 575 > > > > > > > > > > > > Passed > > > > > > > > > > > tests failed: LOCK24, OPEN18, OPEN30 > > > > > > > > > > > > > > > > > > > > > > OPEN18 and OPEN30 test pass if each is run by > > > > > > > > > > > itself. > > > > > > > > > > Could well be a bug in the tests, I don't know. > > > > > > > > > The reason OPEN18 failed was because the test timed > > > > > > > > > out waiting for > > > > > > > > > the reply of an OPEN call. The RPC connection used > > > > > > > > > for the test was > > > > > > > > > configured with 15 secs timeout. Note that OPEN18 > > > > > > > > > only fails when > > > > > > > > > the tests were run with 'all' option, this test > > > > > > > > > passes if it's run > > > > > > > > > by itself. > > > > > > > > > > > > > > > > > > With courteous server, by the time OPEN18 runs, there > > > > > > > > > are about 1026 > > > > > > > > > courtesy 4.0 clients on the server and all of these > > > > > > > > > clients have opened > > > > > > > > > the same file X with WRITE access. 
These clients were > > > > > > > > > created by the > > > > > > > > > previous tests. After each test completed, since 4.0 > > > > > > > > > does not have > > > > > > > > > session, the client states are not cleaned up > > > > > > > > > immediately on the > > > > > > > > > server and are allowed to become courtesy clients. > > > > > > > > > > > > > > > > > > When OPEN18 runs (about 20 minutes after the 1st test > > > > > > > > > started), it > > > > > > > > > sends OPEN of file X with OPEN4_SHARE_DENY_WRITE > > > > > > > > > which causes the > > > > > > > > > server to check for conflicts with courtesy clients. > > > > > > > > > The loop that > > > > > > > > > checks 1026 courtesy clients for share/access > > > > > > > > > conflict took less > > > > > > > > > than 1 sec. But it took about 55 secs, on my VM, for > > > > > > > > > the server > > > > > > > > > to expire all 1026 courtesy clients. > > > > > > > > > > > > > > > > > > I modified pynfs to configure the 4.0 RPC connection > > > > > > > > > with 60 seconds > > > > > > > > > timeout and OPEN18 now consistently passed. The 4.0 > > > > > > > > > test results are > > > > > > > > > now the same for courteous and non-courteous server: > > > > > > > > > > > > > > > > > > 8 Skipped, 1 Failed, 0 Warned, 577 Passed > > > > > > > > > > > > > > > > > > Note that 4.1 tests do not suffer this timeout > > > > > > > > > problem because the > > > > > > > > > 4.1 clients and sessions are destroyed after each > > > > > > > > > test completes. > > > > > > > > Do you want me to send the patch to increase the > > > > > > > > timeout for pynfs? > > > > > > > > or is there any other things you think we should do? > > > > > > > I don't know. > > > > > > > > > > > > > > 55 seconds to clean up 1026 clients is about 50ms per > > > > > > > client, which is > > > > > > > pretty slow. I wonder why. I guess it's probably > > > > > > > updating the stable > > > > > > > storage information. 
Is /var/lib/nfs/ on your server > > > > > > > backed by a hard > > > > > > > drive or an SSD or something else? > > > > > > My server is a virtualbox VM that has 1 CPU, 4GB RAM and > > > > > > 64GB of hard > > > > > > disk. I think a production system that supports this many > > > > > > clients should > > > > > > have faster CPUs, faster storage. > > > > > > > > > > > > > I wonder if that's an argument for limiting the number of > > > > > > > courtesy > > > > > > > clients. > > > > > > I think we might want to treat 4.0 clients a bit different > > > > > > from 4.1 > > > > > > clients. With 4.0, every client will become a courtesy > > > > > > client after > > > > > > the client is done with the export and unmounts it. > > > > > It should be safe for a server to purge a client's lease > > > > > immediately > > > > > if there is no open or lock state associated with it. > > > > In this case, each client has opened files so there are open > > > > states > > > > associated with them. > > > > > > > > > When an NFSv4.0 client unmounts, all files should be closed > > > > > at that > > > > > point, > > > > I'm not sure pynfs does proper clean up after each subtest, I > > > > will > > > > check. There must be state associated with the client in order > > > > for > > > > it to become courtesy client. > > > Makes sense. Then a synthetic client like pynfs can DoS a > > > courteous > > > server. > > > > > > > > > > > so the server can wait for the lease to expire and purge it > > > > > normally. Or am I missing something? > > > > When 4.0 client lease expires and there are still states > > > > associated > > > > with the client then the server allows this client to become > > > > courtesy > > > > client. > > > I think the same thing happens if an NFSv4.1 client neglects to > > > send > > > DESTROY_SESSION / DESTROY_CLIENTID. Either such a client is > > > broken > > > or malicious, but the server faces the same issue of protecting > > > itself from a DoS attack. 
> > > > > > IMO you should consider limiting the number of courteous clients > > > the server can hold onto. Let's say that number is 1000. When the > > > server wants to turn a 1001st client into a courteous client, it > > > can simply expire and purge the oldest courteous client on its > > > list. Otherwise, over time, the 24-hour expiry will reduce the > > > set of courteous clients back to zero. > > > > > > What do you think? > > > > Limiting the number of courteous clients to handle the cases of > > broken/malicious 4.1 clients seems reasonable as the last resort. > > > > I think if a malicious 4.1 clients could mount the server's export, > > opens a file (to create state) and repeats the same with a > > different > > client id then it seems like some basic security was already > > broken; > > allowing unauthorized clients to mount server's exports. > > You can do this today with AUTH_SYS. I consider it a genuine attack > surface. > > > > I think if we have to enforce a limit, then it's only for handling > > of seriously buggy 4.1 clients which should not be the norm. The > > issue with this is how to pick an optimal number that is suitable > > for the running server which can be a very slow or a very fast > > server. > > > > Note that even if we impose an limit, that does not completely > > solve > > the problem with pynfs 4.0 test since its RPC timeout is configured > > with 15 secs which just enough to expire 277 clients based on 53ms > > for each client, unless we limit it ~270 clients which I think it's > > too low. > > > > This is what I plan to do: > > > > 1. do not support 4.0 courteous clients, for sure. > > Not supporting 4.0 isn’t an option, IMHO. It is a fully supported > protocol at this time, and the same exposure exists for 4.1, it’s > just a little harder to exploit. > > If you submit the courteous server patch without support for 4.0, I > think it needs to include a plan for how 4.0 will be added later. > > > Why is there a problem here? 
The requirements are the same for 4.0 and 4.1 (or 4.2). If the lease under which the courtesy lock was established has expired, then that courtesy lock must be released if some other client requests a lock that conflicts with the cached lock (unless the client breaks the courtesy framework by renewing that original lease before the conflict occurs). Otherwise, it is completely up to the server when it decides to actually release the lock. For NFSv4.1 and NFSv4.2, we have DESTROY_CLIENTID, which tells the server when the client is actually done with the lease, making it easy to determine when it is safe to release all the courtesy locks. However if the client does not send DESTROY_CLIENTID, then we're in the same situation with 4.x (x>0) as we would be with bog standard NFSv4.0. The lease has expired, and so the courtesy locks are liable to being dropped. At Hammerspace we have implemented courtesy locks, and our strategy is that when a conflict occurs, we drop the entire set of courtesy locks so that we don't have to deal with the "some locks were revoked" scenario. The reason is that when we originally implemented courtesy locks, the Linux NFSv4 client support for lock revocation was a lot less sophisticated than today. My suggestion is that you might therefore consider starting along this path, and then refining the support to make revocation more nuanced once you are confident that the coarser strategy is working as expected. -- Trond Myklebust Linux NFS client maintainer, Hammerspace trond.myklebust@hammerspace.com ^ permalink raw reply [flat|nested] 53+ messages in thread
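[Editorial note: the coarse strategy described above — on any conflict, release a courtesy client's entire lock set rather than revoking individual locks — can be modeled roughly as below. `Lock`, `CourtesyClient`, and the overlap test are simplified stand-ins that ignore read/write lock modes.]

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Lock:
    path: str
    start: int
    end: int


@dataclass
class CourtesyClient:
    clientid: str
    locks: set = field(default_factory=set)


def conflicts(a: Lock, b: Lock) -> bool:
    # Byte-range overlap on the same file (lock modes ignored here).
    return a.path == b.path and a.start <= b.end and b.start <= a.end


def resolve_conflicts(request: Lock, courtesy_clients: list) -> list:
    """Coarse resolution: any courtesy client holding a lock that
    conflicts with the request is expired wholesale -- all of its
    courtesy locks are dropped, not just the conflicting one -- so
    the server never has to report partial revocation."""
    expired = [c for c in courtesy_clients
               if any(conflicts(request, lk) for lk in c.locks)]
    for c in expired:
        c.locks.clear()
        courtesy_clients.remove(c)
    return expired
```

The trade-off is deliberately blunt: a client whose lease already expired loses everything on the first conflict, in exchange for never exposing the harder "some locks were revoked" recovery path.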
* Re: [PATCH RFC v5 0/2] nfsd: Initial implementation of NFSv4 Courteous Server 2021-11-30 4:08 ` Trond Myklebust @ 2021-11-30 4:47 ` Chuck Lever III 2021-11-30 4:57 ` Trond Myklebust 0 siblings, 1 reply; 53+ messages in thread From: Chuck Lever III @ 2021-11-30 4:47 UTC (permalink / raw) To: Trond Myklebust; +Cc: Dai Ngo, bfields, linux-nfs, linux-fsdevel > On Nov 29, 2021, at 11:08 PM, Trond Myklebust <trondmy@hammerspace.com> wrote: > > On Tue, 2021-11-30 at 01:42 +0000, Chuck Lever III wrote: >> >>>> On Nov 29, 2021, at 7:11 PM, Dai Ngo <dai.ngo@oracle.com> wrote: >>> >>> >>>> On 11/29/21 1:10 PM, Chuck Lever III wrote: >>>> >>>>>> On Nov 29, 2021, at 2:36 PM, Dai Ngo <dai.ngo@oracle.com> >>>>>> wrote: >>>>> >>>>> >>>>> On 11/29/21 11:03 AM, Chuck Lever III wrote: >>>>>> Hello Dai! >>>>>> >>>>>> >>>>>>> On Nov 29, 2021, at 1:32 PM, Dai Ngo <dai.ngo@oracle.com> >>>>>>> wrote: >>>>>>> >>>>>>> >>>>>>> On 11/29/21 9:30 AM, J. Bruce Fields wrote: >>>>>>>> On Mon, Nov 29, 2021 at 09:13:16AM -0800, >>>>>>>> dai.ngo@oracle.com wrote: >>>>>>>>> Hi Bruce, >>>>>>>>> >>>>>>>>> On 11/21/21 7:04 PM, dai.ngo@oracle.com wrote: >>>>>>>>>> On 11/17/21 4:34 PM, J. Bruce Fields wrote: >>>>>>>>>>> On Wed, Nov 17, 2021 at 01:46:02PM -0800, >>>>>>>>>>> dai.ngo@oracle.com wrote: >>>>>>>>>>>> On 11/17/21 9:59 AM, dai.ngo@oracle.com wrote: >>>>>>>>>>>>> On 11/17/21 6:14 AM, J. Bruce Fields wrote: >>>>>>>>>>>>>> On Tue, Nov 16, 2021 at 03:06:32PM -0800, >>>>>>>>>>>>>> dai.ngo@oracle.com wrote: >>>>>>>>>>>>>>> Just a reminder that this patch is still >>>>>>>>>>>>>>> waiting for your review. >>>>>>>>>>>>>> Yeah, I was procrastinating and hoping yo'ud >>>>>>>>>>>>>> figure out the pynfs >>>>>>>>>>>>>> failure for me.... >>>>>>>>>>>>> Last time I ran 4.0 OPEN18 test by itself and >>>>>>>>>>>>> it passed. I will run >>>>>>>>>>>>> all OPEN tests together with 5.15-rc7 to see if >>>>>>>>>>>>> the problem you've >>>>>>>>>>>>> seen still there. 
>>>>>>>>>>>> I ran all tests in nfsv4.1 and nfsv4.0 with >>>>>>>>>>>> courteous and non-courteous >>>>>>>>>>>> 5.15-rc7 server. >>>>>>>>>>>> >>>>>>>>>>>> Nfs4.1 results are the same for both courteous >>>>>>>>>>>> and >>>>>>>>>>>> non-courteous server: >>>>>>>>>>>>> Of those: 0 Skipped, 0 Failed, 0 Warned, 169 >>>>>>>>>>>>> Passed >>>>>>>>>>>> Results of nfs4.0 with non-courteous server: >>>>>>>>>>>>> Of those: 8 Skipped, 1 Failed, 0 Warned, 577 >>>>>>>>>>>>> Passed >>>>>>>>>>>> test failed: LOCK24 >>>>>>>>>>>> >>>>>>>>>>>> Results of nfs4.0 with courteous server: >>>>>>>>>>>>> Of those: 8 Skipped, 3 Failed, 0 Warned, 575 >>>>>>>>>>>>> Passed >>>>>>>>>>>> tests failed: LOCK24, OPEN18, OPEN30 >>>>>>>>>>>> >>>>>>>>>>>> OPEN18 and OPEN30 test pass if each is run by >>>>>>>>>>>> itself. >>>>>>>>>>> Could well be a bug in the tests, I don't know. >>>>>>>>>> The reason OPEN18 failed was because the test timed >>>>>>>>>> out waiting for >>>>>>>>>> the reply of an OPEN call. The RPC connection used >>>>>>>>>> for the test was >>>>>>>>>> configured with 15 secs timeout. Note that OPEN18 >>>>>>>>>> only fails when >>>>>>>>>> the tests were run with 'all' option, this test >>>>>>>>>> passes if it's run >>>>>>>>>> by itself. >>>>>>>>>> >>>>>>>>>> With courteous server, by the time OPEN18 runs, there >>>>>>>>>> are about 1026 >>>>>>>>>> courtesy 4.0 clients on the server and all of these >>>>>>>>>> clients have opened >>>>>>>>>> the same file X with WRITE access. These clients were >>>>>>>>>> created by the >>>>>>>>>> previous tests. After each test completed, since 4.0 >>>>>>>>>> does not have >>>>>>>>>> session, the client states are not cleaned up >>>>>>>>>> immediately on the >>>>>>>>>> server and are allowed to become courtesy clients. 
>>>>>>>>>> >>>>>>>>>> When OPEN18 runs (about 20 minutes after the 1st test >>>>>>>>>> started), it >>>>>>>>>> sends OPEN of file X with OPEN4_SHARE_DENY_WRITE >>>>>>>>>> which causes the >>>>>>>>>> server to check for conflicts with courtesy clients. >>>>>>>>>> The loop that >>>>>>>>>> checks 1026 courtesy clients for share/access >>>>>>>>>> conflict took less >>>>>>>>>> than 1 sec. But it took about 55 secs, on my VM, for >>>>>>>>>> the server >>>>>>>>>> to expire all 1026 courtesy clients. >>>>>>>>>> >>>>>>>>>> I modified pynfs to configure the 4.0 RPC connection >>>>>>>>>> with 60 seconds >>>>>>>>>> timeout and OPEN18 now consistently passed. The 4.0 >>>>>>>>>> test results are >>>>>>>>>> now the same for courteous and non-courteous server: >>>>>>>>>> >>>>>>>>>> 8 Skipped, 1 Failed, 0 Warned, 577 Passed >>>>>>>>>> >>>>>>>>>> Note that 4.1 tests do not suffer this timeout >>>>>>>>>> problem because the >>>>>>>>>> 4.1 clients and sessions are destroyed after each >>>>>>>>>> test completes. >>>>>>>>> Do you want me to send the patch to increase the >>>>>>>>> timeout for pynfs? >>>>>>>>> or is there any other things you think we should do? >>>>>>>> I don't know. >>>>>>>> >>>>>>>> 55 seconds to clean up 1026 clients is about 50ms per >>>>>>>> client, which is >>>>>>>> pretty slow. I wonder why. I guess it's probably >>>>>>>> updating the stable >>>>>>>> storage information. Is /var/lib/nfs/ on your server >>>>>>>> backed by a hard >>>>>>>> drive or an SSD or something else? >>>>>>> My server is a virtualbox VM that has 1 CPU, 4GB RAM and >>>>>>> 64GB of hard >>>>>>> disk. I think a production system that supports this many >>>>>>> clients should >>>>>>> have faster CPUs, faster storage. >>>>>>> >>>>>>>> I wonder if that's an argument for limiting the number of >>>>>>>> courtesy >>>>>>>> clients. >>>>>>> I think we might want to treat 4.0 clients a bit different >>>>>>> from 4.1 >>>>>>> clients. 
With 4.0, every client will become a courtesy >>>>>>> client after >>>>>>> the client is done with the export and unmounts it. >>>>>> It should be safe for a server to purge a client's lease >>>>>> immediately >>>>>> if there is no open or lock state associated with it. >>>>> In this case, each client has opened files so there are open >>>>> states >>>>> associated with them. >>>>> >>>>>> When an NFSv4.0 client unmounts, all files should be closed >>>>>> at that >>>>>> point, >>>>> I'm not sure pynfs does proper clean up after each subtest, I >>>>> will >>>>> check. There must be state associated with the client in order >>>>> for >>>>> it to become courtesy client. >>>> Makes sense. Then a synthetic client like pynfs can DoS a >>>> courteous >>>> server. >>>> >>>> >>>>>> so the server can wait for the lease to expire and purge it >>>>>> normally. Or am I missing something? >>>>> When 4.0 client lease expires and there are still states >>>>> associated >>>>> with the client then the server allows this client to become >>>>> courtesy >>>>> client. >>>> I think the same thing happens if an NFSv4.1 client neglects to >>>> send >>>> DESTROY_SESSION / DESTROY_CLIENTID. Either such a client is >>>> broken >>>> or malicious, but the server faces the same issue of protecting >>>> itself from a DoS attack. >>>> >>>> IMO you should consider limiting the number of courteous clients >>>> the server can hold onto. Let's say that number is 1000. When the >>>> server wants to turn a 1001st client into a courteous client, it >>>> can simply expire and purge the oldest courteous client on its >>>> list. Otherwise, over time, the 24-hour expiry will reduce the >>>> set of courteous clients back to zero. >>>> >>>> What do you think? >>> >>> Limiting the number of courteous clients to handle the cases of >>> broken/malicious 4.1 clients seems reasonable as the last resort. 
>>> >>> I think if a malicious 4.1 clients could mount the server's export, >>> opens a file (to create state) and repeats the same with a >>> different >>> client id then it seems like some basic security was already >>> broken; >>> allowing unauthorized clients to mount server's exports. >> >> You can do this today with AUTH_SYS. I consider it a genuine attack >> surface. >> >> >>> I think if we have to enforce a limit, then it's only for handling >>> of seriously buggy 4.1 clients which should not be the norm. The >>> issue with this is how to pick an optimal number that is suitable >>> for the running server which can be a very slow or a very fast >>> server. >>> >>> Note that even if we impose an limit, that does not completely >>> solve >>> the problem with pynfs 4.0 test since its RPC timeout is configured >>> with 15 secs which just enough to expire 277 clients based on 53ms >>> for each client, unless we limit it ~270 clients which I think it's >>> too low. >>> >>> This is what I plan to do: >>> >>> 1. do not support 4.0 courteous clients, for sure. >> >> Not supporting 4.0 isn’t an option, IMHO. It is a fully supported >> protocol at this time, and the same exposure exists for 4.1, it’s >> just a little harder to exploit. >> >> If you submit the courteous server patch without support for 4.0, I >> think it needs to include a plan for how 4.0 will be added later. >> >>> > > Why is there a problem here? The requirements are the same for 4.0 and > 4.1 (or 4.2). If the lease under which the courtesy lock was > established has expired, then that courtesy lock must be released if > some other client requests a lock that conflicts with the cached lock > (unless the client breaks the courtesy framework by renewing that > original lease before the conflict occurs). Otherwise, it is completely > up to the server when it decides to actually release the lock. 
> > For NFSv4.1 and NFSv4.2, we have DESTROY_CLIENTID, which tells the > server when the client is actually done with the lease, making it easy > to determine when it is safe to release all the courtesy locks. However > if the client does not send DESTROY_CLIENTID, then we're in the same > situation with 4.x (x>0) as we would be with bog standard NFSv4.0. The > lease has expired, and so the courtesy locks are liable to being > dropped. I agree the situation is the same for all minor versions. > At Hammerspace we have implemented courtesy locks, and our strategy is > that when a conflict occurs, we drop the entire set of courtesy locks > so that we don't have to deal with the "some locks were revoked" > scenario. The reason is that when we originally implemented courtesy > locks, the Linux NFSv4 client support for lock revocation was a lot > less sophisticated than today. My suggestion is that you might > therefore consider starting along this path, and then refining the > support to make revocation more nuanced once you are confident that the > coarser strategy is working as expected. Dai’s implementation does all that, and takes the coarser approach at the moment. There are plans to explore the more nuanced behavior (by revoking only the conflicting lock instead of dropping the whole lease) after this initial work is merged. The issue is there are certain pathological client behaviors (whether malicious or accidental) that can run the server out of resources, since it is holding onto lease state for a much longer time. We are simply trying to design a lease garbage collection scheme to meet that challenge. I think limiting the number of courteous clients is a simple way to do this, but we could also shorten the courtesy lifetime as more clients enter that state, to ensure that they don’t overrun the server’s memory. Another approach might be to add a shrinker that purges the oldest courteous clients when the server comes under memory pressure. 
* Re: [PATCH RFC v5 0/2] nfsd: Initial implementation of NFSv4 Courteous Server 2021-11-30 4:47 ` Chuck Lever III @ 2021-11-30 4:57 ` Trond Myklebust 2021-11-30 7:22 ` dai.ngo 0 siblings, 1 reply; 53+ messages in thread From: Trond Myklebust @ 2021-11-30 4:57 UTC (permalink / raw) To: chuck.lever; +Cc: bfields, dai.ngo, linux-nfs, linux-fsdevel On Tue, 2021-11-30 at 04:47 +0000, Chuck Lever III wrote: > > > On Nov 29, 2021, at 11:08 PM, Trond Myklebust > > <trondmy@hammerspace.com> wrote: > > > > On Tue, 2021-11-30 at 01:42 +0000, Chuck Lever III wrote: > > > > > > > > On Nov 29, 2021, at 7:11 PM, Dai Ngo <dai.ngo@oracle.com> > > > > > wrote: > > > > > > > > > > > > > On 11/29/21 1:10 PM, Chuck Lever III wrote: > > > > > > > > > > > > On Nov 29, 2021, at 2:36 PM, Dai Ngo <dai.ngo@oracle.com> > > > > > > > wrote: > > > > > > > > > > > > > > > > > > On 11/29/21 11:03 AM, Chuck Lever III wrote: > > > > > > > Hello Dai! > > > > > > > > > > > > > > > > > > > > > > On Nov 29, 2021, at 1:32 PM, Dai Ngo > > > > > > > > <dai.ngo@oracle.com> > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > On 11/29/21 9:30 AM, J. Bruce Fields wrote: > > > > > > > > > On Mon, Nov 29, 2021 at 09:13:16AM -0800, > > > > > > > > > dai.ngo@oracle.com wrote: > > > > > > > > > > Hi Bruce, > > > > > > > > > > > > > > > > > > > > On 11/21/21 7:04 PM, dai.ngo@oracle.com wrote: > > > > > > > > > > > On 11/17/21 4:34 PM, J. Bruce Fields wrote: > > > > > > > > > > > > On Wed, Nov 17, 2021 at 01:46:02PM -0800, > > > > > > > > > > > > dai.ngo@oracle.com wrote: > > > > > > > > > > > > > On 11/17/21 9:59 AM, > > > > > > > > > > > > > dai.ngo@oracle.com wrote: > > > > > > > > > > > > > > On 11/17/21 6:14 AM, J. 
Bruce Fields wrote: > > > > > > > > > > > > > > > On Tue, Nov 16, 2021 at 03:06:32PM -0800, > > > > > > > > > > > > > > > dai.ngo@oracle.com wrote: > > > > > > > > > > > > > > > > Just a reminder that this patch is > > > > > > > > > > > > > > > > still > > > > > > > > > > > > > > > > waiting for your review. > > > > > > > > > > > > > > > Yeah, I was procrastinating and hoping > > > > > > > > > > > > > > > yo'ud > > > > > > > > > > > > > > > figure out the pynfs > > > > > > > > > > > > > > > failure for me.... > > > > > > > > > > > > > > Last time I ran 4.0 OPEN18 test by itself > > > > > > > > > > > > > > and > > > > > > > > > > > > > > it passed. I will run > > > > > > > > > > > > > > all OPEN tests together with 5.15-rc7 to > > > > > > > > > > > > > > see if > > > > > > > > > > > > > > the problem you've > > > > > > > > > > > > > > seen still there. > > > > > > > > > > > > > I ran all tests in nfsv4.1 and nfsv4.0 with > > > > > > > > > > > > > courteous and non-courteous > > > > > > > > > > > > > 5.15-rc7 server. 
> > > > > > > > > > > > > > > > > > > > > > > > > > Nfs4.1 results are the same for both > > > > > > > > > > > > > courteous > > > > > > > > > > > > > and > > > > > > > > > > > > > non-courteous server: > > > > > > > > > > > > > > Of those: 0 Skipped, 0 Failed, 0 Warned, > > > > > > > > > > > > > > 169 > > > > > > > > > > > > > > Passed > > > > > > > > > > > > > Results of nfs4.0 with non-courteous server: > > > > > > > > > > > > > > Of those: 8 Skipped, 1 Failed, 0 Warned, > > > > > > > > > > > > > > 577 > > > > > > > > > > > > > > Passed > > > > > > > > > > > > > test failed: LOCK24 > > > > > > > > > > > > > > > > > > > > > > > > > > Results of nfs4.0 with courteous server: > > > > > > > > > > > > > > Of those: 8 Skipped, 3 Failed, 0 Warned, > > > > > > > > > > > > > > 575 > > > > > > > > > > > > > > Passed > > > > > > > > > > > > > tests failed: LOCK24, OPEN18, OPEN30 > > > > > > > > > > > > > > > > > > > > > > > > > > OPEN18 and OPEN30 test pass if each is run by > > > > > > > > > > > > > itself. > > > > > > > > > > > > Could well be a bug in the tests, I don't know. > > > > > > > > > > > The reason OPEN18 failed was because the test > > > > > > > > > > > timed > > > > > > > > > > > out waiting for > > > > > > > > > > > the reply of an OPEN call. The RPC connection > > > > > > > > > > > used > > > > > > > > > > > for the test was > > > > > > > > > > > configured with 15 secs timeout. Note that OPEN18 > > > > > > > > > > > only fails when > > > > > > > > > > > the tests were run with 'all' option, this test > > > > > > > > > > > passes if it's run > > > > > > > > > > > by itself. > > > > > > > > > > > > > > > > > > > > > > With courteous server, by the time OPEN18 runs, > > > > > > > > > > > there > > > > > > > > > > > are about 1026 > > > > > > > > > > > courtesy 4.0 clients on the server and all of > > > > > > > > > > > these > > > > > > > > > > > clients have opened > > > > > > > > > > > the same file X with WRITE access. 
These clients > > > > > > > > > > > were > > > > > > > > > > > created by the > > > > > > > > > > > previous tests. After each test completed, since > > > > > > > > > > > 4.0 > > > > > > > > > > > does not have > > > > > > > > > > > session, the client states are not cleaned up > > > > > > > > > > > immediately on the > > > > > > > > > > > server and are allowed to become courtesy > > > > > > > > > > > clients. > > > > > > > > > > > > > > > > > > > > > > When OPEN18 runs (about 20 minutes after the 1st > > > > > > > > > > > test > > > > > > > > > > > started), it > > > > > > > > > > > sends OPEN of file X with OPEN4_SHARE_DENY_WRITE > > > > > > > > > > > which causes the > > > > > > > > > > > server to check for conflicts with courtesy > > > > > > > > > > > clients. > > > > > > > > > > > The loop that > > > > > > > > > > > checks 1026 courtesy clients for share/access > > > > > > > > > > > conflict took less > > > > > > > > > > > than 1 sec. But it took about 55 secs, on my VM, > > > > > > > > > > > for > > > > > > > > > > > the server > > > > > > > > > > > to expire all 1026 courtesy clients. > > > > > > > > > > > > > > > > > > > > > > I modified pynfs to configure the 4.0 RPC > > > > > > > > > > > connection > > > > > > > > > > > with 60 seconds > > > > > > > > > > > timeout and OPEN18 now consistently passed. The > > > > > > > > > > > 4.0 > > > > > > > > > > > test results are > > > > > > > > > > > now the same for courteous and non-courteous > > > > > > > > > > > server: > > > > > > > > > > > > > > > > > > > > > > 8 Skipped, 1 Failed, 0 Warned, 577 Passed > > > > > > > > > > > > > > > > > > > > > > Note that 4.1 tests do not suffer this timeout > > > > > > > > > > > problem because the > > > > > > > > > > > 4.1 clients and sessions are destroyed after each > > > > > > > > > > > test completes. > > > > > > > > > > Do you want me to send the patch to increase the > > > > > > > > > > timeout for pynfs? 
> > > > > > > > > > or is there any other things you think we should > > > > > > > > > > do? > > > > > > > > > I don't know. > > > > > > > > > > > > > > > > > > 55 seconds to clean up 1026 clients is about 50ms per > > > > > > > > > client, which is > > > > > > > > > pretty slow. I wonder why. I guess it's probably > > > > > > > > > updating the stable > > > > > > > > > storage information. Is /var/lib/nfs/ on your server > > > > > > > > > backed by a hard > > > > > > > > > drive or an SSD or something else? > > > > > > > > My server is a virtualbox VM that has 1 CPU, 4GB RAM > > > > > > > > and > > > > > > > > 64GB of hard > > > > > > > > disk. I think a production system that supports this > > > > > > > > many > > > > > > > > clients should > > > > > > > > have faster CPUs, faster storage. > > > > > > > > > > > > > > > > > I wonder if that's an argument for limiting the > > > > > > > > > number of > > > > > > > > > courtesy > > > > > > > > > clients. > > > > > > > > I think we might want to treat 4.0 clients a bit > > > > > > > > different > > > > > > > > from 4.1 > > > > > > > > clients. With 4.0, every client will become a courtesy > > > > > > > > client after > > > > > > > > the client is done with the export and unmounts it. > > > > > > > It should be safe for a server to purge a client's lease > > > > > > > immediately > > > > > > > if there is no open or lock state associated with it. > > > > > > In this case, each client has opened files so there are > > > > > > open > > > > > > states > > > > > > associated with them. > > > > > > > > > > > > > When an NFSv4.0 client unmounts, all files should be > > > > > > > closed > > > > > > > at that > > > > > > > point, > > > > > > I'm not sure pynfs does proper clean up after each subtest, > > > > > > I > > > > > > will > > > > > > check. There must be state associated with the client in > > > > > > order > > > > > > for > > > > > > it to become courtesy client. > > > > > Makes sense. 
Then a synthetic client like pynfs can DoS a > > > > > courteous > > > > > server. > > > > > > > > > > > > > > > > > so the server can wait for the lease to expire and purge > > > > > > > it > > > > > > > normally. Or am I missing something? > > > > > > When 4.0 client lease expires and there are still states > > > > > > associated > > > > > > with the client then the server allows this client to > > > > > > become > > > > > > courtesy > > > > > > client. > > > > > I think the same thing happens if an NFSv4.1 client neglects > > > > > to > > > > > send > > > > > DESTROY_SESSION / DESTROY_CLIENTID. Either such a client is > > > > > broken > > > > > or malicious, but the server faces the same issue of > > > > > protecting > > > > > itself from a DoS attack. > > > > > > > > > > IMO you should consider limiting the number of courteous > > > > > clients > > > > > the server can hold onto. Let's say that number is 1000. When > > > > > the > > > > > server wants to turn a 1001st client into a courteous client, > > > > > it > > > > > can simply expire and purge the oldest courteous client on > > > > > its > > > > > list. Otherwise, over time, the 24-hour expiry will reduce > > > > > the > > > > > set of courteous clients back to zero. > > > > > > > > > > What do you think? > > > > > > > > Limiting the number of courteous clients to handle the cases of > > > > broken/malicious 4.1 clients seems reasonable as the last > > > > resort. > > > > > > > > I think if a malicious 4.1 clients could mount the server's > > > > export, > > > > opens a file (to create state) and repeats the same with a > > > > different > > > > client id then it seems like some basic security was already > > > > broken; > > > > allowing unauthorized clients to mount server's exports. > > > > > > You can do this today with AUTH_SYS. I consider it a genuine > > > attack > > > surface. 
> > > > > > > > > > I think if we have to enforce a limit, then it's only for > > > > handling > > > > of seriously buggy 4.1 clients which should not be the norm. > > > > The > > > > issue with this is how to pick an optimal number that is > > > > suitable > > > > for the running server which can be a very slow or a very fast > > > > server. > > > > > > > > Note that even if we impose an limit, that does not completely > > > > solve > > > > the problem with pynfs 4.0 test since its RPC timeout is > > > > configured > > > > with 15 secs which just enough to expire 277 clients based on > > > > 53ms > > > > for each client, unless we limit it ~270 clients which I think > > > > it's > > > > too low. > > > > > > > > This is what I plan to do: > > > > > > > > 1. do not support 4.0 courteous clients, for sure. > > > > > > Not supporting 4.0 isn’t an option, IMHO. It is a fully supported > > > protocol at this time, and the same exposure exists for 4.1, it’s > > > just a little harder to exploit. > > > > > > If you submit the courteous server patch without support for 4.0, > > > I > > > think it needs to include a plan for how 4.0 will be added later. > > > > > > > > > > > Why is there a problem here? The requirements are the same for 4.0 > > and > > 4.1 (or 4.2). If the lease under which the courtesy lock was > > established has expired, then that courtesy lock must be released > > if > > some other client requests a lock that conflicts with the cached > > lock > > (unless the client breaks the courtesy framework by renewing that > > original lease before the conflict occurs). Otherwise, it is > > completely > > up to the server when it decides to actually release the lock. > > > > For NFSv4.1 and NFSv4.2, we have DESTROY_CLIENTID, which tells the > > server when the client is actually done with the lease, making it > > easy > > to determine when it is safe to release all the courtesy locks. 
> > However > > if the client does not send DESTROY_CLIENTID, then we're in the > > same > > situation with 4.x (x>0) as we would be with bog standard NFSv4.0. > > The > > lease has expired, and so the courtesy locks are liable to being > > dropped. > > I agree the situation is the same for all minor versions. > > > > At Hammerspace we have implemented courtesy locks, and our strategy > > is > > that when a conflict occurs, we drop the entire set of courtesy > > locks > > so that we don't have to deal with the "some locks were revoked" > > scenario. The reason is that when we originally implemented > > courtesy > > locks, the Linux NFSv4 client support for lock revocation was a lot > > less sophisticated than today. My suggestion is that you might > > therefore consider starting along this path, and then refining the > > support to make revocation more nuanced once you are confident that > > the > > coarser strategy is working as expected. > > Dai’s implementation does all that, and takes the coarser approach at > the moment. There are plans to explore the more nuanced behavior (by > revoking only the conflicting lock instead of dropping the whole > lease) after this initial work is merged. > > The issue is there are certain pathological client behaviors (whether > malicious or accidental) that can run the server out of resources, > since it is holding onto lease state for a much longer time. We are > simply trying to design a lease garbage collection scheme to meet > that challenge. > > I think limiting the number of courteous clients is a simple way to > do this, but we could also shorten the courtesy lifetime as more > clients enter that state, to ensure that they don’t overrun the > server’s memory. Another approach might be to add a shrinker that > purges the oldest courteous clients when the server comes under > memory pressure. > > We already have a scanner that tries to release all client state after 1 lease period. 
Just extend that to do it after 10 lease periods. If a network partition hasn't recovered after 10 minutes, you probably have bigger problems. You can limit the number of clients as well, but that leads into a rat's nest of other issues that have nothing to do with courtesy locks and everything to do with the fact that any client can hold a lot of state. -- Trond Myklebust Linux NFS client maintainer, Hammerspace trond.myklebust@hammerspace.com
* Re: [PATCH RFC v5 0/2] nfsd: Initial implementation of NFSv4 Courteous Server 2021-11-30 4:57 ` Trond Myklebust @ 2021-11-30 7:22 ` dai.ngo 2021-11-30 13:37 ` Trond Myklebust 2021-11-30 15:36 ` Chuck Lever III 0 siblings, 2 replies; 53+ messages in thread From: dai.ngo @ 2021-11-30 7:22 UTC (permalink / raw) To: Trond Myklebust, chuck.lever; +Cc: bfields, linux-nfs, linux-fsdevel On 11/29/21 8:57 PM, Trond Myklebust wrote: > On Tue, 2021-11-30 at 04:47 +0000, Chuck Lever III wrote: >>> On Nov 29, 2021, at 11:08 PM, Trond Myklebust >>> <trondmy@hammerspace.com> wrote: >>> >>> On Tue, 2021-11-30 at 01:42 +0000, Chuck Lever III wrote: >>>>>> On Nov 29, 2021, at 7:11 PM, Dai Ngo <dai.ngo@oracle.com> >>>>>> wrote: >>>>> >>>>>> On 11/29/21 1:10 PM, Chuck Lever III wrote: >>>>>> >>>>>>>> On Nov 29, 2021, at 2:36 PM, Dai Ngo <dai.ngo@oracle.com> >>>>>>>> wrote: >>>>>>> >>>>>>> On 11/29/21 11:03 AM, Chuck Lever III wrote: >>>>>>>> Hello Dai! >>>>>>>> >>>>>>>> >>>>>>>>> On Nov 29, 2021, at 1:32 PM, Dai Ngo >>>>>>>>> <dai.ngo@oracle.com> >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> On 11/29/21 9:30 AM, J. Bruce Fields wrote: >>>>>>>>>> On Mon, Nov 29, 2021 at 09:13:16AM -0800, >>>>>>>>>> dai.ngo@oracle.com wrote: >>>>>>>>>>> Hi Bruce, >>>>>>>>>>> >>>>>>>>>>> On 11/21/21 7:04 PM, dai.ngo@oracle.com wrote: >>>>>>>>>>>> On 11/17/21 4:34 PM, J. Bruce Fields wrote: >>>>>>>>>>>>> On Wed, Nov 17, 2021 at 01:46:02PM -0800, >>>>>>>>>>>>> dai.ngo@oracle.com wrote: >>>>>>>>>>>>>> On 11/17/21 9:59 AM, >>>>>>>>>>>>>> dai.ngo@oracle.com wrote: >>>>>>>>>>>>>>> On 11/17/21 6:14 AM, J. Bruce Fields wrote: >>>>>>>>>>>>>>>> On Tue, Nov 16, 2021 at 03:06:32PM -0800, >>>>>>>>>>>>>>>> dai.ngo@oracle.com wrote: >>>>>>>>>>>>>>>>> Just a reminder that this patch is >>>>>>>>>>>>>>>>> still >>>>>>>>>>>>>>>>> waiting for your review. >>>>>>>>>>>>>>>> Yeah, I was procrastinating and hoping >>>>>>>>>>>>>>>> yo'ud >>>>>>>>>>>>>>>> figure out the pynfs >>>>>>>>>>>>>>>> failure for me.... 
>>>>>>>>>>>>>>> Last time I ran 4.0 OPEN18 test by itself >>>>>>>>>>>>>>> and >>>>>>>>>>>>>>> it passed. I will run >>>>>>>>>>>>>>> all OPEN tests together with 5.15-rc7 to >>>>>>>>>>>>>>> see if >>>>>>>>>>>>>>> the problem you've >>>>>>>>>>>>>>> seen still there. >>>>>>>>>>>>>> I ran all tests in nfsv4.1 and nfsv4.0 with >>>>>>>>>>>>>> courteous and non-courteous >>>>>>>>>>>>>> 5.15-rc7 server. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Nfs4.1 results are the same for both >>>>>>>>>>>>>> courteous >>>>>>>>>>>>>> and >>>>>>>>>>>>>> non-courteous server: >>>>>>>>>>>>>>> Of those: 0 Skipped, 0 Failed, 0 Warned, >>>>>>>>>>>>>>> 169 >>>>>>>>>>>>>>> Passed >>>>>>>>>>>>>> Results of nfs4.0 with non-courteous server: >>>>>>>>>>>>>>> Of those: 8 Skipped, 1 Failed, 0 Warned, >>>>>>>>>>>>>>> 577 >>>>>>>>>>>>>>> Passed >>>>>>>>>>>>>> test failed: LOCK24 >>>>>>>>>>>>>> >>>>>>>>>>>>>> Results of nfs4.0 with courteous server: >>>>>>>>>>>>>>> Of those: 8 Skipped, 3 Failed, 0 Warned, >>>>>>>>>>>>>>> 575 >>>>>>>>>>>>>>> Passed >>>>>>>>>>>>>> tests failed: LOCK24, OPEN18, OPEN30 >>>>>>>>>>>>>> >>>>>>>>>>>>>> OPEN18 and OPEN30 test pass if each is run by >>>>>>>>>>>>>> itself. >>>>>>>>>>>>> Could well be a bug in the tests, I don't know. >>>>>>>>>>>> The reason OPEN18 failed was because the test >>>>>>>>>>>> timed >>>>>>>>>>>> out waiting for >>>>>>>>>>>> the reply of an OPEN call. The RPC connection >>>>>>>>>>>> used >>>>>>>>>>>> for the test was >>>>>>>>>>>> configured with 15 secs timeout. Note that OPEN18 >>>>>>>>>>>> only fails when >>>>>>>>>>>> the tests were run with 'all' option, this test >>>>>>>>>>>> passes if it's run >>>>>>>>>>>> by itself. >>>>>>>>>>>> >>>>>>>>>>>> With courteous server, by the time OPEN18 runs, >>>>>>>>>>>> there >>>>>>>>>>>> are about 1026 >>>>>>>>>>>> courtesy 4.0 clients on the server and all of >>>>>>>>>>>> these >>>>>>>>>>>> clients have opened >>>>>>>>>>>> the same file X with WRITE access. 
These clients >>>>>>>>>>>> were >>>>>>>>>>>> created by the >>>>>>>>>>>> previous tests. After each test completed, since >>>>>>>>>>>> 4.0 >>>>>>>>>>>> does not have >>>>>>>>>>>> session, the client states are not cleaned up >>>>>>>>>>>> immediately on the >>>>>>>>>>>> server and are allowed to become courtesy >>>>>>>>>>>> clients. >>>>>>>>>>>> >>>>>>>>>>>> When OPEN18 runs (about 20 minutes after the 1st >>>>>>>>>>>> test >>>>>>>>>>>> started), it >>>>>>>>>>>> sends OPEN of file X with OPEN4_SHARE_DENY_WRITE >>>>>>>>>>>> which causes the >>>>>>>>>>>> server to check for conflicts with courtesy >>>>>>>>>>>> clients. >>>>>>>>>>>> The loop that >>>>>>>>>>>> checks 1026 courtesy clients for share/access >>>>>>>>>>>> conflict took less >>>>>>>>>>>> than 1 sec. But it took about 55 secs, on my VM, >>>>>>>>>>>> for >>>>>>>>>>>> the server >>>>>>>>>>>> to expire all 1026 courtesy clients. >>>>>>>>>>>> >>>>>>>>>>>> I modified pynfs to configure the 4.0 RPC >>>>>>>>>>>> connection >>>>>>>>>>>> with 60 seconds >>>>>>>>>>>> timeout and OPEN18 now consistently passed. The >>>>>>>>>>>> 4.0 >>>>>>>>>>>> test results are >>>>>>>>>>>> now the same for courteous and non-courteous >>>>>>>>>>>> server: >>>>>>>>>>>> >>>>>>>>>>>> 8 Skipped, 1 Failed, 0 Warned, 577 Passed >>>>>>>>>>>> >>>>>>>>>>>> Note that 4.1 tests do not suffer this timeout >>>>>>>>>>>> problem because the >>>>>>>>>>>> 4.1 clients and sessions are destroyed after each >>>>>>>>>>>> test completes. >>>>>>>>>>> Do you want me to send the patch to increase the >>>>>>>>>>> timeout for pynfs? >>>>>>>>>>> or is there any other things you think we should >>>>>>>>>>> do? >>>>>>>>>> I don't know. >>>>>>>>>> >>>>>>>>>> 55 seconds to clean up 1026 clients is about 50ms per >>>>>>>>>> client, which is >>>>>>>>>> pretty slow. I wonder why. I guess it's probably >>>>>>>>>> updating the stable >>>>>>>>>> storage information. Is /var/lib/nfs/ on your server >>>>>>>>>> backed by a hard >>>>>>>>>> drive or an SSD or something else? 
>>>>>>>>> My server is a virtualbox VM that has 1 CPU, 4GB RAM >>>>>>>>> and >>>>>>>>> 64GB of hard >>>>>>>>> disk. I think a production system that supports this >>>>>>>>> many >>>>>>>>> clients should >>>>>>>>> have faster CPUs, faster storage. >>>>>>>>> >>>>>>>>>> I wonder if that's an argument for limiting the >>>>>>>>>> number of >>>>>>>>>> courtesy >>>>>>>>>> clients. >>>>>>>>> I think we might want to treat 4.0 clients a bit >>>>>>>>> different >>>>>>>>> from 4.1 >>>>>>>>> clients. With 4.0, every client will become a courtesy >>>>>>>>> client after >>>>>>>>> the client is done with the export and unmounts it. >>>>>>>> It should be safe for a server to purge a client's lease >>>>>>>> immediately >>>>>>>> if there is no open or lock state associated with it. >>>>>>> In this case, each client has opened files so there are >>>>>>> open >>>>>>> states >>>>>>> associated with them. >>>>>>> >>>>>>>> When an NFSv4.0 client unmounts, all files should be >>>>>>>> closed >>>>>>>> at that >>>>>>>> point, >>>>>>> I'm not sure pynfs does proper clean up after each subtest, >>>>>>> I >>>>>>> will >>>>>>> check. There must be state associated with the client in >>>>>>> order >>>>>>> for >>>>>>> it to become courtesy client. >>>>>> Makes sense. Then a synthetic client like pynfs can DoS a >>>>>> courteous >>>>>> server. >>>>>> >>>>>> >>>>>>>> so the server can wait for the lease to expire and purge >>>>>>>> it >>>>>>>> normally. Or am I missing something? >>>>>>> When 4.0 client lease expires and there are still states >>>>>>> associated >>>>>>> with the client then the server allows this client to >>>>>>> become >>>>>>> courtesy >>>>>>> client. >>>>>> I think the same thing happens if an NFSv4.1 client neglects >>>>>> to >>>>>> send >>>>>> DESTROY_SESSION / DESTROY_CLIENTID. Either such a client is >>>>>> broken >>>>>> or malicious, but the server faces the same issue of >>>>>> protecting >>>>>> itself from a DoS attack. 
>>>>>> >>>>>> IMO you should consider limiting the number of courteous >>>>>> clients >>>>>> the server can hold onto. Let's say that number is 1000. When >>>>>> the >>>>>> server wants to turn a 1001st client into a courteous client, >>>>>> it >>>>>> can simply expire and purge the oldest courteous client on >>>>>> its >>>>>> list. Otherwise, over time, the 24-hour expiry will reduce >>>>>> the >>>>>> set of courteous clients back to zero. >>>>>> >>>>>> What do you think? >>>>> Limiting the number of courteous clients to handle the cases of >>>>> broken/malicious 4.1 clients seems reasonable as the last >>>>> resort. >>>>> >>>>> I think if a malicious 4.1 clients could mount the server's >>>>> export, >>>>> opens a file (to create state) and repeats the same with a >>>>> different >>>>> client id then it seems like some basic security was already >>>>> broken; >>>>> allowing unauthorized clients to mount server's exports. >>>> You can do this today with AUTH_SYS. I consider it a genuine >>>> attack >>>> surface. >>>> >>>> >>>>> I think if we have to enforce a limit, then it's only for >>>>> handling >>>>> of seriously buggy 4.1 clients which should not be the norm. >>>>> The >>>>> issue with this is how to pick an optimal number that is >>>>> suitable >>>>> for the running server which can be a very slow or a very fast >>>>> server. >>>>> >>>>> Note that even if we impose an limit, that does not completely >>>>> solve >>>>> the problem with pynfs 4.0 test since its RPC timeout is >>>>> configured >>>>> with 15 secs which just enough to expire 277 clients based on >>>>> 53ms >>>>> for each client, unless we limit it ~270 clients which I think >>>>> it's >>>>> too low. >>>>> >>>>> This is what I plan to do: >>>>> >>>>> 1. do not support 4.0 courteous clients, for sure. >>>> Not supporting 4.0 isn’t an option, IMHO. It is a fully supported >>>> protocol at this time, and the same exposure exists for 4.1, it’s >>>> just a little harder to exploit. 
>>>>
>>>> If you submit the courteous server patch without support for 4.0, I think it needs to include a plan for how 4.0 will be added later.
>>>>
>>> Why is there a problem here? The requirements are the same for 4.0 and 4.1 (or 4.2). If the lease under which the courtesy lock was established has expired, then that courtesy lock must be released if some other client requests a lock that conflicts with the cached lock (unless the client breaks the courtesy framework by renewing that original lease before the conflict occurs). Otherwise, it is completely up to the server when it decides to actually release the lock.
>>>
>>> For NFSv4.1 and NFSv4.2, we have DESTROY_CLIENTID, which tells the server when the client is actually done with the lease, making it easy to determine when it is safe to release all the courtesy locks. However if the client does not send DESTROY_CLIENTID, then we're in the same situation with 4.x (x>0) as we would be with bog standard NFSv4.0. The lease has expired, and so the courtesy locks are liable to being dropped.
>> I agree the situation is the same for all minor versions.
>>
>>> At Hammerspace we have implemented courtesy locks, and our strategy is that when a conflict occurs, we drop the entire set of courtesy locks so that we don't have to deal with the "some locks were revoked" scenario. The reason is that when we originally implemented courtesy locks, the Linux NFSv4 client support for lock revocation was a lot less sophisticated than today. My suggestion is that you might therefore consider starting along this path, and then refining the support to make revocation more nuanced once you are confident that the coarser strategy is working as expected.
>> Dai’s implementation does all that, and takes the coarser approach at the moment.
>> There are plans to explore the more nuanced behavior (by revoking only the conflicting lock instead of dropping the whole lease) after this initial work is merged.
>>
>> The issue is there are certain pathological client behaviors (whether malicious or accidental) that can run the server out of resources, since it is holding onto lease state for a much longer time. We are simply trying to design a lease garbage collection scheme to meet that challenge.
>>
>> I think limiting the number of courteous clients is a simple way to do this, but we could also shorten the courtesy lifetime as more clients enter that state, to ensure that they don’t overrun the server’s memory. Another approach might be to add a shrinker that purges the oldest courteous clients when the server comes under memory pressure.
>>
> We already have a scanner that tries to release all client state after 1 lease period. Just extend that to do it after 10 lease periods. If a network partition hasn't recovered after 10 minutes, you probably have bigger problems.

Currently the courteous server allows 24hr for the network partition to heal before releasing all client state. That seems excessive, but it was suggested to cover longer network partitions, such as when switches/routers are being repaired or upgraded.

> You can limit the number of clients as well, but that leads into a rats nest of other issues that have nothing to do with courtesy locks and everything to do with the fact that any client can hold a lot of state.

The issue we currently have with the courteous server and the pynfs 4.0 tests is the number of courteous 4.0 clients the server has to expire when a share reservation conflict occurs while servicing an OPEN. Each client owns only a few states in this case, so we think the server spends most of its time deleting each client's record in /var/lib/nfs. This is why we plan to limit the number of courteous clients for now.
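Chuck's alternative above — shrink the courtesy lifetime as the courtesy list grows, so that a flood of abandoned clients ages out faster — could look something like the following. The 24-hour ceiling is the value used by the current patch; the soft limit and the scaling rule are purely illustrative:

```python
LEASE_PERIOD = 90             # seconds; a typical nfsd lease default
MAX_COURTESY_TTL = 24 * 3600  # the 24h grace the current patch allows

def courtesy_ttl(num_courtesy_clients, soft_limit=1000):
    """Scale the courtesy lifetime down as the courtesy list fills up.

    Below the soft limit, clients keep the full 24h grace period; past
    it, the lifetime shrinks proportionally, bottoming out at one lease
    period so a courtesy client never expires faster than a normal one.
    """
    if num_courtesy_clients <= soft_limit:
        return MAX_COURTESY_TTL
    ttl = MAX_COURTESY_TTL * soft_limit // num_courtesy_clients
    return max(ttl, LEASE_PERIOD)

print(courtesy_ttl(100))     # 86400: full 24 hours
print(courtesy_ttl(2000))    # 43200: halved once past the soft limit
print(courtesy_ttl(10**6))   # 90: floor of one lease period
```

Unlike a hard cap, this degrades gracefully: well-behaved deployments keep the long grace period, while a pynfs-style flood of abandoned clients is reclaimed quickly.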
As a side effect, it might also help reduce resource consumption.

-Dai

^ permalink raw reply	[flat|nested] 53+ messages in thread
* Re: [PATCH RFC v5 0/2] nfsd: Initial implementation of NFSv4 Courteous Server
  2021-11-30  7:22 ` dai.ngo
@ 2021-11-30 13:37 ` Trond Myklebust
  2021-12-01  3:52 ` dai.ngo
  2021-11-30 15:36 ` Chuck Lever III
  1 sibling, 1 reply; 53+ messages in thread
From: Trond Myklebust @ 2021-11-30 13:37 UTC (permalink / raw)
To: dai.ngo, chuck.lever; +Cc: bfields, linux-nfs, linux-fsdevel

On Mon, 2021-11-29 at 23:22 -0800, dai.ngo@oracle.com wrote:
> On 11/29/21 8:57 PM, Trond Myklebust wrote:
> > On Tue, 2021-11-30 at 04:47 +0000, Chuck Lever III wrote:
> > > On Nov 29, 2021, at 11:08 PM, Trond Myklebust <trondmy@hammerspace.com> wrote:
> > > > On Tue, 2021-11-30 at 01:42 +0000, Chuck Lever III wrote:
> > > > > On Nov 29, 2021, at 7:11 PM, Dai Ngo <dai.ngo@oracle.com> wrote:
> > > > > > On 11/29/21 1:10 PM, Chuck Lever III wrote:
> > > > > > > On Nov 29, 2021, at 2:36 PM, Dai Ngo <dai.ngo@oracle.com> wrote:
> > > > > > > > On 11/29/21 11:03 AM, Chuck Lever III wrote:
> > > > > > > > > Hello Dai!
> > > > > > > > > On Nov 29, 2021, at 1:32 PM, Dai Ngo <dai.ngo@oracle.com> wrote:
> > > > > > > > > > On 11/29/21 9:30 AM, J. Bruce Fields wrote:
> > > > > > > > > > > On Mon, Nov 29, 2021 at 09:13:16AM -0800, dai.ngo@oracle.com wrote:
> > > > > > > > > > > > Hi Bruce,
> > > > > > > > > > > > On 11/21/21 7:04 PM, dai.ngo@oracle.com wrote:
> > > > > > > > > > > > > On 11/17/21 4:34 PM, J. Bruce Fields wrote:
> > > > > > > > > > > > > > On Wed, Nov 17, 2021 at 01:46:02PM -0800, dai.ngo@oracle.com wrote:
> > > > > > > > > > > > > > > On 11/17/21 9:59 AM, dai.ngo@oracle.com wrote:
> > > > > > > > > > > > > > > > On 11/17/21 6:14 AM, J. Bruce Fields wrote:
> > > > > > > > > > > > > > > > > On Tue, Nov 16, 2021 at 03:06:32PM -0800, dai.ngo@oracle.com wrote:
> > > > > > > > > > > > > > > > > > Just a reminder that this patch is still waiting for your review.
> > > > > > > > > > > > > > > > > Yeah, I was procrastinating and hoping you'd figure out the pynfs failure for me....
> > > > > > > > > > > > > > > > Last time I ran the 4.0 OPEN18 test by itself and it passed. I will run all OPEN tests together with 5.15-rc7 to see if the problem you've seen is still there.
> > > > > > > > > > > > > > > I ran all tests in nfsv4.1 and nfsv4.0 with courteous and non-courteous 5.15-rc7 server.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Nfs4.1 results are the same for both courteous and non-courteous server:
> > > > > > > > > > > > > > > > Of those: 0 Skipped, 0 Failed, 0 Warned, 169 Passed
> > > > > > > > > > > > > > > Results of nfs4.0 with non-courteous server:
> > > > > > > > > > > > > > > > Of those: 8 Skipped, 1 Failed, 0 Warned, 577 Passed
> > > > > > > > > > > > > > > test failed: LOCK24
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Results of nfs4.0 with courteous server:
> > > > > > > > > > > > > > > > Of those: 8 Skipped, 3 Failed, 0 Warned, 575 Passed
> > > > > > > > > > > > > > > tests failed: LOCK24, OPEN18, OPEN30
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > OPEN18 and OPEN30 tests pass if each is run by itself.
> > > > > > > > > > > > > > Could well be a bug in the tests, I don't know.
> > > > > > > > > > > > > The reason OPEN18 failed was because the test timed out waiting for the reply of an OPEN call. The RPC connection used for the test was configured with a 15 secs timeout. Note that OPEN18 only fails when the tests are run with the 'all' option; this test passes if it's run by itself.
> > > > > > > > > > > > >
> > > > > > > > > > > > > With courteous server, by the time OPEN18 runs, there are about 1026 courtesy 4.0 clients on the server and all of these clients have opened the same file X with WRITE access. These clients were created by the previous tests. After each test completed, since 4.0 does not have sessions, the client states are not cleaned up immediately on the server and are allowed to become courtesy clients.
> > > > > > > > > > > > >
> > > > > > > > > > > > > When OPEN18 runs (about 20 minutes after the 1st test started), it sends OPEN of file X with OPEN4_SHARE_DENY_WRITE, which causes the server to check for conflicts with courtesy clients. The loop that checks 1026 courtesy clients for share/access conflict took less than 1 sec. But it took about 55 secs, on my VM, for the server to expire all 1026 courtesy clients.
> > > > > > > > > > > > >
> > > > > > > > > > > > > I modified pynfs to configure the 4.0 RPC connection with a 60 seconds timeout and OPEN18 now consistently passed. The 4.0 test results are now the same for courteous and non-courteous server:
> > > > > > > > > > > > >
> > > > > > > > > > > > > 8 Skipped, 1 Failed, 0 Warned, 577 Passed
> > > > > > > > > > > > >
> > > > > > > > > > > > > Note that 4.1 tests do not suffer this timeout problem because the 4.1 clients and sessions are destroyed after each test completes.
> > > > > > > > > > > > Do you want me to send the patch to increase the timeout for pynfs? Or are there any other things you think we should do?
> > > > > > > > > > > I don't know.
> > > > > > > > > > >
> > > > > > > > > > > 55 seconds to clean up 1026 clients is about 50ms per client, which is pretty slow. I wonder why. I guess it's probably updating the stable storage information. Is /var/lib/nfs/ on your server backed by a hard drive or an SSD or something else?
> > > > > > > > > > My server is a virtualbox VM that has 1 CPU, 4GB RAM and 64GB of hard disk. I think a production system that supports this many clients should have faster CPUs, faster storage.
> > > > > > > > > I wonder if that's an argument for limiting the number of courtesy clients.
> [...]
>
> The issue we currently have with courteous server and pynfs 4.0 tests
> is the number of courteous 4.0 clients the server has to expire when a
> share reservation conflict occurs when servicing the OPEN. Each client
> owns only few state in this case so we think the server spent most
> time for deleting client's record in /var/lib/nfs. This is why we plan
> to limit the number of courteous clients for now. As a side effect, it
> might also help to reduce resource consumption too.

Then kick off a thread or work item to do that asynchronously in the
background, and return NFS4ERR_DELAY to the clients that were trying to
grab locks in the meantime.

The above process is hardly just confined to NFSv4.0 clients. If there
is a network partition, then the exact same record deleting needs to be
applied to all NFSv4.1 and NFSv4.2 clients that hold locks and are
unable to renew their leases, so you might as well make it work for
everyone.
--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com

^ permalink raw reply	[flat|nested] 53+ messages in thread
* Re: [PATCH RFC v5 0/2] nfsd: Initial implementation of NFSv4 Courteous Server
  2021-11-30 13:37 ` Trond Myklebust
@ 2021-12-01  3:52 ` dai.ngo
  2021-12-01 14:19 ` bfields
  0 siblings, 1 reply; 53+ messages in thread
From: dai.ngo @ 2021-12-01 3:52 UTC (permalink / raw)
To: Trond Myklebust, chuck.lever; +Cc: bfields, linux-nfs, linux-fsdevel

On 11/30/21 5:37 AM, Trond Myklebust wrote:
> [...]
>>>> Dai’s implementation does all that, and takes the coarser approach at
>>>> the moment.
There are plans to explore the more nuanced behavior >>>> (by >>>> revoking only the conflicting lock instead of dropping the whole >>>> lease) after this initial work is merged. >>>> >>>> The issue is there are certain pathological client behaviors >>>> (whether >>>> malicious or accidental) that can run the server out of >>>> resources, >>>> since it is holding onto lease state for a much longer time. We >>>> are >>>> simply trying to design a lease garbage collection scheme to meet >>>> that challenge. >>>> >>>> I think limiting the number of courteous clients is a simple way >>>> to >>>> do this, but we could also shorten the courtesy lifetime as more >>>> clients enter that state, to ensure that they don’t overrun the >>>> server’s memory. Another approach might be to add a shrinker that >>>> purges the oldest courteous clients when the server comes under >>>> memory pressure. >>>> >>>> >>> We already have a scanner that tries to release all client state >>> after >>> 1 lease period. Just extend that to do it after 10 lease periods. >>> If a >>> network partition hasn't recovered after 10 minutes, you probably >>> have >>> bigger problems. >> Currently the courteous server allows 24hr for the network partition >> to >> heal before releasing all client state. That seems to be excessive >> but >> it was suggested for longer network partition conditions when >> switch/routers >> being repaired/upgraded. >> >>> You can limit the number of clients as well, but that leads into a >>> rats >>> nest of other issues that have nothing to do with courtesy locks >>> and >>> everything to do with the fact that any client can hold a lot of >>> state. >> The issue we currently have with courteous server and pynfs 4.0 tests >> is the number of courteous 4.0 clients the server has to expire when >> a >> share reservation conflict occurs when servicing the OPEN. 
Each >> client >> owns only few state in this case so we think the server spent most >> time >> for deleting client's record in /var/lib/nfs. This is why we plan to >> limit the number of courteous clients for now. As a side effect, it >> might >> also help to reduce resource consumption too. > Then kick off a thread or work item to do that asynchronously in the > background, and return NFS4ERR_DELAY to the clients that were trying to > grab locks in the meantime. Thanks Trond, I think this is a reasonable approach. The behavior would be similar to a delegation recall during the OPEN. My plan is: 1. If the number of conflict clients is less than 100 (some numbers that cover realistic usage) then release all their state synchronously in the OPEN call, and returns NFS4_OK to the NFS client. Most of conflicts should be handled by this case. 2. If the number of conflict clients is more than 100 then release the state of the 1st 100 clients as in (1) and trigger the laundromat thread to release state of the rest of the conflict clients, and return NFS4ERR_DELAY to the NFS client. This should be a rare condition. -Dai > > The above process is hardly just confined to NFSv4.0 clients. If there > is a network partition, then the exact same record deleting needs to be > applied to all NFSv4.1 and NFSv4.2 clients that hold locks and are > unable to renew their leases, so you might as well make it work for > everyone. > ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH RFC v5 0/2] nfsd: Initial implementation of NFSv4 Courteous Server
  2021-12-01 3:52 ` dai.ngo
@ 2021-12-01 14:19 ` bfields
  0 siblings, 0 replies; 53+ messages in thread
From: bfields @ 2021-12-01 14:19 UTC (permalink / raw)
  To: dai.ngo; +Cc: Trond Myklebust, chuck.lever, linux-nfs, linux-fsdevel

On Tue, Nov 30, 2021 at 07:52:10PM -0800, dai.ngo@oracle.com wrote:
> On 11/30/21 5:37 AM, Trond Myklebust wrote:
> >Then kick off a thread or work item to do that asynchronously in the
> >background, and return NFS4ERR_DELAY to the clients that were trying to
> >grab locks in the meantime.
> 
> Thanks Trond, I think this is a reasonable approach. The behavior would
> be similar to a delegation recall during the OPEN.
> 
> My plan is:
> 
> 1. If the number of conflict clients is less than 100 (some numbers that
> cover realistic usage) then release all their state synchronously in
> the OPEN call, and returns NFS4_OK to the NFS client. Most of conflicts
> should be handled by this case.
> 
> 2. If the number of conflict clients is more than 100 then release the
> state of the 1st 100 clients as in (1) and trigger the laundromat thread
> to release state of the rest of the conflict clients, and return
> NFS4ERR_DELAY to the NFS client. This should be a rare condition.

Honestly, conflict with a courtesy client is itself not going to be that
common, so personally I'd start simple and handle everything with the
asynchronous approach.

--b.

^ permalink raw reply	[flat|nested] 53+ messages in thread
* Re: [PATCH RFC v5 0/2] nfsd: Initial implementation of NFSv4 Courteous Server
  2021-11-30 7:22 ` dai.ngo
  2021-11-30 13:37 ` Trond Myklebust
@ 2021-11-30 15:36 ` Chuck Lever III
  2021-11-30 16:05 ` Bruce Fields
  1 sibling, 1 reply; 53+ messages in thread
From: Chuck Lever III @ 2021-11-30 15:36 UTC (permalink / raw)
  To: Dai Ngo
  Cc: Trond Myklebust, Bruce Fields, Linux NFS Mailing List, linux-fsdevel

> On Nov 30, 2021, at 2:22 AM, Dai Ngo <dai.ngo@oracle.com> wrote:
> 
> On 11/29/21 8:57 PM, Trond Myklebust wrote:
>> On Tue, 2021-11-30 at 04:47 +0000, Chuck Lever III wrote:
>>>> On Nov 29, 2021, at 11:08 PM, Trond Myklebust <trondmy@hammerspace.com> wrote:
>>>> 
>>>> On Tue, 2021-11-30 at 01:42 +0000, Chuck Lever III wrote:
>>>>>>> On Nov 29, 2021, at 7:11 PM, Dai Ngo <dai.ngo@oracle.com> wrote:
>>>>>> 
>>>>>>> On 11/29/21 1:10 PM, Chuck Lever III wrote:
>>>>>>> 
>>>>>>>>> On Nov 29, 2021, at 2:36 PM, Dai Ngo <dai.ngo@oracle.com> wrote:
>>>>>>>> 
>>>>>>>> On 11/29/21 11:03 AM, Chuck Lever III wrote:
>>>>>>>>> Hello Dai!
>>>>>>>>> 
>>>>>>>>>> On Nov 29, 2021, at 1:32 PM, Dai Ngo <dai.ngo@oracle.com> wrote:
>>>>>>>>>> 
>>>>>>>>>> On 11/29/21 9:30 AM, J. Bruce Fields wrote:
>>>>>>>>>>> On Mon, Nov 29, 2021 at 09:13:16AM -0800, dai.ngo@oracle.com wrote:
>>>>>>>>>>>> Hi Bruce,
>>>>>>>>>>>> 
>>>>>>>>>>>> On 11/21/21 7:04 PM, dai.ngo@oracle.com wrote:
>>>>>>>>>>>>> On 11/17/21 4:34 PM, J. Bruce Fields wrote:
>>>>>>>>>>>>>> On Wed, Nov 17, 2021 at 01:46:02PM -0800, dai.ngo@oracle.com wrote:
>>>>>>>>>>>>>>> On 11/17/21 9:59 AM, dai.ngo@oracle.com wrote:
>>>>>>>>>>>>>>>> On 11/17/21 6:14 AM, J. Bruce Fields wrote:
>>>>>>>>>>>>>>>>> On Tue, Nov 16, 2021 at 03:06:32PM -0800, dai.ngo@oracle.com wrote:
>>>>>>>>>>>>>>>>>> Just a reminder that this patch is still waiting for your review.
>>>>>>>>>>>>>>>>> Yeah, I was procrastinating and hoping you'd figure out the pynfs failure for me....
>>>>>>>>>>>>>>>> Last time I ran the 4.0 OPEN18 test by itself and it passed. I will run all OPEN tests together with 5.15-rc7 to see if the problem you've seen is still there.
>>>>>>>>>>>>>>> I ran all tests in nfsv4.1 and nfsv4.0 with courteous and non-courteous 5.15-rc7 server.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Nfs4.1 results are the same for both courteous and non-courteous server:
>>>>>>>>>>>>>>>> Of those: 0 Skipped, 0 Failed, 0 Warned, 169 Passed
>>>>>>>>>>>>>>> Results of nfs4.0 with non-courteous server:
>>>>>>>>>>>>>>>> Of those: 8 Skipped, 1 Failed, 0 Warned, 577 Passed
>>>>>>>>>>>>>>> test failed: LOCK24
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Results of nfs4.0 with courteous server:
>>>>>>>>>>>>>>>> Of those: 8 Skipped, 3 Failed, 0 Warned, 575 Passed
>>>>>>>>>>>>>>> tests failed: LOCK24, OPEN18, OPEN30
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> OPEN18 and OPEN30 pass if each is run by itself.
>>>>>>>>>>>>>> Could well be a bug in the tests, I don't know.
>>>>>>>>>>>>> The reason OPEN18 failed was because the test timed out waiting for the reply of an OPEN call. The RPC connection used for the test was configured with a 15-second timeout. Note that OPEN18 only fails when the tests are run with the 'all' option; this test passes if it's run by itself.
>>>>>>>>>>>>>
>>>>>>>>>>>>> With the courteous server, by the time OPEN18 runs, there are about 1026 courtesy 4.0 clients on the server and all of these clients have opened the same file X with WRITE access. These clients were created by the previous tests. After each test completed, since 4.0 does not have sessions, the client states are not cleaned up immediately on the server and are allowed to become courtesy clients.
>>>>>>>>>>>>>
>>>>>>>>>>>>> When OPEN18 runs (about 20 minutes after the 1st test started), it sends OPEN of file X with OPEN4_SHARE_DENY_WRITE which causes the server to check for conflicts with courtesy clients. The loop that checks 1026 courtesy clients for share/access conflict took less than 1 sec. But it took about 55 secs, on my VM, for the server to expire all 1026 courtesy clients.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I modified pynfs to configure the 4.0 RPC connection with a 60-second timeout and OPEN18 now consistently passed. The 4.0 test results are now the same for courteous and non-courteous server:
>>>>>>>>>>>>>
>>>>>>>>>>>>> 8 Skipped, 1 Failed, 0 Warned, 577 Passed
>>>>>>>>>>>>>
>>>>>>>>>>>>> Note that 4.1 tests do not suffer this timeout problem because the 4.1 clients and sessions are destroyed after each test completes.
>>>>>>>>>>>> Do you want me to send the patch to increase the timeout for pynfs? or is there any other things you think we should do?
>>>>>>>>>>> I don't know.
>>>>>>>>>>>
>>>>>>>>>>> 55 seconds to clean up 1026 clients is about 50ms per client, which is pretty slow. I wonder why. I guess it's probably updating the stable storage information. Is /var/lib/nfs/ on your server backed by a hard drive or an SSD or something else?
>>>>>>>>>> My server is a virtualbox VM that has 1 CPU, 4GB RAM and 64GB of hard disk. I think a production system that supports this many clients should have faster CPUs, faster storage.
>>>>>>>>>>
>>>>>>>>>>> I wonder if that's an argument for limiting the number of courtesy clients.
>>>>>>>>>> I think we might want to treat 4.0 clients a bit different from 4.1 clients. With 4.0, every client will become a courtesy client after the client is done with the export and unmounts it.
>>>>>>>>> It should be safe for a server to purge a client's lease immediately if there is no open or lock state associated with it.
>>>>>>>> In this case, each client has opened files so there are open states associated with them.
>>>>>>>>
>>>>>>>>> When an NFSv4.0 client unmounts, all files should be closed at that point,
>>>>>>>> I'm not sure pynfs does proper clean up after each subtest, I will check. There must be state associated with the client in order for it to become a courtesy client.
>>>>>>> Makes sense. Then a synthetic client like pynfs can DoS a courteous server.
>>>>>>>
>>>>>>>>> so the server can wait for the lease to expire and purge it normally. Or am I missing something?
>>>>>>>> When a 4.0 client's lease expires and there are still states associated with the client, the server allows this client to become a courtesy client.
>>>>>>> I think the same thing happens if an NFSv4.1 client neglects to send DESTROY_SESSION / DESTROY_CLIENTID. Either such a client is broken or malicious, but the server faces the same issue of protecting itself from a DoS attack.
>>>>>>>
>>>>>>> IMO you should consider limiting the number of courteous clients the server can hold onto. Let's say that number is 1000. When the server wants to turn a 1001st client into a courteous client, it can simply expire and purge the oldest courteous client on its list. Otherwise, over time, the 24-hour expiry will reduce the set of courteous clients back to zero.
>>>>>>>
>>>>>>> What do you think?
>>>>>> Limiting the number of courteous clients to handle the cases of broken/malicious 4.1 clients seems reasonable as a last resort.
>>>>>>
>>>>>> I think if a malicious 4.1 client could mount the server's export, open a file (to create state) and repeat the same with a different client id, then it seems like some basic security was already broken; allowing unauthorized clients to mount the server's exports.
>>>>> You can do this today with AUTH_SYS. I consider it a genuine attack surface.
>>>>>
>>>>>> I think if we have to enforce a limit, then it's only for handling seriously buggy 4.1 clients, which should not be the norm. The issue with this is how to pick an optimal number that is suitable for the running server, which can be a very slow or a very fast server.
>>>>>>
>>>>>> Note that even if we impose a limit, that does not completely solve the problem with the pynfs 4.0 test since its RPC timeout is configured with 15 secs, which is just enough to expire 277 clients based on 53ms for each client, unless we limit it to ~270 clients, which I think is too low.
>>>>>>
>>>>>> This is what I plan to do:
>>>>>>
>>>>>> 1. do not support 4.0 courteous clients, for sure.
>>>>> Not supporting 4.0 isn’t an option, IMHO. It is a fully supported protocol at this time, and the same exposure exists for 4.1; it’s just a little harder to exploit.
>>>>>
>>>>> If you submit the courteous server patch without support for 4.0, I think it needs to include a plan for how 4.0 will be added later.
>>>>>
>>>> Why is there a problem here? The requirements are the same for 4.0 and 4.1 (or 4.2). If the lease under which the courtesy lock was established has expired, then that courtesy lock must be released if some other client requests a lock that conflicts with the cached lock (unless the client breaks the courtesy framework by renewing that original lease before the conflict occurs). Otherwise, it is completely up to the server when it decides to actually release the lock.
>>>>
>>>> For NFSv4.1 and NFSv4.2, we have DESTROY_CLIENTID, which tells the server when the client is actually done with the lease, making it easy to determine when it is safe to release all the courtesy locks. However, if the client does not send DESTROY_CLIENTID, then we're in the same situation with 4.x (x>0) as we would be with bog standard NFSv4.0. The lease has expired, and so the courtesy locks are liable to being dropped.
>>> I agree the situation is the same for all minor versions.
>>>
>>>> At Hammerspace we have implemented courtesy locks, and our strategy is that when a conflict occurs, we drop the entire set of courtesy locks so that we don't have to deal with the "some locks were revoked" scenario. The reason is that when we originally implemented courtesy locks, the Linux NFSv4 client support for lock revocation was a lot less sophisticated than today. My suggestion is that you might therefore consider starting along this path, and then refining the support to make revocation more nuanced once you are confident that the coarser strategy is working as expected.
>>> Dai’s implementation does all that, and takes the coarser approach at the moment. There are plans to explore the more nuanced behavior (by revoking only the conflicting lock instead of dropping the whole lease) after this initial work is merged.
>>>
>>> The issue is there are certain pathological client behaviors (whether malicious or accidental) that can run the server out of resources, since it is holding onto lease state for a much longer time. We are simply trying to design a lease garbage collection scheme to meet that challenge.
>>>
>>> I think limiting the number of courteous clients is a simple way to do this, but we could also shorten the courtesy lifetime as more clients enter that state, to ensure that they don’t overrun the server’s memory. Another approach might be to add a shrinker that purges the oldest courteous clients when the server comes under memory pressure.
>>>
>> We already have a scanner that tries to release all client state after 1 lease period. Just extend that to do it after 10 lease periods. If a network partition hasn't recovered after 10 minutes, you probably have bigger problems.
> 
> Currently the courteous server allows 24hr for the network partition to heal before releasing all client state. That seems to be excessive, but it was suggested for longer network partition conditions when switches/routers are being repaired/upgraded.

Sure, 24 hours is a long time. For the benefit of others on the list,
we have seen customer failure scenarios where networks were partitioned
for that long. But it's an arbitrary number, and there's no specification
for how long a server needs to hold a courtesy client. We can make this
number anything that is both convenient for the server implementation
and valuable for outage recovery.

>> You can limit the number of clients as well, but that leads into a rat's nest of other issues that have nothing to do with courtesy locks and everything to do with the fact that any client can hold a lot of state.
> 
> The issue we currently have with the courteous server and the pynfs 4.0 tests is the number of courteous 4.0 clients the server has to expire when a share reservation conflict occurs while servicing the OPEN. Each client owns only a few states in this case, so we think the server spends most of the time deleting the client's record in /var/lib/nfs. This is why we plan to limit the number of courteous clients for now. As a side effect, it might also help to reduce resource consumption too.

I am a little concerned that we are trying to optimize a case that
won't happen during practice. pynfs does not reflect any kind of
realistic or reasonable client behavior -- it's designed to test very
specific server operations.

All that needs to happen, IMO, is that the server needs to protect
itself from resource exhaustion (which may occur for any of the minor
versions). So I'm taking a shine to the idea of using a shrinker to
trim the older courtesy clients, rather than placing an arbitrary limit
on the number of courtesy clients the server can hold at once. A shrinker
should take into account the wide variance in the amount of lease state
each client might have.

--
Chuck Lever

^ permalink raw reply	[flat|nested] 53+ messages in thread
* Re: [PATCH RFC v5 0/2] nfsd: Initial implementation of NFSv4 Courteous Server 2021-11-30 15:36 ` Chuck Lever III @ 2021-11-30 16:05 ` Bruce Fields 2021-11-30 16:14 ` Trond Myklebust 0 siblings, 1 reply; 53+ messages in thread From: Bruce Fields @ 2021-11-30 16:05 UTC (permalink / raw) To: Chuck Lever III Cc: Dai Ngo, Trond Myklebust, Linux NFS Mailing List, linux-fsdevel On Tue, Nov 30, 2021 at 03:36:43PM +0000, Chuck Lever III wrote: > I am a little concerned that we are trying to optimize a case > that won't happen during practice. pynfs does not reflect any > kind of realistic or reasonable client behavior -- it's designed > to test very specific server operations. I wonder how hard this problem would be to hit in normal use. I mean, a few hundred or a thousand clients doesn't sound that crazy. This case depends on an open deny, but you could hit the same problem with file locks. Would it be that weird to have a client trying to get a write lock on a file read-locked by a bunch of other clients? --b. ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH RFC v5 0/2] nfsd: Initial implementation of NFSv4 Courteous Server 2021-11-30 16:05 ` Bruce Fields @ 2021-11-30 16:14 ` Trond Myklebust 2021-11-30 19:01 ` bfields 0 siblings, 1 reply; 53+ messages in thread From: Trond Myklebust @ 2021-11-30 16:14 UTC (permalink / raw) To: bfields, chuck.lever; +Cc: dai.ngo, linux-nfs, linux-fsdevel On Tue, 2021-11-30 at 11:05 -0500, Bruce Fields wrote: > On Tue, Nov 30, 2021 at 03:36:43PM +0000, Chuck Lever III wrote: > > I am a little concerned that we are trying to optimize a case > > that won't happen during practice. pynfs does not reflect any > > kind of realistic or reasonable client behavior -- it's designed > > to test very specific server operations. > > I wonder how hard this problem would be to hit in normal use. I > mean, a > few hundred or a thousand clients doesn't sound that crazy. This > case > depends on an open deny, but you could hit the same problem with file > locks. Would it be that weird to have a client trying to get a write > lock on a file read-locked by a bunch of other clients? > That's a scenario that is subject to starvation problems anyway. Particularly so on NFSv4.0, which lacks CB_NOTIFY_LOCK. -- Trond Myklebust Linux NFS client maintainer, Hammerspace trond.myklebust@hammerspace.com ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH RFC v5 0/2] nfsd: Initial implementation of NFSv4 Courteous Server 2021-11-30 16:14 ` Trond Myklebust @ 2021-11-30 19:01 ` bfields 0 siblings, 0 replies; 53+ messages in thread From: bfields @ 2021-11-30 19:01 UTC (permalink / raw) To: Trond Myklebust; +Cc: chuck.lever, dai.ngo, linux-nfs, linux-fsdevel On Tue, Nov 30, 2021 at 04:14:10PM +0000, Trond Myklebust wrote: > On Tue, 2021-11-30 at 11:05 -0500, Bruce Fields wrote: > > On Tue, Nov 30, 2021 at 03:36:43PM +0000, Chuck Lever III wrote: > > > I am a little concerned that we are trying to optimize a case > > > that won't happen during practice. pynfs does not reflect any > > > kind of realistic or reasonable client behavior -- it's designed > > > to test very specific server operations. > > > > I wonder how hard this problem would be to hit in normal use. I > > mean, a > > few hundred or a thousand clients doesn't sound that crazy. This > > case > > depends on an open deny, but you could hit the same problem with file > > locks. Would it be that weird to have a client trying to get a write > > lock on a file read-locked by a bunch of other clients? > > > > That's a scenario that is subject to starvation problems anyway. Yes, if it's hundreds of clients continuously grabbing read locks. But if it's something like: send all the readers a signal, then request a write lock as a way to wait for them to finish; then you'd normally expect to get it soon after the last client drops its lock. I don't know, maybe that's uncommon. --b. ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH RFC v5 0/2] nfsd: Initial implementation of NFSv4 Courteous Server 2021-11-30 1:42 ` Chuck Lever III 2021-11-30 4:08 ` Trond Myklebust @ 2021-11-30 7:13 ` dai.ngo 2021-11-30 15:32 ` Bruce Fields 1 sibling, 1 reply; 53+ messages in thread From: dai.ngo @ 2021-11-30 7:13 UTC (permalink / raw) To: Chuck Lever III; +Cc: Bruce Fields, Linux NFS Mailing List, linux-fsdevel On 11/29/21 5:42 PM, Chuck Lever III wrote: >> On Nov 29, 2021, at 7:11 PM, Dai Ngo <dai.ngo@oracle.com> wrote: >> >> >>> On 11/29/21 1:10 PM, Chuck Lever III wrote: >>> >>>>> On Nov 29, 2021, at 2:36 PM, Dai Ngo <dai.ngo@oracle.com> wrote: >>>> >>>> On 11/29/21 11:03 AM, Chuck Lever III wrote: >>>>> Hello Dai! >>>>> >>>>> >>>>>> On Nov 29, 2021, at 1:32 PM, Dai Ngo <dai.ngo@oracle.com> wrote: >>>>>> >>>>>> >>>>>> On 11/29/21 9:30 AM, J. Bruce Fields wrote: >>>>>>> On Mon, Nov 29, 2021 at 09:13:16AM -0800, dai.ngo@oracle.com wrote: >>>>>>>> Hi Bruce, >>>>>>>> >>>>>>>> On 11/21/21 7:04 PM, dai.ngo@oracle.com wrote: >>>>>>>>> On 11/17/21 4:34 PM, J. Bruce Fields wrote: >>>>>>>>>> On Wed, Nov 17, 2021 at 01:46:02PM -0800, dai.ngo@oracle.com wrote: >>>>>>>>>>> On 11/17/21 9:59 AM, dai.ngo@oracle.com wrote: >>>>>>>>>>>> On 11/17/21 6:14 AM, J. Bruce Fields wrote: >>>>>>>>>>>>> On Tue, Nov 16, 2021 at 03:06:32PM -0800, dai.ngo@oracle.com wrote: >>>>>>>>>>>>>> Just a reminder that this patch is still waiting for your review. >>>>>>>>>>>>> Yeah, I was procrastinating and hoping yo'ud figure out the pynfs >>>>>>>>>>>>> failure for me.... >>>>>>>>>>>> Last time I ran 4.0 OPEN18 test by itself and it passed. I will run >>>>>>>>>>>> all OPEN tests together with 5.15-rc7 to see if the problem you've >>>>>>>>>>>> seen still there. >>>>>>>>>>> I ran all tests in nfsv4.1 and nfsv4.0 with courteous and non-courteous >>>>>>>>>>> 5.15-rc7 server. 
>>>>>>>>>>> >>>>>>>>>>> Nfs4.1 results are the same for both courteous and >>>>>>>>>>> non-courteous server: >>>>>>>>>>>> Of those: 0 Skipped, 0 Failed, 0 Warned, 169 Passed >>>>>>>>>>> Results of nfs4.0 with non-courteous server: >>>>>>>>>>>> Of those: 8 Skipped, 1 Failed, 0 Warned, 577 Passed >>>>>>>>>>> test failed: LOCK24 >>>>>>>>>>> >>>>>>>>>>> Results of nfs4.0 with courteous server: >>>>>>>>>>>> Of those: 8 Skipped, 3 Failed, 0 Warned, 575 Passed >>>>>>>>>>> tests failed: LOCK24, OPEN18, OPEN30 >>>>>>>>>>> >>>>>>>>>>> OPEN18 and OPEN30 test pass if each is run by itself. >>>>>>>>>> Could well be a bug in the tests, I don't know. >>>>>>>>> The reason OPEN18 failed was because the test timed out waiting for >>>>>>>>> the reply of an OPEN call. The RPC connection used for the test was >>>>>>>>> configured with 15 secs timeout. Note that OPEN18 only fails when >>>>>>>>> the tests were run with 'all' option, this test passes if it's run >>>>>>>>> by itself. >>>>>>>>> >>>>>>>>> With courteous server, by the time OPEN18 runs, there are about 1026 >>>>>>>>> courtesy 4.0 clients on the server and all of these clients have opened >>>>>>>>> the same file X with WRITE access. These clients were created by the >>>>>>>>> previous tests. After each test completed, since 4.0 does not have >>>>>>>>> session, the client states are not cleaned up immediately on the >>>>>>>>> server and are allowed to become courtesy clients. >>>>>>>>> >>>>>>>>> When OPEN18 runs (about 20 minutes after the 1st test started), it >>>>>>>>> sends OPEN of file X with OPEN4_SHARE_DENY_WRITE which causes the >>>>>>>>> server to check for conflicts with courtesy clients. The loop that >>>>>>>>> checks 1026 courtesy clients for share/access conflict took less >>>>>>>>> than 1 sec. But it took about 55 secs, on my VM, for the server >>>>>>>>> to expire all 1026 courtesy clients. 
>>>>>>>>> >>>>>>>>> I modified pynfs to configure the 4.0 RPC connection with 60 seconds >>>>>>>>> timeout and OPEN18 now consistently passed. The 4.0 test results are >>>>>>>>> now the same for courteous and non-courteous server: >>>>>>>>> >>>>>>>>> 8 Skipped, 1 Failed, 0 Warned, 577 Passed >>>>>>>>> >>>>>>>>> Note that 4.1 tests do not suffer this timeout problem because the >>>>>>>>> 4.1 clients and sessions are destroyed after each test completes. >>>>>>>> Do you want me to send the patch to increase the timeout for pynfs? >>>>>>>> or is there any other things you think we should do? >>>>>>> I don't know. >>>>>>> >>>>>>> 55 seconds to clean up 1026 clients is about 50ms per client, which is >>>>>>> pretty slow. I wonder why. I guess it's probably updating the stable >>>>>>> storage information. Is /var/lib/nfs/ on your server backed by a hard >>>>>>> drive or an SSD or something else? >>>>>> My server is a virtualbox VM that has 1 CPU, 4GB RAM and 64GB of hard >>>>>> disk. I think a production system that supports this many clients should >>>>>> have faster CPUs, faster storage. >>>>>> >>>>>>> I wonder if that's an argument for limiting the number of courtesy >>>>>>> clients. >>>>>> I think we might want to treat 4.0 clients a bit different from 4.1 >>>>>> clients. With 4.0, every client will become a courtesy client after >>>>>> the client is done with the export and unmounts it. >>>>> It should be safe for a server to purge a client's lease immediately >>>>> if there is no open or lock state associated with it. >>>> In this case, each client has opened files so there are open states >>>> associated with them. >>>> >>>>> When an NFSv4.0 client unmounts, all files should be closed at that >>>>> point, >>>> I'm not sure pynfs does proper clean up after each subtest, I will >>>> check. There must be state associated with the client in order for >>>> it to become courtesy client. >>> Makes sense. Then a synthetic client like pynfs can DoS a courteous >>> server. 
>>> >>> >>>>> so the server can wait for the lease to expire and purge it >>>>> normally. Or am I missing something? >>>> When 4.0 client lease expires and there are still states associated >>>> with the client then the server allows this client to become courtesy >>>> client. >>> I think the same thing happens if an NFSv4.1 client neglects to send >>> DESTROY_SESSION / DESTROY_CLIENTID. Either such a client is broken >>> or malicious, but the server faces the same issue of protecting >>> itself from a DoS attack. >>> >>> IMO you should consider limiting the number of courteous clients >>> the server can hold onto. Let's say that number is 1000. When the >>> server wants to turn a 1001st client into a courteous client, it >>> can simply expire and purge the oldest courteous client on its >>> list. Otherwise, over time, the 24-hour expiry will reduce the >>> set of courteous clients back to zero. >>> >>> What do you think? >> Limiting the number of courteous clients to handle the cases of >> broken/malicious 4.1 clients seems reasonable as the last resort. >> >> I think if a malicious 4.1 clients could mount the server's export, >> opens a file (to create state) and repeats the same with a different >> client id then it seems like some basic security was already broken; >> allowing unauthorized clients to mount server's exports. > You can do this today with AUTH_SYS. I consider it a genuine attack surface. > > >> I think if we have to enforce a limit, then it's only for handling >> of seriously buggy 4.1 clients which should not be the norm. The >> issue with this is how to pick an optimal number that is suitable >> for the running server which can be a very slow or a very fast server. 
>> >> Note that even if we impose a limit, that does not completely solve >> the problem with the pynfs 4.0 test since its RPC timeout is configured >> with 15 secs, which is just enough to expire 277 clients at 53ms >> each, unless we limit it to ~270 clients, which I think is >> too low. >> >> This is what I plan to do: >> >> 1. do not support 4.0 courteous clients, for sure. > Not supporting 4.0 isn’t an option, IMHO. It is a fully supported protocol at this time, and the same exposure exists for 4.1, it’s just a little harder to exploit. > > If you submit the courteous server patch without support for 4.0, I think it needs to include a plan for how 4.0 will be added later. Seems like we should support both 4.0 and 4.x (x>=1) at the same time. > > >> 2. limit the number of courteous clients to 1000 (?), if you still >> think we need it. > I think this limit is necessary. It can be set based on the server’s physical memory size if a dynamic limit is desired. Just to be clear, the problem of pynfs with 4.0 is that the server takes ~55 secs to expire 1026 4.0 courteous clients, which comes out to ~50ms per client. This causes the test to time out waiting for the RPC reply of the OPEN that triggers the conflicts. I don't know exactly where the time is spent in the process of expiring a client. But as Bruce mentioned, it could be related to the time to access /var/lib/nfs to remove the client's persistent record. I think that is most likely the case because the number of states owned by each client should be small since each test is short and does simple ops. So I think this problem is related to the number of clients and not the number of states owned by the clients. This is not the memory resource shortage problem due to too many states, which we have planned to address after the initial phase. I'd vote to use a static limit for now, say 1000 clients, to avoid complicating the courteous server code for something that would not happen most of the time. 
-Dai > > >> Pls let me know what you think. >> >> Thanks, >> -Dai >> >>> >>>>>> Since there is >>>>>> no destroy session/client with 4.0, the courteous server allows the >>>>>> client to be around and becomes a courtesy client. So after a while, >>>>>> even with normal usage, there will be lots of 4.0 courtesy clients >>>>>> hanging around and these clients won't be destroyed until 24hrs >>>>>> later, or until they cause conflicts with other clients. >>>>>> >>>>>> We can reduce the courtesy_client_expiry time for 4.0 clients from >>>>>> 24hrs to 15/20 mins, enough for most network partitions to heal?, >>>>>> or limit the number of 4.0 courtesy clients. Or don't support 4.0 >>>>>> clients at all, which is my preference since I think in general users >>>>>> should skip 4.0 and use 4.1 instead. >>>>>> >>>>>> -Dai >>>>> -- >>>>> Chuck Lever >>>>> >>>>> >>>>> >>> -- >>> Chuck Lever >>> >>> >>> ^ permalink raw reply [flat|nested] 53+ messages in thread
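[Editor's note] The timeout arithmetic discussed above is easy to sanity-check. A minimal sketch (the ~53 ms per-client figure, the 15-second pynfs RPC timeout, and the 1026-client count are taken from the thread; all are approximate):

```python
# Back-of-the-envelope check of the numbers in the thread above:
# at ~53 ms to expire each courtesy client, a 15-second RPC timeout
# covers only a few hundred clients, while the full pynfs 'all' run
# accumulates 1026 of them.
PER_CLIENT_MS = 53            # approximate cost to expire one client
RPC_TIMEOUT_MS = 15 * 1000    # pynfs 4.0 RPC timeout
NUM_CLIENTS = 1026            # courtesy clients created by the run

max_within_timeout = RPC_TIMEOUT_MS // PER_CLIENT_MS
total_seconds = NUM_CLIENTS * PER_CLIENT_MS / 1000

print(max_within_timeout)     # 283 clients fit in one 15 s timeout
print(round(total_seconds))   # 54 -- ~54 s for all 1026, matching the ~55 s observed
```

This is why raising the RPC timeout to 60 seconds (as done for the pynfs run) clears the failure, and why a ~270-client limit would be needed to stay under the default 15-second timeout.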
* Re: [PATCH RFC v5 0/2] nfsd: Initial implementation of NFSv4 Courteous Server 2021-11-30 7:13 ` dai.ngo @ 2021-11-30 15:32 ` Bruce Fields 2021-12-01 3:50 ` dai.ngo 0 siblings, 1 reply; 53+ messages in thread From: Bruce Fields @ 2021-11-30 15:32 UTC (permalink / raw) To: dai.ngo; +Cc: Chuck Lever III, Linux NFS Mailing List, linux-fsdevel On Mon, Nov 29, 2021 at 11:13:34PM -0800, dai.ngo@oracle.com wrote: > Just to be clear, the problem of pynfs with 4.0 is that the server takes > ~55 secs to expire 1026 4.0 courteous clients, which comes out to ~50ms > per client. This causes the test to time out in waiting for RPC reply of > the OPEN that triggers the conflicts. > > I don't know exactly where the time spent in the process of expiring a > client. But as Bruce mentioned, it could be related to the time to access > /var/lib/nfs to remove the client's persistent record. Could you try something like strace -r -$(pidof) -oTRACE and maybe we could take a look at TRACE? My hope would be that there'd be a clear set of syscalls whose time, multiplied by 1026, explains most of that 55 seconds. Then it might be worth checking whether there are any easy optimizations possible. --b. ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH RFC v5 0/2] nfsd: Initial implementation of NFSv4 Courteous Server 2021-11-30 15:32 ` Bruce Fields @ 2021-12-01 3:50 ` dai.ngo 2021-12-01 14:36 ` Bruce Fields 0 siblings, 1 reply; 53+ messages in thread From: dai.ngo @ 2021-12-01 3:50 UTC (permalink / raw) To: Bruce Fields; +Cc: Chuck Lever III, Linux NFS Mailing List, linux-fsdevel On 11/30/21 7:32 AM, Bruce Fields wrote: > On Mon, Nov 29, 2021 at 11:13:34PM -0800, dai.ngo@oracle.com wrote: >> Just to be clear, the problem of pynfs with 4.0 is that the server takes >> ~55 secs to expire 1026 4.0 courteous clients, which comes out to ~50ms >> per client. This causes the test to time out in waiting for RPC reply of >> the OPEN that triggers the conflicts. >> >> I don't know exactly where the time spent in the process of expiring a >> client. But as Bruce mentioned, it could be related to the time to access >> /var/lib/nfs to remove the client's persistent record. > Could you try something like > > strace -r -$(pidof) -oTRACE Strace does not have any info that shows where the server spent time when expiring client state. The client record is removed by nfsd4_umh_cltrack_remove doing upcall to user space helper /sbin/nfsdcltrack to do the job. I used the low-tech debug tool, printk, to measure the time spent by nfsd4_client_record_remove. 
Here is a sample of the output; START and END are in milliseconds:

Nov 30 12:31:04 localhost kernel: nfsd4_client_record_remove: START [0x15d418] clp[ffff888119206040] client_tracking_ops[ffffffffa04bc2e0]
Nov 30 12:31:04 localhost kernel: nfsd4_client_record_remove: END [0x15d459] clp[ffff888119206040] client_tracking_ops[ffffffffa04bc2e0]
Nov 30 12:31:04 localhost kernel: nfsd4_client_record_remove: START [0x15d461] clp[ffff888119206740] client_tracking_ops[ffffffffa04bc2e0]
Nov 30 12:31:04 localhost kernel: nfsd4_client_record_remove: END [0x15d48e] clp[ffff888119206740] client_tracking_ops[ffffffffa04bc2e0]
Nov 30 12:31:04 localhost kernel: nfsd4_client_record_remove: START [0x15d49c] clp[ffff88811b54e000] client_tracking_ops[ffffffffa04bc2e0]
Nov 30 12:31:04 localhost kernel: nfsd4_client_record_remove: END [0x15d4c5] clp[ffff88811b54e000] client_tracking_ops[ffffffffa04bc2e0]

The average time spent removing the client record is about 50ms, which matches the time reported by the pynfs test. This confirms what Bruce suspected earlier. -Dai > > and maybe we could take a look at TRACE? My hope would be that there'd > be a clear set of syscalls whose time, multiplied by 1026, explains most > of that 55 seconds. Then it might be worth checking whether there are > any easy optimizations possible. > > --b. ^ permalink raw reply [flat|nested] 53+ messages in thread
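[Editor's note] The START/END values in the log above are millisecond timestamps printed in hex, so the per-call cost can be recovered directly from the three samples shown:

```python
# Subtract each START/END pair from the printk output above;
# the bracketed values are millisecond timestamps in hex.
samples = [
    (0x15d418, 0x15d459),
    (0x15d461, 0x15d48e),
    (0x15d49c, 0x15d4c5),
]
deltas_ms = [end - start for start, end in samples]
print(deltas_ms)                        # [65, 45, 41]
print(sum(deltas_ms) / len(deltas_ms))  # ~50 ms average, as reported
```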
* Re: [PATCH RFC v5 0/2] nfsd: Initial implementation of NFSv4 Courteous Server 2021-12-01 3:50 ` dai.ngo @ 2021-12-01 14:36 ` Bruce Fields 2021-12-01 14:51 ` Bruce Fields 2021-12-01 17:42 ` Bruce Fields 0 siblings, 2 replies; 53+ messages in thread From: Bruce Fields @ 2021-12-01 14:36 UTC (permalink / raw) To: dai.ngo; +Cc: Chuck Lever III, Linux NFS Mailing List, linux-fsdevel On Tue, Nov 30, 2021 at 07:50:13PM -0800, dai.ngo@oracle.com wrote: > > On 11/30/21 7:32 AM, Bruce Fields wrote: > >On Mon, Nov 29, 2021 at 11:13:34PM -0800, dai.ngo@oracle.com wrote: > >>Just to be clear, the problem of pynfs with 4.0 is that the server takes > >>~55 secs to expire 1026 4.0 courteous clients, which comes out to ~50ms > >>per client. This causes the test to time out in waiting for RPC reply of > >>the OPEN that triggers the conflicts. > >> > >>I don't know exactly where the time spent in the process of expiring a > >>client. But as Bruce mentioned, it could be related to the time to access > >>/var/lib/nfs to remove the client's persistent record. > >Could you try something like > > > > strace -r -$(pidof) -oTRACE Oops, I mean $(pidof nfsdcld). But, your system isn't using that: > > Strace does not have any info that shows where the server spent time when > expiring client state. The client record is removed by nfsd4_umh_cltrack_remove > doing upcall to user space helper /sbin/nfsdcltrack to do the job. I used > the low-tech debug tool, printk, to measure the time spent by > nfsd4_client_record_remove. 
Here is a sample of the output, START and END > are in milliseconds: > > Nov 30 12:31:04 localhost kernel: nfsd4_client_record_remove: START [0x15d418] clp[ffff888119206040] client_tracking_ops[ffffffffa04bc2e0] > Nov 30 12:31:04 localhost kernel: nfsd4_client_record_remove: END [0x15d459] clp[ffff888119206040] client_tracking_ops[ffffffffa04bc2e0] > Nov 30 12:31:04 localhost kernel: nfsd4_client_record_remove: START [0x15d461] clp[ffff888119206740] client_tracking_ops[ffffffffa04bc2e0] > Nov 30 12:31:04 localhost kernel: nfsd4_client_record_remove: END [0x15d48e] clp[ffff888119206740] client_tracking_ops[ffffffffa04bc2e0] > Nov 30 12:31:04 localhost kernel: nfsd4_client_record_remove: START [0x15d49c] clp[ffff88811b54e000] client_tracking_ops[ffffffffa04bc2e0] > Nov 30 12:31:04 localhost kernel: nfsd4_client_record_remove: END [0x15d4c5] clp[ffff88811b54e000] client_tracking_ops[ffffffffa04bc2e0] > > The average time spent to remove the client record is about ~50ms, matches > with the time reported by pynfs test. This confirms what Bruce suspected > earlier. OK, good to know. It'd be interesting to dig into where nfsdcltrack is spending its time, which we could do by replacing it with a wrapper that runs the real nfsdcltrack under strace. Though maybe it'd be better to do this on a system using nfsdcld, since that's what we're transitioning to. --b. ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH RFC v5 0/2] nfsd: Initial implementation of NFSv4 Courteous Server 2021-12-01 14:36 ` Bruce Fields @ 2021-12-01 14:51 ` Bruce Fields 2021-12-01 18:47 ` dai.ngo 2021-12-02 17:53 ` Chuck Lever III 2021-12-01 17:42 ` Bruce Fields 1 sibling, 2 replies; 53+ messages in thread From: Bruce Fields @ 2021-12-01 14:51 UTC (permalink / raw) To: dai.ngo; +Cc: Chuck Lever III, Linux NFS Mailing List, linux-fsdevel Do you have a public git tree with your latest patches? --b. ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH RFC v5 0/2] nfsd: Initial implementation of NFSv4 Courteous Server 2021-12-01 14:51 ` Bruce Fields @ 2021-12-01 18:47 ` dai.ngo 2021-12-01 19:25 ` Bruce Fields 2021-12-02 17:53 ` Chuck Lever III 1 sibling, 1 reply; 53+ messages in thread From: dai.ngo @ 2021-12-01 18:47 UTC (permalink / raw) To: Bruce Fields; +Cc: Chuck Lever III, Linux NFS Mailing List, linux-fsdevel On 12/1/21 6:51 AM, Bruce Fields wrote: > Do you have a public git tree with your latest patches? No, I don't but I can push it to Chuck's public tree. I need to prepare the patch. -Dai > > --b. ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH RFC v5 0/2] nfsd: Initial implementation of NFSv4 Courteous Server 2021-12-01 18:47 ` dai.ngo @ 2021-12-01 19:25 ` Bruce Fields 0 siblings, 0 replies; 53+ messages in thread From: Bruce Fields @ 2021-12-01 19:25 UTC (permalink / raw) To: dai.ngo; +Cc: Chuck Lever III, Linux NFS Mailing List, linux-fsdevel On Wed, Dec 01, 2021 at 10:47:28AM -0800, dai.ngo@oracle.com wrote: > > On 12/1/21 6:51 AM, Bruce Fields wrote: > >Do you have a public git tree with your latest patches? > > No, I don't but I can push it to Chuck's public tree. I need to prepare the patch. OK, it's not a big deal. --b. ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH RFC v5 0/2] nfsd: Initial implementation of NFSv4 Courteous Server 2021-12-01 14:51 ` Bruce Fields 2021-12-01 18:47 ` dai.ngo @ 2021-12-02 17:53 ` Chuck Lever III 1 sibling, 0 replies; 53+ messages in thread From: Chuck Lever III @ 2021-12-02 17:53 UTC (permalink / raw) To: Bruce Fields; +Cc: Dai Ngo, Linux NFS Mailing List, linux-fsdevel > On Dec 1, 2021, at 9:51 AM, Bruce Fields <bfields@fieldses.org> wrote: > > Do you have a public git tree with your latest patches? > > --b. Dai's patches have been pushed to the nfsd-courteous-server topic branch at git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.git I can fold them into my for-next branch if we agree they are ready for broader test exposure. -- Chuck Lever ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH RFC v5 0/2] nfsd: Initial implementation of NFSv4 Courteous Server 2021-12-01 14:36 ` Bruce Fields 2021-12-01 14:51 ` Bruce Fields @ 2021-12-01 17:42 ` Bruce Fields 2021-12-01 18:03 ` Bruce Fields 1 sibling, 1 reply; 53+ messages in thread From: Bruce Fields @ 2021-12-01 17:42 UTC (permalink / raw) To: dai.ngo; +Cc: Chuck Lever III, Linux NFS Mailing List, linux-fsdevel On Wed, Dec 01, 2021 at 09:36:30AM -0500, Bruce Fields wrote: > OK, good to know. It'd be interesting to dig into where nfsdcltrack is > spending its time, which we could do by replacing it with a wrapper that > runs the real nfsdcltrack under strace. > > Though maybe it'd be better to do this on a system using nfsdcld, since > that's what we're transitioning to. Trying that on a test VM here, I see each upcall doing 3 fdatasyncs() of an sqlite-journal file. On my setup, each of those is taking a few milliseconds. I wonder if it can do better. --b. ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH RFC v5 0/2] nfsd: Initial implementation of NFSv4 Courteous Server 2021-12-01 17:42 ` Bruce Fields @ 2021-12-01 18:03 ` Bruce Fields 2021-12-01 19:50 ` Bruce Fields 0 siblings, 1 reply; 53+ messages in thread From: Bruce Fields @ 2021-12-01 18:03 UTC (permalink / raw) To: dai.ngo; +Cc: Chuck Lever III, Linux NFS Mailing List, linux-fsdevel On Wed, Dec 01, 2021 at 12:42:05PM -0500, Bruce Fields wrote: > On Wed, Dec 01, 2021 at 09:36:30AM -0500, Bruce Fields wrote: > > OK, good to know. It'd be interesting to dig into where nfsdcltrack is > > spending its time, which we could do by replacing it with a wrapper that > > runs the real nfsdcltrack under strace. > > > > Though maybe it'd be better to do this on a system using nfsdcld, since > > that's what we're transitioning to. > > Trying that on a test VM here, I see each upcall doing 3 fdatasyncs() of > an sqlite-journal file. On my setup, each of those is taking a few > milliseconds. I wonder if it an do better. If I understand the sqlite documentation correctly, I *think* that if we use journal_mode WAL with synchronous FULL, we should get the assurances nfsd needs with one sync per transaction. --b. ^ permalink raw reply [flat|nested] 53+ messages in thread
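[Editor's note] The two PRAGMAs Bruce proposes can be exercised outside nfsdcld. A minimal sketch using Python's stdlib sqlite3 bindings against a throwaway database file (an illustration only, not the nfsdcld code itself):

```python
import os
import sqlite3
import tempfile

# Throwaway file-backed database; WAL is not available for in-memory DBs.
path = os.path.join(tempfile.mkdtemp(), "clients.sqlite")
db = sqlite3.connect(path)

# "PRAGMA journal_mode = WAL" returns the mode actually in effect,
# so a successful switch reads back as 'wal'.
mode = db.execute("PRAGMA journal_mode = WAL;").fetchone()[0]

# "PRAGMA synchronous = FULL" reads back as the integer 2.
db.execute("PRAGMA synchronous = FULL;")
sync = db.execute("PRAGMA synchronous;").fetchone()[0]

print(mode, sync)   # wal 2
db.close()
```

In WAL mode with synchronous=FULL, each committing transaction syncs the write-ahead log once, rather than syncing a rollback journal multiple times, which is the one-fdatasync-per-upcall behavior described above.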
* Re: [PATCH RFC v5 0/2] nfsd: Initial implementation of NFSv4 Courteous Server 2021-12-01 18:03 ` Bruce Fields @ 2021-12-01 19:50 ` Bruce Fields 2021-12-03 21:22 ` Bruce Fields 0 siblings, 1 reply; 53+ messages in thread From: Bruce Fields @ 2021-12-01 19:50 UTC (permalink / raw) To: dai.ngo; +Cc: Chuck Lever III, Linux NFS Mailing List, linux-fsdevel On Wed, Dec 01, 2021 at 01:03:39PM -0500, Bruce Fields wrote: > On Wed, Dec 01, 2021 at 12:42:05PM -0500, Bruce Fields wrote: > > On Wed, Dec 01, 2021 at 09:36:30AM -0500, Bruce Fields wrote: > > > OK, good to know. It'd be interesting to dig into where nfsdcltrack is > > > spending its time, which we could do by replacing it with a wrapper that > > > runs the real nfsdcltrack under strace. > > > > > > Though maybe it'd be better to do this on a system using nfsdcld, since > > > that's what we're transitioning to. > > > > Trying that on a test VM here, I see each upcall doing 3 fdatasyncs() of > > an sqlite-journal file. On my setup, each of those is taking a few > > milliseconds. I wonder if it an do better. > > If I understand the sqlite documentation correctly, I *think* that if we > use journal_mode WAL with synchronous FULL, we should get the assurances > nfsd needs with one sync per transaction. 
So I *think* that would mean just doing something like (untested, don't have much idea what I'm doing):

diff --git a/utils/nfsdcld/sqlite.c b/utils/nfsdcld/sqlite.c
index 03016fb95823..b30f2614497b 100644
--- a/utils/nfsdcld/sqlite.c
+++ b/utils/nfsdcld/sqlite.c
@@ -826,6 +826,13 @@ sqlite_prepare_dbh(const char *topdir)
 		goto out_close;
 	}
 
+	ret = sqlite3_exec(dbh, "PRAGMA journal_mode = WAL;", NULL, NULL, NULL);
+	if (ret)
+		goto out_close;
+	ret = sqlite3_exec(dbh, "PRAGMA synchronous = FULL;", NULL, NULL, NULL);
+	if (ret)
+		goto out_close;
+
 	ret = sqlite_query_schema_version();
 	switch (ret) {
 	case CLD_SQLITE_LATEST_SCHEMA_VERSION:

I also wonder how expensive the extra overhead of starting up nfsdcltrack each time may be.

--b.

^ permalink raw reply related [flat|nested] 53+ messages in thread
* Re: [PATCH RFC v5 0/2] nfsd: Initial implementation of NFSv4 Courteous Server 2021-12-01 19:50 ` Bruce Fields @ 2021-12-03 21:22 ` Bruce Fields 2021-12-03 21:55 ` [PATCH] nfsdcld: use WAL journal for faster commits Bruce Fields 0 siblings, 1 reply; 53+ messages in thread From: Bruce Fields @ 2021-12-03 21:22 UTC (permalink / raw) To: dai.ngo; +Cc: Chuck Lever III, Linux NFS Mailing List, linux-fsdevel On Wed, Dec 01, 2021 at 02:50:50PM -0500, Bruce Fields wrote: > On Wed, Dec 01, 2021 at 01:03:39PM -0500, Bruce Fields wrote: > > On Wed, Dec 01, 2021 at 12:42:05PM -0500, Bruce Fields wrote: > > > On Wed, Dec 01, 2021 at 09:36:30AM -0500, Bruce Fields wrote: > > > > OK, good to know. It'd be interesting to dig into where nfsdcltrack is > > > > spending its time, which we could do by replacing it with a wrapper that > > > > runs the real nfsdcltrack under strace. > > > > > > > > Though maybe it'd be better to do this on a system using nfsdcld, since > > > > that's what we're transitioning to. > > > > > > Trying that on a test VM here, I see each upcall doing 3 fdatasyncs() of > > > an sqlite-journal file. On my setup, each of those is taking a few > > > milliseconds. I wonder if it an do better. > > > > If I understand the sqlite documentation correctly, I *think* that if we > > use journal_mode WAL with synchronous FULL, we should get the assurances > > nfsd needs with one sync per transaction. > > So I *think* that would mean just doing something like (untested, don't have > much idea what I'm doing): OK, tried that out on my test VM, and: yes, the resulting strace was much simpler (and, in particular, had only one fdatasync per upcall instead of 3), and total time to expire 1000 courtesy clients was 6.5 seconds instead of 15.9. So, I'll clean up that patch and pass it along to Steve D. This is all a bit of a derail, I know, but I suspect this will be a bottleneck in other cases too, like when a lot of clients are reclaiming after reboot. 
We do need nfsdcld to sync to disk before returning to the kernel, so this probably can't be further optimized without doing something more complicated to allow some kind of parallelism and batching. So if you have a ton of clients you'll just need /var/lib/nfs to be on low-latency storage. --b. > > diff --git a/utils/nfsdcld/sqlite.c b/utils/nfsdcld/sqlite.c > index 03016fb95823..b30f2614497b 100644 > --- a/utils/nfsdcld/sqlite.c > +++ b/utils/nfsdcld/sqlite.c > @@ -826,6 +826,13 @@ sqlite_prepare_dbh(const char *topdir) > goto out_close; > } > > + ret = sqlite3_exec(dbh, "PRAGMA journal_mode = WAL;", NULL, NULL, NULL); > + if (ret) > + goto out_close; > + ret = sqlite3_exec(dbh, "PRAGMA synchronous = FULL;", NULL, NULL, NULL); > + if (ret) > + goto out_close; > + > ret = sqlite_query_schema_version(); > switch (ret) { > case CLD_SQLITE_LATEST_SCHEMA_VERSION: > > I also wonder how expensive may be the extra overhead of starting up > nfsdcltrack each time. > > --b. ^ permalink raw reply [flat|nested] 53+ messages in thread
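[Editor's note] The measured speedup is consistent with the change in fdatasync count per upcall. A rough check, assuming sync time dominates the per-client cost (numbers taken from the thread above):

```python
# 15.9 s for 1000 clients with 3 fdatasyncs per upcall before the
# change, 6.5 s with 1 fdatasync after.  If syncs dominate, the
# implied per-fdatasync cost is a few milliseconds either way,
# matching the earlier strace observation.
before_per_client_ms = 15.9 * 1000 / 1000   # ~15.9 ms per client, 3 syncs
after_per_client_ms = 6.5 * 1000 / 1000     # ~6.5 ms per client, 1 sync

print(before_per_client_ms / 3)  # ~5.3 ms per fdatasync before
print(after_per_client_ms / 1)   # ~6.5 ms per fdatasync after
```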
* [PATCH] nfsdcld: use WAL journal for faster commits 2021-12-03 21:22 ` Bruce Fields @ 2021-12-03 21:55 ` Bruce Fields 2021-12-03 22:07 ` Chuck Lever III 0 siblings, 1 reply; 53+ messages in thread From: Bruce Fields @ 2021-12-03 21:55 UTC (permalink / raw) To: Steve Dickson; +Cc: dai.ngo, linux-nfs

From: "J. Bruce Fields" <bfields@redhat.com>

Currently nfsdcld is doing three fdatasyncs for each upcall. Based on SQLite documentation, WAL mode should also be safe, and I can confirm from an strace that it results in only one fdatasync each.

This may be a bottleneck e.g. when lots of clients are being created or expired at once (e.g. on reboot).

Not bothering with error checking, as this is just an optimization and nfsdcld will still function without. (Might be better to log something on failure, though.)

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
---
 utils/nfsdcld/sqlite.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/utils/nfsdcld/sqlite.c b/utils/nfsdcld/sqlite.c
index 03016fb95823..b248eeffa204 100644
--- a/utils/nfsdcld/sqlite.c
+++ b/utils/nfsdcld/sqlite.c
@@ -826,6 +826,9 @@ sqlite_prepare_dbh(const char *topdir)
 		goto out_close;
 	}
 
+	sqlite3_exec(dbh, "PRAGMA journal_mode = WAL;", NULL, NULL, NULL);
+	sqlite3_exec(dbh, "PRAGMA synchronous = FULL;", NULL, NULL, NULL);
+
 	ret = sqlite_query_schema_version();
 	switch (ret) {
 	case CLD_SQLITE_LATEST_SCHEMA_VERSION:
-- 
2.33.1

^ permalink raw reply related [flat|nested] 53+ messages in thread
* Re: [PATCH] nfsdcld: use WAL journal for faster commits 2021-12-03 21:55 ` [PATCH] nfsdcld: use WAL journal for faster commits Bruce Fields @ 2021-12-03 22:07 ` Chuck Lever III 2021-12-03 22:39 ` Bruce Fields 0 siblings, 1 reply; 53+ messages in thread From: Chuck Lever III @ 2021-12-03 22:07 UTC (permalink / raw) To: Bruce Fields; +Cc: Steve Dickson, Dai Ngo, Linux NFS Mailing List > On Dec 3, 2021, at 4:55 PM, Bruce Fields <bfields@fieldses.org> wrote: > > From: "J. Bruce Fields" <bfields@redhat.com> > > Currently nfsdcld is doing three fdatasyncs for each upcall. Based on > SQLite documentation, WAL mode should also be safe, and I can confirm > from an strace that it results in only one fdatasync each. > > This may be a bottleneck e.g. when lots of clients are being created or > expired at once (e.g. on reboot). > > Not bothering with error checking, as this is just an optimization and > nfsdcld will still function without. (Might be better to log something > on failure, though.) I'm in full philosophical agreement for performance improvements in this area. There are some caveats for WAL: - It requires SQLite v3.7.0 (2010). configure.ac may need to be updated. - All processes using the DB file have to be on the same system. Not sure if this matters for some DR scenarios that nfs-utils is supposed to support. - WAL mode is persistent; you could set this at database creation time and it should be sticky. - Documentation says "synchronous = FULL is the most commonly used synchronous setting when not in WAL mode." Why are both PRAGMAs necessary here? I agree that nfsdcld functionality is not affected if the first PRAGMA fails. > Signed-off-by: J. 
Bruce Fields <bfields@redhat.com> > --- > utils/nfsdcld/sqlite.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/utils/nfsdcld/sqlite.c b/utils/nfsdcld/sqlite.c > index 03016fb95823..b248eeffa204 100644 > --- a/utils/nfsdcld/sqlite.c > +++ b/utils/nfsdcld/sqlite.c > @@ -826,6 +826,9 @@ sqlite_prepare_dbh(const char *topdir) > goto out_close; > } > > + sqlite3_exec(dbh, "PRAGMA journal_mode = WAL;", NULL, NULL, NULL); > + sqlite3_exec(dbh, "PRAGMA synchronous = FULL;", NULL, NULL, NULL); > + > ret = sqlite_query_schema_version(); > switch (ret) { > case CLD_SQLITE_LATEST_SCHEMA_VERSION: > -- > 2.33.1 > -- Chuck Lever ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH] nfsdcld: use WAL journal for faster commits 2021-12-03 22:07 ` Chuck Lever III @ 2021-12-03 22:39 ` Bruce Fields 2021-12-04 0:35 ` Chuck Lever III 0 siblings, 1 reply; 53+ messages in thread From: Bruce Fields @ 2021-12-03 22:39 UTC (permalink / raw) To: Chuck Lever III; +Cc: Steve Dickson, Dai Ngo, Linux NFS Mailing List On Fri, Dec 03, 2021 at 10:07:19PM +0000, Chuck Lever III wrote: > > > > On Dec 3, 2021, at 4:55 PM, Bruce Fields <bfields@fieldses.org> wrote: > > > > From: "J. Bruce Fields" <bfields@redhat.com> > > > > Currently nfsdcld is doing three fdatasyncs for each upcall. Based on > > SQLite documentation, WAL mode should also be safe, and I can confirm > > from an strace that it results in only one fdatasync each. > > > > This may be a bottleneck e.g. when lots of clients are being created or > > expired at once (e.g. on reboot). > > > > Not bothering with error checking, as this is just an optimization and > > nfsdcld will still function without. (Might be better to log something > > on failure, though.) > > I'm in full philosophical agreement for performance improvements > in this area. There are some caveats for WAL: > > - It requires SQLite v3.7.0 (2010). configure.ac may need to be > updated. Makes sense. But I dug around a bit, and how to do this is a total mystery to me.... > - All processes using the DB file have to be on the same system. > Not sure if this matters for some DR scenarios that nfs-utils > is supposed to support. I don't think we support that. > > - WAL mode is persistent; you could set this at database creation > time and it should be sticky. I wanted to upgrade existing databases too, and figured there's no harm in calling it on each startup. > - Documentation says "synchronous = FULL is the most commonly > used synchronous setting when not in WAL mode." Why are both > PRAGMAs necessary here? Maybe they're not; I think I did see that FULL is actually the default but I can't find that in the documentation right now. 
--b. > > I agree that nfsdcld functionality is not affected if the first > PRAGMA fails. > > > > Signed-off-by: J. Bruce Fields <bfields@redhat.com> > > --- > > utils/nfsdcld/sqlite.c | 3 +++ > > 1 file changed, 3 insertions(+) > > > > diff --git a/utils/nfsdcld/sqlite.c b/utils/nfsdcld/sqlite.c > > index 03016fb95823..b248eeffa204 100644 > > --- a/utils/nfsdcld/sqlite.c > > +++ b/utils/nfsdcld/sqlite.c > > @@ -826,6 +826,9 @@ sqlite_prepare_dbh(const char *topdir) > > goto out_close; > > } > > > > + sqlite3_exec(dbh, "PRAGMA journal_mode = WAL;", NULL, NULL, NULL); > > + sqlite3_exec(dbh, "PRAGMA synchronous = FULL;", NULL, NULL, NULL); > > + > > ret = sqlite_query_schema_version(); > > switch (ret) { > > case CLD_SQLITE_LATEST_SCHEMA_VERSION: > > -- > > 2.33.1 > > > > -- > Chuck Lever > > ^ permalink raw reply [flat|nested] 53+ messages in thread
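[Editor's note] Chuck's point that WAL mode is persistent, so issuing the PRAGMA on every startup is harmless but only strictly needed once per database file, can be demonstrated with a small sketch (again using Python's stdlib sqlite3 against a throwaway database, not the nfsdcld code):

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "main.sqlite")  # throwaway DB

# First connection switches the database to WAL.
db = sqlite3.connect(path)
db.execute("PRAGMA journal_mode = WAL;")
db.close()

# A fresh connection that never issues the PRAGMA still sees WAL:
# the journal mode is recorded in the database file itself.
db = sqlite3.connect(path)
print(db.execute("PRAGMA journal_mode;").fetchone()[0])  # wal
db.close()
```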
* Re: [PATCH] nfsdcld: use WAL journal for faster commits 2021-12-03 22:39 ` Bruce Fields @ 2021-12-04 0:35 ` Chuck Lever III 2021-12-04 1:24 ` Bruce Fields 0 siblings, 1 reply; 53+ messages in thread From: Chuck Lever III @ 2021-12-04 0:35 UTC (permalink / raw) To: Bruce Fields; +Cc: Steve Dickson, Dai Ngo, Linux NFS Mailing List > On Dec 3, 2021, at 5:39 PM, Bruce Fields <bfields@fieldses.org> wrote: > > On Fri, Dec 03, 2021 at 10:07:19PM +0000, Chuck Lever III wrote: >> >> >>>> On Dec 3, 2021, at 4:55 PM, Bruce Fields <bfields@fieldses.org> wrote: >>> >>> From: "J. Bruce Fields" <bfields@redhat.com> >>> >>> Currently nfsdcld is doing three fdatasyncs for each upcall. Based on >>> SQLite documentation, WAL mode should also be safe, and I can confirm >>> from an strace that it results in only one fdatasync each. >>> >>> This may be a bottleneck e.g. when lots of clients are being created or >>> expired at once (e.g. on reboot). >>> >>> Not bothering with error checking, as this is just an optimization and >>> nfsdcld will still function without. (Might be better to log something >>> on failure, though.) >> >> I'm in full philosophical agreement for performance improvements >> in this area. There are some caveats for WAL: >> >> - It requires SQLite v3.7.0 (2010). configure.ac may need to be >> updated. > > Makes sense. But I dug around a bit, and how to do this is a total > mystery to me.... aclocal/libsqlite3.m4 has an SQLITE_VERSION_NUMBER check. >> - All processes using the DB file have to be on the same system. >> Not sure if this matters for some DR scenarios that nfs-utils >> is supposed to support. > > I don't think we support that. > >> - WAL mode is persistent; you could set this at database creation >> time and it should be sticky. > > I wanted to upgrade existing databases too, and figured there's no harm > in calling it on each startup. Ah. Makes sense! 
>> - Documentation says "synchronous = FULL is the most commonly >> used synchronous setting when not in WAL mode." Why are both >> PRAGMAs necessary here? > > Maybe they're not; I think I did see that FULL is actually the default > but I can't find that in the documentation right now. > > --b. > >> >> I agree that nfsdcld functionality is not affected if the first >> PRAGMA fails. >> >> >>> Signed-off-by: J. Bruce Fields <bfields@redhat.com> >>> --- >>> utils/nfsdcld/sqlite.c | 3 +++ >>> 1 file changed, 3 insertions(+) >>> >>> diff --git a/utils/nfsdcld/sqlite.c b/utils/nfsdcld/sqlite.c >>> index 03016fb95823..b248eeffa204 100644 >>> --- a/utils/nfsdcld/sqlite.c >>> +++ b/utils/nfsdcld/sqlite.c >>> @@ -826,6 +826,9 @@ sqlite_prepare_dbh(const char *topdir) >>> goto out_close; >>> } >>> >>> + sqlite3_exec(dbh, "PRAGMA journal_mode = WAL;", NULL, NULL, NULL); >>> + sqlite3_exec(dbh, "PRAGMA synchronous = FULL;", NULL, NULL, NULL); >>> + >>> ret = sqlite_query_schema_version(); >>> switch (ret) { >>> case CLD_SQLITE_LATEST_SCHEMA_VERSION: >>> -- >>> 2.33.1 >>> >> >> -- >> Chuck Lever >> >> ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH] nfsdcld: use WAL journal for faster commits
From: Bruce Fields @ 2021-12-04  1:24 UTC (permalink / raw)
To: Chuck Lever III; +Cc: Steve Dickson, Dai Ngo, Linux NFS Mailing List

On Sat, Dec 04, 2021 at 12:35:11AM +0000, Chuck Lever III wrote:
>
> > On Dec 3, 2021, at 5:39 PM, Bruce Fields <bfields@fieldses.org> wrote:
> >
> > On Fri, Dec 03, 2021 at 10:07:19PM +0000, Chuck Lever III wrote:
> >>
> >>
> >>>> On Dec 3, 2021, at 4:55 PM, Bruce Fields <bfields@fieldses.org> wrote:
> >>>
> >>> From: "J. Bruce Fields" <bfields@redhat.com>
> >>>
> >>> Currently nfsdcld is doing three fdatasyncs for each upcall. Based on
> >>> SQLite documentation, WAL mode should also be safe, and I can confirm
> >>> from an strace that it results in only one fdatasync each.
> >>>
> >>> This may be a bottleneck e.g. when lots of clients are being created or
> >>> expired at once (e.g. on reboot).
> >>>
> >>> Not bothering with error checking, as this is just an optimization and
> >>> nfsdcld will still function without. (Might be better to log something
> >>> on failure, though.)
> >>
> >> I'm in full philosophical agreement for performance improvements
> >> in this area. There are some caveats for WAL:
> >>
> >> - It requires SQLite v3.7.0 (2010). configure.ac may need to be
> >>   updated.
> >
> > Makes sense. But I dug around a bit, and how to do this is a total
> > mystery to me....
>
> aclocal/libsqlite3.m4 has an SQLITE_VERSION_NUMBER check.

I looked and got stuck trying to figure out why the #%^ it's comparing to
SQLITE_VERSION_NUMBER. OK, I see now, it's just some sanity check.
Here's take 2.

--b.

From c22d3a2d8576273934773fefac933d4f8e04776b Mon Sep 17 00:00:00 2001
From: "J. Bruce Fields" <bfields@redhat.com>
Date: Fri, 3 Dec 2021 10:27:53 -0500
Subject: [PATCH] nfsdcld: use WAL journal for faster commits

Currently nfsdcld is doing three fdatasyncs for each upcall. Based on
SQLite documentation, WAL mode should also be safe, and I can confirm
from an strace that it results in only one fdatasync each.

This may be a bottleneck e.g. when lots of clients are being created or
expired at once (e.g. on reboot).

Not bothering with error checking, as this is just an optimization and
nfsdcld will still function without. (Might be better to log something
on failure, though.)

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
---
 aclocal/libsqlite3.m4  | 2 +-
 utils/nfsdcld/sqlite.c | 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/aclocal/libsqlite3.m4 b/aclocal/libsqlite3.m4
index 8c38993cbba8..c3beb4d56f0c 100644
--- a/aclocal/libsqlite3.m4
+++ b/aclocal/libsqlite3.m4
@@ -22,7 +22,7 @@ AC_DEFUN([AC_SQLITE3_VERS], [
 		int vers = sqlite3_libversion_number();
 
 		return vers != SQLITE_VERSION_NUMBER ||
-			vers < 3003000;
+			vers < 3007000;
 	}
 	], [libsqlite3_cv_is_recent=yes], [libsqlite3_cv_is_recent=no],
 	[libsqlite3_cv_is_recent=unknown])
diff --git a/utils/nfsdcld/sqlite.c b/utils/nfsdcld/sqlite.c
index 03016fb95823..eabb0daa95f5 100644
--- a/utils/nfsdcld/sqlite.c
+++ b/utils/nfsdcld/sqlite.c
@@ -826,6 +826,8 @@ sqlite_prepare_dbh(const char *topdir)
 		goto out_close;
 	}
 
+	sqlite3_exec(dbh, "PRAGMA journal_mode = WAL;", NULL, NULL, NULL);
+
 	ret = sqlite_query_schema_version();
 	switch (ret) {
 	case CLD_SQLITE_LATEST_SCHEMA_VERSION:
-- 
2.33.1
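The configure-time test patched above does two things: it confirms that the library linked at run time matches the headers used at build time (`vers != SQLITE_VERSION_NUMBER`, the "sanity check" puzzled over in the message), and it raises the version floor to 3007000, i.e. SQLite 3.7.0, the release that introduced WAL. A rough runtime analogue of that floor check, sketched in Python for illustration only (the real check lives in aclocal/libsqlite3.m4 and runs as compiled C at configure time):

```python
import sqlite3

# sqlite3.sqlite_version_info is the version of the SQLite library this
# interpreter is linked against, as a tuple that compares naturally.
WAL_FLOOR = (3, 7, 0)  # WAL journaling appeared in SQLite 3.7.0 (2010)

if sqlite3.sqlite_version_info < WAL_FLOOR:
    raise RuntimeError(
        f"SQLite {sqlite3.sqlite_version} is too old for WAL; "
        f"need >= {'.'.join(map(str, WAL_FLOOR))}"
    )
```

Encoding the floor as a tuple rather than the single integer 3007000 is just a readability choice; the m4 test uses the integer form because `sqlite3_libversion_number()` returns one.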
* Re: [PATCH] nfsdcld: use WAL journal for faster commits
From: Chuck Lever III @ 2021-12-06 15:46 UTC (permalink / raw)
To: Bruce Fields; +Cc: Steve Dickson, Dai Ngo, Linux NFS Mailing List

> On Dec 3, 2021, at 8:24 PM, Bruce Fields <bfields@fieldses.org> wrote:
>
> On Sat, Dec 04, 2021 at 12:35:11AM +0000, Chuck Lever III wrote:
>>
>>> On Dec 3, 2021, at 5:39 PM, Bruce Fields <bfields@fieldses.org> wrote:
>>>
>>> On Fri, Dec 03, 2021 at 10:07:19PM +0000, Chuck Lever III wrote:
>>>>
>>>>
>>>>>> On Dec 3, 2021, at 4:55 PM, Bruce Fields <bfields@fieldses.org> wrote:
>>>>>
>>>>> From: "J. Bruce Fields" <bfields@redhat.com>
>>>>>
>>>>> Currently nfsdcld is doing three fdatasyncs for each upcall. Based on
>>>>> SQLite documentation, WAL mode should also be safe, and I can confirm
>>>>> from an strace that it results in only one fdatasync each.
>>>>>
>>>>> This may be a bottleneck e.g. when lots of clients are being created or
>>>>> expired at once (e.g. on reboot).
>>>>>
>>>>> Not bothering with error checking, as this is just an optimization and
>>>>> nfsdcld will still function without. (Might be better to log something
>>>>> on failure, though.)
>>>>
>>>> I'm in full philosophical agreement for performance improvements
>>>> in this area. There are some caveats for WAL:
>>>>
>>>> - It requires SQLite v3.7.0 (2010). configure.ac may need to be
>>>>   updated.
>>>
>>> Makes sense. But I dug around a bit, and how to do this is a total
>>> mystery to me....
>>
>> aclocal/libsqlite3.m4 has an SQLITE_VERSION_NUMBER check.
>
> I looked got stuck trying to figure out why the #%^ it's comparing to
> SQLITE_VERSION_NUMBER. OK, I see now, it's just some sanity check.
> Here's take 2.
>
> --b.
>
> From c22d3a2d8576273934773fefac933d4f8e04776b Mon Sep 17 00:00:00 2001
> From: "J. Bruce Fields" <bfields@redhat.com>
> Date: Fri, 3 Dec 2021 10:27:53 -0500
> Subject: [PATCH] nfsdcld: use WAL journal for faster commits
>
> Currently nfsdcld is doing three fdatasyncs for each upcall. Based on
> SQLite documentation, WAL mode should also be safe, and I can confirm
> from an strace that it results in only one fdatasync each.
>
> This may be a bottleneck e.g. when lots of clients are being created or
> expired at once (e.g. on reboot).
>
> Not bothering with error checking, as this is just an optimization and
> nfsdcld will still function without. (Might be better to log something
> on failure, though.)
>
> Signed-off-by: J. Bruce Fields <bfields@redhat.com>

Nice.

Reviewed-by: Chuck Lever <chuck.lever@oracle.com>

> ---
> aclocal/libsqlite3.m4  | 2 +-
> utils/nfsdcld/sqlite.c | 2 ++
> 2 files changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/aclocal/libsqlite3.m4 b/aclocal/libsqlite3.m4
> index 8c38993cbba8..c3beb4d56f0c 100644
> --- a/aclocal/libsqlite3.m4
> +++ b/aclocal/libsqlite3.m4
> @@ -22,7 +22,7 @@ AC_DEFUN([AC_SQLITE3_VERS], [
> 		int vers = sqlite3_libversion_number();
>
> 		return vers != SQLITE_VERSION_NUMBER ||
> -			vers < 3003000;
> +			vers < 3007000;
> 	}
> 	], [libsqlite3_cv_is_recent=yes], [libsqlite3_cv_is_recent=no],
> 	[libsqlite3_cv_is_recent=unknown])
> diff --git a/utils/nfsdcld/sqlite.c b/utils/nfsdcld/sqlite.c
> index 03016fb95823..eabb0daa95f5 100644
> --- a/utils/nfsdcld/sqlite.c
> +++ b/utils/nfsdcld/sqlite.c
> @@ -826,6 +826,8 @@ sqlite_prepare_dbh(const char *topdir)
> 		goto out_close;
> 	}
>
> +	sqlite3_exec(dbh, "PRAGMA journal_mode = WAL;", NULL, NULL, NULL);
> +
> 	ret = sqlite_query_schema_version();
> 	switch (ret) {
> 	case CLD_SQLITE_LATEST_SCHEMA_VERSION:
> --
> 2.33.1

--
Chuck Lever
* [PATCH] nfsdcld: use WAL journal for faster commits
From: Bruce Fields @ 2022-01-04 22:24 UTC (permalink / raw)
To: Steve Dickson; +Cc: Chuck Lever III, Dai Ngo, Linux NFS Mailing List

From: "J. Bruce Fields" <bfields@redhat.com>

Currently nfsdcld is doing three fdatasyncs for each upcall. Based on
SQLite documentation, WAL mode should also be safe, and I can confirm
from an strace that it results in only one fdatasync each.

This may be a bottleneck e.g. when lots of clients are being created or
expired at once (e.g. on reboot).

Not bothering with error checking, as this is just an optimization and
nfsdcld will still function without. (Might be better to log something
on failure, though.)

Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
---
 aclocal/libsqlite3.m4  | 2 +-
 utils/nfsdcld/sqlite.c | 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)

Resending to make sure SteveD saw this....

--b.

diff --git a/aclocal/libsqlite3.m4 b/aclocal/libsqlite3.m4
index 8c38993cbba8..c3beb4d56f0c 100644
--- a/aclocal/libsqlite3.m4
+++ b/aclocal/libsqlite3.m4
@@ -22,7 +22,7 @@ AC_DEFUN([AC_SQLITE3_VERS], [
 		int vers = sqlite3_libversion_number();
 
 		return vers != SQLITE_VERSION_NUMBER ||
-			vers < 3003000;
+			vers < 3007000;
 	}
 	], [libsqlite3_cv_is_recent=yes], [libsqlite3_cv_is_recent=no],
 	[libsqlite3_cv_is_recent=unknown])
diff --git a/utils/nfsdcld/sqlite.c b/utils/nfsdcld/sqlite.c
index 03016fb95823..eabb0daa95f5 100644
--- a/utils/nfsdcld/sqlite.c
+++ b/utils/nfsdcld/sqlite.c
@@ -826,6 +826,8 @@ sqlite_prepare_dbh(const char *topdir)
 		goto out_close;
 	}
 
+	sqlite3_exec(dbh, "PRAGMA journal_mode = WAL;", NULL, NULL, NULL);
+
 	ret = sqlite_query_schema_version();
 	switch (ret) {
 	case CLD_SQLITE_LATEST_SCHEMA_VERSION:
-- 
2.33.1