* [PATCH RFC v8 0/2] nfsd: Initial implementation of NFSv4 Courteous Server
@ 2021-12-13 17:24 Dai Ngo
  2021-12-13 17:24 ` [PATCH RFC v8 1/2] fs/lock: add new callback, lm_expire_lock, to lock_manager_operations Dai Ngo
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Dai Ngo @ 2021-12-13 17:24 UTC (permalink / raw)
  To: bfields; +Cc: chuck.lever, jlayton, viro, linux-nfs, linux-fsdevel


Hi Bruce,

This series of patches implement the NFSv4 Courteous Server.

A server which does not immediately expunge the state on lease expiration
is known as a Courteous Server.  A Courteous Server continues to recognize
previously generated state tokens as valid until conflict arises between
the expired state and the requests from another client, or the server
reboots.

The v2 patch includes the following:

. add new callback, lm_expire_lock, to lock_manager_operations to
  allow the lock manager to take appropriate action on a conflicting lock.

. handle conflicts of NFSv4 locks with NFSv3/NLM and local locks.

. expire courtesy client after 24hr if client has not reconnected.

. do not allow expired client to become courtesy client if there are
  waiters for client's locks.

. modify client_info_show to show courtesy client and seconds from
  last renew.

. fix a problem with the NFSv4.1 server where it keeps returning
  SEQ4_STATUS_CB_PATH_DOWN in successful SEQUENCE replies after the
  courtesy client re-connects, causing the client to keep sending
  BCTS requests to the server.

The v3 patch includes the following:

. modified posix_test_lock to check and resolve conflicting locks
  to handle NLM TEST and NFSv4 LOCKT requests.

. separate out fix for back channel stuck in SEQ4_STATUS_CB_PATH_DOWN.

The v4 patch includes:

. rework nfsd_check_courtesy to avoid a deadlock between fl_lock and
  client_lock by asking the laundromat thread to destroy the courtesy
  client.

. handle NFSv4 share reservation conflicts with courtesy client. This
  includes conflicts between access mode and deny mode and vice versa.

. drop the patch for back channel stuck in SEQ4_STATUS_CB_PATH_DOWN.

The v5 patch includes:

. fix recursive locking of file_rwsem from posix_lock_file. 

. retest with LOCKDEP enabled.

The v6 patch includes:

. merge with 5.15-rc7

. fix a bug in nfs4_check_deny_bmap that did not check for a matching
  nfs4_file before checking for an access/deny conflict. This bug caused
  pynfs OPEN18 to fail because the server took too long to release the
  state of many non-conflicting clients.

. enhance the share reservation conflict handler to handle the case
  where a large number of conflicting courtesy clients need to be
  expired. The first 100 clients are expired synchronously and the rest
  are expired in the background by the laundromat, and NFS4ERR_DELAY is
  returned to the NFS client. This is needed to prevent the NFS client
  from timing out waiting for the reply.

The v7 patch includes:

. Fix race condition in posix_test_lock and posix_lock_inode after
  dropping spinlock.

. Enhance nfsd4_fl_expire_lock to work with the new lm_expire_lock
  callback.

. Always resolve share reservation conflicts asynchronously.

. Fix bug in nfs4_laundromat where spinlock is not used when
  scanning cl_ownerstr_hashtbl.

. Fix bug in nfs4_laundromat where idr_get_next was called
  with incorrect 'id'. 

. Merge nfs4_destroy_courtesy_client into nfsd4_fl_expire_lock.

The v8 patch includes:

. Fix warning in nfsd4_fl_expire_lock reported by test robot.



^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH RFC v8 1/2] fs/lock: add new callback, lm_expire_lock, to lock_manager_operations
  2021-12-13 17:24 [PATCH RFC v8 0/2] nfsd: Initial implementation of NFSv4 Courteous Server Dai Ngo
@ 2021-12-13 17:24 ` Dai Ngo
  2021-12-14 23:41   ` Chuck Lever III
  2021-12-13 17:24 ` [PATCH v8 2/2] nfsd: Initial implementation of NFSv4 Courteous Server Dai Ngo
  2021-12-13 18:35 ` [PATCH RFC v8 0/2] " Chuck Lever III
  2 siblings, 1 reply; 11+ messages in thread
From: Dai Ngo @ 2021-12-13 17:24 UTC (permalink / raw)
  To: bfields; +Cc: chuck.lever, jlayton, viro, linux-nfs, linux-fsdevel

Add a new callback, lm_expire_lock, to lock_manager_operations to allow
the lock manager to take appropriate action to resolve the lock conflict
if possible. The callback takes two arguments, the file_lock of the
blocker and a testonly flag:

testonly = 1  check whether the lock conflict can be resolved and, if so,
              return the lock manager's private data, else return NULL.
testonly = 0  resolve the conflict if possible; return true if the
              conflict was resolved, else return false.

A lock manager, such as the NFSv4 courteous server, uses this callback to
resolve the conflict by destroying the lock owner, or the NFSv4 courtesy
client (a client that has expired but is allowed to maintain its state)
that owns the lock.
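
For reference, a lock manager hooks into the callback roughly as in the
following minimal sketch. Everything here other than lm_expire_lock and
the struct file_lock fields is illustrative and not part of this patch;
the real NFSD implementation, nfsd4_fl_expire_lock, is in patch 2/2.

#include <linux/fs.h>

/* Illustrative lock manager; only lm_expire_lock is real. */
static void *example_lm_expire_lock(void *priv, bool testonly)
{
	if (testonly) {
		/*
		 * First call: priv is the conflicting file_lock. A real
		 * lock manager decides here whether the conflict can be
		 * resolved; this sketch simply hands back its private
		 * handle, the blocker's fl_owner.
		 */
		struct file_lock *cfl = priv;

		return cfl->fl_owner;
	}
	/*
	 * Second call: priv is the handle returned above. A real lock
	 * manager tears down the owner's state here and returns the
	 * handle to report success, or NULL to report failure.
	 */
	return priv;
}

static const struct lock_manager_operations example_lm_ops = {
	.lm_expire_lock	= example_lm_expire_lock,
};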

Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
---
 fs/locks.c         | 40 +++++++++++++++++++++++++++++++++++++---
 include/linux/fs.h |  1 +
 2 files changed, 38 insertions(+), 3 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index 3d6fb4ae847b..5f3ea40ce2aa 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -952,8 +952,11 @@ void
 posix_test_lock(struct file *filp, struct file_lock *fl)
 {
 	struct file_lock *cfl;
+	struct file_lock *checked_cfl = NULL;
 	struct file_lock_context *ctx;
 	struct inode *inode = locks_inode(filp);
+	void *res_data;
+	void *(*func)(void *priv, bool testonly);
 
 	ctx = smp_load_acquire(&inode->i_flctx);
 	if (!ctx || list_empty_careful(&ctx->flc_posix)) {
@@ -962,11 +965,24 @@ posix_test_lock(struct file *filp, struct file_lock *fl)
 	}
 
 	spin_lock(&ctx->flc_lock);
+retry:
 	list_for_each_entry(cfl, &ctx->flc_posix, fl_list) {
-		if (posix_locks_conflict(fl, cfl)) {
-			locks_copy_conflock(fl, cfl);
-			goto out;
+		if (!posix_locks_conflict(fl, cfl))
+			continue;
+		if (checked_cfl != cfl && cfl->fl_lmops &&
+				cfl->fl_lmops->lm_expire_lock) {
+			res_data = cfl->fl_lmops->lm_expire_lock(cfl, true);
+			if (res_data) {
+				func = cfl->fl_lmops->lm_expire_lock;
+				spin_unlock(&ctx->flc_lock);
+				func(res_data, false);
+				spin_lock(&ctx->flc_lock);
+				checked_cfl = cfl;
+				goto retry;
+			}
 		}
+		locks_copy_conflock(fl, cfl);
+		goto out;
 	}
 	fl->fl_type = F_UNLCK;
 out:
@@ -1136,10 +1152,13 @@ static int posix_lock_inode(struct inode *inode, struct file_lock *request,
 	struct file_lock *new_fl2 = NULL;
 	struct file_lock *left = NULL;
 	struct file_lock *right = NULL;
+	struct file_lock *checked_fl = NULL;
 	struct file_lock_context *ctx;
 	int error;
 	bool added = false;
 	LIST_HEAD(dispose);
+	void *res_data;
+	void *(*func)(void *priv, bool testonly);
 
 	ctx = locks_get_lock_context(inode, request->fl_type);
 	if (!ctx)
@@ -1166,9 +1185,24 @@ static int posix_lock_inode(struct inode *inode, struct file_lock *request,
 	 * blocker's list of waiters and the global blocked_hash.
 	 */
 	if (request->fl_type != F_UNLCK) {
+retry:
 		list_for_each_entry(fl, &ctx->flc_posix, fl_list) {
 			if (!posix_locks_conflict(request, fl))
 				continue;
+			if (checked_fl != fl && fl->fl_lmops &&
+					fl->fl_lmops->lm_expire_lock) {
+				res_data = fl->fl_lmops->lm_expire_lock(fl, true);
+				if (res_data) {
+					func = fl->fl_lmops->lm_expire_lock;
+					spin_unlock(&ctx->flc_lock);
+					percpu_up_read(&file_rwsem);
+					func(res_data, false);
+					percpu_down_read(&file_rwsem);
+					spin_lock(&ctx->flc_lock);
+					checked_fl = fl;
+					goto retry;
+				}
+			}
 			if (conflock)
 				locks_copy_conflock(conflock, fl);
 			error = -EAGAIN;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index e7a633353fd2..8cb910c3a394 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1071,6 +1071,7 @@ struct lock_manager_operations {
 	int (*lm_change)(struct file_lock *, int, struct list_head *);
 	void (*lm_setup)(struct file_lock *, void **);
 	bool (*lm_breaker_owns_lease)(struct file_lock *);
+	void *(*lm_expire_lock)(void *priv, bool testonly);
 };
 
 struct lock_manager {
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v8 2/2] nfsd: Initial implementation of NFSv4 Courteous Server
  2021-12-13 17:24 [PATCH RFC v8 0/2] nfsd: Initial implementation of NFSv4 Courteous Server Dai Ngo
  2021-12-13 17:24 ` [PATCH RFC v8 1/2] fs/lock: add new callback, lm_expire_lock, to lock_manager_operations Dai Ngo
@ 2021-12-13 17:24 ` Dai Ngo
  2021-12-13 18:35 ` [PATCH RFC v8 0/2] " Chuck Lever III
  2 siblings, 0 replies; 11+ messages in thread
From: Dai Ngo @ 2021-12-13 17:24 UTC (permalink / raw)
  To: bfields; +Cc: chuck.lever, jlayton, viro, linux-nfs, linux-fsdevel

Currently an NFSv4 client must maintain its lease by using at least one
of its state tokens or, if nothing else, by issuing a RENEW (4.0) or a
singleton SEQUENCE (4.1) at least once during each lease period. If the
client fails to renew the lease, for any reason, the Linux server expunges
the state tokens immediately upon detecting the "failure to renew the
lease" condition and begins returning NFS4ERR_EXPIRED if the client
reconnects and attempts to use the (now) expired state.

The default lease period for the Linux server is 90 seconds.  The typical
client cuts that in half and will issue a lease renewing operation every
45 seconds. The 90 second lease period is very short considering the
potential for moderately long term network partitions.  A network partition
refers to any loss of network connectivity between the NFS client and the
NFS server, regardless of its root cause.  This includes NIC failures, NIC
driver bugs, network misconfigurations & administrative errors, routers &
switches crashing and/or having software updates applied, even down to
cables being physically pulled.  In most cases, these network failures are
transient, although the duration is unknown.

A server which does not immediately expunge the state on lease expiration
is known as a Courteous Server.  A Courteous Server continues to recognize
previously generated state tokens as valid until conflict arises between
the expired state and the requests from another client, or the server
reboots.

The initial implementation of the Courteous Server will do the following:

. when the laundromat thread detects an expired client that still has
established state on the Linux server and no waiters for its locks, it
marks the client as a COURTESY_CLIENT and skips destroying the client
and its state; otherwise it destroys the client as usual (a short sketch
of this decision appears after the list).

. detects a conflict between an OPEN request and a COURTESY_CLIENT,
destroys the expired client and all its state, skips the delegation
recall, then allows the conflicting request to succeed.

. detects a conflict between LOCK/LOCKT, NLM LOCK and TEST, or local lock
requests and a COURTESY_CLIENT, destroys the expired client and all its
state, then allows the conflicting request to succeed.

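The laundromat's per-client decision implemented in the diff below
amounts to roughly the following sketch (not part of the patch; locking
and the 24-hour courtesy timeout are omitted, and client_has_state()
stands in for the idr_get_next() test used in the actual code):

static bool example_keep_as_courtesy_client(struct nfs4_client *clp)
{
	if (test_bit(NFSD4_DESTROY_COURTESY_CLIENT, &clp->cl_flags))
		return false;	/* a conflict was reported; expire now */
	if (!client_has_state(clp))
		return false;	/* nothing worth preserving */
	if (nfs4_anylock_conflict(clp))
		return false;	/* someone is waiting on this client's locks */
	return true;		/* mark NFSD4_COURTESY_CLIENT, keep for 24h */
}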

Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
---
 fs/nfsd/nfs4state.c | 274 +++++++++++++++++++++++++++++++++++++++++++++++++++-
 fs/nfsd/state.h     |   3 +
 2 files changed, 274 insertions(+), 3 deletions(-)

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 3f4027a5de88..b1ff8d22534c 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -125,6 +125,11 @@ static void free_session(struct nfsd4_session *);
 static const struct nfsd4_callback_ops nfsd4_cb_recall_ops;
 static const struct nfsd4_callback_ops nfsd4_cb_notify_lock_ops;
 
+static struct workqueue_struct *laundry_wq;
+static void laundromat_main(struct work_struct *);
+
+static const int courtesy_client_expiry = (24 * 60 * 60);	/* in secs */
+
 static bool is_session_dead(struct nfsd4_session *ses)
 {
 	return ses->se_flags & NFS4_SESSION_DEAD;
@@ -172,6 +177,7 @@ renew_client_locked(struct nfs4_client *clp)
 
 	list_move_tail(&clp->cl_lru, &nn->client_lru);
 	clp->cl_time = ktime_get_boottime_seconds();
+	clear_bit(NFSD4_COURTESY_CLIENT, &clp->cl_flags);
 }
 
 static void put_client_renew_locked(struct nfs4_client *clp)
@@ -2389,6 +2395,10 @@ static int client_info_show(struct seq_file *m, void *v)
 		seq_puts(m, "status: confirmed\n");
 	else
 		seq_puts(m, "status: unconfirmed\n");
+	seq_printf(m, "courtesy client: %s\n",
+		test_bit(NFSD4_COURTESY_CLIENT, &clp->cl_flags) ? "yes" : "no");
+	seq_printf(m, "seconds from last renew: %lld\n",
+		ktime_get_boottime_seconds() - clp->cl_time);
 	seq_printf(m, "name: ");
 	seq_quote_mem(m, clp->cl_name.data, clp->cl_name.len);
 	seq_printf(m, "\nminor version: %d\n", clp->cl_minorversion);
@@ -4662,6 +4672,33 @@ static void nfsd_break_one_deleg(struct nfs4_delegation *dp)
 	nfsd4_run_cb(&dp->dl_recall);
 }
 
+/*
+ * This function is called when a file is opened and there is a
+ * delegation conflict with another client. If the other client
+ * is a courtesy client then kick start the laundromat to destroy
+ * it.
+ */
+static bool
+nfsd_check_courtesy_client(struct nfs4_delegation *dp)
+{
+	struct svc_rqst *rqst;
+	struct nfs4_client *clp = dp->dl_recall.cb_clp;
+	struct nfsd_net *nn = net_generic(clp->net, nfsd_net_id);
+
+	if (!i_am_nfsd())
+		goto out;
+	rqst = kthread_data(current);
+	if (rqst->rq_prog != NFS_PROGRAM || rqst->rq_vers < 4)
+		return false;
+out:
+	if (test_bit(NFSD4_COURTESY_CLIENT, &clp->cl_flags)) {
+		set_bit(NFSD4_DESTROY_COURTESY_CLIENT, &clp->cl_flags);
+		mod_delayed_work(laundry_wq, &nn->laundromat_work, 0);
+		return true;
+	}
+	return false;
+}
+
 /* Called from break_lease() with i_lock held. */
 static bool
 nfsd_break_deleg_cb(struct file_lock *fl)
@@ -4670,6 +4707,8 @@ nfsd_break_deleg_cb(struct file_lock *fl)
 	struct nfs4_delegation *dp = (struct nfs4_delegation *)fl->fl_owner;
 	struct nfs4_file *fp = dp->dl_stid.sc_file;
 
+	if (nfsd_check_courtesy_client(dp))
+		return false;
 	trace_nfsd_cb_recall(&dp->dl_stid);
 
 	/*
@@ -4912,6 +4951,118 @@ nfsd4_truncate(struct svc_rqst *rqstp, struct svc_fh *fh,
 	return nfsd_setattr(rqstp, fh, &iattr, 0, (time64_t)0);
 }
 
+static bool
+__nfs4_check_deny_bmap(struct nfs4_ol_stateid *stp, u32 access,
+			bool share_access)
+{
+	if (share_access) {
+		if (!stp->st_deny_bmap)
+			return false;
+
+		if ((stp->st_deny_bmap & (1 << NFS4_SHARE_DENY_BOTH)) ||
+			(access & NFS4_SHARE_ACCESS_READ &&
+				stp->st_deny_bmap & (1 << NFS4_SHARE_DENY_READ)) ||
+			(access & NFS4_SHARE_ACCESS_WRITE &&
+				stp->st_deny_bmap & (1 << NFS4_SHARE_DENY_WRITE))) {
+			return true;
+		}
+		return false;
+	}
+	if ((access & NFS4_SHARE_DENY_BOTH) ||
+		(access & NFS4_SHARE_DENY_READ &&
+			stp->st_access_bmap & (1 << NFS4_SHARE_ACCESS_READ)) ||
+		(access & NFS4_SHARE_DENY_WRITE &&
+			stp->st_access_bmap & (1 << NFS4_SHARE_ACCESS_WRITE))) {
+		return true;
+	}
+	return false;
+}
+
+/*
+ * access: if share_access is true then check access mode else check deny mode
+ */
+static bool
+nfs4_check_deny_bmap(struct nfs4_client *clp, struct nfs4_file *fp,
+		struct nfs4_ol_stateid *st, u32 access, bool share_access)
+{
+	int i;
+	struct nfs4_openowner *oo;
+	struct nfs4_stateowner *so, *tmp;
+	struct nfs4_ol_stateid *stp, *stmp;
+
+	spin_lock(&clp->cl_lock);
+	for (i = 0; i < OWNER_HASH_SIZE; i++) {
+		list_for_each_entry_safe(so, tmp, &clp->cl_ownerstr_hashtbl[i],
+					so_strhash) {
+			if (!so->so_is_open_owner)
+				continue;
+			oo = openowner(so);
+			list_for_each_entry_safe(stp, stmp,
+				&oo->oo_owner.so_stateids, st_perstateowner) {
+				if (stp == st || stp->st_stid.sc_file != fp)
+					continue;
+				if (__nfs4_check_deny_bmap(stp, access,
+							share_access)) {
+					spin_unlock(&clp->cl_lock);
+					return true;
+				}
+			}
+		}
+	}
+	spin_unlock(&clp->cl_lock);
+	return false;
+}
+
+/*
+ * Check whether the nfserr_share_denied error for 'fp' resulted from a
+ * conflict with courtesy clients and, if so, release their state to
+ * resolve the conflict.
+ *
+ * Returns:
+ *	 0 -  no conflict with courtesy clients
+ *	>0 -  conflict with courtesy clients is being resolved in the
+ *	      background; return nfserr_jukebox to the NFS client
+ */
+static int
+nfs4_destroy_clnts_with_sresv_conflict(struct svc_rqst *rqstp,
+			struct nfs4_file *fp, struct nfs4_ol_stateid *stp,
+			u32 access, bool share_access)
+{
+	int cnt = 0;
+	int async_cnt = 0;
+	bool no_retry = false;
+	struct nfs4_client *cl;
+	struct list_head *pos, *next;
+	struct nfsd_net *nn = net_generic(SVC_NET(rqstp), nfsd_net_id);
+
+	spin_lock(&nn->client_lock);
+	list_for_each_safe(pos, next, &nn->client_lru) {
+		cl = list_entry(pos, struct nfs4_client, cl_lru);
+		/*
+		 * check all nfs4_ol_stateid of this client
+		 * for conflicts with 'access'mode.
+		 */
+		if (nfs4_check_deny_bmap(cl, fp, stp, access, share_access)) {
+			if (!test_bit(NFSD4_COURTESY_CLIENT, &cl->cl_flags)) {
+				/* conflict with non-courtesy client */
+				no_retry = true;
+				cnt = 0;
+				goto out;
+			}
+			set_bit(NFSD4_DESTROY_COURTESY_CLIENT, &cl->cl_flags);
+			async_cnt++;
+		}
+	}
+out:
+	spin_unlock(&nn->client_lock);
+	if (async_cnt) {
+		mod_delayed_work(laundry_wq, &nn->laundromat_work, 0);
+		if (!no_retry)
+			cnt = async_cnt;
+	}
+	return cnt;
+}
+
 static __be32 nfs4_get_vfs_file(struct svc_rqst *rqstp, struct nfs4_file *fp,
 		struct svc_fh *cur_fh, struct nfs4_ol_stateid *stp,
 		struct nfsd4_open *open)
@@ -4921,6 +5072,7 @@ static __be32 nfs4_get_vfs_file(struct svc_rqst *rqstp, struct nfs4_file *fp,
 	int oflag = nfs4_access_to_omode(open->op_share_access);
 	int access = nfs4_access_to_access(open->op_share_access);
 	unsigned char old_access_bmap, old_deny_bmap;
+	int cnt = 0;
 
 	spin_lock(&fp->fi_lock);
 
@@ -4931,6 +5083,12 @@ static __be32 nfs4_get_vfs_file(struct svc_rqst *rqstp, struct nfs4_file *fp,
 	status = nfs4_file_check_deny(fp, open->op_share_deny);
 	if (status != nfs_ok) {
 		spin_unlock(&fp->fi_lock);
+		if (status != nfserr_share_denied)
+			goto out;
+		cnt = nfs4_destroy_clnts_with_sresv_conflict(rqstp, fp,
+				stp, open->op_share_deny, false);
+		if (cnt > 0)
+			status = nfserr_jukebox;
 		goto out;
 	}
 
@@ -4938,6 +5096,12 @@ static __be32 nfs4_get_vfs_file(struct svc_rqst *rqstp, struct nfs4_file *fp,
 	status = nfs4_file_get_access(fp, open->op_share_access);
 	if (status != nfs_ok) {
 		spin_unlock(&fp->fi_lock);
+		if (status != nfserr_share_denied)
+			goto out;
+		cnt = nfs4_destroy_clnts_with_sresv_conflict(rqstp, fp,
+				stp, open->op_share_access, true);
+		if (cnt > 0)
+			status = nfserr_jukebox;
 		goto out;
 	}
 
@@ -5572,6 +5736,47 @@ static void nfsd4_ssc_expire_umount(struct nfsd_net *nn)
 }
 #endif
 
+static
+bool nfs4_anylock_conflict(struct nfs4_client *clp)
+{
+	int i;
+	struct nfs4_stateowner *so, *tmp;
+	struct nfs4_lockowner *lo;
+	struct nfs4_ol_stateid *stp;
+	struct nfs4_file *nf;
+	struct inode *ino;
+	struct file_lock_context *ctx;
+	struct file_lock *fl;
+
+	for (i = 0; i < OWNER_HASH_SIZE; i++) {
+		/* scan each lock owner */
+		list_for_each_entry_safe(so, tmp, &clp->cl_ownerstr_hashtbl[i],
+				so_strhash) {
+			if (so->so_is_open_owner)
+				continue;
+
+			/* scan lock states of this lock owner */
+			lo = lockowner(so);
+			list_for_each_entry(stp, &lo->lo_owner.so_stateids,
+					st_perstateowner) {
+				nf = stp->st_stid.sc_file;
+				ino = nf->fi_inode;
+				ctx = ino->i_flctx;
+				if (!ctx)
+					continue;
+				/* check each lock belongs to this lock state */
+				list_for_each_entry(fl, &ctx->flc_posix, fl_list) {
+					if (fl->fl_owner != lo)
+						continue;
+					if (!list_empty(&fl->fl_blocked_requests))
+						return true;
+				}
+			}
+		}
+	}
+	return false;
+}
+
 static time64_t
 nfs4_laundromat(struct nfsd_net *nn)
 {
@@ -5587,7 +5792,9 @@ nfs4_laundromat(struct nfsd_net *nn)
 	};
 	struct nfs4_cpntf_state *cps;
 	copy_stateid_t *cps_t;
+	struct nfs4_stid *stid;
 	int i;
+	int id;
 
 	if (clients_still_reclaiming(nn)) {
 		lt.new_timeo = 0;
@@ -5608,8 +5815,36 @@ nfs4_laundromat(struct nfsd_net *nn)
 	spin_lock(&nn->client_lock);
 	list_for_each_safe(pos, next, &nn->client_lru) {
 		clp = list_entry(pos, struct nfs4_client, cl_lru);
+		if (test_bit(NFSD4_DESTROY_COURTESY_CLIENT, &clp->cl_flags)) {
+			clear_bit(NFSD4_COURTESY_CLIENT, &clp->cl_flags);
+			goto exp_client;
+		}
+		if (test_bit(NFSD4_COURTESY_CLIENT, &clp->cl_flags)) {
+			if (ktime_get_boottime_seconds() >= clp->courtesy_client_expiry)
+				goto exp_client;
+			/*
+			 * after umount, v4.0 client is still around
+			 * waiting to be expired. Check again and if
+			 * it has no state then expire it.
+			 */
+			if (clp->cl_minorversion)
+				continue;
+		}
 		if (!state_expired(&lt, clp->cl_time))
 			break;
+		id = 0;
+		spin_lock(&clp->cl_lock);
+		stid = idr_get_next(&clp->cl_stateids, &id);
+		if (stid && !nfs4_anylock_conflict(clp)) {
+			/* client still has states */
+			spin_unlock(&clp->cl_lock);
+			clp->courtesy_client_expiry =
+				ktime_get_boottime_seconds() + courtesy_client_expiry;
+			set_bit(NFSD4_COURTESY_CLIENT, &clp->cl_flags);
+			continue;
+		}
+		spin_unlock(&clp->cl_lock);
+exp_client:
 		if (mark_client_expired_locked(clp))
 			continue;
 		list_add(&clp->cl_lru, &reaplist);
@@ -5689,9 +5924,6 @@ nfs4_laundromat(struct nfsd_net *nn)
 	return max_t(time64_t, lt.new_timeo, NFSD_LAUNDROMAT_MINTIMEOUT);
 }
 
-static struct workqueue_struct *laundry_wq;
-static void laundromat_main(struct work_struct *);
-
 static void
 laundromat_main(struct work_struct *laundry)
 {
@@ -6496,6 +6728,41 @@ nfs4_transform_lock_offset(struct file_lock *lock)
 		lock->fl_end = OFFSET_MAX;
 }
 
+/*
+ * If testonly is true, check whether the client that owns the lock is
+ * a courtesy client; if it is, return that client, otherwise return
+ * NULL. If testonly is false, destroy the specified courtesy client.
+ */
+static void *
+nfsd4_fl_expire_lock(void *cfl, bool testonly)
+{
+	struct file_lock *fl;
+	struct nfs4_lockowner *lo;
+	struct nfs4_client *clp;
+	struct nfsd_net *nn;
+
+	if (testonly) {
+		fl = (struct file_lock *)cfl;
+		lo = (struct nfs4_lockowner *)fl->fl_owner;
+		clp = lo->lo_owner.so_client;
+		if (test_bit(NFSD4_COURTESY_CLIENT, &clp->cl_flags))
+			return clp;
+		return NULL;
+	}
+	clp = (struct nfs4_client *)cfl;
+
+	nn = net_generic(clp->net, nfsd_net_id);
+	spin_lock(&nn->client_lock);
+	if (!test_bit(NFSD4_COURTESY_CLIENT, &clp->cl_flags) ||
+			mark_client_expired_locked(clp)) {
+		spin_unlock(&nn->client_lock);
+		return NULL;
+	}
+	spin_unlock(&nn->client_lock);
+	expire_client(clp);
+	return clp;
+}
+
 static fl_owner_t
 nfsd4_fl_get_owner(fl_owner_t owner)
 {
@@ -6543,6 +6810,7 @@ static const struct lock_manager_operations nfsd_posix_mng_ops  = {
 	.lm_notify = nfsd4_lm_notify,
 	.lm_get_owner = nfsd4_fl_get_owner,
 	.lm_put_owner = nfsd4_fl_put_owner,
+	.lm_expire_lock = nfsd4_fl_expire_lock,
 };
 
 static inline void
diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
index e73bdbb1634a..93e30b101578 100644
--- a/fs/nfsd/state.h
+++ b/fs/nfsd/state.h
@@ -345,6 +345,8 @@ struct nfs4_client {
 #define NFSD4_CLIENT_UPCALL_LOCK	(5)	/* upcall serialization */
 #define NFSD4_CLIENT_CB_FLAG_MASK	(1 << NFSD4_CLIENT_CB_UPDATE | \
 					 1 << NFSD4_CLIENT_CB_KILL)
+#define NFSD4_COURTESY_CLIENT		(6)	/* be nice to expired client */
+#define NFSD4_DESTROY_COURTESY_CLIENT	(7)
 	unsigned long		cl_flags;
 	const struct cred	*cl_cb_cred;
 	struct rpc_clnt		*cl_cb_client;
@@ -385,6 +387,7 @@ struct nfs4_client {
 	struct list_head	async_copies;	/* list of async copies */
 	spinlock_t		async_lock;	/* lock for async copies */
 	atomic_t		cl_cb_inflight;	/* Outstanding callbacks */
+	int			courtesy_client_expiry;
 };
 
 /* struct nfs4_client_reset
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH RFC v8 0/2] nfsd: Initial implementation of NFSv4 Courteous Server
  2021-12-13 17:24 [PATCH RFC v8 0/2] nfsd: Initial implementation of NFSv4 Courteous Server Dai Ngo
  2021-12-13 17:24 ` [PATCH RFC v8 1/2] fs/lock: add new callback, lm_expire_lock, to lock_manager_operations Dai Ngo
  2021-12-13 17:24 ` [PATCH v8 2/2] nfsd: Initial implementation of NFSv4 Courteous Server Dai Ngo
@ 2021-12-13 18:35 ` Chuck Lever III
  2 siblings, 0 replies; 11+ messages in thread
From: Chuck Lever III @ 2021-12-13 18:35 UTC (permalink / raw)
  To: Dai Ngo
  Cc: Bruce Fields, Jeff Layton, Al Viro, Linux NFS Mailing List,
	linux-fsdevel



> On Dec 13, 2021, at 12:24 PM, Dai Ngo <dai.ngo@oracle.com> wrote:
> 
> 
> Hi Bruce,
> 
> This series of patches implement the NFSv4 Courteous Server.
> 
> A server which does not immediately expunge the state on lease expiration
> is known as a Courteous Server.  A Courteous Server continues to recognize
> previously generated state tokens as valid until conflict arises between
> the expired state and the requests from another client, or the server
> reboots.
> 
> The v2 patch includes the following:
> 
> . add new callback, lm_expire_lock, to lock_manager_operations to
>  allow the lock manager to take appropriate action with conflict lock.
> 
> . handle conflicts of NFSv4 locks with NFSv3/NLM and local locks.
> 
> . expire courtesy client after 24hr if client has not reconnected.
> 
> . do not allow expired client to become courtesy client if there are
>  waiters for client's locks.
> 
> . modify client_info_show to show courtesy client and seconds from
>  last renew.
> 
> . fix a problem with NFSv4.1 server where the it keeps returning
>  SEQ4_STATUS_CB_PATH_DOWN in the successful SEQUENCE reply, after
>  the courtesy client re-connects, causing the client to keep sending
>  BCTS requests to server.
> 
> The v3 patch includes the following:
> 
> . modified posix_test_lock to check and resolve conflict locks
>  to handle NLM TEST and NFSv4 LOCKT requests.
> 
> . separate out fix for back channel stuck in SEQ4_STATUS_CB_PATH_DOWN.
> 
> The v4 patch includes:
> 
> . rework nfsd_check_courtesy to avoid dead lock of fl_lock and client_lock
>  by asking the laudromat thread to destroy the courtesy client.
> 
> . handle NFSv4 share reservation conflicts with courtesy client. This
>  includes conflicts between access mode and deny mode and vice versa.
> 
> . drop the patch for back channel stuck in SEQ4_STATUS_CB_PATH_DOWN.
> 
> The v5 patch includes:
> 
> . fix recursive locking of file_rwsem from posix_lock_file. 
> 
> . retest with LOCKDEP enabled.
> 
> The v6 patch includes:
> 
> . merge witn 5.15-rc7
> 
> . fix a bug in nfs4_check_deny_bmap that did not check for matched
>  nfs4_file before checking for access/deny conflict. This bug causes
>  pynfs OPEN18 to fail since the server taking too long to release
>  lots of un-conflict clients' state.
> 
> . enhance share reservation conflict handler to handle case where
>  a large number of conflict courtesy clients need to be expired.
>  The 1st 100 clients are expired synchronously and the rest are
>  expired in the background by the laundromat and NFS4ERR_DELAY
>  is returned to the NFS client. This is needed to prevent the
>  NFS client from timing out waiting got the reply.
> 
> The v7 patch includes:
> 
> . Fix race condition in posix_test_lock and posix_lock_inode after
>  dropping spinlock.
> 
> . Enhance nfsd4_fl_expire_lock to work with with new lm_expire_lock
>  callback
> 
> . Always resolve share reservation conflicts asynchrously.
> 
> . Fix bug in nfs4_laundromat where spinlock is not used when
>  scanning cl_ownerstr_hashtbl.
> 
> . Fix bug in nfs4_laundromat where idr_get_next was called
>  with incorrect 'id'. 
> 
> . Merge nfs4_destroy_courtesy_client into nfsd4_fl_expire_lock.
> 
> The v8 patch includes:
> 
> . Fix warning in nfsd4_fl_expire_lock reported by test robot.

These patches are also available in the nfsd-courteous-server topic
branch of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.git.


--
Chuck Lever




^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH RFC v8 1/2] fs/lock: add new callback, lm_expire_lock, to lock_manager_operations
  2021-12-13 17:24 ` [PATCH RFC v8 1/2] fs/lock: add new callback, lm_expire_lock, to lock_manager_operations Dai Ngo
@ 2021-12-14 23:41   ` Chuck Lever III
  2021-12-17 20:35     ` Bruce Fields
  0 siblings, 1 reply; 11+ messages in thread
From: Chuck Lever III @ 2021-12-14 23:41 UTC (permalink / raw)
  To: Linux NFS Mailing List
  Cc: Bruce Fields, Dai Ngo, Jeff Layton, Al Viro, linux-fsdevel



> On Dec 13, 2021, at 12:24 PM, Dai Ngo <dai.ngo@oracle.com> wrote:
> 
> Add new callback, lm_expire_lock, to lock_manager_operations to allow
> the lock manager to take appropriate action to resolve the lock conflict
> if possible. The callback takes 2 arguments, file_lock of the blocker
> and a testonly flag:
> 
> testonly = 1  check and return lock manager's private data if lock conflict
>              can be resolved else return NULL.
> testonly = 0  resolve the conflict if possible, return true if conflict
>              was resolved esle return false.
> 
> Lock manager, such as NFSv4 courteous server, uses this callback to
> resolve conflict by destroying lock owner, or the NFSv4 courtesy client
> (client that has expired but allowed to maintains its states) that owns
> the lock.
> 
> Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
> ---
> fs/locks.c         | 40 +++++++++++++++++++++++++++++++++++++---
> include/linux/fs.h |  1 +
> 2 files changed, 38 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/locks.c b/fs/locks.c
> index 3d6fb4ae847b..5f3ea40ce2aa 100644
> --- a/fs/locks.c
> +++ b/fs/locks.c
> @@ -952,8 +952,11 @@ void
> posix_test_lock(struct file *filp, struct file_lock *fl)
> {
> 	struct file_lock *cfl;
> +	struct file_lock *checked_cfl = NULL;
> 	struct file_lock_context *ctx;
> 	struct inode *inode = locks_inode(filp);
> +	void *res_data;
> +	void *(*func)(void *priv, bool testonly);
> 
> 	ctx = smp_load_acquire(&inode->i_flctx);
> 	if (!ctx || list_empty_careful(&ctx->flc_posix)) {
> @@ -962,11 +965,24 @@ posix_test_lock(struct file *filp, struct file_lock *fl)
> 	}
> 
> 	spin_lock(&ctx->flc_lock);
> +retry:
> 	list_for_each_entry(cfl, &ctx->flc_posix, fl_list) {
> -		if (posix_locks_conflict(fl, cfl)) {
> -			locks_copy_conflock(fl, cfl);
> -			goto out;
> +		if (!posix_locks_conflict(fl, cfl))
> +			continue;
> +		if (checked_cfl != cfl && cfl->fl_lmops &&
> +				cfl->fl_lmops->lm_expire_lock) {
> +			res_data = cfl->fl_lmops->lm_expire_lock(cfl, true);
> +			if (res_data) {
> +				func = cfl->fl_lmops->lm_expire_lock;
> +				spin_unlock(&ctx->flc_lock);
> +				func(res_data, false);
> +				spin_lock(&ctx->flc_lock);
> +				checked_cfl = cfl;
> +				goto retry;
> +			}
> 		}

Dai and I discussed this offline. Depending on a pointer to represent
exactly the same struct file_lock across a dropped spinlock is racy.
Dai plans to investigate other mechanisms to perform this check
reliably.


> +		locks_copy_conflock(fl, cfl);
> +		goto out;
> 	}
> 	fl->fl_type = F_UNLCK;
> out:
> @@ -1136,10 +1152,13 @@ static int posix_lock_inode(struct inode *inode, struct file_lock *request,
> 	struct file_lock *new_fl2 = NULL;
> 	struct file_lock *left = NULL;
> 	struct file_lock *right = NULL;
> +	struct file_lock *checked_fl = NULL;
> 	struct file_lock_context *ctx;
> 	int error;
> 	bool added = false;
> 	LIST_HEAD(dispose);
> +	void *res_data;
> +	void *(*func)(void *priv, bool testonly);
> 
> 	ctx = locks_get_lock_context(inode, request->fl_type);
> 	if (!ctx)
> @@ -1166,9 +1185,24 @@ static int posix_lock_inode(struct inode *inode, struct file_lock *request,
> 	 * blocker's list of waiters and the global blocked_hash.
> 	 */
> 	if (request->fl_type != F_UNLCK) {
> +retry:
> 		list_for_each_entry(fl, &ctx->flc_posix, fl_list) {
> 			if (!posix_locks_conflict(request, fl))
> 				continue;
> +			if (checked_fl != fl && fl->fl_lmops &&
> +					fl->fl_lmops->lm_expire_lock) {
> +				res_data = fl->fl_lmops->lm_expire_lock(fl, true);
> +				if (res_data) {
> +					func = fl->fl_lmops->lm_expire_lock;
> +					spin_unlock(&ctx->flc_lock);
> +					percpu_up_read(&file_rwsem);
> +					func(res_data, false);
> +					percpu_down_read(&file_rwsem);
> +					spin_lock(&ctx->flc_lock);
> +					checked_fl = fl;
> +					goto retry;
> +				}
> +			}
> 			if (conflock)
> 				locks_copy_conflock(conflock, fl);
> 			error = -EAGAIN;
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index e7a633353fd2..8cb910c3a394 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -1071,6 +1071,7 @@ struct lock_manager_operations {
> 	int (*lm_change)(struct file_lock *, int, struct list_head *);
> 	void (*lm_setup)(struct file_lock *, void **);
> 	bool (*lm_breaker_owns_lease)(struct file_lock *);
> +	void *(*lm_expire_lock)(void *priv, bool testonly);
> };
> 
> struct lock_manager {
> -- 
> 2.9.5
> 

--
Chuck Lever




^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH RFC v8 1/2] fs/lock: add new callback, lm_expire_lock, to lock_manager_operations
  2021-12-14 23:41   ` Chuck Lever III
@ 2021-12-17 20:35     ` Bruce Fields
  2021-12-17 20:50       ` dai.ngo
  0 siblings, 1 reply; 11+ messages in thread
From: Bruce Fields @ 2021-12-17 20:35 UTC (permalink / raw)
  To: Chuck Lever III
  Cc: Linux NFS Mailing List, Dai Ngo, Jeff Layton, Al Viro, linux-fsdevel

On Tue, Dec 14, 2021 at 11:41:41PM +0000, Chuck Lever III wrote:
> 
> 
> > On Dec 13, 2021, at 12:24 PM, Dai Ngo <dai.ngo@oracle.com> wrote:
> > 
> > Add new callback, lm_expire_lock, to lock_manager_operations to allow
> > the lock manager to take appropriate action to resolve the lock conflict
> > if possible. The callback takes 2 arguments, file_lock of the blocker
> > and a testonly flag:
> > 
> > testonly = 1  check and return lock manager's private data if lock conflict
> >              can be resolved else return NULL.
> > testonly = 0  resolve the conflict if possible, return true if conflict
> >              was resolved esle return false.
> > 
> > Lock manager, such as NFSv4 courteous server, uses this callback to
> > resolve conflict by destroying lock owner, or the NFSv4 courtesy client
> > (client that has expired but allowed to maintains its states) that owns
> > the lock.
> > 
> > Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
> > ---
> > fs/locks.c         | 40 +++++++++++++++++++++++++++++++++++++---
> > include/linux/fs.h |  1 +
> > 2 files changed, 38 insertions(+), 3 deletions(-)
> > 
> > diff --git a/fs/locks.c b/fs/locks.c
> > index 3d6fb4ae847b..5f3ea40ce2aa 100644
> > --- a/fs/locks.c
> > +++ b/fs/locks.c
> > @@ -952,8 +952,11 @@ void
> > posix_test_lock(struct file *filp, struct file_lock *fl)
> > {
> > 	struct file_lock *cfl;
> > +	struct file_lock *checked_cfl = NULL;
> > 	struct file_lock_context *ctx;
> > 	struct inode *inode = locks_inode(filp);
> > +	void *res_data;
> > +	void *(*func)(void *priv, bool testonly);
> > 
> > 	ctx = smp_load_acquire(&inode->i_flctx);
> > 	if (!ctx || list_empty_careful(&ctx->flc_posix)) {
> > @@ -962,11 +965,24 @@ posix_test_lock(struct file *filp, struct file_lock *fl)
> > 	}
> > 
> > 	spin_lock(&ctx->flc_lock);
> > +retry:
> > 	list_for_each_entry(cfl, &ctx->flc_posix, fl_list) {
> > -		if (posix_locks_conflict(fl, cfl)) {
> > -			locks_copy_conflock(fl, cfl);
> > -			goto out;
> > +		if (!posix_locks_conflict(fl, cfl))
> > +			continue;
> > +		if (checked_cfl != cfl && cfl->fl_lmops &&
> > +				cfl->fl_lmops->lm_expire_lock) {
> > +			res_data = cfl->fl_lmops->lm_expire_lock(cfl, true);
> > +			if (res_data) {
> > +				func = cfl->fl_lmops->lm_expire_lock;
> > +				spin_unlock(&ctx->flc_lock);
> > +				func(res_data, false);
> > +				spin_lock(&ctx->flc_lock);
> > +				checked_cfl = cfl;
> > +				goto retry;
> > +			}
> > 		}
> 
> Dai and I discussed this offline. Depending on a pointer to represent
> exactly the same struct file_lock across a dropped spinlock is racy.

Yes.  There's also no need for that (checked_cfl != cfl) check, though.
By the time func() returns, that lock should be gone from the list
anyway.

It's a little inefficient to have to restart the list every time--but
that theoretical n^2 behavior won't matter much compared to the time
spent waiting for clients to expire.  And this approach has the benefit
of being simple.
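
For illustration only (a sketch, not a posted patch), the posix_test_lock
scan with that simplification would look roughly like:

retry:
	list_for_each_entry(cfl, &ctx->flc_posix, fl_list) {
		if (!posix_locks_conflict(fl, cfl))
			continue;
		if (cfl->fl_lmops && cfl->fl_lmops->lm_expire_lock) {
			res_data = cfl->fl_lmops->lm_expire_lock(cfl, true);
			if (res_data) {
				func = cfl->fl_lmops->lm_expire_lock;
				spin_unlock(&ctx->flc_lock);
				func(res_data, false);
				spin_lock(&ctx->flc_lock);
				goto retry;
			}
		}
		locks_copy_conflock(fl, cfl);
		goto out;
	}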

--b.

> Dai plans to investigate other mechanisms to perform this check
> reliably.
> 
> 
> > +		locks_copy_conflock(fl, cfl);
> > +		goto out;
> > 	}
> > 	fl->fl_type = F_UNLCK;
> > out:
> > @@ -1136,10 +1152,13 @@ static int posix_lock_inode(struct inode *inode, struct file_lock *request,
> > 	struct file_lock *new_fl2 = NULL;
> > 	struct file_lock *left = NULL;
> > 	struct file_lock *right = NULL;
> > +	struct file_lock *checked_fl = NULL;
> > 	struct file_lock_context *ctx;
> > 	int error;
> > 	bool added = false;
> > 	LIST_HEAD(dispose);
> > +	void *res_data;
> > +	void *(*func)(void *priv, bool testonly);
> > 
> > 	ctx = locks_get_lock_context(inode, request->fl_type);
> > 	if (!ctx)
> > @@ -1166,9 +1185,24 @@ static int posix_lock_inode(struct inode *inode, struct file_lock *request,
> > 	 * blocker's list of waiters and the global blocked_hash.
> > 	 */
> > 	if (request->fl_type != F_UNLCK) {
> > +retry:
> > 		list_for_each_entry(fl, &ctx->flc_posix, fl_list) {
> > 			if (!posix_locks_conflict(request, fl))
> > 				continue;
> > +			if (checked_fl != fl && fl->fl_lmops &&
> > +					fl->fl_lmops->lm_expire_lock) {
> > +				res_data = fl->fl_lmops->lm_expire_lock(fl, true);
> > +				if (res_data) {
> > +					func = fl->fl_lmops->lm_expire_lock;
> > +					spin_unlock(&ctx->flc_lock);
> > +					percpu_up_read(&file_rwsem);
> > +					func(res_data, false);
> > +					percpu_down_read(&file_rwsem);
> > +					spin_lock(&ctx->flc_lock);
> > +					checked_fl = fl;
> > +					goto retry;
> > +				}
> > +			}
> > 			if (conflock)
> > 				locks_copy_conflock(conflock, fl);
> > 			error = -EAGAIN;
> > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > index e7a633353fd2..8cb910c3a394 100644
> > --- a/include/linux/fs.h
> > +++ b/include/linux/fs.h
> > @@ -1071,6 +1071,7 @@ struct lock_manager_operations {
> > 	int (*lm_change)(struct file_lock *, int, struct list_head *);
> > 	void (*lm_setup)(struct file_lock *, void **);
> > 	bool (*lm_breaker_owns_lease)(struct file_lock *);
> > +	void *(*lm_expire_lock)(void *priv, bool testonly);
> > };
> > 
> > struct lock_manager {
> > -- 
> > 2.9.5
> > 
> 
> --
> Chuck Lever
> 
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH RFC v8 1/2] fs/lock: add new callback, lm_expire_lock, to lock_manager_operations
  2021-12-17 20:35     ` Bruce Fields
@ 2021-12-17 20:50       ` dai.ngo
  2021-12-17 20:58         ` Bruce Fields
  2021-12-17 21:06         ` Chuck Lever III
  0 siblings, 2 replies; 11+ messages in thread
From: dai.ngo @ 2021-12-17 20:50 UTC (permalink / raw)
  To: Bruce Fields, Chuck Lever III
  Cc: Linux NFS Mailing List, Jeff Layton, Al Viro, linux-fsdevel


On 12/17/21 12:35 PM, Bruce Fields wrote:
> On Tue, Dec 14, 2021 at 11:41:41PM +0000, Chuck Lever III wrote:
>>
>>> On Dec 13, 2021, at 12:24 PM, Dai Ngo <dai.ngo@oracle.com> wrote:
>>>
>>> Add new callback, lm_expire_lock, to lock_manager_operations to allow
>>> the lock manager to take appropriate action to resolve the lock conflict
>>> if possible. The callback takes 2 arguments, file_lock of the blocker
>>> and a testonly flag:
>>>
>>> testonly = 1  check and return lock manager's private data if lock conflict
>>>               can be resolved else return NULL.
>>> testonly = 0  resolve the conflict if possible, return true if conflict
>>>               was resolved esle return false.
>>>
>>> Lock manager, such as NFSv4 courteous server, uses this callback to
>>> resolve conflict by destroying lock owner, or the NFSv4 courtesy client
>>> (client that has expired but allowed to maintains its states) that owns
>>> the lock.
>>>
>>> Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
>>> ---
>>> fs/locks.c         | 40 +++++++++++++++++++++++++++++++++++++---
>>> include/linux/fs.h |  1 +
>>> 2 files changed, 38 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/fs/locks.c b/fs/locks.c
>>> index 3d6fb4ae847b..5f3ea40ce2aa 100644
>>> --- a/fs/locks.c
>>> +++ b/fs/locks.c
>>> @@ -952,8 +952,11 @@ void
>>> posix_test_lock(struct file *filp, struct file_lock *fl)
>>> {
>>> 	struct file_lock *cfl;
>>> +	struct file_lock *checked_cfl = NULL;
>>> 	struct file_lock_context *ctx;
>>> 	struct inode *inode = locks_inode(filp);
>>> +	void *res_data;
>>> +	void *(*func)(void *priv, bool testonly);
>>>
>>> 	ctx = smp_load_acquire(&inode->i_flctx);
>>> 	if (!ctx || list_empty_careful(&ctx->flc_posix)) {
>>> @@ -962,11 +965,24 @@ posix_test_lock(struct file *filp, struct file_lock *fl)
>>> 	}
>>>
>>> 	spin_lock(&ctx->flc_lock);
>>> +retry:
>>> 	list_for_each_entry(cfl, &ctx->flc_posix, fl_list) {
>>> -		if (posix_locks_conflict(fl, cfl)) {
>>> -			locks_copy_conflock(fl, cfl);
>>> -			goto out;
>>> +		if (!posix_locks_conflict(fl, cfl))
>>> +			continue;
>>> +		if (checked_cfl != cfl && cfl->fl_lmops &&
>>> +				cfl->fl_lmops->lm_expire_lock) {
>>> +			res_data = cfl->fl_lmops->lm_expire_lock(cfl, true);
>>> +			if (res_data) {
>>> +				func = cfl->fl_lmops->lm_expire_lock;
>>> +				spin_unlock(&ctx->flc_lock);
>>> +				func(res_data, false);
>>> +				spin_lock(&ctx->flc_lock);
>>> +				checked_cfl = cfl;
>>> +				goto retry;
>>> +			}
>>> 		}
>> Dai and I discussed this offline. Depending on a pointer to represent
>> exactly the same struct file_lock across a dropped spinlock is racy.
> Yes.  There's also no need for that (checked_cfl != cfl) check, though.
> By the time func() returns, that lock should be gone from the list
> anyway.

func() eventually calls expire_client. But we do not know if expire_client
succeeds. One simple way to know if the conflict client was successfully
expired is to check the list again. If the client was successfully expired
then its locks were removed from the list. Otherwise we get the same 'cfl'
from the list again on the next get.

-Dai

>
> It's a little inefficient to have to restart the list every time--but
> that theoretical n^2 behavior won't matter much compared to the time
> spent waiting for clients to expire.  And this approach has the benefit
> of being simple.
>
> --b.
>
>> Dai plans to investigate other mechanisms to perform this check
>> reliably.
>>
>>
>>> +		locks_copy_conflock(fl, cfl);
>>> +		goto out;
>>> 	}
>>> 	fl->fl_type = F_UNLCK;
>>> out:
>>> @@ -1136,10 +1152,13 @@ static int posix_lock_inode(struct inode *inode, struct file_lock *request,
>>> 	struct file_lock *new_fl2 = NULL;
>>> 	struct file_lock *left = NULL;
>>> 	struct file_lock *right = NULL;
>>> +	struct file_lock *checked_fl = NULL;
>>> 	struct file_lock_context *ctx;
>>> 	int error;
>>> 	bool added = false;
>>> 	LIST_HEAD(dispose);
>>> +	void *res_data;
>>> +	void *(*func)(void *priv, bool testonly);
>>>
>>> 	ctx = locks_get_lock_context(inode, request->fl_type);
>>> 	if (!ctx)
>>> @@ -1166,9 +1185,24 @@ static int posix_lock_inode(struct inode *inode, struct file_lock *request,
>>> 	 * blocker's list of waiters and the global blocked_hash.
>>> 	 */
>>> 	if (request->fl_type != F_UNLCK) {
>>> +retry:
>>> 		list_for_each_entry(fl, &ctx->flc_posix, fl_list) {
>>> 			if (!posix_locks_conflict(request, fl))
>>> 				continue;
>>> +			if (checked_fl != fl && fl->fl_lmops &&
>>> +					fl->fl_lmops->lm_expire_lock) {
>>> +				res_data = fl->fl_lmops->lm_expire_lock(fl, true);
>>> +				if (res_data) {
>>> +					func = fl->fl_lmops->lm_expire_lock;
>>> +					spin_unlock(&ctx->flc_lock);
>>> +					percpu_up_read(&file_rwsem);
>>> +					func(res_data, false);
>>> +					percpu_down_read(&file_rwsem);
>>> +					spin_lock(&ctx->flc_lock);
>>> +					checked_fl = fl;
>>> +					goto retry;
>>> +				}
>>> +			}
>>> 			if (conflock)
>>> 				locks_copy_conflock(conflock, fl);
>>> 			error = -EAGAIN;
>>> diff --git a/include/linux/fs.h b/include/linux/fs.h
>>> index e7a633353fd2..8cb910c3a394 100644
>>> --- a/include/linux/fs.h
>>> +++ b/include/linux/fs.h
>>> @@ -1071,6 +1071,7 @@ struct lock_manager_operations {
>>> 	int (*lm_change)(struct file_lock *, int, struct list_head *);
>>> 	void (*lm_setup)(struct file_lock *, void **);
>>> 	bool (*lm_breaker_owns_lease)(struct file_lock *);
>>> +	void *(*lm_expire_lock)(void *priv, bool testonly);
>>> };
>>>
>>> struct lock_manager {
>>> -- 
>>> 2.9.5
>>>
>> --
>> Chuck Lever
>>
>>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH RFC v8 1/2] fs/lock: add new callback, lm_expire_lock, to lock_manager_operations
  2021-12-17 20:50       ` dai.ngo
@ 2021-12-17 20:58         ` Bruce Fields
  2021-12-17 21:23           ` dai.ngo
  2021-12-17 21:06         ` Chuck Lever III
  1 sibling, 1 reply; 11+ messages in thread
From: Bruce Fields @ 2021-12-17 20:58 UTC (permalink / raw)
  To: dai.ngo
  Cc: Chuck Lever III, Linux NFS Mailing List, Jeff Layton, Al Viro,
	linux-fsdevel

On Fri, Dec 17, 2021 at 12:50:55PM -0800, dai.ngo@oracle.com wrote:
> 
> On 12/17/21 12:35 PM, Bruce Fields wrote:
> >On Tue, Dec 14, 2021 at 11:41:41PM +0000, Chuck Lever III wrote:
> >>
> >>>On Dec 13, 2021, at 12:24 PM, Dai Ngo <dai.ngo@oracle.com> wrote:
> >>>
> >>>Add new callback, lm_expire_lock, to lock_manager_operations to allow
> >>>the lock manager to take appropriate action to resolve the lock conflict
> >>>if possible. The callback takes 2 arguments, file_lock of the blocker
> >>>and a testonly flag:
> >>>
> >>>testonly = 1  check and return lock manager's private data if lock conflict
> >>>              can be resolved else return NULL.
> >>>testonly = 0  resolve the conflict if possible, return true if conflict
> >>>              was resolved esle return false.
> >>>
> >>>Lock manager, such as NFSv4 courteous server, uses this callback to
> >>>resolve conflict by destroying lock owner, or the NFSv4 courtesy client
> >>>(client that has expired but allowed to maintains its states) that owns
> >>>the lock.
> >>>
> >>>Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
> >>>---
> >>>fs/locks.c         | 40 +++++++++++++++++++++++++++++++++++++---
> >>>include/linux/fs.h |  1 +
> >>>2 files changed, 38 insertions(+), 3 deletions(-)
> >>>
> >>>diff --git a/fs/locks.c b/fs/locks.c
> >>>index 3d6fb4ae847b..5f3ea40ce2aa 100644
> >>>--- a/fs/locks.c
> >>>+++ b/fs/locks.c
> >>>@@ -952,8 +952,11 @@ void
> >>>posix_test_lock(struct file *filp, struct file_lock *fl)
> >>>{
> >>>	struct file_lock *cfl;
> >>>+	struct file_lock *checked_cfl = NULL;
> >>>	struct file_lock_context *ctx;
> >>>	struct inode *inode = locks_inode(filp);
> >>>+	void *res_data;
> >>>+	void *(*func)(void *priv, bool testonly);
> >>>
> >>>	ctx = smp_load_acquire(&inode->i_flctx);
> >>>	if (!ctx || list_empty_careful(&ctx->flc_posix)) {
> >>>@@ -962,11 +965,24 @@ posix_test_lock(struct file *filp, struct file_lock *fl)
> >>>	}
> >>>
> >>>	spin_lock(&ctx->flc_lock);
> >>>+retry:
> >>>	list_for_each_entry(cfl, &ctx->flc_posix, fl_list) {
> >>>-		if (posix_locks_conflict(fl, cfl)) {
> >>>-			locks_copy_conflock(fl, cfl);
> >>>-			goto out;
> >>>+		if (!posix_locks_conflict(fl, cfl))
> >>>+			continue;
> >>>+		if (checked_cfl != cfl && cfl->fl_lmops &&
> >>>+				cfl->fl_lmops->lm_expire_lock) {
> >>>+			res_data = cfl->fl_lmops->lm_expire_lock(cfl, true);
> >>>+			if (res_data) {
> >>>+				func = cfl->fl_lmops->lm_expire_lock;
> >>>+				spin_unlock(&ctx->flc_lock);
> >>>+				func(res_data, false);
> >>>+				spin_lock(&ctx->flc_lock);
> >>>+				checked_cfl = cfl;
> >>>+				goto retry;
> >>>+			}
> >>>		}
> >>Dai and I discussed this offline. Depending on a pointer to represent
> >>exactly the same struct file_lock across a dropped spinlock is racy.
> >Yes.  There's also no need for that (checked_cfl != cfl) check, though.
> >By the time func() returns, that lock should be gone from the list
> >anyway.
> 
> func() eventually calls expire_client. But we do not know if expire_client
> succeeds.

expire_client always succeeds, maybe you're thinking of
mark_client_expired_locked or something?

If there's a chance something might fail here, the only reason should be
that the client is no longer a courtesy client because it's come back to
life.  But in that case the correct behavior would be to just honor the
lock conflict and return -EAGAIN.

--b.

> One simple way to know if the conflict client was successfully
> expired is to check the list again. If the client was successfully expired
> then its locks were removed from the list. Otherwise we get the same 'cfl'
> from the list again on the next get.
> 
> -Dai
> 
> >
> >It's a little inefficient to have to restart the list every time--but
> >that theoretical n^2 behavior won't matter much compared to the time
> >spent waiting for clients to expire.  And this approach has the benefit
> >of being simple.
> >
> >--b.
> >
> >>Dai plans to investigate other mechanisms to perform this check
> >>reliably.
> >>
> >>
> >>>+		locks_copy_conflock(fl, cfl);
> >>>+		goto out;
> >>>	}
> >>>	fl->fl_type = F_UNLCK;
> >>>out:
> >>>@@ -1136,10 +1152,13 @@ static int posix_lock_inode(struct inode *inode, struct file_lock *request,
> >>>	struct file_lock *new_fl2 = NULL;
> >>>	struct file_lock *left = NULL;
> >>>	struct file_lock *right = NULL;
> >>>+	struct file_lock *checked_fl = NULL;
> >>>	struct file_lock_context *ctx;
> >>>	int error;
> >>>	bool added = false;
> >>>	LIST_HEAD(dispose);
> >>>+	void *res_data;
> >>>+	void *(*func)(void *priv, bool testonly);
> >>>
> >>>	ctx = locks_get_lock_context(inode, request->fl_type);
> >>>	if (!ctx)
> >>>@@ -1166,9 +1185,24 @@ static int posix_lock_inode(struct inode *inode, struct file_lock *request,
> >>>	 * blocker's list of waiters and the global blocked_hash.
> >>>	 */
> >>>	if (request->fl_type != F_UNLCK) {
> >>>+retry:
> >>>		list_for_each_entry(fl, &ctx->flc_posix, fl_list) {
> >>>			if (!posix_locks_conflict(request, fl))
> >>>				continue;
> >>>+			if (checked_fl != fl && fl->fl_lmops &&
> >>>+					fl->fl_lmops->lm_expire_lock) {
> >>>+				res_data = fl->fl_lmops->lm_expire_lock(fl, true);
> >>>+				if (res_data) {
> >>>+					func = fl->fl_lmops->lm_expire_lock;
> >>>+					spin_unlock(&ctx->flc_lock);
> >>>+					percpu_up_read(&file_rwsem);
> >>>+					func(res_data, false);
> >>>+					percpu_down_read(&file_rwsem);
> >>>+					spin_lock(&ctx->flc_lock);
> >>>+					checked_fl = fl;
> >>>+					goto retry;
> >>>+				}
> >>>+			}
> >>>			if (conflock)
> >>>				locks_copy_conflock(conflock, fl);
> >>>			error = -EAGAIN;
> >>>diff --git a/include/linux/fs.h b/include/linux/fs.h
> >>>index e7a633353fd2..8cb910c3a394 100644
> >>>--- a/include/linux/fs.h
> >>>+++ b/include/linux/fs.h
> >>>@@ -1071,6 +1071,7 @@ struct lock_manager_operations {
> >>>	int (*lm_change)(struct file_lock *, int, struct list_head *);
> >>>	void (*lm_setup)(struct file_lock *, void **);
> >>>	bool (*lm_breaker_owns_lease)(struct file_lock *);
> >>>+	void *(*lm_expire_lock)(void *priv, bool testonly);
> >>>};
> >>>
> >>>struct lock_manager {
> >>>-- 
> >>>2.9.5
> >>>
> >>--
> >>Chuck Lever
> >>
> >>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH RFC v8 1/2] fs/lock: add new callback, lm_expire_lock, to lock_manager_operations
  2021-12-17 20:50       ` dai.ngo
  2021-12-17 20:58         ` Bruce Fields
@ 2021-12-17 21:06         ` Chuck Lever III
  1 sibling, 0 replies; 11+ messages in thread
From: Chuck Lever III @ 2021-12-17 21:06 UTC (permalink / raw)
  To: Dai Ngo
  Cc: Bruce Fields, Linux NFS Mailing List, Jeff Layton, Al Viro,
	linux-fsdevel


> On Dec 17, 2021, at 3:50 PM, Dai Ngo <dai.ngo@oracle.com> wrote:
> 
> On 12/17/21 12:35 PM, Bruce Fields wrote:
>> On Tue, Dec 14, 2021 at 11:41:41PM +0000, Chuck Lever III wrote:
>>> 
>>>> On Dec 13, 2021, at 12:24 PM, Dai Ngo <dai.ngo@oracle.com> wrote:
>>>> 
>>>> Add new callback, lm_expire_lock, to lock_manager_operations to allow
>>>> the lock manager to take appropriate action to resolve the lock conflict
>>>> if possible. The callback takes 2 arguments, file_lock of the blocker
>>>> and a testonly flag:
>>>> 
>>>> testonly = 1  check and return lock manager's private data if lock conflict
>>>>              can be resolved else return NULL.
>>>> testonly = 0  resolve the conflict if possible, return true if conflict
>>>>              was resolved esle return false.
>>>> 
>>>> Lock manager, such as NFSv4 courteous server, uses this callback to
>>>> resolve conflict by destroying lock owner, or the NFSv4 courtesy client
>>>> (client that has expired but allowed to maintains its states) that owns
>>>> the lock.
>>>> 
>>>> Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
>>>> ---
>>>> fs/locks.c         | 40 +++++++++++++++++++++++++++++++++++++---
>>>> include/linux/fs.h |  1 +
>>>> 2 files changed, 38 insertions(+), 3 deletions(-)
>>>> 
>>>> diff --git a/fs/locks.c b/fs/locks.c
>>>> index 3d6fb4ae847b..5f3ea40ce2aa 100644
>>>> --- a/fs/locks.c
>>>> +++ b/fs/locks.c
>>>> @@ -952,8 +952,11 @@ void
>>>> posix_test_lock(struct file *filp, struct file_lock *fl)
>>>> {
>>>> 	struct file_lock *cfl;
>>>> +	struct file_lock *checked_cfl = NULL;
>>>> 	struct file_lock_context *ctx;
>>>> 	struct inode *inode = locks_inode(filp);
>>>> +	void *res_data;
>>>> +	void *(*func)(void *priv, bool testonly);
>>>> 
>>>> 	ctx = smp_load_acquire(&inode->i_flctx);
>>>> 	if (!ctx || list_empty_careful(&ctx->flc_posix)) {
>>>> @@ -962,11 +965,24 @@ posix_test_lock(struct file *filp, struct file_lock *fl)
>>>> 	}
>>>> 
>>>> 	spin_lock(&ctx->flc_lock);
>>>> +retry:
>>>> 	list_for_each_entry(cfl, &ctx->flc_posix, fl_list) {
>>>> -		if (posix_locks_conflict(fl, cfl)) {
>>>> -			locks_copy_conflock(fl, cfl);
>>>> -			goto out;
>>>> +		if (!posix_locks_conflict(fl, cfl))
>>>> +			continue;
>>>> +		if (checked_cfl != cfl && cfl->fl_lmops &&
>>>> +				cfl->fl_lmops->lm_expire_lock) {
>>>> +			res_data = cfl->fl_lmops->lm_expire_lock(cfl, true);
>>>> +			if (res_data) {
>>>> +				func = cfl->fl_lmops->lm_expire_lock;
>>>> +				spin_unlock(&ctx->flc_lock);
>>>> +				func(res_data, false);
>>>> +				spin_lock(&ctx->flc_lock);
>>>> +				checked_cfl = cfl;
>>>> +				goto retry;
>>>> +			}
>>>> 		}
>>> Dai and I discussed this offline. Depending on a pointer to represent
>>> exactly the same struct file_lock across a dropped spinlock is racy.
>> Yes.  There's also no need for that (checked_cfl != cfl) check, though.
>> By the time func() returns, that lock should be gone from the list
>> anyway.
> 
> func() eventually calls expire_client. But we do not know if expire_client
> succeeds.

I don't understand how expire_client() can fail. It calls unhash_client()
and __destroy_client(), neither of which have a failure mode.

If there is some other opportunity for failure, then func() should return
an integer to reflect the failure. It should be explicit, not implied
by whether or not the struct file_lock is still in the list.


> One simple way to know if the conflict client was successfully
> expired is to check the list again. If the client was successfully expired
> then its locks were removed from the list. Otherwise we get the same 'cfl'
> from the list again on the next get.

I'm going to NAK this patch. The loop logic here and in posix_lock_inode()
seems unmaintainable to me, and waiting repeatedly for an upcall is not
going to scale with the number of conflict clients. I also don't like the
abuse of "void *" and "bool" in the synopsis of lm_expire_lock.

Let's consider other mechanisms to handle these conflicts.
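
For illustration only, one possible direction (a sketch, not anything
posted in this thread) would be to split the check from the resolution
and report the outcome explicitly:

	/* sketch; names and exact semantics are illustrative */
	bool (*lm_lock_expirable)(struct file_lock *cfl);
	int (*lm_expire_lock)(struct file_lock *cfl);	/* 0, or -EAGAIN */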


> -Dai
> 
>> 
>> It's a little inefficient to have to restart the list every time--but
>> that theoretical n^2 behavior won't matter much compared to the time
>> spent waiting for clients to expire.  And this approach has the benefit
>> of being simple.
>> 
>> --b.
>> 
>>> Dai plans to investigate other mechanisms to perform this check
>>> reliably.
>>> 
>>> 
>>>> +		locks_copy_conflock(fl, cfl);
>>>> +		goto out;
>>>> 	}
>>>> 	fl->fl_type = F_UNLCK;
>>>> out:
>>>> @@ -1136,10 +1152,13 @@ static int posix_lock_inode(struct inode *inode, struct file_lock *request,
>>>> 	struct file_lock *new_fl2 = NULL;
>>>> 	struct file_lock *left = NULL;
>>>> 	struct file_lock *right = NULL;
>>>> +	struct file_lock *checked_fl = NULL;
>>>> 	struct file_lock_context *ctx;
>>>> 	int error;
>>>> 	bool added = false;
>>>> 	LIST_HEAD(dispose);
>>>> +	void *res_data;
>>>> +	void *(*func)(void *priv, bool testonly);
>>>> 
>>>> 	ctx = locks_get_lock_context(inode, request->fl_type);
>>>> 	if (!ctx)
>>>> @@ -1166,9 +1185,24 @@ static int posix_lock_inode(struct inode *inode, struct file_lock *request,
>>>> 	 * blocker's list of waiters and the global blocked_hash.
>>>> 	 */
>>>> 	if (request->fl_type != F_UNLCK) {
>>>> +retry:
>>>> 		list_for_each_entry(fl, &ctx->flc_posix, fl_list) {
>>>> 			if (!posix_locks_conflict(request, fl))
>>>> 				continue;
>>>> +			if (checked_fl != fl && fl->fl_lmops &&
>>>> +					fl->fl_lmops->lm_expire_lock) {
>>>> +				res_data = fl->fl_lmops->lm_expire_lock(fl, true);
>>>> +				if (res_data) {
>>>> +					func = fl->fl_lmops->lm_expire_lock;
>>>> +					spin_unlock(&ctx->flc_lock);
>>>> +					percpu_up_read(&file_rwsem);
>>>> +					func(res_data, false);
>>>> +					percpu_down_read(&file_rwsem);
>>>> +					spin_lock(&ctx->flc_lock);
>>>> +					checked_fl = fl;
>>>> +					goto retry;
>>>> +				}
>>>> +			}
>>>> 			if (conflock)
>>>> 				locks_copy_conflock(conflock, fl);
>>>> 			error = -EAGAIN;
>>>> diff --git a/include/linux/fs.h b/include/linux/fs.h
>>>> index e7a633353fd2..8cb910c3a394 100644
>>>> --- a/include/linux/fs.h
>>>> +++ b/include/linux/fs.h
>>>> @@ -1071,6 +1071,7 @@ struct lock_manager_operations {
>>>> 	int (*lm_change)(struct file_lock *, int, struct list_head *);
>>>> 	void (*lm_setup)(struct file_lock *, void **);
>>>> 	bool (*lm_breaker_owns_lease)(struct file_lock *);
>>>> +	void *(*lm_expire_lock)(void *priv, bool testonly);
>>>> };
>>>> 
>>>> struct lock_manager {
>>>> -- 
>>>> 2.9.5
>>>> 
>>> --
>>> Chuck Lever

--
Chuck Lever




^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH RFC v8 1/2] fs/lock: add new callback, lm_expire_lock, to lock_manager_operations
  2021-12-17 20:58         ` Bruce Fields
@ 2021-12-17 21:23           ` dai.ngo
  2021-12-17 21:54             ` Bruce Fields
  0 siblings, 1 reply; 11+ messages in thread
From: dai.ngo @ 2021-12-17 21:23 UTC (permalink / raw)
  To: Bruce Fields
  Cc: Chuck Lever III, Linux NFS Mailing List, Jeff Layton, Al Viro,
	linux-fsdevel

On 12/17/21 12:58 PM, Bruce Fields wrote:
> On Fri, Dec 17, 2021 at 12:50:55PM -0800, dai.ngo@oracle.com wrote:
>> On 12/17/21 12:35 PM, Bruce Fields wrote:
>>> On Tue, Dec 14, 2021 at 11:41:41PM +0000, Chuck Lever III wrote:
>>>>> On Dec 13, 2021, at 12:24 PM, Dai Ngo <dai.ngo@oracle.com> wrote:
>>>>>
>>>>> Add new callback, lm_expire_lock, to lock_manager_operations to allow
>>>>> the lock manager to take appropriate action to resolve the lock conflict
>>>>> if possible. The callback takes 2 arguments, file_lock of the blocker
>>>>> and a testonly flag:
>>>>>
>>>>> testonly = 1  check and return lock manager's private data if lock conflict
>>>>>               can be resolved else return NULL.
>>>>> testonly = 0  resolve the conflict if possible, return true if conflict
>>>>>               was resolved else return false.
>>>>>
>>>>> A lock manager, such as the NFSv4 courteous server, uses this callback to
>>>>> resolve the conflict by destroying the lock owner, or the NFSv4 courtesy
>>>>> client (a client that has expired but is allowed to maintain its state)
>>>>> that owns the lock.
>>>>>
>>>>> Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
>>>>> ---
>>>>> fs/locks.c         | 40 +++++++++++++++++++++++++++++++++++++---
>>>>> include/linux/fs.h |  1 +
>>>>> 2 files changed, 38 insertions(+), 3 deletions(-)
>>>>>
>>>>> diff --git a/fs/locks.c b/fs/locks.c
>>>>> index 3d6fb4ae847b..5f3ea40ce2aa 100644
>>>>> --- a/fs/locks.c
>>>>> +++ b/fs/locks.c
>>>>> @@ -952,8 +952,11 @@ void
>>>>> posix_test_lock(struct file *filp, struct file_lock *fl)
>>>>> {
>>>>> 	struct file_lock *cfl;
>>>>> +	struct file_lock *checked_cfl = NULL;
>>>>> 	struct file_lock_context *ctx;
>>>>> 	struct inode *inode = locks_inode(filp);
>>>>> +	void *res_data;
>>>>> +	void *(*func)(void *priv, bool testonly);
>>>>>
>>>>> 	ctx = smp_load_acquire(&inode->i_flctx);
>>>>> 	if (!ctx || list_empty_careful(&ctx->flc_posix)) {
>>>>> @@ -962,11 +965,24 @@ posix_test_lock(struct file *filp, struct file_lock *fl)
>>>>> 	}
>>>>>
>>>>> 	spin_lock(&ctx->flc_lock);
>>>>> +retry:
>>>>> 	list_for_each_entry(cfl, &ctx->flc_posix, fl_list) {
>>>>> -		if (posix_locks_conflict(fl, cfl)) {
>>>>> -			locks_copy_conflock(fl, cfl);
>>>>> -			goto out;
>>>>> +		if (!posix_locks_conflict(fl, cfl))
>>>>> +			continue;
>>>>> +		if (checked_cfl != cfl && cfl->fl_lmops &&
>>>>> +				cfl->fl_lmops->lm_expire_lock) {
>>>>> +			res_data = cfl->fl_lmops->lm_expire_lock(cfl, true);
>>>>> +			if (res_data) {
>>>>> +				func = cfl->fl_lmops->lm_expire_lock;
>>>>> +				spin_unlock(&ctx->flc_lock);
>>>>> +				func(res_data, false);
>>>>> +				spin_lock(&ctx->flc_lock);
>>>>> +				checked_cfl = cfl;
>>>>> +				goto retry;
>>>>> +			}
>>>>> 		}
>>>> Dai and I discussed this offline. Depending on a pointer to represent
>>>> exactly the same struct file_lock across a dropped spinlock is racy.
>>> Yes.  There's also no need for that (checked_cfl != cfl) check, though.
>>> By the time func() returns, that lock should be gone from the list
>>> anyway.
>> func() eventually calls expire_client. But we do not know if expire_client
>> succeeds.
> expire_client always succeeds,

Even if expire_client always succeeds, what do we do when we go
back up to the loop to get a new 'cfl' from the list and it happens
to be the same one we just expired?  This should not happen, but we
cannot ignore that condition in the code.

This patch can be used by other lock managers and not just nfsd (even
though nfsd is the only consumer for now). Can we force other lock managers
to guarantee that lm_expire_lock (in the non-test case) *always* resolves
the conflict successfully?

We have to have this loop since there might be more than one conflict
lock.

-Dai
  

>   maybe you're thinking of
> mark_client_expired_locked or something?

>
> If there's a chance something might fail here, the only reason should be
> that the client is no longer a courtesy client because it's come back to
> life.  But in that case the correct behavior would be to just honor the
> lock conflict and return -EAGAIN.

That's what the current code does.

-Dai

>
> --b.
>
>> One simple way to know if the conflict client was successfully
>> expired is to check the list again. If the client was successfully expired
>> then its locks were removed from the list. Otherwise we get the same 'cfl'
>> from the list again on the next get.
>>
>> -Dai
>>
>>> It's a little inefficient to have to restart the list every time--but
>>> that theoretical n^2 behavior won't matter much compared to the time
>>> spent waiting for clients to expire.  And this approach has the benefit
>>> of being simple.
>>>
>>> --b.
>>>
>>>> Dai plans to investigate other mechanisms to perform this check
>>>> reliably.
>>>>
>>>>
>>>>> +		locks_copy_conflock(fl, cfl);
>>>>> +		goto out;
>>>>> 	}
>>>>> 	fl->fl_type = F_UNLCK;
>>>>> out:
>>>>> @@ -1136,10 +1152,13 @@ static int posix_lock_inode(struct inode *inode, struct file_lock *request,
>>>>> 	struct file_lock *new_fl2 = NULL;
>>>>> 	struct file_lock *left = NULL;
>>>>> 	struct file_lock *right = NULL;
>>>>> +	struct file_lock *checked_fl = NULL;
>>>>> 	struct file_lock_context *ctx;
>>>>> 	int error;
>>>>> 	bool added = false;
>>>>> 	LIST_HEAD(dispose);
>>>>> +	void *res_data;
>>>>> +	void *(*func)(void *priv, bool testonly);
>>>>>
>>>>> 	ctx = locks_get_lock_context(inode, request->fl_type);
>>>>> 	if (!ctx)
>>>>> @@ -1166,9 +1185,24 @@ static int posix_lock_inode(struct inode *inode, struct file_lock *request,
>>>>> 	 * blocker's list of waiters and the global blocked_hash.
>>>>> 	 */
>>>>> 	if (request->fl_type != F_UNLCK) {
>>>>> +retry:
>>>>> 		list_for_each_entry(fl, &ctx->flc_posix, fl_list) {
>>>>> 			if (!posix_locks_conflict(request, fl))
>>>>> 				continue;
>>>>> +			if (checked_fl != fl && fl->fl_lmops &&
>>>>> +					fl->fl_lmops->lm_expire_lock) {
>>>>> +				res_data = fl->fl_lmops->lm_expire_lock(fl, true);
>>>>> +				if (res_data) {
>>>>> +					func = fl->fl_lmops->lm_expire_lock;
>>>>> +					spin_unlock(&ctx->flc_lock);
>>>>> +					percpu_up_read(&file_rwsem);
>>>>> +					func(res_data, false);
>>>>> +					percpu_down_read(&file_rwsem);
>>>>> +					spin_lock(&ctx->flc_lock);
>>>>> +					checked_fl = fl;
>>>>> +					goto retry;
>>>>> +				}
>>>>> +			}
>>>>> 			if (conflock)
>>>>> 				locks_copy_conflock(conflock, fl);
>>>>> 			error = -EAGAIN;
>>>>> diff --git a/include/linux/fs.h b/include/linux/fs.h
>>>>> index e7a633353fd2..8cb910c3a394 100644
>>>>> --- a/include/linux/fs.h
>>>>> +++ b/include/linux/fs.h
>>>>> @@ -1071,6 +1071,7 @@ struct lock_manager_operations {
>>>>> 	int (*lm_change)(struct file_lock *, int, struct list_head *);
>>>>> 	void (*lm_setup)(struct file_lock *, void **);
>>>>> 	bool (*lm_breaker_owns_lease)(struct file_lock *);
>>>>> +	void *(*lm_expire_lock)(void *priv, bool testonly);
>>>>> };
>>>>>
>>>>> struct lock_manager {
>>>>> -- 
>>>>> 2.9.5
>>>>>
>>>> --
>>>> Chuck Lever
>>>>
>>>>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH RFC v8 1/2] fs/lock: add new callback, lm_expire_lock, to lock_manager_operations
  2021-12-17 21:23           ` dai.ngo
@ 2021-12-17 21:54             ` Bruce Fields
  0 siblings, 0 replies; 11+ messages in thread
From: Bruce Fields @ 2021-12-17 21:54 UTC (permalink / raw)
  To: dai.ngo
  Cc: Chuck Lever III, Linux NFS Mailing List, Jeff Layton, Al Viro,
	linux-fsdevel

On Fri, Dec 17, 2021 at 01:23:36PM -0800, dai.ngo@oracle.com wrote:
> On 12/17/21 12:58 PM, Bruce Fields wrote:
> >On Fri, Dec 17, 2021 at 12:50:55PM -0800, dai.ngo@oracle.com wrote:
> >>On 12/17/21 12:35 PM, Bruce Fields wrote:
> >>>On Tue, Dec 14, 2021 at 11:41:41PM +0000, Chuck Lever III wrote:
> >>>>>On Dec 13, 2021, at 12:24 PM, Dai Ngo <dai.ngo@oracle.com> wrote:
> >>>>>
> >>>>>Add new callback, lm_expire_lock, to lock_manager_operations to allow
> >>>>>the lock manager to take appropriate action to resolve the lock conflict
> >>>>>if possible. The callback takes 2 arguments, file_lock of the blocker
> >>>>>and a testonly flag:
> >>>>>
> >>>>>testonly = 1  check and return lock manager's private data if lock conflict
> >>>>>              can be resolved else return NULL.
> >>>>>testonly = 0  resolve the conflict if possible, return true if conflict
> >>>>>              was resolved else return false.
> >>>>>
> >>>>>A lock manager, such as the NFSv4 courteous server, uses this callback to
> >>>>>resolve the conflict by destroying the lock owner, or the NFSv4 courtesy
> >>>>>client (a client that has expired but is allowed to maintain its state)
> >>>>>that owns the lock.
> >>>>>
> >>>>>Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
> >>>>>---
> >>>>>fs/locks.c         | 40 +++++++++++++++++++++++++++++++++++++---
> >>>>>include/linux/fs.h |  1 +
> >>>>>2 files changed, 38 insertions(+), 3 deletions(-)
> >>>>>
> >>>>>diff --git a/fs/locks.c b/fs/locks.c
> >>>>>index 3d6fb4ae847b..5f3ea40ce2aa 100644
> >>>>>--- a/fs/locks.c
> >>>>>+++ b/fs/locks.c
> >>>>>@@ -952,8 +952,11 @@ void
> >>>>>posix_test_lock(struct file *filp, struct file_lock *fl)
> >>>>>{
> >>>>>	struct file_lock *cfl;
> >>>>>+	struct file_lock *checked_cfl = NULL;
> >>>>>	struct file_lock_context *ctx;
> >>>>>	struct inode *inode = locks_inode(filp);
> >>>>>+	void *res_data;
> >>>>>+	void *(*func)(void *priv, bool testonly);
> >>>>>
> >>>>>	ctx = smp_load_acquire(&inode->i_flctx);
> >>>>>	if (!ctx || list_empty_careful(&ctx->flc_posix)) {
> >>>>>@@ -962,11 +965,24 @@ posix_test_lock(struct file *filp, struct file_lock *fl)
> >>>>>	}
> >>>>>
> >>>>>	spin_lock(&ctx->flc_lock);
> >>>>>+retry:
> >>>>>	list_for_each_entry(cfl, &ctx->flc_posix, fl_list) {
> >>>>>-		if (posix_locks_conflict(fl, cfl)) {
> >>>>>-			locks_copy_conflock(fl, cfl);
> >>>>>-			goto out;
> >>>>>+		if (!posix_locks_conflict(fl, cfl))
> >>>>>+			continue;
> >>>>>+		if (checked_cfl != cfl && cfl->fl_lmops &&
> >>>>>+				cfl->fl_lmops->lm_expire_lock) {
> >>>>>+			res_data = cfl->fl_lmops->lm_expire_lock(cfl, true);
> >>>>>+			if (res_data) {
> >>>>>+				func = cfl->fl_lmops->lm_expire_lock;
> >>>>>+				spin_unlock(&ctx->flc_lock);
> >>>>>+				func(res_data, false);
> >>>>>+				spin_lock(&ctx->flc_lock);
> >>>>>+				checked_cfl = cfl;
> >>>>>+				goto retry;
> >>>>>+			}
> >>>>>		}
> >>>>Dai and I discussed this offline. Depending on a pointer to represent
> >>>>exactly the same struct file_lock across a dropped spinlock is racy.
> >>>Yes.  There's also no need for that (checked_cfl != cfl) check, though.
> >>>By the time func() returns, that lock should be gone from the list
> >>>anyway.
> >>func() eventually calls expire_client. But we do not know if expire_client
> >>succeeds.
> >expire_client always succeeds,
> 
> Even if expire_client always succeeds, what do we do when we go
> back up to the loop to get a new 'cfl' from the list and it happens
> to be the same one we just expired?

The *only* way that should happen is if the courtesy client has been
upgraded back to a regular client.  In that case the conflict is a real
conflict; there's no point continuing to try to expire the client, and we
just let posix_lock_file() return -EAGAIN.
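
For illustration only (this is not a posted revision, just the same hunk
with the checked_fl test dropped): once the marker is removed, a conflict
that survives the expire call -- i.e. the owner came back to life --
simply fails the test-only probe on the next pass and falls through to
the normal conflict handling:

	retry:
		list_for_each_entry(fl, &ctx->flc_posix, fl_list) {
			if (!posix_locks_conflict(request, fl))
				continue;
			if (fl->fl_lmops && fl->fl_lmops->lm_expire_lock) {
				res_data = fl->fl_lmops->lm_expire_lock(fl, true);
				if (res_data) {
					func = fl->fl_lmops->lm_expire_lock;
					spin_unlock(&ctx->flc_lock);
					percpu_up_read(&file_rwsem);
					func(res_data, false);
					percpu_down_read(&file_rwsem);
					spin_lock(&ctx->flc_lock);
					goto retry;
				}
			}
			/* still conflicting: report it as usual */
			if (conflock)
				locks_copy_conflock(conflock, fl);
			error = -EAGAIN;
			/* ... rest of the existing conflict handling ... */
		}

(The separate question of relying on 'fl'/'res_data' across the dropped
spinlock is untouched here.)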

> This should not happen, but we
> cannot ignore that condition in the code.
> 
> This patch can be used by other lock managers and not just nfsd (even
> though nfsd is the only consumer for now). Can we force other lock managers
> to guarantee that lm_expire_lock (in the non-test case) *always* resolves
> the conflict successfully?

Only nfsd has a notion of courtesy locks, so it's a little hard to
speculate how other lock managers might work.

But all we're asking is that they not return until they've either
expired the lock or decided that it's not an expirable lock any
more.  I don't think that's a very harsh requirement.
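
Spelled out as a comment -- the wording is only a sketch of that
requirement, not text from the patch:

	/*
	 * lm_expire_lock (testonly == false): must not return until the
	 * conflicting lock's owner has either been expired or found to be
	 * no longer expirable.  In the latter case the core code reports
	 * the conflict as it would for any other lock (-EAGAIN from
	 * posix_lock_inode(), a conflock copied back by posix_test_lock()).
	 */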

> We have to have this loop since there might be more than one conflict
> lock.

Right.  We don't need the checked_cfl variable, though.

> 
> -Dai
> 
> >  maybe you're thinking of
> >mark_client_expired_locked or something?
> 
> >
> >If there's a chance something might fail here, the only reason should be
> >that the client is no longer a courtesy client because it's come back to
> >life.  But in that case the correct behavior would be to just honor the
> >lock conflict and return -EAGAIN.
> 
> That's what the current code does.
> 
> -Dai
> 
> >
> >--b.
> >
> >>One simple way to know if the conflict client was successfully
> >>expired is to check the list again. If the client was successfully expired
> >>then its locks were removed from the list. Otherwise we get the same 'cfl'
> >>from the list again on the next get.
> >>
> >>-Dai
> >>
> >>>It's a little inefficient to have to restart the list every time--but
> >>>that theoretical n^2 behavior won't matter much compared to the time
> >>>spent waiting for clients to expire.  And this approach has the benefit
> >>>of being simple.
> >>>
> >>>--b.
> >>>
> >>>>Dai plans to investigate other mechanisms to perform this check
> >>>>reliably.
> >>>>
> >>>>
> >>>>>+		locks_copy_conflock(fl, cfl);
> >>>>>+		goto out;
> >>>>>	}
> >>>>>	fl->fl_type = F_UNLCK;
> >>>>>out:
> >>>>>@@ -1136,10 +1152,13 @@ static int posix_lock_inode(struct inode *inode, struct file_lock *request,
> >>>>>	struct file_lock *new_fl2 = NULL;
> >>>>>	struct file_lock *left = NULL;
> >>>>>	struct file_lock *right = NULL;
> >>>>>+	struct file_lock *checked_fl = NULL;
> >>>>>	struct file_lock_context *ctx;
> >>>>>	int error;
> >>>>>	bool added = false;
> >>>>>	LIST_HEAD(dispose);
> >>>>>+	void *res_data;
> >>>>>+	void *(*func)(void *priv, bool testonly);
> >>>>>
> >>>>>	ctx = locks_get_lock_context(inode, request->fl_type);
> >>>>>	if (!ctx)
> >>>>>@@ -1166,9 +1185,24 @@ static int posix_lock_inode(struct inode *inode, struct file_lock *request,
> >>>>>	 * blocker's list of waiters and the global blocked_hash.
> >>>>>	 */
> >>>>>	if (request->fl_type != F_UNLCK) {
> >>>>>+retry:
> >>>>>		list_for_each_entry(fl, &ctx->flc_posix, fl_list) {
> >>>>>			if (!posix_locks_conflict(request, fl))
> >>>>>				continue;
> >>>>>+			if (checked_fl != fl && fl->fl_lmops &&
> >>>>>+					fl->fl_lmops->lm_expire_lock) {
> >>>>>+				res_data = fl->fl_lmops->lm_expire_lock(fl, true);
> >>>>>+				if (res_data) {
> >>>>>+					func = fl->fl_lmops->lm_expire_lock;
> >>>>>+					spin_unlock(&ctx->flc_lock);
> >>>>>+					percpu_up_read(&file_rwsem);
> >>>>>+					func(res_data, false);
> >>>>>+					percpu_down_read(&file_rwsem);
> >>>>>+					spin_lock(&ctx->flc_lock);
> >>>>>+					checked_fl = fl;
> >>>>>+					goto retry;
> >>>>>+				}
> >>>>>+			}
> >>>>>			if (conflock)
> >>>>>				locks_copy_conflock(conflock, fl);
> >>>>>			error = -EAGAIN;
> >>>>>diff --git a/include/linux/fs.h b/include/linux/fs.h
> >>>>>index e7a633353fd2..8cb910c3a394 100644
> >>>>>--- a/include/linux/fs.h
> >>>>>+++ b/include/linux/fs.h
> >>>>>@@ -1071,6 +1071,7 @@ struct lock_manager_operations {
> >>>>>	int (*lm_change)(struct file_lock *, int, struct list_head *);
> >>>>>	void (*lm_setup)(struct file_lock *, void **);
> >>>>>	bool (*lm_breaker_owns_lease)(struct file_lock *);
> >>>>>+	void *(*lm_expire_lock)(void *priv, bool testonly);
> >>>>>};
> >>>>>
> >>>>>struct lock_manager {
> >>>>>-- 
> >>>>>2.9.5
> >>>>>
> >>>>--
> >>>>Chuck Lever
> >>>>
> >>>>

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2021-12-17 21:54 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-12-13 17:24 [PATCH RFC v8 0/2] nfsd: Initial implementation of NFSv4 Courteous Server Dai Ngo
2021-12-13 17:24 ` [PATCH RFC v8 1/2] fs/lock: add new callback, lm_expire_lock, to lock_manager_operations Dai Ngo
2021-12-14 23:41   ` Chuck Lever III
2021-12-17 20:35     ` Bruce Fields
2021-12-17 20:50       ` dai.ngo
2021-12-17 20:58         ` Bruce Fields
2021-12-17 21:23           ` dai.ngo
2021-12-17 21:54             ` Bruce Fields
2021-12-17 21:06         ` Chuck Lever III
2021-12-13 17:24 ` [PATCH v8 2/2] nfsd: Initial implementation of NFSv4 Courteous Server Dai Ngo
2021-12-13 18:35 ` [PATCH RFC v8 0/2] " Chuck Lever III
